Thursday, March 17, 2011

50+ New Features of SQL Server 2008

Transparent Data Encryption
Enable encryption of an entire database, data files, or log files without the need for application changes. Benefits include searching encrypted data using both range and fuzzy searches, securing data from unauthorized users, and encrypting data without any changes to existing applications.
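For example, enabling TDE takes only a handful of DDL statements. A minimal sketch follows; SalesDB, TdeCert, and the password are placeholder names, and in practice the certificate must be backed up before use:

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StrongPassword!1';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';
GO
USE SalesDB;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE TdeCert;
GO
ALTER DATABASE SalesDB SET ENCRYPTION ON;
GO
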
Extensible Key Management

SQL Server 2005 introduced built-in encryption and key management. SQL Server 2008 addresses the growing need for richer key management by supporting third-party key management and hardware security module (HSM) products.
Auditing

Create and manage auditing via DDL, while simplifying compliance by providing more comprehensive data auditing. This enables organizations to answer common questions, such as, "What data was retrieved?"
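A minimal sketch of the new audit DDL; SalesAudit, SalesDB, and dbo.Orders are placeholder names, and the target folder must already exist:

USE master;
GO
CREATE SERVER AUDIT SalesAudit
TO FILE (FILEPATH = 'C:\SqlAudit\');
ALTER SERVER AUDIT SalesAudit WITH (STATE = ON);
GO
USE SalesDB;
GO
CREATE DATABASE AUDIT SPECIFICATION SalesAuditSpec
FOR SERVER AUDIT SalesAudit
ADD (SELECT, UPDATE ON dbo.Orders BY public)
WITH (STATE = ON);
GO
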
Enhanced Database Mirroring
SQL Server 2008 builds on SQL Server 2005 by providing a more reliable platform that has enhanced database mirroring, including automatic page repair, improved performance, and enhanced supportability.
Automatic Recovery of Data Pages
SQL Server 2008 enables the principal and mirror machines to recover transparently from 823/824 data page errors by requesting a fresh copy of the suspect page from the mirroring partner, with no impact on end users and applications.
Log Stream Compression
Database mirroring requires data transmissions between the participants of the mirroring implementations. With SQL Server 2008, compression of the outgoing log stream between the participants delivers optimal performance and minimizes the network bandwidth used by database mirroring.
Resource Governor
Provide a consistent and predictable response to end users with the introduction of Resource Governor, which lets organizations define resource limits and priorities for different workloads so that concurrent workloads deliver consistent performance to their end users.
Predictable Query Performance
Enable greater query performance stability and predictability by providing functionality to lock down query plans, enabling organizations to promote stable query plans across hardware server replacements, server upgrades, and production deployments.
Data Compression
Enable data to be stored more effectively, and reduce the storage requirements for your data. Data compression also provides significant performance improvements for large I/O bound workloads, like data warehousing.
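For example, a sketch of enabling page compression on an existing table (dbo.FactSales is a placeholder name):

-- Estimate the savings first
EXEC sp_estimate_data_compression_savings 'dbo', 'FactSales', NULL, NULL, 'PAGE';
-- Rebuild the table with page-level compression
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
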
Hot Add CPU
Dynamically scale a database on demand by allowing CPU resources to be added to SQL Server 2008 on supported hardware platforms without forcing any downtime on applications. Note that SQL Server already supports the ability to add memory resources online.
Policy-Based Management
Policy-Based Management is a policy-based system for managing one or more instances of SQL Server 2008. Use this with SQL Server Management Studio to create policies that manage entities on the server, such as the instance of SQL Server, databases, and other SQL Server objects.
Streamlined Installation
SQL Server 2008 introduces significant improvements to the service life cycle for SQL Server through the re-engineering of the installation, setup, and configuration architecture. These improvements separate the installation of the physical bits on the hardware from the configuration of the SQL Server software, enabling organizations and software partners to provide recommended installation configurations.
Performance Data Collection
Performance tuning and troubleshooting are time-consuming tasks for the administrator. To provide actionable performance insights to administrators, SQL Server 2008 includes more extensive performance data collection, a new centralized data repository for storing performance data, and new tools for reporting and monitoring.
Language Integrated Query (LINQ)
Enable developers to issue queries against data, using a managed programming language, such as C# or VB.NET, instead of SQL statements. Enable seamless, strongly typed, set-oriented queries written in .NET languages to run against ADO.NET (LINQ to SQL), ADO.NET DataSets (LINQ to DataSets), the ADO.NET Entity Framework (LINQ to Entities), and to the Entity Data Service Mapping provider. Use the new LINQ to SQL provider that enables developers to use LINQ directly on SQL Server 2008 tables and columns.
ADO.NET Data Services
The Object Services layer of ADO.NET enables the materialization, change tracking, and persistence of data as CLR objects. Developers using the ADO.NET framework can program against a database, using CLR objects that are managed by ADO.NET. SQL Server 2008 introduces more efficient, optimized support that improves performance and simplifies development.
DATE/TIME
SQL Server 2008 introduces new date and time data types:
DATE—A date-only type
TIME—A time-only type
DATETIMEOFFSET—A time-zone-aware datetime type
DATETIME2—A datetime type with larger fractional seconds and year range than the existing DATETIME type
The new data types enable applications to have separate date and time types while providing larger date ranges and user-defined precision for time values.
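A quick sketch of the new types in action (the literal values are illustrative):

DECLARE @d DATE = '2008-08-01';                                       -- date only
DECLARE @t TIME(7) = '13:45:30.1234567';                              -- time only, 100-ns precision
DECLARE @dto DATETIMEOFFSET = '2008-08-01 13:45:30.1234567 -05:00';   -- time-zone aware
DECLARE @dt2 DATETIME2(7) = SYSDATETIME();                            -- wider range, higher precision
SELECT @d AS DateOnly, @t AS TimeOnly, @dto AS WithOffset, @dt2 AS HighPrecision;
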
HIERARCHYID
Enable database applications to model tree structures in a more efficient way than previously possible. The new system type HierarchyId can store values that represent nodes in a hierarchy tree. It is implemented as a CLR UDT and exposes several efficient, useful built-in methods for creating and operating on hierarchy nodes, with a flexible programming model.
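A minimal sketch of the type and a couple of its methods (dbo.Org is a placeholder table):

CREATE TABLE dbo.Org (
    Node hierarchyid PRIMARY KEY,
    EmployeeName nvarchar(100)
);
DECLARE @root hierarchyid = hierarchyid::GetRoot();
INSERT dbo.Org VALUES (@root, N'CEO');
INSERT dbo.Org VALUES (@root.GetDescendant(NULL, NULL), N'VP');  -- first child of the root
SELECT Node.ToString() AS NodePath, EmployeeName FROM dbo.Org;
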
FILESTREAM Data
Allow large binary data to be stored directly in an NTFS file system while remaining an integral part of the database and maintaining transactional consistency. This enables large binary data traditionally managed by the database to be scaled out onto more cost-effective storage outside the database without compromise.
Integrated Full Text Search
Integrated Full Text Search makes the transition between full-text search and relational data seamless, while enabling users to use full-text indexes to perform high-speed text searches on large text columns.
Sparse Columns
NULL data consumes no physical space, providing a highly efficient way of managing empty data in a database. For example, sparse columns allow object models that typically have numerous null values to be stored in a SQL Server 2008 database without incurring large space costs.
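For example, a sketch of declaring sparse columns (dbo.Product is a placeholder table):

CREATE TABLE dbo.Product (
    ProductID int PRIMARY KEY,
    Color nvarchar(20) SPARSE NULL,     -- NULL in most rows, so it costs no space there
    Weight decimal(9,2) SPARSE NULL
);
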
Large User-Defined Types
SQL Server 2008 eliminates the 8-KB limit for User-Defined Types (UDTs), allowing users to dramatically expand the size of their UDTs.
Spatial Data Types
Build spatial capabilities into your applications by using the support for spatial data.
Implement Round Earth solutions with the geography data type. Use latitude and longitude coordinates to define areas on the Earth's surface.
Implement Flat Earth solutions with the geometry data type. Store polygons, points, and lines that are associated with projected planar surfaces and naturally planar data, such as interior spaces.
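A minimal sketch using the geography type (the coordinates are illustrative):

DECLARE @seattle geography = geography::STGeomFromText('POINT(-122.335 47.608)', 4326);
DECLARE @redmond geography = geography::STGeomFromText('POINT(-122.121 47.674)', 4326);
SELECT @seattle.STDistance(@redmond) AS DistanceInMeters;
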

Backup Compression
Keeping disk-based backups online is expensive and time-consuming. With SQL Server 2008 backup compression, less storage is required to keep backups online, and backups run significantly faster since less disk I/O is required.
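For example, a sketch of a compressed backup (the database name and path are placeholders):

BACKUP DATABASE SalesDB
TO DISK = 'D:\Backups\SalesDB.bak'
WITH COMPRESSION;
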
Partitioned Table Parallelism
Partitions enable organizations to manage large growing tables more effectively by transparently breaking them into manageable blocks of data. SQL Server 2008 builds on the advances of partitioning in SQL Server 2005 by improving the performance on large partitioned tables.
Star Join Query Optimizations
SQL Server 2008 provides improved query performance for common data warehouse scenarios. Star Join Query optimizations reduce query response time by recognizing data warehouse join patterns.
Grouping Sets
Grouping Sets is an extension to the GROUP BY clause that lets users define multiple groupings in the same query. Grouping Sets produces a single result set that is equivalent to a UNION ALL of differently grouped rows, making aggregation querying and reporting easier and faster.
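A sketch of the syntax (dbo.Sales and its columns are placeholder names):

SELECT Region, Product, SUM(SalesAmount) AS Total
FROM dbo.Sales
GROUP BY GROUPING SETS ((Region, Product), (Region), ());  -- detail, per-region, grand total
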
Change Data Capture
With Change Data Capture, changes are captured and placed in change tables. It captures complete content of changes, maintains cross-table consistency, and even works across schema changes. This enables organizations to integrate the latest information into the data warehouse.
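A sketch of enabling CDC for a database and one table (SalesDB and dbo.Orders are placeholders):

USE SalesDB;
GO
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Orders',
    @role_name = NULL;  -- no gating role; changes land in a cdc.* change table
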
MERGE SQL Statement
With the introduction of the MERGE SQL Statement, developers can more effectively handle common data warehousing scenarios, like checking whether a row exists, and then executing an insert or update.
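A sketch of the upsert pattern (the table and column names are placeholders):

MERGE dbo.DimCustomer AS t
USING dbo.StagingCustomer AS s
    ON t.CustomerID = s.CustomerID
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name, t.City = s.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, City) VALUES (s.CustomerID, s.Name, s.City);
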
SQL Server Integration Services (SSIS) Pipeline Improvements
Data integration packages can now scale more effectively, making use of available resources and managing the largest enterprise-scale workloads. The new design improves the scalability of the runtime across multiple processors.
SQL Server Integration Services (SSIS) Persistent Lookups
The need to perform lookups is one of the most common ETL operations. This is especially prevalent in data warehousing, where fact records need to use lookups to transform business keys to their corresponding surrogates. SSIS increases the performance of lookups to support the largest tables.

Analysis Scale and Performance
SQL Server 2008 drives broader analysis with enhanced analytical capabilities and more complex computations and aggregations. New cube design tools help users streamline the development of the analysis infrastructure, enabling them to build solutions for optimized performance.
Block Computations
Block computations provide a significant improvement in processing performance, enabling users to increase the depth of their hierarchies and the complexity of their computations.
Writeback
New MOLAP-enabled writeback capabilities in SQL Server 2008 Analysis Services remove the need to query ROLAP partitions. This provides users with enhanced writeback scenarios from within analytical applications without sacrificing traditional OLAP performance.
Enterprise Reporting Engine
Reports can easily be delivered throughout the organization, both internally and externally, with simplified deployment and configuration. This enables users to easily create and share reports of any size and complexity.
Internet Report Deployment
Customers and suppliers can effortlessly be reached by deploying reports over the Internet.
Manage Reporting Infrastructure
Increase supportability and the ability to control server behavior with memory management, infrastructure consolidation, and easier configuration through a centralized store and API for all configuration settings.
Report Builder Enhancements
Easily build ad hoc reports and author reports of any structure through Report Designer.
Forms Authentication Support
Support for Forms authentication enables users to choose between Windows and Forms authentication.
Report Server Application Embedding
Report Server application embedding enables the URLs in reports and subscriptions to point back to front-end applications.
Microsoft Office Integration
SQL Server 2008 provides new Word rendering that enables users to consume reports directly from within Microsoft Office Word. In addition, the existing Excel renderer has been greatly enhanced to support features like nested data regions and subreports, as well as merged-cell improvements. This lets users maintain layout fidelity and improves the overall consumption of reports from Microsoft Office applications.
Predictive Analysis
SQL Server Analysis Services continues to deliver advanced data mining technologies. Better Time Series support extends forecasting capabilities. Enhanced Mining Structures deliver more flexibility to perform focused analysis through filtering as well as to deliver complete information in reports beyond the scope of the mining model. New cross-validation enables confirmation of both accuracy and stability for results that you can trust. Furthermore, the new features delivered with SQL Server 2008 Data Mining Add-ins for Office 2007 empower every user in the organization with even more actionable insight at the desktop.

Sunday, December 09, 2007

SQL Server 2008 (Katmai) new features

Security/Auditing
- Transparent Data Encryption (encryption while data is 'still' on disk, transparent to applications)
- External Key Management (Consolidation of key management, integration with external products)
- Data Auditing (1st-class 'AUDIT' objects; DDL support; audit objects, principals, data, etc.; support for multiple logging targets)

Availability/Reliability
- Pluggable CPU support
- Enhanced Database Mirroring (compression of mirror streams, enhanced performance, automatic page-level repair for principal/mirror)

Performance
- Data compression (easy to enable/disable online, more efficient data storage (this is NOT traditional data compression))
- Backup stream compression (server level control or backup statement control, all backup types)
- Performance data collection (single, common framework for data collection, reporting, and storage/warehousing)
- Improved Plan Guide support (plan freezing, pull plans directly from plan cache, SSMS integration, etc.)
- Resource Governor (create pools and groups to govern, define classifications based on built-in functions, segment resource utilization among groups)

Management
- Policy-based management framework (manage via policies vs. scripts, enterprise-wide support, automated monitoring/enforcement, etc.)
- Integrate with Microsoft System Center
- Extended Events (high-perf lightweight tracing infrastructure, NOT sql trace, integrated with ETW, unprecedented insight into goings-on)

Development Enhancements
- Improved datetime datatypes (100-nanosecond precision (7 digits past the second), time-zone datetime offset, date only, time only)
- HierarchyID datatype (hierarchy-aware data type, ORDPATH values, built-in functions, methods, etc.)
- Entity Data Model support (develop 'business entities' vs. tables, model complex relationships, retrieve entities vs. rows/columns)
- LINQ
- SQL Server Change Tracking (Change Data Capture, get 'diff' data changes WITHOUT a comparable value (e.g., datetime, timestamp))
- Table Valued Parameters
- MERGE statement ('upsert' data, also includes deletion functionality)
- Large UDTs (no more 8-KB limit on CLR-based UDTs, no more 8-KB limit for UDAs)
- Spatial data (GEOMETRY and GEOGRAPHY data types, built-in spatial function support, spatial indexes)
- XML enhancements (support for lax validation, Office 12 support, xs:dateTime support, lists/union types, let clauses in FLWOR expressions, etc.)
- Inline initialization and compound assignment
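A quick sketch of that last item (both are new in SQL Server 2008):

DECLARE @counter int = 10;  -- inline initialization
SET @counter += 5;          -- compound assignment
SELECT @counter;            -- returns 15
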

Service Broker
- New UI and tools for working with Service Broker (add/drop/edit functionality within SSMS, diagnostic tools)
- Conversation Priority (set message ordering, send/receive impact, 1-10 levels)

Data Storage
- Data compression (see above)
- FILESTREAM attribute (get the 'best of both' functionality from BLOBs in the DB vs. BLOBs on filesystem, no more "to blob or not to blob")
- Integrated Full Text Search (FTS fully integrated into DB engine, no external storage, no external service, more efficient and reliable costing)
- Sparse columns (more efficient storage for 'wide' tables with many columns that are mostly NULL)
- New index types (spatial indexes, hierarchical indexes, FILTERED indexes (indexes on filtered values within columns), etc.)
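As a sketch of one of those new index types, a filtered index covers only the rows matching a predicate (dbo.Orders and the column name are placeholders):

CREATE NONCLUSTERED INDEX IX_Orders_ShipDate
ON dbo.Orders (ShipDate)
WHERE ShipDate IS NOT NULL;
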

Data Warehousing/ETL
- Partitioned Table Parallelism (no more thread limit per partition)
- Star Join support (no special syntax, optimizer based, full backward syntax support)
- Data compression (see above)
- Resource Governor (see above)
- Persistent Lookups in SSIS (no more re-querying for lookup operators, cache lookups in multiple ways, persist lookups to disk)
- Improved thread scheduling in SSIS (shared thread pool, pipeline parallelism)
- Change Data Capture (see above)
- MERGE statement (see above, great uses with slowly changing dimensions)
- Scale-out analysis services (read-only storage supports multiple AS servers)
- Subspace computations
- New Tools for Cube design
- Best Practice Design Alerting
- Backup cubes with better scalability
- Data-mining add-ins for Excel

Reporting
- IIS Agnostic Reporting Services Deployment (no IIS required to run RS any longer)
- Rich-text support
- Enhanced visualization (graphing)
- New Word rendering (render reports to Microsoft Word)

Deprecation
- Many 'old' features ARE REMOVED/GONE (those that have been deprecated for some time: 60/65/70 compatibility modes, NO_LOG/TRUNCATE_ONLY syntax, etc.)

Sunday, September 30, 2007

ISOLATION LEVEL - SQL Server 2005

SET TRANSACTION ISOLATION LEVEL
{ READ UNCOMMITTED
| READ COMMITTED
| REPEATABLE READ
| SNAPSHOT
| SERIALIZABLE
}
[ ; ]


READ UNCOMMITTED
Specifies that statements can read rows that have been modified by other transactions but not yet committed.

Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.

In SQL Server 2005, you can also minimize locking contention while protecting transactions from dirty reads of uncommitted data modifications using either:

The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set to ON.

The SNAPSHOT isolation level.

READ COMMITTED
Specifies that statements cannot read data that has been modified but not committed by other transactions. This prevents dirty reads. Data can be changed by other transactions between individual statements within the current transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared locks are released when the statement completes.

If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.

When the READ_COMMITTED_SNAPSHOT database option is ON, you can use the READCOMMITTEDLOCK table hint to request shared locking instead of row versioning for individual statements in transactions running at the READ_COMMITTED isolation level.

Note:
When you set the READ_COMMITTED_SNAPSHOT option, only the connection executing the ALTER DATABASE command is allowed in the database. There must be no other open connection in the database until ALTER DATABASE is complete. The database does not have to be in single-user mode.
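For example, a minimal sketch of turning the option on (SalesDB is a placeholder name):

ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;
-- Verify the setting
SELECT is_read_committed_snapshot_on FROM sys.databases WHERE name = 'SalesDB';
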



REPEATABLE READ
Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes.

Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying any rows that have been read by the current transaction. Other transactions can insert new rows that match the search conditions of statements issued by the current transaction. If the current transaction then retries the statement it will retrieve the new rows, which results in phantom reads. Because shared locks are held to the end of a transaction instead of being released at the end of each statement, concurrency is lower than the default READ COMMITTED isolation level. Use this option only when necessary.

SNAPSHOT
Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction. The transaction can only recognize data modifications that were committed before the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.

Except when a database is being recovered, SNAPSHOT transactions do not request locks when reading data. SNAPSHOT transactions reading data do not block other transactions from writing data. Transactions writing data do not block SNAPSHOT transactions from reading data.

During the roll-back phase of a database recovery, SNAPSHOT transactions will request a lock if an attempt is made to read data that is locked by another transaction that is being rolled back. The SNAPSHOT transaction is blocked until that transaction has been rolled back. The lock is released immediately after it has been granted.

The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start a transaction that uses the SNAPSHOT isolation level. If a transaction using the SNAPSHOT isolation level accesses data in multiple databases, ALLOW_SNAPSHOT_ISOLATION must be set to ON in each database.
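A sketch of enabling and using SNAPSHOT isolation, reusing the AdventureWorks database from the example below:

ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
USE AdventureWorks;
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
GO
BEGIN TRANSACTION;
-- Sees only data committed before the transaction started; takes no shared locks
SELECT * FROM HumanResources.Department;
COMMIT TRANSACTION;
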

A transaction that started with another isolation level cannot be switched to the SNAPSHOT isolation level; doing so causes the transaction to abort. However, if a transaction starts in the SNAPSHOT isolation level, you can change it to another isolation level and then back to SNAPSHOT. A transaction starts the first time it accesses data.

A transaction running under SNAPSHOT isolation level can view changes made by that transaction. For example, if the transaction performs an UPDATE on a table and then issues a SELECT statement against the same table, the modified data will be included in the result set.

SERIALIZABLE
Specifies the following:

Statements cannot read data that has been modified but not yet committed by other transactions.

No other transactions can modify data that has been read by the current transaction until the current transaction completes.

Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

Remarks

Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.

The transaction isolation levels define the type of locks acquired on read operations. Shared locks acquired for READ COMMITTED or REPEATABLE READ are generally row locks, although the row locks can be escalated to page or table locks if a significant number of the rows in a page or table are referenced by the read. If a row is modified by the transaction after it has been read, the transaction acquires an exclusive lock to protect that row, and the exclusive lock is retained until the transaction completes. For example, if a REPEATABLE READ transaction has a shared lock on a row, and the transaction then modifies the row, the shared row lock is converted to an exclusive row lock.

With one exception, you can switch from one isolation level to another at any time during a transaction. The exception occurs when changing from any isolation level to SNAPSHOT isolation. Doing this causes the transaction to fail and roll back. However, you can change a transaction started in SNAPSHOT isolation to any other isolation level.

When you change a transaction from one isolation level to another, resources that are read after the change are protected according to the rules of the new level. Resources that are read before the change continue to be protected according to the rules of the previous level. For example, if a transaction changed from READ COMMITTED to SERIALIZABLE, the shared locks acquired after the change are now held until the end of the transaction.

If you issue SET TRANSACTION ISOLATION LEVEL in a stored procedure or trigger, when the object returns control the isolation level is reset to the level in effect when the object was invoked. For example, if you set REPEATABLE READ in a batch, and the batch then calls a stored procedure that sets the isolation level to SERIALIZABLE, the isolation level setting reverts to REPEATABLE READ when the stored procedure returns control to the batch.

Note:
User-defined functions and common language runtime (CLR) user-defined types cannot execute SET TRANSACTION ISOLATION LEVEL. However, you can override the isolation level by using a table hint. For more information, see Table Hint (Transact-SQL).



When you use sp_bindsession to bind two sessions, each session retains its isolation level setting. Using SET TRANSACTION ISOLATION LEVEL to change the isolation level setting of one session does not affect the setting of any other sessions bound to it.

SET TRANSACTION ISOLATION LEVEL takes effect at execute or run time, and not at parse time.

Optimized bulk load operations on heaps block queries that are running under the following isolation levels:

SNAPSHOT

READ UNCOMMITTED

READ COMMITTED using row versioning

Conversely, queries that run under these isolation levels block optimized bulk load operations on heaps. For more information about bulk load operations, see About Bulk Import and Bulk Export Operations and Optimizing Bulk Import Performance.

Examples

The following example sets the TRANSACTION ISOLATION LEVEL for the session. For each Transact-SQL statement that follows, SQL Server holds all of the shared locks until the end of the transaction.

USE AdventureWorks;
GO
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
GO
BEGIN TRANSACTION;
GO
SELECT *
FROM HumanResources.EmployeePayHistory;
GO
SELECT *
FROM HumanResources.Department;
GO
COMMIT TRANSACTION;
GO

Saturday, September 29, 2007

SQL Server Architecture - 2000

Locking - 2000

The Lock Manager

SQL Server can lock data using several different modes. For example, read operations acquire shared locks and write operations acquire exclusive locks. Update locks are acquired during the initial portion of an update operation when the data is read. The SQL Server lock manager acquires and releases these locks. It also manages compatibility between lock modes, resolves deadlocks, and escalates locks if necessary. It controls locks on tables, on the pages of a table, on index keys, and on individual rows of data. Locks can also be held on system data—data that's private to the database system, such as page headers and indexes.

The lock manager provides two separate locking systems. The first system affects all fully shared data and provides row locks, page locks, and table locks for tables, data pages, text pages, and leaf-level index pages. The second system is used internally for index concurrency control, controlling access to internal data structures, and retrieving individual rows of data pages. This second system uses latches, which are less resource intensive than locks and provide performance optimization. You could use full-blown locks for all locking, but because of their complexity, they would slow down the system if they were used for all internal needs. If you examine locks using the sp_lock system stored procedure or a similar mechanism that gets information from the syslockinfo table, you cannot see latches—you see only information about locks for fully shared data.

Another way to look at the difference between locks and latches is that locks ensure the logical consistency of the data and latches ensure the physical consistency. Latching happens when you place a row physically on a page or move data in other ways, such as compressing the space on a page. SQL Server must guarantee that this data movement can happen without interference.


The Lock Manager and Isolation Levels

SQL Server supports all four transaction isolation levels specified by ANSI and ISO: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.
To achieve the Serializable isolation level, you must prevent phantoms, because the transaction's behavior must be identical to what would have occurred had the transaction run on a single-user system. SQL Server provides serializability, which you can set using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. To support serializability, SQL Server locks index ranges using a special type of lock called a key-range lock. Such locks are held until the end of the transaction in order to prevent phantoms. If no index exists, the lock manager uses a table lock to guarantee serializability.
The lock manager provides fairly standard two-phase locking services. Although the names are similar, two-phase locking (2PL) and the two-phase commit (2PC) protocol are not directly related other than by the obvious fact that 2PC must use 2PL services. In 2PL, a transaction has a "growing" phase, during which it acquires locks, and a "shrinking" phase, during which it releases locks. To achieve serializability, all acquired locks are held until the end of the transaction and then are dropped all at once.


For a lower isolation level, such as Committed Read, locks can be released sooner—when the use of the object is completed. For example, if a range of data is being queried in the table, there will probably be shared locks outstanding. With Committed Read isolation, a shared lock is released as soon as the scan moves off one piece of data and onto the next. Exclusive locks, on the other hand, are always held until the end of the transaction so that the transaction can be rolled back if necessary.

With Serializable or Repeatable Read isolation, shared locks must be held until the end of the transaction to guarantee that the data that was read will not change or that new rows that meet the criteria of the query cannot be added while the transaction is in progress. Like shared locks, latches are not tied to the boundaries of a transaction because they are used to provide mutual exclusion (mutex) functionality rather than to directly lock data. For example, during a row insert in a table with a clustered index, the nearby index page is latched to prevent other inserts from colliding. The latches are needed to provide mutual exclusion only during long periods of time (that is, periods with more than a few instruction cycles).



Spinlocks

For shorter-term needs, SQL Server achieves mutual exclusion using a latch implemented with a spinlock. Spinlocks are used purely for mutual exclusion and never to lock user data. They are even more lightweight than latches, which are lighter than the full locks used for data and index leaf pages. The spinlock is the only functionality in SQL Server in which processor-specific assembly language is used. A spinlock is implemented in a few lines of assembly language specific to each processor type (such as x86/Pentium or Alpha). The requester of a spinlock repeats its request if the lock is not immediately available. (That is, the requester "spins" on the lock until it is free.)

Spinlocks are often used as mutexes within SQL Server for resources that are usually not busy. If a resource is busy, the duration of a spinlock is short enough that retrying is better than waiting and then being rescheduled by the operating system, which results in context switching between threads. The savings in context switches more than offsets the cost of spinning as long as you don't have to spin too long. Spinlocks are used for situations in which the wait for a resource is expected to be brief (or if no wait is expected).



Deadlocks

A deadlock occurs when two processes are waiting for a resource and neither process can advance because the other process prevents it from getting the resource. A true deadlock is a catch-22 in which, without intervention, neither process can ever progress. When a deadlock occurs, SQL Server intervenes automatically.

Note A simple wait for a lock is not a deadlock. When the process that's holding the lock completes, the waiting process gets the lock. Lock waits are normal, expected, and necessary in multiple-user systems.


In SQL Server, two main types of deadlocks can occur: a cycle deadlock and a conversion deadlock. Figure 14-1 below shows an example of a cycle deadlock. Process A starts a transaction, acquires an exclusive table lock on the authors table, and requests an exclusive table lock on the publishers table. Simultaneously, process B starts a transaction, acquires an exclusive lock on the publishers table, and requests an exclusive lock on the authors table. The two processes become deadlocked—caught in a "deadly embrace." Each process holds a resource needed by the other process. Neither can progress, and, without intervention, both would be stuck in deadlock forever. You can actually generate the deadlock using SQL Query Analyzer, as follows:

[Figure 14-1: A cycle deadlock; each process holds an exclusive lock that the other is waiting for]

Open a query window, and change your database context to the pubs database. Execute the following batch for process A:

BEGIN TRAN
UPDATE authors SET contract = 0
GO
Open a second window, and execute this batch for process B:

BEGIN TRAN
UPDATE publishers SET city = 'Redmond', state = 'WA'
GO
Go back to the first window, and execute this update statement:

UPDATE publishers SET city = 'New Orleans', state = 'LA'
GO
At this point, the query should block. It is not deadlocked yet, however. It is waiting for a lock on the publishers table, and there is no reason to suspect that it won't eventually get that lock.

Go back to the second window, and execute this update statement:

UPDATE authors SET contract = 1
GO
At this point, a deadlock occurs. The first connection will never get its requested lock on the publishers table because the second connection will not give it up until it gets a lock on the authors table. Since the first connection already has the lock on the authors table, we have a deadlock. One of the processes receives the following error message. (Of course, the actual process ID reported will probably be different.)

Server: Msg 1205, Level 13, State 1, Line 1
Transaction (Process ID 55) was deadlocked on {lock} resources with
another process and has been chosen as the deadlock victim. Rerun the
transaction.
Figure 14-2 shows an example of a conversion deadlock. Process A and process B each hold a shared lock on the same page within a transaction. Each process wants to promote its shared lock to an exclusive lock but cannot do so because of the other process's lock. Again, intervention is required.

[Figure 14-2: A conversion deadlock; each process holds a shared lock and is waiting to convert it to an exclusive lock]

SQL Server automatically detects deadlocks and intervenes through the lock manager, which provides deadlock detection for regular locks. In SQL Server 2000, deadlocks can also involve resources other than locks. For example, if process A is holding a lock on Table1 and is waiting for memory to become available and process B has some memory it can't release until it acquires a lock on Table1, the processes will deadlock. Threads and communication buffers can also be involved in deadlocks. Latches are not involved in deadlock detection because SQL Server uses deadlock-proof algorithms when it acquires latches. When SQL Server detects a deadlock, it terminates one process's batch, rolling back the active transaction and releasing all that process's locks to resolve the deadlock.

In SQL Server 2000, a separate thread called LOCK_MONITOR checks the system for deadlocks every 5 seconds. The lock monitor also uses an internal counter called a deadlock detection counter to determine whether to check the system for deadlocks more often. The deadlock detection counter starts at a value of 3 and is reset to 3 if a deadlock occurs. If the LOCK_MONITOR thread finds no deadlocks when it checks at the end of its 5-second cycle, it decrements the deadlock detection counter. If the counter has a value greater than 0, the lock manager requests that the lock monitor also check all the locks for a deadlock cycle if a process requests a lock resource and is blocked. Thus, after 20 seconds of not finding any deadlocks, the deadlock detection counter is 0 and the lock manager stops requesting deadlock detection every time a process blocks on a lock. The deadlock detection counter stays at 0 most of the time and the checking for deadlocks happens only at the 5-second intervals of the lock monitor.

This LOCK_MONITOR thread checks for deadlocks by inspecting the list of waiting locks for any cycles, which indicate a circular relationship between processes holding locks and processes waiting for locks. SQL Server attempts to choose as the victim the process that would be the least expensive to roll back, considering the amount of work the process has already done. However, certain operations are marked as golden, or unkillable, and cannot be chosen as the deadlock victim. For example, a process involved in rolling back a transaction cannot be chosen as a deadlock victim because the changes being rolled back could be left in an indeterminate state, causing data corruption.

Using the SET DEADLOCK_PRIORITY LOW | NORMAL statement, you can make a process sacrifice itself as the victim if a deadlock is detected. If a process has a deadlock priority of LOW, it terminates when a deadlock is detected even if it is not the process that closed the loop. However, there is no counterpart SET option to set a deadlock priority to HIGH. As much as you might want your own processes to always come out the winner in a deadlock situation, this feature has not yet been implemented in SQL Server.

Note The lightweight latches and spinlocks used internally do not have deadlock detection services. Instead, deadlocks on latches and spinlocks are avoided rather than resolved. Avoidance is achieved via strict programming guidelines used by the SQL Server development team. These lightweight locks must be acquired in a hierarchy, and a process must not have to wait for a regular lock while holding a latch or spinlock. For example, one coding rule is that a process holding a spinlock must never directly wait for a lock or call another service that might have to wait for a lock, and a request can never be made for a spinlock that is higher in the acquisition hierarchy. By establishing similar guidelines for your development team for the order in which SQL Server objects are accessed, you can go a long way toward avoiding deadlocks in the first place.


In the example in Figure 14-1, the cycle deadlock could have been avoided if the processes had decided on a protocol beforehand—for example, if they had decided to always access the authors table first and the publishers table second. Then one of the processes would get the initial exclusive lock on the table being accessed first, and the other process would wait for the lock to be released. One process waiting for a lock is normal and natural. Remember, waiting is not a deadlock.

You should always try to have a standard protocol for the order in which processes access tables. If you know that the processes might need to update the row after reading it, they should initially request an update lock, not a shared lock. If both processes request an update lock rather than a shared lock, the process that is granted an update lock is assured that the lock can later be promoted to an exclusive lock. The other process requesting an update lock has to wait. The use of an update lock serializes the requests for an exclusive lock. Other processes needing only to read the data can still get their shared locks and read. Since the holder of the update lock is guaranteed an exclusive lock, the deadlock is avoided.
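As a sketch of this protocol against the pubs tables used earlier (the WHERE clause is illustrative), both processes would request the update lock up front with a table hint:

BEGIN TRAN
SELECT * FROM authors WITH (UPDLOCK)
WHERE au_lname = 'Ringer'
UPDATE authors SET contract = 1
WHERE au_lname = 'Ringer'
COMMIT TRAN
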


By the way, the time that your process holds locks should be minimal so other processes don't wait too long for locks to be released. Although you don't usually invoke locking directly, you can influence locking by keeping transactions as short as possible. For example, don't ask for user input in the middle of a transaction. Instead, get the input first and then quickly perform the transaction.



Lock Types for User Data


Now we'll examine four aspects of locking user data. First, we'll look at the mode of locking (the type of lock). I already mentioned shared, exclusive, and update locks, and I'll go into more detail about these modes as well as others. Next, we'll look at the granularity of the lock, which specifies how much data is covered by a single lock. This can be a row, a page, an index key, a range of index keys, an extent, or an entire table. The third aspect of locking is the duration of the lock. As mentioned earlier, some locks are released as soon as the data has been accessed and some locks are held until the transaction commits or rolls back. For example, cursor scroll locks are held until a new FETCH operation is executed. The fourth aspect of locking concerns the ownership of the lock (the scope of the lock). Locks can be owned by a session, a transaction, or a cursor.


Lock Modes


SQL Server uses several locking modes, including shared locks, exclusive locks, update locks, and intent locks.


Shared Locks

Shared locks are acquired automatically by SQL Server when data is read. Shared locks can be held on a table, a page, an index key, or an individual row. Many processes can hold shared locks on the same data, but no process can acquire an exclusive lock on data that has a shared lock on it (unless the process requesting the exclusive lock is the same process as the one holding the shared lock). Normally, shared locks are released as soon as the data has been read, but you can change this by using query hints or a different transaction isolation level.


Exclusive Locks
SQL Server automatically acquires exclusive locks on data when it is modified by an insert, update, or delete operation. Only one process at a time can hold an exclusive lock on a particular data resource; in fact, as you'll see when we discuss lock compatibility, no locks of any kind can be acquired by a process if another process has the requested data resource exclusively locked. Exclusive locks are held until the end of the transaction. This means that the changed data is normally not available to any other process until the current transaction commits or rolls back. Other processes can decide to read exclusively locked data by using query hints.

Update Locks
Update locks are really not a separate kind of lock; they are a hybrid between shared and exclusive locks. They are acquired when SQL Server executes a data modification operation but first needs to search the table to find the resource that will be modified. Using query hints, a process can specifically request update locks, and in that case the update locks prevent the conversion deadlock situation presented in Figure 14-2.

Update locks provide compatibility with other current readers of data, allowing the process to later modify data with the assurance that the data hasn't been changed since it was last read. An update lock is not sufficient to allow you to change the data—all modifications require that the data resource being modified have an exclusive lock. An update lock acts as a serialization gate to queue future requests for the exclusive lock. (Many processes can hold shared locks for a resource, but only one process can hold an update lock.) As long as a process holds an update lock on a resource, no other process can acquire an update lock or an exclusive lock for that resource; instead, another process requesting an update or exclusive lock for the same resource must wait. The process holding the update lock can acquire an exclusive lock on that resource because the update lock prevents lock incompatibility with any other processes. You can think of update locks as "intent-to-update" locks, which is essentially the role they perform. Used alone, update locks are insufficient for updating data—an exclusive lock is still required for actual data modification. Serializing access for the exclusive lock lets you avoid conversion deadlocks.

Don't let the name fool you: update locks are not just for update operations. SQL Server uses update locks for any data modification operation that requires a search for the data prior to the actual modification. Such operations include qualified updates and deletes, as well as inserts into a table with a clustered index. In the latter case, SQL Server must first search the data (using the clustered index) to find the correct position at which to insert the new row. While SQL Server is only searching, it uses update locks to protect the data; only after it has found the correct location and begins inserting does it escalate the update lock to an exclusive lock.

Intent Locks
Intent locks are not really a separate mode of locking; they are a qualifier to the modes previously discussed. In other words, you can have intent shared locks, intent exclusive locks, and even intent update locks. Because SQL Server can acquire locks at different levels of granularity, a mechanism is needed to indicate that a component of a resource is already locked. For example, if one process tries to lock a table, SQL Server needs a way to determine whether a row (or a page) of that table is already locked. Intent locks serve this purpose. We'll discuss them in more detail when we look at lock granularity.

Special Lock Modes
SQL Server offers three additional lock modes: schema stability locks, schema modification locks, and bulk update locks. When queries are compiled, schema stability locks prevent other processes from acquiring schema modification locks, which are taken when a table's structure is being modified. A bulk update lock is acquired when the BULK INSERT command is executed or when the bcp utility is run to load data into a table. In addition, the copy operation must request this special lock by using the TABLOCK hint. Alternatively, the table can set the table option called table lock on bulk load to true, and then any bulk copy IN or BULK INSERT operation will automatically request a bulk update lock. If multiple connections have requested and received a bulk update lock, they can perform parallel loads into the same table.

Another lock mode that you might notice is the SIX lock. This mode is never requested directly by the lock manager but is the result of a conversion. If a transaction is holding a shared (S) lock on a resource and later an IX lock is needed, the lock mode will be indicated as SIX. For example, suppose you are operating at the Repeatable Read transaction isolation level and you issue the following batch:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM bigtable
UPDATE bigtable
SET col = 0
WHERE keycolumn = 100
Assuming that the table is large, the SELECT statement will acquire a shared table lock. (If there are only a few rows in bigtable, SQL Server will acquire individual row or key locks.) The UPDATE statement will then acquire a single exclusive key lock to do the update of a single row, and the X lock at the key level will mean an IX lock at the page and table level. The table will then show SIX when viewed through sp_lock.

Lock Granularity

SQL Server can lock user data resources (not system resources, which are protected with latches) at the table, page, or row level. SQL Server also locks index keys and ranges of index keys. Figure 14-3 shows the possible lock levels in a table. Note that if the table has a clustered index, the data rows are at the leaf level of the clustered index and they are locked with key locks instead of row locks.

[Figure 14-3: Possible lock levels in a table]

The syslockinfo table keeps track of each lock by storing the type of resource locked (such as a row, key, or page), the mode of the lock, and an identifier for the specific resource. When a process requests a lock, SQL Server compares the lock requested to the resources already listed in the syslockinfo table and looks for an exact match on the resource type and identifier. (The lock modes don't have to be the same to yield an exact match.) However, if one process has a row exclusively locked in the authors table, for example, another process might try to get a lock on the entire authors table. Since these are two different resources, SQL Server does not find an exact match unless additional information is already stored in syslockinfo. This is what intent locks are for. The process that has the exclusive lock on a row of the authors table also has an intent exclusive lock on the page containing the row and another intent exclusive lock on the table containing the row. When the second process attempts to acquire the exclusive lock on the table, it finds a conflicting row already in the syslockinfo table on the same lock resource (the authors table). Not all requests for locks on resources that are already locked will result in a conflict. A conflict occurs when one process requests a lock on a resource that is already locked by another process in an incompatible lock mode. For example, two processes can each acquire shared locks on the same resource because shared locks are compatible with each other. I'll discuss lock compatibility in detail later in this chapter.

Key Locks
SQL Server 2000 supports two kinds of key locks, whose use depends on the isolation level of the current transaction. If the isolation level is Read Committed or Repeatable Read, SQL Server tries to lock the actual index keys accessed while processing the query. With a table that has a clustered index, the data rows are the leaf level of the index, and you will see key locks acquired. If the table is a heap, you might see key locks for the nonclustered indexes and row locks for the actual data.

If the isolation level is Serializable, the situation is special. We want to prevent phantoms, which means that if we have scanned a range of data within a transaction, we need to lock enough of the table to make sure that no one can insert a value into the range that was scanned. For example, we can issue the following query within an explicit transaction:

BEGIN TRAN
SELECT * FROM employees
WHERE salary BETWEEN 30000 AND 50000
Locks must be acquired to make sure that no new rows with salary values between 30000 and 50000 are inserted before the end of the transaction. Prior to version 7.0, SQL Server guaranteed this by locking whole pages or even the entire table. In many cases, however, this was too restrictive—more data was locked than the actual WHERE clause indicated, resulting in unnecessary contention. SQL Server 2000 uses key-range locks, which are associated with a particular key value in an index and indicate that all values between that key and the previous one in the index are locked.

Suppose we have an index on the last_name column in the employees table. We are in TRANSACTION ISOLATION LEVEL SERIALIZABLE and we issue this SELECT statement:

SELECT *
FROM employees
WHERE last_name BETWEEN 'Delaney' AND 'DuLaney'

If Dallas, Donovan, and Duluth are sequential leaf-level index keys in the table, the second two of these (Donovan and Duluth) acquire key-range locks (although only one row, for Donovan, is returned in the result set). The key-range locks prevent any inserts into the ranges ending with the two key-range locks. No values greater than Dallas and less than or equal to Donovan can be inserted, and no values greater than Donovan and less than or equal to Duluth can be inserted. Note that the key-range locks imply an open interval starting at the previous sequential key and a closed interval ending at the key on which the lock is placed. These two key-range locks prevent anyone from inserting either Delany or Delanie, which are in the range specified in the WHERE clause. However, the key-range locks would also prevent anyone from inserting DeLancey (which is greater than Dallas and less than Donovan) even though DeLancey is not in the query's specified range. Key-range locks are not perfect, but they do provide much greater concurrency than locking whole pages or tables, which was the only possibility in previous SQL Server versions.

Additional Lock Resources
Locking is also done on extents—units of disk space that are 64 KB in size (eight pages of 8 KB each). This kind of locking occurs automatically when a table or an index needs to grow and a new extent must be allocated. You can think of an extent lock as another type of special purpose latch, but it does show up in the output of the sp_lock procedure. Extents can have both shared extent and exclusive extent locks.

When you examine the output of sp_lock, notice that most processes hold a lock on at least one database. In fact, any process holding locks in any database other than master or tempdb will have a DB lock for that database. These are always shared locks and are used by SQL Server for determining when a database is in use. SQL Server detects DB locks when determining whether a database can be dropped, restored, or closed. Since master and tempdb cannot be dropped or closed, DB locks are unnecessary. In addition, tempdb is never restored, and to restore the master database the entire server must be started in single-user mode, so again, DB locks are unnecessary. Generally, you don't need to be concerned with extent or database locks, but you might see them if you are running sp_lock or perusing syslockinfo.

Application Locks
The method used by SQL Server to store information about locking and to check for incompatible locks is very straightforward and extensible. As you've seen, the SQL Server lock manager knows nothing about the object it is locking. It works only with strings representing the resources without knowing the actual structure of the item. If two processes are trying to obtain incompatible locks on the same resource, blocking will occur.

If the SQL Server developers were to decide to allow you to lock individual columns as well as rows, pages, and tables, they could simply decide on an internal code number for column locks, and then we could add that to the list of resources in Table 14-3.

Instead of adding new lock resources, SQL Server 2000 lets you extend the resources that can be locked. You can take advantage of the supplied mechanisms for detecting blocking and deadlocking situations, and you can choose to lock anything you like. These lock resources are called application locks. To define an application lock, you specify a name for the resource you are locking, a mode, an owner, and a timeout.

Two resources are considered to be the same resource and are subject to blocking if they have the same name and the same owner in the same database. Remember that by lock owner we mean the session, the transaction, or a cursor. For your own application locks, the only possible owners are transaction and session. Two requests for locks on the same resource can be granted if the modes of the locks requested are compatible. The locks are checked for compatibility using the same compatibility matrix used for SQL Server supplied locks.

For example, suppose you have a stored procedure that only one user at a time should execute. You can "lock" that procedure by using the sp_getapplock procedure to acquire a special lock, which means that someone is using this procedure. When the procedure is complete, you can use sp_releaseapplock to release the lock:

EXEC sp_getapplock 'ProcLock', 'Exclusive', 'session'
EXEC MySpecialProc
EXEC sp_releaseapplock 'ProcLock', 'session'
Until the lock is released using sp_releaseapplock, or until the session terminates, no other session can execute this procedure as long as it follows the protocol and uses sp_getapplock to request rights to the procedure before trying to execute it. SQL Server doesn't know what the resource ProcLock means. It just adds a row to the syslockinfo table that it will use to compare against other requested locks. Note that the procedure itself is not really locked. If another user or application doesn't know that this is a special procedure and tries to execute MySpecialProc without acquiring the application lock, SQL Server will not prevent the session from executing the procedure.

The resource name used in these procedures can be any identifier up to 255 characters long. The mode of the lock, which is used to check compatibility with other requests for this same resource, can be Shared, Update, Exclusive, IntentExclusive, or IntentShared. There is no default; you must specify a mode. The possible values for lock owner, the third parameter, are transaction (the default) or session. A lock with an owner of transaction must be acquired within a user-defined transaction, and it is released automatically at the end of the transaction without any need to call sp_releaseapplock. A lock with an owner of session is released automatically only when the session disconnects.
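For instance, a transaction-owned application lock might look like the following sketch (MySpecialProc is the same hypothetical procedure as in the earlier example); because the owner defaults to transaction, no sp_releaseapplock call is needed:

BEGIN TRAN
-- Owner defaults to 'transaction', so the lock is tied to this transaction
EXEC sp_getapplock 'ProcLock', 'Exclusive'
EXEC MySpecialProc
COMMIT TRAN    -- the application lock is released automatically here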

Identifying Lock Resources
When the lock manager tries to determine whether a requested lock can be granted, it checks the syslockinfo table to determine whether a matching lock with a conflicting lock mode already exists. It compares locks by looking at the database ID (dbid), the object ID (objid), the type of resource locked, and the description of the specific resource referenced by the lock. The lock manager knows nothing about the meaning of the resource description. It simply compares the strings identifying the lock resources to look for a match. If it finds a match, it knows the resource is already locked; it then uses the lock compatibility matrix to determine whether the current lock is compatible with the one being requested. Table 14-3 shows all the lock resources, the abbreviations used in the output of sp_lock, and the information used to define the actual resource locked.


Lock Duration
The length of time that a lock is held depends primarily on the mode of the lock and the transaction isolation level in effect. The default isolation level for SQL Server is Read Committed. At this level, shared locks are released as soon as SQL Server has read and processed the locked data. An exclusive lock is held until the end of the transaction, whether it is committed or rolled back. An update lock is also held until the end of the transaction unless it has been promoted to an exclusive lock, in which case the exclusive lock, as with all exclusive locks, remains for the duration of the transaction. If your transaction isolation level is Repeatable Read or Serializable, shared locks have the same duration as exclusive locks. That is, they are not released until the transaction is over.
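You can watch this difference directly. A small sketch against the pubs sample database (any table works; titles is just convenient):

USE pubs
-- Read Committed: shared locks are released as soon as the data is read,
-- so sp_lock reports no KEY or PAG locks on titles
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT price FROM titles WHERE title_id = 'BU1032'
EXEC sp_lock @@spid
ROLLBACK TRAN

-- Repeatable Read: the same shared locks are now held until the
-- transaction ends, so sp_lock reports them
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT price FROM titles WHERE title_id = 'BU1032'
EXEC sp_lock @@spid
ROLLBACK TRAN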

In addition to changing your transaction isolation level, you can control the lock duration by using query hints. I'll discuss query hints for locking and for other purposes in Chapter 16.

Lock Ownership
Lock duration can also be affected by the lock ownership. There are three types of lock owners: transactions, cursors, and sessions. These are available through the req_ownertype column in the syslockinfo table. (This information is not visible through the sp_lock stored procedure.) A req_ownertype value of 1 indicates that the lock is owned by a transaction, and its duration is as described in the previous section. Most of our locking discussion, in fact, deals with locks owned by a transaction.

A cursor lock has a req_ownertype value of 2. If a cursor is opened using a locking mode of SCROLL_LOCKS, a cursor lock is held on every row fetched until the next row is fetched or the cursor is closed. Even if the transaction commits before the next fetch, the cursor lock is not released.

Locks owned by a session have a req_ownertype value of 3. A session lock is one taken on behalf of a process that is outside the scope of a transaction. The most common example is a database lock, as discussed earlier. A process acquires a session lock on the database when it issues the USE database command, and that lock isn't released until another USE command is issued or until the process is disconnected.

Viewing Locks
To see the locks currently outstanding in the system as well as those that are being waited for, examine the syslockinfo system table or execute the system stored procedure sp_lock. The syslockinfo table is not really a system table. It is not maintained on disk because locks are not maintained on disk. Rather, it is materialized in table format based on the lock manager's current accounting of locks each time syslockinfo is queried. Another way to watch locking activity is with the excellent graphical representation of locking status provided by SQL Server Enterprise Manager. Even those who think that GUIs are for wimps can appreciate SQL Server Enterprise Manager's view of locking.

In some cases, the output of sp_lock can be voluminous. You can reduce the output by specifying one or two process ID values; sp_lock will then show you only locks held by those processes. The process ID for a particular connection is available using the system function @@spid. You can execute sp_lock and specify only your current connection:

EXEC sp_lock @@spid
However, even limiting the output to just the locks for the current connection can sometimes generate more output than you're interested in. To produce the lock output, SQL Server must translate internal ID numbers for the type of lock and mode of lock into the strings shown in Tables 14-1, 14-2, and 14-3. To do this translation, SQL Server uses the spt_values table in the master database as a giant lookup table. If you're using the Serializable isolation level, locks can be held on this table in the master database as well as on temporary tables in tempdb. Having to wade through these additional locks—which exist only because you're running sp_lock to examine your locks—can make it difficult to understand the locks on your user data. To help solve this problem, I have written a modified sp_lock procedure called sp_lock2, which does not print out any locks in the master, model, tempdb, or msdb databases. In addition, the procedure translates the database ID into the database name. You can find the script to create sp_lock2 on the companion CD.

The following examples show what each of the lock types and modes discussed earlier look like when reported by the sp_lock2 procedure. Note that the call to the sp_lock2 procedure is preceded by the keyword EXECUTE, which is required when the call to a stored procedure is not the first item in a batch. Note also that the sp_lock2 procedure is given an argument of @@spid so that we'll see only locks for the current process.


Lock Compatibility


Two locks are compatible if one lock can be granted while another lock on the same object held by a different process is outstanding. On the other hand, if a lock requested for an object is not compatible with a lock currently being held, the requesting connection must wait for the lock. For example, if a shared page lock exists on a page, another process requesting a shared page lock for the same page is granted the lock because the two lock types are compatible. But a process that requests an exclusive lock for the same page is not granted the lock because an exclusive lock is not compatible with the shared lock already held.
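The following sketch shows compatibility in action across two connections, again using pubs; the Repeatable Read level just keeps the first shared lock held long enough to observe the behavior:

-- Connection 1: acquire and hold a shared lock
USE pubs
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT price FROM titles WHERE title_id = 'BU1032'

-- Connection 2: a second shared request is compatible and is granted
SELECT price FROM titles WHERE title_id = 'BU1032'
-- ...but an exclusive request is not, so this UPDATE blocks until
-- connection 1 commits or rolls back
UPDATE titles SET price = price + $1 WHERE title_id = 'BU1032'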






Internal Locking Architecture



Locks are not on-disk structures—you won't find a lock field directly on a data page or a table header—because it would be too slow to do disk I/O for locking operations. Locks are internal memory structures—they consume part of the memory used for SQL Server. Each locked data resource (a row, index key, page, or table) requires 64 bytes of memory to keep track of the database, the type of lock, and the information describing the locked resource. Each process holding a lock also must have a lock owner block of 32 bytes. A single transaction can have multiple lock owner blocks; a scrollable cursor sometimes uses several. Also, one lock can have many lock owner blocks, as in the case with a shared lock. Finally, each process waiting for a lock has a lock waiter block of another 32 bytes. Since lock owner blocks and lock waiter blocks have identical structures, I'll use the term lock owner block to refer to both of them.

The lock manager maintains a lock hash table. Lock resources, contained within a lock block, are hashed to determine a target hash slot in the hash table. (I'll discuss hashing in detail when I talk about the SQL Server hash join algorithm in Chapter 15.) All lock blocks that hash to the same slot are chained together from one entry in the hash table. Each lock block contains a 16-byte field that describes the locked resource. This 16-byte description is viewable in the syslockinfo table in the rsc_bin column. I'll dissect that column later in this section. The lock block also contains pointers to lists of lock owner blocks. A lock owner block represents a process that has been granted a lock, is waiting for the lock, or is in the process of converting to the lock. Figure 14-4 shows the general lock architecture.






The lock manager preallocates a number of lock blocks and lock owner blocks at server startup. If the number of locks is fixed by sp_configure, it allocates that configured number of lock blocks and the same number of lock owner blocks. If the number is not fixed (0 means auto-tune), it allocates 500 lock blocks on SQL Server 2000 Desktop Edition and 2500 lock blocks for the other editions. It allocates twice as many lock owner blocks as lock blocks. At their maximum, the static allocations can't consume more than 25 percent of the committed buffer pool size.

Note In this context, I use the term process to refer to a SQL Server subtask. Every user connection is referred to as a process, as are the checkpoint manager, the lazywriter, the log writer, and the lock monitor. But these are only subtasks within SQL Server, not processes from the perspective of the operating system, which considers the entire SQL Server engine to be a single process with multiple threads.


When a request for a lock is made and no free lock blocks remain, the lock manager dynamically allocates new lock blocks instead of denying the lock request. The lock manager cooperates with the global memory manager to negotiate for server-allocated memory. Each lock block contains a flag indicating whether it was dynamically allocated or preallocated. When necessary, the lock manager can free the dynamically allocated lock blocks. The lock manager is limited to 60 percent of the buffer manager's committed target size for its allocation of lock blocks and lock owner blocks.

Lock Blocks
The lock block is the key structure in SQL Server's locking architecture, as shown earlier in Figure 14-4. A lock block contains the following information:

Lock resource name

Pointers to connect the lock blocks to the lock hash table

General summary information

Pointer to a list of lock owner blocks for locks on this resource that have been granted (granted list)

Pointer to a list of lock owner blocks for locks on this resource that are waiting to be converted to another lock mode (convert list)

Pointer to a list of lock owner blocks for locks that have been requested on this resource but have not yet been granted (wait list)

The lock resource block is the most important element of the lock block. Its structure is shown in Figure 14-5. Each "row" in the figure represents 4 bytes, or 32 bits.






Following are some of the possible SR (SubResource) values:

If the lock is on a DB resource, SR indicates one of the following:

Full database lock

Bulk operation lock

If the lock is on a Table resource, SR indicates one of the following:

Full table lock (default)

Update statistics lock

Compile lock

If the lock is on an Index resource, SR indicates one of the following:

Full index lock (default)

Index ID lock

Index name lock

Lock Owner Blocks
Each lock owned or waited for by a session is represented in a lock owner block. Lists of lock owner blocks form the grant, convert, and wait lists that hang off of the lock blocks. Each lock owner block for a granted lock is linked with all other lock owner blocks for the same transaction or session so that they can be freed as appropriate when the transaction or session ends.

Syslockinfo Table

The system procedures sp_lock and sp_lock2 are based on information extracted from the syslockinfo table, which exists only in the master database. Lots more information, including the name of the lock owner, is kept in the syslockinfo table. Table 14-7 shows the columns in that table. Columns prefixed by rsc_ are taken from a lock resource block. Columns prefixed by req_ are taken from a lock owner block.





Bound Connections
Remember that the issue of lock contention applies only between different SQL Server processes. A process holding locks on a resource does not lock itself out of the resource—only other processes are denied access. But another process (or connection to SQL Server) can be executing as the same application and user. It is common for applications to have more than one connection to SQL Server. Every such connection is treated as an entirely different SQL Server process, and by default no sharing of the "lock space" occurs between connections, even if they belong to the same user and the same application.

However, two or more different connections can share a lock space and hence not lock each other out. This capability is known as a bound connection. With a bound connection, the first connection asks SQL Server to give out its bind token. The bind token is passed by the application (using a client-side global variable, shared memory, or another method) for use in subsequent connections. The bind token acts as a "magic cookie" so that other connections can share the lock space of the original connection. Locks held by bound connections do not lock each other. (The sp_getbindtoken and sp_bindsession system stored procedures get and use the bind token.)

Bound connections are especially useful if you're writing an extended stored procedure—a function written in your own DLL—and that extended stored procedure needs to call back into the database to do some work. Without a bound connection, the extended stored procedure collides with its own calling process's locks. When multiple processes share a lock space and a transaction space by using bound connections, a COMMIT or ROLLBACK affects all the participating connections. If you are going to use bound connections for the purpose of passing the bind token to an extended stored procedure to call back into the server, you must use a second parameter of the constant 1. If no parameter of 1 is passed, the token cannot be used in an extended stored procedure.
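In that case, the token is acquired like this sketch; the rest of the protocol is unchanged:

DECLARE @token varchar(255)
BEGIN TRAN
-- The second parameter of 1 marks the token for use by an
-- extended stored procedure calling back into the server
EXEC sp_getbindtoken @token OUTPUT, 1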

Here's an example of using bound connections between two different windows in SQL Query Analyzer. In SQL Server 2000, you must be inside a transaction in order to get a bind token. Since we don't have a controlling application to declare and store the bind token in a client-side variable, we have to actually copy it from the first session and paste it into the second. So, in your first query window, you execute this batch:

DECLARE @token varchar(255)
BEGIN TRAN
EXEC sp_getbindtoken @token OUTPUT
SELECT @token
GO
This should return something like the following:

-----------dPe---5---.?j0U<_WP?1HMK-3/D8;@1
Normally, you wouldn't have to look at this messy string; your application would just store it and pass it on without your ever having to see it. But for a quick example using SQL Query Analyzer, it's necessary to actually see the value. You use your keyboard or mouse to select the token string that you received and use it in the following batch in a second SQL Query Analyzer window:

EXEC sp_bindsession 'dPe---5---.?j0U<_WP?1HMK-3/D8;@1'
GO
Now go back to the first query window and execute a command that locks some data. Remember that we have already begun a transaction in order to call sp_getbindtoken. You can use something like this:

USE pubs
UPDATE titles
SET price = $100
GO
This should exclusively lock every row in the titles table. Now go to the second query window and select from the locked table:

SELECT title_id, price FROM titles
GO
You should be able to see all the $100 prices in the titles table, just as if you were part of the same connection as the first query. Besides sharing lock space, the bound connection also shares transaction space. You can execute a ROLLBACK TRAN in the second window even though the first one began the transaction. If the first connection tries to then issue a ROLLBACK TRAN, it gets this message:

Server: Msg 3903, Level 16, State 1, Line 1
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
The transaction active in this session has been committed or aborted by
another session.
SQL Server keeps track of bound connections in an internal structure called the XCB (transaction control block), which is used to relate multiple connections in a bound session. You can actually see which connections are bound using SQL Query Analyzer and an undocumented DBCC command. First, you need the spid (Server Process ID) of any connections you're interested in. You can get this value using the function @@spid:

SELECT @@spid


Row-Level vs. Page-Level Locking


The debate over whether row-level locking is better than page-level locking, or vice versa, has been one of those near-religious wars, and it warrants a few comments here. Although some people would have you believe that one is always better, it's not really that simple.

Prior to version 7, the smallest unit of data that SQL Server could lock was a page. Even though many people argued that this was unacceptable and it was impossible to maintain good concurrency while locking entire pages, many large and powerful applications were written and deployed using only page-level locking. If they were well designed and tuned, concurrency was not an issue, and some of these applications supported hundreds of active user connections with acceptable response times and throughput. However, with the change in page size from 2 KB to 8 KB for SQL Server 7, the issue has become more critical. Locking an entire page means locking four times as much data as in previous versions. Beginning with version 7, SQL Server implements full row-level locking, so any potential problems due to lower concurrency with the larger page size should not be an issue. However, locking isn't free. Considerable resources are required to manage locks. Recall that a lock is an in-memory structure of about 32 bytes, with another 32 bytes for each process holding the lock and each process waiting for the lock. If you need a lock for every row and you scan a million rows, you need more than 30 MB of RAM just to hold locks for that one process.

Beyond memory consumption issues, locking is a fairly processing-intensive operation. Managing locks requires substantial bookkeeping. Recall that, internally, SQL Server uses a lightweight mutex called a spinlock to guard resources, and it uses latches—also lighter than full-blown locks—to protect non–leaf-level index pages. These performance optimizations avoid the overhead of full locking. If a page of data contains 50 rows of data, all of which will be used, it is obviously more efficient to issue and manage one lock on the page than to manage 50. That's the obvious benefit of page locking—a reduction in the number of lock structures that must exist and be managed.

If two different processes each need to update a few separate rows of data and some of the rows needed by each process happen to exist on the same page, one process must wait until the page locks of the other process are released. If, in this case, you use row-level locking instead of page-level locking, the other process does not have to wait. The finer granularity of the locks means that no conflict occurs in the first place because each process is concerned with different rows. That's the obvious benefit of row-level locking. Which of these obvious benefits wins? Well, the decision isn't clear cut, and it depends on the application and the data. Each type of locking can be shown to be superior for different types of applications and usage.

The stored procedure sp_indexoption lets you manually control the unit of locking within an index. It also lets you disallow page locks or row locks within an index. Since these options are available only for indexes, there is no way to control the locking within the data pages of a heap. (But remember that if a table has a clustered index, the data pages are part of the index and are affected by the sp_indexoption setting.) The index options are set for each table or index individually. Two options, AllowRowLocks and AllowPageLocks, are both set to TRUE initially for every table and index. If both of these options are set to FALSE for a table, only full table locks are allowed.
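For example, assuming the titleind index on the pubs titles table, the following sketch forces all locking on that index to the table level:

-- Disallow row locks on the index; page and table locks remain possible
EXEC sp_indexoption 'titles.titleind', 'AllowRowLocks', FALSE
-- Disallow page locks as well; now only full table locks are used
EXEC sp_indexoption 'titles.titleind', 'AllowPageLocks', FALSE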

As mentioned earlier, SQL Server determines at runtime whether to initially lock rows, pages, or the entire table. The locking of rows (or keys) is heavily favored. The type of locking chosen is based on the number of rows and pages to be scanned, the number of rows on a page, the isolation level in effect, the update activity going on, the number of users on the system needing memory for their own purposes, and so on.


Lock Escalation



SQL Server automatically escalates row, key, or page locks to coarser table locks as appropriate. This escalation protects system resources—it prevents the system from using too much memory for keeping track of locks—and increases efficiency. For example, after a query acquires many row locks, the lock level can be escalated to a table lock. If every row in a table must be visited, it probably makes more sense to acquire and hold a single table lock than to hold many row locks. A single table lock is acquired and the many row locks are released. This escalation to a table lock reduces locking overhead and keeps the system from running out of locks. Because a finite amount of memory is available for the lock structures, escalation is sometimes necessary to make sure the memory for locks stays within reasonable limits.

When the lock count for one transaction exceeds 1250 or when the lock count for one index or table scan exceeds 765, the lock manager looks to see how much memory is being used for all locks in the system. If more than 40 percent of the memory pool is being used for locks, SQL Server attempts to escalate multiple page, key, or RID locks into a table lock. SQL Server tries to find a table that is partially locked by the transaction and holds the largest number of locks for which no escalation has already been performed, and which is capable of escalation. Multiple RID, key, or page locks cannot be escalated to a table lock if some other processes hold incompatible locks on other RIDs, keys, or pages of the same table. SQL Server will keep looking for other tables partially locked by the same transaction until all possible escalations have taken place or the total memory used for locks drops under 40 percent. Note that SQL Server never escalates to page locks; the result of a lock escalation is always a table lock.



Locking Hints and Trace Flags


Just as you can specify hints on queries to direct the query optimizer to choose a certain index or strategy in its query plan, you can specify hints for locking. For example, if you know that your query will scan so many rows that its page locks will escalate to table locks, you can direct the query to use table locks in the first place, which is more efficient.
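For example, this sketch against pubs requests the table lock up front with the TABLOCK hint:

USE pubs
-- Take a single table lock instead of many row locks that would
-- probably escalate anyway
SELECT COUNT(*) FROM titles WITH (TABLOCK)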


For now, armed with the knowledge of locking you've gained here, you can observe the locking activity of your system to understand how and when locks occur. Trace flag 1204 provides detailed information about deadlocks, and this information can help you understand why locks are occurring and how to change the order of access to objects to reduce them. Trace flag 1200 provides detailed locking information as every request for a lock is made.
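A sketch of turning the flags on and off; the -1 argument applies a flag to all connections, and trace flag 3604, included here on the assumption that you want the output at your own connection, routes DBCC output to the client:

DBCC TRACEON (3604)          -- send trace output to the client
DBCC TRACEON (1204, -1)      -- deadlock detail; -1 makes it serverwide
DBCC TRACEON (1200, -1)      -- very verbose: every lock request is reported
-- ... run the workload you want to observe ...
DBCC TRACEOFF (1204, 1200, -1)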



Summary


SQL Server lets you manage multiple users simultaneously and ensure that transactions observe the properties of the chosen isolation level. Locking guards data and the internal resources that make it possible for a multiple-user system to operate like a single-user system. In this chapter, we looked at the locking mechanisms in SQL Server, including full locking for data and leaf-level index pages and lightweight locking mechanisms for internally used resources.

We also looked at the types and modes of locks as well as lock compatibility and lock escalation. It is important to understand the issues of lock compatibility if you want to design and implement high-concurrency applications. We also looked at deadlocks and discussed ways to avoid them.

Indexes

An index provides fast access to data when the data can be searched by the value that is the index key. Indexes allow data to be organized in a way that allows optimum performance when you access or modify it.
You can use indexes to quickly find data rows that satisfy conditions in your WHERE clauses, to find matching rows in your JOIN clauses, or to efficiently maintain uniqueness of your key columns during INSERT and UPDATE operations. In some cases, you can use indexes to help SQL Server sort, aggregate, or group your data or to find the first few rows as indicated in a TOP clause.

It is the job of the query optimizer to determine which indexes, if any, are most useful in processing a specific query. The final choice of which indexes to use is one of the most important components of the query optimizer's execution plan.

Clustered and nonclustered indexes in SQL Server store their information using standard B-trees. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree's usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the index has the same depth throughout.

An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to one that accesses the whole table. In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence.


Clustered Indexes


The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.

Because the actual page chain for the data pages can be ordered in only one way, a table can have only one clustered index. The query optimizer strongly favors a clustered index because such an index allows the data to be found directly at the leaf level. Because it defines the actual order of the data, a clustered index allows especially fast access for queries looking for a range of values. The query optimizer detects that only a certain range of data pages must be scanned.


Most tables should have a clustered index. If your table will have only one index, it generally should be clustered. Many documents describing SQL Server indexes will tell you that the clustered index physically stores the data in sorted order. This can be misleading if you think of physical storage as the disk itself. If a clustered index had to keep the data on the actual disk in a particular order, it could be prohibitively expensive to make changes. If a page got too full and had to be split in two, all the data on all the succeeding pages would have to be moved down. Sorted order in a clustered index simply means that the data page chain is in order. If SQL Server follows the page chain, it can access each row in clustered index order, but new pages can be added by simply adjusting the links in the page chain.

In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the UNIQUE keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You'll be able to see this extra value when we look at the actual structure of index rows later in this chapter.


Nonclustered Indexes


In a nonclustered index, the lowest level of the tree (the leaf level) contains a bookmark that tells SQL Server where to find the data row corresponding to the key in the index.
If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#. (In contrast, in a clustered index, the leaf page is the data page.)
The presence or absence of a nonclustered index doesn't affect how the data pages are organized, so you're not restricted to having only one nonclustered index per table, as is the case with clustered indexes. Each table can include as many as 249 nonclustered indexes, but you'll usually want to have far fewer than that.


When you search for data using a nonclustered index, the index is traversed and then SQL Server retrieves the record or records pointed to by the leaf-level index entries. For example, if you're looking for a data page using an index with a depth of three—a root page, one intermediate page, and the leaf page—all three index pages must be traversed. If the leaf level contains a clustered index key, all the levels of the clustered index then have to be traversed to locate the specific row. The clustered index will probably also have three levels, but in this case remember that the leaf level is the data itself. There are two additional index levels separate from the data, typically one less than the number of levels needed for a nonclustered index. The data page still must be retrieved, but because it has been exactly identified, there's no need to scan the entire table. Still, it takes six logical I/O operations to get one data page. You can see that a nonclustered index is a win only if it's highly selective.


The figure illustrates this process without showing the individual levels of the B-trees. Suppose I want to find the first name for the employee named Anson, and I have a nonclustered index on the last name and a clustered index on the employee ID. The nonclustered index uses the clustered keys as its bookmarks. Searching the index for Anson, SQL Server finds that the associated clustered index key is 7. It then traverses the clustered index looking for the row with a key of 7, and it finds Kim as the first name in the row I'm looking for.







Creating an Index

The typical syntax for creating an index is straightforward:

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
ON table_name (column_name [ASC | DESC][,...n])




CREATE INDEX has some additional options available for specialized purposes. You can add a WITH clause to the CREATE INDEX command:

[WITH
[FILLFACTOR = fillfactor]
[[,] [PAD_INDEX]
[[,] IGNORE_DUP_KEY]
[[,] DROP_EXISTING]
[[,] STATISTICS_NORECOMPUTE]
[[,] SORT_IN_TEMPDB]
]


FILLFACTOR is probably the most commonly used of these options. FILLFACTOR lets you reserve some space on each leaf page of an index. In a clustered index, since the leaf level contains the data, you can use FILLFACTOR to control how much space to leave in the table itself. By reserving free space, you can later avoid the need to split pages to make room for a new entry.

If you need to, you can use the DBCC DBREINDEX command to rebuild the index and reestablish the original FILLFACTOR specified.

Tip If you plan to rebuild all of a table's indexes, simply specify the clustered index with DBCC DBREINDEX. Doing so internally rebuilds the entire table and all nonclustered indexes.


FILLFACTOR isn't usually specified on an index-by-index basis, but you can specify it this way for fine-tuning. If FILLFACTOR isn't specified, the serverwide default is used. The value is set for the server with the fill factor option of sp_configure. This value is 0 by default, which means that leaf pages of indexes are made as full as possible. FILLFACTOR generally applies only to the index's leaf page (the data page for a clustered index). In specialized and high-use situations, you might want to reserve space in the intermediate index pages to avoid page splits there, too. You can do this by specifying the PAD_INDEX option, which uses the same value as FILLFACTOR.
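A sketch, using a hypothetical employees table:

-- Leave 20 percent free space on each leaf page; PAD_INDEX applies
-- the same reservation to the intermediate index pages
CREATE NONCLUSTERED INDEX idx_emp_lname
ON employees (lname)
WITH PAD_INDEX, FILLFACTOR = 80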


The DROP_EXISTING option specifies that a given index should be dropped and rebuilt as a single transaction. This option is particularly useful when you rebuild clustered indexes. Normally, when a clustered index is dropped, every nonclustered index has to be rebuilt to change its bookmarks to RIDs instead of the clustering keys. Then, if a clustered index is built (or rebuilt), all nonclustered indexes need to be rebuilt again to update the bookmarks. The DROP_EXISTING option of the CREATE INDEX command allows a clustered index to be rebuilt without having to rebuild the nonclustered indexes twice. If you are creating the index on the exact same keys that it had previously, the nonclustered indexes do not need to be rebuilt at all. If you are changing the key definition, the nonclustered indexes are rebuilt only once, after the clustered index is rebuilt.
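A sketch, continuing with the hypothetical employees table:

-- Assumes employees already has a clustered index named idx_emp_hiredate;
-- it is dropped and re-created as a single operation, and because the key
-- definition is unchanged, the nonclustered indexes are not rebuilt at all
CREATE CLUSTERED INDEX idx_emp_hiredate
ON employees (hire_date)
WITH DROP_EXISTING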

You can ensure the uniqueness of an index key by defining the index as UNIQUE or by defining a PRIMARY KEY or UNIQUE constraint. If an UPDATE or INSERT statement would affect multiple rows, and if even one of those rows would produce a duplicate value in the unique key, the entire statement is aborted and no rows are affected. Alternatively, when you create the unique index, you can use the IGNORE_DUP_KEY option so that a duplicate key error on a multiple-row INSERT won't cause the entire statement to be rolled back. The nonunique row will be discarded, and all other rows will be inserted or updated. IGNORE_DUP_KEY doesn't allow the uniqueness of the index to be violated; instead, it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows.

The SORT_IN_TEMPDB option allows you to control where SQL Server performs the sort operation on the key values needed to build an index. The default is that SQL Server uses space from the filegroup on which the index is to be created. While the index is being built, SQL Server scans the data pages to find the key values and then builds leaf-level index rows in internal sort buffers. When these sort buffers are filled, they are written to disk. The disk heads for the database can then move back and forth between the base table pages and the work area where the sort buffers are being stored. If, instead, your CREATE INDEX command includes the option SORT_IN_TEMPDB, performance can be greatly improved, particularly if your tempdb database is on a separate physical disk from the database you're working with, with its own controller. You can optimize head movement because two separate heads read the base table pages and manage the sort buffers. You can get even more improvement in index creation speed if your tempdb database is on a faster disk than your user database and you use the SORT_IN_TEMPDB option. As an alternative to using the SORT_IN_TEMPDB option, you can create separate filegroups for a table and its indexes—that is, the table is on one filegroup and its indexes are on another. If the two filegroups are on different disks with their own controllers, you can also minimize the disk head movement.
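A sketch, again on the hypothetical employees table:

-- Build the index's sort runs in tempdb, which pays off when tempdb
-- sits on its own physical disk with its own controller
CREATE NONCLUSTERED INDEX idx_emp_ssn
ON employees (ssn)
WITH SORT_IN_TEMPDB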


Constraints and Indexes


When you declare a PRIMARY KEY or UNIQUE constraint, a unique index is created on one or more columns, just as if you had used the CREATE INDEX command. The names of indexes that are built to support these constraints are the same as the constraint names. In terms of internal storage and maintenance of indexes, there is no difference between unique indexes created using the CREATE INDEX command and indexes created to support constraints. The query optimizer makes decisions based on the presence of the unique index rather than on the fact that a column was declared as a primary key. How the index got there in the first place is irrelevant to the query optimizer.


When you create a table that includes PRIMARY KEY or UNIQUE constraints, you can specify whether the associated index will be clustered or nonclustered, and you can also specify the fillfactor.
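A sketch of a hypothetical table that does both:

CREATE TABLE employees
(
    emp_id int         NOT NULL
        CONSTRAINT PK_employees PRIMARY KEY NONCLUSTERED,
    ssn    char(11)    NOT NULL
        CONSTRAINT UQ_employees_ssn UNIQUE CLUSTERED WITH FILLFACTOR = 90,
    lname  varchar(30) NOT NULL
)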


If you check the documentation for CREATE TABLE and ALTER TABLE, you'll see that the SORT_IN_TEMPDB option is not available for either command. It really doesn't make sense to specify a sort location when you first create the table because there's nothing to sort. However, the fact that you can't specify this alternate location when you add a PRIMARY KEY or UNIQUE constraint to a table with existing data seems like an oversight. Also note that SORT_IN_TEMPDB is not an option when you use DBCC DBREINDEX. Again, there's no reason why it couldn't have been included, but it isn't available in this release.


The biggest difference between indexes created using the CREATE INDEX command and indexes that support constraints is in how you can drop the index. The DROP INDEX command allows you to drop only indexes that were built with the CREATE INDEX command. To drop indexes that support constraints, you must use ALTER TABLE to drop the constraint.
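Continuing the hypothetical examples above:

-- Built with CREATE INDEX, so DROP INDEX works
DROP INDEX employees.idx_emp_lname

-- Built to support a constraint, so it must go through ALTER TABLE
ALTER TABLE employees DROP CONSTRAINT PK_employees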


The output of sp_helpindex tells us only the names of the indexes, the property (clustered or unique), and the columns the index is on. It doesn't tell us whether the index supports a constraint. However, if we execute sp_help on a table, the output will tell us that.
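For example:

EXEC sp_helpindex 'titles'  -- index names, properties, and key columns only
EXEC sp_help 'titles'       -- also shows which indexes support constraints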


The Structure of Index Pages

Index pages are structured much like data pages. As with all other types of pages in SQL Server, index pages have a fixed size of 8 KB, or 8192 bytes. Index pages also have a 96-byte header and, just as in data pages, an offset array at the end of the page with two bytes for each row to indicate the offset of that row on the page. However, if you use DBCC PAGE with style 1 to print out the individual rows on an index page, the slot array is not shown. If you use style 2, it simply prints out all the bytes on the page with no attempt to separate them into individual rows; you can then look at the bottom of the page and see that each set of 2 bytes does refer to a byte offset on the page. Each index has a row in the sysindexes table, with an indid value of either 1, for a clustered index, or a number between 2 and 250, indicating a nonclustered index. An indid value of 255 indicates LOB data (text, ntext, or image information). The root column value contains a file number and page number where the root of the index can be found. You can then use DBCC PAGE to examine index pages, just as you do for data pages.
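A sketch of that procedure; the page address passed to DBCC PAGE (file 1, page 120) is a hypothetical value you would read from the root column:

USE pubs
-- Find the root page of each index on the table (indid 1 = clustered)
SELECT indid, root FROM sysindexes
WHERE id = OBJECT_ID('titles')

DBCC TRACEON (3604)            -- route DBCC output to the client
DBCC PAGE ('pubs', 1, 120, 1)  -- dbname, file number, page number, style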


There are basically three different kinds of index pages: leaf level for nonclustered indexes, node (nonleaf) level for clustered indexes, and node level for nonclustered indexes. There isn't really a separate structure for leaf level pages of a clustered index because those are the data pages, which we've already seen in detail. There is, however, one special case for leaf-level clustered index pages.


Clustered Index Rows with a Uniqueifier


If your clustered index was not created with the UNIQUE property, SQL Server adds a 4-byte field when necessary to make each key unique. Since clustered index keys are used as bookmarks to identify the base rows being referenced by nonclustered indexes, there needs to be a unique way to refer to each row in a clustered index. SQL Server adds the uniqueifier only when necessary—that is, when duplicate keys are added to the table.

...


Index Space Requirements


Types of Fragmentation





Special Indexes


SET Options
The following seven SET options can affect the result value of an expression or predicate, so you must set them as shown to create indexed views or indexes on computed columns:

SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF
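
With those options set, the following sketch builds an indexed view in pubs; the view and index names are hypothetical, while the SCHEMABINDING, COUNT_BIG(*), and unique clustered index requirements are the documented ones:

USE pubs
GO
-- The view must be schema-bound and, because it uses GROUP BY,
-- must include COUNT_BIG(*)
CREATE VIEW dbo.TitleCounts
WITH SCHEMABINDING
AS
SELECT type, COUNT_BIG(*) AS cnt
FROM dbo.titles
GROUP BY type
GO
-- The first index on a view must be both unique and clustered
CREATE UNIQUE CLUSTERED INDEX idx_TitleCounts ON dbo.TitleCounts (type)
GO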




Indexed Views



Using an Index