Abstract
This chapter gives you an in-depth look at indexing. Specifically, the authors walk through all the options in the Manage Indexes dialog box, consider advanced topics in indexing, and discuss indexing strategies. You may also want to read "Performance Tuning SQL Server" and "Optimizing SQL Server Queries."
USING THE MANAGE INDEXES DIALOG BOX
The Manage Indexes dialog box, shown in Figure 17.1, lets you perform almost every function related to indexing. In SQL Enterprise Manager, choose Indexes from the Manage menu. Walking through the options in this dialog box is a good, methodical way to cover many indexing topics.
Proper Indexing
One of the most important ways to improve the performance of your applications is to index properly. SQL Server currently allows only b-tree indexing, which you could see as a limitation; however, this specialization also means you must become expert in only one form of indexing to optimize your indexes.
SQL Servers b-tree indexing structure is composed of three parts: a non-leaf level, which has a root page and other levels, depending on the size of the fields in the key; a leaf level, where the key is in sorted order; and the data itself. The non-leaf and leaf levels are allocated their own separate storage areas (in extents, or eight-page blocks).
B-tree indexes have strengths; for example, they are the most flexible kind of index over the widest variety of applications. However, they are not the fastest index for certain data retrieval operations. For example, hashing is considered the fastest algorithm for retrieving records when you have exact matches on key values, but hashing is poor for partial key retrievals.
Index Attributes
The Index Attributes section of Figure 17.1 contains many options that you can use to create the fastest index for any particular situation. We will consider the attributes individually.
Clustered vs. Nonclustered
Although this option is not the first one in this section of the dialog box, we need to consider it first because its basic to an understanding of some of the other options. SQL Server allows two basic types of indexes: clustered and nonclustered. A clustered index has three fundamental properties.
the data is kept in sorted order
the leaf-level rows are the data rows
the non-leaf level pages and the data are in separate storage areas
Only one clustered index is allowed in each table because you can keep the records in only one sorted order.
Nonclustered indexes have an extra level, with the keys in sorted order and pointers to the actual data rows. This type of index requires three separate storage areas. You can have as many as 250 indexes per table, and they can all be nonclustered. If you have one clustered index, then you can have up to 249 nonclustered indexes.
Unique Keys
This option allows only one instance of each value in the index table. If you try to insert another row with the same value in the field(s) designated with the unique tag, you get an error and the insert will be rolled back. Every table should have one unique index. Although SQL Server does not force this rule, having a table without at least one unique index violates relational theory.
Some front-end products do not work well unless you have one unique index. For example, Microsoft Access does not allow updates to a SQL Server table unless it has a unique index. The only tables without unique indexes in most applications are intermediate work tables, usually used when converting or importing data from legacy systems.
Ignore Duplicate Keys
This attribute controls what happens when an Insert or Update statement attempts to create a duplicate key entry in a table with a unique clustered index. The IGNORE_DUP_KEY parameter of the Create Index command gives you this same option. When this attribute is not set, a warning is generated and the duplicate entry is discarded, which can be disastrous in an Update statement.
However, this option can be very useful because it lets transactions proceed when a duplicate is encountered. It is particularly helpful when you are inserting more than one record at a time; for example, using BCP when you dont want to drop your index and you dont want the entire batch to fail.
Even when this option is turned on, you cannot create a unique index on a table that already has duplicate data. To create a unique index on a table with duplicate data, first use the next option, Ignore Duplicate Rows, which gets rid of the duplicate data. Then come back and create the unique index.
Ignore/Allow Duplicate Rows
These mutually exclusive options control nonunique clustered indexes. The Ignore Duplicate Rows option removes duplicate rows, if any, when the index is created. This option is one way to pull duplicate rows out of legacy data but note that it removes the entire row, not just the key. Duplicate rows you try to insert after the index is created are ignored, meaning in this case that the row is never inserted in the table.
When you use the Allow Duplicate Rows option, SQL Server generates no errors or warnings if data is duplicated. You should avoid using the Allow Duplicate Rows option because it violates relational theory and you will not be able to create a unique index, which can in turn create other problems.
The IGNORE_DUP_ROW and ALLOW_DUP_ROW parameters to the Create Index statement also give you these options.
Sorted Data
This option avoids resorting the data when a clustered index is created. You should check the box for this option only when you are positive that the data is in sorted order if its not, an error is generated and the Create Index statement fails. The SORTED_DATA parameter on the Create Index statement gives you this same option.
Sorted Data Reorganization
When you check this option, SQL Server physically reorganizes the data without doing a sort. The data must already be in sorted order before you can use this option.
This option is particularly useful when youre using the Fill Factor option (discussed below) and when the data needs to be reorganized for better performance. Reorganization is also a good idea when the data becomes fragmented, which you can check by running the DBCC SHOW_CONTIG (table, index) command. This option is also available with the SORTED_DATA_ REORG parameter to the Create Index statement.
Fill Factor
This seldom-used but effective option, also available through the Fillfactor parameter of the Create Index statement, can do more than most people expect. It specifies the initial percentage to which the index leaf-level pages are filled. The key word in the previous sentence is initial, because the indexes are not kept at this percentage. Over time, index page splits, inserts, and deletes change the average. Although an overall, server-wide option in the system configuration sets the fill factor for all databases, you should not use it. It is better to use this option on individual indexes.
The default setting for this option is 0, which means that only the leaf pages are filled. Space for one row is reserved in the nonleaf pages for nonclustered indexes (either unique or nonunique) and for unique clustered indexes. Space for two entries is reserved in the nonleaf pages for nonunique clustered indexes.
A value of 100 means that all pages, leaf and nonleaf, will be filled to capacity. This value should be used only for read-only tables because inserted data will cause a page split, which makes two of the pages approximately 50 percent full. An Update statement may or may not cause a page split, depending upon whether the update is deferred or direct and whether the length of a value in a variable-length key field increases enough to overflow a page.
When a value from 1 to 99 is specified, nonleaf pages follow the same rule as they do when the value is 0; i.e., space for one row is reserved in the nonleaf pages for nonclustered indexes (either unique or nonunique) and for unique clustered indexes. Space for two entries is reserved in the nonleaf pages for nonunique clustered indexes. Leaf pages are filled to a capacity that is less than or equal to the fill factor value. Because the data pages in a clustered index are the leaf-level pages, the fill factor affects the data pages of clustered indexes.
Table 17.1 shows the effect of the fill factor on index packing. For further discussion of the usefulness of this command, see the discussion of hot spots and page splits in Chapter 18, Optimizing Queries.
Manage Index Buttons
These buttons, which appear down the right side of the Manage Indexes dialog box (Figure 17.1), help you manage your indexes.
Rebuild, Remove, and Rename Buttons
These buttons affect the currently selected index in the ways their names imply. To change the table or the index the buttons will affect, use the Table or Index fields next to these buttons.
The Rebuild button rebuilds the current index. You should regularly rebuild indexes of active tables. You can use the DBCC DBREINDEX command to schedule this task regularly. When you rebuild a clustered index, all the nonclustered indexes will be rebuilt as well. Whether you click the Rebuild button, use DBCC DBREINDEX, or use SQLMaint.exe (the Database Maintenance Wizard) to rebuild the indexes, the actual indexing is done by the DBCC REINDEX command. You can rebuild the indexes on a table without dropping all the constraints associated with the table, which is required if you attempt to drop and rebuild an index that is also a primary key.
The Remove button deletes the index.
The Rename button renames the index. This action affects only one record in the sysindexes table and does not affect any other table, even if the index is referenced by a primary key.
Check Size Button
Clicking Check Size shows you how large an index is and updates the sysindexes table. The figure in the Actual Size field in the Index Size box at the bottom of the dialog box comes from the sysindexes table, which is not kept up-to-date. Clicking Check Size brings up a warning that updating the size can take a long time and the pages could be unavailable for update while its processing. You can also check the size of all the indexes for this table. If you have an overwhelming need to check the size of an index, you can do it here, but we recommend that you wait until you run the DBCC CHECKDB or DBCC CHECKTABLE commands during your normal nightly or weekly schedule.
Distribution Button
SQL Server has one of the best query optimizers in the database industry. The core of the optimizer is a distribution page, which keeps a statistical distribution of the values in the first field of each index in a table. Every index has one and only one distribution page. The optimizer uses this page to help make its final decision of how to minimize I/O to satisfy a query.
The Distribution button brings up the window in Figure 17.2, which shows the statistical distribution of the values in the first field in the index. This window was one of the most impressive additions to SQL Server 6.0 you dont have to calculate statistical distribution values. Instead, you must know only how to interpret the significant values on this screen, which is no easy task each statistic has far-reaching implications.
Figure 17.2 displays the distribution statistics in an easy-to-understand fashion. SQL Server distribution pages keep an even distribution, as opposed to a normal or standard distribution. An even distribution simply tracks every nth value in the first field of the index. To find the value of n, divide the number of rows in the table by the length, in bytes, of the first field in the index. This calculation may not be so simple when you are using a variable-length field. SQL Server tracks as many values as it can with variable-length fields, using the maximum length of the field to calculate an initial number of steps. The number of steps is shown in list of statistics at the top, and the value in each step is shown in the scrollable window at the bottom.
The value for the Sampling Step Interval indicates how many rows are represented by each recorded step. In the bottom window of the screen, the last value on the right is a pseudo record number representing the rows sorted by this field. Subtracting one value from the prior one in the list gives you the Sampling Step Interval.
The value of Average Row Hits is the most significant statistic on this page. The optimizer uses this statistic to determine which index has the best selectivity. Selectivity is an estimate of the number of rows that will be returned by a query that uses the equal sign in the Where clause for the first field in the index. (The best selectivity is the lowest number.) Selectivity is calculated by dividing the number of rows by the number of unique values in the first field in the index. You can find the number of unique values in a field with the following query:
Master SharePoint with 3 eLearning Seminars Learn how to build a better SharePoint infrastructure and enable powerful collaboration with MVPs Dan Holme and Michael Noel. Register today!
SharePointConnections Conference Fall 2008 Don’t miss the premier event for Microsoft IT Professionals in Las Vegas, November 10-13. Register and book your room by August 25 and receive a FREE room night (based on a three night minimum stay).
VMworld 2008 - Sign Up Today! Join your peers on September 15-18 at The Venetian Hotel in Las Vegas as VMware hosts VMworld 2008, the leading Virtualization event.
Microsoft® Tech•Ed EMEA 2008 IT Professionals Advance your thinking with new ideas and practical real-world solutions at Microsoft’s FIVE day technical infrastructure conference 3-7 Nov., 2008. Register before 26 September 2008 to save €300.
Order Your SQL Fundamentals CD Today! Learn how to use SQL Server, understand Office integration techniques and dive into the essentials of SQL Express and Visual Basic with this free SQL Fundamentals CD.
Are You Really Compliant with Software Regulations? View this web seminar that will help you with compliance best practices and check out a management solution to assure that you won’t be in jeopardy of an audit.