Memory
The Memory:
Pages/sec counter is the number of pages read or written to the disk when
the system can’t find the page in memory. This page management process is
referred to as paging. If the average value for this counter is
five, you need to tune the system. If this value is 10 or more, put tuning
the server high on your priority list. Before SQL Server 6.0, the value
for this counter was an important flag to tell you whether memory was the
bottleneck. Now, with SQL Server’s parallel read-ahead feature, this
counter will give you only an indication of how busy the read-ahead
manager is. However, we will discuss other counters that are better at
tracking the read-ahead manager. In other words, this counter may have
been one of the most significant counters to track in the past, and it
still is on machines without SQL Server, but better ones are available to
track memory.
The Memory: Available Bytes counter displays the
amount of free physical memory. If the value for this counter is
consistently less than 10 percent of your total memory, paging is probably
occurring. You have too much memory allocated to SQL Server and not enough
to Windows NT.
Processor Before
we start talking about the counters in the processor category, it is
important to know that Windows NT assigns certain responsibilities to
certain processors if you have four or more CPUs. Processor 0 is the
default CPU for the I/O subsystem. Network Interface Cards (NIC) are
assigned to the remaining CPUs, starting from the highest-numbered CPU. If
you have four processors and one NIC, that card is assigned Processor 3.
The next NIC gets Processor 2. Windows NT does a good job of spreading out
processor use. You can also set which processors SQL Server uses. See
Chapter 16, “Performance Tuning,” particularly the notes on the Affinity
Mask, for more information about allocating processors.
You can
monitor each processor individually or all the processors together. For
monitoring individual processors, use the Processor: % Process Time
counter. This counter lets you see which processors are the
busiest.
A better counter to monitor over the long term is the
System: % Total Processor Time counter, which groups all the processors to
tell you the average percentage of time that all processors were busy
executing non-idle threads.
Who (or what) is consuming the CPU
time? Is it the users, system interrupts, or other system processes? The
Processor: Interrupts/sec counter will tell you if it is the system
interrupts. A value of more than 1,000 indicates that you should get
better network cards, disk controllers, or both. If the Processor: %
Privileged Time is greater than 20 percent (of the total processor time)
and Processor: % User Time is consistently less than 80 percent, then SQL
Server is probably generating excessive I/O requests to the system. If
your machine is not a dedicated SQL Server machine, make it so. If none of
these situations is occurring, user processes are consuming the CPU. We
will look at how to monitor user processes when we consider SQL
Server-specific counters in the next section.
Disk I/O As discussed in Chapter 16,
“Performance Tuning,” having many smaller drives is better than having one
large drive for SQL Server machines. Let’s say that you need 4 GB of disk
space to support your application with SQL Server. Buy four 1-GB drives
instead of one 4-GB drive. Even though the seek time is faster on the
larger drive, you will still get a tremendous performance improvement by
spreading files, tables, and logs among more than one drive.
Special Note: The
single best performance increase on a SQL Server box comes from
spreading I/O among multiple drives (adding memory is a close
second).
Monitor
the disk counters to see whether the I/O subsystem is the bottleneck, and
if it is, to determine which disk is the culprit. The problem may be the
disk controller board. The first thing to know about monitoring disk I/O
is that to get accurate readings from the Physical Disk counters, you must
go to a command prompt window and type DISKPERF -y, then reboot.
This procedure turns on the operating system hooks into the disk
subsystem. However, this setup also causes a small performance decrease of
3 to 5 percent, so you want to turn this on only periodically and only for
a short period. Use the Diskperf -n command to turn it off, then restart
your system.
Track Physical Disk: % Disk Time to see how much time
each disk is busy servicing I/O, including time spent waiting in the disk
driver queue. If this counter is near 100 percent on a consistent basis,
then the physical disk is the bottleneck. Do you rush out and buy another
disk? Perhaps that is the best strategy if the other drives are also busy,
but you have other options. You may get more benefit from buying another
controller and splitting the I/O load between the different controllers.
Find out what files or SQL Server tables reside on that disk, and move the
busy ones to another drive. If the bottleneck is the system drive, split
the virtual memory swap file to another drive, or move the whole file to a
less busy drive. You should already have split the swap file, unless you
only have one drive (which is very silly on a SQL Server
machine).
LogicalDisk: Disk Queue Length and PhysicalDisk: Disk
Queue Length can reveal whether particular drives are too busy. These
counters track how many requests are waiting in line for the disk to
become available. Values of less than 2 are good; if the value is any
higher, it’s too high.
Network Redirector: Read Bytes Network/Sec
gives the actual rate at which bytes are being read from the network.
Dividing this value by the value for the Redirector: Bytes Received/Sec
counter gives the efficiency with which the bytes are being
processed.
If this ratio is 1:1, your system is processing network
packets as fast as it gets them. If this ratio is below 0.8, then the
network packets are coming in faster than your system can process them. To
correct this problem on a multiprocessor system, use the Affinity Mask and
SMP Concurrency options in the SQL Configuration dialog box to allocate
the last processor to the network card, and don’t let SQL Server use that
processor. For example, if you have four CPUs, set the Affinity Mask to 7
(binary 0111) and SMP Concurrency to 3. This setup gives three CPUs to SQL
Server and the fourth processor to the network card, which Windows NT
assigns to that processor by default. If I/O is also a problem, set the
Affinity Mask to 6 (binary 0110) and SMP Concurrency to 2, because Windows
NT assigns the I/O subsystem to the first processor by
default.
Monitoring SQL
Server The questions to ask yourself when monitoring the SQL
Server categories are “Do I have the optimal configuration values for SQL
Server?” and “Who is consistently using the most resources?”
If any
of the counters considered in this section indicate a problem, the problem
is somewhere related to SQL Server. If the problem is I/O, memory, CPU, or
locks, you can dig deeper and find out who the culprits are. However, if
you are using a long-term logging strategy for monitoring, you must
monitor every session to be sure you have the necessary historical data
when you want to see what was happening at a particular time.
If
you are watching the monitor when a problem occurs, go to the SQL
Server-Users object and turn on the counter for all instances. The
instances in this case are the sessions currently logged on. You can see
the login ID and the session number. If you see one or more sessions
causing the problem, you can spy on them to find the last command sent. Go
to the Enterprise Manager, click the Current Activity button on the
toolbar, and double-click the line in the display corresponding to the
session number. You will see the last command received from the session.
To trace commands in more depth, use the SQLTrace utility that is new with
version 6.5. (See Chapter 3, “Administrative and Programming Tools,” for
details.)
The five main categories of SQL Server counters to
monitor are cache, disk I/O, log, locks, and users. We will consider each
of these categories separately as well as a mix of other important
predefined counters. The final part of this section discusses the new
user-defined counters.
Cache To
monitor your cache, watch SQL Server — Cache Hit Ratio. It monitors the
rate at which the system finds pages in memory without having to go to
disk. The cache hit ratio is the number of logical reads divided by the
total of logical plus physical reads. If the value for this counter is
consistently less than 80 percent, you should allocate more memory to SQL
Server, buy more system memory, or both. However, before you buy more
memory, you can try changing the read-ahead configuration options. Also
look at the discussion of free buffers in the next chapter to determine
whether the number of free buffers is approaching zero. Changing the free
buffers configuration parameter may increase the cache hit
ratio.
To find out if you have configured SQL Server properly, you
should monitor SQL Server-Procedure Cache: Max Procedure Cache Used (%).
If this counter approaches or exceeds 90 percent during normal usage,
increase the procedure cache in the SQL Server configuration options. If
the maximum cache used is less than 50 percent, you can decrease the
configuration value and give more memory to the data cache. Rumor has it
that SQL Server 7.0 will have a floating-point number for the procedure
cache configuration parameter so that you can give the procedure cache
less than 1 percent of your SQL Server memory. For a super server with
gigabytes of memory, even 1 percent is too much for procedure
cache.
If a 2K data page has been swapped to the Windows NT virtual
memory file and read in again later, SQL Server still counts the page as
already in memory for the purposes of the Cache Hit Ratio counter.
Therefore, a system bogged down by heavy swapping to virtual memory could
still show a good cache hit ratio. To find out if your system is in this
category, monitor the Memory: Page Faults/Sec counter.
The Memory:
Page Faults/Sec counter watches the number of times a page was fetched
from virtual memory, meaning that the page had been swapped to the Windows
NT swap file. It also adds to the counter the number of pages shared by
other processes. This value can be high while system services, including
SQL Server, are starting up. If it is consistently high, you may have
given too much memory to SQL Server. The network and operating system may
not have enough memory to operate efficiently.
Warning: This
counter is a strange one to figure out. Running this counter on four
different types of machines gave widely different results. To try to get a
baseline value, we turned off all services, including SQL Server,
unplugged the boxes from the network, and ran Performance Monitor with
only the Memory: Page Faults/Sec counter turned on. The lowest measurement
of page faults per second was from the system we least expected — a 50 MHz
486 with 16 MB of memory and one disk drive. It settled in at about five
to seven page faults per second. The DEC Alpha with 4 processors, 10 GB
RAID 5 striping on 5 drives, and 256 MB of memory was up in the 35 to 40
page faults per second range. So was a similarly configured Compaq
ProLiant. The laptop performed in the middle, at about 15 page faults per
second. It is a 90 MHz Pentium with 1 disk drive and 40 MB of memory. All
were running Microsoft Windows NT version 3.51 service pack 4. All
services except Server and Workstation were turned off. Running the same
experiment with Windows NT 4.0 service pack 1 showed approximately the
same results, except that the page faults per second numbers ran
consistently 10 percent less than in Windows NT 3.51.
The result of
this experiment is that we can’t recommend a range to gauge the
performance of your machine. The best you can do is turn off all services
for a brief period to get a baseline measurement on your machine, then use
this value as a guide for your regular usage.
Disk I/O Several counters measure how busy
your disk drives are and which disk drives are the busiest. Remember that
for any I/O measurements to be effective, you must run the Windows NT
Diskperf -y command and reboot the system.
Even though the SQL
Server: I/O Transactions Per Second counter is a bit misleading, it is
still good, especially for capacity planning. This counter measures the
number of Transact-SQL batches processed since the last refresh period.
You should not use these results against any standard TPC benchmark tests
that give results in transactions per second — it is not referring to a
Begin/Commit transaction, just to batches of commands. Watch this number
over a span of several months, because an increase in this counter can
indicate that the use of SQL Server is growing.
The SQL Server: I/O
— Lazy Writes/Sec counter monitors the number of pages per second that the
lazy writer is flushing to disk. The lazy writer is the background Windows
NT process that takes the data from cached memory and writes it to disk,
although sometimes a lazy writer is hardware that reads the cached memory
on the disk drive and is managed by the disk controller. A sustained high
rate of lazy writes per second could indicate any of three possible
problems:
the Recovery Interval configuration parameter is too short, causing
many checkpoints
too little memory is available for page caching
the Free Buffers parameter is set too low
Normally this rate
is zero until the least-recently used (LRU) threshold is reached. LRU is
the indicator by which memory is released for use by other processes.
Buying more memory may be the best solution if the configuration
parameters seem to be in line for your server size.
The SQL Server:
I/O Outstanding Reads counter and the I/O Outstanding Writes counter
measure the number of physical reads and writes pending. These counters
are similar to the PhysicalDisk: Disk Queue Length counter. A high value
for this counter for a sustained period may point to the disk drives as a
bottleneck. Adding memory to the data cache and tuning the read-ahead
parameters can decrease the physical reads.
The SQL Server: I/O
Page Reads per Second counter is the number of pages not found in SQL
Server data cache, which indicates physical reads of data pages from disk.
This value does not count pages that are read from the Windows NT virtual
memory disk file. There is no way to watch only the logical page reads per
second. According to sources in the SQL development team, counters for
logical pages reads are hidden in a structure that is not available in
this version of SQL Server. However, you can figure out the logical page
reads per second by taking the total page reads per second and subtracting
the physical page reads per second.
WinConnections Conference Fall 2008 Don’t miss the premier event for Microsoft IT Professionals in Las Vegas, November 10-13. Register and book your room by August 25 and receive a FREE room night (based on a three night minimum stay).
Master SharePoint with 3 eLearning Seminars Learn how to build a better SharePoint infrastructure and enable powerful collaboration with MVPs Dan Holme and Michael Noel. Register today!
SharePointConnections Conference Fall 2008 Don’t miss the premier event for Microsoft IT Professionals in Las Vegas, November 10-13. Register and book your room by August 25 and receive a FREE room night (based on a three night minimum stay).
VMworld 2008 - Sign Up Today! Join your peers on September 15-18 at The Venetian Hotel in Las Vegas as VMware hosts VMworld 2008, the leading Virtualization event.
Microsoft® Tech•Ed EMEA 2008 IT Professionals Advance your thinking with new ideas and practical real-world solutions at Microsoft’s FIVE day technical infrastructure conference 3-7 Nov., 2008. Register before 26 September 2008 to save €300.
Order Your Fundamentals CD Today! Gain an introduction to Exchange, learn server security requirements, and understand how unified communications can play a role in your messaging strategies with this free Exchange CD.
Are You Really Compliant with Software Regulations? View this web seminar that will help you with compliance best practices and check out a management solution to assure that you won’t be in jeopardy of an audit.
Virtualization Congress Oct. 14-16 in London Don't miss Virtualization Congress, the premiere EMEA conference dedicated to hardware, OS and application virtualization. Oct. 14-16 in London.