Abstract
This chapter discusses the most important counters to monitor (in both Windows NT and SQL Server), consider time intervals, and recommend a long-term strategy for monitoring performance. A table summarizes which counters to consider for particular problems.
PERFORMANCE MONITOR
Performance Monitor collects data about different counters, such as memory use. Performance Monitor can show you data in graphical format in real time, or you can save the data to log files. Pay particular attention to the discussion of log files in this section, because you will also use log files for long-term performance monitoring, which we will discuss in detail later. Working with a log file can be difficult to learn on your own because the options are not intuitive and are hidden on different screens.
You can choose between two Performance Monitors: one in the Administrative Tools group and one in the SQL Server group. They are the same basic program, but you need to run the one in the SQL Server group because it automatically loads the SQL Server-related counters. You run this version of the program with the following command: Perfmon.exe C:\Mssql\Binn\ Sqlctrs.pmc, where the .pmc file is the Performance Monitor counter file that contains the SQL counters. You can write applications that provide your own counters, and you can modify the new system stored procedures called sp_user_counter1 through sp_user_counter10 and track them, too.
When you run the program from the SQL Server group, the window in Figure 15.1 appears. Because you started the program that includes the set of SQL counters, five counters appear at the bottom of the window when Performance Monitor starts. The five counters are
Cache Hit Ratio
I/O Transactions per second
I/O Page Reads per second
I/O Single Page Writes per second
User Connections
These counters will be explained in more detail later, but first, lets learn how to navigate in Performance Monitor.
Changing Menu Options
The first set of buttons on the tool bar at the top of the window corresponds to the four views of the monitor: chart, alert, log, and report views. You can get to the same options using the View menu.
The menu options change depending upon which view is currently active. Without going into too much detail about the View menu options, their basic purpose is to let you set up and save standard viewing templates for each of the four views.
Understanding Counters
Windows NT lets you watch the performance of the system by counting the activity associated with any of its objects. Examples of objects in Windows NT are processors, disk drives, and processes. Each object has specific counters associated with it; for example, the % User Time counter is associated with a CPU or processor to designate what percent of the CPU is taken up by user programs (as opposed to system processes). This chapter gives you enough information to help you choose the right counters at the right time.
SQL Server includes many predefined counters, most of which you arent likely to use except in special cases. It can be difficult to know which counters are the basic ones to watch. If you have chosen the SQL Server Performance Monitor, several counters have been set up as default counters, such as Cache Hit Ratio and User Connections. You can create your own defaults by creating a .pmc file.
The counters are hooks into the operating system and other programs, like SQL Server, that have been built into the software to let Performance Monitor get data. Data collection is performed efficiently so that the additional load on the system is minimized. Windows NT needs most of the information gathered for managing memory, processes, and threads, and Performance Monitor is a good program to display the results.
On the tool bar in Figure 15.1, the button next to the four view buttons at the top of the window is a big plus sign, which you use to add counters to monitor. Click the + button, and the window in Figure 15.2 will appear. The first field, Computer, has a search button at the end of the field. You can click on this field to bring up a list of all computers in your domain and choose a computer from the list, or you can type the name of a server you want to monitor. To monitor other servers, you need Windows NT administrative privileges on them.
In the next field, Object, you choose an object to monitor. The default is Processor, and the default counter shown in the field below is % Processor Time. The box on the right is the Instance. Any particular resource may have more than one instance ; that is, more than one of that particular resource in this case, processors may exist. This computer has only one processor (CPU) because the instance in the box is 0. Instance 3 refers to the fourth CPU.
From the fields along the bottom, you can pick the color, scale, line width, and line style of the information that will be displayed about the counter you are adding. These options let you choose a different look for each counter you add to the window. The only display choice that may need explanation is scale. The scale field is a multiplier that helps you fit the values on the screen in the range you have set on the y-axis, which by default is 0 100.
After you choose the Object, Counter, and Instance you want to monitor and determine how you want the information to appear, click Add. The counter is added at the bottom of the list on the main window (Figure 15.1) and starts graphing the next time your data is refreshed.
If you click the Explain button in Figure 15.2, a brief explanation of the counter you specified will appear (Figure 15.3). Sometimes, though, it uses abbreviations and acronyms that require further research, unless you are a Windows NT internals guru.
Setting Up Alerts
An alert is the warning the computer sends you when a resource such as memory or the network becomes a bottleneck. When an alert occurs, it is written to a log file, along with the date and time it occurred. The log file is a circular file, allowing at most 1,000 entries before it starts overwriting the oldest alerts. The alert can also be written to the Windows NT event log.
To add a new alert, click the second button on the toolbar in Figure 15.1, then click the + button. The dialog box shown in Figure 15.4 will appear. Choose the counters you want to create alerts for, then click Add. The example on the screen will create an alert when the Cache Hit Ratio drops below 85 percent.
Notice the Run Program option in the lower right portion of the screen. You can use it to execute a program when the alert occurs. For example, you can choose SQL Server Log in the Object field, Log Space Used (%) for the Counter, and the database you want to monitor from the Instance list. When the log file for that database gets above 90 percent, you can execute a batch file that runs an ISQL script to dump the transaction log. In this way you can reduce your chances of running out of log space.
Starting Log Files
Learning how to establish log files is very important, because log files are a critical part of the long-term strategy recommended later in this chapter. It can be a bit confusing, so lets go through the steps.
Click the third button on the toolbar View Output Log File Status. Notice that the Log File entry at the top is blank, the status is closed, the file size is zero, and the log interval is 15.00 (seconds).
Click +, and the list of objects shown in Figure 15.5 will appear. Select the ones you want to add to the log and click Add. If you hold down the Ctrl key while selecting, you can choose more than one counter, and holding down Shift lets you highlight all the items in a range. All counters in the objects you pick will be tracked in the log file. We will discuss what to monitor later.
Now we need to specify a log file. From the Options menu, choose Log. The dialog box shown in Figure 15.6 will appear.
This dialog box looks almost like the standard file dialog box, but it has two very important additions. At the bottom of the screen, the Update Time section shows the refresh interval. For short-term tracking, keep it at 15 seconds. For long-term tracking, set it at 300 seconds (5 minutes). The other important difference between this dialog box and the standard file name dialog box is the Start Log button. Nothing happens until you click this button to start collecting data. Once you do, the text of the button will change to Stop Log.
Type a log file name in the File Name box at the top. Then click Start Log.
Click OK to close this dialog box, then minimize the window and let the log run for a while.
Maximize the window and click the Stop Log button. Then switch to the Chart view by clicking the first button on the toolbar.
From the Options menu, choose Data From. Select the log file you named earlier. You can then choose the counters you want to view from the log.
The best part about using log files is that you can view a few counters at a time to avoid overcrowding the window. You can also mix and match the counters you want to analyze at the same time. This feature is important because many of the counters depend on other counters.
Special Note: The log file does not do anything until you click the Start Log button in the Log Options dialog box (also available by choosing Log in the Options menu).
Reports
The fourth button on the toolbar, the Reports button, lets you print customized reports of the data collected in your log file. Experiment with the available reports when you have a chance; we wont cover this option here.
TIME INTERVALS
The default refresh interval for Performance Monitor is one second. Every second, you get new information about your systems performance. This interval is good for a very short-term examination of the system, but it can be a drain on the server. A five-second interval causes much less overhead, probably in the neighborhood of five percent extra activity. However, for long-term monitoring, 5 seconds produces a very large log file.
Setting the interval to 5 minutes creates a reasonable size log file, but this large an interval can mask performance peaks. However, because each entry in the log file stores the minimum, average, and maximum values for each counter, or aspect of SQL Server you want to monitor, you can discover the peaks with a little extra analysis. Five minutes is a good setting for long-term logging. You can always fire up another copy of Performance Monitor and look at one- to five-second intervals if you want a short-term peek into the systems performance.
To determine the amount of drain on the system from Performance Monitor, shut down all the services and start the Monitor again. Add the CPU usage and watch it for about 30 seconds at the default interval of one second. Then change the interval to 0.1 seconds. Your CPU usage will jump dramatically. One odd observation is that the effect of changing from one second to 0.1 seconds is different on different computers, and it is different between Windows NT 4.0 and Windows NT 3.51. For example, when changing the interval on two 133 MHz computers a laptop and a tower box the tower machine has the better performance at the shorter interval, showing about 55 percent utilization, while the laptop shows about 60 percent utilization.
Special Note: The faster your refresh option, the more the drain on the system. The default one-second refresh interval creates less than 5 percent overhead on a single-processor machine. For multiprocessor machines, the overhead is negligible. With the refresh interval set to 0.01 seconds, Performance Monitor takes about 60 percent of the resources. At 10 seconds per refresh, the drain is almost too small to measure, even with a lot of counters turned on.
WHAT TO MONITOR
Now that you know how to use the program, lets get to the section youve been waiting for: How do you know what to monitor? Of the hundreds of Windows NT counters and 50 or so SQL counters, how do you choose? Should you monitor everything? How long should you monitor the system?
Monitoring performance helps you perform two related tasks: identifying bottlenecks and planning for your future hardware and software needs (capacity planning). Learning about the important counters will help identify potential bottlenecks. The strategy section later in this chapter will help you put together a good plan for creating a general monitoring strategy.
What do you want to monitor? Everything! Well, monitoring everything may be a good idea for a short period, but the results will show that many of the counters are always at or near zero; monitoring them all the time may be a waste of time and resources. You need to establish a baseline for your system. This baseline lets you know what results are normal and what results indicate a problem. Once you establish a baseline, you dont need to track everything.
The key categories to monitor can be split into two major sections: Windows NT categories and SQL Server categories. Categories in this sense are groups of objects that contain counters.
Windows NT
Memory
Processor
Disk I/O
Network
SQL Server
Cache
Disk I/O
Log
Locks
Users
Other Predefined Counters
User-Defined Counters
When monitoring both categories of data, look for trends of high and low activity. For example, particular times during the day, certain days of the week, or certain weeks of the month might show more activity than others. After you identify highs and lows, try to redistribute the workload. These peaks and valleys are especially good to know when something new is added to the schedule. If the peak loads are causing problems, identify which things can be scheduled at a later time when the system is not so busy. Knowing the load patterns is also helpful when problems occur, so that you can re-run a particular job or report when the load is low.
Get to know your users find out which reports they need first thing in the morning. Perhaps you can schedule these reports to run at night in a batch mode, instead of having the user starting them during a busy time.
Monitoring Windows NT
The purpose of monitoring the Windows NT categories is to answer one of two questions: What resource is my bottleneck? or Do I see any upward usage trends that tell me what resource I might run low on first? SQL Server 6.5 introduced several highwater markers, such as Max Tempdb space used, which make it easier to identify potential long-term problems.
Critical Challenges of ESI & Email Retention Are you storing too much electronic information? Get expert legal advice and better understanding of what you are required to do as an IT professional.
Rev Up Your IT Know-How with Our Recharged Magazine! The improved Windows IT Pro provides trusted IT content with an enhanced new look and functionality! Get comprehensive coverage of industry topics, expert advice, and real-world solutions—PLUS access to over 10,000 articles online. Order today!
Get It All with Windows IT Pro VIP Stock your IT toolbox with every solution ever printed in Windows IT Pro and SQL Server Magazine plus bonus Web-exclusive content on hot topics. Subscribe to receive the VIP CD and a subscription to your choice of Windows IT Pro or SQL Server Magazine!
Order Your Fundamentals CD Today! Gain an introduction to Exchange, learn server security requirements, and understand how unified communications can play a role in your messaging strategies with this free Exchange CD.