Troubleshooting a Physical RAM Shortage
Detecting or isolating a physical RAM shortage involves several steps. First, measure or evaluate the paging activity of the system. This is done by monitoring the values of the following counters:
Memory: Page Faults/sec displays the average number of page faults per second. This counter includes both soft and hard page faults.
Memory: Pages Input/sec displays the average number of pages per second read from disk. In other words, this counter measures the level of hard fault paging. This counter shows the number of pages read, not the number of disk reads.
Memory: Page Reads/sec displays the average number of times per second the disk is read to resolve hard fault paging. This counter shows only the number of disk reads, not the number of pages read.
By comparing Page Faults with Pages Input, you can determine the proportion of page faults that resulted in disk access. In Figure 2.13, you can see the differences between the Page Faults and Pages Input counters.
The Page Faults plot line often has values over 100. This indicates a high level of page faults. The Pages Input line rarely peaks over 100. These two plot lines intersect when the VMM reads a page from disk, that is, when a hard page fault occurs. The distance between these plot lines represents faults satisfied with soft paging.
By comparing Pages Input with Page Reads, you can determine how many pages are read per disk access. Each time these plot lines intersect, one page is read per disk access. The distance between these plot lines represents multiple page reads per disk access.
The average number of Page Faults per measurement interval was 80; Pages Input, 19; and Page Reads, 6. These averages reveal several important items. Page faults occurred at a high rate, averaging 80 per interval. Approximately 24 percent of those faults caused hard paging (Pages Input/Page Faults). Each disk access resulted in an average transfer of 3 pages (Pages Input/Page Reads). A consistent level of hard paging in excess of 20 percent should be considered abnormal. A consistent Page Reads per second of 5 or greater indicates significant disk access.
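The two ratios above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `paging_ratios` is hypothetical, and the inputs are the averages quoted in the text.

```python
def paging_ratios(page_faults, pages_input, page_reads):
    """Return (hard_fault_fraction, pages_per_disk_read) from average counter values."""
    if page_faults <= 0 or page_reads <= 0:
        raise ValueError("averages must be positive")
    # Share of all page faults that required reading pages from disk.
    hard_fault_fraction = pages_input / page_faults
    # Average number of pages transferred per disk read.
    pages_per_read = pages_input / page_reads
    return hard_fault_fraction, pages_per_read


# Averages taken from the text: Page Faults 80, Pages Input 19, Page Reads 6.
hard, per_read = paging_ratios(page_faults=80, pages_input=19, page_reads=6)
print(f"hard fault share: {hard:.2f}, pages per disk read: {per_read:.2f}")
```

A hard fault share above 0.20 would be flagged as abnormal under the guideline given above.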
Second, compare paging with disk activity. When paging consumes more than 10 percent of disk activity, too much paging is occurring. To compare paging with disk activity, add the following counters to your Chart view:
LogicalDisk: % Disk Read Time: _Total This counter displays the percentage of time the disk was busy processing read requests. Levels greater than 75 percent should come under suspicion.
LogicalDisk: Avg. Disk Read Queue Length: _Total This counter displays the average number of requests waiting to be processed by the disk. A value greater than 2 can indicate a performance problem.
LogicalDisk: Disk Reads/sec: _Total This counter displays the number of disk accesses per second. Most high performance disks can process 40 I/Os per second.
When too little memory is present on a system, the disk system is taxed to process all of the paging requests. High values for % Disk Read Time and Avg. Disk Read Queue Length should be explored. However, comparing Memory: Page Reads/sec with LogicalDisk: Disk Reads/sec should indicate whether the problem lies with RAM or the physical disk. If Memory: Page Reads/sec is greater than 10 percent of LogicalDisk: Disk Reads/sec, then the RAM is the problem.
This same comparison can be made with paging writes as well. The relevant counters to evaluate memory-caused disk writes are:
Memory: Page Writes/sec
Memory: Pages Output/sec
LogicalDisk: Disk Writes/sec
LogicalDisk: Disk Write Bytes/sec
LogicalDisk: Avg. Disk Write Queue Length
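The 10 percent comparison between paging operations and total disk operations can be expressed as a small check that works for reads or writes alike. The sketch below is illustrative only: the function names and the sample values are assumptions, not figures from the text.

```python
def paging_share(page_ops_per_sec, disk_ops_per_sec):
    """Fraction of disk operations attributable to paging (0.0 if the disk is idle)."""
    if disk_ops_per_sec <= 0:
        return 0.0
    return page_ops_per_sec / disk_ops_per_sec


def ram_suspected(page_ops_per_sec, disk_ops_per_sec, threshold=0.10):
    """True when paging exceeds the 10 percent guideline for disk activity."""
    return paging_share(page_ops_per_sec, disk_ops_per_sec) > threshold


# Hypothetical samples: Memory: Page Reads/sec vs. LogicalDisk: Disk Reads/sec.
print(ram_suspected(page_ops_per_sec=6, disk_ops_per_sec=30))
# The same check for writes: Memory: Page Writes/sec vs. LogicalDisk: Disk Writes/sec.
print(ram_suspected(page_ops_per_sec=1, disk_ops_per_sec=25))
```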
Third, monitor the use of the paging file (also called the swap file). On systems with too little RAM, NT expands the paging file to create additional virtual memory. NT can expand the paging file either to the maximum limit defined on the Performance tab of the System applet or to the point where there is no available disk space. The size of the paging file should be monitored using the Process: Page File Bytes counter. If the paging file expands to its maximum capacity, this can indicate too little physical RAM, too small a paging file, or a leaky application that fails to release resources. If your paging file consumes all available and allocated space, you need to eliminate the possibility of a leaky application. This is done by monitoring the Process: Page Faults/sec counter for all active processes. A histogram view can quickly reveal whether any one application is causing the paging.
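What the histogram view shows at a glance can also be computed from exported counter values. The following sketch assumes you have collected Process: Page Faults/sec per process into a dictionary; the function name and the process names and values are hypothetical.

```python
def dominant_pager(faults_by_process):
    """Return (process_name, share_of_total_faults) for the heaviest pager."""
    total = sum(faults_by_process.values())
    if total <= 0:
        return None, 0.0
    name = max(faults_by_process, key=faults_by_process.get)
    return name, faults_by_process[name] / total


# Hypothetical Page Faults/sec readings per process.
samples = {"app.exe": 60, "services.exe": 10, "explorer.exe": 10}
name, share = dominant_pager(samples)
print(name, share)
```

A single process holding most of the faults is a candidate for closer inspection as a possible leaky application; it is not proof of a leak on its own.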
Once you determine that your system doesn't have enough physical RAM, there are several actions you can take to eliminate or reduce the effect of the shortage on system performance. The following list is arranged in a suggested order of approach, within the parameters of reasonable cost, ease of implementation, and manageable effects on the overall system (i.e., when you implement the solution, how much is the whole system changed?):
Increase the maximum size of your swap file.
Split the swap file across multiple fast disks.
Reduce memory usage by limiting applications and services.
Remove or correct leaky applications.
Increase the speed of the RAM (i.e., replace the current RAM with new faster RAM). Remember that RAM operates at the speed of the slowest RAM chip.
Add more physical RAM.
Add a fast disk system: faster drives and/or faster drive controllers.
If your system has sufficient physical RAM, you can observe the symptoms of a low memory system by altering your startup parameters. (This is purely for experimental purposes; if you do modify your startup parameters, be sure to revert your changes to return to normal operations.) By adding the MAXMEM parameter to the appropriate line in the BOOT.INI file, you can limit how much physical RAM NT can see and use. This setting can be used to simulate a system with too little physical RAM without altering the physical configuration of your system. The parameter is /MAXMEM=n, where n is the number of megabytes of memory that NT is allowed to see. Add this parameter after the label on a line under the [operating systems] heading. For example,
[boot loader]
timeout=5
default=multi(0)disk(0)rdisk(0)partition(2)\WINNTW
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINNTW="WINDOWS NT WORKSTATION VERSION 4.00"
multi(0)disk(0)rdisk(0)partition(2)\WINNTW="WINDOWS NT WORKSTATION VERSION 4.00 [VGA MODE]" /basevideo /sos
C:\ = "MS-DOS 6.22"
should be edited to
[boot loader]
timeout=5
default=multi(0)disk(0)rdisk(0)partition(2)\WINNTW
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINNTW="WINDOWS NT WORKSTATION VERSION 4.00, 16 MB" /MAXMEM=16
multi(0)disk(0)rdisk(0)partition(2)\WINNTW="WINDOWS NT WORKSTATION VERSION 4.00 [VGA MODE]" /basevideo /sos
C:\ = "MS-DOS 6.22"
Warning: Make sure that you do not set the value of MAXMEM to less than 8, because NT will fail to boot. We suggest using a value of 32, 16, or 12 to perform the simulation. Be sure to remove this parameter once you have completed the simulation.
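If you run this simulation often, the BOOT.INI edit can be scripted. The sketch below is a hypothetical helper, not a supplied tool: it works on the file's text, appends /MAXMEM=n to the first entry under [operating systems], and refuses values below 8 in line with the warning above. Unlike the hand-edited example, it does not change the menu label.

```python
def add_maxmem(boot_ini_text, megabytes, minimum=8):
    """Append /MAXMEM=n to the first entry under [operating systems]."""
    if megabytes < minimum:
        raise ValueError(f"MAXMEM below {minimum} MB would keep NT from booting")
    lines = boot_ini_text.splitlines()
    in_os_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [operating systems] section.
            in_os_section = stripped.lower() == "[operating systems]"
            continue
        if in_os_section and stripped:
            # Only the first entry is modified, and only if it has no /MAXMEM yet.
            if "/maxmem" not in stripped.lower():
                lines[i] = f"{line.rstrip()} /MAXMEM={megabytes}"
            break
    return "\n".join(lines)


sample = r"""[boot loader]
timeout=5
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINNTW="WINDOWS NT WORKSTATION VERSION 4.00"
C:\ = "MS-DOS 6.22"
"""
edited = add_maxmem(sample, 16)
print(edited)
```

Work on a copy of BOOT.INI and keep the original so the change is easy to undo after the simulation.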
Processor Bottlenecks
The CPU is the central component of a computer system. Almost every transmission of data occurs through the CPU. Thus, it is essential to identify and solve processor-related bottlenecks quickly. Before you can determine whether you have a processor bottleneck, you need to eliminate the possibilities of memory, application, and disk problems.
In most cases, a processor bottleneck is identified by one of the following counters:
Processor: % Processor Time indicates the amount of time the CPU spends on non-idle work. It's common for this counter to reach 100 percent during application launches or kernel-intensive operations, such as Security Accounts Manager (SAM) synchronization. But if this counter remains above 90 percent for an extended period, you should suspect a CPU bottleneck. This level of activity indicates that the system is performing work 90 percent of the time and does not have much capacity for additional work.
System: % Total Processor Time applies to multiprocessor systems only. This counter should be used the same way as the single CPU counter: If any value remains consistently higher than 90 percent, at least one of your CPUs is a bottleneck.
System: Processor Queue Length indicates the number of threads waiting for processor time. A sustained value of 2 or higher for this counter indicates processor congestion. Note that this counter is a snapshot at the time of measurement, not an average value over time.
A processor bottleneck occurs when the CPU is so busy that it can't respond to new requests for computing cycles. While high CPU utilization can indicate a bottleneck, more revealing indicators are the queue length and poor interface response times.
Once a CPU bottleneck has been determined, it can be eliminated or its effects on system performance can be reduced with the following actions. The following list is arranged in a suggested order of approach, within the parameters of reasonable cost, ease of implementation, and manageable effects on the overall system (i.e., when you implement the solution, how much is the whole system changed?):
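The thresholds above (processor time above 90 percent, queue length of 2 or higher) can be applied to a series of logged samples. In this sketch, "sustained" is taken to mean that at least 80 percent of the samples cross the threshold; that fraction, the function names, and the sample data are all assumptions chosen for illustration.

```python
def sustained(samples, predicate, fraction=0.8):
    """True when at least `fraction` of the samples satisfy the predicate."""
    if not samples:
        return False
    hits = sum(1 for value in samples if predicate(value))
    return hits / len(samples) >= fraction


def cpu_indicators(cpu_percent, queue_lengths, fraction=0.8):
    """Evaluate the two processor guidelines against logged samples."""
    return {
        "high_utilization": sustained(cpu_percent, lambda v: v > 90, fraction),
        "queue_congestion": sustained(queue_lengths, lambda v: v >= 2, fraction),
    }


# Hypothetical samples of % Processor Time and Processor Queue Length.
cpu = [95, 97, 92, 88, 99]
queue = [3, 2, 4, 1, 2]
result = cpu_indicators(cpu, queue)
print(result)
```

Because Processor Queue Length is a snapshot rather than an average, checking it across many samples gives a fairer picture than any single reading.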
Remove all graphic-intensive screen savers.
Transfer CPU-intensive applications to other servers.
Alter the execution priorities for nonkernel processes.
Add more L2 or secondary cache.
Upgrade older motherboards.
Replace the current CPU with a faster CPU. Remember that you may have to replace your motherboard and memory to upgrade your CPU.
Upgrade network and disk controller cards to 16-, 32-, and 64-bit PCI models with bus mastering instead of programmed I/O. Avoid ISA cards if at all possible, especially for heavily taxed disk and network controllers.
Add a second CPU. Keep in mind that on most systems additional CPUs offer a diminishing return on performance improvement.
Tip: Before rushing to correct a CPU bottleneck, try to identify any processes that could be causing CPU constriction. In some cases, a faulty application can cause performance problems that have symptoms like those of a true CPU bottleneck. Correcting an application problem is often easier and less expensive than replacing a CPU. Process/application inspection requires monitoring the performance of individual processes and comparing them to your system's baseline.