Last Known Good
So what is Last Known Good? As I mentioned above, it is a copy of the HKLM\System\CurrentControlSet key and all of its subkeys and values. This part of the Registry defines device drivers, installed services, and system configuration information, and modifications here are a typical cause of failures. A typical system contains two or more ControlSets, which are copies of past configurations of the system. Another key, the Select key under HKLM\System, is also of interest with respect to ControlSets. The Select key tells Win2K which ControlSet it recognizes as the current one, which is the Last Known Good one, and which, if any, was the last failed configuration. Figure 14 shows an example of this on a typical system.
There are two ControlSets on this system, 001 and 002, and both are listed along with CurrentControlSet. The Select key indicates that 001 is the current one; indeed, the key listed as CurrentControlSet is just a symbolic link to ControlSet001. When changes are made to a running system, they are made to CurrentControlSet, which is ControlSet001. The Select key also indicates that ControlSet002 is the Last Known Good. This means that it is the last ControlSet, with a configuration different from the current one, that booted successfully. When you choose Use Last Known Good from one of the boot menus, you actually tell the Select key to set its Current value to the same number as LastKnownGood.
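To make the relationship between the Select values and the ControlSet keys concrete, here is a small Python model. It is a sketch only: the Select key is represented as a plain dictionary using the value names discussed here (Current, LastKnownGood, Failed), the function names are hypothetical, and nothing in it reads the real Registry. It assumes, as the text describes, that choosing Last Known Good simply makes Current equal to LastKnownGood.

```python
# Hypothetical model of HKLM\System\Select; no real Registry access.

def control_set_name(number: int) -> str:
    """Turn a Select value such as 1 into its key name, e.g. 'ControlSet001'."""
    return f"ControlSet{number:03d}"


def resolve_current(select: dict) -> str:
    """Return the ControlSet that CurrentControlSet points to."""
    return control_set_name(select["Current"])


def use_last_known_good(select: dict) -> dict:
    """Model choosing Last Known Good: Current takes the LastKnownGood number."""
    updated = dict(select)  # leave the caller's dictionary unchanged
    updated["Current"] = select["LastKnownGood"]
    return updated


# The system described above: 001 is current, 002 is Last Known Good, nothing failed.
select = {"Current": 1, "LastKnownGood": 2, "Failed": 0}
print(resolve_current(select))                       # ControlSet001
print(resolve_current(use_last_known_good(select)))  # ControlSet002
```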
Notice also that there is a Failed value listed here. It is set to 0, which means there have been no failed configurations. If I had made a change to a key under CurrentControlSet that caused the system to crash, Win2K would have copied that ControlSet to a new number (e.g., ControlSet003) and listed that in the Failed value. A crash is not the only way that a control set gets marked as failed. If a device driver fails to initialize during system startup, the CurrentControlSet is marked failed and, depending upon the error the driver has returned, the system either blue screens or reboots and automatically uses the LastKnownGood configuration.
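The failure handling described here can also be modeled in a few lines of Python. This sketch follows the text's description of the reboot case: the failing set is copied to a new, unused number, that number is recorded in Failed, and the system falls back to LastKnownGood. The nested dictionaries, the "first unused number" rule, the driver name, and the function name are simplifying assumptions for illustration, not Win2K's actual algorithm.

```python
import copy


def mark_failed_and_fall_back(control_sets: dict, select: dict) -> int:
    """Record the current control set as failed and switch to Last Known Good.

    control_sets maps numbers (1 -> ControlSet001) to nested dicts of keys/values.
    Returns the number the failed configuration was copied to.
    """
    failed_number = max(control_sets) + 1  # assumption: first unused number
    control_sets[failed_number] = copy.deepcopy(control_sets[select["Current"]])
    select["Failed"] = failed_number
    select["Current"] = select["LastKnownGood"]  # the automatic fallback
    return failed_number


# A hypothetical driver change in ControlSet001 stops the system from starting.
control_sets = {
    1: {"Services": {"NewDriver": {"Start": 1}}},
    2: {"Services": {}},
}
select = {"Current": 1, "LastKnownGood": 2, "Failed": 0}
print(mark_failed_and_fall_back(control_sets, select))  # 3
print(select)  # {'Current': 2, 'LastKnownGood': 2, 'Failed': 3}
```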
One key that was present on NT 4.0 and earlier versions but that you won't find on Win2K systems is the Clone key, previously located under System. The Clone key was yet another copy of CurrentControlSet, created by the kernel at system initialization time. It has been eliminated in Win2K, most likely because its role was redundant.
The role of ControlSets is an important one, but it is also important to remember that it is only really useful for recovering configuration changes made under this specific set of keys. Changes made to HKEY_CURRENT_USER or HKEY_CLASSES_ROOT are not rolled back using Last Known Good. However, because it is primarily kernel-mode device drivers or errant services that cause a system to crash, errors in these other keys can usually be recovered in different ways, as discussed in subsequent sections.
Safe Mode
Having reviewed the Last Known Good options, let's look at a new feature in Win2K: safe mode. When you boot your Win2K system, at the point where the boot menu appears, you see a message at the bottom of the screen indicating that you can access the "Troubleshooting and Advanced Startup Options" menu by pressing F8. Once you do that, you see a menu of advanced startup choices.
For the purposes of our discussion, I talk about the three safe mode options: Safe Mode, Safe Mode with Networking, and Safe Mode with Command Prompt. As with Win95/98, each of the safe modes enables or disables a certain set of functions. You can use safe mode to help troubleshoot problem services or drivers. From a Registry perspective, there are few differences among the safe modes: in all three, all of the Registry hives are loaded and active. Safe Mode and Safe Mode with Command Prompt are generally the same. Both start four Win2K services:
Eventlog
Logical Disk Manager
Plug and Play
Remote Procedure Call (RPC)
In Safe Mode, Plug and Play detection is enabled to let Win2K properly enumerate installed devices. However, although it may detect new devices that have been added since the last boot, those devices won't be installed unless that class of devices has started. Both Safe Mode and Safe Mode with Command Prompt disable all network bindings, so any attempt to access the network fails. The two differ in their interface: Safe Mode loads a copy of the Explorer shell as the primary interface, whereas Safe Mode with Command Prompt loads only a command shell, with no Explorer shell. You can still run GUI applications such as Control Panel, but you must start them from the command line. Figure 15 shows Win2K running in Safe Mode.
In Safe Mode with Networking, the network bindings are enabled and the following services are started:
Computer Browser
DHCP Client
DNS Client
Eventlog
Logical Disk Manager
Messenger
Plug and Play
RPC
Server
TCP/IP NetBIOS Helper Service
Workstation
In Safe Mode with Networking, you also get the Explorer shell as the default interface. And, as in the other safe modes, all Registry hives are loaded. Additionally, all safe modes load VGA video drivers only, so you operate in 640x480, 16-color mode while in Safe Mode.
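Because the two service lists overlap, writing them as Python sets is a quick way to see exactly what Safe Mode with Networking adds. The names are copied from the lists above, with "Remote Procedure Call (RPC)" shortened to "RPC" so the two lists match. The sets are only a comparison aid; they are not read from the system.

```python
# Services started in each mode, as listed in the text.
SAFE_MODE = {"Eventlog", "Logical Disk Manager", "Plug and Play", "RPC"}

SAFE_MODE_WITH_NETWORKING = {
    "Computer Browser", "DHCP Client", "DNS Client", "Eventlog",
    "Logical Disk Manager", "Messenger", "Plug and Play", "RPC",
    "Server", "TCP/IP NetBIOS Helper Service", "Workstation",
}

# Every Safe Mode service is also started when networking is enabled...
assert SAFE_MODE <= SAFE_MODE_WITH_NETWORKING

# ...so the difference is exactly what networking support brings in.
networking_extras = sorted(SAFE_MODE_WITH_NETWORKING - SAFE_MODE)
for name in networking_extras:
    print(name)
```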
Safe Mode is a good choice when you have conflicting devices or services that require a stripped-down environment to resolve the conflict. In some cases, you may find Registry problems that benefit from these modes. More often than not, however, the Last Known Good option takes you where you need to go without entering Safe Mode.
Tip: You won't be able to boot into Safe Mode on a Win2K domain controller, because the Netlogon service must be running, and it depends on the network stack being operational.
Dealing with Registry Corruption
Registry corruption is, fortunately, a fairly infrequent phenomenon. In Win2K, Microsoft has taken steps to reduce its frequency in the thorniest area: corruption of ntuser.dat within the user profile. This hive has suffered many problems in the past because of its pattern of use: it is loaded into HKEY_CURRENT_USER at logon and written back to disk at logoff. If users power down their machines while ntuser.dat is open and being written to disk, for example, HKCU corruption can occur.
The problem could become especially severe if the user's profile was a roaming profile. Roaming profiles reside on a server and are downloaded to the "c:\documents and settings" folder locally at logon time. This lets users roam from machine to machine and have their own applications and settings at each machine. In NT 4.0, the profile was often corrupted if users powered off their machines instead of waiting for the logoff process to complete.
How do you know whether ntuser.dat is corrupt? Users who have corruption problems usually find out first: at logon time, they get a message indicating that their profile couldn't be loaded. There could be several causes, however. For example, if you use roaming profiles and the server-based version of a user's profile is unavailable, the user also sees this message.
If that is not the case, yet the user's profile still doesn't load, Win2K has some new ways of dealing with it. Specifically, if a user's profile is corrupted or the central profile is unavailable, the current user profile is copied to a backup directory and a new temporary profile is created from "c:\documents and settings\default user" (Figure 16).
In this figure, you see a directory under "c:\documents and settings" called jsmith.bak. The user jsmith encountered a corrupt ntuser.dat hive file at logon. His whole user profile directory was copied from jsmith to jsmith.bak, and a new temporary profile called TEMP was created from the Default User profile. If, by the next logon, the corruption in jsmith's ntuser.dat has been fixed, TEMP is discarded and jsmith.bak is renamed jsmith.
How do you actually find out, in the example above, whether jsmith's ntuser.dat file is corrupt? Here is a quick test. From Regedt32, you can load the suspected ntuser.dat hive file into a temporary key (as described in "Backing Up and Securing the Registry and Manipulating Hives and Keys," the fourth installment in this series). If all is well with the hive file, the load completes successfully; if the file is corrupt, the load will most likely fail. Figure 17 shows the message you get when you try to load a corrupted hive file.
Once you have discovered a corrupted hive file, there are few things you can do to recover it. If it can be loaded into a temporary key, you can use Regedt32's Save Key function to save off keys of interest, which can later be restored to a new ntuser.dat file. However, this assumes you can actually get to the keys. The best defense against corrupted ntuser.dat files is to keep backups available. For example, if you use roaming profiles, make sure you back up your profile servers each night so that a fairly recent profile is always available.
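The directory changes in this example can be sketched with ordinary file operations. This Python model only mirrors the steps described above (the profile moved aside to jsmith.bak, TEMP built from Default User, then the reverse once the hive is fixed); it is not how Win2K's profile code is implemented, the function names are hypothetical, and the move aside is modeled as a rename because the original name is reused later. It runs only against a scratch directory.

```python
import shutil
import tempfile
from pathlib import Path


def fall_back_to_temp(profiles: Path, user: str) -> Path:
    """Set <user> aside as <user>.bak and build a TEMP profile from Default User."""
    (profiles / user).rename(profiles / f"{user}.bak")
    temp = profiles / "TEMP"
    shutil.copytree(profiles / "Default User", temp)
    return temp


def restore_after_fix(profiles: Path, user: str) -> None:
    """Once the hive is repaired: discard TEMP and rename <user>.bak back to <user>."""
    shutil.rmtree(profiles / "TEMP")
    (profiles / f"{user}.bak").rename(profiles / user)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("Default User", "jsmith"):
        (root / name).mkdir()
        (root / name / "ntuser.dat").write_bytes(b"hive data for " + name.encode())
    fall_back_to_temp(root, "jsmith")
    after_fallback = sorted(p.name for p in root.iterdir())
    restore_after_fix(root, "jsmith")
    after_restore = sorted(p.name for p in root.iterdir())

print(after_fallback)  # ['Default User', 'TEMP', 'jsmith.bak']
print(after_restore)   # ['Default User', 'jsmith']
```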
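Loading the hive in Regedt32 remains the real test, but when you are handling many profiles, a rough pre-check on the file header can flag obviously damaged files. The Python sketch below relies on details of the hive file format that come from publicly available (largely reverse-engineered) format documentation rather than from this article: a hive begins with a 4 KB base block whose first four bytes are the signature regf, followed by two little-endian 32-bit sequence numbers that match when the last write to the file completed. The function names are hypothetical. Treat a clean result as "not obviously broken," not as proof that the hive will load.

```python
import struct

BASE_BLOCK_SIZE = 4096  # size of the hive header ("base block")


def hive_header_problems(header: bytes) -> list:
    """Return a list of problems spotted in a hive's base block (empty if none)."""
    if len(header) < BASE_BLOCK_SIZE:
        return ["file is shorter than a 4 KB base block"]
    problems = []
    if header[:4] != b"regf":
        problems.append("missing 'regf' signature")
    primary, secondary = struct.unpack_from("<II", header, 4)
    if primary != secondary:
        problems.append("sequence numbers differ; the last write may not have finished")
    return problems


def check_hive_file(path: str) -> list:
    """Read the first 4 KB of a hive file, e.g. a copy of ntuser.dat, and check it."""
    with open(path, "rb") as f:
        return hive_header_problems(f.read(BASE_BLOCK_SIZE))


# Self-contained demonstration with a synthetic header instead of a real file.
header = bytearray(BASE_BLOCK_SIZE)
header[:4] = b"regf"
struct.pack_into("<II", header, 4, 7, 7)
print(hive_header_problems(bytes(header)))  # []
struct.pack_into("<II", header, 4, 8, 7)
print(hive_header_problems(bytes(header)))
```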
The effects of corruption within one of the hive files other than ntuser.dat (e.g., system, software) are likely to be more severe: your system probably won't boot, or, if it does, the applications that depend on the Registry fail. In those cases, you have a few options. In "Viewing and Manipulating the Registry" and "Backing Up and Securing the Registry and Manipulating Hives and Keys," I discussed a few of the available Registry backup and restore tools. You can use a tool such as Regrest from the Resource Kit to restore a hive file from a Regback backup, but that assumes you have run Regback frequently. (The same holds true for backups made with NTBackup or a third-party backup product.)
Fortunately, in Win2K, we no longer have to worry about backing up and restoring the SAM and Security hives (at least on domain controllers) because this information is now kept in the Active Directory database files (which, of course, still need to be backed up).
Let's assume that you don't have backups of the hive files to restore. For the remaining hive files (software, system, and default), you can choose among several paths, summarized here:
Boot from the Win2K CD or floppies and select the Repair option. This gives you the opportunity to repair system files, but it restores the hives only from %systemroot%\repair or from the original copies on the distribution media, so all subsequent changes you have made are lost.
Perform a second Win2K installation on the system, into a different directory from the first (e.g., winnt2 instead of winnt). Once you have completed this second install, you can get to the %systemroot%\system32\config directory of the original installation. You can try using Regedt32 to load the suspect hive into a temporary key. If it loads, you might be able to modify the errant key to fix the problem. If not, you can copy one of the .sav files in this directory over the original hive file. For example, if you have a problem in System, copy system.sav to System and reboot into the original installation.
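The file-level part of that last step is easy to script from the second installation, where the original installation's hives are not in use. This Python sketch copies system.sav over system in the original config directory, but first sets the damaged file aside so it can still be examined; that extra copy, its .corrupt suffix, and the function name are assumptions of this sketch, not part of the procedure above. Windows file names are not case-sensitive, so system and System name the same file there.

```python
import shutil
import tempfile
from pathlib import Path


def restore_hive_from_sav(config_dir: Path, hive: str = "system") -> Path:
    """Replace <hive> with <hive>.sav, keeping the damaged file as <hive>.corrupt."""
    damaged = config_dir / hive
    saved = config_dir / f"{hive}.sav"
    if not saved.is_file():
        raise FileNotFoundError(f"no {saved.name} in {config_dir}")
    set_aside = config_dir / f"{hive}.corrupt"  # hypothetical name for the kept copy
    shutil.copy2(damaged, set_aside)  # keep the damaged hive before overwriting it
    shutil.copy2(saved, damaged)      # the .sav copy becomes the live hive file
    return set_aside


# Example against a scratch directory standing in for the original config folder.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp)
    (config / "system").write_bytes(b"damaged hive")
    (config / "system.sav").write_bytes(b"older good hive")
    kept = restore_hive_from_sav(config)
    live_contents = (config / "system").read_bytes()
    kept_contents = kept.read_bytes()

print(live_contents)  # b'older good hive'
```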
Use a third-party tool such as ERD Commander from Winternals Software (www.winternals.com). This tool lets you boot from floppies into a Win2K system and access NTFS partitions using familiar DOS commands, such as copy and delete. From there, you can copy healthy hive files over the corrupted ones and reboot into your Win2K system.
Use the new Recovery Console feature available in Win2K. This utility works much like Winternals Software's ERD Commander, but it is included with Win2K. It must be installed manually on a server or workstation: from your distribution CD, go to the i386 folder and run the Win2K setup program (winnt32) with the /cmdcons parameter (winnt32 /cmdcons). You need to reboot after completing the installation. When you do, you see a new Recovery Console option on the boot menu (defined in boot.ini). When you choose this option, you are asked to authenticate yourself to the system as the local administrator. Once authenticated, you have access to the local file system (subject to security restrictions), as well as to floppy drives and CD-ROMs. From here, you can access files such as Registry hives and replace them as needed. Type Help at the command prompt to see which commands are available in this mode.
The problem with Registry corruption is that it is usually an all-or-nothing proposition. If a hive file is corrupted, there is no good way to repair just the corrupted parts. You end up having to replace the whole hive file. This underscores the need for frequent Registry backups. Every time you install a new application, new information is added to the Registry that is lost if the hive file is corrupted. The easiest way to avoid that is to schedule regular backups of the Registry on any system that matters to you.
Troubleshooting Suspected Registry Problems
Now that we have examined some of the tools you can use to monitor and clean up Registry problems, let's finish by looking at some best practices for tackling them. In this section, I describe how some common problems present themselves and which methods can solve them:
Problem:
You have a problem that occurs on every system, regardless of which user is logged on.
Solution:
The problem is likely to be within HKLM. Try using the RegMon utility to monitor the system during the particular operation. Keep an eye out for "Access Denied" or "Not Found" errors that might indicate problems. Also, try booting in one of the safe modes, or use Last Known Good to narrow the problem.
Problem:
You have a problem that follows the user rather than the machine.
Solution:
The problem could be in HKCU. Try the following: Have the user log off the workstation. Make a backup of the user's profile (the entire contents of "c:\documents and settings\%username%"), then delete it from "c:\documents and settings."
If the user has a roaming profile, delete it or move it off the server. Have the user log on, at which point the user gets a default profile from "c:\documents and settings\default user." If the problem goes away, the user may have a problem with ntuser.dat. (Use one of the methods described previously for handling ntuser.dat corruption.)
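The backup-then-delete step for a local profile can be scripted so that the delete never happens without a complete copy. This Python sketch is an illustration under assumptions: the function name and the timestamped backup folder are hypothetical, it must run only after the user has logged off (otherwise ntuser.dat is in use), and any roaming copy on the server still has to be handled separately.

```python
import shutil
import tempfile
import time
from pathlib import Path


def set_profile_aside(profiles: Path, user: str, backup_root: Path) -> Path:
    """Copy a user's whole profile folder to backup_root, then delete the original."""
    source = profiles / user
    destination = backup_root / f"{user}-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree raises on any failure, so rmtree below only runs after a full copy.
    shutil.copytree(source, destination)
    shutil.rmtree(source)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    profiles = Path(tmp, "Documents and Settings")
    (profiles / "jsmith").mkdir(parents=True)
    (profiles / "jsmith" / "ntuser.dat").write_bytes(b"suspect hive")
    backup = set_profile_aside(profiles, "jsmith", Path(tmp, "profile-backups"))
    profile_gone = not (profiles / "jsmith").exists()
    backed_up = (backup / "ntuser.dat").read_bytes()

print(profile_gone, backed_up)  # True b'suspect hive'
```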
Problem:
You get application failures for no apparent reason.
Solution:
Try using RegMon during application launch or at the point where the application function fails. Use the filter feature to show only the application's process. Look in RegMon for Access Denied results, which would indicate Registry permission problems. If you see a series of "Not Found" results on a particular set of keys or values, Registry entries could be missing.
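When a RegMon trace is long, filtering a saved log can speed up this search. The Python sketch below assumes the log was saved as tab-separated text with RegMon's display columns in order (sequence, time, process, request, path, result, other) and a process column of the form name:PID; check both assumptions against your own saved log before relying on it. The function name, process names, and sample entries are hypothetical.

```python
import csv
from collections import Counter
from io import StringIO

# Assumed column order of a saved RegMon log (verify against your own file).
COLUMNS = ["seq", "time", "process", "request", "path", "result", "other"]


def registry_trouble(log_text: str, process_name: str):
    """Collect ACCESS DENIED paths and NOT FOUND counts for one process."""
    denied = []
    not_found = Counter()
    rows = csv.DictReader(StringIO(log_text), fieldnames=COLUMNS, delimiter="\t")
    for row in rows:
        # "app.exe:1200" -> "app.exe"; missing fields come back as None.
        name = (row["process"] or "").split(":")[0]
        if name.lower() != process_name.lower():
            continue
        result = (row["result"] or "").strip().upper()
        if result == "ACCESS DENIED":
            denied.append(row["path"])
        elif result == "NOT FOUND":
            not_found[row["path"]] += 1
    return denied, not_found


sample = (
    "1\t0.001\tapp.exe:1200\tOpenKey\tHKLM\\Software\\Vendor\\App\tNOT FOUND\t\n"
    "2\t0.002\tapp.exe:1200\tOpenKey\tHKLM\\Software\\Vendor\\App\tNOT FOUND\t\n"
    "3\t0.003\tapp.exe:1200\tSetValue\tHKCU\\Software\\Vendor\\App\tACCESS DENIED\t\n"
    "4\t0.004\texplorer.exe:880\tQueryValue\tHKCU\\Software\\Other\tSUCCESS\t0x1\n"
)
denied, not_found = registry_trouble(sample, "app.exe")
print(denied)                    # ['HKCU\\Software\\Vendor\\App']
print(not_found.most_common(1))  # [('HKLM\\Software\\Vendor\\App', 2)]
```

A run of NOT FOUND hits on the same path, as in the sample, is the pattern the text describes as a likely sign of missing entries.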
Problem:
Many users are affected by a problem at the same time.
Solution:
There are many reasons why this might be the case, and not all of them are Registry related. Make sure it is not a Group Policy issue: Group Policies are typically distributed to many users at logon or machine startup and periodically thereafter, so they could cause this kind of widespread problem. If you have recently installed a new Administrative Template, check to ensure that it is not the source of the problem. Also look for any other changes made recently to all systems.
Another common problem is Registry security, particularly on the HKCU hive. You may recall that the Permitted to Use feature within user profiles sets security on the HKCU Registry hive. Users must be able to access and write to this subtree to log on and change their environment. If a set of profiles has been assigned to a security group instead of to individual users, make sure that that group still exists. If not, the affected users lose all rights to their profiles.
Warning: Never implement any systemwide change without first testing it in a representative environment. This is especially true now with Win2K, because tools such as GPOs give you widespread control over many users and machines. If you plan to add new administrative templates, software installations, or Registry security configurations to a GPO, test the GPO on a representative sample first.
Tip: If you plan to deploy a new GPO, I recommend that you create a test OU that contains a number of machines and users. Deploy your new GPO there first, and check out its effect. When you are satisfied that it doesn't break anything, you can move it over to a production OU or domain.
Summary
In this chapter, I have discussed tools available for monitoring Registry activity, which is usually the first step in being able to successfully troubleshoot a problem. The RegMon utility stands out as a great way to interactively monitor what is going on in your Registry. I have considered ways of using the Windows Installer tools to log problems with application deployment, which can often affect the Registry in areas like HKCR and HKLM\Software. I have examined features such as Last Known Good and Safe Mode to see how they can help recover corrupt or errant Registry hives. Finally, I have presented a set of common problems and solutions to use as a starting point for troubleshooting Registry problems.