Increasing Backup/Recovery Performance
This is often much easier said than done. Depending on how you back up your Exchange server, you may be able to apply technology and better design principles toward this goal. Many Exchange deployments I come in contact with have limits on backup/restore performance that are imposed by either their technology choice or the architecture that technology operates within. If you have a single DAT tape drive attached to every Exchange server in your deployment, you are going to get a "ballpark estimate" maximum backup rate of about 1GB to 2GB per hour. For DLT, that number increases to about 10GB to 15GB per hour. For a database that is 50GB, the DAT device will never deliver the needed performance (~25 hours to back up and ~50 hours to restore). For the DLT tape drive, you may have acceptable backup rates (~3 to 4 hours), but the restore rates would be barely adequate (~5 to 6 hours).
From an architecture viewpoint, you may have high-speed devices available but perform your backup and restore operations over a network instead of on locally attached devices. In this case, the bottleneck could be the network, and a dedicated backup network backbone would be required.
When attempting to reduce disaster recovery windows with performance enhancements, you will have to evaluate the relative cost versus performance trade-offs. While an 8-drive DLT array locally attached to every Exchange server may yield the ultimate in performance (Compaq has tested as high as 70GB per hour on backup), this solution is not cost-effective for a large deployment. However, in this example, you may be able to deploy several backup servers with DLT arrays or library devices and back up your Exchange servers via a dedicated disaster recovery "backbone" network. This may strike a better price/performance balance while still accomplishing the goal of increased backup and restore performance. You can look to other areas to increase disaster recovery performance as well.
The disk subsystem on your Exchange server is another key area that will impact backup and restore performance. Upgrades to this important server subsystem can ensure that backup and/or restore performance is optimized. There are many areas to look at when attempting to increase the performance of your disaster recovery operations for Exchange. The important point is to focus on where the bottlenecks and limitations exist in your current strategy. By identifying where these bottlenecks exist, you can make sound, cost-effective decisions about which areas make the best sense for further enhancement and investment.
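The ballpark arithmetic behind the tape figures quoted earlier in this section can be sketched in a few lines of Python. This is only a rough planning aid, not a measurement tool: the class and function names are hypothetical, the rates are the upper-end estimates from the text, and the restore penalty of 2.0 is an assumption inferred from the DAT example, where the restore takes about twice as long as the backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeDevice:
    name: str
    backup_gb_per_hour: float      # sustained backup rate, a rough planning figure
    restore_penalty: float = 2.0   # assumption: a restore takes about twice as long


def backup_hours(database_gb: float, device: TapeDevice) -> float:
    """Hours needed to back up a database of the given size."""
    return database_gb / device.backup_gb_per_hour


def restore_hours(database_gb: float, device: TapeDevice) -> float:
    """Hours needed to restore, scaled by the assumed restore penalty."""
    return backup_hours(database_gb, device) * device.restore_penalty


def fits_window(database_gb: float, device: TapeDevice, window_hours: float) -> bool:
    """True if the estimated restore completes within the recovery window."""
    return restore_hours(database_gb, device) <= window_hours


# Upper-end rates from the text: DAT about 2GB per hour, DLT about 15GB per hour.
dat = TapeDevice("DAT", backup_gb_per_hour=2.0)
dlt = TapeDevice("DLT", backup_gb_per_hour=15.0)

for device in (dat, dlt):
    print(f"{device.name}: backup ~{backup_hours(50, device):.1f} h, "
          f"restore ~{restore_hours(50, device):.1f} h")
```

For the 50GB database and the DAT drive, this reproduces the 25-hour backup and 50-hour restore estimates. The DLT restore estimate comes out somewhat higher than the figure quoted in the text, which is a reminder that the restore penalty is a coarse assumption and should be replaced with rates measured in your own environment.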
Using Alternative Technologies
In Chapter 6, I will discuss storage technology and the various features that can be leveraged to increase reliability for Exchange deployments. Technologies such as business continuance volumes (BCVs) and data replication can add to existing disaster recovery techniques and measures and can provide alternative recovery options. As an example, BCV technology can provide another medium that holds Exchange data in addition to the data that exists as part of your regular online backup (Figure 4). In a scenario in which BCVs are used, they could function as a backup volume (from which backups are performed) or could be used as a rapid-recovery measure in the event of database corruption or data loss. Since many of these alternative technologies are new, there are many caveats. I suggest that these technologies can provide some answers to the challenges of Exchange disaster recovery. However, I do not recommend that these options be used in lieu of established and Microsoft-supported measures. For example, I would not use BCV technology as a replacement for regular online backups. I do believe, however, that these technologies can be an important complement to existing practices and methods. In later releases of Exchange 2000, and as these technologies mature, I expect to see many of them used regularly as the primary means of increasing backup and restore performance as well as functionality. In the meantime, approach with caution and stay tuned for Chapter 6, where I discuss these technologies in more detail.
By shrinking the amount of time it takes to accomplish disaster recovery, we can scale our Exchange deployments to larger user populations and data sets. There are several ways to approach this challenge. Reducing data, increasing performance, and leveraging alternative technologies are among the leading strategies. Whatever your approach, seek to identify ways to accomplish this and thereby enable your Exchange deployment to meet the ever-growing service-level requirements of mission-critical systems.
BEST PRACTICE #7: DEVELOP AN EXCHANGE DISASTER RECOVERY PLAN
A solid backup plan should be in place when moving from a development or test environment to a production one. In the same way as you would never deploy untested software, the production phase of a project should never be entered into without a solid backup and restore plan. Best practices for Exchange deployments mandate the development of a backup plan as one of the essential steps in the design of a Microsoft Exchange infrastructure. In addition to protecting data as thoroughly as possible, the backup plan must reflect organizational requirements. Some of these requirements can be a driving factor in the hardware solution design. In the course of developing a backup plan, a series of questions should be addressed:
How often should the backups be performed?
What information should be backed up?
Which medium will be used (tape, disk, or BCV)?
What level of automation is required (unattended)?
How can it be ensured that any problems that occur are trapped, reported, and resolved?
What should be the retention, rotation, and archival policy?
In case of failure, how long will the restore operation take?
Is there a mechanism that verifies that the backups are good?
How are the responsibilities defined to ensure that the backup operations are running smoothly and according to plan?
Are all the procedures documented thoroughly in such a manner that any member of the technical staff (including temporary staff) is able to perform backup and restore operations should the need arise?
These questions are just samples that may or may not apply to your environment. However, I strongly encourage you to consider them during the planning and design phases of each significant Exchange disaster recovery planning project. Any plan should also cover the entire system, not just Exchange. While you as an Exchange system manager may not have ultimate responsibility or accountability for all aspects of the system recovery (such as the Active Directory or Windows 2000), your plan needs to take the complete system into account. For Exchange 2000, this includes Windows 2000 and Exchange 2000 but must also look at other components such as Internet Information Server (IIS) and third-party components. The disaster recovery plan must also consider each possible disaster scenario, from the most minor incident (individual item recovery such as a message) to the most catastrophic event (such as fire, flood, theft, or malicious activity). The plan must address backup procedures and methods, archival, tape management, rotation, personnel training requirements, and any resource requirements and staffing roles required. Once the plan has been developed, it must be thoroughly tested to ensure that the methods and procedures are valid for your environment. Don't rely on common MIS-generic disaster recovery planning for Exchange. Exchange is a specialized application that requires individual attention and a separate plan. The care that you take in developing and validating your disaster recovery plan for Exchange will have a direct correlation to how well the plan is executed and the level of availability you are able to achieve for your Exchange deployment.
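One way to keep the planning questions listed above from being overlooked is to record the answers in a simple structure and report which ones are still open. The following Python sketch is purely illustrative: the `BackupPlan` class, its field names, and the `open_questions` helper are hypothetical, with one field per question in the list, and the sample answers are made up.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BackupPlan:
    # One field per planning question; None means "not yet answered".
    frequency: Optional[str] = None              # How often are backups performed?
    scope: Optional[str] = None                  # What information is backed up?
    medium: Optional[str] = None                 # Tape, disk, or BCV?
    automation: Optional[str] = None             # What level of automation is required?
    problem_handling: Optional[str] = None       # How are problems trapped and resolved?
    retention_policy: Optional[str] = None       # Retention, rotation, and archival
    restore_time_estimate: Optional[str] = None  # How long will a restore take?
    verification: Optional[str] = None           # How are backups verified?
    responsibilities: Optional[str] = None       # Who owns the backup operations?
    documentation: Optional[str] = None          # Where are the procedures documented?


def open_questions(plan: BackupPlan) -> List[str]:
    """Return the names of the planning questions that have no answer yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Hypothetical, partially completed plan.
plan = BackupPlan(frequency="nightly full online backup", medium="tape")
print("Still to be answered:", ", ".join(open_questions(plan)))
```

A plan that still reports open questions is not ready to move into production.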
BEST PRACTICE #8: TRAIN PERSONNEL AND PRACTICE DISASTER RECOVERY
In larger, established companies, this is usually a given (but not always...). However, in small companies, this needs to be the rule as well. Training should start with the disaster recovery plan and the execution of that plan. Personnel should be trained in all aspects of the system. Understanding how the backup software works is not enough. Train operations personnel on Windows NT/2000, Exchange, and the hardware platform on which they run. If your Exchange servers are deployed in a SAN or a clustered environment, this added complexity must be well understood by those responsible for system recovery. Ensure that personnel know the intricacies of the Exchange database engine, transaction-based storage, and how recovery is performed. Engage Microsoft PSS and other knowledgeable support resources in the process of planning and training your staff. Ensure that those responsible are aware of support resources, how to utilize them, and how to escalate issues when things go wrong. When a problem does occur, make sure that each player in the recovery scenario understands his or her role.
The best training ground for personnel is a disaster recovery "fire drill." Not just one, however; lots of them. An Exchange system manager who drops a bomb on his recovery staff when there is no real emergency will be much better prepared when the situation is real. An actual emergency is the worst time to find out that your recovery procedures don't really work. Hopefully, the procedures have seen hours of QA and validation before this point. However, as an added measure, test your plan and procedures thoroughly. Murphy's Law says that it is a very real possibility that, when you most need something to work, it won't. Not only can your procedures be flawed (and therefore must be tested) but also, despite your best efforts and failsafe measures, your backup could be useless. Knowing how to handle an exception like this is much better addressed during a drill than during a real-world crisis. Don't be afraid to periodically test your backups by restoring them onto a spare server or a deployed recovery server. The more practice your operations staff has recovering Exchange data, the more likely it is that they will respond in an accurate and timely manner during a "live" outage. While validation of your procedures should work out most of these "bugs," Exchange fire drills are an invaluable practice to get into. Your organization should not neglect this important point and should implement a solid program to train all system managers, operators, and administrators on disaster recovery plans and procedures. This training program should include escalation procedures, recovery scenarios, and periodic disaster recovery drills that simulate all scenarios.
BEST PRACTICE #9: DISASTER RECOVERY MANAGEMENT
Although we will never be able to prevent data loss and catastrophes or plan for all contingencies, there are some good disaster recovery management practices that should be part of your deployment routines to potentially help alleviate the problems when they occur.
The first item is the creation of an Exchange Server disaster recovery toolkit. The Exchange disaster recovery toolkit is unique to an Exchange deployment and should go beyond the typical kit that your organization may have for a Windows NT/2000 server. This toolkit must "add a layer" and provide tools for successfully recovering not only the operating system but also Exchange Server. The disaster recovery toolkit ensures that all materials and documentation are available when and where you need them in the event a disaster occurs. Some typical items that I recommend be included in your disaster recovery toolkit are as follows:
A server hardware configuration worksheet: Provides documentation on how the server hardware components were installed and configured. Most important are hardware CMOS/BIOS settings configured using the system configuration program. Critical to recovery is the configuration of the server storage including how disk devices are configured and RAID levels if applicable.
An operating system configuration worksheet: Provides documentation on installation and configuration parameters needed to return the operating system to the same state it was before the disaster occurred. This should include any additional device drivers or utilities installed as well as registry settings that were modified from defaults. If the server is a Windows NT/2000 domain controller or Global Catalog server, the worksheet should contain any information settings pertinent to this configuration.
An Exchange Server configuration worksheet: Provides information on Exchange-specific server configuration such as services installed and configurations. Critical to recovery operations is configuration data on how Exchange storage groups and databases are configured and allocated across server storage. Details on where log files and databases are stored will be critical to successful restoration to the last known good state. The Exchange worksheet should also contain data about any Exchange-specific configurations such as routing and administrative groups the server belongs to or Active Directory, SMTP, IIS, and X.400 connector settings.
A contact information worksheet: Provides a source listing of proper individuals to contact in the event of an emergency or if specific configuration or security data is required. May also contain escalation procedures and contacts for both hardware and software issues encountered during recovery operations.
Recovery disks and CD-ROMs: Should include all necessary software for successful installation, setup, and recovery of the Exchange Server including Windows NT/2000 emergency repair disk, hardware system configuration disks, device driver disks/CDs, and third-party software disks/CDs. May include Windows NT/2000 and Exchange Server CD-ROMs if they are not readily available.
These key components will form your disaster recovery toolkit for your Exchange deployment. In addition, your toolkit should also include any components specific to your deployment or organizational needs. The disaster recovery toolkit forms a solid cornerstone for excellent configuration management practices in your Exchange deployment. Configuration management with disaster recovery in mind will ensure that disaster recovery operations are performed efficiently and smoothly across the entire population of servers in an Exchange deployment. The following are some key points that are part of good configuration management practices. In Chapter 9, I will discuss configuration management in greater detail and beyond the limited scope of disaster recovery.
Tightly control the configuration of all Exchange servers.
Document all server configurations and keep a change log.
Ensure that hardware device drivers and firmware updates are consistent across the deployment.
Ensure that operating system and application service packs are consistently applied across the deployment.
Use like hardware configurations for all Exchange servers.
Deploy management software that provides configuration management capabilities.
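The consistency points above (drivers, firmware, and service packs applied uniformly across the deployment) lend themselves to a simple automated check. The sketch below, in Python, compares a documented inventory of server settings and flags any server that differs from the most common value for each setting. It is an illustration only: the `find_drift` function, the server names, the setting names, and the version strings are all hypothetical, and it assumes the inventory has already been collected by whatever management software you deploy.

```python
from collections import Counter
from typing import Dict

Inventory = Dict[str, Dict[str, str]]  # server name -> {setting name: value}


def find_drift(inventory: Inventory) -> Dict[str, Dict[str, str]]:
    """For each setting, return the servers that differ from the most common value."""
    settings = sorted({key for config in inventory.values() for key in config})
    drift: Dict[str, Dict[str, str]] = {}
    for setting in settings:
        # A server that does not report a setting at all counts as drift too.
        values = {server: config.get(setting, "<missing>")
                  for server, config in inventory.items()}
        # Treat the most common value as the baseline (ties go to the first seen).
        baseline, _ = Counter(values.values()).most_common(1)[0]
        outliers = {server: value for server, value in values.items()
                    if value != baseline}
        if outliers:
            drift[setting] = outliers
    return drift


# Hypothetical inventory for three Exchange servers.
inventory = {
    "EXCH01": {"os_service_pack": "SP2", "nic_driver": "4.1", "array_firmware": "2.30"},
    "EXCH02": {"os_service_pack": "SP2", "nic_driver": "4.1", "array_firmware": "2.30"},
    "EXCH03": {"os_service_pack": "SP1", "nic_driver": "4.1", "array_firmware": "2.30"},
}

for setting, servers in find_drift(inventory).items():
    print(f"{setting}: inconsistent on {servers}")
```

Using the most common value as the baseline is a convenience for the sketch. In practice, you would more likely compare each server against the documented standard configuration from your change log.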