Disaster Recovery - A White Paper

Why was this White Paper produced?

Just about everyone with a television, radio, or newspaper has seen or heard about the devastation that has occurred throughout the world in the past several years. Unfortunately, some have experienced tragedy firsthand. You can be sure disasters will continue to affect people and companies worldwide. After the news media has left the scene of a disaster, people must put their lives and businesses back together. Some are more fortunate than others; some have a plan. Just as some people do not survive a disaster, neither do some businesses. This document describes why a Business Recovery Plan is absolutely necessary and outlines the essential elements in developing one.

This White Paper was produced by Bob Janusaitis, a Certified Disaster Recovery Planner located in Houston, TX.



What is the background of Business Recovery Planning?

Many organizations have strong business recovery plans for their mainframe and mini-computer systems. But as more and more critical applications are migrated to distributed systems, companies are becoming concerned about how they can protect these systems in the event of a disaster. The chances of a disaster increase significantly as systems are moved away from traditional central computer facilities that have hardened security and environmental controls. Hurricanes, fires, floods, earthquakes, and terrorist attacks may be rare at a computer center, but as more applications are distributed over local and wide area networks, it is inevitable that something will go wrong.

Recently we have seen one disaster after another headlined in the national and international news, from hurricanes to floods, earthquakes, bombings, and riots. The possibilities for disasters continue to be numerous. Most disasters happen without warning, and when they do, there is no time for planning and organizing, only scrambling to recover. Since the destruction they cause cannot be prevented, organizations must prepare for disasters by implementing a plan for expedient and successful recovery. The recovery problem is further compounded by the complexity of distributed environments and the heterogeneity of hardware, software, and communications protocols.

Almost all businesses now depend heavily or totally on computer technology and other automated systems. Disruption of these systems for even a few hours can cause severe financial loss and even threaten the survival of the organization. The ongoing operations of an organization depend on senior management support of the business recovery process.

Organizations must include provisions for recovering the functional areas of the enterprise that have been identified as critical. This includes recovering more than just the information system. Consideration must be given to replacing PBX equipment, 800 and long distance service, a location for the employees to work, salvage of usable building contents, and the list goes on. When these critical items cannot be replaced in a timely manner, managing the organization can become nearly impossible.

Most companies can ensure that operations continue after an interruption by taking a few initial steps. First, determine whether you have someone in the company who has the experience and the time to begin assembling the necessary requirements of the plan.



What is a disaster recovery plan and why is it important?

A disaster recovery plan is a comprehensive set of action steps to be taken before, during, and after a disaster. The plan is documented and tested to ensure the continuity of operations and availability of critical resources after a disaster.

Since every company is different, the business recovery plan will need to reflect the differences in core functions, recovery windows, customer service, regulatory issues, etc.



What are the benefits of having a Business Recovery Plan?

It will provide the guidance required during a climate of crisis and ensure that vital issues are not overlooked. When designed properly, the plan will guide even inexperienced employees in helping the company to recover.

Business Recovery Plans are required by law in the financial industry, but a comprehensive, regularly tested plan can help any organization insulate itself from litigation for negligence. The very existence of a plan could serve as a defense, showing that management had not neglected its responsibility to prepare for disasters.

The major benefits from developing a comprehensive business recovery plan can be summarized as:



What are the consequences for not having a Business Recovery Plan?

A frequently quoted study conducted by the University of Texas (Christensen, S. R., et al., "Financial and Functional Impacts of Computer Outages on Businesses", Center for Research on Information Systems, The University of Texas at Arlington) revealed the following sobering statistics:


The study concluded that "Organizations which had prepared for an extended computer outage through insurance and/or a contingency plan reported significantly lower expected loss of revenues, additional costs, and loss of functioning. As a group, these organizations estimated that their revenue losses would be 2.5 times as severe if their contingency plans were not activated."

If an organization is fortunate enough to survive a disaster without a plan for recovery, it will not survive unscathed. Aside from the direct revenue losses incurred during the failure, the organization will also suffer intangible costs such as cash flow interruption, loss of customers, loss of competitive edge, erosion of industry image, and reduced market share.

Numerous other studies have revealed that the majority of LANs and WANs have no established recovery plans. Though the business world is increasingly employing networks of microcomputers as cost-effective alternatives to mainframe platforms, many organizations have not acknowledged the mission-critical nature of these systems.



Who is responsible for the plan?

Ultimately, a company's management is responsible. Management must control company assets, and this means controlling the company's information systems, proactively managing those systems, and ensuring their continued operation.

The Securities Exchange Act of 1934 was amended by the 1977 Foreign Corrupt Practices Act to require all publicly held companies to keep accurate records and maintain internal control systems to safeguard assets. The courts have defined assets as including the computer systems and all the data they contain. Critical records and original documents are also assets that should be protected.

The law places equal emphasis on "making...records." The company that fails to generate a record is as liable as the company that fails to preserve it. A company without adequate disaster plans may not be able to create records for a substantial period of time.

The penalty for conviction under the Foreign Corrupt Practices Act is a fine up to $10,000 or five years imprisonment, or both.

Business recovery should be a concern for the entire company. It is not limited to network management. All department managers who are dependent on the services provided by the network must be responsible for the development of the emergency procedures within their own areas as well as participating in the recovery plan for the computer network. Each area must be able to activate its own portion of the plan in concert with the overall recovery effort. Even support functions such as building maintenance and facilities need to be part of the plan. These functions may not be directly dependent upon the network for their own productivity; however, they will be responsible, in part, for the restoration of the facility.



What does it take to develop a Business Recovery Plan?

The commitment of senior management is the most important factor in the success of a plan development effort. Without their support, it will be almost impossible to get functional departments to commit the resources required to develop a usable plan.

Participation and acceptance by the user community are also essential. Users must take ownership of the systems they use. If they are not part of the plan development process, the plan has little chance of being truly useful. Their involvement will also help to identify some important aspects of recovery:


A Business Recovery Plan requires an ongoing investment of time and financial resources. If the plan is not maintained, it is almost as bad as not having a plan.



What are the steps in developing the Business Recovery Plan?

A project workplan should be established to manage the tasks, deadlines, and deliverables. The major steps of the typical business recovery project workplan are:


Project Organization includes project administration, defining assumptions, meetings, and policy issuance.

Risk Assessment

A risk assessment identifies the types of disasters a specific location is likely to endure. It examines the physical infrastructure within the building and for several kilometers around it. A relative value is assigned to each category of disaster, and estimates of duration are noted. A scale of 0 to 3 is used, where 0 means not likely and 3 means very likely. These ratings in turn determine which areas should be examined further to mitigate the risks.
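
The rating step can be illustrated with a minimal Python sketch. The 0 to 3 likelihood scale comes from the text above; the hazard names, the duration figures, and the review threshold of 2 are assumptions made up for illustration, not values from any real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str                  # hypothetical disaster category
    likelihood: int            # 0 = not likely ... 3 = very likely
    est_duration_hours: float  # estimated outage duration (made-up figures)

    def __post_init__(self):
        # Reject ratings outside the 0-3 scale described in the text.
        if not 0 <= self.likelihood <= 3:
            raise ValueError(f"likelihood must be 0-3, got {self.likelihood}")


def hazards_to_examine(hazards, threshold=2):
    """Return hazards rated at or above the threshold (an assumed cutoff),
    most likely first, with longer estimated outages breaking ties."""
    flagged = [h for h in hazards if h.likelihood >= threshold]
    return sorted(flagged, key=lambda h: (-h.likelihood, -h.est_duration_hours))


# Illustrative ratings only.
SAMPLE = [
    Hazard("hurricane", 3, 72.0),
    Hazard("earthquake", 0, 120.0),
    Hazard("power failure", 2, 8.0),
    Hazard("flood", 2, 48.0),
]

if __name__ == "__main__":
    for h in hazards_to_examine(SAMPLE):
        print(f"{h.name}: likelihood {h.likelihood}, ~{h.est_duration_hours:.0f} h")
```

With these sample ratings the earthquake entry is dropped from further review, while the hurricane, flood, and power failure entries are carried forward in that order.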

Business Impact Analysis

After the risk assessment has been performed, a business impact analysis is undertaken. It involves determining the cost of not being able to transact business. The calculation may be fairly straightforward, such as the number of widgets not sold per hour, day, or week; or it may be more abstract, in which case management will have to guesstimate the loss. In any event, the intent is not to get an exact answer, but to identify what is critical to keeping the company in operation. This step will determine the breadth of the recovery plan. Overprotecting will cost excess funds; underprotecting will give you a false sense of security.
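
For the straightforward case (widgets not sold per hour), the arithmetic might be sketched in Python as follows. The business functions, unit volumes, and prices are hypothetical figures chosen only to show the calculation; as the text notes, the goal is a rough ranking of what is critical, not an exact answer.

```python
def hourly_loss(units_per_hour, revenue_per_unit):
    """Revenue lost for each hour a function cannot transact business."""
    return units_per_hour * revenue_per_unit


def rank_functions(functions, outage_hours):
    """Rank business functions by estimated loss over an outage, largest first.

    `functions` maps a function name to (units_per_hour, revenue_per_unit).
    """
    losses = {
        name: hourly_loss(units, price) * outage_hours
        for name, (units, price) in functions.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures for illustration only.
SAMPLE_FUNCTIONS = {
    "order entry": (40, 25.0),
    "shipping": (10, 25.0),
    "internal newsletter": (0, 0.0),
}

if __name__ == "__main__":
    for name, loss in rank_functions(SAMPLE_FUNCTIONS, outage_hours=8):
        print(f"{name}: estimated loss {loss:,.2f}")
```

Functions near the top of the ranking are candidates for the recovery plan; those at the bottom are where overprotection would waste funds.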

Recovery Strategy Development

Once the requirements have been determined, decisions can be made concerning how best to provide support following a disruption. Various options are available that include:

Hot site - a vendor in the business of providing a prepared site with hardware, telecommunications, technical support personnel, etc., usually under an annual contract. Subscribers get access to the facility on a first-come, first-served basis. Work area recovery would also be considered under this area.

Cold site - an empty facility or leased space that is ready for occupancy. Immediately after the disaster, hardware, software, and support services are shipped to the location. Requires commitment from vendors to provide services on an expedited basis.

Internal backup - another company-owned facility in another region is used to provide services on an emergency basis. Requires coordination and possible displacement of personnel at the backup site.

Mutual support agreement - an agreement with another company to share resources after a disaster. This assumes that the backup site always has adequate capacity, and that you are comfortable with the security issues surrounding sharing resources on a system you don't manage.

In some cases it may be necessary to use a combination of these options. Large multi-national corporations are increasingly using the internal backup approach for LANs.

Conventional disaster recovery service vendors are generally geared toward the mainframe environment and may not be able to provide full support to a client/server based enterprise. Since the number of recovery centers available is limited, it could also mean not having a site to use in an emergency. A regional disaster could cause all sites to be occupied and leave a company without a location to restore operations.

A well-thought-out plan will provide a company with step-by-step directions depending on the type and severity of a disaster. It will prescribe functional teams of company employees who are trained in the skills necessary to initiate and execute the recovery plan. It will also ensure that a critical component is not overlooked in the high-stress situation of a recovery effort.

Documentation

The documentation of the plan can take several forms. Most companies still use traditional word processors; others use commercially available software. Whatever method is used, it is important to ensure that change control procedures are in effect to keep the plan up to date.

Training

Training the members of the Emergency Operations Team (EOT) is essential to ensure each person knows their role and responsibilities. Alternate EOT members should also be included in the training.

Simulation

The plan must be tested on a regular basis. Most companies test at least semi-annually. A simulated disaster will exercise the plan, identify any weaknesses, and demonstrate interaction between the participants. The critique that follows will normally result in minor modifications to the plan.

Finally, to be successful, the plan must be tested and updated on a regular basis. Few business recovery plans operate perfectly as initially designed. Since adjustments and corrections are needed regularly, recovery plans should be easy to update and change.
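
One simple way to keep a regular test schedule visible is to compute when the next simulation is due. The Python sketch below assumes that "semi-annually" can be approximated as a fixed 182-day interval; that figure and the function names are illustrative assumptions, not a prescribed standard.

```python
from datetime import date, timedelta

# Assumption: "semi-annually" approximated as 182 days.
TEST_INTERVAL = timedelta(days=182)


def next_test_due(last_tested):
    """Date by which the plan should next be exercised."""
    return last_tested + TEST_INTERVAL


def is_overdue(last_tested, today):
    """True if the plan has gone longer than the interval without a test."""
    return today > next_test_due(last_tested)


if __name__ == "__main__":
    last = date(1996, 2, 8)
    print("Next simulation due by:", next_test_due(last))
    print("Overdue today?", is_overdue(last, date.today()))
```

A check like this could run alongside whatever change control procedure keeps the plan documentation up to date.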



What else should I consider?

The following should be considered in developing a business recovery plan for an organization:




What specific areas should be included in a Business Recovery Plan?

A typical Business Recovery Plan includes recovery teams for areas such as:

Team                             Responsibility
Initial Response                 Determines the extent of damage
Emergency Operations             Acts as the command center during the recovery process
Public Relations                 Handles press releases and the media
Facilities Management            Configures new location and begins reconstruction of damaged site
Personnel Support                Deals with travel, relocation issues, injuries, family assistance
Computer Operations              Reestablishes production infrastructure
Business Operations              Coordinates all functional business unit recoveries
Voice and Data Communications    Reestablishes production voice and data network
Vital Records                    Coordinates salvage, restoration of damaged records and offsite storage
Administrative Support           Provides direct support to members of the Emergency Operations Team



Procedures are developed in the following areas:


The plan should also contain documents that can be used by personnel unfamiliar with specific areas that must be recovered. They include items such as:



Who can help me?

To develop the actual plan, there are three basic options:


Each of these approaches can be successful. They vary in cost, but in all cases, a number of dedicated site personnel are required for research and implementation. A successful plan requires thorough user cooperation and detailed departmental self-evaluations.

In-house development requires in-house business recovery planning expertise that is only gained through extensive training and experience. In most organizations, this is rarely available, especially in the distributed computing environment. If you have an existing recovery plan for your legacy systems, it may be useful as a guide, but it will not address every issue in a distributed systems environment.

Though slowly evolving to include networks, software packages which facilitate the development of a business recovery plan have almost exclusively been developed for the traditional single-vendor mainframe environment. Additional effort will be needed to include areas not addressed by these software packages.

Consultants generally have experience and training in the mainframe environment. To address the recovery of a distributed systems network, a company should identify a consulting firm with a successful track record in this environment.



Is there anything I can do myself?

In order to help an organization get started developing its plan, we have compiled the following non-exhaustive checklist. Reviewing it just might give company management some idea of what is involved in preparing to recover its distributed systems.

___ Identify potential disasters and prioritize them by likelihood of occurrence.
___ Estimate the impact each might cause and identify items that could be damaged.
___ Estimate time required to restore damaged resources and potential business loss due to outage.
___ Identify mission-critical resources.
___ Inventory your assets to expedite insurance claims.
___ Build in fault tolerance (Mirroring, RAID, UPS).
___ Protect your applications and data (Virus scanning, off-site backups).
___ Maintain voice and data communications (switched services, dial-in, 800, cellular).
___ Implement alternate sites and plan for obtaining required resources.
___ Prepare the formal plan (use flowcharts and step-by-step instructions), test and revise it regularly.



Who do I call when I need help?

DRT Systems International has the expertise to provide full life cycle business recovery planning, starting with the risk assessment, business impact analysis, and strategic recovery plan, and moving on to tactical implementation, testing, maintenance, and review. DRT's leadership in distributed systems, combined with experience in business recovery planning, makes us the choice for those who want to do it right the first time.

Copyright (C) Robert G. Janusaitis 1994,1995,1996. All rights reserved. Used with permission.


For more information send mail to: rminfo@notes.drthou.com.

Copyright 1996 DRT Systems International, L.P.

Document last modified on: 02/08/96