Just about everyone with a television, radio, or newspaper has seen or heard about the
devastation that has occurred throughout the world in the past several years.
Unfortunately, some have experienced tragedy firsthand. You can be sure disasters will
continue to affect people and companies worldwide. After the news media has left the scene
of a disaster, people must put their lives and businesses back together. Some are more
fortunate than others: they have a plan. Just as some people will not survive a disaster,
some businesses will not survive either. This document explains why a Business Recovery Plan
is absolutely necessary and describes the essential elements in developing one.
This White Paper was produced by Bob Janusaitis, a Certified Disaster Recovery Planner
located in Houston, TX.
Many organizations have strong business recovery plans for their mainframe and
mini-computer systems. But as more and more critical applications are migrated to
distributed systems, companies are becoming concerned about how they can protect these
systems in the event of a disaster. The chances of a disaster increase significantly as
systems are moved away from traditional central computer facilities that have hardened
security and environmental controls. Hurricanes, fires, floods, earthquakes, and terrorist
attacks may be rare at any single computer center, but as more applications are distributed over
local and wide area networks, it is inevitable that something will go wrong.
Recently we have seen one disaster after another headlined in the national and
international news: hurricanes, floods, earthquakes, bombings, and riots. The
possibilities for disasters continue to be numerous. Most disasters happen without warning,
and when they do, there is no time for planning and organizing, only for scrambling to
recover. Since disasters themselves cannot be prevented, organizations must prepare for them
by implementing a plan for expedient and successful recovery. The recovery problem is
further compounded by the complexity of distributed environments and the heterogeneity of
hardware, software, and communications protocols.
Almost all businesses now depend heavily or entirely on computer technology and other
automated systems, and their disruption for even a few hours can cause severe financial
loss and even threaten survival of the organization. The ongoing operations of an
organization depend on senior management support of the business recovery process.
Organizations must include provisions for recovering the functional areas of the
enterprise that have been identified as critical. This includes recovering more than just
the information system. Consideration must be given to replacing PBX equipment, 800 and
long distance service, a location for the employees to work, salvage of usable building
contents, and the list goes on. When these critical items cannot be replaced in a timely
manner, managing the organization can become nearly impossible.
Most companies can ensure that operations continue after an interruption by taking a few initial steps. First, determine whether someone in the company has the experience and the time to begin assembling the necessary requirements of the plan.
A disaster recovery plan is a comprehensive set of action steps to be taken before,
during, and after a disaster. The plan is documented and tested to ensure the continuity
of operations and availability of critical resources after a disaster.
Since every company is different, the business recovery plan will need to reflect the
differences in core functions, recovery windows, customer service, regulatory issues, etc.
It will provide the guidance required in a climate of crisis and ensure that
vital issues are not overlooked. When designed properly, the plan will guide even
inexperienced employees in helping the company to recover.
Business Recovery Plans are required by law in the financial industry, but a
comprehensive, regularly tested plan can help any organization insulate itself from
litigation for negligence. The very existence of a plan could serve as evidence that
management had not neglected its responsibility to prepare for disasters.
The major benefits from developing a comprehensive business recovery plan can be
summarized as:
A frequently quoted study conducted by the University of Texas (Christensen, S. R.,
et al., "Financial and Functional Impacts of Computer Outages on Businesses,"
Center for Research on Information Systems, The University of Texas at Arlington)
revealed the following sobering statistics:
The study concluded that "Organizations which had prepared for an extended
computer outage through insurance and/or a contingency plan reported significantly lower
expected loss of revenues, additional costs, and loss of functioning. As a group, these
organizations estimated that their revenue losses would be 2.5 times as severe if their
contingency plans were not activated."
If an organization is fortunate enough to survive a disaster without a plan for
recovery, it will not survive unscathed. Aside from the direct revenue losses incurred
during the failure, the organization will also suffer intangible costs such as cash flow
interruption, loss of customers, loss of competitive edge, erosion of industry image, and
reduced market share.
Numerous other studies have revealed that the majority of LANs and WANs have no
established recovery plans. Although businesses increasingly employ networks
of microcomputers as cost-effective alternatives to mainframe platforms, many have not
acknowledged the mission-critical nature of these systems.
Ultimately, a company's management is responsible. It must control company assets,
and this means controlling its information systems, managing them proactively, and
ensuring their continued operation.
The Securities Exchange Act of 1934 was amended by the 1977 Foreign Corrupt
Practices Act to require all publicly held companies to keep accurate records and maintain
internal control systems to safeguard assets. The courts have defined assets as including
the computer systems and all the data they contain. Critical records and original
documents also are assets that should be protected.
The law places equal emphasis on "making...records." The company that fails
to generate a record is as liable as the company that fails to preserve it. A company
without adequate disaster plans may not be able to create records for a substantial period
of time.
The penalty for conviction under the Foreign Corrupt Practices Act is a fine of up to
$10,000, five years' imprisonment, or both.
Business recovery should be a concern for the entire company. It is not limited to
network management. All department managers who are dependent on the services provided by
the network must be responsible for developing the emergency procedures within
their own areas as well as for participating in the recovery plan for the computer network.
Each area must be able to activate its own portion of the plan in concert with the overall
recovery effort. Even support functions such as building maintenance and facilities need
to be part of the plan. These functions may not be directly dependent upon the network for
their own productivity; however, they will be responsible, in part, for the restoration of
the facility.
The commitment of senior management is the most important factor in the success of a plan
development effort. Without it, getting functional departments to commit the resources
required to develop a usable plan will be almost impossible.
Participation and acceptance by the user community are also essential. Users must take
ownership of the systems they use. If they are not part of the plan
development process, the plan has little chance of being truly useful. Their involvement
also will help to identify some important aspects of recovery:
A Business Recovery Plan requires an ongoing investment of time and financial
resources. If the plan is not maintained, it is almost as bad as not having a plan.
A project workplan should be established to manage the tasks, deadlines, and
deliverables. The major steps of the typical business recovery project workplan are:
Project Organization includes project administration, defining assumptions, meetings,
and policy issuance.
Risk Assessment
A risk assessment identifies the types of disasters a specific location is likely to
endure. It examines the physical infrastructure within the building and outside it for
several kilometers. A relative value is assigned to each category, and estimates of
duration are noted. A scale of 0 to 3 is used, where 0 means not likely and 3 means very
likely. This in turn determines which areas should be examined further to mitigate the
risks.
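The 0-to-3 scale above can be sketched as a small scoring routine. This is a minimal illustration under stated assumptions, not the author's method: the `Hazard` record, the hypothetical `REVIEW_THRESHOLD` of 2 that decides which risks merit further examination, and the sample ratings are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    category: str            # e.g. "Flood", "Fire"
    likelihood: int          # 0 = not likely ... 3 = very likely
    est_outage_days: float   # estimated duration of the disruption

    def __post_init__(self) -> None:
        # Reject ratings outside the 0-3 scale described in the text.
        if not 0 <= self.likelihood <= 3:
            raise ValueError(f"likelihood must be 0-3, got {self.likelihood}")


# Hypothetical cut-off (an assumption): hazards rated at or above this are examined further.
REVIEW_THRESHOLD = 2


def hazards_to_examine(hazards: list[Hazard], threshold: int = REVIEW_THRESHOLD) -> list[Hazard]:
    """Return hazards at or above the threshold, most likely first, then longest outage."""
    flagged = [h for h in hazards if h.likelihood >= threshold]
    return sorted(flagged, key=lambda h: (-h.likelihood, -h.est_outage_days))


if __name__ == "__main__":
    # Hypothetical ratings for a single site.
    site = [
        Hazard("Hurricane", 3, 7),
        Hazard("Earthquake", 0, 14),
        Hazard("Fire", 2, 30),
        Hazard("Flood", 2, 5),
    ]
    for h in hazards_to_examine(site):
        print(h.category, h.likelihood, h.est_outage_days)
```

Sorting by likelihood first and estimated duration second is one reasonable way to order follow-up work; an organization might just as well weight the two factors together.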
Business Impact Analysis
After the risk assessment has been performed, a business impact analysis is undertaken.
It involves determining the cost of not being able to transact business. It may be fairly
straightforward, like the number of widgets not sold per hour, day, or week; or it may be more
abstract, and management will have to guesstimate the loss. In any event, the intent is not
to get an exact answer, but to identify what is critical to keeping the company in
operation. This step will determine the breadth of the recovery plan. Overprotecting will
waste funds; underprotecting will give you a false sense of security.
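A business impact analysis of the "widgets not sold per hour" kind reduces to simple arithmetic: estimated loss per hour multiplied by the length of the outage. The sketch below assumes a flat hourly loss rate for each business function, which is a simplification (real losses often accelerate as an outage drags on); the function names and dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    loss_per_hour: float  # estimated (or "guesstimated") revenue lost per hour of outage


def outage_cost(fn: BusinessFunction, hours: float) -> float:
    """Estimated loss for an outage of `hours`, assuming a flat hourly rate."""
    if hours < 0:
        raise ValueError("outage length cannot be negative")
    return fn.loss_per_hour * hours


def rank_by_impact(functions: list[BusinessFunction], hours: float) -> list[tuple[str, float]]:
    """Return (name, estimated loss) pairs, largest loss first."""
    costs = [(fn.name, outage_cost(fn, hours)) for fn in functions]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical functions and hourly loss estimates.
    functions = [
        BusinessFunction("Order entry", 5_000),
        BusinessFunction("Payroll", 200),
        BusinessFunction("Shipping", 1_500),
    ]
    # Hypothetical one-day (24-hour) outage.
    for name, loss in rank_by_impact(functions, 24):
        print(f"{name}: ${loss:,.0f}")
```

In keeping with the text, the ranking matters more than the exact dollar figures: it points to the functions whose recovery the plan must cover first.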
Recovery Strategy Development
Once the requirements have been determined, decisions can be made concerning how best
to provide support following a disruption. Various options are available that include:
Hot site - a vendor in the business of providing a prepared site with hardware,
telecommunications, technical support personnel, etc., usually under an annual contract.
Subscribers get access to the facility on a first-come, first-served basis. Work area
recovery would also be considered under this option.
Cold site - an empty facility or leased space that is ready for occupancy. Immediately
after the disaster, hardware, software, and support services are shipped to the location.
Requires commitment from vendors to provide services on an expedited basis.
Internal backup - another company-owned facility in another region is used to provide
services on an emergency basis. Requires coordination and possible displacement of
personnel at the backup site.
Mutual support agreement - An agreement with another company to share resources after a
disaster. This assumes that the backup site always has adequate capacity and that you are
comfortable with the security issues of sharing resources on a system you don't
manage.
In some cases it may be necessary to use a combination of these options. Large
multi-national corporations are increasingly using the internal backup approach for LANs.
Conventional disaster recovery service vendors are generally geared toward the
mainframe environment and may not be able to provide full support to a client/server based
enterprise. Because the number of recovery centers available is limited, a company could also
find itself without a site in an emergency. A regional disaster could cause all sites to be
occupied and leave a company without a location to restore operations.
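The capacity problem described above follows directly from the first-come, first-served rule used by hot-site vendors. A minimal sketch, assuming a hypothetical site with a fixed number of recovery slots and hypothetical subscriber names, shows why a company that declares late in a regional disaster may be left without a site.

```python
def allocate_hot_site(declarations: list[str], capacity: int) -> tuple[list[str], list[str]]:
    """Split subscribers into (served, turned_away) on a first-come, first-served basis.

    `declarations` is ordered by the time each subscriber declared a disaster;
    `capacity` is the (hypothetical) number of subscribers the site can support at once.
    """
    if capacity < 0:
        raise ValueError("capacity cannot be negative")
    return declarations[:capacity], declarations[capacity:]


if __name__ == "__main__":
    # Hypothetical regional disaster: five subscribers declare, the site has room for three.
    order = ["Subscriber A", "Subscriber B", "Subscriber C", "Your company", "Subscriber E"]
    served, turned_away = allocate_hot_site(order, capacity=3)
    print("Served:", served)
    print("Turned away:", turned_away)
```

Holding a contract does not by itself guarantee a slot, which is one reason a combination of recovery options may be needed.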
A well-thought-out plan will provide a company with step-by-step directions depending
on the type and severity of a disaster. It will prescribe functional teams of company
employees who are trained in the skills necessary to initiate and execute the recovery
plan. It will also ensure that a critical component is not overlooked in the high-stress
situation of a recovery effort.
Documentation
The documentation of the plan can take several forms. Most companies still use
traditional word processors; others use commercially available software. Whatever method
is used, it is important to ensure that change control procedures are in effect to keep
the plan up to date.
Training
Training the members of the Emergency Operations Team (EOT) is essential to ensure that each
person knows their roles and responsibilities. Alternate EOT members should also be
included in the training.
Simulation
The plan must be tested on a regular basis. Most companies test at least semi-annually.
A simulated disaster will exercise the plan, identify any weaknesses, and demonstrate
interaction between the participants. A critique of the exercise will normally result in
minor modifications to the plan.
Finally, to be successful, the plan must be tested and updated on a regular basis. Few
business recovery plans operate perfectly as initially designed. Since adjustments and
corrections are needed regularly, recovery plans should be easy to update and change.
The following should be considered in developing a business recovery plan for an
organization:
A typical Business Recovery Plan includes recovery teams for areas such as:
| Team | Responsibility |
|------|----------------|
| Initial Response | Determines the extent of damage |
| Emergency Operations | Acts as the command center during the recovery process |
| Public Relations | Handles press releases and the media |
| Facilities Management | Configures new location and begins reconstruction of damaged site |
| Personnel Support | Deals with travel, relocation issues, injuries, family assistance |
| Computer Operations | Reestablishes production infrastructure |
| Business Operations | Coordinates all functional business unit recoveries |
| Voice and Data Communications | Reestablishes production voice and data network |
| Vital Records | Coordinates salvage, restoration of damaged records and offsite storage |
| Administrative Support | Provides direct support to members of the Emergency Operations Team |
Procedures are developed in the following areas:
The plan should also contain documents that can be used by personnel unfamiliar with
specific areas that must be recovered. They include items such as:
To develop the actual plan, there are three basic options:
Each of these approaches can be successful. They vary in cost, but in all cases, a
number of dedicated site personnel are required for research and implementation. A
successful plan requires thorough user cooperation and detailed departmental
self-evaluations.
In-house development requires business recovery planning expertise that is
gained only through extensive training and experience. In most organizations, such expertise is
rarely available, especially in the distributed computing environment. If you have an
existing recovery plan for your legacy systems, it may be useful as a guide, but it will
not address every issue in a distributed systems environment.
Though slowly evolving to include networks, software packages which facilitate the
development of a business recovery plan have almost exclusively been developed for the
traditional single-vendor mainframe environment. Additional effort will be needed to
include areas not addressed by these software packages.
Consultants generally have experience and training in the mainframe environment. To
address the recovery of a distributed systems network, a company should identify a
consulting firm with a successful track record in this environment.
In order to help an organization get started developing its plan, we have compiled the
following non-exhaustive checklist. Reviewing it just might give company management some
idea of what is involved in preparing to recover its distributed systems.
___ Identify potential disasters and prioritize them by likelihood of occurrence.
___ Estimate the impact each might cause and identify items that could be damaged.
___ Estimate time required to restore damaged resources and potential business loss due to
outage.
___ Identify mission-critical resources.
___ Inventory your assets to expedite insurance claims.
___ Build in fault tolerance (Mirroring, RAID, UPS).
___ Protect your applications and data (Virus scanning, off-site backups).
___ Maintain voice and data communications (switched services, dial-in, 800, cellular).
___ Implement alternate sites and plan for obtaining required resources.
___ Prepare the formal plan (use flowcharts and step-by-step instructions), test and
revise it regularly.
DRT Systems International has the expertise to provide full life cycle business
recovery planning, starting with the risk assessment, business impact analysis, and strategic
recovery plan, and moving on to tactical implementation, testing, maintenance, and review.
DRT's leadership in distributed systems, combined with its experience in business recovery
planning, makes us the choice for those who want to do it right the first time.
Copyright (C) Robert G. Janusaitis 1994,1995,1996. All rights reserved. Used with
permission.
For more information send mail to: rminfo@notes.drthou.com.
Copyright 1996 DRT Systems International, L.P.
Document last modified on: 02/08/96