Seven Habits of Highly Resilient Organizations
Small and mid-sized organizations
are especially at risk when disaster strikes, since few have the resources or
knowledge to develop full-scale continuity plans. Often, first actions are
directed toward the protection of physical property. But protecting the
integrity of an organization’s data, its communications capabilities and the
information technology infrastructure that supports both matters more than
physical property, regardless of the circumstances surrounding a disaster.
Here are seven habits that CDW
Government LLC (CDW-G), a provider of information technology (IT) solutions to
business, government, and education, advises organizations to adopt so that they
are prepared for any business contingency and remain resilient in the
event of unplanned interruptions. These habits can help organizations prevent
costly downtime, reduce inconvenience to customers and minimize damage to an
organization or agency’s reputation. They come from CDW-G’s team
of technology specialists and systems engineers, who are experts in evaluating
and designing technology solutions for government agencies, educational
institutions and healthcare facilities.
1. Conduct a
business impact assessment.
Because even the most thorough
disaster preparedness plan won’t be able to justify the cost of including every
mission process – especially for small organizations with limited resources – it
is important to inventory and prioritize critical processes for the entire
organization.
Organizations should tier processes and their data based
on their importance to operations. For example, processes that need to be resumed
within 24 hours to prevent serious mission impact, such as citizen service
delivery, or that will have a major effect on stakeholders could receive an “A”
rating; those that need to be resumed within 72 hours could receive a “B”
rating; and those that can wait more than 72 hours to be restored could receive
a “C” rating.
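As a rough illustration of the A/B/C tiering described above, the following Python sketch assigns a tier from a recovery deadline. The 24- and 72-hour cutoffs come from the text; the `Process` class, the `assign_tier` and `prioritize` functions, and every sample process other than citizen service delivery are hypothetical names chosen for this example, not part of any CDW-G tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Process:
    """One entry in a business impact inventory (hypothetical structure)."""
    name: str
    resume_within_hours: float              # how soon the process must be running again
    major_stakeholder_impact: bool = False  # True if an outage would seriously affect stakeholders


def assign_tier(process: Process) -> str:
    """Return "A", "B" or "C" using the 24-hour and 72-hour cutoffs from the text."""
    if process.resume_within_hours <= 24 or process.major_stakeholder_impact:
        return "A"
    if process.resume_within_hours <= 72:
        return "B"
    return "C"


def prioritize(processes: List[Process]) -> List[Process]:
    """Order processes by tier, then by how soon each must be resumed."""
    return sorted(processes, key=lambda p: (assign_tier(p), p.resume_within_hours))


if __name__ == "__main__":
    # Sample inventory; names and deadlines are illustrative assumptions.
    inventory = [
        Process("records archive search", 120),
        Process("payroll", 48),
        Process("citizen service delivery", 12),
        Process("public alerts", 60, major_stakeholder_impact=True),
    ]
    for item in prioritize(inventory):
        print(assign_tier(item), item.name)
```

An organization would replace the sample inventory with its own list of processes and adjust the cutoffs if its assessment calls for different recovery windows.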
In addition, several software
packages can help an agency or institution assess its disaster preparedness and
map out strategies that fit the organization’s needs and goals.
2. Take
steps to protect data.
Aside from people, information is
the single most critical asset for virtually any organization. Organizations
should back up data frequently to ensure records are kept, and consider
upgrading the backup equipment to a faster version to reduce the time it takes
to complete a backup cycle. Automated, remote backup services are available from
many vendors.
Organizations should also store
multiple copies of data off site and a long distance from the primary data
center. Outsourcing this service may make sense for small and mid-sized
organizations that do not currently operate in a suitable, alternative location.
There are a few different approaches
to backing up data that are increasingly affordable for smaller agencies and
institutions. They include:
· Tape Rotation: Information on
servers is copied to storage media (typically tapes) on a set schedule. These
tapes are then removed to an offsite location for safe storage. This is the
most basic approach to data backup.
· Data Replication: Information on
servers in one location is copied – either in real time or on a set schedule –
to servers in another location. As a result, the data in one location has an
exact mirror image in another location – often at a great distance. The
off-site server then takes over operations if the primary server is damaged.
· Appliance Backup: Like data
replication, the information on servers in one location is copied – either in
real time or on a set schedule – to a storage appliance in another location.
This does allow for a mirror image of the data on the server, but does not
include offsite facilities should the primary server infrastructure be
destroyed.
· Data Vaulting Facilities:
Information on servers is copied to an on-site central depository, which is then
replicated to an off-site data vaulting facility typically owned by a
third-party organization.
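To make the "exact mirror image" idea behind scheduled replication concrete, here is a minimal Python sketch that copies a directory tree to a second location and checks each copy against its original by checksum. It is only an illustration under stated assumptions: two local directories stand in for the primary and off-site servers, the `replicate` and `sha256_of` functions are hypothetical names, and commercial replication products and backup appliances work quite differently.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source_dir: Path, replica_dir: Path) -> List[str]:
    """Copy every file under source_dir to replica_dir.

    Returns the relative paths of any files whose copy does not match the
    original, so an empty list means the replica mirrors the source.
    Assumes replica_dir is not located inside source_dir.
    """
    mismatched = []
    source_files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_dir)
        target = replica_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copies contents and file metadata
        if sha256_of(source_file) != sha256_of(target):
            mismatched.append(str(relative))
    return mismatched
```

Run on a schedule, a routine like this corresponds to the "set schedule" variant of replication; verifying the copy is what gives some confidence that the data can actually be retrieved later.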
Once data is backed up,
organizations will need to carry out a practical and well-tested plan to
retrieve the information. The same IT architecture should frame both the
organization’s disaster recovery site and the primary data center, reducing
complications. If the organization uses a wide-area network (WAN), the Internet,
an intranet portal and telephones to provide citizen services, the same
infrastructure should be built at its backup facility, for example.
Organizations focus so much on
protecting and backing up network server data that they often fail to take steps
to ensure their employees can remotely access that data if they are unable to
work in the office. Remote-access software, such as products provided by Citrix
and Microsoft, can enable employees to access networked server or desktop
information offsite.
3. Review power
options.
Organizations should add
uninterruptible power supplies (UPS) for critical servers, network connections,
and selected personal computers to keep the most essential applications running.
In addition, cooling systems should
be supported by backup generators. Computer rooms can heat up quickly if
computers operate on backup power without adequate, precision cooling.
Monitoring for heat and humidity is also essential in critical computer rooms.
Heat is the biggest threat to UPS battery life, and temperature increases can
reduce the lifespan of network equipment by half – and also cause unplanned
system interruptions when agency operations are most critical.
Having a power backup system does
not eliminate the requirement to regularly inspect and maintain the power
infrastructure. System administrators should periodically verify that automatic
transfer switches are configured to switch over with little lag, so that UPS
power to computer systems is not disrupted. At the same time, they should take
the opportunity to inspect batteries and replace them as needed. Like flashlight
and smoke detector batteries, UPS batteries need to be inspected before they are
needed.
Finally, if the system must stay
operational, building redundancy into the power system is another proven means to
ensure power system reliability and, therefore, network availability.
Redundancy enables maintenance of a UPS module without affecting power to
connected equipment. It also increases fault tolerance.
4. Identify and
appoint a cross-functional preparedness team and a recovery
team.
Organizations should pull together a cross-functional team from appropriate
departments that can include computer operations, applications development,
server and systems administration, facilities, key service departments, data
security, physical security and network operations. This team can identify and
prioritize critical processes, design the overall process for recovery, select
an outside service provider, conduct tests, identify members of the recovery
team and document the plan.
The cross-functional preparedness
team will select the recovery team, which will participate in recovery
activities after any declared disaster. While the recovery team can be similar
to the cross-functional preparedness team, its members should not be identical,
even within a small organization. Additional members should include the
executive sponsor (e.g., CIO or COO), key stakeholder representatives (e.g.,
community liaison), and representatives from outside service providers.
5. Document, test and update the disaster preparedness
plan.
The cross-functional preparedness
team should document a disaster preparedness plan that clearly defines the role
of each individual on both the cross-functional preparedness and recovery teams.
Documentation should include updated configuration diagrams of the hardware,
software and network components to be used in the recovery. The plan should
include logistical details, including travel to backup sites, and even who has
spending authority for emergency needs. This plan also should include lists of
emergency contacts and instructions.
Once complete, the
plan should be tested to ensure that it will be accurate and effective in an
emergency. The true value of a continuity plan can be assessed only if rigorous
testing is carried out in a realistic environment. That means testing the plan
in an environment that simulates the series of events likely to occur in an
actual emergency. It also is important that the tests be carried out by the
people who would be responsible for those activities in a crisis. While an
organization is likely to make mistakes during such testing, it is best to
experience, identify and address these errors well in advance of a real
emergency.
Because change is constant within
most organizations, and because organizations are increasingly dependent on
information systems, it also pays to update the plan regularly. Products and
services designed to help in the event of an emergency also change, as does
their method of delivery. A business continuity plan must keep pace with these
changes for it to be useful in the event of a disruptive emergency, and tests
must be conducted regularly to ensure organizational preparedness.
6. Consider
telecommunications alternatives.
Key to any organization’s disaster
preparedness plan is a contingency plan for telecommunications. Alternative
communications vehicles, including wireless phones and satellite phones, should
be considered.
Power for communications is just as
important as it is for the rest of an organization’s IT infrastructure, so it is
important to become familiar with the local telephone system’s emergency power
capabilities and limitations. Organizations may want to investigate auxiliary
power sources such as an uninterruptible power supply or battery back-up, either
of which can be coupled with a surge protector. If on-premises
telecommunications equipment uses software voice mail or a call accounting
system, the software should be backed up regularly so valuable information about
the system’s configuration is not lost if it goes down. Copies should be stored
both on-site and off-site.
In addition, various
telecommunications services can help organizations quickly restore
communications connectivity:
· If the agency uses an 800 number for
critical functions such as order taking or citizen services, this number can be
terminated, or rerouted to another telephone number. A plan should be in place
for answering those calls as well.
· Call forwarding is an optional
feature offered by the local phone company. A main telephone number can be
forwarded to another office location, depending on anticipated call volume, or
to an employee’s home. Calls can even be forwarded to cellular phones.
Organizations may want to have call forwarding permanently installed on their
main business telephone number so it can be easily activated in the event of an
emergency.
· In an emergency, the ability to
place long distance calls can be greatly restricted. To minimize disruptions,
organizations should maintain relationships with multiple service providers,
enabling access with one network if another is down.
7. Form tight
relationships with vendors.
A strong relationship with hardware,
software, network, and service vendors can help expedite recovery, as these
vendor contacts often can work to ensure priority replacement of critical
telecommunications equipment, personal computers, servers and network hardware
in the event of a disaster. This is especially important for small- and
medium-size organizations, which may lack the resources that larger companies
can tap in an emergency.