A quantitative approach to planning the protection of the digital imaging data investment ensures that reasonable effort is expended to mitigate risk.
Recent events have brought a renewed focus to preventing
catastrophic data loss. Data loss can occur due to a major
destructive event, a series of events, or a security breach. This
paper discusses some of the risks, planning pitfalls, and ways to
financially quantify the spending of monies to prevent failures of
data integrity. The discussion will apply specifically to some of
the unique challenges surrounding PACS, focusing on methods for the
justification and implementation of strategies to prevent security
breaches or catastrophic data loss.
A BRIEF OVERVIEW
Business continuity, disaster avoidance, disaster recovery, and
data risk management are all terms for the same thing: the planning
and implementation of actions that lessen the risk of data loss,
corruption, or theft from abnormal and highly improbable
destructive events.
For purposes of clarification, expected operational failures are
different from disaster events. Operational failures are those that
are expected to happen in time. For example, all electronic
devices (eg, hard drives, CPUs, floppy disks, or optical drives)
have a mean time to failure (MTTF) rating, which represents the
average time it should take the device to fail based on its
manufacturing data. Note that this measure is not a guaranteed
lifespan, but an average one. Therefore, these devices can be
expected to fail in time.
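To make the point about averages concrete, the sketch below estimates the chance that a device fails within a given period. It assumes a constant failure rate (an exponential lifetime model), which is an assumption of this example and not something a mean time to failure rating guarantees. The function name and the 500,000-hour rating are illustrative.

```python
import math


def failure_probability(hours_in_service: float, mean_time_to_failure: float) -> float:
    """Chance that a device has failed by `hours_in_service`.

    Assumes a constant failure rate (exponential model); real devices
    may wear out faster or slower than this simple model suggests.
    """
    if mean_time_to_failure <= 0:
        raise ValueError("mean_time_to_failure must be positive")
    return 1.0 - math.exp(-hours_in_service / mean_time_to_failure)


# Hypothetical drive rated at 500,000 hours, in service for one year.
HOURS_PER_YEAR = 8_760
p_one_year = failure_probability(HOURS_PER_YEAR, 500_000)
print(f"Chance of failure within one year: {p_one_year:.2%}")

# At exactly the rated mean time to failure, the chance of failure is 1 - 1/e.
p_at_rating = failure_probability(500_000, 500_000)
print(f"Chance of failure by the rated time: {p_at_rating:.0%}")
```

Under this model, a device is more likely than not to have failed by the time it reaches its rated average, which is why the rating should never be read as a guaranteed lifespan.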
These types of failures should not be included in disaster or
security planning. They are operational failures of disposable
devices, and proficient data center operations procedures, such as
solid backup schemes, drive redundancy, and life-cycle management,
should mitigate them. While hard drives are expected to
fail and network routers will lose power, these events are far
different from an entire data center being destroyed by a
tornado.
Protecting assets must also be differentiated from protecting
data. As the health care industry moves further away from
paper-based practice, we need to realize that what is important is
data and the functionality it provides, not the hardware that
stores or processes it.
This is an important distinction, for the cost of guaranteeing
the survival of a physical asset is far higher than the cost of
guaranteeing the survival of data. One could lose all computer and
storage systems for an enterprise, but, if a complete and
up-to-date backup of the information resides somewhere else, then
it can all be recreated.
THE ODDS GAME
All business continuity and security management projects are
nothing more than specific applications of risk management. When
properly done, they are all extensions of the age-old odds
game. Just as a successful, long-time gambler must know the exact
odds of winning a game and its payout, to successfully mitigate
the risks of security- or disaster-related threats we
must understand the exact odds, the costs we will incur if the
negative event takes place, and the responsibility we must assume
in attempting to prevent it.
The alternative approach is blanket coverage for all risks. The
drawback of this approach is that it may miss specific risks or may
become cost prohibitive.
In the past, the challenge was to accurately quantify risks.
During the past 15 years, insurance companies and government
agencies have collected large amounts of relatively accurate data
involving disasters and security breaches. This allows us to very
accurately calculate the probability of a disaster event
happening.
For example, if it is known that, in a radius of 50 miles from a
given facility, two square miles have been directly hit by category
five tornados in the past 10 years, we can calculate the odds of
our building being hit over the next 10 years as follows:
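A minimal sketch of that calculation follows. It assumes that strikes are spread evenly across the circle, that the building's footprint is negligible compared with the area hit, and that the next 10 years will resemble the last 10. The function name is illustrative.

```python
import math


def strike_probability(area_hit_sq_miles: float, radius_miles: float) -> float:
    """Share of a circular region that was directly hit during the
    observation period, used as the odds that any one point is hit
    in a period of the same length."""
    region_area = math.pi * radius_miles ** 2
    return area_hit_sq_miles / region_area


# Two square miles hit within a 50-mile radius over the past 10 years.
p_ten_years = strike_probability(area_hit_sq_miles=2.0, radius_miles=50.0)

print(f"Region area: {math.pi * 50.0 ** 2:,.0f} square miles")
print(f"10-year odds of a direct hit: {p_ten_years:.6f} "
      f"(about 1 in {1 / p_ten_years:,.0f})")
```

With these inputs the region covers roughly 7,854 square miles, so the odds of a direct hit over 10 years come to about 1 in 3,927.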
We can accurately calculate this for most natural disasters
using data collected by various government agencies. For incidents
such as internal flooding or fire, we can rely on insurance company
data regarding our structures and locations.
The most difficult risks to calculate are soft risks. These are
events such as employee sabotage or theft, interfaced system
failures, or information theft. We may not be able to calculate
these risks with the same specificity as the aforementioned, but,
with a bit of diligence, it should be possible to calculate them
with less than 10% error.
During the process of identifying risks and calculating their
odds, one will come upon three categories of events that can be
excluded from further planning. The first, already discussed,
comprises those that fall under normal expected operating failures.
The second comprises those events that, after analysis, prove to be
so highly unlikely that there is no use planning for them (eg, a
hurricane in Nebraska). These events can be removed so that they do
not cloud the planning process.
The final category comprises those events that, regardless of
their odds, are so catastrophic that they would eliminate not only
the data storage facilities, but also the business they support
along with the customer base. Examples are nuclear war or an
asteroid strike.
Remember, the odds of the earth getting hit by a large,
environment-changing asteroid are better than those of winning the
lottery for every dollar played.
REMOVING THE EMOTIONS
One of the greatest pitfalls of disaster or security planning is
the emotions and individual reactions that surround any disaster,
security, or terrorist event. The goal of terrorism is to create an
unrealistic fear in a population. To that end, the September 11,
2001 attacks succeeded fantastically.
While there has been a lot of talk about terrorist attacks in
the past few years, in the past 10, only three major attacks have
made the news (two separate attempts at the World Trade Center and
the Oklahoma City Federal Building). These three attacks have
killed fewer than 3,000 people. Yet a great deal of resources and
energy has been devoted to either mitigating invisible risks or
incorrectly trying to prevent real risks.
For example, all the focus has been on bioterrorism attacks that
are known, such as anthrax. Self-proclaimed experts tell people
that they should buy plastic and duct tape to protect against such
an attack, yet these supplies would be almost completely useless
against a real anthrax attack. The method that has been overlooked
is a simple can of Lysol® spray. We fear a 300 lives per year
event, but do not seem to bat an eye at the 26,000 people who die
per year from the flu.
The reason these examples are raised is to expose the huge cost
to both industry and individuals that is incurred in response to
hype. A numbers-based approach will remove the speculation and hype
surrounding disaster planning and bring the costs in line with the
real risks.
September 11 played on this irrational fear. For example, one
self-proclaimed expert convinced a few Congressmen that a small
plane could be flown into a nuclear power plant and cause a nuclear
explosion. We leave it to the reader to examine the details of this
absurd scenario, especially in light of the government study that
showed that even a Boeing 737 jet flying into a nuclear power plant
could not breach the containment vessel.
CALCULATING THE COSTS
How does one go about calculating the costs of a disaster or
security breach? Again, we are not very concerned with any impacted
computing or storage hardware, but with the data and functionality
it provides. As we move toward a paperless environment and one that
is becoming highly automated, a complete loss of all electronic
information may well result in a company ceasing to do business. We
can then honestly show that such a loss is the value of the
company.
Other models dictate calculating the loss costs based on revenue
or operating profit. We believe it is more accurate to calculate
what it would cost the organization in real dollars based on manual
processes, legal ramifications, re-entry of data, loss of
competitive advantage, customer perceptions, decision-making,
command and control loss, and other standard business practices.
This method requires someone on the planning and analysis team who
has a good understanding of the operating costs of the
business.
Be aware that many times the costs are grossly miscalculated or
underestimated. There have been a number of examples over the past
few years of the costs of losing data centers to flooding and other
disasters that exemplify this point.
THE COST/RISK MODEL
In order to quantify the costs and justified expenditures
involved with security and disaster prevention planning, we propose
using a modified cost/risk model based on actuarial and risk barter
concepts from old financial market practices.
We have previously talked about calculation of odds and the need
to be able to quantify, as accurately as possible, these risks.
What follows is a method to use the risks and the corresponding
costs to come up with a fact-based risk cost.
After one has, as accurately as possible, calculated the odds of
each event occurring utilizing data from, for instance, the
hospital insurance company, the National Oceanic and
Atmospheric Administration, and in-house security personnel, the
next step is to ascertain the costs of damage associated with each
event. In other words, if the identified risk were to occur, how
much damage would it do? It may be helpful to scale large events
into various subgroups. For example, a tornado may be of three
different magnitudes and may also pass near the facility, only
graze it, or hit it completely. These permutations can be used to
create multiple damage estimates.
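As a small illustration of these permutations, the sketch below lists every combination of magnitude and path for the tornado example. The labels are placeholders chosen for this example. In practice each resulting scenario would be given its own odds and its own damage estimate.

```python
from itertools import product

# Placeholder labels: three magnitudes and three ways the storm can
# interact with the facility, as described in the text.
magnitudes = ["magnitude 1", "magnitude 2", "magnitude 3"]
paths = [
    "passes near the facility",
    "grazes the facility",
    "hits the facility completely",
]

# Every magnitude paired with every path gives one damage scenario.
scenarios = [f"{magnitude} tornado {path}" for magnitude, path in product(magnitudes, paths)]

for scenario in scenarios:
    print(scenario)
print(f"{len(scenarios)} scenarios to estimate")
```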
A hurricane provides a good example. If a category five
hurricane hit a data center, it would be a complete loss. Remember,
we are looking at the data loss far more than the physical assets.
So if all the data is lost, we need to look at the impact to the
business. Take the example of a facility with a fully electronic
medical record, billing, document imaging, and PACS. Though basic
functionality may exist if all data was lost, the most reasonable
outcome would be that the business would cease. Therefore, the cost
of this event is the value of the company.
It can be argued that one could subtract out the cost of backups
or the time it would take to rebuild all the systems and restore
them from tape. While this does hold merit, it should not be
included in the weighting or cost factors.
Now that you have the costs of the events, calculating risk
becomes simple:
RISK COST = The Cost of Damage x The Odds of Damage
For example, in the case of the aforementioned hurricane, if the
odds of being hit by the hurricane are 1 in 10,000 (probability
=.0001) and the value of the business is $100 million, then we
have:
.0001 x $100,000,000 = $10,000
Figure 1. Cost/risk model calculation for a tornado hitting a data center. Odds and cost are hypothetical.
The risk cost is $10,000. This is a realistic, mathematically based
amount to spend on the mitigation of this identified risk. Figure 1
illustrates a simple example based on a catastrophic
tornado strike on a data center. The odds and costs are not
accurate and are presented only for the sake of example.
If we take the odds of a tornado of complete destructive force
hitting our data center to be .01, or 1%, over the next 10 years
and we know that the loss of the data center will cost the business
$1.5 billion, we can multiply these to get a risk cost of $15
million. This would closely parallel the costs of insuring data
(which no one will do) over a 10-year period and, it can be argued,
is realistic spending to prevent such a disaster.
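The risk cost formula is simple enough to express as a single function. The sketch below applies it to the two hypothetical examples in the text: the hurricane (odds of 1 in 10,000 against a $100 million business) and the tornado (odds of 1% against a $1.5 billion loss). The function name is illustrative.

```python
def risk_cost(cost_of_damage: float, odds_of_damage: float) -> float:
    """Risk cost = the cost of damage x the odds of damage."""
    if not 0.0 <= odds_of_damage <= 1.0:
        raise ValueError("odds_of_damage must be a probability between 0 and 1")
    return cost_of_damage * odds_of_damage


# Hypothetical figures taken from the examples in the text.
hurricane = risk_cost(cost_of_damage=100_000_000, odds_of_damage=0.0001)
tornado = risk_cost(cost_of_damage=1_500_000_000, odds_of_damage=0.01)

print(f"Hurricane risk cost: ${hurricane:,.0f}")
print(f"Tornado risk cost:   ${tornado:,.0f}")
```

The results are $10,000 and $15 million respectively, matching the worked figures. Both apply to the full time frame over which the odds were calculated.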
What one can do, in order to simplify planning while remaining
objective and quantitative, is to group the disaster/security
events into common cost groupings. For example, one can group all
of the events that would result in the closure of business, those
that may cost $250 million, $100 million, and down to $50,000
events. The number of groupings can be expanded or reduced as
needed.
Figure 2. Disaster/security events are placed in common cost groupings to simplify the planning process.
Figure 2 provides a simplified example of this
method. In the simple example illustrated in Figure 2 (again the
numeric values are presented only for example purposes and are not
accurate), we have grouped our risk events into three categories:
$50 million risk events, $10 million risk events, and $1 million
events. In an actual evaluation, we would use more discrete
categories.
After assigning risk events to their respective damage costs
groups, we then plug in the calculated risk scores or probability
of each event happening. In our example, the $50 million events are
major tornado, data center flood, data center fire, and explosion.
For each of these, we know the probability, or risk, of the event
occurring. These are reflected in the oval to the right of each
risk factor. Since these are unrelated and not compounding events,
we can total them (a close approximation when each probability is
small) to get the risk of a $50 million event happening.
In our example we have a 0.1 probability of having a $50 million
event. Multiplying .1 x $50 million gives us a risk cost of $5
million.
We now can do the same for the other risk categories. We then
have three boxes with $5 million, $2 million, and $300,000 as the
risk costs for each group. Since these are also independent, we can
add them up to arrive at a total risk cost of $7.3 million over the
span of the risk time frame, which in this case is 10 years. Again,
this calculation should represent what it would cost to insure the
data, and therefore a reasonable amount to spend to guarantee
survival of the data.
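The grouped calculation can be sketched as follows. Only the 0.1 total for the $50 million group comes from the example. The split of that total across the four events is hypothetical. The probabilities for the $10 million and $1 million groups (0.2 and 0.3) are assumed values, chosen so that the group risk costs add up to the $7.3 million total quoted above.

```python
# Hypothetical per-event probabilities for the $50 million group.
# Only their sum (0.1) is taken from the example in the text.
fifty_million_events = {
    "major tornado": 0.04,
    "data center flood": 0.03,
    "data center fire": 0.02,
    "explosion": 0.01,
}

# Damage cost of each group -> probability of an event in that group.
# Adding probabilities within a group is a close approximation when
# the events are unlikely to coincide.
group_probabilities = {
    50_000_000: sum(fifty_million_events.values()),
    10_000_000: 0.2,  # assumed value
    1_000_000: 0.3,   # assumed value
}

group_risk_costs = {cost: cost * p for cost, p in group_probabilities.items()}
total_risk_cost = sum(group_risk_costs.values())

for cost, group_cost in group_risk_costs.items():
    print(f"${cost:,} events: risk cost ${group_cost:,.0f}")
print(f"Total risk cost over the time frame: ${total_risk_cost:,.0f}")
```

Adding or removing groups only means adding or removing entries, which is what lets the model be made as simple or as detailed as the organization needs.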
This model is advantageous in that it is scalable to any
organization. It can be made as complex or simple as needed. And
finally, it takes out the ambiguity involved with preventing
loss.
SECURITY CONCERNS
In the planning process, much attention has been given to
preventing malicious data theft, so we will bypass that area and
focus on some of the less visible risks.
Inherent in most DICOM images is the header data. It includes
many patient identifiers, institutional information, and other bits
of data that could harm a patient and expose a facility to
litigation if they fell into the public domain.
For this reason, close scrutiny must be placed on the procedures
for the transfer, for either diagnostic or research purposes, of
DICOM images.
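One piece of such a procedure is removing identifying header fields before images leave the facility. The sketch below is a simplified illustration only. It uses a plain dictionary as a stand-in for a parsed header, not a real DICOM library. The list of fields is a small sample of standard DICOM attribute keywords, not a complete de-identification profile. The sample values are invented.

```python
# A small, incomplete sample of DICOM attribute keywords that identify
# the patient or the institution. A real procedure would follow a full
# de-identification profile and local policy.
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "InstitutionName",
    "ReferringPhysicianName",
    "AccessionNumber",
}


def strip_identifiers(header: dict[str, str]) -> dict[str, str]:
    """Return a copy of the header without the identifying fields."""
    return {
        keyword: value
        for keyword, value in header.items()
        if keyword not in IDENTIFYING_KEYWORDS
    }


# Invented header values, standing in for a parsed image header.
sample_header = {
    "PatientName": "DOE^JANE",
    "PatientID": "0000000",
    "InstitutionName": "Example Hospital",
    "Modality": "CT",
    "Rows": "512",
    "Columns": "512",
}

cleaned = strip_identifiers(sample_header)
print(cleaned)
```

The original header is left untouched, so the diagnostic copy inside the facility keeps its identifiers while the transferred copy does not.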
As we move to a setup where we attempt to display radiology data
on nonradiologist workstations across the enterprise, we must deal
with the risks of incorrect access and distribution of sensitive
information.
Hospital and clinic security is somewhat open due to the nature
of our business. Can you picture it if we asked for two forms of ID
and ran a background check before we admitted someone into the
emergency department? Our products and services are focused on
fixing organic machines. To this end, we create an environment that
fosters many people visiting and entering our facilities with very
little in the way of security.
Yet we have started to move toward securing vulnerable patient
areas. Pediatric monitoring is an example, although these systems
are focused more on preventing the unauthorized removal of a
patient or the entry of a malicious person. Still, it is an
excellent example of how we can leverage technology to enhance
security.
Though there is not yet an accepted trust by the general
populace in electronic media, it is far easier to break into a
mailbox and steal someone's paper mail than it is to crack into the
standard email system. Down this line, we can, with correct design
and configuration, leverage our PACS and document storage systems
as enhanced security tools. Both lessen the number of films and
documents that exist both inside and outside of the health care
system. It is arguably better to have the films or documents locked
away electronically, where only those with a need to know can
access them, than to pass the information through multiple film
clerks or medical records clerks.
Finally, as with any optical or tape storage system, we must
work to secure the areas that house the data archives. It is not a
pretty sight when a primary system has the highest security, yet
someone walks off with half of the optical storage platters. To
this end, optical and tape storage systems must be physically
protected from all employees except those involved in supporting
systems.
SPECIFIC PACS CHALLENGES
Digital image storage systems for both medical diagnostic images
and documents have their own inherent challenges on top of those mentioned.
The primary challenge revolves around the costs of these
systems. Per user, they are usually the most expensive in health
care. Redundancy in access is very difficult and costly due to
unique configurations (banks of eight monitors, for example, in a
reading room).
Frequently, sets of images are stored so that order and indexing
are very important. Of all the data stored in health care, medical
diagnostic images represent the largest per study. This is due to
the required resolution and the tendency not to compress data, even
if lossless methods are available. Furthermore, we have challenges
where not only do we need to store the source data but also any
further diagnostic studies created by enhancement or rendering
software. Many times, this data is extremely important for
historical and legal reasons.
Vendors have not been forthcoming in moving users toward a
redundant platform, or helping to reduce the costs of storage.
Two bright prospects are now on the horizon. The first is large
storage area networks (SANs). With most PACS implementations using
well over 2 or 3 TB of data, until recently, it was cost
prohibitive to store all of the image data on magnetic disks. With
the fall in prices of disk drives and the evolving disk chaining
technology, it is now possible to use magnetic disk to store all
images, both historical and current, and utilize the optical
subsystems as remote backup silos. Though this is not optimal and
creates only two interdependent data sets, it is far less costly
than the present option of creating two entire sets of optical
archives.
The other prospect is long-term streaming storage, which has
undergone recent advances. These systems copy all data committed to
disk, either optical or magnetic, over a fast network connection to
super servers in remote locations. There, the backup system stores
the critical information on disk, while constantly writing data to
magnetic tape. Furthermore, the system manages the tapes and backup
sets and can create multiple copies.
Most PACS today run on standard hardware and operating systems.
Even most modalities are available for shipping in a very short
period of time. This, again, relates back to the preservation of
data as opposed to hardware.
One final risk, in contrast to the previous paragraph, revolves
around specialized systems, made with older or less than
market-dominating hardware. It is wise to assess not only the
imaging vendor's health, but also the health of the companies that
provide the components.
CONCLUSIONS
Disaster avoidance and security planning in health care have
many unique challenges due to our ethics of caring for all, along
with our increasing dependence on data-based systems to preserve
human lives. It is our hope that by focusing on the facts
surrounding the risks, organizations can provide a sound data
preservation environment, while keeping costs aligned with the
identified risks.
Sean D'Arcy is senior consultant for HealthLink, Houston, TX, a
health care information technology consulting company; Sean.Darcy@healthlinkinc.com.
Julie D'Arcy, PhD, is a technical writer.