2005 Anonimo Linking Health Care Information
LINKING HEALTHCARE INFORMATION: PROPOSED
METHODS FOR IMPROVING
CARE AND PROTECTING
PRIVACY
Working Group on Accurately Linking Information
for Health Care Quality and Safety
February 2005
Members of the Working Group on Accurately Linking
Information for Healthcare Quality and Safety, Connecting for
Health
Introduction
Background
Working Process
The Problem of Linking
Architectural Principles
Architectural Overview
Sharing Appropriate Records
Example: Priscilla Switches Doctors
Privacy Enhancing Technology Built into the Architecture
Network of Networks
System-Wide Concerns
Security
Conclusion
Appendix: MPI Survey Summary Report
Introduction
This document outlines a strategy for linking patient information across multiple
sites of care, developed by the Working Group on Accurately Linking Information
for Healthcare Quality and Safety, a part of the Connecting for Health effort
sponsored by the Markle Foundation and the Robert Wood Johnson Foundation.
The linking problem is simple to describe but hard to solve: how does a
healthcare professional link a patient with their health files, and how do they
know that any two files stored in different places refer to the same person? This
problem occurs every time a care provider asks to have a patient's file pulled or
updated, and every time a patient moves or changes doctors, visits a new lab or
specialist, or falls ill while traveling. At its core the linking problem is one of
identity -- how can we say for sure that a patient in the office is to be matched
with a particular set of records, or that two sets of records can be merged
because they belong to the same patient?
The goal of the Linking Working Group was to address these issues, proposing
practical strategies for improving healthcare through improved linking of
information in a secure and efficient manner, and in a way that allows healthcare
professionals much improved access to needed information while respecting
patients' privacy rights. Additionally, we assumed that our proposals would be
implemented in a five-year time frame, with the additional assumption that any
test bed or pilot project implementations would therefore have to be ready in
between one and three years, depending on the complexity of the problems to be
worked on. We thus focused on techniques for record linking already in use in
other areas, rather than on the design of entirely new methods.
Solving the linking problem is only part of the effort needed to improve the
healthcare system's use of information technology (IT), of course. There is
considerable work to be done on the format and use of electronic health records;
on the use of available data to improve both medical research and public health;
on the economic models around sustainable deployment and upkeep of these
new technologies; and many other issues. Connecting for Health has addressed
the broad spectrum of these issues in its “Roadmap” report: Achieving
Electronic Connectivity in Healthcare: A Preliminary Roadmap from the
Nation's Public and Private-Sector Healthcare Leaders, describing in
overview a broad vision for improving healthcare through the use of IT. In
addition, two Working Groups operating in parallel to the Linking Working Group
issued reports on sharing electronic information with patients (Connecting
Americans to Their Healthcare) and on the business and organizational issues
of community-based information exchange (Financial, Legal and
Organizational Approaches to Achieving Electronic Connectivity in
Healthcare), respectively. These reports are available at
http://www.connectingforhealth.org/, as is the response Connecting for Health
prepared in collaboration with twelve other influential groups to the federal
government’s RFI on the “National Health Information Network.”
The work of the Linking Working Group is meant to address a set of problems
that touches almost everyone in the US healthcare system, from individual
clinicians to large insurance firms; from local clinics to national hospital chains;
from neighborhood pharmacies to state and national public health departments;
and so on. Because of this breadth, it has been difficult to find one term that
adequately reflects the diversity in size and mission of all the different
participants. We have settled on the generic phrases “institutions and providers”
or, alternatively, “entities” or “organizations” when we mean all participants in the
healthcare system, regardless of size, mission, or sector (profit, non-profit,
government). As noted in the report below, we also include patients in the list of
authorized entities, as we believe the system as proposed will greatly improve
their access to their own health information.
In our work, we focused on the problem of linking patient information where the
information is widely distributed, and on some of the architectural requirements
for supporting that linking in a way that would allow authorized entities to access
patient records remotely and securely. Though solving the linking problem would
not be a panacea, it would represent significant progress on an issue that is both
important in and of itself, and a necessary precursor to tackling other, more
complex issues.
Current solutions to the linking problem tend to be ad hoc, paper based, local,
and ineffective. Though every institution in the healthcare system from sole
practitioners to giant hospital chains faces the linking problem, there is no
standard solution, and for many sites of care, paper records are still the norm.
Paper records have the advantage of tangibility, making it possible to aggregate
individual files easily within a single institution, but they are hard to search and
hard to share.
As a result, the only files on a patient that can be easily called up are those held
locally. Healthcare personnel are thus often forced to work with a very partial
subset of the available information on a patient in their care, and frequently end
up re-running tests because earlier results are unavailable. At best this creates
enormous waste and additional expense. (One of the participants in the Linking
Working Group reports that an audit of expenses across systems in
Massachusetts found that 15% of expenses were in running duplicate tests
because the earlier results were unobtainable.) At worst, it delays critical diagnosis
or exposes patients to invasive procedures unnecessarily.
Background
While the benefits of improved exchange of healthcare information are well
known, more efforts to achieve it have failed than succeeded. Problems have
included concerns about information ownership, privacy (particularly on a
national scale), lack of trust among the participants, the lack of electronic
systems in providers' organizations, and the lack of standards that are effective
beyond the scope of a single organization.
The Linking Working Group's proposal balances the need to protect privacy with
improved discovery and delivery of patients' medical records when they are
needed, where they are needed, and only by authorized individuals who need
them. The ability to locate patient records and deliver them securely will enable a
number of improvements in healthcare, including especially:
A Decentralized Approach
In approaching this problem we have tried to learn from earlier efforts, but we are
also optimistic that the present opportunity offers us a significant advantage
unavailable to previous work. Past attempts to create new infrastructure at
national scale forced all-or-nothing choices. Often this was because the only
models we had for such systems were highly centralized and controlled by the
government, e.g. the FAA flight control system or the IRS database.
In the last 5 years, however, we have seen the growth of large-scale but
decentralized architectures, everything from AOL's instant messaging system,
used by millions daily, to collaborative tools like Groove, approved for secure
field use by the Department of Defense. These decentralized architectures mix a
high degree of local autonomy with enough global coordination to ensure the
functioning of the system. We believe that the flexibility of decentralized
architectures offers a way out of the all-or-nothing deadlock. Though much of the
required architecture is out of scope for the narrower question of linking patient
records, we believe that the architectural characteristics of our proposed
approach for linking information are compatible with the needs of a larger health IT
system.
The decentralized approach also leaves clinical information in the hands of the
clinicians and institutions that have a direct relationship with the patient, rather
than moving or replicating it to giant central servers. This approach maximizes
the value of incremental development, as the information is already where it
needs to be for the system to work. It greatly reduces the risk of misuse, by
ensuring that there is no single "bucket" holding clinical information. This
decentralization also leaves judgments about who should and should not see
patient information in the hands of the patient and the clinicians and institutions
that are directly responsible for the patient's care.
Both a Big Bang scenario and an incremental one will require significant investment over
the next decade, as the healthcare system shifts to more automated ways of
delivering information relevant to care. However, any Big Bang scenario would
require the standardization of record format, storage, access and transport at
hundreds of thousands of sites throughout the US prior to the launch of the
system.
This would delay by years the value that can be gained from simpler, more
partial upgrades. So long as there is a clear upgrade path and well-defined
standards on each of those fronts, even partial improvements in record linking
will yield steady gains.
Creating the infrastructure for improved linking of records requires the
deployment of some additional hardware and software, most of it for the envisioned
Record Locator Service and Certification Authorities (detailed below). The clinical
records themselves will remain in the hands of the organizations responsible for
them. Thus, as with the growth of the fax network or the Internet, the bulk of the
IT implementation can be undertaken locally, one institution at a time, and in
response to their own needs, budgets, and timelines.
Our work is not complete, having gotten only to the stage where it is good
enough to criticize. We continue to work on it within the framework of Connecting
for Health, and to present it to knowledgeable people in the health and IT
industries, and it will undergo considerable additions and modifications during
those conversations. Even early trials of a proposed Record Locator Service
(described below) and attendant standards and practices will alter the problems it
sets out to solve. As a result, this recommendation is a set of principles, goals,
and proposed early tests, but will require constant monitoring and course
correction to become effective in any large-scale system.
Working Process
Our goal from the outset was to define ways in which the US healthcare system
could be significantly improved over the next five years, through an increased
ability to match patients with their existing health records, and through the timely
delivery of those records to sites of care.
The Linking Working Group began our process with an articulation of principles
that we have used to guide our work. As always with such principles, there is no
guarantee that they will not clash, and indeed, there are several such clashes
present here. There is, for example, a tension between technological
improvement and backwards compatibility. The most backwards-compatible
system possible would change nothing, whereas the most radical set of
improvements imaginable would require an immediate wholesale upgrade, both
distinctly impractical options.
Nevertheless, given the opacity and complexity of some of the issues we are
tackling, we have found these principles useful as guiding lights.
Any proposed solution must support the accurate, timely, and secure
handling and sharing of patient records. It must increase the quality of care
and the economic sustainability of the healthcare system while preserving the
privacy of patient information. And it must create value for many different
kinds of participants, from private, non-profit and government institutions to
the individual healthcare professionals and patients who use it.
While that risks sounding like ‘motherhood and apple pie,’ it actually contains
several important mutual constraints. We cannot simply trade patient privacy for
increased efficiency, for example, or saddle individual providers with unfunded
mandates as a way of deploying new tools or technologies.
Privacy
Preserving privacy is important to ensuring acceptance of the system and its
benefits. Trust is a crucial component of the doctor-patient relationship, including
those elements of the relationship that involve the disclosure and sharing of
sensitive information. Privacy is an important factor contributing to that trust.
Privacy advocates have long agreed that patients should be informed by
providers of the benefits of linking records. However, even well-informed patients
are reluctant to share information because of privacy concerns. A 1999 survey by
the California HealthCare Foundation showed that even when people understood
the huge health advantages that could result from linking their health records, a
majority believed that the risks of lost privacy and discrimination outweighed the
benefits.
The architecture proposed to support the linking of health records has been
designed to eliminate the two largest perceived privacy threats associated with
the linking of health records: centralization and the use of a unique national
health ID. Our approach leaves records with the healthcare providers who
created them and uses a person’s ordinary name and common identifiers such
as date of birth and address to link those records. The only thing centralized is a
directory of providers holding patient records and pointers to those files.
Availability of information
Privacy is only an issue because clinicians must share patient information to do
their jobs. Knowledge of existing medical conditions, drug lists, allergies, and
other kinds of information about the patient can mean the difference between
good and mediocre care and, in extreme cases, between life and death.
And yet, in the US system as it exists today, the main locus for relevant
information is not the doctors or labs who have previously seen the patient, but
the patient herself. Patients are often asked to remember details about their
medical histories, current problems, prescriptions, and allergies, a task they often
fail to fulfill. In addition, a provider seeing a patient who has had a test run
elsewhere is likelier to choose to have it run again, or to proceed without the
results, than to undertake the often futile effort to retrieve the results in a timely
manner. (In Massachusetts alone, for example, 15% of medical expenditures
have been attributed to redundant testing, costing $4.5 billion per year.)
Local control of records
Under the system we propose, decisions about linking and sharing are made by
the participating institutions and providers at the edges of the network. The
system supports (1) linking of records via a directory of pointers and sharing
among healthcare providers participating in the system, but it also allows (2)
linking without sharing or sharing pursuant only to higher authorization as well as
(3) treatment situations that do not result in linking, such as drug or alcohol
rehabilitation.
There are a number of anecdotes and studies that make a promising case for
why it is important for patients to have their own medical records. During the
course of our work, one of our Working Group members who runs a system
within CareGroup that enables patients to access their records reported a patient
catching an incorrect diagnosis of a growing tumor. The patient was able to do
this because she realized that the "growth" of the tumor was an artifact of an
earlier and incorrect recording of its size. The patient's knowledge of her own
medical record saved her an invasive and unnecessary surgical procedure and
potentially harmful chemotherapeutic intervention. (See the story of Jerilyn
Heinold in Achieving Electronic Connectivity in Healthcare: A Preliminary
Roadmap from the Nation's Public and Private-Sector Healthcare Leaders,
at www.connectingforhealth.org.)
There are two types of data a patient can and should be given access to -- the
audit trail, detailing who looked up the location of their records and when; and the
location of their clinical information itself.
There is obvious appeal in having a simple electronic portal allowing for direct
patient access. However, the inability to authenticate users securely often
prevents implementation of this idea. Current methods used for electronic
commerce, for example, use credit cards as proof of authorization, an obvious
impossibility here, both because it would lock out patients that cannot or don't
want to use their credit cards in this way, and because credit card companies
often deal with fraud or identity theft after the fact, an unacceptable option where
health records are involved. From studying systems that offer patients access to
their medical records, we found that most often first use is authenticated by their
provider. Because there is no way to positively authenticate patients remotely
during their first use of the system, any patient access must first be authorized,
whether in-person or by signature (physical or electronic) by an authorized
institutional user.
Once these access credentials are provided, our proposed Record Locator
Service will offer a patient remote access (in practice, secure login from a Web
browser) to the audit record held in the Record Locator Service, and will provide
the same contact and retrieval information to his or her records that the
institutions and providers receive. (The patient, of course, will not be able to see
any other records but his or her own.)
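The patient-facing behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a specification: the class name `RecordLocatorService`, its methods, and all field names are hypothetical, and authentication is assumed to have happened before any call. The sketch shows the two things a patient would see: where their records are held, and an audit trail of who looked up those locations and when. No clinical data is stored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RecordLocation:
    """Pointer to where records are held; it carries no clinical data."""
    organization: str  # hypothetical organization name or identifier
    contact: str       # how to request the records from that organization


@dataclass(frozen=True)
class AuditEntry:
    requester: str
    looked_up_at: datetime


@dataclass
class RecordLocatorService:
    """Hypothetical in-memory stand-in for the proposed service."""
    _locations: dict = field(default_factory=dict)
    _audit: dict = field(default_factory=dict)

    def register(self, patient_key: str, location: RecordLocation) -> None:
        self._locations.setdefault(patient_key, []).append(location)

    def lookup(self, requester: str, patient_key: str) -> list:
        # Every lookup is logged, whoever the requester is.
        entry = AuditEntry(requester, datetime.now(timezone.utc))
        self._audit.setdefault(patient_key, []).append(entry)
        return list(self._locations.get(patient_key, []))

    def audit_trail(self, patient_key: str) -> list:
        return list(self._audit.get(patient_key, []))


rls = RecordLocatorService()
rls.register("patient-001", RecordLocation("Example Clinic", "records office"))
rls.register("patient-001", RecordLocation("Example Lab", "results desk"))

locations = rls.lookup("dr-jones", "patient-001")
trail = rls.audit_trail("patient-001")
```

Here `locations` holds only pointers; the patient would still request the records themselves from each listed organization.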
Once in possession of the location of their records, the patient will still have to
request them directly from those institutions and providers, but as that function is
HIPAA-mandated and will become increasingly popular, the need to deliver such
records at low cost will, we believe, be an additional driver for automation.
As a result, we have consistently looked for solutions that could be rolled out for
testing as pilot projects, and where partial implementation would produce some
value. This mandate for marked improvement in five years has also led us to be
suspicious of solutions that require Big Bang development, where many pieces of
the system are upgraded all at once.
a patient with their information as the conceptual core, our proposal has four
layers:
The Problem of Linking
Any attempt to improve healthcare IT must solve this problem, since giving
healthcare professionals access to information about patients in their care is a
core function. Furthermore, this is not just a problem of linking records between
different institutions and providers. Operators of large healthcare databases
recognize that individual patients are listed more than once, both within a single
database and across different databases within the same institution.
Our recommendation for linking patient records is:
1. The system should not require the existence of a national unique health
identifier
2. The system should be designed to create the potential advantages of a
national unique health identifier without requiring top-down issuance
3. The system should use probabilistic algorithmic matching of commonly
available identifiers to link records
These recommendations are really three parts of the same idea -- design a
system for linking authorized patient records using existing demographics and
identifiers, rather than waiting for the deployment of a new health identifier, but
without foreclosing the ability to take advantage of new identifiers should they
arise.
comparing the numbers. If they match, the records refer to the same patient. If
they don't match, they refer to different patients.
Political resistance to any form of national identifier has always run high in
the United States, and earlier attempts to discuss the creation and
maintenance of such an identifier by the federal government (as no other
body would be able to do so) have always been shelved.
Because there are so few cases where proposed national identifiers have
ever come close to practical implementation, it is difficult to use past
examples when trying to predict the result of any given effort. What we
can predict, even in a climate driving increased government inspection of
personal data for national security, is that any proposal for a national
health identifier will generate violent opposition. The record of success for
linking information in the face of such opposition, even for efforts that don't
require a common identifier, such as the Computer Assisted Passenger
Prescreening System (CAPPS II), is extremely poor.
Past history and current resistance to government-managed identifiers
suggest a high chance of outright failure in any attempt to create a
universal identifier for healthcare.
Similarly, much rule making on healthcare is done at the state level, and
federal rules that pre-empt state authority are therefore quite contentious.
Because of the single-issuer nature of any federally-run health identifier
system, it is unlikely that partial or trial implementations will be allowed
before these issues and others are settled, making progress on this issue
vulnerable to long and possibly paralyzing debate on a variety of topics
from immigration policy and the nature of citizenship to the relationship of
the federal and state governments.
number. Discussions of a national health identifier system often focus on
how it will work when it is finished, but the task of building such a system
requires solving difficult sub-problems as well as implementing significant
updates across a poorly coordinated system. In any system whose goal is
improvement in a five-year timeframe, the technical challenges alone risk
pushing such a system out of the realm of practicality.
The ASTM Report puts it starkly: "To gain the benefits from such an
identifier, it must be used by all relevant organizations." In practice, this
means it must be deployed to a significant portion of the clinics, labs,
insurance agencies, hospitals and other participants in the US healthcare
system before it begins to create any great value. Given that it will
necessarily be capitalized by these individual organizations (no one
organization, not even the government, could underwrite the necessary
technology upgrade), there will be significant inertia to overcome, as
everyone will want to postpone implementing a system that will only be
really valuable once everyone else has implemented it as well.
These issues have led us to the following conclusion: Any effort to produce a
health identifier will require significant effort and investment; will suffer from a
high risk of failure; and will not produce partial improvements when partially
implemented. In addition, there will be a persistent requirement for a system that
can link user records without recourse to such an ID, even if such an ID were
deployed. This combination of high cost simply to secure political agreement,
long lead time and enormous expense to get such agreement, and the
uncertainty that such agreement could ever be secured, makes us skeptical that
work on a national health identifier is the best use of time and resources
dedicated to improving healthcare through the use of IT.
Therefore, we believe that the effort and expense of trying to make a national health
identifier a reality would be better spent on improving the current systems that
link patient records using existing identifiers. Furthermore, should a national
health identifier or indeed any broadly accepted identifiers come into being, they
can be used as additional sources of likelihood of match. No system will ever rely
on a single identifier, as some secondary set of information will be needed to
resolve ambiguous matches, and any data that can be used for such
disambiguation can thus be integrated into the system we propose. Armed with
this conclusion, we then set about examining the alternatives.
This type of identifier could provide the most critical four of the six possible
characteristics of the health identifier, with caveats. Such an identifier would be:
Uniqueness and permanence can be provided so long as HIPAA-mandated
identifiers of healthcare organizations and providers are themselves unique, and
so long as an institution does not re-use in-house patient identifiers.
(A "No re-use of identifiers" policy is widely regarded as best practice in database
management, but is not universally adopted at present.)
Ubiquity is definitional – since being listed in the system requires the presence of
such an identifier, the identifier will be (tautologically) ubiquitous. The larger
challenge is to extend the system to the broadest possible adoption in the
shortest time. Another requirement imposed by this approach is a unique
identifier for the healthcare organization and provider, which HIPAA has
mandated by 2007. (Pilot projects and other early work will require the issuing of
temporary versions of such identifiers.)
A comparison with e-mail is instructive here – there are many John Smiths, but
only one [email protected]. IBM.com is a globally unique entity, and is
responsible for the local uniqueness of email addresses in its domain. Likewise,
John Smith might also have [email protected], also a valid and globally unique
email address. In such a system, a person will have several such pointers to
medical records, as they do today for their records that exist in multiple places,
and there is no guarantee that any one identifier will point to a patient for life.
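The e-mail analogy can be made concrete with a small sketch. It assumes, as an illustration only, that a pointer is the pair of a globally unique organization identifier and an identifier that is unique only inside that organization; the names `RecordPointer`, `organization_id`, and `local_patient_id` are hypothetical and do not come from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordPointer:
    """Globally unique because the organization part is globally unique."""
    organization_id: str   # assumed unique across the whole system
    local_patient_id: str  # unique only within that organization

    def __str__(self) -> str:
        # Rendered like an e-mail address: local part, then issuing domain.
        return f"{self.local_patient_id}@{self.organization_id}"


# The same local number issued by two organizations yields two distinct pointers.
at_clinic = RecordPointer("ORG-0001", "12345")
at_lab = RecordPointer("ORG-0002", "12345")

# One person may hold several pointers, none of them canonical or permanent.
pointers_for_one_person = {at_clinic, at_lab}
```

Neither organization needs to coordinate with the other when issuing local identifiers, which is the property that removes the need for a single issuer.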
Thus the two characteristics of a health identifier this system forgoes are
canonicalness and invariability. Though these would be desirable characteristics
if they could be obtained at little implementation cost, they are not requirements
for successful identity systems (as the email example shows), and they are the
two characteristics that create the greatest difficulty in implementation, and
create the requirement for a single issuer of identity (in practice, the federal
government, whether directly or by proxy). We believe that a system without
canonicalness or invariability is better suited to incremental creation of value and
to shared participation among a large number of otherwise uncoordinated actors.
your office today). This problem occurs today, whenever two institutions or
providers need to share information on a patient -- primary care physician and
specialist, clinic and lab, hospital and HMO. Our proposed solution for handling
these cases is probabilistic matching of the patient, using existing patient
identifiers.
Such a system can operate with a single cutoff for a match (e.g. "Treat these two
records as belonging to the same patient if first name, gender, date of birth and
SSN all match"), but the system can be further improved by weighing the
probability that similar information in different records indicates that the records
belong to the same patient.
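The single-cutoff rule quoted above can be written as a short sketch. The record layout, the field names, and the normalization step (trimming whitespace and ignoring case) are assumptions made for illustration; the sample values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    first_name: str
    gender: str
    date_of_birth: str  # assumed ISO format, e.g. "1969-03-09"
    ssn: str


MATCH_FIELDS = ("first_name", "gender", "date_of_birth", "ssn")


def normalize(value: str) -> str:
    # Assumed normalization: ignore case and surrounding whitespace.
    return value.strip().lower()


def is_single_cutoff_match(a: PatientRecord, b: PatientRecord) -> bool:
    """Treat two records as the same patient only if every field agrees."""
    return all(
        normalize(getattr(a, name)) == normalize(getattr(b, name))
        for name in MATCH_FIELDS
    )


clinic = PatientRecord("Susan", "F", "1969-03-09", "000-00-0001")
lab = PatientRecord(" susan", "f", "1969-03-09", "000-00-0001")
other = PatientRecord("Susan", "F", "1969-03-09", "000-00-0002")

same_patient = is_single_cutoff_match(clinic, lab)
different_patient = is_single_cutoff_match(clinic, other)
```

A rule of this kind is all-or-nothing: a single typo in any field turns a true match into a miss, which is the limitation that probability weighting is meant to address.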
Record pairs can be graphed by 'potential match' score (from low to high likelihood
of a match) against record pair frequency (the number of record pairs that have a
particular score). Such a graph will have this rough distribution, where the area under the
dotted line contains record pairs that do not refer to the same individual, while the
smaller area under the red line contains record pairs that do:
The most common category by far is obvious non-matches, in the shaded area
on the left. These are low scoring record pairs that do not refer to the same
person. A record for Susan Smith, DOB 3/9/1969 is not to be linked to a record
for Anthony Moon, DOB 4/5/1997. The most important category is high-scoring
pairs where both records actually refer to the same person, the area shaded here
on the right. Improving the ability to find and make such records accessible is a
core goal of improving linking.
The greatest challenge lies in the middle region, which contains the records
whose scores are high enough to be possible matches, but not so high as to be
obvious matches. In an ideal world, we would be able to completely separate
non-matches from true matches, but because of the overlap, the middle zone
contains two additional categories. Record pairs that have a high match score
but do not actually refer to the same person are false positives, while record pairs
that have a low score but do refer to the same person are false negatives.
Thus the interaction between record pairs that have a high match score and
those that actually refer to the same person creates four conceptual categories:
Despite the similarities, the outcomes of false negatives vs. false positives in a
clinical setting are radically different. The US healthcare system currently
functions under the assumption of prevalent false negatives -- caregivers are
accustomed to operating with incomplete patient information. False negatives,
while undesirable, are normal and, in any system that protects patient privacy,
also inevitable. No system that allows patients to opt to keep certain records out
of view can also guarantee that caregivers have a complete clinical record. Thus
a goal of improved record linkage is to greatly enhance access to relevant
information, without ever pretending to guarantee 100% coverage of all of a
patient's records.
False positives, on the other hand, can be catastrophic, as they can lead a
caregiver to wrongly believe they have information that may have life or death
consequences. A doctor given incorrect medication or allergy lists, for example,
may prescribe an inappropriate drug, with significant negative consequences.
False positives also harm privacy, because patient records are inappropriately
disclosed when they are incorrectly combined with the records of patients with
similar names. Thus the first critical design step in pulling records on a
particular patient is to set a very
high threshold for matching data, in order to optimize the system against false
positives. This will of necessity raise the number of false negatives, but this is a
distinctly less bad outcome than allowing false positives and false negatives to
appear at similar rates.
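As an illustration only, the effect of raising the threshold can be sketched in a few lines of Python. The scores, the true/false labels, and the two thresholds are invented for the example and are not drawn from any real patient population.

```python
# Hypothetical scored record pairs: (match_score, truly_same_person).
# All values are invented for illustration.
PAIRS = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True),
    (0.88, False), (0.85, True), (0.80, False), (0.70, True),
    (0.60, False), (0.30, False), (0.10, False),
]

def count_outcomes(pairs, threshold):
    """Count the four outcome categories at a given match threshold."""
    counts = {"true_match": 0, "false_positive": 0,
              "false_negative": 0, "true_non_match": 0}
    for score, same_person in pairs:
        linked = score >= threshold
        if linked and same_person:
            counts["true_match"] += 1
        elif linked:
            counts["false_positive"] += 1
        elif same_person:
            counts["false_negative"] += 1
        else:
            counts["true_non_match"] += 1
    return counts

low = count_outcomes(PAIRS, 0.75)   # permissive threshold
high = count_outcomes(PAIRS, 0.89)  # strict threshold
print(low)
print(high)
```

In this toy data the stricter threshold removes both false positives at the cost of one additional false negative, which is the trade the text argues for.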
Once this high threshold for a presumed match has been created, the second,
critical step is to use the available identifying characteristics to remove the
remaining false positives, leaving only true matches. This is the function of
probability-weighted matching algorithms.
out as many true matches as possible, while producing few false positives,
ideally zero.
To do this, the corresponding fields of every linked record pair are compared, to
see how likely it is that a particular field, such as last name, matches in similar
vs. dissimilar records. The important calculation is the ratio of how often a field
agrees in true matches to how often it agrees in non-matching pairs. Some initial predictor
for asserting a true match needs to be produced, but simply acts as a stake in
the ground for further refinement of the measurements. For the purposes of the
discussion below, we use data drawn from the 2002 paper Analysis of Identifier
Performance using a Deterministic Linkage Algorithm, by Shaun J. Grannis MD,
J. Marc Overhage MD PhD, and Clement J. McDonald MD. (Marc Overhage was
a member of our Working Group, and Shaun Grannis provided comment on a
draft version of this document.)
In this patient population, last name was the same in 93.5% of true matches (the
last 6.5% being accounted for by last name change, data entry error, etc.) In the
same population, where SSN matched, last name was the same in 21.6% of non-
matching pairs. Thus, a matched last name is 4.3 times more likely to occur in
true matches than in non-matches. (The unusually high rate of matching last
names among non-matches is an artifact of using SSN as the predictor of a link, which tends to
match among people with the same last name. Multi-variate predictors of
matches will have fewer artifacts of this sort.)
You could perform this calculation for every possible identifying field. Gender, for
instance, has a better chance of being correctly recorded and unchanging than last
name does, being accurate in roughly 97% of cases. However, gender also
overlaps by chance in roughly 50% of cases. Thus, though gender is usually
more accurately recorded than last name, it is only a little less than twice as likely
to be the same in true matches as in non-matches.
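The arithmetic behind these two figures can be checked with a short Python sketch. The function name is hypothetical; the percentages are the ones quoted above.

```python
def agreement_ratio(p_agree_in_true_matches, p_agree_in_non_matches):
    """How many times more likely a field is to agree in true matches
    than in non-matches."""
    return p_agree_in_true_matches / p_agree_in_non_matches

# Figures quoted in the text above.
last_name_ratio = agreement_ratio(0.935, 0.216)
gender_ratio = agreement_ratio(0.97, 0.50)

print(round(last_name_ratio, 1))  # about 4.3
print(round(gender_ratio, 2))     # a little less than 2
```

A field that is recorded very accurately can still be a weak predictor if it agrees often by chance, which is why gender scores lower than last name here.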
First names are more complex. There are more first names than last names in
the US population, making them better predictors of true matches. Variable
spelling (Marcia, Marsha) and the acceptance of nicknames as synonyms
(William, Bill) complicate the match prediction problem. The use of name-
similarity algorithms such as Soundex, Metaphone, and the New York State
Identification and Intelligence System (NYSIIS) algorithm can greatly increase
the predictive value of a first-name match. Matching NYSIIS-transformed first
names were found to be present in 89% of true matches and only 1.4% of non-
matches, or 63.5 times more likely to occur among true matches than non-
matches.
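To make the idea of a sound-alike transformation concrete, here is a minimal sketch of American Soundex in Python. Soundex is used only because it is the simplest of the three schemes named above; the 89% and 1.4% figures were measured with NYSIIS, which this sketch does not implement.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate same-coded consonants
        code = CODES.get(ch, "")  # vowels and Y carry no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]

print(soundex("Marcia"), soundex("Marsha"))  # same code for both spellings
print(soundex("William"), soundex("Bill"))   # different codes
```

The variant spellings collapse to one code, but the nickname pair does not, which is why a separate nickname dictionary is still needed alongside any phonetic transformation.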
Date of birth exhibits still another pattern, where each of the sub-elements (day,
month, year) can be analyzed as a match predictor. The AMIA data indicated that
a match on day was 11.5 times more likely to occur in true matches than non-
matches, month was 19.4 times more likely, and year was 22.2 times more likely.
The advantage of treating date of birth as a collection of sub-fields is that even in
the event that one element is missing or incorrectly recorded, there is still some
predictive value in the remaining fields.
Multiplicative Value
The principal value of such variables is not in isolation -- no one would try to
identify a patient based on a single characteristic, not even SSN -- but in
combination. And, critically for the algorithm, the combinations are multiplicative.
For example, a complete match on all three DOB fields (day, month, year) is
almost 5,000 times more likely in a true match than a non-match in the patient
population involved. Similarly, in a population where a similar first name is 63.5
times more likely to refer to a true match, and date of birth is 4,953 times more
likely to do so, matching first name and Date of Birth is more than 300,000 times
more likely to do so. (This multiplicative effect is variable, however, depending on
the fields being concatenated. Ethnicity, for example, means that first names and
last names are not completely independent variables. Likewise, first name is
strongly correlated with gender, so knowing gender does not double the accuracy
of knowing first name.)
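The multiplication can be reproduced with a short Python sketch. The field names are hypothetical labels; the ratios are the ones quoted above. The sketch assumes the fields are independent, which, as just noted, is only approximately true.

```python
from math import prod

# Ratios quoted in the text: how many times more likely each field is
# to agree in a true match than in a non-match.
RATIOS = {
    "first_name_nysiis": 63.5,
    "dob_day": 11.5,
    "dob_month": 19.4,
    "dob_year": 22.2,
}

def combined_ratio(agreeing_fields, ratios=RATIOS):
    """Multiply the ratios of the fields that agree (independence assumed).
    A missing or disagreeing field is simply left out of the product."""
    return prod(ratios[name] for name in agreeing_fields)

dob_only = combined_ratio(["dob_day", "dob_month", "dob_year"])
dob_and_first_name = combined_ratio(
    ["first_name_nysiis", "dob_day", "dob_month", "dob_year"])
day_missing = combined_ratio(["dob_month", "dob_year"])

print(round(dob_only))            # close to 5,000
print(round(dob_and_first_name))  # more than 300,000
print(round(day_missing, 1))      # still useful when the day is missing
```

Leaving a field out of the product when it is missing shows why treating date of birth as three sub-fields keeps some predictive value even with incomplete data.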
These rates improve further when secondary characteristics can be matched on,
such as SSN or Zip code. Furthermore, matching on multiple variables is robust
and can protect against the occurrence of certain non-matching characteristics,
such as inaccurately entered data or changed last name. Note that even if a
universal health identifier of some sort existed, such a process would be needed
in the event of missing or mis-recorded identifiers, and could use such an
identifier to improve the matching algorithm, even given the inevitable
inaccuracies in at least some of the recording of such an identifier.
The method as described here is a greatly simplified version of a more complex
and iterated operation. In particular, pre-processing of the data can produce
much more accurate inputs, by grouping sound-alike and nicknames as noted
above, or by analyzing numerical data for simple number transpositions, which
can increase the predictive rate for fields like Date of Birth, Zip, and SSN.
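Two of these pre-processing steps can be sketched in Python as follows. The nickname table and the sample Zip codes are invented placeholders; a real system would use a curated dictionary and its own normalization rules.

```python
# Hypothetical nickname dictionary; a real one would be far larger.
NICKNAMES = {"bill": "william", "will": "william", "peggy": "margaret"}

def canonical_first_name(name):
    """Map a nickname to its canonical form, if the dictionary knows it."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def is_adjacent_transposition(a, b):
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]]
            and a[diffs[1]] == b[diffs[0]])

print(canonical_first_name("Bill") == canonical_first_name("William"))
print(is_adjacent_transposition("46202", "46022"))  # two digits swapped
print(is_adjacent_transposition("46202", "46203"))  # a different Zip code
```

A pair flagged as a simple transposition could be treated as a near-agreement on that field rather than a disagreement; how much weight to give it is a tuning decision this sketch does not address.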
The critical question is which combination of fields will produce such a high
likelihood of accuracy that the number of false positives produced will be
minuscule compared to the number of true matches recovered.
For the short term, any work on pilot projects must make the existence of a
patient's identifying data in electronic form a pre-condition for participation. The
question of what will lead the myriad small providers who make up much of the
healthcare system in this country to upgrade their record handling systems will
be beyond the scope of any short-term test.
Longer term, however, work must be done to understand how to provide both the
necessary technology and incentives to get providers to collect patient data in an
electronic format, and to use this data as part of the linking infrastructure outlined
here. In all likelihood, this effort will be linked with other efforts to improve the
storage of clinical information in an Electronic Health Record format. In addition,
work needs to be done on methods and incentives for improving existing records,
both merging duplicate records and updating incorrect fields in existing records.
Given the diffuse incentive structure of the US healthcare system, some sort of
pay-for-performance incentive for gathering accurate records and cleaning
existing ones is one obvious possibility to explore. Such a change, however, will
almost certainly be part of a larger re-alignment of incentives in the direction of
use of IT, and thus can't be easily integrated into any narrower test of particular
capabilities such as linking patient records.
Real-World Implementations
To work at any large scale (millions of patient records or more), such a system of
probabilistic matching must have enough identifying characteristics about the
patients to make one-and-only-one matches in the majority of cases, and must
produce a negligible false positive rate.
Because of these requirements, the Linking Working Group was initially skeptical
that probabilistic matching could work in a large (and ultimately national) network
of linked healthcare providers. However, as we uncovered research in the field
such as the work by Grannis, Overhage, and McDonald, as well as examples of
healthcare systems that were using such matching while handling millions of
records (e.g. Sutter Health, the North Carolina Immunization Registry), and
similar systems outside healthcare (Defense personnel, Las Vegas casino staff),
we came to the conclusion that such a system is not only workable, but is already
working at large scale in many places.
Partly as a result of these early examples, we then decided to survey the linking
practices of a number of healthcare institutions who met the following criteria:
large patient population, spread out among a number of institutions, thus
requiring some form of distributed linking. The institutions we surveyed were
CareGroup (Boston, MA), the North Carolina Emergency Department Database
(NCEDD), Provider Access to Immunization Registry Securely (PAiRS, also in
North Carolina), Regenstrief Institute (Indianapolis, IN), RxHub, Santa Barbara
County Care Data Exchange (Santa Barbara, CA), and Sutter Health (California).
These interviews confirmed our earlier sense that probabilistic matching can be
effective at large scale. (A narrative description of the survey appears at the end
of this document.)
The existing systems are not perfect, of course. Successful use of such systems
requires that the participating entities capture a number of identifying
characteristics; that the data be relatively clean; and that there be a minimum of
duplicates and data-entry errors. Even when these criteria are met, the system
will still generate a number of ambiguous results, requiring either careful
performance tuning to make sure that these do not become false positives (for
the reasons noted above), or a staff trained to make the judgment calls the
machines are incapable of.
These requirements, though, are still less onerous than what would be needed
for a national Health ID, which also requires clean data entry and database
access, but would also require propagation of an entirely new standard, even to
systems that currently meet the other data requirements.
Probability-weighted matching has two other advantages that Big Bang proposals
lack. First, the chance of false matches rises only gradually with scale. In a clinic
with only hundreds of patients, first+last name alone will be enough to identify
most patient records uniquely, so sharing those records among a small number
of small clinics will not create the same issues of name clashes as a multi-million
record system. Thus small providers can improve incrementally, as they
interconnect incrementally, cleaning data and capturing new fields as they grow,
rather than re-engineering everything all at once. The second advantage is that
large service-oriented systems such as labs and pharmacies are already well
along the path towards clean, more queryable data, offering even small providers
immediate value for plugging into a network of health data.
Document current practices and possible improvements
Our simple and qualitative survey of large health systems has convinced
us that deep knowledge about probabilistic matching exists in many
places. An obvious next step would be a more quantitative comparison of
the specific algorithms used for matching. Of particular concern in such a
survey will be practices around data entry, data cleanliness and the
merging of duplicate records and purging of inaccurate records.
We would also want to uncover which auxiliary databases are in use, such
as sound-alike and nickname dictionaries, and which additional sources of
data are used (e.g. non-traditional identifiers such as Zip+4, mobile phone
numbers, etc) to aid in more accurate disambiguation.
Even more important than knowing the current state of practice is figuring
out how to improve data collection and cleanliness, one institution at a
time. The valuable effects that can come from pooling information about a
patient can only be attained if the data is clean enough to be worth linking
-- a database whose records are too dirty will generate more false
negatives than true matches.
Understanding how to help participants undertake the necessary upgrades
and changes to process, ideally out of local interest, will help advance the
larger goals of accurate linking.
building and running such a system in order to identify practical
bottlenecks. The design of these projects will require significant and
ongoing attention to the myriad practical details of health IT, and thus
cannot be specified completely without significant input from the actual
participating organizations. Thus a first goal for launching pilot projects
should be the recruitment of organizations who are willing to help design
and test the linking techniques included in the reference implementation,
and to participate in the design, construction and testing of the additional
infrastructure, including especially the Record Locator Service (described
below) necessary to make patient linking part of a larger healthcare
network.
Architectural Principles
Because the linking problem involves multiple organizations and providers, it is
necessarily a network architecture problem as well. Though there would be some
advantage in improving the ability to link records among different databases
internal to an institution, the most complex linking issues appear when patients
are moving between different localities or sites of care. As a result, along with
improvements in probabilistic matching, the Linking Working Group focused on
how different organizations would be able to run such matches on data held
elsewhere.
The core requirements and constraints are:
• Decentralized
• Federated
• Built without requiring 'Rip and Replace'
• Built through decoupled development
• Built on top of the Internet -- no new wires
Decentralized
We are confident in predicting that this situation will still hold true in five
years time. Therefore, any proposed improvement to the healthcare
system must assume that the participants will be decentralized, and must
be designed to accommodate at least some voluntary, partial, and
incremental participation.
Federated
As has been noted in our meetings numerous times, one cannot take the
healthcare system down for the weekend in order to re-tool it. The
strictures of economic sustainability and practicality demand a clear
migration path for participants in any health architecture.
destroying the staff practices that go with these systems, and replace
them with a uniform set of software, requiring wholesale re-training of
staff.
Decoupled development
Any proposed change must take into account the current infrastructure of the
healthcare system, and must work with that infrastructure where possible. Some
of this infrastructure will need to be replaced, of course, and the replacement and
migration will generate new costs, if only during the period of transition, but
where possible, the system should work alongside what has been deployed
today, and the changes, when they come, should ideally be staged so that they
can be adopted gradually over time.
The question is not whether to work from the top down or the bottom up -- both are
necessary. The question is which problems are most amenable to which type of
solution. How an institution chooses to store and retrieve patient records will
remain a local decision -- there is too much diversity in medical record keeping to
impose a single national set of tools and techniques. Instead, we will start by
recommending best practices and working towards migrating all players to a
common set of supported standards, but will assume that local diversity will
continue for the foreseeable future.
Ultimately, the design challenge is the federal one: leave to the local systems
those things best handled locally, while specifying at a national level those things
required as universals, in order to allow for interoperability in those areas where
the local systems must communicate or share.
information are governed by conversations with previous providers that occurred
at the time the relationship was established. In our proposed system, retrieval of
records involves a two-step process: First, the requester queries the directory
and gets pointers to any authorized records indexed in the directory. Then each
provider holding records has the discretion to disclose, depending on that
provider’s rules, as defined in the provider’s initial encounter with the patient.
Thus, there are two decisions to be made locally: whether to index and whether
to share.
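The two-step process and the two local decisions can be sketched in Python. Every class, method, and field name here is hypothetical, and the in-memory dictionaries stand in for whatever systems a real directory and provider would run.

```python
from dataclasses import dataclass, field

@dataclass
class Provider:
    name: str
    records: dict = field(default_factory=dict)  # patient key -> record
    indexed: set = field(default_factory=set)    # local decision 1: index?
    shareable: set = field(default_factory=set)  # local decision 2: share?

    def disclose(self, patient_key):
        """Step 2: the provider applies its own rules before disclosing."""
        if patient_key in self.shareable:
            return self.records.get(patient_key)
        return None

class Directory:
    """Holds pointers only, never the records themselves."""
    def __init__(self):
        self.pointers = {}  # patient key -> list of providers

    def publish(self, provider):
        for key in provider.indexed:
            self.pointers.setdefault(key, []).append(provider)

    def locate(self, patient_key):
        """Step 1: return pointers to providers that indexed this patient."""
        return list(self.pointers.get(patient_key, []))

def retrieve(directory, patient_key):
    return {p.name: p.disclose(patient_key)
            for p in directory.locate(patient_key)}

clinic_a = Provider("Clinic A", {"p1": "lab result"}, {"p1"}, {"p1"})
clinic_c = Provider("Clinic C", {"p1": "visit note"}, {"p1"}, set())
clinic_d = Provider("Clinic D", {"p1": "sensitive note"}, set(), set())

directory = Directory()
for clinic in (clinic_a, clinic_c, clinic_d):
    directory.publish(clinic)

result = retrieve(directory, "p1")
print(result)
```

In this sketch Clinic D never appears in the directory, Clinic C is locatable but declines to disclose, and only Clinic A both indexes and shares.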
The system neither guarantees nor forecloses patient anonymity. That is a
decision to be made by the patient and provider together; whether a patient’s
identifiers are reported to the directory is a local decision. In a one-time
encounter with a provider, a patient can ask that the records not be indexed in
the directory. If the provider complies, the information cannot be located via the
Record Locator Service. In the case of an established relationship, a patient can
keep records out of the system by asking that they be stored locally under a
pseudonym. Even if the records are linked, they cannot be located by the
patient’s true name. If the patient remembers the pseudonym and associated
identifiers, the records can be retrieved in the future.
Close to the local level, systems can set additional rules for access. Higher levels
of approval can be set locally for sharing some records, or the system can
provide notification to the creator of sensitive records when they are accessed.
The vision that a patient should be able to say, “You can share this record, not
that record, this particular piece of information, not that one” is a vision that
cannot be easily implemented currently. The complexity of the healthcare
system makes it very hard to fulfill this kind of request with high accuracy. A
patient's HIV positive status, for example, can be inferred not just from a label on
a chart but from problem and complaint lists, medication lists, and written doctor's
notes, discharge summaries, imaging studies, etc.
It is our goal to design a linking system that works with the realities of healthcare
record systems as they are being designed. Though some domains such as
intelligence sharing have sub-record-level permissions for sharing, the healthcare
industry typically does not, both because the patient is the key entity in the
system, not the individual record, and because health information is still highly
unstructured. Most health record systems do not allow for record-by-record
distinctions between what can and cannot be shared. It would be a disservice to
both patients and professionals to create the expectation of such highly granular
and controllable records in today’s systems. Such a high degree of granularity is
not required by law and is not being implemented in most new systems.
Payment systems are separate and pose their own privacy issues. Indeed, while
it is possible to include payment pointers in the directory of healthcare records,
so that insurance information is available to providers, it is also desirable to
decouple the payment system from the treatment system for day to day
transactions yet have the capability to conduct authorized sharing for payment,
treatment, and operations.
Under the system we propose, a patient can also collect information in her own
home if she wishes, by using the Record Locator Service, then making HIPAA
requests from the organizations who hold relevant records. Indeed, the system
makes it easier for patients to find and compile their own records than today.
Architectural Overview
With these requirements and constraints listed, we turned our attention to
defining an architectural approach to making records available where and as
needed, given the linking recommendations above.
The core architectural idea of our proposal is that patient records must remain in
the hands of the organizations who create or manage these records – clinics,
hospitals, labs – but these records must be readily locatable by other institutions
and providers who have responsibility to the patient. Examples include
healthcare while traveling, chronically ill patients being treated by multiple
clinicians, emergency care, patients changing physicians after a move or
healthcare plan change, and so on.
The system will rely on optimizing the current methods of institutional and
provider record keeping, using local record numbers of a patient at each site of
care, while improving interoperability of existing systems and methods for
handling patient records locally. Current work on systems for distributed
supercomputing or storage suggests that the problem of interconnecting multiple
nodes is best solved using a "connection broker" pattern. A connection broker is
a database that maintains records of distributed resources and matches requests
with the holders of the appropriate resources.
This central database of pointers can be quite small, relative to the volume of
distributed resources that can be identified through it. Furthermore, for systems
operating on the Internet (as we assume this one will), once the organizations
involved in information sharing are identified to one another, they can share the
requested data directly, without further involvement by the connection broker.
(However, in some cases, it may be desirable to set up proxy or caching servers,
to allow less technically sophisticated clinics, hospitals and other users access to
the system.)
• The RLS holds information authorized by the patient about where health
information can be found, but not the actual information the records may
contain. It thus enables a separation, for reasons of security, privacy, and
the preservation of the autonomy of the participating entities, of the
function of locating authorized records from the function of sharing them
with authorized users.
• Release of information from one entity to another is subject to
authorization requirements between those parties; in certain sensitive
treatment situations patients or providers may choose not to share
information.
• RLSs are operated by multi-stakeholder collaboratives at each sub-
network and are built on the current use of Master Patient Indices.
• The Record Locator Service needs to enable a care professional looking
for a specific piece of information (PCP visit or ER record) to find it rapidly.
An open design question is how and where in the model this capability can
best be accomplished.
(For more on the Record Locator Service and the proposed Standards and Policy
Entity which would set the guidelines by which it would operate, see the
response Connecting for Health prepared in collaboration with twelve other
influential groups to the federal government’s RFI on the “National Health
Information Network” at www.connectingforhealth.org.)
We treat these efforts as separate for two reasons: first, we know from the
growth of large technical systems in heterogeneous environments (e.g. email, the
Web) that when too many separate standards are bundled together in an all-or-
nothing package, the expense and organizational difficulty of upgrading
everything all at once becomes prohibitive. Instead, we imagine a set of related
standards that can be implemented in various orders, and can be effective at
various levels of completeness.
Construction of the Record Locator Service
The Record Locator Service is new infrastructure, and will require ongoing
institutional support. In practice, it will be a cluster of databases holding four
types of records -- patient identifying information, healthcare provider information,
a list of patient records held by those providers (though not the records
themselves), and contact details and other services made available by the
providers.
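One possible shape for these four record types is sketched below in Python. All class and field names, and the sample values, are assumptions made for illustration; the text above does not specify a schema.

```python
from dataclasses import dataclass, field

@dataclass
class PatientIdentity:      # 1. patient identifying information
    patient_key: str
    name: str
    date_of_birth: str
    gender: str

@dataclass
class ProviderInfo:         # 2. healthcare provider information
    provider_id: str
    name: str

@dataclass
class RecordPointer:        # 3. "this provider holds records on this patient"
    patient_key: str
    provider_id: str

@dataclass
class ProviderContact:      # 4. contact details and services offered
    provider_id: str
    contact: str
    services: list = field(default_factory=list)

@dataclass
class RecordLocatorStore:
    patients: dict = field(default_factory=dict)
    providers: dict = field(default_factory=dict)
    pointers: list = field(default_factory=list)
    contacts: dict = field(default_factory=dict)

    def locations_for(self, patient_key):
        """Return (provider name, contact) pairs, never clinical content."""
        return [
            (self.providers[p.provider_id].name,
             self.contacts[p.provider_id].contact)
            for p in self.pointers
            if p.patient_key == patient_key
        ]

# Hypothetical sample data.
store = RecordLocatorStore()
store.patients["p1"] = PatientIdentity("p1", "Example Patient", "1970-01-01", "F")
store.providers["clinic-a"] = ProviderInfo("clinic-a", "Clinic A")
store.contacts["clinic-a"] = ProviderContact(
    "clinic-a", "records@clinic-a.example", ["fax"])
store.pointers.append(RecordPointer("p1", "clinic-a"))

print(store.locations_for("p1"))
```

Note that nothing in the store carries clinical content: a pointer says only that records exist at a provider, which is the separation the architecture depends on.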
The Record Locator Service (RLS) accepts queries only from authorized parties
and only over secure connections. Once the RLS
receives an authorized query, it will search for a patient, and return a list of
entities it knows have information on that patient, telling the querying institution
where that information is located and whom to contact in order to access it.
Some organization will have to take responsibility for the ownership and
operation of the RLSs; they will also be responsible for guaranteeing service
level agreements, and must ensure the security and safe handling of the records
contained in the database. There are a number of organizational models for this,
from setting up a new institution who owns and operates the RLS on behalf of
client organizations to a 'first among equals' approach, where an existing
institution takes on the running of the RLS, in return for support from partners.
The design of the institutional structure for supporting an RLS will be a key part
of designing any pilot project.
The RLS would be queried when Institution A had a patient whose existing
records they needed from other labs, hospitals, or clinics. Institution A would offer
authorization credentials over a secure network connection. They would then
send a request for records about a particular patient, offering a set of identifiers
that uniquely identify that patient (e.g. name, DOB, gender, address, phone).
These characteristics would then be compared using the probability-weighted
matching algorithm described above, with the locations of the matching records
returned to the querying institution.
Survey of existing technical practices
Because there will never be a time when all data is deleted and reloaded,
constant checking between local and remote records will need to be
implemented in a way that maintains high data quality without creating
unsupportable system load. There are several possible approaches to this
problem; we will need to identify which of those approaches have been
found workable by existing organizations.
Launch pilot projects involving three or more entities
Because so many of the difficulties in getting any such system running are
in negotiating multi-lateral agreements among the various parties, any
pilot project designed to test the viability of the Record Locator Service
must be multi-lateral, involving at least three parties at launch. Likewise,
more than one of these pilots should be undertaken in the same time
frame, in order to observe both similarities and differences between
instantiations.
Sharing Appropriate Records
Thus linking creates value in locating records, without requiring the immediate
upgrade of every bit of health IT in the country. Likewise, upgrades (so long as
they are interoperable) become more valuable as more entities begin exchanging
records in this manner. This exemplifies our strategy in general: define a floor for
technological engagement that maximizes participation, but provide every
opportunity and incentive to outperform that minimum. Because upgrades in a
system as large and fragmented as U.S. healthcare will necessarily be piecemeal,
the architecture needs to be a platform that both supports and rewards
incremental improvements.
Example: Priscilla Switches Doctors
Clinic A, a participant in the system, has provided the Record Locator Service
with an updated list of patients it holds records on. This is a background process,
where Clinic A communicates directly with the Record Locator Service at regular
intervals, rather than part of the individual search transaction.
Once the staff of Clinic B has taken Priscilla’s identifying details (transaction #1
above), they will authenticate themselves to the Record Locator Service (RLS) to
allow for auditing. After they are authenticated, they will make a request for the
location of any of Priscilla’s other records.
The request from Clinic B to the RLS will travel over secure transport such as
Secure Sockets Layer (SSL). On receiving it, the RLS will compare Priscilla’s
information with its database. There are three possible outcomes here -- the
Record Locator Service finds records with such a high probability match that they
can be identified as Priscilla's; it finds no records that match; or it finds records
that might match, and asks Clinic B for more identifying information. (This third
option would require staff allocated to handling such requests; some system
designs may simply treat such ambiguous pairs as non-matches, to minimize
human input, even at the expense of additional false negatives.)
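The three outcomes can be sketched as a two-threshold decision in Python. The threshold values and the names used are hypothetical; in practice they would be tuned against real data to keep false positives negligible.

```python
from enum import Enum

class Outcome(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    NEEDS_MORE_INFO = "needs more identifying information"

def classify(score, match_threshold=0.95, possible_threshold=0.60,
             ask_for_more=True):
    """Map a match score to one of the three outcomes described above.

    With ask_for_more=False, ambiguous scores are treated as non-matches,
    trading extra false negatives for less human input.
    """
    if score >= match_threshold:
        return Outcome.MATCH
    if score >= possible_threshold and ask_for_more:
        return Outcome.NEEDS_MORE_INFO
    return Outcome.NO_MATCH

print(classify(0.99))                      # high-probability match
print(classify(0.75))                      # ambiguous: ask Clinic B for more
print(classify(0.75, ask_for_more=False))  # ambiguous treated as non-match
print(classify(0.20))                      # no match
```

The `ask_for_more` switch corresponds to the design choice noted in the parenthesis above: a system without staff to resolve ambiguous pairs can simply fold them into the non-match outcome.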
Assuming there is a match, the RLS will return pointers to other entities such as
Clinic A that hold her records (transaction #2 above). Clinic B will then make a
request for Priscilla’s records directly to Clinic A, also via a secure Internet
connection, again providing authorization credentials to show that it is allowed to
do so (transaction #3).
Some of the resulting records may be returned from A to B directly over the
Internet, using standardized interfaces for secure transport. The content of the
messages may also be represented in a standardized format, for direct and
automatic import into the new clinic’s database, while other records may be sent
by secure email, or even simple fax. Once B has the results of her earlier pap
smear (as well as any other records held by clinic A), the staff of Clinic B can
then add them to Priscilla's file.
Architectural Features
This example illustrates several key aspects of the imagined architecture:
• The main focus for use of the system is still in the hands of patients and
providers – the system exists to support treatment, payment, and
operations of the current healthcare system, rather than attempting to
replace them.
• Even in its earliest form, it creates value for both doctors and patients. The
staff of Clinic B can spend less time while gathering more information,
Priscilla’s doctor will be better informed, and both the doctor and Priscilla
can avoid the expense and hassle of re-running tests that have already
been done at Clinic A.
• It provides several layers of security. Only entities with authorization will
be allowed access to the system; traffic between entities and network
hosted services will be encrypted; no central repository of all identifiable
clinical information will be held in the center of the network; and traffic
between two entities will either be encrypted or take place outside the
network (e.g. through fax or the mail).
• It does not require that Priscilla have any sort of national Health ID.
Instead, it uses her existing identifying details to determine a match.
• It separates ‘knowing that’ and ‘knowing what’ information about the
patient. The pointer database offers only ‘knowing that’ information: it
reveals that a patient has records at a particular institution, but not the
records themselves.
• It leaves the information in the hands of the entities that have a direct
relationship with the patients. Actual sharing of information is left between
the requesting and responding entities, as today. The network service
lowers the enormous costs and difficulties of discovery and location of
remote patient records, but does not require the entities with the patient
relationship to surrender control of those records to a third party.
• It leaves privacy controls where the information was created. If there is
information Priscilla does not want disclosed, the institution holding that
information can opt out of identifying Priscilla as a patient.
• It allows for enormous variability in the technical sophistication of
participants. The minimum level needed to participate is a list of patients
about whom an institution can provide records when asked by an
authorized party. The providing institution at this minimal level of
participation only needs to provide such a list, and to be ready to reply by
fax or mail to valid requests, with no onsite technical requirements for
hardware or software. At the other extreme, large multi-institution
organizations can offer direct lookup and information sharing in response
to authorized requests, thus potentially automating a complex and
expensive task.
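The separation of ‘knowing that’ from ‘knowing what’ can be illustrated with a small pointer index. This is a sketch under assumptions: the class `RecordLocator`, its method names, and the exact-match key on name and date of birth are all hypothetical, and a real service would match probabilistically rather than exactly.

```python
from collections import defaultdict


class RecordLocator:
    """Hypothetical pointer index: it knows THAT an institution holds records
    for a patient, never WHAT those records contain."""

    def __init__(self):
        # Maps a demographic key to the set of institutions holding records.
        self._pointers = defaultdict(set)

    @staticmethod
    def _key(name: str, dob: str) -> tuple:
        # Illustrative exact-match key; not a real matching algorithm.
        return (name.strip().lower(), dob)

    def register(self, institution: str, name: str, dob: str) -> None:
        self._pointers[self._key(name, dob)].add(institution)

    def withdraw(self, institution: str, name: str, dob: str) -> None:
        # An institution may decline to be listed for a given patient.
        self._pointers[self._key(name, dob)].discard(institution)

    def locate(self, name: str, dob: str) -> list:
        # Returns locations only; records are requested from each holder directly.
        return sorted(self._pointers.get(self._key(name, dob), set()))
```

Because the index stores no clinical content, the actual exchange of records remains between the requesting and responding entities, and `withdraw` models an institution opting out of identifying someone as a patient.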
The above example illustrates a general design goal – create the minimal level of
new functionality to be useful, while offering a gradient of services and
automation that allows large entities with significant IT investments to gain
additional value.
The basic threshold here is the ability to provide, in electronic format, a standard
list of patients about whom a practice has information. This is the participation
threshold -- “You must be at least this high to get on this ride.” Institutions or
providers that can’t provide a simple list in electronic format are not ready to be
members of the network. For others, the goal of connecting to the Record
Locator Service may provide the impetus to clean up their internal record
keeping.
Privacy Enhancing Technology Built into the
Architecture
Any system of linkage or identification must be secure, preventing unauthorized
outside access and limiting disclosures from within. Security and privacy policies
and procedures that support electronic health information systems should
provide strong controls throughout the environment and for all pathways and
modes of access and use. Strong controls include a regularly updated
authentication and authorization regimen; auditable records of access and
transmittal; and mechanisms for enforcement, including sanctions for violation.
No information system, regardless of the safeguards built in, can be 100%
secure, but appropriate levels of protection coupled with tough remedies and
enforcement measures for breaches can strike a fair balance.
Authentication
A critical component of privacy protection will be authentication of users. While
further research needs to be done on authenticating users in large, decentralized
systems, at this point a user name and strong password, properly managed, offer
sufficient security. Proper management means, among other things, that
procedures for issuing and revoking credentials must be strictly enforced. Since
persistent identifiers pose a security risk, passwords must be time limited, so
there is automatic revocation and reauthorization. In one major system we
studied, passwords are issued initially in face-to-face encounters and are good
for 90 days, and are reissued online with a “secret question” to verify identity.
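Time-limited credentials with revocation can be sketched as follows. The 90-day lifetime comes from the system described above; the `Credential` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

VALIDITY = timedelta(days=90)  # lifetime cited for the system described above


@dataclass
class Credential:
    username: str
    issued_on: date
    revoked: bool = False

    def is_valid(self, today: date) -> bool:
        """A credential is usable only if not revoked and not yet expired."""
        return not self.revoked and today < self.issued_on + VALIDITY
```

Expiry acts as automatic revocation: a credential that is never renewed simply stops working, without anyone having to remember to disable it.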
This in turn raises the question of who manages authentication. There are
several models, ranging from peer-to-peer to governmental, but the one that
seems most likely to succeed (and that is most consistent with emerging
practices) involves a non-profit entity at the center of the national system and at
the center of each regional or other sub-network system that composes the national
system. The New England Health EDI Network (NEHEN) is an example of such
an entity at a regional level. It is a contractually based membership organization,
supported by fees. Each member has one vote in system governance. The
network admits institutions, and the institutions authorize individuals (doctors,
nurses, administrators, etc.) pursuant to guidelines set by the network.
Authentication (who you are) is not the same as authorization (what you can
access). Some hospitals permit all users to access all records. A better
approach is to establish levels of authorization, at least based on occupational
category or function. Under such an approach, doctors, for example, might get
access to all records, while pharmacists would get access to one subset and
administrators to a different subset.
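Authorization by occupational category might look like the sketch below. The record categories and the grants given to each role are hypothetical; the text says only that doctors might see all records while pharmacists and administrators see different subsets.

```python
# Hypothetical record categories and role grants, for illustration only.
ALL_CATEGORIES = {"clinical_notes", "lab_results", "medications", "billing"}

ROLE_ACCESS = {
    "doctor": ALL_CATEGORIES,
    "pharmacist": {"medications"},
    "administrator": {"billing"},
}


def may_access(role: str, category: str) -> bool:
    """Authorization check; assumes the user has already been authenticated."""
    # Unknown roles get an empty grant, so access is denied by default.
    return category in ROLE_ACCESS.get(role, set())
```

The check is deliberately separate from authentication: it answers what a known user may access, not who the user is.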
The issuance and revocation of authentication in large systems is a specialized
part of the computer security universe. Experts from that field must be brought
into the design of the Connecting for Health system and its component systems.
Audit trails
Another important component of privacy protection is immutable audit logs for
each access to a record, identifying the person who accessed the record and the
purpose. Immutable logs are tamper-resistant and tamper-evident trails of
activity. Ideally, these logs require multiple parties to access their contents, and
all alterations are treated as updates; no data is ever deleted, and all changes
are signed. In such logging models substantial collusion would be required to
actually falsify the audit log. Such logs improve accountability and oversight, and
can be used to identify patterns of abuse. Audit information would be available
directly to patients, as well as to other participants in the health system.
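One common way to make a log tamper-evident is to chain entries by hash, so that each entry commits to the one before it. The sketch below illustrates that idea only; it is an assumption about how such a log could be built, and it does not implement the signatures or multi-party custody described above. The class and field names are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        data = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def append(self, user: str, record_id: str, purpose: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"user": user, "record_id": record_id,
                "purpose": purpose, "prev": prev}
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited entry, or one removed from the
        middle of the log, breaks the chain."""
        prev = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True
```

A chain like this makes alteration detectable rather than impossible, which is why the text also calls for multiple parties to hold the log and for signed changes.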
Encryption
Encryption should be used to protect medical records both in transit and in
storage. Virtual Private Networks (VPNs) or the Secure Sockets Layer (SSL) offer
protection to data in transit. In addition to encryption of the data as it passes
between entities, we considered the possibility of encrypting (or "hashing") the
identifiers in the RLS. We believe that such hashing is a promising technique,
especially when used to compare sensitive numerical data such as SSNs, but
that it should only be contemplated when it can achieve results substantially
similar to those of comparing unhashed data. We concluded that the greatest risk is not
theft of the database, but insider abuse by authorized internal users, so we focused
protections on that problem.
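Comparing hashed identifiers can be sketched with a keyed hash. The choice of HMAC-SHA256, the key handling, and the function names below are assumptions for illustration, and the identifier used in any example should be a made-up value.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would be managed and rotated.
SECRET_KEY = b"example-key-not-for-production"


def normalize_ssn(ssn: str) -> str:
    """Strip punctuation so '000-12-3456' and '000123456' compare equal."""
    return "".join(ch for ch in ssn if ch in "0123456789")


def hash_identifier(ssn: str) -> str:
    """Keyed hash (HMAC-SHA256) of the normalized identifier."""
    return hmac.new(SECRET_KEY, normalize_ssn(ssn).encode("ascii"),
                    hashlib.sha256).hexdigest()


def same_identifier(stored_hash: str, candidate_ssn: str) -> bool:
    # Constant-time comparison of the stored hash with the candidate's hash.
    return hmac.compare_digest(stored_hash, hash_identifier(candidate_ssn))
```

A hash only supports exact comparison: a single mistyped digit produces an unrelated value. That limitation is the reason for the caution above that hashing should be used only where it gives results similar to comparing unhashed data.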
Limiting queries
Further protections can be built in by limiting query formats. For example, the
directory can be designed to make it impossible to ask for all 20-to-25-year-old
females in a certain neighborhood. In this regard, we believe it is important that
Social Security Numbers (SSNs) not be used as search terms. The SSN has
become so compromised as a result of its widespread use as a generic identifier
that it is a risk to any system that relies on it. We recommend that health records
systems wean themselves from the SSN as the patient record number, and that
the SSN be reserved for disambiguation of records, possibly using only the last 4
digits, and that where possible, comparisons of SSN matches be conducted with
hashed data, so that no SSNs are actually stored in the system itself.
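Limiting query formats can be enforced by validating every query before it reaches the directory. In the sketch below the required fields and the `PatientQuery` shape are assumptions; the point is that a query must name one specific person, and that the format offers no SSN field and no age range to search on.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_FIELDS = ("last_name", "first_name", "date_of_birth")


@dataclass(frozen=True)
class PatientQuery:
    last_name: Optional[str] = None
    first_name: Optional[str] = None
    date_of_birth: Optional[str] = None   # exact date, never an age range
    zip_code: Optional[str] = None        # optional narrowing detail


def validate_query(query: PatientQuery) -> None:
    """Reject any query that does not name one specific person."""
    missing = [f for f in REQUIRED_FIELDS if not getattr(query, f)]
    if missing:
        raise ValueError("query must include: " + ", ".join(missing))
```

A request for everyone in a neighborhood fails validation because it cannot supply a name and an exact date of birth.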
Network of Networks
It's important to note that the proposed architecture sets a floor for interaction
between entities needing to share clinical information, but not a ceiling. Much
higher levels of interoperability are possible and, if agreed to by all parties,
desirable.
As an example, there are a number of health initiatives that tie together several
institutions into one network, including many of the organizations we surveyed
such as CareGroup, NCEDD, Regenstrief, and Sutter Health. These regional
networks have higher degrees of both contractual and technological
standardization than specified here, and consequently offer a higher level of
service.
• They are much smaller, involving a few dozen entities. This enormously
reduces the complexity of the required infrastructure.
• There is a much higher degree of both trust and familiarity between
entities in a regional network, which are likely to share care for many
patients, and to refer patients to one another frequently. By contrast, any
national system must work even between entities that don't collaborate or
share information regularly.
• Regional networks typically have a high degree of mutual contractual
obligation, including shared financial obligations. This is beyond anything
that can be imagined at a national level -- provision must be made for
simpler and less onerous obligations for participants.
• Regional networks typically operate within the borders of a single state,
eliminating the cross-border complexities of varying state regulation.
We believe the network of networks model allows us to get the best of both
worlds. As most care is local to the patient's home, patients can receive most
of their care within the regional network. When their records need to move
outside that network, which is currently very difficult, this system will
provide the necessary services. It will also serve as a method of connecting
healthcare entities not affiliated with any regional network.
One example of such a mismatch would be between entities that can and can't
provide round-the-clock access to their records. Likewise, some institutions may
expect near real-time communication with partners (analogous to instant
messaging), while others may only be able to support asynchronous
communication (analogous to email). In both of these cases, it may be necessary
to provide an intermediate service to solve these mismatches.
Since the system will necessarily grow in pieces, particularly in the pilot program
phase, many of the decisions about how to handle impedance mismatch will be
best handled on a case-by-case basis. However, the Linking Working Group did
identify two strategies we believe will be worth testing in these situations: local
gateways, and proxy servers.
With the Record Locator Service in the center of the network, and the record-
holding entities at the edges, these two strategies are listed in ascending order of
centrality, which is to say distance from the edge entities and proximity to the
Record Locator Service.
Local gateway
a simple computer that sits between an institution's local network and the
broader Internet, and is connected to both. The gateway would have three
functions: First, it would provide a standards-compliant interface to
whatever system the local institution happened to be running. In practice,
this would mean fielding queries from the Record Locator Service in
whatever format that service produces, translating those queries into
queries for the local database, which might be as simple as an Access
database running on a single PC. Likewise, it would take the results of
such a query, wrap them up in the format the Record Locator Service
expected in return, and send them back upstream.
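The translation role of a gateway can be sketched as a small adapter. Everything named here is an assumption: the network-side field names (`family_name`, `given_name`, `birth_date`), the response shape, and the local table layout are hypothetical, and an in-memory SQLite database stands in for whatever small database the institution actually runs.

```python
import sqlite3


class LocalGateway:
    """Translates network-format queries into local SQL and back again."""

    def __init__(self, conn: sqlite3.Connection, institution: str):
        self._conn = conn
        self._institution = institution

    def handle(self, request: dict) -> dict:
        # Inbound: hypothetical network format -> parameterized local query.
        row = self._conn.execute(
            "SELECT COUNT(*) FROM patients "
            "WHERE lname = ? AND fname = ? AND dob = ?",
            (request["family_name"], request["given_name"],
             request["birth_date"]),
        ).fetchone()
        # Outbound: wrap the local result in the format the network expects.
        return {"institution": self._institution, "has_records": row[0] > 0}


def demo_database() -> sqlite3.Connection:
    """Stand-in for a small local database with its own column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (lname TEXT, fname TEXT, dob TEXT)")
    conn.execute(
        "INSERT INTO patients VALUES ('Example', 'Priscilla', '1970-01-01')")
    return conn
```

The local system keeps its own schema; only the gateway needs to know both vocabularies, which is what lets an institution participate without re-engineering its internal database.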
This idea has a long pedigree, being the pattern the Internet launched
with. In that situation, the original designers reasoned that it would be too
hard to engineer interoperability between all computers at all the sites
selected for participation, and instead provided small computers that acted
as gateways between local systems and the Internet. This model is
currently in use in the Regenstrief system in Indiana, to good effect.
Proxy servers
communication. These practices will include issues of identity (how does
a provider know who a request is coming from), authorization (how does
a provider decide whether they have a right to see the records they are
requesting), security (how can a provider respond in a way that protects
patient information), and transport (how do providers communicate with
one another -- over the Web, via secure email, etc.).
System-Wide Concerns
While an obvious goal of any attempt to improve healthcare through better use of
IT will include wide support for electronic health records, our proposed principle
of decoupled development suggests that participants should be able to share any
health information they may have on file for a patient, in whatever format it exists.
In practice, this will mean that at least some data is unstructured, possibly as
scans of written notes, while at the other extreme, some data will be in a well-
structured electronic health record format.
Thus, for any pilot project, there will be a range of support for the receiving party
in interpreting or handling the health records themselves, ranging from no formal
support to full computer-aided decision support:
No interpretive support
The most immediate benefit of being able to locate and share records
comes when the information is communicated in any form (even fax) to a
caregiver or other person that will interpret the information in order to
make decisions.
Decision support can range from something as simple as avoiding an order
for a test that has already been performed, to detecting drug allergies, to
much more elaborate rules that support patient safety and evidence-based
medicine.
The earliest progress can be made by assuming that all systems operate without
formal interpretive support. There is a danger, however, that making that
assumption becomes a self-fulfilling prophecy. Care must be taken in any system
to build in incentives for continual upgrade of capabilities, especially those
relating to the adoption of electronic health records, even when basic use of the
system does not require it.
Network Membership
Though it would be possible to design a network in which all participants have a
high degree of technical acumen, such a test would be a poor guide to the issues
involved in a broader rollout. However, forcing any network to the level of its least
sophisticated member will dampen much of the potential value.
One possible solution is a two-tiered system, with two classes of participants:
members and users. The users will implement the absolute minimum amount of
technology necessary to get records from the system (providing the authorization
necessary to query the system while being audited), while members would
implement the full range of technological requirements.
Users of the system, by contrast, will be authorized to query the system to locate
patient information, but won't provide information themselves. The possibility of a
lower level of interaction is included as an escape valve for entities that want to
begin participating but which do not yet have the technical capabilities to offer
member-level service interfaces. However, provisions may need to be made for
use charges or other forms of offsetting revenue, to avoid having the economic
free-rider problem turn the Record Locator Service into an unsustainable
resource. In early pilots, these questions of levels of participation will have to be
decided ad hoc.
Membership cannot be a one-size-fits-all relationship -- there will be giant
institutions, 30-bed hospitals, and solo practitioners participating, and the system
must facilitate the sharing of information among them. (Indeed, coping with
heterogeneous systems of varying levels of sophistication is one of the core
challenges of any health IT system.) Instead, the contracting and fees will have
to be negotiated to take into account the varying technological and financial
circumstances of the members.
Certification for Standards Conformance
After some early test implementations there will be incentives to add software
tools for certifying vendor products and their implementations. This will allow new
participating systems to be added with less personnel time spent testing
interfaces. The methods and software developed early on will have even more
value as the system grows.
The next phase of this project should include consulting with the public and
private sector organizations that have experience certifying compliance with
standards. Some of the factors that must be considered in developing the
software and methodology are:
Vendor testing. Vendors should be able to use the certification tool from
within their software labs to demonstrate that a given version of a product
can be implemented in a manner that conforms with the profile. The
certifying organization should award a certification identifying that a
specific software version has passed testing for a specific use case.
evaluation that the data was faithfully entered into the database of the
system under test. Performing this evaluation in an automated
environment is a challenge that can be met to a limited extent.
Testing for error conditions. The software certification system must evaluate
the system under test not only with correct data and operation, but also for
its response to incorrect data and to simulated errors in the network or
communicating systems.
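A certification tool that exercises both correct and incorrect input might be structured like the sketch below. The `certify` function, the shape of the test cases, and the toy `sample_system` are all hypothetical; a real tool would exchange messages in the agreed profile rather than Python dictionaries.

```python
def certify(system_under_test, cases):
    """Run each case and report whether the observed outcome matched.

    `system_under_test` is any callable taking a message dict; `cases` is a
    list of (message, should_be_accepted) pairs. Both shapes are assumptions.
    """
    results = []
    for message, should_accept in cases:
        try:
            system_under_test(message)
            accepted = True
        except ValueError:
            accepted = False   # a clean rejection of bad input
        results.append(accepted == should_accept)
    return all(results), results


def sample_system(message):
    """Toy system under test: requires a non-empty patient_id field."""
    if not message.get("patient_id"):
        raise ValueError("missing patient_id")
    return "stored"


CASES = [
    ({"patient_id": "A-1"}, True),    # correct data must be accepted
    ({"patient_id": ""}, False),      # incorrect data must be rejected
    ({}, False),                      # simulated truncated message
]
```

A system that silently accepts every message would pass the first case and fail the other two, which is the kind of defect that testing with correct data alone would never reveal.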
Security
This example, but one of many, illustrates the issue: security is a process, not a
product. In a system whose contents are as critical as the imagined architecture's
will be, and whose round-the-clock uptime is as critical as it will be, security must
be both an early and steady concern.
Though many of the Linking Working Group members have or have had
operational responsibility for secure systems, most of us are not security experts
– rather, we have called on that expertise when building systems. That approach
is needed here as well – a critical next step will be to convene a group of security
experts to contribute to the architecture.
What follows is a brief outline of security features we know we will need; the work
of the security experts in the next phase will be to both flesh out these intuitions
and add what we have missed.
Whatever the particulars of the security systems as they are deployed, they must
serve those goals. Beyond that, the system needs security standards in three
domains:
Wire security
Securing material “on the wire” means making sure that in its transit from
point A to point B it is defended from eavesdropping, copying, or other
interception. In practice, this means encrypting all the material passing
over that connection.
In addition, there are some potential policy changes, as with the Medicare
provision forbidding the use of the Internet to transmit information, that will
need to be re-visited in light of a sound security policy.
Perimeter security
Securing material on the wire is only part of the answer – it’s no good
securing material in transit from A to B, if B is the malefactor. As a result,
we also need to secure the perimeter of the network.
In any case, whatever authentication method is used, the method will
authenticate the participating organization or entity. This entity will, in turn,
authenticate the individuals that it is accountable for, through employment
or other relationships, who are to have access to certain sorts of records.
A critical aspect of auditing is that responsibility is not transitive, but rests
with the authenticated institution. An institution suffering from an exploit
that allows unauthorized use of the system is responsible for the damage,
even if the malefactor broke into the system, rather than being one of their
authorized users.
Content security
Recommendations for Security
Because security is a process and not a product, it must be undertaken as an
ongoing effort. Therefore, rather than producing several recommendations for
security, we have a single one:
Conclusion
We are optimistic that the recommendations provided here for improved linking of
patient information can lead to marked improvement in the amount and quality of
clinical information available at the site of care. However, it is always tempting to
believe you know more than you do. In the planning phase of any technology
project, this temptation shows up as a desire to predict in advance the results of
proposed changes.
This is dangerous because such predictions are always in part wrong, and the
larger the project and longer the imagined timeframe, the likelier it is that any
error in prediction will be serious. The literature of large system design is filled
with projects that wrongly assumed success would be an uncomplicated outcome
of a set of proposed actions. In addition, our own conversations with groups
successfully managing distributed multi-million record systems have convinced
us that the myriad implementation details can only be dealt with in an operational
environment or the closest possible simulation.
Appendix
MPI Survey Summary Report
Connecting for Health
August 2004
Survey conducted by Ben Reis, PhD, formerly with the Markle Foundation
Health Program, and Clay Shirky, Chair of this Working Group.
Overview
Healthcare organizations typically maintain a Master Patient Index (MPI) or
Enterprise-wide Master Patient Index (EMPI) as the definitive listing of all of their
patients. (We will refer to both classes of index as MPIs throughout.) All patient
data stored by the organization is assigned a patient ID that can be looked up
using the MPI. Two pieces of information concerning the same patient will
(ideally) share the same patient ID, stored in the MPI.
Survey
The Connecting for Health Working Group on Accurately Linking Information for
Healthcare Quality and Safety sought to understand the current practices and
issues involved in locating patient data in systems with multiple entities, each
with its own Master Patient Index.
To understand the issues faced by specific projects doing this today, the Working
Group conducted telephone surveys with technical and administrative personnel
at the following seven regional efforts:
CareGroup, Boston, MA
North Carolina Emergency Department Database (NCEDD), NC
Provider Access to Immunization Registry Securely (PAiRS), NC
Regenstrief Institute, Indianapolis, IN
RxHub, multi-state
Santa Barbara County Care Data Exchange, Santa Barbara, CA
Sutter Healthcare, CA
These seven projects represent only a sampling of current ongoing efforts, and
the present survey is an informational exercise, not a definitive scientific analysis.
Overview of Results
Different projects currently being developed around the country are aimed at
fulfilling a wide variety of different purposes. The designs of the various systems
often reflect both the specific mission of the particular project as well as the
organizational and technological conditions under which it was developed.
Number of Records
The illustration below shows the number of records in each system. Most
systems ranged in size from 1 million to 10 million records. Santa
Barbara had around 100,000 records. RxHub had over 150 million, being a
combination of the three largest Pharmacy Benefits Management (PBM) databases
in the country.
[Figure: number of records per system (vertical axis, 0 to 12,000,000) plotted against "Number of Organizations" (horizontal axis, 0 to 8).]
Organizational Structure
Technical Structure
Most systems reported keeping clinical data at the edges -- i.e. at the various
local entities where it is collected -- with only demographic lookup for matching
identity information across multiple MPIs being stored in the center.
Some projects maintain dynamically updating central copies of all the MPIs of the
participating organizations. Queries are then performed across all the MPIs.
Other projects merge these MPIs into one master MPI and queries are performed
on the master MPI only.
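The two central-index designs reported by respondents can be contrasted in a short sketch. The data layout is an assumption: each MPI is modeled as a dictionary from a (name, date of birth) pair to a local patient ID, and matching is exact, which real systems would not rely on.

```python
def query_each_mpi(mpis: dict, name: str, dob: str) -> dict:
    """Design 1: keep a copy of every MPI and search them one by one."""
    hits = {}
    for institution, index in mpis.items():
        local_id = index.get((name, dob))
        if local_id is not None:
            hits[institution] = local_id
    return hits


def build_master_mpi(mpis: dict) -> dict:
    """Design 2: merge all MPIs once into a single master index."""
    master = {}
    for institution, index in mpis.items():
        for person, local_id in index.items():
            master.setdefault(person, {})[institution] = local_id
    return master


def query_master_mpi(master: dict, name: str, dob: str) -> dict:
    # One lookup returns every institution's local ID for the person.
    return dict(master.get((name, dob), {}))
```

Both designs return the same answer for the same data. The difference is where the work happens: the first repeats the search across every MPI on each query, while the second pays the merging cost up front and must be rebuilt or updated as the local MPIs change.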
Local MPIs
A vast majority of systems had one local MPI for each participating institution.
Some systems had some local entities with no local MPI.
Data Quality
Most respondents indicated that data quality and cleanliness varied across
different entities. Some indicated that hospital-based systems generally had
higher data quality than those at smaller practices. Respondents found that they
needed to focus on encouraging local entities to clean up their own data.
What Data is Shared?
Discovery Process
While all systems automatically performed the multi-MPI identity matching step,
the ensuing peer-to-peer data retrieval step was performed automatically in only
some of the systems.
Other systems simply served to inform the user where the information was
located, and it was the user’s job to use other means to access it, including
calling the particular institution by phone to request the records.
Respondents seemed to fall into three categories regarding the approach they
take to handling cases of identity match ambiguity:
- Push ambiguity to the user – The user of the system is presented with the
full list of possible matches, together with machine-generated probabilities
for each match. The user then decides which matches are good and
which are not. While this removes the need for central disambiguation, it
requires everyday users to deal with issues that they would otherwise not
have to.
This varied widely, depending on which of the above approaches the system
took. A number of systems reported using Initiate Systems’ “Identity Hub” MPI
matching product with positive results.
Security
Caching Policies
Systems that do not store clinical data in a central location indicated that they do
not cache clinical data there either. However, not all of them had a policy
regarding the caching of data by the users requesting data from the system.
Some do have a policy: any data retrieved by the system must be stored locally
by the user who requested it, as if the requesting user had originally recorded the
data him- or herself.
In some systems once records are linked, they stay linked. In other systems, they
do not stay linked, and the linking must be re-done every time.
Biggest Development Hurdle
Projects reported significant challenges in the setup phase with data cleanliness
and integration. Some also reported issues with the initial phase of legal efforts to
work out the contracts among the various parties.
Some projects are still dealing with data cleanliness issues. Others are facing
challenges in growing their system to include more organizations. Others face
political pressures with different stakeholders promoting progress in different
directions.
Plans for the future include getting cleaner data in more standardized form, from
more organizations, and covering a wider geographical area. Plans also include
providing access to patients through a patient portal.
Scaling
Most projects do not expect significant scaling issues. One project reported that it
is expecting to scale up its capacity in order to be able to provide EMRs to local
doctors’ offices as an ASP service.
Standards Used
Many different standards are used, as appropriate for the data handled by the
particular system. HL7 adoption is nearly universal. Some projects reported that
they are looking to move towards a Web services model.
Contracts
Survey Text
As part of the Connecting for Health Working Group on Accurately Linking
Information for Healthcare Quality and Safety, we are working to understand
current practices and issues involved in locating patient data in systems with
multiple organizations, each with its own Master Patient Index.
We’d be grateful if you could help us understand the work you’ve done on your
system, in return for which we’ll be happy to share the resulting Final Report with
you when it’s done.
General Notes:
MPI
Discovery
10. How do you discover that a patient has records in another system? (e.g.
Query a central database? Broadcast a message to all other participants?
Ask the patient to identify other places where they have records?)
11. What data do you use to determine a match? (e.g. Name, DoB, Gender?
Social? Address, phone, email? etc.)
12. How well does this work? (e.g. Roughly what percentage of tested
matches require further disambiguation?)
Security and Privacy
Subsequent Operations
16. Once two patient records are linked, do they stay linked, or do they need
to be re-matched every time patient data needs to be retrieved across
entities?
17. If they are re-linked, do the entities involved share indices? Or do they
create a master foreign key? Held by whom?
18. Once the data attached to a patient is shared, is that data held in both
locations, or deleted in the ‘subscribing’ institution to be re-imported later?
19. What has been the biggest hurdle to overcome in getting to where you are
to date?
20. What is the hardest part of operating the system today?
21. What is the biggest opportunity or priority for future improvement?
22. Do you expect to need to scale the system to a larger version than you
have today? If not, why not? If so, what do you imagine the hardest
coming challenge will be?
Post-script:
Connecting for Health is an unprecedented collaborative of over 100 public and
private stakeholders designed to address the barriers to electronic connectivity in
healthcare. It is operated by the Markle Foundation and receives additional support
from The Robert Wood Johnson Foundation. Connecting for Health is committed to
accelerating actions on a national basis to tackle the technical, financial and policy
challenges of bringing healthcare into the information age. Connecting for
Health has demonstrated that blending together the knowledge and experience of
the public and private sectors can provide a formula for progress, not paralysis.
Early in its inception, Connecting for Health convened a remarkable group of government,
industry and healthcare leaders that led the national debate on electronic
clinical data standards. The group drove consensus on the adoption of an initial
set of standards, developed case studies on privacy and security and helped define
the electronic personal health record.