8. HIPAA de-identification
Meaningful Use Requirements
AR.FND 06.01 - Provide the capability to remove the identifiers enumerated in Section 164.514(b)(2)(i) of the HIPAA Privacy Rule.
HIPAA De-identification
De-identification of Protected Health Information (PHI) is the process of removing the information that uniquely identifies a patient (e.g., name, address, contact numbers) from the patient's health information.
De-identified health information is used for research, census, and other activities.
Under the HIPAA Safe Harbor method, data is de-identified by removing all 18 identifiers listed below for the individual and for the individual's relatives, employers, and household members, provided the covered entity has no actual knowledge that the remaining information could identify the individual.
The identifiers that must be removed are the following:
1.Names;
2.All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
   (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
   (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000.
3.All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
4.Telephone numbers;
5.Fax numbers;
6.Electronic mail addresses;
7.Social security numbers;
8.Medical record numbers;
9.Health plan beneficiary numbers;
10.Account numbers;
11.Certificate/license numbers;
12.Vehicle identifiers and serial numbers, including license plate numbers;
13.Device identifiers and serial numbers;
14.Web Universal Resource Locators (URLs);
15.Internet Protocol (IP) address numbers;
16.Biometric identifiers, including finger and voice prints;
17.Full face photographic images and any comparable images; and
18.Any other unique identifying number, characteristic, or code (except as permitted by the re-identification rules)
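Items 2 and 3 are the only identifiers that may be generalized rather than removed outright. A minimal Python sketch of both rules follows. `RESTRICTED_ZIP3` holds placeholder values for illustration only (the real list must be derived from current Census data), and the function names are hypothetical.

```python
from datetime import date

# Placeholder example values: 3-digit ZIP prefixes whose combined population
# is 20,000 or fewer. The real set must come from current Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str, restricted: set[str] = RESTRICTED_ZIP3) -> str:
    """Keep only the first three ZIP digits, or '000' for small areas (item 2)."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in restricted else zip3


def generalize_birth_date(birth_date: date, today: date) -> str:
    """Reduce a birth date to its year; ages over 89 become '90+' (item 3)."""
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )
    # For ages over 89 even the year would reveal the age, so drop it.
    return "90+" if age > 89 else str(birth_date.year)
```

Passing `today` explicitly keeps the function deterministic, which makes the 89/90 boundary easy to test.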
Re-identification
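A covered entity (a health plan, health care clearinghouse, or health care provider subject to HIPAA) may assign a code or other means of record identification that allows de-identified information to be re-identified, provided that the code is not derived from or related to information about the individual and cannot otherwise be translated to identify the individual, and that the covered entity does not use or disclose the code for any other purpose or disclose the mechanism for re-identification.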
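A code meets this condition when it is generated independently of the patient's data, for example from a cryptographically secure random source, rather than by hashing a name or medical record number (a hash would be derived from information about the individual). A minimal Python sketch, assuming codes are hex strings and that the set of already-issued codes is available; the function name is hypothetical.

```python
import secrets


def new_reidentification_code(existing_codes: set[str], n_bytes: int = 8) -> str:
    """Return a random code that carries no information about the patient.

    secrets.token_hex draws from the operating system's secure random source;
    the loop only guards against reusing a code that was already issued.
    """
    while True:
        code = secrets.token_hex(n_bytes)
        if code not in existing_codes:
            return code
```

With the default `n_bytes=8`, each code is 16 hex characters (64 random bits), so collisions are rare.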
Re-identified data contains information that uniquely identifies the person (e.g., name, address, contact numbers).
Re-identified PHI is used to carry out activities that benefit individual patients or classes of patients (e.g., delivering relief funds or other measures to affected patients), where the patient's identifying information (name, address, contact numbers, etc.) is needed.
E.g.: The government may want to provide a relief measure for heart disease patients if the number of such patients in the country exceeds a certain limit. The government first obtains de-identified data and uses it to decide whether the amendment should be passed. If it is passed, the persons whom the relief measures should reach must be uniquely identified, so re-identification is performed.
PHI Data Classification
Data in PHI can be divided into two types:
1.Structured data
2.Unstructured data
Structured data:
Structured data may be numerical (e.g., blood pressure readings, lab results) or single words or finite word combinations (e.g., name, address). Because each value sits in a known field, it is easy to decide whether that field may be included in de-identified data.
Unstructured data:
Data contained in an unstructured/free text format can also add to the research capabilities of EMR data, but unstructured data also has the potential risk of containing personal identifying information.
Unstructured data includes patient notes, progress notes, transfer notes, the disease history of the patient's relatives, history data, notification logs, and user text areas.
For unstructured data, lexical look-up tables, regular expressions, and simple heuristics should be used to locate the sensitive data (the 18 identifiers listed by HIPAA).
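Several of the 18 identifiers (phone numbers, e-mail addresses, SSNs, URLs, IP addresses, dates) have regular formats that regular expressions can catch in free text. The Python sketch below assumes US-style formats; the patterns, the `[LABEL]` placeholders, and the function name are illustrative, not a validated rule set.

```python
import re

# Illustrative patterns only. Order matters: SSN runs before PHONE so that
# an SSN is labeled as such.
IDENTIFIER_PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "DATE": re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}|\d{4}-\d{2}-\d{2})\b"),
}


def redact_patterns(text: str) -> str:
    """Replace every pattern match with a [LABEL] placeholder."""
    for label, pattern in IDENTIFIER_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders such as `[PHONE]` keep the redacted note readable for researchers while revealing nothing about the patient.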
Lexical Analysis
Data to be loaded into the lexical look-up table:
• Known names of patients and hospital staff, and values of the other elements that HIPAA requires to be excluded from de-identified data.
• Generic female and male first names, last names, last name prefixes, hospital names, locations, and states, which can be obtained from external sources such as census lists. [This is ignored for the time being]
• In addition, regular expressions should be designed to identify URLs, dates, person names, etc. (e.g., when a title such as “Mr.”, “Dr.”, or “Ms.” is found, the word that follows can be identified as a name).
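The title heuristic in the last bullet can be sketched as a regular expression that captures one or two capitalized words following "Mr.", "Mrs.", "Ms.", or "Dr." ("Mrs." is an added assumption). This formulation is an assumption: it misses names written without a title and can over-capture a capitalized word that directly follows the name. The function name is hypothetical.

```python
import re

# A title, a period, whitespace, then one or two capitalized words.
TITLE_NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")


def find_titled_names(text: str) -> list[str]:
    """Return candidate person names that follow a title in free text."""
    return [m.group(1) for m in TITLE_NAME.finditer(text)]
```

The names found this way could then be split into words and added to the lexical look-up table.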
Lexical analysis procedure:
Prerequisite: the lexical look-up table contains the values for all 18 identifiers from the OpenEMR database, and regular expressions to identify URLs, dates, and person names have been designed.
Steps:
1. Input free text (unstructured data) for lexical analysis
2. Check the input word by word against the data in the lexical look-up table (and apply the regular expressions). If a match is found, replace that word with a placeholder such as “xxx” or “---”.
3. Return the redacted free text.
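Steps 1 to 3 can be sketched in Python as follows, assuming the look-up table is held in memory as a set of single words (a multi-word name such as "Jane Doe" is stored as "Jane" and "Doe") and that matching is case-insensitive. The function name and the default mask are illustrative.

```python
import re


def redact_with_lookup(text: str, lookup: set[str], mask: str = "xxx") -> str:
    """Replace every word found in the lexical look-up table with a mask."""
    normalized = {value.lower() for value in lookup}

    def replace(match):
        word = match.group(0)
        return mask if word.lower() in normalized else word

    # \w+ visits one word at a time; spaces and punctuation are left untouched.
    return re.sub(r"\w+", replace, text)
```

Tokenizing with `re.sub` rather than `str.split` keeps punctuation attached to the surrounding text, so "Mercy." becomes "xxx." instead of being missed.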
Proposed Solution for De-identification:
1.Design a de-identification input screen that lets the user enter the selection criteria for a de-identified data request (e.g., de-identified data of patients with a particular disease).
2.Create a metadata table for de-identification that specifies, for each column of each table, whether it is included in de-identification, included in re-identification, and loaded into the lexical look-up table.
3.Load the lexical look-up table with the values of the 18 identifiers from the OpenEMR database
(based on the metadata table column load_to_lexical_look_up_table).
4.Input selection criteria. (from the de-identification input screen)
5.Obtain the patient IDs that match the selection criteria.
(For each patient ID, check whether a unique re_identification code has already been generated. If not, generate a unique random re_identification code and store it with the patient ID in re_identification_code_table.)
6.Obtain de-identified data for all patients who match the selection criteria (based on the metadata table column include_in_de_identification) and store it in the de_identified_data table.
7.Output de_identified_data in text format (export as a PDF or TXT file).
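Steps 2, 5, and 6 can be sketched in Python as below. `ColumnRule` mirrors the metadata columns named above. The in-memory dictionaries stand in for the database tables, and the `"table.column"` keys and sample column names are hypothetical, not the actual OpenEMR schema.

```python
import secrets
from dataclasses import dataclass


@dataclass
class ColumnRule:
    """One row of the de-identification metadata table (step 2)."""
    table: str
    column: str
    include_in_de_identification: bool
    include_in_re_identification: bool
    load_to_lexical_look_up_table: bool

    @property
    def key(self) -> str:
        return f"{self.table}.{self.column}"


def get_or_create_code(patient_id: int, code_table: dict[int, str]) -> str:
    """Step 5: reuse the patient's code, or generate and store a random one."""
    if patient_id not in code_table:
        used = set(code_table.values())
        code = secrets.token_hex(8)
        while code in used:  # guard against the rare collision
            code = secrets.token_hex(8)
        code_table[patient_id] = code
    return code_table[patient_id]


def de_identify(patients: dict[int, dict[str, str]],
                rules: list[ColumnRule],
                code_table: dict[int, str]) -> list[dict[str, str]]:
    """Step 6: keep only the columns flagged include_in_de_identification."""
    allowed = [r.key for r in rules if r.include_in_de_identification]
    rows = []
    for patient_id, record in patients.items():
        row = {"re_identification_code": get_or_create_code(patient_id, code_table)}
        row.update({k: record[k] for k in allowed if k in record})
        rows.append(row)
    return rows
```

The code is the only link back to the patient, so `code_table` stays with the covered entity and is never exported with the de-identified rows.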
Proposed Solution for Re-identification
Prerequisite: the metadata table for de-identification is available, i.e., the table that specifies, for each column of each table, whether it is included in de-identification, included in re-identification, and loaded into the lexical look-up table.
Steps:
1.Input the list of re_identification codes (imported from a .txt file).
2.Obtain the patient ID for each re_identification code (from re_identification_code_table, which contains the patient ID and re-identification code).
3.Obtain the identifying data for each patient ID (based on the metadata table column include_in_re_identification) and store it in the re_identified_data table.
4.Output re_identified_data in text format (export as a PDF or TXT file).
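The re-identification steps can be sketched in Python under similar assumptions: in-memory dictionaries stand in for `re_identification_code_table` and the patient data, and `identifying_columns` is the list of columns whose `include_in_re_identification` flag is set in the metadata table. The function names are hypothetical. Skipping unknown codes instead of raising an error is a design choice of this sketch, not part of the proposal.

```python
def parse_codes(text: str) -> list[str]:
    """Step 1: read one re-identification code per non-blank line of a .txt file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def re_identify(codes: list[str],
                code_table: dict[int, str],
                patients: dict[int, dict[str, str]],
                identifying_columns: list[str]) -> list[dict]:
    """Steps 2-3: map each code to its patient ID and collect identifying columns."""
    # code_table maps patient ID -> code; invert it for lookup by code.
    patient_by_code = {code: pid for pid, code in code_table.items()}
    rows = []
    for code in codes:
        patient_id = patient_by_code.get(code)
        if patient_id is None:
            continue  # unknown code: skip rather than fail
        record = patients[patient_id]
        row = {"re_identification_code": code, "patient_id": patient_id}
        row.update({c: record[c] for c in identifying_columns if c in record})
        rows.append(row)
    return rows
```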
Status
Implementation Ongoing
Links
- Associated with Sourceforge forum thread: http://sourceforge.net/projects/openemr/forums/forum/202506/topic/3483966