RAMPANT MISUSE OF THE WORD “DE-IDENTIFIED”

January 12, 2021 | Jen Wind, MA, PMP

For some reason, I love words.  I love vocabulary, and I enjoy how words have precise meanings.  I find written communication fascinating; a well-written text is a joy to behold!

Having worked in the world of clinical research for 15 years, I can’t help but be baffled by the rampant misuse of the word “de-identified” as it relates to the classification of clinical data sets.

In the United States, “the HIPAA[1] Privacy Rule provides federal protections for personal health information [PHI] held by covered entities and gives patients an array of rights with respect to that information.”[2] De-identified health information is not PHI and thus is not protected by HIPAA.

It’s a common misconception that if you remove patients’ names from a set of clinical data, it becomes de-identified and is no longer governed by HIPAA. However, to make a set of clinical data truly de-identified, you must remove much more than just the patients’ names. 

Did you know?
  • If a clinical data set includes a subject’s city, county, or zip code, it is not de-identified.
  • If a clinical data set includes the month or day of a subject’s birth date, admission date, discharge date, laboratory test date, or date of death, it is not de-identified.
  • If a clinical data set includes a subject’s medical record number, it is not de-identified.

In fact, there are 18 identifiers[3,4] that must be removed before a data set can be defined as de-identified.

DE-IDENTIFIED data set versus LIMITED data set

In my experience, when people use the word de-identified, they often actually mean limited.[3,4]  Like a de-identified data set, a limited data set cannot contain names, social security numbers, medical record numbers, or several of the other 18 identifiers. Unlike a de-identified data set, however, a limited data set is allowed to contain the full month, day, and year of the birth date, admission date, discharge date, laboratory test date, and date of death, as well as city and zip code. Importantly, unlike a de-identified data set, a limited data set is governed by HIPAA (see table below).

Data elements                                                 De-identified  Limited   Identified
                                                              data set       data set  data set
Is this type of data set governed by HIPAA?                   No             Yes       Yes
Can this type of data set contain the following identifiers?
1.  Names                                                     No             No        Yes
2.  a) State                                                  Yes            Yes       Yes
    b) Street address                                         No             No        Yes
    c) City                                                   No             Yes       Yes
    d) Zip code                                               No             Yes       Yes
3.  All elements of dates (except year) for dates directly
    related to an individual, including birth date,
    admission date, discharge date, date of death             No             Yes       Yes
4.  Telephone numbers                                         No             No        Yes
5.  Email addresses                                           No             No        Yes
6.  Social security numbers                                   No             No        Yes
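
The distinction described above can be read as a simple decision rule. The Python sketch below is one way to express it, and it rests on several assumptions: the field-category names (name, city, full_date, and so on) are hypothetical and not part of any standard, and the check covers only the identifiers discussed in this article, not all 18. A result of "de-identified" therefore means only that none of the listed identifiers was found.

```python
# Hypothetical field categories, chosen for illustration only.
# Identifiers that neither a de-identified nor a limited data set may contain.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "telephone", "email",
    "ssn", "medical_record_number",
}
# Identifiers a limited data set may keep but a de-identified data set may not.
LIMITED_ONLY_IDENTIFIERS = {"city", "zip_code", "full_date"}


def classify_data_set(field_categories):
    """Return 'identified', 'limited', or 'de-identified'.

    Assumption: only the identifiers named in this article are checked,
    so this is a first-pass screen, not a compliance determination.
    """
    fields = set(field_categories)
    if fields & DIRECT_IDENTIFIERS:
        return "identified"
    if fields & LIMITED_ONLY_IDENTIFIERS:
        return "limited"
    return "de-identified"
```

For example, a data set whose only fields are a state, a year of birth, and a zip code would come back as "limited" because of the zip code, even though no names are present.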

So what are the options? How can covered entities use and disclose clinical data in compliance with HIPAA?
  1. DE-IDENTIFICATION 
    Truly de-identified data are not governed by HIPAA, so use and disclosure are allowed.  There are two ways to de-identify data in accordance with HIPAA.[5] The first is Safe Harbor, whereby all 18 identifiers are removed and the covered entity has no actual knowledge that the remaining information could be used to identify an individual. The second is Expert Determination, whereby a qualified expert determines that the risk of re-identifying an individual from the data set is very small.

    If de-identification is not possible, there are still several ways covered entities can use and disclose PHI for research and comply with the privacy rule.

  2. PARTICIPANT AUTHORIZATION 
    If a subject signs a HIPAA Authorization in the context of a clinical trial, either as part of the informed consent form (ICF) or as a separate form, then the specific uses and disclosures of PHI outlined in the authorization are allowed.
     
  3. IRB WAIVER 
    Use and disclosure of PHI for research, without participant authorization, are allowed if a waiver of authorization has been granted by an Institutional Review Board (IRB) or Privacy Board.[3] IRB waivers are only permitted if the use and disclosure of PHI pose no more than minimal risk to individuals’ privacy, the research could not practicably be conducted without the waiver, and the research could not practicably be conducted without use of PHI.
     
  4. PREPARATORY TO RESEARCH
    Use and disclosure of PHI are allowed in certain situations when they are solely to prepare a research proposal, design a research study, or assess the feasibility of conducting a study.
     
  5. RESEARCH ON PHI OF DECEDENTS
    Use and disclosure of PHI are allowed in certain situations when they are solely for research on the PHI of decedents (people who have died).
     
  6. USING A LIMITED DATA SET WITH A DATA USE AGREEMENT
    PHI contained in a limited data set may be used and disclosed for specific purposes, such as research or public health, without authorization or waiver if the covered entity and the user enter into a data use agreement that clearly stipulates how the data may be used and how they will be protected.
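
To make option 1 more concrete, here is a minimal Python sketch of a Safe Harbor-style scrub for a single record. It is an illustration under stated assumptions, not a compliant implementation: the column names are hypothetical, only the identifiers named in this article are handled (not all 18), and special cases in the HHS guidance are ignored. The sketch drops the listed identifier fields and reduces each listed date to its year.

```python
from datetime import date

# Hypothetical column names, for illustration only.
DROP_FIELDS = {
    "name", "street_address", "city", "county", "zip_code",
    "telephone", "email", "ssn", "medical_record_number",
}
DATE_FIELDS = {
    "birth_date", "admission_date", "discharge_date",
    "lab_test_date", "death_date",
}


def scrub_record(record):
    """Return a copy of `record` with listed identifiers removed.

    Assumption: values in DATE_FIELDS are datetime.date objects or None.
    The input dictionary is left unchanged.
    """
    scrubbed = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # identifier field: leave it out entirely
        if key in DATE_FIELDS:
            # Keep only the year; month and day are discarded.
            year_key = key.replace("_date", "_year")
            scrubbed[year_key] = value.year if value is not None else None
        else:
            scrubbed[key] = value
    return scrubbed


record = {
    "name": "Jane Doe",
    "state": "MN",
    "zip_code": "55401",
    "birth_date": date(1980, 5, 17),
    "diagnosis": "asthma",
}
print(scrub_record(record))
```

A scrub like this is only a starting point. Under Safe Harbor the covered entity must also have no actual knowledge that the remaining fields could identify someone, which no field-level filter can establish on its own.
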

Who cares?

This may seem like nothing more than pedantic semantics, but it’s important to be precise when it comes to protecting patients’ right to privacy.  De-identified health information is not PHI and thus is not protected by HIPAA. Limited data sets contain PHI and are protected by HIPAA.  If someone mistakenly thinks that data in their possession is de-identified when it is really limited or identified, they may think it is okay to freely use and disclose those data… but it is not.

The biggest risk of misusing the term “de-identified” is re-identification, in which an individual is identified and singled out from an aggregated data set. Unauthorized access to protected health information can lead to profiling and discrimination, especially of vulnerable populations, in areas such as employment, health, education, and lending.  The ultimate goal is to maximize the benefits of clinical research while minimizing the risk of breaching patient privacy.

Next time you find yourself using the word “de-identified”, I encourage you to pause and think about the weight of that word and whether you truly mean “de-identified”.  Our patients’ privacy is at stake.

 

Key words: De-identified, Limited Data, HIPAA, PHI, Privacy, Discrimination, Clinical Research
________

[1] HIPAA stands for Health Insurance Portability and Accountability Act.

[2] What is PHI? https://www.hhs.gov/answers/hipaa/what-is-phi/index.html, Accessed 21 August 2020.

[3] For a full list of the 18 PHI identifiers, visit: How Can Covered Entities Use and Disclose Protected Health Information for Research and Comply with the Privacy Rule? https://privacyruleandresearch.nih.gov/pr_08.asp, Accessed 21 August 2020.

[4] HHS.gov, Research. https://www.hhs.gov/hipaa/for-professionals/privacy/guidance/research/index.html, Accessed 17 September 2020.

[5] Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html, Accessed 21 August 2020.

[6] Civil Rights Principles for the Era of Big Data. https://civilrights.org/2014/02/27/civil-rights-principles-era-big-data/, Accessed 21 August 2020.
