The General Data Protection Regulation (GDPR) only applies to personal data, which is data that relates to “an identified or identifiable natural person” (a human). The central concepts of “identifiability” and of “personal data” are broadly defined in the GDPR, which can create challenges for actors in the health sector hoping to determine what data is regulated by the GDPR and what data is not.
Personal data can consist of “any information” relating to an individual through an identifier. The identifier can be a “name, an identification number, location data, an online identifier” and can also include factors that refer to the concerned person with reference to “the physical, physiological, genetic, mental, economic, cultural or social identity” of that person. For instance, a characteristic, code, or object that relates to a natural person may be considered that person’s identifier. Whole genome sequence data represents a rich network of potential identifiers. This makes such data difficult to anonymize, i.e. render non-personal.
Data can relate to the individual in “content,” “purpose,” or “effect.” Data relates to an individual in content if the substance of the data concerns the person. Data relates to an individual in its purpose if its intended use is to affect the interests of the concerned person, or to make determinations about that person. Last, data relates to a person in its effect if its processing is likely to have the result of affecting that person’s interests, even if such an outcome is not the primary purpose of the processing.
Abstract reasoning or concepts used in making decisions about an individual will not necessarily be personal data.
Regarding the “identifiability” of personal data, a contextual, risk-based approach should be adopted. Consequently, the identifiability of data is assessed relative to the factual circumstances of the data’s use, and relative to other existing datasets that might reasonably be linked to the data, rather than in the abstract. Further, data should only be considered identifiable if there is a foreseeable risk of re-identification. All means that are “reasonably likely” to be used to perform re-identification must be considered in determining if the individual is identifiable. According to the Court of Justice of the European Union, this test would include all means except those that are practically impossible to use, or those that are illegal.
Data can be identifiable even if the controller or processor using the data is not able to perform re-identification themselves. This could be the case, for instance, if third parties can perform the re-identification of de-identified genetic data by combining such data with other accessible sources of genetic data held in ancestry databases.
For practical purposes, some academic commentators recommend assessing the likelihood that an individual could be re-identified as a result of the controller’s data use, and considering the degree to which such data use could affect the interests of the concerned individuals. This is consistent with risk-based approaches to GDPR compliance.
Last, personal data in the GDPR is distinct from similar concepts in other laws, such as Canadian data privacy legislation or the United States’ Health Insurance Portability and Accountability Act (HIPAA). International research consortia could thus face difficulties in establishing consistent policies regarding data governance, as regulated data may differ across participating jurisdictions.
Alexander Bernier is a Montreal-based lawyer and an Academic Associate at McGill University’s Centre of Genomics and Policy.