Your tasks: Sensitive data

Is your data sensitive?

Description

In general, the term “sensitive data” is used for any data that could cause harm (for example to people, organisations, countries, or ecosystems) if it were openly available. Examples include personal or commercial information, but also information such as the breeding grounds of endangered species. Any such data must be protected against unauthorized access. What is considered sensitive information is usually regulated by national laws and may differ between countries. You should be cautious when you are dealing with sensitive, or potentially sensitive, information.

Considerations

If you deal with any information about individuals from the EU, you are bound by the General Data Protection Regulation (GDPR). In GDPR, such data is called “personal data”.
In the context of GDPR, “special category data” is a subclass of “personal data” that is potentially even more harmful, and GDPR prescribes very strict rules for dealing with this data. Article 9 of GDPR defines the special categories as data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as genetic data, biometric data, data concerning health, and data concerning a natural person’s sex life or sexual orientation. Confusingly, these special categories are sometimes colloquially called “sensitive data”. Note that this page is concerned with the broader definition of “sensitive data”.
Information in Life Science projects is for the most part categorised as health and genetic data, and is therefore considered special category data under the GDPR.
You need to assess whether or not your dataset contains personally identifying attributes. Note that attributes that are not identifiable on their own can be identifiable in combination.
You need to know the de-identification status of your data. Life Science research data rarely contains directly identifying attributes. Research data would typically be pseudonymised or anonymised. If you work with personal data, you must understand the difference between these two (see under de-identification below).
For some studies there is a cohort owner, often a clinical party or a trusted third party that can map study participant keys back to names and surnames. Such data is considered pseudonymous.
If there are no means to map the data back to individuals, then the data is considered anonymous and is out of the scope of the GDPR.
You should keep in mind that anonymising data is a notoriously difficult task. Does your dataset contain a wide array of attributes, or exhibit unique traits or patterns such that one can reasonably expect that no more than a dozen people in the world share them? In that case, you cannot assume that it is anonymous. Such data runs the risk of being linked back to individuals through various technical means. You also need to take into account that the technical means to identify people may become more powerful than they are right now; data that is anonymous today may not remain anonymous forever.
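
One rough way to check whether attributes that look harmless on their own become identifying in combination is to count how many records share each combination of values. The sketch below is illustrative only: the records, the attribute names (`birth_year`, `postcode`, `diagnosis`) and the helper `singleton_records` are hypothetical. A record that is not unique within your dataset may still be identifiable when linked with other data sources, so a clean result here does not show that the data is anonymous.

```python
from collections import Counter


def singleton_records(records, attributes):
    """Return the records whose combination of values for `attributes` occurs only once."""
    def key(record):
        return tuple(record[a] for a in attributes)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]


# Made-up example data
records = [
    {"birth_year": 1980, "postcode": "0150", "diagnosis": "asthma"},
    {"birth_year": 1980, "postcode": "0151", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "0150", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "0151", "diagnosis": "diabetes"},
]

# Neither attribute singles anyone out on its own ...
print(len(singleton_records(records, ["birth_year"])))  # 0
print(len(singleton_records(records, ["postcode"])))    # 0
# ... but together they single out every record
print(len(singleton_records(records, ["birth_year", "postcode"])))  # 4
```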

Solutions

Identify the legislation and regulations that you are expected to follow. Your institution’s website may give you hints on where you can look for information about sensitive data.
If you cannot determine if your data is sensitive, contact someone with expert knowledge in that area.

How can you de-identify your data?

Description

Data anonymization is the process of irreversibly modifying personal data in such a way that subjects cannot be identified directly or indirectly by anyone, including the study team. If data are anonymized, no one can link data back to the subject.

Pseudonymization is a process where identifying fields within data records are replaced by artificial identifiers called pseudonyms or pseudonymized IDs. Pseudonymization ensures that no one can link data back to the subject, apart from nominated members of the study team who are able to link pseudonyms to identifying records, such as name and address.

Data anonymization involves modifying a dataset so that it is impossible to identify a subject from their data. Pseudonymization involves replacing identifying data with artificial IDs, for example, replacing a healthcare record ID with an internal participant ID only known to a named clinician working in the study.
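
As a minimal sketch of the pseudonymization described above, the following Python code replaces identifying fields with a random participant ID and keeps the link between ID and identity in a separate key table. The function name `pseudonymize`, the field names and the `P-` prefix are assumptions made for illustration; a real study would also need secure, access-controlled storage for the key table, which is not shown here.

```python
import secrets


def pseudonymize(records, identifying_fields):
    """Split records into shareable rows and a separate key table.

    Returns (shared_rows, key_table), where key_table maps each
    pseudonym to the identifying values that were removed.
    """
    shared_rows, key_table = [], {}
    for record in records:
        # Random ID, not derived from the identity, so it cannot be reversed
        pseudonym = "P-" + secrets.token_hex(8)
        while pseudonym in key_table:  # regenerate in the unlikely case of a collision
            pseudonym = "P-" + secrets.token_hex(8)
        key_table[pseudonym] = {f: record[f] for f in identifying_fields}
        row = {f: v for f, v in record.items() if f not in identifying_fields}
        row["participant_id"] = pseudonym
        shared_rows.append(row)
    return shared_rows, key_table


# Made-up example data
records = [
    {"name": "Jane Doe", "address": "1 Example Street", "diagnosis": "asthma"},
    {"name": "John Roe", "address": "2 Example Street", "diagnosis": "diabetes"},
]
shared, key = pseudonymize(records, ["name", "address"])
print(sorted(shared[0]))  # ['diagnosis', 'participant_id']
```

Only `shared` would be passed on; `key` stays with the nominated member of the study team. Note that deleting the key table does not by itself make the shared rows anonymous, because the remaining attributes may still identify a person.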

Considerations

Both anonymization and pseudonymization are approaches that comply with the GDPR.
Simply removing identifiers cannot guarantee data anonymity. A dataset may contain unique traits or patterns that could identify individuals. An example would be recording two seemingly unrelated attributes, such as the occurrence of a rare disease and the country of residence, where there is only a single case of this disease in that country.
Data that is currently anonymous may not be anonymous in the future. Future datasets on the same individual may disclose their identity.
Anonymization techniques can sometimes damage the statistical properties of the data, for example when translating the current age of a participant into an age range.
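
The age-range example can be sketched in a few lines of Python. The helper `age_to_range`, the ten-year band width and the ages are assumptions chosen for illustration. The point is only that, once ages are replaced by bands, statistics can be computed from the bands alone and no longer match the exact values.

```python
from statistics import mean


def age_to_range(age, width=10):
    """Map an exact age to a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [31, 34, 38, 52, 57]  # made-up exact ages
bands = [age_to_range(a) for a in ages]
print(bands)  # ['30-39', '30-39', '30-39', '50-59', '50-59']

# From the generalised data, only band midpoints can be used as estimates
midpoints = [(int(lo) + int(hi)) / 2 for lo, hi in (b.split("-") for b in bands)]
print(mean(ages), mean(midpoints))  # 42.4 42.5
```

Wider bands hide individuals better but distort the data more, so the band width is a trade-off between privacy and analytical value.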

Solutions

An example of pseudonymization is where participants in a study are assigned a non-identifying ID and all identifying data (such as name and address) are removed from the metadata to be shared. The mapping of this ID to personal data is held separately and securely by a named researcher who will not share this data. There are well-established data anonymization approaches, such as k-anonymity, l-diversity, and differential privacy.
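
To make two of these notions more concrete, the sketch below computes k (for k-anonymity) and a simple "distinct" variant of l (for l-diversity) for a small made-up table. The function name `k_and_l`, the column names and the choice of quasi-identifiers are assumptions for illustration; deciding which attributes count as quasi-identifiers is a judgement that depends on the dataset. Differential privacy works differently (it adds calibrated noise to query results) and is not covered here.

```python
from collections import defaultdict


def k_and_l(records, quasi_identifiers, sensitive_attribute):
    """Return (k, l) for a non-empty list of records.

    k: size of the smallest group of records sharing the same quasi-identifier values.
    l: smallest number of distinct sensitive values found within any such group.
    """
    groups = defaultdict(list)
    for record in records:
        group_key = tuple(record[q] for q in quasi_identifiers)
        groups[group_key].append(record[sensitive_attribute])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


# Made-up example data
records = [
    {"age_range": "30-39", "country": "NO", "diagnosis": "asthma"},
    {"age_range": "30-39", "country": "NO", "diagnosis": "diabetes"},
    {"age_range": "50-59", "country": "FI", "diagnosis": "asthma"},
    {"age_range": "50-59", "country": "FI", "diagnosis": "asthma"},
]
print(k_and_l(records, ["age_range", "country"], "diagnosis"))  # (2, 1)
```

Here every combination of age range and country is shared by two records (k = 2), but in the second group both records have the same diagnosis (l = 1), so knowing that someone belongs to that group already reveals their diagnosis.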

More information

Relevant tools and resources

Tool or resource	Description	Related pages	Registry
Amnesia	Amnesia is a GDPR-compliant, high-accuracy data anonymization tool.
BBMRI-ERIC's ELSI Knowledge Base	The ELSI Knowledge Base is an open-access resource platform that aims at providing practical know-how for responsible research.	Data protection Data Steward: policy Data Steward: research Human data
ELIXIR-AAI	The ELIXIR Authentication and Authorisation Infrastructure (AAI)	NeLS TSD TransMed	Training
GA4GH Data Security Toolkit	Principled and practical framework for the responsible sharing of genomic and health-related data.	Data publication Data Steward: policy Data Steward: research Data Steward: infrastructure Human data
GA4GH Regulatory and Ethics toolkit	Framework for Responsible Sharing of Genomic and Health-Related Data	Data protection Data Steward: policy Data Steward: research Data Steward: infrastructure Human data
Nettskjema	Form and survey tool, also for sensitive data	TSD
Tryggve ELSI Checklist	A list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects	Data Steward: policy Data Steward: research Human data NeLS CSC TSD
TSD	Norwegian Services for sensitive data	TSD Data storage	Training

National resources

Federated EGA Finland	FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR).	CSC Researcher Data Steward: research Data publication Existing data Human data
Findata	The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data.	CSC Researcher Data Steward: research Existing data Human data
Fingenious	Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services the researcher can connect to all Finnish public bio banks.	CSC Researcher Data Steward: research Human data
Sensitive Data Services for Research	CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer	CSC Researcher Data Steward: research Data analysis Data storage Data publication Human data
Norwegian COVID-19 Data Portal	The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers.	Human data Existing data Data publication
Norwegian Federated EGA	Federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD. The European Genome-phenome Archive (EGA)	Human data Existing data Data publication TSD
usegalaxy.no	Galaxy is an open source, web-based platform for data intensive biomedical research. This instance of Galaxy is coupled with NeLS for easy data transfer. Galaxy	Data analysis Existing data Data publication NeLS
Educloud Research	Educloud Research is a platform provided by the Centre for information Technology (USIT) at the University of Oslo (UiO). This platform provides access to a work environment accessible to collaborators from other institutions or countries. This service provides a storage solution and a low threshold HPC system that offers batch job submission (SLURM) and interactive nodes. Data up to the red classification level can be stored/analysed.	Data analysis Data storage
TSD	The TSD – Service for Sensitive Data, is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO.	Human data Data analysis Data storage TSD
HUNTCloud	The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large scale information. HUNT Cloud offers cloud services, lab management, and is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences.	Human data Data analysis Data storage
SAFE	SAFE (secure access to research data and e-infrastructure) is a solution for secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures that confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing sensitive personal data.	Human data Data analysis Data storage
RETTE	System for Risk and compliance. Processing of personal data in research and student projects at UiB.	Human data Data protection Data Steward: policy Data Steward: research
NBIS Data Management Consultation	Free consultation service regarding data management questions in life science research.	Data management plan Data publication
Swedish COVID-19 Data Portal	The Swedish COVID-19 Data Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing.	COVID-19 Data Portal Human data Existing data Data publication
Human Data Guidelines	Guidelines as well as further information on legal considerations when working with human biomedical data.	Human data