
Your domain: Human data

Introduction

When you do research on data derived from human individuals, there are additional aspects that must be considered during the data life cycle. Many of the topics discussed on this page refer to the EU General Data Protection Regulation (GDPR), as it is a central piece of legislation that affects essentially all research done on human subjects in the EU and on individuals residing in the EU. Much of the information on this page is of a general nature when it comes to working with human data. An additional focus is on human genomic data and the sharing of such information for research purposes.

Planning for, and collection of, human research data

Description

For research on human data, you must follow established research ethics guidelines and legislation. Preferably, planning for these aspects should be done before you start handling personal data; in some cases, such as under the GDPR, this is a legal requirement.

Considerations

  • Have you got an ethical permit for your research project?
    • To get an ethical permit, you have to apply for an ethical review by an ethical review board.
      • The legislation that governs this differs between countries. Do seek advice from your research institute.
    • In most cases, you should get informed consents from your research subjects.
      • An informed consent is an agreement from the research subject to participate in and share personal data for a particular purpose. It shall describe the purpose and any risks involved (along with any mitigations to minimize those risks) in such a way that the research subject can make an informed choice about participating. It should also state under what circumstances the data can be used for the initial purpose, as well as for later re-use by others.
        • Consider describing data use conditions using a machine-readable formalized description such as DUO. This will greatly improve the possibilities to make the data FAIR later on.
      • Informed consents should be acquired for several reasons:
        • It is a cornerstone of research ethics. Regardless of legal obligations, it is important to ask for informed consents as it is a good research ethics practice and maintains trust in research.
        • Legislation on ethical permission for research on human subjects demands informed consents in many cases.
        • Personal data protection legislation might have informed consent as one legal basis for processing the personal data.
        • Note that the content of an informed consent, as defined by one piece of legislation, might not live up to the demands of another piece of legislation. For example, an informed consent that is good enough for an ethical permit, might not be good enough for the demands of the GDPR.
    • The Global Alliance for Genomics and Health (GA4GH) has recommendations for these issues in their GA4GH regulatory and ethical toolkit.
  • Personal data protection legislation
    • If you are performing research in the EU on human research subjects, or on human research subjects in the EU, you must adhere to the General Data Protection Regulation (GDPR).
      • See Data protection for more information on this law.
      • The sensitivity of your data affects what considerations you have to make when handling it; see Determining the sensitivity of your data for more information.
      • For some sensitive data you have to perform a Data Protection Impact Assessment (DPIA). In general, any biomedical research on human subjects will need one.
    • Outside EU
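
The suggestion above to describe data use conditions in a machine-readable way can be sketched as follows. This is a minimal illustration, not an official DUO tool: the `DatasetUseConditions` class and the dataset identifier are hypothetical, and only a handful of DUO terms are listed, so check the codes and labels against the current DUO release before relying on them.

```python
from dataclasses import dataclass, field

# A small subset of DUO terms, for illustration only.
# Verify codes and labels against the current DUO release.
DUO_TERMS = {
    "DUO:0000042": "general research use (GRU)",
    "DUO:0000006": "health or medical or biomedical research (HMB)",
    "DUO:0000007": "disease specific research (DS)",
    "DUO:0000018": "not for profit, non commercial use only (NPUNCU)",
    "DUO:0000021": "ethics approval required (IRB)",
}


@dataclass
class DatasetUseConditions:
    """Hypothetical record attaching DUO codes to one dataset."""

    dataset_id: str                     # hypothetical identifier
    permission: str                     # one primary DUO permission code
    modifiers: list = field(default_factory=list)  # extra DUO modifier codes

    def __post_init__(self):
        # Reject codes that are not in the local term list.
        for code in [self.permission, *self.modifiers]:
            if code not in DUO_TERMS:
                raise ValueError(f"unknown DUO code: {code}")

    def describe(self):
        """Return human-readable lines for every attached code."""
        return [
            f"{code}: {DUO_TERMS[code]}"
            for code in [self.permission, *self.modifiers]
        ]


# Example: biomedical research only, ethics approval required.
conditions = DatasetUseConditions(
    dataset_id="example-dataset-001",
    permission="DUO:0000006",
    modifiers=["DUO:0000021"],
)
for line in conditions.describe():
    print(line)
```

Recording the codes at consent time, rather than reconstructing them later from free-text consent forms, is what makes the later FAIR sharing step easier.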

Solutions

Processing and analysing human research data

Description

For human data, it is very important to use technical and procedural measures to ensure that the information is kept secure. There might exist legal obligations to document and implement measures to ensure an adequate level of security.

Considerations

  • Establish adequate Information security measures. This should be done for all types of research data, but is even more important for human data.
    • Information security is usually described as containing three main aspects: Confidentiality, Integrity, and Availability.
      • Confidentiality is about measures to ensure that data is kept confidential from those that do not have rights to access the data.
      • Integrity is about measures to ensure that data is not corrupted or destroyed.
      • Availability is about measures to ensure that data can be accessed by those that have a right to access it, when they need to access it.
    • Information security measures are both procedural and technical.
    • The information security measures that need to be established should be defined at the planning stage (see above) as part of a risk assessment, e.g. a GDPR Data Protection Impact Assessment. This should identify information security risks and define measures to mitigate them.
    • Contact the IT or Information security office at your institution to get guidance and support to address these issues.
    • ISO/IEC 27001 is an international information security standard adopted by data centres of some universities and research institutes.
  • Locating tools and platforms suited to handle human data
    • Local research infrastructures might have established compute and/or storage solutions with strong information security measures tailored for working on human data (see some examples below). Contact your institute or your ELIXIR node for guidance.
    • There are also emerging alternative approaches to analyse sensitive data, such as doing “distributed” computation, where defined analysis workflows are used to do analysis on datasets that do not leave the place where they are stored.
  • Data quality. When processing human data, data quality is a very important aspect to consider because it can influence the results of the research. Especially in the healthcare sector, some of the data that is used for research was not collected for research purposes, and therefore it is not guaranteed to have sufficient quality. Check the Data Quality page of the RDMkit to learn more about how to assess the quality of health data.
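
As one concrete example of a technical integrity measure, a checksum can be recorded when a file is stored and compared again later. The sketch below uses SHA-256 from the Python standard library; the function names are illustrative and not part of any tool mentioned on this page. A checksum of this kind detects accidental corruption, but it does not by itself protect against someone who can alter both the file and the recorded checksum.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file still matches the recorded checksum."""
    return sha256_of_file(path) == expected_hex.lower()
```

A typical use is to store the digest next to the file at deposit time and to call `verify_file` after every copy or transfer.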

Solutions

  • EUPID is a tool that allows researchers to generate unique pseudonyms for patients that participate in rare disease studies.
  • RD-Connect Genome Phenome Analysis Platform is a platform to improve the study and analysis of Rare Diseases.
  • DisGeNET is a platform containing collections of genes and variants associated to human diseases.
  • PMut is a platform for the study of the impact of pathological mutations in protein structures.
  • IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes.
  • BoostDM is a method to score all possible point mutations in cancer genes for their potential to be involved in tumorigenesis.
  • Cancer Genome Interpreter is designed to identify tumor alterations that drive the disease and detect those that may be therapeutically actionable.
  • GA4GH’s Data Security and Genomic Data toolkits provide policies and standards for the secure transfer and processing of human genomics data. GA4GH standards are often implemented in multiple tools. For example, the Crypt4GH data encryption standard is implemented in SAMtools and is also provided as a utility by the EGA.
  • GA4GH’s Cloud Workstream is a more recent initiative and focuses on keeping data in secure cloud environments while bringing computational analysis to the data.
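
To illustrate the general idea of pseudonym generation mentioned in the list above, here is a minimal sketch based on a keyed hash (HMAC-SHA256). It is not how EUPID works; the function name, the normalisation step, and the pseudonym length are assumptions made for this example. Note also that pseudonymised data is still personal data under the GDPR, and that the secret key must be stored separately from the data.

```python
import hashlib
import hmac


def make_pseudonym(identifier, secret_key, length=16):
    """Derive a stable pseudonym from an identifier with a keyed hash.

    The same identifier and key always give the same pseudonym, so records
    can be linked, but the identifier cannot be recomputed without the key.
    """
    # Normalise so trivial formatting differences map to the same pseudonym.
    normalised = identifier.strip().upper().encode("utf-8")
    mac = hmac.new(secret_key, normalised, hashlib.sha256)
    return mac.hexdigest()[:length]


# Hypothetical usage: the key would come from a secure key store.
example_key = b"replace-with-a-randomly-generated-secret"
print(make_pseudonym("patient-0001", example_key))
```

Using a different key per project or context gives pseudonyms that cannot be linked across contexts without access to both keys.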

Preserving human research data

Description

It is good ethical practice to ensure that data underlying research is preserved, preferably in a way that adheres to the FAIR principles. There might also exist legal obligations to preserve the data. With human data, you have to take extra precautions when doing this.

Considerations

  • Depositing data in an international repository
    • To make the data as accessible as possible according to the FAIR principles, deposit the data in an international repository under controlled access whenever possible; see the section Sharing and reusing of human research data below.
  • Legal obligations for preserving research data
    • In some countries there are legal obligations to preserve research data long-term, e.g. for ten years.
    • Even if the data has been deposited in an international repository, this might not live up to the requirements of the law.
    • The legal responsibility for preserving the data would in most cases lie with the research institution where you perform your research. You should consult the Research Data and/or IT support functions of your institution.
  • Information security
    • The solutions you use need to provide information security measures that are appropriate for storing personal data, see the section Processing and Analysing human research data above. Note that the providers of the solutions must be made aware that there are probably extra information security measures needed for long-term storage of this type of data.
  • Regardless of where your data is preserved long-term, do ensure that it is associated with proper metadata according to community standards, to promote FAIR sharing of the data.
  • Planning for long-term storage
    • Do address these issues of long-term preservation and data publication as early as possible, preferably already at the planning stage. If you are relying on your research institution to provide a solution, it might need time to plan for this.

Solutions

Sharing and reusing of human research data

Description

To make human research data reusable for others, it must be discoverable, stored in a safe way, and it must be clear under what circumstances it can be reused.

Considerations

  • Selecting suitable access modes for sharing human data
    • Human data often carries restrictions to its use and it would need to be shared in a manner that obeys such restrictions. There are three access modes for sharing research data:
      • Open access: Data is shared publicly. Open access is rarely used for sharing human data. To use open access, researchers need to ensure that the shared data cannot be traced back to individual study participants. In other words, the data needs to be anonymised, which is difficult in practice.
      • Registered access: Data is shared with researchers whose “researcher” status has been vouched for by their institution and who agree to abide by the data usage policies of the repositories that serve the shared data. Datasets that are shared via registered access would typically have no restrictions besides the condition that the data is to be used for research.
      • Controlled access: Data can only be shared with researchers whose research is reviewed and approved by a data access committee (DAC). Typically, researchers who were involved in the primary collection of the data form the DAC. Use conditions for controlled access can be numerous and include allowed research topics, allowed geographical regions, and allowed recipients, e.g. non-profit organisations.
  • Publishing Human Research Data
    • It is highly recommended that Human Research Data is shared under controlled access. There are emerging models of sharing data through repositories under federated models.
    • The European Genome-phenome Archive (EGA) is the prime repository for human genomic and phenotypic data. The EGA applies a controlled access model.
  • Transferring human data
    • Human data has to be transferred in a secure way in order to avoid breaches of privacy. Encrypting human data while it is being transferred protects it if the data is intercepted by an external party during the transfer.
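
The three access modes described above differ mainly in what a requester must have in place before data is released. The following sketch encodes that distinction; the class and field names are hypothetical, and a real repository would check far more (for example, the specific use conditions attached to the dataset).

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    OPEN = "open"
    REGISTERED = "registered"
    CONTROLLED = "controlled"


@dataclass
class Requester:
    """Hypothetical summary of what is known about a requesting researcher."""

    vouched_by_institution: bool = False  # researcher status confirmed
    accepted_usage_policy: bool = False   # agreed to repository policies
    approved_by_dac: bool = False         # request approved by the DAC


def may_access(mode, requester):
    """Return True if the requester satisfies the given access mode."""
    if mode is AccessMode.OPEN:
        # Open data is public; no checks on the requester.
        return True
    if mode is AccessMode.REGISTERED:
        return requester.vouched_by_institution and requester.accepted_usage_policy
    if mode is AccessMode.CONTROLLED:
        # Only a data access committee decision grants access.
        return requester.approved_by_dac
    raise ValueError(f"unknown access mode: {mode}")
```

In this simplified model, a registered researcher without DAC approval can read registered-access datasets but not controlled-access ones.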

Solutions

  • The European Genome-phenome Archive (EGA) is an international service for secure archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical studies and healthcare centres. Human genomic data is considered sensitive data and is protected by the EU GDPR; therefore, access must be restricted to authorised users. The EGA platform offers secure data storage that complies with European law, working with GA4GH standards for encryption and storage. At the same time, data is discoverable on the EGA website and shareable with other researchers through authorisation and authentication protocols. The right to allow access belongs to the data providers (and not to the EGA), who are responsible for signing a Data Access Agreement (DAA) with researchers requesting access to their data. The EGA hosts data from all around the world and distributes it where and when the data providers’ law allows.
  • dbGaP and JGA are other international data repositories, based in the USA and Japan respectively, that adopt a controlled-access model based on their national regulations. Due to specific GDPR requirements, it may not be possible to deposit EU subjects’ data in these repositories.
  • The GA4GH Beacon project is a Global Alliance for Genomics & Health (GA4GH) initiative that enables genomic and clinical data sharing across federated networks. A Beacon is defined as a web-accessible service that can be queried for information about a specific allele with no reference to a specific sample or patient, thereby reducing privacy risks.
  • The GA4GH Data Use Ontology (DUO) is an international standard that provides codes to represent data use restrictions for controlled access datasets.
  • Crypt4GH is a Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format.
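
The Beacon idea described in the list above, answering whether a specific allele has been observed without revealing any sample or patient, can be illustrated with a toy in-memory model. This is not the Beacon API, which is a web service with its own request and response schema; the class names, field names, and coordinate convention below are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleQuery:
    """Hypothetical description of one allele (frozen, so it is hashable)."""

    reference_name: str   # chromosome, e.g. "17"
    start: int            # position; the coordinate convention is an assumption
    reference_bases: str
    alternate_bases: str


class ToyBeacon:
    """Answers only yes or no; it never returns samples or patient records."""

    def __init__(self, observed_alleles):
        # Keep only the set of distinct alleles, not who carried them.
        self._observed = set(observed_alleles)

    def exists(self, query):
        return query in self._observed


# Example data is made up for illustration.
beacon = ToyBeacon([AlleleQuery("17", 43045711, "G", "A")])
print(beacon.exists(AlleleQuery("17", 43045711, "G", "A")))
print(beacon.exists(AlleleQuery("17", 43045711, "G", "T")))
```

Because the beacon stores only distinct alleles, the response carries no link to an individual, which is the property that reduces privacy risk in this model.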

Related pages

More information

Relevant tools and resources

Tool or resource | Description | Related pages | Registry
BBMRI-ERIC's ELSI Knowledge Base | The ELSI Knowledge Base is an open-access resource platform that aims at providing practical know-how for responsible research. | Data protection, Sensitive data, Data Steward: policy, Data Steward: research | -
Beacon | The Beacon protocol defines an open standard for genomics data discovery. | Researcher, Data Steward: research, Data Steward: infrastructure | Tool info, Training
BIONDA | BIONDA is a free and open-access biomarker database, which employs various text mining methods to extract structured information on biomarkers from abstracts of scientific publications. | Data storage, Researcher, Proteomics | Tool info
BoostDM | BoostDM is a method to score all possible point mutations (single base substitutions) in cancer genes for their potential to be involved in tumorigenesis. | Data analysis | Tool info
Cancer Genome Interpreter | Cancer Genome Interpreter (CGI) is designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. | Data analysis | Tool info
ChIPSummitDB | ChIPSummitDB is a database of transcription factor binding sites and the distances of the binding sites relative to the peak summits. | - | Tool info
Consent Clauses for Genomic Research | A resource for researchers when drafting consent forms so they can use language matching cutting-edge GA4GH international standards. | - | -
Crypt4GH | A Python tool to encrypt, decrypt or re-encrypt files, according to the GA4GH encryption file format. | - | -
DAISY | Data Information System to keep a sensitive data inventory and meet the GDPR accountability requirement. | Data Steward: infrastructure, Data Steward: policy, Data protection, TransMed | Tool info
Data Use Ontology | DUO allows datasets to be semantically tagged with restrictions on their usage. | Data Steward: research, Researcher | Standards/Databases, Training
DAWID | The Data Agreement Wizard is a tool developed by ELIXIR-Luxembourg to facilitate data sharing agreements. | Data protection, Data Steward: policy | -
dbGaP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in humans. | Data publication, Researcher, Data Steward: infrastructure | Tool info, Standards/Databases, Training
DisGeNET | A discovery platform containing collections of genes and variants associated to human diseases. | Data analysis, Researcher, Toxicology data | Tool info, Standards/Databases
EU General Data Protection Regulation | Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). | Data protection, Data Steward: policy, TSD | -
EUPID | EUPID provides a method for identity management, pseudonymisation and record linkage to bridge the gap between multiple contexts. | Data Steward: infrastructure, Data Steward: policy | -
GA4GH Data Security Toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | Data publication, Data Steward: policy, Data Steward: research, Data Steward: infrastructure, Sensitive data | -
GA4GH Genomic Data Toolkit | Open standards for genomic data sharing. | Data Steward: research, Data Steward: infrastructure | -
GA4GH Regulatory and Ethics toolkit | Framework for Responsible Sharing of Genomic and Health-Related Data. | Data protection, Sensitive data, Data Steward: policy, Data Steward: research, Data Steward: infrastructure | -
HumanMine | HumanMine integrates many types of human data and provides a powerful query engine, export for results, analysis for lists of data and FAIR access via web services. | Data organisation, Data Steward: research, Researcher, Data analysis | Tool info, Standards/Databases, Training
Informed Consent Ontology | The Informed Consent Ontology (ICO) is an ontology for the informed consent and informed consent process in the medical field. | Data Steward: infrastructure, Data Steward: policy | Standards/Databases
International Compilation of Human Research Standards | The International Compilation of Human Research Standards enumerates over 1,000 laws, regulations, and guidelines (collectively referred to as standards) that govern human subject protections in 133 countries, as well as standards from a number of international and regional organizations. | - | -
IntoGen | IntoGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes. | Data analysis | Tool info
ISO/IEC 27001 | International information security standard. | Data protection, Data Steward: policy | -
MONARC | A risk assessment tool that can be used to do Data Protection Impact Assessments. | Data protection, Data Steward: policy, TransMed | Standards/Databases
OTP | One Touch Pipeline (OTP) is a data management platform for running bioinformatics pipelines in a high-throughput setting, and for organising the resulting data and metadata. | Documentation and metadata, Data management plan, Data analysis | Tool info
PAA | PAA is an R/Bioconductor tool for protein microarray data analysis aimed at biomarker discovery. | Data analysis, Researcher, Proteomics | Tool info
PMut | Platform for the study of the impact of pathological mutations in protein structures. | Data analysis | Tool info
Privacy Impact Assessment Tool | Privacy Impact Assessment Tool is software that allows you to carry out a Privacy Impact Assessment (PIA) independently. | Data protection, Data Steward: policy | -
RD-Connect Genome Phenome Analysis Platform | The RD-Connect GPAP is an online tool for diagnosis and gene discovery in rare disease research. | Researcher | Training
The European Genome-phenome Archive (EGA) | EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. Different instances available. | Data publication, Data Steward: policy, CSC, TSD | Tool info, Standards/Databases
The Genomic Standards Consortium (GSC) | Minimum Information about any (x) Sequence | Documentation and metadata, Researcher, Data Steward: infrastructure, Data Steward: policy | Standards/Databases
Tryggve ELSI Checklist | A list of Ethical, Legal, and Societal Implications (ELSI) to consider for research projects on human subjects. | Sensitive data, Data Steward: policy, Data Steward: research, NeLS, CSC, TSD | -
National resources
Federated EGA Finland

FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR).

CSC Researcher Data Steward: research Sensitive data Data publication Existing data
Findata

The Health and Social Data Permit Authority. Findata offers services and enables secure and efficient utilisation of data materials containing health and social data.

CSC Researcher Data Steward: research Sensitive data Existing data
Fingenious

The Finnish Biobank Cooperative (FINBB) connects researchers to Finnish biomedical research. Via Fingenious® services, the researcher can connect to all Finnish public biobanks.

CSC Researcher Data Steward: research Sensitive data
Sensitive Data Services for Research

CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web user interfaces accessible from the user’s own computer.

CSC Researcher Data Steward: research Sensitive data Data analysis Data storage Data publication
Norwegian COVID-19 Data Portal

The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers.

Sensitive data Existing data Data publication
Norwegian Federated EGA

The federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD.

The European Genome-phenome Archive (EGA)
Sensitive data Existing data Data publication TSD
TSD

TSD (Service for Sensitive Data) is a platform for collecting, storing, analysing and sharing sensitive data in compliance with the Norwegian privacy regulation. TSD is developed and operated by UiO.

Data analysis Sensitive data Data storage TSD
HUNTCloud

The HUNT Cloud, established in 2013, aims to improve and develop the collection, accessibility and exploration of large scale information. HUNT Cloud offers cloud services, lab management, and is a key service that has established a framework for data protection, data security, and data management. HUNT Cloud is owned by NTNU and operated by HUNT Research Centre at the Department of Public Health and Nursing at the Faculty of Medicine and Health Sciences.

Data analysis Sensitive data Data storage
SAFE

SAFE (secure access to research data and e-infrastructure) is a solution for secure processing of sensitive personal data in research at the University of Bergen. SAFE is based on the “Norwegian Code of conduct for information security in the health and care sector” (Normen) and ensures that confidentiality, integrity, and availability are preserved when processing sensitive personal data. Through SAFE, the IT department offers a service where employees, students and external partners get access to dedicated resources for processing of sensitive personal data.

Data analysis Sensitive data Data storage
RETTE

System for Risk and compliance. Processing of personal data in research and student projects at UiB.

Data protection Sensitive data Data Steward: policy Data Steward: research
Swedish COVID-19 Data Portal

The Swedish COVID-19 Data Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing.

COVID-19 Data Portal Sensitive data Existing data Data publication
Human Data Guidelines

Guidelines as well as further information on legal considerations when working with human biomedical data.

Sensitive data
Contributors