Your tasks: Data publication

Can you really deposit your data in a public repository?

Description

It can be difficult to decide whether you should publish the data you have at hand. You might hesitate because you have not yet used the data in a publication and don’t want to be scooped, because the data contains personal information about patients, or because the data was collected or produced in a collaboration.

Considerations

  • Publishing data does not necessarily mean making it open access or public. Data can be published with closed or restricted access.
  • Data doesn’t have to be published immediately while you are still working on the project. Data can be made available during the revision of the paper or after the publication of the paper.
  • Make sure to have the rights or permissions to publish the data.
    • Is the data commercially-sensitive?
    • Does the data contain confidential/restricted information?
    • Who controls the data?

Solutions

  • If ethical, legal or contractual issues apply to your data (e.g. personal or sensitive data, confidential or third-party data, copyrighted data, data with potential economic or commercial value, intellectual property or IP data, etc.), ask the Legal Team, Tech Transfer Office or Data Protection Officer of your institute for help.
  • Decide what is the right type of access for your data, for instance:
    • Open access.
    • Registered access or with authentication procedure.
    • Controlled access or via Data Access Committees (DACs).
  • Decide what licence should be applied to your metadata and data.
  • Certain repositories offer solutions for depositing data that need to be under restricted access. This allows data to be findable even when it cannot be published openly. One example is the European Genome-phenome Archive (EGA), which can be used to deposit potentially identifiable genetic and phenotypic human data.
  • Many repositories provide the option to put an embargo on a deposited dataset. This might be useful if you prefer to use the data in a publication before making it available for others to use.
  • Establish an agreement outlining the controllership of the data and each collaborator’s rights and responsibilities.
  • Even if the data cannot be published, it is good practice to publish the metadata of your datasets.
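The access types and the embargo option listed above can be sketched as a small decision helper. This is only an illustration: the `Dataset` fields and the mapping from data characteristics to an access type are assumptions made for the example, and the actual decision should be taken together with your Legal Team or Data Protection Officer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    # Illustrative fields only; not the schema of any real repository.
    contains_personal_data: bool
    has_third_party_restrictions: bool
    embargo_until: Optional[date] = None


def suggest_access(ds: Dataset) -> str:
    """Suggest one of the access types listed above (assumed mapping)."""
    if ds.contains_personal_data:
        return "controlled"  # e.g. requests reviewed by a Data Access Committee
    if ds.has_third_party_restrictions:
        return "registered"  # users authenticate before they get access
    return "open"


def data_visible(ds: Dataset, today: date) -> bool:
    """Data files become visible once any embargo date has been reached."""
    return ds.embargo_until is None or today >= ds.embargo_until
```

Note that the helper only covers the data files. As the last point above says, the metadata can usually be published even while the data itself is restricted or under embargo.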

Which repository should you use to publish your data?

Description

Once you have completed your experiments and performed quality control of your data, it is good scientific practice to share your data in a public repository. Publishing your data is often required by funders and publishers.

The most suitable repository will depend on the data type and your discipline.

Considerations

  • What type of data are you planning to publish?
  • Does the repository need to provide solutions for restricted access for sensitive data?
  • Do you have the rights to publish the data via the repository?
  • How sustainable is the repository? Will the data remain public over time?
  • How FAIR is the repository?
  • Does the funding agency or the scientific journal impose specific requirements regarding data sharing?
  • What are the repository’s policies concerning licences and data reuse?

Solutions

  • Based on the possible ethical, legal and contractual implications of your data, decide which access conditions and licences your (meta)data need.
  • Check if/what discipline-specific repositories can apply the necessary access conditions and licences to your (meta)data.
  • Discipline-specific repositories: if a discipline-specific repository recognised by the community exists, it should be your first choice, since discipline-specific repositories often increase the FAIRness of the data.
  • General-purpose and institutional repositories: For other cases, a repository that accepts data of different types and disciplines should be considered. It could be a general-purpose repository or a centralised repository provided by your institution or university.
  • re3data.org and Repository Finder gather information about existing repositories and allow you to filter them by access and licence type.
  • The re3data.org and FAIRsharing websites list repository features, and you can filter repositories by discipline, data type, taxonomy and many other criteria.
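The selection logic above (prefer a discipline-specific repository that supports the access conditions and licence you need, otherwise fall back to a general-purpose one) can be sketched as follows. The `Repository` record, its fields and the example entries are hypothetical. In practice you would fill such a list by hand from what you find on re3data.org or FAIRsharing. No registry API is called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    # Hypothetical record, filled in by hand from a registry search.
    name: str
    disciplines: frozenset
    access_types: frozenset
    licences: frozenset


def shortlist(repos, discipline, access, licence):
    """Return discipline-specific matches, else general-purpose matches."""
    matches = [
        r for r in repos
        if access in r.access_types and licence in r.licences
    ]
    specific = [r for r in matches if discipline in r.disciplines]
    general = [r for r in matches if "general" in r.disciplines]
    return specific or general


# Made-up entries for illustration; these are not real repositories.
EXAMPLE_REPOS = [
    Repository("ExampleOmicsArchive", frozenset({"genomics"}),
               frozenset({"controlled"}), frozenset({"custom"})),
    Repository("ExampleGeneralRepo", frozenset({"general"}),
               frozenset({"open", "registered"}),
               frozenset({"CC0-1.0", "CC-BY-4.0"})),
]
```

With these entries, a request for controlled-access genomics data under a custom licence would return only the discipline-specific archive, while a request for open genomics data under CC-BY-4.0 would fall back to the general-purpose repository.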

How do you prepare your data for publication in data repositories?

Description

Once you have decided where to publish your data, you will have to make your (meta)data ready for repository submission. For this reason, it is recommended to familiarise yourself with the repository’s requirements before you start collecting the data.

Considerations

  • What file formats should be used for the data?
  • How is the data uploaded?
  • What metadata do you need to provide?
  • Under which licence should the data be published?

Solutions

  • Find out the following about the chosen repository:
    • Required metadata schemes.
    • Required ontologies or controlled vocabularies.
    • Accepted file formats for data and metadata.
    • Costs for sharing and storing data.
  • Repositories generally have information about data formats, metadata requirements and how data can be uploaded under a section called “submit”, “submit data”, “for submitters” or something similar. Read this section in detail.
  • To ensure reusability, data should be released with a clear and accessible data usage licence. We suggest making your data available under licences that permit free reuse of data, e.g. a Creative Commons licence, such as CC0 or CC-BY.
    • Note that a repository may apply one default licence to all datasets. For instance, sequence data submitted to ENA are implicitly free to reuse by others, as specified in the INSDC Standards and policies.
  • See the corresponding pages for more detailed information about metadata, licences and data transfer.
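Before submitting, it can help to check a metadata record against the repository’s requirements. The sketch below assumes a hypothetical list of required fields and treats CC0 and CC-BY (written here with their SPDX identifiers) as the licences that permit free reuse. Replace both with the actual requirements from the repository’s submission guidelines.

```python
# Hypothetical required fields; take the real list from the repository's
# "submit" or "for submitters" documentation.
REQUIRED_FIELDS = ("title", "creators", "description", "licence")

# SPDX identifiers for the licences suggested above.
PERMISSIVE_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


def check_record(record: dict) -> list:
    """Return a list of problems found in a metadata record."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not record.get(field)
    ]
    licence = record.get("licence")
    if licence and licence not in PERMISSIVE_LICENCES:
        problems.append(f"licence {licence!r} may restrict reuse")
    return problems
```

An empty list means the record passed these basic checks. It does not replace the validation the repository itself performs on upload, such as checks against required ontologies or file formats.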

More information

Relevant tools and resources

Tool or resource | Description | Related pages | Registry
ArrayExpress | A repository of array based genomics data | Microbial biotechnology | Tool info, Standards/Databases, Training
b2share | Store and publish your research data. Can be used to bridge between domains | Data storage, Bioimaging data | Standards/Databases
BigNASim | Repository for Nucleic Acids MD simulations | Biomolecular simulation data | Tool info
BioImageArchive | The BioImage Archive stores and distributes biological images that are useful to life-science researchers. | Bioimaging data | Standards/Databases
BioModels | A repository of mathematical models for application in biological sciences | Microbial biotechnology | Tool info, Standards/Databases, Training
BioStudies | A database hosting datasets from biological studies. Useful for storing or accessing data that is not compliant for mainstream repositories. | Microbial biotechnology, Documentation and metadata, Plant sciences | Tool info, Standards/Databases, Training
dbGAP | The database of Genotypes and Phenotypes (dbGaP) archives and distributes data from studies investigating the interaction of genotype and phenotype in Humans | Researcher, Data Steward: infrastructure, Human data | Tool info, Standards/Databases, Training
Dryad | Open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data | Biomolecular simulation data, Bioimaging data | Standards/Databases
e!DAL-PGP | Plant Genomics and Phenomics Research Data Repository | Plant sciences, Plant Genomics, Researcher, Data Steward: research, Data Steward: infrastructure | Standards/Databases
ELIXIR Deposition Databases for Biomolecular Data | List of discipline-specific deposition databases recommended by ELIXIR. | Researcher, Data Steward: research, Data Steward: infrastructure, COVID-19 Data Portal, NeLS, IFB, CSC | Standards/Databases
EMBL-EBI's data submission wizard | EMBL-EBI's wizard for finding the right EMBL-EBI repository for your data. | Researcher, Data Steward: research | -
EMPIAR | Electron Microscopy Public Image Archive is a public resource for raw, 2D electron microscopy images. You can browse, upload and download the raw images used to build a 3D structure | OMERO, Bioimaging data | -
fairsharing | A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. | Documentation and metadata, Data Steward: policy, Data Steward: research, Researcher, Microbial biotechnology, Existing data | Standards/Databases, Training
FigShare | Data publishing platform (different instances available) | Biomolecular simulation data, Bioimaging data | Standards/Databases, Training
GA4GH Data Security Toolkit | Principled and practical framework for the responsible sharing of genomic and health-related data. | Data Steward: policy, Data Steward: research, Data Steward: infrastructure, Human data, Sensitive data | -
Gene Expression Omnibus (GEO) | A repository of MIAME-compliant genomics data from arrays and high-throughput sequencing | Microbial biotechnology, Documentation and metadata, Data transfer, OMERO, Bioimaging data, Toxicology data | -
GitHub | Versioning system, used for sharing code, as well as for sharing of small data | Data organisation, Data Steward: infrastructure, Data Steward: research | Standards/Databases, Training
GitLab | GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider. | Data organisation, Data Steward: infrastructure, Data Steward: research | Standards/Databases, Training
GPCRmd | Repository of GPCR protein simulations | Biomolecular simulation data | Tool info
Image Data Resource (IDR) | A repository of image datasets from scientific publications | Microbial biotechnology, Documentation and metadata, Data transfer, OMERO, Bioimaging data | Tool info, Standards/Databases
Mendeley data | Multidisciplinary, free-to-use open repository specialized for research data | Biomolecular simulation data | Standards/Databases
MetabolomeXchange | A repository of genomics data relating to the study of the metabolome | Microbial biotechnology | Tool info
MoDEL-CNS | Repository for Central Nervous System-related mainly membrane protein MD simulations | Biomolecular simulation data | -
ModelArchive | Repository for theoretical models of macromolecular structures with DOIs for models | Biomolecular simulation data, Structural Bioinformatics | Standards/Databases
NMRlipids | Repository for lipid MD simulations to validate force fields with NMR data | Biomolecular simulation data | -
OpenScienceFramework | Free and open source project management tool that supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery | Biomolecular simulation data | Standards/Databases
PANGAEA | Data Publisher for Earth and Environmental Science | - | Tool info, Standards/Databases
Repository Finder | Repository Finder can help you find an appropriate repository to deposit your research data. The tool is hosted by DataCite and queries the re3data registry of research data repositories. | Researcher, Data Steward: research | -
Scientific Data's Recommended Repositories | List of repositories recommended by Scientific Data, contains both discipline-specific and general repositories. | Researcher, Data Steward: research, Data Steward: infrastructure | -
SSBD:database | Added-value database for biological dynamics images | Bioimaging data | -
SSBD:repository | An open data archive that stores and publishes bioimaging and biological quantitative datasets | Bioimaging data | -
The European Genome-phenome Archive (EGA) | EGA is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects (different instances available) | Human data, Data Steward: policy, CSC, TSD | Tool info, Standards/Databases
Wellcome Open Research - Data Guidelines | Wellcome Open Research requires that the source data underlying the results are made available as soon as an article is published. This page provides information about data you need to include, where your data can be stored, and how your data should be presented. | Researcher, Data Steward: research | -
WorkflowHub | WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows. | Data Steward: research, Researcher | Tool info, Standards/Databases
Zenodo | Generalist research data repository built and developed by OpenAIRE and CERN | Biomolecular simulation data, Bioimaging data | Standards/Databases, Training
National resources
PUBLISSO

Open access publishing platform for life sciences

Researcher Data Steward: research
Fairdata.fi

With the Fairdata Services you can store, share and publish your research data with easy-to-use web tools.

CSC Researcher Data Steward: research Data storage Existing data
Federated EGA Finland

FEGA allows you to store and share sensitive data in Finland in a way that fulfils all the requirements of the General Data Protection Regulation (GDPR).

CSC Researcher Data Steward: research Sensitive data Existing data Human data
Sensitive Data Services for Research

CSC Sensitive Data Services for Research are designed to support secure sensitive data management through web-user interfaces accessible from the user’s own computer

CSC Researcher Data Steward: research Sensitive data Data analysis Data storage Human data
Norwegian COVID-19 Data Portal

The Norwegian COVID-19 Data Portal aims to bundle the Norwegian research efforts and offers guidelines, tools, databases and services to support Norwegian COVID-19 researchers.

Human data Sensitive data Existing data
Norwegian Federated EGA

The federated instance collects metadata of -omics data collections stored in national or regional archives and makes them available for search through the main EGA portal. With this solution, sensitive data will not physically leave the country, but will reside on TSD.

The European Genome-phenome Archive (EGA)
Human data Sensitive data Existing data TSD
usegalaxy.no

Galaxy is an open source, web-based platform for data intensive biomedical research. This instance of Galaxy is coupled with NeLS for easy data transfer.

Galaxy
Data analysis Sensitive data Existing data NeLS
DataverseNO

DataverseNO is a national, generic repository for open research data. Various Norwegian research institutions have established partner agreements about using DataverseNO as their institutional repository for open research data.

DATAVERSE
SciLifeLab Data Repository (Figshare)

A repository for publishing any kind of research-related data, e.g. documents, figures, or presentations.

FigShare
Existing data
NBIS Data Management Consultation

Free consultation service regarding data management questions in life science research.

Data management plan Sensitive data
Swedish COVID-19 Data Portal

The Swedish COVID-19 Data Portal provides information, guidelines, tools and services to support researchers to utilise Swedish and European infrastructures for data sharing.

COVID-19 Data Portal Human data Sensitive data Existing data
Contributors