
Repositories

Introduction

Repositories are locations where digital objects are stored and made available to the public or data users. They can be regarded as the core of data sharing as they provide platforms to acquire, store, archive, publish, curate, preserve, and access data.

Repositories can be classified according to:

  • the type of objects to be stored (e.g. publications or research data),
  • the domain of the data contained (institutional, subject-specific or generic),
  • the storage period of the data (e.g. at least 10 years) or
  • the data access and reuse policy.
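
The four criteria above can be modelled as a small data structure, which makes it easier to compare candidate repositories. The following is only an illustrative sketch: the class `RepositoryProfile`, the function `matching` and all example entries are hypothetical and do not describe real repositories or any registry's data model.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    INSTITUTIONAL = "institutional"
    SUBJECT_SPECIFIC = "subject-specific"
    GENERIC = "generic"


@dataclass(frozen=True)
class RepositoryProfile:
    """Hypothetical profile covering the four classification criteria."""
    name: str
    object_types: frozenset   # e.g. {"publications", "research data"}
    domain: Domain
    min_storage_years: int    # guaranteed storage period
    open_access: bool         # simplified stand-in for the access and reuse policy


def matching(profiles, object_type, domain, min_years):
    """Return the names of profiles that satisfy all three requested criteria."""
    return [
        p.name
        for p in profiles
        if object_type in p.object_types
        and p.domain is domain
        and p.min_storage_years >= min_years
    ]


# Entirely made-up entries for illustration.
EXAMPLES = [
    RepositoryProfile("Example University Server", frozenset({"publications"}),
                      Domain.INSTITUTIONAL, 5, True),
    RepositoryProfile("Example Chemistry Data Repo", frozenset({"research data"}),
                      Domain.SUBJECT_SPECIFIC, 10, True),
    RepositoryProfile("Example Generic Archive",
                      frozenset({"publications", "research data"}),
                      Domain.GENERIC, 25, False),
]

print(matching(EXAMPLES, "research data", Domain.SUBJECT_SPECIFIC, 10))
```

With these made-up entries, only "Example Chemistry Data Repo" stores research data, is subject-specific and guarantees at least 10 years of storage.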

A repository can be an institutional publication server, a subject-specific Open Access (OA) repository, a subject-specific data repository, or a long-term archive for data and publications.

How do repositories work?

Before data are ingested into the repository, curators check their content and quality and, in some cases, legal aspects such as copyright and data protection. They also ensure that the data are directly available to data users.

A repository consists of repository software and a database. Data providers typically transfer data to the repository via a web-based user interface; alternatively, the repository operators collect (harvest) the data automatically from other platforms via appropriate protocols and interfaces.
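
As an illustration of automated harvesting, the sketch below uses OAI-PMH, a widely used harvesting protocol. This choice is an assumption: the text above does not name a specific protocol, and a given repository may use a different interface. The base URL, the helper names `list_records_url` and `extract_titles`, and the sample response are hypothetical; the example parses an embedded response instead of contacting a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(xml_text):
    """Collect all Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(f"{OAI_NS}record"):
        for title in record.iter(f"{DC_NS}title"):
            titles.append(title.text)
    return titles


# Hand-written sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical NMR dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# "repository.example.org" is a placeholder, not a real endpoint.
print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A production harvester would additionally follow the protocol's resumption tokens to page through large result sets and handle protocol-level error responses; both are omitted here for brevity.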

To allow third parties to reuse data, metadata are required in addition to the actual data. Metadata describe the content of the research data and provide information about their creation, the software or methods used, and legal aspects. They can be added manually or provided through other applications. The metadata should also include terms of use in the form of licences that regulate access to the data (registration, embargo, etc.).
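
A minimal sketch of such a metadata record is shown below, assuming a simplified set of fields. The class `DatasetMetadata` and its field names are hypothetical and do not follow any particular metadata schema; real repositories define their own required fields.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical, simplified metadata record for a deposited dataset."""
    title: str
    creators: List[str]
    description: str                                   # content of the research data
    methods: List[str] = field(default_factory=list)   # software or methods used
    licence: str = ""                                  # terms of use, e.g. "CC-BY-4.0"
    embargo_until: Optional[date] = None               # None means no embargo

    def missing_fields(self):
        """List the fields this sketch treats as mandatory but that are empty."""
        required = {
            "title": self.title,
            "creators": self.creators,
            "description": self.description,
            "licence": self.licence,
        }
        return [name for name, value in required.items() if not value]

    def is_accessible(self, today):
        """Data are accessible once any embargo date has been reached."""
        return self.embargo_until is None or today >= self.embargo_until


record = DatasetMetadata(
    title="Hypothetical reaction dataset",
    creators=["A. Researcher"],
    description="NMR and MS data for an example synthesis",
    methods=["1H NMR"],
    licence="CC-BY-4.0",
    embargo_until=date(2030, 1, 1),
)

print(record.missing_fields())
print(record.is_accessible(date(2029, 12, 31)))
```

Checking for missing fields before submission mirrors the curation step described earlier: a record without a licence, for example, cannot tell a third party under which terms the data may be reused.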

Repositories usually offer a search function with which users can find, view and download data. To ensure that data can be permanently referenced and cited, repositories assign unique persistent identifiers (PIDs).
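
One common kind of PID is the Digital Object Identifier (DOI), which can be turned into a stable link through the https://doi.org/ resolver. The sketch below normalises a DOI written in several common forms and builds that link. The function names are hypothetical, the validation pattern is deliberately loose and not an official DOI grammar, and the DOI used is a made-up example.

```python
import re

# Loose check: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
# This is an assumption for illustration, not the official DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes under which DOIs are commonly written.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw):
    """Strip whitespace and a known prefix, then validate the bare DOI."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return doi


def resolver_url(raw):
    """Return the resolver link for a DOI given in any supported form."""
    return "https://doi.org/" + normalise_doi(raw)


# "10.1234/example.5678" is a made-up DOI used only for illustration.
print(resolver_url("doi:10.1234/example.5678"))
```

Storing the bare DOI and generating the link on demand keeps citations stable even if the preferred resolver address changes.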

Finding the right repository

Because so many data repositories exist, a research data repository registry (e.g. https://fairsharing.org/ or https://www.re3data.org/) helps users orient themselves. Repository registries are an essential component of the FAIR (Findable, Accessible, Interoperable, Reusable) strategy: they help researchers navigate thousands of data repositories to find the most suitable one for storing and providing access to their research outputs, and to search for and repurpose FAIR data created by others. DataCite's Repository Finder finds repositories that meet the criteria recommended by the Enabling FAIR Data Project. ROAR and OpenDOAR are directories that list Open Access repositories from all over the world.

Repositories can also be certified (e.g. CoreTrustSeal). Such certification assures data users, among other things, that data will remain usable, citable and preserved in the long term.

A growing number of peer-reviewed journals require authors to deposit their data in a repository (e.g. figshare) before submitting a manuscript. If authors do not comply, the review process is not initiated.

Repositories for Chemistry

In the field of synthetic and analytical chemistry, the Chemotion repository was developed to fill the existing gap in the preservation and publication of experimental work. Chemotion supports scientists in depositing and sharing data in a structured and transparent way. It is free of charge and its software code is available on GitHub. It uses available standardised identifiers such as the reaction InChI (RInChI and RInChIKey). In addition, data can be transferred from an open-source Electronic Laboratory Notebook (ELN) into the workspace area of the repository. The main advantages of the Chemotion repository are:

  • comprehensive functionality for the collection, preparation and reuse of data using discipline-specific methods and data processing tools
  • automated procedures to ease data curation
  • functions for seamless publication and citation of deposited data
  • support for Digital Object Identifier (DOI) generation, comparison of submissions with PubChem instances, and peer-review workflows for submissions, including embargo settings

Core repositories

  • Chemotion Repository: Repository for molecules, reactions and research data
  • MassBank EU: Ecosystem of databases and tools for mass spectrometry reference spectra (open source)
  • SUPRABANK: Curated open resource for intermolecular interactions
  • STRENDA DB: Standards for reporting enzymology data

Associated repositories and databases

  • CSD/CCDC: Cambridge Structural Database
  • ICSD: Inorganic Crystal Structure Database

Other relevant repositories and databases

  • NOMAD: FAIR sharing and use of materials science data
  • ioChem-BD: Repository for computational chemistry results
  • RADAR: Repository for multidisciplinary research data
  • Zenodo: General-purpose open-access repository developed under the European OpenAIRE program and operated by CERN
  • EUDAT CDI: Pan-European network consisting of more than 25 research organisations, data and computing centers
  • ChemSpider: Free chemical structure database providing fast text and structure search access to over 100 million structures from hundreds of data sources
  • PubChem: Open chemistry database at the National Institutes of Health (NIH)

Sources and further information