Guidelines for genomics & transcriptomics data

Guidelines for COVID-19 data

Make your COVID-19 research data useful and accessible to the rest of the research community by publishing it in a public repository together with descriptive metadata. For help with planning, see the collection of national Data Management Resources. Examples include the Greek instance of the Data Stewardship Wizard, a platform that facilitates drafting data management plans (DMPs), and DMPlanner, an extensible and searchable catalogue of DMP resources.

Repositories

We recommend submitting raw virus sequence data, as well as assembled and annotated genomes, to the European Nucleotide Archive (ENA). See the ENA SARS-CoV-2 submission documentation for details. Before submitting raw sequence data (e.g. from shotgun sequencing), you must remove contaminating human reads.
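Removing human reads usually happens in two steps. First, identify which reads derive from the host, for example by mapping them to a human reference genome. Second, drop those reads from the files you will submit. The following Python sketch covers only the second step. It assumes the set of human read IDs has already been produced upstream, and it treats the first whitespace-delimited token of each FASTQ header as the read ID. The function names (`parse_fastq`, `read_id`, `remove_host_reads`) are hypothetical and not part of any ENA tooling.

```python
"""Sketch: drop reads flagged as human from FASTQ data before submission.

ASSUMPTION: the IDs of human reads were determined upstream (e.g. by mapping
to a human reference genome); this script only performs the removal step.
"""
from typing import Iterable, Iterator, List, Set, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record at {header!r}") from None
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        yield header, seq, sep, qual


def read_id(header: str) -> str:
    """Return the read ID: the first whitespace-delimited token after '@'."""
    return header[1:].split()[0]


def remove_host_reads(lines: Iterable[str], human_ids: Set[str]) -> Tuple[List[str], int]:
    """Return (lines of non-human records, number of records removed)."""
    kept: List[str] = []
    removed = 0
    for header, seq, sep, qual in parse_fastq(lines):
        if read_id(header) in human_ids:
            removed += 1
            continue
        kept.extend([header, seq, sep, qual])
    return kept, removed


if __name__ == "__main__":
    example = [
        "@read1 sample=A", "ACGT", "+", "IIII",
        "@read2 sample=A", "TTGA", "+", "IIII",
    ]
    clean, n_removed = remove_host_reads(example, human_ids={"read2"})
    print("\n".join(clean))
    print(f"removed {n_removed} human read(s)")
```

For real data, `lines` can be an open file handle, for example `gzip.open(path, "rt")` for compressed FASTQ. For paired-end data, apply the same ID set to both files so that the mates stay in sync. This sketch assumes both mates share an identical first header token.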

Host (human) sequence data requires restricted access. ELIXIR Greece is building a local federated instance of the European Genome-phenome Archive (EGA), which will allow sensitive personal data to be published within a legal framework. Until the local EGA instance is available, such datasets should remain in the secure analysis environment.

Metadata

Metadata provides ‘data about data’. It may include the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, any assumptions made, the format and file type of the data, and the software used to collect and/or process the data. Researchers are strongly encouraged to use community metadata standards where these exist.

MINSEQE (Minimum Information about a high-throughput Nucleotide SEQuencing Experiment) is the preferred minimal metadata standard for transcriptomics data in general. For viral data, consider using the ENA virus pathogen reporting standard checklist.

We strongly recommend structuring metadata, such as sample metadata, from the very beginning of the project. The goal is a structure that lets you submit sequence data without reformatting the metadata.
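One way to follow this advice is to keep sample metadata in a single tab-separated table with one row per sample. The column names should mirror the fields of the checklist you intend to submit against, and the table should be checked for missing values as samples are added. The following Python sketch illustrates that idea. The column names in `REQUIRED_FIELDS` are illustrative placeholders modelled loosely on ENA-style field names. They are assumptions, not the authoritative list: check the actual ENA checklist for the exact field names and for which fields are mandatory. The functions `load_metadata` and `missing_values` are hypothetical helpers.

```python
"""Sketch: keep sample metadata in a submission-ready table from day one.

ASSUMPTION: REQUIRED_FIELDS below is an illustrative placeholder list; replace
it with the field names and mandatory fields of the real checklist you use.
"""
import csv
import io
from typing import Dict, List

# Hypothetical field list modelled loosely on ENA-style names; verify against
# the actual checklist before relying on it.
REQUIRED_FIELDS = [
    "sample_alias",
    "collection date",
    "geographic location (country and/or sea)",
    "host scientific name",
]


def load_metadata(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated metadata table and check the required columns exist."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    absent = [f for f in REQUIRED_FIELDS if f not in header]
    if absent:
        raise ValueError(f"metadata table lacks columns: {absent}")
    return list(reader)


def missing_values(rows: List[Dict[str, str]]) -> List[str]:
    """List every (row, field) pair where a required value is empty or absent."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for field in REQUIRED_FIELDS:
            # DictReader fills missing trailing cells with None, so guard for it.
            if not (row.get(field) or "").strip():
                problems.append(f"row {i}: missing '{field}'")
    return problems


if __name__ == "__main__":
    table = (
        "sample_alias\tcollection date\t"
        "geographic location (country and/or sea)\thost scientific name\n"
        "S001\t2020-03-15\tGreece\tHomo sapiens\n"
        "S002\t\tGreece\tHomo sapiens\n"
    )
    for problem in missing_values(load_metadata(table)):
        print(problem)
```

The design choice here is to validate early and continuously. If the table already uses the checklist's column names, submission becomes a matter of exporting it rather than reformatting it at the end of the project.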