Define your data sharing scope
Before sharing any sequence data, determine whether the dataset contains human genomic variants or pathogen sequences. This distinction dictates the legal framework: GDPR for human data, or the Nagoya Protocol and GISAID terms for biological resources. Confusing these categories leads to compliance failures.
Human genomic variants
If your data involves human subjects, it is classified as personal data under the General Data Protection Regulation (GDPR) in the EU and similar laws globally. Genomic data is uniquely identifying and often cannot be fully anonymized. You must ensure any sharing mechanism respects data subject rights and has a lawful basis, such as explicit consent or public interest.
Pathogen sequences
Pathogen sequence data, such as viral genomes, may fall under the Convention on Biological Diversity (CBD) and is often shared through dedicated databases like GISAID. GISAID access is free for those who register, agree to the Database Access Agreement, and provide proper attribution. Unlike traditional open data, GISAID requires users to adhere to specific terms regarding data usage and credit, particularly when publishing clinical or epidemiological insights.
Start by cataloging the origin and type of your sequences. If human, consult your Data Protection Officer to assess GDPR obligations. If pathogenic, review the GISAID Database Access Agreement or relevant national biosafety regulations. Clear scoping prevents costly rework later in the sharing process.
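The cataloging step can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the `SequenceRecord` fields, the `KNOWN_PATHOGENS` set, and the guidance strings are all hypothetical and would need to match your own sample catalog.

```python
from dataclasses import dataclass

# Hypothetical organism list and guidance text; replace with your own catalog.
KNOWN_PATHOGENS = {"sars-cov-2", "influenza a virus"}

NEXT_STEP = {
    "human": "Consult your Data Protection Officer about GDPR obligations.",
    "pathogen": "Review the GISAID terms or relevant national biosafety regulations.",
    "needs_review": "Origin unclear: resolve before sharing.",
}


@dataclass
class SequenceRecord:
    sample_id: str
    organism: str  # organism name as recorded by the lab


def classify(record):
    """Assign a record to 'human', 'pathogen', or 'needs_review'."""
    name = record.organism.strip().lower()
    if name == "homo sapiens":
        return "human"
    if name in KNOWN_PATHOGENS:
        return "pathogen"
    return "needs_review"


def catalog(records):
    """Group sample IDs by category so each group goes to the right review."""
    groups = {category: [] for category in NEXT_STEP}
    for record in records:
        groups[classify(record)].append(record.sample_id)
    return groups
```

Anything that lands in `needs_review` is deliberately not guessed at: an unrecognized organism is held back until someone confirms which framework applies.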
Anonymize human genomic variants
Sharing human genomic data requires stripping identifiers that could link a sequence back to a specific person. This process, known as de-identification, is essential for complying with genetic privacy laws while maintaining scientific utility. Without proper anonymization, shared sequence data remains vulnerable to re-identification attacks, even when names are removed.
The following steps outline the technical workflow for preparing genomic variants for public sharing. These steps align with standards from major repositories like GISAID and the International Nucleotide Sequence Database Collaboration (INSDC), which mandate strict data governance for open access.
By following this sequence, you can share valuable genomic data while protecting participant privacy. Always consult the latest guidelines from the target data repository, as privacy standards evolve rapidly in response to new re-identification techniques.
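As one illustration of the metadata side of de-identification, the sketch below drops direct identifiers from a sample's metadata and replaces the sample ID with a keyed hash. The column names in `DIRECT_IDENTIFIERS` and the `P-` prefix are assumptions made for the example. It only touches metadata: the variants themselves can remain identifying, so this is not a complete anonymization procedure.

```python
import hashlib
import hmac

# Hypothetical column names; adjust to your own metadata schema.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "medical_record_number"}


def pseudonymize_id(sample_id, secret_key):
    """Replace a sample ID with a keyed hash that cannot be recomputed without the key."""
    digest = hmac.new(secret_key, sample_id.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:16]


def deidentify(row, secret_key):
    """Return a copy of one metadata row without direct identifiers."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["sample_id"] = pseudonymize_id(row["sample_id"], secret_key)
    return cleaned
```

A keyed hash (HMAC) is used rather than a plain hash so that someone who knows the original ID format cannot simply hash candidate IDs and match them. The key has to stay with the data provider and must never be shared alongside the data.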
Use GISAID for pathogen sequences
GISAID (Global Initiative on Sharing All Influenza Data) is a widely used repository for pathogen sequences, particularly for viruses like influenza and SARS-CoV-2. Data access is governed by its Database Access Agreement (DAA). Unlike open databases that may have fewer attribution requirements, GISAID is designed so that contributors receive proper credit and users adhere to specific data use terms.
To share pathogen sequence data ethically and legally through GISAID, follow this workflow:

Using GISAID’s structured approach helps balance the need for rapid global data sharing with the rights of data contributors. By adhering to the DAA, you help maintain a trusted ecosystem where scientists can collaborate without legal ambiguity.
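Since attribution is central to these terms, it can help to check credit information before publishing anything built on downloaded records. The sketch below is only an illustration: the dictionary keys `accession`, `originating_lab`, and `submitting_lab` are hypothetical names, not a GISAID export format, and the wording of the credit line is an assumption rather than an official template.

```python
def missing_credit(records):
    """List accessions whose lab fields are empty, so gaps surface before publication."""
    return [
        rec["accession"]
        for rec in records
        if not rec.get("originating_lab") or not rec.get("submitting_lab")
    ]


def acknowledgment_rows(records):
    """Build one credit line per record, sorted by accession for stable output."""
    rows = []
    for rec in sorted(records, key=lambda r: r["accession"]):
        rows.append(
            f'{rec["accession"]}: originating lab {rec["originating_lab"]}; '
            f'submitting lab {rec["submitting_lab"]}'
        )
    return rows
```

Running `missing_credit` first and fixing every accession it reports keeps incomplete credit lines out of the final acknowledgment table.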
Draft a data use agreement
A Data Use Agreement (DUA) is the legal boundary that prevents your shared sequence data from being misused. Without it, recipients may publish findings before you, use the data for commercial gain, or share it with unauthorized third parties. This document turns a simple data transfer into a controlled, accountable exchange.
You need to define exactly who can access the data, for what purpose, and for how long. The agreement should explicitly forbid re-identification attempts and mandate secure storage. By setting these rules upfront, you protect the privacy of the individuals behind the sequences and maintain your institution’s compliance with regulations like HIPAA or GDPR.
Key clauses to include
Start by defining the Scope of Use. Specify whether the data is for academic research, public health monitoring, or commercial development. Any use outside this scope constitutes a breach. Be specific about permitted analyses to avoid ambiguity later.
Next, outline Data Security Requirements. Recipients must store the data on encrypted, access-controlled servers. They should not share the raw sequences with anyone not listed in the agreement. Include a clause requiring immediate notification if a security breach occurs, so you can mitigate potential privacy risks.
Finally, establish Publication and Attribution Rules. Many disputes arise when recipients publish results without citing the data provider, or without waiting for the original team to publish first. A clear timeline for review and acknowledgment protects your intellectual contribution while ensuring fair credit.
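The clauses above can also be recorded in a machine-readable form, so that access requests are checked against the agreement rather than against memory. The following is a minimal sketch, not legal tooling: the `DataUseAgreement` class, its field names, and the purpose labels are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataUseAgreement:
    recipients: set        # named people or institutions allowed access
    permitted_uses: set    # e.g. {"academic_research"}
    expires: date          # last day on which access is allowed
    embargo_until: date    # recipients may not publish before this date

    def may_access(self, who, purpose, on):
        """True only for a listed recipient, an in-scope purpose, and an unexpired term."""
        return (
            who in self.recipients
            and purpose in self.permitted_uses
            and on <= self.expires
        )

    def may_publish(self, who, on):
        """True once the embargo protecting the original team has passed."""
        return who in self.recipients and on >= self.embargo_until
```

Every check defaults to refusal: a recipient, purpose, or date that the agreement does not cover returns `False`, which mirrors the rule that any use outside the stated scope is a breach.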
Official templates and resources
Do not start from scratch. Use existing frameworks that are already tested in the scientific community. The Global Initiative on Sharing All Influenza Data (GISAID) provides a robust Database Access Agreement that sets a high standard for sequence data sharing. Their model ensures that users identify themselves and adhere to strict ethical guidelines.
Similarly, the International Nucleotide Sequence Database Collaboration (INSDC), whose member databases are GenBank, ENA, and DDBJ, publishes policies and submission guidelines. These resources provide a baseline for language that is widely recognized by institutions and journals. Adapting these templates saves time and reduces the risk of legal oversights.
Verify compliance before upload
Before you submit sequence data to a public repository or share it with a partner, you must confirm that all legal and ethical obligations are satisfied. This final check prevents accidental breaches of patient privacy or institutional agreements.
Review your consent forms to ensure they explicitly cover the intended use of the data. If the data includes genomic sequences, verify that no protected health information (PHI) remains attached to the metadata. Check that any required data use agreements are signed and on file.
Confirm that the receiving platform’s terms align with your sharing goals. For example, GISAID requires users to agree to its Database Access Agreement, which keeps access free while maintaining attribution standards. Similarly, the INSDC guarantees free access but relies on strict submission guidelines to maintain data integrity.
- Consent forms cover public release
- PHI stripped from metadata
- Data use agreements signed
- Repository terms reviewed
Only proceed with the upload once every item in this checklist is marked complete. This step protects your institution and the subjects whose data you are sharing.
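The checklist can be enforced as a simple gate in a submission script. This is a minimal sketch: the item strings mirror the checklist above, while the `status` dictionary and the function names are hypothetical.

```python
CHECKLIST = [
    "Consent forms cover public release",
    "PHI stripped from metadata",
    "Data use agreements signed",
    "Repository terms reviewed",
]


def outstanding_items(status):
    """Return checklist items not explicitly marked complete, in checklist order."""
    return [item for item in CHECKLIST if not status.get(item, False)]


def ready_to_upload(status):
    """True only when every checklist item is marked complete."""
    return not outstanding_items(status)
```

An item that is absent from `status` counts as incomplete, so forgetting to record a check blocks the upload instead of letting it through.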
Common questions about sequence data
Users often confuse technical sequence generators with global biological data sharing. Below are specific answers regarding Informatica IICS and GISAID access.
