Genetic Database Website

· 5 min read

This article will be useful for researchers performing comparative genomics, scientists identifying clinically relevant variants, molecular biologists, and students and educators seeking reference sequences.

Three of the main global repositories, GenBank (USA), DDBJ (Japan) and ENA (Europe), exchange data daily as part of the International Nucleotide Sequence Database Collaboration (INSDC). Other major repositories, such as China's GSA, further contribute to global data sharing.

INSDC (USA (GenBank), Japan (DDBJ) and Europe (ENA))

Data submitted to any of these databases are automatically exchanged on a daily basis, ensuring the global dissemination of sequence records across all three platforms. This systematic synchronization guarantees consistent and equivalent access to identical datasets, irrespective of the INSDC partner used for data submission or retrieval.
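Because records are mirrored across the partners, an accession mainly tells you where a record was first submitted, not where it can be retrieved. For raw read data, run accessions carry a prefix per archive (SRR, ERR, DRR). The helper below is a minimal sketch of that mapping; the function name and the label strings are illustrative assumptions, not part of any INSDC tool.

```python
# Prefixes of run accessions issued by each INSDC read archive.
RUN_PREFIX_TO_ARCHIVE = {
    "SRR": "NCBI SRA (USA)",
    "ERR": "ENA (Europe)",
    "DRR": "DDBJ DRA (Japan)",
}


def issuing_archive(run_accession: str) -> str:
    """Return the archive that issued a run accession such as 'ERR000001'."""
    prefix = run_accession.strip().upper()[:3]
    if prefix not in RUN_PREFIX_TO_ARCHIVE:
        raise ValueError(f"not an INSDC run accession: {run_accession!r}")
    return RUN_PREFIX_TO_ARCHIVE[prefix]


if __name__ == "__main__":
    # Illustrative accessions only; any of them can be retrieved from any partner.
    for accession in ("SRR000001", "ERR000001", "DRR000001"):
        print(accession, "->", issuing_archive(accession))
```

Whichever archive issued the accession, the same run should be retrievable from the other two partners once the daily exchange has taken place.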

USA (NCBI)​

GenBank represents one of the largest and most comprehensive publicly accessible repositories of nucleotide sequence data worldwide. Submitted records undergo both automated and manual curation procedures aimed at ensuring data integrity, accuracy, and compliance with established formatting and annotation standards. The database is closely integrated within the broader NCBI ecosystem, encompassing resources such as PubMed, BLAST, RefSeq, dbSNP, and ClinVar. This integration facilitates a wide range of research activities, including similarity searches using BLAST, access to high-quality curated reference sequences, linkage of genomic data with the scientific literature, identification of clinically relevant variants, and seamless navigation across interconnected biological databases.

How to download data from this repository?​

GenBank sequence records may be downloaded from the FTP site or accessed using NCBI's E-utilities API. SRA sequence records are available through the SRA Toolkit or on the Amazon Web Services (AWS) and Google Cloud Platform (GCP) clouds. SRA availability on cloud platforms enables rapid access to large datasets.
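As a rough illustration of the E-utilities route, the sketch below builds an EFetch request for a single nucleotide record and, optionally, downloads it. The helper names are assumptions made for this example, and the accession is only a placeholder; for heavy use NCBI also asks clients to identify themselves and respect rate limits, which this sketch does not handle.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the EFetch endpoint of NCBI E-utilities.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an EFetch URL for one record in the nucleotide database."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": rettype,   # "fasta", or "gb" for a GenBank flat file
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "fasta") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(build_efetch_url(accession, rettype), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # NC_045512.2 is used purely as an illustrative accession.
    print(build_efetch_url("NC_045512.2"))
```

Calling `fetch_record("NC_045512.2", rettype="gb")` would request the same record as a GenBank flat file instead of FASTA.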

The SRA Toolkit is a collection of command-line utilities provided by NCBI for accessing, downloading, and converting sequencing data stored in the Sequence Read Archive (SRA). It is the primary toolset used by researchers to work with raw high-throughput sequencing data locally or within automated pipelines.
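A common pattern in pipelines is to call two of the toolkit's utilities in sequence: `prefetch` to download a run, then `fasterq-dump` to convert it to FASTQ. The wrapper below is a minimal sketch of that pattern, assuming the SRA Toolkit is installed and on PATH; the function names and the default output directory are choices made for this example.

```python
import shutil
import subprocess
from typing import List


def sra_download_commands(run_accession: str, outdir: str = "fastq") -> List[List[str]]:
    """Return the two SRA Toolkit commands that fetch a run and convert it to FASTQ."""
    return [
        # prefetch downloads the .sra archive for the run.
        ["prefetch", run_accession],
        # fasterq-dump converts it to FASTQ; --split-files writes paired
        # reads to separate files, --outdir sets the output directory.
        ["fasterq-dump", "--split-files", "--outdir", outdir, run_accession],
    ]


def download_run(run_accession: str, outdir: str = "fastq") -> None:
    """Run both commands in order, stopping at the first failure."""
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; is the SRA Toolkit installed?")
    for command in sra_download_commands(run_accession, outdir):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to log or inspect the exact commands before any large download starts.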

Submission​

Submit to the world's largest public repository of biological and scientific information through the NCBI Submission Portal.

Japan (DDBJ)​

DDBJ offers efficient support for next-generation sequencing (NGS) data, including large-scale datasets such as metagenomic and transcriptomic data. Its infrastructure is optimized for high-volume data deposition, making DDBJ an efficient option for projects involving extensive NGS output or requiring rapid, large-scale data submission workflows.

Services: GEA, MetaboBank, BioProject, BioSample, MSS, NSSS, AGD, JGA (Japanese Genotype-phenotype Archive), NBDC Human Database, ARSA, getentry, DDBJ Search, TXSearch, GGGenome, GGRNA, DFAST, CRISPRdirect, Gendoo, RefEx, DDBJ Core Database, DDBJ-LD, TogoVar, TogoVar-repository, VecScreen.

DRA​

DDBJ Sequence Read Archive (DRA) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. DRA is part of the International Nucleotide Sequence Database Collaboration (INSDC) and archives data in collaboration with the NCBI Sequence Read Archive and the EBI European Nucleotide Archive.

How to download data from this repository?​

You can download data files in formats such as FASTQ and SRA.
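Once a FASTQ file has been downloaded, each read occupies four lines: an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence. The reader below is a minimal sketch that assumes this unwrapped four-line layout and an uncompressed text file; the class and function names are illustrative, and established libraries such as Biopython handle more variants.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per read."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.strip()
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    # Two made-up reads, used only to show the record layout.
    example = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n"
    for record in read_fastq(io.StringIO(example)):
        print(record.read_id, len(record.sequence))
```

For gzip-compressed downloads, the same function can be given a handle opened with `gzip.open(path, "rt")`.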

DRA Submission​

Create a DDBJ account and register a public key to your account. Upload data files to the submission directory on the file server.

Europe (ENA)​

ENA supports a broad spectrum of data types, ranging from raw next-generation sequencing (NGS) reads to assembled genomes and annotated sequences. It is closely integrated with other EMBL-EBI databases (e.g., UniProt, Ensembl), enabling richer biological insights. It is also convenient for uploading data in accordance with European legislation requirements.

Access to ENA data is provided through the browser, through search tools, through large scale file download and through the API.

How to download data from this repository?​

Providing users with the ability to download submitted data for further analysis purposes is a key part of ENA’s mission. Files are therefore made available through a public FTP server.
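One way to locate the FTP files for a given run or study is to ask the ENA Portal API for a file report and read the FASTQ paths from it. The sketch below assumes the `filereport` endpoint with the `read_run` result type and the `fastq_ftp` field, where multiple paths are separated by semicolons and given without a scheme; the helper names and the sample report are made up for illustration.

```python
import csv
import io
from typing import Dict, List
from urllib.parse import urlencode

FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession: str) -> str:
    """Build a file-report request for a run, experiment, or study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{FILEREPORT_URL}?{urlencode(params)}"


def fastq_urls_from_report(tsv_text: str) -> Dict[str, List[str]]:
    """Map each run accession in a TSV file report to its FASTQ FTP URLs."""
    urls: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Paired-end runs list several paths separated by ';'.
        paths = [path for path in row["fastq_ftp"].split(";") if path]
        urls[row["run_accession"]] = ["ftp://" + path for path in paths]
    return urls


if __name__ == "__main__":
    # Hypothetical report text; real reports come from build_filereport_url().
    sample = (
        "run_accession\tfastq_ftp\n"
        "RUN1\tftp.example.org/reads/RUN1_1.fastq.gz;ftp.example.org/reads/RUN1_2.fastq.gz\n"
    )
    print(fastq_urls_from_report(sample))
```

The resulting URLs can then be passed to any FTP-capable downloader.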

ENA provides several ways to access the data it hosts, suiting a range of use cases and levels of computational ability: the browser, search tools, large-scale file download, and the API.

For submitting and updating data, see ENA's Getting Started guide.

China (GSA)​

A key strength of GSA is its strong support for next-generation sequencing (NGS) data, including whole-genome, transcriptomic, epigenomic, and metagenomic datasets.

These centers offer access to specialized datasets, analytical tools, training resources, and computational infrastructure. They are particularly useful when working with population-specific datasets or region-specific research initiatives.

How to download data from this repository?​

Public datasets can be downloaded directly from the web interface or via FTP. On each dataset page, a Download option is available.

Submitting and updating data

  • Account registration
  • Create a project and samples
  • Prepare metadata
  • Upload files
  • Validation and release

There are also additional databases such as:​

By leveraging these global resources, the scientific community can efficiently share data, maintain data integrity, and accelerate genomic research, contributing to a more open, collaborative, and data-driven future in molecular biology and genomics.


Top 5 Major DNA Sequencing Equipment Producers

· 2 min read

Here is a comprehensive list of companies that produce DNA sequencing equipment, spanning established multinational corporations and innovative startups.

Major DNA Sequencing Equipment Producers​

  • Illumina: Globally dominant, known for the NovaSeq, MiSeq, and HiSeq platforms. The company has driven innovation and cost reductions in high-throughput sequencing.
  • Thermo Fisher Scientific: Offers a diverse range of sequencers for various research and clinical needs, including capillary and next-generation sequencing platforms[1][2][3][6][7].
  • MGI/BGI: Chinese manufacturer specializing in high-throughput sequencing (DNBSEQ technology), widely used in research and commercial applications.
  • Oxford Nanopore Technologies: Known for portable devices like MinION and GridION enabling real-time, direct DNA/RNA sequencing.
  • Pacific Biosciences (PacBio): Focuses on long-read sequencing platforms, instrumental for highly accurate and complex genomics tasks.

Other Notable Companies​

  • Element Biosciences: AVITI sequencing platform for scalable, cost-effective applications.
  • Ultima Genomics: Developing ultra-high-throughput, affordable sequencing solutions for large projects.
  • Genia (acquired by Roche): Involved in developing nanopore sequencing technology.
  • GeneMind: Specializes in DNA sequencer R&D for molecular diagnostic platforms.
  • Vela Diagnostics, Verogen: Known for specialized clinical and forensic sequencing devices.

Table: Representative Producers and Platforms​

| Company | Flagship Technology/Platform |
| --- | --- |
| Illumina | NovaSeq, MiSeq, HiSeq |
| Thermo Fisher | Ion Torrent, Applied Biosystems |
| MGI/BGI | DNBSEQ-G50/T7/2000 |
| Oxford Nanopore | MinION, GridION |
| PacBio | Sequel, RS II |
| Element Biosciences | AVITI |
| Ultima Genomics | UG 100 |