
Retrieve an entire Genome

BioCodeKb - Bioinformatics Knowledgebase

After a genome has been sequenced, assembled, and annotated, it needs to be stored and shared in a format that is easily and freely accessible to all. This is typically done through a public database, often presented as a genome browser, where the stored genome data can be used by anyone at any time. This in turn created a need for data retrieval methods.


Researchers and scientists need genome data for:

  • Analysing an organism's functional and evolutionary history, which requires combining disparate data from a variety of sources

  • Building reliable information resources that compile data on sequenced genomes and link it to the wealth of associated functional data

  • Comparative genomics studies

  • Work toward personalized medicine

  • Understanding the blueprint for building any organism

  • Learning more about the functions of genes and proteins, knowledge that will have a major impact on medicine, biotechnology, and the life sciences

  • Identifying regions of DNA that have other important roles, such as the regulation of genes


The amount of genome-related information stored in public databases and freely available to anyone with Internet access is enormous. Experience shows, however, that many researchers who should benefit the most from this information are not comfortable navigating these databases, let alone assessing the reliability of the data.


To retrieve an entire genome sequence, users should first check whether the genome, proteome, CDS, RNA, GFF, GTF, or genome assembly statistics of interest are available for download.


Using the scientific name of the organism of interest, users can check whether the corresponding genome is available.


List of genome retrieval databases

  • NCBI Genomes

  • Ensembl Genomes

  • Personal Genome Project

  • GMOD Project

  • ENCODE

  • GenBank


Sequence retrieval from NCBI

The Genome database contains sequence and map data from the whole genomes of over 1000 species or strains. It covers both completely sequenced genomes and those with sequencing in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles.


Visit the NCBI site, select the “Genome” database, and enter the name of the organism whose genome is required.
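The lookup step above can also be scripted. The following is a minimal sketch, assuming the public NCBI E-utilities `esearch` endpoint and its JSON response layout (an `esearchresult` object with a `count` field); neither is mentioned in this article. The database name `genome`, the `[Organism]` search field, and the helper names are illustrative assumptions, and NCBI may prefer a different database (for example `assembly`) for current queries.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(organism, db="genome"):
    """Build an esearch URL that looks up a scientific name in one database."""
    params = {
        "db": db,                          # Entrez database to search (assumed name)
        "term": f"{organism}[Organism]",   # restrict the match to the organism field
        "retmode": "json",                 # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_hit_count(response_text):
    """Return the number of matching records reported in an esearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"])


def genome_available(organism, db="genome"):
    """Query NCBI (network access required) and report whether any record matches."""
    url = build_search_url(organism, db)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_hit_count(response.read().decode("utf-8")) > 0
```

Calling `genome_available("Mus musculus")` performs the network request; the URL builder and the parser are kept separate so they can be checked without contacting NCBI.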


A results page opens. From there, the FTP site lets you download the genome sequence for an organism, all the cDNA, genes, proteins, or ncRNAs for a species, and more. For example, you can get the whole mouse genome sequence, all the proteins in the human genome, or the genes for zebrafish.


When retrieving genomes, you can also download GenBank files, gene sets in GTF format, or the MySQL tables themselves.


Sequence retrieval from Ensembl

See the README file in the directory for general information about the organization of the FTP files.

  1. Locate the directory for your organism of interest. Within that directory, a README file describes the various files available. In many cases, the genome sequence data is split into separate directories or files for each chromosome.

  2. Use any FTP client to download the data.
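The two steps above can be sketched with Python's standard `ftplib` module acting as the FTP client. This is a sketch under stated assumptions, not an official Ensembl procedure: the host `ftp.ensembl.org`, the `/pub/current_fasta/<species>/dna/` directory layout, and the `.dna.chromosome.` file naming pattern are not given in this article and may change, so check the README in the directory first. All function names are hypothetical.

```python
from ftplib import FTP

# Assumption: Ensembl's public FTP host and directory layout.
ENSEMBL_FTP_HOST = "ftp.ensembl.org"


def dna_directory(species):
    """Map a scientific name such as 'Homo sapiens' to its assumed DNA FASTA directory."""
    folder = species.strip().lower().replace(" ", "_")
    return f"/pub/current_fasta/{folder}/dna/"


def chromosome_files(filenames):
    """Keep only per-chromosome FASTA files, based on the assumed naming pattern."""
    return sorted(
        name for name in filenames
        if ".dna.chromosome." in name and name.endswith(".fa.gz")
    )


def list_directory(species):
    """Step 1: list the files in the organism's directory (network access required)."""
    with FTP(ENSEMBL_FTP_HOST, timeout=60) as ftp:
        ftp.login()                      # anonymous login
        ftp.cwd(dna_directory(species))
        return ftp.nlst()


def download(species, filename, destination):
    """Step 2: download one file from the organism's directory in binary mode."""
    with FTP(ENSEMBL_FTP_HOST, timeout=60) as ftp, open(destination, "wb") as out:
        ftp.login()
        ftp.cwd(dna_directory(species))
        ftp.retrbinary(f"RETR {filename}", out.write)
```

A typical session would call `list_directory("Homo sapiens")`, pass the result to `chromosome_files` to pick the per-chromosome files, and then call `download` for each file needed. Whole-genome files can be several gigabytes, so downloading one chromosome at a time is often more practical.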
