
Sequence Retrieval

BioCodeKb - Bioinformatics Knowledgebase

Bioinformatics is an interdisciplinary scientific field that develops methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to produce useful biological knowledge. Bioinformatics uses computers to better understand biology. Databases and information systems are used to store and organize biological data.


The sequence retrieval tool allows nucleotide and protein sequences to be downloaded, including chromosomes, scaffolds, genes, mRNAs, transcript coding sequences, proteins, reftrans contigs and unigene contigs. For sequences that are aligned to larger sequences, such as genes, mRNAs and transcript coding sequences, the number of upstream and downstream bases to include can be entered in the text boxes.
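The upstream/downstream option amounts to a simple coordinate calculation. The minimal Python sketch below assumes 1-based, inclusive feature coordinates on a chromosome string and treats "upstream" as relative to the feature's own strand; `extract_with_flanks` and `reverse_complement` are hypothetical helpers, not the interface of any particular retrieval tool.

```python
# Hypothetical helpers: not the API of any specific sequence retrieval tool.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (IUPAC codes other than N are not complemented)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(chrom_seq, start, end, strand="+", upstream=0, downstream=0):
    """Return a feature's sequence plus flanking bases.

    Assumes 1-based, inclusive coordinates on the forward strand of chrom_seq.
    'Upstream' and 'downstream' are relative to the feature's own strand, and
    flanks are clipped at the chromosome ends.
    """
    if not (1 <= start <= end <= len(chrom_seq)):
        raise ValueError("feature coordinates out of range")
    if upstream < 0 or downstream < 0:
        raise ValueError("flank lengths must be non-negative")
    if strand == "+":
        lo, hi = start - upstream, end + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher forward-strand coordinates.
        lo, hi = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    lo, hi = max(1, lo), min(len(chrom_seq), hi)
    region = chrom_seq[lo - 1:hi]  # 1-based inclusive -> Python slice
    return region if strand == "+" else reverse_complement(region)


chrom = "AAAACCCGGGTTTT"
print(extract_with_flanks(chrom, 5, 10, "+", upstream=2, downstream=2))  # AACCCGGGTT
print(extract_with_flanks(chrom, 5, 10, "-", upstream=2))  # AACCCGGG
```

Returning reverse-strand features as their reverse complement keeps the output in the feature's reading direction, so the upstream bases always come first.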


Retrieving data from different databases requires a search capability provided by a data retrieval system (tool). Some common data retrieval systems are:

  1. Entrez/GQuery

  2. DBGET/LinkDB

  3. Sequence Retrieval System (SRS)

  4. Retrieval system from EMBL-EBI


SRS supports the data structure of the libraries by providing special indices for implementing lists of subentities (e.g. feature tables) or hierarchically structured data fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, the representation of individual data fields within the system (design of indices), and the structuring of other data needed during retrieval. This provides the flexibility required to cope with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports many input and output formats but is particularly well adapted to the GCG programs. Developed at the European Bioinformatics Institute (EBI) in Hinxton, UK, SRS provides a homogeneous interface to over 80 biological databases. These include databases of sequences, metabolic pathways, transcription factors, application results (such as BLAST, SSEARCH and FASTA output), protein 3-D structures, genomes, mappings, mutations and locus-specific mutations.
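The idea of field-specific indices can be illustrated with a small inverted index. This sketch is not SRS's implementation (SRS defines its indices through ODD); it assumes EMBL/UniProt-style flat-file entries in which each line starts with a two-letter field code and `//` ends an entry. The helpers `parse_entries`, `build_index` and `query` and the sample records are hypothetical.

```python
import re
from collections import defaultdict


def parse_entries(text):
    """Split an EMBL/UniProt-style flat file into entries: {field code: [values]}."""
    entries, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
        elif len(line) >= 2 and line[:2].strip():
            current[line[:2]].append(line[2:].strip())
    if current:                            # tolerate a missing final '//'
        entries.append(dict(current))
    return entries


def build_index(entries):
    """One inverted index per field: field -> token -> set of entry IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for entry in entries:
        if "ID" not in entry:
            continue
        entry_id = entry["ID"][0].split()[0]
        for field, values in entry.items():
            for value in values:
                # Hierarchical fields such as OC (taxonomy) are indexed rank by
                # rank, so a query on a higher rank finds every entry below it.
                for token in re.findall(r"[a-z0-9]+", value.lower()):
                    index[field][token].add(entry_id)
    return index


def query(index, **terms):
    """AND-combine field/term pairs, e.g. query(idx, DE="kinase", OC="mammalia")."""
    hits = None
    for field, term in terms.items():
        ids = index.get(field, {}).get(term.lower(), set())
        hits = set(ids) if hits is None else hits & ids
    return hits or set()


# Hypothetical sample records, for illustration only.
SAMPLE = """ID   KIN1_HUMAN  Reviewed; 500 AA.
DE   Serine/threonine-protein kinase 1.
OC   Eukaryota; Metazoa; Chordata; Mammalia.
//
ID   KIN1_YEAST  Reviewed; 480 AA.
DE   Serine/threonine-protein kinase 1.
OC   Eukaryota; Fungi; Ascomycota.
//
ID   GLOB_HUMAN  Reviewed; 146 AA.
DE   Hemoglobin subunit beta.
OC   Eukaryota; Metazoa; Chordata; Mammalia.
//
"""

INDEX = build_index(parse_entries(SAMPLE))
print(query(INDEX, DE="kinase", OC="mammalia"))  # {'KIN1_HUMAN'}
```

Keeping a separate index per field is what lets a query restrict "kinase" to the description line rather than matching it anywhere in the entry.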


DBGET is an integrated database retrieval system developed at the University of Tokyo. Because it offers more limited options, DBGET is generally less recommended than Entrez and SRS.
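DBGET names each entry with a `database:entry` identifier (for example `hsa:7157`). As a hedged sketch, assuming KEGG's public REST API at `https://rest.kegg.jp` is used as a DBGET-style back end, the hypothetical helpers below validate identifiers and build a `get` request URL. The 10-entry limit is an assumption taken from KEGG's API documentation, and only `fetch` touches the network.

```python
from __future__ import annotations

from urllib.request import urlopen

# Assumed endpoint: KEGG's REST "get" operation, which accepts DBGET-style
# "database:entry" identifiers joined with '+'.
KEGG_REST = "https://rest.kegg.jp"
MAX_ENTRIES = 10  # assumption: per-request limit documented by KEGG


def parse_dbget_id(identifier: str) -> tuple[str, str]:
    """Split 'database:entry' into its two parts, e.g. 'hsa:7157' -> ('hsa', '7157')."""
    db, sep, entry = identifier.partition(":")
    if not (sep and db and entry):
        raise ValueError(f"expected 'database:entry', got {identifier!r}")
    return db, entry


def kegg_get_url(identifiers: list[str], option: str | None = None) -> str:
    """Build a get URL; option may be e.g. 'aaseq' or 'ntseq' for FASTA output."""
    if not identifiers or len(identifiers) > MAX_ENTRIES:
        raise ValueError(f"need 1 to {MAX_ENTRIES} identifiers")
    for ident in identifiers:
        parse_dbget_id(ident)  # validate each identifier before building the URL
    url = f"{KEGG_REST}/get/{'+'.join(identifiers)}"
    if option:
        url += f"/{option}"
    return url


def fetch(url: str, timeout: float = 30) -> str:
    """Download the response body as text (network access required)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(kegg_get_url(["hsa:7157"], "aaseq"))  # https://rest.kegg.jp/get/hsa:7157/aaseq
```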

Entrez is a molecular biology database and retrieval system developed by the National Center for Biotechnology Information (NCBI). It is an entry point for exploring distinct but integrated databases.
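Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below assumes the public `esearch.fcgi` (search, returns IDs) and `efetch.fcgi` (download records) endpoints and builds request URLs with the Python standard library; the helper names are hypothetical, and only `fetch_text` performs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """URL that searches an Entrez database and returns matching IDs (XML)."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """URL that downloads records by ID, by default as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def fetch_text(url, timeout=30):
    """Download the response body as text (network access required)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(efetch_url("nuccore", ["NM_000546"]))
# To actually download the record: print(fetch_text(efetch_url("nuccore", ["NM_000546"])))
```

NCBI asks clients without an API key to send no more than three requests per second, so a real script should pause between calls.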


Sequence retrieval systems are mainly used for the following purposes:

  1. To check whether a newly obtained DNA sequence has already been deposited in the databanks, or whether the databanks contain any homologous sequences (sequences derived from a common ancestor).

  2. Given a putative coding ORF, to search for homologous proteins (proteins similar in their fold, structure or function).

  3. To find similar non-coding DNA stretches in the database, such as repeat elements or regulatory sequences.

  4. To locate false priming sites for a set of PCR oligonucleotides.
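Locating false priming sites can be sketched as an approximate string search. This Python sketch is a simplification, not a substitute for dedicated tools such as BLAST or Primer-BLAST: it scans both strands of a template for windows within a mismatch threshold and, as an assumed heuristic, requires the primer's 3'-terminal bases to match exactly because polymerase extension starts there. The function name and default thresholds are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_priming_sites(template, primer, max_mismatches=2, three_prime_exact=3):
    """Return (1-based start, strand, mismatches) for each candidate binding site.

    Coordinates refer to the forward strand of the template. A '-' hit means
    the primer would anneal to the reverse strand at that position.
    """
    template, primer = template.upper(), primer.upper()
    k = len(primer)
    hits = []
    for strand, probe in (("+", primer), ("-", revcomp(primer))):
        for i in range(len(template) - k + 1):
            window = template[i:i + k]
            mismatches = sum(a != b for a, b in zip(window, probe))
            if mismatches > max_mismatches:
                continue
            # Assumed 3' clamp: the primer's 3' end sits at the right end of the
            # probe on '+' and, after reverse-complementing, at the left end on '-'.
            if strand == "+":
                clamp_ok = window[k - three_prime_exact:] == probe[k - three_prime_exact:]
            else:
                clamp_ok = window[:three_prime_exact] == probe[:three_prime_exact]
            if clamp_ok:
                hits.append((i + 1, strand, mismatches))
    return hits


print(find_priming_sites("CCGATTACACCTGTAATCGG", "GATTACA", max_mismatches=0))
# [(3, '+', 0), (12, '-', 0)]
```

A mismatch at the primer's 5' end is tolerated while one at the 3' end is rejected, which mirrors the reasoning that 3'-end pairing matters most for extension.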


