FASTA (Sequence Format)
To follow this topic thoroughly, you should already be familiar with the following:
Sequence Retrieval using NCBI
FASTA is a text-based file format for biological sequences that is
used to organize and store biological data. It is one of the
simplest and most widely used formats in bioinformatics. The biological
sequence can consist of either nucleotides or amino acids, and
each nucleotide or amino acid is represented by a single-letter
code. The format also allows sequence names and comments to
precede the sequences. The first line of a record holds the
description of the sequence, and the second line begins the
sequence itself.
The basic syntax of a typical FASTA record is:
>Description of the sequence………………………………….
The description always starts with the ‘>’ sign and usually contains the accession number and the name of the species the sequence comes from.
The sequence is written in the single-letter codes for nucleotides or amino acids standardized by IUB/IUPAC.
Each row of the sequence usually holds 70 to 80 letters, and each letter represents the corresponding nucleotide or amino acid.
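The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference parser: the function names `parse_fasta` and `format_fasta`, the default row width of 70, and the record used in the usage lines are all assumptions made for this example, and the sequence shown is made up rather than real data.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' line starts a new record, so emit the previous one first.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence rows are joined back into one continuous string.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def format_fasta(description: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record with the sequence split into rows of `width` letters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + description] + rows)


# Usage with a made-up record (hypothetical identifier and sequence).
record = format_fasta("demo_1 made-up example", "ACGT" * 40)
for desc, seq in parse_fasta(record.splitlines()):
    print(desc, len(seq))
```

Joining the rows back into one string reflects the fact that the line breaks inside a sequence carry no biological meaning; they only keep the file readable.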
Like other formats, FASTA has its own filename extensions under
which it is stored, and each extension denotes a specific type of
sequence, as follows:
fasta: generic FASTA
fna: FASTA nucleic acid
ffn: FASTA nucleotide of gene regions
faa: FASTA amino acid
frn: FASTA non-coding RNA
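As a small illustration, the extension list can be turned into a lookup table. This is a sketch under stated assumptions: the names `FASTA_EXTENSIONS` and `describe_fasta_file` and the example filenames are hypothetical, and the table contains only the five extensions listed above.

```python
from pathlib import Path

# Extensions from the list above, mapped to the kind of sequence they denote.
FASTA_EXTENSIONS = {
    ".fasta": "generic FASTA",
    ".fna": "FASTA nucleic acid",
    ".ffn": "FASTA nucleotide of gene regions",
    ".faa": "FASTA amino acid",
    ".frn": "FASTA non-coding RNA",
}


def describe_fasta_file(filename: str) -> str:
    """Return the sequence type implied by the filename extension, or 'unknown'."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    return FASTA_EXTENSIONS.get(suffix, "unknown")


# Hypothetical filenames, used only for illustration.
print(describe_fasta_file("lct_mrna.fna"))
print(describe_fasta_file("notes.txt"))
```

The extension is only a naming convention, so a check like this says what a file claims to contain, not what is actually inside it.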
In this tutorial, we have learned that FASTA is one of the simplest and most widely used text-based formats, and we have covered its syntax and its filename extensions. We used the mRNA sequence of Homo sapiens lactase (LCT); you can also retrieve whichever sequence you require.