
Gene File Format/Gene Transfer Format

BioCodeKb - Bioinformatics Knowledgebase

The GFF and GTF formats are used to annotate genomic intervals, that is, to record which features (such as genes) occupy which stretch of a sequence.

In bioinformatics, the general feature format (GFF; also called gene-finding format, generic feature format or Gene File Format) is a file format used for describing genes and other features of DNA, RNA and protein sequences.

A GFF file is plain, tab-delimited text, so it can be opened in any text editor. The .gff extension is also associated with SignalMap from NimbleGen Systems Inc., which can open such files. GFF files are produced by resources such as UniProt and are read by genome browsers and viewers such as GBrowse, Jalview, JBrowse and ZENBU. Each feature line consists of the following nine fields:

  1. seqname: Name of the chromosome or scaffold. Chromosome names can be given with or without the 'chr' prefix. The seqname must be one used within Ensembl, such as a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly.

  2. source: Name of the program that generated this feature, or the data source (database or project name).

  3. feature: Feature type name, e.g. Gene, Variation, Similarity.

  4. start: Start position of the feature, with sequence numbering starting at 1.

  5. end: End position of the feature, with sequence numbering starting at 1.

  6. score: A floating point value.

  7. strand: Defined as + (forward) or - (reverse).

  8. frame: One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.

  9. attribute: A semicolon-separated list of tag-value pairs, providing additional information about each feature.
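
As a rough illustration of the nine fields listed above, the sketch below splits one feature line into named, typed values. The class name `Feature`, the function `parse_feature_line` and the sample line are hypothetical, made up for this example. Treating '.' as a missing score or frame is an assumption based on common GFF practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """One feature line, with fields named as in the list above."""
    seqname: str
    source: str
    feature: str
    start: int              # 1-based start position
    end: int                # 1-based end position
    score: Optional[float]  # None when the column holds '.'
    strand: str
    frame: Optional[int]    # None when the column holds '.'
    attribute: str


def parse_feature_line(line: str) -> Feature:
    """Split a tab-separated feature line into its nine fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 fields, found {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = cols
    return Feature(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attribute=attribute,
    )


# Hypothetical sample line; the values are invented for illustration.
example = "1\tExampleSource\tGene\t1000\t2000\t.\t+\t0\tID=gene1;Name=demo"
record = parse_feature_line(example)

# Numbering starts at 1 for both start and end, so the feature
# spans end - start + 1 bases.
print(record.end - record.start + 1)  # prints 1001
```

Because numbering starts at 1, the same feature would correspond to the Python slice `sequence[record.start - 1:record.end]` on a sequence string.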

The gene transfer format (GTF) is a file format used to hold information about gene structure. It is a tab-delimited text format based on the general feature format (GFF): it borrows the GFF layout but adds conventions and structure specific to gene information. A significant feature of GTF is that it can be validated: given a sequence and a GTF file, one can check that the format is correct. This significantly reduces problems with the interchange of data between groups. GTF files are easy to obtain from the UCSC Table Browser and Ensembl, and the format is widely used for storing gene annotations.

Like GFF, a GTF line contains nine fields, which must be tab-separated. Every field except the last must contain a value; an "empty" column is denoted with a '.'.
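
The field rules just described can be checked mechanically. The following is a minimal sketch, not a full GTF validator: the function name `check_columns` and the sample lines are hypothetical, and it tests only the field count and the "no blank column" rule.

```python
from typing import List


def check_columns(line: str) -> List[str]:
    """Return a list of problems found in one feature line (empty if none)."""
    problems = []
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        problems.append(f"expected 9 tab-separated fields, found {len(cols)}")
        return problems
    # Every field except the last must hold a value; '.' marks an empty column.
    for index, value in enumerate(cols[:8], start=1):
        if value.strip() == "":
            problems.append(f"field {index} is empty; use '.' instead")
    return problems


# Hypothetical lines: the first follows the rules, the second leaves the
# score column (field 6) blank instead of writing '.'.
good = '1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";'
bad = '1\tdemo\texon\t100\t200\t\t+\t.\tgene_id "g1"; transcript_id "g1.t1";'

print(check_columns(good))
print(check_columns(bad))
```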

The first eight GTF fields are the same as in GFF. The feature field is also the same, except that it additionally accepts the following optional values: 5UTR, 3UTR, inter, inter_CNS, and intron_CNS. The ninth field, called group in GFF, is expanded in GTF into a list of attributes, each consisting of a type/value pair. It therefore occupies the same position in both formats but differs in content and syntax. Each attribute must end in a semicolon and be separated from any following attribute by exactly one space.
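
To make the attribute syntax concrete, here is a small sketch that turns a GTF attribute field into a dictionary. The function name `parse_gtf_attributes` and the sample values are hypothetical. The sketch assumes values are wrapped in double quotes and contain no semicolons, so it would need more careful tokenizing for real files that break that assumption.

```python
def parse_gtf_attributes(field: str) -> dict:
    """Parse 'type "value"; type "value";' into a {type: value} dictionary."""
    attributes = {}
    for chunk in field.strip().split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final semicolon
        # The type is everything up to the first space; the rest is the value.
        key, _, value = chunk.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


# Hypothetical attribute field with invented identifiers.
sample = 'gene_id "g1"; transcript_id "g1.t1";'
parsed = parse_gtf_attributes(sample)
print(parsed["gene_id"], parsed["transcript_id"])  # prints g1 g1.t1
```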


