Gene Ontology (GO) enrichment analysis is a technique for interpreting sets of genes using the Gene Ontology classification system, in which genes are assigned to a set of predefined bins according to their functional characteristics. For example, the gene FasR is categorized as a receptor, as involved in apoptosis, and as located on the plasma membrane.
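The "bins" can be pictured as a mapping from each gene to the set of GO terms annotated to it. The following is a minimal sketch, assuming a toy in-memory annotation table. GeneA and GeneB are hypothetical, and the term labels are plain-English stand-ins rather than real GO identifiers. Inverting the mapping gives the genes in each term's bin.

```python
from collections import defaultdict

# Toy annotation table (hypothetical): gene -> set of GO term labels.
# Real annotations use GO IDs of the form "GO:NNNNNNN"; plain labels are used here for readability.
gene_to_terms = {
    "FasR": {"receptor activity", "apoptotic process", "plasma membrane"},
    "GeneA": {"apoptotic process"},
    "GeneB": {"plasma membrane"},
}

def invert_annotations(gene_to_terms):
    """Build term -> set of genes, i.e. the contents of each GO 'bin'."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return dict(term_to_genes)

bins = invert_annotations(gene_to_terms)
print(sorted(bins["apoptotic process"]))  # ['FasR', 'GeneA']
```

A single gene can sit in several bins at once, which is why the structure is a set of terms per gene rather than a single label.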
Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set in order to better understand the underlying biological processes. This can be done by comparing the input gene set with each of the bins (terms) in the GO: for each bin, a statistical test checks whether the input genes are over-represented in it relative to what chance would predict.
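The per-bin comparison starts by counting how many input genes fall into each term's bin. The sketch below assumes the annotations are already available as a term → genes mapping. All gene and term names are hypothetical.

```python
def overlap_counts(input_genes, term_to_genes):
    """For each GO term, count how many input genes are annotated to it."""
    input_set = set(input_genes)
    return {term: len(input_set & genes) for term, genes in term_to_genes.items()}

# Hypothetical term -> genes mapping and input list.
term_to_genes = {
    "apoptotic process": {"G1", "G2", "G3", "G4"},
    "plasma membrane": {"G2", "G5"},
    "DNA repair": {"G6", "G7"},
}
counts = overlap_counts(["G1", "G2", "G3"], term_to_genes)
print(counts)  # {'apoptotic process': 3, 'plasma membrane': 1, 'DNA repair': 0}
```

Each count is then compared with the count the term's size in the background would predict by chance, which is the job of the statistical test.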
There are a variety of methods for performing term enrichment analysis with GO. Methods vary in the statistical test applied, the most common being Fisher's exact test (equivalently, a one-sided hypergeometric test); some methods use Bayesian statistics instead. Methods also differ in the correction applied for multiple comparisons, which is needed because one test is run per term; the most common is the Bonferroni correction.
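The most common combination mentioned above can be sketched in a few lines: a one-sided hypergeometric test for over-representation followed by a Bonferroni correction. This is a minimal sketch that uses only Python's `math.comb`. The counts in the usage lines are made up, and real tools may differ in details such as how they define the background.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(K, n)  # X cannot exceed the term size or the list size
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts: 20 background genes, 4 annotated to the term,
# 5 genes in the input list, 3 of which are annotated to the term.
p = hypergeom_upper_tail(3, 20, 4, 5)
print(round(p, 4))                 # 0.032
print(bonferroni([p, 0.5, 0.02]))  # each p-value multiplied by the 3 tests
```

The Bonferroni step is deliberately conservative. With thousands of GO terms tested, it can push many uncorrected p-values above a 0.05 threshold.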
One of the main uses of the GO is enrichment analysis of gene sets. For example, given a set of genes that are up-regulated under certain conditions, an enrichment analysis finds which GO terms are over-represented (or under-represented) among the annotations of that gene set.
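Over- and under-representation are tested in opposite tails of the same hypergeometric distribution. In the sketch below, the upper tail P(X ≥ k) asks whether a term has more input genes than chance predicts, and the lower tail P(X ≤ k) asks whether it has fewer. The counts are hypothetical.

```python
from math import comb

def hypergeom_pmf(i, N, K, n):
    """Probability that exactly i of the n input genes are annotated to the term."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)

def enrichment_p_values(k, N, K, n):
    """Return (p_over, p_under) = (P(X >= k), P(X <= k)) for one GO term."""
    upper = min(K, n)
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(0, min(k, upper) + 1))
    return p_over, p_under

# Hypothetical: a term covering 50 of 100 background genes,
# but only 1 of the 10 input genes is annotated to it.
p_over, p_under = enrichment_p_values(1, 100, 50, 10)
print(p_under < 0.05 < p_over)  # True: evidence of under-representation, not over-representation
```

Both tails include the observed count k, so the two p-values sum to 1 plus the probability of exactly k.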
Users can perform enrichment analyses directly from the home page of the GOC (Gene Ontology Consortium) website. This service connects to the analysis tool of the PANTHER Classification System, which is kept up to date with GO annotations.
Using the GO enrichment analysis tools
1. Paste or type the names of the genes to be analyzed, one per row or separated by commas. The tool accepts both MOD-specific (model organism database) gene names and UniProt IDs (e.g. Rad54 or P38086).
2. Select the GO aspect (molecular function, biological process, or cellular component) for the analysis (biological process is the default).
3. Select the species the genes come from (Homo sapiens is the default).
4. Press the submit button. A REFERENCE (aka “background”) LIST can be uploaded at a later step.
5. The results are then shown on the PANTHER website. These results are based on enrichment relative to the set of all protein-coding genes in the genome selected in step 3.
6. (optional but HIGHLY RECOMMENDED) Add a custom REFERENCE LIST and re-run the analysis. Press the “change” button on the “Reference list” line of the PANTHER analysis summary at the top of the results page, upload the reference list file, and press the “Launch analysis” button to re-run the analysis. The reference list should be the list of all genes from which the smaller analysis list was selected.
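The input rules in steps 1 and 6 can be checked locally before submitting. This is a minimal sketch, assuming plain-text lists. The function names are hypothetical helpers, not part of PANTHER or the GOC website. The sketch accepts one identifier per row or comma-separated identifiers, and it flags analysis genes that are missing from the reference list, since the reference list should contain every gene the analysis list was drawn from.

```python
import re

def parse_gene_list(text):
    """Split pasted text on newlines and/or commas, dropping blanks and duplicates."""
    genes, seen = [], set()
    for token in re.split(r"[,\n]", text):
        name = token.strip()
        if name and name not in seen:
            seen.add(name)
            genes.append(name)
    return genes

def missing_from_reference(analysis, reference):
    """Return analysis genes absent from the reference (background) list."""
    ref = set(reference)
    return [g for g in analysis if g not in ref]

analysis = parse_gene_list("Rad54\nRad51, Rad52\n")
reference = parse_gene_list("Rad51, Rad52, Rad54, Rad55, Rad57")
print(analysis)                                     # ['Rad54', 'Rad51', 'Rad52']
print(missing_from_reference(analysis, reference))  # []
```

Unlike the web tool, this sketch does not map between MOD gene names and UniProt IDs. A gene entered once by name and once by accession would therefore count as two entries.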
Interpreting the results table
The results page shows a table listing the significant shared GO terms (or parents of GO terms) used to describe the set of genes entered on the previous page. For each term, the table gives the background frequency, the sample frequency, the expected number of genes, an indication of over- or under-representation, and the p-value. The results page also displays all the criteria used in the analysis. Any unresolved gene names are listed above the table.
Background frequency is the number of genes annotated to a GO term in the entire background set, while sample frequency is the number of genes annotated to that GO term in the input list.
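The symbols + and - indicate over- or under-representation of a term, that is, whether more or fewer input genes are annotated to it than expected from the background.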
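The background frequency, sample frequency and +/- direction are related by simple arithmetic. This sketch assumes that the expected count is the input-list size times the term's background proportion, and that the direction is decided by comparing the observed count with that expectation. The function name is hypothetical, and ties are shown as "-" here by assumption.

```python
def summarize_term(sample_hits, sample_size, background_hits, background_size):
    """Return (expected count, fold enrichment, '+'/'-') for one GO term.

    Assumes background_hits > 0, i.e. the term has at least one annotated background gene.
    """
    expected = sample_size * background_hits / background_size
    fold = sample_hits / expected
    direction = "+" if sample_hits > expected else "-"  # tie -> "-" (assumption)
    return expected, fold, direction

# Hypothetical: 100 of 20,000 background genes and 5 of 200 input genes carry the term.
expected, fold, direction = summarize_term(5, 200, 100, 20000)
print(expected, fold, direction)  # 1.0 5.0 +
```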
The p-value is the probability of seeing, by chance alone, at least x genes out of the n genes in the input list annotated to a particular GO term, given the proportion of genes in the background (by default, the whole genome) that are annotated to that GO term. In other words, the GO terms shared by the genes in the user’s list are compared with the background distribution of annotations. The closer the p-value is to zero, the less likely it is that the observed overlap arose by chance, and the more significant the association between that GO term and the gene set.
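The p-value described above can be written explicitly as a hypergeometric upper tail. Here N (genes in the background) and K (background genes annotated to the term) are introduced for this formula, and x and n keep their meaning from the text. As a worked example with made-up numbers, take N = 10, K = 3, n = 4 and x = 2. The result is (3·21 + 1·7)/210 = 70/210 ≈ 0.33, so 2 hits out of 4 for a term covering 30% of the background is unremarkable.

```latex
P(X \ge x) = \sum_{i=x}^{\min(n,\,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
```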