This post is meant to spare you some frustration with a common beginner's task: annotating genes with GO IDs (Gene Ontology identifiers). GO annotation is a useful way to summarise and visualise large gene datasets.
First, convert your gene names to a format recognised by UniProtKB. The tool you can use for this is DAVID. Don't try to use the other DAVID tools to convert your gene names into GO terms or IDs. I struggled with this for quite a while, and it turns out that UniProtKB can give you the GO IDs itself once your identifiers are in a format it recognises.
I was working with Drosophila melanogaster, using FlyBase CG annotation IDs (e.g. CG32954). When I tried to use these with UniProtKB directly, via its FLYBASE option, it failed miserably, and I then spent some very frustrating minutes trying to get it to work. This is where DAVID comes in handy.
What you want to use is DAVID's ‘Gene ID Conversion Tool’.
- Go to the ‘Upload’ tab, then paste your list into the ‘list’ spot.
- Then try to tell it what kind of identifier you have, and it will have a hissy fit.
- Then STOP.
- Leave that list where it is (don't submit it), and instead use Option 1 to convert the list. This should tell you what type of identifier your input is and, hopefully, give you a list converted to the format you need (ENTREZ_GENE_ID).
Sometimes it just decides to be difficult and tells you that you haven't specified what kind of original identifier you have. This could be your fault (for example, if you left a transcript suffix such as -RA on the end of a Drosophila gene ID, it won't accept it), or DAVID is just being stupid. If this happens, try submitting just a couple of gene identifiers first.
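If you have programming skills, you can tidy the list before pasting it into DAVID. The sketch below is one way to do it, not part of DAVID itself: it assumes FlyBase-style transcript or polypeptide suffixes look like a hyphen followed by R or P and one or two capital letters (e.g. -RA, -PB), and the helper names `clean_ids` and `batches` are hypothetical. `batches` just splits the list into small groups, for when you want to test a couple of identifiers at a time.

```python
import re

# Assumption: transcript/polypeptide suffixes look like "-RA", "-RB", "-PA", "-RAA".
SUFFIX = re.compile(r"-[RP][A-Z]{1,2}$")


def clean_ids(raw_ids):
    """Strip whitespace and transcript suffixes, drop blanks and duplicates, keep order."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        gene_id = SUFFIX.sub("", raw.strip())
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            cleaned.append(gene_id)
    return cleaned


def batches(ids, size=2):
    """Yield the list in chunks of `size`, e.g. to test a couple of IDs first."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


print(clean_ids(["CG32954-RA", "CG32954-RB", " CG32954 ", ""]))
```

All three non-empty entries collapse to the single gene ID CG32954, which is what DAVID expects.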
DAVID may also get confused about how many genes you have submitted. I had submitted a couple of IDs just to check what kind they were, and when I then tried to enter a bigger gene list it had a hissy fit and ‘forgot’ the new list. Using a different browser got around this for me.
Once you've done all that, have DAVID output the conversion table. Put it into an Excel spreadsheet (or use your handy programming skills) and pull out just the column containing the ENTREZ_GENE_IDs.
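For the programming route, a minimal sketch is below. It assumes DAVID's downloaded table is tab-delimited with a header row; the column name `To` is an assumption, so check the header of your own file and pass the right name. The function name `extract_column` and the example values are hypothetical placeholders, not real mappings.

```python
import csv
import io


def extract_column(table_text, column="To"):
    """Return the non-empty values of one column of a tab-delimited table, without duplicates."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    seen = set()
    values = []
    for row in reader:
        # Missing or short rows give None, so fall back to an empty string.
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            values.append(value)
    return values


# Placeholder table, not real DAVID output.
example = "From\tTo\nCG32954\t111\nCG32954\t111\nCGxxxx\t\n"
print(extract_column(example, "To"))
```

To use it on a real download, read the file (e.g. `open("david_table.txt", newline="").read()`, with a hypothetical file name) and write the result one ID per line, ready to paste into UniProtKB.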
Submit that list to UniProtKB. You may need to edit the columns of the results table so that the GO IDs are shown. You can then simply export the table and import it into Excel for your final list.
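If you export the UniProtKB results as a tab-separated file instead, a short script can pull the GO IDs out. This sketch assumes the export has an `Entry` column and a `Gene Ontology IDs` column whose values are separated by semicolons (e.g. `GO:0005634; GO:0006355`); check the headers in your own export, since they depend on which columns you chose. The function name `go_ids_per_entry` is hypothetical.

```python
import csv
import io


def go_ids_per_entry(tsv_text, id_column="Entry", go_column="Gene Ontology IDs"):
    """Map each UniProtKB entry to its list of GO IDs (assumed ';'-separated)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        go_field = row.get(go_column) or ""
        mapping[row[id_column]] = [t.strip() for t in go_field.split(";") if t.strip()]
    return mapping


# Placeholder export with made-up entry names.
example = "Entry\tGene Ontology IDs\nP1\tGO:0005634; GO:0006355\nP2\t\n"
mapping = go_ids_per_entry(example)
# One long list of GO IDs, duplicates included, as in the Excel route.
all_go_ids = [go for ids in mapping.values() for go in ids]
print(mapping)
print(all_go_ids)
```

Keeping the per-entry mapping is handy later if you want to know which genes sit behind a given GO term.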
The list you've just created is going to be very long: each gene can be mapped to several terms, and many genes share terms, so there will be duplicates. A simple way to remove these, and also to slim down your list a little, is to use REVIGO. REVIGO plots your terms so you can see similar ones clustering together in semantic space. It also separates the three kinds of GO terms (biological process, molecular function and cellular component).
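You can also remove the duplicates yourself before pasting the list into REVIGO. The sketch below counts how often each GO ID appears and writes one ID per line followed by its count. It assumes REVIGO accepts a GO ID optionally followed by a number on each line; if you send the counts, check on the input form how REVIGO interprets the numbers (for example, as p-values, where smaller is better). The function name `revigo_input` is hypothetical.

```python
from collections import Counter


def revigo_input(go_ids):
    """Collapse duplicate GO IDs into 'GO_ID count' lines, most frequent first."""
    counts = Counter(go_ids)
    # most_common() sorts by count; equal counts keep first-seen order.
    return "\n".join(f"{go_id} {count}" for go_id, count in counts.most_common())


print(revigo_input(["GO:0005634", "GO:0006355", "GO:0005634"]))
```

If you only want the unique IDs without numbers, `list(dict.fromkeys(go_ids))` does the same deduplication while keeping the original order.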
Finally, you can use REVIGO to shorten your ID list by grouping similar IDs under a single representative term. I'm not sure exactly how it does this, so use it with caution if you are trying to map groups of genes. Also check that REVIGO doesn't return an error when you give it a long list.
REVIGO keeps the terms with the strongest p-values (or other values, such as enrichments, depending on what you specified on the input form). Consider filtering the list by an external criterion (e.g. enrichment) before submitting it to REVIGO.
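To get a feel for what "keeping the strongest term" means, here is a simplified greedy sketch. It is not REVIGO's actual algorithm: REVIGO measures semantic similarity from the GO structure, whereas this sketch swaps in the overlap of the gene sets behind each term (Jaccard index) as a stand-in. Terms are visited from smallest p-value up, and a term is dropped if it is too similar to one already kept. The optional `max_p` argument shows filtering by an external criterion first. All names and the 0.7 cutoff are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two gene sets: |A & B| / |A | B| (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reduce_terms(terms, cutoff=0.7, max_p=None):
    """Greedily keep the strongest term from each group of similar terms.

    terms: dict mapping a term ID to (p_value, set of genes annotated to it).
    """
    candidates = sorted(terms.items(), key=lambda item: item[1][0])
    if max_p is not None:
        # External filter applied before the redundancy step.
        candidates = [(tid, v) for tid, v in candidates if v[0] <= max_p]
    kept = []
    for term_id, (_p_value, genes) in candidates:
        if all(jaccard(genes, terms[other][1]) < cutoff for other in kept):
            kept.append(term_id)
    return kept


# Hypothetical terms and genes, for illustration only.
terms = {
    "term_A": (0.001, {"g1", "g2", "g3"}),
    "term_B": (0.01, {"g1", "g2", "g3", "g4"}),
    "term_C": (0.02, {"g5"}),
}
print(reduce_terms(terms))
print(reduce_terms(terms, max_p=0.015))
```

Here term_B shares three of its four genes with the stronger term_A, so it is dropped; term_C overlaps with nothing and survives unless the p-value filter removes it.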
OK, so you have your GO IDs, and you really want to know what the hell they mean. The tool to use here is SIGENAE, which converts GO IDs to GO terms. If you are comparing data sets, I used Venny 2.0 to compare the groups. Alternatively, you can use GO View to explore the data interactively online.
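If you would rather do these two steps offline, a small script can cover both. This is a sketch under assumptions, not a replacement for the tools above: it assumes you have downloaded the Gene Ontology Consortium's `go-basic.obo` file, where each term is a `[Term]` block containing `id:` and `name:` lines, and it compares two lists with Python sets, which gives the same three regions as a two-list Venn diagram. The function names are hypothetical.

```python
def go_names(obo_lines):
    """Map GO IDs to term names from the [Term] blocks of an OBO file."""
    names = {}
    current_id = None
    in_term = False
    for line in obo_lines:
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are GO terms.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and current_id:
            names[current_id] = line[len("name: "):]
    return names


def venn_regions(list_a, list_b):
    """Split two lists into the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {"only_a": sorted(a - b), "both": sorted(a & b), "only_b": sorted(b - a)}


example_obo = "[Term]\nid: GO:0005634\nname: nucleus\n\n[Typedef]\nid: part_of\nname: part of\n"
print(go_names(example_obo.splitlines()))
print(venn_regions(["GO:1", "GO:2"], ["GO:2", "GO:3"]))
```

With the real file, `with open("go-basic.obo") as handle: names = go_names(handle)` builds the lookup once, and `names.get(go_id, go_id)` falls back to the bare ID for anything not found.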
For your ease of reference (and mine, when I write up this paper), these are the citations you will need.
Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat Protoc. 2009;4(1):44-57.
Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1-13.
Oliveros JC. Venny: an interactive tool for comparing lists with Venn's diagrams. 2007-2015. http://bioinfogp.cnb.csic.es/tools/venny/index.html
Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of Gene Ontology terms. PLoS ONE. 2011. doi:10.1371/journal.pone.0021800
SIGENAE. http://www.sigenae.org/