PIPE2 Tutorial
http://pipe2.systemsbiology.net/ (PIPE1: http://pipe.systemsbiology.net/)

Please follow along in this tutorial as we go through it in class, or go at your own pace if you choose. Feel free to add your own notes. If you have suggestions or bug reports that might be useful for others, please email them to [email protected].

We will be using a set of Yeast proteins derived from a real ISB experiment. The researcher was interested in identifying any potential protein complex in the sample. These proteins had a ProteinProphet probability of > 0.9. We will use PIPE2, the Firegoose, Entrez Gene, KEGG and STRING to explore the functions of and interactions between these proteins and come to a conclusion about any potential protein complex.

Before we get started, make sure that the most recent Gaggle Firefox extension is installed on your Firefox browser. At the time of writing, this was version 0.8.204. (This step is already taken care of for the course laptops. For other computers, see http://gaggle.systemsbiology.org/docs/geese/firegoose).
3. Press the Broadcast button on the Firegoose. You will see the NCBI Entrez Gene index page for the genes. Click into these descriptions to find the answers to the following questions:

How many interactions are noted for each of the following genes (estimates are perfectly OK)?
1. FBA1 - ___________
2. HXK2 - ___________
3. MVD1 - ___________

Bonus question: How many of those interactions are with other genes in our list? (You really don't have to answer that, but we'll visually answer this question in a minute.)
PIPE2 and Gaggle -- Tutorial

4. Press the layout button:

5. From the menu bar, click View -> Set Node Labels -> -> Gene Symbol. Your Network Viewer PIPElet should look something like this:
Let's see what STRING says about these proteins.

6. Back in IDMapper 1, broadcast the first column of the data (by selecting Yeast Demo List from the broadcast panel's data source field) to the Firegoose, and from the Firegoose, broadcast to EMBL String. Press Continue until you get to this screen:
(Note: you may have to click the confidence icon to get this view.) Locate the 3 proteins from the end of the last section (FBA1, HXK1, HXK2). Investigate the difference in connectivity between these 3 proteins. Where does STRING get the connections not found in PIPE2? (hint: click on the connecting edges) _____________
We'll come back to these networks in a minute.
2. Hit Submit. (When you get good at PIPE2, you can multitask and do other things while this process is completing, but for now, just relax.)

We are enriching for biological process GO categories. The p-value for each GO category is the hypergeometric distribution value based on 4 parameters: the number of items in your list that mapped to that category, your list's size, the total number of genes (in the yeast genome) that map to the same category, and the total number of genes in the organism's genome (for Yeast, ~6,000). For example, for alcohol catabolic process:

hyperg(6, 29, 67, 6000) = 4.33804668787027e-07

Notice that the results on the first page also seem to suggest a lot of sugar metabolic processes (as KEGG did).
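To make the four parameters concrete, here is a minimal Python sketch of a hypergeometric enrichment test. The tutorial does not say exactly how PIPE2's hyperg function is defined (for example, whether it reports a single-point probability or a tail probability), so this sketch assumes the upper-tail convention P(X >= k) that is commonly used for enrichment. The function names are hypothetical, and the result may not match the figure quoted above.

```python
from math import comb

def hypergeom_pmf(k, n, K, N):
    """P(X = k): exactly k of the n genes in your list fall in a
    category that contains K of the N genes in the genome."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p_value(k, n, K, N):
    """Assumed convention: upper tail P(X >= k), i.e. the chance of
    seeing at least k category members in a random list of n genes."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(k, min(n, K) + 1))

# Parameters from the alcohol catabolic process example:
# 6 hits, list of 29, 67 genes in the category, ~6000 genes in yeast.
p = enrichment_p_value(6, 29, 67, 6000)
print(f"P(X >= 6) = {p:.3e}")
```

A small p-value means that an overlap this large would rarely occur if the 29 proteins were drawn at random from the genome.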
3. Hit Broadcast.

4. Do the same thing (broadcast to Network Viewer) with the following GO categories:
- glucose catabolic process
- fructose import
- glucose import (on the second page; click the right arrow button at the bottom)

5. Go back to the Network Viewer 1 PIPElet, maximize it (similar to windows on your desktop) and click the layout button:

Now we have a cluster of proteins connected by direct interaction experiments (yeast-2-hybrid) and functional associations (GO terms). Let's see how STRING compares.

6. Select the cluster in the Network Viewer by clicking and dragging across it.
7. Expand the Broadcast panel of the Network Viewer PIPElet. Select Selected Nodes (Namelist) as the data source and Firegoose as the target, and hit Broadcast. It should look something like this:
8. In the Firegoose, ensure Network nodes: NameList(13) is the data source and EMBL String is the target, and hit Broadcast.

9. Caution: In STRING, select Saccharomyces cerevisiae as the organism and click continue. On the page following that, STRING tries to map all of your input to identifiers it recognizes. In particular, at the bottom of the page, you'll notice that it also tried to map alcohol catabolic process, fructose import, glucose catabolic process, and glucose import.
Uncheck the mappings STRING attempted to make! Then click continue.

10. Explore the STRING network. In particular, look at edges that are in STRING and not in PIPE2. Click on them and investigate the evidence they provide for those edges. That type of information is not in PIPE2 yet; perhaps one day.
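If you write down the edges you see in each tool, the comparison in step 10 is a set difference over undirected edges. The following Python sketch shows the idea. The edge lists are made up purely for illustration and are not results from PIPE2 or STRING, and the function names are hypothetical.

```python
def normalize(edges):
    """Treat each edge as undirected: (A, B) and (B, A) are the same edge."""
    return {frozenset(edge) for edge in edges}

def edges_only_in(first, second):
    """Return the edges present in `first` but missing from `second`."""
    return normalize(first) - normalize(second)

# Illustrative placeholder edges only; replace them with what you
# actually observe in the Network Viewer and in STRING.
pipe2_edges = [("HXK1", "HXK2")]
string_edges = [("HXK2", "HXK1"), ("FBA1", "HXK1"), ("FBA1", "HXK2")]

extra = edges_only_in(string_edges, pipe2_edges)
for edge in sorted(tuple(sorted(e)) for e in extra):
    print(edge)
```

Storing each edge as a frozenset means the order in which the two proteins are listed does not matter, so the same edge recorded in opposite directions is not reported as a difference.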
VIII. Conclusion
There is no conclusive evidence for enrichment of any known protein complexes. However, the co-occurrence of the 3 proteins FBA1, HXK1, and HXK2 in different annotation databases may warrant further experimental investigation into possible interactions.
from red to green. These options are available in the View -> Attribute Data -> Visual Cue mapping menu item.