Here we use gene names from the Yeast Proteome Database (YPD) to identify proteins, and YPD cellular roles serve as the function categories. We downloaded the data from YPD on April 8, 2002. They contain 6416 yeast proteins, of which 3894 are annotated with one or more of 43 cellular roles and 2522 are unknown. The data are given below.
Text file (825K byte)
Compressed file (244K byte--Recommended)
2. Protein-protein interaction data.
There are two sources of yeast two-hybrid protein-protein interactions. One, referred to as "Uetz",
is from Fields' group and was
released with their Nature paper (Uetz et al. 2001).
The other, referred to as "Ito", is from Ito's group and was
released with their PNAS paper (Ito et al. 2001). The first contains 1436 protein pairs with names in
YPD, and the second contains 4443 pairs with names in YPD. To estimate the reliability of the yeast two-hybrid data,
the reference data, which are treated as real interactions, were downloaded from
MIPS. This set, referred to as "MIPS physical", contains 2439 protein pairs (excluding 120
self-interacting pairs). For completeness, we also downloaded the
Database of Interacting Proteins (DIP)
protein-protein interaction data, which contain 14455 pairs with names in YPD.
The data sets are listed here.
Uetz, Text file (79K byte)
Uetz, Compressed file (17K byte----Recommended)
Ito, Text file (235K byte)
Ito, Compressed file (50K byte----Recommended)
MIPS, Text file (171K byte)
MIPS, Compressed file (39K byte----Recommended)
DIP, Text file (761K byte)
DIP, Compressed file (145K byte----Recommended)
The two data sets based on systematic purification of protein complexes are the largest
interaction data sets to date. However, they used different protocols: one is tandem
affinity purification, referred to as "TAP", and the other is high-throughput
mass-spectrometric protein complex identification, referred to as "HMS-PCI",
so we treat them separately. 232 TAP complexes are available at the website
(http://yeast.cellzome.com/)
and 551 HMS-PCI complexes are available at the website
(http://www.mdsp.com/yeast/).
For reference, 265 MIPS complexes were downloaded from
MIPS, referred to as "MIPS complex". It should be noted that
all of these complex data are also available in the MIPS database.
To estimate the reliability in the same manner as for the two-hybrid system, we transform
the complexes into pairwise interactions by connecting all possible protein
pairs within each complex. We obtain 17962, 32667 and 9583 protein pairs with names in YPD from
the TAP, HMS-PCI and MIPS complex data, respectively. These are the transformed data sets.
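The transformation from complexes to pairwise interactions can be sketched as follows. This is a minimal illustration, not the code used to produce the data sets: the function name `complexes_to_pairs` and the protein identifiers `P1` to `P4` are hypothetical, and the filtering to proteins with names in YPD is omitted.

```python
from itertools import combinations


def complexes_to_pairs(complexes):
    """Return the set of unordered protein pairs implied by a list of complexes.

    Each complex is an iterable of protein names. Every possible pair of
    distinct members is connected, and a pair that occurs in several
    complexes is counted once.
    """
    pairs = set()
    for members in complexes:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        unique = sorted(set(members))
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical example: two overlapping complexes.
example_complexes = [["P1", "P2", "P3"], ["P2", "P3", "P4"]]
example_pairs = complexes_to_pairs(example_complexes)
```

A complex with n distinct members contributes n(n-1)/2 pairs, which is why the transformed complex data sets are much larger than the number of complexes suggests.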
For completeness, we also use three other data sets extracted from
the supplementary data
accompanying C. von Mering's paper
(Nature Vol. 417, 399-403).
They are co-expression data, referred to as "Synexpression", computationally predicted data, referred to as "In-Silico", and
genetic data, referred to as "Genetic". These data sets contain 16228, 7440 and 886 protein pairs with names in YPD, respectively.
Synexpression, Text file (841K byte)
Synexpression, Compressed file (144K byte----Recommended)
In-Silico, Text file (402K byte)
In-Silico, Compressed file (72K byte----Recommended)
Genetic, Text file (46K byte)
Genetic, Compressed file (11K byte----Recommended)
We also use gene expression data to verify our predictions. The expression data we used, listed here, contain 6080 genes with 77 time points.
Text file (3.36M byte)
Compressed file (820K byte----Recommended)
We compute the gene expression correlation coefficient for each protein pair and plot the distribution for each data set as well as for randomly chosen pairs. The distributions are shown in Figure 1 and Figure 2. To test whether the mean correlation coefficient for the interaction data is significantly higher than that for random pairs, the T-scores and the corresponding p-values are computed. The following table lists the statistics of the distributions for the different data sets.
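The correlation and T-score computation can be sketched as follows. This sketch makes several assumptions not stated in the text: the correlation is taken to be the Pearson coefficient, the T-score is computed in Welch's unequal-variance form, and the one-sided p-value uses a normal approximation, which is only reasonable for large samples. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Convention chosen for this sketch: a constant profile gives 0.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pair_correlations(pairs, expression):
    """Correlation for every pair whose two genes both have expression data."""
    return [
        pearson(expression[a], expression[b])
        for a, b in pairs
        if a in expression and b in expression
    ]


def t_score(sample, background):
    """Welch T-score and one-sided p-value (normal approximation, assumed)."""
    se = math.sqrt(
        variance(sample) / len(sample) + variance(background) / len(background)
    )
    t = (mean(sample) - mean(background)) / se
    p = 1.0 - NormalDist().cdf(t)
    return t, p
```

Here `expression` would map each gene name to its list of 77 expression values, `sample` would hold the correlations for one interaction data set, and `background` the correlations for randomly chosen pairs.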
We use a maximum likelihood method to estimate the reliability of putative protein-protein interaction data. For physical interaction data, such as Uetz's data, Ito's data and Ito's data with different IST numbers, we use the MIPS physical interaction data as the reference. For protein complex data, such as the TAP and HMS-PCI data, we use the MIPS complex data as the reference to estimate the reliabilities. The results are given in the following table.
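The text does not spell out the likelihood model, so the following is only one possible formulation, assumed here for illustration. It treats the expression correlations of a putative data set as a mixture of the reference distribution (weight r, the reliability) and the random-pair distribution (weight 1 - r), and picks the r that maximizes the log-likelihood. The histogram densities, the pseudocount, the grid search and all names are choices made for this sketch.

```python
import math


def histogram_density(values, bins=10, lo=-1.0, hi=1.0):
    """Return a piecewise-constant density estimated from values in [lo, hi]."""
    counts = [1.0] * bins  # pseudocount keeps every bin strictly positive
    width = (hi - lo) / bins

    def bin_index(v):
        return max(0, min(int((v - lo) / width), bins - 1))

    for v in values:
        counts[bin_index(v)] += 1
    total = sum(counts)

    def density(v):
        return counts[bin_index(v)] / (total * width)

    return density


def estimate_reliability(putative, reference, random_pairs, steps=1000):
    """Grid-search the mixing weight r in [0, 1] that maximizes the likelihood."""
    f_true = histogram_density(reference)
    f_rand = histogram_density(random_pairs)
    best_r, best_ll = 0.0, -math.inf
    for k in range(steps + 1):
        r = k / steps
        ll = sum(
            math.log(r * f_true(x) + (1.0 - r) * f_rand(x)) for x in putative
        )
        if ll > best_ll:
            best_r, best_ll = r, ll
    return best_r
```

The log-likelihood of a two-component mixture is concave in r, so a simple grid search is enough to locate the maximum in this sketch.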
We predict protein function from protein-protein interaction data using the neighborhood-counting method and the chi-square method, and the leave-one-out method is used to measure the specificity and sensitivity of the predictions. The specificity and sensitivity of the predictions based on each protein-protein interaction data set are plotted for comparison across data sets.
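A minimal sketch of neighborhood counting with leave-one-out evaluation is given below. It rests on assumptions that the text does not confirm: each protein is assigned the `top_k` roles that occur most often among its interaction partners, specificity is the fraction of predicted roles that are correct, and sensitivity is the fraction of known roles that are recovered. The function names, protein identifiers and role names are hypothetical.

```python
from collections import Counter, defaultdict


def build_neighbors(pairs):
    """Map each protein to the set of its interaction partners."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_roles(protein, neighbors, annotations, top_k=1):
    """Return the top_k roles most frequent among the protein's neighbors."""
    counts = Counter()
    for nb in neighbors.get(protein, ()):
        counts.update(annotations.get(nb, ()))
    # Sort by descending count, then by name so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [role for role, _ in ranked[:top_k]]


def leave_one_out(neighbors, annotations, top_k=1):
    """Predict each annotated protein from its neighbors only and score it."""
    overlap = predicted = known = 0
    for protein, roles in annotations.items():
        # The protein's own annotation is never used for its own prediction.
        guess = predict_roles(protein, neighbors, annotations, top_k)
        overlap += len(set(guess) & set(roles))
        predicted += len(guess)
        known += len(roles)
    specificity = overlap / predicted if predicted else 0.0
    sensitivity = overlap / known if known else 0.0
    return specificity, sensitivity


# Hypothetical toy network and annotations.
toy_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
toy_roles = {
    "P1": {"transport"},
    "P2": {"transport"},
    "P3": {"transport", "signaling"},
    "P4": {"signaling"},
}
toy_spec, toy_sens = leave_one_out(build_neighbors(toy_pairs), toy_roles)
```

Varying `top_k` trades specificity against sensitivity, which gives one way to trace a curve per interaction data set for comparison.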
Ting Chen
Last modified: Thursday, July 18, 2002.