Inferring domain-domain interactions from protein-protein interactions
Minghua Deng , Shipra Mehta , Fengzhu Sun and Tim Chen
Program in Molecular and Computational Biology, Department of Biological Sciences
University of Southern California, 1042 West 36th Place, DRB 155, Los Angeles, CA 90089-1113.





Abstract

The interaction between proteins is one of the most important features of protein function. Behind protein-protein interactions there are protein domains physically interacting with one another to perform the necessary functions. Therefore, understanding protein interactions at the domain level gives a global view of the protein interaction network, and possibly of the functions of proteins. Two research groups (Uetz et al., 2000 and Ito et al., 2001) used yeast two-hybrid assays to generate 5719 interactions between proteins of the yeast Saccharomyces cerevisiae. This allows us to study the large-scale conserved patterns of interactions between protein domains. Using evolutionarily conserved domains defined in a protein-domain database called Pfam (http://pfam.wustl.edu), we apply a Maximum Likelihood Estimation method to infer interacting domains that are consistent with the observed protein-protein interactions. We estimate the probabilities of interactions between every pair of domains and measure the accuracy of our predictions at the protein level. Using the inferred domain-domain interactions, we predict interactions between proteins. Our predicted protein-protein interactions have a significant overlap with the protein-protein interactions in MIPS (http://mips.gfs.de/) obtained by methods other than the two-hybrid systems. The mean correlation coefficient of the gene expression profiles for our predicted interaction pairs is significantly higher than that for random pairs as well as that for the original 5719 interactions. Our method has shown robustness in analyzing incomplete data sets and dealing with various experimental errors. We found several novel protein-protein interactions, such as RPS0A interacting with APG17 and TAF40 interacting with SPT3, that are consistent with the functions of the proteins.


Original data

1. Protein-domain relationship.

The definitions of proteins and domains are obtained directly from SwissPfam, where proteins are identified by their SwissProt or TrEMBL ids. The SGD ids for yeast genes are downloaded from the gene name table in the SGD database and mapped to their corresponding SwissProt or TrEMBL ids. With this mapping between the yeast proteins and their SwissProt ids, we then map SwissPfam 6.5 domains to the yeast proteins. The final version of the protein-domain relationship is given below.

Text file (585K byte)
Compressed file (232K byte, recommended)

2. Protein-protein interaction data.

There are two sources of yeast two-hybrid protein-protein interactions. One is from Fields' group, released with their Nature paper (Uetz et al. 2000). The other is from Ito's group, released with their PNAS paper (Ito et al. 2001). The combined data contain 5719 protein-protein interactions.

3. Gene expression data.

We also use gene expression data to verify our predictions. The expression data contain 2465 genes with 79 time points (the original set has 2467 genes, but two are duplicates):

Text file (1.4M byte)
Compressed file (347K byte, recommended)


Prediction based on yeast two-hybrid (Y2H) data.

We compute domain-domain interaction probabilities from Y2H protein-protein interactions, and then use these domain-domain interaction probabilities to compute the interaction probability between every pair of proteins. The prediction results with a false positive rate fp=2.5E-4 and a false negative rate fn=0.80 are listed below.
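The second step, going from domain-domain probabilities to a protein-protein probability, can be sketched as follows. This is a minimal illustration, not the code used for the results: it assumes (the formula is not stated on this page) that domain pairs interact independently and that two proteins interact when at least one of their domain pairs does, and that an observed interaction is a true one that was not missed (rate fn) or a spurious one (rate fp). The function names and the toy domain labels are hypothetical.

```python
from itertools import product


def pair_key(m, n):
    """Order-independent key for a domain pair (hypothetical helper)."""
    return tuple(sorted((m, n)))


def protein_interaction_prob(domains_a, domains_b, lam):
    """Probability that two proteins interact, assuming independent domain pairs.

    lam maps pair_key(m, n) -> estimated domain-domain interaction probability;
    pairs missing from lam are treated as non-interacting (probability 0).
    """
    p_no_pair_interacts = 1.0
    for m, n in product(domains_a, domains_b):
        p_no_pair_interacts *= 1.0 - lam.get(pair_key(m, n), 0.0)
    return 1.0 - p_no_pair_interacts


def observed_interaction_prob(p_true, fp, fn):
    """Probability of observing an interaction, given error rates fp and fn."""
    return p_true * (1.0 - fn) + (1.0 - p_true) * fp


# Toy example with made-up domain probabilities.
lam = {pair_key("D1", "D2"): 0.5, pair_key("D1", "D3"): 0.2}
p = protein_interaction_prob(["D1"], ["D2", "D3"], lam)
print(p, observed_interaction_prob(p, fp=2.5e-4, fn=0.80))
```

With fn=0.80, even a protein pair with a high true interaction probability is observed only about one time in five, which is why the error rates matter when comparing predictions with the Y2H data.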

We compute specificity and sensitivity for our predictions at different values of fn (with fp fixed) and compare them with those for the association method.
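As an illustration of these two measures, the sketch below assumes the following definitions, which are not spelled out on this page: specificity is the fraction of predicted interactions that are also observed, and sensitivity is the fraction of observed interactions that are also predicted. The function name and the toy protein labels are hypothetical.

```python
def specificity_sensitivity(predicted, observed):
    """Return (specificity, sensitivity) for two collections of protein pairs.

    Pairs are treated as unordered, so ("A", "B") and ("B", "A") match.
    Assumed definitions: specificity = matched / predicted,
    sensitivity = matched / observed.
    """
    predicted = {frozenset(p) for p in predicted}
    observed = {frozenset(p) for p in observed}
    matched = len(predicted & observed)
    specificity = matched / len(predicted) if predicted else 0.0
    sensitivity = matched / len(observed) if observed else 0.0
    return specificity, sensitivity


# Toy example: 2 of 4 predictions are observed, 2 of 3 observations are predicted.
predicted = [("A", "B"), ("C", "D"), ("E", "F"), ("B", "G")]
observed = [("B", "A"), ("C", "D"), ("X", "Y")]
print(specificity_sensitivity(predicted, observed))
```

Repeating this for predictions made at several fn values gives one (specificity, sensitivity) point per setting, which is the kind of curve that can be compared with another method.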

The matches between our predictions and the MIPS data are counted, and compared with random matches (fold numbers). The results are listed in the following tables.
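One way to express such a comparison as a fold number is sketched below. It is an assumption of this sketch, not a description of the procedure used here, that the random baseline is the expected number of matches when the same number of pairs is drawn uniformly from all possible pairs of distinct proteins. The function name and toy data are hypothetical.

```python
def fold_over_random(predicted, reference, n_proteins):
    """Ratio of observed matches to the number expected by chance.

    Assumption: the chance level is k * r / N, where k and r are the numbers
    of predicted and reference pairs and N = n_proteins * (n_proteins - 1) / 2
    is the number of possible unordered pairs of distinct proteins.
    """
    predicted = {frozenset(p) for p in predicted}
    reference = {frozenset(p) for p in reference}
    matches = len(predicted & reference)
    n_possible = n_proteins * (n_proteins - 1) / 2
    expected = len(predicted) * len(reference) / n_possible
    return matches / expected if expected > 0 else float("nan")


# Toy example: 2 matches where about 0.27 would be expected by chance.
predicted = [("A", "B"), ("C", "D"), ("E", "F"), ("B", "G")]
reference = [("B", "A"), ("C", "D"), ("X", "Y")]
print(fold_over_random(predicted, reference, n_proteins=10))
```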

We compute the gene expression correlation for each predicted interacting protein pair and compare the predictions against randomly chosen pairs. The statistics are given in the following table.
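The per-pair correlation can be illustrated with the sketch below, which assumes the Pearson correlation coefficient over the expression time points (this page does not name the coefficient) and averages it over a list of pairs. The function names and the short toy profiles are hypothetical; the real profiles have 79 time points.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def mean_pair_correlation(pairs, profiles):
    """Mean correlation over pairs whose genes both have expression profiles."""
    values = [
        pearson(profiles[a], profiles[b])
        for a, b in pairs
        if a in profiles and b in profiles
    ]
    return sum(values) / len(values) if values else float("nan")


# Toy profiles with 4 time points each.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
print(mean_pair_correlation([("g1", "g2"), ("g1", "g3")], profiles))
```

Running the same function on predicted pairs and on an equally sized set of randomly chosen pairs gives the two means to be compared.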



Prediction based on MIPS data.

We repeat the above procedures on MIPS data with fp=0.00, fn=0.80:

We compute specificity and sensitivity for our predictions at different values of fn (with fp fixed to 0.0) and compare them with those for the association method.

Similarly, the following tables list all the pairwise matches among our predictions, the MIPS data, and the Y2H data.

We compute the expression correlation for each interacting protein pair and compare the predictions against randomly chosen pairs. The statistics are given in the following table.





Tim Chen
Last modified: Wednesday Jun 12, 2002.