August 2010

PHARMACOLOGY:

Improved Protein Minimotif Prediction for Drug Development

How might two seemingly disparate proteins contribute to the same biochemical function? How can small similarities among proteins be utilized to predict such functional relationships?

These are very important questions for pharmaceutical scientists wishing to target a drug to a specific protein or biochemical pathway. Relevant to this issue, scientists have recently begun to link different proteins and diseases together based on similar drug side effects, and have developed a web server for predicting the specific location in protein molecules most critical to their function.

Protein minimotifs: Relevance to pharmaceutical science.

Of interest to this line of research are protein minimotifs (mini: small; motif: recurring element). These are small amino acid (protein subunit) sequences of known function.

Suppose a second protein possesses a very similar minimotif, in terms of amino acid composition, geometry, or some other criterion. The two proteins are then likely to perform the same or a similar biochemical function.

Consequently, a drug targeted to one of the proteins will likely target the other one as well. This must be taken into consideration when predicting the ultimate medical outcome of the drug.

Furthermore, if one discovers that a small change in DNA sequence creates or removes a protein minimotif, it's possible that the minimotif is involved in a disease and warrants further investigation. Clearly, identifying minimotifs among proteins, especially recently discovered proteins that have not yet been studied extensively by experiment, is of medical value.

A number of computer programs are available for identifying protein minimotifs. However, all of them generate excessive false positive matches, i.e., they identify two minimotifs as being functionally similar when they are not.
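To see why short motifs are so prone to chance matches, consider a minimal sketch of a motif search. It assumes a minimotif can be written as a simple consensus pattern, here "PxxP" (proline, any two residues, proline). The two sequences and the function name are made up for illustration, and this is not how Minimotif Miner itself is implemented.

```python
import re

# Assumed consensus pattern: proline, any two residues, proline ("PxxP").
# The lookahead lets overlapping occurrences be reported as separate hits.
PXXP = re.compile(r"(?=(P..P))")

def find_motif(sequence: str, pattern: re.Pattern = PXXP) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for every occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical sequences, invented for this example only.
protein_a = "MKTPLAPSGERPPLPQ"
protein_b = "MSPAAPGTWKL"

hits_a = find_motif(protein_a)
hits_b = find_motif(protein_b)

print(hits_a)
print(hits_b)
```

Both invented sequences contain the pattern even though nothing links them functionally. A four-residue pattern with two wildcards is so short that it turns up by chance in many proteins, which is why a sequence match alone is weak evidence of shared function.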

Sanguthevar Rajasekaran (University of Connecticut, United States), Martin Schiller (University of Nevada Las Vegas, United States), and coworkers have improved upon the Minimotif Miner software for protein minimotif prediction. By incorporating the cellular or molecular functions of protein minimotifs, known from experimental data, they have significantly enhanced its predictive accuracy.

Gathering protein minimotif data for the model.

The scientists first needed to gather protein minimotif data on which to develop a model that predicts minimotif similarities between any two proteins. They obtained protein minimotif cellular and molecular function data from the Gene Ontology database, using it to generate a group of positive protein minimotifs.

In other words, they generated pairs of proteins with a shared minimotif and a shared cellular or molecular function, confirmed by experiments. Over 1700 minimotifs of a known cellular function and over 2000 of a known molecular function were included in the dataset.

The scientists did not have access to a database of negative protein minimotifs. In other words, they did not have a database of protein pairs that experiments have explicitly shown not to share a minimotif or a cellular or molecular function.

Consequently, they randomly paired up over 3100 proteins of known cellular function and over 3700 proteins of known molecular function, based on the idea that random pairing is very unlikely to generate a functionally linked pair. They checked to ensure that their random pairings didn't inadvertently match a positive protein minimotif.
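The random-pairing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the protein identifiers, the function name `sample_negative_pairs`, and the choice to enumerate every candidate pair before sampling are hypothetical, and the article does not describe the authors' exact procedure.

```python
import itertools
import random

def sample_negative_pairs(proteins, positive_pairs, n_pairs, seed=0):
    """Draw random protein pairs, skipping any pair known to be positive."""
    # Store positives as unordered pairs so (A, B) and (B, A) are the same.
    positives = {frozenset(pair) for pair in positive_pairs}

    # Every unordered pair of distinct proteins that is not a known positive.
    candidates = [
        pair
        for pair in itertools.combinations(sorted(set(proteins)), 2)
        if frozenset(pair) not in positives
    ]
    if n_pairs > len(candidates):
        raise ValueError("not enough non-positive pairs to sample from")

    # A seeded generator makes the presumed-negative set reproducible.
    return random.Random(seed).sample(candidates, n_pairs)

# Hypothetical identifiers, for illustration only.
proteins = ["P1", "P2", "P3", "P4", "P5"]
positive_pairs = [("P1", "P2"), ("P4", "P3")]

negatives = sample_negative_pairs(proteins, positive_pairs, n_pairs=4)
print(negatives)
```

Pairs drawn this way are only presumed negative: no experiment has shown that they lack a shared function. That is why the check against the positive set matters.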

Shared protein minimotifs via the computational model.

With this data in hand, the scientists next developed a predictive model for functionally linking two proteins. Their goal was to develop a model that yields a large percentage of true positives and a small percentage of false positives.

Correlating protein minimotifs based on common cellular function gave a true positive match rate of 26% and a false positive match rate of 6% at the optimal setting. At this optimum, the model is roughly 4.6 times as likely to retain a correct match as an incorrect one.

Correlating protein minimotifs based on common molecular function gave a true positive match rate of 59% and a false positive match rate of 21% at the optimal setting. At this optimum, the model is roughly 2.9 times as likely to retain a correct match as an incorrect one.
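The "times as likely" figures are ratios of the true positive rate to the false positive rate. The short calculation below reproduces them from the rounded percentages quoted above; the function name is an illustrative choice. With rounded inputs the ratios come out near 4.3 and 2.8, slightly below the quoted 4.6 and 2.9, which presumably (an assumption here) were computed from unrounded rates.

```python
def positive_likelihood_ratio(true_positive_rate: float, false_positive_rate: float) -> float:
    """Ratio of the true positive rate to the false positive rate."""
    if false_positive_rate <= 0:
        raise ValueError("false positive rate must be greater than zero")
    return true_positive_rate / false_positive_rate

# Rounded rates as quoted in the article.
cellular = positive_likelihood_ratio(0.26, 0.06)
molecular = positive_likelihood_ratio(0.59, 0.21)

print(f"cellular function:  {cellular:.1f}")
print(f"molecular function: {molecular:.1f}")
```

A higher ratio means a retained match is more trustworthy. Filtering by cellular function is therefore the more selective of the two, even though it keeps fewer true matches overall (26% versus 59%).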

Most interestingly, combining the cellular functional relationships with another algorithm currently in use improved the model's optimal predictive power by a further 66%. On the other hand, combining the molecular functional relationships with this other algorithm decreased the model's optimal predictive power by 54%.

Overall evaluation.

Many pharmaceutical and other scientists are interested in elucidating functional relationships among proteins, e.g., for predicting the biochemical effects (medical consequences) of a new drug along multiple biochemical pathways. The research discussed herein will enable scientists to rapidly screen out false positive matches, albeit at the cost of also eliminating some true positive matches.

NOTE: The scientists' research was funded by the National Institutes of Health, the National Science Foundation, the University of Nevada Las Vegas, and the University of Connecticut.

For more information:
Rajasekaran, S., Mi, T., Merlin, J. C., Oommen, A., Gradie, P., & Schiller, M. R. (2010). Partitioning of Minimotifs Based on Function with Improved Prediction Accuracy. PLoS ONE, 5(8). DOI: 10.1371/journal.pone.0012276