Research lead
A team from the Institute of Enzymology of the Hungarian Academy of Sciences, Budapest, led by László Patthy, has developed MisPred, a bioinformatics tool that is capable of identifying and correcting erroneous protein annotations in public databases.
The manual retrieval and storage of published protein sequence information requires a huge amount of effort. This has led to the development of automated tools to scoop up and annotate sequence information.
However, Patthy said, “Recent studies have shown that a significant proportion of eukaryotic genes are mispredicted at the transcript level. As the MisPred routines are able to detect many of these errors, and may aid in their correction, we suggest that it may significantly improve the quality of protein sequence data based on gene predictions.” This promises to save time and effort that would otherwise be spent on further investigation of erroneously identified genes.
The MisPred approach rates annotations according to five principles that assess whether or not a sequence complies with current knowledge:
Extracellular or transmembrane proteins must have appropriate secretory signals.
A protein with intra- and extra-cellular parts must have a transmembrane segment.
Extracellular and nuclear domains must not occur in a single protein.
The number of amino acid residues in closely related members of a globular domain family must fall into a relatively narrow range.
A protein must be encoded by exons located on a single chromosome.
Although he acknowledges that there are some exceptions to these rules, Patthy said, “Nevertheless, the fact that MisPred analyses of protein sequences of the Swiss-Prot database identified very few such exceptions indicates that the rules of MisPred are generally valid.”
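To make the five principles concrete, they can be read as simple consistency checks on an annotated protein. The Python sketch below is only an illustration of that idea and is not the MisPred implementation: the `PredictedProtein` record, its field names, the `suspicious_rules` function and the domain-length ranges are all hypothetical, and each rule is reduced to a yes/no flag that a real tool would have to derive from sequence analysis.

```python
from dataclasses import dataclass, field


@dataclass
class PredictedProtein:
    """Hypothetical summary of one predicted protein (illustrative fields only)."""
    name: str
    has_secretory_signal: bool = False
    has_transmembrane_segment: bool = False
    has_extracellular_part: bool = False
    has_intracellular_part: bool = False
    has_nuclear_domain: bool = False
    # Globular domain family -> number of residues in this protein's copy
    domain_lengths: dict = field(default_factory=dict)
    # Chromosomes on which the encoding exons were found
    exon_chromosomes: set = field(default_factory=set)


def suspicious_rules(protein, expected_lengths):
    """Return the numbers (1 to 5) of the principles the annotation conflicts with.

    expected_lengths maps a domain family to an assumed (min, max) residue range.
    """
    violated = []
    # 1. Extracellular or transmembrane proteins need a secretory signal.
    if (protein.has_extracellular_part or protein.has_transmembrane_segment) \
            and not protein.has_secretory_signal:
        violated.append(1)
    # 2. Intra- and extracellular parts require a transmembrane segment.
    if protein.has_extracellular_part and protein.has_intracellular_part \
            and not protein.has_transmembrane_segment:
        violated.append(2)
    # 3. Extracellular and nuclear domains must not co-occur.
    if protein.has_extracellular_part and protein.has_nuclear_domain:
        violated.append(3)
    # 4. Domain sizes must fall within the range seen for the family.
    for family, length in protein.domain_lengths.items():
        low, high = expected_lengths.get(family, (0, float("inf")))
        if not low <= length <= high:
            violated.append(4)
            break
    # 5. All exons must lie on a single chromosome.
    if len(protein.exon_chromosomes) > 1:
        violated.append(5)
    return violated


# Made-up example: an extracellular protein that also carries a nuclear domain,
# an oversized domain, and exons assigned to two chromosomes.
example = PredictedProtein(
    name="example_1",
    has_secretory_signal=True,
    has_extracellular_part=True,
    has_nuclear_domain=True,
    domain_lengths={"hypothetical_family": 250},
    exon_chromosomes={"chr1", "chr7"},
)
expected = {"hypothetical_family": (90, 120)}  # assumed range, not real data
print(suspicious_rules(example, expected))  # [3, 4, 5]
```

A flagged annotation in this sketch is only a candidate for review, in keeping with Patthy's remark that genuine exceptions to the rules exist.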