Finding Novel Enzymatic Activities Linked to Human Diseases using Bioinformatics and Untargeted Metabolite Profiling

Charandeep Singh, PhD Student, Luxembourg Center for Systems Biomedicine

A tremendous amount of genome data is now available as a result of genome sequencing projects. According to the Genomes OnLine Database (GOLD), around 6653 genomes have been sequenced to date. Thousands of genes in these genomes still have no known function. In our analysis of the Saccharomyces cerevisiae proteome, we found that around 30% of the proteins of this well-studied organism have no assigned function yet. We used the Pfam database, together with hidden Markov models built in-house from the KEGG and MetaCyc databases, to make functional predictions for these uncharacterized yeast proteins. Among the predicted metabolic proteins, around 100 have human homologs. Of these, around 33 are linked to human diseases according to the OMIM database. We are now investigating the role of these 33 disease-linked putative enzymes using untargeted liquid chromatography-high-resolution mass spectrometry (LC-HRMS; a positive/negative full-scan method and a data-dependent MS2 method) in combination with in vitro enzyme assays. By comparing the metabolite profiles of wild-type and knockout yeast strains with the untargeted LC-HRMS methods, we aim to identify possible endogenous substrates and then validate them using purified recombinant enzymes from yeast and human. Using this enzyme function discovery pipeline, we have already identified the physiological substrate of a predicted yeast sugar phosphotransferase and its human ortholog.
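
The wild-type versus knockout comparison can be illustrated with a minimal sketch. It assumes that a substrate of the deleted enzyme tends to accumulate in the knockout strain, so features with a high knockout/wild-type intensity ratio are flagged as candidates. The function name, the feature identifiers, the intensity values, the pseudocount, and the fold-change threshold are all hypothetical and are not taken from the actual pipeline.

```python
import math
from statistics import mean


def candidate_substrates(wild_type, knockout, min_log2_fc=1.0, pseudocount=1.0):
    """Rank LC-HRMS features that are enriched in the knockout strain.

    wild_type, knockout: dicts mapping a feature ID to a list of replicate
    peak intensities. The threshold and pseudocount are illustrative
    assumptions, not values from the original study.
    """
    hits = []
    for feature, wt_values in wild_type.items():
        ko_values = knockout.get(feature)
        if not ko_values:
            continue  # feature not measured in the knockout strain
        # The pseudocount avoids division by zero for absent features.
        ratio = (mean(ko_values) + pseudocount) / (mean(wt_values) + pseudocount)
        log2_fc = math.log2(ratio)
        if log2_fc >= min_log2_fc:
            hits.append((feature, log2_fc))
    # Strongest accumulation first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature table: IDs encode m/z and retention time.
wild_type = {
    "mz_259.0224_rt_3.1": [1000.0, 1100.0, 900.0],
    "mz_180.0634_rt_5.4": [5000.0, 5200.0, 4800.0],
}
knockout = {
    "mz_259.0224_rt_3.1": [8000.0, 8500.0, 7500.0],
    "mz_180.0634_rt_5.4": [5100.0, 4900.0, 5000.0],
}

hits = candidate_substrates(wild_type, knockout)
for feature, log2_fc in hits:
    print(f"{feature}\tlog2FC={log2_fc:.2f}")
```

In practice a statistical test across replicates and multiple-testing correction would likely be needed before a feature is taken forward to in vitro validation; the fold-change filter here only shows the shape of the comparison.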

