
dc.contributor.author: Wong, Andrew
dc.date: 2013-04-24 21:07:13.983
dc.date.accessioned: 2013-04-25T14:27:07Z
dc.date.available: 2013-04-25T14:27:07Z
dc.date.issued: 2013-04-25
dc.identifier.uri: http://hdl.handle.net/1974/7923
dc.description: Thesis (Master, Computing) -- Queen's University, 2013-04-24 21:07:13.983
dc.description.abstract: Proteins perform many important functions in the cell and are essential to the health of the cell and the organism. As such, there is much effort to understand the function of proteins. Due to the advances in sequencing technology, there are many sequences of proteins whose function is yet unknown. Therefore, computational systems are being developed and used to help predict protein function. Most computational systems represent proteins using features that are derived from protein sequence or protein structure to predict function. In contrast, there are very few systems that use the biomedical literature as a source of features. Earlier work demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. In this thesis we build on that earlier work, and examine the effectiveness of using text features to predict protein function. Using the molecular function and biological process terms from the Gene Ontology (GO) as our function classes, we trained two classifiers (k-Nearest Neighbour and Support Vector Machines) to predict protein function. The proteins were represented using text features that were extracted from biomedical abstracts based on statistical properties. For evaluation, the performance of our two classifiers was compared to that of two baseline classifiers: one that assigns function based solely on the prior distribution of protein function, and one that assigns function based on sequence similarity. The systems were trained and tested using 5-fold cross-validation over a dataset of more than 36,000 proteins. Overall, we show that text features extracted from biomedical literature can be used to predict protein function for any organism. Our results also show that our text-based classifier typically has comparable performance to the sequence-similarity baseline classifier.
Based on our results and what previous work had shown, we believe that text features can be integrated with other types of features to provide more accurate predictions for protein function.
dc.language.iso: eng
dc.relation.ispartofseries: Canadian theses
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
dc.subject: Computer Science
dc.subject: Protein Function Prediction
dc.title: Prediction of Protein Function Using Text Features Extracted From the Biomedical Literature
dc.type: thesis
dc.description.degree: M.Sc.
dc.contributor.supervisor: Shatkay, Hagit
dc.contributor.department: Computing
dc.degree.grantor: Queen's University at Kingston

