Comparing Naïve Bayes Classifiers with Support Vector Machines for Predicting Protein Subcellular Location Using Text Features

dc.contributor.author: Lam, Yin [en]
dc.contributor.department: Computing [en]
dc.contributor.supervisor: Shatkay, Hagit [en]
dc.contributor.supervisor: Blostein, Dorothea [en]
dc.date: 2010-07-06 11:06:47.613
dc.date.accessioned: 2010-07-07T18:25:08Z
dc.date.available: 2010-07-07T18:25:08Z
dc.date.issued: 2010-07-07T18:25:08Z
dc.degree.grantor: Queen's University at Kingston [en]
dc.description: Thesis (Master, Computing) -- Queen's University, 2010-07-06 11:06:47.613 [en]
dc.description.abstract: Proteins play many roles in the body, and the task of understanding how proteins function is very challenging. Determining a protein’s location within the cell (also referred to as the subcellular location) helps shed light on the function of that protein. Protein subcellular location can be inferred through experimental methods or predicted using computational systems. In particular, we focus on two existing computational systems, namely EpiLoc and HomoLoc, that use features derived from text (abstracts of technical papers) and apply a support vector machine (SVM) classifier to classify proteins into their respective locations. The prediction accuracy of both EpiLoc and HomoLoc is comparable to that of state-of-the-art protein location prediction systems. However, in addition to accuracy, other factors such as training efficiency must be considered in evaluating the quality of a location prediction system. In this thesis, we replace the SVM classifier in EpiLoc and HomoLoc with a naïve Bayes classifier and with a novel classifier, which we call the Mean Weight Text classifier. The Mean Weight Text classifier and the naïve Bayes classifier are simple to implement and execute efficiently. In addition, naïve Bayes classifiers have been shown to be effective in the context of protein location prediction and are considered preferable to SVMs because the process used to derive their results is easier to explain. Evaluating the performance of these classifiers on existing data sets, we find that SVM classifiers have a slightly higher accuracy than naïve Bayes and Mean Weight Text classifiers. This slight advantage is offset by the simplicity and efficiency offered by naïve Bayes and Mean Weight Text classifiers. Moreover, we find that the Mean Weight Text classifier has a slightly higher accuracy than the naïve Bayes classifier. [en]
dc.description.degree: M.Sc. [en]
dc.identifier.uri: http://hdl.handle.net/1974/5920
dc.language.iso: eng [en]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: Text Mining [en]
dc.subject: Protein Subcellular Location [en]
dc.title: Comparing Naïve Bayes Classifiers with Support Vector Machines for Predicting Protein Subcellular Location Using Text Features [en]
dc.type: thesis [en]
Files
Original bundle
Name: Lam_Yin_P_201006_MSc.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.72 KB
Format: Item-specific license agreed upon to submission