Automated Biomedical Text Fragmentation In Support Of Biomedical Sentence Fragment Classification

Salehi, Sara
text fragmentation, fragment, automated biomedical text fragmentation, edit distance with move
The past decade has seen tremendous growth in the amount of biomedical literature, specifically in the area of bioinformatics. As a result, biomedical text categorization has become a central task for providing researchers with literature appropriate to their specific information needs. Pan et al. have explored a method that automatically identifies information-bearing sentence fragments within scientific text. Their method aims to classify sentence fragments automatically into sets of categories defined to satisfy specific types of information needs. The categories are grouped into five dimensions: Focus, Polarity, Certainty, Evidence, and Trend. Fragments, rather than whole sentences, are used as the unit of classification because the class value along each of these dimensions can change mid-sentence. Automatically annotating sentence fragments along the five dimensions therefore requires first breaking sentences into fragments automatically, and the performance of the classifier depends on the resulting fragments. In this study, we investigate the problem of automatic fragmentation of biomedical sentences, which is a fundamental layer in multi-dimensional fragment classification. In addition, we believe that our proposed fragmentation algorithm can be used in other domains such as sentiment analysis. The goal of sentiment analysis is often to classify the polarity (positive or negative) of a given text, and sentiment classification can be conducted at different levels: document, sentence, or phrase (fragment). Our proposed fragmentation algorithm can serve as a prerequisite for phrase-level sentiment categorization, which aims to automatically capture multiple sentiments within a sentence.