Inferring Systemic Functional Language Models
Authors
Alsadhan, Nasser
Date
2014-08-29
Type
thesis
Language
eng
Keyword
Text Mining
Abstract
Language production in the brain is a complicated process that is not yet fully understood. The bag-of-words model, which represents a document by the frequency of each word it contains, is useful in many text mining tasks, but it provides no information about how language is produced. Systemic networks model language as a set of choices, each of which operates in a particular context. Capturing the patterns of choices used to create a particular document provides useful information about the authors and about what they were feeling and thinking when they created it.
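To illustrate why word frequencies alone say nothing about how a text was produced, here is a minimal Python sketch of a bag-of-words representation. The function name `bag_of_words`, the tokenizing regular expression, and the two sample sentences are illustrative assumptions and are not taken from the thesis.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map each word to its frequency, discarding order and context."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Two hypothetical sentences that make the same word choices in a different order.
doc_a = "I think we should go"
doc_b = "We should go, I think"

# The two representations are identical, so the model cannot tell
# which ordering (or which sequence of choices) the author used.
same = bag_of_words(doc_a) == bag_of_words(doc_b)
```

Here `same` is `True`: the representation keeps what was said but not the sequence of choices that led to it.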
However, producing systemic networks manually is expensive. We define an automated way of producing them. Given a set of documents, we use Non-Negative Matrix Factorization (NNMF) to cluster words of interest into smaller groups, and we interpret the resulting hierarchical clusters as systemic networks. We validate the produced systemic networks in several ways: we use them in an authorship prediction problem and compare their results with those of the bag-of-words model, and we assess how well they cluster the different choices made by the authors. We also generate random systemic networks and compare their performance with that of the produced systemic networks.
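The abstract does not specify how the factorization is turned into a hierarchy, so the following Python sketch shows only one plausible reading: factor a document-by-word count matrix with NNMF, assign each word to its strongest factor, and split each resulting group again. Everything here is an assumption made for illustration, including the multiplicative-update solver, the binary splits, the `min_size` stopping rule, the function names, and the toy data. It is not the thesis's implementation.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=200, seed=0):
    """Approximate non-negative V (n x m) as W (n x k) times H (k x m)
    using multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def split_words(V, cols, k=2):
    """Group word columns by the factor with the largest weight in H."""
    sub = [[row[j] for j in cols] for row in V]
    _, H = nmf(sub, k)
    groups = [[] for _ in range(k)]
    for idx, j in enumerate(cols):
        best = max(range(k), key=lambda a: H[a][idx])
        groups[best].append(j)
    return [g for g in groups if g]

def build_tree(V, words, cols=None, min_size=2):
    """Recursively split word groups; nested lists stand in for a hierarchy."""
    cols = list(range(len(words))) if cols is None else cols
    if len(cols) <= min_size:
        return [words[j] for j in cols]
    groups = split_words(V, cols)
    if len(groups) < 2:  # the split did not separate anything: stop here
        return [words[j] for j in cols]
    return [build_tree(V, words, g, min_size) for g in groups]

def leaves(tree):
    """Flatten a nested-list tree back into its words."""
    return [w for x in tree
            for w in (leaves(x) if isinstance(x, list) else [x])]

# Hypothetical toy data: rows are documents, columns are words of interest.
words = ["i", "we", "you", "and", "but", "or"]
V = [[5, 4, 6, 0, 0, 0],
     [4, 6, 5, 0, 0, 0],
     [0, 0, 0, 5, 6, 4],
     [0, 0, 0, 6, 4, 5]]
tree = build_tree(V, words)
```

Each word ends up in exactly one leaf of `tree`, so the nested lists can be read as a tree of alternatives. Whether such a tree is a meaningful systemic network is the question the validation steps described above address.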
Description
Thesis (Master, Computing) -- Queen's University, 2014-08-28 23:28:17.897
License
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
