Inferring Systemic Functional Language Models

Alsadhan, Nasser
Text Mining
Language production in the brain is a complex process that is not yet fully understood. The bag-of-words model, which represents a document by the frequency of each word it contains, is useful in many text mining tasks, but it provides no information about how language is produced. Systemic networks model language as a set of choices, where each choice operates in a particular context. Capturing the patterns of choices used to create a particular document provides useful information about the authors and about what they were feeling and thinking when they created it. However, producing systemic networks manually is expensive. We define an automated way of producing them. Given a set of documents, we cluster words of interest into smaller groups using Non-Negative Matrix Factorization (NNMF), and we interpret the resulting hierarchical clusters as systemic networks. We validate the produced systemic networks in several ways: we use them in an authorship prediction problem and compare their results with those of the bag-of-words model, and we assess how well they cluster the different choices made by the authors. We also generate random systemic networks and compare their performance with that of the produced networks.
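As a rough illustration of the clustering step described above, the following sketch factors a small document-by-word count matrix with NNMF and assigns each word to the component that contributes most to it. It is a minimal sketch under stated assumptions, not the author's implementation: the toy counts, the pronoun list, the function names `nmf` and `cluster_words`, the choice of Lee-Seung multiplicative updates, and the "largest contribution" assignment rule are all hypothetical choices made for this example.

```python
import random


def nmf(V, k, iters=500, seed=0):
    """Approximate a non-negative matrix V (n x m) as W (n x k) times H (k x m).

    Uses Lee-Seung multiplicative updates for the squared-error objective.
    This update rule is an assumption; the abstract does not name a solver.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    eps = 1e-9  # guards against division by zero
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)]
               for a in range(k)]
        H = [[H[a][j] * WtV[a][j]
              / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)]
               for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)]
               for a in range(k)]
        W = [[W[i][a] * VHt[i][a]
              / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    return W, H


def cluster_words(V, words, k, seed=0):
    """Group words by the NNMF component that contributes most to each one."""
    W, H = nmf(V, k, seed=seed)
    # Weight each row of H by the column sum of W so the comparison does not
    # depend on how scale is split between W and H.
    weight = [sum(W[i][a] for i in range(len(W))) for a in range(k)]
    groups = [[] for _ in range(k)]
    for j, word in enumerate(words):
        best = max(range(k), key=lambda a: weight[a] * H[a][j])
        groups[best].append(word)
    return groups


# Hypothetical toy data: rows are documents, columns are pronoun counts.
words = ["I", "me", "my", "they", "them", "their"]
V = [
    [4, 3, 5, 0, 0, 0],
    [5, 4, 6, 0, 0, 0],
    [0, 0, 0, 3, 4, 2],
    [0, 0, 0, 4, 5, 3],
]

groups = cluster_words(V, words, k=2)
print(groups)
```

Applying the same split again inside each resulting group would produce a hierarchy of word groups. Whether that matches how the hierarchical clusters are actually built in this work is not stated in the abstract.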