Toward Smart Classrooms: Automated Detection of Speech Analytics and Disfluency with Deep Learning


Authors

Kourkounakis, Tedd

Type

thesis

Language

eng

Keyword

Speech, Stutter, Disfluency, Deep Learning, Squeeze-and-Excitation, BLSTM, Attention

Abstract

Strong presentation skills are valuable and sought after in workplace and classroom environments alike. Although these skills are needed in day-to-day life, effective resources for improving them are scarce. This thesis presents a speech assessment tool that provides feedback on several metrics of speech quality during a presentation. The information it provides can identify obstacles in a speaker's vocal performance and help users develop these skills further. Among the aspects of vocal delivery that can be improved, disfluencies and stutters remain some of the most common and prominent. Millions of people are affected by stuttering and other speech disfluencies, and most people have experienced mild stuttering while communicating under stressful conditions. While automatic speech recognition and language modelling have been studied extensively, disfluency detection and recognition lack a comparable body of work. To this end, we propose an end-to-end deep neural network, FluentNet, capable of detecting a number of different disfluency types. FluentNet consists of a Squeeze-and-Excitation residual convolutional neural network, which facilitates the learning of strong spectral frame-level representations, followed by a set of bidirectional long short-term memory (BLSTM) layers that learn effective temporal relationships. Lastly, FluentNet uses an attention mechanism to focus on the important parts of speech and thereby improve performance. We perform a number of experiments, comparisons, and ablation studies to evaluate our model. FluentNet achieves state-of-the-art results, outperforming other solutions in the field on the publicly available UCLASS dataset. Additionally, we present LibriStutter, a disfluency dataset based on the public LibriSpeech dataset with synthesized stutters. We also evaluate FluentNet on this dataset, showing the strong performance of our model against a number of benchmark techniques.
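As a rough illustration of the squeeze-and-excitation mechanism named in the abstract, the block below sketches it in plain NumPy. This is a generic SE block (global average pooling, a bottleneck MLP with ReLU and sigmoid, then channel-wise rescaling), not FluentNet's actual implementation; the weights are random placeholders, and the channel count and reduction ratio are assumed for the example.

```python
import numpy as np

def squeeze_excite(feature_map, reduction=4, rng=None):
    """Apply a squeeze-and-excitation gate to a (channels, height, width) map.

    The two linear layers stand in for learned weights; in a trained
    network they would come from backpropagation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = feature_map.shape[0]
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_map.mean(axis=(1, 2))                  # shape (c,)
    # Excitation: bottleneck MLP, ReLU then sigmoid, yields per-channel gates.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                        # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))             # sigmoid, shape (c,)
    # Scale: reweight each channel of the original feature map.
    return feature_map * gate[:, None, None]

x = np.random.default_rng(1).standard_normal((8, 4, 4))
y = squeeze_excite(x)
print(y.shape)  # (8, 4, 4)
```

Because the sigmoid gates lie in (0, 1), the block can only attenuate channels, letting the network emphasize informative spectral channels over less useful ones.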

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
