Estimation of Prosodic Features for Personalized Voice Assistant Interaction

Authors

Monir, Tahosina

Date

2025-01

Type

other

Language

en

Abstract

The rapid adoption of voice-based assistive technologies has transformed how users interact with devices, yet many voice assistants fail to adapt to individual vocal characteristics. This limitation is particularly challenging for older adults, whose speech patterns may vary due to age-related factors such as reduced vocal strength or slower speech rates. This project addresses this gap by developing a machine learning-based approach to predict pitch and speech rate from voice data. We used the Common Voice dataset to extract acoustic and prosodic features, including MFCCs, spectral properties, and chroma features. We evaluated multiple machine learning models, including Linear Regression, Support Vector Regression, Random Forest, and XGBoost; XGBoost performed best on both tasks. The speech rate model was accurate, with strong alignment between predicted and actual values, whereas pitch prediction left room for refinement, particularly at higher pitch ranges. Feature importance analysis highlighted the critical role of spectral and temporal characteristics, such as MFCCs and duration, in capturing nuanced variations in speech patterns. SHAP analysis supported these findings by showing how individual features contributed to predictions. These findings pave the way for adaptive voice assistants that deliver personalized interactions and enhance user engagement across a wide range of applications.
