Estimation of Prosodic Features for Personalized Voice Assistant Interaction
Authors
Monir, Tahosina
Date
2025-01
Type
other
Language
en
Abstract
The rapid adoption of voice-based assistive technologies has transformed how users interact with devices, yet many voice assistants fail to adapt to individual vocal characteristics. This limitation is particularly challenging for older adults, whose speech patterns may vary due to age-related factors such as reduced vocal strength or slower speech rates. This project addresses this gap by developing a machine learning-based approach to predict pitch and speech rate from voice data. We used the Common Voice dataset to extract acoustic and prosodic features such as MFCCs, spectral properties, and chroma features. Multiple machine learning models, including Linear Regression, Support Vector Regression, Random Forest, and XGBoost, were evaluated, and XGBoost demonstrated the best performance for both tasks. While the speech-rate model achieved robust accuracy with strong alignment between predicted and actual values, pitch prediction revealed areas for refinement, particularly at higher pitch ranges. Feature importance analysis highlighted the critical role of spectral and temporal characteristics, such as MFCCs and duration, in capturing nuanced variations in speech patterns. SHAP analysis further validated these insights, showing feature-specific contributions to predictions. These findings pave the way for adaptive voice assistants capable of delivering personalized interactions and enhancing user engagement across a wide range of applications.
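As an illustration of the feature-extraction step described above, the following is a minimal sketch, assuming librosa and numpy are available and that Common Voice clips are stored as local audio files; the function name and feature labels are illustrative, not taken from the project's code.

```python
# Hypothetical sketch of extracting MFCC, spectral, chroma, and pitch features
# from a single audio clip. Assumes librosa and numpy are installed.
import numpy as np
import librosa

def extract_features(path: str) -> dict:
    """Extract acoustic and prosodic descriptors from one Common Voice clip."""
    y, sr = librosa.load(path, sr=None)           # keep the native sampling rate
    duration = librosa.get_duration(y=y, sr=sr)

    # Cepstral and spectral descriptors, mean-pooled over time frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)

    # Fundamental frequency (pitch) via probabilistic YIN; NaN marks unvoiced frames
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    features = {
        "duration": duration,
        "mean_pitch_hz": float(np.nanmean(f0)) if np.any(voiced_flag) else 0.0,
        "spectral_centroid": float(centroid.mean()),
    }
    features.update({f"mfcc_{i}": float(v) for i, v in enumerate(mfcc.mean(axis=1))})
    features.update({f"chroma_{i}": float(v) for i, v in enumerate(chroma.mean(axis=1))})
    return features
```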
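Similarly, the model-comparison and SHAP steps could look roughly like the sketch below, assuming a feature table X (the extracted MFCC/spectral/chroma/duration columns) and a target vector y (mean pitch or speech rate) have already been assembled; hyperparameters shown are placeholders, not the project's settings.

```python
# Minimal sketch of training an XGBoost regressor and explaining it with SHAP.
# Assumes xgboost, shap, and scikit-learn are installed.
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

def train_and_explain(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Gradient-boosted trees; other baselines (linear regression, SVR,
    # random forest) would be fit and scored the same way for comparison.
    model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    print(f"MAE: {mean_absolute_error(y_test, preds):.3f}")
    print(f"R^2: {r2_score(y_test, preds):.3f}")

    # SHAP values quantify per-feature contributions to individual predictions
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

    return model
```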