Accent Classification Across Continents: A Deep Learning Approach

Keywords: CNN, Accent Classification, Speaker Recognition, Continent, MFCC

This study presents a deep learning approach to accent classification across continents, identifying accents from Asia, Europe, North America, Africa, and Oceania with the aim of improving speech recognition systems. A Convolutional Neural Network (CNN) was trained on the Mozilla Common Voice dataset using extracted features: Mel-Frequency Cepstral Coefficients (MFCC), Delta, Delta-Delta, Chroma Frequency, and spectral features. The network combines multiple convolutional and dense layers with dropout and batch normalization layers to reduce overfitting during training. The model achieved 82% accuracy on the validation data. Asian and European accents were classified more accurately because their datasets were larger, whereas African and Oceanian accents were misclassified more often due to limited representation and greater linguistic diversity. In contrast to past research, which focused only on country-based accent classification, this work introduces a feature-based deep learning approach to continent-based accent classification. Recognizing this accent variation can help improve speech recognition systems and make applications such as voice assistants and language learning tools more inclusive of diverse linguistic patterns. Future work will concentrate on extending the dataset to all seven continents and on improving classification accuracy through better feature engineering and model tuning.
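
The abstract lists Delta and Delta-Delta features alongside MFCC but does not say how they were computed. As a minimal sketch, assuming the common regression-based delta formula with edge frames repeated as padding, the stacking of MFCC, Delta, and Delta-Delta could look like the following. The function names `delta` and `stack_features`, the `width` parameter, and the list-of-lists input layout are illustrative assumptions, not details taken from the study.

```python
from typing import List

# One inner list of coefficients per time frame (hypothetical layout).
Frames = List[List[float]]


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based delta over time.

    Assumption: d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum n^2),
    with the first and last frames repeated at the edges.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for k in range(n_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames are reused as padding.
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc: Frames, width: int = 2) -> Frames:
    """Concatenate MFCC, Delta, and Delta-Delta per frame."""
    d1 = delta(mfcc, width)   # first-order change over time
    d2 = delta(d1, width)     # second-order change (delta of delta)
    return [m + a + b for m, a, b in zip(mfcc, d1, d2)]
```

With 13 MFCCs per frame, `stack_features` would yield 39 values per frame, to which chroma and spectral features could be appended before the result is passed to the CNN. Audio libraries may compute deltas differently (for example with a smoothing filter), so values from this sketch need not match theirs exactly.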