Integrated Framework for Speech Enhancement and Voice Activity Detection in Robust Speech Processing

Adappa S. Angadi, Nagaraja B.G.

doi:10.52710/cfs.681

Published: Dec 31, 2024

DOI: https://doi.org/10.52710/cfs.681

Keywords:

VAD, DNN, SNR, NOIZEUS, PESQ, STOI, FER, F1-score

Abstract

Robust speech processing in noisy environments is a critical requirement for a wide range of applications, including telecommunication systems, voice-controlled interfaces, and hearing aids. This paper proposes an integrated framework that combines Deep Neural Network (DNN)-based Speech Enhancement (SE) with Voice Activity Detection (VAD) to improve both speech quality and activity-detection accuracy under adverse acoustic conditions. First, several conventional and deep-learning-based SE techniques are evaluated on the NOIZEUS database across different noise types and signal-to-noise ratio (SNR) levels, using the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) metrics. An energy-based VAD is then applied, and its performance is evaluated using Frame Error Rate (FER) and precision. The proposed integrated system outperforms the standalone methods at both the enhancement and detection stages, with significant improvements in FER and F1 score, highlighting the effectiveness of combining SE and VAD within a unified framework. This work offers a practical and efficient solution for real-time speech processing in complex acoustic environments.
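The abstract does not state the VAD's decision rule or the exact metric definitions. The sketch below is a minimal illustration, not the authors' implementation, of how frame-level speech/non-speech decisions and the FER, precision, and F1 scores could be computed. It assumes a simple rule: a frame is labelled speech when its log energy exceeds an estimated noise floor by a fixed margin. It also takes FER to be the fraction of frames whose label disagrees with the reference. The function names (`energy_vad`, `vad_metrics`), the 400/160-sample framing (25 ms frames with a 10 ms hop at 16 kHz), the 6 dB margin, and the 10th-percentile noise-floor estimate are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def frame_energies_db(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean-square energy of each frame in dB (a small floor avoids log10(0))."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(mean_square + 1e-12))
    return energies


def energy_vad(signal: Sequence[float], frame_len: int = 400, hop: int = 160,
               margin_db: float = 6.0, noise_percentile: float = 0.1) -> List[int]:
    """Hypothetical energy-based VAD: 1 = speech frame, 0 = non-speech frame.

    Assumption: the noise floor is a low percentile of the frame energies,
    and frames more than margin_db above it are labelled speech.
    """
    energies = frame_energies_db(signal, frame_len, hop)
    if not energies:  # signal shorter than one frame
        return []
    ranked = sorted(energies)
    noise_floor = ranked[int(noise_percentile * (len(ranked) - 1))]
    return [1 if e > noise_floor + margin_db else 0 for e in energies]


def vad_metrics(pred: Sequence[int], ref: Sequence[int]) -> Tuple[float, float, float]:
    """Return (FER, precision, F1), treating speech (1) as the positive class."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and equally long")
    tp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 1)
    # Assumed FER definition: share of frames whose label is wrong.
    fer = sum(1 for p, r in zip(pred, ref) if p != r) / len(ref)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return fer, precision, f1
```

For example, `vad_metrics([1, 1, 0, 0], [1, 0, 1, 0])` gives an FER, precision, and F1 of 0.5 each. Using a fixed low percentile of frame energy as the noise-floor estimate is a deliberate simplification. Practical energy-based VADs usually track the noise floor adaptively and add hangover smoothing. In a pipeline like the one described above, the enhanced signal from the SE stage would be the input to `energy_vad`.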

Issue

Volume 2024, Issue 12

Section

Articles