Voice activity detection (VAD) distinguishes human speech from other sounds. Various applications, including speech coding and speech recognition, can benefit from VAD. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech, of background noise, or of both. In many real-life applications, noise occurs unexpectedly, which makes it difficult to determine its characteristics accurately. Robust VAD algorithms that depend less on correct noise estimates are therefore more desirable for such applications. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing human vowel sounds. Because they are spectral peaks, formants are likely to survive in a signal even after severe corruption by noise, which makes them attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, irrelevant spectral peaks from background noise make it difficult to extract formants accurately from noisy signals. In this paper, a simple formant-based VAD algorithm is proposed that overcomes the problem of formant detection under severe noise. The proposed method requires much less processing time and outperforms standard VAD algorithms under various noise conditions. Its robustness against various types of noise and its light computational load make it suitable for a wide range of applications.