RISS Search - Thesis Details

Multilingual Abstract

Voice activity detection (VAD) distinguishes human speech from other sounds. Various applications, including speech coding and speech recognition, can benefit from VAD. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise occurs unexpectedly, which makes its characteristics difficult to determine accurately. As a result, robust VAD algorithms that depend less on correct noise estimates are more desirable for such applications. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing human vowel sounds. Because they are spectral peaks, formants are likely to survive in a signal even after severe corruption by noise, making them attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, irrelevant spectral peaks from background noise make it difficult to extract formants accurately from noisy signals. In this paper, a simple formant-based VAD algorithm is proposed that overcomes the problem of formant detection under severe noise. The proposed method has a much shorter processing time than standard VAD algorithms and outperforms them under various noise conditions. Its robustness against various types of noise and its light computational load make it suitable for various applications.
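
The claim that spectral peaks tend to survive heavy noise can be illustrated with a small synthetic experiment. The sketch below is not the algorithm proposed in the thesis: the sampling rate, frame length, peak frequencies, noise level, and the `peak_contrast` measure are all assumptions chosen for illustration, and the peak positions are given to the code in advance, which sidesteps the hard part (finding formants among spurious noise peaks) that the abstract describes.

```python
import cmath
import math
import random

FS = 8000                          # sampling rate in Hz (assumed)
N = 256                            # frame length in samples (assumed)
PEAK_HZ = (750.0, 1250.0, 2500.0)  # illustrative formant-like peaks, not from the thesis


def power_spectrum(frame):
    """Power of each non-negative-frequency DFT bin (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def peak_contrast(spectrum, peak_bins):
    """Mean power at the given bins divided by mean power at all other bins."""
    peak_set = set(peak_bins)
    peak = [p for k, p in enumerate(spectrum) if k in peak_set]
    rest = [p for k, p in enumerate(spectrum) if k not in peak_set]
    return (sum(peak) / len(peak)) / (sum(rest) / len(rest))


rng = random.Random(0)
noise_std = 1.5  # about -1.8 dB SNR against three unit-amplitude sinusoids

# DFT bins that hold the assumed peak frequencies (24, 40 and 80 here).
peak_bins = [round(f * N / FS) for f in PEAK_HZ]

# Vowel-like frame: a sum of sinusoids at the assumed peak frequencies.
voiced = [sum(math.sin(2 * math.pi * f * t / FS) for f in PEAK_HZ)
          for t in range(N)]
noise_a = [rng.gauss(0.0, noise_std) for _ in range(N)]
noise_b = [rng.gauss(0.0, noise_std) for _ in range(N)]
noisy_voiced = [s + n for s, n in zip(voiced, noise_a)]

contrast_voiced = peak_contrast(power_spectrum(noisy_voiced), peak_bins)
contrast_noise = peak_contrast(power_spectrum(noise_b), peak_bins)
print(f"noisy vowel-like frame: {contrast_voiced:.1f}")
print(f"noise-only frame:       {contrast_noise:.1f}")
```

Even with the noise power above the signal power, the energy of each sinusoid is concentrated in a single bin while white noise spreads over all bins, so the contrast for the vowel-like frame stays well above that of the noise-only frame. Real background noise is rarely white and can contain its own spectral peaks, which is the difficulty the thesis addresses.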


Table of Contents

CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RELATED WORKS 5
2.1 Speech-Related Features 7
2.1.1 Energy and zero-crossing rate (ZCR) 7
2.1.2 Spectral entropy 9
2.1.3 Band-partitioned spectral entropy 10
2.2 Statistical Methods 12
2.2.1 Likelihood ratio test (LRT)-based method 12
2.2.2 Distributional modeling of speech signals 14
2.2.3 Parametric representation of speech signals 15
2.3 G.729 Annex B Algorithm 16
2.3.1 Feature extraction 17
2.3.2 Background noise parameter estimation 19
2.3.3 Multiboundary VAD decision 20
2.3.4 VAD decision smoothing 22
2.4 ETSI AMR Option 1 Algorithm 23
2.4.1 Feature extraction 24
2.4.2 Background noise parameter estimation 24
2.4.3 Initial VAD decision 25
2.4.4 Hang-over addition 25
2.5 ETSI AMR Option 2 Algorithm 26
2.5.1 Feature extraction 27
2.5.2 Background noise parameter estimation 28
2.5.3 VAD decision 29
2.5.4 Hang-over addition 31
2.6 Summary 33
CHAPTER 3 IN-DEPTH ANALYSIS OF SIGNAL CORRUPTIONS BY NOISES 34
3.1 Analysis of Spectral Peaks 36
3.2 Vector Distance Metrics 39
3.2.1 Unnormalized vector distance metric 39
3.2.2 Normalized vector distance metric by total energies 41
3.2.3 Normalized vector distance metric by maximum energies 43
3.3 Spectral Peak-Based Metric 46
3.3.1 Direct comparison of spectral peak bands 46
3.3.2 Peak extraction-based approach 48
3.4 Summary 50
CHAPTER 4 DIRECT SIMILARITY COMPUTATION BETWEEN PEAK SIGNATURE AND CORRUPTED SPECTRUM 51
4.1 Peak Valley Difference (PVD) 52
4.1.1 Analysis of differences in average energy 52
4.1.2 VAD using average energy differences 54
4.1.3 Remarks on PVD algorithm 55
4.2 Peak-Neighbor Difference (PND) 56
4.2.1 VAD using formant frequencies 56
4.2.2 Band-limited computation for increased robustness against noises 58
4.2.3 Threshold calculation and post processing 60
CHAPTER 5 EXPERIMENTS 61
5.1 Experimental Conditions 61
5.1.1 Data preparation 61
5.1.2 Evaluation metrics 62
5.1.3 Noise mixing using FaNT 63
5.1.4 Baseline systems 64
5.1.5 Test sets 64
5.2 Aurora-2 Results 66
5.2.1 Averaged accuracy by noise type 66
5.2.2 Averaged accuracy by SNR level 67
5.3 NOISEX-92 Results 68
5.3.1 Averaged accuracy by noise type 68
5.3.2 Averaged accuracy by SNR level 69
5.4 Music Results 70
5.4.1 Averaged accuracy by noise type 70
5.4.2 Averaged accuracy by SNR level 71
5.5 Contours of VAD algorithms 72
5.6 Computational overheads 75
CHAPTER 6 CONCLUSION 77

Robust voice activity detection using formant frequencies
