RISS (Academic Research Information Service)

      Indexed in: KCI (listed), SCIE, SCOPUS

      An Adaptation Method in Noise Mismatch Conditions for DNN-based Speech Enhancement

      https://www.riss.kr/link?id=A105676669

      Abstract

      Deep-learning-based speech enhancement has shown considerable success, but its performance still degrades under mismatch conditions. This paper proposes an adaptation method to improve performance under noise mismatch conditions. First, we devise a noise-aware training scheme that supplies identity vectors (i-vectors) as parallel input features, adapting the deep neural network (DNN) acoustic model to the target noise. Second, given a small amount of adaptation data, a noise-dependent DNN is obtained from a noise-independent DNN using L2 regularization, which constrains the estimated masks to stay close to those of the unadapted model. Finally, experiments on different noise types and SNR conditions show that the proposed method improves STOI by 0.1%-9.6% and gives consistent improvements in PESQ and segSNR over the baseline systems.
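The two steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `append_ivector` and `adaptation_loss`, the weight `reg_weight`, the use of mean squared error as the fitting term, and the choice to apply the L2 penalty to the mask outputs rather than to the network weights. It is not the authors' implementation.

```python
from typing import List, Sequence


def append_ivector(frames: Sequence[Sequence[float]],
                   ivector: Sequence[float]) -> List[List[float]]:
    """Concatenate the same noise i-vector onto every acoustic feature frame."""
    return [list(frame) + list(ivector) for frame in frames]


def adaptation_loss(adapted_mask: Sequence[float],
                    target_mask: Sequence[float],
                    unadapted_mask: Sequence[float],
                    reg_weight: float) -> float:
    """Mean squared error to the target mask, plus an L2 penalty that keeps
    the adapted mask near the mask of the unadapted (noise-independent) model.
    Both terms are assumed forms; the abstract does not specify them."""
    n = len(adapted_mask)
    fit = sum((a - t) ** 2 for a, t in zip(adapted_mask, target_mask)) / n
    drift = sum((a - u) ** 2 for a, u in zip(adapted_mask, unadapted_mask)) / n
    return fit + reg_weight * drift


# Toy data: two 2-dimensional frames and a 3-dimensional noise i-vector.
frames = [[0.1, 0.2], [0.3, 0.4]]
noise_ivector = [1.0, -1.0, 0.5]
augmented = append_ivector(frames, noise_ivector)

# The adapted mask equals the unadapted one here, so only the fit term counts.
loss = adaptation_loss(
    adapted_mask=[0.5, 0.5],
    target_mask=[1.0, 0.0],
    unadapted_mask=[0.5, 0.5],
    reg_weight=0.1,
)
```

A larger `reg_weight` keeps the adapted model closer to the noise-independent one, which is the usual way to limit overfitting when only a small amount of adaptation data is available.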

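      Journal history

      Journal: KSII Transactions on Internet and Information Systems (registered under the same Korean and foreign-language name)
      Date; event; detail; listing status
      2023; evaluation scheduled; eligible to apply for overseas-database journal evaluation (evaluation of internationally indexed journals)
      2020-01-01; evaluation; listed-journal status retained (evaluation of internationally indexed journals); KCI listed
      2013-10-01; evaluation; selected as a listed journal (other); KCI listed
      2011-01-01; evaluation; candidate-journal status retained (other); KCI candidate
      2009-01-01; evaluation; indexed in SCOPUS (new evaluation); KCI candidate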

      Journal citation information (reference year 2016)

      WOS-KCI integrated IF (2-year): 0.45
      KCIF (2-year): 0.21
      KCIF (3-year): 0.37
      KCIF (4-year): 0.32
      KCIF (5-year): 0.29
      Centrality index (3-year): 0.244
      Immediacy index: 0.03
