Toward Natural and Intelligible Speech Synthesis : An Empirical Study on Transfer Learning
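Chaewon Kang, Jeewoo Yoon, Daeun Lee, Migyeong Kang, Seohyun Lim, Juho Jung, Sejung Son, Jinyoung Han
The Korean Institute of Broadcast and Media Engineers, 2023, Proceedings of the Korean Society of Broadcast Engineers Conference, Vol.2023 No.6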
Transfer learning from well-maintained pre-training (source) data is known to be useful for synthesizing natural and intelligible speech from a small amount of data. However, little attention has been paid to answering two research questions with empirically grounded evidence: (1) How much source speech data (e.g., 10K utterances or 10 hours) used for pre-training in transfer learning is enough to generate natural and intelligible speech? (2) What is the minimum amount of target speech data that must be provided to generate natural and intelligible speech? Both questions are essential to the quality of speech synthesis. To answer them, this paper conducts extensive experiments on speech synthesis using multiple source and target datasets that differ in length, speaker, and language. We show that intelligible and natural speech can be synthesized with only 500 utterances of target data using transfer learning. Our work also reveals that at least 5,000 utterances of source pre-training data are required to synthesize decent speech.