On-Device Speech Separation Method Utilizing Registered Speaker Embeddings

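Abstract (translated from the Korean)

This study proposes an on-device speech separation method that separates only the enrolled speaker's voice in real time in multi-speaker environments. It also presents a comparative analysis for selecting embedding vectors and speech separation models suited to an on-device speech separation system. The proposed system consists of a user registration module, a speech separation module, a verification and transmission module, and a Transmission Control Protocol (TCP)-based Interface Definition Language (IDL) session transmission structure. To select suitable methods, Mel-Frequency Cepstral Coefficients (MFCC), x-vectors, and d-vectors were compared for speaker embedding, and Conv-TasNet, DPRNN-TasNet, and SepFormer were compared for speech separation. In experiments across various speaker combinations and noise conditions, x-vectors recorded a high average speaker verification accuracy of 88.89%, and Conv-TasNet showed the best performance in terms of speech intelligibility, signal preservation, and processing speed. The study confirms the efficiency and practical feasibility of on-device speech separation systems for real-time speech separation in multi-user environments.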

Multilingual Abstract

This study proposes an on-device speech separation method that extracts the voices of registered speakers in real time in multi-speaker environments. In addition, we conduct a comparative analysis to identify suitable embedding vectors and speech separation models for an on-device speech separation system. The proposed system consists of a user registration module, a speech separation module, a verification and transmission module, and a Transmission Control Protocol (TCP)-based Interface Definition Language (IDL) session transmission structure. To identify suitable methods for the speech separation system, we compared Mel-Frequency Cepstral Coefficients, x-vectors, and d-vectors for speaker embedding, and compared Conv-TasNet, DPRNN-TasNet, and SepFormer for speech separation. Experimental results across various speaker combinations and noise conditions showed that x-vectors achieved an average speaker verification accuracy of 88.89%, and that Conv-TasNet performed best overall in terms of speech clarity, signal preservation, and processing speed. This study confirms the efficiency and applicability of on-device speech separation systems in real-time multi-user environments.
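
The verification step described in the abstract (checking which separated stream belongs to the registered speaker) can be illustrated with a minimal sketch. The abstract does not state how embeddings are compared, so everything here is an assumption made for illustration: cosine similarity as the score, the threshold value of 0.7, and the function names `cosine_similarity` and `select_enrolled_stream`. Computing the embeddings themselves (for example x-vectors) and running the separation model (for example Conv-TasNet) are outside the scope of this sketch; the inputs are plain lists of floats.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def select_enrolled_stream(
    enrolled: Sequence[float],
    stream_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.7,  # assumed value, not taken from the paper
) -> Optional[int]:
    """Return the index of the separated stream that best matches the
    enrolled speaker, or None if no stream reaches the threshold."""
    best_index: Optional[int] = None
    best_score = threshold
    for index, embedding in enumerate(stream_embeddings):
        score = cosine_similarity(enrolled, embedding)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real speaker embeddings are much larger.
    enrolled_speaker = [1.0, 0.0, 0.0]
    separated_streams = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    print(select_enrolled_stream(enrolled_speaker, separated_streams))
```

In a pipeline of the kind the abstract describes, only the stream selected this way would be passed on to transmission; returning `None` when no stream clears the threshold lets the caller drop the frame instead of forwarding another speaker's voice.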

등록 화자 임베딩을 활용한 온디바이스 음성 분리 방법 = On-Device Speech Separation Method Utilizing Registered Speaker Embeddings
