      Real-Time On-Device Speech Conversion System for Patients with Dysarthria (구음장애 환자를 위한 실시간 온디바이스 발화 변환 시스템)

      https://www.riss.kr/link?id=T17367955

      Abstract

      Dysarthria, which results from neurological impairment, significantly degrades speech intelligibility, undermining patients' right to communicate and leading to social isolation. Deep learning-based Voice Conversion (VC) has emerged as a promising alternative, but existing approaches often rely on high-performance server resources, which raises concerns about internet connectivity and privacy. Their reliance on batch processing also frequently results in high latency, making them unsuitable for real-time conversation. To overcome these limitations, this study proposes a real-time, intelligent, on-device speech conversion system designed to support practical communication for patients with dysarthria. The system is engineered to run standalone on the NVIDIA Jetson Orin Nano, an edge computing platform. To ensure real-time performance, a Contextual Block Conformer-based streaming Automatic Speech Recognition (ASR) architecture processes continuous speech input with low latency. It is integrated with an end-to-end Text-to-Speech (TTS) model based on JETS (Jointly Training FastSpeech2 and HiFi-GAN), which immediately converts the recognized text into intelligible speech. To keep the system stable in real-world environments, model robustness is improved through data augmentation, including device-specific noise synthesis and speed perturbation. In addition, microphone array-based Direction of Arrival (DOA) estimation is integrated to identify speakers and support bidirectional dialogue. Experimental results show that, compared with conventional batch models, the proposed streaming model significantly reduces the First Response Time, to approximately 0.04 seconds. It also achieves a Real-Time Factor (RTF) sufficient for real-time processing across the entire pipeline, including both ASR and TTS. These findings indicate that dysarthric speech can be converted into high-quality standard speech in real time even with limited computational resources. The study's contribution is a standalone, practical communication aid that goes beyond laboratory settings and is directly applicable to patients' daily lives.
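      The abstract reports a Real-Time Factor (RTF) for the full ASR and TTS pipeline without restating how the metric is computed. The sketch below uses the common definition, processing time divided by input audio duration, where a value below 1.0 means the pipeline keeps up with incoming speech. The function names, the two-stage split, and the sample timings are illustrative assumptions, not the thesis's measurement code or results.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_stage(stage: Callable[[], object], audio_seconds: float) -> Tuple[float, float]:
    """Run one pipeline stage and return (elapsed_seconds, stage_rtf).

    `stage` is a hypothetical zero-argument callable wrapping, for example,
    an ASR or TTS inference call on one utterance.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    stage()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / audio_seconds


def pipeline_rtf(stage_seconds: Sequence[float], audio_seconds: float) -> float:
    """RTF of a sequential pipeline: total processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return sum(stage_seconds) / audio_seconds


def is_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means processing finishes faster than the audio plays."""
    return rtf < 1.0


if __name__ == "__main__":
    # Made-up timings for a 3-second utterance: 0.5 s for ASR, 0.25 s for TTS.
    rtf = pipeline_rtf([0.5, 0.25], audio_seconds=3.0)
    print(f"pipeline RTF = {rtf:.2f}, real time: {is_real_time(rtf)}")
```

      RTF measures sustained throughput only. The First Response Time reported above is a separate latency figure and has to be measured on its own, since a pipeline can have a low RTF and still respond late if it waits for the whole utterance before decoding.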
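      Table of Contents

      • I. Introduction 1
      • 1. Research Background 1
      • 2. Research Objectives 4
      • II. Related Work 6
      • 1. ASR-TTS Pipeline-Based Dysarthric Speech Conversion Systems 7
      • 2. Efficient Neural Network Architectures for Speech Processing 9
      • III. Proposed Method 11
      • 1. System Overview 11
      • 2. Robust Model Design Through Data Augmentation 12
      • 1) Device-Specific Noise Augmentation 12
      • 2) Handling Speech Variability Through Speed Perturbation 13
      • 3. Building a Low-Latency Process with a Streaming ASR Architecture 14
      • 1) Overall Architecture 14
      • 2) Block-Wise Streaming Processing 17
      • 3) Limitations of the Streaming Approach and Parameter Optimization Strategy 19
      • 4) JETS-Based Speech Synthesis 21
      • IV. Experiments and Results 23
      • 1. Experimental Setup 23
      • 1) Hardware and Software Configuration 23
      • 2) Datasets 23
      • 2. Evaluation Metrics 25
      • 1) Evaluation Metrics for the Dysarthric Speech Recognition Model 25
      • 2) Evaluation Metrics for the Speech Synthesis Model 26
      • 3. Performance Evaluation of the Dysarthric Speech Recognition Model 27
      • 1) Comparison Models 27
      • 2) Experimental Results 28
      • 4. Performance Evaluation of the Speech Synthesis Model 30
      • 1) Comparison Models 30
      • 2) Experimental Results 30
      • 5. Qualitative Analysis and Summary 32
      • V. System Application: DOA-Based Multi-Speaker Interactive Dialogue System 34
      • VI. Conclusion 39
      • References 41
      • ABSTRACT 47
      • Acknowledgments 49
      • Research Achievements 51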