RISS (Research Information Sharing Service)

      Graphics processing unit based real-time implementation of steered response power-phase transform for human-robot interaction

      https://www.riss.kr/link?id=T12705712

      Multilingual Abstract

      Speech-based interaction is commonly used for the exchange of information between a human and a robot. However, speech recognition for a robot is obstructed by noise in a real-life environment. Various speech-enhancement techniques have been studied to overcome this problem.
      Sound source localization (SSL) is the technique of determining the direction of a sound source. Because this direction is used as prior information for speech-enhancement technologies such as beamforming, SSL is a crucial component of noise-robust speech-based human-robot interaction. The steered response power-phase transform (SRP-PHAT) method for SSL has been widely used owing to its robustness to reverberation. However, SRP-PHAT is known to be difficult to execute in real time because it must evaluate a very large number of candidate sound source locations. Thus, various CPU-based approaches have been proposed to overcome this problem.
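For reference, the quantity that makes SRP-PHAT expensive can be written in its common textbook form. This is a sketch of the standard definition, not a formula quoted from the thesis; the symbols M (number of microphones), m_k (microphone positions), q (candidate location), X_k (frame spectrum at microphone k), and c (speed of sound) are notation assumed here.

```latex
\hat{\mathbf{q}} = \arg\max_{\mathbf{q}} P(\mathbf{q}),
\qquad
P(\mathbf{q}) = \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} R_{kl}\bigl(\tau_{kl}(\mathbf{q})\bigr)

R_{kl}(\tau) = \int_{-\infty}^{\infty}
  \frac{X_k(\omega)\, X_l^{*}(\omega)}{\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert}\,
  e^{j\omega\tau}\, d\omega,
\qquad
\tau_{kl}(\mathbf{q}) = \frac{\lVert \mathbf{q} - \mathbf{m}_k \rVert - \lVert \mathbf{q} - \mathbf{m}_l \rVert}{c}
```

Under this definition, each frame requires one lookup per microphone pair per candidate, that is, M(M-1)/2 lookups for every candidate location, which is why the cost grows quickly with the size of the search grid.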
      Prevailing GPU programming toolkits such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) have helped integrate GPU computing into PC environments. To cope with this changing environment, it is vital to adapt conventional algorithms into GPU-based algorithms for improved performance.
      SRP-PHAT is divided into four stages: loading the time-difference-of-arrival (TDOA) table, cross-correlation, SRP energy map calculation, and searching for the maximum-SRP coordinates. Each stage is then transformed into a GPU-based framework. As long as the configurations of the microphone array and the candidate coordinates remain unchanged, the TDOA values also remain unchanged; therefore, the TDOA values are pre-calculated. Cross-correlations are calculated only once per frame because they are commonly referenced by all the microphone pairs. On the basis of these cross-correlations, the SRP energy map for all the candidate coordinates is calculated. The candidate coordinate with the maximum SRP is selected as the direction of the sound source.
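The four stages can be illustrated with a small sequential sketch in plain Python. It is not the thesis implementation: the function names, the integer-sample rounding of the TDOA values, the default speed of sound, and the naive O(N^2) DFT (standing in for an FFT library call) are all assumptions made for illustration.

```python
import cmath
import math
from itertools import combinations


def dft(x, inverse=False):
    """Naive O(N^2) DFT; an assumed stand-in for an FFT library call."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def build_tdoa_table(mics, candidates, fs, c=343.0):
    """Stage 1: lag in samples for every (candidate, microphone pair).

    Computed once, since it depends only on the array geometry and the
    candidate grid. c is the assumed speed of sound in m/s.
    """
    pairs = list(combinations(range(len(mics)), 2))
    table = []
    for q in candidates:
        dists = [math.dist(q, m) for m in mics]
        table.append([round((dists[i] - dists[j]) / c * fs) for i, j in pairs])
    return pairs, table


def gcc_phat(frames, pairs):
    """Stage 2: one PHAT-weighted cross-correlation per microphone pair."""
    spectra = [dft(frame) for frame in frames]
    corrs = []
    for i, j in pairs:
        cross = [a * b.conjugate() for a, b in zip(spectra[i], spectra[j])]
        # PHAT weighting keeps only the phase of the cross-spectrum.
        white = [v / abs(v) if abs(v) > 1e-12 else 0j for v in cross]
        corrs.append([v.real for v in dft(white, inverse=True)])
    return corrs


def srp_energy_map(corrs, table):
    """Stage 3: sum each pair's correlation at the candidate's lag."""
    n = len(corrs[0])
    return [sum(corrs[p][lag % n] for p, lag in enumerate(lags))
            for lags in table]


def locate(frames, pairs, table):
    """Stage 4: index of the candidate with the maximum SRP."""
    energy = srp_energy_map(gcc_phat(frames, pairs), table)
    return max(range(len(energy)), key=energy.__getitem__)
```

In this sketch, `build_tdoa_table` would be called once at start-up, while `locate` runs for every audio frame. The per-pair loop in `gcc_phat` and the per-candidate loop in `srp_energy_map` are the parts whose iterations are independent of each other, which is what makes them natural targets for GPU threads.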
      The experiments were carried out on a single-core CPU and on a GPU, with varying numbers of microphone channels and candidate coordinates. Execution times were measured on a 3.4-GHz CPU and a GPU with 288 CUDA cores. Compared with the execution time of a conventional single-core CPU-based SRP-PHAT, the proposed method showed 11- to 19-fold and 19- to 25-fold improvements.
      In this study, SRP-PHAT, optimized as a sequential implementation, was divided into four stages. Each stage was presented as a generalized parallel framework, so users can adapt the algorithm to suit their application. In particular, the cross-correlation stage exhibits data parallelism that varies with the number of microphones. Similarly, the SRP energy map calculation stage exhibits data parallelism that varies with the number of candidate coordinates. Finally, the stage that searches for the maximum-SRP coordinates shows an execution-time improvement in proportion to log2(NC), where NC is the number of candidate coordinates.
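The log2(NC) scaling of the final stage is consistent with a tree-style parallel reduction, although the abstract does not spell out the scheme, so the following is an assumption. The sketch simulates such a reduction sequentially in Python; the function name `argmax_tree_reduction` and the padding with negative infinity are hypothetical choices for illustration.

```python
def argmax_tree_reduction(energy):
    """Return (index_of_max, number_of_reduction_steps).

    Sequential simulation of a pairwise tree reduction. On a GPU, the
    comparisons inside one step are independent and could each run in
    their own thread, so the step count models the parallel depth.
    """
    if not energy:
        raise ValueError("energy map is empty")

    # Pad to the next power of two so every step halves the range cleanly.
    size = 1
    while size < len(energy):
        size *= 2
    vals = list(energy) + [float("-inf")] * (size - len(energy))
    idxs = list(range(len(energy))) + [-1] * (size - len(energy))

    steps = 0
    stride = size // 2
    while stride >= 1:
        # One reduction step: element t absorbs element t + stride.
        for t in range(stride):
            if vals[t + stride] > vals[t]:
                vals[t] = vals[t + stride]
                idxs[t] = idxs[t + stride]
        stride //= 2
        steps += 1
    return idxs[0], steps


best, steps = argmax_tree_reduction([0.1, 0.7, 0.3, 0.9, 0.2, 0.5, 0.4, 0.6])
print(best, steps)  # prints "3 3": 8 candidates need log2(8) = 3 steps
```

A sequential scan over the same eight values needs seven comparisons one after another, whereas the reduction needs three steps when each step's comparisons run concurrently.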

      Table of Contents

      • 1. Introduction
      • 2. SRP-PHAT
      • 3. SRP-PHAT for the Single Core CPU
      • 3-1. TDOA Table
      • 3-2. Cross correlation
      • 3-3. SRP Energy Map
      • 3-4. Maximum SRP Coordinates
      • 4. SRP-PHAT for the GPU
      • 4-1. TDOA Table
      • 4-2. Cross correlation
      • 4-3. SRP Energy Map
      • 4-4. Maximum SRP Coordinates
      • 5. Evaluation with Real Data
      • 6. Conclusion
      • 7. References