Parallel SRP-PHAT for GPUs

Korean Abstract

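The steered response power phase transform (SRP-PHAT) is a widely used algorithm for estimating the direction from which a sound originated. SRP-PHAT must examine a very large number of candidate sound source coordinates, so conventional SRP-PHAT could not always be used in real time. To overcome this problem, earlier research made SRP-PHAT run in parallel on a graphics processing unit (GPU). However, the existing GPU-based SRP-PHAT did not give sufficient consideration to on-chip memory usage and therefore did not exploit the full capability of the GPU. This dissertation proposes GPU-based parallel algorithms for SRP-PHAT in the frequency domain and in the time domain. The proposed methods optimize the memory access patterns of SRP-PHAT and use on-chip memory efficiently. As a result, the proposed methods run 1,276 times faster in the frequency domain and 80 times faster in the time domain than the central processing unit (CPU)-based algorithms. Their execution time also improves on the existing GPU-based methods by a factor of 1.5 in the frequency domain and 6 in the time domain. In frequency-domain SRP-PHAT, the SRP kernel accounts for 99.9% of the per-frame execution time. Because parallelizing the other kernels has almost no effect, the SRP kernel was the main target of parallelization. In time-domain SRP-PHAT, the cross-spectrum and SRP kernels account for 19% and 77% of the per-frame execution time, respectively, so these two kernels were the main targets of parallelization. The proposed algorithms are faster than Minotto's methods for the following reasons. In the frequency domain, the SRP kernel was accelerated 1.5 times because the data used in the computation were loaded into shared memory and registers before being processed. In the time domain, the cross-spectrum kernel was accelerated 17.7 times, for three reasons. First, data that must be read repeatedly were prefetched into registers. Second, minimizing the number of thread blocks per SM reduced unnecessary context switches of thread blocks. Third, all global memory accesses were coalesced. In the time domain, the SRP kernel was accelerated 7.3 times, for four reasons. First, the proposed kernel made aggressive use of shared memory and registers. Second, unnecessary context switches of thread blocks were reduced in each SM. Third, all global memory accesses were coalesced. Fourth, an optimal data structure was used to raise the operational intensity.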


Multilingual Abstract

The steered response power phase transform (SRP-PHAT) is one of the most widely used algorithms for sound source localization. Because it must examine a large number of candidate sound source locations, conventional SRP-PHAT approaches may not be usable in real time. To overcome this problem, an earlier effort parallelized SRP-PHAT on graphics processing units (GPUs). However, the full capacity of the GPU was not exploited, since on-chip memory usage was not addressed. In this dissertation, we propose GPU-based parallel algorithms for SRP-PHAT in both the frequency domain and the time domain. The proposed methods optimize the memory access patterns of SRP-PHAT and use the on-chip memory efficiently. As a result, the proposed methods achieve a speedup of 1,276 times in the frequency domain and 80 times in the time domain over CPU-based algorithms, and of 1.5 times in the frequency domain and 6 times in the time domain over conventional GPU-based methods. In frequency-domain SRP-PHAT, the SRP kernel accounts for 99.9% of the per-frame execution time. Since parallelizing the other kernels brings almost no benefit, the SRP kernel was the main target of parallelization. In time-domain SRP-PHAT, the cross-spectrum and SRP kernels account for 19% and 77% of the per-frame execution time, respectively. Therefore, the cross-spectrum and SRP kernels were the main targets of parallelization. The reasons for the acceleration are as follows. In the frequency domain, the SRP kernel was accelerated 1.5 times, because the data used in the operations are processed after being loaded into shared memory and registers. In the time domain, the cross-spectrum kernel was accelerated 17.7 times, for three reasons. First, the proposed kernel exploits registers. Second, unnecessary context switches of thread blocks are reduced in each SM. Third, all memory accesses are coalesced to multiples of the cache line size. In the time domain, the SRP kernel was also accelerated, by 7.3 times, for four reasons. First, the proposed kernel makes aggressive use of shared memory and registers. Second, unnecessary context switches of thread blocks are reduced in each SM. Third, all memory accesses are coalesced to multiples of the cache line size. Fourth, operational intensity was quadrupled to exploit the limited bandwidth.
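To make the candidate search described in the abstract concrete, here is a minimal CPU reference sketch of frequency-domain SRP-PHAT in pure Python. It is an illustration under stated assumptions, not the dissertation's GPU kernels: the function names `dft` and `srp_phat`, the use of integer sample delays, and the circularly delayed test signal are all hypothetical choices made for this example. Each candidate's power is computed independently of the others, which is the property that makes the search a natural fit for parallel execution.

```python
import cmath
import random
from itertools import combinations


def dft(x):
    """Naive O(N^2) DFT, adequate for one small illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def srp_phat(frames, candidates):
    """Return the SRP-PHAT power of every candidate.

    frames: one equal-length list of samples per microphone.
    candidates: tuples of hypothesized integer delays (in samples),
                one delay per microphone (an assumption of this sketch).
    """
    n = len(frames[0])
    spectra = [dft(x) for x in frames]
    pairs = list(combinations(range(len(frames)), 2))

    # PHAT weighting: keep only the phase of each pairwise cross spectrum.
    weighted = {}
    for k, l in pairs:
        cross = [a * b.conjugate() for a, b in zip(spectra[k], spectra[l])]
        weighted[(k, l)] = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]

    powers = []
    for delays in candidates:          # independent per candidate
        power = 0.0
        for k, l in pairs:
            tdoa = delays[k] - delays[l]
            for f, w in enumerate(weighted[(k, l)]):
                # Steer the pair to the hypothesized time difference of arrival.
                power += (w * cmath.exp(2j * cmath.pi * f * tdoa / n)).real
        powers.append(power)
    return powers


# Synthetic frame: one source, three microphones, circular integer delays.
rng = random.Random(0)
n = 32
source = [rng.uniform(-1.0, 1.0) for _ in range(n)]
true_delays = (0, 3, 5)
frames = [[source[(t - d) % n] for t in range(n)] for d in true_delays]

candidates = [(0, 0, 0), (0, 3, 5), (0, 1, 2), (0, 5, 3)]
powers = srp_phat(frames, candidates)
best = max(range(len(candidates)), key=powers.__getitem__)
print("best candidate:", candidates[best])
```

In this toy setting the candidate whose delays match the true ones collects the steered power of every microphone pair, while mismatched candidates largely cancel. A real system would derive the candidate delays from microphone geometry and a grid of source positions, which is where the very large candidate count comes from.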


Table of Contents

ABSTRACT
1. INTRODUCTION
2. DIRECTION OF ARRIVAL ESTIMATION
2.1 Acoustic Model
2.1.1 Assumptions about Environment
2.1.2 Single Sound Source Reverberant Environment Acoustic Model
2.2 Direction of Arrival Estimation
2.2.1 Time of Arrival and Time Difference of Arrival
2.2.2 Principles of Direction of Arrival Estimation
2.2.3 Near-field and Far-field
2.2.4 Robust Time Delay Estimation in Reverberant Environments
2.3 Generalized Cross Correlation Function
2.3.1 GCC-PHAT
2.3.2 Size Compression of a Discrete GCC-PHAT
2.3.3 Credibility of the Computed Result
2.4 SRP-PHAT
2.4.1 Time domain SRP-PHAT
2.4.2 Frequency Domain SRP-PHAT
2.4.3 A Practical Meaning of SRP-PHAT
3. GPU-BASED PARALLEL SRP-PHAT
3.1 GPU Architecture and CUDA
3.2 Conventional GPU-based Parallel SRP-PHAT
3.2.1 Minotto’s Methods
3.2.2 Pros and Cons
3.3 Parallelism in the SRP-PHAT
3.3.1 Parallelism in the SRP-PHAT
3.4 Proposed Parallel SRP-PHAT for GPUs
3.4.1 GPU-based parallel SRP-PHAT in the frequency domain
3.4.2 Example
3.4.3 GPU-based Parallel SRP-PHAT in the time domain
3.4.4 Example
4. EXPERIMENT
5. CONCLUSION
BIBLIOGRAPHY
