Sound source localization is a major function of many auditory systems. In particular, knowing a speaker's location not only enables natural bi-directional user interaction but also helps enhance speech quality in a noisy environment. Steered response power-phase transform (SRP-PHAT) is widely used for robust sound source localization. However, SRP-PHAT does not always perform satisfactorily when locating a speaker in a noisy environment. When a voice and a noise source are active simultaneously, SRP-PHAT may fail to find the voice because an SRP-based method selects the location with the highest output power, regardless of whether the sound is a voice or noise. To handle this problem, we propose a new speaker localization approach that uses the voice power at the spatial point on which a beamformer is focused. The proposed method captures the meaningful frequency components of a human voice even when the input signal is severely corrupted. Using the captured voice frequency components, it calculates the voice power of the beamformed signal focused on each candidate location or direction. The point with the highest voice power is selected as the location of the speaker. We compared the proposed method with SRP-PHAT in terms of speaker localization accuracy. The proposed method was significantly more accurate than conventional SRP-PHAT in various noisy environments; at a signal-to-noise ratio (SNR) of 0 dB, its speaker localization performance improved by 27.4% relative to SRP-PHAT.
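
The contrast between selecting the direction of highest total output power and selecting the direction of highest voice power can be illustrated with a small synthetic sketch. It is not the implementation evaluated here. It assumes a far-field linear array and a plain delay-and-sum beamformer without PHAT weighting, and it uses an oracle mask of known voice harmonic bins in place of the voice component capture step, whose details are not given in this abstract. All function names, the array geometry, and the test signal are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def steering_delays(mic_x, angles_deg):
    """Far-field arrival advances (s) for a linear array on the x axis.
    Returns an array of shape (n_angles, n_mics)."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.outer(np.cos(angles), mic_x) / C

def simulate(fs, n, mic_x, angle_deg, freqs_hz, amp):
    """Sum of sinusoids arriving as a plane wave from angle_deg."""
    t = np.arange(n) / fs
    d = steering_delays(mic_x, [angle_deg])[0]
    out = np.zeros((len(mic_x), n))
    for f in freqs_hz:
        out += amp * np.sin(2 * np.pi * f * (t[None, :] + d[:, None]))
    return out

def harmonic_mask(n, fs, freqs_hz):
    """Oracle mask (an assumption): rfft bins that carry voice harmonics."""
    bins = np.round(np.asarray(freqs_hz) * n / fs).astype(int)
    mask = np.zeros(n // 2 + 1, dtype=bool)
    mask[bins] = True
    return mask

def steered_power(frames, fs, mic_x, angles_deg, bin_mask=None):
    """Delay-and-sum output power per candidate angle.
    With bin_mask, only the selected frequency bins contribute."""
    n = frames.shape[1]
    spec = np.fft.rfft(frames, axis=1)            # (n_mics, n_bins)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # (n_bins,)
    if bin_mask is None:
        bin_mask = np.ones(freqs.shape, dtype=bool)
    delays = steering_delays(mic_x, angles_deg)   # (n_angles, n_mics)
    # Undo the arrival phase each microphone would see from each candidate.
    comp = np.exp(-2j * np.pi * freqs[None, None, :] * delays[:, :, None])
    beam = (comp * spec[None, :, :]).sum(axis=1)  # (n_angles, n_bins)
    return (np.abs(beam[:, bin_mask]) ** 2).sum(axis=1)

# Synthetic scene: a weak harmonic "voice" at 60 degrees and a stronger
# two-tone noise source at 120 degrees. Frequencies fall on exact FFT bins.
fs, n = 16000, 1024
mic_x = np.arange(4) * 0.05                       # 4 mics, 5 cm spacing
angles = np.arange(0, 181, 5)                     # candidate directions
voice_f = [250, 500, 750, 1000]
noise_f = [2000, 3000]
frames = (simulate(fs, n, mic_x, 60, voice_f, 1.0)
          + simulate(fs, n, mic_x, 120, noise_f, 3.0))

full_power = steered_power(frames, fs, mic_x, angles)
voice_power = steered_power(frames, fs, mic_x, angles,
                            harmonic_mask(n, fs, voice_f))

srp_angle = int(angles[np.argmax(full_power)])    # highest total power
voice_angle = int(angles[np.argmax(voice_power)]) # highest voice power
print("full-band estimate:", srp_angle)
print("voice-power estimate:", voice_angle)
```

In this toy scene the full-band estimate follows the louder noise source, whereas restricting the power to the voice bins recovers the direction of the voice. A practical system would have to estimate the voice bins from the corrupted input instead of assuming them.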