Music Similarity Search Based on Music Emotion Classification
Kim, Hyoung-Gook; Kim, Jang-Heon The Acoustical Society of Korea 2007 韓國音響學會誌 Vol.26 No.e3
This paper presents an efficient algorithm for retrieving similar music files from a large archive of digital music. Users can navigate the archive and discover new music files that sound similar to a given query music file. Since most methods for finding similar music files in a large database require computing the distance between the query music file and every music file in the database, they are very time-consuming. By measuring the acoustic distance only between pre-classified music files with the same type of emotion, the proposed method significantly speeds up the search process and increases precision in comparison with the brute-force method.
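The speed-up described above can be sketched as follows; the toy feature vectors, emotion labels, and Euclidean distance are illustrative assumptions, not the paper's actual features or metric:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_brute_force(query_vec, database):
    # baseline: compare the query against every track in the database
    return sorted(database, key=lambda t: euclidean(query_vec, t[2]))

def search_emotion_filtered(query_vec, query_emotion, database):
    # restrict distance computations to tracks pre-classified with
    # the same emotion as the query, shrinking the candidate set
    candidates = [t for t in database if t[1] == query_emotion]
    return sorted(candidates, key=lambda t: euclidean(query_vec, t[2]))

# database entries: (track_id, emotion label, feature vector)
db = [
    ("a", "sad",   [0.1, 0.9]),
    ("b", "happy", [0.8, 0.2]),
    ("c", "happy", [0.6, 0.4]),
]
hits = search_emotion_filtered([0.75, 0.25], "happy", db)
print([h[0] for h in hits])  # nearest same-emotion tracks first: ['b', 'c']
```

The emotion filter explains both claimed gains: fewer distance computations per query, and a candidate set from which emotionally mismatched but acoustically close tracks have already been excluded.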
Enhanced Timing Recovery Using Active Jitter Estimation for Voice-Over IP Networks
(Hyoung-gook Kim) 한국인터넷정보학회 2012 KSII Transactions on Internet and Information Systems Vol.6 No.4
Improving the quality of service in IP networks is a major challenge for real-time voice communications. In particular, packet arrival-delay variation, so-called “jitter,” is one of the main factors that degrade the quality of voice in mobile devices with the voice-over Internet protocol (VoIP). To resolve this issue, a receiver-based enhanced timing recovery algorithm combined with active jitter estimation is proposed. The proposed algorithm copes with the effect of transmission jitter by expanding or compressing each packet according to the predicted network delay and variations. Additionally, the active network jitter estimation incorporates rapid detection of delay spikes and reacts to changes in network conditions. Extensive simulations have shown that the proposed algorithm delivers high voice quality by pursuing an optimal trade-off between average buffering delay and packet loss rate.
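The abstract does not give the estimator's equations; the sketch below shows a common receiver-side scheme of the same flavor, maintaining exponentially weighted estimates of mean delay and delay variation with a simple spike test. All constants and the class itself are illustrative assumptions, not the paper's algorithm:

```python
class JitterEstimator:
    """Receiver-side network delay/jitter tracker (illustrative sketch).

    Tracks exponentially weighted estimates of mean packet delay and
    delay variation, and flags a delay spike when a packet's delay
    jumps far above the running estimate.
    """

    def __init__(self, alpha=0.998, beta=4.0, spike_factor=2.0):
        self.alpha = alpha            # smoothing factor for normal operation
        self.beta = beta              # jitter safety-margin multiplier
        self.spike_factor = spike_factor
        self.mean_delay = None
        self.variation = 0.0

    def update(self, delay_ms):
        # feed one packet's measured network delay; returns True on a spike
        if self.mean_delay is None:
            self.mean_delay = delay_ms
            return False
        spike = delay_ms > self.spike_factor * self.mean_delay + 50
        a = 0.5 if spike else self.alpha  # adapt faster during a spike
        self.variation = a * self.variation + (1 - a) * abs(delay_ms - self.mean_delay)
        self.mean_delay = a * self.mean_delay + (1 - a) * delay_ms
        return spike

    def playout_delay(self):
        # target buffering delay: mean delay plus a jitter margin
        return self.mean_delay + self.beta * self.variation

est = JitterEstimator()
for d in [100, 102, 99, 101, 100]:
    est.update(d)
print(est.playout_delay())  # mean delay plus a small jitter margin
```

The `playout_delay()` value is what would drive the per-packet expansion or compression decision: packets are stretched when the buffer risks underflow and compressed when buffered delay exceeds the target.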
Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips
Kim, Hyoung-Gook The Acoustical Society of Korea 2006 韓國音響學會誌 Vol.25 No.e3
For audio indexing and targeted search of specific audio or corresponding visual content, the MPEG-7 standard has adopted a sound classification framework in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for the classification of various sounds. MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimension reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA), and the Discrete Cosine Transform (DCT). In this paper, we compare the performance of these dimension-reduction methods with Gaussian mixture models (GMMs) and HMMs in classifying video sound clips.
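As an illustration of the dimension-reduction step, a minimal PCA projection of frame-wise spectrum features might look like the following; the 24-bin features and component count are made-up values, and this is not the MPEG-7 ASP pipeline itself:

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal axes.

    features: (n_frames, n_bins) matrix of spectrum features
    returns the dimension-reduced (n_frames, n_components) projection
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # principal directions via SVD of the centered data matrix
    # (singular values come out sorted, so rows of vt are ordered
    # by explained variance)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]          # top principal directions
    return centered @ basis.T

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 24))    # e.g. 24 log-spectral bins per frame
reduced = pca_reduce(frames, 7)
print(reduced.shape)  # (200, 7)
```

In the ASP setting, the rows of `reduced` would then be the observation vectors fed to the GMM or HMM classifiers being compared.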
Automatic Emotion Classification of Music Signals Using MDCT-Driven Timbre and Tempo Features
Kim, Hyoung-Gook; Eom, Ki-Wan The Acoustical Society of Korea 2006 韓國音響學會誌 Vol.25 No.e2
This paper proposes an effective method for classifying the emotion of music from its acoustic signal. Two feature sets, timbre and tempo, are extracted directly from the modified discrete cosine transform (MDCT) coefficients, which are the output of a partial MP3 (MPEG-1 Layer 3) decoder. Our tempo feature extraction method is based on long-term modulation spectrum analysis. To effectively combine these two feature sets, which have different time resolutions, in an integrated system, a two-layer classifier based on the AdaBoost algorithm is used. The first layer employs the MDCT-driven timbre features. Adding the MDCT-driven tempo features in the second layer improves the classification precision dramatically.
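The long-term modulation spectrum idea behind the tempo feature can be sketched as follows, assuming a per-frame energy envelope is already available as a stand-in for the MDCT sub-band energies; the frame rate and beat-rate band limits are illustrative assumptions:

```python
import numpy as np

def tempo_from_envelope(envelope, frame_rate):
    """Estimate tempo (BPM) from a per-frame energy envelope.

    envelope: per-frame energy values (e.g. summed sub-band magnitudes)
    frame_rate: envelope frames per second
    The modulation spectrum is the FFT magnitude of the mean-removed
    envelope; its strongest peak in 0.5-4 Hz maps to 30-240 BPM.
    """
    env = np.asarray(envelope, dtype=float)
    env = env - env.mean()
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / frame_rate)
    band = (freqs >= 0.5) & (freqs <= 4.0)   # plausible beat rates
    peak = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak                        # Hz -> beats per minute

# synthetic envelope pulsing at 2 Hz (= 120 BPM), 20 frames/s for 30 s
t = np.arange(0, 30, 1 / 20)
env = 1.0 + np.cos(2 * np.pi * 2.0 * t)
print(round(tempo_from_envelope(env, 20)))  # 120
```

Because the modulation spectrum needs a long envelope to resolve beat rates, such a tempo feature has a much coarser time resolution than frame-wise timbre features, which is why a layered combination is needed.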
Robust Music Identification Using Long-Term Dynamic Modulation Spectrum
Kim, Hyoung-Gook; Eom, Ki-Wan The Acoustical Society of Korea 2006 韓國音響學會誌 Vol.25 No.e2
In this paper, we propose a robust audio fingerprinting system for automatic music retrieval. The fingerprint feature is extracted from a long-term dynamic modulation spectrum (LDMS) estimate in the perceptual compressed domain. The major advantage of this feature is its significant robustness against severe background noise from streets and cars. Furthermore, fast searching is performed by looking up a hash table of 32-bit hash values, whose bits are quantized from the logarithmic-scale modulation frequency coefficients. Experiments illustrate that the LDMS fingerprint offers high scalability, robustness, and a small fingerprint size. Moreover, its performance under severe recording-noise conditions improves remarkably compared with other power-spectrum-based robust fingerprints.
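A minimal sketch of the 32-bit hash-table lookup might look like this; the median-threshold quantizer is a simplification standing in for the paper's quantization of logarithmic-scale modulation frequency coefficients:

```python
def quantize_hash(coeffs):
    """Quantize 32 modulation-frequency coefficients to a 32-bit hash.

    Each bit is 1 when the coefficient exceeds the frame median -- a
    simple sign-style quantizer used here only for illustration.
    """
    assert len(coeffs) == 32
    median = sorted(coeffs)[len(coeffs) // 2]
    h = 0
    for c in coeffs:
        h = (h << 1) | (1 if c > median else 0)
    return h

def build_hash_table(tracks):
    # tracks: list of (track_id, list of coefficient frames);
    # index every frame's hash for constant-time candidate retrieval
    table = {}
    for track_id, frames in tracks:
        for pos, coeffs in enumerate(frames):
            table.setdefault(quantize_hash(coeffs), []).append((track_id, pos))
    return table

def lookup(table, coeffs):
    # exact 32-bit hash match; returns (track_id, frame position) pairs
    return table.get(quantize_hash(coeffs), [])
```

Because each fingerprint frame collapses to a single 32-bit key, the database stays small and a query costs one dictionary lookup per frame instead of a scan over all stored fingerprints.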
Retrieval of Broadcast News Using Audio Content Analysis
Kim, Hyoung-Gook The Acoustical Society of Korea 2007 韓國音響學會誌 Vol.26 No.e3
In this paper, we report our recent work on an indexing and retrieval system for broadcast news based on audio content analysis. Key issues addressed in this work are the two major parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, developed in the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of British broadcast news videos. We discuss the development of the retrieval system and the evaluation of each part as well as the system as a whole.
No-reference quality assessment of dynamic sports videos based on a spatiotemporal motion model
Kim, Hyoung-Gook; Shin, Seung-Su; Kim, Sang-Wook; Lee, Gi Yong Electronics and Telecommunications Research Institute 2021 ETRI Journal Vol.43 No.3
This paper proposes an approach to improve the performance of no-reference video quality assessment for sports videos with dynamic motion scenes using an efficient spatiotemporal model. In the proposed method, we divide the video sequences into video blocks and apply a 3D shearlet transform that can efficiently extract primary spatiotemporal features to capture dynamic natural motion scene statistics from the incoming video blocks. The concatenation of a deep residual bidirectional gated recurrent neural network and logistic regression is used to learn the spatiotemporal correlation more robustly and predict the perceptual quality score. In addition, conditional video block-wise constraints are incorporated into the objective function to improve quality estimation performance for the entire video. The experimental results show that the proposed method extracts spatiotemporal motion information more effectively and predicts the video quality with higher accuracy than the conventional no-reference video quality assessment methods.
Enhancing VoIP speech quality using combined playout control and signal reconstruction
Hyoung-Gook Kim; Jin-Ho Lee IEEE 2012 IEEE Transactions on Consumer Electronics Vol.58 No.2
The quality of real-time Voice over Internet Protocol (VoIP) networks is affected by network impairments such as delay, jitter, and packet loss. To solve this issue, this paper proposes a new receiver-based method for enhancing VoIP speech quality. Our approach is based on a combined playout control and signal reconstruction technique consisting of a set of algorithms that conceal packet loss, reduce buffering delay, detect spike delay, and alleviate packet delay jitter. The proposed fully receiver-based enhancing algorithm is computationally efficient, delivers high-quality voice service, and is suitable for use in any practical mobile VoIP system.
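The two receiver-side operations above can be sketched as follows; real systems use WSOLA-style overlap-add time-scaling and model-based concealment, so the linear resampling and attenuated repetition here are deliberate simplifications:

```python
def conceal_loss(prev_frame, attenuation=0.5):
    # conceal a lost packet by repeating the previous frame, attenuated
    # to avoid audible buzzing on consecutive losses
    return [s * attenuation for s in prev_frame]

def scale_frame(frame, factor):
    """Naively time-scale a speech frame by linear-interpolation resampling.

    factor > 1 expands the frame (buys buffering headroom when delay
    rises); factor < 1 compresses it (drains the jitter buffer when
    delay falls). A WSOLA overlap-add would preserve pitch; this
    resampling sketch does not.
    """
    n_out = max(1, int(round(len(frame) * factor)))
    out = []
    for i in range(n_out):
        pos = i * (len(frame) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(frame) - 1)
        frac = pos - lo
        out.append(frame[lo] * (1 - frac) + frame[hi] * frac)
    return out
```

Driven by a delay estimate per packet, expansion and compression trade a little signal distortion for a lower average buffering delay and loss rate, which is the trade-off the paper's evaluation targets.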