Audio-Based Video Editing with Two-Channel Microphone
Tetsuya Takiguchi, Jun Adachi, Yasuo Ariki. Security Engineering Research Support Center, 2008. International Journal of Hybrid Information Technology, Vol. 1, No. 3
Audio is a key index for digital video that can provide useful information for video editing, such as capturing conversations only, clipping only talking people, and so on. In this paper, we study audio-based video editing with a two-channel (stereo) microphone, which is standard equipment on video cameras, in a setting where the video content is recorded automatically without a cameraman. In order to capture only a talking person on video, a novel voice/non-voice detection algorithm using AdaBoost, which can achieve extremely high detection rates in noisy environments, is used. In addition, the sound source direction is estimated by the CSP (Crosspower-Spectrum Phase) method in order to zoom in on the talking person by clipping frames from the video; the two-channel (stereo) microphone provides information about the time differences between the microphones.
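The CSP method the abstract names is closely related to generalized cross-correlation with the phase transform (GCC-PHAT): the cross-power spectrum of the two channels is normalized to unit magnitude so that only phase (i.e., time-difference) information remains. A minimal sketch with NumPy, on synthetic signals rather than the paper's recordings:

```python
import numpy as np

def csp_delay(x, y):
    """Estimate the delay of x relative to y (in samples) with the
    CSP (Crosspower-Spectrum Phase) method: whiten the cross-power
    spectrum so only phase information remains, then pick the peak
    of its inverse transform."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12   # phase transform: magnitude -> 1
    corr = np.fft.irfft(cross, n=n)
    max_shift = len(x)
    # re-center so that index max_shift corresponds to zero delay
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift]))
    return int(np.argmax(corr)) - max_shift

# Synthetic stereo pair: the right channel lags the left by 5 samples
rng = np.random.default_rng(0)
base = rng.standard_normal(300)
left, right = base[5:261], base[0:256]
print(csp_delay(right, left))  # -> 5
```

The estimated delay, together with the microphone spacing and the speed of sound, gives the direction of arrival; this sketch stops at the delay itself.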
Human-Robot Interface Using System Request Utterance Detection Based on Acoustic Features
Tetsuya Takiguchi, Tomoyuki Yamagata, Atsushi Sako, Nobuyuki Miyake, Jerome Revaud, Yasuo Ariki. Security Engineering Research Support Center, 2008. International Journal of Hybrid Information Technology, Vol. 1, No. 3
For a mobile robot to serve people in actual environments, such as a living room or a party room, it must be easy to control, because some users might not even be capable of operating a computer keyboard. For non-expert users, speech recognition is one of the most effective communication tools for a hands-free (human-robot) interface. This paper describes a new mobile robot with hands-free speech recognition. For a hands-free speech interface, it is important to detect commands for the robot in spontaneous utterances. Our system can determine whether or not a user's utterances are commands for the robot, discriminating commands from human-human conversations by means of acoustic features. The robot can then move according to the user's voice (command). In order to capture the user's voice only, a robust voice detection system using AdaBoost is also described.
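As a rough illustration of the AdaBoost-based voice detection idea (not the authors' actual features or implementation), one can train an off-the-shelf AdaBoost classifier on frame-level acoustic features; here both the 10-dimensional features and the class separation are synthetic stand-ins:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)

# Synthetic frame-level features: the class means differ, standing in
# for real acoustic features (e.g., per-band log energies).
voice = rng.normal(loc=1.0, scale=0.5, size=(200, 10))
nonvoice = rng.normal(loc=0.0, scale=0.5, size=(200, 10))
X = np.vstack([voice, nonvoice])
y = np.array([1] * 200 + [0] * 200)

# AdaBoost combines many weak learners (decision stumps by default)
clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(clf.score(X, y))
```

In a real detector the classifier would run on a sliding window of frames, with the detected voice segments passed on to speech recognition.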
Extraction of Human Activities as Action Sequences using pLSA and Prefix Span
Takuya TONARU, Tetsuya TAKIGUCHI, Yasuo ARIKI. Security Engineering Research Support Center, 2009. International Journal of Hybrid Information Technology, Vol. 2, No. 1
In this paper, we propose a framework for recognizing human activities in daily life. Since a human activity is represented as a sequence of actions, actions are first recognized from videos, and frequently occurring human activities are then extracted from the resulting action sequences. We show experimental results on data recorded in a deskwork environment to demonstrate the performance of the proposed framework: an average recall rate of 86.0% and an average precision rate of 78.3% were obtained in extracting human activities.
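The extraction step relies on sequential pattern mining; a minimal PrefixSpan sketch (with hypothetical action labels, not the paper's recognizer output) illustrates how frequent action subsequences can be mined:

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan: recursively grow frequent prefixes and mine
    each prefix's projected database of suffixes."""
    results = []

    def mine(prefix, projected):
        # count items that can extend the current prefix
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, sup))
            # project: keep the suffix after the first occurrence of item
            new_proj = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            mine(new_prefix, new_proj)

    mine([], sequences)
    return results

# Action sequences from a deskwork scene (hypothetical labels)
seqs = [["sit", "type", "drink"], ["sit", "type"], ["sit", "drink", "type"]]
for pattern, support in prefixspan(seqs, min_support=2):
    print(pattern, support)
```

Mined patterns such as ["sit", "type"] would correspond to recurring activities built from the recognized actions.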
Speaker Independent Phoneme Recognition Based on Fisher Weight Map
Takashi Muroi, Tetsuya Takiguchi, Yasuo Ariki. Security Engineering Research Support Center, 2008. International Journal of Hybrid Information Technology, Vol. 1, No. 3
We have already proposed a new feature extraction method based on higher-order local auto-correlation and the Fisher weight map (FWM) at Interspeech 2006. This paper shows the effectiveness of the proposed FWM in speaker-dependent and speaker-independent phoneme recognition. The widely used MFCC features lack temporal dynamics. To solve this problem, local auto-correlation features are computed and accumulated with high weights on the discriminative areas. This score map is called the Fisher weight map. In speaker-dependent phoneme recognition, the proposed FWM achieved a 79.5% recognition rate, 5.0 points higher than the result with MFCC. Furthermore, by combining FWM with MFCC and ΔMFCC, the recognition rate improved to 88.3%. In speaker-independent phoneme recognition, it achieved an 84.2% recognition rate, 11.0 points higher than the result with MFCC. By combining FWM with MFCC and ΔMFCC, the recognition rate improved to 89.0%.
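The actual Fisher weight map is obtained by solving a generalized eigenvalue problem over local auto-correlation features; as a simplified stand-in, the per-dimension Fisher discriminant ratio below shows the underlying idea of giving high weights to discriminative areas (on synthetic data, not speech):

```python
import numpy as np

def fisher_weight_map(features, labels):
    """Per-dimension Fisher score: between-class variance over
    within-class variance. A simplified stand-in for the paper's
    Fisher weight map, which solves a generalized eigenproblem."""
    classes = np.unique(labels)
    mean_all = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        fc = features[labels == c]
        between += len(fc) * (fc.mean(axis=0) - mean_all) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

rng = np.random.default_rng(4)
# dimension 0 separates the classes, dimension 1 does not
a = np.c_[rng.normal(0, 1, 100), rng.normal(0, 1, 100)]
b = np.c_[rng.normal(5, 1, 100), rng.normal(0, 1, 100)]
X = np.vstack([a, b])
y = np.r_[np.zeros(100), np.ones(100)]
w = fisher_weight_map(X, y)
print(w[0] > w[1])  # -> True: the discriminative dimension scores higher
```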
Yusuke Mizuno, Tetsuya Takiguchi, Yasuo Ariki. Korea Multimedia Society, 2009. Korea Multimedia Society International Conference, Vol. 2009
In this paper, we propose a method to estimate ground surface displacement accurately from microwave radar images captured before and after an earthquake, and we also examine the applicability of the method. Phase-only correlation is used for highly accurate sub-pixel image matching of small local regions between the two input images. The proposed method is evaluated through experiments using real satellite images.
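Phase-only correlation itself can be sketched compactly: normalize the 2D cross-power spectrum to unit magnitude and locate the correlation peak. The sketch below recovers only an integer-pixel shift on synthetic images; the paper's sub-pixel accuracy would come from fitting the analytic peak model around the maximum:

```python
import numpy as np

def poc_shift(f, g):
    """Integer-pixel shift of image f relative to g, estimated by
    phase-only correlation: whiten the cross-power spectrum, then
    find the peak of its inverse 2D transform."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    R = F * np.conj(G)
    R /= np.abs(R) + 1e-12            # keep phase only
    r = np.fft.ifft2(R).real
    idx = np.unravel_index(np.argmax(r), r.shape)
    shifts = np.array(idx)
    # peaks in the upper half of each axis correspond to negative shifts
    for k, n in enumerate(r.shape):
        if shifts[k] > n // 2:
            shifts[k] -= n
    return tuple(int(s) for s in shifts)

rng = np.random.default_rng(3)
img = rng.standard_normal((64, 64))
moved = np.roll(img, shift=(3, -2), axis=(0, 1))
print(poc_shift(moved, img))  # -> (3, -2)
```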
Generic Object Recognition using CRF by Incorporating BoF as Global Features
Takeshi Okumura, Tetsuya Takiguchi, Yasuo Ariki. Korea Multimedia Society, 2009. Korea Multimedia Society International Conference, Vol. 2009
Generic object recognition by computer has been strongly required in recent years in various fields such as robot vision and image retrieval. Conventional methods use a Conditional Random Field (CRF) that recognizes the class of each region using features extracted from local regions and the class co-occurrence between adjoining regions. However, the CRF tends to fall into a locally optimal recognition result because it uses only local features and local relations. To solve this problem, we propose a method that recognizes generic objects by incorporating Bag of Features (BoF) as a global feature into the CRF. In an experiment on an image dataset of 21 classes, the proposed method improved the recognition rate by 6.5%.
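The global BoF feature itself is straightforward to sketch: local descriptors are quantized against a learned codebook and pooled into a normalized histogram. Here the descriptors are synthetic random vectors standing in for real local features such as SIFT, with k-means as the codebook learner:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual codebook and return
    a normalized bag-of-features histogram (the kind of global feature
    the paper feeds into the CRF)."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Synthetic 8-dimensional "descriptors" stand in for real local features
train = rng.standard_normal((500, 8))
codebook = KMeans(n_clusters=16, n_init=3, random_state=0).fit(train)
h = bof_histogram(rng.standard_normal((60, 8)), codebook)
print(h.shape, round(h.sum(), 6))
```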
Situation Recognition Using 3D Positional Information of Ball from Monocular Soccer Image Sequence
Takuro Nishino, Yasuo Ariki, Tetsuya Takiguchi. Korea Multimedia Society, 2009. Korea Multimedia Society International Conference, Vol. 2009
In this paper, we propose a system that tracks the ball stably and accurately and detects the events of a game by using 3D positional information, for automatic soccer video production. We use a 3D particle filter with a nine-dimensional state vector for ball tracking. Since ball tracking by a particle filter is a local search, it is difficult to resume tracking once it fails. Thus, we solve this problem by switching from the local search to a 3D global search and by interpolating the lost coordinates. As a result, the tracking accuracy was improved by about 19.8%, and events such as goals and goal kicks were detected with high accuracy.
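A bootstrap (SIR) particle filter over a constant-velocity state illustrates the local-search nature of the tracker; this sketch uses a 6-dimensional state (3D position and velocity) on synthetic data, whereas the paper's filter uses nine dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)

def track(observations, n_particles=500, dt=1.0, proc_std=0.5, obs_std=1.0):
    """Bootstrap particle filter: predict with a constant-velocity model,
    weight particles by the likelihood of the observed 3D position,
    then resample."""
    state = np.zeros((n_particles, 6))               # [x, y, z, vx, vy, vz]
    state[:, :3] = observations[0] + rng.standard_normal((n_particles, 3))
    state[:, 3:] = 2.0 * rng.standard_normal((n_particles, 3))
    estimates = []
    for z in observations:
        state[:, :3] += dt * state[:, 3:]            # predict positions
        state += proc_std * rng.standard_normal(state.shape)
        d2 = np.sum((state[:, :3] - z) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / obs_std**2) + 1e-300  # avoid all-zero weights
        w /= w.sum()
        estimates.append(w @ state[:, :3])           # weighted-mean estimate
        state = state[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)

# Ball flying in a straight line, observed with noise
truth = np.array([[t, 2.0 * t, 0.0] for t in range(20)], dtype=float)
obs = truth + 0.5 * rng.standard_normal(truth.shape)
est = track(obs)
print(np.abs(est - truth).mean())
```

Because each particle only perturbs the previous state, the search is inherently local, which is why the paper falls back to a global search when tracking is lost.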