RISS Academic Research Information Service

      • SCOPUS, KCI-indexed

        Language-Independent Word Acquisition Method Using a State-Transition Model

        Xu, Bin; Yamagishi, Naohide; Suzuki, Makoto; Goto, Masayuki. Korean Institute of Industrial Engineers, 2016. Industrial Engineering & Management Systems, Vol.15 No.3

        The use of new words, numerous spoken languages, and abbreviations on the Internet is extensive. As such, automatically acquiring words for the purpose of analyzing Internet content is very difficult. In a previous study, we proposed a method for Japanese word segmentation using character N-grams. The previously proposed method is based on a simple state-transition model that is established under the assumption that the input document is described based on four states (denoted as A, B, C, and D) specified beforehand: state A represents words (nouns, verbs, etc.); state B represents statement separators (punctuation marks, conjunctions, etc.); state C represents postpositions (namely, words that follow nouns); and state D represents prepositions (namely, words that precede nouns). According to this state-transition model, based on the states applied to each pseudo-word, we search the document from beginning to end for an accessible pattern. In other words, the process of this transition detects some words during the search. In the present paper, we perform experiments based on the proposed word acquisition algorithm using Japanese and Chinese newspaper articles. These articles were obtained from Japan's Kyoto University and the Chinese People's Daily. The proposed method does not depend on the language structure. If text documents are expressed in Unicode, the proposed method can, using the same algorithm, obtain words in Japanese and Chinese, which do not contain spaces between words. Hence, we demonstrate that the proposed method is language independent.
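
        The four-state scan described in this abstract can be illustrated with a small sketch. The abstract does not give the transition table or the way states are assigned to pseudo-words, so the `ALLOWED` pairs, the `acquire_words` function, and the pre-labelled `sample` input below are all hypothetical and serve only to show the general idea of collecting state-A items along accessible transitions.

```python
# Hypothetical transition table: which state may follow which.
# These pairs are illustrative assumptions, not taken from the paper.
ALLOWED = {
    ("B", "A"), ("B", "D"),              # after a separator: a word or a preposition
    ("D", "A"),                          # a preposition precedes a word
    ("A", "A"), ("A", "B"), ("A", "C"),  # after a word: word, separator, postposition
    ("C", "A"), ("C", "B"), ("C", "D"),  # after a postposition
}


def acquire_words(labelled, start_state="B"):
    """Scan (pseudo_word, state) pairs left to right and collect the
    state-A items that are reached through an allowed transition."""
    state = start_state
    words = []
    for text, next_state in labelled:
        if (state, next_state) not in ALLOWED:
            continue  # inaccessible transition: skip this pseudo-word
        if next_state == "A":
            words.append(text)
        state = next_state
    return words


# Toy input whose state labels are assigned by hand for illustration.
sample = [("私", "A"), ("は", "C"), ("本", "A"), ("を", "C"), ("読む", "A"), ("。", "B")]
print(acquire_words(sample))  # ['私', '本', '読む']
```

        Because the scan only looks at state labels, the same function works for any Unicode text once pseudo-words have been labelled, which is the sense in which such a model can be language independent.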

      • KCI-indexed

        Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

        Thaileang Sung, Insoo Hwang. 한국데이타베이스학회 (Korea Data Strategy Society), 2016. Journal of information technology applications & m Vol.23 No.2

        In this paper, we propose a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary being used is not fully reliable and cannot cover all the words of the Khmer language, which causes the problem of unknown, or out-of-vocabulary, words. Our approach is to extend the original dictionary with new words to make it more reliable. In addition, we use ternary decomposition for the segmentation process. In this research, we also introduce the invisible space of Khmer Unicode (char\u200B) in order to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and the invisible space, we can extract new words from our training text and then add them to the dictionary. We used the extended wordlist and a segmentation algorithm that does not rely on the invisible space to test an unannotated text. Our results remarkably outperformed other approaches: we achieved precision, recall, and F-measure rates of 88.8%, 91.8%, and 90.6%, respectively.
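
        The dictionary-extension step can be sketched as follows. The invisible space is U+200B (zero-width space), so a corpus annotated with it can be split into words directly. The abstract does not describe ternary decomposition in detail, so the segmenter below uses plain greedy longest matching as a stand-in, not the authors' algorithm. The function names are hypothetical, and the Latin strings are toy placeholders rather than Khmer text.

```python
ZWSP = "\u200b"  # zero-width space, the "invisible space" of the abstract


def extract_words(annotated_text):
    """Split ZWSP-annotated text into words, dropping empty pieces."""
    return [w for w in annotated_text.split(ZWSP) if w.strip()]


def extend_dictionary(dictionary, annotated_text):
    """Return a new set: the original dictionary plus the corpus words."""
    return set(dictionary) | set(extract_words(annotated_text))


def longest_match_segment(text, dictionary):
    """Greedy forward longest-match segmentation.

    This is a stand-in for illustration, NOT ternary decomposition.
    Characters not covered by any dictionary word come out one by one.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


base = {"ab"}
corpus = "ab" + ZWSP + "cde" + ZWSP + "ab"
extended = extend_dictionary(base, corpus)
print(longest_match_segment("abcdeab", base))      # ['ab', 'c', 'd', 'e', 'ab']
print(longest_match_segment("abcdeab", extended))  # ['ab', 'cde', 'ab']
```

        With the original dictionary the unknown word falls apart into single characters, while the extended dictionary recovers it, which is the out-of-vocabulary effect the abstract describes.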

      • KCI-indexed

        영어의 강음절(강세 음절)과 한국어 화자의 단어 분절 [Strong (stressed) syllables in English and word segmentation by Korean speakers]

        김선미 (Kim Sunmi), 남기춘 (Nam Kichun). 한국음성학회, 2011. 말소리와 음성과학, Vol.3 No.1

        It has been posited that in English, native listeners use the Metrical Segmentation Strategy (MSS) for the segmentation of continuous speech. Strong syllables tend to be perceived as potential word onsets by English native speakers, which is due to the high proportion of strong syllables word-initially in the English vocabulary. This study investigates whether Koreans employ the same strategy when segmenting speech input in English. Word-spotting experiments were conducted using vowel-initial and consonant-initial bisyllabic targets embedded in nonsense trisyllables in Experiments 1 and 2, respectively. The effect of the strong syllable was significant in the RT (reaction time) analysis but not in the error analysis. In both experiments, Korean listeners detected words more slowly when the word-initial syllable was strong (stressed) than when it was weak (unstressed). However, the error analysis showed that there was no effect of initial stress in Experiment 1 and in the item (F2) analysis in Experiment 2. Only the subject (F1) analysis in Experiment 2 showed that the participants made more errors when the word started with a strong syllable. These findings suggest that Korean listeners do not use the Metrical Segmentation Strategy for segmenting English speech. They do not treat strong syllables as word beginnings, but rather have difficulty recognizing words when the word starts with a strong syllable. These results are discussed in terms of the intonational properties of Korean prosodic phrases, which are found to serve as lexical segmentation cues in the Korean language.

      • On-line Handwritten Uyghur Word Recognition Using Segmentation-Based Techniques

        Mayire Ibrayim, Askar Hamdulla. 보안공학연구지원센터, 2015. International Journal of Signal Processing, Image Vol.8 No.6

        An approach to online handwritten word recognition using segmentation-based techniques is presented in this paper. This approach is referred to as a lexicon-driven approach because an optimal segmentation is generated for each string in the lexicon. The word recognition problem is transformed into a matching optimization problem between the dictionary entry and the handwritten word image. The segmentation process consists of removing delayed strokes, analyzing the shape of the stroke trajectory, reconstructing delayed strokes, and combining adjacent fragments. Dynamic matching is used to rank the lexicon entries in order to get the best match. A match score is assigned to a segmentation and string by matching each segment to the corresponding character in the string with a character recognition algorithm that returns a confidence value for each character class. As a result, the performance for lexicons of size 10, 100, 500, and 1000 is 93.17%, 70.33%, 59.79%, and 51.20% for adding distance and 94.85%, 79.75%, 74.42%, and 62.19% for normalizing distance, respectively.
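
        A lexicon-driven match of this kind is commonly written as a dynamic program in which each character of a lexicon entry consumes one or more adjacent stroke fragments. The sketch below is a generic illustration under assumptions: `match_cost`, `rank_lexicon`, `max_merge`, and the toy `toy_cost` recognizer are hypothetical, and reading "adding distance" as the summed cost and "normalizing distance" as the sum divided by word length is a guess at the abstract's terms, not the paper's definition.

```python
import math


def match_cost(num_fragments, word, char_cost, max_merge=3):
    """Minimum total cost of covering all fragments with the characters of
    `word`, where each character consumes 1..max_merge adjacent fragments."""
    n, m = num_fragments, len(word)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):          # characters matched so far
        for i in range(1, n + 1):      # fragments consumed so far
            for k in range(1, min(max_merge, i) + 1):
                prev = best[i - k][j - 1]
                if prev < math.inf:
                    cost = prev + char_cost(i - k, i, word[j - 1])
                    best[i][j] = min(best[i][j], cost)
    return best[n][m]


def rank_lexicon(num_fragments, lexicon, char_cost, normalize=False):
    """Sort lexicon entries by summed cost, or by cost per character."""
    scored = []
    for word in lexicon:
        cost = match_cost(num_fragments, word, char_cost)
        if normalize and word:
            cost /= len(word)
        scored.append((cost, word))
    return [word for _, word in sorted(scored)]


# Toy stand-in for a character recognizer: a single fragment that carries
# the right label costs 0; anything else (including merged fragments) costs 1.
fragments = ["c", "a", "t"]


def toy_cost(start, end, ch):
    if end - start == 1 and fragments[start] == ch:
        return 0.0
    return 1.0


print(rank_lexicon(len(fragments), ["cat", "car", "at"], toy_cost))
print(rank_lexicon(len(fragments), ["cat", "car", "at"], toy_cost, normalize=True))
```

        In this toy case the two scoring modes order the runner-up entries differently, since dividing by length favors longer entries with the same summed cost.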

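      • KCI-indexed

        강음절이 한국어 화자의 영어 연속 음성의 어휘 분절에 미치는 영향 [The effect of strong syllables on Korean speakers' lexical segmentation of English continuous speech]

        김선미 (Kim, Sunmi), 남기춘 (Nam, Kichun). 한국음성학회, 2013. 말소리와 음성과학, Vol.5 No.2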

        English native listeners have a tendency to treat strong syllables in a speech stream as the potential initial syllables of new words, since the majority of lexical words in English have a word-initial stress. The current study investigates whether Korean (L1) - English (L2) late bilinguals perceive strong syllables in English continuous speech as word onsets, as English native listeners do. In Experiment 1, word-spotting was slower when the word-initial syllable was strong, indicating that Korean listeners do not perceive strong syllables as word onsets. Experiment 2 was conducted in order to avoid any possibilities that the results of Experiment 1 may be due to the strong-initial targets themselves used in Experiment 1 being slower to recognize than the weak-initial targets. We employed the gating paradigm in Experiment 2, and measured the Isolation Point (IP, the point at which participants correctly identify a word without subsequently changing their minds) and the Recognition Point (RP, the point at which participants correctly identify the target with 85% or greater confidence) for the targets excised from the non-words in the two conditions of Experiment 1. Both the mean IPs and the mean RPs were significantly earlier for the strong-initial targets, which means that the results of Experiment 1 reflect the difficulty of segmentation when the initial syllable of words was strong. These results are consistent with Kim & Nam (2011), indicating that strong syllables are not perceived as word onsets for Korean listeners and interfere with lexical segmentation in English running speech.
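
        The two gating measures defined in this abstract can be computed from per-gate responses as in the sketch below. The data layout (a list of `(response, confidence)` pairs, one per gate, with confidence in percent), the 0-based gate indices, and the choice to search for the Recognition Point only from the Isolation Point onward are assumptions made for illustration.

```python
def isolation_point(responses, target):
    """Index of the first gate from which every later response is the target.

    Returns None if the final response is not the target.
    """
    ip = None
    for gate in range(len(responses) - 1, -1, -1):
        if responses[gate][0] == target:
            ip = gate
        else:
            break  # an incorrect response ends the unbroken correct run
    return ip


def recognition_point(responses, target, threshold=85):
    """First gate at or after the isolation point whose confidence
    reaches the threshold (85% in the abstract)."""
    ip = isolation_point(responses, target)
    if ip is None:
        return None
    for gate in range(ip, len(responses)):
        if responses[gate][1] >= threshold:
            return gate
    return None


# Hypothetical responses of one participant across five gates.
gates = [("cap", 20), ("cat", 40), ("cab", 50), ("cat", 60), ("cat", 90)]
print(isolation_point(gates, "cat"))    # 3
print(recognition_point(gates, "cat"))  # 4
```

        The correct guess at gate 1 does not count toward the Isolation Point because the participant changed the answer at gate 2.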

      • The Role of Post-lexical Intonational Patterns in Korean Word Segmentation

        Kim, Sa-Hyang Korean Society of Speech Sciences 2007 음성과학 Vol.14 No.1

        The current study examines the role of post-lexical tonal patterns of a prosodic phrase in word segmentation. In a word-spotting experiment, native Korean listeners were asked to spot a disyllabic or trisyllabic word in a twelve-syllable speech stream that was composed of three Accentual Phrases (AP). Words occurred with various post-lexical intonation patterns. The results showed that listeners spotted more words in phrase-initial than in phrase-medial position, suggesting that the AP-final H tone from the preceding AP helped listeners to segment the phrase-initial word in the target AP. Results also showed that listeners' error rates were significantly lower when words occurred with an initial rising tonal pattern, which is the most frequent intonational pattern imposed upon multisyllabic words in Korean, than with non-rising patterns. This result was observed both in AP-initial and in AP-medial positions, regardless of the frequency and legality of overall AP tonal patterns. Tonal cues other than the initial rising tone did not positively influence the error rate. These results not only indicate that a rising tone in AP-initial and AP-final position is a reliable cue for word boundary detection for Korean listeners, but further suggest that phrasal intonation contours serve as a possible word boundary cue in languages without lexical prominence.

      • The Role of Prosodic Boundary Cues in Word Segmentation in Korean

        Kim, Sa-Hyang Korean Society of Speech Sciences 2006 음성과학 Vol.13 No.1

        This study investigates the degree to which various prosodic cues at the boundaries of prosodic phrases in Korean contribute to word segmentation. Since most phonological words in Korean are produced as one Accentual Phrase (AP), it was hypothesized that the detection of acoustic cues at AP boundaries would facilitate word segmentation. The prosodic characteristics of Korean APs include initial strengthening at the beginning of the phrase and pitch rise and final lengthening at the end. A perception experiment utilizing an artificial language learning paradigm revealed that cues conforming to the aforementioned prosodic characteristics of Korean facilitated listeners' word segmentation. Results also indicated that duration and amplitude cues were more helpful in segmentation than pitch. Nevertheless, results did show that a pitch cue that did not conform to the Korean AP interfered with segmentation.

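      • KCI-indexed

        학령초기 한국아동의 영어단어 읽기와 읽기 관련 변인들과의 관계 [The relationship between English word reading and reading-related variables in early school-age Korean children]

        한찬숙, 정윤경, 윤혜경. 한국심리학회 산하 한국발달심리학회, 2009. 한국심리학회지 발달, Vol.22 No.2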

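        The present study examined reading variables that influence the English word reading of Korean second graders who are learning English words. Forty-four children were divided into two groups (high and low Hangul reading skill). In Experiment 1, Hangul word naming speed predicted the reading skill of the high group, and Hangul phonemic segmentation and alphabet letter-sound fluency predicted the reading skill of the low group. In Experiment 2, Hangul word naming speed and alphabet letter-name fluency predicted the reading skill of the high group, while Hangul phoneme segmentation and alphabet letter-sound fluency predicted that of the low group. These results suggest that Korean second graders' Hangul reading skill is related to the acquisition of English word reading skill, and that the variables that predict children's English reading skill differ according to the level of Hangul reading skill.

        This study examined, through two experiments, the variables that influence English word reading in second-grade elementary school children who were being taught to read English words. The 44 participating children were divided into a high group and a low group according to their Hangul reading level. Experiment 1 used familiar English words with a pattern similar to Hangul and unfamiliar English words. The two groups differed significantly in English word reading ability. For the high group, reading of both familiar and unfamiliar words was significantly predicted by Hangul word naming; for the low group, reading of both familiar and unfamiliar words was significantly predicted by Hangul phonemic segmentation and alphabet letter knowledge fluency. Experiment 2 used familiar English words with a pattern different from Hangul (CVCV) and unfamiliar English words with a pattern similar to Hangul (CVC). The two groups again differed significantly in reading ability. For the high group, reading of familiar, different-pattern words was significantly predicted by Hangul phonemic segmentation, and reading of unfamiliar, same-pattern words by Hangul word naming and alphabet letter-name fluency; for the low group, reading of both familiar, different-pattern words and unfamiliar, same-pattern words was significantly predicted by Hangul phonemic segmentation and alphabet letter-sound fluency. These results suggest that the Hangul reading skill of early school-age Korean children is related to the acquisition of English word reading skill, and that the variables predicting children's English word reading differ according to Hangul reading level.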
