RISS (Research Information Sharing Service)

      • Stem-Affix based Uyghur Morphological Analyzer

        Mijit Ablimit, Tatsuya Kawahara, Akbar Pattar, Askar Hamdulla. Security Engineering Research Support Center, 2016. International Journal of Future Generation Communi Vol.9 No.2

        Uyghur is an agglutinative language in which words are derived from stems (or roots) by concatenating suffixes. This property yields a large number of morpheme combinations and greatly increases the word vocabulary size, causing out-of-vocabulary (OOV) and data sparseness problems for statistical models. Words are therefore split into sub-word units for text and speech processing applications. Proper sub-word units not only provide high coverage and a smaller lexicon, but also carry the semantic and syntactic information needed by downstream applications. This paper discusses a general-purpose morphological analyzer that can split a text into sequences of morphemes or syllables. Uyghur morpheme segmentation is a basic part of the comprehensive effort to compile a Uyghur language corpus. As there are no delimiters for sub-word units, a supervised method that combines certain rules with a statistical learning algorithm is applied for morpheme segmentation. Phonetic units such as syllables and phonemes can be extracted with high accuracy by purely rule-based methods. The most common and suitable sub-word units for various applications are linguistic morphemes, because they provide linguistic information, high coverage, and a small lexicon, and can easily be restored to words. Because Uyghur is written as it is pronounced, phonetic alterations in speech are expressed openly in text, which produces many surface forms for a particular morpheme. A general-purpose morphological analyzer must therefore be able to analyze and export both standard and surface forms. Morpho-phonetic alterations such as phonetic harmony, weakening, and morphological changes are summarized and learned from a training corpus. A statistical model-based morpheme segmentation tool is trained on a corpus of aligned word-morpheme sequences and applied to predict possible morpheme sequences. For an open test set with word coverage of 86.8% and morpheme coverage of 98.4%, the morpheme segmentation accuracy is 97.6%. The tool can output both standard forms and surface forms without loss of segmentation accuracy. Furthermore, the statistical properties of the basic lexical units (word, morpheme, and syllable) are compared as part of the comprehensive Uyghur corpus compilation effort.
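The abstract reports word coverage and morpheme coverage on an open test set but does not define them. A minimal sketch, assuming coverage means the share of running test tokens that also appear in the training lexicon (the function name `coverage` and the toy Latin-script strings are illustrative placeholders, not data from the paper):

```python
def coverage(train_tokens, test_tokens):
    """Share of running test tokens that also occur in the training lexicon.

    Assumption: this token-level definition is ours; the abstract does not
    say how coverage was computed.
    """
    lexicon = set(train_tokens)
    if not test_tokens:
        return 0.0
    covered = sum(1 for tok in test_tokens if tok in lexicon)
    return covered / len(test_tokens)


# Toy placeholder data: the same text viewed as words and as morphemes.
train_words = ["kitab", "kitablar", "oqu"]
test_words = ["kitab", "kitablar", "kitablarni", "oqu"]

train_morphs = ["kitab", "lar", "ni", "oqu"]
test_morphs = ["kitab", "kitab", "lar", "kitab", "lar", "ni", "oqu"]

word_cov = coverage(train_words, test_words)      # one unseen word form
morph_cov = coverage(train_morphs, test_morphs)   # every morpheme was seen
```

In this toy case the unseen word form lowers word coverage while morpheme coverage stays complete, which is the effect the abstract describes: splitting words into morphemes reduces the out-of-vocabulary rate.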

      • Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR

        Mijit Ablimit, Tatsuya Kawahara, Askar Hamdulla. Security Engineering Research Support Center, 2015. International Journal of Hybrid Information Techno Vol.8 No.8

        In this paper, various kinds of sub-word lexica are thoroughly investigated within the framework of a Uyghur LVCSR system. Experimental results show that it is inefficient to model directly on word units or on small units such as morphemes or syllables. An optimal sub-word unit set lying between word and morpheme units is observed to fit the ASR system better. To select the best unit set, we investigate several unit segmentation and concatenation approaches and their ASR performance. For segmentation, we investigate a supervised approach that splits words into the smallest functional units, the linguistic morphemes, and an unsupervised approach that extracts pseudo-morphemes (or statistical morphemes). In the supervised model, a learning algorithm is trained on a manually prepared training corpus, and morpho-phonetic changes are analyzed. In the unsupervised model, the Morfessor tool is used to extract pseudo-morphemes from a raw text corpus. For concatenation, several approaches based on linguistic morphemes are investigated. The first is a data-driven approach that concatenates morpheme sequences according to measures such as co-occurrence frequency or mutual probability. The second is a model-based approach that merges units using global statistical criteria; in this study, the Morfessor program is revised and turned into a concatenation program by controlling segmentation points. The third is a two-layer-lexica-based concatenation approach that extracts an optimal sub-word unit set by aligning and comparing the ASR results of the word and morpheme lexical layers. This method uses both speech and text; it produced the best results in terms of WER and lexicon size and proved to be very stable. The best lexicon, which is obtained entirely on the basis of an HMM-based acoustic model, outperformed all baseline lexica. When all these lexica are directly incorporated with a deep neural network (DNN) based acoustic model, without changing the speech and text training corpora or the language models, the optimal lexicon not only drastically improved ASR accuracy but also outperformed the other units, demonstrating the generality of the two-layer-lexica-based approach.
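The data-driven concatenation approach mentioned in this abstract can be pictured with a small sketch. It assumes the simplest possible criterion, a raw co-occurrence count threshold applied in one greedy left-to-right pass; the function name `merge_frequent_pairs`, the threshold, and the toy morpheme sequences are hypothetical and are not taken from the paper:

```python
from collections import Counter


def merge_frequent_pairs(sequences, min_count):
    """Concatenate adjacent units whose corpus-wide pair count >= min_count.

    One greedy left-to-right pass; a merged unit is not merged again.
    Assumption: a plain count threshold stands in for the paper's measures.
    """
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))

    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and pair_counts[(seq[i], seq[i + 1])] >= min_count:
                out.append(seq[i] + seq[i + 1])  # join the two units
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged


# Toy placeholder corpus: each inner list is one word split into morphemes.
corpus = [["kitab", "lar", "ni"], ["kitab", "lar"], ["oqu", "di"]]
result = merge_frequent_pairs(corpus, min_count=2)
```

Only the pair that occurs twice is joined, so the resulting unit set sits between the word and morpheme levels. A mutual-probability criterion would replace the count test with a score computed from the same pair and unit counts.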

      • Serum Carotenoid, Retinol and Tocopherol Concentrations and Risk of Cervical Cancer among Chinese Women

        Zhang, Yuan-Yuan; Lu, Ling; Abliz, Guzalnur; Mijit, Fatima. Asian Pacific Journal of Cancer Prevention, 2015. Asian Pacific Journal of Cancer Prevention Vol.16 No.7

        Background: Despite many epidemiological studies on the effects of dietary antioxidant micronutrients on risk of cervical cancer, the findings remain uncertain and little evidence is available for serum nutrient markers. The present study aimed to examine the relationship between serum carotenoid, retinol and tocopherol concentrations and risk of cervical cancer among Chinese women. Materials and Methods: We conducted a hospital-based case-control study in which 358 adults (158 incident cases and 200 controls) were recruited from Xinjiang, China. Serum levels of carotenoids (α-carotene, β-carotene, β-cryptoxanthin, lycopene and lutein/zeaxanthin), retinol, and tocopherols (α-tocopherol and γ-tocopherol) were assessed by reverse-phase high-performance liquid chromatography. Results: We found inverse associations between serum carotenoid (α-carotene, β-carotene, and lutein/zeaxanthin) and tocopherol (α-tocopherol) concentrations and the risk of cervical cancer after adjusting for potential confounders, but a null association for retinol. The ORs for a 1-SD increase were 0.71 (95% CI: 0.56-0.92; p=0.003) for total carotenoids and 0.75 (95% CI: 0.60-0.94; p=0.008) for total tocopherols. Conclusions: These results show that higher serum concentrations of some carotenoids and tocopherols are associated with a lower risk of cervical cancer among Chinese women.
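The abstract reports odds ratios per 1-SD increase with 95% confidence intervals but does not name the model. A minimal sketch, assuming a logistic regression in which the serum concentration was standardized so that the coefficient is the log-odds change per 1 SD, and assuming a Wald interval (the function name and the input values are hypothetical, not the study's estimates):

```python
import math


def odds_ratio_per_sd(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE.

    Assumption: beta comes from a logistic model with a standardized
    exposure, and the interval is a Wald interval exp(beta +/- z * se).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_per_sd(beta=-0.3, se=0.1)
```

An OR below 1 whose upper confidence limit is also below 1 corresponds to the inverse association described in the Results.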
