RISS (Academic Research Information Service)

      • KCI-indexed

        The structural-hierarchical organization of the word family in Buryat

        ( Darima Kharanutova ) 한국알타이학회 2015 알타이학보 Vol.0 No.25

        The material for the study comprises single-root words of the Buryat language structured with the help of the cluster method. The word-formation nest of Buryat words is organized not by the graded method, as in Russian, but by the cluster method. The relationship of coderivation, or equal derivation, is a universal feature of the word-building system of Buryat, and the most appropriate term for understanding the Buryat word-formation paradigm is the cluster of derivatives. The article is devoted to the problem of modeling single-rooted words by the cluster method. Structuring derivatives of the same derivational base as a word-formation cluster rests upon a derivational peculiarity of Buryat, which employs different ways of word formation beyond affixation alone: it is characterized by semantic derivation, including composition and its varieties (compounding, reduplication, syntactic derivation, etc.). As a result, derivatives with complex affixes (affix coalitions), derivational synonyms, pair words, reduplicants, compound words, syntactic words, contaminated words, and conversives are found in one cluster. Undoubtedly, structuring a one-root lexical class demands a different approach: nest modeling requires a rotation of the axis, with the peak of the derivational nest, traditionally located to the left of all derivatives, forming the axis. In our case the peak is in the center, so vertical development results in consecutive derivation (in this paper a consecutive chain of derivatives is called a "branch"), while horizontal development results in one-level clustering (derivatives with the same derivational base are called a fan of derivatives). We conclude that a derivational cluster reflecting the systematic organization of Buryat one-root words reveals an important feature of the word-formation potential of Buryat words: derivational relations, which are mostly coderivative. In this connection, the Buryat derivational nest appears as a paradigm represented by extensive clusters and short branches.

      • KCI-indexed

        Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

        ( Hechan Tian ),( Fenlin Liu ),( Xiangyang Luo ),( Fan Zhang ),( Yaqiong Qiao ) 한국인터넷정보학회 2020 KSII Transactions on Internet and Information Systems Vol.14 No.10

        Existing methods rely on statistical features to extract local words for microblog user geolocation, but many non-local words remain among the extracted words, which lowers geolocation accuracy. Considering both the statistical and semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed: the remaining semantically similar words are clustered together according to the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed; the cluster subset with the best geolocation effect is selected, and the words in that subset are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate the microblog user. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that it outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
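
        A minimal sketch of the kind of pipeline this abstract describes: cluster candidate words by their vectors, prune clusters with a sequential backward wrapper search, and geolocate with Naive Bayes. It is not the authors' implementation; the helper names, the scikit-learn components, and the data interfaces are assumptions.

        # Sketch (not the authors' code); word_vectors and the train/val splits are assumed inputs.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        def geolocation_accuracy(words, train_texts, train_cities, val_texts, val_cities):
            """Train Naive Bayes on the candidate local words, return validation accuracy."""
            vec = CountVectorizer(vocabulary=sorted(words))
            clf = MultinomialNB().fit(vec.fit_transform(train_texts), train_cities)
            return clf.score(vec.transform(val_texts), val_cities)

        def select_local_words(word_vectors, train, val, n_clusters=5):
            """word_vectors: dict word -> vector; train/val: (texts, city_labels) pairs."""
            words = list(word_vectors)
            labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(
                np.array([word_vectors[w] for w in words]))
            clusters = {c: {w for w, l in zip(words, labels) if l == c} for c in range(n_clusters)}
            selected = set(clusters)
            best = geolocation_accuracy(set().union(*clusters.values()), *train, *val)
            improved = True
            while improved and len(selected) > 1:        # sequential backward subset search
                improved = False
                for c in list(selected):
                    acc = geolocation_accuracy(
                        set().union(*(clusters[i] for i in selected - {c})), *train, *val)
                    if acc >= best:                      # dropping this cluster does not hurt
                        best, selected, improved = acc, selected - {c}, True
                        break
            return set().union(*(clusters[i] for i in selected)), best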

      • KCI-indexed (candidate)

        Word2Vec을 활용한 문법 탐구의 핵심 어휘 탐색 연구 [A Study on Exploring the Core Vocabulary of Grammar Inquiry Using Word2Vec]

        강지영,김범진,나상수 서울대학교 국어교육연구소 2020 국어교육연구 Vol.46 No.-

        This study aimed to explore the concepts and attributes of grammar inquiry based on existing research. Articles with the keyword "inquiry" were extracted from domestic journals, and 77 papers directly related to grammar education were selected as research subjects. All of these papers were transcribed to build a corpus, and morphological analysis was conducted using the Mecab analyzer in the Python package KoNLPy. The corpus was further refined by excluding stop words that did not affect the composition of the meaning of the text. Word2Vec analysis was then conducted on the refined corpus, since Word2Vec is based on the distributional hypothesis and can therefore represent the relationships between words well. The top 132 words with high cosine similarity to "inquiry" were extracted from the word vectors constructed through Word2Vec, and clustering was conducted to capture the words most relevant to grammar inquiry. From a total of six clusters, the one presumed to be most relevant to grammar inquiry was selected and interpreted qualitatively. As a result, '사고' (thinking), '발산' (divergence), '고차' (higher-order), and '고차원' (high-dimensional) were categorized as words describing the kinds of thinking involved in grammar inquiry, and '분석력' (analytical ability), '비판력' (critical ability), '관찰력' (observational ability), and '(문제) 해결력' ((problem-)solving ability) as words related to grammar inquiry competence. In addition, '기쁨' (joy), '성취감' (sense of achievement), and '즐거움' (enjoyment) indicated positive affective effects of grammar inquiry; '안내' (guidance), '해결' (resolution), and '순환' (circulation) indicated its processual nature; and '경험' (experience) and '과제' (task) revealed differences in perspectives on grammar inquiry. These words are closely related to grammar inquiry and should receive attention in future discussions of grammar inquiry.
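
        A rough sketch of the analysis pipeline described above: morphological analysis with KoNLPy's Mecab, Word2Vec training, cosine-similarity extraction around the keyword "탐구" (inquiry), and clustering. The corpus, stop-word list, and hyperparameters below are placeholders, not the study's settings.

        # Sketch under placeholder data; the real study used 77 transcribed papers.
        import numpy as np
        from konlpy.tag import Mecab               # requires the MeCab-ko dictionary
        from gensim.models import Word2Vec
        from sklearn.cluster import KMeans

        papers = ["탐구 중심 문법 교육의 방향", "문법 탐구 경험과 사고의 설계"]   # placeholders
        stopwords = {"의", "과", "와"}                                             # illustrative

        mecab = Mecab()
        sentences = [[m for m in mecab.morphs(doc) if m not in stopwords] for doc in papers]

        # Word2Vec rests on the distributional hypothesis, so co-occurring words end up close.
        model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, seed=0)

        # Words most similar (cosine) to the keyword "탐구" (inquiry).
        top_words = [w for w, _ in model.wv.most_similar("탐구", topn=132)]

        # Group the extracted words into six clusters and inspect each one.
        vectors = np.array([model.wv[w] for w in top_words])
        labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(vectors)
        for c in range(6):
            print(c, [w for w, l in zip(top_words, labels) if l == c])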

      • KCI-indexed

        Word2Vec를 이용한 한국어 단어 군집화 기법 [A Korean Word Clustering Technique Using Word2Vec]

        허지욱 한국인터넷방송통신학회 2018 한국인터넷방송통신학회 논문지 Vol.18 No.5

        With the recent development of the Internet, research areas such as information retrieval and data extraction, which provide users with efficient search results, have become increasingly important. However, because it is difficult to grasp the meaning of newly coined Korean words and buzzwords, techniques that find and analyze words semantically similar to a given word are needed. Word clustering is one such technique: it groups together the words in a document that are semantically similar to a given word. This paper proposes a technique that embeds the words of a given Korean document using Word2Vec and automatically clusters semantically similar Korean words.
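
        A brief sketch of the general idea, under the assumption of an already tokenized Korean document set: embed the words with Word2Vec and group the whole vocabulary into clusters of semantically similar words. The toy tokens and cluster count are illustrative only.

        # Sketch with toy tokens; a real run would use a tokenized Korean document collection.
        import numpy as np
        from gensim.models import Word2Vec
        from sklearn.cluster import KMeans

        tokenized_docs = [["신조어", "유행어", "의미", "분석"],
                          ["검색", "결과", "정보", "추출"],
                          ["단어", "의미", "유사", "분석"]]
        model = Word2Vec(tokenized_docs, vector_size=50, min_count=1, seed=0)

        vocab = model.wv.index_to_key                  # every embedded word
        X = np.array([model.wv[w] for w in vocab])
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

        groups = {}
        for word, label in zip(vocab, labels):
            groups.setdefault(label, []).append(word)  # each group: semantically similar words
        print(groups)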

      • KCI-indexed

        어휘 군집 방법이 중학교 영어 학습자의 어휘 습득에 미치는 영향 [The Effects of Word Clustering Methods on Middle School English Learners' Vocabulary Acquisition]

        하봄이 ( Ha Bomyi ),윤현숙 ( Yoon Hyunsook ) 한국외국어대학교 외국어교육연구소 2018 외국어교육연구 Vol.32 No.3

        This study aimed to investigate the effects of different word clustering methods on middle school English learners' vocabulary acquisition, and the relationship between the learners' perceptions of the effectiveness of different types of word clustering and their actual test scores. A total of 32 students participated in vocabulary learning sessions with four different types of word clustering: unrelated, semantic, and two types of thematic clustering. The two thematic clusters were selected based on the degree of learners' familiarity with the topics. The results showed that three types of clustering, namely unrelated and the two types of thematic clustering, were equally effective, whereas semantic clustering was significantly less effective than the other three types in both short-term and long-term vocabulary learning. Topic familiarity resulted in no significant difference in the effectiveness of vocabulary learning, and no meaningful relationship was found between the learners' perceptions of the effectiveness of clustering types and the actual test scores. Implications for teaching vocabulary are discussed based on these results.

      • KCI-indexed (excellent)

        Clustering high-cardinality categorical data using category embedding methods

        Hyun Cho,Yeojin Chung 한국데이터정보과학회 2020 한국데이터정보과학회지 Vol.31 No.1

        Compared to clustering numerical data, clustering algorithms for categorical data have not been extensively studied, particularly for data with high-cardinality attributes. When categorical attributes have a large number of levels, clustering algorithms tend to suffer from the curse of dimensionality. In this study, we verified that good clustering performance can be achieved in the presence of categorical attributes by combining clustering algorithms typically applied to numerical data with word embedding methods. Using word embedding methods originally developed for natural language processing, the levels of categorical attributes can be represented in a vector space, where the resulting embedding vectors reflect the relationship between frequently appearing categories. We utilized Word2vec, GloVe, and fastText for category embedding, and applied K-means and the Gaussian mixture model for clustering the embedded data. The clustering performance of the proposed methods was compared with that of typical clustering algorithms for categorical data, namely K-modes and robust clustering using links. In a simulation study and experiments on real-life examples, the Gaussian mixture model with GloVe performed best, especially as the number of observations and the complexity of the data increased.
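
        A small sketch of the general idea, not the study's setup: treat each row's category levels as a "sentence", embed the levels with Word2vec, and cluster the embedded rows with a Gaussian mixture model. The toy rows and hyperparameters are assumptions.

        # Sketch of category embedding followed by mixture-model clustering (toy data).
        import numpy as np
        from gensim.models import Word2Vec
        from sklearn.mixture import GaussianMixture

        # Each row of a categorical data set is treated as a "sentence" of category levels.
        rows = [["city=NY", "device=ios", "plan=gold"],
                ["city=NY", "device=android", "plan=gold"],
                ["city=LA", "device=ios", "plan=free"],
                ["city=LA", "device=android", "plan=free"]]

        # Embed the levels; levels that co-occur often end up close in the vector space.
        emb = Word2Vec(rows, vector_size=8, window=5, min_count=1, sg=1, seed=0)

        # Represent each row by the mean of its level embeddings, then cluster.
        X = np.array([np.mean([emb.wv[level] for level in row], axis=0) for row in rows])
        gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X)
        print(gmm.predict(X))                          # cluster label for each row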

      • KCI-indexed

        과학교과서 텍스트의 계량적 분석을 이용한 과학 개념어의 생산적 지식 교육 방안 탐색 [Exploring Ways to Teach Productive Knowledge of Science Concept Words Using Quantitative Analysis of Science Textbook Text]

        윤은정 ( Eunjeong Yun ) 한국과학교육학회 2020 한국과학교육학회지 Vol.40 No.1

        Looking at the understanding of scientific concepts from a linguistic perspective, it is very important for students to develop a deep and sophisticated understanding of the words used in scientific concepts, as well as the ability to use them correctly. Noting that a foundation for teaching productive knowledge of scientific words has not yet been well established in science education, this study seeks to provide such a basis by exploring ways to teach the relationships among the words that constitute scientific concepts in a productive and effective manner. To this end, we first extracted the words that make up scientific concepts, and the relationships among them, from science textbook text using quantitative text analysis methods; second, we qualitatively examined the meaning of the word relationships extracted by each method; and third, we proposed writing activities that can help improve productive knowledge of scientific concept words. We analyzed the text of the "Force and motion" unit of a first-year middle school science textbook using four quantitative linguistic analysis methods: cluster analysis, co-occurrence frequency analysis, text network analysis, and word embedding. As a result, this study suggests four writing activities: completing sentences using the results of the cluster analysis, filling in blanks using the results of the co-occurrence analysis, material-oriented writing using the results of the text network analysis, and compiling a list of important words for learning using the results of the word embedding.
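
        As one concrete illustration of the quantitative methods mentioned above, the following sketch counts sentence-level co-occurrences, the kind of statistic a fill-in-the-blank activity could draw on. The example tokens are placeholder physics words, not the textbook data.

        # Sketch with placeholder tokens from a "force and motion" unit.
        from collections import Counter
        from itertools import combinations

        sentences = [["힘", "물체", "운동", "변화"],
                     ["물체", "속력", "운동"],
                     ["힘", "물체", "속력"]]

        cooc = Counter()
        for tokens in sentences:
            for a, b in combinations(sorted(set(tokens)), 2):   # unordered word pairs per sentence
                cooc[(a, b)] += 1

        # Frequent partners of a target word, e.g. blank-filling candidates around "힘" (force).
        target = "힘"
        partners = Counter({pair: n for pair, n in cooc.items() if target in pair})
        print(partners.most_common(3))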

      • KCI-indexed

        A Large-scale Text Analysis with Word Embeddings and Topic Modeling

        ( Won-joon Choi ),( Euhee Kim ) 서울대학교 인지과학연구소 2019 Journal of Cognitive Science Vol.20 No.1

        This research exemplifies how statistical semantic models and word embedding techniques can play a role in understanding the system of human knowledge. Intuitively, we speculate that when a person is given a piece of text, they first classify its semantic contents, group it with semantically similar texts previously observed, and then relate its contents with that group. We attempt to model this process of knowledge linking by using word embeddings and topic modeling. Specifically, we propose a model that analyzes the semantic/thematic structure of a given corpus, so as to replicate the cognitive process of knowledge ingestion. Our model attempts to make the best of both word embeddings and topic modeling by first clustering documents and then performing topic modeling on them. To demonstrate our approach, we apply our method to the Corpus of Contemporary American English (COCA). In COCA, the texts are first divided by text type and then by subcategory, which represents the specific topics of the documents. To show the effectiveness of our analysis, we focus on texts related to the domain of science. First, we cull science-related texts from various genres, then preprocess them into a usable, appropriate format. In our preprocessing steps, we fine-grain the texts with a combination of tokenization, parsing, and lemmatization; through this preprocessing, we discard words of little semantic value and disambiguate syntactically ambiguous words. Afterwards, using only the nouns from the corpus, we train a word2vec model on the documents and apply K-means clustering to them. The clustering results show that each cluster represents a branch of science, similar to how people relate a new piece of text to semantically related documents. With these results, we proceed to perform topic modeling on each of these clusters, which reveals the latent topics of each cluster and their relationships with each other. Through this research, we demonstrate a way to analyze a mass corpus and highlight the semantic/thematic structure of its topics, which can be thought of as a representation of knowledge in human cognition.
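
        A compact sketch of the two-stage procedure described here: represent documents by averaged word2vec vectors, cluster them with K-means, then run topic modeling inside each cluster. The toy documents and parameters are illustrative assumptions, not the COCA setup.

        # Sketch with toy documents; the study used science-related COCA texts.
        import numpy as np
        from gensim.corpora import Dictionary
        from gensim.models import LdaModel, Word2Vec
        from sklearn.cluster import KMeans

        docs = [["genome", "cell", "protein", "mutation"],
                ["protein", "enzyme", "cell", "dna"],
                ["planet", "orbit", "telescope", "star"],
                ["star", "galaxy", "orbit", "gravity"]]   # noun-only, preprocessed documents

        w2v = Word2Vec(docs, vector_size=32, min_count=1, sg=1, seed=0)
        doc_vecs = np.array([np.mean([w2v.wv[w] for w in d], axis=0) for d in docs])

        k = 2
        cluster_of = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(doc_vecs)

        # Topic modeling inside each cluster surfaces its latent themes.
        for c in range(k):
            members = [d for d, lbl in zip(docs, cluster_of) if lbl == c]
            dictionary = Dictionary(members)
            bow = [dictionary.doc2bow(d) for d in members]
            lda = LdaModel(bow, num_topics=1, id2word=dictionary, random_state=0)
            print(c, lda.print_topics(num_words=4))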

      • KCI-indexed

        Aspects of Echo Word Formation in Hindi and Kashmiri

        ( Chung Chin-wan ) 한국현대언어학회 2018 언어연구 Vol.33 No.4

        This study delves into echo word formation in Hindi and Kashmiri, which is considered a case of partial reduplication. Even though the two languages share the same designated first segment for the echo word, they implement the process differently. This is attributed to different restrictions on an identical leftmost segment /v/ in the base and the echo word: Hindi prefers to delete one of the /v/s, while Kashmiri replaces the /v/ with /p/ in the echo word. Another difference is how the fixed segment in the echo word replaces the onset cluster of the base: Hindi replaces only the first segment of the cluster, whereas in Kashmiri the whole cluster is replaced by the fixed segment. The different modes of implementing the echo word formation process are reflected in the constraints, and their language-specific rankings can readily explain the relevant examples in both languages. (Chonbuk National University)
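
        A toy sketch of the two replacement patterns described in this abstract, using a simple romanized representation. The example bases are hypothetical, and the handling of /v/-initial bases follows just one possible reading of the deletion pattern, so this is an illustration, not the paper's analysis.

        # Toy functions; forms and /v/-handling are simplifying assumptions.
        VOWELS = set("aeiou")

        def split_onset(word):
            """Return (leading consonant string, remainder starting at the first vowel)."""
            i = 0
            while i < len(word) and word[i] not in VOWELS:
                i += 1
            return word[:i], word[i:]

        def echo(base, language):
            onset, rest = split_onset(base)
            if onset.startswith("v"):
                # Base already begins with the fixed segment /v/: one reading is that
                # Hindi drops the /v/ of the echo word, while Kashmiri replaces it with /p/.
                return onset[1:] + rest if language == "hindi" else "p" + onset[1:] + rest
            if language == "hindi":
                return "v" + onset[1:] + rest      # replace only the first segment of the onset
            return "v" + rest                      # Kashmiri: replace the whole onset cluster

        for base in ["pyaar", "kitab", "vats"]:
            print(base, "->", echo(base, "hindi"), "/", echo(base, "kashmiri"))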

      • KCI-indexed (excellent)

        Word2vec 모델로 학습된 단어 벡터의 의미 관계 분석 [An Analysis of Semantic Relations in Word Vectors Trained with the Word2vec Model]

        강형석(Hyungsuc Kang),양장훈(Janghoon Yang) Korean Institute of Information Scientists and Engineers 2019 정보과학회논문지 Vol.46 No.10

        As the use of artificial intelligence (AI) in natural language processing has increased, the importance of word embedding has grown significantly. This paper qualitatively analyzes the capability of word2vec models to represent semantic relations, specifically antonymy and hyponymy, based on clustering characteristics and t-SNE distributions. To this end, a K-means clustering algorithm was applied to a set of words drawn from 10 categories. Some words in antonymic relations are found not to be embedded properly; this is attributed to the fact that such words typically share many common attributes and differ in only a few opposite ones. It is also observed that words in hyponymic relations are not properly embedded at all. This can be attributed to the fact that the hyponymic relations of those words are based on information gathered through the learning process of a knowledge system, as opposed to the natural process of language acquisition. Thus, word2vec models based on the distributional hypothesis appear to be limited in representing certain antonymic relations and do not properly represent hyponymic relations at all.
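
        A short sketch of the kind of inspection described here: cosine similarities for antonym and hyponym pairs, plus K-means clustering and a t-SNE layout over the same words. It assumes an already-trained gensim KeyedVectors object kv; the example pairs are hypothetical, not the paper's ten categories.

        # Sketch assuming a trained gensim KeyedVectors object `kv`.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.manifold import TSNE

        def inspect_relations(kv, antonyms, hyponyms, n_clusters=2):
            for a, b in antonyms + hyponyms:
                print(f"cos({a}, {b}) = {kv.similarity(a, b):.3f}")
            words = sorted({w for pair in antonyms + hyponyms for w in pair})
            X = np.array([kv[w] for w in words])
            labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
            coords = TSNE(n_components=2, perplexity=min(5, len(words) - 1),
                          random_state=0).fit_transform(X)    # 2-D layout for visual inspection
            for w, lbl, (x, y) in zip(words, labels, coords):
                print(f"{w}\tcluster={lbl}\t({x:.1f}, {y:.1f})")

        # Example call (hypothetical pairs):
        # inspect_relations(kv, antonyms=[("hot", "cold"), ("big", "small")],
        #                       hyponyms=[("dog", "animal"), ("rose", "flower")])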
