        A Study on Keyword Extraction Using TF-IDF and the Structure of Novel Texts

        Eun-Soon You, Gun-Hee Choi, Seung-Hoon Kim. The Korea Society of Computer and Information, 2015. Journal of the Korea Society of Computer and Information, Vol.20 No.2

        With the explosive growth of information about books, a growing number of customers find it difficult to choose one. As a result, book recommendation systems, which offer customers suitable book information and thereby encourage purchases, are becoming more important. However, existing recommendation systems based on bibliographic information or user data have shown problems with the reliability of their recommendations, so the semantic information in a book's body text needs to be reflected in the recommendation system. As a preliminary study toward that goal, this paper proposes a method for extracting keywords from novel texts using the TF-IDF technique and the external structure of the novel. To this end, the texts of 100 novels were collected, each novel was divided into four structural elements (preface, dialogue, non-dialogue, and closing), and TF-IDF weights were calculated. The experiments show that, compared with using the body text alone, including the preface and closing and assigning a higher weight to dialogue improves keyword extraction accuracy by 42.1%.
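
        The weighting idea in the abstract above can be sketched as follows. This is only an illustration, not the authors' implementation: the section weights, the helper names `weighted_tf` and `top_keywords`, and the toy tokens are all assumptions. The abstract states only that dialogue receives a higher weight than the other sections.

```python
import math
from collections import Counter

# Hypothetical section weights; the abstract does not give the actual values.
SECTION_WEIGHTS = {"preface": 1.0, "dialogue": 2.0, "non_dialogue": 1.0, "closing": 1.0}


def weighted_tf(novel):
    """Sum term counts over sections, scaling each count by its section weight.

    `novel` maps a section name to the list of tokens found in that section.
    """
    tf = Counter()
    for section, tokens in novel.items():
        weight = SECTION_WEIGHTS[section]
        for token in tokens:
            tf[token] += weight
    return tf


def top_keywords(novels, index, k=3):
    """Rank the terms of novels[index] by weighted TF times log(N / df)."""
    n = len(novels)
    df = Counter()
    for novel in novels:
        # Each novel contributes at most 1 to a term's document frequency.
        df.update({t for tokens in novel.values() for t in tokens})
    tf = weighted_tf(novels[index])
    scores = {t: count * math.log(n / df[t]) for t, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (made up for illustration).
novels = [
    {"preface": ["war"], "dialogue": ["love", "love"],
     "non_dialogue": ["war", "city"], "closing": ["love"]},
    {"preface": ["city"], "dialogue": ["money"],
     "non_dialogue": ["city", "money"], "closing": ["city"]},
]
print(top_keywords(novels, 0, k=2))
```

        In this toy corpus, "city" occurs in both novels, so its IDF is zero and it drops out of the ranking regardless of the section weights.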

      • KCI-listed

        Classification of Characters in Movie by Correlation Analysis of Genre and Linguistic Style

        Eun-Soon You, Jae-Won Song, Seung-Bo Park. The Korea Society of Computer and Information, 2019. Journal of the Korea Society of Computer and Information, Vol.24 No.1

        Despite the remarkable development of AI, character dialogue created by AI is unnatural compared with human-written dialogue, and it cannot properly reveal a character's personality. The purpose of this paper is to classify characters by linguistic style and to investigate how specific linguistic styles relate to personality. Using Linguistic Inquiry and Word Count (LIWC), a text analysis software, we analyzed the dialogue of 92 characters selected from a total of 60 movies in four genres: romantic comedy, action, comedy, and horror/thriller. As a result, we confirmed that each genre has its own linguistic style. In particular, emotional tone and analytical thinking proved to be the two important features for classification: with them, precision and recall exceed 78% for romantic comedy and action. However, the precision and recall were 66% and 50% for comedy and horror/thriller, so these features had less impact on classification there than for romantic comedy and action. Characters in romantic comedies, which deal with affection between men and women, show much higher emotional tone than analytical thinking. Characters in action movies, who need rational judgment to carry out their missions, show much higher analytical thinking than emotional tone. In the case of comedy and horror/thriller, our analysis suggests that these genres contain many kinds of characters and that characters often change their personalities during the story.
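
        For readers unfamiliar with the two evaluation measures quoted above, a minimal sketch of per-genre precision and recall follows. The function name `precision_recall` and the label lists are assumptions made for illustration; they are not data from the paper.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for one target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up genre labels for five characters.
true_genres = ["romcom", "romcom", "action", "action", "comedy"]
pred_genres = ["romcom", "action", "action", "action", "romcom"]

print(precision_recall(true_genres, pred_genres, "romcom"))
print(precision_recall(true_genres, pred_genres, "action"))
```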

      • KCI-listed

        Ethics of Robots and Humans in the Posthuman Age

        Eun-Soon You, Mi-Ra Cho. The Korea Contents Association, 2018. The Journal of the Korea Contents Association, Vol.18 No.3

        As robots evolve into intelligent machines that can take over even humans' mental and emotional labor, 'robot ethics', which concerns issues that can arise in the relationship between humans and robots, is emerging as a crucial issue. This study reflects on the ethics of humans and robots needed in the posthuman age, and its main points are as follows. First, drawing on cases of ethics software developed to make ethical practice by robots possible, the study starts from the question of whether a robot can really judge right from wrong using only forcibly entered ethics codes. Second, robot ethics should consider the unethical outcomes that can arise when a robot learns from data in which human bias is embedded, and should also acknowledge ethical relativism between countries and cultures. Third, robot ethics should not be only a code of ethics for robots; it should presuppose a new concept of 'human ethics' that allows humans and robots to coevolve.

      • KCI-listed

        Collaborative Digital Storytelling Based on Collective Intelligence through a Contest

        Eun-Soon You, Seung-Bo Park, Yeon-Ho Lee, Geun-Sik Jo. The Korea Contents Association, 2010. The Journal of the Korea Contents Association, Vol.10 No.12

        With the development of the Web and the advent of digital technology, the public, once limited to consuming information, is producing and sharing content through media and becoming a provider of information. As content driven by personal needs increases, interest in the environments and technologies for producing digital storytelling is growing. However, existing authoring tools and environments focus on individual creation, which makes participation and collaboration by other users difficult and limits the sharing and reuse of content. We therefore proposed to contest participants a new collaborative procedure for producing digital storytelling that draws on collective intelligence. To this end, we developed collaboration-based authoring tools (a writing tool and a storyboard tool) and provided them to participants through a website, creating the material environment in which content could be produced online. In addition, unlike existing contests, this contest took into account not only the resulting content but also the participants' collaborative process in producing it.

      • KCI-listed

        A Study on the Correlation between Characters' Emotions and Narrative through Quantitative Analysis of Movie Dialogue

        Eun-Soon You. The Korea Contents Association, 2013. The Journal of the Korea Contents Association, Vol.13 No.6

        Dialogue, the linguistic element of a movie, plays an important role in the development of its narrative. However, because film as a medium expresses a story through images, film analysis has focused mainly on the visuals, while dialogue has been undervalued or under-researched relative to its importance. This study examines how dialogue, which has so far occupied a secondary and peripheral position in film studies, functions in the progression of the narrative, and highlights its significance in the movie. To this end, the emotion expressions uttered by the characters were manually selected from the dialogue and classified by polarity as positive or negative, and the study then quantitatively analyzed how the proportion of these expressions relates to the narrative. The study also suggests a link between narrative and emotion by pointing to dynamic emotional changes that shift between positive and negative depending on the incidents, conflicts, and resolutions throughout a movie.

      • KCI-listed

        A Study on Story-Based Information Retrieval

        Eun-Soon You, Seung-Bo Park. Korea Intelligent Information Systems Society, 2013. Journal of Intelligence and Information Systems, Vol.19 No.4

        Video information retrieval has become a very important issue because of the explosive increase in video data from Web content development. Content-based video analysis using visual features has been the main approach to video information retrieval and browsing. Content in video can be represented with content-based analysis techniques, which extract various features from audio-visual data such as frames, shots, colors, texture, or shape. Moreover, similarity between videos can be measured through content-based analysis. However, a movie, one of the typical types of video data, is organized by story as well as by audio-visual data. When content-based analysis using only low-level audio-visual data is applied to movie information retrieval, this causes a semantic gap between the significant information recognized by people and the information resulting from the analysis. The reason for this semantic gap is that the story line of a movie is high-level information, with relationships in the content that change as the movie progresses. Information retrieval related to the story line of a movie cannot be executed by content-based analysis techniques alone. A formal model is needed that can determine relationships among movie contents, or track changes in meaning, in order to accurately retrieve the story information. Recently, story-based video analysis techniques have emerged that use a social network concept for story information retrieval. These approaches represent a story by using the relationships between characters in a movie, but they have problems. First, they do not express dynamic changes in relationships between characters according to story development. Second, they miss profound information, such as emotions indicating the identities and psychological states of the characters. Emotion is essential to understanding a character’s motivation, conflict, and resolution. 
Third, they do not take account of events and background that contribute to the story. This paper therefore reviews the importance and weaknesses of previous video analysis methods, ranging from content-based approaches to story analysis based on social networks. We also suggest necessary elements, such as character, background, and events, based on narrative structures introduced in the literature. First, we extract characters’ emotional words from the script of the movie Pretty Woman by using the hierarchical attributes of WordNet, an extensive English thesaurus. WordNet offers relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms). We present a method to visualize the emotional pattern of a character over time. Second, a character’s inner nature must be determined in advance in order to model a character arc that can depict the character’s growth and development. To this end, we analyze the amount of each character’s dialogue in the script and track the character’s inner nature using social network concepts, such as in-degree (incoming links) and out-degree (outgoing links). Additionally, we propose a method that can track a character’s inner nature by tracing indices such as degree, in-degree, and out-degree of the character network in a movie through its progression. Finally, the spatial background where characters meet and where events take place is an important element in the story. We use the movie script to extract significant spatial backgrounds and suggest a scene map describing spatial arrangements and distances in the movie. Important places where main characters first meet or where they stay for long periods of time can be extracted through this scene map. In view of the aforementioned three elements (character, event, background), we extract a variety of information related to the story and evaluate the performance of the proposed method. 
We can track story information extracted over time and detect a change in the character’s emotion or inner nature, spatial movement,
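
        The degree indices mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the `degree_indices` helper, the (speaker, listener) representation of dialogue, and the placeholder characters A, B, and C are hypothetical and do not come from the paper or from the Pretty Woman script. Each utterance is counted as one link, so the indices are weighted by the amount of dialogue.

```python
from collections import Counter


def degree_indices(utterances):
    """Compute in-degree, out-degree, and total degree per character.

    `utterances` is a list of (speaker, listener) pairs. An utterance adds
    one outgoing link for the speaker and one incoming link for the listener.
    """
    out_degree = Counter(speaker for speaker, _ in utterances)
    in_degree = Counter(listener for _, listener in utterances)
    characters = set(out_degree) | set(in_degree)
    return {
        c: {
            "in": in_degree[c],
            "out": out_degree[c],
            "degree": in_degree[c] + out_degree[c],
        }
        for c in characters
    }


# Made-up dialogue exchanges between three placeholder characters.
exchanges = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A")]
indices = degree_indices(exchanges)
for name in sorted(indices):
    print(name, indices[name])
```

        Running the same function on successive segments of a script would give one set of indices per segment, which is one way to follow the indices through a movie's progression.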

      • KCI-listed

        Literary Research Using Digital Analysis Tools: Focusing on Laclos's Les liaisons dangereuses (Dangerous Liaisons)

        RYU Sun-Jung, YOU Eun-Soon. The International Promotion Agency of Culture Technology, 2024. The Journal of the Convergence on Culture Technolo Vol.10 No.3

        This study used digital analysis tools to quantitatively analyze the issue of reason and emotion surrounding 'libertinage' in Dangerous Liaisons (Les liaisons dangereuses), an epistolary novel regarded as a masterpiece of 18th-century libertine literature. First, word frequency analysis with Voyant and LIWC 22 confirmed that libertinage is manifested through keywords such as 'love' and 'time'. Second, Voyant's 'Contexts' feature showed that the letters Valmont sent to Madame de Tourvel and those he sent to Madame de Merteuil both have 'love' as their central theme, but that emotional vocabulary is more frequent in the former and strategic vocabulary in the latter. In addition, the most frequently used word in the letters sent by Merteuil was found to be 'time', with a higher frequency than 'love'. Third, using LIWC 22, we measured and analyzed the 'analytical thinking' and 'emotional tone' of the letters exchanged by the main characters, by character and by part of the novel. These results provide meaningful evidence that Dangerous Liaisons is a work that delves into 'emotion', which was rejected during the Enlightenment in 18th-century France, and that, like Rousseau's Julie, or the New Heloise, it anticipates Romanticism.

      • KCI-listed

        A Study on the Correlation between Literature and Transmedia Storytelling

        RYU SunJung, YOU Eun Soon. Korea Humanities Content Society, 2017. Humanities Contents, Vol.0 No.46

        Departing from existing research that has examined the emergence of modern narrative techniques such as transmedia storytelling in terms of the digital environment and media usage, this paper seeks to show that the origin of the technique lies in literature. To this end, the study aims to identify the characteristic elements of modern transmedia storytelling in La Comédie humaine (The Human Comedy), the collection of interlinked novels by Honoré de Balzac that is often called the 'fresco of 19th-century French society'. The following findings confirm that Balzac was a writer who attempted innovative writing well ahead of his time. First, transmedia storytelling is characterized by expanding the story world through the use of multiple media and various transformations. Given the conditions of the 19th century, Balzac could not attempt external expansion across diverse media, but he planned and pursued internal expansion, producing a variety of stories by transforming them. The Human Comedy forms a single world out of some 90 organically interlinked novels, which matches the way transmedia storytelling divides one story across several texts to renew and enrich the whole. Second, the representative narrative technique Balzac used to link independent texts is 'character reappearance'. In this technique, one character appears directly in several works or is mentioned by other characters. It can be seen as a prototype of the 'crossover', a story-expansion strategy often used in Marvel's superhero films, which are cited as a leading success story of transmedia storytelling. Third, The Human Comedy differs from the novels of its time in that its narrative structure is decentralized and open-ended, and it corresponds instead to modern transmedia storytelling. Balzac's nonlinear narrative structure, multiple plots, hints, and foreshadowing stimulated readers' interest and imagination and sustained their attention to his works. Lastly, like transmedia storytelling, Balzac drew active participation, sharing, and creation from his readers. The contradictions, errors, nonlinear structure, and discontinuity observed among Balzac's novels invite readers to explore in their imagination the various possibilities open to the characters and to reconstruct their lives, so that the gaps between individual works are filled with richer meaning. By examining how Balzac's writing technique corresponds to the narrative strategy of transmedia storytelling, this study reaffirms why Balzac is regarded today as a writer who embodied the aesthetics of modernity.

      • KCI-listed

        A Predictive Clustering-Based Collaborative Filtering Technique for Performance Stability of Recommender Systems

        O-Joun Lee, Eun-Soon You. Korea Intelligent Information Systems Society, 2015. Journal of Intelligence and Information Systems, Vol.21 No.1

        With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on recommender systems, which provide information tailored to user preferences and tastes in an attempt to address information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from domain constraints. CF techniques are broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by using a Bayesian model, clustering model or dependency network model. This approach not only alleviates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, which is a type of sparsity issue. In addition, expensive model-building may lead to performance instability, since changes in the domain environment cannot be immediately incorporated into the model because of the high costs involved. Cumulative changes in the domain environment that are not reflected in the model eventually undermine system performance. This study combines a Markov model of transition probabilities and the concept of fuzzy clustering with clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses the issues of reduced coverage and unstable performance. The method reduces performance instability by tracking changes in user preferences and bridging the gap between the static model and dynamic users. 
Furthermore, the issue of reduced coverage is alleviated by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method was validated by testing its robustness against performance instability and the scalability-performance tradeoff. The first test compared the performance of recommender systems based on IBCF, CBCF, ICFEC and PCCF in an environment where data sparsity had been minimized. The second test adjusted the number of clusters in CBCF, ICFEC and PCCF and compared the resulting changes in system performance. The test results revealed that the proposed method produced no significant improvement in performance over the existing techniques. In addition, it failed to achieve a significant improvement in the standard deviation, which indicates the degree of data fluctuation. Nevertheless, it showed a marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after model generation improved by 51.31% in the first test. In the second test, the level of performance fluctuation driven by changes in the number of clusters improved by 36.05%. This signifies that the proposed method, despite only a slight performance improvement, clearly offers better performance stability than the existing techniques. 
Further research on this study will be direct
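
        The two fluctuation measures contrasted in this abstract, range and standard deviation, can be sketched as follows. The score lists and the helper names `fluctuation` and `improvement_pct` are assumptions made for illustration; they are not the paper's data or evaluation code, and the relative-improvement formula is a common convention rather than one confirmed by the abstract.

```python
import statistics


def fluctuation(scores):
    """Return the range and the population standard deviation of the scores."""
    return {
        "range": max(scores) - min(scores),
        "std": statistics.pstdev(scores),
    }


def improvement_pct(baseline, proposed):
    """Relative reduction of a fluctuation measure, as a percentage."""
    return (baseline - proposed) / baseline * 100.0


# Made-up error scores over three runs for a baseline and a proposed method.
baseline_scores = [0.80, 0.90, 1.00]
proposed_scores = [0.85, 0.90, 0.95]

base = fluctuation(baseline_scores)
prop = fluctuation(proposed_scores)
print(round(improvement_pct(base["range"], prop["range"]), 2))
print(round(improvement_pct(base["std"], prop["std"]), 2))
```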

