RISS Academic Research Information Service

      • KCI-indexed

        A Study on Decision Trees for Interval-Valued Symbolic Response Variables Using the Kolmogorov-Smirnov Statistic

        이성건 The Korean Data Analysis Society 2017 Journal of the Korean Data Analysis Society Vol.19 No.4

        Symbolic data arise in many fields, including medicine, industry, the social sciences, and government. Symbolic data analysis is attracting attention as a key approach to the complex and varied data found in big data settings. The main types of symbolic data are interval-valued, multi-valued, and histogram-valued data. Classical data can also be converted into symbolic data, which is sometimes used to reduce the size of a data set. This study proposes a decision tree for an interval-valued response variable based on the Kolmogorov-Smirnov (K-S) statistic. Most existing decision trees for symbolic data assume symbolic independent variables and therefore cannot be applied when the response variable is symbolic. The proposed tree uses the K-S statistic as the split criterion for selecting split variables, and the K-S statistic is computed from the empirical distribution function of interval-valued data (Lee, 2016). The approach can be extended to other types of symbolic data. As an application, a decision tree is built and interpreted for blood pressure data (diastolic, systolic) from Hospital A in Korea and compared with classical methods. The results indicate that the proposed method is useful for interval-valued responses.
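
The split-selection idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not give the definition of the interval empirical distribution function, so the sketch assumes each interval spreads its mass uniformly over [lo, hi]. The function names, the numeric-threshold splits, and the sample data are all hypothetical.

```python
def interval_ecdf(intervals, x):
    """ECDF at x, ASSUMING mass is uniform inside each (lo, hi) interval."""
    total = 0.0
    for lo, hi in intervals:
        if x >= hi:
            total += 1.0
        elif x > lo:
            total += (x - lo) / (hi - lo)
    return total / len(intervals)


def ks_statistic(group_a, group_b):
    """Largest absolute gap between the two interval ECDFs.

    For non-degenerate intervals both ECDFs are piecewise linear with kinks
    only at interval endpoints, so checking the endpoints is sufficient.
    """
    points = [p for lo, hi in list(group_a) + list(group_b) for p in (lo, hi)]
    return max(
        abs(interval_ecdf(group_a, x) - interval_ecdf(group_b, x)) for x in points
    )


def best_split(rows, responses):
    """Return (score, variable, threshold) maximizing the K-S gap between children.

    rows: list of dicts of numeric predictors (hypothetical layout).
    responses: list of (lo, hi) interval responses, one per row.
    """
    best = None
    for name in rows[0]:
        # Drop the largest value so the right child is never empty.
        for threshold in sorted({r[name] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, responses) if r[name] <= threshold]
            right = [y for r, y in zip(rows, responses) if r[name] > threshold]
            score = ks_statistic(left, right)
            if best is None or score > best[0]:
                best = (score, name, threshold)
    return best


# Made-up example: blood-pressure-like intervals (diastolic, systolic).
rows = [
    {"age": 30, "bmi": 22},
    {"age": 35, "bmi": 30},
    {"age": 60, "bmi": 23},
    {"age": 65, "bmi": 29},
]
responses = [(70, 110), (72, 115), (90, 150), (92, 155)]
score, variable, threshold = best_split(rows, responses)
print(variable, threshold, round(score, 3))
```

Growing a full tree would apply `best_split` recursively to each child until a stopping rule is met; the stopping rule is left out because the abstract does not describe one.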

      • KCI-indexed

        A Study on Two Sample Test for Interval-Valued Symbolic Data

        이성건 The Korean Data Analysis Society 2016 Journal of the Korean Data Analysis Society Vol.18 No.6

        Symbolic data arise in many fields, such as the social sciences, medicine, industry, and government. Symbolic data analysis deals with new concepts underlying the given raw data, which matters because of the complex nature of big data. Symbolic data can be multi-valued, interval-valued, or histogram-valued. Classical variables can be transformed into symbolic variables, which reduces the size of the data. This study considers two-sample statistical tests for symbolic data, focusing on interval-valued variables; the approach extends readily to other symbolic data such as histogram-valued variables. The first approach is a Kolmogorov-Smirnov (K-S) test for intervals. To construct the K-S test, we define empirical distributions of intervals and then compare the proposed tests with classical ones. The p-values of the tests are calculated using permutation techniques in R. Blood pressure data are used as an application to investigate the properties of the tests. We find that the proposed method is competitive.
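
A permutation-based two-sample K-S test for intervals might look like the sketch below. It is written in Python rather than the R the paper used, and it assumes a uniform spread of mass within each interval, since the abstract does not state the paper's definition of the empirical distribution. All names and the sample intervals are hypothetical.

```python
import random


def interval_ecdf(intervals, x):
    """ECDF at x, ASSUMING mass is uniform inside each (lo, hi) interval."""
    total = 0.0
    for lo, hi in intervals:
        if x >= hi:
            total += 1.0
        elif x > lo:
            total += (x - lo) / (hi - lo)
    return total / len(intervals)


def ks_statistic(sample_a, sample_b):
    """Maximum absolute ECDF difference, evaluated at all interval endpoints."""
    points = [p for lo, hi in list(sample_a) + list(sample_b) for p in (lo, hi)]
    return max(
        abs(interval_ecdf(sample_a, x) - interval_ecdf(sample_b, x)) for x in points
    )


def permutation_p_value(sample_a, sample_b, n_perm=999, seed=0):
    """Share of random relabelings whose K-S statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up interval samples for two groups.
group_1 = [(60, 100), (62, 105), (65, 110), (58, 98), (63, 104)]
group_2 = [(85, 140), (90, 150), (88, 145), (92, 155), (86, 142)]
p_value = permutation_p_value(group_1, group_2, n_perm=199)
print(ks_statistic(group_1, group_2), p_value)
```

The permutation approach avoids relying on the classical K-S null distribution, which is derived for single-valued observations and need not hold for interval ECDFs.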

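      • KCI-indexed

        A Study on Improving the Forecasting Accuracy of Case-Based Reasoning Using Fuzzy Relations

        이인호(In Ho Lee), 신경식(Kyung-shik Shin) Korea Intelligent Information Systems Society 2010 Journal of Intelligence and Information Systems Vol.16 No.4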

        In business, forecasting means estimating what is expected to happen in the future in order to make managerial decisions and plans. Accurate forecasting is therefore essential to major managerial decisions and underlies many business strategies. However, uncertainty and complexity in the future business environment make it very difficult to produce unbiased and consistent estimates. This is why scientific forecasting models should be used to support business decisions, and why effort should go into minimizing a model's forecasting error, the difference between the observed and the estimated value. Minimizing this error is not easy. Case-based reasoning is a problem-solving method that uses similar past cases to solve the current problem. To build a successful case-based reasoning model, it is important to retrieve not only the most similar case but also the most relevant one, and measuring similarity between cases is the key to doing so. Measuring distances is harder when the cases contain symbolic data. The purpose of this study is to improve the forecasting accuracy of case-based reasoning using fuzzy relation and composition. Two methods are adopted to measure the similarity between cases containing symbolic data: one derives the similarity matrix from binary logic (judging whether two symbolic values are the same), and the other derives it from fuzzy relation and composition. The study proceeds in four stages: data gathering and preprocessing, model building and analysis, validation analysis, and conclusion. First, in the data gathering and preprocessing stage, we collect a data set with a categorical dependent variable. The data set is cross-sectional, and its independent variables include several qualitative variables expressed as symbolic data. The research data consist of financial ratios and the corresponding bond ratings of Korean companies. The ratings cover all bonds rated by one of the bond rating agencies in Korea. The total sample includes 1,816 companies whose commercial papers were rated between 1997 and 2000. Credit grades are defined as outputs and classified into five rating categories (A1, A2, A3, B, C) according to credit level. Second, in the model building and analysis stage, we derive the similarity matrix from binary logic and from fuzzy composition to measure the similarity between cases containing symbolic data. The types of fuzzy composition used are max-min, max-product, and max-average. The analysis is then carried out by case-based reasoning with the derived similarity matrix. Third, in the validation stage, we verify the validity of the model with the McNemar test based on the hit ratio. Finally, we draw conclusions. The similarity measure based on fuzzy relation and composition shows better forecasting performance than the measure based on binary logic. However, the differences in forecasting performance among the types of fuzzy composition are not statistically significant. The contribution of this study is a methodology in which fuzzy relation and fuzzy composition are applied to similarity measurement between symbolic data, the most important factor in building a case-based reasoning model.
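
The three composition types named in this abstract can be illustrated with a short sketch. The abstract does not say how the paper builds its fuzzy relations from the credit-rating data, so the two matrices below are made-up membership values and the function name `compose` is hypothetical.

```python
def compose(r, s, kind="max-min"):
    """Compose fuzzy relations r (X by Y) and s (Y by Z), given as nested lists.

    Each output cell takes the maximum over the shared index of a pairwise
    combination: min (max-min), product (max-product), or mean (max-average).
    """
    combine = {
        "max-min": min,
        "max-product": lambda a, b: a * b,
        "max-average": lambda a, b: (a + b) / 2,
    }[kind]
    return [
        [
            max(combine(r[i][k], s[k][j]) for k in range(len(s)))
            for j in range(len(s[0]))
        ]
        for i in range(len(r))
    ]


# Hypothetical membership degrees in [0, 1].
r = [[0.2, 0.8], [1.0, 0.4]]
s = [[0.5, 0.9], [0.7, 0.3]]

max_min = compose(r, s, "max-min")
max_product = compose(r, s, "max-product")
max_average = compose(r, s, "max-average")
print(max_min)
```

Unlike a binary same-or-different comparison, the composed matrix gives graded similarity between symbolic values, which a case-based reasoning retrieval step can then use as a distance component.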

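      • KCI-indexed

        Symbolic Data Analysis in R for Symbolic Artificial Intelligence

        전성해(Sunghae Jun) Korean Institute of Intelligent Systems 2017 Journal of Korean Institute of Intelligent Systems Vol.27 No.5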

        Computers and humans are clearly different, but they share a similar conceptual structure for storing and processing data. Unlike computers, which process and analyze all of the collected data, humans process data in units of summarized patterns. In other words, humans make their best decisions from summarized information rather than from the whole data. Managing only summarized information makes it possible to build systems that are more efficient in time and cost. In big data environments in particular, processing and analyzing large volumes of data for artificial intelligence learning creates a need for learning from summarized information. This paper studies symbolic data analysis from statistics as a basis for efficiently building symbolic artificial intelligence systems on such summarized information. It introduces a method for symbolic artificial intelligence that uses the symbolic data analysis functions provided by the R data language. Objective machine learning data sets were used to evaluate the performance of the proposed method.

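      • KCI-indexed

        A Symbolic Clustering Algorithm Using Nonsymmetric Similarity and CSV

        박찬웅(Chanwoong Park), 오승준(SeungJoon Oh) Korean Institute of Information Technology 2012 The Journal of Korean Institute of Information Technology Vol.10 No.11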

        Dealing with symbolic data has become quite common in data mining as well as data analysis. Symbolic data analysis deals with variables that can have intervals, histograms, and even functions as values. Much previous work has been based on numeric data only. This paper investigates the problem of clustering symbolic data described by interval attributes and multivalued attributes. We propose a nonsymmetric proximity measure for estimating the degree of similarity between symbolic data. Similarity between two interval attributes is proportional to the length of their common part, and similarity between two multivalued attributes is proportional to the size of their intersection. A method for hierarchical clustering based on the combined similarity value (CSV) is also explored. The efficacy of the proposed algorithm is shown experimentally, and its validity is verified by comparison with related methodologies.
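
One way to read the nonsymmetric similarity described above is sketched below. The abstract only says the similarity is proportional to the common part, so the normalization by the first argument's own length or size is an assumption made here to obtain a nonsymmetric measure, and the function names are hypothetical.

```python
def interval_similarity(a, b):
    """Share of interval a = (lo, hi) that is covered by interval b.

    ASSUMPTION: normalizing by the length of a; the paper may normalize
    differently.
    """
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    length = a[1] - a[0]
    return overlap / length if length > 0 else 0.0


def multivalued_similarity(a, b):
    """Share of the values in a that also appear in b."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


# The measure depends on the direction of comparison.
print(interval_similarity((0, 10), (5, 7)))   # wide interval vs. narrow one
print(interval_similarity((5, 7), (0, 10)))   # narrow interval vs. wide one
print(multivalued_similarity({"red", "blue"}, {"red"}))
print(multivalued_similarity({"red"}, {"red", "blue"}))
```

A narrow interval fully contained in a wide one is completely "explained" by it, while the reverse is not true, which is the kind of asymmetry a containment-style measure captures.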

      • KCI candidate

        Forecasting Symbolic Candle Chart-Valued Time Series

        Park, Heewon,Sakaori, Fumitake The Korean Statistical Society 2014 Communications for statistical applications and me Vol.21 No.6

        This study introduces a new type of symbolic data, the candle chart-valued time series. We aggregate four stock indices (open, close, highest, and lowest) into one data point to summarize a huge amount of data. In other words, we treat a candle chart, constructed from the open, close, highest, and lowest stock indices, as a type of symbolic data over a long period. The proposed candle chart-valued time series effectively summarizes and visualizes a huge data set of stock indices, making changes in the indices easy to understand. We also propose novel approaches to modeling candle chart-valued time series based on a combination of two midpoints and two half ranges: between the highest and lowest indices, and between the open and close indices. Furthermore, we propose three types of sum of squares for estimating the candle chart-valued time series model. The proposed methods take into account information not only from ordinary data but also from the interval of each object, and can thus perform effectively in time series modeling (e.g., forecasting future stock indices). To evaluate the proposed methods, we describe a real data analysis of the stock market indices of five major Asian countries. The results show that the proposed approaches outperform classical data analysis in forecasting future stock indices.
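
The two midpoints and two half ranges mentioned in this abstract can be computed as in the sketch below. The field names are hypothetical, and taking the absolute value for the open-close half range is an assumption; the abstract does not say how the direction of the candle (rise or fall) is encoded.

```python
def candle_features(open_, close, high, low):
    """Summarize one candle as two midpoints and two half ranges."""
    return {
        "high_low_mid": (high + low) / 2,
        "high_low_half_range": (high - low) / 2,
        "open_close_mid": (open_ + close) / 2,
        # ASSUMPTION: unsigned half range; the candle's direction is dropped.
        "open_close_half_range": abs(close - open_) / 2,
    }


# Made-up index values for a single period.
features = candle_features(open_=100, close=104, high=106, low=98)
print(features)
```

Each of the four derived series could then be modeled with an ordinary time series method and recombined into a forecast candle, though the abstract does not specify the paper's model at that level of detail.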

      • KCI candidate

        Exploratory Methods for Joint Distribution Valued Data and Their Application

        Igarashi, Kazuto,Minami, Hiroyuki,Mizuta, Masahiro The Korean Statistical Society 2015 Communications for statistical applications and me Vol.22 No.3

        In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Information technology is increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals and histograms. However, those methods cannot take into account information among variables, such as correlation. In contrast, objects represented as a joint distribution can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extend the two well-known exploratory methods using dissimilarities among joint distribution valued data based on the Hall-type relative projection index. We present a simulation study and a real example of the proposed methods.

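      • KCI-indexed

        A Study on Principal Component Analysis for Interval-Valued Data

        최수진, 강기훈 The Korean Statistical Society 2020 응용통계연구 (The Korean Journal of Applied Statistics) Vol.33 No.1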

        Interval-valued data, one type of symbolic data, are observed in the form of intervals rather than single values. Each interval-valued observation has internal variation. Principal component analysis reduces the dimension of data by maximizing the explained variance. Therefore, principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, a new method is proposed that uses a truncated normal distribution instead of the uniform distribution of the conventional quantile method, because we believe there is more information near the center point of the interval. The methods are compared using simulations and a relevant data set from the OECD. Finally, for the quantile method, we draw a scatter plot of the principal components and identify the position and distribution of the quantiles using the arrow line representation.
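
The difference between uniform and truncated normal quantiles of an interval can be seen in a small sketch. The centering at the interval midpoint follows the abstract's reasoning, but the standard deviation of one sixth of the interval width is an assumption chosen for illustration, as are the function names. The sketch requires lo < hi.

```python
from statistics import NormalDist


def uniform_quantiles(lo, hi, probs):
    """Quantiles of a uniform distribution on [lo, hi]."""
    return [lo + p * (hi - lo) for p in probs]


def truncated_normal_quantiles(lo, hi, probs, sd_fraction=1 / 6):
    """Quantiles of a normal centered at the midpoint, truncated to [lo, hi].

    ASSUMPTION: sigma = sd_fraction * (hi - lo); the paper's choice is not
    stated in the abstract.
    """
    dist = NormalDist(mu=(lo + hi) / 2, sigma=(hi - lo) * sd_fraction)
    f_lo, f_hi = dist.cdf(lo), dist.cdf(hi)
    # Rescale each probability into the mass that lies inside [lo, hi].
    return [dist.inv_cdf(f_lo + p * (f_hi - f_lo)) for p in probs]


probs = [0.25, 0.5, 0.75]
uniform_q = uniform_quantiles(0, 12, probs)
truncated_q = truncated_normal_quantiles(0, 12, probs)
print(uniform_q)
print(truncated_q)
```

With the truncated normal, the quartiles move toward the midpoint, so the points that represent each interval in the subsequent principal component analysis concentrate near its center.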

      • SCOPUS, KCI-indexed

        Transmission Performance of Half-Symbol-Rate-Carrier Offset QPSK Modulation in Band-limited Channels

        Yeo, Hyeop-Goo The Korea Institute of Information and Communication 2009 Journal of information and communication convergen Vol.7 No.2

        This paper examines the bit-error-rate (BER) performance of the recently proposed half-symbol-rate-carrier (HSRC) offset quadrature phase-shift-keying (OQPSK) receiver for high-speed data communication. With a modified demodulation technique that integrates the signal over a bit-time period, the BER performance of the HSRC-OQPSK signal improves by more than 4 dB compared with a demodulation technique that integrates over a symbol-time period. This paper also examines the BER performance of the modified demodulation over various band-limited channels modeled with low-pass filters; three systems with different data rates are simulated and compared with a system using the conventional demodulation technique.

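      • KCI-indexed

        Applying Symbolic Representation to Anomaly Detection and Clustering in Semiconductor Processes

        노웅기(Woong-Kee Loh), 홍상진(Sang Jeen Hong) Korean Institute of Information Scientists and Engineers 2009 정보과학회 컴퓨팅의 실제 논문지 (Journal of KIISE: Computing Practices) Vol.15 No.11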

        Since the invention of the integrated circuit (IC) in the 1950s, semiconductor technology has developed rapidly up to the present day. A complete semiconductor is manufactured through a long and diverse series of processes. To improve semiconductor productivity, fault detection and classification (FDC) has been studied extensively as a way to find faults before the processes are completed. For FDC, various kinds of sensors are attached to semiconductor manufacturing equipment, and sensor values are collected at regular time intervals. The collected sensor values are sequences of real numbers and are therefore a kind of time-series data. In this paper, we propose an algorithm for detecting and clustering faults in semiconductor processes. The proposed algorithm is a modification of an existing anomaly detection algorithm that converts time-series data into a symbolic representation. The contributions of this paper are: (1) showing that a modification of an existing anomaly detection algorithm for general time series can be used for semiconductor process data, and (2) presenting experimental results that improve the accuracy of fault detection and clustering. In our experiments, the proposed algorithm produced neither false positives nor false negatives.
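
Converting a sensor time series into a symbolic representation can be sketched as follows. The abstract does not name the representation the paper uses, so this sketch assumes a SAX-style scheme (z-normalize, average over segments, map to letters with equiprobable normal breakpoints). The function name and the sample series are hypothetical, and the series must have at least as many points as segments.

```python
from statistics import NormalDist, mean, pstdev


def to_symbols(series, n_segments, alphabet="abcd"):
    """Discretize a numeric series into a short string of symbols."""
    mu, sd = mean(series), pstdev(series)
    z = [(v - mu) / sd if sd > 0 else 0.0 for v in series]

    # Piecewise aggregate approximation: mean of each near-equal segment.
    n = len(z)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # Breakpoints that split the standard normal into equiprobable regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # The symbol index is the number of breakpoints the segment mean exceeds.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


# Made-up sensor trace: a low plateau followed by a high plateau.
word = to_symbols([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2)
print(word)
```

Once traces are reduced to short symbol strings, unusual substrings can be flagged as candidate faults and similar strings can be grouped, which is the general idea behind symbolic anomaly detection and clustering.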
