RISS Academic Research Information Service

      • Generalization Threshold Optimization of Fuzzy Rough Set algorithm in Healthcare Data Classification

        Beibei Dong,Yu Liu,Benzhen Guo,Xiao Zhang Security Engineering Research Support Center 2016 International Journal of Database Theory and Appli Vol.9 No.3

        The K-means clustering algorithm classifies poorly when it is applied to cluster analysis of massive data. This paper presents a K-means algorithm whose weights are optimized by a rough set with an optimized generalization threshold. First, using an attribute-order description method, the generalization threshold of the fuzzy rough set is optimized by computing average distances with the Laplace method. Next, the Euclidean distance metric is used to compute similarity in the K-means algorithm, and the coefficient of variation is introduced into the cluster analysis, giving a Euclidean-distance-weighted K-means algorithm driven entirely by the data. Finally, the rough set algorithm with the optimized generalization threshold is combined with K-means clustering and applied to the classification of medical and health data. The proposed algorithm performs better on medical and health data classification.
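The abstract introduces the coefficient of variation into a Euclidean-distance-weighted K-means. The sketch below shows one plausible reading: each feature is weighted in proportion to its coefficient of variation (std / |mean|), with the weights normalized to sum to 1. That normalization, the function names `cv_weights` and `weighted_kmeans`, and the toy data are assumptions for illustration. The fuzzy-rough-set threshold optimization is not modeled.

```python
import numpy as np

def cv_weights(X):
    """Per-feature weights proportional to the coefficient of variation (std / |mean|).

    Assumption: normalizing the CVs to sum to 1 is an illustrative choice,
    not necessarily the paper's formula.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    cv = std / np.maximum(np.abs(mean), 1e-12)  # guard against zero means
    return cv / cv.sum()

def weighted_kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    w = cv_weights(X)
    # initial centroids: k distinct random data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)^2
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, w

# Hypothetical data: feature 0 separates two groups, feature 1 barely varies
X = [[1.0, 10.0], [1.2, 10.5], [0.9, 9.8], [8.0, 10.2], [8.3, 9.9], [7.9, 10.1]]
labels, centers, w = weighted_kmeans(X, k=2)
```

Because feature 0 has a much larger coefficient of variation, it receives most of the weight and dominates the distance, which is the intended effect of a data-driven weighting.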

      • KCI-listed

        Cluster Analysis of Electricity Usage Data Measured by AMI

        안효정,임예지 Korean Statistical Society 2021 The Korean Journal of Applied Statistics Vol.34 No.6

        We cluster the electricity consumption of households in apartment complex A in Seoul, Korea, using the Hierarchical K-means clustering algorithm. The data are recorded by advanced metering infrastructure (AMI), and we focus on hourly electricity consumption during weekday evening peak hours in summer. Hierarchical K-means, which identifies usage patterns while reducing dimensionality, compensates for drawbacks of the conventional K-means algorithm and has recently been applied to large-scale electricity usage data. We apply it to the AMI data and compare the results obtained with various numbers of clusters and levels using several clustering validity indexes. The results show that usage patterns are well identified, and the approach is expected to serve as a major basis for future applications in various fields.
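The abstract compares clusterings through validity indexes without naming them here. As one common example, below is a minimal NumPy sketch of the Davies-Bouldin index (lower is better); the choice of this index and the toy data are assumptions, and the sketch does not reproduce the Hierarchical K-means procedure itself.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d(c_i, c_j).

    Lower values indicate compact, well-separated clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    # s_i: mean distance from the members of cluster i to its centroid
    s = np.array([np.linalg.norm(X[labels == k] - centroids[i], axis=1).mean()
                  for i, k in enumerate(ks)])
    total = 0.0
    for i in range(len(ks)):
        total += max((s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)

X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = davies_bouldin(X, [0, 0, 1, 1])  # compact, well-separated clusters
bad = davies_bouldin(X, [0, 1, 0, 1])   # each cluster spans both groups
```

Computing such an index for each candidate number of clusters (and, for a hierarchical method, each level) gives a common scale on which to compare the resulting partitions.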

      • KCI-listed

        Approximate k values using Repulsive Force without Domain Knowledge in k-means

        ( Jung-jae Kim ),( Minwoo Ryu ),( Si-ho Cha ) Korean Society for Internet Information 2020 KSII Transactions on Internet and Information Syst Vol.14 No.3

        The k-means algorithm is widely used in academia and industry because it is easy to implement and learns quickly even on complex datasets. However, k-means struggles to classify datasets without prior knowledge of the specific domain. In a previous study, we proposed the repulsive k-means (RK-means) algorithm, which improves k-means by using the concept of a repulsive force to delete unnecessary cluster centroids, so that a dataset can be classified without domain knowledge. However, three main problems remained: RK-means includes a cluster repulsive-force offset for clusters confined within other clusters, which can cause cluster locking; we were unable to prove that RK-means converges optimally; and RK-means performed better only with particular normalization terms and weights. This paper therefore proposes the advanced RK-means (ARK-means) algorithm to resolve these problems. We establish an initialization strategy for deploying cluster centroids and define a metric for ARK-means. Finally, we redefine the mass and normalization terms so that they suit general datasets. We demonstrate the feasibility of ARK-means experimentally on the blob and iris datasets, and the results show that ARK-means outperforms k-means, k'-means, and RK-means.

      • KCI-listed

        Document Clustering Using a Multi-Objective Genetic Algorithm

        이정송(Jung Song Lee),박순철(Soon Cheol Park) Korea Society of Industrial Information Systems 2012 Journal of the Korea Industrial Information Systems Research Vol.17 No.2

        This paper proposes a multi-objective genetic algorithm for document clustering, an important task in text mining. A key component of document clustering is the clustering algorithm that groups similar documents in a corpus. Many studies have applied k-means clustering and genetic algorithms to document clustering. However, the performance of k-means clustering depends heavily on the initial cluster centroids, and a genetic algorithm easily becomes trapped in a local optimum depending on its fitness function. To compensate for these drawbacks, we apply a multi-objective genetic algorithm to document clustering and compare and analyze its accuracy against the existing algorithms. In our experiments, the proposed multi-objective genetic algorithm markedly improves accuracy, by about 20% over k-means clustering and about 17% over a conventional genetic algorithm.
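A multi-objective genetic algorithm keeps the candidates that no other candidate beats on every objective. The sketch below shows only that evaluation step for cluster-label chromosomes; the objective pair (within-cluster compactness and centroid separation) is a hypothetical choice, since the abstract does not state the paper's objectives, and selection, crossover, and mutation are omitted.

```python
import numpy as np

def objectives(X, labels, k):
    """Hypothetical objective pair, both minimized:
    (1) compactness: total within-cluster squared distance,
    (2) negated separation: minus the smallest distance between centroids.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if any(not np.any(labels == c) for c in range(k)):
        return (np.inf, np.inf)  # invalid chromosome: an empty cluster
    cents = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    compact = sum(((X[labels == c] - cents[c]) ** 2).sum() for c in range(k))
    sep = min(np.linalg.norm(cents[i] - cents[j])
              for i in range(k) for j in range(i + 1, k))
    return (float(compact), float(-sep))

def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Indices of non-dominated candidates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Toy 2-D points standing in for document vectors; each chromosome is a label list
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
population = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
scores = [objectives(X, chrom, k=2) for chrom in population]
front = pareto_front(scores)
```

Optimizing several objectives at once, rather than a single fitness value, is the property the abstract relies on to reduce the single-objective genetic algorithm's tendency to settle in local optima.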

      • KCI-listed

        An Efficient Clustering Method Based on Multiple Centroid Sets Using MapReduce

        강성민(Sungmin Kang),이석주(Seokjoo Lee),민준기(Jun-ki Min) Korean Institute of Information Scientists and Engineers 2015 KIISE Transactions on Computing Practices Vol.21 No.7

        As data sizes grow, analyzing large-scale data to identify its characteristics has become very important. In this paper, we propose MCSK-Means (Multi-Centroid-Set k-Means), an efficient k-Means-based clustering technique built on MapReduce, a distributed parallel processing framework. A problem with the k-Means algorithm is that clustering accuracy depends heavily on the positions of the k randomly chosen initial centroids. To alleviate this problem, MCSK-Means uses m centroid sets, each consisting of k centroids, which reduces the dependence on randomly generated initial centroids. In addition, we apply agglomerative hierarchical clustering directly to the centroids of the m centroid sets produced by the clustering phase to obtain the final k cluster centroids. We implemented MCSK-Means on the MapReduce framework so that big data can be processed efficiently.
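The two phases described above, m independent k-means runs followed by agglomerative clustering of the resulting m·k centroids down to k, can be sketched on a single machine as follows. This is a minimal illustration, not the authors' MapReduce implementation; the centroid linkage used in the merge phase and all function names are assumptions.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd k-means from k random data points; returns the k centroids."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return centers

def agglomerate(points, k):
    """Agglomerative clustering of points down to k groups.

    Assumption: centroid linkage (the abstract does not name the linkage).
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        means = [np.mean(c, axis=0) for c in clusters]
        _, i, j = min((np.linalg.norm(means[a] - means[b]), a, b)
                      for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]
    return np.array([np.mean(c, axis=0) for c in clusters])

def mcsk_means(X, k, m, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # clustering phase: m independent k-means runs give m centroid sets of size k
    centroid_sets = [kmeans(X, k, rng) for _ in range(m)]
    # merge phase: hierarchical clustering of all m*k centroids into k final centroids
    return agglomerate(list(np.vstack(centroid_sets)), k)

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers = mcsk_means(X, k=2, m=4)
centers = centers[np.argsort(centers[:, 0])]
```

In a MapReduce setting the m k-means runs are the naturally parallel part; the merge phase works on only m·k points and is cheap by comparison.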

      • A Network Intrusion Detection Model Based on K-means Algorithm and Information Entropy

        Gao Meng,Li Dan,Wang Ni-hong,Liu Li-chen Security Engineering Research Support Center 2014 International Journal of Security and Its Applicat Vol.8 No.6

        Many factors influence the clustering performance of the K-means algorithm, and the selection of initial cluster centers is an important one; the traditional method handles this selection with a degree of randomness. To address this, information entropy is introduced into the selection of cluster centers, and a fusion algorithm combining information entropy with K-means is proposed, in which the information entropy value measures the degree of similarity among records and the least similar record is taken as a cluster center. In addition, a network intrusion detection model is built in which the cluster centers change dynamically as the network changes and can be updated in real time as needed. Experimental results show that the improved algorithm outperforms traditional K-means in both detection rate and false alarm rate, and that the network intrusion detection model is feasible.

      • KCI-listed

        A Privacy-Preserving k-Means Clustering Algorithm Using an Encrypted Operation Protocol and Data-Distribution-Based Centroid Selection

        김형진(Hyeong-Jin Kim),장재우(Jae-Woo Chang) Korean Institute of Information Scientists and Engineers 2018 KIISE Transactions on Computing Practices Vol.24 No.10

        Privacy-preserving clustering algorithms for outsourced databases have been actively studied. Rao et al. proposed a k-Means clustering algorithm that protects data by using the Paillier cryptosystem. However, because that algorithm selects its initial center points randomly, its clustering results are irregular. In addition, because it uses a comparison operator based on bit arrays, its computation cost grows sharply in proportion to the size of the array. To solve these problems, we propose an efficient privacy-preserving k-Means clustering algorithm. To this end, we propose a cryptographic operation protocol that compares encrypted data. We also perform efficient clustering by selecting the initial center points based on the overall data distribution. Finally, our performance evaluation shows that the proposed algorithm outperforms the existing algorithm by about 150 to 250% on average.
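The Paillier cryptosystem mentioned above is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what allows coordinate sums for centroid updates to be computed on encrypted values. The toy sketch below demonstrates only that property with deliberately tiny, insecure primes; it is not the paper's comparison protocol, and the helper names are hypothetical. It needs Python 3.9+ for `math.lcm`.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Paillier key pair from two primes. Toy-sized primes: insecure, illustration only."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                                     # standard simple choice of generator
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # mu = L(g^lam mod n^2)^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                    # r must be a unit mod n
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: E(a) * E(b) mod n^2 decrypts to a + b mod n."""
    n, _ = pub
    return c1 * c2 % (n * n)

pub, priv = keygen()
# e.g. summing one coordinate of two points without decrypting them individually
c_sum = add_encrypted(pub, encrypt(pub, 42), encrypt(pub, 58))
print(decrypt(pub, priv, c_sum))  # 100
```

Addition is cheap under Paillier, but comparisons (needed to find the nearest centroid) are not directly supported, which is why comparison protocols such as the one the abstract proposes matter for cost.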

      • K-means Parallelization Algorithm Based on MapReduce

        Shuguang Wang,Chao Jiang Security Engineering Research Support Center 2016 International Journal of Database Theory and Appli Vol.9 No.8

        Spatial cluster analysis is an important technique in spatial data mining, particularly the K-Means spatial clustering method, which can handle spatial objects that have both a geographic location and attributes. However, with the development of the information society, spatial data have grown explosively, and the serial algorithm is computationally inefficient and struggles to process massive spatial data. Targeting spatial data with the dual meaning of location and attributes, this paper designs and implements a parallel K-Means spatial clustering algorithm on Hadoop. Yahoo Weibo user data are used for the clustering analysis, and the clustering results are visualized with Google Maps.
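A parallel K-Means on Hadoop is commonly structured as one MapReduce job per iteration: the map phase assigns each point to its nearest centroid, the shuffle groups points by centroid id, and the reduce phase averages each group. The pure-Python sketch below simulates one such iteration to show the data flow; it does not use the Hadoop API, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def mapper(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    idx = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return idx, (point, 1)

def combine(a, b):
    """Combine partial (coordinate sum, count) pairs; usable as a combiner or in the reducer."""
    (pa, na), (pb, nb) = a, b
    return tuple(x + y for x, y in zip(pa, pb)), na + nb

def reducer(partial):
    """Reduce: turn a (coordinate sum, count) pair into the new centroid."""
    total, count = partial
    return tuple(x / count for x in total)

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)
    for p in points:                       # map phase
        key, value = mapper(p, centroids)
        groups[key].append(value)          # shuffle: group by centroid id
    new = list(centroids)                  # centroids with no points keep their position
    for key, values in groups.items():     # reduce phase
        acc = values[0]
        for v in values[1:]:
            acc = combine(acc, v)
        new[key] = reducer(acc)
    return new

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(kmeans_iteration(points, [(1, 1), (9, 1)]))  # [(0.0, 1.0), (10.0, 1.0)]
```

Emitting (sum, count) pairs rather than raw points lets a combiner pre-aggregate on each mapper node, which reduces shuffle traffic; a driver would repeat the job until the centroids stop changing.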

      • KCI-listed

        A Study on Cluster Classification of Row Houses and Multiplex Housing in Seoul Using the K-Means Clustering Algorithm and a Hedonic Model

        권순재(Soonjae Kwon),김성현(Seonghyeon Kim),탁온식(Onsik Tak),정현희(Hyeonhee Jeong) Korea Intelligent Information Systems Society 2017 Journal of Intelligence and Information Systems Vol.23 No.3

        Recently, transactions of row houses and multiplex housing have become active, centered on downtown areas, and platform services such as Zigbang and Dabang are growing. Yet row houses and multiplex housing remain a blind spot for real estate information, which has become a social problem because of changes in market size and information asymmetry driven by shifting demand. Moreover, the 5 or 25 districts used by the Seoul Metropolitan Government or the Korean Appraisal Board (hereafter, KAB) were drawn within administrative boundaries and have been used in existing real estate studies. Because these districts are zoned for urban planning, they are not a district classification designed for real estate research. Building on existing studies, this study found that Seoul's spatial structure needs to be redefined for estimating future housing prices. This study therefore attempted to classify areas without spatial heterogeneity by reflecting the price characteristics of row houses and multiplex housing. Simply dividing the city by existing administrative districts has led to inefficiency, so this study aims to cluster Seoul into new areas for more efficient real estate analysis. A hedonic model was applied to real transaction price data for row houses and multiplex housing, and the K-Means clustering algorithm was used to cluster Seoul's spatial structure. The study used real transaction price data for row houses and multiplex housing in Seoul from January 2014 to December 2016 and the 2016 official land values, both provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Data preprocessing consisted of the following steps: removal of underground transactions, standardization of price per unit area, and removal of transaction cases above 5 or below -5.
Preprocessing reduced the data from 132,707 to 126,759 cases. The analysis was performed in R. After preprocessing, a data model was constructed: first, K-means clustering was performed; then a regression analysis was conducted using the hedonic model, followed by a cosine similarity analysis. Based on the constructed data model, we clustered Seoul by longitude and latitude and compared the result with the existing areas. The goodness of fit of the model was above 75%, and the variables used in the hedonic model were significant. In short, the 5 or 25 districts of the existing administrative areas were reorganized into 16 districts. This study thus derived a clustering method for row houses and multiplex housing in Seoul that uses the K-Means clustering algorithm and a hedonic model to reflect property price characteristics. The authors also present academic and practical implications, the limitations of this study, and directions for future research. One academic implication is that the clustering reflects property price characteristics, improving on the areas used by the Seoul Metropolitan Government, KAB, and existing real estate research. Another is that, whereas existing real estate research has mainly studied apartments, this study proposes a method for classifying areas in Seoul using public information from Government 3.0 (i.e., MOLIT real transaction data). One practical implication is that the results can serve as basic data for real estate research on row houses and multiplex housing. Others are that the study is expected to stimulate research on row houses and multiplex housing and to improve the accuracy of real transaction price models.
The future research direction of this study involves conducting various analyses to overcome the limitations of the threshold and points to the need for deeper research
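Two steps in this abstract lend themselves to a short sketch: the preprocessing rule (standardize price per unit area, then drop cases beyond ±5) and the hedonic regression. The authors worked in R; the NumPy sketch below is only an illustration under stated assumptions. It reads "above 5 and below -5" as a cutoff on z-scores of price per area, uses a log-price OLS as the hedonic form, and uses hypothetical data and attribute names.

```python
import numpy as np

def preprocess(price, area, z_limit=5.0):
    """Price per unit area, standardized to z-scores; flag rows with |z| > z_limit.

    Assumption: this is one reading of the '(above 5 and below -5)' removal rule.
    """
    ppa = np.asarray(price, dtype=float) / np.asarray(area, dtype=float)
    z = (ppa - ppa.mean()) / ppa.std()
    return np.abs(z) <= z_limit, z

def hedonic_ols(log_price, features):
    """Hedonic regression: OLS of log price on housing attributes.

    Returns (coefficients with intercept first, R^2).
    """
    y = np.asarray(log_price, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(features, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return beta, r2

# Hypothetical data: 49 ordinary transactions plus one extreme outlier
price = np.ones(50)
price[0] = 1000.0
keep, z = preprocess(price, np.ones(50))

# Hypothetical attributes (e.g. floor area, building age) with an exactly linear log price
feats = np.array([[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]], dtype=float)
beta, r2 = hedonic_ols(1 + 2 * feats[:, 0] - 0.5 * feats[:, 1], feats)
```

The R² returned here is the usual goodness-of-fit measure; the abstract's "above 75%" figure is presumably of this kind, though the paper's exact specification is not given.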

      • KCI-listed

        Performance Improvement of Clustering Method Based on Random Swap Algorithm

        Sunjin Yu,Changyong Yoon Korean Institute of Intelligent Systems 2019 INTERNATIONAL JOURNAL of FUZZY LOGIC and INTELLIGE Vol.19 No.2

        This paper proposes a method for improving the performance of clustering algorithms. Among unsupervised learning methods such as clustering, K-means is widely used and simple to implement, but it is heavily influenced by the initial centroids and may get stuck in a local minimum. To mitigate these shortcomings, this work uses a clustering method with a random swap algorithm based on K-means++, which selects center points with a probability based on the distance between the data and the existing center points. The swap-based method exchanges existing centroids with the concentrated centroids during clustering and then performs re-partitioning and fine-tuning with K-means++. Experimental comparisons with other clustering methods under various conditions show that the proposed swap-based K-means++ method performs better. We further demonstrate its advantage on general-purpose vehicle datasets produced to measure performance in various experimental environments.
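Random swap clustering, as described above, alternates a global move (replace one centroid with a randomly chosen data point) with local refinement, keeping the swap only if the sum of squared errors (SSE) improves. The minimal sketch below seeds with K-means++ and fine-tunes with a few Lloyd iterations. The swap count, iteration count, function names, and data are assumptions, and the paper's exact swap and fine-tuning rules may differ.

```python
import numpy as np

def sse(X, C):
    """Sum of squared distances from each point to its nearest centroid."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()

def kmeanspp_init(X, k, rng):
    """K-means++ seeding: next center drawn with probability proportional to D(x)^2."""
    C = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(C)[None]) ** 2).sum(axis=2).min(axis=1)
        C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(C, dtype=float)

def lloyd(X, C, n_iter=5):
    """A few Lloyd iterations: re-partition, then move centroids to cluster means."""
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(len(C))])
    return C

def random_swap(X, k, n_swaps=50, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    C = lloyd(X, kmeanspp_init(X, k, rng))
    best = sse(X, C)
    for _ in range(n_swaps):
        trial = C.copy()
        trial[rng.integers(k)] = X[rng.integers(len(X))]  # swap a centroid for a data point
        trial = lloyd(X, trial)                           # re-partition and fine-tune
        cost = sse(X, trial)
        if cost < best:                                   # accept only if SSE improves
            C, best = trial, cost
    return C, best

# Hypothetical data: three small, well-separated blobs
X = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0],
              [0, 10], [0, 11], [1, 10]], dtype=float)
C, cost = random_swap(X, k=3)
```

Because a swap is accepted only when SSE decreases, the result is never worse than the initial K-means++ solution, and the random global moves give the search a way out of the local minima that plain K-means can get stuck in.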
