RISS (Research Information Sharing Service)

      • KCI-indexed

        A Study on Oversampling of Response/Non-response Data

        황산하,진서훈,최종후 한국자료분석학회 2013 Journal of the Korean Data Analysis Society Vol.15 No.4

        In binary classification, the target ratio (responders to non-responders) of the data mart used for model building affects model performance. When one category of the target variable is relatively rare, it is desirable to balance the category frequencies when forming the data mart; this is called oversampling. This experimental study varies the oversampling ratio from 1:1 to 1:20 and searches for the best model at each cut-off value. In particular, two ensemble methods, AdaBoost (a boosting method) and random forests, are compared with the traditional classifiers decision tree and logistic regression. Suitable cut-off values and target ratios are identified by cross-validation. At a cut-off of 0.10, logistic regression gave the best results for the highly imbalanced ratios 1:20 and 1:16, and AdaBoost gave the best results for the other ratios. At a cut-off of 0.20, AdaBoost performed better when the ratio was close to balanced, and the decision tree performed better when the ratio was strongly imbalanced. For cut-offs from 0.3 to 0.5, logistic regression and random forests performed well, while the decision tree performed relatively poorly.
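The ratio-and-cut-off setup described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the data mart is formed by randomly duplicating target rows, and the names `oversample_to_ratio` and `classify` are hypothetical.

```python
import random

def oversample_to_ratio(rows, labels, nontarget_per_target, seed=0):
    """Duplicate target rows (label 1) at random until the
    target : non-target ratio is about 1 : nontarget_per_target.
    Assumption: oversampling is done by random duplication."""
    rng = random.Random(seed)
    target = [r for r, y in zip(rows, labels) if y == 1]
    nontarget = [r for r, y in zip(rows, labels) if y == 0]
    if not target:
        raise ValueError("need at least one target row to oversample")
    # Number of target rows required for the requested ratio (rounded up).
    wanted = -(-len(nontarget) // nontarget_per_target)
    extra = [rng.choice(target) for _ in range(max(0, wanted - len(target)))]
    new_rows = nontarget + target + extra
    new_labels = [0] * len(nontarget) + [1] * (len(target) + len(extra))
    return new_rows, new_labels

def classify(scores, cutoff):
    """Label a case as target when its predicted probability reaches the cut-off."""
    return [1 if s >= cutoff else 0 for s in scores]
```

A model would then be fitted on each resampled data mart and its predicted probabilities thresholded with `classify` at cut-offs such as 0.10, 0.20, and so on, with cross-validation used to compare the combinations.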

      • KCI-indexed

        Design of an A/D Converter with a $\Sigma$-$\Delta$ Modulator Architecture

        윤정식,정정화 한국통신학회 2003 韓國通信學會論文誌 Vol.28 No.1C

        This paper proposes a sigma-delta modulator architecture with a data rate of 2 Ms/s and 12-bit resolution. Thanks to its two characteristics, oversampling and noise shaping, a sigma-delta modulator can be combined with a low-resolution A/D converter to realize a high-resolution A/D converter, and it has therefore been widely used in audio applications, which call for high resolution at low data rates. To use sigma-delta modulators in other applications such as wireless data communication, however, architectures with higher data rates are needed. The proposed architecture lowers the oversampling ratio from the conventional 64 to 256 down to 16, so that it operates at data rates above 1 Ms/s instead of the conventional tens to hundreds of Ks/s. Two second-order sigma-delta modulators are connected in a cascade and their gains are optimized, which gives results similar to those of a fourth-order sigma-delta modulator. Internally, 1-bit A/D and D/A converters are used, so no additional calibration circuit is required. Experimental results show that the proposed architecture achieves 12-bit resolution at a 2 Ms/s data rate.
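The trade-off between oversampling ratio, modulator order, and resolution in this abstract can be illustrated with a small calculation. This is a sketch under stated assumptions: it uses the idealized textbook peak-SQNR estimate for an L-th order noise-shaping modulator, treats the 2 Ms/s data rate as the Nyquist-rate output, and ignores circuit non-idealities. The function names are hypothetical and the numbers are not results from the paper.

```python
import math

def sampling_rate(output_rate_hz, osr):
    """Modulator sampling rate, assuming the output data rate is the Nyquist rate."""
    return output_rate_hz * osr

def ideal_peak_sqnr_db(order, osr, quantizer_bits=1):
    """Idealized peak SQNR (dB) of an order-L noise-shaping modulator:
    6.02*N + 1.76 - 10*log10(pi^(2L) / (2L + 1)) + (20L + 10)*log10(OSR)."""
    shaping_loss = 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
    osr_gain = (20 * order + 10) * math.log10(osr)
    return 6.02 * quantizer_bits + 1.76 - shaping_loss + osr_gain

def effective_bits(sqnr_db):
    """Convert an SQNR in dB to an effective number of bits."""
    return (sqnr_db - 1.76) / 6.02

if __name__ == "__main__":
    print(sampling_rate(2e6, 16))  # sampling rate in Hz for OSR = 16
    for order in (2, 4):
        sqnr = ideal_peak_sqnr_db(order, osr=16)
        print(order, round(sqnr, 1), round(effective_bits(sqnr), 1))
```

Under these idealized assumptions, a single second-order loop with a 1-bit quantizer at an oversampling ratio of 16 stays below 12 effective bits, while fourth-order shaping exceeds it, which is in line with the motivation for cascading two second-order stages.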

      • KCI-indexed

        Mixture copulas with discrete margins and their application to imbalanced data

        Liu Yujian,Xie Dejun,Edwards David A.,Yu Siyi 한국통계학회 2023 Journal of the Korean Statistical Society Vol.52 No.4

        This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins, and applies the resulting models to class imbalance problems in data science through oversampling. The methodology makes it possible to learn and sample from data sets in which discrete and continuous features exist simultaneously. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of features, and classic random oversampling simply replicates existing points, which gives the classifier no new information and easily leads to overfitting. Copula methods generate new points that follow the correlation structure learned from the training set, which reduces overfitting. After introducing the methodology, the article reports experiments with synthetic and real data. The outcomes show the validity of the approach compared with benchmark methods.
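The two limitations this abstract attributes to the benchmark methods can be seen in a toy sketch. It is a simplification under stated assumptions: `smote_like` interpolates between two randomly chosen minority points rather than between k-nearest neighbours as SMOTE does, and neither function is the copula method of the article. All names are hypothetical.

```python
import random

def random_oversample(minority, n_new, rng):
    """Classic random oversampling: re-draw existing minority points."""
    return [rng.choice(minority) for _ in range(n_new)]

def smote_like(minority, n_new, rng):
    """SMOTE-style interpolation between two minority points.
    Simplification: the partner is random, not a k-nearest neighbour."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

if __name__ == "__main__":
    rng = random.Random(0)
    # First feature is a discrete count, second is continuous.
    minority = [(0, 1.0), (3, 2.0), (5, 4.0)]
    print(random_oversample(minority, 4, rng))  # only copies of existing points
    print(smote_like(minority, 4, rng))         # counts become fractional
```

The duplicated points add no new information, and the interpolated points turn the discrete count into fractional values, which is the gap a sampler that models discrete margins is meant to close.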

      • SCIE, SCOPUS, KCI-indexed

        A Single-ended Simultaneous Bidirectional Transceiver in 65-nm CMOS Technology

        Jeon, Min-Ki,Yoo, Changsik The Institute of Electronics and Information Engin 2016 Journal of semiconductor technology and science Vol.16 No.6

        A simultaneous bidirectional transceiver over a single wire has been developed in a 65 nm CMOS technology for a command and control bus. The echo signals of the simultaneous bidirectional link are cancelled by controlling the decision level of receiver comparators without power-hungry operational amplifier (op-amp) based circuits. With the clock information embedded in the rising edges of the signals sent from the source side to the sink side, the data is recovered by an open-loop digital circuit with 20 times blind oversampling. The data rate of the simultaneous bidirectional transceiver in each direction is 75 Mbps and therefore the overall signaling bandwidth is 150 Mbps. The measured energy efficiency of the transceiver is 56.7 pJ/b and the bit-error-rate (BER) is less than $10^{-12}$ with $2^7-1$ pseudo-random binary sequence (PRBS) pattern for both signaling directions.

      • A Study on an Oversampling-Based Anomaly Detection Method for Blockchain

        고자영(Ja-young Go),배석주(S. J. Bae) 대한산업공학회 2019 대한산업공학회지 Vol.45 No.6

        Although blockchains are regarded as reliable, anomalies in blockchain networks are increasing. Recent crime reports show that bitcoins can be used in illegal transactions such as drug trafficking, money laundering, and fraud. It is therefore crucial to detect illegal transactions early to secure the credibility of the blockchain network. After building a database, we extracted features from both users and their transactions. Because transaction data have a network structure, these features are extracted using network analysis. Owing to the imbalance of the transaction data, borderline SMOTE is used as the oversampling method. Finally, support vector machine (SVM), random forest (RF), XGBoost, and logistic regression are compared to evaluate their performance. Applying the proposed method to a real data set of bitcoin transactions, we find that XGBoost shows the best performance in detecting anomalous transactions. The proposed oversampling-based methods show potential for detecting anomalous transactions earlier.

      • KCI-indexed

        A Single-ended Simultaneous Bidirectional Transceiver in 65-nm CMOS Technology

        Min-Ki Jeon,Changsik Yoo 대한전자공학회 2016 Journal of semiconductor technology and science Vol.16 No.6

        A simultaneous bidirectional transceiver over a single wire has been developed in a 65 nm CMOS technology for a command and control bus. The echo signals of the simultaneous bidirectional link are cancelled by controlling the decision level of receiver comparators without power-hungry operational amplifier (op-amp) based circuits. With the clock information embedded in the rising edges of the signals sent from the source side to the sink side, the data is recovered by an open-loop digital circuit with 20 times blind oversampling. The data rate of the simultaneous bidirectional transceiver in each direction is 75 Mbps and therefore the overall signaling bandwidth is 150 Mbps. The measured energy efficiency of the transceiver is 56.7 pJ/b and the bit-error-rate (BER) is less than $10^{-12}$ with $2^7-1$ pseudo-random binary sequence (PRBS) pattern for both signaling directions.

      • A study on the characteristics of applying oversampling algorithms to Fosberg Fire-Weather Index (FFWI) data

        Hyung-Koo Yoon,Sang Yeob Kim,Dongsoo Lee,Jung-Doung Yu 국제구조공학회 2024 Smart Structures and Systems, An International Jou Vol.34 No.1

        Oversampling algorithms are methods employed in the field of machine learning to address the constraints associated with data quantity. This study aimed to explore the variations in reliability as data volume is progressively increased through the use of oversampling algorithms. For this purpose, the synthetic minority oversampling technique (SMOTE) and the borderline synthetic minority oversampling technique (BSMOTE) are chosen. The data inputs, which included air temperature, humidity, and wind speed, are parameters used in the Fosberg Fire-Weather Index (FFWI). Starting with a base of 52 entries, new data sets are generated by incrementally increasing the data volume by 10% up to a total increase of 100%. This augmented data is then utilized to predict FFWI using a deep neural network. The coefficient of determination ($R^2$) is calculated for predictions made with both the original and the augmented datasets. The results suggest that increasing data volume by more than 50% of the original dataset quantity yields more reliable outcomes. This study introduces a methodology to alleviate the challenge of establishing a standard for data augmentation when employing oversampling algorithms, as well as a means to assess reliability.
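The augmentation schedule and the reliability metric mentioned in this abstract can be sketched as below. This is an illustrative sketch only: rounding each data-set size to the nearest whole entry is an assumption, and the function names are hypothetical.

```python
def augmented_sizes(base=52, step_pct=10, max_pct=100):
    """Data-set sizes after growing the base set in 10% steps up to +100%.
    Assumption: each size is rounded to the nearest whole entry."""
    return [base + round(base * pct / 100)
            for pct in range(step_pct, max_pct + 1, step_pct)]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    print(augmented_sizes())  # ten sizes, ending at twice the base
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

In a study of this kind, `r_squared` would be evaluated once for the model trained on the original entries and once for each augmented size, so the values can be compared across the schedule.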

      • Valid oversampling schemes to handle imbalance

        Kim, Young-geun,Kwon, Yongchan,Paik, Myunghee Cho Elsevier 2019 Pattern recognition letters Vol.125 No.-

        Abstract

        An imbalance is one of the problems in machine learning. When data are not balanced, the correct specification rate for the minor class suffers even if accuracy is high. The oversampling method has been used to address the issue without consideration about the sacrifice of accuracy. In addition, an arbitrary oversampling scheme may introduce bias. In this paper, we propose principled methods of handling imbalance under user-specified constraints on the sensitivity and specificity. Our work consists of three elements of contributions. First, we provide an optimized target proportion that minimizes the maximum error rate under user-specified constraints on sensitivity and specificity. Second, we introduce the notion of resampling at random (RAR) under which the limit of the estimator is not altered from the original sample. These two elements are relevant to any classification methods. Third, we derive asymptotic properties of selected classifiers when we apply RAR oversampling with the target proportion. Finally, we implement the proposed method in an image recognition context using the extracted feature from the last layer of deep convolutional neural networks (CNNs). We present an analysis of fundus data to classify diabetic retinopathy using the proposed method.

        Highlights

          • We provide an optimized oversampling target proportion.
          • We identify valid schemes which do not alter the limit of the estimator.
          • We derive the asymptotic properties of classifiers when the valid scheme is applied.
          • We implement the proposed method in an image recognition context.
          • We present an analysis of fundus data to classify diabetic retinopathy.
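The abstract's point that the minor class can suffer even when accuracy is high is easy to check numerically. The sketch below uses made-up counts and a hypothetical helper, `confusion_rates`; it assumes both classes are present and does not implement the paper's RAR scheme or its optimized target proportion.

```python
def confusion_rates(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = minor class).
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

if __name__ == "__main__":
    # Illustrative counts: 10 minor-class cases among 1000, all predicted as major.
    y_true = [1] * 10 + [0] * 990
    y_pred = [0] * 1000
    print(confusion_rates(y_true, y_pred))
```

A classifier that always predicts the major class scores well on accuracy and specificity here while its sensitivity is zero, which is why constraints are placed on sensitivity and specificity rather than on accuracy alone.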

      • KCI-indexed

        A Comparison of Oversampling Methods on Imbalanced Topic Classification of Korean News Articles

        Yirey Suh,Jaemyung Yu,Jonghoon Mo,Leegu Song,김청택 서울대학교 인지과학연구소 2017 Journal of Cognitive Science Vol.18 No.4

        Machine learning has progressed to match human performance in many fields, including text classification. However, when training data are imbalanced, classifiers do not perform well. Oversampling is one way to overcome the problem of imbalanced data, and many oversampling methods can be conveniently implemented. While comparative studies of oversampling methods on non-text data have been conducted, studies comparing oversampling methods under a unifying framework on text data are scarce. This study finds that while oversampling methods generally improve the performance of classifiers, similarity is an important factor that influences the performance of classifiers on imbalanced and resampled data.

      • KCI-indexed

        Performance Comparison of GRU Model-Based Anomaly Detection Applying Various Data Preprocessing Techniques and Data Oversampling

        유승태 (Yoo, Seung-Tae), 김강석 (Kim, Kangseok) 한국정보보호학회 2022 정보보호학회논문지 Vol.32 No.2

        Following the recent shift in the cybersecurity paradigm, research on anomaly detection using machine learning and deep learning techniques is increasing. This study compares data preprocessing techniques that can improve the anomaly detection performance of a GRU (Gated Recurrent Unit) neural network-based intrusion detection model on NGIDS-DS (Next Generation IDS Dataset), an open dataset. In addition, to address the class imbalance between normal data and attack data, detection performance was compared across oversampling ratios using an oversampling technique based on DCGAN (Deep Convolutional Generative Adversarial Networks). In the experiments, preprocessing the system call and process execution path features with the Doc2Vec algorithm gave good performance, and oversampling with DCGAN improved detection performance.
