RISS Search - Domestic Academic Journal Articles

1
A Comparative Study of Deep Learning-Based Recommender System Models Using Autoencoders

Hyo Jin Lee, Yoonsuh Jung. The Korean Statistical Society, 2021. The Korean Journal of Applied Statistics, Vol.34 No.3
Recommender systems use customer data to suggest personalized products. They fall into three broad types: collaborative filtering, content-based filtering, and hybrid methods that combine the two. This study introduces autoencoder-based recommender systems built on deep learning and compares several such models. An autoencoder is an unsupervised deep learning model that can effectively handle sparsity, that is, a data matrix with many zeros. Five autoencoder-based models are compared on three real data sets. The first three models are collaborative filtering methods and the other two are hybrid methods. The data sets consist of customer ratings with integer values from one to five; because most ratings are missing (non-responses), the data matrices contain many zeros and have a high sparsity ratio.
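To make the sparsity idea in this abstract concrete, here is a minimal sketch of a sparsity ratio and of a reconstruction error restricted to observed ratings. It assumes that 0 encodes a missing rating, as the abstract suggests. Scoring only the observed entries is a common convention for rating data, not something the abstract confirms the authors do. The function names and the toy matrices are hypothetical.

```python
def sparsity_ratio(ratings):
    """Fraction of entries equal to 0, treated here as 'no rating' (assumption)."""
    total = sum(len(row) for row in ratings)
    missing = sum(1 for row in ratings for r in row if r == 0)
    return missing / total


def masked_mse(ratings, reconstruction):
    """Mean squared error over observed (non-zero) ratings only."""
    sq_errors = [
        (r - p) ** 2
        for row, pred_row in zip(ratings, reconstruction)
        for r, p in zip(row, pred_row)
        if r != 0  # skip missing ratings so they do not pull predictions toward 0
    ]
    return sum(sq_errors) / len(sq_errors)


# Hypothetical 3 users x 4 items rating matrix; 0 marks a missing rating.
toy_ratings = [
    [5, 0, 0, 1],
    [0, 3, 0, 0],
    [4, 0, 0, 2],
]
# Hypothetical output of some autoencoder for the same users and items.
toy_reconstruction = [
    [4.5, 2.0, 3.0, 1.5],
    [3.0, 3.0, 2.5, 2.0],
    [4.0, 2.5, 3.0, 2.5],
]

print(sparsity_ratio(toy_ratings))
print(masked_mse(toy_ratings, toy_reconstruction))
```

In this toy matrix 7 of the 12 entries are missing, so the sparsity ratio is about 0.58. Real rating data sets are typically far sparser.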
2
A Comparative Study of Demand Forecasting Models for Seoul's Public Bicycles

Soah Min, Yoonsuh Jung. Korean Data and Information Science Society, 2021. Journal of the Korean Data and Information Science Society, Vol.32 No.3
Policies promoting public bicycle use have multiplied recently as a response to environmental and transportation problems, and usage has increased. This paper compares models for forecasting public bicycle demand based on the daily rental history of public bicycles provided by the city of Seoul. The models compared are the vector autoregressive (VAR) model, which extends the univariate autoregressive model to the multivariate case and assumes correlation among the station-level time series; the support vector regression (SVR) model, a machine learning model that does not require an independence assumption; and the LSTM model, a deep learning technique specialized for dependent data. In addition, a time series clustering technique that groups data according to a dissimilarity measure is used with the SVR and VAR models. Forecast accuracy is compared using RMSE and MAE. The LSTM model has the best predictive power, followed by SVR; the VAR model shows the lowest predictive power of the models compared.
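The two accuracy measures named in this abstract, RMSE and MAE, can be sketched as follows. These are the standard definitions; the daily rental counts and forecasts are made-up numbers for illustration, not data from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in rental counts."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical daily rental counts at one station and a model's forecasts.
toy_actual = [120, 95, 143, 110]
toy_forecast = [117, 99, 143, 115]

print(rmse(toy_actual, toy_forecast))
print(mae(toy_actual, toy_forecast))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same forecasts. A large gap between the two points to a few big misses rather than uniformly small ones.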
3
Efficient Information-Based Criteria for Quantile Regression Model Selection under Heteroscedasticity

Wooyoung Shin, Yoonsuh Jung. Korean Data and Information Science Society, 2021. Journal of the Korean Data and Information Science Society, Vol.32 No.5
We propose a class of methods for model selection (or tuning parameter selection) in quantile regression when the errors are heteroscedastic. The check loss function is commonly used in quantile regression for both model fitting and model selection. Because our interest is in selection, we always use the check loss function for model fitting. Information-based criteria are widely used for model selection, but existing criteria do not account for heteroscedastic errors, which limits their efficiency. To address this issue, we suggest assigning different weights to the observations in the information-based criteria. Specifically, we estimate the variation in the response variable using the interquartile range (IQR), which is then used to produce a weight for each observation. As a result, observations in high-variation regions carry less relative importance and have less influence on variable selection or tuning parameter selection. The form of the proposed method depends on whether the model is linear or nonlinear. Its effectiveness in handling heteroscedasticity is demonstrated with simulated data and two real data sets.
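The check loss mentioned in this abstract has the standard form rho_tau(u) = u * (tau - 1[u < 0]). The sketch below shows that loss together with a generic weighted average of it. The abstract does not say how the IQR estimates are turned into weights, so the weights are passed in as plain numbers; `weighted_check_loss` and the toy values are hypothetical and should not be read as the authors' criterion.

```python
def check_loss(residual, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def weighted_check_loss(actual, fitted, tau, weights):
    """Weighted average check loss; smaller weights shrink an observation's influence."""
    total = sum(
        w * check_loss(y - f, tau)
        for y, f, w in zip(actual, fitted, weights)
    )
    return total / sum(weights)


# For tau = 0.25, under-prediction (positive residual) costs less than over-prediction.
print(check_loss(2.0, 0.25))
print(check_loss(-2.0, 0.25))

# Hypothetical responses, fitted medians (tau = 0.5), and illustrative weights.
toy_actual = [1.0, 2.0, 3.0]
toy_fitted = [2.0, 2.0, 1.0]
toy_weights = [1.0, 1.0, 2.0]
print(weighted_check_loss(toy_actual, toy_fitted, 0.5, toy_weights))
```

Under the abstract's description, observations in high-variation regions would receive smaller weights, which lowers their contribution to a sum of this kind.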
