Taeksoo Shin (신택수), Taeho Hong (홍태호) Korea Intelligent Information Systems Society 2011 Journal of Intelligence and Information Systems Vol.17 No.3
Support vector machines (SVMs) have recently been recognized as competitive with other data mining techniques for pattern recognition and classification problems. In particular, many studies have found them more powerful than traditional artificial neural networks (ANNs) (Amendolia et al., 2003; Huang et al., 2004; Huang et al., 2005; Tay and Cao, 2001; Min and Lee, 2005; Shin et al., 2005; Kim, 2003). Classification decisions, whether binary or multi-class, are highly cost-sensitive in financial problems such as credit rating: a misclassified credit rating can cause severe economic loss for investors and financial decision makers. It is therefore necessary to convert the classifier's outputs into well-calibrated posterior probabilities, so that multi-class credit ratings can be assigned according to bankruptcy probabilities. SVMs, however, do not natively provide such probabilities, so a separate method is required to produce them (Platt, 1999; Drish, 2001). This paper applies AdaBoost algorithm-based SVMs to bankruptcy prediction, a binary classification problem, for IT companies in Korea. It then assigns multi-class credit ratings to the companies by shaping the posterior bankruptcy probabilities, derived from the loss functions extracted from the SVMs, into a normal distribution. The proposed approach also shows that misclassification can be minimized by adjusting the interval ranges of the credit grades, given that each credit grade for loan borrowers carries its own credit risk, i.e., bankruptcy probability.
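The conversion from SVM outputs to graded bankruptcy probabilities can be illustrated with a minimal sketch. It is not the paper's procedure: it substitutes a plain sigmoid fit in the spirit of Platt (1999), trained by gradient descent without Platt's regularized targets, and omits the AdaBoost and normal-distribution steps. The function names, the grade labels, and the cutoff values are hypothetical.

```python
import bisect
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit P(bankrupt | f) = 1 / (1 + exp(A*f + B)) by gradient descent on log loss.

    scores: SVM decision values f (assumed higher = more likely bankrupt).
    labels: 1 for bankrupt, 0 for non-bankrupt.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # For this parameterization, d(logloss)/d(A*f + B) = y - p.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def bankruptcy_probability(score, a, b):
    """Map one SVM decision value to a posterior bankruptcy probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def credit_grade(prob, cutoffs=(0.2, 0.5, 0.8), grades=("A", "B", "C", "D")):
    """Assign a grade from probability intervals (cutoffs are illustrative only)."""
    return grades[bisect.bisect_right(cutoffs, prob)]


# Toy decision values and labels, invented for illustration.
toy_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
A, B = fit_sigmoid(toy_scores, toy_labels)
toy_grades = [credit_grade(bankruptcy_probability(s, A, B)) for s in toy_scores]
```

Moving the values in `cutoffs` widens or narrows each grade's probability interval, which is the kind of adjustment the abstract describes for reducing misclassification.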
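Tserendulam Dorjmaa, Taeksoo Shin Korean Academic Society of Business Administration 2015 Korean Academic Society of Business Administration Integrated Conference Proceedings Vol.2015 No.08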
The rapid growth of information technology and mobile service platforms such as the Internet, Google, and Facebook has led to an abundance of data. As a result, the world is now facing a revolution in how data is searched, collected, stored, and shared. This abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting the knowledge available in databases have become more popular in many fields, such as movie recommendation in e-commerce services. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity from movie genre data and social network data. Social network analysis (SNA) is used to improve classification accuracy. The study builds a movie network (a one-mode network) from the initial data, which forms a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movie genre data were used to classify movie popularity. Logistic regression, a neural network, a naive Bayes classifier, and a decision tree were used as benchmark models for comparison with the proposed model. To assess classification accuracy, the study used the open MovieLens dataset. Our empirical results indicate that the proposed model achieves about 10% higher accuracy than the other classification models. These results imply that the proposed model could be used to improve movie popularity classification.
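The step from a two-mode user-to-movie network to a one-mode movie network can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it links two movies whenever at least one user rated both, ignores edge weights, and computes only degree centrality. The function names and the toy ratings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_to_movie_network(ratings):
    """Project (user, movie) pairs onto a one-mode movie network.

    Returns a dict mapping each movie to the set of movies that share
    at least one user with it.
    """
    movies_by_user = defaultdict(set)
    for user, movie in ratings:
        movies_by_user[user].add(movie)

    adjacency = defaultdict(set)
    for movies in movies_by_user.values():
        for movie in movies:
            adjacency[movie]  # make sure isolated movies are kept as nodes
        for m1, m2 in combinations(sorted(movies), 2):
            adjacency[m1].add(m2)
            adjacency[m2].add(m1)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree centrality: number of neighbors divided by (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {movie: 0.0 for movie in adjacency}
    return {movie: len(neighbors) / (n - 1) for movie, neighbors in adjacency.items()}


# Toy ratings, invented for illustration.
toy_ratings = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C"), ("u3", "D")]
toy_network = project_to_movie_network(toy_ratings)
toy_centrality = degree_centrality(toy_network)
```

In a fuller pipeline, each movie's centrality values would be appended to its genre indicators to form the feature vector passed to the classifier.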
Integrated Model of Data Mining and Sentiment Analysis for Daily KOSPI Forecasting
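Zhongjun Cui, Taeksoo Shin Korean Academic Society of Business Administration 2015 Korean Academic Society of Business Administration Integrated Conference Proceedings Vol.2015 No.08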
Although stock price forecasting is a traditional topic in investment decision research, it remains difficult because stock prices change rapidly and unexpectedly. Recently, many researchers have analyzed sentiment in SNS or news data to forecast stock prices, but these studies are limited in that they used either sentiment data or KOSPI (Korea Composite Stock Price Index) data alone. The aim of this paper is to build new domain-specific sentiment dictionaries for stock prices through sentiment analysis, derive daily sentiment indices from the sentiment of news articles, use the sentiment data and KOSPI data together as inputs to a data mining model for daily KOSPI forecasting, and thereby improve the accuracy of forecasting the direction of the KOSPI. TF-IDF weights were considered both in building the sentiment dictionaries and in calculating the daily sentiment indices from them. Our empirical results showed that, in particular, a K-NN model using KOSPI data together with sentiment data, calculated from both the TF-IDF weight-based sentiment dictionary and the weights of each news article itself, achieved an accuracy of 68% and outperformed all other models on the validation data.
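One way a TF-IDF-weighted daily sentiment index could be computed is sketched below. The abstract does not give the formula, so the weighting scheme here (term frequency times log inverse document frequency, then a weighted average of term polarities) is an assumption, and the function names, tokens, and polarity dictionary are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: tf * idf} dict per document (documents are token lists)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def sentiment_index(doc_weights, polarity):
    """Weighted average polarity of dictionary terms, in the range [-1, 1].

    polarity maps a term to +1 (positive) or -1 (negative); terms not in
    the dictionary are ignored. Returns 0.0 when no weighted term matches.
    """
    weighted_sum = total_weight = 0.0
    for weights in doc_weights:
        for term, weight in weights.items():
            if term in polarity:
                weighted_sum += weight * polarity[term]
                total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Toy articles for one trading day, invented for illustration.
toy_articles = [["stocks", "surge", "rally"], ["stocks", "plunge"], ["stocks", "surge"]]
toy_polarity = {"surge": 1, "rally": 1, "plunge": -1}
toy_index = sentiment_index(tfidf_weights(toy_articles), toy_polarity)
```

A daily index of this kind could then be combined with lagged KOSPI values as input features for a K-NN classifier of the next day's direction.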
Tserendulam Dorjmaa, Taeksoo Shin Korea Society of IT Services 2017 Journal of Information Technology Services Vol.16 No.3
The rapid growth of information technology and mobile service platforms such as the Internet, Google, and Facebook has led to an abundance of data. As a result, the world is now facing a revolution in how data is searched, collected, stored, and shared. This abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting the knowledge available in databases have become more popular in e-commerce services, particularly in movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity from movie genre data and social network data. Social network analysis (SNA) is used to improve classification accuracy. The study builds a movie network (a one-mode network) from the initial data, which forms a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movie genre data were used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were used as benchmark models for comparison with the proposed model. To assess classification accuracy, the study used the open MovieLens dataset. Our empirical results indicate that the proposed model, using movie genre and centrality data, has approximately 0% higher accuracy than the other classification models using only movie genre data. These results imply that the proposed model can be used to improve movie popularity classification accuracy.