Graph based KNN for Optimizing Index of News Articles
Jo, Taeho Korea Multimedia Society 2016 The journal of multimedia information system Vol.3 No.3
This research proposes treating index optimization as a classification task and applying a graph-based KNN to it. Index optimization is an important task for maximizing information retrieval performance. We also try to solve the problems of encoding words into numerical vectors, such as huge dimensionality and sparse distribution, by encoding words into graphs as an alternative representation. In this research, index optimization is viewed as a classification task, a similarity measure between graphs is defined, the KNN is modified into a graph-based version using this similarity measure, and the modified version is applied to the index optimization task. By modifying the KNN in this way, we expect improved classification performance, more graphical representations of words, which are inherent in graphs, and the ability to trace the results of classifying words more easily. In this research, we will validate the proposed version empirically in optimizing the index on two text collections: NewsPage.com and 20NewsGroups.
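The abstract does not define the graph similarity measure, so the following is only a minimal sketch of a graph-based KNN under stated assumptions: each word is encoded as a graph given by a set of undirected edges between co-occurring terms, two graphs are compared by the Jaccard overlap of their edge sets, and the labels `"include"` and `"remove"` are hypothetical index-optimization categories. The names `edges`, `graph_similarity`, and `graph_knn` are illustrative.

```python
from collections import Counter


def edges(*pairs):
    """Build a word graph as a set of undirected edges (assumed encoding)."""
    return {frozenset(pair) for pair in pairs}


def graph_similarity(g1, g2):
    """Jaccard overlap of two edge sets (an assumed measure, not the paper's)."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0


def graph_knn(query, training, k=3):
    """Return the majority label among the k training graphs most similar to query.

    training is a list of (graph, label) pairs; a tied vote goes to the label
    encountered first among the nearest neighbors.
    """
    ranked = sorted(training, key=lambda item: graph_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

The only change from an ordinary KNN is the similarity function: the voting step is untouched, so any other graph similarity measure could be swapped in for `graph_similarity`.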
Time Series Prediction using Virtual Term Generation Scheme
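Jo, Taeho, Cho, Sungzoon Korean Operations Research and Management Science Society 1996 Proceedings of the Korean Operations Research and Management Science Society Conference Vol.- No.1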
A time series is a sequence of values measured at different times and enumerated sequentially at homogeneous intervals. The goal of time series prediction is to predict future values by analysing the values measured in the past. The statistical approach to time series prediction tends to be replaced by the neural approach, since the statistical approach has difficulty expressing the relationships among past data. In the neural approach, the problem is acquiring enough training data in advance. The goal of this paper is to solve this problem by generating additional terms, called virtual terms, between the terms of a time series.
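To show why virtual terms help with the training-data problem, the sketch below assumes that a virtual term is the midpoint (linear interpolation) of two adjacent terms; the abstract does not state how virtual terms are actually generated. The hypothetical helpers `insert_virtual_terms` and `sliding_windows` show how the augmented series yields more (input window, next value) pairs for a neural predictor.

```python
def insert_virtual_terms(series):
    """Insert one virtual term between each pair of adjacent terms.

    Assumption: the virtual term is the midpoint of its two neighbors.
    """
    if len(series) < 2:
        return list(series)
    augmented = []
    for current, following in zip(series, series[1:]):
        augmented.append(current)
        augmented.append((current + following) / 2)  # assumed virtual term
    augmented.append(series[-1])
    return augmented


def sliding_windows(series, window):
    """Turn a series into (input window, next value) training pairs."""
    return [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]
```

With the measured values `[1.0, 3.0, 5.0]` and a window of 2, the original series gives one training pair, while the augmented series `[1.0, 2.0, 3.0, 4.0, 5.0]` gives three. Note that the augmented pairs describe half-interval steps, so a predictor trained on them forecasts at that finer resolution.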
Semantic Word Categorization using Feature Similarity based K Nearest Neighbor
Jo, Taeho Korea Multimedia Society 2018 The journal of multimedia information system Vol.5 No.2
This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity, and applies it to word categorization. The texts that are given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and a synergy effect between word categorization and text categorization is expected from combining the two. In this research, we define a similarity metric between two vectors that includes the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is validated empirically as the better approach for categorizing words in news articles and opinions. The significance of this research is that it improves classification performance by utilizing feature similarities.
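One well-known way to build feature similarity into a vector similarity is the soft cosine measure, which weights every pair of features by an entry of a feature similarity matrix. It is used below only as a stand-in, since the abstract does not give the paper's metric; the matrix `feature_sim` and the functions `soft_cosine` and `knn_classify` are assumptions for illustration.

```python
import math
from collections import Counter


def soft_cosine(x, y, feature_sim):
    """Cosine similarity that also credits related (not only identical) features.

    feature_sim[i][j] is the assumed similarity between features i and j,
    with 1.0 on the diagonal; an identity matrix gives plain cosine similarity.
    """
    def bilinear(a, b):
        return sum(a[i] * feature_sim[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    denom = math.sqrt(bilinear(x, x)) * math.sqrt(bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0


def knn_classify(query, training, feature_sim, k=3):
    """KNN vote where soft cosine replaces the usual similarity metric."""
    ranked = sorted(training,
                    key=lambda item: soft_cosine(query, item[0], feature_sim),
                    reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
```

For example, with `feature_sim = [[1, 0.9, 0], [0.9, 1, 0], [0, 0, 1]]`, the vectors `[1, 0, 0]` and `[0, 1, 0]` share no feature yet score 0.9, whereas plain cosine similarity would score 0.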
Modified Version of SVM for Text Categorization
Taeho Jo Korean Institute of Intelligent Systems 2008 INTERNATIONAL JOURNAL of FUZZY LOGIC and INTELLIGE Vol.8 No.1
This research proposes a new strategy for text categorization in which documents are encoded into string vectors, together with modified versions of SVM that are adaptable to string vectors. When the traditional version of SVM is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area of pattern classification. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified version of SVM adaptable to string vectors to text categorization.
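The abstract does not say how SVM is modified. One possible adaptation, sketched here purely as an assumption, keeps a standard kernel SVM and defines the kernel directly on string vectors: the number of distinct words two string vectors share equals the dot product of their binary bag-of-words indicators, so it is a valid (positive semidefinite) kernel. `string_vector_kernel` and `gram_matrix` are hypothetical names.

```python
def string_vector_kernel(sv1, sv2):
    """Count the distinct words shared by two string vectors (lists of words).

    This equals the dot product of binary bag-of-words indicators, which
    makes it usable as a kernel in a kernel SVM.
    """
    return len(set(sv1) & set(sv2))


def gram_matrix(rows, cols):
    """Kernel values for every (row, col) pair of string vectors."""
    return [[string_vector_kernel(r, c) for c in cols] for r in rows]
```

A training Gram matrix `gram_matrix(train, train)` could be passed to scikit-learn's `SVC(kernel="precomputed")`, with `gram_matrix(test, train)` supplied at prediction time. The SVM itself never sees a numerical vector, although the kernel implicitly compares words in a bag-of-words space.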
Representation of Texts into String Vectors for Text Categorization
Taeho Jo Korean Institute of Information Scientists and Engineers 2010 Journal of Computing Science and Engineering Vol.4 No.2
In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors, and this usual method of encoding causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization in which string vectors, instead of numerical vectors, are received as input vectors. As a result, we can improve text categorization performance by avoiding these two problems.
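A minimal sketch of one plausible string-vector encoding, assuming a string vector is the list of a document's `dim` most frequent non-stopword words in descending order of frequency. The stopword list, the alphabetical tie-breaking, the empty-string padding, and the name `encode_string_vector` are illustrative choices, not taken from the paper.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "as"})


def encode_string_vector(text, dim=5, stopwords=STOPWORDS):
    """Encode a document as its `dim` most frequent words, most frequent first.

    Ties are broken alphabetically; documents with fewer than `dim`
    distinct words are padded with "".
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return (ranked + [""] * dim)[:dim]
```

Whatever the vocabulary size, every document becomes a list of exactly `dim` words, which is how this representation avoids the huge, sparse numerical vectors described above.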
Table based Single Pass Algorithm for Clustering News Articles
Taeho Jo Korean Institute of Intelligent Systems 2008 INTERNATIONAL JOURNAL of FUZZY LOGIC and INTELLIGE Vol.8 No.3
This research proposes a modified version of the single pass algorithm specialized for text clustering. Encoding documents into numerical vectors in order to use the traditional version of the single pass algorithm causes two main problems: huge dimensionality and sparse distribution. Therefore, to address these two problems, this research modifies the single pass algorithm into a version in which documents are encoded into forms other than numerical vectors. In the proposed version, documents are mapped into tables, and an operation on two tables is defined for use in the single pass algorithm. The goal of this research is to improve the performance of the single pass algorithm for text clustering by modifying it into this specialized version.
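The sketch below illustrates a table-based single pass algorithm under assumptions not stated in the abstract: a table maps each word to its frequency, two tables are compared by weighted Jaccard overlap (`Counter & Counter` keeps minimum counts, `Counter | Counter` keeps maximum counts), and a document joins its most similar cluster when the similarity reaches a hypothetical `threshold`, otherwise starting a new cluster. The names `to_table`, `table_similarity`, and `single_pass` are illustrative.

```python
import re
from collections import Counter


def to_table(text):
    """Map a document to a table of word -> frequency (assumed table form)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def table_similarity(t1, t2):
    """Weighted Jaccard overlap of two tables (an assumed table operation)."""
    union = sum((t1 | t2).values())
    return sum((t1 & t2).values()) / union if union else 0.0


def single_pass(texts, threshold=0.3):
    """Cluster documents in one pass; returns lists of document indices."""
    clusters = []  # each entry: [merged cluster table, member indices]
    for index, text in enumerate(texts):
        table = to_table(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = table_similarity(table, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(table)  # merge: add the document's counts to the cluster table
            best[1].append(index)
        else:
            clusters.append([table, [index]])
    return [members for _, members in clusters]
```

Because each document is visited exactly once, the resulting clusters depend on document order and on the threshold; for example, `["stock market price", "stock price rally", "team game score", "game score win"]` splits into a finance cluster and a sports cluster at the default threshold.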