The Sequence Labeling Approach for Text Alignment of Plagiarism Detection
Leilei Kong, Zhongyuan Han, Haoliang Qi. Korean Society for Internet Information, 2019. KSII Transactions on Internet and Information Syst Vol.13 No.9
Plagiarism detection increasingly relies on text alignment. Text alignment involves extracting the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristic methods have achieved excellent performance in text alignment. However, further improvement of these heuristics depends mainly on expert experience, which limits their capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text segment pairs, we formalize the text alignment task as a sequence labeling problem, improving current methods at the model level. In particular, this paper proposes using a probabilistic graphical model to tag the observed sequence of text segment pairs. We therefore present a sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high-obfuscation plagiarism corpus and on the simulated paraphrase plagiarism corpus. It is compared with the best-performing methods of PAN@CLEF 2012, 2013, and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
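The abstract frames text alignment as tagging a sequence of text segment pairs with a Conditional Random Field. At prediction time, a linear-chain CRF chooses the label sequence that maximizes the sum of two kinds of scores. Per-position (emission) scores rate each label for a single pair. Adjacent-label (transition) scores rate each pair of neighboring labels, which is how neighboring pairs influence each other's labels. The minimal Python sketch below covers only that Viterbi decoding step and is not the authors' implementation. The labels "P" (plagiarized pair) and "O" (other), the function name `viterbi`, and all scores are hypothetical. The paper's features and CRF training are not reproduced.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Hypothetical sketch: highest-scoring label sequence for a linear-chain model.

    emissions[t][y]: score of label y at position t (one text segment pair)
    transitions[(a, b)]: score of label a being followed by label b
    """
    # best[t][y]: best total score of any label path ending in y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions[(p, y)]) for p in labels),
                key=lambda item: item[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Trace the back-pointers from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


labels = ["P", "O"]
transitions = {("P", "P"): 1.0, ("O", "O"): 1.0, ("P", "O"): -1.0, ("O", "P"): -1.0}
emissions = [
    {"P": 2.0, "O": 0.0},
    {"P": 0.4, "O": 0.5},  # on its own, this pair slightly favors "O"
    {"P": 1.5, "O": 0.0},
    {"P": 0.0, "O": 2.5},
]
print(viterbi(emissions, transitions, labels))  # ['P', 'P', 'P', 'O']
```

The second pair is tagged "P" even though its own emission score favors "O". The transition scores reward agreeing with its neighbors, which shows how context enters a sequence labeling formulation.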
A Hyperlink-Extended Language Model for Microblog Retrieval
Zhongyuan Han, Muyun Yang, Leilei Kong, Haoliang Qi, Sheng Li. Security Engineering Research Support Center, 2015. International Journal of Database Theory and Appli Vol.8 No.6
Microblog retrieval has received much attention in recent years. In microblog retrieval, the content linked by URLs is one of the most important sources of information in a microblog. We present a hyperlink-extended model for microblog retrieval that combines the content of microblogs with the content of the webpages behind their embedded hyperlinks, using a probabilistic ranking function based on the language model. The hyperlink-extended language model accounts for both the users' information retrieval requirements and the microblog author's expressive needs. We evaluate various aspects of the model on the standard TREC 2011 and TREC 2012 microblog retrieval collections. Results show that our model significantly outperforms state-of-the-art URL-based approaches and the best-performing run of the TREC 2012 microblog retrieval task.
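The abstract does not give the exact ranking function. One common way to combine a tweet with the page its URL points to under the language-modeling approach is query likelihood with a linearly interpolated document model. That model mixes the tweet's term distribution, the linked page's term distribution, and a collection background model. The Python sketch below follows that assumption. The interpolation form, the weights `lam_tweet` and `lam_page`, the add-one collection smoothing, and the function name are all hypothetical choices, not necessarily the paper's.

```python
import math
from collections import Counter
from typing import List


def query_log_likelihood(
    query: List[str],
    tweet: List[str],
    page: List[str],
    collection: List[str],
    lam_tweet: float = 0.5,  # hypothetical weight of the tweet text
    lam_page: float = 0.3,   # hypothetical weight of the linked webpage
) -> float:
    """Hypothetical sketch: log P(query | interpolated tweet + page + collection model)."""
    lam_coll = 1.0 - lam_tweet - lam_page
    t, p, c = Counter(tweet), Counter(page), Counter(collection)
    nt, npg, nc = len(tweet), len(page), len(collection)
    total = 0.0
    for w in query:
        p_tweet = t[w] / nt if nt else 0.0
        p_page = p[w] / npg if npg else 0.0
        # Add-one smoothing keeps terms unseen in the collection at nonzero probability
        p_coll = (c[w] + 1) / (nc + len(c) + 1)
        total += math.log(lam_tweet * p_tweet + lam_page * p_page + lam_coll * p_coll)
    return total


collection = ["quake", "japan", "news", "link", "football", "scores", "tsunami"]
query = ["quake", "japan"]
relevant_link = query_log_likelihood(query, ["news", "link"], ["quake", "japan", "tsunami"], collection)
other_link = query_log_likelihood(query, ["news", "link"], ["football", "scores"], collection)
print(relevant_link > other_link)  # True
```

The two tweets have identical text, so only the linked page separates them. That is the situation the hyperlink extension is meant to handle.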
A Temporal Microblog Filtering Model
Zhongyuan Han, Muyun Yang, Leilei Kong, Haoliang Qi, Sheng Li. Security Engineering Research Support Center, 2016. International Journal of Grid and Distributed Comp Vol.9 No.1
In recent years, the rapid growth in the popularity of social networking and microblogging has led to new ways of finding and broadcasting information. Real-time microblog filtering has emerged in response. The task of real-time microblog filtering is to decide whether subsequently posted tweets are relevant to a given query that represents a specific information need. One-sided feedback is one of the most difficult problems in microblog filtering. This paper addresses that problem by exploiting the temporal profile of relevant microblogs. A temporal microblog filtering model based on a retrieval model is proposed. Specifically, the similarity threshold obtained from the language model is adjusted according to temporal bursts. On the TREC 2012 microblog real-time filtering track dataset, the proposed model performs significantly better than several baselines.
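The abstract states only that the language-model similarity threshold is adjusted according to temporal bursts. The Python sketch below shows one simple, hypothetical way to make a threshold burst-sensitive. A time window counts as a burst when its count of relevant tweets exceeds the mean of earlier windows by `k` standard deviations. During a burst, the threshold is scaled by `drop`. The burst test, the direction of the adjustment, the parameters `k` and `drop`, and the function name are all assumptions, not the paper's model.

```python
from statistics import mean, pstdev
from typing import List


def burst_adjusted_threshold(
    base: float, window_counts: List[int], k: float = 2.0, drop: float = 0.8
) -> float:
    """Hypothetical sketch: similarity threshold for the current time window.

    window_counts holds the number of tweets judged relevant per time window,
    oldest first; the last entry is the current window.
    """
    history, current = window_counts[:-1], window_counts[-1]
    if len(history) < 2:
        return base  # too little history to call anything a burst
    is_burst = current > mean(history) + k * pstdev(history)
    # Assumption: during a burst, relevant tweets are more likely, so admit more of them
    return base * drop if is_burst else base


print(burst_adjusted_threshold(0.5, [2, 3, 2, 3, 12]))  # burst: threshold lowered
print(burst_adjusted_threshold(0.5, [2, 3, 2, 3, 3]))   # no burst: threshold unchanged
```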
A Study on Adaptive Direction Teaching-Learning-Based Optimization Algorithm
Xu Sun, Mengying He, Leilei Kong, Haoliang Qi. Security Engineering Research Support Center, 2016. International Journal of u- and e- Service, Scienc Vol.9 No.4
In real-life learning, teachers communicate with students to achieve better learning outcomes. The teaching-learning-based optimization (TLBO) algorithm simulates this process and has shown strong performance on constrained and unconstrained nonlinear optimization problems. This paper presents an adaptive direction strategy (ADS) to improve the search ability of the TLBO algorithm. The improved algorithm is tested by searching for the optimal points of several typical test functions. The results show that the improved TLBO algorithm obtains better optimal solutions in a shorter time. Compared with the standard TLBO algorithm, the improved algorithm is considerably more stable and effective.
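The abstract does not describe the adaptive direction strategy itself. The Python sketch below therefore shows only the standard TLBO baseline that ADS modifies. In the teacher phase, each learner moves toward the best solution (the teacher) and away from a multiple `tf` of the class mean. In the learner phase, each learner moves toward a better random peer or away from a worse one. Every candidate is accepted only if it improves the objective. The function names, population size, iteration count, and bound clipping are illustrative choices, and the sphere function is just one common test function.

```python
import random
from typing import Callable, List, Tuple


def tlbo(
    f: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    pop_size: int = 20,
    iters: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f with basic TLBO (no adaptive direction strategy)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x: List[float]) -> List[float]:
        return [min(max(v, lo), hi) for v in x]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        # Teacher phase: move toward the best learner, away from tf * class mean
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d]) for d in range(dim)])
            fc = f(cand)
            if fc < fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, fc
        # Learner phase: move toward a better random peer, or away from a worse one
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            step = 1.0 if fit[j] < fit[i] else -1.0
            cand = clip([pop[i][d] + rng.random() * step * (pop[j][d] - pop[i][d]) for d in range(dim)])
            fc = f(cand)
            if fc < fit[i]:
                pop[i], fit[i] = cand, fc
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best_x, best_f = tlbo(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

An adaptive direction strategy would change how these update directions are chosen. The greedy acceptance and the two-phase structure shown here are what it builds on.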
A Method of Plagiarism Source Retrieval and Text Alignment Based on Relevance Ranking Model
Leilei Kong, Zicheng Zhao, Zhimao Lu, Haoliang Qi, Feng Zhao. Security Engineering Research Support Center, 2016. International Journal of Database Theory and Appli Vol.9 No.12
Text plagiarism has become more common because of the digital resources available on the World Wide Web. Source retrieval and text alignment are the two core tasks of plagiarism detection. This paper describes a plagiarism source retrieval and text alignment system based on a relevance ranking model. Both the source retrieval task and the text alignment task are treated as information retrieval processes. Relevance ranking is used to search for plagiarism sources and to obtain candidate plagiarism seeds. The BM25 model is used for source retrieval, while the Vector Space Model is used for text alignment. Furthermore, a plagiarism detection system named HawkEyes is developed based on the proposed methods, and several demonstrations of HawkEyes are given.
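The abstract names BM25 for the source retrieval step. The Python sketch below shows standard Okapi BM25 scoring of candidate source documents against a query, which might, for example, be drawn from a suspicious document. It uses the common defaults `k1 = 1.2` and `b = 0.75` and the Lucene-style idf `log(1 + (N - df + 0.5) / (df + 0.5))`. The paper's parameter values, tokenization, and query construction are not given in the abstract, so treat these as assumptions. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style idf stays positive even for terms in most documents
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores


sources = [
    ["plagiarism", "detection", "source"],
    ["weather", "report"],
    ["plagiarism", "plagiarism", "text"],
]
print(bm25_scores(["plagiarism", "source"], sources))
```

The first document ranks highest because it matches the rarer term "source". The third document repeats "plagiarism", but BM25's term-frequency saturation limits how much the repetition helps.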
Prediction of Users Retweet Times in Social Network
Haihao Yu, Xu Feng Bai, ChengZhe Huang, Haoliang Qi. Security Engineering Research Support Center, 2015. International Journal of Multimedia and Ubiquitous Vol.10 No.5
Propagation-path topology methods cannot effectively handle complex social networks with hundreds of millions of users, so more researchers have turned to machine learning methods for retweet prediction. These methods use classification to judge whether or not a message will be retweeted. This paper argues that retweet prediction should be treated as a regression problem, not merely a classification problem. The authors collect user characteristics on Twitter and select features that strongly influence retweet behavior. Based on these features, they propose a logistic-regression-based algorithm for predicting users' retweet times in social networks. Experimental results on a real dataset show that the regression model predicts retweets with good accuracy, demonstrating that the proposed method is effective.
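The abstract does not specify how logistic regression yields a retweet count. One hypothetical reading treats the target as a fraction in [0, 1], such as the share of a user's followers who retweet. Logistic regression can be fitted to fractional targets with the usual cross-entropy loss, and the predicted fraction times the follower count then gives an expected count. The Python sketch below follows that assumption using batch gradient descent. The feature, the data, the follower count, and the function names are illustrative only, not the paper's.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: List[List[float]], y: List[float], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Batch gradient descent on cross-entropy loss; targets may be fractions in [0, 1]."""
    n, d = len(X), len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def predict_rate(w: List[float], bias: float, x: List[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)


# Hypothetical data: one normalized user feature; target = fraction of followers who retweeted
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [0.05, 0.1, 0.2, 0.35, 0.5]
w, bias = fit_logistic(X, y)
followers = 200  # hypothetical follower count
print(followers * predict_rate(w, bias, [0.8]))  # expected retweet count under this reading
```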