SSF: Sentence Similar Function Based on word2vector Similar Elements
Yuan, Xinpan; Wang, Songlin; Wan, Lanjun; Zhang, Chengyuan. Korea Information Processing Society, 2019. Journal of Information Processing Systems, Vol. 15, No. 6
To improve the accuracy of similarity calculation for long sentences, this paper proposes a sentence similarity calculation method based on a system similarity function. The algorithm uses word2vector as the system elements to calculate the sentence similarity. Its higher accuracy derives from two characteristics: the negative effect of the penalty item, and the fact that the sentence similar function (SSF) based on word2vector similar elements does not satisfy the exchange rule. In later studies, we found that the time complexity of the algorithm depends on the process of calculating similar elements, so we build an index of potentially similar elements while training the word vectors. Finally, the experimental results show that our algorithm has higher accuracy than the word mover's distance (WMD) and the shortest query time of the three SSF calculation methods.
Near-Duplication Document Detection Using Weight One Permutation Hashing
Xinpan Yuan, Songlin Wang, Xiaojun Deng. Korean Institute of Information Scientists and Engineers, 2019. Journal of Computing Science and Engineering, Vol. 13, No. 2
Minwise hashing, a standard algorithm for efficiently calculating set similarity, is widely used to detect text similarity. Its major drawback is expensive preprocessing. One permutation hashing (OPH) was proposed to reduce the number of random permutations: OPH divides the space Ω evenly into k bins and selects the smallest nonzero value in each bin to re-index the selected elements. We propose weight one permutation hashing (WOPH), which divides the entire space Ω into k1 and k2 bins and samples k1 and k2 in proportion to form a weighted kw. By adjusting the proportion of w1 and w2, WOPH covers a wider range of precision to suit the different accuracy levels required by users. The variance of WOPH decreases rapidly at first and then slowly, although the final variance is the same as that of OPH with the same k. We combine a dynamic double filter with WOPH to reduce calculation time by eliminating unnecessary comparisons in advance. For example, for a large volume of real data with low similarity queried at a high threshold, the filter reduces the number of WOPH comparisons by 85%.
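The baseline OPH step described in this abstract (one permutation, k equal bins, smallest value per bin) can be illustrated with a minimal Python sketch. It covers plain OPH only, not the weighted WOPH scheme or the dynamic double filter, whose details the abstract does not give. The function names, the `universe_size` and `seed` parameters, and the use of `None` for empty bins are assumptions made for illustration. The estimator shown (matching bins divided by bins that are not empty in both sketches) is the one commonly used with OPH.

```python
import random


def oph_sketch(items, universe_size, k, seed=0):
    """Sketch a set of integers from [0, universe_size) with one permutation.

    The permuted space is split evenly into k bins; each bin keeps the
    smallest permuted position that falls into it, re-indexed as an offset
    inside the bin. Empty bins are stored as None (an illustrative choice).
    """
    if universe_size % k != 0:
        raise ValueError("universe_size must be divisible by k")
    rng = random.Random(seed)
    perm = list(range(universe_size))
    rng.shuffle(perm)  # the single random permutation shared by all sets
    bin_width = universe_size // k
    sketch = [None] * k
    for item in items:
        bin_index, offset = divmod(perm[item], bin_width)
        if sketch[bin_index] is None or offset < sketch[bin_index]:
            sketch[bin_index] = offset
    return sketch


def oph_resemblance(sketch_a, sketch_b):
    """Estimate resemblance as matched bins / bins not empty in both sketches."""
    matched = 0
    jointly_empty = 0
    for a, b in zip(sketch_a, sketch_b):
        if a is None and b is None:
            jointly_empty += 1
        elif a == b:
            matched += 1
    usable = len(sketch_a) - jointly_empty
    return matched / usable if usable else 0.0


if __name__ == "__main__":
    doc_a = set(range(0, 600))
    doc_b = set(range(300, 900))  # true resemblance is 300 / 900
    sa = oph_sketch(doc_a, universe_size=1024, k=64)
    sb = oph_sketch(doc_b, universe_size=1024, k=64)
    print(oph_resemblance(sa, sb))
```

Both sets must be sketched with the same `seed` so that they share the same permutation; otherwise the bin minima are not comparable.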
Research on Fault Diagnosis of Wind Power Generator Blade Based on SC-SMOTE and kNN
Cheng Peng, Qing Chen, Longxin Zhang, Lanjun Wan, Xinpan Yuan. Korea Information Processing Society, 2020. Journal of Information Processing Systems, Vol. 16, No. 4
Because SCADA monitoring data of wind turbines are large and fast changing, the unbalanced proportion of data across working conditions makes it difficult to process fault feature data. Existing methods mainly introduce new, nonrepeating instances by interpolating between adjacent minority samples. To overcome the shortcoming that these methods do not consider boundary conditions when balancing data, an improved oversampling algorithm, SC-SMOTE (safe circle synthetic minority oversampling technology), is proposed to optimize the data sets. Then, for the balanced data sets, a fault diagnosis method based on an improved k-nearest neighbors (kNN) classification is adopted for wind turbine blade icing. The experimental results show that, compared with the SMOTE algorithm, the method is effective in diagnosing fan blade icing faults and improves diagnostic accuracy.
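The interpolation used by the existing methods mentioned in this abstract can be illustrated with a minimal Python sketch of basic SMOTE-style oversampling. This is the baseline technique only, not the SC-SMOTE safe-circle variant, whose boundary handling the abstract does not specify. The function name `smote_samples` and its parameters `n_new`, `k`, and `seed` are hypothetical names chosen for this example.

```python
import math
import random


def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating minority neighbors.

    For each new point: pick a minority sample, pick one of its k nearest
    minority neighbors, and place the new point at a random position on the
    line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


if __name__ == "__main__":
    icing_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_samples(icing_samples, n_new=5):
        print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, this baseline can place points close to the class boundary, which is the kind of limitation the abstract says SC-SMOTE is meant to address.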