RISS (Research Information Sharing Service)

      KCI-listed journal

      The Spam Detection Model for Web Forums using Text Mining Techniques (Korean title: 텍스트 마이닝을 이용한 웹 포럼 불량글 탐지 모델)


      https://www.riss.kr/link?id=A101574285


      Abstract

      Spam in discussion web forums inconveniences users and lowers the value of the forum as an open source of user opinion. Because the importance of a posting is evaluated by the number of authors involved, spam distorts opinion analysis results by adding unnecessary data. We propose a model for automatically detecting spam postings in web forums. We extract text features from posting contents using text mining techniques from a linguistic perspective, and then apply supervised learning to distinguish spam from normal postings. Significant features are derived through the learning process, and the automatic detection model is built on those features. To build the model, four evaluators were asked to label spam postings in advance. We adopted Naive Bayes, Support Vector Machine (SVM), and decision tree classifiers, which are known to perform well in data and text mining tasks. The model can thus extract the text features that identify spam and automatically detect newly posted spam. We apply the proposed model to the Yahoo Finance Walmart forum, the world's largest Walmart-related web forum.
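
The supervised-learning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain bag-of-words counts in place of the paper's linguistic features, uses only a multinomial Naive Bayes classifier with add-one smoothing (one of the three classifier families the abstract names), and trains on a handful of hypothetical postings invented for illustration. The names `tokenize` and `NaiveBayesSpamDetector` are likewise made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for richer linguistic features."""
    return text.lower().split()


class NaiveBayesSpamDetector:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # postings per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled postings (not data from the paper).
train_texts = [
    "buy cheap pills online now",
    "click here for free money now",
    "walmart earnings beat analyst expectations",
    "store traffic and quarterly sales look strong",
]
train_labels = ["spam", "spam", "normal", "normal"]

detector = NaiveBayesSpamDetector().fit(train_texts, train_labels)
print(detector.predict("free pills click now"))
print(detector.predict("quarterly earnings look strong"))
```

In a setting closer to the paper, the labels would come from human evaluators and the feature set would be chosen by the learning process rather than fixed to raw word counts.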

      References

      1 Hayati P., "Toward spam 2.0: An evaluation of Web 2.0 anti-spam methods" Industrial Informatics 875-880, 2009

      2 Buckland M., "The relationship between Recall and Precision" 45 (45): 12-19, 1999

      3 Gruhl D., "The predictive power of online chatter" KDD 78-87, 2005

      4 Vapnik VN., "The nature of statistical learning theory" Springer-Verlag 1995

      5 Gillin P., "The New Influencers, A Marketer’s Guide to the New Social Media" Quill Driver Books\Word Dancer Press 2007

      6 Robert F., "Syntax. Critical Concepts in Linguistics" Routledge 2006

      7 Dunning T., "Statistical Identification of Language" New Mexico State University 94-273, 1994

      8 Lin Y., "Splog detection using self-similarity analysis on blog temporal dynamics" 2007

      9 Jindal N., "Opinion Spam and Analysis" WSDM’08 2008

      10 Lewis D., "Naive (Bayes) at forty: The independence assumption in information retrieval" Machine Learning 4-15, 1998

      11 Morinaga S., "Mining product reputations on the Web" 341 : 2002

      12 Zinman A., "Is Britney Spears spam" 2007

      13 Quinlan JR., "Induction of decision trees" Machine Learning

      14 Benevenuto F., "Identifying Video Spammers in Online Social Networks" AIRWeb 2008

      15 Gwet K., "Handbook of Inter-Rater Reliability (Second Edition)" ISBN 2010

      16 Sampson S., "Gathering customer feedback via the Internet: instruments and prospects" 98 (98): 71-, 1998

      17 Glance N., "Deriving Marketing Intelligence from Online Discussion" KDD 2005

      18 Han S., "Collaborative blog spam filtering using adaptive percolation search" WWW 2006

      19 Mishne G., "Blocking Blog Spam with Language Model Disagreement" AIRWeb 2005

      20 Wanas N., "Automatic Scoring of Online Discussion Posts" WICOW 2008

      21 Paul K., "Analyzing Grammar: An Introduction" Cambridge University Press 35-, 2005

      22 Wenger A., "Analysis of travel bloggers' characteristics and their communication about Austria as a tourism destination" 14 (14): 2008

      23 Liu Y., "ARSA: A Sentiment-Aware Model for Predicting Sales Performance Using Blogs" SIGIR 2007

      24 Niu Y., "A Quantitative Study of Forum Spamming Using Context-based Analysis" 2007


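      Journal history
      Date | History type | Details | Listing status
      2028 | Evaluation scheduled | Eligible to apply for re-accreditation evaluation (re-accreditation) |
      2022-01-01 | Evaluation | KCI-listed status maintained (re-accreditation) | KCI-listed
      2019-04-09 | Society name change | English name: unregistered -> Korea Knowledge Information Technology Society | KCI-listed
      2019-01-01 | Evaluation | KCI-listed status maintained (continuing evaluation) | KCI-listed
      2016-01-01 | Evaluation | KCI-listed status maintained (continuing evaluation) | KCI-listed
      2014-03-17 | Journal name change | Foreign-language name: Journal of The Korea Knowledge Information Technology Society -> Journal of Knowledge Information Technology and Systems | KCI-listed
      2012-01-01 | Evaluation | Selected as KCI-listed journal (candidate, 2nd round) | KCI-listed
      2011-01-01 | Evaluation | Candidate 1st round PASS (candidate, 1st round) | KCI candidate
      2009-01-01 | Evaluation | Selected as candidate journal (new evaluation) | KCI candidate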

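      Journal citation information
      Base year | WOS-KCI integrated IF (2-year) | KCIF (2-year) | KCIF (3-year) | KCIF (4-year) | KCIF (5-year) | Centrality index (3-year) | Immediacy index
      2016 | 0.39 | 0.39 | 0.29 | 0.25 | 0.22 | 0.312 | 0.07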