Development of a Document Object Model Based Automatic Information Extraction System for Privacy Policy Web Pages | RISS

Multilingual Abstract

Today, many companies and government agencies use a variety of methods to develop and manage their personal information protection systems. In particular, under the Personal Information Protection Act, anyone who collects and uses personal information must publish a Privacy Policy that includes certain information about how the personal information is used. Recently, the Ministry of Security and Public Administration and other government agencies have continued to monitor and evaluate the personal information protection systems of companies and government agencies by analyzing the information on their Privacy Policy web pages. For accurate evaluation and monitoring, collecting precise information is even more important than designing the evaluation method. So far, however, the evaluation process has been slow and inaccurate because human investigators collected the information manually.
In this study, we developed a Document Object Model based automatic information extraction system for Privacy Policy web pages. Existing web data extraction methodologies extract data by analyzing the patterns shared across many similarly structured web pages. They cannot be applied to Privacy Policy web pages, because Privacy Policies follow no fixed rules or formats, so no shared pattern can be identified across them. To overcome this limitation, we developed a system that analyzes the structure of each individual Privacy Policy web page and extracts data from it. To build the system, we used TF-IDF (Term Frequency-Inverse Document Frequency), cosine similarity, and Network Distance Based similarity (NDB similarity) in addition to the Document Object Model. The system can automatically extract data from Privacy Policy web pages, and we expect it to be useful in devising extraction methodologies for other unstructured web data.


Korean Abstract

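Today, many companies and government agencies use a variety of means and methods to build and manage secure personal information protection systems. In particular, under the Personal Information Protection Act, a personal information controller that collects and uses personal information must record certain details of that collection and use in its Privacy Policy and make it public. Recently, government agencies led by the Ministry of Security and Public Administration have also continued their efforts to evaluate and monitor the personal information protection systems of companies and government agencies by collecting and analyzing the information in Privacy Policies. To evaluate these systems accurately in this way, establishing a valid and objective evaluation method matters, but above all it is more important to collect the Privacy Policy information accurately and use it in the evaluation. In evaluations to date, however, the information has been collected by human investigators, so the process has taken a long time and its accuracy could not be guaranteed.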
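Accordingly, in this study we developed a system that automatically extracts information from Privacy Policy web pages based on the Document Object Model. Existing web data extraction methodologies analyze structural patterns common to many web pages and extract information from them. Privacy Policies, however, are written by different authors and follow no fixed format, so their structure differs from document to document, and it is impossible to analyze a common structural pattern across many pages. To overcome this limitation and extract data from Privacy Policy web pages more accurately, this study defines the target data items in advance and searches each Privacy Policy web page individually for the data region in which each item appears, enabling accurate extraction. To build the system, in addition to the Document Object Model we used TF-IDF (Term Frequency-Inverse Document Frequency), cosine similarity, and Network Distance Based similarity (NDB similarity). The system built in this study enables accurate and efficient extraction of data from Privacy Policy web pages, and we expect the methodology developed along the way to be useful in developing unstructured web data extraction methodologies for other domains.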
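
The idea of predefining a target item and searching a single page for the region that contains it can be illustrated with a minimal Python sketch. This is not the system described in the abstract: it only combines a simple DOM walk with TF-IDF and cosine similarity, and it omits NDB similarity because the abstract does not define it. The names `BlockCollector`, `best_region`, and `SAMPLE_HTML`, the choice of block-level tags, the smoothed IDF formula, and the whitespace-style tokenizer are all assumptions made for illustration; real Korean policy pages would likely need a proper morphological tokenizer.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect the text of each block-level element as one candidate region."""

    BLOCK_TAGS = {"p", "li", "td", "div", "h1", "h2", "h3"}  # assumed tag set

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def _flush(self):
        text = " ".join(self._buffer).strip()
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if data.strip():
            self._buffer.append(data.strip())

    def close(self):
        super().close()
        self._flush()


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_region(html, item_description):
    """Return the text block most similar to a predefined item, with its score."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    blocks = parser.blocks
    if not blocks:
        return None, 0.0
    vectors = tfidf_vectors(blocks + [item_description])
    query = vectors[-1]
    scores = [cosine(query, v) for v in vectors[:-1]]
    best = max(range(len(blocks)), key=scores.__getitem__)
    return blocks[best], scores[best]


# Hypothetical page fragment, used only to exercise the sketch.
SAMPLE_HTML = """
<html><body>
<h2>Privacy Policy</h2>
<p>We collect your name, email address and phone number when you sign up.</p>
<p>Personal information is retained for five years and then destroyed.</p>
<p>Contact our privacy officer with any questions.</p>
</body></html>
"""

region, score = best_region(SAMPLE_HTML, "retention period of personal information")
print(region)
```

Because each page is scored on its own, the sketch needs no pattern shared across pages, which is the property the abstract emphasizes for Privacy Policy documents.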
