This study examines the design, implementation, and evaluation of a RAG-based legal chatbot specialized in personal information protection. The dataset was built from authoritative sources, including the Personal Information Protection Act, its Enforcement Decree, official notifications, and guidelines. The retrieval system combines SBERT embeddings, TF-IDF, and BM25 in an ensemble retriever, integrated with FAISS vector search and cross-encoder re-ranking. Because traditional metrics have limitations in the legal domain, a multiple-choice evaluation framework was applied. In addition to accuracy, the metrics include QSS (concept count), CSS (semantic similarity among choices), and sensitivity to input variations. Experiments with LLaMA3.1 (8B, 70B), LLaMA3.3 (70B), and Gemma3 (1B, 27B) showed that larger models achieved higher accuracy and consistency, especially on complex queries and on questions with similar options. LLaMA3.3 70B performed best. This work demonstrates the feasibility of a chatbot grounded in the Personal Information Protection Act and proposes an evaluation framework for validating the reliability and validity of RAG systems in the legal domain.
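To make the ensemble retrieval step concrete, the following is a minimal sketch of how two lexical rankers (BM25 and TF-IDF) might be fused into a single ranking. It is an illustration under stated assumptions, not the system evaluated in this study: the fusion rule (reciprocal rank fusion), the whitespace tokenizer, the parameter values, the function names, and the toy passages are all hypothetical, and the SBERT, FAISS, and cross-encoder stages are omitted because they require external models.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for t in tokenized for term in set(t))
    idf = {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}

    def vec(tokens):
        tf = Counter(tokens)
        # Terms unseen in the corpus have no IDF and are dropped.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b_vec):
        dot = sum(a[t] * b_vec.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b_vec.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    return [cosine(q, vec(t)) for t in tokenized]


def to_ranking(scores):
    # Document indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    # Assumption: RRF is used as the fusion rule; the source does not say
    # how its ensemble retriever combines the individual rankers.
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def ensemble_search(query, docs):
    rankings = [
        to_ranking(bm25_scores(query, docs)),
        to_ranking(tfidf_scores(query, docs)),
    ]
    return reciprocal_rank_fusion(rankings)


# Hypothetical toy passages, not drawn from the study's dataset.
docs = [
    "a controller must obtain consent to collect personal information",
    "penalties apply when a data breach is not reported",
    "the enforcement decree sets procedures for overseas transfer of personal information",
]
ranked = ensemble_search("consent to collect personal information", docs)
```

Rank-based fusion is chosen here because BM25 scores and cosine similarities live on different scales, so combining ranks avoids having to normalize raw scores. In a fuller pipeline, a dense ranking from sentence embeddings could be passed to the same fusion function as a third list, and the fused top results could then be handed to a re-ranker.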