As digital image editing technologies advance, document forgery has become increasingly sophisticated, raising the need for precise detection techniques. Traditional image-processing methods struggle to capture complex forgery traces, while deep learning methods often generalize poorly in cross-domain settings or are biased toward datasets in a specific language. To address these issues, this study proposes ASEF-Net (Attention-based Stacking Ensemble with an Edge-Focused Convolutional Network), a multimodal stacking ensemble model that combines the strengths of traditional and deep learning approaches to provide stable detection performance across diverse document environments.
ASEF-Net first emphasizes forgery traces using Log-Transform Histogram Equalization (LTHE) and then extracts structural, visual, and textural features as inputs to machine learning models. In parallel, it uses EdgeCNN, a convolutional network designed to focus on edge information, to capture minute discontinuities within the image. The predictions of these base models are adaptively integrated by an attention-based meta-model, which determines the final authenticity of the document. In addition, this study constructs a new Korean-language 'Transcript' dataset, which mimics academic transcripts, to mitigate the linguistic bias of existing datasets and to reflect the structural characteristics of Korean documents.
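As an illustration of how base-model predictions might be adaptively integrated, the following minimal sketch weights each base model's forgery probability with softmax attention computed per sample. It is not the ASEF-Net meta-model itself: the function name `attention_fuse`, the parameters `W` and `b`, and the single linear scoring layer are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(base_probs, W, b):
    """Fuse base-model outputs with per-sample attention weights.

    base_probs: (n_samples, n_models) forgery probabilities from base models.
    W, b: hypothetical learned parameters of shape (n_models, n_models)
          and (n_models,); in practice they would be fit on held-out
          predictions, as in ordinary stacking.
    Returns the fused probabilities (n_samples,) and the attention weights.
    """
    scores = base_probs @ W + b            # one score per base model
    weights = softmax(scores, axis=1)      # weights sum to 1 for each sample
    fused = (weights * base_probs).sum(axis=1)
    return fused, weights

# Example with three hypothetical base models and untrained (zero) parameters,
# in which case the attention reduces to a plain average.
probs = np.array([[0.9, 0.8, 0.4],
                  [0.1, 0.3, 0.2]])
fused, weights = attention_fuse(probs, np.zeros((3, 3)), np.zeros(3))
```

Because the weights are a convex combination, the fused score always lies between the lowest and highest base-model prediction for that sample; what the attention changes is which base model dominates for a given input.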
Experiments on public datasets (DocTamper, Receipt, MIDV-2020) and the constructed Transcript dataset show that ASEF-Net achieves higher classification accuracy and better generalization than state-of-the-art detection models. In particular, the model remains robust under Gaussian blur and geometric distortions, as well as in class-imbalance scenarios. These results indicate that multimodal feature fusion improves the reliability of document forgery detection and support the model's applicability in real-world forensic environments.