A Multidimensional Subset-Based System for Removing Bias in Binary Classification Datasets
KyeongSu Byun, Goo Kim, Joonho Kwon. Korean Institute of Information Scientists and Eng, 2023. Journal of KIISE (정보과학회논문지), Vol.50 No.5
As artificial intelligence technology advances, fairness issues related to artificial intelligence are drawing growing attention. Many studies have addressed these issues, but most have focused on developing models and training methods; research on removing the bias present in the training data itself, which is a fundamental cause, is still insufficient. In this paper, we therefore design and implement a system that divides the biases in the data into label biases and subgroup biases and removes them to generate datasets with improved fairness. The proposed system consists of two steps: (1) subset generation and (2) bias removal. First, the subset generator divides the existing data into subsets formed by combinations of values in the dataset. Each subset is then divided into a dominant group and a weak group based on fairness indicator values obtained by evaluating the existing dataset against a validation dataset. Next, the bias remover reduces the bias in each subset by repeatedly extracting from and re-verifying the dominant group of each subset in sequence, so as to reduce its difference from the weak group. Afterwards, the debiased subsets are merged and a fair dataset is returned. The fairness indicators used for verification are the F1 score and equalized odds. Comprehensive experiments using real-world Census Income data, COMPAS data, and bank marketing data as verification data showed that the proposed system outperformed the existing technique, yielding a better fairness improvement rate and higher accuracy for most machine learning algorithms.
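The subset generation step and the two fairness indicators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the row layout (dictionaries with `label` and `pred` keys), and the example attributes `sex` and `race` are all hypothetical, and the equalized odds gap is computed here as the larger of the true-positive-rate and false-positive-rate differences between two groups, which is one common convention and may differ from the definition used in the paper.

```python
from collections import defaultdict


def split_into_subsets(rows, attributes):
    """Group rows by the combination of values they take on `attributes`."""
    subsets = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attributes)
        subsets[key].append(row)
    return dict(subsets)


def confusion_counts(rows):
    """Return (tp, fp, fn, tn) for rows with binary 'label' and 'pred'."""
    tp = sum(1 for r in rows if r["label"] == 1 and r["pred"] == 1)
    fp = sum(1 for r in rows if r["label"] == 0 and r["pred"] == 1)
    fn = sum(1 for r in rows if r["label"] == 1 and r["pred"] == 0)
    tn = sum(1 for r in rows if r["label"] == 0 and r["pred"] == 0)
    return tp, fp, fn, tn


def f1_score(rows):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is zero."""
    tp, fp, fn, _ = confusion_counts(rows)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def equalized_odds_gap(group_a, group_b):
    """Assumed convention: max of the TPR gap and the FPR gap."""
    def rates(rows):
        tp, fp, fn, tn = confusion_counts(rows)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        return tpr, fpr

    tpr_a, fpr_a = rates(group_a)
    tpr_b, fpr_b = rates(group_b)
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))


# Hypothetical toy data: predictions from some already-trained classifier.
rows = [
    {"sex": "M", "race": "A", "label": 1, "pred": 1},
    {"sex": "M", "race": "A", "label": 0, "pred": 0},
    {"sex": "F", "race": "A", "label": 1, "pred": 0},
    {"sex": "F", "race": "A", "label": 0, "pred": 0},
]

subsets = split_into_subsets(rows, ["sex", "race"])
gap = equalized_odds_gap(subsets[("M", "A")], subsets[("F", "A")])
```

In this toy data the classifier recovers the positive example in one subset and misses it in the other, so the gap is large. A bias remover along the lines described above would then adjust the better-served group and recompute the indicators until the gap narrows.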