Test-taking has become an integral part of the academic experience for Korean secondary EFL learners, who aim to excel in a highly competitive, exam-oriented environment. At the center of this high-stakes testing culture lies the College Scholastic Ability Test (CSAT). Gap-filling inference items in the CSAT English section have historically been the most difficult and most discriminating, posing significant challenges for test-takers. These items, however, have drawn criticism for being susceptible to test-wiseness, that is, the ability of test-takers to exploit cues independently of the knowledge or skills being assessed. Many practitioners, including teachers and administrators, question whether such items actually elicit the anticipated test-taking process, namely drawing logical inferences based on contextual understanding. The purpose of the present study is to examine the test-taking strategies used by test-takers while processing these gap-filling inference items. The study further discusses whether these test-taking processes align with the assessment's goals, in order to determine whether the items can legitimately claim cognitive validity, defined as the extent to which a test engages the theoretical cognitive processes the task was designed to elicit.
The present study explored 20 Korean high school senior EFL test-takers’ test-taking processes for six gap-filling inference items from the CSAT English section: inferencing tasks that require test-takers to draw on relevant textual information to infer missing information and thereby construct a coherent understanding of the text as a whole. Test performance was tracked via web-based eye-tracking measures, and subsequent stimulated recall interviews were conducted to verify the types of test-taking strategies involved and to describe how and why test-takers used them. Based on a predesigned coding scheme, later revised and refined through analysis of the transcribed interview data, the frequency of each test-taking strategy was recorded and visualized in a 9-quadrant matrix, with the horizontal axis representing construct-inappropriate strategies (IA) and the vertical axis representing construct-appropriate strategies (A), illustrating the interplay between the two strategy types across test-taking instances.
Among the nine possible quadrants, three scenarios that challenge the validity of gap-filling inference items were identified. Case 1 (High IA × Medium A) represents overuse of IA: successful instances in which test-takers relied heavily on IA despite limited use of A. Case 2 (Low IA × Medium A) represents underuse of IA, revealing that insufficient use of IA led to failure. Case 3 (High IA × High A) highlights unnecessary use of IA: successful instances in which test-takers with high A, who should theoretically not depend on IA, still relied on it to answer correctly. These cases were investigated using verbal reports from the stimulated recall interviews together with visualized eye-tracking data such as gaze plots and heatmaps.
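The quadrant classification described above can be sketched in code. The frequency cutoffs, function names, and bin thresholds below are illustrative assumptions for exposition only, not the study's actual coding values:

```python
# Hypothetical sketch of the 9-quadrant classification of test-taking instances.
# Per-instance strategy frequencies are binned into Low/Medium/High for
# construct-appropriate (A) and construct-inappropriate (IA) strategies;
# cutoffs and example frequencies are assumed, not taken from the study.

def bin_level(freq: int, low_cut: int = 2, high_cut: int = 5) -> str:
    """Bin a raw strategy frequency into Low / Medium / High (assumed cutoffs)."""
    if freq <= low_cut:
        return "Low"
    if freq <= high_cut:
        return "Medium"
    return "High"

def classify_instance(a_freq: int, ia_freq: int) -> tuple:
    """Place one test-taking instance into the matrix as (IA level, A level)."""
    return (bin_level(ia_freq), bin_level(a_freq))

# The three validity-challenging cases identified in the study, keyed by quadrant:
VALIDITY_CASES = {
    ("High", "Medium"): "Case 1: overuse of IA",
    ("Low", "Medium"): "Case 2: underuse of IA",
    ("High", "High"): "Case 3: unnecessary use of IA",
}

quadrant = classify_instance(a_freq=4, ia_freq=7)  # → ("High", "Medium")
print(VALIDITY_CASES.get(quadrant, "no validity concern"))  # prints "Case 1: overuse of IA"
```

Any quadrant not listed in `VALIDITY_CASES` falls outside the three problematic scenarios and raises no validity concern under this scheme.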
The findings reveal that each scenario, whether overuse, underuse, or unnecessary use of IA, undermines the cognitive validity of these items. Case 1 participants relied heavily on IA by bringing in preconceptions about where the sentence containing test-wise cues typically lies; these preconceptions served as shortcuts for locating stem-specific or central information, after which participants adhered narrowly to the content of those sentences to match an option. Case 2 participants employed IA by activating prior knowledge to compensate for their lack of higher-order strategies, particularly when text cohesion was insufficient or when inference based on implicit contextual coherence was required. Case 3 participants relied on IA by combining preconceived test-wise cues and prior knowledge either to guide their approach or to compensate for their inability to make precise text-based inferences. In all cases, the observed test-taking processes raised concerns about cognitive validity: overuse of construct-irrelevant skills led to item success, underuse resulted in failure, and assessing the construct independently of test-wiseness proved unfeasible.
The findings emphasize the need for pedagogical approaches that address the overreliance on IA strategies. Educators should focus on fostering higher-order reading skills, such as contextual understanding and text-based inferencing, to reduce dependence on construct-inappropriate strategies and enhance the cognitive validity of gap-filling inference items.