(An) exploratory validation of standardized EFL speaking tests based on the theoretical framework of test usefulness

Multilingual Abstract

The primary purpose of this study was to investigate the test usefulness of five leading standardized speaking tests. The task characteristics, scoring rubrics, and test methods of these tests were evaluated in order to further investigate the sources of variance that influence speaking performance.
The following components of test usefulness were checked and measured: reliability, construct validity, authenticity, and interactiveness. Differences in task types, testing methods, and scoring methods were identified as sources of variance that influence speaking assessments. Therefore, the different test tasks and the contribution of each task to the measurement of a test taker’s speaking ability were examined for test usefulness. Due to limited resources, actual one-on-one interviews are not always feasible in L2 testing conditions. Therefore, the commonly substituted Simulated Oral Proficiency Interview (SOPI) method was administered in this study, and its results were compared with those of the Oral Proficiency Interview (OPI) method. As different tasks require different scoring rubrics, the nature of these scoring rubrics was also examined to determine their impact on test takers.
To address these issues, two tasks from the Test of English for International Communication (TOEIC) Speaking, the Test of English Proficiency developed by Seoul National University (TEPS) Speaking, the International English Language Testing System (IELTS) Speaking – General Training, the Test of English as a Foreign Language (TOEFL) iBT Speaking, and the American Council on the Teaching of Foreign Languages (ACTFL) Oral Proficiency Interview Computer Test (OPIc) were administered to seventy-four college students. These standardized tests were administered using the SOPI method, while the one-on-one interview was conducted following the OPI method. Examinees’ performances were rated three times: first according to a rubric based on communicative language ability (CLA), then according to the rubric that originally accompanied each task, and finally according to a holistic rubric. Task completion was added to the CLA rubric to further examine task effects on the test takers. The data were then analyzed using several analytic methods, including a multi-faceted Rasch model and factor analysis.
The results indicated that the TOEFL and the IELTS had the highest overall usefulness, characterized by a good degree of authenticity and interactiveness. The TOEIC, on the other hand, was the least useful test, with an ill-defined Target Language Use (TLU) task and TLU domain. Factor analysis revealed that, although the preliminary reliability estimation and previous research showed high correlations among all the tests and the interview, no test loaded on the same factor as the interview. Therefore, while the OPI may not be a replica of real communication, it was at least found to measure different constructs of speaking ability than the tests administered with the SOPI method.
For the task evaluation, the results indicated that overall, the TOEFL and the IELTS with integrated tasks had the highest test usefulness. These two tasks were also the most difficult tasks for the test takers and they both had a high degree of authenticity and interactiveness. On the other hand, a low degree of authenticity and interactiveness did not necessarily coincide with ease of the test task. Therefore, the qualities of a test task should not be evaluated independently, nor determined by a single quality of a given test. The findings also revealed latent factors based not on the operational constructs of speaking ability, but according to the tasks. Different scoring rubrics yielded different performance measures; however, a CLA rubric and holistic scoring were consistent in producing stable measures, regardless of the different tasks.
Based on the findings on test usefulness and the variables that affect speaking ability, it is recommended first to develop a test with a well-defined TLU domain in order to improve the quality of speaking assessments in current English as a Foreign Language (EFL) settings. The correspondences between the TLU domain, the TLU task, and the test task were shown to be the most important features of test usefulness. Next, it was found that assessing speaking ability via the SOPI method is necessary but does not by itself provide sufficient evidence of the test taker’s speaking ability. Therefore, the SOPI method should be accompanied by the OPI method. When such assessments are not feasible, as in most of our L2 learning environments, task selection for a SOPI test should include at least one task that is similar to a one-on-one interview in terms of task characteristics and test usefulness.

Table of Contents

TABLE OF CONTENTS
CHAPTER 1
INTRODUCTION
1.1 Purpose of the Study
1.2 Research Questions
CHAPTER 2
LITERATURE REVIEW
2.1 Communicative Language Ability
2.2 Test Usefulness
2.2.1 Qualities of Test Usefulness
2.2.2 Test Usefulness for the Five Tests
2.2.3 Language Use in Language Tests
2.3 OPI, SOPI, and COPI
2.4 Holistic Rating versus Analytic Rating
2.5 The Rasch Model
CHAPTER 3
METHODOLOGY
3.1 Participants
3.1.1 Examinees
3.1.2 Raters
3.2 Instruments
3.2.1 Five Standardized Speaking Tests
3.2.1.1 Test Descriptions
3.2.1.2 TLU Domain
3.2.2 Task Characteristics and the Checklist
3.2.3 Rating Scales
3.2.3.1 CLA Rubric
3.2.3.2 Each Task Rubric
3.3 Data Collection and Scoring Procedures
3.3.1 Test Administration Procedures
3.3.2 Scoring Procedures
3.4 Analysis
3.4.1 Computer Equipment and Software
3.4.2 Descriptive Statistics
3.4.3 Reliability Analysis
3.4.4 Rasch Measurement
3.4.5 Factor Analysis
CHAPTER 4
RESULTS AND DISCUSSION
4.1 Authenticity and Interactiveness
4.1.1 The TOEIC
4.1.1.1 Task 2
4.1.1.2 Task 3
4.1.2 The TEPS
4.1.2.1 Task 4
4.1.2.2 Task 5
4.1.3 The IELTS
4.1.3.1 Task 6
4.1.3.2 Task 7
4.1.4 The OPIc: Task 8
4.1.5 The TOEFL
4.1.5.1 Task 9
4.1.5.2 Task 10
4.1.6 Summary
4.2 Descriptive Statistics and Reliability Estimation
4.2.1 Descriptive Statistics
4.2.1.1 CLA Rubric
4.2.1.2 Each Rubric
4.2.2 Reliability Estimation
4.2.2.1 Inter-Rater Reliability for Rating Scales
4.2.2.2 Inter-Rater Reliability for Each Task
4.2.2.3 Internal Consistency for Rating Scales
4.3 Construct Validity
4.3.1 FACETS Analysis
4.3.1.1 All FACETS Vertical Ruler
4.3.1.2 Measurement Report
4.3.2 Factor Analysis
4.3.3 Summary
4.4 Test Task Validation
4.4.1 FACETS Analysis for the Test Tasks
4.4.1.1 All FACETS Vertical Ruler
4.4.1.2 Measurement Report
4.4.1.2.1 Examinee Report
4.4.1.2.2 Rater Report
4.4.1.2.3 Task Report
4.4.1.2.4 Criteria Report
4.4.1.3 Category Statistics
4.4.1.4 Fit Statistics and Unexpected Responses
4.4.1.4.1 Fit Statistics and Overfit Values
4.4.1.4.2 Misfit Values and Unexpected Responses
4.4.1.5 Bias Interaction Report
4.4.1.6 Criteria Measurement by Task
4.4.2 Factor Analysis for the Test Tasks
4.4.2.1 Factor Analysis by Task
4.4.2.2 Factor Analysis on All Items
4.5 Analysis of the Scoring Rubric
4.5.1 FACETS Analysis with Correlation Coefficient
4.5.2 Measures’ Correlation Coefficient
4.5.2.1 CLA Rubric
4.5.2.2 Holistic Rubric
4.5.2.3 Each Rubric
4.6 Summary and Discussions
CHAPTER 5
CONCLUSION AND IMPLICATIONS
5.1 Summary
5.2 Implications
5.3 Limitations
REFERENCES
APPENDICES
