The purpose of the present study is to assess second language (L2) spoken English using automated scoring techniques. Automated scoring aims to classify a large set of learners' oral performance data into a small number of discrete oral proficiency levels. In automated scoring, objectively measurable features, such as the frequencies of lexical and grammatical items, are generally used as explanatory variables to predict oral proficiency levels, which serve as the criterion variable. We chose the NICT JLE Corpus, a corpus of 1,281 Japanese EFL learners' speech productions coded into nine oral proficiency levels (Izumi, Uchimoto, & Isahara, 2004). The nine oral proficiency levels were used as the criterion variable, and the linguistic features analyzed in Biber (1988) were used as explanatory variables. We employed random forests (Breiman, 2001), a powerful method for text classification and feature extraction, to predict oral proficiency. According to the out-of-bag error estimate of the random forests model, 60.11% of the productions were correctly classified. Compared to the baseline accuracy of the simplest possible algorithm, which always chooses the most frequent level (37.63%), our random forests model improved prediction by 22.48 percentage points. The Pearson product-moment correlation coefficient with human scoring was 0.85. The predictors that most clearly discriminated oral proficiency levels were, in descending order of strength, tokens, types, and the frequency of nouns.
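The evaluation measures reported above (classification accuracy, the most-frequent-level baseline, the improvement in percentage points, and the Pearson correlation with human scoring) can be illustrated with a minimal sketch. It does not reproduce the study's analysis: the function names and the ten-item toy level lists are hypothetical and chosen only for illustration, and the only figures taken from the study are the two reported accuracies used in the final line.

```python
from collections import Counter
from math import sqrt


def majority_baseline_accuracy(levels):
    """Accuracy obtained by always predicting the most frequent level."""
    most_frequent_count = Counter(levels).most_common(1)[0][1]
    return most_frequent_count / len(levels)


def accuracy(predicted, actual):
    """Proportion of items whose predicted level equals the human-assigned level."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data (NOT from the NICT JLE Corpus): proficiency levels
# assigned by human raters and levels predicted by some classifier.
human_levels = [4, 4, 4, 5, 3, 4, 6, 5, 4, 3]
model_levels = [4, 4, 5, 5, 3, 4, 5, 5, 4, 4]

baseline = majority_baseline_accuracy(human_levels)
model_acc = accuracy(model_levels, human_levels)
improvement_points = (model_acc - baseline) * 100
r = pearson_r(model_levels, human_levels)

print(f"baseline accuracy: {baseline:.2%}")
print(f"model accuracy:    {model_acc:.2%}")
print(f"improvement:       {improvement_points:.2f} percentage points")
print(f"Pearson r:         {r:.2f}")

# The same subtraction applied to the accuracies reported in the study.
reported_improvement = round(60.11 - 37.63, 2)
```

Applied to the reported figures, the subtraction gives the 22.48-point improvement stated above. The improvement is a difference in percentage points, not a relative gain.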