Machine-learning methods for item difficulty prediction using item text features
The result's identifiers
Result code in IS VaVaI
RIV/67985807:_____/22:00570007 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F22%3A00570007)
Result on the web
https://www.psychometricsociety.org/sites/main/files/file-attachments/imps2022-version-7-abstract.pdf?1656714871
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Machine-learning methods for item difficulty prediction using item text features
Original language description
BASIC DATA: IMPS 2022 International Meeting of the Psychometric Society. Book of Abstracts (Talks, Posters). Bologna: Psychometric Society, 2022. p. 161. [IMPS 2022. International Meeting of the Psychometric Society. 11.07.2022-15.07.2022, Bologna].
ABSTRACT: Item difficulty predictions based on text features extracted from item wordings may help to build a test appropriately, particularly when pre-tests are limited. In this work, we examine and compare different machine-learning methods for predicting item difficulty using features from text analysis of item wordings. We employ multivariate regression, support vector machines, regression trees, random forests, and back-propagation neural networks in both frameworks, i.e., as supervised regression and as supervised classification algorithms. Furthermore, for item difficulty classification, we also build a naïve Bayes classifier and a multivariate regression designed in a multinomial fashion. While the supervised regression algorithms treat item difficulty as a continuous dependent variable, the supervised classification approaches treat it as a variable split into a few disjoint classes. The methods are illustrated on items from an English language test of the Czech matura exam. Although the regression and classification tasks cannot be compared with each other, the models differ in performance within each task. Under k-fold cross-validation and several performance metrics, support vector machines and random forests usually outperform the other methods.
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
GA21-03658S: Theoretical foundations of computational psychometrics (/en/project/GA21-03658S)
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations