Machine-learning methods for item difficulty prediction using item text features
Result identifiers
Result code in IS VaVaI
RIV/67985807:_____/22:00570007 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F22%3A00570007)
Result on the web
https://www.psychometricsociety.org/sites/main/files/file-attachments/imps2022-version-7-abstract.pdf?1656714871
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Machine-learning methods for item difficulty prediction using item text features
Result description in original language
BASIC DATA: IMPS 2022 International Meeting of the Psychometric Society. Book of Abstracts (Talks, Posters). Bologna: Psychometric Society, 2022. p. 161. [IMPS 2022. International Meeting of the Psychometric Society. 11.07.2022-15.07.2022, Bologna]. ABSTRACT: Predicting item difficulty from text features extracted from item wordings may help in assembling a test appropriately, particularly when pre-testing is limited. In this work, we examine and compare machine learning methods for predicting item difficulty from features obtained by text analysis of item wordings. We employ multivariate regression, support vector machines, regression trees, random forests, and back-propagation neural networks in two frameworks: as supervised regression algorithms and as supervised classification algorithms. For item difficulty classification, we additionally build a naïve Bayes classifier and a multinomial variant of the multivariate regression. While the supervised regression algorithms treat item difficulty as a continuous dependent variable, the supervised classification approaches treat it as a variable split into a few disjoint classes. The methods are illustrated on items from an English language test of the Czech matura exam. Although the regression and classification tasks cannot be compared directly with each other, the models differ in performance within each task. Under k-fold cross-validation and several performance metrics, support vector machines and random forests usually outperform the other methods.
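The evaluation scheme described in the abstract (k-fold cross-validation on a continuous difficulty score, plus a split of that score into a few disjoint classes for the classification framework) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the abstract: the data are synthetic, the single feature `word_counts` is hypothetical, and two deliberately simple models (a mean baseline and a one-feature least-squares line) stand in for the support vector machines, random forests, and other methods the authors actually compared.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Average root-mean-square error over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(statistics.fmean(sq_err) ** 0.5)
    return statistics.fmean(scores)


def to_classes(ys, n_classes=3):
    """Split a continuous difficulty into disjoint classes at quantile cuts."""
    cuts = statistics.quantiles(ys, n=n_classes)
    return [sum(y > c for c in cuts) for y in ys]


# Synthetic items (hypothetical): difficulty grows with word count plus noise.
rng = random.Random(42)
word_counts = [rng.uniform(5, 40) for _ in range(60)]
difficulty = [0.05 * w + rng.gauss(0, 0.3) for w in word_counts]

rmse_mean = cv_rmse(fit_mean, word_counts, difficulty)
rmse_line = cv_rmse(fit_line, word_counts, difficulty)
classes = to_classes(difficulty)  # labels for the classification framework

print(f"baseline RMSE: {rmse_mean:.3f}")
print(f"linear RMSE:   {rmse_line:.3f}")
print("class sizes:", [classes.count(c) for c in sorted(set(classes))])
```

Using the same folds for every model keeps the comparison fair within the regression task. The class labels produced by `to_classes` would be scored with classification metrics instead, which is why results from the two tasks are not directly comparable.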
Classification
Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
GA21-03658S: Teoretické základy výpočetní psychometrie (Theoretical foundations of computational psychometrics)
Linkages
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2022
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations