Problems of Authorship Classification: recognizing the Author Style or a Book

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00023221%3A_____%2F23%3AN0000063" target="_blank" >RIV/00023221:_____/23:N0000063 - isvavai.cz</a>
Result on the web
<a href="https://www.digitalhumanities.org/dhq/vol/17/4/000723/000723.html" target="_blank" >https://www.digitalhumanities.org/dhq/vol/17/4/000723/000723.html</a>
DOI - Digital Object Identifier
—

Result language
English
Original language name
Problems of Authorship Classification: recognizing the Author Style or a Book
Original language description
The article argues that one problem in authorship attribution tasks is that models attribute a specific book rather than its author, which often leads to overestimated reported performance. The problem stems in general from dataset construction and, more specifically, from the train-test split. Using a heavily delexicalized and diverse dataset of Czech authors and basic LinearSVC classifiers, we designed a three-step experiment to explore book versus author attribution effects. First, the authorship attribution task is performed on a dataset split into training and test segments across books. Second, the same task is performed on a dataset where each book is used wholly either for training or for testing. As expected, this leads to poorer results. Third, we attribute book segments not to authors but to the books themselves. This step reveals a general tendency towards attributing a segment to a specific book rather than to different books by the same author. The results indicate that authors who show higher inner confusion among their works (i.e., the model attributes their works to other works of theirs) tend to perform better in the attribution of an unseen book.
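The difference between the first two steps of the experiment comes down to how segments are assigned to training and test data. The following is a minimal sketch of the two split strategies, not the authors' code: the toy corpus, the function names, and the 25% test fraction are all assumptions made for illustration, and the classifier itself is omitted. It only shows that a segment-level split lets the same book appear on both sides, while a book-level split does not.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: (author, book, segment_id) triples.
# 3 authors x 4 books x 10 segments = 120 segments.
segments = [
    (author, f"{author}_book{b}", s)
    for author in ("author_a", "author_b", "author_c")
    for b in range(4)
    for s in range(10)
]


def split_across_books(segments, test_fraction=0.25, seed=0):
    """Step 1 style: shuffle all segments, ignoring book boundaries."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_whole_books(segments, test_books_per_author=1, seed=0):
    """Step 2 style: hold out entire books, so every test book is unseen."""
    rng = random.Random(seed)
    books_by_author = defaultdict(set)
    for author, book, _ in segments:
        books_by_author[author].add(book)
    test_books = set()
    for author, books in sorted(books_by_author.items()):
        test_books.update(rng.sample(sorted(books), test_books_per_author))
    train = [s for s in segments if s[1] not in test_books]
    test = [s for s in segments if s[1] in test_books]
    return train, test


def shared_books(train, test):
    """Books that contribute segments to both the training and the test set."""
    return {book for _, book, _ in train} & {book for _, book, _ in test}


train_1, test_1 = split_across_books(segments)
train_2, test_2 = split_by_whole_books(segments)

print("books leaking in segment-level split:", len(shared_books(train_1, test_1)))
print("books leaking in book-level split:", len(shared_books(train_2, test_2)))
```

With a segment-level split, a classifier can score well by recognizing the book a test segment came from. The book-level split removes that shortcut, which is consistent with the poorer results the description reports for the second step.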
Czech name
—
Czech description
—

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
