Problems of Authorship Classification: recognizing the Author Style or a Book
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00023221%3A_____%2F23%3AN0000063" target="_blank" >RIV/00023221:_____/23:N0000063 - isvavai.cz</a>
Result on the web
<a href="https://www.digitalhumanities.org/dhq/vol/17/4/000723/000723.html" target="_blank" >https://www.digitalhumanities.org/dhq/vol/17/4/000723/000723.html</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Problems of Authorship Classification: recognizing the Author Style or a Book
Original language description
This article argues that a recurring problem in authorship attribution is that models learn to attribute text to a specific book rather than to its author, which often leads to overestimated reported performance. The problem stems from dataset construction in general and from the train-test split in particular. Using a heavily delexicalized and diverse dataset of Czech authors and basic LinearSVC classifiers, we designed a three-step experiment to explore book versus author attribution effects. First, the authorship attribution task is performed on a dataset split into train and test segments across books, so segments of the same book can appear on both sides. Second, the same task is performed on a dataset where each book is used wholly either for training or for testing. As expected, this leads to poorer results. Third, we attribute book segments not to authors but to the books themselves. This step reveals a general tendency to attribute segments to a specific book rather than to different books by the same author. The results indicate that authors who show higher internal confusion among their works (i.e., the model attributes segments of their works to other works of theirs) tend to perform better when an unseen book is attributed.
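The difference between the first two experimental steps can be illustrated with a minimal sketch. It is not the authors' code: the toy data, the function names (`split_across_books`, `split_whole_books`, `shared_books`) and the split sizes are all assumptions made for illustration, and no classifier is trained. It only shows how a segment-level split lets one book appear in both train and test data, while a book-level split does not.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: (author, book, segment_index) triples,
# 2 authors x 4 books x 5 segments. Not the dataset from the article.
segments = [
    (author, f"{author}_book{b}", s)
    for author in ("A", "B")
    for b in range(4)
    for s in range(5)
]


def split_across_books(items, n_test, seed=0):
    """Step 1 style: shuffle segments, ignoring which book they come from."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def split_whole_books(items, test_books_per_author=1, seed=0):
    """Step 2 style: hold out entire books, so no book is seen in training."""
    rng = random.Random(seed)
    books_by_author = defaultdict(set)
    for author, book, _ in items:
        books_by_author[author].add(book)
    held_out = set()
    for author in sorted(books_by_author):
        books = sorted(books_by_author[author])
        held_out.update(rng.sample(books, test_books_per_author))
    train = [item for item in items if item[1] not in held_out]
    test = [item for item in items if item[1] in held_out]
    return train, test


def shared_books(train, test):
    """Books that contribute segments to both the train and the test set."""
    return {book for _, book, _ in train} & {book for _, book, _ in test}


# 8 test segments cannot be made up of whole 5-segment books,
# so at least one book necessarily leaks into both sets.
train_1, test_1 = split_across_books(segments, n_test=8)
train_2, test_2 = split_whole_books(segments)

print("books shared, segment-level split:", len(shared_books(train_1, test_1)))
print("books shared, book-level split:", len(shared_books(train_2, test_2)))
```

In a scikit-learn pipeline, a book-level split of this kind is commonly obtained by passing book identifiers as `groups` to a group-aware splitter such as `GroupShuffleSplit`; whether the article used that utility is not stated in this record.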
Czech name
—
Czech description
—
Classification
Type
J<sub>ost</sub> - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
60500 - Other Humanities and the Arts
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Digital Humanities Quarterly
ISSN
1938-4122
e-ISSN
—
Volume of the periodical
2023
Issue of the periodical within the volume
17.4
Country of publishing house
US - UNITED STATES
Number of pages
22
Pages from-to
—
UT code for WoS article
—
EID of the result in the Scopus database
—