Multilingual Stylometry. The Influence of Language on the Performance of Authorship Attribution using Corpora from the European Literary Text Collection (ELTeC)
The result's identifiers
Result code in IS VaVaI
RIV/68378068:_____/24:00603253 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68378068%3A_____%2F24%3A00603253)
Result on the web
https://ceur-ws.org/Vol-3834/paper9.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Multilingual Stylometry. The Influence of Language on the Performance of Authorship Attribution using Corpora from the European Literary Text Collection (ELTeC)
Original language description
Stylometric authorship attribution is concerned with the task of assigning texts of unknown, pseudonymous or disputed authorship to their most likely author, often based on a comparison of the frequency of a selected set of features that represent the texts. The parameters of the analysis, such as feature selection and the choice of similarity measure or classification algorithm, have received significant attention in the past. Two additional key factors for the performance and reliability of stylometric methods, however, have so far received less attention, namely corpus composition and corpus language. As a first step, the aim of this study is to investigate the influence of language on the performance of stylometric authorship attribution. We address this question using four different corpora derived from the European Literary Text Collection (ELTeC). We use machine translation to obtain each corpus in the other three languages. We find that, as expected, the attribution accuracy varies between language-based corpora, and that translated corpora, on average, display a lower attribution accuracy compared to their counterparts in the original language. Overall, our study contributes to a better understanding of stylometric methods of authorship attribution.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
60206 - Specific literatures
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
CHR 2024: Computational Humanities Research 2024: Proceedings of the Computational Humanities Research Conference 2024
ISBN
—
ISSN
1613-0073
e-ISSN
—
Number of pages
23
Pages from-to
386-408
Publisher name
Technical University & CreateSpace Independent Publishing
Place of publication
Aachen
Event location
Aarhus
Event date
Dec 4, 2024
Type of event by nationality
EUR - European event
UT code for WoS article
—