Improving Cross-domain Authorship Attribution by Combining Lexical and Syntactic Features: Notebook for PAN at CLEF 2019

Result identifiers

Result code in IS VaVaI
RIV/00216208:11320/19:10427040 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F19%3A10427040)
Result on the web
https://research.rug.nl/en/publications/improving-cross-domain-authorship-attribution-by-combining-lexica
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Improving Cross-domain Authorship Attribution by Combining Lexical and Syntactic Features: Notebook for PAN at CLEF 2019
Description in original language
Authorship attribution is a problem in information retrieval and computational linguistics that involves attributing authorship of an unknown document to an author within a set of candidate authors. Because of this, PAN-CLEF 2019 organized a shared task that involves creating a computational model that can determine the author of a fanfiction story. The task is cross-domain because of the open set of fandoms to which the documents belong. Additionally, the set of candidate authors is also open since the actual author of a document may not be among the candidate authors. We extracted character-level, word-level and syntactic information from the documents in order to train a support vector machine. Our approach yields an overall macro-averaged F1 score of 0.687 on the development data of the shared task. This is an improvement of 18.7% over the character-level lexical baseline. On the test data, our model achieves an overall macro F1 score of 0.644. We compare different feature types and find that character n-grams are the most informative feature type, though all tested feature types contribute to the performance of the model.
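As a rough illustration of two notions the description relies on, the sketch below counts character n-grams in a text and computes a macro-averaged F1 score. It is not the authors' implementation: the function names `char_ngrams` and `macro_f1` are hypothetical, the n-gram length of 3 is an arbitrary assumption, and the macro average here runs over every label seen in the gold or predicted lists, which may differ from the shared task's official scoring.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text (n=3 is an assumed default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def macro_f1(gold, predicted):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # A class with no correct predictions contributes an F1 of 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(char_ngrams("banana", n=2))
    print(macro_f1(["a", "a", "b"], ["a", "b", "b"]))
```

Because every class carries equal weight in the macro average, an author with few documents influences the score as much as a prolific one.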

Classification

Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of implementation
2019
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Similar results (10)

Combining Textual and Speech Features in the NLI Task Using State-of-the-Art Machine Learning Techniques
Authorship Verification based on Syntax Features
