Authorship and Time Attribution of Arabic Texts Using JGAAP
Result identifiers
Result code in IS VaVaI
RIV/00216208:11210/18:10365864 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F18%3A10365864)
Result on the web
http://dx.doi.org/10.1007/978-3-319-67056-0_16
DOI - Digital Object Identifier
10.1007/978-3-319-67056-0_16
Alternative languages
Result language
English
Title in original language
Authorship and Time Attribution of Arabic Texts Using JGAAP
Result description in original language
One basic task in natural language processing is text classification, such as sorting documents by their content. A less well-known variant of this task is classifying documents by inferred metadata, such as the document's (inferred) language, date of composition, or authorship. Authorship attribution is a well-studied problem, but most of the work has been done in major European languages such as English. [Notable exceptions who have studied Arabic, in particular, include. We present a study based on a selection from a new corpus (CLAUDia) containing nearly half a billion words of Arabic text. Using a standard authorship analysis tool (JGAAP), we examine the effects of author, genre, and time of composition on writing style and, by extension, on classification. We have selected a subcorpus balanced to permit comparisons between genres as well as between time periods, to see how the best-performing methods change with genre and time. We also provide an analysis of a larger variety of feature sets than has previously been examined for Arabic.
Title in English
Authorship and Time Attribution of Arabic Texts Using JGAAP
Result description in English
One basic task in natural language processing is text classification, such as sorting documents by their content. A less well-known variant of this task is classifying documents by inferred metadata, such as the document's (inferred) language, date of composition, or authorship. Authorship attribution is a well-studied problem, but most of the work has been done in major European languages such as English. [Notable exceptions who have studied Arabic, in particular, include. We present a study based on a selection from a new corpus (CLAUDia) containing nearly half a billion words of Arabic text. Using a standard authorship analysis tool (JGAAP), we examine the effects of author, genre, and time of composition on writing style and, by extension, on classification. We have selected a subcorpus balanced to permit comparisons between genres as well as between time periods, to see how the best-performing methods change with genre and time. We also provide an analysis of a larger variety of feature sets than has previously been examined for Arabic.
Classification
Type
C - Chapter in a specialist book
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
GA13-28220S: Struktury kultury: Arabsko-islámská kultura prismatem korpusové lingvistiky
Linkages
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the book or collection
Intelligent Natural Language Processing: Trends and Applications
ISBN
978-3-319-67056-0
Number of pages of the result
25
Pages from-to
325-349
Number of pages of the book
776
Publisher name
Springer
Place of publication
Cham
UT WoS code of the chapter
—