Style Markers Based on Stop-word List
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F14%3A00077516" target="_blank" >RIV/00216224:14330/14:00077516 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
angličtina
Original language name
Style Markers Based on Stop-word List
Original language description
The analysis of author?s characteristic writing style and vocabulary has been used to uncover the identity of authors of documents by both manual linguistic approaches and automatic algorithmic methods. The revealing of the gender, name, or age can helpto expose pedophiles in social networks, false product reviews on the Internet servers, or machine translations submitted as manually translated texts. These problems are predominantly solved by a combination of stylometry and machine learning techniques. Since the stylometry focuses on the author?s style, word n-grams cannot be used as a style marker. Stop words are not influenced by a topic of documents, therefore they can be used to create style markers. In this paper, we present a guidance on how toimplement stop-word extraction and to include stop-words based style markers into a multilingual classification system based on the stylometry.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/LM2010013" target="_blank" >LM2010013: LINDAT-CLARIN: Institute for analysis, processing and distribution of linguistic data</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach
Others
Publication year
2014
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Article name in the collection
Eighth Workshop on Recent Advances in Slavonic Natural Language Processing
ISBN
—
ISSN
2336-4289
e-ISSN
—
Number of pages
5
Pages from-to
85-89
Publisher name
Tribun EU
Place of publication
Brno
Event location
Brno
Event date
Jan 1, 2014
Type of event by nationality
CST - Celostátní akce
UT code for WoS article
—