All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Style Markers Based on Stop-word List

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F14%3A00077516" target="_blank" >RIV/00216224:14330/14:00077516 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Style Markers Based on Stop-word List

  • Original language description

    The analysis of author?s characteristic writing style and vocabulary has been used to uncover the identity of authors of documents by both manual linguistic approaches and automatic algorithmic methods. The revealing of the gender, name, or age can helpto expose pedophiles in social networks, false product reviews on the Internet servers, or machine translations submitted as manually translated texts. These problems are predominantly solved by a combination of stylometry and machine learning techniques. Since the stylometry focuses on the author?s style, word n-grams cannot be used as a style marker. Stop words are not influenced by a topic of documents, therefore they can be used to create style markers. In this paper, we present a guidance on how toimplement stop-word extraction and to include stop-words based style markers into a multilingual classification system based on the stylometry.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    <a href="/en/project/LM2010013" target="_blank" >LM2010013: LINDAT-CLARIN: Institute for analysis, processing and distribution of linguistic data</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach

Others

  • Publication year

    2014

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Article name in the collection

    Eighth Workshop on Recent Advances in Slavonic Natural Language Processing

  • ISBN

  • ISSN

    2336-4289

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    85-89

  • Publisher name

    Tribun EU

  • Place of publication

    Brno

  • Event location

    Brno

  • Event date

    Jan 1, 2014

  • Type of event by nationality

    CST - Celostátní akce

  • UT code for WoS article