Crowd Sourcing as an Improvement of N-Grams Text Document Classification Algorithm

The result's identifiers

  • Result code in IS VaVaI

    RIV/61989592:15410/20:73606106 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989592%3A15410%2F20%3A73606106)

  • Result on the web

    https://obd.upol.cz/id_publ/333185992

  • DOI - Digital Object Identifier

    10.1109/SMAP49528.2020.9248454 (http://dx.doi.org/10.1109/SMAP49528.2020.9248454)

Alternative languages

  • Result language

    English

  • Original language name

    Crowd Sourcing as an Improvement of N-Grams Text Document Classification Algorithm

  • Original language description

    A common task in natural language processing is text classification, useful for e.g. spam filters, document sorting, classification of scientific articles, or plagiarism detection. This is still done best and most accurately by humans; on the other hand, we can often accept a certain error in the classification in exchange for speed. Here, a natural language processing mechanism transforms the natural-language text into a form understandable by a classifier such as K-Nearest Neighbour, Decision Trees, Artificial Neural Networks or Support Vector Machines. We can also use the human element to help automated classification improve its accuracy by means of crowdsourcing. This work deals with the classification of text documents and its improvement through crowdsourcing. Its goal is to design and implement a prototype text document classifier based on document similarity, and to design a mechanism for evaluation and crowdsourcing-based classification improvement. For classification, the N-grams algorithm was chosen and implemented in Java. The crowdsourcing interface was created using the WordPress CMS. In addition to data collection, the purpose of the interface is to evaluate classification accuracy, which leads to an extension of the classifier's test data set and thus to more successful classification. We have tested our approach on two data sets with promising preliminary results, even across different languages. This led to a real-world implementation started at the beginning of 2019 in cooperation between two universities: VŠB-TUO and OSU.

  • Czech name

  • Czech description
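
The N-grams, similarity-based classification mentioned in the original language description can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of their algorithm, so the class name `NGramClassifier`, the use of character trigrams, and cosine similarity as the document-similarity measure are all assumptions made for illustration. The `train` method also hints at how a crowd-verified label might extend the classifier's data, which is again only an assumed reading of the description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: character n-gram profiles compared by cosine similarity.
class NGramClassifier {
    private final int n;
    // One accumulated n-gram frequency profile per category label.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    NGramClassifier(int n) {
        this.n = n;
    }

    // Count overlapping character n-grams in lower-cased, whitespace-normalised text.
    static Map<String, Integer> ngrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Add a labelled document to a category profile. A label confirmed by
    // crowd workers could be fed back through this same method (assumption).
    void train(String label, String text) {
        Map<String, Integer> profile = profiles.computeIfAbsent(label, k -> new HashMap<>());
        ngrams(text, n).forEach((gram, count) -> profile.merge(gram, count, Integer::sum));
    }

    // Return the label whose profile is most similar to the document, or null if untrained.
    String classify(String text) {
        Map<String, Integer> doc = ngrams(text, n);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(doc, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramClassifier classifier = new NGramClassifier(3);
        classifier.train("spam", "win a free prize now, claim your free money");
        classifier.train("science", "the experiment measured classification accuracy on two data sets");
        System.out.println(classifier.classify("claim your prize"));
    }
}
```

Character n-grams need no tokenizer or dictionary, which is one plausible reason such an approach could work across different languages, as the description reports; the training texts in `main` are invented examples, not the data sets used in the paper.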

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

    I - Institutional support for the long-term conceptual development of a research organization

Others

  • Publication year

    2020

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    SMAP 2020 - 15th International Workshop on Semantic and Social Media Adaptation and Personalization

  • ISBN

    978-1-72815-919-5

  • ISSN

  • e-ISSN

  • Number of pages

    5

  • Pages from-to

    1-6

  • Publisher name

    IEEE Computer Society Press

  • Place of publication

    New York

  • Event location

    Zakynthos

  • Event date

    Oct 29, 2020

  • Type of event by nationality

    EUR - European event

  • UT code for WoS article