Determining Window Size from Plagiarism Corpus for Stylometric Features

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216224:14330/15:00084706 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00084706)

  • Result on the web

    http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31

  • DOI - Digital Object Identifier

    10.1007/978-3-319-24027-5_31 (http://dx.doi.org/10.1007/978-3-319-24027-5_31)

Alternative languages

  • Result language

    English

  • Original language name

    Determining Window Size from Plagiarism Corpus for Stylometric Features

  • Original language description

    The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature to determine the optimal size of the sliding window. The experiment was conducted for a vocabulary richness measure called average word frequency class, using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of stop-word removal for sliding-window document profiling and discusses the use of the selected feature for intrinsic plagiarism detection. The experiment resulted in the recommendation to set the sliding window to around 100 words in length when computing the text profile with the average word frequency class stylometric feature.

  • Czech name

  • Czech description
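
Illustration (not part of the record): the sliding-window profiling described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The frequency class of a word is taken as floor(log2(f(w*) / f(w))), where w* is the most frequent word; here the frequencies are counted from the document itself, whereas a real setup would more likely use a large reference corpus. The function names, the tokenizer, and the `step` parameter are hypothetical; only the window size of about 100 words comes from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_classes(tokens):
    """Map each word to floor(log2(f(w*) / f(w))), w* being the most frequent word.

    Assumption: frequencies are counted from the given tokens, not from an
    external reference corpus.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: math.floor(math.log2(top / count)) for word, count in counts.items()}


def awfc_profile(tokens, classes, window=100, step=50):
    """Return the average word frequency class of each sliding window.

    window=100 follows the size recommended in the abstract; step=50 is an
    arbitrary illustrative overlap.
    """
    profile = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = tokens[start:start + window]
        if chunk:
            profile.append(sum(classes[word] for word in chunk) / len(chunk))
    return profile
```

Windows whose average class deviates strongly from the rest of the profile are the kind of signal that intrinsic plagiarism detection looks for. Stop-word removal, whose pros and cons the paper discusses, would amount to filtering `tokens` before calling `awfc_profile`.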

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    LG13010: Czech Republic representation in the European Research Consortium for Informatics and Mathematics (ERCIM)

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)
    S - Specific research at universities
    I - Institutional support for the long-term conceptual development of a research organisation

Others

  • Publication year

    2015

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Experimental IR Meets Multilinguality, Multimodality, and Interaction

  • ISBN

    9783319240268

  • ISSN

    0302-9743

  • e-ISSN

  • Number of pages

    7

  • Pages from-to

    293-299

  • Publisher name

    Springer International Publishing

  • Place of publication

    Toulouse, France

  • Event location

    Toulouse, France

  • Event date

    Sep 8, 2015

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article