Determining Window Size from Plagiarism Corpus for Stylometric Features
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00084706" target="_blank" >RIV/00216224:14330/15:00084706 - isvavai.cz</a>
Result on the web
<a href="http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31" target="_blank" >http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24027-5_31" target="_blank" >10.1007/978-3-319-24027-5_31</a>
Alternative languages
Result language
English
Original language name
Determining Window Size from Plagiarism Corpus for Stylometric Features
Original language description
The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature to determine the optimal size of the sliding window. The experiment was conducted for a vocabulary richness method called "average word frequency class" using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of stop-word removal for sliding-window document profiling and discusses the use of the selected feature for intrinsic plagiarism detection. The experiment resulted in the recommendation to set the sliding window to around 100 words in length when computing the text profile with the average word frequency class stylometric feature.
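The profiling idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used definition of a word's frequency class, floor(log2(f(w*) / f(w))), where f(w*) is the count of the most frequent word in a reference corpus. The function names, the `step` parameter, and the handling of unseen words (counted as 1) are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def word_frequency_class(word, freq, max_freq):
    """Frequency class of a word: floor(log2(f(w*) / f(w))).

    Assumption: words missing from the reference counts are treated
    as if they occurred once, which places them in the rarest class.
    """
    return math.floor(math.log2(max_freq / freq.get(word, 1)))


def sliding_awfc(tokens, freq, window=100, step=50, stop_words=frozenset()):
    """Average word frequency class for each sliding window.

    `window` defaults to 100 words, the size recommended in the
    description; `step` is a hypothetical overlap setting.
    """
    # Optional stop-word removal before windowing.
    tokens = [t for t in tokens if t not in stop_words]
    max_freq = max(freq.values())
    profile = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = tokens[start:start + window]
        if not chunk:
            break
        classes = [word_frequency_class(t, freq, max_freq) for t in chunk]
        profile.append(sum(classes) / len(classes))
    return profile


# Toy reference counts (made up for the example, not corpus data).
toy_freq = Counter({"the": 8, "cat": 2, "sat": 1})
toy_profile = sliding_awfc(["the", "cat", "the", "sat"], toy_freq,
                           window=2, step=2)
```

With these toy counts, each window yields one value of the profile; passing a `stop_words` set removes those tokens before windowing, which is the trade-off the description refers to.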
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/LG13010" target="_blank" >LG13010: Czech Republic representation in the European Research Consortium for Informatics and Mathematics (ERCIM)</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>S - Specific research at universities<br>I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2015
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Experimental IR Meets Multilinguality, Multimodality, and Interaction
ISBN
9783319240268
ISSN
0302-9743
e-ISSN
—
Number of pages
7
Pages from-to
293-299
Publisher name
Springer International Publishing
Place of publication
Toulouse, France
Event location
Toulouse, France
Event date
Sep 8, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—