Utilizing Text Similarity Measurement for Data Compression to Detect Plagiarism in Czech

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F15%3A86092341" target="_blank" >RIV/61989100:27240/15:86092341 - isvavai.cz</a>
Alternative codes found
RIV/61989100:27740/15:86092341
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-13572-4_13" target="_blank" >http://dx.doi.org/10.1007/978-3-319-13572-4_13</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-13572-4_13" target="_blank" >10.1007/978-3-319-13572-4_13</a>

Alternative languages

Result language
English
Title in original language
Utilizing Text Similarity Measurement for Data Compression to Detect Plagiarism in Czech
Result description in original language
This paper attempts to apply a data-compression-based similarity method to plagiarism detection. The method has previously been used for plagiarism detection in Arabic and English. In this paper, we apply it to Czech texts from a local multi-domain Czech corpus containing 50 original documents with non-plagiarized parts and 100 suspicious documents. The documents were generated so that each document could contain from 1 to 5 paragraphs. The suspicion rate in the documents was chosen randomly between 0.2 and 0.8. The findings of the study show that similarity measurement based on Lempel-Ziv comparison algorithms is efficient for detecting the plagiarized parts of Czech text documents, with a success rate of 82.60%. Future studies may improve the efficiency of the algorithms by including combined and more sophisticated methods.
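The abstract does not state the exact similarity formula, so the following is only a minimal sketch under stated assumptions. It uses the normalized compression distance (NCD), NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with Python's standard `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for the Lempel-Ziv comparison used by the authors. The helper names `ncd` and `flag_paragraphs`, the blank-line paragraph splitting, and the 0.5 threshold are hypothetical illustrations, not values from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib (DEFLATE, LZ77-based) compression at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Values near 0 mean the compressor finds one text almost fully predictable
    from the other; values near 1 mean the texts share little structure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")  # UTF-8 keeps Czech diacritics intact
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def flag_paragraphs(suspicious: str, source_paragraphs: list[str],
                    threshold: float = 0.5) -> list[int]:
    """Return indices of paragraphs in `suspicious` that lie close to some source paragraph.

    Hypothetical helper: paragraphs are separated by blank lines, and a paragraph
    is flagged when its smallest NCD to any source paragraph is below `threshold`.
    """
    paragraphs = [p for p in suspicious.split("\n\n") if p.strip()]
    flagged = []
    for i, para in enumerate(paragraphs):
        # No sources means nothing to match against, so default to maximal distance.
        best = min((ncd(para, src) for src in source_paragraphs), default=1.0)
        if best < threshold:
            flagged.append(i)
    return flagged
```

The sketch compares paragraph against paragraph rather than a paragraph against a whole original document. NCD compares complete inputs, so a short copied paragraph measured against a much longer original still receives a large distance, because most of the original's compressed size is not explained by the paragraph.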

Classification

Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—

Result links

Project
—
Links
S - Specific university research

Other

Year of application
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Result type-specific data

Proceedings title
Advances in Intelligent Systems and Computing. Volume 334
ISBN
978-3-319-13571-7
ISSN
2194-5357
e-ISSN
—
Number of pages
10
Pages (from-to)
163-182
Publisher
Springer
Place of publication
New York
Event location
Addis Ababa
Event date
17 November 2014
Event type by nationality
WRD - Worldwide event
UT WoS article code
—
