DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21240/16:00300156 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F16%3A00300156)
Result on the web
http://www.lrec-conf.org/proceedings/lrec2016/pdf/895_Paper.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
Original language description
The ever-increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need for large-scale training and evaluation corpora. Due to its size, openness, and relative quality, Wikipedia has already served as a source of such data, but only on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. The paper describes the properties of the Wikipedia texts, the corpus creation process, the corpus format, and interesting use cases such as Named Entity Linking training and evaluation.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
V - Research activity supported from other public sources
Others
Publication year
2016
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
ISBN
978-2-9517408-9-1
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
3339-3343
Publisher name
European Language Resources Association (ELRA)
Place of publication
Paris
Event location
Portorož
Event date
May 23, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000526952503091