arTenTen: Arabic Corpus and Word Sketches

Result identifiers

  • Result code in IS VaVaI

    RIV/00216224:14330/14:00073241 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F14%3A00073241)

  • Result on the web

    http://www.sciencedirect.com/science/article/pii/S1319157814000330

  • DOI - Digital Object Identifier

    10.1016/j.jksuci.2014.06.009 (http://dx.doi.org/10.1016/j.jksuci.2014.06.009)

Alternative languages

  • Result language

    English

  • Original language name

    arTenTen: Arabic Corpus and Word Sketches

  • Original language description

    We present arTenTen, a web-crawled corpus of Arabic, gathered in 2012. arTenTen consists of 5.8-billion words. A chunk of it has been lemmatized and part-of-speech (POS) tagged with the MADA tool and subsequently loaded into Sketch Engine, a leading corpus query tool, where it is open for all to use. We have also created 'word sketches': one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behavior. We use examples to demonstrate what the corpus can show us regarding Arabic words and phrases and how this can support lexicography and inform linguistic research. The article also presents the 'sketch grammar' (the basis for the word sketches) in detail, describes the process of building and processing the corpus, and considers the role of the corpus in additional research on Arabic.

  • Czech name

  • Czech description

Classification

  • Type

    Jx - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    The result was created during the realization of more than one project. More information is available in the Projects tab.

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2014

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Journal of King Saud University-Computer and Information Sciences

  • ISSN

    1319-1578

  • e-ISSN

  • Volume of the periodical

    2014

  • Issue of the periodical within the volume

    26

  • Country of publishing house

    NL - THE KINGDOM OF THE NETHERLANDS

  • Number of pages

    15

  • Pages from-to

    381-395

  • UT code for WoS article

  • EID of the result in the Scopus database