All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

Arabic Spelling Error Detection and Correction

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F16%3A10335394" target="_blank" >RIV/00216208:11320/16:10335394 - isvavai.cz</a>

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    angličtina

  • Original language name

    Arabic Spelling Error Detection and Correction

  • Original language description

    A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)

  • CEP classification

    AI - Linguistics

  • OECD FORD branch

Result continuities

  • Project

    <a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2016

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Name of the periodical

    Natural Language Engineering

  • ISSN

    1351-3249

  • e-ISSN

  • Volume of the periodical

    22

  • Issue of the periodical within the volume

    5

  • Country of publishing house

    GB - UNITED KINGDOM

  • Number of pages

    23

  • Pages from-to

    751-773

  • UT code for WoS article

  • EID of the result in the Scopus database

    2-s2.0-84925238826