A Language Framework for Measuring Semantic and Syntactic Similarity for Arabic Texts

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A3F85CGXQ" target="_blank" >RIV/00216208:11320/25:3F85CGXQ - isvavai.cz</a>

  • Result on the web

    <a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188898036&doi=10.1007%2fs42979-024-02691-x&partnerID=40&md5=b78c4d0c2a44025a094611d2030a6de4" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188898036&doi=10.1007%2fs42979-024-02691-x&partnerID=40&md5=b78c4d0c2a44025a094611d2030a6de4</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s42979-024-02691-x" target="_blank" >10.1007/s42979-024-02691-x</a>

Alternative languages

  • Result language

    English

  • Title in the original language

    A Language Framework for Measuring Semantic and Syntactic Similarity for Arabic Texts

  • Result description in the original language

    A language framework for determining the similarity of two text snippets is proposed. The edit distance concept is employed as a frame algorithm to capture syntactic and semantic similarities. In the proposed work, syntax-level distances between lemma-form words are calculated, while partial edit costs are allowed to embed semantic similarity measurements. Many knowledge resources have been used, such as word synonyms, negation rules, and word semantic spaces. A researchable Arabic thesaurus dictionary is built in two forms, surface form and lemma form. Semantic word spaces are generated from a word embedding model, which represents words in a vector space. The algorithm is enhanced to handle differing word orders between sentences by a word permutation technique that selects the alignment of the snippet words yielding the best matching score. The effect of negation words on textual similarity is also studied. The proposed approach was implemented to measure the similarity between Arabic-language texts. Results are compared with other state-of-the-art algorithms on two benchmark datasets. The experimental results show that the proposed approach achieves a higher Pearson correlation coefficient than the other works. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2024.

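The description above outlines a word-level edit distance whose substitution cost is partial rather than all-or-nothing, plus a permutation step for word order. The following is a minimal sketch of that general idea, not the authors' implementation. The synonym set, the two-dimensional vectors, the zero cost for synonyms, the `1 - cosine` cost, and all function names are illustrative assumptions. Lemmatization, Arabic-specific resources, and negation rules are not covered.

```python
from itertools import permutations
from math import sqrt

# Hypothetical toy resources; the paper's thesaurus and embeddings are not reproduced here.
SYNONYMS = {frozenset(("big", "large"))}
VECTORS = {"cat": (1.0, 0.0), "dog": (0.8, 0.6), "car": (0.0, 1.0)}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def substitution_cost(a, b):
    """Partial edit cost in [0, 1] for replacing word a with word b."""
    if a == b or frozenset((a, b)) in SYNONYMS:
        return 0.0  # assumption: identical words and listed synonyms are free
    if a in VECTORS and b in VECTORS:
        # assumption: cost falls as embedding similarity rises
        return 1.0 - max(0.0, cosine(VECTORS[a], VECTORS[b]))
    return 1.0  # no semantic evidence: full substitution cost


def weighted_edit_distance(s, t):
    """Word-level Levenshtein distance with partial substitution costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)  # delete i words
    for j in range(1, n + 1):
        d[0][j] = float(j)  # insert j words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,
                d[i][j - 1] + 1.0,
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return d[m][n]


def similarity(s, t):
    """Distance normalized by the longer word count and mapped to [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - weighted_edit_distance(s, t) / longest


def best_alignment_similarity(s, t):
    """Brute-force word permutation: best score over all orderings of s."""
    return max(similarity(list(p), t) for p in permutations(s))
```

The brute-force permutation search grows factorially with the number of words, so it is only practical for very short snippets. The abstract does not say how the paper's permutation technique limits this cost.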

Classification

  • Type

    J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

Others

  • Publication year

    2024

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Name of the periodical

    SN Computer Science

  • ISSN

    2662-995X

  • e-ISSN

  • Volume of the periodical

    5

  • Issue of the periodical within the volume

    4

  • Country of the periodical's publisher

    US - United States of America

  • Number of pages

    14

  • Pages from-to

    1-14

  • UT WoS code of the article

  • EID of the result in the Scopus database

    2-s2.0-85188898036