An Approach for Textual Based Clustering Using Word Embedding

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/21:10442329 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F21%3A10442329)

  • Result on the web

  • DOI - Digital Object Identifier

Alternative languages

  • Language of the result

    English

  • Title in the original language

    An Approach for Textual Based Clustering Using Word Embedding

  • Description of the result in the original language

    Numerous efforts have been made to improve the retrieval procedure in Textual Case-Based Reasoning (TCBR) using clustering and feature selection strategies. The SOPHisticated Information Analysis (SOPHIA) approach is one of the most successful of these efforts; it is characterized by its ability to work without domain knowledge and without language dependency. SOPHIA is based on conditional probability, which supports an advanced Knowledge Discovery (KD) framework for case-based retrieval. SOPHIA attracts clusters through themes that each contain only one word. However, a single word is not sufficient to construct a cluster attractor, because excluding the other words associated with it in the same context cannot give a full picture of the theme. The main contribution of this chapter is an enhanced clustering approach called GloSOPHIA (GloVe SOPHIA), which extends SOPHIA by integrating a word embedding technique to enhance KD in TCBR. A new algorithm is proposed to feed SOPHIA with a vector space of similar terms obtained from the Global Vectors (GloVe) embedding technique. The proposed approach is evaluated on corpora in two different languages, and the results are compared with SOPHIA, K-means, and Self-Organizing Map (SOM) on several evaluation criteria. The results indicate that GloSOPHIA outperforms the other clustering methods on most of the evaluation criteria.

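The abstract's central idea, replacing a one-word theme with a group of semantically related terms drawn from a GloVe embedding space, can be illustrated with a minimal sketch. It assumes word vectors are already loaded into a dictionary (here a hypothetical toy dictionary, not real GloVe vectors), uses cosine similarity as the closeness measure, and introduces hypothetical parameters `k` and `min_sim`. It is not the GloSOPHIA algorithm from the chapter, whose details are not given in this record.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_theme(seed, embeddings, k=2, min_sim=0.5):
    """Hypothetical helper: extend a one-word theme with its k nearest
    neighbours in the embedding space, keeping only neighbours whose
    cosine similarity to the seed is at least min_sim."""
    if seed not in embeddings:
        return [seed]  # no vector available: the theme stays a single word
    seed_vec = embeddings[seed]
    scored = [
        (cosine(seed_vec, vec), word)
        for word, vec in embeddings.items()
        if word != seed
    ]
    scored.sort(reverse=True)  # most similar words first
    neighbours = [word for sim, word in scored[:k] if sim >= min_sim]
    return [seed] + neighbours

# Toy 3-dimensional vectors, invented for illustration only.
embeddings = {
    "court": [0.9, 0.1, 0.0],
    "judge": [0.8, 0.2, 0.1],
    "lawyer": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(expand_theme("court", embeddings))  # ['court', 'judge', 'lawyer']
```

The similarity threshold keeps unrelated words out of a theme even when `k` is large; how GloSOPHIA actually selects and weights the related terms is described in the chapter itself.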

Classification

  • Type

    C - Chapter in a scholarly book

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

Other

  • Year of application

    2021

  • Data confidentiality code

    S - Complete and accurate data on the project are not subject to protection under special legislation

Data specific to the result type

  • Title of the book or proceedings

    Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges

  • ISBN

    978-3-030-59337-7

  • Number of pages of the result

    20

  • Pages (from-to)

    261-280

  • Number of pages of the book

    323

  • Publisher

    Springer

  • Place of publication

    Cham

  • UT WoS code of the chapter