The representation of some phrases in Arabic word semantic vector spaces
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43955839" target="_blank" >RIV/49777513:23520/18:43955839 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0950705119302941" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0950705119302941</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1515/comp-2018-0017" target="_blank" >10.1515/comp-2018-0017</a>
Alternative languages
Result language
English
Title in the original language
The representation of some phrases in Arabic word semantic vector spaces
Result description in the original language
We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs that stand in the same relation to each other are similarly aligned. 2. We suggest that addition of difference-vectors is a useful phrase-building operator. 3. We propose that pairs in the same relation may have similar relative frequencies. 4. We suggest that homographs, which necessarily share a single semantic vector, can sometimes be separated into different vectors for different senses, using frequency estimates and alignment constraints obtained from word analogies. 5. We observe that some of our analogies seem to be parallel and might be combined. We use Arabic words as a case study because Arabic orthography includes verb conjugations, object pronouns, definite articles, possessive pronouns, and some prepositions in single word-forms. Therefore, a number of short phrases, built from easily perceived constituents, are already present in stock semantic spaces for Arabic available on the web. Similar phrases in English would require including bigrams or trigrams as lemmas in the word embedding, although English derivational morphology supports other relationships in standard semantic spaces that Arabic morphology does not, such as negation. We make our corpus of morphological relations available to other researchers.
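Points 1 and 2 of the description can be illustrated with a minimal sketch. It assumes toy three-dimensional vectors invented for this example (not values from any real Arabic embedding) and uses the definite article as the morphological relation; the dictionary `toy_space` and the helper names are hypothetical and are not taken from the article.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration. They are NOT
# taken from any real Arabic embedding; the keys are Latin transliterations.
toy_space = {
    "kitab": [1.0, 0.0, 0.0],     # "book"
    "al-kitab": [1.0, 1.0, 0.0],  # "the book" (article attached to the noun)
    "bayt": [0.0, 0.0, 1.0],      # "house"
    "al-bayt": [0.1, 1.0, 1.0],   # "the house"
}

def subtract(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def add(u, v):
    """Component-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(vector, space, exclude=()):
    """Return the word in `space` whose vector is most similar to `vector`."""
    candidates = [w for w in space if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, space[w]))

# Point 1: difference-vectors for two pairs in the same relation
# (bare noun -> noun with definite article) should be similarly aligned.
diff_book = subtract(toy_space["al-kitab"], toy_space["kitab"])
diff_house = subtract(toy_space["al-bayt"], toy_space["bayt"])
alignment = cosine(diff_book, diff_house)

# Point 2: adding a difference-vector to a base word as a phrase-building step.
composed = add(toy_space["bayt"], diff_book)
predicted = nearest(composed, toy_space, exclude=("bayt",))

print(f"alignment of difference-vectors: {alignment:.3f}")
print(f"bayt + (al-kitab - kitab) is closest to: {predicted}")
```

With a real pre-trained embedding, the same two measurements would be taken over many pairs per relation rather than a single hand-built pair, and the result would depend on the quality of the embedding.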
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2018
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
Open Computer Science
ISSN
2299-1093
e-ISSN
—
Periodical volume
8
Issue number within the volume
1
Country of the periodical's publisher
PL - Republic of Poland
Number of pages
12
Pages from-to
182-193
UT WoS code of the article
000473498500001
EID of the result in the Scopus database
2-s2.0-85060464530