The representation of some phrases in Arabic word semantic vector spaces
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F18%3A43955839" target="_blank" >RIV/49777513:23520/18:43955839 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0950705119302941" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0950705119302941</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1515/comp-2018-0017" target="_blank" >10.1515/comp-2018-0017</a>
Alternative languages
Result language
English
Original language name
The representation of some phrases in Arabic word semantic vector spaces
Original language description
We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs which have the same relation to each other are similarly aligned. 2. We suggest that addition of difference-vectors is a useful phrase-building operator. 3. We propose that pairs in the same relation may have similar relative frequencies. 4. We suggest that homographs, which necessarily have the same semantic vectors, can sometimes be separated into different vectors for different senses, using frequency estimates and alignment constraints obtained from word analogies. 5. We observe that some of our analogies seem to be parallel, and might be combined. We use Arabic words as a case study, because Arabic orthography includes verb conjugations, object pronouns, definite articles, possessive pronouns, and some prepositions in single word-forms. Therefore, a number of short phrases, built up of easily perceived constituents, are already present in stock semantic spaces for Arabic available on the web. Similar phrases in English would require including bigrams or trigrams as lemmas in the word embedding, although English derivational morphology allows for other relationships in standard semantic spaces which Arabic does not, for example negation. We make our corpus of morphological relations available to other researchers.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/LO1506" target="_blank" >LO1506: Sustainability support of the centre NTIS - New Technologies for the Information Society</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2018
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Open Computer Science
ISSN
2299-1093
e-ISSN
—
Volume of the periodical
8
Issue of the periodical within the volume
1
Country of publishing house
PL - POLAND
Number of pages
12
Pages from-to
182-193
UT code for WoS article
000473498500001
EID of the result in the Scopus database
2-s2.0-85060464530