The expected sum of edge lengths in planar linearizations of trees
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ANHTSS7K8" target="_blank" >RIV/00216208:11320/25:NHTSS7K8 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188338511&doi=10.15398%2fjlm.v12i1.362&partnerID=40&md5=ae510ecacbed631e27e96f2ddead0113" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188338511&doi=10.15398%2fjlm.v12i1.362&partnerID=40&md5=ae510ecacbed631e27e96f2ddead0113</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.15398/jlm.v12i1.362" target="_blank" >10.15398/jlm.v12i1.362</a>
Alternative languages
Result language
English
Title in the original language
The expected sum of edge lengths in planar linearizations of trees
Result description in the original language
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or their variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), that can be computed in time O(n). Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an O(n)-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline. © 2024 Institute of Computer Science, Polish Academy of Sciences. All rights reserved.
Title in English
The expected sum of edge lengths in planar linearizations of trees
Result description in English
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or their variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), that can be computed in time O(n). Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an O(n)-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline. © 2024 Institute of Computer Science, Polish Academy of Sciences. All rights reserved.
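The definitions in the description above (sum of edge lengths, planar and projective orderings) can be illustrated with a small brute-force sketch. It is not the O(n)-time algorithm of the paper: it enumerates every permutation, so it is only usable for very short sentences. All function names are hypothetical, and the tree is assumed to be given as a list of vertex pairs over vertices 0..n-1.

```python
from fractions import Fraction
from itertools import permutations


def sum_edge_lengths(edges, pos):
    """Sum of |pos[u] - pos[v]| over all edges; pos[v] is the position of vertex v."""
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def is_planar(edges, pos):
    """True if no two edges cross when drawn above the sentence."""
    spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            # Two edges cross when exactly one endpoint of one edge
            # lies strictly inside the span of the other.
            if a < c < b < d or c < a < d < b:
                return False
    return True


def is_projective(edges, pos, root):
    """Planar, and the root is not strictly covered by any edge."""
    covered = any(
        min(pos[u], pos[v]) < pos[root] < max(pos[u], pos[v])
        for u, v in edges
    )
    return is_planar(edges, pos) and not covered


def expected_sum(n, edges, accept):
    """Exact mean of the sum of edge lengths over all orderings accepted by `accept`.

    Brute force over n! permutations (illustration only).
    """
    totals = [
        sum_edge_lengths(edges, pos)
        for pos in permutations(range(n))
        if accept(pos)
    ]
    return Fraction(sum(totals), len(totals))


if __name__ == "__main__":
    # A three-word sentence whose tree is the path 0 - 1 - 2, rooted at 0.
    edges = [(0, 1), (1, 2)]
    print(expected_sum(3, edges, lambda p: is_planar(edges, p)))
    print(expected_sum(3, edges, lambda p: is_projective(edges, p, 0)))
```

On this three-vertex example every ordering is planar, while the projective baseline excludes the orderings in which the root sits between the other two words, so the two expectations already differ.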
Classification
Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Journal of Language Modelling
ISSN
2299-856X
e-ISSN
—
Volume of the periodical
12
Issue of the periodical within the volume
1
Country of the periodical's publisher
US - United States of America
Number of pages
42
Pages from-to
1-42
UT code of the WoS article
—
EID of the result in the Scopus database
2-s2.0-85188338511