Query a corpus in near-natural language: A human-friendly corpus query language not only for linguists
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11210/24:10488519 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F24%3A10488519)
Result on the web
https://doi.org/10.1075/scl.119.10mil
DOI - Digital Object Identifier
10.1075/scl.119.10mil
Alternative languages
Result language
English
Original language name
Query a corpus in near-natural language: A human-friendly corpus query language not only for linguists
Original language description
This paper addresses the pressing issue of making corpora accessible to users who are unable or unwilling to learn a formal query language. It introduces a working online automatic translator from a near-natural language into the Corpus Query Language (CQL), as used in Sketch Engine, Czech National Corpus web applications, and other services. The translator does not require strict syntactic patterns and tolerates a certain number of typing errors by exploiting the redundancy inherent in natural language. It allows querying corpora of 35 languages hosted by the Czech National Corpus infrastructure, all of them annotated in the Universal Dependencies formalism. Alternatively, the translated CQL code can be used in other compatible systems. The paper presents the theoretical assumptions of our solution and outlines the details of its implementation, including examples of use.
Czech name
—
Czech description
—
Classification
Type
C - Chapter in a specialist book
CEP classification
—
OECD FORD branch
60203 - Linguistics
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Book/collection name
Studies in Corpus Linguistics
ISBN
978-90-272-1594-9
Number of pages of the result
15
Pages from-to
248-262
Number of pages of the book
266
Publisher name
John Benjamins
Place of publication
Amsterdam
UT code for WoS chapter
—