LatinCy: Synthetic Trained Pipelines for Latin NLP
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F23%3A8ILUKGMT" target="_blank" >RIV/00216208:11320/23:8ILUKGMT - isvavai.cz</a>
Result on the web
<a href="http://arxiv.org/abs/2305.04365" target="_blank" >http://arxiv.org/abs/2305.04365</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.48550/arXiv.2305.04365" target="_blank" >10.48550/arXiv.2305.04365</a>
Alternative languages
Result language
English
Title in original language
LatinCy: Synthetic Trained Pipelines for Latin NLP
Result description in original language
"This paper introduces LatinCy, a set of trained general-purpose Latin-language "core" pipelines for use with the spaCy natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependencies treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields POS tagging, 97.41% accuracy; lemmatization, 94.66% accuracy; morphological tagging, 92.76% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a spaCy model available for NLP work."
Title in English
LatinCy: Synthetic Trained Pipelines for Latin NLP
Result description in English
"This paper introduces LatinCy, a set of trained general-purpose Latin-language "core" pipelines for use with the spaCy natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependencies treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields POS tagging, 97.41% accuracy; lemmatization, 94.66% accuracy; morphological tagging, 92.76% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a spaCy model available for NLP work."
Classification
Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Year of application
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations