Text-to-Ontology Mapping via Natural Language Processing with Application to Search for Relevant Ontologies in Catalysis
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985807%3A_____%2F23%3A00568306" target="_blank" >RIV/67985807:_____/23:00568306 - isvavai.cz</a>
Nalezeny alternativní kódy
RIV/68407700:21240/23:00364070
Výsledek na webu
<a href="https://dx.doi.org/10.3390/computers12010014" target="_blank" >https://dx.doi.org/10.3390/computers12010014</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.3390/computers12010014" target="_blank" >10.3390/computers12010014</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Text-to-Ontology Mapping via Natural Language Processing with Application to Search for Relevant Ontologies in Catalysis
Popis výsledku v původním jazyce
The paper presents a machine-learning based approach to text-to-ontology mapping. We explore a possibility of matching texts to the relevant ontologies using a combination of artificial neural networks and classifiers. Ontologies are formal specifications of the shared conceptualizations of application domains. While describing the same domain, different ontologies might be created by different domain experts. To enhance the reasoning and data handling of concepts in scientific papers, finding the best fitting ontology regarding description of the concepts contained in a text corpus. The approach presented in this work attempts to solve this by selection of a representative text paragraph from a set of scientific papers, which are used as data set. Then, using a pre-trained and fine-tuned Transformer, the paragraph is embedded into a vector space. Finally, the embedded vector becomes classified with respect to its relevance regarding a selected target ontology. To construct representative embeddings, we experiment with different training pipelines for natural language processing models. Those embeddings in turn are later used in the task of matching text to ontology. Finally, the result is assessed by compressing and visualizing the latent space and exploring the mappings between text fragments from a database and the set of chosen ontologies. To confirm the differences in behavior of the proposed ontology mapper models, we test five statistical hypotheses about their relative performance on ontology classification. To categorize the output from the Transformer, different classifiers are considered. These classifiers are, in detail, the Support Vector Machine (SVM), k-Nearest Neighbor, Gaussian Process, Random Forest, and Multilayer Perceptron. Application of these classifiers in a domain of scientific texts concerning catalysis research and respective ontologies, the suitability of the classifiers is evaluated, where the best result was achieved by the SVM classifier.
Název v anglickém jazyce
Text-to-Ontology Mapping via Natural Language Processing with Application to Search for Relevant Ontologies in Catalysis
Popis výsledku anglicky
The paper presents a machine-learning based approach to text-to-ontology mapping. We explore a possibility of matching texts to the relevant ontologies using a combination of artificial neural networks and classifiers. Ontologies are formal specifications of the shared conceptualizations of application domains. While describing the same domain, different ontologies might be created by different domain experts. To enhance the reasoning and data handling of concepts in scientific papers, finding the best fitting ontology regarding description of the concepts contained in a text corpus. The approach presented in this work attempts to solve this by selection of a representative text paragraph from a set of scientific papers, which are used as data set. Then, using a pre-trained and fine-tuned Transformer, the paragraph is embedded into a vector space. Finally, the embedded vector becomes classified with respect to its relevance regarding a selected target ontology. To construct representative embeddings, we experiment with different training pipelines for natural language processing models. Those embeddings in turn are later used in the task of matching text to ontology. Finally, the result is assessed by compressing and visualizing the latent space and exploring the mappings between text fragments from a database and the set of chosen ontologies. To confirm the differences in behavior of the proposed ontology mapper models, we test five statistical hypotheses about their relative performance on ontology classification. To categorize the output from the Transformer, different classifiers are considered. These classifiers are, in detail, the Support Vector Machine (SVM), k-Nearest Neighbor, Gaussian Process, Random Forest, and Multilayer Perceptron. Application of these classifiers in a domain of scientific texts concerning catalysis research and respective ontologies, the suitability of the classifiers is evaluated, where the best result was achieved by the SVM classifier.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Ostatní
Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Computers
ISSN
2073-431X
e-ISSN
—
Svazek periodika
12
Číslo periodika v rámci svazku
1
Stát vydavatele periodika
CH - Švýcarská konfederace
Počet stran výsledku
25
Strana od-do
14
Kód UT WoS článku
000914556100001
EID výsledku v databázi Scopus
2-s2.0-85146810459