Topic modeling and classification of scientific disciplines

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985955%3A_____%2F22%3A00566673" target="_blank" >RIV/67985955:_____/22:00566673 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.5281/zenodo.6957149" target="_blank" >https://doi.org/10.5281/zenodo.6957149</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Topic modeling and classification of scientific disciplines
Result description in original language
This paper evaluates the possibility of classifying Ph.D. theses into disciplines using a bottom-up empirical approach based on topic modeling. It examines a dataset of 334810 Ph.D. theses submitted at French universities between 2006 and 2020. In this comprehensive dataset, the variable “discipline” does not rely on any controlled vocabulary or disciplinary ontology. Consequently, the variable has 23057 unique labels, of which 14538 appear only once. This situation makes any full-scale analysis of the data from the perspective of scientific disciplines impossible. Our topic model is built on the abstracts of the 285311 theses in French that include a title, keywords, and an abstract. Applying the TopSBM algorithm yielded a topic model with 7 levels of hierarchy. The outcomes of our classification experiments suggest that topics derived from purely textual data implicitly capture information about disciplines. This property of topic modeling can be of great benefit when dealing with datasets where disciplinary information is unavailable or unreliable and where citation records are absent (as remains the case especially in the Humanities).
Title in English
Topic modeling and classification of scientific disciplines
Result description in English
This paper evaluates the possibility of classifying Ph.D. theses into disciplines using a bottom-up empirical approach based on topic modeling. It examines a dataset of 334810 Ph.D. theses submitted at French universities between 2006 and 2020. In this comprehensive dataset, the variable “discipline” does not rely on any controlled vocabulary or disciplinary ontology. Consequently, the variable has 23057 unique labels, of which 14538 appear only once. This situation makes any full-scale analysis of the data from the perspective of scientific disciplines impossible. Our topic model is built on the abstracts of the 285311 theses in French that include a title, keywords, and an abstract. Applying the TopSBM algorithm yielded a topic model with 7 levels of hierarchy. The outcomes of our classification experiments suggest that topics derived from purely textual data implicitly capture information about disciplines. This property of topic modeling can be of great benefit when dealing with datasets where disciplinary information is unavailable or unreliable and where citation records are absent (as remains the case especially in the Humanities).

Classification

Type
O - Other results
CEP field
—
OECD FORD field
50803 - Information science (social aspects)

Result linkages

Project
<a href="/cs/project/GJ20-01752Y" target="_blank" >GJ20-01752Y: Grant and non-grant research in the Czech Republic: scientometric analysis and topic modeling</a>
Linkages
I - Institutional support for the long-term conceptual development of a research organisation

Other

Year of implementation
2022
Data confidentiality code
S - Complete and accurate data on the project are not subject to protection under special legal regulations

Similar results (10)

Mapping knowledge. Topic analysis of science locates researchers in disciplinary landscape
Template for FNSPE Extended Ph.D. Thesis Abstract in TeX
Možnosti francouzské literatury 20. a 21. století pro psaní závěrečných prací na katedře francouzského jazyka a literatury
