Classification of Poverty Condition Using Natural Language Processing

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A88HHHD7W" target="_blank" >RIV/00216208:11320/22:88HHHD7W - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/s11205-022-02883-z" target="_blank" >https://doi.org/10.1007/s11205-022-02883-z</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s11205-022-02883-z" target="_blank" >10.1007/s11205-022-02883-z</a>

Alternative languages

Result language
English
Title in the original language
Classification of Poverty Condition Using Natural Language Processing
Result description in the original language
This work introduces a methodology to distinguish between poor and extremely poor people through Natural Language Processing. The approach serves as a baseline to understand and classify poverty through people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers who need to accurately find people to be prioritized in social programs.
Title in English
Classification of Poverty Condition Using Natural Language Processing
Result description in English
This work introduces a methodology to distinguish between poor and extremely poor people through Natural Language Processing. The approach serves as a baseline to understand and classify poverty through people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers who need to accurately find people to be prioritized in social programs.
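
The first strategy named in the description, document-level features built by concatenating descriptive statistics of word vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the function name `document_features`, and the choice of four statistics (mean, standard deviation, minimum, maximum) are assumptions made only for illustration. The record does not say which statistics the paper uses.

```python
import numpy as np

# Hypothetical toy word vectors; a real system would load pretrained
# embeddings (for example GloVe) instead of this hand-made table.
TOY_VECTORS = {
    "work": np.array([0.2, 0.1, 0.7]),
    "food": np.array([0.9, 0.3, 0.1]),
    "rent": np.array([0.4, 0.8, 0.5]),
}


def document_features(tokens, vectors=TOY_VECTORS):
    """Concatenate per-dimension statistics of a document's word vectors."""
    # Keep only tokens that have a vector; unknown words are skipped.
    rows = [vectors[t] for t in tokens if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not rows:
        # No known words: return an all-zero feature vector of the same size.
        return np.zeros(4 * dim)
    matrix = np.vstack(rows)  # shape: (n_known_tokens, dim)
    # Assumed statistics: mean, standard deviation, minimum, maximum.
    stats = [
        matrix.mean(axis=0),
        matrix.std(axis=0),
        matrix.min(axis=0),
        matrix.max(axis=0),
    ]
    return np.concatenate(stats)  # shape: (4 * dim,)


features = document_features(["work", "rent", "unknown", "food"])
print(features.shape)
```

Each document, whatever its length, maps to a fixed-size vector that a classifier such as a Support Vector Machine could consume. The second strategy, Gaussian mixture models, is not shown here.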

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Name of the periodical
Social Indicators Research [online]
ISSN
1573-0921
e-ISSN
1573-0921
Volume of the periodical
162
Issue of the periodical within the volume
3
Country of the periodical's publisher
NL - Netherlands
Number of pages
23
Pages from-to
1413-1435
UT WoS code of the article
000752794700001
EID of the result in the Scopus database
2-s2.0-85124366997

Similar results (10)

Vector representation of context networks of latent topics
3D face recognition based on the hierarchical score-level fusion classifiers
Contextual latent semantic networks used for document classification
