Optimizing Long Text Classification Performance Through Keyword-Based Sentence Selection: A Case Study on Online News Classification for Indonesian GDP Growth-Rate Detection
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ARN4BHU6M" target="_blank" >RIV/00216208:11320/25:RN4BHU6M - isvavai.cz</a>
Result on the web
<a href="https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/11904" target="_blank" >https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/11904</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.30598/barekengvol18iss2pp1081-1094" target="_blank" >10.30598/barekengvol18iss2pp1081-1094</a>
Alternative languages
Result language
English
Title in original language
Optimizing Long Text Classification Performance Through Keyword-Based Sentence Selection: A Case Study on Online News Classification for Indonesian GDP Growth-Rate Detection
Result description in original language
Efficiently managing lengthy textual data, particularly in online news, is crucial for enhancing the performance of long text classification. This study delves into innovative approaches to streamline the Gross Domestic Product (GDP) computation process by harnessing modern data analytics, Natural Language Processing (NLP), and online news sources. Leveraging online news data introduces real-time information, promising to improve the accuracy and timeliness of economic indicators like GDP. However, handling the complexity of extensive textual data poses a challenge, demanding advanced NLP techniques. This research shifts from traditional word-weight-based methods to keyword-based extractive summarization techniques to address this. These tailored approaches ensure that selected sentences align precisely with specific keywords relevant to the research case, such as GDP growth rate detection. The study emphasizes the necessity of adapting summarization methods to capture information in unique research contexts effectively. According to classification results, the implementation of sentence selection successfully demonstrated improved performance in terms of classification accuracy. Specifically, there was an average accuracy increase of 0.0226 for machine learning and 0.0164 for transfer learning models. Additionally, in terms of computational efficiency, sentence selection also accelerates processing time during hyperparameter tuning and fine-tuning, as observed using the same computational resources.
Title in English
Optimizing Long Text Classification Performance Through Keyword-Based Sentence Selection: A Case Study on Online News Classification for Indonesian GDP Growth-Rate Detection
Result description in English
Efficiently managing lengthy textual data, particularly in online news, is crucial for enhancing the performance of long text classification. This study delves into innovative approaches to streamline the Gross Domestic Product (GDP) computation process by harnessing modern data analytics, Natural Language Processing (NLP), and online news sources. Leveraging online news data introduces real-time information, promising to improve the accuracy and timeliness of economic indicators like GDP. However, handling the complexity of extensive textual data poses a challenge, demanding advanced NLP techniques. This research shifts from traditional word-weight-based methods to keyword-based extractive summarization techniques to address this. These tailored approaches ensure that selected sentences align precisely with specific keywords relevant to the research case, such as GDP growth rate detection. The study emphasizes the necessity of adapting summarization methods to capture information in unique research contexts effectively. According to classification results, the implementation of sentence selection successfully demonstrated improved performance in terms of classification accuracy. Specifically, there was an average accuracy increase of 0.0226 for machine learning and 0.0164 for transfer learning models. Additionally, in terms of computational efficiency, sentence selection also accelerates processing time during hyperparameter tuning and fine-tuning, as observed using the same computational resources.
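The keyword-based sentence selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the naive sentence splitter, and the function names `split_sentences` and `select_sentences` are all assumptions made for illustration, and the record does not state which keywords or tools the study used.

```python
import re

# Hypothetical keyword list; the record does not list the keywords actually used.
KEYWORDS = ("gdp", "growth", "economy")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_sentences(text, keywords=KEYWORDS):
    """Keep, in original order, only sentences that mention at least one keyword."""
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords
    ]
    kept = [s for s in split_sentences(text) if any(p.search(s) for p in patterns)]
    return " ".join(kept)


# Made-up example article, not taken from the study's data.
article = (
    "The festival drew large crowds on Sunday. "
    "Analysts expect GDP to expand this quarter. "
    "Traffic was heavy downtown. "
    "Household spending supported economic growth."
)
shortened = select_sentences(article)
```

The shortened text, rather than the full article, would then be passed to a classifier, which is where the reported gains in accuracy and processing time would be expected to arise.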
Classification
Type
J<sub>ost</sub> - Other articles in peer-reviewed periodicals
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of implementation
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
BAREKENG: Jurnal Ilmu Matematika dan Terapan
ISSN
2615-3017
e-ISSN
—
Periodical volume
18
Issue number within the volume
2
Country of the periodical's publisher
US - United States of America
Number of pages of the result
14
Pages from-to
1081-1094
UT WoS code of the article
—
EID of the result in the Scopus database
—