English Dataset For Automatic Forum Extraction

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F19%3A43956032" target="_blank" >RIV/49777513:23520/19:43956032 - isvavai.cz</a>
Result on the web
<a href="https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/viewFile/3259/2679" target="_blank" >https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/viewFile/3259/2679</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.13053/CyS-23-3-3259" target="_blank" >10.13053/CyS-23-3-3259</a>

Alternative languages

Result language
English
Title in the original language
English Dataset For Automatic Forum Extraction
Result description in the original language
This paper describes the process of collecting, maintaining and exploiting an English dataset of web discussions. The dataset consists of many web discussions with hand-annotated posts in the context of a tree structure of a web page. Each post consists of username, date, text, and citations used by its author. The dataset contains 79 different websites with at least 500 pages from each. Each web page consists of a tree structure of HTML tags with texts taken from selected web pages. In the paper, we also describe algorithms trained on the dataset. The algorithms employ basic architectures (such as a bag of words with an SVM classifier and an LSTM network) to set a baseline for the dataset.
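The data layout described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the dataset's actual file format or the authors' code: the `Node` and `TreeBuilder` names, the sample page, and the use of `class` attributes as stand-ins for the hand annotations are all hypothetical. It builds a tree of HTML tags with their texts, collects one post (username, date, text, citations), and derives simple bag-of-words counts of the kind a baseline classifier might consume.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    """One HTML tag in the page tree, with the text found directly inside it."""
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree. Assumes well-nested HTML without void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        current = self.stack[-1]
        current.text = (current.text + " " + data.strip()).strip()


def iter_nodes(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def extract_post(html):
    """Collect the four post fields from nodes carrying a (hypothetical) label."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    post = {"username": "", "date": "", "text": "", "citations": []}
    for node in iter_nodes(builder.root):
        # Assumption: the class attribute plays the role of the hand annotation.
        label = node.attrs.get("class")
        if label == "citation":
            post["citations"].append(node.text)
        elif label in ("username", "date", "text"):
            post[label] = node.text
    return post


# Hypothetical sample page, not taken from the dataset.
SAMPLE = """
<div class="post">
  <span class="username">alice</span>
  <span class="date">2019-05-01</span>
  <blockquote class="citation">bob wrote: hello</blockquote>
  <p class="text">Hello Bob, thanks for the reply</p>
</div>
"""

post = extract_post(SAMPLE)
# Bag-of-words counts over the post text, as input for a baseline classifier.
features = Counter(post["text"].lower().split())
```

In the dataset itself the labels come from manual annotation, so a real loader would read them from the annotation files rather than from `class` attributes.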

Classification

Type
Jimp - Article in a journal indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
<a href="/cs/project/EF17_048%2F0007267" target="_blank" >EF17_048/0007267: Research and development of intelligent components of advanced technologies for the Pilsen metropolitan area</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
S - Specific research at universities
I - Institutional support for the long-term conceptual development of a research organisation

Other

Year of implementation
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Journal name
Computación y Sistemas
ISSN
1405-5546
e-ISSN
—
Journal volume
23
Issue number within the volume
3
Publisher's country
MX - United Mexican States
Number of pages
7
Pages from-to
765-771
UT WoS code of the article
000489136900014
EID of the result in the Scopus database
2-s2.0-85076633364

Similar results (10)

Comparing web pages in terms of inner structure
Box Clustering Segmentation: A New Method for Vision-based Page Preprocessing
Software module for importing data from Slovak web portals
