Py_ape: Text Data Acquiring, Extracting, Cleaning and Schema Matching in Python

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F20%3A10246988" target="_blank" >RIV/61989100:27240/20:10246988 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007%2F978-981-33-4370-2_6" target="_blank" >https://link.springer.com/chapter/10.1007%2F978-981-33-4370-2_6</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-981-33-4370-2_6" target="_blank" >10.1007/978-981-33-4370-2_6</a>

Alternative languages

Language of the result
English
Title in the original language
Py_ape: Text Data Acquiring, Extracting, Cleaning and Schema Matching in Python
Abstract in the original language
Py_ape is a Python package that integrates a number of string and text processing algorithms for collecting, extracting, and cleaning text data from websites, creating frames for text corpora, matching entities, and matching, mapping, and merging two schemas. Building on several popular Python libraries, the functions of Py_ape guide the user step by step through data integration and data preparation. In particular, for entity matching in the schema matching and merging phase, we used the Hamming distance algorithm to identify similar string pairs and the longest common substring similarity algorithm to map data between the columns of the two schemas. These algorithms help to increase the accuracy of the schema matching process. In addition, we present experimental results of using Py_ape to scrape, clean, match, and merge two datasets on aviation crashes taken from different sources, Kaggle and Wikipedia. The experimental results are evaluated in detail in the rest of the paper. (C) 2020, Springer Nature Singapore Pte Ltd.
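The abstract names two string-similarity measures but not how they are applied. The sketch below shows one plausible way to use them for entity matching and column mapping; it is not Py_ape's code. The function names (`hamming_distance`, `similar_pairs`, `longest_common_substring`, `lcs_similarity`, `map_columns`), the distance and similarity thresholds, and the normalisation 2·LCS/(|a|+|b|) are all assumptions made for illustration.

```python
from itertools import product


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def similar_pairs(left: list[str], right: list[str], max_distance: int = 1) -> list[tuple[str, str]]:
    """Hypothetical helper: return (l, r) pairs whose Hamming distance is at most max_distance.

    Pairs of unequal length are skipped because Hamming distance does not apply to them.
    """
    return [
        (a, b)
        for a, b in product(left, right)
        if len(a) == len(b) and hamming_distance(a, b) <= max_distance
    ]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    # prev[j] holds the length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2] by one character.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 2 * LCS length / (len(a) + len(b)), in [0, 1]."""
    if not a or not b:
        return 0.0
    return 2 * longest_common_substring(a, b) / (len(a) + len(b))


def map_columns(cols_a: list[str], cols_b: list[str], threshold: float = 0.5) -> dict[str, str]:
    """Hypothetical helper: map each column of schema A to its most similar column of schema B."""
    mapping: dict[str, str] = {}
    if not cols_b:
        return mapping
    for col in cols_a:
        # Compare case-insensitively; max() keeps the first column on ties.
        scores = {other: lcs_similarity(col.lower(), other.lower()) for other in cols_b}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            mapping[col] = best
    return mapping
```

With these definitions, `similar_pairs(["Lufthansa", "Air France"], ["Lufthanza", "Air Franse", "Air India"])` returns `[("Lufthansa", "Lufthanza"), ("Air France", "Air Franse")]`, and `map_columns(["Date", "Location", "Operator", "Fatalities"], ["crash_date", "crash_location", "airline_operator", "total_fatalities", "aboard"])` maps each column to its `crash_`, `airline_` or `total_` counterpart. These strings are invented for the example and are not taken from the paper's Kaggle or Wikipedia data. Because strict Hamming distance only compares equal-length strings, names that differ in length are never paired in this sketch.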

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result funding

Project
—
Funding
S - Specific research at universities

Other

Year of application
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the proceedings
Communications in Computer and Information Science. Volume 1306
ISBN
978-981-334-369-6
ISSN
1865-0929
e-ISSN
1865-0937
Number of pages
12
Pages (from-to)
78-89
Publisher
Springer
Place of publication
Singapore
Event venue
Quy Nhon
Event date
25 November 2020
Event type by geographic scope
WRD - Worldwide event
WoS UT code of the article
—

Similar results

Schema mapping in the Semantic Web environment
Automatic Ontology Linking
MAPSOM: User Involvement in Ontology Matching
