Data Quality Problems in TPC-DI Based Data Integration Processes
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F18%3A00103077" target="_blank" >RIV/00216224:14330/18:00103077 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-93375-7_4" target="_blank" >http://dx.doi.org/10.1007/978-3-319-93375-7_4</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-93375-7_4" target="_blank" >10.1007/978-3-319-93375-7_4</a>
Alternative languages
Result language
English
Original language name
Data Quality Problems in TPC-DI Based Data Integration Processes
Original language description
Many data-driven organisations need to integrate data from multiple, distributed and heterogeneous resources for advanced data analysis. A data integration system is an essential component for collecting data into a data warehouse or other data analytics systems. Various data integration systems are available, either built in-house or provided by vendors. Hence, an organisation needs to compare and benchmark them when choosing one that meets its requirements. TPC-DI was recently proposed as the first industrial benchmark for evaluating data integration systems. When using this benchmark, we find some typical data quality problems in the TPC-DI data source, such as multi-meaning attributes and inconsistent data schemas, which could delay the data integration process or even cause it to fail. This paper explains the processes of this benchmark and summarises the typical data quality problems identified in the TPC-DI data source. Furthermore, in order to prevent data quality problems and manage data quality proactively, we propose a set of practical guidelines for researchers and practitioners conducting data quality management when using the TPC-DI benchmark.
Czech name
—
Czech description
—
Classification
Type
C - Chapter in a specialist book
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2018
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Book/collection name
Enterprise Information Systems
ISBN
9783319933740
Number of pages of the result
17
Pages from-to
57-73
Number of pages of the book
632
Publisher name
Springer Lecture Notes in Business Information Processing
Place of publication
Germany
UT code for WoS chapter
—