A Survey of Data Quality Measurement and Monitoring Tools
High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practice, data quality is typically associated with data preprocessing, profiling, and cleansing for subsequent tasks such as data integration or data analytics. From a scientific perspective, however, a large body of research addresses the measurement (i.e., the detection) of data quality issues, and various generally applicable data quality dimensions and metrics have been discussed. In this work, we close the gap between research into data quality measurement and practical implementations by investigating the functional scope of current data quality tools. With a systematic search, we identified 667 software tools dedicated to "data quality", of which we evaluated 13 with respect to three functionality areas: (1) data profiling, (2) data quality measurement in terms of metrics, and (3) continuous data quality monitoring. We selected the evaluated tools using pre-defined exclusion criteria to ensure that they are domain-independent, provide the investigated functions, and can be evaluated free of charge or as a trial. This survey aims to provide a comprehensive overview of state-of-the-art data quality tools and reveals potential for their functional enhancement. Additionally, the results allow a critical discussion of concepts that are widely accepted in research but hardly implemented in any of the tools observed, for example, generally applicable data quality metrics.

Authors

Lisa Ehrlinger
Elisa Rusz
Wolfram Wöß
Other
Sample Sizes (N=): None
Inserted: 07/18/19 06:02PM
Words Total: 21,624
Words Unique: 5,058
Tweets
Memoirs: A Survey of Data Quality Measurement and Monitoring Tools. https://t.co/L8hqaIWvQ5
bimontesandro: RT @dbworld_: https://t.co/VcD6SLqXcL A Survey of Data Quality Measurement and Monitoring Tools. (arXiv:1907.08138v1 [cs.DB]) #databases
arxivml: "A Survey of Data Quality Measurement and Monitoring Tools", Lisa Ehrlinger, Elisa Rusz, Wolfram Wöß https://t.co/o1ENJqPv8q
darmont_lyon2: RT @dbworld_: https://t.co/VcD6SLqXcL A Survey of Data Quality Measurement and Monitoring Tools. (arXiv:1907.08138v1 [cs.DB]) #databases
arxiv_cs_LG: A Survey of Data Quality Measurement and Monitoring Tools. Lisa Ehrlinger, Elisa Rusz, and Wolfram Wöß https://t.co/DYDgbhoIQK