Benchmarking Algorithms for (data) Repairing and (data) Translation



BART is an error-generation tool for data-cleaning applications. It introduces errors into clean databases so that data-repairing algorithms can be benchmarked against them. It gives users fine-grained control over the error-generation process while still scaling to large databases. This is far from trivial: as we show in our technical papers, the error-generation problem is surprisingly challenging and is in fact NP-complete. To scale to millions of tuples, the system relies on several non-trivial optimizations, including a new symmetry property of data quality constraints.
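To make the idea of controlled error generation concrete, here is a minimal Python sketch. It is not BART's algorithm or API. It assumes a single functional dependency (a hypothetical `zip -> city` constraint), string-valued cells, and a clean input table, and the helper names `violations` and `inject_error` are invented for illustration. It changes one cell so that the resulting error is guaranteed to be detectable by the constraint.

```python
import random
from collections import defaultdict


def violations(rows, lhs, rhs):
    """Return pairs of row indices that violate the dependency lhs -> rhs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    pairs = []
    for idxs in groups.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                if rows[idxs[a]][rhs] != rows[idxs[b]][rhs]:
                    pairs.append((idxs[a], idxs[b]))
    return pairs


def inject_error(rows, lhs, rhs, rng):
    """Return a dirty copy of rows plus a record of the single cell changed.

    Assumes rows is clean with respect to lhs -> rhs and that rhs values
    are strings. Only rows that share their lhs value with another row
    are candidates, so the change always creates a detectable violation.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    candidates = [i for idxs in groups.values() if len(idxs) > 1 for i in idxs]
    if not candidates:
        raise ValueError("no cell change can produce a detectable violation")
    target = rng.choice(candidates)
    dirty = [dict(r) for r in rows]   # copy so the clean table is untouched
    old = dirty[target][rhs]
    new = old + "_x"                  # hypothetical way to pick a wrong value
    dirty[target][rhs] = new
    return dirty, (target, rhs, old, new)


clean = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]
dirty, change = inject_error(clean, "zip", "city", random.Random(0))
print(change)
print(violations(dirty, "zip", "city"))
```

The record of the changed cell is what lets a benchmark later score a repair algorithm against the known ground truth. Handling many interacting constraints at once, which is where the hardness mentioned above arises, is well beyond this sketch.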



[VLDB-2016] Messing Up with Bart: Error Generation for Evaluating Data-Cleaning Algorithms. Proceedings of the VLDB Endowment, volume 9. (To appear)
[TR-01-2015] Error Generation for Evaluating Data-Cleaning Algorithms. Technical report.
