Benchmarking Algorithms for (data) Repairing and (data) Translation
BART is an error-generation tool for data-cleaning applications. It introduces errors into clean databases so that data-repairing algorithms can be benchmarked. It gives users fine-grained control over the error-generation process while scaling to large databases. This is far from trivial: as we show in our technical papers, the error-generation problem is surprisingly challenging and, in fact, NP-complete. To scale to millions of tuples, the system relies on several non-trivial optimizations, including a new symmetry property of data-quality constraints.
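To make the idea of controlled error generation concrete, here is a minimal Python sketch. It is not BART's algorithm or API: the function names, the example functional dependency `zip -> city`, and the `_ERR` suffix are all assumptions chosen for illustration. The sketch corrupts at most one cell per group of rows that share the same left-hand-side value, so every injected error is detectable as a violation of the constraint, and it keeps a log of the changes to serve as ground truth when scoring a repair algorithm.

```python
import random


def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return True
        seen.setdefault(key, row[rhs])
    return False


def inject_errors(rows, lhs, rhs, n_errors, seed=0):
    """Return a dirty copy of `rows` plus a log of the injected errors.

    Hypothetical helper: assumes `rows` satisfies the dependency lhs -> rhs.
    At most one row per lhs-group is corrupted, so each change conflicts
    with at least one untouched row and is therefore detectable.
    """
    rng = random.Random(seed)
    dirty = [dict(r) for r in rows]  # never modify the clean input

    # Group row indexes by their lhs value.
    groups = {}
    for i, row in enumerate(dirty):
        groups.setdefault(row[lhs], []).append(i)

    # Only groups with two or more rows can expose a violation.
    candidates = [rng.choice(idx) for idx in groups.values() if len(idx) > 1]
    rng.shuffle(candidates)

    log = []
    for i in candidates[:n_errors]:
        old = dirty[i][rhs]
        new = f"{old}_ERR"  # illustrative corruption, not a realistic typo model
        dirty[i][rhs] = new
        log.append({"row": i, "attribute": rhs, "clean": old, "dirty": new})
    return dirty, log


clean = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "94105", "city": "San Francisco"},
]
dirty, log = inject_errors(clean, "zip", "city", n_errors=2)
```

The row with zip `94105` is never corrupted here, because no other row shares its zip code and a change to its city could not be detected through this constraint. A real benchmark would also need realistic error types and many interacting constraints, which is where the hardness mentioned above comes from.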
|[VLDB-2016]||Messing Up with Bart: Error Generation for Evaluating Data-Cleaning Algorithms, volume 9, 2016. (To appear in the Proceedings of the VLDB Endowment)|
|[TR-01-2015]||Error Generation for Evaluating Data-Cleaning Algorithms, 2015. (Technical Report)|