Case Study: Applying Machine Learning to Data Quality

Wednesday, October 23, 2019

1:00 p.m. – 4:00 p.m.

Poor data quality costs companies both in real dollars (manual rework, compliance-related fines, and so on) and in opportunity costs. Traditionally, data quality teams rely on subject matter experts to identify the relevant data quality rules. In today’s modern data architecture, where data flows at high volume, in different formats, from multiple sources, and through multiple platforms, capturing a correct and complete set of data quality rules is a nightmare, with or without the help of a subject matter expert. A large financial services organization spent nearly two years identifying and implementing 970+ data quality rules for its CCAR reporting process. Despite this significant investment, data errors often went undetected because the deployed rules were neither complete in type and number nor foolproof. The authors helped the organization leverage number theory and machine learning to discover data quality rules autonomously by probing its existing data. This approach led to the identification of an additional 2,200+ rules and the complete elimination of unexpected data errors over the last six months.

Per security requirements at The Link, preregistration is required by 10/22/2019. Message the host with any questions.

The Link at Kendall Square, 255 Main Street, 8th Floor, Cambridge, Mass. 02142