Many companies say they care about data quality. But actions speak louder than words.

This is how to spot a company that really cares about data quality:
1. They allocate capacity for fixing data quality issues
2. They test things properly before making changes
3. They measure the quality of their data
4. They investigate issues and learn from them
5. They proactively evaluate technologies in that space

It’s all about being proactive and making the necessary investments. If you deal with issues reactively, and tell your teams data quality is important but don’t allocate the time and budget to improve it, chances are you don’t really care about data quality. Let’s just be clear about that.
0. They have a clear definition of "data quality" and corresponding metrics to measure it
☝️
Data quality is important, but it's also hard to manage. There is a lot of finger pointing, and in the end no one takes ownership. We can say it's everyone's job, but that also means everyone is waiting for someone else to do something. In the end, data quality monitoring, auditing, alerting and correction need to be built into our data platform. One reason why lineage is so popular is that people use it to find changes in the data that could point to quality issues. It's not designed for that, but existing observability is lacking in that respect. Build it into the tools and the workflows and people will follow.
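To make "built into the platform" a bit more concrete, here is a minimal sketch of what a pipeline step could run on every batch. Everything in it is an assumption for illustration: the two metrics (completeness and uniqueness), the column names, and the thresholds are hypothetical, and a real platform would more likely rely on a dedicated tool than on hand-rolled checks.

```python
from typing import Any, Callable

Row = dict[str, Any]


def completeness(rows: list[Row], column: str) -> float:
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows: list[Row], column: str) -> float:
    """Share of non-null values in `column` that are distinct."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


# Hypothetical registry of metrics the platform knows how to compute.
METRICS: dict[str, Callable[[list[Row], str], float]] = {
    "completeness": completeness,
    "uniqueness": uniqueness,
}


def failed_checks(
    rows: list[Row], thresholds: dict[tuple[str, str], float]
) -> list[tuple[str, str, float]]:
    """Return (metric, column, score) for every check scoring below its threshold."""
    failures = []
    for (metric, column), minimum in thresholds.items():
        score = METRICS[metric](rows, column)
        if score < minimum:
            failures.append((metric, column, score))
    return failures


# Illustrative batch: one missing email and one duplicated id.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
]

# Assumed thresholds; in practice these would be agreed with the data owners.
thresholds = {
    ("completeness", "id"): 1.0,
    ("completeness", "email"): 0.9,
    ("uniqueness", "id"): 1.0,
}

for metric, column, score in failed_checks(batch, thresholds):
    print(f"ALERT: {metric} of '{column}' is {score:.2f}")
```

The point is less the specific metrics and more that the check runs as part of the workflow, so nobody has to remember to look.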
Great insights, Shachar! It’s so true that proactive investment in data quality pays off immensely in the long run. As someone who’s seen firsthand the importance of well-structured, reliable systems, I couldn’t agree more with your points. By the way, I’m Assaf, an experienced full-stack software developer skilled in building high-performance web apps and optimizing backend systems. If you or anyone here needs help implementing scalable solutions or streamlining workflows, I’d be happy to lend a hand. Feel free to check out my page to see how I approach modern frameworks and clean, maintainable code!
Also, data quality issues often point to shortcomings in the processes currently in place, since data quality depends heavily on how accurate and appropriate the data models are.
Yes. Besides reporting & dashboarding, having good data is so important for your AI models to work perfectly. Catch data issues early in your pipeline.
Data Advisor | ex-Meta | ex-PayPal | Speaker | Airplane and Helicopter pilot 🚁
Oh, and if you’re not sure how much to invest in data quality: in a poll I did a few months ago, 50% of Data Engineers said they spend more than half their time dealing with data quality issues. If a data engineer costs on average between $100,000 and $200,000/year (I am being very conservative here), and they spend half their time dealing with DQ issues, that’s $50,000-$100,000 you’re throwing down the drain. Every year. There are great tools and technologies out there, and if you have a team of 2-3 data engineers, chances are you’re already paying way more than what these tools cost.
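The back-of-the-envelope math above, as a small sketch you can plug your own numbers into. The function name and parameters are made up for illustration; the salary range and the 50% share are the figures from the comment, and the team totals simply extend the same arithmetic to 2-3 engineers.

```python
def dq_cost_per_year(salary: float, share_of_time: float = 0.5, engineers: int = 1) -> float:
    """Yearly salary cost that goes into dealing with data quality issues."""
    return salary * share_of_time * engineers


# One engineer, half their time on DQ issues.
single_low = dq_cost_per_year(100_000)
single_high = dq_cost_per_year(200_000)
print(f"One engineer: ${single_low:,.0f} to ${single_high:,.0f} per year")

# A team of 2-3 engineers under the same assumptions.
team_low = dq_cost_per_year(100_000, engineers=2)
team_high = dq_cost_per_year(200_000, engineers=3)
print(f"Team of 2-3: ${team_low:,.0f} to ${team_high:,.0f} per year")
```

This only counts salary. It leaves out the cost of decisions made on bad data, which is harder to put a number on.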