Downstream data teams feel the pain of upstream data quality issues immensely... but are scared of addressing them... "This is just how things are, we can't fix this." "The upstream engineering team doesn't care." "They would never let us add CI/CD tests." Yet something interesting happens once we get an upstream engineer in the room to talk about data contracts. "Wait... the data team isn't already doing this? We can put this into our existing CI/CD pipeline? Notifications happen directly in my GitHub pull request? We should have been doing this yesterday." Despite the challenges of data quality, upstream engineers and downstream data teams are far more aligned than most people think. It's the silos between transactional and analytical databases that make communicating that alignment so hard. #data #dataengineering ----- 📌 Want to learn more? Check out our article "OLTP Vs. OLAP: How Professional POVs Cause Data Problems" https://lnkd.in/g_8cHS7h
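To make the "put this into our existing CI/CD pipeline" idea concrete, here is a minimal sketch of what such a data-contract check might look like: a script the upstream team runs as a pipeline step, comparing a service's proposed schema against a contract file the data team agreed to. The file paths, JSON layout, and function names are illustrative assumptions, not a specific tool from the post.

```python
# Minimal sketch of a data-contract check run as a CI step.
# Contract/schema file names and their JSON layout are hypothetical.
import json
import sys


def load_fields(path: str) -> dict[str, str]:
    """Return {column_name: type} from a simple JSON schema file."""
    with open(path) as f:
        return {col["name"]: col["type"] for col in json.load(f)["columns"]}


def check_contract(contract_path: str, schema_path: str) -> list[str]:
    """Compare the proposed schema against the agreed contract.

    Flags columns the downstream team depends on that were removed or
    changed type. Purely additive changes (new columns) are allowed.
    """
    contract = load_fields(contract_path)
    schema = load_fields(schema_path)
    problems = []
    for name, expected_type in contract.items():
        if name not in schema:
            problems.append(f"breaking change: column '{name}' was removed")
        elif schema[name] != expected_type:
            problems.append(
                f"breaking change: column '{name}' changed type "
                f"{expected_type} -> {schema[name]}"
            )
    return problems


if __name__ == "__main__":
    issues = check_contract("contracts/orders.json", "schemas/orders.json")
    for issue in issues:
        print(issue)
    # A nonzero exit code fails the CI job, which shows up directly in the
    # pull request as a failed check -- the "notification" the engineer sees.
    sys.exit(1 if issues else 0)
```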
Another option that gets you thrown out the window is data modeling 😭
IME the "upstream engineer" usually works at another company...
LOL who would have thought!
This is one of the great challenges of machine learning and AI. The data that is available or provided is frequently not the _data_ that is actually needed. I encountered this while working on an AI tool for a predictive maintenance solution. Our data engineers frequently pushed for data that was unavailable, probably inaccurate, or someone else's IP. If we had had the staff, budget, and executive support to investigate and solve these problems, we could have driven to a solution, but no one wanted to hear about such real-world problems.
Imagine a world where the authoring systems had data validation checks on input
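As a small illustration of that idea, the sketch below validates records at the point of entry in the authoring system, before they ever reach downstream teams. The Order type and its rules are made-up assumptions for the example.

```python
# Hypothetical input validation at write time in the authoring system.
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    quantity: int
    currency: str

    def __post_init__(self):
        # Reject bad records on input instead of letting analysts find them later.
        if not self.order_id:
            raise ValueError("order_id must not be empty")
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.currency not in {"USD", "EUR", "GBP"}:
            raise ValueError(f"unsupported currency: {self.currency}")


# Order(order_id="A-100", quantity=0, currency="USD")  # raises ValueError at input
```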
Geoffrey Johnson this is the part of the Venn diagram where both of our jobs meet 😂
So relevant and burning
If I had a penny for every time I have suggested this