Piotr Czarnas’ Post


Founder @ DQOps open-source Data Quality platform | Detect any data quality issue and watch for new issues with Data Observability

Stale data only adds cost: you pay to store it, and you pay again when someone uses it by mistake. Every organization needs a policy for finding and archiving unmaintained datasets. If there are too many datasets in the data lake and they look almost the same, some of them must be stale. The best way to find them is to monitor data freshness. Datasets whose records have recent timestamps should be relatively good. Anything that has not been updated is stale. The problem is dictionary data that has no update timestamps. You can add an "updated at" timestamp column to see whether the table is active. If the table is inactive and there is no record that it is in use, just restrict access to it. If it is still used, somebody will raise a ticket. Stale datasets that are not archived consume storage, but that is not the most significant cost. The actual cost is the time lost by users who integrate these tables into their dashboards or models and then have to repeat the whole work when they realize that the data is outdated. #dataquality #dataengineering #datagovernance
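
The freshness check described above can be sketched roughly as follows. This is a minimal illustration, not how DQOps or any specific tool does it: the function name `find_stale_tables`, the table names, and the 90-day threshold are all assumptions made up for the example. It takes the newest "updated at" value found in each table and flags the tables that are older than the threshold. Tables without any timestamp are skipped, because their freshness is unknown until a timestamp column is added.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(last_updated, now, max_age=timedelta(days=90)):
    """Return the names of tables whose newest timestamp is older than max_age.

    last_updated maps a table name to the most recent "updated at" value
    found in that table, or None when the table has no timestamp column.
    The 90-day default is an arbitrary assumption, not a recommendation.
    """
    stale = []
    for table, newest in sorted(last_updated.items()):
        if newest is None:
            # Freshness unknown: the table needs an "updated at" column first.
            continue
        if now - newest > max_age:
            stale.append(table)
    return stale


# Hypothetical inventory: table name -> newest "updated at" value.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = {
    "sales_orders": datetime(2024, 5, 31, tzinfo=timezone.utc),
    "customers_copy_2021": datetime(2021, 11, 3, tzinfo=timezone.utc),
    "country_codes": None,  # dictionary data without timestamps
}

stale_tables = find_stale_tables(inventory, now)
print(stale_tables)
```

In practice the newest timestamp per table would come from a query against each table or from catalog metadata. The tables flagged here are only candidates for restricted access; the list says nothing about whether anyone still reads them.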

Dylan Anderson

Bridging the gap between data and strategy ✦ Head of Data Strategy @ Profusion ✦ Author of The Data Ecosystem newsletter ✦ R Programmer ✦ Policy Nerd

3d

This type of thing needs to be an essential part of data governance and management. But honestly, I never see it, which creates redundant costs and opens you up to a lot of risk

Sarah Choudhary

Visionary AI Innovator & Tech Leader | Forbes Council Member | Ph.D. in Data Science | Quantum Speaker | Ethical AI

3d

Hey Piotr! Love your take on stale data management! 😄 Here's a twist: what if those old datasets could serve as a 'time capsule' for spotting long-term trends or unexpected patterns? Sometimes yesterday's info gives tomorrow's insights! Ever thought about leveraging them this way?

Mohib Rahmani

Empowering transformation with practical solutions and AI innovation

3d

Completely agree, the frustration and time that users spend on the wild goose chase, building dashboards and models with data that’s as outdated as dial-up internet… I always say, restrict access and see who notices, if no one complains, it probably wasn’t worth keeping in the first place 🙈

Dinmahomed Cassamo

Helping organizations make better decisions through the power of data.

3d

Absolutely. Stale data not only wastes storage but also time. And keeping data fresh isn’t just good practice, it’s essential. 
