Data quality, the most boring but essential topic in data

At its core, data quality can be broken down into eight dimensions:

𝐀𝐜𝐜𝐮𝐫𝐚𝐜𝐲 – The correctness of the data, ensuring it reflects the real-world scenario it is intended to model and represent

𝐂𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐜𝐲 – Uniformity across data sources, ensuring that the data is the same between storage and usage and does not conflict across different datasets

𝐂𝐨𝐦𝐩𝐥𝐞𝐭𝐞𝐧𝐞𝐬𝐬 – All necessary data is present, with no missing elements that could impact analysis or decision-making

𝐈𝐧𝐭𝐞𝐠𝐫𝐢𝐭𝐲 – Data continues to be recorded and relationships maintained as intended

𝐂𝐨𝐧𝐟𝐨𝐫𝐦𝐢𝐭𝐲 – Data always follows standard definitions, set to ensure consistency around type, size and format

𝐓𝐢𝐦𝐞𝐥𝐢𝐧𝐞𝐬𝐬 – Data is up to date and available when needed, ensuring that decisions are based on the most current information

𝐔𝐧𝐢𝐪𝐮𝐞𝐧𝐞𝐬𝐬 – No duplicate records exist within the organisation, helping maintain the integrity and accuracy of the data (creating trust)

𝐕𝐚𝐥𝐢𝐝𝐢𝐭𝐲 – The data remains accurate and consistent throughout its lifecycle and conforms to the agreed structure, data quality standards/rules or list of values

Use these dimensions to help frame, understand and measure the quality of your data.

Check out my past newsletter article on this topic (https://lnkd.in/gF4ETX2y), where I define what data quality really means and the underlying root causes that lead to issues. The week after, we will jump into some approaches to fixing data quality issues, sharing what you should and should not do! If you still haven't signed up (and soaked up all that data ecosystem knowledge), subscribe for more!

#data #dataquality #datastrategy #dataecosystem #DylanDecodes
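A minimal sketch of how a few of these dimensions can be turned into automated checks, here in pandas against a hypothetical customer extract (the column names, email pattern and freshness window are illustrative assumptions, not a standard):

```python
import pandas as pd

# Hypothetical customer extract; column names are assumptions for illustration
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["[email protected]", None, "not-an-email", "[email protected]"],
    "created_at": ["2024-05-01", "2024-05-02", "2024-05-02", "2024-05-03"],
})

# Completeness: share of non-null values per required column
completeness = df[["customer_id", "email", "created_at"]].notna().mean()

# Uniqueness: duplicate records on the business key
duplicates = int(df["customer_id"].duplicated().sum())

# Validity: values conform to an agreed structure (simple email pattern)
validity = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False).mean()

# Timeliness: age of the most recent record
age = pd.Timestamp.now() - pd.to_datetime(df["created_at"]).max()

print(completeness, duplicates, validity, age, sep="\n")
```

Each metric is a number you can track over time, which is what turns a dimension from a definition into something measurable.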
These are great, but coming from a software/UX world I would also expect one more: valuable. Why are we putting effort into collecting, cleaning, processing, storing and serving this data? It doesn't matter how spiffy it is if it's the wrong data for our users. Are all tables/columns equally valuable to our users? Do all our users value the different parts of the data in the same way? Etc.
I have twelve data quality dimensions:
Timeliness and Availability
Completeness
Uniqueness
Validity
Conformity
Uniformity
Consistency
Integrity
Accuracy
Reasonability and Anomalies and Outliers
Confidentiality and Security
Clarity and Usefulness

More reading: http://www.joakimdalby.dk/HTM/DimensionalModeling.htm#SectionDQ
Most of my work with data has been related to addressing quality issues. Probably most here have heard the phrase "garbage in, garbage out". The worst thing I have found to date is an internal user who input an emoji character into a phone number field. Another one: for an e-mail field I have repeatedly seen something along the lines of "[email protected]". That sort of behavior should be penalized somehow, considering that the fields can be left blank.

In terms of data validation, fields should accept only certain characters and adhere to a certain format, which can be enforced via a regex rule whenever the end user can input values freely. And when the possible values of a field are known, it is better to implement a drop-down list than to allow free-text input. As with other things, detecting and addressing data quality issues requires additional work and time. That's why, a few years back, when somebody mentioned out of the blue that we were going to implement a data warehouse, I knew it was a pipe dream. With automation and the availability of some new tools, I can finally see that happening in the long run.
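A minimal sketch of the kind of field-level validation described here (the patterns and country list are illustrative assumptions). Note that a regex catches the emoji but not a well-formed fake address like "[email protected]", which is exactly why free-text fields stay hard to police:

```python
import re

PHONE_RE = re.compile(r"^\+?[0-9 ()\-]{7,15}$")  # digits and common separators only
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
COUNTRIES = {"DK", "SE", "DE"}  # when values are known, use a fixed list, not free text

def validate(phone: str, email: str, country: str) -> list[str]:
    errors = []
    if phone and not PHONE_RE.match(phone):  # blank is allowed; an emoji is not
        errors.append("invalid phone")
    if email and not EMAIL_RE.match(email):
        errors.append("invalid email")
    if country and country not in COUNTRIES:
        errors.append("unknown country")
    return errors

print(validate("📞", "[email protected]", "DK"))  # -> ['invalid phone']
```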
The more explicit you can be about exactly how these dimensions are measured, the better your data will be. Another thing I have found is that it's easier to spot data quality problems in the data warehouse (vs in the system of record), but the best place to clean them up is in the business system. Figuring that handshake out is incredibly important.
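One way that handshake could look, sketched below: define the dimension as an explicit function with an agreed threshold, measure it in the warehouse, and route the offending records back towards the business system for the actual fix (the column names, the 98% threshold and the print-based ticketing stub are all assumptions for illustration):

```python
import pandas as pd

def completeness(series: pd.Series) -> float:
    """Explicit, agreed definition: share of non-null values (0.0-1.0)."""
    return float(series.notna().mean())

def flag_for_source_fix(df: pd.DataFrame, mask: pd.Series, rule: str) -> None:
    # Stand-in for raising a ticket against the business system,
    # where the fix should actually be made
    for key in df.loc[mask, "customer_id"]:
        print(f"[{rule}] fix at source: customer_id={key}")

df = pd.DataFrame({"customer_id": [1, 2, 3],
                   "email": ["[email protected]", None, "[email protected]"]})

score = completeness(df["email"])  # measured in the warehouse
if score < 0.98:                   # explicit threshold agreed with the business
    flag_for_source_fix(df, df["email"].isna(), "email_completeness")
```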
Data quality is the foundation you can't ignore. I don't think it's boring. What is boring is having to unpick results to find where the input data was not of quality. I think a few more dimensions need to be added, such as source: where did the data come from? Also think FAIR and ALCOA+.
Also important to understand that many of the data quality issues listed here can be caused by things unrelated to the underlying data quality. For example, a failure to orchestrate data pipelines and to notice/recover from failed runs can lead to incomplete data (even if the underlying source data is complete).
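A minimal sketch of one way to separate the two cases, assuming per-day row counts are available from both the source and the warehouse:

```python
# Sketch: reconcile per-day row counts between source and warehouse to tell
# a failed/partial load apart from genuinely incomplete source data
def reconcile(source_counts: dict[str, int], warehouse_counts: dict[str, int]) -> list[str]:
    problems = []
    for load_date, expected in source_counts.items():
        loaded = warehouse_counts.get(load_date, 0)
        if loaded < expected:
            # The data exists upstream but never arrived: an orchestration
            # problem, not an underlying data quality problem
            problems.append(f"{load_date}: {loaded}/{expected} rows loaded")
    return problems

print(reconcile({"2024-05-01": 1000, "2024-05-02": 1200},
                {"2024-05-01": 1000, "2024-05-02": 800}))
# -> ['2024-05-02: 800/1200 rows loaded']
```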
I think we can add "Relevance" in there somewhere. Sometimes you have the most accurate, clean and error-free data, yet it can still be low quality if the information itself is not relevant to the business.
Implement automated data quality scoring across these dimensions, then tie executive compensation to those metrics. What gets measured AND incentivized gets managed. This approach transformed data governance at multiple Fortune 500s I've advised.
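A minimal sketch of what such a score could look like, assuming per-dimension scores between 0.0 and 1.0; the dimensions and weights below are illustrative assumptions, not a standard weighting:

```python
# Roll per-dimension scores into one weighted quality KPI that can be tracked
WEIGHTS = {"accuracy": 0.25, "completeness": 0.25, "timeliness": 0.20,
           "uniqueness": 0.15, "validity": 0.15}

def quality_score(scores: dict[str, float]) -> float:
    # Missing dimensions count as 0.0, so gaps in measurement hurt the score
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

print(quality_score({"accuracy": 0.97, "completeness": 0.92, "timeliness": 0.99,
                     "uniqueness": 1.00, "validity": 0.95}))  # ≈ 0.963
```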
Great breakdown of things to consider when looking at data quality. I had countless headaches thanks to missing conformity between data sources 😅
Dylan, I am just going to do AI now, so I don't need to worry about this. Fire the employees.