Is the Accuracy Dimension Independent or Dependent on the Validity Dimension?
When evaluating data quality, two dimensions often discussed are Accuracy and Validity. While both are critical for ensuring high-quality, reliable data, the relationship between these two dimensions is nuanced. Are they independent, or does one depend on the other?
Understanding Accuracy and Validity
Accuracy refers to the extent to which data correctly represents the real-world values or true measurements. In simpler terms, accurate data is correct data. ISO 8000-102 defines data accuracy as the “closeness of agreement between a property value and a true value”. For example, if a thermometer reports a temperature of 30°C when the actual temperature is indeed 30°C, that temperature reading is accurate. Accuracy is about the truthfulness or correctness of data in reflecting the reality it is supposed to represent.
Validity, on the other hand, is about whether data conforms to defined rules, formats, or business constraints. Validity ensures that data is entered in an acceptable way, adhering to expected patterns or structures. For example, if a date field is meant to follow the ISO 8601 "YYYY-MM-DD" format, any date entered in that format is considered valid, even if it represents an untrue value.
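To make the distinction concrete, here is a minimal Python sketch (the field names, formats, tolerance, and reference values are illustrative assumptions, not taken from any standard). The validity check only tests conformance to an expected format; the accuracy check compares a recorded value against a trusted reference value.

```python
from datetime import datetime

def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Validity: does the string conform to the expected ISO 8601 date format?"""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_accurate_temperature(recorded_c: float, true_c: float, tolerance_c: float = 0.5) -> bool:
    """Accuracy: is the recorded value close enough to the true (reference) value?"""
    return abs(recorded_c - true_c) <= tolerance_c

# Validity says nothing about truth: "2020-02-28" is valid even if the event
# actually happened on a different day.
print(is_valid_date("2020-02-28"))   # True  (conforms to the format)
print(is_valid_date("28/02/2020"))   # False (does not conform)

# Accuracy says nothing about format: 30.0 matches reality however it is stored.
print(is_accurate_temperature(30.0, 30.0))  # True
print(is_accurate_temperature(35.0, 30.0))  # False
```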
Are Accuracy and Validity Independent or Dependent?
While Accuracy and Validity may appear to be interdependent, they are distinct dimensions that can overlap in certain cases.
Accuracy as Independent of Validity
Accuracy focuses solely on correctness in relation to real-world facts. If the recorded data reflects the truth, it is accurate, regardless of whether the data format follows predefined rules or not.
Validity, by contrast, is concerned with conformance to predefined rules, not necessarily whether the data is true or false. It checks if the data fits the expected structure, but this does not guarantee its truthfulness.
For example:
A recorded temperature of 30°C that matches the real temperature but is stored in a free-text field is accurate yet invalid, because it does not conform to the expected numerical format.
A valid date like "2020-02-28" (which conforms to the "YYYY-MM-DD" format) could still be inaccurate if it records the wrong date due to human error or incorrect data entry.
Thus, accuracy and validity can exist independently of each other. Data can be valid but inaccurate, or accurate but non-conforming to data entry standards.
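The two "mismatched" combinations from the examples above can be reproduced in a short sketch (the reference date, the agreed format, and the tolerated alternative entry format are illustrative assumptions): one value conforms to the agreed format but records the wrong date, the other records the right date in a non-conforming format.

```python
from datetime import date, datetime

TRUE_EVENT_DATE = date(2020, 2, 29)  # the real-world date, assumed known for this check

def is_valid(value: str) -> bool:
    """Validity: the entry conforms to the agreed YYYY-MM-DD format."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_accurate(value: str) -> bool:
    """Accuracy: the entry denotes the true real-world date, whatever format was used."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # tolerate a common alternative entry format
        try:
            return datetime.strptime(value, fmt).date() == TRUE_EVENT_DATE
        except ValueError:
            continue
    return False

print(is_valid("2020-02-28"), is_accurate("2020-02-28"))  # True False  -> valid but inaccurate
print(is_valid("29/02/2020"), is_accurate("29/02/2020"))  # False True  -> accurate but invalid
```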
When Validity Affects Accuracy
While the two dimensions are independent, in some cases, validity can influence accuracy. For data to be accurately interpreted, it often needs to adhere to a certain structure or format. Invalid data, in some contexts, can result in inaccurate conclusions, especially if the format or business rules are essential to the data’s interpretation.
For example, if the date format is invalid or inconsistent across a dataset, analysis may go wrong even though the actual dates themselves are accurate (i.e., they represent real-world dates). In this way, validity ensures the data is in a form that supports accurate analysis, especially in complex datasets or systems that require standardization.
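As a hedged illustration (the dates, formats, and the delivery-date scenario are invented for this sketch), consider four accurate delivery dates captured under two different conventions. A pipeline that assumes a single format for all rows silently misreads the non-conforming ones, so the monthly counts come out wrong even though every source date is true.

```python
from collections import Counter
from datetime import datetime

# The true delivery dates are 3-6 April 2024. The first two rows were captured
# as MM/DD/YYYY, the last two as DD/MM/YYYY, so the dataset is not uniformly valid.
raw_dates = ["04/03/2024", "04/04/2024", "05/04/2024", "06/04/2024"]

# A naive pipeline that assumes every row already follows MM/DD/YYYY.
parsed = [datetime.strptime(d, "%m/%d/%Y") for d in raw_dates]

# The non-conforming rows are silently misread as May and June, so the monthly
# aggregation is wrong even though each underlying date is accurate.
print(Counter(dt.strftime("%Y-%m") for dt in parsed))
# Counter({'2024-04': 2, '2024-05': 1, '2024-06': 1})  -- expected: all four in 2024-04
```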
Conclusion: Independent Dimensions with Some Overlap
• In summary, accuracy and validity are independent dimensions of data quality. Accuracy is about how well the data represents the true, real-world value, while validity is about how well data adheres to defined formats and standards.
• However, the two dimensions can sometimes intersect. Invalid data formats can prevent accurate interpretation, and accurate data may become meaningless if not entered in a valid structure. Still, accuracy and validity maintain their roles as separate, distinct dimensions of data quality that each serve a unique purpose in ensuring that data is both truthful and usable.
• To ensure high data quality, it's essential to focus on both dimensions: ensuring the accuracy of the data while also enforcing validity so that the data remains structured and usable across systems.
Data Management Advisor & Consultant | CDOIQ Nordic Symposium | DAMA Finland ry
A good comment on data quality dimensions. I have contrasted two alternative definitions like this for years: fit for purpose vs. conforms to requirements. They are different, and actually very different; it's the same case as with accuracy and validity. Fit for purpose can be measured afterwards and without explicit quality metrics: your massive black-box data always gives correct answers even if you did not specify requirements (i.e., DQ rules) beforehand. That is the case of accuracy: the data represents the real world, provides value, and you get correct algorithmic results. Conforms to requirements expects the ability to specify requirements beforehand in a measurable way. Your data can conform to rules but still fail to support the business purpose, because people cannot specify every detail of the world and the data perfectly. That is the case of validity: rules can be fulfilled successfully although reality is something different and business cases fail. Personally, I always prefer fit for purpose, but there is also a standardization school of thought that wants to see metrics and conformance to the process. This was a good article on some of the fundamental issues in DQ that too many don't pay attention to.
Information security | Management Systems: Quality (ISO 9001) | Information Security (ISO 27001, NEN 7510) | Safety (ISO 45001) | Data Quality (DQMS) | Knowledge (ISO 30401).
Great discussion. Are there more dependencies? https://datamanagement.wiki/overview/overview_data_quality_dimension