Is the Accuracy Dimension Independent or Dependent on the Validity Dimension?
When evaluating data quality, two dimensions often discussed are Accuracy and Validity. While both are critical for ensuring high-quality, reliable data, the relationship between these two dimensions is nuanced. Are they independent, or does one depend on the other?
Understanding Accuracy and Validity
Accuracy refers to the extent to which data correctly represents the real-world values or true measurements. In simpler terms, accurate data is correct data. ISO 8000-102 defines data accuracy as the “closeness of agreement between a property value and a true value”. For example, if a thermometer reports a temperature of 30°C when the actual temperature is indeed 30°C, that temperature reading is accurate. Accuracy is about the truthfulness or correctness of data in reflecting the reality it is supposed to represent.
Validity, on the other hand, is about whether data conforms to defined rules, formats, or business constraints. Validity ensures that data is entered in an acceptable way, adhering to expected patterns or structures. For example, if a date field is meant to follow the ISO 8601 "YYYY-MM-DD" format, any date entered in that format is considered valid, even if it represents an untrue value.
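To make the distinction concrete, here is a minimal Python sketch (the field names, formats, tolerance, and reference values are illustrative assumptions, not taken from any standard). The validity check only tests conformance to an expected format; the accuracy check compares a recorded value against a trusted reference value.

```python
from datetime import datetime

def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Validity: does the string conform to the expected ISO 8601 date format?"""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_accurate_temperature(recorded_c: float, true_c: float, tolerance_c: float = 0.5) -> bool:
    """Accuracy: is the recorded value close enough to the true (reference) value?"""
    return abs(recorded_c - true_c) <= tolerance_c

# Validity says nothing about truth: "2020-02-28" is valid even if the event
# actually happened on a different day.
print(is_valid_date("2020-02-28"))   # True  (conforms to the format)
print(is_valid_date("28/02/2020"))   # False (does not conform)

# Accuracy says nothing about format: 30.0 matches reality however it is stored.
print(is_accurate_temperature(30.0, 30.0))  # True
print(is_accurate_temperature(35.0, 30.0))  # False
```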
Are Accuracy and Validity Independent or Dependent?
While Accuracy and Validity may appear to be interdependent, they are distinct dimensions that can overlap in certain cases.
Accuracy as Independent of Validity
Accuracy focuses solely on correctness in relation to real-world facts. If the recorded data reflects the truth, it is accurate, regardless of whether the data format follows predefined rules or not.
Validity, by contrast, is concerned with conformance to predefined rules, not necessarily whether the data is true or false. It checks if the data fits the expected structure, but this does not guarantee its truthfulness.
For example:
A recorded temperature of 30°C that matches the real temperature but is stored in a free-text field is accurate yet invalid, because it does not conform to the expected numerical format.
A valid date like "2020-02-28" (which conforms to the "YYYY-MM-DD" format) could still be inaccurate if it records the wrong date due to human error or incorrect data entry.
Thus, accuracy and validity can exist independently of each other. Data can be valid but inaccurate, or accurate but non-conforming to data entry standards.
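The two "mismatched" combinations from the examples above can be reproduced in a short sketch (the reference date, the agreed format, and the tolerated alternative entry format are illustrative assumptions): one value conforms to the agreed format but records the wrong date, the other records the right date in a non-conforming format.

```python
from datetime import date, datetime

TRUE_EVENT_DATE = date(2020, 2, 29)  # the real-world date, assumed known for this check

def is_valid(value: str) -> bool:
    """Validity: the entry conforms to the agreed YYYY-MM-DD format."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_accurate(value: str) -> bool:
    """Accuracy: the entry denotes the true real-world date, whatever format was used."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # tolerate a common alternative entry format
        try:
            return datetime.strptime(value, fmt).date() == TRUE_EVENT_DATE
        except ValueError:
            continue
    return False

print(is_valid("2020-02-28"), is_accurate("2020-02-28"))  # True False  -> valid but inaccurate
print(is_valid("29/02/2020"), is_accurate("29/02/2020"))  # False True  -> accurate but invalid
```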
When Validity Affects Accuracy
While the two dimensions are independent, in some cases, validity can influence accuracy. For data to be accurately interpreted, it often needs to adhere to a certain structure or format. Invalid data, in some contexts, can result in inaccurate conclusions, especially if the format or business rules are essential to the data’s interpretation.
For example, if the date format is invalid or inconsistent across a dataset, analysis may go wrong even though the actual dates themselves are accurate (i.e., they represent real-world dates). In this way, validity ensures the data is in a form that supports accurate analysis, especially in complex datasets or systems that require standardization.
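As a hedged illustration (the dates, formats, and the delivery-date scenario are invented for this sketch), consider four accurate delivery dates captured under two different conventions. A pipeline that assumes a single format for all rows silently misreads the non-conforming ones, so the monthly counts come out wrong even though every source date is true.

```python
from collections import Counter
from datetime import datetime

# The true delivery dates are 3-6 April 2024. The first two rows were captured
# as MM/DD/YYYY, the last two as DD/MM/YYYY, so the dataset is not uniformly valid.
raw_dates = ["04/03/2024", "04/04/2024", "05/04/2024", "06/04/2024"]

# A naive pipeline that assumes every row already follows MM/DD/YYYY.
parsed = [datetime.strptime(d, "%m/%d/%Y") for d in raw_dates]

# The non-conforming rows are silently misread as May and June, so the monthly
# aggregation is wrong even though each underlying date is accurate.
print(Counter(dt.strftime("%Y-%m") for dt in parsed))
# Counter({'2024-04': 2, '2024-05': 1, '2024-06': 1})  -- expected: all four in 2024-04
```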
Conclusion: Independent Dimensions with Some Overlap
• In summary, accuracy and validity are independent dimensions of data quality. Accuracy is about how well the data represents the true, real-world value, while validity is about how well data adheres to defined formats and standards.
• However, the two dimensions can sometimes intersect. Invalid data formats can prevent accurate interpretation, and accurate data may become meaningless if not entered in a valid structure. Still, accuracy and validity maintain their roles as separate, distinct dimensions of data quality that each serve a unique purpose in ensuring that data is both truthful and usable.
• To ensure high data quality, it's essential to focus on both dimensions: ensuring the accuracy of the data while also enforcing validity so that the data remains structured and usable across systems.
Data Management Advisor & Consultant | CDOIQ Nordic Symposium | DAMA Finland ry
A good comment on data quality dimensions. I have contrasted two alternative definitions like this for years: fit for purpose vs. conforms to requirements. They are different, and actually very different; it's the same case as with accuracy and validity. Fit for purpose can be measured afterwards and without explicit quality metrics: your massive black-box data always gives correct answers even if you did not specify requirements (i.e., DQ rules) beforehand. That is the case of accuracy: the data represents the real world, provides value, and you get correct algorithmic results. Conforms to requirements expects the ability to specify requirements beforehand in a measurable way. Your data can conform to rules but still fail to support the business purpose, because people cannot specify every detail of the world and the data perfectly. That is the case of validity: rules can be fulfilled successfully although reality is something different and business cases fail. Personally, I always prefer fit for purpose, but there is also a standardization school of thought that wants to see metrics and conformance to the process. This was a good article on some of the fundamental issues in DQ that too many don't pay attention to.
Information security | Management Systems: Quality (ISO 9001) | Information Security (ISO 27001, NEN 7510) | Safety (ISO 45001) | Data Quality (DQMS) | Knowledge (ISO 30401).
Great discussion. Are there more dependencies? https://datamanagement.wiki/overview/overview_data_quality_dimension