Data Nugget January 2024
29 January 2024
A warm welcome to our Data Nugget newsletter subscribers in 2024. As we continue to grow our community, we aim to provide better content and timely delivery of our nuggets. So without further ado, let's dive into some crisp stories from the data management sphere. First, we have a concise book review on the rules of communicating with data. Second, we have an opinion-based piece about the importance of domain-driven talks. Third, we have an interesting nugget on data architecture maturity. And lastly, we have the next podcast focusing on the business impact of data ethics.
We hope you enjoy this month's edition. Wish you all a happy New Year!
Let's grow Data Nugget together. Forward it to a friend. They can sign up here to get a fresh version of Data Nugget on the last day of every month.
Rules for Communicating with Data: A Book Review
Nugget by Winfried Adalbert Etzel
Who is this book for?
“Business Knowledge Blueprints” by Ronald Ross is a book that reminds us of the importance of effective and clear communication in the data-driven world. If you work in Data Quality, Data Architecture, Data Modeling, or Data Governance, this book is a must-read, and if you have already read it, it is worth revisiting to refresh your understanding of how we communicate in the ever-evolving landscape of data.
This book offers insights into how we communicate with data, conceptual models, and data quality. In doing so, it focuses on a part of data work that is important but sadly often overlooked.
The power of communication
At the heart of Ronald Ross’ book is a profound realization that the work we do with data models, conceptual models, and data quality is connected to how we communicate within our businesses. This connection underscores the responsibility of those who create and manage data. In a world where data is often called the new oil, effective communication becomes the pipeline through which this valuable resource flows.
It is important to understand that 'The act of creating data is the act of creating a message to people in the future.' This is my favourite quote from this book. The quality of data within a data/system architecture can never be any better than the quality of the business communications that produced it; hence the need for precision and consistency in every business communication.
The root of the problem: An application-centric approach
A long-term application-centric approach is often to blame for the problems related to data integration. In many cases, each project within an organization develops its files and databases, leading to a fractured data landscape. The result is a collection of local, narrow views that must be brought together through external interfaces.
The author showcases this issue and emphasizes that in this scenario the big picture is not just lost; it never truly existed in the first place.
Ontologies and structure
To address this issue, there is a need for a shift in perspective. This involves embracing ontologies and related software capabilities, such as Financial Information Business Ontology (FIBO). These structures offer a more systematic approach to structuring data, helping to bridge the gap between fractured data sources and applications.
The importance of definitions
One of the central elements of “Business Knowledge Blueprints” is the importance of business vocabulary, particularly precise definitions. As Ross puts it, “Remember that business vocabulary is important — especially definitions. Be aware of the need for precision and consistency in every business communication.”
There are several important notions that the book prompts us to think about in our business context:
How do we distinguish things within our business context?
How do we name things to ensure clarity and avoid misleading interpretations?
How should we define things as opposed to merely describing them?
What distinguishes classification from categorization, and why is this difference important?
Ronald Ross made clear that these questions are not just intellectual exercises but practical guides to crafting robust business concept models.
To start with, it is important to understand what a business concept model is. A conceptual model is a representation of key elements of a target problem. It deliberately excludes complexity, focusing on clarity and precision in business knowledge and communication. In contrast, a data model is primarily aimed at organizing data for storage within a system while balancing maintainability and performance.
Two fundamental principles underscore the importance of well-structured data:
The Ironclad Rule of Classifications: Every instance of a general concept always satisfies each definitional criterion. This ensures that data is consistently organized and classified.
The Ironclad Rule of Categorizations: Every instance of a category is also an instance of the super category. Categorization takes classifying things to a deeper level, allowing for even greater precision.
I think these are such foundational principles that they should be part of starting any data modelling work.
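The two rules above can be made concrete with a small sketch. This is only an illustration under my own assumptions: the `Concept` class, the "Customer" and "Corporate Customer" concepts, and their criteria are hypothetical and do not come from the book.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A business concept with definitional criteria (hypothetical sketch)."""
    name: str
    # Each criterion is a predicate that an instance must satisfy.
    criteria: Tuple[Callable[[dict], bool], ...]
    # The super category, if this concept is a category of a broader one.
    parent: Optional["Concept"] = None

    def is_instance(self, thing: dict) -> bool:
        # Rule of Categorizations: an instance of a category must also be
        # an instance of its super category.
        if self.parent is not None and not self.parent.is_instance(thing):
            return False
        # Rule of Classifications: every definitional criterion must hold.
        return all(criterion(thing) for criterion in self.criteria)


# Illustrative concepts and criteria (assumptions, not from the book).
customer = Concept(
    name="Customer",
    criteria=(lambda t: t.get("has_account", False),),
)
corporate_customer = Concept(
    name="Corporate Customer",
    criteria=(lambda t: t.get("org_number") is not None,),
    parent=customer,
)

acme = {"has_account": True, "org_number": "123"}
print(corporate_customer.is_instance(acme))
```

The point of the sketch is that a thing without an account can never count as a corporate customer, however well it matches the narrower criteria, because it already fails the definition of the super category.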
Why does this book matter?
The staggering percentage of IT activity devoted to data integration and reporting highlights the urgency of re-evaluating our approach. To address this issue, it is essential to embrace a conceptual model that captures how people understand things and to ensure precision and consistency in business communication. The solution to the data integration challenge lies in breaking free from the shackles of an application-centric approach and adopting a holistic, concept-centric perspective.
Ronald Ross’ book strikes a chord with professionals in the fields of Data Quality, Data Architecture, Data Modeling, and Data Governance for several reasons:
Timelessness: In a world of rapid changes and disruptions driven by generative AI, Data Mesh, and more, the principles discussed in this book remain eternally relevant. Effective communication is a constant need, and it only becomes more valuable as our data landscapes evolve.
Real-world relevance: The book’s insights are not confined to theory; they offer practical guidance for professionals dealing with data every day. It addresses issues that those in the field encounter regularly.
Demanding change: We need to adapt to change while preserving the core principles of effective communication. In an era where AI is increasingly embedded in our workflows, communication still is the key to success.
My recommendation
A great book. A topic more people in data should be concerned about. There is so much rapid change in the data world, disruption through generative AI, Data Mesh, you name it. But what has been written about in this book stays relevant no matter the changes and challenges we face. It only increases in value. If we cannot communicate, how can we expect an AI to do better? Thank you for this book!
OpEd: A Domain by Any Other Name
Nugget by Kjetil Eritzland. Quotes from Eric Evans’ book Domain-Driven Design.
Since the launch of Dehghani’s book “Data Mesh,” domains have become the talk of the town. However, there is confusion as to what a domain is. People talk about data and business domains and discuss which domains we should have in an organization. But all of this without defining what a domain is.
Eric Evans’ book “Domain-Driven Design” deals with software development, not the production of data products, but there are similarities. Domain-driven design is about moving the focus to solving and understanding business problems rather than technical challenges. This is also the objective of data mesh.
Any application should relate to some activity of interest to the user. The subject area to which the user applied the program is what Evans calls the domain of the software. A subject area can, for instance, be production, accounting, or customer relationship management.
Each subject area is a sphere of knowledge. To understand the subject area, we need subject matter experts as part of our development team. But our developers also need to steep themselves deep in the subject area to build up knowledge of the business.
In the data world, we need to ask what expertise we need to ensure that we understand the data in our data products and their value. For source-aligned data products, we would typically need people using the applications that create the data. For data products more closely aligned with a consumption use case, e.g., a report or a dashboard, it would be the main users of those reports and dashboards.
To understand the business problems, Evans recommends building a model of the problem area. The model is not a diagram, but a diagram can be part of the model.
It is important to understand that the model is not something we build before we start implementing and then forget about. We develop the model in lockstep with the implementation, reflecting the combined insight of subject matter experts and technologists. We develop the model in a true agile fashion, iteratively, as our understanding of the subject area deepens. For a model to be useful for developers, it must also address implementation issues in addition to business-related matters.
In the data world, a conceptual data model diagram, complemented with a business glossary, and descriptions of the problems we are going to solve, could constitute the model.
A bounded context is the area where a model is applicable. Within a bounded context, we can develop independently of other bounded contexts. A bounded context should not span more than one subject area. One subject area may even contain more than one bounded context if the effort needed to build, maintain, and support the data products within that subject area is too big for one domain team.
A bounded context delimits the applicability of a particular model so that the team members have a clear and shared understanding of the area within which our model is applicable. Within our context, we work to keep the model logically unified but do not worry about applicability outside of those bounds. In other contexts, other models apply, with differences in terminology, concepts and rules. By drawing an explicit boundary, you can keep the model pure. At the same time, you avoid confusion when shifting your attention to other contexts. Integration across the boundaries will involve some translations, which you can then analyze explicitly.
In a data mesh, each of our domain teams would operate within a bounded context. We should name each bounded context and define its area of applicability. It is these bounded contexts we are referring to when we talk about domains in a data mesh. If we want to do an exercise up front to identify our domains, the best place to start would be our subject areas. But the most important thing about a domain is its boundary. How we define what is inside a domain decides which problems we can work on autonomously within a domain team without interfering with the work of other teams.
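The idea that integration across boundaries involves explicit translation can be sketched in a few lines. Everything here is a hypothetical illustration: the "sales" and "billing" contexts, the class names, the fields, and the normalization rule are my assumptions, not taken from Evans' book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesCustomer:
    """'Customer' as the (hypothetical) sales context understands it."""
    customer_id: str
    display_name: str
    segment: str


@dataclass(frozen=True)
class BillingAccountHolder:
    """The same real-world party as modelled in the (hypothetical) billing context."""
    account_holder_id: str
    legal_name: str


def sales_to_billing(customer: SalesCustomer) -> BillingAccountHolder:
    """Explicit translation at the boundary between the two contexts.

    The mapping rules live in one place, so they can be reviewed and
    analyzed instead of being scattered across both models.
    """
    return BillingAccountHolder(
        account_holder_id=f"sales:{customer.customer_id}",
        legal_name=customer.display_name.strip().upper(),
    )


holder = sales_to_billing(SalesCustomer("42", " Acme AS ", "enterprise"))
print(holder)
```

Each context keeps its own terminology, and the sales segment simply does not exist in billing. Only the translation function knows about both models, which is what lets each domain team work autonomously.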
Data Architecture Maturity
Nugget by Gaurav Sood.
Data is the oil of the 21st century, meaning it is essential to almost every business in the world directly or indirectly. Businesses put a lot of emphasis on the early adoption of new technology and architecture when it comes to data. But in that race, they often neglect to evaluate whether their data architecture is mature enough to handle the changes.
This is where it is important to understand the maturity level of your data setup. The data architects, together with the infrastructure teams, should continuously evaluate the data architecture maturity, and the following steps can help with that.
Current state evaluation
The first step is to understand the current state of your architecture to identify the gaps and opportunities for improvement. The most common way is to use a data architecture maturity model to evaluate your data architecture across six dimensions: vision, strategy, organization, governance, delivery, and operations. The model can help you benchmark your data architecture against best practices and industry standards and prioritize the areas that need more attention and investment.
Defining the target state
Next is to define your target state and align it with your business objectives and goals. The architects should have a clear vision of what the business wants to achieve with the data architecture and how it will support the business strategy and value proposition. It is also important to have a roadmap that outlines the key milestones, deliverables, and dependencies for reaching the target state. The target state should be realistic, measurable, and adaptable to changing business needs and technologies.
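Comparing a current state against a target state across the six dimensions can be sketched as a simple gap analysis. This is a minimal sketch under stated assumptions: the 1 to 5 scoring scale, the example scores, and the `maturity_gaps` helper are hypothetical and not part of any specific maturity model.

```python
# The six dimensions named in the text.
DIMENSIONS = (
    "vision", "strategy", "organization",
    "governance", "delivery", "operations",
)


def maturity_gaps(current: dict, target: dict) -> list:
    """Return (dimension, gap) pairs for dimensions below target, largest gap first."""
    missing = [d for d in DIMENSIONS if d not in current or d not in target]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    gaps = [(d, target[d] - current[d]) for d in DIMENSIONS]
    # Keep only dimensions that fall short of the target; ties keep DIMENSIONS order.
    return sorted((g for g in gaps if g[1] > 0), key=lambda g: g[1], reverse=True)


# Hypothetical scores on an assumed 1 (ad hoc) to 5 (optimized) scale.
current_state = {
    "vision": 3, "strategy": 2, "organization": 3,
    "governance": 1, "delivery": 4, "operations": 3,
}
target_state = {d: 4 for d in DIMENSIONS}

for dimension, gap in maturity_gaps(current_state, target_state):
    print(f"{dimension}: {gap} level(s) below target")
```

Sorting by gap size gives a first, rough prioritization of where attention and investment are most needed; a real assessment would weigh the dimensions against business objectives rather than treat them equally.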
Best practice implementation
Next is to implement best practices and standards that will optimize the data architecture design, development, and management. You should apply data modelling techniques that capture business concepts and rules as well as leverage data integration tools for movement, transformation, and synchronization. Also, establish data governance policies and procedures for ownership, stewardship, security, privacy, and compliance. Lastly, utilize data analytics and visualization tools for data discovery, exploration, and insight generation.
Measuring progress
The fourth step is to monitor and measure your progress and performance against your target state. It is very important to have a set of key performance indicators (KPIs) and metrics that reflect the value and impact of your data architecture on your business outcomes and processes. You should also have a feedback mechanism that allows you to collect and analyze data from various sources, such as users, customers, stakeholders, and systems, and use it to identify and resolve issues, risks, and opportunities for improvement.
Open to learning
The final step is to learn and adapt to the changing business and technology landscape. It is important to be flexible and always be on the lookout for new trends, innovations, and best practices that will help you enhance your data architecture capabilities and competencies. You should also be open to experimenting with new ideas, tools, and methods that can help you solve new or existing data challenges and create new or better data solutions. You should also be willing to share your knowledge, experience, and lessons learned with your peers, colleagues, and community, and learn from their feedback, insights, and perspectives.
Improving data architecture maturity is a multifaceted process that requires a holistic approach. It involves understanding your current state, defining a clear data strategy, implementing data governance, modernizing data storage, ensuring security and privacy, enabling data analytics, and fostering a culture of continuous improvement. By following the steps outlined above, organizations can enhance their data architecture maturity and harness the full potential of their data assets to drive business success.
You can read more here.
MetaDAMA 2#15: The Business Impact of Data Ethics
Nugget by Winfried Adalbert Etzel
Infants with guns!
Are we mature enough to track, collect and handle data responsibly, according to ethical standards? I talked with Steen Rasmussen, Director of Data Innovation at IIH Nordic, about the business impact of data ethics. Here are my key takeaways:
If we track, collect, and keep all data for any random opportunistic purpose, we put our companies at risk. This includes a commercial curse of budget-heavy tracking, budget-light management and business value creation through data. ROT, Norwegian for clutter, is an acronym for Redundant, Obsolete, Trivial: the data that clutters your way to finding valuable data.
Collection and tracking of data are still too dependent on people: If there is a change in personnel, you get situations where new people clutter the clutter.
Marketing & Sales
For many companies, it was Marketing & Sales that drove the data-driven agenda.
The big value of Marketing & Sales is to add the market dimension to the data.
You can relate your product to the market, and ship to where the market is.
Analyzing market data is «putting a fixed entity on a moving target». The market changes too rapidly to provide a good analysis.
The more you push behavioral forecasting into the future, the bigger your uncertainty.
Business Value & Ethics
Corporate irresponsibility is an issue.
Sometimes we get involved in a project for the project's sake.
Data projects have for a long time been theoretical, so the impact was not visible.
ChatGPT is a black box. Should we give it more firepower if we do not know how it works?
The market determines that there is a business value in being first.
The speed of innovation does not give time for reactive regulatory bodies to regulate efficiently.
Companies need data ethical guidelines to say how they will, shall and can use data.
Who should define data ethical guidelines in a company? It is still done on a user level, whilst senior management is looking at market situations and weighing them against ethical guidelines.
We need regulatory and top-level guidelines that cannot be bent according to market situations.
Ideally, though it is highly unlikely, we would have a global set of data ethical guidelines.
The more trustworthy you are as a company, the more relevant data is shared with you.
With that trust and data, you can understand the market better than companies that are not trustworthy and flying blind.
Personal Data Literacy is important, and we need basic digital skills in our society.
There is also a lack of understanding when it comes to measures put in place for people's benefit, e.g. cookie banners.
We are still lacking good privacy-approved alternatives to the tools we are using on an everyday basis.
Everyone has to follow ethical guidelines. We cannot have a DarkOps department in our company.
Data Ethics guidelines should be something everyone can refer to.
Ask yourself: What is the minimum of data we require to collect? Anything else should become an ethical question.
Data Protection Laws
Regulations are interpreted differently across the EU.
Nordic countries interpret the law relatively: is this just, fair, and reasonable?
Southern European countries use a more Napoleonic or dogmatic approach, where the law is the law, and the law must be obeyed.
Both ChatGPT and Google Analytics have been handled differently by data protection authorities.
Data Protection Authorities generalize too much and don’t look at differences in technology.
Is a strict, generalized interpretation creating panic for users of, e.g., Google Analytics?
You can listen to the podcast here or on any of the common streaming services (Apple Podcast, Google Podcast, Spotify, etc.). Note: The podcasts in our monthly newsletters are behind the actual airtime of the MetaDAMA podcast series.
Thank you for reading this edition of Data Nugget. We hope you liked it.
Data Nugget was delivered with vision, zeal, and courage from the editors and the collaborators.
You can visit our website here, or write us at [email protected].
I would love to hear your feedback and ideas.
Data Nugget Head Editor