Selected Data Engineering Posts . . . October 2024

The most popular of my data engineering posts in October 2024 ... with additional references ...


Welcome to the October edition of "Selected Data Engineering Posts." This month, we’re looking at new tools and strategies that make data engineering more efficient and impactful. Learn how generative AI coding tools are changing productivity and coding practices at major tech companies, with wider adoption expected soon. We’ll also discuss how to optimize Delta tables in Microsoft Fabric using techniques like Z-ordering and data compaction for large-scale projects. AtScale's new Semantic Modeling Language aims to standardize semantic layers across platforms, improving collaboration and reusability. Further articles cover using knowledge graphs for AI, AI-assisted requirements gathering, the competitive AI marketplace, effective data self-service, data strategy, and data governance for secure insights.

Subscribe to stay updated on the latest advancements in data engineering.

This issue:

GenAI Coding Tools: Tech companies are leading the adoption of generative AI coding tools, which boost productivity and streamline code writing. Other industries are likely to follow, driven by significant efficiency gains, despite challenges like technical debt and varying code quality.

Delta Tables in Fabric: The article discusses advanced optimization techniques and maintenance strategies for managing Delta tables in Microsoft Fabric, including data compaction, Z-ordering, file size management, and commands like OPTIMIZE and VACUUM to enhance performance and efficiency in large-scale data operations.

Semantic Modeling Language: AtScale introduces the Semantic Modeling Language (SML) to standardize semantic layers across platforms. SML enables reusable, shareable models, supports multi-dimensional use cases, and promotes open collaboration through its open-source approach, aiming to streamline data analytics and model-building for business users and data scientists.

Knowledge Graphs: Knowledge graphs function like the human brain by efficiently storing, organizing, and retrieving data through semantic connections. They enhance AI by structuring data for accurate, scalable insights, supporting inferencing, and facilitating proprietary data management. This ensures reliable, context-driven AI applications.

Gathering Requirements with AI: Generative AI, specifically a RAG-based approach with Gemini 1.5 Pro, is used to enhance software requirements gathering. It automates manual tasks, improves accuracy, handles complex datasets, and generates innovative requirements, streamlining development.

AI Marketplace: The AI market is competitive. The article discusses the evolving AI market and the strategies of major vendors such as Microsoft, AWS and Google. It categorizes the providers based on the openness of their AI models and their monetization approaches and emphasizes the importance of aligning with partners who share similar business values when making AI decisions.

Self Service: Self-service in data management is an ongoing journey, requiring a deep understanding of business users. Successful implementation involves establishing appropriate data architecture and governance, creating a supportive operating model, and utilizing the right tools to empower users effectively.

Data Strategy: Organizations must evaluate their data management practices to ensure they align with business strategies and support real-time analytics and generative AI. A comprehensive data strategy enables effective data governance, democratization, and collaboration, ultimately unlocking business value and improving decision-making across the organization.

Effective Data Governance: This blog post discusses the importance of data governance and provides strategies on how to communicate it to various stakeholders. It emphasizes the need for data security, compliance, trusted data for decision making, accelerated insights and ROI on the data stack. By addressing these concerns and demonstrating the benefits of good governance, data stewards can effectively advocate for the necessary resources and drive successful data initiatives.

EEBO - Engineering Excellence to Business Outcomes: This book highlights EEBO metrics, which bridge engineering efforts and business impact by aligning engineering work with financial results. It emphasizes thoughtful metric selection, clear communication, and accountability, helping organizations effectively demonstrate the value of engineering.

Looking forward to sharing these insights with you and supporting you in your quest for data excellence.

Enjoy reading!


𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗜 𝗖𝗼𝗱𝗶𝗻𝗴 𝗧𝗼𝗼𝗹𝘀 𝗶𝗻 𝗧𝗲𝗰𝗵 𝗖𝗼𝗺𝗽𝗮𝗻𝗶𝗲𝘀

... are Gen AI Coding Tools the Gateway to Broader Industry Adoption?

In a recent article, Deloitte points out that generative AI (Gen AI) tools have been rapidly adopted by tech companies, particularly in software development. These tools are considered transformative: they significantly increase productivity by automating tasks such as writing routine code while maintaining accuracy.

However, 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 arise, such as the risk of technical debt from lower-quality code and the need to address the tendency of AI to generate incorrect or biased results. Furthermore, the impact of Gen AI varies depending on the experience of the developer, with junior developers often benefiting the most.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

  • 𝗠𝗮𝗻𝗮𝗴𝗲 𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗗𝗲𝗯𝘁: Ensure that the speed of AI-assisted coding does not compromise quality and inflate long-term system maintenance costs.

  • 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗲 𝗔𝗜-𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝗱 𝗖𝗼𝗱𝗲: Regularly check AI-generated outputs for accuracy to avoid potential errors or biases.

  • 𝗧𝗿𝗮𝗶𝗻 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿𝘀: Provide guidance and training on AI tools to maximize productivity and ensure quality standards.

  • 𝗠𝗼𝗻𝗶𝘁𝗼𝗿 𝗧𝗼𝗼𝗹 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: Evaluate the effectiveness of different Gen AI tools to optimize ROI and improve coding outcomes.
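
To make the validation recommendation concrete, here is a minimal, illustrative Python sketch of gating AI-generated code behind automated checks before it reaches human review. The generated snippet, test cases, and acceptance logic are hypothetical, not taken from the Deloitte article.

```python
# Hypothetical gate: run an AI-generated snippet against known test cases
# before accepting it into the codebase.
candidate = '''
def slugify(title):
    return "-".join(title.lower().split())
'''

tests = [
    ("Hello World", "hello-world"),
    ("  Data   Engineering  ", "data-engineering"),
]

namespace = {}
exec(candidate, namespace)          # load the generated function in isolation
slugify = namespace["slugify"]

failures = [(t, want, slugify(t)) for t, want in tests if slugify(t) != want]
if failures:
    print("Rejecting generated code:", failures)
else:
    print("Checks passed; forward to human code review.")
```

A gate like this does not replace human review; it simply filters out obviously broken generations before they consume reviewer time.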

𝗖𝗼𝗻𝗰𝗹𝘂𝘀𝗶𝗼𝗻: There are significant benefits to be gained from integrating AI coding tools, but it requires careful monitoring to avoid technical debt and ensure high code quality. By training developers and validating AI outputs, organizations can fully realize the transformative potential of these tools.

Go to Article

Further Reading

The 20 Generative AI Coding Tools Every Programmer Should Know About

11 generative AI programming tools for developers

7 Code Generation Tools That AI Insiders Have Tried And Tested


𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗶𝗻𝗴 𝗗𝗲𝗹𝘁𝗮 𝗧𝗮𝗯𝗹𝗲𝘀 𝗶𝗻 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 𝗙𝗮𝗯𝗿𝗶𝗰: 

𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀 𝗮𝗻𝗱 𝗠𝗮𝗶𝗻𝘁𝗲𝗻𝗮𝗻𝗰𝗲 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀

In this third part of his blog series on data ingestion with Spark in Microsoft Fabric, Rajaniesh Kaushikk focuses on advanced optimization and maintenance for Delta tables. Efficient data management is critical for performance and reliability, particularly with large datasets and complex workflows.

𝗞𝗲𝘆 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗮𝗻𝗱 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

𝗗𝗮𝘁𝗮 𝗖𝗼𝗺𝗽𝗮𝗰𝘁𝗶𝗼𝗻: Small files can degrade performance. Use the 'OPTIMIZE' command to merge small files into larger ones to improve read performance.

𝗙𝗶𝗹𝗲 𝗦𝗶𝘇𝗲 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Configure file sizes for optimal performance. Enable automatic file optimization with 'spark.databricks.delta.optimizeWrite.enabled' and 'spark.databricks.delta.autoCompact.enabled'.

𝗭-𝗢𝗿𝗱𝗲𝗿𝗶𝗻𝗴: This technique organizes related data to enhance query performance. Use the 'OPTIMIZE delta_table ZORDER BY (column1, column2);' command for efficient data retrieval.

𝗠𝗮𝗶𝗻𝘁𝗲𝗻𝗮𝗻𝗰𝗲 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀: Employ VACUUM to remove old files and manage storage costs. Use schema evolution and handle deletes to maintain optimal performance.
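
As a rough illustration, the commands above can be combined in a notebook roughly as follows. This is a minimal PySpark sketch, assuming a Spark session with Delta support (e.g. a Fabric notebook); the table name, columns, and retention window are placeholders, and the config names are those quoted in the article.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Automatic file-size optimization (config names as quoted in the article).
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Data compaction: merge small files into larger ones for faster reads.
spark.sql("OPTIMIZE sales")

# Z-ordering: co-locate related data to speed up selective queries.
spark.sql("OPTIMIZE sales ZORDER BY (customer_id, order_date)")

# Maintenance: drop files no longer referenced (168 hours = the default 7 days).
spark.sql("VACUUM sales RETAIN 168 HOURS")
```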

Effective optimization and maintenance of Delta tables are essential for high performance and efficient data operations in Microsoft Fabric. Implementing these strategies will ensure your data management remains robust and scalable, driving better outcomes from your data.

Go to Article

Further Reading

Table maintenance feature to manage delta tables in Fabric

Optimizing performance and cost in Microsoft Fabric Lakehouse

Automate Delta Tables Maintenance in a Microsoft Fabric Warehouse


𝗔𝗱𝘃𝗮𝗻𝗰𝗶𝗻𝗴 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗟𝗮𝘆𝗲𝗿 𝗔𝗱𝗼𝗽𝘁𝗶𝗼𝗻 𝘄𝗶𝘁𝗵 𝗢𝗽𝗲𝗻-𝗦𝗼𝘂𝗿𝗰𝗲𝗱 𝗦𝗠𝗟

... to establish a universal standard, enabling model portability and cross-platform collaboration.

SML is expected to drive industry-wide improvements by simplifying model sharing, reducing dependency on proprietary tools, and promoting widespread adoption of semantic layers.

According to David P. Mariani (AtScale), the lack of a universal standard for semantic modeling has long hindered seamless data analytics. Key challenges include data fragmentation, limited model interoperability, and vendor-specific solutions. In response, the Semantic Modeling Language (SML) was developed as an open-source initiative to standardize semantic modeling, allowing for more consistent and scalable data use.

𝗦𝗲𝘃𝗲𝗿𝗮𝗹 𝗿𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀 have been proposed to address these challenges:

  • 𝗦𝘁𝗮𝗻𝗱𝗮𝗿𝗱𝗶𝘇𝗲 𝗺𝗼𝗱𝗲𝗹𝗶𝗻𝗴 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲: Open-sourcing SML aims to create a universally accepted standard, facilitating model portability and collaboration across platforms.

  • 𝗣𝗿𝗼𝗺𝗼𝘁𝗲 𝗿𝗲𝘂𝘀𝗮𝗯𝗹𝗲 𝗺𝗼𝗱𝗲𝗹𝘀: Pre-built models covering industry use cases enhance accessibility and allow for quick implementation.

  • 𝗘𝗻𝗰𝗼𝘂𝗿𝗮𝗴𝗲 𝗰𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆-𝗱𝗿𝗶𝘃𝗲𝗻 𝗶𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻: Open access to SML encourages contributions from developers, leading to faster development of new tools and model converters.
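
As a hedged illustration of what this openness enables: the models in the linked GitHub repo are plain YAML files, so ordinary tooling can load and inspect them. The field names below are loosely modeled on the repo's object files and are illustrative, not a definitive rendering of the SML spec.

```python
import yaml  # PyYAML

# Illustrative SML-style metric definition (field names are assumptions).
sml_object = """
object_type: metric
unique_name: total_sales
dataset: orders
column: amount
calculation_method: sum
"""

model = yaml.safe_load(sml_object)
assert model["object_type"] == "metric"
print(f"{model['unique_name']} = {model['calculation_method']}({model['column']})")
```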

The adoption of a standardized semantic modeling language is crucial for accelerating analytics consumption, enhancing interoperability, and democratizing data access for broader business use.

Go to Article

Further Reading

GitHub repo: https://github.com/semanticdatalayer/SML

What is SML?


𝗧𝗵𝗲 𝗩𝗮𝗹𝘂𝗲 𝗼𝗳 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗚𝗿𝗮𝗽𝗵𝘀 𝗶𝗻 𝗠𝗼𝗱𝗲𝗿𝗻 𝗗𝗮𝘁𝗮 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁

The Future of Data Management with Knowledge Graphs

Knowledge graphs enhance data management and AI, improving insights and operational efficiency.

In this blog post, Guillaume Rachez (Perfect Memory) and Doug Kimball (Ontotext) point out that knowledge graphs offer efficient ways to access and manage knowledge by mimicking the human brain's ability to connect and organize information. They improve data handling by integrating various data points into a semantic framework, enhancing inference and insight generation.

𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀:

  • 𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗗𝗮𝘁𝗮 𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽𝘀: Companies struggle with understanding and managing intricate data relationships, leading to inefficiencies.

  • 𝗟𝗶𝗺𝗶𝘁𝗲𝗱 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴: Traditional databases lack the capability to infer and connect disparate data points like a knowledge graph can.

  • 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗜 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴: Ensuring generative AI models are trained on accurate, structured data to prevent errors and improve reliability.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

  • 𝗟𝗲𝘃𝗲𝗿𝗮𝗴𝗲 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗚𝗿𝗮𝗽𝗵𝘀: Utilize knowledge graphs to integrate and connect diverse data, improving access and insights.

  • 𝗔𝗱𝗼𝗽𝘁 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗶𝗻𝗴 𝗖𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀: Implement advanced inferencing to enhance data understanding and analytics.

  • 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲 𝘄𝗶𝘁𝗵 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗜: Use knowledge graphs to train AI models, ensuring accuracy and reducing errors in AI-generated insights.
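
To illustrate the inferencing recommendation, here is a tiny sketch using rdflib (one possible library; the article does not prescribe tooling). Two asserted facts plus a subclass axiom let a SPARQL property path derive a fact that was never stored explicitly; the entities are invented for the example.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# Asserted facts: acme is a Manufacturer; every Manufacturer is a Supplier.
g.add((EX.acme, RDF.type, EX.Manufacturer))
g.add((EX.Manufacturer, RDFS.subClassOf, EX.Supplier))

# Inference via a SPARQL property path: entities typed with any subclass
# of Supplier are also Suppliers, even though that triple was never stored.
rows = g.query("""
    SELECT ?entity WHERE {
      ?entity a/<http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://example.org/Supplier> .
    }
""")
for row in rows:
    print(row.entity)  # -> http://example.org/acme
```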

Knowledge graphs provide significant advantages in managing complex data and improving AI capabilities, offering a crucial tool for enhancing data insights and operational efficiency.

Go to Article

Further Reading

What Is a Knowledge Graph?

Knowledge Graphs 101: The Story (and Benefits) Behind the Hype

The Power of Knowledge Graphs in Data Management

Using a Knowledge Graph to Power a Semantic Data Layer for Databricks

How-to: Building Knowledge Graphs in 10 Steps


𝗦𝘁𝗿𝗲𝗮𝗺𝗹𝗶𝗻𝗶𝗻𝗴 𝗥𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁𝘀 𝗚𝗮𝘁𝗵𝗲𝗿𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗥𝗔𝗚 𝗮𝗻𝗱 𝗚𝗲𝗻𝗔𝗜

𝗜𝗺𝗽𝗿𝗼𝘃𝗶𝗻𝗴 𝗔𝗰𝗰𝘂𝗿𝗮𝗰𝘆 𝘁𝗵𝗿𝗼𝘂𝗴𝗵 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻

In this article, Hemank Lowe points out that traditional methods of gathering project requirements are time-consuming, error-prone, and often incomplete. Manual processes lead to missed details, miscommunication, and rework, impacting project success. A Retrieval-Augmented Generation (RAG) approach powered by generative AI, however, offers significant improvements.

𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗶𝗻 𝗥𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁𝘀 𝗚𝗮𝘁𝗵𝗲𝗿𝗶𝗻𝗴:

  • 𝗜𝗻𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 𝗮𝗻𝗱 𝗢𝗺𝗶𝘀𝘀𝗶𝗼𝗻𝘀: Manual efforts often result in incomplete or inconsistent documentation.

  • 𝗧𝗶𝗺𝗲 𝗮𝗻𝗱 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲-𝗜𝗻𝘁𝗲𝗻𝘀𝗶𝘃𝗲: The traditional approach is laborious, increasing project costs and delays.

  • 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆 𝗶𝗻 𝗗𝗮𝘁𝗮 𝗛𝗮𝗻𝗱𝗹𝗶𝗻𝗴: Managing large volumes of diverse data adds to the complexity of requirement analysis.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

  • 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲 𝗠𝗮𝗻𝘂𝗮𝗹 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗲𝘀: Leverage RAG-based models to reduce manual work and improve accuracy.

  • 𝗘𝗻𝗵𝗮𝗻𝗰𝗲 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴: Use AI to grasp nuanced project details and stakeholder needs.

  • 𝗙𝗼𝘀𝘁𝗲𝗿 𝗜𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻: Allow generative AI to create new content and foresee unanticipated requirements.
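
Below is a minimal sketch of the retrieve-then-generate pattern behind this approach, assuming scikit-learn for the retrieval step. The corpus and query are invented, and generate() is a stub standing in for the LLM call (the article uses Gemini 1.5 Pro).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of project artifacts (meeting notes, tickets, legacy specs).
documents = [
    "Users must be able to reset their password via email.",
    "The nightly batch job loads orders into the warehouse by 6 AM.",
    "Support exporting reports as PDF and CSV.",
]

def retrieve(query, k=2):
    """Return the k documents most similar to the query (TF-IDF retrieval)."""
    matrix = TfidfVectorizer().fit_transform(documents + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def generate(query, context):
    """Stub for the generation step (e.g. Gemini 1.5 Pro in the article);
    a real implementation would send this augmented prompt to the model."""
    return f"Draft requirements for: {query}\nGrounded in:\n" + "\n".join(context)

query = "password management requirements"
print(generate(query, retrieve(query)))
```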

𝗧𝗵𝗲 𝗶𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗼𝗳 𝗥𝗔𝗚 𝗮𝗻𝗱 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗜 𝘀𝘁𝗿𝗲𝗮𝗺𝗹𝗶𝗻𝗲𝘀 𝘁𝗵𝗲 𝗽𝗿𝗼𝗰𝗲𝘀𝘀 𝗼𝗳 𝗰𝗮𝗽𝘁𝘂𝗿𝗶𝗻𝗴 𝗿𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁𝘀, improves the quality of documentation, and reduces errors, leading to more successful software development outcomes.

Go to Article

Further Reading

The Potential of Generative Artificial Intelligence in Requirements Engineering

AI in Requirements Engineering: Clarity and Consistency Boost

Using AI for requirements analysis: A case study


𝗔𝗜 𝗠𝗮𝗿𝗸𝗲𝘁𝗽𝗹𝗮𝗰𝗲 𝗛𝗲𝗮𝘁𝘀 𝗨𝗽: 𝗢𝗽𝗲𝗻 𝘃𝘀. 𝗖𝗹𝗼𝘀𝗲𝗱, 𝗗𝗶𝗿𝗲𝗰𝘁 𝘃𝘀. 𝗜𝗻𝗱𝗶𝗿𝗲𝗰𝘁

... choose an AI partner whose business model and values align with yours

The AI market is rapidly evolving, with major players like Microsoft facing increased competition. Ayal Steinberg points out that this competition centers around differentiation: how vendors position themselves to attract customers.

𝗞𝗲𝘆 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗮𝗻𝗱 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀

  • 𝗩𝗲𝗻𝗱𝗼𝗿 𝗗𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝗰𝗲: Microsoft's early lead in AI workloads on Azure stemmed heavily from its OpenAI partnership. However, dependence on Nvidia chips is driving investment in in-house GPU technology, which raises data center capital expenditures but can ultimately support higher operating margins.

  • 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁𝗶𝗮𝘁𝗶𝗼𝗻 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀: Each vendor is leveraging existing strengths and business models to differentiate itself. Frameworks like "AI Market Dynamics: Open vs. Closed, Direct vs. Indirect" help make sense of this landscape.

  • 𝗢𝗽𝗲𝗻 𝘃𝘀. 𝗖𝗹𝗼𝘀𝗲𝗱 𝗠𝗼𝗱𝗲𝗹𝘀: This framework considers how open the underlying AI model is (open-source or proprietary) and how the vendor monetizes it (direct model sales or AI integration within existing products/services).

𝗧𝗵𝗿𝗲𝗲 𝗘𝗺𝗲𝗿𝗴𝗶𝗻𝗴 𝗚𝗿𝗼𝘂𝗽𝘀

The framework categorizes vendors into three groups:

  • 𝗚𝗿𝗼𝘂𝗽 𝟭: 𝗠𝘆 𝗪𝗮𝘆 𝗼𝗿 𝘁𝗵𝗲 𝗛𝗶𝗴𝗵𝘄𝗮𝘆 (𝗖𝗹𝗼𝘀𝗲𝗱): These vendors (e.g., Microsoft, OpenAI) offer a complete suite of AI capabilities but limit them to their own closed ecosystem.

  • 𝗚𝗿𝗼𝘂𝗽 𝟮: 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗔𝗜 𝗙𝗼𝗰𝘂𝘀 (𝗢𝗽𝗲𝗻): These vendors (e.g., Meta, Google) primarily generate revenue from user bases through advertising. Their AI investments are focused on improving existing user experiences and services.

  • 𝗚𝗿𝗼𝘂𝗽 𝟯: 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗔𝗜 𝗙𝗼𝗰𝘂𝘀 (𝗢𝗽𝗲𝗻): These vendors (e.g., IBM) primarily focus on enterprise software and services. Their AI strategy leverages open-source models combined with proprietary value-added services to deliver solutions for businesses.

𝗖𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝗮𝗻 𝗔𝗜 𝗣𝗮𝗿𝘁𝗻𝗲𝗿: 𝗔𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝗠𝗮𝘁𝘁𝗲𝗿𝘀: Selecting an AI partner with a business model and values aligned with your own is crucial. Today's AI decisions will become the foundation for future applications.

Understanding the evolving AI market landscape and vendor strategies empowers organizations to make informed decisions when selecting an AI partner. Choosing the right partner, based on both technical capabilities and business model alignment, can significantly impact the success of AI initiatives.

Go to Article

Further Reading

How to Make Money with AI: Strategies for Success and 7 Proven Methods

OpenHPI Free Online Course: Profitable AI

The AI Marketplace


𝗧𝗵𝗲 𝗝𝗼𝘂𝗿𝗻𝗲𝘆 𝘁𝗼 𝗦𝗲𝗹𝗳-𝗦𝗲𝗿𝘃𝗶𝗰𝗲 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀: 𝗔 𝗚𝘂𝗶𝗱𝗲

Empowering Business Users: Building a Foundation for Self-Service Success

Self-service analytics, where business users can independently access and analyze data, is a popular goal for data-driven organizations.

However - as Wayne Eckerson points out - achieving this can be challenging due to the complexities of data governance, architecture, and tool selection.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀

  • 𝗕𝗮𝗹𝗮𝗻𝗰𝗶𝗻𝗴 𝗘𝗺𝗽𝗼𝘄𝗲𝗿𝗺𝗲𝗻𝘁 𝗮𝗻𝗱 𝗖𝗼𝗻𝘁𝗿𝗼𝗹: Successfully implementing self-service requires striking a balance between empowering business users and maintaining data governance.

  • 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗨𝘀𝗲𝗿𝘀: Classify business users and tailor self-service solutions to their specific needs and capabilities.

  • 𝗗𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝗘𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 𝗗𝗮𝘁𝗮 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Create a data architecture that supports self-service by providing a foundation of building blocks for business users to extend.

  • 𝗘𝘀𝘁𝗮𝗯𝗹𝗶𝘀𝗵𝗶𝗻𝗴 𝗮 𝗥𝗼𝗯𝘂𝘀𝘁 𝗢𝗽𝗲𝗿𝗮𝘁𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹: Implement clear data engineering standards and provide training and coaching to business users to enable them to build their own data structures.

  • 𝗦𝗲𝗹𝗲𝗰𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗧𝗼𝗼𝗹𝘀: Choose tools that empower power users to discover, validate, prepare, and analyze data efficiently.

True self-service is a process that requires careful planning and execution. By understanding the business requirements, implementing a solid data architecture and operating model, and selecting the right tools, organizations can empower their business users to make data-driven decisions.

Go to Article

Further Reading

Self-Service Data: Definition, Benefits, and Step-by-Step Guide

Self-Service Data Analytics as a Hierarchy of Needs

Self Service Analytics: What is It and Why is It Important?

Self-Service Data Platform: Definition & Must-Have Features


𝗟𝗼𝘀𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗙𝗹𝗼𝗼𝗱 𝗼𝗳 𝗗𝗮𝘁𝗮?

𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗗𝗮𝘁𝗮 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗦𝘂𝗰𝗰𝗲𝘀𝘀

In today's data-driven world, organizations grapple with managing vast amounts of information. As Databricks points out, a well-defined data strategy acts as a roadmap to unlock the true potential of this data and drive business value.

𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀:

  • 𝗗𝗮𝘁𝗮 𝗦𝗶𝗹𝗼𝘀 𝗮𝗻𝗱 𝗟𝗶𝗺𝗶𝘁𝗲𝗱 𝗩𝗶𝘀𝗶𝗯𝗶𝗹𝗶𝘁𝘆: Fragmented data hinders analysis and decision-making.

  • 𝗗𝗮𝘁𝗮 𝗟𝗶𝘁𝗲𝗿𝗮𝗰𝘆 𝗚𝗮𝗽: Lack of training hinders understanding and utilization of data.

  • 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲 𝗮𝗻𝗱 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗥𝗶𝘀𝗸𝘀: Uncontrolled data access raises security and legal concerns.

𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗦𝘁𝗿𝗼𝗻𝗴 𝗗𝗮𝘁𝗮 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆:

  • 𝗗𝗲𝗳𝗶𝗻𝗲 𝗚𝗼𝗮𝗹𝘀: Clearly outline the objectives for your data strategy, aligning with business goals like cost reduction or customer experience improvement.

  • 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗲 𝗬𝗼𝘂𝗿 𝗗𝗮𝘁𝗮 𝗟𝗮𝗻𝗱𝘀𝗰𝗮𝗽𝗲: Identify current data sources, access rights, and analytical maturity.

  • 𝗔𝘀𝘀𝗲𝗺𝗯𝗹𝗲 𝗮 𝗖𝗿𝗼𝘀𝘀-𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝗮𝗹 𝗧𝗲𝗮𝗺: Include stakeholders from various departments who can contribute diverse perspectives.

  • 𝗗𝗲𝘃𝗲𝗹𝗼𝗽 𝗮 𝗥𝗼𝗮𝗱𝗺𝗮𝗽: Define steps to achieve your goals, including timelines and resource allocation.

  • 𝗘𝘀𝘁𝗮𝗯𝗹𝗶𝘀𝗵 𝗗𝗮𝘁𝗮 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲: Set clear policies and standards for data quality, access, and security.

  • 𝗜𝗻𝘃𝗲𝘀𝘁 𝗶𝗻 𝗧𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆: Consider data lake solutions, integration platforms, and analytics tools.

  • 𝗙𝗼𝘀𝘁𝗲𝗿 𝗮 𝗗𝗮𝘁𝗮-𝗗𝗿𝗶𝘃𝗲𝗻 𝗖𝘂𝗹𝘁𝘂𝗿𝗲: Train employees to work with data and understand its value.

  • 𝗠𝗼𝗻𝗶𝘁𝗼𝗿 𝗮𝗻𝗱 𝗔𝗱𝗮𝗽𝘁: Regularly assess progress and adjust the strategy based on evolving needs.

𝗘𝗺𝗽𝗼𝘄𝗲𝗿𝗺𝗲𝗻𝘁: A well-implemented data strategy empowers organizations to:

  • 𝗠𝗮𝗸𝗲 𝗜𝗻𝗳𝗼𝗿𝗺𝗲𝗱 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻𝘀: Data-driven insights guide better business strategies and resource allocation.

  • 𝗜𝗺𝗽𝗿𝗼𝘃𝗲 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 𝗮𝗻𝗱 𝗥𝗲𝗱𝘂𝗰𝗲 𝗖𝗼𝘀𝘁𝘀: Eliminate wasteful practices and optimize workflows.

  • 𝗘𝗻𝗵𝗮𝗻𝗰𝗲 𝗖𝘂𝘀𝘁𝗼𝗺𝗲𝗿 𝗘𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲: Gain insights into customer behavior to personalize interactions and drive loyalty.

  • 𝗨𝗻𝗹𝗼𝗰𝗸 𝗜𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻: Fuel research and development with valuable data-driven insights.

By implementing a comprehensive data strategy, businesses can gain a competitive edge, improve decision-making, and enhance operational efficiency.

Go to Article

Further Reading

What is a Data Strategy & Why is it Important?

AWS: What is Data Strategy?

7 Elements of a Data Strategy

Databricks Data Strategy Blog


𝗖𝗼𝗺𝗺𝗶𝘁𝗺𝗲𝗻𝘁 𝘁𝗼 𝗘𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 𝗗𝗮𝘁𝗮 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲

How to craft the ultimate business case for data governance

Data governance is essential for organizations, but it often meets resistance because its upfront costs are concrete while its benefits seem intangible. Prukalpa ⚡ (Atlan) points out that governance is not merely about risk avoidance but also about leveraging data effectively.

𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗶𝗻 𝘀𝗲𝗹𝗹𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲:

  • 𝗔𝗯𝘀𝘁𝗿𝗮𝗰𝘁 𝗕𝗲𝗻𝗲𝗳𝗶𝘁𝘀: The advantages of governance can seem intangible against the investment required.

  • 𝗗𝗮𝘁𝗮 𝗦𝗶𝗹𝗼𝘀: Lack of visibility into data creates risks regarding sensitive information, which can lead to fines and poor data quality.

  • 𝗧𝗿𝘂𝘀𝘁 𝗜𝘀𝘀𝘂𝗲𝘀: Stakeholders often distrust data, hampering decision-making and leading to inefficiencies.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

  • 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗮𝗻𝗱 𝗖𝗼𝗺𝗽𝗹𝗶𝗮𝗻𝗰𝗲: Emphasize the importance of safeguarding sensitive data and compliance to protect the company’s reputation.

  • 𝗘𝗻𝗵𝗮𝗻𝗰𝗲 𝗧𝗿𝘂𝘀𝘁 𝗶𝗻 𝗗𝗮𝘁𝗮: Foster a data-driven culture by ensuring stakeholders understand data sources and quality.

  • 𝗜𝗺𝗽𝗿𝗼𝘃𝗲 𝗩𝗶𝘀𝗶𝗯𝗶𝗹𝗶𝘁𝘆: Ensure that teams can easily access and utilize data, thus increasing productivity and efficiency.

  • 𝗗𝗲𝗺𝗼𝗻𝘀𝘁𝗿𝗮𝘁𝗲 𝗥𝗢𝗜: Show how effective governance can enhance the performance of data assets and justify the investment.

Also, in light of the new EU AI Act, establishing effective data governance is critical for organizations seeking to maximize the potential of their data and ensure compliance. By addressing the challenges and implementing these recommendations, companies can derive significant value from their data.

Go to Article

Further Reading

Demonstrating the Value of Data Governance

7 benefits of data governance for your organization

How to Drive Business Value With Data Governance?

The Importance of Data Governance in Today’s Business Environment


𝗕𝗿𝗶𝗱𝗴𝗶𝗻𝗴 𝘁𝗵𝗲 𝗚𝗮𝗽 𝗕𝗲𝘁𝘄𝗲𝗲𝗻 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗮𝗻𝗱 𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 

𝗔 𝗚𝘂𝗶𝗱𝗲 𝘁𝗼 𝗘𝗘𝗕𝗢 𝗠𝗲𝘁𝗿𝗶𝗰𝘀

EEBO Metrics Create a Line of Sight from Engineering Excellence to Business Outcomes

EEBO metrics are crucial for aligning engineering efforts with business objectives. As Richard Gall points out, they provide a direct link between engineering activities and tangible business outcomes. By measuring engineering effectiveness and its impact on business value, organizations can make data-driven decisions, improve efficiency, and gain a competitive edge.

𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀:

  • 𝗢𝘃𝗲𝗿𝗰𝗼𝗺𝗶𝗻𝗴 𝘁𝗵𝗲 𝗗𝗶𝘀𝗰𝗼𝗻𝗻𝗲𝗰𝘁: Traditional metrics often focus on activity rather than value. Organizations must shift towards metrics that directly measure business impact.

  • 𝗕𝗿𝗲𝗮𝗸𝗶𝗻𝗴 𝗗𝗼𝘄𝗻 𝗦𝗶𝗹𝗼𝘀: Effective collaboration between engineering and business teams is essential for aligning goals and understanding the connection between engineering work and business outcomes.

  • 𝗖𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗠𝗲𝘁𝗿𝗶𝗰𝘀: Select metrics that are relevant to business objectives, actionable, and aligned with team behavior.

By implementing EEBO metrics and fostering collaboration between technical and business teams, organizations can bridge the gap between technical excellence and business success. This leads to more efficient and effective software development, better decision making and ultimately a stronger competitive position.

Go to Article

Further Reading

Introduction to EEBO Metrics

EEBO Metrics Create a Line of Sight from Engineering Excellence to Business Outcomes

How do you know engineering is powering business growth?


Takeaways

Key takeaways from this month's issue, offering strategies and insights for success in data technology:

GenAI Coding Tools: Tech companies should embrace generative AI coding tools to enhance productivity and code efficiency, while preparing for broader adoption across industries. Data engineers should explore these tools to automate routine tasks, improve coding speed, and maintain oversight to prevent technical debt, while continuously upskilling to maximize productivity and code quality.

Delta Tables in Fabric: To enhance performance and efficiency in managing Delta tables, implement advanced optimization techniques such as data compaction, Z-ordering, and effective file size management. Regularly utilize commands like OPTIMIZE and VACUUM to maintain data quality and streamline large-scale data operations.

Semantic Modeling Language: Explore and adopt the Semantic Modeling Language (SML) to enhance model portability, simplify data analytics workflows, and support standardized, reusable semantic models across platforms, fostering consistency and collaboration in data-driven environments.

Knowledge Graphs: Leverage knowledge graphs to efficiently organize and connect disparate data, enabling powerful inferencing capabilities and improving AI accuracy. Integrating knowledge graphs supports scalable, context-driven data management, which enhances AI applications and ensures reliable, accessible insights.

Gathering Requirements with AI: Enhance requirements gathering by using generative AI and a retrieval-augmented generation (RAG) approach. It improves efficiency, accuracy, and scalability, automating manual tasks and creating more complete, innovative requirements while handling large volumes of data across various formats.

AI Marketplace: One should evaluate AI vendors based on their business models and revenue strategies, taking into account factors such as openness and direct monetization of AI. Working with vendors that share similar values can have a significant impact on future AI applications and the success of data-driven projects.

Self Service: To successfully implement self-service analytics, prioritize understanding business user needs. Establish robust data architecture and governance frameworks, develop supportive operating models, and select appropriate tools. This approach empowers users while ensuring data quality and consistency, ultimately driving effective decision-making across the organization.

Data Strategy: Focus on developing a comprehensive data strategy that aligns with business objectives. Prioritize creating a unified data architecture, removing silos, and fostering collaboration. By empowering users and ensuring effective governance, you can unlock valuable insights and enhance decision-making across the organization.

Effective Data Governance: To effectively advocate for data governance, one should focus on the tangible benefits it brings. By addressing data security, improving data quality, and accelerating insights, data governance can enhance the overall effectiveness of data engineering efforts. Point out the potential cost savings, increased productivity and improved decision making that result from well-managed data.

EEBO - Engineering Excellence to Business Outcomes: To demonstrate the impact of technical work, prioritize metrics that align with business goals, like EEBO. Collaborate with business stakeholders to select meaningful metrics, ensure clear communication and promote accountability - and ultimately demonstrate how technical work contributes to financial results and strategic success.


Conclusion

This issue shares important strategies for improving data engineering practices. Using generative AI coding tools can boost productivity and code quality while managing technical debt. Applying optimization techniques for Delta tables enhances performance in large data operations. The Semantic Modeling Language helps standardize data analytics, making collaboration easier. Knowledge graphs organize data effectively, improving AI accuracy and insights. Generative AI can streamline gathering requirements, making development faster and more innovative. Choosing AI vendors that align with your business values is crucial for success. Focusing on self-service analytics, solid data strategies, and good governance allows organizations to unlock valuable insights and make better decisions.

Stay tuned for our next issue, where we’ll explore the latest trends and innovations in data technology.

See you next month ...


#DataEngineering #DataGovernance #DataArchitecture #DataAnalytics #DataScience #MachineLearning #AI #DataManagement #BigData #DataQuality #DataSecurity #DataPrivacy #DataIntegration #DataTransformation #DataModeling #DataVisualization #DataOps #CloudDataWarehousing #CloudDataLakes #DataPipeline #ETL #ELT #DataLakes #DataWarehouses #DataMarts #DataCatalog #MetadataManagement #DataLineage #DataProfiling #DataCleansing #DataStandardization #DataGovernanceFramework #DataGovernancePolicy #DataGovernanceTools #DataGovernanceBestPractices #DataGovernanceCompliance
