Your Data Quality Checks Are Worth Less (Than You Think)
How to deliver outsized value on your data quality program
Photo by Wolfgang Weiser on Unsplash
Over the last several years, data quality and observability have become hot topics. There is a huge array of solutions in the space (in no particular order, and certainly not exhaustive):
dbt tests
SQLMesh audits
Monte Carlo
Great Expectations
Soda
Sifflet
Regardless of their specific features, all of these tools share a similar goal: improve visibility into data quality issues, reduce the number of data incidents, and increase trust. Despite the lower barrier to entry these tools provide, however, data quality programs remain difficult to implement successfully. I believe there are three pieces of low-hanging fruit that can improve your outcomes. Let’s dive in!
Hint 1: Focus on process failures, not bad records (when you can)
For engineering-minded folks, it can be a hard pill to swallow that some number of “bad” records will flow not only into your system but through it, and that may be OK! Consider the following:
1. Will the bad records flush out when corrected in the source system? If so, you may go to extraordinary lengths in your warehouse or lakehouse to correct data that is trivial for a source system operator to fix, only for your reporting to come out correct on the next refresh anyway.
2. Is the dataset useful if it’s “directionally correct” in aggregate? CRM data is a classic example, since many fields must be populated manually and the error rate is relatively high compared to automated processes. Even if these errors aren’t corrected, as long as they aren’t systemic, the dataset may still be useful.
3. Is the accuracy of individual records extremely important? Financial reporting, operational reporting on sensor data from expensive machinery, and other “spot-critical” use cases deserve the time and effort needed to identify (and possibly isolate, remove, or remediate) bad records.
If your data product can tolerate Type 1 or Type 2 issues (the first two scenarios above), fantastic! You can save a lot of effort by focusing on detecting and alerting on process failures rather than one-off or limited anomalies. You can monitor high-level metrics derived from metadata, such as record counts, unique counts of key columns, and min/max values. A rogue process in your application or SaaS systems can generate too many or too few records, or a new enumerated value may appear in a column unexpectedly. Depending on your specific use cases, you may need to write custom tests (e.g., total revenue by date and market segment or region), so make sure to profile your data and common failure scenarios.
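As a rough illustration, here is what a handful of metadata-level checks might look like in Python with pandas. The table shape, column names (order_id, market_segment, order_date), and thresholds are all hypothetical; in practice you would express the same checks in whichever tool you already use (dbt tests, SQLMesh audits, Great Expectations, Soda, and so on).

```python
# A minimal sketch of process-level checks on a daily extract.
# Column names and thresholds are illustrative, not prescriptive.
import pandas as pd

def check_daily_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures for a (hypothetical) daily orders extract."""
    failures = []

    # Far too few (or too many) records usually signals a broken upstream
    # process, not a handful of bad individual records.
    if not 10_000 <= len(df) <= 1_000_000:
        failures.append(f"record count out of expected range: {len(df)}")

    # The key column should be unique; duplicates often point to a bad join
    # or a re-run upstream job.
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values detected")

    # An unexpected enumerated value suggests a schema or process change.
    known_segments = {"enterprise", "mid_market", "smb"}
    unknown = set(df["market_segment"].dropna().unique()) - known_segments
    if unknown:
        failures.append(f"unexpected market_segment values: {unknown}")

    # Min/max bounds catch rogue backfills or clock issues
    # (assumes order_date is already a datetime column).
    if df["order_date"].min() < pd.Timestamp("2020-01-01"):
        failures.append("order_date earlier than expected history window")

    return failures
```

The point is that every check reads a cheap aggregate rather than inspecting individual rows, which keeps the tests fast and the alerts focused on process failures.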
On the other hand, Type 3 issues require more complex systems and decisions. Do you move bad records to a dead-letter queue and send an alert for manual remediation? Do you build a self-healing process for well-understood data quality issues? Do you simply modify the record in some way to indicate the data quality issue so...
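For completeness, here is a minimal sketch of the dead-letter pattern mentioned above. The row-level rule, output path, and alerting call are placeholders; a production version would write to a proper dead-letter table or queue and notify the owning team through your alerting stack.

```python
# Minimal sketch of a dead-letter split for record-level (Type 3) issues.
# The validation rule, path, and alert are illustrative placeholders.
import pandas as pd

def split_bad_records(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Separate records that fail a row-level rule from those that pass."""
    # Example rule: financial amounts must be present and non-negative.
    is_valid = df["amount"].notna() & (df["amount"] >= 0)
    return df[is_valid], df[~is_valid]

def process_batch(df: pd.DataFrame) -> pd.DataFrame:
    good, bad = split_bad_records(df)
    if not bad.empty:
        # Persist failures for manual (or automated) remediation,
        # then alert whoever owns the source process.
        bad.to_parquet("dead_letter/orders_bad_records.parquet")
        print(f"ALERT: {len(bad)} records routed to dead-letter storage")
    return good
```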