What if data pipelines were not only bespoke glue code? What if there were more building blocks and best practices to build on? Take a look at this template for a local data stack - batteries included. I hope it can serve as a building block to speed things up and help you build with higher quality and confidence. The template is based on #oss #data tools: #duckdb, #dagster, #dbt, #rust tooling for code quality, and #pixi for effortless dependency handling. https://buff.ly/4feldk6 Be advised: even though the stack runs locally, it can easily be scaled in the cloud of your choice on #k8s thanks to smart partition handling.
Just recently, #fivetran made a pricing change, and its transformation support via dbt-core is no longer free (https://www.reddit.com/r/dataengineering/comments/1gjoejj/fivetran_just_made_dbt_core_not_free/ https://fivetran.com/docs/transformations/troubleshooting). Using this template could save you money while solving critical data challenges.
If you want to handle even larger data (more than the memory of a single machine), see https://georgheiler.com/post/paas-as-implementation-detail for how to extend this template.