🎄 Day 4 of the DataExpert.io Bootcamp!
I took a break yesterday to enjoy a wonderful evening with my wife at our town’s Christmas tree lighting 🎅🌟—but I’m back on track today!
I wrapped up the Day 2 lecture and lab solo with no issues and learned some key data engineering concepts:
🔑 Idempotency:
What it is, what breaks it, and how a non-idempotent pipeline impacts everything downstream.
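Here's a minimal sketch of the idea in plain Python (my own toy example, not the bootcamp's code). The "table" is just a dict keyed by date, and the function names and the date are made up for illustration. Appending on every run duplicates data when a job is retried, while overwriting the partition gives the same result no matter how many times it runs.

```python
from collections import defaultdict


def load_append(table, day, rows):
    """Non-idempotent: every rerun appends the same rows again."""
    table[day].extend(rows)


def load_overwrite(table, day, rows):
    """Idempotent: a rerun replaces the day's partition with the same rows."""
    table[day] = list(rows)


rows = [{"user_id": 1, "clicks": 3}]  # hypothetical input for one day

appended = defaultdict(list)
overwritten = defaultdict(list)

# Simulate running the same day's load twice, e.g. after a retry or backfill.
for _ in range(2):
    load_append(appended, "2024-12-01", rows)
    load_overwrite(overwritten, "2024-12-01", rows)

print(len(appended["2024-12-01"]))     # 2: the rerun duplicated the rows
print(len(overwritten["2024-12-01"]))  # 1: same result however often it runs
```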
🔑 Slowly Changing Dimensions (SCDs):
Explored the different types of SCDs and two approaches to building pipelines to handle them:
Full data loads: Reprocess all the data on every run, which can be expensive because of window functions over large datasets.
Incremental loads: A more efficient approach that breaks data into CTEs (historical, unchanged, changed, and new records) and unions them into a single table. It’s more complex to build but significantly reduces compute cost—processing up to 20x less data!
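To make the incremental approach concrete, here's a rough sketch of the four buckets in plain Python instead of SQL CTEs. Everything here is an assumption for illustration: the record layout (key, value, start, end, is_current), the integer "dates", and the function name. It also assumes every current key shows up in today's snapshot, and fails loudly if that doesn't hold.

```python
def incremental_scd(last_scd, today, this_date):
    """Build the SCD table for this_date from the previous SCD table
    plus today's snapshot (a dict of key -> value)."""
    # Historical: records that were already closed. They never change.
    historical = [r for r in last_scd if not r["is_current"]]
    current = [r for r in last_scd if r["is_current"]]
    current_keys = {r["key"] for r in current}

    # Data integrity assumption: every current key appears in today's data.
    missing = current_keys - today.keys()
    if missing:
        raise ValueError(f"keys missing from today's snapshot: {sorted(missing)}")

    # Unchanged: same value as before, so just extend the end date.
    unchanged = [
        {**r, "end": this_date} for r in current if today[r["key"]] == r["value"]
    ]

    # Changed: close the old record and open a new current one.
    changed = []
    for r in current:
        if today[r["key"]] != r["value"]:
            changed.append({**r, "is_current": False})
            changed.append({"key": r["key"], "value": today[r["key"]],
                            "start": this_date, "end": this_date,
                            "is_current": True})

    # New: keys that have never been seen as current before.
    new = [
        {"key": k, "value": v, "start": this_date, "end": this_date,
         "is_current": True}
        for k, v in today.items() if k not in current_keys
    ]

    # The "union" step: stitch the four buckets into one table.
    return historical + unchanged + changed + new


last_scd = [
    {"key": "a", "value": "good", "start": 2020, "end": 2020, "is_current": False},
    {"key": "a", "value": "bad", "start": 2021, "end": 2021, "is_current": True},
    {"key": "b", "value": "good", "start": 2020, "end": 2021, "is_current": True},
]
today = {"a": "bad", "b": "star", "c": "good"}

result = incremental_scd(last_scd, today, 2022)
for row in result:
    print(row)
```

Only the current records and today's snapshot get compared; the closed historical rows are passed straight through, which is where the savings come from.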
I also learned about potential pitfalls of incremental pipelines, like silently filtering out rows when a compared column is null, and how to check that assumptions about data integrity hold true.
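The null pitfall is easy to reproduce with Python's built-in sqlite3 module. This is a toy example with made-up table and column names: a plain `<>` comparison evaluates to NULL when either side is NULL, so the row drops out of the "changed" set. SQLite's `IS NOT` compares null-safely; other engines typically spell this `IS DISTINCT FROM`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yesterday (id INTEGER, status TEXT);
    CREATE TABLE today (id INTEGER, status TEXT);
    INSERT INTO yesterday VALUES (1, 'active'), (2, NULL);
    INSERT INTO today VALUES (1, 'inactive'), (2, 'active');
""")

# Naive change detection: id 2 went from NULL to 'active', but
# 'active' <> NULL evaluates to NULL, so the row is filtered out.
naive = conn.execute("""
    SELECT t.id
    FROM today t JOIN yesterday y ON t.id = y.id
    WHERE t.status <> y.status
    ORDER BY t.id
""").fetchall()

# Null-safe change detection: IS NOT treats NULL as a comparable value.
null_safe = conn.execute("""
    SELECT t.id
    FROM today t JOIN yesterday y ON t.id = y.id
    WHERE t.status IS NOT y.status
    ORDER BY t.id
""").fetchall()

print(naive)      # [(1,)]
print(null_safe)  # [(1,), (2,)]
```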
Feeling confident after today’s progress and excited to dive into more labs!
📢 To everyone tackling similar challenges, what’s been your biggest takeaway so far in building pipelines?
Catch me Monday morning on Twitch for more streams: https://lnkd.in/ebWqpRXH
#DataEngineering #BootcampJourney #IncrementalLoads #SCD #TwitchStreaming #LearningTogether