DAG integrity and coding conventions are critical. Join Austin Bennett from ChartBoost as he shares lessons on testing/verifying DAGs in GitHub workflows, unlocking efficiencies, and catching errors pre-deployment. Learn how Airflow & CI checks improve processes. #ApacheAirflow https://airflowsummit.org/
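For context, here is a minimal sketch of the kind of pre-deployment check the talk covers: a pytest test that loads every DAG with Airflow's DagBag so a CI job (for example a GitHub Actions step running pytest) fails before broken DAGs are deployed. The dags/ folder path and the test name are assumptions for illustration.

# Hedged sketch of a CI DAG-integrity test; adjust dag_folder to your repo layout.
from airflow.models import DagBag

def test_dags_import_cleanly():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # import_errors maps each failing file to the exception raised while parsing it
    assert dag_bag.import_errors == {}, dag_bag.import_errors
    # make sure at least one DAG was actually discovered
    assert len(dag_bag.dags) > 0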
More Relevant Posts
-
I've been working on a project with Airflow, and my API call task started failing due to rate limiting. Next thing you know, I'm writing retry logic with exponential backoff. It's always a great feeling to write simple, powerful code. Not to mention those green check marks are basically heroin. #dataengineering
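For anyone curious, a rough sketch of that pattern; the requests call, the 429 check, and the retry counts are illustrative assumptions rather than the actual project code.

import random
import time

import requests  # assumed HTTP client; any API call slots in the same way

def call_with_backoff(url, max_retries=5, base_delay=1.0):
    """Retry a rate-limited request with exponential backoff and jitter."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code != 429:      # not rate limited, so hand it back
            response.raise_for_status()
            return response
        # Wait 1s, 2s, 4s, ... plus a little jitter before the next attempt.
        time.sleep(base_delay * (2 ** attempt) + random.random())
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")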
-
Building models that stand the test of time. ⏳ With this TimestampMixin, each database entry tells a story—from the moment it’s created to every modification along the way. By embedding automated created_at and updated_at timestamps, we enable seamless data tracking and maintain a transparent history with minimal code. Such patterns aren’t just about code elegance—they’re the foundation of scalable, traceable systems. 💡 #DataIntegrity #PythonMagic #SQLAlchemy #WebDev #CodingCraftsmanship #SoftwareEngineering #FullStackInnovation #Automation
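Since the mixin itself isn't shown in the post, here is a minimal sketch of what such a TimestampMixin might look like with SQLAlchemy's declarative mapping; the column options and the Article model are assumptions for illustration, not the author's exact code.

from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

def _utcnow():
    # single source of truth for "now", stored as timezone-aware UTC
    return datetime.now(timezone.utc)

class TimestampMixin:
    # set once when the row is inserted
    created_at = Column(DateTime(timezone=True), default=_utcnow, nullable=False)
    # refreshed automatically on every ORM UPDATE
    updated_at = Column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False)

class Article(Base, TimestampMixin):
    __tablename__ = "articles"
    id = Column(Integer, primary_key=True)

Any model that inherits the mixin gets both columns, with updated_at refreshed on every update issued through the ORM.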
-
Loved participating in this panel discussing #Streamlit! Check out our thoughts on how to take advantage of it as part of your Snowflake toolkit.
🔍 Curious about how DAS42 leverages #Streamlit in Snowflake? Our skilled architects and developers recently discussed how they have been crafting innovative tools and applications for years. From chatbots to data quality audits and no-code interfaces, we’re making data-driven solutions a reality. Dive into our latest blog to see how we maximize Streamlit’s potential!
-
🚀 Day 10/100 of #100DaysOfCode on LeetCode! Today's challenge was about implementing a Calendar Booking System using a map data structure. The objective was to avoid triple bookings when scheduling events.

Problem Breakdown:
Input: A pair of integers representing the start and end times of an event.
Output: A boolean indicating whether the event can be added without causing three overlapping events.

Key Concepts:
🔹 Map Data Structure: Used to track event starts (+1) and ends (-1) by time.
🔹 Interval Conflict Detection: By traversing the map, we ensure no moment in time has more than two overlapping events.
🔹 Optimization: The ordered map keeps the time boundaries sorted, so the sweep is straightforward.

Code Structure:
Insert: Add the start and end times to the map, adjusting the counters.
Traverse: Check the cumulative event overlap by iterating through the time boundaries.

Time Complexity (TC): O(log n) per insertion, plus O(n) to sweep the map on each booking.
Space Complexity (SC): O(n) to store the event boundaries.

Learning about scheduling algorithms and how to handle complex intervals was super fun today! 💻🔥 #LeetCode #CodingJourney #ProblemSolving #100DaysOfCode #DataStructures #Algorithms #Scheduling #Consistency
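A compact Python sketch of the same idea (not the original solution); Python has no built-in sorted map, so this version sorts the keys during the sweep, which keeps the logic identical even if the constant factors differ.

class MyCalendarTwo:
    def __init__(self):
        self.delta = {}  # time -> net change in concurrent events at that time

    def book(self, start, end):
        # Tentatively record the event: +1 at its start, -1 at its end.
        self.delta[start] = self.delta.get(start, 0) + 1
        self.delta[end] = self.delta.get(end, 0) - 1

        # Sweep the boundaries in time order, tracking concurrent events.
        active = 0
        for t in sorted(self.delta):
            active += self.delta[t]
            if active >= 3:              # this booking would create a triple overlap
                self.delta[start] -= 1   # roll back the tentative insert
                self.delta[end] += 1
                return False
        return True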
-
🎯 Day 1 of My 180-Day Challenge! 🎯

Kicking off this journey with the basics of Data Structures and Algorithms (DSA)! Today, I focused on understanding how to write flowcharts and pseudocode, essential tools for visualizing and structuring problem-solving processes.

🌟 What I Learned:
* Flowcharts – Mapping out the logical steps of an algorithm visually, making complex problems easier to tackle.
* Pseudocode – Writing algorithmic steps in plain language, which bridges the gap between the concept and actual code.

These foundations are crucial for solving complex problems efficiently and clearly. 💡 Excited for what's next in this challenge! Let's keep pushing forward. 🚀

#180DayChallenge #Day1 #DSA #Flowcharts #Pseudocode #ProblemSolving #TechJourney #LearningInPublic #LinkedIn
-
🚀 Day 9/100 of #100DaysOfCode on LeetCode! Today's challenge was about solving an Interval Booking Problem using event counting with a map data structure. The goal was to book time slots without any overlapping events.

Problem Breakdown:
Input: The start and end times of an event.
Output: A boolean indicating whether the event can be booked without overlapping any previously booked event.

Key Concepts:
🔹 Map Data Structure: Storing events as key-value pairs, where the key is the time and the value tracks event starts (+1) and ends (-1).
🔹 Interval Handling: By traversing the map, we sum up the event counts to check for overlapping intervals.
🔹 Optimization: Using an ordered map keeps the time boundaries sorted, so the overlap check is a single pass.

Example:
Input: book(10, 20), book(15, 25), book(20, 30)
Output: true, false, true (the second event overlaps with the first).

Code Structure:
Insert: Add the start and end times to the map, incrementing or decrementing the event counters.
Traverse: Accumulate the counters to find the maximum number of concurrent events and detect overlaps.

Time Complexity (TC): O(log n) per insertion, plus O(n) to traverse the map on each booking, where n is the number of events.
Space Complexity (SC): O(n), for storing the event boundaries in the map.

Excited to explore more about interval scheduling and event handling techniques! 💻🔥 #LeetCode #CodingJourney #ProblemSolving #100DaysOfCode #DataStructures #Algorithms #EventHandling #Consistency
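Here is a hedged Python sketch of that counting approach (an ordinary dict plus sorting stands in for the ordered map); the rejection condition is simply "more than one active event at any moment".

class MyCalendar:
    def __init__(self):
        self.delta = {}  # time -> +1 for an event start, -1 for an event end

    def book(self, start, end):
        self.delta[start] = self.delta.get(start, 0) + 1
        self.delta[end] = self.delta.get(end, 0) - 1
        active = 0
        for t in sorted(self.delta):     # sweep boundaries in time order
            active += self.delta[t]
            if active > 1:               # double booking: reject and roll back
                self.delta[start] -= 1
                self.delta[end] += 1
                return False
        return True

With the post's example, book(10, 20) and book(20, 30) succeed while book(15, 25) is rejected, matching the true, false, true output.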
-
3 Surprising Use-cases for Branching in Airflow you’ve not seen before

Your Data Pipelines can have as many branches as this nice tree. Photo by Andrew Svk on Unsplash

Branching conditionality is an important feature of many DAGs.

Introduction

How often is it that you’re writing a Data Pipeline and then you wish you could do something conditionally? Something that only happens if a set of conditions is satisfied? Hopefully, not that often! Airflow has supported this type of functionality via the BranchPythonOperator, and many other workflow orchestration tools have followed suit: Prefect has Conditional Flows, Dagster has DynamicOutput, and in Orchestra we facilitate branching based on status.

This leads us to the most important question: why? Why bother at all with branching, thereby making your pipeline more complicated than it needs to be? We’ll see there are actually some pretty incredible use-cases, especially for folks who are looking for a greater amount of automation in their lives.

A quick example of Branching in Airflow

Before diving into use-cases, we’ll use the code below as a reference so we can understand how branching works in practice.

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def choose_branch(**kwargs):
    # Returns the task_id of the branch Airflow should follow.
    value = kwargs['ti'].xcom_pull(task_ids='check_value')
    if value > 10:
        return 'path_a'
    else:
        return 'path_b'


default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
    'retries': 1,
}

dag = DAG('example_branching', default_args=default_args, schedule_interval='@daily')

start = DummyOperator(task_id='start', dag=dag)

check_value = PythonOperator(
    task_id='check_value',
    python_callable=lambda: 15,  # Example condition value
    dag=dag,
)

branch_task = BranchPythonOperator(
    task_id='branch_task',
    provide_context=True,
    python_callable=choose_branch,
    dag=dag,
)

path_a = DummyOperator(task_id='path_a', dag=dag)
path_b = DummyOperator(task_id='path_b', dag=dag)
# Note: the unchosen path is skipped, so give 'end' a trigger rule such as
# 'none_failed_min_one_success' if it should run regardless of the branch taken.
end = DummyOperator(task_id='end', dag=dag)

start >> check_value >> branch_task >> [path_a, path_b] >> end

The choose_branch function returns a different value depending on a task value that is stored in an XCom (a temporary data store for tasks). The branch_task is a separate task that invokes a Python callable (in this case the choose_branch function). By specifying the variables path_a and path_b, and adding these as the possible outputs in array format to the branch_task, Airflow knows how to branch based on the branching logic.

Automating Model Training and Deployment

Branching is really powerful in the Machine Learning and Data Science world. Suppose you have a Machine Learning model that needs to be trained every week, because every week yo...
towardsdatascience.com
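To make the model-training use case concrete before the excerpt cuts off, here is a hypothetical branch task; the task ids, the evaluate_model upstream task, and the 0.8 accuracy threshold are illustrative assumptions, not taken from the article.

from airflow.operators.python import BranchPythonOperator  # same import as the example above

def decide_next_step(**kwargs):
    # Pull an evaluation metric produced by an (assumed) upstream task.
    accuracy = kwargs['ti'].xcom_pull(task_ids='evaluate_model')
    return 'deploy_model' if accuracy >= 0.8 else 'retrain_model'

branch_on_metric = BranchPythonOperator(
    task_id='branch_on_metric',
    python_callable=decide_next_step,
    dag=dag,  # reuses the dag object from the example above
)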
-
The labelling of a table column must be meaningful. It doesn't matter if a column name is longer than we would like; what matters here is not aesthetics but relevance, because another person will have to analyze the table's attributes. The same goes for function names, variables, modules, and more. Clear naming that represents the goal of a code block really makes the difference. We have to have the discipline to consistently deliver high-quality code, table naming, and even path naming for Docker containers in artifact registries. Finally, it helps train your eye for when we have to work deep in data quality screens.
-
🚀 Exciting Update: BootstrapRAG v0.0.5 is Now Live! 🚀

I'm thrilled to announce the release of BootstrapRAG version 0.0.5! 🎉 This update brings powerful new features to simplify and enhance your RAG and LlamaIndex workflows and deployments, making it easier to build advanced retrieval-augmented generation systems. Here's what's new:

🔥 Fully implemented LlamaIndex Workflows & LlamaDeploy
💡 LlamaWorkflows with GuidelineEvaluator and RetryGuidelineQueryEngine
💼 LlamaDeploy supports 3 major implementations: SimpleMessageQueue, KafkaMessageQueue, and RabbitMQMessageQueue
🌐 Expose RAG as an API with Swagger, ReDoc, and standard docs for seamless integration
🐛 Bug fixes and enhanced documentation for a smoother experience

GitHub repo: https://lnkd.in/gkU_2NiE

This version makes deploying RAG solutions faster and more efficient than ever. Special thanks to Qdrant and LlamaIndex, who provided feedback and support along the way! #GenAI #LLM #BootstrapRAG
-
When you build pipelines in scikit-learn, use 𝐦𝐚𝐤𝐞_𝐩𝐢𝐩𝐞𝐥𝐢𝐧𝐞 instead of the Pipeline class. The Pipeline class can get verbose for more complex pipelines because you have to name every step yourself. 𝐦𝐚𝐤𝐞_𝐩𝐢𝐩𝐞𝐥𝐢𝐧𝐞 names the steps for you, keeping your pipeline definition short and elegant.
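A quick side-by-side sketch of the two styles (the transformer and estimator choices are just examples):

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline: every step needs an explicit name.
pipe_verbose = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# make_pipeline: step names are generated from the class names.
pipe_short = make_pipeline(StandardScaler(), LogisticRegression())

Both objects behave identically; the generated names (standardscaler, logisticregression) still work with get_params and grid-search parameter grids.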