-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Account for change in UTC offset when calculating next schedule #35887
Conversation
e6edff5
to
44c371b
Compare
Hey this is actually works better than I expected, just one failing test that doesn’t seem to be related to folding. Maybe the actual logic isn’t that complicated. |
I did some additional debugging and it seems like the root issue is croniter does not take account Since we essentially treats croniter as a black box, I don’t think there is a way to detect this issue in Airflow, and every workaround would only result in another bug somewhere else. I believe the only viable fix would be to fix croniter, unfortunately. |
A minor plus is I found a bug in CPython while debugging this. python/cpython#112638 |
235d366
to
a29020a
Compare
I implemented a workaround for “common” cases that I can think of. The only logic I really changed is how we decide to apply the DST workaround—instead of the weird “fixed” concept, I peaked into the cron expression to figure out whether the DAG is run hourly or more frequentm and the workaround is applied if it is. This covers all the cron cases I can think of, and should not induce a backward incompatiblity (outside of fixing the logically incorrect behaviour as described in the original bug). |
I don't think it's worthwhile to break compatibility on folding.
And they all pass :)
a29020a
to
122b152
Compare
This looks similar to something I tried to do on the other PR but it didn't work in all cases. I think the example that broke was for both DST transitions, does '0 9 * * *' and '0 9,10 * * *' start at the same time (9am local) the day of the transition. |
There is a test case for |
Let me throw a few tests I have locally at this PR and see if anything fails |
Ok nvm! I think this is good, here is my test (adapted to Zurich): import pendulum
import datetime
from airflow.models.dag import DAG
from airflow.utils import timezone
from airflow.timetables.interval import CronDataIntervalTimetable
# Zurich (Switzerland) is chosen since it is +1/+2 DST, making it a bit easier
# to get my head around the mental timezone conversion.
# In 2023, DST entered on 26th Mar, 2am local clocks (1am UTC) were turned
# forward to 3am. DST exited on 29th Oct, 3am local clocks (1am UTC) were
# turned backward to 2am (making the 2:XX hour fold).
# Interval cron
cron_interval = CronDataIntervalTimetable("0 0,3 * * *", timezone=pendulum.timezone("Europe/Zurich"))
# Fixed cron
cron_fixed = CronDataIntervalTimetable("0 3 * * *", timezone=pendulum.timezone("Europe/Zurich"))
# ENTERING
# These two last executed at 00:00 today and 03:00 yesterday, respectively.
# Both of them should next execute at 3am local time today
last_run_interval = pendulum.datetime(2023, 3, 26, 0, tz="Europe/Zurich").in_timezone("UTC")
last_run_fixed = pendulum.datetime(2023, 3, 25, 3, tz="Europe/Zurich").in_timezone("UTC")
# these should be 2023-03-25T23:00:00+00:00 and 2023-03-24T23:00:00+00:00
expected_last_run_interval_utc_iso = "2023-03-25T23:00:00+00:00"
expected_last_run_fixed_utc_iso = "2023-03-25T02:00:00+00:00"
print(last_run_interval.isoformat(), last_run_interval.isoformat() == expected_last_run_interval_utc_iso)
print(last_run_fixed.isoformat(), last_run_fixed.isoformat() == expected_last_run_fixed_utc_iso)
# these should be 2023-03-26T00:00:00+01:00 and 2023-03-25T00:00:00+01:00
expected_last_run_interval_local_iso = "2023-03-26T00:00:00+01:00"
expected_last_run_fixed_local_iso = "2023-03-25T03:00:00+01:00"
print(last_run_interval.in_timezone("Europe/Zurich").isoformat(), last_run_interval.in_timezone("Europe/Zurich").isoformat() == expected_last_run_interval_local_iso)
print(last_run_fixed.in_timezone("Europe/Zurich").isoformat(), last_run_fixed.in_timezone("Europe/Zurich").isoformat() == expected_last_run_fixed_local_iso)
# Now do next run
next_run_interval = cron_interval._get_next(last_run_interval)
next_run_fixed = cron_fixed._get_next(last_run_fixed)
# these should be 2023-03-26T01:00:00+00:00
expected_next_run_interval_utc_iso = "2023-03-26T01:00:00+00:00"
expected_next_run_fixed_utc_iso = "2023-03-26T01:00:00+00:00"
print(next_run_interval.isoformat(), next_run_interval.isoformat() == expected_next_run_interval_utc_iso)
print(next_run_fixed.isoformat(), next_run_fixed.isoformat() == expected_next_run_fixed_utc_iso)
expected_next_run_interval_local_iso = "2023-03-26T03:00:00+02:00"
expected_next_run_fixed_local_iso = "2023-03-26T03:00:00+02:00"
print(next_run_interval.in_timezone("Europe/Zurich").isoformat(), next_run_interval.in_timezone("Europe/Zurich").isoformat() == expected_next_run_interval_local_iso)
print(next_run_fixed.in_timezone("Europe/Zurich").isoformat(), next_run_fixed.in_timezone("Europe/Zurich").isoformat() == expected_next_run_fixed_local_iso)
# LEAVING
# These two last executed at 00:00 today and 03:00 yesterday, respectively.
# Both of them should next execute at 3am local time today
last_run_interval = pendulum.datetime(2023, 10, 29, 0, tz="Europe/Zurich").in_timezone("UTC")
last_run_fixed = pendulum.datetime(2023, 10, 28, 3, tz="Europe/Zurich").in_timezone("UTC")
# these should be 2023-10-28T22:00:00+00:00 and 2023-10-28T01:00:00+00:00
expected_last_run_interval_utc_iso = "2023-10-28T22:00:00+00:00"
expected_last_run_fixed_utc_iso = "2023-10-28T01:00:00+00:00"
print(last_run_interval.isoformat(), last_run_interval.isoformat() == expected_last_run_interval_utc_iso)
print(last_run_fixed.isoformat(), last_run_fixed.isoformat() == expected_last_run_fixed_utc_iso)
# these should be 2023-10-29T00:00:00+02:00 and 2023-10-28T03:00:00+02:00
expected_last_run_interval_local_iso = "2023-10-29T00:00:00+02:00"
expected_last_run_fixed_local_iso = "2023-10-28T03:00:00+02:00"
print(last_run_interval.in_timezone("Europe/Zurich").isoformat(), last_run_interval.in_timezone("Europe/Zurich").isoformat() == expected_last_run_interval_local_iso)
print(last_run_fixed.in_timezone("Europe/Zurich").isoformat(), last_run_fixed.in_timezone("Europe/Zurich").isoformat() == expected_last_run_fixed_local_iso)
# Now do next run
next_run_interval = cron_interval._get_next(last_run_interval)
next_run_fixed = cron_fixed._get_next(last_run_fixed)
# these should be 2023-03-26T01:00:00+00:00
expected_next_run_interval_utc_iso = "2023-10-29T02:00:00+00:00"
expected_next_run_fixed_utc_iso = "2023-10-29T02:00:00+00:00"
print(next_run_interval.isoformat(), next_run_interval.isoformat() == expected_next_run_interval_utc_iso)
print(next_run_fixed.isoformat(), next_run_fixed.isoformat() == expected_next_run_fixed_utc_iso)
expected_next_run_interval_local_iso = "2023-10-29T03:00:00+01:00"
expected_next_run_fixed_local_iso = "2023-10-29T03:00:00+01:00"
print(next_run_interval.in_timezone("Europe/Zurich").isoformat(), next_run_interval.in_timezone("Europe/Zurich").isoformat() == expected_next_run_interval_local_iso)
print(next_run_fixed.in_timezone("Europe/Zurich").isoformat(), next_run_fixed.in_timezone("Europe/Zurich").isoformat() == expected_next_run_fixed_local_iso) Main:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just needs a news fragment, mine was here: https://github.com/timkpaine/airflow/blob/ebfbbfa82c625e9dc70994c92dcac4ccf59e6d3b/newsfragments/30083.significant.rst but probably can do better wording now that you've fully investigated the problems
|
||
from airflow.timetables._cron import CronMixin | ||
|
||
if not isinstance(self.timetable, CronMixin): | ||
return True | ||
|
||
from croniter import croniter | ||
|
||
cron = croniter(self.timetable._expression) | ||
next_a = cron.get_next(datetime.datetime) | ||
next_b = cron.get_next(datetime.datetime) | ||
return next_b.minute == next_a.minute and next_b.hour == next_a.hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not used anywhere, and should not be used anywhere, why make a change to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah because you remove _should_fix_dst
While this technically happens for all runs (in such a timezone), we only | ||
really care about runs that happen at least once an hour, and can | ||
provide a somewhat reasonable rationale to skip the fold hour for things | ||
such as ``*/2`` (every two hour). So we try to *minially* peak into croniter | ||
internals to work around the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this technically happens for all runs (in such a timezone), we only | |
really care about runs that happen at least once an hour, and can | |
provide a somewhat reasonable rationale to skip the fold hour for things | |
such as ``*/2`` (every two hour). So we try to *minially* peak into croniter | |
internals to work around the issue. | |
While this technically happens for all cron schedules (in such a timezone), we only | |
really care about schedules that run at least once an hour; for schedules that run | |
less frequently, such as ``*/2`` (every two hours), it seems reasonable to skip the | |
fold hour. So we try to *minially* peak into croniter internals to work around the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mainly this suggestion
(1) tries to clarify "happens for all runs" vs "happens for all cron schedules". A given run is just a single instant in time but this is more about the schedule, i.e. computing next run from this run
and (2) suggests to just state the rationale rather than saying "we can provide" but don't provide
Does this need a significant fragment? I’m assuming it does not since there’s no backward incompatibility. |
# Last run before DST. Interval starts and ends on 2am UTC (local time is +1). | ||
next_info = timetable.next_dagrun_info(last_automated_data_interval=None, restriction=restriction) | ||
assert next_info and next_info.data_interval == DataInterval( | ||
pendulum.datetime(2023, 3, 24, 2, tz=TIMEZONE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it strikes me that since this assumes TIMEZONE will be UTC (i think) we should just specify UTC here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me, fwiw, i think it would also be easier to reason about the tests if we compared in native UTC datetimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think all TIMEZONE in this file should be replaced by UTC. I did this for consistency so a later PR can just change them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
Ah damn I forgot to apply the docstring change. Will do a follow up PR. |
Co-authored-by: Daniel Standish <[email protected]> (cherry picked from commit c4549d7)
Doesn't make a difference to me but this does change how certain things are scheduled. Does a schedule being run differently count as significant? |
There's a bugfix newsfragment type I think. Isn't this a bugfix? Doesn't every bugfix imply some kind of behavior change? I guess what's the big change here in behavior? Previously when a dag with this kind of schedule crossed a boundary the dag would get stuck (and if catchup=True, stuck forever). That should no longer be the case. |
It changes when things will get scheduled, see my code above. So the same schedule will run at different times after upgrade |
Well, let's assess and articulate what the actual behavior change is. We'll need that anyway, however we categorize it. I took a look at your code and refactored a bit to better understand what the actual change is. Here's my version from __future__ import annotations
from datetime import datetime
import pendulum
from airflow.timetables.interval import CronDataIntervalTimetable
# Zurich (Switzerland) is chosen since it is +1/+2 DST, making it a bit easier
# to get my head around the mental timezone conversion.
# In 2023, DST entered on 26th Mar, 2am local clocks (1am UTC) were turned
# forward to 3am. DST exited on 29th Oct, 3am local clocks (1am UTC) were
# turned backward to 2am (making the 2:XX hour fold).
CET = "Europe/Zurich"
def to_local_iso(val: str):
return pendulum.parse(val).in_timezone(CET).isoformat()
def run_check(starting_run, timetable, exp_utc_iso):
print(f"when last run is\t\t\t{starting_run.in_timezone(CET)} / {starting_run}...")
exp_local = to_local_iso(exp_utc_iso)
print(f"then next run should be\t\t{exp_local} / {exp_utc_iso}")
next_run = timetable._get_next(starting_run)
if (dt := next_run.isoformat()) and dt == exp_utc_iso:
print("and it is!")
else:
print(f"but instead it is\t\t\t{to_local_iso(dt)} / {dt}")
delta = (pendulum.parse(dt) - pendulum.parse(exp_utc_iso)).in_minutes()
print(f"this is {abs(delta)} minutes {'ahead of' if delta < 0 else 'behind'} schedule")
print()
timetable_irregular = CronDataIntervalTimetable("0 0,3 * * *", timezone=pendulum.timezone(CET))
starting_run1 = pendulum.datetime(2023, 3, 26, 0, tz=CET).in_timezone("UTC")
run_check(
starting_run=starting_run1,
timetable=timetable_irregular,
exp_utc_iso="2023-03-26T01:00:00+00:00",
)
starting_run2 = pendulum.datetime(2023, 10, 29, 0, tz=CET).in_timezone("UTC")
expected_next_run = (
pendulum.instance(datetime(2023, 10, 29, 3, fold=1, tzinfo=pendulum.timezone(CET)))
.in_timezone("UTC")
.isoformat()
)
assert expected_next_run == "2023-10-29T02:00:00+00:00"
run_check(
starting_run=starting_run2,
timetable=timetable_irregular,
exp_utc_iso=expected_next_run, # runs on the second occurrence of 3am
) and i think this is what the change boils down to... before the change this is the output:
Caveat, this is all very confusing and hard to wrap one's head around but that said, let's discusse the two cases. In case 1, spring forward, the next run is supposed to at 3am (2 hours after the last run, since an hour is skipped) but instead the next run is calculated as 4am, which is simply the wrong time by any assessment. So one behavior change is, instead of running at 4am, it will run at 3am, the correct clock time. This to me seems like a bugfix. Side note: we might also want to verify the behavior for a In case 2, fall back, the next run is supposed to be the 3am. In actuality, this is an unambiguous time reference. At fall back, the clock only strikes 3am one time. I.e. after 2:59 am, it goes to 2:00 again -- not first to 3am and then back to 2am. Which is to say, the next run is at the wrong clock time again -- 2am instead of 3am. So again, to me this looks like a bugfix. The other thing to note is that we also observed this year that with fall back, the dag would actually get stuck (if catchup=True, then forever, or until next run is past transition otherwise). So, presuming this fixes that, it's a bugfix on another layer. Note, I'm not arguing to "win", just trying to further our own understandings of what this is and how best to represent this to users. Do I understand this correctly? Do you think I'm describing the details correctly? |
There is a broader conversation around "bug fix" vs "feature change" in the original ticket but yes on all parts and my stance on the original ticket is that it's bug fixes |
Restarting the effort in #30083 from scratch from some failing test cases we want to fulfill. Some changes in the other PR may be cherry-picked into here at some point.
Fix #7999 eventually.