-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compute generator saved locals on MIR #101692
Conversation
I don't currently have time to review a PR this large but I think this work is super exciting, thank you for tackling it ❤️ |
ee1c144
to
2941ec7
Compare
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 6c9ec2af2224dcbfba350ffdd69a27e5030000c1 with merge 8ea250bdb76176ecb0225baa34aa634b024ae0e2... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
Queued 8ea250bdb76176ecb0225baa34aa634b024ae0e2 with parent 0d56e34, future comparison URL. |
Finished benchmarking commit (8ea250bdb76176ecb0225baa34aa634b024ae0e2): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
@craterbot check |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
🎉 Experiment
|
36cbe4d
to
f2506c0
Compare
This comment has been minimized.
This comment has been minimized.
d39d4d0
to
b134e71
Compare
Finished benchmarking commit (6cd6bad): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
…iler-errors Only compute mir_generator_witnesses query in drop_tracking_mir mode. Attempt to fix the perf regression in rust-lang#101692 r? `@ghost`
Thanks! That seems sufficient to me, marking as triaged. |
#107443 has removed the remaining 1%. |
…li-obk Compute generator saved locals on MIR Generators are currently type-checked by introducing a `witness` type variable, which is unified with a `GeneratorWitness(captured types)` whose purpose is to ensure that the auto traits correctly migrate from the captured types to the `witness` type. This requires computing the captured types on HIR during type-checking, only to re-do it on MIR later. This PR proposes to drop the HIR-based computation, and only keep the MIR one. This is done in 3 steps. 1. During type-checking, the `witness` type variable is never unified. This allows to stall all the obligations that depend on it until the end of type-checking. Then, the stalled obligations are marked as successful, and saved into the typeck results for later verification. 2. At type-checking writeback, `witness` is replaced by `GeneratorWitnessMIR(def_id, substs)`. From this point on, all trait selection involving `GeneratorWitnessMIR` will fetch the MIR-computed locals, similar to what opaque types do. There is no lifetime to be preserved here: we consider all the lifetimes appearing in this witness type to be higher-ranked. 3. After borrowck, the stashed obligations are verified against the actually computed types, in the `check_generator_obligations` query. If any obligation was wrongly marked as fulfilled in step 1, it should be reported here. There are still many issues: - ~I am not too happy having to filter out some locals from the checked bounds, I think this is MIR building that introduces raw pointers polluting the analysis;~ solved by a check specific to static variables. - the diagnostics for captured types don't show where they are used/dropped; - I do not attempt to support chalk. cc `@eholk` `@jyn514` for the drop-tracking work r? `@oli-obk` as you warned me of potential unsoundness
(GeneratorWitnessMIR) Compute generator saved locals on MIR rust-lang/rust#101692 (ParamEnv) InstCombine away intrinsic validity assertions rust-lang/rust#105582 (Primitive::Pointer) abi: add AddressSpace field to Primitive::Pointer rust-lang/rust#107248
(GeneratorWitnessMIR) Compute generator saved locals on MIR rust-lang/rust#101692 (ParamEnv) InstCombine away intrinsic validity assertions rust-lang/rust#105582 (Primitive::Pointer) abi: add AddressSpace field to Primitive::Pointer rust-lang/rust#107248
Enable -Zdrop-tracking-mir by default This PR enables the `drop-tracking-mir` flag by default. This flag was initially implemented in rust-lang#101692. This flag computes auto-traits on generators based on their analysis MIR, instead of trying to compute on the HIR body. This removes the need for HIR-based drop-tracking, as we can now reuse the same code to compute generator witness types and to compute generator interior fields.
Enable -Zdrop-tracking-mir by default This PR enables the `drop-tracking-mir` flag by default. This flag was initially implemented in rust-lang#101692. This flag computes auto-traits on generators based on their analysis MIR, instead of trying to compute on the HIR body. This removes the need for HIR-based drop-tracking, as we can now reuse the same code to compute generator witness types and to compute generator interior fields.
Enable -Zdrop-tracking-mir by default This PR enables the `drop-tracking-mir` flag by default. This flag was initially implemented in rust-lang#101692. This flag computes auto-traits on generators based on their analysis MIR, instead of trying to compute on the HIR body. This removes the need for HIR-based drop-tracking, as we can now reuse the same code to compute generator witness types and to compute generator interior fields.
…try> Support async recursive calls (as long as they have indirection) Before rust-lang#101692, we stored coroutine witness types directly inside of the coroutine. That means that a coroutine could not contain itself (as a witness field) without creating a cycle in the type representation of the coroutine, which we detected with the `OpaqueTypeExpander`, which is used to detect cycles when expanding opaque types after that are inferred to contain themselves. After `-Zdrop-tracking-mir` was stabilized, we no longer store these generator witness fields directly, but instead behind a def-id based query. That means there is no technical obstacle in the compiler preventing coroutines from containing themselves per se, other than the fact that for a coroutine to have a non-infinite layout, it must contain itself wrapped in a layer of allocation indirection (like a `Box`). This means that it should be valid for this code to work: ``` async fn async_fibonacci(i: u32) -> u32 { if i == 0 || i == 1 { i } else { Box::pin(async_fibonacci(i - 1)).await + Box::pin(async_fibonacci(i - 2)).await } } ``` Whereas previously, you'd need to coerce the future to `Pin<Box<dyn Future<Output = ...>>` before `await`ing it, to prevent the async's desugared coroutine from containing itself across as await point. This PR does two things: 1. Remove the behavior from `OpaqueTypeExpander` where it intentionally fetches and walks through the coroutine's witness fields. This was kept around after `-Zdrop-tracking-mir` was stabilized so we would not be introducing new stable behavior, and to preserve the much better diagnostics of async recursion compared to a layout error. 2. Reworks the way we report layout errors having to do with coroutines, to make up for the diagnostic regressions introduced by (1.). We actually do even better now, pointing out the call sites of the recursion!
…try> Support async recursive calls (as long as they have indirection) Before rust-lang#101692, we stored coroutine witness types directly inside of the coroutine. That means that a coroutine could not contain itself (as a witness field) without creating a cycle in the type representation of the coroutine, which we detected with the `OpaqueTypeExpander`, which is used to detect cycles when expanding opaque types after that are inferred to contain themselves. After `-Zdrop-tracking-mir` was stabilized, we no longer store these generator witness fields directly, but instead behind a def-id based query. That means there is no technical obstacle in the compiler preventing coroutines from containing themselves per se, other than the fact that for a coroutine to have a non-infinite layout, it must contain itself wrapped in a layer of allocation indirection (like a `Box`). This means that it should be valid for this code to work: ``` async fn async_fibonacci(i: u32) -> u32 { if i == 0 || i == 1 { i } else { Box::pin(async_fibonacci(i - 1)).await + Box::pin(async_fibonacci(i - 2)).await } } ``` Whereas previously, you'd need to coerce the future to `Pin<Box<dyn Future<Output = ...>>` before `await`ing it, to prevent the async's desugared coroutine from containing itself across as await point. This PR does two things: 1. Remove the behavior from `OpaqueTypeExpander` where it intentionally fetches and walks through the coroutine's witness fields. This was kept around after `-Zdrop-tracking-mir` was stabilized so we would not be introducing new stable behavior, and to preserve the much better diagnostics of async recursion compared to a layout error. 2. Reworks the way we report layout errors having to do with coroutines, to make up for the diagnostic regressions introduced by (1.). We actually do even better now, pointing out the call sites of the recursion!
Support async recursive calls (as long as they have indirection) Before rust-lang#101692, we stored coroutine witness types directly inside of the coroutine. That means that a coroutine could not contain itself (as a witness field) without creating a cycle in the type representation of the coroutine, which we detected with the `OpaqueTypeExpander`, which is used to detect cycles when expanding opaque types after that are inferred to contain themselves. After `-Zdrop-tracking-mir` was stabilized, we no longer store these generator witness fields directly, but instead behind a def-id based query. That means there is no technical obstacle in the compiler preventing coroutines from containing themselves per se, other than the fact that for a coroutine to have a non-infinite layout, it must contain itself wrapped in a layer of allocation indirection (like a `Box`). This means that it should be valid for this code to work: ``` async fn async_fibonacci(i: u32) -> u32 { if i == 0 || i == 1 { i } else { Box::pin(async_fibonacci(i - 1)).await + Box::pin(async_fibonacci(i - 2)).await } } ``` Whereas previously, you'd need to coerce the future to `Pin<Box<dyn Future<Output = ...>>` before `await`ing it, to prevent the async's desugared coroutine from containing itself across as await point. This PR does two things: 1. Only report an error if an opaque expansion cycle is detected *not* through coroutine witness fields. * Instead, if we find an opaque cycle through coroutine witness fields, we compute the layout of the coroutine. If that results in a cycle error, we report it as a recursive async fn. 4. Reworks the way we report layout errors having to do with coroutines, to make up for the diagnostic regressions introduced by (1.). We actually do even better now, pointing out the call sites of the recursion!
Generators are currently type-checked by introducing a
witness
type variable, which is unified with aGeneratorWitness(captured types)
whose purpose is to ensure that the auto traits correctly migrate from the captured types to thewitness
type. This requires computing the captured types on HIR during type-checking, only to re-do it on MIR later.This PR proposes to drop the HIR-based computation, and only keep the MIR one. This is done in 3 steps.
witness
type variable is never unified. This allows to stall all the obligations that depend on it until the end of type-checking. Then, the stalled obligations are marked as successful, and saved into the typeck results for later verification.witness
is replaced byGeneratorWitnessMIR(def_id, substs)
. From this point on, all trait selection involvingGeneratorWitnessMIR
will fetch the MIR-computed locals, similar to what opaque types do. There is no lifetime to be preserved here: we consider all the lifetimes appearing in this witness type to be higher-ranked.check_generator_obligations
query. If any obligation was wrongly marked as fulfilled in step 1, it should be reported here.There are still many issues:
I am not too happy having to filter out some locals from the checked bounds, I think this is MIR building that introduces raw pointers polluting the analysis;solved by a check specific to static variables.cc @eholk @jyn514 for the drop-tracking work
r? @oli-obk as you warned me of potential unsoundness