
Compute generator saved locals on MIR #101692

Merged: 15 commits, merged Jan 28, 2023


@cjgillot (Contributor) commented Sep 11, 2022

Generators are currently type-checked by introducing a `witness` type variable, which is unified with a `GeneratorWitness(captured types)` whose purpose is to ensure that auto traits correctly propagate from the captured types to the `witness` type. This requires computing the captured types on HIR during type-checking, only to compute them again on MIR later.
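To make the auto-trait propagation concrete at the user level: an `async` block desugars into a generator, and whether the resulting future is `Send` depends on the types it keeps alive across suspension points. The sketch below is our own illustration rather than code from this PR; `tick`, `holds_arc_across_await`, and `is_send` are hypothetical helper names.

```rust
use std::future::Future;
use std::sync::Arc;

// A trivial async fn; awaiting it still introduces a suspension point in the
// generator that the surrounding `async` block desugars into.
async fn tick() {}

// The `+ Send` bound forces the compiler to prove `Send` for the future, which
// requires every local saved across `tick().await` to be `Send` as well.
fn holds_arc_across_await() -> impl Future<Output = usize> + Send {
    async {
        let shared = Arc::new(vec![1, 2, 3]); // Arc<Vec<i32>> is Send
        tick().await; // `shared` is live across this point, so it is saved
        shared.len()
    }
}

// Replacing `Arc` with `std::rc::Rc` above would make the saved local `!Send`,
// and the `+ Send` bound on the return type would then fail to hold.

fn is_send<T: Send>(_: &T) -> bool {
    true
}

fn main() {
    let fut = holds_arc_across_await();
    assert!(is_send(&fut));
}
```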

This PR proposes to drop the HIR-based computation, and only keep the MIR one. This is done in 3 steps.

  1. During type-checking, the `witness` type variable is never unified. This lets us stall every obligation that depends on it until the end of type-checking. The stalled obligations are then marked as successful and saved into the typeck results for later verification.
  2. At type-checking writeback, `witness` is replaced by `GeneratorWitnessMIR(def_id, substs)`. From this point on, all trait selection involving `GeneratorWitnessMIR` fetches the MIR-computed locals, similar to what opaque types do. There is no lifetime to preserve here: we consider all the lifetimes appearing in this witness type to be higher-ranked.
  3. After borrowck, the stashed obligations are verified against the actually computed types in the `check_generator_obligations` query. Any obligation that was wrongly marked as fulfilled in step 1 should be reported here.
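The three steps above can be modeled in a few lines. This is a deliberately simplified toy, not rustc code: `TypeckResults`, `MirWitnesses`, `SavedLocal`, and the `u32` generator ids are hypothetical stand-ins, and the model captures only one rule, namely that a witness implements an auto trait when every saved local does.

```rust
use std::collections::HashMap;

// Toy model only: every type here is a hypothetical stand-in, not rustc's.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AutoTrait {
    Send,
    Sync,
}

/// Hypothetical stand-in for a generator's `DefId`.
type GeneratorId = u32;

/// A local that the MIR analysis reports as live across a suspension point.
#[derive(Clone, Debug)]
struct SavedLocal {
    ty: &'static str,
    send: bool,
    sync: bool,
}

impl SavedLocal {
    fn implements(&self, t: AutoTrait) -> bool {
        match t {
            AutoTrait::Send => self.send,
            AutoTrait::Sync => self.sync,
        }
    }
}

/// Step 1: obligations on the never-unified witness are stalled, treated as
/// satisfied for now, and stashed in the type-check results.
#[derive(Default)]
struct TypeckResults {
    stalled: Vec<(GeneratorId, AutoTrait)>,
}

impl TypeckResults {
    fn stall(&mut self, generator: GeneratorId, t: AutoTrait) {
        self.stalled.push((generator, t));
    }
}

/// Step 2: after writeback the witness is only a key; its contents are looked
/// up from the MIR-computed saved locals, much like revealing an opaque type.
struct MirWitnesses {
    saved_locals: HashMap<GeneratorId, Vec<SavedLocal>>,
}

impl MirWitnesses {
    /// Saved locals that do not implement `t` (empty if the witness does).
    fn offending(&self, generator: GeneratorId, t: AutoTrait) -> Vec<&'static str> {
        self.saved_locals
            .get(&generator)
            .into_iter()
            .flatten()
            .filter(|local| !local.implements(t))
            .map(|local| local.ty)
            .collect()
    }
}

/// Step 3 (after borrowck): re-check every stashed obligation against the
/// real saved locals and return the ones that were wrongly assumed to hold.
fn check_generator_obligations(
    typeck: &TypeckResults,
    mir: &MirWitnesses,
) -> Vec<(GeneratorId, AutoTrait)> {
    typeck
        .stalled
        .iter()
        .copied()
        .filter(|&(g, t)| !mir.offending(g, t).is_empty())
        .collect()
}

fn main() {
    let mut typeck = TypeckResults::default();
    typeck.stall(0, AutoTrait::Send); // generator 0 only keeps a u32 alive
    typeck.stall(0, AutoTrait::Sync);
    typeck.stall(1, AutoTrait::Send); // generator 1 keeps an Rc alive across a yield

    let mut saved_locals = HashMap::new();
    saved_locals.insert(0, vec![SavedLocal { ty: "u32", send: true, sync: true }]);
    saved_locals.insert(1, vec![SavedLocal { ty: "Rc<u8>", send: false, sync: false }]);
    let mir = MirWitnesses { saved_locals };

    for (g, t) in check_generator_obligations(&typeck, &mir) {
        println!("generator {g}: `{t:?}` does not hold; offending locals: {:?}", mir.offending(g, t));
    }
}
```

The point the toy tries to capture is that step 1 never decides anything; it only records obligations, so the real answer can come from MIR after borrowck without a second, HIR-based computation.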

There are still many issues:

  • (Resolved by a check specific to static variables.) I was not too happy having to filter out some locals from the checked bounds; I think MIR building introduces raw pointers that pollute the analysis.
  • the diagnostics for captured types don't show where they are used/dropped;
  • I do not attempt to support chalk.

cc @eholk @jyn514 for the drop-tracking work
r? @oli-obk as you warned me of potential unsoundness

@rustbot added the T-compiler and T-rustdoc labels Sep 11, 2022
@rust-highfive added the S-waiting-on-review label Sep 11, 2022
@jyn514 (Member) commented Sep 11, 2022

I don't currently have time to review a PR this large but I think this work is super exciting, thank you for tackling it ❤️


@cjgillot (Contributor, Author)

@bors try @rust-timer queue

@rust-timer (Collaborator)

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot added the S-waiting-on-perf label Sep 11, 2022
@bors (Contributor) commented Sep 11, 2022

⌛ Trying commit 6c9ec2af2224dcbfba350ffdd69a27e5030000c1 with merge 8ea250bdb76176ecb0225baa34aa634b024ae0e2...


@bors (Contributor) commented Sep 11, 2022

☀️ Try build successful - checks-actions
Build commit: 8ea250bdb76176ecb0225baa34aa634b024ae0e2

@rust-timer (Collaborator)

Queued 8ea250bdb76176ecb0225baa34aa634b024ae0e2 with parent 0d56e34, future comparison URL.

@rust-timer (Collaborator)

Finished benchmarking commit (8ea250bdb76176ecb0225baa34aa634b024ae0e2): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 0.4% | [0.2%, 0.8%] | 42 |
| Regressions ❌ (secondary) | 1.1% | [0.3%, 2.8%] | 27 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -49.7% | [-86.8%, -0.9%] | 11 |
| All ❌✅ (primary) | 0.4% | [0.2%, 0.8%] | 42 |

Max RSS (memory usage)


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 1.2% | [0.6%, 1.6%] | 4 |
| Regressions ❌ (secondary) | 1.5% | [1.5%, 1.5%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.7% | [-2.6%, -0.7%] | 2 |
| All ❌✅ (primary) | 1.2% | [0.6%, 1.6%] | 4 |

Cycles


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.4%, 2.8%] | 3 |
| Improvements ✅ (primary) | -2.5% | [-2.5%, -2.5%] | 1 |
| Improvements ✅ (secondary) | -21.1% | [-76.4%, -1.6%] | 21 |
| All ❌✅ (primary) | -2.5% | [-2.5%, -2.5%] | 1 |

Footnotes

  1. the arithmetic mean of the percent change
  2. number of relevant changes

@rustbot added the perf-regression label and removed the S-waiting-on-perf label Sep 12, 2022
@cjgillot (Contributor, Author)

@craterbot check


@craterbot added the S-waiting-on-crater label and removed the S-waiting-on-review label Sep 12, 2022

@craterbot (Collaborator)

🎉 Experiment pr-101692 is completed!
📊 11235 regressed and 6 fixed (243276 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot added the S-waiting-on-review label and removed the S-waiting-on-crater label Sep 15, 2022
@cjgillot force-pushed the generator-lazy-witness branch 2 times, most recently from 36cbe4d to f2506c0, September 20, 2022 19:56

@rust-timer (Collaborator)

Finished benchmarking commit (6cd6bad): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 0.6% | [0.4%, 0.6%] | 14 |
| Regressions ❌ (secondary) | 2.2% | [0.3%, 6.0%] | 22 |
| Improvements ✅ (primary) | -0.4% | [-0.7%, -0.2%] | 12 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.7%, 0.6%] | 26 |

Max RSS (memory usage)


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 2.5% | [2.5%, 2.5%] | 1 |
| Regressions ❌ (secondary) | 2.3% | [2.3%, 2.3%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.4% | [-1.4%, -1.4%] | 1 |
| All ❌✅ (primary) | 2.5% | [2.5%, 2.5%] | 1 |

Cycles


This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.5% | [2.3%, 4.8%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

@cjgillot deleted the generator-lazy-witness branch January 28, 2023 10:54
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 29, 2023
…iler-errors

Only compute mir_generator_witnesses query in drop_tracking_mir mode.

Attempt to fix the perf regression in rust-lang#101692

r? `@ghost`
cjgillot added a commit to cjgillot/rust that referenced this pull request Jan 29, 2023
…ess, r=oli-obk"

This reverts commit 6cd6bad, reversing
changes made to 7d4df2d.
@cjgillot (Contributor, Author)

perf: I've managed to remove the 5% regression with #107406. What remains is a ~1% regression on some incr-unchanged and incr-patched benchmarks (measured in #107438).

@Mark-Simulacrum added the perf-regression-triaged label Jan 31, 2023
@Mark-Simulacrum (Member)

Thanks! That seems sufficient to me, marking as triaged.

@cjgillot (Contributor, Author)

#107443 has removed the remaining 1%.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 10, 2023
…li-obk

Compute generator saved locals on MIR

qinheping added a commit to qinheping/kani that referenced this pull request Mar 9, 2023
(GeneratorWitnessMIR)
Compute generator saved locals on MIR rust-lang/rust#101692

(ParamEnv)
InstCombine away intrinsic validity assertions rust-lang/rust#105582

(Primitive::Pointer) abi: add AddressSpace field to Primitive::Pointer rust-lang/rust#107248
qinheping added a commit to qinheping/kani that referenced this pull request Mar 9, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 21, 2023
Enable -Zdrop-tracking-mir by default

This PR enables the `drop-tracking-mir` flag by default. This flag was initially implemented in rust-lang#101692.

This flag computes auto-traits on generators based on their analysis MIR, instead of trying to compute them on the HIR body. This removes the need for HIR-based drop-tracking, as we can now reuse the same code to compute generator witness types and to compute generator interior fields.
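A minimal sketch of the difference this makes, assuming a compiler where `-Zdrop-tracking-mir` is the default behavior: the `Rc` below is moved into `drop` before the await, so the MIR liveness analysis does not treat it as saved across the suspension point, and the future can still be `Send`. The original scope-based HIR computation (without drop tracking) would have kept `rc` in the witness until the end of the block. `tick`, `drops_rc_before_await`, and `is_send` are hypothetical names.

```rust
use std::future::Future;
use std::rc::Rc;

async fn tick() {}

// `Rc` is `!Send`. Because `rc` is moved into `drop` (and never borrowed)
// before the await, it is neither live nor requires storage at the
// suspension point, so it is not part of the generator witness.
fn drops_rc_before_await() -> impl Future<Output = u32> + Send {
    async {
        let rc = Rc::new(7u32);
        drop(rc);
        tick().await;
        42
    }
}

fn is_send<T: Send>(_: &T) -> bool {
    true
}

fn main() {
    // The real check happens at compile time via the `+ Send` bound above.
    assert!(is_send(&drops_rc_before_await()));
}
```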
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 22, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 23, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 12, 2023
…try>

Support async recursive calls (as long as they have indirection)

Before rust-lang#101692, we stored coroutine witness types directly inside of the coroutine. That means that a coroutine could not contain itself (as a witness field) without creating a cycle in the type representation of the coroutine, which we detected with the `OpaqueTypeExpander`, the same machinery used to detect cycles when expanding opaque types that are inferred to contain themselves.

After `-Zdrop-tracking-mir` was stabilized, we no longer store these generator witness fields directly, but instead behind a def-id based query. That means there is no technical obstacle in the compiler preventing coroutines from containing themselves per se, other than the fact that for a coroutine to have a non-infinite layout, it must contain itself wrapped in a layer of allocation indirection (like a `Box`).
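The layout constraint is the same one that applies to ordinary recursive types. As an analogy only (an enum, not a coroutine), the hypothetical `Expr` type below compiles because each recursive field is boxed; without the `Box`, the compiler rejects it as a recursive type with infinite size (error E0072).

```rust
use std::mem::size_of;

// Analogy only: an ordinary recursive enum, not a coroutine. Without the
// `Box`es, `enum Expr { Lit(u32), Add(Expr, Expr) }` is rejected (E0072)
// because the type would have to contain itself inline.
enum Expr {
    Lit(u32),
    // Each recursive field is a pointer to a heap allocation, so the size of
    // `Expr` is fixed no matter how deep an expression tree gets.
    Add(Box<Expr>, Box<Expr>),
}

fn eval(expr: &Expr) -> u32 {
    match expr {
        Expr::Lit(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}

fn main() {
    // 2 + (3 + 4)
    let expr = Expr::Add(
        Box::new(Expr::Lit(2)),
        Box::new(Expr::Add(Box::new(Expr::Lit(3)), Box::new(Expr::Lit(4)))),
    );
    println!("value = {}, size_of::<Expr>() = {} bytes", eval(&expr), size_of::<Expr>());
}
```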

This means that it should be valid for this code to work:

```
async fn async_fibonacci(i: u32) -> u32 {
    if i == 0 || i == 1 {
        i
    } else {
        Box::pin(async_fibonacci(i - 1)).await
          + Box::pin(async_fibonacci(i - 2)).await
    }
}
```

Previously, you would need to coerce the future to `Pin<Box<dyn Future<Output = ...>>>` before `await`ing it, to prevent the async fn's desugared coroutine from containing itself across an await point.
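A self-contained sketch comparing the two styles, assuming a toolchain that accepts the recursive `async fn` (our understanding is that this became stable in Rust 1.77). `fibonacci_boxed` is a hypothetical name for the older `Pin<Box<dyn Future>>` workaround, and `block_on` is a minimal thread-parking executor written only so the example runs without external crates.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Direct recursion through `Box::pin`: the coroutine contains its own type,
// but only behind a heap allocation, so its layout stays finite.
async fn async_fibonacci(i: u32) -> u32 {
    if i == 0 || i == 1 {
        i
    } else {
        Box::pin(async_fibonacci(i - 1)).await + Box::pin(async_fibonacci(i - 2)).await
    }
}

// Older workaround (hypothetical name): a plain `fn` returning a type-erased
// boxed future, so the coroutine only ever holds `Pin<Box<dyn Future>>`.
fn fibonacci_boxed(i: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if i == 0 || i == 1 {
            i
        } else {
            fibonacci_boxed(i - 1).await + fibonacci_boxed(i - 2).await
        }
    })
}

// Minimal single-threaded executor: poll, and park the thread until woken.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    println!("{}", block_on(async_fibonacci(10)));
    println!("{}", block_on(fibonacci_boxed(10)));
}
```

Both calls print 55. In the boxed version the coroutine never names its own type, because the recursion goes through an erased `dyn Future`; in the direct version it does contain its own type, just behind a `Box`.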

This PR does two things:
1. Remove the behavior from `OpaqueTypeExpander` where it intentionally fetches and walks through the coroutine's witness fields. This was kept around after `-Zdrop-tracking-mir` was stabilized so we would not be introducing new stable behavior, and to preserve the much better diagnostics of async recursion compared to a layout error.
2. Reworks the way we report layout errors having to do with coroutines, to make up for the diagnostic regressions introduced by (1.). We actually do even better now, pointing out the call sites of the recursion!
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 13, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 9, 2024
Support async recursive calls (as long as they have indirection)


This PR does two things:
1. Only report an error if an opaque expansion cycle is detected *not* through coroutine witness fields.
    * Instead, if we find an opaque cycle through coroutine witness fields, we compute the layout of the coroutine. If that results in a cycle error, we report it as a recursive async fn.
2. Reworks the way we report layout errors having to do with coroutines, to make up for the diagnostic regressions introduced by (1.). We actually do even better now, pointing out the call sites of the recursion!
Labels: merged-by-bors, perf-regression, perf-regression-triaged, S-waiting-on-bors, T-compiler, T-rustdoc, WG-trait-system-refactor