Permit MIR inlining without #[inline] #109247

saethlin · 2023-03-17T02:20:26Z

I noticed that there are at least a handful of portable-simd functions that have no #[inline] but compile to an assign + return.

I locally benchmarked inlining thresholds between 0 and 50 in increments of 5, and 50 seems to be the best. Interesting. That didn't include check builds though, ~~maybe perf will have something to say about that~~.

Perf has little useful to say about this. We generally regress all the check builds, as best as I can tell, due to a number of small codegen changes in a particular hot function in the compiler. Probably this is because we've nudged the inlining outcomes all over, and uses of #[inline(always)]/#[inline(never)] might need to be adjusted.

saethlin · 2023-03-18T03:16:26Z

@bors try @rust-timer queue

bors · 2023-03-18T04:20:56Z

⌛ Trying commit f5f1cb189a638607152e449c02d83dfce67f9d83 with merge 2dffcafbed22d039709534ae1a5078d9202122ff...

bors · 2023-03-18T07:01:19Z

☀️ Try build successful - checks-actions
Build commit: 2dffcafbed22d039709534ae1a5078d9202122ff (2dffcafbed22d039709534ae1a5078d9202122ff)

rust-timer · 2023-03-18T10:53:08Z

Finished benchmarking commit (2dffcafbed22d039709534ae1a5078d9202122ff): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.4%, 25.7%]	75
Regressions ❌ (secondary)	1.1%	[0.2%, 4.2%]	46
Improvements ✅ (primary)	-2.0%	[-6.2%, -0.3%]	50
Improvements ✅ (secondary)	-8.1%	[-43.8%, -0.3%]	27
All ❌✅ (primary)	-0.1%	[-6.2%, 25.7%]	125

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[0.7%, 6.3%]	9
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.3%	[-10.9%, -0.4%]	16
Improvements ✅ (secondary)	-4.5%	[-7.3%, -2.4%]	4
All ❌✅ (primary)	-1.8%	[-10.9%, 6.3%]	25

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	13.3%	[0.7%, 26.0%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.1%	[-7.2%, -0.5%]	34
Improvements ✅ (secondary)	-10.4%	[-46.7%, -2.4%]	21
All ❌✅ (primary)	-2.2%	[-7.2%, 26.0%]	36

saethlin · 2023-03-18T16:31:21Z

The perf summary above looks more negative than the actual report page. The overall effect is dragged down a lot by the fact that this regresses a function in rustc_query_system::dep_graph. Going to look into this for ~1 day to see if I can easily prod codegen back to what it was before.

saethlin · 2023-03-18T17:26:08Z

@bors try @rust-timer queue

bors · 2023-03-18T17:26:16Z

⌛ Trying commit 9373b9301ae612f830885f227fadf2e77ad1b0e8 with merge 9615c8e881c7c418a670d586a2fe748104c298fe...

bors · 2023-03-18T20:06:36Z

☀️ Try build successful - checks-actions
Build commit: 9615c8e881c7c418a670d586a2fe748104c298fe (9615c8e881c7c418a670d586a2fe748104c298fe)

nnethercote · 2023-04-11T21:43:39Z

This looks ok to me. There are some concern about check builds regressing instruction counts, but for changes like this that affect lots of code, it's equally/more important to look at cycles and wall-times. Those improve for check builds. (Select "Show non-relevant results" for the broad view.)

oli-obk · 2023-04-16T13:33:40Z

tests/codegen/array-map.rs

-    // CHECK: sub <8 x i32> %[[A]], %[[B]]
+    // CHECK: sub <8 x i32> %[[B]], %[[A]]


uh, what happened here? Is this just a reordering of the loads but not a reordering of the actual subtraction?

r=me after making sure that this test doesn't get miscompiled

I don't know exactly what is going on here but the test isn't miscompiled. It's our aggressive abbreviations in this codegen test that make it look suspicious. Using -Zmir-opt-level=3 to simulate the effect of this PR because it lifts the inline attribute requirement: https://godbolt.org/z/EP7aTcz1x

Before:

define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 { %1 = load <8 x i32>, ptr %x, align 4, !dbg !11 %2 = load <8 x i32>, ptr %y, align 4, !dbg !11 %3 = sub <8 x i32> %1, %2, !dbg !12 store <8 x i32> %3, ptr %0, align 4, !dbg !62, !alias.scope !63, !noalias !66 ret void, !dbg !68 }

After:

define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 { %1 = load <8 x i32>, ptr %y, align 4, !dbg !11 %2 = load <8 x i32>, ptr %x, align 4, !dbg !18 %3 = sub <8 x i32> %2, %1, !dbg !19 store <8 x i32> %3, ptr %0, align 4, !dbg !65, !alias.scope !66, !noalias !69 ret void, !dbg !71 }

oli-obk · 2023-04-16T19:44:44Z

@bors r+

bors · 2023-04-16T19:44:45Z

📌 Commit 3541775 has been approved by oli-obk

It is now in the queue for this repository.

bors · 2023-04-17T02:36:41Z

⌛ Testing commit 3541775 with merge 5546cb6...

bors · 2023-04-17T04:42:59Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 5546cb6 to master...

rust-timer · 2023-04-17T05:59:32Z

Finished benchmarking commit (5546cb6): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.2%, 25.6%]	135
Regressions ❌ (secondary)	1.4%	[0.2%, 8.8%]	97
Improvements ✅ (primary)	-1.8%	[-5.3%, -0.3%]	52
Improvements ✅ (secondary)	-9.6%	[-43.2%, -1.2%]	21
All ❌✅ (primary)	0.2%	[-5.3%, 25.6%]	187

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.3%	[0.7%, 9.8%]	8
Regressions ❌ (secondary)	3.3%	[2.2%, 4.1%]	4
Improvements ✅ (primary)	-4.1%	[-6.0%, -1.9%]	11
Improvements ✅ (secondary)	-4.1%	[-6.0%, -2.2%]	2
All ❌✅ (primary)	-1.0%	[-6.0%, 9.8%]	19

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	14.0%	[1.2%, 26.9%]	2
Regressions ❌ (secondary)	3.0%	[2.3%, 4.1%]	9
Improvements ✅ (primary)	-2.8%	[-5.4%, -1.0%]	35
Improvements ✅ (secondary)	-10.9%	[-45.7%, -1.0%]	19
All ❌✅ (primary)	-1.9%	[-5.4%, 26.9%]	37

rylev · 2023-04-19T08:12:14Z

@saethlin @compiler-errors looks like perf may be worse than in the original perf run before merge. Of particular concern is that the cargo incr-patched: println test case has regressed significantly (42% increase in wall time). Is it perhaps worth taking another look at the performance impact of this?

cc @nnethercote

saethlin · 2023-04-19T13:24:01Z

That case was present before merge and has been discussed in this thread. Do you disagree with the statements made above?

… r=cjgillot Don't expect normalization to succeed in elaborate_drops Fixes rust-lang#110682 This was exposed through the changes in rust-lang#109247, which causes more things to be inlined. Inlining can happen before monomorphization, so we can't expect normalization to succeed. In the elaborate_drops analysis we currently have [this call](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L278) to `normalize_erasing_regions`, which ICEs when normalization fails. The types are used to infer [whether the type needs a drop](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L374), where `needs_drop` itself [uses `try_normalize_erasing_regions`](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_middle/src/ty/util.rs#L1121). ~[`instance_mir`](https://doc.rust-lang.org/stable/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.instance_mir) isn't explicit about whether it expects the instances corresponding to the `InstanceDef`s to be monomorphized (though I think in all other contexts the function is used post-monomorphization), so the use of `instance_mir` in inlining doesn't necessarily seem wrong to me.~

Add #[inline] to as_deref While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing). r? `@thomcc`

saethlin added the A-mir-opt Area: MIR optimizations label Mar 17, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 17, 2023

saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 17, 2023