Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM failed to use the knowledge from a never-overflow assumption #509

Open
dtcxzyw opened this issue Mar 5, 2024 · 13 comments
Open

LLVM failed to use the knowledge from a never-overflow assumption #509

dtcxzyw opened this issue Mar 5, 2024 · 13 comments

Comments

@dtcxzyw
Copy link

dtcxzyw commented Mar 5, 2024

Hi all, I am an LLVM developer working on the middle-end optimizations.

Recently I found an anti-pattern in some Rust applications from my benchmark. All of them come from hashbrown::raw::RawTableInner::free_buckets:

hashbrown/src/raw/mod.rs

Lines 3290 to 3294 in 274c7bb

// Avoid `Option::unwrap_or_else` because it bloats LLVM IR.
let (layout, ctrl_offset) = match table_layout.calculate_layout_for(self.buckets()) {
Some(lco) => lco,
None => unsafe { hint::unreachable_unchecked() },
};

LLVM IR:

define i32 @src(i32 %x, i32 %y) {
  %res = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %x, i32 %y)
  %ov = extractvalue { i32, i1 } %res, 1
  %nowrap = xor i1 %ov, true
  tail call void @llvm.assume(i1 %nowrap)
  %val = extractvalue { i32, i1 } %res, 0
  ret i32 %val
}
define i32 @tgt(i32 %x, i32 %y) {
  %res = mul nuw i32 %x, %y
  ret i32 %res
}

Alive2: https://alive2.llvm.org/ce/z/YBRjxc

Currently LLVM cannot use the knowledge from a never-overflow assumption. I have a draft patch for the fold and it enables more optimizations in downstream users of hashbrown. But I am not willing to do an application-specific optimization in LLVM. So I file this issue to see whether we can fix it in hashbrown.

cc @nikic @DianQK

@DianQK
Copy link
Member

DianQK commented Mar 5, 2024

Could you further explain why alive2 proves the correctness of the transformation, but we cannot transform it in LLVM?
BTW, we should probably find a reduce example to report at rust. But I can also do this later.

@dtcxzyw
Copy link
Author

dtcxzyw commented Mar 5, 2024

but we cannot transform it in LLVM?

It is just not implemented. But I am not willing to do this in LLVM. We should fix it in hashbrown or rustc.

@dtcxzyw
Copy link
Author

dtcxzyw commented Mar 5, 2024

An example from tokio-rs: https://godbolt.org/z/Gr55dfMc8

@nikic
Copy link

nikic commented Mar 5, 2024

Looks like the same as llvm/llvm-project#80637. As I noted there, this technically loses information, but I guess in practice it's a lot better to replace with mul nuw here. I wouldn't be opposed to doing that in LLVM, as this kind of pattern is probably somewhat common. It arises whenever you have uses of checked_* in some method, and then it gets used in multiple places, some of them checking the result, and the others doing something unwrap_unchecked style. It avoids the duplication of implementing it both with checked_* and unchecked_* arithmetic.

@dtcxzyw Do you have some example where the replacement enables follow-up optimization in practice? As far as I can tell your examples will be lowered to the same asm after assumes are dropped in CGP.

@dtcxzyw
Copy link
Author

dtcxzyw commented Mar 5, 2024

@nikic See the IR diff in https://github.com/dtcxzyw/llvm-opt-benchmark/pull/289/files:
bench/diesel-rs/optimized/1hskzwx2vflsavf7.ll: https://godbolt.org/z/8e6qd9Gjo
bench/image-rs/optimized/2mngkegtim1o10y3.ll: enables factorization: (%2505 + 1) * 392 + 15 --> %2505 * 392 + 407

@DianQK
Copy link
Member

DianQK commented Mar 6, 2024

It arises whenever you have uses of checked_* in some method, and then it gets used in multiple places, some of them checking the result, and the others doing something unwrap_unchecked style. It avoids the duplication of implementing it both with checked_* and unchecked_* arithmetic.

As @nikic said, we want to reuse calculate_layout_for.

hashbrown/src/raw/mod.rs

Lines 1671 to 1674 in 274c7bb

match Self::TABLE_LAYOUT.calculate_layout_for(self.table.buckets()) {
Some(lco) => lco,
None => unsafe { hint::unreachable_unchecked() },
};

hashbrown/src/raw/mod.rs

Lines 1745 to 1748 in 274c7bb

let (layout, ctrl_offset) = match table_layout.calculate_layout_for(buckets) {
Some(lco) => lco,
None => return Err(fallibility.capacity_overflow()),
};

@RalfJung
Copy link
Member

But I am not willing to do an application-specific optimization in LLVM. So I file this issue to see whether we can fix it in hashbrown.

So as argued above by others, this is not an application-specific pattern. It is a pretty natural pattern for Rust programs: one has a checked operation that can fail in a safe way, and then one turns that into the unchecked variant (with UB on failure) by doing unwrap_unchecked() (which expands to the match statement from the issue description).

I'm not sure how else you are suggesting that code should be written, but it'd clearly be bad if calculate_layout_for had to be written twice.

@workingjubilee
Copy link
Member

Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"

@workingjubilee
Copy link
Member

@dtcxzyw Most of the callers that eventually arrived in hashbrown::raw::RawTableInner::free_buckets had std::collections::HashMap::some_fn higher on the stack, right? (I don't know what your tooling is like as to whether it can make this kind of thing obvious)

@dtcxzyw
Copy link
Author

dtcxzyw commented Apr 19, 2024

@dtcxzyw Most of the callers that eventually arrived in hashbrown::raw::RawTableInner::free_buckets had std::collections::HashMap::some_fn higher on the stack, right? (I don't know what your tooling is like as to whether it can make this kind of thing obvious)

Yeah.

Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"

I believe there is a better way to fix this issue by adding some MIR optimizations or modifying hashbrown itself. It is tricky to handle it in LLVM :(
See llvm/llvm-project#84016.

@nikic
Copy link

nikic commented Apr 19, 2024

Even aside from the pattern overall being visible in Rust code beyond hashbrown, most of the code in hashbrown is transitively included in every single Rust program that uses a hashmap, so "application-specific" is an amusing way to describe "probably 20% of all Rust programs ever? higher, even?"

I think you're missing the point here. When it comes to optimization issues, in Rust we are in the very fortunate position that the issue can be addressed either by adding new optimizations, or by changing library code to be more amenable to optimization. The fact that we can change the hashbrown implementation and have ~all Rust code benefit is the point, not an analysis mistake.

<rant> This is contrary to the stupidity prevalent in C/C++, where an incredible amount of compiler engineering resources go into implementing completely ridiculous optimizations that target specific patterns in SPEC benchmarks, where the proper thing to do would have been to just change the benchmark code. </rant>

(This is just for context -- as said above, I'm not convinced that modifying hashbrown is indeed the best solution here.)

@scottmcm
Copy link
Member

It is tricky to handle it in LLVM :(

Are there any small tweaks that we could do to what we emit to make it easier?

For example, I see that the current u64::checked_mul looks like this

define { i64, i64 } @demo_checked_mul(i64 noundef %a, i64 noundef %b) unnamed_addr #0 !dbg !7 {
start:
  %0 = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b), !dbg !12
  %_6.1 = extractvalue { i64, i1 } %0, 1, !dbg !12
  %not._6.1 = xor i1 %_6.1, true, !dbg !25
  %. = zext i1 %not._6.1 to i64, !dbg !25
  %_6.0 = extractvalue { i64, i1 } %0, 0, !dbg !12
  %1 = insertvalue { i64, i64 } poison, i64 %., 0, !dbg !28
  %2 = insertvalue { i64, i64 } %1, i64 %_6.0, 1, !dbg !28
  ret { i64, i64 } %2, !dbg !28
}

which definitely can't be further optimized because there's nothing telling LLVM that second part of the pair will be unused.

Inspired by rust-lang/rust#124114 I started imagining something like

  %product_or_undef = select i1 %_6.1, i64 undef, i64 %_6.0
  %1 = insertvalue { i64, i64 } poison, i64 %., 0
  %2 = insertvalue { i64, i64 } %1, i64 %product_or_undef, 1

but it looks like that's not actually useful because it gets immediately optimized away again.

@scottmcm
Copy link
Member

scottmcm commented Apr 19, 2024

One thing you might try on the hashbrown side: thanks to llvm/llvm-project#56563, it seemed like using wider types for the isize::MAX check actually codegen'd well in rust-lang/rust#100866, just was a bit slower to compile.

Rather than doing all this math as checked in

hashbrown/src/raw/mod.rs

Lines 261 to 263 in 274c7bb

let ctrl_offset =
size.checked_mul(buckets)?.checked_add(ctrl_align - 1)? & !(ctrl_align - 1);
let len = ctrl_offset.checked_add(buckets + Group::WIDTH)?;

maybe you could try writing it in u128 where I think it can't overflow? With the later isize::MAX check it might optimize better than one would first guess. Certainly the first mul there is nuw nsw in i128.

Alternatively, maybe there'd be a way to shift by the log of the buckets, since it said it's always a power of two, instead of multiplying? It looks like the function doesn't do anything to tell LLVM that it's actually a power-of-two, so it probably has no way of being smart about understanding & !(ctrl_align - 1) either. (cc llvm/llvm-project#58996)

I wonder if making buckets be NonZeroUsize could do anything...


(Aside: I should push on https://doc.rust-lang.org/std/primitive.usize.html#method.carrying_mul again...)

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 23, 2024
…=<try>

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 23, 2024
…=<try>

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 24, 2024
…=Amanieu

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 25, 2024
…=Amanieu

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
tiif pushed a commit to tiif/miri that referenced this issue Jun 25, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue Jun 28, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Jul 11, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants