Make is_ascii_hexdigit branchless #103024

GKFX · 2022-10-13T22:11:06Z

Relevant issue #72895.

Use a bitwise or with 0x20 to make uppercase letters lowercase before comparing against the range 'a'..='f'. In a simple benchmark on my machine this is roughly 7x faster (222,629 vs 1,531,543 ns/iter).

#![feature(test)]

extern crate test;

const RAND: &'static [u8; 500000] = include_bytes!("random.dat");

#[bench]
fn is_ascii_hex_current(bench: &mut test::Bencher) {
    bench.iter(|| {
        let mut total = 0;
        for byte in RAND.iter() {
            if byte.is_ascii_hexdigit() { total += 1; }
        }
        total
    });
}

#[bench]
fn is_ascii_hex_bitor(bench: &mut test::Bencher) {
    bench.iter(|| {
        let mut total = 0;
        for byte in RAND.iter() {
            if matches!(*byte, b'0'..=b'9') || matches!(*byte | 0x20, b'a'..=b'f') { total += 1; }
        }
        total
    });
}

test is_ascii_hex_bitor   ... bench:     222,629 ns/iter (+/- 1,426)
test is_ascii_hex_current ... bench:   1,531,543 ns/iter (+/- 25,991)

The generated code can also be seen to be branch-free: https://rust.godbolt.org/z/T57hG18hc.

(For reasons which aren't clear to me, is_ascii_alphanumeric gets optimized better despite looking very similar, so I haven't made the corresponding change there.)

Bitwise-or with 0x20 before checking whether a character is in the range a-f avoids the need to also check the range A-F. This makes the generated code shorter and faster.
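A minimal sketch of why the trick is safe, assuming it is applied to `u8` as in the benchmark: OR-ing with 0x20 only sets bit 5, so the only bytes that land in `b'a'..=b'f'` are the lowercase letters themselves and their uppercase counterparts `b'A'..=b'F'`. The function name `is_hex_bitor` is hypothetical; the check compares it against the standard `u8::is_ascii_hexdigit` for every byte.

```rust
/// Hypothetical helper mirroring the PR's expression: digits are checked
/// directly, letters are folded to lowercase with `| 0x20` first.
fn is_hex_bitor(b: u8) -> bool {
    matches!(b, b'0'..=b'9') || matches!(b | 0x20, b'a'..=b'f')
}

fn main() {
    // Exhaustively compare against the standard library for all 256 byte values.
    for b in 0..=u8::MAX {
        assert_eq!(is_hex_bitor(b), b.is_ascii_hexdigit(), "mismatch at {b:#04x}");
    }
    // Only 'A'..='F' and 'a'..='f' fold into the lowercase range.
    let folded: Vec<u8> = (0..=u8::MAX)
        .filter(|&b| matches!(b | 0x20, b'a'..=b'f'))
        .collect();
    assert_eq!(folded, b"ABCDEFabcdef".to_vec());
    println!("ok");
}
```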

rustbot · 2022-10-13T22:11:13Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

rust-highfive · 2022-10-13T22:11:14Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

joshtriplett · 2022-10-14T11:15:34Z

library/core/src/char/methods.rs

@@ -1510,7 +1510,8 @@ impl char {
    #[rustc_const_stable(feature = "const_ascii_ctype_on_intrinsics", since = "1.47.0")]
    #[inline]
    pub const fn is_ascii_hexdigit(&self) -> bool {
-        matches!(*self, '0'..='9' | 'A'..='F' | 'a'..='f')
+        // Bitwise or can avoid need for branches in compiled code.
+        matches!(*self, '0'..='9') || matches!(*self as u32 | 0x20, 0x61..=0x66)


To make this more maintainable, how about using (b'a' as u32)..=(b'f' as u32) instead?

Unfortunately that would be a syntax error. I can't figure out a nice-looking way to use character literals here: RangeInclusive::contains isn't const, and ('a' as u32 <= *self as u32 | 0x20) && ('f' as u32 >= *self as u32 | 0x20) is very long.

You can still use that range in a pattern if you define it as a const item first.

One could make it slightly more concise:

let lower = *self as u32 | 0x20;
matches!(*self, '0'..='9') || (lower >= 'a' as u32 && lower <= 'f' as u32)

As an aside, there's a (currently private) const ASCII_CASE_MASK: u8 in core::num which might be preferable to a literal 0x20.

You can still use that range in a pattern if you define it as a const item first.

const LOWER_ASCII: RangeInclusive<u32> = ('a' as u32)..=('f' as u32);
matches!(*c, '0'..='9') || matches!(*c as u32 | 0x20, LOWER_ASCII)

doesn't seem to compile, though. I can get it to work with separate A and F const items, which is what I have done.
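For reference, a sketch of why the const-item approach works: range-pattern bounds must be literals or paths to constants of the scrutinee's type, so `u32` consts `A` and `F` are accepted. A `const` of type `RangeInclusive<u32>` in pattern position is instead an equality pattern of that type, so it cannot be matched against a `u32`. The free function `is_ascii_hexdigit_char` is a hypothetical stand-in for the `char` method, since inherent methods cannot be added to `char` outside `core`.

```rust
/// Hypothetical free-function version of the `char` method from this PR.
fn is_ascii_hexdigit_char(c: char) -> bool {
    // Range-pattern bounds may be paths to consts of the scrutinee's type (`u32` here).
    const A: u32 = 'a' as u32;
    const F: u32 = 'f' as u32;
    // `| 0x20` folds 'A'..='F' onto 'a'..='f' and leaves 'a'..='f' unchanged.
    matches!(c, '0'..='9') || matches!(c as u32 | 0x20, A..=F)
}

fn main() {
    // Compare with the standard library over every valid `char`.
    for c in (0..=0x10FFFFu32).filter_map(char::from_u32) {
        assert_eq!(is_ascii_hexdigit_char(c), c.is_ascii_hexdigit());
    }
    println!("ok");
}
```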

joshtriplett · 2022-10-14T11:15:50Z

@rustbot author

library/core/src/char/methods.rs

GKFX · 2022-10-15T10:39:36Z

@rustbot ready

clarfonthey · 2022-10-15T23:58:23Z

library/core/src/char/methods.rs

    pub const fn is_ascii_hexdigit(&self) -> bool {
-        matches!(*self, '0'..='9' | 'A'..='F' | 'a'..='f')
+        // Bitwise or converts A-Z to a-z, avoiding need for branches in compiled code.
+        const A: u32 = 'a' as u32;
+        const F: u32 = 'f' as u32;
+        matches!(*self, '0'..='9') || matches!(*self as u32 | 0x20, A..=F)


Does this end up optimising the same if you refactor it to just call the u8 method? Something like:

u8::try_from(*self).map_or(false, |c| c.is_ascii_hexdigit())

Mostly because it would be nice to be able to centralise the logic in one place, rather than duplicating it twice.

No, because it has a branch for the "is it < 256?" part: https://rust.godbolt.org/z/x13Yqx7Kx

That said, something like that might actually perform really well, because it'd probably branch predict well.

For example, imagine something like this:

if *self <= 'f' {
    let c = u8::try_from(*self).unwrap();
    c.is_ascii_hexdigit()
} else {
    false
}

Yes there's a jump, but it probably predicts great. Or at least as well as whatever jump you get from checking this condition in the first place...

As always, the complication in optimizing these is in determining the distribution of the input.
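A sketch of the two delegating variants discussed above, written as free functions on `char` (the names are hypothetical). Both reuse `u8::is_ascii_hexdigit`; the second guards with `c <= 'f'` first, since no hex digit is above 'f', which makes the `unwrap` infallible. Whether either is actually faster depends on the input distribution, as noted; this only checks that they agree with the standard `char::is_ascii_hexdigit`.

```rust
use std::convert::TryFrom;

/// Delegate via a fallible conversion; chars above U+00FF become `false`.
fn hex_via_try_from(c: char) -> bool {
    u8::try_from(c).map_or(false, |b| b.is_ascii_hexdigit())
}

/// Guard first: every hex digit is <= 'f', so the conversion cannot fail here.
fn hex_via_guard(c: char) -> bool {
    if c <= 'f' {
        let b = u8::try_from(c).unwrap();
        b.is_ascii_hexdigit()
    } else {
        false
    }
}

fn main() {
    for c in (0..=0x10FFFFu32).filter_map(char::from_u32) {
        let expected = c.is_ascii_hexdigit();
        assert_eq!(hex_via_try_from(c), expected);
        assert_eq!(hex_via_guard(c), expected);
    }
    println!("ok");
}
```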

Ah, I was hoping that LLVM would be smart enough to optimise that out (since every other path also involves comparing against some N < 256), but I guess not.

I filed an LLVM bug about this: llvm/llvm-project#60683

Even in the simpler case of is_ascii_digit it doesn't do as well as it ought to.

clarfonthey · 2022-10-16T00:17:16Z

library/core/src/num/mod.rs

@@ -688,7 +688,8 @@ impl u8 {
    #[rustc_const_stable(feature = "const_ascii_ctype_on_intrinsics", since = "1.47.0")]
    #[inline]
    pub const fn is_ascii_hexdigit(&self) -> bool {
-        matches!(*self, b'0'..=b'9' | b'A'..=b'F' | b'a'..=b'f')
+        // Bitwise or converts A-Z to a-z, avoiding need for branches in compiled code.


Minor nit, but it feels like the lack of branches still isn't clear to the passive viewer since '0'..='9' and 'a'..='f' are still two separate ranges with a gap in the middle.

Oh, and uh, that || is looking suspiciously not-branchless. Since it ends up being so in the resulting assembly, perhaps it could be reworded as |?

Reading the compiled output:

example::char_is_hex_2:
        mov     eax, dword ptr [rdi]
        lea     ecx, [rax - 48]
        cmp     ecx, 10
        setb    cl
        or      eax, 32
        add     eax, -97
        cmp     eax, 6
        setb    al
        or      al, cl
        ret

Which is effectively (with the subtractions wrapping):

(*self - 48 < 10) | ((*self | 0x20) - 97 < 6)
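To make the "effectively" line concrete: the subtractions in the assembly wrap, so anything below '0' (or below 'a' after folding) becomes a huge unsigned value and fails the `<` test. A sketch in Rust, assuming `u32` input as in the `char` method; the function name is hypothetical, and library code need not be written this way, since LLVM derives this form from plain range checks.

```rust
/// Hand-written form of the assembly above: one wrapping subtract and
/// compare per range, combined with non-short-circuiting `|`.
fn char_is_hex_by_hand(c: char) -> bool {
    let x = c as u32;
    let is_digit = x.wrapping_sub(48) < 10; // '0' = 48, ten digits
    let is_letter = (x | 0x20).wrapping_sub(97) < 6; // 'a' = 97, six letters
    is_digit | is_letter
}

fn main() {
    for c in (0..=0x10FFFFu32).filter_map(char::from_u32) {
        assert_eq!(char_is_hex_by_hand(c), c.is_ascii_hexdigit());
    }
    println!("ok");
}
```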

It's less that there are "fewer" branches in this code, but more that going from three ranges to two triggers a threshold in LLVM's side that makes it decide that branches are no longer worth it, and it removes them.

So... tying this all together, maybe the real point is not that it's branchless by itself, but that there are fewer computations overall and LLVM is likely to optimise it without branches as a result.

That seems reasonable, I can do one where all the optimization is done by hand to make it clear what the resulting assembly should be.

Sorry for pushing churn on you here, but I'm actually going to ask that you aim for a middle ground where the code is still as obvious as possible while still producing branchless assembly.

Particularly, there's no need to replace the range checks with wrapping_sub manually, as LLVM will do that quite reliably: https://rust.godbolt.org/z/rPaxbf1o7

It's normal in the library for the code not to be in the form of the expected assembly. For example, is_power_of_two is phrased using count_ones, not the well-known bitwise tricks:

rust/library/core/src/num/uint_macros.rs, lines 2132 to 2134 in 1ca6777:

pub const fn is_power_of_two(self) -> bool {
    self.count_ones() == 1
}

as that leaves it up to LLVM to generate the best assembly sequence (which will be (x != 0) && (x & (x-1)) == 0 on i386), but libcore doesn't need to know that.
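A quick check of the equivalence mentioned above, as a sketch over all `u16` values (the width is chosen only so the loop is exhaustive and fast; `is_pow2_trick` is a hypothetical name): `count_ones() == 1` agrees with the classic `x != 0 && x & (x - 1) == 0` trick and with the standard `is_power_of_two`.

```rust
/// The classic bit trick: a power of two has exactly one set bit,
/// so clearing the lowest set bit with x & (x - 1) leaves zero.
/// The `x != 0` check comes first so `x - 1` never underflows.
fn is_pow2_trick(x: u16) -> bool {
    x != 0 && (x & (x - 1)) == 0
}

fn main() {
    for x in 0..=u16::MAX {
        let by_popcount = x.count_ones() == 1;
        assert_eq!(by_popcount, is_pow2_trick(x));
        assert_eq!(by_popcount, x.is_power_of_two());
    }
    println!("ok");
}
```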

So please go back to the version with the consts you had in a previous iteration. And consider making a local for the lower-cased version of the character; that would help localize the comment. Maybe something like

// Bitwise or converts A-Z to a-z, allowing checking for the letters with a single range.
let lower = *self as u32 | num::ASCII_CASE_MASK as u32;
const A: u32 = 'a' as u32;
const F: u32 = 'f' as u32;
// Not using logical or because the branch isn't worth it
self.is_ascii_digit() | matches!(lower, A..=F)

You could also consider splitting the hex_letter part into a separate (non-pub) function so that is_ascii_hexdigit becomes just

// Not using logical or because the branch isn't worth it
self.is_ascii_digit() | self.is_ascii_hexletter()

to isolate the conversion-and-range stuff to its own thing.
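A sketch of that split, written as free functions on `char` because inherent methods cannot be added to `char` outside `core`; `is_ascii_hexletter` is the hypothetical non-`pub` helper named above. Each range gets its own `matches!`, combined with bitwise `|` rather than `||` or an or-pattern, which is the form this thread later reports LLVM compiling without branches.

```rust
/// Hypothetical private helper: is `c` one of 'A'..='F' or 'a'..='f'?
fn is_ascii_hexletter(c: char) -> bool {
    // Separate `matches!` joined by `|` (not an or-pattern, not `||`).
    matches!(c, 'A'..='F') | matches!(c, 'a'..='f')
}

/// Free-function stand-in for `char::is_ascii_hexdigit`.
fn is_ascii_hexdigit(c: char) -> bool {
    // Not using logical or because the branch isn't worth it.
    c.is_ascii_digit() | is_ascii_hexletter(c)
}

fn main() {
    for c in (0..=0x10FFFFu32).filter_map(char::from_u32) {
        assert_eq!(is_ascii_hexdigit(c), c.is_ascii_hexdigit());
    }
    assert!(is_ascii_hexletter('B') && !is_ascii_hexletter('G') && !is_ascii_hexletter('7'));
    println!("ok");
}
```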

@rustbot author

Actually, it looks like even the manual masking isn't necessary.

I was trying to make a repro to file an LLVM bug that it should figure out how to do that itself, and it turns out it already can. There's just something unfortunate about the matches! going on. Because just writing out the obvious comparisons does exactly what's needed: https://rust.godbolt.org/z/cPEoa1nT8

fn is_ascii_digit(c: u8) -> bool {
    c >= b'0' && c <= b'9'
}
fn is_ascii_hexletter(c: u8) -> bool {
    (c >= b'a' && c <= b'f') || (c >= b'A' && c <= b'F')
}
pub fn is_ascii_hexdigit(c: u8) -> bool {
    is_ascii_digit(c) || is_ascii_hexletter(c)
}

No branches, and does the subtraction and masking tricks:

define noundef zeroext i1 @_ZN7example17is_ascii_hexdigit17hc2736916760c6f12E(i8 %c) unnamed_addr #0 !dbg !6 {
  %0 = add i8 %c, -48, !dbg !11
  %1 = icmp ult i8 %0, 10, !dbg !11
  %2 = and i8 %c, -33, !dbg !14
  %3 = add i8 %2, -65, !dbg !14
  %4 = icmp ult i8 %3, 6, !dbg !14
  %.0 = select i1 %1, i1 true, i1 %4, !dbg !14
  ret i1 %.0, !dbg !15
}

(It picks & !32 instead of | 32, but it's the same thing: the -33 in the IR is !32.)
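The two masks are interchangeable here: clearing bit 5 (`& !0x20`, the `-33` in the IR) folds lowercase onto uppercase, while setting it folds uppercase onto lowercase. A small sketch checking that both single-range tests accept exactly the same bytes; the function names are hypothetical.

```rust
/// What LLVM emitted: clear bit 5, then test against 'A'..='F'.
fn hexletter_and_mask(c: u8) -> bool {
    (c & !0x20).wrapping_sub(b'A') < 6
}

/// What the PR wrote: set bit 5, then test against 'a'..='f'.
fn hexletter_or_mask(c: u8) -> bool {
    (c | 0x20).wrapping_sub(b'a') < 6
}

fn main() {
    // `-33` as an i8 has the same bit pattern as `!0x20` (0xDF).
    assert_eq!(-33i8 as u8, !0x20u8);
    for c in 0..=u8::MAX {
        assert_eq!(hexletter_and_mask(c), hexletter_or_mask(c));
        assert_eq!(hexletter_or_mask(c), matches!(c, b'A'..=b'F' | b'a'..=b'f'));
    }
    println!("ok");
}
```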

I might still say to use | there, though, to get or i1 %1, %4 instead of the select, even though the x86 backend appears to use an or anyway.

Now I guess I need to figure out what's going wrong in matches!...

Hmm, it's just something about the or pattern in matches!. This is also branchless: https://rust.godbolt.org/z/9xqT38rz9

fn is_ascii_digit(c: u8) -> bool {
    matches!(c, b'0'..=b'9')
}
fn is_ascii_hexletter(c: u8) -> bool {
    matches!(c, b'A'..=b'F') | matches!(c, b'a'..=b'f')
}
pub fn is_ascii_hexdigit(c: u8) -> bool {
    is_ascii_digit(c) | is_ascii_hexletter(c)
}

(And yup, it works for char too: https://rust.godbolt.org/z/WvfsjMG6a.)

So I think, in the end, the right answer here is to just replace some short-circuiting rust operations with non-short circuiting ones (the logical or and pattern or each with a bitwise or instead) and call it a day, as that gives exactly the desired output.

Loving the investigative work here. Perhaps this is an actual codegen issue, since matches!(x, P) | matches!(x, Q) should ideally generate code rather close to matches!(x, P | Q). It feels like more of a codegen quirk than an LLVM optimizer quirk, but I could be wrong.

I dug in some more, and it's actually more interesting than I expected!

It looks like with the or-pattern it's tripping a different technique in LLVM. https://rust.godbolt.org/z/zdW1T43T3

Starting with the nice obvious

pub fn is_ascii_hexletter(c: char) -> bool { matches!(c, 'A'..='F' | 'a'..='f') }

SimplifyCfg and SROA get it down to the nice simple short-circuiting

define noundef zeroext i1 @example::is_ascii_hexletter(i32 noundef %c) unnamed_addr {
start:
  %_4 = icmp ule i32 65, %c
  %_5 = icmp ule i32 %c, 70
  %or.cond = and i1 %_4, %_5
  br i1 %or.cond, label %bb5, label %bb2
bb2:                ; preds = %start
  %_2 = icmp ule i32 97, %c
  %_3 = icmp ule i32 %c, 102
  %or.cond1 = and i1 %_2, %_3
  br i1 %or.cond1, label %bb5, label %bb4
bb5:                ; preds = %bb2, %start
  br label %bb6
bb4:                ; preds = %bb2
  br label %bb6
bb6:                ; preds = %bb4, %bb5
  %.0 = phi i8 [ 1, %bb5 ], [ 0, %bb4 ]
  %0 = trunc i8 %.0 to i1
  ret i1 %0
}

and InstCombine does the "I know how to simplify range checks like that" rewrite:

define noundef zeroext i1 @example::is_ascii_hexletter(i32 noundef %c) unnamed_addr {
start:
  %0 = add i32 %c, -65
  %1 = icmp ult i32 %0, 6
  br i1 %1, label %bb5, label %bb2
bb2:                ; preds = %start
  %2 = add i32 %c, -97
  %3 = icmp ult i32 %2, 6
  br i1 %3, label %bb5, label %bb4
bb5:                ; preds = %bb2, %start
  br label %bb6
bb4:                ; preds = %bb2
  br label %bb6
bb6:                ; preds = %bb4, %bb5
  %.0 = phi i1 [ true, %bb5 ], [ false, %bb4 ]
  ret i1 %.0
}

But then something really interesting happens, and SimplifyCfg says "wait, that looks like a lookup table!", giving

define noundef zeroext i1 @example::is_ascii_hexletter(i32 noundef %c) unnamed_addr {
start:
  switch i32 %c, label %bb4 [
    i32 102, label %bb6
    i32 101, label %bb6
    i32 100, label %bb6
    i32 99, label %bb6
    i32 98, label %bb6
    i32 97, label %bb6
    i32 70, label %bb6
    i32 69, label %bb6
    i32 68, label %bb6
    i32 67, label %bb6
    i32 66, label %bb6
    i32 65, label %bb6
  ]
bb4:                ; preds = %start
  br label %bb6
bb6:                ; preds = %start, %start, %start, %start, %start, %start, %start, %start, %start, %start, %start, %start, %bb4
  %.0 = phi i1 [ false, %bb4 ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ], [ true, %start ]
  ret i1 %.0
}

Then later SimplifyCfg looks at that again and says "wait, that's a weird switch, how about I do that with a shift instead?", encoding the table into an i38 (because 'A' through 'f' is 38 values):

define noundef zeroext i1 @example::is_ascii_hexletter(i32 noundef %c) unnamed_addr {
start:
  %switch.tableidx = sub i32 %c, 65
  %0 = icmp ult i32 %switch.tableidx, 38
  %switch.cast = zext i32 %switch.tableidx to i38
  %switch.shiftamt = mul i38 %switch.cast, 1
  %switch.downshift = lshr i38 -4294967233, %switch.shiftamt
  %switch.masked = trunc i38 %switch.downshift to i1
  %.0 = select i1 %0, i1 %switch.masked, i1 false
  ret i1 %.0
}

which is also branchless. Just not quite as good a way.
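The i38 constant is a 38-bit membership table indexed by `c - 'A'`: bits 0 to 5 mark 'A'..='F' and bits 32 to 37 mark 'a'..='f' ('a' - 'A' = 32), and -4294967233 is 0x3F_0000_003F read as a signed 38-bit integer. A rough Rust equivalent, using a `u64` in place of LLVM's `i38` (an illustration, not code from the PR; the names are hypothetical). The `&&` here only keeps the shift amount in range and is not meant to reproduce the branchless `select`.

```rust
/// Bit i is set iff ('A' + i) is a hex letter: bits 0..=5 and 32..=37.
const HEX_LETTER_TABLE: u64 = 0x3F_0000_003F;

/// Table-lookup form of the final IR: range check plus shift-and-mask.
fn is_ascii_hexletter_table(c: char) -> bool {
    let idx = (c as u32).wrapping_sub('A' as u32);
    // `&&` short-circuits, so the shift amount is always < 38 here.
    idx < 38 && (HEX_LETTER_TABLE >> idx) & 1 == 1
}

fn main() {
    // The LLVM constant is this table read as a signed 38-bit integer.
    assert_eq!(HEX_LETTER_TABLE as i64 - (1i64 << 38), -4294967233);
    for c in (0..=0x10FFFFu32).filter_map(char::from_u32) {
        assert_eq!(is_ascii_hexletter_table(c), matches!(c, 'A'..='F' | 'a'..='f'));
    }
    println!("ok");
}
```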

This should compile to the same thing as the previous commit (at a suitable optimization level) but makes it very clear what is intended.

scottmcm

Sorry, I can see how this got lost in the long conversation, but based on my investigation in #103024 (comment) this can be written without needing to manually do the wrapping_sub checks -- LLVM will happily do that itself, given the opportunity.

So as described in that comment, I'd still like to see it written in a more obviously-correct rust way that still optimizes well, something like https://rust.godbolt.org/z/eoaq4c37e.

@rustbot author

the8472 · 2023-02-21T20:03:13Z

Since we're trying to please LLVM, should this have a codegen test?

Dylan-DPC · 2023-05-17T06:28:02Z

@GKFX any updates on this?

JohnCSimon · 2023-08-27T06:15:55Z

@GKFX

Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this.
Note: if you are going to continue please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

Refactor some `char`, `u8` ASCII functions to be branchless

Extract conditions in singular `matches!` with or-patterns to individual `matches!` statements which enables branchless code output. The following functions were changed:
- `is_ascii_alphanumeric`
- `is_ascii_hexdigit`
- `is_ascii_punctuation`

Added codegen tests

---

Continued from rust-lang#103024. Based on the comment from `@scottmcm` rust-lang#103024 (review).

The unmodified `is_ascii_*` functions didn't seem to benefit from extracting the conditions.

I've never written a codegen test before, but I tried to check that no branches were emitted.

Make is_ascii_hexdigit branchless

acb42cf

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 13, 2022

rust-highfive assigned Mark-Simulacrum Oct 13, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 13, 2022

Fix typo in range

5b624ac

joshtriplett reviewed Oct 14, 2022

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 14, 2022

scottmcm reviewed Oct 14, 2022

library/core/src/char/methods.rs (outdated)

Clarify comment and use character literals in char::is_ascii_hexdigit

f624897

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 15, 2022

clarfonthey reviewed Oct 15, 2022

clarfonthey reviewed Oct 16, 2022

Mark-Simulacrum assigned scottmcm and unassigned Mark-Simulacrum Oct 16, 2022

Fully optimize is_ascii_hexdigit by hand

8d1d17f

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 23, 2022

Dylan-DPC added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 13, 2022

scottmcm requested changes Jan 14, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2023

scottmcm mentioned this pull request Feb 23, 2023

&usize::<= is much slower than usize::<= #105259

Open

JohnCSimon closed this Aug 27, 2023

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Aug 27, 2023

okaneco mentioned this pull request Oct 27, 2023

Refactor some char, u8 ASCII functions to be branchless #117260

Merged

scottmcm mentioned this pull request Nov 22, 2023

Fix some clippy lints for library/std/src/sys/windows #118154

Merged

GKFX commented Oct 13, 2022

rustbot commented Oct 13, 2022

rust-highfive commented Oct 13, 2022

joshtriplett Oct 14, 2022

GKFX Oct 14, 2022

cuviper Oct 14, 2022

eggyal Oct 14, 2022

GKFX Oct 14, 2022

joshtriplett commented Oct 14, 2022

GKFX commented Oct 15, 2022

clarfonthey Oct 15, 2022

scottmcm Oct 16, 2022

clarfonthey Oct 16, 2022

scottmcm Feb 12, 2023

clarfonthey Oct 16, 2022

GKFX Oct 22, 2022

scottmcm Oct 23, 2022

scottmcm Oct 24, 2022

scottmcm Oct 24, 2022

clarfonthey Oct 24, 2022

scottmcm Oct 24, 2022

scottmcm left a comment

the8472 commented Feb 21, 2023

Dylan-DPC commented May 17, 2023

JohnCSimon commented Aug 27, 2023
