Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce XXH_SIZE_OPT and XXH_NO_STREAM #667

Merged
merged 3 commits into from
Jan 6, 2022
Merged

Conversation

easyaspi314
Copy link
Contributor

@easyaspi314 easyaspi314 commented Dec 23, 2021

  • XXH_SIZE_OPT is a value from 0-2 which indicates how much xxHash should care about code size, default 1 for -Os/-Oz, default 0 otherwise. It chooses a set of defaults.
    • ==1: conservative size optimizations (basically the same as what we were already doing with -Oz)
    • ==2: performance may cry
  • XXH_NO_STREAM disables the streaming API.
  • These two interact: if XXH_SIZE_OPT == 2 and XXH_NO_STREAM is not defined, XXH32 and XXH64 use the streaming API for single shot
    • TODO: apply this to XXH3 as well

 - XXH_SIZE_OPT is a value from 0-2 which indicates how much xxHash
   should care about code size, default 1 for -Os/-Oz, default 0
   otherwise
 - XXH_NO_STREAM disables the streaming API.
 - These two interact: if XXH_SIZE_OPT == 2 and XXH_NO_STREAM is not
 defined, XXH32 and XXH64 use the streaming API for single shot
    - TODO: apply this to XXH3 as well
@Cyan4973
Copy link
Owner

if XXH_SIZE_OPT == 2 and XXH_NO_STREAM is not defined, XXH32 and XXH64 use the streaming API for single shot

How much binary size is saved by this optimization ?

TODO: apply this to XXH3 as well

Do you want to do it as part of this PR ?

xxhash.h Outdated Show resolved Hide resolved
It seems that using the streaming API is not beneficial on GCC so it
isn't worth the complexity.
@easyaspi314
Copy link
Contributor Author

I checked it. While Clang seems to do very slightly better doing streamed single shot XXH3, GCC emits larger code. However that might be AArch64 Clang's identical code outliner copying the stack setup.

So, not worth the 40-50 lines to implement it.

XXH3 has a lot more code reuse than XXH32/64 so it isn't too surprising that this is the case. (Which is why I expect some more improvement once I clean up the modern XXH32/XXH64 refactor).

I also rerolled the midrange loops on >= 1.

@Cyan4973
Copy link
Owner

Cyan4973 commented Jan 6, 2022

While I believe that the objectives of the PR are fine,
I sometimes worry about the added maintenance complexity.

I see nothing "wrong" in this PR.
Just another set of build variables and #ifdef to maintain,
and while they only add a little bit of complexity,
there is a sense that little by little, as PR stack up, following the code is getting more difficult.

And unfortunately, I also have no simple counter proposal.
Sometimes, making the code more maintainable requires sweeping non-local changes for the sake of small complexity benefits.

So take it more as a "general feeling" when building future evolutions (not something specific to this PR).
I think code maintenance is a topic we must keep in mind and try to improve.

@Cyan4973 Cyan4973 merged commit b1a61df into Cyan4973:dev Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants