Customize full accumulating loop for SVE #756
Note: it seems that GitHub Actions tests have not even started on this PR? Edit: wait, this does not seem to be related to this PR in particular. The absence of GitHub Actions results predates it and is also visible in previous PRs. Let's investigate...
Ugh. I've checked the latest GitHub Actions log, and it reports an error in our workflow configuration: https://github.com/Cyan4973/xxHash/actions/runs/3426646597 It was introduced in 058a465, which basically follows #744 (comment). Edit: created a new issue for this problem: #757
Oh, that's too bad. I just copied the actions from Yannic.
Is there a quick fix for this?
The commands under `run: |` need to be indented further. I mean the following lines:

    - run: |
      mkdir -p /usr/local/share/.tipi
      # FIX: Hack for github action
      git config --global --add safe.directory /usr/local/share/.tipi
      git config --global --add safe.directory /__w/xxHash/xxHash/

should be fixed like this:

    - run: |
        mkdir -p /usr/local/share/.tipi
        # FIX: Hack for github action
        git config --global --add safe.directory /usr/local/share/.tipi
        git config --global --add safe.directory /__w/xxHash/xxHash/

Since I'm moving to my hometown, I can't edit the code.
Oh, two more spaces are needed. No problem. I'll submit a pull request right now.
Great debugging, @t-mat!
9379c2e to e400b9e
XXH3_accumulate() handles the whole accumulating loop, while the architecture-optimized code sits only in the mini loop of 512 bits (one 64-byte stripe). For large blocks of data, this split also causes frequent memory accesses. Now make XXH3_accumulate() itself architecture-optimized code.

Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Devin Hussey <[email protected]>
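The restructuring described in this commit message can be sketched in portable C. This is a simplified illustration under stated assumptions, not the code from the PR: the names `sketch_accumulate_512`, `sketch_accumulate_generic`, `sketch_accumulate_full` and the three constants are hypothetical, and the per-lane arithmetic is a simplified scalar round standing in for the SVE kernel. The first outer loop calls a per-stripe kernel that works on the accumulators in memory; the second loads the accumulators once, runs the whole loop on local copies, and stores them once at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIPE_LEN  64  /* bytes per stripe (512 bits); assumption for this sketch */
#define ACC_NB       8  /* eight 64-bit accumulator lanes */
#define SECRET_STEP  8  /* secret offset advance per stripe; assumption */

/* Unaligned-safe 64-bit load in native byte order. */
static uint64_t read64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Per-stripe kernel: reads and writes the accumulators through memory. */
static void sketch_accumulate_512(uint64_t *acc, const uint8_t *input,
                                  const uint8_t *secret)
{
    for (size_t i = 0; i < ACC_NB; i++) {
        uint64_t data = read64(input + 8 * i);
        uint64_t key  = data ^ read64(secret + 8 * i);
        acc[i ^ 1] += data;                          /* swap adjacent lanes */
        acc[i]     += (key & 0xFFFFFFFFu) * (key >> 32);
    }
}

/* Before: generic outer loop, only the per-stripe kernel is specialized. */
static void sketch_accumulate_generic(uint64_t *acc, const uint8_t *input,
                                      const uint8_t *secret, size_t nb_stripes)
{
    for (size_t n = 0; n < nb_stripes; n++)
        sketch_accumulate_512(acc, input + n * STRIPE_LEN,
                              secret + n * SECRET_STEP);
}

/* After: the whole loop is one unit; accumulators stay in locals
 * (vector registers in a real SIMD version) until the loop ends. */
static void sketch_accumulate_full(uint64_t *acc, const uint8_t *input,
                                   const uint8_t *secret, size_t nb_stripes)
{
    uint64_t local[ACC_NB];
    memcpy(local, acc, sizeof local);                /* load once */
    for (size_t n = 0; n < nb_stripes; n++) {
        const uint8_t *in  = input + n * STRIPE_LEN;
        const uint8_t *sec = secret + n * SECRET_STEP;
        for (size_t i = 0; i < ACC_NB; i++) {
            uint64_t data = read64(in + 8 * i);
            uint64_t key  = data ^ read64(sec + 8 * i);
            local[i ^ 1] += data;
            local[i]     += (key & 0xFFFFFFFFu) * (key >> 32);
        }
    }
    memcpy(acc, local, sizeof local);                /* store once */
}
```

Both functions compute the same accumulator values; only where the accumulators live during the loop differs. An optimizing compiler may already merge the two forms for scalar code, so the sketch shows the structure of the change rather than a measurable speedup on its own.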
With the optimized full accumulating loop, performance improves by roughly 2x to 4x. The accumulator (ACC) no longer needs to be saved to the stack inside the full loop, and SVE data-prefetch instructions are also used.

Without this patch, the benchmark results are as follows:

=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 1904, 2315, 2468, 2580, 2640, 2670, 2682, 2673, 2677, 2663, 2683, 2688, 2686, 2591, 2241, 2181, 2191, 2048, 2048
XXH32 , 1326, 1440, 1493, 1523, 1534, 1543, 1547, 1532, 1504, 1507, 1507, 1505, 1506, 1446, 1218, 1150, 1151, 1153, 1135
XXH64 , 2511, 2795, 2975, 3068, 3120, 3125, 3154, 3128, 3034, 3045, 3052, 3053, 3053, 2842, 2050, 1853, 1848, 1853, 1853
XXH128 , 1867, 2294, 2465, 2569, 2622, 2662, 2676, 2667, 2677, 2682, 2684, 2677, 2683, 2570, 2093, 2013, 2045, 2046, 2046

With this patch, the benchmark results are as follows:

=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 3681, 6007, 7803, 8954, 9875, 10411, 10703, 10505, 10670, 10794, 10812, 10804, 10205, 9923, 6279, 5927, 5967, 6022, 6062
XXH32 , 1281, 1434, 1494, 1523, 1534, 1543, 1547, 1535, 1500, 1502, 1502, 1502, 1501, 1443, 1242, 1169, 1193, 1196, 1195
XXH64 , 2497, 2801, 2961, 3074, 3092, 3136, 3155, 3123, 3031, 3037, 3040, 3037, 3033, 2847, 2102, 1955, 1967, 1974, 1971
XXH128 , 3419, 5798, 7488, 8854, 9787, 10357, 10673, 10468, 10647, 10748, 10785, 10751, 10805, 9698, 6011, 5677, 5999, 6065, 6074

Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Devin Hussey <[email protected]>
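To check the speedup claim against the figures in this commit message, the per-size ratio can be computed directly. The sketch below copies the two `xxh3` rows from the benchmark output; the helper names `min_speedup` and `max_speedup` and the constant `NB_SIZES` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NB_SIZES 19  /* input sizes log9 .. log27 */

/* xxh3 row without the patch, copied from the benchmark output. */
static const double xxh3_before[NB_SIZES] = {
    1904, 2315, 2468, 2580, 2640, 2670, 2682, 2673, 2677, 2663,
    2683, 2688, 2686, 2591, 2241, 2181, 2191, 2048, 2048
};

/* xxh3 row with the patch, copied from the benchmark output. */
static const double xxh3_after[NB_SIZES] = {
    3681, 6007, 7803, 8954, 9875, 10411, 10703, 10505, 10670, 10794,
    10812, 10804, 10205, 9923, 6279, 5927, 5967, 6022, 6062
};

/* Smallest after/before ratio over all input sizes. */
static double min_speedup(const double *before, const double *after, size_t n)
{
    assert(n > 0);
    double best = after[0] / before[0];
    for (size_t i = 1; i < n; i++) {
        double r = after[i] / before[i];
        if (r < best) best = r;
    }
    return best;
}

/* Largest after/before ratio over all input sizes. */
static double max_speedup(const double *before, const double *after, size_t n)
{
    assert(n > 0);
    double best = after[0] / before[0];
    for (size_t i = 1; i < n; i++) {
        double r = after[i] / before[i];
        if (r > best) best = r;
    }
    return best;
}
```

For the `xxh3` rows, the ratio ranges from about 1.93 at the 512-byte input to about 4.05 at mid-range sizes, so the "at least 2 times" figure holds from the second input size upward.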
OK. All checks have passed now.
How about this patch set? :)
Should I do anything for this pull request? Thanks
Sorry @hzhuang1, I just needed some available time to properly review the code change. I believe it's good, no modification requested.
Thanks a lot.
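With the patch, XXH3 and XXH128 throughput improves by roughly 2x to 4x across the benchmarked input sizes (slightly under 2x at the smallest, 512-byte input).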