xxHash: Update speed comparisons with crc32

Created on 24 Apr 2016  ·  6 Comments  ·  Source: Cyan4973/xxHash

If you use the crc32 instruction properly, available since Nehalem (SSE 4.2), you can achieve a throughput of 1.17 cycles per 8 bytes, which would be a theoretical performance of 20.5 GB/s on a 3 GHz processor under ideal conditions. Source: http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411?pgno=2
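
The arithmetic behind that figure can be checked with a short sketch. The variable names are illustrative; the only inputs are the 1.17 cycles per 8 bytes and the 3 GHz clock quoted above, and the result is a peak estimate, not a measurement.

```python
# Back-of-the-envelope check of the quoted peak throughput.
CYCLES_PER_BLOCK = 1.17   # cycles per 8-byte block, as quoted from the article
BYTES_PER_BLOCK = 8
CLOCK_HZ = 3.0e9          # assumed 3 GHz core clock

blocks_per_second = CLOCK_HZ / CYCLES_PER_BLOCK
bytes_per_second = blocks_per_second * BYTES_PER_BLOCK
gb_per_second = bytes_per_second / 1e9   # decimal gigabytes

print(f"{gb_per_second:.1f} GB/s")  # prints "20.5 GB/s"
```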

A little googling turns up this SO question, which quotes 20 GB/s throughput and matches the theoretical numbers very nicely: http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software

Could you make a little note that hardware crc32 is actually ~3x faster than xxhash? That's not to say it's a more suitable hash algorithm, but I wasted considerable time considering a vectorized xxhash vs crc32 for checksum purposes, before I realized I couldn't come close to crc32 in performance.

All 6 comments

The problem is that the benchmark was run on a Core 2 Duo @ 3 GHz.
This CPU doesn't support hardware crc32c.

Also, crc32 and crc32c are similar but different algorithms: you won't get the same results.
crc32 is widely used, crc32c much less so. Such naming confusion can cause non-trivial interoperability issues.
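
To make the difference concrete, here is a minimal bit-by-bit sketch in Python. The helper name `crc_reflected` is made up for this illustration; the two polynomial constants are the standard reflected forms for CRC-32 (as used by zlib) and CRC-32C (Castagnoli, the variant the SSE 4.2 instruction computes). It is far too slow for real use and only shows that the same input yields different checksums.

```python
import zlib

POLY_CRC32 = 0xEDB88320    # reflected polynomial of the classic CRC-32 (zlib, PNG)
POLY_CRC32C = 0x82F63B78   # reflected Castagnoli polynomial (CRC-32C)

def crc_reflected(data: bytes, poly: int) -> int:
    """Bitwise reflected 32-bit CRC; hypothetical helper, not a library API."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process one bit per iteration
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # standard final inversion

message = b"123456789"
crc32_value = crc_reflected(message, POLY_CRC32)
crc32c_value = crc_reflected(message, POLY_CRC32C)

print(hex(crc32_value))    # 0xcbf43926, same as zlib.crc32(message)
print(hex(crc32c_value))   # 0xe3069283
```

A peer that expects one variant will reject every checksum produced with the other, which is the interoperability risk mentioned above.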

The crc32 version benchmarked here is the one provided within the SMHasher test suite.
Faster versions exist, including vectorized ones.
Integrating them requires modifying the test suite.

If you can tolerate an Intel dependency for your application and can guarantee that all your client CPUs are recent enough (which is reasonable in 2016), you can use hardware crc32c; it is indeed very fast.

xxHash was created in a different context, using a CPU without this capability, and with the intended goal of maximum portability, well beyond Intel's realm (ARM, MIPS, POWER, etc.). Hence no reliance on brand-specific features.

@Cyan4973
As a really late follow-up, could you provide some insights on the new XXH3 as compared to crc32c?

Hardware crc32c by itself is not competitive. While it's certainly faster than software crc32, a single dependent chain of crc32c instructions cannot keep up with the instruction-level parallelism (ILP) that most modern hash algorithms exploit.

However, multiple crc32c channels in parallel can be more efficient. In that case, the exact outcome is implementation-dependent. Many implementations can be found on the Internet. I have found several that can beat XXH64 speed, but none yet that can beat XXH3. Maybe it's a matter of searching more.
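
As a rough illustration of what "multiple channels" means, the sketch below splits a buffer into two halves, computes an independent CRC-32C state for each, and merges them using the fact that the CRC update is linear over GF(2). All function names are hypothetical. The merge step here naively feeds zero bytes through the first lane's state, which costs as much as hashing; fast implementations replace that step with precomputed tables or carry-less multiplication, so this shows only the structure, not the speed.

```python
POLY_CRC32C = 0x82F63B78  # reflected Castagnoli polynomial

def update(state: int, data: bytes) -> int:
    """Advance a raw CRC-32C state over data (no initial or final inversion)."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY_CRC32C if state & 1 else 0)
    return state

def crc32c_single(data: bytes) -> int:
    """Reference: one sequential chain over the whole buffer."""
    return update(0xFFFFFFFF, data) ^ 0xFFFFFFFF

def crc32c_two_lanes(data: bytes) -> int:
    """Same checksum, computed as two independent lanes plus a merge."""
    half = len(data) // 2
    first, second = data[:half], data[half:]
    state_first = update(0xFFFFFFFF, first)   # lane 1: carries the initial value
    state_second = update(0, second)          # lane 2: independent, zero start state
    # Merge: advance lane 1 across len(second) zero bytes, then XOR the lanes.
    shifted_first = update(state_first, bytes(len(second)))
    return (shifted_first ^ state_second) ^ 0xFFFFFFFF

sample = bytes(range(256)) * 3 + b"x"
print(crc32c_two_lanes(sample) == crc32c_single(sample))  # True
```

On hardware, the two lanes map to separate crc32c instruction chains with no data dependency between them, which is what lets the CPU overlap their latencies.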

CRC32 and CRC32C can be very efficiently implemented using Intel's pclmulqdq or ARMv8 CLMUL instructions.
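
For readers unfamiliar with carry-less multiplication, here is a small Python sketch of the operation that pclmulqdq and the ARMv8 polynomial multiply (vmull_p64) perform in hardware on 64-bit operands. The function name `clmul` is just a label for this example; a full CRC built on top of it also needs folding constants, which are not shown here.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply two polynomials over GF(2).

    Works like schoolbook binary multiplication, except partial
    products are combined with XOR instead of addition, so no
    carries propagate between bit positions.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a   # XOR in the current shifted copy of a
        a <<= 1
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 = 0b101
print(clmul(0b11, 0b11))  # 5, whereas ordinary 3 * 3 is 9
```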

Some time ago I put together a couple of ARM implementations using CRC32 and CLMUL instructions, and their speeds hover around 4.1 GB/s on rk3399. Now I have compared them with xxh32 and xxh64 and got 3.5 GB/s and 2.5 GB/s respectively.

Is it expected that xxh64 is slower than xxh32 on ARMv8, or is there something wrong?

It's expected that xxh64 is slower than xxh32 on 32-bit binaries.
On 64-bit binaries this is less likely, but one can still imagine a chip with weak/slow 64-bit multiply instructions, much slower than the 32-bit ones, in which case it's possible.
Unfortunately, the ARMv8 family of chips is quite large, and each chip can feature very different performance trade-offs. So I would say this case has to happen somewhere, but I would not make it the rule.
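
The 32-bit case can be illustrated with a sketch of how a 64-bit multiply has to be assembled when only 32-bit multipliers are available. The function name and the sample constants are arbitrary choices for this example, and a real compiler's instruction sequence will differ; the point is only that one 64-bit multiply turns into three 32-bit multiplies plus additions and shifts.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul64_from_32bit_parts(a: int, b: int) -> int:
    """Low 64 bits of a * b, built only from 32x32-bit multiplies."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    low = a_lo * b_lo                              # multiply 1: full 64-bit result
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32   # multiplies 2 and 3: low 32 bits suffice
    # a_hi * b_hi would land entirely above bit 63, so it is never computed.
    return (low + (cross << 32)) & MASK64

x = 0x9E3779B185EBCA87   # arbitrary 64-bit example values
y = 0x0123456789ABCDEF
print(mul64_from_32bit_parts(x, y) == (x * y) & MASK64)  # True
```

A hash whose inner loop is dominated by 64-bit multiplies pays this cost on every step in a 32-bit build, while a 32-bit hash needs a single native multiply.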

For a faster 64-bit hash on ARM, you may be interested in trying the newer XXH3_64bit(), featured in the latest release.

Thanks!
Binaries are 64-bit.
XXH3_64bit definitely performs better. Without NEON vectorization I am getting 3.9 GB/s, and with NEON close to 4.0 GB/s. CRC32 using vmull_p64 still has slightly better throughput, though.
