xxHash: Uneven chunk sizes in an xxh3 stream compute the wrong hash

Created on 28 May 2020  ·  7 comments  ·  Source: Cyan4973/xxHash

I noticed that xxh3 (and xxh128) hashes come out differently when the data is fed in chunks of uneven size. I'm testing xxh3 for eventual inclusion in rsync (once it stabilizes), and when transfer compression is used, the chunk sizes on the receiving side can vary wildly, which makes the whole-file checksum come out wrong.

I created a simple test case while diagnosing the issue. The problem does not appear to depend on the data (though I supplied some in the test case) but rather on the unusual chunk sizes. Just run "make": it compiles the simple C program and processes the two test files through it (one with even chunks, one with uneven chunks).

streaming-bug.tar.gz

(Note: I updated the tar file to avoid an extra 0-length update call after each chunk. The input files add a trailing newline to each chunk's text, and tester.c was not discarding it, so the empty line produced a 0-byte chunk. It didn't affect the end result, but it's cleaner with the change.)

Labels: bug


I confirm I can reproduce the issue with the linked source code.

The reason for the error is still unclear.
Another test, which ingests data in segments of random length using exactly the same API, passes so far.
Investigation is ongoing.
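The random-length test mentioned here is not included in the thread. Below is a minimal sketch of that kind of differential test, under stated assumptions: it uses 64-bit FNV-1a as a stand-in hash (not xxHash) so that it compiles without xxhash.h, and the helper names `fnv_*` and `random_chunks_match` are hypothetical. Against xxHash, the reset/update/digest calls would map onto `XXH3_64bits_reset()`, `XXH3_64bits_update()` and `XXH3_64bits_digest()` on a state obtained from `XXH3_createState()`, and the one-shot reference onto `XXH3_64bits()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in streaming hash (64-bit FNV-1a). NOT xxHash: it only mimics the
   reset/update/digest shape of a streaming API. */
typedef struct { uint64_t h; } fnv_state;

static void fnv_reset(fnv_state *s) { s->h = 0xcbf29ce484222325ULL; }

static void fnv_update(fnv_state *s, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        s->h ^= p[i];
        s->h *= 0x100000001b3ULL;
    }
}

static uint64_t fnv_digest(const fnv_state *s) { return s->h; }

/* One-shot reference: hash the whole buffer in a single update. */
static uint64_t fnv_oneshot(const void *data, size_t len) {
    fnv_state s;
    fnv_reset(&s);
    fnv_update(&s, data, len);
    return fnv_digest(&s);
}

/* Feed `data` in chunks of random length 0..max_chunk (so empty updates are
   exercised too) and return 1 if the streaming digest equals the one-shot
   digest, 0 otherwise. */
static int random_chunks_match(const unsigned char *data, size_t len,
                               size_t max_chunk, unsigned seed) {
    fnv_state s;
    size_t pos = 0;
    if (max_chunk == 0) max_chunk = 1; /* guarantee forward progress */
    srand(seed);
    fnv_reset(&s);
    while (pos < len) {
        size_t n = (size_t)rand() % (max_chunk + 1);
        if (n > len - pos) n = len - pos;
        fnv_update(&s, data + pos, n);
        pos += n;
    }
    return fnv_digest(&s) == fnv_oneshot(data, len);
}
```

Because FNV-1a consumes one byte at a time, this stand-in can never fail the check; the point is the harness. As this thread shows, a random schedule can also miss a bug that needs one specific sequence of chunk sizes.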

Hmm, there are definitely more things awry here.

If I also add the single-shot function, I get a third result.

./tester normal-chunks uneven-chunks

normal-chunks
incr:   cf3093cedd619ac9
full:   cf3093cedd619ac9
single: 8ef77a435d6e1fa9

uneven-chunks
incr:   ba74115927b80788
full:   cf3093cedd619ac9
single: 8ef77a435d6e1fa9

But if I replace everything with the equivalent XXH64 functions, they all return the same result.

./tester normal-chunks uneven-chunks
normal-chunks
incr:   98a65b63b84f8277
full:   98a65b63b84f8277
single: 98a65b63b84f8277

uneven-chunks
incr:   98a65b63b84f8277
full:   98a65b63b84f8277
single: 98a65b63b84f8277

OK, I think I get it.
It requires a specific sequence of chunk sizes to trigger,
but I'm nonetheless surprised this issue did not show up during tests.
Something must be improved there.

OK, fixed it.
Indeed, it was a pretty nasty bug. Thanks @WayneD for pointing that out!

I must now think of a way to trigger that bug during tests, and make it part of the PR.
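One way to make a specific size sequence part of a test suite is to enumerate chunk boundaries instead of sampling them. Below is a minimal sketch, again using 64-bit FNV-1a as a stand-in for XXH3 and hypothetical helper names; it is an illustration, not a description of the test that was actually added to xxHash. It cuts the input into every possible three-chunk split, empty chunks included, and counts the splits whose streaming digest disagrees with the one-shot digest.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in streaming hash (64-bit FNV-1a), NOT xxHash. */
typedef struct { uint64_t h; } fnv_state;

static void fnv_reset(fnv_state *s) { s->h = 0xcbf29ce484222325ULL; }

static void fnv_update(fnv_state *s, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        s->h ^= p[i];
        s->h *= 0x100000001b3ULL;
    }
}

static uint64_t fnv_digest(const fnv_state *s) { return s->h; }

static uint64_t fnv_oneshot(const void *data, size_t len) {
    fnv_state s;
    fnv_reset(&s);
    fnv_update(&s, data, len);
    return fnv_digest(&s);
}

/* Try every cut of data[0..len) into three consecutive chunks of sizes
   i, j - i and len - j (any of them may be empty) and return how many
   splits produce a streaming digest different from the one-shot digest. */
static size_t count_bad_three_way_splits(const unsigned char *data, size_t len) {
    uint64_t expected = fnv_oneshot(data, len);
    size_t bad = 0;
    for (size_t i = 0; i <= len; i++) {
        for (size_t j = i; j <= len; j++) {
            fnv_state s;
            fnv_reset(&s);
            fnv_update(&s, data, i);
            fnv_update(&s, data + i, j - i);
            fnv_update(&s, data + j, len - j);
            if (fnv_digest(&s) != expected) bad++;
        }
    }
    return bad;
}
```

The number of splits grows quadratically with the input length and each split rehashes the whole input, so the total cost is cubic: keep inputs short, or restrict the cut points to a window around the hash's internal block boundaries.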

Great work! I'm glad it wasn't too hard to fix. I guess I just got "lucky" with an rsync test run that combined xxh128 checksums with a compression test and ended up with a reproducible failure (and then I just had rsync write out the failing checksum call sequence in a simple file format).

I've verified that your "fix378" branch works fine in my rsync test run that was failing.

Now I just need to figure out why rsync computes a different checksum value than what xxh128 outputs for about 1/4 of the files I just copied (no such issue occurs when comparing xxh64 values).

Things match up correctly now. I think I had initially only re-linked my code, and it needed a full recompile for the fix to take effect.
