Xxhash: Any known xxhash collision so far?

Created on 20 Jan 2019  ·  7Comments  ·  Source: Cyan4973/xxHash

Any known xxhash collision so far? Need some data for reference.

question

Most helpful comment

It's quite trivial to find some:

//add this to your compiler flags to run it faster: -O2 -march=native

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include "xxhash.h"

int main()
{
    const unsigned long long t = 1727396350;
    for(unsigned long s = 0; s < ULONG_MAX; ++s)
    {
        unsigned long long h = XXH32(&s, sizeof s, 0); //Replace XXH32 with XXH64 for XXH64 collisions
        if (h == t) printf("COLLISION!!!: %lu\n", s);
    }
}

I found a whole bunch of unsigned longs that all returned the same XXH32 value of 1727396350, some of which are listed down below:

90343688
4387316452
11971290027
16268262791
19557269070
25675257898
27143248113
33261236941
34729227156
40847215984
44136222263
48433195027
54551183855
56019174070
62137162898
65426169177
69723141941

The list goes on for a while... And I found all these in just a couple of minutes.

As for XXH64, it will take significantly more time to find a collision, which is to be expected since it has a 64 bit hash size, double that of XXH32. Not twice the amount of time mind you, but a lot more; since every bit doubles the amount of time required, it will (probably) take somewhere around 2^(64-32)=2^32=4294967296 times longer. That might sound like a lot, but considering I was able to find my first XXH32 collision in mere milliseconds, it might only take me a few years to find a collision on my laptop... without assembly optimization, or GPU acceleration, or FPGAs. An determined attacker could easily afford all of the above, and a bunch more computers to make things much, MUCH quicker. Heck, I could probably do that, if I didn't have better things to do. The most important part though is cryptanalysis: when an attack on this function is found (which should be dead-simple for any cryptographer out there), you'll probably be able to generate a collision in under a second on your 5 year-old smartphone, just like what happened to MD5.

Now for the usual security warning: XXHASH is NOT a cryptographic function. Do NOT use it anywhere where security matters. It is made for creating hashtables or hashsets, allowing for O(1) lookup in memory, just like djb2. They are made for one thing only: speed, and a lot of it. When looking up an item in a hash table, the program needs to hash the key as fast as possible so it can jump to the corresponding address. A collision there simply means that the program has to iterate over a few items until it finds what it's looking for, which is not a big deal.

Meanwhile, if even a hypothetical academic-grade reduction on a cryptographic hash function is found, it is a massive deal. Its only purpose is to withstand any attempts at generating a collision. Which is why, despite SHA1 requiring absolutely massive amounts of computing power to generate a collision, and only one collision found as of writing this (afaik), it is still widely considered insecure.

Modern cryptographic hash algorithms must also return hashes at least 128 bits in length, or 256 if you want to protect yourself from quantum computers, heck maybe even 384 if this whitepaper is to be believed (although it is disputed). Note that this does NOT mean that any hash function that returns a 128 bit hash is secure.

All 7 comments

xxhash is a non-cryptographic hash algorithm.
It must be assumed that collisions can be produced on demand.
32 and 64 bits are way too small anyway to provide any protection, and can be brute-forced.

I would like to know what values have this collision. does anyone have the values? as mentioned, need data for reference.

It's quite trivial to find some:

//add this to your compiler flags to run it faster: -O2 -march=native

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include "xxhash.h"

int main()
{
    const unsigned long long t = 1727396350;
    for(unsigned long s = 0; s < ULONG_MAX; ++s)
    {
        unsigned long long h = XXH32(&s, sizeof s, 0); //Replace XXH32 with XXH64 for XXH64 collisions
        if (h == t) printf("COLLISION!!!: %lu\n", s);
    }
}

I found a whole bunch of unsigned longs that all returned the same XXH32 value of 1727396350, some of which are listed down below:

90343688
4387316452
11971290027
16268262791
19557269070
25675257898
27143248113
33261236941
34729227156
40847215984
44136222263
48433195027
54551183855
56019174070
62137162898
65426169177
69723141941

The list goes on for a while... And I found all these in just a couple of minutes.

As for XXH64, it will take significantly more time to find a collision, which is to be expected since it has a 64 bit hash size, double that of XXH32. Not twice the amount of time mind you, but a lot more; since every bit doubles the amount of time required, it will (probably) take somewhere around 2^(64-32)=2^32=4294967296 times longer. That might sound like a lot, but considering I was able to find my first XXH32 collision in mere milliseconds, it might only take me a few years to find a collision on my laptop... without assembly optimization, or GPU acceleration, or FPGAs. An determined attacker could easily afford all of the above, and a bunch more computers to make things much, MUCH quicker. Heck, I could probably do that, if I didn't have better things to do. The most important part though is cryptanalysis: when an attack on this function is found (which should be dead-simple for any cryptographer out there), you'll probably be able to generate a collision in under a second on your 5 year-old smartphone, just like what happened to MD5.

Now for the usual security warning: XXHASH is NOT a cryptographic function. Do NOT use it anywhere where security matters. It is made for creating hashtables or hashsets, allowing for O(1) lookup in memory, just like djb2. They are made for one thing only: speed, and a lot of it. When looking up an item in a hash table, the program needs to hash the key as fast as possible so it can jump to the corresponding address. A collision there simply means that the program has to iterate over a few items until it finds what it's looking for, which is not a big deal.

Meanwhile, if even a hypothetical academic-grade reduction on a cryptographic hash function is found, it is a massive deal. Its only purpose is to withstand any attempts at generating a collision. Which is why, despite SHA1 requiring absolutely massive amounts of computing power to generate a collision, and only one collision found as of writing this (afaik), it is still widely considered insecure.

Modern cryptographic hash algorithms must also return hashes at least 128 bits in length, or 256 if you want to protect yourself from quantum computers, heck maybe even 384 if this whitepaper is to be believed (although it is disputed). Note that this does NOT mean that any hash function that returns a 128 bit hash is secure.

@OverclockedSanic however, I suppose the amount of data to process for xxhash64 makes things quite different. I mean in less than a year.

Hashing fonction is serial, quantum computing is no help except if the content of the data generated for matching the hash does not matter.

@ytrezq Not sure what you meant in the first part, but quantum computing, despite not breaking symmetric primitives like hash functions, definitely does affect them. An attack with grover's algorithm reduces the complexity for generating a collision from 2^n to 2^(n/2), meaning SHA3-256 in a post-quantum setting actually offers 128 bit of security as opposed to 256, or even ~85.33 if the whitepaper above is correct (in which case, you'd need to use at least SHA3-384 to be secure).

Again, doesn't break hash functions, since all you need to do is increase the size, unlike RSA and ECDSA which will become useless no matter the key size.

Sauce: https://en.wikipedia.org/wiki/SHA-3#Security_against_quantum_attacks
Grover's algorithm: https://en.wikipedia.org/wiki/Grover%27s_algorithm

The following program demonstrates an XXH64 collision:

#include <stdio.h>
#include <stdint.h>
#include <xxhash.h>

int main(void)
{
        const uint8_t str1[] = { 0xd0, 0x08, 0xad, 0x10, 0x02, 0x00, 0x00, 0x00, 0xd0, 0x08, 0xad, 0x10, 0x02, 0x00, 0x00, 0x00 };
        const uint8_t str2[] = { 0xa6, 0x03, 0x26, 0xc5, 0x01, 0x00, 0x00, 0x00, 0xa6, 0x03, 0x26, 0xc5, 0x01, 0x00, 0x00, 0x00 };
        printf("%lu\n", XXH64(str1, sizeof(str1), 156211));
        printf("%lu\n", XXH64(str2, sizeof(str2), 156211));
}

This was done simply by hashing lots of integers starting from zero, storing them all in a huge array, sorting said array and looking for duplicates. I couldn't get it to easily trigger without hashing the number twice, though; it seems there's some resistance to 64-bit collisions with only 64-bit inputs.

Or you could just use the known XXH64 multicollision:

#define XXH_INLINE_ALL
#include <xxhash.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

static void print_buf(const uint8_t *buffer, size_t len)
{
    printf("0x");
    for (size_t i = 0; i < len; i++) {
        printf("%02x", buffer[i]);
    }
}

// https://github.com/Cyan4973/xxHash/issues/54#issuecomment-414061026
int main(void)
{
    uint8_t real[64], collide[64], seedBuf[8];

    // Fill with random integers
    FILE *urandom = fopen("/dev/urandom", "rb");
    if (!urandom)
        return 1;
    if (fread(real,    sizeof(real),    1, urandom) != 1)
        return 1;
    if (fread(seedBuf, sizeof(seedBuf), 1, urandom) != 1)
        return 1;
    fclose(urandom);

    uint64_t seed = XXH_readLE64(seedBuf);

    // Copy the collision code
    memcpy(collide, real, sizeof(collide));

    uint64_t x = XXH64(real,    sizeof(real),    seed);
    printf("XXH64(");
    print_buf(real, 64);
    printf(", 64, %#016"PRIx64") = %#016"PRIx64"\n", seed, x);
    for (size_t i = 0; i < 256; i++) {
        // ((uint64_t *)collide)[i % 4] += 0x0BA79078168D4BAF;
        uint64_t a = XXH_readLE64(&collide[8 * i % 4]);
        a += 0x0BA79078168D4BAF;
        XXH_writeLE64(&collide[8 * i % 4], a);

        // ((uint64_t *)collide)[4 + i % 4] += 0x9C90005B80000000;
        uint64_t b = XXH_readLE64(&collide[32 + 8 * i % 4]);
        b += 0x9C90005B80000000;
        XXH_writeLE64(&collide[32 + 8 * i % 4], b);

        uint64_t y = XXH64(collide, sizeof(collide), seed);
        printf("XXH64(");
        print_buf(collide, 64);
        printf(", 64, %#016"PRIx64") = %#016"PRIx64"\n", seed, y);
    }
}

However, XXH64 is bijective for 64-bit inputs, so it is impossible for there to be a collision at that length.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

WayneD picture WayneD  ·  7Comments

jtoivainen picture jtoivainen  ·  4Comments

boazsegev picture boazsegev  ·  6Comments

devnoname120 picture devnoname120  ·  8Comments

gitmko0 picture gitmko0  ·  4Comments