Runtime: Adding a HashCode type to help with combining hash codes

Created on 25 Apr 2016  ·  206 comments  ·  Source: dotnet/runtime

This long discussion (200+ comments) has been replaced by the new issue dotnet/corefx#14354.

This issue is CLOSED!!!


Motivation

Java has Objects.hash for quickly combining the hash codes of constituent fields to return in Object.hashCode(). Unfortunately, .NET has no such equivalent and developers are forced to roll their own hashes like this:

public override int GetHashCode()
{
    unchecked
    {
        int result = 17;
        result = result * 23 + field1.GetHashCode();
        result = result * 23 + field2.GetHashCode();
        return result;
    }
}

Sometimes people even resort to using Tuple.Create(field1, field2, ...).GetHashCode() for this, which is bad (obviously) since it allocates.

Proposal

  • List of changes in current proposal (against last approved version https://github.com/dotnet/corefx/issues/8034#issuecomment-262331783):

    • Empty property added (as a natural starting point, analogous to ImmutableArray)

    • Argument names updated: hash -> hashCode, obj -> item

namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
        public HashCode();

        public static HashCode Empty { get; }

        public static HashCode Create(int hashCode);
        public static HashCode Create<T>(T item);
        public static HashCode Create<T>(T item, IEqualityComparer<T> comparer);

        public HashCode Combine(int hashCode);
        public HashCode Combine<T>(T item);
        public HashCode Combine<T>(T item, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public bool Equals(HashCode other);
        public override bool Equals(object obj);
        public override int GetHashCode();
        public override string ToString();
    }
}

Usage:

```c#
int hashCode1 = HashCode.Create(f1).Combine(f2).Value;
int hashCode2 = hashes.Aggregate(HashCode.Empty, (seed, hash) => seed.Combine(hash));

var hashCode3 = HashCode.Empty;
foreach (int hash in hashes) { hashCode3 = hashCode3.Combine(hash); }
int hashCode3Value = (int)hashCode3;
```

Notes

Implementation should use algorithm in HashHelpers.

Labels: Design Discussion, api-needs-work, area-System.Numerics

Most helpful comment

[@redknightlois] If what we need is a rationale of why to go for System, I can try a justification: we built HashCode to help in implementations of object.GetHashCode(), so it sounds fitting that both would share a namespace.

That was the rationale that @KrzysztofCwalina and I used, too. Sold!

All 206 comments

If you want something quick-and-dirty then you can use ValueTuple.Create(field1, field2).GetHashCode(). It's the same algorithm as that used in Tuple (which for that matter, is similar to that in Objects) and doesn't have the allocation overhead.

Otherwise there are questions as to how good a hash you will need, what field values are likely to occur (which affects which algorithms will give good or bad results), whether there is a likelihood of hashDoS attacks, whether collisions modulo a binary-even number hurt (as they do with binary-even hash tables), and so on, which makes one-size-fits-all inapplicable.
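To unpack the "collisions modulo a binary-even number" point: a power-of-two-sized hash table selects buckets from the low bits of the hash only, so two hashes that differ only in their high bits land in the same bucket. A small illustrative sketch (hypothetical hash values):

``` C#
using System;

int capacity = 1024;      // a power-of-two ("binary-even") table size
int hashA = 0x12345678;
int hashB = 0x7F345678;   // differs from hashA only in the high bits

// Bucket selection keeps only the low 10 bits, so both hashes collide:
Console.WriteLine((hashA & (capacity - 1)) == (hashB & (capacity - 1))); // True
```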

@JonHanna I think those questions also apply to, for example, string.GetHashCode(). I don't see why providing Hash should be any harder than that.

Actually, it should be simpler, since users with special requirements can easily stop using Hash, but stopping using string.GetHashCode() is harder.

If you want something quick-and-dirty then you can use ValueTuple.Create(field1, field2).GetHashCode().

Ah, good idea, I hadn't thought of ValueTuple when making this post. Unfortunately I don't think that'll be available until C# 7/the next framework release, and I don't even know if it'll be that performant (those property/method calls to EqualityComparer can add up). But I haven't taken any benchmarks to measure this, so I wouldn't really know. I just think that there should be a dedicated/simple class for hashing that people can use without resorting to tuples as a hackish workaround.

Otherwise there are questions as to how good a hash you will need, what field values are likely to occur (which affects which algorithms will give good or bad results), whether there is a likelihood of hashDoS attacks, whether collisions modulo a binary-even number hurt (as they do with binary-even hash tables), and so on, which makes one-size-fits-all inapplicable.

Absolutely agreed, but I don't think the majority of implementations take that into account, e.g. ArraySegment's current implementation is pretty naive. The main purpose of this class (along with avoiding allocations) would be to provide a go-to implementation for people who don't know much about hashing, to prevent them from doing something stupid like this. People who need to deal with the situations you described can implement their own hashing algorithm.
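For illustration, a sketch of the kind of naive implementation being warned about (a hypothetical type, not the code linked above):

``` C#
public override int GetHashCode()
{
    // XOR is symmetric and self-cancelling: (start, length) collides with
    // (length, start), and equal field values cancel out to zero.
    return _start.GetHashCode() ^ _length.GetHashCode();
}
```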

Unfortunately I don't think that'll be available until C# 7/the next framework release

I think you can use it with C# 2, just not with built-in support.

and I don't even know if it'll be that performant (those property/method calls to EqualityComparer can add up)

What would this class do differently? If explicitly calling obj == null ? 0 : obj.GetHashCode() is any faster, then that should be moved into ValueTuple.

I'd have been inclined to +1 this proposal a couple of weeks ago, but I'm less inclined in light of ValueTuple reducing the allocation overhead of the trick of using Tuple for this. The proposal seems to fall between two stools to me: if you don't need something particularly specialised then you can use ValueTuple, but if you need something beyond that, then a class like this isn't going to go far enough.

And when we do have C#7, it's going to have the syntactic sugar to make it even easier.

@JonHanna

What would this class do differently? If explicitly calling obj == null ? 0 : obj.GetHashCode() is any faster, then that should be moved into ValueTuple.

Why not have ValueTuple just use the Hash class for getting hash codes? That would also reduce the LOC in the file significantly (it's currently ~2,000 lines).

edit:

If you don't need something particularly specialised then you can use ValueTuple

True, but the problem is that many people might not realize that and implement their own inferior naive hashing function (like the one I linked to above).

That I could indeed get behind.

Probably outside the scope of this issue, but having a hashing namespace where we can find high-performance cryptographic and non-cryptographic hashes written by experts would be a win here.

For instance, we had to code xxHash32, xxHash64, Metro128, and also downsampling from 128 to 64 and from 64 to 32 bits ourselves. Having an array of optimized functions would help developers avoid writing their own unoptimized and/or buggy ones (I know, we have found a few bugs in our own too), while still being able to choose depending on their needs.

We would gladly donate our implementations if there is interest, so they can be reviewed and optimized further by experts.

@redknightlois I'd be happy to add my SpookyHash implementation to an effort like that.

@svick Careful with string.GetHashCode() though, that's very specific, for very good reason, Hash DoS attacks.

@terrajobst, how far is this in our API triage/review queue? I think it's a simple API that we've always wanted to add to the platform, and possibly we now have enough critical mass to actually do it?

cc: @ellismg

I think it's ready to review in its current state.

@mellinoe That's great! I've cleaned up the proposal a bit to make it more terse, and also added some questions at the end I think should be addressed.

@jamesqo There should be a long-based overload too.

@redknightlois, sounds reasonable. I updated the proposal to include long overloads of Combine.

Is @JonHanna's suggestion not good enough?

``` C#
return ValueTuple.Create(a, b, c).GetHashCode();
```

Unless there are good enough reasons why that's not good enough, we don't think it's making the cut.

Beyond the generated code being a few orders of magnitude worse, I cannot think of any other good enough reason. Unless of course there are optimizations in the new runtime that were designed with this particular case in mind, in which case this analysis is moot. Having said that, I tried this on 1.0.1.

Let me illustrate with an example.

Let's suppose that we take the actual code that is being used for ValueTuple and call it with constants.

        internal static class HashHelpers
        {
            public static int Combine(int h1, int h2)
            {
                // The jit optimizes this to use the ROL instruction on x86
                // Related GitHub pull request: dotnet/coreclr#1830
                uint shift5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
                return ((int)shift5 + h1) ^ h2;
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryStaticCall()
        {
            return HashHelpers.Combine(10202, 2003);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryValueTuple()
        {
            return ValueTuple.Create(10202, 2003).GetHashCode();
        }

Now under an optimizing compiler chances are that there shouldn't be any difference, but in reality there is.

This is the actual code for ValueTuple:

[image: JIT disassembly of TryValueTuple]

So what can be seen here? First we are creating a struct on the stack, then we are calling the actual hash code.

Now compare it with the use of HashHelpers.Combine, which for all intents and purposes could be the actual implementation of Hash.Combine:

[image: JIT disassembly of TryStaticCall]

I know!!! How cool is that???
But let's not stop there... let's use actual parameters:

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryStaticCall(int h1, int h2)
        {
            return HashHelpers.Combine(h1, h2);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryValueTuple(int h1, int h2)
        {
            return ValueTuple.Create(h1, h2).GetHashCode();
        }

        static unsafe void Main(string[] args)
        {
            var g = new Random();
            int h1 = g.Next();
            int h2 = g.Next(); 
            Console.WriteLine(TryStaticCall(h1, h2));
            Console.WriteLine(TryValueTuple(h1, h2));
        }

[image: JIT disassembly of TryStaticCall(h1, h2)]

The good thing: this is extremely stable. But let's compare it with the alternative:

[image: JIT disassembly of TryValueTuple(h1, h2)]

Now let's go overboard...

        internal static class HashHelpers
        {
            public static int Combine(int h1, int h2)
            {
                // The jit optimizes this to use the ROL instruction on x86
                // Related GitHub pull request: dotnet/coreclr#1830
                uint shift5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
                return ((int)shift5 + h1) ^ h2;
            }
            public static int Combine(int h1, int h2, int h3, int h4)
            {
                return Combine(Combine(h1, h2), Combine(h3, h4));
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryStaticCall(int h1, int h2, int h3, int h4)
        {
            return HashHelpers.Combine(h1, h2, h3, h4);
        }

And the result is pretty illustrative:

[image: JIT disassembly of TryStaticCall(h1, h2, h3, h4)]

I cannot really inspect the actual code that the JIT generates for the call, but just the prolog and epilog is enough to justify inclusion of the proposal.

[image: prolog and epilog of the ValueTuple-based version]

The takeaway of the analysis is simple: that the holding type is a struct doesn't mean it is free :)
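For anyone who wants to reproduce the comparison, a minimal BenchmarkDotNet sketch (assuming the HashHelpers.Combine shown above; absolute numbers will vary by runtime and JIT):

``` C#
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

internal static class HashHelpers
{
    public static int Combine(int h1, int h2)
    {
        uint shift5 = ((uint)h1 << 5) | ((uint)h1 >> 27); // rotate left by 5
        return ((int)shift5 + h1) ^ h2;
    }
}

public class CombineBenchmarks
{
    private readonly int _h1 = 10202, _h2 = 2003;

    [Benchmark(Baseline = true)]
    public int StaticCombine() => HashHelpers.Combine(_h1, _h2);

    [Benchmark]
    public int ValueTupleCombine() => ValueTuple.Create(_h1, _h2).GetHashCode();
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<CombineBenchmarks>();
}
```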

Performance was brought up during the meeting. The question is whether this API is likely to be on the hot path. To be clear, I'm not saying we shouldn't have the API. I'm merely saying unless there is a concrete scenario it's harder to design the API, because we can't say "we need it for X, thus the measure of success is whether X can use it". That's important for APIs that don't allow you to do something new, but rather do the same thing in a more optimized fashion.

I think that the more important it is to have a fast, good quality hash, the more important it is to tune the algorithm used to the object(s) and the range of values likely to be seen, and hence the more you need such a helper, the more you need to not use such a helper.

@terrajobst, performance was a major motivation for this proposal but not the only one. Having a dedicated type will help with discoverability; even with built-in tuple support in C# 7, developers may not necessarily know that tuples are equated by value. Even if they do, they may forget that tuples override GetHashCode, and will likely end up having to Google how to implement GetHashCode in .NET.

Also, there is a subtle correctness issue with using ValueTuple.Create.GetHashCode. Past 8 elements, only the last 8 elements are hashed; the rest are ignored.
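A sketch of that pitfall, based on the tuple implementation of the time (for a nine-element tuple, only Item2..Item7 plus Rest participate in the hash):

``` C#
using System;

// ValueTuple.Create stops at eight arguments, so a nine-element tuple
// has to nest its tail in Rest:
var t1 = new ValueTuple<int, int, int, int, int, int, int, ValueTuple<int, int>>(
    1, 2, 3, 4, 5, 6, 7, ValueTuple.Create(8, 9));
var t2 = new ValueTuple<int, int, int, int, int, int, int, ValueTuple<int, int>>(
    100, 2, 3, 4, 5, 6, 7, ValueTuple.Create(8, 9));

// Only the last eight elements are hashed, so the differing first
// element is ignored:
Console.WriteLine(t1.GetHashCode() == t2.GetHashCode()); // True
```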

@terrajobst At RavenDB, GetHashCode performance had such an impact on our bottom line that we ended up implementing a whole set of highly optimized routines. Even Roslyn has its own internal hashing https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/InternalUtilities/Hash.cs also check the discussion on Roslyn specifically here: https://github.com/dotnet/coreclr/issues/1619 ... So when performance is KEY we cannot use what the platform provides and have to roll our own (and pay the consequences).

Also, @jamesqo's issue is completely valid. I haven't needed to combine that many hashes myself, but out of 1M cases there is someone who is going to step over the cliff with that one.

@JonHanna

I think that the more important it is to have a fast, good quality hash, the more important it is to tune the algorithm used to the object(s) and the range of values likely to be seen, and hence the more you need such a helper, the more you need to not use such a helper.

So you're saying that adding a helper class would be bad, since it would encourage people to just throw in the helper function without thinking about how to do a proper hash?

It seems like the opposite would be true, actually; Hash.Combine should generally improve GetHashCode implementations. People who know how to do hashing can evaluate Hash.Combine to see if it fits their use case. Newbies who don't really know about hashing will use Hash.Combine instead of just xor-ing (or worse, adding) the constituent fields because they don't know how to do a proper hash.

We discussed this a bit more and you convinced us :-)

A few more questions:

  1. We need to decide where to put this type. Introducing a new namespace seems odd; System.Numerics might work though. System.Collections.Generic might also work, because it has the comparers and hashing is most often used in the context of collections.
  2. Should we provide an allocation-free builder pattern to combine an unknown number of hash codes?

On (2) @Eilon had this to say:

For reference, ASP.NET Core (and its predecessors and related projects) use a HashCodeCombiner: https://github.com/aspnet/Common/blob/dev/src/Microsoft.Extensions.HashCodeCombiner.Sources/HashCodeCombiner.cs

(@David Fowler mentioned it in the GitHub thread several months ago.)

And this is an example usage: https://github.com/aspnet/Mvc/blob/760c8f38678118734399c58c2dac981ea6e47046/src/Microsoft.AspNetCore.Mvc.Razor/Internal/ViewLocationCacheKey.cs#L129-L144

``` C#
var hashCodeCombiner = HashCodeCombiner.Start();
hashCodeCombiner.Add(IsMainPage ? 1 : 0);
hashCodeCombiner.Add(ViewName, StringComparer.Ordinal);
hashCodeCombiner.Add(ControllerName, StringComparer.Ordinal);
hashCodeCombiner.Add(AreaName, StringComparer.Ordinal);

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCodeCombiner.Add(item.Key, StringComparer.Ordinal);
        hashCodeCombiner.Add(item.Value, StringComparer.Ordinal);
    }
}

return hashCodeCombiner;
```

We discussed this a bit more and you convinced us :-)

🎉

Introducing a new namespace seems odd; System.Numerics might work though.

If we decide not to add a new namespace, then it should be noted that any code that has a class named Hash and a using System.Numerics directive will fail to compile with an ambiguous type error.

Should we provide an allocation-free builder pattern to combine an unknown number of hash codes?

This sounds like a great idea. As a couple of initial suggestions, perhaps we should name it HashBuilder (a la StringBuilder) and have it return this after every Add method to make it easier to add hashes, like so:

public override int GetHashCode()
{
    return HashBuilder.Create(_field1)
        .Add(_field2)
        .Add(_field3)
        .ToHash();
}

@jamesqo please update the proposal at the top when there is consensus on the thread. We can then do final review. Assigning to you for now as you drive the design ;-)

If we decide not to add a new namespace, then it should be noted that any code that has a class named Hash and a using System.Numerics directive will fail to compile with an ambiguous type error.

Depends on the actual scenario. In many cases the compiler will prefer your type as the defined namespace hierarchy of the compilation unit is walked before considering using directives.

But even so: adding APIs can be a source breaking change. However, it's impractical to avoid this, assuming we want to make forward progress 😄 We generally strive to avoid conflicts by, for example, using names that aren't too general. For instance, I don't think we should call the type Hash. I think HashCode would probably be better.

As a couple of initial suggestions, perhaps we should name it HashBuilder

As a first approximation I was thinking of combining the statics and the builder into a single type, like so:

``` C#
namespace System.Collections.Generic
{
    public struct HashCode
    {
        public static int Combine(int hash1, int hash2);
        public static int Combine(int hash1, int hash2, int hash3);
        public static int Combine(int hash1, int hash2, int hash3, int hash4);
        public static int Combine(int hash1, int hash2, int hash3, int hash4, int hash5);
        public static int Combine(int hash1, int hash2, int hash3, int hash4, int hash5, int hash6);

        public static long Combine(long hash1, long hash2);
        public static long Combine(long hash1, long hash2, long hash3);
        public static long Combine(long hash1, long hash2, long hash3, long hash4);
        public static long Combine(long hash1, long hash2, long hash3, long hash4, long hash5);
        public static long Combine(long hash1, long hash2, long hash3, long hash4, long hash5, long hash6);

        public static int CombineHashCodes<T1, T2>(T1 o1, T2 o2);
        public static int CombineHashCodes<T1, T2, T3>(T1 o1, T2 o2, T3 o3);
        public static int CombineHashCodes<T1, T2, T3, T4>(T1 o1, T2 o2, T3 o3, T4 o4);
        public static int CombineHashCodes<T1, T2, T3, T4, T5>(T1 o1, T2 o2, T3 o3, T4 o4, T5 o5);
        public static int CombineHashCodes<T1, T2, T3, T4, T5, T6>(T1 o1, T2 o2, T3 o3, T4 o4, T5 o5, T6 o6);

        public void Combine(int hashCode);
        public void Combine(long hashCode);
        public void Combine<T>(T obj);
        public void Combine(string text, StringComparison comparison);

        public int Value { get; }
    }
}
```

This allows for code like this:

``` C#
return HashCode.Combine(value1, value2);
```

as well as:

``` C#
var hashCode = new HashCode();
hashCode.Combine(IsMainPage ? 1 : 0);
hashCode.Combine(ViewName, StringComparison.Ordinal);
hashCode.Combine(ControllerName, StringComparison.Ordinal);
hashCode.Combine(AreaName, StringComparison.Ordinal);

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCode.Combine(item.Key, StringComparison.Ordinal);
        hashCode.Combine(item.Value, StringComparison.Ordinal);
    }
}

return hashCode.Value;
```

Thoughts?

I like @jamesqo's idea of chained calls (return this from instance methods Combine).

I would even go as far as removing the static methods entirely and keep just the instance methods ...

Combine(long hashCode) will just cast down to int. Do we truly want that?
What is the use case for long overloads in the first place?
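To illustrate the truncation concern, a small sketch with hypothetical values:

``` C#
using System;

long a = 0x1234567800000000;
long b = unchecked((long)0x9ABCDEF000000000);

// Casting to int keeps only the low 32 bits, so these two distinct
// longs would contribute the exact same value (zero) to the hash:
Console.WriteLine((int)a == (int)b); // True

// A better 64 -> 32 reduction mixes both halves first:
int mixed = (int)(a ^ (a >> 32));
Console.WriteLine(mixed);
```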

@karelz Please don't remove them; structs are not free. Hashes can be used in very hot paths, and you certainly don't want to waste instructions when the static method would be essentially free. Look at the analysis of the code where I showed the real impact of the enclosing struct.

We used the Hashing static class to avoid name clashes and the code looks good.

@redknightlois I wonder if we should expect the same 'bad' code in the case of a non-generic struct with one int field.
If that is still 'bad' assembly code, I wonder if we could improve the JIT to do a better job in its optimizations here. Adding APIs just to save a couple of instructions should be our last resort IMO.

@redknightlois Curious, does the JIT generate any worse code if the struct (in this case HashCode) can fit in a register? It'll only be an int big.

Also, I've been seeing lots of pull requests in coreclr recently to improve the code generated around structs, and it looks like dotnet/coreclr#8057 will enable those optimizations. Perhaps the code the JIT generates will be better after this change?

edit: I see @karelz has already mentioned my points here.

@karelz, I agree with you-- assuming the JIT generates decent code for an int-sized struct (which I believe it does, ImmutableArray has no overhead for example) then the static overloads are redundant and can be removed.

@terrajobst Some more ideas I have:

  • I think we can combine your & my ideas a bit. HashCode seems like a good name; it doesn't have to be a mutable struct following the builder pattern. Instead, it can be an immutable wrapper around an int, and every Combine operation can return a new HashCode value. For example:
public struct HashCode
{
    private readonly int _hash;

    public HashCode(int hash) { _hash = hash; }

    public HashCode Combine(int hash) => new HashCode(CombineCore(_hash, hash));

    public HashCode Combine<T>(T item) => Combine(EqualityComparer<T>.Default.GetHashCode(item));
}

// Usage
HashCode combined = new HashCode(_field1)
    .Combine(_field2)
    .Combine(_field3);
  • We should just have an implicit operator for conversion to int so people don't have to have that last .Value call.
  • Re Combine, is that the best name? It sounds more descriptive, but Add is shorter & easier to type. (Mix is another alternative, but that is slightly painful to type.)

    • public void Combine(string text, StringComparison comparison): I don't think that really belongs in the same type, since this is unrelated to strings. Besides, it's easy enough to write StringComparer.XXX.GetHashCode(str) for those rare times you need to do so.

    • We should remove the long overloads from this type & have a separate HashCode type for longs. Something like Int64HashCode, or LongHashCode.

I made a small sample implementation of things on TryRoslyn: http://tinyurl.com/zej9yux

Luckily it is easy to check. And the good news is that it works properly as it is 👍

[image: generated assembly for TryHashCombiner]

We should just have an implicit operator for conversion to int so people don't have to have that last .Value call.

Probably. The code is not nearly as straightforward, so having an implicit conversion would clean it up a bit. I still like the idea of being able to have a multiple-parameter interface too.

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryHashCombiner(int h1, int h2, int h3, int h4)
        {
            var h = new HashCode(h1).Combine(h2).Combine(h3).Combine(h4);
            return h.Value;
        }

Re Combine, is that the best name? It sounds more descriptive, but Add is shorter & easier to type. (Mix is another alternative, but that is slightly painful to type.)

Combine is the actual name that is used in the hashing community AFAIK. And it sort of gives you a clear idea of what it is doing.

@jamesqo There are a lot of hashing functions; we had to implement very fast versions, from 32 bits and 64 bits up to 128 bits, for RavenDB (and we use every single one for different purposes).

We can think forward in this design with some extensible mechanism like this:

        internal interface IHashCode<T> where T : struct
        {
            T Combine(T h1, T h2);
        }

        internal struct RotateHashCode : IHashCode<int>, IHashCode<long>
        {
            long IHashCode<long>.Combine(long h1, long h2)
            {
                // The jit optimizes this to use the ROL instruction (64-bit rotate left by 5)
                // Related GitHub pull request: dotnet/coreclr#1830
                ulong shift5 = ((ulong)h1 << 5) | ((ulong)h1 >> 59);
                return ((long)shift5 + h1) ^ h2;
            }

            int IHashCode<int>.Combine(int h1, int h2)
            {
                // The jit optimizes this to use the ROL instruction on x86
                // Related GitHub pull request: dotnet/coreclr#1830
                uint shift5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
                return ((int)shift5 + h1) ^ h2;
            }
        }

        internal struct HashCodeCombiner<T, W> where T : struct, IHashCode<W>
                                               where W : struct
        {
            private static T hasher;
            public W Value;

            static HashCodeCombiner()
            {
                hasher = new T();
            }

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public HashCodeCombiner(W seed)
            {
                this.Value = seed;
            }

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public HashCodeCombiner<T,W> Combine( W h1 )
            {
                Value = hasher.Combine(this.Value, h1);
                return this;
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TryHashCombinerT(int h1, int h2, int h3, int h4)
        {
            var h = new HashCodeCombiner<RotateHashCode, int>(h1).Combine(h2).Combine(h3).Combine(h4);
            return h.Value;
        }

I don't know why the JIT is creating such annoying prolog code for this. It shouldn't, so it probably can be optimized; we should ask the JIT devs about it. But for the rest, you can implement as many different combiners as you want without wasting a single instruction. Having said that, this mechanism is probably more useful for actual hash functions than for combiners. cc @CarolEidt @AndyAyersMS

EDIT: Thinking out loud here about a general mechanism to bring crypto and non-crypto hash functions under a single hashing concept umbrella.

@jamesqo

it doesn't have to be a mutable struct following the builder pattern

Ah yes. In that case I'm fine with that pattern. I generally don't like the pattern of returning instances if the operation had a side effect. It's especially bad if the API is following the immutable WithXxx pattern. In this case though, the pattern is essentially an immutable data structure so that pattern would work fine.

I think we can combine your & my ideas a bit.

👍, so what about:

``` C#
public struct HashCode
{
    public static HashCode Create<T>(T obj);

    [Pure] public HashCode Combine(int hashCode);
    [Pure] public HashCode Combine(long hashCode);
    [Pure] public HashCode Combine<T>(T obj);
    [Pure] public HashCode Combine(string text, StringComparison comparison);

    public int Value { get; }

    public static implicit operator int(HashCode hashCode);
}
```

This allows for code like this:

``` C#
public override int GetHashCode()
{
    return HashCode.Create(value1).Combine(value2);
}
```

as well as this:

``` C#
var hashCode = new HashCode()
    .Combine(IsMainPage ? 1 : 0)
    .Combine(ViewName, StringComparison.Ordinal)
    .Combine(ControllerName, StringComparison.Ordinal)
    .Combine(AreaName, StringComparison.Ordinal);

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCode = hashCode.Combine(item.Key, StringComparison.Ordinal);
        hashCode = hashCode.Combine(item.Value, StringComparison.Ordinal);
    }
}

return hashCode.Value;
```

@terrajobst Thoughts:

  1. The Create<T> factory method should be removed. Otherwise, there would be 2 ways of writing the same thing, HashCode.Create(_val) or new HashCode().Combine(_val). Also, having different names for Create / Combine would not be diff-friendly since if you added a new first field, you would have to change 2 lines.
  2. I don't think the overload accepting a string/StringComparison belongs here; HashCode has nothing to do with strings. Instead, maybe we should add a GetHashCode(StringComparison) api to string? (Also all of those are ordinal comparisons, which is the default behavior of string.GetHashCode.)
  3. What's the point of having Value, if there is already an implicit operator for conversion to int? Again, this would lead to different people writing different things.
  4. We have to move the long overload to a new type. HashCode will only be 32 bits wide; it can't fit a long.
  5. Let's add some overloads taking unsigned types, since they're more common in hashing.

Here's my proposed API:

public struct HashCode
{
    public HashCode Combine(int hash);
    public HashCode Combine(uint hash);
    public HashCode Combine<T>(T obj);

    public static implicit operator int(HashCode hashCode);
    public static implicit operator uint(HashCode hashCode);
}

public struct Int64HashCode
{
    public Int64HashCode Combine(long hash);
    public Int64HashCode Combine(ulong hash);

    public static implicit operator long(Int64HashCode hashCode);
    public static implicit operator ulong(Int64HashCode hashCode);
}

With only these methods, the example from ASP.NET can still be written as

var hashCode = new HashCode()
    .Combine(IsMainPage ? 1 : 0)
    .Combine(ViewName)
    .Combine(ControllerName)
    .Combine(AreaName);

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCode = hashCode.Combine(item.Key);
        hashCode = hashCode.Combine(item.Value);
    }
}

return hashCode;

@jamesqo

What's the point of having Value, if there is already an implicit operator for conversion to int? Again, this would lead to different people writing different things.

The Framework Design Guidelines for operator overloads say:

CONSIDER providing methods with friendly names that correspond to each overloaded operator.

Many languages do not support operator overloading. For this reason, it is recommended that types that overload operators include a secondary method with an appropriate domain-specific name that provides equivalent functionality.

Specifically, F# is one of the languages that makes it awkward to invoke implicit conversion operators.


Also, I don't think having only one way of doing things is that important. In my opinion, it's more important to make the API convenient. If I just want to combine hash codes of few values, I think HashCode.CombineHashCodes(value1, value2, value3) is simpler, shorter and easier to understand than new HashCode().Combine(value1).Combine(value2).Combine(value3).

The instance method API is still useful for more complicated cases, but I think the more common case should have the simpler static method API.

@svick, your point about other languages not supporting operators that well is legitimate. I yield, let's add Value then.

I don't think having only one way of doing things is that important.

It is important. If someone does it one way, and reads the code of a person who does it another way, then he/she will have to Google what the other way does.

If I just want to combine hash codes of few values, I think HashCode.CombineHashCodes(value1, value2, value3) is simpler, shorter and easier to understand than new HashCode().Combine(value1).Combine(value2).Combine(value3).

  • The problem with a static method is that, since there will be no params int[] overload, we will have to add overloads for each different arity, which is much less bang for buck. It's much nicer to have one method cover all use cases.
  • The second form will be easy to understand once you see it once or twice. In fact, you could argue it's more readable, since it's easier to chain vertically (and thus minimizes diffs when a field is added/removed):
public override int GetHashCode()
{
    return new HashCode()
        .Combine(_field1)
        .Combine(_field2)
        .Combine(_field3)
        .Combine(_field4);
}

[@svick] I don't think having only one way of doing things is that important.

I think minimizing the number of ways you can do the same thing is important because it avoids confusion. At the same time, our goal isn't to be 100% overlapping free if it helps to realize other goals, such as discoverability, convenience, performance or readability. In general, our goal is to minimize concepts, rather than APIs. For instance, multiple overloads are less problematic than having multiple different methods with disjoint terminology.

The reason I added the factory method is to make it clear how one gets an initial hash code. Creating the empty struct followed by Combine doesn't seem very intuitive. The logical thing would be to add a .ctor, but in order to avoid boxing it would have to be generic, which you can't do with a .ctor. A generic factory method is the next best thing.

A nice side effect is that it looks very similar to how immutable data structures look in the framework. And in API design we strongly favor consistency over almost anything else.

[@svick] If I just want to combine hash codes of few values, I think HashCode.CombineHashCodes(value1, value2, value3) is simpler, shorter and easier to understand than new HashCode().Combine(value1).Combine(value2).Combine(value3).

I agree with @jamesqo: what I like about the builder pattern is that it scales to an arbitrary number of arguments with minimal performance penalty (if any, depending on how good our inliner is).

[@jamesqo] I don't think the overload accepting a string/StringComparison belongs here; HashCode has nothing to do with strings

Fair point. I added it because it was referenced in @Eilon's code. From experience I'd say that strings are super common. On the other hand, I'm not sure that specifying a comparison is. Let's leave it off for now.

[@jamesqo] We have to move the long overload to a new type. HashCode will only be 32 bits wide; it can't fit a long.

That's a good point. Do we even need a long version at all? I've only left it in because it was mentioned above and I didn't really think about it.

Now that I am, it seems we should leave only 32-bit because that's what the .NET GetHashCode() is about. In that vein, I'm not even sure we should add the uint version. If you use hashing outside of that realm, I think it's OK to point people to the more general purpose hashing algorithms we have in System.Security.Cryptography.

```C#
public struct HashCode
{
    public static HashCode Create<T>(T obj);

    [Pure] public HashCode Combine(int hashCode);
    [Pure] public HashCode Combine<T>(T obj);

    public int Value { get; }

    public static implicit operator int(HashCode hashCode);
}
```

Now that I am, it seems we should leave only 32-bit because that's what the .NET GetHashCode() is about. In that vein, I'm not even sure we should add the uint version. If you use hashing outside of that realm, I think it's OK to point people to the more general purpose hashing algorithms we have in System.Security.Cryptography.

@terrajobst There are very different types of hashing algorithms, a real zoo. In fact, probably 70% are non-cryptographic by design, and probably way more than half of those are designed to deal with 64+ bits (common targets are 128/256). I bet the framework decided to use 32 bits (I haven't been there) because at the time x86 was still a huge consumer and hashes are used all over the place, so performance on lesser hardware was paramount.

Also, to be strict, most hash functions are really defined over the uint domain and not int, because the rules of shifting are different. In fact, if you check the code I posted before, the int is immediately converted into a uint because of that (and to use the ror/rol optimization). Case in point: if we want to be strict, the only hash type should be uint; in that light, the framework returning int can be viewed as an oversight.

Restricting this to int is no better than what we have today. If it were my call I would push the design team to look into how we could accommodate 128- and 256-bit variants and different hash functions (even if we would also put a don't-make-me-think alternative under your fingertips).

The problems caused by oversimplification are sometimes worse than the design issues introduced when forced to deal with complex stuff. Simplifying functionality to such a big degree, because developers are perceived as unable to deal with multiple options, can easily lead down the path of the current state of SIMD: most performance-conscious developers cannot use it, and everybody else won't use it either, because most are not dealing with performance-sensitive applications that have such fine throughput targets anyway.

The case of hashing is similar: the domains where you would use 32 bits are very restricted (most are already covered by the framework itself); for the rest, you are out of luck.

[image: chart of hash collision probability vs. number of elements]

Also, as soon as you have to deal with more than 75,000 elements you have a 50% chance of a collision, and that is bad in most scenarios (and that is assuming a well-designed hash function). That is why 64 bits and 128 bits are so widely used outside the boundaries of runtime structures.
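For reference, the 75,000 figure is consistent with the birthday approximation for an ideal 32-bit hash:

$$ p(n) \approx 1 - e^{-n^2 / (2 \cdot 2^{32})}, \qquad p(n) = 0.5 \implies n \approx \sqrt{2 \ln 2 \cdot 2^{32}} \approx 77{,}000 $$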

With a design stuck on int, we are only covering the problems caused by not having the Monday newspaper back in 2000 (so now everybody writes their poor hashing by themselves), but we won't advance the state of the art by even a step.

That's my 2 cents into the discussion.

@redknightlois, I think we understand the limitations of the int hashes. But I agree with @terrajobst: this feature should be about APIs to compute hashes for the purpose of returning them from Object.GetHashCode overrides. We might in addition have a separate library for more modern hashing, but I would say it should be a separate discussion, as it needs to include deciding what to do with Object.GetHashCode and all the existing hashing data structures.

Unless you think it's still beneficial to do hash combining in 128 bits and then convert to int so the result can be returned from GetHashCode.

@KrzysztofCwalina I do agree these are two different approaches. One is to fix an issue created in 2000; a different one is to tackle the general hashing problem. If we all agree this is a solution for the former, the discussion is over. However, for a design discussion for a "Future" milestone I have the sense that it will fall short, mostly because what we do here will impact the future discussion. Making mistakes here will have an impact.

@redknightlois, I would propose the following: let's design an API as if we did not have to worry about the future. Then, let's discuss which design choices we think would cause problems for the future APIs. Also, what we could do is add the c2000 APIs to corefx and in parallel experiment with the future APIs in corefxlab, which should uncover any issues associated with such additions, if we ever wanted to do them.

@redknightlois

Making mistakes here will have an impact.

I think if, in the future we want to support more advanced scenarios, then we can just do so in a separate type from HashCode. Decisions here should not really impact those cases.

I created a different issue to start tackling that.

@redknightlois :+1:. Btw, you responded before I could edit my comment, but I actually did try out your idea (above) of having the hash work with any type (int, long, decimal, etc.) and encapsulating the core hashing logic in a struct: https://github.com/jamesqo/HashApi (sample usage was here). But having two generic type parameters ended up being way too complex, and compiler type inference did not work when I tried to use the API. So yes, it's a good idea to move more advanced hashing into a separate issue for now.

@terrajobst The API seems almost ready, but there are 1 or 2 more things I'd like to change.

  • I initially didn't want the static factory method, since HashCode.Create(x) has the same effect as new HashCode().Combine(x). But I've changed my mind about that, since the latter means one extra hashing operation. Instead, why don't we rename Create to Combine? It seems kind of annoying to have to type one thing for the first field and another for the second field.
  • I think we should have HashCode implement IEquatable<HashCode> and implement some of the equality operators. Feel free to let me know if you have an objection.

(Hopefully) final proposal:

public struct HashCode : IEquatable<HashCode>
{
    public static HashCode Combine(int hash);
    public static HashCode Combine<T>(T obj);

    public HashCode Combine(int hash);
    public HashCode Combine<T>(T obj);

    public int Value { get; }

    public static implicit operator int(HashCode hashCode);

    public static bool operator ==(HashCode left, HashCode right);
    public static bool operator !=(HashCode left, HashCode right);

    public override bool Equals(object obj);
    public bool Equals(HashCode other);
    public override int GetHashCode();
}

// Usage:

public override int GetHashCode()
{
    return HashCode
        .Combine(_field1)
        .Combine(_field2)
        .Combine(_field3)
        .Combine(_field4);
}

@terrajobst said:

Fair point. I added it because it was referenced in @Eilon's code. From experience I'd say that strings are super common. On the other hand, I'm not sure that specifying a comparison is. Let's leave it off for now.

It's actually super important: creating hashes for strings often involves taking into account the purpose of that string, which involves both its culture and its case-sensitivity. The StringComparer isn't about comparisons per se, but rather about providing specific GetHashCode implementations that are culture/case-aware.

Without this API, you'd need to do something weird like:

HashCode.Combine(str1.ToLowerInvariant()).Combine(str2.ToLowerInvariant())

And that is chock-full of allocations, follows poor culture-sensitivity patterns, etc.
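To make the case-awareness concrete: a case-insensitive comparer must, by contract, give equal hash codes to strings that compare equal under it, and it does so without allocating lowercased copies.

``` C#
using System;

int a = StringComparer.OrdinalIgnoreCase.GetHashCode("admin");
int b = StringComparer.OrdinalIgnoreCase.GetHashCode("ADMIN");
Console.WriteLine(a == b); // True: equal under OrdinalIgnoreCase, so equal hashes
```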

@Eilon in such a case I would expect the code to explicitly call string.GetHashCode(StringComparison comparison), which is culture/case-aware, and pass the result as an int into Combine.

``` C#
HashCode.Combine(str1.GetHashCode(StringComparison.Ordinal)).Combine(...)
```

@Eilon, you could just use StringComparer.InvariantCultureIgnoreCase.GetHashCode.

Those are certainly better in terms of allocations, but those calls are not pretty to look at... We have uses all throughout ASP.NET where hashes need to include culture/case-aware strings.

Fair enough, combining all of what was said above, how about this shape then:

``` C#
namespace System.Collections.Generic
{
    public struct HashCode : IEquatable<HashCode>
    {
        public static HashCode Combine(int hash);
        public static HashCode Combine<T>(T obj);
        public static HashCode Combine(string text, StringComparison comparison);

        public HashCode Combine(int hash);
        public HashCode Combine<T>(T obj);
        public HashCode Combine(string text, StringComparison comparison);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public override bool Equals(object obj);
        public bool Equals(HashCode other);
        public override int GetHashCode();
    }
}

// Usage:

public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(_field4);
}
```

ship it! :-)

@terrajobst _Hold on--_ can't Combine(string, StringComparison) just be implemented as an extension method?

public static class HashCodeExtensions
{
    public static HashCode Combine(this HashCode hashCode, string text, StringComparison comparison)
    {
        switch (comparison)
        {
            case StringComparison.Ordinal:
                return hashCode.Combine(StringComparer.Ordinal.GetHashCode(text));
            case StringComparison.OrdinalIgnoreCase:
                ...
        }
    }
}

I would much, much prefer it to be an extension method rather than a part of the type signature. However, if you or @Eilon absolutely think this should be a builtin method, I won't block this proposal.

(edit: Also System.Numerics is probably a better namespace, unless we have hash-related types in Collections.Generic today that I'm not aware of.)

LGTM. I would go extension.

Yes, it could be an extension method, but what problem does it solve?

@terrajobst

Yes, it could be an extension method, but what problem does it solve?

I was suggesting it live in ASP.NET's code. If it's common for their use case, then that's fine, but that may not be true for other libraries/apps. If it turns out this is common enough later, we could always reevaluate and decide to add it in a separate proposal.

Mhhh, this is core anyway. Once defined, it will be part of the signature. Scrap the comment; it is OK as it is.

Using extension methods is useful for cases where:

  1. it's an existing type which we'd like to augment without having to ship an update to the type itself
  2. solve layering issues
  3. separate super common APIs from much less used APIs.

I don't think (1) or (2) apply here. (3) would only help if we would move the code to a different assembly than HashCode or if we move it to a different namespace. I'd argue that strings are common enough that it's not worth it. In fact, I'd even argue that they are so common that treating them as first class makes more sense than trying to artificially separate them on an extension type.

@terrajobst, to be clear I was suggesting ditch the string API altogether, and leave it to ASP.NET to write up their own extension method for strings.

I'd argue that strings are common enough that it's not worth it. In fact, I'd even argue that they are so common that treating them as first class makes more sense than trying to artificially separate them on an extension type.

Yes, but how common is it for someone to want to get the non-ordinal hash code of a string, which is the only scenario the existing Combine<T> overload does not take care of? (e.g. Someone who calls StringComparer.CurrentCulture.GetHashCode in their overrides?) I may be wrong, but I haven't seen many.

Sorry for the pushback on this; it's just that once an API is added there's no going back.

yes, but how common is it for someone to want to get the non-ordinal hash code of a string

I might be biased, but case invariance is quite popular. Sure, not many (if any) care about culture-specific hash codes, but hash codes that ignore casing I can totally see -- and that seems to be what @Eilon is after (which is StringComparison.OrdinalIgnoreCase).

Sorry for the pushback on this; it's just that once an API is added there's no going back.

No kidding 😈 Agreed, but even if the API isn't used as much it's useful and doesn't cause any harm.

@terrajobst Ok then, let's add it :+1: Last issue: I mentioned this above, but can we make the namespace Numerics rather than Collections.Generic? If we were to add more hashing-related types in the future like @redknightlois suggests, I think they would be something of a misnomer in Collections.

I'm lovin' it. 🍔

I don't think Hashing falls conceptually into Collections. What about System.Runtime?

I was going to suggest the same, or even System. It is not Numerics either.

@karelz, System.Runtime could work. @redknightlois System would be convenient, since chances are you've already imported that namespace. Dunno if that would be appropriate, though (again, if more hashing types are added).

We shouldn't put it in System.Runtime as this is for esoteric and quite specialized cases. I talked to @KrzysztofCwalina and we both think it's one of the two:

  • System
  • System.Collections.*

We both lean towards System.

If what we need is a rationale of why to go for System, I can try a justification: we built HashCode to help in implementations of object.GetHashCode(), so it sounds fitting that both would share a namespace.

@terrajobst I think System should be the namespace, then. Let's :shipit:

Updated the API spec in the description.

[@redknightlois] If what we need is a rationale of why to go for System, I can try a justification: we built HashCode to help in implementations of object.GetHashCode(), so it sounds fitting that both would share a namespace.

That was the rationale that @KrzysztofCwalina and I used, too. Sold!

@jamesqo

I assume you want to provide the PR with the implementation as well?

@terrajobst Yes, definitely. Thanks for taking the time to review this!

Yes, definitely.

Sweet. In that case I'll leave it assigned to you. That's good with you @karelz?

Thanks for taking the time to review this!

Thanks for taking the time to work with us on the API shape. It can be a painful process to go back and forth. We highly appreciate your patience!

And I'm looking forward to delete the ASP.NET Core implementation and use this instead 😄

``` C#
public static HashCode Combine(string text, StringComparison comparison);
public HashCode Combine(string text, StringComparison comparison);
```

Nit: The methods on String that take StringComparison (e.g. Equals, Compare, StartsWith, EndsWith, etc.) use comparisonType as the name of the parameter, not comparison. Should the parameter be named comparisonType here as well to be consistent?

@justinvp, that seems more like a naming flaw in String's methods; Type is redundant. I do not think we should make parameter names in new APIs more verbose just to "follow precedent" with old ones.

As another data point, xUnit chose to use comparisonType as well.

@justinvp You've convinced me. Now that I think about it intuitively, "case-insensitive" or "culture-dependent" is a 'type' of comparison. I'll change the name.

I'm ok with the shape of this, but regarding the StringComparison, a possible alternative:

Don't include:

``` C#
public static HashCode Combine(string text, StringComparison comparison);
public HashCode Combine(string text, StringComparison comparison);
```

Instead, add a method:

``` C#
public class StringComparer
{
    public static StringComparer FromComparison(StringComparison comparison);
    ...
}
```

Then instead of writing:

``` C#
public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(_field4, _comparison);
}
```

you write:

``` C#
public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(StringComparer.FromComparison(_comparison).GetHashCode(_field4));
}
```

Yes, it's a bit longer, but it solves the same problem without needing two specialized methods on HashCode (which we've just promoted to System), and you get a static helper method that can be used in other, unrelated situations. It also keeps it similar to how you'd use it if you already have a StringComparer (since we're not talking about comparer overloads):

``` C#
public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(_comparer.GetHashCode(_field4));
}
```
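As an aside, a minimal sketch of what such a FromComparison helper could look like; the mapping below is an assumption based on the existing StringComparer singletons:

``` C#
public static StringComparer FromComparison(StringComparison comparison)
{
    switch (comparison)
    {
        case StringComparison.CurrentCulture:             return StringComparer.CurrentCulture;
        case StringComparison.CurrentCultureIgnoreCase:   return StringComparer.CurrentCultureIgnoreCase;
        case StringComparison.InvariantCulture:           return StringComparer.InvariantCulture;
        case StringComparison.InvariantCultureIgnoreCase: return StringComparer.InvariantCultureIgnoreCase;
        case StringComparison.Ordinal:                    return StringComparer.Ordinal;
        case StringComparison.OrdinalIgnoreCase:          return StringComparer.OrdinalIgnoreCase;
        default: throw new ArgumentException("Unsupported comparison type", nameof(comparison));
    }
}
```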

@stephentoub, FromComparison sounds like a good idea. I actually proposed earlier in the thread to add a string.GetHashCode(StringComparison) API, which makes your example even simpler (assuming a non-null string):

public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(_field4.GetHashCode(_comparison));
}

@Eilon said it added too many calls, though.

(edit: made a proposal for your api.)

I also dislike adding 2 specialized methods on HashCode for string.
@Eilon you mentioned the pattern is used in ASP.NET Core itself. How much do you think external developers will use it?

@jamesqo thanks for driving the design! As @terrajobst said, we appreciate your help and patience. Fundamental small APIs can sometimes take a while to iterate on :).

Let's see where we land with this last piece of API feedback, then we can go ahead with the implementation.

Should there be a:

``` C#
public static HashCode Combine<T>(T obj, IEqualityComparer<T> cmp);
```

?

(Apologies if that was dismissed already and I'm missing it here).

@stephentoub said:

write:

``` C#
public override int GetHashCode()
{
    return HashCode.Combine(_field1)
                   .Combine(_field2)
                   .Combine(_field3)
                   .Combine(StringComparer.FromComparison(_comparison).GetHashCode(_field4));
}
```

Yes, it's a bit longer, but it solves the same problem without needing two specialized methods on HashCode (which we've just promoted to System), and you get a static helper method that can be used in other, unrelated situations. It also keeps it similar to how you'd use it if you already have a StringComparer (since we're not talking about comparer overloads):


Well, it's not just a bit longer, it's like waaay super longer, and has zero discoverability.

What is the resistance to adding this method? If it's useful, can be clearly implemented correctly, has no ambiguity in what it does, why not add it?

Having the additional static helper/conversion method is fine - though I'm not sure I'd use it - but why at the expense of convenience methods?

why at the expense of convenience methods?

Because it's not clear to me convenience methods are really needed here. I get that ASP.NET does it in various places. How many places? And in how many of those places is it actually a variable StringComparison you have rather than a known value? In which case you don't even need the helper I mentioned and could just do:

``` C#
.Combine(StringComparer.InvariantCulture.GetHashCode(_field4))
```

which in no way seems onerous to me or any more undiscoverable than knowing about StringComparison and doing:

``` C#
.Combine(_field4, StringComparison.InvariantCulture);
```

and is actually faster, since we don't have to branch inside Combine to do the exact same thing the dev could have written. Is the extra code that much of an inconvenience that it's worth adding specialized overloads for that one case? Why not overloads for StringComparer? Why not overloads for EqualityComparer? Why not overloads that take a Func<T, int>? At some point you draw the line and say "the value this overload provides just isn't worth it", because everything we add comes at a cost, whether that's the cost of maintenance, the cost of code size, or the cost of whatever else; and if the dev really needs this case, it's very little additional code for the dev to write when fewer specialized cases are provided. So I was suggesting maybe the right place to draw the line is before these overloads rather than after (but as I stated at the beginning of my previous response, "I'm ok with the shape of this", and was suggesting an alternative).

Here's the search I did: https://github.com/search?p=2&q=user%3Aaspnet+hashcodecombiner&type=Code&utf8=%E2%9C%93

Out of ~100 matches, even just from the first few pages, almost every use case involves strings, and several use different kinds of string comparisons:

  1. Ordinal: https://github.com/aspnet/Razor/blob/77ed9f22fc8894fbce796bb8a704d6cd03a3b226/src/Microsoft.AspNetCore.Razor.TagHelpers.Testing.Sources/TagHelperAttributeDescriptorComparer.cs#L46
  2. Ordinal + IgnoreCase: https://github.com/aspnet/Razor/blob/bdbb854bdbde260b3c70f565a93ebbb185a7c5a7/src/Microsoft.AspNetCore.Razor/Compilation/TagHelpers/TagHelperRequiredAttributeDescriptorComparer.cs#L49
  3. Ordinal: https://github.com/aspnet/Razor/blob/bdbb854bdbde260b3c70f565a93ebbb185a7c5a7/src/Microsoft.AspNetCore.Razor/Chunks/Generators/AttributeBlockChunkGenerator.cs#L58
  4. Ordinal: https://github.com/aspnet/Razor/blob/77ed9f22fc8894fbce796bb8a704d6cd03a3b226/src/Microsoft.AspNetCore.Razor.TagHelpers.Testing.Sources/TagHelperDesignTimeDescriptorComparer.cs#L41
  5. Ordinal: https://github.com/aspnet/Razor/blob/dbcb6901209859e471c9aa978912cf7d6c178668/src/Microsoft.AspNetCore.Razor.Evolution/Legacy/AttributeBlockChunkGenerator.cs#L56
  6. Ordinal: https://github.com/aspnet/Razor/blob/77ed9f22fc8894fbce796bb8a704d6cd03a3b226/src/Microsoft.AspNetCore.Razor.TagHelpers.Testing.Sources/CaseSensitiveTagHelperDescriptorComparer.cs#L62
  7. Ordinal + IgnoreCase: https://github.com/aspnet/dnx/blob/bebc991012fe633ecac69675b2e892f568b927a5/src/Microsoft.Dnx.Tooling/NuGet/Core/PackageSource/PackageSource.cs#L107
  8. Ordinal: https://github.com/aspnet/Razor/blob/bdbb854bdbde260b3c70f565a93ebbb185a7c5a7/src/Microsoft.AspNetCore.Razor/Tokenizer/Symbols/SymbolBase.cs#L52
  9. Ordinal: https://github.com/aspnet/Razor/blob/77ed9f22fc8894fbce796bb8a704d6cd03a3b226/src/Microsoft.AspNetCore.Razor.TagHelpers.Testing.Sources/CaseSensitiveTagHelperAttributeComparer.cs#L39
  10. Ordinal: https://github.com/aspnet/Razor/blob/77ed9f22fc8894fbce796bb8a704d6cd03a3b226/src/Microsoft.AspNetCore.Razor.TagHelpers.Testing.Sources/TagHelperAttributeDesignTimeDescriptorComparer.cs#L42

(And dozens of others.)

So it seems that certainly within the ASP.NET Core codebase this is an extremely common pattern. Of course I can't speak to any other system.

Out of ~100 matches

Every one of the 10 you listed (I didn't look at the rest of the search) explicitly specifies the string comparison, rather than pulling it from a variable, so aren't we just talking about the difference between, for example:

``` C#
.Combine(Name, StringComparison.OrdinalIgnoreCase)
```

and:

``` C#
.Combine(StringComparer.OrdinalIgnoreCase.GetHashCode(Name))
```

? That is not "waaay super longer" and is more efficient, unless I'm missing something.

Anyway, as I've stated, I'm simply suggesting we really consider whether these overloads are necessary. If most folks believe they are, and we're not just considering our own ASP.NET codebase, fine.

Related, what is the behavior we're planning for null inputs? What about int==0? I can start to see more benefit to the string overload if we allow null to be passed, as I believe StringComparer.GetHashCode typically throws for a null input, so if this really is common, it does start to become more cumbersome if the caller has to special case nulls. But that then also begs the question of what the behavior will be when null is provided. Is a 0 mixed in to the hash code as with any other value? Is it treated as a nop and the hashcode left alone?

I think the best general approach to null is to mix in a zero. For a single null element added having it as a nop would be better, but if someone is feeding in a sequence then it becomes more beneficial to have 10 nulls hash differently to 20.
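To illustrate (a minimal sketch only, assuming the generic Combine overload from the proposal; not a final implementation):

```c#
public HashCode Combine<T>(T item)
{
    // A null item contributes 0, but the running state still gets mixed,
    // so ten nulls and twenty nulls produce different hash codes.
    return Combine(item?.GetHashCode() ?? 0);
}
```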

Indeed, my vote comes from the perspective of ASP.NET Core's codebase, where having an overload that is string-aware would be very helpful. The things about line-length were not really my main concern, but rather about discoverability.

If a string-aware overload were not available in the system, we'd just add an internal extension method in ASP.NET Core and use that.

If a string-aware overload were not available in the system, we'd just add an internal extension method in ASP.NET Core and use that.

I think that would be a great solution for now, until we see more evidence that such API is needed in general, also outside of ASP.NET Core code base.

I have to say that I don't see the value in removing the string overload. It doesn't reduce any complexity, it doesn't make the code more efficient, and it doesn't prevent us from improving other areas, such as providing a method that returns a StringComparer from a StringComparison. Syntactic sugar _does_ matter, because .NET has always been about making the common case easy. We also want to guide developers to do the right thing and let them fall into the pit of success.

We need to appreciate that strings are special and incredibly common. By adding an overload that specializes them we achieve two things:

  1. We make scenarios like @Eilon's much easier.
  2. We make it discoverable that considering the comparison for strings is important, especially casing.

We also need to consider that common boilerplate helpers like the extension method @Eilon mentioned above aren't a good thing; they're a bad thing. They result in wasted hours of copying & pasting helper methods, and will likely result in unnecessary code bloat and bugs when they're not done properly.

However, if the primary concern is about special casing string, then how about this:

``` C#
public struct HashCode : IEquatable<HashCode>
{
    public HashCode Combine<T>(T obj, IEqualityComparer<T> comparer);
}

// Usage
return HashCode.Combine(_numberField)
.Combine(_stringField, StringComparer.OrdinalIgnoreCase);
```

@terrajobst, your compromise is a clever one. I like how you no longer have to call GetHashCode explicitly or nest an extra set of parentheses with a custom comparer.

(edit: I guess I should really credit it to @JonHanna since he mentioned it earlier in the thread? 😄 )

@JonHanna Yes, we're also going to hash null inputs as a 0.

Sorry for interrupting the conversation here. But, where should I put the new type? @mellinoe @ericstj @weshaggard, do you suggest I make a new assembly/package for this type like System.HashCode, or should I add it to an existing assembly like System.Runtime.Extensions? Thanks.

We've recently refactored the assembly layout in .NET Core quite a bit; I suggest putting it where the concrete comparers live, which seems to indicate System.Runtime.Extensions.

@weshaggard?

@terrajobst Regarding the proposal itself, I just found out that we can't name both the static & instance overloads Combine, unfortunately. 😢

The following results in a compiler error because instance and static methods cannot have the same names:

using System;
using System.Collections.Generic;

public struct HashCode
{
    public void Combine(int i)
    {
    }

    public static void Combine(int i)
    {
    }
}

Now we have 2 options:

  • Rename the static overloads to something different like Create, Seed, etc.
  • Move the static overloads to another static class:
public static class Hash
{
    public static HashCode Combine(int hash);
}

public struct HashCode
{
    public HashCode Combine(int hash);
}

// Usage:
return Hash.Combine(_field1)
           .Combine(_field2)
           .Combine(_field3);

I'm partial to the second. It's unfortunate that we have to work around this issue, but... thoughts?

Separating the logic into 2 types sounds weird to me - to use HashCode you have to make the connection and start with Hash class instead.

I'd rather add a Create method (or Seed, or Init).
I would also add a no-args overload: HashCode.Create().Combine(_field1).Combine(_field2).

@karelz, I do not think we should add a factory method if it can't have the same name. We should just offer the parameterless constructor, new, since it is more natural. Besides, we can't prevent people from writing new HashCode().Combine since it is a struct.

public override int GetHashCode()
{
    return new HashCode()
        .Combine(_field1)
        ...
}

This does an extra combine with 0 and _field1's hash code, instead of initializing directly from the hash code. However, a side effect of the current hash we're using is that when 0 is passed in as the first parameter, it is rotated to zero and added to zero; and when 0 is xored with the first hash code, it just produces the first hash code. So if the JIT is any good at constant folding (and I believe it optimizes this xor away), this should in effect be equivalent to direct initialization.
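For reference, a sketch of the HashHelpers-style combine being described (assuming the rotate-left-by-5 variant discussed in this thread, not necessarily the exact shipped code):

```c#
static int CombineHashCodes(int h1, int h2)
{
    // Rotate h1 left by 5 bits, add h1, then xor in h2.
    uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
    return unchecked(((int)rol5 + h1) ^ h2);
}

// CombineHashCodes(0, x) == x: rotating 0 yields 0, 0 + 0 == 0, and 0 ^ x == x.
```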

Proposed API (updated spec):

namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
        public HashCode Combine(int hash);
        public HashCode Combine<T>(T obj);
        public HashCode Combine<T>(T obj, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public override bool Equals(object obj);
        public bool Equals(HashCode other);
        public override int GetHashCode();
    }
}

@redknightlois @JonHanna @stephentoub @Eilon, do you have an opinion on a factory method vs. using the default constructor? I found out the compiler does not allow a static Combine overload since that conflicts with the instance methods, so we have the option of either

HashCode.Create(field1).Combine(field2) // ...

// or, using default constructor

new HashCode().Combine(field1).Combine(field2) // ...

The advantage of the first is that it's a little terser. The advantage of the second is that it will have consistent naming so you don't have to write something different for the first field.

Another possibility is two different types, one with the Combine factory, one with the Combine instance (or the second as an extension on the first type).

I'm not sure which I'd prefer TBH.

@JonHanna, your second idea with the instance overloads being extension methods sounds great. That said, hc.Combine(obj) in that case tries to pick up on the static overload: TryRoslyn.

I proposed having a static class as the entrypoint a few comments above, which reminds me... @karelz, you said

Separating the logic into 2 types sounds weird to me - to use HashCode you have to make the connection and start with Hash class instead.

What connection would people have to make? Wouldn't we introduce them to Hash first, and then from there they can make their way to HashCode? I don't think adding a new static class would be a problem.

Separating the logic into 2 types sounds weird to me - to use HashCode you have to make the connection and start with Hash class instead.

We could keep the top-level type HashCode and just nest the struct. This would allow the desired usage while keeping the "entry point" of the API to one top-level type, e.g.:

``` c#
namespace System
{
    public static class HashCode
    {
        public static HashCodeValue Combine(int hash);
        public static HashCodeValue Combine<T>(T obj);
        public static HashCodeValue Combine<T>(T obj, IEqualityComparer<T> comparer);

        public struct HashCodeValue : IEquatable<HashCodeValue>
        {
            public HashCodeValue Combine(int hash);
            public HashCodeValue Combine<T>(T obj);
            public HashCodeValue Combine<T>(T obj, IEqualityComparer<T> comparer);

            public int Value { get; }

            public static implicit operator int(HashCodeValue hashCode);

            public static bool operator ==(HashCodeValue left, HashCodeValue right);
            public static bool operator !=(HashCodeValue left, HashCodeValue right);

            public bool Equals(HashCodeValue other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}
```

Edit: Although, we probably need a better name than HashCodeValue for the nested type if we go down this path, as HashCodeValue.Value is a little redundant (not that Value would be used very often). Maybe we don't even need a Value property -- you can get the value via GetHashCode() if you don't want to cast to int.

@justinvp What is the problem with having two separate types in the first place, though? This system seems to work fine for LinkedList<T> and LinkedListNode<T>, for example.

What is the problem with having two separate types in the first place, though?

There are two concerns with two top-level types:

  1. Which type is the "entry point" for the API? If the names are Hash and HashCode, which one do you start with? It's not clear from those names. With LinkedList<T> and LinkedListNode<T> it's pretty clear which one is the main entry point, LinkedList<T>, and which is a helper.
  2. Polluting the System namespace. It's not as much of a concern as (1), but something to keep in mind as we consider exposing new functionality in the System namespace.

Nesting helps mitigate these concerns.

@justinvp

Which type is the "entry point" for the API? If the names are Hash and HashCode, which one do you start with? It's not clear from those names. With LinkedList<T> and LinkedListNode<T> it's pretty clear which one is the main entry point, LinkedList<T>, and which is a helper.

OK, fair enough point. What if we named the types Hash and HashValue, without nesting? Would that convey enough of a subordinate relationship between the two types?

If we do so, then the factory method becomes even terser: Hash.Combine(field1).Combine(field2). Plus, using the struct type by itself is still practical. For example, someone might want to collect a list of hashes, and to communicate this to the reader a List<HashValue> is used instead of a List<int>. This might not work as well if we made the type nested: List<HashCode.HashCodeValue> (even List<Hash.Value> is kinda confusing at first glance).

Polluting the System namespace. It's not as much of a concern as (1), but something to keep in mind as we consider exposing new functionality in the System namespace.

I agree, but I also think it's important we follow convention and don't sacrifice ease of use. For example, the only BCL API I can think of where we have nested types (immutable collections don't count, they're not strictly part of the framework) is List<T>.Enumerator, where we actively want to hide the nested type because it's intended for compiler use. We don't want to do that in this case.

Maybe we don't even need a Value property -- you can get the Value via GetHashCode() if you don't want to cast to int.

I thought about that earlier. But then how is the user going to know that the type overrides GetHashCode, or that it has an implicit operator?

Proposed API

public static class Hash
{
    public static HashValue Combine(int hash);
    public static HashValue Combine<T>(T obj);
    public static HashValue Combine<T>(T obj, IEqualityComparer<T> comparer);
}

public struct HashValue : IEquatable<HashValue>
{
    public HashValue Combine(int hash);
    public HashValue Combine<T>(T obj);
    public HashValue Combine<T>(T obj, IEqualityComparer<T> comparer);

    public int Value { get; }

    public static implicit operator int(HashValue hashValue);

    public static bool operator ==(HashValue left, HashValue right);
    public static bool operator !=(HashValue left, HashValue right);

    public override bool Equals(object obj);
    public bool Equals(HashValue other);
    public override int GetHashCode();
}

What if we named the types Hash and HashValue, not nesting types?

Hash just seems way too general of a name to me. I think we need to have HashCode in the name of the entry point API because its intended purpose is to help implement GetHashCode(), not GetHash().

someone might want to collect a list of hashes, and to communicate this to the reader a List<HashValue> is used instead of a List<int>. This might not work as well if we made the type nested: List<HashCode.HashCodeValue> (even List<Hash.Value> is kinda confusing at first glance).

This seems like an unlikely use case -- not sure we should optimize the design for it.

the only BCL APIs I can think of where we have nested types

TimeZoneInfo.AdjustmentRule and TimeZoneInfo.TransitionTime are examples in the BCL that were intentionally added as nested types.

@justinvp

I think we need to have HashCode in the name of the entry point API because its intended purpose is to help implement GetHashCode(), not GetHash().

👍 I see.

I've thought about things a bit more. It seems reasonable to have a nested struct; as you mentioned, most people will never see the actual type. Just one thing: I think the type should be called Seed, rather than HashCodeValue. The context of its name is already implied by the containing class.

Proposed API

namespace System
{
    public static class HashCode
    {
        public static Seed Combine(int hash);
        public static Seed Combine<T>(T obj);
        public static Seed Combine<T>(T obj, IEqualityComparer<T> comparer);

        public struct Seed : IEquatable<Seed>
        {
            public Seed Combine(int hash);
            public Seed Combine<T>(T obj);
            public Seed Combine<T>(T obj, IEqualityComparer<T> comparer);

            public int Value { get; }

            public static implicit operator int(Seed seed);

            public static bool operator ==(Seed left, Seed right);
            public static bool operator !=(Seed left, Seed right);

            public bool Equals(Seed other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}

@jamesqo Any objection or implementation issue with having a public readonly int Value field instead? The problem with Seed is that it is not technically a seed after the first combine.

Also agreed with @justinvp: Hash should be reserved for dealing with hashes. This type was introduced to simplify dealing with HashCode instead.

@redknightlois To be clear, we were talking about the struct name, not the property name.

        public struct Seed : IEquatable<Seed>
        {
            public Seed Combine(int hash);
            public Seed Combine<T>(T obj);
            public Seed Combine<T>(T obj, IEqualityComparer<T> comparer);

            public int Value { get; }

            public static implicit operator int(Seed seed);

            public static bool operator ==(Seed left, Seed right);
            public static bool operator !=(Seed left, Seed right);

            public bool Equals(Seed other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }

Usage:
```c#
int hashCode = HashCode.Combine(field1).Combine(name, StringComparison.OrdinalIgnoreCase).Value;
int hashCode = (int)HashCode.Combine(field1).Combine(field2);
```

Problem with Seed is that it is not technically a seed after the first combine.

It is a seed for the next combine, which produces a new seed.

Any objection or implementation issue with having public readonly int Value instead?

Why? int Value { get; } is more idiomatic and can easily be inlined.

It is a seed for the next combine, which produces a new seed.

Wouldn't that be a seedling? ;)

@jamesqo In my experience, when surrounded by complex code, properties tend to generate worse code than fields (among other things, calls that don't get inlined). Also, a readonly field of a single int on a struct translates straight into a register; and eventually, when the JIT uses readonly for optimization (I couldn't find any use of it yet regarding code-gen), there are optimizations that could be enabled because it can reason that the field is readonly. From the point of view of usage, it is really no different from a single getter.

EDIT: Plus it also pushes the idea that those structs are really immutable.

In my experience when surrounded with complex code properties tend to generate worse code than fields (among those, non inlines).

If you find a single non-Debug build where an auto-implemented property is not always inlined, then that is a JIT issue and should definitely be fixed.

Also, a readonly field of a single int on a struct translates straight into a register

there are optimizations that could be enabled because it can reason that the field is readonly.

The backing field of this struct will be readonly; the API will be an accessor.

I don't think using a property will hurt performance in any way here.

@jamesqo I will keep that in mind for when I find those. For performance-sensitive code I just don't use properties anymore because of that (muscle memory at this point).

You may consider calling the nested struct "State" rather than "Seed"?

@ellismg Sure, thanks for the suggestion. I was struggling to come up with a good name for the inner struct.

@karelz I think this API is finally good-to-go; this time, I checked to make sure everything compiles. Unless anyone has any objections, I will start working on the implementation for this.

@jamesqo @JonHanna why do we need the Combine<T>(T obj) instead of Combine(object o)?

why do we need the Combine<T>(T obj) instead of Combine(object o)?

The latter would allocate if the instance was a struct.
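To make the allocation concrete (a sketch; hc stands for some HashCode value):

```c#
DateTime field = DateTime.UtcNow;  // a value type
hc = hc.Combine(field);            // Combine<DateTime>(T): no allocation
hc = hc.Combine((object)field);    // a Combine(object) overload would box the DateTime on every call
```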

duh, thanks for clarification.

We don't like the nested type because it seems to complicate the design. The root problem was that we can't name the statics and the non-statics the same. We have two options: get rid of the statics or rename them. We think that renaming to Create makes the most sense, as it creates fairly readable code compared with using the default constructor.

Unless there is some strong opposition, that's the design we've settled on:

```C#
namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
        public static HashCode Create(int hashCode);
        public static HashCode Create<T>(T obj);
        public static HashCode Create<T>(T obj, IEqualityComparer<T> comparer);

        public HashCode Combine(int hashCode);
        public HashCode Combine<T>(T obj);
        public HashCode Combine<T>(T obj, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public bool Equals(HashCode other);
        public override bool Equals(object obj);
        public override int GetHashCode();
    }
}
```

Let's wait a couple of days for additional feedback to find out if there is strong opposition to the approved proposal. Then we can make it 'up for grabs'.

Why does it complicate the design? I could understand how it would be bad if we actually had to use the HashCode.State in code (e.g. to define the type of a variable) but do we expect that is often the case? Most times I will end up either returning the Value outright or converting to an int and storing that.

I think the combination of Create and Combine is worse.

Please see https://github.com/dotnet/corefx/issues/8034#issuecomment-262661653

@terrajobst

We think that renaming to Create makes the most sense as it creates fairly readable code, compared with using the default constructor.

Unless there is some strong opposition, that's the design we've settled on:

I hear you, but I had a last-minute thought while I was working on the implementation... could we simply add a static Zero / Empty property to HashCode, and then have people call Combine from there? That would free us from having to have separate Combine / Create methods.

namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
        public static HashCode Empty { get; }

        public HashCode Combine(int hashCode);
        public HashCode Combine<T>(T obj);
        public HashCode Combine<T>(T obj, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public bool Equals(HashCode other);
        public override bool Equals(object obj);
        public override int GetHashCode();
    }
}

int GetHashCode()
{
    return HashCode.Empty
        .Combine(_1)
        .Combine(_2);
}

Anyone else think this is a good idea? (I'll submit a PR in the meantime, and if people think so I'll change it in the PR.)

@jamesqo, I like the Empty/Zero idea.

I would be fine with that (no strong preference between Empty vs. Create factory) ... @weshaggard @bartonjs @stephentoub @terrajobst what do you guys think?

I personally think Create() is better; but I like HashCode.Empty better than new HashCode().

Since it allows for a version which has no operator-new, and it doesn't preclude deciding later that we really want Create as a bootstrapper... ::shrug::.

That's the full extent of my pushback (aka not very much).

FWIW I'd vote for Create rather than Empty/Zero. I'd rather start out with an actual value than hang everything off Empty/Zero. It just feels/looks weird.

It also discourages people seeding with zero, which tends to be a poor seed.

I prefer Create rather than Empty. It jibes with how I think about it: I want to create a hash code and mix in additional values. I'd be fine with the nested approach, too.

While I was going to say that calling it Empty was not a good idea (and that has been said already), on third thought I still think it is not a bad solution. How about something like Builder? While it would still be possible to use zero, the word kind of discourages you from using it right away.

@JonHanna just to clarify: You meant it as vote for Create, right?

And on a fourth thought, how about With instead of Create?

HashCode.With(a).Combine(b). Combine(c)

Example usage based on latest discussion (with Create possibly replaced w/ an alternative name):

```c#
public override int GetHashCode() =>
    HashCode.Create(_field1).Combine(_field2).Combine(_field3);
```

We went down the path of this chaining approach, but didn't reconsider earlier proposals when the static & instance `Combine` methods didn't pan out...

Are we sure we don't want something like the existing `Path.Combine` pattern, that was proposed previously, with a handful of generic `Combine` overloads? e.g.:

```c#
public override int GetHashCode() =>
    HashCode.Combine(_field1, _field2, _field3);
```

@justinvp That would lead to inconsistent code and, I think, more jitting because of more generic instantiations. We can always revisit this in another issue if it turns out to be desirable.

For what it's worth, I prefer the originally-proposed version, at least in usage (not sure about the comments regarding code size, jitting, etc.). It seems like overkill to have an extra structure and 10+ different members for something that could be expressed as one method with a few overloads of different arity. I'm also not a fan of fluent-style API's in general, so perhaps that is coloring my opinion.

I wasn't going to mention this because it's a little unusual and I'm still not sure how I feel about it, but here's another idea, just to make sure all alternatives have been considered...

What if we did something along the lines of ASP.NET Core's mutable HashCodeCombiner "builder", with similar Add methods, but also included support for the collection initializer syntax?

Usage:

```c#
public override int GetHashCode() =>
    new HashCode { _field1, _field2, _field3 };
```

With a surface area something like:

```c#
namespace System
{
    public struct HashCode : IEquatable<HashCode>, IEnumerable
    {
        public void Add(int hashCode);
        public void Add<T>(T obj);
        public void Add<T>(T obj, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public bool Equals(HashCode other);
        public override bool Equals(object obj);
        public override int GetHashCode();

        IEnumerator IEnumerable.GetEnumerator();
    }
}
```

It'd have to implement IEnumerable at a minimum along with at least one Add method to enable the collection initializer syntax. IEnumerable could be implemented explicitly to hide it from intellisense and GetEnumerator could either throw NotSupportedException or return the hash code value as a single combined item in the enumerable, if anyone happened to use it (which would be rare).
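For reference, the minimum shape C# requires for collection-initializer support looks something like this (a sketch mirroring the surface area above, not a final design):

```c#
using System;
using System.Collections;

public struct HashCode : IEnumerable
{
    private int _value;

    // Any accessible Add method enables "new HashCode { a, b, c }".
    public void Add(int hashCode) { /* mix hashCode into _value */ }

    // Required for collection initializers; the explicit implementation
    // keeps it out of IntelliSense.
    IEnumerator IEnumerable.GetEnumerator() { throw new NotSupportedException(); }
}
```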

@justinvp, you have an interesting idea. However, I respectfully disagree; I think HashCode should be kept immutable to avoid gotchas with mutable structs. Also, having to implement IEnumerable for this seems kinda artificial/flaky; if someone has a using System.Linq directive in the file, then Cast<> and OfType<> will show up as extension methods if they put a dot next to a HashCode. I think we should stick closer to the current proposal.

@jamesqo, I agree -- hence my hesitation to even mention it. The only thing I like about it is the usage can be cleaner than chaining, but that itself is another drawback as it's not clear that collection initializers can even be used without seeing sample usage.

@MadsTorgersen, @jaredpar, why does the collection initializer require an implementation of IEnumerable? See @justinvp's third comment above.

@jamesqo, I agree it's better to keep this immutable (and not IEnumerable).

@mellinoe I think that would make the simple case slightly simpler, but it would also make anything beyond that more complicated (and less clear about what is the right thing to do).

That includes:

  1. more items than you have overloads for
  2. conditions
  3. loops
  4. using comparer

Consider the code from ASP.NET posted before on this topic (updated to the current proposal):

```c#
var hashCode = HashCode
    .Create(IsMainPage)
    .Combine(ViewName, StringComparer.Ordinal)
    .Combine(ControllerName, StringComparer.Ordinal)
    .Combine(AreaName, StringComparer.Ordinal);

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCode = hashCode
            .Combine(item.Key, StringComparer.Ordinal)
            .Combine(item.Value, StringComparer.Ordinal);
    }
}

return hashCode;
```

How would this look with the original `Hash.CombineHashCodes`? I think it would be:

```c#
var hashCode = Hash.CombineHashCodes(
    IsMainPage,
    StringComparer.Ordinal.GetHashCode(ViewName),
    StringComparer.Ordinal.GetHashCode(ControllerName),
    StringComparer.Ordinal.GetHashCode(AreaName));

if (ViewLocationExpanderValues != null)
{
    foreach (var item in ViewLocationExpanderValues)
    {
        hashCode = Hash.CombineHashCodes(
            hashCode,
            StringComparer.Ordinal.GetHashCode(item.Key),
            StringComparer.Ordinal.GetHashCode(item.Value));
    }
}

return hashCode;
```

Even if you ignore calling GetHashCode() for custom comparers, I find having to pass the previous value of hashCode as the first parameter not straightforward.

@KrzysztofCwalina According to @ericlippert's note in The C# Programming Language1, it's because collection initializers are (unsurprisingly) meant to be syntax sugar for collection creation, not for arithmetic (which was the other common use of a method named Add).

1 Due to how Google Books works, that link may not work for everyone.

@KrzysztofCwalina, and note, it requires non-generic IEnumerable, not IEnumerable<T>.

@svick, minor nit in your first example above: the first call to .Combine would be .Create with the current proposal. Unless we use the nested approach.

@svick

it would also make anything beyond that more complicated (and less clear about what is the right thing to do)

I dunno, the second example is barely different from the first one overall, and it's not more complex IMO. With the second/original approach, you just pass in a bunch of hash codes (I think the first parameter actually should be IsMainPage.GetHashCode()), so it seems straightforward to me. But it seems like I'm in the minority here, so I won't push for the original approach. I don't have a strong opinion; both of those examples look reasonable enough to me.

@justinvp Thanks, updated. (I went with the first proposal in the first post, and didn't realize it's out of date, someone should probably update it.)

@mellinoe the problem is actually that the second can generate subtle bugs. This is actual code from one of our projects.

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public int GetHashCode(PageFromScratchBuffer obj)
        {
            int v = Hashing.Combine(obj.NumberOfPages, obj.ScratchFileNumber);
            int w = Hashing.Combine(obj.Size.GetHashCode(), obj.PositionInScratchBuffer.GetHashCode());
            return Hashing.Combine(v, w);            
        }

We live with it, but we are dealing with very low-level stuff every day, so we're not the average developer, that's for sure. However, it is not the same to combine v with w as to combine w with v, and the same goes for the combines within v and w. Hash combinations are not commutative, so chaining one after another can actually get rid of a whole class of errors at the API level.
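A quick sketch of that order-sensitivity, using the rotl-xor combine discussed earlier in the thread (assumed implementation):

```c#
static int Combine(int h1, int h2)
{
    uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
    return unchecked(((int)rol5 + h1) ^ h2);
}

// Combine(3, 5) == 102, but Combine(5, 3) == 166: swapping v and w
// (or reordering the inner combines) silently changes the result.
```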

I went with the first proposal in the first post, and didn't realize it's out of date, someone should probably update it.

Done.
BTW: This proposal is very hard to keep track of, especially the votes ... so many variations (which I guess is good ;-))

@karelz If we add Create APIs then I think we can still add Empty. It doesn't have to be one or the other, as @bartonjs said. Proposed

namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
        public HashCode();

        public static HashCode Empty { get; }

        public static HashCode Create(int hashCode);
        public static HashCode Create<T>(T value);
        public static HashCode Create<T>(T value, IEqualityComparer<T> comparer);

        public HashCode Combine(int hashCode);
        public HashCode Combine<T>(T value);
        public HashCode Combine<T>(T value, IEqualityComparer<T> comparer);

        public int Value { get; }

        public static implicit operator int(HashCode hashCode);

        public static bool operator ==(HashCode left, HashCode right);
        public static bool operator !=(HashCode left, HashCode right);

        public bool Equals(HashCode other);
        public override bool Equals(object obj);
        public override int GetHashCode();
        public override string ToString();
    }
}

@JonHanna

It also discourages people seeding with zero, which tends to be a poor seed.

The hashing algorithm we're choosing will be the same one used in HashHelpers today, which has the effect that hash(0, x) == x. HashCode.Empty.Combine(x) will produce the exact same results as HashCode.Create(x), so there is objectively no difference.

@jamesqo you forgot to include the additional Zero in your last proposal. If that was an omission, can you update it? We can then ask folks to vote for your latest proposal. Looks like the other alternatives (see the top post which I updated) don't get that much following ...

@karelz Thanks for spotting, fixed.

@KrzysztofCwalina It's to check that you mean "Add" in the sense of adding to a collection, not in some other sense. I don't know if I like this restriction, but that's what we decided at the time.

public static HashCode Create(int hash);
public HashCode Combine(int hash);

Should the parameter be named hashCode instead of hash since the passed-in value is going to be a hash code likely obtained from calling GetHashCode()?

Empty/Zero

If we do end up keeping this, another name to consider is Default.

@justinvp

Should the parameter be named hashCode instead of hash since the passed-in value is going to be a hash code likely obtained from calling GetHashCode()?

I wanted to name the int parameters hash and the HashCode parameters hashCode. On second thought however, I do believe that hashCode would be better because, as you mentioned, hash is kinda vague. I'll update the API.

If we do end up keeping this, another name to consider is Default.

When I hear Default I think "the regular way to do something when you don't know which option to choose," not "the default value of a struct." e.g. Something like Encoding.Default has a completely different connotation.

The hashing algorithm we're choosing will be the same one used in HashHelpers today, which has the effect that hash(0, x) == x. HashCode.Empty.Combine(x) will produce the exact same results as HashCode.Create(x), so there is objectively no difference.

As someone who doesn't know much about the internals of this, I really like the simplicity of HashCode.Create(x).Combine(...). Create is very obvious, because it's used in many other places.

If Empty / Zero / Default doesn't provide any algorithmic usage, it shouldn't be there IMO.

PS: very interesting thread!! Good job! 👍

@cwe1ss

If Empty / Zero / Default doesn't provide any algorithmic usage, it shouldn't be there IMO.

Having an Empty property does provide algorithmic usage. It represents a "starting value" from which you can combine hashes. For example, if you want to combine an array of hashes strictly using Create, it's pretty painful:

int CombineRange(int[] hashes)
{
    if (hashes.Length == 0)
    {
        return 0;
    }

    var result = HashCode.Create(hashes[0]);

    for (int i = 1; i < hashes.Length; i++)
    {
        result = result.Combine(hashes[i]);
    }

    return result;
}

If you have Empty, it becomes much more natural:

int CombineRange(int[] hashes)
{
    var result = HashCode.Empty;

    for (int i = 0; i < hashes.Length; i++)
    {
        result = result.Combine(hashes[i]);
    }

    return result;
}

// or

int CombineRange(int[] hashes)
{
    return hashes.Aggregate(HashCode.Empty, (hc, next) => hc.Combine(next));
}

@terrajobst This type is pretty analogous to ImmutableArray<T> to me. An empty array is not very useful by itself, but is very useful as a "starting point" for other operations, and that's why we have an Empty property for it. I think it would make sense to have one for HashCode too, even though we're keeping Create.

@jamesqo I have noticed that you silently/by accident changed arg name obj to value in your proposal https://github.com/dotnet/corefx/issues/8034#issuecomment-262661653. I switched it back to obj which IMO captures better what you get. Name value is more associated with the "int" hash value itself in this context.
I am open to further discussion on the arg name if needed, but let's change it on purpose and keep track of the diff against last approved proposal.

I have updated the proposal on the top. I also called out diff against last approved version of the proposal.

The hashing algorithm we're choosing will be the same one used in HashHelpers today

Why is it a good algorithm to choose as the one that should be used everywhere? What assumptions is it going to make about the hashcodes being combined? If it is used everywhere, is it going to open new avenues for DDoS attacks? (Note that we have been burnt by this for string hashing in the past.)

What if we did something along the lines of ASP.NET Core's mutable HashCodeCombiner "builder"

I think this is the right pattern to use. A good universal hashcode combiner generally needs more state than what fits into the hashcode itself, but then the fluent pattern breaks down, because passing larger structs around is a performance problem.

Why is it a good algorithm to choose as the one that should be used everywhere?

It shouldn't be used everywhere. See my comment at https://github.com/dotnet/corefx/issues/8034#issuecomment-260790829; it is mainly targeted at people who don't know much about hashing. People who know what they are doing can evaluate it to see if it fits their needs.

What assumptions is it going to make about the hashcodes being combined? If it is used everywhere, is it going to open new avenues for DDoS attacks?

One problem with the current hash we have is that hash(0, x) == x. So if a series of nulls or zeroes is fed to the hash then it will remain 0. See code. This is not to say that nulls don't count, but none of the initial nulls do. I am considering using something more robust (but slightly more expensive) like here, which adds a magic constant to avoid mapping zero to zero.
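The "magic constant" approach referred to looks roughly like Boost's hash_combine (a sketch for illustration, not the proposed implementation):

```c#
static int CombineRobust(int seed, int hash)
{
    unchecked
    {
        // 0x9E3779B9 (derived from the golden ratio) keeps a zero seed from
        // passing the incoming hash through unchanged, so hash(0, x) != x.
        uint s = (uint)seed;
        s ^= (uint)hash + 0x9E3779B9 + (s << 6) + (s >> 2);
        return (int)s;
    }
}
```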

I think this is the right pattern to use. A good universal hashcode combiner generally needs more state than what fits into the hashcode itself, but then the fluent pattern breaks down, because passing larger structs around is a performance problem.

I don't think there should be a universal combiner with a big struct size that attempts to fit every use case. Instead I was envisioning separate hash code types that are all int-sized (FnvHashCode, etc.) and all have their own Combine methods. Besides, these "builder" types are going to be kept in the same method anyway, not passed around.

I don't think there should be a universal combiner with a big struct size that attempts to fit every use case.

Is ASP.NET Core going to be able to replace their own hashcode combiner - that has 64-bit of state currently - with this one?

I was envisioning separate hash code types that are all int-sized (FnvHashCode, etc.)

Doesn't this lead to combinatorial explosion? It should be part of the API proposal to make it clear what this API design leads to.

@jkotas I stated similar objections at the start of the discussion. Dealing with hash functions requires subject-matter knowledge. But I understand and support fixing the problem caused in 2001 with the introduction of hash codes at the very root of the framework, and not prescribing a recipe to combine hashes. This design is aimed at solving that for the 99% of cases where no subject-matter knowledge is available, or even necessary, because the statistical properties of the hash are good enough. ASP.NET Core should be able to include such combiners in a general-purpose framework in a non-System assembly, like the one proposed for discussion here: https://github.com/dotnet/corefx/issues/13757

I agree that it is a good idea to have a hashcode combiner that is a no-brainer to use in 99% of cases. However, it needs to allow for more internal state than just 32 bits.

BTW: ASP.NET used the fluent pattern for hashcode combining originally, but stopped doing it because it led to easy-to-miss bugs: https://github.com/aspnet/Razor/pull/537

@jkotas regarding hash-flooding security.
DISCLAIMER: Not an expert (you should consult one, and MS does have more than a few on the subject).

I have been looking around, and while it is not the general consensus on the issue, there is an argument that is gaining traction nowadays. Hash codes are 32 bits in size; I posted a graph earlier showing the probability of collisions given the size of the set. That means that no matter how good your algorithm is (SipHash, for example), it is pretty viable to generate lots of hashes and find collisions in a reasonable time (we're talking less than an hour). Those problems need to be addressed at the data structure holding the hashes; they cannot be solved at the hash function level. Paying extra performance in a non-cryptographic hash to secure against hash-flooding, without fixing the underlying data structure, will not solve the issue.

EDIT: You posted while I was writing. In light of this, what does a 64-bit state gain you?

@jkotas I looked into the issue you linked to. It says:

Reaction to aspnet/Common#40

Description of https://github.com/aspnet/Common/issues/40:

Spot the bug:

public class TagBuilder
{
    private Dictionary<string, string> _attributes;
    private string _tagName;
    private string _innerContent;

    public override int GetHashCode()
    {
        var hash = HashCodeCombiner.Start()
            .Add(_tagName, StringComparer.Ordinal)
            .Add(_innerContent, StringComparer.Ordinal);

        foreach (var kvp in _attributes)
        {
            hash.Add(kvp.Key, StringComparer.Ordinal).Add(kvp.Value, StringComparer.Ordinal);
        }

        return hash.Build();
    }
}
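(For the record, the bug: with the fluent, value-returning combiner, the Add calls inside the foreach discard their results, so the attributes never affect the final hash. A sketch of the fix, assuming those fluent semantics:)

```c#
foreach (var kvp in _attributes)
{
    // reassign: each Add returns a new combiner rather than mutating 'hash'
    hash = hash.Add(kvp.Key, StringComparer.Ordinal)
               .Add(kvp.Value, StringComparer.Ordinal);
}
```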

Come on. That argument is like saying string should be mutable since people don't realize Substring returns a new string. Mutable structs are far worse in terms of gotchas; I think we should keep the struct immutable.

regarding the hash-flooding security.

There are two sides of this: correct-by-construction design (robust datastructures, etc.); and mitigation of the issues in the existing design. Both of them are important.

@karelz Regarding parameter naming

I have noticed that you silently/by accident changed arg name obj to value in your proposal dotnet/corefx#8034 (comment). I switched it back to obj which IMO captures better what you get. Name value is more associated with the "int" hash value itself in this context.
I am open to further discussion on the arg name if needed, but let's change it on purpose and keep track of the diff against last approved proposal.

I am considering, in a future proposal, to add APIs to combine values in bulk. For example: CombineRange(ReadOnlySpan<T>). If we named this obj, we would have to name the parameter there objs, which sounds very awkward. So we should name it item instead; in the future, we can name the span parameter items. Updated the proposal.

@jkotas agree, but the point here is we are not mitigating anything at the combiner level...

The only thing we can do is use a random seed. For all intents and purposes, from what I remember of the string hashing code, the seed is fixed per build (I could be wrong about that; it was a long time ago).

Here's a challenge: give me your best string and/or memory hash function with a fixed random seed, and I will construct a set of 32-bit hash codes that generates only collisions. I am not scared to issue such a challenge because it is pretty easy to do; probability theory is on my side. I would even make a bet, but I know I would win, so it is essentially not a bet anymore.

Moreover, a deeper analysis shows that if the mitigation is having those "random seeds" built in per run, a more convoluted combiner is not needed, because you have essentially mitigated the issue at the source.

Say you have machines M1 and M2 with different random seeds rs1 and rs2:
M1 will issue h1 = hash('a', rs1) and h2 = hash('b', rs1)
M2 will issue h1' = hash('a', rs2) and h2' = hash('b', rs2)
The key point here is that h1 and h1' will collide with probability of roughly 1/2^32 (if the hash is good enough, that is), which for all purposes is as good as it is going to get.
Therefore, whatever c(x, y) you decide to use (if good enough) already takes into account the mitigation built in at the source.

EDIT: I found the code; you are using Marvin32, which now changes per domain. So the mitigation for strings is using random seeds per run, which, as I stated, is good enough mitigation.

@jkotas

Is ASP.NET Core going to be able to replace their own hashcode combiner - that has 64-bit of state currently - with this one?

Absolutely; it uses the same hashing algorithm. I just made this test app to measure the number of collisions and ran it 10 times. No significant difference from using 64 bits.
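A hypothetical sketch of that kind of measurement (not the actual test app; the combiner inlined below is the rotl-xor one discussed above):

```c#
var seen = new HashSet<int>();
var rng = new Random(12345);
int collisions = 0;

for (int i = 0; i < 1000000; i++)
{
    int h1 = rng.Next(), h2 = rng.Next();
    uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
    int combined = unchecked(((int)rol5 + h1) ^ h2);
    if (!seen.Add(combined))
        collisions++;
}

// The birthday bound predicts roughly 116 collisions for 1M samples over 2^32 values.
Console.WriteLine(collisions);
```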

I was envisioning separate hash code types that are all int-sized (FnvHashCode, etc.)

Doesn't this lead to combinatorial explosion? It should be part of the API proposal to make it clear what this API design leads to.

@jkotas, it will not. The design of this class will not set the design for future hashing APIs in stone. Those should be considered more advanced scenarios, should go in a different proposal such as dotnet/corefx#13757, and will have a different design discussion. I believe it is far more important to have a simple API for a general hashing algorithm, for newbies who are struggling with overriding GetHashCode.

I agree that it is a good idea to have a hashcode combiner that is a no-brainer to use in 99% of cases. However, it needs to allow for more internal state than just 32 bits.

When would we need more internal state than 32 bits? edit: If it is to allow people to plug in custom hashing logic, I think (again) that should be considered an advanced scenario and be discussed in dotnet/corefx#13757.

you are using Marvin32, which now changes per domain

Right, the string hashcode randomization mitigation is enabled by default in .NET Core. It is not enabled by default for standalone apps in full .NET Framework because of compatibility; it is only enabled via quirks (e.g. in high-risk environments).

We still do have the code for non-randomized hashing around in .NET Core, but it should be ok to delete it. I do not expect that we will need it again. It would also make the string hashcode computation a bit faster, because there would no longer be the check on whether to use the non-randomized path.

The Marvin32 algorithm used to compute the randomized string hashcodes has 64-bit internal state. It was picked by the MS subject experts. I am pretty sure that they had a good reason to use the 64-bit internal state, and they have not used it just to make things slower.

A general purpose hash combiner should keep evolving this mitigation: It should use a random seed and a strong enough hashcode combining algorithm. Ideally, it would use the same Marvin32 as randomized string hashing.

The Marvin32 algorithm used to compute the randomized string hashcodes has 64-bit internal state. It was picked by the MS subject experts. I am pretty sure that they had a good reason to use the 64-bit internal state, and they have not used it just to make things slower.

@jkotas, the hash code combiner you linked to does not use Marvin32. It uses the same naive DJBx33x algorithm used by non-randomized string.GetHashCode.

A general purpose hash combiner should keep evolving this mitigation: It should use a random seed and a strong enough hashcode combining algorithm. Ideally, it would use the same Marvin32 as randomized string hashing.

This type is not intended to be used in places that are prone to hash DoS attacks. This is targeted for people who don't know better to add/xor, and will help prevent things like https://github.com/dotnet/coreclr/pull/4654.

A general purpose hash combiner should keep evolving this mitigation: It should use a random seed and a strong enough hashcode combining algorithm. Ideally, it would use the same Marvin32 as randomized string hashing.

Then we should talk with the C# team to have them implement a mitigated ValueTuple hashing algorithm, because that code is going to be used in high-risk environments too. And of course Tuple https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Tuple.cs#L60 and System.Numerics.HashHelpers (used all over the place).

Now, before we decide how to implement it, I would ask those same subject-matter experts whether paying the cost of a fully randomized hashcode-combining algorithm is worth it (if one exists, of course), even though it would not change how the API is designed (under the proposed API you could use a 512-bit state and still have the same public API, if you are willing to pay the cost of it, of course).

This is targeted for people who don't know better to add/xor

It is exactly why it is important for it to be robust. The key value of .NET is that it addresses problems for people who don't know better.

And while we are at it, let's not forget about IntPtr: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/IntPtr.cs#L119
That one is especially nasty; xor is probably the worst combiner out there, because "bad" will collide with "dab".
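The xor problem in one line: xor is commutative and associative, so any permutation of the same inputs collides (sketch):

```c#
int h1 = 'b'.GetHashCode() ^ 'a'.GetHashCode() ^ 'd'.GetHashCode();
int h2 = 'd'.GetHashCode() ^ 'a'.GetHashCode() ^ 'b'.GetHashCode();
Console.WriteLine(h1 == h2); // True: "bad" and "dab" hash identically
```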

implement a mitigated ValueTuple hashing algorithm

Good point. I am not sure whether ValueTuple shipped or whether these is still time to do this. Opened https://github.com/dotnet/corefx/issues/14046.

lets not forget about IntPtr

These are past mistakes ... the bar for fixing them is much higher.

@jkotas

These are past mistakes ... the bar for fixing them is much higher.

I thought one of the points of .Net Core is that the bar for "small" changes like that should be much lower. If someone depends on the implementation of IntPtr.GetHashCode (which they really shouldn't), they can choose not to upgrade their version of .Net Core.

the bar for "small" changes like that should be much lower

Yes, it is - compared to full .NET Framework. But you still have to do the work to get the change pushed through the system, and you may find that it is just not worth the pain. A recent example is the change in the Tuple<T> hashing algorithm that got reverted because it broke F#: https://github.com/dotnet/coreclr/pull/6767#issuecomment-256896016

@jkotas

If we were to make HashCode 64-bit, do you think an immutable design would kill perf in 32-bit environments? I agree with other readers, a builder pattern seems to be much worse.

Kill the perf - no. Performance penalty paid for syntax sugar - yes.

Performance penalty paid for syntax sugar - yes.

Is it something that could be optimized by the JIT in the future?

Kills the perf - no.
Performance penalty paid for syntax sugar - yes.

It's more than syntactic sugar. If we were willing to make HashCode a class, then it would be syntactic sugar. But a mutable value type is a bug farm.

Quoting you from earlier:

It is exactly why it is important for it to be robust. The key value of .NET is that it addresses problems for people who don't know better.

I'd argue that a mutable value type is not a robust API for the majority of the people who don't know better.
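For reference, the classic gotcha with a mutable struct builder (HashCodeCombiner here is hypothetical, standing in for any Add-based struct):

```c#
var combiner = new HashCodeCombiner();
var copy = combiner;   // value types copy on assignment
copy.Add(42);          // mutates the copy...
// ...while 'combiner' is silently unchanged -- an easy bug to miss.
```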

I'd argue that a mutable value type is not a robust API for the majority of the people who don't know better.

Agreed. Though I think it is unfortunate that this is the case for mutable struct builder types. I use them all the time because they are nice and tight. [MustNotCopy] annotations, anyone?

MustNotCopy is a struct lover's dream come true. @jaredpar?

MustNotCopy is like stack only but even harder to use 😄

I suggest not creating any class, but instead creating extension methods to combine hashes:

static class HashHelpers
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CombineHash(this int hash1, int hash2);
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CombineHash<T>(this int hash, T value);
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CombineHash<T>(this int hash, T value, IEqualityComparer<T> comparer);
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CombineHash<T>(this int hash, IEnumerable<T> values);
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CombineHash<T>(this int hash, IEnumerable<T> values, IEqualityComparer<T> comparer);
}

That is all! It is fast and easy to use.

@AlexRadch I don't like that that pollutes the method list for all integers, not just those meant as hashes.

Also, you have methods that continue a chain of computing the hash code, but how do you start it? Do you have to do something non-obvious like starting with zero? I.e. 0.CombineHash(this.FirstName).CombineHash(this.LastName).

Update: Per the comment in dotnet/corefx#14046, it was decided that the existing hash formula would be kept for ValueTuple:

@jamesqo Thanks for the help.
From last discussion with @jkotas and @VSadov, we're ok to move forward with randomization/seeding, but would rather hold back on adopting a more expensive hash function.
Doing the randomization keeps the door to changing the hash function in the future if the need arises.

@jkotas, can we just keep the current ROL 5-based hash for HashCode then, and shrink it back to 4 bytes? This would eliminate all the problems with struct copying. We can have HashCode.Empty represent a randomized hash value.

@svick
Yes, this pollutes the method list for all integers, but it can be placed in a separate namespace; if you do not work with hashes, you will not include it and will not see it.

0.CombineHash(this.FirstName).CombineHash(this.LastName) should be written as this.FirstName.GetHash().CombineHash(this.LastName)

To implement starting from a seed, it can have the following static method:

static class HashHelpers
{
    public static int ClassSeed<T>();
}

class SomeClass
{
    int GetHash()
    {
        return HashHelpers.ClassSeed<SomeClass>().CombineHash(value1).CombineHash(value2);
    }
}

So each class will have a different seed for randomizing hashes.

@jkotas, can we just keep the current ROL 5-based hash for HashCode then, and shrink it back to 4 bytes?

I think that a public platform hashcode-building helper needs to use 64-bit state to be robust. If it is 32-bit only, it will be prone to producing bad results when it is used to hash more elements, arrays or collections in particular. How do you write documentation on when it is a good idea to use it vs. not? Yes, it is extra instructions spent mixing the bits, but I do not think it matters. These kinds of instructions execute super fast. My experience is that it is better to do more bit mixing than less, because the effects of doing too little bit mixing are much more severe than doing too much.

Also, I still have concerns about the proposed shape of the API. I believe that the problem should be thought of as hash code building, not hash code combining. Maybe it is premature to add this as a platform API, and we should rather wait and see whether a better pattern emerges for this. This does not prevent somebody from publishing a (source) nuget package with this API, or corefx from using it as an internal helper.

@jkotas having a 64-bit state doesn't guarantee that your output is going to have the proper statistical properties; the combining function itself must be designed to use an internal 64-bit state. Also, if the combining function is good (statistically speaking), there is no such thing as more versus less mixing: if the hashing has the randomization, avalanche and other statistical properties of interest, the mixing is accounted for, as it is technically speaking a specially crafted hash function.

See these for what makes a good hash function (and which ones clearly are not, like xor): http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed and https://research.neustar.biz/2012/02/02/choosing-a-good-hash-function-part-3/

@jamesqo BTW, I just realized that the combiner won't work for the case of "I am actually combining hashes (not runtime hashes)", because the seed will change every single time. ... a public constructor with a seed?

@jkotas

I think that a public platform hashcode building helper needs to use 64-bit state to be robust. If it is 32-bit only, it will be prone to producing bad results when it is used to hash more elements, arrays or collections in particular.

Does this matter when it will end up condensed to a single int anyway?

@jamesqo Not really; robustness depends only on the function, not on the state size. In fact you can actually make your hash function worse if the combiner is not designed to work that way, and at best you are wasting resources, because you cannot acquire randomness from coercing the state down.

Corollary: if you are coercing, be extra sure the function is statistically excellent, or you are almost guaranteed to make it worse.

This depends on whether there is correlation between the items. If there is no correlation, a 32-bit state and a simple rotl (or even xor) work just fine. If there is correlation, it depends.

Consider if somebody used this to build a string hash code out of individual characters. It is not likely somebody would actually do this for string, but it demonstrates the problem:

for (int i = 0; i < str.Length; i++)
   hashCodeBuilder.Add(str[i]);

It would give poor results for strings with a 32-bit state and a simple rotl, because characters in real-world strings tend to be correlated. How often are the items this is used for going to be correlated, and how bad would the results be? Hard to tell, though things in real life tend to be correlated in unexpected ways.
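
A small self-contained demonstration of that failure mode, assuming the rotl5/xor combiner discussed above (illustrative code only, not the proposed implementation):

```c#
using System;

static class CorrelationDemo
{
    static int Rotl5Xor(int seed, int hash) =>
        unchecked(((seed << 5) | (int)((uint)seed >> 27)) ^ hash);

    static int HashChars(string s, Func<int, int, int> combine)
    {
        int h = 0;
        foreach (char c in s) h = combine(h, c);
        return h;
    }

    static void Main()
    {
        // Pure xor: any two anagrams collide, because xor is commutative.
        Console.WriteLine(HashChars("stop", (h, c) => h ^ c) ==
                          HashChars("pots", (h, c) => h ^ c)); // True

        // rotl5/xor distinguishes adjacent swaps, but every input lands on one
        // of only 32 rotation offsets, so characters 32 positions apart are
        // mixed in at the same offset and swapping them still collides.
        string a = "x" + new string('-', 31) + "y";
        string b = "y" + new string('-', 31) + "x";
        Console.WriteLine(HashChars(a, Rotl5Xor) == HashChars(b, Rotl5Xor)); // True
    }
}
```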

It would be great to support hash randomization by adding one of the following methods to the API:

namespace System
{
    public struct HashCode : IEquatable<HashCode>
    {
       // add this
       public static HashCode CreateRandomized(Type type);
       // or add this
       public static HashCode CreateRandomized<T>();
    }
}
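
Hypothetical usage, if the generic overload were adopted (SomeClass and its fields are made-up names):

```c#
class SomeClass
{
    private readonly string field1;
    private readonly int field2;

    public SomeClass(string f1, int f2) { field1 = f1; field2 = f2; }

    public override int GetHashCode() =>
        HashCode.CreateRandomized<SomeClass>() // per-type randomized seed
            .Combine(field1)
            .Combine(field2)
            .Value;
}
```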

@jkotas I haven't tested it, so I trust you did. But that definitely says something about the function we intend to use: it is simply not good enough, at least if the goal is to trade speed for reliability (nobody can do stupid things with it). I for one favor the design where this is not a general non-crypto hashing function, but a fast way to combine uncorrelated hash codes (which are as random as they get).

If what we are aiming for is that nobody can do stupid things with it, using a 64-bit state does not fix anything; we are just hiding the problem. It would still be possible to create an input that exploits that correlation. Which points us yet again to the very same argument that I made 18 days ago. See: https://github.com/dotnet/corefx/issues/8034#issuecomment-261301533

I for once favor the design that this is not a non-crypto hashing function, but a fast way to combine uncorrelated hash-codes

The fastest way to combine uncorrelated hash-codes is xor...

True, but we do know that last time that didn't work so well (IntPtr comes to mind). Rotation plus xor (the current approach) is just as fast, with no loss if someone does put in somewhat correlated values.
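
The xor failure modes being alluded to are easy to show: order is invisible and equal inputs cancel (illustrative snippet):

```c#
using System;

static class XorPitfalls
{
    static void Main()
    {
        int a = "John".GetHashCode();
        int b = "Smith".GetHashCode();

        Console.WriteLine((a ^ b) == (b ^ a)); // True: field order is invisible to xor
        Console.WriteLine(a ^ a);              // 0: equal fields cancel out entirely
    }
}
```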

Add hash code randomization with the public static HashCode CreateRandomized(Type type); method, with the public static HashCode CreateRandomized<T>(); method, or with both.

@jkotas I think I may have found a better pattern for this. What if we used C# 7 ref returns? Instead of returning a HashCode each time, we would return a ref HashCode which fits into a register.

public struct HashCode
{
    private long _value; // cannot be readonly: Combine mutates it in place

    public ref HashCode Combine(int hashCode)
    {
        CombineCore(ref _value, hashCode); // hypothetical mixing helper; modifies the struct in place
        return ref this;
    }
}

Usage remains the same as before:

return HashCode.Empty.Combine(1)
    .Combine(2).Combine(3);

The only downside is that we're back to a mutable struct again. But I don't think there's a way to have both no copying and immutability at the same time.

(ref this does not work yet, but I see a PR in Roslyn to enable it here.)
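
To illustrate the mutable-struct downside mentioned above: invoking a mutating instance method through a readonly field operates on a hidden defensive copy, so the update is silently lost (the field name here is hypothetical):

```c#
class Person
{
    private readonly HashCode _hash = HashCode.Empty;

    public void Oops()
    {
        // The compiler must protect the readonly field, so it copies the
        // struct before calling Combine; the copy is mutated and discarded.
        _hash.Combine(42);
    }
}
```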


@AlexRadch I don't think it is wise to also combine the type into the hash, since getting the hash code of the type is expensive.

@jamesqo public static HashCode CreateRandomized<T>(); does not get the type's hash code. It creates a randomized HashCode seed for that type.

@jamesqo "ref this does not work yet". Even once the Roslyn issue is fixed, ref this won't be available to the corefx repo for a while (I'm not sure how long, @stephentoub can probably set expectations).

The design discussion is not converging here. Moreover, the 200 comments are very hard to follow.
We plan to grab @jkotas next week and flesh out the proposal in API review next Tuesday. We will then post the proposal back here for further comments.

On a side note: I suggest closing this issue and creating a new one with the "blessed proposal" when we have it next week, to lower the burden of following the long discussion. Let me know if you think it is a bad idea.

@jcouv I am OK with it not working yet, as long as we can follow this design when it is released. (I also think it may be possible to work around this temporarily using Unsafe.)

@karelz OK :smile: I will close this proposal later when I have time, and open a new one. I agree; my browser can't handle 200+ comments that well.

@karelz I hit a snag; it turns out that the PR in question was trying to enable ref this returns for reference types as opposed to value types. ref this cannot be safely returned from structs; see here for why. So the ref-returning compromise will not work.

Anyway, I will close this issue. I have opened another issue here: https://github.com/dotnet/corefx/issues/14354

Should be able to return ref "this" from a value type extension method post https://github.com/dotnet/roslyn/pull/15650 though I assume C#vNext...

@benaadams

Should be able to return ref "this" from a value type extension method post dotnet/roslyn#15650 though I assume C#vNext...

Correct. It is possible to return this from a ref this extension method. It is not possible to return this from a normal struct instance method though. There are a lot of gory lifetime details as to why that is the case :(
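
For reference, the extension-method shape that does work (assuming C# 7.2 "ref this" extension methods; MixCore is a hypothetical helper standing in for the real mixing step):

```c#
public static class HashCodeExtensions
{
    // The receiver arrives by reference, so it can be mutated in place and
    // returned by reference without any struct copying.
    public static ref HashCode Combine(ref this HashCode hash, int hashCode)
    {
        MixCore(ref hash, hashCode);
        return ref hash; // returning a ref parameter is safe, unlike 'ref this' in an instance method
    }

    private static void MixCore(ref HashCode hash, int hashCode)
    {
        // hypothetical mixing step; a real version would update hash's state
    }
}
```

Note the receiver has to be an assignable variable (not a readonly field or a temporary) for such a call to compile.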

@redknightlois

if we want to be strict the only hash should be uint, it can be viewed as an oversight that the framework returns int under that light.

CLS-compliance? Unsigned integers are not CLS-compliant.
