It is commonly recommended that struct types override Equals, as well as GetHashCode, to avoid performance problems. But what happens if this is not done? Today, let's compare the performance of the two approaches and consider the tools that help avoid errors.

The Enum.HasFlag method is not very efficient (*), but unless you use it on a hot path, it will not cause serious problems in the project. The same is true for defensive copies created for non-readonly struct types in a readonly context. The problem exists, but it is unlikely to be noticeable in ordinary applications.

However, if a struct does not override the Equals and GetHashCode methods, then their default versions from System.ValueType are used. And they can significantly reduce the performance of the final application.

(*) Calling a method defined in the System.ValueType or System.Enum types causes boxing (**).

(**) Unless the method is a JIT intrinsic. For example, the JIT compiler in .NET Core 2.1 recognizes the Enum.HasFlag method and generates suitable code that does not cause boxing.

Let's start with the GetHashCode method. When implementing a hash function, we face a dilemma: make the hash function well distributed or make it fast. In some cases both can be achieved, but in the general-purpose ValueType.GetHashCode this is usually difficult.

The only way to get the hash code of a field in a ValueType method is to use reflection. That is why the CLR authors decided to sacrifice distribution for the sake of speed, and the default version of GetHashCode only returns the hash code of the first non-null field and “mixes” it with a type identifier (***) (for more details, see the coreclr repo on GitHub). For example:

    public readonly struct Location
    {
        public string Path { get; }
        public int Position { get; }

        public Location(string path, int position) => (Path, Position) = (path, position);
    }

    var hash1 = new Location(path: "", position: 42).GetHashCode();
    var hash2 = new Location(path: "", position: 1).GetHashCode();
    var hash3 = new Location(path: "1", position: 42).GetHashCode();
    // hash1 and hash2 are the same and hash1 is different from hash3
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using BenchmarkDotNet.Attributes;

    public readonly struct Location1
    {
        public string Path { get; }
        public int Position { get; }

        public Location1(string path, int position) => (Path, Position) = (path, position);
    }

    public readonly struct Location2
    {
        // The order matters!
        // The default GetHashCode version will get a hashcode of the first field
        public int Position { get; }
        public string Path { get; }

        public Location2(string path, int position) => (Path, Position) = (path, position);
    }

    public readonly struct Location3 : IEquatable<Location3>
    {
        public string Path { get; }
        public int Position { get; }

        public Location3(string path, int position) => (Path, Position) = (path, position);

        public override int GetHashCode() => (Path, Position).GetHashCode();
        public override bool Equals(object other) => other is Location3 l && Equals(l);
        public bool Equals(Location3 other) => Path == other.Path && Position == other.Position;
    }

    // The class name is illustrative; the original shows only the members.
    public class EqualityBenchmarks
    {
        private HashSet<Location1> _locations1;
        private HashSet<Location2> _locations2;
        private HashSet<Location3> _locations3;

        [Params(1, 10, 1000)]
        public int NumberOfElements { get; set; }

        [GlobalSetup]
        public void Init()
        {
            // Every element has the same (empty) Path and a unique Position
            _locations1 = new HashSet<Location1>(Enumerable.Range(1, NumberOfElements).Select(n => new Location1("", n)));
            _locations2 = new HashSet<Location2>(Enumerable.Range(1, NumberOfElements).Select(n => new Location2("", n)));
            _locations3 = new HashSet<Location3>(Enumerable.Range(1, NumberOfElements).Select(n => new Location3("", n)));
        }

        [Benchmark]
        public bool Path_Position_DefaultEquality()
        {
            var first = new Location1("", 0);
            return _locations1.Contains(first);
        }

        [Benchmark]
        public bool Position_Path_DefaultEquality()
        {
            var first = new Location2("", 0);
            return _locations2.Contains(first);
        }

        [Benchmark]
        public bool Path_Position_OverridenEquality()
        {
            var first = new Location3("", 0);
            return _locations3.Contains(first);
        }
    }
| Method                          | NumOfElements |          Mean |   Gen 0 | Allocated |
|-------------------------------- |--------------:|--------------:|--------:|----------:|
| Path_Position_DefaultEquality   |             1 |     885.63 ns |  0.0286 |      92 B |
| Position_Path_DefaultEquality   |             1 |     127.80 ns |  0.0050 |      16 B |
| Path_Position_OverridenEquality |             1 |      47.99 ns |       - |       0 B |
| Path_Position_DefaultEquality   |            10 |   6,214.02 ns |  0.2441 |     776 B |
| Position_Path_DefaultEquality   |            10 |     130.04 ns |  0.0050 |      16 B |
| Path_Position_OverridenEquality |            10 |      47.67 ns |       - |       0 B |
| Path_Position_DefaultEquality   |          1000 | 589,014.52 ns | 23.4375 |   76025 B |
| Position_Path_DefaultEquality   |          1000 |     133.74 ns |  0.0050 |      16 B |
| Path_Position_OverridenEquality |          1000 |      48.51 ns |       - |       0 B |
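The linear growth in the Path_Position_DefaultEquality rows can be reproduced in isolation. The sketch below does not use the struct defaults at all: it assumes a hypothetical comparer, ConstantHashComparer, that returns the same hash code for every value (imitating a first field that never changes) and counts how many times Equals is called during a single failed lookup. The names are illustrative and not part of the original benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: one hash code for every value, plus an Equals call counter.
public sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public int EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    // Every element lands in the same bucket, as with an identical first field.
    public int GetHashCode(int value) => 0;
}

public static class CollisionDemo
{
    // Returns how many Equals calls one failed lookup needs in a set of `count` elements.
    public static int EqualsCallsForMiss(int count)
    {
        var comparer = new ConstantHashComparer();
        var set = new HashSet<int>(Enumerable.Range(1, count), comparer);

        int before = comparer.EqualsCalls;
        set.Contains(0); // 0 is not in the set, so the whole collision chain is scanned
        return comparer.EqualsCalls - before;
    }
}
```

With a constant hash code, a lookup that misses has to compare the candidate against every stored element, so the cost of Contains grows with the size of the set instead of staying flat.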
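When the first field has the same value in all instances, every element gets the same hash code, so each lookup in the hash set degrades into a linear scan that compares elements one by one with ValueType.Equals. Here are the consequences of the method that uses reflection!

As the benchmark shows, the performance is acceptable when the first field is well distributed (Position_Path_DefaultEquality). But if this is not the case, then the performance will be extremely low.

I ran into this problem in a real application: calls to ValueType.Equals took 50 seconds.

    private readonly HashSet<(ErrorLocation, int)> _locationsWithHitCount;

    readonly struct ErrorLocation
    {
        // Empty almost all the time
        public string OptionalDescription { get; }
        public string Path { get; }
        public int Position { get; }
    }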
The struct used the default implementation of Equals. And, unfortunately, it had an optional first field, which was almost always equal to String.Empty. Performance remained acceptable until the number of elements in the set increased significantly; after that, initializing a collection with tens of thousands of elements took minutes.

Do ValueType.Equals/GetHashCode always work slowly by default?

Not always: both ValueType.Equals and ValueType.GetHashCode have a special optimization. If the type does not have “pointers” and it is packed correctly (I will show an example in a minute), then the optimized versions are used: GetHashCode iterates over the instance in 4-byte blocks and XORs them together, and the Equals method compares two instances using memcmp.

    // Optimized ValueType.GetHashCode implementation
    static INT32 FastGetValueTypeHashCodeHelper(MethodTable *mt, void *pObjRef)
    {
        INT32 hashCode = 0;
        INT32 *pObj = (INT32*)pObjRef;

        // this is a struct with no refs and no "strange" offsets, just go through the obj and xor the bits
        INT32 size = mt->GetNumInstanceFieldBytes();
        for (INT32 i = 0; i < (INT32)(size / sizeof(INT32)); i++)
            hashCode ^= *pObj++;

        return hashCode;
    }

    // Optimized ValueType.Equals implementation
    FCIMPL2(FC_BOOL_RET, ValueTypeHelper::FastEqualsCheck, Object* obj1, Object* obj2)
    {
        TypeHandle pTh = obj1->GetTypeHandle();
        FC_RETURN_BOOL(memcmp(obj1->GetData(), obj2->GetData(), pTh.GetSize()) == 0);
    }
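To make the two native helpers above easier to follow, here is a rough C# imitation of the same idea. It is only a sketch and not the runtime's code: the names PointXY, XorHash, and BitwiseEquals are hypothetical, and it assumes a runtime that provides Span<T> and System.Runtime.InteropServices.MemoryMarshal (for example, .NET Core 2.1 or later).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct with no references and no padding between fields.
public struct PointXY
{
    public int X;
    public int Y;
}

public static class BitwiseSketch
{
    // XORs the value's memory in 4-byte blocks, like the native hash helper.
    public static int XorHash<T>(T value) where T : unmanaged
    {
        ReadOnlySpan<T> one = MemoryMarshal.CreateReadOnlySpan(ref value, 1);
        ReadOnlySpan<int> blocks = MemoryMarshal.Cast<T, int>(one);

        int hash = 0;
        foreach (int block in blocks)
            hash ^= block;
        return hash;
    }

    // Compares the raw bytes of two values, which is what memcmp does.
    public static bool BitwiseEquals<T>(T left, T right) where T : unmanaged
    {
        ReadOnlySpan<byte> l = MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref left, 1));
        ReadOnlySpan<byte> r = MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref right, 1));
        return l.SequenceEqual(r);
    }
}
```

One property of this scheme is easy to check: because XOR is commutative, the values (X: 1, Y: 2) and (X: 2, Y: 1) produce the same hash code, so even the fast path is a fairly weak hash function.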
The check itself is performed in ValueTypeHelper::CanCompareBits; it is called from the implementation of ValueType.Equals and from the implementation of ValueType.GetHashCode.

    public struct Case1
    {
        // Optimization is "on", because the struct is properly "packed"
        public int X { get; }
        public byte Y { get; }
    }

    public struct Case2
    {
        // Optimization is "off", because struct has a padding between byte and int
        public byte Y { get; }
        public int X { get; }
    }
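The padding mentioned in the comments above can be made visible by asking for field offsets. This is a minimal sketch under a few assumptions: it uses hypothetical structs with public fields (PackedFields and PaddedFields) instead of auto-properties, because Marshal.OffsetOf looks fields up by name, and it reports the marshaled layout, which for simple blittable structs like these is expected to match the managed one on common platforms.

```csharp
using System;
using System.Runtime.InteropServices;

// Same field order as Case1: the int comes first, the byte follows immediately.
[StructLayout(LayoutKind.Sequential)]
public struct PackedFields
{
    public int X;
    public byte Y;
}

// Same field order as Case2: the byte comes first, so the int must be aligned.
[StructLayout(LayoutKind.Sequential)]
public struct PaddedFields
{
    public byte Y;
    public int X;
}

public static class LayoutProbe
{
    // Returns the byte offset of the named field inside the struct T.
    public static int OffsetOf<T>(string field) where T : struct
        => Marshal.OffsetOf<T>(field).ToInt32();
}
```

In PackedFields the byte starts at offset 4, right after the 4-byte int. In PaddedFields the int also starts at offset 4, although the byte before it occupies only one byte; the three bytes in between are padding whose contents are not part of the value.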
    public struct MyDouble
    {
        public double Value { get; }

        public MyDouble(double value) => Value = value;
    }

    double d1 = -0.0;
    double d2 = +0.0;

    // True
    bool b1 = d1.Equals(d2);

    // False!
    bool b2 = new MyDouble(d1).Equals(new MyDouble(d2));
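The snippet above can be taken one step further. The following sketch compares the raw bits of the two zeros and shows one possible way to make the wrapper behave like Double by overriding the equality members. MyDoubleFixed and SignedZeroDemo are hypothetical names introduced here for illustration.

```csharp
using System;

// Hypothetical variant of MyDouble that delegates equality to Double.
public readonly struct MyDoubleFixed : IEquatable<MyDoubleFixed>
{
    public double Value { get; }

    public MyDoubleFixed(double value) => Value = value;

    // Double.Equals treats -0.0 and +0.0 as equal.
    public bool Equals(MyDoubleFixed other) => Value.Equals(other.Value);

    public override bool Equals(object obj) => obj is MyDoubleFixed other && Equals(other);

    public override int GetHashCode() => Value.GetHashCode();
}

public static class SignedZeroDemo
{
    // True only when both doubles have exactly the same 64-bit pattern.
    public static bool SameBits(double a, double b)
        => BitConverter.DoubleToInt64Bits(a) == BitConverter.DoubleToInt64Bits(b);
}
```

SignedZeroDemo.SameBits(-0.0, +0.0) returns false because the sign bit differs, while new MyDoubleFixed(-0.0).Equals(new MyDoubleFixed(+0.0)) returns true, since the comparison no longer depends on the memory contents.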
-0.0 and +0.0 are equal, but have different binary representations. This means that Double.Equals returns true, while MyDouble.Equals returns false. In most cases, the difference is insignificant, but imagine how many hours you might spend tracking down a bug caused by this difference.

One way to make sure that the Equals and GetHashCode methods are overridden in struct types is to use the FxCop CA1815 rule. But there is one problem: this approach is too strict.

For example, the System.Collections.Generic.KeyValuePair<TKey, TValue> structure defined in mscorlib does not override Equals and GetHashCode. It is unlikely that anyone today will define a variable of type HashSet<KeyValuePair<string, int>>, but the point is that even the BCL can break the rule. Therefore, it is useful to discover this before it is too late.

The distribution of the default GetHashCode will be very bad if the first field of many instances has the same value.

There are optimized default versions of Equals and GetHashCode, but you shouldn’t rely on them, because even a small code change can turn them off.

Source: https://habr.com/ru/post/418515/