Records and Collections

by jonskeet, from Jon Skeet's coding blog

This post is to some extent a grab-bag of points of friction I've encountered when using records and collections within the election site.

Records recap

This may end up being the most generally useful blog post in this series. Although records have been in C# since version 9, I haven't used them much myself. (I've been looking forward to using them for over a decade, but that's a different matter.)

Having decided to make all my data models immutable, using records (always sealed records in my case) to implement those models in C# was pretty much a no-brainer. Just specify the properties you want using the same format as primary constructors, and the compiler does a bunch of boilerplate work for you.

As a simple example, consider the following record declaration:

public sealed record Candidate(int Id, string Name, int? MySocietyId, int? ParliamentId);

That generates code roughly equivalent to this:

public sealed class Candidate : IEquatable<Candidate>
{
    public int Id { get; }
    public string Name { get; }
    public int? MySocietyId { get; }
    public int? ParliamentId { get; }

    public Candidate(int id, string name, int? mySocietyId, int? parliamentId)
    {
        Id = id;
        Name = name;
        MySocietyId = mySocietyId;
        ParliamentId = parliamentId;
    }

    public override bool Equals(object? obj) => obj is Candidate other && Equals(other);

    public override int GetHashCode()
    {
        // The real code also uses EqualityContract, skipped here.
        int idHash = EqualityComparer<int>.Default.GetHashCode(Id);
        int hash = idHash * -1521134295;
        int nameHash = EqualityComparer<string>.Default.GetHashCode(Name);
        hash = (hash + nameHash) * -1521134295;
        int mySocietyIdHash = EqualityComparer<int?>.Default.GetHashCode(MySocietyId);
        hash = (hash + mySocietyIdHash) * -1521134295;
        int parliamentIdHash = EqualityComparer<int?>.Default.GetHashCode(ParliamentId);
        hash = (hash + parliamentIdHash) * -1521134295;
        return hash;
    }

    public bool Equals(Candidate? other)
    {
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        if (other is null)
        {
            return false;
        }
        // The real code also uses EqualityContract, skipped here.
        return EqualityComparer<int>.Default.Equals(Id, other.Id)
            && EqualityComparer<string>.Default.Equals(Name, other.Name)
            && EqualityComparer<int?>.Default.Equals(MySocietyId, other.MySocietyId)
            && EqualityComparer<int?>.Default.Equals(ParliamentId, other.ParliamentId);
    }

    public static bool operator ==(Candidate? left, Candidate? right)
    {
        if (ReferenceEquals(left, right))
        {
            return true;
        }
        if (left is null)
        {
            return false;
        }
        return left.Equals(right);
    }

    public static bool operator !=(Candidate? left, Candidate? right) => !(left == right);

    public override string ToString() =>
        $"Candidate {{ Id = {Id}, Name = {Name}, MySocietyId = {MySocietyId}, ParliamentId = {ParliamentId} }}";

    public void Deconstruct(out int Id, out string Name, out int? MySocietyId, out int? ParliamentId) =>
        (Id, Name, MySocietyId, ParliamentId) = (this.Id, this.Name, this.MySocietyId, this.ParliamentId);
}

(This could be written a little more compactly using primary constructors, but I've kept to "old school" C# to avoid confusion.)

Additionally, the compiler allows the with operator to be used with records, to create a new instance based on an existing instance and some updated properties. For example:

var original = new Candidate(10, "Jon", 20, 30);
var updated = original with { Id = 40, Name = "Jonathan" };
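As a small runnable sketch of the behaviour described above (using the Candidate record from this post, with values chosen purely for illustration), the with expression leaves the original instance untouched, and equality is by value rather than by reference:

```csharp
using System;

var original = new Candidate(10, "Jon", 20, 30);
var updated = original with { Id = 40, Name = "Jonathan" };

// "with" copies the original and then applies the listed changes.
Console.WriteLine(original.Id); // 10
Console.WriteLine(updated.Id);  // 40

// Properties not mentioned in the "with" expression are carried over.
Console.WriteLine(updated.MySocietyId == original.MySocietyId); // True

// A separately constructed instance with the same data compares as equal.
Console.WriteLine(original == new Candidate(10, "Jon", 20, 30)); // True
Console.WriteLine(ReferenceEquals(original, updated));           // False

public sealed record Candidate(int Id, string Name, int? MySocietyId, int? ParliamentId);
```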

That's all great! Except when it's not quite...

Record equality

As shown above, the default implementation of equality for records uses EqualityComparer<T>.Default for each of the properties. That's fine when the default equality comparer for the property type is what you want - but that's not always the case. In our election data model case, most of the types are fine - but ImmutableList<T> is not, and we use that quite a lot.

ImmutableList<T> doesn't override Equals and GetHashCode itself - so it has reference equality semantics. What I really want is to use an equality comparer for the element type, and say that two immutable lists are equal if they have the same count, and the elements are equal when considered pairwise. That's easy enough to implement - along with a suitable GetHashCode method. It could easily be wrapped in a type that implements IEqualityComparer<ImmutableList<T>>, although it so happens I haven't done that yet.
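One way such a comparer might look is sketched below. The type name ImmutableListEqualityComparer<T> is hypothetical (it is neither in the framework nor taken from the election site), and using System.HashCode to combine the element hashes is just one reasonable choice:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

var comparer = new ImmutableListEqualityComparer<string>(StringComparer.Ordinal);
var first = ImmutableList.Create("a", "b");
var second = ImmutableList.Create("a", "b");

Console.WriteLine(first.Equals(second));           // False: reference equality
Console.WriteLine(comparer.Equals(first, second)); // True: same count, pairwise equal
Console.WriteLine(comparer.GetHashCode(first) == comparer.GetHashCode(second)); // True

// Hypothetical helper type, for illustration only.
public sealed class ImmutableListEqualityComparer<T> : IEqualityComparer<ImmutableList<T>>
{
    private readonly IEqualityComparer<T> elementComparer;

    // Falls back to the default comparer for the element type if none is given.
    public ImmutableListEqualityComparer(IEqualityComparer<T>? elementComparer = null) =>
        this.elementComparer = elementComparer ?? EqualityComparer<T>.Default;

    public bool Equals(ImmutableList<T>? x, ImmutableList<T>? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // Check the count first so lists of different lengths are rejected cheaply.
        return x.Count == y.Count && x.SequenceEqual(y, elementComparer);
    }

    public int GetHashCode(ImmutableList<T> list)
    {
        // Combine element hashes in order, using the same element comparer as Equals.
        var hash = new HashCode();
        foreach (var item in list)
        {
            hash.Add(item, elementComparer);
        }
        return hash.ToHashCode();
    }
}
```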

Unfortunately, the way that records work in C#, there's no way of specifying an equality comparer to be used for a given property. If you implement the Equals and GetHashCode methods directly, those are used instead of the generated versions (and the Equals(object) generated code will still use the version you've implemented) but it does mean you've got to implement it for all the properties. This in turn means that if you add a new property in the record, you need to remember to modify Equals and GetHashCode (something I've forgotten to do at least once) - whereas if you're happy to use the default generated implementation, adding a property is trivial.
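To make that trade-off concrete, here is a minimal sketch using a made-up Team record (not one of the election models). Only Members needs special handling, but Name has to be written out by hand as well, in both methods:

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

var first = new Team("Reds", ImmutableList.Create("Ann", "Bob"));
var second = new Team("Reds", ImmutableList.Create("Ann", "Bob"));

// Uses the hand-written Equals below; the generated version would give False
// here, because the two Members lists are different objects.
Console.WriteLine(first == second); // True

// Hypothetical record, for illustration only.
public sealed record Team(string Name, ImmutableList<string> Members)
{
    // Replaces the compiler-generated Equals(Team?). Every property has to be
    // listed here, including any property added to the record later.
    public bool Equals(Team? other) =>
        other is not null &&
        Name == other.Name &&
        Members.SequenceEqual(other.Members);

    // Has to be kept consistent with Equals, again by hand.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Name);
        foreach (var member in Members)
        {
            hash.Add(member);
        }
        return hash.ToHashCode();
    }
}
```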

What I'd really like would be some way of indicating to the compiler that it should use a specified type to obtain the equality comparer (which could be assumed to be stateless) for a property. For example, imagine we have these types:

// Imagine this is in the framework...
public interface IEqualityComparerProvider
{
    static abstract IEqualityComparer<T> GetEqualityComparer<T>();
}

// As is this...
[AttributeUsage(AttributeTargets.Property)]
public sealed class EqualityComparerAttribute : Attribute
{
    public Type ProviderType { get; }

    public EqualityComparerAttribute(Type providerType)
    {
        ProviderType = providerType;
    }
}

Now I could implement the interface like this:

public sealed class CollectionEqualityProvider : IEqualityComparerProvider
{
    public static IEqualityComparer<T> GetEqualityComparer<T>()
    {
        var type = typeof(T);
        if (!type.IsGenericType)
        {
            throw new InvalidOperationException("Unsupported type");
        }
        var genericTypeDefinition = type.GetGenericTypeDefinition();
        if (genericTypeDefinition == typeof(ImmutableList<>))
        {
            // Instantiate and return an appropriate equality comparer
        }
        if (genericTypeDefinition == typeof(ImmutableDictionary<,>))
        {
            // Instantiate and return an appropriate equality comparer
        }
        // etc...
        throw new InvalidOperationException("Unsupported type");
    }
}

It's unfortunate that the code elided by those comments would involve further reflection - but it would certainly be feasible.

We could then declare a record like this:

public sealed record Ballot(
    Constituency Constituency,
    [EqualityComparer(typeof(CollectionEqualityProvider))]
    ImmutableList<Candidacy> Candidacies);

... and I'd expect the compiler to generate code such as:

public sealed class Ballot
{
    private static readonly IEqualityComparer<ImmutableList<Candidacy>> candidaciesComparer =
        CollectionEqualityProvider.GetEqualityComparer<ImmutableList<Candidacy>>();

    // Skip code that would be generated as it is today.

    public bool Equals(Ballot? other)
    {
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        if (other is null)
        {
            return false;
        }
        return EqualityComparer<Constituency>.Default.Equals(Constituency, other.Constituency)
            && candidaciesComparer.Equals(Candidacies, other.Candidacies);
    }

    public override int GetHashCode()
    {
        int constituencyHash = EqualityComparer<Constituency>.Default.GetHashCode(Constituency);
        int hash = constituencyHash * -1521134295;
        int candidaciesHash = candidaciesComparer.GetHashCode(Candidacies);
        hash = (hash + candidaciesHash) * -1521134295;
        return hash;
    }
}

I'm sure there are other ways of doing this. The attribute could instead specify the name of a private static read-only property used to obtain the equality comparer, removing the interface. Or the GetEqualityComparer method could be non-generic with a Type parameter instead (leaving the compiler to generate a cast after calling it). I've barely thought about it - but the important thing is that the requirement of having a custom equality comparison for a single property becomes independent of all the other properties. If you already have a record with 9 properties where the default equality comparison is fine, then adding a 10th property which requires more customization is easy - whereas today, you'd need to implement Equals and GetHashCode including all 10 properties.

(The same could be said for string formatting for the properties, but it's not an area that has bitten me yet.)

The next piece of friction I've encountered is also about equality, but in a different direction.

Reference equality

If you remember from my post about data models, within a single ElectionContext, reference equality for models is all we ever need. The site never needs to fetch (say) a constituency result from the 2024 election from one context by specifying a Constituency from a different context. Indeed, if I ever found code that did try to do that, it would probably indicate a bug: everything within any given web request should refer to the same ElectionContext.

Given that, when I'm creating an ImmutableDictionary, I want to provide an IEqualityComparer<T> which only performs reference comparisons. While this seems trivial, I found that it made a pretty significant difference to the time spent constructing view-models when the context is reloaded.

I'd expected it would be easy to find a reference equality comparer within the framework - but if there is one, I've missed it.

Update, 2025-03-27T21:04Z, thanks to Michael Damatov

As Michael pointed out in comments, there is one in the framework: System.Collections.Generic.ReferenceEqualityComparer - and I remember finding it when I first discovered I needed one. But I foolishly dismissed it. You see, it's non-generic:

public sealed class ReferenceEqualityComparer : System.Collections.Generic.IEqualityComparer<object>, System.Collections.IEqualityComparer

That's odd and not very useful, I thought at the time. Why would I only want IEqualityComparer<object> rather than a generic one?

Oh Jon. Foolish, foolish Jon.

IEqualityComparer<T> is contravariant in T - so there's an implicit reference conversion from IEqualityComparer<object> to IEqualityComparer<X> for any class type X.
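A short sketch of that conversion in action, using string as the class type X. The strings are built with new string(...) so that they are equal by value but are definitely separate objects:

```csharp
using System;
using System.Collections.Generic;

// The non-generic framework comparer implements IEqualityComparer<object?>;
// contravariance makes this assignment an implicit reference conversion.
IEqualityComparer<string> comparer = ReferenceEqualityComparer.Instance;

string first = new string('x', 3);
string second = new string('x', 3);

Console.WriteLine(first == second);                // True: same characters
Console.WriteLine(comparer.Equals(first, second)); // False: different objects
Console.WriteLine(comparer.Equals(first, first));  // True: same object

// Both strings are kept, because the set only compares references.
var set = new HashSet<string>(comparer) { first, second };
Console.WriteLine(set.Count); // 2
```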

I have now removed my own generic ReferenceEqualityComparer<T> type... although it's meant I've had to either cast or explicitly specify some type arguments where previously the types were inferred via the type of the comparer.

End of update

I've now made a habit of using reference equality comparisons everywhere within the data models, which has made it worth adding some extension methods - and these probably don't make much sense to add to the framework (although they could easily be supplied by a NuGet package):

// ReferenceEqualityComparer<TKey> here is my own generic comparer type mentioned in the update above.
public static ImmutableDictionary<TKey, TValue> ToImmutableReferenceDictionary<TSource, TKey, TValue>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TValue> elementSelector) where TKey : class =>
    source.ToImmutableDictionary(keySelector, elementSelector, ReferenceEqualityComparer<TKey>.Instance);

public static ImmutableDictionary<TKey, TSource> ToImmutableReferenceDictionary<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector) where TKey : class =>
    source.ToImmutableDictionary(keySelector, ReferenceEqualityComparer<TKey>.Instance);

public static ImmutableDictionary<TKey, TValue> ToImmutableReferenceDictionary<TKey, TValue>(
    this IDictionary<TKey, TValue> source) where TKey : class =>
    source.ToImmutableDictionary(ReferenceEqualityComparer<TKey>.Instance);

public static ImmutableDictionary<TKey, TValue> ToImmutableReferenceDictionary<TKey, TValue, TSourceValue>(
    this IDictionary<TKey, TSourceValue> source,
    Func<KeyValuePair<TKey, TSourceValue>, TValue> elementSelector) where TKey : class =>
    source.ToImmutableDictionary(pair => pair.Key, elementSelector, ReferenceEqualityComparer<TKey>.Instance);

(I could easily add similar methods for building lookups as well, of course.) Feel free to take issue with the names - while they're only within the election repo, I'm not going to worry too much about them.
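For comparison, here is a sketch of how one of these overloads might look against the framework's non-generic ReferenceEqualityComparer, with the type arguments spelled out explicitly as described in the update. The Party record and the sample data are invented for the example, and this is not necessarily how the election repo does it:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

var keyA = new Party("Example");
var keyB = new Party("Example"); // equal to keyA by value, but a different object
var source = new[] { (Key: keyA, Value: 1), (Key: keyB, Value: 2) };

// With the default (value-based) comparer these keys would collide;
// with reference equality they are distinct.
var dictionary = source.ToImmutableReferenceDictionary(pair => pair.Key, pair => pair.Value);
Console.WriteLine(dictionary.Count); // 2
Console.WriteLine(dictionary[keyB]); // 2

// Hypothetical key type, for illustration only.
public sealed record Party(string Name);

public static class ReferenceDictionaryExtensions
{
    public static ImmutableDictionary<TKey, TValue> ToImmutableReferenceDictionary<TSource, TKey, TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TValue> elementSelector) where TKey : class =>
        // Explicit type arguments: the comparer is an IEqualityComparer<object?>,
        // which converts to IEqualityComparer<TKey> because TKey is a class.
        source.ToImmutableDictionary<TSource, TKey, TValue>(
            keySelector, elementSelector, ReferenceEqualityComparer.Instance);
}
```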

Why not make reference equality the default?

I could potentially kill two birds with one stone here. If I often want reference equality, and "deep" equality is relatively hard to achieve, why not just provide Equals and GetHashCode methods that make all my records behave with reference equality comparisons?

That's certainly an option - but I do lean on the deep equality comparison for testing purposes: if I load the same context twice for example, the results should be equal, otherwise there's something wrong.

Moreover, as record types encourage deep equality, it feels like I'd be subverting their natural behaviour by specifying reference equality comparisons. While I'm not expecting anyone else to ever see this code, I don't like writing code which would confuse readers who come with expectations based on how most code works.

Speaking of extension methods for commonly-used comparers...

Ordinal string comparisons

String comparisons make me nervous. I'm definitely not an internationalisation expert, but I know enough to know it's complicated.

I also know enough to be reasonably confident that the default string comparisons are ordinal for Equals and GetHashCode, but culture-sensitive for CompareTo. As I say, I'm reasonably confident in that - but I always find it hard to validate, so given that I almost always want to use ordinal comparisons, I like to be explicit. Previously I've specified StringComparer.Ordinal (or StringComparer.OrdinalIgnoreCase just occasionally) but - just as above with the reference equality comparer - that gets irritating if you're using it a lot.
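A quick way to see the difference is to sort a few strings both ways, as in the sketch below. The ordinal results are fixed, but the output of the default (culture-sensitive) ordering depends on the current culture and the runtime's globalization settings, so no expected output is shown for it:

```csharp
using System;
using System.Linq;

string[] words = { "apple", "Banana", "cherry" };

// Ordinal: compares UTF-16 code units, so upper-case ASCII letters sort
// before lower-case ones.
var ordinal = words.OrderBy(w => w, StringComparer.Ordinal).ToArray();
Console.WriteLine(string.Join(", ", ordinal)); // Banana, apple, cherry

// Default comparer for string: culture-sensitive, so the result can vary
// between environments (many cultures give apple, Banana, cherry).
var byCulture = words.OrderBy(w => w).ToArray();
Console.WriteLine(string.Join(", ", byCulture));

// Equality is ordinal by default, so these two calls agree.
Console.WriteLine("apple".Equals("APPLE"));                           // False
Console.WriteLine("apple".Equals("APPLE", StringComparison.Ordinal)); // False
```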

I've therefore created another bunch of extension methods, just to make it clear that I want to use ordinal string comparisons - even if (in the case of equality) that would already be the default.

I won't bore you with the full methods, but I've got:

  • OrderByOrdinal
  • OrderByOrdinalDescending
  • ThenByOrdinal
  • ThenByOrdinalDescending
  • ToImmutableOrdinalDictionary (4 overloads, like the ones above for ToImmutableReferenceDictionary)
  • ToOrdinalDictionary (4 overloads again)
  • ToOrdinalLookup (2 overloads)

(I don't actually use ToOrdinalLookup much, but it feels sensible to implement all of them.)
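The full methods aren't shown in this post, so the following is only a guess at their shape: thin wrappers that pass StringComparer.Ordinal to the corresponding LINQ method. The signatures and the OrdinalExtensions class name are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "bob", "Alice", "alice", "Bob" };

var ordered = names.OrderByOrdinal(n => n).ToArray();
Console.WriteLine(string.Join(", ", ordered)); // Alice, Bob, alice, bob

var byLengthThenName = names.OrderBy(n => n.Length).ThenByOrdinalDescending(n => n).ToArray();
Console.WriteLine(string.Join(", ", byLengthThenName)); // bob, Bob, alice, Alice

// "bob" and "Bob" are distinct keys under an ordinal comparer.
var lookup = names.ToOrdinalDictionary(n => n);
Console.WriteLine(lookup.Count); // 4

// Assumed shapes only; the real methods may differ.
public static class OrdinalExtensions
{
    public static IOrderedEnumerable<TSource> OrderByOrdinal<TSource>(
        this IEnumerable<TSource> source, Func<TSource, string> keySelector) =>
        source.OrderBy(keySelector, StringComparer.Ordinal);

    public static IOrderedEnumerable<TSource> ThenByOrdinalDescending<TSource>(
        this IOrderedEnumerable<TSource> source, Func<TSource, string> keySelector) =>
        source.ThenByDescending(keySelector, StringComparer.Ordinal);

    public static Dictionary<string, TSource> ToOrdinalDictionary<TSource>(
        this IEnumerable<TSource> source, Func<TSource, string> keySelector) =>
        source.ToDictionary(keySelector, StringComparer.Ordinal);
}
```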

Would these be useful in the framework? Possibly. I can see why they're not there - string is "just another type" really... but I bet a high proportion of uses of LINQ end up with strings as keys in some form or another. Possibly I should suggest this for MoreLINQ - although having started the project over 15 years ago, I haven't contributed to it for over a decade...

Primary constructor and record "call hierarchy" niggle in VS

I use "call hierarchy" in Visual Studio all the time. Put your cursor on a member, then Ctrl-K, Ctrl-T and you can see everything that calls that member, and what calls the caller, etc.

For primary constructor and record parameters, "find references" works (Ctrl-K, Ctrl-R) but "call hierarchy" doesn't. I'm okay with "call hierarchy" not working for primary constructor parameters, but as the record parameters become properties, I'd expect to see the call hierarchy for them just as I can with any other property.

More frustrating though is the inability to see the call hierarchy for "calling the constructor". Given that the declaration of the class/record sort of acts as the declaration of the constructor as well, I'd have thought that putting your cursor on the class/record declaration (in the name) would work. It's not that it's ambiguous - Visual Studio just complains that "Cursor must be on a member name". You can get at the calls by expanding the source file entry in Solution Explorer, but it's weird only to have to do that for this one case.

Feature requests (for the C# language, .NET, and Visual Studio)

In summary, I love records, and love the immutable collections - but some friction could be reduced with the introduction of:

  • Some way of controlling (on a per-property basis) which equality comparer is used in the generated code
  • Equality comparers for immutable collections, with the ability to specify the element comparisons to use
  • An IEqualityComparer<T> implementation which performs reference comparisons (which, per the update above, already exists as ReferenceEqualityComparer)
  • "Call Hierarchy" showing the calls to the constructors for primary constructors and records

Conclusion

Some of the niggles I've found with records and collections are at least somewhat specific to my election site, although I strongly suspect that I'm not the only developer with immutable collections in their records, with a desire to use them in equality comparisons.

Overall, records have served me well so far in the site, and I'm definitely pleased that they're available, even if there are still possible improvements to be made. Similarly, it's lovely to have immutable collections just naturally available - but some help in performing comparisons with them would be welcome.
