Election 2029: Data Models

I was considering using the term "architecture" somewhere in the title of this post, but it feels too pompous for the scale of the site. I could probably justify it, but it would give me the ick every time I used the term. But this post will basically describe how I'm approaching data within the election site, as well as what that data is.
In many systems, there are lots of uses for data, and the system needs to be designed with all of those in mind. For my election site, I only have to worry about two uses:
- The site itself, which is purely read-only
- Tooling to maintain the data, which is read-write
The tooling is only run by me, at home - so I don't need to worry about concurrency in terms of multiple concurrent writes occurring. I do need to worry about concurrency in terms of making sure that the site always reads a consistent view of the data even if it checks for updates while I'm in the middle of writing - but I'll cover that in a later post.
This post is only about the internal representation of the data, where I control basically everything. Most of the site data is sourced from external data sources, and I'll cover how those are handled in a separate post later on as well.
How the site uses data

The code for the site involves the following projects (ignoring test projects for now). This list is in dependency order: so any project in the list may have dependencies on earlier projects, but not on later ones.
- Election2029.Common: utility code
- Election2029.Models: the core data models, which are all immutable (mostly sealed records)
- Election2029.Storage: code purely about storing the data, either on the file system or in Firestore, but with a common interface
- Election2029.ViewModels: immutable wrappers around the models to simplify the view code (no dependency on storage)
- Election2029.Web: the ASP.NET Core code, with Razor pages etc
The Razor pages don't generally have code-behind, instead injecting a corresponding view-model which is just rendered as HTML in simple Razor syntax. The intention is to keep the view code really straightforward - I'm fine with having a loop here or there, but most of the view should be HTML rather than C#, if you see what I mean. Aside from anything else, putting any "real" logic in the view-models makes it easier to test, which is how I've got, if not a comprehensive test suite, then at least aspirations to have unit tests.
Each page is injected with the current view-model for the "state of the world, for that view". This view-model is reused until the underlying data changes, which happens relatively rarely. It does change though, which means the view-models can't be injected as singletons. I'll revisit the data reloading mechanics in a later post. (I realise I'm saying this a lot. As posts go on, that should happen less, of course.) The result of page rendering is also cached, but only for 10 seconds, as a balance between data freshness and excessive work duplication. In theory I could probably say "cache until we have a new view-model" but that's likely to be more complex than is worthwhile.
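As a rough sketch of the "reuse until the data changes" idea (not the site's actual code), a small cache can hold the latest view-model alongside the context it was built from, and rebuild only when it is handed a different context instance. The `ViewModelCache` name, the `Func<TContext>` used to fetch the current context, and the factory delegate are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: caches one view-model per context instance.
// The view-model is rebuilt only when the context reference changes.
public sealed class ViewModelCache<TContext, TViewModel>
    where TContext : class
    where TViewModel : class
{
    private readonly Func<TContext> currentContext;       // returns the latest loaded context
    private readonly Func<TContext, TViewModel> factory;  // builds a view-model from a context
    private readonly object padlock = new();
    private TContext? cachedContext;
    private TViewModel? cachedViewModel;

    public ViewModelCache(Func<TContext> currentContext, Func<TContext, TViewModel> factory)
    {
        this.currentContext = currentContext;
        this.factory = factory;
    }

    public TViewModel Current
    {
        get
        {
            var context = currentContext();
            lock (padlock)
            {
                // Contexts are immutable, so a new reference is the only way data can change.
                if (!ReferenceEquals(context, cachedContext))
                {
                    cachedViewModel = factory(context);
                    cachedContext = context;
                }
                return cachedViewModel!;
            }
        }
    }
}
```

In ASP.NET Core terms, one option would be to register a cache like this as a singleton and expose the view-model itself through a scoped or transient registration, so each request sees whatever is current at the time.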
How the tooling uses data

The code for the tooling involves the following projects, as well as Election2029.Common, Election2029.Models, Election2029.Storage as for the site.
- Election2029.Tools.Common: utility code for all tools (as there are some tools beyond the "data manager" I'm discussing here)
- Election2029.Tools.DemocracyClub.Models: models for Democracy Club data - an external source I import from
- Election2029.Tools.Parliament.Members: models and client code for the Parliament Members API - another external source
- Election2029.Tools.DataManager: a command-line tool for various data management tasks, including adding polls, updating data from external data sources etc
Importantly, this doesn't use Election2029.ViewModels at all. Until I started writing this post, there was a dependency from the DataManager project to the ViewModels project, which made me slightly nervous... but having speculatively removed it, I found that there were only a few references, which were easily fixed. (Arguably it's still useful for the DataManager to be able to perform some simple text formatting, e.g. "format the fieldwork dates for this poll" - I might move that into the models project in the future.)
The models

The models themselves are not just record definitions, although that's what I'll show here. I figure it's reasonable for the data models to have additional members for:
- Simple convenience calculations. For example, working out the total number of votes cast for a single constituency result by summing the number of votes cast for each party.
- Lookups based on the rest of the data. For example, a "projection set" contains a list of projected results by constituency; we have a lookup from constituency to "results for that constituency".
These could go in the view-models, but they're typically used by multiple view-models, so it makes sense to put them in the models.
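To make the two kinds of extra member concrete, here's a minimal sketch using heavily simplified stand-ins for the real models. The `ConstituencyResult` and `CandidateVotes` records and the `TotalVotes` and `ProjectionsByConstituency` member names are assumptions for illustration; the real records have more properties and may expose these differently.

```csharp
using System.Collections.Immutable;
using System.Linq;

// Simplified stand-ins for the real models (far fewer properties).
public sealed record Constituency(string Name, string Code);
public sealed record Party(string Id);
public sealed record CandidateVotes(string CandidateName, int Votes);

public sealed record ConstituencyResult(
    Constituency Constituency,
    ImmutableList<CandidateVotes> CandidateResults)
{
    // Convenience calculation: total votes cast, summed over all candidates.
    public int TotalVotes => CandidateResults.Sum(c => c.Votes);
}

public sealed record Projection(Constituency Constituency, Party Party);

public sealed record ProjectionSet(string Id, ImmutableList<Projection> Projections)
{
    // Lookup built once, at construction, from the list of projections.
    // Caveat: this auto-property has a backing field, so compiler-generated
    // record equality compares it and a "with" expression copies it as-is.
    // A real model would need to account for that.
    public ImmutableDictionary<Constituency, Projection> ProjectionsByConstituency { get; } =
        Projections.ToImmutableDictionary(p => p.Constituency);
}
```

Any view-model holding a `ProjectionSet` can then ask for the projection for a given constituency without rebuilding the lookup itself.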
The models are grouped together. There's an ElectionCoreContext which contains separate models which don't refer to each other, then ElectionContext which contains everything - it has an ElectionCoreContext, and then other models which can refer to models within the core context, but not other non-core models. This means that creating an ElectionContext consists of:
- Creating each element of the core context independently
- Creating the core context from those elements
- Creating each non-core element, with the core context for reference
- Creating the overall context from the core context and the non-core elements
This makes the data model much easier to work with than if elements couldn't refer to other models - for example, it means that a PartyChange can refer to Candidate and Party models, rather than just containing the ID of the candidate and the old/new party IDs. The core/non-core split means all of this can be done without any circular dependencies.
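The construction steps above can be sketched as follows, assuming a stored form that holds only IDs and is resolved against the core context. Everything here is a simplified illustration: `CoreContext`, `FullContext`, `StoredPartyChange` and `ContextBuilder` are hypothetical names, and the records have far fewer properties than the real ones.

```csharp
using System.Collections.Immutable;
using System.Linq;

// Simplified core models: these don't refer to each other.
public sealed record Party(string Id, string FullName);
public sealed record Candidate(int Id, string Name);
public sealed record CoreContext(ImmutableList<Candidate> Candidates, ImmutableList<Party> Parties);

// Non-core model: refers directly to core models rather than holding IDs.
public sealed record PartyChange(Candidate Candidate, Party OldParty, Party? NewParty);

// Hypothetical stored form of a party change: IDs only.
public sealed record StoredPartyChange(int CandidateId, string OldPartyId, string? NewPartyId);

public sealed record FullContext(CoreContext Core, ImmutableList<PartyChange> PartyChanges);

public static class ContextBuilder
{
    public static FullContext Build(
        ImmutableList<Candidate> candidates,
        ImmutableList<Party> parties,
        ImmutableList<StoredPartyChange> storedChanges)
    {
        // Steps 1 and 2: core elements are created independently, then combined.
        var core = new CoreContext(candidates, parties);

        // Step 3: non-core elements are created with the core context for reference.
        var candidatesById = core.Candidates.ToDictionary(c => c.Id);
        var partiesById = core.Parties.ToDictionary(p => p.Id);
        var changes = storedChanges
            .Select(s => new PartyChange(
                candidatesById[s.CandidateId],
                partiesById[s.OldPartyId],
                s.NewPartyId is null ? null : partiesById[s.NewPartyId]))
            .ToImmutableList();

        // Step 4: the overall context combines the core context and non-core elements.
        return new FullContext(core, changes);
    }
}
```

Because the non-core tier only ever looks "down" into the core tier, each tier can be fully constructed before the next one starts, which is what avoids circular dependencies in this sketch.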
In all of the declarations below, I'm omitting any extra interfaces that the records implement, as well as any code within the record.
Quick terminology note: the "context" here is just my term for "the latest state of the world". It's got nothing to do with Entity Framework, and any resemblance to DDD bounded contexts is at least somewhat coincidental. (Or I suspect you could think of the whole ElectionContext as a single bounded context - the system is small enough that the one boundary goes around everything. I'm not sufficiently knowledgeable about DDD to understand how well the system fits with it.)
Core models

The ElectionCoreContext has the following declaration:

public sealed record ElectionCoreContext(
    ImmutableList<Constituency> Constituencies,
    ImmutableList<Candidate> Candidates,
    ImmutableList<Party> Parties,
    ImmutableList<ElectoralCommissionParty> ElectoralCommissionParties,
    ImmutableList<DataProvider> DataProviders);
The elements within those are:
public sealed record Constituency(string Name, string Code, string? Code2023, string HexRQ, ConstituencyLinks Links);

public sealed record ConstituencyLinks(string? WhoCanIVoteFor2024, string? WhoCanIVoteFor2029, int ParliamentId);

public sealed record Candidate(int Id, string Name, int? MySocietyId, int? ParliamentId);

public sealed record Party(
    string Id, string BriefName, string FullName,
    ImmutableList<string> ElectoralCommisionIds, string? ParliamentAbbr,
    string CssPrefix, ImmutableList<string> ColorsByStrength);

public sealed record ElectoralCommissionParty(string ElectoralCommissionId, string Name);

public sealed record DataProvider(
    string Id, string Name, bool Enabled, string DescriptionHtml,
    string? Link, string? DataDirectory);
A few notes on these:
- In general I've found that modeling records in terms of lists rather than sets or dictionaries makes it easier to keep data consistent... even if the order of a collection doesn't logically matter (e.g. for constituencies), it's really handy to have a canonical ordering so that equality tests are simpler, diffs are easy to read etc.
- ConstituencyLinks could be inlined into Constituency (and indeed it is, in storage). It's inconsistent with how Candidate has the IDs from external data sources inlined. This may well change over time.
- The "core elements don't refer to each other" aspect is somewhat polluted by a Party having a list of Electoral Commission IDs. In theory I could have increased the number of "tiers" within the model so that a party could have an ImmutableList<ElectoralCommissionParty> instead... but it turns out that I very rarely need to refer to Electoral Commission parties. (The background on this is that there's a very large number of parties registered with the Electoral Commission. Most of the time we only need major parties, and we don't want or need to distinguish between, say, "Green Party" and "Scottish Green Party" - or indeed the two Electoral Commission parties which are both called "Conservative and Unionist Party".)
- The DataDirectory part of DataProvider is only used by the tooling... that feels a little odd, but it's not particularly unreasonable.
- DataProvider (which is basically for polls and seat projections) has an Enabled flag to allow me to ingest data from a provider without displaying it, while I'm obtaining permission to use it on the site.
So far, so relatively simple. The full ElectionContext is rather bigger:
public sealed record ElectionContext(
    ElectionCoreContext CoreContext,
    PostcodeMapping PostcodeMapping,
    ImmutableList<ByElection> ByElections,
    ImmutableList<Ballot> Ballots2029,
    ResultSet Results2029,
    ResultSet Results2024,
    ImmutableList<NotionalResult> NotionalResults2019,
    ImmutableList<ProjectionSet> ProjectionSets,
    ImmutableList<Poll> Polls,
    ImmutableList<PartyChange> PartyChanges,
    ImmutableList<CurrentRepresentation> CurrentRepresentation);
The elements within those are:
public sealed record PostcodeMapping(ImmutableList<OutcodeMapping> OutcodeMappings);

public sealed record OutcodeMapping(string Outcode, ImmutableList<Constituency> Constituencies, ReadOnlyMemory<byte> Map);

public sealed record ByElection(LocalDate Date, Result Result);

public sealed record Result(
    Constituency Constituency, Party WinningParty,
    ImmutableList<Candidacy>? CandidateResults, int? SpoiltBallots,
    int? RegisteredVoters, Instant IngestionTime, Instant? DeclarationTime);

public sealed record Candidacy(Candidate Candidate, ElectoralCommissionParty Party, int? Votes);

public sealed record Ballot(Constituency Constituency, ImmutableList<Candidacy> Candidacies);

public sealed record ResultSet(ImmutableList<Result> Results, Instant? LastUpdated);

public sealed record NotionalResult(
    Constituency Constituency, ImmutableList<NotionalCandidacy> CandidateResults,
    int AbsoluteMajority, decimal Turnout);

public sealed record ProjectionSet(
    string Id, DataProvider Provider, string Name, string Abbreviation,
    LocalDate FieldworkStart, LocalDate FieldworkEnd, LocalDate PublicationDate,
    string? ArticleLink, string? DataLink,
    ImmutableList<Projection> Projections, ImmutableList<ProjectionDistribution> Distributions);

public sealed record Projection(
    Constituency Constituency, Party Party, ProjectionStrength Strength,
    string? Link, string? Description,
    ImmutableList<ProjectionMeasure>? VictoryChances, ImmutableList<ProjectionMeasure>? VoteShares);

public enum ProjectionStrength { Tossup, Lean, Likely, Safe };

public sealed record ProjectionMeasure(Party Party, decimal Percentage);

public sealed record Poll(
    string Id, DataProvider Provider, ImmutableList<PollValue> Values, string? PreviousId,
    LocalDate FieldworkStart, LocalDate FieldworkEnd, LocalDate PublicationDate,
    string? ArticleLink, string? DataLink);

public sealed record PollValue(Party Party, decimal Share, decimal? ShareChange);

public sealed record PartyChange(
    Candidate Candidate, Constituency Constituency, Party OldParty, Party? NewParty,
    LocalDate Date, string Description);

public sealed record CurrentRepresentation(Constituency Constituency, Candidate? Member, Party? Party);
More notes:
- ReadOnlyMemory<byte> isn't necessarily immutable (e.g. it can wrap a byte[]); I might change this to ImmutableList<byte> at some point, although in practice it doesn't cause any problems. The structure of postcode data is an interesting topic in its own right which - you've guessed it - I'll cover in another post.
- For non-UK readers, a by-election is an election out of the normal cycle, caused by an MP resigning, dying, or being ejected via a petition. (By-elections happen in non-parliamentary-constituency contexts as well, but my site only deals with parliamentary constituencies.)
- Within the system, the term "ballot" is effectively "the list of candidates in a constituency"; in everyday usage it can also mean "a single vote" but I couldn't think of a better term, and this is what's used by Democracy Club.
- We could probably manage without the LastUpdated property of ResultSet, but it would still make sense to have ResultSet as its own type, as it includes a simplified mapping from constituency to result. It's not (yet) worth doing this for the 2019 notional results as they're used in far fewer places.
- Speaking of the 2019 notional results... these are "special" because there were significant constituency boundary changes between 2019 and 2024. The notional results are a prediction of what the 2019 actual results would have looked like under the new boundaries. For significantly more detail, consult the "Rallings & Thrasher" document on the topic.
- The Poll.PreviousId property is used to compute the ShareChange. It should probably go away - it's not used anywhere in the site. (One nice side-effect of writing these posts is the effect of a sort of code review. I may have removed it by the time I actually publish this post.)
- PartyChange isn't as normalized as it might be. Arguably we should record the history of parties for a given person, and then reflect that in the constituency view by considering the history of "who was the MP at any given time, and which party were they representing at any given time" - this is definitely an area I might revisit in the next few years. (Once we've had the first actual by-election, which will be quite soon, it'll be easier to reason about.)
- CurrentRepresentation is similarly a little odd, in that we should be able to derive the data from the combination of 2024 election results, by-elections, and party changes. It happens to be convenient at the moment, but may well become less so over time.
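Going back to the Poll.PreviousId note above: here's a hedged sketch of how a share change could be derived from the previous poll. It uses cut-down stand-ins for Poll and PollValue (with a plain party ID string instead of a Party model), and the `PollChanges.WithShareChanges` helper is a hypothetical name rather than anything in the real tooling.

```csharp
using System.Collections.Immutable;
using System.Linq;

// Cut-down stand-ins for the real Poll and PollValue records.
public sealed record PollValue(string PartyId, decimal Share, decimal? ShareChange);
public sealed record Poll(string Id, string? PreviousId, ImmutableList<PollValue> Values);

public static class PollChanges
{
    // Returns a copy of the poll in which each value's ShareChange is the difference
    // from the same party's share in the previous poll. The change is null when there
    // is no previous poll, or the party wasn't included in it.
    public static Poll WithShareChanges(Poll poll, ImmutableList<Poll> allPolls)
    {
        var previous = allPolls.FirstOrDefault(p => p.Id == poll.PreviousId);
        var values = poll.Values
            .Select(v => v with
            {
                ShareChange = v.Share
                    - previous?.Values.FirstOrDefault(pv => pv.PartyId == v.PartyId)?.Share
            })
            .ToImmutableList();
        return poll with { Values = values };
    }
}
```

If the change is computed once like this when a poll is ingested, the stored PollValue carries everything the site needs, which fits with PreviousId not being used anywhere in the site.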
The aspects about PartyChange and CurrentRepresentation make me wonder whether I might introduce a ConstituencyHistory model that isn't stored anywhere, but is derived in the ElectionContext on construction.
Final thoughts

As noted earlier, both the models and the view-models are immutable. In other projects where I've attempted to use immutability, I've encountered quite a bit of friction - but the combination of this being a fully-read-only web site, and the functionality of C# records, has made it really pretty straightforward. The benefits I've always known about (making it easier to reason about what other code might do, and not having to worry about thread safety) are all still as valuable as ever... but in this particular project, the limitations and frustrations haven't been a problem. C# records have some challenges and annoyances of their own, but nothing really problematic.
One aspect I'll highlight now and then talk about later when posting specifically about performance is that within a single ElectionContext, reference equality for models is all we ever need. If you have two ProjectionSet references that refer to different objects (but are part of the same ElectionContext), they'll definitely be "different" projection sets. There are a very few models where that's only coincidentally true, such as ProjectionMeasure, but those don't have any logical identity. All the "primary" models really do have some form of identity, and within an ElectionContext reference equality is equivalent to logical identity.
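As one illustration of what relying on reference equality can look like (an assumption about possible usage, not a description of the site's code), .NET's built-in ReferenceEqualityComparer (available from .NET 5) lets a dictionary key on object identity rather than the value equality that records generate. The `ReferenceKeyed` helper and the cut-down ProjectionMeasure record below are hypothetical.

```csharp
using System.Collections.Generic;

// Cut-down stand-in: the real ProjectionMeasure refers to a Party model.
public sealed record ProjectionMeasure(string PartyId, decimal Percentage);

public static class ReferenceKeyed
{
    // Creates a dictionary keyed by object identity. ReferenceEqualityComparer
    // implements IEqualityComparer<object?>, which is usable for any reference-type
    // key because IEqualityComparer<T> is contravariant.
    public static Dictionary<TKey, TValue> CreateDictionary<TKey, TValue>()
        where TKey : class =>
        new Dictionary<TKey, TValue>(ReferenceEqualityComparer.Instance);
}
```

With this, two ProjectionMeasure instances holding the same values compare equal via the record's generated Equals, but are treated as separate keys by the dictionary. For models with real identity inside one context the two notions coincide, and the reference comparison avoids comparing every property.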
Having described the data model and the MVVM (at least notionally) approach to the site, there are now plenty of options I can choose from for what to write about next. I think I may go into the storage side of things...