Building an election website
I don't know much about my blog readership, so let's start off with two facts that you may not be aware of:
- I live in the UK.
- The UK has a general election on July 4th 2024.
I'm politically engaged, and this is a particularly interesting election. The Conservative party have been in office for 14 years, and all the polls show them losing the upcoming election massively. Our family is going to spend election night with some friends, staying up for as many of the results as we can while still getting enough sleep for me to drive safely home the next day.
I recently started reading Comment is Freed, the Substack of Sam and Lawrence Freedman. This Substack is publishing an article every day in the run-up to the election, and I'm particularly interested in Sam's brief per-constituency analysis and predictions. It was this site that made me want to create my own website for tracking the election results - primarily for on-the-night results, but also for easy information lookup later.
In particular, I wanted to see how well the per-seat predictions matched reality. Pollsters in the UK are generally predicting three slightly different things:
- Overall vote share (what proportion of votes went to each party)
- Overall seat tallies (how many of the 650 individual constituencies each party won)
- Per-seat winners (sometimes with predicted majorities; sometimes with probabilities of winning)
The last of these typically manifests as what is known as an MRP prediction: Multi-level Regression and Poststratification. They're relatively new, and we're getting a lot of them in this election.
After seeing those MRPs appear over time, I reflected - and in retrospect this was obvious - that instead of only keeping track of how accurate Sam Freedman's predictions were, it would be much more interesting to look at the accuracy of all the MRPs I could get permission to use.
At the time of this writing, the site includes data from the following providers:
- The Financial Times
- Survation
- YouGov
- Ipsos
- More in Common
- Britain Elects (as published in The New Statesman)
I'm expecting to add predictions from Focaldata and Sam Freedman in the next week.
Information on the site
The site is at https://jonskeet.uk/election2024, and it just has three pages:
- The full view (or in colour) contains:
- Summary information:
- 2019 (notional) results and 2024 results so far
- Predictions and their accuracy so far (in terms of proportion of declared results which were correctly called)
- Hybrid "actual result if we know it, otherwise predicted" results for each prediction set
- 2019/2024 and predicted results for the four nations of the UK
- Per-seat information:
- The most recent results
- The biggest swings (for results where the swing is known; there may be results which don't yet have majority information)
- Recent "surprises", where a surprise is deemed to be a result where at least half the predictions were wrong
- "Contentious" constituencies - i.e. ones where the predictions disagree most with each other
- Notable losses/wins - I've picked candidates that I think will be most interesting to users, mostly cabinet and shadow cabinet members.
- All constituencies, in alphabetical order
- The simple view (or in colour) doesn't include predictions at all. It contains:
- 2019 (notional) results and 2024 results so far
- Recent results
- Notable losses/wins
- An introduction page so that most explanatory text can be kept off the main pages.
I have very little idea how much usage the site will get at all, but I'm hoping that folks who want a simple, up-to-date view of recent results will use the simple view, and those who want to check specific constituencies and see how the predictions are performing will use the full view.
The "colour mode" is optional because I'm really unsure whether I like it. In colour mode, results are colour-coded by party and (for predictions) likelihood. It does give an "at a glance" idea of the information, but only if you've checked which columns you're looking at to start with.
Implementation
This is a coding blog, and the main purpose of writing this post was to give a bit of information about the implementation to anyone interested.
The basic architecture is:
- ASP.NET Core Razor Pages, running on Google Kubernetes Engine (where my home page was already hosted)
- Constituency information, "notable candidates" and predictions are stored in Google Drive
- Result information for the site is stored in Firestore
- Result information originates from the API of The Democracy Club, and a separate process uploads the data to Firestore
- Each server refreshes its in-memory result data every 10 seconds and candidate/prediction data every 10 minutes, via a background hosted service (sketched after this list)
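Here's roughly what that last piece could look like. This is a minimal sketch rather than the real code: ElectionContextProvider and its refresh methods are hypothetical stand-ins for whatever types the site actually uses.

```csharp
using Microsoft.Extensions.Hosting;

// Minimal sketch of the background refresh loop. ElectionContextProvider
// and its methods are hypothetical stand-ins, not the site's real types.
public class ElectionContextRefresher : BackgroundService
{
    private readonly ElectionContextProvider provider;

    public ElectionContextRefresher(ElectionContextProvider provider) =>
        this.provider = provider;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Results every 10 seconds; candidate/prediction data every
        // 10 minutes, i.e. on every 60th tick of the 10-second timer.
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(10));
        int tick = 0;
        while (await timer.WaitForNextTickAsync(stoppingToken))
        {
            await provider.RefreshResultsAsync(stoppingToken);
            if (++tick % 60 == 0)
            {
                await provider.RefreshCandidatesAndPredictionsAsync(stoppingToken);
            }
        }
    }
}
```

A service like this would be registered with builder.Services.AddHostedService<ElectionContextRefresher>() at startup.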
A few notes on each of these choices...
I was always going to implement this in ASP.NET Core, of course. I did originally look at making it a Cloud Function, but currently the Functions Framework for .NET doesn't support Razor. It doesn't really need to, mind you: I could just deploy straight on Cloud Run. That would have been a better fit in terms of rapid scaling to be honest; my web site service in my GKE cluster only has two nodes. The cluster itself has three. If I spot there being a vast amount of traffic on the night, I can expand the cluster, but I don't expect that to be nearly as quick to scale as Cloud Run would be. Note to self: possibly deploy to Cloud Run as a backup, and redirect traffic on the night. It would take a bit of work to get the custom domain set up though. This is unlikely to actually be required: the busiest period is likely to be when most of the UK is asleep anyway, and the site is doing so little actual work that it should be able to support at least several hundred requests per second without any extra work.
Originally, I put all information, including results, in Google Drive. This is a data source I already use for my local church rota, and after a little initial setup with credential information and granting the right permissions, it's really simple to use. Effectively I load a single sheet from the overall spreadsheet in each API request, with a trivial piece of logic to map each row into a dictionary from column name to value. Is this the most efficient way of storing and retrieving data? Absolutely not. But it's not happening often, the code ends up being really easy to read, and the data is very easy to create and update. (As of 2024-06-24, I've added the ability to load several sheets within a single request, which unexpectedly simplified some other code too. But the "treat each row as a string-to-string dictionary" design remains.)
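To give a flavour of that, here's a rough sketch of loading one sheet and applying the row-to-dictionary mapping, using the Google Sheets v4 client library. The spreadsheet ID, range and credential setup are placeholders, not what the site actually uses.

```csharp
using Google.Apis.Auth.OAuth2;
using Google.Apis.Services;
using Google.Apis.Sheets.v4;

// Load a single sheet and map each row to a column-name-to-value dictionary.
// "SPREADSHEET_ID" and the range are placeholders.
var credential = (await GoogleCredential.GetApplicationDefaultAsync())
    .CreateScoped(SheetsService.Scope.SpreadsheetsReadonly);
var service = new SheetsService(new BaseClientService.Initializer
{
    HttpClientInitializer = credential,
    ApplicationName = "election-site"
});

var response = await service.Spreadsheets.Values
    .Get("SPREADSHEET_ID", "Constituencies!A:Z")
    .ExecuteAsync();

// First row is the header row; Zip quietly truncates short rows.
var headers = response.Values[0].Select(cell => cell?.ToString() ?? "").ToList();
var rows = response.Values.Skip(1)
    .Select(row => headers
        .Zip(row, (name, value) => (name, value: value?.ToString() ?? ""))
        .ToDictionary(pair => pair.name, pair => pair.value))
    .ToList();
```

The inefficiency mentioned above is visible here: every column in the range comes back whether or not it's ever used, but each row is then trivially easy to consume.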
For each "prediction provider" I store the data using the relevant sheets from the original spreadsheets downloaded from the sites. (Most providers have a spreadsheet available; I've only had to resort to scraping in a couple of cases.) Again, this is inefficient - it means fetching data for columns I'll never actually access. But it means when a provider releases a new poll, I can have the site using it within minutes.
An alternative approach would be to do what I've done for results - I could put all the prediction information in Firestore in a consistent format. That would keep the site code straightforward, moving the per-provider code to tooling used to populate the Firestore data. If I were starting again from scratch, I'd probably do that - probably still using Google Sheets as an intermediate representation. It doesn't make any significant difference to the performance of the site, beyond the first few seconds after deployment. But it would probably be nice to only have a single source of data.
The "raw" data is stored in what I've called an ElectionContext - this is what's reloaded by the background service. This doesn't contain any processed information such as "most recent" results or "contentious" results. Each of the three page models then has a static cache. A request for a new model where the election context hasn't changed just reuses the existing model. This is currently done by setting ViewData.Model in the page model, to refer to the cached model. There may well be a more idiomatic way of doing this, but it works well. The upshot is that although the rendered page isn't cached (and I could look into doing that of course), everything else is - most requests don't need to do anything beyond simple rendering of already-processed data.
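In sketch form, the pattern looks something like the code below. The names here are invented for illustration (ElectionContextProvider.Current and BuildModel aren't the real members), but the shape - a static cache invalidated by reference comparison, with ViewData.Model pointed at the cached model - matches the description above.

```csharp
using Microsoft.AspNetCore.Mvc.RazorPages;

// Sketch of the per-page-model static cache. ElectionContext and the
// provider accessor are hypothetical stand-ins for the real types.
public class FullViewModel : PageModel
{
    private static readonly object cacheLock = new();
    private static ElectionContext? cachedContext;
    private static FullViewModel? cachedModel;

    public void OnGet()
    {
        ElectionContext context = ElectionContextProvider.Current; // hypothetical
        lock (cacheLock)
        {
            if (!ReferenceEquals(context, cachedContext))
            {
                // Recompute derived data (recent results, surprises,
                // contentious seats, ...) only when the context has changed.
                cachedContext = context;
                cachedModel = BuildModel(context);
            }
        }
        // Point the view at the cached model rather than at 'this'.
        ViewData.Model = cachedModel;
    }

    private static FullViewModel BuildModel(ElectionContext context) =>
        new() { /* populate processed properties from the context */ };
}
```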
I was very grateful to be informed about the Democracy Club API - I was expecting to have to enter all the result data manually myself (which was one reason for keeping it in Google Sheets). The API isn't massively convenient, as it involves mapping party IDs to parties, ballot paper IDs to constituency IDs, and then fetching the results - but it only took a couple of hours to get the upload process for Firestore working. One downside of this approach is that I really won't be able to test it before the night - it would be lovely to have a fake server (running the same code) that I could ask to "start replaying 2019 election results", for example... but never mind. (I've tested it against the 2019 election results, to make sure I can actually do the conversion and upload etc.) You might be expecting this to be hosted in some sort of background service as well... but in reality it's just a console application which I'll run from my laptop on the night. Nothing to deploy, should be easy to debug and fix if anything goes wrong.
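In outline, the uploader could look something like the sketch below. The results endpoint and the DeclaredResult shape are invented for illustration - the real Democracy Club API involves the ID mapping described above - but the Firestore calls use the real Google.Cloud.Firestore client library.

```csharp
using System.Net.Http.Json;
using Google.Cloud.Firestore;

// Sketch of the console uploader: poll for declared results, push to Firestore.
// The endpoint URL and DeclaredResult are illustrative placeholders.
var http = new HttpClient();
var db = await FirestoreDb.CreateAsync("PROJECT_ID"); // placeholder project ID

while (true)
{
    var results = await http.GetFromJsonAsync<List<DeclaredResult>>(
        "https://example.org/declared-results") ?? [];
    foreach (var result in results)
    {
        // One document per constituency; SetAsync overwrites, so
        // re-running after a crash is harmless.
        await db.Collection("results")
            .Document(result.ConstituencyId)
            .SetAsync(new
            {
                result.WinningParty,
                result.Majority,
                Updated = Timestamp.GetCurrentTimestamp()
            });
    }
    await Task.Delay(TimeSpan.FromSeconds(30));
}

// Illustrative shape for a declared result.
public record DeclaredResult(string ConstituencyId, string WinningParty, int Majority);
```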
In terms of the UI for the site itself, the kind way to put it would be "efficient and simplistic". It's just HTML and CSS, and no request will trigger any other requests. The CSS is served inline (rather than via a separate CSS resource) - it's small enough not to be a problem, and that felt simpler than making sure I handled caching appropriately. There's no JS at all - partly because it's not necessary, and partly because my knowledge of JS is almost non-existent. Arguably with JS in place I could make it autorefresh... but that's about all I'd want to do, and it feels like more trouble than it's worth. The good news is that this approach ends up with a really small page size. In non-colour mode, the simple view is currently about 2.5K, and the full view is about 55K. Both will get larger as results come in, but I'd be surprised to see them exceed 10K and 100K respectively, which means the site will probably be among the most bandwidth-efficient ways of accessing election data on the night.
Conclusion
I've had a lot of fun working on this. I'll leave the site up after the election, possibly migrating all the data to Firestore at some point.
I've experienced yet again the joy of working on something slightly out of my comfort zone (I've learned bits of HTML and CSS I wasn't aware of before, learned more about Razor Pages, and used C# records more than I have elsewhere - and I love collection expressions) that is also something I want to use myself. It's been great.
Unfortunately at the moment I can't really make the code open source... but I'll consider doing so after the election, as a separate standalone site (as opposed to part of my home page). It shouldn't be too hard to do - although I should warn readers that the code is very much in the "quick and dirty hack" style.
Feedback welcome in the comments - and of course, I encourage the use of the site on July 4th/5th and afterwards...