Lessons from election night
On Thursday (July 4th, 2024) the UK held a general election. There are many, many blog posts, newspaper articles, podcast episodes etc covering the politics of it, and the lessons that the various political parties may need to learn. I, on the other hand, learned very different lessons on the night of the 4th and the early morning of the 5th.
In my previous blog post, I described the steps I'd taken at that point to build my election web site. At the time, there was no JavaScript - I later added the map view, interactive view and live view which all do require JavaScript. Building those three views, adding more prediction providers, and generally tidying things up a bit took a lot of my time in the week and a half between the blog post and the election - but the election night itself was busier still.
Only two things really "went wrong" as such on the night, though they were pretty impactful.
Result entry woes
Firstly, the web site used to crowdsource results for Democracy Club had issues. I don't know the details, and I'm certainly not looking to cause any trouble or blame anyone. But just before 2am, the web site no longer loaded, which meant no new results were being added. My site doesn't use the Democracy Club API directly - instead, it loads data from a Firestore database, and I have a command-line tool which effectively copies the data from the Democracy Club API to Firestore. It worked very smoothly to start with - in fact, the first result came in while I was coding a new feature (using the exit poll as another prediction provider) and I didn't even notice. But obviously, when the results stop being submitted, that's a problem.
At first, I added the results manually via the Firestore console, clearing the backlog of results that I'd typed into a text document as my wife had been calling them out from the TV. I'd hoped the web site problems were just a blip, and that I could keep up via manual result entry while the Democracy Club folks sorted it out. (It seemed unlikely that I'd be able to help fix the site, so I tried to avoid interrupting their work instead.) At one point the web site did come back briefly, but then went down again - at which point I decided to assume that it wouldn't be reliable again during the night, and that I needed a more efficient solution than the Firestore console. I checked periodically later on, and found that the web site did come back occasionally, but it was down as often as it was up, so after a while I stopped even looking. Maybe it was all sorted by the time I'd got my backup solution ready.
That backup solution was to use Google Sheets. This was what I'd intended from the start of the project, before I knew about Democracy Club at all. I've only used the Google Sheets API to scrape data from sheets, but it makes that really quite simple. The code was already set up, including a simple "row to dictionary" mapping utility method, and the existing tooling targeting Democracy Club already had a lot of the logic to avoid re-writing existing results - so creating a new tool to combine those bits took no more than about 20 minutes. Bear in mind though that this was at 2:30am, with more results coming in all the time, and I'd foolishly had a mojito earlier on.
After a couple of brief teething problems, the spreadsheet result sync tool was in place. I just needed to type the winning party into the spreadsheet next to the constituency name, and every 10 seconds the tool would check for changes and upload any new results. It was a frantic job trying to keep up with the results as they came in (or at least be close to keeping up), but it worked.
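For illustration, here's roughly the shape that sync tool takes. This is a sketch rather than the real code: the sheet range, the "results" collection, and the field names are all assumptions, and it only ever adds results it hasn't seen before.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Google.Apis.Sheets.v4;
using Google.Cloud.Firestore;

public static class ResultSheetSync
{
    public static async Task RunAsync(SheetsService sheets, FirestoreDb db,
        string spreadsheetId, CancellationToken token)
    {
        // Remember which constituencies have already been uploaded, so existing
        // results aren't rewritten on every pass.
        var uploaded = new HashSet<string>();
        while (!token.IsCancellationRequested)
        {
            // Column A: constituency name; column B: winning party (assumed layout).
            var response = await sheets.Spreadsheets.Values
                .Get(spreadsheetId, "Results!A:B")
                .ExecuteAsync(token);
            foreach (var row in response.Values ?? new List<IList<object>>())
            {
                if (row.Count < 2 || string.IsNullOrWhiteSpace(row[1]?.ToString()))
                {
                    continue;
                }
                string constituency = row[0].ToString();
                if (!uploaded.Add(constituency))
                {
                    continue;
                }
                await db.Collection("results").Document(constituency).SetAsync(
                    new Dictionary<string, object>
                    {
                        ["winningParty"] = row[1].ToString(),
                        ["updated"] = Timestamp.GetCurrentTimestamp()
                    }, cancellationToken: token);
            }
            // Poll the sheet every 10 seconds.
            await Task.Delay(TimeSpan.FromSeconds(10), token);
        }
    }
}
```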
Then the site broke, at 5:42am.
Outage! 11 minutes of (partial) downtime
The whole site has been developed rapidly, with no unit tests and relatively little testing in general, beyond what I could easily check with ad hoc data. (In particular, I would check new prediction data locally before deploying to production.) I'd checked a few things with test results, but I hadn't tested this statement:
```csharp
Results2024Predicted = context.Predictions
    .Select(ps => (ps, GetResults(ps.GetPartyOrNotPredicted)))
    // Ignore prediction sets with no predictions in this nation.
    .Where(pair => pair.Item2[Party.NotPredicted] != Constituencies.Count)
    .ToList();
```
The comment indicates the purpose of the Where call - I have a sort of "fake" value in the Party enum for "this seat hasn't been predicted, or doesn't have a result". That worked absolutely fine - until enough results had come in that at about 5:42am one of the nations (I forget which one) no longer had any outstanding seats. At that point, the dictionary in pair.Item2 (yes, it would be clearer with a named tuple element) didn't have Party.NotPredicted as a key, and this code threw an exception.
One of the friends I was with spotted that the site was down before I did, and I was already working on it when I received a direct message on Twitter from Sam Freedman about the outage. Yikes. Fortunately by now the impact of the mojito was waning, but the lack of sleep was a significant impairment. In fact it wasn't the whole site that was down - just the main view. Those looking at the "simple", "live", "map" or "interactive" views would still have been fine. But that's relatively cold comfort.
While this isn't the fix I would have written with more time, this is what I pushed at 5:51am:
```csharp
Results2024Predicted = context.Predictions
    .Select(ps => (ps, GetResults(ps.GetPartyOrNotPredicted)))
    // Ignore prediction sets with no predictions in this nation.
    .Where(pair => !pair.Item2.TryGetValue(Party.NotPredicted, out var count) || count != Constituencies.Count)
    .ToList();
```
Obviously I tested that locally before pushing to production, but I was certainly keen to get it out immediately. Fortunately, the fix really was that simple. At 5:53am, through the magic of Cloud Build and Kubernetes, the site was up and running again.
So those were the two really significant issues of the night. There were some other mild annoyances which I'll pick up on below, but overall I was thrilled.
What went well?
Overall, this has been an immensely positive experience. It went from a random idea in chat with a friend on June 7th to a web site I felt comfortable sharing via Twitter, with a reasonable amount of confidence that it could survive modest viral popularity. Links in a couple of Sam Freedman's posts definitely boosted the profile, and monitoring suggests I had about 30 users with the "live" view which refreshes the content via JavaScript every 10 seconds. Obviously 30 users isn't huge, but I'll definitely take it - this is in the middle of the night, with plenty of other ways of getting coverage.
I've learned lots of "small" things about Razor pages, HTML, CSS and JavaScript, as well as plenty of broader aspects that I've described below.
Other than the short outage just before 6am - which obviously I'm kicking myself about - the site behaved itself really well. The fact that I felt confident deploying a new feature (the exit poll predictions) at 11:30pm, and removing a feature (the swing reporting, which was incorrect based on majority percentages) at 3am is an indication of how happy I am with the code overall. I aimed to create a simple site, and I did so.
What would I do differently next time?
Some of the points below were thoughts I'd had before election night. Some of them were considered before election night, but only confirmed in terms of "yes, this really did turn out to be a problem" on election night. Some were really unexpected.
Don't drink!
At about 7pm, I'd been expecting to spend the time after the exit poll was announced developing a tool to populate my result database from a spreadsheet, as I hadn't seen any confirmation from Democracy Club that the results pages were going to be up. During dinner, I saw messages on Slack saying it would all be okay - so I decided it would be okay to have a cocktail just after the exit polls came out. After all, I wasn't really expecting to be active beyond confirming results on the Democracy Club page.
That was a mistake, as the next 10 hours were spent:
- Adding the exit poll feature (which I really should have anticipated)
- Developing the spreadsheet-to-database result populator anyway
- Frantically adding results to the spreadsheet as quickly as I could
I suspect all of that would have been slightly easier with a clear head.
Avoid clunky data entry where possible (but plan ahead)
When the Democracy Club result confirmation site went down, I wasn't sure what to do. I had to decide between committing to "I need new tooling now" and accepting that there'd be no result updates while I was writing it, or doing what I could to add results manually via the Firestore console, hoping that the result site would be back up shortly.
I took the latter option, and that was a mistake - I should have gone straight for writing the tool. But really, the mistake was not writing the tool ahead of time. If I'd written the tool days before just in case, not only would I have saved that coding time on the night, but I could also have added more validation to avoid data entry errors.
To be specific: I accidentally copied a load of constituency names into my result spreadsheet where the party names should have been. They were dutifully uploaded to Firestore, and I then deleted each of those records manually. I then pasted the same set of constituency names into the same (wrong) place in the spreadsheet again, because I'm a muppet. In my defence, this was probably at about 6am - but that's why it would have been good to have written the tool to anticipate data entry errors. (The second time I made the mistake, I adjusted the tool so that fixing the spreadsheet would fix the data in Firestore too.)
Better full cache invalidation than "redeploy the site"
A couple of times - again, due to manual data entry, this time of timestamp values - the site ended up polling data waiting for results to be uploaded two hours in the future. Likewise even before the night itself, my "reload non-result data every 10 minutes" policy was slightly unfortunate. (I'd put a couple of candidates in the wrong seats.) I always had a way of flushing the cache: just redeploy the site. The cache was only in memory, after all. Redeploying is certainly effective - but it's clunky and annoying.
In the future, I expect to have something in the database to say "reload all data now". That may well be a Firestore document which also contains other site options such as how frequently to reload other data. I may investigate the IOptionsMonitor interface for that.
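One way that could look - very much a sketch, with made-up document and field names and a hypothetical cache interface - is a Firestore snapshot listener on a single options document:

```csharp
using Google.Cloud.Firestore;

// Hypothetical abstraction over the site's in-memory cache.
public interface IElectionDataCache
{
    void InvalidateAll();
}

public class SiteOptionsWatcher
{
    private readonly FirestoreDb db;
    private readonly IElectionDataCache cache;

    public SiteOptionsWatcher(FirestoreDb db, IElectionDataCache cache) =>
        (this.db, this.cache) = (db, cache);

    // Listens to a single "site options" document; any change to the flag
    // triggers a full reload, without redeploying anything.
    public FirestoreChangeListener Start() =>
        db.Collection("config").Document("siteOptions").Listen(snapshot =>
        {
            if (snapshot.Exists
                && snapshot.TryGetValue("reloadAllDataNow", out bool reload)
                && reload)
            {
                cache.InvalidateAll();
            }
        });
}
```

Whether that ends up wired through IOptionsMonitor or stays as a plain listener like this is something to work out later.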
Better "no result update" than "site down"
The issue with the site going down was embarrassing, of course. I started thinking about how I could avoid that in the future. Most of the site is really very static - the only thing that drives any change in page content is when some aspect of the data is reloaded. With the existing code, there's no "load the data" within the serving path - it's all updated periodically with a background service. The background service then provides an ElectionContext which can be retrieved from all the Razor page code-behind classes, and that's effectively transformed into a view-model for the page. The view-model is then cached while the ElectionContext hasn't changed, to avoid recomputing how many seats have been won by each party etc.
The bug that brought the site down - or rather, the main view - was in the computation of the view-model. If the code providing the ElectionContext instead provided the view-model, keeping the view-model computation out of the serving path, then a failure to build the view-model would just mean stale data instead of a page load failure. (At least until the server was restarted, of course.) Admittedly if the code computing the view-model naively transformed the ElectionContext into all the view-models, then a failure in one would cause all the view-models to fail to update. This should be relatively easy to avoid though.
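Something like the following captures the idea; it's a sketch, not the site's code, and the type names are placeholders:

```csharp
using System;

// A cached view-model wrapper: Refresh tries to rebuild the view-model, and
// keeps the last good value if the rebuild throws, so the serving path never
// recomputes (or fails on) anything.
public sealed class CachedViewModel<TContext, TViewModel>
    where TViewModel : class
{
    private readonly Func<TContext, TViewModel> build;

    public CachedViewModel(Func<TContext, TViewModel> build) => this.build = build;

    // The most recently successfully-built view-model; the Razor page only reads this.
    public TViewModel? Current { get; private set; }

    // Called from the background refresh, never from request handling.
    public void Refresh(TContext context)
    {
        try
        {
            Current = build(context);
        }
        catch (Exception e)
        {
            // Keep serving the previous (stale) view-model.
            Console.Error.WriteLine($"View-model rebuild failed: {e}");
        }
    }
}
```

Each view would own its own instance, so a failure rebuilding one view-model leaves the others (and the serving path) untouched.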
My plan for the future is to have three clear layers in the new site:
- Underlying model, which is essentially the raw data for the election, loaded from Firestore and normalized
- View-models, which provide exactly what the views need but which don't actually depend on anything in ASP.NET Core itself (except maybe HtmlString)
- The views, with the view-models injected into the Razor pages
I expect to use a separate project for each of these, which should help to enforce layering and make it significantly easier to test the code.
Move data normalization and validation to earlier in the pipeline
The current site loads a lot of data from Google Sheets, using Firestore just for results. There's a lot of prediction-provider-specific code used to effectively transform those spreadsheets into a common format. This led to multiple problems:
- In order to check whether the data was valid with the transformation code, I had to start the web site
- The normalization happened every time the data was loaded
- If a prediction provider changed the spreadsheet format (which definitely happened...) I had to modify the code for it to handle both the old and the new format
- Adopting a new prediction provider (or even just a new prediction set) always required redeploying the site
- Loading data from Google Sheets is relatively slow (compared with Firestore) and the auth model for Sheets is more geared towards user credentials than services
All of this can be fixed by changing the process. If I move from "lots of code in the site to load from Sheets" to "lots of individual tools which populate Firestore, and a small amount of code in the site to read from Firestore", most of those problems go away. The transformation code can load all of the data and validate it before writing anything to Firestore, so we should never have any data that will cause the site itself to have problems. Adding a new prediction set or a new prediction provider should be a matter of adding collections and documents to Firestore, which the site can just pick up dynamically - no site-side code changes required.
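As a rough illustration of what one of those per-provider tools might look like - all the names here are hypothetical, and the real validation would be much richer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Google.Cloud.Firestore;

// Hypothetical row produced by the provider-specific scraping/parsing step.
public record PredictionRow(string ConstituencyCode, string PredictedParty, string Bucket);

public static class PredictionImporter
{
    public static async Task ImportAsync(FirestoreDb db,
        IReadOnlyCollection<string> knownConstituencies,
        IReadOnlyList<PredictionRow> rows)
    {
        // Validate the whole data set up front; any problem aborts the import
        // before a single document is written.
        var errors = rows
            .Where(r => !knownConstituencies.Contains(r.ConstituencyCode))
            .Select(r => $"Unknown constituency: {r.ConstituencyCode}")
            .ToList();
        if (errors.Count > 0)
        {
            throw new InvalidOperationException(string.Join(Environment.NewLine, errors));
        }

        // Only after validation do we touch Firestore, so the site never sees
        // half-transformed data.
        foreach (var row in rows)
        {
            await db.Collection("predictions").Document(row.ConstituencyCode)
                .SetAsync(new Dictionary<string, object>
                {
                    ["party"] = row.PredictedParty,
                    ["bucket"] = row.Bucket
                });
        }
    }
}
```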
The tooling doesn't necessarily even have to load from Google Sheets. In a couple of cases, my process was actually "scrape HTML from a site, reformat the HTML as a CSV file, then import that CSV file into Google Sheets." It would be better to just "scrape HTML, transform, upload to Firestore" without all the intermediate steps.
With that new process, I'd have been less nervous about "adding the exit poll prediction provider" on election night.
Capture more data
I had to turn down one feature - listing the size of swings and having a "biggest swings of the night" section - due to not capturing enough data. I'd hoped that "party + majority % in 2019" and "party + majority % in 2024" would be enough to derive the swing, but it doesn't work quite that way. In the future, I want to capture as much data as possible about the results (both past and present). That will initially mean "all the voting information in each election", but may also mean a richer data model for predictions - instead of bucketing the predictions into toss-up/lean/likely/safe, it would be good to be able to present the original provider data around each prediction, whether that's a predicted vote share, a "chance of the seat going to this party", or just a toss-up/lean/likely/safe bucketing. I'm hoping that looking at all the predictions from this time round will provide enough of an idea of how to design that data model for next time.
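One speculative shape for that richer prediction model might look something like this, keeping whatever the provider originally published alongside the derived bucket:

```csharp
using System.Collections.Generic;

public enum PredictionBucket { TossUp, Lean, Likely, Safe }

// Speculative data model: every name here is a guess at what might be useful,
// not a design decision.
public record ConstituencyPrediction(
    string ConstituencyCode,
    string PredictedParty,
    PredictionBucket Bucket,
    // Raw provider data, where available.
    IReadOnlyDictionary<string, double>? PredictedVoteShare, // party -> predicted share
    double? WinProbability);                                 // provider's chance of the predicted party winning
```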
Tests
Tests are good. I'm supportive of testing. I don't expect to write comprehensive tests for a future version, but where I can see the benefit, I would like to easily be able to write and run tests. That may well mean just one complex bit of functionality getting a load of testing and everything else being lightweight, but that would be better than nothing.
In designing for testability, it's likely that I'll also make sure I can run the site locally without connecting to any Google Cloud services... while I'll certainly have a Firestore "test" database separate from "prod", it would be nice if I could load the same data just from local JSON files too.
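That might end up as a small data-source abstraction with one implementation backed by Firestore and one backed by local JSON files - again just a sketch with assumed type names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Google.Cloud.Firestore;

// Minimal stand-in for the real result model.
[FirestoreData]
public class Result
{
    [FirestoreProperty] public string ConstituencyCode { get; set; } = "";
    [FirestoreProperty] public string WinningParty { get; set; } = "";
}

public interface IElectionDataSource
{
    Task<IReadOnlyList<Result>> LoadResultsAsync();
}

public class FirestoreDataSource : IElectionDataSource
{
    private readonly FirestoreDb db;
    public FirestoreDataSource(FirestoreDb db) => this.db = db;

    public async Task<IReadOnlyList<Result>> LoadResultsAsync()
    {
        var snapshot = await db.Collection("results").GetSnapshotAsync();
        return snapshot.Documents.Select(doc => doc.ConvertTo<Result>()).ToList();
    }
}

public class LocalJsonDataSource : IElectionDataSource
{
    private readonly string directory;
    public LocalJsonDataSource(string directory) => this.directory = directory;

    public async Task<IReadOnlyList<Result>> LoadResultsAsync()
    {
        // Load the same data from a local JSON file instead of Firestore.
        using var stream = File.OpenRead(Path.Combine(directory, "results.json"));
        return await JsonSerializer.DeserializeAsync<List<Result>>(stream) ?? new List<Result>();
    }
}
```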
What comes next?
I enjoyed this whole experience so much that I've registered the https://election2029.uk domain. I figure if I put some real time into this, instead of cobbling it all together in under a month, I could really produce something that would be useful to a much larger group of people. At the moment, I'm planning to use Cloud Run to host the site (still using ASP.NET Core for the implementation) but who knows what could change between now and the next election.
Ideally, this would be open source from the start, but there are some issues doing that which could be tricky to get around, at least at the moment. Additionally, I'd definitely want to build on Google Cloud again, and with a site that's so reliant on data, it would be odd to say "hey, you can look at the source for the site, but the data is all within my Google Cloud project, so you can't get at it." (Making the data publicly readable is another option, but that comes with issues too.) Maybe over the next few years I'll figure out a good way of handling this, but I'm putting that question aside for the moment.
I'm still going to aim to keep it pretty minimal in terms of styling, only using JavaScript where it really makes sense to do so. Currently, I'm not using any sort of framework (Vue, React, etc) and if I can keep things that way, I think I've got more chance of being able to understand what I'm doing - but I acknowledge that if the site becomes larger, the benefits of a framework might outweigh the drawbacks. It does raise the question of which one I'd pick though, given the timescale of the project...
Beyond 2029, I'll be starting to think about retirement. This project has definitely made me wonder whether retiring from full-time commercial work but providing tech tooling for progressive think-tanks might be a very pleasant way of easing myself into fuller retirement. But that's a long way off...