Election 2029: The Impossible Exception – Solved
data:image/s3,"s3://crabby-images/aec73/aec73d098d6bb113fee95b8d26b80c8c842b9ae3" alt="Story Image"
Shortly after writing my previous post, a colleague pinged me to say she'd figured out what was wrong - at least at the most immediate level, i.e. the exception itself. Nothing is wrong with the ordering code - it's just that the exception message is too easy to misread. She's absolutely right, and I'm kicking myself.
Here's the exception message again:
Incorrect ordering for PredictionSets: mic-01 should occur before focaldata-01
and the code creating that:
string currentText = selector(current);string nextText = selector(next);if (StringComparer.Ordinal.Compare(currentText, nextText) >= 0){ throw new InvalidOperationException( $"Incorrect ordering for {message}: {currentText} should occur before {nextText}");}
In my previous post, I then claimed:
The exception message implies that at the time of the exception, the value of currentText was "focaldata-01", and the value of nextText was "mic-01".
No, it doesn't! It implies the exact opposite, that the value of currentText was "mic-01", and the value of nextText was "focaldata-01"... in other words, that the data was genuinely wrong.
Sigh. Even while constantly thinking when my code misbehaves, it's almost always my fault" I'm still not capable of really taking a step back and double-checking my logic properly.
But this is odd, right? Because the data that had previously been invalid (20:15:57) magically became" valid later (20:26:22), right? That's what I claimed in the last post. I should have looked at the logs more carefully... a new instance was started at 20:22:58. That new instance loaded the data correctly, so reloading the already-valid data was fine.
What was actually wrong?I've started writing this post before actually fixing the code, but I'm now sure that the problem is with partial" reload - adding a new prediction set to the database and then reloading the data from a storage system that already has the existing data in its cache. This should be relatively easy to test -
First, it's worth fixing that message. Instead of talking about what should occur" let's say what actually is the case, with the index into the collection at which things go wrong:
foreach (var (index, (current, next)) in source.Zip(source.Skip(1)).Index()){ string currentText = selector(current); string nextText = selector(next); if (StringComparer.Ordinal.Compare(currentText, nextText) >= 0) { throw new InvalidOperationException($"Incorrect ordering: {message}[{index}]={currentText}; {message}[{index + 1}]={nextText}"); }}
Next, let's add another level of checking when uploading new data: as well as reloading twice from a clean start, let's add a before then after" reload. The code for this isn't interesting (although it was fiddly for dependency-injection reasons). Then just test adding a definitely first" prediction set with an ID of aaaa"...
Hooray, I've reproduced the problem!
Incorrect ordering: PredictionSets[4]=name-length; PredictionSets[5]=aaaa
After that, it didn't take very long (with a bit more logging) to find the problem. Once I'd found it, it was really easy to fix. Without going into too much unnecessary detail, I was corrupting my internal mappings from hash to full data" when combining new and old mappings.
var predictionSetsByHash = newHashes.Concat(currentHashes) .Zip(currentPredictionSets.Concat(newPredictionSets)) .ToOrdinalDictionary(pair => pair.First, pair => pair.Second);
should have been:
var predictionSetsByHash = newHashes.Concat(currentHashes) .Zip(newPredictionSets.Concat(currentPredictionSets)) .ToOrdinalDictionary(pair => pair.First, pair => pair.Second);
This would only ever be a problem when loading a context with a new prediction set, when we previously had a prediction set.
This is where my election site not having many automated tests (these would have had to be integration tests rather than unit tests, probably) fell short... although it's one of the few times that's been the case, to be fair to myself.
It's probably time to start writing some more tests - especially as in this case, this was a whole context storage system that had been rewritten in the early hours of the morning.
ConclusionSo, a few lessons learned:
- Yup, when my code misbehaves, it's almost always my fault. Even when I stare at it and think I've somehow found something really weird.
- I should write more tests.
- It's really important to make exception messages as unambiguous as possible.
- I should always listen to Amanda.