Violating the “smart enum” pattern in C#
For a while now, I've been a big fan of a pattern in C# which mimics Java enums to a certain extent. In general, it's a lovely pattern. Only after reading a comment on a recent blog post by Eric Lippert did I find out about a horrible flaw. Dubious thanks to John Payson for prompting this post. (And I fully intend to include this in the Abusing C# talk that I give every so often")
Trigger warning: this post contains code you may wish you'd never seen.
Let's start off with the happy path - a moment of calm before the storm.
When things go well"The idea is to create a class - which can be abstract - with a certain fixed set of instances, which are available via static properties on the class itself. The class is not sealed, but it prevents other classes from subclassing it by using a private constructor. Here's an example, including how it can be used:
using System;public abstract class Operation{ public static Operation Add { get; } = new AddOperation(); public static Operation Subtract { get; } = new SubtractOperation(); // Prevent non-nested subclasses. They can't call this constructor, // which is the only way of creating an instance. private Operation() {} public abstract int Apply(int lhs, int rhs); private class AddOperation : Operation { public override int Apply(int lhs, int rhs) { return lhs + rhs; } } private class SubtractOperation : Operation { public override int Apply(int lhs, int rhs) { return lhs - rhs; } }}public class Test{ static void Main() { Operation add = Operation.Add; Console.WriteLine(add.Apply(5, 3)); // 8 }}
(Note the use of the new "automatically implemented read-only property" from C# 6. Yum!)
Obviously there are possible variations of this - there could be multiple instances of one subclass, or we could even create instances dynamically if that were useful. The main point is that no other code can create instances or even declare their own subclasses of Operation. That private constructor stops everything, right? Um"
My eyes! The goggles do nothing!I'm sure all my readers are aware that every C# constructor either has to chain to another constructor using : this(...) or chain to a base class constructor, either implicitly calling a constructor without arguments or explicitly using : base(...).
What I wasn't aware of - or rather, I think I'd been aware of once, but ignored - was that constructor initializers can have cycles. In other words, it's absolutely fine to write code like this:
public class SneakyOperation : Operation{ public SneakyOperation() : this(1) {} public SneakyOperation(int i) : this() {} public override int Apply(int lhs, int rhs) { return lhs * rhs; }}
Neither operator attempts to call a base class constructor, so it doesn't matter that the only base class constructor is private. This code compiles with no issues - not even a warning.
So we've already failed in our intention to prevent subclassing from occurring" but this will fail with a StackOverflowException which will take down the process, so there's not much harm, right?
Well" suppose we didn't let it throw a StackOverflowException. Suppose we triggered our own exception first. There are various ways of doing this. For example, we could use arithmetic:
public SneakyOperation(int x) : this(x, 0){}// This isn't going to be happy if y is 0...public SneakyOperation(int x, int y) : this(x / y){}
Or we could have one constructor call another with a delegate which will deliberately throw:
public SneakyOperation(int x) : this(() => { throw new Exception("Bang!"); }){}public SneakyOperation(Func<int> func) : this(func()){}
So now, we have a class where you can call the constructor, and an exception will be thrown. But hey, you still can't get at the instance which was created, right? Well, this is tricky. We genuinely can't use this anywhere in code which we expect to execute - there's no way of reaching a constructor body, because in order to do that we'd have to have some route which took us through the base class constructor, which was can't access. Field initializers aren't allow to use this, either explicitly or implicitly (e.g. by calling instance methods). If there's a way of circumventing that, I'm not aware of it.
So, we can't capture a reference to the instance before throwing the exception. And after the exception is thrown, it's garbage, and therefore unavailable" unless we take it out of the garbage bin, of course. Enter a finalizer"
private static volatile Operation instance;~SneakyOperation(){ // Resurrect the instance. instance = this;}
Now all we have to do is wrap up the "create an object and wait for it to be resurrected" logic in a handy factory method:
public static Operation GetInstance(){ try { new SneakyOperation(0); } catch {} // Thoroughly expected Operation local; while ((local = instance) == null) { GC.Collect(); GC.WaitForPendingFinalizers(); } return local;}
Admittedly if multiple threads call GetInstance() at a time, there's no guarantee which thread will get which instance, or whether they'll get the same instance" but that's fine. There's very little to guarantee this will ever terminate, in fact" but I don't care much, as clearly I'm never going to use this code in production. It works on my machine, for the few times I've been able to bear running it. Here's a complete example (just combine it with the Operation class as before) in case you want to run it yourself:
public class SneakyOperation : Operation{ private static volatile Operation instance; private SneakyOperation(int x) : this(() => { throw new Exception(); }) {} private SneakyOperation(Func<int> func) : this(func()) {} public static Operation GetInstance() { try { new SneakyOperation(0); } catch {} Operation local; while ((local = instance) == null) { GC.Collect(); GC.WaitForPendingFinalizers(); } return local; } ~SneakyOperation() { instance = this; } public override int Apply(int lhs, int rhs) { return lhs * rhs; }}public class Test{ static void Main() { Operation sneaky = SneakyOperation.GetInstance(); Console.WriteLine(sneaky.Apply(5, 3)); // 15 }}
Vile, isn't it?
Fixing the problemThe first issue is the cyclic constructors. That's surely a mistake in the language specification. (It's not a bug in the compiler, as far as I can tell - I can't find anything in the spec to prohibit it.) This can be fixed, and it shouldn't even be too hard to do. There's nothing conditional in constructor initializers; each constructor chains to exactly one other constructor, known via overload resolution at compile time. Even dynamic doesn't break this - constructor initializers can't be dispatched dynamically. This makes detecting a cycle trivial - from each constructor, just follow the chain of constructor calls until either you reach a base class constructor (success) or you reach one of the constructor declarations already in the chain (failure). I will be pestering Mads for this change - it may be too late for C# 6, but it's never too early to ask about C# 7. This will be a breaking change - but it will only break code which is tragically broken already, always resulting in either an exception or a stack overflow.
Of course, a C# language change wouldn't change the validity of such code being created through IL. peverify appears to be happy with it, for example. Still, a language fix would be a good start.
There's a second language change which could help to fix this too - the introduction of private abstract/virtual methods. Until you consider nested classes, neither of those make sense - but as soon as you remember that there can be subclasses with access to private members, it makes perfect sense to have them. If Operation had a private abstract method (probably exposed via a public concrete method) then SneakyOperation (and any further subclasses) would have to be abstract, as it couldn't possibly override the method. I've already blogged about something similar with public abstract classes containing internal abstract methods, which prevents concrete subclasses outside that assembly" but this would restrict inheritance even further.
Does it reach the usefulness bar required for new language features? Probably not. Would it be valid IL with the obvious meaning? I've performed experiments which suggest that it would be valid, but not as restrictive as one would expect - but I could be mistaken. I think there's distinct merit in the idea, but I'm not expecting to see it come to pass.
ConclusionSo, what can you take away from this blog post? For starters, the fact that the language can probably be abused in far nastier ways than you've previously imagined - and that it's very easy to miss the holes in the spec which allow for that abuse. (It's not like this hole is a recent one" it's been open forever.) You should absolutely not take away the idea that this is a recommended coding pattern - although the original pattern (for Operation) is a lovely one which deserves more widespread recognition.
Finally, you can take away the fact that nothing gets my attention quicker than a new (to me) way of abusing the language.