Inferring from “is”, part one
In last week's episode of FAIC I was discussing code of the form:
if (animal is Dog) ((Dog)animal).Bark();
Specifically, why the cast was illegal if the variable tested was of generic parameter type. Today I want to take a bit of a different tack and examine the question "why do we need to insert a cast at all?" After all, the compiler should know that within the consequence of the if, the variable animal is guaranteed to be of type Dog. (And is guaranteed to be non-null!) Shouldn't the code simply be:
if (animal is Dog) animal.Bark();
This issue has been posed to the C# design committee a number of times over the years. I thought today I might describe how I'd push back on the proposal, and counter with some proposals that have a better chance of being actually implemented.
Throughout I'll assume that animal is of type Animal and that there are the obvious relationships between this type and its derived types.
The first thing I would note is that the is operator takes as its operand any expression that has a value, not just a variable. Right away we would want to restrict the feature to variables, because otherwise we would have to deal with:
if (foo.Bar(DateTime.Now) is Dog) foo.Bar(DateTime.Now).Bark();
The supposition that foo.Bar(DateTime.Now) will be Dog both times seems unwarranted; the compiler has no reason to believe that two calls with potentially two different arguments will return an object of the same type consistently. The value of those two expressions can be different.
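To make that concrete, here is a minimal sketch. The Shelter class and its Next() method are hypothetical names invented for this illustration; they stand in for foo.Bar(...) and are written so that successive calls deliberately return different types.

```csharp
using System;

public class Animal { }
public class Dog : Animal { public string Bark() => "Woof"; }
public class Cat : Animal { }

// Hypothetical stand-in for foo.Bar(...): each call may hand back a different type.
public class Shelter
{
    private int calls = 0;

    // Alternates between Dog and Cat on successive calls.
    public Animal Next() => (calls++ % 2 == 0) ? (Animal)new Dog() : new Cat();
}

public static class ExpressionDemo
{
    // Applies "is Dog" to two successive evaluations of the same expression.
    public static (bool first, bool second) TestTwice()
    {
        var shelter = new Shelter();
        bool first = shelter.Next() is Dog;   // the test in the condition: true
        bool second = shelter.Next() is Dog;  // the "same" expression in the body: false
        return (first, second);
    }
}
```

The two expressions are textually identical, yet only the first one produces a Dog, so a test on one tells the compiler nothing about the other.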
Fortunately, the value of a variable never changes, right? Oh, wait, variables are called variables because they vary:
if (this.animal is Dog)
{
    this.M();
    this.animal.Bark();
}
If animal is a field then M() might change the value of the field. But M() could be any method, including Bark()!
if (this.animal is Dog)
{
    this.animal.Bark();
    this.animal.Bark();
}
How do we know that the first call does not change the type of animal, rendering the second an error? Remember, if the method is virtual then we may not even have the code that will ultimately be called when the program is compiled, so analyzing it is a non-starter. This problem alone would be enough for me to reject the feature completely, but let's soldier on.
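Here is a sketch of how that could happen in today's C#, with the cast written out. The Zoo and Giraffe types, and the idea that barking swaps the animal, are assumptions made up for the example.

```csharp
using System;

public class Animal { }
public class Giraffe : Animal { }

public class Dog : Animal
{
    private readonly Zoo zoo;
    public Dog(Zoo zoo) { this.zoo = zoo; }

    // Barking has a side effect: the zoo's field is replaced with a different animal.
    public virtual void Bark() { zoo.animal = new Giraffe(); }
}

public class Zoo
{
    public Animal animal;

    // Returns whether the field still holds a Dog after the first Bark().
    public bool StillDogAfterBark()
    {
        animal = new Dog(this);
        if (animal is Dog)
        {
            ((Dog)animal).Bark();
            return animal is Dog; // false: the field now holds a Giraffe
        }
        return false;
    }
}
```

A second ((Dog)animal).Bark() at that point would throw InvalidCastException at runtime; under the proposed feature, the cast-free version would have to be rejected or it would let a giraffe bark.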
What if there is no intervening call? Unfortunately, other things can also give an opportunity to change a field:
if (this.animal is Dog)
{
    yield return 123;
    this.animal.Bark();
}
The method returns at the yield and then continues when MoveNext() is called again, but by that time the caller may have done something to modify the field. Similarly:
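A small sketch of the iterator case follows. The Pen class and the IteratorDemo driver are hypothetical; the iterator simply reports the result of the type test on each side of the yield so that the change is observable.

```csharp
using System;
using System.Collections.Generic;

public class Animal { }
public class Dog : Animal { }
public class Giraffe : Animal { }

public class Pen
{
    public Animal animal = new Dog();

    // Yields the result of "animal is Dog" before and after the yield point.
    public IEnumerable<bool> Check()
    {
        if (animal is Dog)
        {
            yield return true;          // control returns to the caller here
            yield return animal is Dog; // re-evaluated only when the caller resumes us
        }
    }
}

public static class IteratorDemo
{
    public static List<bool> Run()
    {
        var pen = new Pen();
        var results = new List<bool>();
        using (var e = pen.Check().GetEnumerator())
        {
            e.MoveNext();
            results.Add(e.Current);     // true
            pen.animal = new Giraffe(); // the caller mutates the field between MoveNext() calls
            e.MoveNext();
            results.Add(e.Current);     // false
        }
        return results;
    }
}
```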
if (this.animal is Leopard)
{
    M(await bar, this.animal.Spots.Count());
}
If the awaited task has not yet completed, the await returns to the caller immediately, giving ample time for someone else to change the value of animal before the method resumes.
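The same effect can be sketched with a task that the caller completes by hand. Everything here (Enclosure, Goldfish, the gate task, AwaitDemo) is a hypothetical setup for illustration; ConfigureAwait(false) is used only so the blocking wait in the driver cannot deadlock on a synchronization context.

```csharp
using System.Threading.Tasks;

public class Animal { }
public class Leopard : Animal { }
public class Goldfish : Animal { }

public class Enclosure
{
    public Animal animal = new Leopard();

    // Reports whether the field is still a Leopard once the awaited task completes.
    public async Task<bool> StillLeopardAfterAwait(Task gate)
    {
        if (animal is Leopard)
        {
            await gate.ConfigureAwait(false); // returns to the caller while gate is incomplete
            return animal is Leopard;         // evaluated only after the method resumes
        }
        return false;
    }
}

public static class AwaitDemo
{
    public static bool Run()
    {
        var enclosure = new Enclosure();
        var gate = new TaskCompletionSource<bool>();
        Task<bool> pending = enclosure.StillLeopardAfterAwait(gate.Task);
        enclosure.animal = new Goldfish(); // modified while the method is suspended
        gate.SetResult(true);              // let the method resume
        return pending.GetAwaiter().GetResult(); // false
    }
}
```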
And of course none of this considers the problem of the field being modified on another thread, but if you have two threads sharing memory, one reading and one writing, and no locks, of course you already have a bug, so I'm not super concerned about that problem.
Also of course this same set of problems applies to elements of arrays as much as to fields.
So what to do here? Remember, the point of a type system is to detect potential problems and determine at compile time that they will not crop up at runtime. If the type system ever allows a giraffe to bark then we have a failure of the type system. We can't just ignore the problem. Restricting what can come between the condition and the usage of the variable seems difficult. And just imagine the error messages the compiler team would have to come up with to explain why animal.Bark() is legal, but animal.Bark(); animal.Bark(); is illegal.
It seems like restricting the operand to be a variable is not enough. What if we restricted it to a local variable or formal parameter? Of course those can be modified:
if (animal is Dog)
{
    animal = new Goldfish();
    animal.Bark();
}
So the proposed feature would necessitate a flow analysis to discover if the variable is written before it is used, but the compiler already has such an analyzer, for definite assignment.
Unfortunately, if the variable is a closed-over outer variable of a lambda then we have some of the same problems as before:
this.func = () => { animal = new Frog(); };
if (animal is Dog)
{
    this.func();
    animal.Bark();
}
Once again we are in a position where almost anything that comes between the test and the usage can cause the type system to be violated.
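The closure case is easy to reproduce with nothing but a local and a delegate. In this sketch ClosureDemo and its method name are hypothetical, and a local Action replaces the field this.func to keep the example short.

```csharp
using System;

public class Animal { }
public class Dog : Animal { }
public class Frog : Animal { }

public static class ClosureDemo
{
    // Returns whether the local is still a Dog after invoking the delegate.
    public static bool StillDogAfterInvoke()
    {
        Animal animal = new Dog();
        Action func = () => { animal = new Frog(); }; // the lambda captures the local

        if (animal is Dog)
        {
            func();               // nothing at the call site mentions animal
            return animal is Dog; // false: the captured local was reassigned
        }
        return false;
    }
}
```

Since the delegate could have been stored anywhere and invoked by anyone, a flow analysis that looks only for direct writes to the local would miss this one.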
I think I have beaten enough on the control flow problems. There are other very real problems here though. Consider:
if (animal is IFoo) if (animal is IBar) if (animal is IBlah) M(animal);
What is the type of animal for purposes of overload resolution? Is it Animal? IFoo? If M() has overloads that take both IBar and IBlah, is this an ambiguity error?
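To see why this is a real question, consider what happens today when the caller states the type explicitly. This is only a sketch: the Chimera class and the string-returning overloads of M are invented so that the chosen overload is visible.

```csharp
public interface IBar { }
public interface IBlah { }
public class Animal { }

// Hypothetical animal that implements both interfaces.
public class Chimera : Animal, IBar, IBlah { }

public static class OverloadDemo
{
    public static string M(IBar x) => "M(IBar)";
    public static string M(IBlah x) => "M(IBlah)";

    // With an explicit cast, the programmer chooses the overload.
    public static string ViaBar(Animal animal) => M((IBar)animal);
    public static string ViaBlah(Animal animal) => M((IBlah)animal);

    // When the static type is known to implement both interfaces, neither overload is better:
    // public static string Direct(Chimera c) => M(c); // error CS0121: the call is ambiguous
}
```

If the nested tests silently gave animal both interface types at once, M(animal) would presumably land in the same ambiguity as the commented-out line.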
That question brings to mind the possibility of breaking changes. Suppose we have
class B { public void M() { ... } }
class D : B { public new void M() { ... } }
...
B c = whatever;
if (c is D) c.M();
Today, without the feature, this calls B.M. That is almost certainly wrong; it seems highly likely that the author of the code expected D.M to be called. But nevertheless, introducing the feature would cause a different method to be called tomorrow than would have been called in yesterday's code, and that's a subtle breaking change.
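Today's behavior is easy to check. In this sketch the methods return strings rather than void, purely so that we can see which one ran; HidingDemo and its method names are hypothetical.

```csharp
public class B { public string M() => "B.M"; }
public class D : B { public new string M() => "D.M"; } // hides B.M, does not override it

public static class HidingDemo
{
    // The static type of c is B, so the call binds to B.M even though c is a D at runtime.
    public static string WithoutCast()
    {
        B c = new D();
        return (c is D) ? c.M() : "not a D";      // "B.M"
    }

    // With the cast, the static type is D, so the call binds to D.M.
    public static string WithCast()
    {
        B c = new D();
        return (c is D) ? ((D)c).M() : "not a D"; // "D.M"
    }
}
```

Under the proposed feature, WithoutCast would start behaving like WithCast without a single character of the source changing.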
Or, suppose we have
void N(Animal x) { ... }
void N(Dog d) { ... }
...
if (animal is Dog) N(animal);
Again, the original code is probably a bug, and N(Animal) probably does the right thing regardless, but if we go from calling N(Animal) yesterday to N(Dog) tomorrow, that could be characterized as a breaking change.
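The same kind of check works here. As before this is a sketch with string-returning overloads so that the choice is observable; StaticTypeDemo, Today and WithCast are hypothetical names.

```csharp
public class Animal { }
public class Dog : Animal { }

public static class StaticTypeDemo
{
    public static string N(Animal x) => "N(Animal)";
    public static string N(Dog d) => "N(Dog)";

    // Overload resolution uses the static type Animal, whatever the runtime type is.
    public static string Today(Animal animal) =>
        (animal is Dog) ? N(animal) : "not a dog";      // "N(Animal)" for a Dog argument

    // What the proposed feature would effectively turn the call into.
    public static string WithCast(Animal animal) =>
        (animal is Dog) ? N((Dog)animal) : "not a dog"; // "N(Dog)" for a Dog argument
}
```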
Similarly for scenarios that I won't spell out in detail, where working code suddenly becomes not-compiling code because the additional type information introduces an ambiguity that previously did not exist for overload resolution.
So that's lots of points against this proposed solution of inferring the type to be stronger within the body of the if. Next time on FAIC: we'll look at some alternatives that would be less dangerous.