Article 479EW Indexer error cases

Indexer error cases

by
ericlippert
from Fabulous adventures in coding on (#479EW)

Pop quiz: Consider these little methods. (They could be static, inline, whatever, that doesn't matter.)

int[] A1()
{
Console.WriteLine("A");
return new int[10];
}
int[] A2()
{
Console.WriteLine("A");
return null;
}
int B1()
{
Console.WriteLine("B");
return 1;
}
int B2()
{
Console.WriteLine("B");
return 11;
}
int C()
{
Console.WriteLine("C");
return 123;
}

Now I have four questions for you; can you predict the behaviour of each of these program fragments?

try { A1()[B1()] = C(); }
catch { Console.WriteLine("catch"); }

try { A1()[B2()] = C(); }
catch { Console.WriteLine("catch"); }

try { A2()[B1()] = C(); }
catch { Console.WriteLine("catch"); }

try { A2()[B2()] = C(); }
catch { Console.WriteLine("catch"); }

There's no trickery going on here; seriously, make a mental model of what each of these does and try to predict their behaviour. Scroll down when you're done.

.

.

.

.

.

The first one prints A, B, C. All the rest print A, B, C, catch.

Back when I was a young and naive compiler developer, I would have predicted the final three produce "A, B, catch", based on the following reasoning:

  • In C# an assignment produces all the side effects of the left-hand-side expression before all the side effects of the right-hand-side expression.
  • An exception is a side effect.
  • The exceptions produced here - either the null deference when A2() is called, or index out of bounds when A1()[B2()] is called - should therefore happen when the left hand side is evaluated, before C() is evaluated.

This reasoning is incorrect; in C# (and also in Java and many other languages), it is the assignment itself that triggers the exception, even when the exception is the result of an indexing operation.

That is, my wrong naive mental model was that A()[B()]=C() had the semantics of:

int[] a = A();
int b = B();
// Make an alias; crash if its bad
ref int r = ref a[b];
int c = C();
r = c;

This is wrong; the correct model is that this is equivalent to:

int[] a = A();
int b = B();
int c = C();
a[b] = c;

I must admit, when I learned this shortly after joining the C# team, I was surprised and I considered it a bug in the design. But there are good reasons for this, and it turns out to have some nice properties.

First off, let's get the main point out of the way. You should never write a program like the one I wrote, where there is a difference in behaviour depending on when the exception is thrown. Those exceptions should never be thrown! A program should not dereference null or blow past the end of an array and then catch the problem; you shouldn't cause the problem in the first place. So the choice of what to do here really should only matter to the compiler writer, not to the user. If you write your code correctly, it makes no difference when the exception is thrown because it will not actually be thrown.

So, given that it ought not to matter to the user, what is the language designer to do? Should the language designer say that "evaluate the left" causes the exception, or should the exception be triggered by the assignment, after the right has been evaluated?

One way to think about it is: what if we were calling a user-defined indexer setter instead of an array. Suppose we have a class X with an indexer:

public int this[int i]
{
get { " whatever " }
set
{
if (i < 0 || i > 10)
throw new BlahBlahBlah();
" whatever "
}
}

An indexer is just a fancy way to write a method call. Suppose we have:

X X1()
{
Console.WriteLine("X");
return new X();
}

And now what does X1()[B()]=C(); mean? Plainly it must mean:

X x = X1();
int b = B();
int c = C();
x.indexer_setter(b, c);

And that last line has the semantics of "throw if x is null, otherwise do the call and throw if the index is out of bounds". In this scenario we definitely evaluate B() even if X1() returns null, and we definitely evaluate C() even if B() is out of bounds, so it makes sense that we should be consistent between user-defined and built-in indexers.

The decision to make the evaluation of an indexer defer all the checks until the assignment also helps in this scenario; think about how you would make this work if you were the compiler writer:

async Task<int> D() {
await Task.Delay(1000);
return 123;
}

"

A()[B()] = await D();

At the point of the await, the current activation is going to spill all of its local state into a closure, assign the continuation of the current task to complete the assignment, and return control to the caller; when D() completes, the continuation is activated and the assignment happens. But what information is stored in the closure? If the left side is computed as

int[] a = A();
int b = B();
ref int r = ref a[b];

Then we can lose "a" and "b"; we never read them again. But we need to somehow store a ref in a closure, which is not legal in the CLR! There are ways to get around that, but boy are they painful and expensive. By deferring the validation of the collection and index until the assignment, we eliminate the need to spill a reference off the stack into a closures; we just store the ordinary values of "a" and "b".

Of course, the design team for C# 1.0 did not know that the design team for C# 5.0 was going to need to implement temporary spilling into closures, but it surely is a happy accident that the way it does work is the way that is cheapest and easiest to implement should you need to put temporaries into a closure.

Next time on FAIC: I recently learned about an interesting way to represent immutable lists that has completely different performance characteristics than standard immutable lists. At first glance it seems impossible, but it all works out beautifully. We'll do a short series on how to make standard linked lists into catenable deques without going to all the trouble of building a finger tree, like we did last time.

External Content
Source RSS or Atom Feed
Feed Location http://ericlippert.com/feed
Feed Title Fabulous adventures in coding
Feed Link https://ericlippert.com/
Reply 0 comments