When everything you know is wrong, part two

by ericlippert, from Fabulous adventures in coding

Now that we've looked at a bunch of myths about when finalizers are required to run, let's consider when they are required to not run:

Myth: Keeping a reference to an object in a variable prevents the finalizer from running while the variable is alive; a local variable is always alive at least until control leaves the block in which the local was declared.

    {
        Foo foo = new Foo();
        Blah(foo);  // Last read of foo
        Bar();
        // We require that foo not be finalized before Bar();
        // Since foo is in scope until the end of the block,
        // it will not be finalized until this point, right?
    }

The C# specification permits the runtime broad latitude to detect when storage containing a reference is never going to be accessed again, and to stop treating that storage as a root of the garbage collector. For example, suppose we have a local variable foo and a reference is written into it at the top of the block. If the jitter knows that a particular read is the last read of that variable, the variable can legally be removed from the set of GC roots immediately; it doesn't have to wait until control leaves the scope of the variable. If that variable contained the last reference, then the GC can detect that the object is unreachable and put it on the finalizer queue immediately. If you need the object to stay alive until a particular point, call GC.KeepAlive(foo) at that point.
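A minimal sketch of the GC.KeepAlive fix, assuming hypothetical Foo, Blah and Bar that mirror the names in the snippet above, plus a hypothetical Flag type that lets Foo's finalizer report that it ran without anyone having to hold a Foo. Bar forces a full collection and waits for pending finalizers, so it is the point at which an early finalization would show up.

```csharp
using System;

// Hypothetical helper: the finalizer sets it, so we can observe
// finalization without keeping a reference to the Foo itself.
public sealed class Flag
{
    public volatile bool IsSet;
}

// Hypothetical stand-in for the Foo in the snippet above.
public sealed class Foo
{
    private readonly Flag finalized;
    public Foo(Flag finalized) { this.finalized = finalized; }
    ~Foo() { finalized.IsSet = true; }
}

public static class KeepAliveDemo
{
    private static void Blah(Foo foo) { /* last "real" use of foo */ }

    // Forces a full collection and drains the finalizer queue, so if foo
    // were no longer a GC root here, its finalizer could run during Bar.
    private static bool Bar(Flag flag)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return flag.IsSet;
    }

    // Returns true if foo's finalizer ran before Bar finished.
    public static bool FinalizedDuringBar()
    {
        var flag = new Flag();
        Foo foo = new Foo(flag);
        Blah(foo);
        bool finalizedEarly = Bar(flag);
        // foo is treated as live up to this call, so it cannot be
        // finalized before or during Bar().
        GC.KeepAlive(foo);
        return finalizedEarly;
    }
}
```

Without the GC.KeepAlive call, an optimized build is permitted (though not required) to make FinalizedDuringBar return true; with the call in place it always returns false.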

Why does the jitter have this latitude? Suppose the local variable is enregistered into the register used to pass the argument to Blah(). If Bar() needs that register, there's no point in saving the never-to-be-read-again value of foo on the stack before Bar() is called. (If the actual details of the code generated by the jitter are of interest to you, see Raymond Chen's deeper analysis of this issue.)

Extra bonus fun: the runtime uses less aggressive code generation and less aggressive garbage collection when running the program in the debugger, because it is a bad debugging experience to have objects that you are debugging suddenly disappear even though the variable referring to the object is in scope. That means that if you have a bug where an object is being finalized too early, you probably cannot reproduce that bug in the debugger!

See the last point in this article for an even more horrid version of this problem.

Myth: Finalizers run no more than once.

Suppose you have an object that is in the process of being finalized and is therefore no longer a candidate for finalization, or you have suppressed finalization. The aptly-named ReRegisterForFinalize method tells the runtime that you would like the object to be finalized. This can cause an object to be finalized more than once.

Why on earth would you want to do that? The most common use case is a pool of objects that are, for some reason, very expensive. Perhaps they produce collection pressure if they are allocated too often, or perhaps they are expensive to allocate but cheap to reuse. In this case you can keep a "pool" of living objects. When you need an object, you remove it from the pool; when you're done with it, you put it back. What if you forget to put the object back in the pool? (This is analogous to forgetting to dispose of an object that has an unmanaged resource.) In that case, the finalizer can put the object being finalized back in the pool, so it is no longer dead. Of course the object now needs to be registered for finalization again, in case the user takes it out of the pool and again forgets to put it back.
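Here is a rough sketch of that pooling idea, assuming a hypothetical PooledBuffer type; the names and the 64 KB size are illustrative only. It calls GC.ReRegisterForFinalize in Rent rather than in the finalizer, so each buffer handed out is registered for finalization exactly once, whichever way it made its way back into the pool.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pooled object that is expensive to allocate but cheap to reuse.
public sealed class PooledBuffer
{
    private static readonly ConcurrentBag<PooledBuffer> Pool = new ConcurrentBag<PooledBuffer>();
    private static int created;

    // How many buffers have ever been allocated; reuse keeps this low.
    public static int Created => Volatile.Read(ref created);

    public byte[] Data { get; } = new byte[64 * 1024];

    private PooledBuffer() => Interlocked.Increment(ref created);

    public static PooledBuffer Rent()
    {
        if (Pool.TryTake(out PooledBuffer buffer))
        {
            // Objects in the pool are not registered for finalization (Return
            // suppressed it, or the finalizer already ran), so register again
            // in case this renter also forgets to return the buffer.
            GC.ReRegisterForFinalize(buffer);
            return buffer;
        }
        return new PooledBuffer(); // new objects are registered automatically
    }

    public void Return()
    {
        // Returned properly, so the finalizer does not need to rescue it.
        GC.SuppressFinalize(this);
        Pool.Add(this);
    }

    ~PooledBuffer()
    {
        // The renter forgot to call Return(). Storing "this" in a static,
        // and therefore reachable, collection resurrects the object.
        Pool.Add(this);
    }
}
```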

I do not recommend resurrecting dead objects unless you really know what you are doing and you have a clearly unacceptable performance problem that this technique solves. In the case of Roslyn we identified very early on that the compiler allocates a gazillion small objects, some of them very short-lived and reusable, and that we had a performance problem directly attributable to excess collection pressure. We used a pooling strategy for the cases where our performance tests indicated that it would be a win.

Myth: An object being finalized is a dead object.

The GC must identify an object as dead (no living references) in order to place it on the finalizer queue, but the finalizer queue is itself a living object, so objects on the finalizer queue are technically alive as far as the GC is concerned. Which is good: if the GC runs a second time while the objects identified the previous time are still on the finalizer queue, they should not be reclaimed, and they certainly should not be placed on the finalizer queue again!

Myth: An object being finalized is guaranteed to be unreachable from code outside the finalization queue.

There could be two objects, both determined by the GC to be dead, with references to each other. When one is finalized, it decides to keep itself alive and copies its "this" to a static field, which is clearly reachable by user code. Since the now-reachable object has a reference to the other object, that object is also reachable, so user code could be running in it while it is being finalized.
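The following sketch, using hypothetical Parent and Child types that refer to each other, shows the scenario: after a collection, the parent's finalizer stores "this" in a static field, and user code can then reach a child whose finalizer has already run. It is a demonstration of the hazard, not a pattern to copy.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical finalizable child that records whether its finalizer ran.
public sealed class Child
{
    public Parent Parent;
    public volatile bool Finalized;
    ~Child() { Finalized = true; }
}

// Hypothetical finalizable parent that resurrects itself.
public sealed class Parent
{
    public Child Child;

    // A static field is always reachable by user code.
    public static Parent Resurrected;

    ~Parent()
    {
        // Copying "this" to a static field brings this object back to life,
        // and with it this.Child, whose finalizer may run before or after this one.
        Resurrected = this;
    }
}

public static class ResurrectionDemo
{
    // NoInlining keeps the pair's only references inside this frame,
    // which is gone once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CreateDeadPair()
    {
        var parent = new Parent();
        var child = new Child { Parent = parent };
        parent.Child = child;
    }

    public static Parent CollectAndResurrect()
    {
        CreateDeadPair();
        GC.Collect();                  // both objects found dead, both queued
        GC.WaitForPendingFinalizers(); // both finalizers have now run
        return Parent.Resurrected;
    }
}
```

After CollectAndResurrect returns, the caller holds p.Child, a perfectly reachable object whose finalizer has already run.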

Again, I strongly recommend against resurrecting dead objects unless you really know what you are doing and have a truly excellent reason for doing this crazy thing.

Myth: Finalizers run on the thread that created the object.

The finalizer typically runs on its own thread. If you have an object that in some way has affinity to a particular thread (perhaps it uses thread-local storage, or perhaps it is an apartment-threaded object), then you must do whatever threading magic is necessary to use the object safely from the finalizer thread, preferably without blocking the finalizer thread indefinitely.
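A quick way to see this for yourself: the sketch below, built around a hypothetical ThreadProbe type, records the managed thread ID of the thread that creates an object and of the thread that runs its finalizer. On the runtime as it typically ships, the two differ.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type that remembers which thread created it.
public sealed class ThreadProbe
{
    // Written by the finalizer and read afterwards; a static is used because
    // user code holds no reference to the probe once it can be finalized.
    public static volatile int FinalizerThreadId;

    public int CreatorThreadId { get; } = Environment.CurrentManagedThreadId;

    ~ThreadProbe()
    {
        FinalizerThreadId = Environment.CurrentManagedThreadId;
    }
}

public static class ThreadDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int CreateAndDrop() => new ThreadProbe().CreatorThreadId;

    // Returns the creating thread's ID and the finalizing thread's ID.
    public static (int Creator, int Finalizer) Run()
    {
        int creator = CreateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return (creator, ThreadProbe.FinalizerThreadId);
    }
}
```

If cleanup really must happen on the creating thread, one option is to capture a SynchronizationContext in the constructor and have the finalizer Post (not Send) the work to it, so the finalizer thread does not block waiting on that thread.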

Myth: Finalizers run on the garbage collector thread.

The finalizer and the garbage collector typically have their own threads. This is not a requirement of all versions of the CLR, but it is the typical case.

Myth: Finalizers run as the garbage collector determines that objects are dead.

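As we've discussed, the GC determines that the object is dead and needs finalization, and puts it on the finalizer queue. The GC then keeps on doing what it does best: looking for dead objects. The finalizers themselves run later, on the finalizer thread, whenever that thread gets around to them.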

Myth: Finalizers never deadlock.

We can certainly force a finalizer to deadlock, illustrating that the myth is false:

    class Deadlock
    {
        ~Deadlock()
        {
            System.Threading.Monitor.Enter(this);
        }
        static void Main()
        {
            Deadlock d = new Deadlock();
            System.Threading.Monitor.Enter(d);
            d = null;
            System.GC.Collect();
            System.GC.WaitForPendingFinalizers();
        }
    }

This is obviously unrealistic, but realistic deadlocks are possible, particularly in scenarios like the one I mentioned above, where a call must be marshalled to the correct thread for an object that has some sort of thread affinity. Here's a link to a typical example. (Note that the article leads with "finalizers are dangerous and you should avoid them at all costs". This is good advice.)
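If a finalizer truly cannot avoid taking a lock, one way to keep it from wedging the finalizer thread forever is to wait with a timeout. This sketch reworks the Deadlock example above as a hypothetical BoundedWait type that uses Monitor.TryEnter with a 100 ms timeout; the timeout value is arbitrary.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical variation on Deadlock: the finalizer still wants the
// object's lock, but waits for it only for a bounded time.
public sealed class BoundedWait
{
    // Set by the finalizer when it could not get the lock in time.
    public static volatile bool GaveUp;

    ~BoundedWait()
    {
        bool taken = false;
        try
        {
            // Wait at most 100 ms instead of forever.
            Monitor.TryEnter(this, TimeSpan.FromMilliseconds(100), ref taken);
            if (!taken)
            {
                GaveUp = true;
                return;
            }
            // ... cleanup that needs the lock would go here ...
        }
        finally
        {
            if (taken) Monitor.Exit(this);
        }
    }
}

public static class BoundedWaitDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void LockAndAbandon()
    {
        var d = new BoundedWait();
        Monitor.Enter(d); // this thread keeps the lock; d becomes garbage
    }

    // Returns true if the finalizer gave up rather than blocking forever.
    public static bool Run()
    {
        LockAndAbandon();
        GC.Collect();
        GC.WaitForPendingFinalizers(); // returns, because the finalizer timed out
        return BoundedWait.GaveUp;
    }
}
```

The timeout only limits the damage: when TryEnter gives up, whatever cleanup needed the lock simply does not happen.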

Myth: Finalizers run in a predictable order.

Suppose we have a tree of objects, all finalizable, and all on the finalizer queue. There is no requirement whatsoever that the tree be finalized from the root to the leaves, from the leaves to the root, or any other order.

Myth: An object being finalized can safely access another object.

This myth follows directly from the previous one. If you have a tree of objects and you are finalizing the root, then the children are still alive (because the root is alive, because it is on the finalization queue, and so the children have a living reference), but the children may have already been finalized, and are in no particularly good state to have their methods or data accessed.

Myth: Running a finalizer frees the memory associated with the object.

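The finalizer thread runs the finalizers; the GC thread identifies dead objects that do not need finalization and reclaims their memory. The finalizer thread does not try to do the GC's job for it. The memory of an object that did need finalization is reclaimed only when a later collection finds it dead again, after its finalizer has run.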
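One way to watch this happen is with the two flavors of WeakReference: a short weak reference is cleared as soon as the GC finds the object dead, while one created with trackResurrection: true stays valid until the memory is actually reclaimed. The sketch below assumes a hypothetical Finalizable type whose finalizer just counts invocations.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical finalizable type; the finalizer only counts how often it runs.
public sealed class Finalizable
{
    public static int FinalizedCount;
    ~Finalizable() => Interlocked.Increment(ref FinalizedCount);
}

public static class ReclaimDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static (WeakReference Short, WeakReference Long) CreateAndDrop()
    {
        var obj = new Finalizable();
        // Short weak reference: cleared once the GC finds obj dead.
        // Long weak reference (trackResurrection: true): cleared only when
        // the memory is actually reclaimed.
        return (new WeakReference(obj), new WeakReference(obj, trackResurrection: true));
    }

    public static (bool ShortAfterFirst, bool LongAfterFirst, bool LongAfterSecond) Run()
    {
        var (shortRef, longRef) = CreateAndDrop();

        GC.Collect();                  // obj found dead and queued for finalization
        bool shortAfterFirst = shortRef.IsAlive;
        bool longAfterFirst = longRef.IsAlive;

        GC.WaitForPendingFinalizers(); // finalizer has run; memory still not freed
        GC.Collect();                  // a later collection reclaims the memory
        bool longAfterSecond = longRef.IsAlive;

        return (shortAfterFirst, longAfterFirst, longAfterSecond);
    }
}
```

This is also why code that wants finalizable garbage fully reclaimed, typically in tests, calls GC.Collect, then GC.WaitForPendingFinalizers, then GC.Collect again.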

Myth: An object being finalized was fully constructed.

I've saved the worst for last. This is in my opinion the truly nastiest of all the issues with finalizers. I'll give you two scenarios, both horrible.

    sealed class Nasty : IDisposable
    {
        IntPtr foo;
        IntPtr bar;
        public Nasty()
        {
            foo = AllocateFoo();
            // Suppose a thread abort exception is thrown right here.
            bar = AllocateBar();
        }
        ~Nasty()
        {
            Dispose(false);
        }
        public void Dispose()
        {
            Dispose(true);
            // Already cleaned up; without this the finalizer would free again.
            GC.SuppressFinalize(this);
        }
        private void Dispose(bool disposing)
        {
            DeallocateFoo(foo);
            DeallocateBar(bar);
        }
    }

In C++, destructors don't run if a constructor throws, but in C# an object becomes eligible for finalization the moment that it is created. If a thread abort exception is thrown after foo is initialized then bar is still zero when the finalizer runs, and zero might not be a valid input to DeallocateBar.
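A finalizer can at least be written defensively against half-constructed objects. This sketch assumes Marshal.AllocHGlobal and Marshal.FreeHGlobal as stand-ins for the hypothetical AllocateFoo/AllocateBar and DeallocateFoo/DeallocateBar, and simulates the mid-constructor failure with an ordinary exception rather than a thread abort. Each handle is treated as "not allocated" while it is IntPtr.Zero, and Interlocked.Exchange makes each release happen at most once.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical class; Marshal.AllocHGlobal/FreeHGlobal stand in for the
// AllocateFoo/AllocateBar and DeallocateFoo/DeallocateBar used by Nasty.
public sealed class Careful : IDisposable
{
    private IntPtr foo;
    private IntPtr bar;

    public Careful(bool failHalfway = false)
    {
        foo = Marshal.AllocHGlobal(16);
        // Simulates the failure point marked in Nasty's constructor.
        if (failHalfway) throw new InvalidOperationException("simulated failure");
        bar = Marshal.AllocHGlobal(16);
    }

    // Runs even if the constructor threw, so it must cope with bar == IntPtr.Zero.
    ~Careful() => Release();

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    public bool IsReleased => foo == IntPtr.Zero && bar == IntPtr.Zero;

    private void Release()
    {
        // Exchange clears each field atomically, so each handle is freed at
        // most once; IntPtr.Zero means "never allocated or already freed".
        IntPtr f = Interlocked.Exchange(ref foo, IntPtr.Zero);
        if (f != IntPtr.Zero) Marshal.FreeHGlobal(f);
        IntPtr b = Interlocked.Exchange(ref bar, IntPtr.Zero);
        if (b != IntPtr.Zero) Marshal.FreeHGlobal(b);
    }
}
```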

Now let's combine that with the first point in today's episode: that a finalizer can run earlier than you think.

    sealed class Horrid : IDisposable
    {
        IntPtr foo;
        public Horrid()
        {
            foo = AllocateFoo();
            Bar.Blah(); // static method
        }
        ~Horrid()
        {
            Dispose(false);
        }
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
        private void Dispose(bool disposing)
        {
            DeallocateFoo(foo);
        }
    }

OK, what are the possible scenarios at this point? Plainly a thread abort exception could have been thrown before, during or after the execution of Blah(), so we cannot rely on any invariant set up by Blah() in the finalizer. But we can at least rely on the fact that there are only three possibilities: Blah() was never run, Blah() threw, or Blah() completed normally, right?

No; there is a fourth possibility: Blah() is still running on the user thread, the GC has identified that "this" is never read again, so the object is a candidate for finalization, and therefore it is possible that the finalizer and constructor are running concurrently. (Why you would create an object and then never read the reference I do not know, but people do strange things.)
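One common mitigation for this fourth possibility is to call GC.KeepAlive(this) at the end of the constructor, so the object counts as reachable until construction finishes. The sketch below uses a hypothetical LessHorrid class, with Marshal.AllocHGlobal standing in for AllocateFoo and a private Blah that forces a collection in the middle of construction; a hypothetical FinalizationFlag records whether the finalizer ran too early.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper so we can observe finalization without holding a
// reference to the object being constructed.
public sealed class FinalizationFlag
{
    public volatile bool Finalized;
    public bool FinalizedDuringConstructor;
}

public sealed class LessHorrid : IDisposable
{
    private IntPtr foo;
    private readonly FinalizationFlag flag;

    public LessHorrid(FinalizationFlag flag)
    {
        this.flag = flag;
        foo = Marshal.AllocHGlobal(16); // stands in for AllocateFoo()
        Blah(flag);                     // stands in for Bar.Blah(); uses the parameter, not "this"
        // "this" is treated as live until this call, so the finalizer cannot
        // run while the constructor (including Blah) is still executing.
        GC.KeepAlive(this);
    }

    // Forces a collection mid-constructor and records whether the object
    // under construction had already been finalized.
    private static void Blah(FinalizationFlag flag)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        flag.FinalizedDuringConstructor = flag.Finalized;
    }

    ~LessHorrid()
    {
        flag.Finalized = true;
        Release();
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    private void Release()
    {
        if (foo != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(foo);
            foo = IntPtr.Zero;
        }
    }
}
```

Note that this addresses only the concurrent-finalization case; the finalizer still has to cope with a constructor that never finished.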

And finally, I described an even more horrid version of this scenario in a previous blog entry.

Read the title of this article again: everything you know is wrong. In a finalizer you have no guarantee that anything happened other than that the object was allocated, and that the GC at one time believed it to be dead. You have no guarantee that any invariant set up by the constructor is valid, and the constructor (or any other method of the object) could still be running when the finalizer is called, provided that the runtime knows that local copies of the reference will never be read again.

It is therefore very difficult indeed to write a correct finalizer, and the best advice I can give you is to not try.

Next time on FAIC: A far-too-detailed analysis of a copy-paste bug. But not in code this time!
