Backwards compatibility is (still) hard

by jonskeet, from Jon Skeet's coding blog

At the moment, I'm spending a fair amount of time thinking about a new version of the C# API and codegen for Protocol Buffers, as well as other APIs for interacting with Google services. While that's the context for this post, I want to make it very clear that this is still a personal post, and should in no way be taken to be "Google's opinion" on anything. The underlying issue could apply in many other situations, but it's easiest to describe a concrete scenario.

Context and current state: the builder pattern

The problem I've been trying to address is the relative pain of initializing a protobuf message. Protocol buffer messages are declared in a separate schema file (.proto) and then code is generated. The schema declares fields, each of which has a name, a type and a number associated with it. The generated message types are immutable, with builder classes associated with them. So for example, we might start off with a message like this:

message Person {
  string first_name = 1;
  string last_name = 3;
}

And construct a Person object in C# like this:

var person = new Person.Builder { FirstName = "Jon", LastName = "Skeet" }.Build();
// Now person.FirstName and person.LastName are readonly properties

That's not awful, but it's not the cleanest code in the world. We can make it slightly simpler using an implicit conversion from the builder type to the message type:

Person person = new Person.Builder { FirstName = "Jon", LastName = "Skeet" };
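
As a rough sketch of how such an implicit conversion could be declared: the classes below are a hand-written stand-in, not the real generated protobuf code, and the shape of the nested Builder is only assumed from the snippets above.

```csharp
using System;

// Hand-written stand-in for a generated message type (assumed shape).
public sealed class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    private Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    // Mutable builder: set only the properties you care about, then build.
    public sealed class Builder
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }

        public Person Build() => new Person(FirstName, LastName);

        // Lets a Builder be used wherever a Person is expected.
        public static implicit operator Person(Builder builder) => builder.Build();
    }
}

public static class Program
{
    public static void Main()
    {
        // The variable must be declared as Person: with var it would be a Builder.
        Person person = new Person.Builder { FirstName = "Jon", LastName = "Skeet" };
        Console.WriteLine($"{person.FirstName} {person.LastName}"); // Jon Skeet
    }
}
```

The need to spell out the Person type on the left-hand side is exactly the kind of noise the implicit conversion introduces.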

It's still not really clean though. Let's revisit why the builder pattern is useful:

  • We can specify just the properties we want.
  • By deferring the "build" step until after we've specified everything, we get mutability without building a lot of intermediate objects.

If only there were another language construct allowing that...

Optional parameters to the rescue!

If we provided a constructor with an optional parameter for each property, we could specify just what we want. So something like:

public Person(string firstName = null, string lastName = null)
...
var person = new Person(firstName: "Jon", lastName: "Skeet");

Hooray! That looks much nicer:

  • We can use var (if we want to) because there are no implicit conversions to confuse things.
  • We don't need to mention a builder at all.
  • Every piece of text in the statement is something we want to express, and we only express it once.

That last point is a lovely place to be in terms of API design - while you still need to worry about naming, ordering and how the syntax fits into bigger expressions, you've achieved some sense of "as simple as possible, but no simpler".

So, that's all great - except for versioning.

Let's just add a field at the end...

One of the aims of protocol buffers is to support an evolving schema. (The limitations are different for proto2 and proto3, but that's a slightly different matter.) So what happens if we add a new field to the message?

message Person {
  string first_name = 1;
  string last_name = 3;
  string title = 4; // Mr, Mrs etc
}

Now we end up with the following constructor:

public Person(string firstName = null, string lastName = null, string title = null)

The code still compiles - but if we try to run our old client code against the new version of the library, it will fail - because the constructor it refers to no longer exists. So we have source compatibility, but not binary compatibility.
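
The reason is that the C# compiler copies the default values into each call site and emits a reference to one exact constructor signature. The sketch below uses reflection to show what an old binary would be looking for; the Person class is a hand-written stand-in for the regenerated code, not real protobuf output.

```csharp
using System;

// Stand-in for the regenerated class: its only constructor now takes three strings.
public sealed class Person
{
    public string FirstName { get; }
    public string LastName { get; }
    public string Title { get; }

    public Person(string firstName = null, string lastName = null, string title = null)
    {
        FirstName = firstName;
        LastName = lastName;
        Title = title;
    }
}

public static class Program
{
    public static void Main()
    {
        // What client code compiled against the old library refers to: .ctor(string, string)
        var oldSignature = typeof(Person).GetConstructor(
            new[] { typeof(string), typeof(string) });

        // What the new library actually contains: .ctor(string, string, string)
        var newSignature = typeof(Person).GetConstructor(
            new[] { typeof(string), typeof(string), typeof(string) });

        Console.WriteLine(oldSignature == null); // True
        Console.WriteLine(newSignature != null); // True
    }
}
```

At execution time, an old client binary hitting that missing two-parameter constructor would get a MissingMethodException; recompiling the client makes the problem disappear, which is why this only counts as a binary compatibility break.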

Let's just add a field in the middle...

You may have noticed that I don't have a field with tag 2 - this is not an accident. Suppose we now add it, for the obvious middle_name field:

message Person {
  string first_name = 1;
  string middle_name = 2;
  string last_name = 3;
  string title = 4; // Mr, Mrs etc
}

Regenerate the code, and we end up with a constructor with 4 parameters:

public Person(
    string firstName = null,
    string middleName = null,
    string lastName = null,
    string title = null)

Just to be clear, this change is entirely fine in protocol buffers - while normally fields are assigned incrementally, it shouldn't be a breaking change to add a new field "between" existing ones.

Let's take a look at our client code again:

var person = new Person(firstName: "Jon", lastName: "Skeet");

Yup, that still works - we need to recompile, but we still end up with a Person with the right properties. But that's not the only code we could have started with. Suppose we'd actually had:

var person = new Person("Jon", "Skeet");

Until this last change that would have been fine - even after we'd added the optional title parameter, the two arguments would still have mapped to firstName and lastName respectively.

Having added the middle_name field, however, the code would still compile with no errors or warnings, but the meaning of the second argument would have changed - it would now map onto the middleName parameter instead of lastName.
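
To make the silent change concrete, here is a small sketch that puts the two versions side by side. PersonV2 and PersonV3 are hypothetical names used only so that both shapes can live in one file; in reality both would be called Person, in different versions of the library.

```csharp
using System;

// Stand-in for the class generated before middle_name was added.
public sealed class PersonV2
{
    public string FirstName { get; }
    public string LastName { get; }
    public string Title { get; }

    public PersonV2(string firstName = null, string lastName = null, string title = null)
    {
        FirstName = firstName;
        LastName = lastName;
        Title = title;
    }
}

// Stand-in for the class generated after middle_name was added.
public sealed class PersonV3
{
    public string FirstName { get; }
    public string MiddleName { get; }
    public string LastName { get; }
    public string Title { get; }

    public PersonV3(string firstName = null, string middleName = null,
                    string lastName = null, string title = null)
    {
        FirstName = firstName;
        MiddleName = middleName;
        LastName = lastName;
        Title = title;
    }
}

public static class Program
{
    public static void Main()
    {
        // Identical positional call sites, different meanings.
        var before = new PersonV2("Jon", "Skeet");
        var after = new PersonV3("Jon", "Skeet");
        Console.WriteLine(before.LastName);            // Skeet
        Console.WriteLine(after.LastName ?? "(null)"); // (null)
        Console.WriteLine(after.MiddleName);           // Skeet

        // Named arguments keep their meaning across the change.
        var named = new PersonV3(firstName: "Jon", lastName: "Skeet");
        Console.WriteLine(named.LastName);             // Skeet
    }
}
```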

Basically, we'd like to stop this code (using positional arguments) from compiling in the first place.

Feature requests and a workaround

The two features we really want from C# here are:

  • Some way of asking the generated code to perform dynamic overload resolution at execution time... not based on dynamic values, but on the basis that the code we're compiling against may have changed since we compiled. This resolution only needs to be performed once, on first execution (or class load, or whatever) as by the time we're executing, everything is fixed (the parameter names and types, and the argument names and types). It could be efficient.
  • Some way of forcing any call sites to use named arguments for any optional parameters. (Even though in our case all the parameters are optional, I can easily imagine a case where there are a few required parameters and then the optional ones. Using positional arguments for those required parameters is fine.)
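
To give a feel for the first request, here is a loose approximation written as ordinary library code using reflection. It is not the language feature itself: LateBound.Create is a hypothetical helper invented for this sketch, it matches parameters by name only (no type checks), and it repeats the lookup on every call rather than caching it after the first execution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in for the generated class, with whatever constructor the current version has.
public sealed class Person
{
    public string FirstName { get; }
    public string MiddleName { get; }
    public string LastName { get; }
    public string Title { get; }

    public Person(string firstName = null, string middleName = null,
                  string lastName = null, string title = null)
    {
        FirstName = firstName;
        MiddleName = middleName;
        LastName = lastName;
        Title = title;
    }
}

public static class LateBound
{
    // Hypothetical helper: picks a constructor by parameter name at execution time.
    public static T Create<T>(IDictionary<string, object> namedArguments)
    {
        foreach (ConstructorInfo ctor in typeof(T).GetConstructors())
        {
            ParameterInfo[] parameters = ctor.GetParameters();

            // Every supplied name must exist, and every omitted parameter must be optional.
            bool allNamesKnown = namedArguments.Keys.All(
                name => parameters.Any(p => p.Name == name));
            bool restOptional = parameters.All(
                p => namedArguments.ContainsKey(p.Name) || p.HasDefaultValue);
            if (!allNamesKnown || !restOptional)
            {
                continue;
            }

            // Fill in supplied values by name, and declared defaults for everything else.
            object[] values = parameters
                .Select(p => namedArguments.TryGetValue(p.Name, out object value)
                    ? value
                    : p.DefaultValue)
                .ToArray();
            return (T) ctor.Invoke(values);
        }
        throw new MissingMethodException("No constructor accepts the given named arguments.");
    }
}

public static class Program
{
    public static void Main()
    {
        var arguments = new Dictionary<string, object>
        {
            ["firstName"] = "Jon",
            ["lastName"] = "Skeet"
        };
        Person person = LateBound.Create<Person>(arguments);
        Console.WriteLine(person.LastName); // Skeet
    }
}
```

Because the constructor is looked up when the code runs, this call keeps working when optional parameters are added at the end or in the middle, at the cost of losing compile-time checking.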

It's hard (without forking Roslyn :) to implement these feature requests ourselves, but for the second one we can at least have a workaround. Consider the following struct:

public struct DoNotCallThisMethodWithPositionalArguments {}

... and now imagine our generated constructor had been:

public Person(
    DoNotCallThisMethodWithPositionalArguments ignoreMe = default(DoNotCallThisMethodWithPositionalArguments),
    string firstName = null,
    string middleName = null,
    string lastName = null,
    string title = null)

Now our constructor call using positional arguments will fail, because there's no conversion from string to the crazily-named struct. The only "nice" way you can call it is to use named arguments, which is what we wanted. You could call it using positional arguments like this:

var person = new Person(new DoNotCallThisMethodWithPositionalArguments(), "Jon", "Skeet");

(or using default(...) like the constructor declaration) - but at this point the code looks broken, so it's your own fault if you decide to use it.

The reason for making it a struct rather than a class is to avoid null being convertible to it. Annoyingly, it wouldn't be hard to make a class that you could never create an actual instance of, but you can't prevent anyone from creating a value of a struct. Basically, what we really want is a type such that there is no valid expression which is convertible to that type - but aside from static classes (which can't be used as parameter types) I don't know of any way of doing that. (I don't know what would happen if you compiled the Person class using a non-static class as the first parameter, then made that class static and recompiled it. Confusion on the part of the C# compiler, I should think.)
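
Putting the pieces of the workaround together gives something like the following sketch. The Person class is again a hand-written stand-in rather than real generated code, and the calls that are meant to fail are left as comments so that the file still compiles.

```csharp
using System;

// Marker type whose only job is to occupy the first parameter slot.
public struct DoNotCallThisMethodWithPositionalArguments {}

public sealed class Person
{
    public string FirstName { get; }
    public string MiddleName { get; }
    public string LastName { get; }
    public string Title { get; }

    public Person(
        DoNotCallThisMethodWithPositionalArguments ignoreMe = default(DoNotCallThisMethodWithPositionalArguments),
        string firstName = null,
        string middleName = null,
        string lastName = null,
        string title = null)
    {
        FirstName = firstName;
        MiddleName = middleName;
        LastName = lastName;
        Title = title;
    }
}

public static class Program
{
    public static void Main()
    {
        // Named arguments skip the marker parameter and work as before.
        var person = new Person(firstName: "Jon", lastName: "Skeet");
        Console.WriteLine(person.LastName); // Skeet

        // Neither of these compiles:
        // new Person("Jon", "Skeet");       // no conversion from string to the marker struct
        // new Person(null, "Jon", "Skeet"); // null is not convertible to a struct
    }
}
```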

Another option (as mentioned in the comments) is to have a "poor man's" version of the compiler enforcement via a Roslyn Code Diagnostic - add an attribute to any method where you want "All optional parameters must be specified with named arguments" to apply, and then make the code diagnostic complain about any call that disobeys it. That diagnostic could ship with the Protocol Buffers NuGet package, which would make for a pretty nice experience. Not quite as good as a language feature though :)

Conclusion

Default parameters are a pain in terms of compatibility. For internal company code, it's often reasonable to only care about source compatibility as you can recompile all calling code before deployment - but for open source projects, binary compatibility within the same major version is the norm.

How useful and common do I think these features would be? Probably not common enough to meet the bar - unless there's encouragement within comments here, in which case I'm happy to file feature requests on GitHub, of course.

As it happens, I'm currently looking at radical changes to the C# implementation of Protocol Buffers, regretfully losing the immutability aspect due to it raising the barrier to entry. It's not quite a done deal yet, but assuming that goes ahead, all of this will be mostly irrelevant - for Protocol Buffers. There are plenty of other places where code generation could be more robustly backward-compatible through judicious use of optional-but-please-use-named-arguments parameters though...

