Article 3H686 Implementing IXmlSerializable in readonly structs

Implementing IXmlSerializable in readonly structs

by
jonskeet
from Jon Skeet's coding blog on (#3H686)
Background

There are three things you need to know to start with:

Operations on read-only variables which are value types copy the variable value first. I've written about this before on this blog. C# 7.2 addresses this by introducing the readonly modifier for structs. See the language proposal for more details. I was touched to see that the proposal references my blog post :)

The ref readonly local variables and in parameter features in C# 7.2 mean that "read-only variables" are likely to be more common in C# than they have been in the past.

Noda Time includes many value types which implement IXmlSerializable. Noda Time implements IXmlSerializable.ReadXml by assigning to this: fundamentally IXmlSerializable assumes a mutable type. I use explicit interface implementation to make this less likely to be used directly on an unboxed variable. With a generic method using an interface constraint you can observe a simple method call mutating a non-read-only variable, but that's generally harmless.

Adding the readonly modifier to Noda Time structs

I relished news of the readonly modifier for struct declarations. At last, I can remove my ReadWriteForEfficiency attribute! Hmm. Not so much. To be fair, some of the structs (7 out of 18) were fine. But every struct that implements IXmlSerializable gave me this error:

Cannot assign to 'this' because it is read-only

That's reasonable: in members of a readonly struct, this effectively becomes an in parameter instead of a ref parameter. But how can we fix that? Like any sensible developer, I turned to Stack Overflow, which has precisely the question I'm interested in. It even has an answer! Unfortunately, it amounts to a workaround: assigning to this via unsafe code.

Violating readonly with unsafe code

To give a concrete example of the answer from Stack Overflow, here's my current LocalTime code:

void IXmlSerializable.ReadXml([NotNull] XmlReader reader){ Preconditions.CheckNotNull(reader, nameof(reader)); var pattern = LocalTimePattern.ExtendedIso; string text = reader.ReadElementContentAsString(); this = pattern.Parse(text).Value;}

Here's the code that compiles when LocalTime is marked as readonly, after enabling unsafe blocks:

unsafe void IXmlSerializable.ReadXml([NotNull] XmlReader reader){ Preconditions.CheckNotNull(reader, nameof(reader)); var pattern = LocalTimePattern.ExtendedIso; string text = reader.ReadElementContentAsString(); fixed (LocalTime* thisAddr = &this) { *thisAddr = pattern.Parse(text).Value; }}

Essentially, the unsafe code is bypassing the controls around read-only structs. Just for kicks, let's apply the same change
throughout Noda Time, and think about what would happen"

As it happens, that fix doesn't work universally: ZonedDateTime is a "managed type" because it contains a reference (to a DateTimeZone) which means you can't create a pointer to it. That's a pity, but if we can make everything else readonly, that's a good start. Now let's look at the knock-on effects"

Safe code trying to abuse the unsafe code

Let's try to abuse our "slightly dodgy" implementations. Here's a class we'll put in a "friendly" assembly which is trying to be as helpful as possible:

using NodaTime;public class AccessToken{ private readonly Instant expiry; public ref readonly Instant Expiry => ref expiry; public AccessToken(Instant expiry) => this.expiry = expiry;}

Great - it lets you get at the expiry time of an access token without even copying the value.

The big test is: can we break this friendly's code's assumptions about expiry really not changing its value?

Here's code I expected to mutate the access token:

static void MutateAccessToken(AccessToken accessToken){ ref readonly Instant expiry = ref accessToken.Expiry; string xml = "<evil>2100-01-01T00:00:00Z</evil>"; EvilReadXml(in expiry, xml);}static void EvilReadXml<T>(in T value, string xml) where T : IXmlSerializable{ var reader = XmlReader.Create(new StringReader(xml)); reader.Read(); value.ReadXml(reader);}

We have an in parameter in EvilReadXml, so the expiry variable is being passed by reference, and then we're calling ReadXml on that parameter" so doesn't that mean we'll modify the parameter, and thus the underlying expiry field in the object?

Nope. Thinking about it, the compiler doesn't know when it compiles EvilReadXml that T is a readonly struct - it could be a regular struct. So it has to create a copy of the value before calling ReadXml.

Looking the the spec proposal, there's one interesting point in the section on ref extension methods:

However any use of an in T parameter will have to be done through an interface member. Since all interface members are considered mutating, any such use would require a copy.

Hooray! That suggests it's safe - at least for now. I'm still worried though: what if the C# team adds a readonly struct generic constraint in C# 8? Would that allow a small modification of the above code to break? And if interface methods are considered mutating anyway, why doesn't the language know that when I'm trying to implement the interface?

But hey, now that we know unsafe code is able to work around this in our XML implementation, what's to stop nasty code from using the same trick directly?

Unsafe code abusing safe code

Imagine we didn't support XML serialization at all. Could unsafe code mutate AccessToken? It turns out it can, very easily:

static void MutateAccessToken(AccessToken accessToken){ ref readonly Instant expiry = ref accessToken.Expiry; unsafe { fixed (Instant* expiryAddr = &expiry) { *expiryAddr = Instant.FromUtc(2100, 1, 1, 0, 0); } }}

This isn't too surprising - unsafe code is, after all, unsafe. I readily admit I'm not an expert on .NET security, and I know the
landscape has changed quite a bit over time. These days I believe the idea of a sandbox has been somewhat abandoned - if you care about executing code you don't really trust, you do it in a container and use that as your security boundary. That's about the limit of my knowledge though, which could be incorrect anyway.

Where do we go from here?

At this point, I'm stuck making a choice between several unpleasant ones:

  • Leave Noda Time public structs as non-read-only. That prevents users from writing efficient code to use them without copying.
  • Remove XML serialization from Noda Time 3.0. We're probably going to remove binary serialization anyway on the grounds that Microsoft is somewhat-discouraging it going forward, and it's generally considered a security problem. However, I don't think XML serialization is considered to be as bad, so long as you use it carefully, preventing arbitrary type loading. (See this paper for more information.)
  • Implement XML serialization using unsafe code as shown above. Even though my attempt at abusing it failed, that doesn't provide very much comfort. I don't know what other ramifications there might be for including unsafe code. Does that limit where the code can be deployed? It also doesn't help with my concern about a future readonly struct constraint.

Thoughts very welcome. Reassurances from the C# team that "readonly struct" constraints won't happen particularly welcome" along with alternative ways of implementing XML serialization.

External Content
Source RSS or Atom Feed
Feed Location http://codeblog.jonskeet.uk/feed/
Feed Title Jon Skeet's coding blog
Feed Link https://codeblog.jonskeet.uk/
Reply 0 comments