Anti-unification, part 5

ericlippert

from Fabulous adventures in coding on 2018-11-06 18:10 (#42CCM)

Last time we wrote all the boring boilerplate code for substitutions and trees. Now let's implement the algorithm. As I noted a couple of episodes back, we can reduce the algorithm to repeated application of two rules that mutate three pieces of state: the current generalization, the current substitutions on s, and the current substitutions on t.

The function returns those three things, and they do not have any particular semantic connection to each other aside from being the solution to this problem, so let's try returning them as a tuple.

This seems like a good place to try out nested functions in C# 7, since each rule is logically its own function, but also only useful in the context of the algorithm; there's no real reason to make these private methods of the class since no other code calls them. Also, they're logically manipulating the local state of their containing function.

We'll start by setting up the initial state as being the most general generalization:

public static (Tree, Substitutions, Substitutions)
Antiunify(Tree s, Tree t)
{
var h = MakeHole();
var generalization = h;
var sSubstitutions = Substitutions.Empty.Add(h, s);
var tSubstitutions = Substitutions.Empty.Add(h, t);

Recall the first rule seeks situations where there is a substitution that is insufficiently specific. We want to go until no more rules apply, so we'll have this return a Boolean indicating whether the rule was applied or not.

bool RuleOne()
{
var holes = from subst in sSubstitutions
let cs = subst.Value
let ct = tSubstitutions[subst.Key]
where cs.Kind == ct.Kind
where cs.Value == ct.Value
where cs.ChildCount == ct.ChildCount
select subst.Key;
var hole = holes.FirstOrDefault();
if (hole == null)
return false;
var sTree = sSubstitutions[hole];
var tTree = tSubstitutions[hole];
sSubstitutions = sSubstitutions.Remove(hole);
tSubstitutions = tSubstitutions.Remove(hole);
var newHoles =
sTree.Children.Select(c => MakeHole()).ToList();
foreach (var (newHole, child) in newHoles.Zip(
sTree.Children, (newHole, child) => (newHole, child)))
sSubstitutions = sSubstitutions.Add(newHole, child);
foreach (var (newHole, child) in newHoles.Zip(
tTree.Children, (newHole, child) => (newHole, child)))
tSubstitutions = tSubstitutions.Add(newHole, child);
generalization = generalization.Substitute(
hole, new Tree(sTree.Kind, sTree.Value, newHoles));
return true;
}

There is a small code smell here: tuples are value types, and so the "default" if there is no pair of holes that matches like this is (null, null), so that's the condition that we're using to check to see if the rules apply.

Notice that we're using tuples to iterate over two sequences of equal size via zip. The code seems inelegant to me in a subtle way. The fundamental issue here is that C# has always had mutable tuples ever since version 1.0; it just called them "argument lists", and that's weird. It has always struck me as bizarre that C# requires you to pass an argument tuple, but that it gives you no syntax for manipulating that tuple in any way other than extracting the arguments from it or mutating them. You cannot treat what is logically a tuple as a tuple; instead you have to write code that explicitly constructs a real tuple out of the logical tuple, and end up writing what looks like it ought to be an identity:

(newHole, child) => (newHole, child)

For that matter, why do we need to zip at all? In this particular example it would be nice if the tuple syntax carried over into foreach loops; imagine if instead of that ugly zip code we could just write

foreach (var newHole, var child in newHoles, sTree.Children)

Zipping is only necessary here because the language lacks the feature of treating tuples as values consistently across the language. I'm hoping there will be further improvements in this area in C# 8.

But I digress. We've implemented the first rule, and the second is even more straightforward. Here we are looking for redundant holes and removing them:

bool RuleTwo()
{
var pairs =
from s1 in sSubstitutions
from s2 in sSubstitutions
where s1.Key != s2.Key
where s1.Value == s2.Value
where tSubstitutions[s1.Key] == tSubstitutions[s2.Key]
select (s1.Key, s2.Key);
var (hole1, hole2) = pairs.FirstOrDefault();
if (hole1 == null)
return false;
sSubstitutions = sSubstitutions.Remove(hole1);
tSubstitutions = tSubstitutions.Remove(hole1);
generalization = generalization.Substitute(hole1, hole2);
return true;
}

Quite fine. And now the outer loop of the algorithm is trivial. We keep applying rules until we are in a situation where neither applies.

while (RuleOne() || RuleTwo())
{ /* do nothing */ }
return (generalization, sSubstitutions, tSubstitutions);
}

It's slightly distasteful to have RuleOne and RuleTwo useful for both their side effects and their values, but really their values are only being used for control flow, not for the value that was computed, so I feel OK about this.

Let's try it out! Again we'll make a couple of local helper functions:

static void Main()
{
Tree Cons(params Tree[] children) =>
new Tree("call", "cons", children);
Tree Literal(string value) =>
new Tree("literal", value);
var one = Literal("1");
var two = Literal("2");
var three = Literal("3");
var nil = Literal("nil");
var s = Cons(Cons(one, two), Cons(Cons(one, two), nil));
var t = Cons(three, Cons(three, nil));
var (generalization, sSubstitutions, tSubstitutions) =
Tree.Antiunify(s, t);
Console.WriteLine(generalization);
Console.WriteLine(sSubstitutions.LineSeparated());
Console.WriteLine(tSubstitutions.LineSeparated());
}

And when we run it, we get the right answer:

cons(h1,cons(h1,nil))
cons(1,2)/h1
3/h1

Nice!

Next time on FAIC: Why is this useful?

Source	RSS or Atom Feed
Feed Location	http://ericlippert.com/feed
Feed Title	Fabulous adventures in coding
Feed Link	https://ericlippert.com/