Anti-unification, part 1

ericlippert

from Fabulous adventures in coding on 2018-10-29 16:41 (#41TP6)

Last time on FAIC we noted that there was a simple, recursive, linear-time first-order unification algorithm. We didn't give the algorithm, but hopefully you see how it would go. The point is, we start with two expressions that contain some holes, and we deduce hole-free expressions that, when filled into the holes, cause the two expressions to become identical.

I got into this series of posts because I wanted to talk about anti-unification, and it is hard to say what anti-unification is if you don't know what unification is. Anti-unification is kinda, sorta, but not exactly the opposite problem. The anti-unification problem is:

Given two input expressions (often, but not necessarily without holes), call them s and t, find a generalizing expression g that does have holes, and two substitutions; the first substitution makes the result expression equal to the first input, and the second substitution makes it equal to the second input.

Let's look at an example. Suppose are inputs are

s is (1 :: 2) :: (1 :: 2) :: nilt is 3 :: 3 :: nil

If you don't like the :: operator, we could equivalently think of these as function calls:

s is cons(cons(1, 2), cons(cons(1, 2), nil))t is cons(3, cons(3, nil))

Or, if you prefer, as trees:

s is cons t is cons / \ / \ cons cons 3 cons / \ / \ / \ 1 2 cons nil 3 nil / \ 1 2

and the question is: what is an expression with holes such that there are substitutions for the holes that make both s and t? Plainly the generalizing expression is

cons(h0, cons(h0, nil)

and the substitutions are cons(1, 2) / h0 to make s and 3 / h0 to make t. (Recall that our standard notation for substitutions is to separate the value being substituted and the hole to substitute by a slash.)

You might have noticed that the unification problem might have no solution; there might be no possible set of substitutions for all the variables that make the statements all true. But the anti-unification problem always has at least one solution: the generalized expression h0 always works, and of course the substitutions are s/h0 and t/h0.

Thus we need to make the anti-unification problem a little bit harder for it to be interesting: we want the most specific generalization, not just any generalization.

It was pretty clear that unification on equations could be generalized to multiple equations. Similarly, I hope it is clear that anti-unification on two expressions can be generalized to any number of expressions; we want the generalizing expression and a substitution for each input expression.

And of course just as there was higher-order unification for unification problems involving lambdas, and unification modulo theory for problems involving arithmetic or other mathematical ideas, there are similar variations on anti-unification. For our purposes we'll consider only first-order anti-unifiction, which is the easy one.

Why is anti-unification useful? I'll get into that in a later post. Now that we know what first-order anti-unification is, can we devise and implement an algorithm that takes in expressions and gives us the most specific anti-unifying expression?

Next time on FAIC: we'll sketch out the algorithm.

Source	RSS or Atom Feed
Feed Location	http://ericlippert.com/feed
Feed Title	Fabulous adventures in coding
Feed Link	https://ericlippert.com/