An interesting list structure, part 1
As long-time readers of my blog know, I'm a big fan of functional persistent immutable list data structures. I recently learned about a somewhat bizarre but very practical implementation of such lists, and naturally I wanted to implement it myself to try it out. The list data structure I'm going to discuss today is called a "Hughes list", after its creator John Muir Hughes. The Haskell implementation is called "DList", which stands for "difference list", and this data structure is also commonly used in Prolog and Scala.
Let's start by revisiting the standard immutable persistent linked list, which takes the form of a stack. As you probably recall, by "immutable" we mean that the operations like Push and Pop on the list do not change the list; it is immutable. Rather, they return a new list that has the desired content. Since lists are immutable, their contents can be re-used in new lists without worrying that a mutation will wreck things for the code that is using the new version; this is what is meant by "persistent" in this context.
In this implementation I'm going to implement the usual push, pop, peek and "is empty?" operations, but I'm going to add two more: "append", which adds a single item to the list at the "wrong" end, and "concatenate", which concatenates two lists together.
That is, if we have a list 1, 2, 3 and we push 4, the result is 4, 1, 2, 3. If we have a list 1, 2, 3 and we append 4, we have the list 1, 2, 3, 4. And if we have list 1, 2, 3, and we concatenate list 4, 5, 6, the result is 1, 2, 3, 4, 5, 6. All clear?
Essentially I'm implementing catenable deques here, but as we'll see, they are very inefficient in this implementation.
I'm also going to implement two static list constructors: one that makes an empty list, and one that makes a single-item list.
Oh, and before we begin: I'm going to use the standard simple recursive implementations of these operations. In practice, these would blow the stack in C# if the lists got big, because C# is not consistently tail-recursive. Were I designing these types for industrial use, I'd write iterative algorithms instead of recursive algorithms, and I'd also follow good practices like implementing IEnumerable and so on. These types are not intended for use in production; they are for pedagogic purposes only.
public abstract class SimpleList<T>
{
public static readonly SimpleList<T> Empty =
new EmptyList();
public static SimpleList<T> Single(T t) =>
Empty.Push(t);
private SimpleList() { }
public abstract bool IsEmpty { get; }
public abstract T Peek { get; }
// If this is a, b, c, then this.Pop is b, c
public abstract SimpleList<T> Pop { get; }
// If this is a, b, c and s is d, e, f, then
// this.Concatenate(s) is a, b, c, d, e, f
public abstract SimpleList<T> Concatenate(SimpleList<T> s);
// If this is b, c, and t is a, then this.Push(t) is a, b, c
public SimpleList<T> Push(T t) =>
new NonEmptyList(t, this);
// If this is b, c, and t is a, then this.Append(t) is b, c, a
public SimpleList<T> Append(T t) =>
this.Concatenate(Single(t));
private sealed class EmptyList : SimpleList<T>
{
public override bool IsEmpty => true;
public override T Peek =>
throw new InvalidOperationException();
public override SimpleList<T> Pop =>
throw new InvalidOperationException();
public override SimpleList<T> Concatenate(
SimpleList<T> s) => s;
public override string ToString() => "";
}
private sealed class NonEmptyList : SimpleList<T>
{
private readonly T head;
private readonly SimpleList<T> tail;
public NonEmptyList(T head, SimpleList<T> tail)
{
this.head = head;
this.tail = tail;
}
public override bool IsEmpty => false;
public override T Peek => head;
public override SimpleList<T> Pop => tail;
public override SimpleList<T> Concatenate(
SimpleList<T> s) =>
s.IsEmpty ? this : this.Pop.Concatenate(s).Push(this.Peek);
}
public override string ToString() =>
this.Pop.IsEmpty ?
$"{this.Peek}" :
$"{this.Peek}, {this.Pop}";
}
}
All right, hopefully that was pretty straightforward. Empty, Single, IsEmpty, Peek, Push and Pop are all O(1). Concatenate is O(n) in the size of "this", except for the special case of concatenating an empty list, which is O(1) regardless of the size of "this". Append is implemented as Concatenate, so it is O(n).
The only algorithm which is even slightly tricky is Concatenate, but hopefully you see how it works. Basically, we're using recursion to reverse "this" and then push each element of the reversed list onto s. As I noted above, a more efficient and safe implementation would be to actually reverse the list in a loop rather than using recursion to put that information onto the call stack, but whatever, this is for pedagogy - however, we will return to a related issue in a future episode, so keep it in mind!
The problem here is that we are often in a situation where Append and Concatenate are frequent operations, but Peek and Pop are not. For example, we might be in a "list building" scenario where we are composing a large list out of many smaller lists, and wish to be able to efficiently concatenate them.
I already did a series on using finger trees to implement an efficient catenable deque, but finger trees are a quite complex and difficult data structure to understand; you may recall that I did not even implement the efficient "break up/concatenate" operations because they were just too much work. The great thing about Hughes lists is that they are a seemingly very simple data structure that efficiently implements list concatenation. In fact, here it is:
public struct HughesList<T>
{
private readonly Func<SimpleList<T>, SimpleList<T>> f;
}
And" that's it. A Hughes list is a struct that wraps a single delegate; the delegate takes a simple list and returns a simple list. Of course I have left out all the methods and properties, but the delegate is the only data that is stored in a Hughes list.
I know what you're thinking, because it's the same thing that I thought when I first heard about this: that sounds completely bonkers. How can a function from lists to lists be a representation of a list? Where's the head and the tail and all that stuff?
Next time on FAIC: We'll find out how it works and build an implementation in less than two dozen lines of code.