Graph traversal, part one
There are lots of topics in both practical and theoretical computer science that deal with graphs, lots of different kinds of graphs, lots of ways to represent a graph as a data structure, and lots of operations you'd want to perform on such a data structure. Like, "is this node in this graph?" or "is there any path from this node to that node in this graph?" and so on. Programming with graphs is a huge topic. In this series I want to explore a tiny subset of those topics. Specifically:
- How can we create an immutable data structure which represents a directed graph with labels on the edges?
- Given a directed acyclic graph (DAG), what are all the possible edge traversals through that graph starting from a particular node?
First things first. Like I said, there are a zillion different ways to represent a graph. I want to do so immutably - again, an immutable data type is one where "mutations" return a new object that is a mutated version of the original object, which stays the same. For my purposes today, I want to be able to take an existing graph, possibly empty, and insert new nodes and edges into that graph. The data structure I'm going to choose for my internal representation is that a graph consists of:
- an unordered, no-duplicates set of nodes of arbitrary type. Just like you can have a list of strings or a list of pairs of integers, I want to have a graph where the nodes are strings or giraffes or whatever.
- every node has an unordered, no-duplicates set of outgoing edges of arbitrary type. Again, I want to be able to make the edges anything I want. If I want a graph where the nodes are apples and the edges are tigers, I should be able to get that.
- an edge that leaves one node in a graph goes to another (or the same) node in the graph. That is, the edges have a direction to them; every edge starts at one node and goes to another.
Suppose my edges are integers and my nodes are strings. The constraints above imply that there can be edges 123 and 345 that both leave node "ABC" and arrive at node "XYZ". But there cannot be an edge 123 from "ABC" to "DEF" and an edge 123 from "ABC" to "XYZ"; the edges leaving node "ABC" must be distinct in their values.
Like I said earlier, there are lots of different kinds of graphs, and these constraints work well for my purposes in this series. If, for example, I wanted edge labels to be costs, say, then this would not work well; if edge labels are costs then you want the opposite characteristics. That is, you want there to be only one edge between any pair of nodes, where that edge indicates the cost of traversal, and it is perfectly reasonable to have two edges leaving the same node with the same cost. But in my case I want the edges to uniquely identify a path through the graph.
With that set of characteristics in mind, we quickly realize that we could say that an immutable directed graph is nothing more than an immutable dictionary, where the dictionary keys are nodes and the dictionary values are themselves immutable dictionaries where the nested dictionary keys are edge values and the nested dictionary values are the destination nodes of those edges.
Rather than making my own immutable dictionary - because that's not the point of this series - I'll just use the System.Collections.Immutable package downloadable from NuGet to define my graph. Since this is first, just a super-thin wrapper around a dictionary reference, and second, logically a value, I'll make it a struct.
using System;using System.Collections.Generic;using System.Collections.Immutable;struct ImmutableDirectedGraph<N, E>{ public readonly static ImmutableDirectedGraph<N, E> Empty = new ImmutableDirectedGraph<N, E>( ImmutableDictionary<N, ImmutableDictionary<E, N>>.Empty); private ImmutableDictionary<N, ImmutableDictionary<E, N>> graph; private ImmutableDirectedGraph( ImmutableDictionary<N, ImmutableDictionary<E, N>> graph) { this.graph = graph; }
To add a new node to the graph we first check to see if the graph already contains this node; if it does, we leave it alone. If not, we add it to the map with an empty set of outgoing edges.
public ImmutableDirectedGraph<N, E> AddNode(N node){ if (graph.ContainsKey(node)) return this; return new ImmutableDirectedGraph<N, E>( graph.Add(node, ImmutableDictionary<E, N>.Empty));}
To add an edge to the graph, we first make sure that both nodes are in the graph. Once that's taken care of, we get the edge set for the start node and add to it the new edge.
public ImmutableDirectedGraph<N, E> AddEdge(N start, N finish, E edge){ var g = this.AddNode(start).AddNode(finish); return new ImmutableDirectedGraph<N, E>( g.graph.SetItem( start, g.graph[start].SetItem(edge, finish))));}
Remember, setting an item on an immutable dictionary gives you back a different dictionary with the new item added.
Finally, given a graph and a node, what are the outgoing edges? We could give an error if the node is not even in the graph, but I think it is reasonable to say that the edge set of a node that is not in the graph is the empty set:
public IReadOnlyDictionary<E, N> Edges(N node){ return graph.ContainsKey(node) ? graph[node] : ImmutableDictionary<E,N>.Empty;}
I'm not going to need edge and node removal for my purposes; doing so is left as an exercise to the reader. Note that the data structure I've sketched out here does not have an efficient mechanism for removing the edges that are pointing to a removed node; modifying the data structure to make that efficient is an interesting challenge that I'm not going to get into in this series.
As you've likely come to expect, to use this thing we start with the empty graph and add edges (and, if we like, nodes with no edges).
Here is a portion of the Zork map where I've removed a whole bunch of edges to make a DAG. (Click for larger.)
We can build that map in our graph structure like this:
var map = ImmutableDirectedGraph<string, string>.Empty .AddEdge("Troll Room", "East West Passage", "East") .AddEdge("East West Passage", "Round Room", "East") .AddEdge("East West Passage", "Chasm", "North") .AddEdge("Round Room", "North South Passage", "North") .AddEdge("Round Room", "Loud Room", "East") .AddEdge("Chasm", "Reservoir South", "Northeast") .AddEdge("North South Passage", "Chasm", "North") .AddEdge("North South Passage", "Deep Canyon", "Northeast") .AddEdge("Loud Room", "Deep Canyon", "Up") .AddEdge("Reservoir South", "Dam", "East") .AddEdge("Deep Canyon", "Reservoir South", "Northwest") .AddEdge("Dam", "Dam Lobby", "North") .AddEdge("Dam Lobby", "Maintenance Room", "East") .AddEdge("Dam Lobby", "Maintenance Room", "North");
Notice how there are two paths from the Dam Lobby to the Maintenance room, but our data structure handles that just fine.
Next time: we'll see how to find all the traversals through that graph starting from any given node.