Life, part 25

ericlippert

from Fabulous adventures in coding on 2020-07-20 14:35 (#55Y2D)

Let's crisp up the problem from last episode. The problem for today's episode is to write a method that takes these nine 2-quads:

Computes these sixteen 1-quads:

And returns these four 2-quads:

Those four 2-quads form a 3-quad; I haven't written a 3-quad data structure yet, but I'm sure you can imagine how it goes. We'll write it out later; it's just a struct containing four 2-quads. I like to start by writing the signature of the method:

static Quad3 Step9Quad2ToQuad3( Quad2 nw, Quad2 n, Quad2 ne, Quad2 w, Quad2 c, Quad2 e, Quad2 sw, Quad2 s, Quad2 se){

The first step is to get those twelve additional 2-quads. The first four we need are these:

Start at the beginning: how are we going to get the western 2-quad on the northern edge? Plainly it is made up of parts of the 2-quads I've labeled NW and N. A couple episodes ago I added an or" operator to 2-quads and masks that get the eastern and western edges, so how about this?

Quad2 n_w = nw.EastEdge | n.WestEdge;

Do you see why that's not quite right? That is this 2-quad:

We've pasted the east edge of nw and the west edge of n together, but we also need to swap eastness with westness. The result we've got is mirror-imaged-at-the-level-of-1-quads the 2-quad that we want. Make sure that this step is clear in your head because correctly understanding how this is mirrored" is important in this algorithm.

We're going to be doing these operations a lot in this episode, so in keeping with my general attitude of there's nothing too small to go in a method", I'm going to make two helper methods on Quad2 right now; one creates a problem and one fixes it:

public Quad2 HorizontalMiddleMirrored(Quad2 right) => EastEdge | right.WestEdge;public Quad2 Mirror => new Quad2((ushort)((cells << 8) | (cells >> 8)));

Now we can write correct code for extracting the n_w 2-quad:

Quad2 n_w = nw.HorizontalMiddleMirrored(n).Mirror;

Why not combine these into one helper function HorizontalMiddle? Normally I would, but... we will come back to this point in a bit.

So far we've done two ands, two ors and two shifts. We can do three times that work to similarly get:

Quad2 n_e = n.HorizontalMiddleMirrored(ne).Mirror;Quad2 c_w = w.HorizontalMiddleMirrored(c).Mirror;Quad2 c_e = c.HorizontalMiddleMirrored(e).Mirror;

We now have eight of the sixteen 2-quads that we need. I'm sure you see how we can get these four:

We need to do the same thing again but this time for the vertical direction. And this time the 2-quads are going to come out flipped north-for-south instead of mirrored east-for-west. Two more helper methods!

public Quad2 VerticalMiddleFlipped(Quad2 bottom) => SouthEdge | bottom.NorthEdge;public Quad2 Flip => new Quad2((ushort)(((cells & SouthEdgeMask) << 4) | ((cells & NorthEdgeMask) >> 4)));

and now we can compute those four:

Quad2 w_n = nw.VerticalMiddleFlipped(w).Flip;Quad2 w_s = w.VerticalMiddleFlipped(sw).Flip;Quad2 c_n = n.VerticalMiddleFlipped(center).Flip;Quad2 c_s = center.VerticalMiddleFlipped(s).Flip;

We're 75% of the way to getting our sixteen needed 2-quads. How are we going to get these?

Three of them are pretty easy; they can be constructed from what we already have in hand.

Quad2 c_nw = w_n.HorizontalMiddleMirrored(c_n).Mirror;Quad2 c_sw = w_s_flip.HorizontalMiddleMirrored(c_s).Mirror;Quad2 c_ne = n_e.VerticalMiddleFlipped(c_e).Flip;

That last one is a little tricky but we can get it:

Quad2 s_e = s.HorizontalMiddleMirrored(se).Mirror;Quad2 c_se = c_e.VerticalMiddleFlipped(s_e).Flip;

So far we've done 26 each of and, or, shift, to get the 16 2-quads we need.

We supposed that we had a fast lookup table that took a 2-quad and returned a 1-quad; let's actually build that. The 2-quad key is a ushort and the returned 1-quad is an integer where the low four bits are the bits of the 1-quad:

static int[] lookup = new int[65536];... initialized as ...for (int i = 0; i <= 0xffff; i += 1) lookup[i] = StepQuad2(new Quad2((ushort)i));

We can therefore solve our problem with this mess:

Quad2 newNW = new Quad2((ushort)( (lookup[(ushort)nw] << 12) | // NW is bits 12-15 (lookup[(ushort)w_n] << 8) | // SW is 8-11 (lookup[(ushort)n_w] << 4) | // NE is 4-7 (lookup[(ushort)c_nw]))); // SE is 0-3Quad2 newNE = new Quad2((ushort)( (lookup[(ushort)n] << 12) | (lookup[(ushort)c_n] << 8) | (lookup[(ushort)n_e] << 4) | (lookup[(ushort)c_ne])));Quad2 newSW = new Quad2((ushort)( (lookup[(ushort)w] << 12) | (lookup[(ushort)w_s] << 8) | (lookup[(ushort)c_w] << 4) | (lookup[(ushort)c_sw])));Quad2 newSE = new Quad2((ushort)( (lookup[(ushort)c] << 12) | (lookup[(ushort)c_s] << 8) | (lookup[(ushort)c_e] << 4) | (lookup[(ushort)c_se])));return new Quad3(newNW, newNE, newSW, newSE);

What I like about this: it would work.

What I dislike about this: we're back in the mechanism domain. We are treating 1-quads as ints and constantly going back and forth between ushorts and Quad2s. The implementation detail of how a Quad2 is laid out in memory is once again front and center in this algorithm. Also, it feels like we are doing more work than we need to throughout this algorithm.

And we are wasting a bunch of space in the lookup table. We have 65536 32-bit integers but we are only using 4 of those bits. That seems like a small problem; memory is cheap after all but... wait a minute... what if we could fix several of these problems at once?

The first step in fixing this mess is: we're going to make the lookup table an array of Quad2s, where each Quad2 contains four copies of the 1-quad:

static Quad2[] lookup = new Quad2[65536];... initialized as ...for (int i = 0; i <= 0xffff; i += 1){ int n = StepQuad2(new Quad2((ushort)i)); lookup[i] = new Quad2((ushort)( (n << 12) | (n << 8) | (n << 4) | n));}...static Quad2 Lookup(Quad2 q) => lookup[(ushort)q];

I mean, why not? We have plenty of bits to spare, so use them.

Now the call site looks like this:

Quad2 newNW = Lookup(nw).NW | Lookup(w_n).SW | Lookup(n_w).NE | Lookup(c_nw).SE;Quad2 newNE = Lookup(n).NW | Lookup(c_n).SW | Lookup(n_e).NE | Lookup(c_ne).SE;Quad2 newSW = Lookup(w).NW | Lookup(w_s).SW | Lookup(c_w).NE | Lookup(c_sw).SE;Quad2 newSE = Lookup(c).NW | Lookup(c_s).SW | Lookup(c_e).NE | Lookup[c_se].SE;return new Quad3(newNW, newNE, newSW, newSW);

Look at that improvement in the readability of the code! We plainly see that we are constructing new Quad2s out of the parts of old ones. We're keeping all the typing consistent. This is great.

Could it be even better?

Take a look at those sixteen computations of a quad1 on the next tick. You notice something?

All the ones that are going in the NW of a new 2-quad are lookups on an argument that was passed in to this method: nw, n, w and c.
All the ones that are going to the SW were computed by flipping" something: w_n, c_n, w_s and c_s.
All the ones that are going to the NE were computed by mirroring" something: n_w, n_e, c_w and c_e
All the ones going to the SE were both flipped and mirrored: c_nw, c_ne, c_sw, c_se.

What does this mean?

It means we can move the flipping and mirroring operations back in time into the lookup table as well. Read this next bit carefully because it is tricky. We're going to initialize the lookup table like this:

for (int i = 0; i <= 0xffff; i += 1){ Quad2 normal = new Quad2((ushort)i); Quad2 mirror = normal.Mirror; Quad2 flip = normal.Flip; Quad2 both = mirror.Flip; int result = StepQuad2(normal); lookup[(ushort)both] |= new Quad2((ushort)result); // SE lookup[(ushort)mirror] |= new Quad2((ushort)(result << 4)); // NE lookup[(ushort)flip] |= new Quad2((ushort)(result << 8)); // SW lookup[(ushort)normal] |= new Quad2((ushort)(result << 12)); // NW}

And now we can remove all the flips and mirrors when computing the new Quad2s! Put it all together:

static Quad3 Step9Quad2ToQuad3( Quad2 nw, Quad2 n, Quad2 ne, Quad2 w, Quad2 c, Quad2 e, Quad2 sw, Quad2 s, Quad2 se){ // These are mirrored Quad2 n_w = nw.HorizontalMiddleMirrored(n); Quad2 n_e = n.HorizontalMiddleMirrored(ne); Quad2 c_w = w.HorizontalMiddleMirrored(c); Quad2 c_e = c.HorizontalMiddleMirrored(e); // These are flipped Quad2 w_n = nw.VerticalMiddleFlipped(w); Quad2 c_n = n.VerticalMiddleFlipped(c); Quad2 w_s = w.VerticalMiddleFlipped(sw); Quad2 c_s = c.VerticalMiddleFlipped(s); // These are mirrored and flipped Quad2 c_nw = w_n.HorizontalMiddleMirrored(c_n); Quad2 c_ne = n_e.VerticalMiddleFlipped(c_e); Quad2 c_sw = w_s.HorizontalMiddleMirrored(c_s); Quad2 c_se = c_e.SouthEdge | c_s.NE | se.NW; Quad2 newNW = Lookup(nw).NW | Lookup(w_n).SW | Lookup(n_w).NE | Lookup(c_nw).SE; Quad2 newNE = Lookup(n).NW | Lookup(c_n).SW | Lookup(n_e).NE | Lookup(c_ne).SE; Quad2 newSW = Lookup(w).NW | Lookup(w_s).SW | Lookup(c_w).NE | Lookup(c_sw).SE; Quad2 newSE = Lookup(c).NW | Lookup(c_s).SW | Lookup(c_e).NE | Lookup[c_se].SE; return new Quad3(newNW, newNE, newSW, newSW);}

Computing c_se actually gets easier, which is a nice bonus.

What's our final tally? To move ahead an 8*8 portion of a 12*12 grid we do 41 ands, 25 ors and 16 array lookups. And zero shifts. All the other bit operations have been moved backwards in time to run during the computation of the lookup table.

Could we do even less work?

Yes; we could eliminate 16 of those ands if we had four lookup tables each with four bits in the right place and zeros everywhere else; we then would not have to do any masking. It would just be:

 Quad2 newNW = LookupNW(nw) | LookupSW(w_n) | LookupNE(n_w) | LookupSE(c_nw); ...

But that's not how the original implementation did it, and so that's not what I'm doing here. It would be an interesting experiment to find out of that makes a difference; maybe we will come back to that point later in the series, but probably not.

Oh, and incidentally, you remember that fragment of bit-twiddling code that I posted from the original implementation?

int ix01 = (ix00 & 0xf0f0) | (cN.q[14] & 0x0f0f);int ix11 = (ix10 & 0xf0f0) | (cN.q[15] & 0x0f0f);int ix03 = (ix02 & 0xf0f0) | (ix01 & 0x0f00) | (cN.q[7] & 0x000f);

I bet you can figure out what it does now!

Well that was quite the journey to get a fast method that can step an 8*8 fragment of a 12*12 grid forwards one tick by memoizing the 4*4 step to 2*2" problem. Why is this method useful?

Next time on FAIC: Yeah, why?

Source	RSS or Atom Feed
Feed Location	http://ericlippert.com/feed
Feed Title	Fabulous adventures in coding
Feed Link	https://ericlippert.com/