Life, part 38
Here we go again!
Fellow BASIC enthusiast Jeff Coding Horror" Atwood, of Stack Overflow and Discourse fame, has started a project to translate the programs in the 1978 classic BASIC Computer Games into more modern languages. I never had a copy of this one - my first computer book was Practise Your BASIC - but these programs are characteristic of the period and I am happy to help out.
1978 was of course well into the craze of writing Life simulators for home personal computers. In this episode I'll break down the original program assuming that you're not a BASIC old-timer like me; this language has a lot of quirks. You can read the commentary from the book here, and f you want to run the original program yourself the copy here is a valid Vintage BASIC program.
Let's get into it!
2 PRINT TAB(34);"LIFE"4 PRINT TAB(15);"CREATIVE COMPUTING MORRISTOWN, NEW JERSEY"6 PRINT: PRINT: PRINT8 PRINT "ENTER YOUR PATTERN:"
- All lines are numbered whether they need to be or not. Line numbering is often used to indicate program structure; we'll come back to this in a bit.
- Lines are not one-to-one with statements! A single line may contain several statements separated by colons.
- PRINT takes a semicolon-separated list of zero or more expressions.
- PRINT outputs a newline by default; as we'll see presently, there is a way to suppress the newline.
- TAB(x) does not produce a string; it moves the print cursor" to column x. In many versions of BASIC, TAB can only appear as an argument to PRINT.
9 X1=1: Y1=1: X2=24: Y2=7010 DIM A(24,70),B$(24)20 C=1
- Variable names are typically one or two characters.
- 24 x 70 is the size of the Life board that we're simulating here; this was quite a large screen by 1978 standards. I owned both a Commodore 64 and a Commodore PET with 40*25 character screens; it was possible to get an 80-column PET but they were rare.
- Note that the authors have made the bizarre choice throughout that (X, Y) coordinates refer to (line, column) and not the more standard (column, line) interpretation. Normally we think of X as being the left-right coordinate, not the top-bottom coordinate.
- Single-valued variables need not be declared.
- Array-valued variables are dimensioned", and the dimensions typically must be literal constants. Here we have a two-dimensional array and a one-dimensional array.
- Anything that is string-valued has a $ suffix, so B$ is an array of strings.
- Arrays and strings are both indexed starting from 1.
- As we know from early episodes in this series, a naive implementation needs to keep track of both the current" and next" state. In a memory-constrained environment such as can be assumed by a 1978 BASIC program's author, we need to keep all that in the same array. Spoiler alert: the convention used by this program for the contents of array A is:
- 0 means cell is dead, will stay dead.
- 1 means cell is alive, will stay alive.
- 2 means cell is alive, will die.
- 3 means cell is dead, will come alive.
- You might have noticed that the program is formatted to have very few unnecessary spaces. Having become accustomed to vertical and horizontal whitespace being used to reduce eyestrain and guide the reader, it looks dense and hard to read. This was not a stylistic choice though; it was imposed by the limitations of the technology of the time. Line lengths were constrained, often to the width of the screen, and when you have only 40 or 80 characters of screen, unnecessary spaces are bad. But more interesting from my perspective as a compiler writer is that implementations of BASIC often tokenized as the user typed in the program, and then stored only the tokens in memory, not the original source code as it was typed. When the program was listed back, the program printer reconstituted the program from the token stream which had discarded the spaces. Pre .NET versions of VB did this!
30 INPUT B$(C)40 IF B$(C)="DONE" THEN B$(C)="": GOTO 8050 IF LEFT$(B$(C),1)="." THEN B$(C)=" "+RIGHT$(B$(C),LEN(B$(C))-1)60 C=C+170 GOTO 30
- INPUT takes a string from the console and places it in the given variable.
- Arrays are indexed using parentheses.
- This is a while" loop but most BASICs did not have loop structures beyond FOR/NEXT, or even ELSE clauses for IF/THEN statements. You had to write GOTO statements to build those control flows, as shown here. Notice that all the colon-separated statements execute if the condition of the IF is met; otherwise we continue on to the next statement. (I was fortunate to have access to an original PET with Waterloo Structured BASIC which did have while loops, though I did not understand the point of them when I was in elementary school. Many years later I ended up working with some of the authors of Waterloo BASIC in my first internships at WATCOM, though I did not realize it at the time! The whole saga of Waterloo Structured BASIC and the SuperPET hardware is told here.)
- LEFT$, RIGHT$ and LEN do what they say on the tin: give you the left substring of given length, right substring of given length, and length of a string. Strings are not directly indexable.
- + is both the string concatenation operator and the arithmetic addition operator.
- It seems clear that we are inputting strings and sticking them in an array until the user types DONE, but why are we checking whether the first character in the string is a period, and replacing it with a space? It is because in some versions of BASIC, INPUT strips leading spaces, but they are necessary when entering a Life pattern.
- Did you notice that nothing whatsoever was explained to the user of the program except enter your pattern"? Enter it how? Why should the user know that spaces are dead, leading spaces are soft, a period in the first place is a hard space but a live cell anywhere else, and that you type DONE when you're done? The assumption of the authors is that the only person running the code is the person who typed it in. You should explain as little as possible" is not a great attitude to drill into beginner programmers.
- Notice that we have no bounds checking. B$ was dimensioned to 24 elements; if the user types in more than 24 lines in the pattern, the program crashes. Similarly we have no checks on the lengths of the strings themselves. A whole generation of young programmers was taught from their very first lesson that crashes are the user's fault for doing something wrong.
- Our program state by the time we break out of the loop is: B$ has C lines in it, and B$(C) is an empty string.
80 C=C-1: L=090 FOR X=1 TO C-1100 IF LEN(B$(X))>L THEN L=LEN(B$(X))110 NEXT X
- Since the last line is always blank, we are reducing the line count C by one.
- The maximum line length L is initialized to zero, unnecessarily. I prefer an unnecessary but clear initialization to the lack thereof. But as we'll see in a moment, some variables are just assumed to be initialized to zero, and some are initialized twice needlessly. More generally, programs that were designed to teach children about programming were chock full of sloppy inconsistencies and this is no exception.
- FOR/NEXT is the only structured loop in many early BASICs. The loop variable starts at the given value and executes the body unless the loop variable is greater than the final value. Some versions of the language also had an optional STEP clause, and some would even keep track of whether the loop was counting up or down. Fancy!
- Plainly the purpose of the FOR/NEXT loop here is to find the maximum line length of the pattern input by the user, but it appears to have a bug; we have strings in B$ indexed from 1 through C, but we are skipping the length check on B$(C). The subtraction here appears to be a mis-edit; perhaps the C=C-1 was moved up, but the developer forgot to adjust the loop condition. The bug only affects inputs where the last line is also the longest.
- Is it inefficient to compute LEN(B$(X)) twice in the case where the maximum length L needs to be updated? Many versions of BASIC used length-prefixed strings (as do all versions of VB and all .NET languages) and thus computing length is O(1). When I first learned C as a teenager it struck me as exceedingly weird that there was no out-of-the-box system for length-prefixing a string. And it still does.
120 X1=11-C/2130 Y1=33-L/2140 FOR X=1 TO C150 FOR Y=1 TO LEN(B$(X))160 IF MID$(B$(X),Y,1)<>" " THEN A(X1+X,Y1+Y)=1:P=P+1170 NEXT Y180 NEXT X
- There are no comments in this program; it would be nice to explain that what we're doing here is trying to center the pattern that the user has just typed in to the interior of a 24 x 70 rectangle. (X1, Y1) is the array coordinate of the top left corner of the pattern as it is being copied from B$ to A; spaces are kept as zero, and non-spaces become 1. This automatic centering is a really nice feature of this program.
- Once again we have no bounds checking. If L is greater than 67 or C is greater than 23, bad things are going to happen when we index into A. (Though if C is greater than 23, we might have already crashed when filling in B$.)
- We already initialized X1 and Y1; we have not read their values at any point before they are written for a second time. By contrast, the population count P is accessed for the first time here and is assumed to be initialized to zero. Again, there is some amount of sloppiness going on here that could have been easily removed in code review.
200 PRINT:PRINT:PRINT210 PRINT "GENERATION:";G,"POPULATION:";P;: IF I9 THEN PRINT "INVALID!";215 X3=24:Y3=70:X4=1: Y4=1: P=0220 G=G+1225 FOR X=1 TO X1-1: PRINT: NEXT X
- The input and state initialization phase is done. This is the start of the main display-and-compute-next-generation loop of our Life algorithm. The subtle indication is that the line numbering just skipped from 180 directly to 200, indicating that we're starting a new section of the program.
- Notice that two of our PRINT statements here end in semicolons. This suppresses the newline at the end. Notice also that the separator between G and POPULATION:" is a comma, which instructs PRINT to tab out some whitespace after converting G to a string and printing it.
- I9, whatever it is, has not been initialized yet and we are depending on it being zero. There is no Boolean type; in BASIC we typically use zero for false and -1 for true. (Do you see why -1 for true is arguably preferable to 1?)
- We know that (X1, Y1) is the top left corner of the might be living" portion of the pattern inside array A. (X2, Y2) and (X3, Y3) both appear to be the bottom right corner of the array, both being (24, 70) at this point, and (X4, Y4) is (1, 1), so it is likely another top left" coordinate of some sort. Maybe? Let's see what happens.
- We reset the living population counter P to zero and increase the generation count G by one.
- We then print out X1-1 blank lines. This implementation is quite smart for a short 1978 BASIC program! It is tracking what subset of the 24*70 grid is maybe alive" so that it does not have to consider the entire space on each generation.
- We're continuing with the pattern established so far that X and Y are loop variables. Thankfully, this pattern is consistent throughout the program.
- The assignment of P on line 215 is redundant; we're going to assign it zero again on line 309 and there is no control flow on which it is read between those two lines.
230 FOR X=X1 TO X2240 PRINT250 FOR Y=Y1 TO Y2253 IF A(X,Y)=2 THEN A(X,Y)=0:GOTO 270256 IF A(X,Y)=3 THEN A(X,Y)=1:GOTO 261260 IF A(X,Y)<>1 THEN 270261 PRINT TAB(Y);"*";262 IF X<X3 THEN X3=X264 IF X>X4 THEN X4=X266 IF Y<Y3 THEN Y3=Y268 IF Y>Y4 THEN Y4=Y270 NEXT Y290 NEXT X
- This is where things start to get interesting; this nested loop does a lot of stuff.
- We are looping from (X1, Y1) to (X2, Y2), so this establishes the truth of our earlier assumption that these are the upper left and bottom right coordinates of the region of array A that could have living cells. However, note that the authors missed a trick; they set up (X1, Y1) correctly in the initialization phase, but they could have also set (X2, Y2) at that time as well.
- We start with a PRINT because all the PRINTs in the inner loop are on the same line; we need to force a newline.
- We update from the current state to the next state; as noted above, if current state is 2 then we were alive but we're going to be dead, so we set the state to 0. Similarly, if current state is 3 then we were dead but are coming alive, so the state is set to 1.
- It's not clear to me why the test on line 260 is a not-equal-one instead of an equal-zero. There are only four possible values; we've considered two of them. It's not wrong, it's just a little strange.
- In all cases where the cell is dead we GOTO 270 which is NEXT Y. Though some BASIC dialects did allow multiple NEXT statements for the same loop, it was considered bad form. The right thing to do was to GOTO the appropriate NEXT if you wanted to continue" the loop.
- Notice that there's a syntactic sugar here. IF A(X,Y)<>1 THEN 270 elides the GOTO".
- If we avoided skipping to the bottom of the loop then the cell is alive, so we tab out to the appropriate column and print it. Then we finally see the meaning of (X3, Y3) and (X4, Y4); as surmised, they are the upper left and bottom right corners of the possibly alive" sub-rectangle of the next generation but I guessed backwards which was which. (X1, Y1), (X2, Y2) are the sub-rectangle of the current generation.
- The line numbering pattern seems to have gone completely off the rails here and in the next section. This often indicates that the author got the code editor into a state where they had to introduce a bunch of code they did not initially plan for and did not want to renumber a bunch of already-written code that came later. The convention was to number lines on the tens, so that if you needed to come back and insert code you forgot, you had nine places in which to do it. Were I writing this code for publication, I would have taken the time to go back through it and renumber everything to a consistent pattern, but it really was a pain to do so with the editors of the era.
295 FOR X=X2+1 TO 24: PRINT: NEXT X299 X1=X3: X2=X4: Y1=Y3: Y2=Y4301 IF X1<3 THEN X1=3:I9=-1303 IF X2>22 THEN X2=22:I9=-1305 IF Y1<3 THEN Y1=3:I9=-1307 IF Y2>68 THEN Y2=68:I9=-1
- We've now processed the entire currently maybe alive" rectangle so we print out the remaining blank lines to fill up the screen.
- We copy (X3, Y3) and (X4, Y4), the next generation" sub-rectangle to (X1, Y1), (X2, Y2) and it becomes the current generation.
- Here we do something really quite clever that none of the implementations I looked at in my previous series handled. The authors of this algorithm have implemented a rectangle of death" as a border of the array; that is a pretty standard way of handling the boundary condition. But what I really like is: they detect when a living cell hits the boundary and set flag I9 to true to indicate that we are no longer playing by infinite-grid Life rules! This flag is never reset, so you always know when you are looking at the UI that this is possibly not the true evolution of your initial pattern.
309 P=0500 FOR X=X1-1 TO X2+1510 FOR Y=Y1-1 TO Y2+1520 C=0530 FOR I=X-1 TO X+1540 FOR J=Y-1 TO Y+1550 IF A(I,J)=1 OR A(I,J)=2 THEN C=C+1560 NEXT J570 NEXT I580 IF A(X,Y)=0 THEN 610590 IF C<3 OR C>4 THEN A(X,Y)=2: GOTO 600595 P=P+1600 GOTO 620610 IF C=3 THEN A(X,Y)=3:P=P+1620 NEXT Y630 NEXT X
- Finally, we've got to compute the next generation. Note that we had a corresponding sudden increase in line number to mark the occasion.
- We reset the population counter to zero and we loop over the currently-maybe-alive rectangle expanded by one cell in each direction, because the dead cells on the edge might become alive.
- Variable C before was the number of valid lines in B$, our string array. Now it is the number of living neighbours of cell (X, Y). Even when restricted to two-character variables, they are in fact plentiful and there is no need to confusingly reuse them.
- We count the number of living cells surrounding (X, Y) including (X, Y) itself, remembering that is alive, stays alive" is 1, and is alive, dies" is 2. Once we have the count then we have our standard rules of Life: if the cell is currently dead and the neighbour count is 3 then it becomes alive (3). If it is currently alive and the neighbour count including itself is not 3 or 4 then it becomes dead (2). Otherwise it stays as either 0 or 1.
- We have a GOTO-to-GOTO bug here. That GOTO 600 could be replaced with GOTO 620 and save a statement.
635 X1=X1-1:Y1=Y1-1:X2=X2+1:Y2=Y2+1640 GOTO 210650 END
- We did not track whether any cell on the border became alive, so the code makes the conservative assumption that the maybe-living-cells-in-here rectangle needs to be grown one cell on each side. Smart... or... is it?
- Take a look at the logic on line 635, and then compare it to the looping constructs on lines 500 and 510. We loop from x1-1 to x2+1; we nowhere else read or write x1 or x2, and as soon as the loop is done, we reassign x1 to x1-1 and x2 to x2+1. It would have made more sense to do the increments and decrements first, and then do the loops!
- Our program then goes into an infinite loop. This was common in BASIC programs of the time; when you wanted it to end, you just pressed RUN-STOP. I mean, what else are you going to do? Prompt the user? Poll the keyboard? Just run and the user will stop you when they want it stopped.
- Some dialects of BASIC required an END statement even if it was unreachable.
- Notice that there was never any clear the screen" control here. It just constantly prints out new screens and hopes for the best as the old screen scrolls off the top!
I hope you enjoyed that trip down memory lane as much as I did. I am extremely happy to have, you know, while loops, in the languages I use every day. But I do miss the simplicity of BASIC programming some days. It is also quite instructive to see how you can implement all this stuff out of IF X THEN GOTO when you need to - and when you're writing a compiler that takes a modern language to a bytecode or machine code, you really do need to.
Next time on FAIC: I'll write this algorithm in idiomatic modern C# and we'll see how it looks.