Article 5Q4B7 And FORTRAN, FORTRAN So Far Away


by
Remy Porter
from The Daily WTF on (#5Q4B7)

A surprising amount of the world runs on FORTRAN. That's not to say that huge quantities of new FORTRAN are getting written, though it's far from a dead language, but that there are vital libraries written fifty years ago that are still used to this day.

But the world in which that FORTRAN was written and the world in which we live today are wildly different. Which brings us to the story of George and Ike.

In the late 1960s, the company that Ike worked for got a brand-spanking new CDC 6600 mainframe. At the time, it was the fastest computer you could purchase, with a blistering 3 MFLOPS of performance: 3 million floating-point operations per second. The company wanted to hand it off to their developers to do all sorts of fancy numerical simulations with FORTRAN, but there was just one problem: they wanted to write a lot of new programs, and the vendor-supplied compiler took a sadly long time to do its work. Since CPU time was billed internally at $0.10 per second, teams were finding it quite expensive to do their work.

[Image: a CDC 6600. Photo by Jitze Couperus]

Enter Ike. Ike was a genius. Ike saw this problem, and then saw a solution. That solution was 700,000 lines of CDC 6600 assembly language implementing his own custom FORTRAN compiler. It used half as much memory, ran many times faster, and generated amazingly user-friendly error messages. Ike's compiler became the company's internal standard.

Time passed, and Ike moved on to different things at the company. A new team, including George, was brought in and given a "simple" task: bring this internal compiler up to the coding standards of the late 1970s.

A lot had changed in the decade or so since Ike had released his compiler. First was the rather shocking innovation of terminals that could display lower-case characters. The CDC 6600's original character width was 6 bits, but lower-case codes were sneaked in as 12-bit characters: a 6-bit escape code followed by a second 6-bit code. In mainstream FORTRAN releases, the arrival of lower-case characters was marked with a name change: after FORTRAN 77, all future versions of the language would simply go by "Fortran".

George dug through the character handling in the assembly code and found a recurring line: BCDBIT EQU 6. It was part of every code segment that handled text, which made it a handy flag for George and his team: every place they needed to change had it. The change wasn't simple, though. Each site needed some munging: changing shift counts and character masks, and adding logic for escape codes. In principle, this was absolutely an achievable task for any given BCDBIT EQU 6 line.
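To make that munging concrete, here is a minimal Python sketch of what each BCDBIT site had to grow into, under loudly stated assumptions. It relies on one real fact, that a CDC 6600 word is 60 bits and holds ten 6-bit characters; everything else is illustrative. The escape values in ESCAPE_CODES, the zero padding, and the names pack_word, unpack_word, and fold_escapes are assumptions, not Ike's code or any CDC routine.

```python
# Hypothetical sketch: unpack 6-bit character codes from 60-bit words,
# then fold escape-prefixed pairs into single 12-bit codes.
# ESCAPE_CODES and the zero padding are ASSUMPTIONS for illustration.

CHAR_BITS = 6                      # plays the role of BCDBIT EQU 6
CHAR_MASK = (1 << CHAR_BITS) - 1   # 0o77: keeps the low 6 bits
WORD_BITS = 60                     # CDC 6600 word size
CHARS_PER_WORD = WORD_BITS // CHAR_BITS  # ten characters per word
ESCAPE_CODES = {0o74, 0o76}        # assumed escape prefixes


def pack_word(codes: list[int]) -> int:
    """Pack up to ten 6-bit codes into one 60-bit word, padding with zeros."""
    if len(codes) > CHARS_PER_WORD:
        raise ValueError("a 60-bit word holds at most ten 6-bit codes")
    padded = codes + [0] * (CHARS_PER_WORD - len(codes))
    word = 0
    for code in padded:
        word = (word << CHAR_BITS) | (code & CHAR_MASK)
    return word


def unpack_word(word: int) -> list[int]:
    """Split one 60-bit word into ten 6-bit codes, leftmost first."""
    codes = []
    for i in range(CHARS_PER_WORD):
        shift = WORD_BITS - CHAR_BITS * (i + 1)  # shift count per position
        codes.append((word >> shift) & CHAR_MASK)
    return codes


def fold_escapes(codes: list[int]) -> list[int]:
    """Combine each escape code with the code after it into one 12-bit value.

    A trailing escape with nothing after it is kept unchanged.
    """
    out = []
    i = 0
    while i < len(codes):
        code = codes[i]
        if code in ESCAPE_CODES and i + 1 < len(codes):
            out.append((code << CHAR_BITS) | codes[i + 1])
            i += 2
        else:
            out.append(code)
            i += 1
    return out


if __name__ == "__main__":
    word = pack_word([0o01, 0o02, 0o76, 0o01])
    print([oct(c) for c in fold_escapes(unpack_word(word))])
```

The shift count and mask both derive from CHAR_BITS, which is presumably the kind of value BCDBIT EQU 6 stood for; fold_escapes is the genuinely new logic, and it has to run over the whole stream of codes, since an escape and its partner can land in different words.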

In practice, there were 122,000 occurrences of that line in the code. The team would be hard-pressed to do even 1% of that in the time they had been allotted, and so 1% is what they committed to. About 1,200 instances in the assembly would be updated to allow escape characters, covering most of the cases where users wanted to handle wide characters. That left a lot of code paths where bad results might happen, but those could be handled with the old "caveat programmator".

There were other, similar issues with handling text. For example, any code that read data from a file or input was capped at reading 73 or 80 characters at a time, the screen width of the terminals when Ike had been designing the code. That was an easy fix, but it introduced George to another... quirk of Ike's design.

You see, it wasn't enough for Ike to write his own compiler, because the compiler wasn't just a compiler: it was also a linker. An extremely bad and fragile linker, but it would combine your program's compiled binary with its dependencies in a way that mostly worked. Nor was it just a linker: Ike's compiler/linker was also a runtime, which let you run your code in a self-contained environment.

Ostensibly, this was meant for testing. For example, the runtime wouldn't let your program access the disks, but it would pretend to. Lines like DSKDELAY MSECDELAY 33 were scattered through the runtime portion of the code to simulate the delay you could expect from accessing disks. In full, such a block often looked like:

DSKDELAY MSECDELAY 33
         SA1       DISKMSEC
         SX6       X1+33
         SA6       A1

This code increments a running count. At the end of the run, the runtime would output an estimate of how much time you spent doing disk I/O. There was just one problem: the estimate was absolutely useless. The hardware configuration had an enormous impact, depending on whether your I/O passed through the $3M memory cache or had to go out to one of the old-style barrel-sized hard drives. A disk I/O operation could take nanoseconds or entire seconds. Ike's "helpful" estimates weren't.
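The runtime's bookkeeping can be modeled in a few lines of Python. This is a hypothetical sketch, not Ike's runtime: SimulatedDisk and DISK_DELAY_MSEC are invented names, and the total is assumed to be in milliseconds based on the DISKMSEC and MSECDELAY names. Every pretend access adds the same fixed 33 to a counter, mirroring how the SA1/SX6/SA6 sequence loads DISKMSEC, adds 33, and stores the result back.

```python
# Hypothetical model of the fake-disk runtime. Names are invented;
# the unit (milliseconds) is an ASSUMPTION taken from DISKMSEC/MSECDELAY.

DISK_DELAY_MSEC = 33  # fixed charge per access, as in MSECDELAY 33


class SimulatedDisk:
    """Pretends to do disk I/O and only tallies an estimated delay."""

    def __init__(self) -> None:
        self.disk_msec = 0                 # stands in for DISKMSEC
        self.blocks: dict[int, bytes] = {}  # in-memory "disk"

    def _charge(self) -> None:
        # Load the total, add 33, store it back (SA1 / SX6 / SA6).
        self.disk_msec += DISK_DELAY_MSEC

    def write(self, block: int, data: bytes) -> None:
        self._charge()
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        self._charge()
        return self.blocks.get(block, b"")  # unwritten blocks read as empty

    def report(self) -> str:
        return f"ESTIMATED DISK TIME: {self.disk_msec} MS"


if __name__ == "__main__":
    disk = SimulatedDisk()
    disk.write(0, b"DATA")
    disk.read(0)
    disk.read(7)
    print(disk.report())
```

Because the charge is a constant, the final report depends only on how many accesses were made, never on which device would really have served them, which is exactly why the estimate said nothing useful about real hardware.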

So Ike's genius may sometimes have been a little misguided. There was one other quirk he had left for George to discover. This compiler was 700,000 lines of code, and assembling that code into an executable took time: 330 seconds, specifically. At ten cents per second, that's $33 every time the compiler was rebuilt. This was the late 1970s; that was more than George's daily salary. Ike was under pressure to find ways to speed up assembly of this code, and, as established, Ike was a genius.

The assembler did provide an IF pseudo-op for conditional assembly, much as the C preprocessor allows conditional compilation. But IFs were expensive at assembly time: had Ike used them, assembly would have taken even longer than 330 seconds. So Ike found a trick.

BCDBIT   EQU       6               YEAH, RIGHT
READCARD READS     INFILE,LINE,73  FOR EVER AND EVER, AMEN
         SA1       OPT  A1+B1      SAVED A THOUSANDTH OF A BUDGET PENNY
DSKDELAY MSECDELAY 33              BECAUSE WE ARE AN OS, ALSO, TOO

As you might guess from looking at this code, it's laid out in columns, and the rightmost column is clearly comments. In fact, this assembly dialect reserves three columns for code: a label, an operation, and its operand. After the third run of spaces it encounters, it treats everything from that point forward as a comment.

Knowing that, you should probably find this line a bit more suspicious:

         SA1       OPT  A1+B1      SAVED A THOUSANDTH OF A BUDGET PENNY

Normally, A1+B1 would be a comment, since it comes after the third run of spaces, except that the OPT symbol gets expanded at assembly time. If OPT has a value, that value becomes the operand and A1+B1 is ignored as a comment. But if OPT is blank, the spaces around it merge into a single run, and A1+B1 becomes the operand.

In George's words:

That is, if OPT evaluated to anything, then OPT was the operand, otherwise if OPT was all blanks, then A1+B1 was the operand. This could be extended as many times across as you'd like, leading to eye-watering code, nearly impossible to understand. But it did greatly speed up assembly time, so a big win.
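The OPT trick can be modeled in Python under simplifying assumptions: a line is split at runs of spaces into at most four fields (label, operation, operand, comment), and symbol expansion is treated as plain text substitution that happens before the split. The expand and split_fields functions and the symbols table are hypothetical; the real assembler's substitution rules may differ in detail.

```python
import re

# Hypothetical model of the column rules. ASSUMPTIONS: fields are separated
# by runs of spaces, everything after the third run is comment, and symbols
# are substituted as plain text before the line is split.

FIELD_SEPARATOR = re.compile(r" +")  # one or more spaces


def expand(line: str, symbols: dict[str, str]) -> str:
    """Substitute each symbol's text into the line, whole words only."""
    for name, text in symbols.items():
        # A function replacement keeps backslashes in `text` literal.
        line = re.sub(rf"\b{re.escape(name)}\b", lambda _m: text, line)
    return line


def split_fields(line: str) -> tuple[str, str, str, str]:
    """Split a line into (label, operation, operand, comment).

    At most three splits are made, so anything after the third run of
    spaces stays together as the comment. Missing fields come back empty.
    """
    parts = FIELD_SEPARATOR.split(line, maxsplit=3)
    parts += [""] * (4 - len(parts))
    label, operation, operand, comment = parts
    return label, operation, operand, comment


if __name__ == "__main__":
    line = " SA1 OPT A1+B1 SAVED A THOUSANDTH OF A BUDGET PENNY"
    print(split_fields(expand(line, {"OPT": "B2"})))
    print(split_fields(expand(line, {"OPT": "   "})))
```

With OPT set to B2, the operand is B2 and A1+B1 falls into the comment; with OPT set to blanks, the spaces merge and A1+B1 is promoted to the operand, all without the assembler evaluating a single IF.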

Ike still worked at the company, so George was able to go over to his new office and ask questions. Unfortunately, that turned out to be worse than useless. Ike always had an answer ready for George, but that answer was always wrong. Whether Ike didn't understand his old code, had simply forgotten how it worked, or was just having a laugh at George's expense was a question George could never answer.

By the end of the project, though, there were some questions George could answer. As George explains:

While rather frustrating, I did eventually slide the compiler into the late 1970's, and we got a good ten years more of use out of it. So, a success story?

In the end, there's no WTF here, just a story about working within constraints we don't think about often.
