Article 48FWX Set the Flux Capacitor for 12/30/1899

Set the Flux Capacitor for 12/30/1899

by
Remy Porter
from The Daily WTF on (#48FWX)

I mentally think of the early 2000s as "the XML Age". XML was everywhere. Everyone wanted to put XML into their applications. Java was the XML king, using XML for everything. People were starting to ask themselves, "do we need the big expensive RDBMS, when we can just use XML instead?"

In other words, just like JSON today, but worse. Tomislav A inherited one such database- a clunky pile of XML documents with only the vaguest hint of a schema to them, and a mix of valuable, useful data, and a mountain of crap. With the assumption that all data validation was happening in the application layer, the database was mostly "garbage in, garbage sits there forever and can never be deleted because it might be important".

Tomislav's job was to filter out the garbage, find the actually useful information, and move it into a real datastore. There were loads of problems and inconsistencies in the data, but with a little Python in his toolkit, Tomislav could mostly handle those.

For example, dates. Dates were stored in the XML as just text strings, because why would you use a structured document format to represent structured data? Since they were just strings, and since there was no data validation/sanitization in the application layer, there was no system to the strings. "31-DEC-2002" sat next to "12/31/2002" and "31/12/2002". With some help from dateutil.parser, Tomislav could handle most of those glitches without difficulty.

Until, when importing a date, he got the error: ValueError: year 39002 is out of range.

One of the dates in the document was just that- 39002. Knowing that there was no validation, Tomislav was ready to dismiss it as just corrupt data, but not before taking a stab at it. Was it the number of days, or maybe weeks since the Unix Epoch?

>>> datetime(2019, 1, 1) - datetime(1970, 1, 1)datetime.timedelta(days=17897)

Nope, not even close. The rest of the record didn't look like garbage, though. And 39002 seemed so oddly specific. Tomislav had one advantage, though: one of the original developers was still with the company.

Heidi was still writing code, mostly keeping ancient MFC-based C++ applications from falling over. Because she was elbow deep in the Windows stack, it didn't take her more than a second of looking at the "date" to figure out what was going on.

"39002? That's the number of days since December 30th, 1899."

"What? Why?"

Heidi laughed. "Let me tell you a story."

Rewind time far enough, and the most important piece of software on the planet, at least from the end users perspective, was Lotus 1-2-3, which had the killer feature: it was Excel before Excel existed. All users ever want is Excel.

Spreadsheets are large, complex systems, and in the early days of Lotus 1-2-3, they were running on computers that had limitations on doing large, complex things. Compact representations which could be manipulated quickly were important. So, if you wanted to represent a date, for example, you wanted a single number. If you limit yourself to dates (and not times), then counting the number of days since an arbitrary threshold date seems reasonable. If you want times, you can make it a floating point, and use fractional days.

If you wanted to set a landmark, you could pick any date, but a nice round number seems reasonable. Let's say, for example, January 1st, 1900. From there, it's easy to just add and subtract numbers of days to produce new dates. Oh, but you do have to think about leap years. Leap years are more complicated- a year is a leap year if it's divisible by four, but not if it's divisible by 100, unless it's also divisible by 400. That's a lot of math to do if you're trying to fit a thousand rows in a spreadsheet on a computer with less horsepower than your average 2019 thermostat.

So you cheat. Checking if a number is divisible by four doesn't require a modulus operation- you can check that with a bitmask, which is super fast. Unfortunately, it means your code is wrong, because you think 1900 is a leap year. Now all your dates after February 28th are off-by-one. Then again, you're the one counting. Speaking of being the one counting, while arrays might start at zero, normal humans start counting at one, so January 1st should be 1, which makes December 31st, 1899 your "zero" date.

You've got a spreadsheet app with working dates that can handle thousands of cells in under 640k of RAM. It's time to ship it. The problem is solved.

Until Microsoft comes back into the picture, anyway. As previously stated, what users really want is Excel, but Excel doesn't exist yet. So you need to make it. Lotus 1-2-3 owns the spreadsheet space, so Microsoft is actually the plucky upstart here, and like all plucky upstarts, they have to start out by being compatible with the established product. Excel needs to be able to handle Lotus files. And that means it needs to handle Lotus dates. So it replicates the Lotus behavior, down to thinking 1900 is a leap year.

Lotus, and Excel, both think that 60 days after December 31st, 1899 is February 29th.

Time marches on. Excel needs to have macros, and the thought is to bolt on some of the newfangled Object-Oriented Basic folks have been experimenting with. This is a full-fledged programming language, so there's already an assumption that it should be able to handle dates correctly, and that means counting leap years properly. So this dialect of Basic doesn't think 1900 is a leap year.

This macro language thinks that 60 days after December 31st, 1899 is March 1st.

No problem. Move your start date back one more, to December 30th, 1899. Now Excel, Lotus, and our macro language all agree that March 1st, 1900, is day number 61. Our macro language is off-by-one for the first few months of 1900, but that discrepancy is acceptable, and no one at Microsoft, including Bill Gates who signed off on it, cares.

The Basic-derived macro language is successful enough inside of Excel that it grows up to be Visual Basic. It is "the" Microsoft language, and when they start extending it with features like COM for handling library linking and cross-process communication, it lays the model. Which means when they're figuring out how to do dates in COM" they use the Visual Basic date model. And COM was the whole banana, as far as Windows was concerned- everything on Windows touched COM or its successors in some fashion. It wasn't until .NET that the rule of December 30th, 1899 was finally broken, but it still crops up in Office products and SQL Server from time to time.

Tomislav writes:

I don't use any Microsoft technologies, and never did much. I always believed that if there was one common thing across all systems and technologies, it was 1970-01-01, the Epoch, the magic date!

Armed with a better picture of how weird dates could actually be, Tomislav was able to get back to work on trying to make sense of that XML database. Heidi still has her hands deep in the guts of ancient C++ applications that barely compile on modern Windows environments.

For me, the key take away is how small choices made ages ago, decisions which made sense at the time, keep cropping up. It's not just dates, although the Y2K bug and the 2038 problem both highlight how often this happens with dates. You never know when today's quick little cheat or hack becomes tomorrow's standard.

For more information on this topic:

proget-icon.png [Advertisement] Ensure your software is built only once and then deployed consistently across environments, by packaging your applications and components. Learn how today! TheDailyWtf?d=yIl2AUoC8zAy06opJEWxUI
External Content
Source RSS or Atom Feed
Feed Location http://syndication.thedailywtf.com/TheDailyWtf
Feed Title The Daily WTF
Feed Link http://thedailywtf.com/
Reply 0 comments