Common mistakes in date/time formatting and parsing
There are many, many questions on Stack Overflow about both parsing and formatting date/time values. (I use the term "date/time" to mean pretty much "any type of chronological information" - dates, times of day, instants in time etc.) Given how often the same kinds of mistakes are made, I thought it would be handy to have a blog post to refer to.
This post assumes you already know the basic operations of formatting and parsing, in terms of the appropriate types to use:
- Java pre-Java-8: SimpleDateFormat (or just DateFormat)
- Java in java.time: DateTimeFormatter
- Joda Time: DateTimeFormatter (created via DateTimeFormat)
- .NET: The Parse/ParseExact/TryParse/TryParseExact/ToString methods on the appropriate type (usually DateTime and DateTimeOffset)
- Noda Time: The *Pattern class corresponding to the type you're working with, e.g. InstantPattern, LocalDateTimePattern.
There are three broad classes of issue here - one of which is usually "just" a matter of carelessness, and another which still surprises me in terms of sheer wrongness.
Pattern capitalization issues
This is an insidious problem, because in some cases you may get the right values, but not all of the time. I suspect it usually comes up due to copy and paste, but often from specifications rather than other code - in a specification, it's pretty clear what "YYYY-MM-DD HH:MM:SS" means as a date/time format, but that doesn't mean it's the right pattern to put in code.
The main thing to do is read the documentation carefully. Of course, some platforms have clearer documentation than others, but most are at least "good enough". For the Java APIs, the pattern specifiers are generally documented with the formatting classes themselves; for .NET's built-in classes you want the custom date and time format strings and standard date and time format strings MSDN pages, and for Noda Time follow the various options from the text handling part of the user guide. (For other platforms, use your common sense. :)
The most common mistakes here are:
- Using mm for months or MM for minutes, rather than vice versa. I've seen this mistake both ways round.
- Using hh for "hour of day" when HH is intended. H is in the range 0-23; h is in the range 1-12. h is usually used singly (rather than requiring exactly two digits), and almost always in conjunction with an AM/PM specifier - as otherwise it's ambiguous. H is usually used as HH, so that 5am is represented as "05" for example.
- Using YYYY for year - in Java and Noda Time, Y is used for week-year rather than normal calendar year; it's usually used in conjunction with "week of year" and "day of week", but it's much less common than yyyy.
- Using DD for "day of month" when in Java it actually means "day of year".
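To make those mistakes concrete, here's a minimal java.time sketch. The class name PatternCaseDemo and the format helper are just names I've made up for illustration, and I've pinned the locale to Locale.US (an assumption on my part) so that the week-based year doesn't depend on the machine's default locale.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class PatternCaseDemo {
    // Formats a value with the given pattern. The locale is fixed so that
    // week-based fields (pattern letter Y) behave the same on every machine.
    static String format(LocalDateTime value, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(value);
    }

    public static void main(String[] args) {
        LocalDateTime afternoon = LocalDateTime.of(2015, 2, 5, 15, 7, 9);

        // The intended pattern.
        System.out.println(format(afternoon, "yyyy-MM-dd HH:mm:ss")); // 2015-02-05 15:07:09
        // mm and MM swapped: the minutes land in the month position and vice versa.
        System.out.println(format(afternoon, "yyyy-mm-dd HH:MM:ss")); // 2015-07-05 15:02:09
        // hh is a 12-hour clock value; without AM/PM, 3pm looks like 3am.
        System.out.println(format(afternoon, "yyyy-MM-dd hh:mm:ss")); // 2015-02-05 03:07:09
        // DD is day of year: 5 February is day 36.
        System.out.println(format(afternoon, "yyyy-MM-DD HH:mm:ss")); // 2015-02-36 15:07:09

        // YYYY is the week-based year, which differs from the calendar year
        // for a few days around the new year.
        LocalDateTime yearEnd = LocalDateTime.of(2014, 12, 30, 0, 0);
        System.out.println(format(yearEnd, "yyyy-MM-dd")); // 2014-12-30
        System.out.println(format(yearEnd, "YYYY-MM-dd")); // 2015-12-30
    }
}
```

Note how only the last two lines differ for the week-year mistake, and only for a date near the end of December: a test using a date in the middle of the year wouldn't spot it.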
I'm surprised by how often I see code like this:
    var text = "Tue, 5 May 2015 3:15pm";
    var dateTime = DateTime.ParseExact(
        text, "yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture);
Here the pattern and the actual data are entirely different, and I get the impression that the author has copied the pattern from another piece of code without any thought about what the magic string "yyyy-MM-dd'T'HH:mm:ss" is there for.
I suspect it goes without saying for most readers, but you should never copy code from elsewhere into your own code without understanding how it works, or which parts you may potentially need to modify.
The result of this sort of error is usually a complete failure to parse, which is at least simpler to find than the "plausible but not quite correct" pattern issue.
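For comparison, here's a rough Java equivalent of that situation using java.time. The class and member names are hypothetical, and the "matching" pattern is simply my own reading of the sample text, piece by piece; it's a sketch rather than the one true pattern for that data.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class PatternMismatchDemo {
    static final String TEXT = "Tue, 5 May 2015 3:15pm";

    // The copied-and-pasted pattern: fine for ISO-8601-like text, wrong for TEXT.
    static final DateTimeFormatter COPIED =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss", Locale.US);

    // A pattern written by reading TEXT from left to right. Case-insensitive
    // parsing is enabled because the text says "pm" rather than "PM".
    static final DateTimeFormatter MATCHING = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("EEE, d MMM yyyy h:mma")
        .toFormatter(Locale.US);

    // Returns true if TEXT can be parsed as a LocalDateTime with the formatter.
    static boolean canParse(DateTimeFormatter formatter) {
        try {
            LocalDateTime.parse(TEXT, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canParse(COPIED));   // false: the pattern describes different text
        System.out.println(LocalDateTime.parse(TEXT, MATCHING)); // 2015-05-05T15:15
    }
}
```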
Pattern incompatibility issues
Some developers assume that a pattern which works in Java will work in Python, or the equivalent for any other pair of platforms. Don't make this assumption. Always read the documentation - and if you're porting code from one platform to another, you'll need to "decode" the pattern with one set of documentation, then "encode" it with the other.
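You don't even need to change language to hit this. As a small illustration (the class and method names are mine, and this is just one example I happen to know of), the pattern letter u means different things in SimpleDateFormat and in java.time's DateTimeFormatter:

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PatternPortingDemo {
    // Noon UTC on 10 May 2015, which was a Sunday.
    static final Instant SUNDAY_NOON = Instant.parse("2015-05-10T12:00:00Z");

    // SimpleDateFormat: 'u' is the day number of the week (1 = Monday, 7 = Sunday).
    static String legacy(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(Date.from(SUNDAY_NOON));
    }

    // DateTimeFormatter: 'u' is the year.
    static String modern(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US)
            .withZone(ZoneOffset.UTC)
            .format(SUNDAY_NOON);
    }

    public static void main(String[] args) {
        System.out.println(legacy("u")); // 7
        System.out.println(modern("u")); // 2015
    }
}
```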
Time zone issues
Understanding time zones
There are two common mistakes in understanding what a time zone is to start with.
The first is to assume that a UTC offset (e.g. "+8 hours") is the same as a time zone. This is an understandable mistake, given that a lot of documentation (from organizations which really should know better) misuses the terminology. The UTC offset is the difference between UTC and local time at a particular instant - so for example, while I'm writing this, I'm in the UK time zone which is currently at UTC+1. However, in the winter (in the same time zone) it will be at UTC+0. So if you have a value of (say) "2015-05-10T16:43:00+0100" that only tells you the UTC offset - it doesn't tell you the time zone. There may well be multiple time zones with the same offset at that particular time, but which will have different offsets at different times.
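Here's a small java.time sketch of that last point. Europe/London is the zone from the example above; Africa/Lagos is my own choice of a second zone that shares the +01:00 offset in May but not in January, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class OffsetVersusZoneDemo {
    // Looks up the UTC offset that the given time zone uses at the given instant.
    static ZoneOffset offsetAt(String zoneId, String instantText) {
        return ZoneId.of(zoneId).getRules().getOffset(Instant.parse(instantText));
    }

    public static void main(String[] args) {
        String may = "2015-05-10T15:43:00Z";
        String january = "2015-01-10T15:43:00Z";

        // Same offset in May, so "+0100" alone cannot tell these zones apart...
        System.out.println(offsetAt("Europe/London", may));     // +01:00
        System.out.println(offsetAt("Africa/Lagos", may));      // +01:00

        // ...but the zones disagree in January.
        System.out.println(offsetAt("Europe/London", january)); // Z
        System.out.println(offsetAt("Africa/Lagos", january));  // +01:00
    }
}
```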
The second mistake is to think that an abbreviation such as "EST" or "GMT" identifies a time zone. It doesn't, in two ways:
- A single time zone often uses multiple abbreviations over time. For example, "Pacific Time" varies between PST (Pacific Standard Time) and PDT (Pacific Daylight Time). It's unfortunate that some people use the abbreviation for standard time even when they mean the general time zone - so even though currently (at the time of writing) Pacific Time is in PDT (UTC-7), some people would write the local time with "PST" at the end. Grr. Avoid abbreviations if you possibly can.
- The same abbreviation may be used in multiple time zones, or even at different points in time to mean different things within the same time zone. For example, "BST" can mean British Summer Time in Europe/London (standard time of UTC+0, plus 1 hour of daylight saving time), British Standard Time in Europe/London (standard time of UTC+1, with no daylight saving time, around 1970 only) and Bougainville Standard Time in Pacific/Bougainville (UTC+11). Avoid abbreviations if you possibly can.
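As a quick illustration of the first point, this sketch asks java.time for the short zone name of a single time zone at two different times of year. It assumes the zone ID America/Los_Angeles for "Pacific Time" and the Locale.US zone names; the class and method names are made up for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class AbbreviationDemo {
    // Returns the short zone name in effect for the given local date/time
    // in the given time zone, using US English names.
    static String abbreviation(String zoneId, LocalDateTime local) {
        return DateTimeFormatter.ofPattern("zzz", Locale.US)
            .format(local.atZone(ZoneId.of(zoneId)));
    }

    public static void main(String[] args) {
        LocalDateTime winter = LocalDateTime.of(2015, 1, 10, 12, 0);
        LocalDateTime summer = LocalDateTime.of(2015, 7, 10, 12, 0);

        // One time zone, two abbreviations.
        System.out.println(abbreviation("America/Los_Angeles", winter)); // PST
        System.out.println(abbreviation("America/Los_Angeles", summer)); // PDT
    }
}
```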
First, you need to understand exactly what the library you're using does with time zones, and what the types you're using represent. One of the most common misconceptions here is with java.util.Date - this is just an instant in time, with no concept of a time zone or calendar system. The fact that the string returned from Date.toString always uses the system default time zone is unfortunately misleading in this respect, and causes developers to ask how to "convert" a Date from one time zone to another.
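A minimal sketch of that point about java.util.Date, with hypothetical class and method names: the Date below is a single instant, and the only thing that changes between the two lines of output is the time zone set on the formatter.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateIsAnInstantDemo {
    // Renders the same Date as local text in the requested time zone.
    // The Date itself is never "converted"; only the formatter changes.
    static String render(Date date, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // the Unix epoch: a number of milliseconds, nothing more

        System.out.println(render(epoch, "UTC"));        // 1970-01-01 00:00
        System.out.println(render(epoch, "Asia/Tokyo")); // 1970-01-01 09:00
        System.out.println(epoch.getTime());             // 0 in both cases
    }
}
```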
Next, you need to understand exactly what your data represents. In my experience, most textual data either specifies a date and/or time without a given time zone or it specifies a date and time with a UTC offset. When no time zone information is present, you may know the time zone it's meant to refer to, or you may not. If you're using a library which has multiple different types to represent different kinds of information (e.g. Joda Time, java.time or Noda Time) I personally find it clearest to parse to a type that most closely represents the information actually stated in the text, and then convert it to something else where appropriate.
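In java.time, that approach might look something like the sketch below. The class and method names are hypothetical, the sample text uses the ISO "+01:00" form of the offset so that the default parsers accept it, and Europe/London is assumed to be the zone that you happen to know out-of-band for the text without an offset.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

class ParseToClosestTypeDemo {
    // Text with no offset or zone: parse to LocalDateTime, which claims nothing more.
    static LocalDateTime parseLocal(String text) {
        return LocalDateTime.parse(text);
    }

    // Text with a UTC offset: parse to OffsetDateTime, which keeps the offset.
    static OffsetDateTime parseWithOffset(String text) {
        return OffsetDateTime.parse(text);
    }

    // A separate, explicit step: apply a time zone you know from elsewhere.
    static Instant inZone(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime local = parseLocal("2015-05-10T16:43:00");
        System.out.println(inZone(local, ZoneId.of("Europe/London"))); // 2015-05-10T15:43:00Z

        OffsetDateTime withOffset = parseWithOffset("2015-05-10T16:43:00+01:00");
        System.out.println(withOffset.toInstant());                    // 2015-05-10T15:43:00Z
    }
}
```

The conversion to an instant is visible in the code rather than hidden inside the parsing step, which makes it much easier to see where a time zone assumption is being made.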
You definitely need to be aware when the parsing operation is going to impose any sort of time zone understanding on your data. This is the case with SimpleDateFormat in Java and with DateTime.ParseExact and friends in .NET. For SimpleDateFormat, unless you explicitly set a time zone (or the pattern includes a UTC offset), the system default time zone is used - this is usually not what you want. Parsing in .NET allows you to specify how you want the text to be understood, but you need to be careful. (The fact that DateTime sometimes represents a value in the system default time zone, sometimes a value in UTC, and sometimes a value with no associated time zone makes this all tricky.)
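For SimpleDateFormat, being explicit might look like this sketch (hypothetical names again). The same text yields different instants depending on which time zone the formatter is told to use, which is exactly why leaving it to the system default is risky.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;
import java.util.TimeZone;

class ExplicitZoneParsingDemo {
    // Parses text that carries no offset, interpreting it in the given time zone
    // instead of whatever the system default happens to be.
    static Instant parse(String text, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return format.parse(text).toInstant();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Text does not match the pattern: " + text, e);
        }
    }

    public static void main(String[] args) {
        String text = "2015-05-10 16:43:00";
        System.out.println(parse(text, "UTC"));           // 2015-05-10T16:43:00Z
        System.out.println(parse(text, "Europe/London")); // 2015-05-10T15:43:00Z
    }
}
```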
Locale / culture issues
Most libraries allow you to specify which culture to use when parsing (or formatting) data. This is a two-edged sword:
- If you're formatting a value to be displayed directly to an end user, that's great: they can see the month name in their own language, etc. In this situation, you'll typically use a "standard" format (e.g. "the short date/time format").
- If you're formatting or parsing a value which is designed to be machine-readable (e.g. passed to a web service) then you almost certainly want the invariant culture instead of a user-specific culture. In this situation, you'll typically use a "custom" format (e.g. "yyyy-MM-dd'T'HH:mm:ss") or a specific culture-invariant format.
Culture can affect several aspects of handling conversions:
- The calendar system used (e.g. the Gregorian calendar vs an Islamic calendar)
- The "standard" formats used (e.g. month/day/year vs day/month/year)
- The separators used (e.g. - vs / for date separators)
- The month and day names used
- The number system used
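As a small java.time sketch of the month-name aspect (the class and method names are hypothetical): the human-facing method takes the locale as an explicit argument, while the machine-facing one uses the predefined ISO formatter. I've used a custom pattern for the human-facing text only to keep the output predictable; in real code a localized "standard" format would usually be a better fit.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class CultureDemo {
    // Text for people: month names come from the locale that is passed in.
    static String forHumans(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofPattern("d MMMM yyyy", locale).format(date);
    }

    // Text for machines: a fixed ISO-8601 representation of the date.
    static String forMachines(LocalDate date) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2015, 5, 10);
        System.out.println(forHumans(date, Locale.US));     // 10 May 2015
        System.out.println(forHumans(date, Locale.FRANCE)); // 10 mai 2015
        System.out.println(forMachines(date));              // 2015-05-10
    }
}
```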
As a final common problem, you may be performing more conversions than you should be. For example, if you've got a DateTime field in the database but you're passing a value as a string in your SQL parameter (you are using parameterized SQL, right?) then you probably shouldn't be. Most platforms allow parameters to be specified as the value in a "native" representation. Likewise when you fetch a value, don't just call toString on it and then parse the result - if the value is a date/time value, it should already be in a native representation; a simple cast (or call to the type-specific method) should be enough.
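With JDBC, for example, avoiding the text round trip might look roughly like this. The table and column names are invented for illustration, the column is assumed to be non-null, and exactly how a Timestamp maps onto your database's date/time type depends on the driver, so treat it as a sketch rather than a recipe.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

class NativeParameterDemo {
    // Hypothetical table and column names, purely for illustration.
    static final String INSERT_SQL = "INSERT INTO events (occurred_at) VALUES (?)";
    static final String SELECT_SQL = "SELECT occurred_at FROM events";

    // Passes the value as a date/time parameter: no formatting, no pattern.
    static void insert(Connection connection, Instant occurredAt) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setTimestamp(1, Timestamp.from(occurredAt));
            statement.executeUpdate();
        }
    }

    // Reads the value with the type-specific getter: no toString, no parsing.
    static Instant readFirst(Connection connection) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_SQL);
             ResultSet results = statement.executeQuery()) {
            if (!results.next()) {
                throw new SQLException("No rows in events");
            }
            return results.getTimestamp(1).toInstant();
        }
    }
}
```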
Conclusion
Date/time text handling is fraught with problems, as a simple look at Stack Overflow shows. Be careful, make sure you know exactly what you're converting from and to, and check exactly what you're specifying vs what you're leaving implicit.