The is a guest blog written for the Computer Weekly Developer Network by Michael Vessey, senior consultant with software testing provider SQS.
Some of the most common defects in software systems are in areas relating to dates and times. Often requirements and specifications can be misinterpreted or misunderstood. A developer might use standard functions (such as datediff) to make life easier, however these functions may not meet expected (and not explicitly described) requirements.
Common examples of poorly defined requirements include: "Show me on screen how many days there are until my birthday." Does the specification tell us if the number to be displayed is inclusive of both dates? If my Birthday is 29th February and it's not a leap year then what should be displayed?
What kind of year is that?
These are all common problems that are often overlooked until a very thorough test analyst starts looking at fringe test cases. With some utterly bizarre logic and differences between the Gregorian, Chinese, Julian and Hindu calendars (to name a few), defects often occur in scenarios a developer may not be aware of.
For example as children we are taught that if a year is divisible by four then it is a leap year - but the actual logic required for determining leap years is:
if year modulo 400 is 0
else if year modulo 100 is 0
else if year modulo 4 is 0
Operating system and server-side date issues
So we know that the logic of dealing with dates can be very irregular and there are traps for unwary developers; but what about hardware date handling and server date/time storage? One of the first real examples of this was the Y2K bug. There are many more instances of this type of defect cropping up, for example Microsoft's SQL server still contains settings that dictate which century a two-digit year belongs to (see below).
Other databases and applications suffer from similar issues - the important thing for a developer to realise is that you cannot rely on the server configuration; rather that you must code defensively when dealing with date/time storage and be as explicit as possible.
A brief look at currently known issues could make a lot of people pause for thought and look a little closer at their own date/time storage and look for potential trip-ups.
• GPS Date rollover (21st August 1999)
• Code Value Terminator (9/9/99 is used as both an "unknown" date or a terminator to indicate no more data in a file)
• Taiwan (minguo calendar determines 2011 as the first three-digit year)
• 2038 (Unix 32-bit unsigned integer storage limits dates to 19th January 2038)
• 2042 (IBM Mainframe clocks stop at 17th September 2042)
• 2107 (FAT file systems will have date overflow issues at the end of 2107)
• Year 10,000 (The first five-digit year)
It's unlikely that you will encounter any of these on a day-to-day basis, but emerging complexities and features in multinational software systems are leading to more common and realistic scenarios where date/time storage is an issue.
Null is not nothing
The most prevalent of these is the use of NULL values. Modern ".Net" developers often overlook blank data in their fields and as a result the value "1/1/1900" is used to replace empty date values. This can have catastrophic results when report writers begin to create reports based on date ranges.
For example the following extremely simple query to show all employees that left before 1st October 2010 has two very different results depending upon whether dates are stored as NULL or 1/1/1900
select * from employee where leavingdate<'01 oct 2010'
Since NULL values cannot be evaluated our employees who have not left do not show up in the results. However, if NULL dates are being stored as 1/1/1900 then we suddenly have a lot more people appearing in our query (namely the ones that have not left the company yet).
The source of time-trouble
To explain why we have so many issues in managing and storing dates and times, we have to look at how they are stored and why.
A date (or time) is most useful when it can be ordered in some manner - time is relatively linear (unless you are near a black hole or are approaching the speed of light) and reporting of time based events nearly always requires data shown in date order. Dates must be stored as a baseline date and a number of increments after that baseline.
Similarly Microsoft SQL server has a baseline date of 1/1/1900 and can represent (using both positive and negative values) values from January 1, 1753, through December 31, 9999 to the nearest millisecond
Satellite navigation systems use the GPS epoch (6th January 1980) is as the baseline date and the increment is expressed in a thirteen-bit field as the number of weeks since the epoch and the weekday. So, modern sat-nav systems won't work 8192 weeks after 1980 (sometime near the year 2137).
For those of you asking "surely there must be a method of storing dates in strings that is much safer?" well no, you only have to consider that "01 Apr 2011" in French would be "01 Avr 2011" and that American dates are expressed in reverse to British dates. When coupled with leading-zero problems and the fact that string storage is considerably more costly than integer-based storage, you can see why the current systems are in place.
Tips for developing with dates
The lessons to learn from this, when dealing with Dates
• Establish and test to your boundaries in the past and the future
• Test NULL values
• Test in more than one language
• Test ordering of data between the 9th of the month and the 10th of the month (one-digit to two-digit date)
• Ensure that localised dates are handled (13th day of the month will usually trigger this)
Most importantly of all - try not to re-invent the wheel, there are enough flawed date storage systems without adding another one.