February 30th, 1712 and databases...

D. Hugh Redelmeier hugh-pmF8o41NoarQT0dZR+AlfA at public.gmane.org
Thu Dec 13 19:58:47 UTC 2012


| From: Colin McGregor <colin.mc151-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org>

| So, bottom line question I guess is are there database programs that
| are smart enough to know that February 30th can (in one VERY
| exceptionally unusual case) be a valid date?

That's just one of many odd cases in calendars.

So fundamentally, this comes down to how you think about dates/times.

In computers there are several kinds of time, based on properties that
you require of them (eg. monotonicity).  See time(7).  But let's just
pay attention to "wall clock" times and dates.

Traditionally, UNIX and Linux have represented time as a count of
seconds since the epoch (00:00:00 UTC on 1 January 1970).  I would
hope that serious database systems would do the same kind of thing.  Do 
they?

This representation has a bunch of strengths and weaknesses:

- it assumes Newtonian time

+ it is simple and linear and make sense to anyone who cannot
  simply ignore the complexities of normal time representations.
  It makes doing arithmetic with time values simple.

- yikes!  According to <https://en.wikipedia.org/wiki/Unix_time>
  I'm wrong: Unix assumes a day has a constant number of seconds.
  So leap seconds are handled badly.  If a second is added to a day
  that second will have the same representation as the second after
  it!  If a second is removed, there will be a hole in the
  representation.  So subtracting two time_t values will give
  the wrong number of seconds if the range includes leap seconds.

- it means that there must be a flexible way of representing at least
  the most important 10,000 rules about calendars.  These describe or
  modify how to convert between textual and time_t representations.
  Why flexible?  Because new, ever odder, rules are being invented and
  discovered all the time.  With luck, this can be hidden in a
  userland library.  See ctime(3) and friends.

- it doesn't have enough range (with signed 32-bit time_t, 1951 -
  2038).  With 64-bit time_t, the range seems OK: +/- 293 billion
  years.

- it isn't precise enough for some uses.  For example, make depends
  on time_t timestamps and could use more precision.

+ having timestamps NOT be relative to timezones is really helpful.
  This gets even clearer when dealing with daylight savings time.

So many oddities are or should be handled in the library that converts
a time_t to or from a human-familiar notation.

- many many calendars that are not Gregorian (Julian, Islamic,
  Japanese, Jewish, Mayan, French Revolutionary, ...)

- many surprising oddities, often due to transitions.

  - <https://en.wikipedia.org/wiki/February_30>  Collin's
    example is from the Swedish transitional calendar 1712.

  - there is no year 0: 1BC was followed immediately by 1AD.

  - read cal(1) about 1752.  That reflects England and
    its colonies.

Clearly these library routines need to know a cultural context to 
interpret or generate textual representations of time.
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists





More information about the Legacy mailing list