I'm working on a project where I have to parse Excel files for a client to extract data. Something odd is popping up: when I parse a date formatted as 5/9 (May 9th) in the Excel sheet, I get 39577 in my program. I'm not sure whether the year is encoded in that number (it is 2008 for these sheets).

Are these dates the number of days since some sort of epoch?

Does anyone know how to convert these numbers to something meaningful? I'm not looking for a solution that would convert them properly at the time of parsing from the Excel file (we already have thousands of extracted files that required a human to select the relevant information, so redoing the extraction is not an option).


DateTime.FromOADate (if you're using .NET) is the method you want. Excel dates are stored as doubles. If you have dates in the first two months of 1900, you might get bitten by the Excel 1900 leap year bug: Excel treats 1900 as a leap year, so serial numbers before 1 March 1900 come out one day off.

From http://msdn.microsoft.com/en-us/library/system.datetime.fromoadate.aspx:

Double-precision floating-point number that represents a date as the number of days before or after the base date, midnight, 30 December 1899. The sign and integral part of d encode the date as a positive or negative day displacement from 30 December 1899, and the absolute value of the fractional part of d encodes the time of day as a fraction of a day displacement from midnight. d must be a value between negative 657435.0 through positive 2958466.0.
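
If you're not on .NET, the same arithmetic is easy to reproduce. Here is a minimal Python sketch, assuming the value is a non-negative serial from Excel's default 1900 date system and falls after February 1900. The name from_oa_date is made up for illustration and is not a library function.

```python
from datetime import datetime, timedelta

# Base date used by OLE Automation dates (and by DateTime.FromOADate).
OA_EPOCH = datetime(1899, 12, 30)

def from_oa_date(serial):
    """Convert a non-negative OLE Automation / Excel serial to a datetime.

    The integral part is whole days since the base date; the fractional
    part is the time of day as a fraction of a day.
    """
    if serial < 0:
        # Negative values use a sign/fraction convention not covered here.
        raise ValueError("negative serials are not handled in this sketch")
    return OA_EPOCH + timedelta(days=serial)

print(from_oa_date(39577).date())  # 2008-05-09
```

So 39577 does encode the year: it comes out as May 9th, 2008.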

Excel stores dates as the number of days since 0-JAN-1900 (so 1-JAN-1900 would have a value of "1"). You can find a really good breakdown of how Excel handles dates and times here:

Dates And Times In Excel
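
The day-count rule above can be sketched in Python. This is only a sketch, assuming whole-number serials from the 1900 date system (not the 1904 system some workbooks use). It also accounts for the fact that Excel counts a 29-FEB-1900 that never existed, as serial 60. The function name excel_serial_to_date is hypothetical.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial):
    """Convert a whole-number Excel serial (1900 date system) to a date."""
    if serial < 1:
        raise ValueError("serials below 1 are not dates in the 1900 system")
    if serial == 60:
        raise ValueError("serial 60 is Excel's nonexistent 29-FEB-1900")
    if serial > 60:
        serial -= 1  # skip the fictitious leap day Excel counts in 1900
    # Day 0 is "0-JAN-1900", i.e. 31-DEC-1899, so serial 1 is 1-JAN-1900.
    return date(1899, 12, 31) + timedelta(days=serial)

print(excel_serial_to_date(1))      # 1900-01-01
print(excel_serial_to_date(39577))  # 2008-05-09
```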

When dates appear on screen in Excel as "5/9", "May 9th", or some such, it is a trick of the formatting and is not representative of the underlying data value. It sounds like your parsing program is pulling the underlying value, not the formatted date. In order to suggest a fix, though, I need to know what your parsing program is (Excel macro, formula, outside code, etc.).

