Lubridate: BC dates

Created on 27 May 2009  ·  7 comments  ·  Source: tidyverse/lubridate

Reminder to self: consider BC dates and times (Before Christ / Before Common Era)

  • Garrett

All 7 comments

Hi. I know this is ancient, but is anyone thinking about this BCE / B.C. issue, or does lubridate now handle it natively?

What's the problem, more concretely? What usage pattern do people have in mind?

I am closing this. If someone insists that this should be done and is useful, please reopen.

Well, not that I use this regularly, but I just worked on a dataset that included negative years (i.e., BC) and had quite some difficulty dealing with it. For example, the following doesn't work:

> lubridate::ymd("-2255-01-01")
[1] "2255-01-01"

> lubridate::parse_date_time(-2255, "Y")
[1] NA
Warning message:
All formats failed to parse. No formats found.

The first call silently drops the sign and returns a Date object with a positive year; the second fails to parse at all. That said, the following works:

> lubridate::ymd("0000-01-01") - lubridate::years(2255)
[1] "-2255-01-01"

which made me write a helper function that deals with negative years.
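
A minimal sketch of what such a helper might look like (the helper mentioned above was not posted, so this is only an illustration). It assumes input strings of the form [-]YYYY-MM-DD in which the signed year follows lubridate's own numbering, where 0000 is a valid year. The name parse_signed_ymd is hypothetical, and the approach simply reuses the year-offset trick shown above.

```r
library(lubridate)

# Hypothetical helper: parse "[-]YYYY-MM-DD" strings with a signed year.
# The year is interpreted in lubridate's own numbering (year 0000 exists).
parse_signed_ymd <- function(x) {
  pattern <- "^(-?)([0-9]+)-([0-9]{2})-([0-9]{2})$"
  stopifnot(all(grepl(pattern, x)))

  # Split each string into sign, absolute year and "mm-dd" remainder.
  sign      <- ifelse(sub(pattern, "\\1", x) == "-", -1L, 1L)
  year      <- sign * as.integer(sub(pattern, "\\2", x))
  month_day <- sub(pattern, "\\3-\\4", x)

  # Anchor on year 0000, then shift by the signed number of years.
  ymd(paste0("0000-", month_day)) + years(year)
}

parse_signed_ymd(c("-2255-01-01", "0014-08-19"))
```

One caveat of this approach: a 29 February anchored on year 0000 may come back as NA when the target year is not a leap year.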

I can confirm that for anyone working on ancient periods (e.g. Greek and Roman periods) this feature would be very useful. This problem/choice is currently a deal breaker for Tidyverse and lubridate enthusiasts using R in Digital Humanities projects and courses. I'd be happy to test the modified functions.

Hi! Thanks for the package and all the work around it. If someone considers implementing BCE dates, or handling them with the package as it is (1.7.9), here are some thoughts about the problems caused by a phantom year zero.

Dealing with (non existing) "Year zero"

Explanations

If anyone is considering handling "before common era" dates in lubridate, be aware that Year zero doesn't exist for historians: Year -1 is followed directly by Year 1 (see, for example, the Wikipedia chronology). This can cause a few problems. Note also that this convention belongs to the Julian calendar, which could have minor conflicts with the Gregorian calendar we use nowadays; Wikipedia has more details.

Quick clarification for people unfamiliar with those notations:

  • "CE" stands for "Common Era", a "de-christianisation" of the long-established and still widely used "AD", "Anno Domini" (so CE dates can be seen as "positive years")
  • "BCE" stands for "Before Common Era", equivalent to "BC", "Before Christ" ("negative years").
    E.g. Year 2021 (happy new year btw!) would be "2021 CE". Socrates died in 399 BCE.

For instance, lubridate follows ISO 8601 (version 8601:2004, I presume? BCE dates could be handled with ISO 8601:2019, but the free-access part of the document is unclear about it), which starts at 0000-01-01, that is, 1 January of 1 BCE (Year -1).

This notation is confusing because it suggests that "0000-01-01" is Year 0 and that "-001-01-01" is Year -1, when it is actually Year -2. It can also cause problems when computing durations (see code below).

That aside, if encountered, "0 CE/AD" or "0 BCE/BC" should probably be parsed into Year -1.
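
The off-by-one mapping described above can be sketched in a few lines of base R. The function names are hypothetical, and the sketch assumes that the year numbers lubridate prints follow astronomical numbering (1 BCE is year 0, 2 BCE is year -1, and so on). An input of 0 is rejected here rather than mapped to 1 BCE.

```r
# Hypothetical helpers: convert between historical year labels (no year zero)
# and the astronomical / ISO 8601 style numbering that lubridate prints.

# BCE year n (n >= 1) -> astronomical year 1 - n (1 BCE -> 0, 63 BCE -> -62).
bce_to_astronomical <- function(year_bce) {
  stopifnot(all(year_bce >= 1))
  1L - as.integer(year_bce)
}

# Astronomical year -> label such as "63 BCE" or "14 CE".
astronomical_to_label <- function(year) {
  ifelse(year <= 0, paste(1 - year, "BCE"), paste(year, "CE"))
}

bce_to_astronomical(c(1, 2, 63))
astronomical_to_label(c(-62, 0, 14))
```

With this numbering, plain subtraction of year numbers gives the right span across the era boundary: 14 - bce_to_astronomical(63) is 76, the number of years from 63 BCE to 14 CE.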

References: Wikipedia (ISO 8601, Year zero, 1 BC, Common Era...)

Some code to make my point

(Licensed under WTFPL: Do What The Fuck You Want to)

pacman::p_load(lubridate)
pacman::p_version(lubridate)
#> [1] '1.7.9'

a <- ymd("0001-01-01")
a
#> [1] "0001-01-01"
# Year 1, no problem

b <- ymd("0000-01-01") - years(1)
b
#> [1] "-001-01-01"
# It is Year -1?
# No, it's -2 even if printed (-001-01-01),
# since ymd("0000-01-01") is already Year -1.

# The problem appears if we compute duration between the two
as.duration(a - b)
#> [1] "63158400s (~2 years)"
# But there is only one year between 1st January -1 and 1st January 1!
# since year zero doesn't exist.

Let's illustrate with Augustus dates:

  • birth: 23 September 63 BCE
  • death: 19 August 14 CE
  • age at death: 75

aug_birth <- ymd("0000-09-23") - years(63)
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2426889600s (~76.9 years)"
# That's one year too many!

# The correct writing would be:
aug_birth <- ymd("0000-09-23") - years(63 - 1)

So a correct helper function to parse BCE dates written as yyyy-mm-dd would be:

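parse_bce_ymd <- function(str) {
  # Capture the 4-digit BCE year and the "-mm-dd" remainder separately.
  regex <- "(\\d{4})(-\\d{2}-\\d{2})"
  match <- stringr::str_match(str, regex)
  # lubridate's year 0000 is 1 BCE, so BCE year n lies n - 1 years before it.
  years_n <- as.integer(match[, 2]) - 1 # Beware the -1 here
  right_side <- match[, 3] # already starts with "-", e.g. "-09-23"
  # Anchor on year 0000 (1 BCE), then step back the remaining years.
  date <- ymd(paste0("0000", right_side)) - years(years_n)
  return(date)
}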
# Test the function.
aug_birth <- parse_bce_ymd("0063-09-23")
aug_death <- ymd("0014-08-19")
age <- aug_death - aug_birth
as.duration(age)
#> [1] "2395353600s (~75.9 years)"
# Yay that's correct!

Still, lubridate prints the BCE date with a year number one less in absolute value than the "real" one (that is, one year later), as if a year zero existed, which is misleading.

aug_birth
#> [1] "-062-09-23"
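
One possible workaround for display, sketched under the same assumption that lubridate's year 0000 is 1 BCE: format the date with an explicit era label instead of relying on the default print method. The function name format_historical is hypothetical, and the output uses English month names from base R's month.name.

```r
library(lubridate)

# Hypothetical formatter: print a Date with a historical era label,
# shifting non-positive lubridate years by one (0 -> 1 BCE, -62 -> 63 BCE).
format_historical <- function(date) {
  y <- year(date)
  era <- ifelse(y <= 0, "BCE", "CE")
  hist_year <- ifelse(y <= 0, 1 - y, y)
  sprintf("%d %s %d %s", day(date), month.name[month(date)], hist_year, era)
}

# Augustus' birth date, built the same way as in the examples above.
aug_birth <- ymd("0000-09-23") - years(62)
format_historical(aug_birth)
format_historical(ymd("0014-08-19"))
```

The underlying Date object is unchanged, so durations computed from it stay correct; only the label shown to the reader is adjusted.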

In view of the last comments (and the widespread use of R and Tidyverse packages in digital humanities projects), do you think the issue could be re-opened, @vspinu?
