Data.table: R-devel change; R-API inside parallel regions

Created on 30 Nov 2018  ·  4Comments  ·  Source: Rdatatable/data.table

Luke Tierney contacted me about a memory issue in R-devel. Tests are passing both on CRAN and Travis/Appveyor, but memory usage is higher in some cases. As before the garbage collector turns off.
Not sure when exactly the R-devel change was made, but some time in the last month or so. It might be that the R-devel change causes the new memory issue, or that the R-devel change reveals the problem that exists already (possibly even with data.table-release on R-release). In any case, it needs to be fixed.

Luke wrote :

I distilled the issue in 'constellation' down to the attached file
from the examples. If I run this the memory usage is much bigger than
previously and the gc() output is garbled. It's a
multi-threading issue again; everything looks fine with
OMP_NUM_THREADS=1. With mutlipe threads you are
calling DATAPTR from threads other than the main one and that creates
a race on setting the R_GCEnabled flag, so eventually it is getting
stuck on off. I instrumented the places where the GC is disabled and
tracked this as the first one from a thread other than the main one:

#4  0x00007ffff78ba9d5 in DATAPTR (x=x@entry=0x1167458)
     at ../../../R/src/include/Rinlinedfuns.h:106
#5  0x00007fffea02c4c8 in subsetVectorRaw (target=0x4529590, source=0x1167458,
     idx=0x4175210, any0orNA=FALSE) at subset.c:44
#6  0x00007fffea02c7a6 in subsetDT (x=<optimized out>, rows=<optimized out>,
     cols=<optimized out>) at subset.c:272

Given Luke's detail, it's easy to see the problem in the code without needing to reproduce. All R API usage needs taking outside all parallel regions as Luke requested before. I delayed doing that last time due to time pressure, and now it needs tackling. Last time I just did the first step which was to ensure that DATAPTR inside parallel regions did not receive ALTREP.

This warrants accelerating release of 1.12.0, not least because it impacts Luke.

$ grep "omp.*parallel" *.c

  • [x] subset.c
  • [x] reorder.c
  • [x] fwrite.c
  • [x] fread.c
  • [x] fsort.c
  • [x] forder.c
  • [x] between.c

$ grep ALTREP *.c

  • [x] between.c
  • [x] fsort.c
  • [x] reorder.c
  • [x] wrappers.c
R-devel bug

Most helpful comment

awesome of Luke to do all the hard work of identifying the cause! 💯

All 4 comments

awesome of Luke to do all the hard work of identifying the cause! 💯

I'm in process of confirming with Luke that it's fixed before closing ...

Confirmed fixed. Thanks to @ltierney for his help.


For completeness in case we need to refer back...
I could not reproduce the problem with R-devel compiled as follows. (I needed to confirm I could reproduce the problem with 1.11.8 before I could confirm 1.11.9 fixes it.)

./configure --without-recommended-packages --disable-byte-compiled-packages --enable-strict-barrier CC="gcc -fsanitize=address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" CFLAGS="-O0 -g -Wall -pedantic"

even with the script that Luke provided :

library(data.table)
library(constellation)
temp <- as.data.table(vitals[VARIABLE == "TEMPERATURE"])
pulse <- as.data.table(vitals[VARIABLE == "PULSE"])
resp <- as.data.table(vitals[VARIABLE == "RESPIRATORY_RATE"])
temp[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
pulse[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
resp[, RECORDED_TIME := as.POSIXct(RECORDED_TIME,
  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")]
gc()
for (i in 1:10) {
  cat("i=",i,"\n")
  b1 <- bundle(temp, pulse, resp,
    bundle_names = c("PLATELETS", "INR"), window_hours_pre = 24,
    window_hours_post = c(6, 6), join_key = "PAT_ID",
    time_var = "RECORDED_TIME", event_name = "CREATININE", mult = "all")
}
gc()

As a long shot, I recompiled R-devel more simply as follows, still with --enable-strict-barrier which is the important thing :
./configure --without-recommended-packages --enable-strict-barrier
reinstalled constellation and all its dependencies and data.table 1.11.8, and this time Luke's script correctly fails :

Fatal error: Wrong thread calling 'RunFinalizers'

Installing data.table-dev (1.11.9) into this R-devel and rerunning, works fine. Reinstalling data.table 1.11.8 again fails. So it's indeed fixed by 1.11.9. In my original R-devel compile, perhaps something there was making 1.11.8 work (disabling byte compiler, -00, ASAN, etc). Anyway, a simpler R-devel compile was needed and was worth trying.

@jangorecki If time allows, it would be great to add --enable-strict-barrier to the R-devel used in CI pipelines. I don't know if you compile R-devel or not for CI pipelines.

@mattdowle yes I compile R-devel daily, will add this flag there. Added to not forget in #3147

Was this page helpful?
0 / 5 - 0 ratings