Data.table: [R-Forge #5222] 'not found' when DT[, list(sum(non-.SD-col), lapply(.SD,mean)), by=..., .SDcols=...]

Created on 8 Jun 2014 · 12 comments · Source: Rdatatable/data.table

Submitted by: Matt Weller; Assigned to: Nobody; R-Forge link

When using .SDcols (to apply a function to multiple columns), I cannot reference other columns of the original table that are not listed in .SDcols (here, v1) with the following syntax:

library(data.table)
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, list(v1 = sum(v1), lapply(.SD, mean)), by = grp, .SDcols = v2:v3]
# Error in `[.data.table`(dt, , list(v1 = sum(v1), lapply(.SD, mean)), by = grp,  : 
#   object 'v1' not found

The same error occurs when I use c() instead of list(); clearly, column v1 cannot be accessed within the j clause.

As a workaround, I resorted to the following code, which adds v1 to .SDcols even though I do not want it included in the lapply() part, so the extra column has to be dropped after computation:

sd.cols = c("v1", "v2", "v3")
dt.out = dt[, c(sum.v1 = sum(v1), lapply(.SD, mean)), by = grp, .SDcols = sd.cols]
dt.out[, v1 := NULL]  # drop the unwanted mean of v1 after computation

According to eddi on Stack Overflow, this is a bug, and he asked me to report it. I cannot provide much more detail because I'm not exactly sure which part he considers a bug; the accepted answer by Arun and their ensuing discussion should highlight where the problem lies.

Here is the relevant SO post.

High bug


arunsrinivasan

All 12 comments

Another post to update: http://stackoverflow.com/questions/27755518/data-table-sd-lapply-multiple-columns-in-argument

arunsrinivasan on 4 Jan 2015

Bit late, but adding this question of mine to the pile

MichaelChirico on 11 Jul 2015

I didn't even think of this as a bug. Usually I add the extra required columns to .SDcols and later, in j, use .SD[, !"total", with=FALSE] to exclude the unwanted column.
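Applied to the original example, that workaround might look like the sketch below. It assumes the same toy `dt` from the report and uses v1 in place of the "total" column from the comment: v1 is put into .SDcols so it is visible in j, and then excluded from the lapply() part by subsetting .SD.

```r
library(data.table)

dt <- data.table(grp = c(2, 3, 3, 1, 1, 2, 3), v1 = 1:7, v2 = 7:1, v3 = 10:16)

# v1 is included in .SDcols so it can be referenced in j,
# then removed from .SD before taking column means
dt.out <- dt[, c(v1 = sum(v1), lapply(.SD[, !"v1", with = FALSE], mean)),
             by = grp, .SDcols = c("v1", "v2", "v3")]
print(dt.out)
```

This avoids the separate `:=` drop step, at the cost of subsetting .SD once per group.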

jangorecki on 11 Jul 2015

That's another good workaround; I wonder about the performance difference vis-à-vis using dt$total. And yes, IMO this sort of dances along the line between an FR (feature request) and a bug.

MichaelChirico on 11 Jul 2015

Bumping this up again; this looks like it could be a very important fix. This question seems to be related, and could potentially be solved via DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames]
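A minimal sketch of that pattern, assuming a data.table version that includes the fix: the data, `normalDelta()` and the new column names below are hypothetical stand-ins for the ones in the linked question. The point is that `price` is not in .SDcols but is still passed to every lapply() call as an extra argument.

```r
library(data.table)

# Hypothetical data: one price column plus two columns to transform
DT <- data.table(price = c(100, 102, 101), a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: scale a column by price
normalDelta <- function(x, p) x / p

deltaColsNames    <- c("a", "b")
deltaColsNewNames <- paste0(deltaColsNames, "_delta")

# price is outside .SDcols but referenced in j as the extra lapply() argument
DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames]
print(DT)
```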

DavidArenburg on 19 Aug 2015

Here's another simple case where this would be useful: http://stackoverflow.com/a/32498711/1191259

franknarf1 on 10 Sep 2015

Here's another simple case that suffers: http://stackoverflow.com/questions/32944060/using-data-table-to-calculate-new-columns/32944519#32944519

rentrop on 5 Oct 2015

Another to update when fixed: http://stackoverflow.com/q/32915770/1191259

franknarf1 on 9 Oct 2015

Yay! We can now do this:

require(data.table)
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, c(v1 = sum(v1), lapply(.SD, mean)), by = grp, .SDcols = v2:v3]
dt.out
#    grp v1  v2   v3
# 1:   2  7 4.5 12.5
# 2:   3 12 4.0 13.0
# 3:   1  9 3.5 13.5

arunsrinivasan on 8 Mar 2016


Updated all SO posts linked here. Thanks to all.

arunsrinivasan on 8 Mar 2016


Thanks, @arunsrinivasan. I had been waiting for this fix for a couple of years.

DavidArenburg on 8 Mar 2016

Awesome! Thank you

rentrop on 8 Mar 2016
