Submitted by: Matt Weller; Assigned to: Nobody; R-Forge link
When using .SDcols (for the purpose of applying a function to multiple columns), I cannot reference other columns in the original table (v1) using the following syntax:
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, list(v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
# Error in `[.data.table`(dt, , list(v1 = sum(v1), lapply(.SD, mean)), by = grp, :
# object 'v1' not found
A similar error happens when I use c instead of list. Clearly, the column v1 cannot be accessed within the j clause.
I resorted to the following code, which includes column v1 in .SDcols even though I do not want it in the lapply portion, so I have to drop it after the computation.
sd.cols = c("v1","v2", "v3")
dt.out = dt[, c(sum.v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = sd.cols]
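For completeness, here is a minimal sketch of the whole workaround, including the final drop step that the snippet above leaves out. It assumes the same example table as in the report and removes the unwanted per-group mean of v1 by reference afterwards.

```r
library(data.table)

dt <- data.table(grp = c(2, 3, 3, 1, 1, 2, 3), v1 = 1:7, v2 = 7:1, v3 = 10:16)

# Include v1 in .SDcols so that it is visible inside j.
sd.cols <- c("v1", "v2", "v3")
dt.out <- dt[, c(sum.v1 = sum(v1), lapply(.SD, mean)), by = grp, .SDcols = sd.cols]

# lapply(.SD, mean) also produced a mean of v1, which is not wanted; drop it.
dt.out[, v1 := NULL]
```

The result keeps grp, sum.v1, v2 and v3; the cost is one extra mean per group that is computed and then thrown away.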
According to eddi on Stack Overflow, this is a bug, and he has asked me to report it. I cannot provide much more detail, as I'm not exactly sure which part he thinks is a bug; the accepted answer by Arun and their ensuing discussion highlight where the problem lies.
Here is the relevant SO post.
Another post to update: http://stackoverflow.com/questions/27755518/data-table-sd-lapply-multiple-columns-in-argument
Bit late, but adding this question of mine to the pile
I didn't even think about it as a bug; usually I provide the additional required fields to .SDcols and later in j I use .SD[, !"total", with=FALSE] to exclude the unwanted column.
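A minimal sketch of that workaround, for readers who want to try it. The table and its column names (total, x, y) are made up for illustration; only the .SD[, !"total", with=FALSE] idiom comes from the comment above.

```r
library(data.table)

# Hypothetical example data; "total" is needed in j but should not be averaged.
dt <- data.table(grp = c("a", "a", "b"),
                 total = c(10, 20, 30),
                 x = c(1, 2, 3),
                 y = c(4, 5, 6))

out <- dt[,
  c(list(total = sum(total)),
    # Drop "total" from .SD before applying the function to the rest.
    lapply(.SD[, !"total", with = FALSE], mean)),
  by = grp,
  .SDcols = c("total", "x", "y")]
```

Compared with dropping the column after the computation, this avoids computing a mean that is never used, at the price of subsetting .SD once per group.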
That's another good workaround; I wonder about the performance difference vis-a-vis using dt$total. And yes, this sort of dances the line between FR and bug, IMO.
Bumping this up again. Looks like this could be a very important fix. This question seems to be related and could potentially be solved via DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames]
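To make that pattern concrete, here is a rough sketch. It assumes a data.table version in which j can see columns outside .SDcols (the behaviour requested in this issue). The data, the column names d1 and d2, and the body of normalDelta are all invented for illustration; the linked question may define them differently.

```r
library(data.table)

# Hypothetical data: price is used by the function but is not in .SDcols.
DT <- data.table(price = c(10, 20, 40), d1 = c(1, 2, 4), d2 = c(5, 10, 20))

# Hypothetical helper: scale a delta column by the price column.
normalDelta <- function(delta, price) delta / price

deltaColsNames <- c("d1", "d2")
deltaColsNewNames <- paste0(deltaColsNames, "_norm")

# price is passed as an extra argument to normalDelta for every .SD column.
DT[, (deltaColsNewNames) := lapply(.SD, normalDelta, price), .SDcols = deltaColsNames]
```

On versions without this behaviour, the same call would be expected to fail with an "object 'price' not found" error, analogous to the one in the original report.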
Here's another simple case where this would be useful: http://stackoverflow.com/a/32498711/1191259
Here's another simple case that suffers: http://stackoverflow.com/questions/32944060/using-data-table-to-calculate-new-columns/32944519#32944519
Another to update when fixed: http://stackoverflow.com/q/32915770/1191259
Yay! we can now do this:
require(data.table)
dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, c(v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
#    grp v1  v2   v3
# 1:   2  7 4.5 12.5
# 2:   3 12 4.0 13.0
# 3:   1  9 3.5 13.5
Updated all SO posts linked here. Thanks to all.
Thanks, @arunsrinivasan. I was waiting for this fix for a couple of years.
Awesome! Thank you