Following up on issue #713: I think I have found a related bug.
Declaring an aggregation function _inside_ a function that calls dcast.data.table, and then passing it as fun.aggregate, fails on my machine:
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
data.table_1.9.6
Here is an example:
library(data.table)
testdata = data.table(c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c("a", "a", "b", "a", "b"))
colnames(testdata) = c("ID", "VAL", "CLASS")
# test dcast.data.table within a function with an internally declared aggregate function -> FAILS
test_dcast_dt2 <- function(data) {
  testfunc2 <- function(x) {
    sum(x)
  }
  data_cast = dcast.data.table(data, "ID ~ CLASS", value.var = "VAL", fun.aggregate = testfunc2)
}
res2 = test_dcast_dt2(testdata)
It's interesting to see that this case fails too:
testdata = data.table(c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c("a", "a", "b", "a", "b"))
colnames(testdata) = c("ID", "VAL", "CLASS")
test_dcast_dt <- function(data, aggfunc) {
  data_cast = dcast.data.table(data, "ID ~ CLASS", value.var = "VAL", fun.aggregate = aggfunc)
}
custom_sum <- function(x) {
  sum(x)
}
res = test_dcast_dt(testdata, custom_sum)
It seems to me that the fix for issue #713 only takes effect when the variable holding the passed function is itself named "fun.aggregate".
In contrast to the previous example, this one works:
testdata = data.table(c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c("a", "a", "b", "a", "b"))
colnames(testdata) = c("ID", "VAL", "CLASS")
test_dcast_dt <- function(data, fun.aggregate) {
  data_cast = dcast.data.table(data, "ID ~ CLASS", value.var = "VAL", fun.aggregate = fun.aggregate)
}
custom_sum <- function(x) {
  sum(x)
}
res = test_dcast_dt(testdata, custom_sum)
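If the observation above holds, the locally declared case from the first example might be worked around by giving the local function the name fun.aggregate. The following is only a sketch under that assumption, not a confirmed fix: the wrapper name test_dcast_local is made up for illustration, the column names are set directly in the data.table() call, and the formula is written unquoted. On releases where the underlying bug is fixed, the renaming should simply be unnecessary.

```r
library(data.table)

testdata <- data.table(ID    = c(1, 1, 1, 2, 2),
                       VAL   = c(1, 2, 3, 4, 5),
                       CLASS = c("a", "a", "b", "a", "b"))

# Hypothetical wrapper (name chosen for illustration only).
test_dcast_local <- function(data) {
  # Assumed workaround: the local function is deliberately named
  # fun.aggregate, so the symbol passed to dcast matches the argument name.
  fun.aggregate <- function(x) sum(x)
  dcast(data, ID ~ CLASS, value.var = "VAL", fun.aggregate = fun.aggregate)
}

res_local <- test_dcast_local(testdata)
print(res_local)
```

The result should have one row per ID and one column per CLASS value, each cell holding the sum of VAL for that combination.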
I think I am running into this bug, but I wanted to be sure it is the same issue. I was trying to define a fun.aggregate within a function before a call to dcast. A trivial example:
wrapper <- function() {
  # `data` is assumed to be an existing data.table with columns y, x and b
  f <- function(x) list(x)
  dcast(data, y ~ x + b, fun.aggregate = f)
}
I attempted to find f using get(), targeting specific sys.call environments. I also attempted to attach f to a new.env() from the base environment. Is the approach from d0rg0ld's comment of Oct 2, 2015 still the best one currently?
Related question on SO: R data.table function doesn't recognize an already-specified argument