Data.table: dcast.data.table eval(fun.aggregate) -- fails when called inside a function that declares its aggregation function internally

Created on 2 Oct 2015  ·  4 Comments  ·  Source: Rdatatable/data.table

Referring to issue #713, I think I just found a related bug.

Declaring an aggregation function _within_ a function and passing it to dcast.data.table from there fails on my machine:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
data.table_1.9.6

Here is an example:

library(data.table)

testdata=data.table(c(1,1, 1, 2, 2), c(1,2,3, 4, 5), c( "a", "a", "b", "a", "b"))
colnames(testdata)=c("ID", "VAL", "CLASS")

# test dcast.data.table within a function with an internally declared aggregate function -> FAILS

test_dcast_dt2 <- function(data) {
  testfunc2 <- function(x) {
    sum(x)
  }
  data_cast=dcast.data.table(data, "ID ~ CLASS", value.var="VAL", fun.aggregate=testfunc2)
}

res2=test_dcast_dt2(testdata)

All 4 comments

It's interesting to see that this case fails too:

testdata=data.table(c(1,1, 1, 2, 2), c(1,2,3, 4, 5), c( "a", "a", "b", "a", "b"))
colnames(testdata)=c("ID", "VAL", "CLASS")

  test_dcast_dt <- function(data, aggfunc) {
    data_cast=dcast.data.table(data, "ID ~ CLASS", value.var="VAL", fun.aggregate=aggfunc)
  } 

  custom_sum <- function(x) {
    sum(x)
  }

  res=test_dcast_dt(testdata, custom_sum)

It seems to me that issue #713 was fixed only for the case where the variable holding the passed function is itself named "fun.aggregate":

This example works in contrast to the previous one:

testdata=data.table(c(1,1, 1, 2, 2), c(1,2,3, 4, 5), c( "a", "a", "b", "a", "b"))
colnames(testdata)=c("ID", "VAL", "CLASS")

  test_dcast_dt <- function(data, fun.aggregate) {
    data_cast=dcast.data.table(data, "ID ~ CLASS", value.var="VAL", fun.aggregate=fun.aggregate)
  } 

  custom_sum <- function(x) {
    sum(x)
  }

  res=test_dcast_dt(testdata, custom_sum)
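
If the name-based behaviour described above holds, one possible workaround for the original report is to bind the locally declared function to a variable that is itself called fun.aggregate. The following is only a sketch under that assumption; it has not been confirmed against data.table 1.9.6 in this thread, and test_dcast_local is a hypothetical name used for illustration.

```r
library(data.table)

testdata <- data.table(c(1, 1, 1, 2, 2), c(1, 2, 3, 4, 5), c("a", "a", "b", "a", "b"))
colnames(testdata) <- c("ID", "VAL", "CLASS")

# Hypothetical wrapper: the aggregator is declared inside the function,
# but under the name `fun.aggregate`, which is the name the thread
# reports as working.
test_dcast_local <- function(data) {
  fun.aggregate <- function(x) {
    sum(x)
  }
  dcast.data.table(data, ID ~ CLASS, value.var = "VAL",
                   fun.aggregate = fun.aggregate)
}

res <- test_dcast_local(testdata)
print(res)
# Expected sums: ID 1 -> a = 3, b = 3; ID 2 -> a = 4, b = 5
```

The choice of name is the only difference from the failing test_dcast_dt2 example, so comparing the two on the same data.table version is a quick way to check whether this is the same bug.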

I think I am running into this bug, but I wanted to be sure it is the same issue. I was trying to define a fun.aggregate within a function before a call to dcast. A trivial example:

wrapper <- function() {
  f <- function(x) list(x)
  dcast(data, y ~ x + b, fun.aggregate = f)
}

I tried to find f using get() by targeting specific sys.call environments. I also tried attaching f to a new.env() created from the base environment. So is the comment "d0rg0ld commented on Oct 2, 2015" still the best approach currently?
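
An alternative that avoids passing a locally defined function to dcast at all is to aggregate first and cast afterwards. This is a minimal sketch, not a fix for the bug itself: it assumes the aggregator returns one value per group (sum here, rather than the list(x) aggregator from the example above), and cast_preaggregated is a hypothetical name.

```r
library(data.table)

testdata <- data.table(ID = c(1, 1, 1, 2, 2),
                       VAL = c(1, 2, 3, 4, 5),
                       CLASS = c("a", "a", "b", "a", "b"))

# Hypothetical wrapper: the local function is applied inside `[.data.table`,
# whose j expression is evaluated with the calling frame in scope.
cast_preaggregated <- function(data) {
  custom_sum <- function(x) sum(x)
  # One row per ID/CLASS pair, so the cast needs no fun.aggregate.
  agg <- data[, .(VAL = custom_sum(VAL)), by = .(ID, CLASS)]
  dcast.data.table(agg, ID ~ CLASS, value.var = "VAL")
}

res <- cast_preaggregated(testdata)
print(res)
# Expected sums: ID 1 -> a = 3, b = 3; ID 2 -> a = 4, b = 5
```

Because every ID/CLASS combination is unique after the grouping step, dcast only reshapes and never has to look up the aggregation function.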
