I frequently do
DT[CJ(colA = colA, colB = colB, unique=TRUE), on=c("colA","colB")]
# to complete missing levels
# or
DT[, CJ(colA = colA, colB = colB, unique=TRUE)][!DT, on=c("colA","colB")]
# to identify missing levels
# http://stackoverflow.com/a/36065607/1191259
It would be nice if I could get away with writing colA and colB fewer times. The FR here is for
CJ(colA, colB, unique=TRUE, names=TRUE)
to infer the names colA and colB, perhaps using whatever method is used by data.frame() and data.table() (make.names?).
(The name repetition could be reduced further if on=.Icols were a thing, I suppose, but I'll leave that for a separate FR.)
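For illustration, the requested name inference can be sketched as a small wrapper around CJ. This is only a sketch under assumptions: CJ_named is a hypothetical helper, not part of data.table, and the sample DT is made up. It names only those arguments passed as bare symbols and does not try to reproduce whatever data.frame() or data.table() do for arbitrary expressions.

```r
library(data.table)

# Hypothetical wrapper (not part of data.table): names unnamed arguments
# after the symbols that were passed, then delegates to CJ().
CJ_named <- function(..., sorted = TRUE, unique = FALSE) {
  args  <- list(...)
  exprs <- as.list(substitute(list(...)))[-1L]  # unevaluated argument expressions
  nms   <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  # Only bare symbols such as colA yield a name; explicit names are kept,
  # and other expressions are left unnamed for CJ() to handle.
  for (i in seq_along(args)) {
    if (nms[i] == "" && is.symbol(exprs[[i]])) nms[i] <- as.character(exprs[[i]])
  }
  names(args) <- nms
  do.call(CJ, c(args, list(sorted = sorted, unique = unique)))
}

# Made-up data: the combination (colA = 2, colB = "y") is absent
DT <- data.table(colA = c(1L, 1L, 2L), colB = c("x", "y", "x"), v = 1:3)

# Identify missing levels while writing each column name once inside CJ
missing_levels <- DT[, CJ_named(colA, colB, unique = TRUE)][!DT, on = c("colA", "colB")]
```

With a helper like this, the names still have to be repeated in on=, which is the part the separate on=.Icols idea would address.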
SO posts to update...
CJ takes ... as its first argument, and that function is going to become a generic method, so AFAIK we will need to change it into CJ(x, ...). Those changes can be made together with #1090.
+1 and I don't see the need for the names argument - this should be the only behavior. With the join syntax change to using "on" instead of setkey, this has become a big sticking point for me.
I'd also like to see unique = TRUE be the default - I can't think of _ever_ not needing to unique the arguments to CJ.
@jangorecki I didn't touch the #1090 / #814 stuff yet. It's better kept self-contained, I think, unless I'm missing something.