Data.table: DT[TRUE] can lead to invalid key

Created on 13 Dec 2018  ·  4 Comments  ·  Source: Rdatatable/data.table

I didn't realize DT[TRUE] was a way to achieve a shallow copy. Shallow copy was only intended for internal use. Thanks to @renkun-ken for highlighting this in #3214, and the related #2254.

  • [x] #3214
  • [ ] #2254

In v1.11.8 we see this:

DT = data.table(id = 1:5, key="id")
DT1 = DT[TRUE]
key(DT1)
# [1] "id"
DT1[3, id:=6L]
key(DT1)
# NULL              # correct
DT$id
# [1] 1 2 6 4 5     # should be 1:5
key(DT)
# [1] "id"          # invalid key

As far as I understand, it only occurs after DT[TRUE], which hopefully folks have not discovered or relied on too much! I hope the usage out there is as @renkun-ken described: adding new columns to the shallow copy, not changing existing columns.
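For anyone who does need to change an existing column on a copy, a minimal sketch of the safe route is below. It assumes the documented copy() function (a deep copy) is acceptable in place of DT[TRUE]; the name DT2 is only for illustration.

```r
library(data.table)

DT = data.table(id = 1:5, key = "id")

# copy() makes a deep copy, so the columns of DT2 are separate vectors from DT's
DT2 = copy(DT)
DT2[3, id := 6L]

key(DT2)   # key on the copy is dropped because a key column was changed
DT$id      # the original column is untouched
key(DT)    # the original key is still valid
```

The trade-off is memory: copy() duplicates every column, which is exactly what a shallow copy avoids, so this only makes sense when existing columns really do need to change.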

New test 1542.08 was added in PR #2313 ready for when this is fixed.

bug

All 4 comments

Yes, setkey and changing existing columns should not be used on the shallow copy, since the columns themselves are not copied.

If we don't allow making a shallow copy with DT[TRUE], this issue will be resolved automatically.

Eventually. But in the meantime, we can't break @renkun-ken's workflow.
More detail here: https://github.com/Rdatatable/data.table/issues/3214#issuecomment-462490046

The following code could be added to the tests to check copy behaviour:

DT = data.table(a=c(1,2), b=c("b","a"))
address(DT)
address(DT[])
address(DT[, .SD])
address(DT[TRUE])
sapply(DT, address)
sapply(DT[], address)
sapply(DT[, .SD], address)
sapply(DT[TRUE], address)
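
One way the address comparisons above might be turned into actual checks is sketched here. The helper name shares_columns is hypothetical (it is not part of data.table), and only the two cases whose outcome is certain are asserted; the result for DT[TRUE] is just printed, since it depends on the data.table version in use.

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each column, TRUE if
# x and y point at the same vector in memory.
shares_columns = function(x, y) {
  vapply(x, address, character(1)) == vapply(y, address, character(1))
}

DT = data.table(a = c(1, 2), b = c("b", "a"))

stopifnot(all(shares_columns(DT, DT)))          # same object, same columns
stopifnot(!any(shares_columns(DT, copy(DT))))   # deep copy, new columns

# No expectation asserted here: just report what this version does
print(shares_columns(DT, DT[TRUE]))
```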