Data.table: fread Unable to handle mis-quoted field if it is out-of-sample

Created on 9 Jul 2017  ·  3 Comments  ·  Source: Rdatatable/data.table

This example:

library(data.table)
DT = data.table(A=rep("abc", 10000), B="def")
DT[110, A:='"a"b']   # mis-quoted field in a row outside fread's sample
f = tempfile()
fwrite(DT, f, quote=FALSE)   # write the field as-is, without quoting
fread(f)

produces a misleading error message:

Expecting 2 cols but row 0 contains only 1 cols (sep=','). Consider fill=true. <<"a"b,def>>

At least it doesn't crash (which I thought it would given that type[0] gets bumped up from CT_STRING into a non-existent type)...

bug fread

All 3 comments

Possible approach:

  • Introduce new quoting rule QR0 (all other rules become QR1..QR4). This would be the default QR. Under this rule, fields may or may not be quoted, but no internal quotes are allowed. Thus, the following fields are admissible under QR0: 1,foo,"","bar",,"baz,baz", while these are not: "foo""bar","foo\"bar",foo"bar,f"oo,bar".
  • When reading a file, if some field is of STRING type and cannot be read under current QR, then:

    • If we're currently at QR0 -- bump the QR until the field can be read, then continue scanning the file;

    • Otherwise, bump the QR but then go back and rescan all string fields (since the meaning of quotation marks has changed in the data that was already read).

  • QR bumps have the following hierarchy: QR0 -> {QR1|QR2|QR3} -> QR4.
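For illustration, here is a minimal sketch of the QR0 admissibility test in base R. It reads the example lists above as comma-separated lines and checks one already-split field at a time. `qr0_admissible` is a hypothetical name, not a data.table function, and a real check would presumably live inside fread's C parser rather than in R.

```r
# Hypothetical helper (not part of data.table): TRUE for each field that
# would be admissible under the proposed QR0 rule.
qr0_admissible <- function(fields) {
  # Does the field contain a double quote anywhere?
  has_quote <- grepl('"', fields, fixed = TRUE)
  # Is the field wrapped in one pair of quotes with no quote in between?
  wrapped <- grepl('^"[^"]*"$', fields)
  # Admissible: no quotes at all, or cleanly quoted with no internal quotes.
  !has_quote | wrapped
}

# Fields taken from the admissible example line above.
ok <- c('1', 'foo', '""', '"bar"', '', '"baz,baz"')
# Fields taken from the inadmissible line, plus the field from the issue.
bad <- c('"foo""bar"', '"foo\\"bar"', 'foo"bar', 'f"oo', 'bar"', '"a"b')

print(qr0_admissible(ok))
print(qr0_admissible(bad))
```

Under this reading, the field `"a"b` from the original report fails QR0, which is the point at which the parser would bump to a more permissive rule.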

I'm running into this too, but getting an error.

[image: screen shot 2017-11-03 at 1 59 52 pm]

fread("foo.csv", select=c("Date", "Description", "Amount"), header = TRUE)  # error
fread("foo.csv", header = TRUE, verbose = FALSE)  # works

[image: screen shot 2017-11-03 at 1 59 03 pm]

@ben519 Your dataset contains just 1 row, so it's definitely not because of out-of-sample irregularities. I've created a new issue for your error (see link above)
