Pandas: ENH: Support for multiple comment characters with readers

Created on 4 Nov 2014  ·  3 Comments  ·  Source: pandas-dev/pandas

I would be very pleased if Pandas supported multiple comment characters when reading data from files. For example:

import pandas as pd
df = pd.read_table("data.dat", comment=("#","@"), delim_whitespace=True)

I don't know whether this would require a minor or a major implementation effort.

Best,
Erik

Enhancement IO CSV

All 3 comments

This would take a bit of effort. The reader works essentially byte by byte (with some backref capability), so it would have to check each byte against a buffer of comment characters, and do so in a performant manner. Right now it checks against a single character, and only if that character is not NULL. It could be done.
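
To illustrate the kind of check described above, here is a toy sketch in Python. It is not the pandas tokenizer (which is written in C); the names `make_comment_table` and `strip_comment` are hypothetical. It assumes that a 256-entry lookup table is an acceptable way to test a byte against several comment characters at roughly the cost of the current single-character comparison.

```python
def make_comment_table(comment_chars: bytes) -> bytes:
    """Build a 256-entry table: table[b] is 1 if byte b starts a comment."""
    table = bytearray(256)
    for b in comment_chars:
        table[b] = 1
    return bytes(table)


def strip_comment(line: bytes, table: bytes) -> bytes:
    """Scan byte by byte and cut the line at the first comment byte."""
    for i, b in enumerate(line):
        if table[b]:  # one indexed lookup per byte, whatever the number of comment chars
            return line[:i]
    return line


table = make_comment_table(b"#@")
data_part = strip_comment(b"1 2 @ note", table)
```

This sketch ignores quoting and escape handling, which a real tokenizer would have to respect before treating a byte as the start of a comment.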

Related:

It would be great if a comment marker could also be two characters long, e.g. "##". For example, in VCF files, some metadata is specified at the beginning of the file with "##" before the actual table starts:

http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

Often one just wants to ignore these, but:

df = pd.read_csv("data.vcf", comment="##")

doesn't work. Note that for VCF it won't work to just use comment="#" since the header line actually starts with a single "#".
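
One possible workaround, sketched under assumptions: filter out the "##" lines in Python first, then hand the remaining text to `pd.read_csv`. The helper name `read_vcf_table` and the sample data are hypothetical, and the sketch assumes a tab-separated table whose header line starts with a single "#".

```python
import io

import pandas as pd


def read_vcf_table(handle, meta_prefix="##"):
    """Read a VCF-like text stream, dropping lines that start with meta_prefix."""
    kept = [line for line in handle if not line.startswith(meta_prefix)]
    # The single-"#" header line is kept and becomes the column names.
    return pd.read_csv(io.StringIO("".join(kept)), sep="\t")


# Made-up sample in the shape of a VCF file.
sample = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\n"
    "20\t14370\trs6054257\n"
)
df = read_vcf_table(sample)
```

For a file on disk, the same function could be called as `read_vcf_table(open("data.vcf"))`. Note that this loads the whole file into memory before parsing, so it may not suit very large files.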

This would be difficult. I'm closing this for now.
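
For anyone landing here, a workaround for the original request (several single-character comment markers) is to strip the comments before parsing. This is only a sketch: `read_with_comments` is a hypothetical helper, not a pandas function, and it assumes that comment characters never appear inside quoted fields.

```python
import io
import re

import pandas as pd


def read_with_comments(handle, comment_chars="#@", **kwargs):
    """Cut each line at the first comment character, then parse with read_csv."""
    # Character class matching any comment char, plus the rest of the line.
    pattern = re.compile("[" + re.escape(comment_chars) + "].*")
    cleaned = (pattern.sub("", line).rstrip() for line in handle)
    # Drop lines that are empty once their comment has been removed.
    text = "\n".join(line for line in cleaned if line)
    return pd.read_csv(io.StringIO(text), **kwargs)


# Made-up sample with two different comment markers.
sample = io.StringIO(
    "# generated by some tool\n"
    "@ legend line\n"
    "1 2 3 # trailing comment\n"
    "4 5 6\n"
)
df = read_with_comments(sample, sep=r"\s+", header=None)
```

Here `sep=r"\s+"` stands in for the whitespace-delimited parsing requested in the original post. The preprocessing runs in pure Python, so it will be slower than the built-in single-character `comment` option.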
