Pandas: ENH: Support for multiple comment characters with readers

Created on 4 Nov 2014 · 3 Comments · Source: pandas-dev/pandas

I would be very pleased if pandas supported multiple comment characters when reading data from files. For example:

import pandas as pd
df = pd.read_table("data.dat", comment=("#","@"), delim_whitespace=True)

I don't know whether this would require a minor or major implementation effort.

Best,
Erik

Enhancement IO CSV

ebran
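
Until something like this is supported, one possible workaround is to strip the comments before the text reaches the reader. The following is only a minimal sketch: `strip_comments` is a hypothetical helper, not part of pandas, and it assumes comment characters never appear inside quoted fields.

```python
import io


def strip_comments(lines, comment_chars=("#", "@")):
    """Yield lines with everything from the first comment character removed.

    Hypothetical helper, not a pandas API. Lines that are empty after
    stripping are dropped. Quoted fields are NOT handled.
    """
    for line in lines:
        # Find the earliest occurrence of any comment character.
        cut = len(line)
        for ch in comment_chars:
            pos = line.find(ch)
            if pos != -1:
                cut = min(cut, pos)
        kept = line[:cut].rstrip()
        if kept:
            yield kept + "\n"


# Made-up sample data standing in for "data.dat".
raw = io.StringIO(
    "# header comment\n"
    "x y\n"
    "@ another comment style\n"
    "1 2 # trailing note\n"
    "3 4\n"
)
cleaned = "".join(strip_comments(raw))
print(cleaned)
```

The cleaned text can then be handed to the reader, for instance `pd.read_csv(io.StringIO(cleaned), sep=r"\s+")`, where the whitespace separator plays the role of `delim_whitespace=True` in the request above.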

Most helpful comment

It would be great if the comment marker could also be two characters long, e.g. "##". For example, in VCF files, some metadata is specified at the beginning of the file with "##" before the actual table starts:

http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

Often one just wants to ignore these, but:

df = pd.read_csv("data.vcf", comment="##")

doesn't work. Note that for VCF it won't work to just use comment="#" since the header line actually starts with a single "#".

dansondergaard on 22 Nov 2016

👍11
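
For the VCF case, a possible workaround is to filter out the "##" lines before parsing, so the single-"#" header line survives. This is a rough sketch under assumptions: `drop_meta_lines` is a hypothetical helper (not a pandas function), and the sample lines are invented for illustration.

```python
import io


def drop_meta_lines(lines, prefix="##"):
    """Yield only the lines that do not start with `prefix`.

    Hypothetical helper, not a pandas API.
    """
    for line in lines:
        if not line.startswith(prefix):
            yield line


# Invented sample standing in for "data.vcf".
sample = io.StringIO(
    "##fileformat=VCFv4.0\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\n"
    "20\t14370\trs6054257\n"
)
table_text = "".join(drop_meta_lines(sample))
print(table_text)
```

The remaining text starts at the header line and could be read with something like `pd.read_csv(io.StringIO(table_text), sep="\t")`; the first column would then be named "#CHROM".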

All 3 comments

This would be a bit of an effort. The reader basically works byte by byte (with some backref capability), so it would have to check each byte against a buffer of comment chars in a performant manner. Right now it only checks against the single comment char, and only if that char is not NULL. Could be done.

jreback on 4 Nov 2014
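
To illustrate the idea of checking each byte against a set of comment characters, here is a toy byte-by-byte scanner. It is an assumption-laden sketch, not the actual pandas tokenizer: the function name, the lookup table, and the sample input are all made up for this example, and quoting is ignored.

```python
def strip_comment_bytes(data: bytes, comment_chars: bytes = b"#@") -> bytes:
    """Drop everything from a comment byte up to (not including) the newline.

    Toy illustration only; this is not how pandas is implemented.
    """
    # Precompute a 256-entry lookup table so the per-byte check is one index.
    is_comment = [False] * 256
    for c in comment_chars:
        is_comment[c] = True

    out = bytearray()
    in_comment = False
    for b in data:
        if in_comment:
            # Skip bytes until the newline that ends the comment.
            if b == 0x0A:
                in_comment = False
                out.append(b)
        elif is_comment[b]:
            in_comment = True
        else:
            out.append(b)
    return bytes(out)


result = strip_comment_bytes(b"a,b\n# note\n1,2 @ x\n3,4\n")
print(result)
```

A line that consists only of a comment leaves an empty line behind in this sketch; a real parser would also need to decide how to treat such lines and comment characters inside quoted fields.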

This would be difficult. I'm closing this for now.

wesm on 6 Jul 2018
