Pandas: read_json() removes underscore and converts string to integer when only numbers and underscore present

Created on 14 Oct 2017  ·  3 Comments  ·  Source: pandas-dev/pandas

Code Sample

import pandas as pd

df = pd.DataFrame({'test': ['34343_43434']})
json = df.to_json(orient='records')
result = pd.read_json(json, orient='records')
print(result)

output:
test
0 3434343434

Problem description

pandas appears to convert the original string ("34343_43434") to an integer (3434343434), dropping the underscore in the process.

This only occurs when every character in the string other than the underscore is a digit. For example, if the initial value were "34343_43434X", the output would correctly be "34343_43434X". The issue does not occur when dtype=False is passed to read_json().
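As a sketch of the workaround mentioned above, passing dtype=False to read_json() should skip dtype inference, so the value stays a string. The io.StringIO wrapper and the variable names payload and value are additions for illustration, not part of the original report; the wrapper is there because newer pandas releases expect a file-like object rather than a literal JSON string.

```python
import io

import pandas as pd

df = pd.DataFrame({'test': ['34343_43434']})
payload = df.to_json(orient='records')

# dtype=False turns off dtype inference, so the column is kept as parsed from JSON.
result = pd.read_json(io.StringIO(payload), orient='records', dtype=False)

value = result.loc[0, 'test']
print(repr(value))
```

The trade-off is that dtype=False disables inference for every column, so genuinely numeric columns stored as strings would need to be converted explicitly afterwards.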

Expected Output

test
0 34343_43434

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Compat IO JSON

All 3 comments

I was able to replicate this on Python 3.6.2 but not on 3.5.3. Not sure why, though.

Might be related to PEP 515.

I'm able to do things like this in 3.6, but not in 3.5:

In [5]: num = 34343_43434

In [6]: type(num)
Out[6]: int

In [7]: num
Out[7]: 3434343434

This is an unfortunate side effect of the PEP, but the behavior looks valid to me.
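The same PEP 515 rule applies to strings passed to int(), not only to literals typed at the prompt. The sketch below shows that behavior in plain Python 3.6+. It assumes, without confirmation in this thread, that the integer coercion inside read_json() ends up relying on this kind of parsing.

```python
# PEP 515 (Python 3.6+) allows single underscores between digits,
# both in numeric literals and in strings handed to int().
converted = int('34343_43434')
print(converted)

# Any non-digit character still makes the conversion fail, which is
# consistent with "34343_43434X" surviving as a string in the report.
try:
    int('34343_43434X')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

On Python 3.5 the first int() call raises ValueError instead, which would match the comment that the problem could not be reproduced on 3.5.3.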
