import pandas as pd
df = pd.DataFrame({'test': ['34343_43434']})
json = df.to_json(orient='records')
result = pd.read_json(json, orient='records')
print(result)
output:
test
0 3434343434
Pandas appears to convert the initial string ("34343_43434") to an integer (3434343434), removing the underscore in the process.
This only occurs when every character in the string other than the underscore is a digit. For example, if the initial value were "34343_43434X", the output would correctly be "34343_43434X". The issue does not occur when dtype=False is passed to read_json.
expected output:
test
0 34343_43434
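For anyone hitting this, here is a minimal sketch of the dtype=False workaround mentioned above. It assumes a pandas version where read_json accepts a file-like object, so the JSON text is wrapped in io.StringIO; the variable names are only illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"test": ["34343_43434"]})
payload = df.to_json(orient="records")

# dtype=False tells read_json not to infer dtypes,
# so the underscore-separated string is left as text.
result = pd.read_json(io.StringIO(payload), orient="records", dtype=False)

value = result["test"].iloc[0]
print(repr(value))
```

The trade-off is that dtype=False turns off inference for every column, so genuinely numeric columns that arrive as strings would need to be converted explicitly afterwards.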
pd.show_versions()
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
I was able to replicate this on Python 3.6.2 but not on 3.5.3. Not sure why, though.
Might be related to PEP 515.
I'm able to do things like this in 3.6, but not in 3.5:
In [5]: num = 34343_43434
In [6]: type(num)
Out[6]: int
In [7]: num
Out[7]: 3434343434
This is an unfortunate side effect of the PEP, but the behavior looks valid to me.
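To make the PEP 515 point concrete, here is a small sketch of which strings the built-in int() accepts on Python 3.6 and later. It only illustrates the interpreter-level behavior and makes no claim about how pandas performs the conversion internally; the helper name int_accepts is hypothetical.

```python
def int_accepts(text):
    """Return True if the built-in int() can parse the text."""
    try:
        int(text)
    except ValueError:
        return False
    return True


samples = [
    "34343_43434",   # single underscore between digits
    "34343_43434X",  # trailing non-digit character
    "34343__43434",  # two underscores in a row
    "_3434",         # leading underscore
]

# On Python 3.6+, only the first sample parses as an integer.
results = {text: int_accepts(text) for text in samples}
for text, ok in results.items():
    print(text, ok)
```

This matches the report: the value is converted only when everything apart from the underscore is a digit, and a value such as "34343_43434X" is left alone.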