When calling numpy.loadtxt on a file containing strings, as follows:
import numpy as np
datestxt = np.loadtxt("NYSE_dates.txt", dtype=str)
print(datestxt)
Here NYSE_dates.txt is simply a list of dates (the actual content could be anything):
7/5/1962
7/6/1962
7/9/1962
...
12/29/2020
12/30/2020
12/31/2020
The output is:
["b'7/5/1962'" "b'7/6/1962'" "b'7/9/1962'" ..., "b'12/29/2020'"
"b'12/30/2020'" "b'12/31/2020'"]
As you can see, every string has been cast to bytes and then stringified through conv. You get the same result from str(str('12/31/2020').encode('latin1')), per conv and compat.asbytes.
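To see why the output looks like that, here is a minimal pure-Python sketch of those two steps (no NumPy involved; the 'latin1' encoding mirrors the expression above). Calling str() on a bytes object returns its repr, prefix and quotes included, whereas .decode() returns the text itself:

```python
# Step 1: what an asbytes()-style cast produces from the original text
raw = "12/31/2020".encode("latin1")

# Step 2: a plain str converter applied to bytes gives the repr, not the text
stringified = str(raw)
print(stringified)  # b'12/31/2020'

# Decoding instead of calling str() recovers the original string
decoded = raw.decode("latin1")
print(decoded)  # 12/31/2020
```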
After looking at the code, it appears that strings are cast to bytes with asbytes(...) pretty much throughout, for example in split_line(...), which suggests that every routine in the module that handles strings is broken.
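A simplified, hypothetical model of that pipeline makes the effect concrete. asbytes_sketch and split_line_sketch below are illustrative stand-ins, not NumPy's actual code (comment handling, quoting and dtype logic are left out); the only assumption is that each field is cast to bytes before a plain str converter is applied to it:

```python
def asbytes_sketch(s):
    """Hypothetical stand-in for compat.asbytes: str -> latin1 bytes."""
    if isinstance(s, bytes):
        return s
    return str(s).encode("latin1")


def split_line_sketch(line, delimiter=None):
    """Hypothetical stand-in for split_line: cast to bytes, strip, split."""
    line = asbytes_sketch(line).strip()
    sep = None if delimiter is None else asbytes_sketch(delimiter)
    return line.split(sep)


lines = ["7/5/1962\n", "7/6/1962\n", "12/31/2020\n"]

# With dtype=str the converter is effectively str(), applied to bytes fields
values = [str(field) for line in lines for field in split_line_sketch(line)]
print(values)  # ["b'7/5/1962'", "b'7/6/1962'", "b'12/31/2020'"]

# Decoding each bytes field instead yields the expected plain strings
fixed = [field.decode("latin1") for line in lines for field in split_line_sketch(line)]
print(fixed)  # ['7/5/1962', '7/6/1962', '12/31/2020']
```

Because every field passes through the bytes cast, any converter that expects text sees bytes, which matches the output reported above.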
I also have this issue. It is very annoying; basically, you can't use loadtxt with string data in Python 3.
Temporary solution: I removed all asbytes() calls from the loadtxt function.
Yeah, I remember thinking something was fishy in there when I looked through the code.
For the record, I am running into the same issue with datetime64 inputs, leading to a parsing error of the form: Error parsing datetime string "b'2013-01-02'". To work around this, I had to create a converter for that column:
def decoder(input_bytes):
    # Turn the bytes that loadtxt hands to the converter into a plain str
    return input_bytes.decode("ascii")
This would be fine in production code, but it is far from pretty for training material.
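If the same material has to run on several NumPy versions, a slightly more defensive variant of the decoder above may be worth sketching. Whether a converter receives bytes or an already-decoded str can depend on the NumPy version and on loadtxt's encoding argument, so this version accepts both (the "ascii" default is carried over from the snippet above and is an assumption about the file's content):

```python
def decoder(value, encoding="ascii"):
    """Return `value` as str, decoding it first if it arrives as bytes."""
    # A converter may be handed bytes (older loadtxt behaviour) or an
    # already-decoded str, so only decode when it is actually bytes.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(decoder(b"2013-01-02"))  # 2013-01-02
print(decoder("2013-01-02"))   # 2013-01-02
```

It would then be passed to loadtxt as converters={0: decoder} (assuming the dates sit in column 0), together with a datetime64 dtype such as dtype="datetime64[D]".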
Pushing off to 1.11.
Workaround: run iconv on the file first.
Pushing off to 1.12.
I see that this keeps being pushed back, but I think it is a bug that should be addressed, and a fix seems easy to implement.
Pretty shocking that this hasn't been fixed for 5 years
It looks as though this is working as desired in NumPy 1.13.3 (though I'm not sure which PR fixed it). Can this issue be closed?
>>> import io
>>> import numpy as np
>>> f = io.StringIO("7/5/1962\n7/6/1962\n")
>>> np.loadtxt(f, dtype=str)
array(['7/5/1962', '7/6/1962'],
dtype='<U8')
>>> np.__version__
'1.13.3'
Looks like this was fixed in #8349, in response to #8033.
Closing. Please reopen if needed.