I'm getting `ValueError: Number of samples, -20, must be non-negative.` when trying to build a histogram of my dataset:
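import numpy as np
# Load the comma-separated values as 16-bit signed integers
my_data = np.loadtxt("my_data.csv", delimiter=',', dtype=np.int16)
# Let NumPy choose the bin width automatically; this call raises the error
n_base, bins_base = np.histogram(my_data, bins="auto")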
Here is my_data.csv
Error message:
ValueError: Number of samples, -20, must be non-negative.
NumPy/Python version information:
1.16.4 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Interestingly, when I convert the dataset to float, the histogram is built without any issue.
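Because the contents of my_data.csv are not reproduced here, the sketch below uses a small hypothetical int16 array that includes the extremes 32767 and -16 discussed in this thread. It shows the float workaround: histogramming a float64 copy, so the data range is computed in floating point instead of int16.

```python
import numpy as np

# Hypothetical stand-in for my_data.csv: int16 values spanning -16..32767
my_data = np.array([-16, 0, 5, 120, 4000, 32767], dtype=np.int16)

# Workaround: histogram a float64 copy so max - min cannot overflow int16
n_base, bins_base = np.histogram(my_data.astype(np.float64), bins="auto")

print(n_base.sum(), bins_base[0], bins_base[-1])
```

The cast costs a temporary copy of the data, but it sidesteps the integer arithmetic entirely.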
Thanks for the bug report. I can confirm that this bug exists in the master branch, too.
NumPy devs: The problem is that an internal function, `_hist_bin_sturges` in `histograms.py`, uses the method `ptp` to compute the difference between the maximum and minimum of an array with dtype `int16`. In this case, the maximum is 32767 and the minimum is -16, so the difference should be 32783. But `ptp` returns a value with the same type as the array, and 32783 does not fit in `int16` (whose maximum is 32767), so it wraps around to -32753 (that is, 32783 - 65536). This negative range makes the computed bin width, and therefore the number of bins, negative.
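A minimal sketch of the failure chain described above, using a hypothetical two-element array holding only the quoted extremes. `np.ptp` keeps the int16 dtype, so the difference wraps. Dividing by Sturges' positive denominator then gives a negative width, and `np.linspace`, which builds the uniform bin edges, rejects a negative sample count with the same message as in the report. The exact count (-20 in the report) depends on the real data, so the `-1` below is only an illustrative value.

```python
import numpy as np

# Hypothetical data: only the extremes quoted in the comment
x = np.array([32767, -16], dtype=np.int16)

# np.ptp keeps the input dtype: 32767 - (-16) = 32783 wraps to -32753 in int16
wrapped = np.ptp(x)
print(wrapped, wrapped.dtype)

# Sturges' rule: width = range / (log2(n) + 1); a negative range gives a negative width
width = wrapped / (np.log2(x.size) + 1.0)
print(width)

# A negative width leads to a negative bin count; np.linspace refuses it.
# The -1 here is an illustrative value, not the count from the report.
msg = None
try:
    np.linspace(-16, 32767, -1)
except ValueError as err:
    msg = str(err)
print(msg)
```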
We could fix that by replacing `x.ptp()` with something like `x.max().item() - x.min().item()`.
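The suggested replacement can be checked on the same hypothetical two-element array. `.item()` converts each NumPy scalar to a Python `int`, which has arbitrary precision, so the subtraction cannot wrap. The two functions are illustrative re-creations of the Sturges formula with hypothetical names, not NumPy's actual private code.

```python
import numpy as np

x = np.array([32767, -16], dtype=np.int16)  # hypothetical data

def sturges_width_buggy(a):
    # Peak-to-peak computed in the array's own dtype (int16 here), so it can wrap
    return np.ptp(a) / (np.log2(a.size) + 1.0)

def sturges_width_fixed(a):
    # .item() yields Python ints, so max - min is exact for any integer dtype
    return (a.max().item() - a.min().item()) / (np.log2(a.size) + 1.0)

span = x.max().item() - x.min().item()
print(span)  # 32783
print(sturges_width_buggy(x), sturges_width_fixed(x))

# Number of equal-width bins needed to cover the data range
print(int(np.ceil(span / sturges_width_fixed(x))))
```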
Most of the other bin estimators have the same problem with `x.ptp()`.
A possible fix is in https://github.com/numpy/numpy/pull/14381.