Remaining tasks:
- Add a __skip_array_function__ function attribute to allow for skipping __array_function__ dispatch. (https://github.com/numpy/numpy/pull/13389)
- Rewrite numpy/core/overrides.py in C for speed (https://github.com/numpy/numpy/issues/12028): get_overloaded_types_and_args, array_function_implementation_or_override, ndarray.__array_function__, array_function_dispatch?
- numpy.core? np.core.defchararray (#12154), np.einsum and np.block (https://github.com/numpy/numpy/pull/12163)
- numpy.lib, numpy.fft/numpy.linalg (https://github.com/numpy/numpy/pull/12117)
- arange? (https://github.com/numpy/numpy/issues/12379)
- ndarray.__repr__ should not rely on __array_function__ (https://github.com/numpy/numpy/pull/12212)
- stacklevel should be increased by 1 for wrapped functions, so tracebacks point to the right place (gh-13329)

It might be good to merge a preliminary "Decorate all public NumPy functions with @array_function_dispatch" for some high-profile functions and request downstream consumers of the protocol to try it out.
Once we merge https://github.com/numpy/numpy/pull/12099 I have another PR ready that will add dispatch decorators for most of numpy.core
. It will be pretty easy to finish things up -- this one took less than an hour to put together.
cc @eric-wieser @mrocklin @mhvk @hameerabbasi
See https://github.com/shoyer/numpy/tree/array-function-easy-impl for my branch implementing all the "easy" overrides on functions with Python wrappers. The leftover parts are np.block, np.einsum and a handful of multiarray functions written entirely in C (e.g., np.concatenate). I'll split this into a bunch of PRs once we're done with #12099.
Note that I haven't written tests for overrides on each individual function. I'd like to add a few integration tests when we're done (e.g., a duck array that logs all applied operations), but I don't think it would be productive to write dispatching tests for each individual function. The checks in #12099 should catch the most common errors on dispatchers, and every line of code in dispatcher functions should get executed by existing tests.
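For concreteness, the "duck array that logs all applied operations" integration test could look roughly like the sketch below. This is a hypothetical LoggingArray class, not the test that actually landed in NumPy, and it assumes `__array_function__` dispatch is enabled (as it is by default in NumPy >= 1.17):

```python
import numpy as np

class LoggingArray:
    """Hypothetical duck array that records every NumPy function applied to it."""

    def __init__(self, value, log=None):
        self.value = np.asarray(value)
        self.log = [] if log is None else log

    def __array_function__(self, func, types, args, kwargs):
        self.log.append(func.__name__)

        # Unwrap LoggingArray arguments (including inside lists/tuples)
        # and defer to the plain-ndarray implementation.
        def unwrap(obj):
            if isinstance(obj, LoggingArray):
                return obj.value
            if isinstance(obj, (list, tuple)):
                return type(obj)(unwrap(o) for o in obj)
            return obj

        result = func(*unwrap(args), **kwargs)
        return LoggingArray(result, self.log)

x = LoggingArray([1.0, 2.0, 3.0])
np.mean(x)
np.concatenate([x, x])
print(x.log)  # ['mean', 'concatenate']
```

A test along these lines exercises the dispatcher of every function it calls without needing per-function dispatch tests.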
@shoyer - on the tests, I agree that it is not particularly useful to write tests for each one; instead, within numpy, it may make most sense to start using the overrides relatively quickly in MaskedArray.
@mhvk sounds good to me, though I'll let someone else who uses/knows MaskedArray take the lead on that.
See https://github.com/numpy/numpy/pull/12115, https://github.com/numpy/numpy/pull/12116, #12119 and https://github.com/numpy/numpy/pull/12117 for PRs implementing __array_function__ support on functions defined in Python.
@shoyer - seeing some of the implementations, I have two worries:
- For reshape, the original functionality already provided a way to override it, by defining a reshape method. We are effectively deprecating that for any class that defines __array_function__.
- For np.median, careful use of np.asanyarray and ufuncs ensured that subclasses could already use them. But that functionality can no longer be accessed directly.

I think overall these two things are probably benefits, since we simplify the interface and can make the implementations optimized for pure ndarray (though the latter suggests that ndarray.__array_function__ should take over converting lists, etc., to ndarray, so that the implementations can skip that part). Still, I thought I'd note it since it makes me dread implementing this for Quantity a bit more than I thought -- in terms of both the amount of work and the possible hit in performance.
though the latter suggests that ndarray.__array_function__ should take over converting lists, etc., to ndarray, so that the implementations can skip that part).
I'm not sure I follow here.
We are indeed effectively deprecating the old way of overriding functions like reshape and mean, though the old way does still support incomplete implementations of NumPy's API.
I'm not sure I follow here.
I think the issue is that if we implement __array_function__ for even a single function, the previous mechanisms break completely and there is no way to fail over. Which is why I propose we revisit my NotImplementedButCoercible proposal.
@hameerabbasi - yes, that is the problem. Though we need to be careful here how easy we make it to rely on duct-tape solutions that we would really rather get rid of... (which is why I wrote above that my "problems" may actually be benefits...). Maybe there is a case for trying as is in 1.16 and then deciding on actual experience whether we want to provide a fall-back of "ignore my __array_function__ for this case".
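To make the failure mode concrete, here is a minimal sketch (hypothetical Partial class): once __array_function__ is defined, any function it does not handle raises TypeError instead of falling back to coercion via __array__, which is the behavior NotImplementedButCoercible would restore:

```python
import numpy as np

class Partial:
    """Hypothetical array-like that only knows how to handle np.sum."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array__(self, dtype=None, copy=None):
        # Coercion path: this is what np.mean(Partial(...)) used to hit
        # before __array_function__ existed.
        return np.asarray(self.value, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            return np.sum(self.value)
        return NotImplemented  # no fail-over to coercion happens

p = Partial([1, 2, 3])
print(np.sum(p))  # handled explicitly: 6
try:
    np.mean(p)    # would have worked via __array__ coercion before
except TypeError:
    print("np.mean raised instead of coercing")
```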
Re: dispatcher styling: My preferences on style are based on memory/import time considerations, and verbosity. Quite simply, merge the dispatchers where the signature is likely to remain the same. This way, we create the least amount of objects and the cache hits will be higher too.
That said, I'm not too opposed to the lambda style.
The style for writing dispatcher functions has now come up in a few PRs. It would be good to make a consistent choice across NumPy.
We have a few options:
Option 1: Write a separate dispatcher for each function, e.g.,
def _sin_dispatcher(a):
    return (a,)

@array_function_dispatch(_sin_dispatcher)
def sin(a):
    ...

def _cos_dispatcher(a):
    return (a,)

@array_function_dispatch(_cos_dispatcher)
def cos(a):
    ...

Advantages: error messages name a dispatcher specific to the function, e.g., sin(x=1) -> TypeError: _sin_dispatcher() got an unexpected keyword argument 'x'.
Disadvantages:
Option 2: Reuse dispatcher functions within a module, e.g.,
def _unary_dispatcher(a):
    return (a,)

@array_function_dispatch(_unary_dispatcher)
def sin(a):
    ...

@array_function_dispatch(_unary_dispatcher)
def cos(a):
    ...

Advantages:
Disadvantages: error messages name the shared dispatcher rather than the function, e.g., sin(x=1) -> TypeError: _unary_dispatcher() got an unexpected keyword argument 'x'
Option 3: Use lambda
functions when the dispatcher definition would fit on one line, e.g.,
# inline style (shorter)
@array_function_dispatch(lambda a: (a,))
def sin(a):
    ...

@array_function_dispatch(lambda a, n=None, axis=None, norm=None: (a,))
def fft(a, n=None, axis=-1, norm=None):
    ...

# multiline style (more readable?)
@array_function_dispatch(
    lambda a: (a,)
)
def sin(a):
    ...

@array_function_dispatch(
    lambda a, n=None, axis=None, norm=None: (a,)
)
def fft(a, n=None, axis=-1, norm=None):
    ...

Advantages:
Disadvantages: error messages do not name the function at all (e.g., TypeError: <lambda>() got an unexpected keyword argument 'x')

@shoyer: edited to add the two-line PEP8 spacing to make the "lines of code" aspect more realistic
Note that the error message issues can be fixed by reconstructing the code object, although that will come with some import-time cost. Perhaps worth investigating, and breaking out @nschloe's tuna to compare some options.
Yep, the decorator module could also be used for generating function definitions (it uses a slightly different approach for code generation, a little more like namedtuple in that it uses exec()).
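As a sketch of the code-object idea, a lambda dispatcher could be renamed after the fact. This is CPython-specific, assumes Python 3.8+ for CodeType.replace, and the helper name is made up:

```python
import types

def rename_function(func, name):
    """Return a copy of `func` whose code object reports `name`,
    so TypeErrors mention it instead of '<lambda>'."""
    changes = {'co_name': name}
    if hasattr(func.__code__, 'co_qualname'):  # Python 3.11+
        changes['co_qualname'] = name
    code = func.__code__.replace(**changes)
    renamed = types.FunctionType(code, func.__globals__, name,
                                 func.__defaults__, func.__closure__)
    renamed.__kwdefaults__ = func.__kwdefaults__
    return renamed

dispatcher = rename_function(lambda a: (a,), '_sin_dispatcher')
try:
    dispatcher(x=1)
except TypeError as exc:
    print(exc)  # mentions _sin_dispatcher rather than <lambda>
```

The rebuild happens once at import time, so the per-call overhead of the exception-rewriting alternative is avoided.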
As long as the error is not solved, I think we need to stick to the options with a dispatcher which has a clear name. I'd slightly prefer to bundle dispatchers together (option 2) for memory reasons, though would then keep the error message very much in mind, so would suggest calling the dispatcher something like _dispatch_on_x.
Though if we can change the error, things change. E.g., it might be as simple as catching exceptions, replacing <lambda> with the function name in the exception text, and then re-raising. (Or do exceptions chain these days?)
I agree that the error message needs to be clear, ideally shouldn't change at all.
OK, for now I think it's best to hold off on using lambda, unless we get some sort of code generation working.
https://github.com/numpy/numpy/pull/12175 adds a draft of what overrides for multiarray functions (written in C) could look like if we take the Python wrapper approach.
@mattip where are we at on implementing matmul as a ufunc? Once we finish up all these __array_function__ overrides, I think that's the last thing we need for making NumPy's public API fully overloadable. It would be nice to have it all ready for NumPy 1.16!
PR #11175, which implements NEP 20, has been making slow progress. It is a blocker for PR #11133, which has the matmul loop code. That one still needs to be updated and then verified via benchmarks that the new code is not slower than the old.
I have four PRs up for review which should complete the full set of overrides. Final reviews/signoffs/merges would be appreciated so we can start testing __array_function__ in earnest! https://github.com/numpy/numpy/pull/12154, https://github.com/numpy/numpy/pull/12163, https://github.com/numpy/numpy/pull/12119, https://github.com/numpy/numpy/pull/12175
Adding overrides to np.core caused a few pandas tests to fail (https://github.com/pandas-dev/pandas/issues/23172). We're not quite sure what's going on yet but we should definitely figure it out before releasing.
See https://github.com/numpy/numpy/issues/12225 for my best guess at why this is causing test failures in dask/pandas.
Some benchmarks of import times (on my MacBook Pro with a solid state drive):
- decorator.decorate (#12226): 183.694 ms

My benchmark script:
import numpy as np
import subprocess

times = []
for _ in range(100):
    result = subprocess.run("python -X importtime -c 'import numpy'",
                            shell=True, capture_output=True)
    last_line = result.stderr.rstrip().split(b'\n')[-1]
    time = float(last_line.decode('ascii')[-15:-7].strip().rstrip())
    times.append(time)
print(np.median(times) / 1e3)
Any idea of the memory usage (before/after)? That's kind of useful as well, especially for IoT applications.
Do you know how to reliably measure memory usage for a module?
I think writing a script containing import numpy as np, adding a sleep statement and tracking process memory should be good enough. https://superuser.com/questions/581108/how-can-i-track-and-log-cpu-and-memory-usage-on-a-mac
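A rough sketch of that idea (the helper name is made up; the resource module is Unix-only, and ru_maxrss units differ by platform -- kilobytes on Linux, bytes on macOS):

```python
import subprocess
import sys

def import_memory_delta(module):
    """Peak-RSS increase from importing `module` in a fresh interpreter."""
    code = (
        "import resource\n"
        "before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss\n"
        f"import {module}\n"
        "after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss\n"
        "print(after - before)\n"
    )
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, check=True)
    return int(out.stdout)

print(import_memory_delta("json"))  # small; try "numpy" for the real number
```

Comparing the number for "numpy" before and after the overrides PR would give the before/after figure asked about above.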
Any other core devs want to take a quick look (really, it only includes two functions!) at https://github.com/numpy/numpy/pull/12163? It's the last PR adding array_function_dispatch to internal numpy functions.
For reference, here's the performance difference I see when disabling __array_function__:
before after ratio
[45718fd7] [4e5aa2cd]
<master> <disable-array-function>
+ 72.5±2ms 132±20ms 1.82 bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
- 44.9±2μs 40.8±0.6μs 0.91 bench_ma.Concatenate.time_it('ndarray', 2)
- 15.3±0.3μs 13.3±0.7μs 0.87 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'object'>)
- 38.4±1μs 32.7±2μs 0.85 bench_linalg.Linalg.time_op('norm', 'longfloat')
- 68.7±3μs 56.5±3μs 0.82 bench_linalg.Linalg.time_op('norm', 'complex256')
- 80.6±4μs 65.9±1μs 0.82 bench_function_base.Median.time_even
- 82.4±2μs 66.8±3μs 0.81 bench_shape_base.Block.time_no_lists(100)
- 73.5±3μs 59.3±3μs 0.81 bench_function_base.Median.time_even_inplace
- 15.2±0.3μs 12.2±0.6μs 0.80 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'str'>)
- 2.20±0.1ms 1.76±0.04ms 0.80 bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint64', (4, 4))
- 388±20μs 310±10μs 0.80 bench_lib.Pad.time_pad((10, 10, 10), 3, 'linear_ramp')
- 659±20μs 524±20μs 0.80 bench_linalg.Linalg.time_op('det', 'float32')
- 22.9±0.7μs 18.2±0.8μs 0.79 bench_function_base.Where.time_1
- 980±50μs 775±20μs 0.79 bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint32', (4, 4))
- 36.6±1μs 29.0±1μs 0.79 bench_ma.Concatenate.time_it('unmasked', 2)
- 16.4±0.7μs 12.9±0.6μs 0.79 bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'str'>)
- 16.4±0.5μs 12.9±0.4μs 0.79 bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'object'>)
- 141±5μs 110±4μs 0.78 bench_lib.Pad.time_pad((10, 100), (0, 5), 'linear_ramp')
- 18.0±0.6μs 14.1±0.6μs 0.78 bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'object'>)
- 11.9±0.6μs 9.28±0.5μs 0.78 bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <type 'int'>)
- 54.6±3μs 42.4±2μs 0.78 bench_function_base.Median.time_odd_small
- 317±10μs 246±7μs 0.78 bench_lib.Pad.time_pad((10, 10, 10), 1, 'linear_ramp')
- 13.8±0.5μs 10.7±0.7μs 0.77 bench_reduce.MinMax.time_min(<type 'numpy.float64'>)
- 73.3±6μs 56.6±4μs 0.77 bench_lib.Pad.time_pad((1000,), (0, 5), 'mean')
- 14.7±0.7μs 11.4±0.3μs 0.77 bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'str'>)
- 21.5±2μs 16.5±0.6μs 0.77 bench_reduce.MinMax.time_min(<type 'numpy.int64'>)
- 117±4μs 89.2±3μs 0.76 bench_lib.Pad.time_pad((1000,), 3, 'linear_ramp')
- 43.7±1μs 33.4±1μs 0.76 bench_linalg.Linalg.time_op('norm', 'complex128')
- 12.6±0.6μs 9.55±0.2μs 0.76 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'int'>)
- 636±20μs 482±20μs 0.76 bench_ma.MA.time_masked_array_l100
- 86.6±4μs 65.6±4μs 0.76 bench_lib.Pad.time_pad((1000,), (0, 5), 'linear_ramp')
- 120±4μs 90.4±2μs 0.75 bench_lib.Pad.time_pad((1000,), 1, 'linear_ramp')
- 160±5μs 119±8μs 0.74 bench_ma.Concatenate.time_it('ndarray+masked', 100)
- 14.4±0.6μs 10.7±0.3μs 0.74 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'str'>)
- 15.7±0.4μs 11.7±0.6μs 0.74 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'str'>)
- 21.8±2μs 16.1±0.7μs 0.74 bench_reduce.MinMax.time_max(<type 'numpy.int64'>)
- 11.9±0.6μs 8.79±0.3μs 0.74 bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'bool'>)
- 53.8±3μs 39.4±2μs 0.73 bench_function_base.Median.time_even_small
- 106±20μs 76.7±4μs 0.73 bench_function_base.Select.time_select
- 168±10μs 122±4μs 0.72 bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (2, 2))
- 12.5±0.5μs 8.96±0.4μs 0.72 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'int'>)
- 162±10μs 115±5μs 0.71 bench_function_base.Percentile.time_percentile
- 12.9±1μs 9.12±0.4μs 0.71 bench_random.Random.time_rng('normal')
- 9.71±0.4μs 6.88±0.3μs 0.71 bench_core.CorrConv.time_convolve(1000, 10, 'full')
- 15.1±0.8μs 10.7±0.4μs 0.71 bench_reduce.MinMax.time_max(<type 'numpy.float64'>)
- 153±9μs 108±7μs 0.71 bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (2, 2))
- 109±5μs 76.9±5μs 0.71 bench_ma.Concatenate.time_it('ndarray+masked', 2)
- 34.3±1μs 24.2±0.6μs 0.71 bench_linalg.Linalg.time_op('norm', 'complex64')
- 9.80±0.2μs 6.84±0.5μs 0.70 bench_core.CorrConv.time_convolve(1000, 10, 'same')
- 27.4±6μs 19.1±2μs 0.70 bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <type 'bool'>)
- 9.35±0.4μs 6.50±0.3μs 0.70 bench_core.CorrConv.time_convolve(50, 100, 'full')
- 65.2±4μs 45.2±1μs 0.69 bench_shape_base.Block.time_block_simple_row_wise(100)
- 12.9±1μs 8.89±0.3μs 0.69 bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'bool'>)
- 19.6±3μs 13.5±0.4μs 0.69 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'object'>)
- 75.6±2μs 52.1±3μs 0.69 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'reflect')
- 12.4±1μs 8.51±0.4μs 0.69 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'bool'>)
- 172±30μs 117±4μs 0.68 bench_ma.Concatenate.time_it('unmasked+masked', 100)
- 23.1±0.5μs 15.8±0.9μs 0.68 bench_linalg.Linalg.time_op('norm', 'int16')
- 8.18±0.9μs 5.57±0.1μs 0.68 bench_core.CorrConv.time_correlate(1000, 10, 'full')
- 153±5μs 103±3μs 0.68 bench_function_base.Percentile.time_quartile
- 758±100μs 512±20μs 0.68 bench_linalg.Linalg.time_op('det', 'int16')
- 55.4±6μs 37.4±1μs 0.68 bench_ma.Concatenate.time_it('masked', 2)
- 234±30μs 157±5μs 0.67 bench_shape_base.Block.time_nested(100)
- 103±4μs 69.3±3μs 0.67 bench_linalg.Eindot.time_dot_d_dot_b_c
- 19.2±0.4μs 12.9±0.6μs 0.67 bench_core.Core.time_tril_l10x10
- 122±7μs 81.7±4μs 0.67 bench_lib.Pad.time_pad((10, 10, 10), 3, 'edge')
- 22.9±1μs 15.3±0.5μs 0.67 bench_linalg.Linalg.time_op('norm', 'int32')
- 16.6±2μs 11.0±0.3μs 0.66 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'object'>)
- 9.98±0.3μs 6.58±0.1μs 0.66 bench_core.CorrConv.time_convolve(1000, 10, 'valid')
- 118±6μs 77.9±4μs 0.66 bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (2, 2))
- 212±50μs 140±8μs 0.66 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'mean')
- 21.9±0.7μs 14.4±0.5μs 0.66 bench_linalg.Linalg.time_op('norm', 'int64')
- 131±5μs 85.9±5μs 0.65 bench_lib.Pad.time_pad((10, 10, 10), 3, 'constant')
- 56.8±2μs 37.0±3μs 0.65 bench_lib.Pad.time_pad((1000,), (0, 5), 'constant')
- 58.9±3μs 38.1±1μs 0.65 bench_lib.Pad.time_pad((10, 100), (0, 5), 'reflect')
- 72.1±2μs 46.5±3μs 0.64 bench_lib.Pad.time_pad((10, 100), (0, 5), 'constant')
- 8.66±0.3μs 5.58±0.2μs 0.64 bench_core.CorrConv.time_correlate(50, 100, 'full')
- 300±30μs 193±10μs 0.64 bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (4, 4))
- 15.9±5μs 10.2±0.3μs 0.64 bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'int'>)
- 13.7±0.5μs 8.80±0.1μs 0.64 bench_random.Random.time_rng('uniform')
- 8.60±0.5μs 5.50±0.2μs 0.64 bench_core.CorrConv.time_correlate(1000, 10, 'same')
- 44.7±2μs 28.5±0.7μs 0.64 bench_lib.Pad.time_pad((1000,), 1, 'reflect')
- 72.7±3μs 46.2±2μs 0.64 bench_lib.Pad.time_pad((10, 10, 10), 3, 'wrap')
- 567±50μs 360±40μs 0.63 bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (2, 2))
- 58.0±3μs 36.7±2μs 0.63 bench_lib.Pad.time_pad((10, 100), 3, 'reflect')
- 219±30μs 138±7μs 0.63 bench_lib.Pad.time_pad((10, 100), 1, 'mean')
- 261±60μs 164±10μs 0.63 bench_lib.Pad.time_pad((10, 100), 1, 'linear_ramp')
- 825±100μs 519±30μs 0.63 bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (4, 4))
- 121±5μs 75.7±2μs 0.63 bench_lib.Pad.time_pad((10, 10, 10), 1, 'constant')
- 8.16±0.2μs 5.08±0.4μs 0.62 bench_core.CorrConv.time_convolve(50, 100, 'same')
- 66.6±3μs 41.3±2μs 0.62 bench_lib.Pad.time_pad((1000,), 3, 'constant')
- 53.1±3μs 32.9±0.8μs 0.62 bench_lib.Pad.time_pad((10, 100), 3, 'wrap')
- 285±60μs 177±10μs 0.62 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'linear_ramp')
- 8.30±0.9μs 5.14±0.1μs 0.62 bench_core.CorrConv.time_correlate(1000, 10, 'valid')
- 115±3μs 71.2±3μs 0.62 bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (2, 2))
- 19.1±0.5μs 11.8±0.6μs 0.62 bench_linalg.Linalg.time_op('norm', 'float64')
- 95.3±5μs 58.6±2μs 0.62 bench_lib.Pad.time_pad((10, 100), 1, 'constant')
- 44.6±1μs 27.2±0.9μs 0.61 bench_lib.Pad.time_pad((1000,), (0, 5), 'edge')
- 447±20μs 270±10μs 0.61 bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint16', (4, 4))
- 53.9±2μs 32.6±2μs 0.60 bench_lib.Pad.time_pad((10, 100), 1, 'wrap')
- 11.6±1μs 6.97±0.4μs 0.60 bench_reduce.MinMax.time_max(<type 'numpy.float32'>)
- 95.9±5μs 57.7±2μs 0.60 bench_lib.Pad.time_pad((10, 100), 3, 'constant')
- 47.2±2μs 28.2±2μs 0.60 bench_lib.Pad.time_pad((1000,), (0, 5), 'reflect')
- 5.51±0.2μs 3.27±0.07μs 0.59 bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'object'>)
- 74.3±3μs 44.0±2μs 0.59 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'wrap')
- 76.2±3μs 45.0±0.8μs 0.59 bench_lib.Pad.time_pad((10, 10, 10), 1, 'reflect')
- 57.1±1μs 33.5±2μs 0.59 bench_lib.Pad.time_pad((10, 100), (0, 5), 'wrap')
- 52.0±2μs 30.4±1μs 0.58 bench_lib.Pad.time_pad((1000,), 1, 'edge')
- 42.6±2μs 24.9±0.9μs 0.58 bench_lib.Pad.time_pad((1000,), 3, 'wrap')
- 15.0±3μs 8.73±0.3μs 0.58 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'bool'>)
- 16.0±3μs 9.29±0.3μs 0.58 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'int'>)
- 53.1±1μs 30.9±2μs 0.58 bench_lib.Pad.time_pad((1000,), 3, 'edge')
- 88.0±8μs 51.1±3μs 0.58 bench_lib.Pad.time_pad((10, 10, 10), 3, 'reflect')
- 44.6±2μs 25.9±1μs 0.58 bench_lib.Pad.time_pad((1000,), (0, 5), 'wrap')
- 90.3±5μs 51.9±1μs 0.57 bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (2, 2))
- 15.6±0.5μs 8.93±0.3μs 0.57 bench_linalg.Linalg.time_op('norm', 'float32')
- 102±6μs 58.3±0.9μs 0.57 bench_lib.Pad.time_pad((10, 10, 10), 1, 'edge')
- 80.1±4μs 45.6±3μs 0.57 bench_lib.Pad.time_pad((10, 100), 3, 'edge')
- 44.2±2μs 24.9±1μs 0.56 bench_lib.Pad.time_pad((1000,), 1, 'wrap')
- 71.6±8μs 39.5±1μs 0.55 bench_lib.Pad.time_pad((10, 10, 10), 1, 'wrap')
- 81.7±10μs 44.8±2μs 0.55 bench_lib.Pad.time_pad((10, 100), 1, 'edge')
- 420±90μs 230±10μs 0.55 bench_shape_base.Block.time_3d(10, 'block')
- 114±20μs 62.3±2μs 0.55 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'constant')
- 5.76±0.1μs 3.13±0.08μs 0.54 bench_core.CorrConv.time_convolve(50, 10, 'same')
- 5.30±0.1μs 2.84±0.08μs 0.54 bench_core.CorrConv.time_correlate(50, 100, 'valid')
- 92.5±4μs 49.3±1μs 0.53 bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (2, 2))
- 13.5±3μs 7.07±0.2μs 0.52 bench_reduce.MinMax.time_min(<type 'numpy.float32'>)
- 7.66±1μs 3.88±0.2μs 0.51 bench_core.CorrConv.time_convolve(50, 100, 'valid')
- 29.0±3μs 14.5±0.8μs 0.50 bench_shape_base.Block.time_no_lists(10)
- 6.62±0.3μs 3.30±0.2μs 0.50 bench_core.CorrConv.time_convolve(1000, 1000, 'valid')
- 74.2±7μs 36.2±0.9μs 0.49 bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (2, 2))
- 5.55±0.3μs 2.70±0.2μs 0.49 bench_core.CorrConv.time_convolve(50, 10, 'valid')
- 73.9±20μs 35.8±2μs 0.48 bench_lib.Pad.time_pad((10, 100), 1, 'reflect')
- 224±20μs 107±7μs 0.48 bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (4, 4))
- 3.87±0.1μs 1.83±0.06μs 0.47 bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'str'>)
- 109±30μs 51.5±3μs 0.47 bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'edge')
- 240±20μs 112±4μs 0.47 bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (4, 4))
- 337±40μs 158±7μs 0.47 bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (4, 4))
- 188±8μs 88.0±2μs 0.47 bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (4, 4))
- 4.39±0.2μs 2.04±0.09μs 0.47 bench_core.CountNonzero.time_count_nonzero(3, 10000, <type 'bool'>)
- 73.2±4μs 33.9±0.5μs 0.46 bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (2, 2))
- 5.48±1μs 2.44±0.1μs 0.45 bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'object'>)
- 4.46±0.1μs 1.97±0.08μs 0.44 bench_core.CorrConv.time_correlate(50, 10, 'full')
- 30.4±9μs 13.3±0.3μs 0.44 bench_shape_base.Block.time_no_lists(1)
- 7.05±0.2μs 3.05±0.06μs 0.43 bench_reduce.SmallReduction.time_small
- 7.35±1μs 3.12±0.2μs 0.42 bench_core.CorrConv.time_convolve(50, 10, 'full')
- 4.36±0.1μs 1.84±0.07μs 0.42 bench_core.CorrConv.time_correlate(50, 10, 'same')
- 3.51±0.2μs 1.46±0.05μs 0.42 bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'object'>)
- 4.03±0.05μs 1.66±0.1μs 0.41 bench_core.CorrConv.time_correlate(1000, 1000, 'valid')
- 199±10μs 80.1±3μs 0.40 bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (4, 4))
- 3.98±0.2μs 1.60±0.08μs 0.40 bench_core.CountNonzero.time_count_nonzero(2, 10000, <type 'bool'>)
- 61.8±2μs 24.8±1μs 0.40 bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (2, 2))
- 4.13±0.1μs 1.62±0.05μs 0.39 bench_core.CorrConv.time_correlate(50, 10, 'valid')
- 61.6±2μs 23.9±1μs 0.39 bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (2, 2))
- 184±10μs 70.5±3μs 0.38 bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (4, 4))
- 56.1±4μs 21.0±0.9μs 0.38 bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (2, 2))
- 40.0±2μs 15.0±0.6μs 0.37 bench_shape_base.Block.time_block_simple_column_wise(10)
- 121±2μs 45.1±2μs 0.37 bench_shape_base.Block.time_nested(1)
- 179±4μs 66.1±4μs 0.37 bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (4, 4))
- 59.8±2μs 22.0±1μs 0.37 bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (2, 2))
- 3.19±0.05μs 1.17±0.02μs 0.37 bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'str'>)
- 54.0±3μs 19.7±1μs 0.37 bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (2, 2))
- 56.9±1μs 20.7±0.7μs 0.36 bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (2, 2))
- 3.14±0.1μs 1.14±0.04μs 0.36 bench_core.CountNonzero.time_count_nonzero(1, 10000, <type 'bool'>)
- 92.7±2μs 33.7±2μs 0.36 bench_shape_base.Block.time_block_complicated(1)
- 104±4μs 37.8±1μs 0.36 bench_shape_base.Block.time_block_complicated(10)
- 128±5μs 45.5±2μs 0.36 bench_shape_base.Block.time_nested(10)
- 196±100μs 69.4±3μs 0.35 bench_ma.Concatenate.time_it('unmasked+masked', 2)
- 153±5μs 53.9±2μs 0.35 bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (4, 4))
- 39.4±2μs 13.8±0.5μs 0.35 bench_shape_base.Block.time_block_simple_column_wise(1)
- 53.5±2μs 18.7±1μs 0.35 bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (2, 2))
- 55.2±2μs 19.3±0.6μs 0.35 bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (2, 2))
- 16.9±1μs 5.89±0.5μs 0.35 bench_core.Core.time_dstack_l
- 60.6±3μs 21.1±0.6μs 0.35 bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (2, 2))
- 25.5±0.2μs 8.88±0.3μs 0.35 bench_shape_base.Block.time_block_simple_row_wise(10)
- 54.6±3μs 19.0±0.6μs 0.35 bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (2, 2))
- 52.6±2μs 18.2±0.7μs 0.35 bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (2, 2))
- 6.57±2μs 2.25±0.08μs 0.34 bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'str'>)
- 24.3±1μs 8.30±0.6μs 0.34 bench_shape_base.Block.time_block_simple_row_wise(1)
- 148±3μs 50.0±3μs 0.34 bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (4, 4))
- 171±8μs 57.9±4μs 0.34 bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (4, 4))
- 159±5μs 53.8±1μs 0.34 bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (4, 4))
- 171±20μs 57.7±2μs 0.34 bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (4, 4))
- 3.15±0.3μs 1.06±0.03μs 0.34 bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'int'>)
- 55.7±5μs 18.7±0.2μs 0.34 bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (2, 2))
- 158±7μs 52.6±3μs 0.33 bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (4, 4))
- 153±4μs 50.7±1μs 0.33 bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (4, 4))
- 152±7μs 50.3±1μs 0.33 bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (4, 4))
- 53.6±3μs 17.7±0.4μs 0.33 bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (2, 2))
- 156±4μs 51.4±3μs 0.33 bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (4, 4))
- 148±3μs 48.2±2μs 0.33 bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (4, 4))
- 160±10μs 52.0±1μs 0.33 bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (4, 4))
- 159±8μs 51.4±3μs 0.32 bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (4, 4))
- 59.8±3μs 19.3±1μs 0.32 bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (2, 2))
- 153±4μs 49.4±2μs 0.32 bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (4, 4))
- 15.6±0.6μs 5.03±0.3μs 0.32 bench_core.Core.time_vstack_l
- 154±7μs 49.7±2μs 0.32 bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (4, 4))
- 59.6±6μs 19.1±0.8μs 0.32 bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (2, 2))
- 3.03±0.4μs 969±30ns 0.32 bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'int'>)
- 120±10μs 38.4±2μs 0.32 bench_shape_base.Block.time_3d(1, 'block')
- 156±5μs 49.3±1μs 0.32 bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (4, 4))
- 164±10μs 49.3±2μs 0.30 bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (4, 4))
- 65.7±10μs 19.6±0.7μs 0.30 bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (2, 2))
- 2.82±0.08μs 732±30ns 0.26 bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'int'>)
- 2.77±0.07μs 664±30ns 0.24 bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'bool'>)
- 2.61±0.1μs 624±20ns 0.24 bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'bool'>)
- 16.8±3μs 3.97±0.2μs 0.24 bench_core.Core.time_hstack_l
- 2.78±0.1μs 637±20ns 0.23 bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'bool'>)
- 2.36±0.2μs 207±5ns 0.09 bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy
- 2.68±0.1μs 221±7ns 0.08 bench_overrides.ArrayFunction.time_mock_concatenate_numpy
- 2.58±0.1μs 201±10ns 0.08 bench_overrides.ArrayFunction.time_mock_broadcast_to_duck
- 3.02±0.2μs 222±6ns 0.07 bench_overrides.ArrayFunction.time_mock_concatenate_duck
- 4.29±0.3μs 216±6ns 0.05 bench_overrides.ArrayFunction.time_mock_concatenate_mixed
- 142±20μs 213±8ns 0.00 bench_overrides.ArrayFunction.time_mock_concatenate_many
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
see also https://docs.google.com/spreadsheets/d/15-AFI_cmZqfkU6mo2p1znsQF2E52PEXpF68QqYqEar4/edit#gid=0 for a spreadsheet.
Not surprisingly, the largest performance difference is for functions that call other numpy functions internally many times, e.g., for np.block().
@shoyer - I was slightly disconcerted by the extra time taken... Probably, we really should have a C implementation, but in the meantime I made a PR with some small changes that shave off some time for the common case of only a single type, and for the case where the only type is ndarray. See #12321.
@shoyer - I raised two issues on the mailing list that are probably good to state here too:
1. Should types contain the types of all arguments, rather than just those of arguments that provide an override? It would seem helpful to know for implementations. (see #12327)
2. Should the ndarray.__array_function__ implementation accept subclasses even if they override __array_function__? This would be reasonable given the Liskov substitution principle and given that the subclass already had a chance to bail. It would imply calling the implementation rather than the public function inside ndarray.__array_function__. (And something similar in __array_ufunc__...) See #12328 for a trial for __array_function__ only.

@shoyer - see #12327 for a quick implementation of (1) - if we do go this route, I think we should also adjust the NEP.
And #12328 for a trial of (2), mostly to see how it looks.
I’m +1 on both modifications here.
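For illustration, here is the usual pattern an implementation applies to types today (hypothetical MyArray class). Under the current protocol, types contains only the types of arguments that provide an override, which is exactly what question (1) above would change:

```python
import numpy as np

class MyArray:
    """Hypothetical duck array showing the standard `types` check."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_function__(self, func, types, args, kwargs):
        # Bail out unless every type in `types` is one we know about;
        # coercible arguments like plain lists never appear here today.
        if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
            return NotImplemented
        unwrapped = tuple(a.value if isinstance(a, MyArray) else a
                          for a in args)
        return func(*unwrapped, **kwargs)

a = MyArray([1, 2, 3])
print(np.sum(a))  # dispatched: types == {MyArray}
```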
The name of dispatcher functions in error messages came up again in https://github.com/numpy/numpy/pull/12789, where someone was surprised to see TypeError: _pad_dispatcher missing 1 required positional argument
In addition to the alternatives outlined above (https://github.com/numpy/numpy/issues/12028#issuecomment-429377396; we currently use option 2), I'll add a fourth option:
Option 4: Write a separate dispatcher for each function, with the same name as the function:
def sin(a):
    return (a,)

@array_function_dispatch(sin)
def sin(a):
    ...

def cos(a):
    return (a,)

@array_function_dispatch(cos)
def cos(a):
    ...

Advantages:
Disadvantages: error messages could misleadingly suggest that e.g. pad itself received the wrong arguments (but we do have tests to verify that the dispatcher and function signatures are kept in sync).

I think in order to keep current code working, the actual function must come _after_ the dispatcher.
Right, but we can give it the same name as the dispatcher. The dispatcher name will get overwritten.
It would be great to be able to define custom dispatching for functions like np.arange or np.empty.
I guess one option would be for NumPy to dispatch on scalars as well as arrays. Is this incompatible with the NEP? Would anything break with this change?
For discussion about np.arange, see https://github.com/numpy/numpy/issues/12379.
I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype. But certainly np.empty_like() could do dispatching with an overridden shape -- that's exactly what https://github.com/numpy/numpy/pull/13046 is about supporting.
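The difference in a nutshell, sketched below: np.empty has no array-like argument to dispatch on, while np.empty_like with the shape argument (NumPy >= 1.17) carries a dispatchable template whose dimensions can be overridden:

```python
import numpy as np

template = np.empty((2, 2), dtype=np.float32)

# Not dispatchable: only a shape and a dtype, no array-like argument.
plain = np.empty((3, 4), dtype=np.float32)

# Dispatchable: `template` goes through __array_function__, and the
# shape override changes the dimensions of the result while keeping
# the template's type and dtype.
like = np.empty_like(template, shape=(3, 4))
print(like.shape, like.dtype)  # (3, 4) float32
```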
Option 4: Write a separate dispatcher for each function, with the same name as the function:
Any objections to adopting this option? I think it's probably the friendliest choice from a user perspective.
I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype
You might want to dispatch on either of those. For example, here is a custom shape object that we might want to dispatch on differently.
This example isn't very useful, but the idea is that I have a lazy object that behaves like a shape, but doesn't return integers; it returns expressions. For example, it would be nice to be able to do something like this:
```python
class ExprShape:
    def __getitem__(self, i):
        return ('getitem', self, i)

    def __len__(self):
        return ('len', self)

numpy.empty(ExprShape())
```
which I would like to override to return something like `ExprArray('empty', ExprShape())`.
Yes, in principle we could dispatch on shape, too. That would add additional complexity/overhead to the protocol. Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?
The other case I can think of is the size argument to np.random.RandomState methods, but note that we currently don't support those at all -- see http://www.numpy.org/neps/nep-0018-array-function-protocol.html#callable-objects-generated-at-runtime
Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?
If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.
For example, let's say we were attempting to call scipy.optimize.differential_evolution with NumPy-like arrays that build up a call graph instead of executing immediately.
You can see here that it would be helpful if we could change np.full to create a symbolic array instead of a default NumPy array, if the input passed into it was also symbolic.
If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.
This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.
In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.
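A sketch of that change in context (variable names taken from the differential_evolution example above; `shape=` in `full_like` requires NumPy 1.17 or later):

```python
import numpy as np

population = np.zeros((5, 3))
num_members = population.shape[0]

# Before: np.full has no array argument, so there is nothing for
# __array_function__ to dispatch on -- always returns an ndarray.
energies_old = np.full(num_members, np.inf)

# After: np.full_like dispatches on `population`, so a symbolic or
# duck array passed in would yield a matching array type.
energies = np.full_like(population, np.inf, shape=num_members)
```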
This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.
Is there a proposal around making this sort of change, or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?
In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.
Definitely. But there are many cases where either you don't control the source code, or you want to support users in using the functions they know and love as much as possible.
There are other ways to provide users with that experience, like:
Both of those options might be required in certain cases (like letting users call np.full and get a symbolic result back today), but if I understand correctly, the goal of NEP-18 is to try to limit when those are needed and let people use the original NumPy in more cases.
I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.
Is there a proposal around making this sort of change or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?
NEP 22 has some discussion of the options here. I don't think we can safely change the semantics of np.asarray() to return anything other than a numpy.ndarray object -- we will need a new protocol for this.
The problem is that np.asarray() is currently the idiomatic way of casting to a numpy array object, which users can and do expect to exactly match numpy.ndarray, e.g., down to the memory layout.
There are certainly plenty of use-cases where this isn't the case, but switching this behavior would break lots of downstream code, so it's a non-starter. Downstream projects will need to opt-in to at least this aspect of array duck typing.
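The opt-in nature of the protocol is visible in current behavior: array coercion does not consult `__array_function__` at all. A quick illustration (the `Duck` class is invented for this example):

```python
import numpy as np

class Duck:
    # Even though this object opts in to __array_function__ ...
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

# ... np.asarray does not dispatch on it; it coerces to a real
# numpy.ndarray (here, a 0-d object array).
result = np.asarray(Duck())
```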
I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.
Yes. NEP 18 is not intended to be a complete solution for drop-in NumPy alternatives, but it's a step in that direction.
I've drafted a revision to NEP-18 for adding a __numpy_implementation__ attribute: https://github.com/numpy/numpy/pull/13305
It occurs to me that we forgot to wrap the functions in numpy.testing: https://github.com/numpy/numpy/issues/13588
I'm going to do that shortly...
There's one revision that I'd like to see to the NEP, specifically to clarify what guarantees NEP-18 offers to subclass authors: https://github.com/numpy/numpy/pull/13633
I marked the usability tasks complete since gh-13329 was fixed. We decided gh-13588 can wait till after the release of 1.17. That leaves documentation improvements and arange (gh-12379) still open for inclusion in 1.17.
There's also #13728 - a bug in the dispatcher for histogram2d/histogramdd
That leaves documentation improvements and arange gh-12379 still open for inclusion in 1.17.
An issue for documentation was missing, so I opened gh-13844. I think docs are a lot more important than the open arange issue.
@shoyer can we close this?