Numpy: Tracking issue for implementation of NEP-18 (__array_function__)

Created on 25 Sep 2018  ·  54 Comments  ·  Source: numpy/numpy

  • [x] Core functionality for supporting overrides:

    • [x] Initial implementation in pure Python (#12005)

    • [x] Validate dispatcher functions in array_function_dispatch (https://github.com/numpy/numpy/pull/12099)

      • Disable validation when not testing NumPy (if there is a measurable impact on import times) (unnecessary)

    • [x] Add a .__skip_array_function__ function attribute to allow for skipping __array_function__ dispatch. (https://github.com/numpy/numpy/pull/13389)

  • [x] Reimplement parts of numpy/core/overrides.py in C for speed (https://github.com/numpy/numpy/issues/12028):

    • [x] get_overloaded_types_and_args

    • [x] array_function_implementation_or_override

    • [x] ndarray.__array_function__?

    • [x] array_function_dispatch?

  • [x] Support overrides for all public NumPy functions

    • [x] numpy.core

      • [x] the easy part (https://github.com/numpy/numpy/pull/12115)

      • [x] np.core.defchararray (#12154)

      • [x] np.einsum and np.block (https://github.com/numpy/numpy/pull/12163)

    • [x] numpy.lib

      • [x] part 1 (https://github.com/numpy/numpy/pull/12116)

      • [x] part 2 (#12119)

    • [x] numpy.fft/numpy.linalg (https://github.com/numpy/numpy/pull/12117)

    • [x] functions currently written entirely in C: empty_like, concatenate, inner, where, lexsort, can_cast, min_scalar_type, result_type, dot, vdot, is_busday, busday_offset, busday_count, datetime_as_string (https://github.com/numpy/numpy/pull/12175)

    • [x] linspace

    • [ ] [arange?](https://github.com/numpy/numpy/issues/12379)

  • [x] Usability improvements

    • [x] [better error message](https://github.com/numpy/numpy/issues/12213) for unimplemented functions (https://github.com/numpy/numpy/pull/12251)

    • [x] ndarray.__repr__ should not rely on __array_function__ (https://github.com/numpy/numpy/pull/12212)

    • [x] stacklevel should be increased by 1 for wrapped functions, so tracebacks point to the right place (gh-13329)

  • [x] Fix all known bugs / downstream test failures
  • [ ] Documentation

    • [x] Release notes (#12028)

    • [x] Narrative docs

    • [ ] Revised docstrings to clarify overloaded arguments?

All 54 comments

It might be good to merge a preliminary "Decorate all public NumPy functions with @array_function_dispatch" PR for some high-profile functions, and request downstream consumers of the protocol to try it out.

Once we merge https://github.com/numpy/numpy/pull/12099 I have another PR ready that will add dispatch decorators for most of numpy.core. It will be pretty easy to finish things up -- this one took less than an hour to put together.

cc @eric-wieser @mrocklin @mhvk @hameerabbasi

See https://github.com/shoyer/numpy/tree/array-function-easy-impl for my branch implementing all the "easy" overrides on functions with Python wrappers. The leftover parts are np.block, np.einsum and a handful of multiarray functions written entirely in C (e.g., np.concatenate). I'll split this into a bunch of PRs once we're done with #12099.

Note that I haven't written tests for overrides on each individual function. I'd like to add a few integration tests when we're done (e.g., a duck array that logs all applied operations), but I don't think it would be productive to write dispatching tests for each individual function. The checks in #12099 should catch the most common errors on dispatchers, and every line of code in dispatcher functions should get executed by existing tests.
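
A minimal sketch of that integration-test idea, assuming __array_function__ dispatch is enabled (in the released NumPy 1.16 it sits behind the NUMPY_EXPERIMENTAL_ARRAY_FUNCTION environment variable); the class and names here are illustrative, not actual NumPy test code:

import numpy as np


class LoggingArray:
    """A duck array that records every NumPy function dispatched to it."""

    def __init__(self, value):
        self.value = np.asarray(value)
        self.log = []

    def __array_function__(self, func, types, args, kwargs):
        self.log.append(func.__name__)

        # Unwrap LoggingArray arguments and defer to NumPy's behavior.
        def unwrap(obj):
            return obj.value if isinstance(obj, LoggingArray) else obj

        args = tuple(unwrap(a) for a in args)
        kwargs = {k: unwrap(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)


x = LoggingArray([1.0, 2.0, 3.0])
np.mean(x)
np.sum(x)
print(x.log)  # ['mean', 'sum']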

@shoyer - on the tests, I agree that it is not particularly useful to write tests for each one; instead, within numpy, it may make the most sense to start using the overrides relatively quickly in MaskedArray.

@mhvk sounds good to me, though I'll let someone else who uses/knows MaskedArray take the lead on that.

See https://github.com/numpy/numpy/pull/12115, https://github.com/numpy/numpy/pull/12116, #12119 and https://github.com/numpy/numpy/pull/12117 for PRs implementing __array_function__ support on functions defined in Python.

@shoyer - seeing some of the implementations, I have two worries:

  • For some functions, like reshape, the original functionality already provided a way to override it, by defining a reshape method. We are effectively deprecating that for any class that defines __array_function__.
  • For other functions, like np.median, careful use of np.asanyarray and ufuncs ensured that subclasses could already use them. But that functionality can no longer be accessed directly.

I think overall these two things are probably benefits, since we simplify the interface and can make the implementations optimized for pure ndarray (though the latter suggests that ndarray.__array_function__ should take over converting lists, etc., to ndarray, so that the implementations can skip that part). Still, I thought I'd note it, since it makes me dread implementing this for Quantity a bit more than I had thought -- in terms of both the amount of work and the possible hit in performance.

though the latter suggests that ndarray.__array_function__ should take over converting lists, etc., to ndarray, so that the implementations can skip that part.

I'm not sure I follow here.

We are indeed effectively deprecating the old way of overriding functions like reshape and mean, though the old way does still support incomplete implementations of NumPy's API.

I'm not sure I follow here.

I think the issue is that if we implement __array_function__ for even a single function, the previous mechanisms break completely and there is no way to fail-over. Which is why I propose we revisit my NotImplementedButCoercible proposal.

@hameerabbasi - yes, that is the problem. Though we need to be careful about how easy we make it to rely on duct-tape solutions that we would really rather get rid of... (which is why I wrote above that my "problems" may actually be benefits...). Maybe there is a case for trying it as is in 1.16 and then deciding, based on actual experience, whether we want to provide a fall-back of "ignore my __array_function__ for this case".

Re: dispatcher styling: my preferences are based on memory/import-time considerations and on verbosity. Quite simply, merge the dispatchers where the signature is likely to remain the same. This way, we create the fewest objects, and the cache hits will be higher too.

That said, I'm not too opposed to the lambda style.

The style for writing dispatcher functions has now come up in a few PRs. It would be good to make a consistent choice across NumPy.

We have a few options:

Option 1: Write a separate dispatcher for each function, e.g.,

def _sin_dispatcher(a):
    return (a,)


@array_function_dispatch(_sin_dispatcher)
def sin(a):
    ...


def _cos_dispatcher(a):
    return (a,)


@array_function_dispatch(_cos_dispatcher)
def cos(a):
    ...

Advantages:

  • Very readable
  • Easy to find definitions of dispatcher functions
  • Clear error message when you supply the wrong arguments, e.g., sin(x=1) -> TypeError: _sin_dispatcher() got an unexpected keyword argument 'x'.

Disadvantages:

  • Lots of repetition, even when many functions in a module have the exact same signature.

Option 2: Reuse dispatcher functions within a module, e.g.,

def _unary_dispatcher(a):
    return (a,)


@array_function_dispatch(_unary_dispatcher)
def sin(a):
    ...


@array_function_dispatch(_unary_dispatcher)
def cos(a):
    ...

Advantages:

  • Less repetition
  • Readable

Disadvantages:

  • Can be a little harder to find definitions of dispatcher functions
  • Slightly less clear error messages for bad arguments, e.g., sin(x=1) -> TypeError: _unary_dispatcher() got an unexpected keyword argument 'x'

Option 3: Use lambda functions when the dispatcher definition would fit on one line, e.g.,

# inline style (shorter)
@array_function_dispatch(lambda a: (a,))
def sin(a):
    ...


@array_function_dispatch(lambda a, n=None, axis=None, norm=None: (a,))
def fft(a, n=None, axis=-1, norm=None):
    ...


# multiline style (more readable?)
@array_function_dispatch(
    lambda a: (a,)
)
def sin(a):
    ...


@array_function_dispatch(
    lambda a, n=None, axis=None, norm=None: (a,)
)
def fft(a, n=None, axis=-1, norm=None):
    ...

Advantages:

  • No need to hunt for dispatcher definitions, they are right there.
  • Fewer characters and lines of code.
  • Looks very nice for short cases (e.g., one argument), especially when the lambda is shorter than the function name.

Disadvantages:

  • More repeated code than option 2.
  • Looks pretty cluttered if there are more than a few arguments
  • Also has less clear error messages (TypeError: <lambda>() got an unexpected keyword argument 'x')

@shoyer: edited to add the two-line PEP8 spacing to make the "lines of code" aspect more realistic
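
To ground the discussion, here is roughly what the decorator itself does, heavily simplified from numpy/core/overrides.py (a sketch only; the real version also handles subclass ordering and raises TypeError when every override returns NotImplemented):

import functools

import numpy as np


def array_function_dispatch(dispatcher):
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            # The dispatcher mirrors the signature and returns just the
            # arguments that should be checked for overrides.
            relevant_args = dispatcher(*args, **kwargs)
            types = tuple({type(arg) for arg in relevant_args})
            for arg in relevant_args:
                override = getattr(type(arg), '__array_function__', None)
                if override is not None and type(arg) is not np.ndarray:
                    result = override(arg, public_api, types, args, kwargs)
                    if result is not NotImplemented:
                        return result
            return implementation(*args, **kwargs)
        return public_api
    return decorator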

Note that the error message issues can be fixed by reconstructing the code object, although that will come with some import-time cost. Perhaps worth investigating, and breaking out @nschloe's tuna to compare some options.

Yep, the decorator module could also be used for generating the function definition (it uses a slightly different approach for code generation, a little more like namedtuple in that it uses exec()).

As long as the error-message issue is not solved, I think we need to stick to the options with a dispatcher that has a clear name. I'd slightly prefer to bundle dispatchers together (option 2) for memory reasons, though I would then keep the error message very much in mind, so would suggest calling the dispatcher something like _dispatch_on_x.

Though if we can change the error, things change. E.g., it might be as simple as catching exceptions, replacing <lambda> with the function name in the exception text, and then re-raising -- see the sketch below. (Or does re-raising chain exceptions these days?)
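
A rough sketch of that approach (a hypothetical helper, not NumPy code); re-raising with "from None" suppresses exception chaining:

import functools


def renamed_errors(dispatcher, name):
    # Re-raise dispatcher TypeErrors with the public function's name, so
    # users never see '<lambda>' or '_unary_dispatcher' in the message.
    @functools.wraps(dispatcher)
    def wrapped(*args, **kwargs):
        try:
            return dispatcher(*args, **kwargs)
        except TypeError as exc:
            msg = str(exc).replace(dispatcher.__name__, name)
            raise TypeError(msg) from None
    return wrapped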

I agree that the error message needs to be clear; ideally it shouldn't change at all.

OK, for now I think it's best to hold off on using lambda, unless we get some sort of code generation working.

https://github.com/numpy/numpy/pull/12175 adds a draft of what overrides for multiarray functions (written in C) could look like if we take the Python wrapper approach.

@mattip where are we at on implementing matmul as a ufunc? Once we finish up all these __array_function__ overrides, I think that's the last thing we need for making NumPy's public API fully overloadable. It would be nice to have it all ready for NumPy 1.16!

PR #11175, which implements NEP 20, has been making slow progress. It is a blocker for PR #11133, which has the matmul loop code. That one still needs to be updated and then verified via benchmarks that the new code is not slower than the old.

I have four PRs up for review which should complete the full set of overrides. Final reviews/signoffs/merges would be appreciated so we can start testing __array_function__ in earnest! https://github.com/numpy/numpy/pull/12154, https://github.com/numpy/numpy/pull/12163, https://github.com/numpy/numpy/pull/12119, https://github.com/numpy/numpy/pull/12175

Adding overrides to np.core caused a few pandas tests to fail (https://github.com/pandas-dev/pandas/issues/23172). We're not quite sure what's going on yet but we should definitely figure it out before releasing.

See https://github.com/numpy/numpy/issues/12225 for my best guess at why this is causing test failures in dask/pandas.

Some benchmarks of import times (on my MacBook Pro with a solid-state drive):

  • NumPy 1.15.2: 152.451 ms
  • NumPy master: 156.5745 ms
  • Using decorator.decorate (#12226): 183.694 ms

My benchmark script:

import numpy as np
import subprocess

times = []
for _ in range(100):
    # -X importtime writes per-import timings to stderr; the last line is
    # the cumulative time (in microseconds) for the top-level numpy import.
    result = subprocess.run("python -X importtime -c 'import numpy'",
                            shell=True, capture_output=True)
    last_line = result.stderr.rstrip().split(b'\n')[-1]
    time = float(last_line.decode('ascii')[-15:-7].strip())
    times.append(time)

print(np.median(times) / 1e3)  # median, in milliseconds

Any idea of the memory usage (before/after)? That's kind of useful as well, especially for IoT applications.

Do you know how to reliably measure memory usage for a module?

I think writing a script containing import numpy as np, adding a sleep statement and tracking process memory should be good enough. https://superuser.com/questions/581108/how-can-i-track-and-log-cpu-and-memory-usage-on-a-mac
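
Alternatively, a minimal in-process sketch, assuming the third-party psutil package:

import psutil

proc = psutil.Process()
rss_before = proc.memory_info().rss  # resident set size, in bytes

import numpy  # noqa: E402 -- imported here deliberately

rss_after = proc.memory_info().rss
print('importing numpy added %.1f MiB' % ((rss_after - rss_before) / 2**20))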

Any other core devs want to take a quick look (really, it only includes two functions!) at https://github.com/numpy/numpy/pull/12163? It's the last PR adding array_function_dispatch to internal numpy functions.

For reference, here's the performance difference I see when disabling __array_function__:

       before           after         ratio
     [45718fd7]       [4e5aa2cd]
     <master>         <disable-array-function>
+        72.5±2ms         132±20ms     1.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
-        44.9±2μs       40.8±0.6μs     0.91  bench_ma.Concatenate.time_it('ndarray', 2)
-      15.3±0.3μs       13.3±0.7μs     0.87  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'object'>)
-        38.4±1μs         32.7±2μs     0.85  bench_linalg.Linalg.time_op('norm', 'longfloat')
-        68.7±3μs         56.5±3μs     0.82  bench_linalg.Linalg.time_op('norm', 'complex256')
-        80.6±4μs         65.9±1μs     0.82  bench_function_base.Median.time_even
-        82.4±2μs         66.8±3μs     0.81  bench_shape_base.Block.time_no_lists(100)
-        73.5±3μs         59.3±3μs     0.81  bench_function_base.Median.time_even_inplace
-      15.2±0.3μs       12.2±0.6μs     0.80  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'str'>)
-      2.20±0.1ms      1.76±0.04ms     0.80  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint64', (4, 4))
-        388±20μs         310±10μs     0.80  bench_lib.Pad.time_pad((10, 10, 10), 3, 'linear_ramp')
-        659±20μs         524±20μs     0.80  bench_linalg.Linalg.time_op('det', 'float32')
-      22.9±0.7μs       18.2±0.8μs     0.79  bench_function_base.Where.time_1
-        980±50μs         775±20μs     0.79  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint32', (4, 4))
-        36.6±1μs         29.0±1μs     0.79  bench_ma.Concatenate.time_it('unmasked', 2)
-      16.4±0.7μs       12.9±0.6μs     0.79  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'str'>)
-      16.4±0.5μs       12.9±0.4μs     0.79  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'object'>)
-         141±5μs          110±4μs     0.78  bench_lib.Pad.time_pad((10, 100), (0, 5), 'linear_ramp')
-      18.0±0.6μs       14.1±0.6μs     0.78  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'object'>)
-      11.9±0.6μs       9.28±0.5μs     0.78  bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <type 'int'>)
-        54.6±3μs         42.4±2μs     0.78  bench_function_base.Median.time_odd_small
-        317±10μs          246±7μs     0.78  bench_lib.Pad.time_pad((10, 10, 10), 1, 'linear_ramp')
-      13.8±0.5μs       10.7±0.7μs     0.77  bench_reduce.MinMax.time_min(<type 'numpy.float64'>)
-        73.3±6μs         56.6±4μs     0.77  bench_lib.Pad.time_pad((1000,), (0, 5), 'mean')
-      14.7±0.7μs       11.4±0.3μs     0.77  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'str'>)
-        21.5±2μs       16.5±0.6μs     0.77  bench_reduce.MinMax.time_min(<type 'numpy.int64'>)
-         117±4μs         89.2±3μs     0.76  bench_lib.Pad.time_pad((1000,), 3, 'linear_ramp')
-        43.7±1μs         33.4±1μs     0.76  bench_linalg.Linalg.time_op('norm', 'complex128')
-      12.6±0.6μs       9.55±0.2μs     0.76  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'int'>)
-        636±20μs         482±20μs     0.76  bench_ma.MA.time_masked_array_l100
-        86.6±4μs         65.6±4μs     0.76  bench_lib.Pad.time_pad((1000,), (0, 5), 'linear_ramp')
-         120±4μs         90.4±2μs     0.75  bench_lib.Pad.time_pad((1000,), 1, 'linear_ramp')
-         160±5μs          119±8μs     0.74  bench_ma.Concatenate.time_it('ndarray+masked', 100)
-      14.4±0.6μs       10.7±0.3μs     0.74  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'str'>)
-      15.7±0.4μs       11.7±0.6μs     0.74  bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 100, <type 'str'>)
-        21.8±2μs       16.1±0.7μs     0.74  bench_reduce.MinMax.time_max(<type 'numpy.int64'>)
-      11.9±0.6μs       8.79±0.3μs     0.74  bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'bool'>)
-        53.8±3μs         39.4±2μs     0.73  bench_function_base.Median.time_even_small
-        106±20μs         76.7±4μs     0.73  bench_function_base.Select.time_select
-        168±10μs          122±4μs     0.72  bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (2, 2))
-      12.5±0.5μs       8.96±0.4μs     0.72  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'int'>)
-        162±10μs          115±5μs     0.71  bench_function_base.Percentile.time_percentile
-        12.9±1μs       9.12±0.4μs     0.71  bench_random.Random.time_rng('normal')
-      9.71±0.4μs       6.88±0.3μs     0.71  bench_core.CorrConv.time_convolve(1000, 10, 'full')
-      15.1±0.8μs       10.7±0.4μs     0.71  bench_reduce.MinMax.time_max(<type 'numpy.float64'>)
-         153±9μs          108±7μs     0.71  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (2, 2))
-         109±5μs         76.9±5μs     0.71  bench_ma.Concatenate.time_it('ndarray+masked', 2)
-        34.3±1μs       24.2±0.6μs     0.71  bench_linalg.Linalg.time_op('norm', 'complex64')
-      9.80±0.2μs       6.84±0.5μs     0.70  bench_core.CorrConv.time_convolve(1000, 10, 'same')
-        27.4±6μs         19.1±2μs     0.70  bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <type 'bool'>)
-      9.35±0.4μs       6.50±0.3μs     0.70  bench_core.CorrConv.time_convolve(50, 100, 'full')
-        65.2±4μs         45.2±1μs     0.69  bench_shape_base.Block.time_block_simple_row_wise(100)
-        12.9±1μs       8.89±0.3μs     0.69  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'bool'>)
-        19.6±3μs       13.5±0.4μs     0.69  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'object'>)
-        75.6±2μs         52.1±3μs     0.69  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'reflect')
-        12.4±1μs       8.51±0.4μs     0.69  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'bool'>)
-        172±30μs          117±4μs     0.68  bench_ma.Concatenate.time_it('unmasked+masked', 100)
-      23.1±0.5μs       15.8±0.9μs     0.68  bench_linalg.Linalg.time_op('norm', 'int16')
-      8.18±0.9μs       5.57±0.1μs     0.68  bench_core.CorrConv.time_correlate(1000, 10, 'full')
-         153±5μs          103±3μs     0.68  bench_function_base.Percentile.time_quartile
-       758±100μs         512±20μs     0.68  bench_linalg.Linalg.time_op('det', 'int16')
-        55.4±6μs         37.4±1μs     0.68  bench_ma.Concatenate.time_it('masked', 2)
-        234±30μs          157±5μs     0.67  bench_shape_base.Block.time_nested(100)
-         103±4μs         69.3±3μs     0.67  bench_linalg.Eindot.time_dot_d_dot_b_c
-      19.2±0.4μs       12.9±0.6μs     0.67  bench_core.Core.time_tril_l10x10
-         122±7μs         81.7±4μs     0.67  bench_lib.Pad.time_pad((10, 10, 10), 3, 'edge')
-        22.9±1μs       15.3±0.5μs     0.67  bench_linalg.Linalg.time_op('norm', 'int32')
-        16.6±2μs       11.0±0.3μs     0.66  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'object'>)
-      9.98±0.3μs       6.58±0.1μs     0.66  bench_core.CorrConv.time_convolve(1000, 10, 'valid')
-         118±6μs         77.9±4μs     0.66  bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (2, 2))
-        212±50μs          140±8μs     0.66  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'mean')
-      21.9±0.7μs       14.4±0.5μs     0.66  bench_linalg.Linalg.time_op('norm', 'int64')
-         131±5μs         85.9±5μs     0.65  bench_lib.Pad.time_pad((10, 10, 10), 3, 'constant')
-        56.8±2μs         37.0±3μs     0.65  bench_lib.Pad.time_pad((1000,), (0, 5), 'constant')
-        58.9±3μs         38.1±1μs     0.65  bench_lib.Pad.time_pad((10, 100), (0, 5), 'reflect')
-        72.1±2μs         46.5±3μs     0.64  bench_lib.Pad.time_pad((10, 100), (0, 5), 'constant')
-      8.66±0.3μs       5.58±0.2μs     0.64  bench_core.CorrConv.time_correlate(50, 100, 'full')
-        300±30μs         193±10μs     0.64  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint8', (4, 4))
-        15.9±5μs       10.2±0.3μs     0.64  bench_core.CountNonzero.time_count_nonzero_axis(3, 100, <type 'int'>)
-      13.7±0.5μs       8.80±0.1μs     0.64  bench_random.Random.time_rng('uniform')
-      8.60±0.5μs       5.50±0.2μs     0.64  bench_core.CorrConv.time_correlate(1000, 10, 'same')
-        44.7±2μs       28.5±0.7μs     0.64  bench_lib.Pad.time_pad((1000,), 1, 'reflect')
-        72.7±3μs         46.2±2μs     0.64  bench_lib.Pad.time_pad((10, 10, 10), 3, 'wrap')
-        567±50μs         360±40μs     0.63  bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (2, 2))
-        58.0±3μs         36.7±2μs     0.63  bench_lib.Pad.time_pad((10, 100), 3, 'reflect')
-        219±30μs          138±7μs     0.63  bench_lib.Pad.time_pad((10, 100), 1, 'mean')
-        261±60μs         164±10μs     0.63  bench_lib.Pad.time_pad((10, 100), 1, 'linear_ramp')
-       825±100μs         519±30μs     0.63  bench_shape_base.Block2D.time_block2d((512, 512), 'uint64', (4, 4))
-         121±5μs         75.7±2μs     0.63  bench_lib.Pad.time_pad((10, 10, 10), 1, 'constant')
-      8.16±0.2μs       5.08±0.4μs     0.62  bench_core.CorrConv.time_convolve(50, 100, 'same')
-        66.6±3μs         41.3±2μs     0.62  bench_lib.Pad.time_pad((1000,), 3, 'constant')
-        53.1±3μs       32.9±0.8μs     0.62  bench_lib.Pad.time_pad((10, 100), 3, 'wrap')
-        285±60μs         177±10μs     0.62  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'linear_ramp')
-      8.30±0.9μs       5.14±0.1μs     0.62  bench_core.CorrConv.time_correlate(1000, 10, 'valid')
-         115±3μs         71.2±3μs     0.62  bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (2, 2))
-      19.1±0.5μs       11.8±0.6μs     0.62  bench_linalg.Linalg.time_op('norm', 'float64')
-        95.3±5μs         58.6±2μs     0.62  bench_lib.Pad.time_pad((10, 100), 1, 'constant')
-        44.6±1μs       27.2±0.9μs     0.61  bench_lib.Pad.time_pad((1000,), (0, 5), 'edge')
-        447±20μs         270±10μs     0.61  bench_shape_base.Block2D.time_block2d((1024, 1024), 'uint16', (4, 4))
-        53.9±2μs         32.6±2μs     0.60  bench_lib.Pad.time_pad((10, 100), 1, 'wrap')
-        11.6±1μs       6.97±0.4μs     0.60  bench_reduce.MinMax.time_max(<type 'numpy.float32'>)
-        95.9±5μs         57.7±2μs     0.60  bench_lib.Pad.time_pad((10, 100), 3, 'constant')
-        47.2±2μs         28.2±2μs     0.60  bench_lib.Pad.time_pad((1000,), (0, 5), 'reflect')
-      5.51±0.2μs      3.27±0.07μs     0.59  bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'object'>)
-        74.3±3μs         44.0±2μs     0.59  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'wrap')
-        76.2±3μs       45.0±0.8μs     0.59  bench_lib.Pad.time_pad((10, 10, 10), 1, 'reflect')
-        57.1±1μs         33.5±2μs     0.59  bench_lib.Pad.time_pad((10, 100), (0, 5), 'wrap')
-        52.0±2μs         30.4±1μs     0.58  bench_lib.Pad.time_pad((1000,), 1, 'edge')
-        42.6±2μs       24.9±0.9μs     0.58  bench_lib.Pad.time_pad((1000,), 3, 'wrap')
-        15.0±3μs       8.73±0.3μs     0.58  bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 100, <type 'bool'>)
-        16.0±3μs       9.29±0.3μs     0.58  bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 100, <type 'int'>)
-        53.1±1μs         30.9±2μs     0.58  bench_lib.Pad.time_pad((1000,), 3, 'edge')
-        88.0±8μs         51.1±3μs     0.58  bench_lib.Pad.time_pad((10, 10, 10), 3, 'reflect')
-        44.6±2μs         25.9±1μs     0.58  bench_lib.Pad.time_pad((1000,), (0, 5), 'wrap')
-        90.3±5μs         51.9±1μs     0.57  bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (2, 2))
-      15.6±0.5μs       8.93±0.3μs     0.57  bench_linalg.Linalg.time_op('norm', 'float32')
-         102±6μs       58.3±0.9μs     0.57  bench_lib.Pad.time_pad((10, 10, 10), 1, 'edge')
-        80.1±4μs         45.6±3μs     0.57  bench_lib.Pad.time_pad((10, 100), 3, 'edge')
-        44.2±2μs         24.9±1μs     0.56  bench_lib.Pad.time_pad((1000,), 1, 'wrap')
-        71.6±8μs         39.5±1μs     0.55  bench_lib.Pad.time_pad((10, 10, 10), 1, 'wrap')
-       81.7±10μs         44.8±2μs     0.55  bench_lib.Pad.time_pad((10, 100), 1, 'edge')
-        420±90μs         230±10μs     0.55  bench_shape_base.Block.time_3d(10, 'block')
-        114±20μs         62.3±2μs     0.55  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'constant')
-      5.76±0.1μs      3.13±0.08μs     0.54  bench_core.CorrConv.time_convolve(50, 10, 'same')
-      5.30±0.1μs      2.84±0.08μs     0.54  bench_core.CorrConv.time_correlate(50, 100, 'valid')
-        92.5±4μs         49.3±1μs     0.53  bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (2, 2))
-        13.5±3μs       7.07±0.2μs     0.52  bench_reduce.MinMax.time_min(<type 'numpy.float32'>)
-        7.66±1μs       3.88±0.2μs     0.51  bench_core.CorrConv.time_convolve(50, 100, 'valid')
-        29.0±3μs       14.5±0.8μs     0.50  bench_shape_base.Block.time_no_lists(10)
-      6.62±0.3μs       3.30±0.2μs     0.50  bench_core.CorrConv.time_convolve(1000, 1000, 'valid')
-        74.2±7μs       36.2±0.9μs     0.49  bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (2, 2))
-      5.55±0.3μs       2.70±0.2μs     0.49  bench_core.CorrConv.time_convolve(50, 10, 'valid')
-       73.9±20μs         35.8±2μs     0.48  bench_lib.Pad.time_pad((10, 100), 1, 'reflect')
-        224±20μs          107±7μs     0.48  bench_shape_base.Block2D.time_block2d((256, 256), 'uint64', (4, 4))
-      3.87±0.1μs      1.83±0.06μs     0.47  bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'str'>)
-        109±30μs         51.5±3μs     0.47  bench_lib.Pad.time_pad((10, 10, 10), (0, 5), 'edge')
-        240±20μs          112±4μs     0.47  bench_shape_base.Block2D.time_block2d((512, 512), 'uint16', (4, 4))
-        337±40μs          158±7μs     0.47  bench_shape_base.Block2D.time_block2d((512, 512), 'uint32', (4, 4))
-         188±8μs         88.0±2μs     0.47  bench_shape_base.Block2D.time_block2d((512, 512), 'uint8', (4, 4))
-      4.39±0.2μs      2.04±0.09μs     0.47  bench_core.CountNonzero.time_count_nonzero(3, 10000, <type 'bool'>)
-        73.2±4μs       33.9±0.5μs     0.46  bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (2, 2))
-        5.48±1μs       2.44±0.1μs     0.45  bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'object'>)
-      4.46±0.1μs      1.97±0.08μs     0.44  bench_core.CorrConv.time_correlate(50, 10, 'full')
-        30.4±9μs       13.3±0.3μs     0.44  bench_shape_base.Block.time_no_lists(1)
-      7.05±0.2μs      3.05±0.06μs     0.43  bench_reduce.SmallReduction.time_small
-        7.35±1μs       3.12±0.2μs     0.42  bench_core.CorrConv.time_convolve(50, 10, 'full')
-      4.36±0.1μs      1.84±0.07μs     0.42  bench_core.CorrConv.time_correlate(50, 10, 'same')
-      3.51±0.2μs      1.46±0.05μs     0.42  bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'object'>)
-     4.03±0.05μs       1.66±0.1μs     0.41  bench_core.CorrConv.time_correlate(1000, 1000, 'valid')
-        199±10μs         80.1±3μs     0.40  bench_shape_base.Block2D.time_block2d((256, 256), 'uint32', (4, 4))
-      3.98±0.2μs      1.60±0.08μs     0.40  bench_core.CountNonzero.time_count_nonzero(2, 10000, <type 'bool'>)
-        61.8±2μs         24.8±1μs     0.40  bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (2, 2))
-      4.13±0.1μs      1.62±0.05μs     0.39  bench_core.CorrConv.time_correlate(50, 10, 'valid')
-        61.6±2μs         23.9±1μs     0.39  bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (2, 2))
-        184±10μs         70.5±3μs     0.38  bench_shape_base.Block2D.time_block2d((256, 256), 'uint16', (4, 4))
-        56.1±4μs       21.0±0.9μs     0.38  bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (2, 2))
-        40.0±2μs       15.0±0.6μs     0.37  bench_shape_base.Block.time_block_simple_column_wise(10)
-         121±2μs         45.1±2μs     0.37  bench_shape_base.Block.time_nested(1)
-         179±4μs         66.1±4μs     0.37  bench_shape_base.Block2D.time_block2d((128, 128), 'uint64', (4, 4))
-        59.8±2μs         22.0±1μs     0.37  bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (2, 2))
-     3.19±0.05μs      1.17±0.02μs     0.37  bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'str'>)
-        54.0±3μs         19.7±1μs     0.37  bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (2, 2))
-        56.9±1μs       20.7±0.7μs     0.36  bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (2, 2))
-      3.14±0.1μs      1.14±0.04μs     0.36  bench_core.CountNonzero.time_count_nonzero(1, 10000, <type 'bool'>)
-        92.7±2μs         33.7±2μs     0.36  bench_shape_base.Block.time_block_complicated(1)
-         104±4μs         37.8±1μs     0.36  bench_shape_base.Block.time_block_complicated(10)
-         128±5μs         45.5±2μs     0.36  bench_shape_base.Block.time_nested(10)
-       196±100μs         69.4±3μs     0.35  bench_ma.Concatenate.time_it('unmasked+masked', 2)
-         153±5μs         53.9±2μs     0.35  bench_shape_base.Block2D.time_block2d((128, 128), 'uint16', (4, 4))
-        39.4±2μs       13.8±0.5μs     0.35  bench_shape_base.Block.time_block_simple_column_wise(1)
-        53.5±2μs         18.7±1μs     0.35  bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (2, 2))
-        55.2±2μs       19.3±0.6μs     0.35  bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (2, 2))
-        16.9±1μs       5.89±0.5μs     0.35  bench_core.Core.time_dstack_l
-        60.6±3μs       21.1±0.6μs     0.35  bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (2, 2))
-      25.5±0.2μs       8.88±0.3μs     0.35  bench_shape_base.Block.time_block_simple_row_wise(10)
-        54.6±3μs       19.0±0.6μs     0.35  bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (2, 2))
-        52.6±2μs       18.2±0.7μs     0.35  bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (2, 2))
-        6.57±2μs      2.25±0.08μs     0.34  bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'str'>)
-        24.3±1μs       8.30±0.6μs     0.34  bench_shape_base.Block.time_block_simple_row_wise(1)
-         148±3μs         50.0±3μs     0.34  bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (4, 4))
-         171±8μs         57.9±4μs     0.34  bench_shape_base.Block2D.time_block2d((256, 256), 'uint8', (4, 4))
-         159±5μs         53.8±1μs     0.34  bench_shape_base.Block2D.time_block2d((64, 64), 'uint64', (4, 4))
-        171±20μs         57.7±2μs     0.34  bench_shape_base.Block2D.time_block2d((128, 128), 'uint32', (4, 4))
-      3.15±0.3μs      1.06±0.03μs     0.34  bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'int'>)
-        55.7±5μs       18.7±0.2μs     0.34  bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (2, 2))
-         158±7μs         52.6±3μs     0.33  bench_shape_base.Block2D.time_block2d((128, 128), 'uint8', (4, 4))
-         153±4μs         50.7±1μs     0.33  bench_shape_base.Block2D.time_block2d((32, 32), 'uint64', (4, 4))
-         152±7μs         50.3±1μs     0.33  bench_shape_base.Block2D.time_block2d((16, 16), 'uint8', (4, 4))
-        53.6±3μs       17.7±0.4μs     0.33  bench_shape_base.Block2D.time_block2d((16, 16), 'uint32', (2, 2))
-         156±4μs         51.4±3μs     0.33  bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (4, 4))
-         148±3μs         48.2±2μs     0.33  bench_shape_base.Block2D.time_block2d((16, 16), 'uint16', (4, 4))
-        160±10μs         52.0±1μs     0.33  bench_shape_base.Block2D.time_block2d((64, 64), 'uint32', (4, 4))
-         159±8μs         51.4±3μs     0.32  bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (4, 4))
-        59.8±3μs         19.3±1μs     0.32  bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (2, 2))
-         153±4μs         49.4±2μs     0.32  bench_shape_base.Block2D.time_block2d((32, 32), 'uint32', (4, 4))
-      15.6±0.6μs       5.03±0.3μs     0.32  bench_core.Core.time_vstack_l
-         154±7μs         49.7±2μs     0.32  bench_shape_base.Block2D.time_block2d((32, 32), 'uint8', (4, 4))
-        59.6±6μs       19.1±0.8μs     0.32  bench_shape_base.Block2D.time_block2d((64, 64), 'uint8', (2, 2))
-      3.03±0.4μs         969±30ns     0.32  bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'int'>)
-        120±10μs         38.4±2μs     0.32  bench_shape_base.Block.time_3d(1, 'block')
-         156±5μs         49.3±1μs     0.32  bench_shape_base.Block2D.time_block2d((16, 16), 'uint64', (4, 4))
-        164±10μs         49.3±2μs     0.30  bench_shape_base.Block2D.time_block2d((32, 32), 'uint16', (4, 4))
-       65.7±10μs       19.6±0.7μs     0.30  bench_shape_base.Block2D.time_block2d((64, 64), 'uint16', (2, 2))
-     2.82±0.08μs         732±30ns     0.26  bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'int'>)
-     2.77±0.07μs         664±30ns     0.24  bench_core.CountNonzero.time_count_nonzero(2, 100, <type 'bool'>)
-      2.61±0.1μs         624±20ns     0.24  bench_core.CountNonzero.time_count_nonzero(1, 100, <type 'bool'>)
-        16.8±3μs       3.97±0.2μs     0.24  bench_core.Core.time_hstack_l
-      2.78±0.1μs         637±20ns     0.23  bench_core.CountNonzero.time_count_nonzero(3, 100, <type 'bool'>)
-      2.36±0.2μs          207±5ns     0.09  bench_overrides.ArrayFunction.time_mock_broadcast_to_numpy
-      2.68±0.1μs          221±7ns     0.08  bench_overrides.ArrayFunction.time_mock_concatenate_numpy
-      2.58±0.1μs         201±10ns     0.08  bench_overrides.ArrayFunction.time_mock_broadcast_to_duck
-      3.02±0.2μs          222±6ns     0.07  bench_overrides.ArrayFunction.time_mock_concatenate_duck
-      4.29±0.3μs          216±6ns     0.05  bench_overrides.ArrayFunction.time_mock_concatenate_mixed
-        142±20μs          213±8ns     0.00  bench_overrides.ArrayFunction.time_mock_concatenate_many

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

see also https://docs.google.com/spreadsheets/d/15-AFI_cmZqfkU6mo2p1znsQF2E52PEXpF68QqYqEar4/edit#gid=0 for a spreadsheet.

Not surprisingly, the largest performance difference is for functions that call other numpy functions internally many times, e.g., for np.block().

@shoyer - I was slightly disconcerted by the extra time taken... Probably, we really should have a C implementation, but in the meantime I made a PR with some small changes that shave off some time for the common case of only a single type, and for the case where the only type is ndarray. See #12321.

@shoyer - I raised two issues on the mailing list that are probably good to state here too:

  1. Should all unique types of array-like arguments be included in types (rather than just those of arguments that provide an override)? It would seem helpful for implementations to know. (See #12327.)
  2. Should the ndarray.__array_function__ implementation accept subclasses even if they override __array_function__? This would be reasonable given the Liskov substitution principle and given that the subclass already had a chance to bail. It would imply calling the implementation rather than the public function inside ndarray.__array_function__. (And something similar in __array_ufunc__...) See #12328 for a trial for __array_function__ only.
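
For reference on point 2, the default implementation described in the final NEP 18 text works roughly like this (Python pseudocode for what NumPy implements in C; _implementation is the attribute the dispatch decorator attaches to reach the undecorated function):

class ndarray:
    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, ndarray) for t in types):
            # Defer to any non-subclass that implements the protocol.
            return NotImplemented
        # Call NumPy's own implementation, skipping a second dispatch.
        return func._implementation(*args, **kwargs)

Note that the issubclass check accepts subclasses even when they override __array_function__ themselves, which is the behavior point 2 argues for.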

@shoyer - see #12327 for a quick implementation of (1) - if we do go this route, I think we should also adjust the NEP.

And #12328 for a trial of (2), mostly to see how it looks.

I’m +1 on both modifications here.

The name of dispatcher functions in error messages came up again in https://github.com/numpy/numpy/pull/12789, where someone was surprised to see TypeError: _pad_dispatcher missing 1 required positional argument

In addition to the alternatives outlined above https://github.com/numpy/numpy/issues/12028#issuecomment-429377396 (we currently use option 2), I'll add a fourth option:

Option 4: Write a separate dispatcher for each function, with the same name as the function:

def sin(a):
    return (a,)


@array_function_dispatch(sin)
def sin(a):
    ...


def cos(a):
    return (a,)


@array_function_dispatch(cos)
def cos(a):
    ...

Advantages:

  • Python now always supplies the right function names in the error message.

Disadvantages:

  • More code repetition
  • More indirection -- it's no longer clear which of the two functions named pad actually received the wrong arguments (but we do have tests to verify that they are kept in sync).

I think in order to keep current code working, the actual function must come _after_ the dispatcher.

Right, but we can give it the same name as the dispatcher. The dispatcher name will get overwritten.

It would be great to be able to define custom dispatching for functions like np.arange or np.empty.

I guess one option would be for NumPy to dispatch on scalars as well as arrays. Is this incompatible with the NEP? Would anything break with this change?

For discussion about np.arange, see https://github.com/numpy/numpy/issues/12379.

I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype. But certainly np.empty_like() could do dispatching with an overwritten shape -- that's exactly what https://github.com/numpy/numpy/pull/13046 is about supporting.
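
Once that lands, usage would look something like this sketch, with the shape keyword overriding the prototype's shape:

import numpy as np

x = np.arange(6)
# Dispatches on x (so a duck array here would receive the call), while
# the shape keyword overrides x's own shape for the result.
y = np.empty_like(x, shape=(2, 3))
print(y.shape)  # (2, 3)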

Option 4: Write a separate dispatcher for each function, with the same name as the function:

Any objections to adopting this option? I think it's probably the friendliest choice from a user perspective.

I don't see how np.empty() could do dispatching -- there's nothing to dispatch on, just a shape and a dtype

You might want to dispatch on either of those. For example, consider a custom shape object that we might want to dispatch on differently.

This example isn't very useful on its own, but the idea is that I have a lazy object that behaves like a shape, except it doesn't return integers, it returns expressions. For example, it would be nice to be able to do something like this:

import numpy


class ExprShape:
    # A lazy, shape-like object: indexing and len() build expressions
    # instead of returning integers.
    def __getitem__(self, i):
        return ('getitem', self, i)

    def __len__(self):
        return ('len', self)


numpy.empty(ExprShape())  # currently an error; the wish is to override it

Which I would like to override to return something like ExprArray('empty', ExprShape()).

Yes, in principle we could dispatch on shape, too. That would add additional complexity/overhead to the protocol. Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?

The other case I can think of is the size argument to np.random.RandomState methods, but note that we currently don't support those at all -- see http://www.numpy.org/neps/nep-0018-array-function-protocol.html#callable-objects-generated-at-runtime

Do you have use-cases where using an array as a template (like empty_like with shape) would not suffice?

If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.

For example, let's say we were attempting to call scipy.optimize.differential_evolution with NumPy-like arrays that build up a call graph instead of executing immediately.

You can see here it would be helpful if we could change np.full to create a symbolic array instead of a default numpy array, if the input passed into it was also symbolic.

If we are taking an existing API that depends on NumPy and would like to transparently have it work on a different backend, without changing the existing source code.

This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.

In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.

This isn't possible in general. Explicit array construction like np.array() is definitely going to need to be rewritten to be compatible with duck typing.

Is there a proposal around making this sort of change or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?

In this case, switching energies = np.full(num_members, np.inf) to energies = np.full_like(population, np.inf, shape=num_members) seems like an easy and readable change.

Definitely. But there are many cases where either you don't control the source code, or you want to support users in using the functions they know and love as much as possible.

There are other ways to provide users with that experience, like:

  • Providing a new module that acts like numpy but behaves how you would like. Requires users to change their imports.
  • Inspecting the source to understand the behavior, à la numba or tangent.

Both of those options might be required in certain cases (like letting users call np.full and return a symbolic result currently), but if I understand correctly, the goal of NEP-18 is to try to limit when those are needed and let people use the original NumPy in more cases.

I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.

Is there a proposal around making this sort of change or are you saying that supporting dispatching of np.array would be really hard and so we can't ever get to 100% support?

NEP 22 has some discussion of the options here. I don't think we can safely change the semantics of np.asarray() to return anything other than a numpy.ndarray object -- we will need a new protocol for this.

The problem is that np.asarray() is currently the idiomatic way of casting to a numpy array object, which users can and do expect to exactly match numpy.ndarray, e.g., down to the memory layout.

There are certainly plenty of use-cases where this isn't the case, but switching this behavior would break lots of downstream code, so it's a non-starter. Downstream projects will need to opt-in to at least this aspect of array duck typing.

I understand there is a performance/complexity tradeoff here and that might be a good reason not to implement these. But it might force users to explore other means to get the flexibility they desire.

Yes. NEP 18 is not intended to be a complete solution for drop-in NumPy alternatives, but it's a step in that direction.

I've drafted a revision to NEP-18 for adding a __numpy_implementation__ attribute:
https://github.com/numpy/numpy/pull/13305

It occurs to me that we forgot to wrap the functions in numpy.testing: https://github.com/numpy/numpy/issues/13588

I'm going to do that shortly...

There's one revision that I'd like to see to the NEP, specifically to clarify what guarantees NEP-18 offers to subclass authors: https://github.com/numpy/numpy/pull/13633

I marked the usability tasks complete since gh-13329 was fixed. We decided gh-13588 can wait till after the release of 1.17. That leaves documentation improvements and arange (gh-12379) still open for inclusion in 1.17.

There's also #13728 -- a bug in the dispatcher for histogram2d/histogramdd.

That leaves documentation improvements and arange gh-12379 still open for inclusion in 1.17.

An issue for documentation was missing, so I opened gh-13844. I think docs are a lot more important than the open arange issue.

@shoyer can we close this?
