Opening this issue after some discussion with **@shoyer**, **@pentschev**, and **@mrocklin** in https://github.com/dask/dask/issues/4883. AIUI this was discussed in NEP 22 (so I'm mainly parroting other people's ideas here to renew discussion and correct my own misunderstanding ;).

It would be useful for various downstream array libraries to have a function to ensure we have some duck array (like `ndarray`). This would be somewhat similar to `np.asanyarray`, but without the requirement of subclassing. It would allow libraries to return their own (duck) array type. If no suitable conversion is supported by the object, we could fall back to handling `ndarray` subclasses, `ndarray`s, and coercion of other things (nested lists) to `ndarray`s.

cc **@njsmith** (who coauthored NEP 22)

The proposed implementation would look something like the following:

```
import numpy as np

# hypothetical np.duckarray() function
def duckarray(array_like):
    if hasattr(array_like, '__duckarray__'):
        # return an object that can be substituted for np.ndarray
        return array_like.__duckarray__()
    return np.asarray(array_like)
```

Example usage:

```
class SparseArray:
    def __duckarray__(self):
        return self

    def __array__(self):
        raise TypeError

np.duckarray(SparseArray())  # returns a SparseArray object
np.array(SparseArray())      # raises TypeError
```

Here I've used `np.duckarray` and `__duckarray__` as placeholders, but we can probably do better for these names. See the Terminology section from NEP 22:

“Duck array” works fine as a placeholder for now, but it’s pretty jargony and may confuse new users, so we may want to pick something else for the actual API functions. Unfortunately, “array-like” is already taken for the concept of “anything that can be coerced into an array” (including e.g. list objects), and “anyarray” is already taken for the concept of “something that shares ndarray’s implementation, but has different semantics”, which is the opposite of a duck array (e.g., np.matrix is an “anyarray”, but is not a “duck array”). This is a classic bike-shed so for now we’re just using “duck array”. Some possible options though include: arrayish, pseudoarray, nominalarray, ersatzarray, arraymimic, …

Some other name ideas: `np.array_compatible()`, `np.array_api()`...

`np.array_compatible` could work, although I'm not sure I like it better than `duckarray`. `np.array_api` I don't like; it gives the wrong idea, imho.

Since after a long time we haven't come up with a better name, perhaps we should just bless the "duck array" name.

I like the compatible word, maybe we can think of variations along that line as well, e.g. `as_compatible_array` (somewhat implies that all compatible objects are arrays). The `as` is maybe annoying (partially because all `as` functions have no spaces). "duck" seems nice in libraries, but I think a bit strange for random people seeing it. So I think I dislike "duck" if and only if we want downstream users to use it a lot (i.e. even when I start writing a small tool for myself/a small lab).

Maybe `quack_array` :)

To extend a bit on the topic, there's one other case that isn't covered by `np.duckarray`, which is the creation of new arrays with a type based on an existing type, similar to what functions such as `np.empty_like` do. Currently we can do things like this:

```
>>> import numpy as np, cupy as cp
>>> a = cp.array([1, 2])
>>> b = np.ones_like(a)
>>> type(b)
<class 'cupy.core.core.ndarray'>
```

On the other hand, if we have an `array_like` that we would like to create a CuPy array from via NumPy's API, that's not possible. I think it would be helpful to have something like:

```
import numpy as np, cupy as cp
a = cp.array([1, 2])
b = [1, 2]
c = np.asarray(b, like=a)
```

Any ideas/suggestions on this?
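To make the idea concrete, here's a rough pure-Python sketch of how a `like=`-style dispatch could work. `FakeGpuArray`, `from_array_like`, and `asarray_like` are all made-up names standing in for a third-party array type and the hypothetical NumPy function:

```python
import numpy as np

# FakeGpuArray stands in for a third-party array type (e.g. a CuPy array).
class FakeGpuArray:
    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def from_array_like(cls, array_like):
        # Coerce an arbitrary array_like into this library's array type.
        return cls(array_like)

# Hypothetical asarray(..., like=...): dispatch on the type of `like`.
def asarray_like(array_like, like=None):
    if like is not None and hasattr(type(like), 'from_array_like'):
        # Let the `like` array's library perform the coercion.
        return type(like).from_array_like(array_like)
    # Fall back to NumPy coercion.
    return np.asarray(array_like)

a = FakeGpuArray([1, 2])
print(type(asarray_like([1, 2], like=a)).__name__)  # FakeGpuArray
print(type(asarray_like([1, 2])).__name__)          # ndarray
```

The real mechanism would presumably go through `__array_function__` rather than an ad-hoc classmethod, but the shape of the dispatch is the same: the `like` argument's library gets the first chance to produce the result.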

Maybe `np.copy_like`? We would want to define carefully which properties (e.g., including dtype or not) are copied from the other array.


`np.copy_like` sounds good too. I agree, we most likely should have ways to control things such as `dtype`.

Sorry for the beginner's question, but should something like `np.copy_like` be an amendment to NEP 22, should it be discussed on the mailing list, or what would be the most appropriate approach to that?

We don't really have strict rules about this, but I would lean towards putting `np.copy_like` and `np.duckarray` (or whatever we call it) together into a new NEP on coercing/creating duck arrays, one that is prescriptive like NEP 18 rather than "Informational" like NEP 22. It doesn't need to be long; most of the motivation is already clear from referencing NEP 18/22.

One note about `np.copy_like()`: it should definitely do dispatching with `__array_function__` (or something like it), so operations like `np.copy_like(sparse_array, like=dask_array)` could be defined on either array type.
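A rough sketch of what that double dispatch could look like; the `__copy_like__` protocol name and the `DuckArray` class are invented here for illustration, not a real NumPy protocol:

```python
import numpy as np

class DuckArray:
    def __init__(self, data):
        self.data = list(data)

    def __copy_like__(self, src, like):
        # Handle the call whenever the target type is DuckArray.
        if isinstance(like, DuckArray):
            return DuckArray(getattr(src, 'data', src))
        return NotImplemented

def copy_like(src, like):
    # Give each argument a chance to handle the call, mirroring how
    # __array_function__ consults every overriding argument.
    for arg in (src, like):
        handler = getattr(type(arg), '__copy_like__', None)
        if handler is not None:
            result = handler(arg, src, like)
            if result is not NotImplemented:
                return result
    # Default: coerce to a NumPy array copy.
    return np.array(src)

print(type(copy_like([1, 2], like=DuckArray([0]))).__name__)  # DuckArray
```

Because both `src` and `like` are consulted, either library can implement the conversion, which is exactly what makes `np.copy_like(sparse_array, like=dask_array)` definable on either side.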

Great, thanks for the info, and I agree with your dispatching proposal. I will work on an NEP for the implementation of both `np.duckarray` and `np.copy_like` and submit a draft PR this week for that.

Awesome, thank you Peter!


My pleasure, and thanks a lot for the ideas and support with this work!

The `array_like` and `copy_like` functions would be a little odd to have in the main namespace I think, since we can't have a default implementation (at least not one that would do the right thing for cupy/dask/sparse/etc), right? They're only useful when overridden. Or am I missing a way to create arbitrary non-numpy array objects here?

It's true, these would only really be useful if you want to support duck typing. But certainly `np.duckarray` and `np.copy_like` would work even if the arguments are only NumPy arrays -- they would just be equivalent to `np.array`/`np.copy`.

All array implementations have a `copy` method, right? Using that instead of `copy_like` should work, so why add a new function?

`array_like` I can see the need for, but we may want to discuss where to put it.

`np.duckarray` does make sense to me.

I would lean towards putting np.copy_like and np.duckarray (or whatever we call it) together into a new NEP on coercing/creating duck arrays, one that is prescriptive like NEP 18 rather than "Informational" like NEP 22.

+1

array_like I can see the need for, but we may want to discuss where to put it.

That's actually the case which I would like to have addressed with something like `np.copy_like`. I haven't tested, but probably `np.copy` already dispatches correctly if the array is non-NumPy.

Just to be clear, are you referring also to a function `np.array_like`? I intentionally avoided such a name because I thought it could be confusing to all existing references to `array_like` arrays. However, I do now realize that `np.copy_like` may imply a necessary copy, and I think it would be good to have a behavior similar to `np.asarray`, where the copy only happens if it's not already a NumPy array. In the case discussed here, the best would be to make the copy only if `a` is not the same type as `b` in a call such as `np.copy_like(a, like=b)`.

I haven't tested, but probably `np.copy` already dispatches correctly if the array is non-NumPy.

It should, it's decorated to support `__array_function__`.

Just to be clear, are you referring also to a function `np.array_like`? I intentionally avoided such a name because I thought it could be confusing to all existing references to `array_like` arrays.

Yes. And yes, agree it can be confusing.

However, I do now realize that `np.copy_like` may imply a necessary copy,

Yes, that name implies a data copy.

may imply a necessary copy, and I think it would be good to have a behavior similar to `np.asarray`,

I thought that that was `np.duckarray`.

I think Peter's example above might help clarify this. Copied below and subbed in `np.copy_like` for simplicity.

```
import numpy as np, cupy as cp
a = cp.array([1, 2])
b = [1, 2]
c = np.copy_like(b, like=a)
```

I thought that that was np.duckarray.

Actually, `np.duckarray` will basically do nothing and just return the array itself (if overridden), else return `np.asarray` (leading to a NumPy array). We can't get a CuPy array from a Python list with it, for example. We still need a function that can be dispatched to CuPy (or any other `like=` array) for an `array_like`.

Thanks **@jakirkham** for the updated example.

`c = np.copy_like(b, like=a)`

So that will dispatch to CuPy via `a.__array_function__` and fail if that attribute doesn't exist (e.g. `a=<scipy.sparse matrix>` wouldn't work)? It feels like we need a new namespace or new interoperability utilities package for those kinds of things. Either that or leave it to a more full-featured future dispatching mechanism where one could simply do:

```
with cupy_backend:
    np.array(b)
```

Introducing new functions in the main namespace that don't make sense for NumPy itself to support, just to work around a limitation of `__array_function__`, seems a bit unhealthy...

So that will dispatch to CuPy via `a.__array_function__` and fail if that attribute doesn't exist (e.g. `a=<scipy.sparse matrix>` wouldn't work)?

I wouldn't say it has to fail necessarily. We could default to NumPy and raise a warning (or not raise one at all), for example.

It feels like we need a new namespace or new interoperability utilities package for those kind of things. Either that or leave it to a more full-featured future dispatching mechanism

Certainly it would be nice to have a full-featured dispatching mechanism, but I imagine this wasn't done before due to its complexity and backwards compatibility issues? I wasn't around when discussions happened, so just guessing.

Introducing new functions in the main namespace that don't make sense for NumPy itself to support working around a limitation of __array_function__ seems a bit unhealthy....

I certainly see your point, but I also think that if we move too many things away from main namespace, it could scare users off. Maybe I'm wrong and this is just an impression. Either way, I'm not at all proposing to implement functions that won't work with NumPy, but perhaps only not absolutely necessary when using NumPy by itself.

Introducing new functions in the main namespace that don't make sense for NumPy itself to support working around a limitation of __array_function__ seems a bit unhealthy....

Actually, in this sense, also `np.duckarray` wouldn't belong in the main namespace.

Actually, in this sense, also `np.duckarray` wouldn't belong in the main namespace.

I think that one is more defensible (analogous to `asarray`, and it would basically check "does this meet our definition of an ndarray-like duck type"), but yes. If we also want to expose `array_function_dispatch`, and we have things like `np.lib.mixins.NDArrayOperatorsMixin` and plan on writing more mixins, a sensible new submodule for all things interoperability-related could make sense.

Certainly it would be nice to have a full-featured dispatching mechanism, but I imagine this wasn't done before due to its complexity and backwards compatibility issues? I wasn't around when discussions happened, so just guessing.

I think there's multiple reasons. `__array_function__` is similar to things we already had, so it's easier to reason about. It has low overhead. It could be designed and implemented on a ~6 month timescale, and **@shoyer** made a strong case that we needed that. And we had no concrete alternative.

sensible new submodule for all things interoperability related could make sense.

No real objections from me, I think it's better to have functionality somewhere rather than nowhere. :)

I think there's multiple reasons. `__array_function__` is similar to things we already had, so it's easier to reason about. It has low overhead. It could be designed and implemented on a ~6 month timescale, and **@shoyer** made a strong case that we needed that. And we had no concrete alternative.

But if we want to leverage `__array_function__` more broadly, do we have other alternatives now to implementing things like `np.duckarray` and `np.copy_like` (or whatever else we would decide to call it)? I'm open to all alternatives, but right now I don't see any, other than going the full-featured dispatching way, which is likely going to take a long time and limit the scope of `__array_function__` tremendously (basically rendering it impractical for most of the more complex cases I've seen).

But if we want to leverage `__array_function__` more broadly, do we have other alternatives now to implementing things like `np.duckarray` and `np.copy_like` (or whatever else we would decide to call it)?

I think you indeed need a set of utility features like that, to go from covering some fraction of use cases to >80% of use cases. I don't think there's a way around that. I just don't like cluttering up the main namespace, so propose to find a better place for those.

I'm open to all alternatives, but right now I don't see any, other than going the full-featured dispatching way, which is likely going to take a long time and limit the scope of `__array_function__` tremendously (basically rendering it impractical for most of the more complex cases I've seen).

I mean, we're just plugging a few obvious holes here, right? We're never going to cover all of the "more complex cases". Say you want to override `np.errstate` or `np.dtype`: that's just not going to happen with the protocol-based approach.

As for alternatives, uarray is not yet there and I'm not convinced yet that the overhead will be pushed down low enough to be used by default in NumPy, but it's getting close and we're about to try it to create the `scipy.fft` backend system (WIP PR: https://github.com/scipy/scipy/pull/10383). If that does prove itself there, it should be considered as a complete multiple dispatch solution. And it already has a numpy API with Dask/Sparse/CuPy/PyTorch/XND backends, some of which are complete enough to be usable: https://github.com/Quansight-Labs/uarray/tree/master/unumpy

The dispatch approach with uarray is certainly interesting. Though I'm still concerned about how we handle meta-arrays (like Dask, xarray, etc.). Please see this comment for details. It's unclear this has been addressed (though please correct me if I've missed something). I'd be interested in working with others at SciPy to try and hash out how we solve this problem.

Please see this comment for details. It's unclear this has been addressed (though please correct me if I've missed something).

I think the changes of the last week resolve that, but not sure - let's leave that for another thread.

I'd be interested in working with others at SciPy to try and hash out how we solve this problem.

I'll be there, would be great to meet you in person.

Maybe `np.coerce_like()` or `np.cast_like()` would be better names than `copy_like`, so that it's clear that copies are not necessarily required. The desired functionality is indeed pretty similar to the `.astype()` method, except we want to convert array types as well as dtypes, and it should be a function rather than a method so it can be implemented by either argument.

The dispatch approach with uarray is certainly interesting. Though I'm still concerned about how we handle meta-arrays (like Dask, xarray, etc.).

`uarray` has support for multiple backends, so something like this should work:

```
with ua.set_backend(inner_array_backend), ua.set_backend(outer_array_backend):
    s = unumpy.sum(meta_array)
```

This could be done by having the meta-array call `ua.skip_backend` inside of its implementation, or if the meta-array's backend returns `NotImplemented` on type mismatch.

cc: **@hameerabbasi**

I'll expand on this: As a general rule, for `dask.array`, anything with `da` would be written without a `skip_backend`. Anything with NumPy would need a `skip_backend`.

Or for `da` you can always skip dispatch and call your own implementation directly and have `skip_backend(dask.array)` everywhere.

As for dispatching functions that don't have an array attached, like `ones` and `cast`, you would just set a backend and be done. Same for `np.errstate` and `np.dtype`. There's an example covering `np.ufunc` in `unumpy`.

As for the original issue, `uarray` provides the `__ua_convert__` protocol, which does exactly this. An alternative would be for backends to override `asarray` directly.

Thanks for the heads up on `uarray`, **@rgommers**, **@peterbell10**, **@hameerabbasi**.

But as I see it, you _must_ set the proper backend before launching computation, is that correct? One of the advantages of `__array_function__` is that libraries can be entirely agnostic of other libraries; Dask doesn't need to know of the existence of CuPy, for example.

**@pentschev** This was the case until recently, when we added the ability to “register” a backend, but we recommend only NumPy (or a reference implementation) does this. Then users using Dask would need just a single set_backend.

Got it, I guess this is what **@rgommers** mentioned in https://github.com/numpy/numpy/issues/13831#issuecomment-507432311, pointing to the backends in https://github.com/Quansight-Labs/uarray/tree/master/unumpy.

Sorry for so many questions, but what if some hypothetical application relies on various backends, for example, both NumPy and Sparse, where depending on the user input, maybe everything will be NumPy-only, Sparse-only, or a mix of both. **@peterbell10** mentioned multiple backends are supported https://github.com/numpy/numpy/issues/13831#issuecomment-507458331, but can the selection of backend be made automatic or would there be a need to handle the three cases separately?

So, for this case, you would ideally register NumPy, use a context manager for Sparse, and return `NotImplemented` from Sparse when appropriate, which would make the call fall back to NumPy.

At SciPy, **@rgommers**, **@danielballan**, and I talked about this issue. We concluded it would be valuable to proceed with adding `duckarray` (using that name). That said, it sounded like this would be slated for 1.18, though please correct me if I misunderstood things. Given this, would it be alright to start a PR?

We concluded it would be valuable to proceed with adding `duckarray` (using that name). That said, it sounded like this would be slated for 1.18. Though please correct me if I misunderstood things. Given this, would be alright to start a PR?

This all sounds great to me, but it would be good to start with a short NEP spelling out the exact proposal. See https://github.com/numpy/numpy/issues/13831#issuecomment-507334210

Sure that makes sense. 🙂

As for the copying point that has been brought up previously, I'm curious if this isn't solved through existing mechanisms. In particular what about these lines?

```
a2 = np.empty_like(a1)
a2[...] = a1[...]
```

Admittedly it would be nice to get this down to one line. Just curious whether this already works for that use case or if we are missing things.

We concluded it would be valuable to proceed with adding duckarray (using that name).

This all sounds great to me, but it would be good to start with a short NEP spelling out the exact proposal. See #13831 (comment)

I have already started to write that, haven't been able to complete it yet though (sorry for my bad planning https://github.com/numpy/numpy/issues/13831#issuecomment-507336302).

As for the copying point that has been brought up previously, I'm curious if this isn't solved through existing mechanisms. In particular what about these lines?

`a2 = np.empty_like(a1) a2[...] = a1[...]`

Admittedly it would be nice to get this down to one line. Just curious whether this already works for that use case or if we are missing things.

You can do that, but it may require special copying logic (such as in CuPy https://github.com/cupy/cupy/pull/2079).

That said, a copy function may be best, to avoid this sort of additional code being necessary.

On the other hand, this would be sort of a replacement for `asarray`. So I was wondering if, instead of some new `copy_like` function, we would instead want to revisit the idea suggested by NEP 18:

These will need their own protocols:

...

array and asarray, because they are explicitly intended for coercion to actual numpy.ndarray object.

If there's a chance we would like to revisit that, maybe it would be better to start a new thread. Any ideas, suggestions, objections?

Just to be clear on my comment above, I myself don't know if a new protocol is a great idea (there are probably many cumbersome details involved that I don't foresee); I'm really just wondering whether that's an idea we should revisit and discuss.

The consensus from the dev meeting and sprint at SciPy'19 was: let's get 1.17.0 out the door and get some real-world experience with it before taking any next steps.

really just wondering if that's an idea we should revisit and discuss.

probably yes, but in a few months.

probably yes, but in a few months.

Ok, thanks for the reply!

As for the copying point that has been brought up previously, I'm curious if this isn't solved through existing mechanisms. In particular what about these lines?

`a2 = np.empty_like(a1) a2[...] = a1[...]`

Admittedly it would be nice to get this down to one line. Just curious whether this already works for that use case or if we are missing things.

My main issue with this is that it wouldn't work for duck arrays that are immutable, which is not terribly uncommon. Also, for NumPy the additional cost of allocating an array and then filling it may be nearly zero, but I'm not sure that's true for all duck arrays.
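A toy illustration of the immutability problem; `ImmutableArray` is a made-up stand-in for a read-only duck array:

```python
import numpy as np

class ImmutableArray:
    def __init__(self, data):
        self._data = np.asarray(data)
        self._data.setflags(write=False)  # freeze the underlying buffer

    def __setitem__(self, key, value):
        raise TypeError("ImmutableArray does not support item assignment")

a1 = ImmutableArray([1, 2, 3])

# The empty_like-then-assign copy idiom breaks down on such a type,
# because the second step requires in-place writes:
try:
    a1[...] = [4, 5, 6]
except TypeError as e:
    print("copy-by-assignment failed:", e)
```

A dedicated copy function could let the library produce the new array in one step, with no intermediate mutable state.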

`a2 = np.empty_like(a1); a2[...] = a1[...]`

You can do that, but it may require special copying logic (such as in CuPy cupy/cupy#2079). That said, a copy function may be best, to avoid this sort of additional code being necessary.

On the other hand, this would be sort of a replacement for `asarray`. So I was wondering if, instead of some new `copy_like` function, we would instead want to revisit the idea suggested by NEP 18: "array and asarray, because they are explicitly intended for coercion to actual numpy.ndarray object." If there's a chance we would like to revisit that, maybe it would be better to start a new thread. Any ideas, suggestions, objections?

I don't think it's a good idea to change the behavior of `np.array` or `np.asarray` with a new protocol. Their established meaning is to cast to NumPy arrays, which is basically why we need `np.duckarray`.

That said, we could consider adding a `like` argument to `duckarray`. That would require changing the protocol from the simplified proposal above -- maybe to use `__array_function__` instead of a dedicated protocol like `__duckarray__`? I haven't really thought this through.

`a2 = np.empty_like(a1) a2[...] = a1[...]`

My main issue with this is that it wouldn't work for duck arrays that are immutable, which is not terribly uncommon. Also, for NumPy the additional cost of allocating an array and then filling it may be nearly zero, but I'm not sure that's true for all duck arrays.

That's fair. Actually we can already simplify things. For instance this works with CuPy and Sparse today.

```
a2 = np.copy(a1)
```

That's fair. Actually we can already simplify things. For instance this works with CuPy and Sparse today.

`a2 = np.copy(a1)`

Yes, but we also want "copy this duck-array into the type of this other duck-array"

I don't think it's a good idea to change the behavior of `np.array` or `np.asarray` with a new protocol. Their established meaning is to cast to NumPy arrays, which is basically why we need `np.duckarray`.

I'm also unsure about this, and I was reluctant even to raise this question, this is why I hadn't until today.

That said, we could consider adding a like argument to duckarray. That would require changing the protocol from the simplified proposal above -- maybe to use __array_function__ instead of a dedicated protocol like __duckarray__? I haven't really thought this through.

I don't know if there would be any complications with that, we probably need some careful thought, but I tend to like this idea. It would seem redundant on various levels, but maybe, to follow the existing pattern, instead of adding a `like` parameter we could have `duckarray` and `duckarray_like`?

Yes, but we also want "copy this duck-array into the type of this other duck-array"

What about basing this around `np.copyto`?

What about basing this around `np.copyto`?

Feel free to correct me if I'm wrong, but I'm assuming you mean something like:

```
np.copyto(cupy_array, numpy_array)
```

That could work, assuming NumPy is willing to change the current behavior. E.g., `asarray` always implies the destination is a NumPy array; does `copyto` make the same assumption?

`np.copyto` already supports dispatching with `__array_function__`, but it's roughly equivalent to:

```
def copyto(dst, src):
    dst[...] = src
```

We want the equivalent of:

```
def copylike(src, like):
    dst = np.empty_like(like)
    dst[...] = src
    return dst
```

`np.copyto` already supports dispatching with `__array_function__`, but it's roughly equivalent to `def copyto(dst, src): dst[...] = src`. We want the equivalent of `def copylike(src, like): dst = np.empty_like(like); dst[...] = src; return dst`.

Correct, this is what we want. `copyto` gets dispatched and works if source and destination have the same type; we need something that allows dispatching to the destination array's library.

Well, `copyto` could still make sense depending on how we think of it. Take for example the following use case.

```
np.copyto(cp.ndarray, np.random.random((3,)))
```

This could translate into something like "allocate and copy over the data", as we have discussed. If we dispatch around `dst` (`cp.ndarray` in this case), then libraries with immutable arrays could implement this in a suitable manner as well. It also saves us from adding a new API (that NumPy merely provides, but doesn't use), which seemed to be a concern.

Just to surface another thought that occurred to me recently: it's worth thinking about what these APIs will mean downstream between other libraries (for instance how Dask and Xarray interact).
