Numpy: BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object)

Created on 8 Mar 2017  ·  7Comments  ·  Source: numpy/numpy

I have a list of dictionaries. I'm trying to use np.vectorize to apply a function that modifies dictionary elements for each dictionary in the list. The results seem to show that vectorize is acting twice on the first element. Is this a bug that can be fixed?(perhaps related to the fact that vectorize checks type on the first element?) Below are some example cases and output:

A simple test case with no dictionary modifications:

def fcn1(x):
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn1)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[1 1 1]
[{'b': 1}, {'b': 1}, {'b': 1}]

Now modify the dictionary and see that the function is applied twice to the first element:

def fcn2(x):
    x['b'] += 1
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn2)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[3 2 2]
[{'b': 3}, {'b': 2}, {'b': 2}]  

Try a different modification to check consistency of the bug:

def fcn3(x):
    x['b'] *= 2
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn3)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[4 2 2]
[{'b': 4}, {'b': 2}, {'b': 2}]    

You can do the same thing without actually providing a return value (which is how i'm trying to use it in my use case):

def fcn4(x):
    x['b'] += 1
a = [{'b': 1} for _ in range(3) ]
print(a)
np.vectorize(fcn4)(a)
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[{'b': 3}, {'b': 2}, {'b': 2}]

And by the way, there is nothing special about a list of length 3, you can change that and see the same behavior of only the first element being double-modified.

I'm confirmed the behavior using both Numpy version 1.11.3 and 1.12.0

EDIT:
I found a work-around that also confirms its a "testing the type on the first element" issue. If you specify the otypes argument, the first element doesn't get hit twice:

def fcn(x):
    x['b'] += 1
    return x['b']
a = [{'b': 1} for _ in range(3)]
print a
print np.vectorize(fcn, otypes=[dict])(a)
print a, '\n\n'

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[2 2 2]
[{'b': 2}, {'b': 2}, {'b': 2}]
00 - Bug

Most helpful comment

"If otypes is not specified, then a call to the function with the first argument will be used to determine the number of outputs. The results of this call will be cached if cache is True to prevent calling the function twice. However, to implement the cache, the original function must be wrapped which will slow down subsequent calls, so only do this if your function is expensive."
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

So it's well-documented behavior, but does seem counterintuitive. So maybe its more of an enhancement than a bug fix.

All 7 comments

I just edited my post with a new test that seems to confirm checking the first item type IS what causes the problem

Simpler testcase:

a = np.array([1, 2, 3])
def f(x):
    print('got', x)
    return x
fv = np.vectorize(f)
y = fv(a)

Gives:

got 1
got 1
got 2
got 3

"If otypes is not specified, then a call to the function with the first argument will be used to determine the number of outputs. The results of this call will be cached if cache is True to prevent calling the function twice. However, to implement the cache, the original function must be wrapped which will slow down subsequent calls, so only do this if your function is expensive."
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

So it's well-documented behavior, but does seem counterintuitive. So maybe its more of an enhancement than a bug fix.

ah, clearly I didn't read all of the docs well enough. That does fix the issue of double calls but at the cost of slower execution. So I guess there is no way to prevent this double-calling while still maintaining performance?

for example, in numpy/lib/function_base.py in the vectorize class function _get_ufunc_and_otypes(), I would naively think you could modify these lines:

inputs = [arg.flat[0] for arg in args]
outputs = func(*inputs)

to:

#earlier
import copy

...

inputs = copy.deepcopy([arg.flat[0] for arg in args])
outputs = func(*inputs)

And then you would not have to use the cacheing or specify otypes but I think you avoid hitting the actual mutable elements twice. But I'm ignorant to how much of a performance hit that would give compared to the cacheing.

I'm just getting the sense that the caching behavior was designed with expensive function execution time in mind, not thinking about the case of a function modifying a mutable object. I would think it's potentially possible to accommodate mutable object modification without the performance hit of caching that had a long function execution time in mind.

I think that for vectorized operations on arrays of very large objects, it might actually be more computationally intensive to make a deep copy of the first element than it would cost to apply the function. So it might be a good idea to have that functionality if the user doesn't specify otype, but then say in the documentation that performance may take a hit unless otype is specified for arrays of large objects. @eric-wieser what do you think?

copy.deepcopy() is not safe on many inputs, so unfortunately that's not a viable option.

The only way to fix this would be to rewrite the core of vectorize, which currently uses numpy.frompyfunc to create an actually numpy ufunc. An alternate inner loop would need to be created, similar to what we use for np.apply_along_axis. Unfortunately, to avoid performance degradation, I think we would need to do the loop in C (like frompyfunc does in the ufunc it constructs).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

toddrjen picture toddrjen  ·  4Comments

inducer picture inducer  ·  3Comments

navytux picture navytux  ·  4Comments

MareinK picture MareinK  ·  3Comments

kevinzhai80 picture kevinzhai80  ·  4Comments