https://numpy.org/doc/1.18/reference/generated/numpy.vectorize.html
This tutorial mentions that vectorize's implementation is essentially a for loop. But as far as I know, a vectorized function will use SIMD, so is it accurate to say numpy.vectorize's implementation is essentially a for loop? And if so, is it faster than an unvectorized function only because its loop is implemented in C?
Many thanks in advance.
Yes. In the context of interpreted numerical array programming languages like Python (with numpy) and MATLAB™, we often use "vectorization" to refer to replacing explicit loops in the interpreted programming language with a function (or operator) that takes care of all of the looping logic internally. In numpy, the ufuncs implement this logic. This is unrelated to the usage of "vectorization" to refer to using SIMD CPU instructions that compute over multiple inputs concurrently, except that they both use a similar metaphor: they are like their "scalar" counterparts, but perform the computation over multiple input values with a single invocation.
With numpy.vectorize(), there is usually not a whole lot of speed benefit over the explicit Python for loop. The main point of it is to turn the Python function into a ufunc, which implements all of the broadcasting semantics and thus deals with any size of inputs. The Python function that's being "vectorized" still takes up most of the time, as does converting the raw value of each element to a Python object to pass to the function. You wouldn't expect np.vectorize(lambda x, y: x + y) to be as fast as the ufunc np.add, which is C both in the loop and in the contents of the loop.
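A small sketch of that difference (variable names here are illustrative, not from the thread): np.vectorize gives you ufunc-style broadcasting, but the addition itself still runs as a Python call per element, whereas np.add does both the loop and the addition in C.

```python
import numpy as np

# np.vectorize wraps a Python function so it behaves like a ufunc:
# inputs are broadcast against each other, but the wrapped function
# is still invoked once per output element at the Python level.
add_py = np.vectorize(lambda x, y: x + y)

a = np.arange(3)
print(add_py(a, 10))   # the scalar 10 is broadcast: [10 11 12]
print(np.add(a, 10))   # the C ufunc broadcasts the same way: [10 11 12]
```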
Thank you for your detailed explanation. But to be clear, let me take an example.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': range(1000000), 'b': range(1, 1000001)})
# method1
df.loc[:, 'c'] = df.apply(lambda x: x['a'] + x['b'], axis=1)
# method2
df.loc[:, 'c'] = np.vectorize(lambda x, y: x + y)(df['a'], df['b'])
# method3
df.loc[:, 'c'] = np.add(df['a'], df['b'])
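For what it's worth, all three methods compute identical values; they differ only in where the loop and the loop body run. A quick sanity check (with a smaller frame, just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(1000), 'b': range(1, 1001)})

c1 = df.apply(lambda x: x['a'] + x['b'], axis=1)         # method1: Python loop
c2 = np.vectorize(lambda x, y: x + y)(df['a'], df['b'])  # method2: C loop, Python body
c3 = np.add(df['a'], df['b'])                            # method3: C loop, C body

# all three agree element-wise
assert (c1.to_numpy() == c2).all()
assert (c2 == c3.to_numpy()).all()
```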
So with your explanation, I guess:
method | loop in C | loop content in C | use SIMD
-- | -- | -- | --
1 | × | × | ×
2 | √ | × | ×
3 | √ | √ | √
Right?
np.add is faster than np.vectorize(lambda x, y: x + y) because it avoids converting C doubles into Python objects and the Python function call overhead. It's possible that it also uses SIMD instructions, depending on whether or not you have the AVX2 extensions, but that's not why it's faster.
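One way to see that per-element cost directly is to count how often the wrapped Python function actually runs (a toy sketch; passing otypes keeps np.vectorize from making an extra probing call to infer the output dtype):

```python
import numpy as np

calls = 0

def py_add(x, y):
    global calls
    calls += 1        # count Python-level invocations
    return x + y

# otypes pins the output dtype, so np.vectorize doesn't make an
# extra trial call just to determine it
v_add = np.vectorize(py_add, otypes=[int])

out = v_add(np.arange(5), np.arange(5))
print(calls)  # 5 — one Python call (with object conversion) per element
print(out)    # [0 2 4 6 8]
```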
I got it. Thanks.
You can use numba's vectorize to produce ufuncs that operate in parallel without Python overheads: https://numba.pydata.org/numba-doc/latest/user/vectorize.html