<p>Is the implementation of numpy.vectorize essentially a for loop?</p>

Created on July 6, 2020 · 5 comments · Source: numpy/numpy

https://numpy.org/doc/1.18/reference/generated/numpy.vectorize.html
The documentation page above says that the implementation of vectorize is essentially a for loop. But as far as I know, a vectorized function uses SIMD, so is it accurate to say that the implementation of numpy.vectorize is essentially a for loop? If so, is it faster than the non-vectorized function only because the loop is implemented in C?

Thanks in advance.
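The "essentially a for loop" description can be checked directly. The sketch below is illustrative, not NumPy's source: it wraps an ordinary scalar function in `np.vectorize` and counts how often Python is called (`add_one` and `count_calls` are hypothetical names). Passing `otypes` avoids the extra trial call that `np.vectorize` otherwise makes on the first element to infer the output dtype, so the count equals the number of elements.

```python
import numpy as np


def count_calls(func):
    """Wrap func so we can see how many times np.vectorize invokes it."""
    def wrapper(*args):
        wrapper.calls += 1
        return func(*args)
    wrapper.calls = 0
    return wrapper


def add_one(x):
    # An ordinary scalar Python function (hypothetical example).
    return x + 1.0


x = np.arange(5.0)

counted = count_calls(add_one)
# otypes is given, so np.vectorize does not make an extra trial call
# to guess the output dtype.
vec_result = np.vectorize(counted, otypes=[np.float64])(x)

# The "for loop" equivalent: call the Python function once per element.
loop_result = np.array([add_one(v) for v in x], dtype=np.float64)

print(counted.calls)                            # 5: one Python call per element
print(np.array_equal(vec_result, loop_result))  # True
```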

33 - Question

Yes. In the context of interpreted numerical array programming languages like Python (with numpy) and MATLAB™, we often use "vectorization" to mean replacing explicit loops in the interpreted language with a function (or operator) that takes care of all of the looping logic internally. In numpy, the ufuncs implement this logic. This is unrelated to the use of "vectorization" to mean using SIMD CPU instructions that compute over multiple inputs concurrently, except that both use a similar metaphor: each is like its "scalar" counterpart but performs the computation over multiple input values in a single invocation.
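A minimal sketch of "vectorization" in this array-programming sense, assuming two float64 arrays of equal length: the explicit loop runs one interpreter iteration per element, while a single call to the ufunc `np.add` (the same operation behind `a + b`) does all the looping internally.

```python
import numpy as np

a = np.arange(1_000, dtype=np.float64)
b = np.arange(1_000, dtype=np.float64)

# Explicit loop in the interpreted language: one Python iteration per element.
out_loop = np.empty_like(a)
for i in range(a.size):
    out_loop[i] = a[i] + b[i]

# "Vectorized" in the array-programming sense: one ufunc call,
# with the looping logic handled inside np.add.
out_ufunc = np.add(a, b)  # equivalent to a + b

print(np.array_equal(out_loop, out_ufunc))  # True
```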

With numpy.vectorize(), there is usually not much speed benefit over an explicit Python for loop. Its main point is to turn a Python function into a ufunc, which implements all of the broadcasting semantics and thus handles inputs of any size. The Python function being "vectorized" still takes up most of the time, along with converting each element's raw value into a Python object to pass to the function. You would not expect np.vectorize(lambda x, y: x + y) to be as fast as the ufunc np.add, which is in C for both the loop and the contents of the loop.
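To illustrate the broadcasting point, here is a sketch with a hypothetical scalar function `clipped_diff` whose Python `if` only works on single values. `np.vectorize` lets it accept a `(3, 1)` array and a `(4,)` array and broadcast them to `(3, 4)`. `np.frompyfunc`, which `np.vectorize` builds on in current NumPy releases, gives the same values as an object-dtype array. In both cases the Python function still runs once per output element.

```python
import numpy as np


def clipped_diff(x, y):
    # Scalar-only logic (hypothetical): a Python `if` cannot act element-wise on arrays.
    return x - y if x > y else 0.0


vdiff = np.vectorize(clipped_diff, otypes=[np.float64])

col = np.array([[1.0], [5.0], [9.0]])  # shape (3, 1)
row = np.array([0.0, 2.0, 4.0, 6.0])   # shape (4,)

grid = vdiff(col, row)  # broadcast like any ufunc
print(grid.shape)       # (3, 4)

# np.frompyfunc builds an object-dtype ufunc directly from the Python function.
grid_obj = np.frompyfunc(clipped_diff, 2, 1)(col, row)
print(grid_obj.dtype)                                     # object
print(np.array_equal(grid, grid_obj.astype(np.float64)))  # True
```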


Thanks for the detailed explanation. But to make sure I understand, let me take an example:

import pandas as pd
import numpy as np
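df = pd.DataFrame({'a': range(100000), 'b': range(1, 100001)})
# method1: row-wise DataFrame.apply, calls the Python lambda row by row
df.loc[:, 'c'] = df.apply(lambda x: x['a'] + x['b'], axis=1)
# method2: np.vectorize, calls the Python lambda element by element
df.loc[:, 'c'] = np.vectorize(lambda x, y: x + y)(df['a'], df['b'])
# method3: the np.add ufunc (same as df['a'] + df['b'])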
df.loc[:, 'c'] = np.add(df['a'], df['b'])

So, based on your explanation:

Method | Loop in C | Loop body in C | Uses SIMD
--- | --- | --- | ---
1 | × | × | ×
2 | √ | × | ×
3 | √ | √ | √

Right?

np.add is faster than np.vectorize(lambda x, y: x + y) because it avoids converting C doubles into Python objects and the overhead of calling a Python function. It may also use SIMD instructions, depending on whether your CPU has the AVX2 extension, but that is not why it is faster.
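A rough timing sketch of this claim, assuming float64 inputs of the same size as the example above. Absolute numbers depend entirely on the machine, the NumPy build, and the CPU features available, so none are shown; the sketch first checks that all three approaches agree, then takes the best of three runs for each.

```python
import timeit

import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.arange(1, 100_001, dtype=np.float64)

# otypes is given so np.vectorize skips its dtype-probing trial call.
vec_add = np.vectorize(lambda x, y: x + y, otypes=[np.float64])

candidates = {
    # Loop and body both in Python; each element becomes a NumPy scalar object.
    "python loop": lambda: np.array([x + y for x, y in zip(a, b)]),
    # Loop driven from C, but every element is boxed into a Python object
    # and passed through a Python function call.
    "np.vectorize": lambda: vec_add(a, b),
    # Loop and loop body both in C (possibly with SIMD, depending on the CPU).
    "np.add": lambda: np.add(a, b),
}

# Sanity check: all approaches compute the same result.
expected = np.add(a, b)
for name, fn in candidates.items():
    assert np.array_equal(fn(), expected), name

# Best of 3 repeats, one call each; results are machine-dependent.
timings = {name: min(timeit.repeat(fn, number=1, repeat=3))
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:>12}: {seconds * 1e3:8.2f} ms")
```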


Got it. Thanks.

You can use numba's vectorize to create ufuncs that run in parallel without the Python overhead.

https://numba.pydata.org/numba-doc/latest/user/vectorize.html
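A minimal sketch of that approach, assuming numba is installed. It is an optional third-party package, so this sketch falls back to plain `np.add` when the import fails. `numba.vectorize` with an explicit signature list and `target="parallel"` compiles the scalar function so that both the loop and its body run as machine code across threads; the single float64 signature and the name `parallel_add` are illustrative choices.

```python
import numpy as np

try:
    from numba import vectorize  # optional dependency
except ImportError:
    vectorize = None

if vectorize is not None:
    # Eager compilation: the "parallel" target needs explicit signatures.
    # No Python object is created per element inside the compiled loop.
    @vectorize(["float64(float64, float64)"], target="parallel")
    def parallel_add(x, y):
        return x + y
else:
    # Fallback for this sketch only: NumPy's own C ufunc.
    parallel_add = np.add

a = np.linspace(0.0, 1.0, 1_000_000)
b = np.ones_like(a)
c = parallel_add(a, b)
print(np.allclose(c, a + 1.0))  # True
```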
