Scikit-learn: GridSearchCV freezes indefinitely with multithreading enabled (i.e. w/ n_jobs != 1)

Created on 12 Aug 2015 · 88 comments · Source: scikit-learn/scikit-learn

I've been running into this problem intermittently with GridSearchCV for over a year now, across Python 2.7, 3.3 and 3.4, two jobs, several different Mac OS X platforms/laptops, and many different versions of numpy and scikit-learn (I keep them updated pretty well).

I've tried all of these suggestions, and none of them works every time:

https://github.com/scikit-learn/scikit-learn/issues/3605 - Setting the multiprocessing start method to 'forkserver'
https://github.com/scikit-learn/scikit-learn/issues/2889 - Having problems ONLY when a custom scoring function is passed (I have absolutely had this problem: the same GridSearchCV call with n_jobs != 1 freezes with a custom scorer but runs fine without one)
https://github.com/joblib/joblib/issues/138 - Setting environment variables for the MKL thread counts (I tried this when running a numpy/sklearn built against MKL from the Anaconda distribution)
Scaling inputs and making sure there are no errors with n_jobs=1 - I'm completely sure that what I'm trying to run on multiple threads runs correctly, and quickly, on one thread
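
The third suggestion in the list above amounts to capping native thread pools through environment variables. A minimal sketch of that idea follows; the helper name `limit_native_threads` is hypothetical, and which variables actually matter depends on how numpy was built, which is an assumption here.

```python
import os

# Variables commonly read by threaded BLAS/OpenMP runtimes. Whether a
# given numpy build honours each one is an assumption, not a guarantee.
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n_threads=1, environ=os.environ):
    """Set the thread-count variables and return the values that were set."""
    for name in THREAD_ENV_VARS:
        environ[name] = str(n_threads)
    return {name: environ[name] for name in THREAD_ENV_VARS}


# Call this before numpy or sklearn are imported, because the native
# libraries usually read these variables once, when they are loaded.
print(limit_native_threads(1))
```

As the list item says, this did not reliably cure the hang for the original poster, so it is best treated as one thing to rule out rather than a fix.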

It's a very frustrating problem that always seems to pop right back up just when I'm confident it's gone, and the ONLY workaround that works 100% of the time for me is going into the GridSearchCV source of whatever sklearn distribution I'm on and manually changing the backend set in the call to Parallel to 'threading' (instead of multiprocessing).

I haven't benchmarked the difference between that hack and setting n_jobs=1, but is there any reason to expect gains from the threading backend over no parallelization at all? It certainly wouldn't be as good as multiprocessing, but at least it's more stable.

btw, the most recent versions I've had this same problem on are:

  • Mac OS 10.9.5
  • Python 3.4.3 :: Continuum Analytics, Inc.
  • scikit-learn==0.16.1
  • scipy==0.16.0
  • numpy==1.9.2
  • pandas==0.16.2
  • joblib==0.8.4

Most helpful comment

@eric-czech If you are running Python 3.4 or 3.5, please set the following environment variable and then restart your Python program:

export JOBLIB_START_METHOD="forkserver"

as explained in the joblib docs. The forkserver mode is not enabled by default because it breaks interactively defined functions.
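
The environment variable above asks joblib to use the forkserver start method. As a rough, standard-library-only illustration of what a start method is (this is not joblib's own code, and the helper name `pick_start_method` is hypothetical), the sketch below checks whether forkserver exists on the current platform and builds a multiprocessing context for it.

```python
import multiprocessing


def pick_start_method(preferred="forkserver"):
    """Return `preferred` if this platform offers it, otherwise 'spawn'."""
    available = multiprocessing.get_all_start_methods()
    return preferred if preferred in available else "spawn"


method = pick_start_method()
# A context object scopes the choice to the code that uses it, instead
# of changing the process-wide default start method.
ctx = multiprocessing.get_context(method)
print("start method:", ctx.get_start_method())
```

Workers started with forkserver or spawn do not inherit the parent's interpreter state, so functions have to be importable by name in the worker. That is consistent with the remark above that forkserver breaks interactively defined functions.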

All 88 comments

Do you consistently have problems on that platform??

Regarding multithreading: there are likely to be some estimators for which multithreading gives substantial gains, namely those where most of the work is done in numpy or Cython operations without the GIL. Really, I don't think this has been evaluated much; backend='threading' is a pretty recent thing.

The real question is what more we can do to identify what the problem is.

For a start, what base estimators have you considered?
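
To illustrate the point above about work done without the GIL, here is a small standard-library sketch. It uses hashlib as a stand-in for numpy or Cython code, since hashlib also releases the GIL while processing large buffers; that substitution is an assumption made for illustration and says nothing about any particular estimator.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # hashlib releases the GIL while hashing large buffers, so several
    # of these calls can make progress at once on different threads.
    return hashlib.sha256(block).hexdigest()


# Eight 1 MB buffers, each filled with a different byte value.
blocks = [bytes([i]) * 1_000_000 for i in range(8)]

serial = [digest(b) for b in blocks]
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

# Threads change the scheduling, not the results.
print(threaded == serial)
```

Whether threads actually beat a serial loop depends on how much of the runtime is spent in GIL-free code, which, as the comment says, has not been evaluated much for scikit-learn estimators.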

@jnothman By platform, do you mean OS X 10.9.5? If so, then yes, this isn't the first time I've had that problem.

One major detail I left out before is that I'm always using IPython notebooks when I have the problem. If I add a "scoring" argument with n_jobs != 1, GridSearchCV hangs forever, but if I remove that argument, all is well. Even if the scoring function I pass does nothing but return a constant float value, it still freezes (but does exactly what you'd expect with n_jobs=1).

Re: threading, that's good to hear, so perhaps that option might actually make sense for GridSearchCV.

As for which estimators I've had problems with, I'm not sure I can narrow it down much. I generally try as many as I can manage, so to get something useful here I just checked which estimators reproduce the conditions I mentioned above, and found that all of them do (or at least the ones I tried: LogisticRegression, SGDClassifier, GBRT and RF).

I'd love to do anything I can to give you more to go on, though I'm not sure what context is generally most helpful for multithreading issues like this. Do you have any suggestions for me?

Do you use numpy linked against the Accelerate framework?

No, unless I'm missing something. I thought the installed numpy version changes when you do that, or at least that the accelerate package would be present:

(research3.4) eczech$ pip freeze | grep numpy
numpy==1.9.2
(research3.4) eczech$ conda update accelerate
Error: package 'accelerate' is not installed in /Users/eczech/anaconda/envs/research3.4

Forgive my ignorance in not being able to answer that with 100% certainty, but I definitely didn't do anything intentionally to install it.

conda accelerate is not the same as Apple Accelerate:
http://docs.continuum.io/accelerate/index
https://developer.apple.com/library/mac/documentation/Accelerate/Reference/AccelerateFWRef/

conda accelerate provides MKL-accelerated versions of packages; Apple Accelerate is Apple's alternative to MKL.

Can you give us the output of numpy.__config__.show()?

Multiprocessing doesn't work with Accelerate, IIRC. Ping @ogrisel

Sure:

np.__config__.show()
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
blas_opt_info:
    extra_compile_args = ['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
lapack_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_compile_args = ['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3)]
blas_mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE

Yeah, that is a known issue that I can't find in the issue tracker. Accelerate doesn't work with multiprocessing.
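
The diagnosis above was read off the blas_opt_info and lapack_opt_info entries by eye. A small sketch of doing the same check programmatically follows; both helper names are hypothetical, the substring test is only a heuristic, and the layout of the numpy.__config__.show() output varies between numpy versions.

```python
import contextlib
import io


def numpy_config_text():
    """Capture what numpy.__config__.show() prints (numpy assumed installed)."""
    import numpy

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.__config__.show()
    return buffer.getvalue()


def mentions_accelerate(config_text):
    """Heuristic: does the build configuration reference Apple's Accelerate?"""
    return "accelerate" in config_text.lower()


# A line taken from the output posted in this thread.
sample = "extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']"
print(mentions_accelerate(sample))
```

On a machine with numpy installed, `mentions_accelerate(numpy_config_text())` applies the same test to the local build.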

I'm a bit confused. The threading backend will only do any work when the GIL is released, right?

Gotcha, so do you know how I should rebuild numpy? Should I pip install it instead of using the conda package? Or would I be better off building from source and making sure none of the Apple Accelerate arguments are present?

It sounds like this may be a non-starter of an issue, though. Close away if I'm beating a dead horse.

If you can get conda accelerate, it will work ;)

Maybe we could have joblib bail out?

Ah great, Continuum must have paid off Apple to do that, haha.

Any $0 suggestions? And thanks for the insight either way.

Oh, and I know this has been asked before, but is the fact that I only have this problem on my current platform when using a custom scoring function relevant at all? For the life of me I can't work out from the grid_search.py source code what could be problematic, but could it have something to do with pickling the custom function?

And somewhat unrelated to that, I also just remembered that I previously tried to get around this by creating a modified version of GridSearchCV that uses the IPython parallel backend instead. So, assuming I revisit that solution, would it be worth sharing in some way? It worked fine but was a bit of a pain to use, because any custom classes and functions had to be available on the pythonpath rather than in the notebook itself. Still, if there are no better options, it may have legs.

You can link against ATLAS, but that will be slower.
Maybe there is a free MKL-linked numpy for OS X? There is one for Windows.

[Continuum accelerate is free for academics.]

I'm pretty sure this is entirely unrelated to using a custom scoring function.
Can you provide a self-contained snippet that hangs with a custom scoring function but not without one?

The custom scoring function might well be relevant (e.g. pickling issues or nested parallelism could be in play). Could we see the code?

Or do you just mean a standard metric with make_scorer?

Sure, here is the relevant part. It looks like things are fine with make_scorer but not with a custom function:

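import functools

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold
from sklearn.metrics import average_precision_score, make_scorer

# d_in (a DataFrame), features and responses come from the poster's data (not shown)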

res = []
clfs = []

for response in responses:
    X, y = d_in[features], d_in[response]
    for i, (train, test) in enumerate(StratifiedKFold(y, 5)):
        X_train, y_train, X_test, y_test = X.iloc[train], y.iloc[train], X.iloc[test], y.iloc[test]
        clf = LogisticRegression(penalty='l1')
        grid = {
            'class_weight': [{0: 1, 1: 10}, {0: 1, 1: 100}, {0: 1, 1: 1000}],
            'C': np.logspace(-3, 0, num=4)
        }

        # Using make_scorer doesn't cause any issues
        # clf = GridSearchCV(clf, grid, cv=StratifiedKFold(y_train, 3),  
        #                    scoring=make_scorer(average_precision_score), n_jobs=-1)

        # This however is a problem:
        def avg_prec_score(estimator, X, y):
            return average_precision_score(y, estimator.predict_proba(X)[:, 1])
        clf = GridSearchCV(clf, grid, cv=StratifiedKFold(y_train, 5),  
                           scoring=avg_prec_score, n_jobs=-1)

        clf = clf.fit(X_train, y_train)
        print('Best parameters for response {} inferred in fold {}: {}'.format(response, i, clf.best_params_))

        y_pred = clf.predict(X_test)
        y_proba = clf.predict_proba(X_test)

        clfs.append((response, i, clf))
        res.append(pd.DataFrame({
            'y_pred': y_pred, 
            'y_actual': y_test, 
            'y_proba': y_proba[:,1],
            'response': np.repeat(response, len(y_pred))
        }))

res = functools.reduce(pd.DataFrame.append, res)
res.head()

I'll work on a self-contained version that includes some version of the data I'm using (but it will take longer). In the meantime, though, pickling of the custom function sounds like a good lead. I tried it several more times to be sure, and it hangs 100% of the time with the custom function and 0% of the time when using make_scorer with a known, imported metric function.

And is that in main (i.e. the top-level script being interpreted) or in an imported module?

Oh, it's ipynb. Hmm, interesting. Yes, pickling might be a problem..?

It's in a notebook.

I'll try importing it from a module instead and see how that goes.

Well, what do you know: it works fine if defined outside of the notebook.

I have essentially the same code running in Python 2.7 (I need an older lib) as well as this code in Python 3.4. In 2.7 I get the hang regardless of whether I use a custom function or make_scorer, but I think this solves all my problems in the newer version, so I can live with workarounds in the old one.

Is there anything else I can do to track down why pickling functions defined in a notebook might be a problem?

Well, we'd like to understand:

  • Is pickling and unpickling generally a problem for locally defined functions on that platform, or are we hitting a particular snag?
  • Why, if pickling is the problem, does it hang rather than raise an exception? Could you check whether replacing the pickle.dumps(function) check at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/externals/joblib/parallel.py#L150 with pickle.loads(pickle.dumps(function)) results in an error? (To explain, this is a safety check to ensure picklability before running multiprocessing.)
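
The round-trip check proposed above can be sketched outside joblib as follows. The helper names are hypothetical, and this is only an illustration of the idea, not the check joblib performs. It relies on the fact that pickle stores a function as a reference (module name plus qualified name), so the result depends on whether that name can be looked up again.

```python
import math
import pickle


def check_round_trip(func):
    """Return (restored, None) on success or (None, error) if pickling fails."""
    try:
        return pickle.loads(pickle.dumps(func)), None
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return None, exc


def make_local_scorer():
    # A function defined inside another function cannot be looked up by
    # name from its module, so pickle cannot store a reference to it.
    def scorer(estimator, X, y):
        return 0.5

    return scorer


restored, error = check_round_trip(math.sqrt)  # importable by name
_, local_error = check_round_trip(make_local_scorer())  # not importable by name
print(restored is math.sqrt, local_error is not None)
```

A function that round-trips in the defining process can still be unavailable in a worker process that cannot import the defining module, so a check like this does not rule out every pickling-related cause.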

@ogrisel might be interested in this.

From what I've seen on Windows, notebooks have weird interactions with multiprocessing.

Have you tried just pickling and unpickling the function defined in the same notebook?

Today I happened to see https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries. Isn't it related?
Maybe you should upgrade to Python 3.4 or newer?

Sorry, I was on a long vacation. To answer your questions, though:

  1. re @jnothman: I put pickle.loads(pickle.dumps(function)) in parallel.py with a print statement after it to make sure it was executing cleanly, and there were no problems there. To be clear, GridSearchCV.fit called from the notebook still hangs as before, with no change (except for the print statement showing up 16 times with n_jobs=-1).
  2. re @amueller: If I'm understanding you correctly, I ran something like this in the notebook with no problems:
def test_function(x):
    return x**2
pickle.loads(pickle.dumps(test_function))(3)
# 9
  3. re @olologin: I'm on 3.4.3. Or more specifically: '3.4.3 |Continuum Analytics, Inc.| (default, Mar 6 2015, 12:07:41) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

I haven't read the conversation above, but I'd like to note that this minimal test failed under Travis's Python 2.6 build while passing with a similar configuration on my PC... (suggesting that it fails when n_jobs = -1 is set on a single-core machine with old python/joblib/scipy versions?)

def test_cross_val_score_n_jobs():
    # n_jobs = -1 seems to hang in older versions of joblib/python2.6
    # See issue 5115
    cross_val_score(LinearSVC(), digits.data, digits.target, cv=KFold(3),
                    scoring="precision_macro", n_jobs=-1)
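
For context on the single-core hypothesis above, the sketch below shows how a negative n_jobs is conventionally turned into a worker count, following the rule stated in the joblib documentation (n_cpus + 1 + n_jobs). The function name `resolve_n_jobs` is hypothetical and this is not joblib's implementation.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Map an n_jobs value to a worker count using the documented rule."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # -1 means all CPUs, -2 all but one, and so on; never fewer than 1.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


for cpus in (1, 2, 8):
    print(cpus, "CPU(s): n_jobs=-1 ->", resolve_n_jobs(-1, n_cpus=cpus))
```

Under this rule, n_jobs=-1 on a single-core machine resolves to one worker, which may help when comparing the CI machine with a multi-core desktop.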

+1 for having this problem; happy to provide details if it would help.

I'm having the same problem on both OS X 10.11.4 and Ubuntu 14.04 with the latest software installed.

# Metrics
B_R = 10.0

def raw_TPR(y_true, y_pred):
    return np.sum((y_true == 1) & (y_pred == y_true))

def raw_FPR(y_true, y_pred):
    return np.sum((y_true == 0) & (y_pred != y_true))

def AMS(y_true, y_pred):
    print("Hello")
    tpr = raw_TPR(y_true, y_pred)
    fpr = raw_FPR(y_true, y_pred)
    score = np.sqrt(2 * ((tpr + fpr + B_R) * np.log(1 + tpr / (fpr + B_R))) - tpr)
    return score


# Grid search

param_grid = {
    "max_depth":[6, 10],
    "learning_rate":[0.01, 0.5],
    "subsample":[0, 1],
    "min_child_weight":[0.1, 1],
    "colsample_bytree":[0.1, 1],
    "base_score":[0.1, 1],
    "gamma":[0.5, 3.5]
}

scorer = make_scorer(AMS, greater_is_better=True)


clf = XGBClassifier()
gridclf = GridSearchCV(clf, param_grid, scorer, n_jobs=-1, verbose=2)
gridclf.fit(X_train, y_train)

Actually, the only case in which this code does not freeze is n_jobs=1.

This should now work by default on Python 3 and be a wontfix on Python 2, right?

If it hangs silently on Python 2 without any warning or error ("n_jobs > 1 not supported on Python 2"), that is not acceptable. Can we throw an error?

@amueller On Python 3 you can follow https://github.com/scikit-learn/scikit-learn/issues/5115#issuecomment-187683383 to work around the problem, i.e. it does not work by default even on Python 3.

I'm not sure whether we should close this, since the original poster seemed to say that setting the joblib start_method to forkserver didn't always work...

BTW, the xgboost one is a known issue; see https://github.com/scikit-learn/scikit-learn/issues/6627#issuecomment-206351138

Edit: The change below may not actually fix the problem. I also made an unrelated change to how I handle multiprocessing with Pathos, which may have been the actual fix.

Quick fix:
np.random.seed(0)

Explanation:
I ran into this problem most acutely in the test suite for auto_ml. The first (2?) times I ran GridSearchCV it was fine, but subsequent runs would hang with no error.

I set np.random.seed(0) in each test to ensure reproducibility while keeping the flexibility to reorder tests over time without disturbing the randomness. As soon as I did that, all the tests that had been hanging on GSCV started working again.

def test_name():
    np.random.seed(0)
    test_code_involving_gscv_here

Hope this helps with debugging!

Dev environment:
Mac OS X (Sierra)
Python 2.7
Up-to-date versions of the libraries.

@ClimbsRocks Well, that is probably some error in your estimators. Let us know if you have a reproducible example ;)

@amueller: Good call. I went to cut a branch so you could reproduce this, but this time everything ran properly.

I think the problem was probably using GSCV's parallelization while also using Pathos's parallelization in other parts of the program. That's the only other related thing I changed in the past week.

I've since refactored that code to close and open its multiprocessing pools more thoroughly.

What makes me think it isn't a bug in one of the estimators is that when I was building the test suite, each test ran and passed individually. It was only when I ran multiple tests that depended on GSCV in the same pass that it started hanging.

I've edited my previous comment to note this uncertainty.

If you combine joblib with any other parallelization, it is very likely to crash, and you shouldn't try that.

Sorry to bring this thread back up, but I'm encountering this problem too.
I created a Python 3.5 kernel and set the joblib start method to forkserver, but I still have the problem.

In fact, it doesn't even work with n_jobs=1. I can see it computing everything except the last parameter.

Is there any news?

"In fact, it doesn't even work with n_jobs=1. I can see it computing everything except the last parameter."

This is strange and very unlikely to be related to this issue (which is about n_jobs != 1). The best way to get good feedback is to open a separate issue with a stand-alone snippet that reproduces the problem.

I'm pretty sure I'm running into this issue myself. After trying many combinations, everything I do with n_jobs > 1 freezes after a few folds. I'm on an Ubuntu Linux laptop with sklearn=0.19.0, so this is a different configuration from the others I've read about. Here is the "offending" code:

import xgboost as xgb
from sklearn.model_selection import GridSearchCV
cv_params = {'max_depth': [3,5,7], 'min_child_weight': [1,3,5]}

ind_params = {'learning_rate': 0.1, 'n_estimators': 1000, 'seed':0, 'subsample': 0.8, 'colsample_bytree': 0.8,  'objective': 'binary:logistic'}
optimized_XGB = GridSearchCV(xgb.XGBClassifier(**ind_params), 
                            cv_params, scoring = 'roc_auc', cv = 5, n_jobs = 1, verbose=2) 
optimized_XGB.fit(xgboost_train, label_train,eval_metric='auc')

One interesting thing is that when I import xgboost I get a deprecation warning about GridSearchCV, as if it weren't being imported from model_selection. However, I'm on xgboost 0.62, and from looking at their repository it seems they import the correct GridSearchCV. To be clear, the deprecation warning is not the problem that concerns me; the one at hand is that execution freezes with n_jobs > 1. I'm just pointing it out in case it helps.

Could you provide data to help reproduce the problem?

Sure. You can download the exact files I'm using from:
https://xamat.github.io/xgboost_train.csv
https://xamat.github.io/label_train.csv

HTTP404

Sorry, there was an error in the first link. It should be fixed now. The second one should be fine too; I just checked.

This is a known issue with xgboost; see https://github.com/scikit-learn/scikit-learn/issues/6627#issuecomment-206351138

FYI, joblib's

Is this still a bug? I'm having the same problem with the defaults (n_jobs=1) as well as with pre_dispatch=1, using a RandomForestClassifier with 80 parameter combinations and ShuffleSplit CV (n=20).

It also hangs for a pipeline (SelectKBest(score_func=mutual_info_classif, k=10) followed by RandomForestClassifier), under both the latest release and the development version.

Let me know if you've found a workaround or another model selection method that works reliably. I'm thinking of giving scikit-optimize a try.

Do you mean n_jobs=1, or is that a typo? This issue is about n_jobs != 1.

The best way to get quality feedback is to provide a way to reproduce the problem. If the problem you are seeing really is with n_jobs=1, please open a separate issue.

I wrote what I meant, which is "multithreading enabled".
n_jobs != 1 means 'not equal to 1'. Equivalently, n_jobs > 1. For example, n_jobs=4.

Are you saying you can't reproduce the freeze with n_jobs=4?

If so, I'll provide a test case within a month (I'm changing over to a new machine).

@smcinerney Are you @raamana? I think @lesteve was replying to @raamana, who wrote n_jobs=1, which seems unrelated to this issue.

Oh, sorry, no, I'm not @raamana. Yes, @raamana's problem is different (but it may be due to the same code).

2017 ๋…„ 9 ์›” 12 ์ผ ์˜ค์ „ 9์‹œ 23 ๋ถ„์— Andreas Mueller < [email protected] [email protected] >์ด ์ž‘์„ฑํ–ˆ์Šต๋‹ˆ๋‹ค.

@smcinerney https://github.com/smcinerney @raamana https://github.com/raamana ์ž…๋‹ˆ๊นŒ? ๋‚˜๋Š” @lesteve https://github.com/lesteve ๊ฐ€ n_jobs = 1์„ ์ž‘์„ฑํ•œ @raamana https://github.com/raamana ์—๊ฒŒ ํšŒ์‹ ํ–ˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.์ด ๋ฌธ์ œ์™€ ๊ด€๋ จ์ด์—†๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.


My bad, I did not mean to mix things up. I will open another issue (with minimal code to reproduce it). GridSearchCV failing even with the default n_jobs=1 is a bigger concern than n_jobs > 1, since it is the default and is expected to work.

@raamana yes, it is a bigger concern, but it is also unlikely to be caused by a related issue.

@eric-czech @jnothman
So, if you decide to use backend='threading': one easy way to do it without changing the sklearn code, and without changing GSV's fit method, is to use the parallel_backend context manager.

from sklearn.externals.joblib import parallel_backend

clf = GridSearchCV()
with parallel_backend('threading'):
    clf.fit(x_train, y_train)

PS: I am not sure whether "threading" works for all estimators. But I had the same issue with my estimator when using GSV with n_jobs > 1, and with this it works as expected without changing the library.

System tried on:
Mac OS: 10.12.6
Python: 3.6
numpy==1.13.3
pandas==0.21.0
scikit-learn==0.19.1
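
For readers on newer releases, the same workaround can be sketched end to end as below. This is a minimal illustration rather than the poster's exact code: it assumes a recent scikit-learn where joblib is installed as a separate package (so `parallel_backend` is imported from `joblib` directly instead of `sklearn.externals.joblib`), and the toy dataset, the `SVC` estimator and the parameter grid are placeholders chosen only to make the snippet runnable.

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data and grid, chosen only to keep the example small.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0]}

search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=2)

# Inside this block, joblib dispatches the fits to threads
# instead of worker processes.
with parallel_backend("threading"):
    search.fit(X, y)

best_C = search.best_params_["C"]
print(best_C)
```

Nothing in the estimator or in `GridSearchCV` itself is modified; only the backend that joblib selects while the `with` block is active changes.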

ํ  ... ์Šค๋ ˆ๋”ฉ ๋ฐฑ์—”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐ ๋™์‹œ์„ฑ ๋ฌธ์ œ๊ฐ€์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋ฆฌ๋“œ ๊ฒ€์ƒ‰, ์˜ˆ๋ฅผ ๋“ค์–ด # 10329์˜ ๋ฒ„๊ทธ๋Š” ๊ฒฝ์Ÿ ์กฐ๊ฑด์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค ...

2017 ๋…„ 12 ์›” 22 ์ผ 03:59์— Trideep Rath [email protected] ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ผ์Šต๋‹ˆ๋‹ค.

@ eric-czech https://github.com/eric-czech @jnothman
https://github.com/jnothman
๋”ฐ๋ผ์„œ backend = 'threading'์„ ์‚ฌ์šฉํ•˜๊ธฐ๋กœ ๊ฒฐ์ •ํ•œ ๊ฒฝ์šฐ. ์—†๋Š” ์‰ฌ์šด ๋ฐฉ๋ฒ•
sklearn ์ฝ”๋“œ๋ฅผ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ์€ parallel_backend ์ปจํ…์ŠคํŠธ ๊ด€๋ฆฌ์ž๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
GSV์˜ ์ ํ•ฉ ๋ฐฉ๋ฒ•์€ ๋ณ€๊ฒฝ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

sklearn.externals.joblib์—์„œ import parallel_backend

clf = GridSearchCV ()
parallel_backend ( 'threading') ์‚ฌ์šฉ :
clf.fit (x_train, y_train)

์ถ”์‹  : "์Šค๋ ˆ๋”ฉ"์ด ๋ชจ๋“  ๊ฒฌ์  ์ž์—๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ํ™•์‹คํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๋‚˜๋Š”
GSV njob> 1์„ ์‚ฌ์šฉํ•˜๋Š” ๋‚ด ์ถ”์ •๊ธฐ์™€ ๋™์ผํ•œ ๋ฌธ์ œ๊ฐ€ ์žˆ๊ณ  ์ด๊ฒƒ์„ ์‚ฌ์šฉํ•˜์—ฌ
๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๊ณ  ์˜ˆ์ƒ๋Œ€๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

์‹œ์Šคํ…œ ์‹œ๋„ :
MAC OS : 10.12.6
ํŒŒ์ด์ฌ : 3.6
numpy == 1.13.3
ํŒ๋‹ค == 0.21.0
scikit-learn == 0.19.1


Case: using "threading" as the backend, with an Estimator that extends BaseEstimator and ClassifierMixin. I am not sure where the race condition arises. Could you elaborate?

From my understanding and my experiments, I did not observe any race condition.

out = Parallel(
    n_jobs=self.n_jobs, verbose=self.verbose,
    pre_dispatch=pre_dispatch
)(delayed(_fit_and_score)(clone(base_estimator), X, y, scorers, train,
                          test, self.verbose, parameters,
                          fit_params=fit_params,
                          return_train_score=self.return_train_score,
                          return_n_test_samples=True,
                          return_times=True, return_parameters=False,
                          error_score=self.error_score)
  for parameters, (train, test) in product(candidate_params,
                                           cv.split(X, y, groups)))

_fit_and_score is called on clone(base_estimator). clone does a deep copy, so each call works on its own copy of the estimator.
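
The cloning behaviour described here can be checked in isolation. The following is a small sketch, assuming a recent scikit-learn; the `SVC` estimator and the variable names are illustrative and not taken from the grid-search source.

```python
from sklearn.base import clone
from sklearn.svm import SVC

base_estimator = SVC(C=2.0)

# clone() builds a new, unfitted estimator with copies of the parameters.
worker_copy = clone(base_estimator)
worker_copy.set_params(C=5.0)

# Changing the clone leaves the original estimator untouched.
print(base_estimator.C, worker_copy.C)
```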

out is the output of the _fit_and_score calls. So by this point, all the threads have finished executing the estimator's fit method and have reported their results.

The results are what you get from GCV_clf.cv_results_.

Could you explain why a race condition would occur in this particular case?

๊ฒฝ์Ÿ ์กฐ๊ฑด์€ ์ค‘์ฒฉ ๋œ ๋งค๊ฐœ ๋ณ€์ˆ˜๋ฅผ ์„ค์ •ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
๋ณ€๊ฒฝ๋œ ํ•˜๋‚˜์˜ ๋งค๊ฐœ ๋ณ€์ˆ˜๋Š” ์ถ”์ •๊ธฐ์ด๊ณ  ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” ๊ทธ ๋งค๊ฐœ ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค.
ํ‰๊ฐ€์ž.

I run into the same issue on Win 7 with recent versions when using make_scorer together with GridSearchCV and n_jobs=-1.

Windows-7-6.1.7601-SP1
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 1.0.0
Scikit-Learn 0.19.1

@mansenfranzen Thanks for posting your versions and platform. Please see https://stackoverflow.com/help/mcve for how to provide a minimal reproducible example.

I get the same problem on Win7 with a custom preprocessing step.
Toolchain:

Python 3.6.2
NumPy 1.13.1, 1.14.2 (under both)
SciPy 1.0.0
SkLearn 0.19.1

MCVE:

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np

class CustomTransformer:
    def fit(self, X, y):
        return self

    def transform(self, X):
        return X

pipeline = make_pipeline(CustomTransformer(),
                         SVC())

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y_train = np.array([1.0, 0.0, 0.0, 1.0])

print(cross_val_score(pipeline, X_train, y_train, cv=2, n_jobs=-1))

ํŒŒ์ด์ฌ ๋‹ค์ค‘ ์ฒ˜๋ฆฌ๊ฐ€ if __name__ == '__main__' ์—†์ด Windows์—์„œ ์ž‘๋™ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ์Šต๋‹ˆ๊นŒ?

๋„ค, ๊ทธ๋ ‡์Šต๋‹ˆ๋‹ค. ์ฃ„์†กํ•ฉ๋‹ˆ๋‹ค. Jupyter๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๋‹ค๋Š” ์‚ฌ์‹ค์„ ์žŠ์–ด ๋ฒ„๋ ธ์Šต๋‹ˆ๋‹ค.
if __name__ == '__main__' ๊ฐ€ ํฌํ•จ ๋œ ๋…๋ฆฝ ์‹คํ–‰ ํ˜• ์Šคํฌ๋ฆฝํŠธ๋Š” ๋‹ค์Œ ์ถ”์ ์„ ์ธ์‡„ ํ•œ ๋‹ค์Œ ์—ฌ์ „ํžˆ ์ •์ง€๋ฉ๋‹ˆ๋‹ค.

Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "C:\Python\Python36\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Python\Python36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python\Python36\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Python\Python36\lib\site-packages\sklearn\externals\joblib\pool.py", line 362, in get
    return recv()
  File "C:\Python\Python36\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'CustomTransformer' on <module '__mp_main__' from 'C:\\projects\\Python\\Sandbox\\test.py'>
< same for SpawnPoolWorker-3 here >

Oh, interesting. Out of plain laziness I had placed the whole script under if __name__ == '__main__' and got the results from the previous comment.

Now I placed only pipeline = make_pipeline... under it, and it executed successfully. Might Jupyter be the cause?

Anyway, I am not sure whether the behavior in the previous comment is valid and caused by improper use of if __name__ == '__main__', or whether it is SkLearn's fault.

It sounds like this is not an issue with our library, but rather with the execution context for multiprocessing on Windows...

That's nasty. And indeed, I was unable to reproduce the problem on Ubuntu with the same versions of everything. Thanks for the help!

I can confirm this bug is alive and well.

Running on Windows 10 in a Jupyter notebook, Python 3, Sklearn 0.19.1

Same issue on Linux Mint (Ubuntu 16.10), Python 3.5

Everything gets stuck at the first epoch on each core, and the CPUs are idle, so no work is being done.

@MrLobs that sounds like a pickling error, right? Put your CustomTransformer in a separate Python file.

@Chrisjw42 @avatsaev without more context we can't really do much.
@avatsaev ...

@amueller yes, it's TensorFlow

@avatsaev that is still not enough information. Do you have a minimal example to reproduce it? Which BLAS are you using, are you using a GPU, which version of scikit-learn are you using...

OK, it turns out it is because I am using TF GPU, so setting n_jobs to > 1 does not really work, which is normal since I have only one GPU.

Yeah, you shouldn't really use n_jobs with TF anyway.

Why not?

@amueller, yes, putting the custom transformer into a separate file solves it.
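
The reason a separate file helps can be sketched with the standard library alone. pickle stores an instance as a reference to its class (module name plus class name), so a worker process must be able to import that module in order to rebuild the object; the traceback above shows the worker failing to find CustomTransformer in `__mp_main__`. The module name `custom_transformer` below is hypothetical, and the file is written to a temporary directory only to keep the example self-contained. In a real project it would be an ordinary .py file next to the script or notebook.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write the transformer into its own module (hypothetical name).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "custom_transformer.py").write_text(
    "class CustomTransformer:\n"
    "    def fit(self, X, y=None):\n"
    "        return self\n"
    "\n"
    "    def transform(self, X):\n"
    "        return X\n"
)

# Make the module importable, then import it like any other module.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
custom_transformer = importlib.import_module("custom_transformer")

transformer = custom_transformer.CustomTransformer()

# The pickle payload names the module, so any process that can import
# custom_transformer can rebuild the instance.
payload = pickle.dumps(transformer)
restored = pickle.loads(payload)
print(type(restored).__module__)
```

In the MCVE, the equivalent change would be to replace the in-script class definition with an import of the class from such a module.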

Would it be possible to raise an error (or at least a warning) in environments where n_jobs != 1 will hang? I just ran into this problem in a Jupyter notebook, and if I were more of a beginner (like the rest of my class) I would never have figured out why gridsearchcv kept hanging. In fact, our teacher advised us to use n_jobs=-1. If the problem here is known, can the package (either keras or sklearn) warn that it will occur and prevent the hang?

I don't think anyone knows in which environments this will happen. I don't believe anyone has managed to reproduce this bug in a reliable way.

But we are working on improving our multiprocessing infrastructure.
I am not sure whether that will solve all such issues.

@jnothman 👍

Good to hear!

I am not sure why this is tagged 0.21. This is solved in 0.20 in most cases. I think we should close this and have people open new issues. This one is too long and unspecific.

I just ran into the same issue on AWS Ubuntu using Jupyter.

Using parallel_backend seems to work...

from sklearn.externals.joblib import parallel_backend

clf = GridSearchCV(...)
with parallel_backend('threading'):
    clf.fit(x_train, y_train)

@morienor if you are able to reproduce this issue with scikit-learn 0.20.1, please open a new issue with all the details someone else needs to reproduce the problem (the full script, including import statements, on a fake random dataset), along with the version numbers of python, scikit-learn, numpy, scipy and the operating system.

I just ran into the same issue on AWS Ubuntu using Jupyter.

Using parallel_backend seems to work...

from sklearn.externals.joblib import parallel_backend

clf = GridSearchCV(...)
with parallel_backend('threading'):
    clf.fit(x_train, y_train)

๊ทธ๊ฒƒ์€ ๋‚˜๋ฅผ ์œ„ํ•ด ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค! ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค!

@ jmq19950824 @morienor ์˜ˆ,ํ•˜์ง€๋งŒ GIL๋กœ ์ธํ•ด threading ๋ฐฑ์—”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ ์˜๋ฏธ๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค.

jupyter๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ AWS Ubuntu์—์„œ ๋ฐฉ๊ธˆ ๋™์ผํ•œ ๋ฌธ์ œ๋ฅผ ๋งŒ๋‚ฌ์Šต๋‹ˆ๋‹ค.

parallel_backend ์‚ฌ์šฉ์ด ์ž‘๋™ํ•˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค ...

from sklearn.externals.joblib import parallel_backend

clf = GridSearchCV(...)
with parallel_backend('threading'):
    clf.fit(x_train, y_train)

Genius, works for me

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰