Xgboost: High memory consumption in python xgboost

Created on 2 Apr 2020  ·  3 comments  ·  Source: dmlc/xgboost

I am working on a python AutoML package, and one of my users reported very high memory usage while using xgboost.

I put together a notebook that demonstrates xgboost's memory consumption. In the code you can see that the model allocates more than 7 GB of RAM. If I save the model to disk (5 kB!) and then load it back, I save a huge amount of RAM.

It looks to me as if xgboost stores a copy of the data inside its own structure. Is that right?

Is there a way to reduce xgboost's memory usage? Do you think saving the model to disk and then loading it back is a reasonable way to handle this problem?
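The save-and-reload workaround described in the question can be sketched roughly as below. This is a minimal sketch, not a recommended xgboost practice: `reload_slim` and `demo` are hypothetical helper names, and the sketch assumes the extra memory is held by the trained object itself, so it can only be released once every reference to the old booster and its `DMatrix` is dropped. `demo` is defined but not called, because it needs `numpy` and `xgboost` installed.

```python
import gc
import os
import tempfile


def reload_slim(model, make_empty, suffix=".json"):
    """Save `model` to a temporary file and load it into a fresh object.

    `model` must offer save_model(path); `make_empty()` must return an
    object that offers load_model(path) (for xgboost: xgb.Booster).
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        model.save_model(path)
        fresh = make_empty()
        fresh.load_model(path)
    finally:
        os.remove(path)  # the file is only needed for the round trip
    return fresh


def demo():
    """Hypothetical end-to-end use with xgboost on synthetic data."""
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((10_000, 50))
    y = (X[:, 0] > 0.5).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "tree_method": "hist"}
    booster = xgb.train(params, dtrain, num_boost_round=20)

    slim = reload_slim(booster, xgb.Booster)

    # Drop the original objects so their memory can actually be released.
    del booster, dtrain
    gc.collect()
    return slim
```

Whether this frees a meaningful amount of RAM depends on the xgboost version and tree method, so it is worth measuring before and after rather than assuming a saving.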

All 3 comments

@pplonski We are trying to remove the copy for the histogram algorithm. It is a work in progress. For GPU it is mostly done: https://github.com/dmlc/xgboost/pull/5420 https://github.com/dmlc/xgboost/pull/5465

The CPU side still needs more work.

@pplonski, we also implemented a reduction in CPU memory consumption in this PR, https://github.com/dmlc/xgboost/pull/5334, but only for the 'hist' method. For now it is only in master, but I hope it will be part of a future release.

Memory, KB | Airline | Higgs1m |
-- | -- | -- |
Before | 28311860 | 1907812 |
https://github.com/dmlc/xgboost/pull/5334 | 16218404 | 1155156 |
Reduction factor | 1.75 | 1.65 |

I agree with @trivialfis, there is still a lot to do in this area.
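As a quick check of the table above, the reduction factors are simply the ratio of memory before the PR to memory after it. The sketch below recomputes them; it assumes the figures are in kilobytes (1 GB = 1024 * 1024 KB), and the variable names are illustrative only.

```python
KB_PER_GB = 1024 * 1024

# Values copied from the table (assumed to be kilobytes).
memory_kb = {
    "Airline": {"before": 28311860, "after": 16218404},
    "Higgs1m": {"before": 1907812, "after": 1155156},
}

# Reduction factor = memory before the PR / memory after the PR.
factors = {
    name: round(v["before"] / v["after"], 2)
    for name, v in memory_kb.items()
}

# Absolute saving in GB, rounded to one decimal place.
saved_gb = {
    name: round((v["before"] - v["after"]) / KB_PER_GB, 1)
    for name, v in memory_kb.items()
}

print(factors)
print(saved_gb)
```

Under that unit assumption, the Airline run drops from roughly 27 GB to roughly 15.5 GB.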

Hi, I recently ran into a similar high-memory problem with xgboost. I am using 'gpu_hist' for training.

I see large system memory spikes when the train() method runs, which crash my jupyter kernel.

  1. Is it correct to say that xgboost makes a copy of my data in system RAM (even when I use 'gpu_hist')?
  2. I assumed that xgboost loads the entire training data onto the GPU. Is that also wrong?
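One way to confirm a host-memory spike like the one described above is to compare the process's peak resident memory before and after the training call. The sketch below uses only the standard library; `peak_rss_mb` and `measure_peak_growth` are hypothetical helper names, the `resource` module is available on Unix-like systems only, and the measurement covers system RAM, not GPU memory.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_growth(func, *args, **kwargs):
    """Call func and return (result, growth of peak RSS in MB)."""
    before = peak_rss_mb()
    result = func(*args, **kwargs)
    after = peak_rss_mb()
    return result, after - before
```

With xgboost installed, a call such as `measure_peak_growth(xgb.train, params, dtrain, num_boost_round=100)` would report how much the peak grew during training. Because the peak never decreases, this shows the size of a spike but not whether the memory was released afterwards.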