Xgboost: One-step update of xgboost.Booster in python fails with segfault

Created on 22 Mar 2017  ·  3 comments  ·  Source: dmlc/xgboost

I was experimenting with one-step incremental construction of an XGBoost ensemble. When the Booster is created by the xgboost.train function (python library), everything seems to work fine.
However, when I create a booster and update it as follows
```python
booster_ = xgboost.Booster({'objective': 'reg:linear'})
booster_.update(dtrain, 1)
```

the python process fails with a segmentation fault.

## Environment info
Operating System:
* **python 3.6** Mac OS X 10.10.5 (Darwin 14.5.0), Ubuntu 14.04.5 LTS (GNU/Linux 3.19.0-25-generic x86_64);
* **python 2.7** Mac OS X 10.10.6 (Darwin 15.6.0);

Compiler:
* **python 3.6** used `pip install xgboost`;
* **python 2.7** gcc (6.3.0 --without-multilib);

`xgboost` version used:
* **python 3.6** version 0.6 from pip;
* **python 2.7.13** git HEAD 4a63f4ab43480adaaf13bde2485d5bfedd952520;

## Steps to reproduce
```python
import xgboost

# Three samples with a single feature each.
dtrain = xgboost.DMatrix(data=[[-1.0], [0.0], [1.0]], label=[0.0, -1.0, 1.0])

# The booster is created directly rather than through xgboost.train.
booster_ = xgboost.Booster({'objective': 'reg:linear', 'max_depth': 1})
booster_.update(dtrain, 1)

booster_.update(dtrain, 1)
```

The last line causes a segmentation fault. [...] for python 2.7.13

Most helpful comment

I figured out what causes the problem. It turns out that creating an empty booster by calling

booster_ = xgboost.Booster({'objective': 'reg:linear'})

only partially initializes the GBTree booster. In particular, the crucial parameter num_feature
defaults to 0 and is not updated to the proper number of features by the call to .update().

However, passing an explicit value for num_feature fixes the segmentation fault.

booster_ = xgboost.Booster({'objective': 'reg:linear', 'num_feature': dtrain.num_col()})

I think xgboost.Booster() should issue a warning if cache=() is empty or
'num_feature' is not explicitly set in the params argument.
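
The suggested safeguard could look roughly like the sketch below. `checked_booster_params` is a hypothetical helper, not part of xgboost, and it assumes that params is a dict and that the data object exposes `num_col()`, as `dtrain` does in the snippets above.

```python
import warnings


def checked_booster_params(params, dmatrix=None):
    """Return a copy of params with 'num_feature' filled in when possible.

    Hypothetical helper for illustration only; xgboost has no such function.
    """
    checked = dict(params)
    if 'num_feature' in checked:
        # The caller already set the feature count explicitly.
        return checked
    if dmatrix is not None:
        # Same workaround as above: take the feature count from the data.
        checked['num_feature'] = dmatrix.num_col()
    else:
        # Nothing to infer the feature count from, so at least say so.
        warnings.warn(
            "'num_feature' is not set; a Booster built from these params "
            "may be only partially initialized",
            RuntimeWarning,
        )
    return checked
```

With xgboost installed, the result would be passed on as `xgboost.Booster(checked_booster_params({'objective': 'reg:linear'}, dtrain))`.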

All 3 comments

I would like to clarify what I expected from an incremental update of the empty Booster object returned by

booster_ = xgboost.Booster({'objective': 'reg:linear', 'max_depth': 1})

According to the general gradient boosting algorithm, after an update on the sample (X, y) in the DMatrix dtrain

booster_.update(dtrain, 1)

the empty booster should become either f_0(x), the constant prediction, or f_1(x), the prediction after one step of gradient boosting, as shown below (from Hastie, Tibshirani, Friedman; 2013 10th ed page 361).

[Image: screen shot 2017-03-21 at 18 07 07]
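
To make the two candidates f_0 and f_1 concrete, here is a minimal textbook-style sketch on the data from the reproduction example. It assumes squared-error loss, a learning rate of 1 and no regularization. The helper names `fit_constant` and `fit_stump` are made up for this illustration. xgboost applies its own base score, shrinkage and regularization, so its numbers would differ.

```python
def fit_constant(y):
    """f_0 for squared-error loss: the mean of the labels."""
    return sum(y) / len(y)


def fit_stump(x, residuals):
    """Least-squares depth-1 tree on a single feature.

    Assumes x holds at least two distinct values.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi < threshold]
        right = [r for xi, r in zip(x, residuals) if xi >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi < threshold else right_mean


x = [-1.0, 0.0, 1.0]
y = [0.0, -1.0, 1.0]

f0 = fit_constant(y)                # constant prediction f_0
residuals = [yi - f0 for yi in y]   # negative gradient of squared error
stump = fit_stump(x, residuals)


def f1(xi):
    """Prediction after one boosting step: f_0 plus the fitted stump."""
    return f0 + stump(xi)


print(f0)                    # 0.0
print([f1(xi) for xi in x])  # [-0.5, -0.5, 1.0]
```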

I was able to trace the problem in the reproduction example to an empty RegTree::FVec.data vector. The (manual) trace of the second call to booster_.update is as follows.

Fails here:
include/xgboost/tree_model.h:#L528
..., feat.fvalue(split_index), feat.is_missing(split_index), ...

It seems that feat.data.size() == 0. I do not know why this vector is still empty after the first call to .update(), yet is not empty after an alternative call to .train().
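
The failure mode can be illustrated with a toy model in Python. `ToyFVec` is a made-up stand-in, not the xgboost class, and it assumes (in line with the num_feature observation in the most helpful comment) that the buffer gets one slot per feature. Python raises an IndexError on the out-of-range read, whereas unchecked indexing into an empty C++ vector is undefined behavior and can end in a segmentation fault.

```python
class ToyFVec:
    """Toy stand-in for a dense per-row feature buffer; not xgboost code."""

    def __init__(self, num_feature):
        # One slot per feature; None marks a missing value.
        self.data = [None] * num_feature

    def fill(self, row):
        # Copy the values that fit; indices beyond the buffer are skipped.
        for index, value in enumerate(row):
            if index < len(self.data):
                self.data[index] = value

    def is_missing(self, index):
        return self.data[index] is None

    def fvalue(self, index):
        return self.data[index]


row = [-1.0]
split_index = 0   # the tree splits on the only feature

ok = ToyFVec(num_feature=len(row))
ok.fill(row)
print(ok.fvalue(split_index))   # -1.0

broken = ToyFVec(num_feature=0)  # what a default of 0 would produce
broken.fill(row)
try:
    broken.fvalue(split_index)
except IndexError as err:
    print('out-of-range read:', err)
```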
