It is about time we tackle the elephant in the room: performance on multi-core CPUs.
Currently, the `hist` tree-growing algorithm (`tree_method=hist`) scales poorly on multi-core CPUs: for some datasets, performance deteriorates as the number of threads increases. This was discovered by @Laurae2's Gradient Boosting Benchmark (GitHub repo).
The scaling behavior on the Bosch dataset is as follows:
I have identified the performance bottleneck of the 'hist' algorithm and put it in a small repository: hcho3/xgboost-fast-hist-perf-lab. You can try to improve performance by revising src/build_hist.cc.
@Laurae2 Thank you for preparing the GBT benchmark. It helped identify the problem spot.
@hcho3 Does the OpenMP `guided` schedule help with load balancing? If so, ellpack won't be very helpful.
My guess is that static allocation of work using ellpack would achieve a balanced workload with lower overhead than OpenMP's `guided` or `dynamic` modes. `dynamic` incurs run-time overhead for maintaining a work-stealing queue.
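To make the trade-off concrete, here is a toy Python simulation of static versus dynamic work assignment. It is not xgboost or OpenMP code: the task costs, the function names, and the per-task `overhead` parameter are all invented for illustration, and real schedulers behave in more subtle ways.

```python
import heapq

def static_makespan(costs, n_threads):
    """Finish time when tasks are split into contiguous equal-sized blocks,
    one block per thread (in the spirit of a static schedule)."""
    n = len(costs)
    block = -(-n // n_threads)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, n, block)]
    return max(loads)

def dynamic_makespan(costs, n_threads, overhead=0.0):
    """Finish time when each idle thread grabs the next task.
    `overhead` is a hypothetical per-task dispatch cost."""
    finish = [0.0] * n_threads
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)  # thread that becomes idle first
        heapq.heappush(finish, earliest + cost + overhead)
    return max(finish)

# Skewed workload: the expensive tasks are clustered at the front.
skewed = [10.0] * 4 + [1.0] * 28
# Balanced workload: every task costs the same.
balanced = [1.0] * 32

print(static_makespan(skewed, 4), dynamic_makespan(skewed, 4))
print(static_makespan(balanced, 4), dynamic_makespan(balanced, 4, overhead=0.5))
```

In this toy model dynamic assignment wins on the skewed workload, while static assignment wins as soon as the work is already balanced and dispatch has a cost. That is the intuition behind preferring a layout that makes static partitioning balanced.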
This may be a little off topic, but do we have benchmark results for `approx`?
I found suboptimal speedup from multithreading in our internal environment... I'd like to see other people's data.
@CodingCat The linked benchmark suite uses only `hist`. Does `approx` show a performance degradation like `hist` does (for example, 36 threads running slower than 3 threads)?
@hcho3 Due to limits on our cluster, I can only test with up to 8 threads... but comparing 8 with 4, I see very limited speedup.
@CodingCat Do you mean that 8 threads run slower than 4?
@CodingCat `approx` scales so poorly that I didn't even want to try benchmarking it. It doesn't scale properly even on my 4-core laptop (3.6 GHz), so I can't imagine what happens with 64 or 72 threads.
@hcho3 I'll take a look later using your repository with VTune.
For those who want detailed performance data in VTune, add the following to the header:
#include <ittnotify.h> // Intel Instrumentation and Tracing Technology
Add the following just before the region you want to track, outside any loop (rename the strings and variables as needed):
__itt_resume();
__itt_domain* domain = __itt_domain_create("MyDomain");
__itt_string_handle* task = __itt_string_handle_create("MyTask");
__itt_task_begin(domain, __itt_null, __itt_null, task);
Add the following just after the region you want to track, outside any loop (rename the strings and variables as needed):
__itt_task_end(domain);
__itt_pause();
Then start a project in VTune with the right parameters for the number of threads. Launch the executable with instrumentation paused to run the performance analysis.
@hcho3 It's not slower, but adding 4 more threads brings only about a 15% speedup... (I suspect that with more experiments the results would converge.)
@Laurae2 It looks like I'm not the only one.
@hcho3 If no one else does it, I'll try to get scaling results for `exact`, `approx`, and `hist`, all with `depth=6`, before the end of this week.
I recently migrated my compute server, and this week I'm rerunning new Bosch benchmarks on a new machine: 3.7 GHz all-core turbo / 36 cores / 72 threads / 80 GBps RAM bandwidth.
The fast_hist updater should be much faster for distributed xgboost. @CodingCat I'm surprised no one has tried adding AllReduce calls so that it works in distributed mode.
@RAMitchell I was quite new when I wrote the fast_hist updater, so it lacks distributed mode support. I'd like to get to it after the 0.81 release.
@Laurae2 FYI, I ran your benchmark suite on a C5.9xlarge machine, and the XGBoost `hist` results seem consistent with your earlier results. I can post the numbers if you'd like.
@Laurae2 I also have access to EC2 machines. Let me know if there is a script you'd like to run on an EC2 instance.
@hcho3 If you don't mind, I can take on the challenge of getting a faster distributed histogram algorithm. I currently spend half my time on it at my Uber job, and I may have more time for xgboost next year.
@CodingCat That would be great, thanks! Let me know if you have questions about the 'hist' code.
@CodingCat FYI, I plan to add unit tests for the 'hist' updater soon after the 0.81 release. That should help when adding distributed support.
@hcho3 @CodingCat `approx` seems to have been removed entirely. Is that expected behavior?
https://github.com/dmlc/xgboost/commit/70d208d68c3a32aaa4fcd6aa456f286a4da5912f#diff-53a3a623be5ce5a351a89012c7b03a31 (did PR https://github.com/dmlc/xgboost/pull/3395 remove `tree_method = approx`?) The results between it and fast hist...
@Laurae2 It looks like the refactoring removed the INFO message saying that `approx` was selected. Otherwise, `approx` should still be available.
@Laurae2 Actually, you are right. `approx` is still in the codebase, but for some reason it is not invoked even when `tree_method=approx` is set. I'll investigate this bug as soon as possible.
Issue #3840 has been filed. Release 0.81 will not ship until this is fixed.
@hcho3 I'm finding something very strange on my server with fast histogram. I'll report the results once the benchmark computation finishes tomorrow (we are talking about a very large negative efficiency for fast histogram; it is so large that I'm trying to measure it, and I hope it doesn't take too long).
For `approx`, the poor efficiency is far better than I expected, though I don't expect that to hold on every computer (perhaps it improves with newer Intel CPU generations, meaning higher RAM frequency?). I'll post the data once fast histogram finishes on my server.
As a side note, I'm using the Bosch dataset with 477 features (the features with fewer than 5% missing values).
I've passed 3000 hours of CPU time... (at least my server is being put to good use for a while). Next, I will look at https://github.com/hcho3/xgboost-fast-hist-perf- ...
@hcho3 If needed, I can provide my benchmark R script once my server finishes computing. It uses `depth=8` and `nrounds=50` for all of `tree_method=exact`, `tree_method=approx` (with the `updater=grow_histmaker,prune` workaround, before #3849), and `tree_method=hist`, from 1 to 72 threads. We might find something more interesting to tackle (and you would be able to test it on AWS).
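As a rough picture of that benchmark grid, here is a minimal Python sketch that only builds the parameter sets; it does not train anything and is not the R script mentioned above. `max_depth`, `nthread`, `tree_method`, and `updater` are XGBoost parameter names. The thread counts, the helper names, and the choice to swap `tree_method` for the explicit updater string in the `approx` case are assumptions made for illustration.

```python
from itertools import product

# Thread counts assumed for illustration, spanning 1 to 72 threads.
THREADS = [1, 2, 4, 6, 9, 18, 27, 36, 54, 72]

def tree_method_params(method):
    """Parameters selecting one tree method (hypothetical helper)."""
    if method == "approx":
        # Assumed form of the workaround: name the updater sequence directly.
        return {"updater": "grow_histmaker,prune"}
    return {"tree_method": method}

def benchmark_grid(max_depth=8, nrounds=50):
    """One entry per (tree method, thread count) pair."""
    grid = []
    for method, nthread in product(["exact", "approx", "hist"], THREADS):
        params = {"max_depth": max_depth, "nthread": nthread}
        params.update(tree_method_params(method))
        grid.append({"label": method, "nrounds": nrounds, "params": params})
    return grid

grid = benchmark_grid()
print(len(grid), grid[0])
```

Each entry could then be handed to a training call and timed; keeping the grid separate from the timing code makes it easy to rerun the same configurations on another machine.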
See the preliminary results below, averaged over 7 runs. Click the charts to enlarge. A summary table is provided. Unlike what the plots suggest, the CPUs were not pinned.
The charts look quite different from what I was prepared to see... (the behavior is so strange that I'm rerunning with UMA on (NUMA off)). I'll check with Intel VTune later.
Hardware and software:
pti=off spectre_v2=off spec_store_bypass_disable=off l1tf=off noibrs noibpb nopti no_stf_barrier
-O3 -mtune=native
Meltdown / Spectre protections:
laurae@laurae-compute:~$ head /sys/devices/system/cpu/vulnerabilities/*
==> /sys/devices/system/cpu/vulnerabilities/l1tf <==
Mitigation: PTE Inversion; VMX: vulnerable
==> /sys/devices/system/cpu/vulnerabilities/meltdown <==
Vulnerable
==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <==
Vulnerable
==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <==
Mitigation: __user pointer sanitization
==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <==
Vulnerable
| Threads | Exact (efficiency) | Approx (efficiency) | Hist (efficiency) |
| ---: | ---: | ---: | ---: |
| 1 | 1367s (100%) | 1702s (100%) | 69.9s (100%) |
| 2 | 758.7s (180%) | 881.0s (193%) | 52.5s (133%) |
| 4 | 368.6s (371%) | 445.6s (382%) | 31.7s (221%) |
| 6 | 241.5s (566%) | 219.6s (582%) | 24.1s (290%) |
| 9 | 160.4s (852%) | 194.4s (875%) | 23.1s (303%) |
| 18 | 86.3s (1583%) | 106.3s (1601%) | 24.2s (289%) |
| 27 | 66.4s (2059%) | 80.2s (2122%) | 63.6s (110%) |
| 36 | 52.9s (2586%) | 60.0s (2837%) | 55.2s (127%) |
| 54 | 215.4s (635%) | 289.5s (588%) | 343.0s (20%) |
| 72 | 218.9s (624%) | 295.6s (576%) | 1237.2s (6%) |
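For readers checking the table: the efficiency figures appear to be the single-thread time divided by the multi-thread time, expressed as a percentage, so perfect scaling on n threads would read n x 100%. A minimal Python sketch under that assumption (the function names are mine, not taken from the benchmark script):

```python
def efficiency(t_single, t_multi):
    """Speedup over the single-thread run, as a percentage (100% = no speedup)."""
    return 100.0 * t_single / t_multi

def per_thread_efficiency(t_single, t_multi, n_threads):
    """Same figure divided by the thread count (100% = perfect scaling)."""
    return efficiency(t_single, t_multi) / n_threads

# Timings in seconds, copied from the table above.
print(round(efficiency(1367, 758.7)))                 # exact, 2 threads
print(round(efficiency(69.9, 52.5)))                  # hist, 2 threads
print(round(efficiency(69.9, 1237.2)))                # hist, 72 threads
print(round(per_thread_efficiency(1367, 758.7, 2)))   # exact, 2 threads
```

Small differences from the table are expected on some rows because the published timings are rounded.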
[Charts: xgboost exact speed and efficiency; xgboost approx speed and efficiency; xgboost histogram speed and efficiency]
It looks like a problem with multiple sockets.
@RAMitchell It seems to be an issue with the availability of NUMA nodes. I can reproduce it using Sub NUMA Clustering (1 socket = 2 NUMA nodes, instead of 2 sockets = 2 NUMA nodes), and the results get even worse with fewer threads in training.
Like most machine learning algorithms, xgboost has no optimizations for handling NUMA nodes. That would be a second issue, though. As it stands, xgboost is not well suited to multi-socket environments or to machines where NUMA nodes are exposed through COD (Cluster on Die) or SNC (Sub NUMA Clustering), and with hyperthreading the workload imbalance becomes a large penalty.
Issue 1 is the large multithreaded performance degradation in xgboost `hist` mode (this issue).
Issue 2 is NUMA optimization (a separate issue needs to be opened).
Here are the results with NUMA disabled, paired with the NUMA-enabled results for comparison. I also added a 71-thread run to show performance just before the CPU is overwhelmed by the kernel scheduler at 72 threads (which demands more resources than are available).
UMA does far better than NUMA for multithreading, which is the expected result of memory interleaving for a process that is not NUMA-aware.
Time table:
| Threads | Exact NUMA | Exact UMA | Approx NUMA | Approx UMA | Hist NUMA | Hist UMA |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 1367s | 1667s | 1702s | 1792s | 69.9s | 85.6s |
| 2 | 758.7s | 810.3s | 881.0s | 909.0s | 52.5s | 54.1s |
| 4 | 368.6s | 413.0s | 445.6s | 452.9s | 31.7s | 36.2s |
| 6 | 241.5s | 273.8s | 219.6s | 302.4s | 24.1s | 30.5s |
| 9 | 160.4s | 182.8s | 194.4s | 202.5s | 23.1s | 28.3s |
| 18 | 86.3s | 94.4s | 106.3s | 105.8s | 24.2s | 31.2s |
| 27 | 66.4s | 66.4s | 80.2s | 73.6s | 63.6s | 37.5s |
| 36 | 52.9s | 52.7s | 60.0s | 59.4s | 55.2s | 43.5s |
| 54 | 215.4s | 49.2s | 289.5s | 58.5s | 343.0s | 57.4s |
| 71 | 218.3s | 47.01s | 295.9s | 56.5s | 1238.2s | 71.5s |
| 72 | 218.9s | 49.0s | 295.6s | 58.6s | 1237.2s | 79.1s |
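To put numbers on the NUMA versus UMA gap for `hist`, the sketch below divides the NUMA time by the UMA time for a few rows of the time table above. The timings are copied from the table; the variable and function names are illustrative only.

```python
# hist training times in seconds, keyed by thread count (from the time table).
hist_numa = {1: 69.9, 18: 24.2, 36: 55.2, 54: 343.0, 72: 1237.2}
hist_uma = {1: 85.6, 18: 31.2, 36: 43.5, 54: 57.4, 72: 79.1}

def numa_over_uma(numa, uma):
    """Ratio above 1 means the NUMA run was slower than the UMA run."""
    return {threads: numa[threads] / uma[threads] for threads in numa}

ratios = numa_over_uma(hist_numa, hist_uma)
for threads, ratio in sorted(ratios.items()):
    print(f"{threads:>2} threads: NUMA/UMA = {ratio:.2f}")
```

On these figures UMA is somewhat slower at low thread counts (ratio below 1) and far faster once the run spans both sockets, which is consistent with the interleaving explanation given above.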
Efficiency table:
| Threads | Exact NUMA | Exact UMA | Approx NUMA | Approx UMA | Hist NUMA | Hist UMA |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 100% | 100% | 100% | 100% | 100% | 100% |
| 2 | 180% | 206% | 193% | 197% | 133% | 158% |
| 4 | 371% | 404% | 382% | 396% | 221% | 236% |
| 6 | 566% | 609% | 582% | 593% | 290% | 280% |
| 9 | 852% | 912% | 875% | 885% | 303% | 302% |
| 18 | 1583% | 1766% | 1601% | 1694% | 289% | 274% |
| 27 | 2059% | 2510% | 2122% | 2436% | 110% | 229% |
| 36 | 2586% | 3162% | 2837% | 3017% | 127% | 197% |
| 54 | 635% | 3384% | 588% | 3065% | 20% | 149% |
| 71 | 626% | 3545% | 575% | 3172% | 6% | 120% |
| 72 | 624% | 3401% | 576% | 3059% | 6% | 108% |
[Charts in UMA mode: xgboost exact speed and efficiency; xgboost approx speed and efficiency; xgboost histogram speed and efficiency]
As commented in https://github.com/dmlc/xgboost/pull/3957#issuecomment-453815876, I tested commits a2dc929 (before the CPU improvements) and 5f151c5 (after the CPU improvements).
I tested on my Dual Xeon 6154 server (gcc compiler, not Intel) using Bosch for 500 iterations, eta 0.10, and depth 8, with 3 runs each from 1 to 72 threads. At peak performance, multithreaded workloads improve by up to about 50% (1/3 faster).
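One way to read "50% (1/3 faster)" is that a 50% gain in throughput corresponds to a run time that is one third shorter. A small Python check of that arithmetic, with a function name chosen for illustration:

```python
def time_saved_fraction(throughput_gain):
    """Fraction of wall time saved when throughput rises by `throughput_gain`
    (0.5 means +50% work per second)."""
    return 1.0 - 1.0 / (1.0 + throughput_gain)

print(time_saved_fraction(0.5))  # one third of the original run time is saved
```

The two figures describe the same change from different sides: work per second goes up by half, so time per unit of work drops to two thirds.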
Results before #3957 (commit a2dc929):
Results with #3957 (commit 5f151c5):
The efficiency curves show the 50% scalability increase (this does not mean the issue is solved: we should still improve it if we can. Ideally, reaching the 1000% to 2000% range would be insanely great).
Efficiency curve for a2dc929:
Efficiency curve for 5f151c5:
Thanks @Laurae2. I'll pin this issue so that it always appears at the top of the issue tracker. There is certainly more work to do.
@hcho3 @SmirnovEgorRu With commit 5f151c5 I see a small CPU performance regression for single-threaded workloads on 100% dense data, which incurs an overall 10% to 15% penalty when doing hyperparameter tuning with X cores x 1 xgboost thread.
Here is an example with 50M rows x 100 columns of random dense data (gcc 8). It needs at least 256 GB of RAM to train properly from Python / R, and it was run 3 times (6 days).
Commit a2dc929:
Commit 5f151c5:
Both lead to very similar multithreaded performance, but single-threaded performance takes a hit from slower training (@SmirnovEgorRu's improvements still scale faster: in this 50M x 100 case they reach 500% efficiency at 11 threads, versus 13 threads before).
Excluding gmat creation time, the single-threaded 50M x 100 case looks like this:
| Commit | Total | gmat time | Train time |
| :--- | ---: | ---: | ---: |
| a2dc929 | 2926s | 816s | 2109s |
| 5f151c5 | (+13%) 3316s | (~%) 817s | (+18%) 2499s |
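The percentages in the table can be reproduced from the raw timings. A short Python sketch (the dictionary layout and function name are mine):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# (a2dc929, 5f151c5) timings in seconds, copied from the table above.
timings = {"total": (2926, 3316), "gmat": (816, 817), "train": (2109, 2499)}

for name, (before, after) in timings.items():
    print(f"{name}: {after - before:+d}s ({pct_change(before, after):+.1f}%)")
```

The total and the train time both grow by 390 seconds while gmat creation is essentially unchanged, so the single-threaded regression sits in the training phase.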
@hcho3 @Laurae2 In general, hyper-threading helps only with core-bound algorithms.
HT helps keep the CPU pipeline loaded with more instructions to execute. When most instructions are waiting on the previous instruction (latency-bound), HT really helps; on certain workloads I have seen speedups of up to 1.5x.
However, if an application spends most of its time on memory operations (memory-bound), HT makes things worse. Two hyper-threads share one CPU cache and evict each other's useful data. The result is lower performance.
Gradient boosting is a memory-bound algorithm. Using HT should not improve performance, and the maximum speedup from threading over the single-thread version is limited by the number of hardware cores. So I think it is better to measure CPU performance without HT.
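Following that advice, a benchmark might cap the thread count at the number of physical cores. The Python sketch below only builds a parameter dictionary: `nthread` and `tree_method` are XGBoost parameters, but the helper is hypothetical and its default of 2 hardware threads per core is an assumption that does not hold on every CPU.

```python
import os

def physical_core_guess(logical_cpus=None, threads_per_core=2):
    """Guess the physical core count from the logical CPU count.
    threads_per_core=2 is an assumption (typical with hyper-threading on)."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // threads_per_core)

# Example: a machine exposing 72 logical CPUs.
params = {"tree_method": "hist", "nthread": physical_core_guess(72)}
print(params)
```

Limiting the thread count this way does not pin threads to distinct physical cores; that still needs OS-level affinity settings.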
As for NUMA, I saw the same problem in the DAAL implementation. It requires controlling memory usage per core. I'll look into it in the future.
As for the small slowdown with 1 thread, I'll investigate. I think the fix will be easy.
@hcho3 I'm currently working on the next part of the optimizations. I hope to have a new pull request ready in the near future.
@SmirnovEgorRu Thank you for your work. FYI, there was a recent discussion about increasing the amount of parallelism by performing level-wise node expansion: #4077.
@Laurae2 Now that #3957, #4310, and #4529 are merged, can we assume the scaling issue is solved? The effects of NUMA may still be a problem.
@hcho3 I'll rebenchmark later to check, but I noticed performance regressions in production environments (in particular, #3957 caused a slowdown of more than 30x).
I'll also check performance results with @szilard.
Open example: https:
Multicore scaling, and actually the NUMA issue too, have improved greatly.
Multicore:
The improvement on small data (0.1M rows) is very notable.
More details here:
https://github.com/szilard/GBM-perf#multi-core-scaling-cpu
https://github.com/szilard/GBM-perf/issues/29#issuecomment-689713624
The NUMA issue has also been largely mitigated.
@szilard Thank you for taking the time to run the benchmark, and great work to everyone in this thread for making this happen.
FYI, here are the training times on 1M rows on an EC2 r4.16xlarge (2 sockets, each with 16c + 16HT) using 1, 16 (1 socket, no HT), and 64 (all) cores, for different versions of xgboost:
@szilard Thank you for the analysis! It looks like the optimizations are working.
P.S. Above, XGB 1.2 shows some regression against version 1.1. That is very interesting information, so let me clarify it. I did not expect that.
@szilard In case this topic is of interest to you:
https://medium.com/intel-analytics-software/new-optimizations-for-cpu-in-xgboost-1-1-81144ea21115
Thanks @SmirnovEgorRu for the optimization work and for the link to the blog post (I had not seen this post before).
To make it easy to reproduce my numbers, and to get new ones in the future or on other hardware, I made a separate Dockerfile:
https://github.com/szilard/GBM-perf/tree/master/analysis/xgboost_cpu_by_version
You need to set the CPU core IDs for the first socket, excluding hyperthreaded cores (for example, 0-15 on an r4.16xlarge, which has 2 sockets with 16c + 16HT each), and the xgboost version:
VER=v1.2.0
CORES_1SO_NOHT=0-15 ## set physical core ids on first socket, no hyperthreading
sudo docker build --build-arg CACHE_DATE=$(date +%Y-%m-%d) --build-arg VER=$VER -t gbmperf_xgboost_cpu_ver .
sudo docker run --rm -e CORES_1SO_NOHT=$CORES_1SO_NOHT gbmperf_xgboost_cpu_ver
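It may be worth running the script several times. Training times on all cores usually show somewhat higher variability; I'm not sure whether that comes from the virtualized environment (EC2) or from NUMA.
Results on c5.metal, which has a higher frequency and more cores than the r4.16xlarge I have been using in the benchmark: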
https://github.com/szilard/GBM-perf/issues/41
TLDR: xgboost takes the most advantage of faster and more numerous cores compared with other libraries.
I do wonder about this, though:
The speedup from 1 to 24 cores for xgboost is smaller on larger data (10M rows, panels on the right) than on smaller data (1M rows, panels in the middle column). Is this something like increased cache misses, or something the other libraries don't have?
Here are some results on AMD:
https://github.com/szilard/GBM-perf/issues/42
The xgboost optimizations seem to work well on AMD too.