Scikit-learn: min_weight_fraction_leaf suggested improvements

Created on 28 Jun 2016  ·  3 comments  ·  Source: scikit-learn/scikit-learn

Description

I have been using the min_weight_fraction_leaf parameter of DecisionTreeClassifier and RandomForestClassifier incorrectly, and I think it is likely that other people are doing the same.

For example, the documentation for min_weight_fraction_leaf in DecisionTreeClassifier says:

The minimum weighted fraction of the input samples required to be at a leaf node.

I found the documentation really unclear about what "weighted fraction of the input samples" means. At first I thought the weighting was based on the size of each class or on the values provided by class_weight. I think a slight change to the parameter description could clear up this confusion. Perhaps something like:

The minimum weighted fraction of the input samples required to be at a leaf node, where the weights are determined by sample_weight in the fit() method.
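To make that reading concrete, here is a minimal sketch of what "weighted fraction" would mean if only sample_weight is considered. The helper `leaf_weight_fraction` is hypothetical (it is not a scikit-learn function), and the leaf assignments and weights are made-up values for illustration.

```python
def leaf_weight_fraction(leaf_ids, sample_weight):
    """Return {leaf_id: share of the total sample weight that lands in that leaf}.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    total = float(sum(sample_weight))
    shares = {}
    for leaf, weight in zip(leaf_ids, sample_weight):
        shares[leaf] = shares.get(leaf, 0.0) + weight / total
    return shares


# Five samples in two leaves; the last sample carries six times the weight.
leaf_ids = [0, 0, 0, 0, 1]
sample_weight = [1.0, 1.0, 1.0, 1.0, 6.0]
shares = leaf_weight_fraction(leaf_ids, sample_weight)
print(shares)
```

In this sketch leaf 1 holds one of the five samples but most of the total weight, so a threshold such as 0.5 would accept it while rejecting leaf 0, even though leaf 0 contains four samples.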

Additionally, min_weight_fraction_leaf is only applied if sample_weight is provided in the call to fit(). If sample_weight is not provided in the call to fit(), min_weight_fraction_leaf is silently ignored. I think that either min_weight_fraction_leaf should still be applied under the assumption that all samples are weighted equally, or a warning should be given that min_weight_fraction_leaf is not being used because sample_weight was not provided.
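One way to check how an installed version behaves is to measure the smallest leaf directly. The sketch below is an illustration rather than part of the original report: `smallest_leaf_fraction` is a hypothetical helper, the synthetic dataset and the 0.2 threshold are arbitrary choices, and it relies only on `DecisionTreeClassifier.apply`, which returns the leaf index reached by each sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def smallest_leaf_fraction(clf, X, sample_weight=None):
    """Smallest share of the total sample weight found in any leaf of a fitted tree."""
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])  # assumption: treat samples as equally weighted
    leaves = clf.apply(X)  # leaf index reached by each sample
    per_leaf = np.bincount(leaves, weights=sample_weight)
    return per_leaf[per_leaf > 0].min() / sample_weight.sum()


X, y = make_classification(n_samples=200, random_state=0)
uniform = np.ones(X.shape[0])

# Explicit uniform weights: the constraint should be enforced.
with_weights = DecisionTreeClassifier(min_weight_fraction_leaf=0.2, random_state=0)
with_weights.fit(X, y, sample_weight=uniform)
print("with sample_weight:   ", smallest_leaf_fraction(with_weights, X))

# No sample_weight: the report says 0.17.1 ignores the parameter here.
without_weights = DecisionTreeClassifier(min_weight_fraction_leaf=0.2, random_state=0)
without_weights.fit(X, y)
print("without sample_weight:", smallest_leaf_fraction(without_weights, X))
```

If the second number comes out well below 0.2, the parameter was not applied without sample_weight; if both are at least 0.2, that version treats missing weights as uniform.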

Versions

Darwin-15.5.0-x86_64-i386-64bit
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.17.1

I would also like to write the changes I've proposed (if you decide any are warranted), but I have little experience contributing to open source libraries. I might need a bit of hand-holding if someone is able to help me out.

All 3 comments

Please submit a PR.

On 29 June 2016 at 06:09, Ben [email protected] wrote:

Description

I have been using the min_weight_fraction_leaf parameter of DecisionTreeClassifier and RandomForestClassifier incorrectly, and I think it is likely that other people are doing the same as me.

For example, the documentation for min_weight_fraction_leaf in DecisionTreeClassifier
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
says

The minimum weighted fraction of the input samples required to be at a leaf node.

I found the documentation really unclear about what is meant by "weighted fraction of the input samples". At first I thought this was weighted based on the size of the class or on the values provided by class_weight. I think a slight change to the parameter description could clear up this confusion. Perhaps something like

The minimum weighted fraction of the input samples required to be at a leaf node, where weights are determined by sample_weight in the fit() method.

Additionally, min_weight_fraction_leaf is only applied if sample_weight is provided in the call to fit(). If sample_weight is not provided in the call to fit(), min_weight_fraction_leaf is silently ignored. I think that here either min_weight_fraction_leaf should still be applied under the assumption that all samples are weighted equally, or a warning should be given that min_weight_fraction_leaf is not being used since sample_weight was not provided.

Versions

Darwin-15.5.0-x86_64-i386-64bit
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.17.1

View it on GitHub: https://github.com/scikit-learn/scikit-learn/issues/6945

I think if min_weight_fraction_leaf is set and sample_weights is not provided, it should either raise an error or assume uniform weights. It's slightly redundant with min_samples_leaf in this case, but I guess assuming uniform weights is still better.

๋‚˜๋Š” ์ด๊ฒƒ์ด min_samples_leaf ์™€ ๋น„์Šทํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ฆฌํ”„ ๋…ธ๋“œ์—์„œ ์ ˆ๋Œ€ ์ƒ˜ํ”Œ ์ˆ˜๋ฅผ ์š”๊ตฌํ•˜๋Š” ๋Œ€์‹  min_weight_fraction_leaf ๋Š” ๊ฐ ๋ฆฌํ”„์—์„œ ์ƒ˜ํ”Œ์˜ ์ผ๋ถ€ (๋˜๋Š” ๊ฐ€์ค‘์น˜)๋ฅผ ์š”๊ตฌํ•˜๋Š” ์˜ต์…˜์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ์ƒ˜ํ”Œ์— ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š”์ง€ ์—ฌ๋ถ€๋Š” class_weight ์— ๋”ฐ๋ผ ๋‹ค๋ฆ…๋‹ˆ๋‹ค.
