Xgboost: Is normalization necessary?

Created on Jun 17, 2015 · 3 comments · Source: dmlc/xgboost

I'm not sure how xgboost works in theory. But since xgboost is a tree-based classifier, can I assume that no normalization of the features is needed?

frankzhangrui

No, you do not have to normalize the features.

tqchen on Jun 17, 2015
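
To illustrate why feature scaling does not matter for tree splits, here is a toy single-split regression stump in pure Python. The `best_stump` and `predict` helpers are hypothetical and use a plain squared-error criterion; this is a sketch of the general idea, not XGBoost's actual split finder. The split is chosen from the sorted order of the feature values, so multiplying a feature by a positive constant moves the threshold but leaves the partition, and therefore the predictions, unchanged.

```python
from typing import List, Optional, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def best_stump(x: List[float], y: List[float]) -> Stump:
    """Hypothetical helper: exhaustive single-split search minimising squared error."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best: Optional[Tuple[float, float, float, float]] = None
    for k in range(1, len(xs)):
        if xs[k - 1] == xs[k]:
            continue  # no threshold separates equal feature values
        left, right = ys[:k], ys[k:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((v - left_mean) ** 2 for v in left) + sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[0]:
            # Threshold is the midpoint between neighbouring sorted values.
            best = (sse, (xs[k - 1] + xs[k]) / 2, left_mean, right_mean)
    if best is None:
        raise ValueError("constant feature: no split possible")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(x: List[float], stump: Stump) -> List[float]:
    threshold, left_value, right_value = stump
    return [left_value if v <= threshold else right_value for v in x]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 0.9, 1.0, 5.2, 4.8, 5.0]
x_scaled = [v * 1000.0 for v in x]  # same feature, different units

raw = best_stump(x, y)
scaled = best_stump(x_scaled, y)
print(raw[0], scaled[0])  # 3.5 3500.0: the threshold moves with the scale
print(predict(x, raw) == predict(x_scaled, scaled))  # True: identical predictions
```

Because only the ordering of `x` enters the search, any strictly increasing transform of a feature (not just rescaling) would give the same partition in this toy.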

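I understand that, in principle, there should be no need for normalization when boosting trees.

However, I do see a significant impact when scaling the target y, especially with 'reg:gamma', but also (to a lesser extent) with 'reg:linear' (the default). What is the reason for this?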

Example with the Boston Housing dataset:

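```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release

boston = load_boston()
y = boston['target']
X = boston['data']

# Train on y / scale, then map the predictions back to the original units.
for scale in np.logspace(-6, 6, 7):  # 1e-06, 1e-04, ..., 1e+06
    # Default objective: 'reg:linear' at the time ('reg:squarederror' in later releases).
    xgb_model = xgb.XGBRegressor().fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))
```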

```
2.3432734454908335 (scale=1e-06)
2.343273977065266 (scale=0.0001)
2.3432793874455315 (scale=0.01)
2.290595204136888 (scale=1.0)
2.528513393507719 (scale=100.0)
7.228978353091473 (scale=10000.0)
272.29640759874474 (scale=1000000.0)
```
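
A worked sketch of how target scaling enters the squared-error case, using the leaf-weight and split-gain formulas from XGBoost's "Introduction to Boosted Trees" documentation (λ is the L2 penalty `reg_lambda`, γ the minimum split loss `gamma`). Assuming the predictions are rescaled along with the targets, dividing y by c rescales the gradients but not the hessians, so leaf values follow the rescaling exactly while the bracketed gain term shrinks by c² against a fixed γ.

```latex
% Squared error: per-sample gradient and hessian
\[ \ell(y,\hat{y}) = \tfrac{1}{2}(y-\hat{y})^2 \;\Rightarrow\; g_i = \hat{y}_i - y_i, \qquad h_i = 1 \]
% Optimal leaf weight, with G and H summed over the samples I in the leaf
\[ w^{*} = -\frac{G}{H+\lambda}, \qquad G = \sum_{i \in I} g_i, \qquad H = \sum_{i \in I} h_i \]
% Split gain
\[ \mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \]
% Divide y and \hat{y} by c > 0: g_i -> g_i / c, h_i unchanged
\[ w^{*}_{c} = \frac{w^{*}}{c}, \qquad \mathrm{Gain}_{c} = \frac{\mathrm{Gain} + \gamma}{c^{2}} - \gamma \]
```

With the default γ = 0 this gives Gain_c = Gain / c², so the ranking of candidate splits is unchanged. Under these assumptions the derivation points away from λ and toward quantities that do not scale with y, such as the constant starting prediction `base_score` and, possibly, fixed numerical tolerances inside the split search.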

The effect of scaling y is much larger when using 'reg:gamma':

```python
# Same loop, but with the gamma regression objective (log link).
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor(objective='reg:gamma').fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))
```

```
591.6509503519147 (scale=1e-06)
545.8298971540023 (scale=0.0001)
37.68688286293508 (scale=0.01)
4.039819858716935 (scale=1.0)
2.505477263590776 (scale=100.0)
198.94093800190453 (scale=10000.0)
592.1469169959003 (scale=1000000.0)
```
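
One way to see why 'reg:gamma' reacts so strongly: it works on a log-link margin m, with prediction exp(m). Assuming the per-sample statistics are g = 1 - y·exp(-m) and h = y·exp(-m) (the derivatives of y·exp(-m) + m, which is the gamma negative log-likelihood up to constants), dividing y by c is exactly equivalent to shifting the margin by -log(c). The pure-Python check below is a sketch under that assumption; `gamma_grad_hess` and the sample values are hypothetical.

```python
import math
from typing import Tuple


def gamma_grad_hess(y: float, margin: float) -> Tuple[float, float]:
    """Gradient and hessian of y * exp(-m) + m with respect to the margin m.

    Assumption: these match the per-sample statistics of XGBoost's reg:gamma
    objective (log link, prediction = exp(margin)).
    """
    ratio = y * math.exp(-margin)
    return 1.0 - ratio, ratio


y = 24.0            # a target in original units (illustrative value)
m = math.log(20.0)  # current margin, i.e. the model predicts exp(m) = 20
c = 1e6             # one of the scale factors used in the experiment above

original = gamma_grad_hess(y, m)
shifted = gamma_grad_hess(y / c, m - math.log(c))  # target / c, margin - log(c)
unshifted = gamma_grad_hess(y / c, m)              # target / c, margin unchanged

print(all(math.isclose(a, b) for a, b in zip(original, shifted)))  # True
print(original, unshifted)  # unshifted: gradient near 1, hessian near 0
```

In other words, the trees only see the same statistics if the margin has already moved by log(c) (about 13.8 for c = 1e6), and boosting starts from a fixed initial margin rather than one that tracks the scale of y.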

kdebrab on Aug 31, 2018

@tqchen Great … about Boosted Trees

loretoparisi on Nov 8, 2018
