Xgboost: [New Feature] Monotonic Constraints in Tree Construction

Created on 2016-08-27 · 46 comments · Source: dmlc/xgboost

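I have received a few requests to support monotonic constraints on certain features with respect to the output,

i.e., when the other features are fixed, force the prediction to be monotonically increasing with respect to a specified feature. I am opening this issue to gauge general interest in this feature. I can add it if there is enough interest.

I would need help from volunteers in the community to test the beta feature and to contribute documentation and a tutorial on using it. Please reply to this issue if you are interested.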

Source

tqchen

Most helpful comment

This feature is currently not in the Sklearn API. Could you or someone else help add it? Thanks!

carsonyan on 2017-02-27

👍4

All 46 comments

An experimental version is provided in https://github.com/dmlc/xgboost/pull/1516 (see https://github.com/tqchen/xgboost).

Turn on the following option (this should be possible via the Python and R APIs):

monotone_constraints = "(0,1,1,0)"

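Two notes on this argument:

- monotone_constraints is a list whose length is the number of features: 1 means monotonically increasing, -1 means decreasing, and 0 means no constraint. If it is shorter than the number of features, it is padded with 0.
- Currently it supports Python's tuple format; when using R, you can pass it as a string.

Things to verify

[x] The speed of the original tree boosters does not get slower (I changed the code structure slightly; in theory template optimization will inline the new code, but this needs to be confirmed)
[x] The speed and correctness of monotonic regression
[x] The performance gained by introducing this constraint

Known limitations

Currently only the exact greedy algorithm on multi-core is supported. A distributed version is not yet available.

tqchen on 2016-08-27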
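As an illustration of the padding rule described above, here is a minimal Python sketch. The helper name `expand_constraints` is hypothetical and this is not XGBoost's own parser; it only mimics the stated behaviour (values in {-1, 0, 1}, missing trailing entries treated as 0). Rejecting a specification longer than the number of features is an assumption of this sketch.

```python
def expand_constraints(spec, n_features):
    """Parse a tuple-style string such as "(0,1,1,0)" and pad it with zeros.

    Hypothetical helper for illustration only, not part of XGBoost.
    """
    inner = spec.strip().strip("()")
    values = [int(tok) for tok in inner.split(",") if tok.strip()]
    for v in values:
        if v not in (-1, 0, 1):
            raise ValueError(f"constraint must be -1, 0 or 1, got {v}")
    if len(values) > n_features:
        # Assumption of this sketch: too many entries is treated as an error.
        raise ValueError("more constraints than features")
    # Features without an explicit entry are treated as unconstrained (0).
    return values + [0] * (n_features - len(values))


print(expand_constraints("(0,1,1,0)", 4))  # [0, 1, 1, 0]
print(expand_constraints("(1,-1)", 4))     # [1, -1, 0, 0]
```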

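@tqchen I got a request at work today to build some GBMs with monotonic constraints, to test them against the performance of some other models. This would be with a Tweedie deviance loss, so as things stand today I would have to use a custom loss function.

In any case, this seems like a good chance to help out and get some work done at the same time.

madrury on 2016-08-29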

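According to the discussion here, GBM (the R package) only enforces monotonicity locally.
Could you clarify how XGBoost enforces monotonic constraints?
It would be great if XGBoost could enforce the constraint globally.

yanyachen on 2016-08-30

I do not understand what you mean by a local or global constraint. Can you elaborate?

tqchen on 2016-08-31

Sorry, I pasted the wrong link; this is the correct one (link).
Each tree may only follow the monotonic constraint within certain subsets of the feature of interest, so many trees ensembled together may violate the overall monotonicity over the whole range of that feature.

yanyachen on 2016-08-31

OK, to my understanding it is enforced globally. You are welcome to try it out.

tqchen on 2016-08-31

I just did some simple tests of the monotonicity constraint in the context of a univariate regression. You can find the code and some very brief documentation here: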

https://github.com/XiaoxiaoWang87/xgboost_mono_test/blob/master/xgb_monotonicity_constraint_testing1-univariate.ipynb

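Some initial observations:

- For a univariate regression problem, monotonic constraint = +1 seems to work well.
- For a univariate regression problem, on my dataset, monotonic constraint = -1 does not seem to produce a monotonically decreasing function. Instead, it gives a constant. But this could also be due to a lack of improvement when the constraint is enforced. To be confirmed (following Tianqi's suggestion: try flipping the dataset and setting the constraint to +1).
- Adding the constraint (correctly) can potentially prevent overfitting and bring some performance/interpretation benefits.

XiaoxiaoWang87 on 2016-09-02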
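The "flip the dataset and set the constraint to +1" check suggested above can be sketched as follows. The idea is that a function decreasing in x is increasing in -x, so a -1 constraint on a feature should behave like a +1 constraint on the negated feature. The helper names `flip_feature` and `flip_constraint` are hypothetical, and the tiny arrays are made up purely for illustration.

```python
import numpy as np


def flip_feature(X, idx):
    """Return a copy of X with column idx negated (hypothetical helper)."""
    X_flipped = np.array(X, dtype=float, copy=True)
    X_flipped[:, idx] = -X_flipped[:, idx]
    return X_flipped


def flip_constraint(constraints, idx):
    """Negate the constraint for feature idx to match the flipped column."""
    flipped = list(constraints)
    flipped[idx] = -flipped[idx]
    return flipped


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([3.0, 2.0, 1.0])        # decreasing in the original feature

X_neg = flip_feature(X, 0)
order = np.argsort(X_neg[:, 0])
print(y[order])                      # [1. 2. 3.] -> increasing in the flipped feature
print(flip_constraint([-1], 0))      # [1]
```

If the model trained on the flipped data with +1 fits well while the original with -1 returns a constant, that points to a problem in the -1 code path rather than in the data.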

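It turns out I introduced a bug in the constraint = -1 case. I pushed a fix; please check whether the latest version works well. Please also check that it works when there are multiple constraints.

tqchen on 2016-09-03

@tqchen I tested your fix for the decreasing bug, and it seems to work now.

[Image: xgboost-no-constraint]
[Image: xgboost-with-constraint]

madrury on 2016-09-03

Let's confirm whether there is any speed decrease versus the original version on some standard datasets; then we can merge it in.

tqchen on 2016-09-03

@tqchen I tested a two-variable model, with one increasing constraint and one decreasing constraint: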

params_constrained = params.copy()
params_constrained['updater'] = "grow_monotone_colmaker,prune"
params_constrained['monotone_constraints'] = "(1,-1)"

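The results are good.

[Image: xgboost-two-vars-increasing]
[Image: xgboost-two-vars-decreasing]

I'll try to find some time this afternoon to do some timing tests.

madrury on 2016-09-03

I made an update to #1516 that allows automatic detection of the monotone option; users now only need to pass in monotone_constraints = "(0,1,1,0)". Please check whether it works.

I will merge this in if the speed tests go OK, and then let's move on to the next stage of adding tutorials.

@madrury @XiaoxiaoWang87

tqchen on 2016-09-06

Added tests for the multivariate case here: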

https://github.com/XiaoxiaoWang87/xgboost_mono_test/blob/master/xgb_monotonicity_constraint_testing2-multivariate.ipynb

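I now confirm that both monotonic constraint = 1 and = -1 work as expected.
Constraining monotonicity does not lead to an obvious speed* degradation
*speed = avg [time until early stop / number of boosting iterations until early stop]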

no constraint: 964.9 microseconds per iteration
with constraint: 861.7 microseconds per iteration

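(Please comment if you have a better way to test speed.)

Care is needed when constraining the direction of a non-monotonic variable. This may lead to performance degradation.
While playing with different hyperparameters, the code crashed with Check failed: (wleft) <= (wright).

XiaoxiaoWang87 on 2016-09-06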
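The speed metric defined above (time until early stop divided by the number of boosting iterations run) could be computed along these lines. This is only a sketch: `train_fn` is a hypothetical zero-argument callable that trains a model and returns how many iterations it ran, and `fake_train` is a stand-in so the example runs without XGBoost.

```python
import time


def time_per_iteration(train_fn, repeats=3):
    """Average wall-clock seconds per boosting iteration over several runs.

    train_fn is a hypothetical callable: it trains a model and returns the
    number of boosting iterations actually run before stopping.
    """
    per_iter = []
    for _ in range(repeats):
        start = time.perf_counter()
        n_iter = train_fn()
        elapsed = time.perf_counter() - start
        per_iter.append(elapsed / n_iter)
    return sum(per_iter) / len(per_iter)


def fake_train():
    # Stand-in for a real training call; pretends to run 50 iterations.
    total = 0
    for _ in range(50):
        total += sum(range(1000))
    return 50


print(f"{time_per_iteration(fake_train) * 1e6:.1f} microseconds per iteration")
```

Dividing by the iteration count matters here because early stopping can halt the constrained and unconstrained models at different rounds, so raw wall-clock totals are not directly comparable.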

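I ran a couple of timing experiments in a Jupyter notebook.

First test: some simple simulated data. There are two features, one increasing and one decreasing, but with a small sinusoidal wave superimposed so that neither feature is truly monotonic.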

X = np.random.random(size=(N, K))
y = (5*X[:, 0] + np.sin(5*2*np.pi*X[:, 0])
     - 5*X[:, 1] - np.cos(5*2*np.pi*X[:, 1])
     + np.random.normal(loc=0.0, scale=0.01, size=N))

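Here are timing results for xgboost with and without monotonic constraints. I turned off early stopping and boosted a fixed number of iterations for each.

First, without monotonic constraints: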

%%timeit -n 100
model_no_constraints = xgb.train(params, dtrain, 
                                 num_boost_round = 2500, 
                                 verbose_eval = False)

100 loops, best of 3: 246 ms per loop

And here with monotonicity constraints:

%%timeit -n 100
model_with_constraints = xgb.train(params_constrained, dtrain, 
                                 num_boost_round = 2500, 
                                 verbose_eval = False)

100 loops, best of 3: 196 ms per loop

Second test: the California housing data from sklearn. Without constraints:

%%timeit -n 10
model_no_constraints = xgb.train(params, dtrain, 
                                 num_boost_round = 2500, 
                                 verbose_eval = False)

10 loops, best of 3: 5.9 s per loop

Here are the constraints I used:

print(params_constrained['monotone_constraints'])

(1,1,1,0,0,1,0,0)

And the timing for the constrained model:

%%timeit -n 10
model_with_constraints = xgb.train(params_constrained, dtrain, 
                                   num_boost_round = 2500, 
                                   verbose_eval = False)

10 loops, best of 3: 6.08 s per loop

madrury on 2016-09-07
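To see why the simulated features in the first test are "not truly monotonic", one can evaluate the noiseless per-feature terms on a grid. This sketch reuses the formula from the comment above; the grid size of 1001 points is an arbitrary choice for illustration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)

# Noiseless contribution of each feature, taken from the simulation formula.
increasing_part = 5 * x + np.sin(5 * 2 * np.pi * x)
decreasing_part = -5 * x - np.cos(5 * 2 * np.pi * x)

# Overall trend is upward, yet some consecutive steps go down.
print(increasing_part[-1] > increasing_part[0])   # True
print(np.any(np.diff(increasing_part) < 0))       # True

# Overall trend is downward, yet some consecutive steps go up.
print(decreasing_part[-1] < decreasing_part[0])   # True
print(np.any(np.diff(decreasing_part) > 0))       # True
```

The slope of the first term is 5 + 10π·cos(10πx), which dips well below zero, so a +1 constraint forces the model to smooth over the oscillation rather than follow it.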

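@XiaoxiaoWang87 I have pushed another PR that drops the check on wleft and wright; please see whether it works.
@madrury Could you also compare against the previous version of XGBoost without the constraint feature?

tqchen on 2016-09-07

@tqchen Sure. Can you recommend a commit hash to compare against? Should I just use the commit prior to the addition of monotonic constraints?

madrury on 2016-09-07

Yes, the previous one will do.

tqchen on 2016-09-07

@tqchen On rebuilding the updated version, I am getting some errors that I did not have before. I am hoping the reason jumps out at you.

If I try to run the same code as before, I get an exception. Here is the full traceback: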

XGBoostError                              Traceback (most recent call last)
<ipython-input-14-63a9f6e16c9a> in <module>()
      8    model_with_constraints = xgb.train(params, dtrain, 
      9                                        num_boost_round = 1000, evals = evallist,
---> 10                                    early_stopping_rounds = 10)  

/Users/matthewdrury/anaconda/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.pyc in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, learning_rates, xgb_model, callbacks)
    201                            evals=evals,
    202                            obj=obj, feval=feval,
--> 203                            xgb_model=xgb_model, callbacks=callbacks)
    204 
    205 

/Users/matthewdrury/anaconda/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.pyc in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     72         # Skip the first update if it is a recovery step.
     73         if version % 2 == 0:
---> 74             bst.update(dtrain, i, obj)
     75             bst.save_rabit_checkpoint()
     76             version += 1

/Users/matthewdrury/anaconda/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.pyc in update(self, dtrain, iteration, fobj)
    804 
    805         if fobj is None:
--> 806             _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
    807         else:
    808             pred = self.predict(dtrain)

/Users/matthewdrury/anaconda/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.pyc in _check_call(ret)
    125     """
    126     if ret != 0:
--> 127         raise XGBoostError(_LIB.XGBGetLastError())
    128 
    129 

XGBoostError: [14:08:41] src/tree/tree_updater.cc:18: Unknown tree updater grow_monotone_colmaker

If I switch everything over to the keyword argument you implemented, I also get an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-15-ef7671f72925> in <module>()
      8                                    monotone_constraints="(1)",
      9                                    num_boost_round = 1000, evals = evallist,
---> 10                                    early_stopping_rounds = 10)  

TypeError: train() got an unexpected keyword argument 'monotone_constraints'

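madrury on 2016-09-07

Remove the updater argument and keep the monotone_constraints argument in the parameters; the monotonic constraint updater is now activated automatically when monotone constraints are present.

tqchen on 2016-09-07

@tqchen My buddy @amontz helped me figure this out right after I posted the message. I had interpreted your comment as saying to pass monotone_constraints as a kwarg to .train.

It works with those adjustments. Thanks.

madrury on 2016-09-07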
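A small sketch of the adjustment described in this exchange: the constraint string belongs inside the parameter dictionary, and the explicit updater entry from the experimental version is removed. The helper name `with_monotone_constraints` and the example `max_depth`/`eta` values are hypothetical; only `updater` and `monotone_constraints` come from the thread.

```python
def with_monotone_constraints(params, constraints):
    """Return a copy of params with the constraint string set.

    Hypothetical helper: drops the explicit 'updater' entry used by the
    experimental version and stores the constraints in the params dict.
    """
    new_params = dict(params)
    new_params.pop("updater", None)
    new_params["monotone_constraints"] = constraints
    return new_params


params = {"max_depth": 2, "eta": 0.1,
          "updater": "grow_monotone_colmaker,prune"}
params_constrained = with_monotone_constraints(params, "(1,-1)")
print(params_constrained)

# The dict is then passed as the first argument, for example
#   xgb.train(params_constrained, dtrain, num_boost_round=1000)
# rather than as a keyword argument:
#   xgb.train(params, dtrain, monotone_constraints="(1,-1)")  # TypeError above
```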

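@madrury Can you confirm the speed?

tqchen on 2016-09-08

Also, @madrury and @XiaoxiaoWang87, since this feature is now close to being merged, it would be great if you could coordinate to create a tutorial introducing it to users.

We cannot take an ipy notebook directly into the main repo. But the images can be pushed to https://github.com/dmlc/web-data/tree/master/xgboost and the markdown to the main repo.

tqchen on 2016-09-08

We also need to change the string conversion in the front-end interfaces, so that an int tuple can be converted into the string tuple format that the backend accepts.

@hetong007 for the changes in R, slundberg for Julia

tqchen on 2016-09-08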
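The front-end conversion mentioned here, from an int tuple to the string tuple format, might look roughly like this in Python. This is a sketch under the format used in this thread ("(0,1,1,0)", "(1,-1)", "(-1)"), not the actual XGBoost binding code, and `constraints_to_string` is a hypothetical name.

```python
def constraints_to_string(constraints):
    """Convert a sequence of ints into the "(a,b,c)" string format.

    Hypothetical helper for illustration; validates that every entry is
    one of -1, 0 or 1 before formatting.
    """
    values = [int(v) for v in constraints]
    for v in values:
        if v not in (-1, 0, 1):
            raise ValueError(f"constraint must be -1, 0 or 1, got {v}")
    return "(" + ",".join(str(v) for v in values) + ")"


print(constraints_to_string((0, 1, 1, 0)))  # (0,1,1,0)
print(constraints_to_string([1, -1]))       # (1,-1)
print(constraints_to_string([-1]))          # (-1)
```

Note that plain `str((1,))` in Python yields "(1,)" with a trailing comma and spaces, which is one reason an explicit conversion step is useful.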

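@tqchen Julia is currently attached to the 0.4 version of XGBoost, so the next time I need to use it and have time set aside, I will update the bindings if no one else has by then. At that point, this change can be added as well.

slundberg on 2016-09-08

Here is a comparison between models _without_ a monotone constraint, from before the implementation to after.

Commit 8cac37: before the implementation of monotone constraints.
Simulated data: 100 loops, best of 3: 232 ms per loop
California data: 10 loops, best of 3: 5.89 s per loop

Commit b1c224: after the implementation of monotone constraints.
Simulated data: 100 loops, best of 3: 231 ms per loop
California data: 10 loops, best of 3: 5.61 s per loop

The speedup for California after the implementation looks suspicious to me, but I tried it twice each way, and it is consistent.

madrury on 2016-09-08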
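For scale, the relative change between the two commits can be worked out directly from the timings reported above. The function name `relative_change` is just a label for this sketch.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means faster)."""
    return (after - before) / before


# Timings quoted in the comment above (ms and s per loop respectively).
print(f"simulated:  {relative_change(232, 231):+.1%}")    # -0.4%
print(f"california: {relative_change(5.89, 5.61):+.1%}")  # -4.8%
```

So the simulated data is essentially unchanged, while the California run is roughly 5% faster after the change.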

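I'd be happy to take a shot at writing a tutorial. I'll look through the existing documentation and put something together in the next few days.

madrury on 2016-09-08

Great, the PR is now officially merged to master. Looking forward to seeing the tutorial.

tqchen on 2016-09-08

Thanks @madrury. Looking forward to it. Let me know how I can help. I am certainly willing to do more research on this topic.

XiaoxiaoWang87 on 2016-09-08

I will work on enhancing it tomorrow. I am just curious about the reason for communicating with C++ via a string instead of an array.

hetong007 on 2016-09-08

I am testing from R. I randomly generated two-variable data and tried to make predictions.

However, I found that

- xgboost does not constrain the prediction.
- The parameter monotone_constraints makes the prediction slightly different.

Please point it out if I have made any mistakes.

The code to reproduce it (tested on the latest GitHub version, not from drat):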

set.seed(1024)
x1 = rnorm(1000, 10)
x2 = rnorm(1000, 10)
y = -1*x1 + rnorm(1000, 0.001) + 3*sin(x2)
train = cbind(x1, x2)

bst = xgboost(data = train, label = y, max_depth = 2,
                   eta = 0.1, nthread = 2, nrounds = 10,
                   monotone_constraints = '(1,-1)')

pred = predict(bst, train)
ind = order(train[,1])
pred.ord = pred[ind]
plot(train[,1], y, main = 'with constraint')
pred.ord = pred[order(train[,1])]
lines(pred.ord)

bst = xgboost(data = train, label = y, max_depth = 2,
                   eta = 0.1, nthread = 2, nrounds = 10)

pred = predict(bst, train)
ind = order(train[,1])
pred.ord = pred[ind]
plot(train[,1], y, main = 'without constraint')
pred.ord = pred[order(train[,1])]
lines(pred.ord)

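[Image: woc]

hetong007 on 2016-09-08

The constraint is defined on the partial order. So the constraint is only enforced when we move along the monotone axis while keeping the other axes fixed.

tqchen on 2016-09-08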
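The partial-order statement above suggests a direct check: sweep one feature over a grid while every other feature keeps its value in each row, and confirm the predictions never move in the wrong direction. Below is a minimal sketch with a hypothetical helper `is_monotone_in_feature`; `toy_predict` is a made-up stand-in for a trained model so that the example runs without XGBoost.

```python
import numpy as np


def is_monotone_in_feature(predict, X, idx, grid, direction=1, tol=1e-9):
    """Check that predictions never move against `direction` as feature
    `idx` sweeps `grid` while every other feature keeps its row value."""
    X = np.asarray(X, dtype=float)
    grid = np.sort(np.asarray(grid, dtype=float))
    sweeps = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, idx] = value          # move only along the chosen axis
        sweeps.append(predict(X_mod))
    sweeps = np.stack(sweeps, axis=0)  # shape: (len(grid), n_rows)
    steps = np.diff(sweeps, axis=0) * direction
    return bool(np.all(steps >= -tol))


def toy_predict(X):
    # Made-up model: increasing in column 0, oscillating in column 1.
    return 2.0 * X[:, 0] + 3.0 * np.sin(X[:, 1])


rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, size=(200, 2))
grid = np.linspace(7.0, 13.0, 50)

print(is_monotone_in_feature(toy_predict, X, 0, grid, direction=1))  # True
print(is_monotone_in_feature(toy_predict, X, 1, grid, direction=1))  # False
```

With a trained booster, `predict` could be a small wrapper that builds a DMatrix from the array and calls the model's predict method. Plotting predictions against one column of the raw training data does not perform this check, because the other features vary from row to row.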

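@hetong007 To make my plots I

- Created an array containing the grid of x coordinates at which I wanted to predict for that variable, and then joined the predictions into a line plot. This would use seq in R.
- Set all the other variables to their average value in the training data. This would be something like colMeans in R.

Here is the Python code I used for the plots included above; it should convert pretty easily to equivalent R code.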

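import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb

def plot_one_feature_effect(model, X, y, idx=1):
    # Grid of values for the feature of interest (the features here lie in [0, 1]).
    x_scan = np.linspace(0, 1, 100)
    X_scan = np.empty((100, X.shape[1]))
    X_scan[:, idx] = x_scan

    # Hold every other feature at its mean in the training data.
    left_feature_means = np.tile(X[:, :idx].mean(axis=0), (100, 1))
    right_feature_means = np.tile(X[:, (idx+1):].mean(axis=0), (100, 1))
    X_scan[:, :idx] = left_feature_means
    X_scan[:, (idx+1):] = right_feature_means

    # Predict along the grid, using the best iteration found by early stopping.
    X_plot = xgb.DMatrix(X_scan)
    y_plot = model.predict(X_plot, ntree_limit=model.best_ntree_limit)

    # Fitted curve drawn over the raw data.
    plt.plot(x_scan, y_plot, color = 'black')
    plt.plot(X[:, idx], y, 'o', alpha = 0.25)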

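madrury on 2016-09-08

Here is how I make partial dependence plots (for an arbitrary model):

Scan over a grid of values of feature X.
For each grid value of feature X:
- Set the entire feature X column (all rows) to this value. The other features are unchanged.
- Make predictions for all rows.
- Take the average of the predictions.
The resulting (feature X value, average prediction) pairs give you the partial dependence on feature X.

Code: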

def plot_partial_dependency(bst, X, y, f_id):

    X_temp = X.copy()

    x_scan = np.linspace(np.percentile(X_temp[:, f_id], 0.1), np.percentile(X_temp[:, f_id], 99.5), 50)
    y_partial = []

    for point in x_scan:

        X_temp[:, f_id] = point

        # Predict on the full matrix with column f_id overwritten by the grid value.
        dpartial = xgb.DMatrix(X_temp)
        y_partial.append(np.average(bst.predict(dpartial)))

    y_partial = np.array(y_partial)

    # Plot partial dependence

    fig, ax = plt.subplots()
    fig.set_size_inches(5, 5)
    plt.subplots_adjust(left = 0.17, right = 0.94, bottom = 0.15, top = 0.9)

    ax.plot(x_scan, y_partial, '-', color = 'black', linewidth = 1)
    ax.plot(X[:, f_id], y, 'o', color = 'blue', alpha = 0.02)

    ax.set_xlim(min(x_scan), max(x_scan))
    ax.set_xlabel('Feature X', fontsize = 10)    
    ax.set_ylabel('Partial Dependence', fontsize = 12)

XiaoxiaoWang87 on 2016-09-09

Thanks for the guidance! I realized that I made a silly mistake in the plot. Here is another test on univariate data, and the plot seems fine:

set.seed(1024)
x = rnorm(1000, 10)
y = -1*x + rnorm(1000, 0.001) + 3*sin(x)
train = matrix(x, ncol = 1)

bst = xgboost(data = train, label = y, max_depth = 2,
               eta = 0.1, nthread = 2, nrounds = 100,
               monotone_constraints = '(-1)')
pred = predict(bst, train)
ind = order(train[,1])
pred.ord = pred[ind]
plot(train[,1], y, main = 'with constraint', pch=20)
lines(train[ind,1], pred.ord, col=2, lwd = 5)

[Image: rplot]

bst = xgboost(data = train, label = y, max_depth = 2,
               eta = 0.1, nthread = 2, nrounds = 100)
pred = predict(bst, train)
ind = order(train[,1])
pred.ord = pred[ind]
plot(train[,1], y, main = 'without constraint', pch=20)
lines(train[ind,1], pred.ord, col=2, lwd = 5)

[Image: woc]

hetong007 on 2016-09-09

👍1

@hetong007 So the goal for the R interface is to let users pass in an R array in addition to a string:

monotone_constraints=c(1,-1)

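tqchen on 2016-09-09

Please let us know when you PR the tutorial.

@hetong007 You are also more than welcome to make an r-blogger version.

tqchen on 2016-09-12

@tqchen Sorry guys, I have been on a work trip all week.

I sent along a couple of pull requests for a monotonic constraint tutorial. Please let me know your thoughts; I am happy to take any criticism or critique.

madrury on 2016-09-15

Hopefully this is an appropriate place to ask: will this now work if we update using the usual git clone --recursive https://github.com/dmlc/xgboost?

I ask because I saw the new tutorial but nothing new in the way of changes to the code itself. Thank you all!

JoshuaC3 on 2016-12-19

Yes, the new feature was merged before the tutorial was merged.

tqchen on 2016-12-19

Hello,

I am not sure that you have successfully implemented global monotonicity; from what I have seen in your code, it corresponds more to local monotonicity.

Here is a simple example that breaks monotonicity: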

`
df <- data.frame(y = c(2,rep(6,100),1,rep(11,100)),
x1= c(rep(1,101),rep(2,101)),x2 = c(1,rep(2,100),1,rep(2,100)))

library(xgboost)
set.seed(0)
XGB <- xgboost(data=data.matrix(df[,-1]),label=df[,1],
objective="reg:linear",
bag.fraction=1,nround=100,monotone_constraints=c(1,0),
eta=0.1 )

sans_corr <- data.frame(x1=c(1,2,1,2),x2=c(1,1,2,2))

sans_corr$prediction <- predict(XGB,data.matrix(sans_corr))
`

I hope my understanding of your code and my example is not mistaken.
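To make the structure of this example easier to see, here is a Python sketch that rebuilds the same data frame and computes the mean of y for each (x1, x2) cell. It does not run XGBoost, so it neither confirms nor refutes the claim; it only shows what the data looks like. The helper `violates_increasing_in_x1` is hypothetical and could be applied to the four predictions on the sans_corr grid.

```python
from collections import defaultdict

# Same data as the R data frame above.
y = [2] + [6] * 100 + [1] + [11] * 100
x1 = [1] * 101 + [2] * 101
x2 = [1] + [2] * 100 + [1] + [2] * 100

# Mean of y within each (x1, x2) cell.
groups = defaultdict(list)
for yi, a, b in zip(y, x1, x2):
    groups[(a, b)].append(yi)
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}

for key in sorted(means):
    print(key, means[key], len(groups[key]))
# (1, 1) 2.0 1
# (1, 2) 6.0 100
# (2, 1) 1.0 1
# (2, 2) 11.0 100


def violates_increasing_in_x1(pred):
    """pred maps (x1, x2) -> prediction; True if any x2 slice decreases in x1."""
    return any(pred[(2, b)] < pred[(1, b)] for b in (1, 2))


print(violates_increasing_in_x1(means))  # True: the raw cell means drop at x2 = 1
```

At x2 = 1 the raw data goes from 2 down to 1 as x1 increases, with a single observation in each of those cells. A model with a +1 constraint on x1 should therefore not reproduce the cell means there: its prediction at (2, 1) should be at least its prediction at (1, 1).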