Xgboost: What exactly does gblinear+reg:linear do? And other questions

Created on 24 May 2015  ·  12 Comments  ·  Source: dmlc/xgboost

Hi,

I'm starting to discover the power of xgboost and hence playing around with demo datasets (Boston dataset from sklearn.datasets right now).

If I understand correctly the parameters, by choosing:

plst=[('silent', 1),
 ('eval_metric', 'rmse'),
 ('nthread', 1),
 ('objective', 'reg:linear'),
 ('eta', 1),
 ('booster', 'gblinear'),
 ('lambda', 0),
 ('alpha', 10)]

I should be doing boosting with Lasso regressions as weak learners. _Is that right?_ If that is the case, with num_round=1 I should get back the output of a single Lasso regression, hence a very sparse weight vector, which is totally not the case.
But in fact, I don't think it is possible to get the output of a single round. Indeed, the model weight vector changes when I change eta, even with num_round=1, meaning that there are at least two rounds, or else I have misunderstood something. _Is my interpretation correct?_
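For reference, here is roughly what I am running (a minimal sketch; I load the Boston data with the sklearn loader that was available at the time, but any regression dataset would do):

```python
import xgboost as xgb
from sklearn.datasets import load_boston  # removed from recent sklearn versions

boston = load_boston()
dtrain = xgb.DMatrix(boston.data, label=boston.target)

plst = [('silent', 1), ('eval_metric', 'rmse'), ('nthread', 1),
        ('objective', 'reg:linear'), ('eta', 1), ('booster', 'gblinear'),
        ('lambda', 0), ('alpha', 10)]

# With num_round=1 I expected a single Lasso fit, i.e. a sparse weight vector
bst = xgb.train(plst, dtrain, num_boost_round=1)
print(bst.get_dump()[0])  # for gblinear this lists the bias and one weight per feature
```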

Also, after a few tests, I don't think the eta parameter is limited to gradient boosted trees in your implementation, although in the doc it appears in the _tree parameters_ subsection. This seems logical to me, as shrinkage via a learning rate is a way to regularize boosting that is not specific to boosting trees.
The 'subsample' option might also not need to be limited to boosted trees: it can be a way to regularize boosted linear regressors as well as boosted trees (at least in theory).

Finally, just to be sure I understand: each learning objective corresponds to a given loss, which is not necessarily linked to the eval_metric, since the latter is only used for user-facing evaluation (and several eval_metrics can be specified)? In that case, what are the losses for reg:logistic, reg:linear and multi:softmax? (e.g. squared loss or absolute loss for regression, exponential or deviance loss for classification?)

Thank you for your answers,
Best,
Alice


All 12 comments

You want to use multiple rounds in gblinear in order to get back a single lasso regression.

This is because it makes little sense to stack linear models (the result would again be a linear model). So the num_round steps of updates are used jointly to solve a single lasso problem.
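A rough sketch of that behaviour (hypothetical data; sklearn's Lasso alpha is on a per-sample scale, so the regularisation strengths are not directly comparable and this only illustrates the convergence, not an exact match):

```python
import numpy as np
import xgboost as xgb
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(500, 5)
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + 0.1 * rng.randn(500)
dtrain = xgb.DMatrix(X, label=y)

params = {'booster': 'gblinear', 'objective': 'reg:linear',
          'eta': 0.5, 'lambda': 0, 'alpha': 1.0}

# One round leaves the weights far from the lasso solution;
# many rounds of updates jointly converge towards it.
for num_round in (1, 200):
    bst = xgb.train(params, dtrain, num_boost_round=num_round)
    print(num_round, 'round(s):', bst.get_dump()[0].replace('\n', ' '))

# Reference: a single lasso fit on the same data
print('sklearn Lasso:', Lasso(alpha=0.1).fit(X, y).coef_)
```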

eval_metric has nothing to do with the objective function. The loss functions are documented in parameter.md: reg:logistic is logistic regression, reg:linear is squared loss, and multi:softmax is softmax multiclass classification.
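For reference, the losses behind those objectives, written out explicitly (constant factors and parameterisation may differ slightly from the implementation):

```latex
% reg:linear -- squared error on the raw prediction \hat{y}
L(y, \hat{y}) = \tfrac{1}{2}\,(y - \hat{y})^{2}

% reg:logistic -- logistic loss, with \hat{p} = \sigma(\hat{y}) = 1 / (1 + e^{-\hat{y}})
L(y, \hat{p}) = -\,y \log \hat{p} - (1 - y)\log(1 - \hat{p})

% multi:softmax -- softmax cross-entropy over K classes with scores \hat{z}_1, \dots, \hat{z}_K
L(y, \hat{z}) = -\log \frac{e^{\hat{z}_{y}}}{\sum_{k=1}^{K} e^{\hat{z}_{k}}}
```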

Hi,

Thank you for your answers! I've understood how you get back a single Lasso after a large num_round rather than 1. That also answers my second question: you're using squared loss for regression. And I guess you're using binomial/multinomial deviance for classification. _Is that correct?_

I wasn't sure, because for regression you could also use the L1 loss or the Huber loss, for example (common examples given in section 10.10.2 of The Elements of Statistical Learning).

Thank you for your help,
Alice

Hi,

I want to apply xgboost to a regression problem, meaning my dependent variable is a continuous numeric type. However, I am confused about what I should provide in the "label" argument. Please help.

regards,
Vivek

In the regression case, the label is your regression target.


Sincerely,

Tianqi Chen
Computer Science & Engineering, University of Washington

See the documentation of objective in
https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
Different objective functions can be specified, and you can find the meaning of
the current parameters there.

Tianqi

On Mon, May 25, 2015 at 1:47 AM, AliceS [email protected] wrote:

Hi,

Thank you for your answers! I've understood how you get back a single
Lasso after a large num_rounds and not 1.

Regarding the loss functions, it is still not clear what they are, at
least for regression. You could use the squared loss, the L1 loss or the
Huber loss for example (common examples given section 10.10.2 of the
elements of statistical learning). For classification I guess you use
binomial/multinomial deviance (could also be the exponential loss in which
case you get back classical boosting).

Thank you for your help,
Alice



Sincerely,

Tianqi Chen
Computer Science & Engineering, University of Washington

Thanks Tianqi,
Alice

The dependent variable has numeric values. Below are the first 6 observations of the regression target:

head(final_data[1:n.train,'Dependent'])
[1] 4996 3784 1504 4994 3687 3084

Now, if I put this dependent variable into the label and execute the code below:

param <- list("objective" = "reg:linear",
              "num_class" = 9,
              "nthread" = 8,
              "eta" = 0.08,
              "subsample" = 0.8,
              "gamma" = 1,
              "min_child_weight" = 2,
              "max_depth" = 12,
              "colsample_bytree" = 1)

model_xg <- xgboost(param = param, data = final_data[1:n.train, ],
                    label = final_data[1:n.train, 'Dependent'], nrounds = 250)

Then I get the following error:

Error in xgb.get.DMatrix(data, label) : xgboost: Invalid input of data
In addition: Warning message:
In xgb.get.DMatrix(data, label) : xgboost: label will be ignored.

Please let me know what I am doing wrong.

@vivekag Please open a new issue on this. It would be great if you could provide a snippet of code (possibly with some dummy data) that we can run to reproduce the problem.

I guess it is because the data type of final_data is not what xgboost expects (xgboost expects a matrix or a sparse matrix). @hetong007

@tqchen Can you give a bit more detail on what gblinear is actually doing? A reference to an article/formula would be great.

In particular, I'd like to be able to work out the predictions for 1 round of training (nrounds=1). Here's a sample dataset:

> train
        x      y
 1: 13.36  37.54
 2:  5.35  14.54
 3:  0.26  -0.72
 4: 84.16 261.19
 5: 24.67  76.90
 6: 22.26  67.15
 7: 18.02  53.89
 8: 14.29  43.48
 9: 61.66 182.60
10: 57.26 179.44

After I train a linear regression model and an xgboost model with parameters {booster="gblinear", objective="reg:linear", eta=1, subsample=1, lambda=0, lambda_bias=0, alpha=0}, I get the following results:

> test
        x      y Pred.linreg Pred.xgb
 1: 47.75 153.23      146.25    155.7
 2: 12.13  40.05       35.78    107.9
 3: 89.05 274.37      274.34    211.1
 4: 38.87 116.51      118.71    143.8
 5: 27.30  80.61       82.83    128.2
 6: 87.66 267.95      270.02    209.3
 7: 39.33 114.97      120.14    144.4
 8: 64.32 191.73      197.64    177.9
 9: 13.18  48.28       39.04    109.3
10:  8.89  23.30       25.73    103.5

What's actually going on here? Thanks!
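My own rough attempt to reverse-engineer a single round (an illustrative sketch only: I assume squared loss, the default base_score of 0.5, and one eta-scaled coordinate step on the bias and then on the weight; the real updater also handles the L1/L2 and lambda_bias terms, so this is an approximation, not the actual implementation):

```python
import numpy as np

# The train data from above
x = np.array([13.36, 5.35, 0.26, 84.16, 24.67, 22.26, 18.02, 14.29, 61.66, 57.26])
y = np.array([37.54, 14.54, -0.72, 261.19, 76.90, 67.15, 53.89, 43.48, 182.60, 179.44])

eta, bias, w = 1.0, 0.0, 0.0
base = 0.5                                   # xgboost's default base_score
pred = base + bias + w * x

grad, hess = pred - y, np.ones_like(y)       # squared-loss gradient / hessian
bias += eta * (-grad.sum() / hess.sum())     # coordinate step on the bias
pred = base + bias + w * x

grad = pred - y
w += eta * (-(grad * x).sum() / (hess * x * x).sum())  # coordinate step on the weight

print("after one round: bias = %.2f, w = %.3f" % (bias, w))
print("least-squares fit [slope, intercept]:", np.polyfit(x, y, 1))
```

With the train data above this comes out close to the Pred.xgb column, which would mean a single round is just one partial coordinate-descent pass towards the least-squares solution, not a full fit.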

What is the difference between xgboost regression and other, more general regression?
I know xgboost has a tree version which is different from other tree algorithms.
I guess I need to hack the source.

Can we extract the final weights of all input features when using gblinear in xgboost? I am having a hard time extracting the weights.
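The closest I have found so far is dumping the model in the Python API (a sketch with made-up data; I assume the R package's xgb.dump exposes the same information):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)
y = X @ np.array([1.0, 2.0, 0.0, -1.0])

bst = xgb.train({'booster': 'gblinear', 'objective': 'reg:linear', 'alpha': 0.1},
                xgb.DMatrix(X, label=y), num_boost_round=100)

# For gblinear, get_dump() returns a single text chunk listing the bias
# followed by one weight per input feature, in column order.
print(bst.get_dump()[0])
```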

