Scikit-learn: Add linear quantile regression

Created on 13 May 2014  ·  17 Comments  ·  Source: scikit-learn/scikit-learn

It seems that there are not very many packages in Python with quantile regression...

All 17 comments

Why should it go in scikit-learn?

If possible, why not?

Because we have a lot of code to maintain already and we only include popular ML algorithms. Is this popular? Does it hold clear benefits for machine learning tasks over other approaches?

(Btw. VW has quantile regression, with loss ℓ(p,y) = τ(p - y)[[y ≤ p]] + (1 - τ)(y - p)[[y ≥ p]] where [[⋅]] are Iverson brackets.)

GradientBoostingRegressor supports quantile regression (using loss='quantile' and the alpha parameter, which sets the target quantile). See http://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_quantile.html#example-ensemble-plot-gradient-boosting-quantile-py for an example.
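A minimal sketch of that usage, assuming scikit-learn and NumPy are installed. The synthetic data, the helper name `fit_quantile`, and the hyperparameters are illustrative assumptions, not taken from the linked example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration): noise grows with x.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = X.ravel() + rng.normal(scale=1.0 + 0.3 * X.ravel())

def fit_quantile(alpha):
    """Fit a gradient boosting model that targets the `alpha` quantile of y."""
    model = GradientBoostingRegressor(
        loss="quantile", alpha=alpha,
        n_estimators=100, max_depth=2, random_state=0,
    )
    return model.fit(X, y)

# Two separate models give a lower and an upper band.
lower = fit_quantile(0.1).predict(X)
upper = fit_quantile(0.9).predict(X)

# Fraction of training targets that fall inside the 10%-90% band.
coverage = np.mean((y >= lower) & (y <= upper))
print(f"in-sample coverage of the 10%-90% band: {coverage:.2f}")
```

Each quantile needs its own fitted model, and the result is a tree ensemble rather than a linear model.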

I should have checked that. Closing this issue.

While I don't agree that there aren't many packages for quantile regression in Python, I believe it is important to have pure quantile regression (not inside an ensemble method) in scikit-learn.

Quantile regression has the advantage of targeting a specific quantile of y. For the median, this reduces the gap between the median of y_pred and the median of y; in that case it is equivalent to minimizing the absolute error, but the approach is more general and works for other quantiles as well.
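The link between quantiles and the loss being minimized can be sketched in plain Python. The function names below are hypothetical, and the loss uses one common sign convention (tau weights under-predictions); conventions vary between sources.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss.

    Convention assumed here: tau * (y - p) when y >= p,
    and (1 - tau) * (p - y) otherwise.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

def best_constant(y, tau):
    """Constant prediction, picked among the observed values, with the lowest loss."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), tau))

y = [1.0, 2.0, 3.0, 4.0, 100.0]

# tau = 0.5 is half the mean absolute error, so the best constant is the median.
median_fit = best_constant(y, 0.5)

# tau = 0.9 penalizes under-prediction more, so the best constant moves up.
high_fit = best_constant(y, 0.9)

print(median_fit, high_fit)
```

On this small sample the outlier 100.0 does not move the tau = 0.5 fit away from the median, while the tau = 0.9 fit jumps to the largest value.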

Banks use this a lot when dealing with credit scoring and other models, so it's a battle-tested model with real applications. R and SAS both have this model implemented.

@prcastro do you mean for a linear model?

Exactly. Today, sklearn implements quantile regression only in ensemble methods. However, it's usually used as a regular linear model.
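For reference, linear quantile regression can be written as a linear program. The sketch below assumes NumPy and SciPy (with the "highs" solver) are available; `fit_linear_quantile` is a hypothetical helper, not a scikit-learn API, and the dense formulation is only practical for small datasets.

```python
import numpy as np
from scipy.optimize import linprog

def fit_linear_quantile(X, y, tau):
    """Return [intercept, coefficients...] minimizing the pinball loss.

    Residuals are split as y - Xb @ w = u - v with u, v >= 0, and the
    objective is sum(tau * u + (1 - tau) * v).
    """
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend an intercept column
    cost = np.concatenate([np.zeros(d + 1), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([Xb, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[: d + 1]

# Synthetic data (an assumption for illustration): y = 1 + 2x + noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)

w_median = fit_linear_quantile(X, y, 0.5)
w_high = fit_linear_quantile(X, y, 0.9)

# Share of targets at or below the fitted 0.9-quantile line.
pred_high = w_high[0] + X[:, 0] * w_high[1]
frac_below = np.mean(y <= pred_high + 1e-6)
print(w_median, w_high, frac_below)
```

The fitted 0.9-quantile line should leave roughly 90% of the training targets at or below it, which is a quick sanity check on the formulation.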

I'd be open to adding it. @jnothman @GaelVaroquaux?

It seems fairly established indeed

No opposition. We just need a good PR, and time to review it.

Regarding the specific case of quantile regression for the median (absolute loss), as opposed to more general quantiles, it seems that http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html would allow it if only we could pass epsilon=0.0. Why does HuberRegressor require epsilon : float, greater than 1.0? (The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon.)

Huber loss with epsilon=0 is a non-smooth optimization problem: one cannot use the same class of solvers.
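The smoothness point can be checked numerically. This sketch uses the textbook Huber loss on a single residual, which is an assumption for illustration and not scikit-learn's exact objective (HuberRegressor also estimates the scale sigma); the helper names are hypothetical.

```python
def huber(r, epsilon):
    """Textbook Huber loss: quadratic near zero, linear in the tails."""
    a = abs(r)
    return 0.5 * r * r if a <= epsilon else epsilon * (a - 0.5 * epsilon)

def one_sided_slopes(f, r=0.0, h=1e-6):
    """Finite-difference slopes just left and just right of r."""
    left = (f(r) - f(r - h)) / h
    right = (f(r + h) - f(r)) / h
    return left, right

# With epsilon > 0 the slope is continuous through zero.
huber_left, huber_right = one_sided_slopes(lambda r: huber(r, 1.35))

# The absolute loss has a kink at zero: the slope jumps from -1 to +1.
abs_left, abs_right = one_sided_slopes(abs)

# Rescaled by 1/epsilon, the Huber loss approaches |r| as epsilon shrinks.
rescaled = huber(2.0, 1e-3) / 1e-3

print(huber_left, huber_right, abs_left, abs_right, rescaled)
```

The jump in slope at zero is what rules out solvers that rely on a continuous gradient.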

I added classical quantile linear regression in the pull request above. Please review it!

Hello everyone! Please review my PR when you have time for it.

ping

They ain't gonna listen. It's 2020 and we still don't have a proper quantile regression package.
