Scikit-learn: Add linear quantile regression

Created on 13 May 2014 · 17 Comments · Source: scikit-learn/scikit-learn

It seems that there are not very many packages in Python with quantile regression...

yangky11

👍3

All 17 comments

It seems that there are not very many packages in Python with quantile regression...

Why should it go in scikit-learn?

GaelVaroquaux on 13 May 2014

If possible, why not?

yangky11 on 13 May 2014

Because we have a lot of code to maintain already and we only include popular ML algorithms. Is this popular? Does it hold clear benefits for machine learning tasks over other approaches?

(Btw. VW has quantile regression, with loss ℓ(p,y) = τ(p - y)[[y ≤ p]] + (1 - τ)(y - p)[[y ≥ p]] where [[⋅]] are Iverson brackets.)

larsmans on 17 May 2014
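
A minimal sketch of the quantile (pinball) loss mentioned in the comment above, assuming NumPy. It uses the common textbook convention in which τ weights under-predictions (y > p); the formula as written in the comment puts τ on the over-prediction side, which corresponds to swapping τ and 1 − τ. The name `pinball_loss` is illustrative and is not taken from VW or scikit-learn.

```python
import numpy as np


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss, textbook convention.

    Under-predictions (y_true > y_pred) are weighted by tau,
    over-predictions by (1 - tau).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


y = np.array([1.0, 2.0, 3.0, 10.0])
p = np.array([2.0, 2.0, 2.0, 2.0])

# For tau = 0.5 the loss is half the mean absolute error.
half_mae = 0.5 * np.mean(np.abs(y - p))
loss_median = pinball_loss(y, p, 0.5)

# For tau = 0.9, predicting too low costs nine times more than predicting too high.
loss_high = pinball_loss(y, p, 0.9)
```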

GradientBoostingRegressor supports quantile regression (using loss='quantile' and the alpha parameter). See http://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_quantile.html#example-ensemble-plot-gradient-boosting-quantile-py for an example.

glouppe on 17 May 2014
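
A short sketch of the usage described in the comment above, with synthetic data made up for illustration. It fits one model per quantile and measures how many training targets fall between the two predicted curves; the variable names and the data-generating process are assumptions, not part of the linked example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 10.0, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=500)

# With loss="quantile", alpha is the target quantile.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.1, random_state=0).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=0).fit(X, y)

# Fraction of training targets inside the predicted 10%-90% band.
inside = np.mean((y >= lower.predict(X)) & (y <= upper.predict(X)))
```

On held-out data the coverage would normally be checked separately, since the training-set figure tends to be optimistic.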

I should have checked that. Closing this issue.

larsmans on 18 May 2014

While I don't agree that there aren't many packages for quantile regression in Python, I believe it is important to have pure quantile regression (not inside an ensemble method) in scikit-learn.

Quantile regression has the advantage of targeting a specific quantile of y. With it, it's possible to reduce the difference between the median of y_pred and the median of y. In that case it amounts to minimizing the absolute error, but it's much more general and works for other quantiles.

Banks use this a lot when dealing with credit scoring and other models, so it's a battle-tested model with real applications. R and SAS have this model implemented.

prcastro on 27 Aug 2017
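
The link between quantiles and the absolute error made in the comment above can be checked numerically. The sketch below, assuming NumPy and a made-up skewed sample, searches for the constant prediction that minimizes the pinball loss and compares it with the empirical quantile; `pinball_loss` and `best_constant` are illustrative helper names.

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Mean pinball loss of the constant prediction q (tau weights y > q)."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))


rng = np.random.RandomState(0)
y = rng.exponential(scale=1.0, size=2001)  # skewed target, odd sample size

# The loss is piecewise linear with kinks at the data points,
# so it is enough to try the observed values as candidates.
candidates = np.sort(y)


def best_constant(tau):
    losses = [pinball_loss(y, c, tau) for c in candidates]
    return candidates[int(np.argmin(losses))]


best_median = best_constant(0.5)  # tau = 0.5 is absolute-error minimization
best_q90 = best_constant(0.9)

empirical_median = np.median(y)
empirical_q90 = np.quantile(y, 0.9)
```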

@prcastro do you mean for a linear model?

amueller on 29 Aug 2017

Exactly. Today, sklearn implements quantile regression only in ensemble methods. However, it's usually used as a regular linear model.

prcastro on 30 Aug 2017

I'd be open to adding it. @jnothman @GaelVaroquaux?

amueller on 30 Aug 2017

It seems fairly established indeed

agramfort on 30 Aug 2017

I'd be open to adding it. @jnothman @GaelVaroquaux?

No opposition. We just need a good PR, and time to review it.

GaelVaroquaux on 30 Aug 2017

Regarding the specific case of quantile regression for the median (absolute loss), as opposed to more general quantiles, it seems that http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html would allow it if only we could pass epsilon=0.0. Why does HuberRegressor require epsilon : float, greater than 1.0? (The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon.)

atorch on 26 Sep 2017

Huber loss with epsilon=0 is a non-smooth optimization problem: one cannot use the same class of solvers.

GaelVaroquaux on 26 Sep 2017
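
The non-smoothness point above can be illustrated by comparing gradients just either side of a zero residual. This sketch uses the textbook Huber loss (quadratic for |r| ≤ eps, linear beyond), not scikit-learn's exact parameterization with sigma; the function names and the residual values are assumptions made for illustration.

```python
import numpy as np


def huber_grad(r, eps):
    """Derivative of the textbook Huber loss with respect to the residual r.

    Loss: 0.5 * r**2 if |r| <= eps, else eps * (|r| - 0.5 * eps).
    """
    return np.clip(r, -eps, eps)


def abs_grad(r):
    """Derivative of |r| away from zero; at r = 0 it jumps from -1 to +1."""
    return np.sign(r)


r = np.array([-1e-3, 1e-3])  # residuals just either side of zero

# With eps > 0 the gradient passes continuously through zero ...
g_huber = huber_grad(r, eps=1.35)
jump_huber = g_huber.max() - g_huber.min()

# ... while for the absolute loss it jumps by 2, however close the residuals are.
g_abs = abs_grad(r)
jump_abs = g_abs.max() - g_abs.min()
```

Gradient-based solvers that rely on a continuous gradient are a natural fit for the first case but not for the second.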

I added classical quantile linear regression in the pull request above. Please review it!

avidale on 22 Oct 2017

👍11
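
For context, linear quantile regression can be posed as a linear program. The sketch below is one textbook formulation solved with `scipy.optimize.linprog` (the "highs" method needs a reasonably recent SciPy); it is not necessarily what the pull request implements, and `fit_linear_quantile` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_linear_quantile(X, y, tau):
    """Fit intercept and coefficients that minimize the pinball loss via an LP.

    Variables: [beta (free), u_plus >= 0, u_minus >= 0], subject to
    X1 @ beta + u_plus - u_minus = y, where X1 has a leading column of ones.
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])
    p = X1.shape[1]
    # No cost on beta, tau on positive residuals, (1 - tau) on negative ones.
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X1, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


rng = np.random.RandomState(0)
x = rng.uniform(0.0, 10.0, size=200)
# Noise grows with x, so different quantile lines have different slopes.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.2 * x)

beta_median = fit_linear_quantile(x[:, None], y, tau=0.5)
beta_q90 = fit_linear_quantile(x[:, None], y, tau=0.9)

# Share of observations at or below the fitted 0.9-quantile line.
fitted_q90 = np.column_stack([np.ones(200), x]) @ beta_q90
share_below = np.mean(y <= fitted_q90 + 1e-9)
```

The dense constraint matrix grows as n × (2n + p), so this form only suits small samples; a sparse matrix or a dedicated solver would be the usual choice beyond that.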

Hello everyone! Please review my PR when you have time for it.

avidale on 19 Apr 2018

👍13

ping

avidale on 28 Aug 2018

👍11

It seems that there are not very many packages in Python with quantile regression...

They ain't gonna listen. It's 2020 and we still don't have a proper quantile regression package.

mu745511 on 11 Sep 2020
