Evalml: Prediction explanations

Created on 15 Apr 2020  ·  8 Comments  ·  Source: alteryx/evalml

A new feature we could add for model understanding is prediction explanation. This would answer the question "why did my model predict x?", allowing users to see which input features were the most impactful to that prediction. This sort of feature can be useful for debugging models from a data setup perspective, because users could examine predictions they've categorized as "bad" and alter or eliminate the features which contributed to those predictions.

Some resources:

Labels: epic, needs design, new feature

All 8 comments

We should look into using SHAP (SHapley Additive exPlanations)
https://github.com/slundberg/shap
I discovered this library via this notebook:
https://github.com/d6t/d6t-python/blob/master/blogs/blog-20200426-shapley.ipynb
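As background on what SHAP computes, here is a minimal brute-force sketch (a hypothetical helper, not part of shap or evalml) of exact Shapley values for a single prediction. Features outside a coalition are filled in from a background point, which is one common approximation for "removing" a feature; the enumeration is exponential in the number of features, which is exactly the cost the shap library's explainers are designed to avoid:

```python
import math
from itertools import combinations

def shapley_values(predict, x, background):
    """Exact Shapley attributions for one prediction of `predict`.

    Features outside a coalition are filled in from `background`
    (e.g. the training-set mean). Exponential in len(x), so only
    feasible for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                without_i = [x[j] if j in keep else background[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# For a linear model the exact answer is w[i] * (x[i] - background[i]),
# so we can sanity-check the enumeration.
w = [2.0, -1.0, 0.5]
predict = lambda v: sum(wi * vi for wi, vi in zip(w, v))
print(shapley_values(predict, x=[1.0, 3.0, -2.0], background=[0.0, 0.0, 0.0]))
# ≈ [2.0, -3.0, -1.0]
```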

@freddyaboulton and I met to discuss this yesterday. This is ready for implementation. Below is the implementation plan from the design doc:

Tasks
Phase 1

  1. Implement the interpretation algorithm (1 day of engineering and testing, 1 day of review).

     1. Add a private `compute_features` method to `PipelineBase`
     2. Implement `ShapInterpreter`

  2. Implement the interpretation UI (1 day of engineering and testing, 1 day of review).

     1. Implement `explain_prediction`

  3. Write/augment a tutorial to showcase the new functionality (1 day of engineering and testing, 1 day of review).

     1. Add it to the User Guide
     2. Consider adding something to Tutorials

  4. Qualitative analysis of explanation quality (3 days).

     1. Run AutoML on difficult datasets.
     2. Grab a couple of pipelines and make sure the prediction explanations make sense.
     3. Mock a dataset, run AutoML search, then explain the predictions.
     4. Add the notebooks to the repo.

  5. Stretch task: evaluate performance on many datasets.

Note: until all of this is complete, we should keep the implementation private for the July release, i.e. `_explain_prediction`.

Overall estimate: 9 days

Phase 2

  1. Implement `explain_predictions`, which finds and explains the top n most/least confident predictions. (5 days)

Overall estimate: 5 days
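The selection logic for Phase 2 can be sketched independently of the explanation step: rank rows by confidence and take the top and bottom n. The function below is illustrative, not evalml's API, and it assumes "confidence" is scored as the binary positive-class probability's distance from 0.5:

```python
# Illustrative sketch, not evalml's API: rank binary-classification rows
# by confidence and return the indices worth explaining.
def select_predictions(probas, n):
    """Return (n most confident, n least confident) row indices, where
    confidence is the positive-class probability's distance from 0.5."""
    ranked = sorted(range(len(probas)), key=lambda i: abs(probas[i] - 0.5))
    return ranked[-n:][::-1], ranked[:n]

probas = [0.97, 0.52, 0.10, 0.49, 0.80]
most, least = select_predictions(probas, 2)
print(most, least)  # [0, 2] [3, 1]
```

Each selected index would then be passed to the single-prediction explainer, so most of the 5-day estimate presumably goes to ranking heuristics and output formatting rather than new attribution math.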

Key Dates
July release: July 28, 2020.

Goal
Merge Phase 1 by Tues August 4th
Merge Phase 2 by Tues August 11th

Stretch Goal
Merge Phase 1 by Tues July 28th (July release)
Merge Phase 2 by Tues August 4th

Hey @freddyaboulton, to date we've been keeping epics in the Epic pipeline and instead moving the individual issues through the pipeline. Could you please follow that pattern here as well? If that feels weird or incorrect to you, I'm happy to discuss changing our process for how we organize epics. It's pretty simple at the moment.

@dsherry My mistake! Keeping epics in the epic pipeline makes sense to me 👍

@freddyaboulton from my perspective, we should finish reviewing the shap qualitative analysis you did (which is super helpful!!), resolve those discussions and perhaps make some fixes/updates. But what I see in there already feels good enough to make public for July!

To confirm: `explain_predictions` is now public and in the API docs, and we added a user guide, correct? Meaning it will be part of the July release? So great!!

@freddyaboulton can this epic be closed?

@dsherry I think once we get #1107 merged we can close this!
