Evalml: Prediction explanations

Created on 15 Apr 2020  ·  8 Comments  ·  Source: alteryx/evalml

A new feature we could add for model understanding is prediction explanation. This would answer the question "why did my model predict x?", allowing users to see which input features were the most impactful to that prediction. This sort of feature can be useful for debugging models from a data setup perspective, because users could examine predictions they've categorized as "bad" and alter or eliminate the features which contributed to those predictions.

Some resources:

Labels: epic, needs design, new feature

All 8 comments

We should look into using SHAP (SHapley Additive exPlanations)
https://github.com/slundberg/shap
I discovered this library via this notebook:
https://github.com/d6t/d6t-python/blob/master/blogs/blog-20200426-shapley.ipynb
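As background on what SHAP computes, here is a minimal brute-force sketch (a hypothetical helper, not part of shap or evalml) of exact Shapley values for a single prediction. Features outside a coalition are filled in from a background point, which is one common approximation for "removing" a feature; the enumeration is exponential in the number of features, which is exactly the cost the shap library's explainers are designed to avoid:

```python
import math
from itertools import combinations

def shapley_values(predict, x, background):
    """Exact Shapley attributions for one prediction of `predict`.

    Features outside a coalition are filled in from `background`
    (e.g. the training-set mean). Exponential in len(x), so only
    feasible for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                without_i = [x[j] if j in keep else background[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# For a linear model the exact answer is w[i] * (x[i] - background[i]),
# so we can sanity-check the enumeration.
w = [2.0, -1.0, 0.5]
predict = lambda v: sum(wi * vi for wi, vi in zip(w, v))
print(shapley_values(predict, x=[1.0, 3.0, -2.0], background=[0.0, 0.0, 0.0]))
# ≈ [2.0, -3.0, -1.0]
```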

@freddyaboulton and I met to discuss this yesterday. This is ready for implementation. Below is the implementation plan from the design doc:

Tasks
Phase 1

  1. Implement the interpretation algorithm (1 day of engineering and testing, 1 day of review).

     1. Add a private `compute_features` method to `PipelineBase`
     2. Implement `ShapInterpreter`

  2. Implement the interpretation UI (1 day of engineering and testing, 1 day of review).

     1. Implement `explain_prediction`

  3. Write/augment a tutorial to showcase the new functionality (1 day of engineering and testing, 1 day of review).

     1. Add it to the User Guide
     2. Consider adding something to Tutorials

  4. Qualitative analysis of explanation quality (3 days).

     1. Run AutoML on difficult datasets.
     2. Grab a couple of pipelines and make sure the prediction explanations make sense.
     3. Mock a dataset, run AutoML search, then explain the predictions.
     4. Add the notebooks to the repo.

  5. Stretch task: evaluate performance on many datasets.

Note: until all of this is complete, we should keep the implementation private for the July release, i.e. `_explain_prediction`.

Overall estimate: 9 days

Phase 2

  1. Implement `explain_predictions`, which finds and explains the top n most/least confident predictions. (5 days)

Overall estimate: 5 days
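The selection logic for Phase 2 can be sketched independently of the explanation step: rank rows by confidence and take the top and bottom n. The function below is illustrative, not evalml's API, and it assumes "confidence" is scored as the binary positive-class probability's distance from 0.5:

```python
# Illustrative sketch, not evalml's API: rank binary-classification rows
# by confidence and return the indices worth explaining.
def select_predictions(probas, n):
    """Return (n most confident, n least confident) row indices, where
    confidence is the positive-class probability's distance from 0.5."""
    ranked = sorted(range(len(probas)), key=lambda i: abs(probas[i] - 0.5))
    return ranked[-n:][::-1], ranked[:n]

probas = [0.97, 0.52, 0.10, 0.49, 0.80]
most, least = select_predictions(probas, 2)
print(most, least)  # [0, 2] [3, 1]
```

Each selected index would then be passed to the single-prediction explainer, so most of the 5-day estimate presumably goes to ranking heuristics and output formatting rather than new attribution math.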

Key Dates
July release: July 28, 2020.

Goal
Merge Phase 1 by Tues August 4th
Merge Phase 2 by Tues August 11th

Stretch Goal
Merge Phase 1 by Tues July 28th (July release)
Merge Phase 2 by Tues August 4th

Hey @freddyaboulton, to date we've been keeping epics in the Epic pipeline and instead moving the individual issues through the pipeline. Could you please follow that pattern here as well? If that feels weird or incorrect to you, I'm happy to discuss changing our process for how we organize epics. It's pretty simple at the moment.

@dsherry My mistake! Keeping epics in the epic pipeline makes sense to me 👍

@freddyaboulton from my perspective, we should finish reviewing the shap qualitative analysis you did (which is super helpful!!), resolve those discussions and perhaps make some fixes/updates. But what I see in there already feels good enough to make public for July!

To confirm: `explain_predictions` is now public and in the API docs, and we added a user guide, correct? Meaning it will be part of the July release? So great!!

@freddyaboulton can this epic be closed?

@dsherry I think once we get #1107 merged we can close this!
