Evalml: AutoMLSearch get_pipeline always returns pipelines with the same name

Created on 2 Mar 2021  ·  3 Comments  ·  Source: alteryx/evalml

Repro:

from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()
automl = AutoMLSearch(X, y, problem_type="binary", max_batches=1)
automl.search()

pipelines = [automl.get_pipeline(i) for i in range(3)]
assert [p.name for p in pipelines] == ['LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer']

The pipelines should not all have the same name. The estimators are different:

[p.estimator.name for p in pipelines]
['Baseline Classifier', 'Decision Tree Classifier', 'LightGBM Classifier']

Labels: bug, needs design

All 3 comments

It's not just the names; the hyperparameters (and I suspect other values stored on the class) are overwritten too. This is because we're using the GeneratedPipelineBinary class and updating its class variables. Since every pipeline is an instance of GeneratedPipelineBinary, each update changes the class itself and so affects all instances.
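
To illustrate the mechanism described above, here is a minimal sketch in plain Python. It does not use evalml; `GeneratedPipeline`, `custom_name`, `component_graph`, and `make_pipeline_shared` are hypothetical stand-ins, assumed only for the purpose of showing how mutating class variables on one shared class changes every existing instance.

```python
class GeneratedPipeline:
    """Hypothetical stand-in for a single shared generated-pipeline class."""

    # Class variables: one copy, shared by every instance.
    custom_name = None
    component_graph = None

    @property
    def name(self):
        # Reads the class variable, not per-instance state.
        return type(self).custom_name


def make_pipeline_shared(estimator_name):
    # Overwrites the shared class variables, then instantiates the class.
    GeneratedPipeline.custom_name = f"{estimator_name} w/ Imputer"
    GeneratedPipeline.component_graph = ["Imputer", estimator_name]
    return GeneratedPipeline()


estimators = ["Baseline Classifier", "Decision Tree Classifier", "LightGBM Classifier"]
shared = [make_pipeline_shared(e) for e in estimators]

# Every instance reports whatever was written to the class last.
shared_names = [p.name for p in shared]
print(shared_names)
```

All three instances end up reporting the name written last, which matches the symptom in the repro.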

I believe this stems from #1400. I think we're in a pickle (pun intended) - our pipeline design relies on setting class attributes to define pipelines, which is conducive to having dynamically generated pipeline classes like we do in search. The problem is that dynamically generated pipeline classes can't be "exported" out of AutoMLSearch.

There may be an easy solution I'm overlooking but I think this will require a deep design discussion if we want to fix this and keep our automl pipelines pickle-able.
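
The trade-off can be sketched with the standard library alone. This is an illustration under assumptions, not evalml code: `make_pipeline_class` and `DynamicGeneratedPipeline` are hypothetical names. Building a fresh class per pipeline keeps the names distinct, but the standard pickle module stores classes by reference (module plus name), so it cannot find a class that exists only at runtime.

```python
import pickle


def make_pipeline_class(estimator_name):
    # type() creates a brand-new class on every call, so nothing is shared
    # between the pipelines produced for different estimators.
    return type(
        "DynamicGeneratedPipeline",  # hypothetical class name
        (object,),
        {"name": f"{estimator_name} w/ Imputer"},
    )


estimators = ["Baseline Classifier", "Decision Tree Classifier", "LightGBM Classifier"]
classes = [make_pipeline_class(e) for e in estimators]
names = [cls().name for cls in classes]
print(names)

# pickle looks the class up by module and name; the dynamically created
# class is not bound to any importable name, so the lookup fails.
try:
    pickle.dumps(classes[0]())
    picklable = True
except (pickle.PicklingError, AttributeError) as exc:
    picklable = False
    print(f"pickle failed: {exc}")
```

So one shared class is pickle-able but leaks state across pipelines, while per-pipeline dynamic classes keep state separate but need a by-value serializer.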

Plan
Short-term: this issue tracks reverting the change from #1400 to resolve the buggy behavior. Our pipelines won't support Python's pickle module but will still be serializable through the existing save/load functionality, which uses cloudpickle.

Long-term: after the revert, #1956 tracks figuring out how we should support saving evalml pipelines with Python's pickle module.
