Evalml: AutoMLSearch get_pipeline always returns pipelines with the same name

Created on 2 Mar 2021 · 3 Comments · Source: alteryx/evalml

Repro:

from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()
automl = AutoMLSearch(X, y, problem_type="binary", max_batches=1)
automl.search()

pipelines = [automl.get_pipeline(i) for i in range(3)]
assert [p.name for p in pipelines] == ['LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer', 'LightGBM Classifier w/ Imputer']

The pipelines should not all have the same name. The estimators are different:

[p.estimator.name for p in pipelines]

['Baseline Classifier', 'Decision Tree Classifier', 'LightGBM Classifier']

Labels: bug, needs design

freddyaboulton

👍1

Most helpful comment

I believe this stems from #1400. I think we're in a pickle (pun intended): our pipeline design relies on setting class attributes to define pipelines, which is conducive to dynamically generating pipeline classes, as we do in search. The problem is that these pipelines then can't be "exported" out of AutoMLSearch.

There may be an easy solution I'm overlooking but I think this will require a deep design discussion if we want to fix this and keep our automl pipelines pickle-able.

freddyaboulton on 8 Mar 2021

👍3
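The pickling constraint described in the comment above can be reproduced without evalml. The following is a minimal sketch, assuming that the generated pipeline classes are created at runtime with `type()`; `PipelineBase` and `make_pipeline_class` are hypothetical names used for illustration and are not evalml code.

```python
import pickle


class PipelineBase:
    """Hypothetical stand-in for a pipeline base class (not evalml code)."""
    name = None  # pipelines are defined by class attributes


def make_pipeline_class(name):
    # The class is created at runtime and is never bound to a module-level
    # name, so pickle has no importable path it can record for it.
    return type("GeneratedPipeline", (PipelineBase,), {"name": name})


pipeline = make_pipeline_class("Decision Tree Classifier w/ Imputer")()

try:
    pickle.dumps(pipeline)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as err:
    # Standard pickle stores classes by reference (module + qualified name)
    # and fails when that lookup does not find the class.
    pickled_ok = False
    failure = type(err).__name__

print(pickled_ok)
```

Standard pickle serializes a class by reference rather than by value, which is why a class that only exists as the return value of a factory function cannot be pickled, while a class defined at module level can.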

All 3 comments

It's not just the names; the hyperparameters (and I suspect other values that are stored on the class) are updated too. This is because we're using the GeneratedPipelineBinary class and updating its class variables. Since every pipeline is an instance of GeneratedPipelineBinary, each update changes the class itself and therefore affects all existing instances.

angela97lin on 8 Mar 2021

👍2
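The shared-class-variable behavior described above can be shown in isolation. This is a simplified sketch, not the evalml implementation: `GeneratedPipeline`, `custom_name`, `build_shared`, and `build_isolated` are hypothetical names, and the estimator is modeled as an instance attribute only because the repro shows the estimator names staying distinct.

```python
class GeneratedPipeline:
    """Hypothetical stand-in for a single reused generated class."""
    custom_name = None  # class attribute: one value shared by all instances

    def __init__(self, estimator_name):
        self.estimator_name = estimator_name  # instance attribute

    @property
    def name(self):
        return self.custom_name


def build_shared(estimator_name):
    # Assumed buggy pattern: overwrite the attribute on the one shared class.
    GeneratedPipeline.custom_name = f"{estimator_name} w/ Imputer"
    return GeneratedPipeline(estimator_name)


def build_isolated(estimator_name):
    # Alternative: a fresh subclass per pipeline, so nothing is shared.
    attrs = {"custom_name": f"{estimator_name} w/ Imputer"}
    return type("GeneratedPipeline", (GeneratedPipeline,), attrs)(estimator_name)


estimators = ["Baseline Classifier", "Decision Tree Classifier", "LightGBM Classifier"]

shared = [build_shared(e) for e in estimators]
shared_names = [p.name for p in shared]        # all report the last name set

isolated = [build_isolated(e) for e in estimators]
isolated_names = [p.name for p in isolated]    # one distinct name per pipeline

print(shared_names)
print(isolated_names)
```

The per-pipeline subclass keeps names distinct, but it is created dynamically, which brings back the standard-pickle limitation discussed in this thread.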

Plan
Short-term: this issue tracks reverting the change from #1400 to resolve the buggy behavior. Our pipelines won't support Python's standard pickle but will still be serializable using the existing save/load functionality, which uses cloudpickle.

Long-term: after the revert, #1956 tracks figuring out how we should support saving evalml pipelines using Python's standard pickle.

dsherry on 10 Mar 2021
