Scikit-learn: Error raised during grid search on pipeline with None for transformer step

Created on 12 Nov 2020  ·  3Comments  ·  Source: scikit-learn/scikit-learn

Describe the bug

When performing a grid search on a pipeline that has None for a transformer step, an AttributeError is raised. This snippet below previously ran successfully with scikit-learn==0.23.2 but no longer works the 0.24.dev0.

Steps/Code to Reproduce

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

pipe = Pipeline([("setup", None), ("svc", SVC(kernel="linear", random_state=0))])

param_grid = [
    {"svc__C": [0.1, 0.1]},
    {"setup": [StandardScaler()]},
]

gs = GridSearchCV(pipe, param_grid=param_grid, return_train_score=True, cv=3)
gs.fit(X, y)

Expected Results

The GridSearchCV.fit call is able to successfully complete

Actual Results

The following error is raised (I've included the full traceback further down):

  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 863, in _is_pairwise
    pairwise_tag = estimator._get_tags().get('pairwise', False)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 348, in _get_tags
    more_tags = base_class._more_tags(self)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/pipeline.py", line 626, in _more_tags
    estimator_tags = self.steps[0][1]._get_tags()
AttributeError: 'NoneType' object has no attribute '_get_tags'

It appears that the _is_pairwise check doesn't work as expected when applied to a pipeline with None for a step transformer.


Full traceback:

Traceback (most recent call last):
  File "test-pipeline.py", line 18, in <module>
    gs.fit(X, y)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/validation.py", line 60, in inner_f
    return f(*args, **kwargs)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 841, in fit
    self._run_search(evaluate_candidates)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 1288, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 795, in evaluate_candidates
    out = parallel(delayed(_fit_and_score)(clone(base_estimator),
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 585, in _fit_and_score
    X_train, y_train = _safe_split(estimator, X, y, train)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 198, in _safe_split
    if _is_pairwise(estimator):
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 863, in _is_pairwise
    pairwise_tag = estimator._get_tags().get('pairwise', False)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 348, in _get_tags
    more_tags = base_class._more_tags(self)
  File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/pipeline.py", line 626, in _more_tags
    estimator_tags = self.steps[0][1]._get_tags()
AttributeError: 'NoneType' object has no attribute '_get_tags'

Versions

System:
    python: 3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 18:42:56)  [Clang 10.0.1 ]
executable: /Users/james/miniforge3/envs/dask-ml/bin/python3.8
   machine: macOS-10.15.5-x86_64-i386-64bit

Python dependencies:
          pip: 20.2.4
   setuptools: 49.6.0.post20201009
      sklearn: 0.24.dev0
        numpy: 1.19.4
        scipy: 1.5.3
       Cython: None
       pandas: 1.1.4
   matplotlib: None
       joblib: 0.17.0
threadpoolctl: 2.1.0

Built with OpenMP: True
Blocker Bug

Most helpful comment

I'll mark it as a blocker because the error will not just appear when using None, but when using any step that doesn't have _get_tags attribute (likely because it doesn't inherit from BaseEstimator)

All 3 comments

Thanks for the report @jrbourbeau , we can reproduce. We're investigating the best solution in the different issues linked above if you're interested

I'll mark it as a blocker because the error will not just appear when using None, but when using any step that doesn't have _get_tags attribute (likely because it doesn't inherit from BaseEstimator)

Fixed by #18797. Thanks for the timely bug report @jrbourbeau .

Was this page helpful?
0 / 5 - 0 ratings