Scikit-learn: Import error when loading a pickled model extracted from a pipeline

Created on 2020-02-27 · 3 comments · Source: scikit-learn/scikit-learn

Describe the bug

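Pickling a RandomForestClassifier extracted from a sklearn pipeline seems to cause a ModuleNotFoundError when the model is loaded in another notebook. The module in question does exist, but it cannot be found: sklearn.ensemble._forest.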

Steps/Code to Reproduce

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
import pickle
import numpy as np


pipeline = make_pipeline(
    # Other steps in pipeline as well
    RandomForestClassifier(),
)

# Create some fake data
X_train = np.array([[2,8,5],[4,7,2],[1,9,4]])
y_train = np.array([26, 29, 18])
# Train the model
pipeline.fit(X_train, y_train)

# Pickle the model
model = pipeline.named_steps['randomforestclassifier']
outfile = open("model.pkl", "wb")
pickle.dump(model, outfile)
outfile.close()

In another notebook:

from sklearn.ensemble import RandomForestClassifier
import pickle


# Attempt to load the pickled model in another file / notebook:
infile = open("model.pkl", "rb")
model = pickle.load(infile)
infile.close()

# It's lonely over here
model

Expected Results

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

Actual Results

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-20-c8d1783e8b58> in <module>
      5 # Attempt to load the pickled model in another file / notebook:
      6 infile = open("model.pkl", "rb")
----> 7 model = pickle.load(infile)
      8 infile.close()

ModuleNotFoundError: No module named 'sklearn.ensemble._forest'
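The traceback names a module rather than the class because pickle stores a class by reference: the payload records the defining module's import path plus the class's qualified name, and unpickling re-imports that module path. The sketch below reproduces the same failure without scikit-learn. The module name `fake_forest_v022` and the stand-in `RandomForestClassifier` class are hypothetical, created in memory and registered in `sys.modules` to play the role of `sklearn.ensemble._forest`.

```python
import pickle
import sys
import types

# Hypothetical module name standing in for "sklearn.ensemble._forest".
MOD_NAME = "fake_forest_v022"


def install_fake_module() -> types.ModuleType:
    """Create an in-memory module containing a stand-in estimator class."""
    mod = types.ModuleType(MOD_NAME)
    exec(
        "class RandomForestClassifier:\n"
        "    def __init__(self, n_estimators=100):\n"
        "        self.n_estimators = n_estimators\n",
        mod.__dict__,  # __name__ here is MOD_NAME, so the class's __module__ is too
    )
    sys.modules[MOD_NAME] = mod
    return mod


# "Notebook 1": the module path exists, so pickling works.
mod = install_fake_module()
payload = pickle.dumps(mod.RandomForestClassifier(n_estimators=10))
# The module path is stored as text inside the pickle bytes.
path_in_payload = MOD_NAME.encode() in payload

# "Notebook 2": simulate an environment where that module path does not exist.
del sys.modules[MOD_NAME]
try:
    pickle.loads(payload)
    error_message = None
except ModuleNotFoundError as exc:
    error_message = str(exc)

# Once the same module path is importable again, loading succeeds.
install_fake_module()
restored = pickle.loads(payload)

print(path_in_payload, error_message, restored.n_estimators)
```

The pickle never contains the class's code, only where to find it, which is why a model pickled under one library layout can fail to load under another.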

Versions

System:
    python: 3.7.4 (default, Oct  9 2019, 16:55:50)  [GCC 7.4.0]
executable: /usr/local/bin/python3.7
   machine: Linux-4.15.0-88-generic-x86_64-with-debian-buster-sid

Python deps:
       pip: 19.3
setuptools: 40.8.0
   sklearn: 0.21.3
     numpy: 1.17.2
     scipy: 1.3.1
    Cython: None
    pandas: 0.25.1


Question


bmulas1535

All 3 comments

It seems that importing from sklearn.ensemble._forest is not possible. Is this intentional?

bmulas1535 on 2020-02-27
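One way to answer that question for a given kernel is to ask the import system whether a module path resolves. Below is a minimal sketch using `importlib.util.find_spec`; the helper name `module_available` is hypothetical. To our understanding, scikit-learn 0.22 moved `sklearn.ensemble.forest` to the private path `sklearn.ensemble._forest`, so in a 0.21.x kernel (like the one in the Versions section) the first check would be expected to report `False`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system in this kernel."""
    try:
        # find_spec imports parent packages (e.g. sklearn.ensemble), not the target itself.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. sklearn) is not installed at all.
        return False


# Private path used by newer releases vs. the older public path.
for name in ("sklearn.ensemble._forest", "sklearn.ensemble.forest"):
    print(name, module_available(name))
```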

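Could you tell us which version of scikit-learn you used to pickle the pipeline, and which version you are trying to unpickle it with? I assume you are trying to unpickle in 0.21.3 after pickling in 0.22.1. In that case, you can simply update. However, note that we do not consider this a bug, since we do not support pickling/unpickling across different scikit-learn versions.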

glemaitre on 2020-02-27
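Because pickles are not supported across scikit-learn versions, one defensive option is to record the version next to the model and check it before unpickling. The error above is raised while the unpickler locates the class, so a version stored inside the same pickled object would never be read. The sketch below instead writes a small plain-dict header as a separate pickle ahead of the model in the same file and reads it first. The helper names and the header layout are hypothetical, and `importlib.metadata` assumes Python 3.8 or newer (the kernel in the Versions section runs 3.7.4).

```python
import os
import pickle
import tempfile
from importlib import metadata
from typing import Any, Optional


def installed_version(dist: str = "scikit-learn") -> Optional[str]:
    """Version of an installed distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def save_with_version(model: Any, path: str, version: Optional[str] = None) -> None:
    # Hypothetical helper: header first (plain dict, always loadable), then the model.
    header = {"sklearn_version": version or installed_version()}
    with open(path, "wb") as f:
        pickle.dump(header, f)
        pickle.dump(model, f)


def load_with_version_check(path: str, version: Optional[str] = None) -> Any:
    # Hypothetical helper: read only the header, compare, then unpickle the model.
    current = version or installed_version()
    with open(path, "rb") as f:
        saved = pickle.load(f).get("sklearn_version")
        if saved != current:
            raise RuntimeError(
                f"model was pickled with scikit-learn {saved}, "
                f"but this kernel has {current}"
            )
        return pickle.load(f)


# Usage with a plain dict standing in for a fitted estimator.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_with_version({"n_estimators": 100}, path, version="0.22.1")
same = load_with_version_check(path, version="0.22.1")
try:
    load_with_version_check(path, version="0.21.3")
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
print(same, mismatch_detected)
```

Each `pickle.load` call reads exactly one object from the file, which is what lets the header be checked before any estimator class has to be imported.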


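That was exactly the problem. I didn't realize I had started the new notebook in a different kernel. Thank you very much for the quick reply.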
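Since the root cause turned out to be a second kernel, a quick check is to print which interpreter and which scikit-learn version each notebook is actually using, then compare the two outputs. Below is a minimal sketch; the function name is hypothetical and it assumes Python 3.8+ for `importlib.metadata`. When scikit-learn is importable, `sklearn.show_versions()`, which appears to be what produced the Versions section above, gives a fuller report.

```python
import sys
from importlib import metadata
from typing import Dict, Optional


def environment_fingerprint(dist: str = "scikit-learn") -> Dict[str, Optional[str]]:
    """Interpreter path, Python version and installed version of `dist`."""
    try:
        dist_version: Optional[str] = metadata.version(dist)
    except metadata.PackageNotFoundError:
        dist_version = None
    return {
        "executable": sys.executable,  # differs between kernels backed by different interpreters
        "python": sys.version.split()[0],
        dist: dist_version,
    }


# Run in both notebooks; differing lines point to different kernels/environments.
for key, value in environment_fingerprint().items():
    print(f"{key}: {value}")
```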