Scikit-learn: ImportError: dlopen: cannot load any more object with static TLS, with torch built with gcc 5.5

Created on 2019-07-26 · 12 comments · Source: scikit-learn/scikit-learn

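I am not sure whether this is a PyTorch bug, a scikit-learn bug, or a numba bug, but this used to work in scikit-learn 0.20.3 and stopped working in the 0.21.0 series, so for now I am going to hazard a guess that it is a regression in scikit-learn.

When I run the following sequence of imports (minimized from the original import, which was import librosa), loading the program fails: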

import torch
import soundfile
import scipy.signal
import numba
import sklearn

with

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 44, in <module>
    from ._check_build import check_build  # noqa
ImportError: dlopen: cannot load any more object with static TLS

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_torch.py", line 5, in <module>
    import sklearn
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__init__.py", line 75, in <module>
    from . import __check_build
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 46, in <module>
    raise_build_error(e)
  File "/opt/conda/lib/python3.6/site-packages/sklearn/__check_build/__init__.py", line 41, in raise_build_error
    %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
ImportError: dlopen: cannot load any more object with static TLS
___________________________________________________________________________
Contents of /opt/conda/lib/python3.6/site-packages/sklearn/__check_build:
_check_build.cpython-36m-x86_64-linux-gnu.so__pycache__               __init__.py
setup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.

If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.

Downgrading to scikit-learn 0.20.3 makes the problem go away.

Versions

jenkins@260bf77532d0:~/workspace/test$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn; sklearn.show_versions()

System:
    python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)  [GCC 7.3.0]
executable: /opt/conda/bin/python
   machine: Linux-4.15.0-29-generic-x86_64-with-debian-jessie-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.16.4
     scipy: 1.1.0
    Cython: None
    pandas: None

Additionally, you may be interested in the following:

jenkins@260bf77532d0:~/workspace/test$ pip list | grep numba
numba                  0.43.1
jenkins@260bf77532d0:~/workspace/test$ pip list | grep torch
torch                  1.2.0a0+ab800ad

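torch must be built with gcc 5.5.0 to trigger this problem; other gcc versions are not known to cause it.

To make reproduction easy, you can use the Docker image ezyang/scikit-learn-tls-repro:1 ( https://cloud.docker.com/repository/registry-1.docker.io/ezyang/scikit-learn-tls-repro ). Once inside, follow the reproduction instructions above. (EDIT: As of this writing, the Docker image is still uploading. It should be done soon.)

Source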

ezyang

👍6 🚀1 ❤1 🎉1 😄1

All 12 comments

Thanks for the report. How did you build/install sklearn?

amueller on 2019-07-26

pip install scikit-learn

ezyang on 2019-07-26

👎22

Do you have the log for that? Was it built from source, or did you install a wheel?

amueller on 2019-07-26

Collecting scikit-learn
  Using cached https://files.pythonhosted.org/packages/85/04/49633f490f726da6e454fddc8e938bbb5bfed2001681118d3814c219b723/scikit_learn-0.21.2-cp36-cp36m-manylinux1_x86_64.whl

ezyang on 2019-07-26

👍1

@ezyang you may want to share the Dockerfile if possible.

In case anyone is interested in reproducing this bug, the right docker incantation is:

docker run -it ezyang/scikit-learn-tls-repro:1 bash

Note that you need to specify the tag explicitly, i.e. 1, otherwise you get a cryptic error message (the "latest" tag does not exist):

Unable to find image 'ezyang/scikit-learn-tls-repro:latest' locally
docker: Error response from daemon: manifest for ezyang/scikit-learn-tls-repro:latest not found.

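I have no idea why this happens, but there seem to be plenty of related bug reports, for example with pytorch and OpenCV https://github.com/pytorch/pytorch/issues/2083 or OpenCV and Tensorflow https://github.com/tensorflow/models/issues/523. All in all, my guess is that this is not a scikit-learn bug.

The fact that it depends on the import order is suspicious. For example, this works in your docker image: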

python -c 'import torch; import sklearn; import soundfile; import scipy.signal; import numba'
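
To see which import orders fail, each permutation can be tried in a fresh interpreter. The following is a minimal sketch, not something posted in the thread: the helper names try_import_order and failing_orders are hypothetical, and it assumes the failure surfaces as a non-zero exit status of python -c.

```python
import itertools
import subprocess
import sys


def try_import_order(modules, python=sys.executable):
    """Import `modules` in the given order in a fresh interpreter.

    Returns (ok, last_stderr_line). A fresh process is needed because
    shared objects stay loaded once imported.
    """
    code = "; ".join(f"import {name}" for name in modules)
    proc = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode == 0, (lines[-1] if lines else "")


def failing_orders(modules, python=sys.executable):
    """Yield (order, error) for every permutation whose imports fail."""
    for order in itertools.permutations(modules):
        ok, error = try_import_order(order, python)
        if not ok:
            yield order, error
```

Calling failing_orders(["torch", "soundfile", "scipy.signal", "numba", "sklearn"]) inside the docker image would launch 120 interpreters, one per ordering, and report the orderings that end in an error such as the ImportError above.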

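lesteve on 2019-08-01

Note that I tried to reproduce in a conda environment (inside your docker image for good measure) but could not (scikit-learn 0.21.2 and pytorch 1.1.0), so I guess this may be related to some change in the pytorch dev version.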

conda create -n test -c pytorch pytorch scikit-learn scipy numba scikit-learn -y
conda activate test
pip install soundfile
python -c 'import torch; import soundfile; import scipy.signal; import numba; import sklearn'

$ conda list
# packages in environment at /opt/conda/envs/test:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.5.15                     0  
certifi                   2019.6.16                py37_1  
cffi                      1.12.3           py37h2e261b9_0  
cudatoolkit               10.0.130                      0  
intel-openmp              2019.4                      243  
joblib                    0.13.2                   py37_0  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
llvmlite                  0.29.0           py37hd408876_0  
mkl                       2019.4                      243  
mkl-service               2.0.2            py37h7b6447c_0  
mkl_fft                   1.0.12           py37ha843d7b_0  
mkl_random                1.0.2            py37hd81dba3_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py37hfd86e86_0  
numba                     0.45.0           py37h962f231_0  
numpy                     1.16.4           py37h7e9f1db_0  
numpy-base                1.16.4           py37hde5b4d6_0  
openssl                   1.1.1c               h7b6447c_1  
pip                       19.1.1                   py37_0  
pycparser                 2.19                     py37_0  
python                    3.7.3                h0371630_0  
pytorch                   1.1.0           py3.7_cuda10.0.130_cudnn7.5.1_0    pytorch
readline                  7.0                  h7b6447c_5  
scikit-learn              0.21.2           py37hd81dba3_0  
scipy                     1.3.0            py37h7c811a0_0  
setuptools                41.0.1                   py37_0  
six                       1.12.0                   py37_0  
soundfile                 0.10.2                   pypi_0    pypi
sqlite                    3.29.0               h7b6447c_0  
tk                        8.6.8                hbc83047_0  
wheel                     0.33.4                   py37_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3

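lesteve on 2019-08-01

I think that if the problem reproduces on the dev version, it would be useful and helpful to get a bisect on scikit-learn.

ezyang on 2019-08-01

In general, my feeling is that the expertise for this kind of problem is on the PyTorch side. Personally, I had never heard of static TLS before, and I guess that is the case for many other core scikit-learn developers, although I may be wrong about that last statement.

IIUC you originally saw the problem with scikit-learn 0.21.2 and the pytorch dev version. As noted in https://github.com/scikit-learn/scikit-learn/issues/14485#issuecomment-517195977, I could not reproduce it with scikit-learn 0.21.2 and pytorch 1.1.0. If I wanted to understand this in more detail, I would bisect on PyTorch.

lesteve on 2019-08-01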

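The issues @ezyang linked have plenty of information about this TLS (Thread Local Storage) problem.
Here is some information I dug up earlier: https :

TL;DR: Something in the import chain is C/C++ that was not compiled with the -fPIC flag. Importing that library causes all subsequent imports to become "static TLS". There is a maximum number of these "static TLS" slots (the names I use here are surely not correct). The exact number N of slots depends on the OS and how it was compiled.

In the linked pytorch issue 2575, it is mentioned that it was OpenMP that was compiled without the flag, causing the cascade.
This scikit-learn issue is probably due to some new library being introduced, or to some change, that simply takes up more static TLS slots.

Note: not really an expert. There may be sources of this error other than "a/some library was compiled without the -fPIC flag". I have not found any yet, though.

lautjy on 2019-08-17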
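
One way to look for candidate libraries is to inspect their ELF headers. The sketch below is an illustration under stated assumptions, not something verified against the libraries in this thread: it assumes 64-bit little-endian ELF files, and it assumes the toolchain marks objects that need a static TLS slot with the DF_STATIC_TLS bit of the DT_FLAGS dynamic entry. The function name tls_info is hypothetical.

```python
import struct

PT_DYNAMIC = 2        # program header type: dynamic linking information
PT_TLS = 7            # program header type: thread-local storage template
DT_NULL = 0           # dynamic tag: end of the dynamic section
DT_FLAGS = 30         # dynamic tag: flag word
DF_STATIC_TLS = 0x10  # flag bit: object uses the static TLS model

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header
DYN = struct.Struct("<qQ")                 # ELF64 dynamic entry (tag, value)


def tls_info(data):
    """Return (has_tls_segment, has_static_tls_flag) for ELF64 LE bytes."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    fields = EHDR.unpack_from(data, 0)
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    has_tls = False
    static_tls = False
    for i in range(phnum):
        p_type, _, p_offset, _, _, p_filesz, _, _ = PHDR.unpack_from(
            data, phoff + i * phentsize
        )
        if p_type == PT_TLS:
            has_tls = True
        elif p_type == PT_DYNAMIC:
            # Walk the (tag, value) pairs until DT_NULL or end of segment.
            end = p_offset + p_filesz - DYN.size + 1
            for off in range(p_offset, end, DYN.size):
                tag, value = DYN.unpack_from(data, off)
                if tag == DT_NULL:
                    break
                if tag == DT_FLAGS and value & DF_STATIC_TLS:
                    static_tls = True
    return has_tls, static_tls
```

For a real file, pass its contents, e.g. tls_info(open(path, "rb").read()), where path would be one of the .so extension modules under site-packages. A True second element only suggests the object competes for static TLS slots; it does not by itself prove that object triggers the error.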

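Is there any update on this? I am hitting this problem too, also when importing librosa.

adarob on 2019-10-18

See https://github.com/pytorch/pytorch/issues/2575#issuecomment-523657178

ezyang on 2019-10-22

I solved this problem by importing sklearn first and then importing tensorflow. The import order causes this error.

Ningshiqi on 2020-05-25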

🚀4 👍3 🎉2
