Numpy: np.dot crashes on some systems with numpy 1.14.5

Created on 6 Jul 2018  ·  61Comments  ·  Source: numpy/numpy

On one of our systems, the following code snippet leads to a crash of the python interpreter:

import numpy as np
A = np.matrix([[1.], [3.]])
B = np.matrix([[2., 3.]])
np.dot(A, B)

Some details:

  • Windows 10
  • Python 3.5.2 in a virtual environment
  • numpy 1.14.5

On other systems this command works just fine.

00 - Bug

All 61 comments

Where are you getting your numpy? pip? Anaconda? What does np.show_config() say?

Thank you for the response. I've installed numpy through pip.

>>> import numpy as np
>>> np.show_config()
blas_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    library_dirs = ['C:\\projects\\numpy-wheels-jc1cl\\numpy\\build\\openblas']
    libraries = ['openblas']
    define_macros = [('HAVE_CBLAS', None)]
    language = f77
blis_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
openblas_info:
    library_dirs = ['C:\\projects\\numpy-wheels-jc1cl\\numpy\\build\\openblas']
    libraries = ['openblas']
    define_macros = [('HAVE_CBLAS', None)]
    language = f77
lapack_opt_info:
    library_dirs = ['C:\\projects\\numpy-wheels-jc1cl\\numpy\\build\\openblas']
    libraries = ['openblas']
    define_macros = [('HAVE_CBLAS', None)]
    language = f77
blas_opt_info:
    library_dirs = ['C:\\projects\\numpy-wheels-jc1cl\\numpy\\build\\openblas']
    libraries = ['openblas']
    define_macros = [('HAVE_CBLAS', None)]
    language = f77

Does upgrading to latest Microsoft Visual C runtime on the machine
help? Maybe available from windows update or from:
https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads

@pv thanks for the hint, but after installing the latest 2017 runtime and restarting the system, the error still exists.

Maybe it has nothing to do with the problem, but the affected system is a Windows 10 running in a KVM.

When I run the test script above as t.py in this way: python -u -m trace -t t.py, I'm getting the following output before the crash:

t.py(4): A = np.matrix([[1.], [3.]])
 --- modulename: defmatrix, funcname: __new__
defmatrix.py(213):         if isinstance(data, matrix):
defmatrix.py(221):         if isinstance(data, N.ndarray):
defmatrix.py(232):         if isinstance(data, str):
defmatrix.py(236):         arr = N.array(data, dtype=dtype, copy=copy)
defmatrix.py(237):         ndim = arr.ndim
defmatrix.py(238):         shape = arr.shape
defmatrix.py(239):         if (ndim > 2):
defmatrix.py(241):         elif ndim == 0:
defmatrix.py(243):         elif ndim == 1:
defmatrix.py(246):         order = 'C'
defmatrix.py(247):         if (ndim == 2) and arr.flags.fortran:
defmatrix.py(248):             order = 'F'
defmatrix.py(250):         if not (order or arr.flags.contiguous):
defmatrix.py(253):         ret = N.ndarray.__new__(subtype, shape, arr.dtype,
defmatrix.py(254):                                 buffer=arr,
defmatrix.py(255):                                 order=order)
 --- modulename: defmatrix, funcname: __array_finalize__
defmatrix.py(259):         self._getitem = False
defmatrix.py(260):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(261):         ndim = self.ndim
defmatrix.py(262):         if (ndim == 2):
defmatrix.py(263):             return
defmatrix.py(256):         return ret
t.py(5): B = np.matrix([[2., 3.]])
 --- modulename: defmatrix, funcname: __new__
defmatrix.py(213):         if isinstance(data, matrix):
defmatrix.py(221):         if isinstance(data, N.ndarray):
defmatrix.py(232):         if isinstance(data, str):
defmatrix.py(236):         arr = N.array(data, dtype=dtype, copy=copy)
defmatrix.py(237):         ndim = arr.ndim
defmatrix.py(238):         shape = arr.shape
defmatrix.py(239):         if (ndim > 2):
defmatrix.py(241):         elif ndim == 0:
defmatrix.py(243):         elif ndim == 1:
defmatrix.py(246):         order = 'C'
defmatrix.py(247):         if (ndim == 2) and arr.flags.fortran:
defmatrix.py(248):             order = 'F'
defmatrix.py(250):         if not (order or arr.flags.contiguous):
defmatrix.py(253):         ret = N.ndarray.__new__(subtype, shape, arr.dtype,
defmatrix.py(254):                                 buffer=arr,
defmatrix.py(255):                                 order=order)
 --- modulename: defmatrix, funcname: __array_finalize__
defmatrix.py(259):         self._getitem = False
defmatrix.py(260):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(261):         ndim = self.ndim
defmatrix.py(262):         if (ndim == 2):
defmatrix.py(263):             return
defmatrix.py(256):         return ret
t.py(6): print(np.dot(A, B))
 --- modulename: defmatrix, funcname: __array_finalize__
defmatrix.py(259):         self._getitem = False
defmatrix.py(260):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(261):         ndim = self.ndim
defmatrix.py(262):         if (ndim == 2):
defmatrix.py(263):             return

Hmm,

this one works

A = np.matrix([[1.], [3.]])
B = np.matrix([[2., 3.]])
print(np.dot(A.astype(np.float16), B.astype(np.float16)))

this one crashes

A = np.matrix([[1.], [3.]])
B = np.matrix([[2., 3.]])
print(np.dot(A.astype(np.float32), B.astype(np.float32)))

Any idea?

After playing around with different numpy versions I can say that

  • numpy <= 1.13.1 is not affected, while
  • numpy >= 1.13.3 is affected by this issue
  • numpy 1.13.2 was not tested because there is no build on pypi

I don't see any obvious changes that could lead to this. As the problem is specific to this installation it might be useful to see if there is something special about the environment, hardware, etc. I'm suspicious that there is a library problem somewhere.

In which information are you interested in?

OS Name Microsoft Windows 10 Pro
Version 10.0.16299 Build 16299
Other OS Description    Not Available
OS Manufacturer Microsoft Corporation
System Name <REMOVED>
System Manufacturer QEMU
System Model    Standard PC (i440FX + PIIX, 1996)
System Type x64-based PC
System SKU  
Processor   Common KVM processor, 1996 Mhz, 4 Core(s), 4 Logical Processor(s)
BIOS Version/Date   SeaBIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org, 4/1/2014
SMBIOS Version  2.8
Embedded Controller Version 255.255
BIOS Mode   Legacy
Platform Role   Desktop
Secure Boot State   Unsupported
PCR7 Configuration  Binding Not Possible
Windows Directory   C:\WINDOWS
System Directory    C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume1
Locale  Austria
Hardware Abstraction Layer  Version = "10.0.16299.371"
User Name   Not Available
Time Zone   W. Europe Daylight Time
Installed Physical Memory (RAM) 4.00 GB
Total Physical Memory   4.00 GB
Available Physical Memory   1.94 GB
Total Virtual Memory    8.00 GB
Available Virtual Memory    5.27 GB
Page File Space 4.00 GB
Page File   C:\pagefile.sys
Virtualization-based security   Not enabled
Device Encryption Support   Reasons for failed automatic device encryption: TPM is not usable, PCR7 binding is not supported, Hardware Security Test Interface failed and device is not InstantGo, Un-allowed DMA capable bus/device(s) detected, Disabled by policy, TPM is not usable
A hypervisor has been detected. Features required for Hyper-V will not be displayed.    

I'm looking for differences with your working installations. Looks like the base system is linux?

It works on:

  • all physical Windows 10 installations
  • in a Windows 10 Hyper-V VM with Windows 10 host

I have no other, similar Windows 10 setup, I could compare the affected system with.

Looks like the base system is linux?

Exactly: Debian GNU/Linux 9 (stretch)

The NumPy change that probably triggered this was the switch to OpenBLAS on Windows for 1.13.3. I'm going to guess that there is something that needs tweaking in the KVM numeric environment.

Might also check if there is an atlas dll floating about.

Might also check if there is an atlas dll floating about.

What exactly do you mean with this sentence?

There should be an openblas dll for the later versions, so I'm wondering if there might might somehow be some confusion with an older atlas dll. Probably not, though. IIUC, numpy runs, it is just returning the wrong results.

Numpy runs, but for the example above it actually retruns nothing but leads to a crash.

Another possibility is OpenBLAS issuing an illegal instruction (SSE*) not supported in the KVM environment. I don't know what is enabled there, could you check?

You can control OpenBlas from environment variables. set OPENBLAS_VERBOSE=2 should print out the default processor model used. It seems the guest must use the same model as the actual host hardware, so there is set OPENBLAS_CORETYPE=nehalem for instance if that is correct for your host machine. Does NumPy behave properly on the host machine? If so you can use the verbose setting to find out what the guest should be using

Anymore info on this problem?

The problem still exists but I was not yet able to test the things @mattip and you mentioned. I won't be able to do this within the next two week but I'll do this afterwards.

OK, thanks.

@m55c55 Have you managed to debug this any more?

Hello,
I can confirm this issue, and might help by giving another use case...

I have a very similar behavior on my machine (fresh linux install):

Linux 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 18.04.1 LTS

numpy (1.15.4)

>>> np.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

My python script:

import numpy as np
A = np.random.randn(100, 3000)
B = np.random.randn(3000, 100)
np.dot(B, A)

The script doesn't crash with very small matrices.

However the crash is actually critical : the machine instantly reboots and many files are corrupted (got my bash and the numpy module broken during my investigation).

I assume you mean np.dot(B, A.T)? How did you get NumPy? If you built is yourself, what version of OpenBLAS?

No it should be np.dot(B, A) to have aligned dimensions.

I tried getting numpy from pip (1.15.4)

➜  ~ pip3 install numpy
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/ff/7f/9d804d2348471c67a7d8b5f84f9bc59fd1cefa148986f2b74552f8573555/numpy-1.15.4-cp36-cp36m-manylinux1_x86_64.whl

And also tried from ubuntu repositories (1.13.3)

➜  ~ apt-cache show python3-numpy
Package: python3-numpy
Architecture: amd64
Version: 1:1.13.3-2ubuntu1
Priority: optional
Section: python
Source: python-numpy
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Sandro Tosi <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 10670
Provides: python3-f2py, python3-numpy-abi9, python3-numpy-api11, python3-numpy-dev, python3.6-numpy
Depends: python3 (<< 3.7), python3 (>= 3.6~), python3.6:any, python3:any (>= 3.4~), libblas3 | libblas.so.3, libc6 (>= 2.14), liblapack3 | liblapack.so.3
Suggests: gcc (>= 4:4.6.1-5), gfortran, python-numpy-doc, python3-dev, python3-nose (>= 1.0), python3-numpy-dbg
Filename: pool/main/p/python-numpy/python3-numpy_1.13.3-2ubuntu1_amd64.deb
Size: 1942804
MD5sum: 7b84ea9967f987a292b64f5bc5b6d65f
SHA1: 73fd8354b7106ac81b8add37947173aad515a9d5
SHA256: 3098ad88b8404cbeee66cc6eef96b13ea87eda848900d7cb754fd0bf280741bf

Just tried on another computer running ubuntu 18 and pip installed numpy ... and cannot reproduce the crash. Not sure what is going on !

Hello, I encountered this bug when running the corrcoef function. I drilled down and found that i can reproduce the crash with the following call np.dot(np.array([[0], [0]]), np.array([0, 0]).conj())

Im running windows 10 in a anaconda environment.

np.__version__

1.14.2

np.show_config()

mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\include']
blas_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\include']
lapack_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries_2016.4.246\\windows\\mkl\\lib', 'C:/Users/tommassino/AppData/Local/Continuum/Anaconda3/envs/project-conda\\Library\\include']

@Tommassino Cannot reproduce. Are you sure those are the shapes that are failing?
This issue is about running inside a VM, if your setup is different please open a new issue.

>>> import sys
>>> print(sys.version)
3.6.6 (default, Sep 12 2018, 18:26:19) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
>>> import numpy as np
>>> np.__version__
'1.14.2'
>>> np.dot(np.array([[0], [0]]), np.array([0, 0]).conj())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shapes (2,1) and (2,) not aligned: 1 (dim 1) != 2 (dim 0)

@mattip Ah sorry, i did not notice that this was related specifically to a VM environment as it was only mentioned in the following posts. Some of the other replies do seem to mention the problem occurring on non-vms. My mistake with the np.dot command, i copied the arrays from my debugger incorrectly.

>>> import sys
>>> import numpy as np
>>> print(sys.version)
3.6.6 | packaged by conda-forge | (default, Jul 26 2018, 11:48:23) [MSC v.1900 64 bit (AMD64)]
>>> print(np.__version__)
1.14.2
>>> X = np.array([0., 0.])
>>> X_T = np.array([0., 0.])
>>> print(X)
[0. 0.]
>>> print(X_T)
[0. 0.]
>>> print(np.dot(X, X_T.conj()))
Process finished with exit code 2

However I already found that it might be related to mkl natives not correctly loading, so this might indeed be another issue.

@Tommassino Did you also get numpy from conda?

@oleksandr-pavlyk: any thoughts?

@Tommassino Did you also get numpy from conda?

Sorry for the late reply, holidays... Yes i am using a conda numpy

> conda list --no-pip | grep numpy
numpy                     1.14.2           py36h5c71026_1

However, as I mentioned above, when I updated mkl (conda update mkl) the issue went away.

I guess we can likely close this? @Khemal since you are running linux, can you post the output of https://gist.github.com/seberg/ce4563f3cb00e33997e4f80675b80953 on the box where the crash occurs? That might tell you/us what is wrong, since this is pretty certainly related to the blas.

Unless there really is some reference counting issue, but I would be surprised by that not randomly also hitting other people.

Hi.

I collected some traceback about the issue.

Python 3.6.7
numpy (1.15.4)

(.kfr-test-env) root@whatever:/app# gdb python3
(gdb) run testcode.py 
Starting program: /app/.kfr-test-env/bin/python3 testcode.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3c1b700 (LWP 29704)]

Thread 1 "python3" received signal SIGILL, Illegal instruction.
0x00007ffff4a5c204 in dgemm_oncopy_OPTERON_SSE3 () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so
(gdb) bt
#0  0x00007ffff4a5c204 in dgemm_oncopy_OPTERON_SSE3 () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so
#1  0x00007ffff6520630 in ?? () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so
#2  0x00000000000000e0 in ?? ()
#3  0x00007ffff41275db in dgemm_tt () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so
#4  0x00007ffff4052088 in cblas_dgemm () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so
#5  0x00007ffff676d28f in ?? () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#6  0x00007ffff6732c36 in ?? () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#7  0x00007ffff673400a in ?? () from /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#8  0x00000000005030d5 in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, 
    func_obj=<built-in method dot of module object at remote 0x7ffff6a535e8>) at ../Objects/methodobject.c:231
#9  _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#10 call_function.lto_priv () at ../Python/ceval.c:4837
#11 0x0000000000506859 in _PyEval_EvalFrameDefault () at ../Python/ceval.c:3335
#12 0x0000000000504c28 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0xadeb18, for file testcode.py, line 10, in <module> ()) at ../Python/ceval.c:4166
#13 _PyEval_EvalCodeWithName.lto_priv.1761 () at ../Python/ceval.c:4166
#14 0x0000000000506393 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, 
    globals=<optimized out>, _co=<optimized out>) at ../Python/ceval.c:4187
#15 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:731
#16 0x0000000000634d52 in run_mod () at ../Python/pythonrun.c:1025
#17 0x0000000000634e0a in PyRun_FileExFlags () at ../Python/pythonrun.c:978
#18 0x00000000006385c8 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:419
#19 0x00000000006387a5 in PyRun_AnyFileExFlags () at ../Python/pythonrun.c:81
#20 0x000000000063915a in run_file (p_cf=0x7fffffffe2fc, filename=<optimized out>, fp=<optimized out>) at ../Modules/main.c:340
#21 Py_Main () at ../Modules/main.c:810
#22 0x00000000004a6f10 in main (argc=2, argv=0x7fffffffe4f8) at ../Programs/python.c:69

@seberg this is the output:

Probing Multiarray
------------------
OpenBLAS:
    num threads: 2
    version info: DYNAMIC_ARCH NO_AFFINITY Opteron MAX_THREADS=64

Probing Linalg
--------------
OpenBLAS:
    num threads: 2
    version info: DYNAMIC_ARCH NO_AFFINITY Opteron MAX_THREADS=64

LDD information:
----------------
running: ldd /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffe941ef000)
    libopenblasp-r0-8dca6697.3.0.dev.so => /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so (0x00007f9319cea000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f931994c000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f931972d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f931933c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f931c800000)
    libgfortran-ed201abd.so.3.0.0 => /app/.kfr-test-env/lib/python3.6/site-packages/numpy/core/../.libs/libgfortran-ed201abd.so.3.0.0 (0x00007f9319042000)
running: ldd /app/.kfr-test-env/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffda69d5000)
    libopenblasp-r0-8dca6697.3.0.dev.so => /app/.kfr-test-env/lib/python3.6/site-packages/numpy/linalg/../.libs/libopenblasp-r0-8dca6697.3.0.dev.so (0x00007f12097c1000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f12095a2000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f12091b1000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1208e13000)
    libgfortran-ed201abd.so.3.0.0 => /app/.kfr-test-env/lib/python3.6/site-packages/numpy/linalg/../.libs/libgfortran-ed201abd.so.3.0.0 (0x00007f1208b19000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f120c0ee000)

@mattip I tried the environment variable things, that you mentioned:

With the os.environ["OPENBLAS_CORETYPE"] = "nehalem" it works, but without it does not work.

import os
os.environ["OPENBLAS_VERBOSE"] = "2"
os.environ["OPENBLAS_CORETYPE"] = "nehalem"
import numpy as np

A = np.matrix([[1.], [3.]])
B = np.matrix([[2., 3.]])
ret = np.dot(A, B)
print(ret)

@kfrendrich I guess you are also on a virtual machine or so? Otherwise, maybe there is an issue with the CPU detection in OpenBLAS? If this is a virtual machine I think you may just have to use those environment variables.

@seberg yes it is a virtual machine

This sounds like some bug in the way OpenBLAS is detecting the cpu. Could you report it to them? I could not find open kvm issue, but there are a few closed ones

Yes, I will report them soon.

Does this still reproduce?

The same crash with numpy 1.17.3.

$ python
>>> import numpy as np
>>> A = np.matrix([[1.0], [3.0]])
>>> B = np.matrix([[2.0, 3.0]])
>>> ret = np.dot(A, B)
Illegal instruction (core dumped)

This is probably the reason for the error I caught - Python dies with the "Illegal instruction (core dumped)" when trying to run:

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10,10))

Proposed sollution:

import os
os.environ["OPENBLAS_VERBOSE"] = "2"
os.environ["OPENBLAS_CORETYPE"] = "nehalem"
import numpy as np

doesn't work :(

Here is full story and config:

$ python
Python 3.6.8 (default, Oct  9 2019, 14:04:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import numpy as np
>>> print(sys.version)
3.6.8 (default, Oct  9 2019, 14:04:01)
[GCC 8.3.0]
>>> print(np.__version__)
1.17.3
>>> np.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

CPU info (this is virtual machine):

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 1
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
cpu MHz         : 2199.998
cache size      : 4096 KB
physical id     : 3
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm pti fsgsbase smep erms
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@prokulski how did you install openblas? Which version do you use? When you check their github repository, you can which version should behave in which way.

openblas has been installed automaticaly (as dependency of something). I've tried to remove it and install once again (via apt install) - didn't help

I assume this also crashes with regular arrays (instead of np.matrix)? It would be nice to eliminate deprecated code from the debugging. Does this behave on your system?

>>> import numpy as np
>>> A = np.array([[1.0], [3.0]])
>>> B = np.array([[2.0, 3.0]])
>>> ret = np.dot(A, B)

How about

>>> import numpy as np
>>> A = np.array([[1.0], [3.0]])
>>> B = np.array([[2.0, 3.0]])
>>> ret = np.matmul(A, B)

I assume this also crashes with regular arrays (instead of np.matrix)? It would be nice to eliminate deprecated code from the debugging. Does this behave on your system?

>>> import numpy as np
>>> A = np.array([[1.0], [3.0]])
>>> B = np.array([[2.0, 3.0]])
>>> ret = np.dot(A, B)

Crachesh with "Illegal instruction (core dumped)"

>>> import numpy as np
>>> A = np.array([[1.0], [3.0]])
>>> B = np.array([[2.0, 3.0]])
>>> ret = np.matmul(A, B)

Gives:

>>> ret
array([[2., 3.],
       [6., 9.]])

@prokulski how did you install openblas? Which version do you use? When you check their github repository, you can which version should behave in which way.

openblas has been installed automaticaly (as dependency of something). I've tried to remove it and install once again (via apt install) - didn't help

This still does not answer the question about the version number.

They changed the system about how the library detects the CPU type. Better contact them directly as numpy only uses the library. See e.g. my issue https://github.com/xianyi/OpenBLAS/issues/2067 as a reference.

$ apt list *blas* | grep installed
libblas-dev/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libblas3/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libgslcblas0/bionic,now 2.4+dfsg-6 amd64 [installed,automatic]

I've instaled libopenblas-dev, and now it is:

$ apt list *blas* | grep installed
libblas-dev/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libblas3/bionic,now 3.7.1-4ubuntu1 amd64 [installed,automatic]
libgslcblas0/bionic,now 2.4+dfsg-6 amd64 [installed,automatic]
libopenblas-base/bionic,now 0.2.20+ds-4 amd64 [installed,automatic]
libopenblas-dev/bionic,now 0.2.20+ds-4 amd64 [installed]

Python:

$ python
Python 3.6.8 (default, Oct  9 2019, 14:04:01)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> A = np.matrix([[1.0], [3.0]])
>>> B = np.matrix([[2.0, 3.0]])
>>> ret = np.dot(A, B)
Illegal instruction (core dumped)

You can try setting OPENBLAS_CORETYPE to the CPU of the host system. The numpy you get with pip install from the PyPI contains OpenBLAS 0.3.7, which should contain a fix for this. If you choose to use pip install and are not using a virtualenv, be sure to use pip install --upgrade --user numpy to avoid conflicts with your system's python.

To determine OpenBLAS version, you can use this code, which will report the version for OpenBLAS 0.3.5 and up

    import numpy
    import ctypes

    dll = ctypes.CDLL(numpy.core._multiarray_umath.__file__)
    get_config = dll.openblas_get_config
    get_config.restype=ctypes.c_char_p
    res = get_config()
    print('OpenBLAS get_config returned', str(res))

@mattip your code gives "OpenBLAS get_config returned b'OpenBLAS 0.3.7 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64'"

Setting OPENBLAS_CORETYPE to Haswell (lowercase, upercase, whateher) doesn't help - I've tried this earlier.

I think the problem is in KVM virtual machine and bad CPU detection in blas. Strange, that some days earlier everythink was ok...

I think Haswell is the wrong value for you, but a quick google search is not coming up with good answers for detecting the host cpu once inside a virtual machine.

I think Haswell is the wrong value for you, but a quick google search is not coming up with good answers for detecting the host cpu once inside a virtual machine.

Yep. Skylake would be better (if you'll see cpuinfo), but also doesn't help. I'm stuck.
https://github.com/numpy/numpy/milestone/69 gives hope ;)

Try these as OpenBLAS kernel specifiers, do any of the them work?

export OPENBLAS_CORETYPE=prescott

and dunnington, penryn, core2, nehalem, sandybridge

Every one dies.

Does KVM allow setting the cpu type? Has it been recently updated?

Just to confirm that setting the core works correctly, add:

export OPENBLAS_VERBOSE=2
export OPENBLAS_CORETYPE=prescott
python -c 'import numpy as np; A = np.array([[1.0], [3.0]]); B = np.array([[2.0, 3.0]]); print(np.dot(A, B))'

I get:

Core: Prescott
[[2. 3.]
 [6. 9.]]

Support said that nothing has changed with procesors last time.

I've also pulled and started docker image jupyter/tensorflow-notebook on this machine and np.dot() example runned in Jupyter fails same way. So, I think this is hardware problem?

@prokulski if you use the code from the comment above, do you get the output Core: Prescott before the illegal instruction?

Since this seems to be specific to your hardware, I am removing it from the 1.18 milestone.

I have "Core: Presscot" and then matrix 2x2 and NO illegal instruction.

Without "export OPENBLAS_CORETYPE=prescott" line there is Illegal instruction and no matrix.

It seems somehow the routine to detect your CPU inside OpenBLAS is confused by your KVM. Could you report it on the OpenBLAS issue tracker so they can work out what is going on?

You can probably help OpenBLAS by cycling through all the core types listed above, and finding which ones do crash.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dmvianna picture dmvianna  ·  4Comments

Levstyle picture Levstyle  ·  3Comments

toddrjen picture toddrjen  ·  4Comments

ghost picture ghost  ·  4Comments

manuels picture manuels  ·  3Comments