Numpy: Windows wheel package (.whl) on Pypi

Created on 22 Jan 2015  ·  267 Comments  ·  Source: numpy/numpy

Please make Windows wheel packages and put them on Pypi.

Currently it is possible to download Windows wheel packages for numpy here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

It would be great if the wheels were directly available on the Pypi server https://pypi.python.org/pypi/ so that they can be installed with pip.


Most helpful comment

oh yeah, we possibly need to figure out how to update our release procedures here... IIUC the user experience right now is that as soon as the 1.11 source release was uploaded, all the windows machines out there suddenly switched from downloading wheels (yay) to trying to download and build the source (boo). I guess the "right" way to do this is that once the final release is tagged, we build and upload all the binary wheels _before_ uploading the sdist. As annoying as that is...

All 267 comments

Well said - and indeed there is a lot of work by @carlkl going on behind the scenes to make this happen. I believe we are nearly there now - @carlkl - when will you go public, do you think?

For context: the reason this isn't trivial is that the binaries you linked to depend on Intel's proprietary runtime and math library, which complicates redistributing them.

I deployed the recent OpenBLAS based numpy and scipy wheels on binstar. You can install them with:

pip install -i https://pypi.binstar.org/carlkl/simple numpy
pip install -i https://pypi.binstar.org/carlkl/simple scipy

This works for python-2.7 and for python-3.4. The wheels are marked as 'experimental'. Feedback is welcome.

If you want widespread testing then you should send this to the list :-)


FWIW, I personally would like to change the size of the default integer on win64 before we actually provide official binaries, though there was some resistance to it when I last proposed it. Also, with anaconda and other third-party binaries, it's probably already too late :(

Also, speaking of OpenBLAS: does someone fancy some debugging? I'm tired of it (looks like the same failure that breaks scipy with OpenBLAS):

test_einsum_sums_float64 (test_einsum.TestEinSum) ... ==31931== Invalid read of size 16
==31931==    at 0x7B28EB9: ddot_k_NEHALEM (in /usr/lib/libopenblasp-r0.2.10.so)
==31931==    by 0x6DBDA90: DOUBLE_dot (arraytypes.c.src:3127)
==31931==    by 0x6E93DEC: cblas_matrixproduct (cblasfuncs.c:528)
==31931==    by 0x6E6B7B3: PyArray_MatrixProduct2 (multiarraymodule.c:994)
==31931==    by 0x6E6E29B: array_matrixproduct (multiarraymodule.c:2276)

The OpenBLAS version used is 0.2.12. I haven't experienced significant problems with this version yet.

The scipy failures are copied to https://gist.github.com/carlkl/b05dc6055fd42eba8cc7.

The 32-bit-only numpy failures are due to http://sourceforge.net/p/mingw-w64/bugs/367:

======================================================================
FAIL: test_nan_outputs2 (test_umath.TestHypotSpecialValues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\core\tests\test_umath.py", line 411, in test_nan_outputs2
    assert_hypot_isinf(np.nan, np.inf)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\core\tests\test_umath.py", line 402, in assert_hypot_isinf
    "hypot(%s, %s) is %s, not inf" % (x, y, ncu.hypot(x, y)))
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 53, in assert_
    raise AssertionError(smsg)
AssertionError: hypot(nan, inf) is nan, not inf

======================================================================
FAIL: test_umath_complex.TestCabs.test_cabs_inf_nan(<ufunc 'absolute'>, inf, nan, inf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\core\tests\test_umath_complex.py", line 523, in check_real_value
    assert_equal(f(z1), x)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 275, in assert_equal
    return assert_array_equal(actual, desired, err_msg, verbose)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 739, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 628, in assert_array_compare
    chk_same_position(x_isnan, y_isnan, hasval='nan')
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 608, in chk_same_position
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

x and y nan location mismatch:
 x: array([ nan])
 y: array(inf)

======================================================================
FAIL: test_umath_complex.TestCabs.test_cabs_inf_nan(<ufunc 'absolute'>, -inf, nan, inf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\core\tests\test_umath_complex.py", line 523, in check_real_value
    assert_equal(f(z1), x)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 275, in assert_equal
    return assert_array_equal(actual, desired, err_msg, verbose)
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 739, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 628, in assert_array_compare
    chk_same_position(x_isnan, y_isnan, hasval='nan')
  File "D:\tools\wp_279\python-2.7.9rc1\lib\site-packages\numpy\testing\utils.py", line 608, in chk_same_position
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

x and y nan location mismatch:
 x: array([ nan])
 y: array(inf)

I don't disagree about changing the win64 integer size, but I think that's a separate issue that should be decoupled from the wheels. If this were the first time win64 numpy builds were becoming widely available, then it'd make sense to link them, but at this point there have already been tons of users for years; they're just using cgohlke's builds or anaconda or whatever. So let's treat that as an independent discussion?

(Strictly speaking it's a backcompat break, but even so it seems reasonable that we might be able to pull it off, since it actually reduces incompatibility between platforms -- all portable code has to handle 64-bit dtype=int already.)


I'm also interested in this. Is there some way of assisting with the process?

OpenBLAS can be compiled with INTERFACE64=1 and numpy can be compiled with -fdefault-integer-8 for a first try.

Just a heads up. Using 64 bit integers in blas is a terrible idea. Stop before you get too far down that road. Matlab, and Julia before I went and fixed it, did this, and it breaks any third-party library that assumes conventional 32-bit integers in blas.

What we've been doing in Julia for the past ~5 months is actually renaming all the symbols in openblas to add a _64 suffix to them for the 64-bit-int version, that way you can do linear algebra on really huge arrays if you want, but loading external libraries into the same process won't segfault by name shadowing and trying to call dgemm with the wrong ABI.

Hey guys, is there any update on the wheel files being made available for NumPy?

Not that I'm aware of right now.

@guyverthree Christoph Gohlke has been releasing NumPy wheels built with Intel's MKL for a while now.

Also, see my blog post on NumPy wheels. I made some NumPy wheels in my Dropbox using Carl Kleffner's modified mingw-w64 toolchain and Zhang Xianyi's OpenBLAS port of GotoBLAS. Olivier Grisel was looking for help modifying the NumPy buildbot to repeat the same steps used in the OpenBLAS google groups thread I post to.

My latest version is available on binstar.org, though I'm not sure if anaconda.org is the new preferred name now.
The wheels for py-2.6 .. 3.4 (32/64bit) are about 2 months old:

  • numpy-1.9.2
  • scipy-0.15.1
  • scikit-image-0.11.2

built with my https://bitbucket.org/carlkl/mingw-w64-for-python and a more or less recent OpenBLAS.
pip install:

  • pip install -i https://pypi.binstar.org/carlkl/simple numpy
  • pip install -i https://pypi.binstar.org/carlkl/simple scipy

+1 @carlkl and I wish these could be added to NumPy build at the Cheese Factory as well.

+1 I would love to see this happen too.

IMHO: there are at least three problems to be solved before these builds are going to be accepted:

  • the mingwpy patches for the numpy repository have to be recreated
  • there is no build mechanism aside from manual building yet
  • many third-party Windows packages (deployed by C. Gohlke) explicitly depend on numpy-MKL, because the binaries are hard linked against MKL DLLs. This may change in future, as scipy now provides a mechanism for an implicit dependency on the scipy BLAS/Lapack implementation. So installing (numpy-MKL & scipy-MKL) OR (numpy-OpenBLAS & scipy-OpenBLAS) should be sufficient for all other packages in future.

@carlkl: FWIW, I'm not really worried about @cgohlke's packages -- that will sort itself out (just like there aren't major problems now due to people trying to combine scipy-MKL with anaconda numpy). And I'm not even really worried about there being some fancy build mechanism -- a manual build is fine so long as there's a text file documenting the steps.

The main issue that I'm worried about is sustainability: if we can't get this stuff upstream, then we'll have to re-validate and re-do patches every time a new version of gcc / mingw-w64 / msvc comes out, and it probably won't happen. We don't want to get caught in the trap where we start providing builds, but then this becomes more and more onerous over time as we have to deal with a cranky old compiler to do it.

Which is why I've been trying to round up funding to support doing this upstream... +1's are great and all, but if anyone wants to donate some money or knows a company who might be interested in making gcc generally usable for python extensions on windows, then send me an email :-) ([email protected])

If you don't have $$ but still want to help, then one way to do that would be to send patches to mingw-w64 improving their support for transcendental functions like sin and cos. (It turns out that the MSVC ABI disagrees with everyone else about how the x87 FPU unit should be configured, so most of the free software mathematical functions don't work quite right.) Fortunately there are good, license-compatible implementations in Android's "bionic" libc, so this doesn't require any mathematical wizardry or deep insight into ABI issues -- it's just a mostly mechanical matter of finding and extracting the relevant source files and then dropping them into the mingw-w64 tree at the right place. We can provide more details on this too if anyone's interested.

Isn't this the kind of thing that numfocus should be funding? If not, then perhaps we can go back and revisit applying to the PSF.

How much money are we talking?

+1 please publish wheels for Windows to PyPI https://pypi.python.org/pypi/numpy

If you try pip install numpy on an out-the-box Python Windows installation, you get the infamously unhelpful error message "Unable to find vcvarsall.bat".

+1 would really help Windows users.

Can't play with https://github.com/glumpy/glumpy because of this. What are the manual build steps to get Numpy working on Windows? Looks like an AppVeyor job is there, so it should be no problem to upload artifacts to GitHub.

Right now it is literally impossible to build a fast, BSD-licensed version of numpy on windows. We're working on fixing that, but it's a technical limitation; +1's aren't going to have any effect either way. (The appveyor job does build on windows, but it uses a fallback unoptimized linear algebra library that isn't really suitable for real work.) Until we get this sorted, then I'd recommend downloading wheels from Christoph Gohlke's website, or using Anaconda or another scientific python distribution.

@njsmith can you be more specific? Preferably with exact commands that don't work. Right now this stuff is not actionable.

I think 'impossible' is too strong - but there's certainly not yet an obvious general way forward. I put up a wiki page on the current status here: https://github.com/numpy/numpy/wiki/Whats-with-Windows-builds . Please feel free to edit / amend all ye who care to.

@techtonik: there are no "exact commands that don't work", the problem is that there are no compilers that have the combination of features we need. mingwpy.github.io documents the current status of our efforts to create such a compiler.

@matthew-brett nice. We can't use MSVC++ on its own to compile scipy because we need a Fortran compiler. That is for scipy, right? Why is it needed for numpy?

@njsmith http://mingwpy.github.io/issues.html is an awesome initiative with a good analysis. Too bad that upstream (Python) will never support it (promotes using MSVS blindly). But I am trying to get a clear picture from the current status.

  1. is it a problem of "having an open toolchain for open work", or can MSVS really not compile the C part of numpy?
  2. are there still crashes with mingw-compiled extensions?

To narrow the focus for now, let's say it is only Python 2.7 + Win32. No performance is necessary (I just want to run an application to test it there), but benchmark data about that performance is needed.

So, what is the next action that should be done for this configuration to make Windows wheel available from PyPI?

@techtonik, there are now preliminary versions of numpy and scipy wheels available on https://anaconda.org/carlkl/numpy and https://anaconda.org/carlkl/scipy. Performance is almost as good as Gohlke's MKL wheels. I didn't encounter segfaults on my Windows box at home.

Several issues with this approach have been discussed and are going to be summarized at http://mingwpy.github.io (under construction). The combination of the mingw-w64 based toolchain called _mingwpy_ and OpenBLAS is the way to go for the Windows platform.

_mingwpy_ has a special configuration that ensures better compatibility and more convenient usage compared to the best-known mingw-w64 based toolchains, i.e. _mingw-builds_, _tdm_ ...

All this and more is explained at https://github.com/mingwpy/mingwpy.github.io. Feel free to open issues or PRs there.

@techtonik: I think that's a serious misunderstanding/misrepresentation of upstream python.org's position. I would say that they refuse to promote the schisming of Windows CPython support into multiple incompatible ABIs (and I agree with them on this). Steve Dower, who maintains the official upstream windows builds, has been helping us figure out how to make mingwpy compatible with these builds.

IMO the prerequisite for putting numpy wheels on pypi is that they should be (a) performant, (b) maintainable, (c) appropriately licensed. If you want the project to apply a different set of criteria (i.e. that we should put effort into providing wheels with terrible performance) then the next step is to send an email to the numpy mailing list making a case that your criteria are better.

MSVS can build numpy itself, but it can't build any of the appropriately-licensed high-quality BLAS implementations. Upstream mingw-w64 can build numpy + BLAS (with patches), but the result will crash if you try to use it with upstream CPython. Carl's mingwpy toolchain can build numpy + BLAS (with patches), and the result will work on some versions of python (but not 3.5), but the toolchain is fragile and unmaintainable in its current state; literally no-one except Carl knows how it was built or could recreate it. No-one in the numpy project is ready to commit ourselves to providing "official builds" using a toolchain with these limitations, so we're focusing on fixing those.

There are multiple trivially-available sources of high-quality numpy builds on Windows. I am genuinely curious: why are you so insistent that we should throw some low-quality builds up just so that they'll be on PyPI?

@njsmith Just wanted to state that my use case (which I admit in no way would justify an investment of developer resources on its own) is to distribute a very simple package on PyPI that depends on matplotlib, which in turn depends on numpy.

For my use case performance is not a concern, but being able to have a Windows user simply pip install ____ my package, which recursively installs matplotlib, numpy, etc., is a lot easier to explain than pointing them to URLs to install as well, especially for users who don't understand the Python build ecosystem. So it's mostly a simplification of the install instructions.

Again, not trying to use my case as justification, but just wanted to share as you were curious.

@johnthagen: Oh, sure, no worries! I totally get why this is desirable in general; if I come across as grumpy in these comments it's exactly because I and others have been spending huge amounts of time over the last year trying to get this fixed :-). I was just asking @techtonik specifically because it sorta sounded like they were saying "I just want to try one little application, so I don't care about performance", but if they just want to try one little application, I don't know why they care about the PyPI part :-)

(It's important to keep in mind that any wheel we put up on pypi will immediately start being used by tens of thousands of people, most of whom aren't reading this thread. So I think there's some obligation on us to make sure that whatever we put up will in fact be broadly usable for a wide range of use cases.)

I think it would be essentially trivial to start shipping 32-bit numpy wheels for Python 2.7, using ATLAS. They would likely have to be SSE2, so crash without SSE instructions, but that would only affect a very small proportion of users. We could use our current release toolchain for that. Bear in mind that this would mean that pip would give a binary wheel for 32-bit, but fall back to source install for 64-bit. Would that be useful?

@njsmith Thanks for the info! Appreciate all of your hard work :)

I think it would be essentially trivial to start shipping 32-bit numpy wheels for Python 2.7, using ATLAS. They would likely have to be SSE2, so crash without SSE instructions, but that would only affect a very small proportion of users. We could use our current release toolchain for that. Bear in mind that this would mean that pip would give a binary wheel for 32-bit, but fall back to source install for 64-bit. Would that be useful?

@matthew-brett the current numpy-vendor setup is broken, there's a segfault in fromfile. File handle handling is somehow messed up, and we're not sure if that's due to a change in Wine version, Ubuntu version or (unlikely) a change in numpy itself. I'd say that spending more time on that is a waste of time - putting that time into mingwpy is way more productive.

I have NumPy 1.10.4 compiled with both OpenBLAS (Int32 Windows 64, v0.2.15 precompiled binary) and MKL (using a community license on MKL, i.e. free distribution). But... I can't compile SciPy - it seems a small portion looks for the gfortran compiler ("Fortran compiler not found"), if anyone has an idea how to fix this issue. I'm using ifort.exe since Anaconda supports these builds as direct plug-ins. Compiled for Python 3.5 with Microsoft Visual Studio Community 2015 - if anyone can help me figure out how to package this for distribution, then I will upload it to GitHub or anaconda's website. Appreciate it.

@mrslezak: probably the best thing to do is to post on the scipy developers mailing list, or to open a new bug on scipy, rather than post on random existing bugs :-)

I am genuinely curious: why are you so insistent that we should throw some low-quality builds up just so that they'll be on PyPI?

Just because I am tired of shaving yaks. I know that people want performance, and it is good that somebody has resources to go for it, but for me personally the complexity of getting this task done is enormous, so I can only hope that you manage to do this, but for me that may never happen, or it may happen in two or three years, during which people continue to hit walls and waste time in hours proportional to the downloads of all windows binaries from PyPI that require installing NumPy as a direct or indirect dependency.

Wheew. Probably the longest English sentence that I wrote in my whole life. =)

@techtonik - I share your frustration, I think many of us feel frustrated about this.

@carlkl - I would love your feedback here.

There's clearly strong pressure for us to put up a numpy windows wheel. Here is a list of the most-downloaded wheels for any platform from a couple of weeks ago : https://gist.github.com/dstufft/1dda9a9f87ee7121e0ee . matplotlib, scikit-learn and pandas windows wheels come at positions 3, 4, and 5. There would be a large market for numpy windows wheels.

I think the questions on the table are:

1) Can we commit ourselves to getting a working and near-optimal numpy wheel up on pypi in the short to medium term (say, 6 months). I'd say the answer to this is yes (happy to hear disagreement);
2) Is it worth putting up a not-optimal numpy wheel in the meantime for others to build against?

Question 2 is the harder one. "Not optimal" could mean slow (no optimized blas / lapack) or difficult to support (no guarantee we could repeat the build in 6 months time).

I can see arguments against "slow". We need to be careful that, when wheels do start to work for Windows, they do not immediately trigger stackoverflow questions with answers "On no account download the numpy wheels from pypi". I think those answers would be reasonable and they would last long enough to hurt us.

Not-optimal meaning, difficult to support the build process, I think we can live with, if we are really committed to finding a long-term solution fairly soon.

A while ago I built ATLAS binaries for Windows : http://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/

Am I right in thinking that we can already build numpy binaries that pass all tests using these ATLAS binaries?

In which case, why don't we put those up?

1) Can we commit ourselves to getting a working and near-optimal numpy wheel up on pypi in the short to medium term (say, 6 months). I'd say the answer to this is yes (happy to hear disagreement);

I'd hope so, otherwise it means that by then we'll either have run into unexpected trouble with the mingwpy proposal or didn't cash in on what it enables :)

2) Is it worth putting up a not-optimal numpy wheel in the meantime for others to build against?

Your ATLAS builds seem to be done with Cygwin? Or is that just directory naming and you used some version of MingwPy?

I think my ATLAS builds were done with Cygwin, but they don't link to the Cygwin.dll, so I think they would be safe for building with MSVC.

mingwpy is not in trouble but needs its time. Building the gcc toolchain, OpenBLAS and then numpy/scipy with different variants takes build and testing time. And I won't publish binaries without publishing all build scripts first. A mingwpy based on gcc-5.3.0 is almost ready, as well as OpenBLAS. The next step is to build numpy and scipy wheels based on that.

This discussion, as well as the latest contributions to the numpy thread "Multi-distribution Linux wheels - please test", leads to the question whether OpenBLAS has the quality that allows deployment of windows numpy wheels based on OpenBLAS. But I'm not sure that using ATLAS instead is a better solution. Maybe numpy wheels should be built with both variants for a testing phase first.

I'm guessing / hoping that we'll somehow get to the stage where OpenBLAS is of acceptable quality. But, until that time, it seems reasonable to me to start off with ATLAS numpy wheels, expecting that in due course we'll be able to switch to OpenBLAS wheels. We might have to put in the SSE2 check for the 32-bit builds though : http://mingwpy.github.io/blas_lapack.html#atlas

Placing a progress box on the top PyPI page may bring more people to the issue (including those who may donate to support the initiative). The box may list the current strategy, acceptance criteria (link to a performance test?), status and the action that will be taken when the final version is ready (increase the major version?).

@matthew-brett it's still not clear to me that your proposal for throwing something up is viable. What compiler would you use? If MingwPy, we have a clear plan of what to do in what order, and now seems too early. If another gcc, we're back to the static linking issue and distributing DLL pain.

My idea was to compile numpy with ATLAS using MSVC. Of course that couldn't work for scipy, but at least people would be able to get started with shipping their windows wheels, however built.

I just tried that and got some errors of the form unresolved external symbol __gfortran_compare_string, so I suppose the ATLAS binaries have some dangling references to the gfortran run-time. @carlkl - any suggestions for how to debug?

Mixing static object files coming from different compilers is something you should avoid like the devil avoids holy water. In some cases it works, but for a different set of combinations of compilers it will fail.
BTW: MS itself does not officially support or recommend mixing static objects from different versions of their Visual Studio.

I made some tests some weeks ago, as this question popped up: can the static library npymath.a created by mingwpy be used with MSVC compilers? In principle it may work if some selected objects from the gcc runtime libraries are added to this library. I came to the conclusion that such an approach is unstable and fragile.

If ATLAS is an option for building numpy wheels, I would try to build it as a DLL - any objections?

Mixing static object files coming from different compilers is something you should avoid like the devil avoids holy water.

I feel that https://mingwpy.github.io/motivation.html (the "Why" page) lacks a very simple and straightforward explanation of the problem for dynamically loaded modules. I spoke with the Far Manager guys, whose file manager is native to Windows and built on plugins, which are loaded from .dlls written in different languages, and they don't have this problem with "exactly the same compiler". I wonder why Python has it - it also loads modules from .dlls.

@techtonik, my comment was about linking object files produced by different compilers into a single binary file (DLL or EXE). That is what I meant with _mixing static object files_. Such an approach _can_ work in some well tested situations if handled with care. But it is far from being a robust way to build binaries.

The interoperability of DLLs from different compilers in a common process space is a completely different thing. Usually such an approach works fine as a general rule. One has to ensure that these binaries are linked to the very same MS runtime DLL if they e.g. share file descriptors. There are other possible ABI issues as well that have to be handled. And of course you need a different set of debuggers for debugging depending on the compilers being used.

mingwpy is a project to support building python extensions with the help of mingw-w64 for use inside the standard MSVC CPython builds.

OK - I managed to build numpy with MSVC linking against a build of ATLAS.

ATLAS build here:

http://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/atlas-3.10.1-sse2-32.tgz

There are some bare-bones instructions in there on how to build the ATLAS dll.

All numpy tests pass apart from the f2py script check, I think that's a benign failure.

The last step is shipping the dynamic library inside the wheel. @carlkl - what's your current favorite way of doing that?

Good to hear, I'd like to also figure out how to create a wheel with binaries included - can post an MKL build and have others test the OpenBlas one.

The last step is shipping the dynamic library inside the wheel.

And the SSE2 check and graceful bailout?

@mrslezak - the easiest way is to put it into the numpy/core folder, as it is automatically loaded into the process space during import of multiarray.pyd.

The last step is shipping the dynamic library inside the wheel

@matthew-brett: I'm 99% sure that the "right" way to do this is via SxS assemblies, whose documentation is rigorously poor, but probably doable... I know you have spent time trying to understand them, and I've been reading up too, so if you want to sit down at some point and try and work out the details let me know :-).

(The problem with all other approaches is that IIUC windows processes normally maintain a single global namespace of all imported dlls. What this means is that if two extensions both ship a file named foo.dll, then whichever extension is loaded first will have its version of foo.dll "win" and the other extension will end up using it -- the classic "dll hell" problem. And IIUC the only way to avoid this behavior is through the SxS machinery, as ugly as it is.)

Nathaniel - I wrote up my understanding of SxS assemblies here : https://github.com/numpy/numpy/wiki/windows-dll-notes#side-by-side-assemblies

My eventual conclusion was that it was hopeless, and that, in any case, renaming the DLL in some unique-per-process way was a reasonable alternative.

Ralf - suggestion to formalize way of adding SSE2 etc hooks to installation process : https://github.com/numpy/numpy/pull/7231
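
For illustration, here is a minimal sketch of one way such a bailout could work at import time on Windows. This is just a hypothetical guard using the Win32 IsProcessorFeaturePresent call, not the mechanism proposed in that PR:

# Hedged sketch, not numpy's actual mechanism: refuse to import on a CPU
# without SSE2 instead of crashing later with an illegal-instruction fault.
import ctypes
import sys

PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10  # the SSE2 feature flag

def _check_sse2():
    if not sys.platform.startswith("win"):
        return
    kernel32 = ctypes.windll.kernel32
    if not kernel32.IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE):
        raise ImportError("this numpy build requires SSE2, which this CPU lacks")

_check_sse2()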

@matthew-brett: I've read those notes, yeah.... and ugh sigh, hopeless why? Because of the same-directory issues? And do you have any ideas for how to accomplish that renaming? (I haven't yet found any equivalent to patchelf --replace for PE files, and regenerating .lib files is non-trivial -- though I guess if using mingw-w64 it isn't so bad because you can link against the .dll directly. At least if you don't need to rename libgfortran or similar...)

(It's possible that there is some PE equivalent to patchelf --replace somewhere on this list: http://www.woodmannsfortress.com/collaborative/tools/index.php/Category:Import_Editors)

I don't see a problem loading satlas.dll (or alternatively libopenblaspy.dll) alongside multiarray.pyd, as this directory is preferred during DLL searching. This approach works due to the fact that this DLL is loaded via LoadLibraryEx from python into the process space. The folder numpy/core has to be used, as this is the very first occurrence of a blas-dependent python extension during import. Any further attempts to load a DLL with the same name are simply ignored because this DLL is already loaded into the process space. Windows just looks at the name of the DLL, BTW.

The DLL hell starts if such a library depends on _further_ DLLs, but this isn't the case, as both satlas.dll and libopenblaspy.dll are self contained and only depend on standard Windows system DLLs. This is what is usually called statically linked DLLs - that means the gcc runtime code is statically linked.

_For comparison_: To import the MKL libraries the approach is to temporarily extend the PATH to numpy/core. Unfortunately this fails if older MKL libraries are placed in the Windows system folders.
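
(For illustration only, a hedged sketch of what an explicit preload from a package __init__ could look like. The wheels discussed here rely on the implicit search-order behaviour described above; the DLL name and layout below are placeholders, not what any of these builds actually ship.)

# Hypothetical sketch: explicitly preload a BLAS DLL bundled inside the package
# so that extension modules imported afterwards resolve against this copy.
import ctypes
import os

_core_dir = os.path.join(os.path.dirname(__file__), "core")
_blas_path = os.path.join(_core_dir, "libopenblas_py_.dll")  # placeholder name

if os.path.exists(_blas_path):
    # keep a reference so the handle stays alive for the lifetime of the process
    _blas_handle = ctypes.WinDLL(_blas_path)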

@matthew-brett @njsmith: DLL renaming: what is it good for?

@carlkl: the case that we are worried about is if numpy includes atlas.dll, and also scipy includes atlas.dll, and at some point the user upgrades scipy (and gets a newer version of atlas.dll), but then scipy ends up using the old version of atlas.dll that comes from the numpy package. This is bad, because scipy may be depending on having the newer version -- so things will randomly break depending on exactly what builds of which packages are involved, and which order the user imports them. This happens because if numpy includes a DLL named atlas.dll then that will "claim" the name atlas.dll in the process-wide DLL namespace, and it will block any other packages from using different DLLs with that name.

Two possible solutions are (a) if the SxS/activation-contexts stuff can be made to work, it provides a way to disable the process-wide DLL namespace, or (b) if numpy contains numpy-atlas.dll, and scipy contains scipy-atlas.dll, then these can share the same process-wide namespace without colliding.

Or if both depend on a separate clib_atlas package that provides the dll? Then version dependency requirements can be expressed as usual for python packages.

@tkelman: We need to figure out how to support both vendored DLLs and separately-distributed DLLs, I think, since both options are appropriate in different situations. And the vendored case is much easier to start with :-)

I believe the side by side solution will require admin rights to install into windows system32. Please don't do this.

There are also 'private' side by side assemblies, where the assemblies sit in your own binary tree, but there is a limit of two up-directory path markers you can use to point to the assembly, i.e. you can point to ..\..\some_assembly but not ..\..\..\some_assembly.

So for example, scipy/first/second/third/something.pyd can only point to a side by side assembly in directories third or second or first, but not in scipy (or other directories within that).

OK, I built some wheels for testing, here:

http://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/

As usual:

pip install -f https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds numpy

Very crude build automation here: https://github.com/matthew-brett/np-wheel-builder

Wheels pass all tests apart from a spurious failure running the f2py script (a bug in that test I believe).

I also built 64-bit installers for Pythons 2.7, 3.4, 3.5, at the same web address.

@matthew-brett, I don't have permission to access these files.

@matthew-brett, SxS assembly technology isn't used anymore (since VS2010), see https://en.wikipedia.org/wiki/Side-by-side_assembly.

How about adding version numbers to the DLL file names: libopenblaspy_0.15.dll or libatlas_3.10.1.dll or similar? And then use a _proxy DLL_ that acts as a forwarder to the versioned DLLs. Numpy and scipy extensions should be built against a proxy DLL called e.g. _libblaslapack.dll_.

If atlas is used this would in principle allow loading of an optimized atlas DLL at runtime. (not necessary if using openblas)

All this could be handled with the help of a clib_openblas and/or clib_atlas package. (Now I have to learn how to generate the code for a forwarder DLL.) Numpy itself could be equipped with either atlas or openblas. This should be loaded if neither clib_openblas nor clib_atlas is available.

@carlkl: I think the wikipedia page is confusing, and trying to say that VS 2010 doesn't happen to use SxS _for certain libraries_, but SxS in general is certainly used still (e.g. later on that same page: "From Vista onward, the operating system also uses WinSxS for its core components.")

I believe the way you build a forwarder dll with msvc is to write a special .def file and then use it when generating your .dll. But, how does a forwarder dll help? (On osx or Linux I think it can be a useful tool, but on windows you still have that annoying global dll namespace problem.)

@njsmith, we should look for a comprehensible solution. It's true that SxS still exists. It is typically not used anymore for anything else than the operating system itself.

(1) The easiest solution IMHO is to link Blas Lapack statically. This approach creates huge binaries and is therefore not recommended (at least by me).
(2) The second easiest solution is to install the DLL in numpy/core and that's it.
(3) The third solution is to _force_ a dependency on an external Blas/Lapack package, which has to be versioned and simply preloads the Blas Lapack DLL. The use of pip ensures that the correct version of the DLL is available.
(4) If such a constrained dependency is unwelcome, it could be augmented with a DLL supplied by numpy and scipy itself. These DLLs should be loaded _only in situations_ where an external DLL isn't installed. That means an external Blas/Lapack package would be preferred but is not strictly necessary.
The big plus of such a solution is that newer bug-fixed releases of openblas/atlas could be exchanged without reinstalling numpy/scipy.
(5) Using manifests and SxS. @njsmith, can you fill in the details for this case?

Sorry - I fixed the permissions for the wheels - do they work now?

Sorry not to get back to you on the SxS assemblies. My 'hopeless' comment on SxS wasn't very useful, I'll try to unpack it.

The question is whether we should be using "private" SxS assemblies, which are SxS assemblies we host in our own binary tree. SxS assemblies can also be "shared". Shared assemblies go into your Windows system folder, and must be installed by an MS installer package. I think that means shared assemblies can't be installed via a wheel, and in any case would need admin permissions, so I think we can reject shared assemblies as an option.

So - what are the problems for using private SxS assemblies?

The first problem is that we'd be blazing a pretty fresh trail if we did try and do this. I don't know of any other open-source project that is using them. I asked Steve Dower about SxS assemblies. Steve works at MS and is the current Python Windows maintainer. He suggested that I avoid them. It seemed that no-one he was working with was familiar with them. My notes linked above were an attempt to understand the few instances where he knew of someone (apparently) using them successfully. There are very few good resources to explain them.

Related is the observation Carl has already brought up that MS itself appears to be ambivalent about using them. For example, for the MSVC runtime, an obvious application of SxS assemblies, they use unique DLL names instead (MSVCR90.DLL, MSVCR100.DLL, etc).

To use SxS assemblies, I think we'd have to add initialization boilerplate code to every compiled module that needs to load another DLL, in order to make an "activation context". EDIT: Nathaniel reminded me that Windows will automatically trigger a new activation context if it sees evidence of a side-by-side assembly "manifest" associated with the DLL (which can be embedded in the DLL, but also be an external XML file).

So, not hopeless, but tough.

I'm sorry for this very basic question, but, in Windows, if I load library foo.dll containing my_symbol in one extension module, what happens if I load library bar.dll, also containing my_symbol, in another extension module? I'm assuming they are separately accessible in my process, so the first extension would get foo: my_symbol and the second extension will get bar:my_symbol? Can anyone point me to a reference?

If that's right, then surely all we would need, to avoid DLL hell, is to have a DLL name that was very unlikely to be used by accident in the same process (where the user did not intend to use our exact DLL).

During linking, each symbol is bound to a specific DLL identified by its name. At runtime one has to ensure that the correct DLL is loaded if more than one DLL with an identical name can be found; therefore the search order matters.
Example: my anaconda.org numpy wheels use the OpenBLAS library with the name libopenblas_py_.dll to avoid a name clash with a non-standard libopenblas.dll used by Julia.

Recent versions of Julia now use a different name, libopenblas64_, to reflect the non-standard ABI we build with. On 32 bit we don't rename any symbols or the library name, since there is not much of a reason to choose 64-bit ints in the interface there.

The name shadowing of symbols within shared libraries was actually more of a problem on linux and osx than windows, but we did the same thing everywhere for consistency.

Though that doesn't rule out the possibility on 32 bit where the ABI's are the same that we couldn't manage to break each other in other ways, like needing too old or too new of a version for the other.

I polished the build process a bit - see https://github.com/matthew-brett/np-wheel-builder

Now the process is reasonably automated, I believe it is practical to keep building these wheels over the next few releases if we have to. I'm happy to do this as the Windows release manager until mingwpy gets up to spec.

I have tested these wheels on 32-bit and 64-bit Python 2.7, 3.4, 3.5, and a few others have tested too, so I think they are in good shape.

Is there anything else I can do to reassure y'all that these are worth putting up on pypi, as the OP asked?

Hello, all! I just wanted to jump into this discussion because I've been frustrated by my inability to install numpy and scipy from source for quite some time, so it certainly is beneficial for me to read about what's going on on this front.

@matthew-brett : This automation script is awesome. Even if it doesn't quite make it to PyPI, this does seem like a very viable way to build numpy from source (see this issue I opened here). It's also extremely close to being able to build scipy in that I can build everything, but then the tests seem to cause a segfault somewhere in Python.

Also, for anyone who actually has been building numpy wheels, I have been trying to put together a polished, up-to-date documentation about building these libraries from source to replace what's currently online, so I would greatly appreciate people's input on that front too!

Thank you for the feedback - and your work on documenting the build - that would be very useful.

I guess you saw http://mingwpy.github.io - there's a fair amount of stuff there, of course, specific to the mingw-w64 project and the mingwpy toolchain.

Thank you @matthew-brett! It passes numpy.test(). The f2py.py test was an issue in test_scripts() with virtualenvs which was fixed in numpy SHA d3d2f8e, but I do get 3 warnings: 2 deprecation and 1 runtime.

One last, hopefully minor request: is it possible to display a build badge on your np-wheel-builder repo and/or PyPI? Looks like buildbot 0.8 has them, and there's even a python package/repo to make them look nice, BuildbotEightStatusShields-0.1.

Also, I am curious, I have been scared away from the ATLAS Windows 64-bit build because of the lack of tuning parameters. Did it actually "take all day" or is there a proper set of architectural defaults?

FYI: Continuum just released Anaconda with optimized mkl numpy. I think they have been monitoring this thread.

Now for scipy builds with the same ATLAS libs. Does it require gfortran?

Yes. Otherwise, you won't be able to compile any of the .f files in scipy. Good luck with this! As I said earlier, I got _really close_, but if you're able to get through with tests passing, that would be great!

Yes, I'm afraid the ATLAS build took about 8 hours on a machine doing nothing else. The ATLAS build script is in the np-wheel-builder repo.

Regarding the MKL news, that's great if you're a conda user, though I think using a Python distribution with numpy and scipy pre-installed is something that has been encouraged for some time. Talk to me when you can get the MKL libraries themselves for free too. :)

For building with gfortran - I think mingwpy is our best hope.

@matthew-brett : Thanks for taking the time to build ATLAS! I tried running your script before, and I kept running into issues, probably due to machine-specific incompatibilities.

Sorry about the issues. I built the ATLAS binaries in the np-wheel-builder repo on a fresh install of Windows Server 2012 and 64-bit Cygwin, with the exact ATLAS and lapack versions listed. The source archives I used are up at http://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/. If you have another version of ATLAS, it could easily get hairy.

Hmmm...that probably is the case. Again, much appreciated given the effort it took you guys to do so. If you guys are able to find a way to roll out Windows-compatible ATLAS builds that don't require as much time and resources as it does now, that would be great!

@gfyoung

Talk to me when you can get the MKL libraries themselves for free too. :)

See https://software.intel.com/sites/campaigns/nest/ and https://registrationcenter.intel.com/en/forms/?productid=2558&licensetype=2 - or did you mean source?

@tkelman, I just saw it on @carlkl's new mingwpy project site, but the Intel community license (Nest) has no ifort, and without that, how do we build scipy?

@tkelman : Whoops, not sure why I had forgotten about that community licensing. However, @tkelman does bring up a valid point.

@tkelman : You could give it a shot with MinGW, but from what I experienced, that doesn't work unfortunately. It won't even get you past numpy due to compatibility issues.

@mikofski right, doesn't help for scipy given the lack of compilers. Only options today for scipy builds are going to be mingwpy, or the all-gcc-all-the-time MSYS2 build of Python (https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-python-scipy). The latter of course won't be compatible with msvc-built cpython or pypi binaries so it's not going to address all modules beyond scipy.

@matthew-brett: what's the speed deficit for these ATLAS wheels versus openblas and/or MKL?

Has anyone looked into PGI Fortran? It's not mentioned on @carlkl's mingwpy project site. I tried to use it once and went pretty far down that rabbit hole, but I can't remember what the showstopper was. I think the license is permissive even though it's closed source. Maybe PGI Fortran will play nicer with MSVC?

@mikofski: I don't have it in front of me, but when I looked at PGI last year I remember my conclusion was that it was even worse than Intel (in terms of forcing you to add FOSS-incompatible restrictions to your licensing).

Okay, maybe some num focus funds can be targeted at a BLIS/FLAME solution for x86 architectures?

Apparently Nvidia/PGI will be contributing their Fortran front end as open source to LLVM by the end of this year. https://www.llnl.gov/news/nnsa-national-labs-team-nvidia-develop-open-source-fortran-compiler-technology

Okay, maybe some num focus funds can be targeted at a BLIS/FLAME solution for x86 architectures?

Don't think so. BLIS looks like a very unhealthy project (and libflame even more so); little activity in terms of commits, mailing list traffic, etc. Plus they've had significant funding (https://github.com/flame/blis#funding), so it's not like a few thousand dollars are going to magically make those projects mature.

I don't quite see where this discussion is coming from or going: we have a stopgap solution that Matthew has almost completed (using ATLAS), and more importantly we have a long-term solution that is being worked on very actively (MingwPy + OpenBLAS). Furthermore OpenBLAS is much more widely used; use of that project in both the Scipy stack and in Julia should further mature it faster.

@rgommers : The conversation went where it went because @mikofski and I both were attempting to use @matthew-brett solution to build scipy. However, it seems that both of us are running up against the same problem: the Fortran compiler. I myself have attempted to use the installed gfortran.exe for both MinGW32 and MinGW64 without much success due to tons of unresolved externals for some reason or another.

@gfyoung Matthew's build uses MSVC. There's no point trying to use gfortran with MSVC, it's known to not work. The summary of the build situation is:

  • No Fortran, then you can use MSVC now.
  • With Fortran, you can use one of MingwPy, MSVC + ifort, or icc + ifort.
  • For the Scipy stack, we want a free solution that builds wheels for numpy, scipy, etc. For that, MingwPy it is.

@rgommers I'm sorry for derailing the conversation. You are quite right, @matthew-brett solution for numpy works, and the mingwpy project by @carlk is already funded by num focus. I will try to see if I can get my company to support it. I am already a num focus member. About halfway through scipy 2829 and I guess I've come to the same conclusion. I just hope it works. In the short term we'll continue to use @cgohlke or switch to anaconda. Thanks again!

Other than pushing builds to pypi, maybe one last issue for @matthew-brett is a buildbot shield on his np build scripts repo? Thanks! Then this can be closed?

Before this is closed, quick question: I built @matthew-brett's numpy so that it points to ATLAS. However, when I attempt to build scipy using ifort, it also picks up my other site.cfg file that uses MKL, located in my home directory. I am actually able to build successfully against numpy, and tests pass save for a couple of failures due to minute rounding errors. However, I am curious, what did scipy do when I built it? Did it use the MKL libraries or did it attempt to use the ATLAS libraries already built with numpy?

There's a summary of Windows Fortran compilers in https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows

@gfyoung - just going by a combination of guessing and distant memory - I believe that scipy will pick up first the site.cfg in its own directory, and if that is missing, will pick up the configuration of the numpy build. This in turn will point to wherever the libraries were when I built the wheels. So you'd need to rewrite the site.cfg for scipy to pick up the np-wheel-builder atlas libraries - the build_numpy.py script does that for the numpy build.

BLIS looks like a very unhealthy project (and libflame even more so); little activity in terms of commits, mailing list traffic, etc.

I'm not sure I'd call them unhealthy, because they aren't trying to be community run FOSS projects; they're essentially a one-person show, and they like it that way (for now at least). I've been in contact with them on and off for the last ~year, and the good news is that the current focus of their efforts is exactly on the things we need (runtime kernel selection and runtime threading configuration); the bad news is that there's not much to do except wait for the one architect to rearrange things to his liking. Maybe 6 months will see some results?

It sounds like BLIS etc are a fairly distant option at this point, and that we will have to plan for the case where it doesn't work out.

Nathaniel - any suggestions on where to get good benchmarks? I don't think numpy.bench() does anything anymore. I tried running asv, but many tests fail because Windows numpy does not have complex256.

I guess the parts of asv that work are useful? Or even %timeit np.dot(big_array, other_big_array) would be useful to get at least some crude idea where we stand :-)
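
Something along these lines would already give a rough number outside of IPython; this is a crude sketch, with arbitrary array sizes and big_array / other_big_array just the placeholder names from the suggestion above:

# Rough matrix-multiply benchmark, usable without IPython's %timeit.
import timeit
import numpy as np

big_array = np.random.random((1000, 1000))
other_big_array = np.random.random((1000, 1000))

best = min(timeit.repeat(lambda: np.dot(big_array, other_big_array),
                         number=10, repeat=3)) / 10
print("np.dot of two 1000x1000 arrays: %.3f s per call" % best)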

Also BTW, here's a general solution to the Windows DLL global namespace problem, allowing us to write a Windows delocate: https://github.com/njsmith/redll

Unfortunately the asv complex256 failure breaks whole sequences of tests across dtypes. I guess it wouldn't be too hard to fix though.

Simple testing with this:

import numpy
from numpy.random import random

def test_dot():
    """
    Test the dot product
    """
    i = 1000
    a = random((i, i))
    b = numpy.linalg.inv(a)
    # result is only computed for its timing cost
    result = numpy.dot(a, b) - numpy.eye(i)

suggests that, as Clint Whaley has warned before, 64-bit ATLAS is not well-optimized on Windows. With 64-bit MKL via Christoph Gohlke's wheels:

In [9]: %timeit test_dot()
1 loop, best of 3: 764 ms per loop

With my wheels, built with 64-bit ATLAS:

In [10]: %timeit test_dot()
1 loop, best of 3: 2.41 s per loop

The difference is a lot less with the 32-bit wheels (on a different, 32-bit machine). MKL:

In [3]: %timeit test_dot()
1 loop, best of 3: 663 ms per loop

vs ATLAS:

In [4]: %timeit test_dot()
1 loop, best of 3: 1 s per loop

@rcwhaley - Cc'ing you in, in case you had some thoughts here. This is ATLAS 3.10.1 ...

Here's another Windows 64-bit machine with a more modern processor - also shows ~3x slowdown.

MKL:

In [3]: %timeit test_dot()
1 loop, best of 3: 400 ms per loop

ATLAS:

In [3]: %timeit test_dot()
1 loop, best of 3: 1.28 s per loop

Yup, complex 256 problem not hard to fix : https://github.com/numpy/numpy/pull/7251

3x is a lot, but not nearly as dramatic as with lapack_lite right? I think it's OK for a short-term solution. And it's not like the old 32-bit .exe installers were any better.

Also BTW, here's a general solution to the Windows DLL global namespace problem, allowing us to write a Windows delocate: https://github.com/njsmith/redll

nice license statement:)

@gfyoung 'site.cfg' is looked for in:

1) Directory of main setup.py file being run.
2) Home directory of user running the setup.py file as ~/.numpy-site.cfg
3) System wide directory (location of this file...)
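
For anyone following along, here is a hedged sketch of the kind of [atlas] section one might put in a site.cfg next to scipy's setup.py so that the build picks up ATLAS rather than MKL. The paths and library names are placeholders (not the ones np-wheel-builder actually uses), and the key names follow my understanding of numpy.distutils:

# Hypothetical example: write a site.cfg pointing the build at an ATLAS install.
# Directory paths and library names below are placeholders.
site_cfg = """\
[atlas]
library_dirs = C:\\atlas-builds\\lib
include_dirs = C:\\atlas-builds\\include
atlas_libs = numpy-atlas
lapack_libs = numpy-atlas
"""

with open("site.cfg", "w") as fh:
    fh.write(site_cfg)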

@rgommers I'm sorry for derailing the conversation.

No worries, nothing was derailed.

You are quite right, @matthew-brett solution for numpy works, and the mingwpy project by @carlk is already funded by num focus. I will try to see if I can get my company to support it. I am already a num focus member. About halfway through scipy 2829 and I guess I've come to the same conclusion. I just hope it works. In the short term we'll continue to use @cgohlke or switch to anaconda. Thanks again!

Cool. And good to see you're interested in MingwPy. Note that it does have its own ML now, which may be of interest: https://groups.google.com/forum/#!forum/mingwpy

@rgommers , @matthew-brett : Ah, yes, it does seem like it was building with MKL beforehand. I directly pointed my site.cfg to the ATLAS build, and scipy builds but segfaults during the tests. So close!

@rgommers - yes - performance is much worse without ATLAS (with lapack_lite):

In [2]: %timeit test_dot()
1 loop, best of 3: 17.7 s per loop

I guess the remaining question here is whether it is worth standardizing on an OpenBLAS numpy (with all numpy tests passing), accepting the risk that this will be more likely to cause numerical errors in projects using numpy.

One argument for doing this would be that it looks like we will have to go this direction in the short / medium term, and it might be better to start now and commit ourselves to the miserable bug hunts that that will entail. At least we'll be in the good company of the Julia maintainers.

Numpy also has a fairly different set of risk tolerance vs performance tradeoffs, and ratio of users to developers, than Julia does. So I think it might make a lot of sense for numpy to take a more conservative approach and go with slow but reliable as a default, working towards allowing openblas as a non-default opt in choice. Though those 8 hour build times do not sound fun, no wonder no one has been asking us about using Atlas with Julia.

working towards allowing openblas as a non-default opt in choice

The problem is that I'm not really sure how this process could work :-/. We don't have any good way to distribute alternate builds to users (in the long run I'm hoping we can get build variants on pypi as numpy[openblas] and so forth, but that won't happen anytime soon), we don't have any way to improve the openblas builds except distributing them and waiting for bug reports, and the main alternative to ATLAS builds for people who are motivated to seek one out won't be openblas builds, it'll be MKL builds from some third party :-/.

I guess another option to put on the table would be to distribute BLIS builds using their reference/SSE2 kernel. Because BLIS still only has build time configuration this won't be competitive with openblas, but it might be competitive with ATLAS, and the benefits versus ATLAS are that the build time is _much_ quicker, and the chance of it being a good long term solution are hard to estimate but certainly better than ATLAS being a good long term solution (which I would put at zero). If we're going to be QAing something anyway then at least we'd be directing that energy at something that _might_ have a future.

Some questions that would need answering before seriously considering this option:

(1) I'm not sure whether or not BLIS's multithreading support is competitive with ATLAS's (I know there are some multi-threading options in the source, and I know that the main developer doesn't consider it to be "done" yet, I.e. competitive with MKL, but there's a lot of room between ATLAS and MKL.)

(2) for that matter, I also have no idea how BLIS in an untuned mode fares on those benchmarks above.

(3) I've not actually tried to build BLIS on windows, and there's the problem to deal with that it's just a BLAS, not a LAPACK -- not sure how much of an issue this is for numpy.

How responsive is BLIS to bug reports? Openblas seems to be pretty good about this.


I believe libflame is the lapack equivalent in blis. There is a lapack2flame compatibility interface described in the reference docs.

How responsive is BLIS to bug reports?

We don't know yet.

Without having tried BLIS, I think it sounds like madness to go and ship numpy binaries built against what's basically a low-activity one-man project that very few people use.

I haven't seen a good reason in this thread yet to deviate from the MingwPy + OpenBLAS plan. No-scipy-ATLAS-MSVC-binaries are a nice to have stopgap, but less important than the medium/long-term MingwPy solution and if the stopgap turns into a major effort in itself, then I'd say it's not worth the effort.

The BLIS / libflame documents suggest that, if I was going to try and build a full BLAS / LAPACK library on Windows, it would be a lonely path.

I'm happy to do that once the developers agree that that should work and is supported.

ATLAS has been the default library on Linux for a long time. It doesn't seem unreasonable to imagine that might be the case for Windows BSD-compatible builds for a while.

@tkelman - thanks for your analysis - I think you're right, that numpy must concentrate on correctness. But, it would be good to join forces to lean on some of the more exhausting OpenBLAS bugs and develop more comprehensive tests. This OpenBLAS bug comes to mind - somewhat obscure, very hard to debug.

I believe this particular issue is about providing numpy wheels on pypi so that a casual user of package "x" that depends on "y" (eg: matplotlib), which in turn depends on numpy, can install everything using pip, without throwing their arms up, saying something like "Python is too difficult", and going back to MATLAB. The Zen of Python says there should be one obvious way to do it. That said, anything on pypi from numpy in particular carries a certain weight that it _is_ stable, or more so than a random side project, with the possible exception of cgohlke's builds. Obviously Enthought and Anaconda are perceived, at least in industry, as more stable.

I think in the short term the ATLAS build should go up with a caveat message that it is not possible to build scipy with it. If this buildbot can be automated then that's done, right? Future 8hr ATLAS builds should hopefully be rare. Perhaps one day the windows 64-bit issue will be solved. The SSE2 exception issue is a bummer, so another warning message on pypi. Also, ATLAS is already the standard on linux and was the standard in the previous superpack bdist_wininst packages, which lends this path forward even more support.

Then for the near future, you have already decided on mingwpy. Here there are many options which don't have to be solved right now.

In the long term I am excited that blis/flame is the future. It is a bit scary that many of our mathematical tools depend on Fortran code from the 70s. A C-only solution is a major breakthrough and imo something to support enthusiastically.

But more is always better for experienced developers, so keeping the documentation alive for non-standard options is good too, if such experienced developers have the time and inclination to build and test them.

If you don't try to use one of the optimized kernels in blis, then you probably won't hit the issue (edit: singular) I've had open there since 2014. I think the build system only uses symlinks for the optimized kernels, so you won't confuse msys2's git if you were to try to build just the reference configuration there. Building from cygwin worked last I tried, though it was some time ago and I can't recall what I may have needed to modify locally. It's worth building, testing and benchmarking if the alternative is Atlas, but consider it unproven and therefore high risk in its own way until you do.

@mikofski to be fair Lapack is from the 90's, it's really the Fortran elephant in the room.

@tkelman: to be clear, the issues you filed were specifically with the windows-native build system, right? Out of curiosity I just tried cross-compiling blis for windows from linux (using the mingw-w64 cross-compiler installed from debian packages), and I was surprised to find that it only took ~2 minutes. I did "./configure reference; make -j4 CC=x86_64-w64-mingw32-gcc AR=x86_64-w64-mingw32-ar CPICFLAGS=" and everything just worked. (CPICFLAGS= is just to suppress a bunch of warnings about "ignoring -fPIC, because that's the default", and probably I didn't even need to override AR, but hey why not.) Got a few warnings about printfs in bli_pool.c and bli_fprintm.c that use %ld to print intptr integers, so probably there are a few LLP64 kinks to work out.

@rgommers:

Without having tried BLIS, I think it sounds like madness to go and ship numpy binaries built against what's basically a low-activity one-man project that very few people use.

You're absolutely right! The problem is all our options are terrible :-(.

So obviously MKL has a definitely-bad license.

ATLAS has definitely-bad performance that will never improve.

And OpenBLAS, I think we have the evidence to say at this point, is just not maintainable and is not likely to become so soon :-(. The project is five years old, it still has fundamentally broken stuff like Julian's example of random volatiles and the threading code that doesn't use mutexes, there's no apparent interest upstream in fixing this stuff, and it's still the case that as soon as we post binaries to a small group (numpy-discussion) for testing, we immediately get back 2-3 utterly mysterious hard-or-impossible-to-reproduce crashes and incorrect-results bugs. And the BLIS paper makes a reasonably compelling case that this has to do with limitations of the basic GotoBLAS architecture, rather than something that can be easily fixed by just spending a bit more time polishing stuff. (Basically the way GotoBLAS is put together requires lots of new tricky ASM code every time a new microarchitecture is released, so the ratio of time spent adding new fragile code / time spent stabilizing the old fragile code does not work in your favor. Notice that the most reproducible bug in the numpy-discussion list is one that we suspect lies in OpenBLAS's Core2 kernel, which is a microarchitecture that was discontinued 5 years ago.)

So the reason I keep bringing up BLIS is not that I think BLIS is definitely the solution, but as a kind of calculated optimism: BLIS _might_ become as fast as MKL/OpenBLAS, as reliable as ATLAS/MKL, and as open to community-contributions as OpenBLAS; or then again, it might not. But there don't seem to be any other projects that have a real hope of hitting all of those criteria. [And this isn't even mentioning the other advantages, like the fact that it can support strided arrays natively; it's not unimaginable we might be able to delete all our awful special-case BLAS dispatch code.]

IIUC, GotoBLAS was maintained by a single full-time developer (Kazushige Goto) working at UT Austin with Robert van de Geijn as PI. BLIS is maintained by a single full-time developer (Field G. Van Zee) working at UT Austin with Robert van de Geijn as PI. So it's not like this can't work :-) But yeah, it's not going to just magically happen if we wait -- if there's ever going to be a community of developers around it then it's going to be because some community showed up on their front lawn with tents like "hey, here we are, we're moving in and making this work for us, hope you don't mind". And what we really need to know to determine its long-term viability is, like, "how reliable is it really" and "how amenable to patches are they" and stuff, which we can't know unless we start testing it and submitting patches and so forth.

In conclusion: I seriously dunno what our best option is, but sticking our toes in the BLIS water seems like a good idea; even if we decide that we want to wait then we'll at least learn something.

I filed several issues and one or two PRs. The fact that there are symlinks in the repository means building from msys2 is broken (or only works if you set msys2 options in a specific way). Cross building from cygwin or linux (I wouldn't trust wine to run the tests though) should work but had issues in 2014 with aligned malloc, and the sandy bridge kernels segfaulted in a test. I just rebuilt the sandy bridge kernels on latest master of blis with a cygwin cross (on a newer skylake laptop) and the segfault may be gone now. Who knows when or what fixed it, would have to bisect.

I think this has been mentioned before, but we could build ATLAS binaries for SSE2, SSE3, AVX, and put them into a directory structure like:

numpy/.lib/sse2/numpy-atlas.dll
numpy/.lib/sse3/numpy-atlas.dll
numpy/.lib/avx/numpy-atlas.dll

We could then use the numpy/_distributor_init.py to check the current CPU and pre-load the matching library.
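
To make the idea concrete, here is a minimal sketch (not existing numpy code) of what such a pre-load hook could look like; the .lib/<arch>/numpy-atlas.dll layout is the one proposed above, and _cpu_supports is a placeholder for whatever CPU-feature detection ends up being used (cpuid via a small C extension, or a bundled cpuinfo module):

import ctypes
import os

def _cpu_supports(feature):
    # Placeholder: a real implementation would query cpuid; the standard
    # library does not expose SSE/AVX feature flags directly.
    return False

def _preload_blas():
    # Pick the most capable ATLAS build this CPU can run, falling back to SSE2.
    if _cpu_supports('avx'):
        arch = 'avx'
    elif _cpu_supports('sse3'):
        arch = 'sse3'
    else:
        arch = 'sse2'
    dll = os.path.join(os.path.dirname(__file__), '.lib', arch, 'numpy-atlas.dll')
    if os.path.exists(dll):
        # Loading the DLL here keeps it in the process, so the extension
        # modules resolve their BLAS/LAPACK symbols against it.
        ctypes.WinDLL(dll)

_preload_blas()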

I suggested doing basically the same thing, but for blis instead of atlas, to @njsmith. It's also worth comparing how well the threading in blis works vs atlas. The blis reference configuration does not enable threading by default, though tweaking a define in a header file should be all that's needed to switch that.

I set up Appveyor to build the binaries. The current iteration of the build is churning away here : https://ci.appveyor.com/project/matthew-brett/np-wheel-builder/build/1.0.10

Built wheels arrive here: https://84c1a9a06db6836f5a98-38dee5dca2544308e91131f21428d924.ssl.cf2.rackcdn.com

Any further kinks in the Appveyor build should be easy to iron out, so I think these wheels are ready to be uploaded to pypi when that's done, presumably tomorrow sometime.

@rgommers , @matthew-brett : Regarding site.cfg, it seems that your response applies only to numpy. scipy does not seem to search for site.cfg in the same directory as setup.py; it starts searching for site.cfg in your home directory before falling back to the numpy config.

OK - build script running without error, including tests of installed wheel: https://ci.appveyor.com/project/matthew-brett/np-wheel-builder/build/1.0.10

Wheels here: http://58688808cd85529d4031-38dee5dca2544308e91131f21428d924.r12.cf2.rackcdn.com/

I've installed and tested them on another 64 bit machine and another 32-bit machine.

So, I think these are ready to go. Any objection to me uploading these to pypi?

It might be a good idea to have a note on pypi explaining (or linking to an explanation of) the difference between these wheels and the ones by gohlke (MKL), to preempt confusion from people wondering why the wheels appear on pypi now and what the difference between them is.

A side question, sorry, but I was wondering what

  # Pin wheel to 0.26 to avoid Windows ABI tag for built wheel
  - pip install wheel==0.26

in the appveyor script means?

Good suggestion about the explanation - I will try and work out how to add that for this existing release.

Wheel > 0.26 adds an extra ABI tag to the Windows wheel. Wheel==0.26 gives a wheel name like this:

numpy-1.10.4-cp27-none-win32.whl

With Wheel > 0.26, you get an extra ABI tag, like this:

numpy-1.10.4-cp27-cp27m-win32.whl

(I think) - which specifies the Windows ABI. This is annoying because earlier pip won't install these guys, so it seems to me the no-ABI name is better for now.

OK - I propose to add this text to the current pypi page:

All numpy wheels distributed from pypi are BSD licensed.

Windows wheels are linked against the ATLAS BLAS / LAPACK library, restricted to SSE2 instructions, so may not give optimal linear algebra performance for your machine. See http://docs.scipy.org/doc/numpy/user/install.html for alternatives.

I'd say differently:

These Windows wheels have suboptimal linear algebra performance (link to benchmark like http://speed.python.org), because they are linked against the ATLAS BLAS / LAPACK library, which is restricted to SSE2 instructions (and which non-restricted instruction set should be there?). If you need performance, you may support the mingwpy project, which aims to bring more performance to Python extensions compiled on this platform. See ??? for details and http://docs.scipy.org/doc/numpy/user/install.html for alternatives.

Well - mingwpy's current numpy / scipy versions do use openblas, but I think that's unrelated to mingwpy vs MSVC as a compiler. We could also ship openblas with these wheels, but I was worried that openblas was not yet reliable enough to use in a standard wheel that we support.

OpenBLAS seems stable enough, I know Anaconda uses it for their Linux builds now. There aren't any updated Windows Python 3.5 x64 builds out there, benchmarks show it is about equal to MKL. I'd definitely try it if someone could put a wheel together.

Ok. I am just confused about the source of suboptimal performance - I don't use those BLAS libraries and don't know what they do and what's the difference, so explaining these options for mortals helps to become.. erm, more scientific, you know. =) I thought that absence of open compiler with optimal performance is the problem.

@mrslezak : regarding OpenBLAS, I can certainly agree. The provided OpenBLAS package on Cygwin, coupled with the Lapack package, appears to be capable of building both NumPy and SciPy without problems.

@mrslezak : where can I find information about the benchmarks? I'm trying to write documentation on building source from Windows for scipy.org, and that would be a great reference for anyone who needs performance with these libraries.

Maybe the shotgun approach is the right idea? Something like:

  • Stable: ATLAS with performance, sse2 caveats
  • Dev: OpenBLAS see mingwpy and binstar
  • Alt: MKL @cgohlke, MKL @continuum and @enthought
    Caveat: binaries are not compatible.
    Links for more info at scipy and Matthew Brett's github numpy wiki

@techtonik I would expect GCC to perform somewhat worse than MSVC or ICC on equivalent code that all of those compilers are capable of building. The issue is the lack of a free (python.org-cpython-compatible) compiler that can build a competitive version of Lapack, which is in Fortran (SciPy also has other Fortran components). The pure BLAS part of OpenBLAS (and probably Atlas too) can actually be built with MSVC, but MSVC can't build any of the pieces that require inline assembly so it won't be competitive either.

I don't have a 64 bit MKL handy (I may have a 32 bit one from conda somewhere if I go digging), but here are some benchmarks run in Julia comparing the Atlas dll that @matthew-brett built against reference and sandy-bridge configurations of BLIS, and the OpenBLAS build that comes with Julia https://gist.github.com/54da587b01b7fb163103

Summary: openblas (on a skylake, newest openblas kernel is haswell) is 23x faster than atlas, 44x faster than reference blis, and 5.5x faster than sandybridge blis. I might try haswell blis to see how much closer it is.

Hum - I don't suppose you happen to have build scripts lying around for your BLIS compilations?

Do you think it would be worth making BLIS builds for a range of processors and selecting one at run-time? Is there a small subset of processors that would capture most of the performance for most processors?

It's in the comments, but here (run in cygwin 64)

cd build
for i in reference dunnington sandybridge haswell bulldozer piledriver carrizo; do
  mkdir -p ../build64$i
  cd ../build64$i
  ../configure $i
  cp /usr/x86_64-w64-mingw32/sys-root/mingw/bin/lib* .
  make -j8 all test CC=x86_64-w64-mingw32-gcc CPICFLAGS="" BLIS_ENABLE_DYNAMIC_BUILD=yes
done

Here's what they've got available: https://github.com/flame/blis/tree/master/config

In terms of Intel x86, the reference, dunnington, sandybridge and haswell would cover a pretty good range. Also bulldozer, piledriver, and carrizo for AMD (which has recently stopped developing ACML in favor of BLIS, so that's a vote in favor at least).

There's some auto-detection code in https://github.com/flame/blis/tree/master/build/auto-detect that might be reusable (it currently only runs at configure time in BLIS, but that doesn't mean it couldn't be reused for other purposes), depending on whether there is already a piece of cpu family identification code in Python lying around that you want to use.

depending on whether there is already a piece of cpu family identification code in Python lying around

Does this help? http://stackoverflow.com/a/35154827/239247

You mostly want the processor family which is derived from that, but https://github.com/flame/blis/blob/master/build/auto-detect/cpuid_x86.c isn't exactly long or complicated. The numexpr source that is linked from SO there is doing regex matching on the string output (at least on linux), and doesn't look like it has many recent architectures listed.

openblas is 3.4x faster than Haswell blis, and 17x faster than dunnington (basically the same as nehalem penryn I think) blis. What's interesting is I don't think the multithreading is working in blis on these runs. The default setup enables openmp for sandybridge and haswell, maybe the mingw pthreads would work better. Setting OMP_NUM_THREADS didn't seem to make much difference.

I believe that ATLAS 3.11 should do a lot better on 64 bit than the 3.10 version, but I can't build it at the moment, hoping for some help from Clint Whaley.

Tony - I don't suppose you have time / energy to test the 32-bit ATLAS wheel? It should do a lot better, relatively.

My own preference is to go ahead with these ATLAS wheels, so other packagers can depend on us shipping some sort of wheel. If we work out a good way of improving performance, we have a new numpy release coming up soon, and even for 1.10.4 we can always do a maintenance release to update the wheels.

@matthew-brett : quick question, why might numpy not be able to detect the ATLAS builds on Cygwin? I was able to detect them perfectly fine in a native Windows environment, but when I tried running your script in Cygwin, numpy didn't compile with ATLAS.

If you're using Cygwin's python, then you'd likely need a cygwin-built version of atlas for things to be compatible.

32 bit Julia seems to be failing to dlopen the 32 bit atlas dll. Not sure why, maybe because we already have a 32 bit openblas and the symbol names are conflicting?

But @matthew-brett version is built with Cygwin, and that's why I'm confused.

Cygwin build environment, cross-compiled to a mingw library. See how it's linked to msvcrt.dll rather than cygwin1.dll?

atlas-depwalker

As soon as I posted the comment, that's what I suddenly suspected might be the case. Alas, it looks like I'll have to build it from scratch. Thanks @tkelman !

dlopen issue figured out (ref https://github.com/matthew-brett/np-wheel-builder/pull/1, and https://github.com/JuliaLang/julia/issues/15117 was hiding the useful version of the error message).

On 32 bit, atlas is 3.6 times slower than openblas. 32 bit openblas is 3x slower than 64 bit openblas for the same size problem. The newest few kernel families are not enabled in openblas on 32 bit systems.

...
In conclusion: I seriously dunno what our best option is, but sticking our toes in the BLIS water seems like a good idea; even if we decide that we want to wait then we'll at least learn something.

That's probably useful, at least some testing/benchmarking. But at this point it's pretty much unrelated to our _Windows_ issues. BLIS is Linux-only at the moment; there's an open PR for OSX build support, and Windows is very far off. And worse, I tried it yesterday on 32-bit Linux and even that doesn't work. ./configure auto && make crashes horribly on some assembler code (for sandybridge). I can only build reference.

So I think step 0 is to add support for BLIS in numpy.distutils (got that mostly working already), step 1 to test on Linux to see that at least reference works, step 2 some benchmarking, ..., step something on Windows.

@matthew-brett your proposed text for PyPI seems fine to me. Which pip versions ignore the name with ABI tag? Pip nags you a lot to upgrade itself these days, so I'd expect a lot of people to have the latest version. And versions >1(.5) years old didn't even install wheels at all by default.

@rgommers my tests above were on windows. Not MSVC, but mingwpy or openblas won't do much different there - clang probably would work but needs repo reorganization in blis to avoid symlinks.

I didn't run Julia or numpy's tests against blis, but blis was passing its own unit tests. Things went much better than my experience from 2014 led me to think they would. Still need to figure out how to get multithreading to work properly, but with that you might have blis already performance competitive.

Does appear that the reference config is the only thing in blis that works for 32 bit x86 right now. That would require writing new assembly microkernels, I believe (edit: maybe not, see njsmith's comments below).

@tkelman, concerning OpenBLAS kernels for 32 bit https://github.com/numpy/numpy/issues/5479#issuecomment-185096062: according to a priv. message I got from Werner Saar some time ago there is nobody working on Intel 32 bit kernels for newer architectures. So this is a fact that is unlikely to be changed in the future. The focus is on Intel 64bit and ARM processors.

@tkelman, concerning C-runtime https://github.com/numpy/numpy/issues/5479#issuecomment-185055210: IMHO this is not critical as ATLAS and OpenBLAS do not share resources of the C-runtime (file descriptors and heap). _Hopefully I'm right_. It may be useful for ATLAS builds to increase the stacksize. This can be given as flag during linking, i.e.:

-Wl,--stack,16777216

concerning the dicussions ATLAS vs. OpenBLAS: thanks to @matthew-brett there are now SSE2 based ATLAS DLLs available. This Atlas build should be compared to OpenBLAS build against a SSE2 enabled Target (or simply set OPENBLAS_CORETYPE=NORTHWOOD - basically PENTIUM4) to disable the CPU runtime detection. Of course a generic OpenBLAS build can exploit much more CPU variants thanks to the CPU runtime detection. This is one of the reasons OpenBLAS is more performant in comparison to ATLAS. Another question is the reliability of OpenBLAS. Maybe a repository with gathered BLAS, LAPACK tests could help.

concerning BLIS/Flame: interesting, but a high hanging fruit at least for today.

The decision making of how to choose between ATLAS and OpenBLAS is however not clear for me.

Ralf - pip 8 will install the wheels with the new Windows ABI tags, pip 7 will not. Pip 7 and pip 8 will install the wheels without the ABI tags, without warning.

There are still a lot of pip 7s out there, it was released in August 2015 - so I'd much prefer to stick to the more compatible name, at least for a little while.

+1 on investigating BLIS. That seems like a good long-term solution. Have we considered Eigen at all? They support building a partial LAPACK interface, and the license is MPL2 for most of the code. That may be good enough for NumPy.

I noticed from the BLIS cpu detection code that it very often falls back to the reference implementation if it doesn't find AVX instructions, which are still pretty recent.

Ian : this is the state for Eigen as of a year or so ago : http://mingwpy.github.io/blas_lapack.html#eigen - so I believe it would be some work to build a usable library for numpy.

And worse, I tried it yesterday on 32-bit Linux and even that doesn't work. ./configure auto && make crashes horribly on some assembler code (for sandybridge). I can only build reference.

If you look at the contents of config/ -- the various named "configurations" (like "sandybridge", "haswell") are actually prepackaged "starter" configurations that include a bunch of pre-specified settings (not just CPU-tuning-related settings, but also threading mode settings, compiler settings, etc.). And the configuration called "sandybridge" is an x86-64 configuration. Sounds like a bug that the auto-configuration selected it, but yeah it's not going to work on x86-32 :-). BLIS does seem to ship with 32-bit x86 kernels (see kernels/x86), though it looks like at the moment none of the prepackaged configurations use them. Making new configurations is mostly trivial; the one piece of magic is in the bli_kernel.h file that names which inner kernel + a few buffer sizes. We could enquire upstream if they have any suggestions for x86-32.

Also:

BLIS is Linux-only at the moment; there's an open PR for OSX build support, and Windows is very far off

A few comments above, @tkelman is building and benchmarking BLIS on Windows :-)

The previous crude test_dot benchmark with OpenBLAS 0.2.12:

In [2]: %timeit test_dot()
1 loop, best of 3: 449 ms per loop

Compared to (previous result from) MKL

In [9]: %timeit test_dot()
1 loop, best of 3: 764 ms per loop

64-bit ATLAS:

In [10]: %timeit test_dot()
1 loop, best of 3: 2.41 s per loop

So when I compare openblas and MKL (thanks, conda) in serial to the Haswell BLIS configuration, they're all within at most 10-20% of each other on dgemm. Here's a dockerfile that built successfully on docker hub to cross-compile Windows dll's of each configuration (except bulldozer which didn't link properly https://github.com/flame/blis/pull/37#issuecomment-185480513, oh well): https://github.com/tkelman/docker-mingw/blob/09c7cadd5d682066cea89b3b97bfe8ba783bbfd5/Dockerfile.opensuse

You may want to try hooking up something similar to Travis' services: docker configuration and play with deploying binary artifacts to github releases/bintray/whatever.

I was looking at the BLIS CPU detection -> template code : https://raw.githubusercontent.com/flame/blis/master/build/auto-detect/cpuid_x86.c

Here's a Python rewrite, that should be a tiny bit more liberal in accepting one of the advanced templates (it's more likely to believe the OS can use AVX than the C code): https://gist.github.com/matthew-brett/a53778f99b7062cc332d

On all the machines I've tested on, this algorithm returns 'reference' - probably because I have old machines that no-one else wanted to use, to rescue for my buildbot farm.

Compiling numpy against the reference BLIS, with no lapack, gives the following on my crude benchmark:

In [6]: %timeit test_dot()
1 loop, best of 3: 16.2 s per loop

Just the dot product of two (1000, 1000) arrays is 12 seconds. So, as Tony also found, reference BLIS is the worst of our options, around the same as numpy's no-library default build with lapack_lite.

So, I think we will need either more templates covering older machines or more liberal CPU detection -> template mapping in order to give reasonable performance on a wide range of machines.
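
For illustration, the kind of more liberal CPU detection -> template mapping I mean could be as simple as the sketch below. The feature thresholds (SSE3 for dunnington, AVX for sandybridge, AVX2 for haswell) are assumptions for the sake of the example, not verified requirements of those kernel sets:

def choose_blis_config(has_sse3, has_avx, has_avx2):
    # Map detected CPU features to one of the prepackaged BLIS configurations.
    if has_avx2:
        return 'haswell'
    if has_avx:
        return 'sandybridge'
    if has_sse3:
        return 'dunnington'
    return 'reference'

print(choose_blis_config(has_sse3=True, has_avx=False, has_avx2=False))  # 'dunnington'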

@matthew-brett when can we expect the new ATLAS 64-bit windows wheels to be up? Which version? v1.10.2? Will they only be on pypi or also on sourceforge? Are you going to make any kind of announcement? Thanks so, so much!

@matthew-brett what was the ratio between atlas and reference blis for you on the same machine? Comparable to the factor of around 2 I was seeing? I got multithreading to work in blis, I just didn't rtfm properly (https://github.com/flame/blis/wiki/Multithreading), it isn't automatically enabled, and there are 4 different env vars to play with. With this patch https://gist.github.com/0fc9497a75411fcc0ec5 to enable pthreads-based parallel blis for all configs and setting BLIS_JC_NT=1 BLIS_IC_NT=2 BLIS_JR_NT=2 BLIS_IR_NT=2, the Haswell blis is basically tied with mkl and openblas on my machine. If I set just BLIS_JR_NT to 2 then the parallel reference blis is most of the way caught up to atlas, and it's faster with 3 threads.
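
For anyone wanting to reproduce this from Python rather than from the shell, the variables have to be in the environment before the BLIS-linked numpy is imported; a small sketch (the variable names and values are the ones quoted above, the rest is just illustration):

import os

# Must be set before the first numpy import, otherwise BLIS has already read them.
os.environ.setdefault('BLIS_JC_NT', '1')
os.environ.setdefault('BLIS_IC_NT', '2')
os.environ.setdefault('BLIS_JR_NT', '2')
os.environ.setdefault('BLIS_IR_NT', '2')

import numpy  # a numpy built against the pthreads-enabled BLIS picks these up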

@tkelman IMO it would be useful if you could document your progress on BLIS in the NumPy GitHub Wiki pages. I also think it might be interesting to propose a plan similar to mingwpy for making a NumPy-BLIS-FLAME wheel (and a SciPy-BLIS-FLAME wheel if even possible?).

@tkelman: to make sure I'm clear -- your atlas is threaded, right?
another thing to consider is adding -msse2 or similar to the reference build settings -- it looks like by default it is maximally compatible and doesn't allow the compiler to use SSE, but at least in numpy-land I know we're bumping up to SSE2 as the minimum supported configuration anyway for other reasons...

I don't know if FLAME is relevant or not right now versus regular LAPACK -- we'd want to ask.

Possibly we should open a new issue for the BLIS stuff instead of continuing to clutter this one :-)

For this thread - I think we can already ship a wheel with various BLIS kernels selected at run-time using the same rules as BLIS uses at build-time, but I think that would result in many machines with reference BLIS, and therefore having worse performance than 64-bit ATLAS, even though 64-bit ATLAS on Windows is particularly bad (for ATLAS).

But - if the reference build is faster than the 64-bit ATLAS - say with -msse2 - that would be a real option.

SSE2 is the minimum configuration for 64-bit so it's safe to use something like -mfpmath=sse -msse2 for the reference compile.

Possibly we should open a new issue for the BLIS stuff instead of continuing to clutter this one :-)

That would be a good idea (edit: may I suggest it be titled "Occupy BLIS," given @njsmith's sentiment about lawns in https://github.com/numpy/numpy/issues/5479#issuecomment-184472378 ?). I think having @matthew-brett proceed with uploading his existing Atlas wheels would be sufficient to close this one for now, with future work left to new issues.

to make sure I'm clear -- your atlas is threaded, right?

My atlas is the dll from https://github.com/matthew-brett/np-wheel-builder/tree/d950904f19309db103e676d876ea681b6a6b882e/atlas-builds, but I have yet to see it successfully use more than 1 thread. Am I missing an environment variable?

another thing to consider is adding -msse2 or similar to the reference build settings -- it looks like by default it is maximally compatible and doesn't allow the compiler to use SSE

SSE2 is part of the x86_64 spec so this would only be relevant on 32 bit. In Julia we add -march=pentium4 for our 32 bit builds.

I don't know if FLAME is relevant or not right now versus regular LAPACK -- we'd want to ask.

Haven't touched flame yet, but it's worth playing with. Eventually you might be able to use Windows Clang as a backup plan alternative to mingwpy. (edit: actually this doesn't fix the fortran in scipy, so maybe not)

@matthew-brett: I think (could be wrong) the dunnington kernel only requires SSE3, which the Steam Hardware Survey claims is present on 99.94% of machines (versus 99.99% for SSE2). So it seems wrong if you're finding that a majority of systems can't handle that -- don't know if that's a bug in their cpuid code, in your somehow having a realllly unrepresentative set of test machines, or in my understanding of what that kernel requires.

I posted a Python rewrite of the CPU detection code in the gist above. I'm guessing the template selection is conservative, defaulting to reference where another template might have worked.

To remind myself, to link to BLIS, I needed a site.cfg like:

[blas]
blas_libs = numpy-blis-reference
library_dirs = c:\code\blis\test\lib
include_dirs = c:\code\blis\test\include

I also did this, I assume it's necessary (patch relative to numpy 1.10.4):

diff --git a/numpy/distutils/system_info.py b/numpy/distutils/system_info.py
index d7eb49e..3cb7f95 100644
--- a/numpy/distutils/system_info.py
+++ b/numpy/distutils/system_info.py
@@ -1680,18 +1680,11 @@ class blas_info(system_info):
         info = self.check_libs(lib_dirs, blas_libs, [])
         if info is None:
             return
-        if platform.system() == 'Windows':
-            # The check for windows is needed because has_cblas uses the
-            # same compiler that was used to compile Python and msvc is
-            # often not installed when mingw is being used. This rough
-            # treatment is not desirable, but windows is tricky.
-            info['language'] = 'f77'  # XXX: is it generally true?
-        else:
-            lib = self.has_cblas(info)
-            if lib is not None:
-                info['language'] = 'c'
-                info['libraries'] = [lib]
-                info['define_macros'] = [('HAVE_CBLAS', None)]
+        lib = self.has_cblas(info)
+        if lib is not None:
+            info['language'] = 'c'
+            info['libraries'] = [lib]
+            info['define_macros'] = [('HAVE_CBLAS', None)]
         self.set_info(**info)

     def has_cblas(self, info):

Utility to allow run-time detection of CPU: https://github.com/matthew-brett/x86cpu

I guess this might be a candidate for inclusion in numpy itself, but we can also copy the single compiled cpuinfo module into the numpy tree for the Windows wheel.

Hi all. A thought: if you wanted to publish several different numpy wheels built with various vector libraries, you could use different PyPI package names

  1. https://pypi.python.org/pypi/numpy/1.8.1
  2. https://pypi.python.org/pypi/numpy-mkl
  3. https://pypi.python.org/pypi/numpy-atlas

I registered 2 to try to upload Gohlke's wheels, but PyPI rejected them. You're welcome to the URL.

gh-7294 adds BLIS support to numpy.distutils. Would be great if someone could verify that this works as expected.

There are still a lot of pip 7s out there, it was released in August 2015 - so I'd much prefer to stick to the more compatible name, at least for a little while.

Pip 7.0 isn't that old yet, so makes sense.

... BLIS does seem to ship with 32-bit x86 kernels (see kernels/x86), though it looks like at the moment none of the prepackaged configurations use them

That explains it, thanks.

Thanks Ralf - I'll test.

I realize this may need a new thread, but we're now very close to being able to use the BLIS builds for a release.

I think all we need now are recommended templates for a machine that has SSE2, and a machine that has SSE3, that work somewhat faster than the ATLAS 64-bit Windows build.

I realize this may need a new thread, but we're now very close to being able to use the BLIS builds for a release.

Eh, technically it may be possible to make it work, but it's still not a good plan to throw builds over the wall like that. We haven't even had serious testing of BLIS on Linux or OS X yet. So on Windows, where the BLIS FAQ says:

Support for building in Windows is also a long-term goal of ours. 
The Windows build system exists as a separate entity within the top-level
windows directory. However, this feature is still experimental and should not 
(yet) be expected to work reliably. Please contact the developers on the blis-devel 
mailing list for the latest on the Windows build system.

, it's definitely too early. Besides testing, some benchmarking is also a good idea I'd think.

Sure - but as Tony has shown, it's actually not hard to build BLIS for Windows, using cross-compiling. The experimental thing - I believe - is their MSVC build system, which we are not using.

For now, I'm only suggesting using BLIS for the Windows wheel, but of course it would be very good to get it working for the manylinux builds as well.

I completely agree that, if we aren't getting a significant on-average performance boost, then we shouldn't be using BLIS, and, at the moment, I don't think we are, except for very new processors. That might be trivially fixable with a couple of new templates, I'd love to know if that were the case.

For correctness, I also agree. How about if we show that

a) All numpy tests pass on all versions of Windows;
b) All numpy and scipy tests pass on the manylinux system?

We can make the BLIS template selectable at run-time, and test all kernels on a modern machine. I can test on some old nasty machines as well.

For now, I'm only suggesting using BLIS for the Windows wheel, but of course it would be very good to get it working for the manylinux builds as well.

manylinux is less important I'd think, as we have package managers there with a full stack as well as users who can much more easily compile things. Let's first see the whole manylinux concept taking off before we worry about it in this numpy + BLAS/LAPACK context:)

For Windows, I think our prios are:

1) a full-stack solution (needs MingwPy, with one of OpenBLAS/ATLAS/BLIS)
2) stopgap binary wheels (we have one about to go up with your ATLAS build)
3) increasing the performance of (1). This is where BLIS could come in.

So imho there's no need to be in a big rush with BLIS on Windows.

I completely agree that, if we aren't getting a significant on-average performance boost, then we shouldn't be using BLIS, and, at the moment, I don't think we are, except for very new processors. That might be trivially fixable with a couple of new templates, I'd love to know if that were the case.

Agreed, there should be a significant gain for it to make sense. It is indeed a bit hard to gauge how much work is needed.

For correctness, I also agree. How about if we show that

a) All numpy tests pass on all versions of Windows;
b) All numpy and scipy tests pass on the manylinux system?

That sounds good. Would make sense to include scikit-learn as well, it's a pretty significant linalg user.

I wasn't aware that blis and libflame have been part of the ACML codebase, which was open sourced some time ago:

http://developer.amd.com/community/blog/2015/08/07/open-source-strikes-again-accelerated-math-libraries-at-amd/
http://developer.amd.com/tools-and-sdks/opencl-zone/acl-amd-compute-libraries/

Nevertheless: how to solve the problem of comparing 4 different accelerated BLAS/LAPACK implementations for numpy/scipy, built with either MSVC or mingwpy, and tested on numerous CPU architectures from Pentium4 up to Skylake?

Nice find @carlk, I seem to remember them announcing dropping acml and open sourcing acl, but I didn't recall them adopting blis/libflame. The BSD license is very good news! Is there some way to work with AMD and SHPC at UT Austin to target numpy and Julia?

I was able to cross compile libblis.a using msys2 and the haswell config out of the box, passing all tests by patching the kernel symlinks, but I could not build libflame - I got the same "argument list too long" error as in my blis-discuss mailing list post. Also I personally could not figure out how to link to libblis.a from lapack, but I didn't try very hard.

With the community licensing of MKL, is it not possible to provide an MKL wheel on pypi, are the licenses really incompatible? Or is it just not possible to build scipy without ifort?

One issue, and it probably belongs on scipy, that hasn't been mentioned is the remaining Fortran files in scipy. Sorry for the noob question, but why must we use them? To me it seems Fortran, and the lack of a free multiplatform compiler, is the real issue here. Isn't that after all what mingwpy aims to solve? Given either free MKL or some future magic acl blis/flame, anyone with a C compiler could build the scipy stack if it weren't for the *.f files.

@mikofski, great to hear that blis can be compiled with msys2. Is this also true for libflame? I guess we need libflame for the LAPACK API.
In principle _it is_ possible to have an MSVC-compiled numpy and use it together with a mingwpy-compiled scipy. You need to add -mlong-double-64 to the gcc flags to ensure that long double == double.

It is tricky to make this behaviour the default on gcc; I've been working on this problem for a week :(

I will come up with scipy wheels tomorrow. These will be based on the ATLAS provided by the numpy wheels from @matthew-brett.

Nevertheless, I'm in favour of using OpenBLAS right now.

One issue, and it probably belongs on scipy, that hasn't been mentioned is the remaining Fortran files in scipy. Sorry for the noob question but why must we use them?

Because it's a lot of very useful and high-performance code. And it's not just BLAS/LAPACK - a lot of scipy.sparse.linalg, scipy.linalg, scipy.special and scipy.interpolate for example is Fortran. Also, Scipy is not the only project with Fortran code, there's other packages like bvp_solver as well as people's own Fortran code that they wrapped with f2py.

Indeed, nice find Carl.

Nevertheless: howto solve the problem to compare 4 different accelerated BLAS/Lapack implementations for numpy/scipy build with either MSVC or mingwpy to be tested on numerous CPU architectures: Pentium4 up to skylake?

This indeed requires a decent automated build/test/benchmark framework. We don't have to bother with very old CPU architectures (as long as things work there it's fine) and also not with MSVC I'd think. But still it'll be some work to set this up properly.

@rgommers thanks!

@hickford please don't do that. It's breaking the MKL license to redistribute binaries like that (unless you have a personal license), and it's not the right way to do this. In the future we may want to distribute some flavors via extras (numpy[atlas], numpy[openblas] etc.) though.

Also, redistributing someone else's wheels on PyPi without asking is probably not the thing to do....

Mingwpy, and any Fortran issues that rely on linking to the same C runtime as cpython, are rate-limited by @carlkl; experimenting with BLIS solves fewer problems but can be done independently by anyone. I've unfortunately exhausted my personal supply of time for looking at BLIS right now, but see #7294.

Tony - thanks very much for all your help, it's been invaluable.

I added a build of later ATLAS (3.11.38) on 64-bits

https://github.com/matthew-brett/np-wheel-builder

This is a serial (unthreaded) build, because of problems compiling 3.11.38 on Windows, but it should be a bit faster than 3.10.1, and is on my simple benchmark:

In [2]: %timeit test_dot()
1 loop, best of 3: 1.65 s per loop

compared to the earlier 3.10.1 build (see above):

In [10]: %timeit test_dot()
1 loop, best of 3: 2.41 s per loop

@tkelman - can you benchmark this build on Julia?

Sorry to jump in here with a prior note on MKL binaries - Intel offers the community version which should allow redistribution as it is free for all...

@mrslezak - the license does allow redistribution, but makes the redistributor liable for any legal fees if Intel gets sued as a result of using the software. Also, the resulting binary cannot be BSD licensed. See: http://mingwpy.github.io/blas_lapack.html#intel-math-kernel-library

Could that be avoided by adding 'provided as is, with no liability for any monetary losses that may result from its use' or something to that effect?

I don't think that would work, because we have to agree to Intel's license, and Intel's license says that we are liable for their legal fees if they get sued. I guess we could require users not to sue Intel in our license agreement, and so maybe, if they sue Intel, and Intel asks us for the money, we can try and sue the user for those fees, but still - putting that on our license would take us even further from BSD, and require us to get the user to agree explicitly, which is not practical in the case of wheels installed by pip.

Building ATLAS for SSE3 only gives a 5% performance benefit compared to the SSE2 ATLAS, but the build was tricky and I had to disable the most obvious enabling flags for SSE3, and just use -msse3.

I wrote a mail to the numpy mailing list proposing to deploy these wheels : https://mail.scipy.org/pipermail/numpy-discussion/2016-March/075125.html

@matthew-brett As someone who supports Windows with Python applications, thank you.

@matthew-brett, I added 2 issues to your atlas-build-scripts repository.
See https://github.com/matthew-brett/atlas-build-scripts/issues

The first one https://github.com/matthew-brett/atlas-build-scripts/issues/1 is important, as numpy-atlas.dll exports too many symbols and thus prevents further usage with mingwpy without hacking the import library.

@matthew-brett sorry, I've been a bit too busy to do any more benchmarking. Were any of the earlier atlas builds multithreaded? I wasn't able to get the first build running on multiple cores. The gist should be pretty straightforward to run even if you aren't very familiar with julia. Or were you mostly interested in newer hardware than you have access to?

Don't worry - wasn't expecting you to drop everything and run benchmarks.

Actually my latest atlas build was not multithreaded - ATLAS 3.11 needs some more work to get the threading working on Windows apparently.

For the benchmarks, I was thinking it would be easier to compare to the other benchmarks you've run, and I do only have old hardware with Windows on - I'm guessing the hit is a lot greater on your machine than on mine.

Windows wheels are now up on pypi : https://pypi.python.org/pypi/numpy/1.10.4

Sorry Tony - yes the previous 3.10 ATLAS builds were (or appeared to be) multithreaded.

I guess we can close this issue now. Maybe @matthew-brett you should transfer your https://github.com/matthew-brett/np-wheel-builder under the numpy org or maybe contribute it as a PR to the numpy repo under the tools folder.

Ralf - any suggestions about where np-wheel-builder should go? numpy/vendor maybe?

I'd prefer a separate new repo (numpy-wheel-builder?) under the numpy org I think. There's overlap with numpy-vendor in purpose, but not much in code. That repo is quite large, is really meant for running under Wine, and the gcc toolchain in it is obsolete.

Fine with me - OK with y'all to go ahead and create that?

Fine with me, though if it's windows-specific (right now it is AFAICT?) then the repo name should have "windows" in it :-). Or else it could be where we put the analogous infrastructure for other wheels too. I'd also be fine with putting it directly into the numpy repo somewhere if it's small enough for that to make sense. Whatever works :-)

Repo has fairly big ATLAS binaries in it, would make the numpy repo large to no good purpose, I think.

How about win-wheel-builder?

How about windows-wheel-builder. I'm not a fan of win ;)

What about not making it windows specific and having the macosx and future manylinux1 wheel build config all in one place?

Otherwise +1 for "windows" over "win".

What about not making it windows specific and having the macosx and future manylinux1 wheel build config all in one place?

Would be easier to change things on all platforms. But I'd expect that OS X and Linux would just need build scripts, while for Windows we have the huge ATLAS binaries. If it's all going into one repo, can the ATLAS binaries be separated somehow (maybe with git-lfs)?

Use large file storage (LFS) on github for binaries

@rgommers: I think we'll soon be carrying atlas-or-some-other-blas binaries for Linux as well, and possibly osx too (e.g. if we decide that we're tired of accelerate breaking multiprocessing).

could start using github releases or bintray or something instead of checking them in... not like they're all that large though until you start getting into DYNAMIC_ARCH enabled openblas builds or the equivalent combinations of multiple blis configs

How about putting up the repo as windows-wheel-builder for now, and refactoring / renaming when it is more clear what we're going to do with Linux / OSX?

Sounds good to me.

fine with me as well

I think I need admin rights to the numpy organization - or I can give someone admin rights to the repo, and they can do it, I suppose.

@matthew-brett: I'm very confused by github's permissions page (and numpy's in particular is a mess), but if you want to make me an admin on the repo or transfer the repo to me then I can move it into numpy/

I transferred the repo to @njsmith ...

Is there a numpy appveyor account? Can somebody enable Appveyor builds for this repo?

I think we're using @charris's Appveyor account...

Yes, see here https://ci.appveyor.com/project/charris/numpy/history


Actually, I just made a new group account for numpy at appveyor (been meaning to do this anyway, and this prompted me to actually do it :-)), and enabled it there:
https://ci.appveyor.com/project/numpy/windows-wheel-builder

@njsmith How did you manage that? Last I looked someone needed to ask the admins to create project accounts and the way to add others to it wasn't completely transparent.

If the account works out, I'd like to transfer responsibility for the numpy testing.

@charris: check your email :-). I just made an individual account with numpy-steering-council @googlegroups.com as the individual. I didn't know that project accounts were a thing that existed... Do we want one?

for the sake of the queue you probably want to spread different projects on different accounts

The downside of using the numpy-steering-council mail is that appveyor sends out notifications when a merge test fails. If the appveyor folks have something better these days it would be good to use that, but given the mess their interface has been in the past I wouldn't bet on it.

@tkelman Good point. Also, if we are going to spend money to get a speedier queue we probably want something more official.

@charris: I just attempted to enable testing of numpy/numpy in the new appveyor account, and also to disable all notifications, and also add all the relevant numpy github teams as administrators on the account -- let's see what happens I guess...

@matthew-brett: It occurs to me that the most elegant approach might be to stash the BLAS builds somewhere like numpy/windows-build-tools, but to run the actual wheel build tools out of the real numpy/numpy repository as part of the appveyor build -- they could pull the BLAS binaries down on demand.

Thanks for all the great work! Will numpy 1.11.0 Window wheels be added to pypi soon? https://pypi.python.org/pypi/numpy

oh yeah, we possibly need to figure out how to update our release procedures here... IIUC the user experience right now is that as soon as the 1.11 source release was uploaded, all the windows machines out there suddenly switched from downloading wheels (yay) to trying to download and build the source (boo). I guess the "right" way to do this is that once the final release is tagged, we build and upload all the binary wheels _before_ uploading the sdist. As annoying as that is...

@njsmith that would be nice, but a lag of a few minutes (or even a few hours) would be fine with me.
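For reference, the "wheels before sdist" ordering could look roughly like this with twine -- just a sketch, assuming all the artifacts have already been built into dist/:

# Upload all the binary wheels first...
twine upload dist/*.whl
# ...and only then the source distribution, so pip never sees an sdist-only release
twine upload dist/*.tar.gz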

Just to clarify are the current Windows whl files on PyPI for the 1.11.0 release build against ATLAS? Is there a build script that can be shared?

Yes, the wheels are built against ATLAS, but we're thinking of moving to OpenBLAS when we're confident of the results.

Builds are automated via Appveyor: https://github.com/numpy/windows-wheel-builder
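(If you want to check which BLAS an installed wheel was actually built against, numpy's own show_config() prints the build-time configuration:)

python -c "import numpy; numpy.show_config()"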

23735 downloads in the last day. =)

It might be possible to create a hidden release - at least there is an option in the PyPI form https://pypi.python.org/pypi?%3Aaction=submit_form - and unhide it when all files are ready.

Sadly, the hidden release feature does not stop people getting that release via the command line; it only stops them seeing the release via the pypi GUI:

https://sourceforge.net/p/pypi/support-requests/428/
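So hiding only affects the web pages; pip would still find and install a hidden release, e.g. when the version is pinned explicitly (the version number here is just an example):

pip install "numpy==1.11.0"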

I have tried the 64-bit windows install of numpy and that works great, so thanks to all who have put in work on this.

What I am wondering is whether there is still a plan to do the same thing with scipy wheels. Is this awaiting the decision to move to OpenBLAS?

On https://bitbucket.org/carlkl/mingw-w64-for-python/downloads there are some test wheels of scipy-0.17.0. These wheels have been built with mingwpy against @matthew-brett's builds of numpy https://pypi.python.org/pypi/numpy/1.10.4

Sorry if you said already, and I missed it - but do you get any test failures for these wheels?

Are you linking to the ATLAS shipped inside the numpy wheels?

@matthew-brett, I announced these builds a month ago, but I don't remember where. Anyway, these builds link against numpy-atlas supplied by your numpy wheels.

scipy-0.17.0-cp35-cp35m-win##.whl are linked against the _wrong_ C-runtime msvcrt.dll. For scipy this seems to be OK. Test logs are here: https://gist.github.com/carlkl/9e9aa45f49fedb1a1ef7
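If anyone wants to double-check the runtime linkage themselves, MSVC's dumpbin can list a built extension's DLL dependencies (the .pyd filename below is only an example):

dumpbin /dependents _fblas.cp35-win_amd64.pyd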

Is that the right log? It has "NumPy is installed in D:\devel\py\python-3.4.4\lib\site-packages\numpy" at the end.

I was wondering if we are close to being able to provide a scipy wheel, even if it dangerously links against the wrong MSVC runtime, but it looks as if there are far too many errors for this build.

Do you get fewer errors for the 64-bit build? And for the current best build against OpenBLAS 0.2.18?

The 64-bit build has only 6 failures, all with:

FAIL: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.nct_gen object ...

I know: this cries for comparison with OpenBLAS. However, I've been stuck for the last 4 weeks for several reasons, as you may have noticed. Hopefully the situation will continue to improve.

@matthew-brett, I would appreciate using numpy MSVC builds with OpenBLAS. My latest builds are here:

As if mingwpy, conda-forge, Anaconda and Canopy weren't enough, here comes the Intel Distribution for Python, and it's free to download. It includes just the numerical tools (SciPy, NumPy, Numba, Scikit-Learn) plus some extras (an mpi4py Intel MPI interface and pyDAAL data analytics) and uses conda.

No worries, the license expires 10/29/16, so these Intel builds are just a beta test, probably followed by an MKL+ etc. license fee. OpenBLAS builds will remain the open source solution, so thank you for providing these builds.

For 1.11.1, it looks like there is a missing Windows wheel on PyPI for Python 3.5 amd64.

Is there a particular reason for that? If I go to 1.11.0 (https://pypi.python.org/pypi/numpy/1.11.0), the wheel is there.

Thanks for the report - I think we must have uploaded too soon, and therefore before all the wheels were built. I've uploaded the missing wheel. It looks like we need a test to make sure this doesn't happen again.
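A simple post-upload check could ask PyPI's JSON API which files exist for the release and complain if an expected Windows tag has no wheel -- a rough sketch, with the version number and the list of tags as examples only:

# Fetch the release metadata from the PyPI JSON API
curl -s https://pypi.org/pypi/numpy/1.11.1/json -o numpy-1.11.1.json
# Make sure there is at least one wheel for each expected Windows platform tag
for tag in win32 win_amd64; do
    grep -q "$tag.whl" numpy-1.11.1.json || echo "no wheel uploaded for $tag"
done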

I've uploaded the missing wheel.

I have just tested it, and it works great!

Thank you so much for all the work done to make the Windows wheels available.

Closing the issue -- wheels have been available for the last few releases.

I understand that this issue is closed but I believe we should consider re-opening it.

This remains an issue for Windows users trying to get their scientific stack running without having to resort to conda. I still need to use the @cgohlke 'MKL' builds; see this related scipy issue, which remains open. Although wheels are being created, without a compatible scipy they are not usable for many.

@waynenilsen you have the instructions for installing the new wheels in the mailing list thread that is linked in the issue you just mentioned:

https://github.com/scipy/scipy/issues/5461#issuecomment-326744515

So if you do

pip install -f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/ --pre scipy

it should work for you.

There's nothing left to be done for Numpy, so the issue is closed. The Scipy issue is still open, and it will likely be resolved in the next release.

This works great for me, @Juanlu001. I am really looking forward to when this is on PyPI!
