lapack 🚀 - Allow a possible installation of index-64 library alongside standard index-32 library?

Hi Aisha, that makes total sense to me, but let us see if we have feedback from others. So let us wait a few days. J

langou on 1 Nov 2020

Most definitely, that sounds like a good plan.

epsilon-0 on 1 Nov 2020

OMG @langou
you are so fast :heart:

Just for completeness, I am writing down things that we still have to do:

- Figure out how to name headers so that the 32-bit API can co-exist with the 64-bit API.
- Fix printf/fprintf statements so that they use the correct format specifier for printing.

Any suggestions to solve the first point are welcome, I unfortunately don't have a "clean" solution.
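For the second point, one possible approach is to pair the integer typedef with a matching format macro from <inttypes.h>, so that every printf/fprintf call picks up the right conversion. This is only a minimal sketch: the names example_int, EXAMPLE_INT_FMT, EXAMPLE_ILP64 and example_format_dim are hypothetical and not taken from the LAPACK sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical build switch: define EXAMPLE_ILP64 for a 64-bit index build. */
#ifdef EXAMPLE_ILP64
typedef int64_t example_int;
#define EXAMPLE_INT_FMT PRId64
#else
typedef int32_t example_int;
#define EXAMPLE_INT_FMT PRId32
#endif

/* Formats "n = <value>" into buf with the conversion that matches example_int.
 * Returns what snprintf returns: the length of the full text, or a negative
 * value on an output error. */
int example_format_dim(char *buf, size_t len, example_int n)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "n = %" EXAMPLE_INT_FMT, n);
}
```

The format string is assembled at compile time, so the same source line would be correct for both index widths.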

epsilon-0 on 5 Nov 2020

A couple of questions, which will help me handle the naming for the files

There seems to be a tonne of duplicate definitions between cblas_f77.h and cblas_test.h. Do we really need that?
Does cblas_test.h need to be installed? Given its name (and the files that it is used in) I presume it to be used only during the testing phase. Maybe we should not install this file on a system-wide level?

epsilon-0 on 7 Nov 2020

Hi @epsilon-0,

Some things which I had in mind for allowing this to co-exist with the standard installation - name the libraries to libblas64.so, libcblas64.so, liblapack64.so, liblapacke64.so, this way there is no conflict between the library names (though of course, you can't link with both libblas and libblas64 at the same time).

you might be looking for PR #218. The author of this PR is Björn Esser from the Fedora Project.

christoph-conrads on 15 Feb 2021

Hi @epsilon-0. Did #462 solve this issue?

weslleyspereira on 21 Apr 2021

@weslleyspereira no this isn't complete yet.
There needs to be some more header renaming/handling done.
I am busy for the next few weeks so am not going to be able to do this soon.
Basic outline

the header files should be called cblas.h and cblas64.h, likewise for other headers
- this means the *.c files would need some slight adjustment to include the proper header, but this is only during build time, so it can be hacked out.
the cmake files should be installed under lapack64 or cblas64, etc.

epsilon-0 on 21 Apr 2021

Ok, I see. Thanks for the quick follow-up!

weslleyspereira on 21 Apr 2021

I have similar trouble trying to package things with pkgsrc. I would like to have a complete install of the reference, with cblas and lapacke. For differing implementations being installed at the same time, I settled for differing library names and sub-directories for the headers, so for example

/usr/lib/libopenblas.so
/usr/lib/libopenblas64.so
/usr/lib/libblas.so
/usr/lib/libcblas.so
/usr/lib/libblas64.so
/usr/lib/libcblas64.so
/usr/include/openblas/cblas.h
/usr/include/openblas64/cblas.h
/usr/include/netlib/cblas.h
/usr/include/netlib64/cblas.h
/usr/include/cblas.h -> netlib/cblas.h (for compatibility, having the default)

(and so forth)

We do not consider runtime switching like the binary distros, so it is OK if each cblas.h (and lapacke.h) is specific to its matched library, like with extra names for libopenblas. Build-time selection happens via

BLAS_INCLUDES=-I/prefix/include/netlib64
BLAS_LIBS=-lblas64
CBLAS_LIBS=-lcblas64

(etc.) That's what the .pc files are supposed to say, and it is a lot easier than communicating a different header file name. They are not yet consistent in that, but I am fixing it up. So far, it seems people just hacked that into their distros, if they bothered with all the reference libs at all.

I got one question about those headers, though.

I am hacking the cmake build to get each component built separately, and am trying some other fixup (see https://github.com/Reference-LAPACK/lapack/pull/556). I get the libblas.so and libblas64.so libraries built fine, I get the header dirs configured … but the installed cblas.h and lapacke.h are identical for the 32 and 64 bit indexing versions. This is at odds with openblas: There, I got a crucial difference that I do not see for the netlib builds:

diff -ruN /data/pkg/include/openblas/openblas_config.h /data/pkg/include/openblas64/openblas_config.h
--- /data/pkg/include/openblas/openblas_config.h    2021-06-03 19:03:53.000000000 +0200
+++ /data/pkg/include/openblas64/openblas_config.h  2021-06-03 19:13:36.000000000 +0200
@@ -44,6 +44,7 @@
 #define OPENBLAS_DLOCAL_BUFFER_SIZE 32768
 #define OPENBLAS_CLOCAL_BUFFER_SIZE 16384
 #define OPENBLAS_ZLOCAL_BUFFER_SIZE 12288
+#define OPENBLAS_USE64BITINT 
 #define OPENBLAS_GEMM_MULTITHREAD_THRESHOLD 4
 #define OPENBLAS_VERSION " OpenBLAS 0.3.15 "
 /*This is only for "make install" target.*/

For the reference libraries, all headers from the 32 and 64 bit index builds are identical, and apparently users are expected to put -DWeirdNEC in their flags (might have been funny 30 years ago) for cblas.h and -DLAPACK_ILP64 -DHAVE_LAPACK_CONFIG_H. Since people use the optimized BLAS libraries in production, the de facto standard is not to expose that to the users. These feed back on the reference, IMHO, and the headers installed from an ILP64 build should not require funky flags to avoid crashing your app when linking to the 64 bit lib.

Do we agree that it is the right solution to modify the headers at build time to define the correct integers?

Btw, the cblas config files that are installed also miss any reference to the necessary defs, so are broken for the 64 bit index builds, as it seems. But actually, I am pondering not installing these at all. They are redundant with the .pc files and possibly make it harder to convince cmake-using dependent packages to accept a packager's choice via BLAS_LIBS et al.

drhpc on 3 Jun 2021

PS: With Intel MKL, there is a central switch -DMKL_ILP64 to be set. I imagine setting up a trivial include/intel-mkl64/cblas.h with

#ifndef MKL_ILP64
#define MKL_ILP64
#endif
#include <mkl_cblas.h>

to fit the general scheme. I could also put the define into BLAS_INCLUDES, same for the weird netlib defines. What is better? Do we want to do it like Intel or like OpenBLAS?

drhpc on 3 Jun 2021

Do we agree that it is the right solution to modify the headers at build time to define the correct integers?

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

Btw, the cblas config files that are installed also miss any reference to the necessary defs, so are broken for the 64 bit index builds, as it seems.

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

weslleyspereira on 7 Jun 2021

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

This is ambiguous to me. Which is the cleaner solution? What I am preparing now is such:

#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

The CMake file shall replace @HAVE_ILP64@ with 1 or 0, the resulting header being installed for the current build.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)
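The portability concern can be avoided with the fixed-width types from <stdint.h>. A minimal sketch, assuming a hypothetical macro EXAMPLE_HAVE_ILP64 that stands in for the @HAVE_ILP64@ substitution; example_cblas_index and example_cblas_index_bytes are likewise made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the value the build system would substitute (1 or 0). */
#ifndef EXAMPLE_HAVE_ILP64
#define EXAMPLE_HAVE_ILP64 0
#endif

#if defined(WeirdNEC) || EXAMPLE_HAVE_ILP64
typedef int64_t example_cblas_index; /* 64 bits wide on every platform */
#else
typedef int32_t example_cblas_index;
#endif

/* Reports the width chosen above, e.g. for a configure-time sanity check. */
size_t example_cblas_index_bytes(void)
{
    return sizeof(example_cblas_index);
}
```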

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

I am imagining a future where you do

cc -I/foo/include/netlib64 -o bar bar.c -L/foo/lib -lcblas64

And things are handled in foo/include/netlib64/cblas.h, otherwise by foo/include/netlib/cblas.h (possibly linked to foo/include/cblas.h).

I have the suspicion that this is _not_ what you meant, but I want to convince you that it is better;-)

You could try not to duplicate the header by placing 'the' header in /foo/include/cblas.h and have /foo/include/netlib64/cblas.h include that one only by defining WeirdNEC, but that means that the 64 bit and 32 bit packages share that common header file, which is messy for packaging. It is way better if each puts its file into separate places/names. The name needs to stay cblas.h because you don't want to go around replacing #include <cblas.h> lines.

Edit: Also, having cblas.h include ../cblas.h is messy by itself. Also we define _one_ header installation directory for cmake. By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):

.if !empty(LAPACK_COMPONENT:M*64)
.  if empty(MACHINE_ARCH:M*64)
PKG_FAIL_REASON+=       "${LAPACK_COMPONENT} incompatible with non-64-bit platform"
.  endif
HEADERDIR=netlib64
.else
HEADERDIR=netlib
.endif

# Note: We patch the build to install both static and
# shared libraries.
CMAKE_ARGS=     -DBUILD_DEPRECATED=ON \
                -DBUILD_SHARED_LIBS=ON \
                -DBUILD_STATIC_LIBS=ON \
                -DCMAKE_INSTALL_INCLUDEDIR=${PREFIX}/include/${HEADERDIR} \
                ${LAPACK_COMPONENT_CMAKE_ARGS}

drhpc on 8 Jun 2021

A beautiful aspect of shipping/installing the 32 bit cblas.h with this modification to the usual location is that the original mechanics still work. Only the 64 bit variant will enforce WeirdNEC. You could decide to only install the 64 bit one into a prefix and keep the other parts of the ecosystem untouched.

drhpc on 8 Jun 2021

Oh, come on … the CBLAS/cmake/cblas-config-install.cmake.in seems to forget -DCMAKE_INSTALL_INCLUDEDIR, doesn't it?

# Report lapacke header search locations.
set(CBLAS_INCLUDE_DIRS ${_CBLAS_PREFIX}/include)

(The comment is sugar on top.)

I have the feeling that the CMake build is a lot less mature than one would think. Is the project serious about having that as the primary build, or is this just a drive-by contribution? I am really tempted to rather fix up the old-style Makefile, less fuss all around. But I have now sunk so much time into fixing up the CMake stuff, which I detest anyway, that I would like to get it over with.

drhpc on 8 Jun 2021

I have to give up now … I managed to move cblas.h to cblas.h.in as indicated above, and added

configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cblas.h.in cblas.h @ONLY)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/cblas_f77.h.in cblas_f77.h @ONLY)

to CBLAS/include/CMakeLists.txt, also having defined @HAVE_ILP64@ to 1 or 0 in the toplevel CMakeLists.txt. But I cannot for the life of me figure out how to make the install stuff, which is in a higher-level CMakeLists.txt, install the generated headers, or the weird copy of the same from ${LAPACK_BINARY_DIR}/include (really? A copy within the source tree?)

What is the macro append_subdir_files supposed to do? It seems to prepend a copy of the prefix to the header paths. I get either not enough or too much path to the source header files. I just want to install the header files from HERE to THERE, dammit.

Can someone knowledgeable help out here? I guess I could figure it out tomorrow, but I am not sure if that is without smashing something in the real world for emotional relief.

drhpc on 8 Jun 2021

Yes. I agree with that, and prefer the solution that does not replicate the entire header. I think it is cleaner.

This is ambiguous to me. Which is the cleaner solution? What I am preparing now is such:
#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif
The CMake file shall replace @HAVE_ILP64@ with 1 or 0, the resulting header being installed for the current build.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)

Right. I just installed the 64-bits libraries (BUILD_INDEX64=ON) and couldn't see anything telling me to use WeirdNEC, LAPACK_ILP64 or HAVE_LAPACK_CONFIG_H. Thanks for noticing that!

I am imagining a future where you do
cc -I/foo/include/netlib64 -o bar bar.c -L/foo/lib -lcblas64
And things are handled in foo/include/netlib64/cblas.h, otherwise by foo/include/netlib/cblas.h (possibly linked to foo/include/cblas.h).

I have the suspicion that this is _not_ what you meant, but I want to convince you that it is better;-)

Sorry, let me explain. At first, I liked the idea of keeping the original header cblas.h and creating include/netlib64/cblas.h and include/netlib/cblas.h with something like

#if !defined(WeirdNEC)
   #define WeirdNEC
#endif
#include <cblas.h>

You could try not to duplicate the header by placing 'the' header in /foo/include/cblas.h and have /foo/include/netlib64/cblas.h include that one only by defining WeirdNEC, but that means that the 64 bit and 32 bit packages share that common header file, which is messy for packaging. It is way better if each puts its file into separate places/names. The name needs to stay cblas.h because you don't want to go around replacing #include <cblas.h> lines.

Edit: Also, having cblas.h include ../cblas.h is messy by itself. Also we define _one_ header installation directory for cmake.

but yes, we would have to use both include/netlib64 and include as include dirs if one header includes the other.

By default that's /foo/include, not /foo/netlib64/include. I am not going to change this default. Packagers will have to specify the subdirectory like this (BSD make in pkgsrc):

.if !empty(LAPACK_COMPONENT:M*64)
.  if empty(MACHINE_ARCH:M*64)
PKG_FAIL_REASON+=       "${LAPACK_COMPONENT} incompatible with non-64-bit platform"
.  endif
HEADERDIR=netlib64
.else
HEADERDIR=netlib
.endif

# Note: We patch the build to install both static and
# shared libraries.
CMAKE_ARGS=     -DBUILD_DEPRECATED=ON \
                -DBUILD_SHARED_LIBS=ON \
                -DBUILD_STATIC_LIBS=ON \
                -DCMAKE_INSTALL_INCLUDEDIR=${PREFIX}/include/${HEADERDIR} \
                ${LAPACK_COMPONENT_CMAKE_ARGS}

That seems good to me. So, you would only add an alternative to build LAPACK without having to _guess_ the compiler flags. But the current way would also work.

(Btw.: long wouldn't work on Windows. It's long long there … or int64_t on all platforms with stdint.)

Good to know. BLAS++ and LAPACK++ use int64_t instead of long long.

weslleyspereira on 8 Jun 2021

@weslleyspereira So you at first liked this idea:

#if !defined(WeirdNEC)
   #define WeirdNEC
#endif
#include "../cblas.h"

with /prefix/include/cblas.h and /prefix/include/netlib64/cblas.h, the latter locating the former? But you do agree now that it is a more robust solution to install a header that looks like this for a 64 bit build?

#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

(long vs. int64 is a different matter, but I am all for doing that change, just like BLAS++)

Heck, I am not even sure if it is safe to assume that #include "../cblas.h" will find only the other intended header. The C standard seems to say that the search order is implementation-defined, not necessarily relative to the current header. My main issue as packager is that I'd need a separate package for that common header or have the 64 bit package depend on the 32 bit one just for that. This would suck.

I would really like to go forward now with such a change for pkgsrc, to settle a change for the upstream code later. We could discuss a new symbol to force 32 bit indices or 64 bit indices explicitly with any of the headers (-DNETLIB_INDEX_BITS=64 ?), just defaulting to what the library was built with.

Can I get agreement about our intended solution being this?

lib/libcblas64.so
include/optional_subdir64/cblas.h

and

lib/libcblas.so
include/optional_subdir/cblas.h

Each build of LAPACK code results in headers that, at least by default, match the installed libraries without the user defining anything. OK?

I then could slip this in before the upcoming release of pkgsrc (deadline nearing) and we can further discuss the details of that implementation so that I can drop the patches after merging something here, with a new LAPACK release. With this change, the plain Makefile build also needs fixing, but I don't need that for _my_ patches yet when I just use the CMake build.

(Just need to somehow check my temper when trying to beat that weird CMake build into submission, where it shuffles header copies around the build directories and then cannot find them for install. Or decide about those broken .cmake files having any use for us, maybe just drop them from install … we got pkg-config!)

drhpc on 9 Jun 2021

Anything? I must admit that I don't see much chance for a different solution in practice, as this is the example set by openblas, the main implementation we use. I can imagine convincing Intel to have a subdirectory for 64 bit/32 bit index headers, too, wrapping over their mkl_cblas.h and mkl_lapacke.h. Otherwise I will build a simple package that just provides those.

include/mkl-blas/cblas.h
include/mkl-blas64/cblas.h

Currently, I added machinery to pkgsrc to provide builds with the funny -DWeirdNEC -DHAVE_LAPACK_CONFIG_H -DLAPACK_ILP64 line, with both cblas and cblas64 installing identical headers. It could stay that way, but I still think it makes sense to have the header set up to match the build ABI.

drhpc on 10 Jun 2021

@weslleyspereira So you at first liked this idea:
#if !defined(WeirdNEC)
   #define WeirdNEC
#endif
#include "../cblas.h"
with /prefix/include/cblas.h and /prefix/include/netlib64/cblas.h, the latter locating the former? But you do agree now that it is a more robust solution to install a header that looks like this for a 64 bit build?
#if defined(WeirdNEC) || @HAVE_ILP64@
   #define CBLAS_INDEX long
   #ifndef WeirdNEC
   #define WeirdNEC
   #endif
#else
   #define CBLAS_INDEX int
#endif

Yes, that's it. I agree with your solution of having subfolders for the 32- and 64-bits headers. I discussed this with @langou, and he was also convinced this would be a good solution.

(long vs. int64 is a different matter, but I am all for doing that change, just like BLAS++)

Right. This should be addressed in another issue.

I would really like to go forward now with such a change for pkgsrc, to settle a change for the upstream code later. We could discuss a new symbol to force 32 bit indices or 64 bit indices explicitly with any of the headers (-DNETLIB_INDEX_BITS=64 ?), just defaulting to what the library was built with.

Can I get agreement about our intended solution being this?
lib/libcblas64.so
include/optional_subdir64/cblas.h
and
lib/libcblas.so
include/optional_subdir/cblas.h

Yes. I think you can go forward and propose a PR in the future, thanks! I personally think a new symbol like NETLIB_INDEX_BITS makes total sense. I would just make sure the default value remains 32, and that -DWeirdNEC implies -DNETLIB_INDEX_BITS=64.
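The precedence rules discussed here could look roughly like the preprocessor logic below. This is a sketch under assumptions, not the agreed implementation: NETLIB_INDEX_BITS is only the symbol name floated in this thread, while EXAMPLE_BUILT_INDEX_BITS (the width the installed library was built with, here defaulting to 32) and example_index_bits_mismatch are hypothetical.

```c
/* Hypothetical: index width the installed library was built with. */
#ifndef EXAMPLE_BUILT_INDEX_BITS
#define EXAMPLE_BUILT_INDEX_BITS 32
#endif

/* Legacy switch: -DWeirdNEC implies 64-bit indices. */
#if defined(WeirdNEC) && !defined(NETLIB_INDEX_BITS)
#define NETLIB_INDEX_BITS 64
#endif

/* No explicit request: follow the library. */
#ifndef NETLIB_INDEX_BITS
#define NETLIB_INDEX_BITS EXAMPLE_BUILT_INDEX_BITS
#endif

#if NETLIB_INDEX_BITS != 32 && NETLIB_INDEX_BITS != 64
#error "NETLIB_INDEX_BITS must be 32 or 64"
#endif

#if defined(WeirdNEC) && NETLIB_INDEX_BITS != 64
#error "WeirdNEC conflicts with NETLIB_INDEX_BITS=32"
#endif

/* Nonzero if the resolved header setting differs from the library's ABI. */
int example_index_bits_mismatch(void)
{
    return NETLIB_INDEX_BITS != EXAMPLE_BUILT_INDEX_BITS;
}
```

With no flags at all, the header follows the library, which is the "match the installed libraries by default" behaviour asked for above.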

Each build of LAPACK code results in headers that, at least by default, match the installed libraries without the user defining anything. OK?

Sounds good to me.

I then could slip this in before the upcoming release of pkgsrc (deadline nearing) and we can further discuss the details of that implementation so that I can drop the patches after merging something here, with a new LAPACK release. With this change, the plain Makefile build also needs fixing, but I don't need that for _my_ patches yet when I just use the CMake build.

Ok! We will probably have a LAPACK release in the second semester of 2021. And yes, the Makefile should be adjusted accordingly, and I am willing to help with that.

weslleyspereira on 11 Jun 2021

This is somewhat related. We should not forget that the headers for netlib CBLAS are not only provided by netlib … NumPy always uses its own header:

https://github.com/numpy/numpy/blob/main/numpy/core/src/common/npy_cblas.h

And in this header, it sets CBLAS_INDEX=size_t, different from the integer type used in specifying indices. It is used solely for the return values of some functions:

$ grep CBLAS_INDEX ./numpy/core/src/common/npy_cblas_base.h
CBLAS_INDEX BLASNAME(cblas_isamax)(const BLASINT N, const float  *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_idamax)(const BLASINT N, const double *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_icamax)(const BLASINT N, const void   *X, const BLASINT incX);
CBLAS_INDEX BLASNAME(cblas_izamax)(const BLASINT N, const void   *X, const BLASINT incX);

The difference:

$ grep cblas_isamax ./numpy/core/src/common/npy_cblas_base.h  /data/pkg/include/cblas.h
./numpy/core/src/common/npy_cblas_base.h:CBLAS_INDEX BLASNAME(cblas_isamax)(const BLASINT N, const float  *X, const BLASINT incX);
/data/pkg/include/cblas.h:CBLAS_INDEX cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);

I wonder if that's possibly causing trouble. For Netlib, there is only one type of index, while other implementations use a differing return value type for the index functions. OpenBLAS sets the example. They say isamax returns unsigned size_t, but the C wrapper actually calls a Fortran function that returns a signed integer (Edit: A subroutine that writes a signed 32 or 64 bit integer value to a handed-in reference to an unsigned 64 bit variable on 64 bit systems).

Does the reference implementation have an opinion about this? I _guess_ there is no real trouble, as a size_t value will always be able to hold any non-negative return from isamax(). But it smells iffy. (Edit: You could build with 64 bit indices on a 32 bit system where size_t is 32 bits, right? Then you got overflow. In addition to the uneasiness of casting size_t * to int *.)
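The overflow case from the edit can be made explicit with a range check before converting. A minimal sketch; example_index_fits_size_t is a hypothetical helper, not an existing CBLAS function:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 64-bit index can be represented as size_t on this
 * platform, 0 otherwise (negative, or too large where size_t is 32 bits). */
int example_index_fits_size_t(int64_t idx)
{
    if (idx < 0)
        return 0;
    return (uint64_t)idx <= (uint64_t)SIZE_MAX;
}
```

On a platform with 64-bit size_t every non-negative int64_t passes; where size_t is 32 bits, large indices are rejected instead of being silently truncated.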

Since optimized implementations seem to have decided on size_t there, should the reference accept that fact and follow?

drhpc on 12 Jun 2021

And how dangerous is it, actually, to link numpy with reference cblas?

drhpc on 12 Jun 2021

OpenBLAS sets the example. (...)
Since optimized implementations seem to have decided on size_t there, should the reference accept that fact and follow?

I certainly cannot speak for numpy (or mkl etc. for that matter), but I'd hesitate to claim OpenBLAS to be normative in any form, least of all relative to what is (I believe) generally seen as __the__ reference implementation...

martin-frbg on 12 Jun 2021

Sure. It's just that OpenBLAS or MKL are what people use in practice and both seem to have settled on

#define CBLAS_INDEX size_t  /* this may vary between platforms */
#ifdef MKL_ILP64
#define MKL_INT MKL_INT64
#else
#define MKL_INT int
#endif
CBLAS_INDEX cblas_isamax(const MKL_INT N, const float  *X, const MKL_INT incX);

or similarly

#ifdef OPENBLAS_USE64BITINT
typedef BLASLONG blasint;
#else
typedef int blasint;
#endif
#define CBLAS_INDEX size_t
CBLAS_INDEX cblas_isamax(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);

vs. the reference

#ifdef WeirdNEC
   #define CBLAS_INDEX long
#else
   #define CBLAS_INDEX int
#endif
CBLAS_INDEX cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);

How come that they diverge from the reference here? Was there communication about that? Also … I see MKL and OpenBLAS defining a host of functions that aren't even part of reference CBLAS:

CBLAS_INDEX cblas_isamin(const MKL_INT N, const float  *X, const MKL_INT incX);
CBLAS_INDEX cblas_idamin(const MKL_INT N, const double *X, const MKL_INT incX);
CBLAS_INDEX cblas_icamin(const MKL_INT N, const void   *X, const MKL_INT incX);
CBLAS_INDEX cblas_izamin(const MKL_INT N, const void   *X, const MKL_INT incX);

CBLAS_INDEX cblas_isamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izamin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

CBLAS_INDEX cblas_ismax(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izmax(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

CBLAS_INDEX cblas_ismin(OPENBLAS_CONST blasint n, OPENBLAS_CONST float  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_idmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST double *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_icmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void  *x, OPENBLAS_CONST blasint incx);
CBLAS_INDEX cblas_izmin(OPENBLAS_CONST blasint n, OPENBLAS_CONST void *x, OPENBLAS_CONST blasint incx);

So, extending the standard is one thing, but size_t vs int seems to be a serious issue on 64 bit systems. This should be settled some way. It seems to me that the Netlib way is sensible: Same type as that which is used for indices. As all call Fortran routines like this in the end

c     isamaxsub.f
c
c     The program is a fortran wrapper for isamax.
c     Witten by Keita Teranishi.  2/11/1998
c
      subroutine isamaxsub(n,x,incx,iamax)
c
      external isamax
      integer  isamax,iamax
      integer n,incx
      real x(*)
c
      iamax=isamax(n,x,incx)
      return
      end

… handing in an address of size_t for iamax, that just seems wrong. I didn't find another implementation than this reference one in the OpenBLAS sources. Are they just stupid changing the external type like that or am I overlooking something very basic? Is anyone actually using these functions?

drhpc on 12 Jun 2021

Hi all. Reference BLAS, reference CBLAS, reference LAPACK: two of the main thrusts of these projects are (1) numerical algorithms and (2) defining common interfaces, together with a reference implementation and a test suite that goes with this. I think everyone involved in these projects is happy to look at and learn from other projects (OpenBLAS, MKL, etc.) about software engineering, best practices for deploying the software, etc. We have a lot to learn from these projects. (And we also learn a lot from other numerical linear algebra projects!) Anyhow: reference BLAS, CBLAS, LAPACK can use some improvement in their CMake packaging and interfaces, and if OpenBLAS (e.g.) has a better process that is well suited for us, well, I am all in favor of moving toward this model.

langou on 12 Jun 2021

To add some context, the CBLAS was born from a committee (the Basic Linear Algebra Subprograms Technical Forum) that worked from 1996 to 2000 on revisiting the BLAS. As part of this, they defined a C interface for the BLAS. See:
http://www.netlib.org/blas/blast-forum/
In particular see:
http://www.netlib.org/blas/blast-forum/cinterface.pdf
I believe that the CBLAS offered by LAPACK is an implementation of the interface as defined by the Basic Linear Algebra Subprograms Technical Forum 25 years ago.

If there are suggestions to improve CBLAS, send them along. I can try to pass this to the various stakeholders.

langou on 12 Jun 2021

Thanks for the pointer. So the relevant part seems to be B.2.2 in that spec, which says that BLAS_INDEX usually is size_t, but might also be chosen to be identical to the (signed) Fortran integer type used for indexing. It's up to the implementation.

So it seems that popular optimized implementations chose size_t and the Netlib reference chose the same integer it uses for Fortran. I see copies of cblas.h all around in various projects that use the lib (like numpy, shipping a header for an external lib), with that line

#define CBLAS_INDEX size_t  /* this may vary between platforms */

In https://github.com/LuaDist/gsl/blob/master/cblas/gsl_cblas.h, this is accompanied by

/* This is a copy of the CBLAS standard header.
 * We carry this around so we do not have to
 * break our model for flexible BLAS functionality.
 */

This sounds like this originated in the reference implementation, but has since changed? Looking at 41779680d1f233928b67f5f66c0b239aecb42774 … I see that the CBLAS_INDEX switch with WeirdNEC has been there before the 64 bit build. Wow, is this commit recent. Now I see that size_t was in the reference cblas.h until 2015, 83fc0b48afd1f9a6d6f8dddb16e69ed7ed0e7242 having changed it and introduced the WeirdNEC define. I did not imagine that this is so recent! Wildly confusing.

I also see that the earlier version of cblas.h handed an int to the fortran call, now CBLAS_INDEX. This seems to be correct now, with consistent usage of CBLAS_INDEX as integer type and the switch for 32 or 64 bits in the Fortran part.

But could it be that the optimized libraries that inherited an old version of cblas.h with size_t but sync sources with current CBLAS code from the reference have a nice bug going? Aren't they doing something like this for the 32 bit case on a 64 bit system?

#include <stdio.h>
#include <stdlib.h>


void ia_max(int a, void *b)
{
    int *ib = (int*)b;
    *ib = a*2;
}


int main(int argc, char **argv)
{
    int i = atoi(argv[1]);
    size_t maxi;
    ia_max(i, &maxi);
    printf("in %d out %ld\n", i, (long)maxi);
    return 0;
}

This results in

$ gcc -O -o t t.c
$ ./t 42
in 42 out 140724603453524

Initializing the size_t value to zero helps, but probably only in the little-endian case. Does nobody get into trouble for this? I have to be missing something.
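A way to avoid depending on initialization or endianness is to let the callee write into a variable of exactly the type it expects and convert the value afterwards. A minimal sketch, where example_iamax_sub is a hypothetical stand-in for the Fortran-style wrapper (it mirrors ia_max from the test program above) and example_iamax is the made-up C entry point:

```c
#include <stddef.h>

/* Stand-in for a Fortran-style wrapper that stores a default integer
 * through a pointer (hypothetical, mirrors ia_max above). */
static void example_iamax_sub(int a, int *result)
{
    *result = a * 2;
}

/* Receives the result in a correctly typed temporary and only then
 * converts the value, so no bytes of the size_t are left unwritten. */
size_t example_iamax(int a)
{
    int tmp = 0;
    example_iamax_sub(a, &tmp);
    return tmp < 0 ? 0 : (size_t)tmp;
}
```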

To conclude:

- Reference CBLAS had size_t as the return value first.
- It used int in the actual call to Fortran, though.
- Downstream (optimized BLAS, CBLAS users) runs with the old version of the header.
- Reference CBLAS introduces the WeirdNEC hack for a specific system, replacing size_t with int or long (matching the Fortran side?!)
- The 64 bit Reference CBLAS interface gets built on top of that, using CBLAS_INDEX everywhere for the Fortran default integer.
- Downstreams did their own thing with 64 bit support, but separating it from CBLAS_INDEX, which is always size_t.
- Downstreams inherit the CBLAS wrappers that employ CBLAS_INDEX to call Fortran that expects default integer.

This sounds like a wonderful breakage as a result. Headers and code diverged. How come nobody noticed issues yet? Or did I miss the part where the reference CBLAS wrapper code for isamax and friends is not actually used?

drhpc on 13 Jun 2021

OpenBLAS at least does not use the CBLAS wrapper code from Reference-LAPACK (and never did, the source is there but does not get built)

martin-frbg on 13 Jun 2021

@martin-frbg Good to know. Can you point out a code path for, say, x86-64 that shows how the size_t is passed around to the actual computation for cblas_isamax()? I found some specific kernel implementation but am not sure about the general case.

Would be good to know that nobody actually passes a (size_t*) to the Fortran interface.

For sure it's not good that projects just assume

size_t cblas_isamax(…)

when the actual library might offer an int or long (or int64_t) as return value. Might work most of the time with values in 64 bit registers, but it's not nice. Can we rectify this in the implementations? People did not pick up on the example of Netlib in the past 5 years about consistent use of CBLAS_INDEX.

drhpc on 13 Jun 2021

relevant code is in OpenBLAS/interface, e.g. interface/imax.c gets compiled to cblas_isamax() when CBLAS is defined, no Fortran code involved in its call graph.

martin-frbg on 13 Jun 2021

Ah, good. So the one case which is actually problematic is depending projects using a copy of cblas.h that does not fit the library.

I don't find actual usage of cblas_isamax() and friends in NumPy (and SciPy), so this might be only a theoretical issue. It should be fixed nonetheless. So:

1. Others follow the Netlib example of using int32_t/int64_t (let's be explicit while at that;-) BLAS_INDEX for both size returns and index arguments.
2. Netlib caves in and reverts to size_t for those returns like others.

Is this a separate issue to discuss? It does relate to the choice of 32 or 64 bit library, though.

PS: I am still unsure if enums in the API are a good idea (as actual data type for function arguments and struct members), as there are compiler options to change what integer is used underneath them. Not that relevant in practice, but makes me uneasy nevertheless.

drhpc on 13 Jun 2021

The more I think of this, the more I lean towards Option 2: We had size_t in the API for a very long time. Then Netlib changed that size_t to int or long. Regardless of what better matches the Fortran code or might be more consistent, the size_t was established API and Netlib reference broke that.

Should I open a PR about changing things for

size_t cblas_isamax(const CBLAS_INDEX N, const float  *X, const CBLAS_INDEX incX);
size_t cblas_idamax(const CBLAS_INDEX N, const double *X, const CBLAS_INDEX incX);
size_t cblas_icamax(const CBLAS_INDEX N, const void   *X, const CBLAS_INDEX incX);
size_t cblas_izamax(const CBLAS_INDEX N, const void   *X, const CBLAS_INDEX incX);

again? There should be no macro at this position anymore to emphasize that it's always size_t, everywhere, past and future.
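
To make the return type concrete, here is a minimal sketch of an i*amax-style routine that returns size_t. The names sketch_int and sketch_isamax are hypothetical and not part of any library. The 0-based result, the return value 0 for empty input, and the lack of special NaN handling are simplifying assumptions of this sketch, not a statement about the reference implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the integer type of the size/stride arguments. */
typedef int32_t sketch_int;

/* Sketch only: returns the 0-based position of the first element with the
 * largest absolute value among x[0], x[incx], ..., x[(n-1)*incx].
 * Returns 0 when n < 1 or incx < 1 (an assumption of this sketch). */
size_t sketch_isamax(sketch_int n, const float *x, sketch_int incx)
{
    if (n < 1 || incx < 1)
        return 0;

    size_t best = 0;
    float best_abs = x[0] < 0 ? -x[0] : x[0];

    for (size_t i = 1; i < (size_t)n; ++i) {
        float v = x[i * (size_t)incx];
        float a = v < 0 ? -v : v;
        if (a > best_abs) {   /* strict comparison keeps the first maximum */
            best_abs = a;
            best = i;
        }
    }
    return best;
}
```

The result can be used directly as an array index or compared against other size_t quantities without a signed/unsigned conversion.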

drhpc on 14 Jun 2021

In https://github.com/numpy/numpy/issues/19243 we now basically got down to: „Screw Netlib, size_t works for everyone else“.

drhpc on 18 Jun 2021

There are three reasons to use size_t:

All of the C and C++ standard library functions accept and return this value, e.g., void* malloc(size_t), size_t strlen(), or std::size_t std::vector<T>::size() (C++). Using size_t avoids truncating values and signed/unsigned conversions.
size_t is often used to express quantities that cannot be negative, e.g., matrix dimensions.
The C and C++ standards guarantee that you can store the size of any array in a size_t and that you can index all elements with size_t, cf. cppreference.com: size_t.
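
As an illustration of the first two points, the following sketch allocates storage for a dense matrix whose dimensions are size_t. The function name alloc_matrix is hypothetical. Because the dimensions and the argument of malloc() share one unsigned type, only a single overflow check on the product is needed and no signed/unsigned conversion takes place.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: allocate rows * cols doubles, or return NULL if the byte
 * count would not fit into size_t. The caller releases the memory with free(). */
double *alloc_matrix(size_t rows, size_t cols)
{
    /* rows * cols * sizeof(double) must not exceed SIZE_MAX. */
    if (cols != 0 && rows > SIZE_MAX / sizeof(double) / cols)
        return NULL;

    return malloc(rows * cols * sizeof(double));
}
```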

Edit: You could build with 64 bit indices on a 32 bit system where size_t is 32 bits, right? Then you got overflow.

No because a 32-bit system may have more than 4 GB of virtual memory (Linux supports this) but a single 32-bit process can never access more than 4 GB. That is, the upper 32-bit of the 64-bit indices are never used.

_Memory limit to a 32-bit process running on a 64-bit Linux OS_

christoph-conrads on 20 Jun 2021

I'm also thinking that keeping size_t is the correct thing to do, because changing from it was the ABI break and brought Netlib out of sync with the rest of the world.

But I feel compelled to nitpick on your arguments;-)

1. All of the C and C++ standard library functions accept and return this value

When I researched this, I stumbled upon the admission that it was a historic error to use an unsigned type for C++ container indices, and probably even the return type of the size() method, as you quickly end up mixing signed and unsigned numbers in some way. The current state of Netlib would be consistent with itself, always using signed types for size and indices, but of course inconsistent with malloc(), which has a requirement for unsigned size to actually be able to address all memory that fits into 32 bits (or 64 bits, in theory).

I am wondering myself right about that in code I wrote where I eventually hand an index as offset to a function call. The index is unsigned, the offset signed. Apart from compilers (MSVC) being confused by -unsigned_value, this would mean that I always have to worry about possible overflow in conversion.

But anyway, if it's just about computing memory sizes to hand to malloc() and friends, size_t is the natural thing, and it has been there before in CBLAS.

On possible issues with the current state of the code, mismatch with vendored cblas.h in builds:

No because a 32-bit system may have more than 4 GB of virtual memory (Linux supports this) but a single 32-bit process can never access more than 4 GB. That is, the upper 32-bit of the 64-bit indices are never used.

Right, size_t stays 32 bit. When you (silly as it might be) built cblas_isamax() to return a 64 bit integer, after hacking the build to not use long, but int64_t, of course, what will really happen in such usage?

size_t cblas_isamax(); // really int64_t cblas_isamax()!
size_t value = cblas_isamax(…);

The x86 calling convention might put the 64 bit value into EAX and EDX. Or it might work with a pointer return and some buffer. But what would other architectures do? So you might not get corruption, but for sure a wrong value. The best case is the higher 32 bits being ignored.

Now imagine a big-endian 32 bit system (some form of ARM) … sure you'd even get the desired half of the value returned?

You couldn't really work with non-sparse data that needs 64 bit indices in the 32 bit program, sure. But just being able to make a non-matching function call that _at_least_ gives wrong results seems unhealthy.

I did some quick testing … on x86 Linux (gcc -m32 on an x86-64 system), you do just drop the upper 32 bits.

The more interesting case … 64 bit size_t:

size_t cblas_isamax(); // really int32_t cblas_isamax()!
size_t value = cblas_isamax(…);

Again, on x86-64, the peculiar relation between 64 bit RAX and 32 bit EAX makes things somewhat well-defined to also just silently zero the upper 32 bits once you do a 32 bit operation on the shared register. But there is fun to have with a slightly weird function definition:

$ cat ret32.c 
#include <stdint.h>

int32_t ret64(int64_t a)
{
    a += 1LL<<32;
    return a;
}
$ gcc -m64  -g -c -o ret32.o ret32.c 
$ LANG=C objdump -S ret32.o 
[…]
   8:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
    a += 1LL<<32;
   c:   48 b8 00 00 00 00 01    movabs $0x100000000,%rax
  13:   00 00 00 
  16:   48 01 45 f8             add    %rax,-0x8(%rbp)
    return a;
  1a:   48 8b 45 f8             mov    -0x8(%rbp),%rax

You could argue if it is smart for the compiler to work on the full 64 bit register and leave the upper 32 bits uncleared for a function that is expected to return a 32 bit value, but it is perfectly legal if you rely on the caller only using the lower 32 bits, I guess.

$ cat call.c 
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

INDEX ret64(int64_t);

int main(int argc, char **argv)
{
    if(argc < 2)
        return 1;
    int64_t a = (int64_t)strtoll(argv[1], NULL, 10);
    INDEX  s = ret64(a);
    printf("%lld\n", (long long)s);
    return 0;
}
$ gcc -m64 -g -DINDEX=int32_t -c -o call32_64.o call.c
$ gcc -m64 -g -DINDEX=size_t -c -o call64_64.o call.c
$ ./call32_64 1
1
$ ./call64_64 1
4294967297

Fun. A 32 bit return value that gives more than what's possible in 32 bits. This is what can happen (in principle) with the current state of Netlib CBLAS being linked with code that expects size_t. I guess though that the upper 32 bits of RAX will be zero in the actual code in practice. But who knows … compiler expects caller not to use more than the lower 32 bits on any platform … might as well store garbage there.

So … are we agreeing on moving Netlib back to size_t as return value?

drhpc on 21 Jun 2021

Thanks for all these valuable comments!

I discussed this topic a bit with @langou. Based on the discussion here, my proposal is:

In a separate PR:

We move back to a cblas.h that uses two integer definitions, let's say CBLAS_INDEX and CBLAS_INT. That is what happens in MKL (CBLAS_INDEX and MKL_INT) and OpenBLAS (CBLAS_INDEX and blasint). CBLAS_INDEX will be used only in the return of i*amax. With that, we restore an ABI that is compatible with other BLAS.
Additionally, we choose the default value of CBLAS_INDEX to be size_t and collect opinions from the community.
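
A minimal sketch of what such a header fragment could look like, assuming the two names CBLAS_INDEX and CBLAS_INT from the proposal. The build switch SKETCH_ILP64 is a hypothetical name used only for this illustration, and the choice of int32_t/int64_t for CBLAS_INT is likewise an assumption, not the content of any actual PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Return type of the i*amax functions: size_t by default, as proposed. */
#ifndef CBLAS_INDEX
#define CBLAS_INDEX size_t
#endif

/* Integer type for sizes and strides passed into the library.
 * SKETCH_ILP64 is a hypothetical switch selecting 64 bit integers. */
#ifndef CBLAS_INT
#ifdef SKETCH_ILP64
#define CBLAS_INT int64_t
#else
#define CBLAS_INT int32_t
#endif
#endif

/* CBLAS_INDEX appears only as the return type, CBLAS_INT everywhere else. */
CBLAS_INDEX cblas_isamax(const CBLAS_INT N, const float *X, const CBLAS_INT incX);
```

With this split, switching between 32 and 64 bit integer arguments does not touch the return type of i*amax.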

I think this is aligned with (or maybe the same as) the idea behind the recent discussions in this thread.
As @drhpc pointed out,
https://github.com/Reference-LAPACK/lapack/commit/83fc0b48afd1f9a6d6f8dddb16e69ed7ed0e7242 changed the default value of CBLAS_INDEX, and
https://github.com/Reference-LAPACK/lapack/commit/41779680d1f233928b67f5f66c0b239aecb42774 changed the use of CBLAS_INDEX.

Just to reinforce:

OpenBLAS, MKL, GNU Scientific Library, and NumPy all use size_t by default.
The C interface for BLAS (https://www.netlib.org/blas/blast-forum/cinterface.pdf) indicates that, usually, CBLAS_INDEX = size_t.

Do you agree? If you do, I can open the PR. Or maybe @drhpc would like to do that.

weslleyspereira on 23 Jun 2021

👍1

I agree. And please just go ahead with the PR.

drhpc on 23 Jun 2021

👍1

@mgates3 mentioned to me the discussion on the Slate Google Group:
https://groups.google.com/a/icl.utk.edu/g/slate-user/c/f5y6gt0aoLs/m/oQyyhikwCgAJ
The discussion is not about what "CBLAS_INDEX" should be, but more about what "CBLAS_INT" should be. Should CBLAS_INT be size_t, a signed integer, or something else? I think participants are making good points so I am passing it along.

langou on 23 Jun 2021

Please, see #588.

weslleyspereira on 23 Jun 2021

Right, size_t stays 32 bit. When you (silly as it might be) built cblas_isamax() to return a 64 bit integer, after hacking the build to not use long, but int64_t, of course, what will really happen in such usage?
size_t cblas_isamax(); // really int64_t cblas_isamax()!
size_t value = cblas_isamax(…);
The x86 calling convention might put the 64 bit value into EAX and EDX. Or it might work with a pointer return and some buffer. But what would other architectures do? So you might not get corruption, but for sure a wrong value. The best case is the higher 32 bits being ignored.

Now imagine a big-endian 32 bit system (some form of ARM) … sure you'd even get the desired half of the value returned?

This is game over. On 32-bit Arm CPUs, four 32-bit values can be passed and returned in registers, and 64-bit values occupy two consecutive registers, see Section 6.1.1.1 in _Procedure Call Standard for the Arm Architecture_. Instead of writing to one register, the callee will clobber two registers with its 64-bit integer; this is obviously a problem. As soon as the caller runs out of registers for parameters, the stack is used. The stack alignment is 32 bits, but instead of reading or writing 32 bits, the callee writes 64 bits; again, this is game over, and this problem (a mismatch of stack read/write sizes) should cause problems on all instruction set architectures at some point.

Arm ABI documentation

christoph-conrads on 27 Jun 2021

I am wondering myself right about that in code I wrote where I eventually hand an index as offset to a function call. The index is unsigned, the offset signed. Apart from compilers (MSVC) being confused by -unsigned_value, this would mean that I always have to worry about possible overflow in conversion.

No, the standard committees behind C and C++ make your code behave the obvious way in this case: If u is an unsigned value and s is a signed value, where u has at least as many bits as s, then u + s will yield the mathematically correct result unless u + s over- or underflows. If it under-/overflows, the result will wrap around, i.e., (u + s) mod 2^b, where b is the number of bits in u. On the other hand, if the signed type can represent all values of the unsigned type, then the unsigned value will be converted to the signed type.

The relevant clauses in the C11 standard draft are the following:

6.2.5.9: Binary operations with only unsigned operands cannot overflow; the result is taken modulo MAX + 1, where MAX is the largest representable value.
6.3.1.3: Given a signed value s, it is converted to the unsigned value s if s >= 0, otherwise it is converted to s + MAX + 1.
6.3.1.8: Signed and unsigned operands [of the same size] are converted to unsigned; an unsigned operand is converted to a signed type if the signed type can represent all values of the unsigned type

Hence, with M denoting MAX from above, u + s (C syntax) will be evaluated to

(u + s) mod (M + 1) if s >= 0,
(u + s + M + 1) mod (M + 1) otherwise.

In the absence of over- or underflow, this expression will evaluate to u + s which is the intuitively desired outcome.
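
The rules above can be sketched in a few lines of C. The function name apply_offset is hypothetical, and the sketch assumes the common case that ptrdiff_t is no wider than size_t. The explicit cast spells out the conversion that the usual arithmetic conversions would perform anyway in that case.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: add a signed offset to an unsigned index.
 * A negative offset is converted to offset + SIZE_MAX + 1, and the sum is
 * then reduced modulo SIZE_MAX + 1, which yields index + offset whenever
 * the mathematical result lies in [0, SIZE_MAX]. */
size_t apply_offset(size_t index, ptrdiff_t offset)
{
    return index + (size_t)offset;
}
```

For example, apply_offset(10, -3) gives 7, while apply_offset(0, -1) wraps around to SIZE_MAX instead of invoking undefined behavior.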

christoph-conrads on 27 Jun 2021

When I researched this, I stumbled upon the admission that it was a historic error to use an unsigned type for C++ container indices, and probably even the return type of the size() method, as you quickly end up mixing signed and unsigned numbers in some way.

There are some C++ programmers (including the inventor of C++) who are proposing to use signed integers everywhere, see the C++ Core Guidelines, but I would not call this an admission. The problems with the "signed integers everywhere" policy are

checking for minimum values: with an unsigned integer it is in many cases superfluous to check the minimum value, with a signed integer it is obligatory; this is error prone and can cause security problems, see for example CWE-839 _Numeric Range Comparison Without Minimum Check_.
overflows: an unsigned overflow has a well-defined result whereas a signed integer overflow constitutes undefined behavior.

You can attempt to check for a signed overflow with the expression a + b < a but the compiler might optimize it away without warning, see for example GCC bug 30475 _assert(int+100 > int) optimized away_ from 2007. This would work with unsigned integers though (a unsigned, b possibly signed and b having at most as many bits as a). Seeing the article _OptOut – Compiler Undefined Behavior Optimizations_ from 2020, GCC behavior has apparently not changed.
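
A minimal sketch of the difference, using hypothetical helper names. For unsigned operands the comparison after the addition is valid because wraparound is well defined. For signed operands the check has to happen before the addition, since the overflowing addition itself would be undefined behavior.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned: the sum wraps modulo UINT_MAX + 1, so a wrapped result is
 * smaller than either operand and the post-hoc comparison is reliable. */
bool uadd_overflows(unsigned a, unsigned b)
{
    return a + b < a;
}

/* Signed: compare against the limits first, without ever computing a + b
 * when it would leave the range [INT_MIN, INT_MAX]. */
bool sadd_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```

Neither helper relies on the compiler preserving an overflowing signed addition, so an optimizer has no undefined behavior to exploit here.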

christoph-conrads on 27 Jun 2021
