Lapack: Loss of precision in all single precision workspace queries, implicit and explicit

Created on 19 Jul 2021  ·  6 Comments  ·  Source: Reference-LAPACK/lapack

Hello everybody,

recently I stumbled across an error when using ssyevd via the lapacke interface.
It points to a problem in the lapack interface in general. It goes like this:

According to the standard lapack interface, many routines like ssyevd have to be called twice:
Once for asking the routine how much scratch memory it needs for a certain matrix size,
and only then call the routine in earnest with the required scratch memory sections
as parameters.
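
For concreteness, here is a minimal sketch of this two-call pattern through the lapacke _work interface (C, column-major, jobz='V', uplo='U'; error handling omitted). The cast back to an integer is exactly the round trip discussed below.

    #include <stdlib.h>
    #include <lapacke.h>

    void ssyevd_two_call( lapack_int n, float *a, lapack_int lda, float *w )
    {
        float      wkopt;        /* the required lwork comes back in this float */
        lapack_int iwkopt;
        /* 1st call: lwork = liwork = -1 requests only the workspace sizes. */
        LAPACKE_ssyevd_work( LAPACK_COL_MAJOR, 'V', 'U', n, a, lda, w,
                             &wkopt, -1, &iwkopt, -1 );
        lapack_int lwork  = (lapack_int) wkopt;   /* float -> integer round trip */
        lapack_int liwork = iwkopt;
        float      *work  = malloc( sizeof(*work)  * (size_t) lwork  );
        lapack_int *iwork = malloc( sizeof(*iwork) * (size_t) liwork );
        /* 2nd call: the actual computation with the allocated scratch memory. */
        LAPACKE_ssyevd_work( LAPACK_COL_MAJOR, 'V', 'U', n, a, lda, w,
                             work, lwork, iwork, liwork );
        free( work );
        free( iwork );
    }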

If you look closely at this first call, which should return the required memory size, e.g.
for a routine like ssyevd, you see that, even according to the lapack documentation,
the memory requirement is passed back as a float value (the first element of the work array).
So when calculating the memory it goes through a series of values:

calculate memory       ->    store value in reference   ->  retrieve the value for use (allocation)
    int64                            float                              int64

It is int64 in the case of an ilp64 interface; otherwise it would be int32.
So in essence, we have an intermediary shortening of the memory value from
63 bits to 24 bits!
(or, more precisely, to 24 significant bits in the IEEE 754 float representation)
Even in the case of 32-bit integers, you have a shortening of 31 bits to 24 bits.
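
To make the loss concrete, here is a tiny self-contained C example of that round trip. The workspace size is the documented ssyevd requirement lwork >= 1 + 6*n + 2*n^2 for jobz='V'; the exact numbers assume an ilp64 build and the default IEEE 754 round-to-nearest mode. Depending on the value the rounding can also go up (over-allocating a little), but when it goes down, as here, the second call rejects the workspace as too small.

    #include <stdio.h>
    #include <stdint.h>

    int main( void )
    {
        int64_t n      = 46341;
        int64_t needed = 1 + 6*n + 2*n*n;     /* 4295254609, needs ilp64     */
        float   work1  = (float) needed;      /* what WORK(1) can represent  */
        int64_t alloc  = (int64_t) work1;     /* what the caller allocates   */
        printf( "needed %lld, allocated %lld, short by %lld\n",
                (long long) needed, (long long) alloc,
                (long long) ( needed - alloc ) );   /* short by 81 elements  */
        return 0;
    }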

So, if you call the memory requirement calculation 'by hand' as in the past,
you might have a chance to see where it goes wrong, but you still cannot prevent the shortening
to a float value. If you use the automatic memory allocation via the modern lapacke interface,
you do not even get a hint of what could be wrong, as the routine advertises that it takes care
of all memory management by itself!

This happens in all single precision routines (s/c) in lapack that calculate
the memory requirement as an intermediary step.
Switching to double precision would use a double precision reference instead,
increasing the intermediary value to 53 bits, which is still short of the 64 bits
one would assume with a 64-bit interface.

Workarounds, four possible ways:

  1. If you want to use single or complex lapack routines, do not use the automatic memory allocation via the C lapacke interface
  2. If you use the two-call lapack function method, use the double(!) routine for the memory calculation (see the sketch after this list)
  3. Have a look at the reference implementation of the lapack routine and calculate the required memory on your own
  4. Only use small matrix sizes when using float/complex matrices
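
As an illustration of workaround 2, here is a hedged C sketch (the helper name is made up; it assumes that the single and double precision routines report the same workspace size, which holds for the reference implementation's formulas but is not guaranteed for every optimized library):

    #include <lapacke.h>

    /* Query dsyevd (double, 53-bit significand) for the workspace size and
       use the result for the single precision ssyevd call. */
    lapack_int ssyevd_lwork_via_double( lapack_int n )
    {
        double     dummy = 0.0, wkopt;
        lapack_int iwkopt, info;
        /* lwork = liwork = -1: pure size query; the matrix is not referenced
           by the reference implementation. */
        info = LAPACKE_dsyevd_work( LAPACK_COL_MAJOR, 'V', 'U', n,
                                    &dummy, n, &dummy, &wkopt, -1,
                                    &iwkopt, -1 );
        return ( info == 0 ) ? (lapack_int) wkopt : -1;
    }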

Other people have stumbled upon this, but didn't follow it through to the real cause, e.g.
"openBLAS build with int64 support fails on valid input for ssyevd"

One has to stress that, at least with the two-call method, this is not a bug but a design flaw.
In case of the lapacke automatic memory allocation, it has to be considered a fairly severe bug.

Regards,

oxydicer

Bug

All 6 comments

I guess it must have made sense at the time (to return the size via the work array pointer), but I wonder what keeps us from turning the size specifier into an in/out variable and passing back the exact value there as well? A "modern" caller would then check that first and resort to the work array member only if lwork was still -1; an "old" caller would notice no change.

Also, back then an LWORK large enough to hit this problem was probably physically out of reach, and ilp64 made it obvious. I'm dreaming of the day when the NB value is derived at runtime instead of via the two-call scheme.

Here are two discussion threads related to this:
https://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=1418
http://icl.cs.utk.edu/lapack-forum/archives/lapack/msg00827.html

This is especially an issue for algorithms that require an O(n^2) workspace, where the size exceeds the 24-bit float significand (2^24 ≈ 1.7e7) already for n of a few thousand. For algorithms that require an O(n*nb) workspace, this is less of an issue.

Yes, this is a design flaw.

@martin-frbg: how would your proposed change work for the C _work interface? We have LWORK as INPUT only there, and changing LWORK to INPUT/OUTPUT is a major change. Do you have an idea how to solve this issue? See:
https://github.com/Reference-LAPACK/lapack/blob/aa631b4b4bd13f6ae2dbab9ae9da209e1e05b0fc/LAPACKE/src/lapacke_dgeqrf_work.c#L35

I was thinking that we could also create workspace query subroutines such as LAPACK_dgeqrf__workspace_query( ), which would return the workspace needed.

Welp, my cunning plan does not really work when sober...
But these are actually two problems, I think: one is the work size overflowing a lapack_int, and the other is "only" a misrepresentation due to limited precision. I wonder if it would be possible to round up the calculated size to anticipate the latter, at the expense of "some" unused memory?

Yes, this is a design flaw.

I can see 2 different flaws here:

  1. In LAPACK: routines return the work size using a real variable.
  2. In LAPACKE: routines propagate that flaw from LAPACK.

@martin-frbg's idea is a good solution to (1). New Fortran code could use the value returned in LWORK instead of WORK(1). We can try modifying the code with some (semi-)automatic procedure for the replacement. In ssyevd.f, for instance, we could replace

  ELSE IF( LQUERY ) THEN
     RETURN
  END IF

by

  ELSE IF( LQUERY ) THEN
     LWORK = LOPT
     RETURN
  END IF

Adding LAPACKE_dgeqrf__work_query(), as @langou suggests, solves (2), although there is a lot of work associated with this modification.

Changing lwork to be IN/OUT would be a nice solution originally, but it is not backwards compatible. The application would then have to know whether the LAPACK version was <= 3.10 (say) or > 3.10 to know where to get the lwork. Worse, there are instances where applications pass in a const value — expecting it to remain const — so LAPACK changing its behavior to overwrite that value would be very detrimental (UB). For instance, in MAGMA:

    const magma_int_t ineg_one = -1;
    ...
            magma_int_t query_magma, query_lapack;
            magma_zgesdd( *jobz, M, N,
                          unused, lda, runused,
                          unused, ldu,
                          unused, ldv,
                          dummy, ineg_one,  // overwriting ineg_one would break MAGMA
                          #ifdef COMPLEX
                          runused,
                          #endif
                          iunused, &info );
            assert( info == 0 );
            query_magma = (magma_int_t) MAGMA_Z_REAL( dummy[0] );

The solution I proposed some years ago and implemented in MAGMA is simply in sgesdd, etc., to round the lwork returned in work[1] up a little bit, so the returned value is always >= the intended value. See https://bitbucket.org/icl/magma/src/master/control/magma_zauxiliary.cpp and use in https://bitbucket.org/icl/magma/src/master/src/zgesdd.cpp. (See a release for the generated single-precision version.) Basically replace

    WORK( 1 ) = MAXWRK

with

    WORK( 1 ) = lapack_roundup_lwork( MAXWRK )

where the function lapack_roundup_lwork rounds it up slightly, as magma_*make_lwork does. In MAGMA, I rounded up by multiplying by (1 + eps), using single-precision eps but doing the calculation in double. Then existing applications will behave correctly without any need to change their workspace queries.

After more testing, I found that for lwork > 2^54 it needs to use the C/C++/Fortran definition of epsilon = 1.19e-07 (aka ulp), rather than the LAPACK definition slamch("eps") = 5.96e-08 (aka unit roundoff, u). If using ulp, it looks like the calculation can be done in single precision.
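
For reference, a minimal C sketch of such a roundup helper, following the MAGMA approach described above (the function name is the proposed one; the actual fix would live in the Fortran sources and, per the note above, use the C/Fortran epsilon):

    #include <float.h>
    #include <stdint.h>

    /* Scale lwork by (1 + ulp) so that, after rounding to float and
       truncating back to an integer, the result is never below lwork.
       FLT_EPSILON is the C single precision epsilon 1.19e-07 (ulp). */
    float lapack_roundup_lwork( int64_t lwork )
    {
        return (float) ( (double) lwork * ( 1.0 + (double) FLT_EPSILON ) );
    }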
