PyTorch: Integrating complex tensors

Created on 16 Feb 2017  ·  128 Comments  ·  Source: pytorch/pytorch

New description from @ezyang:

Work is in progress at https://github.com/Roger-luo/pytorch-complex

Organizational principles

  • Complex tensor support is important to PyTorch, and we will accept patches to core which add small amounts of code to enable complex support.
  • Adding complex involves writing a lot of new kernels and code: we'd like this code to initially live out of repo, so it is easier for people to iterate quickly on them without having to go through the PyTorch main code review process. We will NOT commit to reviewing large new kernels in the short term, but eventually we would like all the kernels to come back to PyTorch.
  • The external library will be buildable separately from PyTorch, so you will be able to maintain it as a separate repository without having to merge with PyTorch (and deal with loads of merge conflicts).

    • PyTorch may occasionally make breaking changes in C++ API; if you bring these to our attention we will do our utmost to help solve these problems.

  • The hooks needed for this will NOT ship with PyTorch 1.0, but they will ship with a released version of PyTorch in the not too distant future.

How will I work on complex kernels?

Here is what the workflow will look like in the steady state.

PyTorch will natively contain APIs for referring to the complex dtypes, but they won't do anything by default. PyTorch defines torch.complex64 and torch.complex128 as the dtypes of complex tensors. However, if you try to construct a tensor with one of these dtypes, by default PyTorch will error:

>>> torch.zeros(2, 2, dtype=torch.complex64)
RuntimeError: complex64 not supported by PyTorch

@ezyang provided a patch which adds these dtypes to PyTorch. https://github.com/pytorch/pytorch/pull/11173

In the mid-term, we will merge support for basic functionality (like allocating a tensor of zeros) to be supported by PyTorch natively. A reasonable proxy for what support is “basic” is PyTorch's native support for CPU half tensors (which are extremely impoverished).

PyTorch publishes an interface for registering an implementation of complex tensors. The implementation inherits from the TypeDefault class (https://github.com/pytorch/pytorch/pull/11013) and will override methods on this class to define implementations of functions for which we have complex implementations. It will look something like this:

struct CPUComplexFloatType final : public TypeDefault {
  virtual Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1) const override {
    // Your implementation of add for complex tensors
  }
  // ...
};

This class will override exactly the operations which are supported for complex; all other implementations are provided by TypeDefault and will error by default.

There will be a canonical listing of methods supported on Type (the overall interface) as an autogenerated file that is checked into the PyTorch source repository; we'll communicate API changes by diffs to this file. In general, the methods are in one-to-one correspondence with their corresponding names in the PyTorch frontend.

In general, when you use an operation which you haven't implemented yet, you will get an error saying that the operation is not supported for complex tensors.

WARNING: We intend to refactor Type away into a new system that also supports open registration of new operations (this obviously doesn't work if you have a single superclass that defines all the methods you might possibly want to support). Thus, try not to get too tied to the particular implementation strategy of writing Type as a subclass.

To publish new, complex-only operations, you will use the C++ extension API. The C++ extension API is documented at https://pytorch.org/tutorials/advanced/cpp_extension.html. Essentially, you can write a C++ function like:

at::Tensor imag(at::Tensor z) {
  ...
}

And then the C++ extension API will generate a Python binding so that you can invoke this function from Python.
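Following the tutorial's pattern, a minimal sketch of such an extension might look like this (the body of imag is only a placeholder; the point is the binding boilerplate):

#include <torch/extension.h>

// Hypothetical complex-only op: return the imaginary part of a complex tensor.
// The body is a placeholder; the binding below follows the documented extension pattern.
at::Tensor imag(at::Tensor z) {
  // ... compute and return the imaginary part of z ...
  return z;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  // Exposes `imag` to Python once the extension is built (e.g. via setuptools or JIT compilation).
  m.def("imag", &imag, "imaginary part of a complex tensor");
}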

Some operations will be “easy” to integrate into PyTorch as it exists today. For example, for the implementation of binary operations, it probably makes more sense to extend add_kernel in BinaryOpsKernel.cpp so that it dispatches over complex types (and then you get the operation for free, because std::complex implements addition). As long as these patches are small and self-contained, we promise to merge them on a timely basis.
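As a rough sketch (not the literal kernel in the tree; the exact dispatch macro and helper names may differ between revisions), the idea is that once the dispatch covers the complex dtypes, the lambda body needs no changes:

#include <ATen/ATen.h>
#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

namespace at { namespace native {

// Sketch only: with a dispatch macro that includes complex64/complex128,
// std::complex's operator+ and operator* supply the per-element arithmetic.
static void add_kernel_sketch(TensorIterator& iter, Scalar alpha_scalar) {
  AT_DISPATCH_ALL_TYPES_AND_COMPLEX(iter.dtype(), "add_cpu", [&]() {
    auto alpha = alpha_scalar.to<scalar_t>();
    cpu_kernel(iter, [=](scalar_t a, scalar_t b) -> scalar_t {
      return a + alpha * b;
    });
  });
}

}} // namespace at::native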

It should ALWAYS be possible to unblock yourself by writing an override on Type instead of using existing infrastructure, and doing liberal copy-pasting. But let's avoid that when the existing route is easy!

Autograd. As long as you're working on operations which already have derivative formulas defined for them, you will “automatically” get autograd support, as long as you implement complex support for all the constituent functions which are invoked in the backwards implementation from derivatives.yaml.

In some cases, we may need to adjust autograd formulas so that they work for complex numbers; e.g., the gradient of 'abs' isn't 'grad . self.sign()'. In these cases, all we need to do is upstream a fix that changes the autograd formula of 'abs' to call 'abs_backward', a function that can be overridden.

For general complex valued back propagation, there are some references:

  1. Akira’s “Complex Valued Neural Networks”.
  2. https://giggleliu.github.io/2018/02/01/complex_bp.html

Generally, we won't need to modify autograd, since in most cases we only calculate the derivatives of a real-valued function (the loss).

Work plan

Many of the necessary pieces are in place today, but they are not put together in an end-to-end way. Here is what needs to be done.

Short term integration plan. These operations are “easy” to implement, and so we should mainline them in PyTorch as soon as possible.

  • [X] Basic tensor factories: torch.empty, torch.zeros, torch.ones
  • [ ] CPU binary operations: add, sub, mul, div #11641
  • [ ] FFT
  • [ ] ???

Kernel implementation:

TODO: Generate a list based on https://github.com/Roger-luo/TH/blob/master/ChangeLog.md

Other complex related tasks:

  • [ ] Figure out the type promotion rules for complex tensors, and implement them in promoteTypes #11641

Historical issue content

Original comment from @PhilippPelz

I was wondering if there is interest in incorporating complex tensors into pytorch.
For CPU support there is ztorch and I have written z-cutorch ( https://github.com/PhilippPelz/z-cutorch ) a while ago. It is a fork off cutorch before the refactoring for CudaHalfTensor (don't have the hardware yet).
If it's not too much work, I would like to slowly integrate it with pytorch. I am using matplotlib for plotting via fb.python and it turns out to be a huge pain every time I reinstall my system (compiling all the dependencies); plus it seems pytorch will work under Windows soon, which one of my experiment PCs runs on.
I would also need complex gradients, so I would sooner or later touch autograd as well.
While tf supports complex tensors per se, it seems many ops don't support it yet (https://github.com/tensorflow/tensorflow/issues/2255), plus it seems a bit heavyweight for my purposes.

Maybe someone could say a few words how and where to start with this, if it's a welcome idea.


Most helpful comment

@sunilkpai, @boeddeker, @Randl,

Thanks for the report on the complex derivatives. I will try to follow that and I will get back on this next week. I thought I would add some links here and describe the project status.

The current status is that complex numbers are unofficially supported and must be added via a PyTorch extension:

Each extension contains two things:

  • A .cpp that contains any necessary math kernel registrations.
  • A test/ folder that contains very simplified versions of the pytorch test scripts.
    Look in the test scripts to see which kernels are supported (and why others are not).

Why can't I print a complex tensor to the console?

  • The Tensor python object has some pretty-print formatting that calls some functions that are unsupported.

    • You can modify the contents of tensor.py to bypass print formatting.

    • Or, you can simply convert Pytorch tensors to Numpy arrays and then print.

Current Project Status:

  • CPU coverage is pretty good.

    • The kernels are implemented inside PyTorch under `aten/src/ATen/native/cpu/`

    • Complex-number-specific code is under `aten/src/ATen/native/cpu/zmath.h`

    • Intel AVX256 acceleration is under `aten/src/ATen/cpu/vec256/`

      • @sunilkpai: I did not know about the optimization of exp. This is the folder where you would add that. Let me know if you are comfortable making the change.

  • GPU coverage is limited to binary and unary ops:

    • The kernels are implemented inside PyTorch under `aten/src/ATen/native/cuda/*`

    • Complex-number-specific code is under `aten/src/ATen/native/cuda/zmath.cuh`

    • thrust::complex<T> data types are used and they include the optimized kernels.

Current Development:

  • Waiting for C-based TH kernels to be ported to the C++ ATen folder.

    • The rand() function is needed to port the test cases to pytorch's internal tests.

    • Some indexing operations are currently not ported.

    • There are currently 168/1300 math kernels (down from 230 in October) that need to be ported from TH to ATen.

  • I will try to add complex number support as these kernels become available in ATen.

--

All 128 comments

I think we'd be interested in adding an optional support for complex tensors. The best way would be to fork and work on the C libraries in torch/lib. This should be conflict-free with master, so you can do this for a long time. Once you get the libs to a usable state, you can start writing the bindings, and this is where we can provide some guidance on how to avoid conflicts at that time.

I've got TH with complex types compiling. What do I need to add for the python integration?

@PhilippPelz do you mean like: https://github.com/facebook/ztorch/tree/master/lib/THZ ? or did you build your own fork of TH that enables complex types?

@killeent has some notes on how TH is bound to Python, he can share those.

In general, to get complex Tensors in, I'd prefer THZ, as it has tests, etc.

Building a CUDA backend for complex Tensors though, is a fairly big effort, we haven't even started on that.

I have written z-cutorch ( https://github.com/PhilippPelz/z-cutorch ) a while ago. It is a fork off cutorch before the refactoring for CudaHalfTensor (don't have the hardware yet).

This is great. I guess you already pushed a big effort in that direction :)

@soumith I did a fork of TH with complex types. Basically a THGenerateComplexTypes.h plus added BLAS and LAPACK routines; the rest was almost free. It seemed a lot less work for me than checking what parts of THZ are compatible and then copy-pasting.

I'm stuck with compiling THPP right now, figuring out compiler messages like

/home/philipp/projects/pytorch/torch/lib/tmp_install/include/TH/generic/THBlas.h:6:40: error: expected ‘,’ or ‘...’ before ‘*’ token
TH_API void THBlas_(swap)(long n, real *, long incx, real *, long incy);

is a bit tricky.

I'd appreciate help on how to enable python integration. CUDA backend should be mostly copy paste from z-cutorch.

@PhilippPelz here are some notes about how PyTorch wraps TH: https://gist.github.com/killeent/4675635b40b61a45cac2f95a285ce3c0

@killeent thanks, looks very helpful. lib/build_all.sh is now compiling, I think I can look at the csrc directory.

This now runs:

import torch as th
import numpy as np

a = np.array([1+1j,2+2j])
b = np.array([3+3j,4+4j])
ath = th.from_numpy(a)
bth = th.from_numpy(b)
ath_cuda = ath.cuda()
ath_cuda += bth.cuda()
ath = ath_cuda.cpu()
print(ath.numpy())

Out: [ 4.+4.j 6.+6.j]

along with most of the math functions.
I'll add convenience functions and ffts over the next weeks. I guess there need to be tests for everything before you can merge this. If you know anyone else who is interested in complex tensors and would be willing to contribute to writing the tests, that would be awesome. This paper springs to mind: Deep Complex Networks, maybe those guys would be interested.
I won't have the time to write all the tests on my own.

@PhilippPelz Thanks for your comments. I'm checking your implementation. First, I'm not sure about your ger implementation: some complex BLAS functions are not included in your THBlas.c. You defined GER as zger_ and cger_ in the generated headers, but there is no cger_ function in generic/THBlas.c. I can use your gemv and some other functions, though. Also, IMO you should maybe add .gch to .gitignore? Have you pushed all your extensions to your fork? I can make some pull requests to your master based on your implementation first.

And for DOT, I guess that for complex vectors the dotc routines are more common?

And yes, just reusing real would be easier for the implementation; it just feels odd when real is actually a complex type...

And for tests, I didn't see any previous tests for TH. Where should I write those tests, or should we just write some Python tests?

Yes, sorry, I see I may not have pushed everything that is needed. I will check it again on Monday. Some declarations are missing, e.g. zger and cger.

For DOT I'm using cdotc and zdotc, they seem to be missing, I'll update next week.

Check with the pytorch maintainers what naming they prefer for real. I like your version more, just didn't put in the effort yet.

Yes, python tests for the math stuff. Most functions should be easily changed to also include complex number checks.

Cool that you're also looking into this!

Ok, I pushed some changes. TH blas routines are there now for complex

@PhilippPelz I just made a pull request to your repo, for complex linear layers and some other operators. There could be a lot of Hermitian operations (like backprop for a complex linear layer). Maybe add such a function for tensors? Have you checked the THNN part?

Yes, hermitian is useful. CUDA FFT is working now; CPU FFT could be wrapped from NumPy. I haven't touched THNN or THCUNN yet.

@PhilippPelz I've added a simple hermitian in the PR. Could you review it, so we can see whether those changes are suitable and move to the next step? Thanks! PS: it seems you missed some headers; I also corrected that and some other warnings. For complex functions with real output, should we return a real tensor rather than a complex tensor? I've implemented copy methods between complex and real types, so it is possible.

I'll rebase all the commits after your review.

@PhilippPelz Hi, I'm quite confused about the THPP part you implemented. Why does it have a dependency on thrust in Traits.hpp? This will cause an error when compiling without CUDA. Is it possible to only use something like complex.h or std::complex in Traits.hpp? I haven't figured it out. Maybe you could offer some clues?

@Roger-luo Yes, I am also having some problems with that elsewhere. The complex types we are using should be either from complex.h or std::complex. Since THPP is the C++ wrapper, maybe std::complex is more appropriate. Can you please change that?

Thrust is also causing problems for the exact same reason when trying to build cffi extensions. Right now I am doing a workaround, but the proper way would be to change the complex type to cuFloatComplex/cuDoubleComplex in THC so that the cffi compiler does not complain. I just want to get on with research right now; this is taking way too much time from me :(. If you have time, please do it.

Also, building cffi extension with custom kernel calls is quite cumbersome, because one always needs to create an extra library compiled with nvcc, which is then linked to a cffi wrapper. I guess there is no other way. One could use cffi in ABI mode, but the website says "The API mode instead compiles a CPython C wrapper that directly invokes the target function. It is, comparatively, massively faster (and works better than libffi ever can)."

@PhilippPelz maybe reinterpret_cast could be a solution? I guess it should be changed to cuComplex, and use reinterpret_cast in THPP. I'll have a try first...

Yes, I guess there is no other way than reinterpret_cast if you want THPP to build also without cuda installed.

@PhilippPelz I'd like to help. Is there any todo list anywhere?

THNN and THCUNN need to be enabled for complex types. Can you coordinate with @Roger-luo? Also, if we aim for integration with master, unit tests need to be written for all complex methods.

@elbamos Most of the work in THNN will be about implementing new complex backpropagation methods for each existing layer. There is a WIP PR in Philipp's fork. I've listed some references.

@apaszke @soumith @PhilippPelz And there are two questions:

  • does anyone know why there is another GenerateXXXTypes.h file in THS? It looks the same as the one in TH.

  • What is the following code in byte_order.cpp for?

void THP_decodeFloatBuffer(float* dst, const uint8_t* src, THPByteOrder order, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    union { uint32_t x; float f; };
    x = (order == THP_BIG_ENDIAN ? decodeUInt32BE(src) : decodeUInt32LE(src));
    dst[i] = f;
    src += sizeof(float);
  }
}

void THP_decodeDoubleBuffer(double* dst, const uint8_t* src, THPByteOrder order, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    union { uint64_t x; double d; };
    x = (order == THP_BIG_ENDIAN ? decodeUInt64BE(src) : decodeUInt64LE(src));
    dst[i] = d;
    src += sizeof(double);
  }
}

Any suggestions on implementing the related complex versions? I'm not sure if the following implementation is correct...

void THP_decodeZFloatBuffer(std::complex<float>* dst, const uint8_t* src, THPByteOrder order, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    union { uint64_t x; std::complex<float> cf;};
    x = (order == THP_BIG_ENDIAN ? decodeUInt64BE(src) : decodeUInt64LE(src));
    dst[i] = cf;
    src += sizeof(std::complex<float>);
  }
}

void THP_decodeZDoubleBuffer(std::complex<double>* dst, const uint8_t* src, THPByteOrder order, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    union { uint128_t x; std::complex<double> df;};
    x = (order == THP_BIG_ENDIAN ? decodeUInt128BE(src) : decodeUInt128LE(src));
    dst[i] = df;
    src += sizeof(std::complex<double>);
  }
}

The decodeUInt128XE helpers used above would be declared as:

static inline uint128_t decodeUInt128LE(const uint8_t *data) {
  return (((uint128_t)data[ 0])<<  0) | (((uint128_t)data[ 1])<<  8)|
         (((uint128_t)data[ 2])<< 16) | (((uint128_t)data[ 3])<< 24)|
         (((uint128_t)data[ 4])<< 32) | (((uint128_t)data[ 5])<< 40)|
         (((uint128_t)data[ 6])<< 48) | (((uint128_t)data[ 7])<< 56)|
         (((uint128_t)data[ 8])<< 64) | (((uint128_t)data[ 9])<< 72)|
         (((uint128_t)data[10])<< 80) | (((uint128_t)data[11])<< 88)|
         (((uint128_t)data[12])<< 96) | (((uint128_t)data[13])<<104)|
         (((uint128_t)data[14])<<112) | (((uint128_t)data[15])<<120);
}

static inline uint128_t decodeUInt128BE(const uint8_t *data) {
  return (((uint128_t)data[15])<<  0) | (((uint128_t)data[14])<<  8)|
         (((uint128_t)data[13])<< 16) | (((uint128_t)data[12])<< 24)|
         (((uint128_t)data[11])<< 32) | (((uint128_t)data[10])<< 40)|
         (((uint128_t)data[ 9])<< 48) | (((uint128_t)data[ 8])<< 56)|
         (((uint128_t)data[ 7])<< 64) | (((uint128_t)data[ 6])<< 72)|
         (((uint128_t)data[ 5])<< 80) | (((uint128_t)data[ 4])<< 88)|
         (((uint128_t)data[ 3])<< 96) | (((uint128_t)data[ 2])<<104)|
         (((uint128_t)data[ 1])<<112) | (((uint128_t)data[ 0])<<120);
}

I'm currently using std::complex<T> instead of T _Complex in THPP. I'm not sure whether this can be used by Python yet, or whether only the C type T _Complex is usable from Python. So here the type of dst is std::complex<T>.

And if this implementation is correct, we probably need a uint128_t implementation, like https://github.com/calccrypto/uint128_t, since it seems not all compilers support a 128-bit integer (gcc has __int128_t and __uint128_t).

@PhilippPelz I noticed your fork doesn't have issues enabled - what's the status of your project? I'm a little bummed that complex tensors aren't on the roadmap for pytorch.

@el3ment I have added a complex backend for CPU https://github.com/pytorch/pytorch/pull/4899 But it has not been reviewed yet... And I have not received any comments on my PR, so I recently turned to the Julia programming language...

I emailed @PhilippPelz last time; I guess his repo is still based on v0.1 and he is busy with his thesis until September? I was working on v0.3's new CUDA backend, but I don't have time to finish all these bindings alone. The map/reduce functions are different from v0.1, with some optimizations, and they cannot be trivially converted to support complex numbers. I would be happy if anyone is willing to help...

I’m willing to help.


@elbamos Cool, it seems the pytorch team prefers a separate implementation. I'll update my fork for the other parts later. But I really do not have much time for this, and I guess we should start working on it once there is a plan from the pytorch team, because this would be a big extension for pytorch.

Hi, my code is on some commit after v0.2

I had seen that there was a pretty big refactor moving all the tensor code into Aten. This means one cannot easily merge my fork into the current version and there might be some more work involved.

I am still writing up my Ph.D., but I was planning to wait for 0.4 anyway until the merge of Variable and Tensor is released. I fear there might be too much refactoring going on to catch up with it if one does it earlier.

@elbamos if you want you can start adding stuff to my fork, I will merge it in. In your place, I would just implement what you need for whatever project you're doing. TH(CU)NN is a pretty big interface and would be a huge workload.

@el3ment I don't have time to work on other's issues. I will, however, merge stuff in if you need to implement something that isn't there.

If you just want something that works with complex numbers out of the box I would highly recommend tensorflow.

I'll also help if there are compile problems.

If I continue with postdoc I will port all this stuff to the current version at some point. It's truly sad that facebook does not want to support this. :((

@PhilippPelz Agreed, it is really sad, and actually tensorflow does not support all the operators needed in quantum physics... I have started using Julia and abandoned Python.

@Roger-luo interesting, are you using a specific julia package or is it all self-written code?

@PhilippPelz I am developing a quantum many-body toolkit in Julia (since that PyTorch PR); it includes a complex/real neural network implementation based on some previous papers about complex neural networks, and I found it very easy to develop with Julia's metaprogramming. I currently just put it in QMTK.jl; it is still a work in progress and I have not finished everything I want. PyTorch inspires me a lot indeed, but I'm really sorry about the complex support...

But I do have plans to split the neural network part into a separate package in the future (I just don't want to maintain several repos at the moment). And more people from the Institute of Physics, CAS will join the development. I'll accept PRs after its first tagged version (which will be in a few weeks).

You can watch it if you are interested in its development.

If the PyTorch team still have plans for complex support in the future, I'll be willing to help though.

Cool, I'll keep an eye on it!

Hey guys, so sorry that we haven't responded on this issue since it opened.

Here are two facts:

  1. We absolutely agree that PyTorch needs complex support, and
  2. We don't have the manpower to adequately fill out the long tail that all complex operations would need. (For evidence of this, look at sparse support, which is in master and has been limping along.)

Since this issue was opened back in 2017, a few important things have changed that may make implementing complex support a bit simpler. The first is that we now have ATen, an ergonomic C++ library for manipulating tensors. This means that you don't have to copy-paste giant swathes of TH/THC code and hope you've gotten all of the manual refcounting right; you can write C++ code as if it were Python and it will run fast. Second, we're working on a new version of ATen, called C10, which is much more serious about having open backends than ATen (which is a closed thing); this should make it easier to work on complex support, since it wouldn't entail actually forking PyTorch, just adding a new directory of code.

So, @Roger-luo and @PhilippPelz, we'd love to have your help making the complex backend a reality, but we'd really like to figure out a way to do it that helps us sustainably maintain it into the future. Let us know what you think.

@ezyang If you lack manpower, I could try to maintain the complex tensor part in the future. I just started my PhD (this is actually my gap year), so I won't have the problem of writing my thesis for the next few years at least. But I really cannot keep contributing without any feedback from the pytorch team. I do think there should be a roadmap for this big extension. Then we could add complex support smoothly, so you won't need to review one large PR, and it will ease developers' efforts in tracking the master branch.

Firstly, I think the main problem with complex support is the CUDA part. It is quite easy to support the CPU part with ATen or any other library; I can rewrite the CPU part in just a few days if there is any feedback. There are a few problems that concern me for the CUDA part, and I think they lead to two different approaches:

  1. Use float2, etc. to simulate a single complex value like cuComplex do in the CUDA part.
  2. Use existing FloatTensor and DoubleTensor to simulate a complex tensor in ATen's C++ part.

The reason for the second approach is that in THC, pytorch uses some tricks to accelerate map/reduce operations, and these do not trivially carry over to cuComplex: cuComplex is actually float2, but the __shfl_xxx functions do not natively support float2. I'm not sure how to efficiently simulate such a function for float2 at the moment.

The second approach would be easier because we then don't need to care about the hardware, and we can make our new complex extension work on older devices much more easily. However, this might cause some overhead due to non-contiguous memory access.

Besides, I found that to integrate complex numbers into ATen, we might have to handle four different types that are actually the same on hardware: std::complex, thrust::complex, cuComplex, float2, which can be dangerous sometimes. (In fact, I met this problem last year, and reinterpret_cast was the solution.)
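For illustration, all four types share the same memory layout (two packed floats), so a buffer can be bridged between the APIs with reinterpret_cast as described above; this sketch assumes a CUDA toolkit is available for cuComplex.h and thrust:

#include <complex>
#include <cuComplex.h>       // cuFloatComplex (a typedef of float2)
#include <thrust/complex.h>  // thrust::complex<float>

// All four views refer to the same bytes; no copies are made.
void bridge(std::complex<float>* p) {
  float2* as_float2 = reinterpret_cast<float2*>(p);
  cuFloatComplex* as_cu = reinterpret_cast<cuFloatComplex*>(p);
  thrust::complex<float>* as_thrust = reinterpret_cast<thrust::complex<float>*>(p);
  (void)as_float2; (void)as_cu; (void)as_thrust;
}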

I personally would prefer to write everything more native though.

And I think we probably need a timeframe or roadmap, so that we can each pick up a small part and work together; otherwise I would need to track master myself, which is totally impossible...

There was a ChangeLog from when I was trying to implement the CPU backend; in it I classified the functions that need to be modified for complex numbers. We could write a roadmap based on this log.

Besides, since my visa was just rejected (by Australia), I have to start a gap year, if you need someone keep working on this, I could apply for an internship.

I thought a lot about this over the last day. It's a little sad that we couldn't merge Roger's effort as-is, but I thought to myself

"how can we build out complex Tensor support, while keeping low maintenance overheads?"

This is what I am laying out as an effective plan from the above goal:

  • Complex Tensors shouldn't be a fundamental new Tensor type, like sparse Tensors. Adding a fundamental type causes a lot of maintenance overhead and cross-cutting changes. The maintenance overhead is not about "who maintains the complex bits?", but more of "now all core-devs should be aware of this complex type when doing any fundamental changes, any ATen changes, etc."

    • Instead, they should always be [Tensor Shape x 2] or [2 x TensorShape], i.e. the Tensor should have one extra dimension with a size 2.

  • Complex Tensors should be a small file / folder of ~2k lines of simple C++ that are built on top of ATen Tensor API.

    • For example, as https://github.com/pytorch/pytorch/issues/6514 suggests, complex multiplication should be implemented as torch.stack([real1 * real2 - imag1 * imag2, real1 * imag2 + imag1 * real2], dim = -1) where real1 = input1[:, :, :, ..., 0] (see the C++ sketch just after this list)

    • This hurts performance: yes, we won't get as much perf as if we inlined everything. However, the question is: "by how much?". I think we should aim for at most 20% lower performance in exchange for healthy, full-featured, and maintained complex support.

    • The most used complex functions can start getting dedicated kernels, so that where performance is taking greater than 20% hit on a frequently used function, we step in.
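For concreteness, a minimal C++ sketch of the stacking approach from the bullets above, assuming the last dimension holds the (real, imag) pair (the function name is illustrative, not an existing API):

#include <ATen/ATen.h>

// (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br), on [..., 2] tensors.
at::Tensor complex_mul_stacked(const at::Tensor& a, const at::Tensor& b) {
  auto ar = a.select(-1, 0), ai = a.select(-1, 1);
  auto br = b.select(-1, 0), bi = b.select(-1, 1);
  // Stack real and imaginary results back into a trailing dimension of size 2.
  return at::stack({ar * br - ai * bi, ar * bi + ai * br}, a.dim() - 1);
}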

It has to be [Tensor Shape x 2], as BLAS, cublas and MAGMA all expect their own complex types, which are byte-compatible with float2. Also, blas, cublas and magma calls cannot be handled at the python level.
I don't think it will be only 20% for complex multiplication; don't you have 4 full copy operations on top of the calculations for the real and imag parts?
Anyway, I'd still be happy if I don't have to merge in changes from master continuously.

Agreed with @PhilippPelz, we might lose a lot of performance since we lose the complex support from BLAS, cublas and MAGMA, but I'm not sure about it. However, to be clear, a complex tensor is something completely different from a sparse tensor: most libraries, like scipy.sparse and Julia's SparseArrays, treat a sparse array as a composition of fundamental multi-dimensional arrays, but nobody treats a multi-dimensional array with complex element type as a composition of two real arrays (by nobody I mean tensorflow, arrayfire, numpy and Julia). Though in MXNet the FFT is indeed accomplished by a composition of two real tensors, MXNet does not support complex... It seems that tensorflow implemented a DataType as a wrapper around different types, including complex64 and complex128; see types.proto.

About the performance loss

Firstly, the element-wise functions (the ones that call map/reduce) will not have a large performance loss (at least the memory for these operations will be contiguous). But I think we should try to benchmark some BLAS functions first, to see whether a composition of FloatTensors has performance similar to a Complex64Tensor on GPU, and how much performance we would lose with a draft implementation (a sketch of such a composited gemm follows the ComplexTensor class below), for example:

  • gemm
  • gemv

A composited complex tensor would look something like this (or just use shared_ptr):

class ComplexTensor {
    FloatTensor *real;
    FloatTensor *imag;
};
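For the gemm benchmark mentioned above, a composited implementation boils down to four real matrix multiplies; a sketch (illustrative only) of what would be timed against a native complex gemm:

#include <ATen/ATen.h>
#include <utility>

// (Ar + i*Ai)(Br + i*Bi) = (Ar*Br - Ai*Bi) + i*(Ar*Bi + Ai*Br)
std::pair<at::Tensor, at::Tensor> composited_complex_mm(
    const at::Tensor& Ar, const at::Tensor& Ai,
    const at::Tensor& Br, const at::Tensor& Bi) {
  at::Tensor Cr = at::mm(Ar, Br) - at::mm(Ai, Bi);
  at::Tensor Ci = at::mm(Ar, Bi) + at::mm(Ai, Br);
  return {Cr, Ci};
}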

However, as I mentioned as a disadvantage of the first approach, functions like __shfl_xxx also look like an obstacle if we want to do this more natively.

currently torch.fft returns a single float tensor of shape [dim1, ..., dimN, 2]

@ezyang what is the timeframe for the C10 release? That sounds like a very reasonable point to start supporting complex in the master branch.

@PhilippPelz Definitely not for 0.4. We're internally targeting June, hope that is not too long to wait.

@ezyang you mentioned June, did you manage to add support for complex numbers to PyTorch?

I think he meant C10, not complex support. C10 is going to make adding complex easier. That's how I understood it.

Yes, C10 will have open registration of both Tensor types and functions. So adding a complex type as a separate package will be much easier.

Is there any ETA on complex numbers? Does "much easier" mean "will probably be done quickly"?

@themightyoarfish by much easier, I mean that we will not be blocked on what can be pushed to pytorch master. We haven't set an ETA. I'll scope out the work once we get open registration into PyTorch.

@soumith do you still need people to work on this (complex numbers)? Will the PyTorch team support complex numbers? I can spend some time working on this in September if you want, since I will be maintaining QuCumber (it will use complex numbers heavily).

@Roger-luo yes. I wanted to reach out to you once we have open-registration available in the PyTorch backend, and we can work out details.
@ezyang will we have open type registration by September?

@soumith Cool, at your service.

We can make it happen. (We won't have the "full" new system in place, but as long as we set things up so that it's refactorable, we can keep moving it along as new developments happen. It will be a good test case for the new open registration. I can make sure this happens.)

@ezyang any notes by now? I could read through them before working on it. It seems things have changed a lot since last time.

@Roger-luo @PhilippPelz I would also like to help you with the implementation of complex tensors. I also need it for my PhD investigations..

@alexgomezalanis maybe we could have a channel to discuss this on slack; I just created a channel called #complex-numbers. But I won't start working on it until September (I still need to work on some of my Julia code...).

BTW, it seems things have changed a lot since last time. I'll take some time to catch up before I get my hands on it.

@alexgomezalanis I can't. you have to join pytorch's workspace on slack first. I cannot find you. Please send an email to the address: [email protected] to get an invitation.

@Roger-luo @alexgomezalanis Great to see life again on the complex tensor issue. I can offer to get involved as well, but realistically this won't happen until end of September / beginning of October. As for quite a few commenters on this issue, complex tensor support would be very helpful for my PhD project.

I was also trying to save my research last year 😏… but now I just want to bring my old 10k+ lines of code back to life again. 🤣 Let's chat on slack!

:) Yeah, let's chat on slack. Just found the invitation in the mail folder.

The work in progress plugin (for CPU only in short term) is here: https://github.com/Roger-luo/pytorch-complex

Please feel free to give me issue and PR.

I've posted the notes about how complex implementation will be carried out at the top of this issue.

I recently started using PyTorch and I absolutely love it -- it's so much nicer to use than TensorFlow. However, complex tensor support is pretty critical for my research (optical neural networks). Is this still actively being worked on? If so, does anyone know a (loose) timeframe for complex tensor support?

I would be happy to help work on this where I can, but I'm relatively new to PyTorch, so I don't yet have a good idea of how big of an undertaking this feature is. Some of my labmates have also expressed keen interest in complex tensor support (in physics, adding this could make Torch almost a drop-in GPU-accelerated replacement for NumPy) and might be willing to help as well if it means getting complex support in the near future.

Hi @bencbartlett

I'm still trying to work on it slowly... but I'm currently just a student as well (in a quite unstable situation), which means I can't work on this full time, only in my spare time. (I've implemented my research-related code in Julia since last year, which means only our legacy package needs better complex number support from torch.)

If complex number support is crucial to you and you urgently need it in torch, I would suggest trying this:

https://github.com/PIQuIL/QuCumber/blob/master/qucumber/utils/cplx.py

It is super slow... but it works at least. Or I had a C version in old TH style.

This won't be a small project that can be done in a few days, so I can't guarantee any specific time frame for fully functional complex support on CPU or CUDA.

I would love to have you work together with me on this, however. I would suggest starting by trying to solve the issues I posted in the extension repo. And please feel free to ask me questions through slack, email, or an issue (since there's not much documentation yet).

I don't have access to the PyTorch Slack yet, unfortunately. (I emailed twice asking for an invitation, but haven't heard back.) Could someone invite me? ([email protected])

@Roger-luo I'll definitely take a look through your fork, but I can't promise I'll be much help -- my C++ is rusty and as you pointed out, it's hard to find time to work on this as a student. The QuCumber utilities are nice, but unfortunately they wouldn't be terribly helpful for me: until complex tensors are GPU-supported or are supported by autograd and torch.nn, they don't provide much utility above what NumPy can offer.

@soumith @ezyang It would be great to get more attention on this from the PyTorch team! Complex support seems like an important feature for a general tensor library to have, it's virtually essential in physics, and specifically within ML over the last few years, there's been a rapidly growing interest in complex-valued models.

@bencbartlett QuCumber's approach can be used on GPU with AD... it is just super slow... I mean if you just want that AD, you might be able to use it.

Yeah, frankly speaking, I'm using a slightly modified version of https://github.com/FluxML/Flux.jl and some of my own packages in Julia for research (I need complex AD on GPU with tensors in some situations as well). The source-to-source AD package Zygote.jl can do AD on complex tensors, but it is at a very early stage and may have segfaults. The ecosystem is not as stable yet compared to torch; I sometimes have to hack those implementations a little for my own use... But it basically works for what I need for research in quantum physics, and I can have complex tensors on GPU as well.

I don't think complex value support for torch.nn is needed; we might just need to add a few definitions to autograd once the complex tensor is functional, because things like linear layers can stay the same. And some activation functions might not have a standard expansion in Hilbert space... (You can check my collaborator @GiggleLiu's blog post.)

For the pytorch-complex extension, I'm not sure when we can get full support with AD on GPU... this still seems pretty far off to me. I would say the CPU implementation will go through a period that requires patches in the main tree (e.g. type promotion, SIMD support, etc.); this is also related to the coming ATen implementation in C++ that gets rid of TH, etc., and then we will be able to add operators for complex tensors faster.

I can apply for internships in Spring (which I just asked @ezyang about). So I might be able to work on this full time for several months before I start my PhD. Let's see.

In the meantime, I implemented my own version of complex multiplication. However, when I profile it, it turns out that a substantial amount of time goes to: torch._C._cuda_isDriverSufficient

[profiler screenshot]

Do you have any idea why? If you know of a better implementation of complex multiplication, please let me know. Somehow, my version (even though optimized for the number of multiplications: 3 instead of 4) seems to be relatively slow; e.g., irfft of the out tensor is 10X faster than my element-wise multiplication. Is complex multiplication supported at the C++ level of PyTorch?

import time

import torch


def complex_mul(x, y, out):
    # 3-multiplication (Karatsuba-style) complex product on [..., 2] tensors,
    # where index 0 of the last dim is the real part and index 1 the imaginary part.
    uavc = x[..., 0] * (y[..., 0] + y[..., 1])
    out[..., 0] = uavc - (x[..., 0] + x[..., 1]) * y[..., 1]
    out[..., 1] = (x[..., 1] - x[..., 0]) * y[..., 0] + uavc


def test_complex_mul_out_tensor(self):  # method taken from a unittest TestCase
    N, C, H, W, I = 128, 3, 32, 32, 2
    K = 16  # number of filter banks
    repetitions = 1000
    dtype = torch.float
    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    x = torch.randn(N, 1, C, H, W, I, dtype=dtype, device=device)
    y = torch.randn(K, C, H, W, I, dtype=dtype, device=device)
    start_mul_time = time.time()
    out = torch.empty(N, K, C, H, W, I, dtype=dtype, device=device)
    for _ in range(repetitions):
        complex_mul(x, y, out)
    print("multiplication time: ", time.time() - start_mul_time)

We are trying to support it from C++; see the post at the top. If you can compile the extension, it should at least work for scalar multiplication at the moment....

Your implementation is similar to what we have in QuCumber. It might launch many extra GPU threads if you don't call the correct CUDA kernel for complex numbers, and you might lose SIMD if there is no C++ backend supporting it from Python.

I would suggest you run nvprof to get more details.

@Roger-luo @apaszke @soumith Thanks for this thread btw. I implemented a basic complex tensor hacked together from subclassing torch.Tensor.

I treat the first half as real, second as imaginary and implemented my own basic arithmetic operations and some others I need for my research.

I verified against Tensorflow and numpy. The gradients and all the ops I implemented match their outputs!

It's just intended as a holdover until PT fully supports complex tensors.

Features:

  1. Tests implemented.
  2. Pypi supported (ie: pip install)
pip install pytorch-complex-tensor

https://github.com/williamFalcon/pytorch-complex-tensor

Thanks @williamFalcon !

Any update on this? Just wondering if there is a plan to integrate complex type support into pytorch.

Hi, @whmrtm

@ezyang is working on https://github.com/Roger-luo/pytorch-complex/issues/4, or whoever is interested in this could help us make it run. This issue will solve some basic broadcasting problems (you will be able to use a lot of functions once it is resolved). Please feel free to open any PR, or ask me to add you as a collaborator.

I won't be able to work on anything until summer, have to finish a fresh release for our own package.


Thanks for the update, I will see what I can do.

Hi @Roger-luo

Can I access the slack channel related to the complex tensor support topic ([email protected])? I emailed for an invitation but nothing has happened yet. Right now I am trying to figure out where to start contributing to this issue. I guess https://github.com/Roger-luo/pytorch-complex/issues/4 is the current entry point?

@beconstant yes, that's the starting point. It should make some broadcast functions work, but I don't know why it throws a type promotion error on CUDA; it was working on CPU. (Although we don't intend to support CUDA in the first place, this causes a build failure.)

I can't send you an invitation email (I don't have access). I think you should follow the official pytorch guide to join slack. But we can always discuss in the issue/PR.

@Roger-luo ok, got it :)

Let me know if you guys need any help. I will start by building the specified pytorch version. Any progress on pytorch-complex/issues/4?


@dylanbespalko Hi, I urgently need a complex-valued implementation of pytorch.
Thank you very much for your contributions.

Best regards,
Zellar209

Hi @Zellar209,

I'm getting the feeling that @ezyang is working hard on one of the bigger issues (pytorch-complex/issues/4). I have an AMD system now and will have an Nvidia system in 3 weeks that I can use to ramp up the GPU support.

I guess the problem is just that the original type promotion change breaks CUDA. Once that PR is solved, we at least have some operators working on CPU; we don't have CUDA support at all yet...

IMHO I think we should focus on CPU and make things work first, then consider GPU later.

CPU-only support is fine. Is this type promotion problem (pytorch-complex/issues/4) being handled internally by fb? Is it ok to work on it externally?

Hi @dylanbespalko; I did tell @Roger-luo that I was going to look into it (because I was probably best placed to figure out what the problem is), but I haven't had time to look at it yet. If you do want to take a look into figuring out how to fix the problem, I'd be happy to advise.


Yes, I do not need any GPU now; I am using a Mac. But I have had some errors when building this project.

Hi @Zellar209, could you post what you get as an issue on pytorch-complex? I think there's something wrong with the new Xcode on Mac, which makes it hard to build. But people will need more of the error message to figure out why.

I asked about the OS and error msg, but you didn't reply...


Thank you for your early reply.

1. When I run "python setup.py install" using gcc (default), I get errors like this:

building 'torch_complex.cpp' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/lib/python3.6/site-packages/torch/include -I/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/anaconda3/lib/python3.6/site-packages/torch/include/TH -I/anaconda3/lib/python3.6/site-packages/torch/include/THC -I/anaconda3/include/python3.6m -c src/module.cpp -o build/temp.macosx-10.7-x86_64-3.6/src/module.o -g -stdlib=libc++ -std=c++11 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpp
gcc: error: unrecognized command line option ‘-stdlib=libc++’

error: command 'gcc' failed with exit status 1

2. When I use clang to compile it, the errors are:

In file included from src/module.cpp:2:
In file included from src/CPUComplexType.h:60:
src/CPUComplexTypeImpl.h:102:105: warning: 'IntList' is deprecated [-Wdeprecated-declarations]
Tensor & CPUComplexType::set_(Tensor & self, Storage source, int64_t storage_offset, IntList sizes, IntList strides) const {
^
/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/ArrayRef.h:273:7: note: 'IntList' has been explicitly marked deprecated here
using IntList C10_DEPRECATED_USING = ArrayRef;
^
In file included from src/module.cpp:2:
In file included from src/CPUComplexType.h:60:
src/CPUComplexTypeImpl.h:105:76: error: no member named 'scalarTypeToDataType' in namespace 'at'
auto source_ = checked_storage(source,"source",2, DeviceType::CPU, at::scalarTypeToDataType(CPUComplexTypeInfo::scalar_type));
~~~~^
7 warnings and 2 errors generated.

error: command 'clang' failed with exit status 1

I cannot fix it. I really hope you can help me!

Hey guys,

Thanks for your feedback. I think I can spend the week looking into this. So far I have compiled @Roger-luo’s pytorch-complex as follows:

@Zellar209: I have attached my environment variables running on macOS 10.13.

  1. Delete existing pytorch distribution as follows
    conda uninstall pytorch
    pip uninstall torch
    pip uninstall torch # run this command twice
    python setup.py clean
    Delete torch folder in the python site-packages folder if it exists.
    Rename (or delete) previous pytorch source folder (something was referring to it).

  2. Install PyTorch revision 6cb593b88cb0c411690b4957850058329526d87b.

    git clone git@github.com:pytorch/pytorch.git
    git checkout 6cb593b88cb0c411690b4957850058329526d87b
    git submodule update --init --recursive
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
    MACOSX_DEPLOYMENT_TARGET=10.13 CC=clang CXX=clang++ python setup.py develop
    python
    >>> import torch

  3. Install pytorch-complex

    python setup.py install
    python setup.py build
    python setup.py test
    # ERROR: test (unittest.loader._FailedTest)
    # ERROR: test_scalar_binary_op (tests.test_tensor.TestComplexTensor)

  4. Create a complex tensor

    from torch_complex import torch
    a = torch.ones(3, dtype=torch.complex128)
    a*a
    RuntimeError: promoteTypes with complex numbers is not handled yet; figure out what the correct rules should be

@ezyang, @Roger-luo:

Everything for type promotion of tensor operations seems to be done in c10/core/ScalarType.h.
I have found the error AT_ERROR("promoteTypes with complex numbers is not handled yet; figure out what the correct rules should be");
It looks like I have to add entries for c8 and c16 inside this table.
Does this have anything to do with #9515? I think that is just for calling numpy functions.
Is that a good place to start?

9515 is unrelated. However, fixing this codepath in ScalarType.h is a good place to start.
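For orientation, a minimal sketch (not the actual lookup table in ScalarType.h) of the rule the new c8/c16 entries need to encode: complex wins over real, and anything double-precision pushes the result to complex128.

#include <c10/core/ScalarType.h>

// Illustrative helper only; the real implementation extends the promoteTypes table.
c10::ScalarType promoteWithComplex(c10::ScalarType a, c10::ScalarType b) {
  using c10::ScalarType;
  auto is_complex = [](ScalarType t) {
    return t == ScalarType::ComplexFloat || t == ScalarType::ComplexDouble;
  };
  if (!is_complex(a) && !is_complex(b)) {
    return c10::promoteTypes(a, b);  // fall back to the existing real-type table
  }
  const bool needs_double =
      a == ScalarType::ComplexDouble || b == ScalarType::ComplexDouble ||
      a == ScalarType::Double || b == ScalarType::Double;
  return needs_double ? ScalarType::ComplexDouble : ScalarType::ComplexFloat;
}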

I fixed the codepath in ScalarType.h.
BinaryOps (add, sub, mul, div) are working, but only if both arguments are Tensors.
There are some other weird issues, but I need to look at it some more.

@dylanbespalko I've added type promotions here: https://github.com/pytorch/pytorch/pull/11641

You could just copy that, but the problem is this breaks CUDA somehow.

IIRC, there was a weird bug due to the gcc version. I had some workarounds there.

Ah, thanks @Roger-luo. I was looking at the comments of #11641. I will do a better job of copying the code tomorrow.

How do I know when I have broken CUDA when I don't have a CUDA device? I'm assuming the CI will tell me?

Yes, as long as you submit a PR, it will tell you which one is broken. And if everything passes, then we can just merge it and make things work.

Ok, then I will start submitting PRs so that I know when it happens.

@dylanbespalko Hi, there still seem to be some errors in your environment?
If you fix them, please share with us. Thanks a lot.

Hey guys,

I tried doing several PRs after copying several of @Roger-luo's commits. Unfortunately, I don't have a CUDA GPU right now and the CI machines with CUDA are not initializing. I cannot recreate the CUDA test failure right now, so I'll come back to this in a few weeks when I can run locally on that GPU. Looks promising, at least.

@ezyang, @Roger-luo

I have taken a look at Roger's PR #11641:

  • It builds and runs on my CUDA 9.0 machine
  • It fails to build on CI machines that are running CUDA 9.0

I have also taken a look at some of the recent PyTorch developments:

  • A presentation from @ezyang describing how to write a C++/CUDA extension that can define a custom device/layout/dtype.

    • A more recent PR #21964 which "removes the ComplexHooksInterface", but defines a complex number C++ extension located at pytorch/test/cpp_extensions/complex_registration_extension.cpp

It looks to me like a new "out-of-tree" extension capability is being developed, which would allow me to investigate complex number support without breaking the rest of pytorch. My goal is to:

  1. Define Complex CPU support without AVX.
  2. Define Complex CUDA support using Thrust.

@ezyang
Can you provide an expected timeline for this out-of-tree device/layout/dtype extension that you presented? Can we expect this feature in the next 3 months?

@ezyang

Are you able to merge complex number support on the CPU without AVX/SSE support? I plan to submit the following in separate merge-requests:

  • [ ] Added complex support of CPU BinaryOp kernels
  • [ ] Added complex support of CPU TensorFactories
  • [ ] Added complex support of CPU FillKernels
  • [ ] Added complex support of CPU Range kernels
  • [ ] Added complex support of CPU Unary kernels
  • [ ] Added complex support of CPU Compare kernels
  • [ ] Added complex support of CPU TensorCompare kernels
  • [ ] Added complex support of CPU ReduceOp kernels
  • [ ] Added complex support of CPU PointwiseOps kernels
  • [ ] Added complex support of CPU Lerp kernels
  • [ ] Added complex support of CPU LinearAlgebraOps kernels
  • [ ] Added complex support of CPU SpectralOps kernels

I plan to have this tested across intel/arm cpus in the next couple days.

@ezyang,

I'm looking into operations like fft() and var() where the complex number implementation must convert the tensor data to a double tensor of shape (complex_shape, 2). This doesn't work with any existing tensor methods:

  1. tensor.to(torch.float64): only keeps the real part, and returns a tensor with the same shape.
  2. tensor.view(new_shape): the new shape must have the same number of elements.

Obviously I can do something inefficient like:

def to_float(tensor):
    return th.stack((tensor.real().type(th.float64), tensor.imag().type(th.float64)), -1)

def to_complex(tensor):
    tensor = tensor.type(th.complex128) 
    return tensor[..., 0] + 1j*tensor[..., 1]

Obviously that is creating copies, when all I need is to static_cast<double> and change the shape of the tensor to (old_shape, 2). Do you have any suggestions on how to do this?

Also, There is a hack in numpy that allows you to do this:

a = np.array([1 + 1j], dtype=np.complex128)
a.dtype = np.float64  ## This works

a = torch.tensor([1 + 1j], dtype=torch.complex128)
a.dtype = torch.float64  ## This does not work

The ability to set dtype really works in this situation, however it could be unpredictable.

Some additional information regarding interpreting a complex number as a length-2 array of real numbers. The following is valid since C++11:

For any pointer to an element of an array of complex numbers p and any valid array index i, reinterpret_cast<T*>(p)[2*i] is the real part of the complex number p[i], and reinterpret_cast<T*>(p)[2*i + 1] is the imaginary part of the complex number p[i].

I think this means it is possible to convert a complex_tensor to a real_tensor with shape (complex_shape, 2) and then perform an operation without calling real() and imag(), which allocate new memory.
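A tiny self-contained check of that guarantee (illustration only):

#include <cassert>
#include <complex>

int main() {
  std::complex<double> z[2] = {{1.0, 2.0}, {3.0, 4.0}};
  const double* d = reinterpret_cast<const double*>(z);  // interleaved (real, imag) pairs
  assert(d[0] == 1.0 && d[1] == 2.0);  // real and imag of z[0]
  assert(d[2] == 3.0 && d[3] == 4.0);  // real and imag of z[1]
  return 0;
}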

@dylanbespalko I was afraid of when you'd ask about this :) The std::complex guarantee means that if you have the data pointer std::complex<float>*, you can safely cast it into float* (mumble strict aliasing) and then pass it on to whatever fft thing you are using. If you only need to implement fft/var where you can pass on this low level rep, that will be easiest.

However, if you need to literally re-view a complex tensor as a float tensor, we're in a bit of a pickle, because there is no precedent for this in PyTorch today. Storage dtype has always agreed with Tensor dtype, so if you make a complex storage there is no way to re-view it as a float storage.

One thought that I have had is maybe we should relax this invariant. The idea is:

  1. We always allocate storages as the "un-vectorized" type in question. So for a complex we allocate a float tensor.
  2. Tensor dtype is allowed to disagree from storage dtype, but only as a vectorized variant of the underlying type

I'm not sure how much code we'd have to change to make this invariant happen though.

@ezyang,

Yes this was inevitable...

If you only need to implement fft/var where you can pass on this low level rep, that will be easiest.

Yes this possible in many cases. Are you able to provide a code-snippet of how to interpret tensor data as a std::vector?

However, if you need to literally re-view a complex tensor as a float tensor,....

I would imagine it is rare to view a tensor using another dtype. I implemented a set_dtype() method for Tensor, but I got some errors. I also didn't update the strides to reflect the changes in shape. I'm not sure why setting dtype works in numpy (is it a coincidence?), however when you upload data to a digital-to-analog converter (DAC) it often expects the real/imaginary data to be interleaved. Perhaps that would motivate the need to decouple tensor dtype from the storage dtype as you have suggested.

I'll avoid doing this for now. I'm sure there are other performance bottlenecks for me.

Yes, this is possible in many cases. Are you able to provide a code snippet of how to interpret tensor data as a std::vector?

Not exactly a std::vector, but I am imagining something like this:

Tensor complex_tensor;
assert(complex_tensor.is_contiguous());
std::complex<float>* cp = complex_tensor.data_ptr<std::complex<float>>();
float* fp = reinterpret_cast<float*>(cp);
auto num_floats = complex_tensor.numel() * 2;
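// fp[2*i] and fp[2*i + 1] are the real and imaginary parts of element i,
// per the std::complex layout guarantee quoted above.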

I implemented a set_dtype() method for Tensor, but I got some errors. I also didn't update the strides to reflect the changes in shape.

Yeah, this is probably a bad idea if you don't also fix the strides. Also I'm not a big fan of tensors transmuting into other dtypes; better to just do it all out-of-place :)

however when you upload data to a digital-to-analog converter (DAC) it often expects the real/imaginary data to be interleaved. Perhaps that would motivate the need to decouple tensor dtype from the storage dtype as you have suggested.

Yes, ultimately this is the right thing to do, but I agree it's easier to not do this now.

@ezyang,

I am beginning to mess around with complex number CUDA support.

There are two binary compatible options:

  1. cuComplex: Very basic add, sub, mul, div, real, imag support.
  2. thrust::complex: drop in replacement for std::complex that supports host and device memory allocation.

The thrust::complex container seems to be the way to go. The thrust::complex API suggests that thrust::complex<T> containers can be allocated in host and device memory, while std::complex<T> can only be allocated in host memory:

__host__ __device__     thrust::complex< T >::complex (const complex< T > &z)  //thrust container
__host__    thrust::complex< T >::complex (const std::complex< T > &z) //stl container.
  1. Is this suggesting that AT_DISPATCH_COMPLEX_TYPES should set using scalar_t = thrust::complex<double> instead of using scalar_t = std::complex<double>?

  2. How does PyTorch automatically call the CUDA equivalents of std::log for real data types? How do I know there is a CUDA equivalent of a math kernel?

  1. I think the difficulty with using thrust::complex<double> universally for CPU and CUDA is that we don't actually build against thrust if you do a CPU-only build. I guess there are a bunch of options; we could roll our own complex type (similar to how we roll our own half type), or you could just reinterpret-cast your way to victory, because std::complex<> is defined to have a specific binary layout. It's up to you, but just reinterpret-casting between the types seems easier for now (see the sketch after this list).
  2. We have math overloads in THCNumerics.cuh, does that answer your question?
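To make the reinterpret-cast route concrete, here is a minimal sketch (not taken from any PR): it assumes the thrust headers are on the include path and relies on std::complex<T> and thrust::complex<T> sharing the same (real, imag) layout; the helper name as_thrust is purely illustrative.

#include <complex>
#include <thrust/complex.h>

// Reinterpret a contiguous std::complex<float> buffer as thrust::complex<float>
// before handing it to a thrust-based CUDA kernel. No copy is made.
inline thrust::complex<float>* as_thrust(std::complex<float>* p) {
  return reinterpret_cast<thrust::complex<float>*>(p);
}

inline const thrust::complex<float>* as_thrust(const std::complex<float>* p) {
  return reinterpret_cast<const thrust::complex<float>*>(p);
}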

@iotamudelta has raised an issue with C++11 compliance in #29547

std::real is only constexpr from C++14

If I understand correctly, std::real() needs to be constexpr so that the hcc compiler can compile the instruction for __device__.

Possible solutions:

  1. Find another method or function to convert complex<double> to double.
  2. Find a way to wrap the function:

    • Most calls to std::real are made in aten/src/ATen/native/cpu/zmath.h. Example: replace inline with constexpr:

      inline VALUE_TYPE real_impl (SCALAR_TYPE z) ->
      constexpr VALUE_TYPE real_impl (SCALAR_TYPE z)

      inline std::complex<float> real_impl <std::complex<float>> (std::complex<float> z) ->
      constexpr std::complex<float> real_impl <std::complex<float>> (std::complex<float> z)

      inline std::complex<double> real_impl <std::complex<double>> (std::complex<double> z) ->
      constexpr std::complex<double> real_impl <std::complex<double>> (std::complex<double> z)

This won't compile because there is still a nested call to std::real(), which is not a constexpr.

3. If I use std::complex::real() instead of std::real(), this seems to be C++11 compliant (sketched below). See the following link.

I think you are saying that no matter what I do, this code is UB until C++14. Is there any other way to convert a std::complex<double> to a double that would meet your requirement?
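For concreteness, a wrapper along the following lines is one way to phrase option 3. This is only a sketch: HOST_DEVICE stands in for c10's C10_HOST_DEVICE macro, and whether hcc accepts the member call in __device__ code is exactly the open question here.

#include <complex>

// HOST_DEVICE is a stand-in for C10_HOST_DEVICE in this sketch.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Read the real part through the member function instead of the free
// function std::real(), per the suggestion above.
template <typename T>
HOST_DEVICE T real_part(const std::complex<T>& z) {
  return z.real();
}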

@iotamudelta, @bddppq, @ezyang,

I have added support for complex UnaryOps and BinaryOps on CUDA using the thrust::complex API, but I need to ask a few questions before I submit it.

I defined a traits template that maps scalar types to the equivalent thrust::complex types when dealing with complex numbers, in aten/src/ATen/native/cuda/zmath.cuh:

#pragma once

#include <complex>
#include <thrust/complex.h>

namespace at { namespace native {
namespace {

template <typename TYPE>
struct ztype_cuda {
  using value_t = TYPE; // underlying value type (float/double for complex types)
  using thrust_t = TYPE; // equivalent thrust type (thrust::complex<T> for complex types)
};

template <>
struct ztype_cuda<std::complex<float>> {
  using value_t = float;
  using thrust_t = thrust::complex<float>;
};

template <>
struct ztype_cuda<std::complex<double>> {
  using value_t = double;
  using thrust_t = thrust::complex<double>;
};

} // end namespace
}} //end at::native

Then in aten/src/ATen/native/cuda/BinaryOpsKernel.cu
Replace:

void add_kernel_cuda(TensorIterator& iter, Scalar alpha_scalar) {
  AT_DISPATCH_ALL_TYPES_AND2(kHalf, kBool, iter.common_dtype(), "add_cuda/sub_cuda", [&]() {
    auto alpha = alpha_scalar.to<scalar_t>();
    gpu_kernel_with_scalars(iter, [alpha]GPU_LAMBDA(scalar_t a, scalar_t b) -> scalar_t {
      return a + alpha * b;
    });
  });
}

With:

void add_kernel_cuda(TensorIterator& iter, Scalar alpha_scalar) {
  AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(kHalf, kBool, iter.dtype(), "add_cuda/sub_cuda", [&]() {
    using thrust_t = typename ztype_cuda<scalar_t>::thrust_t;
    auto alpha = thrust_t(alpha_scalar.to<scalar_t>());
    gpu_kernel_with_scalars(iter, [alpha]GPU_LAMBDA(thrust_t a, thrust_t b) -> thrust_t {
      return a + alpha * b;
    });
  });
}

Questions

  1. @ezyang: For non-complex numbers, scalar_t and thrust_t are the same type. Maybe I could replace the variable name thrust_t with something more friendly to non-complex numbers, such as scalar_t_c?
  2. The thrust library seems to be widely referenced in the code:
    a) @bddppq: Is there some reason why I should use cuComplex instead of thrust::complex?
    b) @iotamudelta: hip-thrust has been removed in ROCm 2.7. Should I use hip_complex instead?
    thrust::complex seems to support more functionality than cuComplex.

Please let me know what you think.

@iotamudelta

I've updated the discussion about std::real(). Can you confirm that std::complex::real() would fix the problem?

Hi @dylanbespalko,

I guess what @iotamudelta is complaining about is that cast_and_store for complex types is missing a C10_HOST_DEVICE qualifier; this would be UB if that code path were ever executed on the GPU.

Currently, this dynamic casting utility is only used in the GPU TensorIterator, and only when there is type promotion. Because complex is not currently supported on the GPU, cast_and_store for complex types does not have the C10_HOST_DEVICE qualifier and uses std::real, which is totally fine for a host-only function. There is no UB here because that path is never taken, so there is nothing you need to worry about.

But since you want to add complex support on the GPU, and complex is supported by type promotion as we can see in https://github.com/pytorch/pytorch/blob/master/c10/core/ScalarType.h#L398-L420, you need to be very careful with this code path, and there are a few modifications you might need to make for it to work:

Of course, you need to add C10_HOST_DEVICE as @iotamudelta is doing in https://github.com/pytorch/pytorch/pull/29547, but that is not enough, because simply adding C10_HOST_DEVICE without other changes is still UB on C++11, as mentioned by @iotamudelta. A good solution might be what you have mentioned: use std::complex::real() to replace std::real.

But beyond that, if you look at the file https://github.com/pytorch/pytorch/blob/master/c10/util/TypeCast.h, you would see inside fetch_and_cast, there is something like:

#ifndef C10_HOST_DEVICE
    AT_FORALL_COMPLEX_TYPES(FETCH_AND_CAST_COMPLEX_CASE)
#endif

This code path is disabled on GPU. You need to enable it and make it work.

Also, I didn't see any conversion between complex<float> and complex<double> inside fetch_and_cast and cast_and_store. You might also need to add that conversion. Make sure you thoroughly test the coverage of these functions for all dtypes.
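For reference, the per-element conversions themselves are straightforward with std::complex's converting constructors; something along the following lines (illustrative helper names, not the actual TypeCast.h macros) is all the two missing cases would need.

#include <complex>

// Illustrative only; the real change would go through the fetch_and_cast /
// cast_and_store machinery in c10/util/TypeCast.h.
inline std::complex<double> widen(const std::complex<float>& z) {
  return std::complex<double>(z);  // exact
}

inline std::complex<float> narrow(const std::complex<double>& z) {
  return std::complex<float>(static_cast<float>(z.real()),
                             static_cast<float>(z.imag()));  // may lose precision
}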

cc: @ezyang and @bddppq

Also @dylanbespalko, please cc me if you are making any change to TypeCast.h in your PR.

OK, I have a couple small things to fix with torch.real() and torch.imag() on ARM, so I'll fix TypeCast.h and some others while I'm at it. I'll cc you guys on the PR.

Drive by comment: @smessmer is moving us to C++14, at which point it won't be UB. Since this is coming soon, if the UB is not causing real problems I wouldn't worry too much about it.

@ezyang: Good to know. Most of the third_party stuff like Eigen still calls std::real() very liberally.

For non-complex numbers, scalar_t and thrust_t are the same type. Maybe I could replace the variable name thrust_t with something more friendly to non-complex numbers, such as scalar_t_c?

I'm not too sure, but scalar_t_c seems a bit less clear than thrust_t (what does the c mean, anyway?). The types in question here are quite thrust-specific, so it seems better to use a name that directly conveys the intention.

OK, I will stick with thrust_t. If anybody dives into ztype_cuda<>, they should instantly figure out that scalar_t is thrust_t for non-complex types.

Hi everyone! It looks like good progress is being made towards adding complex support to pytorch! Thanks @dylanbespalko for taking initiative on this and adding CUDA support as well! From a high level, I'm interested to know what is the current progress in complex support? I am mostly interested in a rough timeline for having CUDA support for adding and multiplying complex tensors (binary ops). Thank you!

Hi @sunilkpai,

I have an open PR that should support the binary and unary ops on CUDA: #30295.

One more issue is with backward propagation. I think the derivative of complex abs() is defined differently than for real numbers. I'm not sure what to do about that, but derivatives are defined in tools/autograd/derivatives.yaml.

I think for complex numbers d abs(z)/dz = z/abs(z). This can be used for real numbers too, but it will likely be slower than sgn(z).

@dylanbespalko Maybe the tables 4.1, 4.2 and 4.3 in my Report https://arxiv.org/pdf/1701.00392.pdf may help you to define the derivatives.

For complex derivatives (Wirtinger calculus), there are two options:
calculating the derivative w.r.t. z, or w.r.t. z conjugate.
I personally prefer the derivative w.r.t. z conjugate.
It feels more natural for matrix operations, and the gradient update does not need a conjugate.
The definitions are:

  • derivative w.r.t. z for z = x + jy: dJ/dz = dJ/dx -j dJ/dy
  • derivative w.r.t. z.conj for z = x + jy: dJ/dz.conj = dJ/dx + j dJ/dy

From your comment, my assumption is that you calculate the derivative w.r.t. z at the moment.
In this case the derivative is d abs(z) / d z = z.conj / abs(z). If you take the other definition, you can follow @Randl's suggestion.

Let me know if I should explain more. I also have some numpy implementations of the complex derivatives.
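For concreteness, the abs example works out as follows under the convention above (note these operators omit the usual factor of 1/2, matching the definitions given in this thread):

\[
\frac{\partial |z|}{\partial x} = \frac{x}{|z|}, \qquad
\frac{\partial |z|}{\partial y} = \frac{y}{|z|},
\]
\[
\frac{d\,|z|}{dz} = \frac{\partial |z|}{\partial x} - j\,\frac{\partial |z|}{\partial y}
  = \frac{x - jy}{|z|} = \frac{\bar{z}}{|z|}, \qquad
\frac{d\,|z|}{d\bar{z}} = \frac{x + jy}{|z|} = \frac{z}{|z|}.
\]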

One other operation that would be useful (especially for projects in the physics space requiring complex number support) is a handler for the exp() operator. In TensorFlow, we have tf.exp(x + iy) = tf.exp(x) * (tf.cos(y) + 1j * tf.sin(y)). Is this straightforward to implement in PyTorch as well?
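A quick host-side sanity check of that identity with std::complex (a standalone sketch, not PyTorch code):

#include <cassert>
#include <cmath>
#include <complex>
#include <iostream>

int main() {
  std::complex<double> z(0.5, 1.25);  // z = x + iy
  double x = z.real();
  double y = z.imag();

  // exp(x + iy) = exp(x) * (cos(y) + i*sin(y))
  std::complex<double> via_identity(std::exp(x) * std::cos(y),
                                    std::exp(x) * std::sin(y));
  std::complex<double> via_std = std::exp(z);

  std::cout << via_identity << " vs " << via_std << "\n";
  assert(std::abs(via_identity - via_std) < 1e-12);
  return 0;
}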

@sunilkpai, @boeddeker, @Randl,

Thanks for the report on the complex derivatives. I will try to follow that and I will get back on this next week. I thought I would add some links here and describe the project status.

Complex numbers are currently unofficially supported and must be enabled via a PyTorch extension:

Each extension contains two things:

  • A .cpp that contains any necessary math kernel registrations.
  • A test/ folder that contains very simplified versions of the pytorch test scripts.
    Look in the test scripts to see which kernels are supported (and why others are not).

Why can't I print a complex tensor to the console?

  • The Tensor python object has some pretty-print formatting that calls some functions that are unsupported.

    • You can modify the contents of tensor.py to bypass print formatting.

    • Or, you can simply convert Pytorch tensors to Numpy arrays and then print.

Current Project Status:

  • CPU coverage is pretty good.

    • The kernels are implemented inside PyTorch under aten/src/ATen/native/cpu/
    • Complex-number-specific code is under aten/src/ATen/native/cpu/zmath.h
    • Intel AVX256 acceleration is under aten/src/ATen/cpu/vec256/

      • @sunilkpai: I did not know about that optimization of exp. This is the folder where you would add it.
      • Let me know if you are comfortable making the change.

  • GPU coverage is limited to binary and unary ops:

    • The kernels are implemented inside PyTorch under aten/src/ATen/native/cuda/
    • Complex-number-specific code is under aten/src/ATen/native/cuda/zmath.cuh

    • thrust::complex<T> data types are used and they include the optimized kernels.

Current Development:

  • Waiting for C-based TH kernels to be ported to the C++ ATen folder.

    • The rand() function is needed to port the test cases to pytorch's internal tests.

    • Some indexing operations are currently not ported.

    • There are currently 168/1300 math kernels (down from 230 in October) that need to be ported from TH to ATen.

  • I will try to add complex number support as these kernels become available in ATen.

--

FYI, regarding complex derivatives, we had a long discussion in Julia; its implementation now lives in ChainRules (see also: http://www.juliadiff.org/ChainRules.jl/dev/api.html#ChainRulesCore.Wirtinger ) and Zygote. Generally, people only need \partial L/\partial adjoint(z) as the gradient (by definition it is the direction of fastest decrease), but the derivative \partial L/\partial z is different; an extra interface should be added if we want full support for complex-number AD. For the detailed rules, you can check what's implemented in ChainRules or Zygote/lib (since there are only generic rules, there are no separate rules for complex numbers for most operators; the backward pass for things like matmul is written in a generic definition, e.g. adjoint(A) * B).

Why can't I print a complex tensor to the console?
The Tensor python object has some pretty-print formatting that calls some functions that are unsupported.
You can modify the contents of tensor.py to bypass print formatting.
Or, you can simply convert Pytorch tensors to Numpy arrays and then print.

I think I fixed at least part of the printing in https://github.com/Roger-luo/pytorch-complex for debugging etc. in the first place; not sure if this will help, since master has changed a lot in the past year. You can just take it if it is helpful; I'm not going to work on this anymore.

@dylanbespalko I'm relatively inexperienced with the pytorch internals, though I have started learning! I could conceivably attempt this change, though based on what I see in aten/src/ATen/cpu/vec256/*, I'm not sure whether it's necessary, given that the default behavior of std::exp(std::complex) is exactly what I mentioned in my previous comment: see the notes at https://en.cppreference.com/w/cpp/numeric/complex/exp. I'm also not sure how this translates to implementing these ops in CUDA (which currently seems limited to real, imag, conj and angle?).

@sunilkpai,

I have added AVX support for exp() using the equation provided.

I also noticed some things were broken due to some recent changes in PyTorch. I have fixed these in #30871.

@dylanbespalko

Is there a timeline for porting from TH to ATen?
Is there a way I can contribute, given that I am not well versed in the inner workings of pytorch?

I found a formula for the backpropagation of the complex SVD on arXiv and could implement it, if you show me where.

Thanks for your work!

@Jakob-Unfried

https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide

The TH kernels are implemented in C, and there is little interest in adding complex support there due to all of the inherent reference-counting issues. You can track the progress in aten/src/ATen/native/native_functions.yaml, where each kernel is registered:

Search for legacy::cpu::_th and divide that number by 3 for the number of old TH kernels.
Search for legacy::cpu::_thnn and divide that number by 3 for the number of old TH neural network kernels.

Each kernel is typically registered in 3 different ways:
1. Regular kernel: y = add(a, b)
2. In-place kernel: a = add_(a, b)
3. Output kernel: add_out(a, b, out=y)
The actual implementation is always in the output kernel; the other two call that function (see the sketch below).
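A rough sketch of that pattern (names and signatures are illustrative only, not the actual ATen declarations):

#include <ATen/ATen.h>
using at::Tensor;

// The out-variant holds the implementation; the functional and
// in-place variants forward to it.
Tensor& add_out(Tensor& result, const Tensor& self, const Tensor& other) {
  // ... the actual element-wise work would happen here ...
  return result;
}

Tensor add(const Tensor& self, const Tensor& other) {
  Tensor result = at::empty_like(self);
  return add_out(result, self, other);
}

Tensor& add_(Tensor& self, const Tensor& other) {
  return add_out(self, self, other);
}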

The nn kernels tend to be easier to port because they have fewer dependent kernels. Hence, if you can port the kernels in the reverse order of how they were implemented, you will do less overall work.

Checking the porting tracking issue https://github.com/pytorch/pytorch/issues/24507 , also cc @VitalyFedyunin

Here is a status update on complex number support as requested in #32437. I am back working on CPU related support today.

Autograd Support

  • I haven't had much time for this.
  • angle(), real(), imag(), conj() were all implemented.
  • abs() will require a separate implementation for complex numbers. (see notes from @boeddeker and @Randl above)

Hardware support

The complex number support is currently implemented out-of-tree. Here is what that means:

In-Tree Code

  • The math ops are actually implemented in-tree (inside the PyTorch source code).
  • None of the in-tree Pytorch tests validate complex number support (so things tend to break).
  • PyTorch is migrating from TH to ATen (#24507).
    - Math kernels implemented in TH do not support complex numbers.
    - Only kernels implemented in ATen can support complex numbers.
  • You need to install a PyTorch extension to enable the complex dtype.

Out-Of-Tree Code

  • Several PyTorch Extensions are implemented and can be easily internalized in the future:
  • Each extension has four files of importance:

    • setup.py: build and install the pytorch extension.

    • [CPU/CUDA/FPGA]ComplexType.cpp: Math kernel registration similar to CPUType.cpp or CUDAType.cpp

    • test/test_torch.py: Very ugly test cases that indicate which kernels are working, limited by ATen support.

    • test/test_autograd.py: Testing autograd functionality.

Out-Of-Tree PyTorch Extensions

  1. cpu_strided_complex
    - [x] CopyKernel
    - [ ] TensorFactories (th_random, th_uniform, th_normal)
    - [x] RangeFactories
    - [x] BinaryOpKernels
    - [x] UnaryOpKernels
    - [x] CompareOpKernels
    - [x] PowKernels
    - [x] ReduceOpKernels (th_min, th_max, some norms don't compute complex conjugate)
    - [ ] IndexOpKernels (th_masked_select_bool, th_set, th_gather, th_cat)
    - [x] PointwiseOps
    - [x] Lerp Ops
    - [ ] BlasOps (th_mv, th_mm, th_fmod, th_eig)
    - [ ] LinpackOps (uses Blas)
    - [ ] SpectralOps (supported, but needs work)

  2. cuda_strided_complex
    - [x] CopyKernel
    - [ ] TensorFactories (see cpu problems)
    - [x] BinaryOpKernels
    - [x] UnaryOpKernels (Todo: add angle, real, imag, conj)
    - [ ] CompareOpKernels (Todo)
    - [ ] ReduceOpKernels (error messages regarding WARP_SIZE)
    - GPU becomes difficult beyond point-wise computations; however, additional functions can be supported.

  3. vitis_strided_complex
    - Xilinx Vitis is an FPGA high-level synthesis platform that supports server and embedded devices.
    - Released in October 2019 (likely limited device support).
    - Combines SDAccel (server) with Vivado HLS (embedded).
    - [ ] BinaryOpKernels (end of Jan)
    - [ ] UnaryOpKernels (end of Jan)
    - [ ] ReduceOpKernels (end of Feb)
    - FPGAs recently added vectorized-object support similar to the Vec256 PyTorch template class.
    - Vec256 was how complex support was added to the CPU, and it seems like the more natural way to implement $C$ or $R^N$ tensor spaces.

More updates to come on this issue: https://github.com/pytorch/pytorch/issues/33152

This may or may not deserve a separate issue, but I would appreciate seeing and think it is practically more important currently to have in the documentation something that explains 'how pytorch works with complex numbers right now'. aka can do addition, multiplication, some sort of norm, can't have complex weights, etc. all of which can be summarized by a few lines of documentation explaining what is the high-level intended current behavior.

This may or may not deserve a separate issue, but I would appreciate seeing and think it is practically more important currently to have in the documentation something that explains 'how pytorch works with complex numbers right now'. aka can do addition, multiplication, some sort of norm, can't have complex weights, etc. all of which can be summarized by a few lines of documentation explaining what is the high-level intended current behavior.

Hi @redwrasse, thanks for the feedback! We currently have a note on complex numbers on master which covers some torch fundamentals and the complex functions supported for complex tensors (most of which are included in the 1.6 release): https://pytorch.org/docs/master/complex_numbers.html?highlight=complex. Can you share what other functions you are interested in? Happy to talk more about our current support and the plan for upcoming releases.

Thanks @anjali411, it's great to see this documentation; I wasn't aware of it previously. I think what's needed, front and center, is a few lines on the current state of support for complex neural networks, but let me go through it...

People who are interested in complex autograd may be interested in https://github.com/pytorch/pytorch/issues/41857, which touches on which convention PyTorch will follow (JAX or TF).
