Julia: new syntax for transpose

Created on 15 Mar 2017  ·  103 Comments  ·  Source: JuliaLang/julia

Now that .op is generally the vectorized form of op, it's very confusing that .' means transpose rather than the vectorized form of ' (adjoint, aka ctranspose). This issue is for discussing alternative syntaxes for transpose and/or adjoint.

linear algebra parser

All 103 comments

Andreas tried Aᵀ (and maybe Aᴴ) in #19344, but it wasn't very well received. We could similarly pun on ^ with special exponent types T (and maybe H) such that A^T is transpose, but that's rather shady, too. Not sure there are many other good options that still kinda/sorta look like math notation.

I kind of think that t(A) might be the best, but it's unfortunate to "steal" another one-letter name.

Moving my comment from the other issue (not that it solves anything, but...):

+1 for using something else than .'.

I couldn't find languages with a special syntax for transposition, except for APL, which uses the not-so-obvious ⍉, and Python, which uses *X (which would be confusing for Julia). Several languages use transpose(X); R uses t(X). That's not pretty, but it's not worse than .'. At least you're less tempted to use ' by confusing it with .': it would be clear that these are very different operations.

See Rosetta code. (BTW, the Julia example actually illustrates conjugate transpose...)

Could one of the other ticks be used? ` or "

-100 to changing adjoint, since it's one of the awesome things that makes writing Julia code as clear as writing math, plus conjugate transpose is usually what you want anyway so it makes sense to have an abbreviated syntax for it.

As long as we have the nice syntax for conjugate transpose, a postfix operator for regular transpose seems mostly unnecessary, so just having it be a regular function call seems fine to me. transpose already works; couldn't we just use that? I find the t(x) R-ism unfortunate, as it's not clear from the name what it's actually supposed to do.

Using a different tick would be kind of weird, e.g. A` can look a lot like A' depending on the font, and A" looks too much like A''.

If we make the change in #20978, then a postfix transpose actually becomes more useful than it is now. e.g. if you have two vectors x and y and you want to apply f pairwise on them, you can do e.g. f.(x, y.') ... with #20978, this will be applicable to arrays of arbitrary types.

Honestly, I think our best option is still to leave it as-is. None of the suggestions seem like a clear improvement to me. .' has the advantage of familiarity from Matlab. The . actually is somewhat congruent with dot-call syntax in examples like f.(x, y.'), and suggests (somewhat correctly) that the transpose "fuses" (it doesn't produce a temporary copy thanks to RowVector and future generalizations thereof).

In fact, we could even take it further, and make f.(x, g.(y).') a fusing operation. i.e. we change .' transpose to be non-recursive ala #20978 and we extend its semantics to include fusion with other nested dot calls. (If you want the non-fusing version, you would call transpose.)

I like that plan a lot, @stevengj.

One wrinkle: presumably the @. macro does not turn y' into y.' (since that would be wrong). It could, however, turn y' into some kind of fused adjoint operation.

The main problem is finding a clean way to make f.(x, g.(y).') have fusing semantics. One possibility would be to transform it to f.(x, g.(y.')) and hence to broadcast(x,y -> f(x, g(y)), x, y.')?

Note that, for this to work properly, we might need to restore the fallback transpose(x) = x method, in which case we might as well let transpose remain recursive.

I think deciding whether transpose should be recursive or not is orthogonal to whether we make it participate in dot syntax fusion. The choice of making it non-recursive is not motivated by that.

@StefanKarpinski, if we restore a fallback transpose(x) = x, then most of the motivation for changing it to be non-recursive goes away.

What's the problem if the fallback is restored but we still have the transpose be non-recursive?

@jebej, recursive transpose is more correct when it is used as a mathematical operation on linear operators. If I remember correctly, the main reason for making it non-recursive was so that we don't have to define the transpose(x) = x fallback, rather than throwing a MethodError.

But it would not be terrible to have the fallback but still be non-recursive.

Let me add two comments (I have looked through the earlier discussion and did not notice them - sorry if I have omitted something):

  • documentation for permutedims says This is a generalization of transpose for multi-dimensional arrays. Transpose is equivalent to permutedims(A, [2,1]).; it is a bit misleading as it suggests on the first read a generalization of transpose function, which it is not.
  • How is one supposed to transpose a vector x=["a", "b"]? Actually y=x.' works and creates a new variable, but getindex fails on it. AFAIK you have to use reshape(x, 1, :) or the much slower hcat(x...) to achieve it, but it is unnatural to have a different syntax for Vector (permutedims does not work here).

What is your use case for transposing a vector of strings?

Consider the following scenario for instance:

x = ["$(j+i)" for j in 1:3, i in 1:5]
y = ["$i" for i in 5:9]

and I want to append y after the last row of x. And the simplest way is to vcat a transpose of y.

Comes up in practice when incrementally logging text data to a Matrix{String} (I could use Vector{Vector{String}}), but often matrix is more useful (or then again there is a question how to convert Vector{Vector{String}} to Matrix{String} by vertically concatenating consecutive elements).
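For readers following along, here is a minimal sketch of this row-append scenario. It assumes a Julia 1.x release (newer than the versions discussed in this thread), where permutedims accepts a vector and returns a 1×n matrix without touching the elements. The names row, row_view and x2 are only illustrative.

```julia
x = ["$(j+i)" for j in 1:3, i in 1:5]   # 3×5 Matrix{String}
y = ["$i" for i in 5:9]                 # 5-element Vector{String}

# permutedims only rearranges indices, so it works for any element type.
row = permutedims(y)                    # 1×5 Matrix{String}
x2 = vcat(x, row)                       # 4×5 Matrix{String}

# reshape produces the same shape and shares memory with y.
row_view = reshape(y, 1, :)
```

Either form avoids calling an adjoint or transpose on the strings themselves.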

Another use-case: transposing is the simplest way to make two vectors orthogonal to each other in order to broadcast a function over the cartesian product (f.(v, w.')).
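A small sketch of that cartesian-product pattern, assuming Julia 1.x, where the deprecated .' is spelled permutedims(w) or transpose(w). The function f and the vectors are made up for illustration.

```julia
f(a, b) = a * b
v = [1, 2, 3]
w = [10, 20]

# A length-3 column against a 1×2 row broadcasts to a 3×2 table of all pairs.
table = f.(v, permutedims(w))   # [10 20; 20 40; 30 60]

# For numeric data the lazy transpose wrapper gives the same result.
same = f.(v, transpose(w))
```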

Data point: Yesterday I encountered a party confused by the postfix "broadcast-adjoint" operator and why it behaves like transpose. Best!

FWIW, I strongly feel that we should get rid of the .' syntax. As someone more familiar with Julia than with Matlab, I expected it to mean vectorized adjoint and I got really tripped up when it didn't. Julia isn't Matlab and shouldn't be bound by Matlab's conventions - if in Julia, a dot means vectorization of the adjacent function, then this should be consistent across the language and shouldn't randomly have the one horrible exception that .' is formally unrelated to '.

I think it's fine to just have transpose without any special "tick" notation, since the vast majority of the time, it's called on a matrix of real numbers, so ' would be equivalent if you really want to save typing. If we want to make a fusing version of transpose, then I really don't think that .' is the right syntax.

That's a good point. Arguably only the adjoint needs super-compact syntax.

Let's just call this transpose and deprecate .'. In the future, we can consider if we want .' as pointwise adjoint or if we just want to leave it perma-deprecated to avoid trapping Matlab users.

Note that I just grepped the registered packages and found 600+ usages of .', so it's not terribly rare. And with dot calls / broadcast (which only in 0.6 began to fully handle non-numeric data), the desire to lazily transpose non-numeric arrays (where adjoint makes less sense) will probably become much more common, so the argument for a compact syntax is somewhat strengthened.

Then we'd better deprecate .' as soon as possible, before more code gets trapped in a bad usage pattern.

Why is it bad?

The problem is that .' now doesn't mean what it seems to mean as dotted operator.

As I said above, because it violates the general pattern that . means vectorization, and looks like it means vectorized adjoint (especially to someone who's not familiar with Matlab).

I think @stevengj makes a good point — this is tied to the desire for a simple non-recursive transpose.

I know it was unpopular, but I'm starting to favor Andreas' #19344 for ᵀ. At this point, I'd favor deprecating the use of _all_ superscripts as identifiers, and interpret _any_ trailing superscripts as postfix operators. This also gives a path towards resolving some of the kludginess around literal_pow using superscript numbers. Yes, it'd be sad to lose χ² and such as variable names, but I think the benefits would outweigh the downsides.

At this point, I'd favor deprecating the use of _all_ superscripts as identifiers, and interpret _any_ trailing superscripts as postfix operators.

RIP my code
[screenshot from 2017-11-09 22-08-25]

At this point, I'd favor deprecating the use of all superscripts as identifiers

I really don't think that would be necessary, when we just want T and maybe a couple other things in the future.

A foolish consistency…

Yes, it's slightly inconsistent to use .' for transpose, but all of the alternatives proposed so far seem to be worse. It's not the worst thing in the world to say ".' is transpose, an exception to the usual rule about dot operators." You learn this and move on.

One thing to note that may help with any potential confusion over .' not being a dot broadcast is that it's a postfix operator, whereas prefix broadcasting is op. and infix is .op. So we can say that . doesn't mean broadcast when it's postfix. The other use of postfix . is field lookup, and getfield(x, ') doesn't make sense, so it's distinct from other meanings.

(That said, I favor transpose(x) over keeping .'.)

@stevengj I would bet that many (perhaps most) of the 600+ uses of .' in the registered packages that you mentioned above could be replaced by ' at no cost to readability, and the code would continue to work.

Possibly not popular, but there could still be postfix " and `?

uses of .' in the registered packages that you mentioned above could be replaced by ' at no cost to readability, and the code would continue to work.

Note that once #23424 lands, we will be able to use transpose on arrays of strings and so-on, but not adjoint. Best practice for linear algebra use of x.' will most likely become something like conj(x') (hopefully this is lazy, i.e. free). While I love using .' for its compactness, perhaps getting rid of it will force linear algebra users to use the correct thing and arrays-of-data users to use spelled-out transpose.

there could still be postfix " and `?

New syntax for transpose() seems rather premature. IMHO it would be better to just deprecate .' to be replaced as you suggest with conj(x') and transpose as required.

I have a feeling that .' is so useful in matlab mainly because of the matlab insistence that "everything is a matrix" along with the lack of coherent slicing rules such that you often need to insert random transposes in various places to get things to work.

To summarize the arguments here:

  1. .' is now the lone standout as a dotted operator that doesn't mean "apply undotted operator elementwise"; new users not coming from Matlab find this to be a surprising trap.

  2. .' is now effectively ambiguous: did you mean transpose or did you mean conj(x')? In principle, every legacy usage of .' should be vetted to determine whether it's permuting the indices of a 2-dimensional array or whether it's doing an "unconjugated adjoint".

The first issue is problematic but not fatal by itself; the second issue is the really bad one – this is no longer a single coherent operation, but rather it will be split into two separate meanings.

I just noticed that if we ever changed .' to mean "elementwise adjoint", then conj(x') would be roughly equivalent to x'.' and conj(x)' would roughly be x.'' which is sooo close to x.' 😬.
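To make that hypothetical concrete, here is a sketch that spells the "elementwise adjoint" reading with ordinary broadcast syntax, adjoint.(x), since the dotted postfix form itself is not assumed to exist. It only covers a plain numeric matrix, where adjoint of an entry is its complex conjugate.

```julia
x = [1+2im 3-1im; 4im 5+0im]

# Hypothetical reading of a dotted adjoint: apply adjoint to each entry.
elementwise = adjoint.(x)     # equals conj(x), no index swap

a = adjoint.(x')              # what "x'.'" would mean under that reading
b = adjoint.(x)'              # what "x.''" would mean under that reading

all_agree = a == conj(x') == transpose(x) && b == transpose(x)   # true
```

For numeric matrices both spellings collapse to the plain transpose, which is why the near-identical syntax would be so easy to confuse.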

Possibly not popular, but there could still be postfix " and `?

Copy pasting code into Slack and seeing that destroy syntax highlighting would be...

Being able to transpose anything is nice because it makes it easy to "cross product" via the dispatch mechanism, and other short concise use cases like that. The issue with not having an easy fallback for this kind of stuff is that invariably the hack that we'll see is to just define transpose(x) = x fallbacks (or on Base types, so type-piracy in packages) to make this kind of thing work easily. That makes me think: why isn't Complex the odd one? Adjoint of most numbers is itself, so adjoint of complex is the one to specialize on: can't that be extended beyond numbers?

I see two very related things here:

1) x' doesn't work for non-number types, so we want a way to easily do this for other data
2) transpose(x) is not as simple as x.'. This is mostly for the cases of (1), since the use cases for transposing complex matrices are much more rare.

But instead of going down (2), why not try and do a reasonable fix for (1)?

Maybe a reasonable fix is just a macro that makes ' mean transpose instead of adjoint?

But instead of going down (2), why not try and do a reasonable fix for (1)?

We've already been down that path and several adjacent to it. There's been a large amount of resulting discussion which perhaps someone else can distill, but in summary, it doesn't work out well. Fundamentally, the mathematical adjoint operation does not make sense on things that are not numbers. Using ' on non-numbers just because you like the terse syntax is bad – it's the worst kind of operator punning and it shouldn't be surprising that bad things ensue from this kind of abuse of meaning. The adjoint function should only be defined on things it makes sense to take the adjoint of and ' should only be used to mean that.

Remember that .' as currently used is fundamentally two different operations: array transposition and non-conjugate adjoint. The recursive transpose problem highlights the fact that these are different operations and that we therefore need different ways to express them. The mathy folks seem adamant that the non-conjugate adjoint operation is (a) important, and (b) different from simple swapping of dimensions. In particular, to be correct, non-conjugate adjoint should be recursive. On the other hand, swapping dimensions of a generic array should clearly not be recursive. So these operations need to be written differently, and existing usages of .' need to be disambiguated as having one meaning or the other. Deprecating .' is a way to force this.

Finally, while I feel strongly that permutedims(x, (2, 1)) is definitely too inconvenient for swapping the dimensions of a 2d array, I find the argument that transpose(x) is too inconvenient unconvincing. Is this operation so common that having a simple, clear function name for it is too much? Really? Is swapping the dimensions of an array that much more common or important than all the other things in the language that we use function names and function call syntax for? Householder notation does make adjoint quite special since we want to write things like v'v, v*v' and v'A*v. That's why adjoint gets really nice syntax. But swapping the dimensions of an array? It does not warrant an operator in my opinion.
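The recursive versus structural distinction drawn above is easiest to see on a matrix whose elements are themselves matrices. The sketch below assumes the behaviour of released Julia 1.x, where transpose is recursive and permutedims is not. The block values are arbitrary.

```julia
b1 = [1 2; 3 4]
b2 = [5 6; 7 8]
B = reshape([b1, b2], 1, 2)    # 1×2 matrix whose elements are 2×2 blocks

P = permutedims(B)             # swaps the outer indices only
R = transpose(B)               # swaps outer indices and transposes each block

untouched = P[2, 1] == b2                 # true: block left as-is
recursed  = R[2, 1] == transpose(b2)      # true: block itself transposed
```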

Not a strong argument, but I often use the ' operator for printing arrays more compactly (when used as simple containers), for example when I want to see the content of a few vectors at the same time on my screen (and invariably get frustrated when it fails because elements can't be transposed). So a short syntax for the REPL is definitely handy. (Also, this makes it easier for people used to row-major arrays to have a simple way to "switch the order", in particular when porting algorithms to Julia using 2d arrays; but definitely not a strong argument either.) Just to say that it's a nice terse syntax which is not useful only to linear algebraists.

I had commented some syntax ideas at https://github.com/JuliaLang/julia/pull/19344#issuecomment-261621763, basically it was:

julia> const ᵀ, ᴴ = transpose, ctranspose;

julia> for op in (ᵀ, ᴴ)
           @eval Base.:*(x::AbstractArray{T}, f::typeof($op)) where {T<:Number} = f(x)
       end

julia> A = rand(2, 2)
2×2 Array{Float64,2}:
 0.919332  0.651938
 0.387085  0.16784

julia>  Aᵀ = (A)ᵀ    # variable definition and function application are both available!
2×2 Array{Float64,2}:
 0.919332  0.387085
 0.651938  0.16784

julia> Aᴴ = (A)ᴴ
2×2 Array{Float64,2}:
 0.919332  0.387085
 0.651938  0.16784

But without the hack of course, just the idea that there can be "postfix function application" of sorts and that it demands parentheses (x)f; dotted versions could be like this: (x).f (xf would be an identifier, even with f being a superscript symbol).

This example hack used to work on 0.6 but now:

julia> Aᵀ = (A)ᵀ               
ERROR: syntax: invalid operator

julia> Aᵀ = (A)transpose       
2×2 Array{Float64,2}:          
 0.995848  0.549117            
 0.69401   0.908227            

julia> Aᴴ = (A)ᴴ               
ERROR: syntax: invalid operator

julia> Aᴴ = (A)ctranspose      # or adjoint or whatever
2×2 Array{Float64,2}:          
 0.995848  0.549117            
 0.69401   0.908227            

Which is sad, I originally wanted to do that for powers:

julia> square(n) = n^2; cube(n) = n^3;

julia> Base.:*(n, f::typeof(square)) = f(n)

julia> Base.:*(n, f::typeof(cube)) = f(n)

julia> const ² = square    # why?
syntax: invalid character "²"

julia> const ³ = cube    # why?
syntax: invalid character "³"

Which I naively thought would enable syntax like: n² = (n)² and n³ = (n)³ But any kinda numeric identifier is banned from being at first position, however (A)⁻¹ also worked, where ⁻¹ was const ⁻¹ = inv.

I have implemented a similar hack for InfixFunctions.jl.

As a user I could just do a PostfixFunctions.jl package, and be happy with whatever you find the best here. But currently these syntax restrictions:

  • using numeric superindices at the start of an identifier is not allowed
  • superindex x * ᶠ in postfix (implicit multiplication in the hack) (x)ᶠ not allowed

Seem a little bit too much to me IMHO. I'd like to at least be able to define identifiers that can start with numeric superscripts, or more generally, only disallow actual numeric characters 0-9 with number semantics at the start of an identifier; that would be awesome. 😄

Cheers!

See #10762 for some discussion of other number characters as identifiers.

The other issue is related to #22089, operator suffixes. +ᵀ is now a valid operator, which (probably accidentally) disallowed identifiers consisting only of combining characters in contexts where an operator might be expected. That seems like a bug to me. It's also a bit odd that ᵀ is a valid identifier but -ᵀ does not do -(ᵀ). However that's not the end of the world, and IMO fixing it would not be worth losing other possible uses of ᵀ.

Note that using .' as a postfix transpose operator is not even on the table here (despite what the subject of the issue says), the consideration is actually whether we should keep .' as a postfix operator for non-conjugate adjoint, which would be recursive. This happens to often be the same as transposition, but is not generally the same operation. If linear algebra folks are willing to let .' mean generic array transpose, that's a different story, but my impression is that's not acceptable.

@Ismael-VC, I can see allowing (x)ᵀ as a postfix function syntax for superscripts – since what else would it mean? I think where your proposal starts to rub people the wrong way is allowing any identifier to be applied as a function in the postfix syntax. I would limit it to superscripts.

@StefanKarpinski, I thought that the consensus was precisely to allow .' mean non-recursive, non-conjugate array transposition (if we have this operator at all), while ' is the recursive, conjugate adjoint operation.

I really, really hate the idea of using ᵀ for a postfix transpose operator. It is way too useful to have as a superscript in variable names, like aᵀa or LᵀDL = ltdlfact(A). (Besides the fact that using only ᵀ for an operator while other superscripts are valid in identifiers would be weird.)

That was not my understanding at all – I thought that linalg people were in favor of keeping a.' as is, i.e. meaning conj(a)'. Keeping .' but changing its meaning to array transpose is quite different – I'm not sure how I feel about that. I agree that having only ᵀ as a postfix operator would be annoying and inconsistent. I rather like @Ismael-VC's (a)ᵀ proposal, however, which wouldn't prevent using aᵀ as a name.

My memory of those discussions mirrors Steven's. The recursive, non-conjugated transpose is rare and generally pretty strange. Decent summary here: https://github.com/JuliaLang/julia/issues/20978#issuecomment-316141984.

I think we all agree that postfix ' is adjoint and should stay.
I think we all agree that postfix .' is suboptimal syntax.
I think most agree that non-recursive (structural) transpose is more useful than a recursive transpose.

Ok, so the points everyone seems to agree on:

  1. Use a' for adjoint(a)
  2. Use conj(a)' or conj(a') for the (non-)conjugate adjoint.

So the only point of contention is how to write the array transpose:

  • As a.' or
  • As transpose(a) or
  • As (a)ᵀ.

Is this assessment correct?

Yes, I think so (where the "array transpose" is non-recursive).

Also, as I understand it, everyone agrees that transpose(a) should definitely be valid syntax (and non-recursive), and the only points of disagreement are whether .' and/or (a)ᵀ should be alternate (completely equivalent) valid syntaxes.

Approach (1) from https://github.com/JuliaLang/julia/issues/20978#issuecomment-315902532, which received a good bit of support (e.g. https://github.com/JuliaLang/julia/issues/20978#issuecomment-316080448), remains a possibility. I have a branch realizing that approach (introducing flip(A)) which I can post.

For what it's worth, I support deprecating .'. The confusion and ambiguity in this thread is a strong argument for doing so in itself. Best!

I believe that as long as we have postfix ', folks are going to want to use it to broadcast f over a cartesian product of vectors with f.(v, w'). And folks are going to want to use it to reshape a vector of strings into a row-vector of headers for a table-like structure. So it's compelling to me to have a simple and easy-to-use replacement that we can direct them to.

Here's an option we haven't considered: A*' — a new bigraph. Typical math notation might interpret this as conj(A)', which is actually pretty close to what we want. It had been available on 0.6, but on 0.7 we allow using * to concatenate characters… still workable, though.

I don't believe postfix " and ` are available due to custom string literals parsing beyond the end of a line. Postfix * by itself is also unavailable for the same reason. Postfix prime A′ is probably one of the most commonly used unicode identifiers, so that's even more out than Aᵀ.

Honestly, after looking at my code, I don't use .' like at all, so transpose(a) is probably fine.

Note that I just grepped the registered packages and found 600+ usages of .', so it's not terribly rare.

Was this spot checked at all to see if .' wasn't used where ' would've been okay? I am starting to think that might be true more often than not. Otherwise, the only place where I have seen a legitimate use of .' was before Plots.jl labels would allow a vector (instead wanted a row vector of strings), but that's been changed. For codes where I really need this often, I think I'd start doing T = transpose locally, or throw a macro to change ' to transpose.

@transpose A = A'*A*B'*B*C'*C

would be fine with me for that rare case.
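A rough sketch of what such a macro could look like, assuming Julia 1.x, where a postfix tick parses to an expression whose head is Symbol("'"). The macro name @transpose comes from the comment above; the helper swap_ticks is hypothetical, and this is an illustration rather than an existing package or Base feature.

```julia
# Hypothetical helper: walk an expression and rewrite every postfix ' node
# into an explicit transpose(...) call.
swap_ticks(x) = x                       # symbols, literals: leave alone
function swap_ticks(ex::Expr)
    args = map(swap_ticks, ex.args)
    if ex.head === Symbol("'")
        return Expr(:call, :transpose, args...)
    end
    return Expr(ex.head, args...)
end

macro transpose(ex)
    esc(swap_ticks(ex))
end

A = [1+2im 3; 4 5im]
G = @transpose A' * A                   # evaluates transpose(A) * A
```

Inside the macro call every tick means transpose, so a complex A gives a different result from the plain A' * A.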

folks are going to want to use it to broadcast f over a cartesian product of vectors with f.(v, w'). And folks are going to want to use it to reshape a vector of strings into a row-vector of headers for a table-like structure. So it's compelling to me to have a simple and easy-to-use replacement that we can direct them to.

If it only shows up once in a statement, isn't it okay to just use transpose?

The a*' syntax for conjugate-adjoint is pretty nice, although it doesn't really seem like that's the operation we need a better syntax for. &a will be available soon and suggests swapping things, although it's quite different from traditional notations for this.

Maybe it's time for a straw poll?

How should we spell the structural transpose?

(roughly in order of proposal; no judgements on emoji names here)

  • 👍: A.' — just change the meaning, keep the syntax the same
  • 👎: transpose(A) — no special syntax
  • 😄: t(A) or tr(A) — no special syntax, but export a shorter name
  • 🎉: Aᵀ — with just ᵀ and maybe one or two other superscripts special-cased from identifiers
  • 😕: (A)ᵀ — with all superscripts separated from identifiers behaving like postfix operators
  • ❤️: A*' — gloss right over that uncanny valley, it means a structural transpose
  • If you prefer &A, throw a 🎉 on Stefan's post immediately above (we're out of emoji)

The LinAlg discussions did indeed talk about gifting .' to non-recursive transpose usage, since conj(x') is relatively uncommon. However, ᵀ is mathematical syntax and really should take the mathematical meaning (if anything).

Very strongly oppose making tr(A) mean matrix transpose - everyone's going to think it means matrix trace: https://en.wikipedia.org/wiki/Trace_(linear_algebra)

If we do not deprecate superscripts as identifiers (which probably has to be considered seriously before 1.0), then ᵀ(A) is a possibility too.

In relation to the suggestion (A)ᵀ, my apologies for slightly derailing this discussion with the following remark:

I have never cared very much for having √ available as a unary operator, especially since you'll end up typing √(...) anyway as soon as you want to apply it to a variable that is more than one or a few characters long. Furthermore, I have always found the difference in functioning between a² and √a very artificial. It probably makes sense if you know about Unicode classes etc., but to anyone else this must seem absurd. Sure, it's useful to have a² as a valid variable name, but similarly √a could be a useful variable name to store the square root of a if you need to use it several times. Or consider more complicated expressions like a²b and its square root a√b, where the former is a valid identifier and the latter is not. Above all, I like consistency.

So for consistency, I like the proposal of having postfix operators when using parentheses (A)ᵀ, (a)², in combination with removing the Unicode unary operator √ (and its relatives) so that it can also be used in identifiers (while still accessible as normal function call √(a)).

I agree 100% with what @Jutho said and have thought it on a number of occasions. Would you mind opening an issue, @Jutho? Proposal: allow √ in identifier names, require √(x) for calling as op.

next question -> what about 2 |> √ ?

Let's have discussion of √ in another thread, but in short 2 |> √ means √(2).

Another alternative, which would require no parser changes and be easy to type, would be A^T for transpose (by defining T to be a singleton type with a ^ method). … oh, I see that @mbauman had this idea too. It's a bit ugly, but no more so than A.'.
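A minimal sketch of that singleton idea, assuming Julia 1.x. The type name TransposeMarker is hypothetical, and binding the constant to T is only for illustration, since it would clash with the common use of T as a type parameter name at top level.

```julia
struct TransposeMarker end          # hypothetical singleton type
const T = TransposeMarker()

# We own TransposeMarker, so adding this method is not type piracy.
Base.:^(A::AbstractMatrix, ::TransposeMarker) = transpose(A)

A = [1 2; 3 4]
At = A^T                            # [1 3; 2 4], as a lazy transpose wrapper
```

No parser change is involved: A^T is an ordinary call to ^ that dispatches on the marker type.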

I don't have expertise, but I'm very invested in the outcome of this discussion as someone who will likely be typing thousands of lines containing matrix expressions in the course of my work.

transpose(A) # with no special syntax is winning the vote above, but is painful to my eyes and fingers.

In python the common usage is probably with numpy and a lot of stuff that looks like the following and is not too bad:

import numpy as np
# define matrix X of n columns, with m rows of observations
error = X.dot(Theta.T) - Y
gradient = (1 / m) * (X.dot(Theta.T) - Y).T.dot(X)

I wouldn't want to have to do:

grad = 1/m * transpose(X * transpose(Theta) - Y) * X

It totally changes the mental conception of transpose from the convention that mathematics notation has settled on, which is a postfix signifier, usually Aᵀ or Aᵗ.

Personally I'm very happy with A' which works in Julia v.0.6, before it got taken by adjoint. Is adjoint used very often?

Here are my comments in a table:

Aᵀ or Aᵗ    if the world won't accept unicode operators, let them use transpose(A)
A'          close to math notation, easy to type and *especially* easy to read
A^'         this could signal `^` not to be parsed as Exponentiation.
A.'         conflicts with dotted operator syntax, but at face value OK
A^T or A^t  these are pretty good, but what if variable `T` is meant to be an exponent? 
A.T         same as numpy, same dotted operator collision
t(A)        nesting reverses semantics, 3 keystrokes and two of them with shift key.
transpose(A) with no special syntax     # please don't do this.

Personally I'm very happy with A' which works in Julia v.0.6, before it got taken by adjoint. Is adjoint used very often?

I don't understand, A' has always been the adjoint of A. We used to call the underlying function ctranspose for conjugate transpose, but we renamed it to the equivalent term adjoint with no change in functionality.

If you're doing linear algebra then you're much more likely to want a conjugate transpose anyway, so you'll be typing A' rather than transpose(A). The popularity of not defining a special syntax for non-conjugate transposes is (presumably) due in part to the fact that it's really not that common for most linear algebraic uses to want the non-conjugate transpose.

If you're doing linear algebra then ...

If your tool is hammer then ... :)

... you have to think about the possibility that Julia could grow into a general programming language.

Maybe not; maybe it will remain a linear algebra argot, which is a possibility that programmers like me have to think about. :)

@mahiki, your NumPy example:

import numpy as np
# define matrix X of n columns, with m rows of observations
error = X.dot(Theta.T) - Y
gradient = (1 / m) * (X.dot(Theta.T) - Y).T.dot(X)

would be written literally in Julia as:

error = X*Θ' - Y
gradient = (1/m) * (X*Θ' - Y)' * X

or assuming that vectors are rows in that NumPy example and would be columns in Julia:

error = X'Θ - Y
gradient = (1/m) * (X'Θ - Y) * X'

which seems about as clear and mathematical as it gets. If your data is real, then adjoint and transpose are the same operation, which may be why you're using transpose above – but mathematically adjoint is the right operation. As @ararslan said, X' has always meant adjoint in Julia (and in Matlab as well). It was previously called ctranspose short for "conjugate transpose" but that name was a misnomer since the defining property of the operator is that

dot(A*x, y) == dot(x, A'y)

which is the defining property of the Hermitian adjoint but happens to be satisfied by the conjugate transpose when A is a complex matrix. That's why "adjoint" is the right generic term for this operator.
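As a quick numerical check of that defining property, the following sketch uses a small complex matrix with arbitrarily chosen values. It relies on dot from the LinearAlgebra standard library, which conjugates its first argument, and compares with ≈ because the entries are floating point.

```julia
using LinearAlgebra

A = [1+2im 3-1im; 0.5im 2]
x = [1+1im, 2]
y = [3, 1-2im]

lhs = dot(A * x, y)

# The identity holds with the adjoint A' ...
holds_for_adjoint = lhs ≈ dot(x, A' * y)                # true

# ... but not with the plain transpose once A has complex entries.
holds_for_transpose = lhs ≈ dot(x, transpose(A) * y)    # false
```

For a real matrix the two checks agree, which matches the remark above that adjoint and transpose coincide on real data.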

All that said, I voted above for both transpose(a) and a.' since I think it would be fine for a.' to mean structural transpose. It would work as expected, and even though it would not be recursive and therefore not "mathematically correct" in some generic code, having it work as expected seems good enough. And telling people to consider using conj(a') in generic code seems like an education thing rather than something we really need to hit people over the head with.

@mahiki If for some reason you really do need to use transpose instead of adjoint many times in your code, then you could define a shorter macro like @t that aliases transpose (although I know this solution isn't ideal, especially if you're writing your code with other people).

you have to think about the possibility that Julia could grow into a general programming language.

@Liso77 It already is. As just one of many examples, Nanosoldier runs a web server that listens for GitHub events and runs performance benchmarks on demand, all in Julia. That's a digression though, and I don't want this thread to get off topic.

If you're transposing a matrix of some kind with non-numeric data—which is an entirely valid use case—mathematical transpose notation actually seems like a bad pun. In that case I feel that it would be better to be more explicit about what you're asking for, e.g. transpose (or even permutedims, depending on your specific needs).

If you're transposing a matrix of some kind with non-numeric data—which is an entirely valid use case—mathematical transpose notation actually seems like a bad pun.

Since A.' is not really "mathematical transpose notation" in the usual sense, I don't see that this is an argument for or against.

I think that @ararslan isn't arguing against the existing .' but rather against introducing a superscript-T syntax. I tend to agree - if you mean the linear-algebra concept of adjoint, then you should use ' (even if your matrix happens to be real). And if you have a matrix of non-numeric data, then it's of course perfectly legitimate to permute the two indices, but this operation isn't really the "transpose" as we usually think of it, and using mathematical superscript-T notation is probably more likely to confuse than to clarify. The only situation where a superscript-T notation would really be appropriate is if you have a numerical matrix whose indices you want to permute, but you really don't want the adjoint linear operator. Such situations certainly exist, but may be too rare to justify introducing new syntax.

... but this operation isn't really the "transpose" as we usually think of it, ...

If this is so unusual, why did @ararslan and many others vote for spelling structural transpose as transpose(A)?

Thank you @StefanKarpinski @ararslan @ttparker. I had to go back to my linear algebra text and rediscover the adjoint; it's in there all right. I took that course before Complex Analysis, which is probably why I took no notice of it.

I love being able to do this
gradient = (1/m) * (X'Θ - Y) * X'

My confusion stems from the widespread use of 'transpose' (as a superscript T) in reference docs, papers, textbooks, etc., for example Andrew Ng's Stanford CS229 lecture notes, where the corresponding Julia code would use adjoint as in @StefanKarpinski's clean-looking example above. This is because adjoint and transpose are equivalent in ℝ (right?). update: yup

Now my favored notation for transpose is simply whatever is logically consistent. Clearly .' is not because of conflict with dotted operator syntax, and I have no objection to transpose(A) with no special syntax, since no sane special syntax seems available, except for a unicode superscript.

I like @ttparker's solution if I do find myself writing a lot of transposes: a macro @t that aliases transpose.

Again, I was mistaken to state:

transpose(A) with no special syntax # please don't do this.

Thank you for taking my comments seriously despite my poor facility with graduate level mathematics.

(From discourse.)

I'd like ' to be a post-fix operator that maps f' to '(f) where Base.:'(x::AbstractMatrix) = adjoint(x) and the user is free to add other methods that have nothing to do with adjoints. (For example, some people might like f' to refer to df/dt.)

With the operator suffixes introduced in 0.7, it would then be natural for f'ᵃ to map to 'ᵃ(f), and so on, allowing the user to define his own postfix operators. This would make it possible to have Base.:'ᵀ(x::AbstractMatrix) = transpose(x) and Base.:'⁻¹(x::Union{AbstractMatrix,Number}) = inv(x), etc.

Writing A'ᵀ is perhaps not as clean as Aᵀ, but it wouldn't require deprecating variable names ending in ᵀ.
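As a rough sketch of how this proposal looks in practice: to my knowledge, Julia 1.6 and later parse a suffixed postfix operator such as a'ᵀ as a call to the function named var"'ᵀ", which the user is free to define. That spelling differs from the Base.:'ᵀ form suggested above, and the version requirement is an assumption of this example rather than something stated in the thread.

```julia
# Assumption: Julia 1.6 or later, where A'ᵀ is parsed as var"'ᵀ"(A).
# The operator is not defined by default; this definition is user code.
var"'ᵀ"(x::AbstractMatrix) = transpose(x)

A = [1+2im 3; 4 5im]

B = A'ᵀ   # user-defined postfix transpose (no conjugation)
C = A'    # built-in adjoint (conjugate transpose)
```

Because the definition lives in the current module rather than in Base, it does not change the meaning of ' for anyone else's code.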

At first glance that looks like it could be a non-breaking feature. It's a very clever compromise. I like it.

Seems reasonable to me. The hardest part is coming up with a name for the ' function—prefix syntax does not work in this case.

The hardest part is coming up with a name for the ' function

apostrophe? Might be too literal...

Is it at all possible to make prefix syntax work (e.g., with an explicit (')(A) syntax)? If not, then that's a problem as it'd break the if-you-can-define-the-symbol-name-then-you-can-override-its-syntax rule introduced by https://github.com/JuliaLang/julia/pull/26380.

Edit: looks to be available:

julia> (')(A)
ERROR: syntax: incomplete: invalid character literal

julia> (')(A) = 2
ERROR: syntax: incomplete: invalid character literal

Unfortunately ' is one of the hardest characters to use as an identifier name, since it introduces a different kind of atom (characters), which has very high precedence (equal to the precedence of identifiers themselves). For example, is (')' an application of ' to itself, or an open paren followed by a ')' literal?

One option that's not practical in the short term is to declare character literals to be not worth ', and use a string macro like c"_" instead.

How about if ' parses as an identifier when preceded by dot-colon, so that Base.:' would work?

Of course (@__MODULE__).:'(x) = function_body might be a bit cumbersome to write, but (x)' = function_body should work the same. Edit: No, since (x)' should map to calling the ' in Base. Defining a ' function in the current module would be cumbersome, but there wouldn't be any reason to do it either.

Or how about letting '' parse as the identifier ' when it would otherwise have parsed as an empty character literal (which is currently a parsing-level error). Similarly, ''ᵃ would parse as the identifier 'ᵃ, etc.

Everything that is not currently a syntax error would still parse as before (for example 2'' is postfix ' applied twice to 2), but 2*'' would now parse as two times '.

It seems confusing that we would have a'' === a but ''(a) === a'. It seems better to use Base.apostrophe as the name instead (or something like that).

Might it be better to split this discussion off into a new GitHub issue, since it's about ' syntax that's not directly related to matrix transposition?

Is there an automated way of splitting issues, or should I simply open a new one and link to the discussion here?

The latter

The only situation where a superscript-T notation would really be appropriate is if you have a numerical matrix whose indices you want to permute, but you really don't want the adjoint linear operator. Such situations certainly exist, but may be too rare to justify introducing new syntax.

I guess I'm way too late for the discussion, but I'd like to point out one use that I think is worth mentioning: applying complex-step differentiation to a real-valued function which has a transpose inside of it. (I personally figured out that I needed .' in MATLAB and Julia for this particular reason.)

I'll give an example with multiple occurrences of transpose (maybe I could avoid doing it this way?)

using LinearAlgebra

# f : Rⁿ → R
#     x  ↦ f(x) = xᵀ * x / 2
f(x) = 0.5 * transpose(x) * x

# Fréchet derivative of f
# Df : Rⁿ → L(Rⁿ, R)
#      x  ↦ Df(x) : Rⁿ → R (linear, so expressed via multiplication)
#                   h  ↦ Df(x)(h) = Df(x) * h
Df(x) = transpose(x) 

# Complex-step method version of Df
function CSDf(x) 
    out = zeros(eltype(x), 1, length(x))
    for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i] = imag(f(x2)) / h
    end
    return out
end

# 2nd Fréchet derivative
# D2f : Rⁿ → L(Rⁿ ⊗ Rⁿ, R)
#       x  ↦ D2f(x) : Rⁿ ⊗ Rⁿ → R (linear, so expressed via multiplication)
#                     h₁ ⊗ h₂ ↦ D2f(x)(h₁ ⊗ h₂) = h₁ᵀ * D2f(x) * h₂
D2f(x) = Matrix{eltype(x)}(I, length(x), length(x))

# Complex-step method version of D2f
function CSD2f(x)
    out = zeros(eltype(x), length(x), length(x))
    for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i, :] .= transpose(imag(Df(x2)) / h)
    end
    return out
end 

# Test on random vector x of size n
n = 5
x = rand(n)
Df(x) ≈ CSDf(x)
D2f(x) ≈ CSD2f(x)

# test that the 1st derivative is correct Fréchet derivative
xϵ = √eps(norm(x))
for i = 1:10
    h = xϵ * randn(n) # random small h
    println(norm(f(x + h) - f(x) - Df(x) * h) / norm(h)) # Fréchet check
end

# test that the 2nd derivative is correct 2nd Fréchet derivative
for i = 1:10
    h₁ = randn(n) # random h₁
    h₂ = xϵ * randn(n) # random small h₂
    println(norm(Df(x + h₂) * h₁ - Df(x) * h₁ - transpose(h₁) * D2f(x) * h₂) / norm(h₂)) # Fréchet check
end
# Because f is quadratic, we can even check that f is equal to its Taylor expansion
h = rand(n)
f(x + h) ≈ f(x) + Df(x) * h + 0.5 * transpose(h) * D2f(x) * h

The point being that f and Df must be defined using transpose and must not use the adjoint.
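To make that point concrete, here is a minimal sketch (the helper name cs_partial and the test vector are made up for illustration) comparing the complex-step partial derivatives of the same quadratic written once with transpose and once with adjoint. The adjoint version conjugates the perturbed input, so the imaginary part that carries the derivative cancels.

```julia
using LinearAlgebra

f_transpose(x) = 0.5 * (transpose(x) * x)  # analytic in x: complex step works
f_adjoint(x)   = 0.5 * (x' * x)            # conjugates x: complex step breaks

# Complex-step estimate of ∂f/∂xᵢ (hypothetical helper for this sketch).
function cs_partial(f, x, i; h = 1e-50)
    z = complex.(x)        # complex copy of the real input
    z[i] += im * h         # tiny imaginary perturbation of component i
    return imag(f(z)) / h
end

x = [1.0, 2.0, 3.0]
grad_transpose = [cs_partial(f_transpose, x, i) for i in eachindex(x)]
grad_adjoint   = [cs_partial(f_adjoint, x, i) for i in eachindex(x)]
```

Here grad_transpose recovers the true gradient x, while grad_adjoint is identically zero because x' * x is real for any complex x.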

I don't think the complex-step method is super relevant in Julia. Isn't it a neat hack/workaround to get automatic differentiation in cases where a language supports efficient builtin complex numbers, but an equivalently efficient Dual number type can't be defined? That's not the case in Julia, which has really nice automatic differentiation libraries.

I agree about using Dual numbers instead of the complex-step method and that's a very good point you are making (I personally have already replaced all my complex-step-method evaluations with dual-number ones in Julia). However, I do think that this is still a valid use case, for demonstration purposes, teaching tricks (see, e.g., Nick Higham talking about the complex-step method at JuliaCon 2018), and portability (in other words, I worry that MATLAB's version of the code above using complex numbers would be cleaner).

Coming from the world of engineers and possibly physicists who use complex arrays more than real arrays, not having a transpose operator is a bit of a pain. (Complex phasor representation for a harmonic time dependency is ubiquitous in our field.) I personally would favor the numpy syntax of x.H and x.T, though my only consideration is conciseness.

The density of the transpose operator relative to Hermitian transpose is about 1 to 1 in my code. So the unconjugated transpose is equally important to me. A lot of the use of transpose is to create outer products and to size arrays correctly for interfacing to other code or for matrix multiplication.
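For readers less used to complex data, a small sketch of why the two operations are not interchangeable when building outer products (the vectors are made-up values):

```julia
# Illustrative complex vectors (hypothetical values).
u = [1+1im, 2+0im]
v = [0+1im, 1+0im]

outer_unconjugated = u * transpose(v)  # entries u[i] * v[j]
outer_conjugated   = u * v'            # entries u[i] * conj(v[j])
```

The two results differ wherever v has a nonzero imaginary part, so silently swapping ' for transpose changes the numbers, not just the notation.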

I intend for now to simply provide a macro or one character function for the operation, however what is the proper equivalent to the old functionality, transpose() or permutedims()?

transpose is intended for linear algebra and is recursive, and permutedims is for non-recursive arrangement of data of any type.
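The difference shows up as soon as the elements are themselves arrays. A minimal sketch with a made-up block matrix and a matrix of strings:

```julia
# A 1×2 matrix whose elements are themselves 2×2 matrices (illustrative data).
blocks = Matrix{Matrix{Int}}(undef, 1, 2)
blocks[1, 1] = [1 2; 3 4]
blocks[1, 2] = [5 6; 7 8]

t = transpose(blocks)    # lazy wrapper; also transposes each block
p = permutedims(blocks)  # new array; blocks are moved but left untouched

t_block = t[2, 1]        # the block [5 6; 7 8], transposed
p_block = p[2, 1]        # the block [5 6; 7 8], as is

# Non-numeric data: permutedims only rearranges the elements.
names = ["a" "b" "c"]            # 1×3 Matrix{String}
column = permutedims(names)      # 3×1 Matrix{String}
```

For plain numeric matrices the two give equal results, so the choice mostly matters for nested or non-numeric element types.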

It’s interesting you say you use transpose as much as adjoint. I used to be the same, but mostly because I tended to make mistakes where my data was real so I tended to transpose but actually the adjoint was the correct operation (generalized to the complex case - adjoint was the right operation for my algorithm). There are (many) valid exceptions, of course.

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform. It seems like @mattcbro is talking about this kind of application.

That being said, when reading up on syntax discussions, I am often contemplating that for me, personally, I could not imagine that a slightly more verbose syntax is the thing that slows down my programming speed or efficiency. Thinking about the algorithm itself and the most natural/efficient way of implementing it takes much more time.

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform.

Not necessarily. Often you want time-average quantities from the Fourier amplitudes, in which case you use the complex dot product, e.g. ½ℜ[𝐄*×𝐇] is the time-average Poynting flux from the complex Fourier components and ¼ε₀|𝐄|² is a time-average vacuum energy density. On the other hand, since the Maxwell operator is (typically) a complex-symmetric operator ("reciprocal"), you often use an unconjugated "inner product" for (infinite-dimensional) algebra on the fields 𝐄(𝐱) etc. over all space.

That's true, I had the word often in the first sentence, but removed it apparently :-).

Well, if you want to go there, electromagnetic quantities are even more concisely written in a Clifford-algebraic formulation, often called geometric algebra. Those algebras have multiple automorphisms and antiautomorphisms that play a critical role in the formulation of the theory, especially when considering scattering problems.

These algebras typically have a concise matrix representation and those morphisms are often easily computed via complex transpose, Hermitian transpose and conjugation.

Nevertheless as I stated earlier my primary use of transpose is often to arrange my arrays to interface with other arrays, other code, and to get matrix multiply to work against the correct dimension of a flattened array.

I personally would favor the numpy syntax of x.H and x.T

Easy to implement now in 1.0, and should be efficient:

function Base.getproperty(x::AbstractMatrix, name::Symbol)
    if name === :T
        return transpose(x) 
    #elseif name === :H # can also do this, though not sure why we'd want to overload with `'`
    #    return adjoint(x)
    else
        return getfield(x, name)
    end
end 

This is surprisingly easy and kind of neat. The downside seems to be that orthogonal uses of getproperty don't compose with each other. So anybody implementing getproperty on their specific matrix type will need to implement the generic behavior by hand.

orthogonal uses of getproperty don't compose

Hmm. I wonder if that implies x.T "should" have been lowered to getproperty(x, Val(:T)). I shudder to think what that would do to the poor compiler though.

I’m sure everyone has their opinion - but to me it’s almost a feature that it’s hard to build a generic interface out of dot syntax. Don’t get me wrong, it’s a really great feature and wonderful for defining named tuple-like structs, and so-on.

(It’s also possible to add a Val dispatch layer to your types pretty easily).
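One way such a Val dispatch layer might look, as a sketch on a user-defined type (the names Wrapped and prop are hypothetical, not from the thread): getproperty forwards to a function that dispatches on Val(name), so each property becomes its own method and can be added or overridden independently.

```julia
# Hypothetical wrapper type used only for this sketch.
struct Wrapped{T}
    data::Matrix{T}
end

# Forward every property access to a Val-dispatched function.
Base.getproperty(w::Wrapped, name::Symbol) = prop(w, Val(name))

# Fallback: ordinary field access for any name without a special method.
prop(w::Wrapped, ::Val{name}) where {name} = getfield(w, name)

# Each "virtual" property is an independent method.
prop(w::Wrapped, ::Val{:T}) = transpose(getfield(w, :data))
prop(w::Wrapped, ::Val{:H}) = adjoint(getfield(w, :data))

w = Wrapped([1+2im 3; 4 5im])
wT = w.T      # unconjugated transpose of the wrapped matrix
wH = w.H      # conjugate transpose of the wrapped matrix
raw = w.data  # still reaches the real field through the fallback
```

Since the methods are defined on the package's own type, this avoids overriding getproperty for every AbstractMatrix; whether the compiler constant-folds Val(name) well enough in all cases is not something this sketch demonstrates.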

@c42f's code works like a charm. Unfortunately for me, I'm trying to write code that works on versions 0.6.4 and up, which forces me to use either transpose or my own defined function T(A) = transpose(A). Perhaps a macro would have been a little cleaner and slightly more efficient.

To be clear, I'm not suggesting defining this particular getproperty is a good idea for user code. It's just likely to break things in the longer term ;-) Though perhaps one day we'll have a good enough feel for the consequences that we could have x.T defined in Base.

But in a general sense, I do wonder why this kind of property use for defining "getters" in generic interfaces is actually bad. For example, generic field getter functions currently have a whopping big namespace problem which is simply solved by judicious use of getproperty. It's far nicer to write x.A than to write MyModule.A(x), some longer ugly function name like get_my_A(x), or to export the extremely generic name A from a user module. The only problem as I see it, is the expected ability to override the meaning of .B for subtypes independently of .A being defined generically on a super type. Hence the half serious comment about Val.

Funny idea:

julia> x'̄
ERROR: syntax: invalid character "̄"

The character looks kind of like a T but it's actually a ' with a bar over it. Not sure if serious...

[Screenshot, 2018-09-10 at 11:29:56]

Yeah, looks like that on GitHub to me, too. But it's an overbar. Copy and paste into my terminal shows:

[Screenshot, 2018-09-10 at 10:31:24 am]

Too clever and cute. I still do like the combining characters, though, and I think 'ᵀ is nice.

-100 to changing adjoint, since it's one of the awesome things that makes writing Julia code as clear as writing math, plus conjugate transpose is usually what you want anyway so it makes sense to have an abbreviated syntax for it.

There is a certain arrogance to a statement like this. Consider that some finite proportion of developers explicitly _do not_ want adjoint() but _need_ transpose().

Case in point: for those of us working with symbolic calculations for modeling, the default ' operator would cause, for example, the pseudo-inverse (A'*A)\(A'*b) or a quadratic form v'*A*v to return erroneously long and complex results that cannot be reduced.

Maybe the solution is some kind of compiler directive declaring the meaning of '.
