Rust: Tracking issue for `asm` (inline assembly)

Created on 9 Nov 2015  ·  111 Comments  ·  Source: rust-lang/rust

This issue tracks stabilization of inline assembly. The current feature has not gone through the RFC process, and will probably need to do so prior to stabilization.

A-inline-assembly B-unstable C-tracking-issue T-lang requires-nightly

All 111 comments

Will there be any difficulties with ensuring the backward-compatibility of inline assembly in stable code?

@main-- has a great comment at https://github.com/rust-lang/rfcs/pull/1471#issuecomment-173982852 that I'm reproducing here for posterity:

With all the open bugs and instabilities surrounding asm!() (there's a lot), I really don't think it's ready for stabilization - even though I'd love to have stable inline asm in Rust.

We should also discuss whether today's asm!() really is the best solution or if something along the lines of RFC #129 or even D would be better. One important point to consider here is that asm!() does not support the same set of constraints as gcc. Therefore, we can either:

  • Stick to the LLVM behavior and write docs for that (because I've been unable to find any). Nice because it avoids complexity in rustc. Bad because it will confuse programmers coming from C/C++ and because some constraints might be hard to emulate in Rust code.
  • Emulate gcc and just link to their docs: Nice because many programmers already know this and there's plenty of examples one can just copy-paste with little modifications. Bad because it's a nontrivial extension to the compiler.
  • Do something else (like D does): A lot of work that may or may not pay off. If done right, this could be vastly superior to gcc-style in terms of ergonomics while possibly integrating more nicely with language and compiler than just an opaque blob (lots of handwaving here as I'm not familiar enough with compiler internals to assess this).

Finally, another thing to consider is #1201 which in its current design (I think) depends quite heavily on inline asm - or inline asm done right, for that matter.

I personally think it would be better to do what Microsoft did in MSVC x64: define a (nearly) comprehensive set of intrinsic functions, one for each asm instruction, and do "inline asm" exclusively through those intrinsics. Otherwise, it's very difficult to optimize the code surrounding inline asm, which is ironic since many uses of inline asm are intended to be performance optimizations.

One advantage of the intrinsic-based approach is that it doesn't need to be an all-or-nothing thing. You can define the most needed intrinsics first and build the set out incrementally: for example, for crypto, _addcarry_u64 and _addcarry_u32. Note that the work to define the intrinsics seems to have been done quite thoroughly already: https://github.com/huonw/llvmint.

Further, the intrinsics would be a good idea to add even if it were ultimately decided to support inline asm, as they are much more convenient to use (based on my experience using them in C and C++), so starting with the intrinsics and seeing how far we get seems like a zero-risk-of-being-wrong thing.

Intrinsics are good, but asm! can be used for more than just inserting instructions.
For example, see the way I'm generating ELF notes in my probe crate.
https://github.com/cuviper/rust-libprobe/blob/master/src/platform/systemtap.rs

I expect that kind of hackery will be rare, but I think it's still a useful thing to support.

@briansmith

Inline asm is also useful for code that wants to do its own register/stack allocation (e.g. naked functions).

@briansmith yeah those are some excellent reasons to use intrinsics where possible. But it's nice to have inline assembly as the ultimate escape hatch.

@briansmith Note that asm!() is _kind of_ a superset of intrinsics as you can build the latter using the former. (The common argument against this reasoning is that the compiler could theoretically optimize _through_ intrinsics, e.g. hoist them out of loops, run CSE on them, etc. However, it's a pretty strong counterpoint that anyone writing asm for _optimization_ purposes would do a better job at that than the compiler anyways.) See also https://github.com/rust-lang/rust/issues/29722#issuecomment-207628164 and https://github.com/rust-lang/rust/issues/29722#issuecomment-207823543 for cases where inline asm works but intrinsics don't.

On the other hand, intrinsics critically depend on a "sufficiently smart compiler" to achieve _at least_ the performance one would get with a hand-rolled asm implementation. My knowledge on this is outdated but unless there has been significant progress, intrinsics-based implementations are still measurably inferior in many - if not most - cases. Of course they're much more convenient to use but I'd say that programmers really don't care much about _that_ when they're willing to descend into the world of specific CPU instructions.

Now another interesting consideration is that intrinsics could be coupled with fallback code on architectures where they're not supported. This gives you the best of both worlds: Your code is still portable - it can just employ some hardware accelerated operations where the hardware supports them. Of course this only really pays off for either very common instructions or if the application has one obvious target architecture. Now the reason why I'm mentioning this is that while one could argue that this may potentially even be _undesirable_ with _compiler-provided_ intrinsics (as you'd probably care about whether you actually get the accelerated versions plus compiler complexity is never good) I'd say that it's a different story if the intrinsics are provided by a _library_ (and only implemented using inline asm). In fact, this is the big picture I'd prefer even though I can see myself using intrinsics more than inline asm.

(I consider the intrinsics from RFC #1199 somewhat orthogonal to this discussion as they exist mostly to make SIMD work.)

@briansmith

Otherwise, it's very difficult to optimize the code surrounding inline asm, which is ironic since many uses of inline asm are intended to be performance optimizations.

I'm not sure what you mean here. It's true that the compiler can't break down the asm into its individual operations to do strength reduction or peephole optimizations on it. But in the GCC model, at least, the compiler can allocate the registers it uses, copy it when it replicates code paths, delete it if it's never used, and so on. If the asm isn't volatile, GCC has enough information to treat it like any other opaque operation like, say, fsin. The whole motivation for the weird design is to make inline asm something the optimizer can mess with.

But I haven't used it a whole lot, especially not recently. And I have no experience with LLVM's rendition of the feature. So I'm wondering what's changed, or what I've misunderstood all this time.

We discussed this issue at the recent work week as @japaric's survey of the no_std ecosystem has the asm! macro as one of the more commonly used features. Unfortunately we didn't see an easy way forward for stabilizing this feature, but I wanted to jot down the notes we had to ensure we don't forget all this.

  • First, we don't currently have a great specification of the syntax accepted in the asm! macro. Right now it typically ends up being "look at LLVM" which says "look at clang" which says "look at gcc" which doesn't have great docs. In the end this typically bottoms out at "go read someone else's example and adapt it" or "read LLVM's source code". For stabilization a bare minimum is that we need to have a specification of the syntax and documentation.

  • Right now, as far as we know, there's no stability guarantee from LLVM. The asm! macro is a direct binding to what LLVM does right now. Does this mean that we can still freely upgrade LLVM when we'd like? Does LLVM guarantee it'll never ever break this syntax? A way to alleviate this concern would be to have our own layer that compiles to LLVM's syntax. That way we can change LLVM whenever we like and if the implementation of inline assembly in LLVM changes we can just update our translation to LLVM's syntax. If asm! is to become stable we basically need some mechanism of guaranteeing stability in Rust.

  • Right now there are quite a few bugs related to inline assembly. The A-inline-assembly tag is a good starting point, and it's currently littered with ICEs, segfaults in LLVM, etc. Overall this feature, as implemented today, doesn't seem to live up to the quality guarantees others expect from a stable feature in Rust.

  • Stabilizing inline assembly may make an implementation of an alternate backend very difficult. For example backends such as miri or cranelift may take a very long time to reach feature parity with the LLVM backend, depending on the implementation. This may mean that there's a smaller slice of what can be done here, but it's something important to keep in mind when considering stabilizing inline assembly.


Despite the issues listed above we wanted to be sure to at least come away with some ability to move this issue forward! To that end we brainstormed a few strategies of how we can nudge inline assembly towards stabilization. The primary way forward would be to investigate what clang does. Presumably clang and C have effectively stable inline assembly syntax and it may be likely that we can just mirror whatever clang does (especially wrt LLVM). It would be great to understand in greater depth how clang implements inline assembly. Does clang have its own translation layer? Does it validate any input parameters? (etc)

Another possibility for moving forward is to see if there's an assembler we can just take off the shelf from elsewhere that's already stable. Some ideas here were nasm or the plan9 assembler. Using LLVM's assembler has the same problems about stability guarantees as the inline assembly instruction in the IR. (it's a possibility, but we need a stability guarantee before using it)

I would like to point out that LLVM's inline asm syntax is different from the one used by clang/gcc. Differences include:

  • LLVM uses $0 instead of %0.
  • LLVM doesn't support named asm operands %[name].
  • LLVM supports different register constraint types: for example "{eax}" instead of "a" on x86.
  • LLVM supports explicit register constraints ("{r11}"). In C you must instead use register asm variables to bind a value to a register (register asm("r11") int x).
  • LLVM's "m" and "=m" constraints are basically broken. Clang translates these into indirect memory constraints "*m" and "=*m" and passes the address of the variable to LLVM instead of the variable itself.
  • etc...

Clang will convert inline asm from the gcc format into the LLVM format before passing it on to LLVM. It also performs some validation of the constraints: for example, it ensures that "i" operands are compile-time constants.


In light of this I think that we should implement the same translation and validation that clang does and support proper gcc inline asm syntax instead of the weird LLVM one.
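To make the syntax gap concrete, the same register move written in both styles looks roughly like this (a sketch; exact spellings vary by compiler version):

```
// GCC/Clang source-level syntax (C): %-numbered operands, letter
// constraints ("a" means eax), named %[name] operands allowed.
asm("mov %1, %0" : "=a"(dst) : "r"(src));

; LLVM IR-level syntax: $-numbered operands, explicit "{eax}"
; register constraint, no named operands.
%dst = call i32 asm "mov $1, $0", "={eax},r"(i32 %src)
```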

There's an excellent video surveying inline assembly in D, MSVC, gcc, LLVM, and Rust, with slides available online.

As someone who'd love to be able to use inline ASM in stable Rust, and with more experience than I want trying to access some of the LLVM MC APIs from Rust, some thoughts:

  • Inline ASM is basically a copy-paste of a snippet of code into the output .s file for assembling, after some string substitution. It also has attachments of input and output registers as well as clobbered registers. This basic framework is unlikely to ever really change in LLVM (although some of the details might vary slightly), and I suspect that this is a fairly framework-independent representation.

  • Constructing a translation from a Rust-facing specification to an LLVM-facing IR format isn't hard. And it might be advisable: the Rust {} formatting syntax doesn't interfere with assembly language, unlike LLVM's $ and GCC's % notation.

  • LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered, particularly in instructions not generated by LLVM. This means it's pretty much necessary for the user to manually specify which registers get clobbered.

  • Trying to parse the assembly yourself is likely to be a nightmare. The LLVM-C API doesn't expose the MCAsmParser logic, and these classes are quite annoying to get working with bindgen (I've done it).

  • For portability to other backends, as long as you keep the inline assembly mostly on the level of "copy-paste this string with a bit of register allocation and string substitution", it shouldn't inhibit backends all that much. Dropping the integer constant and memory constraints and keeping just register bank constraints shouldn't pose any problems.

I've been having a bit of a play to see what can be done with procedural macros. I've written one that converts GCC-style inline assembly to Rust style: https://github.com/parched/gcc-asm-rs. I've also started working on one that uses a DSL where the user doesn't have to understand the constraints and they're all handled automatically.

So I've come to the conclusion that I think rust should just stabilise the bare building blocks, then the community can iterate out of tree with macros to come up with best solutions. Basically, just stabilise the llvm style we have now with only "r" and "i" and maybe "m" constraints, and no clobbers. Other constraints and clobbers can be stabilised later with their own mini rfc type things.

Personally I'm starting to feel as though stabilizing this feature is the sort of massive task that will never get done unless somehow someone hires a full-time expert contractor to push on this for a whole year. I want to believe that @parched's suggestion of stabilizing asm! piecemeal will make this tractable. I hope someone picks it up and runs with it. But if it isn't, then we need to stop trying to reach for the satisfactory solution that will never arrive and reach for the unsatisfactory solution that will: stabilize asm! as-is, warts, ICEs, bugs and all, with bright bold warnings in the docs advertising the jank and nonportability, and with the intent to deprecate someday if a satisfactory implementation should ever miraculously descend, God-sent, on its heavenly host. IOW, we should do exactly what we did for macro_rules! (and of course, just like for macro_rules!, we can have a brief period of frantic band-aiding and leaky future-proofing). I'm sad at the ramifications for alternative backends, but it's shameful for a systems language to relegate inline assembly to such a limbo, and we can't let the hypothetical possibility of multiple backends continue to obstruct the existence of one actually usable backend. I beg of you, prove me wrong!

it's shameful for a systems language to relegate inline assembly to such a limbo

As a data point, I happen to be working on a crate right now that depends on gcc for the sole purpose of emitting some asm with stable Rust: https://github.com/main--/unwind-rs/blob/266e0f26b6423f4a2b8a8c72442b319b5c33b658/src/unwind_helper.c


While it certainly has its advantages, I'm a bit wary of the "stabilize building blocks and leave the rest to proc-macros"-approach. It essentially outsources the design, RFC and implementation process to whoever wants to do the job, potentially no one. Of course having weaker stability/quality guarantees is the entire point (the tradeoff is that having something imperfect is already much better than having nothing at all), I understand that.

At least the building blocks should be well-designed - and in my opinion, "expr" : foo : bar : baz definitely isn't. I can't remember ever getting the order right on the first try, I always have to look it up. "Magic categories separated by colons where you specify constant strings with magic characters that end up doing magic things to the variable names that you also just mash in there somehow" is just bad.
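For reference, this is the colon-separated shape being criticized, as accepted by the nightly asm! macro at the time (an illustrative x86 sketch; it requires #![feature(asm)] and is not compilable on stable):

```rust
let a: u64 = 5;
let b: u64 = 7;
let result: u64;
unsafe {
    asm!("add $2, $0"       // template with positional $N operands
         : "=r"(result)     // outputs
         : "0"(a), "r"(b)   // inputs ("0" ties this input to output 0)
         : "cc"             // clobbers
         : "volatile");     // options
}
```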

One idea, …

Today, there is already a project, named dynasm, which can help you generate assembly code with a plugin used to pre-process the assembly with one flavor of x64 code.

This project does not answer the problem of inline assembly, but it can certainly help: if rustc were to provide a way to map variables to registers and accepted inserting sets of bytes into the code, such a project could also be used to fill up those byte sequences.

This way, the only standardization needed from rustc's point of view is the ability to inject an arbitrary byte sequence into the generated code and to enforce specific register allocations. This removes all the choices between specific language flavors.

Even without dynasm, this could also be used as a way to make macros for the cpuid / rdtsc instructions, which would just be translated into the raw sequence of bytes.

I guess the next question might be if we want to add additional properties/constraints to the byte-sequences.

[EDIT: I don't think anything I said in this comment is correct.]

If we want to continue to use LLVM's integrated assembler (I assume this is faster than spawning an external assembler), then stabilization means stabilizing on exactly what LLVM's inline assembly expressions and integrated assembler support—and compensating for changes to those, should any occur.

If we're willing to spawn an external assembler, then we can use any syntax we want, but we're then foregoing the advantages of the integrated assembler, and exposed to changes in whatever external assembler we're calling.

I think it would be strange to stabilize on LLVM's format when even Clang doesn't do that. Presumably it does use LLVM's support internally, but it presents an interface more like GCC.

I'm 100% fine with saying "Rust supports exactly what Clang supports" and calling it a day, especially since AFAIK Clang's stance is "Clang supports exactly what GCC supports". If we ever have a real Rust spec, we can soften the language to "inline assembly is implementation-defined". Precedent and de facto standardization are powerful tools. If we can repurpose Clang's own code for translating GCC syntax to LLVM, all the better. The alternative backend concerns don't go away, but theoretically a Rust frontend to GCC wouldn't be much vexed. Less for us to design, less for us to endlessly bikeshed, less for us to teach, less for us to maintain.

If we stabilize something defined in terms of what clang supports, then we should call it clang_asm!. The asm! name should be reserved for something that's been designed through a full RFC process, like other major Rust features. #bikeshed

There are a few things I'd like to see in Rust inline assembly:

  • The template-with-substitutions pattern is ugly. I'm always jumping back and forth between the assembly text and the constraint list. Brevity encourages people to use positional parameters, which makes legibility worse. Symbolic names often mean you have the same name repeated three times: in the template, naming the operand, and in the expression being bound to the operand. The slides mentioned in Alex's comment show that D and MSVC let you simply reference variables in the code, which seems much nicer.

  • Constraints are both hard to understand, and (mostly) redundant with the assembly code. If Rust had an integrated assembler with a sufficiently detailed model of the instructions, it could infer the constraints on the operands, removing a source of error and confusion. If the programmer needs a specific encoding of the instruction, then they would need to supply an explicit constraint, but this would usually not be necessary.

Norman Ramsey and Mary Fernández wrote some papers about the New Jersey Machine Code Toolkit way back when that have excellent ideas for describing assembly/machine language pairs in a compact way. They tackle (Pentium Pro-era) IA-32 instruction encodings; it is not at all limited to neat RISC ISAs.

I'd like to reiterate again the conclusions from the most recent work week:

  • Today, as far as we know, there's basically no documentation for this feature. This includes LLVM internals and all.
  • We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.
  • This is, currently, a very buggy feature in rustc. It's chock full of (at compile time) segfaults, ICEs, and weird LLVM errors.
  • Without a specification it's nigh impossible to even imagine an alternate backend for this.

To me this is the definition of "if we stabilize this now we will guarantee to regret it in the future", and not only "regret it" but very likely "cause serious problems for implementing any new system".

At the absolute bare minimum I'd firmly believe that bullet (2) cannot be compromised on (aka the definition of stable in "stable channel"). The other bullets would be quite sad to forgo, as doing so erodes the expected quality of the Rust compiler, which is currently quite high.

@jcranmer wrote:

LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered, particularly in instructions not generated by LLVM. This means it's pretty much necessary for the user to manually specify which registers get clobbered.

I would think that, in practice, it would be quite difficult to infer clobber lists. Just because a machine-language fragment uses a register doesn't mean it clobbers it; perhaps it saves it and restores it. Conservative approaches could discourage the code generator from using registers that would be fine to use.

@alexcrichton wrote:

We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.

The LLVM docs guarantee "Newer releases can ignore features from older releases, but they cannot miscompile them." (with respect to IR compatibility). That rather constrains how much they can change inline assembly, and, as I argued above, there's not really any viable replacement at LLVM level that would radically change semantics from the current situation (unlike, say, the ongoing issues around poison and undef). Saying that its prospective instability precludes using it as a base for a Rust asm! block is therefore somewhat dishonest. Now that's not to say there are other problems with it (poor documentation, although that has improved; constraint suckiness; poor diagnostics; and bugginess in less-common scenarios are ones that come to mind).

My biggest worry in reading the thread is that we make the perfect be the enemy of the good. In particular, I worry that searching for some magic DSL intermediary is going to take a few years to try to wrangle into usable form for inline-asm as people discover that integrating asm parsers and trying to get them to work with LLVM's cause more problems in edge cases.

Are LLVM really guaranteeing that they'll never miscompile a feature whose behavior they've never specified? How would they even decide if a change was a miscompilation or not? I could see it for the other parts of the IR, but this seems like a lot to expect.

I think it would be strange to stabilize on LLVM's format when even Clang doesn't do that.

Clang doesn't do that because it aims to be able to compile code that was written for GCC. rustc doesn't have that aim. The GCC format isn't very ergonomic so ultimately I think we don't want that, but whether that would be better to go with for now I'm unsure. There's a lot of (nightly) code out there using the current Rust format that would break if we changed to GCC style so it's probably only worth changing if we can come up with something notably better.

At least the building blocks should be well-designed - and in my opinion, "expr" : foo : bar : baz definitely isn't.

Agreed. At the very least I prefer the raw LLVM format where the constraints and clobbers are all in one list. Currently there is redundancy in having to specify the "=" prefix and also put the operand in the output list. I also like how LLVM treats it more like a function call where the outputs are the result of the expression; AFAIK the current asm! implementation is the only part of Rust that has "out" parameters.

LLVM does a surprisingly bad job in practice of actually identifying which registers get clobbered

AFAIK LLVM doesn't even try to do this, as the main reason for inline assembly is to include some code that LLVM doesn't understand. It only does register allocation and template substitution, without looking at the actual assembly. (Obviously it parses the actual assembly at some stage to generate the machine code, but I think that happens later.)

If we're willing to spawn an external assembler

I'm not sure there can ever be an alternative to using the integrated inline assembler, because somehow you would have to get LLVM to allocate registers for it. For global assembly, though, an external assembler would be workable.

Regarding breaking changes in the LLVM inline assembler, we are in the same boat as Clang. That is, if they make some changes, we just have to work around them when they happen.

If we stabilize something defined in terms of what clang supports, then we should call it clang_asm!. The asm! name should be reserved for something that's been designed through a full RFC process, like other major Rust features. #bikeshed

I'm all for it. +1

There's a lot of (nightly) code out there using the current Rust format that would break if we changed to GCC style so it's probably only worth changing if we can come up with something notably better.

@parched Going by @jimblandy's suggestion quoted above, anyone using asm! will happily be able to still use it.

Today, as far as we know, there's basically no documentation for this feature. This includes LLVM internals and all.

If GCC's assembly syntax is truly not specified or documented after 30 years, then it seems safe to assume that either producing a documented assembly sublanguage is so difficult that it is beyond Rust's ability to accomplish given our limited resources, or that people who want to use assembly simply don't care.

We have, as far as we know, no guarantee of stability from LLVM. For all we know the implementation of inline assembly in LLVM could change any day.

It seems unlikely that GCC/Clang's implementation of inline assembly will ever change, since that would break all C code written since the 90s.

Without a specification it's nigh impossible to even imagine an alternate backend for this.

At the risk of being callous, the prospect of alternative backends is moot if Rust as a language does not survive due to its embarrassing inability to drop into assembly. Nightly does not suffice, unless one wants to tacitly endorse the idea that Nightly is Rust, which does more to undermine Rust's stability guarantee than the prospect of LLVM changes.

The other bullets would be quite sad to forgo as it erodes the expected quality of the Rust compiler which is currently quite high.

I'm not lying when I say that every day I am thankful for the attitude of the Rust developers and the enormous standard of quality that they hold themselves to (in fact sometimes I wish y'all would slow down so you can maintain that quality without burning yourselves out like Brian did). However, speaking as someone who was here when luqmana added the asm! macro four years ago, and who has observed no progress since then at getting this stabilized, and who is sad that crypto in Rust is still impossible and that SIMD in Rust doesn't even have a workaround while the cross-platform interface is being slowly determined, I feel despondent. If I seem emphatic here it's because I view this issue as existential to the survival of the project. It may not be a crisis right this moment, but it will take time to stabilize anything at all, and we don't have the years that it will take to design and implement a world-class assembly dialect from scratch (proven by the fact that we have made no progress towards this in the last four years). Rust needs stable inline assembly sometime in 2018. We need prior art to pull that off. The macro_rules! situation acknowledged that sometimes worse is better. Once again, I'm begging someone to prove me wrong.

FWIW, and coming late to the party, I like what @florob's Cologne talk proposed. For those that haven't watched it, this is the gist of it:

// Add 5 to variable:
let mut var = 0;
unsafe {
    asm!("add $5, {}", inout(reg) var);
}

// Get L1 cache size
let ebx: i32;
let ecx: i32;
unsafe {
    asm!(r"
        mov $$4, %eax;
        xor %ecx, %ecx;
        cpuid;
        mov %ebx, {};",
        out(reg) ebx, out(ecx) ecx, clobber(eax, ebx, edx)
    );
}
println!("L1 Cache: {}", ((ebx >> 22) + 1)
    * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1) * (ecx + 1));

How about the following strategy: rename the current asm to llvm_asm (plus maybe some minor changes) and state that its behavior is an implementation detail of LLVM, so Rust's stability guarantee does not fully extend to it? The problem of different backends should be more or less solved with target_feature-like functionality for conditional compilation depending on the backend used. Yes, such an approach will blur Rust's stability a bit, but keeping assembly in limbo like this is damaging for Rust in its own way.

I've posted a pre-RFC with an alternative syntax proposal to the internals forum: https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443. Feedback welcome.

It looks to me like the best is definitely the enemy of the kind-of-ok here. I fully support sticking a gcc_asm! or clang_asm! or llvm_asm! macro (or any proper subset thereof) into stable with compatible syntax and semantics for now, while a better solution is worked out. I don't see supporting such a thing forever as a huge maintenance burden: the more sophisticated systems proposed above look like they'd pretty easily support just turning the old-style macros into syntactic saccharine for the new one.

I have a binary program http://[email protected]/BartMassey/popcount which requires inline assembly for the x86_64 popcntl instruction. This inline assembly is the only thing keeping this code in nightly. The code was derived from a 12-year-old C program.

Right now, my assembly is conditioned on

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]

and then gets the cpuid info to see if popcnt is present. It would be nice to have something in Rust similar to the recent Google cpu_features library https://opensource.googleblog.com/2018/02/cpu-features-library.html, but c'est la vie.

Because this is a demo program as much as anything, I'd like to keep the inline assembly. For real programs, the count_ones() intrinsic would be sufficient — except that getting it to use popcntl requires passing "-C target-cpu=native" to Cargo, probably through RUSTFLAGS (see issue #1137 and several related issues) since distributing a .cargo/config with my source doesn't seem like a great idea, which means that right now I've got a Makefile calling Cargo.
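For reference, the .cargo/config fragment in question is only a few lines (standard Cargo build.rustflags syntax; whether shipping it with the source is wise is exactly the judgment call above):

```toml
# .cargo/config (alternatively: RUSTFLAGS="-C target-cpu=native" cargo build)
[build]
rustflags = ["-C", "target-cpu=native"]
```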

In short, it would be nice if one could use Intel's and others' fancy popcount instruction in real applications, but it seems harder than it needs to be. Intrinsics aren't entirely the answer. Current asm! is an ok answer were it available in stable. It would be great to have a better syntax and semantics for inline assembly, but I don't really need it. It would be great to be able to specify target-cpu=native directly in Cargo.toml, but it wouldn't really solve my problem.

Sorry, rambling. Just thought I'd share why I care about this.

@BartMassey I don‘t understand, why do you so desperately need to compile to popcnt? The only reason I can see is performance and IMO you should definitely just use count_ones() in that case. What you‘re looking for is not inline asm but target_feature (rust-lang/rfcs#2045) so you can tel the compiler that it‘s allowed to emit popcnt.

@BartMassey you don't even need to use inline assembly for this, just use coresimd cfg_feature_enabled!("popcnt") to query whether the cpu your binary runs on supports the popcnt instruction (it will resolve this at compile-time if its possible to do so).

coresimd also provides a popcnt intrinsic that is guaranteed to use the popcnt instruction.

@gnzlbg

coresimd also provides a popcnt intrinsic that is guaranteed to use the popcnt instruction.

It's a bit off-topic, but this statement is not strictly true. _popcnt64 uses leading_zeros under the hood, thus if popcnt feature will not be enabled by crate user and crate author will forget to use #![cfg(target_feature = "popcnt")] this intrinsic will get compiled into ineffective assembly and there is no safeguards against it.

thus if popcnt feature will not be enabled by crate user

This is incorrect since the intrinsic uses the #[target_feature(enable = "popcnt")] attribute to enable the popcnt feature for the intrinsic unconditionally, independently of what the crate user enables or disables. Also, the assert_instr(popcnt) attribute ensures that the intrinsic disassembles into popcnt on all x86 platforms that Rust supports.

If one is using Rust on an x86 platform that Rust does not currently support, then its up to whoever is porting core to ensure that these intrinsics generate popcnt on that target.


EDIT: @newpavlov

thus if popcnt feature will not be enabled by crate user and crate author will forget to use #![cfg(target_feature = "popcnt")] this intrinsic will get compiled into ineffective assembly and there is no safeguards against it.

At least in the example you mentioned in the issue, doing this is introduces undefined behavior into the program, and in this case, the compiler is allowed to do anything. Bad codegen that works is one of the multiple outcomes one can get.

First of all, apologies for the derailing the discussion. Just wanted to reiterate my main point, which was "I fully support sticking a gcc_asm! or clang_asm! or llvm_asm! macro (or any proper subset thereof) into stable with compatible syntax and semantics for now, while a better solution is worked out. "

The point of the inline assembly is that this is a popcount benchmark / demo. I want a true guaranteed popcntl instruction when possible both as a baseline and to illustrate how to use inline assembly. I also want to guarantee that count_ones() uses a popcount instruction when possible so that Rustc doesn't look terrible compared to GCC and Clang.

Thanks for pointing out target_feature=popcnt. I'll think about how to use it here. I think I want to bench count_ones() regardless of what CPU the user is compiling for and regardless of whether it has a popcount instruction. I just want to make sure that if the target CPU has popcount count_ones() uses it.

The stdsimd/coresimd crates looks nice, and should probably be enabled for these benchmarks. Thanks! For this app I'd prefer to use as little outside the standard language features as possible (I'm already feeling guilty about lazy_static). However, these facilities look too good to ignore, and it looks like they're well on their way to becoming "official".

There’s an idea floated around by @nbp where there could be some implementation which goes from some representation of code to machine bytes (could be a proc-macro crate or something?) and then those bytes are included directly into the particular location in code.

Splicing arbitrary code bytes into arbitrary places within a function seems like a much easier problem to solve (although ability to specify inputs, outputs and their constraints as ell as clobbers would still be necessary).

cc @eddyb

@nagisa it's a little more than just a chuck of machine code though, you also have to be careful about input, output and clobber registers. If the ASM chunk says that it wants a certain variable in %rax and that it will clobber %esi you need to make sure that the surrounding code plays nice. Also if the developer lets the compiler allocate the registers you'll probably want to optimize the allocation to avoid spilling and moving values around.

@simias , indeed you will have to specify how variables are associated to specific registers, and which registers are clobbered, but all of these is smaller than standardizing any assembly language, or any LLVM assembly language.

Standardizing on byte sequences, is probably the easiest way forward by moving the assembly flavor to a driver / proc-macro.

One issue of having verbatim bytes instead of proper inline assembly, is that the compiler would have no option for doing register alpha-renaming, which I do not expect people writing inline assembly are expecting either.

But how would that work with register allocation if I want to let the compiler handle that? For instance, using GCC's (atrocious) syntax:

asm ("leal (%1, %1, 4), %0"
     : "=r" (five_times_x)
     : "r" (x));

In something like this I let the compiler allocate the registers, expecting that it will give me whatever is the most convenient and efficient. For instance on x86 64 if five_time_x is the return value then the compiler might allocate eax and if x is a function parameter it might already be available in some register. Of course the compiler only knows exactly how it will allocate registers pretty late in the compilation sequence (especially if it's not as trivial as simply function params and return values).

Would your proposed solution work with something like that?

@nbp I have to say I'm a bit confused by this proposal.
First of all, standardizing assembly language was never something we wanted to achieve with inline assembly. At least to me, the premise was always that the assembly language used by the system assembler would be accepted.
The problem is not getting the assembly parsed/assembled, we can pass that off to LLVM easily.
The problem is with filling templated assembly (or giving LLVM the required information to do so), and specifying inputs, outputs and clobbers.
The later problem is not actually solved by your proposal. It is however alleviated, because you wouldn't/couldn't support classes of registers (which @simias asks about), but just concrete registers.
At the point where constraints are simplified to that extend, it is actually just as easy to support "real" inline assembly. The first argument is an string containing (non-templated) assembly, the other arguments are the constraints. This is somewhat easily mapped to LLVM's inline assembler expressions.
Inserting raw bytes on the other hand is not as far as I know (or can tell from the LLVM IR Reference Manual) supported by LLVM. So we'd basically be extending LLVM IR, and reimplementing a feature (assembling system assembly) that is already present in LLVM using separate crates.

@nbp

indeed you will have to specify how variables are associated to specific registers, and which registers are clobbered, but all of these is smaller than standardizing any assembly language, or any LLVM assembly language.

So how would that be done? I have a sequence of bytes with hardcoded registers with basically means that the input/out registers, clobbers, etc. are all hardcoded inside this sequence of bytes.

Now I inject this bytes somewhere in my rust binary. How do I tell rustc which registers are input/output, which registers got clobbered, etc.? How is this a smaller problem to solve than stabilizing inline assembly ? It looks to me that this is exactly what inline assembly does, just maybe a little harder because now one needs to specify input/outputs clobbers twice, in the assembly written, and in whatever way we pass this information to rustc. Also, rustc would not have an easy time validating this, because for that it would need to be able to parse the sequence of bytes into assembly, and then inspecting that. What am I missing?

@simias

asm ("leal (%1, %1, 4), %0"
     : "=r" (five_times_x)
     : "r" (x));

This would not be possible, as the raw of bytes does not allow alpha renaming of registers, and the registers would have to be enforced by the code sequence ahead.

@Florob

At least to me, the premise was always that the assembly language used by the system assembler would be accepted.

My understanding, is that relying on the system assembler is not something we want to rely on, but more an accepted flaw as part of the asm! macro. Also relying on asm! being the LLVM syntax would be painful for the development of additional backend.

@gnzlbg

So how would that be done? I have a sequence of bytes with hardcoded registers with basically means that the input/out registers, clobbers, etc. are all hardcoded inside this sequence of bytes.

The idea would be to have a list of inputs, outputs, and clobbered registers, where the inputs would be a tuple of the register name associated with a (mutable) reference or copy, the clobbered register would be a list of register names, and the output would be a list of output register would form a tuple of named register to which are associated types.

fn swap(a: u32, b: u32) -> (u32, u32) {
  unsafe{
    asm_raw!{
       bytes: [0x91],
       inputs: [(std::asm::eax, a), (std::asm::ecx, b)],
       clobbered: [],
       outputs: (std::asm::eax, std::asm::ecx),
    }
  }
}

This code sequence might be the output of some compiler procedural macro, which might look like:

fn swap(a: u32, b: u32) -> (u32, u32) {
  unsafe{
    asm_x64!{
       ; <-- (eax, a), (ecx, b)
       xchg eax, ecx
       ; --> (eax, ecx)
    }
  }
}

These sequences, will not be able to directly embedde any symbol or addresses and they would have to be computed and given as registers. I am sure we can figure out how do add the ability to insert some symbol addresses within the byte sequence later on.

The advantage of this approach is that only the list of registers and constraints have to be standardized, and this is something that would easily be supported by any future backend.

@nbp

My understanding, is that relying on the system assembler is not something we want to rely on, but more an accepted flaw as part of the asm! macro. Also relying on asm! being the LLVM syntax would be painful for the development of additional backend.

I don't think that's an accurate assessment? With the minor exception of the two different syntaxes for x86 assembly, assembly syntax is largely standard and portable. The only issue with the system assembler might be that it lacks newer instructions, but that's a niche situation not worth optimizing for.

The actual problem is the glue into register allocation. But, as far as the actual assembly string itself is concerned, this merely means someone has to do some string substitution stuff and maybe some parsing--and this kind of substitution should be trivially available for any putative backend.

I agree that LLVM's (or gcc's) syntax for this stuff is crap, but moving to precompiled bytes means that any asm crate now needs to install a full assembler and possibly a full register allocator (or make programmers hand-allocate registers), or attempt to use the system assembler. At that point, it doesn't seem like it's actually really adding much value.

@jcranmer

... but moving to precompiled bytes means that any asm crate now needs to install a full assembler and possibly a full register allocator (or make programmers hand-allocate registers), or attempt to use the system assembler

https://github.com/CensoredUsername/dynasm-rs

This crate uses macro handled by a plugin to assemble the assembly code and generate vectors of raw assembly code to be concatenated at runtime.

@nbp maybe my uses cases are peculiar but lack of register renaming and letting the compiler allocate registers for me would be a bit of a deal-breaker because it either means that I need to be very lucky with my choice of registers and happen to "hit right" or the compiler will have to emit non-optimal code to shuffle registers around to match my arbitrary conventions.

If the assembly blob doesn't integrate nicely with the surrounding compiler-emitted assembly I might as well just factor the ASM stub in an external C-style method in a stand-alone .s assembly file since function calls have the same type of register-allocation constraints. This already works today, although I suppose having it built into rustc might simplify the build system compared to having a standalone assembly file. I guess what I'm saying is that IMO your proposal doesn't get us very far compared to the current situation.

And what if the ASM code calls external symbols that would be resolved by the linker? You need to pass that info around since you can't possibly resolve those until late in the compilation process. You'd have to pass there reference alongside your byte array and let the linker resolve them much later.

@jcranmer

With the minor exception of the two different syntaxes for x86 assembly, assembly syntax is largely standard and portable.

I'm not sure I understand what you mean by that, obviously ASM syntax is not portable across architectures. And even within the same architecture there are often variations and options that change the way the language is assembled.

I can give MIPS as an example, there are two important configuration flags that tweak the assembler behaviour: at and reorder. at says whether the assembler is allowed to implicitly use the AT (assembler temporary) register when assembling certain pseudo-instructions. Code that explicitly uses AT to store data must be assembled with at or it'll break.

reorder defines if the coder manually handles branch delay slots or if they trust the assembler to deal with them. Assembling code with the wrong reorder setting will almost certainly generate bogus machine code. When you write MIPS assembly you must be aware of the current mode at all times if it contains any branching instruction. For instance it's impossible to know the meaning of this MIPS listing if you don't know if reorder is enabled:

    addui   $a0, 4
    jal     some_func
    addui   $a1, $s0, 3

32bit ARM assembly has the Thumb/ARM variations, it's important to know which instruction set you're targeting (and you can change on the fly across function calls). Mixing both sets needs to be done very carefully. ARM code also typically loads large immediate values using an implicit PC-relative load, if you pre-assemble your code you'll have to be careful about how you pass these values around since they have to remain close by but are not actual instructions with a well-defined location. I'm talking about pseudo-instructions like:

   ldr   r2, =0x89abcdef

MIPS on the other hand tends to split the immediate value in two 16bit values and use a lui/ori or lui/andi combo. It's usually hidden behind the li/la pseudo-instructions but if you're writing code with noreorder and don't want to waste the delay slot sometimes you have to handle it by hand which results in funny looking code:

.set noreorder

   /* Display a message using printf */
   lui $a0, %hi(hello)
   jal printf
   ori $a0, %lo(hello)

.data

hello:
.string "Hello, world!\n"

The %hi and %lo constructs are a way to tell the assembly to generate a reference to the high and low 16bits of the hello symbol respectively.

Some code needs very peculiar alignment constraints (common when you're dealing with cache invalidation code for instance, you need to be sure that you don't take a saw to the branch you're sitting on). And there's the problem of handling external symbols that can't be resolved at this point in the compilation process as I mentioned earlier.

I'm sure I could come up with peculiarities for a bunch of other architectures I'm less familiar with. For these reasons I'm not sure I'm very optimistic for the macro/DSL approach. I understand that having a random opaque string literal in the middle of the code isn't super elegant but I don't really see what integrating the full ASM syntax into rust one way or an other would give us except additional headaches when adding support for a new architecture.

Writing an assembler is something that may seem trivial at a glance but could turn out to be very tricky if you want to support all the bells, whistles and quirks of all the architectures out there.

On the other hand having a good way to specify bindings and clobbers would be extremely valuable (compared to gcc's... perfectible syntax).

Hi guys,

Sorry for bothering you, I only wanted to drop my two cents, because I'm just an user, and a very shy/quiet one indeed, oh, and a newcomer, I have just recently landed in Rust, but I'm already in love with it.

But this assembly thing is just crazy, I mean, it's a three years span conversation, with a bunch of ideas and complains, but nothing that seems like a minimum consensus. Three years and not a RFC, it seems a little like a death end. I'm developing a humble math library (that hopefully will materialize in two or three crates), and for me (and I suspect that for any other fellow interested in write assembly in rust), the most important thing is to actually being able to do it! with a minimum guarantee that everything is not going to change the next day (that's what the unstable channel, and specially this conversation, makes me feel).

I understand that everyone here wants the best solution, and maybe one day someone comes out with that one, but as for today I believe that the current macro is just fine (well, maybe a little restricting in some ways, but hopefully nothing that cannot be addressed in an incremental way). To write assembly is like the most important thing in a systems language, a very very necessary feature, and although I'm ok relying on cpp_build until this is fixed, I'm very afraid that if it takes a lot more time it will become a forever dependency. I don't know why, call it an irrational idea, but I find that having to call cpp to call assembly is a little sad, I want a pure rust solution.

FWIW Rust is not that special here, MSVC doesn‘t have inline asm for x86_64 either. They do have that really weird implementation where you can use variables as operands but that works for x86 only.

@josevalaad Could you talk more about what you're using inline assembly for?

We typically only see it used in OS-like situations, which are typically stuck on nightly for other reasons as well, and even then they barely use asm!, so stabilizing asm! hasn't been a high-enough priority to design & develop something that can properly survive outside LLVM and please everyone.

Additionally, most things can be done using the exposed platform intrinsics. x86 and x86_64 have been stabilized and other platforms are in progress. It's most people's expectation that these are going to accomplish 95-99% of the goals. You can see my own crate jetscii as an example of using some of the intrinsics.

We just merged a jemalloc PR that uses inline assembly to work around code generation bugs in LLVM - https://github.com/jemalloc/jemalloc/pull/1303 . Somebody used inline assembly in this issue (https://github.com/rust-lang/rust/issues/53232#issue-349262078) to work around a code generation bug in Rust (LLVM) that happened in the jetscii crate. Both happened in the last two weeks, and in both cases the users tried with intrinsics but the compiler failed them.

When code generation for a C compiler happens to be unacceptable, worst case the user can use inline assembly and continue working in C.

When this happens in stable Rust, right now we have to tell people to use a different programming language or wait an indeterminate amount of time (often in the order of years). That's not nice.

@eddyb Well, I'm writing a small matrix algebra library. Inside that library, I'm implementing the BLAS, maybe some LAPACK (not there yet) routines in Rust, because I wanted the library to be a pure rust implementation. It's nothing serious yet, but anyway, I wanted the user to be able to opt for some asm speed and fun, specially with the GEMM operation, that use to be essential (the most used, anyway, and if you follow the BLIS people approach it's all what you need), at least in x86/x86_64. And that's the full story. Obviously I can use the nightly channel too, I just wanted to push a little in the pragmatic direction of stabilization of the feature.

@shepmaster There are _plenty_ of use-cases for which intrinsics aren't enough. Of the top of my head of recent stuff where I thought "why oh why doesn't Rust have stable asm?", there's no XACQUIRE/XRELEASE intrinsics.

Stable inline asm is critical and no, the intrinsics aren't enough.

My original point was attempting to help someone have the ability to write faster code. They made no mention of knowing that intrinsics were even available, and that's all I sought to share. The rest was background information.

I'm not even advocating for a specific point of view, so please don't attempt to argue with me — I have no stake in this race. I'm simply repeating what the current point of view is as I understand it. I participate in a project that requires inline assembly that is highly unlikely to have intrinsics in any near future, so I am also interested in some amount of stable inline assembly, but nightly assembly doesn't unduly bother me, nor does invoking an assembler.

Yes, there are cases that require assembly for now and there are cases that will forever need it, I said as much originally (added emphasis for clarity):

It's most people's expectation that [intrinsics] are going to accomplish 95-99% of the goals.

It is my opinion that if you want to see stable assembly, someone (or a group of people) are going to need to get general consensus from the Rust team on a direction to start in and then put in a lot of effort to actualize it.

It's nothing serious yet, but anyway, I wanted the user to be able to opt for some asm speed and fun, specially with the GEMM operation, that use to be essential (the most used, anyway, and if you follow the BLIS people approach it's all what you need), at least in x86/x86_64.

I still don't understand what instructions you need to access that you can't without inline assembly. Or is it just a specific sequence of arithmetic instructions?
If so, have you benchmarked an equivalent Rust source against the inline assembly?

what instructions you need to access that you can't without inline assembly

Well, when you are talking about assembly in math, you are basically talking about using the SIMD registers and instructions like _mm256_mul_pd, _mm256_permute2f128_pd, etc. and vectorization operations where it proceed. The thing is that you can take different approaches for vectorization, and usually it's a little trial and error until you get an optimized performance for the processor you are targeting and the use you have in mind. So usually at the library level you first have to query the processor injecting asm code to know the set of instructions and registers supported, and then conditional compiling an specific version of your math asm kernel.

If so, have you benchmarked an equivalent Rust source against the inline assembly?

Right now I have no specific test at hand, and I'm on holiday, so I would prefer to not involve myself with it a lot, but yeah, if you give me a couple of weeks I can post a performance comparative. In any case, it use to be impossible for the compiler to produce code as fast as you can with manual tuned assembly. It's not possible in C at least, even if you use the classical performance techniques like manual loop unrolling where needed, etc., so I imagine it should be not possible in Rust.

Taylor Cramer suggested I post here. Forgive me as I haven't read through all comments to come up to speed with the current state of the discussion; this is only a voice of support and statement of our situation.

For a bare-metal project at Google, we'd love to see some movement on stabilizing inline and module-level assembler. The alternative is using the FFI to call functions written in pure assembly and assembled separately and linked together into a binary.

We could define functions in assembler and call them via the FFI, linking them in a separate step, but I know of no serious bare-metal project that does that exclusively, as it has drawbacks in terms of both complexity and performance. Redox uses 'asm!'. The usual suspects of Linux, BSDs, macOS, Windows, etc, all make copious use of inline assembler. Zircon and seL4 do it. Even Plan 9 caved on this a few years ago in the Harvey fork.

For performance-critical things, function call overhead might dominate depending on the complexity of the called function. In terms of complexity, defining separate assembler functions just to invoke a single instruction, read or write a register, or otherwise manipulate machine state that's ordinarily hidden from a user-space programmer means more tedious boilerplate to get wrong. In any event, we would have to be more creative in our use of Cargo (or supplement with an external build system or a shell script or something) to do this. Perhaps build.rs could help here, but feeding it into the linker seems more challenging.

I'd also very much like it if there were some way to plumb the values of symbolic constants into the assembler template.

we'd love to see some movement on stabilizing inline and module-level assembler.

The last pre-RFC (https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443) achieved consensus 6 months ago (at least on most of the fundamental issues), so the next step is to submit an RFC that builds on that. If you want this to happen faster I'd recommend contacting @Florob about it.

For what it's worth, I need direct access to FSGS registers to get the pointer to the TEB struct on Windows, I also need a _bittest64-like intrinsic to apply bt to an arbitrary memory location, neither of which I could find a way to do without inline assembly or extern calls.

The third point mentioned here concerns me, though, as LLVM indeed prefers to Just Crash if something is wrong providing no error messaging what so ever.

@MSxDOS

I also need a _bittest64-like intrinsic to apply bt to an arbitrary memory location, neither of which I could find a way to do without inline assembly or extern calls.

It shouldn't be hard to add that one to stdsimd, clang implements these using inline assembly (https://github.com/llvm-mirror/clang/blob/c1c07cca8cae5f924cedaac7b202b0f3c167111d/test/CodeGen/bittest-intrin.c#L45) but we can use that in the std library and expose the intrinsic to safe Rust.

Feel encouraged to open an issue in the stdsimd repo about the missing intrinsics.

@josevalaad

Well, when you are talking about assembly in math, you are basically talking about using the SIMD registers and instructions like _mm256_mul_pd, _mm256_permute2f128_pd, etc. and vectorization operations where it proceed.

Ah, I suspected that might be the case. Well, if you want to give it a try, you could translate the assembly into std::arch intrinsic calls and see if you get the same performance out of it.

If you don't, please file issues. LLVM isn't magic, but at least intrinsics should be as good as asm.

@dancrossnyc If you don't mind me asking, are there any usecases/platform features in particular that require inline assembly, in your situation?

@MSxDOS Maybe we should expose intrinsics for reading the "segment" registers?


Maybe we should do some data collection and get a breakdown of what people really want asm! for, and see how many of those could be supported in some other way.

Maybe we should do some data collection and get a breakdown of what people really want asm!

I want asm! for:

  • working around intrinsics not provided by the compiler
  • working around compiler bugs / sub-optimal code generation
  • performing operations that cannot be performed via a sequence of single intrinsics calls, e.g., a read EFLAGS-modify-write EFLAGS where LLVM is allowed to modify eflags between the read and the write, and where LLVM also assumes that the user won't modify this behind its back (that is, the only way to safely work with EFLAGS is to write the read-modify-write operations as a single atomic asm! block).

and see how many of those could be supported in some other way.

I don't see any other way of supporting any of those use cases that doesn't involve some form of inline assembly but my mind is open.

Copied from my post in the pre-RFC thread, here is some inline assembly (ARM64) which I am using in my current project:

// Common code for interruptible syscalls
macro_rules! asm_interruptible_syscall {
    () => {
        r#"
            # If a signal interrupts us between 0 and 1, the signal handler
            # will rewind the PC back to 0 so that the interrupt flag check is
            # atomic.
            0:
                ldrb ${0:w}, $2
                cbnz ${0:w}, 2f
            1:
               svc #0
            2:

            # Record the range of instructions which should be atomic.
            .section interrupt_restart_list, "aw"
            .quad 0b
            .quad 1b
            .previous
        "#
    };
}

// There are other versions of this function with different numbers of
// arguments, however they all share the same asm code above.
#[inline]
pub unsafe fn interruptible_syscall3(
    interrupt_flag: &AtomicBool,
    nr: usize,
    arg0: usize,
    arg1: usize,
    arg2: usize,
) -> Interruptible<usize> {
    let result;
    let interrupted: u64;
    asm!(
        asm_interruptible_syscall!()
        : "=&r" (interrupted)
          "={x0}" (result)
        : "*m" (interrupt_flag)
          "{x8}" (nr as u64)
          "{x0}" (arg0 as u64)
          "{x1}" (arg1 as u64)
          "{x2}" (arg2 as u64)
        : "x8", "memory"
        : "volatile"
    );
    if interrupted == 0 {
        Ok(result)
    } else {
        Err(Interrupted)
    }
}

@Amanieu note that @japaric is working towards the intrinsics for ARM. Would be worth checking to see if that proposal covers your needs.

@shepmaster

@Amanieu note that @japaric is working towards the intrinsics for ARM. Would be worth checking to see if that proposal covers your needs.

It is worth remarking that:

  • this work doesn't replace inline assembly, it merely complements it. This approach implements vendor APIs in std::arch, these APIs are insufficient for some people already.

  • this approach is only usable when a sequence of intrinsic calls like foo(); bar(); baz(); produces code indistinguishable from that sequence of instructions - this isn't necessarily the case, and when it isn't, code that looks correct produces at best incorrect results, and at worst has undefined behavior (we had bugs due to this in x86 and x86_64 in std already, e.g., https://github.com/rust-lang-nursery/stdsimd/blob/master/coresimd/x86/cpuid.rs#L108 - other architectures have these issues as well).

  • some intrinsics have immediate mode arguments, which you cannot pass via a function call, so that foo(3) won't work. Every solution to this problem is currently a whacky workaround, and in some cases, no workarounds are currently possible in Rust, so we just don't provide some of these intrinsics.

So if the vendor APIs are implementable in Rust, available on std::arch, and can be combined to solve a problem, I agree that they are better than inline assembly. But every now and then either the APIs are not available, maybe not even implementable, and / or they cannot be combined correctly. While we could fix the "implementability issues" in the future, if what you want to do is not exposed by the vendor API, or the APIs cannot be combined, this approach won't help you.
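The immediate-argument limitation can be sketched with a const generic standing in for an intrinsic's immediate operand (`shift_left` is a hypothetical stand-in here, not an actual `std::arch` intrinsic):

```rust
// Hypothetical stand-in for an intrinsic with an immediate-mode
// argument: the shift amount must be known at compile time, which a
// const generic can express but an ordinary function argument cannot.
fn shift_left<const IMM: u32>(x: u32) -> u32 {
    x << IMM
}

fn main() {
    // Fine: the immediate is a compile-time constant.
    println!("{}", shift_left::<3>(1));
    // Not expressible: a runtime value can never become an immediate.
    // let n = 3; shift_left::<n>(1); // error: non-constant value
}
```

This is exactly the shape of the workaround problem: `foo(3)` through a plain function call erases the compile-time nature of the `3`, so each such intrinsic needs special treatment.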

What can be very surprising about LLVM's implementation of intrinsics (SIMD especially) is that they do not conform to Intel's explicit mapping of intrinsics to instructions at all - they are subject to a wide range of compiler optimizations. For instance I remember one time where I attempted to reduce memory pressure by calculating some constants from other constants instead of loading them from memory. But LLVM simply proceeded to constant-fold the entire thing back into the exact memory load I was trying to avoid. In a different case I wanted to investigate replacing a 16-bit shuffle with an 8-bit shuffle to reduce port5 pressure. Yet in its unending wisdom the ever-helpful LLVM optimizer noticed that my 8-bit shuffle is in fact a 16-bit shuffle and replaced it.

Both optimizations certainly yield better throughput (especially in the face of hyperthreading) but not the latency reduction I was hoping to achieve. I ended up dropping down all the way to nasm for that experiment but having to rewrite the code from intrinsics to plain asm was just unnecessary friction. Of course I want the optimizer to handle things like instruction selection or constant folding when using some high-level vector API. But when I explicitly decided which instructions to use I really don't want the compiler to mess around with that. The only alternative is inline asm.
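For the constant-folding half of that problem, today's `std::hint::black_box` (an optimization barrier hint, not a hard guarantee) can sometimes keep the compiler from folding a computed value back into the constant you were trying to avoid:

```rust
use std::hint::black_box;

fn main() {
    // black_box hides the value's origin from the optimizer, so the
    // multiply below is actually materialized rather than being folded
    // into the constant 42 at compile time. This is only a hint: the
    // language makes no promise about the generated code.
    let x = black_box(21u32);
    let y = x * 2;
    println!("{}", y);
}
```

It does nothing for the instruction-selection half (the shuffle example above), where only inline asm gives exact control.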

So if the vendor APIs are implementable in Rust, available on std::arch, and can be combined to solve a problem, I agree that they are better than inline assembly

That's all I've been saying at first

accomplish 95-99% of the goals

and again

Yes, there are cases that require assembly for now and there are cases that will forever need it, I said as much originally (added emphasis for clarity):

It's most people's expectation that [intrinsics] are going to accomplish 95-99% of the goals.

This is the same thing that @eddyb is saying in parallel. I'm unclear why multiple people are acting like I'm completely disregarding the usefulness of inline assembly while trying to point out the realities of the current situation.

I've

  1. Pointed one poster who made no mention of knowing that intrinsics existed towards stable today intrinsics.
  2. Pointed another poster at proposed intrinsics so they can provide early feedback to the proposal.

Let me state this very clearly: yes, inline assembly is sometimes required and good. I am not arguing that. I am only trying to help people solve real world problems with the tools that are available now.

What I was trying to say was that we should have a more organized approach to this, a proper survey, and gather up a lot more data than the few of us in this thread, and then use that to point out the most common needs from inline assembly (since it's clear that intrinsics can't fully replace it).

I suspect that each architecture has a tricky-to-model subset, that gets some use from inline asm!, and maybe we should focus on those subsets, and then try to generalize.

cc @rust-lang/lang

@eddyb _require_ is a strong word, and I would be compelled to say that no, we're not strictly required to use inline assembler. As I mentioned earlier, we _could_ define procedures in pure assembly language, assemble them separately, and link them into our Rust programs via the FFI.

However, as I said earlier I know of no serious OS-level project that does that. It would mean lots of boilerplate (read: more chances to make a mistake), a more complex build process (right now we're fortunate enough that we can get away with a simple cargo invocation and a linked and nearly-ready-to-run kernel pops out of the other end; we'd have to invoke the assembler and link in a separate step), and a drastic decrease in the ability to inline things, etc.; there would almost certainly be a performance hit.
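For reference, the "separate assembly plus FFI" route can be sketched without a separate build step using module-level assembly (this assumes x86_64 Linux and the `global_asm!` form that later stabilized; the `add_asm` symbol is made up for illustration):

```rust
use std::arch::global_asm;

// Define a standalone routine in a module-level assembly blob and call
// it through an ordinary extern declaration. No build.rs or external
// assembler is involved, but the routine still cannot be inlined.
global_asm!(
    ".globl add_asm",
    "add_asm:",
    "lea rax, [rdi + rsi]", // System V ABI: args in rdi/rsi, ret in rax
    "ret",
);

extern "C" {
    fn add_asm(a: u64, b: u64) -> u64;
}

fn main() {
    println!("{}", unsafe { add_asm(2, 3) });
}
```

The inlining loss mentioned above is visible here: the call crosses an opaque ABI boundary the optimizer cannot see through.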

Things like compiler intrinsics help in a lot of cases, but for things like the supervisory instruction set of the target ISA, particularly more esoteric hardware features (hypervisor and enclave features, for example), there often aren't intrinsics and we're in a no_std environment. The intrinsics that do exist often aren't sufficient; e.g., the x86-interrupt calling convention seems cool but doesn't give you mutable access to the general purpose registers in a trap frame: suppose I take an undefined instruction exception with the intent to do emulation, and suppose the emulated instruction returns a value in %rax or something; the calling convention doesn't give me a good way to pass that back to the call-site, so we had to roll our own. That meant writing my own exception handling code in assembler.

So to be honest no, we don't _require_ inline assembler, but it is sufficiently useful that it would almost be a non-starter not to have it.

@dancrossnyc I am specifically curious about avoiding separate assembling, that is, what kind of assembly you need at all in your project, no matter how you link it in.

In your case it seems to be a supervisor/hypervisor/enclave privileged ISA subset, is that correct?

there often aren't intrinsics

Is this by necessity, i.e. do the instructions have requirements which are unreasonably difficult or even impossible to uphold when compiled as intrinsic calls through, e.g. LLVM?
Or is this just because they're assumed to be too special-cased to be useful to most developers?

and we're in a no_std environment

For the record, vendor intrinsics are in both std::arch and core::arch (the former is a reexport).

the x86-interrupt calling convention seems cool but doesn't give you mutable access to the general purpose registers in a trap frame

cc @rkruppe Can this be implemented in LLVM?

@eddyb correct; we need the supervisor subset of the ISA. I'm afraid I can't say much more at the moment about our specific use case.

Is this by necessity, i.e. do the instructions have requirements which are unreasonably difficult or even impossible to uphold when compiled as intrinsic calls through, e.g. LLVM?
Or is this just because they're assumed to be too special-cased to be useful to most developers?

To some extent both are true, but on balance I would say the latter is more relevant here. Some things are microarchitecture specific and dependent on specific processor package configurations. Would it be reasonable for a compiler to (for example) expose something as an intrinsic that's part of the privileged instruction subset _and_ conditioned on a specific processor version? I honestly don't know.

For the record, vendor intrinsics are in both std::arch and core::arch (the former is a reexport).

That's actually really good to know. Thanks!

Would it be reasonable for a compiler to (for example) expose something as an intrinsic that's part of the privileged instruction subset and conditioned on a specific processor version? I honestly don't know.

We already do. For example, the xsave x86 instructions are implemented and exposed in core::arch, not available on all processors, and most of them require privileged mode.

@gnzlbg xsave isn't privileged; did you mean xsaves?

I took a look through https://rust-lang-nursery.github.io/stdsimd/x86_64/stdsimd/arch/x86_64/index.html and the only privileged instructions I saw in my quick sweep (I didn't do an exhaustive search) were xsaves, xsaves64, xrstors, and xrstors64. I suspect those are intrinsics because they fall into the general XSAVE* family and don't generate exceptions in real mode, and some folks want to use clang/llvm to compile real-mode code.

@dancrossnyc yes some of those are the ones I meant (we implement xsave, xsaves, xsaveopt, ... in the xsave module: https://github.com/rust-lang-nursery/stdsimd/blob/master/coresimd/x86/xsave.rs).

These are available in core, so you can use them to write an OS kernel for x86. In user-space they are useless AFAICT (they'll always raise an exception), but we don't have a way to distinguish these cases in core. We could expose them only in core and not in std, but since they are already stable, that ship has sailed. Who knows, maybe some OS runs everything in ring 0 someday, and you can use them there...

@gnzlbg I don't know why xsaveopt or xsave would raise an exception in userspace: xsaves is the only one of the family that's defined to generate an exception (#GP if CPL>0), and then only in protected mode (SDM vol.1 ch. 13; vol.2C ch. 5 XSAVES). xsave and xsaveopt are useful for implementing e.g. pre-emptive user-space threads, so their presence as intrinsics actually makes sense. I suspect the intrinsic for xsaves was either because someone just added everything from the xsave family without realizing the privilege issue (that is, assuming it was invocable from userspace), or someone wanted to call it from real mode. That latter may seem far-fetched but I know people are e.g. building real-mode firmware with Clang and LLVM.

Don't get me wrong; the presence of LLVM intrinsics in core is great; if I never have to write that silly sequence of instructions to get the results of rdtscp into a useful format again, I'll be happy. But the current set of intrinsics are not a substitute for inline assembler when you're writing a kernel or other bare-metal supervisory sort of thing.

@dancrossnyc when I mentioned xsave I was referring to some of the intrinsics that are available behind the CPUID bits XSAVE, XSAVEOPT, XSAVEC, etc. Some of these intrinsics require privileged mode.

Would it be reasonable for a compiler to (for example) expose something as an intrinsic that's part of the privileged instruction subset and conditioned on a specific processor version?

We already do and they are available in stable Rust.

I suspect the intrinsic for xsaves was either because someone just added everything from the xsave family without realizing the privilege issue

I added these intrinsics. We realized the privilege issues and decided to add them anyway, because it is perfectly fine for a program depending on core to be an OS kernel that wants to use these, and they are harmless in userspace (as in, if you try to use them, your process terminates).

But the current set of intrinsics are not a substitute for inline assembler when you're writing a kernel or other bare-metal supervisory sort of thing.

Agreed, that's why this issue is still open ;)

@gnzlbg sorry, I don't mean to derail this by rabbit-holing on xsave et al.

However, as near as I can tell, the only intrinsics that require privileged execution are those related to xsaves and even then it's not always privileged (again, real-mode doesn't care). It's wonderful that those are available in stable Rust (seriously). The others might be useful in userspace and similarly I think it's great that they're there. However, xsaves and xrstors are a very, very small portion of the privileged instruction set, and having added intrinsics for two instructions is qualitatively different from doing so generally; I think the question remains as to whether it's appropriate _in general_. Consider the VMWRITE instruction from the VMX extensions, for example; I imagine an intrinsic would do something like execute the instruction and then "return" rflags. That's sort of an oddly specialized thing to have as an intrinsic.

I think otherwise we're in agreement here.

FWIW per the std::arch RFC we can currently only add intrinsics to std::arch that the vendors expose in their APIs. For the case of xsave, Intel exposes them in its C API, so that's why it is ok that's there. If you need any vendor intrinsics that are not currently exposed, open an issue; whether they require privileged mode or not is irrelevant.

If the vendor doesn't expose an intrinsic for it, then std::arch might not be the place for it, but there are many alternatives to that (inline assembly, global asm, calling C, ...).

Sorry, I understood you saying you wrote the intrinsics for xsave to mean the Intel intrinsics; my earlier comments still apply as to why I think xsaves is an intrinsic then (either an accident by a compiler writer at Intel or because someone wanted it for real mode; I feel like the former would be noticed really quickly but firmware does weird stuff, so the latter wouldn't surprise me at all).

Anyway, yes, I think we fundamentally agree: intrinsics aren't the place for everything, and that's why we'd like to see asm!() moved to stable. I'm really excited to hear that progress is being made in this area, as you said yesterday, and if we can gently nudge @Florob to bubble this up closer to the top of the stack, we'd be happy to do so!

A few additional details and use cases for asm!:

When you're writing an operating system, firmware, certain types of libraries, or certain other types of system code, you need full access to platform-level assembly. Even if we had intrinsics that exposed every single instruction in every architecture Rust supports (which we don't come anywhere close to having), that still wouldn't be enough for some of the stunts that people regularly pull with inline assembly.

Here are a small fraction of things you can do with inline assembly that you can't easily do in other ways. Every single one of these is a real-world example I've seen (or in some cases written), not a hypothetical.

  • Collect all the implementations of a particular pattern of instructions in a separate ELF section, and then in loading code, patch that section at runtime based on characteristics of the system you run on.
  • Write a jump instruction whose target gets patched at runtime.
  • Emit an exact sequence of instructions (so you can't count on intrinsics for the individual instructions), so that you can implement a pattern that carefully handles potential interruptions in the middle.
  • Emit an instruction, followed by a jump to the end of the asm block, followed by fault recovery code for a hardware fault handler to jump to if the instruction generates a fault.
  • Emit a sequence of bytes corresponding to an instruction the assembler doesn't know about yet.
  • Write a piece of code that carefully switches to a different stack and then calls another function.
  • Call assembly routines or system calls that require arguments in specific registers.
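The last bullet — calls that need arguments in specific registers — is exactly what explicit register operands in an asm construct express; a minimal sketch assuming x86_64 Linux and the `asm!` operand syntax that was later stabilized:

```rust
use std::arch::asm;

// Invoke the raw getpid syscall (number 39 on x86_64 Linux) with the
// syscall number pinned to rax, and read the return value back from
// rax. The kernel clobbers rcx and r11, so they are declared as such.
fn getpid_raw() -> i64 {
    let ret: i64;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 39i64 => ret, // __NR_getpid
            out("rcx") _,
            out("r11") _,
            options(nostack),
        );
    }
    ret
}

fn main() {
    println!("pid = {}", getpid_raw());
}
```

No combination of intrinsics or plain function calls can pin values to named registers like this; that is the register-constraint use case in its purest form.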

+1e6

@eddyb

Ok, I will try the intrinsics approach and see where it takes. You are probably right and that's the best approach for my case. Thank you!

@joshtriplett nailed it! These are the exact use cases I had in mind.

loop {
   :thumbs_up:
}

I would add a couple of other use cases:

  • writing code in weird architectural modes, like BIOS/EFI calls and 16-bit real-mode.
  • writing code with strange/unusual addressing modes (which comes up often in 16-bit real-mode, bootloaders, etc.)

@mark-i-m Absolutely! And generalizing a point that has sub-cases in both of our lists: translating between calling conventions.

I am closing out #53118 in favor of this issue and copying the PR here for the record. Note that this is from August, but a brief look seems to indicate the situation hasn't changed:


The section on inline assembly needs an overhaul; in its present state it implies that the behavior and syntax is tied to rustc and the rust language in general. Pretty much the entire documentation is specific to x86/x86_64 assembly with the llvm toolchain. To be clear, I am not referring to the assembly code itself, which is obviously platform-specific, but rather the general architecture and usage of inline assembly altogether.

I didn't find an authoritative source for the behavior of inline assembly when it comes to ARM target, but per my experimentation and referencing the ARM GCC inline assembly documentation, the following points seem to be completely off:

  • The ASM syntax, as ARM/MIPS (and most other RISC?) use intel-esque syntax with the destination register first. I understood the documentation to mean/imply that inline asm took at&t syntax which was transpiled to actual platform/compiler-specific syntax, and that I should just substitute the names of the x86 registers with that of the ARM registers only.
  • Relatedly, the intel option is invalid, as it causes "unknown directive" errors when compiling.
  • Adapting from the ARM GCC inline assembly documentation (for building against thumbv7em-none-eabi with the arm-none-eabi-* toolchain), it appears that even some basic assumptions about the format of inline assembly are platform-specific. In particular, it seems that for ARM the output register (second macro argument) counts as a register reference, i.e. $0 refers to the first output register and not the first input register, as is the case with the x86 LLVM instructions.
  • At the same time, other compiler-specific features are _not_ present; I can't use named references to registers, only indexes (e.g. asm("mov %[result],%[value],ror #1":[result] "=r" (y):[value] "r" (x)); is invalid).
  • (Even for x86/x86_64 targets, the usage of $0 and $2 in the inline assembly example is very confusing, as it does not explain why those numbers were chosen.)

I think what threw me the most is the closing statement:

The current implementation of the asm! macro is a direct binding to LLVM's inline assembler expressions, so be sure to check out their documentation as well for more information about clobbers, constraints, etc.

Which does not seem to be universally true.

I understood the documentation to mean/imply that inline asm took at&t syntax which was transpiled to actual platform/compiler-specific syntax, and that I should just substitute the names of the x86 registers with that of the ARM registers only.

A notion of intel vs at&t syntax only exists on x86 (though there may be other cases I'm not aware of). It's unique in that they are two different languages sharing the same set of mnemonics to represent the same set of binary code. The GNU ecosystem has established at&t syntax as the dominating default for the x86 world, which is why this is what inline asm defaults to. You are mistaken, though: it is very much a direct binding to LLVM's inline assembler expressions, which in turn mostly just dump plaintext (after processing substitutions) into the textual assembly program. None of this is unique (or even relevant) to today's asm!(), as it is entirely platform-specific and completely meaningless beyond the x86 world.

Relatedly, the intel option is invalid, as is it causes "unknown directive" errors when compiling.

This is a direct consequence of the "dumb"/simple plaintext insertion I described above. As the error message indicates the .intel_syntax directive is unsupported. This is an old and well-known workaround for using intel-style inline-asm with GCC (which emits att style): one would simply write .intel_syntax at the start of the inline asm block, then write some intel-style asm and finally terminate with .att_syntax to set the assembler back into att mode so it correctly processes the (following) compiler-generated code once again. It's a dirty hack and I remember at least the LLVM implementation having had some weird quirks for a long time so it seems like you're seeing this error because it was finally removed. Sadly, the only correct course of action here is to remove the "intel" option from rustc.

it appears that even some basic assumptions about the format of inline assembly are platform-specific

Your observation is entirely correct, each platform makes up both its own binary format and its own assembly language. They are completely independent and (mostly) unprocessed by the compiler - which is the entire point of programming in raw assembler!

I can't use named references to registers, only indexes

Sadly there is quite a big mismatch between the inline asm implementation of LLVM that rustc exposes and the implementation of GCC (which clang emulates). Without a decision on how to move forward with asm!() there is little motivation to improve this - besides, I outlined the major options a long time ago; all of them have clear drawbacks. Since this does not seem to be a priority you're probably going to be stuck with today's asm!() for a few years at least. There are decent workarounds:

  • rely on the optimizer to produce optimal code (with a little nudging you can usually get exactly what you want without ever writing raw assembly yourself)
  • use intrinsics, another quite elegant solution which is better than inline asm in almost every way (unless you need exact control over instruction selection and scheduling)
  • invoke the cc crate from build.rs to link a C object with inline asm

    • basically just invoke any assembler you like from build.rs, using a C compiler may seem like overkill but saves you the hassle of integrating with the build.rs system

These workarounds apply to all but a small set of very specific edge cases. If you hit one of those (luckily I haven't yet) you're out of luck though.

I agree that the documentation is quite lackluster but it's good enough for anyone familiar with inline asm. If you aren't, you probably should not be using it. Don't get me wrong - you should definitely feel free to experiment and learn but as asm!() is unstable and neglected and because there are really good workarounds I would strongly advise against using it in any serious project if at all possible.

invoke the cc crate from build.rs to link a C object with inline asm

You can also invoke the cc crate from build.rs to build plain assembly files, which gives the maximum amount of control. I strongly recommend doing exactly this in case the two "workarounds" above this do not work for your use-case.

@main-- wrote:

These workarounds apply to all but a small set of very specific edge cases. If you hit one of those (luckily I haven't yet) you're out of luck though.

I mean, not entirely out of luck. You just have to use Rust's inline asm. I have an edge case that none of your listed workarounds cover here. As you say, if you're familiar with the process from other compilers it's mostly fine.

(I have another use case: I would like to teach systems programming, computer architecture, and related topics using Rust instead of C someday. Not having inline assembly would make this much more awkward.)

I wish we would make inline assembly a priority in Rust and stabilize it sooner rather than later. Maybe this should be a Rust 2019 goal. I am fine with any of the solutions you list in your nice comment earlier: I could live with the problems of any of them. Being able to inline assembly code is for me a prerequisite to writing Rust instead of C everywhere: I really need it to be stable.

I wish we would make inline assembly a priority in Rust and stabilize it sooner rather than later. Maybe this should be a Rust 2019 goal.

Please write a Rust 2019 blog post and express this concern. I think if enough of us do that, we can influence the roadmap.

To clarify my comment above - the problem is that the documentation does not explain just how "deeply" the contents of the asm!(..) macro are parsed/interacted with. I'm familiar with x86 and MIPS/ARM assembly but presumed that llvm had its own assembly language format. I've used inline assembly for x86 before, but was not clear on to what extent the bastardization of asm to bridge C and ASM went. My presumption (now invalidated) based on the wording in the rust inline assembly section was that LLVM had its own ASM format that was built to mimic x86 assembly in either at&t or intel modes, and necessarily looked like the x86 examples shown.

(What helped me was studying the expanded macro output, which cleared up what was going on)

I think there needs to be less abstraction on that page. Make it clearer what gets parsed by LLVM and what gets interpreted as ASM directly. What parts are specific to rust, what parts are specific to the hardware you are running on, and what parts belong to the glue that holds them together.

invoke the cc crate from build.rs to link a C object with inline asm

Recent progress on cross-language LTO makes me wonder if some of the downsides of this avenue can be reduced, effectively inlining this "external assembly blob". (probably not)

invoke the cc crate from build.rs to link a C object with inline asm

Recent progress on cross-language LTO makes me wonder if some of the downsides of this avenue can be reduced, effectively inlining this "external assembly blob".

Even if this works, I don't want to write my inline assembly in C. I want to write it in Rust. :-)

I don't want to write my inline assembly in C.

You can compile and link .s and .S files directly (see for example this crate), which in my book are far enough from C. :)

if some of the downsides of this avenue can be reduced

I believe this is not currently feasible as cross-language LTO relies on having LLVM IR and assembly would not generate this.

I believe this is not currently feasible as cross-language LTO relies on having LLVM IR and assembly would not generate this.

You can stuff assembly into module level assembly in LLVM IR modules.

Does anyone know what the most recent proposal/current status is? Since the theme of the year is "maturity and finishing what we started", it seems like a great opportunity to finally finish up asm.

Vague plans for an new (to be stabilized) syntax were discussed last February: https://paper.dropbox.com/doc/FFI-5NmXV30TGiSsr9dIxpqpq

According to those notes @joshtriplett and @Amanieu signed up to write an RFC.

What is the status of the new syntax?

It needs to be RFC'ed and implemented on nightly

ping @joshtriplett @Amanieu Let me know if I can help move things along here! I'll be in touch shortly.

@cramertj AFAICT anybody can move this forward, this is unblocked and waiting on somebody to step in and put in the work. There is a pre-RFC sketching the overall design, and the next steps could be to implement that and see if it actually works, either as a proc macro, in a fork, or as a different unstable feature.

One could probably try to just turn that pre-RFC into a proper RFC and submit it, but I doubt that without an implementation such an RFC can be convincing.


EDIT: to be clear, by convincing I specifically mean parts of the pre-RFC like this one:

additionally mappings for register classes are added as appropriate (cf. llvm-constraint 6)

where there are dozens of arch-specific register classes in the lang-ref. An RFC cannot just wave all of these away, and making sure that they all work like they are supposed to, or are meaningful, or are "stable" enough in LLVM to be exposed from here, etc. would benefit from an implementation one can just try these in.

Is RISC-V inline assembly supported here with #![feature(asm)]?

To the best of my knowledge, all assembly on supported platforms is supported; it's pretty much raw access to the llvm compiler's asm support.

Yes, RISC-V is supported. Architecture-specific input/output/clobber constraint classes are documented in the LLVM langref.

There is a caveat, however - if you need to constrain to individual registers in input/output/clobber constraints, you must use the architectural register names (x0-x31, f0-f31), not the ABI names. In the Assembly fragment itself, you can use either kind of register name.

As someone new to these concepts can I just say... this whole discussion seems _silly_. How is it that a language (assembly) which is supposed to be a 1-to-1 mapping with its machine code causes this much headache?

I'm pretty confused:

  • If you are writing asm, shouldn't it have to be rewritten (by a human with #[cfg(...)]) for every architecture _and backend_ you are trying to support?
  • This means that the "syntax" question is moot... just use the syntax for that architecture and backend the compiler happens to be using.
  • Rust would just need std unsafe functions to be able to put bytes into the correct registers and push/pop to the stack for whatever architecture is being compiled against -- again, this may have to be rewritten for every architecture and maybe even every backend.

I get that backwards compatibility is an issue, but with the huge number of bugs and the fact that this was never stabilized maybe it would be better to just pass it along to the backend. Rust shouldn't be in the business of trying to fix LLVM's or gcc's or anyone else's odd syntax. Rust is in the business of emitting machine code for the architecture and compiler it is targeting... and asm is already basically that code!

The reason there is no progress here is that nobody is investing time in fixing this issue. That's not a good reason for stabilizing a feature.

While reading through this thread, I had an idea and had to post it. Sorry if I'm answering to an old post, but I thought it was worth it:

@main-- said:

Both optimizations certainly yield better throughput (especially in the face of hyperthreading) but not the latency reduction I was hoping to achieve. I ended up dropping down all the way to nasm for that experiment but having to rewrite the code from intrinsics to plain asm was just unnecessary friction. Of course I want the optimizer to handle things like instruction selection or constant folding when using some high-level vector API. But when I explicitly decided which instructions to use I really don't want the compiler to mess around with that. The only alternative is inline asm.

Maybe instead of inline asm, what we really need here are function attributes for LLVM that tell the optimizer: "optimize this for throughput", "optimize this for latency", "optimize this for binary size". I know this solution is upstream, but it would not only solve your particular problem automatically (by providing the lower-latency but otherwise isomorphic implementation of the algorithm), it would also allow Rust programmers to have more fine-grained control over the performance characteristics that matter to them.

@felix91gr That doesn't solve use cases that require emitting an exact sequence of instructions, e.g. interrupt handlers.

@mark-i-m of course not. That's why I put a literal quote! 🙂

My point was that even though you might solve the "compiler optimizes in a way opposite of what I need" (which is classic in their case: latency vs throughput) by using inline asm features, maybe (and imo definitely) that use case would be served better by more fine-grained control of optimizations :)

In light of the upcoming changes to inline assembly, most of the discussion in this issue is no longer relevant. As such, I'm going to close this issue in favor of two separate tracking issues, one for each flavor of inline assembly we have:

  • Tracking Issue for LLVM-style inline assembly (llvm_asm) #70173
  • Tracking Issue for inline assembly (asm!) #72016