Rust: LLVM loop optimization can make safe programs crash

Created on 29 Sep 2015  ·  97 Comments  ·  Source: rust-lang/rust

The following snippet crashes when compiled in release mode on current stable, beta and nightly:

enum Null {}

fn foo() -> Null { loop { } }

fn create_null() -> Null {
    let n = foo();

    let mut i = 0;
    while i < 100 { i += 1; }
    return n;
}

fn use_null(n: Null) -> ! {
    match n { }
}


fn main() {
    use_null(create_null());
}

https://play.rust-lang.org/?gist=1f99432e4f2dccdf7d7e&version=stable

This is based on the following example of LLVM removing a loop that I was made aware of: https://github.com/simnalamburt/snippets/blob/12e73f45f3/rust/infinite.rs.
What seems to happen is that since C allows LLVM to remove endless loops that have no side effects, we end up executing a match that has no arms.

A-LLVM C-bug E-medium I-needs-decision I-unsound 💥 P-medium T-compiler WG-embedded

Most helpful comment

In case anybody wanted to play test case code golf:

pub fn main() {
   (|| loop {})()
}

With the -Z insert-sideeffect rustc flag, added by @sfanxiang in https://github.com/rust-lang/rust/pull/59546, it keeps on looping :)

before:

main:
  ud2

after:

main:
.LBB0_1:
  jmp .LBB0_1

All 97 comments

The LLVM IR of the optimised code is

; Function Attrs: noreturn nounwind readnone uwtable
define internal void @_ZN4main20h5ec738167109b800UaaE() unnamed_addr #0 {
entry-block:
  unreachable
}

This kind of optimisation breaks the main assumption that should normally hold on uninhabited types: it should be impossible to have a value of that type.
rust-lang/rfcs#1216 proposes to explicitly handle such types in Rust. It might be effective in ensuring that LLVM never has to handle them and in injecting the appropriate code to ensure divergence when needed (IIUIC this could be achieved with appropriate attributes or intrinsic calls).
This topic has also been recently discussed in the LLVM mailing list: http://lists.llvm.org/pipermail/llvm-dev/2015-July/088095.html

triage: I-nominated

Seems bad! If LLVM doesn't have a way to say "yes, this loop really is infinite" though then we may just have to sit-and-wait for the upstream discussion to settle.

A way to prevent infinite loops from being optimised away is to add unsafe {asm!("" :::: "volatile")} inside of them. This is similar to the llvm.noop.sideeffect intrinsic that has been proposed in the LLVM mailing list, but it might prevent some optimisations.
In order to avoid the performance loss and to still guarantee that diverging functions/loops are not optimised away, I believe that it should be sufficient to insert an empty non-optimisable loop (i.e. loop { unsafe { asm!("" :::: "volatile") } }) if uninhabited values are in scope.
If LLVM optimises the code which should diverge to the point that it does not diverge anymore, such loops will ensure that the control flow is still unable to proceed.
In "lucky" case in which LLVM is unable to optimise the diverging code, such loop will be removed by DCE.

Is this related to #18785? That one's about infinite recursion being UB, but it sounds like the fundamental cause might be similar: LLVM doesn't consider not halting to be a side effect, so if a function has no side effects other than not halting, it's happy to optimize it away.

@geofft

It's the same issue.

Yes, looks like it's the same. Further down that issue, they show how to get undef, from which I assume it's not hard to make a (seemingly safe) program crash.

:+1:

So I've been wondering how long until somebody reports this. :) In my opinion, the best solution would of course be if we could tell LLVM not to be so aggressive about potentially infinite loops. Otherwise, the only thing I think we can do is to do a conservative analysis in Rust itself that determines whether:

  1. the loop will terminate OR
  2. the loop will have side-effects (I/O operations etc, I forget precisely how this is defined in C)

Either of these should be enough to avoid undefined behavior.

triage: P-medium

We'd like to see what LLVM will do before we invest a lot of effort on our side, and this seems relatively unlikely to cause problems in practice (though I have personally hit this while developing the compiler as well). There are no backwards incompatibility issues to be concerned about.

Quoting from the LLVM mailing list discussion:

 The implementation may assume that any thread will eventually do one of the following:
   - terminate
   - make a call to a library I/O function
   - access or modify a volatile object, or
   - perform a synchronization operation or an atomic operation

 [Note: This is intended to allow compiler transformations such as removal of empty loops, even
  when termination cannot be proven. — end note ]

@dotdash The excerpt you are quoting comes from the C++ specification; it is basically the answer to "how it [having side effects] is defined in C" (also confirmed by the standard committee: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1528.htm ).

Regarding what is the expected behaviour of the LLVM IR there is some confusion. https://llvm.org/bugs/show_bug.cgi?id=24078 shows that there seems to be no accurate & explicit specification of the semantics of infinite loops in LLVM IR. It aligns with the semantics of C++, most likely for historical reasons and for convenience (I only managed to track down https://groups.google.com/forum/#!topic/llvm-dev/j2vlIECKkdE which apparently refers to a time when infinite loops were not optimised away, some time before the C/C++ specs were updated to allow it).

From the thread it is clear that there is the desire to optimise C++ code as effectively as possible (i.e. also taking into account the opportunity to remove infinite loops), but in the same thread several developers (including some that actively contribute to LLVM) have shown interest in the ability to preserve infinite loops, as they are needed for other languages.

@ranma42 I'm aware of that, I just quoted that for reference, because one possibility to work-around this would be to detect such loops in rust and add one of the above to it to stop LLVM from performing this optimization.

Is this a soundness issue? If so, we should tag it as such.

Yes, following @ranma42's example, this variation shows how it readily defeats array bounds checks: playground link

@bluss

The policy is that wrong-code issues that are also soundness issues (i.e. most of them) should be tagged I-wrong.

So just to recap prior discussion, there are really two choices here that I can see:

  • Wait for LLVM to provide a solution.
  • Introduce no-op asm statements wherever there may be an infinite loop or infinite recursion (#18785).

The latter is kind of bad because it can inhibit optimization, so we'd want to do it somewhat sparingly -- basically wherever we can't prove termination ourselves. You could also imagine tying it a bit more to how LLVM optimizes -- i.e., introducing it only if we can detect a scenario that LLVM might consider to be an infinite loop/recursion -- but that would (a) require tracking LLVM and (b) require deeper knowledge than I, at least, possess.

Wait for LLVM to provide a solution.

What is the LLVM bug tracking this issue?

side-note: while true {} exhibits this behaviour. Maybe the lint should be upgraded to error-by-default and get a note stating that this currently can exhibit undefined behaviour?

Also, note that this is invalid for C. LLVM making this argument means that there is a bug in clang.

void foo() { while (1) { } }

void create_null() {
        foo();

        int i = 0;
        while (i < 100) { i += 1; }
}

__attribute__((noreturn))
void use_null() {
        __builtin_unreachable();
}


int main() {
        create_null();
        use_null();
}

This crashes with optimizations; this is invalid behavior under the C11 standard:

An iteration statement whose controlling expression is not a constant
expression, [note 156] that performs no input/output operations, does not
access volatile objects, and performs no synchronization or atomic
operations in its body, controlling expression, or (in the case of a for
statement) its expression-3, may be assumed by the implementation to
terminate. [note 157]

156: An omitted controlling expression is replaced by a nonzero constant,
     which is a constant expression.
157: This is intended to allow compiler transformations such as removal
     of empty loops even when termination cannot be proven.

Note the "whose controlling expression is not a constant expression" - while (1) { }, 1 is a constant expression, and thus may not be removed.

Is the loop removal an optimization pass that we could simply remove?

@ubsan

Did you find a bug report for that in LLVM's bugzilla or file one? It seems that in C++ infinite loops that _can_ never terminate are undefined behavior, but in C they are defined behavior (they can be safely removed in some cases, but not in others).

@gnzlbg I'm filing a bug now.

https://llvm.org/bugs/show_bug.cgi?id=31217

Repeating myself from #42009: this bug can, under some circumstances, cause the emission of an externally callable function containing no machine instructions at all. This should never happen. If LLVM deduces that a pub fn can never be called by correct code, it should emit at least a trap instruction as the body of that function.

The LLVM bug for this is https://bugs.llvm.org/show_bug.cgi?id=965 (opened in 2006).

@zackw LLVM has a flag for that: TrapUnreachable. I haven't tested this, but it looks like adding Options.TrapUnreachable = true; to LLVMRustCreateTargetMachine ought to address your concern. It's likely that this has a low enough cost that it could be done by default, though I haven't made any measurements.

@oli-obk It's unfortunately not just a loop-deletion pass. The problem arises from broad assumptions, for example: (a) branches have no side effects, (b) functions that contain no instructions with side effects have no side effects, and (c) calls to functions with no side effects can be moved or deleted.

Looks like there's a patch: https://reviews.llvm.org/D38336

@sunfishcode , looks like your LLVM patch at https://reviews.llvm.org/D38336 was "accepted" on October 3, can you give an update on what that means regarding LLVM's release process? What's the next step beyond acceptance, and do you have an idea of what future LLVM release will contain this patch?

I talked with some people offline who suggested we have an llvmdev thread. The thread is here:

http://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html

It's now concluded, with the result being that I need to make additional changes. I think the changes will be good, though they'll take me a little more time to do.

Thanks for the update, and thanks so much for your efforts!

Note that https://reviews.llvm.org/rL317729 has landed in LLVM. This patch is planned to have a follow-up patch which makes infinite loops exhibit defined behavior by default, so AFAICT all we need to do is wait and eventually this will be resolved for us upstream.

@zackw I've now created #45920 to fix the problem of functions containing no code.

@bstrie Yes, the first step is landed, and I'm working on the second step of making LLVM give infinite loops defined behavior by default. It's a complex change, and I don't yet know how long it'll take to finish, but I'll post updates here.

@jsgf Still repro. Have you selected Release mode?

@kennytm Woops, never mind.

Note that https://reviews.llvm.org/rL317729 has landed in LLVM. This patch is planned to have a follow-up patch which makes infinite loops exhibit defined behavior by default, so AFAICT all we need to do is wait and eventually this will be resolved for us upstream.

It has been several months since this comment. Does anyone know if the follow-up patch happened, or whether it is still planned?

Alternatively, it seems that the llvm.sideeffect intrinsic exists in the LLVM version we are using: could we fix this ourselves by translating Rust infinite loops into LLVM loops that contain the sideeffect intrinsic?

As seen in https://github.com/rust-lang/rust/issues/38136 and https://github.com/rust-lang/rust/issues/54214, this is especially bad with the upcoming panic_implementation, as a logical implementation of it will be loop {}, and this would make all occurrences of panic! UB without any unsafe code. Which… is maybe the worst that could happen.

Just came across this issue in another light. Here's an example:

pub struct Container<'f> {
    string: &'f str,
    num: usize,
}

impl<'f> From<&'f str> for Container<'f> {
    #[inline(always)]
    fn from(string: &'f str) -> Container<'f> {
        Container::from(string)
    }
}

fn main() {
    let x = Container::from("hello");
    println!("{} {}", x.string, x.num);

    let y = Container::from("hi");
    println!("{} {}", y.string, y.num);

    let z = Container::from("hello");
    println!("{} {}", z.string, z.num);
}

This example reliably segfaults on stable, beta, and nightly, and shows how easy it is to construct uninitialized values of any type. Here it is on the playground.

@SergioBenitez that program does not segfault, it terminates with a stack overflow (you need to run it in debug mode). This is the correct behavior, since your program just recurses infinitely requiring an infinite amount of stack space, which at some point is going to exceed the stack space available. Minimal working example.

In release builds, LLVM can assume that you don't have infinite recursion, and optimizes that away (mwe). This has nothing to do with loops AFAICT, but rather with https://stackoverflow.com/a/5905171/1422197

@gnzlbg Sorry, but you're not correct.

The program segfaults in release mode. That's the entire point; that an optimization results in unsound behavior - that LLVM and Rust's semantics are not in agreement here - that I can write and compile a safe Rust program with rustc that allows me to use uninitialized memory, inspect arbitrary memory, and arbitrarily cast between types, violating the semantics of the language. That's the same point being illustrated in this thread. Note that the original program also doesn't segfault in debug mode.

You also seem to be proposing that there is a _different_, non-loop optimization taking place here. That's unlikely, and largely irrelevant, though it might warrant a separate issue if it is the case. My guess is that LLVM is noticing the tail recursion, treating it as an infinite loop, and optimizing it away -- again, exactly what this issue is about.

@gnzlbg Well, slightly changing your mwe of optimization away of infinite recursion (here), it does generate an uninitialized value of NonZeroUsize (which turns out to be… 0, thus an invalid value).

And that is what @SergioBenitez also did with their example, except it's with pointers, and thus generates a segfault.

Are we in agreement that @SergioBenitez program has a stack overflow in both debug and release?

If so, I cannot find any loops in @SergioBenitez example, so I don't know how this issue would apply to it (this issue is about infinite loops after all). If I am wrong, please point me to the loop in your example.

As mentioned, LLVM assumes that infinite recursion cannot happen (it assumes that all threads eventually terminate), but that would be a different issue from this one.

I have not inspected the optimizations LLVM does or the generated code for either of the programs, but do note that a segfault is not unsound, if the segfault is all that happens. In particular, stack overflows that are caught (by stack probing + an unmapped guard page after the end of the stack) and don't cause any memory safety problems also show up as segfaults. Of course, segfaults can also indicate memory corruption or wild writes/reads or other soundness problems.

@rkruppe My program segfaults because a reference to a random memory location was allowed to be constructed, and the reference was subsequently read. The program can be trivially modified to instead write a random memory location, and without too much difficulty, read/write a _particular_ memory location.

@gnzlbg The program _does not_ stack overflow in release mode. In release mode, the program makes zero function calls; the stack is pushed onto a finite number of times, purely to allocate locals.

The program does not stack overflow in release mode.

So? The only thing that matters is that the example program, which is basically fn foo() { foo() }, has infinite recursion, which is not allowed by LLVM.

The only thing that matters is that the example program, which is basically fn foo() { foo() }, has infinite recursion, which is not allowed by LLVM.

I don't know why you're saying this like it resolves anything. LLVM considering infinite recursion and loops UB and optimizing accordingly, yet it being safe in Rust, is the entire point of this whole issue!

Author of https://reviews.llvm.org/rL317729 here, confirming that I have not yet implemented the follow-up patch.

You can insert @llvm.sideeffect calls today to ensure that loops are not optimized away. That might disable some optimizations, but in theory not too many, since the major optimizations have been taught how to understand it. If one puts @llvm.sideeffect calls in all loops or things which might turn into loops (recursion, unwinding, inline asm, others?), that's theoretically sufficient to fix the problem here.

Obviously it would be nicer to have the second patch in place, so that it's not necessary to do this. I don't know when I'll get back to implementing that.

There's some small difference, but I'm not sure if it's material or not.

Recursion

#[allow(unconditional_recursion)]
#[inline(never)]
pub fn via_recursion<T>() -> T {
    via_recursion()
}

fn main() {
    let a: String = via_recursion();
}
define internal void @_ZN10playground4main17h1daf53946e45b822E() unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
_ZN4core3ptr13drop_in_place17h95538e539a6968d0E.exit:
  ret void
}

Loop

#[inline(never)]
pub fn via_loop<T>() -> T {
    loop {}
}

fn main() {
    let b: String = via_loop();
}
define internal void @_ZN10playground4main17h1daf53946e45b822E() unnamed_addr #2 {
start:
  unreachable
}

Meta

Rust 1.29.1, compiling in release mode, viewing the LLVM IR.

I don't think that we can, in general, detect recursion (trait objects, C FFI, etc.), so we would have to use llvm.sideeffect on pretty much every call site unless we can prove that the call site won't recurse. Proving the absence of recursion, in the cases where it can be proven at all, requires interprocedural analysis for anything but the most trivial programs like fn main() { main() }. It would be good to know the impact of implementing this fix, and whether there are alternative solutions to this problem.

@gnzlbg That's true, though you could put the @llvm.sideeffects at the entries of functions, rather than at the call sites.

Strangely, I cannot reproduce the SEGFAULT in @SergioBenitez' test case locally.

Also, for a stack overflow, shouldn't there be a different error message? I thought we had some code to print "The stack overflowed" or so?

@RalfJung did you try in debug mode? (I can reliably reproduce the stack overflow in debug mode on my machines and the playground, so maybe you need to file a bug if for you this is not the case). In --release you won't get a stack overflow because all that code is misoptimized away.


@sunfishcode

That's true, though you could put the @llvm.sideeffects at the entries of functions, rather than at the call sites.

It's hard to tell what the best way forward would be without knowing exactly which optimizations llvm.sideeffect prevents. Is it worth it to try to generate as few @llvm.sideeffect calls as possible? If not, then putting it in every function call might be the most straightforward thing to do. Otherwise, IIUC, whether @llvm.sideeffect is needed depends on what the call site does:

trait Foo {
    fn foo(&self) { self.bar() }
    fn bar(&self);
}

struct A;
impl Foo for A {
    fn bar(&self) {} // not recursive
}
struct B;
impl Foo for B {
    fn bar(&self) { self.foo() } // recursive
}

fn main() {
    let a = A;
    a.bar(); // Ok - no @llvm.sideeffect needed anywhere
    let b = B;
    b.bar(); // We need @llvm.sideeffect on this call site
    let c: &[&dyn Foo] = &[&a, &b];
    for i in c {
        i.bar(); // We need @llvm.sideeffect here too
    }
}

AFAICT, we have to put @llvm.sideeffect inside the functions to prevent them from being optimized away, so even if these "optimizations" were worth it, I don't think they are straightforward to do with the current model. Even if they were, these optimizations would rely on being able to prove that there is no recursion.

did you try in debug mode? (I can reliably reproduce the stack overflow in debug mode on my machines and the playground, so maybe you need to file a bug if for you this is not the case)

Sure, but in debug mode LLVM doesn't do the loop optimizations so there is no problem.

If the program stackoverflows in debug mode, that should not give LLVM license to create UB. The issue is figuring out whether the final program has UB, and from staring at the IR I cannot tell. It segfaults, but I do not know why. But it does seem like a bug to me to "optimize" a stackoverflowing program into one that segfaults.

If the program stackoverflows in debug mode, that should not give LLVM license to create UB.

But it does seem like a bug to me to "optimize" a stackoverflowing program into one that segfaults.

In C, a thread of execution is assumed to eventually terminate, perform a volatile memory access, I/O, or a synchronizing atomic operation. It would be surprising to me if LLVM-IR did not evolve to have the same semantics, either by accident or by design.

The Rust code contains a thread of execution that never terminates, and does not perform any of the operations required for this to not be UB in C. I'd suspect that we generate the same LLVM-IR that a C program with undefined behavior would, so I don't think it is surprising that LLVM is misoptimizing this Rust program.

It segfaults, but I do not know why.

LLVM removes the infinite recursion, so as @SergioBenitez mentioned above, the program then proceeds to:

a reference to a random memory location was allowed to be constructed, and the reference was subsequently read.

The part of the program that does it is this one:

let x = Container::from("hello");  // invalid reference created here
println!("{} {}", x.string, x.num);  // invalid reference dereferenced here

where Container::from starts an infinite recursion, that LLVM concludes cannot ever happen, and replaces with some random value, which then gets dereferenced. You can see one of the many ways in which this gets misoptimized here: https://rust.godbolt.org/z/P7Snex On the playground (https://play.rust-lang.org/?gist=f00d41cc189f9f6897d429350f3781ec&version=stable&mode=release&edition=2015) one gets a different panic in release from debug builds due to this optimization, but UB is UB is UB.

The Rust code contains a thread of execution that never terminates, and does not perform any of the operations required for this to not be UB in C. I'd suspect that we generate the same LLVM-IR that a C program with undefined behavior would, so I don't think it is surprising that LLVM is misoptimizing this Rust program.

I was under the impression that you argued above that this is not the same bug as the infinite loop issue. It seems I misread your messages then. Sorry for the confusion.

So, it seems like a good next step would be to sprinkle some llvm.sideeffect into the IR we generate, and do some benchmarks?

In C, a thread of execution is assumed to terminate, perform volatile memory accesses, I/O, or a synchronizing atomic operation.

Btw, this is not entirely correct -- a loop with a constant conditional (such as while (true) { /* ... */ }) is explicitly allowed by the standard, even if it does not contain any side-effects. This is different in C++. LLVM does not implement the C standard correctly here.

I was under the impression that you argued above that this is not the same bug as the infinite loop issue.

The behavior of non-terminating Rust programs is always defined, while the behavior of non-terminating LLVM-IR programs is only defined if certain conditions are met.

I thought that this issue is about fixing the implementation of Rust for infinite loops such that the behavior of the generated LLVM-IR becomes defined, and for this, @llvm.sideeffect, sounded like a pretty good solution.

@SergioBenitez mentioned that one can also create non-terminating Rust programs using recursion, and @rkruppe argued that infinite recursion and infinite loops are equivalent, so that these are both the same bug.

I don't disagree that these two issues are related, or even that they are the same bug, but for me, these two issues look slightly different:

  • solution-wise, we go from applying an optimization barrier (@llvm.sideeffect) exclusively to non-terminating loops, to applying it to every single Rust function.

  • value-wise, infinite loops are useful because the program never terminates. For infinite recursion, whether the program terminates depends on optimization level (e.g. whether LLVM transforms the recursion into a loop or not), and when and how the program terminates depends on the platform (stack size, protected guard page, etc.). Fixing both is required to make the Rust implementation sound, but for the case of infinite recursion, if the user intended their program to recurse forever, a sound implementation would still be "wrong" in the sense that it won't always recurse forever.

solution wise, we go from applying an optimization barrier (@llvm.sideeffect) exclusively to non-terminating loops, to apply it to every single Rust function.

The analysis required to show that a loop body actually has side effects (not just potentially, as with calls to external functions) and thus doesn't need a llvm.sideeffect insertion is quite tricky, probably in roughly the same order of magnitude as showing the same for a function that may be part of infinite recursion. Proving that a loop is terminating is also difficult without doing a lot of optimizations first, since most Rust loops involve iterators. So I think we'd wind up putting llvm.sideeffect into the vast majority of loops regardless. Admittedly, there are quite a few functions that don't contain loops, but it still doesn't seem like a qualitative difference to me.

If I understand the problem correctly, to fix the infinite-loop case, it should be sufficient to insert llvm.sideeffect into loop { ... } and while <compile-time constant true> { ... } where the body of the loop contains no break expressions. This captures the difference between C++ semantics and Rust semantics for infinite loops: in Rust, unlike C++, the compiler is not allowed to assume that a loop terminates when it is knowable at compile time that it _doesn't_. (I'm not sure how much we need to worry about correctness in the face of loops where the body might panic, but that can always be improved later.)

I don't know what to do about the infinite recursion, but I agree with RalfJung that optimizing an infinite recursion into an unrelated segfault is not desirable behavior.

@zackw

If I understand the problem correctly, to fix the infinite-loop case, it should be sufficient to insert llvm.sideeffect into loop { ... } and while <compile-time constant true> { ... } where the body of the loop contains no break expressions.

I don't think it's that simple, e.g., loop { if false { break; } } is an infinite loop that contains a break expression, yet we need to insert @llvm.sideeffect to prevent llvm from removing it. AFAICT we have to insert @llvm.sideeffect unless we can prove that the loop always terminates.

@gnzlbg

loop { if false { break; } } is an infinite loop that contains a break expression, yet we need to insert @llvm.sideeffect to prevent llvm from removing it.

Hm, yes, that's troublesome. But we don't have to be _perfect_, just conservatively correct. A loop like

while spinlock.load(Ordering::SeqCst) != 0 {}

(from the std::sync::atomic documentation) would easily be seen not to need a @llvm.sideeffect, since the controlling condition is not constant (and an atomic load operation had better count as a side-effect for LLVM purposes, or we have bigger problems). The kind of finite loop that might be emitted by a program generator,

loop {
    if /* runtime-variable condition */ { break }
    /* more stuff */
}

should also not be troublesome. In fact, is there any case that the "no break expressions in the body of the loop" rule gets wrong _besides_

loop {
    if /* provably false at compile time */ { break }
}

?

I thought that this issue is about fixing the implementation of Rust for infinite loops such that the behavior of the generated LLVM-IR becomes defined, and for this, @llvm.sideeffect, sounded like a pretty good solution.

Fair enough. However, as you said, the issue (the mismatch between Rust semantics and LLVM semantics) is actually about non-termination, not about loops. So I think that's what we should be tracking here.

@zackw

If I understand the problem correctly, to fix the infinite-loop case, it should be sufficient to insert llvm.sideeffect into loop { ... } and while { ... } where the body of the loop contains no break expressions. This captures the difference between C++ semantics and Rust semantics for infinite loops: in Rust, unlike C++, the compiler is not allowed to assume that a loop terminates when it is knowable at compile time that it doesn't. (I'm not sure how much we need to worry about correctness in the face of loops where the body might panic, but that can always be improved later.)

What you describe holds for C. In Rust, any loop is allowed to diverge. Everything else would just be unsound to do.

So, for example

while test_fermats_last_theorem_on_some_random_number() { }

is an okay program in Rust (but neither in C nor C++), and it will loop forever without causing a side-effect. So, it has to be all loops, except for those we can prove will terminate.

@zackw

is there any case that the "no break expressions in the body of the loop" rule gets wrong besides

It's not only if /*compile-time condition */. All control-flow is affected (while, match, for, ...) and run-time conditions are affected as well.

But we don't have to be perfect, just conservatively correct.

Consider:

fn foo(x: bool) { loop { if x { break; } } }

where x is a run-time condition. If we don't emit @llvm.sideeffect here, then if the user writes foo(false) somewhere, foo could be inlined and with constant propagation and dead code elimination, the loop optimized into an infinite loop without side-effects, resulting in a mis-optimization.

If that makes sense, one transformation that LLVM would be allowed to do is replacing foo with foo_opt:

fn foo_opt(x: bool) { if x { foo(true) } else { foo(false) } }

where both branches are optimized independently, and the second branch would be mis-optimized if we don't use @llvm.sideeffect.

That is, to be able to omit @llvm.sideeffect, we would need to prove that LLVM cannot mis-optimize that loop under any circumstances. The only way to prove this is to either prove that the loop always terminates, or to prove that if it does not terminate, that it unconditionally does one of the things that prevent mis-optimizations. Even then, optimizations like loop splitting/peeling could transform one loop into a series of loops, and it would be enough for one of them to not have @llvm.sideeffect for a mis-optimization to happen.

Everything about this bug sounds to me like it'd be much easier to solve from LLVM than from rustc. (disclaimer: I don't really know the code base of either of these projects)

As I understand it, the fix from LLVM would be changing optimizations from running on (prove non-termination || can't prove either) to running only when non-termination can be proven (or the opposite). I'm not saying this is easy (in any way), but LLVM already (I guess) includes code to try to prove (non-)termination of loops.

On the other hand, rustc can only do this by adding @llvm.sideeffect, which will potentially have more impact on optimization than “just” disabling the optimizations that make inappropriate use of non-termination. And rustc would have to embed new code to try to detect (non-)termination of loops.

So I would think the path forward would be:

  1. Add @llvm.sideeffect on every loop and function call to fix the issue
  2. Fix LLVM to not perform wrong optimizations on non-terminating loops, and remove the @llvm.sideeffects

What do you think about this? I hope the performance impact of step 1 wouldn't be too horrible, though, even if it's meant to vanish once 2 is implemented…

@Ekleog that's what @sunfishcode's second patch might be about: https://lists.llvm.org/pipermail/llvm-dev/2017-October/118595.html

part of the function attribute proposal is to
change the default semantics of LLVM IR to have defined behavior on
infinite loops, and then add an attribute opting into potential-UB. So
if we do that, then the role of @llvm.sideeffect becomes a little
subtle -- it'd be a way for a frontend for a language like C to opt
into potential-UB for a function, but then opt out for individual
loops in that function.

To be fair to LLVM, compiler writers don't approach this topic from the perspective of "I'm going to write an optimization that proves loops are non-terminating, so that I can pedantically optimize them away!" Instead, the assumption that loops will either terminate or have side effects just arises naturally in some common compiler algorithms. Fixing this isn't just a tweak to existing code; it'll require a significant amount of new complexity.

Consider the following algorithm for testing whether a function body "has no side effects": if any instruction in the body has potential side effects, then the function body may have side effects. Nice and simple. Then later, calls to functions "with no side effects" are deleted. Cool. Except, branch instructions are considered to have no side effects, so a function containing only branches will appear to have no side effects, even though it may contain an infinite loop. Oops.

It is fixable. If anyone else is interested in looking into this, my basic idea is to split the concept of "has side effects" into independent concepts of "has actual side effects" and "may be non-terminating". And then go through the whole optimizer and find all the places that care about "has side effects" and figure out which concept(s) they actually need. And then teach the loop passes to add metadata to branches that aren't part of a loop, or the loops they're in are provably finite, in order to avoid pessimizations.
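The naive analysis described above can be sketched as a toy model (all names are hypothetical, this is not LLVM's actual code): branches are classified as effect-free, so a body consisting only of branches, i.e. an infinite loop, is reported as having no side effects, and a later pass could then delete calls to it.

```rust
// Toy instruction set for the sketch.
enum Inst {
    Branch, // treated as having no side effects
    Store,  // writes memory: a side effect
    Call,   // may have arbitrary side effects
}

// The naive analysis: a body has side effects only if some
// instruction in it does. Branches never count, so a body that is
// nothing but a self-branch (an infinite loop) passes as "pure".
fn may_have_side_effects(body: &[Inst]) -> bool {
    body.iter()
        .any(|i| matches!(i, Inst::Store | Inst::Call))
}
```

With this model, `may_have_side_effects(&[Inst::Branch])` is `false`, which is exactly the "oops" case: a pure-looking infinite loop that a caller-side cleanup pass would feel entitled to remove.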


A possible compromise might be to have rustc insert @llvm.sideeffect when a user literally writes an empty loop { } (or similar) or unconditional recursion (which already has a lint). This compromise would allow people who actually do intend an infinite effectless spinning loop to get it, while avoiding any overhead for everyone else. Of course, this compromise wouldn't make it impossible to crash safe code, but it would likely reduce the chances of it happening accidentally, and it seems like it should be easy to implement.

Instead, the assumption that loops will either terminate or have side effects just arises naturally in some common compiler algorithms.

It is entirely unnatural though if you even start to think about correctness of those transformations. To be frank I still think it was a huge mistake of C to ever allow this assumption, but well.

if any instruction in the body has potential side effects, then the function body may have side effects.

There's a good reason that "non-termination" is typically considered an effect when you start to look at things formally. (Haskell isn't pure, it has two effects: Non-termination and exceptions.)

A possible compromise might be to have rustc insert @llvm.sideeffect when a user literally writes an empty loop { } (or similar) or unconditional recursion (which already has a lint). This compromise would allow people who actually do intend an infinite effectless spinning loop to get it, while avoiding any overhead for everyone else. Of course, this compromise wouldn't make it impossible to crash safe code, but it would likely reduce the chances of it happening accidentally, and it seems like it should be easy to implement.

As you noted yourself, this is still incorrect. I do not think we should accept a "solution" which we know to be incorrect. Compilers are such an integral part of our infrastructure, we shouldn't just hope that nothing goes wrong. This is no way to build a solid foundation.


What happened here is that the notion of correctness was built around what compilers did, instead of starting with "What do we want from our compilers" and then making that their specification. A correct compiler does not turn a program that always diverges into one that terminates, period. I find this rather self-evident, but with Rust having a reasonable type system, this is even clearly witnessed in the types, which is why the issue is surfacing regularly.

Given the constraints we are working with (namely, LLVM), what we should do is start by adding llvm.sideeffect in enough places such that every diverging execution is guaranteed to "execute" infinitely many of those. Then we have reached a reasonable (as in, sound and correct) baseline and can talk about improvements by way of removing these annotations when we can guarantee they are not needed.

To make my point more precise, I think the following is a sound Rust crate, with pick_a_number_greater_2 returning (non-deterministically) some kind of big-int:

fn test_fermats_last_theorem() -> bool {
  let x = pick_a_number_greater_2();
  let y = pick_a_number_greater_2();
  let z = pick_a_number_greater_2();
  let n = pick_a_number_greater_2();
  // x^n + y^n = z^n is impossible for n > 2
  pow(x, n) + pow(y, n) != pow(z, n)
}

pub fn diverge() -> ! {
  while test_fermats_last_theorem() { }
  // This code is unreachable, as proven by Andrew Wiles
  unsafe { mem::transmute(()) }
}

If we compile away that diverging loop, that's a bug and it should be fixed.

We don't even have numbers so far for how much performance it would cost to fix this naively. Until we do, I see no reason to deliberately break programs like the above.

In practice, fn foo() { foo() } will always terminate due to resource exhaustion, but since the Rust abstract machine has an infinitely large stack frame (AFAIK), it is valid to transform that code into fn foo() { loop {} } which will never terminate (or much later, when the universe freezes). Should this transformation be valid? I'd say yes, since otherwise we can't perform tail-call optimizations unless we can prove termination, which would be unfortunate.
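The tail-call transformation in question can be sketched with a terminating example (hypothetical functions): the recursive and loop forms compute the same value, and a TCO pass rewrites the former into the latter; the equivalence in the infinite case relies on treating stack exhaustion as unobservable.

```rust
// Tail-recursive form: each call's frame is dead once the recursive
// call is made, so it is eligible for tail-call optimization.
fn count_rec(n: u64, acc: u64) -> u64 {
    if n == 0 { acc } else { count_rec(n - 1, acc + n) }
}

// The loop a TCO pass would produce: same result, constant stack.
fn count_loop(mut n: u64, mut acc: u64) -> u64 {
    loop {
        if n == 0 {
            return acc;
        }
        acc += n;
        n -= 1;
    }
}
```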

Would it make sense to have an unsafe intrinsic that states that a given loop, recursion, ... always terminates? N1528 gives an example were, if loops cannot be assumed to terminate, loop fusion cannot be applied to pointer code traversing linked lists, because the linked lists could be circular, and proving that a linked list is not circular is not something that modern compilers can do.
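The N1528 concern can be illustrated with a sketch (an index-based list with a fuel bound so the sketch itself terminates; all names hypothetical): whether the traversal terminates depends entirely on run-time data, which is exactly what a compiler cannot prove statically, so fusing two such loops is only valid under the assumption that they terminate.

```rust
// A linked list encoded as index arrays: next[i] is the successor of
// node i, usize::MAX marks the end. With next = [1, 0] the list is
// circular and the traversal never ends; no static analysis of this
// function can tell the two cases apart.
fn sum_list(next: &[usize], val: &[i32], mut i: usize, fuel: usize) -> Option<i32> {
    let mut sum = 0;
    for _ in 0..fuel {
        if i == usize::MAX {
            return Some(sum); // reached the end: list was finite
        }
        sum += val[i];
        i = next[i];
    }
    None // fuel exhausted: the list may be circular
}
```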

I absolutely agree we need to fix this soundness issue for good. However, the way we go about that should be mindful of the possibility that "add llvm.sideeffect everywhere we can't prove it's unnecessary" may regress the code quality of programs that are compiled correctly today. While such concerns are ultimately overridden by the need to have a sound compiler, it might be prudent to proceed in a way that delays the proper fix a bit in exchange for avoiding performance regressions and improving quality of life for the average Rust programmer in the mean time. I propose:

  • As with other potentially-performance-regressing fixes for long-standing soundness bugs (#10184) we should implement the fix behind a -Z flag to be able to evaluate the performance impact on code bases in the wild.
  • If the impact turns out to be negligible, great, we can just turn on the fix by default.
  • But if there are real regressions from it, we can take that data to LLVM people and try to improve LLVM first (or we could choose to eat the regression and fix it later, but in any case we'd make an informed decision)
  • If we decide to not turn on the fix by default due to regressions, we can at least go ahead with adding llvm.sideeffect to syntactically empty loops: they're rather common and them being miscompiled has led to multiple people spending miserable hours debugging weird issues (#38136, #47537, #54214, and surely there's more), so even though this mitigation has no bearing on the soundness bug, it would have a tangible benefit for developers while we work out the kinks in the proper bug fix.

Admittedly, this perspective is informed by the fact that this issue has been standing for years. If it was a fresh regression, I would be more open to fixing it more quickly or reverting the PR that introduced it.

Meanwhile, should this be mentioned in https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html as long as this issue is open?

Would it make sense to have an unsafe intrinsic that states that a given loop, recursion, ... always terminates?

std::hint::unreachable_unchecked?

Incidentally I ran into this writing real code for a TCP message system. I had an infinite loop as a stopgap until I put in a real mechanism for stopping but the thread exited immediately.

In case anybody wanted to play test case code golf:

fn main() {
    (|| loop {})()
}

```
$ cargo run --release
Illegal instruction (core dumped)
```

In case anybody wanted to play test case code golf:

pub fn main() {
   (|| loop {})()
}

With the -Z insert-sideeffect rustc flag, added by @sfanxiang in https://github.com/rust-lang/rust/pull/59546, it keeps on looping :)

before:

main:
  ud2

after:

main:
.LBB0_1:
  jmp .LBB0_1

By the way, the LLVM bug tracking this is https://bugs.llvm.org/show_bug.cgi?id=965 , which I haven't seen posted yet in this thread.

@RalfJung Can you update the hyperlink https://github.com/simnalamburt/snippets/blob/master/rust/src/bin/infinite.rs in the issue description to https://github.com/simnalamburt/snippets/blob/12e73f45f3/rust/infinite.rs? The former hyperlink was broken for a long time since it was not a permalink. Thanks! 😛

@simnalamburt done, thanks!

Increasing the MIR opt level appears to avoid the misoptimization in the following case:

pub fn main() {
   (|| loop {})()
}

--emit=llvm-ir -C opt-level=1

define void @_ZN7example4main17hf7943ea78b0ea0b0E() unnamed_addr #0 !dbg !6 {
  unreachable
}

--emit=llvm-ir -C opt-level=1 -Z mir-opt-level=2

define void @_ZN7example4main17hf7943ea78b0ea0b0E() unnamed_addr #0 !dbg !6 {
  br label %bb1, !dbg !10

bb1:                                              ; preds = %bb1, %start
  br label %bb1, !dbg !11
}

https://godbolt.org/z/N7VHnj

rustc 1.45.0-nightly (5fd2f06e9 2020-05-31)

pub fn oops() {
   (|| loop {})() 
}

pub fn main() {
   oops()
}

It helped with that special case but does not solve the issue in general. https://godbolt.org/z/5hv87d

In general this issue can only be solved when either rustc or LLVM can prove a pure function is total before using any relevant optimisations.

Indeed, I wasn't asserting that it solved the issue. The subtle effect was interesting enough to others that it seemed worth mentioning here too. -Z insert-sideeffect continues to correct both cases.

Something is moving on the LLVM side: there's a proposal to add a function-level attribute to control progress guarantees. https://reviews.llvm.org/D85393

I am not sure why everyone (here and on the LLVM threads) seems to be emphasizing the clause about forward progress.

The elimination of the loop seems to be a direct consequence of a memory model: computations of values are allowed to be moved about, as long as they happen-before the use of the value. Now, if there is a proof there can be no use of the value, it is the proof that there is no happens-before, and the code can be moved infinitely far into the future, and still satisfy the memory model.

Or, if you are not familiar with memory models, consider that the entire loop is abstracted away into a function computing a value. Now replace all reads of the value outside the loop by a call of that function. This transformation certainly is valid. Now, if there are no uses of the value, there are no invocations of the function that does the infinite loop.

computations of values are allowed to be moved about, as long as they happen-before the use of the value. Now, if there is a proof there can be no use of the value, it is the proof that there is no happens-before, and the code can be moved infinitely far into the future, and still satisfy the memory model.

This statement is correct only if that computation is guaranteed to terminate. Non-termination is a side-effect, and just like you may not remove a computation that prints to stdout (it is "not pure"), you may not remove a computation that does not terminate.

It is not okay to remove the following function call, even if the result is unused:

fn sideeffect() -> u32 {
  println!("Hello!");
  42
}

fn main() {
  let _ = sideeffect(); // May not be removed.
}

This is true for any kind of side-effect, and it remains true when you replace the print by a loop {}.

The claim about non-termination as a side-effect requires not only an agreement that it is (that is non-controversial), but also an agreement about _when_ it should be observed.

Non-termination sure is observed, if the loop computes the value. Non-termination is not observed, if you are allowed to re-order the computations that do not depend on the loop's outcome.

Like the example on the LLVM thread.

x = y % 42;
if y < 0 return 0;
...

The termination properties of the division have nothing to do with reordering. Modern CPUs will attempt to execute the division, the comparison, the branch prediction and the prefetching of the successful branch in parallel. So you are not guaranteed to observe the division completed at the time you observe 0 returned, if y is negative. (By "observe" here I mean really measuring with an oscilloscope where the CPU is at, not observing from within the program.)

If you can't observe the division completed, you can't observe the division started. So the division in the example above would usually be allowed to be reordered, which is what a compiler might do:

if y < 0 return 0;
x = y % 42;
...

I say "usually", because maybe there are languages where this is not allowed. I don't know if Rust is such a language.

Pure loops are no different.


I am not saying it is not a problem. I am only saying forward progress guarantee is not the thing that allows it to happen.

The claim about non-termination as a side-effect requires not only an agreement that it is (that is non-controversial), but also an agreement about when it should be observed.

What I am expressing is the consensus of the entire research field of programming languages and compilers. Sure you are free to disagree, but then you might as well re-define terms like "compiler correctness" -- it's not helpful for a discussion with others.

What the permissible observations are is always defined on the source level. The language specification defines an "Abstract Machine", which describes (ideally in painstaking mathematical detail) what the permissible observable behaviors of a program are. This document does not talk about any optimizations.

Correctness of a compiler is then measured in whether the programs it produces only exhibit observable behaviors that the spec says the source program can have. This is how every single programming language that takes correctness seriously works, and it is the only way we know for how to capture in a precise way when a compiler is correct.

What is up to each language is to define what exactly is considered observable at the source level, and which source behaviors are considered "undefined" and thus may be assumed by the compiler to never occur. This issue arises because C++ says that an infinite loop with no other side-effects ("silent divergence") is undefined behavior, but Rust does not say such a thing. This means that non-termination in Rust is always observable, and must be preserved by the compiler. Most programming languages make this choice, because the C++ choice can make it very easy to accidentally introduce undefined behavior (and thus critical bugs) into a program. Rust makes a promise that no undefined behavior can arise from safe code, and since safe code can contain infinite loops, it follows that infinite loops in Rust must be defined (and thus preserved) behavior.

If these things are confusing, I suggest doing some background reading. I can recommend "Types and Programming Languages" by Benjamin Pierce. You'll probably also find plenty of blog posts out there, though it can be hard to judge how well-informed the author really is.

For concreteness, if your division example were changed to

x = 42 % y;
if y <= 0 { return 0; }

then I hope you would agree that the conditional _cannot_ be hoisted above the division, because that would change the observable behavior when y is zero (from crashing to returning zero).

In the same way, in

x = if y == 0 { loop {} } else { y % 42 };
if y < 0 { return 0; }

the Rust abstract machine allows this to be rewritten as

if y == 0 { loop {} }
else if y < 0 { return 0; }
x = y % 42;

but the first condition and the loop cannot be discarded.

Ralf, I don't pretend to know half of what you do, and I don't want to introduce new meanings. I totally agree with the definition of what the correctness is (the execution order must correspond to the program order). I only thought the "when" the non-termination is observable was part of it, as in: if you aren't watching the loop outcome, you don't have a witness of its termination (so can't claim its incorrectness). I need to revisit the execution model.

Thank you for bearing with me

@zackw Thank you. That's different code, which of course will result in a different optimization.

My premise about the loops being optimized the same way as the division was flawed (can't see the outcome of division == can't see the loop terminate), so the rest doesn't matter.

@olotenko I don't know what you mean by "watching the loop outcome". A non-terminating loop makes the entire program diverge, which is considered observable behavior -- this means it is observable outside the program. As in, the user can run the program and see that it goes on forever. A program that goes on forever may not be compiled into a program that terminates, because that changes what the user can observe about the program.

It doesn't matter what that loop was computing or if the "return value" of the loop is used or not. What matters is what the user can observe when running the program. The compiler must make sure that this observable behavior stays the same. Non-termination is considered observable.

To give another example:

fn main() {
  loop {}
  println!("Hello");
}

This program will never print anything, because of the loop. But if you optimized the loop away (or reordered the loop with the print), suddenly the program would print "Hello". Thus these optimizations change the observable behavior of the program, and are disallowed.

@RalfJung it's ok, I got it now. My original problem was what role the "forward progress guarantee" plays here. The optimization is possible entirely from data dependency. My mistake was that actually data dependency is not part of the program order: it is literally the expressions totally ordered as per language semantics. If the program order is total, then without forward progress guarantee (which we can restate as "any sub-path of program order is finite") we can reorder (in the execution order) only the expressions that we can _prove_ as terminating (and preserving a few other properties, like observability of synchronization actions, OS calls, IO, etc).

I need to think some more about it, but I think I can see the reason why we can "pretend" division happened in the example with x = y % 42, even if it doesn't really get executed for some inputs, but why the same does not apply to arbitrary loops. I mean, the subtleties of the correspondence of the total (program) order and the partial (execution) order.

I think "observable behaviour" may be a little more subtle than that, as an infinite recursion will end up with a stack overflow crash ("terminates" in the sense of a "user observing the outcome"), but a tail-call optimization will turn it into a non-terminating loop. At least this is one other thing that Rust / LLVM will do. But we don't have to discuss that question as that is not really what my problem was about (unless you want to! I sure am glad to understand if that is expected).

stack overflow

Stack overflows are challenging to model indeed, good question. Same for out-of-memory situations. As a first approximation, we formally pretend they do not happen. A better approach is to say that any time you call a function, you may get an error due to stack overflow, or the program may continue -- this is a non-deterministic choice made on each call. This way you can soundly approximate what actually happens.
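That approximation can be sketched as a toy model (hypothetical names): every call site makes a non-deterministic choice between overflowing and proceeding, and a sound analysis must account for both outcomes. Here the "choice" is just a flag we pass in.

```rust
// Result of a call in the abstract machine: either it runs to
// completion, or the non-deterministic stack-overflow branch fires.
enum Outcome<T> {
    Done(T),
    StackOverflow,
}

// Model of a call: the abstract machine picks one of the two
// outcomes; real execution determines the flag, the semantics does not.
fn call<T>(overflows: bool, f: impl FnOnce() -> T) -> Outcome<T> {
    if overflows {
        Outcome::StackOverflow
    } else {
        Outcome::Done(f())
    }
}
```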

we can reorder (in the execution order) only the expressions that we can prove as terminating

Indeed. Moreover they have to be "pure", i.e., side-effect-free -- you cannot reorder two println!. That's why we usually consider non-termination an effect as well, because then this all reduces to "pure expressions can be reordered", and "non-terminating expressions are impure" (impure = has a side-effect).

Division is also potentially impure, but only when dividing by 0 -- which causes a panic, i.e., a control effect. This is not directly observable but indirectly (e.g. by having the panic handler print something to stdout, which is then observable). Thus division can only be reordered if we are sure we are not dividing by 0.

I have some demo code that I think might be this issue but I am not entirely sure. If necessary I can put this in a new bug report.
I put the code for it in a git repo at https://github.com/uglyoldbob/rust_demo

My infinite loop (with side effects) is optimized out and a trap instruction is generated.

I have no idea if that is an instance of this problem or something else... embedded devices are not my specialty at all and with all these external crate dependencies I have no idea what else that code is doing.^^ But your program is not safe and it does have a volatile access in the loop, so I'd say it is a separate problem. When I put your example on the playground, I think it is compiled correctly, so I'd suspect the problem is with one of the extra dependencies.

It seems everything in the loop is a reference to a local variable (none escaped to any other thread). In these circumstances it is easy to prove the absence of volatile stores and the absence of observable effects (no stores they can synchronize with). If Rust doesn't add special meaning to volatiles, then this loop can be reduced to a pure infinite loop.

@uglyoldbob What's really happening in your example would be more clear if llvm-objdump weren't being spectacularly unhelpful (and inaccurate). That bl #4 (which is not actually valid assembly syntax) here means branch to 4 bytes after the end of the bl instruction, aka the end of the main function, aka the start of the next function. The next function is called (when I build it) _ZN11broken_loop18__cortex_m_rt_main17hbe300c9f0053d54dE, and that is your actual main function. The function with the unmangled name main is not your function, but a completely different function generated by the #[entry] macro provided by cortex-m-rt. Your code is not actually being optimized away. (In fact, the optimizer isn't even running since you're building in debug mode.)
