I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the Box::new(1) run in this example?

union Foo {
    f: i32,
    g: Box<i32>,
}

let mut f = Foo { g: Box::new(1) };
f.g = Box::new(2);

sfackler on 9 Apr 2016

@sfackler My current understanding is that f.g = Box::new(2) _will_ run the destructor but f = Foo { g: Box::new(2) } would _not_. That is, assigning to a Box<i32> lvalue will cause a drop like always, but assigning to a Foo lvalue will not.

solson on 9 Apr 2016
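
On current stable Rust the Foo example above does not compile as written, because a union field whose type may need dropping has to be wrapped in std::mem::ManuallyDrop. The following is a minimal sketch under that assumption, not the semantics being debated in this thread; Noisy and field_assignment_drop_count are hypothetical names used only to count destructor calls.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

// Hypothetical helper: increments a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

#[allow(dead_code)]
union Foo {
    f: i32,
    g: ManuallyDrop<Noisy>,
}

// Returns the destructor count after (field assignment, explicit drop).
fn field_assignment_drop_count() -> (u32, u32) {
    let count = Rc::new(Cell::new(0));
    let mut u = Foo { g: ManuallyDrop::new(Noisy(count.clone())) };

    // Writing a ManuallyDrop field drops nothing: the old Noisy value is
    // simply overwritten (and therefore leaked).
    u.g = ManuallyDrop::new(Noisy(count.clone()));
    let after_assign = count.get();

    // Dropping the current contents is left to the programmer and is unsafe,
    // because the compiler cannot know which field is valid.
    unsafe { ManuallyDrop::drop(&mut u.g) };
    let after_drop = count.get();

    (after_assign, after_drop)
}
```

Under these assumptions the function returns (0, 1): the field assignment runs no destructor, and only the explicit ManuallyDrop::drop does.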

So an assignment to a variant is like an assertion that the field was previously "valid"?

sfackler on 9 Apr 2016

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though. I see it as an instantiation of the general rule that writing to a Drop lvalue causes a destructor call.

solson on 9 Apr 2016

Should a &mut union with Drop variants be a lint?

ohAitch on 9 Apr 2016

I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.

joshtriplett on 9 Apr 2016

Yeah, that approach seems a bit less dangerous to me as well.

sfackler on 9 Apr 2016

Not dropping when assigning to a union field would make f.g = Box::new(2) act differently from let p = &mut f.g; *p = Box::new(2), because you can't make the latter case _not_ drop. I think my approach is less surprising.

It's not a new problem, either; unsafe programmers already have to deal with other situations where foo = bar is UB if foo is uninitialized and Drop.

solson on 9 Apr 2016

I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.

joshtriplett on 9 Apr 2016

I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.

retep998 on 9 Apr 2016

I don't intend to use mutable references to unions, and probably just "weirdly-tagged" ones with Into

ohAitch on 9 Apr 2016

Seems like this is a good issue to raise up as an unresolved question. I'm not sure yet which approach I prefer.

nikomatsakis on 12 Apr 2016

@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)

joshtriplett on 12 Apr 2016

And I'd like to reiterate that unsafe programmers must already generally know that a = b means drop_in_place(&mut a); ptr::write(&mut a, b) to write safe code. Not dropping union fields would be _one more_ exception to learn, not one less.

(NB: the drop doesn't happen when a is _statically_ known to already be uninitialized, like let a; a = b;.)

But I support having a default warning against Drop variants in unions that people have to #[allow(..)] since this is a fairly non-obvious detail.

solson on 12 Apr 2016
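
The rule described above (plain assignment drops the old value, ptr::write does not, and assignment to a statically uninitialized binding has nothing to drop) can be checked with a small sketch. Noisy and drops_per_write are hypothetical names introduced here for illustration only.

```rust
use std::cell::Cell;
use std::ptr;
use std::rc::Rc;

// Hypothetical helper: increments a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the cumulative destructor count observed right after
// (plain assignment, ptr::write, deferred initialization).
fn drops_per_write() -> (u32, u32, u32) {
    let count = Rc::new(Cell::new(0));

    // `a = b` on an initialized place drops the old value first.
    let mut a = Noisy(count.clone());
    a = Noisy(count.clone());
    let after_assign = count.get();

    // ptr::write overwrites without dropping; the old value is leaked.
    unsafe { ptr::write(&mut a, Noisy(count.clone())) };
    let after_ptr_write = count.get();

    // `b` is statically known to be uninitialized, so nothing is dropped.
    let b;
    b = Noisy(count.clone());
    let after_deferred_init = count.get();

    drop((a, b));
    (after_assign, after_ptr_write, after_deferred_init)
}
```

The count rises to 1 at the plain assignment and stays there for the other two writes.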

@tsion this is not true for a = b and maybe only sometimes true for a.x = b but it is certainly true for *a = b. This uncertainty is what made me hesitant about it. For example, this compiles:

fn main() {
  let mut x: (i32, i32);
  x.0 = 2;
  x.1 = 3;
}

(though trying to print x later fails, but I consider that a bug)

nikomatsakis on 13 Apr 2016

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for a.x = b?

Say, if x.0 had a type with a destructor, surely that destructor is called:

fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called before writing new value
}

solson on 13 Apr 2016

Maybe just lint against that kind of write?

arielb1 on 14 Apr 2016

My point is only that = does not _always_ run the destructor; it uses some knowledge about whether the target is known to be initialized.

nikomatsakis on 16 Apr 2016

@nikomatsakis

It runs the destructor if the drop flag is set.

But I think that kind of write is confusing anyway, so why not just forbid it? You can always do *(&mut u.var) = val.

arielb1 on 16 Apr 2016

My point is only that = does not _always_ run the destructor; it uses some knowledge about whether the target is known to be initialized.

@nikomatsakis I already mentioned that:

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.

solson on 17 Apr 2016

@tsion

Drop flags are only semi-dynamic: once zeroing drop is gone, they are part of codegen. I say we forbid that kind of write because it causes more confusion than good.

arielb1 on 17 Apr 2016
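
A minimal sketch of the drop-flag behavior discussed above: when a binding is initialized only on some paths, whether a later assignment runs a destructor is decided at run time. Noisy and drops_on_reassign are hypothetical names, and the comment about the flag describes how rustc is understood to handle this case rather than a language guarantee about the mechanism.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical helper: increments a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns how many destructors had run right after the second assignment.
fn drops_on_reassign(init_first: bool) -> u32 {
    let count = Rc::new(Cell::new(0));

    let mut x;
    if init_first {
        x = Noisy(count.clone());
    }

    // Whether `x` holds a value here depends on `init_first`, so the old
    // value is dropped only if the (hidden) drop flag says it exists.
    x = Noisy(count.clone());
    let after_reassign = count.get();

    drop(x);
    after_reassign
}
```

With init_first set, the assignment drops one old value; without it, the same statement drops nothing.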

Should Drop types even be allowed in unions? If I'm understanding things correctly, the main reason to have unions in Rust is to interface with C code that has unions, and C doesn't even have destructors. For all other purposes, it seems that it's better to just use an enum in Rust code.

ghost on 27 Apr 2016

There is a valid use case for using a union to implement a NoDrop type which inhibits drop.

Amanieu on 27 Apr 2016

As well as invoking such code manually via drop_in_place or similar.

joshtriplett on 27 Apr 2016
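
As a rough illustration of the NoDrop use case and of running the destructor by hand, the sketch below uses std::mem::ManuallyDrop, which fills that role in today's standard library, instead of a hand-written union. Noisy and inhibit_then_drop_manually are hypothetical names.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;
use std::rc::Rc;

// Hypothetical helper: increments a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the destructor count after (scope exit, manual drop_in_place).
fn inhibit_then_drop_manually() -> (u32, u32) {
    let count = Rc::new(Cell::new(0));

    {
        let _inhibited = ManuallyDrop::new(Noisy(count.clone()));
        // `_inhibited` leaves scope here without running Noisy::drop.
    }
    let after_scope = count.get();

    let mut slot = ManuallyDrop::new(Noisy(count.clone()));
    // SAFETY: `slot` holds a valid Noisy that is never used again.
    unsafe { ptr::drop_in_place(&mut *slot as *mut Noisy) };
    let after_manual_drop = count.get();

    (after_scope, after_manual_drop)
}
```

The wrapper suppresses the automatic drop, so the counter only moves when drop_in_place is called explicitly.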

To me dropping a field value while writing to it is definitely wrong because the previous option type is undefined.

Would it be possible to prohibit field setters but require full union replacement? In this case if the union implements Drop full union drop would be called for the value replaced as expected.

RumataEstor on 21 Jun 2016

I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.

joshtriplett on 22 Jun 2016

For the sake of discussion, I intend to expose mutable references to fields in unions _and_ put arbitrary (possibly Drop) types into them. Basically, I would like to use unions to write custom space-efficient enums. For example,

union SlotInner<V> {
    next_empty: usize, /* index of next empty slot */
    value: V,
}

struct Slot<V> {
    inner: SlotInner<V>,
    version: u64 /* even version -> is_empty */
}

Stebalien on 29 Jun 2016
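
One way the Slot idea above could be fleshed out on current Rust is sketched below. It assumes the value field is wrapped in ManuallyDrop (required today for non-Copy union fields); the methods empty, is_empty, insert, get, take and next_empty are hypothetical and not part of the original proposal.

```rust
use std::mem::ManuallyDrop;

union SlotInner<V> {
    next_empty: usize,      // valid while the slot is empty
    value: ManuallyDrop<V>, // valid while the slot is occupied
}

struct Slot<V> {
    inner: SlotInner<V>,
    version: u64, // even version -> empty, odd version -> occupied
}

impl<V> Slot<V> {
    fn empty(next_empty: usize) -> Self {
        Slot { inner: SlotInner { next_empty }, version: 0 }
    }

    fn is_empty(&self) -> bool {
        self.version % 2 == 0
    }

    // Stores `value`; the slot must currently be empty.
    fn insert(&mut self, value: V) {
        assert!(self.is_empty(), "slot already occupied");
        self.inner.value = ManuallyDrop::new(value); // plain write, no drop
        self.version += 1;
    }

    fn get(&self) -> Option<&V> {
        if self.is_empty() {
            return None;
        }
        // SAFETY: an odd version means `value` is the valid field.
        Some(unsafe { &*self.inner.value })
    }

    // Moves the value out and records the next free index in its place.
    fn take(&mut self, next_empty: usize) -> Option<V> {
        if self.is_empty() {
            return None;
        }
        // SAFETY: `value` is valid; the version bump below marks it as gone.
        let value = unsafe { ManuallyDrop::take(&mut self.inner.value) };
        self.inner.next_empty = next_empty;
        self.version += 1;
        Some(value)
    }

    fn next_empty(&self) -> Option<usize> {
        if self.is_empty() {
            // SAFETY: an even version means `next_empty` is the valid field.
            Some(unsafe { self.inner.next_empty })
        } else {
            None
        }
    }
}

impl<V> Drop for Slot<V> {
    fn drop(&mut self) {
        if !self.is_empty() {
            // SAFETY: the slot is occupied, so `value` is valid.
            unsafe { ManuallyDrop::drop(&mut self.inner.value) };
        }
    }
}
```

The version parity acts as the external discriminant, so every unsafe field access is guarded by an is_empty check, and the Drop impl releases a stored value only when one is present.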

@nikomatsakis I'd like to propose a concrete answer to the question currently listed as unresolved here.

To avoid unnecessarily complex semantics, assigning to a union field should act like assigning to a struct field, which means dropping the old contents. It's easy enough to avoid this if you know about it, by assigning to the whole union instead. This is still slightly surprising behavior, but having a union field that implements Drop at all will produce a warning, and the text of that warning can explicitly mention this as a caveat.

Would it make sense to provide an RFC pull request amending RFC1444 to document this behavior?

joshtriplett on 2 Jul 2016

@joshtriplett Since @nikomatsakis is away on vacation, I'll answer: I think it's great form to file an amendment RFC for resolving questions like this. We'd often fast-track such RFC PRs when appropriate.

aturon on 6 Jul 2016

@aturon Thanks. I've filed the new RFC PR https://github.com/rust-lang/rfcs/issues/1663 with these clarifications to RFC 1444, to resolve this issue.

joshtriplett on 6 Jul 2016

(@aturon you can check off that unresolved question now.)

Stebalien on 3 Aug 2016

I have some preliminary implementation in https://github.com/petrochenkov/rust/tree/union.

Status: Implemented (modulo bugs), PR submitted (https://github.com/rust-lang/rust/pull/36016).

petrochenkov on 13 Aug 2016

@petrochenkov Awesome! Looks great so far.

joshtriplett on 13 Aug 2016

I'm not quite sure how to treat unions with non-Copy fields in move checker.
Suppose u is an initialized value of union U { a: A, b: B } and now we move out of one of the fields:

1) A: !Copy, B: !Copy, move_out_of(u.a)
This is simple, u.b is also put into uninitialized state.
Sanity check: union U { a: T, b: T } should behave exactly like struct S { a: T } + field alias.

2) A: Copy, B: !Copy, move_out_of(u.a)
Supposedly u.b should still be initialized, because move_out_of(u.a) is simply a memcpy and doesn't change u.b in any way.

3) A: !Copy, B: Copy, move_out_of(u.a)
This is the strangest case; supposedly u.b should also be put into uninitialized state despite being Copy. Copy values can be uninitialized (e.g. let a: u8;), but changing their state from initialized to uninitialized is something new, AFAIK.

petrochenkov on 25 Aug 2016

@retep998
I know this is completely irrelevant to FFI needs :)
The good news is that it's not a blocker, I'm going to implement whatever behavior is simpler and submit PR this weekend.

petrochenkov on 26 Aug 2016

@petrochenkov my instinct is that unions are a "bit-bucket", essentially. You are responsible for tracking whether the data is initialized or not and what its true type is. This is very similar to the referent of a raw pointer.

This is why we can't drop the data for you, and also why any access to the fields is unsafe (even if, say, there is only one variant).

By these rules, I would expect unions to implement Copy if copy is implemented for them. Unlike structs/enums, however, there would be no internal sanity checks: you can always implement copy for a union type if you like.

nikomatsakis on 26 Aug 2016

Let me give some examples to clarify:

union Foo { ... } // contents don't matter

This union is affine, because Copy has not been implemented.

union Bar { x: Rc<String> }
impl Copy for Bar { }
impl Clone for Bar { fn clone(&self) -> Self { *self } }

This union type Bar is copy, because Copy has been implemented.

Note that if Bar were a struct, it would be an error to implement Copy because of the type of the field x.

Huh, I guess I'm not actually answering your question though, now that I re-read it. =)

nikomatsakis on 26 Aug 2016

OK, so, I realize I was not answering your question at all. So let me try again. Following the "bit bucket" principle, I would _still_ expect that we can move out from a union at will. But of course another option would be to treat it like we treat a *mut T, and require you to use ptr::read to move out.

EDIT: I'm not actually entirely sure why we would prohibit such moves. It might have had to do w/ moving drop -- or perhaps just because it's easy to make a mistake and it seems better to make "moves out" more explicit? I am having trouble remembering the history here.

nikomatsakis on 26 Aug 2016

@nikomatsakis

my instinct is that unions are a "bit-bucket", essentially.

Ha, I, on the contrary, would like to give as many guarantees about union's content as we can for such a dangerous construct.

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).

All the borrow/move rules in the current implementation support this guarantee, simultaneously this is the most conservative interpretation, that allows us to go either the "safe" way (for example, allowing safe access to unions with equally typed fields, this can be useful) or the "bit bucket" way in the future, when more experience with Rust unions is gathered.

Actually, I'd like to make it even more conservative as described in https://github.com/rust-lang/rust/pull/36016#issuecomment-242810887

petrochenkov on 26 Aug 2016

@petrochenkov

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).

Note that unsafe code is always involved, when working with a union, since every access to a field is unsafe.

The way I think of it is, I think, similar. Basically a union is like an enum but it can be in more than one variant simultaneously. The set of valid variants is not known to the compiler at any point, though sometimes we can figure out that the set is empty (i.e., the enum is uninitialized).

So I see any use of some_union.field as basically an implicit (and unsafe) assertion that the set of valid variants currently includes field. This seems compatible with how the borrow-checker integration works; if you borrow field x and then try to use y, you are getting an error because you are basically saying that the data is simultaneously x and y (and it is borrowed). (In contrast, with a regular enum, it is not possible to inhabit more than one variant at a time, and you can see this in how the borrowck rules play out).

Anyway, the point is, when we "move" from one field of a union, the question at hand I guess is whether we can deduce that this implies that interpreting the value as the other variants is no longer valid. I think it'd be not so hard to argue either way, though. I consider this a grey zone.

The danger of being conservative is that we might well rule out unsafe code that would otherwise make sense and be valid. But I'm ok with starting out tighter and deciding whether to loosen later.

We should discuss the matter of what conditions are needed to implement Copy on a union -- also, we should make sure we have a complete list of these grey areas listed above to make sure we address and document before stabilization!

nikomatsakis on 31 Aug 2016

Basically a union is like an enum but it can be in more than one variant simultaneously.

One argument against the "more than one variant" interpretation is how unions behave in constant expressions - for these unions we always know the single active variant and also can't access inactive variants because transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator).
My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant (more restrictive definition) or rather with union's fragment assignment history (more vague, but more useful).

we should make sure we have a complete list of these grey areas

I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.

petrochenkov on 31 Aug 2016

transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator)

@petrochenkov This is one of the goals of my Miri project. Miri can already do transmutes and various raw pointer shenanigans. It would be a small amount of work to make Miri handle unions (nothing new on the raw memory handling side).

And @eddyb is pushing to replace rustc constant evaluation with a version of Miri.

solson on 31 Aug 2016

@petrochenkov

One argument against the "more than one variant" interpretation is how unions behave in constant expressions...

How to best support the use of unions in constants is an interesting question, but I see no problem with restricting constant expressions to a subset of runtime behavior (this is what we always do, anyhow). That is, just because we may not be able to fully support some particular transmute at compilation time doesn't mean it's illegal at runtime.

My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant

Hmm, I'm trying to think how this is different from saying that the union belongs to all of those variants simultaneously. I don't really see a difference yet. :)

I feel like this interpretation has odd interactions with moves in general. For example, if the data is "really" an X, and you interpret it as a Y, but Y is affine, then is it still an X?

Regardless, I think it's fine that having a move of any field consume the entire union can be seen as consistent with any of these interpretations. For example, in the "set of variants" approach, the idea is just that moving the value deinitializes all existing variants (and of course the variant you used must be one of the valid set). In your version, it would seem to "transmute" into that variant (and consume the original).

I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.

Such confidence! You're going to try ;)

Care to shed a few more details on what concrete changes you have in mind?

nikomatsakis on 1 Sep 2016

Care to shed a few more details on what concrete changes you have in mind?

More detailed description of the implementation (i.e. better documentation), some small extensions (like empty unions and .. in union patterns), two main (contradicting) alternatives of union evolution - more unsafe and less restrictive "scratch space" interpretation and more safe and more restrictive "enum with unknown discriminant" interpretation - and their consequences for move/initialization checker, Copy impls, unsafety of field access, etc.

petrochenkov on 1 Sep 2016

It would also be useful to define when accessing an inactive union field is UB, e.g.

union U { a: u8, b: () }
let u = U { b: () };
let a = u.a; // most probably an UB, equivalent to reading from `mem::uninitialized()`

but this is an infinitely tricky area.

petrochenkov on 1 Sep 2016

Sounds likely, cross-field semantics are basically a pointer cast right?
*(*() as *u8)

ohAitch on 1 Sep 2016

Isn't field access always unsafe?

ohAitch on 1 Sep 2016

Isn't field access always unsafe?

It can be made safe sometimes, e.g.

assignment to trivially-destructible union fields is safe.
any access to fields of a union U { f1: T, f2: T, ..., fN: T } (i.e. all fields have the same type) is safe in "enum with unknown discriminant" interpretation.

petrochenkov on 1 Sep 2016

It seems better not to apply special conditions to this, from a user point of view. Just call it unsafe, always.

cuviper on 2 Sep 2016

Currently testing out the support for unions in the latest rustc from git. Everything I've tried works perfectly.

joshtriplett on 3 Sep 2016

I ran into an interesting case in the dead field checker. Try the following code:

#![feature(untagged_unions)]

union U {
    i: i32,
    f: f32,
}

fn main() {
    println!("{}", std::mem::size_of::<U>());
    let u = U { f: 1.0 };
    println!("{:#x}", unsafe { u.i });
}

You'll get this warning:

warning: struct field is never used: `f`, #[warn(dead_code)] on by default

Looks like the dead_code checker didn't notice the initialization.

(I already filed PR #36252 about the use of "struct field", changing it to just "field".)

joshtriplett on 6 Sep 2016

Unions cannot currently contain dynamically sized fields, but the RFC doesn't specify this behavior either way:

#![feature(untagged_unions)]

union Foo<T: ?Sized> {
  value: T,
}

Output:

error[E0277]: the trait bound `T: std::marker::Sized` is not satisfied
 --> <anon>:4:5
  |
4 |     value: T,
  |     ^^^^^^^^ trait `T: std::marker::Sized` not satisfied
  |
  = help: consider adding a `where T: std::marker::Sized` bound
  = note: only the last field of a struct or enum variant may have a dynamically sized type

apasel422 on 7 Sep 2016

Contextual keyword does not work outside the module/crate root context:

fn main() {
    // all work
    struct Peach {}
    enum Pineapple {}
    trait Mango {}
    impl Mango for () {}
    type Strawberry = ();
    fn woah() {}
    mod even_modules {
        union WithUnions {}
    }
    use std;

    // does not work
    union Banana {}
}

Seems like a pretty nasty consistency wart.

nagisa on 11 Sep 2016

@nagisa
Are you using some older version of rustc by accident?
I've just checked your example on playpen and it works (modulo "empty union" errors).
There's also a run-pass test checking for this specific situation - https://github.com/rust-lang/rust/blob/master/src/test/run-pass/union/union-backcomp.rs.

petrochenkov on 11 Sep 2016

@petrochenkov ah, I used play.rlo, but it seems like it might have reverted to stable or something. Never mind me, then.

nagisa on 11 Sep 2016

I think unions will eventually need to support _safe fields_, the evil twins of unsafe fields from this proposal.
cc https://github.com/rust-lang/rfcs/issues/381#issuecomment-246703410

petrochenkov on 13 Sep 2016

I do think it would make sense to have a way of declaring "safe unions", based on various criteria.

For instance, a union containing exclusively Copy non-Drop fields, all with the same size, seems safe; no matter how you access the fields, you might get unexpected data, but you can't encounter a memory safety problem or undefined behavior.

joshtriplett on 13 Sep 2016

@joshtriplett You would also need to make sure there are no "holes" in the types from which you could read uninitialized data. As I like to say: "Uninitialized data is either unpredictable random data or your SSH private key, whichever is worse."

Amanieu on 13 Sep 2016
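
To make the "holes" concern concrete, here is a small sketch that counts the padding bytes in a struct. Padded and padding_bytes are hypothetical names, and the result assumes a typical target on which u32 is 4-byte aligned.

```rust
use std::mem::size_of;

// Hypothetical example type: `tag` is followed by padding so that `word`
// can sit at a 4-byte-aligned offset.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    word: u32,
}

// Number of bytes in `Padded` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}
```

On such a target the struct occupies 8 bytes while its fields account for only 5, so a union that overlays Padded with, say, a byte array would expose 3 bytes that a write through the struct field is not guaranteed to initialize.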

Isn't &T Copy and non-Drop? Put that in a "safe" union with usize, and you've got a rogue reference generator. So the rules will have to be a bit more strict than that.

cuviper on 13 Sep 2016

For instance, a union containing exclusively Copy non-Drop fields, all with the same size, seems safe; no matter how you access the fields, you might get unexpected data, but you can't encounter a memory safety problem or undefined behavior.

I realize this is just an off-the-cuff example, not a serious proposal, but here are some examples to illustrate how tricky this is:

u8 and bool have the same size, but most u8 values are invalid for bool and ignoring this triggers UB
&T and &U have the same size and are Copy + !Drop for all T and U (as long as both or neither are Sized)
transmuting between uN/iN and fN is currently only possible in unsafe code. I believe such transmutes are always safe but this does enlarge the safe language so it may be controversial.
Violating privacy (e.g. punning between struct Foo(Bar); and Bar) is a big no-no as well, since privacy might be used to uphold invariants relevant to safety.

hanna-kruppe on 13 Sep 2016
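
Two of the points above can be sketched in code: punning a float to its bit pattern through a union, and why u8 to bool needs a check. Bits, float_bits_via_union and byte_to_bool are hypothetical names; f32::to_bits and f32::from_bits are the safe standard-library equivalents available in current Rust.

```rust
#[repr(C)]
union Bits {
    int: u32,
    float: f32,
}

// Reinterprets the bytes of an f32 as a u32 by reading the other field.
fn float_bits_via_union(x: f32) -> u32 {
    let u = Bits { float: x };
    // SAFETY: both fields are 4 bytes with no padding, and every 32-bit
    // pattern is a valid u32.
    unsafe { u.int }
}

// The reverse direction is not always fine for every type pair: only 0 and 1
// are valid bit patterns for bool, so a conversion from u8 must be checked.
fn byte_to_bool(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

For 1.0 the union read yields 0x3f800000, the same value f32::to_bits returns, whereas byte_to_bool(2) has no valid answer and returns None.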

@Amanieu When I was writing that, I'd meant to include a note about not having any internal padding, and somehow forgot to do so. Thanks for catching that.

@cuviper I was trying to define "plain old data", as in something that contains zero pointers. You're right, the definition would need to exclude references. Probably easier to whitelist a set of allowed types and combinations of those types.

@rkruppe

u8 and bool have the same size, but most u8 values are invalid for bool and ignoring this triggers UB

Good point; the same issue applies to enums.

&T and &U have the same size and are Copy + !Drop for all T and U (as long as both or neither are Sized)

I'd forgotten that.

transmuting between uN/iN and fN is currently only possible in unsafe code. I believe such transmutes are always safe but this does enlarge the safe language so it may be controversial.

Agreed on both points; this seems acceptable to allow.

Violating privacy (e.g. punning between struct Foo(Bar); and Bar) is a big no-no as well, since privacy might be used to uphold invariants relevant to safety.

If you don't know the internals of the type, you can't know if the internals meet the requirements (such as no internal padding). So you could exclude this by requiring that all components be plain-old-data, recursively, and that you have sufficient visibility to verify that.

joshtriplett on 13 Sep 2016

transmuting between uN/iN and fN is currently only possible in unsafe code. I believe such transmutes are always safe but this does enlarge the safe language so it may be controversial.

Floating point numbers have signalling NaN which is a trap representation which results in UB.

retep998 on 13 Sep 2016

@retep998 Does Rust run on any platforms that don't support disabling floating-point traps? (That doesn't change the UB issue, but in theory we could make that problem go away.)

joshtriplett on 13 Sep 2016

@petrochenkov

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value

I think I've come around to this interpretation -- well, not _exactly_ this. I still think of it as there is some set of legal variants which is determined at the point where you store, as I always did. I think of storing a value into a union as a bit like putting it into a "quantum state" -- it could now potentially be transmuted into one of many legal interpretations. But I agree that when you move out from one of these variants, you have "forced" it into just one of those, and consumed the value. Therefore, you shouldn't be able to use the enum again (if that type is not Copy). So 👍, basically.

nikomatsakis on 13 Sep 2016

Question about #[repr(C)]: as @pnkfelix recently pointed out to me, the current spec states that if a union is not #[repr(C)], it is illegal to store with field x and read with field y. Presumably this is because we are not required to start all fields at the same offset.

I can see some utility in this: for example, a sanitizer might implement unions by storing them like a normal enum (or even a struct...?) and checking that you use the same variant that you put in.

_But_ it seems like a kind of footgun, and also one of those repr guarantees we would never be able to _actually_ change in practice, because too many people will be relying on it in the wild.

Thoughts?

nikomatsakis on 13 Sep 2016

@nikomatsakis

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value

The worst part is variant/field fragments, which are directly accessible for unions.
Consider this code:

union U {
    a: (u8, bool),
    b: (bool, u8),
}
fn main() {
    unsafe {
        let mut u = U { a: (2, false) };
        u.b.1 = 2; // turns union's memory into (2, 2)
    }
}

All fields are Copy, no ownership involved and move checker is happy, but the partial assignment to the inactive field b turns the union into state with 0 valid variants. I haven't thought how to deal with it yet. Make such assignments UB? Change the interpretation? Something else?

petrochenkov on 13 Sep 2016

@petrochenkov

Make such assignments UB?

This would be my assumption, yes. When you assigned with a, the variant b was not in the set of valid variants, and hence later using u.b.1 (whether to read or assign) is invalid.

nikomatsakis on 13 Sep 2016

Question about #[repr(C)]: as @pnkfelix recently pointed out to me, the current spec states that if a union is not #[repr(C)], it is illegal to store with field x and read with field y. Presumably this is because we are not required to start all fields at the same offset.

I think the appropriate wording here is that 1) Reading from fields that are not "layout compatible" (this is vague) with previously written fields/field-fragments is UB 2) For #[repr(C)] unions users know what layouts are (from ABI docs) so they can discern between UB and non-UB 3) For #[repr(Rust)] union layouts are unspecified so users can't say what is UB and what is not, but WE (rustc/libstd + their tests) have this sacred knowledge, so we can separate the wheat from the chaff and use #[repr(Rust)] in non-UB fashion.

4) After size/stride and field reordering questions are decided upon I'd expect struct and union layouts to be set in stone and specified, so users will know the layouts as well and will be able to use #[repr(Rust)] unions as freely as #[repr(C)] and the problem will go away.

petrochenkov on 13 Sep 2016

@nikomatsakis In the discussion of the union RFC, people mentioned wanting to have native Rust code that uses unions to build compact data structures.

joshtriplett on 13 Sep 2016

Is there anything stopping people from using #[repr(C)]? If not, then I don't see the need to provide any sort of guarantees for #[repr(Rust)], just leave it as "here be dragons". It would probably be best to have a warn-by-default lint for unions that aren't #[repr(C)].

retep998 on 13 Sep 2016

👍2

@retep998 It seems reasonable to me that repr(Rust) doesn't guarantee any particular layout or overlap. I would just suggest that repr(Rust) should not in practice break people's assumptions about the memory usage of a union ("no larger than the largest member").

joshtriplett on 14 Sep 2016

👍1

Does Rust run on any platforms that don't support disabling floating-point traps?

That isn’t really a valid question to ask. First of all, the optimiser itself can rely on the UB-ness of trap representations and rewrite the program in unexpected ways. Moreover, Rust does not really support altering the FP environment either.

But it seems like a kind of footgun, and also one of those repr guarantees we would never be able to actually change in practice, because too many people will be relying on it in the wild.

Thoughts?

Adding a lint or some such which inspects the program flow and complains to the user if a read is done from one field when the union was provably written through some other field would help with this¹. A MIR-based lint would make short work of that. If the CFG does not allow drawing any conclusions about the legality of the union field load and the user makes a mistake, undefined behaviour is the best we can specify without having specified the Rust repr itself, IMO.

¹: Especially effective if people begin using union as a poor man’s transmute for some reason.

should not in practice break people's assumptions about the memory usage of a union ("no larger than the largest member").

I disagree. It might make a lot of sense to extend repr(Rust) stuff to the size of a machine word on some architectures, for example.
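
For comparison, a small sketch showing that even a #[repr(C)] union can be larger than its largest member, because the size is rounded up to the alignment. Padded and padded_layout are illustrative names, and the numbers assume a target where u32 has 4-byte alignment:

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
union Padded {
    bytes: [u8; 5], // largest member: 5 bytes, alignment 1
    word: u32,      // 4 bytes, alignment 4 (assumed)
}

// Returns (size, alignment) of the union.
fn padded_layout() -> (usize, usize) {
    (size_of::<Padded>(), align_of::<Padded>())
}
```

On such a target padded_layout() should report a size of 8 and an alignment of 4.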

nagisa on 5 Oct 2016

One issue that may want to be considered before stabilization is https://github.com/rust-lang/rust/issues/37479. Looks like with the most recent version of LLDB debugging unions may not work :(

alexcrichton on 27 Dec 2016

@alexcrichton Does it work with GDB?

joshtriplett on 27 Dec 2016

From what I can tell, yes. The Linux bots seem to be running the test just fine.

alexcrichton on 27 Dec 2016

Then that means Rust provides all the right debugging information, and LLDB just has a bug here. I don't think a bug in one of multiple debuggers, not present in another, should block stabilizing this. LLDB just needs fixing.

joshtriplett on 27 Dec 2016

It would be cool to see if we could get this feature into FCP for the 1.17 cycle (that's the March 16th beta). Can anyone give a summary of the outstanding questions and the current situation of the feature so we can see if we can reach consensus and resolve everything?

withoutboats on 27 Jan 2017

@withoutboats
My plans are

Wait for the upcoming release (Feb 3).
Propose stabilization of unions with Copy fields. This will cover all FFI needs - FFI libraries will be able to use unions on stable. "POD" unions have been used for decades in C/C++ and are well understood (modulo type-based aliasing, but Rust doesn't have it), and there are no known blockers.
Write "Unions 1.2" RFC until Feb 3. It will describe the current implementation of unions and outline future directions. The future of unions with non-Copy fields will be decided in the process of discussing this RFC.

Note that exposing something like ManuallyDrop or NoDrop from the standard library doesn't require stabilizing unions.

STATUS UPDATE (Feb 4): I'm writing the RFC, but I'm having writer's block after each sentence, as usual, so there's a chance I'll finish it the next weekend (Feb 11-12) and not this weekend (Feb 4-5).
STATUS UPDATE (Feb 11): The text is 95% ready, I'll submit it tomorrow.

petrochenkov on 27 Jan 2017

👍4

@petrochenkov that seems like a very reasonable course of action.

withoutboats on 27 Jan 2017

@petrochenkov That sounds reasonable to me. I also reviewed your unions 1.2 proposal, and provided some comments; overall, it looks good to me.

joshtriplett on 27 Jan 2017

@joshtriplett I was thinking that, while in the @rust-lang/lang meeting we talked about keeping the check-lists up to date, I would actually like to see that -- for each of these points -- we make an affirmative decision (i.e., ideally with @rfcbot). This would probably suggest a distinct issue (or even amendment RFC). We could do this over time, but until then I don't feel like we've "settled" the answers to the open questions definitively. Along those lines, extracting and summarizing the relevant conversation into an amendment RFC or even just an issue we can link to from here seems like an excellent step to helping ensure that everyone is on the same page -- and something anyone who is interested can do, of course, not just @rust-lang/lang members or shepherds.

nikomatsakis on 27 Jan 2017

So, I have submitted the "Unions 1.2" RFC - https://github.com/rust-lang/rfcs/pull/1897.

Now I'd like to propose stabilization of a conservative subset of union - all fields of the union should be Copy, the number of fields should be non-zero and the union should not implement Drop.
(I'm not sure the last requirement is viable though, because it may be easily circumvented by wrapping the union into a struct and implementing Drop for that struct.)
Such unions cover all the needs of FFI libraries, which are supposed to be the primary consumer of this language feature.

The text of "Unions 1.2" RFC doesn't really tell anything new about FFI-style unions, except that it explicitly confirms that type punning is permitted.
EDIT: "Unions 1.2" RFC is also going to make assignments to ~~trivially-destructible~~Copy fields safe (see https://github.com/rust-lang/rust/issues/32836#issuecomment-281296416, https://github.com/rust-lang/rust/issues/32836#issuecomment-281748451), this affects FFI-style unions as well.

This text also provides documentation necessary for the stabilization.
The "Overview" section can be copy-pasted into the book and "Detailed design" into the reference.

petrochenkov on 12 Feb 2017

🎉1

ping @nikomatsakis

petrochenkov on 17 Feb 2017

Does something like this really need to be added as part of the language? It took me about 20 minutes to knock up an implementation of a union using a little unsafe and ptr::write().

use std::mem;
use std::ptr;


/// A union of `f64`, `bool`, and `i32`.
#[derive(Default, Clone, PartialEq, Debug)]
struct Union {
    data: [u8; 8],
}

impl Union {
    pub unsafe fn get<T>(&self) -> &T {
        &*(&self.data as *const _ as *const T)
    }

    pub unsafe fn set<T>(&mut self, value: T) {
        // "transmute" our pointer to self.data into a &mut T so we can 
        // use ptr::write()
        let data_ptr: &mut T = &mut *(&mut self.data as *mut _ as *mut T);
        ptr::write(data_ptr, value);
    }
}


fn main() {
    let mut u = Union::default();
    println!("data: {0:?} ({0:#p})", &u.data);
    {
        let as_i32: &i32 = unsafe { u.get() };
        println!("as i32: {0:?} ({0:#p})", as_i32);
    }

    unsafe {
        u.set::<f64>(3.14);
    }

    println!("As an f64: {:?}", unsafe { u.get::<f64>() });
}

I feel like it wouldn't be difficult for someone to write a macro which can generate something like that except ensuring that the internal array is the size of the largest type. Then instead of my completely generic (and hideously unsafe) get::<T>() they could add a trait bound to limit the types you can get and set. You might even add specific getter and setter methods if you want named fields.

I'm thinking they might write something like this:

union! { Foo(u64, Vec<u8>, String) };

My point is, this is something you could quite feasibly do as part of a library instead of adding extra syntax and complexity to an already fairly complex language. Plus with proc macros it's already quite possible, even if that hasn't fully hit stable yet.

Michael-F-Bryan on 21 Feb 2017

👎2

@Michael-F-Bryan We don't have constant size_of yet though.

eddyb on 21 Feb 2017

@Michael-F-Bryan It's not just enough to have a [u8] array, you also need to get the alignment correct. I in fact already use macros to handle unions but due to the lack of constant size_of and align_of I have to manually allocate the correct space, plus because there is no usable ident concatenation in declarative macros I have to manually specify the names for both the getters and setters. Even just initializing a union is hard at the moment because I have to first initialize it with some default value and then set the value to the variant I want (or add another set of methods to construct the union which is even more verbosity in the definition of the union). It's overall a lot more work and prone to error and uglier than native support for unions. Maybe you should read the RFC and the discussion that went with it so you can understand why this feature is so important.
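
To make the alignment point concrete, a sketch comparing a byte-array-backed struct with a native union. ByteBacked, Native and layouts are hypothetical names used only for this illustration:

```rust
use std::mem::{align_of, size_of};

// Same shape as the `Union` struct from the earlier comment.
struct ByteBacked {
    data: [u8; 8],
}

// Native union over the same three types.
#[repr(C)]
union Native {
    f: f64,
    i: i32,
    b: bool,
}

// Returns ((size, align) of ByteBacked, (size, align) of Native).
fn layouts() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<ByteBacked>(), align_of::<ByteBacked>()),
        (size_of::<Native>(), align_of::<Native>()),
    )
}
```

Both are 8 bytes, but the byte array is only 1-aligned while the union takes the alignment of f64, so a reference to f64 created from a pointer into data is not guaranteed to be properly aligned.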

retep998 on 21 Feb 2017

And the same for alignment.

jmesmon on 21 Feb 2017

I imagine ident concatenation shouldn't be too difficult now syn exists. It allows you to do operations on the AST passed in, so you could take two Idents, extract their string representation (Ident implements AsRef<str>), then create a new Ident which is the concatenation of the two using Ident::From<String>().

The RFC mentions a lot about how existing macro implementations are cumbersome to use, however with the recent creation of crates like syn and quote, it's now a lot easier to do proc macros. I feel like that'd go a long way towards improving ergonomics and make things less error prone.

For example, you could have a MyUnion::default() which just zeroes the union's internal buffer, then a fn MyUnion::new<T>(value:T) -> MyUnion, where T has a trait bound ensuring you can only initialize with the correct types.

In terms of alignment and size, are you able to use the mem module from the standard library (i.e. std::mem::align_of() and friends)? I guess everything I'm proposing would depend on being able to use those at macro expansion time to figure out the size and alignment required. 99.9% of the times unions are used it's done with primitive types anyway, so I feel like you'd be able to write a helper function which takes a type's name and returns its alignment or size (possibly asking the compiler, although that's more of an implementation detail).

I admit, built in pattern matching would be very nice, but most of the time any unions you use in FFI would get wrapped in a thin abstraction layer anyway. So you might be able to get away with a couple if/else statements or using a helper function.

Michael-F-Bryan on 21 Feb 2017

👎2

In terms of alignment and size, are you able to use the mem module from the standard library (i.e. std::mem::align_of() and friends)?

That is not going to work in any cross compilation context.

sfackler on 21 Feb 2017

@Michael-F-Bryan All of these discussions and many more were had in the history of https://github.com/rust-lang/rfcs/pull/1444 . To summarize the responses to your specific concerns, in addition to those already mentioned: you'd have to reimplement the padding and alignment rules of every target platform/compiler, and use awkward syntax throughout your FFI code (which @retep998 has in fact done extensively for Windows bindings and can vouch for the awkwardness of). Also, proc macros currently only work for deriving; you can't extend syntax elsewhere.

Also:

99.9% of the times unions are used it's done with primitive types anyway

Not true at all. C code extensively uses a "struct of unions of structs" pattern, where most of the union fields consist of different struct types.
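
A sketch of that pattern with native unions. The event types, the KIND_* constants and describe are invented for illustration and don't correspond to any real C API:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct KeyEvent {
    code: u32,
    pressed: bool,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct MouseEvent {
    x: i32,
    y: i32,
}

// Union whose fields are structs.
#[repr(C)]
#[derive(Clone, Copy)]
union EventData {
    key: KeyEvent,
    mouse: MouseEvent,
}

// Struct carrying a tag plus the union, as a C header would declare it.
#[repr(C)]
#[derive(Clone, Copy)]
struct Event {
    kind: u32,
    data: EventData,
}

const KIND_KEY: u32 = 0;
const KIND_MOUSE: u32 = 1;

fn describe(ev: &Event) -> String {
    // The tag tells us which union field was written; reads still need unsafe.
    unsafe {
        match ev.kind {
            KIND_KEY => format!("key {} pressed={}", ev.data.key.code, ev.data.key.pressed),
            KIND_MOUSE => format!("mouse at ({}, {})", ev.data.mouse.x, ev.data.mouse.y),
            _ => String::from("unknown"),
        }
    }
}
```

A thin wrapper like describe is where the unsafe reads would typically live in FFI code.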

joshtriplett on 21 Feb 2017

👍2

@rfcbot fcp merge per @petrochenkov's comment https://github.com/rust-lang/rust/issues/32836#issuecomment-279256434

I have nothing to add, just triggering the bot

withoutboats on 21 Feb 2017

Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged teams:

[x] @aturon
[x] @eddyb
[x] @nikomatsakis
[x] @nrc
[x] @pnkfelix
[x] @withoutboats

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot on 21 Feb 2017

PSA: I'm going to update the "Unions 1.2" RFC with one more change affecting FFI-style unions - I'll move safe assignments to trivially-destructible fields of unions from "Future directions" to the RFC proper.

union.trivially_destructible_field = 10; // safe

Why:

Assignments to trivially-destructible union fields are unconditionally safe, regardless of interpretation of unions.
It will remove roughly a half of union-related unsafe blocks.
It will be harder to do later due to potential large number of unused_unsafe warnings/errors in stable code.
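
A sketch of how this reads, assuming a recent stable compiler where writes to Copy union fields need no unsafe block while reads still do (IntOrFloat and write_then_read are illustrative names):

```rust
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn write_then_read() -> u32 {
    let mut u = IntOrFloat { f: 0.0 };
    // Safe: assigning to a Copy field runs no destructor and reads nothing.
    u.i = 10;
    // Unsafe: the read is where an interpretation of the bytes is chosen.
    unsafe { u.i }
}
```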

petrochenkov on 21 Feb 2017

👍1

@petrochenkov Do you mean "trivially destructible union fields", or "unions with entirely trivially destructible fields"?

Are you proposing that all the unsafe behavior occurs on read, where you choose an interpretation? For instance, having a union containing an enum and other fields, where the union value contains an invalid discriminant?

At a high level that seems plausible. It does allow some things I'd consider unsafe, but Rust in general doesn't, such as bypassing destructors or leaking memory. At a low level, I'd hesitate to consider that sound.

joshtriplett on 21 Feb 2017

I feel ok about stabilizing that subset. I don't know my opinion about this Unions 1.2 RFC yet as I haven't had time to read it! I'm not sure what I think about allowing safe access to fields in some cases. I feel like our efforts to make a "minimal" notion of what is unsafe (just dereferencing pointers) was a mistake, in retrospect, and we should have declared a wider swath of things unsafe (e.g., a lot of casts), since they interact in complex ways with LLVM. I feel like this may be the case here too. Put another way, I might rather pull back the rules about unsafe in concert with more progress on unsafe code guidelines.

nikomatsakis on 21 Feb 2017

@joshtriplett
"trivially destructible fields", I've tweaked the wording.

Are you proposing that all the unsafe behavior occurs on read, where you choose an interpretation?

Yes. The write alone can't cause anything dangerous without a subsequent read.

EDIT:

I feel like our efforts to make a "minimal" notion of what is unsafe (just dereferencing pointers) was a mistake, in retrospect

Oh.
Safe writes are completely in line with the current approach to unsafety, but if you are going to change it, then I should probably wait.

petrochenkov on 21 Feb 2017

I don't feel great about stabilising this subset. Usually when we stabilise a subset it is a syntactic subset or at least fairly obvious subset. This subset feels a bit complex to me. If there is so much undecided about the feature that we aren't ready to stabilise the current implementation, then I'd rather leave the whole thing unstable for a while longer.

nrc on 22 Feb 2017

@nrc
Well, the subset is rather obvious - "FFI unions", or "C unions", or "pre-C++11 unions" - even if it's not syntactic. My initial goal was to stabilize this subset ASAP (this cycle, ideally) so it could be used in libraries like winapi.
There's nothing especially dubious about the remaining subset and its implementation, it's just not urgent and needs to wait for an unclear amount of time until the process for the "Unions 1.2" RFC completes. My expectation would be to stabilize the remaining parts 1, 2 or 3 cycles after stabilization of the initial subset.

petrochenkov on 22 Feb 2017

👍5

I think I have an ultimate argument for safe field assignments.
Unsafe field assignment

unsafe {
    u.trivially_destructible_field = value;
}

is equivalent to safe full union assignment

u = U { trivially_destructible_field: value };

except that the safe version is paradoxically less safe because it will overwrite u's bytes outside of trivially_destructible_field with undefs, while field assignment guarantees leaving them intact.

petrochenkov on 22 Feb 2017

😄1

@petrochenkov The extreme of that is size_of_val(&value) == 0, right?

eddyb on 22 Feb 2017

Equivalence between the two snippets is only true if field in question is
"trivially destructible", no?

In that sense making assignments like these safe but only in some cases
seems extremely inconsistent to me.



nagisa on 22 Feb 2017

👍1

@eddyb

The extreme of that is size_of_val(&value) == 0, right?

Yep.

@nagisa
I don't understand why one additional simple rule eliminating large portion of false positives is extremely inconsistent. "only some cases" cover all FFI unions in particular.
I think "extremely inconsistent" is a large overestimation. Not as large as "only mut variables can be assigned? horrible inconsistency!", but still going in this direction.

petrochenkov on 22 Feb 2017

@petrochenkov please consider such a case:

// Somebody Somewhere in some crate (v 1.0.0)
struct Peach; // trivially destructible
union Banana { pub actually: Peach }

// Somebody Else in their dependent crate
extern some crate;
fn somefn(banana: &mut Banana) {
    banana.actually = Peach;
}

Now since adding trait implementations is generally not a breaking change, Mr. Somebody Somewhere figures out it may be a good idea to add following implementation

impl Drop for Peach { fn drop(&mut self) { println!("Moi Peach!") } }

and release a 1.1.0 (semver compatible with 1.0.0 AFAIK) version of the crate.

Suddenly Mr. Somebody Else’s crate does not compile anymore:

fn somefn(banana: &mut Banana) {
    banana.actually = Peach; // ERROR: Something something… unsafe assignment… somewhat somewhat trivially indestructible…
}

And therefore sometimes allowing safe assignments to union fields is not as trivial as only mut locals being mutation-able.

As I’ve written down this example I’ve become unsure which stance I should take wrt this, honestly. On one hand, I’d like to preserve the property that adding implementations is generally not a breaking change (ignoring potential cases of XID). On the other hand, changing any field of a union from trivially destructible to non-trivially destructible is obviously a semver-incompatible change that’s extremely easy to overlook, and the proposed rule would make these incompatibilities more visible (as long as the assignment is not in an unsafe block already).

nagisa on 22 Feb 2017

@nagisa
This is a good argument, I didn't think about compatibility.

The problem seems solvable though. To avoid compatibility issues do the same thing as coherence does - avoid negative reasoning. I.e. replace "trivially-destructible" == "no components implement Drop" with closest positive approximation - "implements Copy".
A Copy type cannot un-implement Copy backward-compatibly, and Copy types still represent the majority of "trivially-destructible" types, especially in the context of FFI unions.

petrochenkov on 22 Feb 2017

Implementing Drop is already not backwards-compatible and this has nothing to do with the union feature:

// Somebody Somewhere in some crate (v 1.0.0)
struct Apple; // trivially destructible
struct Pineapple { pub actually: Apple }

// Somebody Else in their dependent crate
extern some crate;
fn pineapple_to_apple(pineapple: Pineapple) -> Apple {
    pineapple.actually
}

// some crate v 1.1.0
impl Drop for Pineapple { fn drop(&mut self) { println!("Moi Pineapple!") } }

fn pineapple_to_apple(pineapple: Pineapple) -> Apple {
    pineapple.actually // ERROR: can't move out of Pineapple
}

jethrogb on 22 Feb 2017

Which in turn sounds like implementing Drop drops implicit Copy. And Copy
can be depended on.



ohAitch on 22 Feb 2017

@jethrogb
I wanted to mention this issue, but didn't because it has a few rather special preconditions - the struct implementing Drop should have a public field and that field should not be Copy. The union case affects all structs unconditionally.

petrochenkov on 22 Feb 2017

@petrochenkov I consider that maybe an argument that creating a union ought to be unsafe :)

nikomatsakis on 22 Feb 2017

@petrochenkov what is the development and nightly-user-experience path for stabilizing a subset? Do we add a new feature gate first for the subset, so that people can get concrete experience using the subset before it becomes stabilized?

pnkfelix on 15 Mar 2017

@pnkfelix
What I assumed is just to stop requiring #[feature(untagged_unions)] for this subset, no new features or other bureaucracy.
FFI-style unions are supposed to be the most often used kind of unions, so a new feature would mean guaranteed breakage just before stabilization, which, I presume, would be annoying.

petrochenkov on 15 Mar 2017

I'd just like to note that because alignment and packing attributes haven't been implemented yet (nevermind stabilized), I don't really have a strong need for this to be stabilized yet.

retep998 on 15 Mar 2017

@retep998
Packing? If you mean #[repr(packed)] then it's supported on unions right now (unlike align(>1) attributes).

petrochenkov on 15 Mar 2017

@petrochenkov #[repr(packed(N))]. Packing other than 1 is needed quite a bit in winapi. It's not that I need these things supported on unions specifically, I just don't want to jump to a new major version to increase my minimum Rust requirement unless I can get all of those things at the same time.

retep998 on 16 Mar 2017

To clarify the state of play a bit here:

The current FCP proposal is for pure-Copy unions only. That's the case with, as far as I'm aware, basically no outstanding questions other than "Should we stabilize?" The discussion since the motion to FCP has all been about @petrochenkov's new RFC.

@nrc and @nikomatsakis, from discussion on IRC, I suspect that you are both ready to check off your boxes, but I'll leave that to you ;-)

aturon on 29 Mar 2017

👍1

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 3 Apr 2017

@petrochenkov
It seems to me that this would benefit from an idea I floated recently. (https://internals.rust-lang.org/t/automatic-marker-trait-for-unconditionally-valid-repr-c-types/5054)

While not proposed with unions in mind, the trait Plain as explained (subject to bikeshedding) would allow any union composed of only Plain types to be used without any unsafety. It codifies the property that _any_ bit pattern in memory is equally valid, so you can solve the initialization issues by mandating memory be zeroed out, and it ensures reads cannot invoke UB.

In context of FFI, being Plain as defined is also a de facto requirement for such code being sound in many cases, while cases where non-Plain types are useful are rare and difficult to set up safely.
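
A rough sketch of how such a trait could gate a safe API. Plain here is a hand-written stand-in (the proposal describes an automatic marker trait), and Pun and pun are hypothetical names; the size check and the "no padding" requirement are assumptions of this sketch:

```rust
use std::mem::size_of;

/// Hypothetical marker: every bit pattern is a valid value and the type
/// contains no padding bytes. Implementors must uphold this manually.
unsafe trait Plain: Copy {}

unsafe impl Plain for u32 {}
unsafe impl Plain for [u8; 4] {}

// Generic union over two Plain types. Copy fields are allowed in unions,
// and `Plain: Copy` provides that bound.
#[repr(C)]
union Pun<A: Plain, B: Plain> {
    from: A,
    to: B,
}

/// Safe wrapper: reinterprets the bytes of `value` as `B`.
fn pun<A: Plain, B: Plain>(value: A) -> B {
    // Equal sizes ensure every byte of `to` was written through `from`.
    assert_eq!(size_of::<A>(), size_of::<B>());
    unsafe { Pun::<A, B> { from: value }.to }
}
```

With this, pun::<u32, [u8; 4]>(x) would behave like x.to_ne_bytes() without any unsafe at the call site.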

le-jzr on 8 Apr 2017

Factual existence of a named trait aside, it might be prudent to split this feature in two. With unqualified union enforcing the requirements and allowing use without unsafety, and unsafe union with relaxed requirements for content and more headache for users. That would allow stabilizing either without barring the way towards adding the other in the future.

le-jzr on 8 Apr 2017

@le-jzr
This seems sufficiently orthogonal to unions.
I would estimate accepting Plain into Rust in the near future as not very likely + making more accesses to union fields safe is more or less backward compatible (not entirely due to lints), so I wouldn't delay unions due to it.

petrochenkov on 8 Apr 2017

@petrochenkov I'm not suggesting to delay unions in waiting for any closure on my proposal, but rather to consider the _existence_ of such possible restrictions in its own right. Making more accesses to union fields safe in the future may face hindrances because it creates more inconsistency in the language. In particular, making it so that safety of a field access varies from field to field seems ugly.

Hence my suggestion of making the declaration unsafe union, so that unconditionally safe-to-use version can be introduced later without adding new keywords. It is worth noting that completely safe-to-use version is enough for most use cases.

Edit: Tried to clarify what I mean to say.

le-jzr on 8 Apr 2017

Another place where this possibility may inform current and future design is initialization. For unconditionally safe union, it is necessary that the entire span of memory reserved for it is zeroed. Even with current version, ensuring this would cut down on potential UB scenarios and make using unions easier.

le-jzr on 9 Apr 2017

The final comment period is now complete.

rfcbot on 13 Apr 2017

🎉2

Now that the FCP to merge is complete, what's the next step here? Would be nice to stabilize this for 1.19.

bstrie on 1 May 2017

@bstrie So you want to stabilize a feature

aturon on 1 May 2017

It'd be awesome if someone could add more detail to that @rfcbot message (source code here). It could make it easier for someone unfamiliar with the process to jump in and move things along.

solson on 1 May 2017

The path is clear to get this into 1.19. Anybody on the hook for it? cc @joshtriplett

brson on 3 May 2017

Is the NonZero enum layout optimization guaranteed to apply through a union? For example, Option<ManuallyDrop<&u32>> should not represent None as a null pointer. Some(ManuallyDrop::new(uninitialized::<[Vec<Foo>; 10]>())).is_some() should not read uninitialized memory.

https://crates.io/crates/nodrop (used in https://crates.io/crates/arrayvec) has hacks to deal with this.

SimonSapin on 19 May 2017

@SimonSapin
This is currently marked as an unresolved question in the RFC.
In the current implementation this program

#![feature(untagged_unions)]

struct S {
    _a: &'static u8
}
union U {
    _a: &'static u8
}

fn main() {
    use std::mem::size_of;
    println!("struct {}", size_of::<S>());
    println!("optional struct {}", size_of::<Option<S>>());
    println!("union {}", size_of::<U>());
    println!("optional union {}", size_of::<Option<U>>());
}

prints

struct 8
optional struct 8
union 8
optional union 16

, i.e. the optimization is not performed.
cc https://github.com/rust-lang/rust/issues/36394
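
The same comparison can be sketched without the feature gate and without hard-coding pointer width. This reflects current compiler behaviour as far as I know, not a documented guarantee for user-defined unions; option_sizes is an illustrative name, and the MaybeUninit line mirrors an example from the standard library docs:

```rust
use std::mem::{size_of, MaybeUninit};

union U {
    _a: &'static u8,
}

// Returns the sizes of Option<&u8>, Option<U> and Option<MaybeUninit<bool>>.
fn option_sizes() -> [usize; 3] {
    [
        // References have a null niche, so Option adds no space.
        size_of::<Option<&'static u8>>(),
        // The union hides that niche, so Option needs a separate tag.
        size_of::<Option<U>>(),
        // MaybeUninit is itself a union: no niche from bool either.
        size_of::<Option<MaybeUninit<bool>>>(),
    ]
}
```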

petrochenkov on 20 May 2017

This is unlikely to make 1.19.

brson on 23 May 2017

@brson

This is unlikely to make 1.19.

The stabilization PR is merged.

petrochenkov on 27 May 2017

With untagged unions now shipping in 1.19 (in part from https://github.com/rust-lang/rust/pull/42068) - is there anything remaining on this issue or should we close?

jonathandturner on 15 Aug 2017

@jonathandturner
There is still a whole world of unions with non-Copy fields!
The progress is mostly blocked on the clarification/documentation RFC (https://github.com/rust-lang/rfcs/pull/1897).
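
For context, a sketch of what a non-Copy field looks like when wrapped in std::mem::ManuallyDrop, which is the form later stable compilers accept. IntOrString and text_len_then_reuse are made-up names, and nothing here is taken from the RFC text:

```rust
use std::mem::ManuallyDrop;

union IntOrString {
    int: u64,
    text: ManuallyDrop<String>,
}

fn text_len_then_reuse() -> (usize, u64) {
    let mut u = IntOrString {
        text: ManuallyDrop::new(String::from("hello")),
    };
    // Reading a field requires unsafe: we assert `text` is the valid one.
    let len = unsafe { u.text.len() };
    // The union never drops its fields, so the String is freed by hand.
    unsafe { ManuallyDrop::drop(&mut u.text) };
    // Overwriting with a Copy field is a plain write; no destructor runs.
    u.int = 7;
    let n = unsafe { u.int };
    (len, n)
}
```

Forgetting the explicit ManuallyDrop::drop call would leak the String rather than cause unsoundness.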

petrochenkov on 15 Aug 2017

😕1

Has there been any progress on unions with non-Copy fields since August? The Unions 1.2 RFC seems stalled (I'm guessing due to the impl period?)

mystor on 29 Oct 2017

Allowing ?Sized types in unions – if only for unions of a single type – would make it easy to implement https://github.com/rust-lang/rust/issues/47034:

```rust
union ManuallyDrop<T: ?Sized> {
    value: T
}
```

mikeyhew on 2 Jan 2018

@mikeyhew You only really need to require that at most one type can be unsized.

eddyb on 2 Jan 2018

I am looking at some Rust code using unions, and have no idea whether it invokes undefined behavior or not.

The reference [items::unions] only mentions:

Inactive fields can be accessed as well (using the same syntax) if they are sufficiently layout compatible with the current value kept by the union. Reading incompatible fields results in undefined behavior.

But I can't find a definition of "layout compatible" in either [items::unions] or [type_system::type_layout].

Looking through the RFCs I haven't been able to find a definition of "layout compatible" either, only hand-waved examples of what should and shouldn't work in the RFC 1897: Unions 1.2 (not merged).

The RFC1444: unions only seems to allow transmuting a union to its variants as long as it does not invoke undefined behavior, but I can't find anywhere in the RFC when that is/isn't the case.

Are the _precise_ rules that tell me whether a piece of code using unions has defined behavior written down somewhere (and what the defined behavior is)?

gnzlbg on 16 Jan 2018

@gnzlbg To a first approximation: you may not access padding, may not access an enum that contains an invalid discriminant, may not access a bool that contains a value other than true or false, may not access an invalid or signaling floating-point value, and a few other things like those.

If you point to specific code involving unions, we could look at it and tell you if it's doing anything undefined.

joshtriplett on 16 Jan 2018

To a first approximation: you may not access padding, may not access an enum that contains an invalid discriminant, may not access a bool that contains a value other than true or false, may not access an invalid or signaling floating-point value, and a few other things like those.

Actually the latest consensus is that reading arbitrary floats is fine (#46012).

I would add one more requirement: both the source and target union variants are #[repr(C)] and so are all their fields (and recursively) if they are structs.
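
One way to stay within these rules is to read through a field for which every bit pattern is valid and validate before reinterpreting. A minimal sketch for the bool case, with ByteOrBool and checked_bool as illustrative names:

```rust
#[repr(C)]
union ByteOrBool {
    byte: u8,
    flag: bool,
}

// Reads the storage as u8 (always valid once initialized) and only then
// decides whether it is a legal bool, instead of reading `flag` directly.
fn checked_bool(u: &ByteOrBool) -> Option<bool> {
    let byte = unsafe { u.byte };
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // reading `u.flag` in this state would be UB
    }
}
```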

Amanieu on 16 Jan 2018

@Amanieu I stand corrected, thank you.

joshtriplett on 16 Jan 2018

So I guess that the rules are not written anywhere?

I am looking at how to use stdsimd with its new interface. Unless we stabilize some extras with it one will need to use unions to do type punning with some of the simd types, like this:

https://github.com/rust-lang-nursery/stdsimd/blob/03cb92ddce074a5170ed5e5c5c20e5fa4e4846c3/coresimd/src/x86/test.rs#L17

gnzlbg on 16 Jan 2018

AFAIK writing one union field and reading another is very much like using transmute_copy, with the same restrictions. That those restrictions are still a bit nebulous is not union-specific.

For that matter, the function you linked could just use transmute::<__m128d, [f64; 2]>. Although the union version is arguably nicer, at least once the transmute that's currently there is removed: It could be just A { a }.b[idx].
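A rough sketch of that equivalence, using `[u64; 2]` as a portable stand-in for `__m128d` (an assumption made so the example runs on any target; the union `A` and both function names are hypothetical):

```rust
use std::mem;

// Two same-sized views of 16 bytes; `[u64; 2]` stands in for `__m128d`.
#[repr(C)]
union A {
    a: [u64; 2],
    b: [f64; 2],
}

/// Type punning through the union: write field `a`, read field `b`.
fn lane_via_union(a: [u64; 2], idx: usize) -> f64 {
    unsafe { A { a }.b[idx] }
}

/// The same reinterpretation expressed with `transmute`.
fn lane_via_transmute(a: [u64; 2], idx: usize) -> f64 {
    unsafe { mem::transmute::<[u64; 2], [f64; 2]>(a)[idx] }
}

fn main() {
    let raw = [1.5f64.to_bits(), (-2.0f64).to_bits()];
    assert_eq!(lane_via_union(raw, 0), 1.5);
    assert_eq!(lane_via_union(raw, 1), lane_via_transmute(raw, 1));
}
```

Both versions are sound here for the same reason: every 8-byte pattern is a valid `f64`.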

hanna-kruppe on 16 Jan 2018

@rkruppe I've filed a clippy issue to add that lint: https://github.com/rust-lang-nursery/rust-clippy/issues/2361

the function you linked could just use transmute::<__m128d, [f64; 2]>

I guess that the rules I am looking for are when does transmute invoke undefined behavior then (so I'll go look for those).

I think it would have helped me if the language reference on unions had specified the rules for unions in terms of transmute (even if the rules for transmute aren't 100% clear yet) instead of just mentioning "layout compatibility" and leaving it at that. The leap from "layout compatibility" to "if transmute doesn't invoke undefined behavior, then the types are layout compatible and can be accessed via type punning" was not obvious to me.

gnzlbg on 16 Jan 2018

To be clear, transmute[_copy] is not "more primitive" than unions. In fact transmute_copy is literally just pointer as casts plus ptr::read. transmute additionally needs mem::uninitialized (deprecated) or MaybeUninitialized (a union) or something like that, and is implemented as an intrinsic for efficiency, but it too boils down to a type-punning memcpy. The main reason I drew the connection to transmute is that it's older and historically over-emphasized, and therefore we currently have more write-ups and folklore knowledge that focus on transmute specifically. The real underlying concept, which dictates what's valid and what's not (and which a spec would describe), is how values are stored in memory as bytes and which byte sequences are UB to read as which types.
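To make the "pointer cast plus ptr::read" description concrete, here is a minimal sketch of a transmute_copy-like function. The name `punning_copy` is hypothetical, and this is not the standard library's actual source; `read_unaligned` is used on the assumption that `U` may have stricter alignment than `T`.

```rust
use std::mem::size_of;
use std::ptr;

/// Sketch of a `transmute_copy`-style helper (hypothetical name).
///
/// Safety: the first `size_of::<U>()` bytes of `*src` must form a valid `U`.
unsafe fn punning_copy<T, U>(src: &T) -> U {
    // Reading more bytes than `T` provides would go out of bounds.
    assert!(size_of::<U>() <= size_of::<T>());
    // Cast the pointer, then copy the bytes out as a `U`.
    unsafe { ptr::read_unaligned(src as *const T as *const U) }
}

fn main() {
    let x = 1.0f32;
    let bits: u32 = unsafe { punning_copy(&x) };
    assert_eq!(bits, 0x3f80_0000);
}
```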

hanna-kruppe on 16 Jan 2018

👍2

Correction: transmute doesn't actually need uninitialized storage (via intrinsics, or unions, or otherwise). Efficiency aside, you can do something like this (untested, may contain embarrassing typos):

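use std::{mem::size_of, ptr};

unsafe fn transmute<T, U>(x: T) -> U {
    assert!(size_of::<T>() == size_of::<U>());
    // A heap buffer stands in for `[0u8; size_of::<U>()]`, which is not
    // accepted as an array length for a generic `U`.
    let mut bytes = vec![0u8; size_of::<U>()];
    // `write_unaligned` takes ownership of `x`, so no `mem::forget` is needed.
    // The unaligned variants are used because `bytes` is only byte-aligned.
    ptr::write_unaligned(bytes.as_mut_ptr() as *mut T, x);
    ptr::read_unaligned(bytes.as_ptr() as *const U)
}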

The only "magical" part of transmute is that it can constrain the type parameters to be of equal size at compile time.

hanna-kruppe on 16 Jan 2018

The reference and the Unions 1.2 RFC are intentionally vague on this matter because the rules for transmuting in general are not settled.
The intent was "for repr(C) unions see third-party ABI specifications, for repr(Rust) unions layout compatibility is mostly unspecified (unless it is)".

petrochenkov on 16 Jan 2018

Is it too late to revisit the drop check semantics of unions with drop fields?

The original issue is that adding ManuallyDrop caused Josephine to become unsound, because it (rather naughtily) relied on perma-borrowed values not having their backing store reclaimed without first running their destructor.

A stripped-down example is at https://play.rust-lang.org/?gist=607e2dfbd51f4062b9dc93d149815695&version=nightly. The idea is that there's a type Pin<'a, T>, with a method pin(&'a self) -> &'a T whose safety relies on the invariant "after calling pin.pin(), if the memory backing the pin is ever reclaimed, then the pin's destructor must have been run".

This invariant was maintained by Rust until #[allow(unions_with_drop_fields)] was added, and used by ManuallyDrop https://doc.rust-lang.org/src/core/mem.rs.html#949.

The invariant would be restored if the drop checker considered unions with drop fields to have a Drop impl. This is a breaking change, but I doubt any code in the wild relies on the current semantics.

IRC conversation: https://botbot.me/mozilla/rust-lang/2018-02-01/?msg=96386869&page=3

Josephine issue: https://github.com/asajeffrey/josephine/issues/52

cc: @nox @eddyb @pnkfelix

asajeffrey on 1 Feb 2018

The original issue is that adding ManuallyDrop caused Josephine to become unsound, because it (rather naughtily) relied on perma-borrowed values not having their backing store reclaimed without first running their destructor.

Destructors are not guaranteed to run. Rust doesn't guarantee that. It tries, but, for instance, std::mem::forget was made into a safe function.
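A small sketch of that point, assuming a hypothetical `Noisy` type whose destructor bumps a counter: `mem::forget` is a safe function, and the forgotten value's destructor never runs.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type used only to observe destructor calls.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Creates two values, drops one and forgets the other, and returns
/// how many destructors ran.
fn drops_observed() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    drop(Noisy); // destructor runs
    mem::forget(Noisy); // safe code, destructor never runs
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(drops_observed(), 1);
}
```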

The invariant would be restored if the drop checker considered unions with drop fields to have a Drop impl. This is a breaking change, but I doubt any code in the wild relies on the current semantics.

Unions are unsafe largely because you can't know what field of the union is valid. The union can't have an automatic Drop impl; if you wanted such an impl, you'd need to write it manually, taking into account whatever means you have to know whether the union field with a Drop impl is valid.
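As an illustration of such a manual impl, here is a sketch in which the knowledge of which field is valid lives in a separate tag. The `Payload` and `Tagged` types are hypothetical, and the sketch assumes the current rule that non-Copy union fields are wrapped in `ManuallyDrop`.

```rust
use std::mem::ManuallyDrop;

// Hypothetical union: the compiler cannot know which field is valid.
union Payload {
    number: u64,
    text: ManuallyDrop<String>,
}

// The tag is the "means to know" which field is valid.
struct Tagged {
    is_text: bool,
    payload: Payload,
}

impl Tagged {
    fn number(n: u64) -> Self {
        Tagged { is_text: false, payload: Payload { number: n } }
    }

    fn text(s: &str) -> Self {
        Tagged { is_text: true, payload: Payload { text: ManuallyDrop::new(s.to_owned()) } }
    }

    fn describe(&self) -> String {
        // Sound only because the tag is kept in sync with the payload.
        unsafe {
            if self.is_text {
                format!("text:{}", self.payload.text.as_str())
            } else {
                format!("number:{}", self.payload.number)
            }
        }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        if self.is_text {
            // Run the String destructor only when the tag says it is valid.
            unsafe { ManuallyDrop::drop(&mut self.payload.text) };
        }
    }
}

fn main() {
    assert_eq!(Tagged::text("hi").describe(), "text:hi");
    assert_eq!(Tagged::number(7).describe(), "number:7");
}
```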

One clarification here: I don't believe we should ever allow unions with Drop fields by default without at least a warn-by-default lint, if not an error-by-default lint. unions_with_drop_fields shouldn't disappear as part of the stabilization process.

EDIT: oops, didn't mean to hit "close and comment".

joshtriplett on 1 Feb 2018

@joshtriplett yes, Rust doesn't guarantee that destructors will run, but it happened (prior to 1.19) to maintain the invariant that perma-borrowed values would only have their memory reclaimed if the destructor ran. This is even true in the presence of mem::forget, since you can't call it on a perma-borrowed value.

This is what Josephine rather naughtily was relying on, but isn't true any more because of how the drop checker treats unions_with_drop_fields.

It would be fine if allow(unions_with_drop_fields) were considered to be an unsafe annotation. This wouldn't be a drastic change AFAICT; it would just require deny(unsafe_code) to check for allow(unions_with_drop_fields).

asajeffrey on 1 Feb 2018

@asajeffrey I'm still trying to understand the Pin thing... so, if I follow the example correctly, the reason this "works" is that fn pin(&'a Pin<'a, T>) -> &'a T forces the borrow to last for as long as the lifetime 'a annotated in the type, and that lifetime is moreover invariant.
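A reduced sketch of the borrow being described. This is a hypothetical reconstruction, not the Josephine or playground code: it leaves out the destructor and only shows how `&'a Pin<'a, T>` with an invariant `'a` pins the borrow to the value's whole lifetime.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Hypothetical reduction of the `Pin<'a, T>` type under discussion.
struct Pin<'a, T> {
    data: T,
    // `Cell<&'a ()>` makes the type invariant in `'a`, so the compiler
    // cannot shrink the lifetime to end the borrow early.
    _invariant: PhantomData<Cell<&'a ()>>,
}

impl<'a, T> Pin<'a, T> {
    fn new(data: T) -> Self {
        Pin { data, _invariant: PhantomData }
    }

    // The borrow of `self` lasts for exactly the `'a` named in the type.
    fn pin(&'a self) -> &'a T {
        &self.data
    }
}

fn demo() -> i32 {
    let pin = Pin::new(41);
    let r = pin.pin();
    // From here on `pin` stays shared-borrowed: moving it or taking
    // `&mut pin` is expected to be rejected by the borrow checker.
    *r + 1
}

fn main() {
    assert_eq!(demo(), 42);
}
```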

That's an interesting observation! I was not aware of this trick. My gut feeling is that this works "by accident", i.e. safe Rust doesn't happen to provide a way to prevent the destructor from running but that doesn't make this part of the "contract". Notably, https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html does not list leaks.

RalfJung on 2 Feb 2018

IMO it doesn't matter if it works by accident or on purpose. There was no way to avoid Drop running with this trick prior to ManuallyDrop existing (which requires unsafe code to be implemented), and now we can't rely on that anymore.

The addition of ManuallyDrop basically killed that neat behaviour of Rust and saying that it shouldn't have been relied on in the first place sounds like circular reasoning to me. If ManuallyDrop didn't allow calling Pin::pin, would there be any other way to make calling Pin::pin unsound? I don't think so.

nox on 2 Feb 2018

I don't think we can commit to preserving every guarantee that rustc accidentally happens to provide right now. We have no idea what these guarantees may be, so we'd be stabilizing a pig in a poke (okay I hope this idiom makes sense... it's what the dictionary tells me matches my native language idiom, which would literally translate to "the cat in the bag" ;) -- what I want to say is we'd have no clue what we would be stabilizing).

Also, this is a two-edged sword -- every additional guarantee we decide to provide is something unsafe code has to take care of. So discovering a new guarantee may just as well break existing unsafe code (sitting silently somewhere on crates.io without being aware) as enable new unsafe code (the latter was the case here).

For example, it is very conceivable that lexical lifetimes enable unsafe code that gets broken by non-lexical lifetimes. Currently all lifetimes are well-nested, maybe there is a way for unsafe code to exploit this? Only with non-lexical lifetimes can there be lifetimes that overlap without either being included in the other. Does this make NLL a breaking change? I hope not!

If ManuallyDrop didn't allow calling Pin::pin, would there be any other way to make calling Pin::pin unsound? I don't think so.

With unsafe code, there would be. So by declaring this Pin trick sound, you are declaring some unsafe code unsound that would be sound if we decide ManuallyDrop is okay.

RalfJung on 2 Feb 2018

👍2

What we described is a very ergonomic way to integrate Rust with GCs. What I'm trying to say is that it sounds wrong to me to tell us that this was just an accident that it worked and that we should forget about it, when I can't think of any use case for not constraining unions with Drop fields as described by @asajeffrey here, and when this really is the only wart breaking Josephine.

I'll be happy to forget about it if someone can show it was unsound even without ManuallyDrop.

nox on 2 Feb 2018

What I'm trying to say is that it sounds wrong to me to tell us that this was just an accident that it worked

I see no indication of this trick ever being "designed", so I think it is quite fair to call it an accident.

and that we should forget about it

I should have made clearer that this part is just my personal gut feeling. I think it could also be a reasonable point of action to declare this a "happy accident" and actually make it a guarantee -- if we are reasonably confident that indeed all the other unsafe code respects this guarantee, and that providing this guarantee is more important than the ManuallyDrop use case. This is a trade-off, similar to leakpocalypse, where we can't eat our cake and have it, too (we can't have both Rc with its current API and the drop-based scoped threads; we can't have both ManuallyDrop and Pin) so we have to make a decision either way.

That said, I find it hard to express the actual guarantee provided here in a precise way, which makes me personally lean more towards the "ManuallyDrop is fine" side of things.

RalfJung on 2 Feb 2018

if we are reasonably confident that indeed all the other unsafe code respects this guarantee, and that providing this guarantee is more important than the ManuallyDrop use case. This is a trade-off, similar to leakpocalypse, where we can't eat our cake and have it, too (we can't have both Rc with its current API and the drop-based scoped threads; we can't have both ManuallyDrop and Pin) so we have to make a decision either way.

Fair enough, I heartily agree with that. Note that if we consider what @asajeffrey described as undefined behaviour in the end, that can bring back a drop-based scoped thread API.

nox on 2 Feb 2018

As far as I understand Alan’s proposal is not to remove ManuallyDrop, only to make dropck assume that it (and other unions with Drop fields) has a destructor. (That destructor happens to do nothing, but its mere existence affects what programs dropck accepts or rejects.)

SimonSapin on 2 Feb 2018

I'll be happy to forget about it if someone can show it was unsound even without ManuallyDrop.

Not sure if this qualifies, but here's my first attempt: A silly implementation of something like ManuallyDrop that works in pre-union Rust.

pub mod manually_drop {
    use std::mem;
    use std::ptr;
    use std::marker::PhantomData;

    pub struct ManuallyDrop<T> {
        data: [u8; 32],
        phantom: PhantomData<T>,
    }

    impl<T> ManuallyDrop<T> {
        pub fn new(x: T) -> ManuallyDrop<T> {
            assert!(mem::size_of::<T>() <= 32);
            let mut data = [0u8; 32];
            unsafe {
                ptr::copy(&x as *const _ as *const u8, &mut data[0] as *mut _, mem::size_of::<T>());
            }
            mem::forget(x);
            ManuallyDrop { data, phantom: PhantomData }
        }

        pub fn deref(&self) -> &T {
            unsafe {
                &*(&self.data as *const _ as *const T)
            }
        }
    }
}

(Yeah I probably have to do some more work to get the alignment right, but that could be done as well by sacrificing some bytes.)
Playground showing this breaks Pin: https://play.rust-lang.org/?gist=fe1d841cedb13d45add032b4aae6321e&version=nightly

This is what I meant by two-edged sword above -- as far as I can see, my ManuallyDrop respects all the rules we have put out. So, we have two pieces of incompatible unsafe code -- ManuallyDrop and Pin. Who's "right"? I'd say Pin relies on guarantees we have never made and hence it's "wrong" here, but this is a judgment call, not a proof.

RalfJung on 2 Feb 2018

Now that's interesting. In some versions of our pinning stuff, Pin::pin takes a &'this mut Pin<'this, T>, but it wouldn't be unreasonable for your ManuallyDrop to have a DerefMut impl, right?

nox on 2 Feb 2018

Here is a playground that shows that @RalfJung's ManuallyDrop (unsurprisingly) still breaks Pin with a &mut-taking pin method.

https://play.rust-lang.org/?gist=5057570b54952e245fa463f8d7719663&version=nightly

nox on 2 Feb 2018

it wouldn't be unreasonable for your ManuallyDrop to have a DerefMut impl, right?

Yeah, I just added the API I needed for this example. The obvious deref_mut should work just fine.

As far as I understand Alan’s proposal is not to remove ManuallyDrop, only to make dropck assume that it (and other unions with Drop fields) has a destructor. (That destructor happens to do nothing, but its mere existence affects what programs dropck accepts or rejects.)

Ah, I had missed that; sorry about that. Adding the following to my example keeps it working though:

    unsafe impl<#[may_dangle] T> Drop for ManuallyDrop<T> {
        fn drop(&mut self) {}
    }

Only if I remove the #[may_dangle] Rust rejects it. So, at the very least, we'd have to come up with some rule that the above code violates -- just saying "there exists some code we want to be sound that this is incompatible with" is a bad call because it makes it pretty much impossible to look at some code and check if it is sound.

I think what makes me most uneasy about this "accidental guarantee" is that I don't see a single good reason that this works. The way things are wired up in Rust makes this hold together, but dropck has been added not to prevent leaks but to avoid unsound references to dead data (a common problem in destructors). The reasoning for Pin to work is not based on "here is some mechanism in the Rust compiler, or some type system guarantee, that pretty clearly says perma-borrowed data cannot be leaked" -- it is rather based on "we've tried hard and we have not been able to leak perma-borrowed data, so we think that's okay". Relying on this for soundness makes me pretty nervous. EDIT: The fact that dropck is involved makes me even more nervous because this part of the compiler has a history of nasty soundness bugs. The reason this works seems to be that perma-borrows are at odds with safe drop. This really seems to be "reasoning based on exhaustive case analysis of what one can do with perma-borrowed data".

Now, to be fair, one could say similar things about interior mutability -- it happens to be the case that permitting modifications through shared references actually works safely in some cases, if we pick the right API. However, making this work actually required explicit support in the compiler (UnsafeCell) because it clashes with optimizations, and there is unsafe code that would be sound without interior mutability but is not sound with interior mutability. Another difference is that interior mutability was a design goal from the start (or from very early on -- this is way before my time in the Rust community), which is not the case for "perma-borrowed doesn't get leaked". And finally, for interior mutability, I think there's a pretty good story about "sharing makes mutation dangerous, but not impossible, and shared references' API just says you don't get mutability in general but doesn't exclude permitting more operations for specific types", resulting in a coherent overall picture. Of course, I've spent lots of time thinking about shared references, so maybe there is an equally coherent picture for the issue at hand that I'm just not aware of.

RalfJung on 2 Feb 2018

👍1

Time zones are fun, I only just got up! There seem to be two issues here (invariants in general, and dropck in particular), so I'll put them in separate comments...

asajeffrey on 2 Feb 2018

@RalfJung: yes, this is an issue about the invariants being maintained by unsafe Rust. For any version of Rust+std, there's more than one choice of invariant I which is maintained using rely-guarantee reasoning. And indeed there may be two libraries L1 and L2, which chose incompatible I1 and I2, such that Rust+L1 is safe and Rust+L2 is safe, but Rust+L1+L2 is unsafe.

In this case, L1 is ManuallyDrop and L2 is Josephine, and it's pretty clear that ManuallyDrop is going to win since it's now in std, which has much stronger backward compatibility constraints than Josephine.

Interestingly, the guidelines at https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html are written as "It is the programmer's responsibility when writing unsafe code that it is not possible to let safe code exhibit these behaviors: ..." that is, it's a contextual property (for all safe contexts C, C[P] can't go wrong) and so is dependent on version (since v1.20 of Rust+std has more safe contexts than v1.18). In particular, I'd claim that pinning actually did satisfy this constraint for Rust before 1.20, since there was no safe context C s.t. C[Pinning] goes wrong.

However, this is just barracks-room lawyering, I think everyone agrees that there's a problem with this contextual defn, hence all the discussions about unsafe code guidelines.

If nothing else, I think that pinning has shown an interesting example of accidental invariants going wrong.

asajeffrey on 2 Feb 2018

The particular thing that untagged unions (and hence ManuallyDrop) did was in the interaction with the drop checker, in particular ManuallyDrop acts like its defn is:

unsafe impl<#[may_dangle] T> Drop for ManuallyDrop<T> { ... }

and then you can have a conversation about whether this is allowed or not :) Indeed, this conversation is happening over in the may_dangle thread starting at https://github.com/rust-lang/rust/issues/34761#issuecomment-362375924

@RalfJung your code shows an interesting corner case, where the run-time type for data is T, but its compile-time type is [u8; N]. Which type counts as far as may_dangle is concerned?

asajeffrey on 2 Feb 2018

Interestingly, the guidelines at https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html are written as "It is the programmer's responsibility when writing unsafe code that it is not possible to let safe code exhibit these behaviors: ..." that is, it's a contextual property

Ah, interesting. I agree this is clearly not sufficient -- this would make the original scoped threads sound. To be meaningful, this has to (at least) specify the set of unsafe code that the safe code is permitted to call.

Personally, I feel a better way of specifying this is to give the invariants that are to be maintained. But I'm clearly biased here, because the methodology I use for proving things about Rust requires such an invariant. ;)

I'm a little surprised that page doesn't contain some sort of disclaimer of being preliminary; we're not really certain yet what exactly the limit will be -- as this discussion shows. We require unsafe code to at least do what that document says, but we probably have to require more.

For example, the limits of undefined behavior and what unsafe code can do are not the same. See https://github.com/nikomatsakis/rust-memory-model/issues/44 for a recent discussion on that topic: Duplicating a &mut T for mem::size_of::<T>() == 0 does not lead to any undefined behavior directly, and yet is clearly considered illegal for unsafe code to do. The reason is that other unsafe code may rely on its ownership discipline being respected, and duplicating things violates that discipline.

RalfJung on 2 Feb 2018

If nothing else, I think that pinning has shown an interesting example of accidental invariants going wrong.

Oh, that certainly. And I wonder what we can do to avoid this in the future? Maybe put some big warning onto https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html saying "just because an invariant happens to hold in rustc+libstd, doesn't mean unsafe code can rely on it; instead here's some invariants that you can rely on"?

RalfJung on 2 Feb 2018

@RalfJung yes, I don't think anyone's in love with the contextual defn of "correctness", primarily because it's brittle wrt the observational power of contexts. I'd be a lot happier with a semantic defn in terms of invariants.

The only thing I'd ask for is please can we give ourselves some wiggle room and define two invariants for rely-guarantee reasoning (code can rely on R and should guarantee G, where G implies R). That way there's some room for strengthening R and weakening G. If we just have one invariant (i.e. R=G) we're stuck never being able to change them!

asajeffrey on 2 Feb 2018

Constant checking currently doesn't special-case union fields: (cc @solson @oli-obk)

union Transmute<T, U> { from: T, to: U }

const SILLY: () = unsafe {
    (Transmute::<usize, Box<String>> { from: 1 }.to, ()).1
};

fn main() {
    SILLY
}

The above code produces the miri evaluation error "calling non-const fn std::ptr::drop_in_place::<(std::boxed::Box<std::string::String>, ())> - shim(Some((std::boxed::Box<std::string::String>, ())))".

Changing it to force the type of .to to be observed by the const-checker:

const fn id<T>(x: T) -> T { x }

const SILLY: () = unsafe {
    (id(Transmute::<usize, Box<String>> { from: 1 }.to), ()).1
};

results in "destructors cannot be evaluated at compile-time".

Relevant implementation code is here (specifically the restrict call):
https://github.com/rust-lang/rust/blob/5e4603f99066eaf2c1cf19ac3afbac9057b1e177/src/librustc_mir/transform/qualify_consts.rs#L557

eddyb on 26 Mar 2018

Better analysis of #41073 has revealed that the semantics for when destructors run when assigning to subfields of unions are not ready for stabilization. See that issue for details.

arielb1 on 10 Apr 2018

Is it realistic to just entirely rule out Drop types in unions, and implement ManuallyDrop separately (as a lang-item)? From what I can tell, ManuallyDrop seems to be the biggest motivation for Drop in unions, but that's a very special case.

Absent a positive "no drop" trait, we could then say a union is well-formed if every field is either Copy or of the form ManuallyDrop<T>. That would entirely side-step all the complications around dropping when assigning union fields (where it seems every possible solution will be full of surprising footguns), and the ManuallyDrop is a clear marker to programmers that they have to handle Drop themselves here. (The check could be smarter, e.g. it could traverse through product types and through nominal types that are declared in the same crate. Of course, having a positive way to say "this type will never implement Drop" would be nicer.)
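A sketch of what that rule would mean in practice, assuming a hypothetical `Slot` union with a `ManuallyDrop` field and a `Noisy` type that counts destructor calls: assigning to the field never drops the old value, and the programmer runs the destructor explicitly.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type used only to observe destructor calls.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Every field is either `Copy` or `ManuallyDrop<T>`.
#[allow(dead_code)]
union Slot {
    empty: (),
    full: ManuallyDrop<Noisy>,
}

/// Returns how many destructors ran while overwriting and then
/// explicitly dropping the `full` field.
fn destructors_run() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut slot = Slot { full: ManuallyDrop::new(Noisy) };
    unsafe {
        // Overwriting the field does not drop the first `Noisy`; it leaks.
        slot.full = ManuallyDrop::new(Noisy);
        // The programmer handles Drop: this runs the second value's destructor.
        ManuallyDrop::drop(&mut slot.full);
    }
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(destructors_run(), 1);
}
```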

The checklist in the first post does not mention unsized unions, nor does the RFC---but still we have an implementation, restricted to single-variant unions. This is closely related to the interaction with layout optimizations because it presupposes (once thin-pointer DSTs enter the picture) that a single-variant union has to be "valid" in some sense (it may be dropped, but it can't be any odd bit pattern either).

This conflicts with the way unions are sometimes used in C, which is as an "extension point" (IIRC @joshtriplett was the one to mention this at the all-hands): A header file may declare 3 variants for a union, but this is considered forward-compatible with adding more variants later (as long as that does not increase the size of the union). The user of the library promises to not touch the union data if the tag (sitting somewhere else) indicates that they do not know the current variant. Crucially, if you only know a single variant, that doesn't mean that there only is a single variant!

RalfJung on 17 Apr 2018

👍2

The check could be smarter, e.g. it could traverse through product types and through nominal types that are declared in the same crate.

This predicate already exists, but is conservative in generics due to there being no trait to bound on.
You can access it via std::mem::needs_drop (which uses an intrinsic which rustc implements).
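A quick sketch of querying that predicate. Only the `String` result is asserted here, since, as far as I know, the documentation allows `needs_drop` to answer `true` conservatively for other types.

```rust
use std::mem::{needs_drop, ManuallyDrop};

/// In generic code there is no trait to bound on, but the predicate
/// can still be queried for whatever `T` ends up being.
fn drop_matters<T>() -> bool {
    needs_drop::<T>()
}

fn main() {
    // `String` owns heap memory, so dropping it has an effect.
    assert!(drop_matters::<String>());
    // These are expected to print `false` in practice, but that is
    // not asserted because a conservative `true` would also be allowed.
    println!("u32: {}", drop_matters::<u32>());
    println!("ManuallyDrop<String>: {}", drop_matters::<ManuallyDrop<String>>());
}
```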

eddyb on 17 Apr 2018

@eddyb will needs_drop take forwards compatibility into account, or will it happily look into other crates to determine if their types implement Drop? The goal here is to have a check that will never break from semver-compatible changes, where e.g. adding an impl Drop to a struct with no type or lifetime parameters and only private fields is semver compatible.

RalfJung on 17 Apr 2018

@RalfJung

This conflicts with the way unions are sometimes used in C, which is as an "extension point" (IIRC @joshtriplett was the one to mention this at the all-hands): A header file may declare 3 variants for a union, but this is considered forward-compatible with adding more variants later (as long as that does not increase the size of the union). The user of the library promises to not touch the union data if the tag (sitting somewhere else) indicates that they do not know the current variant. Crucially, if you only know a single variant, that doesn't mean that there only is a single variant!

That's a very specific case.
It only affects C-style unions (so there are no destructors and everything is Copy, exactly the subset available on stable) generated from C headers.
We can easily add a _dummy: () or _future: () field to such unions and keep benefitting from the safer "enum" model by default. An FFI union being an "extension point" is something that needs to be well documented anyway.

petrochenkov on 17 Apr 2018

On April 17, 2018 10:08:54 AM PDT, Vadim Petrochenkov notifications@github.com wrote:

We can easily add a _dummy: () or _future: () field to such unions
and keep benefitting from our safer "enum" model by default.

I've seen people talk about treating unions like enums for which we just don't know the discriminant, but to the best of my knowledge there is no actual model or treatment of them as such. In the original discussion, even non-FFI unions wanted the "multiple variants valid at a time" model, including the motivating use cases for wanting non-FFI unions at all.

Adding a () variant to a union shouldn't change anything, and unions should not be required to do so to get the semantics they expect. Unions should continue being a bag of bits, with Rust having no idea what they might contain at any given time until unsafe code accesses it.

An FFI union being an "extension point" is something that needs to be well documented anyway.

We should certainly document the semantics as precisely as we can.

joshtriplett on 17 Apr 2018

👍1

@RalfJung No, it behaves like auto traits do, exposing all internal details.

eddyb on 18 Apr 2018

There's currently some discussion around "active fields" and drop in unions over at https://github.com/rust-lang/rust/issues/41073#issuecomment-380291471

RalfJung on 18 Apr 2018

Unions should continue being a bag of bits, with Rust having no idea what they might contain at any given time until unsafe code accesses it.

This is exactly how I would expect unions to work. They are an advanced feature to squeeze out extra performance and interact with C code, where there are no such things as destructors.

For me, if you want to drop the contents of a union, you should ~have to cast/transmute (maybe it can't be transmute because it might be bigger with some unused bits at the end for another variant) it to the type you want to drop~ take pointers to the fields that need dropping and use std::ptr::drop_in_place, or use the field syntax to extract the value.

If I knew nothing about unions this is how I would expect them to work:

Example - Representing `mem::uninitialized` as a union

pub union MaybeValid<T> {
    valid: T,
    invalid: ()
}

impl<T> MaybeValid<T> {
    #[inline] // this should optimize to a no-op
    pub fn from_valid(valid: T) -> MaybeValid<T> {
        MaybeValid { valid }
    }

    pub fn invalid() -> MaybeValid<T> {
        MaybeValid { invalid: () }
    }

    pub fn zeroed() -> MaybeValid<T> {
        // do whatever is necessary here...
        unimplemented!()
    }
}

fn example() {
    let valid_data = MaybeValid::from_valid(1_u8);
    // Destructor of a union always does nothing, but that's OK since our 
    // data type owns nothing.
    drop(valid_data);
    let invalid_data = MaybeValid::invalid();
    // Destructor of a union again does nothing, which means it needs to know 
    // nothing about its surroundings, and can't accidentally try to free unused memory.
    drop(invalid_data);
    let valid_data = MaybeValid::from_valid(String::from("test string"));
    // Now if we dropped `valid_data` we would leak memory, since the string 
    // would never get freed. This is already possible in safe rust using e.g. `Rc`. 
    // `union` is a similarly advanced feature to `Rc` and so new users are 
    // protected by the order in which concepts are introduced to them. This is 
    // still "safe" even though it leaks because it cannot trigger UB.
    //drop(valid_data)
    // Since we know that our union is of a particular form, we can safely 
    // move the value out, in order to run the destructor. I would expect this 
    // to fail if the drop method had run, even though the drop method does 
    // nothing, because that's the way stuff works in rust - once it's dropped
    // you can't use it.
    let _string_to_drop = unsafe { valid_data.valid };
    // No memory leak and all unsafety is encapsulated.
}

I'm going to post this then edit it so I don't lose my work.
EDIT: added @SimonSapin's way to drop fields.

derekdreery on 11 Jul 2018

👍1

if you want to drop the contents of a union, you should have to cast/transmute (maybe it can't be transmute because it might be bigger with some unused bits at the end for another variant) it to the type you want to drop, or use the field syntax to extract the value

(If it’s only to let it drop there is no need to extract the value in the sense of moving it, you can take a pointer to one of the fields and use std::ptr::drop_in_place.)
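A minimal sketch of that approach, assuming a hypothetical `MaybeValid` union whose non-Copy field is wrapped in `ManuallyDrop` (as current Rust requires) and a `Noisy` type that counts destructor calls. The pointer has to target the inner value, because dropping the `ManuallyDrop` wrapper itself does nothing.

```rust
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type used only to observe destructor calls.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

#[allow(dead_code)]
union MaybeValid {
    valid: ManuallyDrop<Noisy>,
    invalid: (),
}

/// Drops the `valid` field in place without moving it out, and
/// returns how many destructors ran.
fn drop_through_pointer() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut u = MaybeValid { valid: ManuallyDrop::new(Noisy) };
    unsafe {
        // Point at the inner `Noisy`, not at the `ManuallyDrop` wrapper.
        let inner: *mut Noisy = &mut *u.valid;
        // Sound only because we know `valid` is the initialized field.
        ptr::drop_in_place(inner);
    }
    // `u` itself has no drop glue, so nothing is dropped twice.
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(drop_through_pointer(), 1);
}
```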

SimonSapin on 11 Jul 2018

👍1

Related: For constants I'm currently arguing that at least one field of a union inside a constant needs to be correct: https://github.com/rust-lang/rust/pull/51361 (if you have a ZST field that's always true)

oli-obk on 11 Jul 2018

👍2

I'm going to post this then edit it so I don't lose my work.

Please note that edits are not reflected in email notifications. If you’re going to make significant changes to your comment, consider making a new comment instead or in addition.

SimonSapin on 11 Jul 2018

👍2

@derekdreery (and everyone else) I'd be interested in your feedback for https://internals.rust-lang.org/t/pre-rfc-unions-drop-types-and-manuallydrop/8025

Related: For constants I'm currently arguing that at least one field of a union inside a constant needs to be correct: #51361

I've seen the implementation but not seen the argument. ;)

RalfJung on 22 Jul 2018

Well... argument by "not checking at all seemed weird".

I'll be happy to implement any scheme we come up with in the const checker, but my intuition always was that one variant of a union needs to be fully correct.

Otherwise, unions are just a pretty way to specify a type with a specific size and alignment and some compiler generated convenience for transmuting between a fixed set of types.

oli-obk on 22 Jul 2018

I think unions are "bags of uninterpreted bits" with some convenient way to access them. I see nothing weird at all about not checking them.

AFAIK there are actually some use-cases @joshtriplett mentioned at the Berlin all-hands where the first half of the union matches one field and the second half matches some other field.

RalfJung on 22 Jul 2018

👍1

I think unions are "bags of uninterpreted bits" with some convenient way to access them. I see nothing weird at all about not checking them.

I always thought this interpretation goes against the spirit of the language somewhat.
In other places we use static analysis to prevent footguns, check that uninitialized or borrowed values are not accessed, but for unions that analysis is suddenly disabled, please shoot.

petrochenkov on 22 Jul 2018

I see that as exactly the purpose of union. I mean we also have raw pointers where all analysis is disabled. Unions provide full control over data layout, just like raw pointers provide full control over memory access. Both go at the cost of safety.

Also, this makes union simple. I think being simple is important, and even more important when unsafe code is involved (which will always be the case with unions). We should only accept extra complexity here if it provides tangible benefits.

RalfJung on 22 Jul 2018

👍1

We don't think we have to pay that cost for unions since the bag-of-bits model doesn't give any new opportunities compared to enum-with-unknown-variant model.

petrochenkov on 22 Jul 2018

The property in question here is at least as much of a burden for unsafe code to uphold as it is a safeguard. There's no static analysis which can prevent all mistakes that could break this property since we do want to use unions for unsafe type punning¹, so "enum with unknown variant" really means code handling unions has to be super careful with how it writes to the union or risk instant UB, without really reducing the unsafety involved in reading from the union, since reading already requires knowing (through channels the compiler doesn't understand) that the bits are valid for the variant you're reading. We can only actually warn users about a union that isn't valid for any of its variants when running under miri, not in the vast majority of cases where it happens at runtime.

¹ For example, assuming tuples are repr(C) for simplicity, union Foo { a: (bool, u8), b: (u8, bool) } allows you to construct something that's invalid just by field assignments.
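
A sketch of the footnote's point, using hypothetical `repr(C)` structs in place of tuples so the layout is actually fixed (the type and function names are illustrative). It only inspects raw bytes and never reads a `bool` field; whether merely holding this bit pattern is itself UB is exactly the question debated in this thread:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct BoolThenByte {
    flag: bool,
    byte: u8,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct ByteThenBool {
    byte: u8,
    flag: bool,
}

#[repr(C)]
union Foo {
    a: BoolThenByte,
    b: ByteThenBool,
}

// The `unsafe` block around the writes may be unnecessary on newer
// compilers; the attribute silences that warning.
#[allow(unused_unsafe)]
fn build_mixed() -> [u8; 2] {
    let mut foo = Foo { a: BoolThenByte { flag: true, byte: 0 } };
    unsafe {
        foo.a.byte = 2; // second byte is now 2: not a valid `b.flag`
        foo.b.byte = 2; // first byte is now 2: not a valid `a.flag`
    }
    // Look at the raw bytes without ever producing a `bool`.
    unsafe { std::ptr::read(&foo as *const Foo as *const [u8; 2]) }
}

fn main() {
    // Neither `a` nor `b` could be read as a whole from these bytes.
    assert_eq!(build_mixed(), [2, 2]);
}
```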

hanna-kruppe on 22 Jul 2018

👍2

@rkruppe

union Foo { a: (bool, u8), b: (u8, bool) }

Hey, that's my example :)
And it's valid under the RFC 1897's model (at least one of "leaf" fragments bool-1, u8-1, u8-2, bool-2 is valid after any partial assignments).

code handling unions has to be super careful with how it writes to the union or risk instant UB

That's the point of RFC 1897's model, static checking ensures that no safe operation (like assignment or partial assignment) can turn the union into invalid state, so you don't need to be super careful all the time and don't get instant UB.
Only union-unrelated unsafe operations like writes through wild pointers can make a union invalid.

On the other hand, without move checking, union can be put into invalid state very easily.

let u: Union;
let x = u.field; // UB

petrochenkov on 22 Jul 2018

That's the point of RFC 1897's model, static checking ensures that no safe operation (like assignment or partial assignment) can turn the union into invalid state, so you don't need to be super careful all the time and don't get instant UB.
Only union-unrelated unsafe operations like writes through wild pointers can make a union invalid.

You can automatically recognize some kinds of writes as not violating the extra invariants imposed on unions, but it's still extra invariants that need to be upheld by writers. Since reading is still unsafe and requires manually ensuring that the bits will be valid for the variant that's read, this doesn't actually help readers, it just makes writers' lives harder. Neither "bag of bits" nor "enum with unknown variant" helps solve the hard problem of unions: how to ensure it actually stores the kind of data you want to read.

hanna-kruppe on 22 Jul 2018

How would the fancier type-checking affect Dropping? If you create a union then pass it to C, which takes ownership, will rust try to free the data, perhaps causing a double-free? Or would you always implement Drop yourself?

edit it would be way cool if unions were like "enums where the variant is checked statically at compile time", if I've understood the suggestion

edit 2 could unions start off as a bag of bits and then later allow safe access whilst being backwards-compatible?

derekdreery on 22 Jul 2018

And it's valid under the RFC 1897's model (at least one of "leaf" fragments bool-1, u8-1, u8-2, bool-2 is valid after any partial assignments).

If we decide we want this to be valid, I think @oli-obk should update miri's checks to reflect that -- with https://github.com/rust-lang/rust/pull/51361 merged, it would be rejected by miri.

@petrochenkov The part I do not understand is what this buys us. We get extra complexity, in terms of implementation (static analysis) and usage (user still needs to be aware of the exact rules). This extra complexity adds up to fact that when unions are used, we are already in an unsafe context so things are naturally more complex. I think we should have a clear motivation for why this extra complexity is worth it. I do not consider "it violates the spirit of the language somewhat" to be a clear motivation.

The one thing I can think of is layout optimizations. In a "bag of bits" model, a union has no niche, ever. However, I feel that is better addressed by giving the programmer more manual control over the niche, which would also be useful in other cases.
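
To make "niche" concrete, here is a small sketch that prints a few sizes. Only the `Option<&u8>` case is a documented guarantee; the other two numbers are what current rustc happens to produce and are an assumption of this example, not a promise of the language. `MaybeUninit<bool>` stands in for "a union containing a `bool`":

```rust
use std::mem::{size_of, MaybeUninit};

fn main() {
    // Guaranteed: references are never null, so `None` reuses the null pattern.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // `bool` only uses the values 0 and 1, which leaves spare bit patterns
    // (a niche) that the compiler may use to encode `None`.
    println!("Option<bool>: {} byte(s)", size_of::<Option<bool>>());

    // `MaybeUninit<bool>` is a union. If a union may hold any bit pattern,
    // there is no spare value left for `None`, so a separate tag is needed.
    println!(
        "Option<MaybeUninit<bool>>: {} byte(s)",
        size_of::<Option<MaybeUninit<bool>>>()
    );
}
```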

RalfJung on 22 Jul 2018

👍1

I think I am missing something fundamental here. I agree with @rkruppe that
the hard problem with unions is making sure that the union currently stores
the data that the program wants to read.

But AFAIK this problem cannot be solved “locally” by static analysis. We
would at least need whole program analysis, and even then it would still be
a hard problem to solve.

So... is there a solution for this problem on the table? Or, what do the
exact solutions being proposed actually buy us? Say I get a union from C,
without analyzing the whole Rust and C program, what can the proposed
static analyses actually guarantee for readers?

gnzlbg on 22 Jul 2018

@gnzlbg I think the only guarantee we'd get is what @petrochenkov wrote above

static checking ensures that no safe operation (like assignment or partial assignment) can turn the union into invalid state

On the other hand, without move checking, union can be put into invalid state very easily.

Your proposal does not protect against bad reads either, I don't think that's possible.

Also, I imagined some very basic "initialized" tracking along the lines of "writing to any field initializes the union". We'd need something anyway when impl Drop for MyUnion is allowed. For better or worse, we have to decide when and where to insert automatic drop calls for a union. Those rules should be as simple as at all possible because this is extra code that we are inserting into existing subtle unsafe code. For unions that do implement Drop, I also imagined a restriction similar to struct that does not allow writing to a field unless the data structure is already initialized.

@derekchiang

could unions start off as a bag of bits and then later allow safe access whilst being backwards-compatible?
No. Once we say it's a bag of bits, there could be unsafe code assuming that's allowed.

RalfJung on 22 Jul 2018

I think there's value in the bare-minimum move checking to see if a union is initialized. The original RFC explicitly specified that initializing or assigning to any union field makes the whole union initialized. Beyond that, though, rustc should not try to infer anything about the value in a union that the user doesn't explicitly specify; a union may contain any value at all, including a value that isn't valid for any of its fields.

One use case for that, for instance: consider a C-style tagged union that's explicitly extensible with more tags in the future. C and Rust code reading that union must not assume it knows every possible field type.

joshtriplett on 22 Jul 2018

@RalfJung

Perhaps I should start from the other direction.

Should this code work 1) for unions 2) for non-unions?

let x: T;
let y = x.field;

For me the answer is obvious "no" in both cases, because this is a whole class of errors that Rust can and wants to prevent, regardless of "union"-ness of T.

This means move checker should have some kind of scheme in accordance to which it implements that support. Given that move checker (and borrow checker) generally work in per-field fashion, the simplest scheme for unions would be "same rules as for structs + (de)initialization/borrow of a field also (de)initializes/borrows its sibling fields".
This simple rule covers all the static checking.

Then, the enum model is simply a consequence of the static checking described above + one more condition.
If 1) initialization checking is enabled and 2) unsafe code doesn't write arbitrary invalid bytes into the area belonging to the union, then one of unions "leaf" fields is automatically valid. This is dynamic uncheckable (at least for unions with >1 fields and outside of const-evaluator) guarantee, but it's targeted at people reading code first of all.

This case from @joshtriplett , for example

One use case for that, for instance: consider a C-style tagged union that's explicitly extensible with more tags in the future. C and Rust code reading that union must not assume it knows every possible field type.

would be much clearer for people reading code if the union explicitly had an extra field for "possible future extensions".

Of course, we can keep the basic static initialization checking, but reject the second condition and allow writing arbitrary possibly invalid data to the union through some unsafe "third party" means without it being instant UB. Then we wouldn't have that dynamic people-targeted guarantee anymore, I just think that would be a net loss.

petrochenkov on 22 Jul 2018

👍1

@petrochenkov

Should this code work 1) for unions 2) for non-unions?
let x: T;
let y = x.field;
For me the answer is obvious "no" in both cases, because this is a whole class of errors that Rust can and wants to prevent, regardless of "union"-ness of T.

Agreed, this level of checking for uninitialized values seems reasonable, and quite feasible.

This means move checker should have some kind of scheme in accordance to which it implements that support. Given that move checker (and borrow checker) generally work in per-field fashion, the simplest scheme for unions would be "same rules as for structs + (de)initialization/borrow of a field also (de)initializes/borrows its sibling fields".
This simple rule covers all the static checking.

Agreed so far, assuming I understand the rules for structs.

Then, the enum model is simply a consequence of the static checking described above + one more condition.
If 1) initialization checking is enabled and 2) unsafe code doesn't write arbitrary invalid bytes into the area belonging to the union, then one of unions "leaf" fields is automatically valid. This is dynamic uncheckable (at least for unions with >1 fields and outside of const-evaluator) guarantee, but it's targeted at people reading code first of all.

That additional condition isn't valid for unions.

This case from @joshtriplett , for example

One use case for that, for instance: consider a C-style tagged union that's explicitly extensible with more tags in the future. C and Rust code reading that union must not assume it knows every possible field type.

would be much clearer for people reading code if the union explicitly had an extra field for "possible future extensions".

That's not how C unions work, nor how Rust unions were specified to work. (And I'd question whether it'd be clearer, or simply whether it matches a different set of expectations.) Changing this would make Rust unions no longer fit for some of the purposes for which they were designed and proposed.

Of course, we can keep the basic static initialization checking, but reject the second condition and allow writing arbitrary possibly invalid data to the union through some unsafe "third party" means without it being instant UB. Then we wouldn't have that dynamic people-targeted guarantee anymore, I just think that would be a net loss.

Those 'unsafe "third party" means' include "getting a union from FFI", which is a completely valid use case.

Here's a concrete example:

union Event {
    event_id: u32,
    event1: Event1,
    event2: Event2,
    event3: Event3,
}

struct Event1 {
    event_id: u32, // always EVENT1
    // ... more fields ...
}
// ... more event structs ...

match u.event_id {
    EVENT1 => { /* ... */ }
    EVENT2 => { /* ... */ }
    EVENT3 => { /* ... */ }
    _ => { /* unknown event */ }
}

That's completely valid code that people can and will write using unions.
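
A runnable variant of the example above, as a sketch. The constants, the extra payload fields, and the `describe` helper are assumptions added for illustration; the only thing taken from the comment is the shape: every variant starts with a `u32` event id, and unknown ids must be tolerated.

```rust
// Hypothetical tag values; a real FFI header would define these.
const EVENT1: u32 = 1;
const EVENT2: u32 = 2;

#[repr(C)]
#[derive(Clone, Copy)]
struct Event1 {
    event_id: u32, // always EVENT1
    x: i32,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Event2 {
    event_id: u32, // always EVENT2
    key: u8,
}

#[repr(C)]
union Event {
    event_id: u32,
    event1: Event1,
    event2: Event2,
}

fn describe(u: &Event) -> String {
    // SAFETY: every variant is `repr(C)` and begins with a `u32`, so the
    // first four bytes are a valid `event_id` whichever variant was written.
    match unsafe { u.event_id } {
        // SAFETY (both arms): the tag says which variant was stored.
        EVENT1 => format!("event1 x={}", unsafe { u.event1.x }),
        EVENT2 => format!("event2 key={}", unsafe { u.event2.key }),
        other => format!("unknown event {}", other),
    }
}

fn main() {
    let known = Event { event2: Event2 { event_id: EVENT2, key: 7 } };
    // An id this code has never heard of, e.g. from a newer C library.
    let future = Event { event_id: 99 };
    println!("{}", describe(&known));
    println!("{}", describe(&future));
}
```

The last match arm is the case under discussion: the union holds a value that is not any variant this code knows about, and the reader simply declines to interpret it.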

joshtriplett on 22 Jul 2018

@petrochenkov

Should this code work 1) for unions 2) for non-unions?
For me the answer is obvious "no" in both cases, because this is a whole class of errors that Rust can and wants to prevent, regardless of "union"-ness of T.

Fine for me.

the simplest scheme for unions would be "same rules as for structs + (de)initialization/borrow of a field also (de)initializes/borrows its sibling fields".

Woah. The struct rules make sense because they are all based on the fact that different fields are disjoint. You can't just invalidate that basic assumption and still use the same rules. The fact that you need an addendum to the rules show that. I would never expect unions to be checked similar to structs. If anything, one might expect them to be checked similar to enums -- but of course that cannot work, because enums can only be accessed via match.

If 1) initialization checking is enabled and 2) unsafe code doesn't write arbitrary invalid bytes into the area belonging to the union, then one of unions "leaf" fields is automatically valid. This is dynamic uncheckable (at least for unions with >1 fields and outside of const-evaluator) guarantee, but it's targeted at people reading code first of all.

I think it is extremely desirable for the basic validity assumptions to be dynamically checkable (given type information). Then we can check them during CTFE in miri, we can even check them during "full" miri runs (e.g. of a test suite), we can eventually have some kind of sanitizer or maybe a mode where Rust emits debug_assert! in critical places to check the validity invariants.
I think the experience with C's uncheckable rules gives ample evidence that these are problematic. Usually, the first step to actually understand and clarify what the rules are is to find a dynamically checkable way to express them. Even for concurrency memory models, "dynamically checkable" variants (operational semantics explaining everything in terms of step-by-step execution of a virtual machine) are showing up and seem to be the only way to solve long-standing open problems of the axiomatic models that were previously used ("out of thin air problem" is a keyword here).

I can hardly overstate how important I think it is to have dynamically checkable rules. I think we should aim to have 0 uncheckable cases of UB. (We're not there yet, but it's the goal we should have.) That is the only responsible way to have UB in your language, everything else is a case of compiler/language authors making their life easier at the expense of everyone who has to live with the consequences. (I am currently working on dynamically checkable rules for aliasing and raw pointer accesses.)
Even if that would be the only problem, as far as I am concerned "not dynamically checkable" is sufficient grounds to not use this approach.

That said, I see no fundamental reason why this should not be checkable: For every byte in the union, go over all variants to see which values are allowed for that byte in this variant, and take the union (heh ;) ) of all of those sets. A sequence of bytes is valid for a union if every byte is valid according to this definition.
This is, however, quite hard to actually implement a check for -- by far the most complex basic type validity invariant we would have in Rust. That is a direct consequence of the fact that this validity rule is somewhat tricky to describe, which is why I don't like it.
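
A toy model of that bytewise rule, to show its shape only. It assumes each variant can be described as a list of per-byte rules (`ByteRule` and `union_bytes_valid` are hypothetical names), and it ignores padding, uninitialized bytes, pointers, and multi-byte invariants, which is where the real difficulty lies:

```rust
/// Which values one variant allows for a single byte (toy model).
#[derive(Clone, Copy)]
enum ByteRule {
    Any,  // e.g. a `u8`
    Bool, // only 0 or 1
}

impl ByteRule {
    fn allows(self, byte: u8) -> bool {
        match self {
            ByteRule::Any => true,
            ByteRule::Bool => byte <= 1,
        }
    }
}

/// A byte sequence is accepted if every byte is allowed by at least one
/// variant. Assumption: a variant shorter than the union places no
/// constraint on the bytes beyond its end.
fn union_bytes_valid(variants: &[&[ByteRule]], bytes: &[u8]) -> bool {
    bytes.iter().enumerate().all(|(i, &b)| {
        variants
            .iter()
            .any(|v| v.get(i).map_or(true, |rule| rule.allows(b)))
    })
}

fn main() {
    // Models a union of (u8, bool) and (bool, u8): each byte is a `u8` in
    // one of the variants, so any two bytes pass.
    let x: &[ByteRule] = &[ByteRule::Any, ByteRule::Bool];
    let y: &[ByteRule] = &[ByteRule::Bool, ByteRule::Any];
    assert!(union_bytes_valid(&[x, y], &[5, 17]));

    // Models a union of two `bool` fields: the byte 2 fits neither.
    let b: &[ByteRule] = &[ByteRule::Bool];
    assert!(!union_bytes_valid(&[b, b], &[2]));
}
```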

Of course, we can keep the basic static initialization checking, but reject the second condition and allow writing arbitrary possibly invalid data to the union through some unsafe "third party" means without it being instant UB. Then we wouldn't have that dynamic people-targeted guarantee anymore, I just think that would be a net loss.

What does that guarantee buy us? Where does it actually help? Right now, all I see is that everyone has to work hard and be careful to uphold it. I don't see the benefit we, the people, get out of that.

@joshtriplett

consider a C-style tagged union that's explicitly extensible with more tags in the future. C and Rust code reading that union must not assume it knows every possible field type.

The model proposed by @petrochenkov allows those usecases, by adding a __non_exhaustive: () field to the union. However, I don't think that should be necessary. Conceivably, binding generators could add such a field.

RalfJung on 22 Jul 2018

👍1

@RalfJung

This is dynamic uncheckable (at least for unions with >1 fields and outside of const-evaluator) guarantee

I think it is extremely desirable for the basic validity assumptions to be dynamically checkable

A clarification: I meant uncheckable in "by default"/"in release mode", of course it can be checkable in "slow mode" with some extra instrumentation, but you already wrote about this better than I could.

petrochenkov on 22 Jul 2018

👍1

@RalfJung

The model proposed by @petrochenkov allows those usecases, by adding a __non_exhaustive: () field to the union.

Yes, I understood that that was the proposal.

However, I don't think that should be necessary. Conceivably, binding generators could add such a field.

They could, but they'd have to systematically add it to every single union.

I have yet to see an argument for why it makes sense to break primary use cases of unions in favor of some unspecified use case that depends on limiting what bit patterns they can contain.

joshtriplett on 22 Jul 2018

👍2

@joshtriplett

primary use cases of unions

It's not obvious to me at all why this is the primary use case.
It may be true for repr(C) unions if you assume that all uses of unions for tagged unions / "Rust enum emulation" in FFI assume extensibility (which is not true), but from what I've seen, uses of repr(Rust) unions (drop control, initialization control, transmutes) do not expect "unexpected variants" suddenly appearing in them.

petrochenkov on 29 Jul 2018

@petrochenkov I didn't say "break the primary use case", I said "break primary use cases". FFI is one of the primary use cases of unions.

joshtriplett on 29 Jul 2018

and take the union (heh ;) ) of all of those sets

There's certainly an attractive obviousness to a statement that "the possible values of a union are the union of the possible values of all its possible variants"...

scottmcm on 30 Jul 2018

True. However, that's not the proposal -- we all agree that the following should be legal:

union F {
  x: (u8, bool),
  y: (bool, u8),
}
fn foo() -> F {
  let mut f = F { x: (5, false) };
  unsafe { f.y.1 = 17; }
  f
}

Actually I think it is a bug that this even requires unsafe.

So, the union has to be taken bytewise, at least.
Also, I don't think "attractive obviousness" on its own is a sufficiently good reason. Any invariant we decide on is a significant burden for unsafe code authors, we should have concrete advantages that we get in turn.

RalfJung on 30 Jul 2018

@RalfJung

Actually I think it is a bug that this even requires unsafe.

I don't know about the new MIR-based unsafety-checker implementation, but in the old HIR-based one it was certainly a checker limitation/simplification - only expressions of the form expr1.field = expr2 were analyzed for possible "field assignment" unsafety opt-out, everything else was conservatively treated as generic "field access" that's unsafe for unions.

petrochenkov on 30 Jul 2018

Answering the comment in https://github.com/rust-lang/rust/issues/52786#issuecomment-408645420:

So the idea is that compiler still doesn't know anything about the Wrap<T>'s contract and can't e.g. do layout optimizations. Ok, this position is understood.
This means that internally, inside of Wrap's module, implementation of Wrap<T> module can, for example, temporarily write "unexpected values" into it, if it doesn't leak them to users, and compiler will be okay with them.

I'm not sure though how exactly the part of Wrap's contract about absence of unexpected values is related to field privacy.

First of all, regardless of fields being private or public, unexpected values cannot be written directly through those fields. You need something like a raw pointer, or code on the other side of FFI to do it, and it can be done without any field access, just by having a pointer to the whole union. So we need to approach this from some other direction than access to a field being restricted.

As I interpret your comment, the approach is to say that a private field (in union or a struct, doesn't matter) implies an arbitrary invariant unknown to user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

This means that if a union has a single private field, then its implementer (but not compiler) can assume that no third party will write an unexpected value into that union.
That's a "default union documentation clause" for the user in some sense:
- (Default) If a union has a private field you can't write garbage into it.
- Otherwise, you can write garbage into a union unless its docs explicitly prohibit it.

If some union wants to prohibit unexpected values while still providing pub access to its expected fields (e.g. when those fields have no invariants of their own), then it still can do it through documentation, that's why the "unless" in the second clause is necessary.

@RalfJung
Does this describe your position accurately?

How are scenarios like this treated?

mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

petrochenkov on 5 Aug 2018

As I interpret your comment, the approach is to say that a private field (in union or a struct, doesn't matter) implies an arbitrary invariant unknown to user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

No, that is not what I meant.

There are multiple invariants. I do not know how many we will need, but there will be at least two (and I don't have great names for them):

The "Layout-level invariant" (or "syntactic invariant") of a type is completely defined by the syntactic shape of the type. These are things like "&mut T is non-NULL and aligned", "bool is 0 or 1", "! cannot exist". On this level, *mut T is the same as usize -- both allow any value (or maybe any initialized value, but that distinction is for another discussion). We are, eventually, going to have a document spelling out these invariants for all types, by structural recursion: The layout-level invariant of a struct is that all its fields have their invariant maintained, etc. Visibility does not play a role here.

Violating the layout-level invariant is instantaneous UB. This is a statement we can make because we have defined this invariant in very simple terms, and we make it part of the definition of the language itself. We can then exploit this UB (and we already do), e.g. to perform enum layout optimizations.

The "Custom type-level invariant" (or "semantic invariant") of a type is picked by whoever implements the type. The compiler cannot know this invariant as we do not have a language to express it, and the same goes for the language definition. We cannot make violating this invariant UB, as we cannot even say what that invariant is! The fact that it is even possible to have custom invariants is a feature of any useful type system: Abstraction. I wrote more about this in a past blog post.

The connection between the custom, semantic invariant and UB is that we declare that unsafe code may rely on its semantic invariants being preserved by foreign code. That makes it incorrect to just go ahead and put random stuff into a Vec's size field. Note that I said incorrect (I sometimes use the term unsound) -- but not undefined behavior! Another example to demonstrate this difference (really, the same example) is the discussion about aliasing rules for &mut ZST. Creating a dangling well-aligned non-null &mut ZST is never immediate UB, but it is still incorrect/unsound because one may write unsafe code which relies on this not to happen.

It would be nice to align these two concepts, but I do not think it is practical. First of all, for some types (function pointers, dyn traits), the definition of the custom, semantic invariant actually uses the definition of UB in the language. This definition would be circular if we wanted to say that it is UB to ever violate the custom, semantic invariant. Secondly, I'd prefer if the definition of our language, and whether a certain execution trace exhibits UB, was a decidable property. Semantic, custom invariants are frequently not decidable.

I'm not sure though how exactly the part of Wrap's contract about absence of unexpected values is related to field privacy.

Essentially, when a type chooses its custom invariant, it has to make sure that anything that safe code can do preserves the invariant. After all, the promise is that just using this type's safe API can never lead to UB. This applies to both structs and unions. One of the things safe code can do is access public fields, which is where this connection comes from.

For example, a public field of a struct cannot have a custom invariant that is different from the custom invariant of the field type: After all, any safe user could write arbitrary data into that field, or read from the field and expect "good" data. A struct where all fields are public can be safely constructed, placing further restrictions on the field.

A union with a public field... well that's somewhat interesting. Reading union fields is unsafe anyway, so nothing changes there. Writing union fields is safe, so a union with a public field has to be able to handle arbitrary data which satisfies that field's type's custom invariant being put into the field. I doubt this will be very useful...
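
A small sketch of that asymmetry, using a hypothetical `IntOrFloat` union: the write compiles without `unsafe`, the read does not.

```rust
// Hypothetical union with two same-sized `Copy` fields.
union IntOrFloat {
    i: u32,
    f: f32,
}

fn bits_to_float(bits: u32) -> f32 {
    let mut u = IntOrFloat { f: 1.0 };
    // Safe: any `u32` may be written into the `i` field.
    u.i = bits;
    // Unsafe: the reader vouches that the stored bits form a valid `f32`
    // (true for every bit pattern, which is what makes this example sound).
    unsafe { u.f }
}

fn main() {
    // 0x4000_0000 is the IEEE 754 single-precision encoding of 2.0.
    assert_eq!(bits_to_float(0x4000_0000), 2.0);
}
```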

So, to recap, when you choose a custom invariant, it is your responsibility to make sure that foreign safe code cannot break this invariant (and you have tools like private fields to help you achieve this). It is the responsibility of foreign unsafe code to not violate your invariant when that code does something safe code could not do.

This means that internally, inside of Wrap's module, implementation of Wrap module can, for example, temporarily write "unexpected values" into it, if it doesn't leak them to users, and compiler will be okay with them.

Correct. (panic-safety is a concern here but you are probably aware). This is just like, in Vec, I can safely do

let sz = self.size;
self.size = 1337;
self.size = sz;

and there is no UB.
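
The same idea as a self-contained sketch with a made-up type (`Even` is hypothetical and not from the thread): the module picks a custom invariant, keeps the field private so outside safe code cannot break it, and is free to violate it temporarily on the inside.

```rust
mod even {
    /// Custom (semantic) invariant chosen by this module: the value is even.
    /// The compiler knows nothing about it; the private field protects it.
    pub struct Even(u32);

    impl Even {
        pub fn new(n: u32) -> Option<Even> {
            if n % 2 == 0 { Some(Even(n)) } else { None }
        }

        pub fn get(&self) -> u32 {
            self.0
        }

        pub fn add_two(&mut self) {
            // The invariant is broken here (the value is odd)...
            self.0 = self.0.wrapping_add(1);
            // ...and restored before anyone outside can observe it.
            self.0 = self.0.wrapping_add(1);
        }
    }
}

fn main() {
    let mut e = even::Even::new(4).expect("4 is even");
    e.add_two();
    assert_eq!(e.get(), 6);
    // Outside the module, `e.0 = 3` does not compile: the field is private.
    assert!(even::Even::new(3).is_none());
}
```

The odd intermediate value is not UB, because "even" is not part of the layout-level invariant of `u32`.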

mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

In terms of the syntactic layout invariant, my_private_ffi_function can do anything (assuming the function call ABI and signature matches). In terms of the semantic custom invariant, that's not visible in the code -- whoever wrote this module had an invariant in mind, they should document it next to their union definition and then make sure that the FFI function returns a value which satisfies the invariant.

RalfJung on 6 Aug 2018

❤1 👍1

I finally wrote that blog post about whether and when &mut T must be initialized, and the two kinds of invariants I mentioned above.

RalfJung on 22 Aug 2018

Is there anything left to track here that’s not already covered by https://github.com/rust-lang/rust/issues/55149, or should we close?

SimonSapin on 10 Mar 2019

E0658 still points here:

error[E0658]: unions with non-Copy fields are unstable (see issue #32836)

Nemo157 on 13 May 2019

😕2

This currently plays terribly with atomics, since they do not implement Copy. Does anyone know a workaround?

Avi-D-coder on 16 Sep 2019

When https://github.com/rust-lang/rust/issues/55149 is implemented, you’ll be able to use ManuallyDrop<AtomicFoo> in a union. Until then, the only work-around is to use Nightly (or not use union and find some alternative).
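
A sketch of what that looks like, assuming a compiler in which `ManuallyDrop<T>` union fields are accepted. The union `Cell32` and the function `bump` are hypothetical names for illustration:

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical union viewing the same four bytes as an atomic or a plain u32.
union Cell32 {
    atomic: ManuallyDrop<AtomicU32>,
    plain: u32,
}

fn bump(start: u32) -> u32 {
    let cell = Cell32 { plain: start };
    // SAFETY: `AtomicU32` has the same size and bit validity as `u32`, so the
    // bytes written through `plain` are a valid `AtomicU32`.
    let atomic: &AtomicU32 = unsafe { &*cell.atomic };
    atomic.fetch_add(1, Ordering::Relaxed);
    // SAFETY: the atomic view left a valid `u32` in place.
    unsafe { cell.plain }
}

fn main() {
    assert_eq!(bump(41), 42);
}
```

`AtomicU32` has no destructor, so nothing is leaked by never calling `ManuallyDrop::drop` here.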

SimonSapin on 16 Sep 2019

👍1

With that implemented, you shouldn't even need ManuallyDrop; after all rustc knows that Atomic* does not implement Drop.

RalfJung on 16 Sep 2019

👍1

Assigning myself to switch the tracking issue to the new one.

Centril on 21 Oct 2019

Rust: Untagged unions (tracking issue for RFC 1444)
