Go: proposal: spec: add sum types / discriminated unions

Created on 6 Mar 2017  ·  320 Comments  ·  Source: golang/go

This is a proposal for sum types, also known as discriminated unions. Sum types in Go should essentially act like interfaces, except that:

  • they are value types, like structs
  • the types contained in them are fixed at compile-time

Sum types can be matched with a switch statement. The compiler checks that all variants are matched. Inside the arms of the switch statement, the value can be used as if it is of the variant that was matched.
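For illustration, a sketch of what this could look like (the syntax is hypothetical and Shape, Circle, and Rectangle are placeholder types; concrete syntax is discussed in the comments below):

type Shape Circle | Rectangle // a value type whose variants are fixed at compile time

func area(s Shape) float64 {
    switch s := s.(type) { // the compiler checks that all variants are matched
    case Circle:
        return math.Pi * s.R * s.R
    case Rectangle:
        return s.W * s.H
    }
    // no default or trailing return needed: the switch is exhaustive
}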

Labels: Go2, LanguageChange, NeedsInvestigation, Proposal


All 320 comments

This has been discussed several times in the past, starting from before the open source release. The past consensus has been that sum types do not add very much to interface types. Once you sort it all out, what you get in the end is an interface type where the compiler checks that you've filled in all the cases of a type switch. That's a fairly small benefit for a new language change.

If you want to push this proposal along further, you will need to write a more complete proposal doc, including: What is the syntax? Precisely how do they work? (You say they are "value types", but interface types are also value types). What are the trade-offs?

I think this is too significant a change of the type system for Go1 and there's no pressing need.
I suggest we revisit this in the larger context of Go 2.

Thanks for creating this proposal. I've been toying with this idea for a year or so now.
The following is as far as I've got with a concrete proposal. I think
"choice type" might actually be a better name than "sum type", but YMMV.

Sum types in Go

A sum type is represented by two or more types combined with the "|"
operator.

type: type1 | type2 ...

Values of the resulting type can only hold one of the specified types. The
type is treated as an interface type - its dynamic type is that of the
value that's assigned to it.

As a special case, "nil" can be used to indicate whether the value can
become nil.

For example:

type maybeInt nil | int

The method set of the sum type holds the intersection of the method set
of all its component types, excluding any methods that have the same
name but different signatures.
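For example (illustrative): in

type R io.Reader | io.ReadCloser

the method set of R contains only Read, since Close is not common to both component types.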

Like any other interface type, a sum type may be the subject of a dynamic
type conversion. In type switches, the first arm of the switch that
matches the stored type will be chosen.

The zero value of a sum type is the zero value of the first type in
the sum.

When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen.

For example:

var x int|float64 = 13

would result in a value with dynamic type int, but

var x int|float64 = 3.13

would result in a value with dynamic type float64.

Implementation

A naive implementation could implement sum types exactly as interface
values. A more sophisticated approach could use a representation
appropriate to the set of possible values.

For example a sum type consisting only of concrete types without pointers
could be implemented with a non-pointer type, using an extra value to
remember the actual type.

For sum-of-struct-types, it might even be possible to use spare padding
bytes common to the structs for that purpose.

@rogpeppe How would that interact with type assertions and type switches? Presumably it would be a compile-time error to have a case on a type (or assertion to a type) that is not a member of the sum. Would it also be an error to have a nonexhaustive switch on such a type?

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
  case int:
    // ...

and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

Or can sum types contain only concrete types?

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

Yes, I think it would be reasonable to have a compile-time error
to have a case that can't be matched. Not sure about whether it's
a good idea to allow non-exhaustive switches on such a type - we
don't require exhaustiveness anywhere else. One thing that might
be good, though: if the switch is exhaustive, we could avoid requiring
a default case for it to count as a terminating statement.

That means that you can get the compiler to error if you have:

func addOne(x int|float64) int|float64 {
    switch x := x.(type) {
    case int:
        return x + 1
    case float64:
        return x + 1
    }
}

and you change the sum type to add an extra case.

For type switches, if you have

type T int | interface{}

and you do:

switch t := t.(type) {
  case int:
    // ...

and t contains an interface{} containing an int, does it match the first case? What if the first case is case interface{}?

t can't contain an interface{} containing an int. t is an interface
type just like any other interface type, except that it can only
contain the enumerated set of types that it consists of.
Just like an interface{} can't contain an interface{} containing an int.

Sum types can match interface types, but they still just get a concrete
type for the dynamic value. For example, it would be fine to have:

type R io.Reader | io.ReadCloser

What about type T interface{} | nil? If you write

var t T = nil

what is t's type? Or is that construction forbidden? A similar question arises for type T []int | nil, so it's not just about interfaces.

According to the proposal above, you get the first item
in the sum that the value can be assigned to, so
you'd get the nil interface.

In fact interface{} | nil is technically redundant, because any interface{}
can be nil.

For []int | nil, a nil []int is not the same as a nil interface, so the
concrete value of ([]int|nil)(nil) would be []int(nil) not untyped nil.

The []int | nil case is interesting. I would expect the nil in the type declaration to always mean "the nil interface value", in which case

type T []int | nil
var x T = nil

would imply that x is the nil interface, not the nil []int.

That value would be distinct from the nil []int encoded in the same type:

var y T = []int(nil)  // y != x

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be? My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil.

It seems overly subtle.

Exhaustive type switches would be nice. You could always add an empty default: when it's not the desired behavior.

The proposal says "When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen."

So, with:

type T []int | nil
var x T = nil

x would have concrete type []int, because nil is assignable to []int and []int is the first element of the sum. It would be equal to any other []int(nil) value.

Wouldn't nil always be required even if the sum is all value types? Otherwise what would var x int64 | float64 be?

The proposal says "The zero value of a sum type is the zero value of the first type in
the sum.", so the answer is int64(0).

My first thought, extrapolating from the other rules, would be the zero value of the first type, but then what about var x interface{} | int? It would, as @bcmills points out, have to be a distinct sum nil

No, it would just be the usual interface nil value in that case. That type (interface{} | nil) is redundant. Perhaps it might be a good idea to make it a compile-time error to specify sum types where one element is a superset of another, as I can't currently see any point in defining such a type.

The zero value of a sum type is the zero value of the first type in the sum.

That is an interesting suggestion, but since the sum type must record somewhere the type of the value that it currently holds, I believe it means that the zero value of the sum type is not all-bytes-zero, which would make it different from every other type in Go. Or perhaps we could add an exception saying that if the type information is not present, then the value is the zero value of the first type listed, but then I'm not sure how to represent nil if it is not the first type listed.

So (stuff) | nil only makes sense when nothing in (stuff) can be nil and nil | (stuff) means something different depending on whether anything in stuff can be nil? What value does nil add?

@ianlancetaylor I believe many functional languages implement (closed) sum types essentially like how you would in C

struct {
    int which;
    union {
         A a;
         B b;
         C c;
    } summands;
}

if which indexes into the union's fields in order (0 = a, 1 = b, 2 = c), the zero-value definition works out to all bytes zero. And you'd need to store the types elsewhere, unlike with interfaces. You'd also need some kind of special handling for the nil tag wherever you store the type info.

That would make unions value types instead of special interfaces, which is also interesting.
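A rough emulation of that layout in today's Go, just to make the zero-value argument concrete (A and B are placeholder types, and Go has no untagged unions, so the fields here don't actually share storage):

type A struct{ X int }
type B struct{ Y string }

type AorB struct {
    which int // 0 = a, 1 = b: indexes the variants in declaration order
    a     A
    b     B
}

The all-bytes-zero AorB has which == 0, so the zero value is "variant a holding A's zero value" with no special casing.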

Is there a way to make the all zero value work if the field which records the type has a zero value representing the first type? I'm assuming that one possible way for this to be represented would be:

type A = B|C
struct A {
  choice byte // value 0 or 1
  value ?  // (thing big enough to store a B or a C)
}

[edit]

Sorry @jimmyfrasche beat me to the punch.

Is there anything added by nil that couldn't be done with

type S int | string | struct{}
var None struct{}

?

That seems like it avoids a lot of the confusion (that I have, at least)

Or better

type (
     None struct{}
     S int | string | None
)

that way you could type switch on None and assign with None{}

@jimmyfrasche struct{} is not equal to nil. It's a minor detail, but it would make type-switches on sums needlessly(?) diverge from type-switches on other types.

@bcmills It wasn't my intent to claim otherwise. I meant that it could be used for the same purpose (differentiating a lack of value) without overlapping with the meaning of nil in any of the types in the sum.

@rogpeppe what does this print?

// r is an io.Reader interface value holding a type that also implements io.Closer
var v io.ReadCloser | io.Reader = r
switch v.(type) {
case io.ReadCloser: fmt.Println("ReadCloser")
case io.Reader: fmt.Println("Reader")
}

I would assume "Reader"

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

(And I would also expect sums which include only interface types to use no more space than a regular interface, although I suppose that an explicit tag could save a bit of lookup overhead in the type-switch.)

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

@ianlancetaylor That's an excellent point to raise, thanks. I don't think it's hard to get around, though, although it does imply that my "naive implementation" suggestion is itself too naive. A sum type, although treated as an interface type, does not have to actually contain a direct pointer to the type and its method set - instead it could, when appropriate, contain an integer tag that implies the type. That tag could be non-zero even when the type itself is nil.

Given:

 var x int | nil = nil

the runtime value of x need not be all zeros. When switching on the type of x or converting
it to another interface type, the tag could be indirected through a small table containing
the actual type pointers.
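A sketch of that indirection (purely illustrative; a real implementation would live in the runtime, not in user code):

// one small table per sum type, indexed by the tag
var intOrNilTypes = []reflect.Type{
    0: nil,                    // tag 0: the nil case
    1: reflect.TypeOf(int(0)), // tag 1: int
}

// an all-zero value of int|nil has tag 0, which the table maps to nil,
// so the zero value still "means" nil without the stored bits being special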

Another possibility would be to allow a nil type only if it's the first element, but
that precludes constructions like:

var t nil | int
var u float64 | t

@jimmyfrasche I would assume ReadCloser, same as you'd get from a type-switch on any other interface.

Yes.

@bcmills it's the assignment that's interesting, consider: https://play.golang.org/p/PzmWCYex6R

I don't get this. Why would "this [...] have to be valid for the type switch to print ReadCloser"?
Like any interface type, a sum type would store no more than the concrete value of what's in it.

When there are several interface types in a sum, the runtime representation is just an interface value - it's just that we know that the underlying value must implement one or more of the declared possibilities.

That is, when you assign something to a type (I1 | I2) where both I1 and I2 are interface types, it's not possible to tell later whether the value you put in was known to implement I1 or I2 at the time.

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly right-to-left or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

@rogpeppe In your proposal, ignoring optimization possibilities in the implementation and subtleties of zero values, the main benefit of using a sum type over a manually crafted interface type (containing the intersection of the relevant methods) is that the type checker can point out errors at compile time rather than runtime. A 2nd benefit is that a type's value is more discriminated and thus may help with readability/understanding of a program. Is there any other major benefit?

(I am not trying to diminish the proposal in any way, just trying to get my intuition right. Especially if the extra syntactic and semantic complexity is "reasonably small" - whatever that may mean - I can definitively see the benefit of having the compiler catch errors early.)

@griesemer Yes, that's about right.

Particularly when communicating messages over channels or the network, I think it helps readability and correctness to be able to have a type that expresses exactly the available possibilities. It's common currently to make a half-hearted attempt to do this by including an unexported method in an interface type, but this is a) circumventable by embedding and b) it's hard to see all the possible types because the unexported method is hidden.
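For reference, that idiom looks something like this in today's Go (Node, Leaf, Branch, and Sneaky are placeholder names):

type Node interface{ node() } // unexported method "seals" the interface

type Leaf struct{ Value int }
type Branch struct{ Left, Right Node }

func (Leaf) node()   {}
func (Branch) node() {}

// (a) embedding circumvents the seal: Sneaky satisfies Node too
type Sneaky struct{ Node }

// (b) nothing in the definition of Node reveals Leaf and Branch as the
// intended set of possibilities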

@jimmyfrasche

If you have a type that's io.ReadCloser | io.Reader you can't be sure when you type switch or assert on io.Reader that it's not an io.ReadCloser unless assignment to a sum type unboxes and reboxes the interface.

If you have that type, you know that it's always an io.Reader (or nil, because any io.Reader can also be nil). The two alternatives aren't exclusive - the sum type as proposed is an "inclusive or", not an "exclusive or".

Going the other way, if you had io.Reader | io.ReadCloser it would either never accept an io.ReadCloser because it goes strictly right-to-left or the implementation would have to search for the "best matching" interface from all interfaces in the sum but that cannot be well defined.

If by "going the other way", you mean assigning to that type, the proposal says:

"When assigning a value to a sum type, if the value can fit into more
than one of the possible types, then the first is chosen."

In this case, an io.ReadCloser can fit into both an io.Reader and an io.ReadCloser, so it chooses io.Reader, but there's actually no way to tell afterwards. There is no detectable difference between the type io.Reader and the type io.Reader | io.ReadCloser, because io.Reader can already hold all interface types that implement io.Reader. That's why I suspect it might be a good idea to have the compiler reject types like this. For example, it could reject any sum type involving interface{}, because interface{} can already contain any type, so the extra qualifications don't add any information.

@rogpeppe there are a lot of things I like about your proposal. The left-to-right assignment semantics and the "zero value is the zero value of the leftmost type" rule are very clear and simple. Very Go.

What I'm worried about is assigning a value that's already boxed in an interface to a sum typed variable.

Let's, for the moment, use my previous example and say that RC is a struct that can be assigned to an io.ReadCloser.

If you do this

var v io.ReadCloser | io.Reader = RC{}

the results are obvious and clear.

However, if you do this

var r io.Reader = RC{}
var v io.ReadCloser | io.Reader = r

the only sensible thing to do is have v store r as an io.Reader, but that means that when you type switch on v, you can't be sure, when you hit the io.Reader case, that you don't in fact have an io.ReadCloser. You'd need to have something like this:

switch v := v.(type) {
case io.ReadCloser: useReadCloser(v)
case io.Reader:
  if rc, ok := v.(io.ReadCloser); ok {
    useReadCloser(rc)
  } else {
    useReader(v)
  }
}

Now, there's a sense in which io.ReadCloser <: io.Reader, and you could just disallow those, as you suggested, but I think the problem is more fundamental and may apply to any sum type proposal for Go†.

Let's say you have three interfaces A, B, and C, with the methods A(), B(), and C() respectively, and a struct ABC with all three methods. A, B, and C are disjoint so A | B | C and its permutations are all valid types. But you still have cases like

var c C = ABC{}
var v A | B | C = c

There are a bunch of ways to rearrange that, and you still get no meaningful guarantees about what v is when interfaces are involved. After you unbox the sum you need to unbox the interface if order is important.

Maybe the restriction should be that none of the summands can be interfaces at all?

The only other solution I can think of is to disallow assigning an interface to a sum typed variable, but that seems in its own way more severe.

† that doesn't involve type constructors for the types in the sum to disambiguate (like in Haskell where you have to say Just v to construct a value of type Maybe)—but I am not in favor of that at all.

@jimmyfrasche Is the use-case for ordered unboxing actually important? That's not obvious to me, and for the cases where it is important it's easy to work around with explicit box structs:

type ReadCloser struct { io.ReadCloser }
type Reader struct { io.Reader }

var v ReadCloser | Reader = Reader{r}

@bcmills It's more that the results are unobvious and fiddly, and mean that all the guarantees you want from a sum type evaporate when interfaces are involved. I can see it causing all kinds of subtle bugs and misunderstandings.

The explicit box structs example you provide shows that disallowing interfaces in sum types doesn't limit the power of sum types at all. It's effectively creating the type constructors for disambiguation that I mentioned in the footnote. Admittedly it's slightly annoying and an extra step, but it's simple and feels very much in line with Go's philosophy of letting language constructs be as orthogonal as possible.

all the guarantees you want with a sum type

It depends what guarantees you expect. I think you're expecting a sum type to be
a strictly tagged value, so given any types A|B|C, you know exactly what static
type you assigned to it. I see it as a type restriction on a single value of concrete
type - the restriction is that the value is type-compatible with (at least) one of A, B and C.
In the end it's just an interface with a value in it.

That is, if a value can be assigned to a sum type by virtue of it being assignment-compatible
with one of the sum type's members, we don't record which of those members has been
"chosen" - we just record the value itself. The same as when you assign an io.Reader
to an interface{}, you lose the static io.Reader type and just have the value itself
which is compatible with io.Reader but also with any other interface type that it happens
to implement.

In your example:

var c C = ABC{}
var v A | B | C = c

A type assertion of v to any of A, B and C would succeed. That seems reasonable to me.

@rogpeppe those semantics make more sense than what I was imagining. I'm still not entirely convinced that interfaces and sums mix well, but I'm no longer certain they don't. Progress!

Let's say you have type U I | *T where I is an interface type and *T is a type that implements I.

Given

var i I = new(T)
var u U = i

the dynamic type of u is *T, and in

var u U = new(T)

you can access that *T as an I with a type assertion. Is that correct?

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

Progress!

Yay!

you can access that *T as an I with a type assertion. Is that correct?

Yes.

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

It would also be somewhat different from something like var v uint8 | int32 | int64 = i which would, I imagine, just always go with whichever of those three types i is even if i was an int64 that could fit in a uint8.

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

Yes, because unless i is a constant, it will only be assignable to one of those alternatives.

That's not quite true, I realise, because of the rule allowing assignment of unnamed types to named types. I don't think that makes too much difference though. The rule remains the same.

So the I | *T type from my last post is effectively the same as the type I and io.ReadCloser | io.Reader is effectively the same type as io.Reader?

That's right. Both types would be covered by my suggested rule that the compiler reject sum types where one type is an interface that is implemented by another of the types. The same or similar rule could cover sum types with duplicate types like int|int.

One thought: it is perhaps unintuitive that int|byte isn't the same as byte|int, but it's probably ok in practice.
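Concretely, under the zero-value and assignment rules above (illustrative):

var a int|byte = 10 // dynamic type int: 10 fits both, and int is listed first
var b byte|int = 10 // dynamic type byte, for the same reason

var c int|byte // zero value: int(0)
var d byte|int // zero value: byte(0)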

That would mean assignment from a valid interface value to a sum would have to search for the first matching type in the sum.

Yup, as the proposal says (of course the compiler knows statically which one to choose so there's no searching at runtime).

I'm not following this. The way I read it (which could be different from what was intended), there are at least two ways to deal with a union U of I and T-implements-I.

1a) at assignment of U u = t, the tag is set to T. Later selection results in a T because the tag is a T.
1b) at assignment of U u = i (i is really a T), the tag is set to I. Later selection results in a T because the tag is an I but a second check (performed because T implements I and T is a member of U) discovers a T.

2a) like 1a
2b) at assignment of U u = i (i is really a T), generated code checks the value (i) to see if it is actually a T, because T implements I and T is also a member of U. Because it is, the tag is set to T. Later selection directly results in a T.

In the case that T, V, W all implement I and U = *T | *V | *W | I, assignment U u = i requires (up to) 3 type tests.

Interfaces and pointers were not the original use case for union types, though, were they?

I can imagine certain sorts of hackery where a "nice" implementation would perform some bit banging -- for example, if you have a union of 4 or fewer pointer types where all referents are 4-byte aligned, store the tag in the lower 2 bits of the value. This in turn implies that it's not good to take the address of a member of a union (it wouldn't be anyhow, since that address could be used to re-store an "old" type without adjusting the tag).

Or if we had a 50-ish-bit address space and were willing to take some liberties with NaNs, we could slap integers, pointers, and doubles all into a 64-bit union, at the possible cost of some bit fiddling.

Both sub-suggestions are gross, but I am certain that both would have a small (?) number of fanatical proponents.

This in turn implies that it's not good to take the address of a member of a union

Correct. But I don't think the result of a type assertion is addressable today anyway, is it?

at assignment of U u = i (i is really a T), the tag is set to I.

I think this is the crux - there is no tag I.

Ignore the runtime representation for a moment and consider a sum type as an interface. As with any interface, it has a dynamic type (the type that's stored in it). The "tag" you refer to is exactly that dynamic type.

As you suggest (and I tried to imply in the last paragraph of the proposal) there may be ways to store the type tag in more efficient ways than with a pointer to the runtime type, but in the end it is always just encoding the dynamic type of the sum-type value, not which of the alternatives was "chosen" when it was created.

Interfaces and pointers were not the original use case for union types, though, were they?

It was not, but any proposal needs to be as orthogonal as possible with respect to other language features, in my view.

@dr2chase my understanding so far is that, if a sum type includes any interface types in its definition, then at runtime its implementation is identical to an interface (containing the intersection of method sets) but the compile-time invariants about allowable types are still enforced.

Even if a sum type only contained concrete types and were implemented like a C-style discriminated union, you wouldn't be able to take the address of a value stored in the sum type, since what's stored at that address could become a different type (and size) later. You could take the address of the sum-typed value itself, though.

Is it desirable that sum types behave this way? We could just as easily declare that the selected/asserted type is the same as what the programmer said/implied when a value was assigned to the union. Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

Is it desirable that sum types behave this way?

That's a matter of judgement. I believe it is, because we already have the concept of interfaces in the language - values with both a static and a dynamic type. The sum types as proposed just provide a more precise way to specify interface types in some cases. It also means that sum types can work without restriction on any other types. If you don't do that, you need to exclude interface types and then the feature isn't fully orthogonal.

Otherwise we might get led to interesting places with respect to int8 vs int16 vs int32, etc. Or, e.g., int8 | uint8.

What's your concern here?

You can't use a function type as a map's key type. I'm not saying that that's equivalent, just that there is precedent for types restricting other kinds of types. Still open to allowing interfaces, still not sold.

What kind of programs can you write with a sum type containing interfaces that you can't otherwise?

Counterproposal.

A union type is a type that lists zero or more types, written

union {
  T0
  T1
  //...
  Tn
}

All of the listed types (T0, T1, ..., Tn) in a union must be different and none can be interface types.

Methods may be declared on a defined (named) union type by the usual rules. No methods are promoted from the listed types.

There is no embedding for union types. Listing one union type in another is the same as listing any other valid type. However, a union cannot list its own type recursively, for the same reason that type S struct { S } is invalid.

Unions can be embedded in structs.

The value of a union type consists of a dynamic type, limited to one of the listed types, and a value of that dynamic type—said to be the stored value. Exactly one of the listed types is the dynamic type at all times.

The zero value of the empty union is unique. The zero value of a nonempty union is the zero value of the first type listed in the union.

A value for a union type, U, can be created with U{} for the zero value. If U has one or more types and v is a value of one of the listed types, T, U{v} creates a union value storing v with dynamic type T. If v is of a type not listed in U that can be assigned to more than one of the listed types, an explicit conversion is required to disambiguate.

A value of a union type U can be converted to another union type V, as in V(U{}), iff the set of types in U is a subset of the set of types in V. That is, ignoring order, every type listed in U must also be listed in V; U cannot have types that are not in V, but V can have types not in U.

Assignability between union types is defined the same way as convertibility, as long as at most one of the union types is defined (named).

A value of one of the listed types, T, of a union type U may be assigned to a variable of the union type U. This sets the dynamic type to T and stores the value. Assignment compatible values work as above.

If all of the listed types support the equality operators:

  • the equality operators can be used on two values of the same union type. Two values of a union type are never equal if their dynamic types differ.
  • a value of that union may be compared with a value of any of its listed types. If the dynamic type of the union is not the type of the other operand, == is false and != is true regardless of the stored value. Assignment compatible values work as above.
  • the union may be used as a map key

No other operators are supported on values of a union type.

A type assertion against a union type for one of its listed types holds if the asserted type is the dynamic type.

A type assertion against a union type for an interface type holds if its dynamic type implements that interface. (Notably, if all the listed types implement this interface the assertion always holds).

Type switches must either be exhaustive, including all listed types, or contain a default case.

Type assertions and type switches return a copy of the stored value.

Package reflect would require a way to get the dynamic type and stored value of a reflected union value and a way to get the listed types of a reflected union type.
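Pulling those rules together, usage might look like this (illustrative only; the union syntax is hypothetical and Circle and Rectangle are placeholder struct types):

type Shape union {
    Circle
    Rectangle
}

var s Shape               // zero value: Circle's zero value (first listed type)
s = Rectangle{W: 3, H: 4} // dynamic type is now Rectangle
t := Shape{Circle{R: 2}}  // explicit literal creation

_ = s == t // false: the dynamic types differ

switch v := s.(type) { // must cover all listed types or have a default
case Circle:
    fmt.Println("circle", v.R)
case Rectangle:
    fmt.Println("rectangle", v.W, v.H)
}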

Notes:

The union{...} syntax was chosen partially to differentiate it from the sum type proposal in this thread, primarily to retain the nice properties of the Go grammar, and incidentally to reinforce that this is a discriminated union. As a consequence, it allows somewhat strange unions such as union{} and union{ int }. The first is in many senses equivalent to struct{} (though by definition a different type), so it doesn't add much to the language other than another empty type. The second is perhaps more useful. For example, type Id union { int } is very much like type Id struct { int }, except that the union version allows direct assignment without having to specify idValue.int, letting it feel more like a built-in type.

The disambiguating conversion required when dealing with assignment-compatible types is a bit harsh, but it would catch errors if a union is updated to introduce an ambiguity that downstream code is unprepared for.

The lack of embedding is a consequence of allowing methods on unions and requiring exhaustive matching in type switches.

Allowing methods on the union itself, rather than taking the intersection of the methods of the listed types, avoids accidentally picking up unwanted methods. Type asserting the stored value to common interfaces allows simple, explicit wrapper methods when promotion is desired. For example, on a union type U all of whose listed types implement fmt.Stringer:

func (u U) String() string {
  return u.(fmt.Stringer).String()
}

In the linked reddit thread, rsc said:

It would be weird for the zero value of sum { X; Y } to be different from that of sum { Y; X }. That's not how sums usually work.

I've been thinking about this, since it applies to any proposal really.

That's not a bug: it's a feature.

Consider

type (
  Undefined = struct{}
  UndefinedOrInt union { Undefined; int }
)

vs.

type (
  Illegal = struct{}
  IntOrIllegal union { int; Illegal }
)

UndefinedOrInt says that by default it's not yet defined but, when it is, it will be an int value. This is analogous to *int, which is how the sum type (1 + int) has to be represented in Go today, and the zero value is also analogous.

IntOrIllegal, on the other hand, says by default it's the int 0, but it may at some point be marked as illegal. This is still analogous to *int but the zero value is more expressive of the intent, like enforcing that it defaults to new(int).

It's kind of like being able to phrase a bool field in a struct in the negative so the zero value is what you want as the default.

Both zero values of the sums are useful and meaningful in their own right and the programmer can choose the most appropriate for the situation.

If the sum were a days-of-the-week enum (each day being a defined struct{}), whichever day is listed first would be the first day of the week; the same goes for an iota-style enum.

Also, I'm not aware of any languages with sum types or discriminated/tagged unions that have the concept of a zero value. C would be the closest but the zero value is uninitialized memory—hardly a lead to follow. Java defaults to null, I believe, but that's because everything is a reference. All the other languages I know of have mandatory type constructors for the summands so there isn't really a notion of zero value. Is there such a language? What does it do?

If the difference from the mathematical concepts​ of "sum" and "union" is the problem, we can always call them something else (e.g. "variant").

For names: Union confuses C/C++ purists. Variant is mainly familiar to CORBA and COM programmers, whereas discriminated union seems to be preferred by the functional languages. Set is both a verb and a noun. I like the keyword _pick_. Limbo used _pick_. It's short and describes the type's intention to pick from a finite set of types.

The name/syntax is largely irrelevant. Pick would be fine.

Either proposal in this thread fits the set theoretic definition.

The first type being special for the zero value is irrelevant since type-theoretic sums commute, so the order is irrelevant (A + B = B + A). My proposal maintains that property, but product types also commute in theory and are considered different in practice by most languages (Go included), so it's probably not essential.

@jimmyfrasche

I personally believe that disallowing interfaces as 'pick' members is a very big drawback. First, it would completely defeat one of the great use cases of 'pick' types: having an error be one of the members. Or you might want a pick type that holds either an io.Reader or a string, if you don't want to force the user to wrap the string in a StringReader beforehand. All in all, an interface is just another type, and I believe there shouldn't be type restrictions on 'pick' members. That being the case, if a pick type has two interface members where one is fully enclosed by the other, that should be a compile-time error, as previously mentioned.

What I do like from your counterproposal is the fact that methods can be defined on the pick type. I don't think it should provide the intersection of the members' methods, since I don't think there would be a lot of cases where a method belongs to all members (and you have interfaces for that anyway). And an exhaustive switch + default case is a very good idea.

@rogpeppe @jimmyfrasche Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?

@ianlancetaylor Robert summarized it well here: https://github.com/golang/go/issues/19412#issuecomment-288608089

@ianlancetaylor
At the end of the day, it makes code more readable, and that is the prime directive of Go. Consider json.Token: it's currently defined as an interface{}, but the documentation states that it can actually only be one of a specific set of types. If, on the other hand, it's written as

type Token Delim | bool | float64 | Number | string | nil

The user will be able to immediately see all the possibilities, and the tooling will be able to create an exhaustive switch automatically. Furthermore, the compiler will prevent you from sticking an unexpected type in there.
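For comparison, the best encoding/json can do today is document the contract in prose (quoting approximately from the package docs):

// A Token holds a value of one of these types:
//
//   Delim, for the four JSON delimiters [ ] { }
//   bool, for JSON booleans
//   float64, for JSON numbers
//   Number, for JSON numbers
//   string, for JSON string literals
//   nil, for JSON null
type Token interface{}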

At the end of the day, it makes code more readable, and that is the prime directive of Go.

More features means one has to know more to understand the code. For a person with only average knowledge of a language, its readability is necessarily inversely proportional to the number of [newly added] features.

@cznic

More features means one has to know more to understand the code.

Not always. If you can substitute "knowing more about the language" for "knowing more about poorly- or inconsistently-documented invariants in the code", that can still be a net win. (That is, global knowledge can displace the need for local knowledge.)

If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet. Something like

//vet:types Delim | bool | float64 | Number | string | nil
type Token interface{}

Now, we don't currently have any kind of vet comments so this is not an entirely serious suggestion. But I'm serious about the basic idea: if the only advantage we get is something that we can do entirely with a static analysis tool, is it really worth adding a complex new concept to the language proper?

Many, perhaps all, of the tests done by cmd/vet could be added to the language, in the sense that they could be checked by the compiler rather than by a separate static analysis tool. But for various reasons we find it useful to separate vet from the compiler. Why does this concept fall on the language side rather than the vet side?

@ianlancetaylor re checked comments: https://github.com/BurntSushi/go-sumtype

@ianlancetaylor as far as whether the change is justified, I've been actively ignoring that—or rather pushing it back. Talking about it in the abstract is vague and doesn't help me: it all sounds like "good things are good and bad things are bad" to me. I wanted to get an idea of what the type would actually be—what its limitations are, what implications it has, what the pros are, what the cons are—so I could see how it would fit into the language (or not!) and get an idea of how I would/could use it in programs. I think I now have a good idea of what sum types would have to mean in Go, at least from my perspective. I'm not entirely convinced they're worth it (even if I want 'em real bad), but now I have something solid to analyze, with well-defined properties that I can reason about. I know that's not really an answer, per se, but it's where I'm at with this, at least.

If better compile-time type checking is indeed the only benefit, then we can get a very similar benefit without changing the language by introducing a comment checked by vet.

This is still vulnerable to the need-to-learn-new-things criticism. If I have to learn about those magic vet comments to debug/understand/use code, it's a mental tax, no matter whether we assign it to the Go-language budget or the technically-not-the-Go-language budget. If anything, magic comments are more costly because I didn't know I needed to learn them when I thought I learned the language.

@cznic
I disagree. By that assumption, you cannot be sure that a person would understand what a channel is, or even what a function is. Yet these things exist in the language. And a new feature does not automatically make the language harder to understand. In this case, I would argue that it would in fact make it easier to understand, because it makes it immediately clear to the reader what a type is supposed to be, as opposed to using a black-box interface{} type.

@ianlancetaylor
I personally think this feature has more to do with making code easier to read and reason about. Compile-time safety is a very nice feature, but not the main one. Not only would it make a type signature immediately more obvious, but its subsequent usage would also be easier to understand, and easier to write. People would no longer need to resort to panics when they receive a type they didn't expect (the current behavior even in the standard library), and would thus have an easier time reasoning about the usage, without being encumbered by the unknown. And I don't think it is a good idea to rely on comments and other tools (even if they are first-party) for this, because a cleaner syntax is more readable than such a comment. And comments are structureless, and much easier to mess up.

@ianlancetaylor

Why does this concept fall on the language side rather than the vet side?

You could apply that same question to any feature outside the Turing-complete core, and arguably we don't want Go to be a "Turing tarpit". On the other hand, we do have examples of languages that have shoved significant subsets of the actual language off into a generic "extension" syntax. (For example, "attributes" in Rust, C++, and GNU C.)

The main reason to put features in extensions or attributes instead of in a core language is to preserve syntax compatibility, including compatibility with tools that are not aware of the new feature. (Whether "compatibility with tools" actually works in practice depends strongly on what the feature actually does.)

In the context of Go, it seems like the main reason to put features in vet is to implement changes which would not preserve Go 1 compatibility if applied to the language itself. I don't see that as an issue here.

One reason not to put features in vet is if they need to be propagated during compilation. For example, if I write:

switch x := somepkg.SomeFunc().(type) {
…
}

will I get the proper warnings for types that aren't in the sum, across package boundaries? It's not obvious to me that vet can do that deep a transitive analysis, so perhaps that's a reason it would need to go into the core language.

@dr2chase In general, of course, you are correct, but are you correct for this specific example? The code is completely comprehensible without knowing what the magic comment means. The magic comment doesn't change what the code does in any way. The error messages from vet should be clear.

@bcmills

Why does this concept fall on the language side rather than the vet side?

You could apply that same question to any feature outside the Turing-complete core....

I don't agree. If the feature under discussion affects the compiled code, then there is an automatic argument in favor of it. In this case, the feature apparently does not affect the compiled code.

(And, yes, vet can parse the source of imported packages.)

I'm not trying to claim that my argument about vet is conclusive. But every language change starts from a negative position: a simple language is very very desirable, and a significant new feature like this inevitably makes the language more complex. You need strong arguments in favor of a language change. And from my perspective those strong arguments have not yet appeared. After all, we've thought about this issue for a long time, and it is a FAQ (https://golang.org/doc/faq#variant_types).

@ianlancetaylor

In this case, the feature apparently does not affect the compiled code.

I think that depends on the specific details? The "zero-value of the sum is the zero-value of the first type" behavior that @jimmyfrasche mentioned above (https://github.com/golang/go/issues/19412#issuecomment-289319916) certainly would.

@urandom I was writing up a long explanation of why interface and union types didn't mix without explicit type constructors, but then I realized there was a kind of sensible way to do that, so:

Quick and dirty counterproposal to my counterproposal. (Anything not explicitly mentioned is the same as my previous proposal). I'm not sure one proposal is better than the other, but this one allows interfaces and is all around more explicit:

The union has explicit "field names" hereafter called "tag names":

union { //or whatever
  None, invalid struct{} //None is zero value
  Good, Bad int
  Err error //okay because it's explicitly named
}

There is still no embedding. It's always an error to have a type without a tag name.

Union values have a dynamic tag rather than a dynamic type.

Literal value creation: U{v} is only valid if completely unambiguous, otherwise it has to be U{Tag: v}.

Convertibility and assignment compatibility take tag names into account as well.

Assignment to a union is not magic. It always means assigning a compatible union value. To set the stored value, the desired tag name must be explicitly used: v.Good = 1 sets the dynamic tag to Good and the stored value to 1.

Accessing the stored value uses a tag assertion rather than a type assertion:

g := v.[Tag] //may panic
g, ok := v.[Tag] //no panic but could return zero-value, false

v.Tag is an error on the rhs since it's ambiguous.

Tag switches are like type switches, written switch v.[type], except that the cases are the tags of the union.

Type assertions hold with respect to the type of the dynamic tag. Type switches work similarly.

Given values a, b of some union type, a == b if their dynamic tags are the same and the stored value is the same.

Checking whether the stored value is some particular value requires a tag assertion.

If a tag name is unexported it can only be set and accessed in the package that defines the union. This means that a tag switch of a union with mixed exported and unexported tags can never be exhaustive outside the defining package without a default case. If all the tags are unexported it's a black box.

Reflection needs to handle the tag names as well.

e: Clarification for nested unions. Given

type U union {
  A union {
    A1 T1
    A2 T2
  }
  B union {
    B1 T3
    B2 T4
  }
}
var u U

The value of u has the dynamic tag A, and its stored value is the anonymous union whose dynamic tag is A1, whose stored value in turn is the zero value of T1.

u.B.B2 = returnsSomeT4()

is all that is necessary to switch u from the zero value, even though it moves from one of the nested unions to the other as it's all stored in one memory location. But

v := u.[A].[A2]

has two chances to panic, since it tag-asserts on two union values and the two-valued form of tag assertion isn't available without splitting across multiple lines. Nested tag switches would be cleaner, in this case.

edit2: Clarification on type asserts.

Given

type U union {
  Exported, unexported int
}
var u U

a type assert like u.(int) is entirely reasonable. Within the defining package, that would always hold. However, if u is outside the defining package, u.(int) would panic when the dynamic tag is unexported, to avoid leaking implementation details. Similarly for assertions to an interface type.

@ianlancetaylor Here are a few examples of how this feature would help:

  1. At the heart of some packages (go/ast for example) is one or more large sum types. It's hard to navigate these packages without understanding those types. More confusingly, sometimes a sum type is represented by an interface with methods (e.g. go/ast.Node), other times by the empty interface (e.g. go/ast.Object.Decl).

  2. Compiling the protobuf oneof feature to Go results in an unexported interface type whose only purpose is to make sure assignment to the oneof field is type-safe. That in turn requires generating a type for each branch of the oneof. Type literals for the end product are hard to read and write:

    &sppb.Mutation{
        Operation: &sppb.Mutation_Delete_{
            Delete: &sppb.Mutation_Delete{
                Table:  m.table,
                KeySet: keySetProto,
            },
        },
    }
    

    Some (though not all) oneofs could be expressed by sum types.

  3. Sometimes a "maybe" type is exactly what one needs. For example, many Google API resource update operations permit a subset of the resource's fields to be changed. One natural way to express this in Go is by a variant of the resource struct with a "maybe" type for each field. For instance, the Google Cloud Storage ObjectAttrs resource looks like

    type ObjectAttrs struct {
       ContentType string
       ...
    }
    

    To support partial updates, the package also defines

    type ObjectAttrsToUpdate struct {
       ContentType optional.String
       ...
    }
    

    Where optional.String looks like this (godoc):

    // String is either a string or nil.
    type String interface{}
    

    This is both hard to explain and type-unsafe, but it turns out to be convenient in practice, because an ObjectAttrsToUpdate literal looks exactly like an ObjectAttrs literal, while encoding presence. I wish we could have written

    type ObjectAttrsToUpdate struct {
       ContentType string | nil
       ...
    }
    
  4. Many functions return (T, error) with xor semantics (T is meaningful iff error is nil). Writing the return type as T | error would clarify the semantics, increase safety, and provide more opportunities for composition. Even if we can't (for compatibility reasons) or don't want to change a function's return value, the sum type is still useful for carrying around that value, like writing it to a channel.

A go vet annotation would admittedly help many of these cases, but not the ones where an anonymous type makes sense. I think if we had sum types we'd see a lot of

chan *Response | error

That type is short enough to write out multiple times.
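What people write today for that is usually a named wrapper struct (Response here is a placeholder type):

type result struct { // the hand-rolled stand-in for chan *Response | error
    resp *Response
    err  error
}

var results = make(chan result)

The sum type would make the xor semantics explicit instead of leaving them as a convention on the struct.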

@ianlancetaylor this probably isn't a great start, but here's everything you can do with unions that you can already do in Go1, because I figured it was only fair to acknowledge and summarize those arguments:

(Using my latest proposal with tags for the syntax/semantics below. Also assuming the emitted code is basically like the C code I posted much earlier in the thread.)

Sum types overlap with iota, pointers, and interfaces.

iota

These two types are roughly equivalent:

type Stoplight union {
  Green, Yellow, Red struct {}
}

func (s Stoplight) String() string {
  switch s.[type] {
  case Green: return "green" //etc
  }
}

and

type Stoplight int

const (
  Green Stoplight = iota
  Yellow
  Red
)

func (s Stoplight) String() string {
  switch s {
  case Green: return "green" //etc
  }
}

The compiler would likely emit exactly the same code for both.

In the union version, the int is turned into a hidden implementation detail. With the iota version you can ask what Yellow/Red is, or set a Stoplight value to -42, but not with the union version—those are all compiler errors, and invariants that can be taken into account during optimization. Similarly, you can write a (value) switch that fails to account for Yellow lights, but with a tag switch you'd need a default case to make that explicit.
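To make the contrast concrete, all of this compiles today with the iota version (and would be rejected with the union version):

var s Stoplight = -42 // no error: Stoplight is just an int
s = Red + 1           // no error: arithmetic on "enum" values

switch s {
case Green:
    // a value switch silently fails to handle Yellow and Red
}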

Of course, there are things you can do with iota that you can't do with union types.

pointers

These two types are roughly equivalent

type MaybeInt64 union {
  None struct{}
  Int64 int64
}

and

type MaybeInt64 *int64

The pointer version is more compact. The union version would need an extra bit (which in turn would likely be word sized) to store the dynamic tag, so the size of the value would likely be the same as https://golang.org/pkg/database/sql/#NullInt64

The union version more clearly documents the intent.

Of course, there are things you can do with pointers that you can't do with union types.

interfaces

These two types are roughly equivalent

type AB union {
  A A
  B B
}

and

type AB interface {
  secret()
}
func (A) secret() {}
func (B) secret() {}

The union version can't be circumvented with embedding. A and B don't need methods in common—they could in fact be primitive types or have entirely disjoint method sets, like the json.Token example @urandom posted.

It's really easy to see what you can put in an AB union versus an AB interface: the definition is the documentation (I have had to read the go/ast source multiple times to figure out what something is).

The AB union can never be nil and can be given methods outside the intersection of its constituents (this could be simulated by embedding the interface in a struct but then construction becomes more delicate and error prone).

Of course, there are things you can do with interfaces that you can't do with union types.

Summary

Maybe that overlap is too much overlap.

In each case the primary benefit of the union version is indeed stricter compile-time checking. What you can't do is more important than what you can. For the compiler, that translates into stronger invariants it can use to optimize the code. For the programmer, that translates into another thing you can let the compiler worry about—it'll just tell you if you're wrong. In the interface version, at the very least, there are important documentation benefits.

Clunky versions of the iota and pointer examples can be constructed using the "interface with an unexported method" strategy. For that matter, though, structs could be simulated with a map[string]interface{} and (nonempty) interfaces with func types and method values. No one would because it's harder and less safe.

All of those features add something to the language but their absence could be worked around (painfully, and under protest).

So I'm assuming the bar isn't to demonstrate a program that cannot even be approximated in Go, but rather to demonstrate a program that is much more easily and cleanly written in Go with unions than without. So what remains to be shown is that.

@jimmyfrasche

I see no reason why the union type should have named fields. Names are only useful if you want to distinguish between different fields of the same type. However, a union must never have multiple fields of the same type, as that is quite meaningless. Thus, having names is just redundant, and leads to confusion and more typing.

In essence, your union type should look something like:

union {
    struct{}
    int
    error
}

The types themselves will provide the unique identifiers that can be used to assign to a union, quite similar to the way embedded types in structs are used as identifiers.

However, in order for explicit assignments to work, one cannot create a union type that lists an unnamed composite type as a member, since the syntax wouldn't allow such an assignment expression, e.g. v.struct{} = struct{}{}.

Thus, types like raw structs, unions, and funcs would have to be named beforehand in order to be part of a union and become assignable. With this in mind, a nested union will not be anything special, as the inner union will just be another member type.

Now, I'm not sure which syntax would be better.

[union|sum|pick|oneof] {
    type1
    package1.type2
    ....
}

The above seems more go-like, but is a bit verbose for such a type.

On the other hand, type1 | package1.type2 may not look like your usual go type, however it gains the benefit of using the '|' symbol, which is predominantly recognised as an OR. And it reduces verbosity without being cryptic.

@urandom if you don't have "tag names" but allow interfaces, the sums collapse into an interface{} with extra checks. They stop being sum types, since you can put one thing in but get it out multiple ways. The tag names let them be sum types and hold interfaces without ambiguity.

The tag names repair a lot more than just the interface{} problem, though. They make the type much less magical and let everything be gloriously explicit without having to invent a bunch of types just to differentiate. You can have explicit assignment and type literals, as you point out.

That you can give one type more than one tag is a feature. Consider a type to measure how many successes or failures have happened in a row (1 success cancels out N failures and vice versa)

type Counter union {
  Successes, Failures uint 
}

without the tag names you'd need

type (
  Successes uint
  Failures uint
  Counter Successes | Failures
)

and assignment would look like c = Successes(1) instead of c.Successes = 1. You don't gain much.
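
A hedged usage sketch, borrowing the tag-switch syntax that appears later in this thread (record is hypothetical):

//record one result: a success wipes out any run of failures, and vice versa
func record(c Counter, succeeded bool) Counter {
  switch n := c.[type] {
  case Successes:
    if succeeded {
      c.Successes = n + 1
    } else {
      c.Failures = 1
    }
  case Failures:
    if succeeded {
      c.Successes = 1
    } else {
      c.Failures = n + 1
    }
  }
  return c
}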

Another example is a type that represents local or remote failure. With tag names this is easy to model:

type Failure union {
  Local, Remote error
}

The provenance of the error can be specified with its tag name, regardless of what the actual error is. Without tag names you'd need type Local struct { error } and the same for Remote, even if you allow interfaces directly in the sum.

The tag names sort of create special types, neither aliases nor named types, locally in the union. Having multiple "tags" with identical types isn't unique to my proposal: it's what every functional language (that I know of) does.

The ability to create unexported tags for exported types and vice versa is an interesting twist, too.

Also having separate tag and type assertions allows for some interesting code, like being able to promote a shared method to the union with a one-line wrapper.

It seems like it solves more problems than it causes and makes everything fit together much nicer. I honestly wasn't so sure when I wrote it up, but I'm becoming increasingly convinced that it's the only way to resolve all the issues with integrating sums into Go.

To expand on that somewhat, the motivating example for me was @rogpeppe's io.Reader | io.ReadCloser. Allowing interfaces without tags, this is the same type as io.Reader.

You can put a ReadCloser in and pull it out as a Reader. You lose the A | B means A or B property of sum types.

If you need to be specific about sometimes handling an io.ReadCloser as an io.Reader you need to create wrapper structs as @bcmills pointed out, type Reader struct { io.Reader } etc. and have the type be Reader | ReadCloser.

Even if you limit sums to interfaces with disjoint method sets, you still have this problem because one type can implement more than one of those interfaces. You lose the explicitness of sum types: they're not "A or B": they're "A or B or sometimes whichever you feel like".

Worse, if those types are from other packages they can suddenly behave differently after an update even if you were very careful to construct your program so that A is never treated the same as B.

Originally I explored disallowing interfaces to solve the problem. No one was happy with that! But it also didn't get rid of problems like a = b meaning different things depending on the types of a and b, which I'm not comfortable with. There also had to be a lot of rules about what type is chosen in the pick when type assignability comes into play. It's a lot of magic.

You add tags and that all goes away.

With union { R io.Reader; RC io.ReadCloser } you can explicitly say that you want this ReadCloser to be considered as a Reader when that's what makes sense. No wrapper types needed. It's implicit in the definition. Regardless of the type stored under the tag, it's either one tag or the other.

The downside is that, if you get an io.Reader from somewhere else, say a chan receive or func call, and it might be an io.ReadCloser and you need to assign it to the proper tag you have to type assert on io.ReadCloser and test. But that makes the intent of the program much clearer—exactly what you mean is in the code.
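
A hedged sketch of that dance (union syntax as above; getReader is a hypothetical source of an io.Reader):

var u union {
  R  io.Reader
  RC io.ReadCloser
}
r := getReader()
if rc, ok := r.(io.ReadCloser); ok {
  u.RC = rc //explicitly choose to treat it as a closer
} else {
  u.R = r
}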

Also because the tag assertions are different from type assertions, if you really don't care and just want an io.Reader regardless, you can use a type assertion to pull that out, regardless of tag.

This is a best-effort transliteration of a toy example into Go without unions/sums/etc. It's probably not the best example but it's the one I used to see what this would look like.

It shows the semantics in a more operational fashion, which will likely be easier to understand than some terse bullet points in a proposal.

There is quite a bit of boilerplate in the transliteration so I have generally only written the first instance of several methods with a note about the repetition.

In Go with union proposal:

type fail union { //zero value: (Local, nil)
  Local, Remote error
}

func (f fail) Error() string {
  //Could panic if local/remote nil, but assuming
  //it will be constructed purposefully
  return f.(error).Error()
}

type U union { //zero value: (A, "")
  A, B, C string
  D, E    int
  F       fail
}

//in a different package

func create() pkg.U {
  return pkg.U{D: 7}
}

func process(u pkg.U) {
  switch u := u.[type] {
  case A:
    handleA(u) //undefined here, just doing something with unboxed value
  case B:
    handleB(u)
  case C:
    handleC(u)
  case D:
    handleD(u)
  case E:
    handleE(u)
  case F:
    switch u := u.[type] {
    case Local:
      log.Fatal(u)
    case Remote:
      log.Printf("remote error %s", u)
      retry()
    } 
  }
}

Transliterated to current Go:

(notes are included about the differences between the transliteration and the above)

const ( //simulates tags, namespaced so other packages can see them without overlap
  Fail_Local = iota
  Fail_Remote
)

//since there are only two tags with a single type this can
//be represented precisely and safely
//the error method on the full version of fail can be
//put more succinctly with type embedding in this case

type fail struct { //zero value (Fail_Local, nil) :)
  remote bool
  error
}

// e, ok := f.[Local]
func (f *fail) TagAssertLocal2() (error, bool) { //same for TagAssertRemote2
  if f.remote {
    return nil, false
  }
  return f.error, true
}

// e := f.[Local]
func (f *fail) TagAssertLocal() error { //same for TagAssertRemote
  if f.remote {
    panic("invalid tag assert")
  }
  return f.error
}

// f.Local = err
func (f *fail) SetLocal(err error) { //same for SetRemote
  f.remote = false
  f.error = err
}

// simulate tag switch
func (f *fail) TagSwitch() int {
  if f.remote {
    return Fail_Remote
  }
  return Fail_Local
}

// f.(someType) needs to be written as f.TypeAssert().(someType)
func (f *fail) TypeAssert() interface{} {
  return f.error
}

const (
  U_A = iota
  U_B
  // ...
  U_F
)

type U struct { //zero value (U_A, "", 0, fail{}) :(
  kind int //more than two types, need an int
  s string //these would all occupy the same space
  i int
  f fail
}

//s, ok := u.[A]
func (u *U) TagAssertA2() (string, bool) { //similar for B, etc.
  if u.kind == U_A {
    return u.s, true
  }
  return "", false
}

//s := u.[A]
func (u *U) TagAssertA() string { //similar for B, etc.
  if u.kind != U_A {
    panic("invalid tag assert")
  }
  return u.s
}

// u.A = s
func (u *U) SetA(s string) { //similar for B, etc.
  //if there were any pointers or reference types
  //in the union, they'd have to be nil'd out here,
  //since the space isn't shared
  u.kind = U_A
  u.s = s
}

// special case of u.F.Local = err
func (u *U) SetF_Local(err error) { //same for SetF_Remote
  u.kind = U_F
  u.f.SetLocal(err)
}

func (u *U) TagSwitch() int {
  return u.kind
}

func (u *U) TypeAssert() interface{} {
  switch u.kind {
  case U_A, U_B, U_C:
    return u.s
  case U_D, U_E:
    return u.i
  }
  return u.f
}

//in a different package

func create() pkg.U {
  var u pkg.U
  u.SetD(7)
  return u
}

func process(u pkg.U) {
  switch u.TagSwitch() {
  case U_A:
    handleA(u.TagAssertA())
  case U_B:
    handleB(u.TagAssertB())
  case U_C:
    handleC(u.TagAssertC())
  case U_D:
    handleD(u.TagAssertD())
  case U_E:
    handleE(u.TagAssertE())
  case U_F:
    switch u := u.TagAssertF(); u.TagSwitch() {
    case Fail_Local:
      log.Fatal(u.TagAssertLocal())
    case Fail_Remote:
      log.Printf("remote error %s", u.TagAssertRemote())
    }
  }
}

@jimmyfrasche

Since the union contains tags that may have the same type, wouldn't the following syntax be better suited:

func process(u pkg.U) {
  switch v := u {
  case A:
    handleA(v) //undefined here, just doing something with unboxed value
  case B:
    handleB(v)
  case C:
    handleC(v)
  case D:
    handleD(v)
  case E:
    handleE(v)
  case F:
    switch w := v {
    case Local:
      log.Fatal(w)
    case Remote:
      log.Printf("remote error %s", w)
      retry()
    } 
  }
}

The way I see it, when used with a switch, a union is quite similar to types such as int, or string. The main difference being that there are only finite 'values' that can be assigned to it, as opposed to the former types, and the switch itself is exhaustive. Thus, in this case I don't really see the need for a special syntax, reducing the mental work of the developer.

Also, under this proposal, would such code be valid:

type Foo union {
    // Completely different types, no ambiguity
    A string
    B int
}

func Bar(f Foo) {
    switch v := f {
        ....
    }
}

....

func main() {
    // No need for Bar(Foo{A: "hello world"})
    Bar("hello world")
    Bar(1)
}

@urandom I chose a syntax to reflect the semantics using analogies to existing Go syntax whenever possible.

With interface types you can do

var i someInterface = someValue //where someValue implements someInterface.
var j someInterface = i //this assignment is different from the last one.

That's fine and unambiguous since it doesn't matter what the type of someValue is as long as the contract is satisfied.

When you introduce tags† on unions it can sometimes be ambiguous. Magic assignment would only be valid in certain cases. Special casing it only gets you around having to be explicit sometimes.

I don't see a point in being able to sometimes skip a step, especially when a code change can easily invalidate that special case and then you have to go back and update all the code anyway. To use your Foo/Bar example, if C int is added to Foo then Bar(1) has to change but not Bar("hello world"). It complicates everything to save a few keystrokes in situations that may not be that common, and makes the concepts harder to understand because sometimes they look like this and sometimes they look like that—just consult this handy flowchart to see which applies to you!

† I wish I had a better name for those. There are already struct tags. I would have called them labels but Go has those, too. Calling them fields seems both more appropriate and the most confusing. If anyone wants to bikeshed this one could really use a fresh coat.

In a sense, tagged unions are more similar to a struct than an interface. They're a special kind of struct that can only have one field set at a time. Seen in that light, your Foo/Bar example would be like saying this:

type Foo struct {
  A string
  B int
}

func Bar(f Foo) {...}

func main() {
  Bar("hello world") //same as Bar(Foo{A: "hello world", B: 0})
  Bar(1) //same as Bar(Foo{A: "", B: 1})
}

While it is unambiguous in this case, I don't think it's a good idea.

Also in the proposal Bar(Foo{1}) is allowed when it is unambiguous if you really want to save keystrokes. You can also have pointers to unions so the composite literal syntax is still necessary for &Foo{"hello world"}.

That said, unions do have a similarity to interfaces in that they have a dynamic tag of which "field" is currently set.

The switch v := u.[type] {... nicely mirrors the switch v := i.(type) {... for interfaces while still allowing type switches and assertions directly on union values. Maybe it should be u.[union] to make it easier to spot, but either way the syntax isn't that heavy and it's clear what's meant.

You could make the same argument that the .(type) is unnecessary, but when you see that you always know exactly what's happening and that fully justifies it, in my opinion.

That was my reasoning behind these choices.

@jimmyfrasche
The switch syntax seems a bit counter-intuitive to me, even after your explanations. With an interface, switch v := i.(type) {... switches through possible types, as listed by the switch cases, and indicated by .(type).
However, with a union, a switch is not switching through possible types, but values. Each case represents a different possible value, where values may in fact share the same type. This is more similar to string and int switches, where cases also list values, and their syntax is a simple switch v := u {... . From that, it seems more natural to me that switching through the values of a union would be switch v := u { ..., since the cases are similar to, but more restrictive than, the cases for ints and strings.

@urandom that's a very good point about the syntax. The truth is it's a holdover from my previous proposal without labels so it was the type then. I just blindly copied it over without thinking. Thanks for pointing it out.

switch u {... would work but the problem with switch v := u {... is it looks too much like switch v := f(); v {... (which would make error reporting more difficult—not clear which was intended).

If the union keyword were renamed to pick as has been suggested by @as then the tag switch could be written as switch u.[pick] {... or switch v := u.[pick] {... which keeps the symmetry with a type switch but loses the confusion and looks pretty nice.

Even if the implementation is switching on an int there's still implicit destructuring of the pick into dynamic tag and stored value, which I think should be explicit, regardless of grammatical rules.

you know, just calling the tags fields and having it be field assert and field switch makes a great deal of sense.

edit: that would make using reflect with picks awkward, though

[Sorry for delayed response - I was away on vacation]

@ianlancetaylor wrote:

Something I don't see in your proposals is why we should do this. There is a clear disadvantage to adding a new kind of type: it's a new concept that everybody who learns Go will have to learn. What is the compensating advantage? In particular, what does the new kind of type give us that we don't get from interface types?

There are two main advantages that I see. The first is a language advantage; the second is a performance advantage.

  • when processing messages, particularly when read from a concurrent process, it is very useful to be able to know the complete set of messages that can be received, because each message can come with associated protocol requirements. For a given protocol, the number of possible message types might be very small, but when we use an open-ended interface to represent the messages, that invariant is not clear. Often people will use a different channel for each message type to avoid this, but that comes with its own costs.

  • there are times when there is a small number of known possible message types, none of which contain pointers. If we use an open-ended interface to represent them we need to incur an allocation to make interface values. Using a type that restricts the possible message types means that can be avoided and hence relieve GC pressure and increase cache locality.

A particular pain for me that sum types could solve is godoc. Take ast.Spec for example: https://golang.org/pkg/go/ast/#Spec

Many packages manually list the possible underlying types of a named interface type, so that a user can quickly get an idea without having to look at the code or rely on name suffixes or prefixes.

If the language already knows all the possible values, this could be automated in godoc much like enum types with iotas. They could also actually link to the types, as opposed to being just plaintext.

Edit: another example: https://github.com/mvdan/sh/commit/ebbfda50dfe167bee741460a4491ffec1006bdef

@mvdan that's an excellent, practical point for improving the story in Go1 without any language changes. Can you file a separate issue for that and reference this one?

Sorry, are you referring to just links to other names within the godoc page, but still listing them manually?

Sorry, should have been clearer.

I meant a feature request for automatically handling types that implement interfaces defined in the current package in godoc.

(I believe there's a feature request somewhere for linking manually listed names, but I don't have the time to hunt it down at the moment).

I don't wish to take over this (already very long) thread, so I've created a separate issue - see above.

@Merovius I'm replying to https://github.com/golang/go/issues/19814#issuecomment-298833986 in this issue since the AST stuff applies more to sum types than enums. Apologies for pulling you into a different issue.

First, I'd like to reiterate that I'm not sure if sum types belong in Go. I've yet to convince myself that they definitely do not belong. I'm working under the assumption that they do in order to explore the idea and see whether they fit. I'm willing to be convinced either way, though.

Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct. But this is the correct and desired behavior.

Let's consider the example of an AST, implemented with a Node interface, that adds a new kind of node. Let's say the AST is defined in an external project and you're importing that in a package in your project, which walks the AST.

There are a number of cases:

  1. Your code expects to walk every node:
    1.1. You don't have a default statement, your code is silently incorrect
    1.2. You have a default statement with a panic, your code fails at runtime instead of compile time (tests don't help because they only know about the nodes that existed when you wrote the tests)
  2. Your code only inspects a subset of node types:
    2.1. This new kind of node would not have been in the subset anyway
    2.1.1. As long as this new node never contains any of the nodes that you are interested in, everything works out
    2.1.2. Otherwise, you're in the same situation as if your code expected to walk every node
    2.2. This new kind of node would have been in the subset you're interested in, had you known about it.

With an interface-based AST only case 2.1.1 works correctly. This is coincidence as much as anything. Gradual code repair doesn't work. The AST has to bump its version and your code needs to bump its version.

An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked. That either means an in source comment or some sort of config file in your repo. If it's an in source comment, since by definition the AST is defined in a separate project you are at the mercy of that project to tag the interface for exhaustiveness checking. This only works at scale if there is a single exhaustiveness linter that the entire community agrees upon and always uses.

With a sum-based AST, you still need to use versioning. The only difference in this case is that the exhaustiveness linter is built into the compiler.

Neither helps with 2.2, but what could?

There's a simpler, AST-adjacent, case where sum types would be useful: tokens. Say you're writing a lexer for a simple calculator. There are tokens like * with no associated value, tokens like Var that carry a string name, and tokens like Val that hold a float64.

You could implement this with interfaces but it would be tiresome. You'd probably do something like this, though:

package token
type Type int
const (
  Times Type = iota
  // ...
  Var
  Val
)
type Value struct {
  Type
  Name string // only valid if Type == Var
  Number float64 // only valid if Type == Val
}

An exhaustiveness linter on iota-based enums could ensure an illegal Type is never used, but it wouldn't work too well against someone assigning to Name when Type == Times or using Number when Type == Var. As the number and kind of tokens grow it only gets worse. Really the best you could do here is add a method, Valid() error, that checks all the constraints and a bunch of documentation explaining when you can do what.
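
For concreteness, a minimal sketch of that Valid method (assuming the errors package, and treating zero values as "unset", which is itself an approximation since "" and 0 are legal values):

func (v Value) Valid() error {
  if v.Name != "" && v.Type != Var {
    return errors.New("token: Name set on a non-Var token")
  }
  if v.Number != 0 && v.Type != Val {
    return errors.New("token: Number set on a non-Val token")
  }
  return nil
}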

A sum type easily encodes all of those constraints and the definition would be all the documentation needed. Adding a new kind of token would be a breaking change but everything I said about AST's still applies here.

I think more tooling is necessary. I'm just not convinced it's sufficient.

@jimmyfrasche

Second, you mentioned gradual code repair in your comment. Adding a new term to a sum type is by definition a breaking change, on par with adding a new method to an interface or removing a field from a struct.

No, it is not on par. You can do both of those changes in a gradual repair model (for interfaces: 1. Add new method to all implementations, 2. Add method to interface. For struct fields: 1. Remove all usages of field, 2. Remove field). Adding a case in a sum type cannot work in a gradual repair model; if you add it to the lib first, it would break all users, as they don't check exhaustively anymore, but you can't add it to the users first, because the new case doesn't exist yet. Same goes for removal.

It's not about whether or not it's a breaking change, it's about whether it's a breaking change that can be orchestrated with minimal interruption.

But this is the correct and desired behavior.

Exactly. Sum types, by their very definition and every reason people want them, are fundamentally incompatible with the idea of gradual code repair.

With interface-based AST only case 2.1.1 works correctly.

No, it also works correctly in case 1.2 (failing at runtime for unrecognized grammar is perfectly fine. I probably wouldn't want to panic, though, but just return an error) and also in a whole lot of cases of 2.1. The rest is a fundamental issue with upgrading software; if you add a new feature to a library, users of your lib need to change code to make use of it. It doesn't mean your software is incorrect until it does, though.

The AST hast to bump its version and your code needs to bump its version.

I do not see how this follows from what you are saying, at all. To me, saying "this new grammar won't work with all tools yet, but it is available to the compiler" is fine. Just as "if you run this tool on this new grammar, it will fail at runtime" is fine. At its very worst, this only adds another step to the gradual repair process: a) Add the new node to the AST package and parser. b) Fix tools using the AST package to take advantage of the new node. c) Update code to use the new node. Yes, the new node will only become usable, after a) and b) are done; but in every step of this process, without any breakages, everything will still compile and work correctly.

I'm not saying you'll be automatically fine in a world of gradual code repair and no exhaustive compiler checks. It will still require careful planning and execution, you will still probably break unmaintained reverse dependencies and there might still be changes you might not be able to do at all (though I can't think of any). But at least a) there is a gradual upgrade path and b) the decision of whether this should break your tool at runtime, or not, is up to the author of the tool. They can decide what to do in a case that is unknown.

An exhaustiveness linter would help but since the linter can't examine all interface types it needs to be told in some manner that a particular interface needs to be checked.

Why? I would argue that it's fine for switchlint™ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.

I'd probably be more on board with enforcing "every type-switch should require a default case, even if it's empty" in the compiler, than with actual sum types. It would both enable and force people to make the decision of what their code should do when faced with an unknown choice.

You could implement this with interfaces but it would be tiresome.

shrug it's a one-time effort in a case that very rarely comes up. Seems fine to me.

And FWIW, I'm currently only arguing against the exhaustive checking notion of sum types. I don't yet have any strong opinions about the added convenience of saying "any of these structurally defined types".

@Merovius I'm going to have to think further on your excellent points about gradual code repair. In the meantime:

exhaustiveness checks

I'm currently only arguing against the exhaustive checking notion of sum types.

You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.
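
In the hypothetical syntax used earlier, the opt-out is just:

switch u := u.[type] {
case A:
  handleA(u)
// ...
default:
  // explicit opt-out: members added later land here
}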

I would argue that it's fine for switchlint™ to complain about any type-switch without a default-case; after all, you'd expect the code to work with any interface definition, so not having code in place to work with unknown implementations is likely a problem anyway. Yes, there are exceptions to this rule, but exceptions can already be manually ignored.

That's an interesting idea. While it would hit sum types simulated with interface and enums simulated with const/iota, it doesn't tell you that you missed a known case, just that you didn't handle the unknown case. Regardless, it seems noisy. Consider:

switch {
case n < 0:
case n == 0:
case n > 0:
}

That's exhaustive if n is integral (for floats it's missing n != n) but without encoding a lot of information about types it's probably easier to just flag that as missing default. For something like:

switch {
case p[0](a, b):
case p[1](a, b):
//...
case p[N](a, b):
}

even if the p[i] form an equivalence relation on the types of a and b it's not going to be able to prove that, so it must flag the switch as missing a default case, which means a way to silence it with a manifest, an annotation in the source, a wrapper script to egrep -v out the whitelisted, or an unnecessary default on the switch which falsely implies that the p[i] are not exhaustive.

At any rate that would be trivial linter to implement if the "always complain about no default in all circumstances" route is taken. It would be interesting to do so and run it on go-corpus and see how noisy and/or useful it is in practice.

tokens

Alternate token implementations:

//Type defined as before
type SimpleToken struct { Type }
type StringToken struct { Type; Value string }
type NumberToken struct { Type; Value float64 }
type Interface interface {
  //some method common to all these types, maybe just token() Interface
}

That gets rid of the possibility of defining an illegal token state where something has both a string and a number value but doesn't disallow creating a StringToken with a type that should be a SimpleToken or vice versa.

To do that with interfaces you need to define one type per token (type Plus struct{}, type Mul struct{}, etc.) and most of the definitions are exactly the same except for the type name. One time effort or not that's a lot of work (though well suited for code generation in this case).

I suppose you could have a "hierarchy" of token interfaces to partition the kinds of tokens based on the allowable values: (Assuming in this example that there is more than one kind of token that can contain a number or string, etc.)

type SimpleToken int //implements token.Interface
const (
  Plus SimpleToken = iota
  // ...
)
type NumericToken interface {
  Interface
  Value() float64
  nt() NumericToken
}
type IntToken struct { /* ... */ } //implements NumericToken; likewise a FloatToken
type StringToken interface { // for Var and Func and Const, etc.
  Interface
  Value() string
  st() StringToken
}

Regardless, it means each token requires a pointer dereference to access its value, unlike the struct or sum type, which only require pointers when strings are involved. So with appropriate linters and improvements to godoc, the big win for sum types in this case is minimizing allocations while disallowing illegal states, plus reducing the amount of typing (in the keyboard sense), which doesn't seem unimportant.

You can explicitly opt out of exhaustiveness checks with a default case (well, effectively: the default makes it exhaustive by adding a case that covers "anything else, whatever that may be"). You still have a choice, but you must make it explicitly.

So, it seems like either way, both of us will have the choice to opt in or out of exhaustive checking :)

it doesn't tell you that you missed a known case, just that you didn't handle the unknown case.

Effectively, I believe, the compiler already does a whole-program analysis to determine what concrete types are used in what interfaces. I would at least expect it, at least for non-interface type assertions (i.e. type assertions that assert to a concrete type rather than an interface type), to generate the function tables used in interfaces at compile time.
But, honestly, this is argued from first principles, I have no idea about the actual implementation.

In any case, it should be pretty easy, to a) list any concrete type defined in a whole program and b) for any type-switch, filter them on whether they implement that interface. If you use something like this, you'd end up with a reliable list. I think.
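
A rough sketch of step b) with go/types (assuming the program is already loaded and type-checked; defs, holding its named types, is a placeholder for however you collect them):

import "go/types"

//implementers filters defs down to the types that could appear in a
//type-switch over iface.
func implementers(defs []types.Type, iface *types.Interface) []types.Type {
  var impls []types.Type
  for _, t := range defs {
    //check both the value and pointer method sets
    if types.Implements(t, iface) || types.Implements(types.NewPointer(t), iface) {
      impls = append(impls, t)
    }
  }
  return impls
}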

I'm not 100% convinced that a tool can be written that is as reliable as actually explicitly stating the options, but I'm convinced that you could cover 90% of the cases and you could definitely write a tool that does this outside of the compiler, given the correct annotations (i.e. make sum-types a pragma-like comment, not an actual type). Not a great solution, admittedly.

Regardless, it seems noisy. Consider:

I think this is being unfair. The cases you are mentioning have nothing at all to do with sum-types. If I were to write such a tool, I'd restrict it to type-switches and switches with an expression, as those seem to be the way sum types would be handled too.

Alternate token implementations:

Why not a marker-method? You don't need a type-field, you get that for free from the interface representation. If you are concerned about repeating the marker-method over and over again: define an unexported struct{}, give it that marker method and embed it in each implementation, for zero extra cost and less typing per option than your method.
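
That is, something like this sketch of the embedded-marker pattern just described, reusing Type from the earlier token example:

type marker struct{}

func (marker) isToken() {}

type Interface interface{ isToken() }

type SimpleToken struct {
  marker
  Type
}

type StringToken struct {
  marker
  Type
  Value string
}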

Regardless, it means each token requires a pointer deference to access its value

Yes. It's a real cost, but I don't think it outweighs basically any other argument.

I think this is being unfair.

That's true.

I wrote a quick-and-dirty version and ran it on the stdlib. Checking any switch statement had 1956 hits, restricting it to skip the switch { form reduced that count to 1677. I haven't inspected any of those locations to see if the result is meaningful.

https://github.com/jimmyfrasche/switchlint

There's certainly a great deal of room for improvement. It's not terribly sophisticated. Pull requests welcome.

(I'll reply to the rest later)

edit: wrong markup format

I think this is a (quite biased) summary of everything so far (and narcissistically assuming my second proposal)

Pros

  • concise, easy to write a number of constraints succinctly in a self-documenting manner
  • better control of allocations
  • easier to optimize (all possibilities known to compiler)
  • exhaustive checking (when desired, can opt out)

Cons

  • any change to the members of a sum type is a breaking change, disallowing gradual code repair unless all external packages opt out of exhaustiveness checks
  • one more thing in the language to learn, some conceptual overlap with existing features
  • garbage collector has to know which members are pointers
  • awkward for sums of the form 1 + 1 + ⋯ + 1

Alternatives

  • iota "enum" for sums of the form 1 + 1 + ⋯ + 1
  • interfaces with an unexported tag method for more complicated sums (possibly generated)
  • or struct with an iota enum and extra-linguistic rules about which fields are set dependent on the enum's value

Regardless

  • better tooling, always better tooling

For gradual repair, and that is a big one, I think the only option is for external packages to opt out of the exhaustiveness checks. This does imply that it must be legal to have an "unnecessary" default case only concerned with future proofing even though you otherwise match everything else. I believe that that's implicitly true now, and if not, easy enough to specify.

There could be an announcement from a package maintainer that "hey we're going to be adding a new member to this sum type in the next version, make sure you can handle it" and then a switchlint tool could find any cases that need to be opted out.

Not as straightforward as other cases, but still quite doable.

When writing a program that uses an externally defined sum type you could comment out the default to make sure you didn't miss any known cases and then uncomment it before committing. Or there could be a tool to let you know that the default is "unnecessary" which tells you that you got everything known and are future-proofed against the unknown.

Let's say we want to opt-in to exhaustiveness checking with a linter when using interface types simulating sum types, regardless of the package they're defined in.

@Merovius your betterSumType() BetterSumType trick is very cool, but it means that switches have to happen in the defining package (or you expose something like

func CallBeforeSwitches(b BetterSumType) (BetterSumType, bool) {
    if b == nil {
        return nil, false
    }
    b = b.betterSumType()
    if b == nil {
        return nil, false
    }
    return b, true
}

and also lint that that's called every time).

What are the criteria necessary to check that all switches in a program are exhaustive?

It can't be the empty interface, because then anything's game. So it needs at least one method.

If the interface has no unexported methods, any type could implement it, so exhaustiveness would depend on all the packages up the callgraph of each switch. It's possible to import a package, implement its interface, and then send that value to one of the package's functions; so a switch in that function wouldn't be able to be exhaustive without creating an import cycle. So it needs at least one unexported method. (This subsumes the previous criterion).

Embedding would mess up the property we're looking for, so we need to ensure that none of the importers of the package ever embed the interface or any of the types that implement it at any point. A really fancy linter may be able to tell that sometimes embedding is okay if we never call a certain function that creates an embedded value or none of the embedded interfaces ever "escape" the API boundary of the package.
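
For example, using the AB interface from earlier in the thread, the loophole compiles today:

type smuggler struct{ AB } //embedding the interface provides its methods, including the unexported one

var x AB = smuggler{} //compiles; the dynamic type is neither A nor B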

To be thorough we either need to check that the zero value of the interface is never passed around or enforce that an exhaustive switch check case nil as well. (The latter is easier but the former is preferred since including nil turns a "type A or type B or type C" sum into a "nil or type A or type B or type C" sum).

Let's say we have a linter, with all those abilities, even the optional ones, that can verify these semantics for any tree of imports and any given interface within that tree.

Now let's say we have a project with a dependency D. We want to make sure an interface defined in one of the packages of D is used exhaustively in our project. Let's say it is.

Now, we need to add a new dependency to our project D′. If D′ imports the package in D that defined the interface type in question but does not use this linter, it can easily destroy the invariants that need to hold for us to use exhaustive switches.

For that matter, let's say D just passed the linter coincidentally not because the maintainer runs it. An upgrade to D could just as easily destroy the invariants as D′.

Even if the linter can say "right now this is 100% exhaustive 👍" that can change without us doing anything.

An exhaustiveness checker for "iota enums" seems easier.

Consider every declaration type t u where u is integral and t is used for constants, either with individually specified values or with iota, such that the zero value of u is included among these constants.

Notes:

  • Duplicate values can be treated as aliases and ignored in this analysis. We'll assume all named constants have distinct values.
  • 1 << iota may be treated as a powerset, I believe at least most of the time, but would probably require extra conditions, especially surrounding bitwise complement. For the time being, they will not be considered

For some shorthand, let's call min(t) the constant such that for any other constant, C, min(t) <= C, and, similarly, let's call max(t) the constant such that for any other constant, C, C <= max(t).

To ensure t is used exhaustively we need to ensure that

  • values of t are always the named constants (or 0 in certain idiomatic positions, like function invocation)
  • There are no inequality comparisons of a value of t, v, outside of min(t) <= v <= max(t)
  • values of t are never used in arithmetic operations +, /, etc. A possible exception could be when the result is clamped between min(t) and max(t) immediately afterward, but that could be hard to detect in general so it may require an annotation in comments and should probably be restricted to the package that defines t.
  • switches contain all constants of t or a default case.

This still requires verifying all the packages in the import tree and can be invalidated as easily, although it is less likely to be invalidated in idiomatic code.

My understanding is that this, similar to type aliases, won't be breaking changes, so why hold it up for Go 2?

Type aliases didn't introduce a new keyword; sum types would, and a new keyword is a definite breaking change. There also seems to be a moratorium on even minor language changes and this would be a major change. Even just retrofitting all marshal/unmarshal routines to handle reflected sum values would be a huge ordeal.

Type aliases fix an issue for which there was no workaround. Sum types provide a benefit in type safety, but not having them isn't a show stopper.

Just one (minor) point in favour of something like @rogpeppe's original proposal. In package http, there's the interface type Handler and a function type that implements it, HandlerFunc. Right now, in order to pass a function to http.Handle, you explicitly have to convert it to a HandlerFunc. If http.Handle instead accepted an argument of type HandlerFunc | Handler, it could accept any function/closure assignable to HandlerFunc directly. The union effectively serves as a type hint telling the compiler how values with unnamed types can be converted to the interface type. Since HandlerFunc implements Handler, the union type would behave exactly like Handler otherwise.
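
Concretely (the union-typed signature is hypothetical):

//today the conversion must be explicit:
http.Handle("/hello", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
  io.WriteString(w, "hello")
}))

//with func Handle(pattern string, handler HandlerFunc|Handler),
//the function literal could be passed directly:
//http.Handle("/hello", func(w http.ResponseWriter, r *http.Request) { ... })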

@griesemer in response to your comment in the enum thread, https://github.com/golang/go/issues/19814#issuecomment-322752526, I think my proposal earlier in this thread https://github.com/golang/go/issues/19412#issuecomment-289588569 addresses the question of how sum types ("swift style enums") would have to work in Go. As much as I would like them, I don't know if they would be a necessary addition to Go, but I do think if they were added they'd have to look/operate a lot like that.

That post's not complete and there's clarifications throughout this thread, before and after, but I don't mind reiterating those points or summarizing since this thread is quite long.

If you have a sum type simultated by an interface with a type tag and absolutely cannot have it be circumvented by embedding, this is the best defense I've come up with: https://play.golang.org/p/FqdKfFojp-

@jimmyfrasche I wrote this a while back.

Another possible approach is this: https://play.golang.org/p/p2tFm984S8

@rogpeppe if you're going to use reflection why not just use reflection?

I've written up a revised version of my second proposal based on comments here and in other issues.

Notably, I've removed exhaustiveness checking. However, an external exhaustiveness checker is trivial to write for the below proposal, though I do not believe one can be written for other Go types used to simulate a sum type.

Edit: I've removed the ability to type assert on the dynamic value of a pick value. It's too magical and the reason for allowing it is served just as well by code generation.

Edit2: clarified how field names work with assertions and switches when the pick is defined in another package.

Edit3: restricted embedding and clarified implicit field names

Edit4: clarify default in switch

Pick types

A pick is a composite type syntactically similar to a struct:

pick {
  A, B S
  C, D T
  E U "a pick tag"
}

In the above, A, B, C, D, and E are the field names of the pick and S, T, and U are the respective types of those fields. Field names may be exported or unexported.

A pick may not be recursive without indirection.

Legal

type p pick {
    //...
    p *p
}

Illegal

type p pick {
    //...
    p p
}

There is no embedding for picks, but a pick may be embedded in a struct. If a pick is embedded in a struct, methods on the pick are promoted to the struct but the fields of the pick are not.

A type without a field name is shorthand for defining a field with the same name as the type. (This is an error if the type is unnamed, with an exception for *T where the name is T).

For example,

type p pick {
    io.Reader
    io.Writer
    string
}

has three fields Reader, Writer, and string, with the respective types. Note that the field string is unexported even though it is in the universe scope.

A value of a pick type consists of a dynamic field and the value of that field.

The zero value of a pick type is its first field in source order and the zero value of that field.

Given two values of the same pick type, a and b, the pick value may be assigned as any other value

a = b

Assigning a non-pick value, even one of a type of one of the fields in a pick, is illegal.

A pick type only ever has one dynamic field at any given time.

The composite literal syntax is similar to structs, but there are extra restrictions. Namely, keyless literals are always invalid and only one key may be specified.

The following are valid

pick{A string; B int}{A: "string"} //value is (A, "string")
pick{A, B int}{B: 1} //value is (B, 1)
pick{A, B string}{} //value is (A, "")

The following are compile time errors:

pick{A int; B string}{A: 1, B: "string"} //a pick can only have one value at a time
pick{A int; B uint}{1} //pick composite literals must be keyed

Given a value p of type pick {A int; B string} the following assignment

p.B = "hi"

sets the dynamic field of p to B and the value of B to "hi".

Assignment to the current dynamic field updates the value of that field. Assignment that sets a new dynamic field must zero any unspecified memory locations. Assignment to a pick or struct field of a pick field updates or sets the dynamic field as necessary.

type P pick {
    A, B image.Point
}

var p P
fmt.Println(p) //{A: {0 0}}

p.A.X = 1 //A is the dynamic field, update
fmt.Println(p) //{A: {1 0}}

p.B.Y = 2 //B is not the dynamic field, create zero image.Point first
fmt.Println(p) //{B: {0 2}}

The value held in a pick can only be accessed by a field assert or field switch.

x := p.[X] //panics if X is not the dynamic field of p
x, ok := p.[X] //returns the zero value of X and false if X is not the dynamic field of p

switch v := p.[var] {
case A:
case B, C: // v is only defined in this case if fields B and C have identical types
case D:
default: // always legal even if all fields are exhaustively listed above
}

The field names in field assertions and field switches are a property of the type, not the package it was defined in. They are not, and cannot be, qualified by the package name that defines the pick.

This is valid:

_, ok := externalPackage.ReturnsPick().[Field]

This is invalid:

_, ok := externalPackage.ReturnsPick().[externalPackage.Field]

Field assertions and field switches always return a copy of the dynamic field's value.

Unexported field names can only be asserted in their defining package.

Type assertions and type switches also work on picks.

//removed, see note at top
//v, ok := p.(fmt.Stringer) //holds if the type of the dynamic field implements fmt.Stringer
//v, ok := p.(int) //holds if the type of the dynamic field is an int

Type assertions and type switches always return a copy of the dynamic field's value.

If the pick is stored in an interface, type assertions for interfaces only match against the method set of the pick itself. [still true but redundant as the above has been removed]

If all the types of a pick support the equality operators then:

  • values of that pick may be used as map keys
  • two values of the same pick are == if they have the same dynamic field and its values are ==
  • two values with different dynamic fields are != even if the values are ==.

No other operators are supported on values of a pick type.
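
For illustration (hypothetical syntax):

type E pick {
  A, B int
}

m := map[E]bool{} //legal: int supports ==
m[E{A: 1}] = true
fmt.Println(E{A: 1} == E{A: 1}) //true: same dynamic field, equal values
fmt.Println(E{A: 1} == E{B: 1}) //false: different dynamic fields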

A value of a pick type P can be converted to another pick type Q iff the set of field names and their types in P is a subset of the field names and their types in Q.

If P and Q are defined in different packages and have unexported fields, those fields are considered different regardless of name and type.

Example:

type P pick {A int; B string}
type Q pick {B string; A int; C float64}

//legal
var p P
q := Q(p)

//illegal
var q Q
p := P(q) //cannot handle field C

Assignability between two pick types is defined as convertibility, as long as no more than one of the types is defined.

Methods may be declared on a defined pick type.

I created (and added to the wiki) an experience report https://gist.github.com/jimmyfrasche/ba2b709cdc390585ba8c43c989797325

Edit: and :heart: to @mewmew who left a much better and more detailed report as a reply on that gist

What if we had a way to say, for a given type T, the list of types that could be converted to type T or assigned to a variable of type T? For example

type T interface{} restrict { string, error }

defines an empty interface type named T such that the only types that may be assigned to it are string or error. Any attempt to assign a value of any other type produces a compile time error. Now I can say

func FindOrFail(m map[int]string, key int) T {
    if v, ok := m[key]; ok {
        return v
    }
    return errors.New("no such key")
}

func Lookup(m map[int]string, key int) {
    v := FindOrFail(m, key)
    if err, ok := v.(error); ok {
        log.Fatal(err)
    }
    s := v.(string) // This type assertion must succeed.
}

What key elements of sum types (or pick types) would not be satisfied by this kind of approach?

s := v.(string) // This type assertion must succeed.

This isn't strictly true, since v could also be nil. It would take a fairly major change to the language to remove this possibility, since it would mean introducing types that don't have zero values and everything that entails. The zero value simplifies parts of the language, but also makes designing these kinds of features more difficult.
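
Concretely (a sketch, with T as defined above):

var v T //the zero value of T, like any interface type, is nil
_, ok := v.(error) //ok == false
_ = v.(string)     //panics: v is nil, neither string nor error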

Interestingly, this approach is fairly similar to @rogpeppe's original proposal. What it doesn't have is coercion to the listed types, which could be useful in situations like I pointed out earlier (http.Handler). Another thing is that it requires each variant to be a distinct type, since variants are discriminated by type rather than a distinct tag. I think this is strictly as expressive, but some people prefer to have variant tags and types be distinct.

@ianlancetaylor

the pros

  • possible to restrict to a closed set of types—and that's definitely the main thing
  • possible to write a precise exhaustiveness checker
  • you get the "you can assign a value that satisfies the contract to this" property. (I don't care about this, but I imagine others do).

the cons

  • they're just interfaces with benefits and not really a different kind of type (nice benefits though!)
  • you still have nil so it's not really a sum type in the type theoretic sense. Whatever A + B + C you specify is really a 1 + A + B + C that you don't have a choice about. As @stevenblenkinsop pointed out while I worked on this.
  • more importantly, because of that implicit pointer you always have an indirection. With the pick proposal you can choose to have a p or *p, giving you greater control over memory trade-offs. You couldn't implement them as discriminated unions (in the C sense) as an optimization.
  • no choice of zero value, which is a really nice property especially since it's very important in Go to have as useful a zero value as possible
  • presumably you couldn't define methods on T (but presumably you'd have the methods of the interface the restrict modifies but the types in the restrict would need to satisfy it? Otherwise I don't see the point of not just having type T restrict {string, error})
  • if you lose the labels for the fields/summands/what-have-you, then it gets confusing when it interacts with interface types. You lose the strong "exactly this or exactly that" property of sum types. You could put an io.Reader in and pull an io.Writer out. That makes sense for (unrestricted) interfaces but not sum types.
  • If you want to have two identical types mean different things you need to use wrapper types to disambiguate; such a tag would have to be in an outer namespace rather than confined to a type the way a struct field is
  • this may be reading too much into your specific wording, but it sounds like it changes the rules of assignability based on the type of the assignee (I'm reading it as saying you can't assign something merely assignable to error to T; it has to be exactly an error).

That said, it does check the major boxes (the first two pros I listed) and I'd take it in a heartbeat if that's all I could get. I hope for better, though.

I assumed type assertion rules applied. So the type needs to be either identical to a concrete type or assignable to an interface type. Basically, it works exactly like an interface but any value (other than nil) must be assertable to at least one of the listed types.

@jimmyfrasche
In your updated proposal, would the following assignment be possible, if all elements of the type are of distinct types:

type p pick {
    A int
    B string
}

func Foo(P p) {
}

var P p = 42
var Q p = "foo"

Foo(42)
Foo("foo")

The usability of sum types when such assignments are possible is a lot bigger.

With the pick proposal you can choose to have a p or *p, giving you greater control over memory trade-offs.

The reason interfaces allocate to store scalar values is so you don't have to read a type word in order to decide whether the other word is a pointer; see #8405 for discussion. The same implementation considerations would likely apply for a pick type, which might mean in practice that p ends up allocating and being non-local anyway.

@urandom no, given your definitions it would have to be written

var p P = P{A: 42} // p := P{A: 42}
var q P = P{B: "foo"}
Foo(P{A: 42}) // or Foo({A: 42}) if types can be elided here
Foo(P{B: "foo"})

It's best to think of them as a struct that can only have one field set at a time.

If you don't have that and then you add a C uint to p what happens to p = 42?

You can make up a lot of rules based on order and assignability but they always mean that changes to the type definition can have subtle and dramatic effects on all the code using the type.

In the best case a change breaks all the code relying on the lack of ambiguity and says you need to change it to p = int(42) or p = uint(42) before it compiles again. A one line change shouldn't require fixing a hundred lines. Especially if those lines are in packages of people depending on your code.

You either have to be 100% explicit or have a very fragile type that no one can touch because it might break everything.

This applies to any sum type proposal but if there are explicit labels you still have assignability because the label is explicit about which type is being assigned to.

@josharian so if I'm reading that correctly the reason iface is now always (*type, *value) instead of stashing word-sized values in the second field as Go did previously is so that the concurrent GC doesn't need to inspect both fields to see whether the second is a pointer—it can just assume that it always is. Did I get that right?

In other words, if the pick type were implemented (using C notation) like

struct {
    int which;
    union {
         A a;
         B b;
         C c;
    } summands;
}

the GC would need to take a lock (or something fancy but equivalent) to inspect which to determine if summands needed to be scanned?

the reason iface is now always (*type, *value) instead of stashing word-sized values in the second field as Go did previously is so that the concurrent GC doesn't need to inspect both fields to see whether the second is a pointer—it can just assume that it always is.

That's right.

Of course, the limited nature of pick types would allow some alternative implementations. The pick type could be laid out such that there's always a consistent pattern of pointer/non-pointer; e.g. all scalar types can overlap, and a string field could overlap with the beginning of a slice field (because both start "pointer, non-pointer"). So

pick {
  a uintptr
  b string
  c []byte
}

could be laid out roughly:

[ word 1 (ptr) ] [ word 2 (non-ptr) ] [ word 3 (non-ptr) ]
[ <nil>        ] [ a                 ] [                  ]
[ b.ptr        ] [ b.len             ] [                  ]
[ c.ptr        ] [ c.len             ] [ c.cap            ]

but other pick types might not allow such optimal packing.

This ability to do static layout might even be a performance argument in favor of including pick types; my goal here is simply to flag relevant implementation details for you.

@josharian and thank you for doing so. I hadn't thought of that (in all honesty I just googled if there existed research on how to GC discriminated unions, saw that yes you can do that and called it a day—for some reason my brain didn't associate "concurrency" with "Go" that day: facepalm!).

There would be less choice if one of the types were a defined struct which already had a layout.

One option would be to not "compact" the summands if they contain pointers meaning that the size would be the same as the equivalent struct (+ 1 for the discriminator int). Maybe taking a hybrid approach, when possible, so that all the types that can share layout do.

It would be a shame to lose the nice size properties but that really is just an optimization.

Even if it were always 1 + the size of an equivalent struct even when they contained no pointers, it would still have all the other nice properties of the type itself, including control over allocations. Additional optimizations could be added over time and would at least be possible as you point out.

type p pick {
    A int
    B string
}

Do A and B need to be there? A pick picks from a set of types, so why not throw out their identifier names completely:

type p pick {
    int
    string
}
q := p{string: "hello"}

I believe this form is already valid for struct. There can be a constraint that it is required for pick.

@as if the field name is omitted it is the same as the type so your example works, but since those field names are unexported they could only be set/accessed from within the defining package.

The field names do need to be there, even if implicitly generated based on the type name, or there are bad interactions with assignability and interface types. The field names are what makes it work with the rest of Go.

@as apologies, I just realized you meant something different than what I read.

Your formulation works but then you have things that look like struct fields but behave differently because of the usual exported/unexported thing.

Is string accessible from outside the package defining p because it's declared in the universe block?

What about

type t struct {}
type P pick {
  t
  //other stuff
}

?

By separating the field name from the type name you can do things like

pick {
  unexported Exported
  Exported unexported
}

or even

pick { Recoverable, Fatal error }

If pick fields behave like struct fields you can use a lot of what you already know about struct fields to think about pick fields. The only real difference is that only one pick field can be set at a time.

@jimmyfrasche
Go already supports embedding anonymous types inside structs, so the restriction of scope is one that already exists in the language, and I believe that problem is being solved by type aliases. But I admit I haven't thought of every possible use case. It seems to hinge on whether this idiom is common in Go:

package p
type T struct{
    Exported t
}
type t struct{}

The small _t_ exists in a package where it's embedded in the large T, and its only exposure is through such exported types.

@as

I'm not sure I entirely follow, however:

//with the option to have field names
pick { //T is in the namespace of the pick and the type isn't exposed to other packages
  T t
  //...
}

//without
type T = t //T has to be defined in the outer scope and now t is exposed to other packages
pick {
  T
  //...
}

Also, if you only had the type name for the label, to include say a []string you'd need to do a type Strings = []string.

That is very much the way I want to see pick types implemented. In particular, it is how Rust and C++ (the gold standards for performance) do it.

If I just wanted exhaustiveness checking, I could use a checker. I want the performance win. That means that pick types cannot be nil, either.

Taking the address of a member of a pick element should not be allowed (it is not memory safe, even in the single-threaded case, as is well known in the Rust community). If that requires other restrictions on a pick type, then so be it. But for me, having pick types always allocate on the heap would be bad.

@DemiMarie

Taking the address of a member of a pick element should not be allowed (it is not memory safe, even in the single-threaded case, as is well-known in the Rust community.). If that requires other restrictions on a pick type, then so be it.

That's a good point. I had that in there, but it must have gotten lost in an edit. I did include, for that same reason, that accessing the value from a pick always returns a copy.

As an example of why that's true, for posterity, consider

v := pick{ A int; B bool }{A: 5}
p := &v.[A] //(this would be illegal but pretending it's not for a second)
v.B = true

If v is optimized so that the fields A and B take up the same position in memory then p isn't pointing to an int: it's pointing to a bool. Memory safety was violated.

@jimmyfrasche

The second reason you wouldn't want the contents to be addressable is mutation semantics. If the value is stored indirectly under certain circumstances, then

v := pick{ A int; ... }{A: 5}
v2 := v

v2.[A] = 6 // (this would be illegal under the proposal, but would be 
           // permitted if `v2.[A]` were addressable)

fmt.Println(v.[A]) // would print 6 if the contents of the pick are stored indirectly

One place where pick is similar to interfaces is that you want to retain value semantics if you store values in it. If you might need indirection as an implementation detail, the only option is to make the contents not addressable (or more precisely, mutably addressable, but the distinction doesn't exist in Go at present), so that you can't observe the aliasing.

Edit: Oops (see below)

@jimmyfrasche

The zero value of a pick type is its first field in source order and the zero value of that field.

Note that this wouldn't work if the first field needs to be stored indirectly, unless you special case the zero value so that v.[A] and v.(error) do the right thing.

@stevenblenkinsop I'm not sure what you mean by "the first field needs to be stored indirectly". I assume you mean if the first field is a pointer or a type that implicitly contains a pointer. If so, there's an example below. If not, could you please clarify?

Given

var p pick { A error; B int }

the zero value, p, has dynamic field A and the value of A is nil.

I wasn't referring to the value stored in the pick being/containing a pointer, I was referring to a non-pointer value being stored indirectly due to layout constraints imposed by the garbage collector, as described by @josharian.

In your example, p.B—not being a pointer—would not be able to share overlapping storage with p.A, which comprises two pointers. It would most likely have to be stored indirectly (i.e. be represented as a *int that automatically gets dereferenced when you access it, rather than as an int). If p.B were the first field, the zero value of the pick would be new(int), which isn't an acceptable zero value since it requires initialization. You'd need to special case it so that a nil *int is treated as a new(int).

@jimmyfrasche
Oh sorry. Going back over the conversation, I realized you were considering using adjacent storage to store variants with incompatible layouts, rather than copying the interface mechanism of indirect storage of non-pointer types. My last three comments don't make sense in that case.

Edit: whoops, race condition. Posted then saw your comment.

@stevenblenkinsop ah, okay I see what you mean. But that's not a problem.

Sharing overlapping storage is an optimization. The compiler could simply never do that: the semantics of the type are the important bit.

If the compiler can optimize the storage and chooses to do so, it's a nice bonus.

In your example, the compiler could store it exactly as it would the equivalent struct (adding a tag to know which is the active field). This would be

struct {
  which_field int // 0 = A, 1 = B
  A error
  B int
}

The zero value is still all bytes 0 and there's no need to surreptitiously allocate as a special case.

The important part is ensuring that only one field is in play at a given time.
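For concreteness, here's a minimal hand-rolled analogue of that layout in today's Go, assuming the tag-plus-fields representation above (all names here are hypothetical):

package main

import "fmt"

// errOrInt mimics pick { A error; B int } using the layout above:
// a tag plus one field per variant, with only one field live at a time.
type errOrInt struct {
  which int // 0 = A, 1 = B
  a     error
  b     int
}

// SetB makes B the active field, zeroing everything else.
func (p *errOrInt) SetB(v int) { *p = errOrInt{which: 1, b: v} }

// A returns the error field and whether it is the active one.
func (p *errOrInt) A() (error, bool) { return p.a, p.which == 0 }

// B returns the int field and whether it is the active one.
func (p *errOrInt) B() (int, bool) { return p.b, p.which == 1 }

func main() {
  var p errOrInt     // all bytes zero: A is active with value nil
  fmt.Println(p.A()) // <nil> true
  p.SetB(42)
  fmt.Println(p.B()) // 42 true
}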

The motivation for allowing type assertions/switches on picks was so that, for example, if every type in the pick satisfied fmt.Stringer you could write a method on the pick like

func (p P) String() string {
  return p.(fmt.Stringer).String()
}

But since the types of pick fields can be interfaces this creates a subtlety.

If the pick P in the previous example had a field whose type is itself fmt.Stringer that String method would panic if that was the dynamic field and its value is nil. You cannot type assert a nil interface to anything, not even itself. https://play.golang.org/p/HMYglwyVbl While this has always been true, it just doesn't come up regularly, but it could come up more regularly with picks.
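A minimal illustration of that existing behavior in today's Go (the same thing the playground link shows):

package main

import "fmt"

func main() {
  var s fmt.Stringer        // nil interface value
  _, ok := s.(fmt.Stringer) // comma-ok form: no panic, but...
  fmt.Println(ok)           // false: a nil interface has no dynamic type
  // _ = s.(fmt.Stringer)   // the single-result form would panic here
}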

However, the closed nature of sum types would allow an exhaustiveness linter to find everywhere this would come up (potentially with some false positives) and report the case that needs to be handled.

It would also be surprising, if you can implement methods on the pick, that those methods aren't used to satisfy a type assertion.

type Num pick { A int; B float32 }

func (n Num) String() string {
      switch v := n.[var] {
      case A:
          return fmt.Sprint(v)
      case B:
          return fmt.Sprint(v)
      }
}
...
n := Num{A: 5}
s1, ok := n.(fmt.Stringer) // ok == false
var i interface{} = n
s2, ok := i.(fmt.Stringer) // ok == true

You could have the type assertion promote methods from the current field if they satisfy the interface, but this runs into its own issues, such as whether to promote methods from a value in an interface field which aren't defined on the interface itself (or even how to implement this efficiently). Also, one might then expect methods common to all fields to be promoted to the pick itself, but then they'd have to be dispatched via variant selection on each call, in addition potentially to a virtual dispatch if the pick is stored in an interface, and/or to a virtual dispatch if the field is an interface.

Edit: By the way, optimally packing a pick is an instance of the shortest common superstring problem, which is NP-complete, though there are greedy approximations that are commonly used.

The rule is if it's a pick value the type assertion asserts on the dynamic field of the pick value, but if the pick value is stored in an interface the type assertion is on the method set of the pick type. It may be surprising at first but it's fairly consistent.

It wouldn't be a problem to just drop allowing type assertions on a pick value. It would be a shame though since it does make it very easy to promote methods that all the types in the pick share without having to write out all the cases or use reflection.

Though, it would be fairly easy to use code generation to write the

func (p Pick) String() string {
  switch v := p.[var] {
  case A:
    return v.String()
  case B:
    return v.String()
  //etc
  }
}

Just went ahead and dropped type assertions. Maybe they should be added but they're not a necessary part of the proposal.

I want to come back to @ianlancetaylor's previous comment, because I've got some new perspective on it after thinking some more about error-handling (specifically, https://github.com/golang/go/issues/21161#issuecomment-320294933).

In particular, what does the new kind of type give us that we don't get from interface types?

As I see it, the major advantage of sum-types is that they would allow us to distinguish between returning multiple values, and returning one of several values — particularly when one of those values is an instance of the error interface.

We currently have a lot of functions of the form

func F(…) (T, error) {
    …
}

Some of them, such as io.Reader.Read and io.Writer.Write, return a T along with an error, whereas others return either a T or an error but never both. For the former style of API, ignoring the T in case of error is often a bug (e.g., if the error is io.EOF); for the latter style, returning a nonzero T is the bug.
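For concreteness, here is a sketch of working against the "value and error" contract; the process callback is a hypothetical consumer, not part of any real API:

package main

import (
  "fmt"
  "io"
  "strings"
)

// readAll drains r. io.Reader.Read is "value and error": a non-zero n
// must be consumed even when err is io.EOF.
func readAll(r io.Reader, process func([]byte)) error {
  buf := make([]byte, 8)
  for {
    n, err := r.Read(buf)
    if n > 0 {
      process(buf[:n]) // skipping this on io.EOF would drop the final bytes
    }
    if err == io.EOF {
      return nil
    }
    if err != nil {
      return err
    }
  }
}

func main() {
  _ = readAll(strings.NewReader("hello"), func(b []byte) { fmt.Printf("%s", b) })
  fmt.Println()
}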

Automated tools, including lint, can check usage of specific functions to ensure that the value is (or is not) ignored correctly when the error is non-nil, but such checks do not naturally extend to arbitrary functions.

For example, proto.Marshal is intended to be the "value and error" style if the error is a RequiredNotSetError, but seems to be the "value or error" style otherwise. Because the type system does not distinguish between the two, it is easy to accidentally introduce regressions: either not returning a value when we should, or returning a value when we should not. And implementations of proto.Marshaler further complicate the matter.

On the other hand, if we could express the type as a union, we could be much more explicit about it:

type PartialMarshal struct {
    Data []byte // The marshalled value, ignoring unset required fields.
    MissingFields []string
}

func Marshal(pb Message) []byte | PartialMarshal | error

@ianlancetaylor, I've been playing around with your proposal on paper. Can you let me know if anything below is incorrect?

Given

var r interface{} restrict { uint, int } = 1

the dynamic type of r is int, and

var _ interface{} restrict { uint32, int32 } = 1

is illegal.

Given

type R interface{} restrict { struct { n int }, etc }
type S struct { n int }

then var _ R = S{} would be illegal.

But given

type R interface{} restrict { int, error }
type A interface {
  error
  foo()
}
type C struct { error }
func (C) foo() {}

both var _ R = C{} and var _ R = A(C{}) would be legal.

Both

interface{} restrict { io.Reader, io.Writer }

and

interface{} restrict { io.Reader, io.Writer, *bytes.Buffer }

are equivalent.

Likewise,

interface{} restrict { error, net.Error }

is equivalent to

interface { Error() string }

Given

type IO interface{} restrict { io.Reader, io.Writer }
type R interface{} restrict {
  interface{} restrict { int, uint },
  IO,
}

then the underlying type of R is equivalent to

interface{} restrict { io.Writer, uint, io.Reader, int }

Edit: small correction in italics

@jimmyfrasche I wouldn't go so far as to say that what I wrote above was a proposal. It was more like an idea. I would have to think about your comments, but at first glance they look plausible.

@jimmyfrasche's proposal is pretty much how I would intuitively expect a pick type to behave in Go. I think it's especially worth noting that his proposal to use the zero value of the first field for the zero value of the pick is intuitive with the "zero value means zeroing-out the bytes", provided the tag values begin at zero (maybe this has already been noted; this thread is very long by now...). I also like the performance implications (no unnecessary allocs), and that picks are completely orthogonal to interfaces (no surprising behavior switching on a pick that contains an interface).

The only thing I would consider changing is mutating the tag: foo.X = 0 seems like it could be foo = Foo{X: 0}; a few more characters, but more explicit that it's resetting the tag and zeroing the value. This is a minor point, and I'd still be very happy if his proposal was accepted as-is.

@ns-cweber thank you but I can't take credit for the zero value behavior. The idea's been floating around for a while and was in @rogpeppe's proposal that came earlier in this (as you point out, quite long) thread. My justification was the same as the one you gave.

As far as foo.X = 0 vs foo = Foo{X: 0}, my proposal allows both, actually. The latter is useful if that field of a pick is a struct, so you can do foo.X.Y = 0 instead of foo = Foo{X: image.Point{X: foo.[X].X, Y: 0}}, which in addition to being verbose could fail at runtime.

I also think it helps to keep it as such because it reinforces the elevator pitch for its semantics: It's a struct that can only have one field set a time.

One thing that may block it from being accepted as-is is how embedding a pick in a struct would work. I realized the other day I glossed over the various effects that would have on using the struct. I think it's repairable but not entirely sure what the best repairs are. The simplest would be that it only inherits the methods and you have to directly refer to the embedded pick by name to get to its fields and I'm leaning toward that in order to avoid a struct having both struct fields and pick fields.

@jimmyfrasche Thanks for correcting me about the zero-value behavior. I agree that your proposal allows for both mutators, and I think your elevator pitch point is a good one. Your explanation for your proposal makes sense, though I could see myself setting foo.X.Y, not realizing it would automatically change the pick field. I would still be positively gleeful if your proposal succeeded, even with that slight reservation.

Lastly, your simple proposal for pick embedding seems like the one I would intuit. Even if we change our minds, we can go from the simple proposal to the complex proposal without breaking existing code, but the reverse isn't true.

@ns-cweber

I could see myself setting foo.X.Y, not realizing it would automatically change the pick field

That's a fair point, but you could make it about a lot of things in the language, or any language, for that matter. In general, Go has safety rails but not safety scissors.

There are a lot of big things it generally protects you from, if you don't go out of your way to subvert them, but you still have to know what you're doing.

That can be annoying when you make a mistake like this but, otoh, it's not that much different than "I set bar.X = 0 but I meant to set bar.Y = 0" as the hypothetical relies on you not realizing that foo is a pick type.

Similarly i.Foo(), p.Foo(), and v.Foo() all look the same but if i is a nil interface, p is a nil pointer and Foo doesn't handle that case, the first two could panic whereas if v uses a value method receiver it could not (at least not from the invocation itself, anyway).
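A small runnable demonstration of those three look-alike calls (types are hypothetical):

package main

import "fmt"

type T struct{}

func (T) Foo() string { return "ok" } // value receiver

type Fooer interface{ Foo() string }

func main() {
  var v T
  fmt.Println(v.Foo()) // fine: value receiver on a zero value

  var p *T
  _ = p // p.Foo() would panic: the value receiver dereferences the nil pointer

  var i Fooer
  _ = i // i.Foo() would panic: a nil interface has no dynamic type
}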

As far as embedding, good point about it being easy to loosen later so I just went ahead and edited the proposal.

Sum types often have a valueless field. For example, in the database/sql package, we have:

type NullString struct {
    String string
    Valid  bool // Valid is true if String is not NULL
}

If we had sum types / picks / unions, this might be expressed as:

type NullString pick {
  Null   struct{}
  String string
}

A sum type has obvious advantages over a struct in this case. I think this is a common enough use that it would be worth including as an example in any proposal.
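To see the advantage, note that the struct encoding admits a state the pick form would make unrepresentable; this compiles today:

package main

import (
  "database/sql"
  "fmt"
)

func main() {
  // Valid == false alongside a non-empty String: a meaningless
  // combination that a pick would rule out at compile time.
  ns := sql.NullString{Valid: false, String: "leftover"}
  fmt.Println(ns.Valid, ns.String) // false leftover
}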

Bikeshedding (sorry), I'd argue this is worth syntactic support and inconsistency with struct field embedding syntax:

type NullString union {
  Null
  String string
}

@neild

Hitting the last point first: as a last-minute change before I posted (not strictly required in any sense), I added that if there's a named type (or a pointer to a named type) with no field name, the pick creates an implicit field with the same name as the type. That may not be the best idea, but it seemed like it would cover one of the common cases of "any of these types" without a lot of fuss. Given that, your last example could be written:

type Null = struct{} //though this puts Null in the same scope as NullString
type NullString pick {
  Null
  String string
}

Back to your main point, though, yes, that's an excellent use. In fact, you can use it to build enums: type Stoplight pick { Stop, Slow, Go struct{} }. This would be much like a const/iota faux-enum. It would even compile down to the same output. The major benefit in this case is that the number representing the state is entirely encapsulated and you could not put in any state other than those three listed.

Unfortunately, there is somewhat awkward syntax for creating and setting values of Stoplight that is exacerbated in this case:

light := Stoplight{Slow: struct{}{}}
light.Go = struct{}{}

Allowing {} or _ to be shorthand for struct{}{}, as proposed elsewhere, would help.

Many languages, especially functional languages, get around this by putting the labels in the same scope as the type. This creates a lot of complexity and would disallow two picks defined in the same scope from sharing field names.

However, it is easy to work around this with a code generator that creates a func with the same name of each field in the pick that takes the field's type as an argument. If it also, as a special case, took no arguments if the type was zero sized, then the output for the Stoplight example would look like this

func Stop() Stoplight {
  return Stoplight{Stop: struct{}{}}
}
func Slow() Stoplight {
  return Stoplight{Slow: struct{}{}}
}
func Go() Stoplight {
  return Stoplight{Go: struct{}{}}
}

and for your NullString example it would look like this:

func Null() NullString {
  return NullString{Null: struct{}{}}
}
func String(s string) NullString {
  return NullString{String: s}
}

It's not pretty, but it's a go generate away and likely very easily inlined.

That wouldn't work in the case where it created implicit fields based on the type names (unless the types were from other packages) or it was run on two picks in the same package that shared field names, but that's fine. The proposal doesn't do everything out of the box but it enables a lot of things and gives the programmer the flexibility to decide what's best for a given situation.

More syntax bikeshedding:

type NullString union {
  Null
  Value string
}

var _ = NullString{Null}
var _ = NullString{Value: "some value"}
var _ = NullString{Value} // equivalent to NullString{Value: ""}.

Concretely, a literal with an element list that contains no keys is interpreted as naming the field to set.

This would be syntactically inconsistent with other uses of composite literals. On the other hand, it's a usage that seems sensible and intuitive in the context of union/pick/sum types (to me at least), since there's no sensible interpretation of a union initializer without a key.

@neild

This would be syntactically inconsistent with other uses of composite literals.

That seems like a huge negative to me, though it does make sense in the context.

Also note that

var ns NullString // == NullString{Null: struct{}{}} == NullString{}
ns.String = "" // == NullString{String: ""}

For dealing with struct{}{} when I'm using a map[T]struct{}, I throw

var set struct{}

somewhere and use theMap[k] = set. Something similar would work with picks.

Further bikeshedding: the empty type (in the context of sum types) is conventionally named "unit", not "null".

@bcmills Sorta.

In functional languages, when you create a sum type, its labels are actually functions that create the values of that type (albeit special functions known as "data constructors" that the compiler knows about to allow pattern matching), so

data Bool = False | True

creates the data type Bool and two functions in the same scope, True and False, each with the signature () -> Bool.

Here () is how you write the type pronounced unit—the type with only a single value. In Go this type can be written many different ways but it is idiomatically written as struct{}.

So the type of the constructor's argument would be called unit. The convention for the name of the constructor is generally None when used as an option type like this, but it can be changed to suit the domain. Null would be a fine name if the value were coming from a database, for example.

@bcmills

As I see it, the major advantage of sum-types is that they would allow us to distinguish between returning multiple values, and returning one of several values — particularly when one of those values is an instance of the error interface.

For an alternative perspective, I see this as a major disadvantage of sum types in Go.

Many languages of course use sum types for exactly the case of returning some value or an error, and this works well for them. If sum types were added to Go, there would be a great temptation to use them in the same way.

However, Go already has a large ecosystem of code which uses multiple values for this purpose. If new code uses sum types to return (value, error) tuples, then that ecosystem will become fragmented. Some authors will continue to use multiple returns for consistency with their existing code; some authors will use sum types; some will attempt to convert their existing APIs. Authors stuck on older Go versions, for whatever reason, will be locked out of new APIs. It'll be a mess, and I don't think the gains will begin to be worth the costs.

If new code uses sum types to return (value, error) tuples, then that ecosystem will become fragmented.

If we add sum types in Go 2 and use them uniformly, then the problem reduces to one of migration, not fragmentation: it would need to be possible to convert a Go 1 (value, error) API to a Go 2 (value | error) API and vice-versa, but they could be distinct types in the Go 2 parts of the program.

If we add sum types in Go 2 and use them uniformly

Note that this is a proposal that is quite different than the ones seen here so far: The standard library will need to be extensively refactored, the translation between API styles will need to be defined, etc. Go this route and this becomes a quite large and complicated proposal for an API transition with a minor codicil regarding the design of sum types.

The intent is for Go 1 and Go 2 to be able to coexist seamlessly in the same project, so I don't think the concern is that someone might be stuck with a Go 1 compiler "for some reason" and be unable to use a Go 2 library. However, if you have dependency A that depends in turn on B, and B updates to use a new feature like pick in its API, then that would break dependency A unless it updates to use the new version of B. A could just vendor B and keep using the old version, but if the old version isn't being maintained for security bugs, etc... or if you need to use the new version of B directly and you can't have two versions in your project for some reason, that could create a problem.

Ultimately, the problem here has little to do with language versions, and more to do with changing the signatures of existing exported functions. The fact that it would be a new feature providing the impetus is a bit of distraction from that. If the intent is to allow existing APIs to be changed to use pick without breaking backwards compatibility, then there might need to be a bridge syntax of some sort. For example (completely as a strawman):

type ReadResult pick(N int, Err error) {
    N
    PartialResult struct { N; Err }
    Err
}

The compiler could just splat the ReadResult when it is accessed by legacy code, using zero values if a field isn't present in a particular variant. I'm not sure how to go the other way or whether it's worth it. APIs like template.Must might just have to keep accepting multiple values rather than a pick and rely on splatting to make up the difference. Or something like this could be used:

type ReadResult pick(N int, Err error) {
case Err == nil:
    N
default:
    PartialResult struct { N; Err }
case N == 0:
    Err
}

This does complicate things, but I can see how introducing a feature which changes how APIs should be written requires a story for how to transition without breaking the world. Maybe there's a way to do it that doesn't require a bridge syntax.

It's trivial to go from sum types to product types (structs, multiple return values) — just set everything that isn't the value to zero. Going from product types to sum types is not well defined in general.

If an API wants to transition seamlessly and gradually from a product type based implementation to a sum type based one, the easiest route would be to have two versions of everything necessary, where the sum type version has the actual implementation and the product type version calls the sum type version, doing any runtime checking required and any projection down into product space.

That's really abstract so here's an example

version 1 without sums

func Take(i interface{}) error {
  switch i.(type) {
  case int: //do something
  case string: //do something else
  default: return fmt.Errorf("invalid %T", i)
  }
  return nil
}
func Give() (interface{}, error) {
  i := f() //something
  if i == nil {
    return nil, errors.New("whoops v:)v")
  }
  return i, nil
}

version 2 with sums

type Value pick {
  I int
  S string
}
func TakeSum(v Value) {
  // do something
}
// Deprecated: use TakeSum
func Take(i interface{}) error {
  switch x := i.(type) {
  case int: TakeSum(Value{I: x})
  case string: TakeSum(Value{S: x})
  default: return fmt.Errorf("invalid %T", i)
  }
  return nil
}
type ErrValue pick {
  Value
  Err error
}
func GiveSum() ErrValue { //though honestly (Value, error) is fine
  return f()
}
// Deprecated: use GiveSum
func Give() (interface{}, error) {
  switch v := GiveSum().(var) {
  case Value:
    switch v := v.(var) {
    case I: return v, nil
    case S: return v, nil
    }
  case Err:
    return nil, v
  }
}

version 3 would remove Give/Take

version 4 would move the implementation of GiveSum/TakeSum to Give/Take, make GiveSum/TakeSum just call Give/Take and deprecate GiveSum/TakeSum.

version 5 would remove GiveSum/TakeSum

It's not pretty or fast, but it's the same as any other large-scale disruption of similar nature and requires nothing extra from the language.

I think (most of) the utility of a sum type could be realized with a mechanism to constrain assignment to a type of type interface{} at compile time.

In my dreams it looks like:

type T1 switch {T2,T3} // only nil, T2 and T3 may be assigned to T1
type T2 struct{}
type U switch {} // only nil may be assigned to U
type V switch{interface{} /* ,... */} // V equivalent to interface{}
type Invalid switch {T2,T2} // only uniquely named types
type T3 switch {int,uint} // switches can contain switches but... 

...it would also be a compile-time error to assert a switch type is a type not explicitly defined:

var t1 T1
i,ok := t1.(int) // T1 can't be int, only T2 or T3 (but T3 could be int)
switch t := t1.(type) {
    case int: // compile error, T1 is just nil, T2 or T3
}

and go vet would carp about ambiguous constant assignments to types like T3 but for all intents and purposes (at runtime) var x T3 = 32 would be var x interface{} = 32. Maybe some predefined switch types for builtins in a package named something like switches or ponies would be groovy too.

@j7b, @ianlancetaylor offered a similar idea in https://github.com/golang/go/issues/19412#issuecomment-323256891

I posted what I believe would be the logical consequences of this later at https://github.com/golang/go/issues/19412#issuecomment-325048452

It looks like much of them would apply equally given the similarity.

It would be really great if something like that would work. It would be easy to transition from interfaces to interfaces+restrictions (especially with Ian's syntax: just tack the restrict on the end of existing pseudo-sums built with interfaces). It would be easy to implement since at runtime they'd essentially be identical to interfaces and most of the work would just be having the compiler emit additional errors when their invariants are broken.

But I don't think it's possible to make it work.

Everything lines up so close that it looks like a fit, but you zoom in and it just isn't quite right, so you give it a little push and then something else pops out of alignment. You can try to repair it but then you get something that looks a lot like interfaces but behaves differently in weird cases.

Maybe I'm missing something.

There's nothing wrong with the restricted interface proposal as long as you're okay with the cases not necessarily being disjoint. I don't think it's as surprising as you do that a union between two interface types (like io.Reader / io.Writer) isn't disjoint. It's entirely consistent with the fact that you can't determine whether a value assigned to an interface{} had been stored as an io.Reader or an io.Writer if it implements both. The fact that you can construct a disjoint union as long as each case is a concrete type seems perfectly adequate.

The tradeoff is that, if unions are restricted interfaces, then you can't define methods directly on them. And if they're restricted interface types, you don't get the guaranteed direct storage which pick types provide. Whether it's worthwhile adding a distinct kind of thing to the language to get these additional benefits, I'm not sure.

@jimmyfrasche for type T switch {io.Reader,io.Writer} it's fine to assign a ReadWriter to T, but you can only assert that T is an io.Reader or an io.Writer; you'd need another assertion to assert that the io.Reader or io.Writer is a ReadWriter, which should encourage adding it to the switch type if it's a useful assertion.

@stevenblenkinsop You could define the pick proposal without methods. In fact, if you get rid of methods and implicit field names, then you could allow pick embedding. (Though clearly I think methods and, to a much lesser degree implicit field names, are the more useful trade off there).

And, on the other hand, @ianlancetaylor's syntax would allow

type IR interface {
  Foo()
  Bar()
} restrict { A, B, C }

which would compile as long as A, B, and C each have Foo and Bar methods (though you would have to worry about nil values).

edit: clarification in italics

I think some form of _restricted interface_ would be useful, but I disagree with the syntax. Here is what I am suggesting. It acts in a similar way as an algebraic data type, which groups domain-related objects that do not necessarily have common behavior.

//MyGroup can be any of these. It can contain other groups, interfaces, structs, or primitive types
type MyGroup group {
   MyOtherGroup
   MyInterface
   MyStruct
   int
   string
   //..possibly some other types as well
}

//type definitions..
type MyInterface interface{}
type MyStruct struct{}
//etc..

func DoWork(item MyGroup) {
   switch t:=item.(type) {
      //do work here..
   }
}

There are several benefits of this approach over the conventional empty interface interface{} approach:

  • static type checking when the function is used
  • user can infer what type of argument is required from the function signature alone, without having to look at the function implementation

Empty interface interface{} is useful when the number of types involved is unknown. You really have no choice here but to rely on runtime verification. On the other hand, when the number of types is limited and known during compile time, why not get the compiler to assist us?

@henryas I think a more useful comparison would be the currently recommended way to do (open) sum types: Non-empty interfaces (if no clear interface can be distilled, using unexported marker functions).
I don't think your arguments apply to that in a significant way.

Here's an experience report in regards to Go protobufs:

  • The proto2 syntax allows for "optional" fields, which are types where there is distinction between the zero value and an unset value. The current solution is to use a pointer (e.g., *int), where a nil pointer indicates unset, while a set pointer points to the actual value. The desire is an approach that allows making a distinction between zero and unset possible, without complicating the common case of only needing to access the value (where zero value is fine if unset).

    • This is non-performant due to an extra allocation (although unions may suffer from the same fate depending on implementation).
    • This is painful for users because the need to constantly check the pointer hurts readability (although non-zero default values in protos may mean the need to check is a good thing...).
  • The proto language allows for "one ofs", which are proto's version of sum types. The approach currently taken is as follows (gross example):

    • Define an interface type with a hidden method (e.g., type Communique_Union interface { isCommunique_Union() })
    • For each of the possible Go types allowed in the union, define a wrapper struct whose only purpose is to wrap each allowed type (e.g., type Communique_Number struct { Number int32 }), where each type has the isCommunique_Union method (see the sketch after this list).
    • This is also non-performant as the wrappers cause an allocation. A sum type would help since we know that the largest value (a slice) would occupy no more than 24B.
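A sketch of that generated "oneof" pattern as described above (Communique_Name is a hypothetical second variant added for illustration; the real generated code differs in detail):

// The interface with the hidden marker method.
type Communique_Union interface{ isCommunique_Union() }

// One wrapper struct per allowed type.
type Communique_Number struct{ Number int32 }
type Communique_Name struct{ Name string }

func (*Communique_Number) isCommunique_Union() {}
func (*Communique_Name) isCommunique_Union()   {}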

@henryas I think a more useful comparison would be the currently recommended way to do (open) sum types: Non-empty interfaces (if no clear interface can be distilled, using unexported marker functions).
I don't think your arguments apply to that in a significant way.

You mean by adding a dummy unexported method to an object so that the object can be passed as an interface, as follows?

type MyInterface interface {
   belongToMyInterface() //dummy method definition
}

type MyObject struct{}
func (MyObject) belongToMyInterface(){} //dummy method

I don't think that should be recommended at all. It's more of a workaround than a solution. I personally would rather forgo static type verification than have empty methods and unnecessary method definitions lying around.

These are the problems with the _dummy method_ approach:

  • Unnecessary methods and method definitions cluttering the object and the interface.
  • Every time a new _group_ is added, you need to modify the object's implementation (eg. adding dummy methods). This is wrong (see the next point).
  • Algebraic data types (or grouping based on _domain_ rather than behavior) are domain-specific. Depending on the domain, you may need to view object relationships differently. An accountant groups documents differently from a warehouse manager. This grouping concerns the consumer of the object, and not the object itself. The object doesn't need to know anything about the consumer's problem, and it shouldn't need to. Does an Invoice need to know anything about accounting? If it doesn't, then why does an Invoice need to change its implementation _(eg. adding new dummy methods)_ every time there is a change in the accounting rules _(eg. applying new document grouping)_? By using the _dummy method_ approach, you couple your object to the consumer's domain and make significant assumptions about the consumer's domain. You shouldn't need to do this. This is even worse than the empty interface interface{} approach. There are better approaches available.

@henryas

I don't see your third point as a strong argument. If the accountant wants to view object relationships differently the accountant can create their own interface that fits their specification. Adding a private method to an interface doesn't mean the concrete types that satisfy it are incompatible with subsets of the interface defined elsewhere.

The Go parser makes heavy use of this technique and honestly I can't imagine picks making that package so much better that it warrants implementing picks in the language.

@as My point is that every time a new _relationship view_ is created, the relevant concrete objects must be updated to make certain accommodation for this view. It seems wrong, because in order to do that, the objects must often make a certain assumption about the consumer's domain. If the objects and the consumers are closely related or live within the same domain, such as in Go parser case, it may not matter much. However, if the objects provide basic functionalities that are to be consumed by several other domains, it becomes a problem. The objects now needs to know a little bit about all the other domains for the _dummy method_ approach to work.

You end up with many empty methods attached to the objects, and it isn't obvious to the readers why you need those methods because the interfaces that require them live in a separate domain/package/layer.

The point that the open-sums-via-interfaces approach doesn't let you easily¹ use sums is fair enough. Explicit sum-types obviously would make it easier to have sums. It's a very different argument than "sum types give you type-safety", though - you can still get type-safety today, if you need it.

I still see two downsides of closed sums as implemented in other languages though: One, the difficulty of evolving them in a large-scale distributed development process. And two, that I think they add power to the type-system, and I like that Go does not have a very powerful type-system, as that discourages coding types and instead encourages coding programs - when I feel that a problem can benefit from a more powerful type-system, I move to a more powerful language (like Haskell or Rust).

That being said, at least the second one is definitely a matter of preference and, even if you'd agree, whether the downsides outweigh the upsides is also up to personal preference. I just wanted to point out that "you can't get type-safe sums without closed sum types" isn't really true :)

[1] notably, it's not easy, but still possible, e.g. you can do

type Node interface {
    node()
}

type Foo struct {
    bar.Baz
}

func (Foo) node() {}

@Merovius
I disagree with your second downside point. The fact that there are plenty of places in the standard library that would immensely benefit from sum types, but are now implemented using empty interfaces and panics, shows that this lack is hurting coding. Of course, people might say that since such code has been written in the first place, there is no problem and we don't need sum types, but the folly of that logic is that we then wouldn't need any other type for function signatures, and we should just use empty interfaces instead.

As for using interfaces with some marker method to represent sum types right now, there's one big drawback: you don't know which types you can use for that interface, since they are implemented implicitly. With a proper sum type, the type itself describes exactly which types can be used.

I disagree with your second downside point.

Are you disagreeing with the statement "sum types encourage programming with types", or are you disagreeing with that being a downside? Because it doesn't seem you are disagreeing with the first (your comment is basically just a re-assertion of that) and regarding the second, I acknowledged that it's up to preference above.

The fact that there are plenty of places in the standard library that would immensely benefit from sum types, but are now implemented using empty interfaces and panics, shows that this lacks is hurting coding. Of course, people might say that since such code has been written in the first place, there is no problem and we don't need sum types, but the folly of that logic is that we then wouldn't need any other type for function signatures, and we should just use empty interfaces instead.

This type of black-and-white argument doesn't really help. I agree, that sum types would reduce pain in some instances. Every change making the type-system more powerful will reduce pain in some instances - but it will also cause pain in some instances. So the question is, which outweighs the other (and that is, to a good degree, a question of preference).

The discussions shouldn't be about whether we want a python-esque type-system (no types) or a coq-esque type-system (correctness proofs for everything). The discussion should be "do the benefits of sum types outweigh their downsides" and it's helpful to acknowledge both.


FTR, I want to re-emphasize that, personally, I wouldn't be that opposed to open sum types (i.e. every sum type has an implicit or explicit "SomethingElse"-case), as it would alleviate most of the technical downsides of them (mostly that they are hard to evolve) while also providing most of the technical upsides of them (static type checking, the documentation you mentioned, you can enumerate types from other packages…).

I also assume, though, that open sums a) won't be a satisfying compromise for people who usually push for sum types and b) probably won't be considered a large enough benefit to warrant inclusion by the Go team. But I'd be ready to be proven wrong on either or both of these assumptions :)

One more question:

The fact that there are plenty of places in the standard library that would immensely benefit from sum types

I can only think of two places in the standard library, where I'd say there is any significant benefit to them: reflect and go/ast. And even there, the packages seem to work just fine without them. From this reference point, the words "plenty" and "immensely" seem overstatements - but I might not see a bunch of legitimate places, of course.

database/sql/driver.Value might benefit from being a sum type (as noted in #23077).
https://godoc.corp.google.com/pkg/database/sql/driver#Value

The more public interface in database/sql.Rows.Scan would not, however, without a loss in functionality. Scan can read into values whose underlying type is e.g., int; changing its destination parameter to a sum type would require limiting its inputs to a finite set of types.
https://godoc.corp.google.com/pkg/database/sql#Rows.Scan
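A hypothetical fragment showing the point (assumes database/sql is imported; UserID and firstID are made up for illustration):

type UserID int // underlying type int

func firstID(rows *sql.Rows) (UserID, error) {
  var id UserID
  if rows.Next() {
    // Scan accepts &id because UserID's underlying type is int;
    // a closed sum over concrete types could not enumerate every
    // such user-defined destination.
    if err := rows.Scan(&id); err != nil {
      return 0, err
    }
  }
  return id, rows.Err()
}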

@Merovius

I wouldn't be that opposed to open sum types (i.e. every sum type has an implicit or explicit "SomethingElse"-case), as it would alleviate most of the technical downsides of them (mostly that they are hard to evolve)

There are at least two other options that alleviate the “hard to evolve” problem of closed sums.

One is to allow matches on types that are not actually a part of the sum. Then, to add a member to the sum, you first update its consumers to match against the new member, and only actually add that member once the consumers are updated.

Another is to allow “impossible” members: that is, members that are explicitly allowed in matches but explicitly disallowed in actual values. To add a member to the sum, you first add it as an impossible member, then update consumers, and finally change the new member to be possible.

database/sql/driver.Value might benefit from being a sum type

Agreed, didn't know about that one. Thanks :)

One is to allow matches on types that are not actually a part of the sum. Then, to add a member to the sum, you first update its consumers to match against the new member, and only actually add that member once the consumers are updated.

Intriguing solution.

@Merovius interfaces are essentially a family of infinite-sum types. All sum types, infinite or otherwise, have a default: case. Without finite sum types, though, default: means either a valid case you didn't know about or an invalid case that's a bug somewhere in the program—with finite sums it's only the former and never the latter.

json.Token and the sql.Null* types are other canonical examples. go/types would benefit the same way go/ast does. I'm guessing there are a lot of examples that aren't in the exported APIs where it would have been easier to debug and test some intricate plumbing by limiting the domain of the internal state. I find them most useful for internal state and application constraints that don't come up that often in public APIs for general libraries, though they do have their occasional uses there as well.

Personally I think sum types give Go just enough extra power but not too much. The Go type system is already very nice and flexible, though it does have its shortcomings. Go2 additions to the type system just aren't going to deliver as much power as what's already there—the 80-90% of what's needed is already in place. I mean, even generics wouldn't be fundamentally letting you do something new: they would be letting you do things you already do more safely, more easily, more performantly, and in a way that enables better tooling. Sum types are similar, imo (though obviously if it were one or the other generics would take precedence (and they pair rather nicely)).

If you allow an extraneous default (all cases + default is allowed) on sum-type switches and don't have the compiler enforce exhaustiveness (though a linter could), adding a case to a sum is just as easy (and just as difficult) as changing any other public API.

json.Token and the sql.Null* types are other canonical examples.

Token - sure. Another instance of the AST-problem (basically any parser benefits from sum types).

I don't see the benefit for sql.Null*, though. Without generics (or adding some "magical" generic optional builtin), you are still going to have to have the types, and there doesn't seem to be a significant difference between type NullBool enum { Invalid struct{}; Value bool } and type NullBool struct { Valid bool; Value bool }. Yes, I am aware there is a difference, but it is vanishingly small.

If you allow an extraneous default (all cases + default is allowed) on sum-type switches and don't have the compiler enforce exhaustiveness (though a linter could), adding a case to a sum is just as easy (and just as difficult) as changing any other public API.

See above. Those are what I call open sums, I'm less opposed to them.

Those are what I call open sums, I'm less opposed to them.

My specific proposal is https://github.com/golang/go/issues/19412#issuecomment-323208336 and I believe it may satisfy your definition of open, though it is still a bit rough and I'm sure there's yet more to remove and polish. In particular I noticed it wasn't clear that a default case was admissible even if all the cases were listed so I just updated it.

Agreed that optional types aren't the killer app of sum types. They are quite nice though and as you point out with generics defining a

type Nullable(T) pick { // or whatever syntax (on all counts)
  Null struct{}
  Value T
}

once and covering all the cases would be great. But, as you also point out, we could do the same with a generic product (struct). There is the invalid state of Valid = false, Value != 0. In that scenario it would be easy to root out if that was causing problems since 2 ⨯ T is small, even if it's not as small as 1 + T.

Of course if it were a more complicated sum with lots of cases and many overlapping invariants it becomes easier to make a mistake and harder to discover the mistake even with defensive programming, so making impossible things just not compile at all can save a lot of hair pulling.

Token - sure. Another instance of the AST-problem (basically any parser benefits from sum types).

I write a lot of programs that take some input, do some processing, and produce some output, and I usually divvy this up recursively into a lot of passes that divide the input into cases and transform it based on those cases as I move ever closer to the desired output. I may not literally be writing a parser (admittedly sometimes I am because that's fun!) but I find the AST-problem, as you put it, applies to a lot of code—especially when dealing with abstruse business logic that has too many weird requirements and edge cases to fit in my tiny head.

When I'm writing a general library it doesn't come up in the API as often as doing some ETL or making some fanciful report or making sure that users in state X have action Y happen if they're not marked Z. Even in a general library though I find places where being able to limit the internal state would help, even if it just reduces a 10 minute debug to a 1 second "oh the compiler said I'm wrong".

With Go in particular, one place where I'd use sum types is a goroutine selecting over a bunch of channels where I need to give 3 chans to one goroutine and 2 to another. It would help me track what's going on to be able to use a chan pick { a A; b B; c C } over chan A, chan B, chan C, though a chan struct { kind MsgKind; a A; b B; c C } can do the job in a pinch at the cost of extra space and less validation.
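For concreteness, a sketch of that tagged-struct workaround (all names hypothetical):

type A struct{} // placeholder message payloads
type B struct{}
type C struct{}

type MsgKind int

const (
  KindA MsgKind = iota
  KindB
  KindC
)

// Msg stands in for chan pick { a A; b B; c C }: it carries space for
// all three payloads, and nothing stops reading the wrong one.
type Msg struct {
  Kind MsgKind
  A    A
  B    B
  C    C
}

func worker(ch <-chan Msg) {
  for m := range ch {
    switch m.Kind {
    case KindA: // handle m.A
    case KindB: // handle m.B
    case KindC: // handle m.C
    }
  }
}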

Instead of a new type what about the compile-time type list check as an addition to the existing interface type switch feature?

func main() {
    if FlipCoin() == false {
        printCertainTypes(FlipCoin(), int(5))
    } else {
        printCertainTypes(FlipCoin(), string("5"))
    }
}
// this function compiles with main
func printCertainTypes(flip bool, in interface{}) {
    if flip == false {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        default:
            fmt.Println(v)
        }
    } else {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        case string:
            fmt.Printf("string %v\n", v)
        }
    }
}
// this function compiles with main
func printCertainTypes(flip bool, in interface{}) {
    switch v := in.(type) {
    case int:
        fmt.Printf("integer %v\n", v)
    case bool:
        fmt.Printf("bool %v\n", v)
    }
    fmt.Println(flip)
    switch v := in.(type) {
    case string:
        fmt.Printf("string %v\n", v)
    case bool:
        fmt.Printf("bool 2 %v\n", v)
    }
}
// this function emits a type switch not complete error when compiled with main
func printCertainTypes(flip bool, in interface{}) {
    if flip == false {
        switch v := in.(type) {
        case int:
            fmt.Printf("integer %v\n", v)
        case bool:
            fmt.Printf("bool %v\n", v)
        }
    } else {
        switch v := in.(type) {
        case string:
            fmt.Printf("string %v\n", v)
        case bool:
            fmt.Printf("bool %v\n", v)
        }
    }
}
// this function emits a type switch not complete error when compiled with main
func printCertainTypes(flip bool, in interface{}) {
    fmt.Println(flip)
    switch v := in.(type) {
    case int:
        fmt.Printf("integer %v\n", v)
    case bool:
        fmt.Printf("bool %v\n", v)
    }
}

In fairness, we should explore ways of approximating sum types in the current type system and weigh their pros and cons. If nothing else, it gives a baseline for comparison.

The standard means is an interface with an unexported, do-nothing method as a tag.

One argument against this is that each type in the sum needs to have this tag defined on it. This isn't strictly true; at least for members that are structs, we could do

type Sum interface { sum() }
type sum struct{}
func (sum) sum() {}

and just embed that 0-width tag in our structs.
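Continuing that sketch, hypothetical members join the sum just by embedding the zero-width tag:

type Leaf struct {
  sum
  Value int
}

type Branch struct {
  sum
  Left, Right Sum
}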

We can add external types to our sum by introducing a wrapper

type External struct {
  sum
  *pkg.SomeType
}

though this is a bit ungainly.

If all members in the sum share common behavior, we can include those methods in the interface definition.

Constructs like this let us say that a type is in a sum, but they do not let us say what is not in that sum. In addition to the mandatory nil case, the same embedding trick can be used by external packages like

import "p"
var member struct {
  p.Sum
}

Within the package we have to take care to validate values that compile but are illegal.

There are various ways to recover some type-safety at runtime. I've found including a valid() error method in the definition of the sum interface coupled with a func like

func valid(s Sum) error {
  switch s.(type) {
  case nil:
    return errors.New("pkg: Sum must be non-nil")
  case A, B, C, ...: // listing each valid member
    return s.valid()
  }
  return fmt.Errorf("pkg: %T is not a valid member of Sum", s)
}

to be useful as it allows taking care of two kinds of validation at once. For members that happen to always be valid, we can avoid some boilerplate with

type alwaysValid struct{}
func (alwaysValid) valid() error { return nil }

One of the more common complaints about this pattern is that it does not make membership in the sum clear in godoc. Since it also does not let us exclude members and requires us to validate anyway, there's a simple way around this: export the dummy method.
Instead of,

//A Node is one of (list of types).
type Node interface { node() }

write

//A Node is only valid if it is defined in this package.
type Node interface { 
  //Node is a dummy method that signifies that a type is a Node.
  Node()
}

We can't stop anyone from satisfying Node so we may as well let them know what does. While this doesn't make it clear at a glance which types satisfy Node (no central list), it does make it clear whether the particular type you're looking at now satisfies Node.

This pattern is useful when the majority of the types in the sum are defined in the same package. When none are, the common recourse is to fall back to interface{}, like json.Token or driver.Value. We could use the previous pattern with wrapper types for each but in the end it says as much as interface{} so there is little point. If we expect such values to come from outside the package, we can be courteous and define a factory:

//Sum is one of int64, float64, or bool.
type Sum interface{}
func New(v interface{}) (Sum, error) {
  switch v.(type) {
  case nil:
    return nil, errors.New("pkg: Sum must be non-nil")
  case int64, float64, bool:
    return v, nil
  }
  return nil, fmt.Errorf("pkg: %T is not a valid member of Sum", v)
}

A common use of sums is for optional types, where you need to differentiate between "no value" and "a value that may be zero". There are two ways to do this.

*T lets you signify no value as a nil pointer and a (possibly) zero value as the result of dereferencing a non-nil pointer.

Like the previous interface-based approximations, and the various proposals for implementing sum types as interfaces with restrictions, this requires an extra pointer dereference and a possible heap allocation.

For optionals this can be avoided using the technique from the sql package

type OptionalT struct {
  Valid bool
  Value T
}

The major downside of this is that it allows encoding invalid state: Valid can be false and Value can be non-zero. It's also possible to grab Value when Valid is false (though this can be useful if you want the zero T if it was not specified). Casually setting Valid to false without zeroing Value, followed by setting Valid to true (or ignoring it) without assigning Value, causes a previously discarded value to accidentally resurface. This can be worked around by providing setters and getters to protect the invariants of the type.
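A sketch of that accessor approach (names hypothetical): with the fields unexported, the invariant "value is zero whenever not valid" can't be broken from outside the package.

type OptionalInt struct {
  valid bool
  value int
}

// Set stores a value and marks it present.
func (o *OptionalInt) Set(v int) { o.valid, o.value = true, v }

// Clear zeroes the value so a discarded value can't resurface.
func (o *OptionalInt) Clear() { *o = OptionalInt{} }

// Get returns the value and whether one is present.
func (o OptionalInt) Get() (int, bool) { return o.value, o.valid }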

The simplest form of sum types is when you care about the identity, not the value: enumerations.

The traditional way to handle this in Go is const/iota:

type Enum int
const (
  A Enum = iota
  B
  C
)

Like the OptionalT type this doesn't have any unnecessary indirection. Like the interface sums, it doesn't limit the domain: there are only three valid values and many invalid values, so we need to validate at runtime. If there are exactly two values we can use bool.

There's also the issue of the fundamental number-ness of this type. A+B == C. We can convert untyped integral constants to this type a bit too easily. There are plenty of places where that's desirable, but we get this no matter what. With a little extra work, we can limit this to just identity:

type Enum struct { v int }
var (
  A = Enum{0}
  B = Enum{1}
  C = Enum{2}
)

Now these are just opaque labels. They can be compared but that's it. Unfortunately we've now lost const-ness, but we could get that back with a little more work:

func A() Enum { return Enum{0} }
func B() Enum { return Enum{1} }
func C() Enum { return Enum{2} }

We've again made it impossible for an external user to alter the names, at the cost of some boilerplate and some function calls that are highly inline-able.

However, this is in some ways nicer than the interface sums since we've almost fully closed the type. External code can only use A(), B(), or C(). They can't swap the labels around like in the var example and they can't do A() + B() and we're free to define whatever methods we want on Enum. It would still be possible for code in the same package to erroneously create or modify a value, but if we take care to ensure that does not happen, this is the first sum type that does not require validation code: if it exists, it's valid.

Sometimes you have many labels, some of them carry additional data, and the ones that do carry the same kind of data. Say you have a value with three valueless states (A, B, C), two with a string value (D, E), and one with a string value and an int value (F). We could use a number of combinations of the above tactics, but the simplest way is

type Value struct {
  Which int // could have consts for A, B, C, D, E, F
  String string
  Int int
}

This is a lot like the OptionalT type above, but instead of a bool it has an enumeration and there are multiple fields that can be set (or not) depending on the value of Which. Validation has to be careful that these are set (or not) appropriately.
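
As a sketch, assuming the consts A through F for Which mentioned above (and the fmt package for errors), that validation would look something like:

func (v Value) validate() error {
  switch v.Which {
  case A, B, C: // valueless: no payload fields may be set
    if v.String != "" || v.Int != 0 {
      return fmt.Errorf("pkg: payload set for valueless state %v", v.Which)
    }
  case D, E: // string-only: Int must be zero
    if v.Int != 0 {
      return fmt.Errorf("pkg: Int set for string-only state %v", v.Which)
    }
  case F: // both String and Int are in use
  default:
    return fmt.Errorf("pkg: %v is not a valid state", v.Which)
  }
  return nil
}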

There are lots of ways to kinda express "one of the following" in Go. Some require more care than others. They often require validating the "one of" invariant at runtime or paying for extraneous dereferences. A major downside they all share is that, since they're being simulated in the language instead of being a part of it, the "one of" invariant doesn't show up in reflect or go/types, making it hard to metaprogram with them. To use them in metaprogramming you need both to recognize and validate the correct flavor of sum and to be told that that's what you're looking at, since they all look a lot like valid code without the "one of" invariant.

If sum types were a part of the language, they could be reflected upon and easily pulled out of source code, resulting in better libraries and tooling. The compiler could make a number of optimizations if it were aware of that "one of" invariant. Programmers could focus on the important validation code instead of the trivial maintenance of checking that a value is indeed in the correct domain.

Constructs like this let us say that a type is in a sum, but it does not let us say what is not in that sum. In addition to the mandatory nil case, the same embedding trick can be used by external packages like
[…]
Within the package we have to take care to validate values that compile but are illegal.

Why? As a package author, this seems firmly in the realm of "your problem" to me. If you pass me an io.Reader whose Read method panics, I'm not going to recover from that; I'll just let it panic. Likewise, if you go out of your way to create an invalid value of a type I declared, who am I to argue with you? I.e. I consider "I embedded an emulated closed sum" a problem that rarely (if ever) comes up by accident.

That being said, you can prevent that problem by changing the interface to type Sum interface { sum() Sum } and having every value return itself. That way, you can just use the return of sum(), which will be well-behaved even under embedding.
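
A minimal sketch of that construction, with hypothetical members A and B:

type Sum interface{ sum() Sum }

type A struct{ X int }
func (a A) sum() Sum { return a }

type B string
func (b B) sum() Sum { return b }

func use(s Sum) {
  // Unwrap first: even if s is a foreign struct embedding an A, the
  // promoted method returns the embedded A value itself.
  switch v := s.sum().(type) {
  case A:
    _ = v.X
  case B:
    _ = v
  }
}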

One of the more common complaints about this pattern is that it does not make membership in the sum clear in godoc.

This may help you.

The major downside of this is that it allows encoding invalid state: Valid can be false and Value can be non-zero.

This isn't an invalid state to me. Zero values aren't magical. There is no difference, IMO, between sql.NullInt64{false,0} and NullInt64{false,42}. Both are valid and equivalent representations of an SQL NULL. If all code checks Valid before using Value, the difference is not observable to a program.

It's a fair and correct criticism that the compiler does not enforce doing this check (which it probably would, for "real" optionals/sum types), making it easier to not do it. But if you do forget it, I wouldn't consider it any better to accidentally use a zero value than to accidentally use a non-zero value (with the possible exception of pointer-shaped types, as they'd panic when used, thus failing loudly - but for those, you should just use the bare pointer-shaped type anyway and use nil as "unset").

There's also the issue of the fundamental number-ness of this type. A+B == C. We can convert untyped integral constants to this type a bit too easily.

Is this a theoretical concern or has it come up in practice?

Programmers could focus on the important validation code instead of the trivial maintenance of checking that a value is indeed in the correct domain.

Just FTR, in the cases where I do use sum-types-as-sum-types (i.e. the problem can't be more elegantly modeled via garden-variety interfaces) I never write any validation code. Just like I don't check for nil-ness of pointers passed as receivers or arguments (unless nil is documented as a valid variant). In the places where the compiler forces me to deal with that (i.e. "no return at end of function" style problems), I panic in the default case.

Personally, I consider Go a pragmatic language, which doesn't just add safety-features for their own sake or because "everyone knows they are better", but based on demonstrated need. I think using it in a pragmatic way is thus fine.

The standard means is an interface with an unexported, do-nothing method as a tag.

There's a fundamental difference between interfaces and sum types (I didn't see it mentioned in your post). When you approximate a sum type via an interface, there's really no way to handle the value. As the consumer, you have no idea what it actually holds, and can only guess. This is no better than just using an empty interface. Its only usefulness is if any implementation can only come from the same package that defines the interface, since only then can you control what you can get.

On the other hand, having something like:

func foo(val string|int|error) {
    switch v := val.(type) {
    case string:
        ...
    }
}

Gives the consumer full power in using the value of the sum type. Its value is concrete, not open to interpretation.

@Merovius
These "open sums" you mention have what some people might classify as a significant drawback, in that they would allow abusing them for "feature creep". This very reason has been given for why optional function arguments have been rejected as a feature.

These "open sums" you mention have what some people might classify as a significant drawback, in that they would allow abusing them for "feature creep". This very reason has been given for why optional function arguments have been rejected as a feature.

That seems like a pretty weak argument to me - if nothing else, then because they exist, so you are already allowing whatever they enable. Indeed, we already have optional arguments, for all intents and purposes (not that I like that pattern, but it clearly already is possible in the language).

There's a fundamental difference between interfaces and sum types (I didn't see it mentioned in your post). When you approximate a sum type via an interface, there's really no way to handle the value. As the consumer, you have no idea what it actually holds, and can only guess.

I've tried parsing this a second time and still can't. Why wouldn't you be able to use them? They can be regular, exported types. Yes, they have to be types created in your package (obviously), but apart from that there does not seem to be any restriction in how you can use them, compared to actual, closed sums.

I've tried parsing this a second time and still can't. Why wouldn't you be able to use them? They can be regular, exported types. Yes, they have to be types created in your package (obviously), but apart from that there does not seem to be any restriction in how you can use them, compared to actual, closed sums.

What happens in the case when the dummy method is exported and any third party can implement the "sum type"? Or the quite realistic scenario where a team member is not familiar with the various consumers of the interface, decides to add another implementation in the same package, and an instance of that implementation winds up being passed to these consumers through various paths in the code? At the risk of repeating my apparently "unparseable" statement: "As the consumer, you have no idea what [the sum value] actually holds, and can only guess." You know, since it's an interface, and it doesn't tell you who's implementing it.

@Merovius

Just FTR, in the cases where I do use sum-types-as-sum-types (i.e. the problem can't be more elegantly modeled via garden-variety interfaces) I never write any validation code. Just like I don't check for nil-ness of pointers passed as receivers or arguments (unless nil is documented as a valid variant). In the places where the compiler forces me to deal with that (i.e. "no return at end of function" style problems), I panic in the default case.

I don't treat this as an always or never thing.

If someone passing bad input would immediately explode, I don't bother with validation code.

But if someone passing bad input might eventually cause a panic that won't show up for a while, then I write validation code so that the bad input is flagged as soon as possible and no one has to figure out that the error was introduced 150 frames up in the call stack (especially since they may then have to go up another 150 frames to figure out where that bad value was introduced).

Spending half a minute now to potentially save a half hour of debugging later is pragmatic. Especially for me since I make dumb mistakes all the time and the sooner I get schooled the sooner I can move on to make the next dumb mistake.

If I have a func that takes a reader and immediately starts using it, I won't check for nil, but if the func is a factory for a struct that won't call the reader until a certain method is invoked, I'll check it for nil and panic or return an error with something like "reader must not be nil" so that the cause of the error is as close to the source of the error as possible.
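
A minimal sketch of that factory-style check (Parser is hypothetical; the errors and io imports are assumed):

func NewParser(r io.Reader) (*Parser, error) {
  if r == nil {
    // Fail close to the source of the error rather than panicking
    // later, when a method first uses the reader.
    return nil, errors.New("parser: reader must not be nil")
  }
  return &Parser{r: r}, nil
}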

godoc -analysis

I'm aware but I don't find it useful. It ran for 40 minutes on my workspace before I hit ^C and that needs to be refreshed every time a package is installed or modified. There's #20131 (forked from this very thread!) though.

That being said, you can prevent that problem, by changing the interface to type Sum interface { sum() Sum } and have every value return itself. That way, you can just use the return of sum(), which will be well-behaved even under embedding.

I haven't found that approach all that useful. It doesn't provide any more benefits than explicit validation, and it provides less validation.

Is [the fact that you can add members of a const/iota enumeration] a theoretical concern or has it come up in practice?

That particular one was theoretical: I was trying to list all the pros and cons I could think of, theoretical and practical. My larger point, though, was that there were many ways to try to express the "one of" invariant in the language that do get used fairly commonly but none as simple as just having it be a kind of type in the language.

Is [the fact that you can assign an untyped integral to a const/iota enumeration] a theoretical concern or has it come up in practice?

That one has come up in practice. It didn't take long to figure out what went wrong but it would have taken even less time if the compiler had said "there, that line—that's the one that's wrong". There's talk of other ways of handling that particular case, but I don't see how they'd be of general use.

This isn't an invalid state to me. Zero values aren't magical. There is no difference, IMO, between sql.NullInt64{false,0} and NullInt64{false,42}. Both are valid and equivalent representations of an SQL NULL. If all code checks Valid before using Value, the difference is not observable to a program.

It's a fair and correct criticism that the compiler does not enforce doing this check (which it probably would, for "real" optionals/sum types), making it easier to not do it. But if you do forget it, I wouldn't consider it any better to accidentally use a zero value than to accidentally use a non-zero value (with the possible exception of pointer-shaped types, as they'd panic when used, thus failing loudly - but for those, you should just use the bare pointer-shaped type anyway and use nil as "unset").

That "If all code checks Valid before using Value" is where the bugs slip in and what the compiler could enforce. I have had bugs like that happen (albeit with larger versions of that pattern, where there were more than one value field and more than two states for the discriminator). I believe/hope I found all of these during development and testing and none escaped into the wild, but it would be nice if the compiler could have just told me when I made that mistake and I could be sure that the only way one of these slipped past was if there was a bug in the compiler, the same way it would tell me if I tried to assign a string to a variable of type int.

And, sure, I prefer *T for optional types though that does have non-zero costs associated with it, both in execution spacetime and in the readability of the code.

(For that particular example the code to get the actual value or the correct zero value with the pick proposal would be v, _ := nullable.[Value] which is concise and safe.)

That is very much not what I would want. Pick types should be value types, as in Rust. Their first word should be a pointer to GC Metadata, if needed.

Otherwise their use comes with a performance penalty that might be unacceptable. For me, the pass […]

Josh Bleecher Snyder wrote:

With the pick proposal you can choose to have a p or *p, giving you greater control over memory trade-offs.

The reason interfaces allocate to store scalar values is so you don't have to read a type word in order to decide whether the other word is a pointer; see #8405 https://github.com/golang/go/issues/8405 for discussion. The same implementation considerations would likely apply for a pick type, which might mean in practice that p ends up allocating and being non-local anyway.



@urandom

What happens in the case when the dummy method is exported and any third party can implement the "sum type"?

There is a difference between the method being exported and the type being exported. We seem to be talking past each other. To me, this seems to work just fine, without any difference between open and closed sums:

type X interface { x() X }
type IntX int
func (v IntX) x() X { return v }
type StringX string
func (v StringX) x() X { return v }
type StructX struct{
    Foo bool
    Bar int
}
func (v StructX) x() X { return v }

There's no extension outside the package possible, yet consumers of the package can use, create and pass around the values just like any other.

You can embed X, or one of the local types that satisfy it, externally and then pass it to a function in your package that takes an X.

If that func calls x it either panics (if X itself was embedded and not set to anything) or returns a value that your code can operate on—but it's not what was passed by the caller, which would be a bit surprising to the caller (and their code is already suspect if they're attempting something like this because they didn't read the docs).

Calling a validator that panics with a "don't do that" message seems like the least surprising way to handle that and lets the caller fix their code.

If that func calls x it either panics […] or returns a value that your code can operate on—but it's not what was passed by the caller, which would be a bit surprising to the caller

Like I said above: If you are surprised that your intentional construction of an invalid value is invalid, you need to rethink your expectations. But in any case, that is not what this particular strain of discussion was about, and it would be helpful to keep separate arguments separate. This one was about @urandom saying that open sums via interfaces with tag-methods wouldn't be introspectable or usable by other packages. I find that a dubious claim; it would be great if it could be clarified.

The problem is that someone can create a type that is not in the sum that compiles and can be passed to your package.

Without adding proper sum types to the language, there are three options for handling it

  1. ignore the situation
  2. validate and panic/return an error
  3. try to "do what you mean" by implicitly extracting the embedded value and using it

3 seems like a strange mix of 1 and 2 to me: I don't see what it buys.

I agree that "If you are surprised that your intentional construction of an invalid value is invalid, you need to rethink your expectations", but, with 3, it can be very hard to notice that something has gone wrong, and even when you do, it'd be hard to figure out why.

2 seems best because it both protects the code from slipping into an invalid state and sends up a flare if someone messes up letting them know why they're wrong and how to correct it.

Am I misunderstanding the intent of the pattern or are we just approaching this from different philosophies?

@urandom I'd also appreciate clarification; I'm not 100% sure on what you're trying to say, either.

The problem is that someone can create a type that is not in the sum that compiles and can be passed to your package.

You can always do that; if in doubt, you could always use unsafe, even with compiler-checked sum types (and I don't see that as a qualitatively different way of constructing invalid values from embedding something that is clearly intended as a sum and not initializing it to a valid value). The question is "how often will this pose a problem in practice and how severe will that problem be". In my opinion, with the solution from above the answer is "pretty much never and very low" - you apparently disagree, which is fine. But either way, there doesn't seem to be much of a point laboring over this - the arguments and views on both sides of this particular point should be sufficiently clear and I'm trying to avoid too much noisy repetition and focus on the genuinely new arguments. I brought up above construction to demonstrate that there is no difference in exportability between first-class sum types and emulated-sums-via-interfaces. Not to show that they are strictly better in every way.

if in doubt, you could always use unsafe, even with compiler-checked sum types (and I don't see that as a qualitatively different way of constructing invalid values from embedding something that is clearly intended as a sum and not initializing it to a valid value).

I think it is qualitatively different: when people misuse embedding in this way (at least with proto.Message and the concrete types that implement it), they're generally not thinking about whether it is safe and what invariants it might break. (Users assume that interfaces completely describe the required behaviors, but when interfaces are employed as union or sum types they often do not. See also https://github.com/golang/protobuf/issues/364.)

In contrast, if someone uses package unsafe to set a variable to a type to which it cannot normally refer, they're more-or-less explicitly claiming to have at least thought about what they might break and why.

@Merovius Perhaps I've been unclear: the fact that the compiler would tell someone they used embedding wrong is more of a nice side benefit.

The largest gain of the safety feature is that it would be honored by reflect and represented in go/types. That gives tooling and libraries more information to work with. There are lots of ways to simulate sum types in Go, but they're all identical to non-sum-type code, so tooling and libraries need out-of-band info to know that something is a sum type, and they have to be able to recognize the specific pattern being used; even then, those patterns allow significant variation.

It would also make unsafe the only way to create an invalid value: now you have regular code, generated code, and reflect—the latter two being more likely to cause an issue as unlike a person they cannot read the documentation.

Another side benefit of the safety is that the compiler has more information and can generate better, faster code.

There's also the fact that, in addition to replacing pseudo-sums built on interfaces, you could replace pseudo-sums over "one of these regular types", like json.Token or driver.Value. Those are few and far between, but it would be one less place where interface{} is necessary.

It would also make unsafe the only way to create an invalid value

I don't think I understand the definition of "invalid value" that leads to this statement.

@neild if you had

var v pick {
  None struct{}
  A struct { X int; Y *T}
  B int
}

it would be laid out in memory like

struct {
  activeField int //which of None (0), A (1), or B (2) is the current field
  theInt int // If None always 0
  thePtr *T // If None or B, always nil
}

and with unsafe you could set thePtr even if activeField was 0 or 2 or set a value of theInt even if activeField was 0.

In either case this would invalidate assumptions the compiler would be making and allows the same kind of theoretical bugs that we can have today.

But as @bcmills pointed out if you're using unsafe you'd better know what you're doing because it's the nuclear option.

What I don't understand is why unsafe is the only way to create an invalid value.

var t time.Timer

t is an invalid value; t.C is unset, calling t.Stop will panic, etc. No unsafe required.

Some languages have type systems which go to great lengths to prevent the creation of "invalid" values. Go is not one of them. I don't see how unions move that needle significantly. (There are other reasons to support unions, of course.)

@neild yes sorry I'm being loose with my definitions.

I should have said invalid with respect to the invariants of the sum type.

The individual types in the sum can of course be in an invalid state.

However, maintaining the sum type invariants means they're accessible to reflect and go/types as well as to the programmer, so manipulating them in libraries and tools maintains that safety and provides more information to the metaprogrammer.

@jimmyfrasche, I'm saying that unlike a sum type, which tells you every possible type it can be, an interface is opaque in that you don't know, or at least can't be sure, what the list of types that implement the interface is. This makes writing the switch portion of the code a bit of guesswork:

func F(sum SumInterface) {
    switch v := sum.(type) {
    case Screwdriver:
        ...
    default:
        panic("someone implementing a new type which gets passed to F and causes a runtime panic 3 weeks into production")
    }
}

So, it would seem to me that most of the issues people are having with the interface-based sum-type emulation can be solved by tooling and/or convention. E.g. if an interface contains an unexported method, it would be trivial to figure out all possible implementations (yes, modulo intentional circumventions). Similarly, to address most of the issues with iota-based enums, a simple convention of "an enum is a type Foo int with a declaration of the form const ( FooA Foo = iota; FooB; FooC )" would enable writing extensive and precise tools for them too.

Yes, this isn't equivalent to actual sum types (among other things, they wouldn't get first-class reflect support, though I don't really understand how important that would be anyway), but it does mean that the existing solutions appear, from my POV, better than they are often painted. And IMO it would be worth exploring that design space before actually putting them into Go 2 - at least if they really are that important to people.

(and I want to re-emphasize that I'm aware of the advantages of sum types, so there's no need to restate them for my benefit. I just don't weigh them as heavily as other people, also see the disadvantages and thus come to different conclusions on the same data)

@Merovius that's a fine position.

The reflect support would allow libraries as well as off-line tools—linters, code generators, etc.—to access the information and to disallow modifying it inappropriately, which cannot be detected statically with any precision.

Regardless, it's a fair idea to explore, so let's explore it.

To recap the most common families of pseudosums in Go are: (roughly in order of occurrence)

  • const/iota enum.
  • Interface with tag method for sum over types defined in same package.
  • *T for an optional T
  • struct with an enum whose value determines what fields may be set (when the enum is a bool and there's only one other field this is another kind of optional T)
  • interface{} that's restricted to a grab bag of a finite set of types.

All of those can be used for both sum types and non-sum types. The first two are so rarely used for anything else that it might make sense to just assume they represent sum types and accept the occasional false positive. For interface sums, detection could be limited to an unexported method with no params, no returns, and no body on any member. For enums it would make sense to only recognize them when they're just Type = iota, so the tool isn't tripped up when iota is used as part of an expression, as illustrated below.
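
Concretely, the recognizer might accept the first declaration below and skip the second, where iota appears inside an expression (a sketch of the heuristic, not a hard rule):

// recognized as an enum
type Weekday int
const (
  Sunday Weekday = iota
  Monday
  Tuesday
)

// skipped: iota is part of an expression (a bitset, not an enum)
type Flag uint
const (
  FlagA Flag = 1 << iota
  FlagB
)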

*T for an optional T would be really hard to distinguish from a regular pointer. This could be given the convention type O = *T. That would be possible to detect, though a bit difficult since the alias name isn't part of the type. type O *T would be easier to detect but harder to work with in code. On the other hand everything that needs to be done is essentially built into the type so there's little to be gained in tooling from recognizing this. Let's just ignore this one. (Generics would likely allow something along the lines of type Optional(T) *T which would simplify "tagging" these).

The struct with an enum would be hard for tooling to reason about: which fields go with which value of the enum? We could simplify this with the convention that there must be one field per member of the enum and that each enum constant and its corresponding field must share a name, for example:

type Which int
const (
  A Which = iota
  B
  C
)
type Sum struct {
  Which
  A struct{} // has to be included to line up with the value of Which
  B struct { X int; Y float64 }
  C struct { X int; Y int } 
}

That wouldn't get optional types but we could special case "2 fields, first is bool" in the recognizer.

Using an interface{} for a grab bag sum would be impossible to detect without a magic comment like //gosum: int, float64, string, Foo

Alternately, there could be a special package with the following definitions:

package sum
type (
  Type struct{}
  Enum int
  OneOf interface{}
)

and only recognize enums if they're of the form type MyEnum sum.Enum, only recognize interfaces and structs if they embed sum.Type, and only recognize interface{} grab bags like type GrabBag sum.OneOf (though that would still need a machine-recognizable comment to list its members). That would have the following pros and cons:

Pros

  • explicit in the code: if it is so marked, it is 100% a sum type, no false positives
  • those definitions could have documentation explaining what they mean, and the package documentation could link to tools that can be used with these types
  • some would have some visibility in reflect

Cons

  • Plenty of false negatives from old code and the stdlib (which would not be using them).
  • They would have to be used to be useful, so adoption would be slow and likely never reach 100%, and the effectiveness of tools that recognize this special package would be a function of that adoption; an interesting thought experiment, but likely unrealistic.

Regardless of which of those two ways are used to identify sum types, let's assume that they were recognized and move on to using that information to see what kind of tooling we can build.

We can roughly group tooling into generative (like stringer) and introspective (like golint).

The simplest generative code would be a tool to fill in a switch statement with missing cases. This could be used by editors. Once a sum type is identified as a sum type this is trivial (a bit tiresome but the actual generation logic is going to be the same with or without language support).
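
For example, for a sum known to have members A, B, and C, the tool might emit a skeleton like this for the editor to fill in (member names hypothetical):

switch v := s.(type) {
case A:
  // TODO
case B:
  // TODO
case C:
  // TODO
default:
  panic(fmt.Sprintf("unexpected member %T", v))
}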

In all cases it would be possible to generate a function that validates the "one of" invariant.

For enums there could be more tools like stringer. In https://github.com/golang/go/issues/19814#issuecomment-291002852 I mentioned some possibilities.

The biggest generative tool is the compiler which could produce better machine code with this info, but ah well.

I can't think of any others at the moment. Is there anything on anyone's wish list?

For introspection, the obvious candidate is exhaustiveness linting. Without language support there are actually two different kinds of linting required

  1. making sure all possible states are handled
  2. making sure no invalid states are created (which would invalidate the work done by 1)

1 is trivial, but it would require handling all possible states plus a default case, because 2 can't be verified 100% (even ignoring unsafe) and you can't expect all code using your code to run this linter anyway.

2 couldn't really follow values through reflect or identify all code that could generate an invalid state for the sum but it could catch a lot of simple errors, like if you embed a sum type and then call a func with it, it could say "you wrote pkg.F(v) but you meant pkg.F(v.EmbeddedField)" or "you passed 2 to pkg.F, use pkg.B". For the struct it couldn't do much to enforce the invariant that one field is set at a time except in really obvious cases like "you're switching on Which and in the case X you set the field F to a non-zero value". It could insist that you use the generated validation function when accepting values from outside the package.

The other big thing would be showing up in godoc. godoc already groups const/iota and #20131 would help with the interface pseudosums. There's not really anything to do with the struct version that isn't explicit in the definition other than to specify the invariant.

as well as off-line tools—linters, code generators, etc.

No. The static information is present, you don't need the type-system (or reflect) for that, convention works fine. If your interface contains unexported methods, any static tool can choose to treat that as a closed sum (because it effectively is) and do any analysis/codegen you might want. Likewise with the convention of iota-enums.

reflect is for runtime type information - and in a sense, the compiler erases the necessary info to make sums-by-convention work here (as it doesn't give you access to a list of functions or declared types or declared consts), which is why I agree that actual sums enable this.

(also, FTR, depending on the use case, you could still have a tool that uses the statically known information to generate the necessary runtime-information - e.g. it could enumerate the types which have the required tag-method and generate a lookup-table for them. But I don't understand what a use-case would be, so it's hard to evaluate the practicality of this).

So, my question was intentionally: What would the use case be, of having this info available at runtime?

Regardless, it's a fair idea to explore, so let's explore it.

When I said "explore it", I did not mean "enumerate them and argue about them in a vacuum", I meant "implement tools that use these conventions and see how useful/necessary/practical they are".

The advantage of experience reports is that they are based on experience: you needed to do a thing, you tried to use existing mechanisms for it, and you found that they didn't suffice. This focuses the discussion on the actual use-case (as in "the case it was used in"), makes it possible to evaluate any proposed solution against it and against the tried alternatives, and shows how a solution would avoid the same pitfalls.

You are skipping the "trying to use existing mechanisms for that" part. You want static exhaustiveness-checks of sums (the problem). Write a tool that finds interfaces with unexported methods, does the exhaustiveness-checks for any type-switch they're used in, and use that tool for a while (use the existing mechanisms for it). Write up where it failed.

I was thinking out loud and have begun work on a static recognizer based on those thoughts that tools may use. I was, I suppose, implicitly looking for feedback and more ideas (and that paid off re generating the info necessary for reflect).

FWIW, if I were you I'd simply ignore the complex cases and focus on the things that work: a) unexported methods in interfaces and b) simple const-iota enums that have int as an underlying type and a single const declaration of the expected format. Using a tool would require using one of these two workarounds, but IMO that's fine (to use the compiler tool, you'd also need to explicitly use sums, so that seems okay).
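
For the first convention, a check along these lines would be a starting point (a sketch using go/types; it treats any interface with an unexported method as closed):

import "go/types"

// isClosedSum reports whether iface declares an unexported method and
// is therefore closed to packages that cannot declare that method.
func isClosedSum(iface *types.Interface) bool {
  for i := 0; i < iface.NumMethods(); i++ {
    if !iface.Method(i).Exported() {
      return true
    }
  }
  return false
}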

That's definitely a good place to start, and it can be dialed in after running it over a large set of packages and seeing how many false positives/negatives there are.

https://godoc.org/github.com/jimmyfrasche/closed

Still very much a work in progress. I can't promise I won't have to add extra parameters to the constructor. It probably has more bugs than tests. But it's good enough to play with.

There's an example of usage in cmds/closed-explorer, which will also list all closed types detected in a package specified by its import path.

I started just detecting all interfaces with unexported methods but they're fairly common and while some were clearly sum types others clearly weren't. If I just limited it to the empty tag method convention, I lost a lot of sum types, so I decided to record both separately and generalize the package a little bit beyond sum types to closed types.

With enums I went the other way and just recorded every non-bitset const of a defined type. I plan to expose the discovered bitsets, too.

It doesn't detect optional structs or defined empty interfaces yet since they'll require some kind of marker comment, but it does special case the ones in the stdlib.

I started just detecting all interfaces with unexported methods but they're fairly common and while some were clearly sum types others clearly weren't.

I would find it helpful if you could provide some of the examples that weren't.

@Merovius sorry I didn't keep a list. I found them by running stdlib.sh (in cmds/closed-explorer). If I run across a good example next time I get to play with this I'll post it.

The ones that I'm not considering as sum types were all unexported interfaces that were being used to plug in one of several implementations: nothing cared what was in the interface, just that there was something that satisfied it. They were very much being used as interfaces not sums, but just happened to be closed because they were unexported. Perhaps that's a distinction without a difference, but I can always change my mind after further investigation.

@jimmyfrasche I'd argue those should be properly treated as closed sums. I'd argue that if they don't care about the dynamic type (i.e. only calling the methods in the interface), then a static linter wouldn't complain, as "all switches are exhaustive" - so there's no downside to treating them as closed sums. If, OTOH, they do sometimes type-switch and leave out a case, complaining would be correct - that would exactly be the kind of thing the linter is supposed to catch.

I'd like to put in a good word for exploring how union types could reduce memory usage. I'm writing an interpreter in Go and have a Value type that is necessarily implemented as an interface because Values can be pointers to different types. This presumably means a []Value takes up twice as much memory compared to packing the pointer with a small bit tag as you could do in C. It seems like a lot?

The language spec needn't mention this, but it seems like cutting memory usage of an array in half for some small union types could be a pretty compelling argument for unions? It lets you do something that as far as I know is impossible to do in Go today. By contrast, implementing unions on top of interfaces could help with program correctness and understandability, but doesn't do anything new at the machine level.

I haven't done any performance testing; just pointing out a direction for research.

You can implement a Value as an unsafe.Pointer instead.


@skybrian That seems pretty presumptuous regarding the implementation of sum types. It not only requires sum-types, but also that the compiler recognizes the special case of only pointers in a sum and optimizes them as a packed pointer - and it requires the GC to be aware of how many tag-bits are required in the pointer, to mask them out. Like, I don't really see these things happening, TBH.

That leaves you with: sum types will probably be tagged unions and will probably take up just as much space in a slice as now. Unless the slice is homogeneous, but then you can also use a more specific slice type right now.

So yeah. In very special cases, you might be able to save a bit of memory, if you specifically optimize for them, but it would seem you can also just manually optimize for that, if you actually need it.

@DemiMarie unsafe.Pointer doesn't work on App Engine, and in any case, it's not going to let you pack bits without messing up the garbage collector. Even if it were possible, it wouldn't be portable.

@Merovius yes, it does require changing the runtime and garbage collector to understand packed memory layouts. That's kind of the point; pointers are managed by the Go runtime, so if you want to do better than interfaces in a safe way, you can't do it in a library, or in the compiler.

But I'll readily admit that writing a fast interpreter is an uncommon use case. Perhaps there are others? It seems like a good way to motivate a language feature is to find things that can't easily be done in Go today.

That is true.

My thought is that Go is not the best language to write an interpreter in, due to the wildly dynamic nature of such software. If you need high performance, your hot loops ought to be written in assembly. Is there some reason you need to write an interpreter that works on App Engine?


I find the @rogpeppe proposal quite appealing. I also wonder if there's potential to unlock additional benefits to go along with those already identified by @griesemer.

The proposal says: "The method set of the sum type holds the intersection of the method set of all its component types, excluding any methods that have the same name but different signatures.".

But a type is more than just a method set. What if the sum type supported the intersection of the operations supported by its component types?

For example, consider:

var x int|float64

The idea being that the following would work.

x += 5

It would be equivalent to writing out the full type switch:

switch i := x.(type) {
case int:
    x = i + 5
case float64:
    x = i + 5
}

Another variant involves a type switch where a component type is itself a sum type.

type Num int | float64
type StringOrNum string | Num 
var x StringOrNum

switch i := x.(type) {
case string:
    // Do string stuff.
case Num:
    // Would be nice if we could use i as a Num here.
}

Also, I think there's potentially a really nice synergy between sum types and a generics system that uses type constraints.

var x int|float64

What about var x, y int | float64? What are the rules here, when adding these? Which lossy conversion gets made (and why)? What will be the result type?

Go doesn't do automatic conversions in expressions (like C does) on purpose - these questions aren't easy to answer and that lead to bugs.

And for even more fun:

var x, y, z int|string|rune
x = 42
y = 'a'
z = "b"
fmt.Println(x + y + z)
fmt.Println(x + z + y)
fmt.Println(y + x + z)
fmt.Println(y + z + x)
fmt.Println(z + x + y)
fmt.Println(z + y + x)

All of int, string and rune have a + operator; what is the above printing, why and most of all, how can the result not be completely confusing?

What about var x, y int | float64? What are the rules here, when adding these? Which lossy conversion gets made (and why)? What will be the result type?

@Merovius no lossy conversion gets implicitly made, although I can see how my wording could give that impression, sorry. Here, a simple x + y would not compile, because it implies a possible implicit conversion. But either of the following would compile:

z = int(x) + int(y)
z = float64(x) + float64(y)

Similarly your xyz example would not compile because it requires possible implicit conversions.

I think "supported the intersection of the operations supported" sounds nice but doesn't quite convey what I was intending. Adding something like "compiles for all component types" helps describe how I think it could work.

Another example is if all the component types are slices and maps. Would be nice to be able to call len on the sum type without needing a type switch.

All of int, string and rune have a + operator; what is the above printing, why and most of all, how can the result not be completely confusing?

Just wanted to add that my "What if the sum type supported the intersection of the operations supported by its component types?" was inspired by the Go Spec's description of a type as "A type determines a set of values together with operations and methods specific to those values.".

The point I was trying to make is that a type is more than just values and methods, and thus a sum type could try to capture the commonality of that other stuff from its component types. This "other stuff" is more nuanced than just a set of operators.

Another example is comparison to nil:

var x []int | []string
fmt.Println(x == nil)  // Prints true
x = []string(nil)
fmt.Println(x == nil)  // Still prints true

At least one component type is comparable to nil, so we allow the sum type to be compared to nil without a type switch. Of course this is somewhat at odds with how interfaces currently behave, but that might not be a bad thing per https://github.com/golang/go/issues/22729

Edit: equality testing is a bad example here as I think it should be more permissive, and only require a potential match from one or more component types. Mirrors assignment in that respect.

The problem is that the result will either a) have the same problems that automatic conversions have or b) be extremely (and IMO confusingly) limited in scope - namely, all the operators would only work with untyped literals, at best.

I also have another issue, which is that allowing that will even further limit their robustness against evolution of their constituent types - now the only types you could ever add while preserving backwards compatibility are ones which allow all the operations of their constituent types.

All of this just seems really messy to me, for a very small (if any) tangible benefit.

now the only types you could ever add while preserving backwards compatibility are ones which allow all the operations of their constituent types.

Oh, and to be explicit about this one too: it implies that you can never decide to extend a parameter or return type or variable or… from a singleton type to a sum, because adding any new type will make some operations (like assignments) fail to compile.

@Merovius note that a variant of the compatibility issue already exists with the original proposal, because "The method set of the sum type holds the intersection of the method set of all its component types". So if you add a new component type that doesn't implement that method set, that'll be a non-backwards-compatible change.

Oh, and to be explicit about this one too: it implies that you can never decide to extend a parameter or return type or variable or… from a singleton type to a sum, because adding any new type will make some operations (like assignments) fail to compile.

Assignment behavior would remain as described by @rogpeppe but overall I'm not sure I understand this point.

If nothing else, I think the original rogpeppe proposal needs to be clarified regarding the behavior of the sum type outside of a type switch. Assignment and method set are covered, but that's all. What about equality? I think we can do better than what interface{} does:

var x int | float64
fmt.Println(x == "hello")  // compilation error?
x = 0.0
fmt.Println(x == 0) // true or false?  I vote true :-)

So if you add a new component type that doesn't implement that method set, then that'll be a non backwards compatible change.

You can always add methods, but you can't overload operators to work on new types. Which is precisely the difference - in their proposal, you can only call the common methods on a sum-value (or assign to it), unless you unwrap it with a type-assertion/-switch. Thus, as long as the type you add has the necessary methods, it would not be a breaking change. In your proposal, it still would be a breaking change, because users might use operators you can't overload.

(you might want to point out that adding types to the sum would still be a breaking change, because type-switches would not have the new type in them. Which is exactly why I'm not in favor of the original proposal either - I don't want closed sums for that very reason)

Assignment behavior would remain as described by @rogpeppe

Their proposal only talks about assignment to a sum-value; I talk about assignment from a sum-value (to one of its constituent parts). I agree that their proposal doesn't allow this either, but the difference is that their proposal isn't about adding this possibility. I.e. my argument is exactly that the semantics you suggest are not particularly beneficial, because in practice the usage they get is severely limited.

fmt.Println(x == "hello") // compilation error?

This would probably be added to their proposal as well. We already have an equivalent special case for interfaces, namely

A value x of non-interface type X and a value t of interface type T are comparable when values of type X are comparable and X implements T. They are equal if t's dynamic type is identical to X and t's dynamic value is equal to x.

fmt.Println(x == 0) // true or false? I vote true :-)

Presumably false. Given that the similar

var x int|float64 = 0.0
y := 0
fmt.Println(x == y)

should be a compile-error (as we concluded above), this question only really makes sense when comparing to untyped numeric constants. At that point it kind of depends on how this is added to the spec. You could argue that this is similar to assigning a constant to an interface type and thus that it should get its default type (and then the comparison would be false). Which IMO is more than fine; we already accept that situation today without much fuss. You could, however, also add a case to the spec for untyped constants that would cover assigning/comparing them to sums and solve the question that way.

Answering this question either way, however, doesn't necessitate allowing all expressions using sum types that might make sense for the constituent parts.

But to reiterate: I'm not arguing in favor of a different proposal for sums. I'm arguing against this one.

fmt.Println(x == "hello") // compilation error?

This would probably be added to their proposal as well.

Correction: The spec already covers this compilation error, given that it contains the statement

In any comparison, the first operand must be assignable to the type of the second operand, or vice versa.

@Merovius you make some good points about my variant of the proposal. I'll refrain from debating them further, but I would like to drill into the comparison to 0 question a little further because it applies equally to the original proposal.

fmt.Println(x == 0) // true or false? I vote true :-)

Presumably false. Given, that the similar

var x int|float64 = 0.0
y := 0
fmt.Println(x == y)
should be a compile-error (as we concluded above),

I don't find this example very compelling because if you change the first line to var x float64 = 0.0 then you could use the same reasoning to argue that comparing a float64 to 0 should be false. (Minor points: (a) I assume you meant float64(0) on first line, since 0.0 is assignable to int. (b) x==y should not be a compile error in your example. It should print false though.)

I think your idea that "that this is similar to assigning a constant to an interface type and thus it should have its default type" is more compelling (assuming you meant sum type), so the example would be:

var x,y int|float64 = float64(0), 0
fmt.Println(x == y) // false

I'd still argue that x == 0 should be true though. My mental model is that a type is given to 0 as late as possible. I realize that this is contrary to the current behavior of interfaces, which is precisely why I brought it up. I agree that this hasn't led to much fuss, but the similar issue of comparing interfaces to nil has resulted in quite a lot of confusion. I believe we'd see a similar amount of confusion for comparison to 0 if sum types come into existence and the old equality semantics are kept.

I don't find this example very compelling because if you change the first line to var x float64 = 0.0 then you could use the same reasoning to argue that comparing a float64 to 0 should be false.

I didn't say it should, I said that presumably it would, given what I perceive as the most likely tradeoff between simplicity/usefulness for how their proposal would be implemented. I wasn't trying to make a value judgement. In fact, if with just as simple rules we could get it to print true, I'd probably tend to prefer it. I'm just not optimistic.

Note, that comparing float64(0) to int(0) (i.e. the example with the sum replaced by var x float64 = 0.0) isn't false, though, it's a compile-time error (as it should be). This is exactly my point; your proposal is only really useful when combined with untyped constants, because for anything else it wouldn't compile.

(a) I assume you meant float64(0) on first line, since 0.0 is assignable to int.

Sure (I was assuming semantics closer to the current "default type" for constant expressions, but I agree that the current wording doesn't imply that).

(b) x==y should not be a compile error in your example. It should print false though.)

No, it should be a compile-time error. You have said that the operation e1 == y, with e1 being a sum-type expression, should be allowed if and only if the expression would compile with every choice of constituent type. Given that in my example x has type int|float64 and y has type int, and given that float64 and int are not comparable, this condition is clearly violated.

To make this compile you'd need to drop the condition that substituting any constituent-typed expression needs to compile too; at which point we are in the situation of having to set up rules for how types are promoted or converted when used in these expressions (also known as "the C mess").

The past consensus has been that sum types do not add very much to interface types.

They don't, indeed, for most use cases of Go: trivial network services and utils. But once the system grows larger, there's a good chance they become useful.
I am currently writing a heavily distributed service with data-consistency guarantees implemented via lots of logic, and I ran into a situation where they would be handy. The NPDs (nil pointer dereferences) became too annoying as the service grew large, and we don't see a sane way to split it.
I mean Go's type-system guarantees are a bit too weak for something more complex than typical primitive network services.

But the story with Rust shows it is a bad idea to use sum types for NPD and error handling just like Haskell does: there is a typical, natural imperative workflow, and the Haskellish approach doesn't fit well into it.

Example

consider an ioutil.WriteFile-like function in pseudocode. The imperative flow would look like this:

file = open(name, os.write)
if file is error
    return error("cannot open " + name + " writing: " + file.error)
if file.write(data) is error:
    return error("cannot write into " + name + " : " + file.error)
return ok

and how it looks in Rust

match open(name, os.write)
    file
        match file.write(data)
            err
                return error("cannot write into " + name + " : " + err)
            ok
                return ok
    err
        return error("cannot open " + name + " writing: " + err)

it is safe but ugly.

And my proposal:

type result[T, Err] oneof {
    default T
    Error Err
}

and how the program could look (result[void, string] = !void):

file := os.Open(name, ...)
if !file {
    return result.Error("cannot open " + name + " writing: " + file.Error)
}
if res := file.Write(data); !res {
    return result.Error("cannot write into " + name + " : " + res.Error)
}
return ok

Here the default branch is anonymous, and the error branch can be accessed with .Error (once it is known the result is an Error). Once it is known the file was opened successfully, the user can access it via the variable itself. In the first if we make sure the file was successfully opened, or exit otherwise (so further statements know the file is not an error).

As you can see, this approach preserves the imperative flow and provides type safety. NPD handling can be done in a similar way:

type Reference[T] oneof {
    default T
    nil
}
// Reference[T] = *T

the handling is similar to result

@sirkon, your Rust example doesn't convince me that there's anything wrong with straightforward sum types like in Rust. Rather, it suggests that pattern-matching on sum types might be made more Go-like using if statements. Something like:

ferr := os.Open(name, ...)
if err(e) := ferr {           // conditional match and unpack, initializing e
  return fmt.Errorf("cannot open %v: %v", name, e)
}
ok(f) := ferr                  // unconditional match and unpack, initializing f
werr := f.Write(data)
...

(In the spirit of sum types, it would be compile error if the compiler can't prove that an unconditional match always succeeds because there is exactly one case remaining.)

For basic error-checking, this doesn't seem like an improvement over multiple return values, since it is one line longer and declares one more local variable. However, it would scale better to multiple cases (by adding more if statements), and the compiler could check that all cases are handled.

@sirkon

They don't indeed, for most use cases of Go: trivial network services and utilities. But once the system grows larger, there's a good chance they become useful.
[…]
I mean that Go's type system guarantees are a bit too weak for something more complex than typical primitive network services.

Statements like these are unnecessarily confrontational and derogatory. They are also kind of embarrassing, TBH, because there are extremely large, nontrivial services written in Go. And given that a significant chunk of its developers work at Google, you should just assume that they know better than you whether it is suitable for writing large and non-trivial services. Go might not cover all use cases (neither should it, IMO), but it empirically works for more than just "primitive network services".

NPD handling can be done in a similar way

I think this really illustrates that your approach doesn't actually add any significant value. As you point out, it simply adds a different syntax for dereference. But AFAICT nothing is preventing a programmer from using that syntax on a nil-value (which would presumably still panic). i.e. every program that is valid using *p is also valid using p.T (or is it p.default? It's hard to tell what your idea is specifically) and vice versa.

The one advantage sum types can add to error handling and nil-dereferences is that the compiler can enforce that you have to prove that the operation is safe by pattern-matching on it. A proposal that omits that enforcement doesn't seem to bring significant new things to the table (arguably, it is worse than using open sums via interfaces), while a proposal that does include it is exactly what you describe as "ugly".

@Merovius

And given that a significant chunk of its developers work at Google, you should just assume that they know better than you,

Blessed are the believers.

As you point out, it simply adds a different syntax for dereference.

again

var written int64
...
res := os.Stdout.Write(data) // Write([]byte) -> Result[int64, string] ≈ !int64
written += res // Will not compile as res is a packed result type
if !res {
    // inside this if, res is known to be the non-default (Error) branch
    return Result.Error(...)
}
written += res // OK: past the if, the only branch left is the default

@skybrian

ferr := os.Open(...)

this intermediate variable is what forced me to abandon this idea. As you can see, my approach is specifically for error and nil handling. These tiny tasks are too important and deserve special attention, IMO.

@sirkon You apparently have very little interest in talking to people eye-to-eye. I'll leave it at that.

Let us keep our conversations civil, and avoid nonconstructive comments. We can disagree on things, but still maintain a respectable discourse. https://golang.org/conduct.

And given that a significant chunk of its developers work at Google, you should just assume that they know better than you

I doubt you could make that kind of argument at Google.

@hasufell that guy is from Germany, where they don't have large IT companies with crap interviews to pump the interviewer's ego, nor behemoth management; that's why these words.

@sirkon same goes for you. Ad hominem and social arguments are not useful. This is more than a CoC problem. I've seen these kinds of "social arguments" pop up rather frequently when it's about the core language: compiler devs know better, language designers know better, Google people know better.

No, they don't. There is no intellectual authority. There is just decision authority. Get over it.

Hiding a few comments to reset the conversation (and thanks @agnivade for trying to get it back on the rails).

Folks, please consider your role in these discussions in light of our Gopher values: everyone in the community has a perspective to bring, and we should strive to be respectful and charitable in how we interpret and respond to each other.

Allow me, please, to add my 2-cents to this discussion:

We need a way to group different types together by features other than their method sets (as with interfaces). A new grouping feature should allow including primitive (or basic) types, which don’t have any methods, and interface types to be categorized as relevantly similar. We can keep primitive types (boolean, numeric, string, and even []byte, []int, etc.) as they are but enable abstracting away from differences between types where a type definition groups them in a family.

I suggest we add something like a type _family_ construct to the language.

The Syntax

A type family may be defined much like any other type:

type theFamilyName family {
    someType
    anotherType
}

The formal syntax would be something like:
FamilyType = "family" "{" { TypeName ";" } "}" .

A type family may be defined inside of a function signature:

func Display(s family{string; fmt.Stringer}) { /* function body */ }

That is, the one-line definition requires semicolons between the type names.

The zero value of a family type is nil, like with a nil interface.

(Under the hood a value sitting behind the family abstraction is implemented much like an interface.)

The Reasoning

We need something more precise than the empty interface where we want to specify what types are valid as arguments to a function or as returns of a function.

The proposed solution would enable better type safety, fully checked at compile time and adding no additional overhead at runtime.

The point is that _Go code should be more self-documenting_. What a function can take as an argument should be built into the code itself.

Too much code incorrectly exploits the fact that “interface{} says nothing.” It’s a little embarrassing that such a widely used (and abused) construct in Go, without which we wouldn’t be able to do much, says _nothing_.

Some Examples

The documentation for the sql.Rows.Scan function includes a large block detailing what types may be passed in to the function:

Scan converts columns read from the database into the following common Go types and special types provided by the sql package:
 *string
 *[]byte
 *int, *int8, *int16, *int32, *int64
 *uint, *uint8, *uint16, *uint32, *uint64
 *bool
 *float32, *float64
 *interface{}
 *RawBytes
 any type implementing Scanner (see Scanner docs)

And for the sql.Row.Scan function, the documentation includes the sentence "See the documentation on Rows.Scan for details." See the documentation for _some other function_ for details? This is not Go-like, and in this case that sentence is not even correct, because Rows.Scan can in fact take a *RawBytes value but Row.Scan cannot.

The problem is that we are often forced to rely on comments for guarantees and behavior contracts, which the compiler cannot enforce.

When the docs for a function say that the function works just like some other function ("so go see the documentation for that other function"), you can almost guarantee that the function will be misused sometimes. I'll bet that most people, like myself, have found out that a *RawBytes is not permitted as an argument to Row.Scan only after getting an error from Row.Scan (saying "sql: RawBytes isn't allowed on Row.Scan"). It's sad that the type system permits such mistakes.

We could instead have:

type Value family {
    *string
    *[]byte
    *int; *int8; *int16; *int32; *int64
    *uint; *uint8; *uint16; *uint32; *uint64
    *bool
    *float32; *float64
    *interface{}
    *RawBytes
    Scanner
}

This way, the value passed in must be one of the types in the given family, and the type switch inside the Rows.Scan function will not need to deal with any unexpected or default cases; there would be another family for the Row.Scan function.
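For contrast, here is a sketch of the status quo such a family would replace. scanOne is an invented stand-in for the runtime checking Rows.Scan has to do today; with a family type the default case would be unreachable and could be dropped:

package main

import "fmt"

// scanOne mimics the runtime type checking Rows.Scan performs today.
func scanOne(dest interface{}) error {
    switch d := dest.(type) {
    case *string:
        *d = "hello"
        return nil
    case *int:
        *d = 42
        return nil
    default:
        return fmt.Errorf("unsupported Scan type %T", dest)
    }
}

func main() {
    var s string
    if err := scanOne(&s); err == nil {
        fmt.Println(s) // hello
    }
    var f float64
    fmt.Println(scanOne(&f)) // unsupported Scan type *float64
}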

Consider also how the cloud.google.com/go/datastore.Property struct has a “Value” field of type interface{} and requires all this documentation:

// Value is the property value. The valid types are:
// - int64
// - bool
// - string
// - float64
// - *Key
// - time.Time
// - GeoPoint
// - []byte (up to 1 megabyte in length)
// - *Entity (representing a nested struct)
// Value can also be:
// - []interface{} where each element is one of the above types
// This set is smaller than the set of valid struct field types that the
// datastore can load and save. A Value's type must be explicitly on
// the list above; it is not sufficient for the underlying type to be
// on that list. For example, a Value of "type myInt64 int64" is
// invalid. Smaller-width integers and floats are also invalid. Again,
// this is more restrictive than the set of valid struct field types.
//
// A Value will have an opaque type when loading entities from an index,
// such as via a projection query. Load entities into a struct instead
// of a PropertyLoadSaver when using a projection query.
//
// A Value may also be the nil interface value; this is equivalent to
// Python's None but not directly representable by a Go struct. Loading
// a nil-valued property into a struct will set that field to the zero
// value.

This could be:

type PropertyVal family {
  int64
  bool
  string
  float64
  *Key
  time.Time
  GeoPoint
  []byte
  *Entity
  nil
  []int64; []bool; []string; []float64; []*Key; []time.Time; []GeoPoint; [][]byte; []*Entity
}

(You can imagine how this could be split up cleaner into two families.)

The json.Token type was mentioned above. Its type definition would be:

type Token family {
    Delim
    bool
    float64
    Number
    string
    nil
}

Another example that I got bit by recently:
When calling functions like sql.DB.Exec, or sql.DB.Query, or any function which takes a variadic list of interface{} where each element has to have a type in a particular set and _not itself be a slice_, it’s important to remember to use the “spread” operator when passing in the arguments from a []interface{} into such a function: it’s wrong to say DB.Exec("some query with placeholders", emptyInterfaceSlice); the correct way is: DB.Exec("the query...", emptyInterfaceSlice...) where emptyInterfaceSlice has type []interface{}. An elegant way to make such mistakes impossible would be to have this function take a variadic argument of Value, where Value is defined as a family as described above.

The point of these examples is that _real mistakes are being made_ because of the imprecision of the interface{}.

var x int | float64 | string | rune
z = int(x) + int(y)
z = float64(x) + float64(y)

This should definitely be a compiler error because the type of x isn't really compatible with what can be passed to int().

I like the idea of having family. It'd essentially be an interface constrained (constricted?) to the listed types, and the compiler can ensure you're matching against all the types, changing the type of the variable within the local context of the corresponding case.

The problem is that we are often forced to rely on comments for guarantees and behavior contracts, which the compiler cannot enforce.

That's actually the reason why I started to slightly dislike things like

func foo() (..., error) 

because you have no idea what kind of error it returns.

and a few other things that return an interface instead of a concrete type. Some functions return net.Addr, and it's sometimes a bit difficult to dig through the source code to figure out what kind of net.Addr they actually return and then use it appropriately. There's not really much downside in returning a concrete type (because it implements the interface and can thus be used anywhere the interface can be used), except when you later plan to extend your method to return a different kind of net.Addr. But if your API mentions it returns OpError, then why not make that part of the "compile time" spec?

For example:

 OpError is the error type usually returned by functions in the net package. It describes the operation, network type, and address of an error. 

Usually? That doesn't tell you exactly which functions return this error. And this is the documentation for the type, not the function. The documentation for Read nowhere mentions that it returns OpError. Also, if you do

err := blabla.(*OpError)

it'll crash once the function returns a different kind of error. That's why I'd really like to see this as part of the function declaration. At least *OpError | error would tell you that it returns such an error, and the compiler would make sure you don't do an unchecked type assertion that crashes your program in the future.
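For reference, the defensive comma-ok form that avoids the crash today (the error value here is hand-built purely for illustration):

package main

import (
    "errors"
    "fmt"
    "net"
)

func main() {
    // In real code this would come from a net call that usually -- but not
    // provably always -- returns *net.OpError.
    var err error = &net.OpError{Op: "read", Net: "tcp", Err: errors.New("boom")}

    // A bare err.(*net.OpError) panics the moment some other error type
    // shows up; the comma-ok form is the defensive status quo:
    if opErr, ok := err.(*net.OpError); ok {
        fmt.Println("op error during:", opErr.Op)
    } else {
        fmt.Println("some other error:", err)
    }
}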

BTW: Was a system like Haskell's type polymorphism considered yet? Or a 'trait'-based type system, i.e.:

func calc(a < add(a, a) a >, b a) a {
   return add(a, b)
}

func drawWidgets(widgets []< widgets.draw() error >) error {
  for _, widget := range widgets {
    err := widget.draw()
    if err != nil {
      return err
    }
  }
  return nil
}

a < add(a, a) a means "whatever a's type is, there must exist a function add(typeof a, typeof a) typeof a". < widgets.draw() error > means that "whatever the widget's type is, it must provide a method draw that returns an error". This would allow more generic functions to be created:

func Sum(a []< add(a,a) a >) a {
  sum := a[0]
  for i := 1; i < len(a); i++ {
    sum = add(sum,a[i])
  }
  return sum
}

(Note that this is not equal to traditional "generics").

There's not really much downside in returning a concrete type (because it implements the interface and can thus be used anywhere where the interface can be used) except when you later plan to extend your method to return a different kind of net.Addr.

Also, Go doesn't have variant subtyping, so you can't use a func() *FooError as a func() error where needed. Which is especially important for interface satisfaction. And lastly, this doesn't compile:

func Foo() (FooVal, FooError) {
    // ...
}

func Bar(f FooVal) (BarVal, BarError) {
    // ...
}

func main() {
    foo, err := Foo()
    if err != nil {
        log.Fatal(err)
    }
    bar, err := Bar(foo) // Type error: Can not assign BarError to err (type FooError)
    if err != nil {
        log.Fatal(err)
    }
}

i.e. to make this work (I'd like it if we could somehow), we'd need far more sophisticated type inference - currently, Go only uses local type information from a single expression. In my experience, those kinds of type inference algorithms are not only significantly slower (slowing down compilation, and commonly without even a bounded runtime) but also produce far less understandable error messages.

Also, Go doesn't have variant subtyping, so you can't use a func() *FooError as a func() error where needed. Which is especially important for interface satisfaction. And lastly, this doesn't compile:

I'd have expected this to work fine in Go, but I've never stumbled upon it because the current practice is to just use error. But yes, in these cases such restrictions practically force you to use error as the return type.

func main() {
    foo, err := Foo()
    if err != nil {
        log.Fatal(err)
    }
    bar, err := Bar(foo) // Type error: Can not assign BarError to err (type FooError)
    if err != nil {
        log.Fatal(err)
    }
}

I'm not aware of any language that allows this (well, except for esolangs), but all you would have to do is keep a "type world" (which is basically a map of variable -> type), and if you re-assign the variable you just update its type in the "type world".

I don't think you need complicated type inference to do this; you do need to keep track of the types of variables, but I'm assuming you need to do that anyway, because for

var i int = 0
i = "hi"

you surely somehow have to remember which variables/declarations have which types, and for i = "hi" you need to make a "type lookup" on i to check whether you can assign a string to it.
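A toy model of that bookkeeping (all names here are invented for illustration); the point is that re-assignment would simply overwrite the entry instead of demanding assignability to the old type:

package main

import "fmt"

// TypeWorld maps variable names to their current static type.
type TypeWorld map[string]string

// assign records the type of the right-hand side, overwriting any
// previous entry for the same variable.
func (w TypeWorld) assign(name, typ string) { w[name] = typ }

func main() {
    w := TypeWorld{}
    w.assign("err", "FooError")
    w.assign("err", "BarError") // fine under this model: just an update
    fmt.Println(w["err"])       // BarError
}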

Are there practical issues that complicate assigning a func() *ConcreteError to a func() error, other than the type checker not supporting it (like runtime or compiled-code reasons)? I guess currently you'd have to wrap it in a function like this:

package main

import "fmt"

type MyFunc func() error

type A struct{}

func (_ *A) Error() string { return "" }

func NewA() *A {
    return &A{}
}

func main() {
    var err error = &A{}
    fmt.Println(err.Error())
    var mf MyFunc = MyFunc(func() error { return NewA() }) // type checks fine
    // var mf MyFunc = MyFunc(NewA) // doesn't type check
    _ = mf
}

If you're faced with a func(a, b) c but get a func(x, y) z, all that needs to be done is to check whether z is assignable to c (and a, b must be assignable to x, y), which, at least at the type level, doesn't involve complicated type inference (it just involves checking whether one type is assignable/compatible to/with another). Whether this causes issues with the runtime or compilation, I don't know, but strictly looking at the type level I don't see why it would involve complicated type inference. The type checker already knows if an x can be assigned to an a, thus it also easily knows whether a func() x can be assigned to a func() a. Of course, there might be practical reasons (thinking about runtime representations) why this won't be easily possible. (I suspect that's the real crux here, not the actual type checking.)

Theoretically you could work around the runtime issues (if there are any) by automatically wrapping functions (like in the above snippet), with the _potentially huge_ downside that it screws up comparisons of funcs with funcs (as the wrapped func won't be equal to the func it wraps).

I'm not aware of any language that allows this (well, except for esolangs)

Not exactly, but I'd argue that's because languages with powerful type systems are usually functional languages which don't really use variables (and so don't really need the ability to reuse identifiers). FWIW, I'd argue that e.g. Haskell's type system would be able to deal with this just fine - at least as long as you are not using any other properties of FooError or BarError, it should be able to infer that err is of type error and deal with it. Of course, again, this is a hypothetical, because this exact situation doesn't easily transfer to a functional language.

but I'm assuming you need to do that anyway because

The difference being that in your example, i has a clear and well-understood type after the first line, which is int, and you then run into a type error when you assign a string to it. Meanwhile, for something like what I mentioned, every usage of an identifier essentially creates a set of constraints on the used type, and the type checker then tries to infer the most general type fulfilling all the constraints given (or complain that there is no type fulfilling that contract). That's what formal type theories are for.

Are there practical issues that complicate assigning a func() *ConcreteError to a func() error, other than the type checker not supporting it (like runtime or compiled-code reasons)?

There are practical problems, but I believe for func they are probably solvable (by emitting un/-wrapping code, similarly to how interface-passing works). I wrote a bit about variance in Go and explain some of the practical problems I see at the bottom. I'm not totally convinced it's worth adding though. I.e. I'm unsure it solves important problems on its own.

with the potentially huge downside that it screws up comparisons of funcs with funcs (as the wrapped func won't be equal to the func it wraps).

funcs are not comparable.

Anyway, TBH, all of this seems a bit off-topic for this issue :)

FYI: I just did this. It's not nice, but it sure is type-safe. (Same thing can be done for #19814 FWIW)

I'm a little late to the party, but I too would like to share with you my feelings after 4 years of Go:

  • Multi-value returns were a huge mistake. Discriminated unions should've been used instead.
  • Nil-able interfaces were a mistake. Passing nil pointers to interfaces should have caused a panic.
  • Pointers are not synonyms for "optional", discriminated unions should've been used instead.
  • The JSON unmarshaller should have returned an error if a required field is not included in the JSON document. Discriminated unions should've been used for optional JSON fields instead, because they won't ever crash the server.

In those last 4 years I've run into many problems associated with these decisions:

  • garbage data returned in case of error.
  • syntax clutter (returning zeroed values in case of error).
  • multi-error returns (confusing APIs; please don't do that!).
  • non-nil interfaces pointing to pointers pointing to nil (confuses the hell out of people, making the "Go is an easy language" statement sound like a bad joke).
  • unchecked JSON fields make servers crash (yey!).
  • unchecked returned pointers make servers crash, yet nobody documented that the returned pointer represents an optional (maybe-type) and could therefore be nil (yey!)

The changes required to fix all those issues, however, would require a truly backward incompatible Go 2.0.0 (not Go2) version, which will never be realized, I suppose. Anyway...

This is how error handling should have looked:

// Divide returns either a float64 or an arbitrary error
func Divide(dividend, divisor float64) float64 | error {
  if dividend == 0 {
    return errors.New("dividend is zero")
  }
  if divisor == 0 {
    return errors.New("divisor is zero")
  }
  return dividend / divisor
}

func main() {
  // type-switch statements enforce completeness:
  switch v := Divide(1, 0).(type) {
  case float64:
    log.Print("1/0 = ", v)
  case error:
    log.Print("1/0 = error: ", v)
  }

  // type-assertions, however, do not:
  divisionResult := Divide(3, 1)
  if v, ok := divisionResult.(float64); ok {
    log.Print("3/1 = ", v)
  }
  if v, ok := divisionResult.(error); ok {
    log.Print("3/1 = error: ", v.Error())
  }
  // yet they don't allow asserting types not included in the union:
  if v, ok := divisionResult.(string); ok { // compile-time error!
    log.Print("3/1 = string: ", v)
  }
}

Interfaces are not a replacement for discriminated unions; they're two completely different animals. The compiler makes sure type switches on discriminated unions are complete, meaning the cases cover all possible types; if you don't want this, you can use the type-assertion statement.

Too often have I seen people totally confused about _non-nil interfaces to nil values_: https://play.golang.org/p/JzigZ2Q6E6F. Usually, people get confused when an error interface is pointing to a pointer of a custom error type that's pointing to nil; that's one of the reasons I think making interfaces nilable was a mistake.
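The linked playground example presumably boils down to something like this minimal reproduction:

package main

import "fmt"

type MyErr struct{}

func (*MyErr) Error() string { return "custom error" }

func mayFail() error {
    var p *MyErr // nil pointer of a concrete error type
    return p     // wrapped into a NON-nil error interface value
}

func main() {
    err := mayFail()
    fmt.Println(err == nil) // false, even though the pointer inside is nil
}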

An interface is like a receptionist: you know it's a human when you're talking to it, but in Go it could be a cardboard figure, and the world will suddenly crash if you try to talk to it.

Discriminated unions should've been used for optionals (maybe-types), and passing nil pointers to interfaces should have resulted in a panic:

type CustomErr struct {}
func (err *CustomErr) Error() string { return "custom error" }

func CouldFail(foo int) error | nil {
  var err *CustomErr
  if foo > 10 {
    // you can't return a nil pointer as an interface value
    return err // this will panic!
  }
  // no error
  return nil
}

func main() {
  // assume no error
  if err, ok := CouldFail(5).(error); ok {
    log.Fatalf("it failed, Jim! %s", err)
  }
}

Pointers and maybe-types are not interchangeable. Pointers are usually used to reference memory and/or avoid unnecessary copying. Using pointers for optional types is bad because it leads to confusing APIs:

// P returns a pointer to T, but it's not clear whether or not the pointer
// will always reference a T instance. It might be an optional T,
// but the documentation usually doesn't tell you.
func P() *T {}

// O returns either a pointer to T or nothing, this implies (but still doesn't guarantee)
// that the pointer is always expected to not be nil, in any other case nil is returned.
func O() *T | nil {}

Then there's also JSON. Using pointers for optional JSON fields is wrong because a malformed JSON document could crash the server if a pointer is not properly checked for nil before use. This could never happen with unions though because the compiler forces you to check them before use. The JSON unmarshaller should fail if a required field (including fields of pointer type) is not included in the JSON document:

type DataModel struct {
  // Optional needs to be type-checked before use
  // and is therefore allowed to not be included in the JSON document
  Optional string | nil `json:"optional,omitempty"`
  // Required won't ever be nil
  // If the JSON document doesn't include it then unmarshalling will return an error
  Required *T `json:"required"`
}

P.S.
I'm also working on a functional language design at the moment and this is how I use discriminated unions for error handling there:

read = (s String) -> (Array<Byte> or Error) => match s {
  "A" then Error<NotFound>
  "B" then Error<AccessDenied>
  "C" then Error<MemoryLimitExceeded>
  else Array<Byte>("this is fine")
}

main = () -> ?Error => {
  // assume the result is a byte array
  // otherwise throw the error up the stack wrapped in a "type-assertion-failure" error
  r = read("D") as Array<Byte>
  log::print("data: %s", r)
}

I would love to see this become true one day. So let's see if I can help a bit:

Maybe the problem is that we are trying to cover too much with the proposal. We could go with a simplified version that brings the majority of the value so that it would be much easier to add it to the language in the short term.

From my point of view, this simplified version would be just related to nil. Here are the main ideas (almost all of them have already been mentioned in the comments):

  1. Only allow the | nil version of "discriminated unions", and it can only be used with pointer types:
    <any pointer type> | nil
    Where any pointer type would be: pointers, functions, channels, slices, and maps (the Go pointer types)
  2. Forbid assigning nil to a bare pointer type. If you want to assign nil, then the type needs to be <pointer type> | nil. For example:
var n *int       = nil // Does not compile, wrong type
var n *int | nil = nil // Ok!

var set map[string] bool       = nil // Does not compile
var set map[string] bool | nil = nil // Ok!

var myFunc func(int) error       = nil // Nope!
var myFunc func(int) error | nil = nil // All right.

Those are the main ideas. The following ones are derived from them:

  1. You can't declare a variable of a bare pointer type and leave it uninitialized. If you want to do that, then you need to add the | nil discriminated type
var maybeAString *string       // Wrong: invalid initial value
var maybeAString *string | nil // Good
  2. You can assign a bare pointer type to a "nilable" pointer type, but not the other way around:
var value int = 42
var barePointer *int = &value          // Valid
var nilablePointer *int | nil = &value // Valid

nilablePointer = barePointer // Valid
barePointer = nilablePointer // Invalid: Incompatible types
  3. The only way to get the value out of a "nilable" pointer type is via the type switch, as others have pointed out. For example, following the above example, if we really want to assign the value of nilablePointer to barePointer, then we would need to do:
switch val := nilablePointer.(type) {
  case *int:
    barePointer = val // Yeah! Types are compatible now. It is impossible that val == nil
  case nil:
    // Do what you need to do when nilablePointer is nil
}

And that's it. I know that discriminated unions can be used for much more (notably for returning errors), but I'd say that sticking just to what I've written above would bring HUGE value to the language with less effort and without complicating it more than necessary.
Benefits I see with this simple proposal:

  • a) No nil pointer errors. Okay, never have four words meant that much. That's why I feel the need to say it from another point of view: no Go program will _EVER_ have a nil pointer dereference error again! 💥
  • b) You can pass pointers to function parameters without trading "performance vs intention".
    What I mean by this is that there are times when I want to pass a struct to a function, and not a pointer to it, because I don't want that function to have to worry about nullity and be forced to check its parameters. However, I normally end up passing a pointer to avoid the copying overhead.
  • c) No more nil maps! YEAH! We will end the inconsistency between the "safe nil slices" and the "unsafe nil maps" (which panic if you try to write to them; see the snippet below). A map will either be initialized or be of type map | nil, in which case you would need to use a type switch 😃
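A quick illustration of the nil slice/nil map inconsistency mentioned in (c):

package main

import "fmt"

func main() {
    var s []int
    s = append(s, 1) // appending to a nil slice works
    fmt.Println(s)   // [1]

    var m map[string]int
    fmt.Println(m["k"]) // reading a nil map works: 0
    m["k"] = 1          // panics: assignment to entry in nil map
}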

But there is also another intangible here that brings a lot of value: developer peace of mind. You can work and play with pointers, functions, channels, maps, etc., with the relaxed feeling that you don't need to worry about them being nil. _I would pay for this!_ 😂

A benefit of starting with this simpler version of the proposal is that it won't stop us from going for the full proposal in the future, or even going step by step (the next natural step, for me, being to allow discriminated error returns, but let's forget about that for now).

One problem is that even this simple version of the proposal is backward incompatible, but that can easily be fixed by gofix: just replace all the pointer type declarations with <pointer type> | nil.

What do you think? I hope this sheds some light and speeds up the inclusion of nil safety into the language. It seems that this way (through "discriminated unions") is the simplest and most orthogonal way to achieve it.

@alvaroloes

You can't declare a variable of a bare pointer type and leave it uninitialized.

This is the crux of the matter. That's just not a thing Go does - every type has a zero value, full stop. Otherwise you'd have to answer what, e.g., make([]T, 100) does. Other things you mention (e.g. nil maps panicking on writes) are a consequence of this basic rule. (And as an aside, I don't think it's really true to say that nil slices are safer than maps - writing to a nil slice will panic just as much as writing to a nil map.)

In other words: Your proposal is actually not that simple, as it deviates pretty significantly from a pretty fundamental design decision in the Go language.

I think the more important thing Go does is making zero values useful, not simply giving everything a zero value. A nil map is a zero value, but it's not useful. It's harmful, actually. So why not disallow the zero value in cases where it's not useful? Changing Go in this regard would be beneficial, but the proposal is indeed not that simple.

The proposal above looks more like the optional/non-optional kind of thing found in Swift and others. It's cool and all, but:

  1. That would break pretty much every program out there, and the fix wouldn't be trivial to gofix. You can't just replace everything with <pointer type> | nil because, per the proposal, this would require a type switch to unpack the value.
  2. For this to be actually usable and bearable, Go would need much more syntactic sugar around these optionals. Take Swift, for example: there are many features in the language specifically for working with optionals - guard, optional binding, optional chaining, nil coalescing, etc. I don't think Go would go in that direction, but without them working with optionals would be a chore.

So why not disallow the zero value in cases where it's not useful?

See above. It means that some things that look cheap have very non-trivial costs associated with it.

Changing Go in this regard would be beneficial

It has benefits, but that's not the same as being beneficial. It also has harms. Which weighs heavier is a matter of preference and tradeoffs. The Go designers chose this.

FTR, this is a general pattern in this thread and one of the main counter-arguments to any concept of sum types - that you need to say what the zero value is. Which is why any new idea should explicitly address it. But somewhat frustratingly, most people who are posting here these days have not read the rest of the thread and tend to ignore that part.

🤔 Aha! I knew there was something obvious I was missing. Doh! The word "simple" has complex meanings. Ok, feel free to remove the "simple" word from my previous comment.

Sorry if it was frustrating to some of you. My intention was to try to help a bit. I try to keep up with the thread, but I don't have too much spare time to spend on this.

Back to the matter: so it seems that the main reason that's holding this back is the zero value.
After thinking for a while and discarding a lot of options, the only thing I think that could add value and is worth mentioning is the following:

If I recall correctly, the zero value of any type consists of filling its memory space with 0's.
As you already know, this is fine for non-pointer types, but it is a source of bugs for pointer types:

type S struct {
    n int
}
var s S 
s.n  // Fine

var s *S
s.n // runtime error

var f func(int)
f() // runtime error

So, what if we:

  • Define a useful zero value for every pointer type
  • Only initialize it the first time it is used (lazy initialization).

I think this has been suggested in another issue, I'm not sure. I just write it here because it addresses the main point holding back this proposal.

The following could be a list of the zero values for the pointer types. Note that these zero values will be used only when the value is accessed. We could call it a "dynamic zero value", and it is only a property of the pointer types:

| Pointer type | Zero value | Dynamic zero value | Comment |
| --- | --- | --- | --- |
| *T | nil | new(T) | |
| []T | nil | []T{} | |
| map[T]U | nil | map[T]U{} | |
| func | nil | noop | The dynamic zero value of a function does nothing and returns zero values. If the return value list ends in error, then a default error is returned saying the function is a "no operation" |
| chan T | nil | make(chan T) | |
| interface | nil | - | a default implementation where all methods are initialized with the noop function described above |
| discriminated union | nil | dynamic zero value of the first type | |

Now, when values of those types are initialized, they will be nil, as they are right now. The difference is in the moment a nil is accessed: at that moment, the dynamic zero value will be used. A few examples:

type S struct {
    n int
}
var s *S
if s == nil { // true. Nothing different happens here
...
}
s.n = 1       // At this moment the go runtime would check if it is nil, and if it is, 
              // do "s = new(S)". We could say the code would be replaced by:
/*
if s == nil {
    s = new(S)
}
s.n = 1
*/

// -------------
var pointers []*S = make([]*S, 100) // Everything as usual
for _,p := range pointers {
    p.n = 1 // This is translated to:
    /*
        if p == nil {
            p = new(S)
        }
        p.n = 1
    */
}

// ------------
type I interface {
    Add(string) (int, error)
}

var i I
n, err := i.Add("yup!") // This method returns 0, and the default error "Noop"
if err != nil { // This condition is true and the error is returned
    return err
}

I'm probably missing implementation details and possible difficulties, but I wanted to focus on the idea first.

The main drawback is that we add an extra nil-check every time you access the value of a pointer type. But I would say:

  • It is a good tradeoff for the benefits we get. The same situation happens with the bounds checks in array/slice accesses, and we accept paying that performance penalty for the safety it brings.
  • The nil checks could be avoided in the same way as array bounds checks: if the pointer type has been initialized in the current scope, the compiler could know that and avoid adding the nil check.

With this, we get all the benefits explained in the previous comment, with the plus that we don't need to use a type switch to access the value (that would be only for the discriminated unions), keeping Go code as clean as it is now.

What do you think? Apologies if this has already been discussed. Also, I'm aware that this comment-proposal is more related to nil than to discriminated unions. I might move it to a nil-related issue but, as I said, I posted it here because it tries to fix the main problem of discriminated unions: the useful zero values.

Back to the matter: so it seems that the main reason that's holding this back is the zero value.

It is one significant, technical reason that needs to be addressed. For me, the main reason is that they make gradual repair categorically impossible (see above). i.e. for me personally, it is not so much a question of how to implement them, it's that I'm fundamentally opposed to the concept.
In any case, which reason is "main" is really a matter of taste and preference.

So, what if we:

  • Define a useful zero value for every pointer type
  • Only initialize it the first time it is used (lazy initialization).

This fails if you pass a pointer type around. e.g.

func F(p *T) {
    *p = 42 // same as if p == nil { p = new(T) } *p = 42
}

func G() {
    var p *T
    F(p)
    fmt.Println(p == nil) // Has to be true, as F can't modify p. But now F is silently misbehaving
}

This discussion is anything but new. There are reasons the reference types behave the way they do, and it's not that the Go developers haven't thought about it :)

This is the crux of the matter. That's just not a thing Go does - every type has a zero value, full stop. Otherwise you'd have to answer what, e.g. make([]T, 100) does?

This (and new(T)) would have to be disallowed if T doesn't have a zero value. You'd have to do make([]T, 0, 100) and then use append to populate the slice. Reslicing larger (v[:0][:100]) would also have to be an error. [10]T would basically be an impossible type (unless the ability to assert a slice to an array pointer is added to the language). And you'd need a way to mark existing nilable types as non-nilable in order to maintain backwards compatibility.

This would present an issue if generics are added, in that you'd need to treat all type parameters as not having a zero value unless they satisfy some bound. A subset of types would also need initialization tracking basically everywhere. This would be a fairly large change just on its own, even without adding sum types on top of it. It's certainly doable, but it does contribute significantly to the cost side of a cost/benefit analysis. The deliberate choice to keep initialization simple ("there's always a zero value") would instead have the effect of making initialization more complex than if initialization tracking had been in the language from day 1.

It is one significant, technical reason that needs to be addressed. For me, the main reason is that they make gradual repair categorically impossible (see above). i.e. for me personally, it is not so much a question of how to implement them, it's that I'm fundamentally opposed to the concept.
In any case, which reason is "main" is really a matter of taste and preference.

Ok, I understand this. We just have to also see other people's point of view (I'm not saying you are not doing that, I'm just making a point :wink:), where they see this as something powerful for writing their programs. Does it fit into Go? It depends on how the idea is executed and integrated into the language, and that's what we all are trying to figure out in this thread (I guess).

This fails if you pass a pointer type around. e.g. (...)

I don't quite get this. Why is this a failure? You are just passing a value into the function parameter, which happens to be a pointer with nil value. Then you are modifying that value inside the function. It is expected that you don't see those effects outside the function. Let me comment on some examples:

// Augmenting your example with more comments:
func FCurrentGo(p *T) {
    // Here "p" is just a value, which happens to be a pointer type. Doing...
    *p = 42
    // ...without checking first for "nil" is the recipe for hiding a bug that will crash the entire program, 
    // which is exactly what is happening in current Go code bases

    // The correct code would be:
    if p == nil {
        // panic or return error
    }
    *p = 42
}

func FWithDynamicZero(p *T) {
    // Here, again, p is just a value of a pointer type. Doing...
    *p = 42
    // would allocate a new T and assign 42. It is true that this doesn't have any effect on the "outside
    // world", which could be considered "incorrect" because you expected the function to do that.
    // If you really want to be sure "p" is pointing to something valid in the "outside world", then
    // check that:
    if p == nil {
        // panic or return error
    }
    *p = 42
}

func main() {
    var p *T
    FCurrentGo(p)       // This will crash the program
    FWithDynamicZero(p) // This won't have any effect on "p". This is expected because "p" is not pointing
                        // to anything. No crash here.
    fmt.Println(p == nil) // It is true, as expected
}

A similar situation happens with non-pointer-receiver methods, and it is confusing for newcomers to Go (but once you understand it, it makes sense):

type Point struct {
    x, y int
}

func (p Point) SetXY(x, y int) {
    p.x = x
    p.y = y
}

func main() {
    p := Point{x: 1, y: 2}
    p.SetXY(24, 42)

    pointerToP := &Point{x: 1, y: 2}
    pointerToP.SetXY(24, 42)

    fmt.Println(p, pointerToP) // Will print "{1 2} &{1 2}", which could confuse at first
}

So we need to choose between:

  • A) Failure with a crash
  • B) Failure with a silent non-modification of the value pointed to by a pointer when that pointer is passed to a function.

The fix for both cases is the same: check for nil before doing anything. But, for me, A) is much more harmful (the whole application crashes!).
B) could be considered a "silent error", but I wouldn't consider it an error. It only happens when you pass pointers to functions, and as I have shown, there are cases with structs that behave in similar ways. And this is without considering the huge benefits it brings.

Note: I'm not trying to blindly defend "my" idea; I'm genuinely trying to improve Go (which is already really good). If there are other points that make the idea not worth it, then I don't mind tossing it away and keeping on thinking in other directions.

Note 2: In the end, this idea is only about "nil" values and has nothing to do with discriminated unions, so I will create a separate issue to avoid polluting this one.

Ok, I understand this. We just have to also see other people's point of view (I'm not saying you are not doing that, I'm just making a point )

That sword cuts both ways, though. You said "the main reason that's holding this back is the zero value". That statement implies that we all agree on whether we want the effect of this proposal. I can certainly agree that it is a technical detail holding back the specific suggestions made (or at least, that any suggestion should say something about that question). But I don't like the discussion being quietly reframed into a parallel world where we assume everyone actually wants it.

Why is this a failure?

Because a function that takes a pointer will, at least often, make a promise to modify the pointee. If the function then silently does nothing, I would consider that a bug. Or at least it is an easy argument to make that, by preventing a nil panic this way, you are introducing a new class of bug.

If you pass a nil pointer to a function that expects something there, that's a bug - and I don't see the actual value in making such buggy software silently continue. I can see the value in the original idea of catching that bug at compile time by having support for non-nilable pointers, but I don't see the point in allowing that bug to not be caught at all.

i.e., so to speak, you are addressing a different problem from the actual proposal of non-nilable pointers: for that proposal, the runtime panic isn't the problem, but just a symptom - the problem is the bug of accidentally passing nil to something that doesn't expect it, and that this bug is only caught at runtime.

A similar situation is happening with non-pointer receiver methods

I don't buy this analogy. IMO it is totally reasonable to consider

func Foo(p *int) { *p = 42 }

func main() {
    var v int
    Foo(&v)
    if v != 42 { panic("") }
}

to be correct code. I don't think it is reasonable to consider

func Foo(v int) { v = 42 }

func main( ){
    var v int
    Foo(v)
    if v != 42 { panic("") }
}

to be correct. Maybe if you are an absolute beginner in Go and are coming from a language where every value is a reference (though I'm honestly hard-pressed to find one - even Python and Java make only most values references). But IMO, optimizing for that case is futile; it is fair to assume that people have some familiarity with pointers vs. values. I think even a seasoned Go developer would look at, say, a method with a pointer receiver accessing its fields as being correct, and at code calling those methods as being correct. Indeed, that is the whole argument for preventing nil pointers statically: it is too easy to unintentionally have a pointer be nil and have correct-looking code fail at runtime.

The fix for both cases is the same: check for nil before doing anything.

IMO the fix in the current semantics is to not check for nil and consider it a bug if someone passes nil. Like, in your example you write

// The correct code would be:
if p == nil {
    // panic or return error
}
*p = 42

But I don't consider that code correct. The nil-check doesn't do anything, because dereferencing nil already panics.

But, for me, A) is much more harmful (the whole application crashes!).

That's fine, but keep in mind that many people will disagree strongly on this. I, personally, consider a crash always preferable to continuing with corrupt data and wrong assumptions. In an ideal world, my software has no bugs and never crashes. In a less ideal world, my programs will have bugs and fail safely by crashing when they are detected. In the worst world, my programs will have bugs and just continue wreaking havoc when they are encountered.

That sword cuts both ways, though. You said "the main reason that's holding this back is the zero value". That statement implies that we all agree on whether we want the effect of this proposal. I can certainly agree that it is a technical detail holding back the specific suggestions made (or at least, that any suggestion should say something about that question). But I don't like the discussion being quietly reframed into a parallel world where we assume everyone actually wants it.

Well, I didn't want to imply that. If that is what was understood, then I may not have chosen the right words, and I apologize. I just wanted to provide some ideas for a possible solution, that's it.

I wrote _"...it seems that the main reason that's holding this back is...."_ based on your sentence _"This is the crux of the matter"_ referring to the zero value. That's why I assumed that the zero value was the main thing holding this back. So it was my bad assumption.

Regarding treating nil silently vs. checking it at compile time: I agree that it is better to check at compile time. The "dynamic zero value" was just an iteration on the original suggestion, where I focused on addressing the all-types-should-have-a-zero-value issue. An extra motivation was that I _thought_ it was also the main blocker for the discriminated unions proposal.
If we focus only on the nil-related issue, I would rather have non-nilable pointer types checked at compile time.

I'd say that at some point we (and by "we" I mean the whole Go community) will need to accept _some kind_ of change. For example: if there is a good solution that avoids nil errors entirely and the thing holding it back is the design decision "all types have a zero value, and it is made of 0's", then we could consider making some tweaks or changes to that decision if it brings value.

The main reason I'm saying this is your sentence _"every type has a zero value, full stop"_. I normally don't like "full stops". Don't get me wrong! I completely accept that you think that way; it is just my way of thinking: I prefer no dogmas, as they can hide paths that lead to better solutions.

Finally, regarding this:

That's fine, but keep in mind that many people will disagree strongly on this. I, personally, consider a crash always preferable to continuing with corrupt data and wrong assumptions. In an ideal world, my software has no bugs and never crashes. In a less ideal world, my programs will have bugs and fail safely by crashing when they are detected. In the worst world, my programs will have bugs and just continue wreaking havoc when they are encountered.

I totally agree with this. Failing out loud is always better than failing silently. However, there is a catch in Go:

  • If you have an app with thousands of goroutines, an unhandled panic in one of them makes the whole program crash. This is different from other languages, where only the thread that panics crashes.

Putting that aside (although it is quite dangerous), the idea is, then, to avoid a whole category of failures (nil-related failures).

So let's keep iterating on this and try to find a solution.

Thank you for your time and energy!

I would like to see Rust's discriminated-union syntax rather than Haskell's sum types; it allows naming the variants and enables a better pattern-matching syntax proposal.
Implementation can be done like a struct with a tag field (an unsigned integer whose size depends on the number of variants) and a union field (holding the data).
This feature is required for a closed set of variants (state representation would be much easier and cleaner, with compile-time checking). Regarding the questions about interfaces and their representation, I think their inclusion in a sum type must be nothing more than just another case of the sum type, because an interface is about any type which fits some requirements, while the sum-type use case is different.

Syntax:

type Type enum {
         Tuple (int,int),
         One int,
         None,
};

In the example above the size would be sizeof((int,int)).
Pattern matching can be done with a newly created match operator, or within the existing switch statement, like:

var a Type
switch (a) {
         case Tuple{(b,c)}:
                    //do something
         case One{b}:
                    //do something else
         case None:
                    //...
}

Creation syntax:
var a Type = Type{One=12}
Note that in enum instance construction only one variant can be specified.

Zero value (problem):
We can sort the names in alphabetical order; the zero value of the enum will be the zero value of the type of the first member in the sorted member list.

P.S. The solution to the zero value problem is mostly a matter of agreement.

I think keeping the zero value of the sum as the zero value of the first user defined sum field would be less confusing, perhaps

I think keeping the zero value of the sum as the zero value of the first user defined sum field would be less confusing, perhaps

But making the zero value depend on field declaration order is, I think, worse.

Has someone written a design doc?

I have one:
19412-discriminated_unions_and_pattern_matching.md.zip

I changed this:

I think keeping the zero value of the sum as the zero value of the first user defined sum field would be less confusing, perhaps

Now the agreement on the zero value (problem) in my proposal has moved to urandom's position.

UPD: Design doc changed, minor fixes.

I have two recent use cases where I needed built-in sum types:

  1. AST tree representation, as expected. Initially I found a library which looked like a solution at first sight, but their approach was to have a large struct with lots of nilable fields. Worst of both worlds, IMO. No type safety, of course. We wrote our own instead.
  2. Had a queue of predefined background tasks: we have a search service which is under development right now, and our search operations could be overly long, etc. So we decided to execute them in the background by sending search index operation tasks into a channel. Then a dispatcher decides what to do with them further. We could use the visitor pattern, but it is obviously overkill for a simple gRPC request. And it is not particularly clear, to say the least, as it introduces a tie between a dispatcher and a visitor.

In both cases I implemented something like this (using the 2nd task as an example):

type Task interface {
    task()
}

type SearchAdd struct {
    Ctx   context.Context
    ID    string
    Attrs Attributes
}

func (SearchAdd) task() {}

type SearchUpdate struct {
    Ctx         context.Context
    ID          string
    UpdateAttrs UpdateAttributes
}

func (SearchUpdate) task() {}

type SearchDelete struct {
    Ctx context.Context
    ID  string
}

func (SearchDelete) task() {}

And then

task := <- taskChannel

switch v := task.(type) {
case tasks.SearchAdd:
    resp, err := search.Add(v.Ctx, &search2.RequestAdd{…})
    if err != nil {
        log.Error().Err(err).Msg("blah-blah-blah")
    } else {
        if resp.GetCode() != search2.StatusCodeSuccess  {
            …
        } 
    }
case tasks.SearchUpdate:
    …
case tasks.SearchDelete:
    …
}

This is almost good. The bad thing is that Go does not provide full type safety, i.e. there will be no compilation error after a new search index operation task is added.
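A trimmed-down, self-contained illustration of that hole (names invented): a task type added later compiles cleanly while the dispatcher silently ignores it.

package main

import "fmt"

type Task interface{ task() }

type SearchAdd struct{ ID string }

func (SearchAdd) task() {}

type SearchDelete struct{ ID string }

func (SearchDelete) task() {}

// A later addition nobody remembered to handle in the dispatcher:
type SearchReindex struct{ ID string }

func (SearchReindex) task() {}

func dispatch(t Task) {
    switch v := t.(type) {
    case SearchAdd:
        fmt.Println("add", v.ID)
    case SearchDelete:
        fmt.Println("delete", v.ID)
    }
    // SearchReindex matches no case, and the compiler stays silent.
}

func main() {
    dispatch(SearchReindex{ID: "42"}) // prints nothing
}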

IMHO using sum types is the clearest solution for these kinds of tasks, which are usually solved with a visitor and a set of dispatchers, where the visitor's functions are few and small and the visitor itself is a fixed type.

I truly believe having something like

type Task oneof {
    // SearchAdd holds a data for a new record in the search index
    SearchAdd struct {
        Ctx   context.Context
        ID    string
        Attrs Attributes   
    }

    // SearchUpdate update a record
    SearchUpdate struct {
        Ctx         context.Context
        ID          string
        UpdateAttrs UpdateAttributes
    }

    // SearchDelete delete a record
    SearchDelete struct {
        Ctx context.Context
        ID  string
    }
}

+

switch task {
case tasks.SearchAdd:
    // task is tasks.SearchAdd in this scope
case tasks.SearchUpdate:
case tasks.SearchDelete:
}

would be much more Goish in spirit than any other approach Go allows in its current state. No need for Haskellish pattern matching; just narrowing down to the specific type is more than enough.

Ouch, I missed the point of the syntax proposal. Let me fix it.

Two versions: one for generic sum types and one for sum types as enumerations:

Generic sum types

type Sum oneof {
    T₁ TypeDecl₁
    T₂ TypeDecl₂
    …
    Tₙ TypeDeclₙ
}

where T₁…Tₙ are type definitions at the same level as Sum (oneof exposes them outside of its scope), and Sum declares an interface which only T₁…Tₙ satisfy.

The processing is similar to the type switch we have today, except it is done implicitly over oneof objects, and there has to be a compiler check that all variants were listed.

Real type safe enumerations

type Enum oneof {
    Value = iota
}

pretty similar to iota in consts, except that only the explicitly listed values are Enums and nothing else is.

switch task {
case tasks.SearchAdd:
    // task is tasks.SearchAdd in this scope
case tasks.SearchUpdate:
case tasks.SearchDelete:
}

would be much more Goish in spirit than any other approach Go allows in its current state. No need for Haskellish pattern matching; just dispatching on the concrete type is more than enough.

I don't think that manipulating the meaning of the task variable is a good idea, although it is acceptable.

I don't think that manipulating the meaning of the task variable is a good idea, although it is acceptable.

Good luck with your visitors then.

@sirkon What do you mean about visitors? I liked this syntax btw; however, should the switch be written like this:

switch task {
case Task.SearchAdd:
    // task is Task.SearchAdd in this scope
case Task.SearchUpdate:
case Task.SearchDelete:
}

Also, what would the zero value for Task be? For example:

var task Task

Would it be nil? If so, should the switch have an extra case nil?
Or would it be initialized to the first type? That would be awkward though, because then the order of type declarations matters in a way it didn't before; however, it would probably be OK for numeric enums.

I'm assuming that this is equivalent to switch task.(type), but the switch would require all cases to be present, right? As in: if you miss one case, compilation error. And no default allowed. Is that right?

What do you mean about visitors?

I meant that they are the only type-safe option in Go for this kind of functionality, and a much worse one for a certain set of cases (a limited number of predefined alternatives).

Also what would the no-value for Task be? For example:

var task Task

I am afraid it would have to be a nilable type in Go, as this

Or would it be initialized to the first type?

would be way too weird, especially for the intended purpose.

I'm assuming that this is equivalent to switch task.(type), but the switch would require all cases to be present, right? As in: if you miss one case, compilation error.

Yes, right.

And no default allowed. Is that right?

No, defaults are allowed. Discouraged though.

PS: I think I finally see the concern @ianlancetaylor and the other Go people have about sum types. It looks like nilness makes them quite NPD-prone (nil pointer dereferences), since Go doesn't have any control over nil values.

If it's nil, then I guess it's fine. I would prefer that case nil were a requirement for the switch statement. Doing an if task != nil before is OK too, I just don't like it that much :|

Would this be allowed too?

type Foo oneof {
  A = 3
  B = "3"
  C = 3.0
  D = struct { E bool }{ true }
}

Would this be allowed too?

Well, no consts then, only

type Foo oneof {
    A <type reference>
}

or

type Foo oneof {
    A = iota
    B
    C
}

or

type Foo oneof {
    A = 1
    B = 2
    C = 3
}

No combination of iotas and values. Or a combination with checks on the values: they must not repeat.

FWIW, one thing I found interesting about the newest generics design is that it showed another avenue to address at least some of the use cases of sum types while avoiding the pitfall around zero values. It defines disjunctive contracts, which are sum-ish in a way, but because they describe constraints and not types, they don't need to have a zero value (as you can't declare variables of that type). That is, it's at least possible to write a function that takes a limited set of possible types, with compile-time type-checking of that set.

Now, of course, as-is the design doesn't really work for the use cases intended here: disjunctions list only underlying types or methods and are thus still wide open. And of course, even as a general idea, it's pretty limited, as you can't instantiate a generic (or sum-ish-taking) func or value. But IMO it shows that the design space for addressing some of the use cases of sums is much larger than the idea of sum types themselves, and that thinking about sums is thus fixating on a specific solution instead of specific problems.

Anyway. Just thought it's interesting.

@Merovius makes an excellent point about the latest generic design being able to deal with some of the use cases of sum types. For example, this function which was used earlier in the thread:

func addOne(x int|float64) int|float64 {
    switch x := x.(type) {
    case int:
        return x + 1
    case float64:
         return x + 1
    }
}

would become:

contract intOrFloat64(T) {
    T int, float64
}

func addOne(type T intOrFloat64) (x T) T {
    return x + 1
}

As far as sum types themselves are concerned, if generics eventually land, I'd be even more dubious than I am now about whether the benefits of introducing them would outweigh the costs for a simple language such as Go.

However, if something were to be done, then the simplest and least disruptive solution IMO would be @ianlancetaylor's idea of 'restricted interfaces', which would be implemented in exactly the same way as today's 'unrestricted' interfaces but could only be satisfied by the specified types. In fact, if you took a leaf out of the generic design's book and made the type constraint the first line of the interface block:

type intOrFloat64 interface{ type int, float64 }    

then this would be completely backwards compatible as you wouldn't need a new keyword (such as restrict) at all. You could still add methods to the interface and it would be a compile time error if the methods were not supported by all of the specified types.

I see no problem at all assigning values to a variable of the restricted interface type. If the type of the value on the RHS (or the default type of an untyped literal) was not an exact match for one of the specified types then it simply would not compile. So we'd have:

var v1 intOrFloat64 = 1        // compiles, dynamic type int
var v2 intOrFloat64 = 1.0      // compiles, dynamic type float64
var v3 intOrFloat64 = 1 + 2i   // doesn't compile, complex128 is not a specified type

It would be a compile time error for the cases of a type switch to not match a specified type and an exhaustiveness check could be implemented. However, a type assertion would still be needed to convert the restricted interface value to a value of its dynamic type as it is today.

Zero values are not a problem with this approach (or at any rate no more of a problem than they are today with interfaces generally). The zero value of a restricted interface would be nil (implying that it doesn't currently contain anything) and the specified types would of course have their own zero values, internally, which would be nil for nilable types.

All this seems perfectly workable to me, though, as I said earlier, is the compile-time safety gained really worth the additional complexity? I have my doubts, as I've never really felt the need for sum types in my own programming.

IIUC generics won't involve dynamic types, so this whole point doesn't stand. However, even if interfaces were allowed to work as contracts (which I doubt), that wouldn't solve exhaustiveness checks and enums, which is what (I think, maybe not?) sum types are about.

@alanfo, @Merovius Thanks for the cue; it's interesting that this discussion is turning in this direction:

I'd like to turn the viewpoint around for just a split second: I'm trying to understand why contracts cannot be replaced entirely with parameterized interfaces that permit the type restriction mentioned above. At the moment I don't see any strong technical reason, except that such "sum" interface types, when used as "sum" types, would want to restrict the possible dynamic values to exactly the types enumerated in the interface, while - if the same interface were used in contract position - the enumerated types in the interface would need to serve as underlying types to be a reasonably useful generic restriction.

@Goodwine
I wasn't suggesting that the generics design would address everything that one might want to do with sum types - as @Merovius clearly explained in his last post they won't. In particular, the type constraints proposed for generics only cover the built-in types and any types derived from them. From a sum type viewpoint the former is too narrow and the latter too wide.

However, the generics design would enable one to write a function which operates on a limited set of types which the compiler would enforce and this is something that we can't do at all at present.

As far as restricted interfaces are concerned, the compiler would know the precise types that could be used and it would therefore become feasible for it to do an exhaustive check in a type switch statement.

@griesemer

I'm puzzled by what you say as I thought the draft generics design document explained quite clearly (in the section "Why not use interfaces instead of contracts") why the latter were considered a better vehicle than the former for expressing generic constraints.

In particular, a contract can express a relationship between type parameters and therefore only a single contract is needed. Any of its type parameters can be used as the receiver type of a method listed in the contract.

The same cannot be said of an interface, parameterized or not. If they had any constraints at all, each type parameter would need a separate interface.

This makes it more awkward to express a relationship between type parameters using interfaces though not impossible as the graph example showed.

However, if you're thinking that we could "kill two birds with one stone" by adding type constraints to interfaces and then using them for both generic and sum type purposes, then (apart from the problem you mentioned) I think you're probably right that this would be technically feasible.

I guess it wouldn't really matter, as far as generics are concerned, if interface type constraints could include 'non built-in' types, though some way would need to be found to restrict them to the exact types (and not derived types as well) so they would be suitable for sum types. Perhaps we could use const type for the latter (or even just const) if we are to stick with the current keywords.

@griesemer There are a few reasons why parameterized interface types aren't a direct replacement for contracts.

  1. The type parameters are the same as on other parameterized types.
    In a type like

    type C2(type T C1) interface { ... }
    

    the type parameter T exists outside the interface itself. Any type argument passed as T has to be already known to satisfy contract C1, and the body of the interface cannot further constrain T. This is different from contract parameters, which are constrained by the body of the contract as a result of being passed into it. This would mean that each type parameter to a function would have to be independently constrained before being passed as a parameter to the constraint on any other type parameter.

  2. There's no way to name the receiver type in the body of the interface.
    Interfaces would have to let you write something like:

    type C3(type U C1) interface(T) {
        Add(T) T
    }
    

    where T denotes the receiver type.

  3. Some interface types would not satisfy themselves as generic constraints.
    Any operations relying on multiple values of the receiver type are not compatible with dynamic dispatch. These operations would therefore not be usable on interface values. This would mean that the interface would not satisfy itself (for example as the type argument to a type parameter constrained by the same interface). This would be surprising. One solution is just not to allow the creation of interface values for such interfaces at all, but this would disallow the use case being envisioned here anyways.

As for distinguishing between underlying type constraints and type identity constraints, there is one method that might work. Imagine we could define custom constraints, like

contract (T) identicalTo(U) {
    *T *U
}

(Here, I'm using an invented notation to specify a single type as the "receiver". I'll pronounce a contract with an explicit receiver type as "constraint", just as a func with a receiver is pronounced "method". The parameters after the contract name are normal type parameters and cannot appear on the left hand side of a constraint clause in the body of the constraint.)

Because the underlying type of a literal pointer type is itself, this constraint implies that T is identical to U. Because this is declared as a constraint, you could write (identicalTo(int)), (identicalTo(uint)), ... as a constraint disjunction.

While contracts might be useful to express some sort of sum types, I don't think you can express generic sum types with them. From what I've seen from the draft, one has to list concrete types, so you cannot write something like this:

contract Foo(T, U) {
    T U, int64
}

which one would need in order to express a generic sum type of an unknown type and one or more known types. Even if the design did allow such constructs, they would look weird when used, since both parameters would effectively be the same thing.

I've been thinking some more about how the draft generics design might change if interfaces were extended to include type constraints and then used to replace contracts in the design.

It's perhaps easiest to analyze the situation if we consider different numbers of type parameters:

No parameters

No change :)

One parameter

No real problems here. A parameterized interface (as opposed to a non-generic one) would only be needed if either the type parameter referred to itself and/or some other independent fixed type(s) were needed to instantiate the interface.

Two or more parameters

As mentioned previously each type parameter would need to be constrained individually if it needed a constraint at all.

A parameterized interface would only be needed if:

  1. The type parameter referred to itself.

  2. The interface referred to another type parameter or parameters which _had already been declared_ in the type parameter section (presumably we wouldn't want to backtrack here).

  3. Some other independent fixed type(s) were needed to instantiate the interface.

Of these (2) is really the only troublesome case as it would rule out type parameters referring to each other such as in the graph example. Whether one declared 'Node' or 'Edge' first, its constraining interface would still need the other one to be passed as a type parameter.

However, as indicated in the design document, you could work around this by declaring non-parameterized (as they don't refer to themselves) NodeInterface and EdgeInterface at top level as there would then be no problem in their referring to each other whatever the declaration order. You could then use these interfaces to constrain the Graph struct's type parameters and those of its associated 'New' method.

So it doesn't look like there are any insuperable problems here even if the contracts idea is nicer.

Presumably, comparable could now just become a built-in interface rather than a contract.

Interfaces could, of course, be embedded in each other as they can already.

I'm not sure how one would deal with the pointer method problem (in those cases where these would need to be specified in the contract) as you can't specify a receiver for an interface method. Perhaps some special syntax (such as preceding the method name with an asterisk) would be needed to indicate a pointer method.

Turning now to @stevenblenkinsop's observations, I wonder whether it would make life easier if parameterized interfaces didn't allow their own type parameters to be constrained in any way at all? I'm not sure this is really a useful feature anyway unless someone can think of a sensible use case.

Personally, I don't regard it as surprising that some interface types will not be able to satisfy themselves as generic constraints. An interface type is not a valid receiver type in any case and so can have no methods.

Although Steven's idea of an identicalTo() constraint would work, it seems to me potentially long-winded for specifying sum types. I'd prefer a syntax which allows one to specify a whole line of types as being exact.

@urandom is correct, of course, that as the generics draft currently stands one can only list concrete (built-in or aggregate built-in) types. However, this would clearly have to change if restricted interfaces are used instead for both generics and sum types. So I wouldn't rule out that something like this would be permitted in a unified environment:

type Foo(type T) interface {
    const type T, int64  // 'const' indicates types are exact i.e. no derived types
}

Why can we not just add discriminated unions to the language instead of inventing another workaround for their absence?

@griesemer You might or might not be aware, but I've been in favor of using interfaces to specify constraints from the beginning :) I no longer think the exact ideas I bring up in that post are the way to go (especially the things I suggest to address operators). And I do like the most recent iteration of the contracts design a lot more than the previous one. But in general, I completely agree that (possibly extended) interfaces as constraints are viable and worth considering.

@urandom

I don't think you can express generic sum types with them

I want to re-iterate that my point wasn't "you can build sum types with them", but "you can solve some problems that sum types solve with them". If your problem statement is "I want sum types", then it's not surprising that sum types are the only solution. I simply wanted to express that it might be possible to do without them, if we focus on the problems you want to solve with them.

@alanfo

This makes it more awkward to express a relationship between type parameters using interfaces though not impossible as the graph example showed.

I think "awkward" is subjective. Personally, I find using parameterized interfaces more natural and the graph example a very good illustration. To me, a Graph is an entity, not a relationship between a kind of Edge and a kind of Node.

But TBH, I don't think either of them is really more or less awkward - you write pretty much exactly the same code to express pretty much exactly the same things. And FWIW, there is prior art for this. Haskell type-classes behave a lot like interfaces and as that wiki-article points out, using multi-parameter type-classes to express relationships between types is a pretty normal thing to do.

@stevenblenkinsop

There's no way to name the receiver type in the body of the interface.

The way you would address that is with type-arguments at usage-site. i.e.

type Adder(type T) interface {
    Add(t T) T
}

func Sum(type T Adder(T)) (vs []T) T {
    var t T
    for _, v := range vs {
        t = t.Add(v)
    }
    return t
}

This requires some care as to how unification works, so that you can allow self-referencing type-parameters, but I think it can be made to work.
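
To make that concrete, here is a hedged sketch of what a caller might look like under that same draft syntax (hypothetical, since the draft was never implemented as-is):

```go
type MyInt int

func (m MyInt) Add(n MyInt) MyInt { return m + n }

// MyInt satisfies Adder(MyInt), so it can instantiate Sum;
// the explicit type argument uses the draft's call syntax.
var total = Sum(MyInt)([]MyInt{1, 2, 3}) // total == MyInt(6)
```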

Your 1. and 3. I don't really understand, I have to admit. I would benefit from some concrete examples.


Anyway, it's a bit disingenuous to drop this at the end and then continue the discussion, but this is probably not the right issue to talk through the minutiae of the generics design. I only brought it up to widen the design space for this issue a bit :) because it felt like it had been a while since new ideas were brought into the discussion around sum types.

No need for Haskellish pattern matching; just dispatching on the concrete type is more than enough.

Why do you think that pattern matching can't be done in Go? If you lack examples of pattern matching, see, for example, Rust.

@Merovius re: "To me, a Graph is an entity"

Is it a compile-time entity or does it have a representation at runtime? One of the major differences between contracts and interfaces is that an interface is a runtime object. It participates in garbage collection, has pointers to other runtime objects, and so on. Converting from a contract to an interface would mean introducing a new, temporary runtime object that has pointers to the nodes/vertexes it contains (how many?), which seems awkward when you have a collection of graph functions, each of which might more naturally take parameters pointing at various parts of graphs in their own ways, depending on the function's needs.

Your intuition might be misled by using "Graph" for a contract, since "Graph" seems object-like and the contract isn't really specifying any particular subgraph; it's more like defining a set of terms to use later, as you would in mathematics or law. In some cases you might want both a graph contract and a graph interface, resulting in an annoying name clash. I can't think of a better name off the top of my head, though.

By contrast, a discriminated union is a runtime object. While not constraining the implementation, you need to think of what an array of them might be like. An N-item array needs N discriminators and N values, and there are a variety of ways that might be done. (Julia has interesting representations, sometimes putting the discriminators and values in separate arrays.)

To reduce the errors which are currently occurring all over the place with the interface{} schemes, but without the continuous typing of the | operator, I would suggest the following:

type foobar union {
    int
    float64
}

Just the use case of replacing many interface{}s with this kind of type safety would be a massive gain for the standard library; half the things in the crypto library alone could use this.

Issues such as: ah, you passed in ecdsa.PrivateKey instead of *ecdsa.PrivateKey - here's a generic runtime error, because only *ecdsa.PrivateKey is supported. The simple fact that these could be clear union types would increase type safety quite a bit.

While this suggestion takes up more _space_ compared to int|float64, it does force the user to think about this, keeping the code base a lot cleaner.
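
To illustrate the kind of error meant here, a small sketch using the real crypto/x509 API, which accepts keys as a plain interface{} today (the exact error text may differ across Go versions):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
)

func main() {
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)

	// MarshalPKCS8PrivateKey takes interface{}, so passing the value
	// instead of the pointer compiles fine and fails only at runtime.
	_, err := x509.MarshalPKCS8PrivateKey(*key)
	fmt.Println(err) // reports an unknown/unsupported key type
}
```

With a union parameter type, the dereferenced value would simply not compile.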

To reduce the errors which are currently occurring all over the place with the interface{} schemes, but without the continuous typing of the | operator, I would suggest the following: [...]

See this (comment); it's my proposal.

Actually, we could introduce both of our ideas into the language. This would lead to the existence of two native ways to do ADTs, but with different syntaxes.

My proposal is for features, especially pattern matching; yours is for compatibility and the ability of old code bases to benefit from the feature.

But that looks like overkill, doesn't it?

Also, a sum type can be made to have nil as its default value. Of course, it will require a nil case in every switch.
Pattern matching could be done like this:
-- declaration

type U enum{
    A(int64),
    B(string),
}

-- matching

...
var a U
...
switch a {
    case A{b}:
         //process b here
    case B{b}:
         //...
    case nil:
         //...
}
...

If one doesn't like pattern matching, see sirkon's proposal above.

Also, a sum type can be made to have nil as its default value. Of course, it will require a nil case in every switch.

Wouldn't it be easier to disallow non-initialized values at compile time? For cases where we need an initialized value, we could add it to the sum type, i.e.:

type U enum {
  None
  A(string)
  B(uint64)
}
...
var a U.None
...
switch a {
  case U.None: ...
  case U.A(str): ...
  case U.B(i): ...
}

Wouldn't it be easier to disallow non-initialized values at compile time?

Breaks existing code.

Breaks existing code.

There isn't any existing code with sum types. Though I think the default value should be something defined in the type itself: either the first entry, or the first alphabetically, or something.

There isn't any existing code with sum types. Though I think the default value should be something defined in the type itself: either the first entry, or the first alphabetically, or something.

I agreed with you at first, but after some reflection: the new reserved name for the union could have been used previously in some code bases (union, enum, etc.).

I think the obligation to check for nil would be quite painful to use.

It looks like a change that breaks backward compatibility and could only be introduced in Go 2.0.

There isn't any existing code with sum types. Though I think the default value should be something defined in the type itself: either the first entry, or the first alphabetically, or something.

But there is a lot of existing Go code in which everything is nilable. That would definitely be a breaking change. Worse, gofix and similar tools could only change variable types to Options (of the same type), producing ugly code at best; in all other cases it would simply break everything in the world.

If nothing else, reflect.Zero needs to return something. But all of these are technical hurdles that can be solved - for example, this one is obviously solved if the zero value of a sum type is well-defined, and the answer is probably "panic" if it is not. The bigger question is still why a certain choice is the correct one, and whether and how any choice fits into the language at large. IMO, the best way to address these questions is still to talk about concrete cases where sum types address specific problems or where their lack created them. The three criteria for an experience report apply here.

Note in particular that both "there should be no zero value and it should be disallowed to create uninitialized values" and "the default should be the first entry" have been mentioned above, several times. So whether you think it ought to be this way or that doesn't really add new information. It just makes an already humongous thread even longer and harder for future readers to find the relevant information in.

Let's consider reflect.Kind. There is an Invalid Kind, which has the default int value of 0. If you had a function that accepted a reflect.Kind and you passed an uninitialized variable of that type, it would end up being Invalid. If reflect.Kind could hypothetically be changed to a sum type, it should perhaps retain the behavior of having a named invalid entry as the default, rather than relying on a nil value.

Now, let's consider html/template.contentType. The Plain type is its default value, and it is indeed treated as such by the stringify function, as it is the fallback. In a hypothetical sum future you would not only still need that behavior, but it's also infeasible to use a nil value for it, since nil will not mean anything to a user of this type. It will be pretty much mandatory to always return a named value here, and you have a clear default for what that value should be.
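
As a quick, runnable illustration of the reflect.Kind point (today's Go, no new syntax):

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	// The zero value of reflect.Kind is Invalid (int value 0), so an
	// uninitialized Kind is still a meaningful, named member of the set.
	var k reflect.Kind
	fmt.Println(k == reflect.Invalid) // true
	fmt.Println(k)                    // "invalid"
}
```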

It's me again, with another example where algebraic/variant/sum/whatever data types work nicely.

So, we are using a NoSQL database without transactions (distributed system; transactions don't work for us), yet we love data integrity and consistency for obvious reasons and have to work around concurrent-access issues, usually with somewhat complex conditional update queries over a single record (a single-record write is atomic).

I have a new task to write a set of entities that can be inserted, appended, or deleted (only one of these ops).

If we could have something like

type EntityOp oneof {
    Insert   Reference
    NewState string
    Delete   struct{}
}

The method could be just

type DB interface {
    …
    Capture(ctx context.Context, processID string, ops map[string]EntityOp) (bool, error)
}

One fantastic use for sum types is to represent nodes in an AST. Another one is to replace nil with an option that is checked at compile time.

@DemiMarie But in today's Go, this sum could also be nil. As I proposed above, we can simply make nil a variant of each enum; there will be a case nil in each switch, but this obligation is not so bad, especially if we want this feature without breaking all existing Go code (currently we have a nilable everything).

Don't know if it belongs here, but all this reminds me of TypeScript, where a very cool feature called "string literal types" exists and we can do this:

var name: "Peter" | "Consuela"; // string type with compile-time constraint

It's like a string enum, which is way better than traditional numeric enums in my opinion.

@Merovius
A concrete example is working with arbitrary JSON. In Rust it can be represented as:

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A union type has two advantages:

  1. Self-documenting code.
  2. Allowing the compiler or go vet to check incorrect usage of a union type
    (e.g. a switch where not all types are checked).

For the syntax, the following should be compatible with Go 1, as with type aliases:

type Token = int | float64 | string

A union type can be implemented internally as an interface; what is important is that using a union type allows the code to be more readable and catches errors like

var tok Token

switch t := tok.(type) {
case int:
    // do something
}

The compiler should raise an error, since not all Token types are handled in the switch.

The problem with this is that there is (to my knowledge) no way to store pointer types (or types which contain pointers, such as string) and non-pointer types together. Even types with different layouts wouldn't work. Feel free to correct me, but the problem is that precise GC doesn't work well with variables which can be pointers and simple values at the same time.

We could go down the road of implicit boxing - like interface{} does currently. But I don't think that this provides sufficient benefits - it still looks like a glorified interface type. Maybe some sort of vet check could be developed instead?

The garbage collector would need to read the tag bits from the union to determine the layout. This isn't impossible but would be a big change to the runtime that might slow down gc.

Maybe some sort of vet check can be developed instead?

https://github.com/BurntSushi/go-sumtype

The garbage collector would need to read the tag bits from the union to determine the layout.

That's the exact same race that existed with interfaces, when they could contain non-pointers. That design was explicitly moved away from.

go-sumtype is interesting, thanks. But what happens if the same package defines two union types?

The compiler could implement the union type internally as an interface, while adding a uniform syntax and standard type checking.

If there are N projects using union types, each differently, and with N large enough, maybe introducing the one standard way to do it may be the best solution.

But what happens if the same package defines two union types?

Nothing much? The logic is per-type and uses a dummy method to recognize implementers. Just use different names for the dummy methods.
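
For illustration, a hedged sketch of that pattern with two unions in one package (all names made up; this is the same sealed-interface emulation the earlier Task example used):

```go
package unions

// Shape and Event are two independent "sealed" unions in the same
// package. Each uses its own unexported marker method, so per-type
// tooling can tell the two member sets apart.
type Shape interface{ sealedShape() }
type Event interface{ sealedEvent() }

type Circle struct{ R float64 }
type Square struct{ S float64 }

func (Circle) sealedShape() {}
func (Square) sealedShape() {}

type Click struct{ X, Y int }

func (Click) sealedEvent() {}
```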

@skybrian IIRC the bitmap which specifies a type's layout is currently stored in one place. Adding such a thing per object would add a lot of jumps, and it would make every optional object a GC root.

The problem with this, is that there is (to my knowledge) no way to store pointer types (or types which contain pointers, such as string) and non-pointer types together

I don't believe this is necessary. The compiler could overlap the layout for types when the pointer-maps match, and not otherwise. When they don't match, it would be free to lay them out consecutively or use a pointer approach as used for interfaces currently. It could even use non-contiguous layouts for struct members.

But I don't think that this provides sufficient benefits - it's still looks like a glorified interface type.

In my proposal, union types are _exactly_ a glorified interface type - a union type is just a subset of an interface that is only allowed to store an enumerated set of types. This potentially gives the compiler the freedom to choose a more efficient storage method for certain type sets, but that's an implementation detail, not the main motivation.

@rogpeppe - Out of curiosity, can I use the sum type directly, or do I explicitly need to cast it to a known type to do anything with it? Because if I have to constantly cast it to a known type, I really don't know what benefits this gives over what interfaces already give us. The main benefit I see is compile-time error checking, kind of, as unmarshaling would still occur at runtime, which is more likely where you'd see an issue with an invalid type being passed. The other benefit is a more constrained interface, which I don't think warrants a language change.

Can I do

type FooType int | float64

func AddOne(foo FooType) FooType {
    return foo + 1
}

// if this can be done, what happens here?
type FooType nil | int
func AddOne(foo FooType) FooType {
    return foo + 1
}

If this can't be done, I don't see much of a difference with

type FooType interface{}

func AddOne(foo FooType) (FooType, error) {
    switch v := foo.(type) {
        case int:
              return v + 1, nil
        case float64:
              return v + 1.0, nil
    }

    return nil, fmt.Errorf("invalid type %T", foo)
}

// versus
type FooType int | float64

func AddOne(foo FooType) FooType {
    switch v := foo.(type) {
        case int:
              return v + 1
        case float64:
              return v + 1.0
    }

    // assumes the compiler knows that no other type is
    // valid and thus this should always return a value
    // Would the compiler error out on incomplete switch types?
}

@xibz

Out of curiosity, can I use the sum type directly, or do I explicitly need to cast it to a known type to do anything with it? Because if I have to constantly cast it to a known type, I really don't know what benefits this gives over what interfaces already give us.

@rogpeppe , please correct me if I'm wrong 🙏
Having to always perform pattern matching (that's what "casting" is called when working with sum types in functional programming languages) is actually one of the biggest benefits of using sum types. Forcing the developer to explicitly handle all the possible shapes of a sum type is a way to prevent the developer from using a variable thinking it is of a given type when it actually is a different one. An exaggerated example would be, in JavaScript:

const a = "1" // string "1"
const b = a + 5 // string "15" and not number 6

If this can't be done, I don't see much of a difference with

I think you stated some advantages yourself, didn't you?

The main benefit I see is compile-time error checking, kind of, as unmarshaling would still occur at runtime, which is more likely where you'd see an issue with an invalid type being passed. The other benefit is a more constrained interface, which I don't think warrants a language change.

// Would the compiler error out on incomplete switch types?

Based on what functional programming languages do, I think this should be possible and configurable 👍

@xibz Also performance, since that can be done at compile time vs. runtime. But then there's generics - hopefully, one day before I die.

@xibz

Out of curiosity, can I use the sum type directly or do I explicitly need to cast it to a known type to do anything with it?

You can call methods on it if all the members of the type share that method.

Taking your int | float64 as an example, what would be the result of:

var x int|float64 = int(2)
var y int|float64 = float64(0.5)
fmt.Println(x * y)

Would it do an implicit conversion from int to float64? Or from float64 to int. Or would it panic?

So you're almost right - you'd need to type-check before using it in most cases. I believe that's an advantage, not a disadvantage though.

The runtime advantage might be significant, BTW. To continue with your example type, a slice of type [](int|float64) would not need to contain any pointers, because it's possible to represent all instances of the type in a few bytes (probably 16 bytes due to alignment restrictions), which could lead to significant performance improvements in some cases.
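
A hedged sketch of what that packed representation looks like when hand-rolled today (names invented; a compiler-chosen layout could do the same thing automatically):

```go
package sum

import "math"

// intOrFloat64 is a hand-written tagged union: a one-byte tag plus an
// 8-byte payload, 16 bytes after alignment and free of pointers, so a
// []intOrFloat64 never needs to be scanned by the garbage collector.
type intOrFloat64 struct {
	isFloat bool
	bits    uint64
}

func ofInt(i int) intOrFloat64 { return intOrFloat64{isFloat: false, bits: uint64(i)} }

func ofFloat(f float64) intOrFloat64 {
	return intOrFloat64{isFloat: true, bits: math.Float64bits(f)}
}

func (v intOrFloat64) Int() int       { return int(v.bits) }
func (v intOrFloat64) Float() float64 { return math.Float64frombits(v.bits) }
```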

@rogpeppe

you'd need to type-check before using it in most cases. I believe that's an advantage, not a disadvantage though.

I agree about it not being a disadvantage. I am just trying to see what benefits this gives us as opposed to interfaces.

a slice of type [](int|float64) would not need to contain any pointers because it's possible to represent all instances of the type in a few bytes (probably 16 bytes due to alignment restrictions), which could lead to significant performance improvements in some cases.

Hm, I am not too sure I buy the significant part of this. I am sure that in very rare cases it would reduce the memory size by about half. With that said, I don't think the memory saved is significant enough for a language change.

@stouf

Having to always perform pattern matching (that's what "casting" is called when working with sum types in functional programming languages) is actually one of the biggest benefits of using sum types

But what benefits does it bring to the language that aren't already handled by interfaces? Originally I was completely for sum types, but as I started thinking about it, I kind of lost track of what benefits it would bring.


With all that said, if using a sum type can provide cleaner and more readable code, I'd be 100% for it. However, from the looks of it, it seems that it'd look almost identical to interface code.

@xibz pattern-matching would be useful in tree-walking code where you want to look more than one level deep in the tree. Type switches only let you look one level deep at a time, so you'd need to nest them.

This is a bit contrived, but for example if you have an expression syntax tree, to match a quadratic equation you might do something like:

match Add(Add(Mult(Const(a), Power(Var(x), 2)), Mult(Const(b), Var(x))), Const(c)) {
  // here a, b, c are bound to the constants and x is bound to the variable name.
  // x must have been the same in both var expressions or it wouldn't match.
}

Simple examples that only go one level deep won't show a big difference, but here we are going up to five levels deep, which would be quite complicated to do with nested type switches. A language with pattern matching could go multiple levels deep while also making sure you didn't miss any cases.

I'm not sure how much it comes up outside of compilers, though.
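
To see the cost concretely, here's a hedged sketch of what even a two-level shape check looks like with today's type assertions (Expr, Add, and Const are hypothetical AST types):

```go
package ast

type Expr interface{ expr() }

type Add struct{ L, R Expr }
type Const struct{ V float64 }

func (Add) expr()   {}
func (Const) expr() {}

// constPlusConst reports whether e has the shape Add(Const, Const).
// Every extra level of shape needs another assertion or nested type
// switch; the five-level quadratic match above becomes very noisy.
func constPlusConst(e Expr) bool {
	a, ok := e.(Add)
	if !ok {
		return false
	}
	_, lok := a.L.(Const)
	_, rok := a.R.(Const)
	return lok && rok
}
```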

@xibz
A benefit of sum types is that you and the compiler both know exactly what types can exist within the sum. That is essentially the difference. With empty interfaces, you always have to worry about and guard against misuse of the API by adding a branch whose sole purpose is to recover when a user gives you a type you're not expecting.
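
For instance, a minimal sketch of that defensive branch, reusing the Task types from earlier in the thread (stubbed out here so the snippet stands alone):

```go
package tasks

import "fmt"

type SearchAdd struct{}
type SearchUpdate struct{}
type SearchDelete struct{}

// handle dispatches on an open interface{}. The default branch exists
// only because the compiler cannot rule out unexpected types - the
// very guard a sum type would make unnecessary.
func handle(task interface{}) error {
	switch task.(type) {
	case SearchAdd, SearchUpdate, SearchDelete:
		return nil // real work elided
	default:
		return fmt.Errorf("unexpected task type %T", task)
	}
}
```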

Since there seems to be little hope of sum types being implemented in the compiler, I hope that at least a standard comment directive, like //go:union A | B | C, is proposed and supported by go vet.

With a standard way to declare a sum type, after N years it will be possible to know how many packages use it.

I made a small utility that creates sum-type emulation via an interface with a private method and the respective implementations; it may be helpful for some: https://github.com/sirkon/go-oneof

With the recent design drafts for generics, perhaps sum types could be tied to them.

One of the drafts floated the idea of using interfaces instead of contracts, and the interfaces would have to support type lists:

type Foo interface { 
     int64, int32, int, uint, uint32, uint64
}

While that in itself would not produce a memory-packed union, perhaps when used in a generic function or struct it would not be boxed, and it would at least provide type safety when dealing with a finite list of types.

And maybe, using these particular interfaces within type switches would require that such a switch would be exhaustive.

This isn't the ideal short syntax (e.g.: Foo | int32 | []Bar), but it is something.

With the recent design drafts for generics, perhaps sum types could be tied to them. [...]

Pretty similar to my proposal: https://github.com/golang/go/issues/19412#issuecomment-520306000

type foobar union {
  int
  float64
}

@mathieudevos wow, I quite like that actually.

To me, the biggest oddity (the only remaining oddity, really) with the latest generics proposal is the type lists in interfaces. They just don't quite fit. Then you wind up with some interfaces that you can only use as type parameter constraints, and so on...

The union concept works great in my mind because you could then embed a union in an interface to accomplish "constraint that includes methods and raw types". Interfaces continue to function as-is, and with semantics defined around a union, they can be used in regular code and the feeling of strangeness goes away.

// Ordinary interface
type Stringer interface {
    String() string
}

// Type union
type Foobar union {
    int
    float64
}

// Equivalent to an interface with a type list
type FoobarStringer interface {
    Stringer
    Foobar
}

// Unions can intersect
type SuperFoo union {
    Foobar
    int
}

// Doesn't compile since these unions can't intersect
type Strange interface {
    Foobar
    union {
        int
        string
    }
}

EDIT - Actually, just saw this CL: https://github.com/golang/go/commit/af48c2e84b52f99d30e4787b1b8d527b5cd2ab64

The primary benefit of this change is that it opens the door for general
(non-constraint) use of interfaces with type lists

...Great! Interfaces become fully usable as sum types, which unifies the semantics across regular and constraint usage. (Obviously not turned on yet, but I think that's a great destination to be heading for.)

I've opened #41716 to discuss the way that a version of sum types appears in the current generics design draft.

I just wanted to share an old proposal from @henryas about algebraic data types. It's very well written, with use cases provided:
https://github.com/golang/go/issues/21154
Unfortunately, it was closed by @mvdan on the same day without any appreciation of the work. I'm pretty sure that person really felt that, and thus there has been no further activity on the GitHub account. I feel sorry for that guy.

I really like #21154. It seems to be a different thing though (and hence @mvdan's comment closing it as a dupe doesn't quite hit the mark). Reopen there, or include it in the discussion here?

Yeah, I would really like the ability to model some higher-level business logic in a similar way as described in that issue. Sum types for enum-like, restricted options, plus the suggested accepted types from the other issue, would be awesome additions to the toolbox. Business/domain code in Go sometimes feels a bit clunky at the moment.

My only feedback is that type foo,bar inside of an interface looks a bit awkward and second-class, and I agree that there should be a choice between nullable and non-nullable (if possible).

@ProximaB I don't understand why you say "there are no further activities on the gh account". They have since created and commented on a bunch of other issues as well, many of them on the Go project. I see no evidence that their activity has been influenced by that issue at all.

Furthermore, I strongly agree with Daniel closing that issue as a dupe of this one. I don't understand why @andig says that it proposes something different. As far as I can understand the text of #21154, it proposes exactly the same thing we are discussing here, and I wouldn't be at all surprised if even the exact syntax had already been suggested somewhere in this megathread (the semantics, as far as they are described, most certainly were - multiple times). In fact, I would go so far as to say Daniel's closing is proven right by the length of this issue, because it already contains quite a detailed and nuanced discussion of #21154, so repeating all of that would have been arduous and redundant.

I agree and understand that it's probably disappointing to have a proposal closed as a dupe. But I don't know of a practical way to avoid it. Having the discussion in one place seems beneficial to everyone involved and keeping multiple issues for the same thing open, without any discussion on them, is clearly pointless.

Furthermore, I strongly agree with Daniel closing that issue as a dupe of this one. I don't understand why @andig says that they propose something different. As far as I can understand the text of #21154, it proposes exactly the same thing we are discussing here

Rereading this issue, I agree. It seems I confused this issue with generics contracts. I'd strongly support sum types. I didn't mean to sound harsh; please accept my apology if it came across like that.

I'm a human and issue gardening can be tricky at times, so by all means point out when I make a mistake :) But in this case I do think that any specific sum types proposal should fork from this thread just like https://github.com/golang/go/issues/19412#issuecomment-701625548

I'm a human and issue gardening can be tricky at times, so by all means point out when I make a mistake :)

@mvdan is not human. Trust me. I am his neighbor. Just kidding.

Thank you for the attention. I am not that attached to my proposals. Feel free to mangle, to modify, and to shoot down any part of them. I have been busy in real life, so I haven't got a chance to be active in the discussions. It is good to know that people read my proposals and some actually like them.

The original intention is to allow grouping of types by their domain relevance, where they don't necessarily share common behaviors, and have the compiler enforce that. In my opinion, this is just a static verification problem, which is done during compilation. There is no need for the compiler to generate code that retains the complex relationship among types. The generated code may treat these domain types normally as if they are the regular interface{} type. The difference is that the compiler now does additional static type checking upon compilation. That is basically the essence of my proposal #21154

@henryas Great to see you! 😊
I'm wondering: if Go hadn't used duck typing, would that have made the relationships between types much stricter and allowed grouping objects by their domain relevance, as you described in your proposal?

I'm wondering: if Go hadn't used duck typing, would that have made the relationships between types much stricter and allowed grouping objects by their domain relevance, as you described in your proposal?

It would, but that would break the compatibility promise of Go 1. We probably wouldn't need sum types if we had explicit interfaces. However, duck typing is not necessarily a bad thing: it makes certain things more lightweight and convenient. I enjoy duck typing. It is a matter of using the right tool for the job.

@henryas I agree; it was a hypothetical question. Go's creators definitely considered all the ups and downs deeply.
On the other hand, coding guidelines like "verify interface compliance" would never have appeared:
https://github.com/uber-go/guide/blob/master/style.md#verify-interface-compliance

Can you please have this off-topic discussion somewhere else? There are a lot of people subscribed to this issue.
Open interface satisfaction has been part of Go since its inception and it's not going to change.
