Go: proposal: spec: binary integer literals

Created on 27 Feb 2017  ·  91 comments  ·  Source: golang/go

Currently, Go supports octal and hexadecimal integer literals in addition to the standard decimal literals you'd expect. In an effort to round out this group, I propose adding binary integer literals as well. They would take the form of a new prefix for integer literals: 0b or 0B.

Initial Disclaimer

This was my first dive into the Go source code and primarily a learning experience for me to get my feet wet submitting changes and what-not. That being said, I appreciate any and all criticism, comments, and suggestions.

The "Why": Prior Art

Binary integer literals already exist in a number of mainstream languages, including C++14 (standardizing an existing GCC/Clang extension), Java 7, Python, and D.

All of these languages have settled on the convention of using 0b or 0B to prefix binary literals, suggesting that this would be a comfortable and sensible choice for Go as well: it avoids needless invention and provides familiarity for programmers coming from those languages.

The "Why": Continued

I managed to find some earlier discussion related to this, albeit only after I had already implemented the feature, and it has more to do with changing the octal syntax than with binary literals specifically. But https://github.com/golang/go/issues/12711#issuecomment-142338246 from @griesemer does mention that "both the 'o' for octal and 'b' for binary notation were also discussed in the design of Go. It's simply not enough bang for the buck." However, I don't see this as a good argument against adding something so simple to the language. Especially considering that more and more languages have since adopted syntax for binary literals, the earlier Go design decisions seem worth revisiting in a new light.

Example usage

const (
    SOME_MASK   = 0b00001111
    SOME_FLAG_A = 0b00000001
    SOME_FLAG_B = 0b00000010
    SOME_FLAG_C = 0b00000100
    SOME_FLAG_D = 0b00001000
)

Implementation

As I stated, this was more of a learning experience for me, so I already have changes that implement this feature ready:

CL-37502 spec: specify syntax for binary integer literals
CL-37503 cmd/compile/internal/syntax: scan binary literals
CL-37504 go/scanner: scan binary integer literals
CL-37505 strconv: support parsing binary integer literals
CL-37506 test: extend int_lit with binary literal usages

Labels: FrozenDueToAge, Go2, LanguageChange, NeedsDecision, Proposal, Proposal-Accepted

All 91 comments

This has come up before. There is a significant amount of work involved in rolling this out: making the compiler and spec changes is trivial, but there are a lot of libraries that should also be made consistent (strconv, math/big, etc.).

If we make a change in this direction, it should be more thorough and support arbitrary bases. I am against this as is.

@griesemer Yes, the changes I'm about to submit also modify strconv (to my understanding, that's actually required to support this change).

@griesemer I disagree, however, that any change must support arbitrary bases or else not be made at all. From prior reading, it sounds like that would be a good goal for Go 2; this is simply polishing Go 1 with syntax that developers coming from other languages might expect. (Base-2 is a common enough case, probably more common than octal; base-14 or what-have-you is much less so.)

CL https://golang.org/cl/37503 mentions this issue.

CL https://golang.org/cl/37504 mentions this issue.

CL https://golang.org/cl/37502 mentions this issue.

CL https://golang.org/cl/37505 mentions this issue.

CL https://golang.org/cl/37506 mentions this issue.

However, I don't see this as a good argument against adding something so simple to the language.

It's not a particularly strong argument in favour of adding them, either.

IMHO you'll need to expand the "why" section to explain exactly what advantages supporting binary literals will bring to people writing Go code.

I don't find them particularly useful; hex is a much more readable and compact format for literals that have a "bit-level meaning".

You gave a 'usage example', but it's not very compelling. I would write the mask constant using hex (0xf) and the others using shifts.

@EddieRingle This proposal has not been discussed widely nor has it been accepted. Please do not spam us with code reviews. The Go Team has enough to do with work that's actually important.

It is clear to everybody that adding a simple feature to the language is trivial. It's also clear that plenty of people would like this feature (I myself would have liked it at some point). But just because one can is not an argument that one should. Any addition to a language, however small and simple, has long-term costs. If we accept this, it will become even more difficult in the future to have a more general mechanism, and we need to remain backward compatible.

Let's wait and see what other people say before jumping the gun. Thanks.

Reminder of our no-me-too policy: https://golang.org/wiki/NoMeToo

Opinions without constructive content can be expressed using Github's emoji reactions.

@ALTree

IMHO you'll need to expand the "why" section explaining exactly what advantages supporting binary literals will bring to people writing go code.

I don't find them particularly useful; hex is a much more readable and compact format for literals that have a "bit-level meaning", IMO.

I'd argue the opposite, actually. Hex is more compact in many cases, yes, but binary literals would be an exact "bit-level" representation and therefore as readable as you could get.

@griesemer

This proposal has not been discussed widely nor has it been accepted. Please do not spam us with code reviews. The Go Team has enough to do with work that's actually important.

Apologies. It was originally a single change but since it seems like the Go policy is to break up commits based on the area of codebase affected, that's how I ended up splitting them up. I wasn't aware the bot would make individual comments here for every change. I wouldn't be so cold as to call it spamming you, however, nor imply that any effort I put in using my free time is unimportant.

Any however small and simple addition to a language has long-term costs. If we accept this, it will become even more difficult in the future to have a more general mechanism, and we need to remain backward compatible.

As was mentioned before, the general-purpose route (which I would prefer as well) would also encourage the deprecation/removal of the existing (confusing) octal syntax, no? The feeling I got was that the general-purpose syntax (2r0010 or 2x0010 for base 2, for example) was an invention meant for Go 2, where breaking changes would be welcome anyway.

Putting a potential Go2 aside, to address the statement that "_if we accept this, it will become even more difficult in the future to have a more general mechanism_": I just don't see how this is true. Adding the binary literal prefix would be orthogonal to an alternative, general-purpose syntax, especially the one you described in #12711 (in fact, that syntax conflicts directly with hexadecimal literals, but would not with this proposed binary literal syntax). They would exist side-by-side just as the general-purpose syntax would with the existing octal, hex, and decimal literals.

Apologies. It was originally a single change but since it seems like the Go policy is to break up commits based on the area of codebase affected, that's how I ended up splitting them up. I wasn't aware the bot would make individual comments here for every change. I wouldn't be so cold as to call it spamming you, however, nor imply that any effort I put in using my free time is unimportant.

It's not just that the bot sends mail about CLs, it's that each mailed CL is a request for a Go reviewer to spend time reviewing it.

The 0b syntax is nice because it's familiar but if the true goal is simply to add binary literals to the language, I would much prefer the generic solution over the familiar.

Is there any technical reason the generic option can't be implemented prior to 2.0? I've had a number of cases lately where binary literals would have been preferred over hex and it would be nice to have that option in 1.9 or 1.10 instead of waiting (possibly many years) until 2.0.

@wedow I think it would help to see specific real cases where binary literals are useful. Please share the cases where binary literals would be helpful. Thanks.

I don't see "should support arbitrary bases" as a worthwhile objection. It adds complexity/cost for little or no additional benefit. Over all the years I've been hacking, the would-be-useful bases I've heard of people wanting to use are 2, 8, 10, 12, and 16, and possibly 64 (we have base64 encoding, after all).

Let's look at why the aforementioned languages have moved forward and added support for binary literals. Let's start with C++14 as it is the first one on the list. What were the points James Dennett, then Google employee, brought forward?

Use of an 0b/0B prefix for binary literals is an existing GCC extension (also supported by Clang), and is the same syntax as Java 7, Python, and D.

Why would this particular point benefit Go?

Familiarity with the language.

Many developers move from programming language to programming language. Correct me if I'm wrong, but we all try what we know from one language in another one. You could, in a way, say that Go's familiarity to C and C++ developers attracted the kind of developer who wanted a particular feature like GC but didn't like the other options. This is one of the reasons the company I currently intern for has chosen to move to Go.

Besides experienced developers, let's also look at what the benefit of familiarity is for beginner developers, and let's take the use case of flags that @eddieringle mentioned. Explaining to a complete beginner how flags work is a lot harder when you have to explain it in octal or hexadecimal, as it requires the person to learn those as well.

Lastly, I would like to add that the purpose of every language (at least, that's what I hope) is to let us write clean code, and I think that is something we all agree on regardless. When looking over someone else's code, a list of binary literal constants makes it immediately clear, without any explanation, that those constants are flags. The same thing is a lot less straightforward when using hexadecimal or octal. Below is a comparison.

// Hexadecimal
const (
    MASK          = 0x1E
    DEFAULT_COLOR = 0x00
    BOLD          = 0x01
    UNDERLINE     = 0x02
    FLASHING_TEXT = 0x04
    NO_CHANGE     = 0x08
)

// Octal
const (
    MASK          = 036
    DEFAULT_COLOR = 00
    BOLD          = 01
    UNDERLINE     = 02
    FLASHING_TEXT = 04
    NO_CHANGE     = 010
)

// Binary
const (
    MASK          = 0b11110
    DEFAULT_COLOR = 0b00000
    BOLD          = 0b00001
    UNDERLINE     = 0b00010
    FLASHING_TEXT = 0b00100
    NO_CHANGE     = 0b01000
)

With the binary form, no thought needs to be given to recognizing that the last set of constants are flags. These are also very few flags; keep in mind that the benefit definitely adds up when there are more flags. The first constant, 0x1E, can definitely turn some heads when it's declared without context. The use of binary literals alone may indicate that a variable is meant to be used as a flag.

The referenced C++ PDF in turn refers to the aforementioned languages for support, so let's look at those next. I found the (original?) 2009 proposal by Derek Foster for binary literals in the JDK. Source

The first thing it questions, and I completely agree, is why there is an octal representation in the JDK but no binary representation. In the past few years I have never thought to myself: "Oh, octals would make my code cleaner!" This comes back to the previous point I made: familiarity. It does, however, add one thing to that point:

When the data being dealt with is fundamentally bit-oriented, however, using hexadecimal to represent ranges of bits requires an extra degree of translation for the programmer, and this can often become a source of errors. [...] then a programmer coding to that specification must translate each such value from its binary representation into hexadecimal. [...] In most cases, programmers do these translations in their heads, and HOPEFULLY get them right. however, errors can easily creep in, and re-verifying the results is not straightforward enough to be done frequently.

Using hexadecimal or octal notation for data the hardware documents in binary can result in human errors. In the comparison I gave previously, I double-checked my mental conversion by entering what I thought was the octal form into Google, which confirmed my answers. I am automatically sure of the value when I write it in binary, but not when I write it in hexadecimal or octal. And no matter how many times a day you do this, it makes writing code harder, because you have to hold the binary form in your head, and while doing so it is possible to make errors.

To dig deeper into why there is an octal notation but no binary notation, I have another question, one also asked by Derek Foster, the author of the binary literal proposal for the JDK: why did Go choose the 0 prefix for octal notation? @griesemer commented that we should not jump the gun when implementing new features:

Let's wait and see what other people say before jumping the gun. Thanks.

But hasn't Go already jumped the gun by implementing the octal notation? If the argument was "because other languages do it", then why can't that argument be used for binary literals? If it wasn't, then what was the reason the 0 prefix for octal notation made it into the language when it confuses people?

Someone might incorrectly think that "0b1" represented the same value as hexadecimal number "0xB1". However, note that this problem has existed for octal/decimal for many years (confusion between "050" and "50") and does not seem to be a major issue.

- Derek Foster

There don't seem to be many more points in favour of binary literals, because binary is something we all fall back to in our heads anyway. That is why I feel the proposals for other languages have been as brief as this one. It is not, however, a reason to shut this one down so quickly.

Here's another option, which seems clearer to me than any of the integer constants.

// Shifts
const (
    MASK          = 0x1e
    DEFAULT_COLOR = 0
    BOLD          = 1<<0
    UNDERLINE     = 1<<1
    FLASHING_TEXT = 1<<2
    NO_CHANGE     = 1<<3
)

(And shouldn't mask be 0xf, not 0x1e?)

I'm mildly against adding binary constants, at least in Go 1. I'd be for adding them in Go 2, though. The reason for the difference is that if for whatever reason someone is stuck at Go 1.8, when Go 1.9 comes out with binary constants, if one of the (transitive) imports of that person's code uses binary constants, then they can no longer build their project using Go 1.8. They would have to vendor or upgrade. There's a definite cost to adding a forward-incompatible feature which should weigh against its utility.

I do agree that I don't see any need for bases not in {2,8,10,16}. The case for octal seems particularly shaky, I'd be for removing octal in Go 2.

@randall77 I disagree that shifting looks cleaner. In my head I'm still representing them as binary numbers and probably always will. Binary literals would remove the conversion I currently do in my head.

(And shouldn't mask be 0xf, not 0x1e?)

The name MASK was merely taken from the JDK proposal and is not really in line with the other constants. But it does show that 0x1E and hexadecimals already cause confusion.

I can understand the point you're making about wanting to move this to Go 2. But I disagree that we should have to support projects that downgrade their Go version from 1.9 to 1.8; that would make language changes a nightmare to deal with. I don't know how the Go project views this, however, so it would be wisest to follow whatever compatibility policy Go has in mind.

I wholeheartedly support your position on removing octal in Go 2.

I just reread my previous comment (specifically, "The Go Team has enough to do with work that's actually important."). I want to apologize for this statement which was a rather offensive formulation of what I actually meant to say. So let me try again, elaborating a bit more and hopefully finding the right tone this time:

We do appreciate proposals that are well substantiated and, if necessary, come with prototype implementations. That said, the Go proposal process is light-weight on purpose, and no extra work is required by the proposer unless requested or necessary to understand the proposal. Sending change lists that are not requested and/or don't fix an issue is counter-productive since somebody will have to take the time and look at them (if only to postpone or close them). A better approach, if one does want to prototype/write the code ahead of time, is to link to the changes elsewhere (for instance, a private GitHub commit). This will leave the Go Team and external contributors the choice: They can decide to look at that code if they want to, or otherwise focus on higher-priority items. Thanks.

@griesemer Gotcha, I understand and that makes sense. I assumed the Go team treated their Gerrit like AOSP does theirs, and figured my changes could exist there while this was discussed. Linking to a branch here on GitHub is less work anyway, so it's a win-win, I guess. :)

I actually did the work first since my main goal was to hack on the compiler. It was after the fact that I decided to submit it as a proposal.

@AndreasBackx The leading 0 for octal in Go was discussed in issue #151. See also #12711.

When defining 1-set-bit constants, shifting is more readable than 0b00001000..00 for the simple reason that in the shift version you don't need to count a bunch of zeros on the screen just to understand which bit is set; you just read the shift value.

0b100000000000000000000000 vs 1 << 23

As far as real world usage goes, a common variable length integer encoding method is to use the high bit for "read more". I've had to use it to extract git packfiles. Here's the code to extract the lower bits in various bases:

b & 127
b & 0x7f
b & 0177
b & 0b01111111

I personally think that the binary version more clearly shows the intent.
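
For illustration, here is a minimal sketch (not from the thread; names invented) of that "high bit means read more" scheme, a little-endian base-128 varint decoder, with the mask written in hex as it must be today:

package main

import "fmt"

// decodeUvarint reads a little-endian base-128 varint: each byte carries
// seven payload bits, and the high bit signals that another byte follows.
func decodeUvarint(buf []byte) (value uint64, n int) {
    var shift uint
    for i, b := range buf {
        value |= uint64(b&0x7f) << shift // 0x7f == 0177 == 127; 0b01111111 under this proposal
        if b&0x80 == 0 {                 // high bit clear: this was the last byte
            return value, i + 1
        }
        shift += 7
    }
    return 0, 0 // truncated input
}

func main() {
    fmt.Println(decodeUvarint([]byte{0xac, 0x02})) // 300 2
}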

You still have the shift option mentioned previously. If you think it's not readable, use a helper function:

b & ^(^0 << 7)
b & mask(7)

@AndreasBackx, 1<<12 is more clear than 0b0001000000000000 because I don't have to count all those zeroes. It's apparent it's a mask because the 12 falls between 11 and 13, or uses iota. When having to match an arbitrary pattern, e.g. masking bits from an instruction word, then hex is better because a programmer used to dealing with bits can read 0xae and "see" 10101110 by virtue of knowing 0xa, ten, is 1010, mnemonic ten-ten, just as one learns times tables, 65 is ASCII A, etc. Hex is a more dense representation that's easier to parse for the human reader.
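
For readers unfamiliar with the iota idiom mentioned above, a minimal sketch of flag constants built with shifts and iota (the names are invented for the example):

const (
    Bold      = 1 << iota // 1 << 0 == 1
    Underline             // 1 << 1 == 2
    Flashing              // 1 << 2 == 4
    NoChange              // 1 << 3 == 8
)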

@randall77, aren't 0644, 02775, etc., a bit tedious without octal? That's why it's still kicking about.

@RalphCorderoy : yes, it seems to me that octal survives for the sole reason of constructing an os.FileMode.
0664 = 6<<6 + 6<<3 + 4, which isn't too tedious. It would be better if os provided symbolic constants to make this easier, or at least clearer.
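
As an illustration of the shift form for file modes, a sketch under the assumption that no such helper exists in os (perm is an invented name, not an os API):

package main

import (
    "fmt"
    "os"
)

// perm assembles a Unix permission from owner, group, and other digits,
// mirroring the 6<<6 + 6<<3 + 4 arithmetic above.
func perm(owner, group, other uint32) os.FileMode {
    return os.FileMode(owner<<6 | group<<3 | other)
}

func main() {
    fmt.Printf("%o\n", perm(6, 6, 4)) // 664, i.e. the familiar 0664
}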

We already know how to avoid the counting zeroes problem: we should support 0b1e10 to mean 1 followed by 10 zeroes, in binary. Admittedly this would work better if we had a way to concatenate binary constants rather than adding them.

I actually did the work first since my main goal was to hack on the compiler.

Excellent. If you'd like some ideas for places to continue to hack on the compiler while this gets discussed, feel free to drop me an email--(github username) @ gmail.

@RalphCorderoy 1<<12 is more clear than 0b0001000000000000 because I don't have to count all those zeroes.

A solution to this problem would be to allow some kind of separator. Java allows underscores in numeric literals. It was briefly discussed in #42, but there weren't many arguments against it, if there were any comments on that issue in the first place.

@ianlancetaylor's solution should perhaps also be considered.

It's apparent it's a mask because the 12 falls between 11 and 13, or uses iota.

I'm sorry, but this perhaps is the case for you. But not for everyone else.

When having to match an arbitrary pattern, e.g. masking bits from an instruction word, then hex is better because a programmer used to dealing with bits can read 0xae and "see" 10101110 by virtue of knowing 0xa, ten, is 1010, mnemonic ten-ten, just as one learns times tables, 65 is ASCII A, etc. 

This leaves room for error in the code, as has been previously stated both by me and by the proposals for other languages that considered it a valid concern. You're also assuming here that every "programmer" knows hex off the top of their head, and that isn't the case. You might work with a lot of hardware, but most people don't. Beginners especially would prefer binary literals over the hexadecimal representation.

Hex is a more dense representation that's easier to parse for the human reader.

Does dense mean that it is cleaner? No, it doesn't. People write one-liners for crazy things all the time, and the reason those are somewhat impressive is that the code is so dense and unreadable that we all wonder what witchcraft lies hidden behind the meaning of each character.

1 << 10 is much clearer than 0b1e10.

I find binary literals difficult to read. Very often what you need rounds out to three- or four-bit segments, which are much easier and less error-prone to read and write in octal or hexadecimal. When things don't round out to such even boundaries, a shift is also much easier to read and write, and less error-prone.

Some form of concatenation would make binary literals easier to read and write, at the cost of inconsistency. Why can binary literals be concatenated, but no other type of numeric literals? Why not hex? At this point, this discussion becomes boundless.

Personally, I would prefer some sort of generic radix mechanism. I do not think binary literals bring enough to the table (and I only write low-level code).

Also, I am pretty sure we had this discussion several times before.

(As a side note, the demise of octal has been greatly exaggerated. Octal is useful beyond setting file modes. I certainly use octal literals more than I would use binary literals.)

It's a bit surprising to me to see so many personal opinions being given as arguments about a change to a language. I'm unsure how to quantify comments related to how useful you would find it personally.

If for some reason personal feelings carry merit points, I'll arbitrarily say that I use binary literals in Java, and I can validate my opinion on this by saying I've been programming for 100 years and I own a car.

Next, debating whether it's easier to use shifting to define masks is like arguing that the Gregorian calendar is easier to use than the Chinese calendar. Just because you find it easier to use does not mean everyone does. The fact that binary literals exist in other languages should probably be an indication that someone found them useful, which further suggests that the shifting argument is not much of an argument; it is simply an alternative.

This has come up before. There is a significant amount of work involved in rolling this out: making the compiler and spec changes is trivial, but there are a lot of libraries that should also be made consistent (strconv, math/big, etc.).

This is a solid argument against this proposal and I would completely understand hesitation in making changes that create a significant amount of work.

I'd argue the opposite, actually. Hex is more compact in many cases, yes, but binary literals would be an exact "bit-level" representation and therefore as readable as you could get.

Funny thing about learning binary is that you have to actually read and write binary and then perform math on it. Writing in hex or decimal or octal or base64 (lul) can indirectly aid in learning binary but I heard that just learning the thing you want to learn directly is useful (probably just an opinion though).

Personally, I would prefer some sort of generic radix mechanism.

I wish every language had this in a literal form.

@randall77: As #151 stated, there are several reasons for keeping octal. Yes, one is the setting of file modes, but that's the least important. The other two are the change of semantics it would be for an integer literal that starts with 0, and the importance of safely porting code from every other C-like language, in which octal constants have this syntax. It's true that none of these is compelling on its own, but taken together they meet the bar. Anyway, the issue is decided, at least for Go 1.

As for binary constants, I don't think they carry their weight. Very few programs would benefit from them, and even then the benefit is small.

@robpike It would be safe to just fail to compile anything that looked like an octal constant (starts with 0 but not "0").

Let's leave this for Go 2.

Why wait? It doesn't break anything.

@DocMerlin, because Go isn't a language that keeps accreting features just because it can. All language changes are basically on hold for now until they can be evaluated together as a group, so they look and feel and work together like a cohesive whole. That is why this is labeled Go2.

@DocMerlin I'd like to add to @bradfitz's comment that we've made exceptions in the past when there were pressing needs. There is clearly no pressing need here.

These are also very few flags; keep in mind that the benefit definitely adds up when there are more flags.
SOME_FLAG_D = 0b0000000001000000000000000000000

At a quick glance, is this 2^19 or 2^20? See the problem?

Thanks @davecheney, I've caught up with the thread. I hadn't considered that there could be a massive effort involved in making the standard library support binary integer literals.

It's obvious that there are no use cases where a routine that handles bit manipulation on integers can't be correctly expressed due to the lack of binary literals, and nothing is gained performance-wise by expressing integer data in one base over another.

However, there are plenty of binary encoding situations (video codecs, data compression, binary network protocols, etc.) where bit masks, binary constants, and other bitmapped data can be made more legible in source code if data is expressed in base 2.

It's a matter of legibility and style for people who handle bitmapped data.

Legibility and style are the same reasons why octal integer literal notation has been supported by Go since day one. The inclusion of octal integer literals most likely has to do with the handling of Unix file permissions. It's hard to imagine, in this day and age, many practical uses of octal notation other than legacy support of Unix-style permissions and the readability of that data in code.

Octal support is nonetheless a helpful yardstick: for instance, only two simple functions (in archive/tar's strconv.go) are tasked with the handling of octal strings.

archive/tar/strconv.go:func (p *parser) parseOctal(b []byte) int64
archive/tar/strconv.go:func (f *formatter) formatOctal(b []byte, x int64)

To very roughly assess the impact of adding binary literal support, one possible method is to check the code footprint of the equivalent support for octal; this is easy because octal is so rarely used that the places and situations in which base 8 is supported are simple to identify.

In my local copy right now, a rough search shows that most of these are parse and format operations.

vxv@vxs:/gosource$ grep -i -r octal * | wc -l
73
vxv@vxs:/gosource$ grep -i -r octal * | grep "func " | wc -l
2

Granted, this is a trivial and simpleminded search, but the proportions of the problem don't seem to present an insurmountable task.

No worries. For the record, I don't have a dog in this race; I'm just here to garden the issues. It's up to others to decide the fate of this proposal for Go 2.

However, there are plenty of binary encoding situations (video codecs, data compression, binary network protocols, etc.) where bit masks, binary constants, and other bitmapped data can be made more legible in source code if data is expressed in base 2.

I've used all of the above, and I've never imagined base 2 could have any benefits for these. Would you point to a concrete example to convince me otherwise?

Readability should be a main reason for implementing binary literals. I'm currently writing an engine that takes advantage of bit positioning, and I use uint16 and uint32 for several use cases, where each slice/section of those uints represents different information.

In chess we encode moves by packing flags and the from and to positions into a uint16. It would be nice to see a binary literal implementation so the code can show, by itself, which sections relate to which information.

...
constexpr uint_fast16_t FLAG_SPECIAL1  {0b0010000000000000};
constexpr uint_fast16_t FLAG_SPECIAL0  {0b0001000000000000};
constexpr uint_fast16_t RANGE_FLAG     {0b1111000000000000};
constexpr uint_fast16_t RANGE_FROM     {0b0000111111000000};
constexpr uint_fast16_t RANGE_TO       {0b0000000000111111};

This is my code example from C++17. In Go however, it will look like this:

const FlagSpecial1 uint16 = 8192
const FlagSpecial2 uint16 = 4096
const RangeFlag uint16 = 61440
const RangeFrom uint16 = 4032
const RangeTo uint16 = 63

Essentially useful for writing clean and simple code for anyone that works with bits and masks.

Couldn't Go get a precompiler for this, so that you don't have to rewrite anything? Since, after all, it's aesthetics (in my eyes).

Which bit is set in (C++)

constexpr uint_fast16_t FLAG_SPECIAL0  {0b0001000000000000};

vs

which bit is set in (Go)

const FlagSpecial0 = 0x1000

I'm probably not the only one who can tell immediately in the latter case only.

With the 0b00... approach you can see it without having to know hex numbers. It's easier to read when you have a large list of uint16 values. In the example you gave, understanding that the set bit is in the 13th position (since the width is written out) is easier than with hex. 1<<12 would be much better than hex as well, since you don't have to look up values; you can just look at it and know which bits are targeted. But for a range or for multiple set bits, it can be easier to just use a binary literal.

Looking at the later cases like 61440, I'm impressed you can tell immediately; you may think it's easier to know which bits are set from a decimal than from a binary literal, but not everyone sees it that way.

But you just ignored the other cases, like 0b0000111111000000, which is 0xfc0 or 4032 in decimal. In my opinion that is cleaner as a binary literal. 16 bits is a larger number, but look at bytes:

0b11101010 vs 0xea. You don't even have to think what is being targeted, you just know it as soon as you take a look.

@andersfylling, if you're programming bit masks then you really need to be able to read hex: RangeFlag wouldn't be 61440 but 0xf000, which makes it instantly obvious that it's the top nibble of the 16 bits.

For those thinking they have novel insight on this issue, can you please wade through all comments from the top, taking particular note of Brad's reference to the NoMeToo policy, before waking the rest of us from our slumber.

Hex and bit shifting do it all for me (plus the octal file permissions), but I’m wondering if learning bit manipulation in C may have been easier with binary integer literals. I like how the above examples look in binary. Maybe I’d use this for small masks or for shifting small masks.

x := y & (0b101 << 8)

(edit: the better Go way is x := y & 0b101<<8)

I just ran into this missing feature today. In my example use case, I'm using the integer day-of-week field of a date (0..6 meaning Sunday..Saturday) and checking it against a preference bitmask. Because the source of the integer is programmatic, I don't define a set of constants for each day of the week (my code has no reason to ever talk about SUNDAY specifically), so 1 << 3 syntax isn't useful here. However, I want a default value for the preference bitmask, which would be perhaps 0b1111110. Obviously it's easy to write this default as decimal (126) or hex (0x7e), but it's considerably clearer to write it in binary.

Re: octal, note that between Python 2 and Python 3, they dropped support for the 0110 format and now require 0o110 instead. Then they backported 0o110 parsing to Python 2 without dropping the old format there, making it easy to start using the new, less error-prone syntax without breaking compatibility with the old version. In Python 2, a substantial number of users had ended up accidentally declaring octal numbers when pasting in 0-padded decimal numbers, leading to confusion. (Common problem: length-padded serial numbers from a parts database, or length-padded invoice numbers.) I actually once spent a confused half-hour because of this, trying to figure out why my "obviously correct" unit test was failing.

On the other hand, I've never once needed support for language constants in an arbitrary radix. Maybe someone has, but that seems like a red herring (and the syntax needed to support it sounds like it would be pretty ugly).
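
A sketch of the day-of-week use case above as it has to be written today, with the default mask in hex and the intended bit pattern visible only in a comment (the names are invented for the example):

package main

import (
    "fmt"
    "time"
)

// defaultDays is the preference bitmask for Monday..Saturday.
// With binary literals this could read 0b1111110; today it is 0x7e (126).
const defaultDays = 0x7e

// allowed reports whether the weekday (0 = Sunday .. 6 = Saturday) is enabled in mask.
func allowed(mask uint8, day time.Weekday) bool {
    return mask&(1<<uint(day)) != 0
}

func main() {
    fmt.Println(allowed(defaultDays, time.Sunday))    // false
    fmt.Println(allowed(defaultDays, time.Wednesday)) // true
}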

Another example that just burned me was defining addresses for embedded protocols (I2C, SPI, CAN, etc.), where the data sheet often defines an address as a binary constant, shifted, with some sort of read/write bit as part of the value. Converting them to hex adds one more layer of translation the human brain has to do, and thus one more thing to question when debugging.
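
For example, roughly what that looks like today for a 7-bit I2C address taken from a data sheet (a sketch; the device address and names are invented, and hex is used because binary literals are not yet available):

package main

import "fmt"

const (
    // The data sheet gives the 7-bit address as 0111100, written 0x3c today.
    devAddr = 0x3c
    read    = 1 // R/W bit: 1 = read, 0 = write
    write   = 0
)

// addrByte builds the first byte on the wire: the 7-bit address shifted left,
// with the read/write flag in the least significant bit.
func addrByte(addr, rw byte) byte {
    return addr<<1 | rw
}

func main() {
    fmt.Printf("%08b\n", addrByte(devAddr, read)) // 01111001
}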

Hi @tapir, Please read my earlier comment https://github.com/golang/go/issues/19308#issuecomment-352290337 especially https://github.com/golang/go/wiki/NoPlusOne as NoMeToo is now called.

Here's a counter-proposal which permits arbitrary radix notation at about the same (or lower) cost: #28256. It proposes a notation I have alluded to directly or indirectly in various discussions, but never formally written down. Please comment there with opinions.

See #28493 (independent proposal) discussing the use of _ as separator between digits.

If you're revisiting this old/shelved discussion for Go 2, then I would suggest looking at octal at the same time. Obviously, you can't remove the 0123 notation (for compatibility reasons - not without a deprecation period at least), but you can add 0o123 at the same time as you add 0bXXX. This would enable a more consistent set of number-base identifiers, promoting Go's nice uniformity of syntax and programmer expectation.

But, on its own, the binary proposal would still be worthwhile.

Would strconv.ParseInt support the 0b010 syntax when base is 0?

@nathany Yes, if we decide to support the 0b prefix in the language we should also support it in the library. For instance, the compiler currently relies on math/big.Int's SetString method to convert constant literals into big.Ints - so one would expect SetString (and friends, such as strconv.ParseInt) to understand 0b as well.

Perhaps less obvious is whether ParseInt should also filter _ separators if we accept this together with #28493. It would be easy as a separate pass (strip the _ before parsing), but for error handling (e.g., do we allow _ anywhere or not?) and performance we might need to handle it in ParseInt itself (and in all the other ParseXXX routines for numbers).
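
A naive version of that separate pass, for illustration only; note that it happily accepts _ anywhere in the string, which is exactly the error-handling question raised above:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func main() {
    s := "0x1234_5678"
    // Strip the separators first, then let ParseInt derive the base from the prefix.
    n, err := strconv.ParseInt(strings.ReplaceAll(s, "_", ""), 0, 64)
    fmt.Println(n, err) // 305419896 <nil>
}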

Should there be a separate proposal/issue for the 0o octal literal or is it better to keep it together with this?

0o would match Swift and Rust. We could look into their reasoning for preferring this syntax.

One reason I've seen to prefer 0o over 0 is to avoid ambiguity with strconv.ParseInt("012", 0, 64) where "012" could be user-input. However, I don't know if this is much of an issue in Go as other languages, being that Atoi always uses base 10, and there are no default arguments in Go, so the programmer must explicitly ask to derive the base from the string prefix by specifying 0 for the base.

Can't say I've ever needed to use _ in a string being parsed. Less obvious indeed.
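
The base-0 behaviour being discussed, shown with today's strconv (no assumptions beyond the current API):

package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Base 0 derives the base from the prefix: a leading 0 means octal.
    n, _ := strconv.ParseInt("012", 0, 64)
    fmt.Println(n) // 10

    // Atoi always parses base 10, so user input is not silently reinterpreted.
    m, _ := strconv.Atoi("012")
    fmt.Println(m) // 12
}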

@nathany I'd suggest a separate issue. I agree that if we decide to permit 0b it might make sense to also allow 0o for consistency (and then we may want gofmt to auto-rewrite 0-prefix octals into 0o prefix octals so that the former could be slowly phased out from the code base). But this proposal is about binary integer literals; let's keep it on that.

My original question did not receive a response, so I read the entire thread again to see why a thread with so many thumbs up has so little discussion.

So far the proposal asks for binary literals for the following reasons:

other languages have them

The proposal goes into meticulous detail about how other languages have them and how others are used to them. Has this ever been a good enough reason before?

they are "more readable"

I disagree. They're a dastardly pattern of ones and zeroes that have no easy pronunciation without a conversion to base 16.

you just know

0b11101010 vs 0xea. You don't even have to think what is being targeted, you just know it as soon as you take a look.

_The first low order bit, and the third low order bit, and the fifth low order bit are off, but not the rest of them. And there are [counts the total amount of bits twice to make sure the number is right] eight bits total._

I know what the pattern is, and I will probably remember it for a few seconds. How is that knowledge inherently useful?

Go vs C++

Some arguments misrepresent their benefit, perhaps unintentionally. In one particular example in this thread, the following snippet was posted comparing Go and C++.

constexpr uint_fast16_t FLAG_SPECIAL1  {0b0010000000000000};
constexpr uint_fast16_t FLAG_SPECIAL0  {0b0001000000000000};
constexpr uint_fast16_t RANGE_FLAG     {0b1111000000000000};
constexpr uint_fast16_t RANGE_FROM     {0b0000111111000000};
constexpr uint_fast16_t RANGE_TO       {0b0000000000111111};

This is my code example from C++17. In Go however, it will look like this:

const FlagSpecial1 uint16 = 8192
const FlagSpecial2 uint16 = 4096
const RangeFlag uint16 = 61440
const RangeFrom uint16 = 4032
const RangeTo uint16 = 63

The problem is that the C++ is meticulously aligned while the Go is unformatted, lacks a const block, and uses decimal instead of hex (decimal cannot be converted to binary at a glance, whereas each hex digit maps to exactly four binary digits).

const (
    FlagSpecial1 uint16 = 0x2000
    FlagSpecial2 uint16 = 0x1000
    RangeFlag    uint16 = 0xf000
    RangeFrom    uint16 = 0x0fc0
    RangeTo      uint16 = 0x003f
)

protocol specifications sometimes publish binary

Another example that just burned me was defining addresses for embedded protocols (I2C, SPI, CAN, etc.), where the data sheet often defines an address as a binary constant, shifted, with some sort of read/write bit as part of the value. Converting them to hex adds one more layer of translation the human brain has to do, and thus one more thing to question when debugging.

The problem is that the human brain shouldn't be doing this translation in the first place.

Consider your debugging experience again. Are you going to dump binary integer literals to stderr and grep for them later? Will you be sharing these numbers with colleagues by saying each 1 and 0 out loud? Odds are you would rather output and transmit them in hex, and if that is true, it is also true that the source code should express those digits in hex in order to spare the human brain (or a program) even more work for the reader.

Many specifications express 1010 to mean a bitstream consisting of those ordered states. This doesn't map to the byte-wise notion of binary integer literals, and will surely burn someone expecting to implement a bitstream reader. (I would rather have Go implement a bitstream reader in the standard library than support binary integer literals.)

I just ran into this missing feature today. In my example use case, I'm using the integer day-of-week field of a date (0..6 meaning Sunday..Saturday) and checking it against a preference bitmask. Because the source of the integer is programmatic, I don't define a set of constants for each day of the week (my code has no reason to ever talk about SUNDAY specifically), so 1 << 3 syntax isn't useful here. However, I want a default value for the preference bitmask, which would be perhaps 0b1111110. Obviously it's easy to write this default as decimal (126) or hex (0x7e), but it's considerably clearer to write it in binary.

I would have the days of the week as unexported constants and build the mask by ORing their values. I disagree that binary integer literals would make anything clearer in this situation.
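
A sketch of the alternative described here, with unexported day constants ORed into the default mask (names invented; this reproduces the 0x7e value from the quoted example):

const (
    sunday = 1 << iota
    monday
    tuesday
    wednesday
    thursday
    friday
    saturday
)

// defaultDays enables Monday through Saturday: 0x7e, no binary literal needed.
const defaultDays = monday | tuesday | wednesday | thursday | friday | saturday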

@as Thanks for your comment. It has been recorded.

Clearly we don't _need_ binary integer literals; we have a reasonably close way to express such numbers using hexadecimal literals (and I have pointed out as much in my blog post). But at the same time they seem to address a pain point for many programmers, and this is something we could easily fix here without adding any complexity to the language to speak of.

Perhaps a better way to think about this issue is whether we want to bring Go up to par with most other programming languages in this respect and complete the set of integer literal representations by supporting all relevant bases (2, 8, 10, 16).

That is a question that's separate from personal sentiment about the usefulness of binary literals and may be easier to answer.

Change https://golang.org/cl/152338 mentions this issue: spec: add binary integer literals (tentative)

See https://golang.org/cl/152338 for the spec changes pertinent to this proposal (ignoring _ separators in this first step).

Change https://golang.org/cl/152377 mentions this issue: spec: permit underscores for grouping in numeric literals (tentative)

I was looking at a benchmark for various sort algorithms where a pseudorandom value was computed as 0xff & (i ^ 0xab). If it were 0b10101011 instead of 0xab it would be more readable. I am surprised there are no binary literals in Go, not even a proposal...

@andrewmed This _is_ the proposal for binary integer literals.

Got it, thanks

We've posted a combined proposal for #19308, #12711, #28493, and #29008 at golang.org/design/19308-number-literals.

Note that this will be the first proposal to follow the process outlined in the blog post: we will have everything ready to go and checked in at the start of the Go 1.13 cycle (Feb 1), we will spend the next three months using those features and soliciting feedback based on actual usage, and then at the start of the release freeze (May 1), we will make the "launch decision" about whether to include the work in Go 1.13 or not.

Thanks for your feedback and all your help improving Go.

Originally suggested to me by @rsc: it might be wise to deprecate (but still support) 0X for hexadecimal, and then not add 0B (unnecessary) and 0O (unnecessary, confusing, and hard to read).

@robpike ...and have gofmt start changing 0X to 0x.

@josharian, yeah, I also brought that up for octal numbers (0644 -> 0o644) but at least in the octal case we could only really do it in goimports where we'd know the go.mod-declared minimum language version for the code in that module.

But for 0X -> 0x it could be done in gofmt, yeah.

I have no strong feelings about 0B vs 0b and 0O vs 0o (many code editors write a zero with a slash through it, which looks different from an uppercase O; personally I always use lower-case).

But the main point of adding these new formats is to be compatible with other languages and ease the pain for people coming from such languages, and perhaps translating code coming from elsewhere. It would defeat that purpose if said people or code is using uppercase in these prefixes and Go couldn't digest those literals after all.

I also note that there will be a bit of an inconsistency with the exponent where we allow E and e (and newly P and p).

In short, while I totally support the sentiment, disallowing upper-case 0B seems like a gratuitous difference that doesn't help people used to lower-case 0b anyway (which I'm guessing is a majority) and hurts the others.

On the other hand, having gofmt make the change automatically (or maybe with -s) seems like a good idea.

As an embedded developer who works with individual bits and bitmasks a lot, I would welcome binary literals. I recently discovered them in (GC)C and was pleasantly surprised.
Of course, pretty much anyone can reasonably quickly understand 0x1, 0x80, and 0x8000, but something like 0x1c makes you pause. (7 << 2) is a bit better, but 0b00011100 is just more readable and conveys the meaning (three contiguous bits, 2 positions to the left) more clearly.
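
The same three-bit field written out as a small sketch (the constant and function names are invented):

package main

import "fmt"

// The field occupies three contiguous bits, two positions up.
const fieldMask = 0x7 << 2 // 0x1c today; 0b00011100 with this proposal

// field extracts the 3-bit field from a register value.
func field(reg uint8) uint8 {
    return (reg & fieldMask) >> 2
}

func main() {
    fmt.Println(field(0xff)) // 7: all three field bits are set
}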

Change https://golang.org/cl/157677 mentions this issue: cmd/compile: accept new Go2 number literals

Change https://golang.org/cl/159997 mentions this issue: go/scanner: accept new Go2 number literals

Change https://golang.org/cl/160018 mentions this issue: cmd/gofmt: test that Go 2 number literals can be formatted

Change https://golang.org/cl/160239 mentions this issue: go/constant: accept new Go2 number literals

Change https://golang.org/cl/160240 mentions this issue: go/types: add tests for new Go 2 number literals

Change https://golang.org/cl/160247 mentions this issue: fmt: scan new number syntax

Change https://golang.org/cl/160250 mentions this issue: math/big: add %#b and %O integer formats

Change https://golang.org/cl/160248 mentions this issue: text/template: accept new number syntax

Change https://golang.org/cl/160246 mentions this issue: fmt: format 0b, 0o prefixes in %#b and %O

Change https://golang.org/cl/160244 mentions this issue: strconv: add 0b, 0o integer prefixes in ParseInt, ParseUint

Change https://golang.org/cl/160184 mentions this issue: cmd/gofmt: normalize number prefixes and exponents

Change https://golang.org/cl/160478 mentions this issue: design/19308-number-literals: add note about gofmt

As a reminder, we introduced a new process for these Go 2-related language changes in our blog post blog.golang.org/go2-here-we-come. We are going to tentatively accept a proposal, land changes at the start of a cycle, get experience using it, and then make the final acceptance decision three months later, at the freeze. For Go 1.13, this would mean landing a change when the tree opens February, and making the final decision when the tree freezes in May.

We are going to tentatively accept this proposal for Go 1.13 and plan to land its implementation when the tree opens. The issue state for "tentative accept" will be marked Proposal-Accepted but left open and milestoned to the Go release (Go1.13 here). At the freeze we will revisit the issue and close it if it is finally accepted.

Change https://golang.org/cl/161098 mentions this issue: spec: document new Go2 number literals

Change https://golang.org/cl/161199 mentions this issue: text/scanner: accept new Go2 number literals

Change https://golang.org/cl/163079 mentions this issue: text/scanner: don't liberally consume (invalid) floats or underbars

Change https://golang.org/cl/173663 mentions this issue: unicode/utf8: use binary literals

Change https://golang.org/cl/174897 mentions this issue: cmd/compile: disable Go1.13 language features for -lang=go1.12 and below

As a reminder, we introduced a new process for these Go 2-related language changes in our blog post blog.golang.org/go2-here-we-come. The Go 1.13 development cycle is now over and it’s time for the final decision.

The feedback on the Go 2 number literal changes has been strongly positive, with very few negative voices. These changes modernize and harmonize Go’s number literal syntax without adding significant complexity to the language: There is now a uniform prefix notation for the three common non-decimal number bases, which matches the notation used in other modern programming languages. The introduction of hexadecimal floating-point literals addresses a pain point for people concerning themselves with numeric code. The suffix “i” may now be used with any (non-imaginary) number literal to create an imaginary constant in a uniform way. And finally, underscores may be used to split longer literals into groups of digits for improved readability.
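
As shipped in Go 1.13, the new literal forms look like this (a small self-contained example; the constant names are invented):

package main

import "fmt"

const (
    mask  = 0b1111_0000 // binary literal with digit separators: 240
    perm  = 0o644       // new octal prefix; same value as the legacy 0644
    price = 1_000_000   // underscores may group decimal digits too
    small = 0x1p-2      // hexadecimal floating point: 0.25
    im    = 0b101i      // the imaginary suffix now composes with any base: (0+5i)
)

func main() {
    fmt.Println(mask, perm, price, small, im)
}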

Proposal accepted for Go 1.13. Closing because the changes have landed.

- rsc for proposal review

Change https://golang.org/cl/189718 mentions this issue: compiler: support new numeric literal syntax
