Ninja: Clarifications about ninja spec

Created on 1 Apr 2015  ·  9Comments  ·  Source: ninja-build/ninja

Bunch of minor clarifications:

1. The Ninja manual mentions "whitespace" in a few places; what exactly does that mean? Are tabs and/or carriage returns considered whitespace? (The fact that Ninja is encoding agnostic suggests at least that Unicode whitespace isn't allowed.)

The current ninja implementation seems to error on \r and \t. It might be clearer to just say "spaces" instead of "whitespace", and to explicitly state the rule about \r and \t.

2. Are there any restrictions on acceptable variable names? Is "foo$ bar = 3" a valid variable assignment? Are "${foo bar}" and/or "${foo$ bar}" valid references to that variable?

The ninja implementation seems to only allow identifiers like [a-zA-Z0-9_.-]+ (notably allowing identifiers like "42"), though the "${foo}" syntax needs to be used for identifiers with periods in them. This doesn't seem documented anywhere.

3. "If a line is indented more than the previous one, it’s considered part of its parent’s scope; if it is indented less than the previous one, it closes the previous scope."

It's not clear what "its parent" means here, IMO.

Also, strictly interpreted, for this input:

rule foo
    x = 1
    y = 2
  z = 3

the "y = 2" line is not indented "more than the previous one" (i.e., "x = 1"), so it should _not_ be considered part of its parent's scope.

Similarly, the "z = 3" line is indented "less than the previous one", so it should "close the previous scope".

As far as I can tell looking at the ninja implementation, the amount of indenting is insignificant as there's no nested scoping within a file: any indented assignment is simply associated with the last non-indented line.

4. It seems that the number of spaces is significant on the right hand side of an assignment statement, but this isn't mentioned anywhere that I can tell.

5. Comments only seem to work as standalone lines, though intuitively I would have expected them to behave like shell comments. Probably worth noting.

6. The spec says "The full lookup order for a variable expanded in a build block (or the rule is uses) is: [...] 2. Build-level variables from the build block."

But the current implementation runs "echo 1 2 4" given this build.ninja:

rule echo
    command = echo $args
args = 1
args = $args 2
build hello: echo
    args = $args 3
    args = $args 4

7. The current implementation rejects undefined escape sequences like "$!"; seems worth documenting that other uses of $ are invalid.

8. There's no escape sequence for literal "|" characters, though that's an uncommon character for file names.

Most helpful comment

I think Ninja's implementation is the definitive answer for most questions but also that we're fine with changing Ninja's implementation where it doesn't make sense.

You might consider lifting the ban on tabs, then, because enforcing a specific style of formatting is just repeating Make's mistake.

The lexer should raise an error only if tabs and spaces are mixed within the same file, which (of course) should be forbidden. The first indented line could set the expected indentation for the rest of the file.

All 9 comments

Might be better to bring this up on the mailing list, or file separate bugs on the issues you specifically want to fix. :) Stuff like putting #5 in the docs seems like a simple pull request.

For many of these corner cases I deliberately left it unsaid, because I wasn't quite sure what the answer should be and I didn't want to tie the implementation, nor do I think it's really useful to have a manual with 20% more text that documents every last corner. I think Ninja's implementation is the definitive answer for most questions but also that we're fine with changing Ninja's implementation where it doesn't make sense. (I think the error message if you include a "\r" mentions that we could fix things to make it allowed if you really care.) I know that this ambiguity might cause problems if we change behaviors across versions but in practice it's never been a problem.

Re 3, originally I had imagined multiple nested scopes of indenting (like Haskell "where" clauses) and then it never ended up being useful. I agree the docs could remove all mention of parent and simplify to just "indented or not".

Re 4, could you elaborate?

Re 7 and 8, see this recent thread: https://groups.google.com/forum/#!topic/ninja-build/UVYk9RhCE4M

For item 4, these two builds have the same behavior:

build a.out: link foo.o bar.o
build a.out: link foo.o  bar.o

but these two rules have different behavior:

rule hello_one_space_world
    command = echo "hello world"
rule hello_two_space_world
    command = echo "hello  world"

I understand why, but I don't think it's very obvious from the manual.

Context: My first idea for writing a ninja parser was to treat the value part of "var = value" the same as a list of paths (i.e., a sequence of space-separated tokens) and then I'd just rejoin them with a space later, but that's actually wrong. (Of course, I need to lex "value" tokens separately anyway because of : and | characters too, so now I'm just tracking whether the last token was an "=".)

fwiw, I wrote a formal (PEG) grammar for Ninja files a while ago that I meant to get merged into the Ninja docs but never got around to it. I think it would answer most of your questions ...

https://github.com/dpranke/glop/blob/master/test_grammars/ninja.pymeta

Yes also these are different:

command = echo foo$
  bar
command = echo foo $
  bar

I agree the best way to treat this is that = begins a specially-lexed section (much like how " begins a specially-lexed section in C).

dpranke: Thanks, that is helpful. (I note a handful of discrepancies though; is it useful to write them up somewhere?)

That also makes me realize you can't use "$\n" to wrap identifiers (e.g., rule and variable names), so the lexer needs to contextually know to look for those too (as opposed to paths and values, where it is allowed). Seems like all the special cases make the traditional lexer + parser generator solution impractical for such a simple grammar. PEG seems nicer here.

On Wed, Apr 1, 2015 at 5:38 PM, Matthew Dempsky [email protected]
wrote:

dpranke: Thanks, that is helpful. (I note a handful of discrepancies
though; is it useful to write them up somewhere?)

Yes, if there are discrepancies then my python ninja clone is probably
buggy and I'd like to fix that :). We should also push the grammar into the
ninja docs.

In the implementation, there are two lexers: the one for reading tokens (which doesn't understand $) and the one for reading identifiers and variable values (which does understand $). Unfortunately as you observe that does mean the lexing is dependent on the parse.

I think Ninja's implementation is the definitive answer for most questions but also that we're fine with changing Ninja's implementation where it doesn't make sense.

You might consider lifting the ban on tabs, then, because enforcing a specific style of formatting is just repeating Make's mistake.

The lexer should raise an error only if tabs and spaces are mixed within the same file, which (of course) should be forbidden. The first indented line could set the expected indentation for the rest of the file.

Was this page helpful?
0 / 5 - 0 ratings