Pegjs: Allow returning match result of a specific expression in a rule without an action

Created on 23 May 2016 · 10 Comments · Source: pegjs/pegjs

It's very common to need to return a value from one of the non-terminals in a rule or inside of a parenthesized sub-rule. For example:

varDecl = type:type id:ID init:( EQ e:expr {return e} )?
                { return scopedAST('VARDECL', {type, id, init}) }

In this case, I needed the expr labeled as e inside the init parenthesis level for an optional phrase in the language. I didn't need the "noise word" EQ as part of the returned value.

If the PEGjs language had a symbol for marking expressions like the expr above, so that they, and only they, are the value returned from a grammar rule or sub-rule, this case would be simpler.

To rewrite my example above:

varDecl = type:type id:ID init:( EQ ^expr )?
                { return scopedAST('VARDECL', {type, id, init}) }

Note the use of the ^ to mark the expr value inside the init parenthesized optional phrase sub-rule to designate what is bound to init. This simplifies many situations both with and without the parenthesized sub-rule shown in this example.

Thanks for making such a wonderfully simple, elegant, and powerful tool. I love PEGjs! :smile:

feature


alanmimms

👍3

All 10 comments

Love that idea! ^ is very intuitive, too.

This could work on non-nested rules too:

WhiteSpacedIdentifier = WhiteSpace? identifier:Identifier WhiteSpace {return identifier;}
// becomes
WhiteSpacedIdentifier = WhiteSpace? ^Identifier WhiteSpace?

opatut on 25 May 2016

👍1

Very readable! Presumably, use of multiple ^ would also work, such that:

a = ^b  c  ^d  e

Would return [b, d]? Seems to make sense.

Likewise, seems to make sense that if mixed with named captures, the ^ rules are ignored, so

x = a ^b foo:c { return foo; }

Would return only c.
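
The result rules discussed so far can be modelled in a few lines of JavaScript. This is only a sketch of the proposed semantics, not PEG.js code: `sequenceResult` and its `picked` argument are hypothetical names, and the case where `^` is mixed with a labeled action is left out because the thread has not settled it.

```javascript
// Hypothetical model of the proposed `^` marker; nothing here is PEG.js API.
// `matches` holds the sub-results of one sequence, in order.
// `picked` lists the positions that carried a `^` in the grammar.
function sequenceResult(matches, picked) {
  if (picked.length === 0) return matches;             // no `^`: whole array, as today
  if (picked.length === 1) return matches[picked[0]];  // one `^`: the bare value
  return picked.map((i) => matches[i]);                // several `^`: array of picks
}

// ( EQ ^expr ) yields only the expr value.
const init = sequenceResult(["=", { type: "NUM", value: 42 }], [1]);

// a = ^b c ^d e yields [b, d].
const pair = sequenceResult(["b", "c", "d", "e"], [0, 2]);

console.log(init, pair);
```

Returning a bare value for a single pick, rather than a one-element array, matches what the action-based versions above already do by hand.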

grrrwaaa on 5 Jul 2016

Oh that multiples idea is excellent. Mixing with named captures should be an error.


alanmimms on 5 Jul 2016

I totally agree that the described pattern is quite common. Having a way to express it without an action makes sense.

What I’m not so sure about is the proposed solution (the ^ operator). Using a special character whose meaning is not immediately obvious is always problematic and adds to the learning curve. It’s also possible the character would be better used for some other purpose. Last but not least, I don’t much like the idea of putting things that don’t directly influence parsing into expressions. One can argue there is already one instance of this, the $ operator, and I’d agree. But I’m not sure whether the addition of $ wasn’t a (small) mistake. If it was, I’d like to avoid making it again.

I’ll think about this more deeply after 1.0.0.

dmajda on 31 Jul 2016

👍1

Some more food for thought: since ^ and labeled expressions kind of collide (@grrrwaaa suggests ignoring the ^), how about instead of marking the result, one could mark the _ignored_ expressions, for example (syntax suggestion!) by providing an empty label:

WhiteSpacedIdentifier = WhiteSpace? identifier:Identifier WhiteSpace {return identifier;}
// becomes
WhiteSpacedIdentifier = :WhiteSpace? Identifier :WhiteSpace?

There, no new syntax (we have : already), only a bit of extension on the semantics:

- allow empty labels (call these "anonymous" expressions?)
- if only one non-anonymous capture exists, do not generate an array of expression matches; instead return the only match

opatut on 1 Aug 2016

In that case it would be more consistent to use empty "labels" to mark the expressions that should be returned as the result. Incidentally, this does not break the existing semantics: the label exists, it is just unnamed, and since labels exist to give access to a result, it is quite logical for unnamed labels to become the result automatically. Mixing automatic and named labels in the same sequence should be forbidden. If there is only one automatic label, the single result should be returned rather than a one-element array, since that is the behavior most often wanted.

Mingun on 1 Aug 2016

👍1

@Mingun

Why not just return any label?
start = "{" :expr "}" // return expr
start = "{" label:expr "}" // return label
I think it makes sense that if you "label" something then you want to do something with it (e.g. return it).

On the other hand, why should rules like start = ex:expr :expr raise an error?
Maybe it should do something similar to the arguments variable of JavaScript functions? For example, start = ex:expr :expr should return [ex, expr]. When you have an action, both the labeled variables and an arguments variable should be available (start = ex:expr :expr { return [ex, arguments[0], ex] } ).

@alanmimms I like this idea. We don't have to create a name (a variable/label) just to return a simple value.
I think an unnamed label (:expr) would be better than ^expr.

nedzadarek on 30 Jan 2017

Why not just return any label?

@nedzadarek because if you give an expression a name, it is more likely that you want to use it in some non-trivial expression. At the very least, the name is important to you, otherwise you wouldn't have given it, right? Also, mixing named and unnamed labels is more likely a mistake than a deliberate choice, so it is safer to forbid it. If you have given one name, why not provide another?

Unfortunately, automatic labels in the form proposed by @opatut cannot be implemented, because they make the grammar ambiguous. A minimal example:

start = a :b;// is `a` a rule reference or a label?
a = .;
b = .;

So a different character has to be chosen for this purpose. At the moment the candidates are: ~, (backslash), @, #, %, ^, -, |, \ and ,.
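
The ambiguity can be made concrete with a toy tokenizer. This is a rough sketch, far simpler than the real PEG.js grammar, and it assumes (as the example above implies) that whitespace around `:` is not significant; `tokenize` is a hypothetical helper.

```javascript
// Toy tokenizer (hypothetical): splits a rule body into identifiers and
// single punctuation characters, dropping whitespace along the way.
function tokenize(ruleBody) {
  return ruleBody.match(/[A-Za-z_][A-Za-z0-9_]*|[^\sA-Za-z0-9_]/g) || [];
}

const labeled = tokenize("a:b");   // existing syntax: label `a` bound to rule `b`
const sequence = tokenize("a :b"); // intended: rule `a`, then unnamed-label `b`

// Both spellings produce the same token stream, so a parser that ignores
// whitespace cannot tell which reading was meant.
console.log(labeled, sequence);
```

Any marker that cannot follow an identifier in the existing syntax avoids this problem, which is why a different character is needed.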

Another solution is to introduce pseudo-actions: shortcuts for simple return functions. For example, {=>[]} could mean _"collect the labeled results of the sequence and return them in an array"_, and {=>{}} could mean the same but return an object whose keys are the label names. Implementing this behavior does not require extending the grammar and can be done entirely in plug-ins. I would even say a plug-in implementation is preferable:

start1 = a:'a' b c d:. {=>[]};// returns ['a', <d value>]
start2 = a:'a' b c d:. {=>{}};// returns { a: 'a', d: <d value> }
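
What the two pseudo-actions would compute can be sketched in plain JavaScript. The helper names `collectArray` and `collectObject`, the `{ label, value }` element shape, and the sample input are all assumptions made for illustration; a real plug-in would work on the PEG.js AST instead.

```javascript
// Hypothetical helpers modelling the `{=>[]}` and `{=>{}}` pseudo-actions.
// Each sequence element is { label, value }; unlabeled elements have label null.
function collectArray(elements) {
  return elements.filter((e) => e.label !== null).map((e) => e.value);
}

function collectObject(elements) {
  const result = {};
  for (const { label, value } of elements) {
    if (label !== null) result[label] = value;
  }
  return result;
}

// a:'a' b c d:. matched against an assumed input where `d` captured "x".
const elements = [
  { label: "a", value: "a" },
  { label: null, value: "b" },
  { label: null, value: "c" },
  { label: "d", value: "x" },
];

console.log(collectArray(elements));  // [ 'a', 'x' ]
console.log(collectObject(elements)); // { a: 'a', d: 'x' }
```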

Mingun on 31 Jan 2017

@Mingun

because if you give an expression a name, it is more likely that you want to use it in some non-trivial expression. At the very least, the name is important to you, otherwise you wouldn't have given it, right?

Yes, the name is important => I want to use it => I want to return it.
What's the problem with non-trivial expressions?

Unfortunately, automatic labels in the form proposed by @opatut cannot be implemented, because they make the grammar ambiguous. A minimal example:

Yes.
I guess ::expression is confusing too? @dmajda

nedzadarek on 1 Feb 2017

Closed as duplicate of #235

Edit: Added note to OP's comment on #235 that references this issue