PEG.js: Non-greedy operators for *, +, and ?

Created on 7 Oct 2011  ·  7 Comments  ·  Source: pegjs/pegjs

I have a language where there are repeated instances of the same pattern where I only care about the first symbol. For example:

          system       OBJECT IDENTIFIER ::= { mib-2 1 }
          interfaces   OBJECT IDENTIFIER ::= { mib-2 2 }
          at           OBJECT IDENTIFIER ::= { mib-2 3 }
          ip           OBJECT IDENTIFIER ::= { mib-2 4 }
          icmp         OBJECT IDENTIFIER ::= { mib-2 5 }
          tcp          OBJECT IDENTIFIER ::= { mib-2 6 }
          udp          OBJECT IDENTIFIER ::= { mib-2 7 }
          egp          OBJECT IDENTIFIER ::= { mib-2 8 }

This simple example could be matched by this pattern (where _ is whitespace):

identifier _ "OBJECT IDENTIFIER" _ "::=" _ "{" _ identifier _ number _ "}"

This isn't a big deal in this case (I already typed the pattern :-). But the language has a set of other big, hairy constructs that don't warrant full parsing; I only want the initial identifier on each line for the job I have in mind.

I would like to type something like this pattern:

identifier _ "OBJECT IDENTIFIER" .*? "}"

where the ".*?" is non-greedy: it consumes input only up to the first occurrence of the terminal that follows it. Could this be on the list for PEG.js? Many thanks.

All 7 comments

Update: This could be satisfied by a repetition count (which is a generalization of my initial thought) as suggested in Google Groups at: http://groups.google.com/group/pegjs/browse_thread/thread/2bea15581be45187

In PEG formalism, you can easily match until a terminator by using a predicate together with the . metacharacter. Something like:

"OBJECT IDENTIFIER" (!"}" .)* "}"

Is that sufficient for you?

Yes, that works perfectly. Thanks!
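
For readers landing here later, the predicate idiom can be dropped into a complete grammar for the example in the original post. This is only a sketch: the rule names `start`, `definition`, `identifier`, and `_` are assumptions, as is the set of characters allowed in an identifier, and it has only been reasoned through against the eight sample lines above.

```pegjs
// Sketch: return just the leading identifier of each definition.
start
  = _ defs:definition* { return defs; }

// (!"}" .)* consumes one character at a time, but only while the
// input at the current position does not start with "}".
definition
  = name:identifier _ "OBJECT IDENTIFIER" (!"}" .)* "}" _ { return name; }

// Assumed identifier shape: letters, digits, and hyphens.
identifier
  = chars:[a-zA-Z0-9-]+ { return chars.join(""); }

_ "whitespace"
  = [ \t\r\n]*
```

Note that the repetition stops at the first "}", so a construct with nested braces would need a recursive rule instead.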

@dmajda What's the recommended practice for stripping out the empty char returned by the !"}" expression?

For example:

   = chars:(!"-suffix" .)+ "-suffix"

"foo-suffix" => [[ '', 'f' ], ['', 'o' ], ['', 'o' ]]  // result
"foo-suffix" => ['f', 'o', 'o' ] // desired result

I was able to achieve this by breaking !"-suffix" . into its own rule that just returns the . result, but I'm curious if there's a better way.

#66 will fix this.

In the meantime, you can use:

    = chars:(!"-suffix" c:. {return c})+ "-suffix"
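
If a string is more convenient than an array of characters, the same workaround can join the result in the outer action. A sketch, with `start` as an assumed rule name:

```pegjs
// The inner action drops the predicate's empty result and keeps the character;
// the outer action joins the collected characters into one string.
start
  = chars:(!"-suffix" c:. { return c; })+ "-suffix" { return chars.join(""); }
```

Later PEG.js releases (0.8.0 onward, as far as I know) also provide the `$` text operator, so `$(!"-suffix" .)+` should return the matched text directly without the inner action.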

@islandr Please don't use issues as a place to ask questions about PEG.js usage, especially when they are closed and especially when you are asking something that other people besides me can help you with. The proper channel is the Google Group.

Sorry, David. Thought this would have been a good place since it was directly related to the example you'd given.

