Pegjs: How to define a rule to match with a pattern multiple times in PEG.js?

Created on 25 Jan 2020  ·  22 Comments  ·  Source: pegjs/pegjs

Issue type

  • Bug Report:
  • Feature Request:
  • Question: yes
  • Not an issue:

Prerequisites

  • Can you reproduce the issue?: yes
  • Did you search the repository issues?: to an extent
  • Did you check the forums?: no
  • Did you perform a web search (google, yahoo, etc)?: yes

Description


I'm trying to parse a file where the pattern might be seen multiple times:

G04 hello world*
G04 foo bar*

The corresponding PEG.js grammar is:

  = "G04" _ content:String* _ EOL
  {
    return content
  }

_ "whitespace"
  = [ \t\n\r]*

String
  = value:[a-zA-Z0-9.(): _-]+
  {
    return value.join('') 
  }

EOL
  = [*] _ 

Expected behavior:

I would expect PEG.js to produce a 2 item array for each G04 line.

Actual behavior:

The following error is thrown:

Line 2, column 1: Expected end of input but "G" found.

Software

  • PEG.js: online version
  • Node.js:
  • NPM or Yarn:
  • Browser:
  • OS:
  • Editor:

All 22 comments

Please read the peg example grammars. There is no bug here. Issue trackers are not intended for help requests. One of the example grammars is exactly what you're asking for.

The appropriate place for help requests is StackOverflow. Issue trackers are what the developers use to keep track of what's broken, and what package repositories use to measure project quality.

Document
  = ClassRow+

ClassID
  = "G04"

ClassTitle
  = title:[^\n]+ { return title.join(''); }

ClassRow = 
  id:ClassID title:ClassTitle '\n'? { return { id, title }; }

Once you've seen this, please close this issue. Thanks.

The key is to learn to read PEG grammars. That says:

  1. "A Document is one or more ClassRows."
  2. "A ClassID is the fixed string "G04"."
  3. "A ClassTitle is any text up to, but not including, the next newline. You'll call that "title". Return title as a string, not an array of characters."
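  4. "A ClassRow is a ClassID followed by a ClassTitle, then an optional newline. Return the two as an object with id and title."

Because each ClassRow consumes the newline that ends it, the next ClassRow starts at the beginning of the following line.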
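For anyone who finds it easier to read code than grammar rules, here is a rough hand-written JavaScript sketch of the same four rules. This is not what PEG.js generates. The function names (classID, classTitle, classRow, parseDocument) and the { value, pos } return shape are made up for illustration, and it only assumes the two-line sample input from the question.

```javascript
// Hand-written illustration of the four rules; NOT PEG.js output.
// Each rule function takes the input and a position, and returns
// { value, pos } on success or null on failure (hypothetical convention).

// ClassID = "G04"
function classID(input, pos) {
  return input.startsWith("G04", pos) ? { value: "G04", pos: pos + 3 } : null;
}

// ClassTitle = title:[^\n]+ { return title.join(''); }
function classTitle(input, pos) {
  let end = pos;
  while (end < input.length && input[end] !== "\n") end++;
  // "+" means at least one character must have been consumed.
  return end > pos ? { value: input.slice(pos, end), pos: end } : null;
}

// ClassRow = id:ClassID title:ClassTitle '\n'? { return { id, title }; }
function classRow(input, pos) {
  const id = classID(input, pos);
  if (!id) return null;
  const title = classTitle(input, id.pos);
  if (!title) return null;
  // The trailing newline is optional, so skip it only if it is there.
  const next = input[title.pos] === "\n" ? title.pos + 1 : title.pos;
  return { value: { id: id.value, title: title.value }, pos: next };
}

// Document = ClassRow+
function parseDocument(input) {
  const rows = [];
  let pos = 0;
  let row;
  // Keep applying ClassRow until it stops matching.
  while ((row = classRow(input, pos)) !== null) {
    rows.push(row.value);
    pos = row.pos;
  }
  // "+" needs at least one row, and the whole input must be consumed.
  if (rows.length === 0 || pos !== input.length) {
    throw new Error("Parse error at offset " + pos);
  }
  return rows;
}

console.log(parseDocument("G04 hello world*\nG04 foo bar*\n"));
```

The loop in parseDocument is the part the original grammar was missing: its single top-level rule matched one line and then expected end of input. Note that the titles here still contain the leading space and the trailing "*", because ClassTitle takes everything up to the newline.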

I'll use StackOverflow for my further questions, thanks for the answer and the explanations.

However, there are some things I want to express:

  1. The "Issue type: Question: [yes/no]" part is a little bit misleading with regard to the directive you mentioned. I interpreted that section as "The issue tracker is the right place to ask questions"
  2. "examples": I can see 4 example grammars in the examples/ folder. However, IMHO, none of them are suitable (simple enough) for beginners (like me, or for me) except for the "arithmetics.pegjs". I understand that PEG.js is under (heavy?) development, so it's quite understandable that you might be more focused on complex real world problems/scenarios. I just expected step by step examples from simple to complex grammars.

Please consider this a newcomer's feedback.

The "Issue type: Question: [yes/no]" part is a little bit misleading with regard to the directive you mentioned. I interpreted that section as "The issue tracker is the right place to ask questions"

I agree. I want to remove that text, and asked to in 2017.

The reason that text is there is because David, who no longer runs this library, was tired of people saying "this is a bug" without looking at issues and finding the other person who thought there was a bug, and there wasn't

This issue, for example, has half a dozen clones

.

"Examples": I can see 4 example grammars in the examples/ folder. However, IMHO, none of them are suitable (simple enough) for beginners (like me, or for me) except for the "arithmetics.pegjs".

I agree. I would like to write a lot of them.

.

I understand that PEG.js is under (heavy?) development

It most certainly is not.

The original author had let it sit idle for a year, so he asked for new maintainers.

The new maintainer who took over in May of 2017 has not released a single byte of code to the master branch.

I have begun the process of agitating for a takeover, because the library's usage is dropping, the library doesn't support javascript from 2014, there's been no readme on NPM for almost three years, single-character AST fixes are unreplied to for a year in issues, and the new maintainer has decided to throw away 2.5 years of a feature branch that he's marked as closing a ton of issues, and replace the whole library with something new he wrote himself in a different language

.

it's quite understandable that you might be more focused on complex real world problems/scenarios. I just expected step by step examples from simple to complex grammars.

I believe that user onboarding is probably the single most important real world problem right now following getting this library healthy again

For clarity, I'm not the original author. @dmajda is.

For clarity, I'm not the maintainer. There is no maintainer.

The "Issue type: Question: [yes/no]" part is a little bit misleading with regard to the directive you mentioned. I interpreted that section as "The issue tracker is the right place to ask questions"

@ceremcem, you did everything right (other than not reading the description of PEG on Wikipedia and not trying to parse your grammar by hand, after which, IMHO, your question would have answered itself). You cannot know whether your question is simply a question or a description of a bug in the library. This can only be decided by the developer. Therefore, there is no rule that GitHub is only for bugs. An issue is an issue, even in Africa

in general an issue tracker is meant for issues, not questions

ceremcem would have gotten an answer in hours on stackoverflow. here, he waited nine days, and if i hadn't spoken up, i think he wouldn't have gotten an answer.

many similar questions like these are unanswered here after a year or more. there's half a dozen that are eight years old.

.

You cannot know whether your question is simply a question or a description of a bug in the library.

Yes he can. He was asking "how do I do this?"

Unless he believes the parser cannot do this, that's never a bug.

It's basically the simplest possible thing in a parser, and I believe that peg is still the most heavily used javascript parser, though that's rapidly ceasing to be true, and he appears to be intelligent, so I don't think he believed that a parser generator wasn't able to use a rule more than once

In his best interests, it's ideal to direct him to a resource that is designed for questions, rather than library repair, especially if the question resource is a highly active one, and the library repair resource just announced that it's scuttling three years of work and generally ignores questions like these

It's not coincidence that this went unanswered until I started tagging people, as did half a dozen other issues so far

No decisions are being made. I was giving him advice.

Also, I answered his question.

I usually get an answer within a few hours on StackOverflow, so I'm a regular user of it. However, there was no activity on my SO question, so I came and asked here. Basically, response times are nearly identical.

I've seen every combination of issue tracker usage:

  • For bug reports only, if a separate forum or mail group is provided (PaperJS, for example),
  • For both questions and bug reports (RactiveJS <3, FreeCAD_Assembly3 <3)
  • For something I don't actually get, along with a separate forum, where the usual suspects took over and are talking on behalf of project owners, which causes more frustration than anything else (like KiCAD (grrr))
  • For nothing (Espruino, AFAIR). Every issue is immediately closed and you are forced to open an appropriate thread in their forum.

There is no single best choice for that (no one approach fits every case), but I do like using the issue tracker for everything.

You cannot know whether your question is simply a question or a description of a bug in the library.

We've seen this many times in the FreeCAD_Assembly3 library. Many of my simple, dumb questions revealed one or more bugs. I have seen it happen.

I agree. I would like to write a lot of them.

I like your approach to this library. You seem to care a lot.

so I don't think he believed that a parser generator wasn't able to use a rule more than once

Correct. My intention was not a bug report. I just couldn't find a way to reuse the same rule for multiple lines.

However, there was no activity on my SO question, so I came and asked here.

Oh man, has the peg community there died too?

That's so sad ☹️

Okay, if you already made an SO post, then at that point you're 100% correct to come here

.

I've seen every combination of issue tracker usage:

Yep, people violate community norms all the time

.

I agree. I would like to write a lot of them.

I like your approach to this library. You seem to care a lot.

Very much. I wanted to be involved in the 2017 ownership transition, but someone gave a plan with 17 major releases, and the old owner believed them

That person gave up on their first minor release three years later.

The truth is public software is really hard to write. Almost everyone I know, including myself, has the tendency to say "this release isn't ready until X, Y, and Z are done."

And then as Y finishes, you realize you also need V and W.

And then as Z finishes, you realize you also need S, T, and U.

And then as V finishes ...

That's how 0.11.0 got started with half a dozen features in 2017, and died with a hundred merges incorrectly marked closed in the tracker in 2020

Part of the discipline of public software is frequent, small releases. That was always a problem with peg, but the software's quality was so high that we put up with it anyway.

Then dmajda left, and everything just halted.

And we waited, patiently, for a long time.

But the new guy now calls this his hobby project, and says he scrapped the whole thing in favor of something new and incompatible he wrote. And even if it had the same AST and the same featureset and more, it wouldn't have ten years of community debugging behind it, and I wouldn't be able to switch

And you know, if he wants to write a new, more powerful PEG parser, fine, great, go ahead.

But he doesn't get to kill this one by pretending to be a maintainer then never maintaining, then taking over this one's library community and position, and putting his own never-going-to-be-released software in its place

It's time for a healthy process to take back over. The new maintainer built a micro-community of junior developers, and they're actually advocating to keep the library dead rather than to save it

It's clear that change is badly needed

.

I just couldn't find a way to reuse the same rule for multiple lines.

If you have trouble finding answers again, feel free to tag me personally

That said, generally I google for examples, and because this library was once so popular and heavily used (and can be again if the current library murderers will just make room on the bench for another person to help,) there are more than enough examples out there to cover the things you'll need to find

Generally, though, what I wish someone had told me when I was new is the thing I said under the comment starting "The key is to learn to read PEG grammars"

Once you learn to read PEG grammars in that discussion form, it also becomes very easy to think about them, and at that point they are suddenly trivially easy to write

It's like a lightswitch. No ramp. Straight from impossible to easy

It's like a lightswitch. No ramp. Straight from impossible to easy

You encouraged me! :) So I shouldn't feel so stupid when I could only see some "gibberish" rule sets :)

If you have trouble finding answers again, feel free to tag me personally

I don't want to abuse this, so it'll be a hard call every time whether it is worth asking or whether I should search the net some more. This was a generous offer, thank you.

It's clear that change is badly needed

I see that you enabled "issues" section in your fork. That is always a good sign of attempting to fly an airliner by rushing into the cockpit while you were only a passenger. That's a good thing.

If there is a live water source, it will always find its way to flow, no matter what you put in its way. If there is no flow because its source is drained, there is nothing to do. The source is the demand. Your attitude indicates that the source of the water is very much alive.

So I'm curious, why don't you just take over? That's what I did for loading-bar library. I realized that the development is stalled, so I took over by addressing the issues starting from my own ones and creating pull requests for each branch. Some time later, the original author decided to continue his work, and we were good to go. Do you think this is a feasible solution?

You encouraged me! :)

I'm glad.

.

It's like a lightswitch. No ramp. Straight from impossible to easy

So I shouldn't feel so stupid when I could only see some "gibberish" rule sets :)

Nah. Parsers are an extreme case of the "it's just ridiculous and then all of a sudden it's easy" thing.

Here's the kicker - comparatively speaking, peg's pretty easy. The other ones are often just brutal.

There are in my opinion four big problems in learning peg.

  1. There isn't any really well structured introductory material
  2. There are a lot of mid-level examples out there but you have to be good at google to find them
  3. There are also bad examples there and it takes experience to be able to identify them
  4. You have to be able to "think this way," and that doesn't happen immediately

I have been considering making some video tutorials. They would make this _far_ easier to understand, I believe.

.

I don't want to abuse this, so it'll be a hard call every time whether it is worth asking

Once a week is fine. Understand that I'm sometimes slow to respond

.

I see that you enabled "issues" section in your fork. That is always a good sign of attempting to fly an airliner by rushing into the cockpit while you were only a passenger. That's a good thing.

I've barely gotten started. First I want to see if the real repo can just be rescued

Doing this from a fork would be obscenely more difficult. I'd lose all the PRs and all the cross references, and all of the closed unmerged or closed deleted material, some of which is very valuable

.

So I'm curious, why don't you just take over?

I would like to.

At this time, the relevant passwords and authentication are in one person's hands, and they have yet to respond.

We'll see.

.

I realized that the development is stalled, so I took over by addressing the issues starting from my own ones and creating pull requests for each branch

This is a special case

What's published is 0.10.0

The new maintainer allowed the 0.11.0 branch to grow unbounded for three years, then decided it was cancelled, in favor of a 0.12.0 he's writing from scratch in isolation

There's nothing to put PRs to. What's in npm is from 2017, and what's in github under the new maintainer is cancelled after three years without ever having published

.

Some time later, the original author decided to continue his work, and we were good to go. Do you think this is a feasible solution?

If the replacement maintainer is willing to allow it, this is roughly exactly what I want.

I kind of doubt David will come back, but if he does, that'd be fantastic

As such I want to turn this into a standard open source project again

What are the 3 most important issues here, in your opinion?

I think saying what the most important issues are to me is a little dangerous, because there are a fair number of people here with more experience than I have in peg, and if they say "actually it's this other thing," I'm likely to listen.

To that end, I want to caution that whereas I'm happy to develop here, my interest here is as a maintainer.

This is, I think, something a lot of people don't get: development and maintenance coding are really, really different.

  • Development coding wants big new ideas, new features, new flashy ideas.
  • Maintenance coding wants to fix small problems before they aggregate and make something good unusable.

I'm happy to do some development coding - maybe even looking forward to some - but there are other people in here who are better suited to it. And, so, I want to make clear: my actual goal is to make this so they can contribute PRs again, like they used to.


That said, to show what my personal viewpoint is:

1. Getting a regular release cadence back

peg.js must never have a magic branch again. It's like the fucking One Ring. It sounds great and powerful, but it doesn't god damned work, and at the end, you're Gollum. This isn't svn. In the Steve Ballmer voice, feature branches, feature branches, feature branches, feature branches.

A version should be the result of a feature, not a collecting point for a plan of features. We're not a 1980s company and we shouldn't be planning like one.

The only time more than one feature should go in at the same time is when it's unavoidable, like as the result of patching features to cope with an external upgrade, or things that genuinely cannot be done apart. Oh, you think that's related to another feature? Great, put it in 2.31.0, we need to get 2.29.0 out and this other thing over there is likely to be 30.

People have been treating minors like they're majors. That's why the minor never came out: it was hung on the same behavioral trap that hangs majors. Just Don't Fucking Do That ™.

To be specific,

  • People have, I believe, largely lost their faith that this library exists anymore.
  • Three months of weekly releases would get people to give it another chance.

    • Three months of weekly releases would actually be super easy to achieve

  • If we didn't just get 0.12.0 out, but also 0.13.0 - and notice I'm not saying what's in them, and I kind of think it doesn't matter - then we'd have a real shot at 1.0.0
  • Just going through the open and closed-unmerged PRs in here, there's a huge amount of power and beauty, with a comment from 2015 or so like "I'll look into it later." Being a generous person and finding room to share authority would help peg not just come to life, but blossom
  • To that end, I don't want to be the artist. I want to be the curator.

2. Getting the documentation and testing into an acceptable place

Everyone talks about this, but my hobby finite state machine currently has 3500 unit tests and 100% documentation coverage, so, take me more seriously.

I really deeply believe in testing.

There's another library I wrote that's half the size and significantly lower in complexity than the state machine. It's way easier to make sweeping language changes to the state machine than small changes to the network handler, because the network handler is poorly tested, and you have to put in real effort to make sure something is right.

The FSM? Nah, the tests are great, they'll catch you

This specific contrast reminds me in brutal clarity every time I touch either of them just how important testing actually is to something being, to me at least, trustworthy.

I think a really big part of the problem with working on peg.js is that the testing and documentation are in a shambles. I think it's time for that to change.


3. Removing the hipster nonsense.

  • I'm not a Ruby person, but, the Ruby people have a real point about configuration by convention. I don't Ruby at all, but I can still sit down and know how a project works, because that's how they all work, and if I don't get something I can ask someone, and they don't need access, because that's how they all work
  • peg faces four problems in this regard

    1. peg is an extremely early javascript library, and it made a whole bunch of foundational choices before community norms existed. in fact, several community norms are because of david; prior to peg lots of people thought multipackaging was hard, so now that one of the trailblazers in that regard is problematically behind in the same regard, it's kinda heartbreaking. That said, in addition to the brilliant things david got right ahead of time, some things he got wrong, and some things that were right back then aren't anymore. Lots of small changes would result in a radical change to the developer experience.

    2. Community norms should be re-established.

      1. There are just certain ways a node project is supposed to work. That includes producing browser targeted output, and thus, a node project is clearly the right modern way to work
      2. This is still a browser project with hand-made automation. That should change
      3. It's a significant learning challenge to get into a position to correctly edit the README. After three years, the current maintainer still hasn't pulled it off (!), and neither had the original developer in the last two versions

        • This is asinine. Pieces of the project that should be trivial are breaking because they aren't done the simple normal way. They should be done the simple normal way, but that requires someone who knows node and is ready to do boring work.

    3. peg is both the beneficiary and the victim of extreme automation.

      • It is likely that dmajda couldn't have gotten as far as he did without it. I certainly can't on my things.
      • However, this is 2011 automation, not 2020 automation
      • It's also 2013, 2014, 2016, and 2018 automation. This stuff is spread across zeit, now, github pages, a yahoo personal account, gitlab, several weird tracking and auto-deploy services, and probably a bunch of things I haven't found yet
      • It's just tool thrashing to either survive a collapse or play with the hot new thing. Careful tool selection leads to permanence by design. The actual repl is almost ten years durable now. Everything else can be too, if "ooh shiny" is treated as a red flag.
      • This should be moved to gh pages and gh actions, which everyone understands, and left alone forever

    4. The new maintainer chose to dive deep into uncommon tools and marginalized strategies. As a result you have to install a new package manager to contribute bugfixes, and learn an uncommon source layout that coexists, in a confusing way, with an unrelated set of source that appears to be the real product, in the regular source layout.

      • When a third party contributed a fix allowing the standard language package manager to also work, he rejected it.
      • This sort of behavior is, frankly, unacceptable in a community project. It makes the library far harder to contribute to.
      • Several of these extremist tools have been replaced by other extremist tools, so he's not going for things he knows; he's trying things out. In the meantime, we wait for basics, like spelling errors in the AST, like getting the readme on npm fixed, or merging es6 modules, for three years at a time.
      • Frankly, in a 0.12.0 rebake, things like the module in module yarn stuff would just come out. David's build stuff has worked since 2011. The new stuff from 2018 is already broken in 2020. No more dilettante technology.

  • AND CAN I PLEASE GET AN ES6 EXPORT

but you'll notice none of those are actually really about the software per se

i don't think the software here is the problem

i think the process is, and to a lesser degree, the project

those are what i will fix, if @futagoza will allow it

this library can be back to life in 30 days if we swallow our pride and choose the library's large community's needs over our hobby project interests

let the hobby rewrite be the fork

let a maintainer start maintaining

@StoneCypher I like your energy :-) I used pegjs in the past (in dmajda's times) and I liked it a lot.

Just fork it and do not fight. Things can and will settle down later. If community follows you then you do not need to care about existing "key holder" or anything like that. Building reputation takes time but is necessary. Do not waste time on arguing anymore.

Either your fork will make it "back into the origin" at some point in the future or it will have its own life. Both options are valid and fine IMO.

Please stop it with the "make a fork" bullshit. There are four of them and you don't know what they are. A fork won't save any of the existing downstream consumers, won't save the PRs, won't save the issues, won't carry the community, and won't be visible.

People have been trying that for three years. IT DOES NOT WORK.

Do not continue to offer this advice.

@futagoza - everyone has given up on you doing the right thing. Five people have told me to fork the library, because they expect you to refuse to allow repair, and to keep control of the library you're killing.

I believe that the reason they expect this is that I've found more than a dozen people offering you help to do the thing you promised to do and never did, and every time you say "no, I'm gonna do it soon."

Do what you should have done in 2018, and give maintenance to someone who will actually maintain the project. Stop killing this library, stop killing this community, and get out of the way.

@StoneCypher John, when I suggested to create a fork I really meant it. Forking a project is a great option open source gives you especially if you feel it needs changes or it is dying. And no, I do not have any problem, you do not need to write me DM just to ask me this question.

When I said "do not give this advice again," I also really meant it.

I love this conversation.

@StoneCypher I have to disagree with you here, because I saw just the opposite:

  1. I decided to start learning FreeCAD. However, there was a problem: its "Assembly module" was not complete, so we were nearly unable to create complex assemblies which makes it useless for professional work.
  2. A guy, realthunder, decided to solve this problem. He needed to change some core properties to achieve the goal, which in turn made his fork incompatible with the mainstream branch. He lost all his users and nearly all of the community. Except for a few people, he had no supporters (that is what I saw). He also had no active users, as far as I understood.
  3. I examined his documentation¹ and, despite the year-long stall in development (that is what I saw at the time), I did the math (an interesting math) and decided to give it a shot.
  4. I asked many, many questions¹ and he answered them patiently. Meanwhile I took notes on what I learned, and thus produced good introductory material.
  5. People discouraged¹ me from using his branch, claiming he was the lone author and maintainer and his work shouldn't be trusted in terms of sustainability. I just ignored them.
  6. His pull request has not been merged for a long time (about a year or so).

Lately, his PR has started to be reviewed. A great amount of work has been done, and his branch and the mainstream have finally become compatible. In the next release, I believe his branch will be merged into the mainstream.

Meanwhile, there was a serious financial problem. He was the only maintainer, and donations from a few users weren't enough for him to live on. I decided to teach FreeCAD/Assembly3 here in Turkey and sell support in order to help fund the work. I proposed this to him and he accepted. I submitted all the necessary applications to become a trainer at a highly reputed foundation, and they were recently accepted.

Sometimes 1 person is enough to start a fire.

and get out of the way

Disagree. Let them stay in the way. A good motivation will always find its way out. If it can't, it's because it was not that good.

I am explicitly and clearly not gathering advice on this topic.

This library needs to be resurrected. I'm sorry if I'm not explaining it sufficiently, but the responses that I am receiving fail to address any of the practical concerns raised.

People's lives and jobs ride on this.
