Pegjs: Import/include other grammars

Created on 16 Aug 2011 · 32 Comments · Source: pegjs/pegjs

It could be extremely useful to have the ability to define grammars by importing rules from other grammars.

Several ideas:

@include "expression.pegjs"
(or @from "expression.pegjs" import expression)

tag_if
    = "if" space? expression space? { ... }

@import "expression.pegjs" as expr

tag_if
    = "if" space? expr.expression space?

Ideally, this would not regenerate the whole parser code in every .pegjs file that includes another one; we might have to slightly change the behaviour of parse() to something like the following.

Editing as per what you were saying in the options issue:

parse(input, startRule)
->
parse(input, { startRule: "...", startPos : 9000 })

And at the end, if startPos != 0 && result !== null, we don't check whether we consumed the input up to input.length, but instead return the result as well as the endPos (I don't really know how to do that elegantly; maybe by simply modifying the options parameter?).
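
A minimal sketch of the proposed calling convention, not PEG.js's actual API. The `rules` table and its `integer` rule are hypothetical stand-ins for a generated parser, and returning `{ result, endPos }` is just one possible way to report the end position:

```javascript
// Hypothetical rule table standing in for a generated parser.
// Each rule returns { value, endPos } on success or null on failure.
const rules = {
  integer(input, pos) {
    let end = pos;
    while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
    return end > pos
      ? { value: parseInt(input.slice(pos, end), 10), endPos: end }
      : null;
  },
};

function parse(input, { startRule = "integer", startPos = 0 } = {}) {
  const match = rules[startRule](input, startPos);
  if (match === null) {
    throw new SyntaxError(`No match for "${startRule}" at position ${startPos}`);
  }
  // Only a parse that starts at position 0 must consume the whole input.
  if (startPos === 0 && match.endPos !== input.length) {
    throw new SyntaxError(`Unexpected input at position ${match.endPos}`);
  }
  return { result: match.value, endPos: match.endPos };
}

// An outer parser could resume from endPos after delegating to this one.
const partial = parse("if 42 then", { startRule: "integer", startPos: 3 });
console.log(partial.result, partial.endPos);
```

Returning a new object avoids mutating the caller's options, at the cost of changing the return type of parse().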

It would allow reusability of grammars and modularisation of the code, which I think are two extremely important aspects of coding in general.

feature

ceymard

👍9

Most helpful comment

@Dignifiedquire I am currently thinking about syntax & semantics that can probably be best explained by an example:

static-languages.pegjs

languages  = "C" / "C++" / "Java" / "C#"

dynamic-languages.pegjs

languages = "Ruby" / "Python" / "JavaScript"

all-languages.pegjs

static  = require("./static-languages")
dynamic = require("./dynamic-languages")

all = static.languages / dynamic.languages

Each .pegjs file would implicitly define a module that would export all the rules it contains. The <name> = require(<module>) construct would import such a module. Its rules would then be available inside a namespace.

This design is deliberately similar to Node.js. Using namespaces will avoid conflicts. There are two downsides I see:

The <name> = require(<module>) construct is too similar to rule definitions and thus can be confusing (one might think that just one rule is imported).
The . syntax conflicts with the current meaning of ., which is “any character”. This can be solved by ugly hacks (e.g. . surrounded by whitespace means “any character”, while . surrounded by identifiers separates a namespace name from a rule name) or by changing the syntax (e.g. using any keyword to represent “any character”).

dmajda on 23 Feb 2013

👍3
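
The whitespace-based workaround for the . ambiguity mentioned above could look roughly like this. It is only a sketch: `classifyDots` is a hypothetical helper, not part of PEG.js, and it assumes the expression contains no JavaScript action blocks:

```javascript
// Classify every "." in a rule expression as either the "any character"
// operator or a namespace separator, skipping string literals and
// character classes. Hypothetical helper, not part of PEG.js.
function classifyDots(expression) {
  const isIdent = (ch) => ch !== undefined && /[A-Za-z0-9_]/.test(ch);
  const dots = [];
  let closer = null; // delimiter that ends the literal or class we are inside
  for (let i = 0; i < expression.length; i++) {
    const ch = expression[i];
    if (closer !== null) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === closer) closer = null;
      continue;
    }
    if (ch === '"' || ch === "'") closer = ch;
    else if (ch === "[") closer = "]";
    else if (ch === ".") {
      // A dot with identifier characters on both sides separates
      // a namespace from a rule name; any other dot means "any character".
      const kind =
        isIdent(expression[i - 1]) && isIdent(expression[i + 1])
          ? "namespace"
          : "any";
      dots.push({ index: i, kind });
    }
  }
  return dots;
}

console.log(classifyDots('all = static.languages / . "a.b"'));
```

The heuristic makes an expression such as a.b impossible to read as "rule a, any character, rule b", which is the breaking change raised later in the thread.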

All 32 comments

I agree that this is an important feature, I want to do this after version 1.0.

(BTW I don't like the Python-like syntax you propose — something similar to Node.js's require would be better because it would be more familiar to JavaScript programmers. But this is a minor thing that can be ironed out later.)

dmajda on 20 Aug 2011

👍1

Would you consider it for inclusion before 1.0 if provided with a patch?

I agree on your remark about the python syntax.

ceymard on 20 Aug 2011

+1 for this feature

s3u on 2 Oct 2011

@ceymard Yes, I would consider it.

dmajda on 10 Jan 2012

+1 for the feature and +1 for require style inclusion

dignifiedquire on 5 Aug 2012

@dmajda @ceymard Do you have any thoughts already on how to implement this? I need this for a project at work and will try to implement it. The question is whether this should just be a way to split grammars across multiple files, or something like inheritance, where one could inherit all rules and then override specific rules in the new grammar.

dignifiedquire on 19 Feb 2013

@dmajda As the <identifier> = <expression> pattern is already taken by the rule definitions, why not do something like this:

static := require("./static-languages")
dynamic := require("./dynamic-languages")

all = static::languages / dynamic::languages

The :: is not used anywhere in PEG.js that I know of, which makes it easy to distinguish namespaces from other things. I'm not sure about :=, though: it gets the point across but feels very foreign for JavaScript.

Also if you want to use namespaces, do you think there should be only one namespace per file or should there be a way of creating multiple namespaces in one file like this:

static := {
  languages  = "C" / "C++" / "Java" / "C#"
}

dynamic := {
  languages = "Ruby" / "Python" / "JavaScript"
}

dignifiedquire on 23 Feb 2013

I'm not much of a fan of :: and :=; they look alien in the JavaScript/CoffeeScript world.

I'd also like to keep things simple and define namespaces implicitly only by requiring files. I don't see a big need for anything more complicated.

dmajda on 24 Feb 2013

How about simply:

@require foo = "./foo"

bar = foo:languages

Colons are a compromise, but they are used to separate namespaces in many places: C++, C#, XML, etc.

otac0n on 1 Mar 2013

: will always be associated with cons for many, many functional programmers. I suggest staying away from that operator. :: looks fine to me. Isn't that used for C++ namespaces? I'm not convinced yet that . is a bad choice, either.

michaelficarra on 1 Mar 2013

. can't be used without a breaking change. It would be ambiguous in the language.

:: is used in C++ for namespaces, and in C# for namespace prefixes (global::System, for example).

otac0n on 1 Mar 2013

I was thinking of a quick workaround on this topic - to solve simple inheritance only - glue pegjs files together, while having everything namespaced.

This might make grammars too verbose, and it involves a build step, but on the bright side it would force you to write granular DRY&OTW grammars.

As for the markup, I'm not saying it is a proper fit for this thread, just an option to consider: I was going for a simple __

languages = static__languages / dynamic__languages
<static-languages.pegjs>
<dynamic-languages.pegjs>
/* alternative */
languages = STATIC__languages / DYNAMIC__languages

andreineculau on 24 May 2013
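
The glue step described above can be sketched as follows. This is not the code from any released tool; `namespaceGrammar` and `glue` are hypothetical names, and the sketch assumes every rule starts at the beginning of a line in the form name = ... and that no rule name also appears as an identifier inside an action block:

```javascript
// Prefix every rule defined in `source` with `<prefix>__`, both where it is
// defined and where it is referenced. String literals and character classes
// are matched first so their contents are never rewritten.
function namespaceGrammar(prefix, source) {
  const defined = new Set(
    [...source.matchAll(/^([A-Za-z_]\w*)\s*=/gm)].map((m) => m[1])
  );
  const token =
    /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\[(?:\\.|[^\]\\])*\]|[A-Za-z_]\w*/g;
  return source.replace(token, (text) =>
    defined.has(text) ? `${prefix}__${text}` : text
  );
}

// Concatenate the main grammar with its namespaced dependencies.
// The main grammar goes first so its first rule stays the start rule.
function glue(mainSource, modules) {
  const parts = Object.entries(modules).map(([prefix, source]) =>
    namespaceGrammar(prefix, source)
  );
  return [mainSource, ...parts].join("\n");
}

const combined = glue("languages = static__languages / dynamic__languages", {
  static: 'languages = "C" / "C++" / "Java" / "C#"',
  dynamic: 'languages = "Ruby" / "Python" / "JavaScript"',
});
console.log(combined);
```

The combined text can then be handed to the parser generator as a single grammar, which is the build step the comment mentions.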

@andreineculau I'm basically already doing this with a build step, so if you and others are just looking for something to generate useful parsers from a grammar with a dependency tree (where a single parser implementing the combined grammar is generated), I might clean what I have up and release it so the discussion can refocus on how to deal with this in a more permanent way.

Another thing: approaching this primarily by designing extensions to the grammar syntax misses something important, which is that one of the main reasons we all have the itch to pull in rules from other grammars (another being clarity) is the need to write parsers that share a lot of logic. So, while generated parsers might never be meaningfully re-composable at parse-time, it seems important that a tree of grammars generate a tree of parsers, rather than one monolithic parser. It's most important when a set of parsers will be part of a web UI, but it generally doesn't hurt to avoid unnecessary bloat in generated code.

odonnell on 24 May 2013

@odonnell +1 for releasing anything - no matter if you have the time to clean it up

and +1 for the clarification. This should be treated as a quick workaround, not a long-term proper solution.

andreineculau on 25 May 2013

@odonnell my take on it is online at https://github.com/andreineculau/core-pegjs - please poke me if you have something better.

andreineculau on 26 May 2013

+1 for this feature

cpettitt on 3 Sep 2013

👍

ne-sachirou on 8 May 2014

👍

adammichalik on 9 May 2014

👍

goldibex on 10 Oct 2014

I went and wrote a plugin/extension for PEG.js that does imports: https://github.com/casetext/pegjs-import.

goldibex on 15 Oct 2014

+1 for this as well.

yinso on 23 Oct 2014

I implemented this in #308 in a generic way: inclusion of grammars is only one way to implement decomposition of rules.

Mingun on 3 Feb 2015

Great feature 👍

Looking forward to seeing it released.

lbeschastny on 31 Mar 2016

👍

AndreTheHunter on 6 Apr 2016

Awesome! 👍

rumkin on 9 Apr 2016

@dmajda I'm coming late to this party, but I wonder how often we need to import many rules from another library. I would love to be able to import things like Url and Email into my composed grammars but I don't care that Url may also have things like HierarchicalPart and AsciiLetter. Do you think something like Node's named exports would be a viable way forward, keeping the benefits of namespacing but allowing direct named imports?

import { SchemalessUrl, Url } from "./Urls.pegjs"

Token
  = PhoneNumber
  / Url
  / SchemalessUrl

Namespacing has been an issue for me as I try and explore writing otherwise-composable grammars. I'm stuck right now including files in files and naming things the way PHP functions were named before they introduced proper namespaces: UrlIpHost, HtmlQuotedString, etc…

dmsnell on 28 Dec 2016

👍1
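
One way to think about named imports is that only the requested rules and whatever they depend on need to come along. A rough sketch of that selection step, assuming rules start at the beginning of a line in the form Name = ... (the helper `pickRules` and the sample grammar are hypothetical, not an existing PEG.js feature):

```javascript
// Return only the rules named in `names` plus every rule they reference,
// directly or indirectly, in their original source order.
function pickRules(source, names) {
  const rules = new Map();
  // Split before each line that starts a rule definition.
  for (const chunk of source.split(/^(?=[A-Za-z_]\w*\s*=)/m)) {
    const head = chunk.match(/^[A-Za-z_]\w*/);
    if (head !== null) rules.set(head[0], chunk.trim());
  }
  // Literals and character classes are matched first so their contents
  // are never mistaken for rule references.
  const token =
    /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\[(?:\\.|[^\]\\])*\]|[A-Za-z_]\w*/g;
  const wanted = new Set();
  const queue = [...names];
  while (queue.length > 0) {
    const name = queue.shift();
    if (wanted.has(name)) continue;
    if (!rules.has(name)) throw new Error(`Unknown rule: ${name}`);
    wanted.add(name);
    for (const [text] of rules.get(name).matchAll(token)) {
      if (rules.has(text)) queue.push(text);
    }
  }
  return [...rules.keys()]
    .filter((name) => wanted.has(name))
    .map((name) => rules.get(name))
    .join("\n");
}

const urls = [
  'Url = Scheme "://" Host',
  "SchemalessUrl = Host",
  'Scheme = "http" / "https"',
  "Host = AsciiLetter+",
  "AsciiLetter = [a-z]",
  'Email = AsciiLetter+ "@" Host',
].join("\n");
console.log(pickRules(urls, ["SchemalessUrl"]));
```

The helper rules still end up in the importing grammar, so some renaming or namespacing would be needed to keep them from clashing with local rules.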

@dmajda @futagoza

Any progress on this issue? Or is the primary discussion now living in #473?
My grammar file is growing very fast :(
It would be nice to split it into several files.

eraxillan on 4 Jun 2017

👍1

I wouldn't mind being able to split grammars between files, simply for organization and composition. It would make them easier to test and re-use, as well as providing a way to swap grammars dynamically, maybe? Just some thoughts.

The JavaScript example that I used as a base is over 1,300 lines. It took a while to learn where everything was, and jump around and edit different sections.

mikeaustin on 5 Jun 2017

@mikeaustin I see this feature as something like Node.js's require:

cat bash.pegjs
{
const _ = require("whitespace");
const LB = require("line_break");
const CodeBlock = require("code_block");
const BoolExpr = require("boolean_expression");
}
...
IfStatement = "if" _ "[" BoolExpr "]" _ ";" _ "then" LB? CodeBlock "fi"

eraxillan on 5 Jun 2017

I agree, splitting grammars and making them modular is a great feature. However, handling these cases would be a problem:
1- a sub-grammar that relies on a global variable defined in the main grammar's code?
2- duplicate variable and rule names?

IMO, a convenient temporary approach would be to create a new add-on for PEG.js (independent from PEG.js) that defines a keyword for importing, for example @load(anotherGrammarFileLocation). The keyword should not be part of the JavaScript/PEG.js grammar.
The add-on would use a regular expression or a PEG grammar to detect that keyword, substitute it with the content of the file at anotherGrammarFileLocation, and send the substituted code to PEG.js.

Example:

integers.pegjs

integers = [0-9]+ { return parseInt(text(), 10) }

main.pegjs

arrayOfInteger = "[" (integers ",")* integers "]"
@load("integers.pegjs")

Note that with this method, if no start rule is specified and @load is placed before arrayOfInteger, PEG.js will treat the first rule it sees (integers) as the start rule.

One way to handle this is to use the same name for the file and its start rule and let the add-on set the start rule from the file name, or to substitute all loaded content at the end of the file.

The user should be responsible for any duplication.

jodevsa on 5 Jun 2017
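
A minimal sketch of such a preprocessing add-on, using the "substitute at the end of the file" variant so the first rule of the main grammar stays the start rule. The function `expandLoads` and its `readFile` callback are hypothetical; a real add-on would presumably read from disk and resolve relative paths:

```javascript
// Replace each line of the form @load("path") with nothing and append the
// loaded grammar text after the main grammar. Each file is included once,
// even if several grammars load it.
function expandLoads(source, readFile, seen = new Set()) {
  const loadPattern = /^[ \t]*@load\("([^"]+)"\)[ \t]*$/gm;
  const loaded = [];
  const body = source.replace(loadPattern, (_match, path) => {
    if (!seen.has(path)) {
      seen.add(path);
      // Loaded files may contain their own @load lines.
      loaded.push(expandLoads(readFile(path), readFile, seen));
    }
    return "";
  });
  return [body.trim(), ...loaded].join("\n\n");
}

// In-memory stand-in for the file system, for illustration only.
const files = {
  "integers.pegjs": "integers = [0-9]+ { return parseInt(text(), 10) }",
};
const main = '@load("integers.pegjs")\narrayOfInteger = "[" (integers ",")* integers "]"';
console.log(expandLoads(main, (path) => files[path]));
```

Duplicate rule names across loaded files are not detected here; as noted above, that stays the user's responsibility.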

I just want to highlight that this issue is primarily an optimization request, because composability/modularity is something that you can achieve on your own, especially when you control the full spectrum of the grammar.

If you're not comfortable with a grammar 1k-lines long, then split it up, and concatenate it back as you see fit before pumping it into pegjs.

andreineculau on 9 Jun 2017

👎1 👍1
