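Pegjs: Inconsistent behavior of the generated parser

Created on 2018-03-08 · 15 comments · Source: pegjs/pegjs

Issue type

Bug report

Prerequisites

Can you reproduce the issue?: _yes_
Did you search the repository issues?: _yes_
Did you check the forums?: _no_
Did you perform a web search (Google, Yahoo, etc.)?: _yes_

Description

Currently I'm using the JS API to generate the parser at runtime. This works fine.

Then I tried to generate the parser with the CLI, to avoid generating it at runtime. With that parser, though, I get errors (about half of my tests that parse strings throw).

Steps to reproduce

Move the grammar into its own file, grammar.pegjs
Generate the parser with the CLI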

pegjs -o parser.js grammar.pegjs

Remove peg.generate('...') and replace it with the new parser

const parser = require('./parser');
parser.parse('...');

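Run the tests

Expected behavior:
I would expect the parser generated by the CLI to work the same way as the one generated by the JS API.

Actual behavior:
With the JS API, when I pass this string ('foo = "bar"') to the parser, I get the following AST: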

{
  kind: 'condition',
  target: 'foo',
  operator: '=',
  value: 'bar',
  valueType: 'string',
  attributeType: undefined
}

However, when I use the parser generated by the CLI and pass it the same string ('foo = "bar"'), I get the following error:

SyntaxError: Expected "(", boolean, date, datetime, number, string, or time but "\"" found.
    at peg$buildStructuredError (/Users/emmenko/xxx/parser.js:446:12)
    at Object.peg$parse [as parse] (/Users/emmenko/xxx/parser.js:2865:11)
    at repl:1:7
    at ContextifyScript.Script.runInThisContext (vm.js:50:33)
    at REPLServer.defaultEval (repl.js:240:29)
    at bound (domain.js:301:14)
    at REPLServer.runBound [as eval] (domain.js:314:12)
    at REPLServer.onLine (repl.js:441:10)
    at emitOne (events.js:121:20)
    at REPLServer.emit (events.js:211:7)

Software

PEG.js: 0.10.0
Node.js: 8.9.1
NPM or Yarn: [email protected]
Browser: Chrome
OS: OSX
Editor: VSCode

question

Source

emmenko

👍1

All 15 comments

Nice, you filled it in properly 👍, now we just need the grammar and I can help you 😄

futagoza on 2018-03-08

Here you go:

// GRAMMAR
const peg = require('pegjs');
const parser = peg.generate(`
{
  function getFlattenedValue (value) {
    if (!value) return undefined
    return Array.isArray(value)
      ? value.map(function(v){return v.value})
      : value.value
  }
  function getValueType (value) {
    if (!value) return undefined
    var rawType = value.type
    if (Array.isArray(value))
      rawType = value[0].type
    switch (rawType) {
      case 'string':
      case 'number':
      case 'boolean':
        return rawType
      default:
        return 'string'
    }
  }
  function getAttributeType (target, op, val) {
    if (typeof target === 'string' && target.indexOf('attributes.') === 0) {
      if (!val)
        return undefined
      switch (op) {
        case 'in':
        case 'not in':
          return val[0].type;
        case 'contains':
          return 'set-' + val.type
        default:
          return Array.isArray(val) ? 'set-' + val[0].type : val.type;
      }
    }
  }
  function transformToCondition (target, op, val) {
    return {
      kind: "condition",
      target: target,
      operator: op,
      value: getFlattenedValue(val),
      valueType: getValueType(val),
      attributeType: getAttributeType(target, op, val),
    }
  }

  function createIdentifier (body) {
    return body
      .map(identifiers => identifiers.filter(identifier => (identifier && identifier !== '.'))) // gets raw_identifiers without dots and empty identifiers
      .filter(identifiers => identifiers.length > 0) // filter out empty identifiers arrays
      .map(identifiers => identifiers.join('.'))
      .join('.') // join back to construct the path
  }
}

// ----- DSL Grammar -----
predicate
  = ws exp:expression ws { return exp; }

expression
  = head:term tail:("or" term)*
    {
      if (tail.length === 0) {
        return head;
      }

      return {
        kind: "logical",
        logical: "or",
        conditions: [head].concat(tail.map(function(el){return el[1];})),
      };
    }

term
  = head:factor tail:("and" factor)*
    {
      if (tail.length === 0) {
        return head;
      }

      return {
        kind: "logical",
        logical: "and",
        conditions: [head].concat(tail.map(function(el){return el[1];})),
      };
    }

factor
  = ws negation:"not" ws primary:primary ws
    {
      return {
        kind: "negation",
        condition: primary,
      };
    }
  / ws primary:primary ws { return primary; }

primary
  = basic_comparison
  / list_comparison
  / empty_comparison
  / parens

// ----- Comparators -----
basic_comparison
  = target:val_expression ws op:single_operators ws val:value
    { return transformToCondition(target, op, val); }

list_comparison
  = target:val_expression ws op:list_operators ws val:list_of_values
    { return transformToCondition(target, op, val); }

empty_comparison
  = target:val_expression ws op:empty_operators
    { return transformToCondition(target, op); }

// ----- Operators -----
single_operators
  = "!="
  / "="
  / "<>"
  / ">="
  / ">"
  / "<="
  / "<"
  / "contains"

list_operators
  = "!="
  / "="
  / "<>"
  / "not in"
  / "in"
  / "contains all"
  / "contains any"

empty_operators
  = "is not empty"
  / "is empty"
  / "is not defined"
  / "is defined"

list_of_values
  = ws "(" ws head:value tail:(ws "," ws value)* ws ")" ws
    {
      if (tail.length === 0) {
        return [head];
      }
      return [head].concat(tail.map(function(el){ return el[el.length -1];}));
    }

// ----- Expressions -----
val_expression
  = application_expression
  / constant_expression
  / field_expression

application_expression
  = identifier ws "(" ws function_argument (ws "," ws function_argument)* ws ")"
constant_expression = ws val:value ws { return val; }
field_expression = ws i:identifier ws { return i; }

function_argument
  = expression
  / constant_expression
  / field_expression

value
  = v:boolean { return { type: 'boolean', value: v }; }
  / v:datetime { return { type: 'datetime', value: v }; }
  / v:date { return { type: 'date', value: v }; }
  / v:time { return { type: 'time', value: v }; }
  / v:number { return { type: 'number', value: v }; }
  / v:string { return { type: 'string', value: v }; }

// ----- Common rules -----
parens
  = ws "(" ws ex:expression ws ")" ws { return ex; }

identifier
  = body:((raw_identifier "." escaped_identifier)+ / (raw_identifier "." raw_identifier)+)
    { 
      return createIdentifier(body)
    }
    / i:raw_identifier { return i; }

escaped_identifier
  = "\`" head:raw_identifier tail:("-" raw_identifier)* "\`"
    { return [head].concat(tail.map(function(el){return el.join('');})).join(''); }

raw_identifier = i:[a-zA-Z0-9_]* { return i.join(''); }

ws "whitespace" = [ \\t\\n\\r]*

// ----- Types: booleans -----
boolean "boolean"
  = "false" { return false; }
  / "true" { return true; }

// ----- Types: datetime -----
datetime "datetime"
  =  quotation_mark datetime:datetime_format quotation_mark
    { return datetime.map(function(el){return Array.isArray(el) ? el.join('') : el;}).join(''); }

datetime_format = date_format time_mark time_format zulu_mark
time_mark = "T"
zulu_mark = "Z"

// ----- Types: date -----
date "date"
  =  quotation_mark date:date_format quotation_mark { return date.join("");}

date_format = [0-9][0-9][0-9][0-9] minus [0-9][0-9] minus [0-9][0-9]

// ----- Types: time -----
time "time"
  =  quotation_mark time:time_format quotation_mark { return time.join("");}

time_format = [0-2][0-9] colon [0-5][0-9] colon [0-5][0-9] decimal_point [0-9][0-9][0-9]
colon = ":"

// ----- Types: numbers -----
number "number"
  = minus? int frac? exp? { return parseFloat(text()); }

decimal_point = "."
digit1_9 = [1-9]
e = [eE]
exp = e (minus / plus)? DIGIT+
frac = decimal_point DIGIT+
int = zero / (digit1_9 DIGIT*)
minus = "-"
plus = "+"
zero = "0"

// ----- Types: strings -----
string "string"
  = quotation_mark chars:char* quotation_mark { return chars.join(""); }

char
  = unescaped
  / escape
    sequence:(
        '"'
      / "\\\\"
      / "/"
      / "b" { return "\\b"; }
      / "f" { return "\\f"; }
      / "n" { return "\\n"; }
      / "r" { return "\\r"; }
      / "t" { return "\\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG)
        { return String.fromCharCode(parseInt(digits, 16)); }
    )
    { return sequence; }

escape = "\\\\"
quotation_mark = '"'
unescaped = [^\\0-\\x1F\\x22\\x5C]
// See RFC 4234, Appendix B (http://tools.ietf.org/html/rfc4234).
DIGIT  = [0-9]
HEXDIG = [0-9a-f]i
`);

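emmenko on 2018-03-08

A related addition to this bug: I set up pegjs via pegjs-loader, which calls the JS API (parser.generate) under the hood. Running it that way also leads to the same error.

By the way, thanks a lot for this project!

tdeekens on 2018-03-08

👍1

@emmenko I'm not sure why your grammar works with the API (I will keep trying to find out why), but your grammar is incorrect. The unescaped rule should be: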

unescaped = !'"' [^\\0-\\x1F\\x22\\x5C]

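Tell me if this fixes your problem.

@tdeekens if it's the same error (e.g. Expected ... but "\"" found.), then check that your grammar is correct, or post it here.

futagoza on 2018-03-08

@futagoza I'm on the same team as @tdeekens, so it's the same problem 😅

We'll keep you posted! Thanks for your support so far 🙏

I'm not sure why your grammar works with the API

To be honest, we never ran into this problem before. Thanks for pointing it out anyway!

emmenko on 2018-03-08

Does it work now?

futagoza on 2018-03-08

Unfortunately it didn't help ☹️

emmenko on 2018-03-08

Using your grammar, PEG.js 0.10, Node 8.9.0 and the input foo = "bar", I tried it via 3 routes: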

https://pegjs.org/online
PEG.js API
pegjs CLI

All 3 show the same error: Line 1, column 7: Expected "(", boolean, date, datetime, number, string, or time but "\"" found.

If I change your grammar as follows, the error is fixed for all 3 routes:

// original
unescaped = [^\\0-\\x1F\\x22\\x5C]

// fixed
unescaped = !'"' [^\\0-\\x1F\\x22\\x5C]

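After applying the fixed rule, can you check whether:

you get the same error message or a different one
you are doing something different from what I described (or taking extra steps)
you are passing any options to the PEG.js API

Also, after tweaking the input a little, I realised that your grammar doesn't correctly treat newlines as whitespace, most likely because of your ws rule.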
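A rough way to see the whitespace problem is to look at what the ws character class contains when its text reaches the parser generator with doubled backslashes, as it is written in the grammar posted above. The sketch below is only an illustration: it uses a JavaScript RegExp character class as a stand-in for a PEG.js class, on the assumption that both read `\\` as a literal backslash and `\n` as a line feed. The helper name isWhitespace is hypothetical.

```javascript
// Text of the class with doubled backslashes, exactly as typed
// (String.raw keeps every backslash, like a grammar file read from disk).
const doubled = String.raw`[ \\t\\n\\r]`;
// Text of the class with single backslashes.
const single = String.raw`[ \t\n\r]`;

// Assumption: a JavaScript RegExp class stands in for a PEG.js class here.
const isWhitespace = (cls, ch) => new RegExp("^" + cls + "$").test(ch);

console.log(isWhitespace(doubled, "\n")); // false: a real newline is not in the class
console.log(isWhitespace(doubled, "n"));  // true: the letter "n" is
console.log(isWhitespace(single, "\n"));  // true
console.log(isWhitespace(single, "n"));   // false
```

Read this way, the doubled form matches a space, a backslash and the letters t, n and r, which would fit the observation that newlines are not treated as whitespace.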

Edit: here is my test script:

/* eslint node/no-unsupported-features: 0 */

"use strict";

const { exec } = require( "child_process" );
const { readFileSync } = require( "fs" );
const { join } = require( "path" );
const { generate } = require( "pegjs" );

function test( parser ) {

    try {

        console.log( parser.parse( `foo = "bar"` ) );

    } catch ( error ) {

        if ( error.name !== "SyntaxError" ) throw error;

        const loc = error.location.start;

        console.log( `Line ${ loc.line }, column ${ loc.column }: ${ error.message }` );

    }

}

const COMMAND = process.argv[ 2 ];
switch ( COMMAND ) {

    case "api":
        test( generate( readFileSync( join( __dirname, "grammar.pegjs" ), "utf8" ) ) );
        break;

    case "cli":
        exec( "node node_modules/pegjs/bin/pegjs -o parser.js grammar.pegjs", error => {

            if ( error ) console.error( error ), process.exit( 1 );

            test( require( "./parser" ) );

        } );
        break;

    default:
        console.error( `Invalid command "${ COMMAND }" passed to test script.` );
        process.exit( 1 );

}

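futagoza on 2018-03-08

Thanks a lot for your feedback! We'll try your suggestions tomorrow and let you know as soon as possible whether they help. 🙏

emmenko on 2018-03-08

Thanks for the feedback. First of all, sorry for the confusion: I only wanted to point out that the problem also occurs with the webpack loader.

We tried the suggested change. It fixes the parser in general, but we now have a new problem whose cause we are struggling to understand.

An example from one test (more below):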

Object {
+   "attributeType": undefined,
    "kind": "condition",
    "operator": "=",
    "target": "foo",
-   "value": "bar",
+   "value": ",b,a,r",
    "valueType": "string",
}

We think the error might be on our side, we're just not sure where. It happens, for example, with the following input

categories.id != ("b33f8e3a-f8d1-476f-a595-2615c4b57556")

which becomes

categories.id != (",b,3,3,f,8,e,3,a,-,f,8,d,1,-,4,7,6,f,-,a,5,9,5,-,2,6,1,5,c,4,b,5,7,5,5,6")

when parsed.

We'd obviously be very grateful for a hint, but we also understand if you can't support us with this.

tdeekens on 2018-03-09

Oops, my mistake 😨, this should fix it:

unescaped = !'"' value:[^\\0-\\x1F\\x22\\x5C] { return value; }
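To illustrate why the labelled version helps: without a label and an action, a two-part sequence such as the lookahead followed by the character class yields a two-element array per character, and joining those arrays inserts commas. The snippet below is a plain JavaScript sketch of that effect. The matched array is a hypothetical stand-in for what chars would contain, assuming the lookahead contributes undefined.

```javascript
// Hypothetical stand-in for `chars` when every character is matched by an
// unlabelled two-part sequence: [result of the lookahead, matched character].
const matched = [[undefined, "b"], [undefined, "a"], [undefined, "r"]];

// join("") stringifies each inner array first, and [undefined, "b"] becomes ",b".
const broken = matched.join("");

// Returning only the character, as the labelled rule with an action does,
// leaves plain strings to join.
const fixed = matched.map(([, ch]) => ch).join("");

console.log(broken); // ,b,a,r
console.log(fixed);  // bar
```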

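futagoza on 2018-03-09

👍1

Thanks for the super quick response. It did help, but not when using the CLI or the webpack loader, which often still return the initial error SyntaxError: Expected "(", boolean, date, datetime, number, string, or time but "\"" found. This happens, for example, with not(sku = "123") or the more complex example lineItemTotal(sku = "SKU1" or list contains all (1,2,3), field.name, "third arg") = "10 EUR". Could this be related to escaping?

tdeekens on 2018-03-09

Yes, it turns out this was because of double escaping. Here are the fixed rules: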

ws "whitespace" = [ \t\n\r]*

char
  = unescaped
  / escape
    sequence:(
        '"'
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG)
        { return String.fromCharCode(parseInt(digits, 16)); }
    )
    { return sequence; }

escape = "\\"

unescaped = !'"' value:[^\0-\x1F\x22\x5C] { return value; }
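The effect of the double escaping can be sketched in plain JavaScript. Inside a template literal, each `\\` collapses to a single backslash before the grammar text reaches the generator, whereas text read from a grammar file keeps both. The example below imitates the file case with String.raw and, as an assumption, uses a JavaScript RegExp class as a stand-in for a PEG.js class. The helper name accepts is hypothetical.

```javascript
// The same class text, once in a plain template literal and once kept raw.
const viaTemplate = `[^\\0-\\x1F\\x22\\x5C]`;
const viaFile = String.raw`[^\\0-\\x1F\\x22\\x5C]`;

console.log(viaTemplate); // [^\0-\x1F\x22\x5C]
console.log(viaFile);     // [^\\0-\\x1F\\x22\\x5C]

// Assumption: a JavaScript RegExp class stands in for a PEG.js class here.
const accepts = (cls, ch) => new RegExp("^" + cls + "$").test(ch);

// Collapsed form: control characters, the double quote and the backslash are excluded.
console.log(accepts(viaTemplate, '"')); // false
console.log(accepts(viaTemplate, "1")); // true

// Raw form: "0-\\" reads as a range from "0" to a backslash, so digits and
// uppercase letters are excluded, while the double quote is accepted.
console.log(accepts(viaFile, '"')); // true
console.log(accepts(viaFile, "1")); // false
console.log(accepts(viaFile, "S")); // false
console.log(accepts(viaFile, "b")); // true
```

Under these assumptions, the raw form would let lowercase strings such as "bar" through once the lookahead is added, but reject strings containing digits or uppercase letters such as "123" or "SKU1", which matches the failures reported above.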

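Edit: it seems you may want to look into the rules that parse the complex example lineItemTotal(sku = "SKU1" or list contains all (1,2,3), field.name, "third arg") = "10 EUR", as it currently outputs a weird "kind":"condition" node.

futagoza on 2018-03-09

Thanks a lot for your help and suggestions. It seems to have solved the problem we had. We will look into the suggestion regarding the "condition" node.

tdeekens on 2018-03-09

You're welcome 😄

futagoza on 2018-03-09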
