Tags generation stops when the PHP heredoc (<<<) syntax is encountered in a file. As the nowdoc PHP syntax is basically the same, that's another language element that breaks the file parsing.
Parser: not sure about this; assuming PHP.
$ ctags --options=NONE foo.php
<?php
class LivingBeings {
    public function doSomething()
    {
        $foo = <<<FOO
        FOO;
    }
    public function doSomethingElse()
    {
    }
}
The doSomethingElse method is not listed in the tags file. As soon as I comment out the heredoc portion, the method is indexed normally, as you can see in the "expected output" section ahead.
Actual output:
!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD mixed /number, pattern, mixed, or combine/
!_TAG_OUTPUT_FILESEP slash /slash or backslash/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT 96 /0 for no limit/
!_TAG_PROC_CWD /tmp/ //
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 5.9.0 /5a136315/
LivingBeings foo.php /^class LivingBeings {$/;" c
doSomething foo.php /^ public function doSomething()$/;" f class:LivingBeings
Expected output:
!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD mixed /number, pattern, mixed, or combine/
!_TAG_OUTPUT_FILESEP slash /slash or backslash/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT 96 /0 for no limit/
!_TAG_PROC_CWD /tmp/ //
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 5.9.0 /5a136315/
LivingBeings foo.php /^class LivingBeings {$/;" c
doSomething foo.php /^ public function doSomething()$/;" f class:LivingBeings
doSomethingElse foo.php /^ public function doSomethingElse()$/;" f class:LivingBeings
$ ctags --version
Universal Ctags 5.9.0(5a136315), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Nov 20 2020, 11:46:20
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +yaml, +packcc
Building it locally:
$ cd ctags_source
$ make clean && make distclean
$ ./autogen.sh
$ ./configure --prefix=$HOME
$ make
$ make install
@jespinal, are you talking about this change: https://wiki.php.net/rfc/flexible_heredoc_nowdoc_syntaxes ?
$ git diff |cat
diff --git a/parsers/php.c b/parsers/php.c
index e3fdc241..ace25561 100644
--- a/parsers/php.c
+++ b/parsers/php.c
@@ -682,6 +682,8 @@ static void parseHeredoc (vString *const string)
int extra = EOF;
c = getcFromInputFile ();
+ if (c == ' ' || c == '\t')
+ c = getcFromInputFile ();
for (len = 0; c != 0 && (c - delimiter[len]) == 0; len++)
c = getcFromInputFile ();
$ cat input.php
<?php
// Taken from https://github.com/universal-ctags/ctags/issues/2717
// submitted by @jespinal
class LivingBeings {
    public function doSomething()
    {
        $foo = <<<FOO
        FOO;
    }
    public function doSomethingElse()
    {
    }
}
$ u-ctags -o - input.php
LivingBeings input.php /^class LivingBeings {$/;" c
doSomething input.php /^ public function doSomething()$/;" f class:LivingBeings
$
@masatake that's not enough, because the ending marker used to have to be on its own line, while the new version lifts that restriction. I don't find the explanation very clear:
The implementation I am proposing avoids this problem by checking to see if a continuation of the found marker exists, and if so, then if it forms a valid identifier.
but I'd say it means that a line starting with the terminating marker does end the string unless an identifier character immediately follows the marker. So END; is a termination (given that the marker is END), but ENDFOO isn't.
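To make that reading concrete, here is a minimal sketch in C of the check as I understand it. The helper name is_closing_marker is hypothetical and is not taken from parsers/php.c; it also assumes the whole line is already available as a string, which is not how the parser reads its input. Bytes of 0x80 and above are treated as identifier characters, following PHP's identifier rules.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of parsers/php.c.
 * Returns true if `line` closes a heredoc/nowdoc opened with `marker`:
 * optional leading spaces or tabs, then the marker, then any character
 * that cannot continue an identifier. */
static bool is_closing_marker(const char *line, const char *marker)
{
    size_t len = strlen(marker);

    /* With the 7.3+ syntax the closing marker may be indented. */
    while (*line == ' ' || *line == '\t')
        line++;

    if (strncmp(line, marker, len) != 0)
        return false;

    /* "ENDFOO" is a longer identifier, not the marker "END". */
    unsigned char next = (unsigned char) line[len];
    return !(isalnum(next) || next == '_' || next >= 0x80);
}
```

With the marker END, this accepts "END;" and an indented "    END;", and rejects "ENDFOO".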
BTW, as this is a backward-incompatible syntactic change, I don't know what we want to do about it. But I guess if PHP is happy to break it, we can as well, especially as it's fairly unlikely to cause problems. Ideally I guess we'd use the current syntax for *.php[1-6] and the new one for the rest, but that might be too much trouble for what it's worth.
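For illustration only, the extension-based selection could look roughly like the sketch below. The function name wants_legacy_heredoc is hypothetical, and nothing like it exists in ctags as far as this thread shows; it only tests whether the file name ends in ".php" followed by a single digit from 1 to 6.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `filename` ends in ".php" followed by a
 * single digit from 1 to 6 (e.g. "foo.php5"), which would select the
 * pre-7.3 heredoc rules under the idea described above. */
static bool wants_legacy_heredoc(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    if (dot == NULL || strncmp(dot, ".php", 4) != 0)
        return false;

    /* Exactly one trailing character, and it must be a digit in 1..6. */
    return dot[4] >= '1' && dot[4] <= '6' && dot[5] == '\0';
}
```

Plain "foo.php" falls through to the new syntax here, which matches "the new one for the rest".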
@jespinal, are you talking about this change: https://wiki.php.net/rfc/flexible_heredoc_nowdoc_syntaxes ?
Sorry, @masatake , for some reason I was not notified of your question.
Yes, I'm talking about that change. But that was actually implemented in PHP 7.3 (the current stable version is 7.4, and version 8 is at hand). I'm not sure why this was not reported earlier, considering the vast user base of both ctags and PHP.
I'm adding a few screenshots of code snippets derived from the previous example in order to (hopefully) shed some light on what they consider valid/invalid syntax with regard to the new heredoc/nowdoc syntax (the RFC is not clear enough, I think).
In this example, 'TEXT' (second one) is the ending marker. So, the third one, 'TEXT;', is a syntactically invalid string in the view of the parser, as it would expect only a semicolon or a comma:
A similar case as the previous one:
Had it been a semicolon or a comma, the PHP parser would have been happy. E.g.
echo <<<TEXT
some string
TEXT, 'some other string';
In the eyes of the parser, this is the same as:
echo 'some string', 'some other string';
The following is a valid example, as the parser knows that 'TEXT' and 'TEXTUAL' are two different strings:
Here are a couple of snippets that are invalid due to wrong indentation, specifically because of this RFC statement: "If the closing marker is indented further than any lines of the body, then a ParseError will be thrown:"
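As a rough illustration of that indentation rule, here is a small C sketch. Both function names are hypothetical. It counts spaces and tabs alike and skips completely empty body lines, which are simplifying assumptions on my side rather than something the quoted sentence spells out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the leading spaces and tabs of a line. */
static size_t indent_width(const char *line)
{
    size_t n = 0;

    while (line[n] == ' ' || line[n] == '\t')
        n++;
    return n;
}

/* Hypothetical check: returns false if the closing marker line is
 * indented further than some non-empty line of the body. */
static bool heredoc_indent_ok(const char *const body[], size_t count,
                              const char *closing_line)
{
    size_t closing = indent_width(closing_line);

    for (size_t i = 0; i < count; i++) {
        if (body[i][0] == '\0')
            continue; /* assumption: empty lines are not checked */
        if (indent_width(body[i]) < closing)
            return false;
    }
    return true;
}
```

A body line indented by two spaces with a closing marker indented by four would be rejected by this check, while the same body with an unindented closing marker passes.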
@jespinal thanks, but if you have normative text that'd be even better :) It's always tricky to guess the logic based solely on a few cases, whereas if we have the normative text we can just implement that and it should hopefully work. And actually, I think we have enough with @masatake's link and your info :+1:
@masatake I don't promise anything given the little time I find lately, but I'll try to give this a look soon unless -- you beat me to it :)
BTW @jespinal if nobody complained, I really think it's because there's very little use of those syntaxes, and we support the pre-7.3 syntax, so the only cases where one would see a problem are with 7.3+ syntax usage, which implies using heredoc/nowdoc in the first place :)
You were inactive for a while. So I didn't expect to get a comment from you.
But now we get the "self-assigned" sign from you. @b4n, thank you for the offer.
@masatake you were wise not to expect much of me, as I indeed didn't find time for many UCtags/Geany contributions lately :disappointed: . I'm trying to find a way to allocate time here again, so I hope I'll be more active, but I can't promise just yet.
Nonetheless, see #2734 for a fix for the issue at hand :)
Thank you!