Nltk: Update various regex escape sequences

Created on 28 Aug 2019  ·  14 Comments  ·  Source: nltk/nltk

Recent versions of Python are stricter about invalid escape sequences in string literals, which affects regex patterns that are not written as raw strings.
For instance, with 3.6.8 there are 10+ warnings like this one:

...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile('(-?\d+):(-?\d+)')

The regex(es) should be updated to silence these warnings.
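
As a minimal sketch of the kind of fix being asked for, the `RANGE_RE` pattern from the warning above can be written as a raw string, so the backslashes reach the `re` module untouched and Python has no escape sequence to complain about. The sample inputs are made up for illustration.

```python
import re

# Raw string: "\d" is passed to the regex engine as-is, so the compiler
# has no invalid escape sequence to warn about.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

# Hypothetical inputs, only to show the pattern still behaves the same.
match = RANGE_RE.match('-3:12')
start, end = (int(g) for g in match.groups())
print(start, end)

no_match = RANGE_RE.match('abc')
print(no_match)
```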

goodfirstbug pythonic

All 14 comments

If no one is working on this, I would like to. Could you share the steps to reproduce the issue, please?

@PabloDino Install Python 3.6.8 or later and try to import every module. Then fix the regexes, either by using raw strings or by escaping the backslashes properly, so that the code works on both Python 2 and 3.
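
The two fixes mentioned here, raw strings and explicit escaping, spell the same pattern text, and both are valid on Python 2 and 3. The snippet below is an illustrative sketch built on patterns from the warnings in this thread; the sample inputs are assumptions.

```python
import re

# Option 1: raw string. Option 2: double every backslash.
# Both spell exactly the same characters.
raw = r"\d+$"
escaped = "\\d+$"
print(raw == escaped)

# A raw string leaves "\n" and "\t" as backslash + letter, but the re module
# understands those escapes itself, so the character class keeps its meaning.
tiling_class = re.compile(r"[a-z\-' \n\t]")
print(bool(tiling_class.match("\n")), bool(tiling_class.match("Z")))

# Replacement templates such as "{\g<chunk>}" need the same treatment.
chunked = re.sub(r"(?P<chunk>b+)", r"{\g<chunk>}", "abba")
print(chunked)
```

One caveat: the `ur"..."` prefix is a syntax error on Python 3, so `u'...'` patterns would likely need doubled backslashes if the `u` prefix is kept.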

I'm on it. I've been working through some exercises but I'm not seeing any warnings. Can you post a code snippet in which the warnings show up, please?

@PabloDino :

$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only
========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
    name, n = re.sub("\d+$", "", var.name), 2

nltk/nltk/featstruct.py:2091
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile("(-?\d+):(-?\d+)")

nltk/nltk/sem/evaluate.py:307
  /home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/sem/relextract.py:128
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
    ENT = re.compile("&(\w+?);")

nltk/nltk/sem/relextract.py:407
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
    """

nltk/nltk/sem/boxer.py:776
  /home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
    assert re.match("^[exps]\d+$", var), var

nltk/nltk/sem/drt.py:716
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
    + [" \  " + blank + line for line in term_lines[1:2]]

nltk/nltk/sem/drt.py:717
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
    + [" /\ " + var_string + line for line in term_lines[2:3]]

nltk/nltk/grammar.py:1291
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/grammar.py:1463
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
    _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)

nltk/nltk/text.py:650
  /home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
    _CONTEXT_RE = re.compile("\w+|[\.\!\?]")

nltk/nltk/tokenize/punkt.py:1462
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
    pat = "\s*".join(re.escape(c) for c in tok)

nltk/nltk/tokenize/regexp.py:100
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/regexp.py:193
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/repp.py:133
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
    line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)

nltk/nltk/tokenize/texttiling.py:96
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
    c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)

nltk/nltk/tokenize/texttiling.py:229
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
    matches = re.finditer("\w+", text)

nltk/nltk/tokenize/toktok.py:53
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
    FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "

nltk/nltk/tokenize/toktok.py:55
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
    FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "

nltk/nltk/tokenize/toktok.py:62
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
    PIPE = re.compile("\|"), " &#124; "

nltk/nltk/tokenize/treebank.py:269
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
    """

nltk/nltk/tokenize/treebank.py:273
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tokenize/treebank.py:277
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tree.py:99
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/tree.py:652
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
    if re.search("\s", brackets):

nltk/nltk/tree.py:658
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
    node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:660
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
    leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:662
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
    "%s\s*(%s)?|%s|(%s)"

nltk/nltk/tree.py:900
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
    reserved_chars = re.compile("([#\$%&~_\{\}])")

nltk/nltk/parse/chart.py:1034
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1073
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1128
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1148
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1218
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1241
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:270
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:369
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/tag/sequential.py:730
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
    elif re.match("\w+$", word):

nltk/nltk/tag/sequential.py:724
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
    elif re.match("\W+$", word):

nltk/nltk/tag/sequential.py:722
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
    if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):

nltk/nltk/classify/rte_classify.py:61
  /home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
    tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")

nltk/nltk/classify/maxent.py:1351
  /home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/chunk/util.py:371
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
    _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")

nltk/nltk/chunk/util.py:517
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
    _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

nltk/nltk/chunk/util.py:526
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
    for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):

nltk/nltk/chunk/regexp.py:70
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
    _BRACKETS = re.compile("[^\{\}]+")

nltk/nltk/chunk/regexp.py:215
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
    s = re.sub("\{\}", "", s)

nltk/nltk/chunk/regexp.py:426
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)

nltk/nltk/chunk/regexp.py:471
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)

nltk/nltk/chunk/regexp.py:510
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
    regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))

nltk/nltk/chunk/regexp.py:511
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)

nltk/nltk/chunk/regexp.py:575
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)

nltk/nltk/chunk/regexp.py:708
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
    "(?P<left>%s)\{(?P<right>%s)"

nltk/nltk/chunk/regexp.py:714
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)

nltk/nltk/chunk/regexp.py:778
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
    "(?P<left>%s)\}(?P<right>%s)"

nltk/nltk/chunk/regexp.py:784
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)

nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
    r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")

nltk/nltk/chunk/regexp.py:1175
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/inference/discourse.py:44
  /home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/stem/lancaster.py:192
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")

nltk/nltk/stem/lancaster.py:225
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")

nltk/nltk/stem/porter.py:177
  /home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
    """

nltk/nltk/corpus/__init__.py:116
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:123
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:126
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
    crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")

nltk/nltk/corpus/__init__.py:128
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
    "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"

nltk/nltk/corpus/__init__.py:311
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
    "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"

nltk/nltk/corpus/__init__.py:335
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
    twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")

nltk/nltk/corpus/__init__.py:364
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
    wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")

nltk/nltk/corpus/__init__.py:374
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:383
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:392
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:401
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/reader/plaintext.py:62
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/corpus/reader/util.py:635
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
    if re.match("^\d+-\d+", line) is not None:

nltk/nltk/corpus/reader/util.py:859
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
    if re.match("======+\s*$", line):

nltk/nltk/corpus/reader/api.py:77
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
    m = re.match("(.*\.zip)/?(.*)$|", root)

nltk/nltk/corpus/reader/timit.py:165
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
    encoding = [(".*\.wav", None), (".*", encoding)]

nltk/nltk/corpus/reader/bracket_parse.py:214
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
    "alpino\.xml",

nltk/nltk/corpus/reader/xmldocs.py:232
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
    _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")

nltk/nltk/toolbox.py:209
  /home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
    """

nltk/nltk/corpus/reader/bnc.py:29
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/switchboard.py:113
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
    _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")

nltk/nltk/corpus/reader/childes.py:281
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
    m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)

nltk/nltk/corpus/reader/framenet.py:2753
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/udhr.py:30
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
    ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),

nltk/nltk/corpus/reader/twitter.py:54
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/ccg/combinator.py:225
  /home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
    """

nltk/nltk/treetransforms.py:108
  /home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \ 
    """
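
Several entries in this list point at a docstring (`"""`) or at plain strings such as the ASCII art in `drt.py`, not at a regex. A hedged sketch of how those cases could be handled follows; `draw_fork` is a made-up function for illustration and is not part of NLTK.

```python
def draw_fork():
    r"""Return a two-branch tree whose top row is ' /\ '."""
    # The r prefix above keeps the backslash in the docstring literal.
    # Outside raw strings, a literal backslash is spelled as "\\".
    return "\n".join([" /\\ ", "a  b"])


print(draw_fork())
print("/\\" in draw_fork.__doc__)
```

Whether a raw docstring or doubled backslashes reads better is a per-file judgment call; a raw docstring also changes how sequences like `\n` inside the docstring are interpreted.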

And FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.

Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
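
The behaviour quoted above can be observed without touching NLTK by compiling a small piece of source text and recording the warnings. This is a sketch under assumptions: `escape_warnings` is a hypothetical helper, and it relies on the interpreter still reporting the problem as a warning (newer Python versions report a `SyntaxWarning` instead of a `DeprecationWarning`).

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the messages of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<snippet>", "exec")
    return [str(w.message) for w in caught]


# The snippets are raw strings so that this example itself stays warning-free.
bad = r'PATTERN = "\d+$"'
good = r'PATTERN = r"\d+$"'

print(escape_warnings(bad))
print(escape_warnings(good))

# The unrecognized escape is left in the string unchanged, as the docs say.
namespace = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    exec(compile(bad, "<snippet>", "exec"), namespace)
print(namespace["PATTERN"] == "\\d+$")
```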

$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from /pytest.py
$ pytest -vvs nltk/ --collect-only
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- *
/python3
cachedir: .pytest_cache
rootdir: *
*/nltk
collected 381 items


Unit tests for nltk.compat.
See also nltk/test/compat.doctest.

Unit tests for nltk.metrics.aline

Test Aline algorithm for aligning phonetic sequences


Test aline for computing the difference between two segments

Tests for Brill tagger.

Test for bug https://github.com/nltk/nltk/issues/1597

    Ensures that curly bracket quantifiers can be used inside a chunk rule.
    This type of quantifier has been used for the supplementary example
    in http://www.nltk.org/book/ch07.html#exploring-text-corpora.


Unit tests for nltk.classify. See also: nltk/test/classify.doctest

Text constructed using: http://www.nltk.org/book/ch01.html

Mock test for Stanford CoreNLP wrappers.

Corpus View Regression Tests

Class containing unit tests for nltk.metrics.agreement.Disagreement.

More advanced test, based on
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf

Same more advanced example, but with 1 rating removed.
Again, removal of that 1 rating shoudl not matter.

Simple test, based on
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.

Same simple test with 1 rating removed.
Removal of that rating should not matter: K-Apha ignores items with
only 1 rating.

Regression tests for json2csv() and json2csv_entities() in Twitter
package.


Sanity check that file comparison is not giving false positives.

Unit tests for nltk.corpus.nombank

Tests for nltk.pos_tag

The following test performs a random series of reads, seeks, and
tells, and checks that the results are consistent.




Unit tests for Senna

Unittest for nltk.classify.senna

Senna pipeline interface

Unittest for nltk.tag.senna

this unit testing for test the snowball arabic light stemmer
this stemmer deals with prefixes and suffixes

Test for bug https://github.com/nltk/nltk/issues/1581

    Ensures that 'oed' can be stemmed without throwing an error.
  <TestCaseFunction test_vocabulary_martin_mode>
    Tests all words from the test vocabulary provided by M Porter

    The sample vocabulary and output were sourced from:
    http://tartarus.org/martin/PorterStemmer/voc.txt
    http://tartarus.org/martin/PorterStemmer/output.txt
    and are linked to from the Porter Stemmer algorithm's homepage
    at
    http://tartarus.org/martin/PorterStemmer/
  <TestCaseFunction test_vocabulary_nltk_mode>
  <TestCaseFunction test_vocabulary_original_mode>




Unit tests for nltk.tgrep.

Class containing unit tests for nltk.tgrep.

Test error handling of undefined tgrep operators.

Test that comments are correctly filtered out of tgrep search
strings.

Test the Basic Examples from the TGrep2 manual.

Test labeled nodes.

    Test case from Emily M. Bender.
  <TestCaseFunction test_multiple_conjs>
    Test that multiple (3 or more) conjunctions of node relations are
    handled properly.
  <TestCaseFunction test_node_encoding>
    Test that tgrep search strings handles bytes and strs the same
    way.
  <TestCaseFunction test_node_nocase>
    Test selecting nodes using case insensitive node names.
  <TestCaseFunction test_node_noleaves>
    Test node name matching with the search_leaves flag set to False.
  <TestCaseFunction test_node_printing>
    Test that the tgrep print operator ' is properly ignored.
  <TestCaseFunction test_node_quoted>
    Test selecting nodes using quoted node names.
  <TestCaseFunction test_node_regex>
    Test regex matching on nodes.
  <TestCaseFunction test_node_regex_2>
    Test regex matching on nodes.
  <TestCaseFunction test_node_simple>
    Test a simple use of tgrep for finding nodes matching a given
    pattern.
  <TestCaseFunction test_node_tree_position>
    Test matching on nodes based on NLTK tree position.
  <TestCaseFunction test_rel_precedence>
    Test matching nodes based on precedence relations.
  <TestCaseFunction test_rel_sister_nodes>
    Test matching sister nodes in a tree.
  <TestCaseFunction test_tokenize_encoding>
    Test that tokenization handles bytes and strs the same way.
  <TestCaseFunction test_tokenize_examples>
    Test tokenization of the TGrep2 manual example patterns.
  <TestCaseFunction test_tokenize_link_types>
    Test tokenization of basic link types.
  <TestCaseFunction test_tokenize_macros>
    Test tokenization of macro definitions.
  <TestCaseFunction test_tokenize_node_labels>
    Test tokenization of labeled nodes.
  <TestCaseFunction test_tokenize_nodenames>
    Test tokenization of node names.
  <TestCaseFunction test_tokenize_quoting>
    Test tokenization of quoting.
  <TestCaseFunction test_tokenize_segmented_patterns>
    Test tokenization of segmented patterns.
  <TestCaseFunction test_tokenize_simple>
    Simple test of tokenization.
  <TestCaseFunction test_trailing_semicolon>
    Test that semicolons at the end of a tgrep2 search string won't
    cause a parse failure.
  <TestCaseFunction test_use_macros>
    Test defining and using tgrep2 macros.
  <TestCaseFunction tests_rel_dominance>
    Test matching nodes based on dominance relations.
  <TestCaseFunction tests_rel_indexed_children>
    Test matching nodes based on their index in their parent node.


Unit tests for nltk.tokenize.
See also nltk/test/tokenize.doctest


Test padding of asterisk for word tokenization.

Test padding of dotdot* for word tokenization.

Test a string that resembles a phone number but contains a newline




Test remove_handle() from casual.py with specially crafted edge cases

Test SyllableTokenizer tokenizer.

Test the Stanford Word Segmenter for Arabic (default config)

Test the Stanford Word Segmenter for Chinese (default config)

Test TreebankWordTokenizer.span_tokenize function

Test TweetTokenizer using words with special and accented characters.

Test word_tokenize function

Tests for static parts of Twitter package

Tests that Twitter credentials information from file is handled correctly.

Default credentials file is identified

Default credentials file has been read correctluy

Path to default credentials file is well-formed, given specified
subdir.

Setting subdir to empty path should raise an error.

Setting subdir to None should raise an error.

Test that environment variable has been read correctly.

Credentials file 'bad_oauth1-1.txt' is incomplete

First key in credentials file 'bad_oauth1-2.txt' is ill-formed

First key in credentials file 'bad_oauth1-2.txt' is ill-formed

Setting subdir to nonexistent directory should raise an error.

Defaults for authentication will fail since 'credentials.txt' not
present in default subdir, as read from os.environ['TWITTER'].

Credentials file 'foobar' cannot be found in default subdir.

Unit tests for nltk.corpus.wordnet
See also nltk/test/wordnet.doctest

Tests for NgramCounter that only involve lookup, no modification.

Unit tests for MLE ngram model.

MLE trigram model tests

Unit tests for Lidstone class

Unit tests for Laplace class

Using MLE model, generate some text.

tests Vocabulary Class

Tests for BLEU translation evaluation metric




Examples from the original BLEU paper
http://www.aclweb.org/anthology/P02-1040.pdf

Tests GDFA alignments


Testing GDFA with first 10 eflomal outputs from issue #1829
https://github.com/nltk/nltk/issues/1829

Tests for IBM Model 1 training methods

Tests for IBM Model 2 training methods

Tests for IBM Model 3 training methods

Tests for IBM Model 4 training methods

Tests for IBM Model 5 training methods

Tests for common methods of IBM translation models

Tests for NIST translation evaluation metric



Tests for stack decoder

============================ no tests ran in 2.13s =============================
$

I'm seeing the same output as @pombredanne.

Hi, is @PabloDino still planning to work on the issue?

I have been able to replicate @pombredanne's output and would like to work on fixing this issue.

Go ahead, I haven't been able to replicate it yet.

On Mon, Sep 30, 2019 at 11:40 AM Armin Stepanjan wrote:

> Hi, is @PabloDino still planning to work on the issue?
>
> I have been able to replicate @pombredanne's output and would like to work on fixing this issue.

@ab-10 Have you been able to fix those deprecation warnings?

Here is an updated list for Python 3.8, produced by running the command below:

find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/chat/iesha.py:52: DeprecationWarning: invalid escape sequence \<
  "u think I can%2??! really?? kekeke \<_\<",
./nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word):
./nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word):
./nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
./nltk/app/chunkparser_app.py:206: DeprecationWarning: invalid escape sequence \#
  "\t<regexp><\#><CD> # This is a comment...</regexp>\n"
./nltk/app/chunkparser_app.py:315: DeprecationWarning: invalid escape sequence \s
  grammar = re.sub("\n\s+", "\n", grammar)
./nltk/app/chunkparser_app.py:1061: DeprecationWarning: invalid escape sequence \w
  key=lambda t_w: re.match("\w+", t_w[0])
./nltk/app/chunkparser_app.py:1422: DeprecationWarning: invalid escape sequence \#
  "^\# Regexp Chunk Parsing Grammar[\s\S]*" "F-score:.*\n", "", grammar
./nltk/sem/cooper_storage.py:48: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
  ENT = re.compile("&(\w+?);")
./nltk/sem/relextract.py:382: DeprecationWarning: invalid escape sequence \s
  roles = """
./nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
  assert re.match("^[exps]\d+$", var), var
./nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
  + [" \  " + blank + line for line in term_lines[1:2]]
./nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
  + [" /\ " + var_string + line for line in term_lines[2:3]]
./nltk/sem/chat80.py:9: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/chat80.py:705: DeprecationWarning: invalid escape sequence \P
  template = "PropN[num=sg, sem=<\P.(P %s)>] -> '%s'\n"
./nltk/sem/evaluate.py:257: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
  if re.match("^\d+-\d+", line) is not None:
./nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
  if re.match("======+\s*$", line):
./nltk/corpus/reader/framenet.py:2748: DeprecationWarning: invalid escape sequence \w
  """
./nltk/corpus/reader/bracket_parse.py:215: DeprecationWarning: invalid escape sequence \.
  "alpino\.xml",
./nltk/corpus/reader/twitter.py:25: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
  _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
./nltk/corpus/reader/bnc.py:15: DeprecationWarning: invalid escape sequence \w
  """Corpus reader for the XML version of the British National Corpus.
./nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
  ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
./nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
  encoding = [(".*\.wav", None), (".*", encoding)]
./nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
  m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
./nltk/corpus/reader/plaintext.py:47: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
  _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
./nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
  m = re.match("(.*\.zip)/?(.*)$|", root)
./nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
  crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
./nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
  "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
./nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
  "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
./nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
  twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
./nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
  wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
./nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
  _CONTEXT_RE = re.compile("\w+|[\.\!\?]")
./nltk/inference/discourse.py:9: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:38: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
  if re.search("\s", brackets):
./nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
  node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
  leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
  "%s\s*(%s)?|%s|(%s)"
./nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
  reserved_chars = re.compile("([#\$%&~_\{\}])")
./nltk/ccg/combinator.py:220: DeprecationWarning: invalid escape sequence \Y
  """
./nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
  FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
./nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
  FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
./nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
  PIPE = re.compile("\|"), " &#124; "
./nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
  pat = "\s*".join(re.escape(c) for c in tok)
./nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
  line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
./nltk/tokenize/nist.py:81: DeprecationWarning: invalid escape sequence \{
  PUNCT = re.compile("([\{-\~\[-\` -\&\(-\+\:-\@\/])"), " \\1 "
./nltk/tokenize/nist.py:83: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_PRECEED = re.compile("([^0-9])([\.,])"), "\\1 \\2 "
./nltk/tokenize/nist.py:85: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_FOLLOW = re.compile("([\.,])([^0-9])"), " \\1 \\2"
./nltk/tokenize/treebank.py:194: DeprecationWarning: invalid escape sequence \]
  """
./nltk/tokenize/treebank.py:255: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/treebank.py:259: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
  c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
./nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
  matches = re.finditer("\w+", text)
./nltk/tokenize/regexp.py:76: DeprecationWarning: invalid escape sequence \w
  """
./nltk/tokenize/regexp.py:184: DeprecationWarning: invalid escape sequence \w
  """
./nltk/classify/maxent.py:1292: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
  tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
./nltk/parse/chart.py:1024: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1057: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1123: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1140: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1213: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1232: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:251: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:353: DeprecationWarning: invalid escape sequence \*
  """
./nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
  _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
./nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
  _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
./nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
  for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
./nltk/chunk/named_entity.py:178: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:176: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:174: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:250: DeprecationWarning: invalid escape sequence \s
  text = re.sub("[\s\S]*<TEXT>", subfunc, text)
./nltk/chunk/named_entity.py:251: DeprecationWarning: invalid escape sequence \s
  text = re.sub("</TEXT>[\s\S]*", "", text)
./nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
  _BRACKETS = re.compile("[^\{\}]+")
./nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
  s = re.sub("\{\}", "", s)
./nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
./nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
./nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
  regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
./nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
./nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
./nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
  "(?P<left>%s)\{(?P<right>%s)"
./nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
./nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
  "(?P<left>%s)\}(?P<right>%s)"
./nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:1136: DeprecationWarning: invalid escape sequence \.
  """
./nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
  name, n = re.sub("\d+$", "", var.name), 2
./nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
  RANGE_RE = re.compile("(-?\d+):(-?\d+)")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:171: DeprecationWarning: invalid escape sequence \s
  + "))\s*"
./nltk/toolbox.py:159: DeprecationWarning: invalid escape sequence \_
  """
./nltk/grammar.py:1278: DeprecationWarning: invalid escape sequence \*
  """
./nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
  _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
./nltk/stem/porter.py:145: DeprecationWarning: invalid escape sequence \m
  """Returns the 'measure' of stem, per definition in the paper
./nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
./nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
./nltk/treetransforms.py:8: DeprecationWarning: invalid escape sequence \ 
  """
./tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./tools/nltk_term_index.py:53: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./tools/nltk_term_index.py:56: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./tools/find_deprecated.py:43: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./tools/find_deprecated.py:45: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./tools/find_deprecated.py:47: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./tools/find_deprecated.py:64: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./tools/find_deprecated.py:67: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
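
For entries like the featstruct.py ones in the list above, the two fixes suggested earlier in the thread (a raw string, or doubled backslashes) should give the same compiled pattern. A minimal sketch, where `RANGE_RE_ESCAPED` is a hypothetical name used only for the comparison and is not part of nltk:

```python
import re

# Original line: RANGE_RE = re.compile("(-?\d+):(-?\d+)")
# "\d" is not a recognised string escape, hence the warning.

# Option 1: raw string, backslashes reach the regex engine untouched.
RANGE_RE = re.compile(r"(-?\d+):(-?\d+)")

# Option 2: doubled backslashes in a normal string (hypothetical name).
RANGE_RE_ESCAPED = re.compile("(-?\\d+):(-?\\d+)")

# Both spellings produce the same pattern text.
assert RANGE_RE.pattern == RANGE_RE_ESCAPED.pattern

print(RANGE_RE.match("-3:12").groups())  # ('-3', '12')
```

The warnings that point at a bare `"""` come from docstrings containing backslashes; prefixing the docstring with `r` is the usual way to silence those.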

@gertjanwytynck I'm currently fixing them one by one, should be done by the end of the week.

Has this been completed?

It looks like there are still a few left. I wonder if adding a unit test could help.

  • ./nltk/tools/nltk_term_index.py
  • ./nltk/tools/find_deprecated.py
  • ./nltk/nltk/tokenize/punkt.py

... and even though deprecation warnings in the tools have little impact, it is a bit ironic that the find_deprecated.py script itself uses deprecated syntax :)

$ git clone https://github.com/nltk/nltk.git
$ find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/tools/nltk_term_index.py:51: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./nltk/tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./nltk/tools/nltk_term_index.py:55: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./nltk/tools/find_deprecated.py:42: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./nltk/tools/find_deprecated.py:44: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./nltk/tools/find_deprecated.py:46: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./nltk/tools/find_deprecated.py:63: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./nltk/tools/find_deprecated.py:66: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
./nltk/nltk/tokenize/punkt.py:223: DeprecationWarning: invalid escape sequence \]
  return "(?:[)\";}\]\*:@\'\({\[%s])" % re.escape("".join(set(self.sent_end_chars) - {"."}))
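
The unit test idea mentioned above could be sketched roughly as follows. This is only an assumption about how such a test might look, not something taken from the nltk test suite: `invalid_escape_warnings` and `find_offenders` are hypothetical helper names, and the check relies on the compiler's warning message containing the text "invalid escape sequence".

```python
import pathlib
import warnings


def invalid_escape_warnings(source, filename="<string>"):
    """Compile source text and return any invalid-escape warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no dedup
        compile(source, filename, "exec")
    return [
        str(w.message)
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


def find_offenders(root):
    """Hypothetical helper: map each offending .py file under root to its warnings."""
    offenders = {}
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        found = invalid_escape_warnings(path.read_text(encoding="utf-8"), str(path))
        if found:
            offenders[str(path)] = found
    return offenders


# Source text with and without the problem (backslashes doubled here so
# that this example file itself stays warning-free).
bad = 'import re\nTOKEN_RE = re.compile("[\\w\\.]+")\n'
good = 'import re\nTOKEN_RE = re.compile(r"[\\w\\.]+")\n'

assert invalid_escape_warnings(bad)
assert not invalid_escape_warnings(good)
```

A test in the suite could then assert that `find_offenders("nltk")` returns an empty dict, so new regressions fail CI instead of showing up as warnings.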