Recent versions of Python are stricter about invalid escape sequences in string literals, which commonly shows up in regex patterns. For instance, with 3.6.8 there are 10+ warnings like this one:
...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile('(-?\d+):(-?\d+)')
These regexes should be updated to silence the warnings.
If no one is working on this, I would like to. Can you tell me the steps to reproduce the issue, please?
@PabloDino Install Python 3.6.8 or later and try to import every module. Then fix each regex, either by using raw strings or by properly escaping the backslashes, so that it works on both Python 2 and 3.
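As a concrete illustration, here is the `featstruct.py` pattern from the warning above fixed both ways; the two forms compile to the same pattern and work identically on Python 2 and 3:

```python
import re

# Before (emits DeprecationWarning on Python 3.6+):
#   RANGE_RE = re.compile('(-?\d+):(-?\d+)')

# Option 1: a raw string, so every backslash reaches the regex engine intact.
RANGE_RE = re.compile(r"(-?\d+):(-?\d+)")

# Option 2: double the backslashes inside a normal string.
RANGE_RE_ESCAPED = re.compile("(-?\\d+):(-?\\d+)")

# Both describe the same pattern and match the same text.
assert RANGE_RE.pattern == RANGE_RE_ESCAPED.pattern
assert RANGE_RE.match("3:-7").groups() == ("3", "-7")
```

Raw strings are usually the cleaner choice for patterns with several backslashes.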
I'm on it. I've been working through some exercises but am not seeing any warnings. Can you post a code snippet in which the warnings manifest, please?
@PabloDino :
$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only
========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
name, n = re.sub("\d+$", "", var.name), 2
nltk/nltk/featstruct.py:2091
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile("(-?\d+):(-?\d+)")
nltk/nltk/sem/evaluate.py:307
/home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/sem/relextract.py:128
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
ENT = re.compile("&(\w+?);")
nltk/nltk/sem/relextract.py:407
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
"""
nltk/nltk/sem/boxer.py:776
/home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
assert re.match("^[exps]\d+$", var), var
nltk/nltk/sem/drt.py:716
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \
+ [" \ " + blank + line for line in term_lines[1:2]]
nltk/nltk/sem/drt.py:717
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \
+ [" /\ " + var_string + line for line in term_lines[2:3]]
nltk/nltk/grammar.py:1291
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/grammar.py:1463
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
_STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
nltk/nltk/text.py:650
/home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
_CONTEXT_RE = re.compile("\w+|[\.\!\?]")
nltk/nltk/tokenize/punkt.py:1462
/home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
pat = "\s*".join(re.escape(c) for c in tok)
nltk/nltk/tokenize/regexp.py:100
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/regexp.py:193
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/repp.py:133
/home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
nltk/nltk/tokenize/texttiling.py:96
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
nltk/nltk/tokenize/texttiling.py:229
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
matches = re.finditer("\w+", text)
nltk/nltk/tokenize/toktok.py:53
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
nltk/nltk/tokenize/toktok.py:55
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
nltk/nltk/tokenize/toktok.py:62
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
PIPE = re.compile("\|"), " | "
nltk/nltk/tokenize/treebank.py:269
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
"""
nltk/nltk/tokenize/treebank.py:273
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tokenize/treebank.py:277
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tree.py:99
/home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/tree.py:652
/home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
if re.search("\s", brackets):
nltk/nltk/tree.py:658
/home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:660
/home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:662
/home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
"%s\s*(%s)?|%s|(%s)"
nltk/nltk/tree.py:900
/home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
reserved_chars = re.compile("([#\$%&~_\{\}])")
nltk/nltk/parse/chart.py:1034
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1073
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1128
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1148
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1218
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1241
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:270
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:369
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/tag/sequential.py:730
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
elif re.match("\w+$", word):
nltk/nltk/tag/sequential.py:724
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
elif re.match("\W+$", word):
nltk/nltk/tag/sequential.py:722
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
nltk/nltk/classify/rte_classify.py:61
/home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
nltk/nltk/classify/maxent.py:1351
/home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/chunk/util.py:371
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
_LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
nltk/nltk/chunk/util.py:517
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
_IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
nltk/nltk/chunk/util.py:526
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
nltk/nltk/chunk/regexp.py:70
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
_BRACKETS = re.compile("[^\{\}]+")
nltk/nltk/chunk/regexp.py:215
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
s = re.sub("\{\}", "", s)
nltk/nltk/chunk/regexp.py:426
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
nltk/nltk/chunk/regexp.py:471
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
nltk/nltk/chunk/regexp.py:510
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
nltk/nltk/chunk/regexp.py:511
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
nltk/nltk/chunk/regexp.py:575
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
nltk/nltk/chunk/regexp.py:708
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
"(?P<left>%s)\{(?P<right>%s)"
nltk/nltk/chunk/regexp.py:714
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
nltk/nltk/chunk/regexp.py:778
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
"(?P<left>%s)\}(?P<right>%s)"
nltk/nltk/chunk/regexp.py:784
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
nltk/nltk/chunk/regexp.py:896
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
nltk/nltk/chunk/regexp.py:1175
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/inference/discourse.py:44
/home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/stem/lancaster.py:192
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
nltk/nltk/stem/lancaster.py:225
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
nltk/nltk/stem/porter.py:177
/home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
"""
nltk/nltk/corpus/__init__.py:116
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:123
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:126
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
nltk/nltk/corpus/__init__.py:128
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
"dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
nltk/nltk/corpus/__init__.py:311
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
"timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
nltk/nltk/corpus/__init__.py:335
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
nltk/nltk/corpus/__init__.py:364
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
nltk/nltk/corpus/__init__.py:374
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:383
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:392
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:401
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/reader/plaintext.py:62
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/corpus/reader/util.py:635
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
if re.match("^\d+-\d+", line) is not None:
nltk/nltk/corpus/reader/util.py:859
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
if re.match("======+\s*$", line):
nltk/nltk/corpus/reader/api.py:77
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
m = re.match("(.*\.zip)/?(.*)$|", root)
nltk/nltk/corpus/reader/timit.py:165
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
encoding = [(".*\.wav", None), (".*", encoding)]
nltk/nltk/corpus/reader/bracket_parse.py:214
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
"alpino\.xml",
nltk/nltk/corpus/reader/xmldocs.py:232
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
_XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
nltk/nltk/toolbox.py:209
/home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
"""
nltk/nltk/corpus/reader/bnc.py:29
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/switchboard.py:113
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
_UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
nltk/nltk/corpus/reader/childes.py:281
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
nltk/nltk/corpus/reader/framenet.py:2753
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/udhr.py:30
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
nltk/nltk/corpus/reader/twitter.py:54
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/ccg/combinator.py:225
/home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
"""
nltk/nltk/treetransforms.py:108
/home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \
"""
And FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
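A quick sketch of what that paragraph means in practice: the backslash of an unrecognized escape survives in the resulting string, and compiling such a literal triggers the warning (a DeprecationWarning on 3.6-3.11; note that later Python versions upgraded it to SyntaxWarning):

```python
import warnings

# An unrecognized escape keeps its backslash, so a normal string
# containing backslash-d equals the raw string r"\d".
assert "\\d" == r"\d"

# Compiling a literal that contains the invalid escape \d
# emits the warning at compile time.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile('"\\d"', "<demo>", "eval")  # source text is: "\d"

assert any(
    issubclass(w.category, (DeprecationWarning, SyntaxWarning))
    for w in caught
)
```

This is why the warnings appear at import/collection time rather than when the regexes are actually used.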
$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from /pytest.py
$ pytest -vvs nltk/ --collect-only
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- */python3
cachedir: .pytest_cache
rootdir: **/nltk
collected 381 items
Unit tests for nltk.compat.
See also nltk/test/compat.doctest.
Unit tests for nltk.metrics.aline
Test Aline algorithm for aligning phonetic sequences
Test aline for computing the difference between two segments
Tests for Brill tagger.
Test for bug https://github.com/nltk/nltk/issues/1597
Ensures that curly bracket quantifiers can be used inside a chunk rule.
This type of quantifier has been used for the supplementary example
in http://www.nltk.org/book/ch07.html#exploring-text-corpora.
Unit tests for nltk.classify. See also: nltk/test/classify.doctest
Text constructed using: http://www.nltk.org/book/ch01.html
Mock test for Stanford CoreNLP wrappers.
Corpus View Regression Tests
Class containing unit tests for nltk.metrics.agreement.Disagreement.
More advanced test, based on
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf
Same more advanced example, but with 1 rating removed.
Again, removal of that 1 rating should not matter.
Simple test, based on
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.
Same simple test with 1 rating removed.
Removal of that rating should not matter: K-Apha ignores items with
only 1 rating.
Regression tests for json2csv()
and json2csv_entities()
in Twitter
package.
Sanity check that file comparison is not giving false positives.
Unit tests for nltk.corpus.nombank
Tests for nltk.pos_tag
The following test performs a random series of reads, seeks, and
tells, and checks that the results are consistent.
Unit tests for Senna
Unittest for nltk.classify.senna
Senna pipeline interface
Unittest for nltk.tag.senna
this unit testing for test the snowball arabic light stemmer
this stemmer deals with prefixes and suffixes
Test for bug https://github.com/nltk/nltk/issues/1581
Ensures that 'oed' can be stemmed without throwing an error.
<TestCaseFunction test_vocabulary_martin_mode>
Tests all words from the test vocabulary provided by M Porter
The sample vocabulary and output were sourced from:
http://tartarus.org/martin/PorterStemmer/voc.txt
http://tartarus.org/martin/PorterStemmer/output.txt
and are linked to from the Porter Stemmer algorithm's homepage
at
http://tartarus.org/martin/PorterStemmer/
<TestCaseFunction test_vocabulary_nltk_mode>
<TestCaseFunction test_vocabulary_original_mode>
Unit tests for nltk.tgrep.
Class containing unit tests for nltk.tgrep.
Test error handling of undefined tgrep operators.
Test that comments are correctly filtered out of tgrep search
strings.
Test the Basic Examples from the TGrep2 manual.
Test labeled nodes.
Test case from Emily M. Bender.
<TestCaseFunction test_multiple_conjs>
Test that multiple (3 or more) conjunctions of node relations are
handled properly.
<TestCaseFunction test_node_encoding>
Test that tgrep search strings handles bytes and strs the same
way.
<TestCaseFunction test_node_nocase>
Test selecting nodes using case insensitive node names.
<TestCaseFunction test_node_noleaves>
Test node name matching with the search_leaves flag set to False.
<TestCaseFunction test_node_printing>
Test that the tgrep print operator ' is properly ignored.
<TestCaseFunction test_node_quoted>
Test selecting nodes using quoted node names.
<TestCaseFunction test_node_regex>
Test regex matching on nodes.
<TestCaseFunction test_node_regex_2>
Test regex matching on nodes.
<TestCaseFunction test_node_simple>
Test a simple use of tgrep for finding nodes matching a given
pattern.
<TestCaseFunction test_node_tree_position>
Test matching on nodes based on NLTK tree position.
<TestCaseFunction test_rel_precedence>
Test matching nodes based on precedence relations.
<TestCaseFunction test_rel_sister_nodes>
Test matching sister nodes in a tree.
<TestCaseFunction test_tokenize_encoding>
Test that tokenization handles bytes and strs the same way.
<TestCaseFunction test_tokenize_examples>
Test tokenization of the TGrep2 manual example patterns.
<TestCaseFunction test_tokenize_link_types>
Test tokenization of basic link types.
<TestCaseFunction test_tokenize_macros>
Test tokenization of macro definitions.
<TestCaseFunction test_tokenize_node_labels>
Test tokenization of labeled nodes.
<TestCaseFunction test_tokenize_nodenames>
Test tokenization of node names.
<TestCaseFunction test_tokenize_quoting>
Test tokenization of quoting.
<TestCaseFunction test_tokenize_segmented_patterns>
Test tokenization of segmented patterns.
<TestCaseFunction test_tokenize_simple>
Simple test of tokenization.
<TestCaseFunction test_trailing_semicolon>
Test that semicolons at the end of a tgrep2 search string won't
cause a parse failure.
<TestCaseFunction test_use_macros>
Test defining and using tgrep2 macros.
<TestCaseFunction tests_rel_dominance>
Test matching nodes based on dominance relations.
<TestCaseFunction tests_rel_indexed_children>
Test matching nodes based on their index in their parent node.
Unit tests for nltk.tokenize.
See also nltk/test/tokenize.doctest
Test padding of asterisk for word tokenization.
Test padding of dotdot* for word tokenization.
Test a string that resembles a phone number but contains a newline
Test remove_handle() from casual.py with specially crafted edge cases
Test SyllableTokenizer tokenizer.
Test the Stanford Word Segmenter for Arabic (default config)
Test the Stanford Word Segmenter for Chinese (default config)
Test TreebankWordTokenizer.span_tokenize function
Test TweetTokenizer using words with special and accented characters.
Test word_tokenize function
Tests for static parts of Twitter package
Tests that Twitter credentials information from file is handled correctly.
Default credentials file is identified
Default credentials file has been read correctly
Path to default credentials file is well-formed, given specified
subdir.
Setting subdir to empty path should raise an error.
Setting subdir to None
should raise an error.
Test that environment variable has been read correctly.
Credentials file 'bad_oauth1-1.txt' is incomplete
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
Setting subdir to nonexistent directory should raise an error.
Defaults for authentication will fail since 'credentials.txt' not
present in default subdir, as read from os.environ['TWITTER']
.
Credentials file 'foobar' cannot be found in default subdir.
Unit tests for nltk.corpus.wordnet
See also nltk/test/wordnet.doctest
Tests for NgramCounter that only involve lookup, no modification.
Unit tests for MLE ngram model.
MLE trigram model tests
Unit tests for Lidstone class
Unit tests for Laplace class
Using MLE model, generate some text.
tests Vocabulary Class
Tests for BLEU translation evaluation metric
Examples from the original BLEU paper
http://www.aclweb.org/anthology/P02-1040.pdf
Tests GDFA alignments
Testing GDFA with first 10 eflomal outputs from issue #1829
https://github.com/nltk/nltk/issues/1829
Tests for IBM Model 1 training methods
Tests for IBM Model 2 training methods
Tests for IBM Model 3 training methods
Tests for IBM Model 4 training methods