nltk 🚀 - Update various regex escape sequences

If nobody is working on this, I would like to. Could you describe the steps to reproduce the problem?

PabloDino on 2019-08-31

👍2 ❤1

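@PabloDino install Python 3.6.8 or later and try importing each module. Fix the regular expressions by using raw strings or proper escaping, so that they are valid on both Python 2 and 3.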
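A minimal sketch of the two fixes suggested above, using the `RANGE_RE` pattern from nltk/featstruct.py as it appears in the warning list further down. The variable names are illustrative only and are not NLTK's.

```python
import re

# Fix 1: a raw string, so the backslash reaches the regex engine untouched.
range_re_raw = re.compile(r"(-?\d+):(-?\d+)")

# Fix 2: an ordinary string with every backslash doubled.
range_re_escaped = re.compile("(-?\\d+):(-?\\d+)")

# Both spellings produce exactly the same pattern text ...
assert range_re_raw.pattern == range_re_escaped.pattern

# ... and therefore the same match on a sample input.
match = range_re_raw.match("-3:12")
print(match.groups())  # ('-3', '12')
```

Either form is accepted by Python 2 and Python 3; the raw string is usually easier to read for regular expressions.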

pombredanne on 2019-08-31

I am looking into this. I ran a few exercises but did not see any warnings. Could you post the code snippet that produces the list of warnings?

PabloDino on 2019-09-03

@PabloDino:

$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only

========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
    name, n = re.sub("\d+$", "", var.name), 2

nltk/nltk/featstruct.py:2091
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile("(-?\d+):(-?\d+)")

nltk/nltk/sem/evaluate.py:307
  /home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/sem/relextract.py:128
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
    ENT = re.compile("&(\w+?);")

nltk/nltk/sem/relextract.py:407
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
    """

nltk/nltk/sem/boxer.py:776
  /home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
    assert re.match("^[exps]\d+$", var), var

nltk/nltk/sem/drt.py:716
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
    + [" \  " + blank + line for line in term_lines[1:2]]

nltk/nltk/sem/drt.py:717
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
    + [" /\ " + var_string + line for line in term_lines[2:3]]

nltk/nltk/grammar.py:1291
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/grammar.py:1463
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
    _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)

nltk/nltk/text.py:650
  /home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
    _CONTEXT_RE = re.compile("\w+|[\.\!\?]")

nltk/nltk/tokenize/punkt.py:1462
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
    pat = "\s*".join(re.escape(c) for c in tok)

nltk/nltk/tokenize/regexp.py:100
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/regexp.py:193
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/repp.py:133
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
    line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)

nltk/nltk/tokenize/texttiling.py:96
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
    c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)

nltk/nltk/tokenize/texttiling.py:229
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
    matches = re.finditer("\w+", text)

nltk/nltk/tokenize/toktok.py:53
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
    FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "

nltk/nltk/tokenize/toktok.py:55
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
    FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "

nltk/nltk/tokenize/toktok.py:62
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
    PIPE = re.compile("\|"), " &#124; "

nltk/nltk/tokenize/treebank.py:269
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
    """

nltk/nltk/tokenize/treebank.py:273
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tokenize/treebank.py:277
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tree.py:99
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/tree.py:652
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
    if re.search("\s", brackets):

nltk/nltk/tree.py:658
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
    node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:660
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
    leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:662
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
    "%s\s*(%s)?|%s|(%s)"

nltk/nltk/tree.py:900
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
    reserved_chars = re.compile("([#\$%&~_\{\}])")

nltk/nltk/parse/chart.py:1034
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1073
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1128
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1148
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1218
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1241
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:270
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:369
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/tag/sequential.py:730
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
    elif re.match("\w+$", word):

nltk/nltk/tag/sequential.py:724
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
    elif re.match("\W+$", word):

nltk/nltk/tag/sequential.py:722
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
    if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):

nltk/nltk/classify/rte_classify.py:61
  /home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
    tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")

nltk/nltk/classify/maxent.py:1351
  /home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/chunk/util.py:371
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
    _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")

nltk/nltk/chunk/util.py:517
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
    _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

nltk/nltk/chunk/util.py:526
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
    for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):

nltk/nltk/chunk/regexp.py:70
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
    _BRACKETS = re.compile("[^\{\}]+")

nltk/nltk/chunk/regexp.py:215
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
    s = re.sub("\{\}", "", s)

nltk/nltk/chunk/regexp.py:426
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)

nltk/nltk/chunk/regexp.py:471
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)

nltk/nltk/chunk/regexp.py:510
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
    regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))

nltk/nltk/chunk/regexp.py:511
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)

nltk/nltk/chunk/regexp.py:575
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)

nltk/nltk/chunk/regexp.py:708
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
    "(?P<left>%s)\{(?P<right>%s)"

nltk/nltk/chunk/regexp.py:714
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)

nltk/nltk/chunk/regexp.py:778
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
    "(?P<left>%s)\}(?P<right>%s)"

nltk/nltk/chunk/regexp.py:784
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)

nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
    r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")

nltk/nltk/chunk/regexp.py:1175
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/inference/discourse.py:44
  /home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/stem/lancaster.py:192
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")

nltk/nltk/stem/lancaster.py:225
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")

nltk/nltk/stem/porter.py:177
  /home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
    """

nltk/nltk/corpus/__init__.py:116
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:123
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:126
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
    crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")

nltk/nltk/corpus/__init__.py:128
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
    "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"

nltk/nltk/corpus/__init__.py:311
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
    "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"

nltk/nltk/corpus/__init__.py:335
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
    twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")

nltk/nltk/corpus/__init__.py:364
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
    wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")

nltk/nltk/corpus/__init__.py:374
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:383
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:392
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:401
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/reader/plaintext.py:62
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/corpus/reader/util.py:635
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
    if re.match("^\d+-\d+", line) is not None:

nltk/nltk/corpus/reader/util.py:859
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
    if re.match("======+\s*$", line):

nltk/nltk/corpus/reader/api.py:77
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
    m = re.match("(.*\.zip)/?(.*)$|", root)

nltk/nltk/corpus/reader/timit.py:165
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
    encoding = [(".*\.wav", None), (".*", encoding)]

nltk/nltk/corpus/reader/bracket_parse.py:214
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
    "alpino\.xml",

nltk/nltk/corpus/reader/xmldocs.py:232
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
    _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")

nltk/nltk/toolbox.py:209
  /home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
    """

nltk/nltk/corpus/reader/bnc.py:29
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/switchboard.py:113
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
    _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")

nltk/nltk/corpus/reader/childes.py:281
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
    m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)

nltk/nltk/corpus/reader/framenet.py:2753
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/udhr.py:30
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
    ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),

nltk/nltk/corpus/reader/twitter.py:54
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/ccg/combinator.py:225
  /home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
    """

nltk/nltk/treetransforms.py:108
  /home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \ 
    """
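Many entries in the list above point at a `"""` line (a docstring) or at a `\g<...>` replacement template rather than at a pattern. The sketch below shows one possible fix for each case. `tokenize_words` and the tag pattern are hypothetical stand-ins, not code from NLTK.

```python
import re


def tokenize_words(text):
    r"""Return the substrings of ``text`` that match ``\w+``.

    The ``r`` prefix on this docstring keeps ``\w`` as a literal
    backslash followed by ``w``, so the compiler has nothing to warn about.
    """
    return re.findall(r"\w+", text)


# Replacement templates need the same care: ``\g<chunk>`` is interpreted by
# the ``re`` module, not by the string literal, so it belongs in a raw string.
chunk_re = re.compile(r"(?P<chunk>(<NN>)+)")  # stand-in tag pattern
marked = chunk_re.sub(r"{\g<chunk>}", "<DT><NN><NN><VB>")

print(tokenize_words("Good muffins"))  # ['Good', 'muffins']
print(marked)                          # <DT>{<NN><NN>}<VB>
```

A raw prefix also turns recognized escapes such as `\n` and `\t` into two literal characters, so strings that mix real escapes with stray backslashes are safer to fix by doubling only the offending backslash.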

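pombredanne on 2019-09-07

And FWIW: https :

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.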

Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In a future Python version they will be a SyntaxError.
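The behaviour quoted above can be observed without touching any file by compiling a snippet and recording the warnings. This is a small sketch that assumes Python 3.6 or later; `escape_warnings` is a hypothetical helper name.

```python
import warnings


def escape_warnings(source):
    """Compile ``source`` and return the messages of invalid-escape warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, "<snippet>", "exec")
    # Match on the message text rather than on the warning category,
    # because the category has differed between Python versions.
    return [
        str(w.message)
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


bad = 'pattern = "\\d+$"\n'    # the compiled source contains "\d+$"
good = 'pattern = r"\\d+$"\n'  # the compiled source contains r"\d+$"

print(bool(escape_warnings(bad)))   # True
print(bool(escape_warnings(good)))  # False
```

The warning is raised when the source is compiled, not when the regular expression is used, which is why compiling is enough to trigger it.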

pombredanne on 2019-09-07

$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from /pytest.py
$ pytest -vvs nltk/ --collect-only
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- */python3
cachedir: .pytest_cache
rootdir: **/nltk
collected 381 items

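Unit tests for nltk.compat.
See also nltk/test/compat.doctest.

Unit tests for nltk.metrics.aline

Test Aline algorithm for aligning phonetic sequences

Test aline for computing the difference between two segments

Tests for Brill tagger.

Test for bug https://github.com/nltk/nltk/issues/1597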

    Ensures that curly bracket quantifiers can be used inside a chunk rule.
    This type of quantifier has been used for the supplementary example
    in http://www.nltk.org/book/ch07.html#exploring-text-corpora.

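Unit tests for nltk.classify. See also: nltk/test/classify.doctest

Text constructed using: http :

Mock test for Stanford CoreNLP wrappers.

Corpus view regression tests

Class containing unit tests for nltk.metrics.agreement.Disagreement.

More advanced test, based on
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf

Same more advanced example, but with 1 rating removed.
Again, removal of that 1 rating should not matter.

Simple test, based on
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.

Same simple test with 1 rating removed.
Removal of that rating should not matter: K-Apha ignores items with
only 1 rating.

Regression tests for json2csv() and json2csv_entities() in Twitter
package.

Sanity check that file comparison is not giving false positives.

Unit tests for nltk.corpus.nombank

Tests for nltk.pos_tag

The following test performs a random series of reads, seeks, and
tells, and checks that the results are consistent.

Unit tests for Senna

Unit tests for nltk.classify.senna

Senna pipeline interface

Unit tests for nltk.tag.senna

This unit test checks the Snowball Arabic light stemmer;
this stemmer deals with prefixes and suffixes

Test for bug https://github.com/nltk/nltk/issues/1581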

    Ensures that 'oed' can be stemmed without throwing an error.
  <TestCaseFunction test_vocabulary_martin_mode>
    Tests all words from the test vocabulary provided by M Porter

    The sample vocabulary and output were sourced from:
    http://tartarus.org/martin/PorterStemmer/voc.txt
    http://tartarus.org/martin/PorterStemmer/output.txt
    and are linked to from the Porter Stemmer algorithm's homepage
    at
    http://tartarus.org/martin/PorterStemmer/
  <TestCaseFunction test_vocabulary_nltk_mode>
  <TestCaseFunction test_vocabulary_original_mode>

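Unit tests for nltk.tgrep.

Class containing unit tests for nltk.tgrep.

Test error handling of undefined tgrep operators.

Test that comments are correctly filtered out of tgrep search
strings.

Test basic examples from the TGrep2 manual.

Test labeled nodes.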

    Test case from Emily M. Bender.
  <TestCaseFunction test_multiple_conjs>
    Test that multiple (3 or more) conjunctions of node relations are
    handled properly.
  <TestCaseFunction test_node_encoding>
    Test that tgrep search strings handles bytes and strs the same
    way.
  <TestCaseFunction test_node_nocase>
    Test selecting nodes using case insensitive node names.
  <TestCaseFunction test_node_noleaves>
    Test node name matching with the search_leaves flag set to False.
  <TestCaseFunction test_node_printing>
    Test that the tgrep print operator ' is properly ignored.
  <TestCaseFunction test_node_quoted>
    Test selecting nodes using quoted node names.
  <TestCaseFunction test_node_regex>
    Test regex matching on nodes.
  <TestCaseFunction test_node_regex_2>
    Test regex matching on nodes.
  <TestCaseFunction test_node_simple>
    Test a simple use of tgrep for finding nodes matching a given
    pattern.
  <TestCaseFunction test_node_tree_position>
    Test matching on nodes based on NLTK tree position.
  <TestCaseFunction test_rel_precedence>
    Test matching nodes based on precedence relations.
  <TestCaseFunction test_rel_sister_nodes>
    Test matching sister nodes in a tree.
  <TestCaseFunction test_tokenize_encoding>
    Test that tokenization handles bytes and strs the same way.
  <TestCaseFunction test_tokenize_examples>
    Test tokenization of the TGrep2 manual example patterns.
  <TestCaseFunction test_tokenize_link_types>
    Test tokenization of basic link types.
  <TestCaseFunction test_tokenize_macros>
    Test tokenization of macro definitions.
  <TestCaseFunction test_tokenize_node_labels>
    Test tokenization of labeled nodes.
  <TestCaseFunction test_tokenize_nodenames>
    Test tokenization of node names.
  <TestCaseFunction test_tokenize_quoting>
    Test tokenization of quoting.
  <TestCaseFunction test_tokenize_segmented_patterns>
    Test tokenization of segmented patterns.
  <TestCaseFunction test_tokenize_simple>
    Simple test of tokenization.
  <TestCaseFunction test_trailing_semicolon>
    Test that semicolons at the end of a tgrep2 search string won't
    cause a parse failure.
  <TestCaseFunction test_use_macros>
    Test defining and using tgrep2 macros.
  <TestCaseFunction tests_rel_dominance>
    Test matching nodes based on dominance relations.
  <TestCaseFunction tests_rel_indexed_children>
    Test matching nodes based on their index in their parent node.

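Unit tests for nltk.tokenize.
See also nltk/test/tokenize.doctest

Test padding of asterisk for word tokenization.

Test padding of dotdot* for word tokenization.

Test a string that resembles a phone number but contains a newline

Test remove_handle() from casual.py with specially crafted edge cases

Test SyllableTokenizer tokenizer.

Test the Stanford Word Segmenter for Arabic (default config)

Test the Stanford Word Segmenter for Chinese (default config)

Test TreebankWordTokenizer.span_tokenize function

Test TweetTokenizer using words with special and accented characters.

Test word_tokenize function

Tests for static parts of Twitter package

Tests that Twitter credentials information from file is handled correctly.

Default credentials file is identified

Default credentials file has been read correctly

Path to default credentials file is well-formed, given specified
subdir

Setting subdir to empty path should raise an error.

Setting subdir to None should raise an error.

Test that environment variable has been read correctly.

Credentials file 'bad_oauth1-1.txt' is incomplete

First key in credentials file 'bad_oauth1-2.txt' is ill-formed

First key in credentials file 'bad_oauth1-2.txt' is ill-formed

Setting subdir to nonexistent directory should raise an error.

Defaults for authentication will fail since 'credentials.txt' will not
be present in the default subdir, as read from os.environ['TWITTER'].

Credentials file 'foobar' cannot be found in default subdir.

Unit tests for nltk.corpus.wordnet
See also nltk/test/wordnet.doctest

Tests for NgramCounter that only involve lookup, no modification.

Unit tests for MLE ngram model.

MLE trigram model tests

Unit tests for Lidstone class

Unit tests for Laplace class

Using MLE model, generate some text.

Tests Vocabulary Class

Tests for BLEU translation evaluation metric

Examples from the original BLEU paper
http://www.aclweb.org/anthology/P02-1040.pdf

Testing GDFA align

Testing GDFA with the first 10 result outputs from issue #1829
https://github.com/nltk/nltk/issues/1829

Tests for IBM Model 1 training methods

Tests for IBM Model 2 training methods

Tests for IBM Model 3 training methods

Tests for IBM Model 4 training methods

Tests for IBM Model 5 training methods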

========================================================================= =============
$

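PabloDino on 2019-09-09

I see the same output as @pombredanne.

stevenbird on 2019-09-12

Hi @PabloDino, are you still planning to work on this issue?

I have been able to reproduce @pombredanne's output and would like to work on fixing this.

ab-10 on 2019-09-30

Go ahead, I have not been able to reproduce it.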

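On Mon, Sep 30, 2019 at 11:40 AM Armin Stepanjan [email protected]
wrote:

Hi @PabloDino https://github.com/PabloDino, are you still planning to work
on this issue?
I have been able to reproduce the output of @pombredanne
https://github.com/pombredanne and would like to work on
fixing this.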

PabloDino on 2019-10-12

@ab-10 were you able to fix these deprecation warnings?

gertjanwytynck on 2020-01-16

Updated list for Python 3.8, generated by running the following command:

find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
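For environments without `find` and `xargs`, a rough Python equivalent of the command above might look like the sketch below. It is an illustration only: `find_invalid_escapes` is a hypothetical helper, it compiles files in memory instead of calling `py_compile`, and it was not used to produce the list that follows, which comes from the shell command.

```python
import os
import warnings


def find_invalid_escapes(root):
    """Yield (path, lineno, message) for each invalid-escape warning under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read bytes so that compile() honours any encoding declaration.
            with open(path, "rb") as handle:
                source = handle.read()
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                try:
                    compile(source, path, "exec")
                except (SyntaxError, ValueError):
                    continue  # skip files this interpreter cannot parse
            for w in caught:
                if "invalid escape sequence" in str(w.message):
                    yield path, w.lineno, str(w.message)
```

Iterating over `find_invalid_escapes(".")` from the repository root and printing each tuple gives one line per offending literal, in a shape similar to the compiler output.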

./nltk/chat/iesha.py:52: DeprecationWarning: invalid escape sequence \<
  "u think I can%2??! really?? kekeke \<_\<",
./nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word):
./nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word):
./nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
./nltk/app/chunkparser_app.py:206: DeprecationWarning: invalid escape sequence \#
  "\t<regexp><\#><CD> # This is a comment...</regexp>\n"
./nltk/app/chunkparser_app.py:315: DeprecationWarning: invalid escape sequence \s
  grammar = re.sub("\n\s+", "\n", grammar)
./nltk/app/chunkparser_app.py:1061: DeprecationWarning: invalid escape sequence \w
  key=lambda t_w: re.match("\w+", t_w[0])
./nltk/app/chunkparser_app.py:1422: DeprecationWarning: invalid escape sequence \#
  "^\# Regexp Chunk Parsing Grammar[\s\S]*" "F-score:.*\n", "", grammar
./nltk/sem/cooper_storage.py:48: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
  ENT = re.compile("&(\w+?);")
./nltk/sem/relextract.py:382: DeprecationWarning: invalid escape sequence \s
  roles = """
./nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
  assert re.match("^[exps]\d+$", var), var
./nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
  + [" \  " + blank + line for line in term_lines[1:2]]
./nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
  + [" /\ " + var_string + line for line in term_lines[2:3]]
./nltk/sem/chat80.py:9: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/chat80.py:705: DeprecationWarning: invalid escape sequence \P
  template = "PropN[num=sg, sem=<\P.(P %s)>] -> '%s'\n"
./nltk/sem/evaluate.py:257: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
  if re.match("^\d+-\d+", line) is not None:
./nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
  if re.match("======+\s*$", line):
./nltk/corpus/reader/framenet.py:2748: DeprecationWarning: invalid escape sequence \w
  """
./nltk/corpus/reader/bracket_parse.py:215: DeprecationWarning: invalid escape sequence \.
  "alpino\.xml",
./nltk/corpus/reader/twitter.py:25: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
  _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
./nltk/corpus/reader/bnc.py:15: DeprecationWarning: invalid escape sequence \w
  """Corpus reader for the XML version of the British National Corpus.
./nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
  ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
./nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
  encoding = [(".*\.wav", None), (".*", encoding)]
./nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
  m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
./nltk/corpus/reader/plaintext.py:47: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
  _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
./nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
  m = re.match("(.*\.zip)/?(.*)$|", root)
./nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
  crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
./nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
  "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
./nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
  "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
./nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
  twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
./nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
  wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
./nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
  _CONTEXT_RE = re.compile("\w+|[\.\!\?]")
./nltk/inference/discourse.py:9: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:38: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
  if re.search("\s", brackets):
./nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
  node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
  leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
  "%s\s*(%s)?|%s|(%s)"
./nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
  reserved_chars = re.compile("([#\$%&~_\{\}])")
./nltk/ccg/combinator.py:220: DeprecationWarning: invalid escape sequence \Y
  """
./nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
  FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
./nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
  FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
./nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
  PIPE = re.compile("\|"), " &#124; "
./nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
  pat = "\s*".join(re.escape(c) for c in tok)
./nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
  line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
./nltk/tokenize/nist.py:81: DeprecationWarning: invalid escape sequence \{
  PUNCT = re.compile("([\{-\~\[-\` -\&\(-\+\:-\@\/])"), " \\1 "
./nltk/tokenize/nist.py:83: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_PRECEED = re.compile("([^0-9])([\.,])"), "\\1 \\2 "
./nltk/tokenize/nist.py:85: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_FOLLOW = re.compile("([\.,])([^0-9])"), " \\1 \\2"
./nltk/tokenize/treebank.py:194: DeprecationWarning: invalid escape sequence \]
  """
./nltk/tokenize/treebank.py:255: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/treebank.py:259: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
  c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
./nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
  matches = re.finditer("\w+", text)
./nltk/tokenize/regexp.py:76: DeprecationWarning: invalid escape sequence \w
  """
./nltk/tokenize/regexp.py:184: DeprecationWarning: invalid escape sequence \w
  """
./nltk/classify/maxent.py:1292: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
  tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
./nltk/parse/chart.py:1024: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1057: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1123: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1140: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1213: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1232: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:251: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:353: DeprecationWarning: invalid escape sequence \*
  """
./nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
  _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
./nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
  _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
./nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
  for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
./nltk/chunk/named_entity.py:178: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:176: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:174: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:250: DeprecationWarning: invalid escape sequence \s
  text = re.sub("[\s\S]*<TEXT>", subfunc, text)
./nltk/chunk/named_entity.py:251: DeprecationWarning: invalid escape sequence \s
  text = re.sub("</TEXT>[\s\S]*", "", text)
./nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
  _BRACKETS = re.compile("[^\{\}]+")
./nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
  s = re.sub("\{\}", "", s)
./nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
./nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
./nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
  regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
./nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
./nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
./nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
  "(?P<left>%s)\{(?P<right>%s)"
./nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
./nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
  "(?P<left>%s)\}(?P<right>%s)"
./nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:1136: DeprecationWarning: invalid escape sequence \.
  """
./nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
  name, n = re.sub("\d+$", "", var.name), 2
./nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
  RANGE_RE = re.compile("(-?\d+):(-?\d+)")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:171: DeprecationWarning: invalid escape sequence \s
  + "))\s*"
./nltk/toolbox.py:159: DeprecationWarning: invalid escape sequence \_
  """
./nltk/grammar.py:1278: DeprecationWarning: invalid escape sequence \*
  """
./nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
  _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
./nltk/stem/porter.py:145: DeprecationWarning: invalid escape sequence \m
  """Returns the 'measure' of stem, per definition in the paper
./nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
./nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
./nltk/treetransforms.py:8: DeprecationWarning: invalid escape sequence \ 
  """
./tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./tools/nltk_term_index.py:53: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./tools/nltk_term_index.py:56: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./tools/find_deprecated.py:43: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./tools/find_deprecated.py:45: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./tools/find_deprecated.py:47: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./tools/find_deprecated.py:64: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./tools/find_deprecated.py:67: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
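
As a rough illustration of the fix this issue asks for (not the actual NLTK patch), the sketch below compiles two one-line snippets and reports whether the compiler flags an invalid escape sequence. The helper name `escape_warnings` and the two snippets are hypothetical; the pattern is borrowed from the `featstruct.py:1295` warning above.

```python
import re
import warnings


def escape_warnings(source):
    """Compile a snippet and return the invalid-escape warnings it triggers.

    Hypothetical helper for illustration; not part of NLTK.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<snippet>", "exec")
    return [str(w.message) for w in caught
            if "invalid escape sequence" in str(w.message)]


# The outer raw strings keep the backslash intact, so the compiled snippet
# contains the same literal that appears in the warning log.
BEFORE = r'pattern = "\d+$"'    # plain literal: \d is not a valid string escape
AFTER = r'pattern = r"\d+$"'    # raw literal: the backslash reaches re untouched

print(escape_warnings(BEFORE))  # expected to list a warning on Python 3.6+
print(escape_warnings(AFTER))   # expected to be empty

# Both spellings currently yield the same characters (backslash, d, +, $),
# so switching to a raw string does not change what the regex matches.
assert "\\d+$" == r"\d+$"
assert re.sub(r"\d+$", "", "x12") == "x"
```

Docstrings that trigger the same warning (the entries whose source line is just `"""`) can be handled the same way, by prefixing the docstring with `r` or doubling the backslash.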

tirkarthi on 2020-01-19

@gertjanwytynck I'm currently working on this.

ab-10 on 2020-01-21

🚀2

Is this done?

morrme on 2020-10-19

It looks like a few are still left. I wonder whether adding a unit test would help.

./nltk/tools/nltk_term_index.py
./nltk/tools/find_deprecated.py
./nltk/nltk/tokenize/punkt.py

...although the deprecations in the tools have little impact, it is ironic that the find_deprecated.py script itself uses deprecated syntax :)

$ git clone https://github.com/nltk/nltk.git
$ find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/tools/nltk_term_index.py:51: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./nltk/tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./nltk/tools/nltk_term_index.py:55: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./nltk/tools/find_deprecated.py:42: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./nltk/tools/find_deprecated.py:44: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./nltk/tools/find_deprecated.py:46: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./nltk/tools/find_deprecated.py:63: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./nltk/tools/find_deprecated.py:66: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
./nltk/nltk/tokenize/punkt.py:223: DeprecationWarning: invalid escape sequence \]
  return "(?:[)\";}\]\*:@\'\({\[%s])" % re.escape("".join(set(self.sent_end_chars) - {"."}))
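
The unit-test idea raised earlier in the thread could be sketched roughly as follows. This is only an assumption about how such a check might look, not an existing NLTK test: the function name `find_invalid_escapes` is hypothetical, and it simply mirrors the `py_compile` run above by compiling every `.py` file under a directory and collecting invalid-escape warnings.

```python
import pathlib
import warnings


def find_invalid_escapes(root):
    """Return (path, line, message) for each invalid-escape warning under root.

    Hypothetical helper for illustration; not part of NLTK.
    """
    offenders = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                compile(source, str(path), "exec")
            except SyntaxError:
                # Unrelated syntax problems are out of scope for this check.
                continue
        for w in caught:
            if "invalid escape sequence" in str(w.message):
                offenders.append((str(path), w.lineno, str(w.message)))
    return offenders


def test_no_invalid_escapes():
    # "nltk" is an assumed path to the package directory in the checkout.
    assert find_invalid_escapes("nltk") == []
```

A test like this would fail as soon as a new non-raw regex literal with a stray backslash is added, which is the regression the thread is worried about.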

pombredanne on 2020-10-19
