nltk 🚀 - Mettre à jour diverses séquences d'échappement de regex

S'il n'y a personne qui travaille là-dessus, j'aimerais bien. Pouvez-vous indiquer les étapes à suivre pour dupliquer le problème s'il vous plaît?

PabloDino le 31 août 2019

👍2 ❤1

@PabloDino Installez Python 3.6.8 ou une version ultérieure et essayez d'importer chaque module. Le correctif de l'expression régulière soit en utilisant des chaînes brutes ou en utilisant un échappement approprié pour que cela fonctionne à la fois sur Python 2 et 3

pombredanne le 31 août 2019

J'y suis - j'ai travaillé sur quelques exercices mais je n'ai vu aucun avertissement. Pouvez-vous publier un extrait de code dans lequel les avertissements manifestent pl

PabloDino le 3 sept. 2019

@PabloDino :

$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only

========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
    name, n = re.sub("\d+$", "", var.name), 2

nltk/nltk/featstruct.py:2091
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile("(-?\d+):(-?\d+)")

nltk/nltk/sem/evaluate.py:307
  /home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/sem/relextract.py:128
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
    ENT = re.compile("&(\w+?);")

nltk/nltk/sem/relextract.py:407
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
    """

nltk/nltk/sem/boxer.py:776
  /home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
    assert re.match("^[exps]\d+$", var), var

nltk/nltk/sem/drt.py:716
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
    + [" \  " + blank + line for line in term_lines[1:2]]

nltk/nltk/sem/drt.py:717
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
    + [" /\ " + var_string + line for line in term_lines[2:3]]

nltk/nltk/grammar.py:1291
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/grammar.py:1463
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
    _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)

nltk/nltk/text.py:650
  /home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
    _CONTEXT_RE = re.compile("\w+|[\.\!\?]")

nltk/nltk/tokenize/punkt.py:1462
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
    pat = "\s*".join(re.escape(c) for c in tok)

nltk/nltk/tokenize/regexp.py:100
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/regexp.py:193
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/repp.py:133
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
    line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)

nltk/nltk/tokenize/texttiling.py:96
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
    c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)

nltk/nltk/tokenize/texttiling.py:229
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
    matches = re.finditer("\w+", text)

nltk/nltk/tokenize/toktok.py:53
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
    FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "

nltk/nltk/tokenize/toktok.py:55
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
    FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "

nltk/nltk/tokenize/toktok.py:62
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
    PIPE = re.compile("\|"), " &#124; "

nltk/nltk/tokenize/treebank.py:269
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
    """

nltk/nltk/tokenize/treebank.py:273
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tokenize/treebank.py:277
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tree.py:99
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/tree.py:652
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
    if re.search("\s", brackets):

nltk/nltk/tree.py:658
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
    node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:660
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
    leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:662
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
    "%s\s*(%s)?|%s|(%s)"

nltk/nltk/tree.py:900
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
    reserved_chars = re.compile("([#\$%&~_\{\}])")

nltk/nltk/parse/chart.py:1034
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1073
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1128
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1148
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1218
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1241
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:270
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:369
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/tag/sequential.py:730
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
    elif re.match("\w+$", word):

nltk/nltk/tag/sequential.py:724
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
    elif re.match("\W+$", word):

nltk/nltk/tag/sequential.py:722
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
    if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):

nltk/nltk/classify/rte_classify.py:61
  /home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
    tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")

nltk/nltk/classify/maxent.py:1351
  /home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/chunk/util.py:371
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
    _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")

nltk/nltk/chunk/util.py:517
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
    _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

nltk/nltk/chunk/util.py:526
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
    for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):

nltk/nltk/chunk/regexp.py:70
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
    _BRACKETS = re.compile("[^\{\}]+")

nltk/nltk/chunk/regexp.py:215
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
    s = re.sub("\{\}", "", s)

nltk/nltk/chunk/regexp.py:426
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)

nltk/nltk/chunk/regexp.py:471
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)

nltk/nltk/chunk/regexp.py:510
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
    regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))

nltk/nltk/chunk/regexp.py:511
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)

nltk/nltk/chunk/regexp.py:575
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)

nltk/nltk/chunk/regexp.py:708
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
    "(?P<left>%s)\{(?P<right>%s)"

nltk/nltk/chunk/regexp.py:714
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)

nltk/nltk/chunk/regexp.py:778
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
    "(?P<left>%s)\}(?P<right>%s)"

nltk/nltk/chunk/regexp.py:784
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)

nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
    r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")

nltk/nltk/chunk/regexp.py:1175
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/inference/discourse.py:44
  /home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/stem/lancaster.py:192
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")

nltk/nltk/stem/lancaster.py:225
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")

nltk/nltk/stem/porter.py:177
  /home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
    """

nltk/nltk/corpus/__init__.py:116
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:123
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:126
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
    crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")

nltk/nltk/corpus/__init__.py:128
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
    "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"

nltk/nltk/corpus/__init__.py:311
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
    "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"

nltk/nltk/corpus/__init__.py:335
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
    twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")

nltk/nltk/corpus/__init__.py:364
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
    wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")

nltk/nltk/corpus/__init__.py:374
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:383
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:392
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:401
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/reader/plaintext.py:62
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/corpus/reader/util.py:635
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
    if re.match("^\d+-\d+", line) is not None:

nltk/nltk/corpus/reader/util.py:859
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
    if re.match("======+\s*$", line):

nltk/nltk/corpus/reader/api.py:77
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
    m = re.match("(.*\.zip)/?(.*)$|", root)

nltk/nltk/corpus/reader/timit.py:165
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
    encoding = [(".*\.wav", None), (".*", encoding)]

nltk/nltk/corpus/reader/bracket_parse.py:214
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
    "alpino\.xml",

nltk/nltk/corpus/reader/xmldocs.py:232
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
    _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")

nltk/nltk/toolbox.py:209
  /home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
    """

nltk/nltk/corpus/reader/bnc.py:29
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/switchboard.py:113
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
    _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")

nltk/nltk/corpus/reader/childes.py:281
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
    m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)

nltk/nltk/corpus/reader/framenet.py:2753
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/udhr.py:30
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
    ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),

nltk/nltk/corpus/reader/twitter.py:54
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/ccg/combinator.py:225
  /home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
    """

nltk/nltk/treetransforms.py:108
  /home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \ 
    """

pombredanne le 7 sept. 2019

Et FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string -and-bytes-literals

Contrairement au standard C, toutes les séquences d'échappement non reconnues sont laissées dans la chaîne inchangées, c'est-à-dire que la barre oblique inverse est laissée dans le résultat. (Ce comportement est utile lors du débogage: si une séquence d'échappement est mal tapée, la sortie résultante est plus facilement reconnue comme cassée.) Il est également important de noter que les séquences d'échappement uniquement reconnues dans les chaînes littérales tombent dans la catégorie des échappements non reconnus pour les octets littéraux.

Modifié dans la version 3.6: les séquences d'échappement non reconnues génèrent un DeprecationWarning. Dans certaines versions futures de Python, ils seront une SyntaxError.

pombredanne le 7 sept. 2019

$ python --version
Python 3.6.7
$ pytest --version
Il s'agit de la version 5.1.2 de pytest , importée de

cachedir: .pytest_cache
rootdir: ** / nltk
a collecté 381 articles

Tests unitaires pour nltk.compat.
Voir aussi nltk / test / compat.doctest.

Tests unitaires pour nltk.metrics.aline

Test de l'algorithme Aline pour aligner les séquences phonétiques

Test aline pour calculer la différence entre deux segments

Tests pour le tagueur Brill.

Tester le bogue https://github.com/nltk/nltk/issues/1597

    Ensures that curly bracket quantifiers can be used inside a chunk rule.
    This type of quantifier has been used for the supplementary example
    in http://www.nltk.org/book/ch07.html#exploring-text-corpora.

Tests unitaires pour nltk.classify. Voir aussi: nltk / test / classify.doctest

Texte construit à l'aide de: http://www.nltk.org/book/ch01.html

Test simulé pour les wrappers Stanford CoreNLP.

Tests de régression de la vue Corpus

Classe contenant les tests unitaires pour nltk.metrics.agreement.Disagreement.

Test plus avancé, basé sur
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf

Même exemple plus avancé, mais avec 1 note supprimée.
Encore une fois, la suppression de cette note 1 ne devrait pas avoir d'importance.

Test simple, basé sur
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.

Même test simple avec 1 note supprimée.
La suppression de cette note ne devrait pas avoir d'importance: K-Apha ignore les éléments avec
seulement 1 note.

Tests de régression pour json2csv() et json2csv_entities() sur Twitter
paquet.

Vérifiez que la comparaison de fichiers ne donne pas de faux positifs.

Tests unitaires pour nltk.corpus.nombank

Tests pour nltk.pos_tag

Le test suivant effectue une série aléatoire de lectures, de recherches et
dit et vérifie que les résultats sont cohérents.

Tests unitaires pour Senna

Unittest pour nltk.classify.senna

Interface du pipeline Senna

Unittest pour nltk.tag.senna

cet essai unitaire pour tester le tube de lumière arabe boule de neige
ce stemmer traite des préfixes et des suffixes

Tester le bogue https://github.com/nltk/nltk/issues/1581

    Ensures that 'oed' can be stemmed without throwing an error.
  <TestCaseFunction test_vocabulary_martin_mode>
    Tests all words from the test vocabulary provided by M Porter

    The sample vocabulary and output were sourced from:
    http://tartarus.org/martin/PorterStemmer/voc.txt
    http://tartarus.org/martin/PorterStemmer/output.txt
    and are linked to from the Porter Stemmer algorithm's homepage
    at
    http://tartarus.org/martin/PorterStemmer/
  <TestCaseFunction test_vocabulary_nltk_mode>
  <TestCaseFunction test_vocabulary_original_mode>

Tests unitaires pour nltk.tgrep.

Classe contenant les tests unitaires pour nltk.tgrep.

Tester la gestion des erreurs des opérateurs tgrep non définis.

Vérifiez que les commentaires sont correctement filtrés de la recherche tgrep
cordes.

Testez les exemples de base du manuel TGrep2.

Testez les nœuds étiquetés.

    Test case from Emily M. Bender.
  <TestCaseFunction test_multiple_conjs>
    Test that multiple (3 or more) conjunctions of node relations are
    handled properly.
  <TestCaseFunction test_node_encoding>
    Test that tgrep search strings handles bytes and strs the same
    way.
  <TestCaseFunction test_node_nocase>
    Test selecting nodes using case insensitive node names.
  <TestCaseFunction test_node_noleaves>
    Test node name matching with the search_leaves flag set to False.
  <TestCaseFunction test_node_printing>
    Test that the tgrep print operator ' is properly ignored.
  <TestCaseFunction test_node_quoted>
    Test selecting nodes using quoted node names.
  <TestCaseFunction test_node_regex>
    Test regex matching on nodes.
  <TestCaseFunction test_node_regex_2>
    Test regex matching on nodes.
  <TestCaseFunction test_node_simple>
    Test a simple use of tgrep for finding nodes matching a given
    pattern.
  <TestCaseFunction test_node_tree_position>
    Test matching on nodes based on NLTK tree position.
  <TestCaseFunction test_rel_precedence>
    Test matching nodes based on precedence relations.
  <TestCaseFunction test_rel_sister_nodes>
    Test matching sister nodes in a tree.
  <TestCaseFunction test_tokenize_encoding>
    Test that tokenization handles bytes and strs the same way.
  <TestCaseFunction test_tokenize_examples>
    Test tokenization of the TGrep2 manual example patterns.
  <TestCaseFunction test_tokenize_link_types>
    Test tokenization of basic link types.
  <TestCaseFunction test_tokenize_macros>
    Test tokenization of macro definitions.
  <TestCaseFunction test_tokenize_node_labels>
    Test tokenization of labeled nodes.
  <TestCaseFunction test_tokenize_nodenames>
    Test tokenization of node names.
  <TestCaseFunction test_tokenize_quoting>
    Test tokenization of quoting.
  <TestCaseFunction test_tokenize_segmented_patterns>
    Test tokenization of segmented patterns.
  <TestCaseFunction test_tokenize_simple>
    Simple test of tokenization.
  <TestCaseFunction test_trailing_semicolon>
    Test that semicolons at the end of a tgrep2 search string won't
    cause a parse failure.
  <TestCaseFunction test_use_macros>
    Test defining and using tgrep2 macros.
  <TestCaseFunction tests_rel_dominance>
    Test matching nodes based on dominance relations.
  <TestCaseFunction tests_rel_indexed_children>
    Test matching nodes based on their index in their parent node.

Tests unitaires pour nltk.tokenize.
Voir aussi nltk / test / tokenize.doctest

Testez le remplissage de l'astérisque pour la tokenisation des mots

Testez le remplissage de dotdot * pour la tokenisation des mots.

Tester une chaîne qui ressemble à un numéro de téléphone mais qui contient une nouvelle ligne

Testez remove_handle () de casual.py avec des boîtiers de bord spécialement conçus

Testez le tokenizer SyllableTokenizer.

Tester le Stanford Word Segmenter pour l'arabe (configuration par défaut)

Tester le Stanford Word Segmenter pour le chinois (configuration par défaut)

Tester la fonction TreebankWordTokenizer.span_tokenize

Testez TweetTokenizer en utilisant des mots avec des caractères spéciaux et accentués.

Tester la fonction word_tokenize

Tests pour les parties statiques du package Twitter

Teste que les informations d'identification Twitter du fichier sont gérées correctement.

Le fichier d'informations d'identification par défaut est identifié

Le fichier d'informations d'identification par défaut a été lu correctement

Le chemin d'accès au fichier d'informations d'identification par défaut est bien formé, étant donné spécifié
subdir.

La définition de subdir sur un chemin vide devrait générer une erreur.

Définir subdir sur None devrait générer une erreur.

Vérifiez que la variable d'environnement a été lue correctement.

Le fichier d'informations d'identification 'bad_oauth1-1.txt' est incomplet

La première clé du fichier d'informations d'identification 'bad_oauth1-2.txt' est mal formée

La première clé du fichier d'informations d'identification 'bad_oauth1-2.txt' est mal formée

La définition de sous-répertoire sur un répertoire inexistant devrait générer une erreur.

Les valeurs par défaut pour l'authentification échoueront car «credentials.txt» n'est pas
présent dans le sous-répertoire par défaut, lu à partir de os.environ['TWITTER'] .

Le fichier d'informations d'identification 'foobar' est introuvable dans le sous-répertoire par défaut.

Tests unitaires pour nltk.corpus.wordnet
Voir aussi nltk / test / wordnet.doctest

Tests pour NgramCounter qui impliquent uniquement une recherche, aucune modification.

Tests unitaires pour le modèle ngram MLE.

Tests de modèle de trigramme MLE

Tests unitaires pour la classe Lidstone

Tests unitaires pour la classe Laplace

En utilisant le modèle MLE, générez du texte.

tests Classe de vocabulaire

Tests pour la métrique d'évaluation de la traduction BLEU

Exemples tirés du papier BLEU original
http://www.aclweb.org/anthology/P02-1040.pdf

Teste les alignements GDFA

Test de GDFA avec les 10 premières sorties eflomal du numéro 1829
https://github.com/nltk/nltk/issues/1829

Tests pour les méthodes de formation IBM Model 1

Tests pour les méthodes de formation IBM Model 2

Tests pour les méthodes de formation IBM Model 3

Tests pour les méthodes de formation IBM Model 4

Tests pour les méthodes de formation IBM Model 5

============================= aucun test n'a été exécuté en 2.13s ================= =============
$

PabloDino le 9 sept. 2019

Je vois la même sortie que @pombredanne.

stevenbird le 12 sept. 2019

Salut, @PabloDino envisage-

J'ai pu reproduire la sortie de @pombredanne et j'aimerais travailler sur la résolution de ce problème.

ab-10 le 30 sept. 2019

Allez-y, je n'ai pas encore répliqué

Le lun 30 septembre 2019 à 11:40 Armin Stepanjan [email protected]
a écrit:

Salut, @PabloDino https://github.com/PabloDino prévoit toujours de travailler
sur la question?
J'ai pu répliquer @pombredanne
https://github.com/pombredanne et aimerait travailler sur
résoudre ce problème.
-
Vous recevez cela parce que vous avez été mentionné.
Répondez directement à cet e-mail, affichez-le sur GitHub
https://github.com/nltk/nltk/issues/2378?email_source=notifications&email_token=ABRSN4KL27M5TYFOR65HRMDQMIMVPA5CNFSM4IRCRGM2YY3PNVWWK3TUL52HS4DissODFVREXJVM62MZLOBW63-58WNWWW63W60W60W6W60W6W60W6W60W6W6W6W6W6W6WW6W6WW6W6W6W6W6W6WW6WW6W6WWLW6W6WXW6W6W6WW6WW6W6W6WW6WW6W6WXW6W6WW6E5
ou couper le fil
https://github.com/notifications/unsubscribe-auth/ABRSN4KRASPRV6I4VLHFNILQMIMVPANCNFSM4IRCRGMQ
.

PabloDino le 12 oct. 2019

@ ab-10 Avez-vous pu corriger ces avertissements dep?

gertjanwytynck le 16 janv. 2020

Une liste mise à jour avec Python 3.8 avec l'exécution de la commande ci-dessous:

find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}

./nltk/chat/iesha.py:52: DeprecationWarning: invalid escape sequence \<
  "u think I can%2??! really?? kekeke \<_\<",
./nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word):
./nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word):
./nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
./nltk/app/chunkparser_app.py:206: DeprecationWarning: invalid escape sequence \#
  "\t<regexp><\#><CD> # This is a comment...</regexp>\n"
./nltk/app/chunkparser_app.py:315: DeprecationWarning: invalid escape sequence \s
  grammar = re.sub("\n\s+", "\n", grammar)
./nltk/app/chunkparser_app.py:1061: DeprecationWarning: invalid escape sequence \w
  key=lambda t_w: re.match("\w+", t_w[0])
./nltk/app/chunkparser_app.py:1422: DeprecationWarning: invalid escape sequence \#
  "^\# Regexp Chunk Parsing Grammar[\s\S]*" "F-score:.*\n", "", grammar
./nltk/sem/cooper_storage.py:48: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
  ENT = re.compile("&(\w+?);")
./nltk/sem/relextract.py:382: DeprecationWarning: invalid escape sequence \s
  roles = """
./nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
  assert re.match("^[exps]\d+$", var), var
./nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
  + [" \  " + blank + line for line in term_lines[1:2]]
./nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
  + [" /\ " + var_string + line for line in term_lines[2:3]]
./nltk/sem/chat80.py:9: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/chat80.py:705: DeprecationWarning: invalid escape sequence \P
  template = "PropN[num=sg, sem=<\P.(P %s)>] -> '%s'\n"
./nltk/sem/evaluate.py:257: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
  if re.match("^\d+-\d+", line) is not None:
./nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
  if re.match("======+\s*$", line):
./nltk/corpus/reader/framenet.py:2748: DeprecationWarning: invalid escape sequence \w
  """
./nltk/corpus/reader/bracket_parse.py:215: DeprecationWarning: invalid escape sequence \.
  "alpino\.xml",
./nltk/corpus/reader/twitter.py:25: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
  _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
./nltk/corpus/reader/bnc.py:15: DeprecationWarning: invalid escape sequence \w
  """Corpus reader for the XML version of the British National Corpus.
./nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
  ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
./nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
  encoding = [(".*\.wav", None), (".*", encoding)]
./nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
  m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
./nltk/corpus/reader/plaintext.py:47: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
  _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
./nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
  m = re.match("(.*\.zip)/?(.*)$|", root)
./nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
  crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
./nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
  "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
./nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
  "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
./nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
  twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
./nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
  wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
./nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
  _CONTEXT_RE = re.compile("\w+|[\.\!\?]")
./nltk/inference/discourse.py:9: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:38: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
  if re.search("\s", brackets):
./nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
  node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
  leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
  "%s\s*(%s)?|%s|(%s)"
./nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
  reserved_chars = re.compile("([#\$%&~_\{\}])")
./nltk/ccg/combinator.py:220: DeprecationWarning: invalid escape sequence \Y
  """
./nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
  FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
./nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
  FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
./nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
  PIPE = re.compile("\|"), " &#124; "
./nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
  pat = "\s*".join(re.escape(c) for c in tok)
./nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
  line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
./nltk/tokenize/nist.py:81: DeprecationWarning: invalid escape sequence \{
  PUNCT = re.compile("([\{-\~\[-\` -\&\(-\+\:-\@\/])"), " \\1 "
./nltk/tokenize/nist.py:83: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_PRECEED = re.compile("([^0-9])([\.,])"), "\\1 \\2 "
./nltk/tokenize/nist.py:85: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_FOLLOW = re.compile("([\.,])([^0-9])"), " \\1 \\2"
./nltk/tokenize/treebank.py:194: DeprecationWarning: invalid escape sequence \]
  """
./nltk/tokenize/treebank.py:255: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/treebank.py:259: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
  c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
./nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
  matches = re.finditer("\w+", text)
./nltk/tokenize/regexp.py:76: DeprecationWarning: invalid escape sequence \w
  """
./nltk/tokenize/regexp.py:184: DeprecationWarning: invalid escape sequence \w
  """
./nltk/classify/maxent.py:1292: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
  tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
./nltk/parse/chart.py:1024: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1057: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1123: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1140: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1213: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1232: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:251: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:353: DeprecationWarning: invalid escape sequence \*
  """
./nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
  _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
./nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
  _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
./nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
  for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
./nltk/chunk/named_entity.py:178: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:176: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:174: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:250: DeprecationWarning: invalid escape sequence \s
  text = re.sub("[\s\S]*<TEXT>", subfunc, text)
./nltk/chunk/named_entity.py:251: DeprecationWarning: invalid escape sequence \s
  text = re.sub("</TEXT>[\s\S]*", "", text)
./nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
  _BRACKETS = re.compile("[^\{\}]+")
./nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
  s = re.sub("\{\}", "", s)
./nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
./nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
./nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
  regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
./nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
./nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
./nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
  "(?P<left>%s)\{(?P<right>%s)"
./nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
./nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
  "(?P<left>%s)\}(?P<right>%s)"
./nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:1136: DeprecationWarning: invalid escape sequence \.
  """
./nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
  name, n = re.sub("\d+$", "", var.name), 2
./nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
  RANGE_RE = re.compile("(-?\d+):(-?\d+)")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:171: DeprecationWarning: invalid escape sequence \s
  + "))\s*"
./nltk/toolbox.py:159: DeprecationWarning: invalid escape sequence \_
  """
./nltk/grammar.py:1278: DeprecationWarning: invalid escape sequence \*
  """
./nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
  _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
./nltk/stem/porter.py:145: DeprecationWarning: invalid escape sequence \m
  """Returns the 'measure' of stem, per definition in the paper
./nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
./nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
./nltk/treetransforms.py:8: DeprecationWarning: invalid escape sequence \ 
  """
./tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./tools/nltk_term_index.py:53: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./tools/nltk_term_index.py:56: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./tools/find_deprecated.py:43: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./tools/find_deprecated.py:45: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./tools/find_deprecated.py:47: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./tools/find_deprecated.py:64: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./tools/find_deprecated.py:67: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')

tirkarthi le 19 janv. 2020

@gertjanwytynck Je les

ab-10 le 21 janv. 2020

🚀2

Cela a-t-il été terminé?

morrme le 19 oct. 2020

On dirait qu'il en reste encore quelques-uns. Je me demande si l'ajout d'un test unitaire pourrait aider.

./nltk/tools/nltk_term_index.py
./nltk/tools/find_deprecated.py
./nltk/nltk/tokenize/punkt.py

... et même si l'impact de la dépréciation des outils n'est pas grand, il y a un peu d'ironie que les scripts find_deprecated.py utilisent une syntaxe obsolète :)

$ git clone https://github.com/nltk/nltk.git
$ find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/tools/nltk_term_index.py:51: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./nltk/tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./nltk/tools/nltk_term_index.py:55: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./nltk/tools/find_deprecated.py:42: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./nltk/tools/find_deprecated.py:44: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./nltk/tools/find_deprecated.py:46: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./nltk/tools/find_deprecated.py:63: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./nltk/tools/find_deprecated.py:66: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
./nltk/nltk/tokenize/punkt.py:223: DeprecationWarning: invalid escape sequence \]
  return "(?:[)\";}\]\*:@\'\({\[%s])" % re.escape("".join(set(self.sent_end_chars) - {"."}))

pombredanne le 19 oct. 2020

Nltk: Mettre à jour diverses séquences d'échappement de regex

Commentaire le plus utile

Tous les 14 commentaires

Questions connexes