Recent versions of Python are stricter about escape sequences in regexes.
For example, with 3.6.8 there are more than 10 warnings like this one:
...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile('(-?\d+):(-?\d+)')
The regexes should be updated to silence these warnings.
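Here is a minimal sketch of the fix for the `RANGE_RE` line quoted above. The variable name comes from the warning, but this is an illustration and not the change that was actually merged into NLTK:

```python
import re

# Old form (kept in a comment so this file stays warning-free):
#     RANGE_RE = re.compile('(-?\d+):(-?\d+)')
# "\d" is not a recognized string escape, so Python keeps the backslash
# but, since 3.6, emits a DeprecationWarning when it compiles the module.

# Fixed form: the raw string hands the backslashes to the regex engine
# untouched, and no warning is emitted.
RANGE_RE = re.compile(r"(-?\d+):(-?\d+)")

# The pattern text is the same one the old literal produced.
assert RANGE_RE.pattern == "(-?\\d+):(-?\\d+)"

print(RANGE_RE.match("-3:12").groups())  # ('-3', '12')
```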
If nobody is working on this, I would like to take it. Can you give the steps to reproduce the problem, please?
@PabloDino Install Python 3.6.8 or later and try importing all the modules. The regexes can be fixed either with raw strings or with proper escaping, so that they work on both Python 2 and 3.
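A short sketch of the two portable spellings, plus cases from the warning list that are easy to miss: replacement templates such as `\g<chunk>`, `u"..."`-prefixed literals, and docstrings. The patterns and the `repeat_rule` function are hypothetical examples, not NLTK code. One fact worth keeping in mind for the `u"..."` case: Python 2 accepts a `ur"..."` prefix, but Python 3 treats it as a syntax error, so doubling the backslash is the option that works on both.

```python
import re

# 1. Raw string: the backslash is kept literally (Python 2 and 3).
raw = r"\w+"
# 2. Doubled backslash: the escape is spelled out (Python 2 and 3).
doubled = "\\w+"
assert raw == doubled  # identical pattern text either way

# Replacement templates need the same care: "\g<chunk>" is also an
# unrecognized string escape, so write it raw (or as "\\g<chunk>").
chunk_re = re.compile(r"\{(?P<chunk>[^{}]*)\}")
print(chunk_re.sub(r"<\g<chunk>>", "{NP}{VP}"))  # <NP><VP>

# u"..." literals: Python 3 rejects the Python 2 "ur" prefix, so for these
# the portable fix is to double the backslash.
open_punct = re.compile(u"([({\\[])")
print(open_punct.sub(r" \1 ", "f(x)[y]"))  # f ( x) [ y]


def repeat_rule():
    r"""Docstrings are literals too: an r prefix keeps this \* warning-free."""


print(repeat_rule.__doc__)
```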
I'm working on this. I've been running some exercises, but I don't see any warnings. Can you post a code snippet in which the warnings show up, please?
@PabloDino :
$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only
========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
name, n = re.sub("\d+$", "", var.name), 2
nltk/nltk/featstruct.py:2091
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile("(-?\d+):(-?\d+)")
nltk/nltk/sem/evaluate.py:307
/home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/sem/relextract.py:128
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
ENT = re.compile("&(\w+?);")
nltk/nltk/sem/relextract.py:407
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
"""
nltk/nltk/sem/boxer.py:776
/home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
assert re.match("^[exps]\d+$", var), var
nltk/nltk/sem/drt.py:716
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \
+ [" \ " + blank + line for line in term_lines[1:2]]
nltk/nltk/sem/drt.py:717
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \
+ [" /\ " + var_string + line for line in term_lines[2:3]]
nltk/nltk/grammar.py:1291
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/grammar.py:1463
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
_STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
nltk/nltk/text.py:650
/home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
_CONTEXT_RE = re.compile("\w+|[\.\!\?]")
nltk/nltk/tokenize/punkt.py:1462
/home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
pat = "\s*".join(re.escape(c) for c in tok)
nltk/nltk/tokenize/regexp.py:100
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/regexp.py:193
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/repp.py:133
/home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
nltk/nltk/tokenize/texttiling.py:96
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
nltk/nltk/tokenize/texttiling.py:229
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
matches = re.finditer("\w+", text)
nltk/nltk/tokenize/toktok.py:53
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
nltk/nltk/tokenize/toktok.py:55
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
nltk/nltk/tokenize/toktok.py:62
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
PIPE = re.compile("\|"), " | "
nltk/nltk/tokenize/treebank.py:269
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
"""
nltk/nltk/tokenize/treebank.py:273
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tokenize/treebank.py:277
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tree.py:99
/home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/tree.py:652
/home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
if re.search("\s", brackets):
nltk/nltk/tree.py:658
/home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:660
/home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:662
/home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
"%s\s*(%s)?|%s|(%s)"
nltk/nltk/tree.py:900
/home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
reserved_chars = re.compile("([#\$%&~_\{\}])")
nltk/nltk/parse/chart.py:1034
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1073
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1128
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1148
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1218
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1241
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:270
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:369
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/tag/sequential.py:730
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
elif re.match("\w+$", word):
nltk/nltk/tag/sequential.py:724
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
elif re.match("\W+$", word):
nltk/nltk/tag/sequential.py:722
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
nltk/nltk/classify/rte_classify.py:61
/home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
nltk/nltk/classify/maxent.py:1351
/home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/chunk/util.py:371
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
_LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
nltk/nltk/chunk/util.py:517
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
_IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
nltk/nltk/chunk/util.py:526
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
nltk/nltk/chunk/regexp.py:70
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
_BRACKETS = re.compile("[^\{\}]+")
nltk/nltk/chunk/regexp.py:215
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
s = re.sub("\{\}", "", s)
nltk/nltk/chunk/regexp.py:426
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
nltk/nltk/chunk/regexp.py:471
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
nltk/nltk/chunk/regexp.py:510
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
nltk/nltk/chunk/regexp.py:511
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
nltk/nltk/chunk/regexp.py:575
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
nltk/nltk/chunk/regexp.py:708
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
"(?P<left>%s)\{(?P<right>%s)"
nltk/nltk/chunk/regexp.py:714
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
nltk/nltk/chunk/regexp.py:778
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
"(?P<left>%s)\}(?P<right>%s)"
nltk/nltk/chunk/regexp.py:784
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
nltk/nltk/chunk/regexp.py:1175
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/inference/discourse.py:44
/home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/stem/lancaster.py:192
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
nltk/nltk/stem/lancaster.py:225
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
nltk/nltk/stem/porter.py:177
/home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
"""
nltk/nltk/corpus/__init__.py:116
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:123
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:126
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
nltk/nltk/corpus/__init__.py:128
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
"dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
nltk/nltk/corpus/__init__.py:311
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
"timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
nltk/nltk/corpus/__init__.py:335
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
nltk/nltk/corpus/__init__.py:364
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
nltk/nltk/corpus/__init__.py:374
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:383
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:392
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:401
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/reader/plaintext.py:62
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/corpus/reader/util.py:635
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
if re.match("^\d+-\d+", line) is not None:
nltk/nltk/corpus/reader/util.py:859
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
if re.match("======+\s*$", line):
nltk/nltk/corpus/reader/api.py:77
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
m = re.match("(.*\.zip)/?(.*)$|", root)
nltk/nltk/corpus/reader/timit.py:165
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
encoding = [(".*\.wav", None), (".*", encoding)]
nltk/nltk/corpus/reader/bracket_parse.py:214
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
"alpino\.xml",
nltk/nltk/corpus/reader/xmldocs.py:232
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
_XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
nltk/nltk/toolbox.py:209
/home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
"""
nltk/nltk/corpus/reader/bnc.py:29
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/switchboard.py:113
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
_UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
nltk/nltk/corpus/reader/childes.py:281
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
nltk/nltk/corpus/reader/framenet.py:2753
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/udhr.py:30
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
nltk/nltk/corpus/reader/twitter.py:54
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/ccg/combinator.py:225
/home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
"""
nltk/nltk/treetransforms.py:108
/home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \
"""
And FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
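The behavior quoted above can be checked directly. This minimal sketch compiles a one-line module from a string so that the warning is raised on demand; the module text and the `PATTERN` name are made up for the example:

```python
import warnings

# Text of a tiny module containing the unrecognized escape "\d".
# The backslash is doubled here so that this file itself stays warning-free.
source = 'PATTERN = "(-?\\d+)"\n'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # override filters that would hide it
    code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)

# The backslash is left in the value, as the documentation says.
print(namespace["PATTERN"])  # (-?\d+)

# DeprecationWarning on Python 3.6-3.11; Python 3.12 changed it to SyntaxWarning.
for w in caught:
    print(w.category.__name__, w.message)
```

This may also explain why the warnings are easy to miss. They are raised only when a module's source is actually compiled, so an up-to-date `.pyc` in `__pycache__` skips them. In addition, `DeprecationWarning` is hidden by default (since 3.7, except for code run as `__main__`). A fresh checkout run with `python -W always::DeprecationWarning`, or collected by pytest as above, should show them.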
$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from
cachedir: .pytest_cache
rootdir: **/nltk
collected 381 items
Unit tests for nltk.compat.
See also nltk/test/compat.doctest.
Unit tests for nltk.metrics.aline
Test the Aline algorithm for aligning phonetic sequences
Test aline for computing the difference between two segments
Tests for Brill tagger.
Test for bug https://github.com/nltk/nltk/issues/1597
Ensures that curly bracket quantifiers can be used inside a chunk rule.
This type of quantifier has been used for the supplementary example
in http://www.nltk.org/book/ch07.html#exploring-text-corpora.
Unit tests for nltk.classify. See also: nltk/test/classify.doctest
Text constructed using: http://www.nltk.org/book/ch01.html
Mock test for Stanford CoreNLP wrappers.
Corpus view regression tests
Class containing unit tests for nltk.metrics.agreement.Disagreement.
More advanced test, based on
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf
Same more advanced example, but with 1 rating removed.
Again, removal of that 1 rating should not matter.
Simple test, based on
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.
Same simple test with 1 rating removed.
Removal of that rating should not matter: K-Alpha ignores items with
only 1 rating.
Regression tests for json2csv()
and json2csv_entities()
in Twitter
package.
Check that file comparison does not give false positives.
Unit tests for nltk.corpus.nombank
Tests for nltk.pos_tag
The following test performs a random series of reads, seeks, and
tells, and checks that the results are consistent.
Unit tests for Senna
Unittest for nltk.classify.senna
Senna pipeline interface
Unittest for nltk.tag.senna
This unit test tests the Snowball Arabic light stemmer;
this stemmer handles prefixes and suffixes.
Test for bug https://github.com/nltk/nltk/issues/1581
Ensures that 'oed' can be stemmed without throwing an error.
<TestCaseFunction test_vocabulary_martin_mode>
Tests all words from the test vocabulary provided by M Porter
The sample vocabulary and output were sourced from:
http://tartarus.org/martin/PorterStemmer/voc.txt
http://tartarus.org/martin/PorterStemmer/output.txt
and are linked to from the Porter Stemmer algorithm's homepage
at
http://tartarus.org/martin/PorterStemmer/
<TestCaseFunction test_vocabulary_nltk_mode>
<TestCaseFunction test_vocabulary_original_mode>
Unit tests for nltk.tgrep.
Class containing unit tests for nltk.tgrep.
Test error handling of undefined tgrep operators.
Test that comments are correctly filtered out of tgrep search
strings.
Test the Basic Examples from the TGrep2 manual.
Test labeled nodes.
Test case from Emily M. Bender.
<TestCaseFunction test_multiple_conjs>
Test that multiple (3 or more) conjunctions of node relations are
handled properly.
<TestCaseFunction test_node_encoding>
Test that tgrep search strings handles bytes and strs the same
way.
<TestCaseFunction test_node_nocase>
Test selecting nodes using case insensitive node names.
<TestCaseFunction test_node_noleaves>
Test node name matching with the search_leaves flag set to False.
<TestCaseFunction test_node_printing>
Test that the tgrep print operator ' is properly ignored.
<TestCaseFunction test_node_quoted>
Test selecting nodes using quoted node names.
<TestCaseFunction test_node_regex>
Test regex matching on nodes.
<TestCaseFunction test_node_regex_2>
Test regex matching on nodes.
<TestCaseFunction test_node_simple>
Test a simple use of tgrep for finding nodes matching a given
pattern.
<TestCaseFunction test_node_tree_position>
Test matching on nodes based on NLTK tree position.
<TestCaseFunction test_rel_precedence>
Test matching nodes based on precedence relations.
<TestCaseFunction test_rel_sister_nodes>
Test matching sister nodes in a tree.
<TestCaseFunction test_tokenize_encoding>
Test that tokenization handles bytes and strs the same way.
<TestCaseFunction test_tokenize_examples>
Test tokenization of the TGrep2 manual example patterns.
<TestCaseFunction test_tokenize_link_types>
Test tokenization of basic link types.
<TestCaseFunction test_tokenize_macros>
Test tokenization of macro definitions.
<TestCaseFunction test_tokenize_node_labels>
Test tokenization of labeled nodes.
<TestCaseFunction test_tokenize_nodenames>
Test tokenization of node names.
<TestCaseFunction test_tokenize_quoting>
Test tokenization of quoting.
<TestCaseFunction test_tokenize_segmented_patterns>
Test tokenization of segmented patterns.
<TestCaseFunction test_tokenize_simple>
Simple test of tokenization.
<TestCaseFunction test_trailing_semicolon>
Test that semicolons at the end of a tgrep2 search string won't
cause a parse failure.
<TestCaseFunction test_use_macros>
Test defining and using tgrep2 macros.
<TestCaseFunction tests_rel_dominance>
Test matching nodes based on dominance relations.
<TestCaseFunction tests_rel_indexed_children>
Test matching nodes based on their index in their parent node.
Unit tests for nltk.tokenize.
See also nltk/test/tokenize.doctest
Test padding of asterisk for word tokenization.
Test padding of dotdot* for word tokenization.
Test a string that resembles a phone number but contains a newline
Test remove_handle() from casual.py with specially crafted edge cases
Test the SyllableTokenizer tokenizer.
Test the Stanford Word Segmenter for Arabic (default config)
Test the Stanford Word Segmenter for Chinese (default config)
Test the TreebankWordTokenizer.span_tokenize function
Test TweetTokenizer using words with special and accented characters.
Test the word_tokenize function
Tests for static parts of the Twitter package
Tests that Twitter credentials information from file is handled correctly.
Default credentials file is identified
Default credentials file has been read correctly
Path to default credentials file is well-formed, given the specified
subdir.
Setting subdir to an empty path should raise an error.
Setting subdir to None
should raise an error.
Test that the environment variable has been read correctly.
Credentials file 'bad_oauth1-1.txt' is incomplete
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
Setting subdir to a nonexistent directory should raise an error.
Defaults for authentication will fail since 'credentials.txt' is not
present in the default subdir, as read from os.environ['TWITTER'].
Credentials file 'foobar' cannot be found in the default subdir.
Unit tests for nltk.corpus.wordnet
See also nltk/test/wordnet.doctest
Tests for NgramCounter that only involve lookup, no modification.
Unit tests for the MLE ngram model.
MLE trigram model tests
Unit tests for the Lidstone class
Unit tests for the Laplace class
Using the MLE model, generate some text.
Tests the Vocabulary class
Tests for the BLEU translation evaluation metric
Examples from the original BLEU paper
http://www.aclweb.org/anthology/P02-1040.pdf
Tests GDFA alignments
Testing GDFA with the first 10 eflomal outputs from issue #1829
https://github.com/nltk/nltk/issues/1829
Tests for IBM Model 1 training methods
Tests for IBM Model 2 training methods
Tests for IBM Model 3 training methods
Tests for IBM Model 4 training methods
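When pytest's summary does not list the warnings, the same information can be gathered without it. This is a minimal sketch: `find_invalid_escapes` is a hypothetical helper, not part of NLTK or the standard library. It compiles every `.py` file under a directory with all warnings enabled and keeps the ones whose message matches the wording in the log above. Compiling from source, rather than importing, sidesteps any cached bytecode.

```python
import pathlib
import warnings


def find_invalid_escapes(root):
    """Return (path, line, message) for each invalid-escape warning under root."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record every warning, every time
            try:
                # compile() accepts bytes and honours any coding declaration.
                compile(path.read_bytes(), str(path), "exec")
            except (SyntaxError, ValueError):
                continue  # e.g. Python 2-only syntax or null bytes: skip the file
        for w in caught:
            # Matches the message wording seen in the pytest log.
            if "invalid escape sequence" in str(w.message):
                hits.append((str(path), w.lineno, str(w.message)))
    return hits
```

From the root of the clone, `find_invalid_escapes("nltk")` should return one entry per offending literal, which can be printed as `path:line: message` to mirror the pytest output.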