Newer versions of Python are more strict wrt. escapes in regexes.
For instance, with 3.6.8 there are more than 10 warnings like this one:
...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile('(-?\d+):(-?\d+)')
The regexes should be updated to silence these warnings.
I'd love to take this up if nobody is working on it. Could you tell me the steps to replicate the issue?
@PabloDino Install Python 3.6.8 or higher and import all the modules. Fix the regexes by either using raw strings or proper escaping, so that they work on both Python 2 and 3.
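As a minimal sketch of the two fixes suggested above, the snippet below rewrites the `RANGE_RE` pattern quoted in the warning, once as a raw string and once with doubled backslashes. The names `RANGE_RE_RAW` and `RANGE_RE_ESCAPED` are made up for this illustration and are not taken from nltk.

```python
import re

# Option 1: raw string. Backslashes reach the regex engine untouched.
RANGE_RE_RAW = re.compile(r"(-?\d+):(-?\d+)")

# Option 2: proper escaping. Each backslash is doubled in the literal.
RANGE_RE_ESCAPED = re.compile("(-?\\d+):(-?\\d+)")

# Both literals describe exactly the same pattern text.
assert RANGE_RE_RAW.pattern == RANGE_RE_ESCAPED.pattern

match = RANGE_RE_RAW.match("-3:12")
print(match.groups())  # ('-3', '12')
```

Both spellings are accepted by Python 2 and Python 3; the raw string is usually easier to read because the pattern appears exactly as the regex engine sees it.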
I've run a few of the exercises but didn't see the warnings. Can you post the snippet of code in which the warnings show up?
@PabloDino :
$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only
========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
name, n = re.sub("\d+$", "", var.name), 2
nltk/nltk/featstruct.py:2091
/home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
RANGE_RE = re.compile("(-?\d+):(-?\d+)")
nltk/nltk/sem/evaluate.py:307
/home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/sem/relextract.py:128
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
ENT = re.compile("&(\w+?);")
nltk/nltk/sem/relextract.py:407
/home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
"""
nltk/nltk/sem/boxer.py:776
/home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
assert re.match("^[exps]\d+$", var), var
nltk/nltk/sem/drt.py:716
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \
+ [" \ " + blank + line for line in term_lines[1:2]]
nltk/nltk/sem/drt.py:717
/home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \
+ [" /\ " + var_string + line for line in term_lines[2:3]]
nltk/nltk/grammar.py:1291
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/grammar.py:1463
/home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
_STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
nltk/nltk/text.py:650
/home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
_CONTEXT_RE = re.compile("\w+|[\.\!\?]")
nltk/nltk/tokenize/punkt.py:1462
/home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
pat = "\s*".join(re.escape(c) for c in tok)
nltk/nltk/tokenize/regexp.py:100
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/regexp.py:193
/home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/tokenize/repp.py:133
/home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
nltk/nltk/tokenize/texttiling.py:96
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
nltk/nltk/tokenize/texttiling.py:229
/home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
matches = re.finditer("\w+", text)
nltk/nltk/tokenize/toktok.py:53
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
FUNKY_PUNCT_1 = re.compile(u'([ุ;ุยฟ!"\])}ยปโบโุยก%ูชยฐยฑยฉยฎเฅคเฅฅโฆ])'), r" \1 "
nltk/nltk/tokenize/toktok.py:55
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
FUNKY_PUNCT_2 = re.compile(u"([({\[โโโโยซโนใใ])"), r" \1 "
nltk/nltk/tokenize/toktok.py:62
/home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
PIPE = re.compile("\|"), " | "
nltk/nltk/tokenize/treebank.py:269
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
"""
nltk/nltk/tokenize/treebank.py:273
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tokenize/treebank.py:277
/home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
re.compile(pattern.replace("(?#X)", "\s"))
nltk/nltk/tree.py:99
/home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/tree.py:652
/home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
if re.search("\s", brackets):
nltk/nltk/tree.py:658
/home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:660
/home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
nltk/nltk/tree.py:662
/home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
"%s\s*(%s)?|%s|(%s)"
nltk/nltk/tree.py:900
/home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
reserved_chars = re.compile("([#\$%&~_\{\}])")
nltk/nltk/parse/chart.py:1034
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1073
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1128
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1148
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1218
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/chart.py:1241
/home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:270
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/parse/featurechart.py:369
/home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
"""
nltk/nltk/tag/sequential.py:730
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
elif re.match("\w+$", word):
nltk/nltk/tag/sequential.py:724
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
elif re.match("\W+$", word):
nltk/nltk/tag/sequential.py:722
/home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
nltk/nltk/classify/rte_classify.py:61
/home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
nltk/nltk/classify/maxent.py:1351
/home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/chunk/util.py:371
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
_LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
nltk/nltk/chunk/util.py:517
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
_IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
nltk/nltk/chunk/util.py:526
/home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
nltk/nltk/chunk/regexp.py:70
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
_BRACKETS = re.compile("[^\{\}]+")
nltk/nltk/chunk/regexp.py:215
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
s = re.sub("\{\}", "", s)
nltk/nltk/chunk/regexp.py:426
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
nltk/nltk/chunk/regexp.py:471
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
nltk/nltk/chunk/regexp.py:510
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
nltk/nltk/chunk/regexp.py:511
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
nltk/nltk/chunk/regexp.py:575
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
nltk/nltk/chunk/regexp.py:708
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
"(?P<left>%s)\{(?P<right>%s)"
nltk/nltk/chunk/regexp.py:714
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
nltk/nltk/chunk/regexp.py:778
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
"(?P<left>%s)\}(?P<right>%s)"
nltk/nltk/chunk/regexp.py:784
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
nltk/nltk/chunk/regexp.py:1175
/home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/inference/discourse.py:44
/home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \
"""
nltk/nltk/stem/lancaster.py:192
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
nltk/nltk/stem/lancaster.py:225
/home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
nltk/nltk/stem/porter.py:177
/home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
"""
nltk/nltk/corpus/__init__.py:116
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:123
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
".*\.(test|train).*",
nltk/nltk/corpus/__init__.py:126
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
nltk/nltk/corpus/__init__.py:128
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
"dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
nltk/nltk/corpus/__init__.py:311
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
"timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
nltk/nltk/corpus/__init__.py:335
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
nltk/nltk/corpus/__init__.py:364
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
nltk/nltk/corpus/__init__.py:374
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:383
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:392
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/__init__.py:401
/home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
"frames/.*\.xml",
nltk/nltk/corpus/reader/plaintext.py:62
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/corpus/reader/util.py:635
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
if re.match("^\d+-\d+", line) is not None:
nltk/nltk/corpus/reader/util.py:859
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
if re.match("======+\s*$", line):
nltk/nltk/corpus/reader/api.py:77
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
m = re.match("(.*\.zip)/?(.*)$|", root)
nltk/nltk/corpus/reader/timit.py:165
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
encoding = [(".*\.wav", None), (".*", encoding)]
nltk/nltk/corpus/reader/bracket_parse.py:214
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
"alpino\.xml",
nltk/nltk/corpus/reader/xmldocs.py:232
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
_XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
nltk/nltk/toolbox.py:209
/home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
"""
nltk/nltk/corpus/reader/bnc.py:29
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/switchboard.py:113
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
_UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
nltk/nltk/corpus/reader/childes.py:281
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
nltk/nltk/corpus/reader/framenet.py:2753
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
"""
nltk/nltk/corpus/reader/udhr.py:30
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
nltk/nltk/corpus/reader/twitter.py:54
/home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
"""
nltk/nltk/ccg/combinator.py:225
/home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
"""
nltk/nltk/treetransforms.py:108
/home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \
"""
And FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In a future Python version they will be a SyntaxError.
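The behaviour quoted above can be checked with a small sketch like the following. It evaluates the source text of a non-raw literal containing `\d` while recording warnings. The variable names are made up for this illustration. The warning category depends on the interpreter (DeprecationWarning on 3.6 through 3.11, SyntaxWarning from 3.12 on), and on a future version that turns this into a SyntaxError the `eval` call would raise instead.

```python
import warnings

# Source text of a NON-raw string literal that contains \d.
# The outer raw string only keeps this example itself free of warnings.
literal_source = r'"\d"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record even warnings hidden by default
    value = eval(literal_source)     # compiling the literal triggers the warning

# The unrecognized escape is left unchanged: backslash plus "d".
assert value == "\\d"
assert len(value) == 2

# Show what this interpreter reported (the category varies by version).
for w in caught:
    print(w.category.__name__, w.message)
```

Because the warning is raised when the source is compiled, it may not reappear on later imports that load cached bytecode, which is one possible reason for not seeing it in an ordinary run.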
$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from /pytest.py
$ pytest -vvs nltk/ --collect-only
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- */python3
cachedir: .pytest_cache
rootdir: **/nltk
collected 381 items
Unit tests for nltk.compat.
See also nltk/test/compat.doctest.
Unit tests for nltk.metrics.aline
Test Aline algorithm for aligning phonetic sequences
Test Aline for computing the difference between two segments
Tests for Brill tagger.
Test for bug https://github.com/nltk/nltk/issues/1597
Ensures that curly bracket quantifiers can be used inside a chunk rule.
This type of quantifier has been used for the supplementary example
in http://www.nltk.org/book/ch07.html#exploring-text-corpora.
Unit tests for nltk.classify. See also: nltk/test/classify.doctest
Text constructed using: http://www.nltk.org/book/ch01.html
Mock test for Stanford CoreNLP wrappers.
Corpus View Regression Tests
Class containing unit tests for nltk.metrics.agreement.Disagreement.
More advanced test, based on
http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf
Same more advanced example, but with 1 rating removed.
Again, removal of that 1 rating should not matter.
Simple test, based on
https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf.
Same simple test with 1 rating removed.
Removal of that rating should not matter: K-Apha ignores items with
only 1 rating.
Regression tests for `json2csv()` and `json2csv_entities()` in Twitter
package.
Sanity check that file comparison is not giving false positives.
Unit tests for nltk.corpus.nombank
Tests for nltk.pos_tag
The following test performs a random series of reads, seeks, and
tells, and checks that the results are consistent.
Unit tests for Senna
Unittest for nltk.classify.senna
Senna pipeline interface
Unittest for nltk.tag.senna
Unit test for the Snowball Arabic light stemmer.
This stemmer deals with prefixes and suffixes.
Test for bug https://github.com/nltk/nltk/issues/1581
Ensures that 'oed' can be stemmed without throwing an error.
<TestCaseFunction test_vocabulary_martin_mode>
Tests all words from the test vocabulary provided by M Porter
The sample vocabulary and output were sourced from:
http://tartarus.org/martin/PorterStemmer/voc.txt
http://tartarus.org/martin/PorterStemmer/output.txt
and are linked to from the Porter Stemmer algorithm's homepage
at
http://tartarus.org/martin/PorterStemmer/
<TestCaseFunction test_vocabulary_nltk_mode>
<TestCaseFunction test_vocabulary_original_mode>
Unit tests for nltk.tgrep.
Class containing unit tests for nltk.tgrep.
Test error handling of undefined tgrep operators.
Test that comments are correctly filtered out of tgrep search
strings.
Test basic examples from the TGrep2 manual.
Test labeled nodes.
Test case from Emily M. Bender.
<TestCaseFunction test_multiple_conjs>
Test that multiple (3 or more) conjunctions of node relations are
handled properly.
<TestCaseFunction test_node_encoding>
Test that tgrep search strings handles bytes and strs the same
way.
<TestCaseFunction test_node_nocase>
Test selecting nodes using case insensitive node names.
<TestCaseFunction test_node_noleaves>
Test node name matching with the search_leaves flag set to False.
<TestCaseFunction test_node_printing>
Test that the tgrep print operator ' is properly ignored.
<TestCaseFunction test_node_quoted>
Test selecting nodes using quoted node names.
<TestCaseFunction test_node_regex>
Test regex matching on nodes.
<TestCaseFunction test_node_regex_2>
Test regex matching on nodes.
<TestCaseFunction test_node_simple>
Test a simple use of tgrep for finding nodes matching a given
pattern.
<TestCaseFunction test_node_tree_position>
Test matching on nodes based on NLTK tree position.
<TestCaseFunction test_rel_precedence>
Test matching nodes based on precedence relations.
<TestCaseFunction test_rel_sister_nodes>
Test matching sister nodes in a tree.
<TestCaseFunction test_tokenize_encoding>
Test that tokenization handles bytes and strs the same way.
<TestCaseFunction test_tokenize_examples>
Test tokenization of the TGrep2 manual example patterns.
<TestCaseFunction test_tokenize_link_types>
Test tokenization of basic link types.
<TestCaseFunction test_tokenize_macros>
Test tokenization of macro definitions.
<TestCaseFunction test_tokenize_node_labels>
Test tokenization of labeled nodes.
<TestCaseFunction test_tokenize_nodenames>
Test tokenization of node names.
<TestCaseFunction test_tokenize_quoting>
Test tokenization of quoting.
<TestCaseFunction test_tokenize_segmented_patterns>
Test tokenization of segmented patterns.
<TestCaseFunction test_tokenize_simple>
Simple test of tokenization.
<TestCaseFunction test_trailing_semicolon>
Test that semicolons at the end of a tgrep2 search string won't
cause a parse failure.
<TestCaseFunction test_use_macros>
Test defining and using tgrep2 macros.
<TestCaseFunction tests_rel_dominance>
Test matching nodes based on dominance relations.
<TestCaseFunction tests_rel_indexed_children>
Test matching nodes based on their index in their parent node.
Unit tests for nltk.tokenize.
See also nltk/test/tokenize.doctest.
Test padding of asterisk for word tokenization.
Test padding of dotdot* for word tokenization.
Test a string that resembles a phone number but contains a newline
Test remove_handle() from casual.py with specially crafted edge cases
Test SyllableTokenizer tokenizer.
Test the Stanford Word Segmenter for Arabic (default config)
Test the Stanford Word Segmenter for Chinese (default config)
Test TreebankWordTokenizer.span_tokenize function
Test TweetTokenizer using words with special and accented characters.
Test word_tokenize function
Tests for static parts of Twitter package
Test that Twitter credentials information from a file is handled correctly.
Default credentials file is identified
Default credentials file has been read correctly
Path to default credentials file is well-formed, given specified
subdir.
Setting subdir to empty path should raise an error.
Setting subdir to `None` should raise an error.
Test that environment variable has been read correctly.
Credentials file 'bad_oauth1-1.txt' is incomplete
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
First key in credentials file 'bad_oauth1-2.txt' is ill-formed
Setting subdir to nonexistent directory should raise an error.
Defaults for authentication will fail since 'credentials.txt' is not
present in default subdir, as read from `os.environ['TWITTER']`.
Credentials file 'foobar' cannot be found in default subdir.
Unit tests for nltk.corpus.wordnet
See also nltk/test/wordnet.doctest.
Tests for NgramCounter that only involve lookup, no modification.
Unit tests for MLE ngram model.
MLE trigram model tests
Unit tests for Lidstone class
Unit tests for Laplace class
Using MLE model, generate some text.
Tests Vocabulary Class
Tests for BLEU translation evaluation metric
Examples from the original BLEU paper
http://www.aclweb.org/anthology/P02-1040.pdf
Tests GDFA alignments
Testing GDFA with first 10 eflomal outputs from issue #1829
https://github.com/nltk/nltk/issues/1829
Tests for IBM Model 1 training methods
Tests for IBM Model 2 training methods
Tests for IBM Model 3 training methods
Tests for IBM Model 4 training methods