Nltk: DependencyGraph or Stanford Parser API problem with sentences containing "/"

Created on November 18, 2016 · 31 comments · Source: nltk/nltk

A user reported that Stanford's DependencyParser API in NLTK throws an AssertionError on this sentence:

For all of its insights into the dream world of teen life, and its electronic expression through cyber culture, the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time.

Code:

>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> sent = 'for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time . '
>>> dep_parser.raw_parse(sent)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 132, in raw_parse
    return next(self.raw_parse_sents([sentence], verbose))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 150, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 91, in _parse_trees_output
    res.append(iter([self._make_tree('\n'.join(cur_lines))]))
  File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 339, in _make_tree
    return DependencyGraph(result, top_relation_label='root')
  File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 84, in __init__
    top_relation_label=top_relation_label,
  File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 328, in _parse
    assert cell_number == len(cells)
AssertionError

Possibly the problem is in how DependencyGraph reads the output, or the Stanford output itself is inconsistent.
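One way to picture that hypothesis, as a minimal sketch and not NLTK's actual code: the traceback ends in `assert cell_number == len(cells)`, which suggests that every output row is expected to split into the same number of cells. The helpers `count_cells` and `rows_are_consistent` and the sample rows are hypothetical. They assume whitespace splitting, and they assume the parser returns the fraction as one token that contains a space.

```python
def count_cells(line, separator=None):
    """Split one parser output row into cells (any whitespace by default)."""
    return len(line.split(separator))


def rows_are_consistent(rows, separator=None):
    """Return True if every row has as many cells as the first row."""
    expected = count_cells(rows[0], separator)
    return all(count_cells(row, separator) == expected for row in rows)


# Hypothetical rows in a word / tag / head / relation layout.
ok_rows = ["film\tNN\t2\tnsubj", "gives\tVBZ\t0\troot"]
# Assumption: the fraction comes back as ONE token that contains a space.
bad_rows = ["film\tNN\t2\tnsubj", "2 1/2-hour\tJJ\t4\tamod"]

print(rows_are_consistent(ok_rows))         # same cell count on every row
print(rows_are_consistent(bad_rows))        # the inner space adds a cell
print(rows_are_consistent(bad_rows, "\t"))  # splitting on tabs only
```

If the real output behaves like `bad_rows`, a check of this kind would fail only for sentences with such tokens, which would match the reports in this thread.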

More details on setting up NLTK + Stanford tools are at https://gist.github.com/alvations/e1df0ba227e542955a8a#stanford-parser

bug dependency parsing pleaseverify

All 31 comments

Hi @alvations, is there any update on this?
Thanks

@hoavt-54 I think the new interface from #1249 gives a quick way to check whether the Stanford side or the DependencyGraph code is causing the problem. I'm a bit busy today, but perhaps someone else can check and report back.

I missed this issue.

@dimazest Hi, I just ran into this error. How should I solve it?

@tesslocl What is your sentence? Have you tried using CoreNLP (nltk/parse/corenlp.py) instead?

@dimazest I just did, and got a different error.

Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 386, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 382, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 1198, in getresponse
    response.begin()
  File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\Admin\Anaconda3\lib\socket.py", line 576, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\util\retry.py", line 347, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 388, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\classifier\feature_extraction.py", line 473, in <module>
    print(feature_extraction(test_file_id))
  File "E:\classifier\feature_extraction.py", line 146, in feature_extraction
    for line in dep_parse:
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 279, in raw_parse_sents
    parsed_data = self.api_call(sentence, properties=default_properties)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 247, in api_call
    timeout=60,
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 535, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\adapters.py", line 499, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=60)

I only changed the StanfordDependencyParser part and left the rest of the code untouched. I checked the documentation and I believe the CoreNLP methods are the same. Is that right? I assume the parsing itself succeeded, because the error is on the next line, where I iterate over the parse results.

You need to start the CoreNLP server.

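from nltk.parse.corenlp import CoreNLPServer, CoreNLPParser

# Start a local CoreNLP server for the duration of the block.
with CoreNLPServer(port=9000) as server:
    parser = CoreNLPParser(url=server.url)
    parser.parse(...)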

Sorry for the missing documentation and the short reply; I'm typing on my phone.

@dimazest Thank you very much for your help and the quick reply. However, the error persists :(

Traceback (most recent call last):
  File "E:\classifier\feature_extraction.py", line 474, in <module>
    print(feature_extraction(test_file_id))
  File "E:\classifier\feature_extraction.py", line 135, in feature_extraction
    with CoreNLPServer(port=9000) as server:
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 81, in __init__
    try_port(port)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 35, in try_port
    sock.bind(('', port))
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

I tried searching the internet, but I don't understand how sockets work...

You can try another port (e.g. CoreNLPServer(port=9001)) or just CoreNLPServer(), which should pick an available port.
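The WinError 10048 in the traceback above is raised by a `sock.bind(('', port))` call, so it can help to check which ports are bindable before choosing one. The following is a minimal standard-library sketch; `port_is_free` and `first_free_port` are hypothetical helper names, not part of NLTK.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


def first_free_port(candidates):
    """Return the first bindable port from candidates, or None if all are taken."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None
```

For example, `first_free_port(range(9000, 9011))` returns a port that could then be passed as `CoreNLPServer(port=...)`. A port that is reported as taken may simply mean that a CoreNLP server from an earlier run is still alive.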

@dimazest I tried 9001 through 9010 and also empty parentheses, and this is what I get every time.

Traceback (most recent call last):
  File "E:\classifier\feature_extraction.py", line 509, in <module>
    print(feature_extraction(test_file_id))
  File "E:\classifier\feature_extraction.py", line 136, in feature_extraction
    with CoreNLPServer() as server:
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 170, in __enter__
    self.start()
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

Considering that I'm in China, I kept my VPN on while running it, but still no luck. What could my problem be here?

Do you have the CoreNLP .jars? The CoreNLP server has to run locally.

Are you able to run this example? https://github.com/nltk/nltk/pull/1249#pullrequestreview-18096061

I have the jars under the directory E:\classifier\stanford\stanford-corenlp-full-2016-10-31, and I assume these are the ones you are referring to.

stanford-corenlp-3.7.0.jar
stanford-corenlp-3.7.0-javadoc.jar
stanford-corenlp-3.7.0-models.jar
stanford-corenlp-3.7.0-sources.jar

And the directory has been set as the CLASSPATH environment variable.

I can run the example in the Windows command prompt, and this is the output.

Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.parse.corenlp import *
>>> global server
>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sent = 'the quick brown fox jumps over the lazy dog'
>>> parser.raw_parse(sent)
<list_iterator object at 0x000001F0EFED69E8>
>>> fox_parsed = next(parser.raw_parse(sent))
>>> fox_parsed.pretty_print()
                     ROOT
                      |
                      NP
       _______________|_________
      |                         NP
      |                _________|___
      |               |             PP
      |               |     ________|___
      NP              NP   |            NP
  ____|__________     |    |     _______|____
 DT   JJ    JJ   NN  NNS   IN   DT      JJ   NN
 |    |     |    |    |    |    |       |    |
the quick brown fox jumps over the     lazy dog

When I ran the line server.start(), a Windows security alert popped up. I thought the firewall was to blame, so I went to the firewall settings and allowed the Java(TM) Platform SE binary through. I expected that to solve the problem, but when I reopen my editor and run the code, I still get the same error.

Traceback (most recent call last):
  File "E:\classifier\feature_extraction.py", line 503, in <module>
    print(feature_extraction(test_file_id))
  File "E:\classifier\feature_extraction.py", line 130, in feature_extraction
    with CoreNLPServer() as server:
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 170, in __enter__
    self.start()
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

Once you start the server, can you access http://localhost:9000 in a browser?

You can also start the server yourself; see https://stanfordnlp.github.io/CoreNLP/corenlp-server.html

Once it is running and you can reach it from a browser, you should be able to use the parser.

parser = CoreNLPParser(url='http://localhost:9000')
# and so on

To check this, I ran the example again in the command prompt, but this time I get the familiar error.

Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.parse.corenlp import *
>>> global server
>>> server = CoreNLPServer()
>>> server.start()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
>>> server.start()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

... and I have no idea what happened in between. I did not change any configuration.

I have wanted to move this project to Linux for a long time, but on Linux I keep getting the error NLTK was unable to find ***.jar! Set the CLASSPATH environment variable. I set CLASSPATH in /etc/environment, /etc/profile and ~/.bash_profile, and I also tried copying the jars to $JAVA_HOME/lib/, but the problem persists. Should I open another issue?

Are you able to start the CoreNLP server from a terminal rather than from Python? See https://stanfordnlp.github.io/CoreNLP/corenlp-server.html for details.

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -mx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer

E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -Xmx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer

E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -mx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer

It seems not. Am I doing it right?
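The three attempts above all pass the CoreNLP directory itself to -cp. A directory entry on the Java classpath only exposes loose .class files in it, whereas a trailing "*" wildcard adds every .jar in that directory, which is consistent with the "Could not find or load main class" message. Below is a minimal sketch of building the command with the wildcard; `corenlp_server_command` is a hypothetical helper, and the flags are the ones quoted in this thread.

```python
import os


def corenlp_server_command(corenlp_home, port=9000, timeout_ms=15000, memory="4g"):
    """Build the java command line for starting the CoreNLP server."""
    # A bare directory would not put the jars on the classpath;
    # "<dir>/*" asks the JVM to include every .jar inside <dir>.
    classpath = os.path.join(corenlp_home, "*")
    return [
        "java",
        "-mx" + memory,
        "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]
```

Passing this list to `subprocess.Popen` without a shell hands the "*" to the JVM unexpanded, which is what the wildcard classpath needs.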

I finally got the Stanford modules working on Linux. With the same lines of code, the CoreNLP server seems to start without problems, but I get a different error on the line that iterates over the parse results.

Traceback (most recent call last):
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1152, in _send_request
    self.endheaders(body)
  File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1103, in endheaders
    self._send_output(message_body)
  File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/adapters.py", line 438, in send
    timeout=timeout
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /?properties=%7B%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cdepparse%22%2C+%22outputFormat%22%3A+%22json%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/tesslo/classifier/feature_extraction.py", line 503, in <module>
    print(feature_extraction(test_file_id))
  File "/media/tesslo/classifier/feature_extraction.py", line 142, in feature_extraction
    for line in dep_parse:
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 279, in raw_parse_sents
    parsed_data = self.api_call(sentence, properties=default_properties)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 247, in api_call
    timeout=60,
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 565, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 639, in send
    r = adapter.send(request, **kwargs)
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/adapters.py", line 502, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /?properties=%7B%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cdepparse%22%2C+%22outputFormat%22%3A+%22json%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused',))

FYI, the Linux and Windows installations share the same hardware.

Yes, there are two steps involved.
1) Start the CoreNLP Java process. There are two ways to do this; starting it manually with the java -Xmx4g -cp ... command is preferable. Did you succeed with that? You should be able to reach the server from a browser by visiting http://localhost:9000. Check the console output to see which port is in use.
2) Once the server is running, you can create the CoreNLP Python client: parser = CoreNLPParser(url='http://localhost:9000'). Since you started the CoreNLP Java server yourself, there is no need to start it from within the Python session (do not run server = CoreNLPServer()).

The error message you posted indicates that the CoreNLP Java server is not running.
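Before creating the client, the browser check can also be done from Python. This is a minimal standard-library sketch; `server_is_up` is a hypothetical helper, and the default URL is the one used in this thread.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:9000", timeout=5):
    """Return True if an HTTP request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unknown host, and similar failures.
        return False
```

If `server_is_up()` returns False, a "Connection refused" error from the parser is expected, and the Java process needs attention before any Python code does.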

I failed.

tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -Xmx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer

And http://localhost:9000 shows an ERR_CONNECTION_REFUSED error.

Have you tried using "*" as the classpath: java -mx4g -cp "*" ... ?

Hi, I seem to be facing this problem too. My sentences are:
'Maybe a 2 21/2 foot cord?', u'Cheaper quality than the charger part that connects the micro USB ...'
It looks to me like the '/' is causing this error.

@caisinong Have you tried the new CoreNLP interface? See my comments above.

@dimazest Sorry for the delay. I just did:

tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP -     Threads: 2
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000

Now I can visit http://localhost:9000, but when I go back to my editor, I still get this error on the line that starts the server.

Traceback (most recent call last):
  File "/media/tesslo/classifier/feature_extraction.py", line 503, in <module>
    print(feature_extraction(test_file_id))
  File "/media/tesslo/classifier/feature_extraction.py", line 130, in feature_extraction
    with CoreNLPServer() as server:
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 170, in __enter__
    self.start()
  File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

After you have started the server manually, you do not need to start it from your code.

Keep the server running and instantiate the parser.

parser = CoreNLPParser(url='http://localhost:9000')
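Since the original report concerns dependency parsing, here is a minimal sketch of the same client pattern for dependencies. It assumes that your NLTK version ships `CoreNLPDependencyParser` in `nltk.parse.corenlp` and that a server is already listening at the given URL; `make_dependency_parser` and `dependency_triples` are hypothetical helper names.

```python
def make_dependency_parser(url="http://localhost:9000"):
    """Create a client for an already-running CoreNLP server."""
    # Assumption: this class is available in the installed NLTK version.
    # It is imported lazily so the helper below works with any parser object.
    from nltk.parse.corenlp import CoreNLPDependencyParser
    return CoreNLPDependencyParser(url=url)


def dependency_triples(parser, sentence):
    """Return the (head, relation, dependent) triples of the first parse."""
    graph = next(parser.raw_parse(sentence))
    return list(graph.triples())
```

With the server running, `dependency_triples(make_dependency_parser(), sent)` can be tried on the sentence from the original report to see whether the "2 1/2-hour" token still causes trouble.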

@dimazest Problem solved. Thank you for your help and patience, as always!

I have had a similar experience. Starting the Stanford CoreNLP server from code is complicated and should only be done for testing purposes. Maybe we should not expose it to users.

Glad it worked out. Indeed, the server should be started outside of the Python code.

Patched and resolved by the new CoreNLP API =)

@dimazest Hi... Is using CoreNLP the only solution for the AssertionError when the text contains \ or /? I am using stanford-parser-full-2017-06-09.
The sentence used for parsing is: Iraqi security forces drove Islamic State fighters from the centre of a town just south of the militants\' main stronghold of Mosul on Saturday and reached within a few km (miles) of an airport on the edge of the city, a senior commander said.

@kavin26 Yes, please use nltk.parse.corenlp.CoreNLPParser.

@alvations Thank you very much :+1:
