A user has reported that NLTK's Stanford DependencyParser API raises an AssertionError for this sentence:
for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time .
Code:
>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> sent = 'for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time . '
>>> dep_parser.raw_parse(sent)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 132, in raw_parse
return next(self.raw_parse_sents([sentence], verbose))
File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 150, in raw_parse_sents
return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 91, in _parse_trees_output
res.append(iter([self._make_tree('\n'.join(cur_lines))]))
File "/Library/Python/2.7/site-packages/nltk/parse/stanford.py", line 339, in _make_tree
return DependencyGraph(result, top_relation_label='root')
File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 84, in __init__
top_relation_label=top_relation_label,
File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 328, in _parse
assert cell_number == len(cells)
AssertionError
The cause might be how DependencyGraph reads the output, or the Stanford output itself may be inconsistent.
More details on the setup for NLTK + the Stanford tools are at https://gist.github.com/alvations/e1df0ba227e542955a8a#stanford-parser
Hi @alvations Any updates on this?
Thanks
@hoavt-54 I think there's a quick way to check whether the problem is on the Stanford side or in the DependencyGraph code, using the new interface from #1249. I'll be a little busy today, but perhaps someone else can check it out and get back on this.
I can have a look, somehow I've missed this issue.
@dimazest Hello, I just ran into this error. How should I fix this?
@tesslocl what's your sentence? Did you try to use CoreNLP (nltk/parse/corenlp.py) instead?
@dimazest I just did and I ran into another error:
Traceback (most recent call last):
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 386, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 382, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 1198, in getresponse
response.begin()
File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "C:\Users\Admin\Anaconda3\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\Admin\Anaconda3\lib\socket.py", line 576, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\adapters.py", line 423, in send
timeout=timeout
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\util\retry.py", line 347, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\packages\six.py", line 686, in reraise
raise value
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 388, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 308, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\classifier\feature_extraction.py", line 473, in <module>
print(feature_extraction(test_file_id))
File "E:\classifier\feature_extraction.py", line 146, in feature_extraction
for line in dep_parse:
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 279, in raw_parse_sents
parsed_data = self.api_call(sentence, properties=default_properties)
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 247, in api_call
timeout=60,
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 535, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "C:\Users\Admin\Anaconda3\lib\site-packages\requests\adapters.py", line 499, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=60)
I only changed the StanfordDependencyParser part and left the rest of the code unchanged. I've checked the docs and I suppose the methods in CoreNLP are the same, am I right? Parsing itself seems to succeed, because the error is raised on the next line, where I loop over the parse results.
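One reason the error shows up on the loop line rather than on the parse call: judging from the traceback (the failure is raised inside raw_parse_sents while the for loop runs), the results are yielded lazily, so the request to the server only happens during iteration. A minimal sketch of that deferred-failure pattern (parse_sents below is a hypothetical stand-in for the NLTK generator, not its real code):

```python
def parse_sents(sentences):
    """Generator: the (possibly failing) work happens only when iterated."""
    for sentence in sentences:
        # In the real parser this would be an HTTP call to the server;
        # here we simulate a failure for any empty sentence.
        if not sentence:
            raise RuntimeError("simulated server error")
        yield sentence.split()

dep_parse = parse_sents(["the quick brown fox", ""])  # no error raised yet

results = []
try:
    for parsed in dep_parse:  # the error surfaces here, on the loop line
        results.append(parsed)
except RuntimeError:
    pass

print(results)  # only the first sentence was parsed before the failure
```

So a traceback on the loop line does not mean the loop itself is wrong; the request is simply being made at that point.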
You need to start a CoreNLP server; try:
with CoreNLPServer(port=9000) as server:
    parser = CoreNLPParser(url=server.url)
    parser.parse(...)
I'm sorry for the missing documentation, and for a short reply, as I'm typing on my phone.
@dimazest I really appreciate your help and quick replies. But the errors persist :(
Traceback (most recent call last):
File "E:\classifier\feature_extraction.py", line 474, in <module>
print(feature_extraction(test_file_id))
File "E:\classifier\feature_extraction.py", line 135, in feature_extraction
with CoreNLPServer(port=9000) as server:
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 81, in __init__
try_port(port)
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 35, in try_port
sock.bind(('', port))
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
I tried googling it but I don't know how sockets work...
You can try another port, e.g. CoreNLPServer(port=9001), or just CoreNLPServer(), in which case a free port should be chosen.
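For what it's worth, asking the OS for port 0 is the standard way a free port gets chosen, and presumably CoreNLPServer() does something along these lines when no port is given. A stdlib-only sketch:

```python
import socket

# Bind to port 0: the OS assigns an unused ephemeral port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.bind(("", 0))
    free_port = sock.getsockname()[1]

print(free_port)  # some OS-chosen port number
```

This also explains the earlier WinError 10048: binding to a fixed port like 9000 fails when another process (for example, an already-running server) is still holding it.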
@dimazest I have tried 9001 up to 9010, and also empty parentheses, and this is what I get every time:
Traceback (most recent call last):
File "E:\classifier\feature_extraction.py", line 509, in <module>
print(feature_extraction(test_file_id))
File "E:\classifier\feature_extraction.py", line 136, in feature_extraction
with CoreNLPServer() as server:
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 170, in __enter__
self.start()
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
I am in China, so I have kept my VPN on while running it, but still no luck. What might be the problem here?
Do you have the CoreNLP .jars? You need a CoreNLP server running locally.
Can you run this example https://github.com/nltk/nltk/pull/1249#pullrequestreview-18096061
I have the jars under the directory E:\classifier\stanford\stanford-corenlp-full-2016-10-31, and I suppose these are the ones you're referring to:
stanford-corenlp-3.7.0.jar
stanford-corenlp-3.7.0-javadoc.jar
stanford-corenlp-3.7.0-models.jar
stanford-corenlp-3.7.0-sources.jar
And the directory has been added to the CLASSPATH environment variable.
I can run the example in the Windows command prompt, and this is the output:
Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.parse.corenlp import *
>>> global server
>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sent = 'the quick brown fox jumps over the lazy dog'
>>> parser.raw_parse(sent)
<list_iterator object at 0x000001F0EFED69E8>
>>> fox_parsed = next(parser.raw_parse(sent))
>>> fox_parsed.pretty_print()
ROOT
|
NP
_______________|_________
| NP
| _________|___
| | PP
| | ________|___
NP NP | NP
____|__________ | | _______|____
DT JJ JJ NN NNS IN DT JJ NN
| | | | | | | | |
the quick brown fox jumps over the lazy dog
And when I ran the line server.start(), a Windows security alert popped up. I'd been assuming it was the firewall's fault all this time, so I went to the firewall settings and allowed Java(TM) Platform SE binary through. I thought this would solve the problem, but when I reopen the editor and run the code I still get the same error:
Traceback (most recent call last):
File "E:\classifier\feature_extraction.py", line 503, in <module>
print(feature_extraction(test_file_id))
File "E:\classifier\feature_extraction.py", line 130, in feature_extraction
with CoreNLPServer() as server:
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 170, in __enter__
self.start()
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
Once you've started the server, can you access http://localhost:9000 in your browser?
You can also start the server yourself; refer to https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
Once it's running and you can access it via the browser, you should be able to use the parser:
parser = CoreNLPParser(url='http://localhost:9000')
# and so on
To find that out, I ran the example in the command prompt again, but this time I got the familiar error:
Python 3.5.3 |Anaconda custom (64-bit)| (default, Feb 22 2017, 21:28:42) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.parse.corenlp import *
>>> global server
>>> server = CoreNLPServer()
>>> server.start()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
>>> server.start()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Admin\Anaconda3\lib\site-packages\nltk\parse\corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
... And I have no idea what happened in between. I didn't change any configuration.
I wanted to move this project to Linux long ago, but on Linux I keep getting the error NLTK was unable to find ***.jar! Set the CLASSPATH environment variable. I've set CLASSPATH in /etc/environment, /etc/profile, and ~/.bash_profile, and even tried copying those jars to $JAVA_HOME/lib/, but the problem sticks. Should I open another issue?
Are you able to start a CoreNLP server from a terminal (not from Python)? Check https://stanfordnlp.github.io/CoreNLP/corenlp-server.html for more details:
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -mx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -Xmx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
E:\classifier\stanford\stanford-corenlp-full-2016-10-31>java -mx4g -cp "E:\classifier\stanford\stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
Seems not. Am I doing it right?
I finally got those Stanford modules to work on Linux. With the same lines of code, the CoreNLP server seems to start without problems, but I get other errors on the line where I loop over the parse results.
Traceback (most recent call last):
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1107, in request
self._send_request(method, url, body, headers)
File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1152, in _send_request
self.endheaders(body)
File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 1103, in endheaders
self._send_output(message_body)
File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 934, in _send_output
self.send(msg)
File "/home/tesslo/anaconda3/lib/python3.5/http/client.py", line 877, in send
self.connect()
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 166, in connect
conn = self._new_conn()
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/adapters.py", line 438, in send
timeout=timeout
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /?properties=%7B%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cdepparse%22%2C+%22outputFormat%22%3A+%22json%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/tesslo/classifier/feature_extraction.py", line 503, in <module>
print(feature_extraction(test_file_id))
File "/media/tesslo/classifier/feature_extraction.py", line 142, in feature_extraction
for line in dep_parse:
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 279, in raw_parse_sents
parsed_data = self.api_call(sentence, properties=default_properties)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 247, in api_call
timeout=60,
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 565, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/requests/adapters.py", line 502, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /?properties=%7B%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cdepparse%22%2C+%22outputFormat%22%3A+%22json%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f110a9c4940>: Failed to establish a new connection: [Errno 111] Connection refused',))
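Incidentally, the URL-encoded blob in that error message is just the request's properties parameter. Decoding it with the stdlib shows the request itself is well-formed and the failure is purely at the connection level:

```python
import json
from urllib.parse import unquote_plus

# The encoded properties string copied from the error message above.
encoded = (
    "%7B%22annotators%22%3A+%22tokenize%2Cpos%2Clemma%2Cssplit%2Cdepparse%22"
    "%2C+%22outputFormat%22%3A+%22json%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D"
)
properties = json.loads(unquote_plus(encoded))
print(properties["annotators"])  # tokenize,pos,lemma,ssplit,depparse
```

So the client is asking for dependency parsing with JSON output; nothing in the request would cause "Connection refused", which points at the server side.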
FYI Linux and Windows share the same hardware.
Ok, there are two steps involved:
1) Start a CoreNLP Java process. There are two ways; I suggest you start it manually, using the java -Xmx4g -cp ... command. Did you succeed with that? You should be able to access the server via a browser at http://localhost:9000. The console output tells you which port is being used.
2) Once the server is running, you can create a CoreNLP Python client: parser = CoreNLPParser(url='http://localhost:9000'). Since you've started the CoreNLP Java server yourself, you don't need to start it within the Python session (don't run server = CoreNLPServer()).
The error messages you posted suggest that the CoreNLP Java server is not running.
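A quick way to confirm that from Python before creating the parser is a plain TCP reachability check (a stdlib-only sketch; server_is_listening is a hypothetical helper, and 9000 is CoreNLP's default port):

```python
import socket

def server_is_listening(host="localhost", port=9000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False

# Only build the parser once the Java server is confirmed reachable:
if server_is_listening():
    print("server is up; safe to do CoreNLPParser(url='http://localhost:9000')")
else:
    print("nothing on port 9000; start the CoreNLP Java process first")
```

This distinguishes "server not running" (connection refused, as in the tracebacks above) from "server running but slow" (read timeouts).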
It failed:
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -Xmx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -serverProperties StanfordCoreNLP-chinese.properties -port 9000 -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 15000
Error: Could not find or load main class edu.stanford.nlp.pipeline.StanfordCoreNLPServer
And http://localhost:9000 shows an ERR_CONNECTION_REFUSED error.
Did you try it with "*" as the classpath, i.e. java -mx4g -cp "*" ...?
Hi there, it seems I've also run into this problem. My sentences are:
'Maybe, a 2 21/2 foot cord?', u'And its of a cheaper quality than the part of the charger that the micro usb plugs into...'
I tried to figure it out, and it seems the '/' is what causes this error.
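That is consistent with the assertion in dependencygraph.py: the parser output is split into cells line by line, and every line is expected to have the same number of cells. A simplified illustration (assuming tab-separated CoNLL-style output; cells_consistent is a hypothetical helper, not NLTK code) of the invariant behind `assert cell_number == len(cells)`:

```python
# Each line of well-formed CoNLL-style output has the same number of
# tab-separated cells; DependencyGraph asserts on that invariant.
good = "1\tMaybe\tRB\n2\tcord\tNN"
bad = "1\tMaybe\tRB\n2\t2 1/2\n3\tcord\tNN"  # malformed middle line

def cells_consistent(output):
    """True if every line splits into the same number of cells."""
    rows = [line.split("\t") for line in output.splitlines()]
    return len({len(row) for row in rows}) == 1

print(cells_consistent(good))  # True
print(cells_consistent(bad))   # False: this is when the assertion fires
```

So a token that confuses the tokenizer or the tree-to-CoNLL conversion (such as one containing '/') can yield a line with the wrong cell count, and the AssertionError follows.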
@caisinong have you tried using the new CoreNLP interface? See my comments above.
@dimazest Sorry for the delay. I did just now:
tesslo@TLU:/media/tesslo/classifier/stanford/stanford-corenlp-full-2016-10-31$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 2
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
I'm able to visit http://localhost:9000 now, but back in the editor I'm still getting this error on the line that starts the server:
Traceback (most recent call last):
File "/media/tesslo/classifier/feature_extraction.py", line 503, in <module>
print(feature_extraction(test_file_id))
File "/media/tesslo/classifier/feature_extraction.py", line 130, in feature_extraction
with CoreNLPServer() as server:
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 170, in __enter__
self.start()
File "/home/tesslo/anaconda3/lib/python3.5/site-packages/nltk/parse/corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
Once you've started the server manually, you don't need to start it in your code.
Keep the server running and instantiate the parser:
parser = CoreNLPParser(url='http://localhost:9000')
@dimazest The problem was solved. Thank you for your help and patience all this time!
I have had a similar experience. Starting the Stanford CoreNLP server from within the code is messy and should only be used for testing purposes. Maybe we should somehow not expose that to the user.
I'm glad that things are working. Indeed, the server should be started outside of Python code.
Patched and resolved by the new CoreNLP API =)
@dimazest Hi... if the text contains \ or /, is the solution for the AssertionError to use CoreNLP only? I'm using stanford-parser-full-2017-06-09.
The sentence used for parsing was: Iraqi security forces drove Islamic State fighters from the centre of a town just south of the militants\' main stronghold of Mosul on Saturday and reached within a few km (miles) of an airport on the edge of the city, a senior commander said.
@kavin26 Yes, please use nltk.parse.corenlp.CoreNLPParser.
@alvations thank you so much :+1: