Hello,
Here's the code:
>>> s = nltk.parse.corenlp.CoreNLPServer(path_to_jar='/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', path_to_models_jar='/usr/local/share/stanford/stanford-english-corenlp-2017-06-09-models.jar')
Traceback (most recent call last):
File "<input>", line 1, in <module>
s = nltk.parse.corenlp.CoreNLPServer(path_to_jar='/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', path_to_models_jar='/usr/local/share/stanford/stanford-english-corenlp-2017-06-09-models.jar')
File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 69, in __init__
key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
The `max` function is throwing this exception.
I think what's happening is that `key=lambda model_name: re.match(self._JAR, model_name)` returns `None` for every candidate because the pattern doesn't match anything. The keys are therefore all `None`, and `max` fails when it tries to compare them. I checked what `self._JAR` and `model_name` evaluate to and got the following:
>>> type(re.match(r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'))
<class 'NoneType'>
Thanks,
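The failure described above can be reproduced without NLTK. This is only a minimal sketch: `JAR_PATTERN` is copied from the `re.match` call shown above, and the two-element `candidates` list is an assumption made for illustration.

```python
import re

# Pattern taken from the re.match call shown above.
JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'

# Assumed candidate list: full paths, as reported in this thread.
candidates = [
    '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar',
    '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar',
]

# re.match only matches at the start of the string, so every key is None.
keys = [re.match(JAR_PATTERN, path) for path in candidates]

try:
    max(candidates, key=lambda path: re.match(JAR_PATTERN, path))
    error_message = None
except TypeError as exc:
    # max() has to compare the keys with '>', and None is not orderable.
    error_message = str(exc)

print(keys)
print(error_message)
```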
Is the `max` function supposed to be fed filenames and not full paths?
I tried passing `CoreNLPServer` the folder instead, and nltk couldn't find the jars for some reason.
Hi,
I think so. What are `jars`?
The jars are located at `/usr/local/share/stanford`.
Here's the `ls` output for the relevant directory:
(.env) ➜ stanford l
total 2794816
-rwxr-xr-x 1 adiep admin 5.3K Jun 21 17:20 CoreNLP-to-HTML.xsl*
-rw-r--r-- 1 adiep admin 1.6K Jun 21 17:20 LIBRARY-LICENSES
-rw-r--r-- 1 adiep admin 34K Jun 21 17:20 LICENSE.txt
-rw-r--r-- 1 adiep admin 769B Jun 21 17:20 Makefile
-rw-r--r-- 1 adiep admin 3.6K Jun 21 17:20 README.txt
-rw-r--r-- 1 adiep admin 2.3K Jun 21 17:20 SemgrexDemo.java
-rw-r--r-- 1 adiep admin 1.8K Jun 21 17:20 ShiftReduceDemo.java
-rw-r--r-- 1 adiep admin 5.7K Jun 21 17:20 StanfordCoreNlpDemo.java
-rw-r--r-- 1 adiep admin 195K Jun 21 17:20 StanfordDependenciesManual.pdf
-rw-r--r-- 1 adiep admin 3.9K Jun 21 17:20 build.xml
-rwxr-xr-x 1 adiep admin 871B Jun 21 17:20 corenlp.sh*
-rw-r--r-- 1 adiep admin 1.2M Jun 21 17:20 ejml-0.23-src.zip
-rw-r--r-- 1 adiep admin 207K Jun 21 17:20 ejml-0.23.jar
-rw-r--r-- 1 adiep admin 89B Jun 21 17:20 input.txt
-rw-r--r-- 1 adiep admin 19K Jun 21 17:20 input.txt.xml
-rw-r--r-- 1 adiep admin 54K Jun 21 17:20 javax.json-api-1.0-sources.jar
-rw-r--r-- 1 adiep admin 83K Jun 21 17:20 javax.json.jar
-rw-r--r-- 1 adiep admin 756K Jun 21 17:20 joda-time-2.9-sources.jar
-rw-r--r-- 1 adiep admin 615K Jun 21 17:20 joda-time.jar
-rw-r--r-- 1 adiep admin 192K Jun 21 17:20 jollyday-0.4.9-sources.jar
-rw-r--r-- 1 adiep admin 209K Jun 21 17:20 jollyday.jar
drwxr-xr-x 10 adiep admin 340B Jun 21 17:20 patterns/
-rw-r--r-- 1 adiep admin 5.3K Jun 21 17:20 pom.xml
-rw-r--r-- 1 adiep admin 1.3M Jun 21 17:20 protobuf.jar
-rw-r--r-- 1 adiep admin 31K Jun 21 17:20 slf4j-api.jar
-rw-r--r-- 1 adiep admin 10K Jun 21 17:20 slf4j-simple.jar
-rw-r--r-- 1 adiep admin 9.6M Jun 21 17:20 stanford-corenlp-3.8.0-javadoc.jar
-rw-r--r-- 1 adiep admin 346M Jun 21 17:20 stanford-corenlp-3.8.0-models.jar
-rw-r--r-- 1 adiep admin 5.0M Jun 21 17:20 stanford-corenlp-3.8.0-sources.jar
-rw-r--r-- 1 adiep admin 7.6M Jun 21 17:20 stanford-corenlp-3.8.0.jar
-rw-r--r-- 1 adiep admin 991M Jun 21 17:37 stanford-english-corenlp-2017-06-09-models.jar
drwxr-xr-x 5 adiep admin 170B Jun 21 17:20 sutime/
drwxr-xr-x 6 adiep admin 204B Jun 21 17:20 tokensregex/
-rw-r--r-- 1 adiep admin 656K Jun 21 17:20 xom-1.2.10-src.jar
-rw-r--r-- 1 adiep admin 306K Jun 21 17:20 xom.jar
I don't get why it's failing to find the jar in the folder.
Could you set the `CLASSPATH` environment variable before starting the server?
os.environ['CLASSPATH'] = '/usr/local/share/stanford'
Another solution could be:
os.environ['STANFORD_PARSER'] = '/usr/local/share/stanford'
os.environ['STANFORD_MODELS'] = '/usr/local/share/stanford'
In my local setup I set all three variables.
I added all three to my zsh and I am still running into the same issues.
Also, when I run tox tests, I see the same exact errors.
I don't know how much help this is, but I added a line to print the list of jar files that it compares.
['/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar']
Traceback (most recent call last):
File "<input>", line 1, in <module>
s = nltk.parse.corenlp.CoreNLPServer()
File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 70, in __init__
key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
`re.match` is failing to match `_JAR` against the jar files it found.
```
>>> import nltk
>>> import re
>>> s = nltk.parse.corenlp.CoreNLPServer()
['/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s = nltk.parse.corenlp.CoreNLPServer()
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 70, in __init__
    key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
>>> jar = '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'
>>> r_jar = nltk.parse.corenlp.CoreNLPServer._JAR
>>> re.match(r_jar, jar)
```
It seems that `re.match` is being fed the full path. It only looks for the jar file name at the beginning of the string, so it always fails. I think it is supposed to be fed just the names of the jar files.
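To make the anchoring behaviour concrete, here is a small sketch comparing three ways of applying the pattern to the path from this thread. `JAR_PATTERN` is copied from the earlier `re.match` call; using `os.path.basename` is just one possible way to feed the pattern a bare file name, not something NLTK is confirmed to do.

```python
import os
import re

JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'
full_path = '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'

# re.match anchors at position 0, where the string starts with '/usr/...'.
anchored = re.match(JAR_PATTERN, full_path)

# Stripping the directory first lets re.match see only the file name.
by_name = re.match(JAR_PATTERN, os.path.basename(full_path))

# re.search scans the whole string, so the full path is fine.
anywhere = re.search(JAR_PATTERN, full_path)

print(anchored)
print(by_name.groups())
print(anywhere.group())
```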
Alright, the fix is pretty simple.
I changed `re.match(...)` to `re.search(...).group()`, so instead of only looking at the start of the string, it looks everywhere. The key is then a string rather than `None`, which `max` can compare.
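A variation on that fix, sketched under assumptions: the helper names `jar_version` and `newest_jar` are hypothetical and are not part of NLTK. Instead of comparing the matched strings, it compares the version numbers as integer tuples, which avoids lexical ordering surprises such as `'3.8.0'` sorting above `'3.10.1'`.

```python
import re

JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'

def jar_version(path):
    """Return the (major, minor, patch) tuple found in path (hypothetical helper)."""
    found = re.search(JAR_PATTERN, path)
    if found is None:
        # Paths that do not look like a CoreNLP jar sort below every real version.
        return (-1, -1, -1)
    return tuple(int(part) for part in found.groups())

def newest_jar(paths):
    """Pick the path with the highest version (hypothetical helper)."""
    return max(paths, key=jar_version)

print(jar_version('/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'))
```

With this key, `newest_jar` can be given full paths or bare file names, and a non-matching entry no longer raises a `TypeError`.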
Ran into another issue.
I think this is connected to the issue I had before with CoreNLPParser going through the proxy server.
>>> s = nltk.parse.corenlp.CoreNLPServer()
>>> s.url
'http://localhost:9000'
>>> s.start()
Traceback (most recent call last):
File "<input>", line 1, in <module>
s.start()
File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 149, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
OK, so it was the proxy problem. I found that you can set `NO_PROXY='localhost'`.
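If you would rather set this from Python than from the shell, a sketch along these lines should work. The helper name `bypass_proxy_for_localhost` is hypothetical, and it must run before any HTTP request is made. It sets both spellings of the variable because tools differ in which one they read.

```python
import os

def bypass_proxy_for_localhost():
    """Add 'localhost' to NO_PROXY / no_proxy without dropping existing hosts."""
    for name in ('NO_PROXY', 'no_proxy'):
        existing = os.environ.get(name, '')
        hosts = [host for host in existing.split(',') if host]
        if 'localhost' not in hosts:
            hosts.append('localhost')
        os.environ[name] = ','.join(hosts)

bypass_proxy_for_localhost()
print(os.environ['NO_PROXY'])
```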
Now I have a problem with the server refusing connections.
I am not sure why, but when I init the server object, it starts a server at port 9000.
Edit: Turns out I had a corenlp server floating in the background. I just killed it and now it's pinging port 9000.
```
>>> import nltk
>>> s = nltk.parse.corenlp.CoreNLPServer(verbose=True)
[Found stanford-corenlp-(\d+).(\d+).(\d+).jar: /usr/local/share/stanford/stanford-corenlp-3.8.0.jar]
[Found stanford-corenlp-(\d+).(\d+).(\d+).jar: /usr/local/share/stanford/stanford-corenlp-3.8.0.jar]
[Found stanford-corenlp-(\d+).(\d+).(\d+)-models.jar: /usr/local/share/stanford/stanford-corenlp-3.8.0-models.jar]
>>> s.start()
[Found java: /usr/bin/java]
[Found java: /usr/bin/java]
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
Traceback (most recent call last):
File "<input>", line 1, in <module>
s.start()
File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 150, in start
'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
```
Good lord, now it's working.
So overall, I had to change `re.match(...)` to `re.search(...).group()` and kill a stray CoreNLP server running in the background.
Maybe there should be some code to detect whether other CoreNLP servers are running? I don't know if it's even worth doing.
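One cheap way to detect a leftover server, sketched here as an assumption rather than anything NLTK does: check whether something is already listening on the port before starting a new server. The helper name `port_in_use` is hypothetical. It only tells you that some process holds the port, not that the process is CoreNLP.

```python
import socket

def port_in_use(port, host='127.0.0.1', timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0

if port_in_use(9000):
    print('Something is already listening on port 9000.')
else:
    print('Port 9000 looks free.')
```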
It seems that if you create server objects without a `with` block and never stop them, Python deletes the object but not the actual server process. I was starting servers in my interpreter and forgetting to run `s.stop()`.
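The general pattern for avoiding orphaned child processes can be sketched without CoreNLP. Here `managed_process` is a hypothetical helper and the sleeping Python child is only a stand-in for a long-running server. The point is that the `finally` clause runs even if the body raises, which plays the same role as remembering to call `s.stop()`.

```python
import contextlib
import subprocess
import sys

@contextlib.contextmanager
def managed_process(command):
    """Start a child process and make sure it is stopped on exit."""
    process = subprocess.Popen(command)
    try:
        yield process
    finally:
        # Runs on normal exit and on exceptions alike.
        process.terminate()
        process.wait()

# Stand-in for a long-running server: a Python child that just sleeps.
sleeper = [sys.executable, '-c', 'import time; time.sleep(60)']
with managed_process(sleeper) as process:
    running_inside = process.poll() is None
stopped_after = process.poll() is not None

print(running_inside, stopped_after)
```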
We need to somehow document how the Stanford CoreNLP API works so that we tell people not to start the server inside Python.
Alternatively, I think if we use the `__all__` trick, maybe we can allow limited access to what's inside `nltk.parse.corenlp` and prevent users from accessing `CoreNLPServer`. The lowest-level object that a user should use is `GenericCoreNLPParser`.
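One caveat worth keeping in mind: `__all__` only controls what `from module import *` exports. It does not stop anyone from importing a name explicitly. A small sketch with a throwaway module shows this. The module name `corenlp_demo` and the classes `GenericParser` and `Server` are hypothetical stand-ins, not NLTK code.

```python
import sys
import types

# Build a throwaway module in memory that hides Server from star-imports.
demo = types.ModuleType('corenlp_demo')
exec(
    "__all__ = ['GenericParser']\n"
    "class GenericParser: pass\n"
    "class Server: pass\n",
    demo.__dict__,
)
sys.modules['corenlp_demo'] = demo

# A star-import only picks up the names listed in __all__.
star_namespace = {}
exec('from corenlp_demo import *', star_namespace)

# An explicit import still works, so __all__ is a hint, not a lock.
from corenlp_demo import Server

print('GenericParser' in star_namespace, 'Server' in star_namespace)
```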
BTW, @dimazest, is `CoreNLPServer` used only in the test environment?
Yes, it's used only in tests.
Hello - I think I am having an issue related to the one noted here.
I recently started using this package and am using the StanfordNERTagger, which returns a deprecation error.
/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py:183: DeprecationWarning:
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.tag.corenlp.CoreNLPPOSTagger or nltk.tag.corenlp.CoreNLPNERTagger instead.
Note that I think there is a typo in this warning. I had to import from `nltk.tag.stanford` to find the `CoreNLPNERTagger`.
So I've attempted to switch to the new function and I cannot seem to get it to work.
Previously I had:
stanfordClassifier = '/root/stanford-ner-2017-06-09/classifiers/english.muc.7class.distsim.crf.ser.gz'
stanfordNerPath = '/root/stanford-ner-2017-06-09/stanford-ner.jar'
st = StanfordNERTagger(stanfordClassifier, stanfordNerPath, encoding='utf8')
tokens = nltk.tokenize.word_tokenize(text)
ner_tags = st.tag(tokens)
Which I have now changed to
stanfordClassifier = '/root/stanford-ner-2017-06-09/classifiers/english.muc.7class.distsim.crf.ser.gz'
stanfordNerPath = '/root/stanford-ner-2017-06-09/stanford-ner.jar'
with CoreNLPServer(stanfordNerPath, stanfordClassifier) as server:
st = CoreNLPNERTagger(url=server.url)
tokens = nltk.tokenize.word_tokenize(text)
ner_tags = st.tag(tokens)
However, this code returns this error: CoreNLPServerError: Could not connect to the server.
I can't find documentation anywhere for how I am supposed to call `CoreNLPNERTagger` and how that call differs from the deprecated `StanfordNERTagger`.
Any help is much appreciated. Thanks!
Hi, I'm sorry that there is no documentation. The API has changed; here is what you need to do. Refer to #1510, which contains a long discussion on how to start a server.
The main change: NLTK does not start the CoreNLP server; you need to start it yourself. Refer to https://stanfordnlp.github.io/CoreNLP/corenlp-server.html for a detailed explanation, but the command should be something like:
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Open http://localhost:9000/ to make sure that the server is running.
Now you are ready to use the NLTK CoreNLP client:
tagger = CoreNLPNERTagger(url='http://localhost:9000')
tokens = tagger.tag(text)
Please let me know if it works.
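The "open the URL to make sure the server is running" step can also be scripted before creating the tagger. This is only a sketch using the standard library: `server_is_up` is a hypothetical helper, and the default URL is the one used in this thread. The empty `ProxyHandler` is there so that a configured HTTP proxy does not intercept requests to localhost.

```python
import urllib.error
import urllib.request

def server_is_up(url='http://localhost:9000', timeout=5.0):
    """Return True if an HTTP GET on url answers with status 200."""
    # An empty ProxyHandler disables proxies for this opener only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all count as "not up".
        return False

if server_is_up():
    print('CoreNLP server is reachable.')
else:
    print('No server answered; start it with the java command first.')
```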
Yes! This works. Thank you very much for the help.
Thanks @dimazest, @alvations
For some reason, I can't connect to the server. Can anyone help me out? Here's the code I'm running.
from nltk.parse.corenlp import CoreNLPServer
server = CoreNLPServer("stanford-corenlp-4.0.0.jar","stanford-corenlp-4.0.0-models.jar",verbose=True)
server.start()
from nltk.parse.corenlp import CoreNLPParser
parser = CoreNLPParser()
parse = next(parser.raw_parse("I put the book in the box on the table."))
server.stop()
I am running it in a file inside the folder that I downloaded for CoreNLP so that I don't have to worry about abs paths or anything.
Here's what it outputs:
[Found java: /usr/bin/java]
[Found java: /usr/bin/java]
Traceback (most recent call last):
File "/Users/benstevens/Desktop/Bernstein2/Ben/stanford-corenlp-4.0.0/BenTest.py", line 4, in <module>
server.start()
File "/Users/benstevens/Library/Python/3.8/lib/python/site-packages/nltk/parse/corenlp.py", line 153, in start
raise CoreNLPServerError("Could not connect to the server.")
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
I would greatly appreciate any help.