Nltk: corenlp.py CoreNLPServer throws TypeError exception

Created on 22 Jun 2017 · 21 comments · Source: nltk/nltk

Hello,

Here's the code:

>>> s = nltk.parse.corenlp.CoreNLPServer(path_to_jar='/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', path_to_models_jar='/usr/local/share/stanford/stanford-english-corenlp-2017-06-09-models.jar')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s = nltk.parse.corenlp.CoreNLPServer(path_to_jar='/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', path_to_models_jar='/usr/local/share/stanford/stanford-english-corenlp-2017-06-09-models.jar')
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 69, in __init__
    key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'

The max function is throwing this exception.

I think what's happening is that the key function, lambda model_name: re.match(self._JAR, model_name), returns None because the pattern doesn't match anything. So every key is None, and max fails because it cannot compare None with None. I found that self._JAR and model_name evaluate to the following:

>>> type(re.match(r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'))
<class 'NoneType'>
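To make the failure concrete, here is a minimal reproduction in plain Python, without NLTK. It assumes the _JAR pattern shown above and the two paths printed later in this thread; the variable names are illustrative. On Python 3, max raises the same TypeError as in the traceback.

```python
import re

# Pattern reported above for CoreNLPServer._JAR (assumed unchanged here).
JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'

# The candidate list NLTK built in this report: full paths, not file names.
jars = [
    '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar',
    '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar',
]

# re.match only tries to match at position 0, so every key is None.
keys = [re.match(JAR_PATTERN, path) for path in jars]
print(keys)  # [None, None]

# max() compares the keys with '>', which None does not support.
error_message = None
try:
    max(jars, key=lambda path: re.match(JAR_PATTERN, path))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```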

Thanks,

Labels: pleaseverify, resolved, stanford api


f0lie


All 21 comments

Is the max function supposed to be fed file names rather than full paths?

I tried passing the folder to CoreNLPServer instead, but NLTK couldn't find the jar for some reason.

f0lie on 22 Jun 2017

Hi,

I think so. Where are the jars?

dimazest on 23 Jun 2017

The jars are located at `/usr/local/share/stanford`.

Here's the ls output for that directory:

(.env) ➜  stanford l
total 2794816
-rwxr-xr-x   1 adiep  admin   5.3K Jun 21 17:20 CoreNLP-to-HTML.xsl*
-rw-r--r--   1 adiep  admin   1.6K Jun 21 17:20 LIBRARY-LICENSES
-rw-r--r--   1 adiep  admin    34K Jun 21 17:20 LICENSE.txt
-rw-r--r--   1 adiep  admin   769B Jun 21 17:20 Makefile
-rw-r--r--   1 adiep  admin   3.6K Jun 21 17:20 README.txt
-rw-r--r--   1 adiep  admin   2.3K Jun 21 17:20 SemgrexDemo.java
-rw-r--r--   1 adiep  admin   1.8K Jun 21 17:20 ShiftReduceDemo.java
-rw-r--r--   1 adiep  admin   5.7K Jun 21 17:20 StanfordCoreNlpDemo.java
-rw-r--r--   1 adiep  admin   195K Jun 21 17:20 StanfordDependenciesManual.pdf
-rw-r--r--   1 adiep  admin   3.9K Jun 21 17:20 build.xml
-rwxr-xr-x   1 adiep  admin   871B Jun 21 17:20 corenlp.sh*
-rw-r--r--   1 adiep  admin   1.2M Jun 21 17:20 ejml-0.23-src.zip
-rw-r--r--   1 adiep  admin   207K Jun 21 17:20 ejml-0.23.jar
-rw-r--r--   1 adiep  admin    89B Jun 21 17:20 input.txt
-rw-r--r--   1 adiep  admin    19K Jun 21 17:20 input.txt.xml
-rw-r--r--   1 adiep  admin    54K Jun 21 17:20 javax.json-api-1.0-sources.jar
-rw-r--r--   1 adiep  admin    83K Jun 21 17:20 javax.json.jar
-rw-r--r--   1 adiep  admin   756K Jun 21 17:20 joda-time-2.9-sources.jar
-rw-r--r--   1 adiep  admin   615K Jun 21 17:20 joda-time.jar
-rw-r--r--   1 adiep  admin   192K Jun 21 17:20 jollyday-0.4.9-sources.jar
-rw-r--r--   1 adiep  admin   209K Jun 21 17:20 jollyday.jar
drwxr-xr-x  10 adiep  admin   340B Jun 21 17:20 patterns/
-rw-r--r--   1 adiep  admin   5.3K Jun 21 17:20 pom.xml
-rw-r--r--   1 adiep  admin   1.3M Jun 21 17:20 protobuf.jar
-rw-r--r--   1 adiep  admin    31K Jun 21 17:20 slf4j-api.jar
-rw-r--r--   1 adiep  admin    10K Jun 21 17:20 slf4j-simple.jar
-rw-r--r--   1 adiep  admin   9.6M Jun 21 17:20 stanford-corenlp-3.8.0-javadoc.jar
-rw-r--r--   1 adiep  admin   346M Jun 21 17:20 stanford-corenlp-3.8.0-models.jar
-rw-r--r--   1 adiep  admin   5.0M Jun 21 17:20 stanford-corenlp-3.8.0-sources.jar
-rw-r--r--   1 adiep  admin   7.6M Jun 21 17:20 stanford-corenlp-3.8.0.jar
-rw-r--r--   1 adiep  admin   991M Jun 21 17:37 stanford-english-corenlp-2017-06-09-models.jar
drwxr-xr-x   5 adiep  admin   170B Jun 21 17:20 sutime/
drwxr-xr-x   6 adiep  admin   204B Jun 21 17:20 tokensregex/
-rw-r--r--   1 adiep  admin   656K Jun 21 17:20 xom-1.2.10-src.jar
-rw-r--r--   1 adiep  admin   306K Jun 21 17:20 xom.jar

I don't understand why it fails to find the jar in the folder.

f0lie on 23 Jun 2017

Could you set the CLASSPATH environment variable before starting the server?

os.environ['CLASSPATH'] = '/usr/local/share/stanford'

Another solution could be:

os.environ['STANFORD_PARSER'] = '/usr/local/share/stanford'
os.environ['STANFORD_MODELS'] = '/usr/local/share/stanford'

In my local setup, I set all three variables.

dimazest on 25 Jun 2017

I added all three to my zsh config, and I'm still running into the same issue.

f0lie on 27 Jun 2017

Also, when I run the tox tests, I see exactly the same errors.

f0lie on 27 Jun 2017

I don't know how much this helps, but I added a line to print the list of jar files it compares against:

['/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s = nltk.parse.corenlp.CoreNLPServer()
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 70, in __init__
    key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'

f0lie on 27 Jun 2017

re.match is failing to match _JAR with the found jar files.
```
>>> import nltk
>>> import re
>>> s = nltk.parse.corenlp.CoreNLPServer()
['/usr/local/share/stanford/stanford-corenlp-3.8.0.jar', '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s = nltk.parse.corenlp.CoreNLPServer()
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 70, in __init__
    key=lambda model_name: re.match(self._JAR, model_name)
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
>>> jar = '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'
>>> r_jar = nltk.parse.corenlp.CoreNLPServer._JAR
>>> re.match(r_jar, jar)
>>>
```

f0lie on 27 Jun 2017

It seems that re.match is being fed the full path. Because re.match only matches at the beginning of the string, it always fails. I think it is supposed to be fed just the names of the jar files.
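The difference between matching at the start and searching anywhere is easy to check directly. A small sketch using the path from this report: either matching against os.path.basename() or switching to re.search makes the pattern match.

```python
import os
import re

JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'
path = '/usr/local/share/stanford/stanford-corenlp-3.8.0.jar'

# re.match is anchored at the start: '/usr/...' is not 'stanford-...'.
anchored = re.match(JAR_PATTERN, path)
print(anchored)  # None

# Matching only the file name succeeds.
by_name = re.match(JAR_PATTERN, os.path.basename(path))
print(by_name.groups())  # ('3', '8', '0')

# re.search scans the whole string, so the full path matches as well.
anywhere = re.search(JAR_PATTERN, path)
print(anywhere.group())  # stanford-corenlp-3.8.0.jar
```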

f0lie on 27 Jun 2017


Alright, the fix is pretty simple.

I changed re.match(...) to re.search(...).group(), so instead of only looking at the start of the string, it searches the whole string.
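One caveat, offered as a suggestion rather than the fix NLTK shipped: with .group() as the key, max compares strings, and string comparison ranks 3.10.0 below 3.9.0 because '1' < '9'. Below is a sketch of a hypothetical newest_jar() helper that compares numeric version tuples instead; the /opt/stanford paths are made up for illustration.

```python
import re

JAR_PATTERN = r'stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar'

def newest_jar(paths, pattern=JAR_PATTERN):
    """Return the path with the highest version number (hypothetical helper)."""
    versioned = []
    for path in paths:
        m = re.search(pattern, path)  # search, so full paths are fine
        if m is not None:  # skip files that do not match at all
            version = tuple(int(part) for part in m.groups())
            versioned.append((version, path))
    if not versioned:
        raise LookupError('no jar matching %r' % pattern)
    # Integer tuples compare numerically: (3, 10, 0) > (3, 9, 0).
    return max(versioned)[1]

paths = [
    '/opt/stanford/stanford-corenlp-3.9.0.jar',
    '/opt/stanford/stanford-corenlp-3.10.0.jar',
    '/opt/stanford/stanford-corenlp-3.10.0-models.jar',  # does not match
]
print(newest_jar(paths))  # /opt/stanford/stanford-corenlp-3.10.0.jar
```

Non-matching files are skipped rather than mapped to None, so max never sees an uncomparable key.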

f0lie on 27 Jun 2017

Ran into another issue.

I think this is connected to the issue I had before with CoreNLPParser and the proxy server.

>>> s = nltk.parse.corenlp.CoreNLPServer()
>>> s.url
'http://localhost:9000'
>>> s.start()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s.start()
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 149, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

f0lie on 27 Jun 2017

OK, so it was the proxy problem. I found that setting NO_PROXY='localhost' gets around it.
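NLTK talks to the server through requests, which honors the NO_PROXY/no_proxy environment variables. If the variable can't go in a shell profile, it can be set from Python before the first request. A minimal sketch; urllib.request.proxy_bypass_environment() from the standard library is used here only to confirm the setting takes effect.

```python
import os
import urllib.request

# Set both spellings: tools differ in which one they read, and the
# lowercase form usually wins when both are present.
os.environ['NO_PROXY'] = 'localhost'
os.environ['no_proxy'] = 'localhost'

# Check how the standard library now treats the CoreNLP server's host.
bypass = urllib.request.proxy_bypass_environment('localhost:9000')
print(bool(bypass))  # True
```

If the client URL uses 127.0.0.1 rather than localhost, that host needs to be listed too, e.g. 'localhost,127.0.0.1'.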

Now I have a problem with the server refusing connections.

I'm not sure why, but when I initialize the server object, it starts a server on port 9000.
Edit: It turns out I had a CoreNLP server running in the background. I killed it, and now it's pinging port 9000.
```
>>> import nltk
>>> s = nltk.parse.corenlp.CoreNLPServer(verbose=True)
[Found stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar: /usr/local/share/stanford/stanford-corenlp-3.8.0.jar]
[Found stanford-corenlp-(\d+)\.(\d+)\.(\d+)\.jar: /usr/local/share/stanford/stanford-corenlp-3.8.0.jar]
[Found stanford-corenlp-(\d+)\.(\d+)\.(\d+)-models\.jar: /usr/local/share/stanford/stanford-corenlp-3.8.0-models.jar]
>>> s.start()
[Found java: /usr/bin/java]
[Found java: /usr/bin/java]
HTTPConnectionPool(host='localhost', port=59023): Max retries exceeded with url: /live (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 61] Connection refused',))
[... the same message repeated 30 times in total ...]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    s.start()
  File "/Users/adiep/feedback-sentiment/.env/src/nltk/nltk/parse/corenlp.py", line 150, in start
    'Could not connect to the server.'
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.
```

f0lie on 27 Jun 2017


Good lord, now it's working.

So overall, I had to change re.match(...) to re.search(...).group() and kill a stray CoreNLP server running in the background.

Maybe there should be some code to detect whether other CoreNLP servers are running? I don't know if it's even worth doing.
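A cheap version of that check is to see whether anything is already listening on the port before starting a new server. A sketch with a hypothetical port_in_use() helper; note that it only detects an open port, not that the listener is actually CoreNLP.

```python
import socket

def port_in_use(port, host='localhost', timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex() returns an error code (0 on success) instead of raising.
        return sock.connect_ex((host, port)) == 0

if port_in_use(9000):
    print('Port 9000 is busy; a CoreNLP server may already be running.')
```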

f0lie on 27 Jun 2017


It seems that if you create server objects without a with statement and don't stop them, Python deletes the object but not the actual server process. What I was doing was starting servers in my interpreter and forgetting to run s.stop().
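The with form (used later in this thread as `with CoreNLPServer(...) as server:`) is what guarantees the cleanup. For code that cannot use it directly, the same guarantee is a try/finally. Here is a sketch packaged as a hypothetical running() helper that works with any object exposing start() and stop(); FakeServer is a stand-in used only to show the call order.

```python
from contextlib import contextmanager

@contextmanager
def running(server):
    """Start `server` and stop it on exit, even if the body raises.

    `server` is any object with start() and stop() methods; with NLTK
    that would be a CoreNLPServer instance (an assumption of this sketch).
    """
    server.start()
    try:
        yield server
    finally:
        server.stop()

class FakeServer:
    """Stand-in that records calls instead of launching Java."""
    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append('start')

    def stop(self):
        self.calls.append('stop')

fake = FakeServer()
try:
    with running(fake):
        raise RuntimeError('parsing failed')
except RuntimeError:
    pass
print(fake.calls)  # ['start', 'stop']
```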

f0lie on 27 Jun 2017


We need to somehow document how the Stanford CoreNLP API works so that we can tell people not to start the server inside Python.

Alternatively, maybe we can use the __all__ trick to limit what is exported from nltk.parse.corenlp and keep users from accessing CoreNLPServer. The lowest-level object a user should use is GenericCoreNLPParser.
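For reference, __all__ only controls what `from nltk.parse.corenlp import *` exports. It does not stop nltk.parse.corenlp.CoreNLPServer from being imported explicitly, so it would discourage use rather than prevent it. A self-contained sketch with a throwaway in-memory module; the name corenlp_demo and its two empty classes are hypothetical.

```python
import sys
import types

# Source of a hypothetical module that only advertises the parser.
source = '''
__all__ = ["GenericCoreNLPParser"]

class CoreNLPServer:
    pass

class GenericCoreNLPParser:
    pass
'''
module = types.ModuleType("corenlp_demo")
exec(source, module.__dict__)
sys.modules["corenlp_demo"] = module

# A star import only brings in the names listed in __all__.
namespace = {}
exec("from corenlp_demo import *", namespace)
print("GenericCoreNLPParser" in namespace)  # True
print("CoreNLPServer" in namespace)         # False

# Explicit attribute access still works.
print(hasattr(module, "CoreNLPServer"))     # True
```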

BTW, @dimazest, is CoreNLPServer used only in the test environment?

alvations on 17 Oct 2017

Yes, it's used only in tests.

dimazest on 17 Oct 2017

Hello, I think I'm having an issue related to the one reported here.

I recently started using this package and am using the StanfordNERTagger, which emits a deprecation warning:

/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py:183: DeprecationWarning: 
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.tag.corenlp.CoreNLPPOSTagger or nltk.tag.corenlp.CoreNLPNERTagger instead.

Note that I think there is a typo in this warning. I had to import from nltk.tag.stanford to find the CoreNLPNERTagger.

So I've tried switching to the new class, but I can't seem to get it to work.
Previously I had:

    stanfordClassifier = '/root/stanford-ner-2017-06-09/classifiers/english.muc.7class.distsim.crf.ser.gz'
    stanfordNerPath = '/root/stanford-ner-2017-06-09/stanford-ner.jar'
    st = StanfordNERTagger(stanfordClassifier, stanfordNerPath, encoding='utf8')
    tokens = nltk.tokenize.word_tokenize(text) 
    ner_tags = st.tag(tokens)

which I have now changed to:

    stanfordClassifier = '/root/stanford-ner-2017-06-09/classifiers/english.muc.7class.distsim.crf.ser.gz'
    stanfordNerPath = '/root/stanford-ner-2017-06-09/stanford-ner.jar'
    with CoreNLPServer(stanfordNerPath, stanfordClassifier) as server:
        st = CoreNLPNERTagger(url=server.url)
        tokens = nltk.tokenize.word_tokenize(text) 
        ner_tags = st.tag(tokens)

However, this code raises CoreNLPServerError: Could not connect to the server.

I can't find documentation anywhere for how I am supposed to call CoreNLPNERTagger and how that call differs from the deprecated StanfordNERTagger.

Any help is much appreciated. Thanks!

PhysB on 31 Oct 2017

Hi, I'm sorry that there is no documentation. The API has changed; here is what you need to do. Refer to #1510, which contains a long discussion of how to start a server.

The main change: NLTK does not start the CoreNLP server; you need to start it yourself. Refer to https://stanfordnlp.github.io/CoreNLP/corenlp-server.html for a detailed explanation, but the command should be something like:

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

Open http://localhost:9000/ to make sure that the server is running.
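The same check can be scripted, which helps when the server takes a while to load its models. A minimal sketch, assuming the server answers on the /live path that NLTK polls in the log earlier in this thread. wait_until_live() is a hypothetical helper, and it deliberately bypasses any configured HTTP proxy, since a proxy intercepting localhost was one of the problems above.

```python
import time
import urllib.request

# An opener with an empty ProxyHandler ignores *_proxy environment variables.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def wait_until_live(url='http://localhost:9000', timeout=60.0, interval=0.5):
    """Poll <url>/live until it answers HTTP 200 or `timeout` seconds pass."""
    live_url = url.rstrip('/') + '/live'
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(live_url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket
            # timeouts are all subclasses of OSError: not up yet.
            pass
        time.sleep(interval)
    return False
```

Calling wait_until_live() after launching the java command and before creating the tagger avoids racing the server's startup.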

Now you are ready to use NLTK corenlp client:

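from nltk.tag.stanford import CoreNLPNERTagger
from nltk.tokenize import word_tokenize
tagger = CoreNLPNERTagger(url='http://localhost:9000')
ner_tags = tagger.tag(word_tokenize(text))  # tag() expects a list of tokens, not a raw string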

Please let me know if it works.

dimazest on 31 Oct 2017


Yes! This works. Thank you very much for the help.

PhysB on 10 Nov 2017

Thanks @dimazest, @alvations

stevenbird on 1 Apr 2018

For some reason, I can't connect to the server. Can anyone help me out? Here's the code I'm running.

from nltk.parse.corenlp import CoreNLPServer
server = CoreNLPServer("stanford-corenlp-4.0.0.jar","stanford-corenlp-4.0.0-models.jar",verbose=True)
server.start()

from nltk.parse.corenlp import CoreNLPParser
parser = CoreNLPParser(url=server.url)  # use the URL/port the server actually bound
parse = next(parser.raw_parse("I put the book in the box on the table."))

server.stop()

I'm running it from a file inside the CoreNLP folder I downloaded, so I don't have to worry about absolute paths.
Here's what it outputs:

[Found java: /usr/bin/java]
[Found java: /usr/bin/java]
Traceback (most recent call last):
  File "/Users/benstevens/Desktop/Bernstein2/Ben/stanford-corenlp-4.0.0/BenTest.py", line 4, in <module>
    server.start()
  File "/Users/benstevens/Library/Python/3.8/lib/python/site-packages/nltk/parse/corenlp.py", line 153, in start
    raise CoreNLPServerError("Could not connect to the server.")
nltk.parse.corenlp.CoreNLPServerError: Could not connect to the server.

I would greatly appreciate any help.