Nltk: OSError: Java command failed when using stanford parser example

Created on 25 Dec 2015  ·  18 Comments  ·  Source: nltk/nltk

Hi,

I am trying to run the Stanford parser example:

from nltk.parse.stanford import * 
dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
[parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")]

Executing the last command results in an error:

OSError: Java command failed : [u'/usr/bin/java', u'-mx1000m', '-cp', ....

When I reproduce the same command on the command line, I get the error: Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory

After adding slf4j-api.jar to the classpath _on the command line_, parsing succeeds.

How can slf4j-api.jar be added to the NLTK classpath so that parsing succeeds from Python as well?

Thank you!
Happy holidays

All 18 comments

@yuvval Just to be sure, are you using Stanford Parser version 2015-12-09? If so, this error occurs because the new Stanford release uses more dependencies than before. This is similar to #1237.

You would have to wait a while until #1237 is fixed and NLTK catches up with the Stanford tools.

The quick fix solution is to either:

  1. use the previous version 2015-04-20 from http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip and the NLTK API would work, see http://stackoverflow.com/questions/13883277/stanford-parser-and-nltk/34112695#34112695 or
  2. hack the stanford parser classpath:
from nltk.internals import find_jars_within_path
from nltk.parse.stanford import StanfordDependencyParser
dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
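# `st` is not defined in this snippet; it stands for an already-constructed
# Stanford tagger instance (e.g. a StanfordNERTagger or StanfordPOSTagger).
stanford_dir = st._stanford_jar.rpartition('/')[0]
# On Windows, comment out the line above and uncomment the one below:
#stanford_dir = st._stanford_jar.rpartition("\\")[0]
stanford_jars = find_jars_within_path(stanford_dir)
# Assign back to the same private attribute that was read above.
st._stanford_jar = ':'.join(stanford_jars)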
[parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")]
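
The hack above splits the jar path on '/' (or "\\" on Windows) and joins the jars with ':'. A platform-neutral way to build the same kind of classpath string can be sketched with the standard library alone. Note that `sibling_jars_classpath` is a hypothetical helper written for illustration, not part of NLTK, and unlike `find_jars_within_path` it only looks at the jar's own directory rather than searching subdirectories.

```python
import os


def sibling_jars_classpath(jar_path):
    """Return a classpath string listing every .jar next to jar_path.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    # os.path.dirname works with the platform's separator, so no manual
    # rpartition('/') versus rpartition("\\") switch is needed.
    jar_dir = os.path.dirname(os.path.abspath(jar_path))
    jars = sorted(
        os.path.join(jar_dir, name)
        for name in os.listdir(jar_dir)
        if name.endswith(".jar")
    )
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    return os.pathsep.join(jars)
```

The resulting string would then be assigned to the tagger's jar attribute in place of the `':'.join(...)` line, assuming the attribute accepts a joined classpath as the hack relies on.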

Thank you! It works with the 2015-04-20 version.

Did the classpath hack also work?

I didn't try - I just deleted the latest version and downloaded the 2015-04-20 version.

Hi! I tried to follow your hack, but for me there is no `StanfordDependencyParser`:

print(nltk.__version__)
from nltk.tag import StanfordDependencyParser

3.1
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-7-67bb74c3494a> in <module>()
----> 1 from nltk.tag import StanfordDependencyParser

ImportError: cannot import name 'StanfordDependencyParser'

Any idea how to solve this? I would really like to use the latest stanford version.

@methodds Pardon my typo, it's from nltk.parse.stanford import StanfordDependencyParser. Please see https://gist.github.com/alvations/e1df0ba227e542955a8a for detailed explanations.

Thank you for the link. Unfortunately, I can't get the environment variables to work on my Linux Mint OS.

My .bashrc looks like this:

export JAVA_HOME="/usr/lib/jvm/java-8-oracle/"
export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH="/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$CLASSPATH"

export CLASSPATH="/home/cs/stanford_nlp/stanford-ner-2015-04-20/stanford-ner.jar:$CLASSPATH"

export STANFORD_MODELS="/home/cs/stanford_nlp/stanford-ner-2015-04-20/classifiers:$STANFORD_MODELS"

export STANFORD_MODELS="/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/models:$STANFORD_MODELS"

Echoing the variables looks right:

echo $CLASSPATH
/home/cs/stanford_nlp/stanford-ner-2015-04-20/stanford-ner.jar:/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/stanford-postagger.jar

echo $STANFORD_MODELS
/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/models:/home/cs/stanford_nlp/stanford-ner-2015-04-20/classifiers

However (even after rebooting) NLTK still does not find the tagger:

from nltk.tag.stanford import StanfordPOSTagger
st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
st.tag('What is the airspeed of an unladen swallow ?'.split())

NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH
environment variable.

Run source .bashrc and it should work. Meanwhile, take a look at http://apple.stackexchange.com/questions/12993/why-doesnt-bashrc-run-automatically to learn how .bashrc works.

Thank you for your tip, but I did source .bashrc beforehand and it did not work. I tried it again and unfortunately it's still not working.

What is your Linux distribution and version? Can you do a lsb_release -a? Or are you working with a Mac?

Thank you for investigating. `lsb_release -a` returns:

No LSB modules are available.
Distributor ID: LinuxMint
Description:    Linux Mint 17.3 Rosa
Release:    17.3
Codename:   rosa

  • Where did you run the export commands? In which directory?
  • Where are you running your Python scripts? In which directory?

Go to the directory where you want to run your Python script and run: import os; print os.environ

Then go to your home directory, start Python, and do the same: import os; print os.environ

Do the two sets of environment variables differ?

I guess you wanted me to use import os; print(os.environ), which did not show the environment variables that I exported in .bashrc. After that, I copy-pasted the content into .profile (in my home folder), and now it works perfectly. I have no idea why, though =D.
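
The check described above (does the running interpreter actually see the exported variables?) can be narrowed to a single jar. The following is a minimal sketch using only the standard library; `find_jar_on_classpath` is a hypothetical name, and it only inspects the environment of the current process rather than reproducing NLTK's own lookup.

```python
import os


def find_jar_on_classpath(jar_name, environ=None):
    """Return the CLASSPATH entries whose file name equals jar_name.

    Hypothetical diagnostic helper; pass a dict as `environ` to test it
    without touching the real process environment.
    """
    environ = os.environ if environ is None else environ
    entries = environ.get("CLASSPATH", "").split(os.pathsep)
    return [e for e in entries if e and os.path.basename(e) == jar_name]
```

Calling `find_jar_on_classpath("stanford-postagger.jar")` from the same Python session that raises the "unable to find stanford-postagger.jar" error would return an empty list if the export never reached that process.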

Glad that .profile works; I think it's an OS distro issue. I would not recommend storing the environment variables statically. Personally, I re-export them every time I start my Python scripts, so that I can be sure there's no conflict. Have fun with the NLTK API and Stanford tools!
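
One way to follow this advice without relying on shell startup files is to set the variables from inside the script, before any tagger is constructed. This is a sketch under the assumption that NLTK reads CLASSPATH and STANFORD_MODELS from the process environment at construction time, as the error message earlier in the thread suggests; `prepend_path_var` is a hypothetical helper, not an NLTK function.

```python
import os


def prepend_path_var(name, path, environ=None):
    """Put `path` at the front of a path-list variable such as CLASSPATH.

    Hypothetical helper; an existing copy of `path` is removed first so
    that repeated calls do not pile up duplicates.
    """
    environ = os.environ if environ is None else environ
    current = environ.get(name, "")
    rest = [e for e in current.split(os.pathsep) if e and e != path]
    environ[name] = os.pathsep.join([path] + rest)
    return environ[name]


# Example with the paths quoted earlier in this thread:
# prepend_path_var("CLASSPATH",
#     "/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/stanford-postagger.jar")
# prepend_path_var("STANFORD_MODELS",
#     "/home/cs/stanford_nlp/stanford-postagger-full-2015-04-20/models")
```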

Thank you :)

What is 'st' in the command 'stanford_dir = st._stanford_jar.rpartition('/')[0]'?

I have the same question as hansen7.

For those wondering what st is:
st = StanfordNERTagger(os.environ.get('STANFORD_MODELS'))
Ref: https://gist.github.com/manashmndl/810db10809cbc1209b34c7d25efe95d5#file-stanfordnertagger-py
