Nltk: Please help me! Integrating NLTK with Stanford NLP

Created 2018-07-06 · 3 comments · Source: nltk/nltk

While integrating NLTK with Stanford NLP, I ran into a problem that has me confused.
My development environment is:

NLTK 3.3
Stanford NLP: Stanford Segmenter 3.6.0 / 3.9.1
I tried to create a StanfordSegmenter object like this:
standfordNlpPath = self.projectPath + "\standford-nlp\stanford-segmenter-2015-12-09"
stanfordSegmenter=StanfordSegmenter(
path_to_jar=standfordNlpPath + "\stanford-segmenter-3.6.0.jar",
path_to_slf4j=standfordNlpPath + "\slf4j-api.jar",
path_to_sihan_corpora_dict=standfordNlpPath + "\data-2015",
path_to_model=standfordNlpPath + "\data-2015\pku.gz",
path_to_dict=standfordNlpPath + "\data-2015\dict-chris6.ser.gz")
It then fails with this result:
===========================================================================
NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
environment variable.
For more information on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

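I'm quite sure all of these jars really are there. Is something wrong with my paths, or with the arguments I passed to the StanfordSegmenter class? The example I found in the NLTK 3.3 documentation is very simple: it passes only one extra argument, "path_to_slf4j".
So, somebody, please help me :-(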
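One thing worth ruling out before suspecting the StanfordSegmenter arguments is that a path does not point where you think it does. A minimal standard-library sketch that checks every path before it is handed to the segmenter; `missing_paths` is a hypothetical helper, and the base directory and file names are copied from the question, so adjust them to your own layout.

```python
import os


def missing_paths(base_dir, relative_paths):
    """Return the full paths under base_dir that do not exist.

    Entries use "/" as separator; os.path.join rebuilds them with the
    separator of the current OS, so no hand-written backslashes are needed.
    """
    missing = []
    for rel in relative_paths:
        full = os.path.join(base_dir, *rel.split("/"))
        if not os.path.exists(full):
            missing.append(full)
    return missing


if __name__ == "__main__":
    # Hypothetical location, relative to the project directory used in the question.
    base = os.path.join("standford-nlp", "stanford-segmenter-2015-12-09")
    needed = [
        "stanford-segmenter-3.6.0.jar",
        "slf4j-api.jar",
        "data-2015",
        "data-2015/pku.gz",
        "data-2015/dict-chris6.ser.gz",
    ]
    for path in missing_paths(base, needed):
        print("missing:", path)
```

Any path printed as missing is worth fixing first; if nothing is printed, the paths themselves are probably not the problem.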

Labels: resolved · stanford api

Source

libingnan54321

Most helpful comment

Please use the new CoreNLPParser interface.

First, update your NLTK:

pip3 install -U nltk

Then, still in the terminal:

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the properties for chinese language
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar 
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Download the properties for arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties
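If `wget` is not available (for example on Windows, as in the question), the same model and properties downloads can be done from Python. A minimal standard-library sketch, assuming the 2018-02-27 URLs listed above still resolve; `model_downloads` and `download_all` are hypothetical helper names.

```python
import os
import urllib.request

VERSION = "2018-02-27"
BASE = "http://nlp.stanford.edu/software/"
PROPS = ("https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/"
         "src/edu/stanford/nlp/pipeline/")


def model_downloads(language, version=VERSION):
    """Return (url, local filename) pairs for one language's models jar and properties file."""
    jar = "stanford-{}-corenlp-{}-models.jar".format(language, version)
    props = "StanfordCoreNLP-{}.properties".format(language)
    return [(BASE + jar, jar), (PROPS + props, props)]


def download_all(language, dest="."):
    # Intended to run from inside the unzipped stanford-corenlp-full-<version>/ directory,
    # so that the jar ends up on the "-cp *" classpath used to start the server.
    for url, name in model_downloads(language):
        target = os.path.join(dest, name)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

For example, `download_all("chinese")` and `download_all("arabic")` mirror the two groups of `wget` commands.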

For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 &
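Because the server runs in the background and preloads several annotators, port 9001 may not accept connections for a while after this command returns. A small standard-library sketch that waits for the port before the parser is used; `wait_for_port` is a hypothetical helper and the timeout values are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful TCP connect means the server is at least listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, calling `wait_for_port("localhost", 9001)` before creating the parser avoids connection errors from querying a server that is still loading its models.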

Then, in Python 3:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']
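CoreNLPParser talks to the server over HTTP. When debugging, it can help to query the server without NLTK, to see whether a failure comes from the server or from the client. A standard-library sketch, assuming the CoreNLP server's `properties` URL parameter and its JSON output (`sentences`, each with `tokens` carrying `word` and `originalText`); the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def tokens_from_corenlp_json(result):
    """Flatten a CoreNLP JSON response into a list of token strings."""
    return [
        token.get("originalText") or token["word"]
        for sentence in result["sentences"]
        for token in sentence["tokens"]
    ]


def corenlp_tokenize(text, url="http://localhost:9001"):
    """POST text to a running CoreNLP server and return its tokens."""
    props = {"annotators": "tokenize,ssplit", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    request = urllib.request.Request(
        url + "/?" + query,
        data=text.encode("utf-8"),
        # Send the raw text; this overrides urllib's form-encoded default.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return tokens_from_corenlp_json(json.loads(response.read().decode("utf-8")))
```

Against the Chinese server started above, `corenlp_tokenize(u'我家没有电脑。')` should return the same segmentation as `CoreNLPParser`.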

For Arabic:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000
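The Chinese and Arabic launch commands differ only in the properties file, the preloaded annotators, and the port. A hypothetical helper that builds the same `java` command as an argument list and starts it with `subprocess`, which avoids the shell-specific `\` line continuations and trailing `&` (these do not work in the Windows command prompt). The settings table is copied from the two commands; everything else is an assumption.

```python
import subprocess

# Annotators and port for each language, taken from the commands above.
SERVER_SETTINGS = {
    "chinese": ("tokenize,ssplit,pos,lemma,ner,parse", 9001),
    "arabic": ("tokenize,ssplit,pos,parse", 9005),
}


def corenlp_server_command(language, memory="4g", timeout_ms=15000):
    """Build the java argument list that starts a CoreNLP server for language."""
    annotators, port = SERVER_SETTINGS[language]
    return [
        "java", "-Xmx" + memory, "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-serverProperties", "StanfordCoreNLP-{}.properties".format(language),
        "-preload", annotators,
        "-status_port", str(port), "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


def start_server(language, corenlp_dir):
    # Popen returns immediately, like "&" in the shell version; the java
    # launcher expands the "*" classpath wildcard relative to corenlp_dir.
    return subprocess.Popen(corenlp_server_command(language), cwd=corenlp_dir)
```

For example, `start_server("arabic", "stanford-corenlp-full-2018-02-27")` returns a `Popen` handle whose `terminate()` method stops the server.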

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'انا حامل'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['انا', 'حامل']
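As the output above shows, `parser.tokenize(text)` returns a generator, so nothing is sent to the server until it is consumed, and a connection error only surfaces at `list(...)`. A small sketch that wraps that step; `safe_tokenize` is a hypothetical helper, and it relies on the exceptions of `requests` (which NLTK uses for HTTP) deriving from `IOError`, an alias of `OSError`.

```python
def safe_tokenize(parser, text):
    """Tokenize text with an NLTK CoreNLPParser-like object.

    Returns the tokens as a list, or None if the server cannot be reached.
    """
    try:
        # tokenize() is a generator: the HTTP request (and any connection
        # error) only happens once list() starts consuming it.
        return list(parser.tokenize(text))
    except OSError as exc:
        print("CoreNLP server not reachable:", exc)
        return None
```

With the Arabic server running, `safe_tokenize(CoreNLPParser(url='http://localhost:9005'), text)` returns the token list; with the server stopped, it prints the error and returns `None`.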

alvations, 2018-08-23


All 3 comments

@libingnan54321 Why not use the latest version, 3.9.1?

Could you try this first and post the output?

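import os
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# standfordNlpPath: the directory containing the unzipped segmenter release
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27', 'stanford-segmenter-3.9.1.jar')
assert os.path.isfile(segmenter_jar_file), segmenter_jar_file
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)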

Demetrio92, 2018-07-09

Closing the issue for now =)
Please reopen it if there are still problems.

alvations, 2018-08-23
