Nltk: Please help! Integrating NLTK with Stanford NLP

Created on July 6, 2018 · 3 comments · Source: nltk/nltk

I ran into a confusing problem when using NLTK's integration with Stanford NLP.
My development environment is as follows:

nltk 3.3
Stanford NLP stanford-segmenter 3.6.0 / 3.9.1
I then tried to create a StanfordSegmenter object like this:
StandfordNlpPath = self.projectPath + "\\Standford-nlp\\stanford-segmenter-2015-12-09"
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=StandfordNlpPath + "\\stanford-segmenter-3.6.0.jar",
    path_to_slf4j=StandfordNlpPath + "\\slf4j-api.jar",
    path_to_sihan_corpora_dict=StandfordNlpPath + "\\data-2015",
    path_to_model=StandfordNlpPath + "\\data-2015\\pku.gz",
    path_to_dict=StandfordNlpPath + "\\data-2015\\dict-chris6.ser.gz")
It then fails with the following output:
===========================================================================
NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
environment variable.

For more information on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

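All of the jar files definitely exist. Is there something wrong with the paths, or with the parameters I passed to the StanfordSegmenter class? The example I found in the nltk 3.3 documentation is very simple and passes only one parameter, "path_to_slf4j".
So please, someone help me :-(!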
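
One way to narrow down a "jar not found" error like the one above is to check each path before handing it to StanfordSegmenter. The following is a minimal sketch under assumptions: `find_missing_paths` is a hypothetical helper, not part of NLTK, and the base directory is a placeholder for the real project location. It builds the paths with `os.path.join` rather than string concatenation, which avoids separator mistakes.

```python
import os


def find_missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Hypothetical install location; adjust to the actual project layout.
base = os.path.join("Standford-nlp", "stanford-segmenter-2015-12-09")
segmenter_paths = {
    "path_to_jar": os.path.join(base, "stanford-segmenter-3.6.0.jar"),
    "path_to_slf4j": os.path.join(base, "slf4j-api.jar"),
    "path_to_sihan_corpora_dict": os.path.join(base, "data-2015"),
    "path_to_model": os.path.join(base, "data-2015", "pku.gz"),
    "path_to_dict": os.path.join(base, "data-2015", "dict-chris6.ser.gz"),
}

# Report every argument that points at something that is not there.
for name in find_missing_paths(segmenter_paths):
    print(f"{name} does not exist: {segmenter_paths[name]}")
```

If nothing is printed, every path exists and the problem is more likely in how the arguments are interpreted than in the file layout.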

resolved stanford api

libingnan54321

Most helpful comment

Please use the new CoreNLPParser interface.

First, update your NLTK:

pip3 install -U nltk

Then, still in the terminal:

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the models and properties for Chinese
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties

# Download the models and properties for Arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties

For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 &
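
Because the server is started in the background, a client may try to connect before it is listening. The following is a minimal sketch, using only the Python standard library, of waiting until the port accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of NLTK or CoreNLP, and an open port only shows that the server process is listening, not that all models have finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the Chinese server above, one would call `wait_for_port("localhost", 9001)` and only create the parser once it returns True.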

Then in Python 3:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']

For Arabic:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000
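
The two server commands differ only in the properties file, the preloaded annotators, and the port. As a rough sketch, the shared shape can be captured in a small function that builds the argument list for `subprocess`. `corenlp_server_command` is a hypothetical helper, and it uses only the flags that appear in the commands above.

```python
import shlex


def corenlp_server_command(properties_file, port, annotators,
                           timeout_ms=15000, memory="4g"):
    """Build the argument list for starting a CoreNLP server.

    Mirrors the shell commands in this thread; the status port is set to
    the same value as the main port, as in those commands.
    """
    return [
        "java", f"-Xmx{memory}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-serverProperties", properties_file,
        "-preload", ",".join(annotators),
        "-status_port", str(port),
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


arabic_cmd = corenlp_server_command(
    "StanfordCoreNLP-arabic.properties", 9005,
    ["tokenize", "ssplit", "pos", "parse"],
)
# Show the command in a copy-pasteable, shell-quoted form.
print(shlex.join(arabic_cmd))
```

To actually launch the server, the list could be passed to `subprocess.Popen` with `cwd` set to the unpacked `stanford-corenlp-full-2018-02-27` directory, so that the `"*"` classpath resolves against the jars there.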

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'انا حامل'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['انا', 'حامل']
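
As the session above shows, `tokenize` returns a generator, so the tokens have to be materialized with `list(...)` before they can be printed or indexed. The following is a minimal sketch of doing that for several sentences at once. `tokenize_all` is a hypothetical helper, and `WhitespaceParser` is only a stand-in with the same `tokenize()` shape so the example runs without a server; it is not the CoreNLP tokenizer.

```python
def tokenize_all(parser, sentences):
    """Tokenize each sentence, turning every generator into a list."""
    return [list(parser.tokenize(sentence)) for sentence in sentences]


class WhitespaceParser:
    """Stand-in that yields whitespace-separated tokens (for illustration only)."""

    def tokenize(self, text):
        for token in text.split():
            yield token


results = tokenize_all(WhitespaceParser(), ["انا حامل", "a b c"])
print(results)
```

With a server running, the same function should work when given `CoreNLPParser(url='http://localhost:9005')` in place of the stand-in.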

alvations on August 23, 2018

All 3 comments

@libingnan54321 Why aren't you using the latest 3.9.1 version?

Could you try this first and share the output?

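import os
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# Fail early if the jar is not where we expect it to be.
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27/stanford-segmenter-3.9.1.jar')
assert os.path.isfile(segmenter_jar_file)
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)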

Demetrio92 on July 9, 2018

Closing the issue as resolved for now =)
Please open a new one if there are further problems.

alvations on August 23, 2018
