Nltk: Help me please! nltk and stanford-nlp integration

Created on 6 July 2018 · 3 comments · Source: nltk/nltk

I ran into a confusing problem while using the integration between nltk and Stanford NLP.
My development environment is as follows:

nltk 3.3
Stanford NLP stanford-segmenter 3.6.0 / 3.9.1
I am trying to create a StanfordSegmenter object like this:
standfordNlpPath = self.projectPath + r"\standford-nlp\stanford-segmenter-2015-12-09"
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=standfordNlpPath + r"\stanford-segmenter-3.6.0.jar",
    path_to_slf4j=standfordNlpPath + r"\slf4j-api.jar",
    path_to_sihan_corpora_dict=standfordNlpPath + r"\data-2015",
    path_to_model=standfordNlpPath + r"\data-2015\pku.gz",
    path_to_dict=standfordNlpPath + r"\data-2015\dict-chris6.ser.gz")
It then fails with the following result:
===========================================================================
NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
environment variable.
For more information on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

All of the jars are definitely in that directory, I am quite sure of it. Is something wrong with my path or with the parameters I passed to the StanfordSegmenter class? The example I found in the nltk 3.3 documentation was very simple; it only passed a single parameter, "path_to_slf4j".
So, someone, please help me :-( !
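
One way to narrow down an error like this is to check every path on disk before handing it to NLTK. The following is only a minimal sketch, assuming the file names from the question; `missing_paths` is a hypothetical helper and not part of NLTK, and the base directory is a placeholder for your own layout.

```python
import os


def missing_paths(base_dir, relative_names):
    """Return the joined paths under base_dir that do not exist on disk."""
    candidates = [os.path.join(base_dir, name) for name in relative_names]
    return [path for path in candidates if not os.path.exists(path)]


# File names taken from the question; adjust them to your own download.
EXPECTED = [
    "stanford-segmenter-3.6.0.jar",
    "slf4j-api.jar",
    "data-2015",
    os.path.join("data-2015", "pku.gz"),
    os.path.join("data-2015", "dict-chris6.ser.gz"),
]

if __name__ == "__main__":
    # Placeholder base directory (assumption): replace with your project path.
    base = os.path.join("standford-nlp", "stanford-segmenter-2015-12-09")
    for path in missing_paths(base, EXPECTED):
        print("missing:", path)
```

Building the paths with `os.path.join` also avoids hand-written backslashes, which are easy to get wrong in ordinary Python string literals.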

resolved stanford api

libingnan54321

Most helpful comment

Please use the new CoreNLPParser interface.

First, update your NLTK:

pip3 install -U nltk

Then, still in the terminal:

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the models and properties for Chinese
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar 
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Download the models and properties for Arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties

For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 &

Then in Python3:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']
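
The tokenize call can only succeed once the server is accepting connections, and startup may take a while. Below is a rough sketch of a wait loop using only the standard library; `wait_for_port` is a hypothetical helper, not an NLTK or CoreNLP function, and an open port only shows that the server process is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the Chinese server from above, one could call `wait_for_port('localhost', 9001)` and only create the `CoreNLPParser` once it returns `True`.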

For Arabic:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'انا حامل'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['انا', 'حامل']
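
The two server invocations above differ only in the properties file, the preloaded annotators and the port. If you would rather launch the server from Python, the command could be assembled as in this sketch; `corenlp_server_command` is a hypothetical helper, and the flags are simply the ones copied from the shell commands above.

```python
def corenlp_server_command(properties_file, annotators, port,
                           timeout_ms=15000, memory="4g"):
    """Build the argument list for starting StanfordCoreNLPServer.

    Mirrors the shell commands above; the status port is set to the same
    value as the main port, as in those commands.
    """
    return [
        "java", "-Xmx" + memory, "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-serverProperties", properties_file,
        "-preload", ",".join(annotators),
        "-status_port", str(port),
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


# Example: the Arabic configuration from above.
arabic_cmd = corenlp_server_command(
    "StanfordCoreNLP-arabic.properties",
    ["tokenize", "ssplit", "pos", "parse"],
    9005,
)
print(" ".join(arabic_cmd))
```

The resulting list can be passed to `subprocess.Popen` with `cwd` set to the unpacked CoreNLP directory, so that the `"*"` classpath picks up the jars there.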

alvations on 23 Aug 2018

All 3 comments

@libingnan54321 Why aren't you using the latest version, 3.9.1?

Could you please try this first and post the output?

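import os
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# standfordNlpPath is the base directory from the original question.
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27/stanford-segmenter-3.9.1.jar')
# Fail early if the jar is not where we expect it to be.
assert os.path.isfile(segmenter_jar_file)
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)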

Demetrio92 on 9 July 2018

Closing the issue as resolved for now =)
Please reopen it if you run into further problems.

alvations on 23 Aug 2018
