Nltk: Please help! Integrating nltk with Stanford NLP

Created on 6 Jul 2018 · 3 Comments · Source: nltk/nltk

I ran into a confusing problem while using NLTK's integration with Stanford NLP.
My development environment is:

nltk 3.3
Stanford NLP: stanford-segmenter 3.6.0 / 3.9.1
I am trying to create a StanfordSegmenter object like this:
standfordNlpPath = self.projectPath + "\standford-nlp\stanford-segmenter-2015-12-09"
stanfordSegmenter = StanfordSegmenter(
path_to_jar=standfordNlpPath + "\stanford-segmenter-3.6.0.jar",
path_to_slf4j=standfordNlpPath + "\slf4j-api.jar",
path_to_sihan_corpora_dict=standfordNlpPath + "\data-2015",
path_to_model=standfordNlpPath + "\data-2015\pku.gz",
path_to_dict=standfordNlpPath + "\data-2015\dict-chris6.ser.gz")
It then fails with this result:
===========================================================================
NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

All of the jars are definitely there, I am sure of it. Is something wrong with my path, or with the parameters I passed to the StanfordSegmenter class? The example I found in the nltk 3.3 documentation was quite simple: it passes only one parameter, "path_to_slf4j".
So, could someone please help me :-( !

resolved stanford api


libingnan54321

Most helpful comment

Please use the new CoreNLPParser interface.

First update your NLTK:

pip3 install -U nltk

Then, still in the terminal:

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the properties for chinese language
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar 
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Download the properties for arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties

For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 &

Then in Python3:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']
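
Because the server command above is started in the background with `&`, a request sent immediately afterwards may arrive before the server is listening. The following is a minimal sketch, not part of NLTK or CoreNLP, of a hypothetical helper `wait_for_port` that polls the port with the standard `socket` module. It only confirms that something accepts TCP connections on that port; it does not confirm that the models have finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper for illustration; returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port('localhost', 9001)` before constructing `CoreNLPParser('http://localhost:9001')` is one way to avoid a connection error on the first request; the same call with port 9005 would apply to the Arabic server.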

For Arabic:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'انا حامل'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['انا', 'حامل']

alvations on 23 Aug 2018
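
The two server commands in the comment above differ only in the properties file, the preloaded annotators, and the port. As a rough sketch, a hypothetical helper `corenlp_server_command` (not an NLTK or CoreNLP function) can assemble the same argument list in Python, using only the flags that appear in those commands:

```python
def corenlp_server_command(properties_file, annotators, port,
                           memory="4g", timeout_ms=15000):
    """Build the argument list for launching a CoreNLP server.

    Hypothetical helper; the flags mirror the shell commands quoted in the thread.
    """
    return [
        "java", "-Xmx" + memory, "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-serverProperties", properties_file,
        "-preload", ",".join(annotators),
        "-status_port", str(port),
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


# Same settings as the Chinese and Arabic commands in the comment.
chinese_cmd = corenlp_server_command(
    "StanfordCoreNLP-chinese.properties",
    ["tokenize", "ssplit", "pos", "lemma", "ner", "parse"],
    9001,
)
arabic_cmd = corenlp_server_command(
    "StanfordCoreNLP-arabic.properties",
    ["tokenize", "ssplit", "pos", "parse"],
    9005,
)
print(" ".join(chinese_cmd))
```

The resulting list could be passed to `subprocess.Popen` with `cwd` set to the unpacked CoreNLP directory, assuming the model jars and properties files were downloaded there as described.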


All 3 comments

@libingnan54321 why

Can you please try this one first and provide the output?

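import os
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# standfordNlpPath is the base directory defined in the question.
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27/stanford-segmenter-3.9.1.jar')
# Fail early if the jar is not where we expect it.
assert os.path.isfile(segmenter_jar_file)
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)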
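
The same existence check can be extended to every path passed in the question. This is a minimal sketch, assuming the directory layout quoted in the question (the relative `base` path is an assumption; substitute your own install location). `find_missing_paths` is a hypothetical helper, not part of NLTK. Building the paths with `os.path.join` also avoids hand-written backslashes in string literals.

```python
import os


def find_missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Assumed layout, mirroring the question; adjust `base` to your own install.
base = os.path.join("standford-nlp", "stanford-segmenter-2015-12-09")
segmenter_paths = {
    "path_to_jar": os.path.join(base, "stanford-segmenter-3.6.0.jar"),
    "path_to_slf4j": os.path.join(base, "slf4j-api.jar"),
    "path_to_sihan_corpora_dict": os.path.join(base, "data-2015"),
    "path_to_model": os.path.join(base, "data-2015", "pku.gz"),
    "path_to_dict": os.path.join(base, "data-2015", "dict-chris6.ser.gz"),
}

# Report every argument that points at a non-existent file or directory.
for name in find_missing_paths(segmenter_paths):
    print("missing:", name, "->", segmenter_paths[name])
```

If nothing is reported as missing and the error persists, the cause is probably not the paths themselves, which narrows down what to report back in the thread.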

Demetrio92 on 9 Jul 2018


Closing the issue as resolved for now =)
Please reopen if there are further issues.

alvations on 23 Aug 2018
