Nltk: Help me please! Integration of nltk and Stanford NLP

Created on 6 Jul 2018 · 3 Comments · Source: nltk/nltk

I ran into a confusing problem when using the integration between nltk and Stanford NLP.
My development environment is:

nltk 3.3
Stanford NLP stanford-segmenter 3.6.0 / 3.9.1
I try to create a StanfordSegmenter object like this:
standfordNlpPath = self.projectPath + "\standford-nlp\stanford-segmenter-2015-12-09"
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=standfordNlpPath + "\stanford-segmenter-3.6.0.jar",
    path_to_slf4j=standfordNlpPath + "\slf4j-api.jar",
    path_to_sihan_corpora_dict=standfordNlpPath + "\data-2015",
    path_to_model=standfordNlpPath + "\data-2015\pku.gz",
    path_to_dict=standfordNlpPath + "\data-2015\dict-chris6.ser.gz")
It then fails with the following result:
===========================================================================
NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

All of the jars exist exactly there, I am quite sure. Is there a problem with my path, or with the parameters I passed to the StanfordSegmenter class? The example I found in the nltk 3.3 documentation was quite simple: it passed only one parameter, "path_to_slf4j".
So, could someone please help me :-(!

resolved stanford api

libingnan54321

All 3 comments

@libingnan54321 why aren't you using the latest version, 3.9.1?

Can you try this first and post the result?

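import os
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# Build the jar path and fail early if the file is not where we expect it.
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27/stanford-segmenter-3.9.1.jar')
assert os.path.isfile(segmenter_jar_file)
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)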
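
The same existence check can be extended to every path passed to StanfordSegmenter in the original question. The following is only a sketch: `missing_paths` is a hypothetical helper, not part of NLTK, and it says nothing about how NLTK itself resolves jars through CLASSPATH.

```python
import os


def missing_paths(paths):
    """Return the names of all entries whose path does not exist on disk.

    `paths` maps an argument name (for example "path_to_jar") to a
    file or directory path. Hypothetical helper, not an NLTK function.
    """
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Example usage (the directory layout is taken from the question and is
# an assumption about the local machine):
#
#   base = os.path.join("standford-nlp", "stanford-segmenter-2015-12-09")
#   print(missing_paths({
#       "path_to_jar": os.path.join(base, "stanford-segmenter-3.6.0.jar"),
#       "path_to_slf4j": os.path.join(base, "slf4j-api.jar"),
#       "path_to_model": os.path.join(base, "data-2015", "pku.gz"),
#   }))
```

An empty list means every path exists, which would point the investigation away from the paths themselves.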

Demetrio92 on 9 Jul 2018

Use the new CoreNLPParser interface.

First, update your NLTK:

pip3 install -U nltk

Then, still in the terminal:

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the properties for chinese language
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar 
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Download the properties for arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties

For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 &
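
Because the server is started in the background and preloads several annotators, it may take a while before it accepts requests. A minimal sketch of a readiness check using only the standard library is shown here. `wait_for_server` is a hypothetical helper, and the `/ready` path used in the usage comment is an assumption about the CoreNLP server's status endpoints; any URL that answers with HTTP 200 once the server is up would do.

```python
import time
import urllib.request


def wait_for_server(url, timeout=60.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the server answered in time, False otherwise.
    Hypothetical helper, not part of NLTK or CoreNLP.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(interval)
    return False


# Example usage (port taken from the command above, path is an assumption):
#
#   if not wait_for_server("http://localhost:9001/ready"):
#       raise RuntimeError("CoreNLP server did not come up in time")
```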

Then, in Python 3:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'我家没有电脑。'))
['我家', '没有', '电脑', '。']

For Arabic:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'انا حامل'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['انا', 'حامل']
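
As the session above shows, `tokenize` returns a generator, so the tokens only appear once it is consumed with `list(...)`. When several sentences need to be tokenized, that step can be wrapped in a small function. This is a sketch only: `tokenize_all` is a hypothetical helper that works with any object exposing a `tokenize(text)` method, such as the `CoreNLPParser` instance created above.

```python
def tokenize_all(parser, sentences):
    """Tokenize each sentence and materialize the lazy results as lists.

    `parser` is any object with a `tokenize(text)` method that returns
    an iterable of tokens. Hypothetical helper, not part of NLTK.
    """
    return [list(parser.tokenize(sentence)) for sentence in sentences]


# Example usage (requires the Arabic server from above to be running):
#
#   from nltk.parse import CoreNLPParser
#   parser = CoreNLPParser(url='http://localhost:9005')
#   print(tokenize_all(parser, [u'انا حامل']))
```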

alvations on 23 Aug 2018

Closing the issue as resolved for now =)
Please reopen it if there are further problems.

alvations on 23 Aug 2018
