Nltk: ๋„์™€์ฃผ์„ธ์š”, ์ œ๋ฐœ! nltk ๋ฐ standford nlp ํ†ตํ•ฉ

Created on 06 Jul 2018  ·  3 comments  ·  Source: nltk/nltk

I ran into a confusing problem while using the nltk integration with Stanford NLP.
My development environment is as follows:

  1. nltk 3.3
  2. Stanford NLP: stanford-segmenter 3.6.0 / 3.9.1

I try to create a StanfordSegmenter object like this:

    standfordNlpPath = self.projectPath + "\standford-nlp\stanford-segmenter-2015-12-09"
    stanfordSegmenter = StanfordSegmenter(
        path_to_jar=standfordNlpPath + "\stanford-segmenter-3.6.0.jar",
        path_to_slf4j=standfordNlpPath + "\slf4j-api.jar",
        path_to_sihan_corpora_dict=standfordNlpPath + "\data-2015",
        path_to_model=standfordNlpPath + "\data-2015\pku.gz",
        path_to_dict=standfordNlpPath + "\data-2015\dict-chris6.ser.gz")

It fails with the following output:

    ===========================================================================
    NLTK was unable to find stanford-segmenter.jar! Set the CLASSPATH
    environment variable.

    For more information, on stanford-segmenter.jar, see:

https://nlp.stanford.edu/software

๋ชจ๋“  ์ข…๋ฅ˜์˜ ํ•ญ์•„๋ฆฌ๊ฐ€ ์ •ํ™•ํžˆ ๊ฑฐ๊ธฐ์— ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ๋‚ด ๊ฒฝ๋กœ๋‚˜ StanfordSegmenter ํด๋ž˜์Šค์— ๋„ฃ์€ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๊นŒ? ์˜ˆ์ œ๋Š” nltk 3.3 ๋ฌธ์„œ์—์„œ ์ฐพ์€ ๋งค์šฐ ์‰ฌ์› ์œผ๋ฉฐ "path_to_slf4j"๋ผ๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํ•˜๋‚˜๋งŒ ๋„ฃ์—ˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋‹ˆ ๋ˆ„๊ฐ€ ์ข€ ๋„์™€์ฃผ์„ธ์š” :-( !
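Not part of the original report, but one easy-to-miss pitfall in the snippet above: bare backslashes in ordinary Python string literals are escape-prone (`"\t"`, `"\n"`, and friends get interpreted). A hedged sketch of a safer construction, using `os.path.join` with illustrative path names:

```python
import os

# Illustrative root only; substitute your actual project path.
project_path = r"C:\projects\myapp"

# os.path.join avoids hand-written separators entirely, and raw strings
# (r"...") keep any remaining backslashes literal.
segmenter_home = os.path.join(
    project_path, "standford-nlp", "stanford-segmenter-2015-12-09")
jar_path = os.path.join(segmenter_home, "stanford-segmenter-3.6.0.jar")
model_path = os.path.join(segmenter_home, "data-2015", "pku.gz")

print(jar_path)
```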

resolved stanford api

Most helpful comment

Use the new CoreNLPParser interface.

First, update NLTK:

pip3 install -U nltk

๊ทธ๋Ÿฐ ๋‹ค์Œ ์—ฌ์ „ํžˆ ํ„ฐ๋ฏธ๋„์— ์žˆ์Šต๋‹ˆ๋‹ค.

# Get the CoreNLP package
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip
unzip stanford-corenlp-full-2018-02-27.zip
cd stanford-corenlp-full-2018-02-27/

# Download the properties for chinese language
wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2018-02-27-models.jar 
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties 

# Download the properties for arabic
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties


For Chinese:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-chinese.properties \
-preload tokenize,ssplit,pos,lemma,ner,parse \
-status_port 9001  -port 9001 -timeout 15000 & 

๊ทธ๋Ÿฐ ๋‹ค์Œ Python3์—์„œ:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser('http://localhost:9001')
>>> list(parser.tokenize(u'ๆˆ‘ๅฎถๆฒกๆœ‰็”ต่„‘ใ€‚'))
['ๆˆ‘ๅฎถ', 'ๆฒกๆœ‰', '็”ต่„‘', 'ใ€‚']

์•„๋ž์–ด:

# Start the server.
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

Finally, start Python:

>>> from nltk.parse import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9005')
>>> text = u'ุงู†ุง ุญุงู…ู„'
>>> parser.tokenize(text)
<generator object GenericCoreNLPParser.tokenize at 0x7f4a26181bf8>
>>> list(parser.tokenize(text))
['ุงู†ุง', 'ุญุงู…ู„']
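The CoreNLPParser calls above go over HTTP to the server started earlier. As a rough illustration (not the library's own code), the same round trip can be hand-rolled with the standard library against the server's documented API; the URL and port are assumptions matching the commands above, and the function simply returns None when no server is reachable:

```python
import json
from urllib import error, parse, request

def corenlp_tokenize(text, url="http://localhost:9001"):
    """Tokenize `text` via a running CoreNLP server's HTTP API.

    Minimal sketch of the request CoreNLPParser issues for us;
    returns None when no server is reachable at `url`.
    """
    props = json.dumps({"annotators": "tokenize,ssplit",
                        "outputFormat": "json"})
    full_url = url + "/?properties=" + parse.quote(props)
    try:
        # The server expects the raw text as the POST body.
        with request.urlopen(full_url, data=text.encode("utf-8"),
                             timeout=5) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (error.URLError, OSError, ValueError):
        return None
    return [token["word"]
            for sentence in payload.get("sentences", [])
            for token in sentence["tokens"]]
```

With the Chinese server from above running on port 9001, `corenlp_tokenize(u'ๆˆ‘ๅฎถๆฒกๆœ‰็”ต่„‘ใ€‚')` would return the same token list as the CoreNLPParser example.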

All 3 comments

@libingnan54321 Why don't you use the latest 3.9.1 version?

๋จผ์ € ์ด๊ฒƒ์„ ์‹œ๋„ํ•˜๊ณ  ์ถœ๋ ฅ์„ ์ œ๊ณตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

import os

# standfordNlpPath as defined in your snippet above
segmenter_jar_file = os.path.join(standfordNlpPath, 'stanford-segmenter-2018-02-27/stanford-segmenter-3.9.1.jar')
assert os.path.isfile(segmenter_jar_file)
stanfordSegmenter = StanfordSegmenter(
    path_to_jar=segmenter_jar_file,
)
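If that assert passes and the wrapper still complains, the error text points at the CLASSPATH environment variable, which the NLTK Stanford wrappers fall back to when an explicit jar path does not resolve. A small diagnostic sketch (whatever entries it prints come from your own environment):

```python
import os

# List each CLASSPATH entry and whether it actually exists on disk.
classpath = os.environ.get("CLASSPATH", "")
print("CLASSPATH is", "set" if classpath else "empty")
for entry in filter(None, classpath.split(os.pathsep)):
    status = "exists" if os.path.exists(entry) else "missing"
    print(f"  {entry}: {status}")
```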


ํ˜„์žฌ ํ•ด๊ฒฐ๋œ ๋Œ€๋กœ ๋ฌธ์ œ ์ข…๋ฃŒ =)
์ถ”๊ฐ€ ๋ฌธ์ œ๊ฐ€ ์žˆ์œผ๋ฉด ์—ด์–ด์ฃผ์„ธ์š”.

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰