We have deprecated the StanfordTokenizer/StanfordSegmenter, StanfordPOSTagger and StanfordNERTagger.
It would be good to also deprecate the old StanfordParser, StanfordDependencyParser and StanfordNeuralDependencyParser by:

2a. Wrapping duck-types around CoreNLPParser that emulate the functions of the old interface
2b. Writing up documentation on how to use CoreNLPParser for dependency and neural dependency parsing

Both approaches, (2a) and (2b), should only affect the `properties` argument of `api_call()`.
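Both approaches could be sketched roughly as follows. This is a hypothetical, simplified stand-in for illustration only: `GenericCoreNLPParser`, its `api_call()` payload, and the `parser_type` keyword are assumptions, not nltk's actual API, and no CoreNLP server is contacted.

```python
# Hypothetical sketch of (2a) and (2b).  GenericCoreNLPParser stands in
# for nltk's real CoreNLPParser; api_call() here only echoes the
# properties it would send, so the example runs without a server.

class GenericCoreNLPParser:
    parser_annotator = "parse"  # constituency parsing by default

    def __init__(self, url="http://localhost:9000", parser_type=None):
        # (2b) an argument that switches the annotator on a single class
        self.url = url
        if parser_type == "dependency":
            self.parser_annotator = "depparse"

    def api_call(self, text, properties):
        # The real implementation would POST `text` to the CoreNLP
        # server; here we just return the properties for illustration.
        return {"annotators": properties["annotators"], "text": text}

    def parse(self, text):
        properties = {
            "annotators": "tokenize,ssplit,pos," + self.parser_annotator
        }
        return self.api_call(text, properties)


# (2a) a duck-type: same interface as the parent, only the annotator
# placed into `properties` differs.
class CoreNLPDependencyParser(GenericCoreNLPParser):
    parser_annotator = "depparse"
```

Either way, the only thing that really changes between parsers is the annotator string passed through `properties`.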
The current interface for CoreNLPParser:
>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sent = 'The quick brown fox jumps over the lazy dog.'
>>> next(parser.raw_parse(sent)).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
ROOT
|
S
_______________|__________________________
| VP |
| _________|___ |
| | PP |
| | ________|___ |
NP | | NP |
____|__________ | | _______|____ |
DT JJ JJ NN VBZ IN DT JJ NN .
| | | | | | | | | |
The quick brown fox jumps over the lazy dog .
The desired interface might look like this:
# Using duck-types
>>> from nltk.parse.stanford import CoreNLPDependencyParser, CoreNLPNeuralDependencyParser
>>> depparser = CoreNLPDependencyParser('http://localhost:9000')
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPNeuralDependencyParser('http://localhost:9000')
>>> ndepparser.parse(sent)
# Using arguments to control `properties` for `api_call()`
>>> from nltk.parse.stanford import CoreNLPParser
>>> depparser = CoreNLPParser('http://localhost:9000', parser_type="dependency")
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPParser('http://localhost:9000', parser_type="neural_dependency")
>>> ndepparser.parse(sent)
This would make a good class project or good first challenge ;P
Hi, I would like to work on this issue, but I don't understand why mock is used like this. Right now the tests don't test anything: even if the body of tokenize is completely erased, the tests still pass. Maybe we should patch the api_call method and then call tokenize:
corenlp_tokenizer = CoreNLPTokenizer()
corenlp_tokenizer.api_call = MagicMock(return_value=predefined_return_value)
corenlp_tokenizer.tokenize(input_string)
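A runnable version of that suggestion might look like this. Note that `CoreNLPTokenizer` below is a simplified stand-in (its `tokenize`/`api_call` bodies and the JSON shape are assumptions modeled on CoreNLP's response format), so the example runs without a server:

```python
from unittest.mock import MagicMock

# Simplified stand-in for nltk's CoreNLPTokenizer; only the
# api_call/tokenize interaction matters for the mocking pattern.
class CoreNLPTokenizer:
    def api_call(self, text):
        raise RuntimeError("would contact the CoreNLP server")

    def tokenize(self, text):
        # Walk the server's JSON response and yield the surface tokens.
        response = self.api_call(text)
        for sentence in response["sentences"]:
            for token in sentence["tokens"]:
                yield token["originalText"]

predefined_return_value = {
    "sentences": [{"tokens": [{"originalText": "Hello"},
                              {"originalText": "world"}]}]
}

corenlp_tokenizer = CoreNLPTokenizer()
corenlp_tokenizer.api_call = MagicMock(return_value=predefined_return_value)

# Now the test exercises tokenize's real body against canned server output...
assert list(corenlp_tokenizer.tokenize("Hello world")) == ["Hello", "world"]
# ...and the mock also verifies that api_call was actually invoked.
corenlp_tokenizer.api_call.assert_called_once_with("Hello world")
```

Patching only `api_call` means the test would fail if `tokenize`'s body were erased, which fixes the problem described above.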
@artiemq Thank you for the interest in the issue!
Mock was used in the unittest because it was a quick way to document the Python flow of the APIs and how a user should use them, but it didn't actually call CoreNLP.
Regarding the unittest, perhaps using unittest.mock isn't the best way to test the CoreNLP functionalities. Please feel free to rewrite/edit it and create a PR =)
I can see info here on how to connect to the POS tagger 'server' at port 9000, but I can't find information on how to run the Stanford POS tagger server to listen on port 9000... Does anyone know?
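In case it helps: the server is started from the Stanford CoreNLP distribution itself, not from nltk. A command along these lines should work (memory setting, preloaded annotators, and timeout are illustrative; adjust to your download):

```shell
# Download and unzip Stanford CoreNLP, then run from inside that
# directory to start the server listening on port 9000:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
    -port 9000 -timeout 15000
```

Once it is running, the `http://localhost:9000` URLs in the examples above will resolve.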
So is this decided or not?
I am now trying to run nltk.tag.StanfordNERTagger(). There is a small issue with it that I wanted to fix. Shall I do it or not?
I want the parser to run locally without API calls. Is this possible with CoreNLPParser?