Nltk: Deprecating the old Stanford Parser

Created on 26 Sep 2017  ·  4 Comments  ·  Source: nltk/nltk

We have deprecated the StanfordTokenizer/StanfordSegmenter, StanfordPOSTagger and StanfordNERTagger.

It would be good to also deprecate the old StanfordParser, StanfordDependencyParser and StanfordNeuralDependencyParser by:

1. Adding the appropriate deprecation warnings to the old interface

2a. Wrapping duck types around CoreNLPParser that emulate the functions of the old interface

2b. Writing up documentation on how to use CoreNLPParser for dependency and neural dependency parsing

3. Writing tests for the new CoreNLP parser interfaces

Both the (2a) and (2b) approaches should only affect the `properties` argument of `api_call()`.
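As a rough illustration, the `parser_type` argument could be reduced to building a different `properties` dict for `api_call()`. The mapping below is a hypothetical sketch, not nltk's actual implementation; the annotator names follow CoreNLP conventions, but the `depparse.model` path is an assumed example value.

```python
# Hypothetical sketch: how a `parser_type` argument might translate into the
# `properties` dict passed to api_call(). Not the actual nltk implementation.
PARSER_ANNOTATORS = {
    "constituency": "parse",
    "dependency": "depparse",
    "neural_dependency": "depparse",  # neural model chosen via an extra property
}

def build_properties(parser_type="constituency"):
    """Return a CoreNLP `properties` dict for the requested parser type."""
    if parser_type not in PARSER_ANNOTATORS:
        raise ValueError("unknown parser_type: %r" % parser_type)
    properties = {
        "annotators": "tokenize,ssplit,pos,%s" % PARSER_ANNOTATORS[parser_type]
    }
    if parser_type == "neural_dependency":
        # Assumed knob for selecting the neural dependency model.
        properties["depparse.model"] = (
            "edu/stanford/nlp/models/parser/nndep/english_UD.gz"
        )
    return properties
```

Everything else (tokenization, the HTTP round trip, tree construction) would stay shared across the parser variants.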

The current interface for CoreNLPParser:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sent = 'The quick brown fox jumps over the lazy dog.'
>>> next(parser.raw_parse(sent)).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                         ROOT
                          |
                          S
           _______________|__________________________
          |                         VP               |
          |                _________|___             |
          |               |             PP           |
          |               |     ________|___         |
          NP              |    |            NP       |
      ____|__________     |    |     _______|____    |
     DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
     |    |     |    |    |    |    |       |    |   |
    The quick brown fox jumps over the     lazy dog  .

The desired interface might look like this:

# Using duck types (2a)
>>> from nltk.parse.stanford import CoreNLPDependencyParser, CoreNLPNeuralDependencyParser
>>> depparser = CoreNLPDependencyParser('http://localhost:9000')
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPNeuralDependencyParser('http://localhost:9000')
>>> ndepparser.parse(sent)
# Using arguments to control `properties` for `api_call()` (2b)
>>> from nltk.parse.stanford import CoreNLPParser

>>> depparser = CoreNLPParser('http://localhost:9000', parser_type="dependency")
>>> depparser.parse(sent)

>>> ndepparser = CoreNLPParser('http://localhost:9000', parser_type="neural_dependency")
>>> ndepparser.parse(sent)

This would make a good class project or good first challenge ;P

Labels: dependency parsing, enhancement, good first bug, inactive, nice idea, tests


All 4 comments

Hi, I would like to work on this issue, but I don't get why mock is used like this.

Right now the tests don't test anything: even if the body of tokenize is completely erased, the tests will still pass. Maybe we should patch the api_call method and then call tokenize:

  from unittest.mock import MagicMock

  corenlp_tokenizer = CoreNLPTokenizer()
  corenlp_tokenizer.api_call = MagicMock(return_value=predefined_return_value)
  corenlp_tokenizer.tokenize(input_string)
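The patching pattern above can be shown end to end without a running CoreNLP server. `FakeTokenizer` below is a stand-in for the real CoreNLPTokenizer (its response shape is a simplified assumption about CoreNLP's JSON); the point is that replacing `api_call` with a `MagicMock` lets the test exercise `tokenize`'s real parsing logic and still verify the call.

```python
from unittest.mock import MagicMock


class FakeTokenizer:
    """Stand-in for CoreNLPTokenizer: tokenize() delegates to api_call()."""

    def api_call(self, text):
        # Would hit the CoreNLP HTTP server; never reached in the test.
        raise RuntimeError("no server available")

    def tokenize(self, text):
        # Real logic under test: unpack the (simplified) JSON response.
        response = self.api_call(text)
        return [tok["word"]
                for sent in response["sentences"]
                for tok in sent["tokens"]]


tokenizer = FakeTokenizer()
tokenizer.api_call = MagicMock(return_value={
    "sentences": [{"tokens": [{"word": "Good"}, {"word": "muffins"}]}]
})
assert tokenizer.tokenize("Good muffins") == ["Good", "muffins"]
tokenizer.api_call.assert_called_once_with("Good muffins")
```

Unlike mocking `tokenize` itself, this version fails if the body of `tokenize` is erased, which addresses the complaint above.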

@artiemq Thank you for the interest in the issue!

Mock was used in the unittest because it was a quick way to document the Python flow of the APIs and how a user should use them, but it didn't actually call CoreNLP.

Regarding the unittest, perhaps using unittest.mock isn't the best way to test the CoreNLP functionality. Please feel free to rewrite/edit it and create a PR =)

I can see info here on how to connect to the POS tagger 'server' at port 9000, but I can't find information on how to run the Stanford POS tagger server to listen on port 9000... Does anyone know?
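For what it's worth, the POS tagger is served by the Stanford CoreNLP server rather than a separate tagger server. Assuming you have downloaded and unzipped the Stanford CoreNLP package, the documented way to start it (run from the directory containing the CoreNLP jars) is roughly:

```shell
# Start the Stanford CoreNLP server on port 9000 with 4 GB of heap.
# Run from the unzipped stanford-corenlp-* directory; adjust -mx and
# -timeout to taste.
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -port 9000 -timeout 15000
```

Once it is running, the `http://localhost:9000` URLs in the examples above should work.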

So is this decided or not?
I am now trying to run nltk.tag.StanfordNERTagger(). There is a small issue with it that I wanted to fix. Shall I do it or not?

I want the parser to run locally without API-calls. Is this possible with CoreNLPParser?
