Nltk: Deprecate the old Stanford parsers

Created on 2017-09-26 · 4 comments · Source: nltk/nltk

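We have deprecated StanfordTokenizer / StanfordSegmenter, StanfordPOSTagger and StanfordNERTagger.

It would be good to also deprecate the old StanfordParser, StanfordDependencyParser and StanfordNeuralDependencyParser by:

1. Adding the appropriate warnings to the old interface

2a. Wrapping duck-types around CoreNLPParser, or

2b. Writing documentation on how to use CoreNLPParser for dependency and neural dependency parsing

3. Writing tests for the new CoreNLP parser interfaces

Both the (2a) and (2b) approaches should only affect the properties argument of api_call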
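
The first item above (adding warnings to the old interface) could be sketched roughly as follows. This is a minimal illustration, not NLTK's actual code: the class name `LegacyParser` and the wording of the message are hypothetical, and only the standard-library `warnings` module is used.

```python
import warnings


class LegacyParser:
    """Hypothetical stand-in for one of the old Stanford interface classes."""

    def __init__(self, *args, **kwargs):
        # Emit the warning at construction time so existing code keeps
        # working while users are pointed at the replacement.
        warnings.warn(
            "LegacyParser is deprecated; please use "
            "nltk.parse.corenlp.CoreNLPParser instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )


# Capture the warning to show that it is actually raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LegacyParser()

print(caught[0].category.__name__, "-", caught[0].message)
```

Using `stacklevel=2` makes the warning point at the user's call site rather than at the library internals, which tends to be more useful when migrating.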

The current interface of CoreNLPParser:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sent = 'The quick brown fox jumps over the lazy dog.'
>>> next(parser.raw_parse(sent)).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                         ROOT
                          |
                          S
           _______________|__________________________
          |                         VP               |
          |                _________|___             |
          |               |             PP           |
          |               |     ________|___         |
          NP              |    |            NP       |
      ____|__________     |    |     _______|____    |
     DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
     |    |     |    |    |    |    |       |    |   |
    The quick brown fox jumps over the     lazy dog  .

The desired interface might look like this:

# Using Duck-types
>>> from nltk.parse.corenlp import CoreNLPDependencyParser, CoreNLPNeuralDependencyParser
>>> depparser = CoreNLPDependencyParser('http://localhost:9000')
>>> depparser.parse(sent)
>>> ndepparser = CoreNLPNeuralDependencyParser('http://localhost:9000')
>>> ndepparser.parse(sent)

# Using arguments to control `properties` for `api_call()`
>>> from nltk.parse.corenlp import CoreNLPParser

>>> depparser = CoreNLPParser('http://localhost:9000', parser_type="dependency")
>>> depparser.parse(sent)

>>> ndepparser = CoreNLPParser('http://localhost:9000', parser_type="neural_dependency")
>>> ndepparser.parse(sent)
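
One way the argument-based variant above might translate a `parser_type` into the `properties` passed to `api_call()` is sketched below. Everything here is an assumption for illustration: the helper `build_properties`, the `PARSER_ANNOTATORS` table, and the exact annotator lists are hypothetical and are not taken from NLTK. The only CoreNLP facts relied on are that `parse` and `depparse` are annotator names and that `depparse` is the neural-network dependency parser.

```python
# Hypothetical mapping from a parser_type argument to CoreNLP annotators.
# The annotator lists are assumptions for illustration, not NLTK's settings.
PARSER_ANNOTATORS = {
    "constituency": "tokenize,ssplit,pos,parse",
    "dependency": "tokenize,ssplit,pos,parse",
    "neural_dependency": "tokenize,ssplit,pos,depparse",
}


def build_properties(parser_type="constituency", extra=None):
    """Return the properties dict that would be handed to api_call()."""
    if parser_type not in PARSER_ANNOTATORS:
        raise ValueError("unknown parser_type: %r" % (parser_type,))
    properties = {
        "annotators": PARSER_ANNOTATORS[parser_type],
        "outputFormat": "json",
    }
    if extra:
        properties.update(extra)  # caller-supplied overrides win
    return properties


print(build_properties("neural_dependency"))
```

Keeping the difference between parser types confined to this one dict is in line with the point that both approaches should only affect the `properties` argument.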

This would make a good class project or a good first challenge ;P

Labels: dependency parsing, enhancement, goodfirstbug, inactive, nice idea, tests

alvations

👍1

All 4 comments

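Hi, I'd like to work on this issue, but I don't understand why mock is used like this.

Right now the tests don't test anything. Even if the body of tokenize were completely erased, the tests would still pass. Maybe we should patch the api_call method and then call tokenize: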

  corenlp_tokenizer = CoreNLPTokenizer()
  corenlp_tokenizer.api_call = MagicMock(return_value=predefined_return_value)
  corenlp_tokenizer.tokenize(input_string)
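
A self-contained version of this idea could look like the sketch below. It is only an illustration under stated assumptions: `SketchTokenizer` is a hypothetical stand-in rather than NLTK's CoreNLPTokenizer, and the shape of the canned response (a `sentences` list holding `tokens` with a `word` field) is assumed for the example. The point is that the test fails if the body of `tokenize` is removed, because the assertions check its return value and the call it makes.

```python
from unittest.mock import MagicMock


class SketchTokenizer:
    """Hypothetical stand-in for a CoreNLP-backed tokenizer."""

    def api_call(self, text, properties=None):
        # The real method would send an HTTP request to the server.
        raise RuntimeError("no CoreNLP server in this sketch")

    def tokenize(self, text):
        result = self.api_call(text, properties={"annotators": "tokenize,ssplit"})
        return [tok["word"] for sent in result["sentences"] for tok in sent["tokens"]]


# Canned server response (assumed shape) used instead of a live server.
predefined_return_value = {
    "sentences": [{"tokens": [{"word": "Good"}, {"word": "muffins"}]}]
}

tokenizer = SketchTokenizer()
tokenizer.api_call = MagicMock(return_value=predefined_return_value)
tokens = tokenizer.tokenize("Good muffins")

# These checks exercise tokenize itself, not just the mock.
assert tokens == ["Good", "muffins"]
tokenizer.api_call.assert_called_once_with(
    "Good muffins", properties={"annotators": "tokenize,ssplit"}
)
```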

artiemq on 2017-10-01

👍1

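@artiemq Thanks for your interest in this issue!

Mock was used in the unit tests because it is a quick way to document the Python flow of the API and how users should use it, without actually calling CoreNLP.

As for the unit tests, perhaps unittest.mock is not the best way to test the CoreNLP functionality. Please feel free to rewrite/edit it and create a PR =)

alvations on 2017-10-02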

👍2

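I can see information here on how to connect to the POS tagger 'server' on port 9000, but I can't find information on how to run the Stanford POS tagger server so that it listens on port 9000... Does anyone know?

ndvbd on 2018-03-22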

👍1

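So has this been decided or not?
I'm currently trying to run nltk.tag.StanfordNERTagger(). There is a small issue with it that I'd like to fix. Should I do it or not?

I want the parser to run locally, without API calls. Is that possible with CoreNLPParser?

Demetrio92 on 2018-07-06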
