Nltk: Verbnet corpus is out of date

Created on 5 May 2018 · 13 Comments · Source: nltk/nltk

The nltk data index (https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) points verbnet to version 2.1. The latest verbnet definition is 3.2.

The latest version has updated frame descriptions that provide much more information about the phrasal structure. For example, the primary description of a frame from class future_having-13.3 in the latest version is NP V NP-Dative NP, describing the frame's structure as (noun-phrase, verb, noun-phrase(dative), noun-phrase) while in version 2.1 it just reads Dative.
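To make the difference concrete, here is a minimal sketch of reading a frame's primary description. The two XML snippets are hand-written approximations of the class file layout (a DESCRIPTION element with a primary attribute inside each FRAME), not copies of the real files, and the helper name primary_descriptions is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written approximations of one class definition in each version.
# Assumption: the real files keep the description text in a "primary"
# attribute of a DESCRIPTION element nested under FRAMES/FRAME.
V21_SAMPLE = """
<VNCLASS ID="future_having-13.3">
  <FRAMES>
    <FRAME><DESCRIPTION primary="Dative"/></FRAME>
  </FRAMES>
</VNCLASS>
"""

V3_SAMPLE = """
<VNCLASS ID="future_having-13.3">
  <FRAMES>
    <FRAME><DESCRIPTION primary="NP V NP-Dative NP"/></FRAME>
  </FRAMES>
</VNCLASS>
"""


def primary_descriptions(xml_text):
    """Return the primary description of every frame in a class definition."""
    root = ET.fromstring(xml_text.strip())
    return [desc.get("primary") for desc in root.iter("DESCRIPTION")]


print(primary_descriptions(V21_SAMPLE))  # ['Dative']
print(primary_descriptions(V3_SAMPLE))   # ['NP V NP-Dative NP']
```

With the newer style of description, a caller can split the string on whitespace to recover the phrasal slots, which is not possible when the description only reads Dative.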

bug corpus enhancement nltk_data

agodbehere

Most helpful comment

@alvations
It does work for what I am using it for. Let me show you my code:

import nltk
v3 = nltk.corpus.util.LazyCorpusLoader(
    'verbnet3', nltk.corpus.reader.verbnet.VerbnetCorpusReader,
    r'(?!\.).*\.xml')
v3.classids('add') # returns ['mix-22.1-2', 'multiply-108', 'say-37.7-1']

For that to work, you need to download verbnet3 from here. Unzip this file in the folder ~/nltk_data/corpora. When unzipped, it should create a new folder, ~/nltk_data/corpora/verbnet3, which contains all the Verbnet3 definitions. Then you should be able to run the code above. Notice that for Verbnet 2 (the default) the code v3.classids('add') only returns the first class (mix-22.1-2).

Since that is basically all I am using Verbnet3 for I have not tested the other APIs, but the classids method has been tested on maaany different words and they all work. I hope this helps!
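The unzip step described above could be scripted roughly as follows. This is only a sketch: unpack_corpus is a hypothetical helper name, and it assumes the archive already contains a top-level verbnet3 folder, as the description of the download suggests.

```python
import zipfile
from pathlib import Path


def unpack_corpus(zip_path, corpora_dir):
    """Extract a corpus archive into corpora_dir.

    Returns the sorted list of top-level paths the archive created.
    Hypothetical helper, not part of NLTK.
    """
    corpora_dir = Path(corpora_dir).expanduser()
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
        # Collect the first path component of every archive member.
        top_level = {name.split("/")[0] for name in archive.namelist()}
    return sorted(corpora_dir / name for name in top_level)
```

Calling unpack_corpus("verbnet3.zip", "~/nltk_data/corpora") should then leave the definitions under ~/nltk_data/corpora/verbnet3, provided the archive is laid out as assumed.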

Salompas on 19 Oct 2018

🎉3

All 13 comments

@agodbehere, thanks for reporting this issue. I've verified that the existing verbnet 2 corpus reader breaks on verbnet 3 data, so both will need to live alongside each other in the corpus collection.

The next step is for someone to contribute a corresponding corpus reader nltk.corpus.verbnet3, which can hopefully share some of the existing code.

We'll need to support both for a while.

stevenbird on 14 May 2018

👍1

@stevenbird, what breaking case did you find for using the existing corpus reader with verbnet 2? I didn't run the test suite after updating the corpus, but for my use-case (requesting classids and frames), the existing corpus reader works just fine.

agodbehere on 16 May 2018

The problem exists with verbnet 3. We need a different corpus reader for that.

stevenbird on 16 May 2018

@stevenbird @agodbehere Hi, I work on the VerbNet project at CU Boulder and would be happy to contribute and maintain code for a corpus reader for VerbNet 3+.

amosleokim on 22 May 2018

👍1

@amosleokim: thanks, that would be welcome!

You can see that we have verbnet (2) and verbnet3 data here.

I propose we add an entry for verbnet3 here

And then work out how to extend verbnet.py to support both verbnet and verbnet3.

How does that sound? We need to support both simultaneously, and (ultimately) deprecate verbnet 2.

We have an NLTK slack channel where we can discuss details if necessary. Thanks!

stevenbird on 22 May 2018

👍1

@stevenbird That sounds good to me! If you can send me an invite code to the slack channel, I'll hop on so we can get started on the nitty gritty.

amosleokim on 22 May 2018

Any progress on this topic? I am trying to use verbnet for research, and the output I get from the classids method seems weird.

Salompas on 10 Oct 2018

Please see https://github.com/nltk/nltk/issues/2015#issuecomment-390826015

stevenbird on 10 Oct 2018

Thanks @stevenbird, the older version seemed to be the cause of the problem. I was able to manually download verbnet3.zip and read it with the reader for verbnet 2.1 that is in nltk.

Salompas on 11 Oct 2018

@Salompas Just to check again: does the verbnet API in NLTK work with verbnet3?

alvations on 19 Oct 2018

@alvations
It does work for what I am using it for. Let me show you my code:

import nltk
v3 = nltk.corpus.util.LazyCorpusLoader(
    'verbnet3', nltk.corpus.reader.verbnet.VerbnetCorpusReader,
    r'(?!\.).*\.xml')
v3.classids('add') # returns ['mix-22.1-2', 'multiply-108', 'say-37.7-1']

Since that is basically all I am using Verbnet3 for I have not tested the other APIs, but the classids method has been tested on maaany different words and they all work. I hope this helps!
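For anyone wondering what the third argument in the loader call does: it is a regular expression selecting which files in the corpus folder count as class definitions. The sketch below shows which names it accepts, assuming the reader applies it to each candidate file name as a whole-string match. The file names are invented for illustration and matching_fileids is a hypothetical helper, not an NLTK function.

```python
import re

# Same pattern as in the loader call above: any name ending in .xml
# that does not start with a dot (so hidden files are skipped).
FILEID_PATTERN = r'(?!\.).*\.xml'


def matching_fileids(names, pattern=FILEID_PATTERN):
    """Keep only the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]


# Invented file names, used only to exercise the pattern.
candidates = ["mix-22.1.xml", ".hidden.xml", "vn_schema-3.xsd", "README"]
print(matching_fileids(candidates))  # ['mix-22.1.xml']
```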

Salompas on 19 Oct 2018

🎉3

@Salompas Hi, thank you for your solution! What version of verbnet3 is your 'verbnet3'? Is it version 3.3 or 3.2?