Nltk: Verbnet corpus is out of date

Created on 5 May 2018  ·  13 Comments  ·  Source: nltk/nltk

The nltk data index (https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml) points verbnet to version 2.1. The latest verbnet definition is 3.2.

The latest version has updated frame descriptions that provide much more information about the phrasal structure. For example, the primary description of a frame from class future_having-13.3 in the latest version is NP V NP-Dative NP, describing the frame's structure as (noun-phrase, verb, noun-phrase(dative), noun-phrase) while in version 2.1 it just reads Dative.
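To make the difference concrete, here is a minimal sketch of reading a frame's primary description straight from a class definition with the standard library. The XML fragment is hand-written for illustration and is not copied from the real future_having-13.3 file; the element and attribute names (VNCLASS, FRAMES, FRAME, DESCRIPTION, primary) are assumptions about the file layout, and `primary_descriptions` is a hypothetical helper, not part of NLTK.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that imitates the assumed layout of a VerbNet class file.
# It is illustrative only, not the real future_having-13.3 definition.
SAMPLE_CLASS_XML = """
<VNCLASS ID="future_having-13.3">
  <FRAMES>
    <FRAME>
      <DESCRIPTION primary="NP V NP-Dative NP" secondary=""/>
    </FRAME>
  </FRAMES>
</VNCLASS>
"""


def primary_descriptions(class_xml):
    """Return the 'primary' description of every frame in a class definition."""
    root = ET.fromstring(class_xml)
    return [description.get("primary") for description in root.iter("DESCRIPTION")]


print(primary_descriptions(SAMPLE_CLASS_XML))  # prints ['NP V NP-Dative NP']
```

Under the same assumptions, a 2.1-style file would yield just ['Dative'] for this frame, which is the loss of phrasal structure described above.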

bug corpus enhancement nltk_data

All 13 comments

@agodbehere, thanks for reporting this issue. I've verified that the existing verbnet 2 corpus reader breaks on verbnet 3 data, so both will need to live alongside each other in the corpus collection.

The next step is for someone to contribute a corresponding corpus reader nltk.corpus.verbnet3, which can hopefully share some of the existing code.

We'll need to support both for a while.

@stevenbird, what breaking case did you find for using the existing corpus reader with verbnet 2? I didn't run the test suite after updating the corpus, but for my use-case (requesting classids and frames), the existing corpus reader works just fine.

The problem exists with verbnet 3. We need a different corpus reader for that.

@stevenbird @agodbehere Hi, I work on the VerbNet project at CU Boulder and would be happy to contribute and maintain code for a corpus reader for VerbNet 3+.

@amosleokim: thanks, that would be welcome!

You can see that we have verbnet (2) and verbnet3 data here.

I propose we add an entry for verbnet3 here

And then work out how to extend verbnet.py to support both verbnet and verbnet3.

How does that sound? We need to support both simultaneously, and (ultimately) deprecate verbnet 2.

We have an NLTK slack channel where we can discuss details if necessary. Thanks!

@stevenbird That sounds good to me! If you can send me an invite code to the slack channel, I'll hop on so we can get started on the nitty gritty.

Any progress on this topic? I am trying to use verbnet for research, and the output I get from the classids method seems weird.

Thanks @stevenbird, the older version seemed to be the cause of the problem. I was able to manually download verbnet3.zip and read it with the reader for verbnet 2.1 that is in nltk.

@salompas Just to check again, does the verbnet API in NLTK work with verbnet3?

@alvations
It does work for what I am using it for. Let me show you my code:

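from nltk.corpus.reader.verbnet import VerbnetCorpusReader
from nltk.corpus.util import LazyCorpusLoader

# Reuse the existing VerbNet reader, pointed at the 'verbnet3' folder.
# The pattern matches every .xml file whose name does not start with a dot.
v3 = LazyCorpusLoader('verbnet3', VerbnetCorpusReader, r'(?!\.).*\.xml')
v3.classids('add')  # returns ['mix-22.1-2', 'multiply-108', 'say-37.7-1']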

For that to work you need to download verbnet3 from here. Unzip the file in the folder ~/nltk_data/corpora. Unzipping should create a new folder, ~/nltk_data/corpora/verbnet3, which contains all the Verbnet3 definitions. Then you should be able to run the code above. Note that for Verbnet 2 (the default), the same classids('add') call only returns the first class (mix-22.1-2).

Since that is basically all I am using Verbnet3 for, I have not tested the other APIs, but the classids method has been tested on many different words and they all work. I hope this helps!
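The manual unzip step described above could also be scripted. This is only a sketch using the standard library: `install_corpus_zip` is a hypothetical helper, not an NLTK function, and it assumes the archive contains a top-level folder such as verbnet3/, as the comment describes.

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, corpora_dir=None):
    """Extract a corpus archive into an nltk_data/corpora directory.

    Assumption: the archive holds a top-level folder (for example 'verbnet3/'),
    so extracting it creates <corpora_dir>/verbnet3.
    Returns the top-level paths created by the archive.
    """
    if corpora_dir is None:
        corpora_dir = Path.home() / "nltk_data" / "corpora"
    corpora_dir = Path(corpora_dir)
    corpora_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
        # Collect the distinct first path components of the archive members.
        top_level = sorted(
            {name.split("/")[0] for name in archive.namelist() if name.strip("/")}
        )
    return [corpora_dir / name for name in top_level]


# Example call; the download location below is hypothetical:
# install_corpus_zip(Path.home() / "Downloads" / "verbnet3.zip")
```

After a successful run, the loader shown earlier should find the corpus under the name 'verbnet3', provided the extracted folder really has that name.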

@Salompas Hi, thank you for your solution! What version of verbnet3 is your 'verbnet3'? Is it version 3.3 or 3.2?

Hey @songhee-kim, it's been 2 years since I worked on this, so I do not know exactly which version I had.
