Nltk: Function to access the Synset object using sense key

Created on 10 Jan 2018  ·  4 Comments  ·  Source: nltk/nltk

I might have missed it, but is there a function in the NLTK WordNet interface that returns the Synset object for a given sense key?

If there isn't, could we expose a function that does this in nltk.corpus.wordnet? E.g. https://stackoverflow.com/questions/48170666/how-to-get-the-gloss-given-sense-key-using-nltk-wordnet/

Ideally, it would be good to have functions that access Synset objects using:

  1. offset-pos, e.g. 1433493-a -> Synset('long.a.02')
  2. sense_key, e.g. long%3:00:02:: -> Synset('long.a.02')

Currently, we have synset_from_pos_and_offset() for (1).
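To make the two identifier formats above concrete, here is a minimal sketch that splits each string into its fields without touching NLTK. The names parse_sense_key, parse_offset_pos, SenseKey and SS_TYPE_TO_POS are hypothetical helpers for illustration, not part of NLTK. The field layout assumed is the WordNet sense key layout lemma%ss_type:lex_filenum:lex_id:head_word:head_id.

```python
from typing import NamedTuple, Tuple

# ss_type digits used in WordNet sense keys, mapped to one-letter POS tags.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


class SenseKey(NamedTuple):
    """Fields of a sense key (hypothetical container, not an NLTK class)."""
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str  # empty unless the key names a head adjective
    head_id: str    # kept as text because it may be empty


def parse_sense_key(key: str) -> SenseKey:
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, _, lex_sense = key.partition("%")
    fields = lex_sense.split(":")
    if not lemma or len(fields) != 5:
        raise ValueError(f"not a sense key: {key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = fields
    if not ss_type.isdigit() or int(ss_type) not in SS_TYPE_TO_POS:
        raise ValueError(f"unknown ss_type in sense key: {key!r}")
    return SenseKey(lemma, SS_TYPE_TO_POS[int(ss_type)],
                    int(lex_filenum), int(lex_id), head_word, head_id)


def parse_offset_pos(ident: str) -> Tuple[str, int]:
    """Split an 'offset-pos' string such as '1433493-a' into (pos, offset)."""
    offset, _, pos = ident.rpartition("-")
    return pos, int(offset)
```

The (pos, offset) pair returned by parse_offset_pos is in the argument order that synset_from_pos_and_offset(pos, offset) takes, so it could be passed straight through. Parsing a sense key this way only exposes its fields; it does not by itself identify the synset.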

There's another function, _synset_from_pos_and_line(), that reads the following data-file line and returns Synset('long.a.02'):

01433493 00 a 01 long 1 016 = 05129201 n 0000 + 05133287 n 0101 ! 01436003 a 0101 & 01434007 a 0000 & 01434218 a 0000 & 01434530 a 0000 & 01434717 a 0000 & 01434841 a 0000 & 01434966 a 0000 & 01435060 a 0000 & 01435189 a 0000 & 01435290 a 0000 & 01435399 a 0000 & 01435507 a 0000 & 01435675 a 0000 & 01435891 a 0000 | primarily spatial sense; of relatively great or greater than average spatial extension or extension as specified; "a long road"; "a long distance"; "contained many long words"; "ten miles long" 

but that line is not a sense key.
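For readers unfamiliar with that record format, the sketch below splits such a line into its parts. It assumes the WordNet data-file layout (offset, lexicographer file number, synset type, hexadecimal word count, word/lex_id pairs, decimal pointer count, four-field pointers, then the gloss after "|"). The function name parse_data_line is hypothetical, and this is not how NLTK's _synset_from_pos_and_line is implemented. Verb frames and adjective syntactic markers are ignored.

```python
from typing import Any, Dict


def parse_data_line(line: str) -> Dict[str, Any]:
    """Parse one WordNet data-file record into a plain dictionary."""
    header, _, gloss = line.partition("|")
    tokens = header.split()

    offset = int(tokens[0])
    lex_filenum = int(tokens[1])
    pos = tokens[2]
    word_count = int(tokens[3], 16)  # w_cnt is written in hexadecimal

    i = 4
    words = []
    for _ in range(word_count):
        # Each word is followed by its lex_id (a hexadecimal digit).
        words.append((tokens[i], int(tokens[i + 1], 16)))
        i += 2

    pointer_count = int(tokens[i])  # p_cnt is written in decimal
    i += 1
    pointers = []
    for _ in range(pointer_count):
        symbol, target_offset, target_pos, source_target = tokens[i:i + 4]
        pointers.append((symbol, int(target_offset), target_pos, source_target))
        i += 4

    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "pos": pos,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }
```

Applied to the line quoted above, the record yields offset 1433493, part of speech "a", the single word "long", and 16 pointers.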

Labels: corpus, enhancement, goodfirstbug, nice idea, wordnet

All 4 comments

I'd like to work on this!

@craaaa Sorry for the late reply, was away for a while.

Feel free to work on it and create a PR afterwards.
P.S. Don't worry about breaking anything; there'll be checks and reviews before we merge the code.

I implemented the function suggested in the Stack Overflow answer, but it didn't seem to map to the correct senses -- for instance, synset_from_sense_key('afraid%3:00:00::') returned afraid.a.04 instead of afraid.a.01. The problem extends to other parts of speech as well. (The sense keys were obtained from WordNet's online interface.)

Using the method shown in the SemCor documentation instead appears to map correctly -- there is already a lemma_from_key(key) function that takes something similar to a sense key. However, lemma_from_key(key) doesn't support adjective satellites (e.g. afraid%3:00:02:concerned:00). I can implement a wrapper around lemma_from_key(key) that fixes this and returns a Synset.
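A minimal sketch of the wrapper idea, assuming only that the reader object exposes lemma_from_key(key) and that the returned lemma has a synset() method, as in nltk.corpus.wordnet. The reader parameter is an illustrative choice so the function can be exercised without the corpus installed; it is not the signature of any NLTK function. The sketch inherits whatever limitations lemma_from_key has, including the adjective satellite case mentioned above.

```python
def synset_from_sense_key(sense_key, reader):
    """Return the synset that owns the lemma identified by sense_key.

    reader is assumed to be nltk.corpus.wordnet, or any object with a
    compatible lemma_from_key() method (an assumption of this sketch).
    """
    lemma = reader.lemma_from_key(sense_key)
    return lemma.synset()


# Possible usage with NLTK (requires the WordNet corpus to be downloaded):
#   from nltk.corpus import wordnet as wn
#   synset_from_sense_key("long%3:00:02::", wn)
```

Passing the reader in explicitly keeps the sketch independent of how the corpus is loaded; inside NLTK itself this would more naturally be a method on the WordNet reader.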

@craaaa I think there is still an issue with adjective satellites. I tried the synset_from_sense_key function and got this error:

  File "/home/izorar/anaconda3/lib/python3.7/site-packages/nltk/corpus/reader/wordnet.py", line 1356, in synset
    raise WordNetError(message % lemma)
WordNetError: adjective satellite requested but only plain adjective found for lemma 'first'

Any idea on how to fix the error?
