Nltk: Unclosed file in stopwords corpora

Created on 3 Jan 2018  ·  11 Comments  ·  Source: nltk/nltk

/Users/kiddo/anaconda/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/Users/kiddo/nltk_data/corpora/stopwords/english'>
return concat([ for f in fileids])

That's a warning I found while running in debug mode. I thought you might like to fix it before the next release.

bug corpus enhancement goodfirstbug pythonic

All 11 comments

Hi, can I take this issue?

@iliaschalkidis @alvations How can I reproduce the warning on linux?

@sks4903440 Using version 3.2.5, you may try to run the following script in your command line:

import warnings
import nltk

# Escalate ResourceWarning to an error so the unclosed file is reported.
warnings.filterwarnings('error', category=ResourceWarning)
stop_words = nltk.corpus.stopwords.words('english')

$ python

You should get:

ResourceWarning: unclosed file <_io.BufferedReader name='/Users/kiddo/nltk_data/corpora/stopwords/english'>
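For readers without the corpus installed, the same warning can be reproduced with the standard library alone. This is a minimal sketch, assuming CPython (where a file object is finalized as soon as its last reference goes away); the helper names `leaks_file_handle`, `closes_file_handle` and `count_resource_warnings` are made up for illustration and are not part of NLTK.

```python
import gc
import os
import tempfile
import warnings


def leaks_file_handle(path):
    """Read a file without ever closing it (the pattern behind the warning)."""
    return open(path, "rb").read()


def closes_file_handle(path):
    """Read the same file, closing it deterministically with a context manager."""
    with open(path, "rb") as stream:
        return stream.read()


def count_resource_warnings(func, path):
    """Run func(path) and count the ResourceWarnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so enable everything here.
        warnings.simplefilter("always")
        func(path)
        gc.collect()  # covers interpreters that do not finalize immediately
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


# A throwaway file standing in for a corpus file (hypothetical content).
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"i\nme\nmy\n")
os.close(fd)
try:
    leaky_count = count_resource_warnings(leaks_file_handle, demo_path)
    clean_count = count_resource_warnings(closes_file_handle, demo_path)
finally:
    os.remove(demo_path)

print("warnings without close:", leaky_count)
print("warnings with context manager:", clean_count)
```

On CPython the first call should report at least one warning and the second none, which is the difference the fix needs to achieve.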

Fixed in #1945

Hmm... Having StreamCorpusReader inherit from io.BufferedReader is an interesting solution, but closing the file properly with a scoped context manager might be a better fix.

And I think Python 3.6 has some special requirements for files that differ from previous versions. We have to read the CPython changelog to be sure that what we're doing is not just a band-aid =)

@alvations Using with would surely be a good idea, and I will try to incorporate it. I had not used it because in CPython the file is closed automatically once its reference count drops to zero. Also, for the with statement to work, we will have to either use io.BufferedReader or implement the __enter__ and __exit__ methods. Which do you think is better?

~I think we don't have to implement the enter/exit methods, since we won't be inheriting from BufferedReader; we would use the context manager to open and close the file and then let the io module handle the gc (garbage collection).~

This is tricky: io.BufferedReader already has a seek()-like function, and when SeekableUnicodeStreamReader inherits from it without calling super().__init__(), I'm not exactly sure what it takes from BufferedReader.

And actually, we can't really wrap the with inside read(), because that would prevent the seek and tell functions from working unless we hack the buffer within the with context. Hmm...
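One way to picture the trade-off discussed above is a reader that implements __enter__ and __exit__ itself, so the stream stays open for seek and tell and is closed only when the caller's with block ends. This is only a sketch under that assumption: `ClosingStreamReader` is a hypothetical class, not NLTK's SeekableUnicodeStreamReader and not the approach taken in the actual fix, and an in-memory io.BytesIO stands in for a corpus file.

```python
import io


class ClosingStreamReader:
    """Hypothetical wrapper around a binary stream (not an NLTK class)."""

    def __init__(self, stream, encoding="utf-8"):
        self._stream = stream
        self._encoding = encoding

    def readline(self):
        # Decode one line from the underlying binary stream.
        return self._stream.readline().decode(self._encoding)

    def seek(self, offset, whence=io.SEEK_SET):
        return self._stream.seek(offset, whence)

    def tell(self):
        return self._stream.tell()

    def close(self):
        self._stream.close()

    @property
    def closed(self):
        return self._stream.closed

    def __enter__(self):
        # The stream is already open; hand the reader to the with block.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close on exit whether or not an exception occurred.
        self.close()
        return False  # do not swallow exceptions


with ClosingStreamReader(io.BytesIO(b"i\nme\nmy\n")) as reader:
    first = reader.readline()
    position = reader.tell()      # remember where the second line starts
    second = reader.readline()
    reader.seek(position)         # seeking still works inside the block
    second_again = reader.readline()

print(first, second, second_again, reader.closed)
```

The with block here lives in the caller rather than inside read(), which is why seek and tell keep working until the block ends.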

Any news on this? Python 3.6 still complains about NLTK 3.3, on pretty much every resource:

/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/lexnames'>
  for i, line in enumerate('lexnames')):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.adj'>
  for i, line in enumerate('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.adv'>
  for i, line in enumerate('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.noun'>
  for i, line in enumerate('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/index.verb'>
  for i, line in enumerate('index.%s' % suffix)):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/adj.exc'>
  for line in'%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/adv.exc'>
  for line in'%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/noun.exc'>
  for line in'%s.exc' % suffix):
/home/user/py36/lib/python3.6/site-packages/nltk/corpus/reader/ ResourceWarning: unclosed file <_io.BufferedReader name='/home/user/nltk_data/corpora/wordnet/verb.exc'>

This is another issue that seems to have been resolved; it might be good to close it.

Thanks everyone for raising the issue and @purificant for the fix!
