Nltk: collocations function returns error

Created on 15 May 2019  ·  10 Comments  ·  Source: nltk/nltk

I was going through chapter 1 of the book, and the collocations() function raises an error. Line 440 in text.py seems to be redundant now that the collocation_list() function has been introduced. I fixed the issue by rewriting the current lines 440 and 441 in text.py.

old code:
collocation_strings = [w1 + ' ' + w2 for w1, w2 in self.collocation_list(num, window_size)]
print(tokenwrap(collocation_strings, separator="; "))

new code:
print(tokenwrap(self.collocation_list(num, window_size), separator="; "))
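To see why the old line fails, here is a minimal standalone sketch that does not need NLTK. It assumes, as the traceback and the proposed fix suggest, that collocation_list() in the affected release returns already-joined strings such as "United States" rather than (w1, w2) tuples. The sample values and the helper names join_pairs and join_strings are hypothetical, and "; ".join stands in for tokenwrap without its line wrapping.

```python
# Assumption (inferred from the traceback and the proposed fix): the affected
# collocation_list() returns already-joined strings, not (w1, w2) tuples.
collocations = ["United States", "fellow citizens"]  # hypothetical sample output


def join_pairs(pairs):
    """Mirror of the old collocations() body: expects (w1, w2) tuples."""
    return [w1 + " " + w2 for w1, w2 in pairs]


def join_strings(strings):
    """Mirror of the proposed fix: items are already 'w1 w2' strings."""
    return "; ".join(strings)


try:
    join_pairs(collocations)
    error_message = None
except ValueError as exc:
    # Unpacking the 13-character string "United States" into w1, w2 fails,
    # because a string is iterated character by character.
    error_message = str(exc)

print(error_message)
print(join_strings(collocations))  # United States; fellow citizens
```

The first print shows the same "too many values to unpack" error as in the reports below, while passing the strings straight through works.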

Labels: bug, goodfirstbug, resolved, text


All 10 comments

Thanks @martinevanschouwenburg for raising the bug!

Yes, it looks like the change to the collocation_list() call is needed. To replicate the bug:

$ python3
Python 3.6.4rc1 (v3.6.4rc1:3398dcb14f, Dec  5 2017, 00:58:30) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> text4.collocations()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/text.py", line 440, in collocations
    collocation_strings = [w1 + ' ' + w2 for w1, w2 in self.collocation_list(num, window_size)]
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/text.py", line 440, in <listcomp>
    collocation_strings = [w1 + ' ' + w2 for w1, w2 in self.collocation_list(num, window_size)]
ValueError: too many values to unpack (expected 2)

I am still seeing this error as well when going through chapter 1 of the book.

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Traceback (most recent call last):
  File "c:\Users\Adam\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\Adam\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\lib\python\ptvsd\__main__.py", line 434, in main
    run()
  File "c:\Users\Adam\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\lib\python\ptvsd\__main__.py", line 312, in run_file
    runpy.run_path(target, run_name='__main__')
  File "c:\users\adam\appdata\local\programs\python\python37-32\Lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "c:\users\adam\appdata\local\programs\python\python37-32\Lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "c:\users\adam\appdata\local\programs\python\python37-32\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\Adam\Documents\code\python\natlang\natlang.py", line 4, in <module>
    text4.collocations()
  File "C:\Users\Adam\.virtualenvs\natlang-9ek-vNym\lib\site-packages\nltk\text.py", line 444, in collocations
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
  File "C:\Users\Adam\.virtualenvs\natlang-9ek-vNym\lib\site-packages\nltk\text.py", line 444, in <listcomp>
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
ValueError: too many values to unpack (expected 2)

@networkjr I can confirm that too. Maybe the fix in #2227 hasn't been pushed to PyPI yet?

@networkjr It's the same with the Anaconda package.

I'm working through the NLTK book. I'm completely new to NLTK and fairly new to Python, and I'm getting this same error.

$ python
Python 3.7.2 (default, Feb 14 2019, 11:13:53) 
[Clang 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> text4.collocations()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/george/code/nltk/py3env/lib/python3.7/site-packages/nltk/text.py", line 444, in collocations
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
  File "/Users/george/code/nltk/py3env/lib/python3.7/site-packages/nltk/text.py", line 444, in <listcomp>
    w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)
ValueError: too many values to unpack (expected 2)

According to my Pipfile.lock I'm using NLTK 3.4.5 which I believe is the most recent release.

Is there a fix for this issue?

This has been fixed in #2377 and should be included in the next NLTK release soon.

Otherwise, if you can't wait =)

pip install -U https://github.com/nltk/nltk/archive/develop.zip

I still get the same error after updating nltk with
pip install -U https://github.com/nltk/nltk/archive/develop.zip

Current nltk version: '3.4.5'

How can I fix it?

Many thanks.
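One possible cause, offered as an assumption since the thread does not confirm it, is that pip installed the develop build into a different interpreter or environment than the one running the book examples. Running python -m pip install -U https://github.com/nltk/nltk/archive/develop.zip with the same python you use for the book avoids that mismatch. The sketch below, built around a hypothetical helper nltk_install_info, reports which interpreter is running and which nltk version it sees. It uses importlib.metadata, so it needs Python 3.8 or newer. Note that a development build may report the same version string as the last release, so the version number alone may not tell you whether the fix is present.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def nltk_install_info():
    """Return (interpreter path, installed nltk version or None)."""
    try:
        nltk_version = version("nltk")
    except PackageNotFoundError:
        nltk_version = None  # nltk is not installed for this interpreter
    return sys.executable, nltk_version


if __name__ == "__main__":
    executable, nltk_version = nltk_install_info()
    print(f"interpreter: {executable}")
    print(f"nltk version: {nltk_version}")
```

If the interpreter path printed here differs from the environment pip installed into, the develop build is not the one being imported.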

Also still having issues with .collocations(), but .collocation_list() works.

Replace the code at line 444 in nltk/text.py:
collocation_strings = [w1 + " " + w2 for w1, w2 in self.collocation_list(num, window_size)]

with the following:
collocation_strings = [w for w in self.collocation_list(num, window_size)]
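If you would rather not edit the installed text.py, a workaround in the spirit of the comments above is to call collocation_list() and format the result yourself. This minimal sketch assumes, based on this thread, that collocation_list() may return either joined strings or (w1, w2) tuples depending on the NLTK version; to_collocation_strings is a hypothetical helper, not part of NLTK.

```python
from typing import Iterable, List, Tuple, Union

# Either shape may come back from collocation_list(), depending on the
# installed NLTK version (assumption based on this thread).
Collocation = Union[str, Tuple[str, str]]


def to_collocation_strings(items: Iterable[Collocation]) -> List[str]:
    """Normalise collocation_list() output to 'w1 w2' strings."""
    result = []
    for item in items:
        if isinstance(item, str):
            result.append(item)  # already joined, e.g. "United States"
        else:
            w1, w2 = item  # a (w1, w2) pair
            result.append(w1 + " " + w2)
    return result


print("; ".join(to_collocation_strings(["United States", ("fellow", "citizens")])))
# United States; fellow citizens
```

With the book texts loaded, print("; ".join(to_collocation_strings(text4.collocation_list()))) lists the same collocations that collocations() would, without tokenwrap's line wrapping.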

Same here. Working through the nltk book gives error for collocations() whereas collocation_list() works.
