Nltk: Installing panlex_lite via nltk.download() appears to fail

Created on 22 Mar 2016  ·  32 Comments  ·  Source: nltk/nltk

Platform: Python 3.5 on Mac OS X 10.11.2
Steps to reproduce:

  1. $ python3
  2. >>> import nltk; nltk.download('all', halt_on_error=False)

Symptoms:
# Partial console log:
[nltk_data]    | Downloading package panlex_lite to
[nltk_data]    |     /Users/beng/nltk_data...
[nltk_data]    |   Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
    for msg in self.incr_download(info.children, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
    for msg in self._download_list(info_or_id, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
    for msg in self.incr_download(item, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument
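
The traceback ends in a single write call inside _unzip_iter, which suggests that a whole archive member is written at once. One possible explanation, which is an assumption and is not confirmed anywhere in this report, is that a single very large write is rejected on some platforms. A minimal sketch of extracting one member in fixed-size chunks with only the standard library could look like this; extract_member_chunked and chunk_size are hypothetical names and are not part of NLTK:

```python
import shutil
import zipfile
from pathlib import Path


def extract_member_chunked(zip_path, member, dest_dir, chunk_size=1024 * 1024):
    """Copy one archive member into dest_dir in chunk_size pieces.

    Hypothetical helper, not NLTK code: it never holds the whole member
    in memory and never passes it to a single write() call.
    Only use it on archives you trust, since member names are not sanitised.
    """
    target = Path(dest_dir) / member
    target.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as source, open(target, "wb") as outfile:
            # copyfileobj reads and writes at most chunk_size bytes per step
            shutil.copyfileobj(source, outfile, chunk_size)
    return target
```

Whether this would avoid the OSError above is untested here; it only illustrates the chunked-copy idea.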

@grayben - could you install the current version of NLTK and report whether you still have this problem?

@stevenbird, sorry for the late reply - you know how uni assignments can be!
I was having the problem on v3.2. I have just upgraded to v3.2.1 and I still have the same problem.

@grayben How did you install NLTK? Do you get an error when downloading a single corpus, e.g. nltk.download('brown')? Do you get an error when using Python 2.7?

@alvations

  1. I installed NLTK for python2 and python3 via pip and pip3, respectively.
  2. I do not get the error when downloading any single corpus other than panlex_lite.
  3. The error occurs with both Python 2.7 and Python 3.5.

Additional information: several of my colleagues have reported what looks like the same problem, although I cannot comment on their setups or on exactly what they did to run into it.

@grayben could you run the following lines of code and see whether you get the same output, [0, 448887900, 85839474]?

>>> import zipfile
>>> plzip = '/Users/beng/nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]

On the command line, outside Python, what is the output of the following?

$ ls -lah /Users/beng//nltk_data/corpora/
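
The CRC check above can be wrapped into a slightly more defensive helper. This is only a sketch using the standard zipfile module; inspect_zip is a hypothetical name. Note that testzip() reads every member, so it can take a long time on a multi-gigabyte archive:

```python
import zipfile


def inspect_zip(path):
    """Return (ok, details) describing whether path is a readable zip file.

    On success, details is the list of per-member CRC values, the same
    list the one-liner above prints. On failure, details is a message.
    """
    if not zipfile.is_zipfile(path):
        return False, "not a zip file (missing, or possibly a truncated download)"
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # reads each member and checks its CRC
        if bad_member is not None:
            return False, "CRC mismatch in member: " + bad_member
        return True, [info.CRC for info in archive.infolist()]
```

For example, inspect_zip('/Users/beng/nltk_data/corpora/panlex_lite.zip') would report a half-downloaded file as not being a zip file instead of raising BadZipFile.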

Your code -> my output:

>>> import zipfile
>>> plzip = ' /Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1009, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: ' /Users/beng//nltk_data/corpora/panlex_lite.zip'

I then changed ' /Users/beng//nltk_data/corpora/panlex_lite.zip' to '/Users/beng//nltk_data/corpora/panlex_lite.zip' (no space before the root):

>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Bens-MacBook-Pro:10K-Extractor beng$ ls -lah /Users/beng//nltk_data/corpora/
total 966608
drwxr-xr-x   152 beng  staff   5.0K 19 Apr 16:26 .
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:41 ..
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:32 abc
-rw-r--r--     1 beng  staff   1.4M  3 Mar 14:32 abc.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:32 alpino
-rw-r--r--     1 beng  staff   2.7M  3 Mar 14:32 alpino.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:32 biocreative_ppi
-rw-r--r--     1 beng  staff   218K  3 Mar 14:32 biocreative_ppi.zip
drwxr-xr-x   505 beng  staff    17K  3 Mar 14:32 brown
-rw-r--r--     1 beng  staff   3.2M  3 Mar 14:32 brown.zip
drwxr-xr-x   509 beng  staff    17K  3 Mar 14:32 brown_tei
-rw-r--r--     1 beng  staff   8.3M  3 Mar 14:32 brown_tei.zip
drwxr-xr-x  1389 beng  staff    46K  3 Mar 14:33 cess_cat
-rw-r--r--     1 beng  staff   5.1M  3 Mar 14:33 cess_cat.zip
drwxr-xr-x   612 beng  staff    20K  3 Mar 14:33 cess_esp
-rw-r--r--     1 beng  staff   2.1M  3 Mar 14:33 cess_esp.zip
drwxr-xr-x    10 beng  staff   340B  3 Mar 14:33 chat80
-rw-r--r--     1 beng  staff    19K  3 Mar 14:33 chat80.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:33 city_database
-rw-r--r--     1 beng  staff   1.7K  3 Mar 14:33 city_database.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:33 cmudict
-rw-r--r--     1 beng  staff   875K  3 Mar 14:33 cmudict.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:33 comparative_sentences
-rw-r--r--     1 beng  staff   273K  3 Mar 14:33 comparative_sentences.zip
-rw-r--r--     1 beng  staff    11M  3 Mar 14:33 comtrans.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:33 conll2000
-rw-r--r--     1 beng  staff   739K  3 Mar 14:33 conll2000.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:33 conll2002
-rw-r--r--     1 beng  staff   1.8M  3 Mar 14:33 conll2002.zip
-rw-r--r--     1 beng  staff   1.2M  3 Mar 14:33 conll2007.zip
drwxr-xr-x   453 beng  staff    15K  3 Mar 14:33 crubadan
-rw-r--r--     1 beng  staff   5.0M  3 Mar 14:33 crubadan.zip
drwxr-xr-x   201 beng  staff   6.7K  3 Mar 14:33 dependency_treebank
-rw-r--r--     1 beng  staff   447K  3 Mar 14:33 dependency_treebank.zip
drwxr-xr-x    14 beng  staff   476B  3 Mar 14:33 europarl_raw
-rw-r--r--     1 beng  staff    12M  3 Mar 14:33 europarl_raw.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:33 floresta
-rw-r--r--     1 beng  staff   1.8M  3 Mar 14:33 floresta.zip
drwxr-xr-x    16 beng  staff   544B  3 Mar 14:34 framenet_v15
-rw-r--r--     1 beng  staff    66M  3 Mar 14:33 framenet_v15.zip
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:34 gazetteers
-rw-r--r--     1 beng  staff   8.1K  3 Mar 14:34 gazetteers.zip
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:34 genesis
-rw-r--r--     1 beng  staff   462K  3 Mar 14:34 genesis.zip
drwxr-xr-x    21 beng  staff   714B  3 Mar 14:34 gutenberg
-rw-r--r--     1 beng  staff   4.1M  3 Mar 14:34 gutenberg.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:34 ieer
-rw-r--r--     1 beng  staff   162K  3 Mar 14:34 ieer.zip
drwxr-xr-x    59 beng  staff   2.0K  3 Mar 14:34 inaugural
-rw-r--r--     1 beng  staff   314K  3 Mar 14:34 inaugural.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:34 indian
-rw-r--r--     1 beng  staff   195K  3 Mar 14:34 indian.zip
-rw-r--r--     1 beng  staff    16M  3 Mar 14:34 jeita.zip
drwxr-xr-x    22 beng  staff   748B  3 Mar 14:34 kimmo
-rw-r--r--     1 beng  staff   183K  3 Mar 14:34 kimmo.zip
-rw-r--r--     1 beng  staff   8.4M  3 Mar 14:34 knbc.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 lin_thesaurus
-rw-r--r--     1 beng  staff    85M  3 Mar 14:34 lin_thesaurus.zip
drwxr-xr-x   112 beng  staff   3.7K  3 Mar 14:34 mac_morpho
-rw-r--r--     1 beng  staff   2.9M  3 Mar 14:34 mac_morpho.zip
-rw-r--r--     1 beng  staff   5.9M  3 Mar 14:34 machado.zip
-rw-r--r--     1 beng  staff   1.5M  3 Mar 14:34 masc_tagged.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 movie_reviews
-rw-r--r--     1 beng  staff   3.8M  3 Mar 14:34 movie_reviews.zip
drwxr-xr-x    56 beng  staff   1.9K  3 Mar 14:38 mte_teip5
-rw-r--r--     1 beng  staff    14M  3 Mar 14:38 mte_teip5.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 names
-rw-r--r--     1 beng  staff    21K  3 Mar 14:34 names.zip
-rw-r--r--     1 beng  staff   6.4M  3 Mar 14:35 nombank.1.0.zip
drwxr-xr-x    19 beng  staff   646B  3 Mar 14:35 nps_chat
-rw-r--r--     1 beng  staff   294K  3 Mar 14:35 nps_chat.zip
drwxr-xr-x    32 beng  staff   1.1K  3 Mar 14:35 omw
-rw-r--r--     1 beng  staff    11M  3 Mar 14:35 omw.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 opinion_lexicon
-rw-r--r--     1 beng  staff    24K  3 Mar 14:35 opinion_lexicon.zip
drwxr-xr-x     4 beng  staff   136B 21 Mar 17:54 panlex_lite
-rw-r--r--     1 beng  staff    58M 19 Apr 16:28 panlex_lite.zip
-rw-r--r--     1 beng  staff   2.6M  3 Mar 14:37 panlex_swadesh.zip
drwxr-xr-x    21 beng  staff   714B  3 Mar 14:35 paradigms
-rw-r--r--     1 beng  staff    24K  3 Mar 14:35 paradigms.zip
drwxr-xr-x   475 beng  staff    16K  3 Mar 14:35 pil
-rw-r--r--     1 beng  staff   1.4M  3 Mar 14:35 pil.zip
drwxr-xr-x    16 beng  staff   544B  3 Mar 14:35 pl196x
-rw-r--r--     1 beng  staff   6.7M  3 Mar 14:35 pl196x.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:35 ppattach
-rw-r--r--     1 beng  staff   763K  3 Mar 14:35 ppattach.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 problem_reports
-rw-r--r--     1 beng  staff   1.0M  3 Mar 14:35 problem_reports.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 product_reviews_1
-rw-r--r--     1 beng  staff   138K  3 Mar 14:35 product_reviews_1.zip
drwxr-xr-x    12 beng  staff   408B  3 Mar 14:35 product_reviews_2
-rw-r--r--     1 beng  staff   167K  3 Mar 14:35 product_reviews_2.zip
-rw-r--r--     1 beng  staff   5.1M  3 Mar 14:35 propbank.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 pros_cons
-rw-r--r--     1 beng  staff   729K  3 Mar 14:35 pros_cons.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:35 ptb
-rw-r--r--     1 beng  staff   6.1K  3 Mar 14:35 ptb.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 qc
-rw-r--r--     1 beng  staff   123K  3 Mar 14:35 qc.zip
-rw-r--r--     1 beng  staff   6.1M  3 Mar 14:35 reuters.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:35 rte
-rw-r--r--     1 beng  staff   377K  3 Mar 14:35 rte.zip
-rw-r--r--     1 beng  staff   4.2M  3 Mar 14:35 semcor.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:35 senseval
-rw-r--r--     1 beng  staff   2.1M  3 Mar 14:35 senseval.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 sentence_polarity
-rw-r--r--     1 beng  staff   479K  3 Mar 14:35 sentence_polarity.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:35 sentiwordnet
-rw-r--r--     1 beng  staff   4.5M  3 Mar 14:35 sentiwordnet.zip
drwxr-xr-x    13 beng  staff   442B  3 Mar 14:35 shakespeare
-rw-r--r--     1 beng  staff   464K  3 Mar 14:35 shakespeare.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 sinica_treebank
-rw-r--r--     1 beng  staff   878K  3 Mar 14:35 sinica_treebank.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:35 smultron
-rw-r--r--     1 beng  staff   162K  3 Mar 14:35 smultron.zip
drwxr-xr-x    68 beng  staff   2.3K  3 Mar 14:35 state_union
-rw-r--r--     1 beng  staff   790K  3 Mar 14:35 state_union.zip
drwxr-xr-x    17 beng  staff   578B  3 Mar 14:35 stopwords
-rw-r--r--     1 beng  staff   8.9K  3 Mar 14:35 stopwords.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 subjectivity
-rw-r--r--     1 beng  staff   509K  3 Mar 14:35 subjectivity.zip
drwxr-xr-x    27 beng  staff   918B  3 Mar 14:35 swadesh
-rw-r--r--     1 beng  staff    22K  3 Mar 14:35 swadesh.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 switchboard
-rw-r--r--     1 beng  staff   773K  3 Mar 14:35 switchboard.zip
drwxr-xr-x    39 beng  staff   1.3K  3 Mar 14:35 timit
-rw-r--r--     1 beng  staff    21M  3 Mar 14:35 timit.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 toolbox
-rw-r--r--     1 beng  staff   245K  3 Mar 14:35 toolbox.zip
drwxr-xr-x    12 beng  staff   408B  3 Mar 14:36 treebank
-rw-r--r--     1 beng  staff   1.6M  3 Mar 14:36 treebank.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:36 twitter_samples
-rw-r--r--     1 beng  staff    15M  3 Mar 14:36 twitter_samples.zip
drwxr-xr-x   337 beng  staff    11K  3 Mar 14:36 udhr
-rw-r--r--     1 beng  staff   1.1M  3 Mar 14:36 udhr.zip
drwxr-xr-x   390 beng  staff    13K  3 Mar 14:36 udhr2
-rw-r--r--     1 beng  staff   1.6M  3 Mar 14:36 udhr2.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:36 unicode_samples
-rw-r--r--     1 beng  staff   1.2K  3 Mar 14:36 unicode_samples.zip
-rw-r--r--     1 beng  staff    25M  3 Mar 14:36 universal_treebanks_v20.zip
drwxr-xr-x   242 beng  staff   8.0K  3 Mar 14:36 verbnet
-rw-r--r--     1 beng  staff   316K  3 Mar 14:36 verbnet.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:36 webtext
-rw-r--r--     1 beng  staff   631K  3 Mar 14:36 webtext.zip
drwxr-xr-x    20 beng  staff   680B  3 Mar 14:36 wordnet
-rw-r--r--     1 beng  staff    10M  3 Mar 14:36 wordnet.zip
drwxr-xr-x    30 beng  staff   1.0K  3 Mar 14:36 wordnet_ic
-rw-r--r--     1 beng  staff    11M  3 Mar 14:36 wordnet_ic.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:36 words
-rw-r--r--     1 beng  staff   740K  3 Mar 14:36 words.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:36 ycoe
-rw-r--r--     1 beng  staff   477B  3 Mar 14:36 ycoe.zip

This suggests that the file was corrupted during the download (possibly due to an interrupted Internet connection):

>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Go to /Users/beng//nltk_data/corpora/, delete the panlex_lite.zip file and download it again. Note that it can take 2 hours or more to download this zip file when the server is overloaded or your Internet connection is slow.

I did the following (three times):

  1. rm /Users/beng//nltk_data/corpora/panlex_lite.zip
  2. python3
  3. The following Python commands:
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data]     /Users/beng/nltk_data...
[nltk_data]   Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument
>>> 

However, also note the following command input/output:

>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]

Could you also run rm -rf /Users/beng//nltk_data/corpora/panlex_lite before starting python3?

i.e.:

$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]

I could not reproduce your OSError on Ubuntu 14.04 with Python 3.5.1:

alvas@ubi:~/nltk_data/corpora$ ls panlex_
panlex_lite.zip     panlex_swadesh.zip  
alvas@ubi:~/nltk_data/corpora$ cd
alvas@ubi:~$ python
Python 2.7.11 (default, Dec 15 2015, 16:46:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nltk.download('panlex_lite')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data]     /home/alvas/nltk_data...
[nltk_data]   Package panlex_lite is already up-to-date!
True
>>> exit()
alvas@ubi:~$ python3
Python 3.5.1 (default, Dec 18 2015, 00:00:00) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data]     /home/alvas/nltk_data...
[nltk_data]   Package panlex_lite is already up-to-date!
True

BTW, if you are not going to use panlex, the rest of NLTK will work fine without it =)

Bens-MacBook-Pro:work beng$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
Bens-MacBook-Pro:work beng$ ls -lah /Users/beng//nltk_data/corpora
total 4361152
drwxr-xr-x   151 beng  staff   5.0K 20 Apr 13:12 .
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:41 ..
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:32 abc
-rw-r--r--     1 beng  staff   1.4M  3 Mar 14:32 abc.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:32 alpino
-rw-r--r--     1 beng  staff   2.7M  3 Mar 14:32 alpino.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:32 biocreative_ppi
-rw-r--r--     1 beng  staff   218K  3 Mar 14:32 biocreative_ppi.zip
drwxr-xr-x   505 beng  staff    17K  3 Mar 14:32 brown
-rw-r--r--     1 beng  staff   3.2M  3 Mar 14:32 brown.zip
drwxr-xr-x   509 beng  staff    17K  3 Mar 14:32 brown_tei
-rw-r--r--     1 beng  staff   8.3M  3 Mar 14:32 brown_tei.zip
drwxr-xr-x  1389 beng  staff    46K  3 Mar 14:33 cess_cat
-rw-r--r--     1 beng  staff   5.1M  3 Mar 14:33 cess_cat.zip
drwxr-xr-x   612 beng  staff    20K  3 Mar 14:33 cess_esp
-rw-r--r--     1 beng  staff   2.1M  3 Mar 14:33 cess_esp.zip
drwxr-xr-x    10 beng  staff   340B  3 Mar 14:33 chat80
-rw-r--r--     1 beng  staff    19K  3 Mar 14:33 chat80.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:33 city_database
-rw-r--r--     1 beng  staff   1.7K  3 Mar 14:33 city_database.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:33 cmudict
-rw-r--r--     1 beng  staff   875K  3 Mar 14:33 cmudict.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:33 comparative_sentences
-rw-r--r--     1 beng  staff   273K  3 Mar 14:33 comparative_sentences.zip
-rw-r--r--     1 beng  staff    11M  3 Mar 14:33 comtrans.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:33 conll2000
-rw-r--r--     1 beng  staff   739K  3 Mar 14:33 conll2000.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:33 conll2002
-rw-r--r--     1 beng  staff   1.8M  3 Mar 14:33 conll2002.zip
-rw-r--r--     1 beng  staff   1.2M  3 Mar 14:33 conll2007.zip
drwxr-xr-x   453 beng  staff    15K  3 Mar 14:33 crubadan
-rw-r--r--     1 beng  staff   5.0M  3 Mar 14:33 crubadan.zip
drwxr-xr-x   201 beng  staff   6.7K  3 Mar 14:33 dependency_treebank
-rw-r--r--     1 beng  staff   447K  3 Mar 14:33 dependency_treebank.zip
drwxr-xr-x    14 beng  staff   476B  3 Mar 14:33 europarl_raw
-rw-r--r--     1 beng  staff    12M  3 Mar 14:33 europarl_raw.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:33 floresta
-rw-r--r--     1 beng  staff   1.8M  3 Mar 14:33 floresta.zip
drwxr-xr-x    16 beng  staff   544B  3 Mar 14:34 framenet_v15
-rw-r--r--     1 beng  staff    66M  3 Mar 14:33 framenet_v15.zip
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:34 gazetteers
-rw-r--r--     1 beng  staff   8.1K  3 Mar 14:34 gazetteers.zip
drwxr-xr-x    11 beng  staff   374B  3 Mar 14:34 genesis
-rw-r--r--     1 beng  staff   462K  3 Mar 14:34 genesis.zip
drwxr-xr-x    21 beng  staff   714B  3 Mar 14:34 gutenberg
-rw-r--r--     1 beng  staff   4.1M  3 Mar 14:34 gutenberg.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:34 ieer
-rw-r--r--     1 beng  staff   162K  3 Mar 14:34 ieer.zip
drwxr-xr-x    59 beng  staff   2.0K  3 Mar 14:34 inaugural
-rw-r--r--     1 beng  staff   314K  3 Mar 14:34 inaugural.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:34 indian
-rw-r--r--     1 beng  staff   195K  3 Mar 14:34 indian.zip
-rw-r--r--     1 beng  staff    16M  3 Mar 14:34 jeita.zip
drwxr-xr-x    22 beng  staff   748B  3 Mar 14:34 kimmo
-rw-r--r--     1 beng  staff   183K  3 Mar 14:34 kimmo.zip
-rw-r--r--     1 beng  staff   8.4M  3 Mar 14:34 knbc.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 lin_thesaurus
-rw-r--r--     1 beng  staff    85M  3 Mar 14:34 lin_thesaurus.zip
drwxr-xr-x   112 beng  staff   3.7K  3 Mar 14:34 mac_morpho
-rw-r--r--     1 beng  staff   2.9M  3 Mar 14:34 mac_morpho.zip
-rw-r--r--     1 beng  staff   5.9M  3 Mar 14:34 machado.zip
-rw-r--r--     1 beng  staff   1.5M  3 Mar 14:34 masc_tagged.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 movie_reviews
-rw-r--r--     1 beng  staff   3.8M  3 Mar 14:34 movie_reviews.zip
drwxr-xr-x    56 beng  staff   1.9K  3 Mar 14:38 mte_teip5
-rw-r--r--     1 beng  staff    14M  3 Mar 14:38 mte_teip5.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:34 names
-rw-r--r--     1 beng  staff    21K  3 Mar 14:34 names.zip
-rw-r--r--     1 beng  staff   6.4M  3 Mar 14:35 nombank.1.0.zip
drwxr-xr-x    19 beng  staff   646B  3 Mar 14:35 nps_chat
-rw-r--r--     1 beng  staff   294K  3 Mar 14:35 nps_chat.zip
drwxr-xr-x    32 beng  staff   1.1K  3 Mar 14:35 omw
-rw-r--r--     1 beng  staff    11M  3 Mar 14:35 omw.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 opinion_lexicon
-rw-r--r--     1 beng  staff    24K  3 Mar 14:35 opinion_lexicon.zip
-rw-r--r--     1 beng  staff   1.7G 20 Apr 12:46 panlex_lite.zip
-rw-r--r--     1 beng  staff   2.6M  3 Mar 14:37 panlex_swadesh.zip
drwxr-xr-x    21 beng  staff   714B  3 Mar 14:35 paradigms
-rw-r--r--     1 beng  staff    24K  3 Mar 14:35 paradigms.zip
drwxr-xr-x   475 beng  staff    16K  3 Mar 14:35 pil
-rw-r--r--     1 beng  staff   1.4M  3 Mar 14:35 pil.zip
drwxr-xr-x    16 beng  staff   544B  3 Mar 14:35 pl196x
-rw-r--r--     1 beng  staff   6.7M  3 Mar 14:35 pl196x.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:35 ppattach
-rw-r--r--     1 beng  staff   763K  3 Mar 14:35 ppattach.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 problem_reports
-rw-r--r--     1 beng  staff   1.0M  3 Mar 14:35 problem_reports.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 product_reviews_1
-rw-r--r--     1 beng  staff   138K  3 Mar 14:35 product_reviews_1.zip
drwxr-xr-x    12 beng  staff   408B  3 Mar 14:35 product_reviews_2
-rw-r--r--     1 beng  staff   167K  3 Mar 14:35 product_reviews_2.zip
-rw-r--r--     1 beng  staff   5.1M  3 Mar 14:35 propbank.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 pros_cons
-rw-r--r--     1 beng  staff   729K  3 Mar 14:35 pros_cons.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:35 ptb
-rw-r--r--     1 beng  staff   6.1K  3 Mar 14:35 ptb.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 qc
-rw-r--r--     1 beng  staff   123K  3 Mar 14:35 qc.zip
-rw-r--r--     1 beng  staff   6.1M  3 Mar 14:35 reuters.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:35 rte
-rw-r--r--     1 beng  staff   377K  3 Mar 14:35 rte.zip
-rw-r--r--     1 beng  staff   4.2M  3 Mar 14:35 semcor.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:35 senseval
-rw-r--r--     1 beng  staff   2.1M  3 Mar 14:35 senseval.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 sentence_polarity
-rw-r--r--     1 beng  staff   479K  3 Mar 14:35 sentence_polarity.zip
drwxr-xr-x     4 beng  staff   136B  3 Mar 14:35 sentiwordnet
-rw-r--r--     1 beng  staff   4.5M  3 Mar 14:35 sentiwordnet.zip
drwxr-xr-x    13 beng  staff   442B  3 Mar 14:35 shakespeare
-rw-r--r--     1 beng  staff   464K  3 Mar 14:35 shakespeare.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 sinica_treebank
-rw-r--r--     1 beng  staff   878K  3 Mar 14:35 sinica_treebank.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:35 smultron
-rw-r--r--     1 beng  staff   162K  3 Mar 14:35 smultron.zip
drwxr-xr-x    68 beng  staff   2.3K  3 Mar 14:35 state_union
-rw-r--r--     1 beng  staff   790K  3 Mar 14:35 state_union.zip
drwxr-xr-x    17 beng  staff   578B  3 Mar 14:35 stopwords
-rw-r--r--     1 beng  staff   8.9K  3 Mar 14:35 stopwords.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:35 subjectivity
-rw-r--r--     1 beng  staff   509K  3 Mar 14:35 subjectivity.zip
drwxr-xr-x    27 beng  staff   918B  3 Mar 14:35 swadesh
-rw-r--r--     1 beng  staff    22K  3 Mar 14:35 swadesh.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 switchboard
-rw-r--r--     1 beng  staff   773K  3 Mar 14:35 switchboard.zip
drwxr-xr-x    39 beng  staff   1.3K  3 Mar 14:35 timit
-rw-r--r--     1 beng  staff    21M  3 Mar 14:35 timit.zip
drwxr-xr-x     8 beng  staff   272B  3 Mar 14:35 toolbox
-rw-r--r--     1 beng  staff   245K  3 Mar 14:35 toolbox.zip
drwxr-xr-x    12 beng  staff   408B  3 Mar 14:36 treebank
-rw-r--r--     1 beng  staff   1.6M  3 Mar 14:36 treebank.zip
drwxr-xr-x     7 beng  staff   238B  3 Mar 14:36 twitter_samples
-rw-r--r--     1 beng  staff    15M  3 Mar 14:36 twitter_samples.zip
drwxr-xr-x   337 beng  staff    11K  3 Mar 14:36 udhr
-rw-r--r--     1 beng  staff   1.1M  3 Mar 14:36 udhr.zip
drwxr-xr-x   390 beng  staff    13K  3 Mar 14:36 udhr2
-rw-r--r--     1 beng  staff   1.6M  3 Mar 14:36 udhr2.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:36 unicode_samples
-rw-r--r--     1 beng  staff   1.2K  3 Mar 14:36 unicode_samples.zip
-rw-r--r--     1 beng  staff    25M  3 Mar 14:36 universal_treebanks_v20.zip
drwxr-xr-x   242 beng  staff   8.0K  3 Mar 14:36 verbnet
-rw-r--r--     1 beng  staff   316K  3 Mar 14:36 verbnet.zip
drwxr-xr-x     9 beng  staff   306B  3 Mar 14:36 webtext
-rw-r--r--     1 beng  staff   631K  3 Mar 14:36 webtext.zip
drwxr-xr-x    20 beng  staff   680B  3 Mar 14:36 wordnet
-rw-r--r--     1 beng  staff    10M  3 Mar 14:36 wordnet.zip
drwxr-xr-x    30 beng  staff   1.0K  3 Mar 14:36 wordnet_ic
-rw-r--r--     1 beng  staff    11M  3 Mar 14:36 wordnet_ic.zip
drwxr-xr-x     5 beng  staff   170B  3 Mar 14:36 words
-rw-r--r--     1 beng  staff   740K  3 Mar 14:36 words.zip
drwxr-xr-x     3 beng  staff   102B  3 Mar 14:36 ycoe
-rw-r--r--     1 beng  staff   477B  3 Mar 14:36 ycoe.zip
Bens-MacBook-Pro:work beng$ python3
Python 3.5.1 (default, Mar  3 2016, 14:25:53) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data]     /Users/beng/nltk_data...
[nltk_data]   Package panlex_lite is already up-to-date!
True

Also, through the downloader GUI, downloading "all" finally succeeds, with every entry marked as "installed".

Excellent! So there is no OSError now? It was the broken panlex_lite directory left over from previous downloads that caused the OSError. Once the infolist of the zip file is right, there should be no problem.

Have fun playing with NLTK! Tell your friends/colleagues to do the same too:

$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
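
The two rm steps above can also be done from Python before retrying the download. A minimal sketch, assuming the default layout of <data_dir>/corpora/<package>.zip plus an unpacked <data_dir>/corpora/<package> directory; remove_stale_package is a hypothetical helper, not an NLTK function:

```python
import shutil
from pathlib import Path


def remove_stale_package(data_dir, package, subdir="corpora"):
    """Delete <data_dir>/<subdir>/<package>.zip and the unpacked directory.

    Returns the list of paths that were actually removed, so that a
    second call on a clean directory returns an empty list.
    """
    base = Path(data_dir).expanduser() / subdir
    removed = []
    archive = base / (package + ".zip")
    if archive.is_file():
        archive.unlink()
        removed.append(archive)
    unpacked = base / package
    if unpacked.is_dir():
        shutil.rmtree(unpacked)
        removed.append(unpacked)
    return removed
```

After remove_stale_package('~/nltk_data', 'panlex_lite'), running nltk.download('panlex_lite') again starts from a clean state.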

Thanks!

I get exactly the same problem with the latest NLTK 3.2.1 on Ubuntu 16.04 (where it crashes my entire operating system), and on OS X I get the same errors as the OP. I am surprised that this issue was closed as if nothing were wrong.

When I try the workaround, it fails after this step, because it tries to extract the archive automatically right after downloading it: python -m nltk.downloader panlex_lite

[nltk_data] Downloading package panlex_lite to
[nltk_data]     /Users/houmie/nltk_data...
[nltk_data]   Unzipping corpora/panlex_lite.zip.

Traceback (most recent call last):
  File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 2268, in <module>
    halt_on_error=options.halt_on_error)
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument

Thanks

@houmie what is your output for:

$ rm /Users/houmie//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/houmie//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/houmie//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]

This has not been fixed: it happens with Python 2.7, 3.4.3 and 3.5.1. The panlex_lite download hangs for quite a while, and then the unzipping freezes the GUI and/or causes the OSError.

I had the same problem on my MacBook Pro (OS X El Capitan, Anaconda 1.4.0 + Python 3.5.2). I tried both NLTK 3.2.1 from "conda install nltk" and the master branch from GitHub via "sudo python3 setup.py install". The interesting part is that I never got the CRC [0, 448887900, 85839474] but always [0, 448887900, 84607019], after trying to download panlex_lite.zip more than 5 times. Any hint or clue?
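
One way to tell a different upstream file apart from a damaged download is to compare the local archive with the size and checksum attributes of the index entry quoted further down in this thread. The sketch below assumes that the checksum attribute is an MD5 hex digest (it has 32 hex characters, which is consistent with MD5, but that is an assumption here); file_md5_and_size and matches_index are hypothetical helper names:

```python
import hashlib
import os


def file_md5_and_size(path, chunk_size=1024 * 1024):
    """Return (md5_hexdigest, size_in_bytes) for the file at path.

    The file is read in chunks so that a multi-gigabyte archive does
    not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def matches_index(path, expected_md5, expected_size):
    """Compare a downloaded file with the values listed in the index."""
    actual_md5, actual_size = file_md5_and_size(path)
    return actual_md5 == expected_md5 and actual_size == expected_size
```

If the local file has a consistent size and digest across several downloads but does not match the index, the file on the server may simply have changed since the index entry was written.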

Unfortunately, they refuse to even acknowledge that the problem exists. I reported this in May 2016 and the problem still has not been acknowledged.

I just tried again through the GUI downloader and I still get this error message in the console:

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 655, in download
    self._interactive_download()
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 974, in _interactive_download
    DownloaderGUI(self).mainloop()
  File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 1709, in mainloop
    self.top.mainloop(*args, **kwargs)
  File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/tkinter/__init__.py", line 1131, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
>>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte

This is a huge pain for me, because I have to go through the code and remove every reference to PanLex to get the packages to work.

Hi, same here. I hope that if enough people report it, it will get fixed at some point...

OK, here's what I did:

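import nltk

# attempt: drop panlex_lite from the downloader's internal package table,
# then open the interactive downloader
d = nltk.downloader.Downloader()
d._packages.pop('panlex_lite')
d.download()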

# error message
d._packages.pop('panlex_lite')
/usr/local/lib/python3.5/site-packages/nltk/downloader.py in info(self, id)
    876         if id in self._packages: return self._packages[id]
    877         if id in self._collections: return self._collections[id]
--> 878         raise ValueError('Package %r not found in index' % id)
    879
    880     def xmlinfo(self, id):

I guess we could add something like if id != 'panlex_lite' to the code...
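
The same idea can be sketched from the outside, without patching downloader.py: list the package ids, leave out panlex_lite, and download the rest one at a time. This is only an illustration, not how NLTK itself handles it. The names `ids_except` and `download_all_but` are hypothetical, and it assumes `Downloader.packages()` yields objects with an `id` attribute, as in the NLTK versions discussed in this thread.

```python
def ids_except(all_ids, skip=("panlex_lite",)):
    """Return the package ids in their original order, minus the skipped ones."""
    skipped = set(skip)
    return [pkg_id for pkg_id in all_ids if pkg_id not in skipped]


def download_all_but(skip=("panlex_lite",)):
    """Download every indexed package except those in `skip` (needs network)."""
    # imported here so ids_except() can be used without NLTK installed
    from nltk.downloader import Downloader

    downloader = Downloader()
    all_ids = [pkg.id for pkg in downloader.packages()]
    for pkg_id in ids_except(all_ids, skip):
        downloader.download(pkg_id)


if __name__ == "__main__":
    download_all_but()
```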

But as for me, the easiest way is like this:

Aaaaaaand.... Done downloading collection all! 🎉🎉🎉🎉

@demidovakatya

I would like to check my understanding of what you mentioned,

which means

<package author="David Kamholz" checksum="e13211688738201c0a5bd5b2f50e94ab" id="panlex_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2202492316" subdir="corpora" unzip="1" unzipped_size="5778483185" url="http://dev.panlex.org/db/panlex_lite.zip" webpage="http://panlex.org/" />
<package author="Jonathan Pool (editor)" checksum="59a08f6c19d1d6d72cc03189983c8045" id="panlex_swadesh" license="CC0 1.0 Universal" name="PanLex Swadesh Corpora" size="2699578" subdir="corpora" unzip="0" unzipped_size="4103346" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/panlex_swadesh.zip" webpage="http://panlex.org/" />

=>

<package author="David Kamholz" checksum="e13211688738201c0a5bd5b2f50e94ab" id="_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2202492316" subdir="corpora" unzip="1" unzipped_size="5778483185" url="http://dev.panlex.org/db/panlex_lite.zip" webpage="http://panlex.org/" />
<package author="Jonathan Pool (editor)" checksum="59a08f6c19d1d6d72cc03189983c8045" id="_swadesh" license="CC0 1.0 Universal" name="PanLex Swadesh Corpora" size="2699578" subdir="corpora" unzip="0" unzipped_size="4103346" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/panlex_swadesh.zip" webpage="http://panlex.org/" />

@demidovakatya,
Thank you. I ran into the same problem.

Downloading panlex_lite should work fine now.

It is not working again.

I don't have the bandwidth to test this. Our nltk_data page points to the April 1 version, which was not changed when the May 1 version was added recently.

@kamholz: would you mind running the following to check that it still works, please? python -m nltk.downloader panlex_lite

Sorry this keeps happening. It is hard to debug, because I often cannot reproduce the reported errors. In this case, when I run python -m nltk.downloader panlex_lite, it reports no errors and unzips. However, the MD5 sum in https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml is incorrect. I don't know how that happened, since the file has not changed. The entry should be as follows:

    <package author="David Kamholz" checksum="3156099b9acb623725d63c727fd8591d" id="panlex_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2357864277" subdir="corpora" unzip="1" unzipped_size="5993562112" url="https://db.panlex.org/panlex_lite-20170401.zip" webpage="http://panlex.org/" />

I also updated the URL above (although that should not have made a difference for this problem, since the old one redirects) and the sizes.
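
For anyone who wants to compare a downloaded file against the checksum attribute of the index entry, here is a minimal sketch using only `hashlib`. The helper names `md5_of_file` and `matches_index` are hypothetical, and it assumes the checksum attribute is a plain MD5 hex digest of the zip file, as described above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks so that a
    multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_index(path, expected_checksum):
    """True if the file's MD5 equals the checksum listed in the index entry."""
    return md5_of_file(path) == expected_checksum.strip().lower()
```

Calling `matches_index` with the path of the downloaded panlex_lite.zip and the checksum from the entry above would show whether the local file and the index agree.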

Thanks for this, @kamholz. I have pushed a corrected index file using these checksums.
@clockwiser, please try again and let us know how you get on?

@sokhnavor this is caused by #1787

@alvations, thanks! I see:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
I used the Windows command prompt and it does not work: 'wget' is not recognized as an internal or external command. I am very new to the command line and to the Windows flavor of it. Is there any workaround to make this work from this command prompt? I would really appreciate it.
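
One possible workaround, sketched here under assumptions and not tested against the real archive: the same three steps (download, unzip, move) can be done from Python's standard library, so that neither wget nor unzip is needed. The helper names `fetch` and `unpack_to` and the staging-folder naming are hypothetical. The top-level folder name nltk_data-gh-pages is taken from the mv command above.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

NLTK_DATA_ZIP_URL = "https://github.com/nltk/nltk_data/archive/gh-pages.zip"


def fetch(url, dest):
    """Download `url` to the file `dest` (stands in for wget)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def unpack_to(zip_path, target_dir, top_folder="nltk_data-gh-pages"):
    """Extract the archive next to `target_dir`, then move its top-level
    folder to `target_dir` (stands in for unzip followed by mv)."""
    target = Path(target_dir)
    staging = target.parent / (target.name + "_staging")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(staging)
    shutil.move(str(staging / top_folder), str(target))
    shutil.rmtree(staging)
    return target


if __name__ == "__main__":
    # the target path is an example; use your own nltk_data location
    archive = fetch(NLTK_DATA_ZIP_URL, "gh-pages.zip")
    unpack_to(archive, Path.home() / "nltk_data")
```

As with mv, if the target folder already exists the extracted folder ends up inside it instead of replacing it.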
