Platform: Python 3.5 on Mac OS X 10.11.2
Steps to reproduce:
Symptoms:
# Partial console output:
[nltk_data] | Downloading package panlex_lite to
[nltk_data] | /Users/beng/nltk_data...
[nltk_data] | Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
for msg in self.incr_download(info.children, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
for msg in self._download_list(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
for msg in self.incr_download(item, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
@grayben - please install the current version of NLTK and report back if you still have this problem.
@stevenbird sorry for my delay in replying - you know how uni assignments are!
I hit the problem on v3.2. I have just upgraded to v3.2.1 and I get the same problem.
@grayben How did you install NLTK? Do you get the error when downloading a single corpus, e.g. nltk.download('brown')? Do you get the error when using Python 2.7?
@alvations
Additional information: a number of my classmates have reported what appears to be the same problem, although I cannot comment on their configuration or on what they did to run into it.
@grayben could you run the following lines of code and see whether you get the output [0, 448887900, 85839474]?
>>> import zipfile
>>> plzip = '/Users/beng/nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
On the command line, outside of Python, what is the output of the following?
$ ls -lah /Users/beng//nltk_data/corpora/
Your code -> my output:
>>> import zipfile
>>> plzip = ' /Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1009, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: ' /Users/beng//nltk_data/corpora/panlex_lite.zip'
I then changed ' /Users/beng//nltk_data/corpora/panlex_lite.zip' to '/Users/beng//nltk_data/corpora/panlex_lite.zip' (without the space before the root):
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
self._RealGetContents()
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Bens-MacBook-Pro:10K-Extractor beng$ ls -lah /Users/beng//nltk_data/corpora/
total 966608
drwxr-xr-x 152 beng staff 5.0K 19 Apr 16:26 .
drwxr-xr-x 11 beng staff 374B 3 Mar 14:41 ..
drwxr-xr-x 5 beng staff 170B 3 Mar 14:32 abc
-rw-r--r-- 1 beng staff 1.4M 3 Mar 14:32 abc.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:32 alpino
-rw-r--r-- 1 beng staff 2.7M 3 Mar 14:32 alpino.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:32 biocreative_ppi
-rw-r--r-- 1 beng staff 218K 3 Mar 14:32 biocreative_ppi.zip
drwxr-xr-x 505 beng staff 17K 3 Mar 14:32 brown
-rw-r--r-- 1 beng staff 3.2M 3 Mar 14:32 brown.zip
drwxr-xr-x 509 beng staff 17K 3 Mar 14:32 brown_tei
-rw-r--r-- 1 beng staff 8.3M 3 Mar 14:32 brown_tei.zip
drwxr-xr-x 1389 beng staff 46K 3 Mar 14:33 cess_cat
-rw-r--r-- 1 beng staff 5.1M 3 Mar 14:33 cess_cat.zip
drwxr-xr-x 612 beng staff 20K 3 Mar 14:33 cess_esp
-rw-r--r-- 1 beng staff 2.1M 3 Mar 14:33 cess_esp.zip
drwxr-xr-x 10 beng staff 340B 3 Mar 14:33 chat80
-rw-r--r-- 1 beng staff 19K 3 Mar 14:33 chat80.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:33 city_database
-rw-r--r-- 1 beng staff 1.7K 3 Mar 14:33 city_database.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:33 cmudict
-rw-r--r-- 1 beng staff 875K 3 Mar 14:33 cmudict.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:33 comparative_sentences
-rw-r--r-- 1 beng staff 273K 3 Mar 14:33 comparative_sentences.zip
-rw-r--r-- 1 beng staff 11M 3 Mar 14:33 comtrans.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:33 conll2000
-rw-r--r-- 1 beng staff 739K 3 Mar 14:33 conll2000.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:33 conll2002
-rw-r--r-- 1 beng staff 1.8M 3 Mar 14:33 conll2002.zip
-rw-r--r-- 1 beng staff 1.2M 3 Mar 14:33 conll2007.zip
drwxr-xr-x 453 beng staff 15K 3 Mar 14:33 crubadan
-rw-r--r-- 1 beng staff 5.0M 3 Mar 14:33 crubadan.zip
drwxr-xr-x 201 beng staff 6.7K 3 Mar 14:33 dependency_treebank
-rw-r--r-- 1 beng staff 447K 3 Mar 14:33 dependency_treebank.zip
drwxr-xr-x 14 beng staff 476B 3 Mar 14:33 europarl_raw
-rw-r--r-- 1 beng staff 12M 3 Mar 14:33 europarl_raw.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:33 floresta
-rw-r--r-- 1 beng staff 1.8M 3 Mar 14:33 floresta.zip
drwxr-xr-x 16 beng staff 544B 3 Mar 14:34 framenet_v15
-rw-r--r-- 1 beng staff 66M 3 Mar 14:33 framenet_v15.zip
drwxr-xr-x 11 beng staff 374B 3 Mar 14:34 gazetteers
-rw-r--r-- 1 beng staff 8.1K 3 Mar 14:34 gazetteers.zip
drwxr-xr-x 11 beng staff 374B 3 Mar 14:34 genesis
-rw-r--r-- 1 beng staff 462K 3 Mar 14:34 genesis.zip
drwxr-xr-x 21 beng staff 714B 3 Mar 14:34 gutenberg
-rw-r--r-- 1 beng staff 4.1M 3 Mar 14:34 gutenberg.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:34 ieer
-rw-r--r-- 1 beng staff 162K 3 Mar 14:34 ieer.zip
drwxr-xr-x 59 beng staff 2.0K 3 Mar 14:34 inaugural
-rw-r--r-- 1 beng staff 314K 3 Mar 14:34 inaugural.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:34 indian
-rw-r--r-- 1 beng staff 195K 3 Mar 14:34 indian.zip
-rw-r--r-- 1 beng staff 16M 3 Mar 14:34 jeita.zip
drwxr-xr-x 22 beng staff 748B 3 Mar 14:34 kimmo
-rw-r--r-- 1 beng staff 183K 3 Mar 14:34 kimmo.zip
-rw-r--r-- 1 beng staff 8.4M 3 Mar 14:34 knbc.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 lin_thesaurus
-rw-r--r-- 1 beng staff 85M 3 Mar 14:34 lin_thesaurus.zip
drwxr-xr-x 112 beng staff 3.7K 3 Mar 14:34 mac_morpho
-rw-r--r-- 1 beng staff 2.9M 3 Mar 14:34 mac_morpho.zip
-rw-r--r-- 1 beng staff 5.9M 3 Mar 14:34 machado.zip
-rw-r--r-- 1 beng staff 1.5M 3 Mar 14:34 masc_tagged.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 movie_reviews
-rw-r--r-- 1 beng staff 3.8M 3 Mar 14:34 movie_reviews.zip
drwxr-xr-x 56 beng staff 1.9K 3 Mar 14:38 mte_teip5
-rw-r--r-- 1 beng staff 14M 3 Mar 14:38 mte_teip5.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 names
-rw-r--r-- 1 beng staff 21K 3 Mar 14:34 names.zip
-rw-r--r-- 1 beng staff 6.4M 3 Mar 14:35 nombank.1.0.zip
drwxr-xr-x 19 beng staff 646B 3 Mar 14:35 nps_chat
-rw-r--r-- 1 beng staff 294K 3 Mar 14:35 nps_chat.zip
drwxr-xr-x 32 beng staff 1.1K 3 Mar 14:35 omw
-rw-r--r-- 1 beng staff 11M 3 Mar 14:35 omw.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 opinion_lexicon
-rw-r--r-- 1 beng staff 24K 3 Mar 14:35 opinion_lexicon.zip
drwxr-xr-x 4 beng staff 136B 21 Mar 17:54 panlex_lite
-rw-r--r-- 1 beng staff 58M 19 Apr 16:28 panlex_lite.zip
-rw-r--r-- 1 beng staff 2.6M 3 Mar 14:37 panlex_swadesh.zip
drwxr-xr-x 21 beng staff 714B 3 Mar 14:35 paradigms
-rw-r--r-- 1 beng staff 24K 3 Mar 14:35 paradigms.zip
drwxr-xr-x 475 beng staff 16K 3 Mar 14:35 pil
-rw-r--r-- 1 beng staff 1.4M 3 Mar 14:35 pil.zip
drwxr-xr-x 16 beng staff 544B 3 Mar 14:35 pl196x
-rw-r--r-- 1 beng staff 6.7M 3 Mar 14:35 pl196x.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:35 ppattach
-rw-r--r-- 1 beng staff 763K 3 Mar 14:35 ppattach.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 problem_reports
-rw-r--r-- 1 beng staff 1.0M 3 Mar 14:35 problem_reports.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 product_reviews_1
-rw-r--r-- 1 beng staff 138K 3 Mar 14:35 product_reviews_1.zip
drwxr-xr-x 12 beng staff 408B 3 Mar 14:35 product_reviews_2
-rw-r--r-- 1 beng staff 167K 3 Mar 14:35 product_reviews_2.zip
-rw-r--r-- 1 beng staff 5.1M 3 Mar 14:35 propbank.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 pros_cons
-rw-r--r-- 1 beng staff 729K 3 Mar 14:35 pros_cons.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:35 ptb
-rw-r--r-- 1 beng staff 6.1K 3 Mar 14:35 ptb.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 qc
-rw-r--r-- 1 beng staff 123K 3 Mar 14:35 qc.zip
-rw-r--r-- 1 beng staff 6.1M 3 Mar 14:35 reuters.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:35 rte
-rw-r--r-- 1 beng staff 377K 3 Mar 14:35 rte.zip
-rw-r--r-- 1 beng staff 4.2M 3 Mar 14:35 semcor.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:35 senseval
-rw-r--r-- 1 beng staff 2.1M 3 Mar 14:35 senseval.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 sentence_polarity
-rw-r--r-- 1 beng staff 479K 3 Mar 14:35 sentence_polarity.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:35 sentiwordnet
-rw-r--r-- 1 beng staff 4.5M 3 Mar 14:35 sentiwordnet.zip
drwxr-xr-x 13 beng staff 442B 3 Mar 14:35 shakespeare
-rw-r--r-- 1 beng staff 464K 3 Mar 14:35 shakespeare.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 sinica_treebank
-rw-r--r-- 1 beng staff 878K 3 Mar 14:35 sinica_treebank.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:35 smultron
-rw-r--r-- 1 beng staff 162K 3 Mar 14:35 smultron.zip
drwxr-xr-x 68 beng staff 2.3K 3 Mar 14:35 state_union
-rw-r--r-- 1 beng staff 790K 3 Mar 14:35 state_union.zip
drwxr-xr-x 17 beng staff 578B 3 Mar 14:35 stopwords
-rw-r--r-- 1 beng staff 8.9K 3 Mar 14:35 stopwords.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 subjectivity
-rw-r--r-- 1 beng staff 509K 3 Mar 14:35 subjectivity.zip
drwxr-xr-x 27 beng staff 918B 3 Mar 14:35 swadesh
-rw-r--r-- 1 beng staff 22K 3 Mar 14:35 swadesh.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 switchboard
-rw-r--r-- 1 beng staff 773K 3 Mar 14:35 switchboard.zip
drwxr-xr-x 39 beng staff 1.3K 3 Mar 14:35 timit
-rw-r--r-- 1 beng staff 21M 3 Mar 14:35 timit.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 toolbox
-rw-r--r-- 1 beng staff 245K 3 Mar 14:35 toolbox.zip
drwxr-xr-x 12 beng staff 408B 3 Mar 14:36 treebank
-rw-r--r-- 1 beng staff 1.6M 3 Mar 14:36 treebank.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:36 twitter_samples
-rw-r--r-- 1 beng staff 15M 3 Mar 14:36 twitter_samples.zip
drwxr-xr-x 337 beng staff 11K 3 Mar 14:36 udhr
-rw-r--r-- 1 beng staff 1.1M 3 Mar 14:36 udhr.zip
drwxr-xr-x 390 beng staff 13K 3 Mar 14:36 udhr2
-rw-r--r-- 1 beng staff 1.6M 3 Mar 14:36 udhr2.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:36 unicode_samples
-rw-r--r-- 1 beng staff 1.2K 3 Mar 14:36 unicode_samples.zip
-rw-r--r-- 1 beng staff 25M 3 Mar 14:36 universal_treebanks_v20.zip
drwxr-xr-x 242 beng staff 8.0K 3 Mar 14:36 verbnet
-rw-r--r-- 1 beng staff 316K 3 Mar 14:36 verbnet.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:36 webtext
-rw-r--r-- 1 beng staff 631K 3 Mar 14:36 webtext.zip
drwxr-xr-x 20 beng staff 680B 3 Mar 14:36 wordnet
-rw-r--r-- 1 beng staff 10M 3 Mar 14:36 wordnet.zip
drwxr-xr-x 30 beng staff 1.0K 3 Mar 14:36 wordnet_ic
-rw-r--r-- 1 beng staff 11M 3 Mar 14:36 wordnet_ic.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:36 words
-rw-r--r-- 1 beng staff 740K 3 Mar 14:36 words.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:36 ycoe
-rw-r--r-- 1 beng staff 477B 3 Mar 14:36 ycoe.zip
This shows that the file was corrupted while downloading (possibly because the internet connection dropped):
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1026, in __init__
self._RealGetContents()
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Go to /Users/beng//nltk_data/corpora/, delete the panlex_lite.zip file, and then download it again. Note that the zip file may take two hours or more to download when the server is overloaded or your internet connection is slow.
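Before deleting and re-downloading, the check used above can be wrapped in a small helper that tells a truncated download apart from a readable archive. This is only a sketch: the name member_crcs is made up for illustration, and the helper reports the CRC values recorded in the archive rather than re-verifying the data.

```python
import zipfile


def member_crcs(zip_path):
    """Return the CRC of every archive member, or None if the file is not a readable zip."""
    # is_zipfile() returns False for missing, truncated, or otherwise unreadable archives.
    if not zipfile.is_zipfile(zip_path):
        return None
    with zipfile.ZipFile(zip_path) as archive:
        return [info.CRC for info in archive.infolist()]
```

With the path from this thread, member_crcs('/Users/beng/nltk_data/corpora/panlex_lite.zip') returning None would correspond to the BadZipFile case shown above.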
I did the following (three times):
rm /Users/beng//nltk_data/corpora/panlex_lite.zip
python3
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /Users/beng/nltk_data...
[nltk_data] Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/usr/local/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
>>>
However, please also note the input/output of the following commands:
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
Could you also run rm -rf /Users/beng//nltk_data/corpora/panlex_lite before starting python3?
i.e.:
$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
I cannot reproduce the OSError on Ubuntu 14.04 with Python 3.5.1:
alvas@ubi:~/nltk_data/corpora$ ls panlex_
panlex_lite.zip panlex_swadesh.zip
alvas@ubi:~/nltk_data/corpora$ cd
alvas@ubi:~$ python
Python 2.7.11 (default, Dec 15 2015, 16:46:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nltk.download('panlex_lite')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'nltk' is not defined
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /home/alvas/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
>>> exit()
alvas@ubi:~$ python3
Python 3.5.1 (default, Dec 18 2015, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /home/alvas/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
BTW, if you are not going to use panlex, the rest of NLTK will work fine without it =)
Bens-MacBook-Pro:work beng$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
Bens-MacBook-Pro:work beng$ ls -lah /Users/beng//nltk_data/corpora
total 4361152
drwxr-xr-x 151 beng staff 5.0K 20 Apr 13:12 .
drwxr-xr-x 11 beng staff 374B 3 Mar 14:41 ..
drwxr-xr-x 5 beng staff 170B 3 Mar 14:32 abc
-rw-r--r-- 1 beng staff 1.4M 3 Mar 14:32 abc.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:32 alpino
-rw-r--r-- 1 beng staff 2.7M 3 Mar 14:32 alpino.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:32 biocreative_ppi
-rw-r--r-- 1 beng staff 218K 3 Mar 14:32 biocreative_ppi.zip
drwxr-xr-x 505 beng staff 17K 3 Mar 14:32 brown
-rw-r--r-- 1 beng staff 3.2M 3 Mar 14:32 brown.zip
drwxr-xr-x 509 beng staff 17K 3 Mar 14:32 brown_tei
-rw-r--r-- 1 beng staff 8.3M 3 Mar 14:32 brown_tei.zip
drwxr-xr-x 1389 beng staff 46K 3 Mar 14:33 cess_cat
-rw-r--r-- 1 beng staff 5.1M 3 Mar 14:33 cess_cat.zip
drwxr-xr-x 612 beng staff 20K 3 Mar 14:33 cess_esp
-rw-r--r-- 1 beng staff 2.1M 3 Mar 14:33 cess_esp.zip
drwxr-xr-x 10 beng staff 340B 3 Mar 14:33 chat80
-rw-r--r-- 1 beng staff 19K 3 Mar 14:33 chat80.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:33 city_database
-rw-r--r-- 1 beng staff 1.7K 3 Mar 14:33 city_database.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:33 cmudict
-rw-r--r-- 1 beng staff 875K 3 Mar 14:33 cmudict.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:33 comparative_sentences
-rw-r--r-- 1 beng staff 273K 3 Mar 14:33 comparative_sentences.zip
-rw-r--r-- 1 beng staff 11M 3 Mar 14:33 comtrans.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:33 conll2000
-rw-r--r-- 1 beng staff 739K 3 Mar 14:33 conll2000.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:33 conll2002
-rw-r--r-- 1 beng staff 1.8M 3 Mar 14:33 conll2002.zip
-rw-r--r-- 1 beng staff 1.2M 3 Mar 14:33 conll2007.zip
drwxr-xr-x 453 beng staff 15K 3 Mar 14:33 crubadan
-rw-r--r-- 1 beng staff 5.0M 3 Mar 14:33 crubadan.zip
drwxr-xr-x 201 beng staff 6.7K 3 Mar 14:33 dependency_treebank
-rw-r--r-- 1 beng staff 447K 3 Mar 14:33 dependency_treebank.zip
drwxr-xr-x 14 beng staff 476B 3 Mar 14:33 europarl_raw
-rw-r--r-- 1 beng staff 12M 3 Mar 14:33 europarl_raw.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:33 floresta
-rw-r--r-- 1 beng staff 1.8M 3 Mar 14:33 floresta.zip
drwxr-xr-x 16 beng staff 544B 3 Mar 14:34 framenet_v15
-rw-r--r-- 1 beng staff 66M 3 Mar 14:33 framenet_v15.zip
drwxr-xr-x 11 beng staff 374B 3 Mar 14:34 gazetteers
-rw-r--r-- 1 beng staff 8.1K 3 Mar 14:34 gazetteers.zip
drwxr-xr-x 11 beng staff 374B 3 Mar 14:34 genesis
-rw-r--r-- 1 beng staff 462K 3 Mar 14:34 genesis.zip
drwxr-xr-x 21 beng staff 714B 3 Mar 14:34 gutenberg
-rw-r--r-- 1 beng staff 4.1M 3 Mar 14:34 gutenberg.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:34 ieer
-rw-r--r-- 1 beng staff 162K 3 Mar 14:34 ieer.zip
drwxr-xr-x 59 beng staff 2.0K 3 Mar 14:34 inaugural
-rw-r--r-- 1 beng staff 314K 3 Mar 14:34 inaugural.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:34 indian
-rw-r--r-- 1 beng staff 195K 3 Mar 14:34 indian.zip
-rw-r--r-- 1 beng staff 16M 3 Mar 14:34 jeita.zip
drwxr-xr-x 22 beng staff 748B 3 Mar 14:34 kimmo
-rw-r--r-- 1 beng staff 183K 3 Mar 14:34 kimmo.zip
-rw-r--r-- 1 beng staff 8.4M 3 Mar 14:34 knbc.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 lin_thesaurus
-rw-r--r-- 1 beng staff 85M 3 Mar 14:34 lin_thesaurus.zip
drwxr-xr-x 112 beng staff 3.7K 3 Mar 14:34 mac_morpho
-rw-r--r-- 1 beng staff 2.9M 3 Mar 14:34 mac_morpho.zip
-rw-r--r-- 1 beng staff 5.9M 3 Mar 14:34 machado.zip
-rw-r--r-- 1 beng staff 1.5M 3 Mar 14:34 masc_tagged.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 movie_reviews
-rw-r--r-- 1 beng staff 3.8M 3 Mar 14:34 movie_reviews.zip
drwxr-xr-x 56 beng staff 1.9K 3 Mar 14:38 mte_teip5
-rw-r--r-- 1 beng staff 14M 3 Mar 14:38 mte_teip5.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:34 names
-rw-r--r-- 1 beng staff 21K 3 Mar 14:34 names.zip
-rw-r--r-- 1 beng staff 6.4M 3 Mar 14:35 nombank.1.0.zip
drwxr-xr-x 19 beng staff 646B 3 Mar 14:35 nps_chat
-rw-r--r-- 1 beng staff 294K 3 Mar 14:35 nps_chat.zip
drwxr-xr-x 32 beng staff 1.1K 3 Mar 14:35 omw
-rw-r--r-- 1 beng staff 11M 3 Mar 14:35 omw.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 opinion_lexicon
-rw-r--r-- 1 beng staff 24K 3 Mar 14:35 opinion_lexicon.zip
-rw-r--r-- 1 beng staff 1.7G 20 Apr 12:46 panlex_lite.zip
-rw-r--r-- 1 beng staff 2.6M 3 Mar 14:37 panlex_swadesh.zip
drwxr-xr-x 21 beng staff 714B 3 Mar 14:35 paradigms
-rw-r--r-- 1 beng staff 24K 3 Mar 14:35 paradigms.zip
drwxr-xr-x 475 beng staff 16K 3 Mar 14:35 pil
-rw-r--r-- 1 beng staff 1.4M 3 Mar 14:35 pil.zip
drwxr-xr-x 16 beng staff 544B 3 Mar 14:35 pl196x
-rw-r--r-- 1 beng staff 6.7M 3 Mar 14:35 pl196x.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:35 ppattach
-rw-r--r-- 1 beng staff 763K 3 Mar 14:35 ppattach.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 problem_reports
-rw-r--r-- 1 beng staff 1.0M 3 Mar 14:35 problem_reports.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 product_reviews_1
-rw-r--r-- 1 beng staff 138K 3 Mar 14:35 product_reviews_1.zip
drwxr-xr-x 12 beng staff 408B 3 Mar 14:35 product_reviews_2
-rw-r--r-- 1 beng staff 167K 3 Mar 14:35 product_reviews_2.zip
-rw-r--r-- 1 beng staff 5.1M 3 Mar 14:35 propbank.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 pros_cons
-rw-r--r-- 1 beng staff 729K 3 Mar 14:35 pros_cons.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:35 ptb
-rw-r--r-- 1 beng staff 6.1K 3 Mar 14:35 ptb.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 qc
-rw-r--r-- 1 beng staff 123K 3 Mar 14:35 qc.zip
-rw-r--r-- 1 beng staff 6.1M 3 Mar 14:35 reuters.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:35 rte
-rw-r--r-- 1 beng staff 377K 3 Mar 14:35 rte.zip
-rw-r--r-- 1 beng staff 4.2M 3 Mar 14:35 semcor.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:35 senseval
-rw-r--r-- 1 beng staff 2.1M 3 Mar 14:35 senseval.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 sentence_polarity
-rw-r--r-- 1 beng staff 479K 3 Mar 14:35 sentence_polarity.zip
drwxr-xr-x 4 beng staff 136B 3 Mar 14:35 sentiwordnet
-rw-r--r-- 1 beng staff 4.5M 3 Mar 14:35 sentiwordnet.zip
drwxr-xr-x 13 beng staff 442B 3 Mar 14:35 shakespeare
-rw-r--r-- 1 beng staff 464K 3 Mar 14:35 shakespeare.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 sinica_treebank
-rw-r--r-- 1 beng staff 878K 3 Mar 14:35 sinica_treebank.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:35 smultron
-rw-r--r-- 1 beng staff 162K 3 Mar 14:35 smultron.zip
drwxr-xr-x 68 beng staff 2.3K 3 Mar 14:35 state_union
-rw-r--r-- 1 beng staff 790K 3 Mar 14:35 state_union.zip
drwxr-xr-x 17 beng staff 578B 3 Mar 14:35 stopwords
-rw-r--r-- 1 beng staff 8.9K 3 Mar 14:35 stopwords.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:35 subjectivity
-rw-r--r-- 1 beng staff 509K 3 Mar 14:35 subjectivity.zip
drwxr-xr-x 27 beng staff 918B 3 Mar 14:35 swadesh
-rw-r--r-- 1 beng staff 22K 3 Mar 14:35 swadesh.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 switchboard
-rw-r--r-- 1 beng staff 773K 3 Mar 14:35 switchboard.zip
drwxr-xr-x 39 beng staff 1.3K 3 Mar 14:35 timit
-rw-r--r-- 1 beng staff 21M 3 Mar 14:35 timit.zip
drwxr-xr-x 8 beng staff 272B 3 Mar 14:35 toolbox
-rw-r--r-- 1 beng staff 245K 3 Mar 14:35 toolbox.zip
drwxr-xr-x 12 beng staff 408B 3 Mar 14:36 treebank
-rw-r--r-- 1 beng staff 1.6M 3 Mar 14:36 treebank.zip
drwxr-xr-x 7 beng staff 238B 3 Mar 14:36 twitter_samples
-rw-r--r-- 1 beng staff 15M 3 Mar 14:36 twitter_samples.zip
drwxr-xr-x 337 beng staff 11K 3 Mar 14:36 udhr
-rw-r--r-- 1 beng staff 1.1M 3 Mar 14:36 udhr.zip
drwxr-xr-x 390 beng staff 13K 3 Mar 14:36 udhr2
-rw-r--r-- 1 beng staff 1.6M 3 Mar 14:36 udhr2.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:36 unicode_samples
-rw-r--r-- 1 beng staff 1.2K 3 Mar 14:36 unicode_samples.zip
-rw-r--r-- 1 beng staff 25M 3 Mar 14:36 universal_treebanks_v20.zip
drwxr-xr-x 242 beng staff 8.0K 3 Mar 14:36 verbnet
-rw-r--r-- 1 beng staff 316K 3 Mar 14:36 verbnet.zip
drwxr-xr-x 9 beng staff 306B 3 Mar 14:36 webtext
-rw-r--r-- 1 beng staff 631K 3 Mar 14:36 webtext.zip
drwxr-xr-x 20 beng staff 680B 3 Mar 14:36 wordnet
-rw-r--r-- 1 beng staff 10M 3 Mar 14:36 wordnet.zip
drwxr-xr-x 30 beng staff 1.0K 3 Mar 14:36 wordnet_ic
-rw-r--r-- 1 beng staff 11M 3 Mar 14:36 wordnet_ic.zip
drwxr-xr-x 5 beng staff 170B 3 Mar 14:36 words
-rw-r--r-- 1 beng staff 740K 3 Mar 14:36 words.zip
drwxr-xr-x 3 beng staff 102B 3 Mar 14:36 ycoe
-rw-r--r-- 1 beng staff 477B 3 Mar 14:36 ycoe.zip
Bens-MacBook-Pro:work beng$ python3
Python 3.5.1 (default, Mar 3 2016, 14:25:53)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data] /Users/beng/nltk_data...
[nltk_data] Package panlex_lite is already up-to-date!
True
Furthermore, through the downloader GUI, downloading "all" finally succeeded, with every entry marked "installed".
Great! So there is no OSError now? It was the corrupted panlex_lite directory left over from the previous download that caused the OSError. Once the infolist of the zip file is correct, there should be no problem.
Have fun playing with NLTK! Tell your friends/classmates to do the same:
$ rm /Users/beng//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/beng//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/beng//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
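For anyone who prefers to do the two rm steps from Python, here is a rough equivalent. The helper name remove_stale_package and its subdir parameter are illustrative and not part of NLTK; it only assumes the layout seen in this thread, with corpora/<package>.zip sitting next to an unpacked corpora/<package> directory.

```python
import shutil
from pathlib import Path


def remove_stale_package(data_dir, package_id, subdir="corpora"):
    """Delete <data_dir>/<subdir>/<package_id>.zip and the unpacked directory, if present."""
    base = Path(data_dir).expanduser() / subdir
    archive = base / (package_id + ".zip")
    unpacked = base / package_id
    removed = []
    if archive.is_file():
        archive.unlink()  # same effect as: rm .../panlex_lite.zip
        removed.append(archive)
    if unpacked.is_dir():
        shutil.rmtree(unpacked)  # same effect as: rm -rf .../panlex_lite
        removed.append(unpacked)
    return removed
```

After remove_stale_package("~/nltk_data", "panlex_lite"), the package can be fetched again with nltk.download('panlex_lite') as shown above.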
Thanks!
I am getting exactly the same problem with the latest NLTK 3.2.1, both on Ubuntu 16.04 (where it crashed my whole OS) and on OS X, where I get the same error as the OP. I am surprised that this issue was closed as if nothing were wrong.
When I tried the workaround, it failed after this step, because the downloader tries to extract the archive automatically right after downloading it: python -m nltk.downloader panlex_lite
[nltk_data] Downloading package panlex_lite to
[nltk_data] /Users/houmie/nltk_data...
[nltk_data] Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 2268, in <module>
halt_on_error=options.halt_on_error)
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
for msg in self._download_package(info, download_dir, force):
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
outfile.write(contents)
OSError: [Errno 22] Invalid argument
Thanks
@houmie what is your output for:
$ rm /Users/houmie//nltk_data/corpora/panlex_lite.zip
$ rm -rf /Users/houmie//nltk_data/corpora/panlex_lite
$ python -m nltk.downloader panlex_lite
$ python3
>>> plzip = '/Users/houmie//nltk_data/corpora/panlex_lite.zip'
>>> import zipfile
>>> [zifo.CRC for zifo in zipfile.ZipFile(plzip).infolist()]
[0, 448887900, 85839474]
This is not fixed - it happens with Python 2.7, 3.4.3, and 3.5.1. The panlex_lite download stalls for quite a while, and then the unzipping freezes the GUI and/or raises an OSError.
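One possible explanation, which nobody in this thread has confirmed, is that the unzip step hands a very large archive member to a single outfile.write(contents) call, as the tracebacks suggest, and that the platform rejects a write of that size. Under that assumption, a manual workaround is to extract the already downloaded zip in small chunks. The function name extract_in_chunks and the 1 MiB chunk size are arbitrary choices for this sketch, and it assumes the archive comes from a trusted source.

```python
import shutil
import zipfile
from pathlib import Path


def extract_in_chunks(zip_path, dest_dir, chunk_size=1024 * 1024):
    """Extract every member of zip_path into dest_dir, copying chunk_size bytes at a time."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            target = dest / info.filename
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Stream the member instead of reading it into memory in one piece.
            with archive.open(info) as source, open(target, "wb") as outfile:
                shutil.copyfileobj(source, outfile, chunk_size)
```

Calling extract_in_chunks on corpora/panlex_lite.zip with the corpora directory as dest_dir would mimic what the downloader's unzip step is meant to produce, without ever issuing one huge write.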
I have the same problem on my MacBook Pro (OS X El Capitan, Anaconda 1.4.0 + Python 3.5.2). I tried NLTK 3.2.1 via "conda install nltk" and the GitHub master branch via "sudo python3 setup.py install". Interestingly, I never get the CRCs [0, 448887900, 85839474] but always [0, 448887900, 84607019], even after trying to download panlex_lite.zip more than 5 times. Any hints or clues?
Unfortunately they deny that the problem even exists. I reported this in May 2016 and there is still no acknowledgement of it.
I just tried again via the GUI downloader and still get this error message in the console:
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 655, in download
self._interactive_download()
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 974, in _interactive_download
DownloaderGUI(self).mainloop()
File "/Users/houmie/.pyenv/versions/venv35/lib/python3.5/site-packages/nltk/downloader.py", line 1709, in mainloop
self.top.mainloop(*args, **kwargs)
File "/Users/houmie/.pyenv/versions/3.5.1/lib/python3.5/tkinter/__init__.py", line 1131, in mainloop
self.tk.mainloop(n)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
>>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
This is very painful for me, because I had to go through the code and remove all references to PanLex to make the package work.
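As background for the UnicodeDecodeError in the traceback above: this exact message is what Python produces when a byte such as 0xe9 (the letter é in Latin-1) is followed by plain ASCII and the data is decoded as UTF-8. The snippet below only reproduces the message with a made-up byte string; it does not show where the offending bytes come from in the GUI downloader.

```python
# "été" encoded as Latin-1; 0xe9 is not valid as a standalone UTF-8 byte.
raw = b"\xe9t\xe9"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as error:
    message = str(error)
    print(message)

# Decoding with the encoding the bytes were written in succeeds.
as_latin1 = raw.decode("latin-1")
print(as_latin1)
```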
Hi, same here, hopefully if enough people report it then it will get fixed at some point...
OK, here is what I have done:
import nltk
d = nltk.downloader.Downloader()
d._packages.pop('panlex_lite')
d.download()
# error message
d._packages.pop('panlex_lite')
/usr/local/lib/python3.5/site-packages/nltk/downloader.py in info(self, id)
876 if id in self._packages: return self._packages[id]
877 if id in self._collections: return self._collections[id]
--> 878 raise ValueError('Package %r not found in index' % id)
879
880 def xmlinfo(self, id):
I guess we could add something like if id != 'panlex_lite' to the code...
But, for me, the easiest way looked like this: remove panlex from the index, then run:
python -m nltk.downloader -d /usr/local/share/nltk_data -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all
Aaaaand.... Done downloading collection all! 🎉🎉🎉🎉
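The idea of pointing the downloader at an index that no longer mentions panlex_lite can also be scripted. The sketch below is an assumption-heavy illustration: drop_package is a made-up helper, SAMPLE_INDEX is a hypothetical, heavily shortened fragment, and the ref attribute on collection items is a guess at how collections refer to packages. Check a real index.xml before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened index fragment used only for illustration.
SAMPLE_INDEX = """
<nltk_data>
  <packages>
    <package id="panlex_lite" subdir="corpora" unzip="1" />
    <package id="brown" subdir="corpora" unzip="1" />
  </packages>
  <collections>
    <collection id="all">
      <item ref="panlex_lite" />
      <item ref="brown" />
    </collection>
  </collections>
</nltk_data>
"""


def drop_package(index_xml, package_id):
    """Return index_xml without elements whose id or ref attribute equals package_id."""
    root = ET.fromstring(index_xml)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") == package_id or child.get("ref") == package_id:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The cleaned text could then be saved and served from a URL passed to the downloader's -u option, as in the command above.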
@demidovakatya
I would like to understand what you mentioned. Does it mean:
<package author="David Kamholz" checksum="e13211688738201c0a5bd5b2f50e94ab" id="panlex_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2202492316" subdir="corpora" unzip="1" unzipped_size="5778483185" url="http://dev.panlex.org/db/panlex_lite.zip" webpage="http://panlex.org/" />
<package author="Jonathan Pool (editor)" checksum="59a08f6c19d1d6d72cc03189983c8045" id="panlex_swadesh" license="CC0 1.0 Universal" name="PanLex Swadesh Corpora" size="2699578" subdir="corpora" unzip="0" unzipped_size="4103346" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/panlex_swadesh.zip" webpage="http://panlex.org/" />
=>
<package author="David Kamholz" checksum="e13211688738201c0a5bd5b2f50e94ab" id="_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2202492316" subdir="corpora" unzip="1" unzipped_size="5778483185" url="http://dev.panlex.org/db/panlex_lite.zip" webpage="http://panlex.org/" />
<package author="Jonathan Pool (editor)" checksum="59a08f6c19d1d6d72cc03189983c8045" id="_swadesh" license="CC0 1.0 Universal" name="PanLex Swadesh Corpora" size="2699578" subdir="corpora" unzip="0" unzipped_size="4103346" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/panlex_swadesh.zip" webpage="http://panlex.org/" />
@demidovakatya,
Thanks. I ran into the same problem.
Downloading panlex_lite should work fine now.
Once again, it is not working.
I don't have the bandwidth to test this. Our nltk_data page points to the April 1 version, which was left untouched when the May 1 version was added recently.
@kamholz: would you mind running the following to check whether it still works? python -m nltk.downloader panlex_lite
Sorry this keeps happening. It is hard to debug, because I often cannot reproduce the reported errors. In this case, when I run python -m nltk.downloader panlex_lite, it reports no error and unzips. However, the MD5 sum at https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml is wrong. I don't know how that happened, since the file has not changed. The entry should read as follows:
<package author="David Kamholz" checksum="3156099b9acb623725d63c727fd8591d" id="panlex_lite" license="CC0 1.0 Universal" name="PanLex Lite Corpus" size="2357864277" subdir="corpora" unzip="1" unzipped_size="5993562112" url="https://db.panlex.org/panlex_lite-20170401.zip" webpage="http://panlex.org/" />
I have also updated the URL above (which should make no difference for this issue, since the old one redirects) and the size.
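To compare a downloaded archive against the checksum attribute of an index entry, the MD5 sum can be computed locally. This is a minimal sketch using only the standard library; md5_of_file is an illustrative name, and reading in chunks keeps memory use flat for a multi-gigabyte file.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at path, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For the entry quoted above, the result for the downloaded panlex_lite zip would be compared with "3156099b9acb623725d63c727fd8591d".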
Thanks for this, @kamholz. I have pushed a corrected index file using this checksum.
@clockwise would you mind trying again and letting us know how you get on?
I tried: python -m nltk.downloader -u https://gist.githubusercontent.com/demidovakatya/61dab385d74065ae825c80496a197980/raw/c6ff7fbf44265c7f8c9e961e3e1158cd812d6af1/index.xml all and all I get is an HTTP 403 Forbidden error. Any suggestions, or a new URL that will work?
@sokhnavor this is caused by #1787
@alvations thanks! So:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
I am using the Windows Command Prompt and it does not work: wget is not recognized as an internal or external command. I am fairly new to the command line, and to the Windows flavor of it in particular. Is there a way to make this work from the Command Prompt? I would really appreciate it.
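Since Python is already installed wherever NLTK runs, one option that does not depend on wget or unzip is to do the download and extraction from Python itself. This is a rough sketch and not an official NLTK procedure: download_file and unpack are illustrative names, ARCHIVE_URL is the URL from the wget command above, and the archive is large, so the download can take a long time.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above.
ARCHIVE_URL = "https://github.com/nltk/nltk_data/archive/gh-pages.zip"


def download_file(url, target):
    """Stream url to the local file target and return its path."""
    with urllib.request.urlopen(url) as response, open(target, "wb") as outfile:
        shutil.copyfileobj(response, outfile)
    return Path(target)


def unpack(zip_path, dest_dir):
    """Extract the whole archive into dest_dir and return that directory."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return Path(dest_dir)
```

Running unpack(download_file(ARCHIVE_URL, "gh-pages.zip"), ".") corresponds to the wget and unzip lines; the extracted folder can then be moved to the nltk_data location in the file manager or with shutil.move.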