Nltk: Update installation instructions for recent versions of Python

Created on 30 May 2015  ·  14 Comments  ·  Source: nltk/nltk

Recent versions of Python come with pip by default, so the installation instructions at http://www.nltk.org/install.html are out of date.

For modern installations (regardless of operating system), the following two steps should suffice:

  1. Install NLTK: pip install nltk
  2. Test installation: run python then type import nltk
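
As a minimal sketch of the "test installation" step, the check below reports whether a package is importable and, where available, its installed version. The helper name `installed_version` is hypothetical and not part of NLTK or pip; it assumes Python 3.8 or later for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not importable."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module).
        return "unknown"


if __name__ == "__main__":
    version = installed_version("nltk")
    print("nltk is not installed" if version is None else f"nltk {version} is installed")
```

Running this with the same interpreter that pip installed into is the point of the check: if it reports that nltk is missing right after a successful install, pip and python probably refer to different installations.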

The suggestion of optionally installing NumPy should be caveated by noting that it requires a working build system, and users may instead prefer to refer to the recommendations for installing the full SciPy stack: http://scipy.org/install.html

Users of older Python versions without pip already installed would likely be best served by referring to pip's own installation instructions at https://pip.pypa.io/en/latest/installing.html

documentation inactive

All 14 comments

For context on where this request came from, we had a new Python user come to distutils-sig confused by the setuptools/pip bootstrapping dance: https://mail.python.org/pipermail/distutils-sig/2015-May/026486.html

That dance _is_ confusing, which is why we started providing pip by default.

Hey @ncoghlan,

Here is my past attempt to write install docs, and the related discussion: https://github.com/nltk/nltk/pull/697. Could you please check it and weigh in? Your experience with what works and what doesn't would be very helpful.

I think it's a good idea to lead with the simple "if you already have Python and pip installed and configured, just run 'pip install --user nltk'" instructions, as if that works, the user can just run it and move on. If you _assume_ it's necessary to instruct them on how to install Python, there's a risk they're going to put NLTK in the "too hard" basket, and miss the fact that it was only a single command away.

It's then worth asking "How may those simple instructions fail?". The three main cases:

  1. They don't have Python at all yet. In those cases, I suggest deferring to the SciPy stack instructions, since that will get the affected users a NumPy accelerated NLTK: http://scipy.org/install.html
  2. They have Python, but not pip. In those cases, I suggest deferring to pip's own bootstrapping instructions: https://pip.pypa.io/en/latest/installing.html
  3. They have Python and pip, but there's something else that prevents the "pip install --user nltk" approach working (e.g. they're using a system Python 3 installation on Linux, where "pip" installs into the Python 2 stack, and you need to use "pip3" or "python3 -m pip" to install into the right version). For this case, I suggest recommending they try "python -m pip install --user nltk", and if that still doesn't work, then point them at https://docs.python.org/3/installing/ for further ideas to try.
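
The second and third cases can be sketched in a few lines: check whether the running interpreter has pip at all, and build the "python -m pip" form of the command from `sys.executable` so that it targets the same Python that will later import NLTK. The function names `has_pip` and `pip_install_command` are hypothetical helpers for illustration only.

```python
import importlib.util
import sys


def pip_install_command(package, user=True):
    """Build a pip command bound to the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")
    command.append(package)
    return command


def has_pip():
    """Report whether the pip module is importable by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    if has_pip():
        print("Run:", " ".join(pip_install_command("nltk")))
    else:
        print("pip is missing; see https://pip.pypa.io/en/latest/installing.html")
```

Using `sys.executable` rather than a bare "pip" sidesteps the pip/pip3 mismatch described in case 3, because the command can only install into the interpreter that built it.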

Finally, for more advanced usage (like learning how to use virtual environments), you could point them to https://packaging.python.org/en/latest/ and https://packaging.python.org/en/latest/science.html

The key is to focus on "How can I get a user to the point of productively using NLTK in the smallest number of possible steps?", rather than trying to teach them extraneous skills (like using virtual environments) which are likely to be helpful to them in the long run, but are initially just a distraction from the task of getting up and running for the first time. ("This will be helpful to you later, trust me" almost never engages an in-person student's attention, and it's even less effective when used as part of a self-directed learning process)

The principles in @ncoghlan's proposal of:

  • keep it as simple as possible, and
  • delegate to other (authoritative) information sources wherever possible

both seem spot on to me.

Thanks @ncoghlan, @kmike. I've simplified the instructions slightly (see http://www.nltk.org/install.html). Before doing more I wanted to check on the best version of the pip command. Three are on the table:

pip install nltk
sudo pip install nltk
pip install --user nltk

All this assumes that pip is on the user's PATH. Which option or explanation of options is likely to be the most general?

It's probably best to go with:

pip install nltk

There are currently cases where that won't work (specifically system Python installations on Linux), but that's a known issue with pip's default behaviour: https://github.com/pypa/pip/issues/1668

It's potentially worth suggesting "pip install --user nltk" as an alternative if the initial installation fails with a permissions error.
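
A rough sketch of that fallback, for anyone scripting the install: try the plain command first and retry with --user only when the failure looks like a permissions problem. The helper `install_with_fallback` is hypothetical, and matching on the text "Permission denied" in pip's error output is an assumption about the message wording, not something pip guarantees.

```python
import subprocess
import sys


def install_with_fallback(package, run=subprocess.run):
    """Try a plain install first; retry with --user on a permissions error."""
    base = [sys.executable, "-m", "pip", "install"]
    result = run(base + [package], capture_output=True, text=True)
    if result.returncode == 0:
        return "installed"
    # Assumption: pip's error output mentions "Permission denied" in this case.
    if "Permission denied" in result.stderr:
        retry = run(base + ["--user", package], capture_output=True, text=True)
        if retry.returncode == 0:
            return "installed with --user"
    return "failed"
```

The `run` parameter exists only so the logic can be exercised without touching a real environment; calling `install_with_fallback("nltk")` uses `subprocess.run` and performs an actual install.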

▶ pip install ntlk
Collecting ntlk
  Could not find a version that satisfies the requirement ntlk (from versions: )
No matching distribution found for ntlk

Is this expected behavior at this point?
Should I be compiling from git?

@txtsd there's a typo in your install command, as you have ntlk rather than nltk (and pip/pypi don't currently implement typo suggestions)

@ncoghlan Well that's embarrassing. Thanks!

Recently, scikit-learn has steered people away from pip and toward relying on the distro's package manager or conda: http://scikit-learn.org/stable/install.html.

Adding conda install instructions to http://www.nltk.org/install.html might save users some time, especially Windows users, e.g. http://nlpworkgroup.postach.io/post/install-miniconda-python-for-nltk-on-windows

@alvations good point regarding conda. It's probably the most robust way of getting a working scientific python environment.

Collecting nlkt
Could not find a version that satisfies the requirement nlkt (from versions: )
No matching distribution found for nlkt

I am experiencing this problem when I try to install nlkt.
Please help me solve it. I am using Ubuntu 17.04.

@kafomambia There's a typo in your installation command - the last two letters are currently reversed. You want nltk (for Natural Language ToolKit) rather than your current nlkt.
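
Since pip and PyPI do not suggest corrections for misspelled names, here is a small sketch of how such a check could be done locally with the standard library's `difflib`. The list `KNOWN_PACKAGES` and the function `suggest_package` are hypothetical; the list is a hand-written shortlist, not a query against the real package index.

```python
import difflib

# Hypothetical shortlist of package names; not a real PyPI index query.
KNOWN_PACKAGES = ["nltk", "numpy", "scipy", "scikit-learn"]


def suggest_package(name, known=KNOWN_PACKAGES):
    """Return known package names that closely resemble `name`."""
    return difflib.get_close_matches(name, known, n=3, cutoff=0.6)


if __name__ == "__main__":
    for typo in ("ntlk", "nlkt"):
        print(typo, "->", suggest_package(typo))
```

Both misspellings seen in this thread differ from nltk by one transposed pair of letters, which is close enough for the default similarity cutoff to match.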

Collecting nltk
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError(' ction.VerifiedHTTPSConnection object at 0x0000007AC2BB2748>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',)': /simple/nltk/
Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError(' ction.VerifiedHTTPSConnection object at 0x0000007AC2BB2B38>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',)': /simple/nltk/
Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError(' ction.VerifiedHTTPSConnection object at 0x0000007AC2BB2860>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',)': /simple/nltk/
Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError(' ction.VerifiedHTTPSConnection object at 0x0000007AC2BB29E8>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',)': /simple/nltk/
Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError(' ction.VerifiedHTTPSConnection object at 0x0000007AC2BB2898>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed',)': /simple/nltk/
Could not find a version that satisfies the requirement nltk (from versions: )

No matching distribution found for nltk

I am getting the above error while trying to install nltk (Python 3.5.2).
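
The "getaddrinfo failed" lines in a log like this point at a name lookup failure rather than at NLTK or a misspelled package name, so a quick DNS check can narrow things down before retrying. The sketch below is one way to do that; `can_resolve` is a hypothetical helper, and which host to check (for example pypi.org) is an assumption about the index pip is configured to use.

```python
import socket


def can_resolve(host, port=443, lookup=socket.getaddrinfo):
    """Return True if the resolver can look up `host`, False on a lookup error."""
    try:
        lookup(host, port)
    except socket.gaierror:
        # Same error family as the "getaddrinfo failed" message in pip's output.
        return False
    return True
```

Calling `can_resolve("pypi.org")` from the affected machine performs a real lookup; a False result suggests checking the network connection, DNS settings, or proxy configuration before running pip again.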
