pip install of a directory is super slow

Created on 16 Dec 2014  ·  74 Comments  ·  Source: pypa/pip

See https://github.com/pypa/pip/issues/2195#issuecomment-524606986 for a summary of this issue.


I'm puzzled why pip needs 17 seconds to process a local directory that is not on NFS (in fact, it's on an SSD), especially when the package being installed is pip itself, which has no dependencies since everything is vendored.

$ time pip install --no-install ~/dev/git-repos/pip
DEPRECATION: --no-install and --no-download are deprecated. See https://github.com/pypa/pip/issues/906.
Processing /Users/marca/dev/git-repos/pip
  Requirement already satisfied (use --upgrade to upgrade): pip==6.0.dev1 from file:///Users/marca/dev/git-repos/pip in /Users/marca/dev/git-repos/pip
pip install --no-install ~/dev/git-repos/pip  2.80s user 5.86s system 50% cpu 17.205 total

It should probably at least be logging whatever is taking that long, but maybe it shouldn't even be doing whatever it's doing.

Note that the "Processing" line appears right away and pretty much the whole delay seems to be between that line and the next one.

needs discussion enhancement

All 74 comments

It is making a copy of the entire directory, including .git. It probably shouldn't be doing that, no.

$ du -sh pip
263M    pip
$ du -sk * .cache .git .tox .travis | sort -nr | head -n 5
181860  .tox
34836   tests
31700   .git
9212    pip
2852    build
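As a rough Python equivalent of the du check above, the following sketch lists the largest top-level entries of a project directory, which shows at a glance what a full copy would have to move. The function names are made up for illustration, and the numbers are apparent file sizes in bytes, so they will not match du's block-based output exactly.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total apparent size in bytes of the files under path (symlinks not followed)."""
    if path.is_symlink() or path.is_file():
        return path.lstat().st_size
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def largest_entries(project_dir: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` biggest top-level entries as (name, bytes) pairs."""
    sizes = [(entry.name, tree_size(entry)) for entry in Path(project_dir).iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


# Example use (path is a placeholder):
#   for name, size in largest_entries("/path/to/project"):
#       print(f"{size / 1024:10.0f} KiB  {name}")
```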

I tried passing 3 -v's (time pip install -vvv --no-install ~/dev/git-repos/pip) -- that didn't yield any more info.

Stepping through it with pdb, things slow down when I get to:

> /Users/marca/dev/git-repos/pip/pip/req/req_set.py(365)prepare_files()
-> unpack_url(

And yep, @tomprince is right - it slows down when it does a copy of the whole tree:

> /Users/marca/dev/git-repos/pip/pip/download.py(635)unpack_file_url()
-> shutil.copytree(link_path, location, symlinks=True)

$ time pip install --no-install ~/dev/git-repos/pip
DEPRECATION: --no-install and --no-download are deprecated. See https://github.com/pypa/pip/issues/906.
Processing /Users/marca/dev/git-repos/pip
  2014-12-15 15:23:34.630794: Copying tree; link_path = '/Users/marca/dev/git-repos/pip'; location = '/var/folders/gw/w0clrs515zx9x_55zgtpv4mm0000gp/T/pip-D6etc4-build'
  2014-12-15 15:23:57.418679: DONE copying tree; link_path = '/Users/marca/dev/git-repos/pip'; location = '/var/folders/gw/w0clrs515zx9x_55zgtpv4mm0000gp/T/pip-D6etc4-build'
  Requirement already satisfied (use --upgrade to upgrade): pip==6.0.dev1 from file:///Users/marca/dev/git-repos/pip in /Users/marca/dev/git-repos/pip
pip install --no-install ~/dev/git-repos/pip  2.75s user 5.03s system 32% cpu 24.168 total
>>> elapsed time 24s
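To reproduce this measurement outside of pip, a minimal sketch like the one below times the same shutil.copytree call on an arbitrary directory. The helper name and the temporary directory prefix are assumptions for illustration; this is not pip's code.

```python
import shutil
import tempfile
import time
from pathlib import Path


def timed_copytree(source: str) -> tuple[str, float]:
    """Copy `source` into a fresh temporary location; return (destination, seconds)."""
    # copytree needs a destination that does not exist yet, so create a parent
    # directory with mkdtemp and copy into a child of it.
    parent = tempfile.mkdtemp(prefix="copy-timing-")
    destination = str(Path(parent) / "build")
    started = time.perf_counter()
    shutil.copytree(source, destination, symlinks=True)
    return destination, time.perf_counter() - started


# Example use (path is a placeholder); remove the copy afterwards:
#   destination, seconds = timed_copytree("/path/to/project")
#   print(f"copied to {destination} in {seconds:.1f}s")
#   shutil.rmtree(Path(destination).parent)
```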

It's much faster now that https://github.com/pypa/pip/pull/2196 is merged.

This should be reopened since #2196 was reverted. I'd like to come up with an alternative PR that builds an sdist instead of using heuristics to figure out what to copy. See the comments on that PR for details.

$ time pip install --no-install ~/dev/git-repos/pip
DEPRECATION: --no-install and --no-download are deprecated. See https://github.com/pypa/pip/issues/906.
Processing /Users/marca/dev/git-repos/pip
  Requirement already satisfied (use --upgrade to upgrade): pip==6.1.0.dev0 from file:///Users/marca/dev/git-repos/pip in /Users/marca/dev/git-repos/pip
pip install --no-install ~/dev/git-repos/pip  3.67s user 8.12s system 7% cpu 2:45.83 total
>>> elapsed time 2m46s

Yikes, almost 3 minutes.

Probably mostly due to this:

$ du -sh .tox
177M    .tox

The .tox directory is 177M out of a total 270M for my whole pip directory.

Please see https://github.com/pypa/pip/pull/2535, which speeds up unpack_file_url by building an sdist and unpacking it.

This issue should be reopened, because the merged PR did nothing (see gh-3219).

Any progress for this issue?

No, and it doesn't look like the final solution is going to arrive any time soon. PEP 516 or PEP 517 needs to be accepted before a decision can be made on whether generating an sdist first is right (I personally don't think so).

PEP 516 summarizes it as:

Being able to create new sdists from existing source trees isn't a thing pip does today,
and while there is a PR to do that as part of building from source, it is contentious and
lacks consensus.

Probably the easiest is for someone to submit a simpler PR that fixes the most braindead behavior, like copying all of .git and .tox (assuming that still happens today). That would be a significant speedup in many cases and will be uncontroversial.

npm has a somewhat similar problem: what to do when installing from a bare repo (instead of a source distribution, or should I rather say a published package). See "run prepublish for git url packages".

@rgommers How about adding a .pipignore file that lists files and directories to be ignored, like .gitignore, instead of hardcoding some file/directory names such as .git and .tox?

That's not a good idea - it moves the responsibility for dealing with this slowness to the developer of every package, which just doesn't work.

If npm has it it must be good :) – https://docs.npmjs.com/misc/developers#keeping-files-out-of-your-package

this also actively breaks stuff like setuptools_scm even more ^^ - pip install making a folder copy breaks things hard already

What does setuptools_scm have to do with it? It should be run on a valid repo and not any kind of source package.

That's not a good idea - it moves the responsibility for dealing with this slowness to the developer of every package, which just doesn't work.

Have .pipignore implicitly include .git, .hg etc., with an empty .pipignore suppressing this.
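To make that proposal concrete, here is a minimal sketch of how such a file could drive the copy. Everything in it is hypothetical: pip has no .pipignore support, the default list is just the two names mentioned above, and the sketch assumes that once the file exists, only the patterns in it apply (so an empty file ignores nothing). Patterns are matched against file and directory names at any depth via shutil.ignore_patterns, which is simpler than .gitignore semantics.

```python
import shutil
from pathlib import Path

# Assumed implicit defaults, used only when no .pipignore file exists.
DEFAULT_IGNORES = (".git", ".hg")


def load_ignore_patterns(project_dir: str) -> list[str]:
    """Return glob patterns from a hypothetical .pipignore, or the defaults."""
    ignore_file = Path(project_dir) / ".pipignore"
    if not ignore_file.is_file():
        return list(DEFAULT_IGNORES)
    patterns = []
    for line in ignore_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comments; an empty file yields no patterns.
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def copy_project(project_dir: str, destination: str) -> None:
    """Copy the project tree, leaving out anything matching the patterns."""
    patterns = load_ignore_patterns(project_dir)
    shutil.copytree(
        project_dir,
        destination,
        symlinks=True,
        ignore=shutil.ignore_patterns(*patterns),
    )
```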

@piotr-dobrogost a pip install from a source repo will break in various circumstances where pip does not copy enough context - for example pypa/setuptools_scm#138

We previously did ignore directories like .git and such and broke things like pbr and had to revert the change.

@dstufft it still breaks stuff if one is in a subdir of a git repo instead of the root ^^

Hmm, if that breakage was bad enough then there's no simple way to improve things here. Guess it's waiting for one of the build PEPs then.

pbr must be doing something quite unreasonable if it falls over without a .git dir present, but well ....

wondering if there's any progress on this? not only are .git or any .${scm} folders troublesome, it is much worse if people include .vagrant/ along with the source.

having a .pipignore that is customizable would really help ease the pain.

For another data point; we have a mix of Python and Javascript in some projects, as we use Sphinx for documenting our Javascript projects. Therefore, pip is also copying a very large node_modules directory, which can be painfully slow.

So we would vote for the .pipignore option as our use case highlights that hardcoded values won't necessarily be sufficient for all types of projects.

People do keep all kinds of junk in the tree beyond SCM files.

I have some large simulations (16GB +) produced by the code that I keep in the same directory with the package source code (as a way of keeping track of different projects).

pip install . copies them to my /tmp. The poor partition actually runs out of space and pip fails with a disk space error.

If sdist is not supposed to be used, and .pipignore expands the interface, then what about reusing the code for parsing MANIFEST.in / MANIFEST file? It should have described all files necessary for the installation.

A good workaround seems to be using an editable install (pip install -e $DIR).

A good workaround seems to be using an editable install (pip install -e $DIR).

Except, for testing, that doesn't test what a user installing the package from pypi would be using. (e.g. packages and modules that don't get packaged will still be available)

I hope this has been mentioned before in this thread.

A better workaround would be to build an sdist or a wheel directly using setup.py and then install the generated artefact using pip. That way, pip won't do the directory copy (because it got a file to install from), and the result is exactly what you would get with pip install . (as of pip 9), minus the directory copy.
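A minimal sketch of that workaround, scripted in Python. It assumes a setuptools-based project with a setup.py and the wheel package installed (needed for bdist_wheel); the helper names are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def wheel_build_command(dist_dir: str) -> list[str]:
    """Command that builds a wheel into dist_dir; run it from the project root."""
    return [sys.executable, "setup.py", "bdist_wheel", "--dist-dir", dist_dir]


def install_command(wheel_path: str) -> list[str]:
    """Command that installs an already built wheel file with pip."""
    return [sys.executable, "-m", "pip", "install", wheel_path]


def build_and_install(project_dir: str) -> None:
    """Build a wheel with setup.py, then hand the resulting file to pip."""
    project = Path(project_dir).resolve()
    dist_dir = project / "dist"
    subprocess.run(wheel_build_command(str(dist_dir)), cwd=project, check=True)
    # Pick the most recently written wheel in case older builds are lying around.
    wheels = sorted(dist_dir.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel was produced in {dist_dir}")
    subprocess.run(install_command(str(wheels[-1])), check=True)
```

Because pip is given a wheel file rather than a directory, the copy of the source tree never happens.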

For Christ's sake, guys, can this be solved somehow already, please? I mean, there seems to be some consensus that this behavior is braindead, yet the ticket has been open for three years by now, and there is no solution in sight. I hate having to manually move data in and out of my tree just so that pip does not barf or hang for some minutes (I have to work on shared filesystems).

If there is no consensus on how to not break existing work, can a solution like .pipignore be provided as an opt-in, maybe? I don't mind jumping through some hoops to get this fixed.

@andre-merzky please calm down.

We are aware of the issue, but we're a volunteer organisation with very limited resources. And in practical terms, this issue simply doesn't affect enough of our users severely enough to be high on the priority list.

It will get fixed in due course (and the more major work that we're trying to get addressed at the moment, specifically PEP 517, is likely to solve this issue as a side effect) but shouting at volunteers won't help. If you feel that an immediate fix is critical, we'd be happy to review a PR - but you should be aware that even if you do raise a PR and get it accepted, it won't be released until pip 10, and that's the release we'd like to get at least some of the "big ticket" work I referred to above into (it may not happen due to volunteer resource constraints again, but that's our aim). So it may be superseded before it gets released - but that doesn't mean you're not welcome to create a PR; it'll be a fallback should the bigger plans not come off in time.

@pfmoore sorry for the tone, frustration was speaking... I created a PR for a trivial (and thus possibly unacceptable) fix (#4900). I heard you on the release cycle, that's how things go, I know...

Ran into this as well:

(env) $ find node_modules/ | wc -l
140287
(env) $ time pip install .
Processing /path/to/myproject
Installing collected packages: myproject
  Running setup.py install for myproject ... done
Successfully installed myproject-1.0

real    4m35.598s
user    0m6.928s
sys     0m7.992s

After reset:

(env) $ mv node_modules/ ../
(env) $ time pip install .
Processing /path/to/myproject
Installing collected packages: myproject
  Running setup.py install for myproject ... done
Successfully installed myproject-1.0

real    0m0.899s
user    0m0.496s
sys     0m0.120s

Where is the latest profiling report about the problem?

No changes here. Today, pip is still copying the entire package to a temporary build directory.

Is this directory in memory?

No, it's written to disk - which makes it particularly painful on shared file systems...

Is it at least in /tmp or /dev/shm? https://stackoverflow.com/questions/9745281/tmp-vs-dev-shm-for-temp-file-storage-on-linux Can it detect when tmpfs is not used and propose to create one?

It is in /tmp. It depends on the stdlib tempfile.
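For context, the standard library's tempfile.gettempdir() picks the first usable directory from the TMPDIR, TEMP and TMP environment variables before falling back to platform defaults such as /tmp. The sketch below creates a build directory the same general way; the "pip-" prefix and "-build" suffix only imitate the directory name seen in the log earlier in this thread, and the helper itself is hypothetical.

```python
import tempfile
from typing import Optional


def make_build_dir(base_dir: Optional[str] = None) -> str:
    """Create and return a unique build directory.

    With base_dir=None, tempfile chooses the location via gettempdir().
    """
    return tempfile.mkdtemp(prefix="pip-", suffix="-build", dir=base_dir)


# Where temporary directories go by default on this machine:
#   print(tempfile.gettempdir())
```

Since the location comes from tempfile, exporting TMPDIR before running a tool that relies on it should move the temporary directory to another filesystem, for example one with more free space.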

Implementing PEP 517 will solve this.

I'm running into this with the latest developer version of pip - I thought PEP 517 support was added in pip 19, so should this still be happening?

In my case, because I work on a project (astropy) where I have many remotes and branches, my .git directory is 1.8 GB, and it takes minutes to copy this over to a temporary directory. It seems like it would make more sense to construct a source distribution first and then build the wheel from there, behind the scenes.

We are still hurting quite badly because of this issue, too. It is really difficult to tell our users that they cannot keep code and experimental data (which are large) in the same directory - it's quite counter-intuitive. On our own systems, we use the .pipignore patch, but don't have the ability to deploy that on the majority of systems we support... :/

We ran into this https://github.com/pypa/pip/issues/2195#issuecomment-351258913 today as well. It's still happening.

(venv) (venv) pip --version
pip 19.1.1 from /application/venv/lib/python2.7/site-packages/pip (python 2.7)

Implementing PEP 517 will solve this.

Narrator: it didn't.

Fixing this requires installing via sdist, and last time we discussed that, there was a lot of pushback from people using tools that (apparently) need the actual source directory. Personally I think we should bite the bullet and deprecate build processes that don't give the same results when you do build_sdist then build_wheel as you get when you just do build_wheel, but I don't have the time or energy to champion that proposal myself at the moment.

Fixing this requires installing via sdist

Actually, no - #4900 provided an implementation which solves the problem with little code in a backward compatible way. It might not solve other problems - but given the age of this ticket, I would like to ask to reconsider that approach.

Fixing this requires installing via sdist, and last time we discussed that, there was a lot of pushback from people using tools that (apparently) need the actual source directory. Personally I think we should bite the bullet and deprecate build processes that don't give the same results when you do build_sdist then build_wheel as you get when you just do build_wheel, but I don't have the time or energy to champion that proposal myself at the moment.

As someone who cared about the inplace build and therefore disliked the "must always go via sdist route": I've made peace with the "go sdist route" a long time ago.

This issue is _very_ painful if you run into it, and the "copy everything by default" makes little sense. So +10 to biting the bullet.

Fixing this requires installing via sdist

I had, incorrectly, assumed we'd make the switch with PEP 517.

I agree wholly with you here though.

IIRC we could have done, but the debates it would have triggered about whether installing via sdist was acceptable were too much extra controversy to add at the time - and as installing via copying and building a wheel was still an option, I took the less stressful course :-)

I'd still prefer to just switch to building via sdist, but I don't have the time right now to do it myself.

workaround: use a shallow clone (change depth to suit):

cd d:\code
git clone --depth=100 https://github.com/PROJECT/PROJECT.git d:/code/shallow-PROJECT
move d:\code\PROJECT d:\code\PROJECT-bloated
move d:\code\shallow-PROJECT d:\code\PROJECT

To reiterate and summarize:

  • The pip maintainers agree that this isn't a good experience for users. pip's own development processes hit this issue.
  • This happens because pip copies the source directory to a temporary directory, to ensure that the build does not depend on anything outside the source tree.
  • We want to resolve this by changing pip's behavior: build a source distribution in-tree, unpack it in a temporary directory, and build a binary from that.

Now, going this route also fixes a bunch of other usability issues around pip's building mechanics for users.
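The first two steps of that route can be sketched as follows. This is only an illustration under stated assumptions, not pip's implementation: it assumes a setuptools project whose setup.py supports the sdist command, and both function names are invented here. The final step, building a wheel from the unpacked tree, is left out.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_sdist(project_dir: str, out_dir: str) -> Path:
    """Build an sdist from the source tree into out_dir (an absolute path)."""
    subprocess.run(
        [sys.executable, "setup.py", "sdist", "--formats=gztar", "--dist-dir", out_dir],
        cwd=project_dir,
        check=True,
    )
    archives = sorted(Path(out_dir).glob("*.tar.gz"))
    if len(archives) != 1:
        raise RuntimeError(f"expected exactly one sdist in {out_dir}, found {len(archives)}")
    return archives[0]


def unpack_sdist(archive: Path, build_root: str) -> Path:
    """Unpack the sdist into build_root and return its top-level directory."""
    shutil.unpack_archive(str(archive), build_root)
    # An sdist conventionally holds a single "<name>-<version>" directory.
    (top_level,) = [p for p in Path(build_root).iterdir() if p.is_dir()]
    return top_level
```

Only the files the sdist includes end up in the temporary directory, so .git, .tox, node_modules and data dumps are left behind unless the project explicitly packages them.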

I've started a self-motivated project to refactor pip's build logic. While I won't be tackling this issue as a part of my refactoring work, I am more than willing to help someone who is inclined enough to try to fix this issue -- the fix would be fairly involved in pip's build logic, which isn't the most straightforward bit of code around and there might be tricky edge cases that we only notice during implementation.

Oh, and as a band-aid workaround for this, added in #6770, pip 19.3 will exclude .nox and .tox directories when copying. This should reduce the amount of time these installations would take, for a fair number of users.

This doesn't resolve the issue for large .git or build directories -- that's what the approach I elaborated in my above comment would resolve. :)
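For illustration, excluding those two directories can be done with the ignore callback of shutil.copytree. The sketch below skips .tox and .nox only when they sit directly under the source root; that scoping is an assumption of this example, and the code is not taken from #6770.

```python
import shutil

# Directory names skipped at the top level of the source tree only.
SKIPPED_AT_TOP_LEVEL = (".tox", ".nox")


def copy_source_tree(source: str, destination: str) -> None:
    """Copy source to destination, skipping .tox/.nox directly under source."""

    def ignore(directory: str, names: list[str]) -> list[str]:
        # copytree calls this once per visited directory; filter only the root.
        if directory == source:
            return [name for name in names if name in SKIPPED_AT_TOP_LEVEL]
        return []

    shutil.copytree(source, destination, symlinks=True, ignore=ignore)
```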

This doesn't resolve the issue for large .git or build directories -- that's what the approach I elaborated in my above comment would resolve. :)

I know there's some tools that rely on .git, but is anyone relying on build being copied? That'd be nice to add to the ignored dirs, happy to send a PR if you agree.

Is this still being looked into? It's a very painful surprise to see multiple gigabytes of git-ignored debug-data dumps being copied over during a pip install .

Yes, take a look at the linked issues like #7555.

This issue still persists for me: the directory I'm installing from has maybe 10 MB of Python code, but also a lot of JSON data files and .git.

This should be resolved by #7882 (build local directories in place).

We have now (per #7951) published a beta release of pip, pip 20.1b1. This release includes #7882, which implemented a solution for this issue.

I hope participants in this issue will help us by testing the beta and checking for new bugs. We'd like to identify and iron out any potential issues before the main 20.1 release on Tuesday.

I also welcome positive feedback along the lines of "yay, it works better now!" as well, since the issue tracker is usually full of "issues". :)

I will say that it is considerably better.

Old: noglob pip3 install . 3.76s user 2.51s system 12% cpu 50.245 total

New: noglob pip3 install . 3.40s user 0.70s system 42% cpu 9.764 total

Works great/faster for me! :+1:

» pip --version
pip 20.0.2 
» time pip install .
noglob pip install .  8.03s user 18.47s system 25% cpu 1:44.84 total
» pip --version
pip 20.1b1 
» time pip install .
noglob pip install .  3.69s user 0.31s system 92% cpu 4.307 total

down from ~2 minutes to 4 seconds, thank you so much!

Thank you for the positive reports @PythonCoderAS @astrofrog @klamann! :)

Unfortunately, there have been a number of issues with the implementation of in-place builds (which are being tracked under #7555) which means that for now, we need to revert #7882. As a result, this issue will become a problem again, and we'll therefore be reopening it. Longer-term, we hope to have a solution that addresses the issues that in-place builds solved, but without the impact on other workflows that the current solution had.

Sorry for the disruption that this will cause.

Unfortunately, there have been a number of issues with the implementation of in-place builds

@pradyunsg thanks for the update. Some feedback on terminology (please feel free to ignore, just FYI): this sentence, as well as gh-7555, confused me because pip does not do in-place builds. What in-place builds has always meant is python setup.py build_ext --inplace (or python setup.py develop).

Here you changed the meaning to: "build without copying to a tmpdir". Extension modules still don't end up in-place, they end up in a build/ dir that's usually easily cleaned up. It would be nice to be a little more explicit in for example gh-7555.

That was originally my wording. Sorry for any confusion, I wasn't aware that setuptools used the term "in place" to mean something different (and I'm still not really sure how that terminology applies outside of setuptools). We'll see if we can find a more neutral term in future (although offhand, I'm not sure what - suggestions gratefully accepted 😉)

No worries at all, thanks @pfmoore. I just thought I'd point it out, since confusion about terminology can sometimes result in talking past each other.

and I'm still not really sure how that terminology applies outside of setuptools

For tools like CMake and scikit-build I think it means the same thing: actually in-place, binaries land next to sources.

"editable installs" on the other hand is (I believe) invented here, and kinda means "in-place that pip is aware of".

although offhand, I'm not sure what - suggestions gratefully accepted

maybe just "local build" (vs. the current "copy to tmpdir and build")?

"editable installs" on the other hand is (I believe) invented here, and kinda means "in-place that pip is aware of".

We recently had a long discussion on what editable install means, and I think we actually landed in a place that is more along the lines of machine local as far as pip goes. But pip is unaware of where and how on the local machine, and it is the build backend's job to define and handle that.

Could try «in-tree build» (similar to «in-tree PEP 517 backend») or «build in source dir»

My question is, why can't the feature be optional, so it does not cause problems but can be enabled by an argument or something similar?

I'm trying to wrap my head around the workarounds for this, where an editable install isn't an option. Are there any?

A workaround could be to build a wheel (using your build backend directly) then point pip to install it

why can't the feature be optional, so it does not cause problems but can be enabled by an argument or something similar?

It can. The reason for reverting the change was that we didn't have any opt-outs or a period for getting feedback on the change. We do have new flags to help facilitate that (--use-feature and --deprecated-feature), but someone has to reimplement/reintroduce the functionality in this context now.

Broadly, I think what we want to do here is:

  • Add a --use-feature=in-tree-build as an opt in.
  • Switch the default in a later release w/ a --deprecated-feature=out-of-tree-build as an opt out + pushing users of --use-feature=in-tree-build to drop it.
  • Drop both of the options in a subsequent release.
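The three-step rollout above can be sketched as a small decision function. The phase names and the function are hypothetical and only model the plan as described; the feature names are the ones proposed in the list.

```python
from enum import Enum


class Phase(Enum):
    OPT_IN = 1   # step 1: in-tree builds only with --use-feature=in-tree-build
    OPT_OUT = 2  # step 2: in-tree by default, --deprecated-feature=out-of-tree-build opts out
    FINAL = 3    # step 3: both options dropped, always in-tree


def builds_in_tree(phase: Phase, use_features: set[str], deprecated_features: set[str]) -> bool:
    """Decide whether a local directory would be built in-tree in a given phase."""
    if phase is Phase.OPT_IN:
        return "in-tree-build" in use_features
    if phase is Phase.OPT_OUT:
        return "out-of-tree-build" not in deprecated_features
    return True
```

For example, builds_in_tree(Phase.OPT_IN, set(), set()) is False, while the same call with Phase.OPT_OUT is True, which is the point where users of the opt-in flag would be asked to drop it.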

A workaround could be to build a wheel (using your build backend directly) then point pip to install it

I was thinking without an extra build step. But I guess I should have never thought Python could get away without Makefile equivalents from the beginning.
