Pip: Toward PEP 518

Created on 21 Oct 2017 · 101 Comments · Source: pypa/pip

I'm AWOL this weekend but as I understand it, there needs to be a discussion toward PEP 518 and implementation of it in pip.

I've opened this issue, because I couldn't locate where the discussion was happening, if it is. Plus, having it in one place that's not pypa-dev/distutils-sig would be nice-ish?

auto-locked maintenance

Most helpful comment

You need separate pip installs to be run inside of a subprocess generally, because of caches in places like pkg_resources I believe (though I could be wrong there).

That doesn't mean you need to call pip though; you can create an API that serializes data via the CLI and call python -c "from pip._internals import coolapithing; coolapithing(sys.stdin.read())" and read more data off stdout. It would be possible to turn the recursive solution of pips calling pips calling pips into a stack using such an API (since all recursion can also be described as a stack); you'd just essentially be making a private API that gets called as a process.

I'm still planning on reading this thread (had a bunch of plates spinning lately!), but one other thing: we don't really have a release schedule; we release when it's ready, not on some target date. We sometimes have some general idea of when we would like to release, but that's not ever set in stone.

All 101 comments

#4799 is where a lot of the debate is happening. It was mostly prompted by:

  1. I understood that the only outstanding blocker to PEP 518 support being in was #4764 (via https://github.com/pypa/pip/pull/4144#issuecomment-302711736)
  2. Then #4799 came up, and I looked at it to see if I could make some sense of all the in-progress work @xoviat was doing.
  3. In the course of that, #4647 popped up as a release blocker saying that PEP 518 support was broken.

As I dug into trying to work out what @xoviat was saying in #4799, it became obvious that we had some issues around recursive building of build environments (X needs Y to build, and Y needs Z, ...) although I'm still unclear as to whether these are implementation bugs, deeper design problems, or nasty corner cases that we can defer without too much issue.

Personally, I'm at the point where I'm out of my depth. I don't understand @xoviat's implementation to judge if it's needed, or if we still just need to merge #4764 and fix #4647 to be ready to go. Nor do I know how easy it'll be to fix #4647 (@xoviat seems to be saying that we need to merge #4799 to fix #4647, but that brings its own problems).

I've run out of time and energy to take the discussion any further this weekend, so I'm going to drop out at this point (at least for a while). For me, the key point is that we want an acceptable level of PEP 518 support for pip 10. I'd like someone to give me a feel for whether we're nearly there, or whether we're weeks away, so that I can avoid getting people fired up that pip 10 is coming only to then say it won't be till the new year...

Thanks for a very helpful summary @pfmoore.

@ncoghlan @dstufft @xoviat Can we please bring the discussion here? Doing it on a closed PR feels weird to me. ._.

Sure thing.

@pradyunsg I know you don't have time for this. But you have been more successful at getting PRs approved than I have, so if you want, I'll be more than happy to walk you through the current implementation, how it works, and the potential problems that it has. I'll explain how I solved some (but not all) of these problems and my ideas for a complete fix (which again, I may do after PEP 517 if it's not done). I honestly don't care who does the work as long as it's done.

Actually, you're AWOL, so let me write a summary:

pip has an object hierarchy, as is common in most Python projects. It all starts with the command, which creates new references to objects, which create new references to lower objects. It's like a tree.

I'll define the "scope" of an object as a sort of lifetime. It's the duration that an object exists. Right now, the scope of PEP 518 in pip is the WheelBuilder. The PEP 518 environment is set up for bdist_wheel, then bdist_wheel is run within that environment, and then the environment is torn down.

So what's the problem with that? The problem is that the scope of the PEP 518 environment needs to be equal or greater than the scope of all calls to setup.py. More specifically, it needs to encapsulate an object that exists throughout the duration of setup.py calls. That object is Requirement.

The first obvious decision you will encounter is: what should hold a reference to the BuildEnvironment? The Requirement is as good a place as any. In fact, it's the best place IMHO to put the reference, because setup.py is called if and only if a Requirement exists.

The next problem you may encounter is this: how do we install the BuildEnvironment requirements? We could just shell out to pip. And that's the decision that was made by the original implementer. But there's a problem with that: pip has no way of knowing how many shell calls it's making because each pip could call itself again. In fact, a maliciously constructed package with circular dependencies could crash someone's computer if pip spawned too many processes.

Another problem is: what shell call should we make? It's actually trickier than you might think, because getting hold of the command-line parameters at the point where you need to make that call is frankly a PITA. So you might have trouble passing the original parameters that the user supplied down to the child. The solution used by the original implementer involved using the finder, but I think you know the problem with that.

Shelling out to a child of yourself without some kind of manager class that can kill the children when the user presses ctrl+C isn't just wrong, it's malicious, especially when you don't know how many processes you've spawned. I personally don't know whether the children die in the current implementation (this might be FUD), but if they don't, the current implementation IMHO is wrong (aside from the other concerns).

Some possible solutions to that issue are the following:

  1. If you want to get PEP 518 out the door, your best bet is probably some kind of lockfile that only allows up to 10 locks or so to make sure that pip isn't multiplying infinitely. Then you can just pass the exact requirements down to the child along with the command-line arguments.

  2. A proper solution, which I'd like to implement after PEP 517, is to have a BuildEnvironmentManager class that's initialized directly in the install command. The BuildEnvironmentManager would have a reference to all of the objects there (RequirementPreparer, WheelBuilder, etc.), and would have a single method: get_build_environment(requirement_set). You could then implement a method on RequirementPreparer that's something like set_build_environment_manager, which it can then use to obtain build environments. The BuildEnvironmentManager could even detect multiple uses of the same environment (most commonly ['setuptools', 'wheel']) and provide the same environment if it's needed multiple times so that you don't need to create it (very common initially with projects having no pyproject.toml). Ideally there would also be some OOP design to try to remove the circular references (not trivial). A rough sketch follows below.
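Something like this, where the constructor arguments, the caching, and the method bodies are illustrative assumptions rather than pip's actual API:

class BuildEnvironmentManager(object):
    '''Owns build environments; initialized directly in the install command.'''

    def __init__(self, finder, preparer, wheel_builder):
        # References to the objects that live at the install-command level.
        self.finder = finder
        self.preparer = preparer
        self.wheel_builder = wheel_builder
        self._envs = {}  # reuse one environment per identical requirement set

    def get_build_environment(self, requirement_set):
        # e.g. ('setuptools', 'wheel') will be by far the most common key
        key = tuple(sorted(str(req) for req in requirement_set))
        if key not in self._envs:
            self._envs[key] = self._create_environment(requirement_set)
        return self._envs[key]

    def _create_environment(self, requirement_set):
        # Install requirement_set into an isolated prefix; details elided.
        raise NotImplementedError


class RequirementPreparer(object):
    def set_build_environment_manager(self, manager):
        # The preparer asks the manager for environments instead of shelling
        # out to a fresh pip process for every build.
        self.build_environment_manager = manager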

@xoviat While it may not cover the case of deliberate malice, am I right in thinking that a build cache (which was used even when --no-binary :all: was specified) with the ability to track not only completed builds, but also in-progress ones, would be sufficient to ensure that build dependency cycles terminated? This would be a variant of your first suggestion (a cross-process limit on the number of concurrent pip invocations), but reformulated as:

  1. Only one process on a machine is permitted to be building the same package at the same time
  2. There's a top level "build ID" that pip passes down to any sub-builds that it spawns (e.g. the PID of the top level process combined with the name of the package being built)
  3. If the build cache indicates that something is being built by a different build ID, then wait for that build to finish
  4. If the build cache indicates that something is already being built for the same build ID, then bail out with an error reporting the circular dependency, and indicating that --binary <name> is going to be required for the build to work

pip would also need to implement @pfmoore's suggestion of exempting setuptools & wheel from the default logic of needing both setuptools and wheel as build dependencies, otherwise the implicit build dependency injection would inherently trigger the circular dependency detection logic.
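To make that scheme concrete, here's a minimal sketch of the in-progress tracking -- the marker-file layout and the class and function names are illustrative assumptions, and a real version would need atomic file creation or locking rather than this check-then-write:

import os

class CircularBuildError(Exception):
    pass

class BuildCache(object):
    def __init__(self, root):
        self.root = root  # a directory shared by all pip processes on the machine

    def _marker(self, package):
        return os.path.join(self.root, package + ".building")

    def try_claim(self, package, build_id):
        # Returns True if this build ID now owns the build, False if a
        # different build ID owns it (the caller should wait for it to finish),
        # and raises if the same build ID asked twice, i.e. a dependency cycle.
        marker = self._marker(package)
        if os.path.exists(marker):
            with open(marker) as f:
                owner = f.read().strip()
            if owner == build_id:
                raise CircularBuildError(
                    "circular build dependency involving %r; "
                    "--binary %s will be required" % (package, package))
            return False
        with open(marker, "w") as f:
            f.write(build_id)
        return True

    def release(self, package):
        os.remove(self._marker(package))

def make_build_id(top_level_package):
    # The "build ID" as described above: the PID of the top-level process
    # combined with the name of the package being built, passed down to any
    # sub-builds that pip spawns.
    return "%d:%s" % (os.getpid(), top_level_package)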

Using the disk to avoid having to figure out OOP design problems is not a bad idea. That's like an intermediate option between implementing PEP 518 completely correctly and just hacking something together.

It would also interact nicely with containerised environments and chroots in general, since we'd be able to use operating system level tools to keep different Python level builds isolated from each other, so pip would just need to figure out how to ensure its own subprocesses cooperate with each other.

@xoviat thanks for the summary above. As I'd said, I'd reached the limit of my understanding of the code in this area, and your explanation has helped me enormously.

I had not actually looked at #4144's code before. I just did and I really don't want that to be shipping.

implementing PEP 518 completely correctly and just hacking something together.

Honestly, I think it comes down to this. Implementing PEP 518 completely and properly is a task which would/might delay pip 10 a (fair) bit if we go that way.

I think a safe-ish middle ground here would be to require build-dependencies to be available as wheels. That way, we can avoid the recursion problem since the build-dependencies (and all of their dependencies) would not need to be built via this process.

How does this sound?

It's restricted but I think a restricted first implementation is better than a you-can-shoot-yourself-in-the-foot-if-you-are-not-careful first implementation.

Implementing PEP 518 completely and properly is a task which would/might delay pip 10 a (fair) bit if we go that way.

Thanks for confirming that - that was my fear.

However, now I am confused, as I thought that at least partial PEP 518 was already in master. Specifically, as I understand it #4647 demonstrates a bug in the PEP 518 support in master - so clearly we have something already, as whatever it is isn't correct...

So we have to do something, and it seems the options are:

  1. Rip out whatever we have for PEP 518 support at the moment.
  2. Tidy up what we have and ship partial support.
  3. Implement PEP 518 fully before we release pip 10.

As you say (3) means a long delay before pip 10 and we have other fixes that I'd really like to see released (Unicode fixes being one we're regularly getting issues over). So I'm not keen on that. You didn't mention (1), and I'm not sure whether that's because you thought we had no PEP 518 support in place, or whether you assumed backing it out wasn't an option. Personally, I don't like the idea - it's a backward step, and it sends a pretty negative message about the PEP itself, if it's that hard to implement correctly. But I think we should be explicit about rejecting it.

Your proposal for (2), that we ship a version of PEP 518 that only supports wheels as build-dependencies (we'd still need to fix #4647, as the demonstration of that uses a build dependency that is a wheel) seems reasonable, in the sense that it's practical for us to implement. My main reservation is that I've no idea how problematic that restriction would be for people who want to use PEP 518.

So I guess I feel like we're stuck whatever we do, but partial support covering only wheels as build dependencies is the best option out of a bad lot :-(

The PEP authors (in https://github.com/pypa/pip/pull/4799#issuecomment-338331267 and https://github.com/pypa/pip/pull/4799#issuecomment-338332575 in response to my question in https://github.com/pypa/pip/pull/4799#issuecomment-338325354) were pretty definite that full PEP support required building any build dependencies, though, so it has to be only a stopgap.

But it's not me that will be implementing this, so I'm happy to go with the judgement of whoever does. One thing I have done is create #4803 and mark it as a release blocker, as a reminder that we should document how we deviate from the spec, if we need to.

(And off-topic for this issue, but can I suggest that we are careful not to make the same mistakes when we start implementing PEP 517? Let's make sure we understand all of the implementation implications before we get too deep into coding - my instinct is that PEP 517 is going to be an even more complex design problem than PEP 518...)

I'm most familiar with distros when it comes to the "build everything from source" perspective, and we definitely separate the process of "bootstrapping the buildroot" from that of regular package builds. Full auto-bootstrapping from source is hard, since you end up having to do things like bootstrap your C compiler.

So for pip, I think it's reasonable to say that build dependencies will always be installable from wheel files. The refinement you can introduce post 10.x is to have a build cache that's distinct from the regular wheel cache, such that users of the cache can be sure that all the wheels in it were built in a controlled environment, rather than downloaded from PyPI or another index server.

Personally, I don't like the idea - it's a backward step, and it sends a pretty negative message about the PEP itself, if it's that hard to implement correctly.

I don't necessarily agree with that. It's hard to implement correctly for pip. But pip, as stated in the PEP, is one of the only front ends that people are going to use. I think it's our job as language implementers (and pip really is a language that people use to package their Python projects) to make it as simple as possible to create a build system without having to think about all of these difficult issues. For them it should just work seamlessly because we've done the hard work.

I think a safe-ish middle ground here would be to require build-dependencies to be available as wheels.

Actually that's exactly what #4799 does. If you want I can restore that branch and then you can fork it and submit it as a PR.

Two things on the implementation side of (2) still stand though -- as @xoviat pointed out above:

  1. figuring out how to create the subprocess (arguments et al)
    I think this should be doable.

  2. which package versions should be installed.
    This should probably be done in the same parent process, although I'm not sure how exactly that would happen given the fact that the current resolver code is still intertwined with code in pip._internal.operations.prepare. I'll look into this sometime this week.

I'm not sure who would have the time to do these though.


it sends a pretty negative message about the PEP itself, if it's that hard to implement correctly.

It probably is not hard to implement correctly. It's just that with the way pip's codebase is today, it's non-trivial to implement in pip -- there's stuff that happens in weird places and I think if that is cleaned up, it would be fairly trivial.

You didn't mention (1), and I'm not sure whether that's because you thought we had no PEP 518 support in place, or whether you assumed backing it out wasn't an option.

I assumed backing out isn't an option.

Now that I think about it -- how important is it to ship PEP 518 in pip 10? I feel if it can be deferred to the next major release, that would (other than being the easy way out of this situation) be straightforward and both 517 + 518 could land in one big release. This feels clean enough that I won't be the one saying this is not the way to go.

@dstufft @xavfernandez thoughts?


@ncoghlan's idea of a build-cache sounds like a good idea to me although I'm not sure I understand all the implications of it.


If you want I can restore that branch and then you can fork it and submit it as a PR.

I probably won't have the time and even if I do I might not be reusing any existing commits. But restoring that branch can't hurt. :)

My main reservation is that I've no idea how problematic that restriction would be for people who want to use PEP 518.

We've had this exact discussion except that you probably didn't even know it. This situation is situation X. Calling egg_info before the build environment is set up is situation Y (#4799).

But it's not me that will be implementing this, so I'm happy to go with the judgement of whoever does.

I guess that means #4799 is back on the table then? As long as it passes all of the tests and does what it claims to do?

Aargh, those X's and Y's come back to haunt me again :wink: Yes, I'm saying I don't have a feel for the relative likelihoods of the 2 cases. I understand you're saying that build requirements that aren't wheels are rare enough that we're OK to go with ignoring that case. So we're one "it's OK" and one "don't know" vote between us, basically. I'm not trying to block this option, just saying where the limits of my intuition lie, is all.

@xoviat I have a few questions. It'd be awesome if you would answer them before you make a new PR. :)

  • How are you going to determine which packages get installed?
  • Are you going to restrict to binary-only build dependencies?

I've worked with a lot of the scientific projects so it could be that I'm biased. But I can rattle off a list of projects with wheel build dependencies and I genuinely cannot think of one project with source dependencies. Maybe I'm wrong.

@rgommers would you be okay with PEP 518 only supporting build dependencies available as wheels if it was in the next pip?

How are you going to determine which packages get installed?

Subprocess gets the requirements list exactly as specified. That way it goes through the resolver.

Are you going to restrict to binary-only build dependencies?

Yes, that's in fact why the test was xfailed. The build dependency in the test is not a wheel.

Now that I think about it -- how important is it to ship PEP 518 in pip 10? I feel if it can be deferred to the next major release, that would (other than being the easy way out of this situation) be straightforward and both 517 + 518 could land in one big release. This feels clean enough that I won't be the one saying this is not the way to go.

What's our feel for whether we'll get anyone with the time to do PEPs 517 and 518 for pip 11? I'm not optimistic. It seems to me that they are both big chunks of work, and we have the resolver work ongoing as well. While I'm not in favour of holding up pip 10 longer than necessary, I'm equally uncomfortable with letting all of our major feature plans drift while we release a series of essentially minor releases.

To put it another way, saying "let's go for a pip 10 release" prompted a resurgence of activity on the PEP 518 work. If we remove that from pip 10, I for one will focus on getting things ready for the release, and I suspect it's likely that PEP 518 loses momentum again. What's to kick things into activity again? @xoviat's been working on implementation, but he's had problems trying to get any of the rest of us to understand the issues he's been struggling with till now. I don't want to leave him working with no feedback again.

What we could do is release a "pip 9.1" with just the incremental fixes that we have ready, and reserve the version number "pip 10" for implementation of (at least one of) the 3 big ticket features that are in the pipeline. But if we do that, I'd like to try[1] to commit to a pip 10 release in the first quarter of 2018. I'd be OK with that as an approach. But does anyone have a feel for what would be involved in backing out the partial support we currently have in master? Or in documenting what we have and what its limitations are (so that people don't try to use it assuming it's complete, hit bugs and raise issues that we have to respond to with "this feature's not yet complete, sorry but wait for pip 10")? Are we just exchanging one big chunk of work for a different one?

[1] To the extent that we can commit to anything with extremely limited volunteer resource as all we have available.

I've worked with a lot of the scientific projects so it could be that I'm biased

Thanks, it's hard sometimes to know people's backgrounds. If you're familiar with the scientific projects, that alleviates my concerns a lot.

What we could do is release a "pip 9.1" with just the incremental fixes that we have ready, and reserve the version number "pip 10" for implementation of (at least one of) the 3 big ticket features that are in the pipeline.

I really like this. +1 to a pip 9.1.0 instead of pip 10.0.0

I'd like to try[1] to commit to a pip 10 release in the first quarter of 2018. I'd be OK with that as an approach.

I've had a very interesting brainwave -- pip turns 10 years old on 12 Oct 2018. That would be the perfect date to do a pip 10.0.0 release. It's a completely different timeline. I am not saying that we should delay the white-whale features until then, but some part of me really wants this version number and age thing to coincide too.

I suspect it's likely that PEP 518 loses momentum again.

I'll do what I can to make sure it doesn't. Hopefully @xoviat is willing to as well. :)

does anyone have a feel for what would be involved in backing out the partial support we currently have in master?

I won't mind taking a look at this tomorrow. Since @dstufft was the one who reviewed #4144, I think his input on this would be valuable.

Note - I wouldn't want to do anything as drastic as backing things out without agreement from @dstufft and @xavfernandez - so let's see what they have to say, too.

@dstufft doesn't have enough time in the day. He also has to make sure that warehouse doesn't go down.

let's see what they have to say, too.

Yes please. :)

From a UX perspective: comprehensively countering the "trusting trust" attack is really painful [1], and you'll find a lot of folks that say "I compile everything from source" aren't actually doing so - somewhere in their process there will be a bootstrap step where they trust a binary provided either by someone else (e.g. the runtime environment and build toolchain from their operating system provider), or else from a previous generation of their own platform (e.g. the buildroots for new versions of Fedora and RHEL get seeded from previous versions of Fedora and RHEL, they don't start completely from scratch). Even a source-based Linux distro like Gentoo starts out with an installer to give you a working build environment with a Linux kernel, C compiler, hardware drivers, etc.

So I think it's entirely reasonable for pip 10 to say that --no-binary :all: only applies to runtime dependencies, not to build dependencies. If folks want to explicitly construct their buildroot from source, they still can - it's just that pip 10 won't implicitly automate it for them due to the inherent recursive bootstrapping problems involved in allowing implicit source builds for your build dependencies.

To allow folks to indicate that they expect the build environment to be fully preconfigured though, it would be reasonable to add a separate --no-implicit-builddeps option to have the install fail outright if implicit binary bootstrapping is needed as part of a source build. That way, folks trying to ensure everything is built from source (including the build dependencies) can do the equivalent of:

pip install --no-binary :all: --no-implicit-builddeps -r build-requirements.txt
pip install --no-binary :all: --no-implicit-builddeps -r requirements.txt

And define as many distinct install groups as they need to get to a point where the first one doesn't need anything other than CPython and any non-Python build toolchains preinstalled.

A potential future complement to that concept would be to allow people to say --buildenv <path> to specify a preconfigured build environment to use for any required source builds, rather than doing each build in an isolated environment. However, I wouldn't try to get that into pip 10 - I'd suggest limiting 10.x to the happy path of "binary build dependencies are allowed" and the alternative option of "fail the build if a binary build dependency is needed and isn't already available in the currently running interpreter".

[1] https://www.schneier.com/blog/archives/2006/01/countering_trus.html

I've thought of another option, which seems reasonable and wouldn't require too much refactoring: essentially using multi-threading to put the main thread on hold while the build environment is set up. The idea is sort of like this: in install.py, you would have a BuildEnvironmentManager:

from threading import Thread

class BuildEnvironmentManager(Thread):
    '''Has references to literally everything (cache, resolver, etc.)'''
    def __init__(self, build_environment_queue):
        super(BuildEnvironmentManager, self).__init__()
        self.build_environment_queue = build_environment_queue

    def run(self):
        while True:
            requirement_list, future = self.build_environment_queue.get()

            # install the requirements using all of the things
            # that we have

            # then put the build environment in the future
            future.put(BuildEnvironment())

You would then have another file (I use backend.py because it's not full enough and could probably use more things in it, and it's at the lower end of the tree):

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

class Future(Queue):
    pass

class BuildEnvironmentQueue(object):
    def __init__(self):
        self._queue = Queue()

    def request_build_environment(self, requirement_list):
        f = Future()
        self._queue.put((requirement_list, f))
        return f.get()

    def get(self):
        return self._queue.get()

And in operations/prepare.py:

# This call will put the thread to sleep until we have a build environment
# with the requirements installed
build_env = self.build_environment_queue.request_build_environment(requirement_list)

This has the advantage of requiring minimal refactoring, having a serialized BuildEnvironmentManager (so the build environments can be optimized and you know exactly what requests have been made in a single object) and keeping everything contained in one process (so the worst-case scenario is a deadlock). Of course, logging would need to be disabled for the other threads but that's not too much of a problem.

Answering my own question about the queue.Queue based approach: it's best to avoid relying on concurrent.futures, as using that would require vendoring https://pypi.org/project/futures/ in Python 2.7.

Without knowing the pip code base well, the notion of consolidating build environment management in a single place still seems like an attractive option.

concurrent.futures is not required for that approach. Future is just a more descriptive wrapper.

Only primitive required is a Queue: https://docs.python.org/2/library/queue.html

I guess we can just move these lines into the BuildEnvironmentManager.

I've worked with a lot of the scientific projects so it could be that I'm biased. But I can rattle off a list of projects with wheel build dependencies and I genuinely cannot think of one project with source dependencies. Maybe I'm wrong.

Well, for one, there's every OS that's not [Windows, macOS, Linux]; IIRC those aren't covered by manylinux1.

@rgommers would you be okay with PEP 518 only supporting build dependencies available as wheels if it was in the next pip?

Not my call, but I'd be happy with any step forwards here. PEP 518 support is optional anyway, so it only working when wheels are available (covers >90% of cases I'd say) in pip 10 is still a significant improvement.

Note that even platforms that don't permit wheels on PyPI will still have a local wheel cache, which means even if pip can't implicitly bootstrap things, it may still be able to print them out and say "get these build dependencies installed somehow, and then this will work".

But does anyone have a feel for what would be involved in backing out the partial support we currently have in master?

I looked into this; it doesn't seem to be too hard. I'll be happy to make a PR for it if we decide to go this way.

+1 to a pip 9.1.0 instead of pip 10.0.0

I finally got time to properly read the distutils-sig thread and to look at relevant PRs and discussions (#4351, #4144, #4799 and a bunch of others). I now think that, since we've announced pip 10, that's what we should do, with the partial PEP 518 support -- no 9.1.0.

this version number and age thing to coincide

Bummer. :(

@ncoghlan Maybe this comment slipped under the radar -- https://github.com/pypa/pip/pull/4799#issuecomment-338416543

In case it didn't, it'd be nice if you could explain why it wouldn't work because I sorta understand that sort of setup and am definitely open to learning more about it. :)

@pradyunsg I think that would mostly work, since it's a specific implementation of the build cache idea. The one aspect it doesn't cover is build dependency loops, since it's missing a way to detect "I was just asked to build something I am already attempting to build".

Note that pip doesn't need to magically resolve dependency loops - it just needs to detect them and fail as soon as it spots one, rather than actually going into an infinite loop.

The one aspect it doesn't cover is build dependency loops

That wouldn't occur with binary-only build dependencies?

@pradyunsg The linked comment was about a way of allowing source builds for build dependencies, which means circular dependencies become a potential concern. If we're requiring binary dependencies, then pip can just rely on the existing wheel cache for the time being.

Ah right. Thanks! :)

I'm in favor of a pip 10 with a partial PEP 518 implementation limited to binary build dependencies only (or already available in pip wheel cache) if that is all we manage to include.

I haven't read the entire thread yet, but I just want to point out that one side effect of limiting to binary build dependencies will be that it is ~impossible to have a C dependency in your build deps in many cases. Yes we have binary wheels on Windows, macOS, and some Linux versions, but we do not on:

  • Any Linux which does not use glibc (Alpine Linux inside of Docker being a popular one).
  • Any *nix operating system that isn't a Linux, like FreeBSD etc.

This would mean that any CFFI based project for instance would either not be able to use PEP 518 or would be uninstallable on those platforms.

This might have been brought up already! I'll be reading this thread later.

@dstufft That's correct. But what we're proposing is that using the pip cache is an option. So you can just pip wheel or pip install your build dependencies first and then they'll be stored in the cache.

This might have been brought up already!

Nope. :)

This would mean that any CFFI based project for instance would either not be able to use PEP 518 or would be uninstallable on those platforms.

Indeed. :-(

The way around this in my head would be that we could make PEP 518 behaviour opt-in -- if there's a pyproject.toml file, we use the isolation + build environment; otherwise, fall back to the current behaviour of using setup.py.
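For what it's worth, the opt-in check itself would be tiny -- a sketch, with an illustrative function name:

import os

def should_use_build_isolation(source_dir):
    # Opt in to the PEP 518 build environment + isolation only when the
    # project ships a pyproject.toml; otherwise keep the current setup.py
    # behaviour unchanged.
    return os.path.isfile(os.path.join(source_dir, "pyproject.toml"))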

I'll be reading this thread later.

Please do. :)

I had the exact same comment as Donald; my understanding of this thread is that binary-only is temporary because there's no time to implement it for pip 10. Correct?

If it was proposed as a permanent decision instead, then -1 of course.

I had the exact same comment as Donald; my understanding of this thread is that binary-only is temporary because there's no time to implement it for pip 10. Correct?

That's correct. pip should support source dependencies but we're short on manpower.

The way around this in my head would be that we could make PEP 518 behaviour opt-in -- if there's a pyproject.toml file, we use the isolation + build environment otherwise fall back to the current behaviour of using setup.py.

I re-read this comment and I'm going to disagree here. PEP 518 support should not be optional (for implementation reasons related to PEP 517) IMHO, but projects shouldn't become uninstallable on these platforms.

More specifically, the particular project that you are installing shouldn't determine whether you get PEP 518. That should be determined by whether your build dependencies are available as wheels or in the cache. Further, we can make PEP 518 support mandatory for those platforms as well if we just spit out a message like the following:

Error: build dependency X is not in the pip cache. Run "pip install X" before installing Y.

Summarising my own perspective:

  1. I view the "No implicit build support for build dependencies" as a temporary limitation in pip 10 to make certain classes of problem (e.g. build dependency loops) impossible to encounter in the first released iteration of the feature. Future iterations of the pluggable build backend support can allow implicit source builds for build dependencies, while putting suitable measures in place to avoid the new problems that arise once you allow that.
  2. Emitting the relevant python -m pip wheel X Y Z command in an error message without actually running the build implicitly is an adequate workaround for now, since it ensures pip can't inadvertently fork bomb a machine.
  3. Isolated builds probably shouldn't be the default yet, unless the specific wheel being built has a pyproject.toml file, or isolated builds are explicitly requested from the command line. This is a backwards compatibility issue, since existing projects are going to be expecting the non-isolated behaviour. Once isolated builds have been available for a release or two, and any usability issues with them have been worked out, then they can become the default generally (perhaps with a command line option to specify a particular build environment to use rather than implicitly generating isolated ones)

@ncoghlan Just to give you a heads-up: no default build isolation means no PEP 517 (at least with my approach) because only the newest version of setuptools supports it (we need to install a newer setuptools regardless of what is on the person's computer). Practically I think that might delay PEP 517 at least a year because it will dramatically increase the amount of effort required to implement it (requiring PEP 517 and non-PEP 517 code).

This is a backwards compatibility issue, since existing projects are going to be expecting the non-isolated behaviour.

Most people have CI scripts that run pip install X and then run pip install Y. These projects would need to add a pyproject.toml. But adding a pyproject.toml isn't that much work, and we can add a command-line flag to disable build isolation if necessary.

We should at least spit out a warning in pip 10 if a project doesn't have a pyproject.toml (and it looks like pip 10 isn't going to have PEP 517 support anyway).

@xoviat "It wouldn't be that much work to adjust" isn't how backwards compatibility works. If it was, pip would have switched to --user as the default non-venv installation model by now :)

As far as PEP 517 goes, you can't depend on PEP 517 as a package publisher without adding a pyproject.toml file, so it's fine if setup.py-only projects don't get PEP 517 support by default.

Would you be fine with spitting out a warning?

I'd see it as a problem if a build that currently works fine started spitting out a warning just because pip was upgraded, even if neither the project itself nor any of its dependencies had changed.

PEP 518 and 517 were deliberately designed to cause zero disruption for existing projects where all the publishers involved continued relying solely on setuptools.

It does make sense for pip to aim to consolidate back to a single PEP 518 build path even for setuptools based projects, but the time for that is after isolated builds have seen a release or two worth of practical use, not in the first version that supports them at all.

I had the exact same comment as Donald; my understanding of this thread is that binary-only is temporary because there's no time to implement it for pip 10. Correct?

Yep. Exactly.


It does make sense for pip to aim to consolidate back to a single PEP 518 build path even for setuptools based projects, but the time for that is after isolated builds have seen a release or two worth of practical use, not in the first version that supports them at all.

+1

I think that we should aim to remove the old path in, like 2 major releases. When pip lands full and proper PEP 518 support; we should deprecate the old build logic and remove it according to the standard deprecation policy.

I agree with Nick's summary and...

because it will dramatically increase the amount of effort required to implement it

No. I don't think implementing PEP 518 this way has any major road blocks; I made a short comment over https://github.com/pypa/pip/pull/4799#issuecomment-339219397 about how this could be implemented within pip.


What we want to do is provide people with a clean transition from old to new. Thus, we need to keep the current pip 9 install logic unchanged -- which would basically support everything that we currently do in the exact manner we do it.

Putting in a pyproject.toml file in the archive would mean the package is opting in to the newer standard and is willing to test out the support for the new behaviour -- going through the build environment with isolation and binary-only build-dependencies (for now).

No. I don't think implementing PEP 518 this way has any major road blocks;

Discussing PEP 517 here. Sorry for the confusion.

We would have to run the tests twice to check both code paths. Ah well, PEP 517 is probably deferred.

IMO,

  1. Warning if a project doesn't have pyproject.toml sounds like a very bad idea. After all, 99% of projects on PyPI currently don't have pyproject.toml, and we can't spam end users with warnings that they can't do anything about (other than report the problem to the project(s)). Am I missing something?
  2. I don't believe that build isolation was mentioned at all in PEP 518. It's an additional feature that pip has wanted to include for some time now, but it's not tied to PEP 518 support, except by the coincidental fact that the same PR implemented both (AFAIR). So if build isolation is what's giving us issues here, I'm OK with having just PEP 518 initially, and adding isolation as a phase 2. I'll leave that decision to the people implementing things, though.

Am I missing something?

Nope.

I'm OK with having just PEP 518 initially, and adding isolation as a phase 2.

I think that we should do PEP 518 and build isolation together, since it's a nice way to have people switch over to isolated builds.

While neither PEP 518 nor PEP 517 require isolated builds, PEP 517 recommends them for sound reasons: https://www.python.org/dev/peps/pep-0517/#recommendations-for-build-frontends-non-normative

Without a local binary artifact cache, isolated build environments are impractical, but once you have one of those (as pip now does), they're far more feasible, since:

  1. On most installations, you won't even need to do a source build in the first place
  2. When you do need to do a source build, you'll usually be setting the build environment up from the cache

At the same time, isolated build environments do require a bit more work from publishers, since they mean that buggy metadata will break the publisher's own builds in a way that requires them to explicitly say "My build dependencies are not fully declared" in order to do a build.

So having pyproject.toml based builds be isolated from the start provides a natural switching point, since that entire PEP is about providing a way to clearly and consistently declare build dependencies separately from runtime dependencies. That means folks switching over from setup.py are presumably doing so because they care about doing that kind of thing, while folks coming to it fresh for new projects will just treat it as another hoop that the packaging tooling obliges them to jump through.

So, just a few things I want to confirm before I get down to writing code:

  • PEP 517 support is not a blocker for pip 10
  • PEP 518 in pip 10

    • opt-in via pyproject.toml

    • supports only binary-only build-dependencies

    • isolated builds

PEP 517 cannot be a blocker for pip 10 because it's not ready yet, and there is no clear path forward at this point (there is a path forward but it's not clear).

I have a comment and question in response to @xoviat's comment here summarizing the implementation challenges as well as after reading this thread quickly.

First, regarding the recursion issue of things possibly blowing up, in general any recursive function can be "translated" to an iterative one. I'm wondering if that approach could help here by providing more control.

Second, what does shelling out buy as opposed to calling a pip function from within Python? Is there any reason an internal API function couldn't be created / refactored that would do whatever shelling out is trying to achieve? That should provide more flexibility when invoking the call (as compared to CLI parameters). This could also provide more control by allowing one to more easily manage the state of the overall process.

Is there any reason an internal API function couldn't be created / refactored that would do whatever shelling out is trying to achieve?

Second, what does shelling out buy as opposed to calling a pip function from within Python?

It buys time that we currently don't have. pip is already behind its release schedule.

First, regarding the recursion issue of things possibly blowing up, in general any recursive function can be "translated" to an iterative one.

I'm not anti recursion. I'm anti process-recursion. I think it's fine if you want to use 100% CPU (well, it would be 20% in Python), but ultimately the user needs to be able to open task manager and kill the maximum of 15 processes that are there. To me, a situation that could potentially cause a process explosion is unacceptable.

That doesn't answer the question of why it buys time, though. What makes it hard to create an internal API function that does the same thing?

In any case, if shelling out solves some particular problem, one possibility to make this approach easier could be to temporarily expose a private / internal CLI command that makes it easier to pass whatever information is needed (e.g. it could even be a serialized Python object, etc).

That doesn't answer the question of why it buys time, though. What makes it hard to create an internal API function that does the same thing?

If you think it's easy, then go ahead. I'm not saying that sarcastically: please, go ahead, because it will solve all of the problems.

I don't think it's easy. I'm asking the question to get insight into why it's hard. (I'm assuming you've thought about this since you said it would save time.)

You need separate pip installs to be run inside of a subprocess generally, because of caches in places like pkg_resources I believe (though I could be wrong there).

That doesn't mean you need to call pip though; you can create an API that serializes data via the CLI and call python -c "from pip._internals import coolapithing; coolapithing(sys.stdin.read())" and read more data off stdout. It would be possible to turn the recursive solution of pips calling pips calling pips into a stack using such an API (since all recursion can also be described as a stack); you'd just essentially be making a private API that gets called as a process.

I'm still planning on reading this thread (had a bunch of plates spinning lately!), but one other thing: we don't really have a release schedule; we release when it's ready, not on some target date. We sometimes have some general idea of when we would like to release, but that's not ever set in stone.
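As a rough illustration of the private-API-as-a-process idea above (coolapithing is just the placeholder name from that snippet, and the JSON-over-stdin/stdout protocol here is an assumption, not anything pip actually ships):

import json
import subprocess
import sys

def call_private_api(request):
    # Parent side: serialize a request over stdin, read the result off stdout.
    proc = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; from pip._internals import coolapithing; "
         "coolapithing(sys.stdin.read())"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(json.dumps(request).encode("utf-8"))
    return json.loads(out.decode("utf-8"))

def coolapithing(raw_request):
    # Child side: what the private entry point might look like -- do the work
    # for one request and report back, instead of re-running the whole
    # `pip install` command line recursively.
    request = json.loads(raw_request)
    result = {"handled": request.get("requirements", [])}
    sys.stdout.write(json.dumps(result))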

Ultimately, though, Python has a maximum recursion depth to make sure that things don't get out of control. We'd need to implement something like that if we went with that approach.

Yes-ish, a stack-based approach makes it pretty efficient to go pretty deep (much deeper than anything besides a dependency loop would ever do; for example we could have something that depends on literally every package and still be fine); the main thing to do is to detect loops.

One fairly naive and easy way of doing loop detection is to just put an upper limit on the number of items on the stack and say that if you hit this limit then you must surely be in a loop situation and error out. The downsides with that are of course that loops don't get detected as early as possible, and packages with build dependency chains deeper than the limit simply don't work.

The generally better option (since if you use a stack-based approach you can access the entire stack) is to simply traverse the stack and look to see if the item we're attempting to install is already on the stack anywhere, and if it is, break out and error because we've hit a loop (this error could either be presented to the end user, or could bubble up to the resolver eventually to try a different version -- although that makes it much, much slower).
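A small sketch of that stack-based traversal, with the loop check done by scanning the in-progress stack -- get_build_requirements() and build() are stand-ins for whatever pip would actually call, so this is a shape, not working pip code:

class BuildLoopError(Exception):
    pass

def build_all(root, get_build_requirements, build):
    built = set()
    in_progress = [root]  # packages on the stack, in order
    frames = [(root, iter(get_build_requirements(root)))]

    while frames:
        package, deps = frames[-1]
        for dep in deps:
            if dep in built:
                continue
            if dep in in_progress:
                # The thing we're about to build is already on the stack:
                # that's a build dependency loop, so error out immediately.
                cycle = in_progress[in_progress.index(dep):] + [dep]
                raise BuildLoopError(" -> ".join(cycle))
            # Descend: push the dependency and come back to `package` later.
            frames.append((dep, iter(get_build_requirements(dep))))
            in_progress.append(dep)
            break
        else:
            # All build requirements of `package` are satisfied; build it.
            build(package)
            built.add(package)
            frames.pop()
            in_progress.pop()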

And to directly answer @cjerdonek's question: in principle this isn't hard, what's making things tricky are some embedded architectural assumptions in the way pip currently works that are no longer true in a world where each source build gets its own isolated build environment, rather than just running directly in the installation environment.

That means the easiest way to re-use pip's existing dependency management logic without stumbling over those internal architectural limitations and without risking breaking currently working code is to run another instance of pip in a subprocess. That's mostly fine, except for the consequence that failing to detect and bail out of dependency loops may fork bomb the system running the build.

I like @dstufft's idea of converting to an iterative / stack-based approach and shelling out to an internal pip function using the pattern python -c "from pip._internals import coolapithing; coolapithing(sys.stdin.read())". That seems simplest based on the discussion, as well as robust.

I think the first step towards this would be to boil the straightforward recursive approach down to a single recursive Python function with the expected input and output (at least sketching it), and then this could be translated / converted to the iterative approach. And yes, you could maintain a set of visited calls to prevent looping, which seems like one of the easier aspects to solve.

I looked into / thought a little more about converting the recursive approach to an iterative one. @xoviat's partial work on PEP 518 (PR #4799) helped me find the point of recursion (which some of you are probably already familiar with). It's at his code comment:

# TODO: Use single process with recursion handling

where it then invokes pip install ....

My idea is that it looks like this could perhaps be solved by a variant of pip install (for the build dependencies) with something like the following change:

  • If the install doesn't require any sub-installations, then do the install.
  • Otherwise, return with a (possibly partial) list of the sub-installs that are required (e.g. by writing the info to an agreed-upon file if writing to stdout doesn't already work).

In this way, the top-level root process can progressively generate a tree of the build dependencies. And it can process leaves as they are found. As leaves are processed, nodes which previously weren't leaves will become leaves, and so on. With this implementation, at any point there will be at most one pip install happening in a subprocess.

A slight variation to what I suggested above is that the needed pip command / subprocess call can return / emit a list of the sub-install invocations a candidate install would require (a pip get-subinstalls or simply pip subinstalls command). The only difference to what I suggested above is that this command would be limited to reporting information. It wouldn't actually do the install. So implementing it could be simpler / easier to test.
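A sketch of the driving loop that could sit on top of such a reporting-only command -- `pip subinstalls` is the hypothetical command from above (it doesn't exist today) and its output format here is an assumption; loop detection, as discussed earlier, would still need to be layered on:

import subprocess
import sys

def get_subinstalls(requirement):
    # Ask the hypothetical reporting-only command which build dependencies
    # installing `requirement` would need (assume one name per whitespace token).
    out = subprocess.check_output(
        [sys.executable, "-m", "pip", "subinstalls", requirement])
    return out.decode("utf-8").split()

def install(requirement):
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])

def install_with_build_deps(requirement):
    installed = set()
    pending = [requirement]            # the dependency tree, kept as a stack
    while pending:
        current = pending[-1]
        if current in installed:
            pending.pop()
            continue
        missing = [dep for dep in get_subinstalls(current) if dep not in installed]
        if missing:
            pending.extend(missing)    # not a leaf yet: handle its deps first
        else:
            install(current)           # a leaf: only one sub-install runs at a time
            installed.add(current)
            pending.pop()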

@cjerdonek I don't see any problem with that idea. But ultimately someone needs to implement it (I think @pradyunsg was going to work on something this weekend?) and as always more difficulties may be discovered then.

Something has come up. If anyone else wants to pick this up, I have no issues. :)

I also like @dstufft's idea.

Coming back around to this, do we want to go ahead with the stack+internal invocation approach suggested by @dstufft?

/ping @ncoghlan @pfmoore @xavfernandez

Yes please. Anything to move this forward.

Is there anyone that can summarise where we stand with PEP 517 and PEP 518 in relation to a pip 10 release? Specifically:

  1. Is master currently in a releasable state? Last I heard, PEP 518 support was broken (there's at least one release blocker issue open related to PEP 518 - #4647).
  2. Are we likely to have a working PEP 517 and/or PEP 518 implementation in a timescale that makes it reasonable to delay pip 10 until they are available?
  3. Assuming we fix the stuff in (1), do we want to do a pip 10 release without PEP 517/518? We produced a heads-up email about the release, so people know it's coming. And there are some reasonably significant fixes in there (e.g., encoding fixes for Windows) that would be nice to release.

My feeling is that we're waiting on the release blockers, but we're not close enough to PEP 517/518 support to block pip 10 on them. But I don't think anyone is working on #4647 except as part of implementing PEP 518.

One alternative option would be to document the limitations of current PEP 518 support, and downgrade #4647 from being a release blocker. I don't know enough about the use cases to know if that's viable.

I only just saw this discussion - apologies for the half-baked initial implementation that's liable to act as a fork bomb, and thanks to all of you who've taken the time to understand it and come up with better ideas.

FWIW, I think restricting it to installing wheels for build requirements would be an acceptable compromise for the first version.

This also reminds me that I should probably fix matters so that flit is not a build requirement of itself.

apologies for the half-baked initial implementation that's liable to act as a fork bomb,

No apology necessary. As I said you cannot predict these issues on the first try. Even knowing what we know now, delaying the PR would have just deferred these discussions.

Is master currently in a releasable state?

IIUC, it can fork-bomb a system as of now; is that correct? If so, I think it's not.

Are we likely to have a working PEP 517 and/or PEP 518 implementation in a timescale that makes it reasonable to delay pip 10 until they are available?

I think, the easy (short term) solution would be to just restrict to wheels for build dependencies. I'll try to take a stab at it sometime next week. If that doesn't materialise, I'll be fine if we remove the current PEP 518 support from master and cut a 10.0.0 from that.

I don't think anyone is working on #4647 except as part of implementing PEP 518.

I think you mean, complete implementation of PEP 518 with source build dependencies allowed?

Oh, and about #4647 -- @xoviat's description above says that fixing it would take changing/moving code and changing the ownership/visibility of objects (specifically BuildEnvironment), which is non-trivial.

I think that restricting it to wheels should be as simple as changing this line:

https://github.com/pypa/pip/blob/fc6b2c192088737f81259b6446f627f20ce46443/src/pip/_internal/wheel.py#L696

to:

finder.format_control = FormatControl(set(), set([':all:']))

The second field there is a set of packages to find only as binaries, with a special case for ':all:' to mean all packages.

Yes. But that alone wouldn't address #4647. Also, none of this code goes through a resolver.

Right, there's a bunch of other stuff wrong with it. But that should prevent it from fork-bombing, which I think is the most pressing concern.

If someone wants to take over my original PR ("fix the PEP 518 problems"), it shouldn't be too difficult to alter it so that build isolation isn't enabled without a pyproject.toml. The original reason that it wasn't merged was that it dropped support for installing dependencies from source with PEP 518. However, now that folks are beginning to realize that PEP 518 may not be in pip 10 at all, they may be more amenable to accepting that PR. I personally don't have time to champion it, but that shouldn't stop others from taking it forward, as only a few lines of changes will be required (aside from xfailing the PEP 518 test).

Actually, against my better judgement, I am willing to implement both PEP 517 and 518 shortly if the pip developers will agree with my conditions:

  1. Dependencies will be only from wheels initially
  2. pip will have an internal build backend initially, even though it will eventually be removed

I have no issues with 1; I have no preference either way for 2.

FYI the conditions are not arbitrary but are there to make initial implementation possible. Is @pfmoore okay with this?

I'm not particularly comfortable with the tone of the offer "against my better judgement ... agree with my conditions". I'm not going to object if the other pip developers are happy to take up this offer, but personally I don't have the time at the moment to restart the whole debate on implementation details. Basically, I'll defer to @dstufft and @xavfernandez on this (@pradyunsg has already given his views).

The tone of the offer is the way that it is because methods of implementation were closed off due to fundamental disagreements about what an initial implementation would look like. I'd rather agree on the principles now than devolve into another implementation debate.

I'll just state that I am also not super comfortable with the tone, it's just, I don't have the time or energy to go into why that tone was used etc. Same reason for the terse reply from my end.

Maybe also worth stating, I'm cool with binary-only build dependencies in the first implementation, not for the longer term. This doesn't apply to runtime dependencies (since that has also somehow come up in another discussion).

Honestly, it's probably not an effective use of anyone's time to discuss the tone of the post further, as it's not relevant to including these features in pip 10. However, I need an assurance that my conditions are acceptable to merge from either @pfmoore (who has indicated that he cannot make such an assurance, which is acceptable given that no one is paid for their time here), @dstufft, or @xavfernandez.

Again, the conditions are not my personal opinion, but are implementation-driven. If I relax these conditions, then I cannot promise an implementation, so there's no point in spending time on preparing a PR, and then having people read the diff and ask "oh, why is this line here?" and then "oh, so this isn't mergeable?" because there was a miscommunication about what exactly the purpose of the PR was.

Agreed re the tone. My point is simply that I won't be merging the change[1], so my view is not really that important here.

[1] Obviously, I'm saying that without having seen the implementation, so I hope it's clear that my reasons are not to do with concerns about the code quality - it's about not having the time to ensure I understand the code well enough to be willing to merge, basically.

@xoviat What are your implementation plans with respect to the iterative vs. recursive approach we discussed above?

Also, to clarify, are you saying that you will first be completing a partial implementation of PEP 517 and 518 that can be merged, followed by a full implementation that can be merged? Or are you saying that you will only be doing the partial implementation, or are you saying that you will be doing a full implementation proceeding through earlier stages that won't necessarily be mergeable? (I'm partly trying to get a better sense of what you mean by "initially" and "eventually" in your comment.)

The first condition eliminates the entire recursion issue.

Also, to clarify, are you saying that you will first be completing a partial implementation of PEP 517 and 518 that can be merged, followed by a full implementation that can be merged?

What I am saying is that I will be completing a partial implementation that will work for 95% of use cases; specifically, the case where dependencies have wheels (very common now) and you're on a manylinux/Windows/OSX platform (the vast majority of users).

It's not a full implementation. But the way you get un-mergeable PRs is trying to do the "all or nothing" approach where you either comply with the standard or you don't.

Keep in mind that a full implementation will require sorting through some fairly nasty issues that will each need a separate PR (as having a PR with 100+ comments usually means that the code isn't well-reviewed). [1]

[1] https://github.com/btc1/bitcoin/pull/11#issuecomment-313843216

I'll close this issue now -- we have preliminary support for PEP 518 in pip 10, which only supports wheels as build dependencies. I'll open a new issue for discussing complete support, and another for PEP 517 support (quoting from here as relevant).

Thanks @ncoghlan @rgommers @cjerdonek for your insights and help here. Thanks @takluyver for the initial implementation of PEP 518. Thanks @xoviat (who's @ghost now) for all the help with implementing these changes. Thanks @benoit-pierre for your help with improving the current support.

PS: 100th comment! :tada:

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
