Pipenv: Allow users to override default PyPI index URL with PyPI mirror URLs (without modifying Pipfile)

Created on 27 Apr 2018  ·  46 Comments  ·  Source: pypa/pipenv

Hello all,

The situation

Currently, there is no easy way to override the default PyPI index URL to use a URL pointed at a mirror. In corporate environments, requiring developers to use a repository mirror is quite common:

  1. Corporate firewalls prohibit access to external software repositories.
  2. Internal repository mirrors conduct malware and vulnerability analysis, which can be a compliance requirement.
  3. Internal mirrors preserve modules that might later be unavailable upstream (due to outage, deletion, etc), which is necessary to ensure the availability and auditability of modules used within the company's environment.

Unfortunately, this doesn't appear to be easily accommodated by pipenv. Although the mirror could be explicitly added to the Pipfile as the source for these packages, this breaks portability.

  1. Projects initialized internally will contain unreachable indexes if published externally. Users of the public version will have to modify the Pipfile prior to installing the module's dependencies.
  2. Projects initialized externally will not work internally without modification of the Pipfile. These modifications must be maintained locally (but not shared), and reapplied if the Pipfile changes upstream.

There should be a way to override the location of the PyPI index, by specifying a (true) mirror. This would only be applicable to PyPI, and not to other third-party repositories (these would still be specified explicitly in the Pipfile).

General proposal

Docker accommodates this situation by allowing the user to specify a registry mirror in the daemon's configuration file. Likewise, it'd be great if the pipenv user could specify a (true) mirror for PyPI, via an environment variable, configuration file, or command line parameter. If this value is set, pipenv should use the mirror for all PyPI packages, even if a connection to PyPI is available. In some corporate environments, PyPI remains unblocked, but policy dictates that the mirror is used for the other reasons mentioned above.

Implementation considerations

  1. Pip already allows users to override the default index url through pip's configuration file. Although this would likely be the most obvious source of the internal mirror's url (and would likely be set for these users), this parameter can be used for repositories that aren't true mirrors. Accordingly, it's probably unsuitable for this purpose.
  2. For modules whose dependencies are all available on PyPI, it's my understanding that the explicit source can be removed from the Pipfile, and pip's default will be used. Unfortunately, this does not apply to projects with modules outside of PyPI. Furthermore, since the Pipfile generation process is explicit by default, many existing projects would have to modify their qualifying Pipfiles to accommodate this pattern by removing the default index url.
  3. If an environment variable is set as a source in the Pipfile, the variable could be optionally set to provide a mirror. Unfortunately, this requires existing projects to modify their Pipfiles to accommodate this pattern, which is not ideal.
  4. If an environment variable, command line parameter, or configuration setting is used to override the PyPI index url with a (true) mirror, how would the override work? Would it assume the mirror's index should be specified in all calls to pip which would otherwise use the PyPI index? Would it require a change to existing Pipfiles? Would it require redefining how sources are specified, including an overrideable PyPI default? Something else?

Related discussion

#1451

#1783

This has been discussed in #python and #pypa on Freenode. After some constructive back-and-forth, it was decided that it'd be helpful to open an issue here for discussion. I appreciate everyone's effort towards resolving this issue.

Dependency Resolution Future API Change Behavior Change Discussion

All 46 comments

/cc @uranusjr @ncoghlan @altendky @njsmith

I am persuaded that this is a thing that happens commonly (corporate firewall / caching proxy). I feel we need an override setting to specify a mirror to use instead of PyPI if we find PyPI in the Pipfile, something like PIPENV_PYPI_MIRROR or PIPENV_PYPI_CACHING_PROXY, to specify that the mirror should be tried first: basically spliced into the sources in front of PyPI.

Does that seem like it accomplishes the goal? If so, we can tag in the implementation genie to tell us why this is good or bad (@ncoghlan)

I'll start with a note of caution: until PyPI has implemented a package signing mechanism akin to PEP 458 to provide a TLS-independent way for pip to ensure that packages that nominally originate from PyPI actually match what PyPI published, then offering the ability to transparently redirect traffic to a different server is genuinely concerning from a security perspective.

Unfortunately, that particular attack vector is already open by way of pip.conf, so offering something comparable at the pipenv level isn't going to make anything any worse than it already is.

Beyond that, I think a general purpose repository URL rewriting mechanism could actually be easier to document and explain than something PyPI specific, at least at the base capability layer. Something like:

pipenv --override-source-url 'default=https://pypi-proxy.example.com/api' --override-source-url 'https://pypi.python.org/simple=https://pypi-proxy.example.com/api'  --override-source-url 'https://pypi.org/simple=https://pypi-proxy.example.com/api' install

(The only PyPI-specific bit there would be using "default" to refer to pip's default download source, as specified in pip.conf.)

Spelling out the entire source URL override map every time would be unwieldy to use in practice though, so a couple of options for CLI sugar might look like:

pipenv --override-source-urls <config file> install

pipenv --pypi-mirror https://pypi-proxy.example.com/api install

Whether or not to expose the --override-source-url layer immediately is a different question - it might make more sense to implement the simpler --pypi-mirror option first, and merely keep the possibility of --override-source-url and --override-source-urls as possible future options in mind while doing so.
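As an illustration of the mapping idea above, here is a minimal Python sketch of how the proposed options might be parsed into a single {given URL: override URL} dictionary. The option names and example URLs come from this comment; the parser itself, the function names (parse_override, build_parser, override_map), and the rule that explicit overrides win over --pypi-mirror are assumptions made for illustration, not pipenv's implementation.

```python
import argparse

# Keys that --pypi-mirror would expand to, taken from the example command above.
PYPI_KEYS = (
    "default",
    "https://pypi.python.org/simple",
    "https://pypi.org/simple",
)


def parse_override(spec):
    """Split 'ORIGINAL=REPLACEMENT' at the first '=' into a (key, value) pair."""
    original, sep, replacement = spec.partition("=")
    if not sep or not original or not replacement:
        raise argparse.ArgumentTypeError(
            f"expected ORIGINAL=REPLACEMENT, got {spec!r}"
        )
    return original, replacement


def build_parser():
    """Hypothetical parser exposing only the two options discussed here."""
    parser = argparse.ArgumentParser(prog="pipenv-sketch")
    parser.add_argument(
        "--override-source-url",
        action="append",
        default=[],
        type=parse_override,
        metavar="ORIGINAL=REPLACEMENT",
    )
    parser.add_argument("--pypi-mirror", metavar="URL")
    return parser


def override_map(args):
    """Build the URL rewrite mapping; explicit overrides win (an assumption)."""
    mapping = {}
    if args.pypi_mirror:
        mapping.update({key: args.pypi_mirror for key in PYPI_KEYS})
    mapping.update(dict(args.override_source_url))
    return mapping
```

With this shape, --pypi-mirror is only sugar over the general mapping, which matches the suggestion to ship the simple option first and expose the general layer later.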

A general {given URL: override URL} mapping was my first thought too, but on further consideration, there are some arguments for special-casing PyPI:

  • PyPI is pretty unique in having a well-known public URL and lots of mirrors

  • PyPI actually has multiple URLs (e.g., we'll probably have Pipfiles floating around for a while with both https://pypi.python.org/simple and https://pypi.org/simple, and maybe also https://pypi.python.org/simple/ and https://pypi.org/simple/ with the trailing slash?), and it'd be nice if we could solve this once instead of forcing each user to figure it out themselves

@njsmith See the --pypi-mirror <URL> sugar suggestion in the last part of my post - if the initial implementation focused solely on that, then the general URL rewriting capability could start out as an internal implementation detail (driven by the fact that "PyPI" has multiple URLs that all ultimately resolve to the same place), and then be considered for exposure as a feature in its own right later on (after it's been confirmed that it's working as desired for the primary --pypi-mirror use case).

Ah, right, I missed that :-)

Is there a general rule mapping command line arguments to some kind of more persistent configuration? I imagine that most users of this would want to set it up once and then forget it.

@ncoghlan wrote:

I'll start with a note of caution: until PyPI has implemented a package signing mechanism akin to PEP 458 to provide a TLS-independent way for pip to ensure that packages that nominally originate from PyPI actually match what PyPI published, then offering the ability to transparently redirect traffic to a different server is genuinely concerning from a security perspective.

If I'm reading my Pipfile.lock correctly, there is no relationship stored between a package and which source it was installed from. Given that the existing feature set allows multiple sources to be specified, isn't that creating a similar issue? A sync could end up getting a package from a different source than the one that was used when creating the lockfile.

Pipfile.lock stores a list of acceptable artifact hashes for each pinned dependency, so once you've done a lock, surreptitiously replacing packages is difficult. At lock generation time, explicitly opting in to a source in Pipfile is saying "I trust this source not to mess with me, and will use TLS to verify that I'm actually talking to this point of origin". (I think there's an issue somewhere discussing the prospect of binding particular packages to particular source repos, although it may be in pip or one of the other PyPA repos, rather than here)
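To make the hash argument concrete, here is a minimal Python sketch of the kind of check being described: a downloaded artifact is accepted only if its digest appears in the list pinned in the lockfile. The function name artifact_is_acceptable and the sample bytes are hypothetical, and the sketch assumes sha256 entries in the "sha256:<hex>" form; the real verification is performed by pip, not by code like this.

```python
import hashlib


def artifact_is_acceptable(data, acceptable_hashes):
    """Return True if the sha256 of `data` is one of the pinned hashes.

    `acceptable_hashes` is a list of strings such as "sha256:ab12...",
    the form assumed here for lockfile entries.
    """
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return digest in set(acceptable_hashes)


# Hypothetical usage: the same bytes pass, modified bytes fail,
# regardless of which index or mirror served them.
wheel = b"example wheel contents"
locked = ["sha256:" + hashlib.sha256(wheel).hexdigest()]
print(artifact_is_acceptable(wheel, locked))
print(artifact_is_acceptable(wheel + b" tampered", locked))
```

The point of the sketch is that the check depends only on the artifact's bytes, so a mirror that serves different content for a locked package is detected at install time.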

Changing the default index URL (or adding an extra index URL) in pip.conf, or using the override feature proposed here through a config file or shell profile based mechanism is different: that's saying "I, or some arbitrary process I ran at some point in time with write access to my home directory (such as an sdist's setup.py file), decided to configure my settings to trust this source of packages". And even a signing scheme like PEP 458 isn't a complete defence against those kinds of shenanigans if the public keys used for verification are themselves stored somewhere inside your home directory rather than in a directory that requires elevated privileges to modify.

There are good reasons why organisations with strict security requirements execute builds on locked down servers with only limited access to the internet at large, or otherwise monitor for these kinds of problems at the network level :)

Note also if you use multiple indexes and a package comes from the non-primary index it will be indicated in the lockfile.

The PEP 458 concerns were essentially what I had in mind, since things that are different URLs but in actuality point at PyPI are different from a case where you just locally copied PyPI and claimed it was the same.

I, or some arbitrary process I ran at some point in time with write access to my home directory (such as an sdist's setup.py file), decided to configure my settings to trust this source of packages

If this is your threat model, then I don't see how anything pipenv can do will affect it much. Someone who can modify your home directory config can also do things like insert a new directory on $PATH and insert a fake pipenv in there that does whatever they want.

@njsmith this is also pip’s threat model, because package installation requires that the execution of arbitrary code from sdist setup.py files be allowed. That code indeed could overwrite things in your home directory like your settings, or add things to your path, or any number of things. That’s why explicitly privileging PyPI (a known, trusted index) and requiring hash checking is a good step toward security. It allows centralized control and elimination of known security threats and identity verification of the packages you are downloading in a distributed fashion. What did the lockfile you downloaded say about the hash you should be getting? It doesn’t match what you’re getting from the index? In order for this mode of operation to fail you need to have failures at more than one of the local machine, index and network layer because you’re talking about having multiple corrupted packages in your application stack working in concert verifying hashes against a trusted index, and in many cases the hashes themselves came from yet another uninvolved source. So now you need to have at a minimum, all of the hash checking in both pip and pipenv somehow tampered with such that it generates hashes that are identical to the ones you are hoping for, but installs yet other malicious things?

I guess what I’m saying is, if your local machine is compromised there is nothing pip or pipenv is going to do to save you. But we can ensure that the package you’re downloading is the one you were looking for, from the place you were supposed to search for it, which can provide one element in the chain of security.

@ncoghlan @njsmith how does this all factor in with the move to push back against sudo pip install... and the general sense I think we all have that if you're going to use pip, you probably shouldn't also use your system package manager to install python things broadly speaking. This isn't really a pipenv question maybe, but it's where the discussion is right now and this might guide the next steps...

@techalchemy I don't see any connection to this topic at all? I think the conclusion of all the above is that letting users override which mirror pipenv uses for PyPI doesn't introduce any additional threats, and doing sudo pipenv doesn't even make sense in the first place, right?

@njsmith no I don't think anyone should use sudo pipenv, like I mentioned it's not really on topic but since we went a bit down the threat model path, I thought it was worth exploring. Specifically:

And even a signing scheme like PEP 458 isn't a complete defence against those kinds of shenanigans if the public keys used for verification are themselves stored somewhere inside your home directory rather than in a directory that requires elevated privileges to modify.
There are good reasons why organisations with strict security requirements execute builds on locked down servers with only limited access to the internet at large, or otherwise monitor for these kinds of problems at the network level :)

If a defense at least in some capacity relies on keys being stored in a privileged location, but we are advising against using privileged python installs, I think it's possibly worth discussing. Maybe I'm wrong. But it definitely seems related to @ncoghlan's comment (but not sudo pipenv, that should never be a thing)

Yeah that probably seemed like it came out of nowhere, just a random thought. Hopefully the additional context clears it up some

I vote we keep this issue on the topic of helping folks who need to use PyPI mirrors, rather than getting into a speculative discussion of how we might implement TUF. (Anyway, I don't think there's much we can or should do to try to defend against an attacker who has arbitrary write access to the user's home directory.)

Okay, so let's define the behavior that we would expect or prefer. My current working understanding is that:

  • If --pypi-mirror is passed or PIPENV_PYPI_MIRROR is set, we should prefer that
  • Should we prefer it over PyPI only? How are we making the assessment as to whether a given index url is 'PyPI' -- we can't query it, so we would have to maintain a list
  • Should the list contain all possible permutations, or should we be content with using the two urls we've used in the past for generating Pipfiles as the things for which we should try the provided mirror first?

It should override PyPI only, not other URLs. I guess there are probably only a few different PyPI URLs in use, so they can be listed, and if we miss one then someone will file a bug, it'll get added, and pretty soon we'll have all of them.

Seems like the right approach to me.

What @njsmith said matches my perspective as well. The 3 repo URLs I'd suggest replacing in an initial PR would be:

The trailing-slash-or-not is likely better handled as a URL normalisation step, rather than by listing the URLs separately.

Note that the requests Pipfile does have a trailing slash (at time of writing), so we probably do need to handle this one way or another.

Right, my thought was:

  • maintain a list of URLs without trailing slashes
  • check incoming URLs for a trailing slash, and remove it if found (str.rstrip would likely be good enough for the task, even though it would remove an arbitrary number of trailing slashes, or else we could be stricter about it, and remove at most one trailing slash)
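The two steps above can be sketched in Python as follows. This is a minimal illustration, not the merged implementation: the names KNOWN_PYPI_URLS, normalise, and is_pypi_url are hypothetical, and the set contains only the two PyPI URLs quoted earlier in this thread.

```python
# Known PyPI index URLs, stored without trailing slashes.
# Assumption: only the two URLs quoted earlier in the thread are listed.
KNOWN_PYPI_URLS = frozenset({
    "https://pypi.org/simple",
    "https://pypi.python.org/simple",
})


def normalise(url, strict=False):
    """Drop trailing slashes from an index URL.

    strict=False uses str.rstrip, which removes any number of trailing
    slashes; strict=True removes at most one.
    """
    if strict:
        return url[:-1] if url.endswith("/") else url
    return url.rstrip("/")


def is_pypi_url(url):
    """Return True if the normalised URL is one of the known PyPI URLs."""
    return normalise(url) in KNOWN_PYPI_URLS
```

Either normalisation mode treats https://pypi.org/simple and https://pypi.org/simple/ as the same index; they differ only on inputs with more than one trailing slash.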

Awesome. I think this is enough to work with and simple enough to build. Thanks all!

Hope mirror feature could be added soon~

I am encountering this issue as well. The situation is:

  • Have an internal PyPI server with some private packages.
  • Have multiple Python applications that use Pipenv to manage their dependencies.
  • Some of the dependencies live on the internal PyPI server, and others on the community PyPI. The internal one redirects to the community PyPI for any packages not found.

My deployment strategy already sets up a system-wide pip.conf that refers to the internal PyPI server. Surprisingly, I found that this configuration is ignored by Pipenv.

I'm noticing that if I were to move/rename the internal PyPI, then several applications with Pipfiles would have to be updated and their Pipfile.lock files regenerated. A mirror option would provide the desired functionality. It would also work and feel less redundant if Pipenv could just read the system configuration for Pip.

PRs welcome on this one btw

Hi. I have the same need but I would split this override feature into another ticket.

Here is my expected behavior proposal:

  • a configuration file can be defined to set every value defined in [[source]] section of the Pipfile.
  • could be a toml file with only the [[source]] section of the Pipfile
  • The location of this file is heavily inspired by the rules defined for pip.conf (e.g., /etc/pipenvrc.toml, ~/.pipenvrc.toml)
  • environment variables could also be defined to override these values (reminder: we need all values of [[source]]). To be defined
  • the behavior of pipenv would then be:

    • when creating a Pipfile, it takes the values from the configuration file / environment variable

    • if no configuration file or environment variable is defined, the current behavior of pipenv applies

    • pipenv continues to always generate the [[source]] section of the Pipfile

    • if a [[source]] section of the Pipfile exist, pipenv does not try to override whatsoever with values from the configuration file.
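The precedence proposed in this list could look roughly like the Python sketch below. Everything in it is hypothetical: the function initial_sources and its parameters are invented for illustration, the configuration file and environment are represented as plain dictionaries rather than parsed from disk, and the built-in default mirrors the source entry pipenv generates for pypi.org.

```python
# Built-in default, used when nothing else is configured.
DEFAULT_SOURCE = {
    "url": "https://pypi.org/simple",
    "verify_ssl": True,
    "name": "pypi",
}


def initial_sources(existing_pipfile_sources, config_source=None, env_overrides=None):
    """Pick the [[source]] entries for a project under the proposed rules.

    - An existing [[source]] section in the Pipfile is never overridden.
    - Otherwise start from the built-in default, apply values from the
      configuration file, then let environment values override those.
    """
    if existing_pipfile_sources:
        return [dict(source) for source in existing_pipfile_sources]
    source = dict(DEFAULT_SOURCE)
    source.update(config_source or {})
    source.update(env_overrides or {})
    return [source]
```

Note that this proposal only affects Pipfile creation; it differs from the mirror override discussed earlier in the thread, which leaves the Pipfile untouched.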

And in a second ticket, the --override options can be implemented. It makes sense, for example, inside a CI or something.

As a side note: we heavily use pipenv in production now, but I need to remind everyone too often that they need to change their Pipfile manually when they start a new project to hit our Artifactory PyPI repository (for information, Nexus also does a PyPI cache for free and it works great!). We have a very limiting firewall, and it is a very good practice inside a company to cache external dependencies, so they can be backed up and checked for vulnerabilities, for instance.
A simple feature similar to the general or user configuration file (like we already have for pip or npm), which we could deploy on all our workstations so our developers make fewer mistakes, would be perfect for me.

Maybe I missed something, but this seems like a regression. We've been on 11.6.0 for a while, and pipenv happily delegated to the settings in our pip.conf, which point to an internal pypi mirror.

Any idea when this broke? It makes pipenv completely unusable in our context. I'm having trouble seeing this as a "missing feature" when it was apparently working fine for a long time.

To be clear: after upgrading to 2018.05.18, even with the mirror specified in our Pipfile[.lock], pipenv tries to install new packages from pypi.org.

Maybe what I'm seeing is a separate issue from this one...

@brettdh It is hard to tell without seeing your environment, but I’d think it is not the same issue. I’d suggest you do some bisecting between releases to see exactly where this changed, and open a new issue for it.

I'm working on the PR for this.

I do think this was regressed vis a vis the default setting. It may have been caught in a wave of updates for pip 10 which are not released yet but I believe we can pick this up without too much difficulty if @JacobHenner isn’t already adding it

I presume you're talking about using devpi as a caching proxy for the official PyPI. For pip itself, you would need to modify /etc/pip.conf and /usr/lib64/python3.6/distutils/distutils.cfg for pip to use your local devpi server for all requests.

However, it looks like pipenv ignores these system-wide settings, so you are forced to modify the [[source]] config setting in Pipfile to reference your devpi server. But then if you publish your Pipfile externally, external contributors have to remove your [[source]] settings to actually build their own environment.

I think that pipenv should just respect the global settings from /etc/pip.conf and /usr/lib.../distutils.cfg

@polski-g

I presume you're talking about using devpi as a caching proxy for the official PyPI

Nexus Repository, but yeah, same idea.

However, it looks like pipenv ignores these system-wide settings

As @techalchemy mentioned, I believe that pipenv (11.6.0) used to respect pip.conf (homedir as well), but the latest version does not - specifically, there's a hard-coded pypi.org URL somewhere (dependency resolution, IIRC) that can't be overridden.

I think that pipenv should just respect the global settings from /etc/pip.conf and /usr/lib.../distutils.cfg

Agreed - though personally I haven't had to modify distutils.cfg in my use case.

IIRC there was a resolution to not respect pip.conf, but you’ll need to dig deep into the issue tracker to find it. In any case, the ship has sailed, and with PyPI mirroring almost done, this is unlikely to change in near future.

I'm fairly confident this feature will ship in the next release (which will ship in the next day or two with luck)

Also I'm not sure about this, but it's possible we might just need to call .load() after we create the config parser here to get the config defaults

https://github.com/pypa/pipenv/blob/master/pipenv/project.py#L573-#L577

@uranusjr as long as the mirroring configuration works (i.e. doesn't use that hardcoded pypi.org URL I mentioned), I don't see any problem with pipenv having its own configuration for this and ignoring pip's.

@brettdh Would you be able to check out my branch and confirm it meets your use case in your environment?

@JacobHenner yep, thanks. My initial testing with the --pypi-mirror option (pipenv install, pipenv lock) looks like it works fine. I left a small suggestion on the PR.

I'm a bit concerned, though, that hardcoded URLs to pypi.org still appear scattered across the pipenv sources. I can't be sure which ones are correctly overridden from [[source]] entries, and I can't remember exactly which workflow caused my issue above. So it's hard to tell if it's fixed. 😬

Yeah following this release I am planning a major code cleanup. Cli stuff moving to the cli, bubbling exceptions there and handling all the exits there, deduping duplicated code, etc. It’s going to be a lot of work and help will be appreciated if anyone wants to volunteer :p

Just pulled the recent version and it is still hardcoding the pypi.org in the sources. Is the goal to take the environmental variable or the pypi-mirror and put that as the default for [[source]]?

edit:

Just dug through the code.. Looks like you have

if PIPENV_TEST_INDEX:
    DEFAULT_SOURCE = {
        u"url": PIPENV_TEST_INDEX,
        u"verify_ssl": True,
        u"name": u"custom",
    }
else:
    DEFAULT_SOURCE = {
        u"url": u"https://pypi.org/simple",
        u"verify_ssl": True,
        u"name": u"pypi",
    }

I think if you changed that "if PIPENV_TEST_INDEX" check to use the environment variable PIPENV_PYPI_MIRROR, it would be a good start.

The solution discussed here has long been implemented. The snippet you quoted is a default, i.e. used if you do not provide a source when creating the Pipfile.

No, the source should not change in the Pipfile. The goal of this change was to allow users to override PyPI URLs with a mirror, _without_ changing the Pipfile.

@JacobHenner The mirror handling code postprocesses the source list and replaces pypi.org URLs with references to the specified mirror.

That's what allows the mirror override to work even if there is an explicit pypi.org entry in the Pipfile. pipenv then relies on that same logic to override its own default source as well.

If there are currently cases where that postprocessing isn't being applied correctly, that's a new bug report against the already implemented feature, rather than a feature request.
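The postprocessing described in this comment can be pictured with the following Python sketch. It is an illustration under stated assumptions, not pipenv's code: the helper names (resolve_mirror, apply_mirror), the set of recognised PyPI URLs, and the choice to let a command-line value take precedence over PIPENV_PYPI_MIRROR are all assumptions.

```python
import os

# Assumption: the two PyPI URLs named in this thread, without trailing slashes.
PYPI_URLS = frozenset({
    "https://pypi.org/simple",
    "https://pypi.python.org/simple",
})


def resolve_mirror(cli_value=None, environ=None):
    """Pick the mirror URL: CLI value first, then PIPENV_PYPI_MIRROR (assumed order)."""
    environ = os.environ if environ is None else environ
    return cli_value or environ.get("PIPENV_PYPI_MIRROR")


def apply_mirror(sources, mirror_url):
    """Return a copy of `sources` with PyPI URLs replaced by the mirror.

    Non-PyPI sources are passed through unchanged, and the input list
    (standing in for the Pipfile contents) is never modified.
    """
    rewritten = []
    for source in sources:
        source = dict(source)
        if mirror_url and source["url"].rstrip("/") in PYPI_URLS:
            source["url"] = mirror_url
        rewritten.append(source)
    return rewritten
```

Because the rewrite happens on a copy of the source list at run time, the same logic covers an explicit pypi.org entry in the Pipfile and the built-in default source alike.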

I think that last comment was intended for @kylecribbs?

@JacobHenner Ah, sorry - I misinterpreted your comment as saying that this change hadn't achieved its original goal, rather than as a response to Kyle that aimed to clarify what that outcome actually was.
