Jinja: urlencode doesn't escape slashes

Created on 20 Nov 2015  ·  21 Comments  ·  Source: pallets/jinja

This was raised in #444, but this code always evaluates to b'/' on Python 3.4.1, so slashes are still not escaped.

Just a heads up @mitsuhiko
I have no idea why, but do_urlencode still doesn't escape slashes, while unicode_urlencode works as expected.

Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jinja2
>>> jinja2.utils.unicode_urlencode("http://url.by", for_qs=True)
'http%3A%2F%2Furl.by'
>>> jinja2.filters.do_urlencode("http://url.by")
'http%3A//url.by'
>>> jinja2.__version__
'2.9.dev'

Because the urlencode filter does not escape slashes. Is there a specific reason why it has to? To clarify this: it only encodes slashes in the value position of passed key/value pairs.

I believe this is standard behaviour for functions intended for encoding URLs. Tools like http://meyerweb.com/eric/tools/dencoder/ behave this way. I also have at least one tool inside our company which expects URLs passed inside GET requests to have escaped slashes.
Also, I don't understand why unicode_urlencode is called inside do_urlencode with for_qs=True.
Maybe I'm misunderstanding something.

Slashes are reserved characters in the path component, and the more common behavior when forcing a path to be URL-encoded is to encode everything but slashes. The alternative (encoding slashes as %2f) does not even make sense, as most servers outright reject such requests because of security problems: backend servers typically cannot distinguish %2f from / in the path component, since they operate on decoded octets.

So the only place where encoding a slash actually makes sense is in query strings, and that is how the dict-based encoder in urlencode works. However, even there a slash does not have to be encoded, so there is no reason to force it to be encoded.

The urlencode function should be usable by most people by default, which is why it does not encode a slash. If you have custom requirements, you can override the function in your filter registration.
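A minimal sketch of what such an override could look like, assuming Python 3. The names `urlencode_strict` and `make_environment` are hypothetical and not part of Jinja; the only Jinja feature relied on is the `filters` dict of an `Environment`.

```python
from urllib.parse import quote


def urlencode_strict(value):
    """Percent-encode a value, slashes included (hypothetical helper)."""
    # safe="" tells quote() to treat no reserved character as safe,
    # so "/" becomes "%2F".
    return quote(str(value), safe="")


def make_environment():
    """Build a Jinja environment whose urlencode filter escapes slashes."""
    import jinja2  # imported lazily; only needed when templates are rendered

    env = jinja2.Environment()
    # Replacing the builtin filter is an assumption about what you want;
    # registering it under a new name (e.g. "urlencode_strict") also works.
    env.filters["urlencode"] = urlencode_strict
    return env
```

Note that this sketch only handles string-like values. The builtin filter also accepts dicts, which this replacement does not reproduce.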

Ok. Thank you Armin.

Hi, I ran into similar trouble when rejecting a slash. Here is the code:

jinja2.Template("{{ disks|reject('sameas', '/')|list }}").render(disks=["/", "/mnt/disk0", "/mnt/disk1"])
u"['/', '/mnt/disk0', '/mnt/disk1']"

I want to reject the root disk, but it doesn't work anymore. How can I solve this?

>>> import jinja2
>>> jinja2.Template("{{ disks|reject('sameas', '/')|list }}").render(disks=["/", "/mnt/disk0", "/mnt/disk1"])
u"['/mnt/disk0', '/mnt/disk1']"
>>> jinja2.__version__
Out[3]: '2.8'

Works for me.

It still doesn't work for me, @ThiefMaster. Could this be a Python issue? Which version of Python did you use?

    Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import jinja2
    >>> jinja2.Template("{{ disks|reject('sameas', '/')|list }}").render(disks=["/", "/mnt/disk0", "/mnt/disk1"])
    u"['/', '/mnt/disk0', '/mnt/disk1']"
    >>> jinja2.__version__
    '2.8'

Oh.. sameas uses is (and you cannot expect 'foo' is 'foo' to work). You want equalto which uses ==.
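The difference between the two tests can be sketched in plain Python. The functions below are stand-ins that mirror the description above (identity versus equality), not Jinja's own code. Lists are used for the identity check because whether two equal strings happen to be the same object is an implementation detail, which may be why the same template gave different results on different machines.

```python
def sameas(value, other):
    # Identity check: true only if both names refer to the same object.
    return value is other


def equalto(value, other):
    # Equality check: true if the values compare equal.
    return value == other


a = [1, 2]
b = [1, 2]  # equal content, but a distinct object
identity_result = sameas(a, b)
equality_result = equalto(a, b)

# Filtering out the root disk by equality, as the reject filter would
# do with the equalto test.
disks = ["/", "/mnt/disk0", "/mnt/disk1"]
non_root = [d for d in disks if not equalto(d, "/")]

print(identity_result, equality_result, non_root)
```

Following this advice, the template from the question would become {{ disks|reject('equalto', '/')|list }}.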

Yes, it works. I'm not familiar with Jinja2. Thanks, @ThiefMaster

> So the only place where encoding a slash actually makes sense is in query strings, and that is how the dict-based encoder in urlencode works. However, even there a slash does not have to be encoded, so there is no reason to force it to be encoded.

No, for example you need to encode / in usernames and passwords as well.
That's the reason why JS has both encodeURI and encodeURIComponent.

That's fair enough but in practice it's not necessary there either and included credentials are deprecated anyways. Since those are unlikely to be produced within templates it's an edge case that is not really worth considering.

Ansible uses Jinja, and it's pretty common to handle security credentials when setting up systems. I just hit a case where an automatically generated password contained a slash that was not replaced by urlencode when generating a database URL, which is pretty unfortunate. While breaking current behavior would be problematic, why not introduce a second filter that does escape the slashes?

Ansible could do that. There is no need for such a change to be in Jinja itself - it is extensible enough to add custom filters or even replace builtin ones.

@ThiefMaster Are use cases other than constructing HTML templates irrelevant when determining what is useful to be included in Jinja itself? For example, the Saltstack project, with similar purpose to Ansible, also uses Jinja for templating, and would benefit from the same change.

@danielkza what stops saltstack from providing a filter that does that?

@mitsuhiko Why does Jinja include any built-in filters then? I can only guess it is because they're useful in multiple use cases. I used Ansible and Salt as two examples of where being able to escape slashes in URLs is desired, and hence, it would be valuable to have it available for everyone.

What about adding a safe argument to urlencode, as Python's urllib.parse.quote has, so that slashes are preserved by default, but in a way that can be easily overridden?

Jinja attempts to provide some commonly used functionality. We have two modes for urlencoding, which get you about 95% of the way there. You can encode entire query strings by encoding a dict, and you can encode to a common set which is valid for paths by using urlencode on strings.
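The two modes can be approximated with the standard library. This is an analogy under the assumption of Python 3, not Jinja's implementation, and the sample path and key are made up for illustration.

```python
from urllib.parse import quote, urlencode

# Path-style encoding of a string: quote() keeps "/" by default (safe="/").
path_style = quote("docs/my file.txt")

# Query-string encoding of a dict: values are fully encoded, slashes included.
query_style = urlencode({"next": "/docs/my file.txt"})

print(path_style)
print(query_style)
```

In the first case the slash survives and only the space is escaped. In the second the slash in the value is escaped as well, which matches the behavior described earlier for key/value pairs.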

We don't do anything other than UTF-8 or that, because where would it stop? There are too many parts of a URL, there are IRIs, and they all have their own kinks. Once we are there, why not also provide a punycode encoder for the netloc?

I hit this issue when trying to create a file in a gitlab repository via API calls. The gitlab api requires the slashes to be encoded. To make this work I do: {{ myvar | urlencode | regex_replace('/','%2F') }}. I'm working with Ansible and Jinja2 filters in my playbook tasks. This could be a workaround for those of you hitting this, as I validated it works.

I see some debate about which characters should be percent-encoded. To my understanding, this is currently covered in RFC 3986: https://tools.ietf.org/html/rfc3986#section-2.2

Though for my use case, @ahuffman has provided a reasonable workaround.

I want to close this because it came up again. My arguments against this were already made earlier in this thread, but I want to reiterate them because there is currently a PR open (#864) which proposes adding another filter for this to help with GitLab's API.

GitLab's API is broken if placed behind all kinds of proxy setups, and there are issues open for it (Example Issue). I would instead propose documenting a workaround and explaining why people should not do this.

Example workaround could be this:

{{ value|urlencode|replace("/", "%2f") }}

This is taking a bit of a stance, but doing so can encourage people not to repeat the mistakes of others who came before them.
