Werkzeug: Werkzeug not conforming to RFC2616

Created on 26 May 2016  ·  6Comments  ·  Source: pallets/werkzeug

In particular, internally Werkzeug converts headers of type "Transfer-Encoding" or "X-PoweredBy" into "HTTP_TRANSFER_ENCODING" and "X_POWERED_BY" respectively. However, if two headers "App-Header-1" and "App_Header_1" were both present in the request -- or even just "App_Header_1" -- both will be stored in the same way internally, moreover Werkzeug does not distinguish between the two. This behavior is not conforming to the RFC2616 HTTP 1.1 Spec, where the Header name take can be any valid token without control or separator characters. This has many issues, as if a request came in with "App_Header_1" it will then be converted to "HTTP_APP_HEADER_1", now frameworks like Flask will interpret it as "App-Header-1" which is not the same, and passing those values on to systems that do depend on the correct behavior will fail as "App_Header_1" != "App-Header-1" under any sane comparison strategy.

Most helpful comment

The behavior is not caused by the WSGI spec, if you are referring to PEP 3333.
PEP 3333 states that:
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_" ). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.

There is nothing in the Spec about treating underscores differently. In fact, you are not conforming to PEP 3333 either because I cannot know which header is present or absent, i.e. if "App_H" or "App-H" is present.

All 6 comments

This behavior is caused by the WSGI spec itself, and it is impossible for Werkzeug to make a distinction between those two headers while still adhering to the WSGI spec. I don't know of any situation where this has been an issue in practice.

RFC 2616 is deprecated by the way, but the newer RFCs don't change anything in that regard anyway.

Also Werkzeug is not responsible for parsing or writing HTTP requests or responses on a syntactical level. This is done by whichever WSGI server you choose. Werkzeug does have a built-in server, but that only extends the WSGI server from the standard library.

The behavior is not caused by the WSGI spec, if you are referring to PEP 3333.
PEP 3333 states that:
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_" ). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.

There is nothing in the Spec about treating underscores differently. In fact, you are not conforming to PEP 3333 either because I cannot know which header is present or absent, i.e. if "App_H" or "App-H" is present.

WSGI is a Python-specific port of CGI. PEP 0333 references the CGI spec for the definition of "CGI environment variables", which states:

The HTTP header field name is converted to upper case, has all occurrences of "-" replaced with "_" and has `HTTP_' prepended to give the meta-variable name.

See https://tools.ietf.org/html/rfc3875#section-4.1.18

All of this is irrelevant as Werkzeug, for the most part, does not parse raw HTTP requests and does not produce the WSGI environment. It parses the WSGI environment. I don't see why you're filing this issue against this particular implementation. Don't shoot the messenger.

Basically there's a lot of legacy from CGI here, which is inherited by WSGI, which is what Werkzeug is trying to implement.

I really don't mean to start a fight; But it is a problem with security implications, especially for web servers that sit behind "legacy"-ish application gateways that pass in sensitive / application specific information in headers containing underscores without filtering other headers (I have concrete examples, that is, this is actually an issue in practice, which you have dismissed by your initial comment). CGI is over 20 years old, hence WSGI and HTTP specs should take precedence.

As for raw parsing:
https://github.com/pallets/werkzeug/blob/03faf0569861e9d8c8c94785ad5560f735ba72da/werkzeug/serving.py#L123

(I have concrete examples, that is, this is actually an issue in practice, which you have dismissed by your initial comment)

Then all I can say is that I'm sorry you have a server like that. Werkzeug can't do anything about it. The snippet you showed was precisely why I said "for the most part". Werkzeug also includes a simple testserver that can be used in development (to avoid having to set up Gunicorn + Nginx/Apache, for example). But if you're exposing that server to the internet, the potential issue you mention will be the least of your (security) problems.

WSGI was designed to be compatible with CGI to the most extent possible. WSGI does not override or supersede CGI. WSGI _is_ CGI. There is very little work involved in making a WSGI application a valid CGI application. os.environ of a Python CGI script looks mostly like the equivalent WSGI environment. There is a lot of legacy to consider when changing anything about this. Werkzeug as a project doesn't have a say in this, and especially not me.

Again, this issue for this project is pointless. Werkzeug is not alone in this behavior. Last time I checked nginx blocks HTTP headers with underscores by default, due to the whole ambiguity issue. Most other tools probably just conflate them. If most tools can't tell the difference between a dash and a underscore, it does not matter anymore when the standard says there is one. Everybody is forced to play along, and eventually the tool emitting headers with underscores is to blame.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

golf-player picture golf-player  ·  10Comments

miki725 picture miki725  ·  10Comments

mhelmetag picture mhelmetag  ·  8Comments

taion picture taion  ·  7Comments

lepture picture lepture  ·  6Comments