Evolving HTTP Header Fields

Created by Julian Reschke, greenbytes GmbH / julian.reschke@greenbytes.de

Topics

Specification Problems
Implementation Problems
Internationalisation
Further Observations
A Concrete Proposal
Other Thoughts

Specification Problems

many similar header fields with subtle parsing differences
quoting, delimiters, allowable characters
repeating header fields vs list syntax

Similar but different

List of parametrized tokens.

WWW-Authenticate, Cache-Control, Content-Type, Link...

parameters comma-separated (WWW-Authenticate) or semicolon-separated (Content-Type)
tokens sometimes have values (Cache-Control)
parameter values sometimes optional
different types of token quoting (<uri> in Link header field)
double quotes not always used for quoted-string

List Syntax

Spec allows recombining multiple field values using comma as delimiter

If the field syntax allows unquoted commas, it may be impossible to detect whether recombination has happened
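
A minimal sketch of the ambiguity, assuming a hypothetical field whose values may contain unquoted commas (the date-like strings and the helper names are illustrative only):

```python
def combine(field_lines):
    """Join several field lines into one value, as the list rule permits."""
    return ", ".join(field_lines)


def naive_split(value):
    """Split on every comma, with no knowledge of the field's own syntax."""
    return [part.strip() for part in value.split(",")]


# Two field lines, each containing an unquoted comma.
two_lines = ["Mon, 01 Jan", "Tue, 02 Jan"]
# A single field line that carries the same text.
one_line = ["Mon, 01 Jan, Tue, 02 Jan"]

# After recombination the two cases are byte-for-byte identical,
# so a recipient cannot tell how many values were originally sent.
assert combine(two_lines) == combine(one_line)

# Splitting on commas does not recover the original two values either.
parts = naive_split(combine(two_lines))
```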

Implementation Problems

Recipients

lots of ad-hoc parsers (simple string comparison, substring matching), for instance for Cache-Control and WWW-Authenticate
inconsistent list handling (read all, read first only, read last only)
race to the bottom: if one recipient accepts garbage, others will follow
parsing bugs make it harder to introduce new things by using extension points
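
To illustrate how substring matching goes wrong, here is a sketch that compares an ad-hoc check with a small quote-aware parser for Cache-Control. The function names and the "ext" extension directive are hypothetical, and the parser is deliberately simplified rather than a complete implementation of the field grammar:

```python
def naive_has_no_cache(value):
    """Ad-hoc check: substring match anywhere in the field value."""
    return "no-cache" in value


def split_list(value, sep=","):
    """Split on sep, but not inside double-quoted strings."""
    parts, current = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_cache_control(value):
    """Map each directive name (lower-cased) to its raw argument or None."""
    result = {}
    for item in split_list(value):
        name, _, arg = item.partition("=")
        result[name.strip().lower()] = arg.strip() or None
    return result


# An extension directive whose quoted argument merely contains the text.
value = 'max-age=60, ext="no-cache"'
```

With this value, the substring check reports a no-cache directive that is not there, while the parser sees only max-age and ext. This is the kind of bug that makes extension points risky to use.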

Implementation Problems

Senders

header field values generated by string concatenation, causing invalid field values in edge cases (filename=foo bar.txt)
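
A sketch of the sender-side fix: emit the value as a token when it is one, otherwise as a quoted-string. The helper names are hypothetical, and the code assumes the value is printable ASCII (non-ASCII needs a separate mechanism):

```python
import re

# Characters allowed in an HTTP token.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def naive_param(name, value):
    """String concatenation: breaks as soon as value is not a token."""
    return f"{name}={value}"


def safe_param(name, value):
    """Use the token form when possible, otherwise a quoted-string."""
    if TOKEN.fullmatch(value):
        return f"{name}={value}"
    # Escape backslash first, then the double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


broken = naive_param("filename", "foo bar.txt")  # filename=foo bar.txt
fixed = safe_param("filename", "foo bar.txt")    # filename="foo bar.txt"
```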

Internationalisation

no generic solution for non-ASCII characters
this includes HTTP/2
UTF-8 can work when intermediaries/libs do not interfere
but HTTP APIs (XHR, servlet API) default to ISO-8859-1 (though it can be used to tunnel UTF-8)
servers/clients depend on locale or context (e.g. the referring page), so messages are no longer self-contained
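
What "tunnelling" UTF-8 through an ISO-8859-1 API looks like, as a sketch (the variable names are illustrative, and a real API would do the first decoding step internally):

```python
text = "€ rates"

# The sender puts UTF-8 octets on the wire.
wire = text.encode("utf-8")

# An API that assumes ISO-8859-1 hands the recipient one character
# per octet, which is not the original text.
as_seen_by_api = wire.decode("iso-8859-1")

# A recipient that knows about the convention can undo the mapping,
# because ISO-8859-1 maps every octet to exactly one code point.
recovered = as_seen_by_api.encode("iso-8859-1").decode("utf-8")
```

The round trip only works if nothing in between alters the octets and both ends agree on the convention out of band.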

RFC 5987

(ugly) solution for parameters, used in Content-Disposition and Link header fields
filename*= UTF-8''%e2%82%ac%20rates
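
A sketch of encoding and decoding such a value with Python's standard library. The function names are hypothetical, the language tag is left empty, and the decoder trusts the charset label, which real code should restrict to a known list:

```python
from urllib.parse import quote, unquote

# Non-alphanumeric characters that RFC 5987 allows unescaped (attr-char).
ATTR_CHARS = "!#$&+-.^_`|~"


def encode_ext_value(text):
    """Encode text as charset'language'percent-encoded-octets."""
    return "UTF-8''" + quote(text, safe=ATTR_CHARS, encoding="utf-8")


def decode_ext_value(ext_value):
    """Split off charset and language, then percent-decode the rest."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)


encoded = encode_ext_value("€ rates")
decoded = decode_ext_value("UTF-8''%e2%82%ac%20rates")
```

Note that quote() emits upper-case hex digits, whereas the example above uses lower case; both forms decode to the same octets.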

Further Observations

only a few header fields need I18N (it needs to work, but it doesn't necessarily need to be compact)
only a few header fields are sent frequently and thus need a compact representation ("DNT", "Date", "User-Agent", ...)
HPACK is wonderful for field values that do not change all the time

A Concrete Proposal

leave existing header fields alone for now
start with a format that can be used in new header fields
find something that helps in general (in HTTP/1.1 and HTTP/2) but which also makes things easier in the future (hpack2?)
have a unified data model, and a syntax that works well for that data model

JSON

draft-reschke-http-jfv
solves most of the issues mentioned before
except verbosity
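
A sketch of how a JSON-based field could be processed, assuming the approach taken in the draft: field lines are combined with commas, wrapped in "[" and "]", and handed to a JSON parser, with non-ASCII characters sent as \u escapes. The function names are hypothetical, and the draft itself remains the authority on details:

```python
import json


def parse_json_field(field_lines):
    """Combine field lines and parse the result as one JSON array."""
    combined = ", ".join(field_lines)
    return json.loads("[" + combined + "]")


def serialize_json_field(values):
    """Serialize values as a comma-separated list of ASCII-only JSON texts."""
    return ", ".join(json.dumps(v, ensure_ascii=True) for v in values)


# Commas inside strings are unambiguous, and repeated field lines
# simply become further array elements.
received = parse_json_field(['"foo, bar"', '{"a": 1}'])

# Non-ASCII text survives as escapes, so the field stays within ASCII.
sent = serialize_json_field(["€ rates"])
```

This addresses the list, quoting, and I18N problems with a single data model, at the cost of the extra quotes and escapes noted above.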

Other Thoughts

Name overloading has been done before: "Sec-..." for XHR, "param*" in RFC 5987, but that doesn't mean we like it...
Alternatives to defining new header fields: use existing fields such as "Link" and "Prefer"