Evolving HTTP Header Fields

Created by Julian Reschke, greenbytes GmbH / julian.reschke@greenbytes.de

Topics

  • Specification Problems
  • Implementation Problems
  • Internationalisation
  • Further Observations
  • A Concrete Proposal
  • Other Thoughts

Specification Problems

  • many similar header fields with subtle parsing differences
  • quoting, delimiters, allowable characters
  • repeating header fields vs list syntax

Similar but different

List of parametrized tokens.

WWW-Authenticate, Cache-Control, Content-Type, Link...

  • parameters comma-separated (WWW-Authenticate) or semicolon-separated (Content-Type)
  • tokens sometimes have values (Cache-Control)
  • parameter values sometimes optional
  • different types of token quoting (<uri> in Link header field)
  • double quotes not always used for quoted-string

List Syntax

Spec allows recombining multiple field values using comma as delimiter

If the field syntax allows unquoted commas, it may be impossible to detect whether recombination has happened
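
The ambiguity can be illustrated with a minimal sketch. The helper names `combine` and `naive_split` are hypothetical, and the date-like value only stands in for any field syntax that permits an unquoted comma:

```python
def combine(field_lines):
    """Recombine repeated field lines into one value, comma-delimited."""
    return ", ".join(field_lines)


def naive_split(value):
    """Split on every comma, ignoring the field's own syntax."""
    return [part.strip() for part in value.split(",")]


# One field line whose single value contains an unquoted comma.
one_line = ["Mon, 01 Jan 2024 00:00:00 GMT"]
# Two field lines that happen to carry the two halves separately.
two_lines = ["Mon", "01 Jan 2024 00:00:00 GMT"]

# After recombination the two inputs are byte-for-byte identical,
# so a recipient cannot tell which one was sent.
combined_one = combine(one_line)
combined_two = combine(two_lines)
members = naive_split(combined_one)
```

Here `combined_one` and `combined_two` compare equal, and `naive_split` reports two list members even for the single-value case.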

Implementation Problems

Recipients

  • lots of ad-hoc parsers (simple string comparison, substring matching), for instance for Cache-Control and WWW-Authenticate
  • inconsistent list handling (read all, read first only, read last only)
  • race to the bottom: if one recipient accepts garbage, others will follow
  • parsing bugs make it harder to introduce new things by using extension points
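
As an illustration of why ad-hoc parsers get in the way of extensions, the sketch below compares a substring check with a small list splitter that respects quoted-strings. Both functions and the `ext` directive are hypothetical, and the splitter is deliberately simplified (it does not validate tokens):

```python
def adhoc_is_no_cache(value):
    """Ad-hoc check: plain substring matching on the raw field value."""
    return "no-cache" in value.lower()


def split_list(value):
    """Split on commas that are outside double-quoted strings."""
    items, current = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


def directive_names(value):
    """Return the lower-cased directive names found in the list."""
    return {item.split("=", 1)[0].strip().lower() for item in split_list(value)}


# A made-up extension directive whose quoted value mentions "no-cache".
value = 'max-age=60, ext="a, no-cache"'
adhoc_result = adhoc_is_no_cache(value)               # substring match fires
parsed_result = "no-cache" in directive_names(value)  # no such directive
```

The substring check reports a `no-cache` directive that is not there, which is the kind of bug that makes new extension directives risky to deploy.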

Implementation Problems

Senders

  • header field values generated by string concatenation, causing invalid field values in edge cases (filename=foo bar.txt)
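
A minimal sketch of the difference between concatenation and syntax-aware generation. `format_param` is a hypothetical helper, and it assumes the value is printable ASCII (non-ASCII values need a separate mechanism):

```python
import re

# Characters allowed in an HTTP token (no spaces, quotes, or separators).
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def format_param(name, value):
    """Emit name=value, switching to a quoted-string when value is not a token."""
    if TOKEN.fullmatch(value):
        return f"{name}={value}"
    # Inside a quoted-string, backslash and double quote must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


filename = "foo bar.txt"
naive = "attachment; filename=" + filename              # invalid: bare space
safe = "attachment; " + format_param("filename", filename)
```

With this helper, `safe` becomes `attachment; filename="foo bar.txt"`, while a value such as `foo.txt` is still sent unquoted.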

Internationalisation

  • no generic solution for non-ASCII characters
  • this includes HTTP/2
  • UTF-8 can work when intermediaries/libs do not interfere
  • but HTTP APIs (XHR, servlet API) default to ISO-8859-1 (though it can be used to tunnel UTF-8)
  • servers/clients depend on locales or context (referring pages), so messages are no longer self-contained
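
The "tunnelling" mentioned above can be sketched as follows. The function names are hypothetical; the point is only that an API which maps every octet to an ISO-8859-1 character passes UTF-8 bytes through unchanged, provided both ends agree on the trick:

```python
def tunnel_utf8(text):
    """Sender side: UTF-8 octets presented as ISO-8859-1 characters."""
    return text.encode("utf-8").decode("iso-8859-1")


def untunnel_utf8(value):
    """Recipient side: recover the octets, then decode them as UTF-8."""
    return value.encode("iso-8859-1").decode("utf-8")


original = "€ rates"
on_the_api = tunnel_utf8(original)      # what an ISO-8859-1 API exposes
recovered = untunnel_utf8(on_the_api)   # only correct if the recipient knows
```

The intermediate string is mojibake (three characters for the euro sign), so a recipient that is unaware of the convention displays garbage.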

RFC 5987

  • (ugly) solution for parameters, used in Content-Disposition and Link header fields
  • filename*= UTF-8''%e2%82%ac%20rates
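
A minimal sketch of producing and reading such a value with Python's standard library. The helper names are hypothetical; `quote` emits upper-case hex digits, which are equivalent to the lower-case ones in the example above:

```python
from urllib.parse import quote, unquote


def encode_ext_value(text, charset="UTF-8", language=""):
    """Build charset'language'percent-encoded-octets."""
    return f"{charset}'{language}'{quote(text, safe='', encoding=charset)}"


def decode_ext_value(ext_value):
    """Split at the first two apostrophes and percent-decode the rest."""
    charset, language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset), language


encoded = encode_ext_value("€ rates")
decoded, lang = decode_ext_value("UTF-8''%e2%82%ac%20rates")
```

A sender would then emit `filename*=` followed by `encoded`; the decoder above accepts both hex cases.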

Further Observations

  • only a few header fields need I18N (it needs to work, but it doesn't necessarily need to be compact)
  • only a few header fields are sent frequently and thus need a compact representation ("DNT", "Date", "User-Agent", ...)
  • HPACK is wonderful for field values that do not change all the time

A Concrete Proposal

  • leave existing header fields alone for now
  • start with a format that can be used in new header fields
  • find something that helps in general (in HTTP/1.1 and HTTP/2) but also makes things easier in the future (hpack2?)
  • have a unified data model, and a syntax that works well for that data model

JSON
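
The slide names only the format, so the following is a sketch under stated assumptions rather than the proposal itself. It assumes that a field value is a comma-separated sequence of JSON values (so that recombining field lines is harmless) and that non-ASCII characters are sent as `\uXXXX` escapes; both helper names are hypothetical:

```python
import json


def to_field_value(data):
    """Serialise one value; ensure_ascii keeps the result plain ASCII."""
    return json.dumps(data, ensure_ascii=True, separators=(",", ":"))


def from_field_lines(lines):
    """Assumed convention: join the lines with commas and parse as one array."""
    return json.loads("[" + ",".join(lines) + "]")


value = to_field_value({"filename": "€ rates"})

# Two field lines and their recombined form parse to the same data.
separate = from_field_lines(['{"a":1}', '{"b":2}'])
recombined = from_field_lines(['{"a":1}, {"b":2}'])
```

Under these assumptions one data model covers quoting, lists, and non-ASCII text, and a generic JSON parser replaces the per-field ad-hoc parsers.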

Other Thoughts

  • Name overloading has been done before: "Sec-..." for XHR, "param*" in RFC 5987, but that doesn't mean we like it...
  • Alternatives to defining new header fields: use existing fields such as "Link" and "Prefer"