• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • History of the Public Suffix List

    February 7th, 2021
    history, tech
    Are forums.example.com and mail.example.com the same site? I'd say yes, since they're probably run by the same people. What about example-a.github.io and example-b.github.io? I'd say no, since GitHub allows anyone to register pages like username.github.io. I can make my judgments as a human, but what should the browser do? Should www.example.com be able to set a cookie that will be sent to mail.example.com?

    It is a bit of a hack, but browsers handle this with a big list: the Public Suffix List (PSL). The PSL contains, for example, com and github.io, which tells us that example.com and example.github.io are each independent sites. Subdomains below that level, on the other hand, are not separate sites: forums.example.com and mail.example.com both belong to example.com. Have a look, it's pretty hairy: public_suffix_list.dat
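    To make the lookup concrete, here is a minimal Python sketch of how a browser might turn a hostname into its site. The RULES string is a tiny hand-picked excerpt assumed for illustration, not the real file, and the function names are hypothetical. The matching roughly follows the algorithm documented at publicsuffix.org: an exception rule (prefixed with !) wins, otherwise the matching rule with the most labels wins, * matches any single label, and if nothing matches the implicit default rule * makes the last label the suffix. It skips details like internationalized domain names.

```python
# Hypothetical excerpt of public_suffix_list.dat: a handful of rules picked
# for illustration, not the real file.
RULES = """
// Comment lines start with two slashes.
com
uk
co.uk
io
github.io
*.ck
!www.ck
"""


def parse_rules(text):
    """Split list text into ordinary/wildcard rules and exception (!) rules."""
    rules, exceptions = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("!"):
            exceptions.add(line[1:])
        else:
            rules.add(line)
    return rules, exceptions


def rule_matches(rule, labels):
    """True if `rule` matches the rightmost labels; '*' matches any one label."""
    rule_labels = rule.split(".")
    if len(rule_labels) > len(labels):
        return False
    tail = labels[-len(rule_labels):]
    return all(r == "*" or r == l for r, l in zip(rule_labels, tail))


def public_suffix(host, rules, exceptions):
    labels = host.lower().rstrip(".").split(".")
    # An exception rule wins; its suffix is the rule minus its leftmost label.
    for exc in exceptions:
        if rule_matches(exc, labels):
            n = len(exc.split("."))
            return ".".join(labels[len(labels) - n + 1:])
    # Otherwise the matching rule with the most labels wins. If nothing
    # matches, the implicit default rule "*" makes the last label the suffix.
    best = 1
    for rule in rules:
        if rule_matches(rule, labels):
            best = max(best, len(rule.split(".")))
    return ".".join(labels[-best:])


def site(host, rules, exceptions):
    """Registrable domain (public suffix plus one label), or None."""
    labels = host.lower().rstrip(".").split(".")
    n = len(public_suffix(host, rules, exceptions).split("."))
    if len(labels) <= n:
        return None  # the host is itself a public suffix
    return ".".join(labels[-(n + 1):])


rules, exceptions = parse_rules(RULES)
for h in ["forums.example.com", "mail.example.com",
          "example-a.github.io", "example-b.github.io", "www.bbc.co.uk"]:
    print(h, "->", site(h, rules, exceptions))
```

    With these rules, forums.example.com and mail.example.com both map to example.com, while example-a.github.io and example-b.github.io each map to themselves, so they are treated as separate sites.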

    Browsers are somewhat ashamed of how hacky the concept of a "site" is, and nervous about the security risk of omissions from the list, so they have generally used the much stricter concept of an origin when introducing new functionality. For example, https://a.example.com cannot write to localStorage in a way visible to https://b.example.com. As browsers work to prevent cross-site tracking with privacy changes such as cache partitioning, however, the origin model is too strict. These mitigations generally use the PSL, so I wanted to look back at its origins.

    HTTP was originally completely stateless. This poses challenges if you want to implement per-user functionality, like a shopping cart. Netscape's solution, which the world adopted, was cookies. If you read the original specification, it has some discussion of how to prevent someone setting a cookie on all of .com:

    Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that fails within one of the seven special top level domains listed below only require two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT".

    This simple heuristic worked reasonably well at the time: it understood that example.com and example.co.uk were each independent sites, separate from other .com or .co.uk sites.

    Perhaps because this rule special-cased particular top-level domains, it was left out of the first two attempts to standardize cookies, RFC 2109 (Feb 1997) and RFC 2965 (Oct 2000).

    There were, even from the beginning, cases that this heuristic did not handle. My library growing up was mln.lib.ma.us, which ideally would not have shared cookies with anything else under lib.ma.us, but .lib.ma.us has three periods, so the rule allowed cookies to be set on it. Things got worse in 2000, when ICANN announced seven more TLDs: since .info and the others weren't on the special list, browsers initially did not allow anyone to set cookies on domains like example.info. It wasn't too bad, since you could still set a cookie on www.example.info, but you couldn't share it with forum.example.info.
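    The period-counting rule quoted above, and the two failures just described, fit in a few lines. This is a minimal Python sketch, not Netscape's code: the function name is hypothetical, it assumes the cookie's domain is written with a leading dot as in the spec's own examples, and it uses the spec's plain tail match.

```python
# The seven special top-level domains named in Netscape's cookie spec.
SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def netscape_allows(host, domain):
    """Hypothetical sketch: may `host` set a cookie with domain=`domain`?

    Assumes `domain` is written with a leading dot, e.g. ".example.com".
    """
    host, domain = host.lower(), domain.lower()
    # "Only hosts within the specified domain can set a cookie for a domain"
    # (the spec uses a plain tail match).
    if not host.endswith(domain):
        return False
    # Two periods suffice under the special TLDs; everything else needs three.
    tld = domain.rsplit(".", 1)[-1]
    required = 2 if tld in SPECIAL_TLDS else 3
    return domain.count(".") >= required


for host, domain in [
    ("www.example.com", ".com"),              # rejected: one period
    ("www.example.com", ".example.com"),      # allowed: two periods, special TLD
    ("www.example.co.uk", ".example.co.uk"),  # allowed: three periods
    ("www.example.co.uk", ".co.uk"),          # rejected: only two periods
    ("mln.lib.ma.us", ".lib.ma.us"),          # allowed, though ideally it wouldn't be
    ("www.example.info", ".example.info"),    # rejected: .info isn't special
]:
    print(host, domain, netscape_allows(host, domain))
```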

    In 2005-2006, Mozilla decided to replace its inconsistent collection of heuristics and exceptions with an explicit list (b319643, b331510), effective_tld_names.dat. You can see the first public version on GitHub (Mar 2007).

    The next round of cookie standardization, RFC 6265 in 2011, recommended that user agents use it:

    NOTE: A "public suffix" is a domain that is controlled by a public registry, such as "com", "co.uk", and "pvt.k12.wy.us". This step is essential for preventing attacker.com from disrupting the integrity of example.com by setting a cookie with a Domain attribute of "com". Unfortunately, the set of public suffixes (also known as "registry controlled domains") changes over time. If feasible, user agents SHOULD use an up-to-date public suffix list, such as the one maintained by the Mozilla project at <http://publicsuffix.org/>.
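    RFC 6265 applies this check in its storage model (section 5.3): a Domain attribute naming a public suffix is rejected unless it is exactly the host that sent the cookie, in which case the cookie becomes host-only. Below is a minimal Python sketch of that step. It assumes a tiny hard-coded PUBLIC_SUFFIXES set in place of the real list, skips the RFC's IP-address and canonicalization details, and the function names are hypothetical.

```python
# Hypothetical stand-in for an up-to-date public suffix list. A real user
# agent would use the full list, including wildcard and exception rules.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "io", "github.io"}


def domain_match(host, domain):
    """RFC 6265 domain-match (section 5.1.3), ignoring the IP-address case."""
    return host == domain or host.endswith("." + domain)


def cookie_domain(request_host, domain_attr=None):
    """Return (domain, host_only) for a new cookie, or None to ignore it."""
    host = request_host.lower()
    domain = (domain_attr or "").lower()
    if domain.startswith("."):
        domain = domain[1:]  # a single leading dot is dropped
    # Public suffix check: only allowed if it names the request host itself,
    # in which case the cookie becomes host-only.
    if domain in PUBLIC_SUFFIXES:
        if domain != host:
            return None
        domain = ""
    # A remaining Domain attribute must domain-match the request host.
    if domain:
        if not domain_match(host, domain):
            return None
        return domain, False
    return host, True


print(cookie_domain("attacker.com", "com"))              # None
print(cookie_domain("www.example.com", ".example.com"))  # ('example.com', False)
print(cookie_domain("www.example.com"))                  # ('www.example.com', True)
```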

    This still doesn't explain how github.io got on the list: that's not a public registry, the way co.uk is. The first private registry to be added was operaunite.com, in November 2009 (b531252):

    The domain operaunite.com is used by Opera's new Unite feature (a small web server built into Opera 10.10, http://unite.opera.com/). Each instance of Opera Unite have a name server.username.operaunite.com. While some restrictions are being implemented in Unite, there are still some ways to set cookies for the operaunite.com domain, and we would like to restrict the impact by adding this domain to the public suffix list.

    Next were appspot.com for App Engine and blogspot.com for Blogger (b593818), though the Blogger change was rolled back for two years (b598911, b805367). These changes seem to have been uncontroversial; I don't see any pushback about how these are not "real TLDs".

    As more of these came in, there was discussion about how private domains like these were a fundamentally different concept from registry-run suffixes (b712640, 2011), and the list was split into public ("BEGIN ICANN DOMAINS") and private ("BEGIN PRIVATE DOMAINS") sections. For example, no one should be able to get a wildcard cert for *.co.uk, but one for *.github.io still makes sense.
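    Because both kinds of entries live in one file, a consumer that cares about the difference has to track which section each rule sits in. A minimal Python sketch, assuming the sections are delimited by comment lines containing markers like ===BEGIN ICANN DOMAINS=== and ===BEGIN PRIVATE DOMAINS=== (with matching END lines); the SAMPLE text is a made-up excerpt, and wildcard_allowed is a hypothetical policy that only restates the certificate example above, not any CA's actual rule.

```python
# Hypothetical excerpt in the public_suffix_list.dat layout.
SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
blogspot.com
// ===END PRIVATE DOMAINS===
"""


def rules_by_section(text):
    """Map each rule to 'icann' or 'private' based on the enclosing markers."""
    section = None
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            if "BEGIN ICANN DOMAINS" in line:
                section = "icann"
            elif "BEGIN PRIVATE DOMAINS" in line:
                section = "private"
            elif "END ICANN DOMAINS" in line or "END PRIVATE DOMAINS" in line:
                section = None
            continue
        if line and section:
            result[line] = section
    return result


def wildcard_allowed(domain, sections):
    """Hypothetical policy: refuse *.<domain> only for ICANN suffixes."""
    return sections.get(domain) != "icann"


sections = rules_by_section(SAMPLE)
print(wildcard_allowed("co.uk", sections))      # False
print(wildcard_allowed("github.io", sections))  # True
```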

    Over the last ten years, I believe everyone has migrated to using Mozilla's list. It does take some time for updates to fully propagate, since the list is compiled into browsers, but having one place to update and one place to check for the definition of a site is pretty good.

