|August 26th, 2020|
|privacy, tech, webperf [html]|
Peter Snyder of Brave recently wrote a post on the proposed WebBundle specification arguing that it makes URL randomization easier and so is harmful to content blocking. The core of the post is incorrect, however, and based on a misunderstanding of what can be done today and what WebBundles make easier. Snyder and I discussed this some on HN and I wanted to take a step back and try to write up the issues more clearly.
A WebBundle (see explainer) allows you to serve a single file that the browser can treat as multiple files. This solves many long-standing problems, some of which I've been working around for a long time. A classic one is that many sites will have a large number of small files. Everyone who visits the site will need essentially all of these files, and you lose a lot of performance in individually requesting each one. When I worked on open source web page optimization software (mod_pagespeed) it could combine CSS, JS, and images so the browser could request a single 'bundle' for each, but it was not ideal:
Instead of one bundle containing the resources the site needs, you have at least three: CSS, JS, images.
There are differences in error handling and recovery that mean you can't just concatenate CSS or JS files. Even with code to work around these issues there were still differences that meant automatic bundling was a bit risky.
Image combining (spriting) is especially difficult, and our tool was less of a fully automatic image combiner than a tool to take the toil out of doing it manually.
A WebBundle is a much more straight-forward approach, where you tell the browser explicitly about your bundling instead of trying to glue a bunch of things together and pass them off to the browser as one.
I'm no longer working on mod_pagespeed, but these issues still come up; in my work now I'm interested in using WebBundles to allow a single ad response to provide multiple ads.
My understanding of Snyder's view, from their post, HN comments, and spec comments, is that bundles make it much easier to randomize URLs, and bypass URL-based content blockers. Specifically, that if you deliver a single bundle with both HTML and resources, it becomes very easy for you to randomize URLs.
On the other hand, sites like this are few. It is worth it for Facebook or Google Search to run their own ads, but most sites instead use an ad network. These networks typically integrate client-side, with JS. For example, to put AdSense ads on your page you include something like:
<script async src="https://pagead2.googlesyndication.com/ pagead/js/adsbygoogle.js"></script> <ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-NNNNNNNN" data-ad-slot="NNNNNNNN" data-ad-format="auto" data-full-width-responsive="true"> </ins>
adsbygoogle.js (the ad JS), which reads
configuration from the
<ins> and handles requesting
and rendering the ads. Ad blockers recognize
adsbygoogle.js by its URL and prevent it from loading and
Neither the publisher site nor the ad network can randomize the URL of the ad JS on their own: it's a cross-party integration point. If they worked closely together, perhaps by moving integration from client-side to server-side, then they could randomize URLs easily even with today's technology. A server-side integration is much more technically difficult, however, which is why I think we rarely see them. Bundles don't change the nature of this situation: a client-side integration still needs a consistent (and, hence, blockable) URL to load the ad JS, while a server-side integration doesn't become any easier.
Snyder is concerned about a world in which ad blockers aren't able to operate because they can't recognize ad resources by URL. While I think this is a reasonable concern, the WebBundle proposal is orthogonal to this problem, and does not bring us any closer to that sort of world.