Ad Fraud Detection Prediction Market

January 17th, 2023
ads, tech
In my post on the GDPR status of ads I wrote that I expected that European data protection agencies would more likely than not rule that collecting personal information for ad fraud detection requires consent:

Is it within the legitimate interests of sites to collect user data for ad fraud detection? The ad industry has historically thought that it was. For example, the IAB's TCFv2, the standard protocol consent popups use to talk to ad networks, categorizes ad fraud detection under "Special Purpose 1", with users having "No right-to-object to processing under legitimate interests". On the other hand, based on points 52 and 53 of the recent Microsoft ruling I would predict that French regulators would rule that since users do not visit sites to see ads, sites cannot claim that they have a legitimate interest in using personal data to attempt to determine whether their ads are being viewed by real people.

This is not settled; among other things the Microsoft ruling was primarily considering ePrivacy which is stricter on some points. But I think it's more likely than not that when we get clarity from the regulators it will turn out that the kind of detailed tracking of user behavior necessary for effective detection of ad fraud is not considered to be within a publisher's legitimate interests.

There was some informed pushback on this, from Hugo Roy and Michael Kleber: a privacy lawyer and a privacy engineer. This has definitely pushed me in the direction of thinking I've misunderstood the situation and it's more likely that the conventional ad industry interpretation is correct. Which would be a good thing in my book: as I said at the end of my post I do think ad fraud detection should be permitted without user opt-in.

But I did want to write more about why I held, and to some extent still hold, that view. As Hugo referenced, fraud detection is specifically called out in the GDPR as a legitimate interest:

The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned.
Recital 47

The main way I could see a decision preventing the continued operation of economically effective ad fraud detection is that a court might rule that the status quo involves collecting too much. Ad fraud detection looks something like collecting every signal you can and then looking for patterns that distinguish people from bots. How do you know if a signal will be useful? Start logging it and feed it into the analysis. A company might have trouble convincing a court that they really need all this data, especially when there's collection they can't justify in terms of current utility. But if collection is limited to what you can show is immediately useful, with no way to learn from real traffic which new signals might be worth logging, then more and more bots will get around detection.
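To make the "log it and see" loop concrete, here is a minimal sketch, not a description of how any real fraud detection system works. Everything in it is an assumption for illustration: the signal names, the values, and the idea that some sessions already carry a trusted `is_bot` label. It scores each logged signal by how far apart the human and bot averages sit:

```python
from statistics import mean

# Hypothetical logged sessions. The signal names and values are invented,
# and "is_bot" stands in for labels obtained some other way.
SESSIONS = [
    {"is_bot": False, "mouse_moves": 42, "ms_to_first_click": 1800, "screen_width": 1440},
    {"is_bot": False, "mouse_moves": 57, "ms_to_first_click": 2400, "screen_width": 1920},
    {"is_bot": False, "mouse_moves": 35, "ms_to_first_click": 1500, "screen_width": 1366},
    {"is_bot": True, "mouse_moves": 0, "ms_to_first_click": 120, "screen_width": 1920},
    {"is_bot": True, "mouse_moves": 1, "ms_to_first_click": 90, "screen_width": 1366},
    {"is_bot": True, "mouse_moves": 0, "ms_to_first_click": 150, "screen_width": 1440},
]


def separation(sessions, signal):
    """Gap between the human and bot means, as a fraction of the signal's range."""
    humans = [s[signal] for s in sessions if not s["is_bot"]]
    bots = [s[signal] for s in sessions if s["is_bot"]]
    values = humans + bots
    spread = max(values) - min(values)
    if spread == 0:
        return 0.0  # a constant signal cannot distinguish anything
    return abs(mean(humans) - mean(bots)) / spread


def rank_signals(sessions):
    """Order the logged signals from most to least separating."""
    signals = [name for name in sessions[0] if name != "is_bot"]
    return sorted(signals, key=lambda name: separation(sessions, name), reverse=True)


if __name__ == "__main__":
    for name in rank_signals(SESSIONS):
        print(name, round(separation(SESSIONS, name), 3))
```

On this made-up data `screen_width` scores zero and ranks last, but you only learn that after collecting it, which is exactly the collection that is hard to justify in terms of current utility.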

A secondary way I could see a decision like this happening is if a data protection agency decided that, while a publisher has a legitimate interest in preventing itself from being defrauded by the user, it doesn't have a legitimate interest in collecting data (or delegating that collection) to demonstrate that it is not defrauding its advertisers. Yes, there is a sense in which the data collection is for the purpose of preventing fraud, but it's essentially the publisher preventing itself from committing fraud. A court could allow intrusion into an individual user's privacy for the purpose of determining whether that individual user is defrauding the publisher, but not for the purpose of determining whether there's fraud happening between the publisher and the advertiser, which the user has nothing to do with.

This is speculation: can we use a prediction market to get a better estimate? The one I made earlier on the GitHub Copilot litigation seems to be going well so far, so here's another market:

Before the comments from Roy and Kleber I would have put this at ~65%; now I'm at ~45%. But if you think I'm wrong, take my (play) money!
