Determining What People Want to See

January 4th, 2012
tech
One of the major open problems of the modern internet is how to figure out what someone is interested in seeing. What news stories will I enjoy? What comments are worth reading? The general problem is incredibly broad: you want to know about email, web sites, articles, comments, tweets, status updates, website changes, site private messages, mailing lists, blogs, and others. You don't have time to read even a tiny fraction of everything, and different sources vary hugely in how much you want to read them, so you want some sort of filtering system.

The traditional answer was to hire editors. Publication was expensive and only practical on a large scale, so the extra cost of people to decide what should be published was unavoidable and relatively small. With the internet the constraints have changed: publication is much cheaper, the overhead for personalization is far lower, and you can get much better feedback from readers.

One new solution that couldn't have surfaced without the internet is voting. Sites that use this explicitly include Reddit, Digg, and Hacker News. Users vote links and comments up or down, votes 'decay' with time, and that controls what everyone else sees. This gets you a hybrid of popularity and recency. It's vulnerable to people creating fake accounts and vote rings, but the people running the sites seem to do a good job with automatic countermeasures.
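As a rough illustration of how decaying votes blend popularity with recency, here is a minimal sketch. It is not the formula any of these sites actually uses; the exponential half-life, the `Vote` record, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    value: int      # +1 for an upvote, -1 for a downvote
    cast_at: float  # when the vote was cast, in hours since some fixed start


def decayed_score(votes, now, half_life_hours=12.0):
    """Sum the votes, halving each vote's weight every `half_life_hours`."""
    return sum(
        v.value * 0.5 ** ((now - v.cast_at) / half_life_hours)
        for v in votes
    )


def rank(items, now, half_life_hours=12.0):
    """Return item names ordered from highest to lowest decayed score.

    `items` maps an item name to the list of votes it has received.
    """
    return sorted(
        items,
        key=lambda name: decayed_score(items[name], now, half_life_hours),
        reverse=True,
    )


# An older post with many votes versus a fresh post with a few.
items = {
    "old": [Vote(+1, 0.0) for _ in range(10)],
    "new": [Vote(+1, 46.0) for _ in range(3)],
}
ordering = rank(items, now=48.0)
```

With a twelve-hour half-life the ten old votes have each decayed to a sixteenth of their weight by hour 48, so the newer post with only three votes ranks first. A longer half-life would tilt the balance back toward raw popularity.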

Voting is also vulnerable to "bad judgment": people voting up things that I'm not interested in. Reddit dealt with this somewhat by creating subreddits: sections of Reddit devoted to specific topics, where you can control which ones show up for you. So I see posts on Boston, giving, music, and don't see posts from pics, funny, politics, etc. This works reasonably well, but I've found myself going to Reddit less and less over time as I've found fewer things that interest me.

The Hacker News approach to "bad judgment" is more explicit: there is one site, some content guidelines, and users have a strong culture of flagging things that they see as bringing down the quality of the site. Either this works better or it just aligns more with my interests, because I read it a lot more than Reddit nowadays.

Another problem is that people mostly don't vote, and when they do vote it's usually on things that have already been recognized as good. While sites do have sections for new posts, most readers don't see things until they get to the front page, which means the decision about what gets to the front page is made by the small number of people watching the 'new queue'.

Social networks can take a different approach because they have more information: they know which people you're likely to share interests with. Facebook shows me posts by people I've friended (taking into account how much we interact), and then it does something similar to the voting sites by treating likes and other interactions as upvotes. This is how you get the newish default "highlighted stories first" algorithm.

If you switch to Facebook's recency option, it still does some filtering; I'm pretty sure the new upper right pane is the full unfiltered real time feed for all your Facebook friends. The full feed isn't so interesting, so you can see why they're putting work into filtering.
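To make the idea of friend-weighted ranking concrete, here is a toy sketch. Facebook's real algorithm is not public, so everything here is an assumption: the `Post` record, the `affinity` table (a made-up 0 to 1 measure of how much I interact with each friend), and the choice to simply multiply affinity by the engagement count.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    likes: int
    comments: int


def feed_score(post, affinity):
    """Score a post by engagement, scaled by closeness to its author.

    `affinity` is a hypothetical mapping from friend name to a 0..1 weight;
    authors missing from it count as strangers with weight 0.
    """
    closeness = affinity.get(post.author, 0.0)
    # The +1 lets a close friend's brand-new post score above zero.
    return closeness * (1 + post.likes + post.comments)


def highlighted_first(posts, affinity):
    """Order posts from highest to lowest score."""
    return sorted(posts, key=lambda p: feed_score(p, affinity), reverse=True)


affinity = {"alice": 0.9, "bob": 0.1}
posts = [Post("bob", likes=20, comments=5), Post("alice", likes=2, comments=1)]
feed = highlighted_first(posts, affinity)
```

In this example the post from the close friend comes first even though the other post has far more likes, which is the kind of tradeoff the affinity weighting is meant to capture.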

A very different approach is the one RSS readers and mailing lists take: you pick a set of sources you're interested in, and then you're shown every new item, ordered by recency. I like this a lot, because there are a lot of people (mostly friends, some not) where I want to read everything they have to say. I do need to be careful to add and remove sources based on how interested I am in their content, though, which is more work than the other options require.
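The subscribe-and-read-everything model is simple enough to sketch in a few lines. This is only an illustration: the `Item` record and the in-memory `feeds` dictionary are stand-ins invented for the example, and a real reader would fetch and parse the feeds instead.

```python
from typing import NamedTuple


class Item(NamedTuple):
    published: float  # publication time, e.g. seconds since the epoch
    source: str
    title: str


def reading_list(subscriptions, feeds):
    """Return every item from the subscribed sources, newest first.

    Nothing is filtered or scored; the only curation is which
    sources appear in `subscriptions`.
    """
    items = [
        item
        for source in subscriptions
        for item in feeds.get(source, [])
    ]
    return sorted(items, key=lambda item: item.published, reverse=True)


feeds = {
    "friend-blog": [Item(100.0, "friend-blog", "First"), Item(300.0, "friend-blog", "Third")],
    "other-blog": [Item(200.0, "other-blog", "Second")],
    "noisy-site": [Item(400.0, "noisy-site", "Unwanted")],
}
unread = reading_list(["friend-blog", "other-blog"], feeds)
```

All of the judgment lives in the subscription list, which is why keeping that list up to date is the part that takes effort.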

The main thing I find frustrating in trying to understand the current best solutions to this problem is that everyone running a system has a strong incentive to keep its workings secret so people have a harder time gaming or copying it.
