{"items": [{"author": "Paul", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185524268212068", "anchor": "fb-185524268212068", "service": "fb", "text": "\"What you will want to have read.\" What tense is that? :)", "timestamp": "1325701139"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185526681545160", "anchor": "fb-185526681545160", "service": "fb", "text": "@Paul: future perfect", "timestamp": "1325701486"}, {"author": "Mac", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185552734875888", "anchor": "fb-185552734875888", "service": "fb", "text": "Your opening sentence uses the third person \"someone\", while your second sentence uses the first person \"I\".  My assumption is that your goal is to become efficient in filtering down to stuff _you_ want to spend time on.<br><br>Consider analogs to print in pre-internet times.  You corresponded by letter with select friends;  the return address would be something of a filter criterion.  You subscribed to newspapers and newsletters in which you were interested, leaving the selection of content to editors whom you trusted.  For cultural reasons (to stay in touch with the culture in which you lived), you paid attention to the most popular topics around the water cooler, but did not immerse yourself in all of them.<br><br>All of these mechanisms and methods obtain in the internet world:  Facebook or equivalents connect you to friends and relatives in whom you are generally interested.  Paper newsletters became serial emails, webpages, blogs or tweets.  
Google News feeds us the most popular stories, and Google News can be parametrized to filter the most popular stories to your tastes and interests.<br><br>As in yesteryear, today's primary challenge is to let in most of the stuff you want to read, with a filter that serendipitously allows in enough random stuff to find unanticipated interesting topics.  Every time I read a print edition of the Boston Globe, I find articles that leave me saying, \"Wow, I'm glad I saw that.\"  This is the difference between push and pull news delivery.  Your filters are pull mechanisms, while the totality of the Globe has a lot of pushed content.  Don't overfilter yourself.  Randomness has its beauty.", "timestamp": "1325704304"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185593241538504", "anchor": "fb-185593241538504", "service": "fb", "text": "@Walker: This is a problem that affects a lot of people, not just me.  It probably makes sense to solve it centrally because what one person is glad they read is an important signal predicting what another would think.<br><br>With letters and print mail in general there is another big filter: it needs to be worth it for someone to pay to mail to you.  This is one reason printed event invitations I receive are much more to my interest than electronic ones.<br><br>\"Don't overfilter yourself. Randomness has its beauty.\"<br><br>This is in reaction to reading the Globe?  But that's not anywhere near random!  The words you read are from professional journalists trying to publish some combination of \"newsworthy\" and \"sells papers\".  Even picking up random books from the library isn't random because a lot of filtering went into determining whether something got published and then again into whether the library should have it.  Random Wikipedia article? [1] Subject to notability, verifiability, and other guidelines.  
It's really hard to get a representation of random, unfiltered content, and I think it's not something you want to let in, primarily because it's probably junk.<br><br>[1] http://en.wikipedia.org/wiki/Special:Random", "timestamp": "1325709491"}, {"author": "opted out", "source_link": "#", "anchor": "unknown", "service": "unknown", "text": "this user has requested that their comments not be shown here", "timestamp": "1325709593"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185596168204878", "anchor": "fb-185596168204878", "service": "fb", "text": "@Justin: \"display stories stochastically in a form of live A/B testing\"<br><br>Reddit does this a little.  At the top of the default view you have a space that sometimes displays a single 'new' link:<br><br>http://www.jefftk.com/reddit_new_preview.png", "timestamp": "1325709888"}, {"author": "opted out", "source_link": "#", "anchor": "unknown", "service": "unknown", "text": "this user has requested that their comments not be shown here", "timestamp": "1325709997"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185597994871362", "anchor": "fb-185597994871362", "service": "fb", "text": "@Justin: HN does this for comments.  In a way facebook does this, because most of what goes into determining whether you will see something is an invisible score.", "timestamp": "1325710142"}, {"author": "Allison", "source_link": "https://plus.google.com/103741579182942078941", "anchor": "gp-1325714889310", "service": "gp", "text": "Some of the folks in my research group are actually working on this problem (and I will be soon, too).  
I can point you in the direction of papers, if you like.", "timestamp": 1325714889}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1325714943942", "service": "gp", "text": "@Allison\n sure!", "timestamp": 1325714943}, {"author": "David&nbsp;Chudzicki", "source_link": "https://plus.google.com/106120852580068301475", "anchor": "gp-1325715493599", "service": "gp", "text": "Major problem with the Reddit (etc.) approach seems to be a strong bias in favor of things that can quickly be viewed, appreciated, and voted up. \n<br>\n<br>\nA 5-page article that I save to Instapaper is at a disadvantage, even if really good. ", "timestamp": 1325715493}, {"author": "Allison", "source_link": "https://plus.google.com/103741579182942078941", "anchor": "gp-1325716617677", "service": "gp", "text": "@Jeff&nbsp;Kaufman\n \n<br>\ngeneral algorithms: \nhttp://en.wikipedia.org/wiki/Recommender_system#Algorithms\n<br>\nyahoo is good about publishing: \nhttp://research.yahoo.com/publication\n<br>\na paper from my group: \nhttp://www.cs.princeton.edu/~chongw/papers/WangBlei2011.pdf\n<br>\nJonathan Chang was in my group a bit ago and now works at Facebook; his research blog: \nhttp://pleasescoopme.com/", "timestamp": 1325716617}, {"author": "George", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185802201517608", "anchor": "fb-185802201517608", "service": "fb", "text": "Why not use collaborative filtering for each user and content-based filtering for items that haven't been voted on much? The website, for each user, asks the user to vote on what they want/don't want to see. Then it builds a statistical model based on other users' vote history and performs collaborative filtering, similar to how Netflix recommends movies. For a fixed population of things to recommend, this will work great and will keep getting better as you rate more things. 
So it needs a way to present totally new things to users when no one has rated them. This can be a combination of user submissions and content-based filtering. Reading over what I wrote, it doesn't seem very clear... hopefully you get what I am saying?", "timestamp": "1325738308"}, {"author": "Todd", "source_link": "https://plus.google.com/112947709146257842066", "anchor": "gp-1325740984779", "service": "gp", "text": "My general approach (this applies less to things like Facebook/G+, more to things like RSS) is to do almost no work to find content, instead letting others alert me to things that might interest me, and investigating only then. It has the advantage that I waste little time on unproductive research. Obviously, it has the disadvantage that I may be missing out on a lot of things, but since I don't know that I'm missing out on them in any but the most abstract sense, that has very little impact on how I assess my own utility.\n<br>\n<br>\nThat said, I'd welcome a system that allowed me more discovery tailored to my interests without requiring dramatically more work on my part (relying on friends obviously also has the disadvantage that they aren't out there finding things for \nmy\n sake). But I suspect the right solution is pretty heavily invested in AI.", "timestamp": 1325740984}, {"author": "Todd", "source_link": "https://plus.google.com/112947709146257842066", "anchor": "gp-1325743618542", "service": "gp", "text": "It just occurred to me that this makes me a free rider. People should be charging me for this service! =P", "timestamp": 1325743618}, {"author": "Allison", "source_link": "https://plus.google.com/103741579182942078941", "anchor": "gp-1325745906800", "service": "gp", "text": "has anyone looked at \nhttp://hunch.com\n?  (or better yet, actually used it?)  Its shtick is being a recommender system for everything.\n<br>\n<br>\n@Todd\n One of the things I'm working on is a smarter RSS aggregator.  
Recommendation is a little down the line, but a high priority.", "timestamp": 1325745906}, {"author": "Todd", "source_link": "https://plus.google.com/112947709146257842066", "anchor": "gp-1325747874659", "service": "gp", "text": "@Allison\n What does it say about me/my system if I say that I'll try \nhttp://hunch.com\n if/when I hear from friends that it's worthwhile? Seems ironic or something similar, at least.", "timestamp": 1325747874}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=185916714839490", "anchor": "fb-185916714839490", "service": "fb", "text": "@George: current sites seem to be based on finding new things, which mostly means newly written but sometimes means newly 'discovered', which means you're usually in the case where you have little data on an article.<br><br>Though maybe the answer is a site that doesn't focus so much on being 'new' but instead focuses on showing you the best things you've not seen yet?", "timestamp": "1325763913"}, {"author": "George", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186038811493947", "anchor": "fb-186038811493947", "service": "fb", "text": "I doubt that is the case, actually. How many people have to vote on a link on reddit before the typical user sees it? Hundreds? And truly new links can be presented for ratings to a select group of users based on the interaction between the learned user features and the submitting user's features and conditioned on features about the link itself.", "timestamp": "1325781119"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186043511493477", "anchor": "fb-186043511493477", "service": "fb", "text": "@George: \"How many people have to vote on a link on reddit before the typical user sees it? Hundreds?\"<br><br>Depends on your subreddits.  
The lowest number of upvotes for anything in my top-25 is 9 for something from r/psychology; for reddit with the default set (incognito window) it's 114.  This is probably because I have unsubscribed from a lot of popular subreddits.", "timestamp": "1325781714"}, {"author": "George", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186047858159709", "anchor": "fb-186047858159709", "service": "fb", "text": "9 upvotes but how many total votes? That doesn't seem too bad at all, especially if my hypothetical site was more aggressive at soliciting ratings from users. Most reddit users don't vote. And link features and submitter features can still be used for completely novel links. Also, given that you are a user that has customized the subreddits, you are already one that is willing to provide more information to the system, so if the system was different and more effective you might be willing to provide even more preference information.", "timestamp": "1325782263"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186056571492171", "anchor": "fb-186056571492171", "service": "fb", "text": "@George: \"9 upvotes but how many total votes?\"<br><br>10 (now it's 12 up out of 13)<br><br>\"if my hypothetical site was more aggressive at soliciting ratings from users ... if the system was different and more effective you might be willing to provide even more preference information\"<br><br>My guess is that this would depend on how effectively the site could use votes.  Right now there's a free-rider problem where you gain little benefit from voting.", "timestamp": "1325783334"}, {"author": "George", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186058381491990", "anchor": "fb-186058381491990", "service": "fb", "text": "Well, that wouldn't be a problem. The only way to teach the site what you want to see is to give it ratings for links. 
Votes have basically no effect on reddit, but on this hypothetical site, since there is a different model of what each user should see, they have a huge effect.", "timestamp": "1325783564"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186068478157647", "anchor": "fb-186068478157647", "service": "fb", "text": "@George: \"The only way to teach the site what you want to see is to give it ratings for links.\"<br><br>Well, if you never voted, you would see what the average of the users liked, and people might be lazy enough not to mind.", "timestamp": "1325784896"}, {"author": "George", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=186072904823871", "anchor": "fb-186072904823871", "service": "fb", "text": "Data from click-through would still be available; it is just a lot harder to use and much less informative.", "timestamp": "1325785462"}, {"author": "Allison", "source_link": "https://plus.google.com/103741579182942078941", "anchor": "gp-1325824468408", "service": "gp", "text": "@Lucas\n I want to cluster posts that cover overlapping material into \"stories\" and organize stories/posts automatically by topic rather than with a manual folder setup.  Recommendation would then serve the purpose of ordering posts within stories, stories within categories, and the categories themselves.  It could also serve to suggest new feeds to follow, given enough users.", "timestamp": 1325824468}, {"author": "Todd", "source_link": "https://plus.google.com/112947709146257842066", "anchor": "gp-1325829811369", "service": "gp", "text": "@Lucas\n I use RSS. But I don't do much/any work trying to find new things to subscribe to. 
In the case of RSS, that's at least partially because I'm only willing to devote so much time to reading it (and actually, I probably would prefer to devote less than I do currently; part of the problem is that I'm not terribly good at scanning through lots of headlines to find interesting ones). And a lot of the feeds I am subscribed to are things that were recommended to me in the past.", "timestamp": 1325829811}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1325856041559", "service": "gp", "text": "@Lucas\n \"time everyone spends actively engaging with the content, which could serve as a rough proxy indicator for interest in the long tail of folks who don't (often) vote for stuff explicitly\"\n<br>\n<br>\nAn RSS reader could measure this, if you did all your reading inside it, but it wouldn't work for a reddit-style site because they have no way of knowing how much time you spend engaged on the other site.\n<br>\n<br>\nPeople might start voting more if it had more of an effect on what they saw. Voting doesn't (currently) affect what you see, only what other people see, so most people don't bother with it. Nothing currently does a good job at getting much information out of votes.", "timestamp": 1325856041}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1325942734930", "service": "gp", "text": "@Lucas\n  '''What do you mean by the idea that \"nothing currently does a good job at getting much information out of votes\"? I think sites like Reddit and Hacker News seem to do pretty well with relatively simple voting systems'''\n<br>\n<br>\nI was trying to say that of the information one could extract from a vote, current sites only get a small fraction.  
They throw away \"person X cast this vote\" and only keep \"this item was voted up at this time\".\n<br>\n<br>\n\"the concept of voting is inherently about signaling that something is interesting to a wider group\"\n<br>\n<br>\nCurrently, but you could have a voting system where the point of voting is also to indicate what you want to see more/less of, training a filter.  This might require building a community up, starting a new site, instead of trying to change an existing site.", "timestamp": 1325942734}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://www.facebook.com/jefftk/posts/185523338212161?comment_id=187138251384003", "anchor": "fb-187138251384003", "service": "fb", "text": "Followup post: http://www.jefftk.com/news/2012-01-07.html", "timestamp": "1325947636"}, {"author": "Jeff&nbsp;Kaufman", "source_link": "https://plus.google.com/103013777355236494008", "anchor": "gp-1325947649865", "service": "gp", "text": "Followup post: \nhttp://www.jefftk.com/news/2012-01-07.html", "timestamp": 1325947649}]}