::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact

Quotation Detection

July 20th, 2012
algorithms, tech  [html]

I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
  for each comment A:
    for each later comment B:
      for each word X in A:
        for each word Y in B:
          do A and B match for N words starting at X and Y?
This is psuedocode for an O(n^2*m^2) solution. Not so good. I coded it up in javascript and while I think it's fast enough in Chrome, Firefox, and even my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?

Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from all sequences of N words to a list of comments in which they appeared. This is O(n*m) and running it it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.

Comment via: google plus, facebook

Recent posts on blogs I like:

High-Speed Rail in Small, Dense Countries

Four years ago I brought up the concept of the small, dense country to argue in favor of full electrification in Israel, Belgium, and the Netherlands. Right now I am going to dredge up this concept again, in the context of intercity trains. In a geographi…

via Pedestrian Observations October 12, 2019

What do executives do, anyway?

An executive with 8,000 indirect reports and 2000 hours of work in a year can afford to spend, at most, 15 minutes per year per person in their reporting hierarchy... even if they work on nothing else. That job seems impossible. How can anyone make any im…

via apenwarr September 29, 2019

Taxing investment income is complicated

How should a state tax investment income if it wants to maximize its citizens’ welfare? This sounds like a simple question but I find it surprisingly hard to think about. Here are some of the positions I’ve moved through over the last few years: Taxing in…

via The sideways view September 22, 2019

more     (via openring)

More Posts:

  ::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact