::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact

Quotation Detection

July 20th, 2012
tech, algorithms
I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
  for each comment A:
    for each later comment B:
      for each word X in A:
        for each word Y in B:
          do A and B match for N words starting at X and Y?
This is pseudocode for an O(n^2*m^2) solution, taking n as the number of comments and m as the number of words in each. Not so good. I coded it up in JavaScript and while I think it's fast enough in Chrome, Firefox, and even on my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?
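
For concreteness, here is a rough sketch of how the brute-force loop might look in modern JavaScript. It is not the code I actually ran: the names tokenize, matchesAt, and findQuotesBruteForce are made up for illustration, and splitting on whitespace and lowercasing is an assumption about what counts as a "word".

```javascript
// Hypothetical helper: split a comment into lowercase words.
// Splitting on whitespace is an assumption, not a rule from the post.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Do word arrays a and b agree for n words starting at positions x and y?
function matchesAt(a, x, b, y, n) {
  if (x + n > a.length || y + n > b.length) return false;
  for (let i = 0; i < n; i++) {
    if (a[x + i] !== b[y + i]) return false;
  }
  return true;
}

// Returns [earlier, later] index pairs where the later comment shares
// a run of at least n words with the earlier one.
function findQuotesBruteForce(comments, n) {
  const words = comments.map(tokenize);
  const pairs = [];
  for (let a = 0; a < words.length; a++) {
    for (let b = a + 1; b < words.length; b++) {
      let found = false;
      // Stop scanning this pair as soon as one shared run is found.
      for (let x = 0; x < words[a].length && !found; x++) {
        for (let y = 0; y < words[b].length && !found; y++) {
          found = matchesAt(words[a], x, words[b], y, n);
        }
      }
      if (found) pairs.push([a, b]);
    }
  }
  return pairs;
}
```

The four nested loops are where the n^2*m^2 comes from: every pair of comments, then every pair of starting positions within them.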

Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from every sequence of N words to the list of comments in which it appears. This is O(n*m), and in practice it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I had thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.
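
A minimal sketch of the dictionary approach, again in modern JavaScript rather than the code running on this site. The names tokenize, buildNgramIndex, and findQuotes are hypothetical, the whitespace tokenizer is an assumption, and treating the earliest comment containing a sequence as the one being quoted is a simplification I'm making for the example.

```javascript
// Hypothetical helper: split a comment into lowercase words.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Map each sequence of n words to the indexes of the comments containing it.
function buildNgramIndex(comments, n) {
  const index = new Map();
  comments.forEach(function (text, c) {
    const words = tokenize(text);
    for (let i = 0; i + n <= words.length; i++) {
      const key = words.slice(i, i + n).join(" ");
      let list = index.get(key);
      if (!list) {
        list = [];
        index.set(key, list);
      }
      // Comments are visited in order, so checking the last entry is
      // enough to avoid listing the same comment twice.
      if (list[list.length - 1] !== c) list.push(c);
    }
  });
  return index;
}

// Returns [earlier, later] pairs. Assumption: the first comment to contain
// a sequence is the source, and every later one is quoting it.
function findQuotes(comments, n) {
  const seen = new Set();
  const pairs = [];
  for (const list of buildNgramIndex(comments, n).values()) {
    for (let i = 1; i < list.length; i++) {
      const id = list[0] + ":" + list[i];
      if (!seen.has(id)) {
        seen.add(id);
        pairs.push([list[0], list[i]]);
      }
    }
  }
  return pairs;
}
```

Each word starts at most one key, and each key holds N words, which is where the "N times as much memory" estimate comes from.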

