::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact

Quotation Detection

July 20th, 2012
algorithms, tech  [html]
I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
  for each comment A:
    for each later comment B:
      for each word X in A:
        for each word Y in B:
          do A and B match for N words starting at X and Y?
This is psuedocode for an O(n^2*m^2) solution. Not so good. I coded it up in javascript and while I think it's fast enough in Chrome, Firefox, and even my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?

Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from all sequences of N words to a list of comments in which they appeared. This is O(n*m) and running it it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.

Comment via: google plus, facebook

Recent posts on blogs I like:

Veganism and restrictive eating

I’m reading the book Intuitive Eating, which I highly recommend. I was looking for something like it that could get me back to trusting my biological hunger without worrying that I need to control myself or my weight. It’s raised my consciousness to the w…

via Holly Elmore January 17, 2020

Cops on Public Transportation

I wrote a post about American moral panics about fare evasion two months ago, which was mirrored on Streetsblog. I made a mistake in that post that I’d like to correct – and yet the correction itself showcases something interesting about why there are arm…

via Pedestrian Observations January 17, 2020

Algorithms interviews: theory vs. practice

When I ask people at trendy big tech companies why algorithms quizzes are mandatory, the most common answer I get is something like "we have so much scale, we can't afford to have someone accidentally write an O(n^2) algorithm and bring the site d…

via Posts on Dan Luu January 5, 2020

more     (via openring)

More Posts:


  ::  Posts  ::  RSS  ::  ◂◂RSS  ::  Contact