• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Quotation Detection

    July 20th, 2012
    algorithms, tech  [html]
    I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
      for each comment A:
        for each later comment B:
          for each word X in A:
            for each word Y in B:
              do A and B match for N words starting at X and Y?
    
    This is psuedocode for an O(n^2*m^2) solution. Not so good. I coded it up in javascript and while I think it's fast enough in Chrome, Firefox, and even my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?

    Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from all sequences of N words to a list of comments in which they appeared. This is O(n*m) and running it it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.

    Comment via: google plus, facebook

    Recent posts on blogs I like:

    The Gift of It's Your Problem Now

    Recently a security hole in a certain open source Java library resulted in a worldwide emergency kerfuffle as, say, 40% of the possibly hundreds of millions of worldwide deployments of this library needed to be updated in a hurry. (The other 60% also …

    via apenwarr January 1, 2022

    The container throttling problem

    This is an excerpt from an internal document David Mackey and I co-authored in April 2019. The document is excerpted since much of the original doc was about comparing possible approaches to increasing efficency at Twitter, which is mostly information tha…

    via Posts on December 18, 2021

    Experiences in raising children in shared housing

    Sometimes I see posts about people’s hope to raise children in a group housing situation, and it often seems overly optimistic to me. In particular they seem to expect that there will be more shared childcare than I think should be expected. Today I talke…

    via The whole sky October 18, 2021

    more     (via openring)


  • Posts
  • RSS
  • ◂◂RSS
  • Contact