• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Quotation Detection

    July 20th, 2012
    algorithms, tech  [html]
    I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
      for each comment A:
        for each later comment B:
          for each word X in A:
            for each word Y in B:
              do A and B match for N words starting at X and Y?
    
    This is psuedocode for an O(n^2*m^2) solution. Not so good. I coded it up in javascript and while I think it's fast enough in Chrome, Firefox, and even my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?

    Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from all sequences of N words to a list of comments in which they appeared. This is O(n*m) and running it it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.

    Comment via: google plus, facebook

    Recent posts on blogs I like:

    Collections: The Queen’s Latin or Who Were the Romans? Part IV: The Color of Purple

    This is the fourth part (I, II, III) of our series asking the question “Who were the Romans?” and contrasting the answer we get from the historical evidence with the pop-cultural image of the Romans as a culturally and ethnically homogeneous society typic…

    via A Collection of Unmitigated Pedantry July 23, 2021

    The Leakage Problem

    I’ve spent more than ten years talking about the cost of construction of physical infrastructure, starting with subways and then branching on to other things, most. And yet there’s a problem of comparable size when discussing infrastructure waste, which, …

    via Pedestrian Observations July 23, 2021

    Songs about terrible relationships

    [Spoilers for several old musicals.] TV Tropes lists dozens of examples of the “I want” song (where the hero of a musical sings about their dream of escaping their small surroundings). After watching a bunch of musicals on maternity leave, I’m wondering h…

    via The whole sky July 17, 2021

    more     (via openring)


  • Posts
  • RSS
  • ◂◂RSS
  • Contact