• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Quotation Detection

    July 20th, 2012
    algorithms, tech  [html]
    I want to thread comments: when a comment quotes another comment it's probably a reply, so attach them together to make the conversation clearer. This means figuring out which comments quote which other comments. I tried a brute force way:
      for each comment A:
        for each later comment B:
          for each word X in A:
            for each word Y in B:
              do A and B match for N words starting at X and Y?
    
    This is psuedocode for an O(n^2*m^2) solution. Not so good. I coded it up in javascript and while I think it's fast enough in Chrome, Firefox, and even my phone, it took IE8 nearly a minute. While I could probably run this on my server and do fine with some combination of caching and C, it seems inefficient. Is there a better algorithm? What is the general version of this problem called?

    Update 2012-07-21: Several commenters suggested a better algorithm: to find all quotes of length N, build a dictionary from all sequences of N words to a list of comments in which they appeared. This is O(n*m) and running it it's much faster. I tested it in IE8, and it loaded quickly instead of freezing the browser. I thought briefly about doing something like this earlier, but wrote it off as using insane amounts of memory. After more people suggested it, I realized it only uses N times as much memory as just storing the comments.

    Comment via: google plus, facebook

    Recent posts on blogs I like:

    Governance in Rich Liberal American Cities

    Matt Yglesias has a blog post called Make Blue America Great Again, about governance in rich liberal states like New York and California. He talks about various good government issues, and he pays a lot of attention specifically to TransitMatters and our …

    via Pedestrian Observations November 19, 2020

    Collections: Why Military History?

    This week, I want to talk about the discipline of military history: what it is, why it is important and how I see my own place within it. This is going to be a bit of an unusual collections post as it is less about the past itself and more about how we st…

    via A Collection of Unmitigated Pedantry November 13, 2020

    Misalignment and misuse: whose values are manifest?

    Crossposted from world spirit sock puppet. AI related disasters are often categorized as involving misaligned AI, or misuse, or accident. Where: misuse means the bad outcomes were wanted by the people involved, misalignment means the bad outcomes were wan…

    via Meteuphoric November 13, 2020

    more     (via openring)


  • Posts
  • RSS
  • ◂◂RSS
  • Contact