Pro AI Bots Scraping List Archives

August 4th, 2025
contra, tech
I'm on various mailing lists, and the archives are a trove of niche knowledge. A dance calling list I'm on is considering making archives subscriber-only, to keep AI bots from snarfing up this data. But I think this harvesting is overall a good thing.

People have a range of motivations in posting to lists, but a big one is sharing information. For example, someone asked for a dance with an 8-count swing followed by an 8-count chain. I replied to warn them that the form has changed and this no longer works well: this bit me back when I started calling, and I want to warn other new callers.

I have a few audiences in mind when writing:

  • The person I'm replying to.
  • People on the list.
  • People who might see the archives when searching.
And then there's a general sense in which I'm contributing to what people know about contra dance: any of these people might tell others or otherwise pass it along.

AI systems add another way this information can spread. It's increasingly common for people to ask an LLM instead of a search engine, and when they do I'd rather they get good answers. Excluding the archives from model training would do the opposite of what I want.

There are definitely downsides to querying today's models. It's similar to asking a person who has read a lot but doesn't remember where they read anything, and who sometimes invents something plausible instead of saying they don't know. I think this is likely temporary, however: combining the best of models and traditional search is a problem a lot of people are working hard to solve.

So, on balance, I think it's better to keep the archives open to all, including future LLM-intermediated readers.

(I also think AI is in general moving too quickly for society to respond well, and has a significant risk of getting us all killed. While I could see pushing against AI wherever it comes up, as part of moving a big societal "yay-AI; boo-AI" lever in the direction that slows it down and gives us more time to work out solutions, instead I've decided to take things case by case, thinking about effects each time.)
