Pro AI Bots Scraping List Archives

August 4th, 2025
contra, tech
I'm on various mailing lists, and the archives are a trove of niche knowledge. A dance calling list I'm on is considering making archives subscriber-only, to keep AI bots from snarfing up this data. But I think this harvesting is overall a good thing.

People have a range of motivations in posting to lists, but a big one is sharing information. For example, someone asked about a dance with an 8-count swing followed by an 8-count chain. I replied to warn them that the form has changed and this no longer works well: this bit me back when I started calling, and I want to warn other new callers.

I have a few audiences in mind in writing:

  • The person I'm replying to.
  • People on the list.
  • People who might see the archives when searching.
And then there's a general sense in which I'm contributing to what people know about contra dance: any of these people might tell others or otherwise pass it along.

AI systems add another way this information can spread. It's increasingly common for people to ask an LLM instead of a search engine, and when they do I'd rather they get good answers. Excluding the archives from model training would do the opposite of what I want.

There are definitely downsides to querying today's models, similar to asking a person who has read a lot but doesn't remember where they read anything, and sometimes invents something plausible instead of saying they don't know. I think this is likely temporary, however: combining the best of models and traditional search is a problem a lot of people are working hard on solving.

So, on balance, I think it's better to keep the archives open to all, including future LLM-intermediated readers.

(I also think AI is in general moving too quickly for society to respond well, and has a significant risk of getting us all killed. While I could see pushing against AI wherever it comes up, as part of moving a big societal "yay-AI; boo-AI" lever in the direction that slows it down and gives us more time to work out solutions, instead I've decided to take things case by case, thinking about effects each time.)

