Bots are currently scraping the internet for LLM training data at unprecedented rates[1][2][3], driving up costs and destabilizing public-facing websites. I want to talk about how this has been particularly difficult for wikis, and has gotten much worse in the last few months.
I've gotten over it, but it’s still kinda funny how the first line of “defense” is having the bot say that it’s a bot, and not Google Chrome.