I've been letting my servers run on autopilot a bit too long, only applying updates once in a while and reacting to email alarms lately. I didn't have a full appreciation for how many web crawling bots have been hitting them, in particular my instance and my repositories. One bot was particularly nasty, but others were also pretty aggressive.

I was getting 500 errors once in a while on mastodon because of it but didn't clue in until kallithea crashed...

1/2

...not only does it show complete disregard for robots.txt, it also ignores any sort of session management cookies, etc. As such, every single file it crawled created a new anonymous session in kallithea, creating a file in my session store until the inodes were exhausted on the filesystem and it crashed.
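Running out of inodes is easy to miss because disk space can still look fine. Here is a minimal Python sketch of a check that would flag it early. The function names, the default path and the 90% threshold are assumptions for illustration, not anything from my actual setup, and it relies on `os.statvfs`, so it is Unix-only.

```python
import os


def inode_usage(path="/"):
    """Return the fraction of inodes in use on the filesystem holding `path`."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems report a total of 0 because they have no
        # fixed inode table; treat that as "nothing to warn about".
        return 0.0
    return float(st.f_files - st.f_ffree) / st.f_files


def inode_alarm(used_fraction, threshold=0.9):
    """True when inode usage has reached the (hypothetical) alarm threshold."""
    return used_fraction >= threshold


if __name__ == "__main__":
    # Point this at whatever filesystem holds the session store.
    used = inode_usage("/")
    print(f"inodes used: {used:.1%}")
    if inode_alarm(used):
        print("WARNING: filesystem is close to running out of inodes")
```

Run from cron against the filesystem that holds the session files, something like this would probably have warned well before the crash.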

I finally had the need to put aggressive bot/crawler blocking on my nginx reverse proxy for the first time. I think being on the fediverse drew attention to myself lol...
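The actual nginx rules aren't shown here, so as a stand-in, this is a rough Python sketch of the same idea: match the User-Agent header against a blocklist and answer 403 before the request reaches the app. It is written as a small WSGI middleware. The names `block_bots`, `BLOCKED_AGENTS` and `hello_app` are made up for the example, and the two patterns are only illustrative, not a complete list.

```python
import re

# Illustrative blocklist only; a real one would be longer and site-specific.
BLOCKED_AGENTS = re.compile(r"semrushbot|baiduspider", re.IGNORECASE)


def block_bots(app, pattern=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from matching user agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if pattern.search(agent):
            # Refuse before the wrapped app can create a session.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Tiny placeholder app standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve the wrapped placeholder app on localhost for a quick manual test.
    make_server("127.0.0.1", 8000, block_bots(hello_app)).serve_forever()
```

Blocking at the proxy is still the better place for this, since the request never reaches the application at all. Keep in mind that user-agent matching only stops bots that identify themselves honestly.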

2/3 (oops!)

...anyways a little PSA:

Instance admins: even if your site is small, be sure to aggressively block web crawlers. More and more they ignore robots.txt etiquette, and in light of recent archiving incidents you want to have some control over distribution of users' public posts.

Fediversians: REALLY BE CAREFUL about what you publicly post. Delete works only on reliable fediverse servers. Evil bots don't respect post deletes when hoarding data.

Stay safe! :blobpats:

3/3 (EOF)

@msh I have a rant that's about 100 miles long and laden with descriptive expletives regarding SemRushBot and what I'd like to do to it. And that's exactly what it had done at my server as well!

I have blocked at the .htaccess level everything that even smells like SemRushBot, Baidu and other misbehaving web crawlers...😡 🤬

COALES.CO - Come Together!

Micro-blogging site operated by Mark Shane Hayden of Coalesco Digital Systems Inc. We are located in Alberta, Canada. This is NOT intended to be a commercial/promotional site! Registration is open to anyone interested in civil discussions on any interesting topic--especially technology, current events and politics.