
Currently blocked bots


Smallsite Design blocks several bots that do not play by the rules and so may produce excessive page accesses that, if not rejected, can block normal site usage.

Search engine robots (bots) check a site regularly for updated content so they can update their search results, usually starting by reading the site's current sitemap.xml file. A site's robots.txt file can specify the minimum time between requests for each file. However, many bots ignore this file and blast-read a site's pages, often many times a day. Smallsite Design maintains a list of these, updated with each version, allowing it to block their requests before they consume more site resources than necessary.
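As a sketch, the per-request spacing described above can be asked for in robots.txt with the non-standard but widely recognised Crawl-delay directive; well-behaved bots honour it, while the bots listed below ignore it. The domain here is a placeholder:

```text
# robots.txt - ask all bots to wait 60 seconds between requests
# (one page per minute); the bots listed below ignore this
User-agent: *
Crawl-delay: 60

Sitemap: https://example.com/sitemap.xml
```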

However, some bots may access the site so heavily that they overwhelm it, in what amounts to a distributed denial of service (DDoS). If that occurs, a modest attack may be slowed by an optional flag in Smallsite Design that adds an extra one-second delay to each access, encouraging the attacker to back off.
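The optional delay flag works along these lines. This is a minimal sketch with hypothetical names, not Smallsite Design's actual code:

```python
import time

# Hypothetical stand-in for the optional Smallsite Design flag.
THROTTLE_ENABLED = True

def serve_page(path: str) -> str:
    # When the throttle flag is set, every access is delayed by an
    # extra second, slowing a modest flood of requests.
    if THROTTLE_ENABLED:
        time.sleep(1.0)
    return f"contents of {path}"
```

The delay costs each legitimate visitor one second, but multiplies the time an attacker needs to hammer the site.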

More onerous attacks may require temporarily routing the domain through Cloudflare's attack-mode protection for a few days until, denied access, the attack subsides. This will likely slightly delay normal accesses and affect performance, but both are far better than no one reaching the site at all. Even though an attack might target only a subdomain, the setting is per domain, and thus affects all its subdomains. After the attack, the domain can be removed from Cloudflare and pointed back to your hoster.

Some bots are run by legitimate companies, such as Google, but are experimental and may have inadvertently run rogue due to poor design. Known examples have also been included in the blocking list because they too may not obey robots.txt rate limiting. As a result, some facilities offered by those companies, such as image search, may be blocked, or a site's pages may not be available for preview. Smallsite Design blocks direct access to images by default anyway, to prevent them from being hijacked by other sites while your site incurs the bandwidth hit, so they are not available for image search regardless.

While large sites may legitimately use some of these bots for SEO analysis or penetration testing, Smallsite Design sites do not generally need them. If you want to use those tools on your site, do not use this product for it.

Search site bots used for normal search are not blocked, as they obey robots.txt, but they can optionally be blocked in Smallsite Design's settings.

The list

All of these may ignore the robots.txt request to limit their access rate to one page per minute.

Some of these lists include general-purpose tools that are often used for the purposes cited. If the creators of the bots using those tools were being careful, they would have chosen more indicative names, and so might not have appeared in these lists to be blocked.

Blocked legitimate but excessive non-search bots are: AhrefsBot, SemrushBot, BLEXBot, MJ12bot, DotBot, LinkDexBot, spbot, sitebulb, Screaming Frog, turnitinbot.

Blocked vulnerability scanner and security tool bots are: masscan, zgrab, zmap, nikto, nmap, sqlmap, OpenVAS, acunetix, Arachni, urlcrazy.

Blocked web-scraping library and HTTP client bots are: curl, wget, python-requests, Scrapy, crawler4j, libwww-perl, Java/, Apache-HttpClient, HTTrack, winhttp, Grabber, vacuum, EmailCollector, EmailSiphon, NetTool, harvest, parsehub, crawler, blackwidow, wpbot, brainsey.

Blocked headless browser and automation framework bots are: HeadlessChrome, Puppeteer, Playwright, PhantomJS, phantom.

Blocked social and search preview bots are: AdsBot, BingPreview, facebookexternalhit, meta-externalagent, GoogleOther.
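The blocking itself amounts to a case-insensitive substring match of each request's User-Agent header against names like those above. This is a sketch of the assumed behaviour, not Smallsite Design's actual code, and the list here is only a sample:

```python
# A few entries from the lists above; the real list is updated with
# each Smallsite Design version.
BLOCKED_AGENTS = [
    "AhrefsBot", "SemrushBot", "nmap", "sqlmap", "curl", "wget",
    "python-requests", "HeadlessChrome", "facebookexternalhit",
]

def is_blocked(user_agent: str) -> bool:
    # Case-insensitive substring match: any blocked name appearing
    # anywhere in the User-Agent string rejects the request.
    ua = user_agent.lower()
    return any(name.lower() in ua for name in BLOCKED_AGENTS)
```

A server would then answer matching requests with 403 Forbidden rather than spending resources building the page.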

