Tried to go here with Google Chrome: http://hosts-file.net/default.asp?s=sshc.org
and got the following WebKnight alert: “Your request triggered an alert! If you feel that you have received this page in error, please contact the administrator of this web site.”
What is WebKnight?
AQTRONIX WebKnight is an application firewall for web servers and is released under the GNU General Public License. It is an ISAPI filter for securing web servers by blocking certain requests. If an alert is triggered WebKnight will take over and protect the web server.
How did I trigger that alert? Can anyone explain?
If I request the page through the Google search results I get no alert and land where I planned to go…
Then it must be my Google Chrome configuration, because with other browsers (IE, Firefox, etc.) I can get there normally.
I asked them to look into the request logs for a better explanation.
Part of the mystery got solved by what I heard about the access logs on the other end. The maintainer of the sites there is a very friendly guy. I was told it was due to a browser “slurp” request, and that they had just adjusted the configuration. I did not use a user-agent switcher, so it could not be that. I think it is also good that he has gotten feedback on the strength of the eventual blocking. I thought I owed the folks here a report on this.
How can they tell whether you are a human or a spider trying to pass as one?
Look here: http://www.iplists.com/ and check the logs against this listing, watching for excessive activity (though there are some humans who may want to map a site, because they want to see all the links there). This concept was posted by Dave Sherohman on Stack Overflow.
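Purely as an illustration of that idea, here is a minimal Python sketch that cross-checks an access log against a known-crawler IP list and flags unusually busy clients. The file names, the log format, and the request threshold are assumptions for the example, not anything WebKnight or hosts-file.net actually does.

    # Minimal sketch: cross-check an access log against a known-crawler IP list
    # and flag unusually busy clients. File names, log format, and the request
    # threshold are assumptions for illustration only.
    from collections import Counter

    EXCESSIVE_REQUESTS = 300            # hypothetical cutoff for the log window

    def load_known_crawler_ips(path="crawler_ips.txt"):
        """One IP per line, e.g. exported from a listing such as iplists.com."""
        with open(path) as f:
            return {line.strip() for line in f
                    if line.strip() and not line.startswith("#")}

    def scan_access_log(path="access.log", known_ips=frozenset()):
        """Assumes a common log format where the client IP is the first field."""
        hits = Counter()
        with open(path) as f:
            for line in f:
                ip = line.split(" ", 1)[0]
                hits[ip] += 1

        for ip, count in hits.most_common():
            if ip in known_ips:
                print(f"{ip}: {count} requests - known crawler IP")
            elif count > EXCESSIVE_REQUESTS:
                print(f"{ip}: {count} requests - excessive activity, possible stealth bot")

    if __name__ == "__main__":
        scan_access_log(known_ips=load_known_crawler_ips())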
There are some trapdoors that can be used to catch bots that pose as human browser agents (a rough sketch of a couple of them follows after the list):
Adding a directory that is only listed (marked as Disallow) in robots.txt,
Adding invisible links (possibly also marked rel=“nofollow”?), e.g.:
style=“display: none;” on the link or its parent container,
placed underneath another element with a higher z-index,
detect who doesn’t understand CaPiTaLiSaTioN,
detect who tries to post replies but always fails the Captcha,
detect GET requests to POST-only resources
detect interval between requests
detect order of pages requested
detect who (consistently) requests https resources over http
detect who does not request image files (this, in combination with a list of user agents of known image-capable browsers, works surprisingly well).
The list was presented by Jacco, the author of a Stack Overflow thread named “Detecting ‘stealth’ web-crawlers”.
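Just to make a couple of those trapdoors concrete, here is a hedged Python sketch of how the honeypot-link, GET-on-POST-only, and request-interval checks might be wired up. The paths, the threshold, and the scoring weights are invented for the example; a real filter like WebKnight does its own thing.

    # Sketch of a few of the trapdoors above, applied to parsed requests.
    # The honeypot path, the POST-only path, the 0.5 s threshold, and the
    # scoring weights are made-up values for illustration only.
    import time
    from collections import defaultdict

    HONEYPOT_PATH = "/honeypot-link/"       # only reachable via a display:none link
    ROBOTS_ONLY_DIR = "/secret-area/"       # only ever mentioned (Disallow) in robots.txt
    POST_ONLY_PATHS = {"/comments/submit"}  # resources a browser would only ever POST to
    MIN_HUMAN_INTERVAL = 0.5                # seconds; faster than this looks scripted

    last_seen = {}                          # ip -> timestamp of previous request
    scores = defaultdict(int)               # ip -> accumulated suspicion points

    def check_request(ip, method, path, ts=None):
        """Apply a few of the trapdoors and return the client's suspicion score."""
        ts = time.time() if ts is None else ts

        # 1. Anyone reaching the honeypot link or the robots.txt-only directory
        #    followed something a human browser would never have shown them.
        if path.startswith(HONEYPOT_PATH) or path.startswith(ROBOTS_ONLY_DIR):
            scores[ip] += 10

        # 2. GET requests to POST-only resources suggest a crawler walking forms.
        if method == "GET" and path in POST_ONLY_PATHS:
            scores[ip] += 5

        # 3. Requests spaced more tightly than a human could reasonably click.
        prev = last_seen.get(ip)
        if prev is not None and ts - prev < MIN_HUMAN_INTERVAL:
            scores[ip] += 1
        last_seen[ip] = ts

        return scores[ip]

    # Example: a client hitting the honeypot link three times at machine speed.
    for i in range(3):
        print(check_request("198.51.100.7", "GET", "/honeypot-link/page", ts=i * 0.1))

In a real setup the score would feed whatever blocking or logging policy the server already has; the point is only that each item on the list above boils down to a cheap per-request check.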