An end to location field spam?

In a recent column I mentioned what I've termed "location spam" or "location field spam", but only now have I figured out how to stop it from filling my logs.

Location spam is a fake request made to your web server that has a bogus URL in the location field of the HTTP request. Normally, requests to your server are logged relative to the root (or beginning) of the document tree. For instance, a request for the file located at:

http://mghiemstra.com/about

... would have a location of "about". Location spam is generated by a program that (somehow) fakes a request to contain a full URL in the location field. It likely makes a request like this:

http://mghiemstra.com/http://somebadserver.com/somebadserviceorproduct.php

Now we have a location field of "http://somebadserver.com/somebadserviceorproduct.php". The software that runs your website or web application likely logs requests for files that are not found on your site. The target of a spam request like this does not exist on your site, so your software dutifully logs the problem for an administrator to view.

Some software attempts to make the job of an administrator easier, and puts the appropriate HTML tags around the bad URL so it can be clicked on to figure out what's going on. And there you have the danger of this type of spam. Who knows what lies on the other end of that URL? An administrator might feel inclined to click on the link. Maybe they're clicking on it from a browser on the web server machine itself, or from another restricted box that only administrators have access to. Chances are, this machine is under-patched or running an old version of a web browser that is vulnerable to malicious exploits.

At worst, a server on your network now has a compromised administrator account and will be used to slowly take over your network. At the least, you've gone to a nasty site with some popups that you likely shouldn't be viewing at work.
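If you maintain the software that displays these log entries, one defensive habit is to show the logged location as inert text instead of a clickable link. What follows is only a minimal sketch in Python; the render_log_entry helper is hypothetical and is not taken from any particular logging package.

```python
import html


def render_log_entry(location: str) -> str:
    """Return an HTML fragment that shows the logged location as plain text."""
    # html.escape neutralizes <, >, & and quotes, so the logged value
    # cannot inject markup or turn itself into a link in the admin view.
    return "<code>" + html.escape(location, quote=True) + "</code>"


# The bad URL from the example above, rendered without an <a> tag around it.
print(render_log_entry("http://somebadserver.com/somebadserviceorproduct.php"))
```

The administrator can still copy the URL and inspect it from a safe machine, but nothing in the log view invites a reflexive click.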

Now, how to make this go away? Well... if you're running Apache, you could try rewriting this type of URL with a configuration like:

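# Only act when the request does not map to an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Match request lines whose target is a full URL rather than a path
RewriteCond %{THE_REQUEST} ^GET\ http:\/\/
# Mark the request as gone and stop processing further rules
RewriteRule ^.*$ page-not-found [G,L]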

... where "page-not-found" is a page with a special message for your spammer. You could also use a similar rule just to tell the spamming program that the page it was attempting to look at is gone. What's above is convenient for me because the web server handles these requests itself (the "[G,L]" flags tell Apache to answer with "410 Gone" and to stop processing the URL there). I never see the requests in my software's logs after that, so I'm happy!
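To get a feel for which request lines the THE_REQUEST condition would catch, here is a rough Python sketch that applies the same pattern to a few sample lines. The is_location_spam function and the sample request lines are made up for illustration, and Python's re module only approximates the regex engine Apache uses, so treat it as a sanity check and not as a substitute for testing against your own server.

```python
import re

# Same pattern as the RewriteCond above; Python needs no escaping
# for the space or the slashes.
FULL_URL_REQUEST = re.compile(r"^GET http://")


def is_location_spam(request_line: str) -> bool:
    """Return True when the request line carries a full URL instead of a path."""
    return FULL_URL_REQUEST.match(request_line) is not None


# Hypothetical request lines, invented for this example.
samples = [
    "GET /about HTTP/1.1",
    "GET http://somebadserver.com/somebadserviceorproduct.php HTTP/1.1",
    "GET /http://somebadserver.com/somebadserviceorproduct.php HTTP/1.1",
]

for line in samples:
    print(is_location_spam(line), line)
```

Only the second sample matches. The pattern fires when the URL follows the method directly, so a request whose path merely begins with "/http://" would need a slightly different condition.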