Wednesday, September 19, 2012

Robots.txt file in MVC3

Recently I was asked to look at adding a robots.txt file to a client's website that we built on ASP.NET MVC3. I did some research and learned two interesting things about the robots.txt file:
  1. The file is only a suggestion. Bots are not required to follow what you ask them to do via the robots.txt file.
  2. The file is an exclusion list rather than an inclusion list. You have to list the places you don't want bots to visit, which can be a bad idea because it tells malicious bots exactly which areas to focus on.
After my research we decided not to put a robots.txt file on the website initially. Soon after our deploy we noticed a considerable number of errors in the ELMAH logs, all containing this message:
The controller for path '/robots.txt' was not found or does not implement IController.

So we decided we needed at least an empty robots.txt file out there to prevent all these unnecessary errors. I did some more research and developed a solution for MVC3:
  • Add a physical robots.txt file to the website by adding it to the project at the root level. It can be empty or contain the minimal content a robots.txt file needs.
Because the physical file now exists on the website, requests for /robots.txt bypass ASP.NET MVC3 routing, as long as the RouteExistingFiles property of the RouteCollection is left at its default value of false. With that setting, routing is skipped whenever the request URL matches a physical file.
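As a rough sketch of that minimal content, a robots.txt that places no restrictions on any crawler is commonly written like this (an empty Disallow value excludes nothing):

```
User-agent: *
Disallow:
```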

To ensure that the physical file is always served even if someone changes the RouteExistingFiles property, you can add the following ignore route to the Global.asax.cs file, before the default route is mapped:
routes.IgnoreRoute("{robotstxt}", new { robotstxt = @"(.*/)?robots\.txt(/.*)?" });
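For context, here is a minimal sketch of where that line could sit in Global.asax.cs. It assumes the stock MVC3 project template (the MvcApplication class, the .axd ignore route, and the "Default" route are the template's, not this client's actual code). The point is ordering: ignore routes need to be registered before any route that could otherwise match /robots.txt.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

// Assumed layout: the default MVC3 template's Global.asax.cs.
public class MvcApplication : System.Web.HttpApplication
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Ignore route that ships with the template.
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Stop routing for robots.txt so the physical file is served,
        // regardless of the RouteExistingFiles setting.
        routes.IgnoreRoute("{robotstxt}", new { robotstxt = @"(.*/)?robots\.txt(/.*)?" });

        // Routes are evaluated in registration order, so the default
        // route comes after the ignore routes.
        routes.MapRoute(
            "Default",
            "{controller}/{action}/{id}",
            new { controller = "Home", action = "Index", id = UrlParameter.Optional });
    }

    protected void Application_Start()
    {
        AreaRegistration.RegisterAllAreas();
        RegisterRoutes(RouteTable.Routes);
    }
}
```

If your RegisterRoutes method looks different, the same idea applies: keep the IgnoreRoute call above the first MapRoute call.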

Your mileage may vary with the robots.txt file and it might not be a bad idea to have a robots.txt with some exclusions if you really need to exclude some of your content from web crawlers or bots.
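If you do go that route, a robots.txt with exclusions might look something like the sketch below. The paths are hypothetical and only there for illustration, so substitute the areas of your own site:

```
# Hypothetical paths, for illustration only
User-agent: *
Disallow: /admin/
Disallow: /account/
```

Keep in mind the earlier caveat: anything listed here is visible to every bot, including the ones that ignore the rules.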

This particular client didn't really need one because most, if not all, of the content on their website requires you to log in, so bots and web crawlers wouldn't get much from crawling the site.
