Working with the robots.txt file
- The name of the robots.txt file must be written entirely in lowercase (Robots.txt and robots.Txt, for example, are incorrect).
- Wildcards are not supported in either field. Only * can be used, and only in the User-agent field, where it is a special character denoting “all”. Googlebot is currently the only robot that supports some wildcards, for example to match file extensions.
Ref: http://www.google.com/webmasters/faq.html#12
- The robots.txt file is an exclusion file meant for search engine robots; a website does not need it in order to function. An empty or absent file simply means that all robots are welcome to index any part of the website.
- Only one robots.txt file can be maintained per domain.
- Website owners who do not have administrative rights sometimes cannot create a robots.txt file. In such situations, the Robots Meta Tag can be configured to serve the same purpose. Keep in mind, however, that questions have lately been raised about how robots treat the Robots Meta Tag, and some robots might skip it altogether. The protocol requires every robot to start with robots.txt, which makes that file the default starting point for all robots.
- Each user agent must be specified on its own line, and a Disallow field must not carry more than one path per line. There is no limit to the number of lines, though: both the User-agent and Disallow fields can be repeated with different values any number of times. Blank lines must not appear within a single record (a User-agent line together with its Disallow lines), because a blank line marks the end of a record.
- Use lowercase consistently for the content of the robots.txt file. Note also that file names on Unix systems are case sensitive, so take care to match the exact case when specifying directories or files for Unix-hosted domains.
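The record rules above can be tried out with Python's standard urllib.robotparser module. This is only a minimal sketch: the agent name ExampleBot, the paths, and the helper allowed() are assumptions made for illustration, and real robots may interpret a file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (agent name and paths are illustrative).
# Each Disallow line carries exactly one path, and a blank line separates
# the two records.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if this parser lets `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# ExampleBot matches its own record, so only that record applies to it.
print(allowed("ExampleBot", "/private/report.html"))  # False
print(allowed("ExampleBot", "/cgi-bin/script.pl"))    # True

# Any other robot falls back to the "*" record.
print(allowed("OtherBot", "/cgi-bin/script.pl"))      # False

# Paths are compared case-sensitively, so /Private/ is not covered
# by the rule for /private/.
print(allowed("ExampleBot", "/Private/report.html"))  # True
```

The last call shows why case matters on Unix-hosted domains: a rule written for /private/ does not protect a directory that is actually named /Private/.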
You can use this great tool from www.searchengineworld.com to check your robots.txt:
The robots.txt Validator
Please note that the full path to the robots.txt file must be entered in the field.
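For a quick local check before turning to an online validator, a few of the rules listed earlier can be tested with a short script. The function lint_robots_txt() below is a hypothetical helper written for this illustration, not part of any validator; it only looks for a missing colon, a Disallow line outside a record, and more than one path on a Disallow line.

```python
def lint_robots_txt(text: str) -> list[str]:
    """Return warnings for a few common robots.txt mistakes."""
    warnings = []
    in_record = False  # True after a User-agent line has opened a record
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            in_record = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line: does not end the record
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep:
            warnings.append(f"line {number}: expected 'Field: value'")
        elif field == "user-agent":
            in_record = True
        elif field == "disallow":
            if not in_record:
                warnings.append(
                    f"line {number}: Disallow without a preceding User-agent line"
                )
            if len(value.split()) > 1:
                warnings.append(
                    f"line {number}: more than one path on a Disallow line"
                )
    return warnings


# Illustrative input with two mistakes: two paths on one Disallow line,
# and a blank line that cuts the last Disallow off from its record.
sample = "User-agent: *\nDisallow: /cgi-bin/ /tmp/\n\nDisallow: /private/\n"
for warning in lint_robots_txt(sample):
    print(warning)
```

A file that passes this sketch can still contain other errors, so it complements a full validator rather than replacing one.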