Working with robots.txt file 

by Pannu Jagdeep.S. 4/23/04

Synopsis:

Learn all about working with the robots.txt file. A useful guide that covers what the robots.txt file is, its advantages and disadvantages, and how to optimize and use it to define the content you want excluded from indexing, thus saving the crawler's indexing time…
The Article

Working with the robots.txt file

 - The robots.txt file name is always written in all lowercase (e.g. Robots.txt or robots.Txt is incorrect).

 - Wildcards are not supported in either field. Only * can be used, and only in the User-agent field, where it is a special value denoting “all robots”. Googlebot is the only robot that now supports some wildcard matching on file extensions.
Ref:
http://www.google.com/webmasters/faq.html#12 

 - The robots.txt file is an exclusion file meant for search engine robot reference and not obligatory for a website to function. An empty or absent file simply means that all robots are welcome to index any part of the website. 

 - Only one robots.txt file can be maintained per domain. 

 - Website owners who do not have administrative rights sometimes cannot create a robots.txt file. In such situations, the Robots Meta Tag can be configured to serve the same purpose. Keep in mind that questions have lately been raised about robot behavior regarding the Robots Meta Tag, and some robots might skip it altogether. The protocol requires all robots to start with robots.txt, which makes it the default starting point for all robots.

 - Each user agent needs its own User-agent line, and a Disallow field should not carry more than one path per line in the robots.txt file. There is no limit on the number of lines, though: both the User-agent and Disallow fields can be repeated with different values any number of times. Blank lines must not appear inside a single record (a User-agent line and the Disallow lines that follow it), because a blank line marks the end of a record.

 - Keep robots.txt content in lower case where you can, but remember that filenames on Unix systems are case sensitive. On Unix-hosted domains, the directories and files you list must match the case of the real names exactly.
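
The points above can be tried out locally. The following is a minimal sketch using Python's standard urllib.robotparser module. The sample rules, the robot names ExampleBot and OtherBot, and the paths are all made up for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical file: one record for a named robot, one for everyone else.
# The blank line separates the two records; each Disallow carries one path.
SAMPLE = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

print(is_allowed(SAMPLE, "ExampleBot", "/index.html"))    # False: named robot is shut out
print(is_allowed(SAMPLE, "OtherBot", "/private/a.html"))  # False: blocked by the * record
print(is_allowed(SAMPLE, "OtherBot", "/Private/a.html"))  # True: path case does not match
print(is_allowed("", "OtherBot", "/private/a.html"))      # True: empty file allows everything
```

The third call shows why case matters: /Private/ and /private/ are different paths to the parser, just as they are on a Unix host.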

You can check your robots.txt with this great tool from www.searchengineworld.com:

The robots.txt Validator

Please note that the full path to the robots.txt file must be entered in the field.
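
For a rough local check before using an online validator, something like the sketch below may help. It is not the validator mentioned above and does not reproduce its checks; the function names robots_url and lint, the warning texts, and the example.com address are assumptions made for illustration. It builds the full robots.txt address for a site and flags a few of the mistakes described in this article.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the full robots.txt URL for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # The file sits at the root of the host and is named in lower case.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def lint(robots_txt: str) -> list:
    """Return warnings for a few common mistakes (illustrative, not exhaustive)."""
    warnings = []
    seen_agent = False  # True once the current record has a User-agent line
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        if not raw.strip():
            seen_agent = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line, does not end the record
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep:
            warnings.append(f"line {number}: missing ':' separator")
        elif field == "user-agent":
            seen_agent = True
        elif field == "disallow":
            if not seen_agent:
                warnings.append(f"line {number}: Disallow without a User-agent")
            if len(value.split()) > 1:
                warnings.append(f"line {number}: more than one path on a line")
            if "*" in value:
                warnings.append(f"line {number}: wildcard in Disallow")
    return warnings


print(robots_url("http://www.example.com/dir/page.html"))
for warning in lint("User-agent: *\nDisallow: /tmp/ /logs/\n\nDisallow: /*.gif\n"):
    print(warning)
```

The sample passed to lint breaks three rules on purpose: two paths on one Disallow line, a Disallow line cut off from its User-agent line by a blank line, and a wildcard in a Disallow value.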
