Working with robots.txt file 

by Pannu Jagdeep.S. 4/23/04

Synopsis:

Learn all about working with the robots.txt file. This guide covers what the robots.txt file is, its advantages and disadvantages, and how to use and optimize it to define the content you want excluded from indexing, which saves the crawler's indexing time…
The Article

What is the robots.txt file?

The robots.txt file is a plain ASCII text file containing instructions that tell search engine robots which content they are not allowed to index. These instructions strongly influence how a search engine indexes your website’s pages. The file always lives at the root of the site: www.domain.com/robots.txt. It is the first file a compliant robot requests; the robot picks up the instructions there and follows them when indexing the site content. Each record in the file contains two text fields. Let's study this example:

User-agent: *
Disallow:

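The User-agent field names the robot to which the access policy in the following Disallow field applies. The Disallow field specifies the URLs that the named robots may not access. In the example above the Disallow field is empty, so nothing is excluded and every robot may access the whole site. Another example: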

User-agent: *
Disallow: /

Here “*” means all robots and “/” means all URLs. This reads as “no access for any search engine to any URL”. A Disallow value is matched against the beginning of the URL path, and every path begins with “/”, so a lone “/” bans access to all URLs. To give partial access, specify only the banned URLs in the Disallow field. Let's consider this example:

# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/

Here both fields are repeated. Different rules can be given for different user agents in separate records, set apart by a blank line. These rules mean that all user agents are banned from /concepts/new/ except Googlebot, which has full access. Everything from a # character to the end of the line is treated as a comment and ignored.
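The three rule sets discussed in this article can be checked with a short script. The following is only a minimal sketch, assuming Python's standard urllib.robotparser module as the parser; the robot name "ExampleBot", the helper build_parser and the page names are made up for illustration, and real search engine robots may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets from the article, as they would appear in robots.txt.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

MIXED = """\
# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    page = "/concepts/new/index.html"  # illustrative page name
    for label, rules in (("allow all", ALLOW_ALL), ("block all", BLOCK_ALL), ("mixed", MIXED)):
        parser = build_parser(rules)
        for robot in ("Googlebot", "ExampleBot"):  # ExampleBot is a made-up robot name
            print(label, robot, page, parser.can_fetch(robot, page))
```

With the mixed rule set, Googlebot is matched by its own record and is allowed everywhere, while any other robot falls through to the "*" record and is refused everything under /concepts/new/.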

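Because the robots.txt file always sits at the root of the site, as described at the start of the article, its address can be derived from the URL of any page on that site. A minimal sketch, assuming the page URL includes its scheme (http:// or https://); the function name robots_txt_url is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.domain.com/concepts/new/index.html"))
```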
