iNET Interactive - Online Advertising Agency
          
Writing a Robots.txt file 

by Vinoth Babu 5/08/06

Write a Robots.txt File

 One of the most fundamental steps when optimizing a website is writing a robots.txt file. It tells spiders which parts of the site are useful and public for sharing in the search engine indexes and which are not. A poorly written robots.txt file, on the other hand, can stop the search spiders from crawling and indexing your website properly. In this article I will show you how to make sure everything works correctly.

 Some SEOs say that using a robots.txt file will not improve your search engine rankings. I disagree with this point: many search engines have publicly recommended using a robots.txt file. Here is a quote taken from Google:

 "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler."

 Also, if you read the stats file on your web hosting server, you will usually find the URL of your robots.txt being requested. If a search bot asks for robots.txt and does not find it on your server, the major crawlers treat the whole site as open to crawling, so without the file you have no say in what gets fetched. Let us now see how to build a robots.txt file.
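 To see which spiders ask for robots.txt, you can scan the server log yourself. The following is a minimal Python sketch, assuming the log uses the common "combined" layout with the user agent as the last quoted field; the sample lines, IP addresses and the function name robots_requests are made up for illustration, and your host's log format may differ.

```python
import re
from collections import Counter

# Hypothetical log lines (assumed "combined" format: request, status, size,
# referrer, user agent). Replace with the contents of your own log file.
SAMPLE_LOG = """\
192.0.2.1 - - [08/May/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"
192.0.2.1 - - [08/May/2006:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
198.51.100.2 - - [08/May/2006:11:30:00 +0000] "GET /robots.txt HTTP/1.0" 404 210 "-" "Yahoo! Slurp"
"""

# Captures the request path, the status code, and the final quoted field.
LINE_RE = re.compile(r'"GET (\S+) [^"]*" (\d{3}) .*"([^"]*)"$')


def robots_requests(log_text):
    """Count robots.txt requests per (user agent, HTTP status) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(1) == "/robots.txt":
            counts[(match.group(3), match.group(2))] += 1
    return counts


for (agent, status), hits in robots_requests(SAMPLE_LOG).items():
    print(agent, status, hits)
```

 A 404 status next to a spider's name means it asked for the file and your server did not have one.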

 Write a Robots.txt File - How Do I Build a Robots.txt?

 After opening Notepad (or another text editor), save the blank file as robots.txt. The file must be placed at the root level of your web server, in other words in the same folder as your index page (index.php or index.html).

 The text file is actually a list. Its directions consist of two fields, or lines of instruction. These two lines are the important part:
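 Because the file always sits at the root, its address can be worked out from any page on the site. Here is a small Python sketch of that idea; the function name robots_txt_url and the example.com address are only illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/articles/seo/index.php?page=2"))
```

 A robots.txt saved inside a subfolder such as /articles/ would not be found this way, which is why the root location matters.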

 The first line is the User-agent line.

This is the line where you specify which search spider the rules that follow apply to.

 

The second line is the directive line, or Disallow field.

This is the line you use to block folders or files from spiders.
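To make the two-field structure concrete, here is a rough Python sketch of how a program might group these lines into records. It is not how any particular search engine reads the file; the function name parse_records and the dictionary layout are assumptions made for this example, and only the two fields discussed in this article are handled.

```python
def parse_records(text):
    """Group robots.txt lines into records of user agents and disallowed paths."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # ignore comments and padding
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A User-agent line after a Disallow line starts a new record.
            if current is None or current["disallow"]:
                current = {"user_agents": [], "disallow": []}
                records.append(current)
            current["user_agents"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records


EXAMPLE = """\
User-agent: googlebot
Disallow: /downloads/

User-agent: *
Disallow:
"""

for record in parse_records(EXAMPLE):
    print(record)
```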

 
Here a question may arise: why should we disallow certain folders or files? Some folders may be private or meant only for registered users or visitors, and a Disallow line stops search spiders from indexing the pages in the folders you have specified.

 
To write the robots.txt file, you would start by addressing specific search engines. The User-agent line would start as:

    User-agent:

 Adding a specific search engine's spider name here gives that spider notice that it is to follow the next line for instruction, i.e.:

 
    User-agent: googlebot

 
Now you can tell googlebot how to treat your pages: which pages may be spidered and which may not. This line tells googlebot that it is to follow the next line's directions on how to proceed through your website, or to leave altogether.

 

The second line, known as the directive, is written as:

 

    Disallow:

 

When you add a folder after the Disallow statement, the search spider should skip that folder for indexing purposes and move on to others where there is no restriction.

 

    Disallow: /downloads/
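You can check how such a rule behaves with the robots.txt parser in Python's standard library (urllib.robotparser). The sketch below assumes the rule is addressed to googlebot as in the earlier example; the example.com URLs and the spider name otherbot are hypothetical, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: googlebot
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # feed the rules directly, no network access


def allowed(agent, url):
    """Return True if this parser lets the named spider fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("googlebot", "http://www.example.com/downloads/setup.zip"))
print(allowed("googlebot", "http://www.example.com/index.html"))
print(allowed("otherbot", "http://www.example.com/downloads/setup.zip"))
```

Only googlebot is kept out of /downloads/ here, because the record names no other spider.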

 

You can also disallow specific files this way:

 

    Disallow: /cheeseyporn.htm
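A file path in a Disallow line should begin with a slash, because spiders compare it against the path part of the URL, which always starts with "/". The Python sketch below illustrates this with the standard library parser; the file name private.htm and the helper name can_fetch are made up for the example, and other parsers may be more forgiving.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(rules, url, agent="googlebot"):
    """Parse the given robots.txt text and ask whether the URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


WITH_SLASH = "User-agent: *\nDisallow: /private.htm\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: private.htm\n"
URL = "http://www.example.com/private.htm"

print(can_fetch(WITH_SLASH, URL))     # rule matches, so the file is blocked
print(can_fetch(WITHOUT_SLASH, URL))  # rule never matches in this parser
```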

 

 

If you leave the Disallow directive line blank, this indicates that ALL files may be retrieved and indexed by the specified robot(s). The following would let all robots index all files:

 

    User-agent: *
    Disallow:

 

Conversely, you can keep all robots out just as easily:

 

    User-agent: *
    Disallow: /
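The difference between an empty Disallow and a Disallow of "/" is a single character, so it is worth testing before you upload. Here is a short Python sketch using the standard library parser; the spider name anybot and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

URLS = [
    "http://www.example.com/",
    "http://www.example.com/articles/robots.html",
]


def fetchable(rules, url, agent="anybot"):
    """Return True if the rules let the (hypothetical) spider fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


for url in URLS:
    print(url, fetchable(ALLOW_ALL, url), fetchable(BLOCK_ALL, url))
```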

 

Since the root directory is blocked, none of the other folders and files can be indexed or crawled. Your site will be removed from search engines once they read your robots.txt and update their indexes. I doubt anyone would want their site removed from the search engines, unless you are a giant like Yahoo or Microsoft who might feel that a lot of bandwidth is wasted by letting spiders recrawl the site for regular updates.

 

 

You can provide multiple Disallow lines for one User-agent. In the following example, all spiders will be told not to index the downloads and the images directories.

 

    User-agent: *
    Disallow: /downloads/
    Disallow: /images/
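Each Disallow value works as a prefix: a URL path is blocked if it starts with any one of them. The Python sketch below shows that idea by hand, without any library; the function name is_blocked and the sample paths are hypothetical, and it ignores extras such as wildcards that some spiders support.

```python
DISALLOWED = ["/downloads/", "/images/"]


def is_blocked(path, disallowed_prefixes):
    """Return True if the URL path starts with any non-empty Disallow value."""
    return any(
        bool(prefix) and path.startswith(prefix) for prefix in disallowed_prefixes
    )


for path in ["/images/logo.gif", "/downloads/tool.zip", "/articles/robots.html", "/images"]:
    print(path, is_blocked(path, DISALLOWED))
```

Note that "/images" without the trailing slash is not covered by "/images/", since the prefix is compared character by character.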

 

 If the pages are cleanly coded, a correct robots.txt will often help your rankings in all three of the major search engines.

 After you have written your robots.txt file and placed it on your server, you should validate it with one of the robots.txt validation tools available online.
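 Before you upload, a few common mistakes can also be caught with a short script of your own. The following Python sketch is not a replacement for a real validator; the function name lint_robots_txt and its messages are invented for this example, and it deliberately knows only the two fields covered in this article, so other fields that some spiders accept (such as Allow or Sitemap) would be reported as unknown.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # only the fields from this article


def lint_robots_txt(text):
    """Return a list of (line_number, message) problems found in the text."""
    problems = []
    seen_user_agent = False
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "User-agent has no value"))
        else:  # a Disallow line
            if not seen_user_agent:
                problems.append((number, "Disallow appears before any User-agent"))
            if value and not value.startswith("/"):
                problems.append((number, "Disallow path should start with '/'"))
    return problems


BAD = "Disallow: /tmp/\nUser-agent: *\nDisallow: images/\nDisalow: /x/\n"
for number, message in lint_robots_txt(BAD):
    print(f"line {number}: {message}")
```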
