Robots.txt - A Comprehensive Guide

What is Robots.txt?

The robots.txt protocol, also known as the Robots Exclusion Standard, is a convention for keeping web robots and crawlers out of all or part of a publicly viewable website. It is a plain text file, not HTML, that tells search engine bots which pages of your site they should not visit. The file is advisory rather than compulsory: well-behaved search engines obey its restrictions, but nothing forces a crawler to do so. Search engine robots are automated, and before they request any page of a site they check whether a robots.txt file exists that would keep them out of certain pages. If you want your whole site to be indexed by search engines, you do not need a robots.txt file at all. If you do use one, it is very important to put it in the right place.

Location of robots.txt

You must place robots.txt in the main (root) directory of your site; otherwise search engines cannot detect it. Search engines request mydomain.com/robots.txt and nothing else; they do not search the entire site for a robots.txt file. If they cannot find it in the main directory, they presume that the site has no robots.txt file and start indexing the whole site. So always put the robots.txt file in the right place. The concept and structure of this file were designed a long time ago, so we will discuss them only briefly.
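
As a rough illustration of the location rule, the short Python sketch below works out where a crawler would look for robots.txt given any page address. The function name robots_url and the example.com addresses are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
# prints https://example.com/robots.txt
```

Whatever page you start from, the result always points at the main directory of that host, which is why a robots.txt file stored in a subfolder is never found.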

Structure of a robots.txt file

A robots.txt file can list any number of user agents and disallowed files and directories, but its structure is very simple. Let us have a look at the syntax of a robots.txt file:

User-agent:

Disallow:

“User-agent” names the search engine crawler that the rules apply to, and “Disallow” lists the directories and files that the crawler should not include in indexing. To write a comment line, start the line with a # sign, like this:

# All user agents are disallowed to view the /temp/ directory

User-agent: *

Disallow: /temp/
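
To make the structure concrete, here is a minimal Python sketch that reads such a file line by line, drops the comments and splits each remaining line into a field and a value. The function name parse_robots_lines is hypothetical, and real crawlers handle more edge cases than this.

```python
def parse_robots_lines(text: str) -> list[tuple[str, str]]:
    """Split robots.txt text into (field, value) pairs, ignoring comments."""
    records = []
    for raw in text.splitlines():
        # Everything after a # sign is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue  # skip blank lines and lines without a "field: value" shape
        field, value = line.split(":", 1)
        records.append((field.strip().lower(), value.strip()))
    return records


sample = """# All user agents are disallowed to view the /temp/ directory
User-agent: *
Disallow: /temp/
"""
print(parse_robots_lines(sample))
# prints [('user-agent', '*'), ('disallow', '/temp/')]
```

Note that a line such as "User-agent *" without the colon would be skipped by this sketch, so the colon after the field name matters.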

How to create a robots.txt file?

There are certain points to keep in mind while creating a robots.txt file. First, list the directories and files on your server that you want to block from being indexed. Second, decide whether you want to give extra instructions to a specific search engine besides the general crawling directives. Third, create the robots.txt file in a text editor and write the directives that block your content. Fourth, you can add a reference to your sitemap file, but this is optional. Fifth, validate your robots.txt file by checking it for errors, and finally upload it to the main directory. You can use a robots.txt file generator instead; just make sure to review the result carefully. There are also certain rules to follow while creating a robots.txt file. Let us look at some examples that make this clear.
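
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name build_robots_txt, the folder names and the sitemap address are assumptions made for the example, and you would still upload the resulting text to your main directory yourself.

```python
def build_robots_txt(disallowed, sitemap_url=None, user_agent="*"):
    """Build robots.txt text from a list of paths to block."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        # An empty Disallow line blocks nothing.
        lines.append("Disallow:")
    if sitemap_url:
        # The sitemap reference is optional.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(["/temp/", "/private/"], "https://example.com/sitemap.xml"))
```

Review the generated text before uploading it, just as you would with any robots.txt generator.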

Examples of robots.txt format

Allow indexing of everything

User-agent: *

Disallow:

Disallow indexing of everything

User-agent: *

Disallow: /

Disallow indexing of a particular folder

User-agent: *

Disallow: /folder/

Disallow Googlebot from crawling a folder, except for one file in that folder

User-agent: Googlebot

Disallow: /folder1/

Allow: /folder1/myfile.htm
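
If you want to check what such rules do before uploading them, Python's standard urllib.robotparser module can evaluate a robots.txt text against a path. The sketch below tries the first three examples; the helper name allowed and the agent name ExampleBot are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


allow_all = "User-agent: *\nDisallow:\n"
block_all = "User-agent: *\nDisallow: /\n"
block_folder = "User-agent: *\nDisallow: /folder/\n"

print(allowed(allow_all, "/any/page.html"))        # True
print(allowed(block_all, "/any/page.html"))        # False
print(allowed(block_folder, "/folder/page.html"))  # False
print(allowed(block_folder, "/about.html"))        # True
```

One caution: this module applies the rules in the order they appear in the file, while Google documents that the most specific matching rule wins. An example that mixes Disallow and Allow for the same folder, like the Googlebot one above, may therefore be judged differently by this module than by Googlebot itself.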

Important rules for creating a robots.txt file

Parameters such as “follow, noindex” do not go in robots.txt; write them in a robots meta tag to control the indexing or crawling of a page
Write only one URL path per Disallow line
Each subdomain under a root domain uses its own robots.txt file
For pattern matching, Google and Bing accept two special characters: * (any sequence of characters) and $ (end of the URL)
Name the file robots.txt, not Robots.TXT, because the file name is case sensitive
Never use spaces to separate query parameters, because robots.txt does not accept them
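
To show how the two pattern characters behave, here is a small Python sketch that turns a robots.txt path pattern into a regular expression. It is one possible reading of the * and $ rules, not the search engines' actual code, and the function names are made up for the example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without a trailing $, the pattern only has to match the start of the path.
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/docs/file.pdf"))      # True
print(matches("/*.pdf$", "/docs/file.pdf?x=1"))  # False, $ requires the URL to end in .pdf
print(matches("/folder/", "/folder/page.html"))  # True, plain prefix match
print(matches("/*?", "/page?id=2"))              # True
```

A pattern without * or $ is simply a prefix, which is why "Disallow: /folder/" covers everything inside that folder.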

Well, there are certain tools that help you find and correct mistakes in a robots.txt file.

Test a robots.txt file

With the help of these tools, you can find out whether your robots.txt file is blocking a file on your site or not. Remember that a search engine robot that finds a matching rule in the robots.txt file stops crawling those parts of your site.

It is good that search engines visit our site regularly and index our content, but sometimes content gets indexed that we did not want indexed. Some sensitive data should not be viewed by the whole world. With the help of a robots.txt file, you can prevent search engines from indexing those parts of your site.

Comments

Mosam Gor says

February 3, 2013 at 2:36 pm

Hi Amit,

Great Post!
It is Quite Helpful for SEO and Webmaster Beginners!
Thanks For Sharing 🙂

Mosam

Narender Chopra says

February 3, 2013 at 5:20 pm

Robots.txt is a major SEO component that no one should miss. But if one is not aware of its usage, then it's a thing to avoid. If it's configured incorrectly, you may stop search engines from accessing your content. Very well written tutorial, Amit bro.

Ravi says

February 3, 2013 at 5:55 pm

Great Article.
It is great that you have posted this one at the right time for me, as I have just got started with blogging and I am confused about robots.txt. Can you mention any ideal robots.txt file to prefer?

- Amit Shaw says
  
  February 3, 2013 at 6:04 pm
  
  Thanks Ravi. Glad that I posted this article at the right time 🙂
  I hope this article will help you to solve your issue.
  See, robots.txt depends on your blog and on you: which links do you want indexed or deindexed by search engines?
  Though, here is the one which I mainly prefer:
  
  Sitemap:
  
  User-agent: *
  Disallow: /wp-content/
  Allow: /wp-content/uploads/
  Disallow: /wp-content/downloads/
  Disallow: /downloads/
  Disallow: /feed/
  Disallow: /recommends/
  Disallow: /go/
  Disallow: /category/
  Disallow: /tag/
  Disallow: /tag/*
  Disallow: /archives/
  Disallow: /author/
  Disallow: /search?
  Disallow: /cgi-bin/
  Disallow: /wp-admin/
  Disallow: /wp-includes/
  Disallow: /recommended/
  Disallow: /comments/feed/
  Disallow: /index.php
  Disallow: /xmlrpc.php
  Disallow: *?wptheme
  Disallow: ?comments=*
  Disallow: /?p=*
  Disallow: /*.pdf$
  Disallow: /*.php$
  Disallow: /*.js$
  Disallow: /*.cgi$
  Disallow: /*.xhtml$
  Disallow: /*.php*
  Disallow: /*.inc$
  Disallow: /*.css$
  Disallow: /*.txt$
  Disallow: /*?*
  Disallow: */feed/
  Disallow: */trackback/
  Disallow: /cgi-bin/
  Disallow: /images/
  Disallow: /embed.js?pname=wordpress&pver=*
  Disallow: ?comments=*
  Disallow: /*?replytocom=*
  Disallow: /?p=*
  Disallow: /search?
  Disallow: /stats/
  Disallow: /general/
  Disallow: /date/
  Disallow: /trackback/
  Disallow: *?wptheme
  Disallow: /?attachment_id*
  Disallow: /search/?*
  Disallow: */trackback/*
  Disallow: /*.js$
  Disallow: /*.inc$
  Disallow: /*.css$
  Disallow: /*.cgi$
  Disallow: /*.wmv$
  Disallow: /*.cgi$
  Disallow: /*.xhtml$
  Disallow: /?cat*
  Disallow: /?m*
  Disallow: /*?utm_source*
  Disallow: /index.php/*
  Disallow: /wp-*
  Disallow: /custom-search/
  
  User-agent: Googlebot-Image
  Allow: /images/post/
  
  User-agent: Adsbot-Google
  Allow: /*
  
  User-agent: Mediapartners-Google
  Allow: /*
  
  Thanks.
  
  - Ravi says
    
    February 4, 2013 at 8:59 am
    
    Thanks Man
    Didn’t expect such a detailed reply
    
RC Organo says

February 3, 2013 at 8:02 pm

One must be careful when configuring robots.txt, as you may accidentally place there folders or locations that you need bots to crawl. So mind the “Allow” and “Disallow” when configuring it.

Aamir Saifi says

February 8, 2013 at 9:43 pm

Hey, it's an awesome post. Robots.txt plays an important role if you want to restrict search engines from indexing unnecessary things on your blog. After reading this article, one should pay attention to robots.txt, especially Blogspot users.
