What is robots.txt?
The robots.txt protocol, also known as the Robots Exclusion Standard, is a convention for keeping web robots and web crawlers out of all or part of a website that is otherwise open to the public. It is a plain-text file, not HTML, that tells search bots which pages of your site they may visit. A robots.txt file is not compulsory, and the major search engines generally obey the restrictions it sets. Search engine robots are automatic: before they enter any page of a site, they check whether a robots.txt file exists that would keep them out of some pages. If you want search engines to index your whole site, you do not need a robots.txt file. If you do use one, it is very important to put it in the right place.
Location of robots.txt
You must place robots.txt in the main (root) directory of your site; otherwise, search engines cannot detect it. Search engines look for mydomain.com/robots.txt in the main directory instead of searching the entire site for a robots.txt file. If they cannot find it in the main directory, they presume that the site has no robots.txt file and start indexing the whole site. Hence, always put the robots.txt file in the right place. The concept and structure of this file were designed a long time ago, so we will discuss them only briefly.
Structure of a robots.txt file
A robots.txt file can list any number of user agents and disallowed files and directories, but its structure is very simple. Let us have a look at the syntax of a robots.txt file.
Search engine crawlers are named on “User-agent” lines, and the directories and files that should be excluded from indexing are listed on “Disallow” lines. If you want to write a comment line, start the line with the # sign, like this:
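In its general form, a record pairs one User-agent line with one or more Disallow lines. The bracketed values below are placeholders, not literal text:

```text
User-agent: [name of the crawler, or * for all crawlers]
Disallow: [path of the file or directory to exclude]
```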
# All user agents are disallowed to view the /temp directory
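Put together, the comment and the record it describes might look like the sketch below. The /temp/ path is simply the directory named in the comment.

```text
# All user agents are disallowed to view the /temp directory
User-agent: *
Disallow: /temp/
```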
How to create a robots.txt file?
There are certain points that you should keep in mind while creating a robots.txt file. First, list the directories and files on your server that you want to block from being indexed. Second, decide whether you want to add extra directives for a specific search engine besides the general directives for crawling. Third, create the robots.txt file in a text editor and write the directives that block your content. Fourth, you can add a reference to your sitemap file, but this is optional. Fifth, validate your robots.txt file by checking it for errors, and finally upload it to the main directory. There are also certain rules that should be followed while creating a robots.txt file. Let us have a look at some examples that make this clear.
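Following the steps above, a finished file might look like the sketch below. The directory names, the Bingbot-specific group, and the example.com sitemap URL are all hypothetical and chosen only for illustration.

```text
# General directives for all crawlers
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# Extra directives for one specific search engine
User-agent: Bingbot
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /drafts/

# Optional reference to the sitemap file
Sitemap: https://www.example.com/sitemap.xml
```

A crawler that finds a group addressed to it by name generally ignores the * group, which is why the shared rules are repeated in the Bingbot group.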
Examples of robots.txt format
Allow indexing of everything
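A minimal sketch of this case: an empty Disallow value blocks nothing, so every compliant crawler may visit the whole site.

```text
User-agent: *
Disallow:
```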
Disallow indexing of everything
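A minimal sketch of this case: a single slash matches every path on the site, so compliant crawlers are asked to stay away entirely.

```text
User-agent: *
Disallow: /
```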
Disallow indexing of a particular folder
Disallow Googlebot from crawling a folder, except for one file in that folder
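A minimal sketch of this case, assuming a hypothetical folder named /folder/:

```text
User-agent: *
Disallow: /folder/
```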
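A minimal sketch of this case, assuming a hypothetical folder /folder1/ and a hypothetical file myfile.html inside it. It relies on the Allow directive, which Google supports; other crawlers may treat it differently.

```text
User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html
```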
Important rules for creating a robots.txt file
- Parameters such as “noindex, follow” belong in a meta robots tag, which is what you use to control indexing or crawling page by page
- Write one Disallow line per URL; a single line cannot list several paths
- Each subdomain of a root domain uses its own robots.txt file
- For pattern matching, Google and Bing accept two special characters: * (any sequence of characters) and $ (the end of the URL)
- Name the file robots.txt, not Robots.TXT, because the filename is case sensitive
- Never use spaces to separate query parameters, as robots.txt does not accept them
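The two pattern characters and the one-path-per-line rule can be sketched as follows. The paths are hypothetical, and crawlers other than Google and Bing may not honor these patterns.

```text
User-agent: *
# * matches any sequence of characters: block every URL that contains a query string
Disallow: /*?
# $ marks the end of the URL: block every URL that ends in .pdf
Disallow: /*.pdf$
# One path per Disallow line
Disallow: /tmp/
Disallow: /drafts/
```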
There are also certain tools that help you find and correct mistakes in a robots.txt file.
Test a robots.txt file
With the help of these tools, you can find out whether your robots.txt file is blocking files on your site. Keep in mind that when a search engine robot finds a blocking rule in robots.txt, it stops crawling the pages that the rule covers.
It is good that search engines visit our site regularly and index our content, but sometimes they index content that we do not want indexed. Some sensitive data should not be viewed by the whole world. With the help of a robots.txt file, you can prevent search engines from indexing those parts of your site.
Author: Kelly is a writer/blogger. She loves writing, travelling and reading books. She contributes to Bret Clark Microsoft.