A robots.txt file is a plain text file that webmasters create to tell robots, mostly search engine crawlers, which parts of a website they may and may not crawl.
For example, if a webmaster wanted to block their whole website from all robots or crawlers, it is as simple as adding the following 2 lines to a plain text file, naming it robots.txt and uploading it to the root directory of the site:
User-agent: *
Disallow: /
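To check that these two lines really block everything, the rules can be fed to Python's standard-library `urllib.robotparser` module. This is a minimal sketch: `www.example.com` and the helper names `robots_url` and `is_allowed` are hypothetical, and Python's parser is a simplified implementation, so individual search engines may interpret edge cases differently.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The two lines from the example above, as the body of a robots.txt file.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def robots_url(page_url: str) -> str:
    """Return where crawlers look for robots.txt: the root of the host."""
    return urljoin(page_url, "/robots.txt")

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(robots_url("https://www.example.com/blog/post?page=2"))
# https://www.example.com/robots.txt
print(is_allowed(BLOCK_ALL, "Googlebot", "https://www.example.com/"))         # False
print(is_allowed(BLOCK_ALL, "Bingbot", "https://www.example.com/blog/post"))  # False
```

Because `User-agent: *` matches every crawler and `Disallow: /` matches every path, both checks return False.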
And if you only wanted to prevent a certain crawler from crawling part of your content, the following 2 lines would suffice:
User-agent: Googlebot
Disallow: /no-google/
In the above example you are preventing Googlebot from accessing the “no-google” directory in your root folder. Most content management systems available today ship with a robots.txt file, and these typically block crawling of the admin section of the site, as there is no real need for it to be indexed in the search engines.
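The same standard-library parser can confirm that this group only affects Googlebot. In this sketch `www.example.com`, the sample paths and the `can_crawl` helper are hypothetical; note too that Python's parser matches user-agent names loosely (a case-insensitive substring test), which is a simplification of how real crawlers pick the group that applies to them.

```python
from urllib.robotparser import RobotFileParser

# Googlebot is the user-agent token Google's crawler answers to (no space).
NO_GOOGLE = """\
User-agent: Googlebot
Disallow: /no-google/
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

SITE = "https://www.example.com"
for agent in ("Googlebot", "Bingbot"):
    for path in ("/no-google/page.html", "/about/"):
        print(agent, path, can_crawl(NO_GOOGLE, agent, SITE + path))
# Googlebot /no-google/page.html False
# Googlebot /about/ True
# Bingbot /no-google/page.html True
# Bingbot /about/ True
```

Bingbot is allowed everywhere because no group names it and there is no `User-agent: *` group for it to fall back on.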
There are a few things that always need to be remembered when dealing with robots.txt files:
- The filename is case sensitive; please remember to use robots.txt and not Robots.txt.
- The file is publicly available, so anyone can read it. Do not rely on it to hide sections of your website for security reasons; protect those with proper security measures instead.
- Each Disallow line can hold only a single URL path; use a separate Disallow line for every path you want to block.
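- To keep related pages out of search results, use the “noindex, follow” robots meta tag on them as well, bearing in mind that crawlers can only read this tag on pages that robots.txt does not block.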
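The first and third rules in the list can be encoded in a small generator that always uses the lowercase filename and puts each blocked path on its own Disallow line. This is a hypothetical helper (`build_robots_txt` and the sample paths are illustrative, not part of any CMS); it returns the file's text rather than writing it to disk.

```python
from __future__ import annotations

# Crawlers request exactly this lowercase name from the site root.
FILENAME = "robots.txt"

def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render robots.txt text from {user-agent token: [paths to block]}."""
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # One path per Disallow line; never list several paths on one line.
        lines.extend(f"Disallow: {path}" for path in paths)
        groups.append("\n".join(lines))
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"

content = build_robots_txt({
    "*": ["/admin/", "/login/"],
    "Googlebot": ["/no-google/"],
})
print(content)
```

A crawler that has its own group follows only that group, so with this output Googlebot would ignore the `*` rules; repeat shared paths such as `/admin/` in its group if they should apply to Google too.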
When it comes to preventing single pages from being included in the SERPs, it is probably better to use the “noindex” tag. If a URL is only blocked in the robots.txt file, it can still show up in the SERPs, and this could be a problem if people start to link to it, because that link authority will not be passed on to the rest of the site.
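Why the “noindex” tag and a robots.txt block do not combine well can be sketched in code: a crawler only reads a page's robots meta tag if robots.txt lets it fetch the page. This sketch uses the standard-library `html.parser` and `urllib.robotparser`; the `effective_noindex` helper, the sample rules and the sample page are hypothetical, and real search engines also honour signals it ignores, such as the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def effective_noindex(robots_txt: str, user_agent: str, url: str, page_html: str) -> bool:
    """True only if the crawler may fetch the page AND finds a noindex directive."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        return False  # the page is never fetched, so its meta tag is never read
    finder = RobotsMetaFinder()
    finder.feed(page_html)
    return "noindex" in finder.directives

PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
RULES = "User-agent: *\nDisallow: /private/\n"

print(effective_noindex(RULES, "Googlebot", "https://www.example.com/private/page", PAGE))  # False
print(effective_noindex(RULES, "Googlebot", "https://www.example.com/public/page", PAGE))   # True
```

The blocked page carries the same tag as the open one, but only the open page's “noindex” can ever take effect.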
In conclusion, the robots.txt file allows you to stop search engines from crawling parts of the website you don't need in their results. This is great for content management systems, which can sometimes have hundreds of pages: if every single one is indexed, the authority of the site as a whole will be diluted across pages that are not even required in the search engines, and those pages may also be seen as low-quality content on the site.
Source: Bough Digital