What is robots.txt?
A robots.txt file is a plain-text file, based on the Robots Exclusion Protocol, that asks web crawlers and other robots not to access all or part of a website. It lets you state which robots may visit which parts of your site and, for crawlers that support it, how often they may do so.
It’s important to understand that robots.txt does not technically block robots from scanning your site; it is simply a request not to visit it. Well-behaved robots honour that request, but others may ignore it. Also, because this file is publicly available, anyone can see which sections of the site you don’t want robots to access.
A robots.txt file can be created in any plain-text editor, such as Notepad or TextEdit. A few examples of common rules are shown below.
To allow all robots complete access
User-agent: *
Disallow:
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow a single robot
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
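The three rule sets above can be checked with a short script before you upload anything. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the bot name ExampleBot and the example.com URL are assumptions made for illustration, and the rule sets are written out in the code with Googlebot as the token for Google's crawler. Real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets, written out one directive per line.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url.

    is_allowed is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no
    # network request is needed to test a draft file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/private/page.html"  # placeholder URL
    for name, rules in [
        ("allow all", ALLOW_ALL),
        ("exclude all", EXCLUDE_ALL),
        ("allow only Googlebot", ALLOW_ONLY_GOOGLEBOT),
    ]:
        for agent in ("Googlebot", "ExampleBot"):
            print(name, agent, is_allowed(rules, agent, page))
```

Because robots.txt is only a request, a check like this tells you what a compliant crawler would do; it cannot tell you what a crawler that ignores the file will do.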
Once you have created your file, upload it to the root directory of your website via FTP, so that it is served as /robots.txt.
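The file has to sit in the root directory because crawlers request it from the top level of the host, whichever page they intend to visit. As a small illustration, the sketch below derives that location from any page address using Python's standard urllib.parse module; the function name robots_txt_url and the example.com addresses are assumptions for illustration only.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the address a crawler would check before fetching page_url.

    robots_txt_url is a hypothetical helper name. Joining with an
    absolute path ("/robots.txt") keeps the scheme and host and
    discards the rest of the original path and query string.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    # Placeholder address; substitute a page on your own domain.
    print(robots_txt_url("https://example.com/blog/2020/post.html?page=2"))
```

A robots.txt file uploaded into a subfolder such as /blog/ would therefore not be found by a crawler following this convention.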
The default robots.txt file that is added to any domain hosted with LCN.com is as follows:
User-agent: *
Disallow:
Crawl-delay: 10
This allows all bots to crawl the site, but asks them to wait 10 seconds between requests. This helps to prevent server stress when bots crawl our servers. Note that Crawl-delay is a non-standard directive, so not every crawler honours it.
As this is set server side, the file itself is not displayed in your hosted files, but uploading a custom robots.txt will override it.
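To see how a compliant crawler might read the default file, here is a minimal sketch using Python's standard urllib.robotparser module (its crawl_delay method is available from Python 3.6). The function name read_default_rules, the bot name ExampleBot and the example.com URL are assumptions for illustration; this is not how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The default rules described above, one directive per line.
DEFAULT_RULES = """\
User-agent: *
Disallow:
Crawl-delay: 10
"""


def read_default_rules(user_agent: str):
    """Return (allowed, delay) for user_agent under DEFAULT_RULES.

    read_default_rules is a hypothetical helper name.
    """
    parser = RobotFileParser()
    parser.parse(DEFAULT_RULES.splitlines())
    # An empty Disallow line places no restriction on the site.
    allowed = parser.can_fetch(user_agent, "https://example.com/")
    # crawl_delay() returns the requested delay in seconds, or None
    # when no Crawl-delay directive applies to this user agent.
    delay = parser.crawl_delay(user_agent)
    return allowed, delay


if __name__ == "__main__":
    allowed, delay = read_default_rules("ExampleBot")
    print("allowed:", allowed, "delay in seconds:", delay)
```

A polite crawler would pause for the reported number of seconds between requests, for example with time.sleep(delay), whenever the delay is not None.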