Manage the default created robots.txt file
This guide provides information about the robots.txt file that is automatically created for Web hostings where this file is missing.
Preamble
- The robots.txt file acts as a guide for search engine crawlers.
- It is placed at the root of a website and contains specific instructions for these robots, indicating which directories or pages they are allowed to explore and which they should ignore.
- However, robots may choose to ignore these directives, making robots.txt a voluntary guideline rather than a strict rule.
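For reference, a hand-written robots.txt typically combines a user-agent selector with allow/disallow rules. The paths below are purely illustrative:

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```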
File Content
If the robots.txt file is missing from an Infomaniak site, a file with the same name is automatically generated with the following directives:
User-agent: *
Crawl-delay: 10

These directives tell robots to space out their requests by 10 seconds, which helps avoid unnecessarily overloading the servers.
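As a quick sanity check, Python's standard urllib.robotparser module can show how a compliant crawler interprets these two lines (a minimal sketch using only the standard library):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])
rp.modified()  # record a fetch time so queries are answered

# No Disallow rules: every path may be crawled...
print(rp.can_fetch("*", "/any/page.html"))  # True
# ...but polite crawlers should wait 10 seconds between requests.
print(rp.crawl_delay("*"))  # 10
```

Note that the default file restricts nothing; it only throttles crawl frequency.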
Bypassing the default robots.txt
It is possible to bypass the robots.txt by following these steps:
- Create an empty robots.txt file (it will serve only as a placeholder so that the default rules do not apply).
- Redirect the robots.txt URI (Uniform Resource Identifier) to the file of your choice using a .htaccess file.
Example
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} /robots.txt$
RewriteRule ^robots\.txt$ index.php [QSA,L]
</IfModule>

Explanations
- Apache's mod_rewrite module is enabled to allow the rewrite.
- The condition RewriteCond %{REQUEST_URI} /robots.txt$ checks whether the request targets the robots.txt file.
- The rule RewriteRule ^robots\.txt$ index.php [QSA,L] rewrites all requests for robots.txt to index.php; the [QSA] flag preserves the query string, and [L] stops further rule processing.
It is recommended to place these instructions at the beginning of the .htaccess file.
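If your custom rules live in a static file rather than a script, the same approach can point robots.txt at that file instead of index.php (a sketch; robots-custom.txt is a hypothetical filename of your choosing):

```
<IfModule mod_rewrite.c>
RewriteEngine On
# Serve a hand-maintained file in place of the default robots.txt
RewriteRule ^robots\.txt$ robots-custom.txt [L]
</IfModule>
```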