Use each user agent only once

Google doesn't mind if you specify the same user agent multiple times; it simply combines all the rules from the various declarations into one group and follows them all. For example, suppose your robots.txt file contained the following user agents and directives:

User-agent: Googlebot
Disallow: /a/

User-agent: Googlebot
Disallow: /b/

Googlebot would crawl neither of those subfolders. That said, it makes sense to declare each user agent only once, as it avoids confusion. By keeping things neat and simple, you're less likely to make serious mistakes.
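Since Google merges the groups anyway, the two declarations above are equivalent to this single, tidier one:

User-agent: Googlebot
Disallow: /a/
Disallow: /b/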
Avoid unintended errors using specificity

Not providing specific instructions when setting directives can lead to easy-to-miss mistakes with a devastating impact on your SEO. For example, suppose you have a multilingual site and are developing a German version that will live in the /de/ subdirectory. Because it isn't ready yet, you don't want search engines to access it. The robots.txt file below is meant to keep search engines out of that subfolder and everything within it:

User-agent: *
Disallow: /de

However, because robots.txt rules match URL paths by prefix, this also prevents search engines from crawling any page or file whose path starts with /de, for example:

/designer-dresses/
/delivery-information.html
/depeche-mode/t-shirts/
/definitely-not-for-public-viewing.pdf

In this case, the solution is simple: add a trailing slash.

User-agent: *
Disallow: /de/
Use comments to humanize your robots.txt file

Comments can help you explain your robots.txt file to developers, and even to your future self. To include a comment, start the line with a hash (#):

# This instructs Bing not to crawl our site.
User-agent: Bingbot
Disallow: /

Crawlers ignore everything on lines that start with a hash.

Use separate robots.txt files for each subdomain

Robots.txt only controls crawling behavior on the subdomain where it is hosted. If you want to control crawling on a different subdomain, you'll need a separate robots.txt file. For example, if your main site lives on domain.com and your blog lives on blog.domain.com, you need two robots.txt files: one in the root directory of the main domain, and one in the root directory of the blog.
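As a sketch of how that looks in practice (the Disallow paths here are purely illustrative), each file sits at the root of its own subdomain and only affects crawling there:

# Served from https://domain.com/robots.txt
# Controls crawling on the main site only
User-agent: *
Disallow: /checkout/

# Served from https://blog.domain.com/robots.txt
# Controls crawling on the blog only
User-agent: *
Disallow: /drafts/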