jolomic posted on 2024-2-20 14:01:43

Instructs Bing not to crawl our site

Use each user agent only once

Google doesn't mind if you declare the same user agent multiple times: it simply combines all the rules from the various declarations into one group and follows them all. For example, suppose your robots.txt file contained the following user agents and directives:

User-agent: Googlebot
Disallow: /a/

User-agent: Googlebot
Disallow: /b/

Googlebot would crawl neither of those subfolders. That said, it makes sense to declare each user agent only once, because it is less confusing. In other words, by keeping things neat and simple, you're less likely to make serious mistakes.
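If you want to sanity-check a merged rule group before deploying it, here is a minimal sketch using Python's standard-library urllib.robotparser, with the two Googlebot declarations combined into one as recommended (example.com is a placeholder host). One caveat: while Google merges duplicate groups, simpler parsers such as this one may only honor the first group that matches a user agent, which is one more reason to declare each agent once.

from urllib import robotparser

# The two duplicate Googlebot groups merged into a single group.
rules = """\
User-agent: Googlebot
Disallow: /a/
Disallow: /b/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Both subfolders are blocked for Googlebot; everything else is allowed.
print(parser.can_fetch("Googlebot", "https://example.com/a/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/b/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/c/page.html"))  # True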



Avoid unintended errors by being specific

Failing to provide specific instructions when setting directives can lead to easy-to-miss mistakes with a devastating impact on your SEO. For example, suppose you have a multilingual site and are developing a German version that will live in the /de/ subdirectory. It isn't ready yet, so you don't want search engines to be able to access it. The robots.txt file below is meant to prevent search engines from accessing that subfolder and everything within it:

User-agent: *
Disallow: /de

However, this also prevents search engines from crawling any page or file whose path starts with /de, for example:

/designer-dresses/
/delivery-information.html
/depeche-mode/t-shirts/
/definitely-not-for-public-viewing.pdf

In this case the solution is simple: add a trailing slash.

User-agent: *
Disallow: /de/
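To check this behavior before anything goes live, here is a minimal sketch using Python's standard-library urllib.robotparser (example.com, the /de/index.html page, and the blocked_paths helper are illustrative assumptions, not part of any real site or library):

from urllib import robotparser

def blocked_paths(rules, paths):
    """Return the paths a parser would refuse to fetch under the given rules."""
    parser = robotparser.RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch("*", "https://example.com" + p)]

paths = [
    "/de/index.html",
    "/designer-dresses/",
    "/delivery-information.html",
    "/depeche-mode/t-shirts/",
]

# Without the trailing slash, everything starting with /de is blocked.
print(blocked_paths("User-agent: *\nDisallow: /de", paths))

# With the trailing slash, only the /de/ subfolder itself is blocked.
print(blocked_paths("User-agent: *\nDisallow: /de/", paths))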




Use comments to humanize your robots.txt file

Comments help you explain your robots.txt file to developers, and even to your future self. To include a comment, start the line with a hash (#).

# This instructs Bing not to crawl our site.
User-agent: Bingbot
Disallow: /

Crawlers ignore everything on a line that starts with a hash.

Use a separate robots.txt file for each subdomain

Robots.txt only controls crawling behavior on the subdomain where it is hosted. If you want to control crawling on a different subdomain, you need a separate robots.txt file. For example, if your main site is on domain.com and your blog is on blog.domain.com, you need two robots.txt files: one in the root directory of the main domain, the other in the root directory of the blog.
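Since each host obeys only the file at its own root, any programmatic check has to point at that subdomain's robots.txt specifically. A minimal sketch, assuming the domain.com / blog.domain.com setup above and that both files are actually reachable over the network:

from urllib import robotparser

# Each subdomain hosts, and is governed by, its own robots.txt file.
for host in ("https://domain.com", "https://blog.domain.com"):
    parser = robotparser.RobotFileParser()
    parser.set_url(host + "/robots.txt")  # this host's rules only
    parser.read()                         # fetches that file over the network
    print(host, parser.can_fetch("*", host + "/some-page/"))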

