Instructs Bing not to crawl our site

Posted on 2024-2-20 14:01:43
Use each user agent only once

Google doesn't mind if you declare the same user agent multiple times; it simply combines all the rules from the various declarations into one group and follows them all. For example, suppose your robots.txt file contained the following user agents and directives:

User-agent: Googlebot
Disallow: /a/

User-agent: Googlebot
Disallow: /b/

Googlebot would not crawl either of those subfolders. That said, it still makes sense to declare each user agent only once to avoid confusion. By keeping things neat and simple, you're less likely to make serious mistakes.
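Combined into a single group, the same rules would look like this (a minimal sketch, assuming both directives are meant for Googlebot as in the example above):

User-agent: Googlebot
Disallow: /a/
Disallow: /b/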



Avoid unintended errors by being specific

Failing to give specific instructions when setting directives can lead to easy-to-miss mistakes with a devastating impact on your SEO. For example, suppose you have a multilingual site and are developing a German version that will live in the /de/ subdirectory. It isn't ready yet, and you don't want search engines to access it. The robots.txt file below is intended to prevent search engines from accessing that subfolder and everything within it:

User-agent: *
Disallow: /de

However, this also prevents search engines from crawling any page or file whose path starts with /de, for example:

/designer-dresses/
/delivery-information.html
/depeche-mode/t-shirts/
/definitely-not-for-public-viewing.pdf

In this case the solution is simple: add a trailing slash.

User-agent: *
Disallow: /de/
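With the trailing slash in place, only URLs inside the /de/ directory are blocked; the unrelated paths from the list above remain crawlable:

/de/                          -> blocked
/designer-dresses/            -> crawlable
/delivery-information.html    -> crawlable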


       


Use comments to humanize your robots.txt file

Comments can help you explain your robots.txt file to developers, and even to your future self. To include a comment, start the line with a hash (#):

# This instructs Bing not to crawl our site.
User-agent: Bingbot
Disallow: /

Crawlers ignore everything on lines that start with a hash.

Use a separate robots.txt file for each subdomain

Robots.txt only controls crawling behavior on the subdomain where it is hosted. If you want to control crawling on a different subdomain, you need a separate robots.txt file. For example, if your main site lives on domain.com and your blog lives on blog.domain.com, you need two robots.txt files: one in the root directory of the main domain and one in the root directory of the blog.
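A minimal sketch of that two-file setup (the Disallow paths here are hypothetical placeholders):

# Served at https://domain.com/robots.txt
User-agent: *
Disallow: /private/

# Served at https://blog.domain.com/robots.txt
User-agent: *
Disallow: /drafts/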

