Useful robots.txt rules

Here are some common useful robots.txt rules:

Disallow crawling of the entire site

Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

Note: This doesn't match the various AdsBot crawlers, which must be named explicitly.

    User-agent: *
    Disallow: /

Allow crawling of an entire site (with an empty Disallow rule)

This explicitly allows all crawlers to access the entire site. It is functionally equivalent to having no robots.txt file at all, or using an Allow: / rule.

    User-agent: *
    Disallow:

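One way to sanity-check the two site-wide rules above is with Python's standard-library urllib.robotparser. This is only an illustrative sketch: the crawler name ExampleBot and the URL are placeholders, and this parser is not guaranteed to reproduce every detail of how Google's crawlers interpret a robots.txt file.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # Empty Disallow: nothing is blocked.

# "ExampleBot" and the URL are hypothetical placeholders.
blocked = can_crawl(BLOCK_ALL, "ExampleBot", "https://example.com/page.html")
allowed = can_crawl(ALLOW_ALL, "ExampleBot", "https://example.com/page.html")
print(blocked, allowed)  # False True
```
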
Disallow crawling of a directory and its contents

Append a forward slash to the directory name to disallow crawling of a whole directory.

Caution: Remember, don't use robots.txt to block access to private content; use proper authentication instead. URLs disallowed by the robots.txt file might still be indexed without being crawled, and the robots.txt file can be viewed by anyone, potentially disclosing the location of your private content.

    User-agent: *
    Disallow: /calendar/
    Disallow: /junk/
    Disallow: /books/fiction/contemporary/

Disallow crawling of a single web page

For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.

    User-agent: *
    Disallow: /useless_file.html
    Disallow: /junk/other_useless_file.html

Disallow crawling of the whole site except a subdirectory

Crawlers may only access the public subdirectory.

    User-agent: *
    Disallow: /
    Allow: /public/

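The public subdirectory stays crawlable even though Disallow: / also matches it, because the most specific (longest) matching rule takes precedence, with the less restrictive rule winning a tie. The sketch below models that precedence for plain path prefixes only. The function name is_allowed and the sample paths are illustrative assumptions, and wildcards are deliberately left out.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Return True if path may be crawled under the (directive, prefix) rules.

    Simplified model: plain prefix matching only, no wildcards.
    """
    best_len = -1
    verdict = True  # No matching rule means crawling is allowed.
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # Empty patterns and non-matching prefixes are ignored.
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            verdict = is_allow
    return verdict


RULES = [("Disallow", "/"), ("Allow", "/public/")]

print(is_allowed(RULES, "/public/index.html"))   # True
print(is_allowed(RULES, "/private/index.html"))  # False
```

Note that /public without the trailing slash does not start with /public/, so under this model it is still covered by Disallow: /.
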
Allow access to a single crawler

Only Googlebot-News may crawl the whole site.

    User-agent: Googlebot-News
    Allow: /

    User-agent: *
    Disallow: /

Allow access to all but a single crawler

Unnecessarybot may not crawl the site; all other bots may.

    User-agent: Unnecessarybot
    Disallow: /

    User-agent: *
    Allow: /

Disallow crawling of an entire site, but allow Storebot-Google

This implementation hides your pages from Google Search results, but the Storebot-Google web crawler can still analyze them to show your products on Google Shopping.

    User-agent: *
    Disallow: /

    User-agent: Storebot-Google
    Allow: /

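The per-crawler rules above all depend on a crawler following the group that names it and falling back to the * group otherwise. A rough way to see this, assuming Python's standard-library urllib.robotparser is an acceptable stand-in for a real crawler, is to ask the same question on behalf of two user agents. The product URL is a hypothetical placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Storebot-Google
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/products/widget.html"  # hypothetical URL
storebot_ok = parser.can_fetch("Storebot-Google", url)  # Uses its own group.
googlebot_ok = parser.can_fetch("Googlebot", url)       # Falls back to "*".
print(storebot_ok, googlebot_ok)  # True False
```
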
Block all images on your site from Google (includes anywhere images are displayed on Google, including Google Images and Discover)

Google can't index images and videos without crawling them.

    User-agent: Googlebot-Image
    Disallow: /

Block a specific image from Google Images

For example, disallow the dogs.jpg image.

    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg

Disallow crawling of files of a specific file type

For example, disallow crawling of all .gif files.

    User-agent: Googlebot
    Disallow: /*.gif$

Use the * and $ wildcards to match URLs that end with a specific string

For example, disallow all .xls files:

    User-agent: Googlebot
    Disallow: /*.xls$

The $ wildcard designates the end of the URL. This means that any URL that has additional characters after the pattern (such as URL parameters) won't match. For example, https://example.com/cats.xls?personality=loki won't be blocked by the rule /*.xls$.

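To make the wildcard behavior concrete, here is a minimal sketch that translates a robots.txt path pattern into a regular expression, treating * as "any run of characters" and a trailing $ as "end of URL". The helper names pattern_to_regex and matches are illustrative assumptions, not part of any crawler's API, and the sketch ignores details such as percent-encoding.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each * into "any run of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # A trailing $ pins the match to the very end of the URL.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    """Report whether url_path (path plus any query string) matches pattern."""
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*.xls$", "/cats.xls"))                   # True
print(matches("/*.xls$", "/cats.xls?personality=loki"))  # False
print(matches("/*.xls", "/cats.xls?personality=loki"))   # True
```

The last call shows why the $ matters: without it, the pattern behaves as a prefix match and the URL with parameters is matched as well.
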
Combine multiple user agents in a single group

Consolidating rules for multiple crawlers into one group makes the file shorter and easier to manage, as all rules in the group apply to every user agent listed. This is equivalent to writing a separate group for each user agent and repeating the same rules in each.

    User-agent: Googlebot
    User-agent: Storebot-Google
    Allow: /cats
    Disallow: /
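
The equivalence between a combined group and separate per-crawler groups can be checked with a small parser. The sketch below is a simplified illustration that handles only User-agent, Allow, and Disallow lines; the function name parse_groups and the expanded file are assumptions made for the comparison.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each user agent to the (directive, path) rules of its group."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    reading_agents = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # Drop comments, whitespace.
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # A rule line ended the previous group.
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


COMBINED = """\
User-agent: Googlebot
User-agent: Storebot-Google
Allow: /cats
Disallow: /
"""

EXPANDED = """\
User-agent: Googlebot
Allow: /cats
Disallow: /

User-agent: Storebot-Google
Allow: /cats
Disallow: /
"""

print(parse_groups(COMBINED) == parse_groups(EXPANDED))  # True
```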