Optimize your crawl budget

This guide describes how to optimize Google's crawling of very large and frequently updated sites.

If your site doesn't have a large number of pages that change rapidly, or if your pages seem to be crawled the same day that they are published, you don't need to read this guide. For Google Search specifically, merely keeping your sitemap up to date and checking your index coverage regularly is adequate.

Who this guide is for

While the recommendations in this guide are generally good practices, this is an advanced guide intended primarily for the following types of sites:

The numbers given here are a rough estimate to help you classify your site. These are not exact thresholds.

General theory of crawling

The web is a nearly infinite space, exceeding Google's ability to explore and index every available URL. As a result, there are limits to how much time Google's crawlers can spend crawling any single site, where a site is defined by the hostname. For example, https://www.example.com/ and https://code.example.com/ are two different hostnames, and therefore have separate crawl budgets. The amount of time and resources that Google devotes to crawling a site is commonly called the site's *crawl budget* and it's determined by two main elements: *crawl capacity limit* and *crawl demand*.
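Because a site is defined by its hostname, it can help to group your URLs by hostname before reasoning about crawl budget. The following is a minimal sketch using only the Python standard library. The function name `group_by_hostname` and the sample URLs are illustrative assumptions, not part of any Google tooling.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_hostname(urls):
    """Group URLs by hostname, the unit that defines a site for crawl budget."""
    groups = defaultdict(list)
    for url in urls:
        # .hostname is lowercased and excludes the port; it is None when the
        # URL has no host (for example, a relative path).
        host = urlsplit(url).hostname
        if host is None:
            continue  # skip relative or malformed URLs
        groups[host].append(url)
    return dict(groups)


# Hypothetical sample data for illustration.
urls = [
    "https://www.example.com/",
    "https://www.example.com/products?page=2",
    "https://code.example.com/",
]
sites = group_by_hostname(urls)
for host, members in sorted(sites.items()):
    print(host, len(members))
```

Here `www.example.com` and `code.example.com` end up in separate groups, mirroring the separate crawl budgets described above.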

For Google Search, not every page that is crawled will necessarily be indexed. After crawling, each page must be evaluated, consolidated, and assessed to determine its suitability for the index.

Crawl capacity limit

Google wants to crawl your site without overwhelming your servers. To achieve this, Google's crawlers calculate a *crawl capacity limit*, which is the maximum number of parallel connections that Google can use to crawl a site, as well as the time delay between fetches. This is calculated to provide coverage of all your important content without overloading your servers.
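To get a feel for how the two parts of the limit (parallel connections and delay between fetches) interact, here is a back-of-envelope sketch. It is a simplified model and not Google's actual calculation: it assumes each connection fetches one page, waits for the delay, and repeats. The function name and all numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate_daily_fetches(connections, avg_response_s, delay_s):
    """Rough upper bound on fetches per day under a simplified model.

    Assumption: each connection spends avg_response_s on a fetch, then
    waits delay_s before starting the next one.
    """
    seconds_per_fetch = avg_response_s + delay_s
    return int(connections * SECONDS_PER_DAY / seconds_per_fetch)


# Hypothetical values: 4 connections and a 1.5 s delay between fetches.
fast_server = estimate_daily_fetches(connections=4, avg_response_s=0.5, delay_s=1.5)
slow_server = estimate_daily_fetches(connections=4, avg_response_s=1.5, delay_s=1.5)
print(fast_server)  # 172800
print(slow_server)  # 115200
```

Under this model, a slower server response reduces the number of pages that can be fetched with the same number of connections and the same delay.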

The crawl capacity limit can go up and down based on a few factors:

Crawl demand

Each crawler has its own "demand" when it comes to crawling the web. For example, AdsBot generally has a higher demand when a site is running dynamic ad targets, Google Shopping has a higher demand for products you have in your merchant feeds, and Googlebot's demand varies based on a site's size, update frequency, page quality, and relevance, compared to other sites.

In general, the factors that play a significant role in determining crawl demand are:

Additionally, site-wide events like site moves may trigger an increase in crawl demand in order to reprocess the content under the new URLs.

In sum

Taking crawl capacity and crawl demand together, Google defines a site's crawl budget as the set of URLs that Google can and wants to crawl. Even if the crawl capacity limit isn't reached, if crawl demand is low, Google will crawl your site less.
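The relationship between the two elements can be sketched as a simple selection: the URLs crawled are those that are wanted, up to what capacity allows. This is an illustrative model only. The function `crawl_set`, the idea of a pre-ranked list, and the numbers are assumptions made for the example.

```python
def crawl_set(wanted_urls, capacity):
    """Return the URLs that are both wanted (demand) and within capacity.

    wanted_urls: URLs the crawler wants, already ordered by priority.
    capacity: maximum number of URLs that can be fetched in the period.
    """
    return wanted_urls[:capacity]


# Hypothetical case: capacity is high, but demand is low.
wanted = ["/news/today", "/news/yesterday", "/about"]
crawled = crawl_set(wanted, capacity=1000)
print(len(crawled))  # 3

# Hypothetical case: demand exceeds capacity.
many_wanted = [f"/item/{i}" for i in range(5000)]
print(len(crawl_set(many_wanted, capacity=1000)))  # 1000
```

In the first case the capacity limit is never reached, yet only three URLs are crawled because demand is low.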

Best practices

To maximize your crawling efficiency, follow these best practices:

**Eliminate [soft 404 errors](https://developers.google.com/search/docs/crawling-indexing/troubleshoot-crawling-errors#soft-404-errors).** Soft 404 pages will continue to be crawled and waste your budget. Check the [Index Coverage report](https://support.google.com/webmasters/answer/7440203) for soft 404 errors.
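A soft 404 is a page that returns a success status code while its content tells the user that the page does not exist. When auditing your own pages, a rough heuristic like the one below can flag candidates for manual review. This is a sketch under stated assumptions, not how Google detects soft 404s: the phrase list and the function name `looks_like_soft_404` are hypothetical and should be adapted to your site's templates.

```python
# Hypothetical phrases; adjust to match your own error templates.
NOT_FOUND_PHRASES = ("page not found", "no longer available")


def looks_like_soft_404(status_code, body):
    """Heuristically flag a response that succeeds but reads like an error page."""
    if not 200 <= status_code < 300:
        # A real 404 or 410, or a redirect, is not a *soft* 404.
        return False
    text = body.strip().lower()
    if not text:
        return True  # a successful response with an empty body is suspicious
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


print(looks_like_soft_404(200, "<h1>Page not found</h1>"))
print(looks_like_soft_404(404, "<h1>Page not found</h1>"))
print(looks_like_soft_404(200, "<h1>Blue widget</h1>"))
```

Pages that are flagged and really are gone should return a 404 or 410 status code instead of a success status.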

How do I get more crawl budget?

There are two ways to increase crawl budget: