Managing crawling of faceted navigation URLs

Faceted navigation is a common website feature that lets visitors change how items (for example, products, articles, or events) are displayed on a page. It's popular and useful, but its most common implementation, which is based on URL parameters, can generate infinite URL spaces. This harms the website in a couple of ways: crawling these URLs consumes server resources, and it can slow down the discovery of new URLs on the site.

A typical faceted navigation URL contains various parameters in the query string, each related to a property of the items it filters for. For example:

https://example.com/items.shtm?products=fish&color=radioactive_green&size=tiny

Changing any of the URL parameters products, color, and size shows a different set of items on the underlying page. This often means a very large number of possible filter combinations, which translates to a very large number of possible URLs. To save your resources, we recommend dealing with these URLs in one of the following ways:
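As a rough illustration of the combinatorics described above, the following Python sketch parses the example URL and counts how many distinct URLs a small set of facets can produce. The parameter names come from the example URL; the value lists, the helper names, and the assumption that each facet is either unset or set to exactly one value are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical facet values; a real catalog would likely have many more.
FACETS = {
    "products": ["fish", "bird", "cat"],
    "color": ["radioactive_green", "red", "blue", "black"],
    "size": ["tiny", "small", "large"],
}


def parse_filters(url):
    """Return the query-string parameters of a URL as a dict."""
    return dict(parse_qsl(urlsplit(url).query))


def count_filter_urls(facets):
    """Count URLs when each facet is either unset or set to one value."""
    return prod(len(values) + 1 for values in facets.values())


def enumerate_filter_urls(base, facets):
    """Yield every URL for the facets, always in the same parameter order."""
    names = list(facets)
    choices = [[None] + facets[name] for name in names]
    for combo in product(*choices):
        params = [(n, v) for n, v in zip(names, combo) if v is not None]
        # urlencode joins parameters with the standard '&' separator.
        yield base + ("?" + urlencode(params) if params else "")
```

Under these assumptions, three facets with three, four, and three values already yield 4 × 5 × 4 = 80 URLs, including the unfiltered page. Allowing multiple values per facet, or parameters in arbitrary order, would make the count grow much faster.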

Prevent crawling of faceted navigation URLs

If you want to save server resources and you don't need your faceted navigation URLs to show up in Google Search or other Google products, you can prevent crawling of these URLs in one of the following ways.

Other ways to signal a preference for which faceted navigation URLs to crawl (or not to crawl) are the rel="canonical" link element and the rel="nofollow" anchor attribute. However, these methods are generally less effective in the long term than the previously mentioned methods.

Ensure the faceted navigation URLs are optimal for the web

If you need your faceted navigation URLs to be potentially crawled and indexed, keep in mind that having these URLs crawled means increased resource usage on your server and, potentially, slower discovery of new URLs on your site. To minimize the negative effects of crawling the large number of potential URLs on your site, follow these best practices:

  1. Use the industry-standard URL parameter separator '&'. Characters like the comma (,), semicolon (;), and brackets ([ and ]) are hard for crawlers to detect as parameter separators, because most often they're not separators.
  2. If you're encoding filters in the URL path, such as /products/fish/green/tiny, ensure that the logical order of the filters always stays the same and that no duplicate filters can exist.
  3. Return an HTTP 404 status code when a filter combination doesn't return results. If there are no green fish in the site's inventory, users and crawlers alike should receive a "not found" error with the proper HTTP status code (404). The same applies to URLs that contain duplicate filters or otherwise nonsensical filter combinations, and to nonexistent pagination URLs. Don't redirect to a common "not found" error page; instead, serve the "not found" error with the 404 HTTP status code under the URL where it was encountered. This might not be possible in a single-page app; in that case, follow the best practices for single-page apps.
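The path-based practices above (fixed filter order, no duplicate filters, and a 404 for empty or nonsensical combinations) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the facet names, allowed values, inventory, and function names are all hypothetical, and the sketch assumes that no value belongs to more than one facet.

```python
from http import HTTPStatus

# Hypothetical canonical order of filters in the URL path.
FILTER_ORDER = ["product", "color", "size"]

# Hypothetical allowed values per facet (assumed not to overlap).
ALLOWED = {
    "product": {"fish", "bird"},
    "color": {"green", "red"},
    "size": {"tiny", "large"},
}
VALUE_TO_FACET = {v: facet for facet, values in ALLOWED.items() for v in values}

# Hypothetical inventory as (product, color, size) tuples; no green fish.
INVENTORY = {("fish", "red", "tiny"), ("bird", "green", "large")}


def parse_filter_path(path):
    """Map /products/fish/red/tiny to {facet: value}; None if invalid."""
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != "products":
        return None
    filters = {}
    last_index = -1
    for segment in segments[1:]:
        facet = VALUE_TO_FACET.get(segment)
        if facet is None:
            return None  # unknown filter value
        index = FILTER_ORDER.index(facet)
        if index <= last_index:
            return None  # duplicate facet or filters out of canonical order
        filters[facet] = segment
        last_index = index
    return filters


def matching_items(filters):
    """Return inventory items that satisfy every requested filter."""
    return [
        item for item in INVENTORY
        if all(item[FILTER_ORDER.index(f)] == v for f, v in filters.items())
    ]


def status_for(path):
    """Status to serve at the requested URL itself, without redirecting."""
    filters = parse_filter_path(path)
    if filters is None or not matching_items(filters):
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK
```

In this sketch, /products/fish/red/tiny maps to a 200 response, while /products/fish/green/tiny (no matching items), /products/green/fish (filters out of order), and /products/fish/fish (duplicate filter) all map to a 404 served under the requested URL.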