Google’s John Mueller responded to a comment about Google’s guideline that says, “Use the robots.txt file on your web server to manage your crawling budget by preventing crawling of infinite spaces such as search result pages.” He said this is less about spam and more about “watering down your indexed content with useless pages that compete with each other.”
He posted this on Twitter:
This is usually not a spam/not-spam situation, but more about watering down your indexed content with useless pages that compete with each other. More a question of strategy rather than spam.
— 🍌 John 🍌 (@JohnMu) March 9, 2020
Here is Lily Ray’s original tweet from this conversation:
Google: prevent crawling of your internal search result pages!
Also Google: *ranks internal search result pages on page 1 all the time* pic.twitter.com/IWNtl0KFTc
— Lily Ray 😏 (@lilyraynyc) March 9, 2020
In 2007, Google told webmasters to block internal search results from being indexed. The original guideline read “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.” Now it reads “Use the robots.txt file on your web server to manage your crawling budget by preventing crawling of infinite spaces such as search result pages.”
Then, ten years later, Google’s John Mueller explained why Google doesn’t want your search result pages in its index. He said, “they make infinite spaces (crawling), they’re often low-quality pages, often lead to empty search results/soft-404s.”
So it really wasn’t always about spam, but about blocking pages that might not be as relevant to Google.
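For anyone who wants to see what the guideline quoted above looks like in practice, here is a minimal sketch in Python. It assumes your internal search results live under a `/search` path (a hypothetical path, yours may differ) and uses `example.com` as a placeholder domain. It checks the rule with Python's standard-library robots.txt parser, which only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/search" is an assumed path for internal
# search result pages, not something taken from Google's guidelines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # An internal search result URL versus a regular content URL.
    print(is_crawlable("https://example.com/search?q=shoes"))
    print(is_crawlable("https://example.com/products/shoes"))
```

Keep in mind that a `Disallow` rule only asks crawlers not to fetch those URLs. It manages crawling, which is what the guideline is about, and it is not a guarantee that a blocked URL will never show up in the index.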
Forum discussion at Twitter.