Blocking Your Site from Generative AI Tools is a Mistake
Despite premature reports of its demise, Google is still the biggest distributor of referral traffic across the web. New players like ChatGPT and Bing Copilot offer complementary, chat-driven ways to meet users' information needs, but they have not yet shown themselves to be a true threat to Google's traffic. As Google rolls out its generative AI experience in Search, everyone who receives traffic from Google is deep in the throes of uncertainty. The common suggestion is to block your website from it and from the other tools that would eat your traffic and give you nothing in return.
But is it truly nothing? I mean, you see the number of words on this page, of course it’s not.
Blocking generative AI-driven tools like ChatGPT, Bing Copilot, Gemini, and AI Overviews in Google Search is a mistake that could significantly impact your brand awareness. In today's digital landscape, where user experience and immediate access to information are paramount, these AI tools offer a tremendous opportunity for brands to be where their audience is in new and dynamic ways. Here's why blocking these tools could be detrimental to your brand, and how to approach them strategically.
The Role of Generative AI in the Search User Journey
Generative AI represents a significant leap forward in how content is created and consumed online. These tools can provide instant, contextually relevant information to users, enhancing their search experience and delivering your brand’s message more effectively. AI Overviews in Google Search, for example, offer concise and precise answers to user queries, often drawing from various sources to provide the best possible information using retrieval augmented generation (RAG).
Google as the Presentation Layer of the Web's Data
Google has effectively become the presentation layer of the web's data, organizing and displaying information in a way that is most useful to users. Through components like Featured Snippets, Knowledge Panels, and AI Overviews, Google aggregates and presents data from various sources in an easily digestible format. This means that a significant portion of web traffic is driven by how well your content is integrated and displayed within Google's ecosystem.
When you block AI tools from accessing your content, you essentially remove your brand from this critical presentation layer. Users looking for quick answers or overviews will be served content from other sources, diminishing your presence and authority. Even for publishers, it is to their brand's benefit to be seen as the source of the information. If CNN and the NYT are suddenly no longer there, it does not harm the information ecosystem; it just gives other, more digitally native publishers and creators the opportunity to supplant the incumbents. Embracing Google's role as a curator of web data can help ensure that your brand remains visible and relevant in the ever-evolving digital landscape.
One of the key benefits of generative AI tools is their ability to increase brand visibility. When your content is featured in AI-driven summaries or chat responses, it places your brand directly in front of users seeking information. Although it may not yield traffic to your site, it establishes or reinforces your brand as an authority in your industry. Blocking these tools means missing out on this prime real estate, which your competitors will undoubtedly exploit.
Users want to Explore Deeper Anyway
Users often prefer to go deeper with their searches, seeking comprehensive and detailed information rather than settling for superficial answers. The introduction of components like Featured Snippets has significantly influenced this behavior by providing quick, concise responses that encourage further exploration rather than causing users to leave the search environment.
This redistribution of search volume highlights the value users place on obtaining a deeper understanding of their queries. According to findings from Emily Bender and Chirag Shah's paper, "Situating Search," users engage in more nuanced and iterative search behaviors, utilizing search engines as tools for ongoing inquiry and learning rather than merely information retrieval. This reinforces the importance of having robust, high-quality content available to meet the evolving needs of searchers who are willing to dig deeper into topics of interest.
Bender and Shah's research argues that integrating large language model (LLM)-driven features like AI Overviews into search engines doesn't necessarily satisfy user queries. Instead, these tools often serve as starting points that lead users to more in-depth exploration of topics. For instance, while an AI-generated overview can provide a quick answer, it often triggers more specific follow-up questions from users, who then engage in additional searches to gather comprehensive insights.
Bender's work further indicates that users employ search engines iteratively, refining their queries and seeking various perspectives to build a nuanced understanding of their topics of interest. This iterative process is a key aspect of modern search behavior, underscoring the value of having accessible, detailed, and well-structured content that can cater to these deeper inquiries.
By focusing on optimizing content for visibility and engagement within these AI-driven summaries and follow-up searches, brands can ensure they remain relevant and authoritative.
How Retrieval Augmented Generation (RAG) Works
Retrieval Augmented Generation is a technique used by AI systems to improve the accuracy and relevance of their responses. Here’s how it works:
Retrieval - The AI first retrieves relevant chunks of information from a large database of documents. These chunks are typically snippets or sections of web pages that match the user’s query.
Generation - Using the retrieved chunks, the AI generates a coherent and contextually accurate response to the query.
This process ensures that the AI leverages up-to-date and contextually appropriate information, improving the quality of the user experience. The content from your website could be used in these chunks, providing direct exposure to your audience through AI responses.
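The two steps above can be sketched in Python. This is a toy illustration, not any vendor's actual pipeline: the keyword-overlap scoring stands in for the embedding-based similarity search a real RAG system would run against a vector index, and the string-template "generation" stands in for the LLM call.

```python
def retrieve(query, chunks, k=2):
    """Return the k chunks sharing the most words with the query.

    Toy stand-in for the vector similarity search a real RAG
    pipeline would perform against an index of page content.
    """
    q_words = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )[:k]

def generate(query, retrieved):
    """Toy stand-in for the LLM call that writes a grounded answer."""
    context = " ".join(retrieved)
    return f"Answer to '{query}', grounded in: {context}"
```

If a chunk from your site scores highest at the retrieval step, your words become the grounding context for the answer the user sees.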
However, this only matters for new information. Large language models were likely already trained on your content.
Methods of Blocking AI Tools
If I can't convince you that blocking these tools is a Pyrrhic victory at best, here's how you might implement such blocks:
Robots.txt File - You can add directives in your robots.txt file to disallow crawlers from accessing your site. For instance:
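A minimal sketch, using the user-agent tokens the vendors have published (GPTBot for OpenAI's crawler, Google-Extended for Google's AI training):

```
# Block OpenAI's crawler (ChatGPT)
User-agent: GPTBot
Disallow: /

# Block Google's AI training crawler (Gemini)
User-agent: Google-Extended
Disallow: /
```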
For ChatGPT training data, you also need to block Common Crawl's crawler, CCBot, whose datasets many AI labs train on.
But that's not all: you'll also need to reconsider syndication partnerships with any brand OpenAI strikes a deal with, because your website is not the only place your content can be obtained.
Meta Tags - You can use meta tags to instruct AI tools not to index certain pages:
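A sketch of the standard robots meta tag. Note that this is a blunt instrument: it applies to any compliant crawler, so it also removes the page from regular search results, and most AI-specific crawlers are controlled via robots.txt rather than meta tags.

```html
<!-- Ask compliant crawlers not to index or excerpt this page -->
<meta name="robots" content="noindex, nosnippet">
```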
To control how much of your content is used by Google’s AI Overviews, you can implement the data-nosnippet and max-snippet attributes:
data-nosnippet - This HTML attribute prevents specific parts of your page from being used in snippets.
max-snippet - This robots meta directive limits the number of characters that can be used in a snippet.
Keep in mind that this will impact your appearance in core Google Search as well.
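Illustrative usage of both controls, as documented by Google (the 160-character limit is an arbitrary example value):

```html
<!-- Cap any snippet from this page at 160 characters -->
<meta name="robots" content="max-snippet:160">

<!-- Keep this passage out of snippets entirely -->
<p data-nosnippet>Proprietary analysis you don't want excerpted.</p>
```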
Update: Glenn Gabe has indicated this no longer works - https://twitter.com/glenngabe/status/1792889819305029932
Server-Side Blocking - Implement server-side rules to block requests from known AI crawlers based on their user-agent strings or IP ranges.
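A minimal sketch of the idea as WSGI middleware in Python. The user-agent list is illustrative and would need ongoing maintenance; real deployments usually enforce this at the CDN or web-server layer instead.

```python
# Illustrative user-agent substrings for known AI crawlers.
BLOCKED_AGENTS = ("GPTBot", "CCBot", "Google-Extended")

def block_ai_crawlers(app):
    """Wrap a WSGI app and reject requests from listed AI crawlers."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bot.lower() in ua.lower() for bot in BLOCKED_AGENTS):
            # Matched a blocked crawler: refuse the request outright.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Anyone else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

IP-range blocking works the same way, keyed on `environ["REMOTE_ADDR"]` against the published ranges, which change over time.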
While these steps can effectively block AI tools, the question remains: should you?
The Whack-a-Mole of Blocking GenAI Tools
Blocking GenAI tools involves more than adding a few lines to your robots.txt file or using meta tags. You also need to block services like Common Crawl, which many AI companies use to gather data, and reconsider any syndication strategy that puts your content on someone else's site, because those sites may not be blocking these bots, or they may strike direct licensing deals with AI companies. This requires constant vigilance and updates to your blocking rules as new data sources and AI tools emerge.
Given the complexity and ongoing effort required, this approach can feel like a never-ending game of whack-a-mole. Instead of focusing on blocking, a more effective strategy is to concentrate on brand building and optimizing your content for these AI systems. By doing so, you ensure that your brand remains visible and relevant, leveraging the power of AI to drive growth and engagement.
Embracing Generative AI
Instead of blocking these tools, consider optimizing your content for them. Here’s how:
Structured Data - Implement structured data on your website to help AI tools understand and present your content accurately. Go beyond the schemas that yield rich results in search engines and leverage everything relevant to your pages.
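For example, a hypothetical article page might declare Schema.org JSON-LD like this (all values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Headline",
  "author": { "@type": "Person", "name": "Example Author" },
  "datePublished": "2024-05-01",
  "about": "Example topic"
}
</script>
```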
Comprehensive High-Value Content - Focus on creating comprehensive, high-quality content that addresses user queries in-depth. This increases the chances of your content being selected by AI tools for summaries and overviews. The search journeys will become more complex with the long tail getting longer. Your content will need to be more feature complete.
Regular Updates - Keep your content up-to-date to ensure it remains relevant and valuable, increasing its attractiveness to AI systems. Review pages with performance decay and refresh them.
Redistributing your keyword strategy is the best approach to maximize Organic Search going forward. Get a free AI Overview threat report to see where you stand.
Just Don't Do It
If your content strategy relied on giving users quick facts, or on making them wade through information they don't want in order to reach what they do want so you could monetize via display advertising (looking at you, recipe sites), you need to change your strategy, because that ship has sailed.
Blocking generative AI tools like ChatGPT, Gemini, and AI Overviews in Google Search is a strategic misstep that could cost your brand in terms of visibility and engagement. Users often prefer to go deeper with their searches, seeking comprehensive and detailed information beyond initial summaries or overviews. Research highlights how features like Featured Snippets have reshaped user behavior, redistributing search volume and encouraging further exploration rather than terminating user queries.
Integrating these AI tools can provide a significant edge in an increasingly competitive digital landscape by making your content more accessible and engaging. Embracing these technologies and optimizing your content for them ensures that your brand remains visible and relevant, leveraging the redistributive nature of search volume influenced by AI tools. Remember, if you're not there, your competitors will be. By staying ahead of the curve and leveraging the power of generative AI, you can ensure that your brand remains at the forefront of users' minds, driving growth and success in the digital age.