Almost Timely News: 🗞️ 4 Reasons Why Generative AI Prompts Fail (2024-11-24)

The Big Plug

Download my newly revised Unofficial Guide to the LinkedIn Algorithm, just updated with new information from LinkedIn Engineering!

Content Authenticity Statement

95% of this week's newsletter was generated by me, the human. You'll see an output from ChatGPT in the opening section. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future.

Watch This Newsletter On YouTube 📺

Click here for the video 📺 version of this newsletter on YouTube »

Click here for an MP3 audio 🎧 only version »

What's On My Mind: 4 Reasons Why Generative AI Prompts Fail

Let's go back to some basics this week on prompt engineering, leaning into some 101 review. How do generative AI systems - large language models like the ones that power ChatGPT, Gemini, and Claude - go wrong? When they produce bad results, especially hallucinations (errors and fabrications), why does that happen, and what can we do about it?

To understand this, we first need to review the basics of what’s inside these models. It’s not magic, it’s not fairy dust, it’s that thing that a lot of people really dislike: numbers. After that, we’ll look at the mechanisms for how these things generate results, four ways they go wrong, and four ways you can improve the output you get.

AI Architecture

Let's start with model training. When a big company - and for today's state of the art models, you need a big company with deep pockets - makes an AI model, it starts with data. Lots and lots and lots of data. For example, Meta recently said that their models are trained in part on all public content posted to Meta services (Facebook, Instagram, Threads, etc.) since 2007.

In basic terms, the average language model, like the ones that power ChatGPT, is trained on anywhere from 5 to 10 trillion words. If you had a bookshelf of books - all text, no pictures - 10 trillion words would be a bookshelf that stretches around the equator of the planet… twice. That's how much text today's models need to deliver fluent responses.

When models are trained, a two-step process happens. First, every word is tokenized. That's a fancy way of saying turned into numbers. For example, this sentence:

“The quick brown fox jumped over the lazy dog.”

Turns into this:

[23171, 4853, 19705, 68347, 48704, 1072, 290, 29082, 6446, 2517]

It's worth pointing out that none of these numbers repeat, even though the word 'the' appears twice. Why? Capitalization and position can alter how an AI perceives words: 'The' at the start of the sentence and 'the' in the middle get different token numbers.
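If you want to see tokenization in action yourself, OpenAI publishes its tokenizers in the open-source tiktoken library. Here's a minimal sketch; the exact numbers you get depend on which model's tokenizer you load, so treat the IDs as illustrative.

# pip install tiktoken
import tiktoken

# o200k_base is the tokenizer GPT-4o uses; other models use other tokenizers
enc = tiktoken.get_encoding("o200k_base")

tokens = enc.encode("The quick brown fox jumped over the lazy dog.")
print(tokens)  # a list of integers, one per token

# 'The' at the start of a sentence and ' the' mid-sentence tokenize differently
print(enc.encode("The"), enc.encode(" the"))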

After tokenization comes a process called embedding. Conceptually, this is like building massive word clouds based on how often parts of one word (the tokens) appear near others in the text. Every word we use has a conceptual word cloud around it of related words.

If I say “B2B”, related words will be things like “marketing” and “sales”. Model makers compute the probability that any token will be near any other token, over and over again, until you end up with a MASSIVE statistical database of what’s most commonly near what - at the sub word, word, phrase, sentence, paragraph, and even document level.
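To make the word cloud idea concrete, here's a toy sketch in Python with made-up three-number vectors. Real embeddings have hundreds or thousands of dimensions and are learned from data, not hand-typed, but the "nearness" math is the same idea.

import numpy as np

# Made-up vectors purely for illustration; real models learn these values
embeddings = {
    "B2B":       np.array([0.9, 0.8, 0.1]),
    "marketing": np.array([0.8, 0.9, 0.2]),
    "sales":     np.array([0.7, 0.9, 0.3]),
    "avocado":   np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Closer to 1.0 means the two words appear in similar contexts
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ("marketing", "sales", "avocado"):
    print(word, round(cosine_similarity(embeddings["B2B"], embeddings[word]), 3))

Run it and "marketing" and "sales" score high against "B2B" while "avocado" scores low - that's the conceptual word cloud in action.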

There are a few other steps involved, but functionally, that’s how models are made.

Why do we need to know this?

Because this is also how AI interprets our prompts.

When we prompt an AI, it tokenizes our prompt, turning it into numbers. It then looks into its massive catalog of probabilities to see what's most similar, conceptually looking at the word clouds around every word, phrase, and sentence in our prompt. Where those word clouds overlap - think of a really complex Venn diagram - is what the model returns to us. (For the curious, this is not mathematically how it works, but conceptually it's close enough.)

Here's a key principle I don't see discussed enough. When we prompt AI, it responds. And then, as we continue the conversation, what's happening is that EVERYTHING in the conversation up to that point becomes part of the next prompt.

This is a critical aspect of generative AI, something not true of earlier systems like auto-complete on your phone. Every word in a conversation - whether you say it or an AI says it - becomes part of the next part of the conversation. This will be very important in just a little while.
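Under the hood this is literal: chat-style APIs are stateless, and the software you're typing into resends the whole transcript with every turn. Here's a sketch using OpenAI's Python SDK - the model name and prompts are illustrative.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment
messages = [{"role": "user", "content": "Name three B2B marketing tactics."}]

first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant",
                 "content": first.choices[0].message.content})

# The follow-up is appended, and the ENTIRE list goes back to the model -
# every word so far, yours and the AI's, shapes the next response
messages.append({"role": "user", "content": "Expand on the second tactic."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)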

Okay, so that’s the inner mechanics of an AI model. It's a library of probabilities, and when we prompt it, we are sending the "librarian" into the library to find the closest matches for what's in our prompt.

That brings us to why prompts sometimes deliver unsatisfying results.

Four Ways Prompting Goes Wrong

Now that we know the basics of how AI models work, let's talk about why they don't work sometimes.

Large language models deliver unsatisfying or unsatisfactory results for one of four major reasons:

  1. They don't have the knowledge to fulfill our request at all.

  2. They don't have the correct knowledge to fulfill our request.

  3. They don't have the ability to fulfill our request.

  4. They do have the knowledge, but we haven't correctly invoked it with a good prompt.

Let's dig into each of these major cases.

Lack of Knowledge

Some models simply don't have the information we want. It's like going to the library and asking for a book, and the library doesn't have the book. In the case of AI, the librarian comes back with the closest thing that they do have, because AI models are built to be helpful - even if they're not factually correct.

It's like going to make a kale avocado smoothie, and you don't have kale or avocado. If you substitute a whole lime and some grass from your yard, that's theoretically close (from the viewpoint of an AI - they're close, right? Both are green!) but boy is the end result not going to be what you want.

In AI terms, that's a hallucination. That's what's happening when a model makes things up. It's not lying, per se, at least not willfully. It's coming up with the probabilities it knows.

For example, if you work at a new startup, even a big foundation model like GPT-4o may never have heard of your company. As a result, when you ask it to help you write content about this company it's never heard of, it'll make mistakes. In its effort to be helpful, it will cobble together its best-guess probabilities, which are not necessarily truthful.

Lack of Correct Knowledge

The second way AI models often go wrong is lack of correct knowledge. The model has a lot of knowledge on the topic, but it's unable to differentiate specific aspects of that knowledge to return something completely correct.

For example, the profession of SEO has been around ever since the dawn of the first search engine more than a quarter century ago. There have been millions and millions of words written about SEO, and all that knowledge (except the most recent) has found its way into AI models.

If we prompt a model with a naive prompt like "Optimize this website copy with SEO best practices", exactly which best practices are we talking about? If we look at Google Books, for example, the most knowledge created about SEO occurred in 2012. With a prompt like that, you have no way of knowing whether or not the model is drawing on information written in 2002, 2012, or 2022. Remember back in the previous section about how models are trained? None of the knowledge in a model is date-stamped, so you could be invoking very, very old information - and as a result, not getting good results.

Another angle on this is factual correctness. Models are trained on massive amounts of public data; again, going back to Meta's example, training it on everything ever published publicly on Facebook since 2007. How much of what was shared on Facebook about COVID is factually correct?

Yeah.

And yet all that knowledge - correct or not - has found its way into Meta's models. If you don't have any domain expertise, you could ask Meta Llama about the SARS-CoV-2 virus mechanisms and not know whether its information is correct or not.

Lack of Ability

The third way AI models often go wrong is lack of ability. Language models are, as we discussed, predictive models, predicting the next token based on all the tokens we've fed it. That makes them especially good at any kind of language task.

Which, by definition, makes them not particularly good at non-language tasks.

Like math.

If we give an AI model a mathematical task, out of the box it's going to do what it always does: look at the tokens we've fed it and return high-probability tokens, treating numbers like words. Except that isn't how math works.

2 + 3 = 5 not because 5 occurs often next to 2 and 3, but because that's literally how computation works. Thus, the more infrequent a mathematical task is, the less likely a language model is to get it right. It can do 2 + 2 = 4 all day long because it has seen that in its training data extensively. It has seen cos((852654 + 47745) / 3411.9) far, far less, and is unlikely to come up with 1 as the answer.

Most language model makers circumvent this by having models write the appropriate code behind the scenes, usually in Python, to solve math problems, reflecting their understanding that a language model can't actually do math.
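To see why code is the right tool, here's the kind of thing a model might write behind the scenes for the example above - computation instead of token prediction:

import math

# Arithmetic, not probability: this returns the right answer every time
answer = math.cos((852654 + 47745) / 3411.9)
print(answer)  # ≈ 0.99998, effectively 1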

When we're working with AI, we have to ask ourselves whether the AI is even capable of the task we're assigning it. In many cases, it is not. For example, we might want AI to check our inbox and tell us what messages are important. Determining message importance is a language task, but connecting to an inbox is very much a traditional IT task, and a language model simply can't do that without other systems' help.

Bad Prompting

Finally, a model can have ability, have knowledge, and even have correct knowledge and still deliver bad results if we ask it questions that will generate wrong answers.

Suppose our own knowledge of SEO is badly out of date. Perhaps we stopped following along in SEO back in the early 2000s. We might ask an AI model rather naively to optimize a page's content or copy by putting our keyword in the page dozens of times, in the headings, in the body content bolded, and over and over again in white-on-white text at the bottom.

The AI will accomplish this task. It will do so in a factually correct manner, having the capability to write HTML, the ability to understand the instructions, the knowledge of keywords and such...

... but keyword stuffing like this went out of style around the same time as the start of the Obama Administration.

Again, the model is being helpful, and will carry out the instructions we ask of it, but the actual outcome we care about - attracting search traffic - will not happen because we're fundamentally doing it wrong. In this example, we're the weakest link.

Four Ways to Solve AI Prompting Fails

So with these four problems, what are the solutions?

For the first two cases, lack of knowledge and lack of correct knowledge, the answer is straightforward: more, better knowledge. Specifically, we need to provide the knowledge to the AI and direct it to use it.

This is why it's critically important to follow the Trust Insights RAPPEL AI Prompt Framework. When you get to the third step, Prime, you ask the model what it knows on the topic and task you're working on. This is your opportunity to audit its knowledge and determine if it has enough of the correct knowledge for the task - and if it doesn't, then you know you need to provide it.

Suppose I prompt ChatGPT with the start of a RAPPEL prompt like this:

You're an SEO expert as featured in Search Engine Land. You know search engine optimization, SEO, organic search, search engine rankings, SERPs. Today we'll be optimizing some web copy for SEO. First, what do you know about this topic? What are common mistakes made by less experienced SEO practitioners? What are less known expert tips and tricks for optimizing web copy for SEO?

ChatGPT will foam at the mouth for a while and produce a long list of information. When I ran this with the most current model, GPT-4o, it returned this among its list of tips:

"E-A-T: Build Expertise, Authoritativeness, and Trustworthiness through high-quality content, credible authorship, and strong backlinks."

For those who know SEO, this advice is a little out of date. Not horrendously, but it's now a couple of years old. In December 2022, Google changed its guidelines to encompass experience as well as expertise, or E-E-A-T.

That means ChatGPT's SEO knowledge stops roughly at the end of 2022, which in turn means we need to provide it new knowledge. If I provide Google's 2024 edition of the Search Quality Rater Guidelines, ChatGPT will reference that document first and build a much more up-to-date set of recommendations.

For enterprise use cases, you'd want to connect a database to your AI to provide updated or more specific knowledge, a system usually called Retrieval Augmented Generation, or RAG. That's well outside the 101 refresher we're doing now, but it's a company-sized solution. For individuals like you and me, the answer to failures 1 and 2 is all about providing more, better information to AI.
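For the curious, the core retrieval step is simpler than it sounds. Here's a stripped-down conceptual sketch using OpenAI's embeddings API; the model name and documents are illustrative, and production RAG systems add vector databases, document chunking, and much more.

import numpy as np
from openai import OpenAI

client = OpenAI()

# Illustrative snippets; in practice this is your own knowledge base
documents = [
    "Google's 2024 Search Quality Rater Guidelines emphasize E-E-A-T ...",
    "Keyword stuffing has been penalized by search engines for years ...",
]

def embed(text):
    # Turn text into a vector of numbers (model name is illustrative)
    response = client.embeddings.create(model="text-embedding-3-small",
                                        input=text)
    return np.array(response.data[0].embedding)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_vectors = [embed(doc) for doc in documents]

question = "What do the current quality rater guidelines emphasize?"
q = embed(question)

# Retrieve the document most similar to the question, then augment the prompt
best = max(range(len(documents)), key=lambda i: cosine(q, doc_vectors[i]))
prompt = f"Using this reference:\n{documents[best]}\n\n{question}"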

In fact, in the SEO example above, for the Prime step in RAPPEL, we might not even want to ask the model what it knows. We might skip straight to providing the information directly, knowing that anything published in the last 6-12 months probably hasn't made it into the AI's knowledge. Priming is just as effective if we provide the knowledge as when the AI provides it.

For the third failure, lack of ability, the solution is to have AI help you with a workaround. Sometimes the workaround is built in, as when the model writes code to solve math problems; otherwise, ask the AI directly whether it can help you solve the problem another way.

And for the fourth failure, bad prompting, this comes in two major flavors - flawed requests and naive prompts. Flawed requests are like what we showed in the example. The user wrote a technically fine prompt that directed the AI to do something incorrectly. You can include a check in your prompts, something like:

"What I'm trying to do is optimize my website copy. Based on your knowledge of this topic, do my instructions reflect best practices?"

That add-on to your prompts can help bring your knowledge at least into the modern era, if not up to current-day standards.

For naive prompting, where we just give the model a terrible prompt like "Write me a blog post about 2025 content marketing trends" - a prompt so vague and so lacking in specificity that it will always underperform - the solution is partly to adopt a framework like the RAPPEL framework, and partly to ask the model itself for help. This is the Learn stage in the RAPPEL framework, but you can apply that particular technique at any time.

Here's an example.

"You're a prompt engineering expert. You know large language models, the transformers framework, attention, and prompt engineering and optimization techniques. Take the following prompt, which I know is terrible, and discern its intent. Talk through your discernment step by step. Then select one or more prompt engineering techniques from peer-reviewed academic papers on prompt engineering that you know and explain why they would be a good choice for improving the prompt. Finally, rewrite it using your selected prompt engineering techniques to deliver optimum results and fulfill the discerned intent of the prompt for ChatGPT. Here's the prompt: Write me a blog post about 2025 content marketing trends."

When we put this into ChatGPT, here's what it returns after its deliberations:

Act as an expert in content marketing and write a professional, engaging blog post about the trends that will shape content marketing in 2025. The blog should be approximately 1,000 words long and targeted at marketing professionals and business owners. Use a conversational yet authoritative tone. Begin with a compelling introduction that highlights the importance of staying ahead of marketing trends, followed by 4–6 key trends for 2025, supported by examples and data where relevant. Conclude with actionable advice on how marketers can prepare for these trends today. Ensure the blog is well-structured with headings, subheadings, and bullet points for readability.

That's a big, big improvement over a one-line prompt. There's still much, much more that could be done to improve this prompt, but since this is a 101-level focused piece, this alone will deliver far better results.

Wrapping Up

Prompt engineering as a field continues to evolve and advance. Some peer-reviewed papers on the topic catalog more than 50 sub-disciplines within prompt engineering, but our review today of the basics - beginner's prompt engineering - is a great start.

Please feel free to share this with anyone in your organization or network who would appreciate the refresher or might need a starting point to learn better prompting. Today's tips, along with the Trust Insights RAPPEL AI Prompt Framework, will immediately improve the results you get from AI.

(and if you'd like content like this delivered as a workshop at your company or event, let me know.)

How Was This Issue?

Rate this week's newsletter issue with a single click. Your feedback over time helps me figure out what content to create for you.

Share With a Friend or Colleague

If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:

https://www.christopherspenn.com/newsletter

For enrolled subscribers on Substack, there are referral rewards if you refer 100, 200, or 300 other readers. Visit the Leaderboard here.

Advertisement: Bring Me In To Speak At Your Event

Elevate your next conference or corporate retreat with a customized keynote on the practical applications of AI. I deliver fresh insights tailored to your audience's industry and challenges, equipping your attendees with actionable resources and real-world knowledge to navigate the evolving AI landscape.

👉 If this sounds good to you, click/tap here to grab 15 minutes with the team to talk over your event's specific needs.


ICYMI: In Case You Missed it

Besides the Generative AI for Marketers course I'm relentlessly flogging, this week, we burned down more of the inbox with questions you had about generative AI.

This coming week, there won't be any episodes on the channel because of the USA Thanksgiving holiday.

Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

Premium

Free

Advertisement: Generative AI Workshops & Courses

Imagine a world where your marketing strategies are supercharged by the most cutting-edge technology available – Generative AI. Generative AI has the potential to save you incredible amounts of time and money, and you have the opportunity to be at the forefront. Get up to speed on using generative AI in your business in a thoughtful way with Trust Insights' new offering, Generative AI for Marketers, which comes in two flavors, workshops and a course.

Workshops: Offer the Generative AI for Marketers half and full day workshops at your company. These hands-on sessions are packed with exercises, resources and practical tips that you can implement immediately.

👉 Click/tap here to book a workshop

Course: We’ve turned our most popular full-day workshop into a self-paced course. Use discount code ALMOSTTIMELY for $50 off the course tuition.

👉 Click/tap here to pre-register for the course

If you work at a company or organization that wants to do bulk licensing, let me know!

Get Back to Work

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you're looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.

Advertisement: Free Generative AI Cheat Sheets

Grab the Trust Insights cheat sheet bundle with the RACE Prompt Engineering framework, the PARE prompt refinement framework, and the TRIPS AI task identification framework AND worksheet, all in one convenient bundle, the generative AI power pack!

Download the bundle now for free!

How to Stay in Touch

Let's make sure we're connected in the places it suits you best. Here's where you can find different content:

Listen to my theme song as a new single:

Advertisement: Ukraine 🇺🇦 Humanitarian Fund

The war to free Ukraine continues. If you'd like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia's illegal invasion needs your ongoing support.

👉 Donate today to the Ukraine Humanitarian Relief Fund »

Events I'll Be At

Here are the public events where I'm speaking and attending. Say hi if you're at an event also:

  • Social Media Marketing World, San Diego, April 2025

  • Content Jam, Chicago, April 2025

  • SMPS, Columbus, August 2025

There are also private events that aren't open to the public.

If you're an event organizer, let me help your event shine. Visit my speaking page for more details.

Can't be at an event? Stop by my private Slack group instead, Analytics for Marketers.

Required Disclosures

Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.

Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.

My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

Thank You

Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.

See you next week,

Christopher S. Penn
