A Product Manager’s Take on LLMs
I’ve always been fascinated by AI, but let’s face it—training models never quite sparked my excitement. I love seeing the end results, but I’ve come to terms with the fact that the slow, methodical process of building those models just isn’t for me.
However, LLMs (Large Language Models) have reignited that AI curiosity I felt back in 2022. They’ve opened up doors for creating better products in ways that feel less like data science and more like product innovation.
I don’t claim to be an expert on the technical intricacies behind LLMs. My understanding doesn’t go much deeper than embeddings, and I’m not here to talk about what I can’t fully grasp. Instead, I’ll focus on how LLMs fit into the product landscape and how I think about their role in driving business outcomes.
Embracing Indeterminism
As a product manager, I’m used to thinking about predictable, reliable user experiences. You validate data, and from that point on, you trust it. You build your logic around clear inputs and outputs.
But LLMs break that mold. They introduce a level of unpredictability inside your product that’s a bit unsettling, especially when the product’s success hinges on consistency.
For example, when you call OpenAI’s API, you won’t get the same answer every time. You could ask the model for structured data like JSON, and it might give it to you—most of the time. Other times, it hallucinates and adds unexpected fields.
This isn’t a flaw; it’s just the nature of LLMs as they stand today. But as a product manager, you have to consider what this means for the reliability of your product’s core functionality. Can you trust an LLM’s response when it might vary each time?
Think of it this way: when querying a database or hitting an external service, failures typically mean you get no response or an error. With LLMs, it’s different. You might get a response, but its accuracy or structure can be inconsistent.
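To make this concrete, here's a minimal sketch of the kind of guardrail a team might wrap around a call like that, using the OpenAI Python SDK. The model name, the two-field schema, and the retry count are all illustrative: the point is to validate the structure you got back and re-ask rather than crash when it drifts.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REQUIRED_FIELDS = {"title", "summary"}  # illustrative schema

def get_structured_summary(text: str, max_retries: int = 3) -> dict:
    """Ask the model for JSON, and re-ask if the structure is off."""
    prompt = (
        "Summarize the following text as JSON with exactly two keys, "
        f"'title' and 'summary':\n\n{text}"
    )
    for _ in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.choices[0].message.content
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: retry instead of crashing
        # Reject hallucinated or missing fields rather than trusting the output
        if isinstance(data, dict) and set(data) == REQUIRED_FIELDS:
            return data
    raise ValueError(f"No valid response after {max_retries} attempts")
```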
Speed vs. Stability
The LLM space is evolving at breakneck speed. In fact, by the time I finish writing this, there will probably be new features available. OpenAI, for instance, now offers a response_format parameter that makes JSON responses more reliable, and a seed parameter to make outputs more deterministic and consistent across runs.
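For reference, those two options look roughly like this in the OpenAI Python SDK (the model name is illustrative). response_format requests JSON mode, and seed makes repeated identical requests more consistent, though OpenAI describes seeded sampling as best-effort rather than guaranteed.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "Reply only with JSON."},
        {"role": "user", "content": "Give me a product idea as JSON."},
    ],
    response_format={"type": "json_object"},  # JSON mode: output parses as JSON
    seed=42,  # best-effort determinism across identical requests
)
print(response.choices[0].message.content)
```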
This pace reminds me of the rapid growth we saw in JavaScript frameworks a few years ago. Take LangChain, a framework designed to create more complex workflows with LLMs: it helps you feed models with data, chain multiple models together, and create agents that refine questions. I remember reading the docs, and by the time I checked GitHub for an issue, the methods I was using had already been deprecated!
In product management, we like stable processes and well-defined roadmaps. But with LLMs, the landscape is too new for those foundations. You have to adapt to an ever-shifting field where the rules are being written in real time.
RAG: The Practical Approach
One emerging approach that makes a lot of sense for teams building LLM applications is Retrieval-Augmented Generation (RAG). This is a clever way of giving your model extra context without having to go through the slow process of training it on new data.
Here’s why it works: LLMs can only provide answers based on what they’ve been trained on. If they don’t know something, they’ll happily hallucinate a response. RAG addresses that by retrieving relevant documents (indexed as embedding vectors) and handing them to the model as extra context for its response.
Essentially, you’re augmenting the LLM’s capabilities by feeding it the right data at the right time, without needing to retrain anything. This gives you faster iteration cycles and allows your product to deliver more accurate results without the heavy lifting of model training.
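Here's a minimal sketch of that loop, assuming the OpenAI embeddings and chat APIs and a handful of in-memory documents. A real product would precompute the embeddings and store them in a vector database; the documents and model names here are invented for illustration.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCS = [
    "Our refund policy allows returns within 30 days.",
    "Premium plans include priority support.",
    "The mobile app has supported offline mode since v2.3.",
]

def embed(texts: list[str]) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

DOC_VECTORS = embed(DOCS)  # in a real system: precomputed, stored in a vector DB

def answer(question: str, top_k: int = 2) -> str:
    # Retrieve: rank documents by cosine similarity to the question
    q = embed([question])[0]
    sims = DOC_VECTORS @ q / (np.linalg.norm(DOC_VECTORS, axis=1) * np.linalg.norm(q))
    context = "\n".join(DOCS[i] for i in np.argsort(sims)[::-1][:top_k])
    # Augment and generate: hand the retrieved context to the model
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```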
Training Models: A Bottleneck
Training custom models is slow. In most cases, small and mid-sized teams will find that training models just doesn’t make sense from a time or cost perspective. You can fine-tune GPT or train your own models, but the overhead—both in time and resources—can bog down progress.
Prompting an LLM or using a RAG pipeline, on the other hand, allows you to iterate rapidly. Change a prompt, and you’re ready to deploy after a single pull request. Retraining a model? That’s a far more complex process.
For most teams, the faster iteration cycles made possible through prompt engineering and RAG are far more valuable than the benefits of custom training.
Handling Errors & Testing in LLMs
LLMs introduce new layers of complexity when it comes to errors and testing. Every request is an API call, meaning network issues and timeouts are a constant risk. Beyond that, the variability in LLM responses can lead to unexpected issues.
For example, what do you do when an LLM fails halfway through a task? How do you handle retries? If the response is cut off or the model hallucinates data that breaks your logic, do you have mechanisms in place to roll back any changes?
Testing LLM-driven features is also uncharted territory. Traditional testing frameworks expect predictable outputs, but LLMs are inherently non-deterministic. So how do you ensure a prompt will keep producing acceptable results? Many teams are now vectorizing their ideal responses and comparing them to actual outputs using cosine similarity, which is very different from traditional unit testing.
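Here's a rough sketch of that style of check, again assuming the OpenAI embeddings API. The run_refund_prompt helper and the 0.9 threshold are invented for illustration; in practice each team has to tune the threshold against its own prompts.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embedding(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(result.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def test_refund_prompt():
    ideal = "Customers can return items within 30 days for a full refund."
    actual = run_refund_prompt()  # hypothetical: calls the LLM feature under test
    # Pass when the meaning is close enough, even if the wording differs
    assert cosine_similarity(embedding(ideal), embedding(actual)) > 0.9
```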
Streaming Responses & User Experience
From a product perspective, user experience is paramount. And when dealing with LLMs, waiting for the entire response to load before showing anything isn’t always ideal.
We’ve gotten used to streaming word-by-word responses, much like OpenAI does with their chat interface. But when you’re streaming structured data like JSON, things get tricky. The response arrives token by token, and until the structure is complete, you’re left with broken data that can lead to parsing errors.
In these cases, you have to get creative. For example, I’ve had to use libraries that optimistically close the JSON to avoid errors, but even then, handling more complex data structures like arrays becomes challenging.
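To show what "optimistically closing" means, here's a simplified, hand-rolled version of the trick (a sketch, not any particular library): it appends whatever closing quotes and brackets a truncated JSON string is missing, so each streamed chunk can be parsed early. It deliberately ignores edge cases, such as a value cut off right after a colon, which is exactly where arrays and nested structures get painful.

```python
import json

def close_partial_json(partial: str) -> str:
    """Append the closing quotes and brackets a truncated JSON string is missing."""
    stack = []
    in_string = False
    escaped = False
    for ch in partial:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]":
                stack.pop()
    closer = '"' if in_string else ""  # close a string cut off mid-value
    return partial + closer + "".join(reversed(stack))

# Each streamed chunk can now be parsed without waiting for the full response
chunk = '{"items": [{"name": "First'
print(json.loads(close_partial_json(chunk)))  # {'items': [{'name': 'First'}]}
```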
LLMs: Strengths and Weaknesses
At first, I fell for the hype, thinking LLMs could handle any computational task. I quickly learned that they’re terrible at basic math. Ask one to count the items in an array, and you’ll get an approximation, not an accurate result.
But what LLMs are good at is understanding intent. I had a use case where users needed to submit plain English commands that would generate UI elements. While the LLM was good at generating objects based on schemas, it frequently hallucinated fields that broke the UI.
I realized that the value of LLMs isn’t in generating the structured data itself—it’s in mapping user intent to functions in your code. This approach, called function calling, is becoming more common, allowing LLMs to act as interpreters that trigger specific actions in your application rather than relying on them for perfect outputs.
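A bare-bones sketch of that pattern using the OpenAI tools API; the create_button function and its schema are invented for illustration. Note that the model never produces the UI object itself: it only decides which function to call and with what arguments, and your own code builds the element.

```python
import json
from openai import OpenAI

client = OpenAI()

def create_button(label: str) -> dict:
    # Your code, not the model, builds the actual UI element
    return {"type": "button", "label": label}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "create_button",
        "description": "Add a button to the page",
        "parameters": {
            "type": "object",
            "properties": {"label": {"type": "string"}},
            "required": ["label"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Add a Save button"}],
    tools=TOOLS,
)

# The model maps intent to a function call; we execute it ourselves
call = response.choices[0].message.tool_calls[0]
if call.function.name == "create_button":
    element = create_button(**json.loads(call.function.arguments))
    print(element)  # {'type': 'button', 'label': 'Save'}
```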
The Future of Prompt Engineering
I see prompt engineering becoming a fundamental skill, not a specialized discipline. Much like testing has become part of the everyday workflow for engineers, prompt engineering will become a core competency for product teams.
There may be a few specialists in the future who focus entirely on refining prompts and optimizing models, but for most teams, it will be an additional tool in their toolkit. Writing effective prompts will live alongside writing tests and other best practices in product development.