Is tool-calling all you need? Interaction patterns in multi-agent systems

Part I: Introduction

For all the hype surrounding large language models (LLMs) like GPT-4, their limitations become apparent in real-world use. An LLM by itself operates in isolation, processing input and generating output based solely on its training data. To overcome this, we use AI agents: essentially "wrappers" around LLMs that enable interaction with external data and services, such as web search. Think of ChatGPT: it can retrieve information from the web and incorporate it into its responses.

However, even agents are bound by their predefined instructions, known as prompts. Prompts can grow ever more complex, but larger prompts increase the risk that the LLM misinterprets sections or ignores them entirely. Multi-agent systems offer a solution by distributing tasks among multiple agents, each with specialized instructions, unlocking greater complexity and efficiency. But a crucial question arises: how do these agents effectively collaborate? This is where orchestration comes into play.

Tool calling

The most basic orchestration pattern is tool-calling. Imagine an agent needing data from the web: it can use a search engine as a "tool," providing a query and receiving relevant results. Interestingly, agents themselves can function as tools! One agent can leverage the capabilities of another, forming a chain of AI interactions. Motleycrew, an open-source framework, is notable for explicitly supporting this agents-as-tools concept (a generic sketch of the pattern follows below).

Tool-calling is undeniably powerful for many use cases, but is it the definitive answer for multi-agent orchestration? The field is young, and other frameworks employ different approaches. This series aims to review these methods, discern their uniqueness and equivalences, and ultimately, in its final part, present the orchestration options we've chosen for Motleycrew, explaining why we believe they strike the right balance between power and simplicity.

Intrigued by the potential of multi-agent systems and eager to delve deeper into the nuances of orchestration? Explore the full article https://lnkd.in/eQ2-jjAB for a comprehensive analysis and further insights. #motley_ai
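For the curious, here is a minimal, generic sketch of the agents-as-tools pattern: one agent wrapped as a callable tool for another. It uses LangChain/LangGraph purely for illustration (this is not Motleycrew's actual API), and the model name and the toy "researcher" are assumptions.

```python
# Generic sketch of agents-as-tools: a specialist agent is exposed as
# an ordinary tool that a second agent can call. Illustration only.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model; any chat model works

@tool
def research_agent(question: str) -> str:
    """Delegate a research question to a specialist agent."""
    # For brevity the "agent" here is a bare LLM call; in a real system
    # it could be a full agent with its own tools and prompt.
    return llm.invoke(f"Answer concisely: {question}").content

# The writer agent sees the researcher as just another tool.
writer = create_react_agent(llm, tools=[research_agent])
result = writer.invoke(
    {"messages": [("user", "Write two sentences on multi-agent orchestration, "
                           "using the research agent for facts.")]}
)
print(result["messages"][-1].content)
```

The key move is that delegation becomes an ordinary tool call: the outer agent needs no special machinery to use the inner one.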
-
We can pose a problem to ChatGPT that seems reasonably straightforward to us humans, but the way it breaks the GPT models is a great insight into what they're actually doing under the hood. If we ask any version of ChatGPT the following problem, we'll get back an answer that appears fine on the surface, but a close read will reveal that all is not well:

'A man is walking with a goat and they come to a river with a small boat on their side. How can they get across the river?'

You might recognise that this is a rubbish variation of the classic river-crossing puzzle, where you have a set of items, some of which cannot be left together unsupervised, and you have to correctly move them across the river. It's a logical cousin of the Tower of Hanoi, and much beloved of mid-2000s aptitude-based corporate interviews. However, even I can see that if we only have a goat and a boat, the solution to the puzzle is trivial: we take the goat across the river.

This is not the solution you get from ChatGPT though (go try it! I'll wait). You get some garbled mess where inevitably the man rows back and forth across the river to collect the boat, or a wolf appears from nowhere.

This is because LLMs are not logical inference machines; they are simply gigantic ML models for predicting the next most likely word in a sequence. They don't have a sense of state, or a model of the world, in their 'brains'. It is perfectly possible for them to write sentences like 'the man rows back alone to collect the boat'. The AI doesn't know that this person cannot travel without a boat; it has no sense that the boat is on one side of the river or the other. All it knows is words.

In this case, what is confusing the AI is its training set. In the corpus of knowledge that GPT models are trained on, there must be myriad river-crossing problems posed in the same cadence as our query, and so, in its model, the most likely output for questions that look like ours involves wolves, chickens, multiple boat crossings, and some trick of logic, all of which get embedded in our answer.

So what is The Wisdom to take from this? The Wisdom here is that LLMs produce the most likely, most average output for a problem. They're not actually doing logical reasoning (yet!); they're generating language to solve problems, and that language is the most average response to the input.

There is no need to panic though! There are plenty of applications where the most mid response is exactly the one you want. Scads of routine tasks are a great fit for LLM augmentation, because sparkling insight is not required. Imagine writing Excel formulas: you want the most mid, most likely response to that routine task handed off to an LLM, because tying a human up in it is a waste of a human. Where the desired output is a routine cognitive one, hand it off to an LLM, and let your humans spend their time honing their exceptional outputs.
-
GPT's 'interchat memory' is hallucinating badly; be very, very careful. Sometimes it will say 'checking memory' and other times it won't, but it will still hallucinate an answer based on what it thinks you want to hear, not on what it actually knows.

NF: Please name the users that you've constructed these from.

ChatGPT: Based on previous chats, there has not been specific, named users discussed in relation to XXX user testing. Therefore, I have generalized roles like Discovery Team Member, Product Stakeholder, and UX Researcher as typical user types involved in the development phase of a product like XXX.

NF: Yes there have.

ChatGPT: Thank you for the clarification. Based on the review of previous chats, here are personae constructed from the actual users mentioned.

...and it then hallucinates users that were never part of the sample, never appeared in previous chats, and with whom I've never had any user-testing contact.

As of today, for user research analysis, AI has SOME value... but that value is very unreliable and still requires a lot of effort to double-check. Yes, it is definitely getting better in some regards, but it is also getting a lot worse in others (i.e. memory referencing for interchat info).

MY MOST COMMON PROMPT THIS WEEK HAS BEEN 'CHECK'!!!!

What I mean by that is, I want GPT to check the internet and memory and any other sources BEFORE it gives me output. When I don't do this, GPT gives me an opinion first, and 99% of the time it is hallucinated garbage. When I ask it to check as part of my initial prompt, I get a much better starting point, with less hallucination and less iteration and crafting to get the outcome I need.

GPT is blurring the lines between states, i.e. local guesswork versus referenced information. YOU NEED TO BE THE ONE TO FORCE THE STATE, OTHERWISE YOU'LL GET DRAGGED AROUND BY HALLUCINATION!!

So, bottom line: manual is still the way forward, but AI can help in parts. It just needs serious wrangling and double-checking to ensure validity. As always, caveat emptor.

FYI: I'm only using paid GPT now. I no longer use Claude or Mistral or any other AI services, because for what I use AI for, GPT is several levels higher than the others... and it's still a lying little sh!t!!!

Happy Sunday 😁
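For what it's worth, here is a tiny sketch of the 'CHECK first' habit as a reusable prompt wrapper. The exact wording of the instruction is an assumption about how one might phrase it, not a quote from the post above.

```python
# Sketch of the "CHECK first" prompting habit: prepend an instruction
# forcing the model to consult its sources before answering, rather
# than opening with an unchecked opinion. Wording is illustrative.
def check_first(user_prompt: str) -> str:
    """Wrap a prompt so the model checks sources before responding."""
    return (
        "CHECK: before answering, search your memory and any available "
        "tools or sources for relevant facts, and state explicitly when "
        "you are guessing rather than referencing.\n\n" + user_prompt
    )

print(check_first("Summarise the user testing feedback on feature X."))
```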
-
Building AI Agents and Workflows

With all the advancements in the field of AI, the term "agents" has started popping up everywhere. Everyone wants to create an AI agent or an AI-driven workflow, often referred to as an agentic workflow. Let's quickly understand these two terms in the context of LLMs, and get started on how to create one.

Agents

Language models are becoming highly skilled at reasoning and responding across formats like text and images, but they can't take direct actions toward achieving an outcome. Agents fill this gap by using an LLM as a reasoning engine, determining and executing the necessary actions for a task. For instance, while an LLM can write a compelling email, it can't send it. An agent, however, can trigger the email service to achieve the intended result.

Agents work autonomously with tools to execute tasks; let's explore an example using LangChain/LangGraph (a sketch follows below). https://lnkd.in/gxGdYcMV
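Below is a minimal sketch of such an agent using LangGraph's prebuilt ReAct agent. It assumes the langchain-openai and langgraph packages and an OPENAI_API_KEY in the environment; send_email is a stub standing in for a real email service, and the address is hypothetical.

```python
# Minimal tool-using agent: the LLM is the reasoning engine, and the
# email "service" is a tool it can decide to call.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email with the given subject and body to the address."""
    # Stub: a real implementation would call an email API here.
    print(f"To: {to}\nSubject: {subject}\n\n{body}")
    return "email sent"

llm = ChatOpenAI(model="gpt-4o-mini")          # the reasoning engine
agent = create_react_agent(llm, tools=[send_email])

# The LLM drafts the email, then calls the tool to "send" it.
result = agent.invoke(
    {"messages": [("user", "Write and send a short thank-you email to "
                           "alice@example.com about yesterday's demo.")]}
)
print(result["messages"][-1].content)
```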
-
While glossing through the AI snake oil Jedi hypemeisters, there are interesting points of view outside the ChatGPT fanboys. TOH to Jeroen Coelen for related research into predicting AI's future...

"The five recent definitions of AI agents cited above are all distinct but with strong similarities to each other. Rather than propose a new definition, we identified three clusters of properties that cause an AI system to be considered more agentic according to existing definitions:

Environment and goals: The more complex the environment, the more AI systems operating in that environment are agentic. Complex environments are those that have a range of tasks and domains, multiple stakeholders, a long time horizon to take action, and unexpected changes. Further, systems that pursue complex goals without being instructed on how to pursue the goal are more agentic.

User interface and supervision: AI systems that can be instructed in natural language and act autonomously on the user's behalf are more agentic. In particular, systems that require less user supervision are more agentic. For example, chatbots cannot take real-world action, but adding plugins to chatbots (such as Zapier for ChatGPT) allows them to take some actions on behalf of users.

System design: Systems that use tools (like web search or a code terminal) or planning (like reflecting on previous outputs or decomposing goals into subgoals) are more agentic. Systems whose control flow is driven by an LLM, rather than LLMs being invoked by a static program, are more agentic."
New paper: AI agents that matter (aisnakeoil.com)
-
What do a goat and a boat have to do with the validity of AI responses? See Mike Le Galloudec's post above to find out.
-
There are 2 ways you can interact with AI. 👇

1. Most are familiar with the first one: the general chatbot interface. You ask a question and (hopefully) receive a decent answer. However, to get the maximum value out of chat-based AI interfaces, you need to know how to ask the right questions with the right context. The skill is in how you prompt it, sometimes called prompt engineering. These standard chatbot-type interfaces are effectively just a wrapper around the underlying large language model (LLM), which is the actual engine of the AI.

But there's another way to interact with AI (and we'll see this one a lot more in 2024 and beyond):

2. Specialized interfaces. Essentially, the LLM is now focused on a very specific task. There is no chat interface anymore; in fact, the LLM is built directly into the software. A great example is Adobe Photoshop with its new "Generative Fill" feature: if you have an old photo that needs improvement, you simply ring-fence the area you want to modify. Instead of crafting a prompt, you're now pressing buttons and drawing lines, and the LLM is doing the job for you in the background. So the product is effectively designed to shield the actual LLM from you.

We will follow a similar approach at VisibleThread 👇

For documents, we offer a metric called "grade level" to assess clarity. We calculate this metric using traditional non-AI approaches, since AI isn't suitable for tasks like this. However, we will call an LLM to suggest an alternative, simplified version of the same content, using a secure, completely private LLM. This is an example of more specialized LLM usage (a sketch of the pattern follows below).

In short, there are two ways of interacting with AI: generalized and specialized. Choose your LLM wisely, and choose your AI approach wisely. If you choose the right thing for the right job, it'll give you powerful results.

#ai #languageanalysis #llm
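To make the pattern concrete, here is a minimal sketch: a traditional readability formula computes the metric, and an LLM is called only for the specialized rewriting step. It assumes the textstat and openai packages and an API key in the environment; this illustrates the approach and is not VisibleThread's implementation.

```python
# Specialized LLM usage: a non-AI readability metric plus an LLM
# rewrite. Illustration only, not a production implementation.
import textstat
from openai import OpenAI

text = (
    "Notwithstanding the aforementioned provisions, the party of the "
    "first part shall endeavour to effectuate remediation forthwith."
)

# Traditional readability scoring -- no AI involved.
grade = textstat.flesch_kincaid_grade(text)
print(f"Flesch-Kincaid grade level: {grade}")

# The LLM is used only for the narrow rewriting task.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model
    messages=[{
        "role": "user",
        "content": f"Rewrite this at roughly an 8th-grade reading level:\n{text}",
    }],
)
print(response.choices[0].message.content)
```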
-
Let’s play with PaliGemma! 🔍

Since the birth of ChatGPT, the influence of LLMs has been beyond dispute. People are obsessed with them for their convenience and ease of use: you can simply type a question into ChatGPT and receive an answer in seconds. There are even arguments about the possibility of AI taking over several human tasks. Clearly, LLMs make the world go crazy.

However, as LLMs become essential tools, text is not the only input format people expect from a tool with "AI" in its name. They want more; they don't want a tool with just one capability; they are thirsty for something more robust, more multi-functional. Multi-modal models save the day by accepting a variety of input formats (images, audio, text, etc.), enhancing their ability to interpret and generate content across several modalities. In particular, they are intended to handle the complicated workings of multi-modal data, which can include:

✅ Semantics: understanding the meaning of words, phrases, and sentences in multiple languages.
✅ Syntax: analyzing the structure of language, including grammar and sentence structure.
✅ Vision: recognizing objects, scenes, and actions in images and videos.
✅ Audition: analyzing speech, music, and sounds.

This post focuses on vision-language models (VLMs), multimodal models that learn from images and text. These generative models have good zero-shot capabilities and can work with various types of images, including documents and web pages. They can also capture spatial properties in images, output bounding boxes or segmentation masks, and localize entities. I will use PaliGemma, which was recently released, as a use case to demonstrate the usage of VLMs (a minimal sketch follows below). Let's dive in!

Blog: https://lnkd.in/gUF8GC_f
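As a teaser, here is a minimal sketch of running PaliGemma with Hugging Face transformers. It assumes the transformers, torch, and Pillow packages, access to the gated google/paligemma-3b-mix-224 checkpoint, and a local example.jpg; see the blog for the full walkthrough.

```python
# Minimal PaliGemma captioning sketch using Hugging Face transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # gated checkpoint, access assumed
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # any local image
prompt = "caption en"              # PaliGemma task prefix for captioning

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
# The decoded string includes the prompt prefix followed by the caption.
print(processor.decode(output[0], skip_special_tokens=True))
```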
-
!Warning - more techy post than usual!

Off the back of some recent projects, I've been thinking about the interfaces we use to interact with LLMs. Since ChatGPT first hit hype levels, a lot of the conversation has been about how great it is to essentially 'talk to your computer': giving the LLM tasks in normal language and seeing the response. This has undoubtedly democratised access to technology; look no further than AI image and code generation from normal-language prompts.

But in reality it's not normal conversational language that gets the best results; it's carefully considered prompts, especially for image/video/code generation. We're using normal human language, but with a weird syntax designed to get the best results from AI, created via a process of trial and error.

Combined with the simple fact that typing (or speaking) isn't always the most effective way to convey information to a computer, I think we're going to see the rise of blended interfaces, particularly on public-facing websites. A blended interface would have multiple ways of communicating with the AI, supplementing the familiar chat agent with other controlled or suggested inputs: selecting an option from a list, picking a location on a map, etc. These would get converted to prompts behind the scenes, so the chat agent is aware of what happens elsewhere on the interface and vice versa (a small sketch of this idea follows below).

Is this a way to better leverage AI for public consumption? Or is this overcomplicating things: should we stick with the chat interface and let users catch up?
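To make the idea concrete, here is a tiny sketch of how structured UI inputs might be folded into a prompt behind the scenes. All names and fields here are hypothetical illustrations.

```python
# Sketch of a "blended interface": structured UI inputs (a dropdown
# choice, a map pin) are converted into a prompt behind the scenes,
# so the chat agent stays aware of non-chat interactions.
from dataclasses import dataclass

@dataclass
class UIState:
    selected_option: str                # e.g. a dropdown choice
    map_location: tuple[float, float]   # e.g. a pin dropped on a map
    free_text: str                      # the familiar chat box

def to_prompt(state: UIState) -> str:
    """Fold the structured inputs into one prompt for the LLM."""
    lat, lon = state.map_location
    return (
        f"The user selected the '{state.selected_option}' service, "
        f"picked the location ({lat:.4f}, {lon:.4f}) on the map, "
        f"and asked: {state.free_text}"
    )

prompt = to_prompt(UIState("planning advice", (51.5074, -0.1278), "What are my options?"))
print(prompt)  # this string would be sent to the LLM in place of raw chat input
```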