From our AI CTO, Miguel Amigot II: 🚀 By leveraging Prompt Adaptation, LLM Approximation, and LLM Cascade, ibl.ai optimizes AI performance while cutting costs. 🎯 Precise prompts minimize computational load, fine-tuned smaller models handle specialized tasks, and our tiered querying system ensures efficient resource use. 🌿

Transcript:

At ibl.ai, we harness the power of AI efficiently and responsibly. Let’s explore three advanced techniques we use to optimize costs while enhancing the capabilities of large language models.

Prompt Adaptation is our first technique. By crafting concise, optimized prompts, we minimize computational load. For example, instead of using lengthy few-shot prompts with multiple examples, we select only the most effective ones. This not only reduces processing costs but also enhances the model's focus and effectiveness. Consider email classification: we transform a bulky prompt into a lean one by selecting the top five most relevant past examples using semantic similarity. This reduces token count significantly, slashing costs by up to 70%.

Next is LLM Approximation. Rather than defaulting to high-cost models for all tasks, we make strategic use of cached responses and fine-tune smaller models for specific functions. This maintains high accuracy at lower operational cost. For instance, when recurrent queries or similar tasks arise, we retrieve answers from a cache, which saves up to 95% in costs. Additionally, by fine-tuning smaller models on specialized tasks, we match or surpass the performance of larger models without the associated expense.

The third technique is LLM Cascade. We start with the simplest model that might provide a satisfactory answer and escalate to more complex models only if necessary, so each query uses the minimal computational resources required. Using an email sorting task, we first query a basic model. If the confidence score is sufficient, we stop there. If not, the query moves up to more sophisticated models. This tiered querying significantly cuts costs while ensuring quality responses.

At ibl.ai, these techniques—Prompt Adaptation, LLM Approximation, and LLM Cascade—are more than cost-saving measures. They are part of our commitment to sustainable AI use, ensuring that our technologies lead not only in innovation but also in efficiency and responsibility.
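To make the cascade idea concrete, here is a minimal sketch in Python. It is not ibl.ai's implementation; the tier names, the `ask` callables, and the confidence threshold are hypothetical stand-ins for a tiered querying setup.

```python
# Minimal sketch of an LLM cascade: try cheap models first, escalate only
# when the confidence score is too low. All names below are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str                                  # e.g. a small, medium, or large model
    ask: Callable[[str], tuple[str, float]]    # returns (answer, confidence)
    cost_per_call: float                       # illustrative relative cost

def cascade(prompt: str, tiers: list[Tier], threshold: float = 0.8) -> str:
    """Query tiers in order of cost and stop at the first confident answer."""
    for tier in tiers:
        answer, confidence = tier.ask(prompt)
        if confidence >= threshold:
            return f"{tier.name}: {answer}"
    # Fall back to the last (most capable) tier's answer if nothing was confident.
    return f"{tiers[-1].name}: {answer}"

# Usage with stubbed models standing in for real endpoints.
def small_model(p):  return ("billing", 0.62)   # cheap but unsure
def large_model(p):  return ("billing", 0.97)   # expensive but confident

tiers = [Tier("small", small_model, 1.0), Tier("large", large_model, 15.0)]
print(cascade("Classify this email: 'Invoice attached...'", tiers))
```

In this toy run, most queries would stop at the cheap tier and only ambiguous ones would pay for the larger model, which is the cost-saving mechanism the transcript describes.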
More Relevant Posts
-
Self-reflecting #AI agents use a generator to produce output and a reflector to review the generator’s work. The same Large Language Model (LLM) is used for both the generator and the reflector, but each has different prompts, resulting in a self-reflecting AI agent. This method of using the same LLM in two different roles in a cyclical manner is facilitated by the LangGraph framework from LangChain.

Multi-agent Workflows

The LangGraph framework can also be used to create multi-agent workflows. Just like in the self-reflecting AI agent, the LLM can take on multiple roles, each acting as a different AI agent. This is the concept of multi-agents.

Multi-Agent

A multi-agent system involves connecting independent actors, each powered by a large language model, in a specific arrangement. Each agent can have its own prompt, LLM, tools, and other custom code to collaborate with other agents. However, the same LLM can also assume different roles based on the prompts provided.

https://lnkd.in/gtpWReHe
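A minimal sketch of that generator/reflector cycle, assuming LangGraph's StateGraph API; the stub model, prompts, and the fixed-round stop condition are illustrative placeholders, not the linked tutorial's exact code.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class StubLLM:
    """Stand-in for a real chat model (e.g. ChatOpenAI); swap in your own."""
    def invoke(self, prompt):
        return type("Msg", (), {"content": f"(model output for: {prompt[:40]}...)"})()

llm = StubLLM()

class ReflectionState(TypedDict):
    draft: str
    critique: str
    rounds: int

def generate(state: ReflectionState) -> dict:
    # Generator role: write or revise the draft, taking the critique into account.
    prompt = f"Revise the answer.\nPrevious critique: {state['critique']}"
    return {"draft": llm.invoke(prompt).content, "rounds": state["rounds"] + 1}

def reflect(state: ReflectionState) -> dict:
    # Reflector role: the same LLM with a different prompt, reviewing the generator's work.
    prompt = f"Review this draft and list concrete improvements:\n{state['draft']}"
    return {"critique": llm.invoke(prompt).content}

def should_continue(state: ReflectionState) -> str:
    # Stop after a fixed number of generate/reflect rounds (illustrative heuristic).
    return END if state["rounds"] >= 3 else "reflect"

graph = StateGraph(ReflectionState)
graph.add_node("generate", generate)
graph.add_node("reflect", reflect)
graph.set_entry_point("generate")
graph.add_conditional_edges("generate", should_continue, {END: END, "reflect": "reflect"})
graph.add_edge("reflect", "generate")  # the cycle that makes the agent self-reflecting
app = graph.compile()
print(app.invoke({"draft": "", "critique": "", "rounds": 0})["draft"])
```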
-
What is the current status of AI Agents, leading or bleeding edge? Short story: you might want to start using Agents asap.

There are so many things you want to do, but only limited time. So I decided to combine my interest in Agents with doing research on topics on my mind. Could it be of use? It was amazingly easy to set up a local environment (no cost) based on CrewAI and Ollama. After a few tweaks I got the output below. So next time I get a question like "What are the pros and cons of fine-tuning vs model merging?", I will deploy my agents again 😀

What about you, are you still searching, reading, writing everything yourself?

--------------------------

"While large language models (LLMs) continue to improve with advancements in multimodal capabilities and fine-tuning for specific domains or tasks, innovative methods like model merging present further opportunities to enhance these models' performance and cost efficiency.

Fine-tuning is a crucial technique for adapting large pretrained models to specific domains or tasks by fine-tuning on domain-specific datasets. This results in improved accuracy for the given domain or task. Other advantages of fine-tuning include customizing models and enabling more efficient understanding of inputs with less need for detailed prompts. However, excessive fine-tuning might affect a model's generalization ability.

Model merging is an emerging technique that combines two or more large language models into a single model through vector weight grouping techniques. This approach presents promising results in producing state-of-the-art models on the Open LLM Leaderboard. Model merging's primary advantage includes lower computational costs, as no GPU is required during this process. Potential disadvantages could arise due to issues combining parameters effectively or compatibility issues between merged models.

As we strive towards better and more efficient LLMs, a balance between fine-tuning, model merging, and computational resources becomes essential to deliver optimal results for various applications."
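For anyone curious how small such a setup can be, here is a rough sketch of a two-agent research crew. It assumes CrewAI's Agent/Task/Crew interface with a local Ollama model; the roles, goals, task texts, and model string are placeholders rather than the exact setup from this post, and depending on your CrewAI version you may need to pass an LLM object instead of a model string.

```python
# Rough sketch: two local agents (researcher + writer) with CrewAI and Ollama.
# Assumes the crewai package and a running Ollama server; all names are placeholders.
from crewai import Agent, Task, Crew

local_llm = "ollama/llama3"  # CrewAI-style model string pointing at a local Ollama model

researcher = Agent(
    role="Researcher",
    goal="Collect the main pros and cons of fine-tuning vs. model merging",
    backstory="You dig through notes and summarize evidence concisely.",
    llm=local_llm,
)
writer = Agent(
    role="Writer",
    goal="Turn the research notes into a short, readable summary",
    backstory="You write clear explanations for a technical audience.",
    llm=local_llm,
)

research_task = Task(
    description="List the pros and cons of fine-tuning vs. model merging.",
    expected_output="Bullet-point notes with pros and cons for each approach.",
    agent=researcher,
)
write_task = Task(
    description="Write a 150-word summary based on the research notes.",
    expected_output="A concise paragraph comparing both approaches.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
print(crew.kickoff())  # runs the tasks in order and returns the writer's final output
```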
-
It is the era of AI Agents!

There is considerable reason to believe that GPT-4o (omni) is an AI agent more than a language model. GPT-4o leverages an AI architecture that connects an LLM with various machine learning models and operates coherently as a software architecture/platform rather than just a simple language model.

The renowned Andrew Ng made a compelling presentation on AI agents. He abstracts the fundamental function of an AI agent into the steps of Self-Reflection, Planning, Tool-use, and Multi-Agent Collaboration.

1. Self-Reflection: Self-reflection is like hitting backspace after you've typed something and iteratively going through a feedback loop of Thought - Refinement - Expression until you are satisfied. Check out this paper on Self-Refine (https://lnkd.in/gsTZFDGj)

2. Planning: Most human inputs are broken or verbose. Capturing the intent from such input essentially means breaking it down into its intermediary reasoning steps and solving for those intermediaries using tools. These papers on Chain-of-Thought prompting (https://lnkd.in/g4BHDKck) and HuggingGPT (https://lnkd.in/gw-wk6FZ) help understand this.

3. Tool-use: For a language model to pick up multi-modal inputs and reason on them, it must be capable of interfacing with other machine learning and deep learning tools. This is achieved seamlessly through middleware that is configured using language (model descriptions). The way "3x+1 = 4" was solved in the GPT-4o announcement video is the first example in the paper MM-REACT (https://lnkd.in/gUWCkVAH). Similarly, a try at advanced analytics in GPT-4o calls functions from scikit-learn to solve for the uploaded datasets.

This potentially means that GPT-4o is an agent, and the future of LLM-based architectures is moving steadily and surely into AI agents (at least until AGI arrives). We could soon see most computerized systems become AI-agent-based solutions. For instance: engaging announcements where governments are understaffed, a point-of-sale machine that helps you decide your Subway meal, or a calendar that helps you organize yourself over a month.
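The self-reflection pattern in point 1 is easy to picture as a loop. Below is a toy sketch of a Self-Refine-style cycle in plain Python; `call_llm` is a hypothetical helper standing in for any chat-completion API, and the "DONE" stop heuristic is illustrative rather than the paper's exact protocol.

```python
# Toy Self-Refine loop: draft -> critique -> refine, repeated until the critic
# is satisfied. `call_llm` is a hypothetical stand-in for any LLM API call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your favourite chat-completion call here")

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Answer the following task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Task: {task}\nDraft: {draft}\n"
            "List concrete problems, or reply DONE if the draft is good."
        )
        if critique.strip().upper().startswith("DONE"):
            break  # the reflector is satisfied; stop iterating
        draft = call_llm(
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft so it addresses every point of the critique."
        )
    return draft
```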
-
# A Breakthrough: Activation Beacon, Unlocking AI's Full Potential: Soaring from Llama-2 4K to 400K Context ...

Google's Gemini 1.5 has shown promising AI applications using long context. Its 1 million token context capacity is a game changer for coherent dialogues, summarizing lengthy articles, and reasoning across documents. But what about open-source models? Existing models like Llama-2 hit computational barriers, with context limited to less than 32K tokens. Now, new research from the Beijing Academy of AI unveils an innovative solution - "Activation Beacon" - extending models' context by 100x, from 4K to 400K tokens.

🚀 How does Activation Beacon achieve this? Through a plug-in module that "condenses raw activations" into compact forms. This allows models to perceive vastly longer contexts within their original window sizes, sans expensive retraining.

👉 How does it work? Imagine you have a book, with a limit of only being able to remember and use the last four sentences you read to understand the story: this is similar to an LLM with a limited context window—it can only consider a fixed number of tokens (words or characters) from the recent past to make predictions or understand text. Now, let's say you're given a summary tool, which we'll call the "Activation Beacon." This tool allows you to condense the essence of every four sentences into one "beacon" sentence without losing the key information. When you read the next four sentences, you create another beacon sentence, and so on. As you continue reading, instead of only recalling the last four sentences, you can now recall the last four beacon sentences, each representing four original sentences. This effectively extends your memory from just four sentences to sixteen, but you're still only holding four beacon sentences in your mind at any given time.

🎯 3 Key Benefits:
- ⚡️ "Remarkable efficiency" - competitive memory and time costs for training and inference
- 🤝 "Full compatibility" - functions as an add-on without compromising models' existing capabilities
- 📈 "Superior performance" - significantly boosts language modeling and understanding across long-context tasks

🔎 Real-world implications are far-reaching, from coherent legal brief generation to multi-document question answering. With open-sourced code available, I encourage exploring Activation Beacon's possibilities. How can we build on this leap towards more contextually aware and capable AI systems? I welcome your perspectives on extending natural language progress.
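To make the book analogy concrete, here is a toy sketch of the sliding "beacon" idea in plain Python. The `condense` function is a hypothetical placeholder; the real Activation Beacon compresses hidden activations inside the model rather than text, so treat this purely as an illustration of the memory-extension intuition.

```python
# Toy illustration of the beacon analogy: every chunk of 4 sentences is
# condensed into one "beacon", and only the last few beacons are kept in
# working memory. `condense` is a hypothetical stand-in for the real
# activation-level compression described in the paper.

def condense(sentences: list[str]) -> str:
    """Stand-in for compressing a chunk into one compact representation."""
    return " / ".join(s.split()[0] + "…" for s in sentences)  # crude "summary"

def read_with_beacons(sentences: list[str], chunk: int = 4, window: int = 4) -> list[str]:
    beacons: list[str] = []
    for i in range(0, len(sentences), chunk):
        beacons.append(condense(sentences[i:i + chunk]))
        beacons = beacons[-window:]  # keep only the most recent beacons in memory
    return beacons  # 4 beacons now "cover" up to 16 original sentences

story = [f"Sentence {n} of the book." for n in range(1, 33)]
print(read_with_beacons(story))
```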
-
Meet MemGPT: An Open-Source AI Tool that Allows You to Build LLM Agents with Self-Editing Memory https://lnkd.in/gBuxqntn #artificialintelligence #largelanguagemodels #opensource
-
AI news you probably missed. I had a week of holiday and am now catching up a bit on what happened in the world of AI. GPT-4o mini was of course the biggest announcement, but here are some others worth mentioning. On GPT-4o mini, I will write a full review after testing it thoroughly.

📰 Groq Releases Llama-3-Groq-70B-Tool-Use and Llama-3-Groq-8B-Tool-Use: Open-Source, State-of-the-Art Models Achieving Over 90% Accuracy on Berkeley Function Calling Leaderboard. https://lnkd.in/dEeRVCzF

📰 The Stella embedding model steals first position in the MTEB Leaderboard as the best English embedding model. Not much info about the project, but it seems to support the following dimensions: 512, 768, 1024, 2048, 4096, 6144 and 8192. The MTEB score of 1024d is only 0.001 lower than 8192d. https://lnkd.in/dZGXpuPZ

📰 Meta will announce their Llama3-400B model this week, and it's expected to compete with or even beat GPT-4o and Claude 3.5 Sonnet in performance. At the same time they announced that they will not bring their multimodal capabilities to Europe for fear of regulations (mainly related to which data they use for training the model, namely Facebook and Instagram data).

📰 Arcee AI Introduces Arcee-Nova: A New Open-Sourced Language Model based on Qwen2-72B that Approaches GPT-4 Performance Level. https://lnkd.in/dhW2y56P

📰 Mistral AI has announced NeMo, a 12B model created in partnership with NVIDIA. Compared to its closest competitors (Llama 3 and Gemma 2), it offers a 128k-token context and better reasoning performance. These small language models are becoming great.
-
RAG (Retrieval-Augmented Generation) - I know you have seen this acronym many times on the internet. I think it's very important to distill what RAG means and why we talk about RAG with LLMs (Large Language Models).

Do you know that AI sometimes hallucinates? Yes, it does when dealing with complex prompts from users. So what does it look like when AI hallucinates? LLMs like #GPT, #Mistral and other large language model systems were trained on large amounts of data from the internet, so their responses can mix many indexed answers for a simple but tricky question/prompt. In this case, you may see more than five different responses for just one prompt because the LLM wasn't sure of the best answer; this is a disadvantage in critical cases like hospital patient chatbots.

So let's quickly dive into RAG. RAG allows ML engineers to build LLM applications with their own content. It's not a model, it's just an architecture, and a very simple one - you can simply call it your knowledge-base LLM. RAG helps you define how your LLM responds to users in a well-defined and specific way, which in turn helps with accuracy.

How does RAG work? For any institution, you need to prepare your data (it could be a repository of research, conversations, reports, product reviews, ticket responses from customer support, engineering solution documentation, tutorial documents, etc.).

✍ Your data will be chunked (I wrote about chunks and tokens in my last post) into sentences, paragraphs, or words - this is very easy these days, and you can easily read up on the many different approaches in LangChain or Hugging Face. Chunking data makes it easier for your system to index documents and retrieve specific information in response to prompts.

✍ Your chunks will be embedded into vectors (simple mathematical vectors of numbers) - this could be confusing for non-data experts or people with math phobia, but do not worry about the details; just understand the concept.

✍ Lastly, you introduce an LLM with prompt engineering to define how responses to users' new questions are generated.

This is the simplest description of RAG; it can be more complex in cases where you also need to fine-tune your model. As users of AI or LLM products, you may want to try this out and see how AI hallucinates by providing a prompt whose answer is probably not easy to find on the internet - you will get the gist better. AI could be better with #RAG... Jerry Liu explained this in one of his podcasts.

#rag #llm #AI #artificialintelligence #linkedin #aws #azure #datascience #data #gpt #gemini #task #chunks #embedding #ai #knowledgebase #gitex #finetune #model #modelgeneration #retrievalaugmentedgeneration #africa #internet #ml #mlengineering #mlresearch #foundation #nlp #transformer #predictions #googleai
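A minimal, hedged sketch of the chunk-embed-retrieve-generate flow described above. It assumes the sentence-transformers package for embeddings and uses a hypothetical `call_llm` helper in place of a real chat API; the sample documents and question are made up, so this illustrates the architecture rather than a production pipeline.

```python
# Minimal RAG sketch: chunk -> embed -> retrieve top-k -> prompt the LLM.
# Assumes sentence-transformers for embeddings; `call_llm` is a hypothetical
# stand-in for whatever chat-completion API you use.
import numpy as np
from sentence_transformers import SentenceTransformer

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support tickets are answered within one business day.",
    "The API rate limit is 100 requests per minute per key.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

def answer(question: str, k: int = 2) -> str:
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec                       # cosine similarity (vectors are normalized)
    top_chunks = [documents[i] for i in np.argsort(scores)[::-1][:k]]
    prompt = (
        "Answer using only the context below.\n"
        "Context:\n- " + "\n- ".join(top_chunks) + f"\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The grounding step is the last one: because the prompt tells the model to answer only from the retrieved chunks, answers stay tied to your own content instead of whatever the model indexed from the internet.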
-
### LangGraph by LangChain: Pioneering Multi-Agent AI Workflows

LangGraph, developed by LangChain, is revolutionizing the way we approach building stateful, multi-actor applications with Large Language Models (LLMs). This innovative library extends the capabilities of LangChain, allowing for the coordination of multiple agents (or actors) across various computational steps in a cyclic manner, inspired by models like Pregel and Apache Beam.

#### The Essence of LangGraph: Nodes, Edges, and Stateful Agents

LangGraph introduces a graph-based approach to managing AI agents, where each agent is considered a node within a graph. These nodes are interconnected through edges, which can be either normal or conditional, directing the flow of operations based on the outcomes of agent interactions or specific conditions.

- **Stateful Agents**: Agents in LangGraph are designed to be stateful, meaning they can remember past interactions and evolve over time. This is crucial for applications requiring complex decision-making and problem-solving abilities that improve with each interaction.
- **Conditional Edges**: LangGraph uses conditional edges to dynamically determine the path between nodes based on certain conditions. This flexibility allows for more complex and adaptable agent workflows, where the sequence of agent interactions can change based on the situation or data at hand.

#### Multi-Agent Workflows and Their Benefits

LangGraph shines in its ability to facilitate multi-agent workflows, where multiple independent actors collaborate to achieve a common goal. This is particularly effective when tasks are divided among specialized agents, each designed to tackle specific aspects of a problem. Such a division of labor not only enhances the overall efficiency and effectiveness of the system but also allows for the development of more robust and scalable AI applications.

The concept of multi-agent designs in LangGraph enables:
- **Focused Task Success**: Agents focusing on narrower, specialized tasks tend to achieve higher success rates.
- **Customizable Prompts and Tools**: Each agent can have its own set of instructions, few-shot examples, and even be powered by distinct LLMs tailored to their specific functions.

#### Real-World Applications and Framework Comparisons

LangGraph is not alone in the landscape of multi-agent AI frameworks but stands out due to its graph-based representation and integration with the LangChain ecosystem. Compared to other frameworks like Autogen and CrewAI, LangGraph offers a more intuitive and modifiable approach to constructing complex workflows. Its compatibility with LangChain ensures users can leverage a wide range of integrations and functionalities, making it a powerful tool for developers looking to push the boundaries of AI agent collaboration.

#generatieveai #langchain #genai
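As a companion to the conditional-edge description above, here is a small sketch of a router that sends a query to one of two specialized agent nodes. It assumes LangGraph's StateGraph API; the node bodies and the keyword-based routing rule are placeholders, not LangGraph's own examples.

```python
# Sketch of a conditional edge: a router node inspects the state and the
# conditional edge sends the query to one of two specialized agents.
# Node bodies and routing logic are placeholders; only the StateGraph wiring
# is intended to reflect the public LangGraph API.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    answer: str

def router(state: State) -> dict:
    return {}  # no state change; routing happens on the conditional edge below

def researcher(state: State) -> dict:
    return {"answer": f"[research notes for: {state['query']}]"}

def coder(state: State) -> dict:
    return {"answer": f"[code snippet for: {state['query']}]"}

def route(state: State) -> str:
    # Conditional edge: pick the next node based on the query's content.
    return "coder" if "code" in state["query"].lower() else "researcher"

graph = StateGraph(State)
graph.add_node("router", router)
graph.add_node("researcher", researcher)
graph.add_node("coder", coder)
graph.set_entry_point("router")
graph.add_conditional_edges("router", route, {"researcher": "researcher", "coder": "coder"})
graph.add_edge("researcher", END)
graph.add_edge("coder", END)
app = graph.compile()
print(app.invoke({"query": "write code to parse a CSV", "answer": ""}))
```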
-
Modularity, Structure, and Agency in Retrieval-Augmented Generation: A Powerful Synergy 🔵 https://lnkd.in/enTiFNa4

A revolution is brewing in the world of artificial intelligence. We’re witnessing a paradigm shift in how large language models reason over vast information pools. At the heart of this transformation lies Retrieval-Augmented Generation (RAG), a promising approach that combines the generative power of large language models with external information retrieval. But RAG is evolving, and its latest iteration, Agentic RAG, is the way forward.

Agentic RAG represents the convergence of these powerful paradigms: modularity, structured knowledge representation, and intelligent agency. This combination creates a synergy that’s reshaping the landscape of AI-powered information processing.

Modularity in RAG is about breaking systems into specialized components. Imagine a Swiss Army knife for AI. One tool drafts responses, another verifies them. This approach enhances flexibility and allows for continuous improvement of individual modules. Modularity in RAG is about agility and customization. It allows businesses to tailor AI systems to their specific needs, much like assembling the perfect team for a critical project. One module might specialize in financial analysis, another in customer sentiment. This flexibility enables companies to adapt quickly to changing market conditions and evolving data landscapes.

Structured knowledge representation provides a rich, interconnected foundation for reasoning. Think of it as transforming a pile of books into a vast, intricately linked digital library. By organizing information into knowledge graphs, RAG systems can navigate complex relationships between concepts with ease. This paradigm shift allows businesses to see connections they never knew existed, uncovering hidden opportunities and risks. It’s not just about having data; it’s about understanding the complex relationships within it.

When combined, these two paradigms create a synergy. Modular components can now operate on a foundation of structured knowledge, enabling more precise, context-aware, and adaptable AI systems. The agentic layer is where the magic happens. It orchestrates these modular components and structured knowledge bases, creating a system that’s greater than the sum of its parts. Cutting-edge techniques like Speculative RAG and GraphRAG exemplify these approaches.
-
Recently I found two Weaviate blog posts incredibly enlightening. While traditional RAG is already powerful - using external knowledge to enhance LLMs and reduce hallucinations - the concept of Agentic RAG takes this even further.

Think about it: instead of a simple "query-retrieve-generate" pipeline, we now have an intelligent assistant that can:
- Determine if retrieval is necessary
- Choose the most appropriate data sources
- Validate retrieval quality
- Re-retrieve when needed

This evolution from traditional RAG to an agent-driven approach represents a significant leap forward in making AI systems more autonomous and reliable.

https://lnkd.in/gRjaw-ZF
https://lnkd.in/grjzrrqd
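To ground those four bullet points, here is a plain-Python sketch of an agent-style RAG loop. `call_llm`, `search_docs`, and `search_web` are hypothetical helpers, and the control flow illustrates the idea rather than Weaviate's implementation.

```python
# Plain-Python sketch of an agentic RAG loop: decide whether to retrieve,
# pick a source, grade the retrieved passages, and re-retrieve when they
# look weak. All three helpers below are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat-completion call")

def search_docs(query: str) -> list[str]:
    raise NotImplementedError("plug in your vector-database search")

def search_web(query: str) -> list[str]:
    raise NotImplementedError("plug in a web-search tool")

def agentic_rag(question: str, max_retries: int = 2) -> str:
    # 1. Determine if retrieval is necessary at all.
    needs_context = call_llm(
        f"Do you need external documents to answer reliably (yes/no)? {question}"
    )
    if needs_context.strip().lower().startswith("no"):
        return call_llm(question)

    query = question
    for _ in range(max_retries + 1):
        # 2. Choose the most appropriate data source.
        source = call_llm(f"Best source for '{query}'? Reply DOCS or WEB.")
        chunks = search_docs(query) if "DOCS" in source.upper() else search_web(query)

        # 3. Validate retrieval quality before generating an answer.
        grade = call_llm("Are these passages relevant (yes/no)?\n" + "\n".join(chunks))
        if grade.strip().lower().startswith("yes"):
            context = "\n".join(chunks)
            return call_llm(f"Answer using only this context:\n{context}\nQ: {question}")

        # 4. Re-retrieve with a rewritten query when quality looks poor.
        query = call_llm(f"Rewrite this search query to get better passages: {query}")

    return call_llm(question)  # fall back to the model's own knowledge
```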