MantisNLP

IT Services and IT Consulting

Specialist consultancy in Generative AI | Natural Language Processing | AI Development, Consulting and Due Diligence

About us

Mantis NLP is an AI consultancy specialising in Generative AI and Natural Language Processing. We can advise on your data needs, integrate or embed into your AI project to provide practical support, and develop, build and deploy the most relevant machine learning and deep learning techniques to solve your problem. We are committed to reducing ethical risks in AI applications and to being active members of the open source community.

Industry
IT Services and IT Consulting
Company size
2-10 employees
Headquarters
Limassol
Type
Privately Held
Founded
2021
Specialties
Natural Language Processing, Artificial Intelligence, Machine Learning, and MLOps

Updates

  • 📊 Evaluating RAG might be simpler than you think. In fact, you might not even need evaluation data, since you can use LLMs to do some of the heavy lifting. Here are some metrics you can calculate quite easily:
    💁 Helpfulness: Is the response helpful?
    🎯 Relevance: Is the retrieved context in RAG relevant?
    🤥 Faithfulness: Is the response truthful given the context?
    LLM evaluation is quite a hot topic, and there are multiple frameworks and platforms built to let you evaluate with little to no code required. This is an interesting guide for one of them (a minimal LLM-as-judge sketch follows after the reshared post below): https://lnkd.in/ecyUPy6E

    Dipanjan S.

    Head of Community • Principal AI Scientist • Google Developer Expert & Cloud Champion Innovator • Author

    Most articles and discussions are on building RAG systems, but don't forget to evaluate these systems when building. Here's my updated comprehensive guide on the most common RAG evaluation metrics. This guide has the following:
    - Explanation of key metrics in a RAG workflow
    - Focus on retrieval evaluation metrics: Context Precision, Recall, Relevancy
    - Focus on LLM generation evaluation metrics: Answer Relevancy, Faithfulness, Hallucination Check, Custom LLM as a Judge
    - Detailed mathematical definition of each metric with explanation
    - Worked-out example for each metric
    - Hands-on code of how to use these
    Do check this out and share with others if you find it useful!
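    As a minimal illustration of the LLM-as-judge idea from the update above, here is a sketch of a faithfulness check. The judge prompt wording, the gpt-4o-mini model choice and the use of the openai client are illustrative assumptions, not taken from the linked guides.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical judge prompt; real frameworks use more robust rubrics.
JUDGE_PROMPT = """You are grading a RAG system.

Context:
{context}

Answer:
{answer}

Is every claim in the answer supported by the context?
Reply with a single word: yes or no."""

def faithfulness(context: str, answer: str) -> bool:
    """Return True if the judge model deems the answer faithful to the context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(context=context, answer=answer)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

# Score a handful of (context, answer) pairs and report the ratio.
pairs = [
    ("Phi-4 is a 14B parameter model released by Microsoft.",
     "Phi-4 has 14 billion parameters."),
    ("Phi-4 is a 14B parameter model released by Microsoft.",
     "Phi-4 was released by Google."),
]
scores = [faithfulness(c, a) for c, a in pairs]
print(f"Faithfulness: {sum(scores)}/{len(scores)}")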

  • Bites from last week's AI news 🍪
    1/ Microsoft releases Phi-4, a 14B model with performance similar to some frontier models 🚀 https://lnkd.in/eZnXjYT9
    2/ Google started rolling out its Gemini 2.0 models, beginning with Flash, which seems to be better than 1.5 Pro ⚡ The 2.0 series is advertised as optimised for agentic workflows, some examples of which were shown through Project Astra and Mariner. https://lnkd.in/ghJZR66c
    3/ Agentic workflows are finding their way into mainstream applications. Agentforce in Salesforce is one good example: it enables their customers to build customer support and sales agents that handle some of the actions that can be automated 🤖 https://lnkd.in/gMiH_FHB
    4/ Meta introduced a new transformer architecture that dynamically creates byte patches instead of tokens, based on how probable the next byte is 🧪 This representation seems to scale better since it does not rely on a fixed vocabulary of tokens. https://lnkd.in/d4tf2puZ
    5/ Meta released Llama 3.3 70B, which delivers the same performance as their flagship 405B from version 3.1 at a much lower cost 🔥 It matches GPT-4o, Claude 3.5, Gemini 1.5 Pro and Nova Pro in a couple of benchmarks. https://lnkd.in/ghyZu6UZ

  • ⚡ AI inference is speeding up
    Transformers, the underlying technology behind today's AI, are notoriously slow at inference. Remember when ChatGPT launched and you had to wait a couple of seconds while your answer was generated one word at a time? Back then, generation speed was approximately 1-10 tokens per second, with the possibility of a 10x increase after quantisation and other optimisations 🐌
    Noticeably, AI responses are almost instant nowadays. At the same time, there is a race happening at the hardware level over which provider can run AI the fastest, with two of the most prominent players being Cerebras and Groq 🔥 A few months ago Groq broke onto the scene with an advertised generation speed of 100 tokens per second for Llama 70B, which has now increased to 250 👌 And while this is extremely fast for such a large model, Cerebras recently announced a speed of 2,200 tokens per second for the same model 😮
    On the surface such speeds may seem irrelevant for your application, but this is not entirely true, since AI applications nowadays consist of multiple AI calls and components. Those speeds can enable building more complex solutions that still feel instant to the user. They also make it possible to improve existing solutions by letting the model "think more" in the same amount of time 🚀 A back-of-the-envelope latency calculation is sketched below.
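    To make the "multiple AI calls" point concrete, here is a rough latency sketch. The pipeline shape (three chained calls of 200 generated tokens each) is an illustrative assumption; only the 250 t/s and 2,200 t/s figures come from the post above.

# Rough end-to-end latency of sequential LLM calls, ignoring network
# overhead and time-to-first-token.
def pipeline_latency(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Seconds to generate `calls` * `tokens_per_call` tokens at a given speed."""
    return calls * tokens_per_call / tokens_per_second

for name, speed in [("~10 t/s (early ChatGPT era)", 10),
                    ("250 t/s (Groq, Llama 70B)", 250),
                    ("2200 t/s (Cerebras, Llama 70B)", 2200)]:
    print(f"{name}: {pipeline_latency(3, 200, speed):.1f}s for a 3-call pipeline")

# ~10 t/s (early ChatGPT era): 60.0s for a 3-call pipeline
# 250 t/s (Groq, Llama 70B): 2.4s for a 3-call pipeline
# 2200 t/s (Cerebras, Llama 70B): 0.3s for a 3-call pipeline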

  • Bites from last week's AI news 🍪
    1/ Amazon releases Nova, a series of multi-modal models that push the boundary of cost-efficient intelligence 💸 https://lnkd.in/eqJ57Yt2
    2/ OpenAI's o1 exits preview, and the final model seems quite close to the preview version 😐 https://lnkd.in/gnyaTZyB
    3/ Pydantic releases its agent framework PydanticAI, setting out to formalise the inputs, outputs and tools used in agentic workflows 👌 https://lnkd.in/ezZ2z-W9
    4/ Researchers around the world collaborated in training a 10B model in a distributed fashion 😮 This opens the door for frontier models to be trained and released completely in the open, given enough contributors joining forces. https://lnkd.in/gUqnbTEe

  • 🤔 To quantise or not?
    As the race to ever higher compression of LLM parameters continues, it is worth asking: what impact does this compression have on performance, and which models are affected more than others?
    There is no free lunch here: compressing the parameters essentially reduces the ability of the model to learn, in essence its effective parameter count. The effect is more evident in models that have reached close to peak performance for their size, i.e. small models trained on many tokens. Those models effectively use the full representational power they are given, be it 16-bit or whatever precision they were trained in, so quantising them makes them lose important information.
    Importantly, the performance penalty occurs only when the precision used for inference is smaller than the one the model was trained in, e.g. trained in 16-bit and quantised to 4-bit. So to the extent the same precision is used for both training and inference, there is no problem with lower precision. Of course you can expect a higher-precision model to perform better, all else being equal.
    So here are the two takeaways to remember regarding quantisation:
    1️⃣ Do not quantise small models trained on many tokens, i.e. anything state of the art below 20B. Do quantise larger models, anything above 20B (a minimal 4-bit loading sketch follows below).
    2️⃣ If you end up using a small model trained on many tokens in low precision, it's best to find one that has been trained in low precision.
    🔗 Read more in the scaling law for precision paper https://lnkd.in/e7r7GEXR
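    For reference, here is one common way to apply 4-bit quantisation in practice, using Hugging Face transformers with bitsandbytes. The model name is an illustrative assumption; the post does not prescribe a specific stack.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantisation config; per the takeaway above, this suits
# larger models (roughly >20B) better than small, token-saturated ones.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-3.3-70B-Instruct"  # illustrative choice of a >20B model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,  # weights are quantised on load
    device_map="auto",
)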

  • Bites from last week's AI news 🍪
    1/ Mistral announced Pixtral Large, its multimodal offering that seems on par with state-of-the-art models 🔥 https://lnkd.in/g5fDyH8s
    2/ Cerebras clocks 1,000 tokens per second for the largest Llama 😮 For comparison, Groq, another leading AI inference solution, advertises a speed of 736 t/s 🐌 for only the small Llama 🦙 https://lnkd.in/g-RGjf9Q
    3/ Anthropic now offers the ability to infer your writing style from previous writings and adjust Claude to write like you ✍️ https://lnkd.in/eRZFD4Q6

  • 📚 Fast and accurate tool to parse technical PDF documents
    Parsing documents written for humans - such as scientific papers, policy documents and patents - is a well-established use case of AI, aiming to make the information inside those documents structured and usable. Up until now you could use either a specialised model that worked only in some cases, or an LLM that was more general but often failed depending on the document format.
    It seems we may have the best of both worlds with Docling 🦆: a new tool based on a layout- and table-aware architecture, but scaled to a large enough dataset to be both more accurate and fast 🔥 It is also open source and easy to use with a few lines of code (see the sketch below). Definitely worth trying as a component of your RAG system or information extraction pipeline.
    🔗 Read more in the technical report https://lnkd.in/egQZszDi
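    As a taste of the "few lines of code", here is a minimal usage sketch based on Docling's documented DocumentConverter API; the input path is a placeholder.

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
# Convert a technical PDF (path is a placeholder) into a structured document.
result = converter.convert("paper.pdf")

# Export the parsed content, tables included, as Markdown for a RAG pipeline.
print(result.document.export_to_markdown())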

