Today, we're excited to introduce Natural Language Unit Tests with LMUnit: a new approach to LLM evaluation that brings the rigor and accessibility of traditional software engineering unit testing to large language models.

While LLMs have become increasingly accessible, evaluating them effectively remains a significant challenge. With LMUnit, developers write statements and questions in natural language that verify desirable qualities of LLM responses, like “Does the response’s tone match the query’s?” or “Does the response accurately answer the query?”, and then use our evaluation-optimized LMUnit model to score those unit tests and improve their systems accordingly. In fact, LMUnit outperforms frontier models like GPT-4 and Claude on evaluation tasks while providing greater interpretability at lower cost.

Key advantages of using LMUnit:
👱 𝗔𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝘄𝗶𝘁𝗵 𝗵𝘂𝗺𝗮𝗻 𝗽𝗿𝗲𝗳𝗲𝗿𝗲𝗻𝗰𝗲: 93.5% accuracy (a top-10 submission) on the RewardBench benchmark.
💪 𝗥𝗼𝗯𝘂𝘀𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗮𝗰𝗿𝗼𝘀𝘀 𝘁𝗮𝘀𝗸𝘀: State-of-the-art performance on the FLASK and BiGGen Bench benchmarks.
🎯 𝗙𝗶𝗻𝗲-𝗴𝗿𝗮𝗶𝗻𝗲𝗱, 𝗔𝗰𝘁𝗶𝗼𝗻𝗮𝗯𝗹𝗲 𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀: In a user study with 16 LLM researchers, all participants preferred LMUnit over traditional LLM judges and diagnosed over 131% more error modes.
⚙️ 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻-𝗥𝗲𝗮𝗱𝗶𝗻𝗲𝘀𝘀: Seamless integration with existing CI/CD pipelines, using familiar unit testing principles (see the sketch below).
👪 𝗔𝗰𝗰𝗲𝘀𝘀𝗶𝗯𝗶𝗹𝗶𝘁𝘆 𝘁𝗼 𝗻𝗼𝗻-𝘁𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝘀𝘁𝗮𝗸𝗲𝗵𝗼𝗹𝗱𝗲𝗿𝘀: Non-technical team members can specify and understand evaluation criteria.

If you're building production applications with LLMs today or simply exploring new use cases, LMUnit can help accelerate your development process and ensure the LLM behaves exactly the way you expect.

👉 We'd love for you to try it for free today: https://lnkd.in/grntDMHJ
👇 See the comments below for the full launch blog and technical paper!
🎁 Congratulations to our incredible Research and Product teams, who worked tirelessly to deliver this early Christmas present to the AI community! Shikib Mehri, Jon S., William Berrios, Rajan V., Nandita Shankar Naik, Matija Franklin, Bertie Vidgen, Amanpreet Singh, Douwe Kiela, Ishan Sinha, Aditya Bindal, Khaled ElGalaind, Elizabeth Lingg, Lingxi Li, Nikita Bhargava
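For teams wiring this into CI, here is a minimal sketch of what a natural language unit test can look like as an ordinary pytest check. This is illustrative only: the endpoint URL, request and response fields, the my_llm_app placeholder, and the 0.7 passing threshold are assumptions for the example, not the actual LMUnit API; see the launch link above for the real interface.

```python
# Illustrative sketch only: the endpoint URL, payload fields, "score" field, and
# the 0.7 threshold are assumptions for this example, not the actual LMUnit API.
import os

import requests

LMUNIT_ENDPOINT = "https://api.example.com/v1/lmunit"  # hypothetical endpoint
API_KEY = os.environ.get("LMUNIT_API_KEY", "")


def my_llm_app(query: str) -> str:
    """Placeholder for your existing LLM application entry point."""
    return "I'm sorry about the cancellation - let's get you rebooked right away."


def score_unit_test(query: str, response: str, unit_test: str) -> float:
    """Send one natural language unit test to a (hypothetical) scoring endpoint
    and return a numeric score for how well the response satisfies it."""
    r = requests.post(
        LMUNIT_ENDPOINT,
        json={"query": query, "response": response, "unit_test": unit_test},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    return float(r.json()["score"])


def test_tone_matches_query():
    # The unit test is phrased in plain English, just as described in the post.
    query = "My flight was cancelled and I need to rebook today."
    response = my_llm_app(query)
    score = score_unit_test(
        query=query,
        response=response,
        unit_test="Does the response's tone match the urgency of the query?",
    )
    assert score >= 0.7  # assumed passing threshold; tune per test
```

Run it like any other test suite (for example, with pytest in your CI pipeline) so response-quality checks fail the build the same way functional regressions do.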
About us
Enterprise LLMs
- Website: https://contextual.ai
- Industry: Software Development
- Company size: 51-200 employees
- Type: Privately Held
Updates
- We're #hiring a new Data Labeler: Finance in London, England. Apply today or share this post with your network.
- We're #hiring a new Data Labeler: Hardware in London, England. Apply today or share this post with your network.
- We're #hiring a new Senior Copy Editor in London, England. Apply today or share this post with your network.
- We're #hiring a new Member of Technical Staff (Extraction) in Mountain View, California. Apply today or share this post with your network.
- We're #hiring a new Member of Technical Staff (Full Stack) in Mountain View, California. Apply today or share this post with your network.
- We're #hiring a new Member of Technical Staff (Research) in Mountain View, California. Apply today or share this post with your network.
- Contextual AI reposted this: "Excited to share I’ve joined Contextual AI as the new Head of Growth! Looking forward to working with Jay Chen and the Contextual team to help AI teams move beyond the demo and deliver GenAI ROI in 2025 and beyond."
- We're #hiring a new PhD Research Intern (Winter) in Mountain View, California. Apply today or share this post with your network.
- Contextual AI reposted this:
Beyond Demo-Grade AI: GTM Lessons from Contextual AI's Enterprise Journey

We spoke with Douwe Kiela, CEO of Contextual AI, about building enterprise-ready AI solutions. Here are the key go-to-market lessons from their journey:

→ Solve production problems, not demo problems. While competitors chase impressive demos, Contextual AI focused on tackling the real challenges of enterprise deployment: hallucination, attribution, and data privacy. "A lot of companies are building cool demos that kind of show the potential of the technology, but then they have a hard time bridging the gap to a production deployment."

→ Let market pull guide product development. Instead of pushing solutions, they let enterprise problems shape their roadmap. "We're in a very fortunate position where we're basically not doing any outreach and folks are coming to us with their problems."

→ Design for deployment flexibility from day one. They built their infrastructure to support both on-premise and SaaS deployments, addressing crucial enterprise concerns about data privacy and security.

→ Focus on specialized excellence over general capabilities. Rather than chasing AGI, they're building "artificial specialized intelligence, where you take these models and then you make them very good at the one thing that an enterprise really wants to solve."

→ Filter for tech-forward customers early. They prioritize enterprises that "already know exactly, like, these are like the, I don't know, top 10 use cases that we're most interested in" - companies with clear AI strategies and implementation plans.

These insights from Contextual AI demonstrate how focusing on real enterprise needs over AI hype can build sustainable competitive advantages. Listen to the full conversation with Douwe Kiela on Category Visionaries to learn more about their approach to enterprise AI innovation. https://lnkd.in/dud4P2kD

#EnterpriseAI #B2BSaaS #StartupLessons #ProductStrategy #GoToMarket