Lately I've seen some criticism of LLM-as-a-judge evals. When I talk to devs struggling with this technique, 80% are asking the judge LLM the same question as the original prompt.
Avoid being part of that 80% by giving your LLM-as-a-judge an "unfair advantage": some additional context or capability that makes the eval task easier than the original generation task.
https://lnkd.in/ezv2Y6Yp
With the rise of LLMs, the Webflow team set an ambitious goal: let users modify their websites using natural language. To set up evals, they chose Gentrace.
With Gentrace, the Webflow team:
• Evaluates multimodal outputs (like website screenshots) using human and LLM-as-a-judge evals
• Tests at scale, running thousands of evals per day
• Allows over 25 stakeholders, including PMs and leadership, to contribute to evals
Congrats to Bryant Chou and the Webflow team for the successful launch of their AI Assistant in October! We can't wait to see how Webflow continues to reinvent web development with AI.
With the generative AI market expected to grow to $38.7 billion by 2030, enterprise teams are looking for tools to democratize LLM testing. We believe the future of LLM testing is UI-first and connected to your app, so PMs, engineering, and domain experts can work together in the same tool.
Learn more about our vision in Unite.AI's story covering our Series A: https://lnkd.in/eUT_ZTAC
What if your LLM testing system could automatically optimize your prompts? That's where we're headed with Experiments, our new feature helping developers speed up last-mile tuning. Here's how it works:
Unlike prompt playgrounds, Experiments provides a testing environment connected to your real application. From there, anything in your app (prompts, top-K, parameters) becomes a knob you can control from Gentrace.
To get started (a rough sketch follows the steps below):
1. Register your environment
2. Define your test interactions
3. Expose parameters to Gentrace
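To make the "knobs" idea concrete, here's a minimal, hypothetical sketch in Python. The class and function names are stand-ins for illustration, not the Gentrace SDK; see the docs guide below for the real setup.

```python
# Hypothetical sketch of the "knobs" pattern behind Experiments (not Gentrace's
# actual SDK API). The app exposes its tunable values as named parameters with
# defaults, and a connected testing environment overrides them per run.
from dataclasses import dataclass

@dataclass
class ExposedParams:
    # Illustrative knobs a testing environment could control.
    prompt_template: str = "Summarize the following support ticket:\n{ticket}"
    top_k: int = 5
    temperature: float = 0.2

def run_interaction(ticket: str, params: ExposedParams) -> str:
    """One test interaction: an end-to-end call through the real app code."""
    prompt = params.prompt_template.format(ticket=ticket)
    # ... retrieve params.top_k documents, call the model at params.temperature ...
    return f"(model output for prompt starting: {prompt[:40]}...)"

# A registered environment would invoke the interaction with overrides chosen in the UI:
print(run_interaction("Customer cannot reset their password.", ExposedParams(temperature=0.0)))
```

The design point is that the parameters live in your app with sensible defaults, so production behavior is unchanged unless a test run overrides them.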
Experiments is currently in beta, with prompt auto-optimization on the roadmap in 2025. Get started with our docs guide:
https://lnkd.in/gNzkdC8H
Big news today: Gentrace raised an $8M Series A led by Kojo Osei at Matrix.
We’re celebrating by launching Experiments, the first collaborative testing environment for LLM product development.
This year, we’ve helped customers like Quizlet, Webflow, and Multiverse ship incredible LLM products.
The one thing slowing everyone down? Evals.
You can’t get to production without them, but they’re too hard to build and maintain.
We’re changing that. Our customers grew their testing volume 40x after adopting Gentrace, thanks to an approach that connects a collaborative testing environment to your actual application code.
Now PMs and engineers can work together to build evals that actually work. It’s a massive improvement to testing that ultimately makes generative AI products more reliable for everyone.
This journey wouldn’t have been possible without our awesome angel investors: Yuhki Yamashita, Garrett Lord, Bryant Chou, Tuomas Artman, Martin Mao, David Cramer, Ben Sigelman, Steve Bartel, Cai GoGwilt, Linda Tong, Cristina Cordova, and many more.
Thank you to our customers and team for believing in what we’re building! Every day, we’re inspired to make generative AI apps better because of your support.
Gentrace is expanding our team.
We're looking for senior software engineers in NYC and SF who want to build tools to help the most advanced companies in the world make their generative AI systems reliable and predictable.
Learn more at
https://gentrace.ai/eng
Most engineers approach LLM-as-a-judge all wrong. The usual high-level metrics like hallucination or safety rarely tell you if your app actually works as intended.
Even with human evaluators, general metrics won’t help them judge your app’s performance in a useful way. Good evals are built on something specific to your app—a unique “unfair advantage” that gives the LLM clear criteria.
LLM-as-a-judge is a widely used approach where one LLM grades another LLM’s output. But without an unfair advantage, it can fall short on quality. Common LLM-as-a-judge problems:
• Circular reasoning: How can an LLM grade what it itself generated?
• Poor initial results lead teams back to vibes and manual grading.
Imagine an LLM app is tasked with writing emails. A bad eval would ask the LLM to rate itself on how well it followed the prompt—a circular question that adds little signal and won’t help improve the model.
Instead, give your LLM unfair advantages. Here’s how:
Add tailored asserts: specific criteria your model's output should meet. For instance, in our email example (a sketch of such a judge follows this list):
• No footer
• Directly asks a question
• Includes recipient’s email
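As one illustration (not Gentrace-specific), a tailored-assert judge can grade each criterion separately and return a structured verdict. This sketch uses the OpenAI Python SDK; the model name, assert wording, and JSON shape are assumptions for the example.

```python
# Minimal LLM-as-a-judge sketch with tailored asserts (OpenAI Python SDK).
# The model name, assert wording, and JSON schema here are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

ASSERTS = [
    "The email contains no footer or signature block.",
    "The email directly asks the recipient a question.",
    "The email includes the recipient's email address.",
]

def judge_email(draft: str) -> list:
    """Grade the draft against each assert; return a list of pass/fail verdicts."""
    criteria = "\n".join(f"- {a}" for a in ASSERTS)
    prompt = (
        "You are grading an email against specific criteria.\n"
        "Return JSON of the form {\"results\": [{\"criterion\": str, \"pass\": bool, \"reason\": str}]}.\n\n"
        f"Criteria:\n{criteria}\n\nEmail:\n{draft}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)["results"]

print(judge_email("Hi Sam, could you confirm [email protected] is still the best address?"))
```

Because each assert is pass/fail, you can track them individually instead of relying on one opaque overall score.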
Another approach is comparison to a reference output (also sketched below). You provide a high-quality reference email, so the LLM can check for:
• Missing or extra information
• Conciseness vs. verbosity
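Here's a sketch of the reference-comparison variant under the same assumptions (OpenAI Python SDK, illustrative model and prompt):

```python
# Sketch of a reference-comparison judge (same assumptions: OpenAI SDK, illustrative model/prompt).
from openai import OpenAI

client = OpenAI()

def judge_against_reference(candidate: str, reference: str) -> str:
    """Ask the judge to diff the candidate against a known-good reference email."""
    prompt = (
        "Compare the CANDIDATE email to the REFERENCE email.\n"
        "1. List information that is missing from the candidate.\n"
        "2. List information the candidate adds that the reference does not have.\n"
        "3. Say whether the candidate is more verbose than the reference.\n\n"
        f"REFERENCE:\n{reference}\n\nCANDIDATE:\n{candidate}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(judge_against_reference(
    "Hi Sam, quick check-in on the numbers.",
    "Hi Sam, could you send the Q3 revenue numbers by Friday?",
))
```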
To build reliable AI evaluations, always ask: how can I create an unfair advantage for the LLM? These targeted methods ensure your LLM-as-a-judge evals give you actionable insights into the quality of your app.
At Gentrace, we’re making it easy for AI engineers to build their own high-quality evals. My unfair advantage blog post shares more examples for using LLM-as-a-judge effectively: https://lnkd.in/gavtDhCY
New release, featuring production evaluator graphs and local evaluations / local test data.
Production evaluator graphs
Production evaluators now automatically create graphs to show how performance is trending over time.
For example, you can create a "Safety" evaluator which uses LLM-as-a-judge to score whether an output is compliant with your AI safety policy.
Then, you can see how the average output "Safety" trends over time.
Local evaluations / local test data
Gentrace now makes it easier to define local evaluations and to use completely local data and datasets.
This makes Gentrace work better with existing unit testing frameworks and patterns. It also makes Gentrace incrementally adoptable into homegrown testing stacks.
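As a rough illustration of that pattern (a stand-in, not the Gentrace SDK), a local evaluation over local data can look like an ordinary pytest test:

```python
# Hypothetical local-evaluation sketch written as a plain pytest test.
# The evaluator, pipeline, and dataset are stand-ins, not Gentrace SDK calls.
import pytest

def exact_match_evaluator(output: str, expected: str) -> float:
    """A local evaluator: 1.0 on an exact (whitespace-insensitive) match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def my_llm_pipeline(question: str) -> str:
    # Placeholder for the real application call under test.
    return {"What is 2 + 2?": "4", "Capital of France?": "Paris"}.get(question, "")

# Completely local dataset; could also be loaded from a local JSON or CSV file.
LOCAL_CASES = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

@pytest.mark.parametrize("case", LOCAL_CASES)
def test_local_eval(case):
    score = exact_match_evaluator(my_llm_pipeline(case["input"]), case["expected"])
    assert score == 1.0
```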
More:
• User-specific view settings can be saved and overridden from a URL
• Filter test runs by their input values
• Added explicit compare button
• Pinecone v3 (Node) support
• o1 support
• Fixed 68 bugs
Quizlet uses generative AI to turn unstructured text into flashcards, syllabi, and other learning tools for students. With Gentrace, they increased testing 40x and reduced test duration to 1 minute.
Learn more: https://lnkd.in/gF_Zi7Cd
Faire built an AI agent that reviews PRs.
They use an AI evaluator in Gentrace to review the reviewer.
Learn more about how Faire systematically develops their AI features in their blog post: https://lnkd.in/gf_arpB9