𝘞𝘦'𝘷𝘦 𝘨𝘰𝘵 45 𝘮𝘪𝘭𝘭𝘪𝘰𝘯 𝘳𝘦𝘢𝘴𝘰𝘯𝘴 𝘵𝘰 𝘤𝘦𝘭𝘦𝘣𝘳𝘢𝘵𝘦, 𝘢𝘯𝘥 𝘸𝘦 𝘥𝘪𝘥 𝘪𝘵 𝘪𝘯 𝘴𝘵𝘺𝘭𝘦! 🎉🌴 Ahead of our Series B funding announcement, we whisked our incredible Galileo team away to Cabo for an unforgettable offsite. From beachside brainstorming to sunset team-building, we've returned refreshed, inspired, and more united than ever—we're ready to bring Evaluation Intelligence to AI engineers worldwide! 🌎 A massive thank you to our team - your brilliance and dedication are the real reasons we're celebrating. Here's to the exciting journey ahead! 𝘗𝘚 - 𝘞𝘦'𝘳𝘦 𝘩𝘪𝘳𝘪𝘯𝘨 👀🎯 #TeamGalileo #StartupLife #AIEngineering #WorkHardPlayHard #Hiring
Galileo 🔭
Software Development
San Francisco, California · 9,343 followers
Galileo is the leading Evaluation Intelligence platform that helps teams of all sizes build AI apps they can trust.
About us
Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products supports builders across the new AI development workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by hundreds of AI teams, from startups to Fortune 50 enterprises, including Twilio, Comcast, and HP.
- Website
- https://www.galileo.ai
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2021
Locations
-
Primary
525 Brannan St
San Francisco, California 94107, US
Employees at Galileo 🔭
-
Dharmesh Thakker
General Partner at Battery Ventures - Supporting Cloud, DevOps, AI and Security Entrepreneurs
-
Jason Gan
Product Design @Galileo
-
Brent Chalker
GTM @ Galileo - Build and Evaluate GenAI Apps Faster | Lean Thinker | Passionate Sales Leader | Business Value Creator
-
Will Goldfarb
Finance at Galileo - do you trust your company's AI app?
Updates
-
Galileo 🔭 reposted this
Enjoyed chatting with Gené Teare at Crunchbase about how Citi Ventures invests in leading AI startups: Datavolo (Acq by Snowflake), Galileo 🔭, Glean, Lexion (Acq by Docusign), Norm Ai, poolside and Writer, to accelerate the transformation of Citi, financial services, and large enterprise companies in general. Arvind Purushotham Vibhor Rastogi Blaze O'Byrne Çağla Kaymaz Jelena Zec Olivia Zhang Arti Kuthiala Dana Wohlfarth-Piper, MSPR
-
Galileo 🔭 reposted this
Reduce complexity. Deploy faster. Drive results that matter. Building GenAI applications shouldn't mean navigating an endless maze of technology integration choices. The Elastic AI Ecosystem brings together Elasticsearch vector database integrations with industry-leading AI technology providers to build production-ready GenAI applications. Meet the ecosystem: Alibaba Cloud, Anthropic, Amazon Web Services (AWS), Cohere, Confluent, Dataiku, DataRobot, Galileo 🔭, Google Cloud, Hugging Face, LlamaIndex, LangChain, Mistral AI, Microsoft Azure, NVIDIA, OpenAI, Protect AI, Red Hat, Vectorize, unstructured.io Learn more about how our AI Ecosystem enables your organization to accelerate AI innovation: https://go.es.io/3CmQ8w4 #Elastic #Genai #VectorDB #Ecosystem
Elastic announces the Elastic AI Ecosystem
elastic.co
-
Galileo 🔭 reposted this
Build a REAL finance research agent with LangGraph 🗯️ Imagine a research assistant that doesn't just answer questions but comes up with questions like a skilled analyst. It starts by understanding the problem and then searches for the right information. Once it gets the information, it distills it like a human would. But it doesn't stop there... The magic happens in how it thinks. When you ask "Should I invest in Tesla?", it breaks this down into smaller questions: How's the EV market doing? What's Tesla's market share? What are their latest financials? Just like a pro would do. We can actually see how it makes decisions. Every claim it makes about Tesla needs proof. If it says "Tesla's sales grew 20%", it must show where it found this data. No random guessing allowed. Our top agents hit 84% accuracy while processing research in just 84 seconds, 3x faster than other versions. At just $0.0025 per request, we can keep testing until it's perfect. Every number tells us exactly when our agent is hitting the mark or needs improvement. Think of it like having a super-smart research intern who shows their work. When they're wrong, we can see exactly where they messed up and fix it. When they're right, we know precisely why. Want to build your own research agent? Full code and tutorial in the link 👇
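The two mechanisms the post describes, breaking a question into sub-questions and refusing any claim that cannot show its source, can be sketched in plain Python. This is a minimal sketch, not the tutorial's LangGraph code: `Claim`, `decompose`, `unsupported_claims`, `report_is_grounded`, and `runs_within_budget` are hypothetical names, and the fixed sub-question list stands in for what would be an LLM planning step in a real agent.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Claim:
    text: str
    # URLs or document IDs backing the claim (hypothetical structure).
    sources: list[str] = field(default_factory=list)


def decompose(company: str) -> list[str]:
    """Hypothetical fixed plan; a real agent would ask an LLM to generate these."""
    return [
        f"How is {company}'s core market doing?",
        f"What is {company}'s market share?",
        f"What are {company}'s latest financials?",
    ]


def unsupported_claims(claims: list[Claim]) -> list[Claim]:
    """Claims that cite no source; the agent should re-research or drop these."""
    return [c for c in claims if not c.sources]


def report_is_grounded(claims: list[Claim]) -> bool:
    """Accept a report only if every claim shows where its data came from."""
    return not unsupported_claims(claims)


def runs_within_budget(budget_usd: Decimal,
                       cost_per_request: Decimal = Decimal("0.0025")) -> int:
    """How many evaluation runs a budget buys at the quoted per-request cost."""
    return int(budget_usd // cost_per_request)
```

At the quoted $0.0025 per request, `runs_within_budget(Decimal("10"))` returns 4000, which is why repeated test runs stay cheap. Decimal is used instead of float so the money arithmetic is exact.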
-
Lost in the jargon behind AI Agents? Check out Erin Mikail 🏄🏼♀️ Staples's first blog post for Galileo, covering the landscape of AI Agents and the workflows behind them. Give it a read 👇
Developer Experience Engineer @ Galileo | Stand-up Comedian | Currently geeking out over technical content, entertaining education, open source, and machine learning.
AI Agents this, AI Agents that: we've all heard this buzzword over the last 6 months. But what exactly **is** an AI Agent, and how does it work? In my first blog post for Galileo 🔭, I explore the different types of AI Agents and the workflows these agents follow to succeed. Thank you to Pratik Bhavsar and Conor Bronsdon for the review and to the fabulous crew of engineers and builders working behind the scenes to build the future of AI Evaluations. 💪 #GalileoAI #AI #AIAgents #AIObservability #AIEvaluations #ML
Agents, Assemble: A Field Guide to AI Agents - Galileo AI
galileo.ai
-
AI agents aren't simply replacing humans; they're becoming collaborative partners, augmenting human capabilities in measured, practical ways. In the latest episode of 𝘊𝘩𝘢𝘪𝘯 𝘰𝘧 𝘛𝘩𝘰𝘶𝘨𝘩𝘵, Vinnie Giarrusso from Twilio shares a grounded perspective on enterprise AI adoption with Conor Bronsdon:
• While some chase fully autonomous agents, enterprises are taking a more measured approach
• AI agents work best alongside humans, not in place of them
• The focus is on augmenting human capabilities rather than replacing them
• Real enterprise adoption requires careful implementation, not just cutting-edge features
🎙️ "When we think about where the enterprise is sitting, the enterprise is going to be a little bit more careful in our approaches; we're going to be a little bit more measured and conservative in what we put out there and what we expect our agents to do."
This reflects what we're seeing across the industry: successful enterprise AI deployment isn't about chasing the latest features but about building sustainable, trustworthy systems that enhance human capabilities. You need the right observability and evaluation framework for your AI applications to achieve success.
🎬 Watch the full discussion below
🎧 Full episode in comments 👇
#EnterpriseAI #AIStrategy #DigitalTransformation #AIAdoption #TrustInAI
-
Galileo 🔭 reposted this
We need a new mental model for AI agents: they're async junior digital employees. Managing distributed teams is already complex. Now we're rapidly adding AI agents to that mix, and they function as junior employees: each handles different workflows, requires oversight, and needs evaluation. The organizational challenge grows exponentially.
These digital employees:
• Handle specific tasks independently
• Need regular check-in points and feedback
• Require proper management oversight
• Must fit within larger organizational structures
• Need training and evaluation to improve
Often, the person managing these agents is still learning this new paradigm of digital employees and is not an experienced people manager; usually, they report to one. This isn't just a technical challenge: if you're managing 10 AI agents, each running different workflows, that's significant operational overhead. The companies that solve this organizational design challenge first will have a massive competitive advantage.
𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 𝗸𝗲𝗲𝗽𝗶𝗻𝗴 𝗺𝗲 𝘂𝗽 𝗮𝘁 𝗻𝗶𝗴𝗵𝘁:
• Could more autonomous "senior" agents help oversee junior ones? Can we truly move away from human-in-the-loop systems anytime soon?
• How do we build scalable AI management structures?
• What new training will our human employees need to manage hordes of junior digital employees?
• What's the right framework for measuring AI performance?
• How does this reshape traditional org design?
I explored this paradigm shift with Vinnie Giarrusso from Twilio in the Season 1 finale of Galileo 🔭's 𝘊𝘩𝘢𝘪𝘯 𝘰𝘧 𝘛𝘩𝘰𝘶𝘨𝘩𝘵 Podcast. How are you thinking about managing AI systems at scale?
🎬 Watch our discussion below
🎧 Full episode in comments 👇
#AI #FutureOfWork #AIStrategy #EnterpriseAI #Leadership
-
Galileo 🔭 reposted this
We spend a lot of time thinking about this: when and how should you use LLM-as-a-Judge for evaluation? Here are seven criteria to help you make the right decision for your RAG or agent evaluation.
1️⃣ Nature of the Task
Is your evaluation focused on subjective elements like writing style or creativity? Does it require understanding complex context? If traditional metrics like BLEU or ROUGE seem insufficient, LLM-as-a-judge might be your answer.
2️⃣ Scale Considerations
Use it when manual evaluation becomes impractical due to volume, or when you need consistent criteria across thousands of samples.
3️⃣ Cost vs. Benefit
Consider the tradeoffs between API costs and human evaluation. Factor in development iteration speed and the possibility of using smaller models for initial screening.
4️⃣ Evaluation Complexity
LLM judges shine when you need to assess multiple aspects simultaneously, from coherence and relevance to accuracy and context awareness.
5️⃣ Ideal Applications
• Content generation quality assessment
• Conversational AI response evaluation
• Document summarization accuracy
• Style and tone consistency
• Context-aware fact-checking
• Creativity measurement
6️⃣ When to Look Elsewhere
Avoid using LLM judges for:
• Tasks with clear ground truth
• Simple binary decisions
• High-stakes scenarios
• Legal/compliance verification
• Cases requiring perfect accuracy
7️⃣ The Hybrid Approach
Consider combining methods:
• LLM screening + human review
• Multiple LLM judges for reliability
• Traditional metrics + LLM evaluation
💡 Pro Tip: Start small, experiment with different approaches, and scale gradually. Build your own metrics when off-the-shelf ones fail.
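The hybrid approach (multiple LLM judges for reliability, plus human review) can be sketched as a small routing function. This is an illustrative sketch, not Galileo's implementation: `hybrid_evaluate` and the `Judge` type are hypothetical, each judge is assumed to return a score in [0, 1], and the stub judges below stand in for real model calls. The thresholds are arbitrary defaults.

```python
from statistics import mean
from typing import Callable

# A judge takes (question, answer) and returns a score in [0, 1].
# In practice each judge would wrap a prompt to a different LLM;
# here they are hypothetical stubs.
Judge = Callable[[str, str], float]


def hybrid_evaluate(question: str, answer: str, judges: list[Judge],
                    pass_threshold: float = 0.7,
                    max_disagreement: float = 0.2) -> dict:
    """Score with several LLM judges; flag for human review when they disagree."""
    if not judges:
        raise ValueError("need at least one judge")
    scores = [judge(question, answer) for judge in judges]
    spread = max(scores) - min(scores)  # simple disagreement signal
    avg = mean(scores)
    return {
        "scores": scores,
        "mean": avg,
        "passed": avg >= pass_threshold,
        "needs_human_review": spread > max_disagreement,
    }


if __name__ == "__main__":
    agreeing = [lambda q, a: 0.9, lambda q, a: 0.8, lambda q, a: 0.85]
    split = [lambda q, a: 0.9, lambda q, a: 0.3]
    print(hybrid_evaluate("Q", "A", agreeing))
    print(hybrid_evaluate("Q", "A", split))
```

Using the max-min spread as the disagreement signal is only one simple choice; variance across judges or per-criterion scores would work the same way. The point is that the LLM judges do the bulk screening and humans only see the contested samples.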
-
Galileo 🔭 reposted this
Multimodal Playground
- accepts text, images, videos, speech, etc. as input
- multiple multimodal LLMs to choose from
- complete control rests with the end user (zero code!)
- workflows naturally connect with humans
- users can deploy their own apps in a few clicks!?
Courtesy: Galileo 🔭 (Beyond Text: Multimodal AI Evaluations) #designthinking #ux
-
Galileo 🔭 reposted this
From five 9's uptime (0.99999) to just one (0.9)? Indeed, frontier AI models with RAG go off the rails more than 1% of the time, meaning that if ChatGPT had to put a number in an SLA, it would have just ONE nine: a 90% (0.9) uptime SLA!!! For those of us coming from the voice/data/networking world, this is, like, unbelievable. But truly, it's a testament to how compelling AI has become that business leaders even entertain (and attempt to guardrail) such risks. Thankfully, companies like Galileo 🔭 (no affiliation) are helping bring useful metrics to the table with "context adherence," which measures factual accuracy and reasoning ability for a given context. This is the closest thing we have to an 'uptime' stat for AI, so... Treat yourself to the full report here https://lnkd.in/gcWUQbqA and see how OpenAI, Google Gemini and Llama stack up!
LLM Hallucination Index RAG Special - Galileo
galileo.ai
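To make the "nines" arithmetic concrete, here is a small Python sketch (the helper names are hypothetical) that counts the nines in an availability figure and converts it into allowed downtime per year. Five nines allow about 5.3 minutes of downtime a year; one nine allows about 36.5 days. The 365-day year is an assumption that ignores leap years.

```python
import math
from decimal import Decimal

# Assumes a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = Decimal(365 * 24 * 60)


def count_nines(availability: str) -> int:
    """Number of leading nines, e.g. "0.99999" -> 5, "0.9" -> 1.

    Decimal keeps values like 0.99999 exact, so log10 of the failure
    rate (a power of ten here) is a whole number; with floats it would
    come out as 4.999... and floor to the wrong answer.
    """
    a = Decimal(availability)
    if not (0 < a < 1):
        raise ValueError("availability must be strictly between 0 and 1")
    failure_rate = 1 - a
    return math.floor(-failure_rate.log10())


def downtime_minutes_per_year(availability: str) -> Decimal:
    """Minutes per year the service may be 'off the rails' at this availability."""
    return (1 - Decimal(availability)) * MINUTES_PER_YEAR
```

For example, `count_nines("0.99999")` returns 5 and `count_nines("0.9")` returns 1, while a model that fails 1.5% of the time (`"0.985"`) still only earns one nine.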