Tianshu Cheng’s Post

Make GenAI inference affordable @Baseten | ex-MosaicML (Databricks) | ex-Twitter

Super excited to share my first project as a Forward Deployed Engineer at Baseten. 🚀

With the launch of Custom Servers, developers can now deploy a production-ready model server from any Docker image (open source or in-house) onto a highly reliable, auto-scaling GPU cloud with ONE CLICK.

💡 "You bring your Docker image, we make it production-ready from day 1."

Try it today!

Quick demo: see post below 👇
Blog: https://lnkd.in/gggmrsmu
Doc: https://lnkd.in/gXuKqi7F

#ai #genai #genaiapp #Docker #GPU #cloud #LLM #whisper #opensource #startup #aistartup #developers #genaidevelopers

Baseten

5,629 followers

We’re excited to introduce Custom Servers on Baseten!

To run customers’ mission-critical inference workloads, Baseten had to be great at 2 things:
1: Performance optimizations at the model level
2: Massive-scale infrastructure with cross-cloud horizontal scaling
All wrapped in an expressive DevEx.

We've built extensive tooling for performance optimizations, like our optimized Engine Builder in Truss. However, some of our customers come to us with pre-optimized models, mainly wanting to take advantage of the seamless autoscaling Baseten provides.

With the launch of Custom Servers, you can bring your Docker image, untouched, and turn on cross-cloud autoscaling, fast cold starts, and low-latency chaining of models for compound AI systems. This makes it easier than ever to leverage our world-class infra, with blazing-fast inference and effortless autoscaling for any demand, coupled with the most expressive DevEx.

Shoutout to our engineers Tianshu C., Sidharth Shanker, and Bola Malek across our Forward-Deployed and Core Product teams for this feature!

Learn more in the launch blog: https://lnkd.in/e_6_GyAh

Check out the demo by the lead engineer behind Custom Servers, Tianshu C.!

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1w

The emphasis on "production-ready from day 1" suggests a streamlined deployment pipeline leveraging infrastructure as code tools like Terraform or Pulumi. Auto-scaling GPU cloud instances likely utilize container orchestration platforms like Kubernetes, ensuring efficient resource allocation for diverse AI workloads. Given the focus on open-source and in-house Docker images, how would Baseten's platform address potential security vulnerabilities stemming from untrusted custom models during deployment?

Jiaqi Wang

Director of Product @Mobvoi | AIGC creator | ex-TikTok

1w

cool man!
