Serverless GPUs?
As of today, it is. Google is the first cloud provider to offer serverless NVIDIA GPU access.
This unlocks new possibilities for companies running large language models (LLMs): dedicated GPU servers are no longer required to run their AI projects.
This goes a long way toward democratizing GPU access, and it will certainly help remove barriers to entry for early-stage bootstrapped companies.
"The low cold-start latency is impressive, allowing our models to serve predictions almost instantly, which is critical for time-sensitive customer experiences. Additionally, Cloud Run GPUs maintain consistently minimal serving latency under varying loads, ensuring our generative AI applications are always responsive and dependable — all while effortlessly scaling to zero during periods of inactivity" - Thomas Menard, Head of AI Tech, L’Oreal
Google Cloud Run now supports NVIDIA L4 GPUs and integrates with NVIDIA NIM, addressing the main difficulties of deploying AI applications: performance, scalability, and complexity. Read the tech blog ➡️ https://nvda.ws/3WTxT88
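As a rough sketch of what this looks like in practice: attaching an L4 GPU to a Cloud Run service is a deploy-time flag rather than a dedicated server. The service name, project, image path, and region below are placeholders, and the exact flags and release track (beta vs. GA) may differ by the time you read this, so check the current Cloud Run GPU docs before copying.

```shell
# Hypothetical example: deploy a container with one NVIDIA L4 GPU on Cloud Run.
# "llm-inference", the image path, and the region are illustrative placeholders.
gcloud beta run deploy llm-inference \
  --image=us-docker.pkg.dev/my-project/my-repo/llm-server:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --min-instances=0   # scale to zero when idle, as the quote above highlights
```

With `--min-instances=0`, you pay nothing while the service is idle and Cloud Run spins up a GPU-backed instance on the first request, which is exactly the serverless trade-off the post describes.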