NVIDIA NIM for Developers
NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations. Upon deployment with a single command, NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows. Built on pre-optimized inference engines from NVIDIA and the community, including NVIDIA® TensorRT™ and TensorRT-LLM, NIM microservices automatically optimize response latency and throughput for each combination of foundation model and GPU system detected at runtime. NIM containers also provide standard observability data feeds and built-in support for autoscaling on Kubernetes on GPUs.
How It Works
NVIDIA NIM helps overcome the challenges of building AI applications, providing developers with industry-standard APIs for building powerful copilots, chatbots, and AI assistants while making it easy for IT and DevOps teams to self-host AI models in their own managed environments. Built on robust foundations, including inference engines like TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing at scale.
Introductory Blog
Learn about NIM’s architecture, key features, and components.
Documentation
Access guides, API reference information, and release notes.
Introductory Video
Learn how to deploy NIM on your infrastructure using a single command.
Deployment Guide
Get step-by-step instructions for self-hosting NIM on any NVIDIA accelerated infrastructure.
Build With NVIDIA NIM
Get Superior Model Performance
Improve AI application performance and efficiency with accelerated engines from NVIDIA and the community, including TensorRT, TensorRT-LLM, and more—prebuilt and optimized for low-latency, high-throughput inferencing on specific NVIDIA GPU systems.
Run AI Models Anywhere
Maintain security and control of applications and data with prebuilt microservices that can be deployed on NVIDIA GPUs anywhere—workstation, data center, or cloud. Download NIM inference microservices for self-hosted deployment, or take advantage of dedicated endpoints on Hugging Face to spin up instances in your preferred cloud.
Customize AI Models for Your Use Case
Improve accuracy for specific use cases by deploying NIM inference microservices for models fine-tuned with your own data.
Maximize Operationalization and Scale
Get detailed observability metrics for dashboarding, and access Helm charts and guides for scaling NIM on Kubernetes.
NVIDIA NIM Examples and Blueprints
Build RAG Applications With Standard APIs
Get started prototyping your AI application with NIM hosted in the NVIDIA API catalog. Using generative AI examples from GitHub, see how to easily deploy a retrieval-augmented generation (RAG) pipeline for chat Q&A using hosted endpoints. Developers can get 1,000 inference credits free on any of the available models to begin developing their application.
Jump-Start Development With NIM Blueprints
NVIDIA NIM Agent Blueprints are reference workflows for canonical generative AI use cases. Enterprises can build and operationalize custom AI applications — creating data-driven AI flywheels — using NIM Agent Blueprints along with NIM microservices and NeMo framework, all part of the NVIDIA AI Enterprise Platform. NIM Agent Blueprints also include partner microservices, one or more AI agents, reference code, customization documentation and a Helm chart for deployment.
Explore NVIDIA NIM Agent BlueprintsDeploy NIM on Cloud via Hugging Face
Simplify and accelerate the deployment of generative AI models on Hugging Face with NIM. With just a few clicks, deploy optimized models like Llama 3 on preferred cloud platforms.
Get Started With NVIDIA NIM
Explore different options for building and deploying optimized AI applications using the latest models with NVIDIA NIM.
Develop
Get free access to NIM for research, development, and testing through the NVIDIA Developer Program. Questions? Check out the FAQ.
Deploy
Move from pilot to production with the assurance of security, API stability, and support with NVIDIA AI Enterprise.
NVIDIA NIM Learning Library
Getting Started Blog
Learn how to use NIM microservices APIs across the most popular generative AI application frameworks like Haystack, LangChain, and LlamaIndex.
Benchmarking Guide
Learn how to benchmark deployment of LLMs , popular metrics and parameters, as well as a step-by-step guide.
Documentation
Learn more about high-performance features, applications, architecture, release notes, and more for NVIDIA NIM for LLMs.
More Resources
Ethical AI
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.
Learn about the latest NVIDIA NIM models, applications, and tools.