NIM

Serve optimized model containers with NIM in a Flyte task.

NVIDIA NIM, part of NVIDIA AI Enterprise, provides a streamlined path for developing AI-powered enterprise applications and deploying AI models in production. It includes an out-of-the-box optimization suite, enabling AI model deployment across any cloud, data center, or workstation. Since NIM can be self-hosted, there is greater control over cost, data privacy, and more visibility into behind-the-scenes operations.

With NIM, you can invoke the model’s endpoint as if it is hosted locally, minimizing network overhead.

Installation

To use the NIM plugin, run the following command:

$ pip install flytekitplugins-inference

Example usage

For a usage example, see NIM example usage.

NIM can only be run in a Flyte cluster as it must be deployed as a sidecar service in a Kubernetes pod.