Skip to main content

llm-d-infra Quick Start

Getting started with llm-d-infra on Kubernetes.

If you want to deploy llm-d-infra and related tools step by step, see the README-step-by-step.md instructions.

For more information on llm-d, see the llm-d git repository here and website here.

Overview

This guide will walk you through the steps to install and deploy llm-d on a Kubernetes cluster, using an opinionated flow in order to get up and running as quickly as possible.

Client Configuration

Get the code

Clone the llm-d-infra repository.

git clone https://github.com/llm-d-incubation/llm-d-infra.git

Navigate to the quickstart directory

cd llm-d-infra/quickstart

Required tools

Following prerequisite are required for the installer to work.

You can use the installer script that installs all the required dependencies.

./install-deps.sh

Required credentials and configuration

Target Platforms

Since the llm-d-infra is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

In this instruction, the target is vanilla kubernetes and if you look for other platform's instructions, you could find them from following links.

llm-d-infra Installation

Only a single installation of llm-d on a cluster is currently supported. In the future, multiple model services will be supported. Until then, uninstall llm-d before reinstalling.

The llm-d-infra contains all the helm charts necessary to deploy llm-d-infra. To facilitate the installation of the helm charts, the llmd-infra-installer.sh script is provided. This script will populate the necessary manifests in the manifests directory. After this, it will apply all the manifests in order to bring up the cluster.

The llmd-infra-installer.sh script aims to simplify the installation of llm-d using the llm-d-infra as it's main function. It scripts as many of the steps as possible to make the installation process more streamlined. This includes:

  • Installing the GAIE infrastructure
  • Creating the namespace with any special configurations
  • Deploying the network stack (istio/kgateway)
  • Creating the pull secret to download the images
  • Deploying the Gateway

It also supports uninstalling the llm-d infrastructure.

Before proceeding with the installation, ensure you have completed the prerequisites and are able to issue kubectl or oc commands to your cluster by configuring your ~/.kube/config file or by using the oc login command.

Usage

The installer needs to be run from the llm-d-infra/quickstart directory as a cluster admin with CLI access to the cluster.

./llmd-infra-installer.sh [OPTIONS]

Flags

FlagDescriptionExample
-n, --namespace NAMEK8s namespace (default: llm-d)./llmd-infra-installer.sh --namespace foo
-f, --values-file PATHPath to Helm values.yaml file (default: values.yaml)./llmd-infra-installer.sh --values-file /path/to/values.yaml
-u, --uninstallUninstall the llm-d components from the current cluster./llmd-infra-installer.sh --uninstall
-d, --debugAdd debug mode to the helm install./llmd-infra-installer.sh --debug
-m, --disable-metrics-collectionDisable metrics collection (Prometheus will not be installed)./llmd-infra-installer.sh --disable-metrics-collection
-k, --minikubeDeploy on an existing minikube instance with hostPath storage./llmd-infra-installer.sh --minikube
-g, --contextSupply a specific Kubernetes context./llmd-infra-installer.sh --context
-j, --gatewaySelect gateway type (istio, kgateway) (default: istio)./llm-installer.sh --gateway kgateway
-r, --release (Helm) Chart release name./llmd-infra-installer.sh --release llm-d-infra
-h, --helpShow this help and exit./llmd-infra-installer.sh --help

Examples

Install llm-d on an Existing Kubernetes Cluster

export HF_TOKEN="your-token"
./llmd-infra-installer.sh

Install on OpenShift

Before running the installer, ensure you have logged into the cluster as a cluster administrator. For example:

oc login --token=sha256~yourtoken --server=https://api.yourcluster.com:6443
export HF_TOKEN="your-token"
./llmd-infra-installer.sh

Validation

After executing install script, you can find the resources will be created according to installation option.

Installation with istio

  • istio-system
kubectl get pods,svc -n istio-system
NAME                         READY   STATUS    RESTARTS   AGE
pod/istiod-774dfd9b6-wjlm2 1/1 Running 0 3m33s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/istiod ClusterIP [Cluster IP] <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 3m33s
  • llm-d
kubectl get pods,gateway -n llm-d
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/llm-d-infra-inference-gateway-istio-79b75bb5d-blwgs 1/1 Running 0 87s

NAME CLASS ADDRESS PROGRAMMED AGE
gateway.gateway.networking.k8s.io/llm-d-infra-inference-gateway istio llm-d-infra-inference-gateway-istio.llm-d.svc.cluster.local True 87s
  • llm-d-monitoring
kubectl get pods,gateway -n llm-d-monitoring
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 2m51s
pod/prometheus-grafana-7fbfb5f947-h92zc 3/3 Running 0 2m51s
pod/prometheus-kube-prometheus-operator-56c5c488db-clslv 1/1 Running 0 2m51s
pod/prometheus-kube-state-metrics-7f5f75c85d-twvj5 1/1 Running 0 2m51s
pod/prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 2m51s
pod/prometheus-prometheus-node-exporter-94jkw 1/1 Running 0 2m51s
pod/prometheus-prometheus-node-exporter-c8fzc 1/1 Running 0 2m51s
pod/prometheus-prometheus-node-exporter-tks77 1/1 Running 0 2m51s

Installation with kgateway

  • kgateway-system
kubectl get pods -n kgateway-system
NAME                       READY   STATUS    RESTARTS   AGE
kgateway-ddbb7668c-cc9df 1/1 Running 0 25m
  • llm-d
kubectl get pods,gateway -n llm-d
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/llm-d-infra-inference-gateway-69fd4dcfb9-nzs29 1/1 Running 0 22m

NAME CLASS ADDRESS PROGRAMMED AGE
gateway.gateway.networking.k8s.io/llm-d-infra-inference-gateway kgateway [External IP] True 22m
  • llm-d-monitoring
kubectl get pods,gateway -n llm-d-monitoring
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 24m
pod/prometheus-grafana-7fbfb5f947-jdb7l 3/3 Running 0 24m
pod/prometheus-kube-prometheus-operator-56c5c488db-fr9vt 1/1 Running 0 24m
pod/prometheus-kube-state-metrics-7f5f75c85d-2nfwv 1/1 Running 0 24m
pod/prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 24m
pod/prometheus-prometheus-node-exporter-65cbt 1/1 Running 0 24m
pod/prometheus-prometheus-node-exporter-n9n6t 1/1 Running 0 24m
pod/prometheus-prometheus-node-exporter-szjwv 1/1 Running 0 24m

Metrics Collection

llm-d-infra includes built-in support for metrics collection using Prometheus and Grafana. This feature is enabled by default but can be disabled using the --disable-metrics-collection flag during installation. llm-d applies ServiceMonitors for vLLM and inference-gateway services to trigger Prometheus scrape targets. In OpenShift, the built-in user workload monitoring Prometheus stack can be utilized. In Kubernetes, Prometheus and Grafana are installed from the prometheus-community kube-prometheus-stack helm charts.

Accessing the Metrics UIs

If running on OpenShift, skip to OpenShift and Grafana.

Port Forwarding

  • Prometheus (port 9090):
kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-kube-prometheus-prometheus 9090:9090
  • Grafana (port 3000):
kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-grafana 3000:80

Access the User Interfaces at:

  • Prometheus: http://YOUR_IP:9090
  • Grafana: http://YOUR_IP:3000 (default credentials: admin/admin)

Grafana Dashboards

Import the llm-d dashboard from the Grafana UI. Go to Dashboards -> New -> Import. Similarly, import the inference-gateway dashboard from the gateway-api-inference-extension repository. Or, if the Grafana Operator is installed in your environment, you might follow the Grafana setup guide to install the dashboards as GrafanaDashboard custom resources.

Security Note

When running in a cloud environment (like EC2), make sure to:

  1. Configure your security groups to allow inbound traffic on ports 9090 and 3000 (if using port-forwarding)
  2. Use the --address 0.0.0.0 flag with port-forward to allow external access
  3. Consider setting up proper authentication for production environments
  4. If using ingress, ensure proper TLS configuration and authentication
  5. For OpenShift, consider using the built-in OAuth integration for Grafana

Uninstall

This will remove llm-d resources from the cluster. This is useful, especially for test/dev if you want to make a change, simply uninstall and then run the installer again with any changes you make.

./llmd-infra-installer.sh --uninstall
Content Source

This content is automatically synced from quickstart/README.md in the llm-d-incubation/llm-d-infra repository.

📝 To suggest changes, please edit the source file or create an issue.