Skip to main content

2 posts tagged with "llm-d release news!"

llm-d tag description

View All Tags

llm-d 0.2: Our first well-lit paths (mind the tree roots!)

· 10 min read
Robert Shaw
Director of Engineering, Red Hat
Clayton Coleman
Distinguished Engineer, Google
Carlos Costa
Distinguished Engineer, IBM

Our 0.2 release delivers progress against our three well-lit paths to accelerate deploying large scale inference on Kubernetes - better load balancing, lower latency with disaggregation, and native vLLM support for very large Mixture of Expert models like DeepSeek-R1.

We’ve also enhanced our deployment and benchmarking tooling, incorporating lessons from real-world infrastructure deployments and addressing key antipatterns. This release gives llm-d users, contributors, researchers, and operators, clearer guides for efficient use in tested, reproducible scenarios.

Announcing the llm-d community!

· 11 min read
Robert Shaw
Director of Engineering, Red Hat
Clayton Coleman
Distinguished Engineer, Google
Carlos Costa
Distinguished Engineer, IBM

Announcing the llm-d community​

llm-d is a Kubernetes-native high-performance distributed LLM inference framework - a well-lit path for anyone to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.

With llm-d, users can operationalize gen AI deployments with a modular, high-performance, end-to-end serving solution that leverages the latest distributed inference optimizations like KV-cache aware routing and disaggregated serving, co-designed and integrated with the Kubernetes operational tooling in Inference Gateway (IGW).