Red Hat’s hybrid cloud-native AI platform streamlines AI workflows and offers powerful new inference capabilities, building the foundation for agentic AI at scale and empowering IT teams and AI engineers to innovate faster and more efficiently
Red Hat, the world's leading provider of open source solutions, today announced Red Hat AI 3, a significant evolution of its enterprise AI platform. Bringing together the latest innovations of Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform helps simplify the complexities of high-performance AI inference at scale, enabling organizations to more readily move workloads from proofs-of-concept to production and improve collaboration around AI-enabled applications.
As enterprises move beyond AI experimentation, they face significant hurdles, including data privacy, cost control and managing diverse models. “The GenAI Divide: State of AI in Business,” a report from the Massachusetts Institute of Technology NANDA project, highlights the reality of production AI: approximately 95% of organizations are failing to see measurable financial returns on an estimated $40 billion in enterprise spending.
Red Hat AI 3 focuses on directly addressing these challenges by providing a more consistent, unified experience for CIOs and IT leaders to maximize their investments in accelerated computing technologies. It makes it possible to rapidly scale and distribute AI workloads across hybrid, multi-vendor environments while simultaneously improving cross-team collaboration on next-generation AI workloads like agents, all on the same common platform. With a foundation built on open standards, Red Hat AI 3 meets organizations where they are on their AI journey, supporting any model on any hardware accelerator, from datacenters to public clouds, sovereign AI environments and the farthest edge.
From training to "doing": The shift to enterprise AI inference
As organizations move AI initiatives into production, the emphasis shifts from training and tuning models to inference, the “doing” phase of enterprise AI. Red Hat AI 3 emphasizes scalable, cost-effective inference, building on the wildly successful vLLM and llm-d community projects and Red Hat’s model optimization capabilities to deliver production-grade serving of large language models (LLMs).
To help CIOs get the most out of their high-value hardware acceleration, Red Hat OpenShift AI 3.0 introduces the general availability of llm-d, which reimagines how LLMs run natively on Kubernetes. llm-d enables intelligent distributed inference, combining the proven value of Kubernetes orchestration and the performance of vLLM with key open source technologies such as the Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo low-latency data transfer library (NIXL) and the DeepEP Mixture of Experts (MoE) communication library, allowing organizations to:
- Lower costs and improve response times with intelligent inference-aware model scheduling and disaggregated serving.
- Deliver operational simplicity and maximum reliability with prescriptive "Well-lit Paths" that streamline the deployment of models at scale on Kubernetes.
- Maximize flexibility with cross-platform support to deploy LLM inference across different hardware accelerators, including NVIDIA and AMD.
llm-d builds on vLLM, evolving it from a single-node, high-performance inference engine into a distributed, consistent and scalable serving system that is tightly integrated with Kubernetes and designed to enable predictable performance, measurable ROI and effective infrastructure planning. These enhancements directly address the challenges of handling highly variable LLM workloads and serving massive models, such as MoE models.
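To make that building block concrete, the minimal sketch below shows single-node serving with the open source vLLM Python API, the engine that llm-d distributes across a cluster. It is an illustration only: the model name and sampling settings are assumptions, not Red Hat AI 3 defaults.

    # Minimal single-node vLLM sketch (illustrative; the model name and
    # sampling parameters are assumptions, not Red Hat AI 3 defaults).
    from vllm import LLM, SamplingParams

    # Load an open model into a local vLLM engine; llm-d's role is to
    # schedule and scale many such engines across Kubernetes.
    llm = LLM(model="openai/gpt-oss-20b")

    params = SamplingParams(temperature=0.7, max_tokens=128)

    # Generate completions for a batch of prompts.
    outputs = llm.generate(["What is distributed inference?"], params)
    for output in outputs:
        print(output.outputs[0].text)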
A unified platform for collaborative AI
Red Hat AI 3 delivers a unified, flexible experience tailored to the collaborative demands of building production-ready generative AI solutions. It is designed to deliver tangible value by fostering collaboration and unifying workflows across teams through a single platform for both platform engineers and AI engineers to execute on their AI strategy. New capabilities focused on providing the productivity and efficiency needed to scale from proof-of-concept to production include:
- Model as a Service (MaaS) capabilities build on distributed inference and enable IT teams to act as their own MaaS providers, serving common models centrally and delivering on-demand access for both AI developers and AI applications; a consumption sketch follows this list. This allows for better cost management and supports use cases that cannot run on public AI services due to privacy or data concerns.
- AI hub empowers platform engineers to explore, deploy and manage foundational AI assets. It provides a central hub with a curated catalog of models, including validated and optimized gen AI models, a registry to manage the lifecycle of models and a deployment environment to configure and monitor all AI assets running on OpenShift AI.
- Gen AI studio provides a hands-on environment for AI engineers to interact with models and rapidly prototype new gen AI applications. With the AI assets endpoint feature, engineers can easily discover and consume available models and MCP servers, which are designed to streamline how models interact with external tools. The built-in playground provides an interactive, stateless environment to experiment with models, test prompts and tune parameters for use cases like chat and retrieval-augmented generation (RAG).
- New Red Hat validated and optimized models are included to simplify development. The curated selection includes popular open source models like OpenAI’s gpt-oss, DeepSeek-R1, and specialized models such as Whisper for speech-to-text and Voxtral Mini for voice-enabled agents.
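Because vLLM-based serving exposes an OpenAI-compatible API, consuming a centrally hosted MaaS model can look like the minimal Python sketch below. The endpoint URL, credential and model name are hypothetical placeholders, not actual Red Hat AI 3 values.

    # Hypothetical client for an internally hosted, OpenAI-compatible
    # model endpoint; URL, API key and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://models.example.internal/v1",  # assumed MaaS endpoint
        api_key="internal-token",                       # assumed credential
    )

    response = client.chat.completions.create(
        model="gpt-oss-20b",  # placeholder model identifier
        messages=[{"role": "user", "content": "Summarize our support policy."}],
    )
    print(response.choices[0].message.content)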
Building the foundation for next-generation AI agents
AI agents are poised to transform how applications are built, and their complex, autonomous workflows will place heavy demands on inference capabilities. The Red Hat OpenShift AI 3.0 release continues to lay the groundwork for scalable agentic AI systems not only through its inference capabilities but also with new features and enhancements focused on agent management.
To accelerate agent creation and deployment, Red Hat has introduced a Unified API layer based on Llama Stack, which helps align development with industry standards like OpenAI-compatible LLM interface protocols. Additionally, to champion a more open and interoperable ecosystem, Red Hat is an early adopter of the Model Context Protocol (MCP), a powerful, emerging standard that streamlines how AI models interact with external tools—a fundamental feature for modern AI agents.
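For a concrete sense of what an MCP integration involves, the sketch below defines a minimal tool server with the open source MCP Python SDK; the "ticket_status" tool is an invented example, not a Red Hat component.

    # Minimal MCP tool server sketch using the open source MCP Python SDK
    # (the ticket_status tool is an invented example for illustration).
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("support-tools")

    @mcp.tool()
    def ticket_status(ticket_id: str) -> str:
        """Return the status of a support ticket (stubbed for illustration)."""
        return f"Ticket {ticket_id}: open"

    if __name__ == "__main__":
        mcp.run()  # serves the tool over MCP's default stdio transport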
Red Hat AI 3 introduces a new modular and extensible toolkit for model customization, built on existing InstructLab functionality. It provides specialized Python libraries that give developers greater flexibility and control. The toolkit is powered by open source projects like Docling for data processing, which streamlines the ingestion of unstructured documents into an AI-readable format. It also includes a flexible framework for synthetic data generation and a training hub for LLM fine-tuning. The integrated evaluation hub helps AI engineers monitor and validate results, empowering them to confidently leverage their proprietary data for more accurate and relevant AI outcomes.
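As an illustration of that Docling-based data-processing step, the minimal sketch below converts an unstructured PDF into Markdown suitable for downstream tuning or retrieval; the input file path is a placeholder.

    # Minimal Docling sketch: convert an unstructured PDF into Markdown
    # for downstream tuning or RAG pipelines (the file path is a placeholder).
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("reports/q3_review.pdf")  # hypothetical input file
    print(result.document.export_to_markdown())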
Supporting Quotes
Joe Fernandes, vice president and general manager, AI Business Unit, Red Hat
"As enterprises scale AI from experimentation to production, they face a new wave of complexity, cost and control challenges. With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimizes these hurdles. By bringing new capabilities like distributed inference with llm-d and a foundation for agentic AI, we are enabling IT teams to more confidently operationalize next-generation AI, on their own terms, across any infrastructure."
Dan McNamara, senior vice president and general manager, Server and Enterprise AI, AMD
“As Red Hat brings distributed AI inference into production, AMD is proud to provide the high-performance foundation behind it. Together, we’ve integrated the efficiency of AMD EPYC™ processors, the scalability of AMD Instinct™ GPUs, and the openness of the AMD ROCm™ software stack to help enterprises move beyond experimentation and operationalize next-generation AI — turning performance and scalability into real business impact across on-prem, cloud, and edge environments.”
Mariano Greco, chief executive officer, ARSAT
“As a provider of connectivity infrastructure for Argentina, ARSAT handles massive volumes of customer interactions and sensitive data. We needed a solution that would move us beyond simple automation to 'Augmented Intelligence' while delivering absolute data sovereignty for our customers. By building our agentic AI platform on Red Hat OpenShift AI, we went from identifying the need to live production in just 45 days. Red Hat OpenShift AI has not only helped us improve our service and reduce the time engineers spend on support issues, but also freed them up to focus on innovation and new developments."
Rick Villars, group vice president, Worldwide Research, IDC
"2026 will mark an inflection point as enterprises shift from starting their AI pivot to demanding more measurable and repeatable business outcomes from investments. While initial projects focused on training and testing models, the real value - and the real challenge - is to operationalize model-derived insights with efficient, secure and cost-effective inference. This shift requires more modern infrastructure, data, and app deployment environments with ready to use production-grade inference capabilities that can handle real-world scale and complexity, especially as agentic AI supercharges inference loads. Companies that succeed in becoming AI-fueled businesses will be those who establish a unified platform to orchestrate these ever more sophisticated workloads in hybrid cloud environments, not just in silo domains."
Ujval Kapasi, vice president, Engineering AI Frameworks, NVIDIA
“Scalable, high-performance inference is key to the next wave of generative and agentic AI. With built-in support for accelerated inference with open source NVIDIA Dynamo and NIXL technologies, Red Hat AI 3 provides a unified platform that empowers teams to move swiftly from experimentation to running advanced AI workloads and agents at scale.”
Additional Resources
- Learn more about Red Hat AI 3
- Read the blog about Red Hat AI 3
- Watch the webinar on what’s new and what’s next for Red Hat AI
- Learn more about how Red Hat partners are powering AI innovation
Connect with Red Hat
- Learn more about Red Hat
- Get more news in the Red Hat newsroom
- Read the Red Hat blog
- Follow Red Hat on X
- Follow Red Hat on Instagram
- Watch Red Hat videos on YouTube
- Follow Red Hat on LinkedIn
About Red Hat, Inc.
Red Hat is the open hybrid cloud technology leader, delivering a trusted, consistent and comprehensive foundation for transformative IT innovation and AI applications. Its portfolio of cloud, developer, AI, Linux, automation and application platform technologies enables any application, anywhere—from the datacenter to the edge. As the world's leading provider of enterprise open source software solutions, Red Hat invests in open ecosystems and communities to solve tomorrow's IT challenges. Collaborating with partners and customers, Red Hat helps them build, connect, automate, secure and manage their IT environments, supported by consulting services and award-winning training and certification offerings.
Forward-Looking Statements
Except for the historical information and discussions contained herein, statements contained in this press release may constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are based on the company’s current assumptions regarding future business and financial performance. These statements involve a number of risks, uncertainties and other factors that could cause actual results to differ materially. Any forward-looking statement in this press release speaks only as of the date on which it is made. Except as required by law, the company assumes no obligation to update or revise any forward-looking statements.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo and OpenShift are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
Contacts
Media Contact:
Kathryn Lucas
kkaplan@redhat.com
301-938-8726