The Evolution of Cloud Native: Accelerating the Development of AI Applications

Yang Qiudi

Oct 1, 2025

Hello everyone, I am Yang Qiudi, a senior product expert at Alibaba Cloud Intelligence Group. Today at the Yunqi Conference, I am honored to share some of our practices and reflections from the past year of helping enterprises build AI applications.

01 Intelligent Applications Have Become an Important Component of Application Architecture

Whether we are researchers, enterprises adopting AI, or vendors supplying AI technologies and products, I believe everyone working in the AI field shares the same conviction: the growth of AI applications is unstoppable and is reshaping the software industry. Consider some data:

  • Explosive Growth in Model Calls: Spending on GenAI is growing especially fast, expected to rise from $16 billion in 2023 to $143 billion by 2027, a compound annual growth rate (CAGR) of 73.3%.

  • A High Proportion of New Applications Will Be Intelligent: By 2027, artificial intelligence is expected to be integrated widely and deeply across six key areas, with the adoption rate of new-generation intelligent terminals, intelligent agents, and similar applications exceeding 70%.

  • Agentic AI Gradually Enters Enterprises' Core Systems: By 2028, 33% of enterprise software will incorporate agentic AI, up from less than 1% in 2024.

Intelligent applications have clearly become an important part of customers' application architectures. This evolution shows that application development and infrastructure upgrades drive each other.

  • Applications Drive Infrastructure Upgrades: Intelligent applications introduce a completely new load profile: real-time inference computing, state retention across long sessions, multimodal interaction, and an uncompromising pursuit of security, stability, and cost efficiency. These requirements far exceed the assumptions of traditional cloud computing, forcing infrastructure to evolve from resource pooling to intelligence: compute must scale elastically at millisecond granularity, the runtime must provide security isolation and session affinity, the communication layer must support asynchronous, high-throughput traffic, and even message queues, storage, and gateways must become aware of AI workloads. Application demands directly shape the form of the infrastructure.

  • Infrastructure Empowers Application Evolution: When AI infrastructure provides capabilities such as serverless GPU, an AI gateway, intelligent messaging middleware, and full-stack observability, the development and deployment of intelligent applications accelerates greatly. Developers no longer need to worry about low-level issues such as inference latency, compute scheduling, and security isolation, and can focus on business logic and scenario innovation. By abstracting away these concerns, infrastructure significantly lowers the barrier to application innovation, driving the large-scale adoption of intelligent applications.

  • Applications and Infrastructure Resonate Iteratively: As enterprises introduce more intelligent applications, operational scale and complexity keep rising, and infrastructure faces new challenges, such as governing calls across multiple models, agents, and regions. This feedback pushes infrastructure to keep evolving and, in the next iteration, to feed back into applications. The result is a spiral 'application-infrastructure-application' development loop, similar to the interplay between microservices and containers in the cloud-native era, but with far greater scale, complexity, and uncertainty in the AI era.

However, this evolution is not without challenges. From serving customers on the cloud, we found that the obstacles to landing an AI application architecture center on three aspects:

  • Agent development demands a completely new skill stack: how to develop and launch quickly, focus on the business, and validate rapidly.

  • How to quickly integrate with existing systems, empower core systems, and utilize existing assets.

  • How to ensure that newly constructed AI applications can run stably and securely, avoiding uncontrollable risks in large-scale use.

Solving these issues requires upgrading cloud infrastructure from traditional forms to an AI-native architecture. Key elements of AI infrastructure include: a function computing runtime with millisecond-level elasticity, an AI gateway that unifies traffic governance and protocol adaptation, messaging middleware that supports asynchronous high-throughput communication, and a full-stack observability system covering model invocation, agent orchestration, and system interaction. Only with the support of this new type of infrastructure can intelligent applications truly become the 'new infrastructure' of enterprise application architecture, driving continuous intelligent upgrades of the business.

Therefore, we have distilled the AI-native application architecture shown in the above figure, linking eight key components such as AI runtime, AI gateway, AI MQ, AI memory, and AI observability to form a complete AI-native technology stack, which we call AgentRun. Enterprises do not need to start from scratch; they can significantly shorten the time from Proof of Concept to production launch based on AgentRun.

Next, we will analyze how the eight core components of AgentRun provide support for AI-native architecture, focusing on the three challenges mentioned earlier.

02 Accelerating Intelligent Agent Development with Cloud Native

With the overall architectural blueprint in place, we first need to solve the most fundamental question: as a 'new member' of enterprise IT systems, what kind of base should intelligent agents run on? This leads us to the core requirements for the runtime.

Function Computing

We have found that Agent applications have several typical characteristics: unpredictable traffic, multi-tenant data isolation, and vulnerability to injection attacks. These characteristics require the runtime to have three core capabilities: millisecond-level elasticity, session affinity management, and security isolation.

Traditional monolithic and microservice development is bound to services: developers strive to build functionally cohesive monoliths or microservices, which often leads to deep coupling and complex code logic. AI agents completely upend this model. Their core is no longer building rigid services but dynamically and intelligently orchestrating a set of atomic tools or agents based on user intent understood by large language models (LLMs). This new development model aligns naturally with the design philosophy of function computing (FaaS): developers can package each atomic capability of an agent as an independent function in the lightest, most native way. Any agent or tool a developer envisions maps directly to a ready-to-use, lightweight, flexible, securely isolated, and highly elastic function. This not only provides a better development experience and lower costs but, more importantly, greatly improves agents' production readiness and time to market, making large-scale adoption of AI innovation possible.

To embrace the needs of AI agents and put the Function-to-AI concept into practice, Function Computing breaks the traditional stateless boundary of FaaS. With native serverless session affinity, it dynamically allocates a dedicated persistent function instance to each user session, lasting up to 8 hours or longer, solving the context-retention problem of multi-turn agent dialogues. Lightweight management of hundreds of thousands of functions and millions of sessions, combined with request-aware scheduling, supports elastic scaling from zero to millions of QPS, matching the sparse or bursty traffic patterns common to AI agent applications and keeping the service stable.
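Because session affinity pins each session to a single persistent instance, per-session state can live in the instance itself. Below is a minimal sketch of what such a handler could look like; the event shape, the `session_id` field, and the module-level store are illustrative assumptions, not the actual Function Computing API.

```python
# Minimal sketch of a session-affine function handler (hypothetical
# event shape); because the platform pins each session to one warm
# instance, module-level state survives across that session's requests.
import json

SESSIONS: dict = {}  # session_id -> conversation history

def handler(event, context):
    req = json.loads(event)
    history = SESSIONS.setdefault(req["session_id"], [])
    history.append({"role": "user", "content": req["message"]})
    # ... call the model with `history` and append its real reply ...
    reply = {"role": "assistant", "content": f"echo: {req['message']}"}
    history.append(reply)
    return json.dumps({"reply": reply, "turns": len(history)})
```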

In terms of tool runtime, Function Computing ships built-in execution engines for multiple languages, including Python, Node.js, Shell, and Java, with code execution latency under 100 ms; built-in cloud sandboxes such as Code Sandbox, Browser Sandbox, Computer Sandbox, and RL Sandbox are ready to use. In terms of security isolation, Function Computing uses secure container technology to isolate at the request, session, and function levels, offering virtual machine-grade isolation for each task. Combined with session-level dynamic mounting, it isolates the compute layer from the storage layer, covering the most stringent security and data-security demands of sandboxed code execution across all scenarios.

In terms of model runtime, Function Computing focuses on vertical models and small-parameter large language models, providing serverless GPU based on memory-snapshot technology that switches between active and idle states in milliseconds, significantly reducing the cost of adopting AI. Request-aware scheduling better addresses GPU idling and contention, keeping request latency stable. By decoupling and freely combining GPU and CPU compute, and by virtualizing a single card into fractional slices, it gives customers finer-grained model resource configurations, making model hosting more economical and efficient.

As the most representative serverless product, Function Computing already serves many important customers such as Bailian, Modao, and Tongyi Qianwen, and has become an ideal choice for enterprises building AI applications.

RocketMQ for AI

With an efficient runtime in place, as agents scale up and interaction patterns diversify, asynchronous communication becomes necessary to sustain system throughput and stability. To this end, we have released RocketMQ for AI. Its core innovation is LiteTopic: a lightweight topic is created dynamically for each session to persistently store the session context, intermediate results, and more. LiteTopic lets agents resume from breakpoints and improves multi-agent communication throughput by more than ten times.

This architecture relies on four RocketMQ core capabilities deeply optimized for AI scenarios (a usage sketch follows the list):

  • Support for Millions of Lite-Topics: A single cluster can manage millions of lightweight topics, assigning an independent topic to each session and achieving session isolation under high concurrency without performance loss.

  • Fully Automated Lightweight Management: Lite-Topics are created dynamically on demand and automatically recycled after disconnection, completely eliminating resource leaks, with zero operational intervention.

  • Ability to Transmit Large Message Bodies: Supports messages of dozens of MB or even larger, easily carrying typical AIGC data loads such as long prompts, images, and documents.

  • Strictly Ordered Messages: Guarantees message order within a single queue, keeping the token order of an LLM's streaming output intact and supporting a coherent, smooth interaction experience.
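To make the pattern concrete, here is a schematic of per-session LiteTopic usage; `LiteTopicClient` and all of its methods are hypothetical stand-ins for illustration, not the actual RocketMQ SDK.

```python
# Schematic of the per-session LiteTopic pattern; LiteTopicClient is a
# hypothetical stand-in, not the real RocketMQ client API.
import uuid

class LiteTopicClient:
    """In-memory stand-in for a broker with lightweight-topic support."""
    def __init__(self):
        self.topics = {}

    def create_lite_topic(self, name):
        # On the real broker this is a cheap dynamic operation that is
        # auto-recycled when the session disconnects.
        self.topics[name] = []

    def send_ordered(self, topic, payload):
        # Single-queue ordering keeps streamed LLM tokens in sequence.
        self.topics[topic].append(payload)

    def replay(self, topic, from_offset=0):
        # Persistence is what enables resume-from-breakpoint.
        return self.topics[topic][from_offset:]

client = LiteTopicClient()
session_topic = f"session-{uuid.uuid4().hex[:8]}"  # one topic per session
client.create_lite_topic(session_topic)
for token in ["The", " answer", " is", " 42."]:
    client.send_ordered(session_topic, token)
# After a crash, a new agent instance resumes from the stored offset:
print("".join(client.replay(session_topic, from_offset=2)))
```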

03 Cloud Native Accelerates the Integration of Intelligent Agents and Legacy Systems

Compared to developing an intelligent agent from scratch, integrating legacy systems with intelligent agents is a less costly path to intelligence. However, it often runs into the following two challenges:

Challenge 1: How to Connect Legacy Business with Intelligent Agents

Most enterprises have large legacy systems and long-solidified service interfaces. These systems are core business assets, yet they are usually built on traditional HTTP/REST protocols and cannot interact directly with intelligent agents. The challenge is letting agents seamlessly access and invoke these legacy capabilities without overturning the existing architecture: forcibly transforming legacy systems would be costly and would jeopardize the stability of existing operations. What is needed is a unified middleware that both interfaces with legacy services and offers agents a standardized, governable calling entry.

The Alibaba Cloud Native API Gateway is designed for this scenario: through protocol adaptation, traffic governance, built-in security, and observability, it seamlessly converts traditional APIs into services that intelligent agents can consume, helping enterprises upgrade to intelligence at low cost; a client-side sketch follows the feature list. It features:

  • Protocols optimized for intelligent agents, supporting long-lived connection protocols such as SSE and WebSocket, with lossless configuration changes for WebSocket connections and graceful instance online/offline transitions.

  • Enterprise-grade capabilities, including built-in observability, security capabilities, consumer authentication, and gray release.

  • Hardware acceleration capabilities to enhance TLS and Gzip performance.

  • A rich plugin ecosystem that supports hot loading and updates.
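From the client side, consuming an agent's SSE stream through such a gateway might look like the sketch below; the gateway URL and payload shape are hypothetical assumptions, while the `httpx` streaming calls are standard usage of that package.

```python
# Sketch of consuming an SSE stream through the gateway; the URL and
# request payload are hypothetical placeholders.
import httpx

GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"  # placeholder

with httpx.stream(
    "POST",
    GATEWAY_URL,
    json={"model": "qwen-max",
          "messages": [{"role": "user", "content": "Hello"}],
          "stream": True},
    headers={"Authorization": "Bearer <api-key>"},
    timeout=None,  # AI streams are long-lived; avoid premature cutoffs
) as response:
    for line in response.iter_lines():
        if line.startswith("data:"):
            print(line.removeprefix("data:").strip())  # one SSE chunk
```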

Challenge 2: How to Rapidly Create and Manage MCP Servers

In addition to connecting legacy systems, enterprises need to continuously build new agent tools, especially based on the emerging standard protocol MCP (Model Context Protocol). The challenge is how to rapidly develop, deploy, and manage MCP Servers to ensure seamless integration with intelligent agents. Without an efficient development and runtime environment, enterprises often face complex resource preparation, long deployment cycles, and difficulty in ensuring elasticity and security when creating MCP Servers.

In response, Alibaba Cloud positions Function Computing as the ideal runtime for rapidly developing and operating MCP Servers: it is lightweight, elastic at millisecond granularity, maintenance-free, and ships with built-in multi-language runtimes, while its one-stop development experience and broad integration capabilities further raise MCP development efficiency.
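As a sketch of how small such a server can be, here is a minimal MCP Server built with the open-source MCP Python SDK (`pip install mcp`); the SDK choice, the server name, and the stubbed tool are our assumptions, and the Function Computing deployment wrapper is not shown.

```python
# Minimal MCP Server sketch using the open-source MCP Python SDK;
# the tool is a stub for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")  # server name is illustrative

@mcp.tool()
def check_stock(sku: str) -> int:
    """Return the stock level for a SKU (stubbed data)."""
    return {"SKU-001": 42}.get(sku, 0)

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio by default
```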

At the same time, the AI Gateway lets enterprises manage registration, authentication, gray release, throttling, and observability of MCP Servers at a unified entry point, supporting zero-code conversion from HTTP to MCP, so enterprises can build and launch MCP Servers in the shortest possible time and quickly integrate agents with business scenarios. The AI Gateway also provides MCP marketplace capabilities, suitable for enterprises building private AI open platforms.

04 Accelerating Stable Operation of Intelligent Agents with Cloud Native

Whether you build an intelligent agent from scratch or integrate intelligent agents with legacy systems, this is only the first step towards intelligent applications. When enterprises push intelligence into production, they will also face challenges such as inference delays, stability fluctuations, difficulty in problem resolution, prominent security risks, unreliable outputs, and high costs. These are systemic challenges faced by enterprise-level AI applications in stability, performance, security, and cost control. Below we will share some of our responses from the perspectives of the AI Gateway and AI Observability.

AI Gateway

The gateway controls entry traffic in an application architecture. Compared with traditional web applications, however, AI traffic looks very different: high latency, large bandwidth, streaming transmission, long-lived connections, and API-driven access. This has given rise to a new form of gateway: the AI Gateway.

In summary, the AI Gateway is the next-generation gateway providing multi-model traffic scheduling, MCP and Agent management, intelligent routing, and AI governance. Alibaba Cloud offers both open-source (Higress) and commercial (API Gateway) AI Gateways. It plays a significant role in keeping intelligent agents running stably (the failover pattern is sketched after the list):

  • Failure-oriented design for higher availability: Improves model invocation availability to 99.9%+ through multi-model routing, failover, gray release, and concurrency control.

  • Pre-integrated security capabilities: Unified management of API keys and consumer credentials, delegated authorization, and pre-integrated WAF protection and guardrail capabilities.

  • Multi-protocol support: Centralized management of various intelligent agent invocations, including model calls, MCP tool calls, and A2A calls.
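The gateway performs this failover server-side; as a rough illustration of the logic, here is a client-side sketch with hypothetical endpoints, using `httpx` for the HTTP calls.

```python
# Illustration of multi-backend failover; the endpoints are placeholders,
# and in practice the gateway implements this logic server-side.
import httpx

BACKENDS = [  # ordered by priority; all URLs are hypothetical
    "https://primary-model.example.com/v1/chat",
    "https://fallback-model.example.com/v1/chat",
]

def invoke_with_failover(payload):
    last_error = None
    for url in BACKENDS:
        try:
            resp = httpx.post(url, json=payload, timeout=30.0)
            resp.raise_for_status()  # treat 4xx/5xx as failures too
            return resp.json()
        except httpx.HTTPError as exc:
            last_error = exc  # fall through to the next backend
    raise RuntimeError("all model backends failed") from last_error
```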

AI Observability

AI Observability encompasses a set of practices and tools that allow engineers to comprehensively gain insight into applications built on large language models.

Unlike traditional applications, AI applications face a series of challenges not seen before, which fall into three categories:

  • Performance and Reliability Issues: Large models are resource-intensive, with latency peaks and bottlenecks occurring frequently. Observability links the data of all components, enabling engineers to precisely locate the source of latency, whether it is the model itself, external API calls, or database queries. It can also track each step in a multi-step process, simplifying the debugging process in complex systems.

  • Cost Issues: Many large model services charge by token usage, and without controls, costs can surge unexpectedly. Observability tools track tokens per request and total daily usage; when usage spikes, alerts go out, helping teams optimize usage or set limits before an exorbitant bill arrives (a minimal accounting sketch follows this list).

  • Quality Issues: Large model outputs can inherit biases or harmful content from training data and can hallucinate, producing entirely unanticipated results. Observability provides assessments and tooling that detect inappropriate, inaccurate, or dangerous content at each stage of an AI application's execution through automatic analysis and scoring, helping engineers act in time.
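As a concrete illustration of the cost point above, here is a minimal token-accounting sketch with a spike alert; the daily limit and the alert hook are placeholders, not part of any Alibaba Cloud SDK.

```python
# Minimal per-day token accounting with a budget alert; the limit and
# alert channel are illustrative placeholders.
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 1_000_000                 # tokens per day, illustrative
usage = defaultdict(int)                # date -> tokens consumed

def alert(message):
    print(f"[ALERT] {message}")         # stand-in for a real alert channel

def record_usage(prompt_tokens, completion_tokens):
    key = date.today().isoformat()
    usage[key] += prompt_tokens + completion_tokens
    if usage[key] > DAILY_LIMIT:
        alert(f"token budget exceeded: {usage[key]:,} > {DAILY_LIMIT:,}")

record_usage(120_000, 900_000)          # crosses the budget, fires an alert
```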

To tackle the above challenges, Alibaba Cloud's AI Observability solution provides:

  • End-to-End Full-Link Tracing: Provides end-to-end log collection and distributed tracing, visually displaying each request's execution path through the AI application. Supports flexible query and filtering of historical dialogues to facilitate debugging and improvement.

  • Full-Stack Monitoring: Includes observability across three dimensions: applications, AI Gateway, and inference engines. Observed content includes real-time tracking of response latency, request throughput, token consumption, error rates, and resource usage (such as CPU, memory, API tokens), and can trigger alerts when metrics are abnormal, helping teams respond quickly before affecting users while effectively monitoring costs.

  • Automated Evaluation Functionality: By introducing evaluation Agents, automatic assessments of inputs and outputs of applications and models can be conducted, detecting problems such as hallucinations, inconsistencies, or declines in answer quality. Effective tools often integrate evaluation templates that help engineers quickly evaluate common quality and safety issues.

  • AI/ML Platform Pre-Integration: Defaults to integration with container services, PAI, and Bailian, allowing one-click access to observability data collection with a default dashboard provided.

In addition to a complete solution, we also provide an intelligent operations assistant aimed at operation and development personnel, helping every IT engineer improve efficiency in discovering system anomalies, locating root causes of problems, and recovering from failures.

Unlike traditional rule-based AIOps, our AIOps agent is built on a multi-agent architecture with the autonomy to tackle unknown problems: after receiving an issue, it plans, executes, and reflects on its own, improving its ability to resolve it. At the algorithm level, we have accumulated many atomic capabilities, including preprocessing of massive datasets, anomaly detection, and intelligent forecasting, all of which agents can use as tools. We welcome everyone to log into our console to try it and give us feedback.
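The plan-execute-reflect cycle can be summarized schematically as below; the three phase functions are stubs standing in for LLM-backed components, not Alibaba Cloud's implementation.

```python
# Schematic plan-execute-reflect loop for an AIOps agent; all three
# phases are stubs standing in for LLM-backed components.
def plan(issue, notes):
    # A real planner would condition on prior reflections in `notes`.
    return [f"check metrics for: {issue}", f"correlate logs for: {issue}"]

def execute(step):
    return f"result of '{step}'"        # would call diagnostic tools

def reflect(issue, results):
    solved = len(results) >= 2          # toy stopping criterion
    return solved, ("root cause: placeholder" if solved else "need more data")

def handle_issue(issue, max_rounds=3):
    notes = []
    for _ in range(max_rounds):
        results = [execute(step) for step in plan(issue, notes)]
        solved, summary = reflect(issue, results)
        if solved:
            return summary
        notes.append(summary)           # feed reflection into the next plan
    return "escalate to a human operator"

print(handle_issue("p99 latency spike on checkout service"))
```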

AI DevOps

So far, the AI-native architecture covers rapid development and deployment of intelligent agents, their integration with legacy systems, and their stable operation. One last mile remains: incorporating this new agent development paradigm into existing enterprise DevOps processes.

Alibaba Cloud's AI DevOps connects the entire chain from coding and building to operations, integrating with Lingma, AIOps, and other tools to inject AI capabilities throughout. Once code is submitted and released, the system automatically captures and correlates online issues, and an AI Agent generates intelligent diagnostic reports.

05 Open Source and Openness Remain the Core Philosophy of Cloud Native

Although today's sharing focuses mostly on commercial products, open source and openness have always been the defining trait and core philosophy of cloud native. For almost every commercial product, we have either open-sourced its basic capabilities directly or contributed continuously to external open-source projects, promoting community evolution. For example:

  • Higress & HiMarket: Providing AI gateway and AI open platform capabilities.

  • AgentScope Studio: Providing application governance capabilities, including Agent debugging, observability, evaluation, etc.

  • AgentScope Agent Framework: Providing application development capabilities.

  • AgentScope-Runtime: Function Computing supports AgentScope's three runtime types: Agent, Sandbox, and Browser-use.

  • Lite RocketMQ: Dynamically creates an independent lightweight topic (Lite-Topic) for each session or issue.

  • Nacos: Provides service registration, discovery, and routing capabilities for MCP through Nacos MCP Registry & Router.

  • Observability: Cloud products are fully compatible with industry-standard observability infrastructures such as OTel/Prometheus/Grafana.

We believe it is precisely this parallel evolution of commercial and open source that lets cloud-native products meet enterprise customers' demands while maintaining lasting technological leadership and community vitality through open collaboration.


Contact

Follow and engage with us through the following channels to stay updated on the latest developments from higress.ai.
