Distributed Multi-Agent Security High Availability Exploration and Practice
Yi Zhan | Oct 10, 2025
In today's era of rapid AI development, AI Agents are becoming the core engine driving the implementation of the "Artificial Intelligence +" strategy. Both technology trends and policy direction indicate that a profound transformation is underway.
Recently, the State Council issued the "Opinions on Deepening the Implementation of the 'Artificial Intelligence +' Action", setting clear targets: by 2027, the penetration rate of next-generation intelligent terminals and agents will exceed 70%; by 2030 it will exceed 90%, and the intelligent economy will become an important growth pole of China's economy; by 2035, China will fully enter a new stage of the intelligent economy and intelligent society.
Behind these goals lies a clear signal: the technology system centered on AI Agents is maturing and being deployed at scale.
OpenAI co-founder Andrej Karpathy once proposed a classic three-stage framing of the evolution of software programming paradigms: Software 1.0 is hand-written code, Software 2.0 is neural-network weights learned from data, and Software 3.0 is natural-language prompts that program large models directly.
This marks a fundamental shift in the form of application development: the operational logic of AI Agents increasingly relies on the intelligent capabilities of the models themselves. In this context, the development methods of AI Agents have also undergone three stages of evolution: low-code, high-code, and no-code.
Low-code: Rapid verification, but with obvious limitations.
Low-code platforms, characterized by a "drag-and-drop canvas," greatly lower the entry barrier and play an important role in early proof-of-concept (PoC) validation. However, their high level of abstraction, poor flexibility, and limited performance mean they struggle with complex business scenarios and often stall at the experimental stage.
High-code: The current mainstream choice.
Developing AI Agents in code on top of a development framework offers stronger performance and greater flexibility than low-code. While current models are not yet intelligent enough to be fully autonomous, this approach balances the model's autonomy against the deterministic business requirements of complex scenarios, making it the mainstream choice today.
No-code: Future vision, still immature.
No-code aspires to let users drive the entire application build through natural language alone, relying completely on the model's planning and execution capabilities. The vision is promising, but it is still constrained by the cognitive boundaries and stability of today's models and cannot yet support the reliable delivery of real business.
In summary, the most feasible and sustainable path at this stage remains the high-code development model based on frameworks.
As the scenarios for Agents become increasingly complex, development frameworks are continuously evolving. They can be classified into three generations:
Chat Client mode
This is the most primitive form: you type a sentence and the model responds, question-and-answer style. Although it also supports tool calls and loop execution, all logic relies on a single model repeatedly making decisions under the same prompt and context. Limited by current model capabilities, it struggles to handle complex business scenarios reliably, such as error recovery and parallel collaboration.
Workflow framework
This type of framework, represented by LangGraph, is built on a process orchestration engine that breaks tasks into multiple nodes and supports conditional branches, loops, and parallel logic. It effectively addresses the AI transformation of traditional business processes, upgrading from "chat-based interaction" to "structured execution".
However, this manually orchestrated approach has obvious drawbacks: as business scales grow, processes become increasingly complex, and maintenance costs rise exponentially. A bigger issue is that this “static” orchestration architecture cannot benefit from the continuous improvement of model capabilities.
Agentic API
People are gradually realizing that developing AI Agents requires providing an Agentic API focused on Agent abstraction. Alibaba's AgentScope and Google's ADK are typical representatives. Based on Agent-oriented abstraction, developers can enjoy the model's autonomous planning and dynamic decision-making capabilities while ensuring stability and controllability through structured design. This allows for the development of scalable, maintainable, and sustainably evolving AI Agents.
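To make this contrast concrete, here is a minimal Java sketch of what an Agent-oriented abstraction can look like. Every type below (`Tool`, `ChatModel`, `ModelStep`, `ReActStyleAgent`) is an illustrative assumption, not the actual AgentScope or ADK API: the framework owns a structured loop, while the model owns the planning decisions inside it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not the AgentScope/ADK API.
interface Tool {
    String name();
    String invoke(Map<String, String> args);
}

interface ChatModel {
    // Returns either a final answer or a tool-call decision.
    ModelStep plan(String instruction, List<Tool> tools, List<String> history);
}

record ModelStep(boolean isFinal, String content, String toolName, Map<String, String> toolArgs) {}

// The framework owns the loop; the model owns the decisions.
class ReActStyleAgent {
    private final ChatModel model;
    private final List<Tool> tools;
    private final String instruction;

    ReActStyleAgent(ChatModel model, List<Tool> tools, String instruction) {
        this.model = model;
        this.tools = tools;
        this.instruction = instruction;
    }

    String run(String task, int maxSteps) {
        List<String> history = new java.util.ArrayList<>(List.of("task: " + task));
        for (int i = 0; i < maxSteps; i++) {
            ModelStep step = model.plan(instruction, tools, history);
            if (step.isFinal()) return step.content();         // the model decides when it is done
            Tool tool = tools.stream()
                    .filter(t -> t.name().equals(step.toolName()))
                    .findFirst().orElseThrow();
            String observation = tool.invoke(step.toolArgs()); // structured, controllable side effects
            history.add("call " + step.toolName() + " -> " + observation);
        }
        return "max steps reached";
    }
}
```

Unlike a hand-drawn graph of nodes, nothing here pins down the order of tool calls; as models improve, the same structure simply executes better plans.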
At the beginning of last month, Alibaba Cloud officially released the final 1.0 version of the AgentScope development framework, a milestone release.
AgentScope mainly includes three core modules:
Framework: Supports interruptible and resumable execution control, with built-in long and short-term memory mechanisms to support long-term task processing; also provides an efficient tool calling system, supporting on-demand activation and dynamic loading.
Runtime: Built on container technology to create a secure sandbox that isolates external operation risks and provides a powerful deployment and runtime engine, supporting rapid publishing and flexible management.
Studio: Provides comprehensive debugging and observation capabilities, integrating evaluation systems to help developers understand execution processes and ensure iteration quality.
In the past, these capabilities mainly served Python developers. But starting today, AgentScope officially launches a Java version! This means the broad community of Java developers can now easily tap into this advanced Agent development system. Beyond that, we have fully upgraded the core of Spring AI Alibaba to AgentScope, creating an auto-assembled, out-of-the-box, Java-native Agent framework that helps enterprises quickly build production-grade intelligent applications.
Why must Multi-Agent systems be distributed?
For enterprises, the biggest challenge in developing Agents often lies not in the technology itself, but in complex business processes—this is not only the most difficult part of the development process but also the core competitiveness of the enterprise.
In reality, however, few teams fully understand all of an enterprise's business processes, let alone independently develop and implement all of them. In practice, different teams build different business systems, and the overall architecture ends up following Conway's Law: organizational structure determines system architecture, which naturally evolves toward a modular, distributed design.
From the perspective of high availability, traditional monolithic architectures always face the risk of single points of failure and performance bottlenecks. To achieve high availability and elastic expansion, Multi-Agent systems must adopt a distributed architecture.
Therefore, whether from a business perspective or a technological perspective, a distributed architecture is an inevitable choice.
The execution process of Agents has several notable characteristics: long execution processes, unstable output results, stateful execution processes, and high computational costs. When distributed Agents work together, these challenges are further amplified.
Firstly, service registration and discovery become key issues—how does each Agent register its capabilities? How do other Agents accurately discover and invoke it?
Then, in a distributed architecture, security cannot be overlooked. It is necessary to prevent Agents from being maliciously attacked, protect sensitive credentials such as API keys from being leaked, and ensure that generated content complies with regulatory requirements to avoid legal risks.
To cope with the long execution times of AI tasks, a message queue is introduced as an asynchronous hub, decoupling tasks and enabling non-blocking calls to improve system throughput. On top of MQ, a persistent checkpoint mechanism is built: key states are saved automatically, so an interrupted task can resume from its last checkpoint, sharply reducing the computational cost of failures.
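The checkpoint idea can be sketched in a few lines of Java. The `StateStore` interface and the step structure below are invented for illustration; a real deployment would pair a broker such as RocketMQ or Kafka with a durable state store:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal stand-in for a durable state store; a real system would use a database.
interface StateStore {
    void saveCheckpoint(String taskId, int stepIndex, String state);
    Checkpoint loadCheckpoint(String taskId); // null if the task has never run
}

record Checkpoint(int stepIndex, String state) {}

class InMemoryStateStore implements StateStore {
    private final Map<String, Checkpoint> data = new ConcurrentHashMap<>();
    public void saveCheckpoint(String taskId, int stepIndex, String state) {
        data.put(taskId, new Checkpoint(stepIndex, state));
    }
    public Checkpoint loadCheckpoint(String taskId) { return data.get(taskId); }
}

class ResumableAgentTask {
    private final StateStore store;
    private final List<Function<String, String>> steps; // each step transforms the task state

    ResumableAgentTask(StateStore store, List<Function<String, String>> steps) {
        this.store = store;
        this.steps = steps;
    }

    // Called by the MQ consumer; safe to re-deliver after a crash.
    String process(String taskId, String initialState) {
        Checkpoint cp = store.loadCheckpoint(taskId);
        int start = (cp == null) ? 0 : cp.stepIndex() + 1;  // resume after the last completed step
        String state = (cp == null) ? initialState : cp.state();
        for (int i = start; i < steps.size(); i++) {
            state = steps.get(i).apply(state);              // expensive model or tool call
            store.saveCheckpoint(taskId, i, state);         // persist before acknowledging the message
        }
        return state;
    }
}
```

Because the consumer checkpoints after every step, a redelivered message skips work that already succeeded instead of re-running costly model calls from scratch.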
Finally, in the face of sudden spikes in traffic, a comprehensive traffic protection and high-availability mechanism must be established to ensure stable system operation through means such as circuit breaking and throttling.
It can be said that building an efficient, secure, and reliable Multi-Agent system is not only a challenge for AI but also a comprehensive test of engineering architecture. All of this relies on AI middleware.
When it comes to registration and configuration, people naturally think of Nacos. In the microservices era, Nacos held over 60% of the market share in China, and mainstream cloud vendors at home and abroad, including Azure, offer managed Nacos products. In today's AI era, Nacos has evolved into an AI Registry.
On the AI tool side, Nacos supports the official MCP Registry protocol, allowing traditional applications to be quickly turned into MCP Servers and managed dynamically without any code changes.
For multi-Agent collaboration, Nacos is the first registry to support the A2A protocol. Once Agents register with Nacos, a caller only needs the Nacos address to orchestrate distributed multi-Agents, making distributed Agent collaboration as simple and stable as a monolithic application.
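The standard Nacos Java naming client gives a feel for the registration side of this flow. The client calls below are the regular Nacos API; the agent-card metadata keys are illustrative assumptions, not the actual A2A schema:

```java
import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;
import java.util.Map;

public class AgentRegistration {
    public static void main(String[] args) throws NacosException {
        // Connect to the Nacos server (address is an example).
        NamingService naming = NacosFactory.createNamingService("127.0.0.1:8848");

        Instance instance = new Instance();
        instance.setIp("10.0.0.12");
        instance.setPort(8080);
        // Illustrative agent-card metadata; the real A2A schema may differ.
        instance.setMetadata(Map.of(
                "protocol", "a2a",
                "agent.name", "report-writer",
                "agent.capabilities", "summarize,draft"));

        // Register the agent as a discoverable service instance.
        naming.registerInstance("report-writer-agent", instance);
    }
}
```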
For configuration management, Nacos' dynamic push capability enables dynamic updates of Agent capabilities: an Agent's runtime behavior can be adjusted flexibly without redeployment.
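For example, with the standard Nacos Java config client, an agent can load its prompt and tool settings at startup and receive pushed updates at runtime. The client API below is the regular Nacos API; the dataId, group, and config contents are example values:

```java
import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;
import com.alibaba.nacos.api.exception.NacosException;
import java.util.concurrent.Executor;

public class DynamicAgentConfig {
    // Example dataId and group; real values depend on your naming conventions.
    static final String DATA_ID = "report-writer-agent.properties";
    static final String GROUP = "DEFAULT_GROUP";

    public static void main(String[] args) throws NacosException, InterruptedException {
        ConfigService config = NacosFactory.createConfigService("127.0.0.1:8848");

        // Initial load: e.g. system prompt, tool switches, model parameters.
        applyToRunningAgent(config.getConfig(DATA_ID, GROUP, 3000));

        // Nacos pushes changes; the agent's behavior updates without a redeploy.
        config.addListener(DATA_ID, GROUP, new Listener() {
            public Executor getExecutor() { return null; } // use the client's default callback thread
            public void receiveConfigInfo(String configInfo) {
                applyToRunningAgent(configInfo);
            }
        });

        Thread.currentThread().join(); // keep the process alive to receive pushes
    }

    static void applyToRunningAgent(String configText) {
        // In a real agent, re-parse and swap prompt/tool settings atomically.
        System.out.println("agent config updated:\n" + configText);
    }
}
```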
Nacos 3.1 has been released simultaneously in the open-source community and in Alibaba Cloud MSE, where the latest version is available for direct use.
The full-link security of AI-native applications focuses on three aspects: traffic-entry security, AI asset security, and generated-content security. First, the API gateway entry is the first line of defense. The Higress API gateway provides mTLS mutual authentication and encrypted transport; an integrated WAF resists common web attacks; and combined with login authentication, IP allowlists and blocklists, and custom authentication, it forms a multi-dimensional access control system that effectively blocks illegitimate requests.
For AI-related configuration security, take API keys as an example: Nacos serves as a unified configuration center, supporting encrypted storage and scheduled rotation of keys to prevent sensitive information from leaking. On top of that, the Higress AI gateway integrates multiple protection mechanisms, supporting consumer authentication methods such as JWT and OAuth to ensure that caller identities are trustworthy and AI assets stay protected.
Finally, for the security of AI-generated content, Alibaba Cloud's AI guardrails and third-party SaaS review services are integrated to review all output in real time, preventing the generation of illegal, non-compliant, or harmful information. Together, these measures deliver full-link security across traffic entry, AI assets, and generated content.
In the AI-native era, traditional throttling methods no longer apply. Throttling in the microservices era was simple: cap a service at, say, 100 requests per second. Simple and effective. In the era of large models, however, requests vary enormously in size: one may be a short greeting, another a request for a 5,000-word report. Both count as "one request," yet the compute they consume can differ by tens or hundreds of times. Throttling by request count would be like a flat taxi fare that counts only the number of rides and ignores the distance, which is clearly unreasonable.
Throttling in the AI era must adopt a new methodology.
First, fine-grained throttling must be based on tokens. The Higress AI gateway counts the input and output tokens of each request in real time and uses those counts as the basis for throttling.
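The sketch below shows the idea of token-based throttling in plain Java. It is not Higress's implementation; the two-phase admit/settle design and all numbers are assumptions:

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal token-budget limiter: quota is debited by actual LLM tokens
// consumed, not by request count. Numbers and names are illustrative.
class TokenBudgetLimiter {
    private final long tokensPerMinute;
    private final AtomicLong remaining;
    private long windowStartMillis;

    TokenBudgetLimiter(long tokensPerMinute) {
        this.tokensPerMinute = tokensPerMinute;
        this.remaining = new AtomicLong(tokensPerMinute);
        this.windowStartMillis = System.currentTimeMillis();
    }

    // Cheap pre-check with an estimate (e.g. prompt length / 4),
    // since the output length is unknown until the model finishes.
    synchronized boolean tryAdmit(long estimatedTokens) {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 60_000) { // roll the fixed window
            windowStartMillis = now;
            remaining.set(tokensPerMinute);
        }
        return remaining.get() >= estimatedTokens;
    }

    // Settle with the real usage the model reported after the call,
    // so the budget tracks actual consumption.
    void settle(long actualInputTokens, long actualOutputTokens) {
        remaining.addAndGet(-(actualInputTokens + actualOutputTokens));
    }
}
```

A short greeting settles a few dozen tokens; a 5,000-word report settles thousands. Under the same budget they are no longer treated as equal, which is exactly the point.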
Next, priority scheduling must be implemented. The API gateway tags traffic from different sources, such as paid users and free users, with different priority labels, and the AI gateway then schedules by those labels, ensuring high-priority tasks are not crowded out by low-priority traffic.
Finally, the throttling threshold should not be a fixed value; throttling must adapt dynamically. The AI gateway senses backend GPU load in real time, and once system pressure rises, it automatically tightens the quota for free users, prioritizing the protection of core business.
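A small sketch of how the last two ideas, priority tiers and load-adaptive quotas, might combine. The tiers, thresholds, and GPU-load source are all assumptions; in practice a gateway would read load from its metrics pipeline:

```java
import java.util.function.DoubleSupplier;

enum Tier { PAID, FREE }

// Scales each tier's token quota by observed backend load; numbers are illustrative.
class AdaptiveQuotaPolicy {
    private final DoubleSupplier gpuLoad; // 0.0 .. 1.0, supplied by the metrics pipeline

    AdaptiveQuotaPolicy(DoubleSupplier gpuLoad) {
        this.gpuLoad = gpuLoad;
    }

    long tokensPerMinuteFor(Tier tier) {
        double load = gpuLoad.getAsDouble();
        long base = (tier == Tier.PAID) ? 200_000 : 50_000;
        if (load < 0.7) return base;                                  // normal operation
        if (load < 0.9) return (tier == Tier.PAID) ? base : base / 4; // squeeze the free tier first
        return (tier == Tier.PAID) ? base / 2 : 0;                    // overload: protect core business
    }
}
```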
Alibaba Cloud's Higress gateway natively supports Token-level throttling, priority scheduling, and adaptive throttling, and these features are all plug-and-play. No code modification is required; you can directly use Alibaba Cloud's Higress gateway to ensure high availability of AI applications.
Pre-launch testing used to be simple: run a round of regression tests, and if the cases passed, release with confidence. AI applications are different: ask the same question twice and you may get two different answers. Their output is flexible and probabilistic, and all test cases cannot be enumerated before launch. Keep using the old methods and you will see applications that test well offline, then "fail" in front of real users online.
We must transform evaluation testing from “the last step before project launch” to a core process throughout the application lifecycle. Every feature deployment must go through A/B testing, using real traffic to validate effectiveness.
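A minimal sketch of deterministic A/B bucketing (the variant names and split ratio are illustrative). Stable per-user assignment keeps each user on one strategy, so the two can be compared fairly on real traffic:

```java
// Deterministic A/B assignment: the same user always lands in the same
// bucket. Variant names and the 10% split are example values.
class AbRouter {
    private static final double VARIANT_B_SHARE = 0.10;

    static String assign(String userId) {
        // Stable hash mapped into [0, 1); avoids users flip-flopping between variants.
        double bucket = (userId.hashCode() & 0x7fffffff) / (double) Integer.MAX_VALUE;
        return bucket < VARIANT_B_SHARE ? "prompt-v2" : "prompt-v1";
    }

    public static void main(String[] args) {
        // Log the variant alongside quality metrics for later evaluation.
        System.out.println("user-42 -> " + assign("user-42"));
    }
}
```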
Once real traffic flows, evaluation must become real-time and automated. Here, Alibaba Cloud's observability products support automatic extraction, deduplication, and correlation of observability data such as logs and metrics, generating test sets and triggering real-time evaluation. They can compare which strategy in an A/B test produced higher quality and a better user experience, supporting sound decisions.
More importantly, these evaluation processes do not stop at "scoring"; they also accumulate a large volume of high-quality data, covering both cases with significant improvement and cases where users were dissatisfied. After cleaning and labeling, this data can be used to retrain and optimize the model.
This is the "positive data flywheel" we aim to build: data-centric, continuously constructing high-quality datasets and training models that form a competitive moat. Evaluation is no longer just an acceptance step but the engine of an AI application's continuous evolution. Only then can our AI applications truly keep evolving and grow more useful over time.
The era of AI Agents has arrived. It represents not only a technological revolution but also a comprehensive reconstruction of the development paradigm, engineering architecture, and even organizational collaboration methods.
From the evolution of development frameworks to the support of the full ecosystem of Python, Java, and Golang; from the practical implementation of distributed systems to full-link security and intelligent evaluation mechanisms—every step is propelling AI native applications towards maturity.
The future is here, waiting for us to dive in. If you are also exploring application scenarios for Agents, feel free to pay attention to the AgentScope project or try using Alibaba Cloud MSE + Higress + Nacos to build your own AI native application. Together, let's step into the new world of intelligent agents.