Safe AI Should be Bounded and Multi-Agent

David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, Tomáš Gaveniak, Ani Calinescu, Michael Wooldridge, Fernando Rosas, Pedro Ortega
Keywords: AI safety, bounded agency, multi-agent systems, modularity, governance.
Position Paper
June 2026

Summary

The dominant story of recent AI progress is scaling: make models larger, train on more data, give them more compute, and hope that broader capability emerges. This paper argues for a complementary design principle. Instead of treating boundedness as a limitation to overcome, we should treat it as a safety-relevant feature to engineer.

The proposal is Bounded Multi-Agent Systems (BMAS): systems composed of deliberately constrained agents, connected through legible interfaces and coordination mechanisms. A bounded agent may be limited in its information, architecture, resources, tools, authority, or objectives. The point is not to make AI weak, but to create system-level capability without giving any single component unchecked general power.

A monolithic AI system tends to concentrate many capacities in one place: reasoning, planning, coding, knowledge retrieval, tool use, persuasion, and long-horizon optimization. This can be powerful, but it also produces a difficult safety problem. When many capabilities are entangled inside one opaque system, that system becomes harder to interpret, monitor, verify, and govern, and responsibility for its behavior becomes harder to assign.

BMAS instead asks whether capability can be distributed across specialized components. A planner can plan, a retriever can retrieve information, a coder can write code, a verifier can check outputs, and a monitor can watch interactions. Each component can be bounded by design, while the system as a whole remains capable through coordination.
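The division of labor described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the `BoundedAgent` class, the `run_pipeline` helper, the tool names, and the stand-in lambda handlers are all hypothetical. It shows one way a bound (here, an explicit set of permitted tools) can be enforced at the component boundary while the pipeline as a whole still completes the task.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class BoundedAgent:
    """Hypothetical component limited to an explicit set of tools."""
    name: str
    allowed_tools: FrozenSet[str]
    run: Callable[[str], str]  # stand-in for the agent's actual behavior

    def call(self, tool: str, payload: str) -> str:
        # The bound is checked at the boundary, before any work is done.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name} is not permitted to use {tool!r}")
        return self.run(payload)


def run_pipeline(steps: List[Tuple[BoundedAgent, str]], task: str) -> str:
    """Pass the task through each (agent, tool) step in order."""
    result = task
    for agent, tool in steps:
        result = agent.call(tool, result)
    return result


# Three specialists, each restricted to a single tool (assumed names).
planner = BoundedAgent("planner", frozenset({"plan"}), lambda t: f"plan({t})")
coder = BoundedAgent("coder", frozenset({"write_code"}), lambda p: f"code({p})")
verifier = BoundedAgent("verifier", frozenset({"check"}), lambda c: f"checked({c})")

output = run_pipeline(
    [(planner, "plan"), (coder, "write_code"), (verifier, "check")],
    "sort a list",
)
```

In this sketch the planner cannot write code even if asked to: `planner.call("write_code", ...)` raises `PermissionError`. The capability lives in the composition, not in any single component.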

This is not a claim that multi-agent systems are always better than monolithic systems. The paper’s position is more careful: different architectures are appropriate for different tasks, budgets, risks, and assurance requirements. BMAS expands the design space for safe AI rather than replacing every existing approach.

The capability argument is that many powerful systems are collective. Multicellular organisms, firms, laboratories, markets, and scientific communities achieve system-level competence through division of labor. Similarly, AI systems may be able to decompose tasks, route subtasks, allocate resources, and compose verified outputs without every component needing broad capability.

The safety argument is that bounded systems shift some risks into more familiar engineering territory. Narrower agents may be easier to understand and predict. Explicit interfaces make interactions easier to monitor. Local verification becomes possible at component boundaries. Redundancy can make the system more robust to the failure of any single agent.
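As a rough illustration of how local verification and redundancy might combine, the sketch below filters the outputs of several redundant agents through a verifier and accepts an answer only when enough verified outputs agree. The function name `verified_majority`, the `quorum` parameter, and the choice of a simple plurality vote are assumptions made for this example, not mechanisms specified in the paper.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def verified_majority(
    outputs: Iterable[str],
    verify: Callable[[str], bool],
    quorum: int,
) -> Optional[str]:
    """Return the most common verified output if at least `quorum` agents agree.

    Returns None when no output passes verification or agreement is too low.
    """
    # Local verification: discard outputs that fail the check at the boundary.
    accepted = [o for o in outputs if verify(o)]
    if not accepted:
        return None
    # Redundancy: one faulty agent cannot decide the result on its own.
    answer, votes = Counter(accepted).most_common(1)[0]
    return answer if votes >= quorum else None


# Four redundant agents answer the same question; one returns a malformed output.
result = verified_majority(["4", "4", "five", "4"], verify=str.isdigit, quorum=3)
```

Here the malformed output is rejected by the verifier and the remaining three agents agree, so `result` is `"4"`. If agreement fell below the quorum, the function would return `None` and the system could escalate rather than act.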

The practicality argument is that BMAS may better support governance, accountability, privacy, and pluralistic alignment. If components have clearer scopes, they can be certified, audited, owned, replaced, or constrained individually. If agents only access the data required for their task, privacy and data minimisation can become architectural properties rather than afterthoughts.

The paper highlights the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to communicate externally. A single agent that combines all three can be vulnerable to prompt injection, data exfiltration, or manipulation. In a bounded architecture, these capabilities can be separated. One component may read private data, another may inspect untrusted content, another may communicate externally, and monitors or verifiers can mediate the boundaries between them.

The important move is not merely adding more agents. It is designing the interfaces so that dangerous combinations of information and affordances are deliberately avoided.
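One way to make this design rule checkable is a static audit of which capabilities each component holds. The sketch below is a minimal example under stated assumptions: the capability labels, the `trifecta_violations` function, and the component names are hypothetical, and the check covers only directly assigned capabilities. It does not model information flowing between components, which is what the monitors and verifiers at the boundaries would need to handle.

```python
from typing import Dict, FrozenSet, List

# Assumed labels for the three capabilities in the "lethal trifecta".
LETHAL_TRIFECTA: FrozenSet[str] = frozenset(
    {"private_data", "untrusted_content", "external_comms"}
)


def trifecta_violations(capabilities: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return names of components that hold all three trifecta capabilities."""
    return sorted(
        name for name, caps in capabilities.items() if LETHAL_TRIFECTA <= caps
    )


# A single agent with every capability, versus a bounded decomposition.
monolith = {"assistant": frozenset(LETHAL_TRIFECTA)}
bounded = {
    "reader": frozenset({"private_data"}),
    "inspector": frozenset({"untrusted_content"}),
    "messenger": frozenset({"external_comms"}),
}

monolith_report = trifecta_violations(monolith)  # flags "assistant"
bounded_report = trifecta_violations(bounded)    # flags nothing
```

A check like this could run whenever a system's configuration changes, so that a dangerous combination is caught at design time rather than discovered after an incident.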

BMAS raises many open questions. We need better methods for designing verifiable interfaces and contracts between agents; orchestration mechanisms for assigning tasks to bounded specialists; markets, voting rules, and collective-choice procedures for aggregating outputs; and monitoring methods that can detect collusion, drift, or emergent misalignment.

The paper also points toward an institutional research agenda. AI systems are increasingly becoming ecosystems of agents, tools, protocols, and services. If this infrastructure emerges by default around unbounded, opaque, and concentrated systems, many governance problems will become harder. If it is designed deliberately around boundedness, modularity, and accountability, it could support a more distributed and democratic AI ecosystem.

The central claim is simple: safe AI should not be pursued only by scaling individual models and then trying to control them after the fact. Boundedness should be part of the architecture.

A bounded multi-agent approach gives us a way to build capable systems from constrained parts, with legible interfaces, local verification, distributed authority, and clearer accountability. It does not solve AI safety by itself, but it offers a promising design principle: make the system powerful through composition, not through unbounded concentration.
