Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| bmas [2026/06/19 15:25] – created pedroortega | bmas [2026/06/19 15:44] (current) – [Introduction] pedroortega | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Safe AI Should be Bounded and Multi-Agent ====== | ====== Safe AI Should be Bounded and Multi-Agent ====== | ||
| - | **David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, | + | **David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, |
| - | //Keywords: AI safety, bounded agency, multi-agent systems, modularity, governance. //\\ | + | |
| - | Position Paper\\ | + | |
| - | June 2026\\ | + | |
| - | ===== Summary ===== | + | //Keywords: AI safety, bounded agency, multi-agent systems, modularity, verification, |
| - | The dominant story of recent AI progress is scaling: make models larger, train on more data, give them more compute, and hope that broader capability emerges. This paper argues for a complementary design principle. Instead of treating boundedness as a limitation to overcome, we should treat it as a safety-relevant feature to engineer. | + | {{ bmas_position.pdf | Position Paper, }} June 2026 |
| + | ===== Abstract ===== | ||
| - | The proposal is **Bounded Multi-Agent Systems** (BMAS): systems composed of deliberately constrained agents, connected through legible interfaces | + | The scaling paradigm treats bounds on compute, memory, information, |
| - | ===== Core idea ===== | + | The safety claim is architectural. If unsafe behaviour requires a conjunction of capabilities, |
| - | A monolithic AI system tends to concentrate many capacities in one place: reasoning, planning, coding, knowledge retrieval, tool use, persuasion, and long-horizon optimization. This can be powerful, but it also produces a difficult safety problem. When many capabilities are entangled inside one opaque system, it becomes harder to interpret, monitor, verify, govern, or assign responsibility. | + | ===== Introduction ===== |
| - | BMAS instead asks whether capability can be distributed across specialized components. A planner can plan, a retriever can retrieve information, | + | Current frontier systems concentrate heterogeneous capabilities in single models. A single model may reason, retrieve information, |
| - | This is not a claim that multi-agent systems are always better than monolithic systems. The paper’s position is more careful: different architectures are appropriate for different tasks, budgets, risks, and assurance requirements. BMAS expands | + | BMAS starts from a different decomposition. A task induces a capability profile: reasoning, knowledge, coding, planning, tool use, verification, and communication may all be required in different proportions. Unsafe behaviour also has a capability profile. Risk is highest when the capabilities required |
| - | ===== Why boundedness helps ===== | + | The BMAS proposal is to design systems in which useful capability is distributed across bounded components. A planner plans. A retriever retrieves. A coder writes code. A verifier checks outputs. A monitor inspects interactions. An executor acts under controlled authority. And so forth. The system becomes capable through composition, |
| - | The capability argument is that many powerful systems are collective. Multicellular organisms, firms, laboratories, | + | ===== Bounded agents ===== |
| - | The safety argument is that bounded | + | A bounded |
| - | The practicality argument is that BMAS may better support governance, accountability, privacy, and pluralistic alignment. If components have clearer scopes, they can be certified, audited, owned, replaced, or constrained individually. If agents only access the data required for their task, privacy and data minimisation can become architectural properties rather than afterthoughts. | + | Let $H$ be an unsafe behaviour. Suppose $H$ requires private information, an untrusted instruction channel, and external communication. If a single component has all three, then a prompt-injection path can in principle connect untrusted input to private-data exfiltration. If no component has all three, then the same harm requires crossing an interface. That interface |
| - | ===== A motivating safety example ===== | + | This is the basic logic of bounded agency. Safety is improved by removing direct causal paths from dangerous inputs to dangerous outputs. Alignment of a component remains important, but the architecture no longer relies entirely on the internal disposition of a single broad optimiser. |
| - | The paper highlights the “lethal trifecta”: | + | ===== Bounded multi-agent systems ===== |
| - | The important move is not merely adding more agents. | + | A bounded multi-agent system |
| - | ===== Research agenda ===== | + | The interfaces are central. A collection of agents without explicit interfaces can reproduce the opacity of a monolithic model at a higher level. A BMAS requires legible communication channels, explicit permissions, |
| - | BMAS raises many open questions. We need better methods for designing verifiable interfaces and contracts between agents; orchestration mechanisms for assigning tasks to bounded | + | The relevant unit of design is therefore the pair consisting of a bounded |
| - | The paper also points toward an institutional research agenda. AI systems are increasingly becoming ecosystems of agents, tools, protocols, and services. If this infrastructure emerges by default around unbounded, opaque, and concentrated systems, many governance problems will become harder. If it is designed deliberately around boundedness, | + | ===== Capability argument ===== |
| - | ===== Takeaway ===== | + | BMAS can create system-level capability without assigning broad capability to every component. This is a standard fact about organised systems. Firms, laboratories, |
| - | The central claim is simple: safe AI should not be pursued only by scaling individual models and then trying | + | The same structure applies |
| - | A bounded | + | There is also a learning argument. General models are useful for fluid problem solving: they can search, propose, and synthesize. Successful behaviours can then be crystallised into bounded agents through distillation, |
| + | |||
| + | This gives a concrete capability mechanism. | ||
| + | |||
| + | ===== Safety argument ===== | ||
| + | |||
| + | BMAS changes the structure of failure. In a monolithic system, unsafe behaviour may arise from an internal trajectory that is difficult to observe. In a BMAS, the corresponding trajectory must pass through messages, tool calls, delegation decisions, verifier outputs, and execution gates. These are observable events. | ||
| + | |||
| + | This makes monitoring more precise. A monitor can inspect whether a planner is routing around a verifier, whether a retriever is leaking irrelevant private data, whether a coder is introducing unexplained network calls, or whether an executor is acting without an approved plan. These checks are meaningful because the architecture separates roles. | ||
| + | |||
| + | The “lethal trifecta” gives the clearest example. Private data, untrusted content, and external communication form a dangerous conjunction. A system that reads private mail, browses arbitrary web pages, and sends messages can be induced to leak secrets if untrusted text controls the action channel. A BMAS can separate the three functions. The private-data agent summarizes under a restrictive contract. The untrusted-content agent works in a sandbox. The communication agent receives only approved content. A verifier mediates transfers. The unsafe path now requires a failure of the interface policy, not merely a failure of model judgment. | ||
| + | |||
| + | The same reasoning applies | ||
| + | |||
| + | ===== Verification and compositionality ===== | ||
| + | |||
| + | BMAS makes verification local. A verifier need not certify an entire intelligent system. It can certify that a proof follows | ||
| + | |||
| + | Local verification has a clear mathematical form. Suppose a component contract states that inputs in class $X$ must produce outputs in class $Y$. The verification problem is to test membership in $Y$ conditional on an input in $X$. This is easier than verifying arbitrary behaviour over the full state space of a general model. | ||
| + | |||
| + | The global problem remains compositional. If components satisfy | ||
| + | |||
| + | $$ | ||
| + | P_1 \wedge \cdots \wedge P_n \nRightarrow G. | ||
| + | $$ | ||
| + | |||
| + | A science of BMAS therefore requires composition theorems: conditions under which local guarantees survive routing, delegation, aggregation, | ||
| + | |||
| + | Redundancy also needs formal treatment. Multiple agents improve reliability only when their errors are sufficiently decorrelated. If a generator and verifier share the same blind spot, verification | ||
| + | |||
| + | ===== Governance argument ===== | ||
| + | |||
| + | BMAS gives governance concrete objects. A component can be audited. An interface can be specified. A permission can be revoked. A verifier can be benchmarked. A log can identify which agent proposed, checked, approved, and executed an action. | ||
| + | |||
| + | This matters for accountability. | ||
| + | |||
| + | Privacy also becomes architectural. Data minimisation is enforced by giving each component only the data required for its contract. A medical-data component need not communicate externally. A communication component need not inspect raw records. A planning component can operate on summaries. These restrictions reduce the harm caused | ||
| + | |||
| + | BMAS also supports distributed ownership. Data, tools, verifiers, and agents can be controlled by different parties. This matters for pluralistic alignment because different agents can represent different users, institutions, | ||
| + | |||
| + | ===== Risks specific to BMAS ===== | ||
| + | |||
| + | BMAS introduces risks that monolithic systems do not expose in the same form. | ||
| + | |||
| + | First, coordination can fail. A decomposition may omit a necessary dependency, duplicate work, or route subtasks to inappropriate specialists. | ||
| + | |||
| + | Second, interfaces can be porous. An agent may encode forbidden information in an allowed channel. A planner may smuggle instructions through a retrieval query. A verifier may approve an output outside its competence. | ||
| + | |||
| + | Third, agents can collude. Collusion is especially serious when agents share objectives, training data, or communication conventions. Monitoring must therefore inspect both content and communication patterns. | ||
| + | |||
| + | Fourth, capabilities may recombine. Even when no component individually has the capability profile required for harm, the system | ||
| + | |||
| + | These risks do not undermine the BMAS proposal. They specify its technical agenda. The object of study is the architecture-induced relation between local bounds and global behaviour. | ||
| + | |||
| + | ===== Open problems ===== | ||
| + | |||
| + | The paper identifies several problems that need theory and benchmarks. | ||
| + | |||
| + | **Task decomposition.** | ||
| + | Given a task, a resource budget, and an assurance requirement, | ||
| + | |||
| + | **Agent composition.** | ||
| + | Characterize how capabilities combine under hierarchy, debate, voting, markets, delegation, and redundancy. | ||
| + | |||
| + | **Multi-agent risk.** | ||
| + | Measure harms that arise from interaction rather than from any single component: collusion, drift, cascading failure, manipulation, | ||
| + | |||
| + | **Compositional safety.** | ||
| + | Prove conditions under which local component guarantees imply global system guarantees. | ||
| + | |||
| + | **Recoverability.** | ||
| + | Design systems whose failures are detectable, containable, | ||
| + | |||
| + | **Benchmarks.** | ||
| + | Compare BMAS and monolithic systems under matched task distributions, | ||
| + | |||
| + | ===== Conclusion ===== | ||
| + | |||
| + | BMAS treats boundedness as an architectural primitive. Bounds on information, | ||
| + | |||
| + | The research programme is precise. Identify the capability profile required by the task. Identify the capability profile required for unsafe behaviour. Design bounded agents whose composition covers the former while controlling paths to the latter. Prove that the interface rules preserve the intended safety properties. Evaluate the resulting architecture against monolithic baselines. | ||
| + | |||
| + | Safe AI requires this level of architectural analysis. Scaling determines what a model can do. BMAS determines which components may do what, with which information, | ||