Differences

This shows you the differences between two versions of the page.

--- bmas [2026/06/19 15:25] – created pedroortega
+++ bmas [2026/06/19 15:44] (current) – [Introduction] pedroortega
@@ Line 1: / Line 1: @@
 ====== Safe AI Should be Bounded and Multi-Agent ======
-**David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, Tomáš Gaveniak, Ani Calinescu, Michael Wooldridge, Fernando Rosas, Pedro Ortega**\
+**David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, Tomáš Gaveniak, Ani Calinescu, Michael Wooldridge, Fernando Rosas, Pedro Ortega**
-//Keywords: AI safety, bounded agency, multi-agent systems, modularity, governance. //\\
-Position Paper\\
-June 2026\\
-===== Summary =====
+//Keywords: AI safety, bounded agency, multi-agent systems, modularity, verification, governance. //
-The dominant story of recent AI progress is scaling: make models larger, train on more data, give them more compute, and hope that broader capability emerges. This paper argues for a complementary design principle. Instead of treating boundedness as a limitation to overcome, we should treat it as a safety-relevant feature to engineer.
+{{ bmas_position.pdf | Position Paper, }} June 2026
+===== Abstract =====
-The proposal is **Bounded Multi-Agent Systems** (BMAS): systems composed of deliberately constrained agents, connected through legible interfaces and coordination mechanisms. A bounded agent may be limited in its information, architecture, resources, tools, authority, or objectives. The point is not to make AI weak, but to create system-level capability without giving any single component unchecked general power.
+The scaling paradigm treats bounds on compute, memory, information, authority, and affordances as obstacles. This paper argues that these bounds are also design variables. A **bounded agent** is an agent whose information, architecture, resources, or affordances limit its ability to optimise its objective. A **bounded multi-agent system** (BMAS) composes such agents through explicit interfaces so that system-level capability is obtained through decomposition, delegation, verification, and coordination.
-===== Core idea =====
+The safety claim is architectural. If unsafe behaviour requires a conjunction of capabilities, then risk can be reduced by separating those capabilities across components and controlling the interfaces that recombine them. BMAS therefore turns some global safety problems into local specification, monitoring, and verification problems. The approach does not eliminate  multi-agent risk, but it makes the relevant risk-bearing structures explicit.
-A monolithic AI system tends to concentrate many capacities in one place: reasoning, planning, coding, knowledge retrieval, tool use, persuasion, and long-horizon optimization. This can be powerful, but it also produces a difficult safety problem. When many capabilities are entangled inside one opaque system, it becomes harder to interpret, monitor, verify, govern, or assign responsibility.
+===== Introduction =====
-BMAS instead asks whether capability can be distributed across specialized components. A planner can plan, a retriever can retrieve information, a coder can write code, a verifier can check outputs, and a monitor can watch interactions. Each component can be bounded by design, while the system as a whole remains capable through coordination.
+Current frontier systems concentrate heterogeneous capabilities in single models. A single model may reason, retrieve information, write code, plan over long horizons, call tools, read private data, process untrusted content, and communicate externally. This concentration creates a difficult safety problem. The same system that solves the user’s task may also possess the information and affordances required for unsafe behaviour.
-This is not a claim that multi-agent systems are always better than monolithic systems. The paper’s position is more careful: different architectures are appropriate for different tasks, budgets, risks, and assurance requirements. BMAS expands the design space for safe AI rather than replacing every existing approach.
+BMAS starts from a different decomposition. A task induces a capability profile: reasoning, knowledge, coding, planning, tool use, verification, and communication may all be required in different proportions. Unsafe behaviour also has a capability profile. Risk is highest when the capabilities required for task completion and the capabilities required for harm are colocated in one agent.
-===== Why boundedness helps =====
+The BMAS proposal is to design systems in which useful capability is distributed across bounded components. A planner plans. A retriever retrieves. A coder writes code. A verifier checks outputs. A monitor inspects interactions. An executor acts under controlled authority. And so forth. The system becomes capable through composition, while each component has a restricted scope.
-The capability argument is that many powerful systems are collective. Multicellular organisms, firms, laboratories, markets, and scientific communities achieve system-level competence through division of labor. Similarly, AI systems may be able to decompose tasks, route subtasks, allocate resources, and compose verified outputs without every component needing broad capability.
+===== Bounded agents =====
-The safety argument is that bounded systems shift some risks into more familiar engineering territory. Narrower agents may be easier to understand and predict. Explicit interfaces make interactions easier to monitor. Local verification becomes possible at component boundaries. Redundancy can make the system more robust to the failure of any single agent.
+A bounded agent has limits on at least one of four dimensions: information, computation, authority, and affordances. These limits are safety-relevant because many harms require a conjunction of these dimensions.
-The practicality argument is that BMAS may better support governance, accountability, privacy, and pluralistic alignment. If components have clearer scopes, they can be certified, audited, owned, replaced, or constrained individually. If agents only access the data required for their task, privacy and data minimisation can become architectural properties rather than afterthoughts.
+Let $H$ be an unsafe behaviour. Suppose $H$ requires private information, an untrusted instruction channel, and external communication. If a single component has all three, then a prompt-injection path can in principle connect untrusted input to private-data exfiltration. If no component has all three, then the same harm requires crossing an interface. That interface can be logged, filtered, verified, rate-limited, or blocked.
-===== A motivating safety example =====
+This is the basic logic of bounded agency. Safety is improved by removing direct causal paths from dangerous inputs to dangerous outputs. Alignment of a component remains important, but the architecture no longer relies entirely on the internal disposition of a single broad optimiser.
-The paper highlights the “lethal trifecta”: private data, untrusted content, and external communications. A single agent with access to all three can be vulnerable to prompt injection, data exfiltration, or manipulation. In a bounded architecture, these capabilities can be separated. One component may read private data, another may inspect untrusted content, another may communicate externally, and monitors or verifiers can mediate the boundaries between them.
+===== Bounded multi-agent systems =====
-The important move is not merely adding more agents. It is designing the interfaces so that dangerous combinations of information and affordances are deliberately avoided.
+A bounded multi-agent system is a collection of bounded agents with designed interfaces. The interfaces specify what information is transmitted, which actions are permitted, what evidence is recorded, and which checks must occur before execution.
-===== Research agenda =====
+The interfaces are central. A collection of agents without explicit interfaces can reproduce the opacity of a monolithic model at a higher level. A BMAS requires legible communication channels, explicit permissions, and inspectable intermediate objects. The system should expose plans, claims, code, tool calls, votes, proofs, critiques, and approvals as objects that can be monitored and evaluated.
-BMAS raises many open questions. We need better methods for designing verifiable interfaces and contracts between agents; orchestration mechanisms for assigning tasks to bounded specialists; markets, voting rules, and collective-choice procedures for aggregating outputs; and monitoring methods that can detect collusion, drift, or emergent misalignment.
+The relevant unit of design is therefore the pair consisting of a bounded component and its interface. A narrow verifier with an unrestricted communication channel is unsafe. A powerful planner with no execution authority may be acceptable. A weak executor with excessive permissions may be dangerous. Capability cannot be assessed independently of affordance.
-The paper also points toward an institutional research agenda. AI systems are increasingly becoming ecosystems of agents, tools, protocols, and services. If this infrastructure emerges by default around unbounded, opaque, and concentrated systems, many governance problems will become harder. If it is designed deliberately around boundedness, modularity, and accountability, it could support a more distributed and democratic AI ecosystem.
+===== Capability argument =====
-===== Takeaway =====
+BMAS can create system-level capability without assigning broad capability to every component. This is a standard fact about organised systems. Firms, laboratories, software systems, and scientific communities solve tasks through division of labour, memory, specialization, criticism, and aggregation.
-The central claim is simple: safe AI should not be pursued only by scaling individual models and then trying to control them after the fact. Boundedness should be part of the architecture.
+The same structure applies to AI. A coordinator decomposes a task. Specialists solve subtasks. Verifiers check intermediate outputs. Monitors inspect communication. The final answer is composed from checked parts. This architecture can improve reliability whenever decomposition produces subproblems whose solutions are easier to verify than to generate.
-A bounded multi-agent approach gives us a way to build capable systems from constrained parts, with legible interfaces, local verification, distributed authority, and clearer accountability. It does not solve AI safety by itself, but it offers a promising design principle: make the system powerful through composition, not through unbounded concentration.
+There is also a learning argument. General models are useful for fluid problem solving: they can search, propose, and synthesize. Successful behaviours can then be crystallised into bounded agents through distillation, fine-tuning, tool wrappers, cached procedures, or written skills. The resulting specialist is easier to benchmark and constrain because its task distribution is narrower.
+This gives a concrete capability mechanism. A general model discovers a procedure; a bounded component stores or executes it; a verifier checks its outputs; an orchestrator decides when to invoke it. Capability is preserved through reuse, while the broad model need not retain all authority at execution time.
+===== Safety argument =====
+BMAS changes the structure of failure. In a monolithic system, unsafe behaviour may arise from an internal trajectory that is difficult to observe. In a BMAS, the corresponding trajectory must pass through messages, tool calls, delegation decisions, verifier outputs, and execution gates. These are observable events.
+This makes monitoring more precise. A monitor can inspect whether a planner is routing around a verifier, whether a retriever is leaking irrelevant private data, whether a coder is introducing unexplained network calls, or whether an executor is acting without an approved plan. These checks are meaningful because the architecture separates roles.
+The “lethal trifecta” gives the clearest example. Private data, untrusted content, and external communication form a dangerous conjunction. A system that reads private mail, browses arbitrary web pages, and sends messages can be induced to leak secrets if untrusted text controls the action channel. A BMAS can separate the three functions. The private-data agent summarizes under a restrictive contract. The untrusted-content agent works in a sandbox. The communication agent receives only approved content. A verifier mediates transfers. The unsafe path now requires a failure of the interface policy, not merely a failure of model judgment.
+The same reasoning applies to long-horizon agency. Broad optimisers can exploit errors in their objectives. Bounded agents have restricted search spaces, limited tools, and local goals. These restrictions act as regularizers. They reduce the set of policies the agent can realize, and hence reduce the probability that misspecification is amplified into extreme behaviour.
+===== Verification and compositionality =====
+BMAS makes verification local. A verifier need not certify an entire intelligent system. It can certify that a proof follows from assumptions, that a code patch passes tests, that a retrieved document supports a claim, or that a proposed action satisfies a policy.
+Local verification has a clear mathematical form. Suppose a component contract states that inputs in class $X$ must produce outputs in class $Y$. The verification problem is to test membership in $Y$ conditional on an input in $X$. This is easier than verifying arbitrary behaviour over the full state space of a general model.
+The global problem remains compositional. If components satisfy local contracts $P_1,\ldots,P_n$, the system is safe only when the composition rule implies the desired global property $G$. In general,
+$$
+P_1 \wedge \cdots \wedge P_n \nRightarrow G.
+$$
+A science of BMAS therefore requires composition theorems: conditions under which local guarantees survive routing, delegation, aggregation, and execution. Without such theorems, modularity is an engineering heuristic rather than a safety guarantee.
+Redundancy also needs formal treatment. Multiple agents improve reliability only when their errors are sufficiently decorrelated. If a generator and verifier share the same blind spot, verification fails systematically. BMAS therefore requires diversity across models, data, objectives, tools, prompts, or evidence sources when redundancy is used as a safety mechanism.
+===== Governance argument =====
+BMAS gives governance concrete objects. A component can be audited. An interface can be specified. A permission can be revoked. A verifier can be benchmarked. A log can identify which agent proposed, checked, approved, and executed an action.
+This matters for accountability. In a monolithic system, responsibility is difficult to allocate because internal causes are entangled. In a BMAS, responsibility can attach to roles: planner, retriever, verifier, monitor, executor, data custodian, tool provider, or orchestrator. Liability and certification can then track functional responsibility rather than treating the system as an undifferentiated model.
+Privacy also becomes architectural. Data minimisation is enforced by giving each component only the data required for its contract. A medical-data component need not communicate externally. A communication component need not inspect raw records. A planning component can operate on summaries. These restrictions reduce the harm caused by compromise or misalignment of any single component.
+BMAS also supports distributed ownership. Data, tools, verifiers, and agents can be controlled by different parties. This matters for pluralistic alignment because different agents can represent different users, institutions, or normative standpoints. Aggregation can then be handled by standard mechanisms: voting, bargaining, markets, reputation, or constitutional constraints. The alignment problem becomes partly a problem of institutional design.
+===== Risks specific to BMAS =====
+BMAS introduces risks that monolithic systems do not expose in the same form.
+First, coordination can fail. A decomposition may omit a necessary dependency, duplicate work, or route subtasks to inappropriate specialists.
+Second, interfaces can be porous. An agent may encode forbidden information in an allowed channel. A planner may smuggle instructions through a retrieval query. A verifier may approve an output outside its competence.
+Third, agents can collude. Collusion is especially serious when agents share objectives, training data, or communication conventions. Monitoring must therefore inspect both content and communication patterns.
+Fourth, capabilities may recombine. Even when no component individually has the capability profile required for harm, the system may assemble that profile through delegation. The safety boundary is therefore a property of the interaction graph, not merely of the nodes.
+These risks do not undermine the BMAS proposal. They specify its technical agenda. The object of study is the architecture-induced relation between local bounds and global behaviour.
+===== Open problems =====
+The paper identifies several problems that need theory and benchmarks.
+**Task decomposition.**
+Given a task, a resource budget, and an assurance requirement, determine a decomposition into bounded agents and interfaces.
+**Agent composition.**
+Characterize how capabilities combine under hierarchy, debate, voting, markets, delegation, and redundancy.
+**Multi-agent risk.**
+Measure harms that arise from interaction rather than from any single component: collusion, drift, cascading failure, manipulation, and unsafe recombination.
+**Compositional safety.**
+Prove conditions under which local component guarantees imply global system guarantees.
+**Recoverability.**
+Design systems whose failures are detectable, containable, reversible, and repairable.
+**Benchmarks.**
+Compare BMAS and monolithic systems under matched task distributions, resource budgets, risk tolerances, and assurance requirements.
+===== Conclusion =====
+BMAS treats boundedness as an architectural primitive. Bounds on information, computation, authority, and affordances define the safety-relevant shape of an agent. Composition then determines whether the system recovers useful capability while preventing dangerous conjunctions.
+The research programme is precise. Identify the capability profile required by the task. Identify the capability profile required for unsafe behaviour. Design bounded agents whose composition covers the former while controlling paths to the latter. Prove that the interface rules preserve the intended safety properties. Evaluate the resulting architecture against monolithic baselines.
+Safe AI requires this level of architectural analysis. Scaling determines what a model can do. BMAS determines which components may do what, with which information, under which checks, and through which interfaces.