@cognitive-swarm/agent
LLM-powered swarm agents with Thompson Sampling strategy selection. Each agent is a full cognitive-engine pipeline with a swarm-specific reaction layer.
Install
npm install @cognitive-swarm/agent
SwarmAgent
Each agent wraps a CognitiveOrchestrator (perception, memory, emotions, reasoning, metacognition) and adds swarm-specific behavior: signal routing, strategy selection via Thompson Sampling, and personality-driven reactions.
import { SwarmAgent } from '@cognitive-swarm/agent'
const agent = new SwarmAgent(
orchestrator, // CognitiveOrchestrator from @cognitive-engine
bandit, // ThompsonBandit from @cognitive-engine/bandit
config, // SwarmAgentConfig
toolSupport, // optional: AgentToolSupport (MCP tools)
)
Agents are created internally by SwarmOrchestrator from SwarmAgentDef objects -- you don't need to construct them directly.
Agent Lifecycle
Creation → Signal Reception → Strategy Selection → Execution → Signal Emission → Feedback → Repeat
1. Creation: SwarmOrchestrator creates agents from SwarmAgentDef objects. Each gets its own CognitiveOrchestrator, ThompsonBandit, PersonalityFilter, and BeliefModel.
2. Signal Reception: SignalBus routes signals based on listens. The agent checks shouldReact().
3. Strategy Selection: The Thompson Sampling bandit picks a strategy from the context vector.
4. Execution: CognitiveOrchestrator processes a prompt built from the signal, strategy, history, and ToM context.
5. Signal Emission: The response is mapped to typed output signals, filtered by canEmit.
6. Feedback: After consensus, the orchestrator calls recordFeedback() to update bandit rewards.
7. Repeat steps 2-6 until consensus or maxRounds is reached.
SwarmAgentConfig
interface SwarmAgentConfig {
readonly id: string
readonly name: string
readonly role: string // system prompt injected into cognitive pipeline
readonly personality: PersonalityVector // drives reaction behavior
readonly listens: readonly SignalType[] // which signals this agent processes
readonly canEmit: readonly SignalType[] // which signals this agent can emit
readonly weight?: number // vote weight, default: 1.0
readonly maxConcurrentSignals?: number // default: 1
readonly reactionDelayMs?: number // artificial delay, default: 0
readonly strategyActions?: readonly AgentStrategyId[] // which strategies bandit selects from
readonly tools?: AgentToolConfig // MCP tool access
readonly onError?: ErrorHandler
}
Configuration Recommendations by Task Type
| Task Type | Key Config | Reasoning |
|---|---|---|
| Analysis / Research | maxConcurrentSignals: 1, default strategies | Sequential processing ensures thorough analysis |
| Code Review | strategyActions: ['analyze', 'challenge', 'support'] | No synthesis needed -- each agent reviews independently |
| Decision Making | weight: 2.0 for judge agent | Judge's synthesis carries more authority |
| Brainstorming | reactionDelayMs: 0, maxConcurrentSignals: 3 | Fast, parallel idea generation |
| Safety-Critical | strategyActions: ['analyze', 'challenge', 'defer'] | Never auto-support -- force explicit analysis |
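The optional tuning fields fall back to the defaults documented in SwarmAgentConfig (weight 1.0, maxConcurrentSignals 1, reactionDelayMs 0). The sketch below shows how two rows of the table resolve under those defaults. TuningConfig and resolveDefaults are hypothetical names used only for illustration; they are not part of the package.

```typescript
// Hypothetical, trimmed-down view of the tuning fields in SwarmAgentConfig.
interface TuningConfig {
  readonly weight?: number
  readonly maxConcurrentSignals?: number
  readonly reactionDelayMs?: number
}

// Illustrative helper (not a package export): fills in the documented defaults.
function resolveDefaults(config: TuningConfig): Required<TuningConfig> {
  return {
    weight: config.weight ?? 1.0,
    maxConcurrentSignals: config.maxConcurrentSignals ?? 1,
    reactionDelayMs: config.reactionDelayMs ?? 0,
  }
}

// Brainstorming row: fast, parallel idea generation.
const brainstormer = resolveDefaults({ reactionDelayMs: 0, maxConcurrentSignals: 3 })

// Decision-making row: the judge's vote carries double weight.
const judge = resolveDefaults({ weight: 2.0 })
```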
PersonalityVector
The personality vector drives how an agent reacts to signals. It shapes the probability distribution over strategies the Thompson Sampling bandit samples from, and also controls the PersonalityFilter that gates which signals the agent processes.
interface PersonalityVector {
readonly curiosity: number // 0..1 - drives explore/discover behavior
readonly caution: number // 0..1 - drives doubt/challenge behavior
readonly conformity: number // 0..1 - drives agree/support behavior
readonly verbosity: number // 0..1 - drives signal emission volume
}
How Each Dimension Affects Behavior
All four dimensions are included in the context vector passed to Thompson Sampling. They also control the PersonalityFilter:
| Dimension | Bandit Bias | PersonalityFilter Rule | High Value Effect | Low Value Effect |
|---|---|---|---|---|
| Curiosity | analyze, propose | <= 0.3: ignores discovery signals | Reacts to more discoveries | Narrow focus |
| Caution | challenge, defer | > 0.7: skips signals with confidence < 0.4 | Selective, confident-only | Engages with speculation |
| Conformity | support | >= 0.8: ignores challenge/doubt signals | Agrees, avoids dissent | Pushes back, challenges |
| Verbosity | signal volume | (no filter rule) | Longer, detailed responses | Concise output |
Examples
// A strong critic: high caution, low conformity
const critic = {
curiosity: 0.6,
caution: 0.9,
conformity: 0.1,
verbosity: 0.5,
}
// An explorer: high curiosity, low caution
const explorer = {
curiosity: 0.95,
caution: 0.3,
conformity: 0.4,
verbosity: 0.8,
}
// A consensus builder: high conformity
const synthesizer = {
curiosity: 0.5,
caution: 0.4,
conformity: 0.9,
verbosity: 0.6,
}
Agent Strategies
Each agent selects a strategy per signal using a Thompson Sampling bandit. The bandit adapts over time -- strategies that produce signals influencing consensus get rewarded.
type AgentStrategyId =
| 'analyze' // deep analysis of the current state
| 'propose' // emit a concrete proposal
| 'challenge' // emit a challenge to an existing proposal
| 'support' // emit a vote of agreement
| 'synthesize' // emit a synthesis of multiple signals
| 'defer' // abstain (useful when uncertain)
All 6 Strategies Explained
analyze -- Outputs discovery signals. The agent examines the incoming signal and reports factual findings, patterns, or relevant information. Does not take a position -- just provides data.
propose -- Outputs proposal signals. The agent formulates a concrete solution, course of action, or answer. Each proposal gets a unique proposalId that other agents can vote on.
challenge -- Outputs challenge or doubt signals. The agent critically examines the incoming signal, identifies weaknesses, risks, or alternative interpretations. Essential for preventing groupthink.
support -- Outputs vote signals. The agent evaluates the incoming signal and casts a vote (agree/disagree/abstain). The vote stance is determined by the LLM response's confidence level:
- Confidence >= 0.6 -> agree
- Confidence <= 0.3 -> disagree
- Between 0.3 and 0.6 -> abstain
synthesize -- Outputs proposal signals. Similar to propose, but the prompt specifically asks the LLM to combine insights from multiple sources into a coherent whole. Used by synthesizer/judge agents.
defer -- No output signals. The agent abstains from responding. Useful when the agent is uncertain or when the signal is outside its expertise. No LLM call is made.
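The confidence thresholds that the support strategy uses to pick a vote stance can be written as a small pure function. This is a minimal sketch of the rule as stated above; the function name stanceFromConfidence is an assumption, not the package's API.

```typescript
type VoteStance = 'agree' | 'disagree' | 'abstain'

// Maps an LLM response confidence (0..1) to a vote stance using the
// thresholds listed for the support strategy. Illustrative name only.
function stanceFromConfidence(confidence: number): VoteStance {
  if (confidence >= 0.6) return 'agree'
  if (confidence <= 0.3) return 'disagree'
  return 'abstain' // strictly between 0.3 and 0.6
}
```

Both boundary values are inclusive on the decisive side: exactly 0.6 agrees and exactly 0.3 disagrees.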
Strategy Output Types
Each strategy maps to specific signal types:
| Strategy | Output Signal Types | Filtered by canEmit |
|---|---|---|
| analyze | discovery | Yes |
| propose | proposal | Yes |
| challenge | challenge, doubt | Yes |
| support | vote | Yes |
| synthesize | proposal | Yes |
| defer | (none) | N/A |
Fallback behavior: If a strategy's output type is not in the agent's canEmit list, the agent falls back to the closest allowed type: discovery -> challenge -> doubt -> proposal -> vote.
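The mapping and fallback can be sketched as follows. Two things here are assumptions: the fallback chain is read as a plain priority list (the first entry that appears in canEmit wins), and resolveOutputTypes is a hypothetical helper name. The package's actual "closest type" logic may differ. SignalType is narrowed to the five agent-emitted types for brevity.

```typescript
type SignalType = 'discovery' | 'proposal' | 'challenge' | 'doubt' | 'vote'
type AgentStrategyId = 'analyze' | 'propose' | 'challenge' | 'support' | 'synthesize' | 'defer'

// Strategy -> output signal types, copied from the table above.
const STRATEGY_OUTPUTS: Record<AgentStrategyId, readonly SignalType[]> = {
  analyze: ['discovery'],
  propose: ['proposal'],
  challenge: ['challenge', 'doubt'],
  support: ['vote'],
  synthesize: ['proposal'],
  defer: [],
}

// Fallback order as listed in the text, treated as a priority list (assumption).
const FALLBACK_ORDER: readonly SignalType[] = ['discovery', 'challenge', 'doubt', 'proposal', 'vote']

// Hypothetical helper: which signal types may this strategy emit for this agent?
function resolveOutputTypes(strategy: AgentStrategyId, canEmit: readonly SignalType[]): SignalType[] {
  const wanted = STRATEGY_OUTPUTS[strategy]
  if (wanted.length === 0) return [] // defer emits nothing
  const allowed = wanted.filter((t) => canEmit.includes(t))
  if (allowed.length > 0) return allowed
  // None of the strategy's own types are allowed: take the first permitted fallback.
  const fallback = FALLBACK_ORDER.find((t) => canEmit.includes(t))
  return fallback === undefined ? [] : [fallback]
}
```

For example, an agent with canEmit: ['doubt', 'vote'] that selects challenge emits only doubt, while the same agent selecting propose falls back to doubt because proposal is not allowed.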
Strategy Selection Mechanics: Thompson Sampling
Thompson Sampling is a Bayesian approach to the multi-armed bandit problem. Here is how it picks strategies:
1. Context vector is built from the incoming signal:
[signal.confidence, curiosity, caution, conformity, verbosity, ...signalTypeOneHot]
The one-hot encoding covers all 11 signal types (task:new, discovery, proposal, doubt, challenge, vote, conflict, consensus:reached, escalate, memory:shared, tool:result).
2. Bandit samples from its posterior distribution for each available strategy, given the context.
3. Highest-sampled strategy wins. Because Thompson Sampling draws randomly from the posteriors, it naturally balances exploration (trying less-proven strategies) against exploitation (picking the best-known strategy).
4. After consensus, the orchestrator calls recordFeedback() to update the bandit:
- Strategies that led to signals incorporated in the consensus get reward ~1.0
- Strategies that were ignored or led to rejected positions get reward ~0.0
- Over time, the bandit learns which strategies work best for each signal type and personality combination
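The steps above can be sketched in two parts: building the 16-element context vector, and a deliberately simplified sampler. The sampler below is not the ThompsonBandit from @cognitive-engine/bandit. It is a non-contextual Gaussian Thompson Sampling stand-in that keeps one posterior per strategy and ignores the context vector, which is enough to show the sample-then-argmax mechanic. buildContext, createSimpleBandit, and the injectable rng parameter are all hypothetical names.

```typescript
const SIGNAL_TYPES = [
  'task:new', 'discovery', 'proposal', 'doubt', 'challenge', 'vote',
  'conflict', 'consensus:reached', 'escalate', 'memory:shared', 'tool:result',
] as const
type SignalType = (typeof SIGNAL_TYPES)[number]

interface PersonalityVector {
  readonly curiosity: number
  readonly caution: number
  readonly conformity: number
  readonly verbosity: number
}

// Step 1: [confidence, 4 personality dims, 11-way one-hot] = 16 numbers.
function buildContext(confidence: number, p: PersonalityVector, type: SignalType): number[] {
  const oneHot = SIGNAL_TYPES.map((t): number => (t === type ? 1 : 0))
  return [confidence, p.curiosity, p.caution, p.conformity, p.verbosity, ...oneHot]
}

// Steps 2-4, simplified: one Gaussian posterior N(mean, 1 / (n + 1)) per strategy.
function createSimpleBandit(rng: () => number = Math.random) {
  const stats = new Map<string, { n: number; mean: number }>()

  const arm = (action: string) => {
    let s = stats.get(action)
    if (s === undefined) {
      s = { n: 0, mean: 0 }
      stats.set(action, s)
    }
    return s
  }

  // Box-Muller transform; 1 - rng() keeps the log argument in (0, 1].
  const standardNormal = (): number => {
    const u1 = 1 - rng()
    const u2 = rng()
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
  }

  return {
    // Draw one posterior sample per action and return the action with the largest draw.
    select(actions: readonly string[]): string {
      let best: string | undefined
      let bestSample = -Infinity
      for (const action of actions) {
        const { n, mean } = arm(action)
        const sample = mean + standardNormal() / Math.sqrt(n + 1)
        if (sample > bestSample) {
          bestSample = sample
          best = action
        }
      }
      if (best === undefined) throw new Error('no actions to select from')
      return best
    },
    // Incremental mean update with a reward in roughly [0, 1].
    update(action: string, reward: number): void {
      const s = arm(action)
      s.n += 1
      s.mean += (reward - s.mean) / s.n
    },
  }
}
```

Rarely tried strategies have a wide posterior and so still win some draws, which is where the exploration comes from. As a strategy accumulates feedback its posterior narrows around its observed mean reward.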
Restricting Strategies
{
config: {
id: 'critic',
strategyActions: ['analyze', 'challenge', 'defer'],
// This agent never proposes or supports - only analyzes and challenges
}
}
Common restrictions:
| Agent Role | Recommended Strategies | Rationale |
|---|---|---|
| Pure critic | analyze, challenge, defer | Should never agree without analysis |
| Judge/synthesizer | synthesize, support, defer | Should combine, not explore |
| Explorer | analyze, propose | Should generate, not evaluate |
| Voter only | support, defer | Cast votes, do not produce content |
Belief Model (Theory of Mind)
Each agent maintains a BeliefModel that tracks other agents' mental states (L1 Theory of Mind). Updated from received signals.
interface AgentBeliefState {
readonly stances: ReadonlyMap<string, 'agree' | 'disagree' | 'abstain'>
readonly dominantOutputTypes: readonly SignalType[]
readonly recentConcerns: readonly string[]
readonly avgConfidence: number
readonly signalCount: number
}
How it Works
The BeliefModel incrementally updates from every signal the agent receives:
- Stances: When another agent emits a vote signal, the belief model records their stance (agree/disagree/abstain) keyed by proposalId.
- Dominant output types: Tracks a histogram of signal types emitted by each other agent. The top 3 most frequent types are reported as dominantOutputTypes.
- Recent concerns: Extracts text from challenge and doubt signals (last 5 per agent). These are specific worries or objections the other agent has raised.
- Average confidence: Running mean of signal.confidence across all signals from that agent.
Theory of Mind in Practice
The belief model generates a prompt section that is injected into the agent's LLM call:
OTHER AGENTS' POSITIONS (anticipate their objections, address directly):
[agent-critic] voted: 0 agree, 2 disagree | mainly: challenge | concern: "SQL injection risk in user input handling" | avg confidence: 0.72
[agent-explorer] voted: 1 agree, 0 disagree | mainly: discovery | avg confidence: 0.85
This enables agents to:
- Anticipate objections from known critics
- Build on discoveries from explorers
- Address specific concerns rather than talking past each other
- Adjust their approach based on the group's current state
Capacity Limits
- Maximum 20 agents tracked per belief model (oldest evicted on overflow)
- Maximum 5 recent concerns stored per tracked agent
- Agents with fewer than 2 observed signals are excluded from the Theory of Mind prompt
- Concern text is truncated to 150 characters
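A minimal sketch of the incremental update together with these capacity limits is shown below. It is not the package's BeliefModel. ObservedSignal is a hypothetical, simplified signal shape, createBeliefModel and its methods are illustrative names, and "oldest evicted" is interpreted here as the first agent that was added, which is an assumption.

```typescript
type Stance = 'agree' | 'disagree' | 'abstain'

// Hypothetical, simplified signal shape; the real Signal type has more fields.
interface ObservedSignal {
  readonly source: string
  readonly type: string
  readonly confidence: number
  readonly text?: string // concern text on challenge/doubt signals
  readonly proposalId?: string // set on vote signals
  readonly stance?: Stance // set on vote signals
}

interface Belief {
  stances: Map<string, Stance>
  typeCounts: Map<string, number>
  recentConcerns: string[]
  avgConfidence: number
  signalCount: number
}

const MAX_AGENTS = 20
const MAX_CONCERNS = 5
const MAX_CONCERN_CHARS = 150

function createBeliefModel() {
  const beliefs = new Map<string, Belief>() // Map iterates in insertion order

  function observe(signal: ObservedSignal): void {
    let b = beliefs.get(signal.source)
    if (b === undefined) {
      if (beliefs.size >= MAX_AGENTS) {
        // Evict the agent that was added first (assumed meaning of "oldest").
        const oldest = beliefs.keys().next().value
        if (oldest !== undefined) beliefs.delete(oldest)
      }
      b = { stances: new Map(), typeCounts: new Map(), recentConcerns: [], avgConfidence: 0, signalCount: 0 }
      beliefs.set(signal.source, b)
    }
    b.signalCount += 1
    b.avgConfidence += (signal.confidence - b.avgConfidence) / b.signalCount // running mean
    b.typeCounts.set(signal.type, (b.typeCounts.get(signal.type) ?? 0) + 1)
    if (signal.type === 'vote' && signal.proposalId !== undefined && signal.stance !== undefined) {
      b.stances.set(signal.proposalId, signal.stance)
    }
    if ((signal.type === 'challenge' || signal.type === 'doubt') && signal.text !== undefined) {
      b.recentConcerns.push(signal.text.slice(0, MAX_CONCERN_CHARS))
      if (b.recentConcerns.length > MAX_CONCERNS) b.recentConcerns.shift()
    }
  }

  // Top 3 signal types by frequency for one tracked agent.
  function dominantOutputTypes(agentId: string): string[] {
    const counts = beliefs.get(agentId)?.typeCounts ?? new Map<string, number>()
    return Array.from(counts.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, 3)
      .map(([type]) => type)
  }

  return {
    observe,
    dominantOutputTypes,
    get: (agentId: string): Belief | undefined => beliefs.get(agentId),
    size: (): number => beliefs.size,
    // Agents with fewer than 2 observed signals stay out of the ToM prompt.
    includedInPrompt: (agentId: string): boolean => (beliefs.get(agentId)?.signalCount ?? 0) >= 2,
  }
}
```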
Personality Filter
The PersonalityFilter determines whether an agent should react to a given signal based on its personality vector. It runs before the Thompson Sampling bandit, acting as a fast pre-filter.
Filter rules:
| Condition | Result |
|---|---|
| caution > 0.7 AND signal.confidence < 0.4 | Skip (too uncertain for cautious agent) |
| Signal type is challenge or doubt AND conformity >= 0.8 | Skip (conformist ignores dissent) |
| Signal type is discovery AND curiosity <= 0.3 | Skip (not curious enough) |
| Otherwise | React |
Note: These rules interact with each other. An agent with caution: 0.8, conformity: 0.9, curiosity: 0.2 will skip signals with confidence below 0.4, all challenges/doubts, AND all discoveries -- so of the five agent-emitted signal types it reacts only to proposals and votes, and only when their confidence is at least 0.4.
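The three filter rules translate directly into a predicate. The sketch below mirrors the table rather than the package's PersonalityFilter source. personalityAllows and the trimmed IncomingSignal shape are hypothetical names.

```typescript
interface PersonalityVector {
  readonly curiosity: number
  readonly caution: number
  readonly conformity: number
  readonly verbosity: number
}

// Hypothetical minimal signal shape: only the fields the filter inspects.
interface IncomingSignal {
  readonly type: string
  readonly confidence: number
}

// Returns true when the agent should react, false when the signal is skipped.
function personalityAllows(p: PersonalityVector, s: IncomingSignal): boolean {
  // Cautious agents skip uncertain signals.
  if (p.caution > 0.7 && s.confidence < 0.4) return false
  // Conformist agents ignore dissent.
  if ((s.type === 'challenge' || s.type === 'doubt') && p.conformity >= 0.8) return false
  // Incurious agents ignore discoveries.
  if (s.type === 'discovery' && p.curiosity <= 0.3) return false
  return true
}
```

Note that verbosity never appears in the predicate, matching the "(no filter rule)" entry for that dimension.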
Signal Processing Pipeline
The full pipeline when an agent receives a signal:
1. shouldReact(signal)
├── Is this signal from myself? → skip
├── Am I at maxConcurrentSignals? → skip
├── Is signal.type in my listens list? → if not, skip
└── PersonalityFilter.shouldReact(signal) → if false, skip
2. selectStrategy(signal)
├── Build context vector: [confidence, curiosity, caution, conformity, verbosity, ...typeOneHot]
├── Filter available strategies to config.strategyActions
└── Thompson Sampling bandit.select(context, actions) → strategy
3. executeStrategy(signal, strategy)
├── If strategy === 'defer' → return []
├── Build prompt from role + history + ToM + signal + strategy
├── If tools configured → tool loop (up to maxToolCalls iterations)
└── CognitiveOrchestrator.process(agentId, prompt) → CognitiveResponse
4. buildOutputSignals(signal, strategy, response)
├── Map strategy → allowed output signal types
├── Filter by canEmit (with fallback to closest allowed type)
├── Build typed payload (discovery/proposal/challenge/doubt/vote)
├── Infer causal level (correlation/intervention/counterfactual)
└── Return Signal[]
Causal Level Inference
Each output signal is tagged with a causal reasoning level based on Pearl's Ladder of Causation:
| Strategy / Output | Causal Level | Meaning |
|---|---|---|
| analyze, support | correlation | "We observe X correlating with Y" |
| propose, synthesize | intervention | "If we do X, then Y" |
| challenge, doubt | counterfactual | "What if this assumption is wrong?" |
This metadata can be used by downstream systems (e.g., evolution, composer) to assess the depth of reasoning.
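As a lookup keyed by strategy, the table could look like the sketch below. This is an assumption about shape only: the real inference may also consider the output signal type (the table lists doubt, which is an output type rather than a strategy), and causalLevelFor is a hypothetical name.

```typescript
type CausalLevel = 'correlation' | 'intervention' | 'counterfactual'
type AgentStrategyId = 'analyze' | 'propose' | 'challenge' | 'support' | 'synthesize' | 'defer'

// Strategy -> causal level, following the table above.
// defer emits no signals, so it has no level to tag.
const CAUSAL_LEVEL: Record<AgentStrategyId, CausalLevel | undefined> = {
  analyze: 'correlation',
  support: 'correlation',
  propose: 'intervention',
  synthesize: 'intervention',
  challenge: 'counterfactual',
  defer: undefined,
}

function causalLevelFor(strategy: AgentStrategyId): CausalLevel | undefined {
  return CAUSAL_LEVEL[strategy]
}
```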
Tool Support
Agents can use external tools via MCP (Model Context Protocol):
interface AgentToolConfig {
readonly servers: readonly McpServerConfig[]
readonly maxToolCalls?: number // default: 3 per reaction
readonly toolTimeoutMs?: number // default: 30_000
readonly personalityGating?: boolean // high-caution agents use tools more conservatively
}
Tool calls loop: prompt with tools -> LLM response -> parse tool calls -> execute -> emit tool:result signals -> follow-up prompt with results -> repeat until no more calls or maxToolCalls exhausted.
Tool results are broadcast as tool:result signals visible to all agents.
agentDef({
id: 'researcher',
name: 'Web Researcher',
role: 'Search the web for relevant information to verify claims',
personality: 'analytical',
listens: ['task:new', 'proposal'],
canEmit: ['discovery', 'vote'],
tools: {
servers: [{ uri: 'http://localhost:3001/mcp', name: 'web-search' }],
maxToolCalls: 3,
toolTimeoutMs: 15_000,
},
}, { engine })
Concurrency Handling
Agents process signals sequentially by default (maxConcurrentSignals: 1). This prevents race conditions in the belief model and signal history.
If you increase maxConcurrentSignals, be aware:
- Multiple signals may be processed simultaneously
- The belief model updates are not atomic (but are append-only, so this is safe)
- Signal history may interleave (order is preserved per-signal but not across concurrent signals)
- The activeTasks counter gates concurrency -- if at capacity, shouldReact() returns false
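The counter-based gate can be sketched as below. This illustrates the idea only and is not the agent's actual implementation. createConcurrencyGate, hasCapacity, and run are hypothetical names.

```typescript
// Hypothetical gate mirroring the activeTasks counter described above.
function createConcurrencyGate(maxConcurrentSignals = 1) {
  let activeTasks = 0
  return {
    // What a shouldReact()-style check would consult before accepting a signal.
    hasCapacity: (): boolean => activeTasks < maxConcurrentSignals,

    // Runs the task if a slot is free; resolves to undefined when at capacity.
    async run<T>(task: () => Promise<T>): Promise<T | undefined> {
      if (activeTasks >= maxConcurrentSignals) return undefined
      activeTasks += 1 // claimed synchronously, before the first await
      try {
        return await task()
      } finally {
        activeTasks -= 1 // always release the slot, even if the task throws
      }
    },
  }
}
```

Because the slot is claimed before the first await, a second signal arriving in the same tick already sees the gate as full.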
// Sequential (default, safest)
{ maxConcurrentSignals: 1 }
// Parallel processing (for high-throughput scenarios)
{ maxConcurrentSignals: 3 }
Cross-Round Context
Agents maintain a signal history (last 20 signals, self-signals excluded) that provides context across rounds. The prompt includes summaries of what other agents said, enabling agents to build on discoveries, reference specific prior claims, and avoid duplicate analysis.
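A bounded history with self-signals excluded can be sketched as a small buffer. HistorySignal, createSignalHistory, and the capacity parameter are hypothetical; only the limit of 20 and the self-exclusion rule come from the description above.

```typescript
// Hypothetical minimal signal shape for the history buffer.
interface HistorySignal {
  readonly id: string
  readonly source: string
}

function createSignalHistory(selfId: string, capacity = 20) {
  const items: HistorySignal[] = []
  return {
    record(signal: HistorySignal): void {
      if (signal.source === selfId) return // self-signals are excluded
      items.push(signal)
      if (items.length > capacity) items.shift() // drop the oldest entry
    },
    // Oldest first, newest last.
    recent: (): readonly HistorySignal[] => items,
  }
}
```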
AgentReaction
The result of an agent processing a signal:
interface AgentReaction {
readonly agentId: string
readonly inResponseTo: string // signal ID that triggered this reaction
readonly signals: readonly Signal[] // signals emitted in response
readonly strategyUsed: AgentStrategyId
readonly processingTimeMs: number
}
On error: If the cognitive pipeline throws, the agent catches the error, calls onError, and returns an empty reaction with strategyUsed: 'defer' and signals: []. The swarm continues without this agent's contribution for that round.
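The error path can be sketched as a wrapper around the reaction. reactSafely is a hypothetical helper, and the onError signature shown here (a single unknown argument) is an assumption, since the ErrorHandler type is not spelled out in this document. AgentReaction is redeclared locally with signals typed loosely so the block stands alone.

```typescript
type AgentStrategyId = 'analyze' | 'propose' | 'challenge' | 'support' | 'synthesize' | 'defer'

// Local copy of the AgentReaction shape; signals are left untyped for brevity.
interface AgentReaction {
  readonly agentId: string
  readonly inResponseTo: string
  readonly signals: readonly unknown[]
  readonly strategyUsed: AgentStrategyId
  readonly processingTimeMs: number
}

// Hypothetical wrapper: never rejects, so one failing agent cannot stall a round.
async function reactSafely(
  agentId: string,
  signalId: string,
  run: () => Promise<AgentReaction>,
  onError?: (error: unknown) => void, // assumed handler signature
): Promise<AgentReaction> {
  const startedAt = Date.now()
  try {
    return await run()
  } catch (error) {
    onError?.(error)
    // Empty reaction: the swarm continues without this agent's contribution.
    return {
      agentId,
      inResponseTo: signalId,
      signals: [],
      strategyUsed: 'defer',
      processingTimeMs: Date.now() - startedAt,
    }
  }
}
```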
AgentContribution
Tracked across a full solve:
interface AgentContribution {
readonly agentId: string
readonly signalsEmitted: number
readonly proposalsMade: number
readonly challengesMade: number
readonly votesCast: number
readonly avgConfidence: number
}
Available in SwarmResult.agentContributions.
Debug Traces
Use SwarmResult.agentContributions and @cognitive-swarm/introspection to understand agent decisions.
Common patterns to look for:
- Agent with 0 signals emitted: check listens, canEmit, and personality filter thresholds
- Agent always defers: bandit has not learned useful strategies yet
- Agent challenges everything: caution too high or conformity too low
- Agent agrees with everything: conformity too high (>= 0.8 blocks challenges)
Example: Creating a Specialized Expert Agent
import { agentDef } from '@cognitive-swarm/templates'
// A security expert that only analyzes and challenges, never proposes
const securityExpert = agentDef({
id: 'security-expert',
name: 'Security Expert',
role: 'Identify security vulnerabilities, OWASP Top 10, injection risks, auth bypass, data exposure. Never approve code with unvalidated input.',
personality: {
curiosity: 0.5, // moderately interested in discoveries
caution: 0.85, // very cautious -- skips low-confidence signals
conformity: 0.15, // very non-conformist -- always challenges
verbosity: 0.6, // detailed but not excessive
},
listens: ['task:new', 'proposal', 'discovery'],
canEmit: ['challenge', 'doubt', 'discovery', 'vote'],
weight: 1.5, // security concerns get extra weight
strategyActions: ['analyze', 'challenge', 'defer'], // never proposes or supports
}, { engine })
Why these choices:
- caution: 0.85 -- above the 0.7 threshold, so it skips signals with confidence below 0.4. A security reviewer should only comment on things it is confident about.
- conformity: 0.15 -- well below 0.8, so it engages with challenges and doubts from other agents.
- strategyActions excludes propose and support -- this agent should only find problems, not suggest fixes (leave that to the fixer agent).
- weight: 1.5 -- security issues should weigh more heavily in consensus.