Skip to content

Observability

cognitive-swarm has zero-overhead observability when no provider is configured, and full distributed tracing via OpenTelemetry when enabled. This page covers the OTel integration internals, all span types and attributes, Jaeger setup, streaming events, and the SwarmResult observability fields.

Architecture

SwarmOrchestrator

    │  events: TypedEventEmitter<SwarmEventMap>
    │   emits signal:emitted, round:start, consensus:reached, ...


instrumentSwarm(orchestrator)

    │  Subscribes to ALL 20 event types
    │  Routes each event to SpanManager


SpanManager

    │  Maintains span hierarchy:
    │    solve → round → agent / debate / advisor


OpenTelemetry API
    │  trace.startSpan(), span.end(), span.addEvent()


OTLP Exporter → Jaeger / Grafana Tempo / etc.

The key design: tracing failures never crash the swarm. Every method in SpanManager is wrapped in try-catch with empty catch blocks.

Quick Setup

bash
npm install @cognitive-swarm/otel @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-grpc
typescript
import { SwarmOrchestrator } from '@cognitive-swarm/orchestrator'
import { instrumentSwarm } from '@cognitive-swarm/otel'
import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc'

// 1. Initialize OTel SDK (before anything else)
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4317',
  }),
})
sdk.start()

// 2. Create orchestrator
const swarm = new SwarmOrchestrator({ agents, ... })

// 3. Wrap with instrumentation
const instrumented = instrumentSwarm(swarm, {
  agentCount: agents.length,
  maxRounds: 10,
})

// 4. Use instrumented orchestrator (same API)
const result = await instrumented.solve('task')

// 5. Cleanup
instrumented.destroy()  // cleans up OTel subscriptions + orchestrator
// or: instrumented.dispose()  // OTel only, keeps orchestrator alive

instrumentSwarm API

typescript
interface InstrumentSwarmOptions {
  readonly agentCount?: number   // for span attributes
  readonly maxRounds?: number    // for span attributes
}

interface InstrumentedOrchestrator {
  solve(task: string): Promise<SwarmResult>
  solveWithStream(task: string): AsyncIterable<SwarmEvent>
  destroy(): void    // dispose OTel + destroy orchestrator
  dispose(): void    // dispose OTel only
}

function instrumentSwarm(
  orchestrator: InstrumentableOrchestrator,
  options?: InstrumentSwarmOptions,
): InstrumentedOrchestrator

The instrumentSwarm function subscribes to 20 event types on the orchestrator and creates spans/events via the SpanManager. It returns a proxy that wraps solve() and solveWithStream() with span lifecycle management.

Span Hierarchy

cognitive-swarm.solve [task]

  ├── cognitive-swarm.round [1]
  │    ├── signal:emitted (event)      [discovery from analyst]
  │    ├── signal:emitted (event)      [proposal from analyst]
  │    ├── cognitive-swarm.agent.on-signal [analyst]
  │    │    └── cognitive-swarm.tool.execute [web-search]  (if MCP tools)
  │    ├── cognitive-swarm.agent.on-signal [critic]
  │    ├── signal:delivered (event)    [to analyst]
  │    ├── consensus:reached (event)   [decided=false]
  │    └── advisor:action (event)      [inject-signal]

  ├── cognitive-swarm.round [2]
  │    ├── cognitive-swarm.debate      [proposal-a vs proposal-b]
  │    │    ├── debate:round (event)   [posteriors={...}]
  │    │    └── debate:round (event)
  │    ├── topology:updated (event)    [pruned 1 edge]
  │    └── conflict:detected (event)

  ├── cognitive-swarm.round [3]
  │    └── consensus:reached (event)   [decided=true, confidence=0.87]

  └── cognitive-swarm.synthesize

Span Types

Span NameParentCreated When
cognitive-swarm.solveRootsolve() called
cognitive-swarm.roundsolveEach round starts
cognitive-swarm.agent.on-signalroundAgent finishes processing
cognitive-swarm.tool.executeroundMCP tool call completes
cognitive-swarm.debateroundStructured debate begins
cognitive-swarm.synthesizesolveSynthesis LLM call

Event Types (attached to spans)

Event NameAttached ToWhen
signal:emittedCurrent roundAny signal published
signal:expiredCurrent roundSignal TTL expired
signal:deliveredCurrent roundSignal delivered to subscriber
agent:errorCurrent roundAgent processing error
conflict:detectedCurrent roundConflictDetector found conflict
proposal:submittedCurrent roundProposal signal created
vote:castCurrent roundVote signal created
consensus:reachedCurrent roundConsensus decided=true
consensus:failedCurrent roundConsensus decided=false
advisor:actionCurrent roundAdvisor intervened
topology:updatedCurrent roundCommunication graph changed
debate:rounddebate spanDebate round with posteriors

All Attribute Keys

The ATTR constant defines semantic conventions for cognitive-swarm:

typescript
// packages/otel/src/attributes.ts

export const ATTR = {
  // Solve-level
  TASK: 'swarm.task',                        // string, truncated to 256 chars
  AGENT_COUNT: 'swarm.agent_count',          // number
  MAX_ROUNDS: 'swarm.max_rounds',            // number
  ROUNDS_USED: 'swarm.rounds_used',          // number
  TOTAL_SIGNALS: 'swarm.total_signals',      // number
  CONSENSUS_REACHED: 'swarm.consensus_reached', // boolean
  CONFIDENCE: 'swarm.confidence',            // number 0..1
  TOKENS: 'swarm.tokens',                    // number
  COST_USD: 'swarm.cost_usd',               // number

  // Round-level
  ROUND_NUMBER: 'swarm.round.number',        // number
  ROUND_SIGNAL_COUNT: 'swarm.round.signal_count', // number

  // Agent-level
  AGENT_ID: 'swarm.agent.id',               // string
  AGENT_NAME: 'swarm.agent.name',           // string
  AGENT_STRATEGY: 'swarm.agent.strategy',   // string
  PROCESSING_TIME_MS: 'swarm.agent.processing_time_ms', // number

  // Signal-level
  SIGNAL_TYPE: 'swarm.signal.type',          // string
  SIGNAL_ID: 'swarm.signal.id',             // string

  // Tool-level
  TOOL_NAME: 'swarm.tool.name',             // string
  TOOL_IS_ERROR: 'swarm.tool.is_error',     // boolean
  TOOL_DURATION_MS: 'swarm.tool.duration_ms', // number

  // Debate-level
  DEBATE_RESOLVED: 'swarm.debate.resolved', // boolean
  DEBATE_ROUNDS: 'swarm.debate.rounds',     // number

  // Advisor-level
  ADVISOR_ACTION: 'swarm.advisor.action_type', // string

  // Topology-level
  TOPOLOGY_REASON: 'swarm.topology.reason', // string
  TOPOLOGY_NEIGHBOR_COUNT: 'swarm.topology.neighbor_count', // number
} as const

Attributes Per Span

SpanAttributes Set
solve (start)TASK, AGENT_COUNT, MAX_ROUNDS
solve (end)ROUNDS_USED, TOTAL_SIGNALS, CONSENSUS_REACHED, CONFIDENCE, TOKENS, COST_USD
round (start)ROUND_NUMBER
round (end)ROUND_SIGNAL_COUNT
agent.on-signalAGENT_ID, AGENT_STRATEGY, PROCESSING_TIME_MS, SIGNAL_ID
tool.executeTOOL_NAME, TOOL_IS_ERROR, TOOL_DURATION_MS, AGENT_ID
debate (end)DEBATE_RESOLVED, DEBATE_ROUNDS, CONFIDENCE

20 Subscribed Events

The instrumentSwarm function subscribes to exactly these 20 event types:

typescript
// packages/otel/src/instrument.ts

cleanups.push(orchestrator.on('round:start',       d => manager.onRoundStart(d)))
cleanups.push(orchestrator.on('round:end',         d => manager.onRoundEnd(d)))
cleanups.push(orchestrator.on('signal:emitted',    s => manager.onSignalEmitted(s)))
cleanups.push(orchestrator.on('signal:expired',    s => manager.onSignalExpired(s)))
cleanups.push(orchestrator.on('signal:delivered',  e => manager.onSignalDelivered(e)))
cleanups.push(orchestrator.on('agent:reacted',     r => manager.onAgentReacted(r)))
cleanups.push(orchestrator.on('agent:error',       e => manager.onAgentError(e)))
cleanups.push(orchestrator.on('tool:called',       e => manager.onToolCalled(e)))
cleanups.push(orchestrator.on('conflict:detected', c => manager.onConflictDetected(c)))
cleanups.push(orchestrator.on('proposal:submitted',p => manager.onProposalSubmitted(p)))
cleanups.push(orchestrator.on('vote:cast',         v => manager.onVoteCast(v)))
cleanups.push(orchestrator.on('consensus:reached', r => manager.onConsensusReached(r)))
cleanups.push(orchestrator.on('consensus:failed',  e => manager.onConsensusFailed(e)))
cleanups.push(orchestrator.on('advisor:action',    a => manager.onAdvisorAction(a)))
cleanups.push(orchestrator.on('debate:start',      d => manager.onDebateStart(d)))
cleanups.push(orchestrator.on('debate:round',      d => manager.onDebateRound(d)))
cleanups.push(orchestrator.on('debate:end',        r => manager.onDebateEnd(r)))
cleanups.push(orchestrator.on('topology:updated',  d => manager.onTopologyUpdated(d)))
cleanups.push(orchestrator.on('synthesis:start',   () => manager.onSynthesisStart()))
cleanups.push(orchestrator.on('synthesis:complete', d => manager.onSynthesisComplete(d)))

All subscriptions return cleanup functions. Calling dispose() or destroy() runs all cleanups.

Jaeger Setup

Docker Compose

yaml
# docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
bash
docker compose up -d jaeger

Full Working Example

typescript
import { SwarmOrchestrator } from '@cognitive-swarm/orchestrator'
import { instrumentSwarm } from '@cognitive-swarm/otel'
import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc'
import { OpenAiLlmProvider } from '@cognitive-engine/provider-openai'
import { MemoryStore } from '@cognitive-engine/store-memory'

// Initialize OTel FIRST
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4317',
  }),
})
sdk.start()

// Create swarm
const llm = new OpenAiLlmProvider({ model: 'gpt-4o-mini' })
const engine = { llm, embedding: null, store: new MemoryStore() }

const swarm = new SwarmOrchestrator({
  agents: [
    { config: { id: 'analyst', name: 'Analyst', role: '...',
        personality: { curiosity: 0.9, caution: 0.6, conformity: 0.3, verbosity: 0.7 },
        listens: ['task:new', 'proposal'], canEmit: ['discovery', 'proposal', 'vote'] },
      engine },
    { config: { id: 'critic', name: 'Critic', role: '...',
        personality: { curiosity: 0.7, caution: 0.9, conformity: 0.1, verbosity: 0.6 },
        listens: ['task:new', 'proposal'], canEmit: ['doubt', 'challenge', 'vote'] },
      engine },
  ],
  maxRounds: 5,
  consensus: { strategy: 'confidence-weighted', threshold: 0.7 },
})

// Instrument
const instrumented = instrumentSwarm(swarm, {
  agentCount: 2,
  maxRounds: 5,
})

// Solve
const result = await instrumented.solve('Should we use microservices?')
console.log(result.answer)

// Cleanup
instrumented.destroy()
await sdk.shutdown()

Open http://localhost:16686 and search for service cognitive-swarm to explore traces.

Streaming Events (Without OTel)

Without OTel, use the built-in streaming for real-time visibility:

typescript
for await (const event of swarm.solveWithStream('task')) {
  switch (event.type) {
    case 'round:start':
      console.log(`\n--- Round ${event.round} ---`)
      break

    case 'signal:emitted':
      const sig = event.signal
      console.log(`  [${sig.source}] ${sig.type} (confidence: ${sig.confidence.toFixed(2)})`)
      break

    case 'agent:reacted':
      const r = event.reaction
      console.log(`  Agent ${r.agentId} used "${r.strategyUsed}" (${r.processingTimeMs}ms)`)
      break

    case 'math:round-analysis':
      console.log(`  Math: entropy=${event.normalizedEntropy.toFixed(3)}, ` +
                  `gain=${event.informationGain.toFixed(3)}`)
      break

    case 'consensus:check':
      const c = event.result
      console.log(`  Consensus: decided=${c.decided}, confidence=${c.confidence.toFixed(2)}`)
      break

    case 'advisor:action':
      console.log(`  Advisor: ${event.advice.type}`)
      break

    case 'debate:start':
      console.log(`  Debate: ${event.proposalA} vs ${event.proposalB}`)
      break

    case 'evolution:spawned':
      console.log(`  Spawned: ${event.domain} - ${event.reason}`)
      break

    case 'evolution:dissolved':
      console.log(`  Dissolved: ${event.agentId} - ${event.reason}`)
      break

    case 'solve:complete':
      const res = event.result
      console.log(`\nDone: ${res.timing.roundsUsed} rounds, ` +
                  `$${res.cost.estimatedUsd.toFixed(4)}, ` +
                  `confidence=${res.confidence.toFixed(2)}`)
      break
  }
}

SwarmResult Observability Fields

The SwarmResult contains full observability data without any external tooling:

typescript
const result = await swarm.solve('task')

// Math analysis (28 modules)
result.mathAnalysis.entropy.history            // per-round entropy values
result.mathAnalysis.entropy.normalized         // final normalized entropy
result.mathAnalysis.freeEnergy?.history        // per-round free energy
result.mathAnalysis.freeEnergy?.converged      // did F converge?
result.mathAnalysis.surprise?.history          // per-round surprise
result.mathAnalysis.surprise?.mostInformativeAgent
result.mathAnalysis.stoppingReason             // why the swarm stopped

// Consensus (full voting record, dissent preserved)
result.consensus.decided                       // boolean
result.consensus.confidence                    // 0..1
result.consensus.votingRecord                  // every vote with reasoning
result.consensus.dissent                       // reasoning from dissenters
result.consensus.reasoning                     // human-readable explanation

// Per-agent contributions
for (const [agentId, contrib] of result.agentContributions) {
  console.log(agentId,
    'signals:', contrib.signalsEmitted,
    'proposals:', contrib.proposalsMade,
    'avgConf:', contrib.avgConfidence.toFixed(2))
}

// Advisor actions
result.advisorReport?.actions                  // all interventions taken

// Debate results
for (const debate of result.debateResults) {
  console.log(`Debate: resolved=${debate.resolved}, ` +
              `rounds=${debate.roundsUsed}, ` +
              `winner=${debate.winningProposalId}`)
}

// Evolution
result.evolutionReport?.spawned                // agents spawned
result.evolutionReport?.dissolved              // agents dissolved
result.evolutionReport?.activeEvolvedCount     // still active

// Cost and timing
result.cost.tokens                             // total tokens used
result.cost.estimatedUsd                       // estimated cost (configurable via costPerToken)
result.timing.totalMs                          // wall clock time
result.timing.roundsUsed                       // rounds completed

// Full signal log
result.signalLog                               // every signal ever published

SpanManager Internals

The SpanManager maintains active span references:

typescript
class SpanManager {
  private solveSpan: Span | undefined           // root span
  private readonly roundSpans = new Map<number, Span>()  // round → span
  private readonly agentSpans = new Map<string, Span>()  // key → span (for tool children)
  private debateSpan: Span | undefined
  private synthesisSpan: Span | undefined

  // getCurrentRoundSpan() returns the span with the highest round number
  // This is the parent for events/child spans within a round
}

The cleanup() method ends all orphaned spans (e.g., if solve was interrupted). This is called automatically on errors in the instrumented solve() wrapper.

Released under the Apache 2.0 License.