Local Agentic System — Python

SecondMind
An AI that runs on your own machine.

Built from scratch over 6 months with no prior coding experience — a fully local, self-governing AI system made of 9 specialized agents that collaborate, remember, audit themselves, and improve over time. No cloud. No subscriptions. No data leaving your computer.

Python · Multi-Agent · RAG · Local LLM · Metaprogramming · AST Audit · RTX 3090
🤖

Multi-Agent Architecture

9 specialized agents with unique, isolated roles — each one replaceable without breaking the system. One central orchestrator coordinates them all.
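The hub-and-spoke routing can be sketched in a few lines. This is an illustrative minimal version, not the actual SecondMind API: agents register under a role, the hub dispatches to exactly one handler, and replacing an agent is just re-registering its role.

```python
class Orchestrator:
    """Minimal hub: one registry, one dispatch point (illustrative)."""

    def __init__(self):
        self._agents = {}  # role -> handler callable

    def register(self, role, handler):
        # Re-registering a role hot-swaps the agent; nothing else changes.
        self._agents[role] = handler

    def dispatch(self, role, payload):
        if role not in self._agents:
            raise KeyError(f"no agent registered for role {role!r}")
        return self._agents[role](payload)


hub = Orchestrator()
hub.register("summarize", lambda text: text[:20])
hub.register("summarize", lambda text: text.upper())  # swap, hub untouched
```

Because agents only ever see the hub, swapping one implementation for another never requires edits to its peers.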

🧠

Persistent Memory

The system remembers past conversations, consolidates them over time, and retrieves the most relevant context automatically — no manual input required.
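The retrieval step can be illustrated with a deliberately simplified sketch: word-overlap scoring stands in for the embedding similarity a real memory store would use, and all names here are hypothetical.

```python
def score(query, memory):
    """Toy relevance score: fraction of query words present in the memory."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / (len(q) or 1)

def retrieve(query, memories, k=2):
    """Return the k most relevant past snippets for the new message."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]


memories = [
    "user prefers answers in French",
    "project uses a Qwen 2.5 14B model",
    "user asked about KV cache quantization",
]
top = retrieve("which model does the project use", memories, k=1)
```

The real pipeline would also consolidate old entries (merging near-duplicates, summarizing stale threads) before scoring, so the store stays small as conversations accumulate.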

🔍

Hybrid RAG Pipeline

Combines text search, vector search, and code dependency analysis to give the AI precise, hallucination-resistant answers from its own knowledge base.
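One standard way to merge ranked lists from heterogeneous retrievers is reciprocal rank fusion (RRF), which needs no score-scale tuning; the sketch below assumes RRF-style merging with hard-coded placeholder results, not the project's exact fusion method.

```python
def rrf(rankings, k=60):
    """Fuse several ranked doc-id lists via reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            # Each retriever contributes 1/(k + rank); top ranks dominate.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


text_hits   = ["doc_a", "doc_c", "doc_b"]   # keyword search order
vector_hits = ["doc_c", "doc_a", "doc_d"]   # embedding search order
dep_hits    = ["doc_c", "doc_e"]            # code-dependency order
fused = rrf([text_hits, vector_hits, dep_hits])
```

A document that several independent retrievers agree on (here `doc_c`) rises to the top, which is what makes hybrid retrieval resistant to any single retriever's blind spots.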

🛡️

Self-Auditing Code

A dedicated agent continuously scans the codebase using static analysis (AST) to catch contract violations and data integrity issues in real time.
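An AST audit pass can be built directly on Python's stdlib `ast` module. The rule shown here, flagging public functions that lack a return annotation, is a stand-in for the system's actual contract checks.

```python
import ast

def audit(source):
    """Return names of public functions missing a return annotation."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if not node.name.startswith("_") and node.returns is None:
                violations.append(node.name)
    return violations


code = """
def handle(msg) -> dict:
    return {}

def leaky(msg):
    return msg
"""
found = audit(code)
```

Because the audit reads source trees rather than running the code, it can scan the whole codebase continuously and cheaply, catching violations before they execute.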

⚙️

Zero-Boilerplate Infrastructure

Metaprogramming automatically injects logging, monitoring, and statistics into every agent at creation — no repetitive setup code.
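The injection pattern can be sketched with a metaclass that wraps every public method at class creation, here with simple call-counting standing in for full logging and monitoring (names are illustrative, not the project's API).

```python
import functools

class Instrumented(type):
    """Metaclass: wrap public methods with call-counting at class creation."""

    def __new__(mcls, name, bases, ns):
        for attr, value in list(ns.items()):
            if callable(value) and not attr.startswith("_"):
                ns[attr] = mcls._count(value)
        ns.setdefault("calls", {})  # shared class-level stats, kept simple
        return super().__new__(mcls, name, bases, ns)

    @staticmethod
    def _count(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            self.calls[fn.__name__] = self.calls.get(fn.__name__, 0) + 1
            return fn(self, *args, **kwargs)
        return wrapper


class EchoAgent(metaclass=Instrumented):
    def run(self, msg):
        return msg
```

Every agent declared with this metaclass gets the instrumentation for free; the agent's own code stays focused on its role.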

👁️

Real-Time Observability

A live dashboard (Prompt Viewer) shows exactly what context the AI receives before generating each response — essential for debugging complex pipelines.

Hub & Spoke Design: One central orchestrator with loose coupling — agents can be swapped or extended without touching the rest of the system.
Strict Contracts: All inter-agent communication uses typed data structures. If an agent sends the wrong format, the system blocks it immediately.
Fail-Fast & Resilient: Errors surface immediately rather than propagating silently — the system detects, logs, and handles failures in real time.
Security by Design: Continuous static audit enforces compliance at runtime. No agent can bypass the governance layer.
Single Source of Truth: One authoritative configuration and path registry shared across all agents — no hardcoded values, no inconsistencies.
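A strict contract of the kind described above can be sketched with a stdlib dataclass whose field names are hypothetical: a message with a wrong type is rejected at construction, before it ever reaches another agent.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AgentMessage:
    sender: str
    topic: str
    payload: dict

    def __post_init__(self):
        # Fail fast: reject a malformed message the moment it is built.
        for f in fields(self):
            if not isinstance(getattr(self, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")
```

Freezing the dataclass also prevents an agent from mutating a message in flight, which keeps the contract honest on both ends of every exchange.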

From a user message to a validated, memory-backed response — every step is audited, scored for relevance, and logged before the AI generates a single word.

Data pipeline — from user input to memory-backed LLM response.

The entire system runs on consumer hardware. Optimized to handle a 130,000-token context window — roughly 200 pages of text — on a single GPU without running out of memory.

Context Window: 130,000 tokens
Main Model: Qwen 2.5 — 14B
Judge Model: Phi-3 Mini — 3B
Intent Router: SBERT (~10 ms)
GPU: RTX 3090 — 24 GB
Cache Optimization: Q4 / Q8 KV Cache
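A back-of-envelope calculation shows why KV-cache quantization matters at this scale. The model dimensions below (48 layers, 8 grouped-query KV heads, head dimension 128) are assumed illustrative values for a Qwen-2.5-14B-class model — verify against the model card before relying on them.

```python
def kv_cache_gib(tokens, layers=48, kv_heads=8, head_dim=128, bytes_per_elt=2):
    """Approximate KV-cache size in GiB for a GQA transformer.

    Assumed dimensions are illustrative, not confirmed for any model.
    """
    # 2 tensors (K and V) per layer, one head_dim vector per token per KV head.
    total_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_elt
    return total_bytes / 2**30


fp16 = kv_cache_gib(130_000, bytes_per_elt=2)    # ~24 GiB: cache alone fills the GPU
q8   = kv_cache_gib(130_000, bytes_per_elt=1)    # half of that
q4   = kv_cache_gib(130_000, bytes_per_elt=0.5)  # a quarter
```

Under these assumptions an unquantized FP16 cache would by itself consume roughly the full 24 GB of a RTX 3090, leaving nothing for the weights — which is why Q4/Q8 KV-cache quantization is what makes the 130k-token window fit.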

Chat interface — context slots, live token counter, and reflexive feedback controls.

View full repository on GitHub ↗