Local Agentic System — Python

SecondMind
An AI that runs on your own machine.

Built from scratch over 6 months with no prior coding experience — a fully local, self-governing AI system made of 9 specialized agents that collaborate, remember, audit themselves, and improve over time. No cloud. No subscriptions. No data leaving your computer.

Python · Multi-Agent · RAG · Local LLM · Metaprogramming · AST Audit · RTX 3090
🤖

Multi-Agent Architecture

9 specialized agents with unique, isolated roles — each one replaceable without breaking the system. One central orchestrator coordinates them all.
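The hub-and-spoke routing can be sketched in a few lines. This is an illustrative minimal version, not the actual SecondMind API: agents register under a role, the hub dispatches to exactly one handler, and replacing an agent is just re-registering its role.

```python
class Orchestrator:
    """Minimal hub: one registry, one dispatch point (illustrative)."""

    def __init__(self):
        self._agents = {}  # role -> handler callable

    def register(self, role, handler):
        # Re-registering a role hot-swaps the agent; nothing else changes.
        self._agents[role] = handler

    def dispatch(self, role, payload):
        if role not in self._agents:
            raise KeyError(f"no agent registered for role {role!r}")
        return self._agents[role](payload)


hub = Orchestrator()
hub.register("summarize", lambda text: text[:20])
hub.register("summarize", lambda text: text.upper())  # swap, hub untouched
```

Because agents only ever see the hub, swapping one implementation for another never requires edits to its peers.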

🧠

Persistent Memory

The system remembers past conversations, consolidates them over time, and retrieves the most relevant context automatically — no manual input required.
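The retrieval step can be illustrated with a deliberately simplified sketch: word-overlap scoring stands in for the embedding similarity a real memory store would use, and all names here are hypothetical.

```python
def score(query, memory):
    """Toy relevance score: fraction of query words present in the memory."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / (len(q) or 1)

def retrieve(query, memories, k=2):
    """Return the k most relevant past snippets for the new message."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]


memories = [
    "user prefers answers in French",
    "project uses a Qwen 2.5 14B model",
    "user asked about KV cache quantization",
]
top = retrieve("which model does the project use", memories, k=1)
```

The real pipeline would also consolidate old entries (merging near-duplicates, summarizing stale threads) before scoring, so the store stays small as conversations accumulate.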

🔍

Hybrid RAG Pipeline

Combines text search, vector search, and code dependency analysis to give the AI precise, hallucination-resistant answers from its own knowledge base.
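One standard way to merge ranked lists from heterogeneous retrievers is reciprocal rank fusion (RRF), which needs no score-scale tuning; the sketch below assumes RRF-style merging with hard-coded placeholder results, not the project's exact fusion method.

```python
def rrf(rankings, k=60):
    """Fuse several ranked doc-id lists via reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            # Each retriever contributes 1/(k + rank); top ranks dominate.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


text_hits   = ["doc_a", "doc_c", "doc_b"]   # keyword search order
vector_hits = ["doc_c", "doc_a", "doc_d"]   # embedding search order
dep_hits    = ["doc_c", "doc_e"]            # code-dependency order
fused = rrf([text_hits, vector_hits, dep_hits])
```

A document that several independent retrievers agree on (here `doc_c`) rises to the top, which is what makes hybrid retrieval resistant to any single retriever's blind spots.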

🛡️

Self-Auditing Code

A dedicated agent continuously scans the codebase using static analysis (AST) to catch contract violations and data integrity issues in real time.
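An AST audit pass can be built directly on Python's stdlib `ast` module. The rule shown here, flagging public functions that lack a return annotation, is a stand-in for the system's actual contract checks.

```python
import ast

def audit(source):
    """Return names of public functions missing a return annotation."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if not node.name.startswith("_") and node.returns is None:
                violations.append(node.name)
    return violations


code = """
def handle(msg) -> dict:
    return {}

def leaky(msg):
    return msg
"""
found = audit(code)
```

Because the audit reads source trees rather than running the code, it can scan the whole codebase continuously and cheaply, catching violations before they execute.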

⚙️

Zero-Boilerplate Infrastructure

Metaprogramming automatically injects logging, monitoring, and statistics into every agent at creation — no repetitive setup code.
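The injection pattern can be sketched with a metaclass that wraps every public method at class creation, here with simple call-counting standing in for full logging and monitoring (names are illustrative, not the project's API).

```python
import functools

class Instrumented(type):
    """Metaclass: wrap public methods with call-counting at class creation."""

    def __new__(mcls, name, bases, ns):
        for attr, value in list(ns.items()):
            if callable(value) and not attr.startswith("_"):
                ns[attr] = mcls._count(value)
        ns.setdefault("calls", {})  # shared class-level stats, kept simple
        return super().__new__(mcls, name, bases, ns)

    @staticmethod
    def _count(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            self.calls[fn.__name__] = self.calls.get(fn.__name__, 0) + 1
            return fn(self, *args, **kwargs)
        return wrapper


class EchoAgent(metaclass=Instrumented):
    def run(self, msg):
        return msg
```

Every agent declared with this metaclass gets the instrumentation for free; the agent's own code stays focused on its role.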

👁️

Real-Time Observability

A live dashboard (Prompt Viewer) shows exactly what context the AI receives before generating each response — essential for debugging complex pipelines.

Hub & Spoke Design: One central orchestrator with loose coupling — agents can be swapped or extended without touching the rest of the system.
Strict Contracts: All inter-agent communication uses typed data structures. If an agent sends the wrong format, the system blocks it immediately.
Fail-Fast & Resilient: Errors surface immediately rather than propagating silently — the system detects, logs, and handles failures in real time.
Security by Design: Continuous static audit enforces compliance at runtime. No agent can bypass the governance layer.
Single Source of Truth: One authoritative configuration and path registry shared across all agents — no hardcoded values, no inconsistencies.
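A strict contract of the kind described above can be sketched with a stdlib dataclass whose field names are hypothetical: a message with a wrong type is rejected at construction, before it ever reaches another agent.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AgentMessage:
    sender: str
    topic: str
    payload: dict

    def __post_init__(self):
        # Fail fast: reject a malformed message the moment it is built.
        for f in fields(self):
            if not isinstance(getattr(self, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")
```

Freezing the dataclass also prevents an agent from mutating a message in flight, which keeps the contract honest on both ends of every exchange.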

From a user message to a validated, memory-backed response — every step is audited, scored for relevance, and logged before the AI generates a single word.

Data pipeline — from user input to memory-backed LLM response.

The entire system runs on consumer hardware. Optimized to handle a 130,000-token context window — roughly 200 pages of text — on a single GPU without running out of memory.

Context Window: 130,000 tokens
Main Model: Qwen 2.5 — 14B
Judge Model: Phi-3 Mini — 3B
Intent Router: SBERT (~10 ms)
GPU: RTX 3090 — 24 GB
Cache Optimization: Q4 / Q8 KV Cache
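A back-of-envelope calculation shows why KV-cache quantization matters at this scale. The model dimensions below (48 layers, 8 grouped-query KV heads, head dimension 128) are assumed illustrative values for a Qwen-2.5-14B-class model — verify against the model card before relying on them.

```python
def kv_cache_gib(tokens, layers=48, kv_heads=8, head_dim=128, bytes_per_elt=2):
    """Approximate KV-cache size in GiB for a GQA transformer.

    Assumed dimensions are illustrative, not confirmed for any model.
    """
    # 2 tensors (K and V) per layer, one head_dim vector per token per KV head.
    total_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_elt
    return total_bytes / 2**30


fp16 = kv_cache_gib(130_000, bytes_per_elt=2)    # ~24 GiB: cache alone fills the GPU
q8   = kv_cache_gib(130_000, bytes_per_elt=1)    # half of that
q4   = kv_cache_gib(130_000, bytes_per_elt=0.5)  # a quarter
```

Under these assumptions an unquantized FP16 cache would by itself consume roughly the full 24 GB of a RTX 3090, leaving nothing for the weights — which is why Q4/Q8 KV-cache quantization is what makes the 130k-token window fit.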

Chat interface — context slots, live token counter, and reflexive feedback controls.

View full repository on GitHub ↗