Architecture

Technical Architecture

Complete system diagrams for every subsystem. Etherion is open source — every component is inspectable on GitHub.


Complete System Architecture

Bare-metal architecture with database-enforced multi-tenancy, asynchronous job execution, and real-time updates. The infrastructure is fully self-managed with NixOS and Ansible.

flowchart LR subgraph Edge["Load Balancer"] DNS["DNS"] --> HAProxy["HAProxy + Nginx"] end HAProxy --> FE["Frontend (Next.js)"] HAProxy --> API["API Server (FastAPI+GraphQL)"] API <--> REDIS[("Redis Cluster")] API <--> SQL[("PostgreSQL pgvector + Patroni")] API --> MINIO[("MinIO Per-tenant buckets")] API --> VAULT[("HashiCorp Vault Secrets + Creds")] BEAT["Celery Beat Scheduler"] --> WRK["Systemd Worker (Celery)"] API --> WRK WRK <--> REDIS WRK <--> SQL style Edge fill:#dbeafe,stroke:#1e40af,stroke-width:2px style DNS fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style HAProxy fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style FE fill:#d1fae5,stroke:#10b981,stroke-width:2px style API fill:#d1fae5,stroke:#10b981,stroke-width:2px style REDIS fill:#fecaca,stroke:#ef4444,stroke-width:2px style SQL fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style MINIO fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style VAULT fill:#fce7f3,stroke:#ec4899,stroke-width:2px style BEAT fill:#e0e7ff,stroke:#6366f1,stroke-width:2px style WRK fill:#d1fae5,stroke:#10b981,stroke-width:2px

Knowledge Base Architecture

OAuth-secured connectors ingest data from your tools into PostgreSQL with pgvector. Configurable embedding models enable semantic search. All data is tenant-isolated with Row-Level Security.

flowchart TD OAuth["OAuth Consent"] --> Vault["HashiCorp Vault Encrypted tokens"] Scheduler["Celery Beat Scheduler"] --> Worker["Ingestion Worker Celery Task"] Drive["Google Drive"] --> Worker OneDrive["OneDrive"] --> Worker Airtable["Airtable"] --> Worker Notion["Notion"] --> Worker HubSpot["HubSpot"] --> Worker Jira["Jira"] --> Worker Slack["Slack"] --> Worker Vault --> Worker Worker --> PGVector["PostgreSQL + pgvector tenant-scoped docs"] PGVector --> Embed["Embedding Service Configurable models High-D vectors"] Embed --> Search["Semantic Search HNSW Index"] PGVector --> RLS["Row-Level Security app.tenant_id"] style OAuth fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Vault fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Scheduler fill:#e0e7ff,stroke:#6366f1,stroke-width:2px style Worker fill:#d1fae5,stroke:#10b981,stroke-width:2px style Drive fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style OneDrive fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Airtable fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Notion fill:#e5e7eb,stroke:#000000,stroke-width:2px style HubSpot fill:#fed7aa,stroke:#ff7a59,stroke-width:2px style Jira fill:#bfdbfe,stroke:#0052cc,stroke-width:2px style Slack fill:#ddd6fe,stroke:#4a154b,stroke-width:2px style PGVector fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Embed fill:#fce7f3,stroke:#ec4899,stroke-width:2px style Search fill:#d1fae5,stroke:#10b981,stroke-width:2px style RLS fill:#fecaca,stroke:#ef4444,stroke-width:2px

Dual-Orchestrator: The 2N+1 Loop

IO performs a dual search (knowledge base + web), evaluates candidate teams, and enforces fail-closed tool approval. TeamOrchestrator executes the 2N+1 loop: N specialist agents work on the goal, and every tool request they make carries a what/how/why justification that is validated against the ToolManager registry. A final synthesis step integrates all findings into a coherent response. The loop is designed for parallel specialists; sequential execution is the current default (see Execution Modes).

flowchart TD Goal["User Goal"] --> IO["IO Dual Search"] IO --> KBSearch["Search KB"] IO --> WebSearch["Search Web"] KBSearch --> EvalTeams["Evaluate Teams"] WebSearch --> EvalTeams EvalTeams --> SelectTeam["Select Team"] SelectTeam --> Spec1["Specialist 1 Parallel"] SelectTeam --> Spec2["Specialist 2 Parallel"] SelectTeam --> SpecN["Specialist N Parallel"] Spec1 --> ToolReq1["Tool Request what/how/why"] Spec2 --> ToolReq2["Tool Request what/how/why"] SpecN --> ToolReqN["Tool Request what/how/why"] ToolReq1 --> RegCheck1["ToolManager Registry"] ToolReq2 --> RegCheck1 ToolReqN --> RegCheck1 RegCheck1 --> PreApprove["Pre-Approved for Team?"] PreApprove --> CredsCheck["Tenant Creds Available?"] CredsCheck --> WriteOp["Write Op? Confirm User"] WriteOp --> ExecTool["Execute Tool"] Spec1 --> Synthesis["Synthesis Integrate 2N"] Spec2 --> Synthesis SpecN --> Synthesis Synthesis --> Result["Final Response"] style Goal fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style IO fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style KBSearch fill:#d1fae5,stroke:#10b981,stroke-width:2px style WebSearch fill:#d1fae5,stroke:#10b981,stroke-width:2px style EvalTeams fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style SelectTeam fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Spec1 fill:#d1fae5,stroke:#10b981,stroke-width:2px style Spec2 fill:#d1fae5,stroke:#10b981,stroke-width:2px style SpecN fill:#d1fae5,stroke:#10b981,stroke-width:2px style ToolReq1 fill:#fecaca,stroke:#ef4444,stroke-width:2px style ToolReq2 fill:#fecaca,stroke:#ef4444,stroke-width:2px style ToolReqN fill:#fecaca,stroke:#ef4444,stroke-width:2px style RegCheck1 fill:#e0e7ff,stroke:#6366f1,stroke-width:2px style PreApprove fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style CredsCheck fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style WriteOp fill:#fecaca,stroke:#ef4444,stroke-width:2px style ExecTool fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Synthesis fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Result fill:#d1fae5,stroke:#10b981,stroke-width:2px

Asynchronous Job Execution

Jobs run in the background using Celery workers and Redis as the message broker. Two worker pools handle different workloads: worker-agents for orchestration, worker-artifacts for ingestion and heavy processing. Real-time status updates stream via GraphQL subscriptions.

flowchart TD API["API Request executeGoal"] --> Job["Job Table Postgres RLS"] Job --> Celery["Celery Task Enqueued"] Celery --> Redis["Redis Broker Cluster"] Redis --> WorkerA["Worker-Agents Orchestration"] Redis --> WorkerB["Worker-Artifacts Ingestion + Heavy"] WorkerA --> Repo["Repository MinIO + PostgreSQL"] WorkerB --> Repo WorkerA --> Done["Job Complete"] WorkerB --> Done Done --> PubSub["Redis Pub/Sub job_trace_{job_id}"] PubSub --> UI["GraphQL Subscription Real-time updates"] Repo --> UI style API fill:#d1fae5,stroke:#10b981,stroke-width:2px style Job fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Celery fill:#fecaca,stroke:#ef4444,stroke-width:2px style Redis fill:#fecaca,stroke:#ef4444,stroke-width:2px style WorkerA fill:#d1fae5,stroke:#10b981,stroke-width:2px style WorkerB fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Repo fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Done fill:#d1fae5,stroke:#10b981,stroke-width:2px style PubSub fill:#fecaca,stroke:#ef4444,stroke-width:2px style UI fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px
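The routing and status-streaming flow above can be sketched in a few lines. This is a minimal illustration, not Etherion's implementation: the task kinds in `ARTIFACT_KINDS` are hypothetical, and `InMemoryPubSub` is a stand-in for Redis Pub/Sub so the example runs without a Redis server. Only the two pool names and the `job_trace_{job_id}` channel pattern come from this page.

```python
import json
from dataclasses import dataclass, field

# Queue names mirror the two worker pools described above.
QUEUE_AGENTS = "worker-agents"
QUEUE_ARTIFACTS = "worker-artifacts"

# Hypothetical task kinds; the real task names are not listed on this page.
ARTIFACT_KINDS = {"ingest", "embed", "archive"}


def route_task(kind: str) -> str:
    """Send ingestion and heavy work to worker-artifacts, the rest to worker-agents."""
    return QUEUE_ARTIFACTS if kind in ARTIFACT_KINDS else QUEUE_AGENTS


def trace_channel(job_id: str) -> str:
    """Build the Pub/Sub channel name in the job_trace_{job_id} form."""
    return f"job_trace_{job_id}"


@dataclass
class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub (assumption: same publish/subscribe shape)."""
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        callbacks = self.subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers that received the message


bus = InMemoryPubSub()
received = []
# A subscription resolver would forward each decoded message to the UI.
bus.subscribe(trace_channel("42"), lambda raw: received.append(json.loads(raw)))
bus.publish(
    trace_channel("42"),
    json.dumps({"status": "complete", "queue": route_task("ingest")}),
)
```

Keeping the channel name a pure function of the job ID means the worker and the subscription resolver never need to exchange channel names out of band.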

MCP Tools with OAuth Security

All tools use the Model Context Protocol (MCP) and connect to third-party systems via OAuth. OAuth tokens are stored encrypted in HashiCorp Vault. Every tool call is validated against the ToolManager registry, must be pre-approved for the team, and, for write operations, requires explicit user confirmation. Rate limiting (token bucket backed by Redis) prevents API abuse.

flowchart TD User["Specialist Requests Tool"] --> Approval["Fail-Closed Approval"] Approval --> Reg["1. Registered in ToolManager?"] Reg --> PreApp["2. Pre-Approved for Team?"] PreApp --> Creds["3. Tenant Creds Available?"] Creds --> Write["4. Write Op? User Confirm"] Write --> OAuth["OAuth Token Retrieval"] OAuth --> Vault["HashiCorp Vault Encrypted Storage"] Vault --> RateLimit["Rate Limiter Token bucket + Redis"] RateLimit --> MCPTool["MCP Tool Invoke"] MCPTool --> API["Third-party API Slack, Jira, Gmail, etc."] API --> Result["Result to Specialist"] style User fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Approval fill:#fecaca,stroke:#ef4444,stroke-width:2px style Reg fill:#e0e7ff,stroke:#6366f1,stroke-width:2px style PreApp fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Creds fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Write fill:#fecaca,stroke:#ef4444,stroke-width:2px style OAuth fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Vault fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style RateLimit fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style MCPTool fill:#d1fae5,stroke:#10b981,stroke-width:2px style API fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Result fill:#d1fae5,stroke:#10b981,stroke-width:2px
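The token-bucket idea behind the rate limiter can be shown with a small in-process sketch. It is an assumption-laden illustration: the class name, parameters, and injectable clock are hypothetical, and the bucket state lives in memory here, whereas the page describes the state being kept in Redis so that all workers share it.

```python
import time


class TokenBucket:
    """In-memory token bucket (assumption: the real limiter keeps this state in Redis)."""

    def __init__(self, capacity: int, refill_per_sec: float, now=time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained request rate
        self._now = now                       # injectable clock, handy for tests
        self.tokens = float(capacity)
        self.updated = now()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        current = self._now()
        elapsed = current - self.updated
        self.updated = current
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Drive the bucket with a fake clock so the behavior is deterministic.
clock = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1.0, now=lambda: clock[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]  # third call exceeds the burst
clock[0] += 1.0  # one second later, one token has been refilled
results.append(bucket.allow())
```

A bucket per tenant and tool would be a natural key choice, but the page does not say how limits are scoped.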

AI Assets Repository

Every artifact agents create is stored in MinIO and indexed in PostgreSQL. Documents, datasets, code, and media are searchable and retrievable. Full execution traces are archived as JSONL for replay and audit.

flowchart TD Repo["Repository Service"] --- Assets["Assets docs, data, code, media"] AgentA["Agent A"] --> Repo AgentB["Agent B"] --> Repo AgentC["Agent C"] --> Repo Repo --> MINIO["MinIO Buckets tnt-{tenant}-assets"] Repo --> PG["PostgreSQL tenant-scoped assets"] PG --> Search["Vector Search Semantic retrieval"] Repo --> Trace["Execution Traces replays/{job_id}/trace.jsonl"] Search --> AgentA Search --> AgentB Search --> AgentC Trace --> Replay["Full-Fidelity Replay LangChain messages + IO"] style Repo fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Assets fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style AgentA fill:#d1fae5,stroke:#10b981,stroke-width:2px style AgentB fill:#d1fae5,stroke:#10b981,stroke-width:2px style AgentC fill:#d1fae5,stroke:#10b981,stroke-width:2px style MINIO fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style PG fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Search fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Trace fill:#fecaca,stroke:#ef4444,stroke-width:2px style Replay fill:#e0e7ff,stroke:#6366f1,stroke-width:2px

Database-Enforced Multi-Tenancy

Row-Level Security policies in PostgreSQL enforce tenant isolation at the database layer. Every connection sets app.tenant_id on checkout, and the policies filter rows on that setting, so an application query that omits a tenant filter still cannot read another tenant's rows.

flowchart TD Request["API Request Bearer JWT"] --> Middleware["Tenant Middleware Extract tenant_id"] Middleware --> Context["Tenant Context ContextVar"] Context --> Engine["DB Engine Connection checkout"] Engine --> SetLocal["SET LOCAL app.tenant_id = X"] SetLocal --> RLS["Row-Level Security USING (tenant_id = current_setting)"] RLS --> Query["Query Execution Automatic filtering"] Query --> Result["Tenant-Scoped Results"] style Request fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Middleware fill:#d1fae5,stroke:#10b981,stroke-width:2px style Context fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Engine fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style SetLocal fill:#fecaca,stroke:#ef4444,stroke-width:2px style RLS fill:#fecaca,stroke:#ef4444,stroke-width:2px style Query fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Result fill:#d1fae5,stroke:#10b981,stroke-width:2px
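The request-to-database path above can be sketched as follows. Treat it as a hedged illustration rather than the project's code: the `documents` table, the `tenant_isolation` policy name, the helper names, and the `%s` placeholder style (psycopg convention) are assumptions. Because `SET LOCAL` does not accept bind parameters, the sketch uses PostgreSQL's `set_config(name, value, is_local)` function, which has the same transaction-scoped effect when `is_local` is true.

```python
from contextvars import ContextVar

# Tenant context populated by middleware after the JWT has been verified.
current_tenant: ContextVar = ContextVar("current_tenant", default=None)

# Illustrative policy DDL; table and policy names are hypothetical.
RLS_POLICY_SQL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id'));
"""


def tenant_setting_statement():
    """Return the parameterized statement that scopes a transaction to the tenant."""
    tenant_id = current_tenant.get()
    if tenant_id is None:
        # Fail closed: never run queries without a tenant in context.
        raise RuntimeError("no tenant in context; refusing to run queries")
    return "SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),)


class FakeCursor:
    """Records statements so the sketch runs without a database."""

    def __init__(self):
        self.executed = []

    def execute(self, sql, params=()):
        self.executed.append((sql, params))


def begin_tenant_transaction(cursor):
    """Call first in every transaction, before any application query."""
    sql, params = tenant_setting_statement()
    cursor.execute(sql, params)


token = current_tenant.set("acme")
cursor = FakeCursor()
begin_tenant_transaction(cursor)
current_tenant.reset(token)  # leave no tenant behind once the request ends
```

Raising when no tenant is set keeps the behavior fail-closed: a code path that skips the middleware gets an error instead of an unscoped query.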

Multimodal Ingestion Pipeline

PyMuPDF extracts text and images from PDFs. Configurable embedding models generate high-dimensional vectors for both text and images. All embeddings are stored in PostgreSQL with pgvector HNSW indexes for fast cosine-distance search. The source files are stored in MinIO in per-tenant buckets.

flowchart TD Upload["Upload File PDF, Image, Text"] --> MINIO["MinIO Bucket tnt-{tenant}-media"] MINIO --> Worker["Worker-Artifacts Celery Task"] Worker --> PyMuPDF["PyMuPDF Extract text + images"] PyMuPDF --> Embed["Embedding Service Configurable models High-D vectors"] Embed --> PGV["PostgreSQL + pgvector tenant-scoped docs"] PGV --> Index["HNSW Index COSINE distance"] Index --> Search["Vector Search Semantic retrieval"] style Upload fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style MINIO fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Worker fill:#d1fae5,stroke:#10b981,stroke-width:2px style PyMuPDF fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Embed fill:#fce7f3,stroke:#ec4899,stroke-width:2px style PGV fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Index fill:#fecaca,stroke:#ef4444,stroke-width:2px style Search fill:#d1fae5,stroke:#10b981,stroke-width:2px
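To make "cosine-distance search" concrete, here is a small pure-Python sketch of the ranking an HNSW index approximates. The toy two-dimensional vectors and the `documents`/`embedding` names in the SQL strings are hypothetical; `<=>` is pgvector's cosine-distance operator and `vector_cosine_ops` is its matching operator class for HNSW indexes.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, docs, k=2):
    """Exact nearest neighbours by cosine distance (HNSW returns an approximation)."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))[:k]


# Roughly equivalent pgvector statements (table and column names are assumptions).
INDEX_SQL = "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)"
SEARCH_SQL = "SELECT id FROM documents ORDER BY embedding <=> %s LIMIT %s"

# Toy 2-D "embeddings"; real embeddings are high-dimensional.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = nearest([1.0, 0.1], docs, k=2)
```

With Row-Level Security enabled, the same `ORDER BY ... <=>` query only ranks the current tenant's rows.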

Tool Request Queue and Validation

All tool requests require a what/how/why justification. Each request is validated in four steps: (1) Is the tool registered in ToolManager? (2) Is it pre-approved for this team? (3) Are tenant credentials available? (4) For write operations, has the user confirmed? Blueprint creation also validates tools against the registry, so no hallucinated tools can enter production. The policy is fail-closed: a request that fails any check is rejected and escalated, which keeps every tool invocation auditable.

flowchart TD Specialist["Specialist Agent Needs Tool"] --> Request["Tool Request what/how/why"] Request --> Check1["Check 1 Registered?"] Check1 --> Fail1["Not in Registry"] Check1 --> Check2["Check 2 Pre-Approved?"] Fail1 --> Reject["Reject Request"] Check2 --> Fail2["Not in Team Allowlist"] Check2 --> Check3["Check 3 Creds OK?"] Fail2 --> Reject Check3 --> Fail3["Missing Credentials"] Check3 --> Check4["Check 4 Write Op?"] Fail3 --> Reject Check4 --> WriteYes["Yes: Ask User"] Check4 --> WriteNo["No: Proceed"] WriteYes --> Confirm["User Confirms"] Confirm --> Execute["Execute Tool"] WriteNo --> Execute Execute --> Result["Return Result"] Reject --> Escalate["Escalate to User or IO"] Result --> Specialist style Specialist fill:#d1fae5,stroke:#10b981,stroke-width:2px style Request fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Check1 fill:#e0e7ff,stroke:#6366f1,stroke-width:2px style Check2 fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Check3 fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style Check4 fill:#fecaca,stroke:#ef4444,stroke-width:2px style Fail1 fill:#fecaca,stroke:#ef4444,stroke-width:2px style Fail2 fill:#fecaca,stroke:#ef4444,stroke-width:2px style Fail3 fill:#fecaca,stroke:#ef4444,stroke-width:2px style WriteYes fill:#fecaca,stroke:#ef4444,stroke-width:2px style WriteNo fill:#d1fae5,stroke:#10b981,stroke-width:2px style Confirm fill:#fecaca,stroke:#ef4444,stroke-width:2px style Execute fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Result fill:#d1fae5,stroke:#10b981,stroke-width:2px style Reject fill:#fecaca,stroke:#ef4444,stroke-width:2px style Escalate fill:#fed7aa,stroke:#f59e0b,stroke-width:2px
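The four checks can be expressed as a short fail-closed function. This is a sketch under stated assumptions: `ToolRequest`, `validate`, and the plain sets standing in for the ToolManager registry, the team allowlist, and the tenant credential store are all hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    what: str  # what the tool will do
    how: str   # how it will be invoked
    why: str   # why the specialist needs it
    is_write: bool = False


def validate(request, registry, team_allowlist, tenant_credentials, user_confirmed=False):
    """Return (allowed, reason). Any failed check rejects the request (fail-closed)."""
    if not (request.what and request.how and request.why):
        return False, "missing what/how/why justification"
    if request.tool not in registry:            # check 1: registered?
        return False, "not in registry"
    if request.tool not in team_allowlist:      # check 2: pre-approved for the team?
        return False, "not in team allowlist"
    if request.tool not in tenant_credentials:  # check 3: tenant credentials available?
        return False, "missing credentials"
    if request.is_write and not user_confirmed:  # check 4: write op needs confirmation
        return False, "write operation needs user confirmation"
    return True, "approved"


registry = {"slack.post", "jira.search"}
allowlist = {"slack.post", "jira.search"}
credentials = {"slack.post"}

read_request = ToolRequest("jira.search", "find tickets", "JQL query", "triage backlog")
write_request = ToolRequest("slack.post", "post summary", "chat message", "notify team", is_write=True)

outcomes = [
    validate(read_request, registry, allowlist, credentials),
    validate(write_request, registry, allowlist, credentials),
    validate(write_request, registry, allowlist, credentials, user_confirmed=True),
]
```

Returning a reason alongside the decision gives the escalation path something concrete to show the user or IO.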

Execution Modes

The TeamOrchestrator selects the execution mode based on task complexity. Sequential mode runs one specialist at a time. Parallel mode (future) will run all specialists concurrently. The selected mode is logged in the execution trace events.

Sequential Mode

One specialist active at a time. Tool requests handled immediately. Checklists maintained throughout execution. Current default mode.

Predictable execution order

Lower resource usage

Easier debugging

Parallel Mode

All specialists run concurrently. Tool requests queued and processed in FIFO order. Deferred for future release.

Faster completion time

Higher throughput

Complex coordination
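The difference between the two modes can be illustrated with `asyncio`. This is only a sketch of the scheduling behavior, assuming specialists are coroutines; `run_specialist` and its log format are hypothetical, and the FIFO tool-request queue of parallel mode is left out for brevity.

```python
import asyncio


async def run_specialist(name, log):
    """Hypothetical specialist: records when it starts and ends."""
    log.append(f"start:{name}")
    await asyncio.sleep(0)  # yield control, standing in for model or tool latency
    log.append(f"end:{name}")
    return f"{name}-findings"


async def run_sequential(names, log):
    """Sequential mode: one specialist is active at a time."""
    results = []
    for name in names:
        results.append(await run_specialist(name, log))
    return results


async def run_parallel(names, log):
    """Parallel mode: all specialists are scheduled concurrently."""
    return list(await asyncio.gather(*(run_specialist(n, log) for n in names)))


seq_log, par_log = [], []
seq_results = asyncio.run(run_sequential(["a", "b"], seq_log))
par_results = asyncio.run(run_parallel(["a", "b"], par_log))
```

In the sequential log every start is immediately followed by its matching end, which is what makes that mode easier to debug; in the parallel log the specialists interleave, although `asyncio.gather` still returns results in submission order.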

Full-Fidelity Replay System

Every job execution is recorded with complete LangChain message lists, tool IO, and specialist delegations. Traces are archived to MinIO as JSONL and indexed in PostgreSQL for semantic search. Replay artifacts enable 100% reconstruction of any past execution.

flowchart TD Runtime["Orchestrator Runtime Execute job"] --> Trace["Execution Trace PostgreSQL RLS"] Trace --> Events["Trace Events TOOL_START/END SPECIALIST_REQUEST/RESPONSE"] Events --> Redis["Redis Pub/Sub job_trace_{job_id}"] Redis --> WS["GraphQL Subscription Real-time stream"] Trace --> Archive["Archive Task Job completion"] Archive --> JSONL["MinIO replays/{job_id}/trace.jsonl"] Archive --> Transcript["MinIO replays/{job_id}/transcript.md"] JSONL --> PG["PostgreSQL tenant-scoped assets"] Transcript --> PG PG --> Search["Vector Search Find past replays"] style Runtime fill:#d1fae5,stroke:#10b981,stroke-width:2px style Trace fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Events fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Redis fill:#fecaca,stroke:#ef4444,stroke-width:2px style WS fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Archive fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style JSONL fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Transcript fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style PG fill:#cffafe,stroke:#06b6d4,stroke-width:2px style Search fill:#d1fae5,stroke:#10b981,stroke-width:2px
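A JSONL trace is one JSON object per line, which makes it easy to append during execution and to stream back for replay. The sketch below assumes a minimal event shape: the `TOOL_START` and `TOOL_END` type names and the `replays/{job_id}/...` key layout come from the diagram, while the `tool` and `seq` fields and the helper names are hypothetical. An in-memory buffer stands in for the MinIO object.

```python
import io
import json


def replay_key(job_id, name="trace.jsonl"):
    """Object key in the replays/{job_id}/... layout shown in the diagram."""
    return f"replays/{job_id}/{name}"


def write_trace(events, stream):
    """Serialize one event per line (JSONL)."""
    for event in events:
        stream.write(json.dumps(event, sort_keys=True) + "\n")


def read_trace(stream):
    """Parse a JSONL stream back into events, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Event fields other than "type" are assumptions for illustration.
events = [
    {"type": "TOOL_START", "tool": "search", "seq": 1},
    {"type": "TOOL_END", "tool": "search", "seq": 2},
]

buffer = io.StringIO()  # stands in for the object stored in MinIO
write_trace(events, buffer)
buffer.seek(0)
replayed = read_trace(buffer)
```

Because each line is independent, a partially written trace from a crashed job can still be read up to the last complete line.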

Authentication and OAuth

JWT-based authentication with invite-only onboarding. OAuth tokens are stored encrypted in HashiCorp Vault. Subdomain validation enforces 8 rules, reserves 90+ system subdomains, and blocks 1,662 banned words. Users cannot switch tenants after signup.

flowchart TD Signup["Password Signup email + subdomain"] --> Validate["Subdomain Validation 8 rules + banned words"] Validate --> Tenant["Create Tenant New tenant per user"] Tenant --> User["Create User Bind to tenant"] User --> JWT["Issue JWT tenant_id + user_id"] OAuth["OAuth Flow Google, GitHub, Microsoft"] --> State["State Validation tenant_id + invite_token"] State --> Tokens["HashiCorp Vault {tenant}/{service}/oauth_tokens"] Tokens --> JWT JWT --> Middleware["Tenant Middleware Extract tenant_id"] Middleware --> RLS["SET LOCAL app.tenant_id"] style Signup fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style Validate fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Tenant fill:#d1fae5,stroke:#10b981,stroke-width:2px style User fill:#d1fae5,stroke:#10b981,stroke-width:2px style JWT fill:#fed7aa,stroke:#f59e0b,stroke-width:2px style OAuth fill:#bfdbfe,stroke:#3b82f6,stroke-width:2px style State fill:#ddd6fe,stroke:#8b5cf6,stroke-width:2px style Tokens fill:#fce7f3,stroke:#ec4899,stroke-width:2px style Middleware fill:#fecaca,stroke:#ef4444,stroke-width:2px style RLS fill:#fecaca,stroke:#ef4444,stroke-width:2px
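The "Issue JWT" and "Extract tenant_id" steps can be sketched with the standard library alone. This is an illustration under assumptions, not the project's token format: it uses HS256 with a shared secret, carries only the `tenant_id` and `user_id` claims shown in the diagram, and omits expiry and other claims a production token would need. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(tenant_id: str, user_id: str, secret: bytes) -> str:
    """Build an HS256-signed token carrying tenant_id and user_id."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    claims = {"tenant_id": tenant_id, "user_id": user_id}
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_sign(signing_input, secret)}"


def extract_tenant_id(token: str, secret: bytes) -> str:
    """Verify the signature first, then return the tenant_id claim."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload))["tenant_id"]


secret = b"demo-secret"  # placeholder; a real key would come from a secret store
token = issue_token("acme", "user-1", secret)
tenant = extract_tenant_id(token, secret)
```

The middleware would pass the extracted value into the tenant context that later feeds `SET LOCAL app.tenant_id`, so the claim is trusted only after the signature check succeeds.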