Architecture

ARP Guardian Enterprise Suite is a real‑time ARP‑spoofing detection and reversible‑mitigation platform. It runs on a flat L2 lab segment today (192.168.10.0/24, 4 nodes) but the contracts are designed for fanned‑out, multi‑VLAN deployments. The hot path is Rust; correlation, mitigation and the operator API are Go; control‑plane scripts and ML scoring are Python.

1. Data flow (single segment, single sensor)

attacker (lab-only) ─► ens33 ─► sensor (Rust, AF_PACKET, Tier 1/2/3)
                                  │
                                  ▼  detection NDJSON
                          NATS JetStream  subject = arp.<site>.<vlan>      (lab: arp.lab.10)
                                  ▼
                       correlator (Go, nats.go + pgx)
                          • entity graph (attacker MAC → claimed IPs, sensors)
                          • multi‑evidence fusion → incident row
                                  ▼
                       mitigation controller (Go)
                          L0 alert │ L1 corrective ARP │ L2 NAC/CoA
                          • requires deterministic Tier‑1/2 hit to auto‑act
                          • TTL auto‑revert + circuit‑breaker
                                  ▼
                       PostgreSQL  ──►  Go API (REST + SSE + RBAC)
                                              │
                                              ▼
                       React 18 + TS dashboard, embedded via go:embed
                       (login → /api/login; SSE feed on /events)

                       Prometheus  :9110 (correlator)
                       Grafana     :3000 (dashboard uid: arpg-overview)
                       SIEM        CEF + ECS over syslog (control/siem_connector.py)

Lab nodes: control 192.168.10.200 hosts NATS + PostgreSQL + Prometheus + Grafana + the Go API; managed-1/2/3 (.201/.202/.203) host sensors. One L2 segment, one VLAN.
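As a rough illustration of the sensor-to-bus contract above, the sketch below builds the `arp.<site>.<vlan>` subject and serializes one detection as an NDJSON line. The field set is an assumption that simply mirrors the `detections` table columns; the real wire format, the value types, and the sample values (`managed-1` as a sensor id, the MACs, the timestamp) are illustrative only.

```python
import json


def subject_for(site: str, vlan: int) -> str:
    """Build the JetStream subject arp.<site>.<vlan> for one segment."""
    return f"arp.{site}.{vlan}"


def encode_detection(det: dict) -> bytes:
    """Serialize one detection as a single NDJSON line (compact JSON + newline)."""
    line = json.dumps(det, separators=(",", ":"), sort_keys=True)
    return (line + "\n").encode("utf-8")


# Hypothetical detection; keys mirror the detections table, values are made up.
example = {
    "ts": "2024-01-01T00:00:00Z",
    "site": "lab",
    "vlan": 10,
    "sensor_id": "managed-1",
    "rule_id": "BIND-FLIP",
    "severity": "CRITICAL",
    "confidence": 0.99,
    "eth_src": "02:00:00:00:00:99",
    "sha": "02:00:00:00:00:99",
    "spa": "192.168.10.1",
    "tpa": "192.168.10.201",
}

subject = subject_for(example["site"], example["vlan"])
payload = encode_detection(example)
```

One JSON object per line keeps each bus message independently parseable, which is the usual reason to pick NDJSON for a stream like this.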

2. Repository layout

backend/      Rust sensor + Go services + Python control tools (one daemon family)
frontend/     Vite + React + TS SPA (operator console)
infra/        docker-compose, SQL schema, Grafana, Prometheus, Ansible, systemd
tests/        Playwright E2E
docs/         this directory
marketing/    deck + landing-site content (not part of the platform)

The frontend's vite build writes into backend/api/static/, which go:embed then pulls into the API binary — at runtime a single Go binary serves the SPA and the JSON/SSE API.

3. Components

Component	Tech	Path	Notes
Sensor (hot path)	Rust 1.95 (libc, no pcap)	`backend/sensor`	Tier 1 signature, Tier 2 binding, Tier 3 anomaly/storm; commands: `selftest`, `bench`, `replay`, `live`
Bus	NATS JetStream	external (`infra/docker-compose.yml`)	Hand‑rolled minimal NATS client in `sensor/src/nats.rs` with PING/PONG keepalive + reconnect
Correlator	Go (nats.go + pgx)	`backend/correlator`	Consumer reset on startup (`DeleteConsumer` + `DeliverNew`) to avoid stuck JetStream state
Mitigation controller	Go (in‑process with correlator)	`backend/correlator/mitigate.go`	L0 alert · L1 corrective ARP · L2 NAC/CoA; TTL revert
Operator API	Go (`net/http` + SSE + pgx)	`backend/api`	Embeds the SPA via `//go:embed static/*`; auth in `auth.go`
Auth / RBAC	bcrypt + HS256 JWT (stdlib `crypto/hmac`)	`backend/api/auth.go`	4 roles: `viewer < analyst < responder < admin`; 8h tokens
Control‑plane	Python stdlib	`backend/control`	`baseline_sync`, `approve`, `dhcp_lease`, `siem_connector`, `ml_train`, `ml_shadow`
Lab attack tooling	Python (Scapy)	`backend/generator`	`arp_attack.py`, `dataset_gen.py`, `eval.py`, `fuzz.py`, `load_inject.py` — authorized lab only
Storage	PostgreSQL 15	`infra/sql/00*.sql`	`bindings`, `vmac_allowlist`, `dhcp_leases`, `detections`, `incidents`, `mitigation_audit`, `policies`, `settings`, `users`
Observability	Prometheus + Grafana	`infra/prometheus`, `infra/grafana`	Correlator exposes `:9110/metrics`; dashboard uid `arpg-overview`
Frontend	React 18 + TS + Vite + Tailwind + Chart.js	`frontend/`	SPA with auth gate, 5s polling, SSE for live detections

4. Detection tiers

Tier 1 — signature. Forged gateway, mismatched ARP/ETH source MAC, broadcast SHA, zero‑target‑MAC unsolicited reply, gratuitous ARP storm. Deterministic; eligible for auto‑mitigation.
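A minimal sketch of three of the stateless Tier 1 checks, written in Python for readability (the real sensor is Rust). The rule labels `SRC-MISMATCH`, `BROADCAST-SHA` and `ZERO-THA-REPLY` are hypothetical names, and the zero-target-MAC check here does not track outstanding requests, so it flags any such reply rather than only unsolicited ones. `build_arp_frame` exists only to produce test input.

```python
import struct

BROADCAST = b"\xff" * 6
ZERO_MAC = b"\x00" * 6
ETHERTYPE_ARP = 0x0806
ARP_REPLY = 2


def build_arp_frame(eth_src: bytes, sha: bytes, oper: int, tha: bytes) -> bytes:
    """Build a 42-byte Ethernet + ARP frame for testing (fixed sample IPs)."""
    eth = BROADCAST + eth_src + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, oper)
    arp += sha + bytes([192, 168, 10, 1]) + tha + bytes([192, 168, 10, 201])
    return eth + arp


def tier1_flags(frame: bytes) -> list:
    """Return hypothetical Tier 1 rule labels triggered by one raw frame."""
    if len(frame) < 42:
        return []
    _dst, eth_src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        return []
    fields = struct.unpack("!HHBBH6s4s6s4s", frame[14:42])
    oper, sha, tha = fields[4], fields[5], fields[7]
    flags = []
    if sha != eth_src:
        flags.append("SRC-MISMATCH")    # ARP sender MAC differs from Ethernet source
    if sha == BROADCAST:
        flags.append("BROADCAST-SHA")   # broadcast address claimed as sender MAC
    if oper == ARP_REPLY and tha == ZERO_MAC:
        flags.append("ZERO-THA-REPLY")  # reply with an all-zero target MAC
    return flags
```

Each check depends only on the bytes of a single frame, which is what makes this tier deterministic.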

Tier 2 — binding‑aware. Cross‑checks ARP claims against the approved binding table (bindings), DHCP leases (dhcp_leases) and HA virtual‑MAC allow‑list (vmac_allowlist). A BIND-FLIP to an unknown MAC for a protected IP is CRITICAL. DHCP corroboration suppresses legitimate moves before they ever fire. Eligible for auto‑mitigation.
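The cross-check order described above could look roughly like the following. This is a sketch under assumptions: the in-memory `Baseline` shape, the check ordering, and the `WARNING` severity for non-protected IPs are all hypothetical; only the three table names and the CRITICAL outcome for a protected IP come from this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Baseline:
    """In-memory stand-in for the three tables (hypothetical shape)."""
    bindings: dict = field(default_factory=dict)      # ip -> (mac, is_protected)
    vmac_allowlist: set = field(default_factory=set)  # {(vip, vmac)}
    dhcp_leases: dict = field(default_factory=dict)   # ip -> mac


def tier2_verdict(base: Baseline, ip: str, mac: str) -> Optional[str]:
    """Return a severity for a BIND-FLIP, or None when the claim is acceptable."""
    mac = mac.lower()
    if (ip, mac) in base.vmac_allowlist:
        return None                      # HA virtual MAC (HSRP/VRRP/cluster)
    known = base.bindings.get(ip)
    if known is None:
        return None                      # no approved binding to compare against
    approved_mac, is_protected = known
    if mac == approved_mac:
        return None                      # claim matches the truth table
    if base.dhcp_leases.get(ip) == mac:
        return None                      # DHCP corroborates a legitimate move
    # "WARNING" for non-protected IPs is an assumed label, not from the source.
    return "CRITICAL" if is_protected else "WARNING"
```

Checking the allow-list and DHCP evidence before raising anything is what lets legitimate failovers and lease changes stay silent.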

Tier 3 — anomaly / storm. EWMA on per‑MAC and per‑VLAN ARP rates, cardinality (one MAC → many IPs), low‑and‑slow fan‑out. Tier 3 alone never auto‑blocks — it raises severity and feeds the correlator.
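To make the Tier 3 signals concrete, here is a small sketch of an EWMA rate tracker plus a one-MAC-to-many-IPs cardinality check. The smoothing factor, the spike multiplier, the IP limit and the fixed-window counting are all assumed values chosen for illustration, not the sensor's actual tuning.

```python
from collections import defaultdict


class RateEwma:
    """Exponentially weighted moving average of per-window ARP counts."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = float(sample)   # first sample seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


class Tier3:
    """Tracks per-MAC rate spikes and claimed-IP cardinality (assumed thresholds)."""

    def __init__(self, alpha: float = 0.2, rate_factor: float = 4.0, max_ips: int = 8):
        self.rate_factor = rate_factor
        self.max_ips = max_ips
        self.rates = defaultdict(lambda: RateEwma(alpha))
        self.claimed = defaultdict(set)

    def observe_window(self, mac: str, count: int) -> bool:
        """Return True when this window's count spikes above the prior baseline."""
        ewma = self.rates[mac]
        baseline = ewma.value            # compare against the average before updating
        ewma.update(count)
        return baseline is not None and count > self.rate_factor * max(baseline, 1.0)

    def observe_claim(self, mac: str, ip: str) -> bool:
        """Return True when one MAC has claimed more than max_ips distinct IPs."""
        self.claimed[mac].add(ip)
        return len(self.claimed[mac]) > self.max_ips
```

Both methods only return a boolean signal; consistent with the rule above, nothing here blocks traffic on its own.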

ML (Python) runs in shadow mode only and never gates mitigation. It logs scores into ml_scores so they can be compared against the deterministic ground truth.

5. Correlation & mitigation

The correlator opens a durable JetStream consumer on arp.<site>.<vlan>, holds an entity graph keyed by attacker MAC, and writes one open incidents row per attacker per site. It enriches the incident with kinds (e.g. CARD-MULTI-IP,CROSS-SENSOR), claimed_ips, distinct_sensors, confidence.
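The entity graph can be pictured with the following sketch (Python for brevity; the real correlator is Go). It assumes the attacker MAC is the ARP sender hardware address (`sha`) and that `CARD-MULTI-IP` and `CROSS-SENSOR` are derived from counting distinct claimed IPs and distinct sensors. Both assumptions are inferred from the names, not confirmed by this document, and confidence scoring is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One open incident candidate, keyed by attacker MAC within a site."""
    attacker_mac: str
    kinds: set = field(default_factory=set)
    claimed_ips: set = field(default_factory=set)
    sensor_ids: set = field(default_factory=set)
    detection_count: int = 0


class EntityGraph:
    def __init__(self):
        self.open = {}  # (site, attacker_mac) -> Entity

    def ingest(self, det: dict) -> Entity:
        """Fold one detection into the open entity for its attacker and site."""
        mac = det["sha"].lower()  # assumption: attacker MAC is the ARP sender MAC
        ent = self.open.setdefault((det["site"], mac), Entity(mac))
        ent.detection_count += 1
        ent.kinds.add(det["rule_id"])
        ent.claimed_ips.add(det["spa"])
        ent.sensor_ids.add(det["sensor_id"])
        if len(ent.claimed_ips) > 1:
            ent.kinds.add("CARD-MULTI-IP")   # one MAC claiming several IPs
        if len(ent.sensor_ids) > 1:
            ent.kinds.add("CROSS-SENSOR")    # seen by more than one sensor
        return ent
```

Keying on `(site, attacker_mac)` is what yields one open row per attacker per site regardless of how many detections arrive.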

The mitigation controller ladders responses:

L0 — alert. Always emitted; SIEM + dashboard notification.
L1 — corrective ARP. Reissue the truth binding to the segment.
L2 — NAC / RADIUS CoA. Quarantine via switch policy. Requires actuator credentials.

Safety invariants (also in SECURITY.md):

Auto‑mitigation requires a deterministic Tier‑1 or Tier‑2 hit. Tier‑3 / ML alone never blocks.
Every action is reversible by default: revert_ts records when the TTL auto‑revert is due.
Circuit‑breaker: more than N mitigations per minute aborts to alert‑only.
All actions go through mitigation_audit with record_hash chain‑of‑custody.
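The first three invariants can be sketched together as a single decision function. This is an illustration under assumptions: the sliding 60-second window, the function and class names, and the behaviour of falling back to L0 without latching are guesses; the document only states the rules themselves and leaves N unspecified.

```python
import time
from collections import deque


class CircuitBreaker:
    """Allows at most max_per_minute mitigations in any sliding 60 s window."""

    def __init__(self, max_per_minute: int, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.stamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()        # forget actions older than one minute
        if len(self.stamps) >= self.max_per_minute:
            return False                 # budget spent: caller degrades to alert-only
        self.stamps.append(now)
        return True


def decide(tiers: set, requested_level: int, breaker: CircuitBreaker,
           ttl_s: float, now: float):
    """Return (level, revert_at); level 0 means alert-only and has no revert time."""
    if requested_level == 0:
        return 0, None
    if not tiers & {1, 2}:
        return 0, None                   # Tier 3 / ML alone never auto-acts
    if not breaker.allow():
        return 0, None                   # circuit-breaker tripped
    return requested_level, now + ttl_s  # every action carries a revert time
```

Injecting the clock keeps the breaker testable without sleeping; in production the default monotonic clock avoids surprises from wall-clock adjustments.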

6. Operator API

REST over PostgreSQL plus a Server‑Sent Events feed at /events for live detections. JWT (bearer) on every protected route; /api/login issues 8h tokens. Full reference in API.md.
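For readers unfamiliar with HS256, the sketch below shows how a token of this kind can be issued and checked with nothing but HMAC-SHA256, together with the role ordering `viewer < analyst < responder < admin`. It is written in Python for illustration (the API itself is Go), and the claim name `role`, the function names and the secret handling are assumptions; API.md remains the reference.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ROLES = ["viewer", "analyst", "responder", "admin"]  # ascending privilege
TOKEN_TTL_S = 8 * 3600                               # 8h tokens


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    mac = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256)
    return _b64(mac.digest())


def issue(secret: bytes, username: str, role: str, now: Optional[float] = None) -> str:
    """Issue an HS256 JWT; the 'role' claim name is an assumption."""
    now = time.time() if now is None else now
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": username, "role": role, "exp": int(now) + TOKEN_TTL_S}
    payload = _b64(json.dumps(claims).encode())
    return f"{header}.{payload}." + _sign(secret, f"{header}.{payload}")


def verify(secret: bytes, token: str, min_role: str,
           now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature, expiry and role all pass, else None."""
    now = time.time() if now is None else now
    try:
        header, payload, sig = token.split(".")
        if not hmac.compare_digest(sig, _sign(secret, f"{header}.{payload}")):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None                      # malformed token
    if not isinstance(claims, dict) or claims.get("role") not in ROLES:
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None                      # missing or expired
    if ROLES.index(claims["role"]) < ROLES.index(min_role):
        return None                      # insufficient role
    return claims
```

The signature is always recomputed with HMAC-SHA256 rather than trusting the header's `alg` field, and the comparison uses `hmac.compare_digest` to avoid timing leaks.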

7. Dashboard

Single‑page React + TypeScript app built with Vite. The build (npm run build) writes straight into backend/api/static/; go build ./api then embeds that directory. All vendor assets ship inside the bundle — no CDNs at runtime, which matters because restrictive corporate proxies were breaking earlier builds via 403s on unpkg/jsdelivr.

Pages: Dashboard (KPIs + 7‑day trend + active incidents + segments + sensors), Incidents (active/all/archive tabs), Segments, Sensors, Binding Database, Policies, Audit Log, Users (admin), Settings.

8. Storage schema

infra/sql/001..006 (run in order). Highlights:

bindings(site, vlan, ip, mac, is_protected, state) — approval queue + truth table.
vmac_allowlist(site, vlan, vip, vmac, protocol) — HSRP/VRRP/cluster; primary FP suppressor.

dhcp_leases(ip, mac, expires) — Tier 2 corroborating evidence.
detections(ts, site, vlan, sensor_id, rule_id, severity, confidence, eth_src, sha, spa, tpa, …).
incidents(attacker_mac, kinds, claimed_ips, sensor_ids, detection_count, distinct_ips, status, …).
mitigation_audit(incident_id, target_mac, level, actuator, result, revert_ts, operator, record_hash).
policies(name, segments, mode, enabled) — operator‑editable.
settings(key, value) — operator key/value store.
users(username, password_hash bcrypt, role, last_login).

Schema mismatch trap: the column in vmac_allowlist is vmac, not mac — an early baseline_sync.py made the opposite assumption and quietly skipped rows.
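The `record_hash` column in `mitigation_audit` is described elsewhere in this document as chain-of-custody. One common way to get that property is a hash chain, sketched below. This is an assumption about the construction: the genesis value, the canonical JSON encoding and the choice of SHA-256 are illustrative, and the actual scheme may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first row


def record_hash(prev_hash: str, row: dict) -> str:
    """Hash the previous record's hash together with this row's canonical JSON."""
    body = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, row: dict) -> dict:
    """Append an audit row, linking it to the previous entry's record_hash."""
    prev = chain[-1]["record_hash"] if chain else GENESIS
    entry = dict(row, record_hash=record_hash(prev, row))
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed or reordered row breaks it."""
    prev = GENESIS
    for entry in chain:
        row = {k: v for k, v in entry.items() if k != "record_hash"}
        if entry["record_hash"] != record_hash(prev, row):
            return False
        prev = entry["record_hash"]
    return True
```

Because each hash covers its predecessor, tampering with one audit row invalidates every later `record_hash`, which is what makes the log useful as evidence.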

9. Lab specifics

The real MAC of gateway 192.168.10.1 is 20:3a:eb:9a:e8:ac. The lab baseline must match reality: a placeholder MAC once produced a CRITICAL false positive on legitimate gateway ARP. See the comment in `backend/control/baseline_sync.py`.

The sensor needs CAP_NET_RAW + CAP_NET_ADMIN and the NIC in promiscuous mode. Both capabilities are dropped after every cargo build; make caps re‑grants them.
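When a rebuilt sensor fails to open its socket, a quick way to confirm the capabilities are missing is to read the `CapEff` bitmask from the process's `/proc/<pid>/status`. The helper below is a hypothetical diagnostic, not part of the repository; the bit numbers (12 for CAP_NET_ADMIN, 13 for CAP_NET_RAW) are the standard Linux values.

```python
CAP_NET_ADMIN = 12  # bit index from linux/capability.h
CAP_NET_RAW = 13


def effective_caps(status_text: str) -> int:
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def missing_caps(mask: int) -> list:
    """List the capabilities the sensor needs that are absent from the mask."""
    needed = {"CAP_NET_ADMIN": CAP_NET_ADMIN, "CAP_NET_RAW": CAP_NET_RAW}
    return [name for name, bit in needed.items() if not mask & (1 << bit)]


# Example use on Linux, for a running sensor with a known pid:
#   with open(f"/proc/{pid}/status") as fh:
#       print(missing_caps(effective_caps(fh.read())))
```

An empty list means both capabilities are effective for that process; anything else suggests `make caps` has not been re-run since the last build.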

JetStream durable consumers can get stuck on stale state; the correlator calls DeleteConsumer then re‑creates with DeliverNew on startup.

10. Where rationale lives

Deep design rationale (threat model, detection literature, deployment models) is in the external archive at ../arpocalypse-2.0-docs/research/. This repository keeps only the operational doc set you see here.