Architecture
ARP Guardian Enterprise Suite is a real‑time ARP‑spoofing detection and reversible‑mitigation platform. It runs on a flat L2 lab segment today (192.168.10.0/24, 4 nodes), but the contracts are designed for fanned‑out, multi‑VLAN deployments. The hot path is Rust; correlation, mitigation and the operator API are Go; control‑plane scripts and ML scoring are Python.
1. Data flow (single segment, single sensor)
attacker (lab-only) ─► ens33 ─► sensor (Rust, AF_PACKET, Tier 1/2/3)
│
▼ detection NDJSON
NATS JetStream subject = arp.<site>.<vlan> (lab: arp.lab.10)
▼
correlator (Go, nats.go + pgx)
• entity graph (attacker MAC → claimed IPs, sensors)
• multi‑evidence fusion → incident row
▼
mitigation controller (Go)
L0 alert │ L1 corrective ARP │ L2 NAC/CoA
• requires deterministic Tier‑1/2 hit to auto‑act
• TTL auto‑revert + circuit‑breaker
▼
PostgreSQL ──► Go API (REST + SSE + RBAC)
│
▼
React 18 + TS dashboard, embedded via go:embed
(login → /api/login; SSE feed on /events)
Prometheus :9110 (correlator)
Grafana :3000 (dashboard uid: arpg-overview)
SIEM CEF + ECS over syslog (control/siem_connector.py)
Lab nodes: the control node (192.168.10.200) hosts NATS + PostgreSQL + Prometheus + Grafana + the Go API; managed-1/2/3 (.201/.202/.203) host the sensors. One L2 segment, one VLAN.
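A minimal Python sketch of the bus contract: it builds the `arp.<site>.<vlan>` subject and encodes one detection as an NDJSON line. The real publisher is the Rust sensor, so treat this as an illustration only; `subject_for` and `to_ndjson` are hypothetical helpers, the field names are borrowed from the `detections` table, and the example values (including `rule_id` and `sensor_id`) are made up.

```python
import json


def subject_for(site: str, vlan: int) -> str:
    """Build the JetStream subject for one site/VLAN pair, e.g. arp.lab.10."""
    # NATS subjects are dot-separated tokens: a dot inside the site name would
    # silently add a hierarchy level, and '*' / '>' are wildcards.
    if not site or any(c in site for c in ".*> "):
        raise ValueError(f"invalid site token: {site!r}")
    if not 1 <= vlan <= 4094:  # valid 802.1Q VLAN IDs
        raise ValueError(f"invalid VLAN id: {vlan}")
    return f"arp.{site}.{vlan}"


def to_ndjson(detection: dict) -> bytes:
    """Serialize one detection as a single newline-terminated JSON line."""
    # json.dumps escapes embedded newlines, so the record stays on one line;
    # sorted keys make the output stable for replay and diffing.
    line = json.dumps(detection, separators=(",", ":"), sort_keys=True)
    return (line + "\n").encode("utf-8")


if __name__ == "__main__":
    # Illustrative record: field names follow the detections table,
    # the values are hypothetical.
    record = {
        "ts": 1700000000.0, "site": "lab", "vlan": 10,
        "sensor_id": "managed-1", "rule_id": "EXAMPLE-RULE",
        "severity": "CRITICAL", "confidence": 0.99,
        "eth_src": "de:ad:be:ef:00:01", "sha": "de:ad:be:ef:00:01",
        "spa": "192.168.10.1", "tpa": "192.168.10.201",
    }
    print(subject_for("lab", 10), to_ndjson(record))
```

Rejecting dots and wildcards up front matters because a malformed site name would otherwise publish to a subject the correlator's consumer never matches.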
2. Repository layout
backend/ Rust sensor + Go services + Python control tools (one daemon family)
frontend/ Vite + React + TS SPA (operator console)
infra/ docker-compose, SQL schema, Grafana, Prometheus, Ansible, systemd
tests/ Playwright E2E
docs/ this directory
marketing/ deck + landing-site content (not part of the platform)
The frontend's Vite build writes into backend/api/static/, which go:embed then compiles into the API binary, so at runtime a single Go binary serves both the SPA and the JSON/SSE API.
3. Components
| Component | Tech | Path | Notes |
|---|---|---|---|
| Sensor (hot path) | Rust 1.95 (libc, no pcap) | backend/sensor | Tier 1 signature, Tier 2 binding, Tier 3 anomaly/storm; commands: selftest, bench, replay, live |
| Bus | NATS JetStream | external (infra/docker-compose.yml) | Hand‑rolled minimal NATS client in sensor/src/nats.rs with PING/PONG keepalive + reconnect |
| Correlator | Go (nats.go + pgx) | backend/correlator | Consumer reset on startup (DeleteConsumer + DeliverNew) to avoid stuck JetStream state |
| Mitigation controller | Go (in‑process with correlator) | backend/correlator/mitigate.go | L0 alert · L1 corrective ARP · L2 NAC/CoA; TTL revert |
| Operator API | Go (net/http + SSE + pgx) | backend/api | Embeds the SPA via `//go:embed static/*`; auth in auth.go |
| Auth / RBAC | bcrypt + HS256 JWT (stdlib crypto/hmac) | backend/api/auth.go | 4 roles: viewer < analyst < responder < admin; 8h tokens |
| Control‑plane | Python stdlib | backend/control | baseline_sync, approve, dhcp_lease, siem_connector, ml_train, ml_shadow |
| Lab attack tooling | Python (Scapy) | backend/generator | arp_attack.py, dataset_gen.py, eval.py, fuzz.py, load_inject.py (authorized lab only) |
| Storage | PostgreSQL 15 | infra/sql/00*.sql | bindings, vmac_allowlist, dhcp_leases, detections, incidents, mitigation_audit, policies, settings, users |
| Observability | Prometheus + Grafana | infra/prometheus, infra/grafana | Correlator exposes :9110/metrics; dashboard uid arpg-overview |
| Frontend | React 18 + TS + Vite + Tailwind + Chart.js | frontend/ | SPA with auth gate, 5s polling, SSE for live detections |
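The Auth / RBAC row can be illustrated with a stdlib‑only Python sketch of HS256 JWT issue/verify plus the four‑role ordering. It mirrors the idea behind auth.go rather than its code: the claim names (`sub`, `role`), the helper names and the error handling are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ROLES = ["viewer", "analyst", "responder", "admin"]  # ascending privilege
TOKEN_TTL = 8 * 3600  # 8h tokens


def _b64url(raw: bytes) -> str:
    # JWTs use unpadded base64url (RFC 7515).
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _json_b64(obj: dict) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def issue(secret: bytes, username: str, role: str, now: Optional[float] = None) -> str:
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    iat = int(time.time() if now is None else now)
    # The claim names "sub" and "role" are assumptions for this sketch.
    claims = {"sub": username, "role": role, "iat": iat, "exp": iat + TOKEN_TTL}
    signing_input = _json_b64({"alg": "HS256", "typ": "JWT"}) + "." + _json_b64(claims)
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    msg = f"{header_b64}.{claims_b64}".encode("ascii")
    expected = hmac.new(secret, msg, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    claims = json.loads(_b64url_decode(claims_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def has_role(claims: dict, required: str) -> bool:
    """viewer < analyst < responder < admin: a higher role satisfies a lower one."""
    return ROLES.index(claims["role"]) >= ROLES.index(required)
```

Encoding the roles as an ordered list keeps route guards to a single comparison: a responder passes any check that requires analyst or viewer.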
4. Detection tiers
Tier 1 — signature. Forged gateway, mismatched ARP/ETH source MAC, broadcast SHA, zero‑target‑MAC unsolicited reply, gratuitous ARP storm. Deterministic; eligible for auto‑mitigation.
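The stateless Tier 1 checks reduce to field comparisons on a single frame. The sketch below is Python for readability (the sensor itself is Rust); `ArpFrame`, `tier1` and the rule IDs it returns are hypothetical names. It omits the gratuitous‑ARP storm rule, which needs rate state, and it does not track outstanding requests, so "unsolicited" is approximated by any reply with a zero target MAC.

```python
from dataclasses import dataclass
from typing import List

BROADCAST = "ff:ff:ff:ff:ff:ff"
ZERO_MAC = "00:00:00:00:00:00"
ARP_REPLY = 2  # ARP opcode: 1 = request, 2 = reply


@dataclass(frozen=True)
class ArpFrame:
    eth_src: str  # Ethernet header source MAC
    op: int       # ARP opcode
    sha: str      # sender hardware address
    spa: str      # sender protocol (IPv4) address
    tha: str      # target hardware address
    tpa: str      # target protocol address


def tier1(frame: ArpFrame, gateway_ip: str, gateway_mac: str) -> List[str]:
    """Return the (hypothetical) IDs of every stateless Tier-1 rule the frame trips."""
    eth_src, sha, tha = frame.eth_src.lower(), frame.sha.lower(), frame.tha.lower()
    hits = []
    # Forged gateway: a MAC other than the gateway's claims the gateway IP.
    if frame.spa == gateway_ip and sha != gateway_mac.lower():
        hits.append("FORGED-GATEWAY")
    # The MAC inside the ARP payload disagrees with the frame's real source.
    if eth_src != sha:
        hits.append("ETH-SHA-MISMATCH")
    # No legitimate host advertises the broadcast address as its own MAC.
    if sha == BROADCAST:
        hits.append("BROADCAST-SHA")
    # Reply with an all-zero target MAC (outstanding requests are not tracked).
    if frame.op == ARP_REPLY and tha == ZERO_MAC:
        hits.append("ZERO-THA-REPLY")
    return hits
```

Because every check depends only on the frame and the known gateway binding, a hit is deterministic, which is what makes Tier 1 eligible for auto‑mitigation.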
Tier 2 — binding‑aware. Cross‑checks ARP claims against the approved binding table (bindings), DHCP leases (dhcp_leases) and HA virtual‑MAC allow‑list (vmac_allowlist). A BIND-FLIP to an unknown MAC for a protected IP is CRITICAL. DHCP corroboration suppresses legitimate moves before they ever fire. Eligible for auto‑mitigation.
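A hedged sketch of the Tier 2 decision order, assuming the three tables have already been loaded into memory for one site/VLAN (so keys are plain IPs rather than `(site, vlan, ip)`). The `BIND-FLIP` kind and `CRITICAL` severity for protected IPs come from the text above; the function name, argument shapes and the `HIGH` severity for unprotected IPs are assumptions.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def tier2(
    ip: str,
    mac: str,
    bindings: Dict[str, str],                   # ip -> approved MAC (truth table)
    protected: Set[str],                        # IPs with is_protected = true
    dhcp_leases: Dict[str, Tuple[str, float]],  # ip -> (mac, expires, epoch seconds)
    vmac_allowlist: Iterable[Tuple[str, str]],  # (vip, vmac) pairs for HSRP/VRRP/clusters
    now: float,
) -> Optional[Tuple[str, str]]:
    """Return (kind, severity) for a suspicious claim, or None if it is explained."""
    mac = mac.lower()
    approved = bindings.get(ip)
    if approved is None:
        return None  # unbound IP: goes to the approval queue (not modelled here)
    if approved.lower() == mac:
        return None  # matches the truth table
    if any(vip == ip and vmac.lower() == mac for vip, vmac in vmac_allowlist):
        return None  # known HA virtual MAC
    lease = dhcp_leases.get(ip)
    if lease is not None and lease[0].lower() == mac and lease[1] > now:
        return None  # a live DHCP lease corroborates the move
    # Binding flip to an unexplained MAC.
    return ("BIND-FLIP", "CRITICAL" if ip in protected else "HIGH")
```

Checking every corroborating source before returning a flip is what lets legitimate moves and HA failovers be suppressed before they ever fire.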
Tier 3 — anomaly / storm. EWMA on per‑MAC and per‑VLAN ARP rates, cardinality (one MAC → many IPs), low‑and‑slow fan‑out. Tier 3 alone never auto‑blocks — it raises severity and feeds the correlator.
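Tier 3 can be sketched with one EWMA tracker per key (per MAC or per VLAN) and a distinct‑IP counter per MAC. The smoothing factor, the 5× multiplier, the absolute floor and the cardinality limit below are illustrative values, not the sensor's tuned thresholds, and the class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class EwmaRate:
    """Flags a rate sample that is far above its own exponentially weighted mean."""

    def __init__(self, alpha: float = 0.2, factor: float = 5.0, min_rate: float = 10.0):
        self.alpha = alpha        # weight of the newest sample
        self.factor = factor      # multiple of the mean that counts as anomalous
        self.min_rate = min_rate  # ignore spikes below this absolute rate (pkts/s)
        self.mean: Optional[float] = None

    def update(self, rate: float) -> bool:
        if self.mean is None:  # the first sample seeds the mean
            self.mean = rate
            return False
        anomalous = rate >= self.min_rate and rate > self.factor * self.mean
        # Score before updating so a spike cannot raise its own baseline first.
        self.mean = self.alpha * rate + (1 - self.alpha) * self.mean
        return anomalous


class Cardinality:
    """Tracks how many distinct IPs each MAC has claimed (one MAC -> many IPs)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.ips: Dict[str, Set[str]] = defaultdict(set)

    def observe(self, mac: str, ip: str) -> bool:
        claimed = self.ips[mac.lower()]
        claimed.add(ip)
        return len(claimed) > self.limit
```

In use, a sensor would keep a `dict` of `EwmaRate` instances keyed by MAC and another keyed by VLAN; either signal only raises severity, consistent with Tier 3 never auto‑blocking.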
ML scoring (Python) runs in shadow mode only and never gates mitigation. Its scores are logged into ml_scores so they can be compared against the deterministic tiers as ground truth.
5. Correlation & mitigation
The correlator opens a durable JetStream consumer on arp.<site>.<vlan>, maintains an entity graph keyed by attacker MAC, and writes one open incidents row per attacker per site. Each incident is enriched with kinds (e.g. CARD-MULTI-IP, CROSS-SENSOR), claimed_ips, distinct_sensors and confidence.
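The entity graph can be pictured as a map from `(site, attacker MAC)` to an accumulating incident. In this Python sketch the attacker MAC is taken from the Ethernet source (`eth_src`) because the ARP payload `sha` can itself be spoofed; that choice, the class names and the `CARD-MULTI-IP` threshold are assumptions, and confidence fusion is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Incident:
    attacker_mac: str
    site: str
    claimed_ips: Set[str] = field(default_factory=set)
    sensor_ids: Set[str] = field(default_factory=set)
    detection_count: int = 0

    def kinds(self, multi_ip_threshold: int = 2) -> List[str]:
        out = []
        if len(self.claimed_ips) >= multi_ip_threshold:
            out.append("CARD-MULTI-IP")  # one MAC claiming several IPs
        if len(self.sensor_ids) >= 2:
            out.append("CROSS-SENSOR")   # seen independently by more than one sensor
        return out


class Correlator:
    def __init__(self) -> None:
        # One open incident per (site, attacker MAC).
        self.open: Dict[Tuple[str, str], Incident] = {}

    def ingest(self, det: dict) -> Incident:
        key = (det["site"], det["eth_src"].lower())
        inc = self.open.get(key)
        if inc is None:
            inc = Incident(attacker_mac=key[1], site=key[0])
            self.open[key] = inc
        inc.claimed_ips.add(det["spa"])
        inc.sensor_ids.add(det["sensor_id"])
        inc.detection_count += 1
        return inc
```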
The mitigation controller ladders responses:
- L0 — alert. Always emitted; SIEM + dashboard notification.
- L1 — corrective ARP. Reissue the truth binding to the segment.
- L2 — NAC / RADIUS CoA. Quarantine via switch policy. Requires actuator credentials.
Safety invariants (also in SECURITY.md):
- Auto‑mitigation requires a deterministic Tier‑1 or Tier‑2 hit. Tier‑3 / ML alone never blocks.
- Every action is reversible by default: `revert_ts` populates a TTL.
- Circuit‑breaker: more than N mitigations per minute aborts to alert‑only.
- All actions go through `mitigation_audit` with `record_hash` chain‑of‑custody.
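The first three invariants can be sketched as a small gate in front of the actuators. This is a Python illustration with hypothetical names; N (`max_per_minute`) and the TTL value are placeholders. The breaker here is a sliding 60‑second window that re‑arms once old actions age out; the real controller may instead latch in alert‑only mode until reset.

```python
import time
from collections import deque
from typing import Deque, Iterable, Optional, Tuple


class MitigationGate:
    """Decides whether a requested mitigation may run automatically."""

    def __init__(self, max_per_minute: int = 5, ttl_seconds: float = 300.0):
        self.max_per_minute = max_per_minute  # hypothetical N
        self.ttl_seconds = ttl_seconds        # hypothetical revert TTL
        self.recent: Deque[float] = deque()   # timestamps of actions in the window

    def decide(self, tiers: Iterable[int], requested_level: int,
               now: Optional[float] = None) -> Tuple[int, Optional[float]]:
        """Return (level, revert_ts); level 0 means alert only."""
        now = time.time() if now is None else now
        # Only deterministic Tier-1/Tier-2 evidence may trigger L1/L2.
        if requested_level == 0 or not ({1, 2} & set(tiers)):
            return 0, None
        # Slide the 60 s window forward, then check the breaker.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if len(self.recent) >= self.max_per_minute:
            return 0, None  # breaker open: fall back to alert-only
        self.recent.append(now)
        # Every action carries a revert deadline (revert_ts).
        return requested_level, now + self.ttl_seconds
```

With `max_per_minute=2`, a third Tier‑1 request inside the same minute comes back as alert‑only, while a Tier‑3‑only request is never escalated regardless of the breaker.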
6. Operator API
REST over PostgreSQL plus a Server‑Sent Events feed at /events for live detections. JWT (bearer) on every protected route; /api/login issues 8h tokens. Full reference in API.md.
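For reference, an SSE feed is plain text: each event is a block of `field: value` lines terminated by a blank line, and lines starting with `:` are comments (often used as keepalives). The Python sketch below frames and parses detection events in that format; the event name `detection` and the helper names are assumptions, not the documented contents of /events (see API.md for those).

```python
import json
from typing import List, Optional


def sse_frame(data: dict, event: Optional[str] = None,
              event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Event carrying a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the payload is one line;
    # splitting anyway keeps the framing valid for any multi-line payload.
    for part in json.dumps(data, separators=(",", ":")).split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # the blank line dispatches the event


def parse_sse(stream: str) -> List[dict]:
    """Parse complete events (assumes \\n line endings); ':' lines are comments."""
    events = []
    for block in stream.split("\n\n"):
        data_lines, event, event_id = [], "message", None
        for line in block.split("\n"):
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if name == "data":
                data_lines.append(value)
            elif name == "event":
                event = value
            elif name == "id":
                event_id = value
        if data_lines:  # blocks without data (e.g. keepalives) dispatch nothing
            events.append({"event": event, "id": event_id,
                           "data": json.loads("\n".join(data_lines))})
    return events
```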
7. Dashboard
Single‑page React + TypeScript app built with Vite. The build (npm run build) writes straight into backend/api/static/; go build ./api then embeds that directory. All vendor assets ship inside the bundle — no CDNs at runtime, which matters because restrictive corporate proxies were breaking earlier builds via 403s on unpkg/jsdelivr.
Pages: Dashboard (KPIs + 7‑day trend + active incidents + segments + sensors), Incidents (active/all/archive tabs), Segments, Sensors, Binding Database, Policies, Audit Log, Users (admin), Settings.
8. Storage schema
infra/sql/001..006 (run in order). Highlights:
- `bindings(site, vlan, ip, mac, is_protected, state)`: approval queue + truth table.
- `vmac_allowlist(site, vlan, vip, vmac, protocol)`: HSRP/VRRP/cluster; primary FP suppressor.
- `dhcp_leases(ip, mac, expires)`: Tier 2 corroborating evidence.
- `detections(ts, site, vlan, sensor_id, rule_id, severity, confidence, eth_src, sha, spa, tpa, …)`
- `incidents(attacker_mac, kinds, claimed_ips, sensor_ids, detection_count, distinct_ips, status, …)`
- `mitigation_audit(incident_id, target_mac, level, actuator, result, revert_ts, operator, record_hash)`
- `policies(name, segments, mode, enabled)`: operator‑editable.
- `settings(key, value)`: operator key/value store.
- `users(username, password_hash bcrypt, role, last_login)`
Schema mismatch trap: the column in vmac_allowlist is vmac, not mac — an early baseline_sync.py made the opposite assumption and quietly skipped rows.
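The `record_hash` column implies a hash chain over `mitigation_audit`. A minimal Python sketch, assuming SHA‑256 over the previous hash plus a canonical JSON encoding of the row and an all‑zero genesis value (none of which the schema itself specifies); `append` and `verify_chain` are hypothetical helpers.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # hypothetical seed for the first record


def record_hash(prev_hash: str, row: dict) -> str:
    # Canonical JSON so the same row always hashes the same way.
    body = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[dict], row: dict) -> dict:
    prev = chain[-1]["record_hash"] if chain else GENESIS
    rec = dict(row, record_hash=record_hash(prev, row))  # copy; caller's row untouched
    chain.append(rec)
    return rec


def verify_chain(chain: List[dict]) -> Optional[int]:
    """Return the index of the first broken record, or None if the chain is intact."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        row = {k: v for k, v in rec.items() if k != "record_hash"}
        if rec.get("record_hash") != record_hash(prev, row):
            return i
        prev = rec["record_hash"]
    return None
```

Because each hash folds in its predecessor, editing any stored row invalidates that row's hash, and rewriting the hash to match breaks the link to every later record.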
9. Lab specifics
- Gateway `192.168.10.1` has the real MAC `20:3a:eb:9a:e8:ac`. The lab baseline must match reality: a placeholder MAC once produced a CRITICAL false positive on legitimate gateway ARP. See the comment in `backend/control/baseline_sync.py`.
- Sensor needs `CAP_NET_RAW` + `CAP_NET_ADMIN` and the NIC in promiscuous mode. Both capabilities are dropped after every `cargo build`; `make caps` re‑grants them.
- JetStream durable consumers can get stuck on stale state; the correlator calls `DeleteConsumer` then re‑creates with `DeliverNew` on startup.
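A preflight check for the promiscuous‑mode requirement can read the interface flags Linux exposes in sysfs and test the `IFF_PROMISC` bit (0x100 in `<linux/if.h>`). This Python sketch assumes promiscuous mode is set administratively (for example `ip link set dev ens33 promisc on`); the function names are hypothetical and it does not check the capabilities.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # from <linux/if.h>


def is_promisc(flags_text: str) -> bool:
    """Interpret the hex value exposed in /sys/class/net/<iface>/flags."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def iface_promisc(iface: str = "ens33") -> bool:
    """Read the live flags for one interface (Linux only)."""
    return is_promisc(Path(f"/sys/class/net/{iface}/flags").read_text())
```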
10. Where rationale lives
Deep design rationale (threat model, detection literature, deployment models) is in the external archive at ../arpocalypse-2.0-docs/research/. This repository keeps only the operational doc set you see here.