Architecture

ARP Guardian Enterprise Suite is a real‑time ARP‑spoofing detection and reversible‑mitigation platform. It runs on a flat L2 lab segment today (192.168.10.0/24, 4 nodes) but the contracts are designed for fanned‑out, multi‑VLAN deployments. The hot path is Rust; correlation, mitigation and the operator API are Go; control‑plane scripts and ML scoring are Python.

1. Data flow (single segment, one sensor shown)

attacker (lab-only) ─► ens33 ─► sensor (Rust, AF_PACKET, Tier 1/2/3)
                                  │
                                  ▼  detection NDJSON
                          NATS JetStream  subject = arp.<site>.<vlan>      (lab: arp.lab.10)
                                  ▼
                       correlator (Go, nats.go + pgx)
                          • entity graph (attacker MAC → claimed IPs, sensors)
                          • multi‑evidence fusion → incident row
                                  ▼
                       mitigation controller (Go)
                          L0 alert │ L1 corrective ARP │ L2 NAC/CoA
                          • requires deterministic Tier‑1/2 hit to auto‑act
                          • TTL auto‑revert + circuit‑breaker
                                  ▼
                       PostgreSQL  ──►  Go API (REST + SSE + RBAC)
                                              │
                                              ▼
                       React 18 + TS dashboard, embedded via go:embed
                       (login → /api/login; SSE feed on /events)

                       Prometheus  :9110 (correlator)
                       Grafana     :3000 (dashboard uid: arpg-overview)
                       SIEM        CEF + ECS over syslog (control/siem_connector.py)

Lab nodes: control 192.168.10.200 hosts NATS + PostgreSQL + Prometheus + Grafana + the Go API; managed-1/2/3 (.201/.202/.203) host sensors. One L2 segment, one VLAN.

2. Repository layout

backend/      Rust sensor + Go services + Python control tools (one daemon family)
frontend/     Vite + React + TS SPA (operator console)
infra/        docker-compose, SQL schema, Grafana, Prometheus, Ansible, systemd
tests/        Playwright E2E
docs/         this directory
marketing/    deck + landing-site content (not part of the platform)

The frontend's Vite build writes into backend/api/static/, which go:embed then compiles into the API binary. At runtime a single Go binary serves both the SPA and the JSON/SSE API.

3. Components

| Component | Tech | Path | Notes |
|---|---|---|---|
| Sensor (hot path) | Rust 1.95 (libc, no pcap) | backend/sensor | Tier 1 signature, Tier 2 binding, Tier 3 anomaly/storm; commands: selftest, bench, replay, live |
| Bus | NATS JetStream | external (infra/docker-compose.yml) | Hand‑rolled minimal NATS client in sensor/src/nats.rs with PING/PONG keepalive + reconnect |
| Correlator | Go (nats.go + pgx) | backend/correlator | Consumer reset on startup (DeleteConsumer + DeliverNew) to avoid stuck JetStream state |
| Mitigation controller | Go (in‑process with correlator) | backend/correlator/mitigate.go | L0 alert · L1 corrective ARP · L2 NAC/CoA; TTL revert |
| Operator API | Go (net/http + SSE + pgx) | backend/api | Embeds the SPA via //go:embed static/*; auth in auth.go |
| Auth / RBAC | bcrypt + HS256 JWT (stdlib crypto/hmac) | backend/api/auth.go | 4 roles: viewer < analyst < responder < admin; 8h tokens |
| Control‑plane | Python stdlib | backend/control | baseline_sync, approve, dhcp_lease, siem_connector, ml_train, ml_shadow |
| Lab attack tooling | Python (Scapy) | backend/generator | arp_attack.py, dataset_gen.py, eval.py, fuzz.py, load_inject.py; authorized lab only |
| Storage | PostgreSQL 15 | infra/sql/00*.sql | bindings, vmac_allowlist, dhcp_leases, detections, incidents, mitigation_audit, policies, settings, users |
| Observability | Prometheus + Grafana | infra/prometheus, infra/grafana | Correlator exposes :9110/metrics; dashboard uid arpg-overview |
| Frontend | React 18 + TS + Vite + Tailwind + Chart.js | frontend/ | SPA with auth gate, 5s polling, SSE for live detections |

4. Detection tiers

Tier 1 — signature. Forged gateway, mismatched ARP/ETH source MAC, broadcast SHA, zero‑target‑MAC unsolicited reply, gratuitous ARP storm. Deterministic; eligible for auto‑mitigation.
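
The Tier 1 checks are stateless per frame, which is what makes them deterministic. A minimal Python sketch follows (the real hot path is Rust). It assumes a decoded frame carrying the detections fields eth_src, sha, spa and tpa plus hypothetical op and tha fields, a caller-supplied `solicited` flag, and made-up rule IDs. The gratuitous-ARP storm check needs rate state and is left out.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"
ZERO = "00:00:00:00:00:00"

@dataclass(frozen=True)
class ArpFrame:
    eth_src: str  # Ethernet source MAC
    op: int       # 1 = request, 2 = reply
    sha: str      # ARP sender hardware address
    spa: str      # ARP sender protocol (IPv4) address
    tha: str      # ARP target hardware address
    tpa: str      # ARP target protocol address

def tier1_hits(f: ArpFrame, gw_ip: str, gw_mac: str, solicited: bool) -> list:
    """Return the (hypothetical) Tier 1 rule IDs that fire for one frame."""
    hits = []
    # Forged gateway: someone other than the real gateway MAC claims the gateway IP
    if f.spa == gw_ip and f.sha.lower() != gw_mac.lower():
        hits.append("T1-FORGED-GW")
    # Ethernet source and ARP sender MAC disagree
    if f.eth_src.lower() != f.sha.lower():
        hits.append("T1-SRC-MISMATCH")
    # A broadcast address is never a valid sender hardware address
    if f.sha.lower() == BROADCAST:
        hits.append("T1-BCAST-SHA")
    # Unsolicited reply with an all-zero target MAC
    if f.op == 2 and not solicited and f.tha == ZERO:
        hits.append("T1-ZERO-THA-REPLY")
    return hits

# Example: a forged, unsolicited gateway reply from a lab attacker MAC
frame = ArpFrame("de:ad:be:ef:00:01", 2, "de:ad:be:ef:00:01", "192.168.10.1", ZERO, "192.168.10.201")
print(tier1_hits(frame, "192.168.10.1", "20:3a:eb:9a:e8:ac", solicited=False))
# ['T1-FORGED-GW', 'T1-ZERO-THA-REPLY']
```

The forged-gateway check depends entirely on gw_mac being the real gateway MAC; a placeholder value turns every legitimate gateway reply into a hit, which is the failure mode described under Lab specifics.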

Tier 2 — binding‑aware. Cross‑checks ARP claims against the approved binding table (bindings), DHCP leases (dhcp_leases) and HA virtual‑MAC allow‑list (vmac_allowlist). A BIND-FLIP to an unknown MAC for a protected IP is CRITICAL. DHCP corroboration suppresses legitimate moves before they ever fire. Eligible for auto‑mitigation.
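
The Tier 2 decision order can be sketched in Python as below, assuming in-memory copies of bindings, dhcp_leases and vmac_allowlist keyed as shown in the docstring. The allow-list and DHCP checks run before a flip is reported, which is how legitimate moves are suppressed before they fire. The severity for flips on unprotected IPs is not specified here, so `WARNING` is a placeholder, and the sketch treats any uncorroborated change of the bound MAC as a flip.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    mac: str
    is_protected: bool

def tier2_check(site, vlan, ip, mac, bindings, leases, vmac_allowlist, now=None):
    """Return (rule, severity) for a BIND-FLIP, or None if the claim is fine or suppressed.

    bindings:       {(site, vlan, ip): Binding}    approved truth table
    leases:         {ip: (mac, expires_epoch)}     current DHCP leases
    vmac_allowlist: {(site, vlan, vip, vmac)}      HA virtual MACs
    """
    now = time.time() if now is None else now
    mac = mac.lower()
    if (site, vlan, ip, mac) in vmac_allowlist:
        return None  # HSRP/VRRP/cluster VIP: expected to move between nodes
    b = bindings.get((site, vlan, ip))
    if b is None or b.mac.lower() == mac:
        return None  # no binding to judge against, or claim matches the truth table
    lease = leases.get(ip)
    if lease and lease[0].lower() == mac and lease[1] > now:
        return None  # a live DHCP lease corroborates the move: suppress it
    return ("BIND-FLIP", "CRITICAL" if b.is_protected else "WARNING")
```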

Tier 3 — anomaly / storm. EWMA on per‑MAC and per‑VLAN ARP rates, cardinality (one MAC → many IPs), low‑and‑slow fan‑out. Tier 3 alone never auto‑blocks — it raises severity and feeds the correlator.
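
A rough Python sketch of the per-MAC rate and cardinality logic follows, assuming fixed scoring intervals, hypothetical parameters (alpha, k, min_rate, max_ips) and hypothetical reason labels. A per-VLAN EWMA would use the same structure keyed by VLAN, and low-and-slow fan-out needs a longer cardinality window than the single interval used here. The tracker only returns reasons; nothing in it triggers mitigation.

```python
from collections import defaultdict

class Tier3:
    """Per-MAC EWMA rate and IP-cardinality tracker (hypothetical parameters)."""

    def __init__(self, alpha=0.2, k=4.0, min_rate=20.0, max_ips=8):
        self.alpha, self.k, self.min_rate, self.max_ips = alpha, k, min_rate, max_ips
        self.ewma = {}                 # mac -> smoothed ARP packets per interval
        self.count = defaultdict(int)  # mac -> packets seen in the current interval
        self.ips = defaultdict(set)    # mac -> distinct SPAs claimed this interval

    def observe(self, mac, spa):
        self.count[mac] += 1
        self.ips[mac].add(spa)

    def close_interval(self):
        """Score the interval against history, update the EWMA, return {mac: [reasons]}."""
        flags = {}
        for mac in set(self.ewma) | set(self.count):
            n = self.count.get(mac, 0)
            prev = self.ewma.get(mac, 0.0)
            reasons = []
            # Burst: well above this MAC's own smoothed history, and above an absolute floor
            if n > max(self.min_rate, self.k * prev):
                reasons.append("RATE")
            # Cardinality: one MAC claiming many IPs
            if len(self.ips.get(mac, ())) > self.max_ips:
                reasons.append("CARDINALITY")
            if reasons:
                flags[mac] = reasons
            self.ewma[mac] = self.alpha * n + (1 - self.alpha) * prev
        self.count.clear()
        self.ips.clear()
        return flags
```

Each interval is scored against the EWMA from before it is updated, so a burst is judged against history rather than diluting its own baseline.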

ML scoring (Python) runs in shadow mode only and never gates mitigation. Its scores are logged to ml_scores so they can be compared against the deterministic ground truth.
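
The shadow comparison reduces to confusion counts over shared keys. A minimal Python sketch, assuming ML scores and deterministic Tier 1/2 hits are keyed identically (for example by site, VLAN, MAC and time window) and a hypothetical 0.5 threshold; the function name is illustrative.

```python
def shadow_report(ml_scores: dict, deterministic: set, threshold: float = 0.5) -> dict:
    """Compare shadow ML verdicts with deterministic Tier 1/2 hits (report only).

    ml_scores:     {key: score in [0, 1]}
    deterministic: keys on which a Tier 1/2 rule fired, treated as ground truth
    """
    flagged = {k for k, s in ml_scores.items() if s >= threshold}
    tp = len(flagged & deterministic)
    fp = len(flagged - deterministic)
    fn = len(deterministic - flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Because Tier 1/2 is treated as ground truth, an ML "false positive" may still be a genuine anomaly; the counts measure agreement with the deterministic tiers, not absolute accuracy.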

5. Correlation & mitigation

The correlator opens a durable JetStream consumer on arp.<site>.<vlan>, holds an entity graph keyed by attacker MAC, and keeps one open row in incidents per attacker per site. It enriches each incident with kinds (e.g. CARD-MULTI-IP, CROSS-SENSOR), claimed_ips, distinct_sensors and confidence.
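
The fusion step can be pictured with the Python sketch below, keyed by (site, attacker MAC) so repeated detections fold into one open incident. It assumes detections arrive as dicts with the detections columns and takes the ARP SHA as the attacker MAC. Recording each rule_id as a kind, combining confidences with noisy-OR (1 - prod(1 - c_i)), and the multi-IP threshold are all assumptions, not the correlator's documented behaviour.

```python
from dataclasses import dataclass, field

@dataclass
class Incident:
    site: str
    attacker_mac: str
    kinds: set = field(default_factory=set)
    claimed_ips: set = field(default_factory=set)
    sensor_ids: set = field(default_factory=set)
    detection_count: int = 0
    miss: float = 1.0  # running product of (1 - confidence) over detections

    @property
    def confidence(self) -> float:
        # Noisy-OR fusion, assuming detections are independent evidence
        return 1.0 - self.miss

class Correlator:
    def __init__(self, multi_ip_threshold=3):
        self.multi_ip_threshold = multi_ip_threshold
        self.open = {}  # (site, attacker_mac) -> Incident: one open incident per attacker per site

    def ingest(self, det: dict) -> Incident:
        key = (det["site"], det["sha"].lower())
        inc = self.open.get(key)
        if inc is None:
            inc = self.open[key] = Incident(*key)
        inc.detection_count += 1
        inc.kinds.add(det["rule_id"])
        inc.claimed_ips.add(det["spa"])
        inc.sensor_ids.add(det["sensor_id"])
        inc.miss *= 1.0 - det["confidence"]
        if len(inc.sensor_ids) > 1:
            inc.kinds.add("CROSS-SENSOR")
        if len(inc.claimed_ips) >= self.multi_ip_threshold:
            inc.kinds.add("CARD-MULTI-IP")
        return inc
```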

The mitigation controller ladders responses:

  • L0 — alert. Always emitted; SIEM + dashboard notification.
  • L1 — corrective ARP. Reissue the truth binding to the segment.
  • L2 — NAC / RADIUS CoA. Quarantine via switch policy. Requires actuator credentials.
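
One way to express the ladder together with the auto-act gate from the data-flow diagram is sketched in Python below. It assumes the controller knows which tiers fired, whether NAC credentials are configured, and whether the circuit breaker is open. The 900 s TTL, the function name, and issuing L1 alongside L2 are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Action:
    level: str                  # "L0", "L1" or "L2"
    revert_ts: Optional[float]  # TTL deadline for auto-revert; None for alerts

def plan_actions(tiers: set, has_nac_creds: bool, breaker_open: bool,
                 now: float, ttl_s: float = 900.0) -> list:
    """Ladder responses for one incident; `tiers` holds the tiers that fired, e.g. {1, 3}."""
    actions = [Action("L0", None)]        # L0 alert is always emitted
    deterministic = bool(tiers & {1, 2})  # Tier 3 / ML alone never auto-acts
    if not deterministic or breaker_open:
        return actions                    # alert-only
    actions.append(Action("L1", now + ttl_s))      # corrective ARP, reverted at TTL
    if has_nac_creds:
        actions.append(Action("L2", now + ttl_s))  # NAC / RADIUS CoA quarantine
    return actions
```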

Safety invariants (also in SECURITY.md):

  1. Auto‑mitigation requires a deterministic Tier‑1 or Tier‑2 hit. Tier‑3 / ML alone never blocks.
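  2. Every action is reversible by default: revert_ts is populated from the action's TTL, and the action auto-reverts at that time.
  3. Circuit‑breaker: more than N mitigations per minute trips the breaker, and the controller falls back to alert‑only.
  4. Every action is written to mitigation_audit, whose record_hash chain provides chain‑of‑custody.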
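
Invariants 3 and 4 can be sketched in Python as a sliding-window breaker and a SHA-256 hash chain over audit rows. The 60 s window follows from the "per minute" wording and N stays a parameter. The breaker here is non-latching (it re-allows actions once the window drains), and hashing canonical JSON together with the previous row's hash is an assumption about how record_hash could be computed.

```python
import hashlib
import json
from collections import deque

class CircuitBreaker:
    """Refuses auto-mitigation once `limit` actions already sit inside the window."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit, self.window_s = limit, window_s
        self.recent = deque()  # timestamps of mitigations that were allowed

    def allow(self, now: float) -> bool:
        while self.recent and now - self.recent[0] >= self.window_s:
            self.recent.popleft()
        if len(self.recent) >= self.limit:
            return False  # would exceed N per minute: fall back to alert-only
        self.recent.append(now)
        return True

GENESIS = "0" * 64

def record_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) keeps the hash reproducible
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()

def append_audit(chain: list, record: dict) -> None:
    prev = chain[-1]["record_hash"] if chain else GENESIS
    chain.append({**record, "record_hash": record_hash(prev, record)})

def verify_chain(chain: list) -> bool:
    prev = GENESIS
    for row in chain:
        record = {k: v for k, v in row.items() if k != "record_hash"}
        if row["record_hash"] != record_hash(prev, record):
            return False  # row edited, removed or reordered
        prev = row["record_hash"]
    return True
```

Because each hash covers the previous one, editing any earlier row invalidates every hash after it.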

6. Operator API

REST over PostgreSQL plus a Server‑Sent Events feed at /events for live detections. JWT (bearer) on every protected route; /api/login issues 8h tokens. Full reference in API.md.
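
For orientation, HS256 signing and the viewer < analyst < responder < admin ordering can be sketched with the Python standard library (hmac, hashlib, base64). The claim names sub and role and the function names are hypothetical; the real implementation is Go in backend/api/auth.go, and bcrypt password checks are out of scope here.

```python
import base64
import hashlib
import hmac
import json

ROLES = {"viewer": 0, "analyst": 1, "responder": 2, "admin": 3}
TOKEN_TTL_S = 8 * 3600  # 8h tokens, as issued by /api/login

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue(secret: bytes, username: str, role: str, now: float) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": username, "role": role, "iat": int(now), "exp": int(now) + TOKEN_TTL_S}
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def authorize(secret: bytes, token: str, required_role: str, now: float) -> dict:
    """Verify signature and expiry, then enforce the role ordering."""
    header, payload, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):  # constant-time compare
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    if ROLES[claims["role"]] < ROLES[required_role]:
        raise PermissionError("insufficient role")
    return claims
```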

7. Dashboard

Single‑page React + TypeScript app built with Vite. The build (npm run build) writes straight into backend/api/static/; go build ./api then embeds that directory. All vendor assets ship inside the bundle, so nothing is fetched from a CDN at runtime. This matters because earlier builds loaded assets from unpkg/jsdelivr, and restrictive corporate proxies broke them with 403s.

Pages: Dashboard (KPIs + 7‑day trend + active incidents + segments + sensors), Incidents (active/all/archive tabs), Segments, Sensors, Binding Database, Policies, Audit Log, Users (admin), Settings.

8. Storage schema

infra/sql/001..006 (run in order). Highlights:

  • bindings(site, vlan, ip, mac, is_protected, state) — approval queue + truth table.
  • vmac_allowlist(site, vlan, vip, vmac, protocol): HSRP/VRRP/cluster virtual MACs; the primary false‑positive suppressor.
  • dhcp_leases(ip, mac, expires) — Tier 2 corroborating evidence.
  • detections(ts, site, vlan, sensor_id, rule_id, severity, confidence, eth_src, sha, spa, tpa, …).
  • incidents(attacker_mac, kinds, claimed_ips, sensor_ids, detection_count, distinct_ips, status, …).
  • mitigation_audit(incident_id, target_mac, level, actuator, result, revert_ts, operator, record_hash).
  • policies(name, segments, mode, enabled) — operator‑editable.
  • settings(key, value) — operator key/value store.
  • users(username, password_hash bcrypt, role, last_login).

Schema mismatch trap: the column in vmac_allowlist is vmac, not mac — an early baseline_sync.py made the opposite assumption and quietly skipped rows.
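
A defensive pattern for this trap, sketched in Python with a hypothetical to_vmac_row helper: map each baseline entry onto the explicit vmac_allowlist column list and raise on any missing key, so a mac/vmac mix-up fails loudly instead of silently skipping rows. The input shape (one dict per entry) is an assumption.

```python
# Column order of vmac_allowlist; note the MAC column is "vmac", not "mac"
VMAC_COLUMNS = ("site", "vlan", "vip", "vmac", "protocol")

def to_vmac_row(src: dict) -> tuple:
    """Map one baseline entry to a vmac_allowlist row, failing loudly on a missing column."""
    missing = [c for c in VMAC_COLUMNS if c not in src]
    if missing:
        raise KeyError(f"vmac_allowlist entry missing {missing}: {src!r}")
    return tuple(src[c] for c in VMAC_COLUMNS)
```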

9. Lab specifics

  • Gateway 192.168.10.1 has the real MAC 20:3a:eb:9a:e8:ac, and the lab baseline must match reality: a placeholder MAC once produced a CRITICAL false positive on legitimate gateway ARP. See the comment in `backend/control/baseline_sync.py`.

  • The sensor needs CAP_NET_RAW + CAP_NET_ADMIN and the NIC in promiscuous mode. The capabilities are lost every time cargo build rewrites the binary; make caps re‑grants them.

  • JetStream durable consumers can get stuck on stale state, so on startup the correlator calls DeleteConsumer and then re‑creates the consumer with DeliverNew.

10. Where rationale lives

Deep design rationale (threat model, detection literature, deployment models) is in the external archive at ../arpocalypse-2.0-docs/research/. This repository keeps only the operational doc set you see here.