Testing

What we run, why, and how to invoke each suite. Every number quoted in a claim must be a measurement, marked [MEASURED]; never a projection.

1. Sensor unit tests (Rust)

cd backend && make sensor-test         # cargo test --release

Covers detection logic in detect.rs and anomaly.rs. Synthetic frames, no privileges.

2. Sensor bench (Rust)

make sensor-bench                       # ./target/release/arpg-sensor bench 2000000

Hot‑path parse+inspect on 2 M synthetic frames. Sub‑µs/frame on lab hardware puts this stage more than three orders of magnitude (at least 2000×) inside the <2 ms SLA; the bench covers only parse+inspect, not the full pipeline.
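The headroom figure is plain arithmetic on the bench output. A minimal sketch, assuming you have the wall-clock time for the run; the function names and the example numbers are hypothetical placeholders, not measurements from this project:

```python
def per_frame_ns(elapsed_s: float, frames: int) -> float:
    """Average cost of one frame, in nanoseconds."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return elapsed_s * 1e9 / frames


def headroom(budget_ms: float, frame_ns: float) -> float:
    """How many times the per-frame cost fits inside the budget."""
    return (budget_ms * 1e6) / frame_ns


# Placeholder input for illustration only: 2 M frames in 1.0 s.
example_ns = per_frame_ns(elapsed_s=1.0, frames=2_000_000)
example_ratio = headroom(budget_ms=2.0, frame_ns=example_ns)
print(f"{example_ns:.0f} ns/frame, {example_ratio:.0f}x headroom")
```

Substitute the elapsed time from your own bench run before quoting a ratio.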

3. Sensor selftest

make sensor-selftest                    # synthetic Tier 1/2/3 cases + verdicts

Exercises every rule without touching the network. Useful as a first sanity check after a detector change.

4. Correlator + API (Go)

cd backend && go test ./...             # everything in correlator/ and api/

Unit‑test coverage is thin today; the end‑to‑end runs below are the real check.

5. Labeled‑dataset evaluation

make eval                               # backend/generator/eval.py against the dataset

Reads a labeled set (poison / GARP storm / mix / benign), runs the sensor in replay mode, and computes precision, recall, and false positives (FP). The release note records the measured numbers; update docs/OPERATIONS.md if they change materially.
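For reference, the metrics follow the standard confusion-matrix definitions. This is a sketch of those definitions, not the code in eval.py; the `Counts`, `tally`, and `metrics` names are hypothetical, and it assumes each sample reduces to a boolean label (attack or benign) and a boolean verdict (flagged or not):

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # attack, flagged
    fp: int = 0  # benign, flagged
    tn: int = 0  # benign, not flagged
    fn: int = 0  # attack, not flagged


def tally(labels, verdicts) -> Counts:
    """Count outcomes; True means 'attack' in both sequences."""
    labels, verdicts = list(labels), list(verdicts)
    if len(labels) != len(verdicts):
        raise ValueError("labels and verdicts differ in length")
    c = Counts()
    for is_attack, flagged in zip(labels, verdicts):
        if is_attack and flagged:
            c.tp += 1
        elif is_attack:
            c.fn += 1
        elif flagged:
            c.fp += 1
        else:
            c.tn += 1
    return c


def metrics(c: Counts) -> dict:
    """Precision, recall, and FP rate; 0.0 when a denominator is empty."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "precision": ratio(c.tp, c.tp + c.fp),
        "recall": ratio(c.tp, c.tp + c.fn),
        "fp_rate": ratio(c.fp, c.fp + c.tn),
        "fp_count": c.fp,
    }
```

Both the FP count and the FP rate are returned because the text above does not say which one the report uses.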

6. Fuzz / adversarial harness

make fuzz                               # backend/generator/fuzz.py

Feeds the sensor mutated ARP frames (oversized, truncated, weird opcodes, partial replies). The sensor must neither crash nor silently drop them.
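The four mutation classes can be sketched with the standard library alone. This is an illustration of the idea, not the contents of fuzz.py; the helper names, addresses, and padding size are assumptions chosen for the example:

```python
import random
import struct

ETH_HDR = struct.Struct("!6s6sH")            # dst MAC, src MAC, EtherType
ARP_BODY = struct.Struct("!HHBBH6s4s6s4s")   # 28-byte IPv4-over-Ethernet ARP
OPCODE_OFFSET = ETH_HDR.size + 6             # opcode sits 6 bytes into ARP


def base_frame(opcode: int = 1) -> bytes:
    """A well-formed 42-byte ARP frame (placeholder addresses)."""
    src = b"\x02\x00\x00\x00\x00\x01"
    eth = ETH_HDR.pack(b"\xff" * 6, src, 0x0806)
    arp = ARP_BODY.pack(1, 0x0800, 6, 4, opcode,
                        src, bytes([192, 168, 10, 50]),
                        b"\x00" * 6, bytes([192, 168, 10, 1]))
    return eth + arp


def set_opcode(frame: bytes, opcode: int) -> bytes:
    return (frame[:OPCODE_OFFSET] + struct.pack("!H", opcode)
            + frame[OPCODE_OFFSET + 2:])


def truncated(frame, rng):
    # Cut anywhere before the end, keeping at least one byte.
    return frame[: rng.randrange(1, len(frame))]


def oversized(frame, rng):
    # 1600 bytes of random padding pushes the frame past 1514 bytes.
    return frame + bytes(rng.randrange(256) for _ in range(1600))


def weird_opcode(frame, rng):
    # Anything other than request (1) or reply (2).
    return set_opcode(frame, rng.choice([0, 3, 0x00FF, 0xFFFF]))


def partial_reply(frame, rng):
    # A reply that ends after the sender fields (no target MAC/IP).
    return set_opcode(frame, 2)[: ETH_HDR.size + 18]


MUTATORS = (truncated, oversized, weird_opcode, partial_reply)


def mutate(seed: int, count: int) -> list:
    """Deterministic batch of mutated frames for a given seed."""
    rng = random.Random(seed)
    frame = base_frame()
    return [rng.choice(MUTATORS)(frame, rng) for _ in range(count)]
```

Seeding the generator keeps a failing case reproducible: the same seed and count yield the same frames.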

7. Latency probe

make latency-probe                      # detect→mitigate p95

End‑to‑end timing on synthetic attacks. Reports p50 / p95 / p99 over the configured window. The detect→mitigate SLA target is <100 ms; the lab measurement is around 5 ms.
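For readers checking the reported percentiles by hand, here is a sketch using the nearest-rank method. The probe may use a different percentile definition; the function names are hypothetical, and comparing p95 against the SLA is an assumption based on the make target's comment:

```python
def percentile(samples, p: int):
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, avoiding float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(samples_ms, sla_ms: float = 100.0) -> dict:
    """p50 / p95 / p99 plus a pass flag (assumes the SLA applies to p95)."""
    p95 = percentile(samples_ms, 95)
    return {
        "p50": percentile(samples_ms, 50),
        "p95": p95,
        "p99": percentile(samples_ms, 99),
        "within_sla": p95 < sla_ms,
    }
```

Nearest-rank always returns an observed sample, which keeps the result tied to a real measurement rather than an interpolated value.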

8. Frontend type‑check

cd frontend && npm run build            # tsc --noEmit + vite build

The build script runs tsc --noEmit first; a TS error fails the build. There is no component test suite today — the integration coverage lives in Playwright (below).

9. End‑to‑end (Playwright)

cd tests/e2e
npm install
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64 npx playwright install chromium

node audit.mjs       http://127.0.0.1:8080   # screenshots every page (sanity)
node writeflow.mjs   http://127.0.0.1:8080   # binding add/delete + policy toggle
node rbac_ui.mjs     http://127.0.0.1:8080   # viewer vs admin UI gates

Screenshots land in tests/e2e/shots/. The API must be running with demo accounts on port 8080.
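A quick pre-flight check can save a confusing Playwright failure when the API is not up. A minimal sketch, assuming a plain TCP connect is enough to tell; `api_reachable` is a hypothetical helper, and it does not verify that the demo accounts exist:

```python
import socket


def api_reachable(host: str = "127.0.0.1", port: int = 8080,
                  timeout_s: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("API reachable" if api_reachable() else "API not reachable on 8080")
```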

10. Manual smoke

sudo python3 backend/generator/arp_attack.py -i ens33 --mode poison --spa 192.168.10.1

Watch the dashboard. Expect a CRITICAL incident within ~5 s, an audit row, and (in GUARDED/ENFORCE) a corrective ARP from the controller. Acknowledge the incident to close the loop.

11. What we don't test (honest debt)

  • Multi‑VLAN sharding under load: designed, not stress‑tested.
  • vMAC‑mimicry against a real HA segment: currently exercised only against synthetic HSRP/VRRP allow‑list entries.
  • NAC / RADIUS CoA actuator: wired up against a stub; the L2 path on real hardware needs vendor‑specific integration.

Document these gaps when claiming results; don't paper over them.