Watershed — Edge-Resilient IoT Telemetry Pipeline
Most IoT pipelines assume connectivity. Watershed doesn't. Built for rural edge environments where network partitions are scheduled events, not incidents — every sensor reading is buffered locally in SQLite first, Claude analyses the rolling window in real time, and cloud sync replays the buffer in order when connectivity returns. Proven end-to-end: 10 readings held during a simulated outage, a 28°C → 60°C thermal escalation flagged with specific diagnosis and remediation recommendation, total cloud spend ~$0.05.
10
Readings buffered during outage
Real-time
Anomaly detection latency
28°C → 60°C
Thermal escalation detected
~$0.05
Cloud spend (all sessions)
Core Technologies
Architecture Components
- Async Python agent (asyncio): MQTT subscriber, SQLite writer, and anomaly analysis on every reading
- Mosquitto MQTT broker: local message bus receiving simulated sensor telemetry
- SQLite offline buffer: stores unsynced readings with a synced flag; survives process restarts
- AWS IoT Core: cloud telemetry sink — receives buffered readings on reconnect via MQTT over TLS
- Claude Sonnet 4.6: analyzes the rolling telemetry window, classifies severity, and returns a structured diagnosis with remediation recommendation
- Terraform: provisions AWS IoT Core Thing, certificate, and policy for secure device authentication
Problem
IoT sensors in edge environments — remote sites, rural infrastructure, field deployments — cannot rely on continuous cloud connectivity. Standard pipelines that drop data during outages or miss anomalies between sync intervals are not safe for industrial or agricultural telemetry.
- Connectivity loss at the edge means lost sensor readings unless the pipeline buffers locally.
- Batch-sync approaches delay anomaly detection until the next cloud push — too slow for thermal, pressure, or safety events.
- Most IoT tutorials assume a stable connection and skip the resilience layer entirely.
Solution
A fully async Python agent that decouples local processing from cloud sync. Every reading is buffered in SQLite immediately, Claude analyzes the rolling window in real time, and cloud sync is a background task that replays the buffer in order when connectivity returns.
- SQLite offline buffer: every MQTT reading is written locally first — cloud sync never blocks data capture.
- Rolling window anomaly detection: Claude Sonnet 4.6 analyzes the last N readings on every message, not just on sync.
- Async reconnect loop: cloud sync runs as a background coroutine, replaying buffered readings to AWS IoT Core in order on reconnect.
- Terraform-provisioned device identity: AWS IoT Core Thing with X.509 certificate and scoped policy — no shared credentials.
Security Design
- Device identity via X.509: each edge device authenticates to AWS IoT Core with a unique certificate provisioned by Terraform — no shared credentials, no API keys on edge hardware.
- Scoped IoT policy: the Terraform-provisioned IoT policy permits publish to the device's specific topic prefix only — no wildcard permissions, no subscribe access.
- No credentials at rest on edge nodes: the device certificate is the only credential; the Anthropic API key for anomaly detection runs server-side, never on the edge device.
- SQLite integrity: readings are written to SQLite before any network operation — if the process crashes mid-sync, no data is lost and no reading is double-counted on reconnect.
- Terraform-managed lifecycle: all AWS resources (IoT Thing, certificate, policy, attachment) are provisioned and destroyed together — no orphaned credentials or dangling permissions after teardown.
Observability & Operations
Claude anomaly detection runs on every reading — the AI output is the observability layer for the sensor stream. Each analysis returns severity, diagnosis, and recommended action. For production, pipe the structured Claude output to CloudWatch Logs for trend analysis and alerting on high-severity events.
Outcome
End-to-end pipeline proven locally: offline buffer held 10 readings during a simulated outage, Claude detected a high-severity thermal event with a specific diagnosis and remediation recommendation, and the full sync to AWS IoT Core is wired and validated.
- Offline buffer proven: 10 readings held in SQLite during simulated outage, synced in order on reconnect.
- Claude anomaly detection proven: 28°C → 60°C thermal escalation across 5 readings flagged as high severity with diagnosis and remediation step.
- AWS IoT Core sync: MQTT-over-TLS connection wired, Terraform ready — cloud proof pending final apply.
- Total spend: ~$0.05 — Terraform state and IoT Core message costs only.
Real-World Use Cases
Watershed's offline-first architecture was designed specifically for environments where the network cannot be assumed — which describes most of the physical world outside urban data centres.
Agricultural Monitoring
Grain elevator temperature sensors, irrigation pressure monitoring, and livestock barn environmental controls — all in areas with intermittent LTE coverage. Watershed's SQLite buffer ensures no reading is lost during a connectivity gap, and Claude's rolling-window analysis catches thermal or pressure anomalies in real time rather than at the next sync.
Remote Industrial Sites
Oil and gas pipeline monitoring, remote pump stations, and mining operations where losing telemetry during a network outage could mean missing a safety-critical event. The offline-first design means the pipeline keeps running and analysing regardless of cloud connectivity state.
Indigenous and Rural Community Infrastructure
Water treatment plants, power distribution monitoring, and environmental sensors in communities hours from technical support. The architecture is explicitly designed for these environments — resilient by default, minimal cloud dependency, and recoverable without on-site expertise.
Precision Agriculture
Soil moisture, microclimate, and equipment sensors across large properties where cellular coverage is patchy and cloud dependency is a liability. Claude's per-reading anomaly detection means actionable diagnosis arrives immediately at the edge, not after the next batch sync reaches the cloud.
Key Learnings & Decisions
Edge Resilience
- Decouple local processing from cloud sync from the start — combining them in one loop means a connectivity drop blocks anomaly detection.
- SQLite is the right buffer for single-device edge agents: zero dependencies, survives restarts, queryable for the rolling window Claude needs.
- Simulate the outage before you claim resilience — the buffer is only proven if you have actually disconnected and verified the replay.
AI Anomaly Detection
- Rolling window context is essential: Claude needs the trajectory (28°C → 35°C → 44°C → 55°C → 60°C) to classify severity accurately — a single reading gives no signal.
- Structured output from Claude (severity + diagnosis + remediation) makes downstream handling deterministic and avoids brittle string parsing.
- Anomaly detection at the edge is more valuable than at the cloud sink — the agent flags the event before connectivity returns, not after the batch lands.
AWS IoT Core & Terraform
- X.509 certificate provisioning via Terraform is the right pattern — each device gets a scoped identity, not a shared key.
- AWS IoT Core MQTT endpoint uses TLS on port 8883; local Mosquitto uses plaintext on 1883. Keep these configs clearly separated in the agent.
- Terraform the IoT resources before writing agent code — the certificate and policy ARNs are needed as config, not as afterthoughts.
Implementation Milestones
A breakdown of the key tasks and milestones that brought this project to life.
Local MQTT Pipeline
CompleteAsync Python agent subscribing to Mosquitto, writing every reading to SQLite, and publishing to a local MQTT topic. Full local loop running.
Key Tasks Completed
Async MQTT Subscriber
asyncio-mqtt client subscribing to Mosquitto on localhost. Every message deserialised and handed to the buffer writer.
SQLite Offline Buffer
SQLite schema with synced flag. Reads written locally on every message — buffer survives process restarts.
Claude Anomaly Detection
CompleteRolling window analysis with Claude Sonnet 4.6. High-severity thermal escalation (28°C → 60°C across 5 readings) detected with diagnosis and remediation recommendation.
Key Tasks Completed
Claude Rolling Window Integration
Claude Sonnet 4.6 analyzes the last N readings on every message. Structured output: severity, diagnosis, remediation. Anomaly detected and logged correctly.
Offline Buffer Proof
CompleteSimulated connectivity outage: 10 readings buffered in SQLite, replayed to AWS IoT Core in order on reconnect.
Key Tasks Completed
Outage Simulation & Replay
Disconnected MQTT uplink for 10 readings. Buffer held all 10. Reconnected and confirmed in-order replay to AWS IoT Core.
AWS IoT Core Integration
In ProgressTerraform provisions IoT Thing, X.509 certificate, and scoped policy. MQTT-over-TLS connection to AWS IoT Core wired in agent. Cloud proof pending final Terraform apply.
Key Tasks Completed
Terraform IoT Core Resources
Thing, certificate, and policy defined in Terraform. ARNs wired into agent config.
Cloud End-to-End Proof
Pending final Terraform apply. Agent sync loop validated against IoT Core endpoint in test.
Monitoring & Analysis
SQLite Buffer Audit
Every reading is written to SQLite with a synced boolean and timestamp. The buffer doubles as an audit log — query unsynced rows to see exactly what was held during an outage and confirm replay order.
Claude Analysis Log
Every anomaly analysis result is logged with the input window, severity classification, diagnosis, and remediation recommendation — full chain of reasoning for every alert.
Watershed agent — buffer every reading, analyze with Claude, sync on reconnect
Part of a larger arc
The AI Security & Resilience Stack
Three independent projects that together cover the full surface of an AI-augmented infrastructure stack. Warden secures the Kubernetes runtime — Falco and OPA detecting threats as they happen, Claude triaging before an engineer is paged. Covenant controls access at the application layer — OPA as the hard gate between JWT identity and Claude, policy in code not prompts. Watershed closes the loop at the edge — async telemetry buffered through connectivity loss, with Claude flagging anomalies before the data reaches the cloud. Each project stands alone; together they tell one story.
~$2.00
Warden (AKS)
$0.00
Covenant (local Docker)
~$0.05
Watershed (AWS IoT Core)
~$2.05
Combined
Related project
Warden — Self-Healing Kubernetes Security Agent
AI-driven threat triage and auto-remediation on AKS — two-layer security proven end-to-end for ~$2
Related project
Covenant — Policy-Enforced AI Access Control
OPA as the hard gate between JWT identity and Claude — the AI doesn't decide who sees what