Watershed — Edge-Resilient IoT Telemetry Pipeline

Watershed is an async Python IoT telemetry agent designed for edge environments where connectivity is unreliable. It subscribes to a local MQTT broker (Mosquitto), buffers sensor readings in SQLite during connectivity loss, and syncs to AWS IoT Core on reconnect. Claude Sonnet 4.6 monitors the telemetry stream in real time and flags anomalies — in testing it detected a high-severity thermal escalation from 28°C to 60°C across five readings, returned a specific diagnosis, and issued a remediation recommendation. The pipeline is proven end-to-end locally, with AWS IoT Core wired and Terraform ready for cloud proof.

Core Technologies

PythonMQTTMosquittoSQLiteAWS IoT CoreClaude Sonnet 4.6Terraform

Architecture Components

  • Async Python agent (asyncio): MQTT subscriber, SQLite writer, and anomaly analysis on every reading
  • Mosquitto MQTT broker: local message bus receiving simulated sensor telemetry
  • SQLite offline buffer: stores unsynced readings with a synced flag; survives process restarts
  • AWS IoT Core: cloud telemetry sink — receives buffered readings on reconnect via MQTT over TLS
  • Claude Sonnet 4.6: analyzes the rolling telemetry window, classifies severity, and returns a structured diagnosis with remediation recommendation
  • Terraform: provisions AWS IoT Core Thing, certificate, and policy for secure device authentication

Problem

IoT sensors in edge environments — remote sites, rural infrastructure, field deployments — cannot rely on continuous cloud connectivity. Standard pipelines that drop data during outages or miss anomalies between sync intervals are not safe for industrial or agricultural telemetry.

  • Connectivity loss at the edge means lost sensor readings unless the pipeline buffers locally.
  • Batch-sync approaches delay anomaly detection until the next cloud push — too slow for thermal, pressure, or safety events.
  • Most IoT tutorials assume a stable connection and skip the resilience layer entirely.

Solution

A fully async Python agent that decouples local processing from cloud sync. Every reading is buffered in SQLite immediately, Claude analyzes the rolling window in real time, and cloud sync is a background task that replays the buffer in order when connectivity returns.

  • SQLite offline buffer: every MQTT reading is written locally first — cloud sync never blocks data capture.
  • Rolling window anomaly detection: Claude Sonnet 4.6 analyzes the last N readings on every message, not just on sync.
  • Async reconnect loop: cloud sync runs as a background coroutine, replaying buffered readings to AWS IoT Core in order on reconnect.
  • Terraform-provisioned device identity: AWS IoT Core Thing with X.509 certificate and scoped policy — no shared credentials.

Outcome

End-to-end pipeline proven locally: offline buffer held 10 readings during a simulated outage, Claude detected a high-severity thermal event with a specific diagnosis and remediation recommendation, and the full sync to AWS IoT Core is wired and validated.

  • Offline buffer proven: 10 readings held in SQLite during simulated outage, synced in order on reconnect.
  • Claude anomaly detection proven: 28°C → 60°C thermal escalation across 5 readings flagged as high severity with diagnosis and remediation step.
  • AWS IoT Core sync: MQTT-over-TLS connection wired, Terraform ready — cloud proof pending final apply.
  • Total spend: ~$0.05 — Terraform state and IoT Core message costs only.

Key Learnings & Decisions

Edge Resilience

  • Decouple local processing from cloud sync from the start — combining them in one loop means a connectivity drop blocks anomaly detection.
  • SQLite is the right buffer for single-device edge agents: zero dependencies, survives restarts, queryable for the rolling window Claude needs.
  • Simulate the outage before you claim resilience — the buffer is only proven if you have actually disconnected and verified the replay.

AI Anomaly Detection

  • Rolling window context is essential: Claude needs the trajectory (28°C → 35°C → 44°C → 55°C → 60°C) to classify severity accurately — a single reading gives no signal.
  • Structured output from Claude (severity + diagnosis + remediation) makes downstream handling deterministic and avoids brittle string parsing.
  • Anomaly detection at the edge is more valuable than at the cloud sink — the agent flags the event before connectivity returns, not after the batch lands.

AWS IoT Core & Terraform

  • X.509 certificate provisioning via Terraform is the right pattern — each device gets a scoped identity, not a shared key.
  • AWS IoT Core MQTT endpoint uses TLS on port 8883; local Mosquitto uses plaintext on 1883. Keep these configs clearly separated in the agent.
  • Terraform the IoT resources before writing agent code — the certificate and policy ARNs are needed as config, not as afterthoughts.

Implementation Milestones

A breakdown of the key tasks and milestones that brought this project to life.

Local MQTT Pipeline

Complete

Async Python agent subscribing to Mosquitto, writing every reading to SQLite, and publishing to a local MQTT topic. Full local loop running.

Key Tasks Completed

  • Async MQTT Subscriber

    asyncio-mqtt client subscribing to Mosquitto on localhost. Every message deserialised and handed to the buffer writer.

  • SQLite Offline Buffer

    SQLite schema with synced flag. Reads written locally on every message — buffer survives process restarts.

Claude Anomaly Detection

Complete

Rolling window analysis with Claude Sonnet 4.6. High-severity thermal escalation (28°C → 60°C across 5 readings) detected with diagnosis and remediation recommendation.

Key Tasks Completed

  • Claude Rolling Window Integration

    Claude Sonnet 4.6 analyzes the last N readings on every message. Structured output: severity, diagnosis, remediation. Anomaly detected and logged correctly.

Offline Buffer Proof

Complete

Simulated connectivity outage: 10 readings buffered in SQLite, replayed to AWS IoT Core in order on reconnect.

Key Tasks Completed

  • Outage Simulation & Replay

    Disconnected MQTT uplink for 10 readings. Buffer held all 10. Reconnected and confirmed in-order replay to AWS IoT Core.

AWS IoT Core Integration

In Progress

Terraform provisions IoT Thing, X.509 certificate, and scoped policy. MQTT-over-TLS connection to AWS IoT Core wired in agent. Cloud proof pending final Terraform apply.

Key Tasks Completed

  • Terraform IoT Core Resources

    Thing, certificate, and policy defined in Terraform. ARNs wired into agent config.

  • Cloud End-to-End Proof

    Pending final Terraform apply. Agent sync loop validated against IoT Core endpoint in test.

Monitoring & Analysis

SQLite Buffer Audit

Every reading is written to SQLite with a synced boolean and timestamp. The buffer doubles as an audit log — query unsynced rows to see exactly what was held during an outage and confirm replay order.

Claude Analysis Log

Every anomaly analysis result is logged with the input window, severity classification, diagnosis, and remediation recommendation — full chain of reasoning for every alert.

Watershed agent — buffer every reading, analyze with Claude, sync on reconnect

Loading code...

Part of a larger arc

The AI Security & Resilience Stack

Three independent projects that together cover the full surface of an AI-augmented infrastructure stack. Warden secures the Kubernetes runtime — Falco and OPA detecting threats as they happen, Claude triaging before an engineer is paged. Covenant controls access at the application layer — OPA as the hard gate between JWT identity and Claude, policy in code not prompts. Watershed closes the loop at the edge — async telemetry buffered through connectivity loss, with Claude flagging anomalies before the data reaches the cloud. Each project stands alone; together they tell one story.

~$2.00

Warden (AKS)

$0.00

Covenant (local Docker)

~$0.05

Watershed (AWS IoT Core)

~$2.05

Combined