Nomad Edge — SCDC Edge Platform (Hero Case Study)

The hero project. Nomad Edge is a production-grade Single Cluster, Distant Client (SCDC) platform that takes workloads from Terraform module to Packer image to Nomad deployment, with telemetry and incident workflows managed in Linear.

Core Technologies

HashiCorp Nomad, Terraform, Packer, Node.js, AWS, Linear

Architecture Components

  • Three Nomad servers (The Brain) in us-east-1 for high availability and consistent scheduling
  • Nomad clients (The Sentry nodes) at remote sites running Node.js sensor workloads
  • Terraform modules defining VPC, subnets, security groups, and Nomad cluster topology
  • Packer pipelines baking immutable AMIs for rural edge nodes
  • VPN or WireGuard bridge providing secure, resilient connectivity between cloud and edge

Problem

Field teams in rural and remote areas need consistent access to critical tools, but bandwidth is limited, latency is high, and connectivity drops unexpectedly.

  • Centralised, cloud-only systems fail hard when the network does.
  • Manual configuration of edge devices makes scale and recovery slow.
  • Operations teams need observability into distant sites without constant SSH access.

Solution

Use a Single Cluster, Distant Client (SCDC) model with Nomad as orchestrator, backed by Terraform and Packer, and Node.js services designed to tolerate network partitions.

  • Run a hardened Nomad control plane in the cloud as the single source of scheduling truth.
  • Attach distant edge clients over secure tunnels with retry-friendly configurations.
  • Push offline-capable Node.js workloads to the edge so work continues when the link drops.

Outcome

Nomad Edge turns fragile rural links into a tolerable constraint instead of a blocker, giving teams consistent tools and operators clear visibility.

  • Edge workers can continue operating while offline and reconcile state when connectivity returns.
  • New sites can be brought online with a small Terraform change and a Packer-built image.
  • Operators gain a single pane of glass for workload health across distant clients.

Key Learnings & Decisions

  • Adopted an SCDC Topology to centralise control while keeping remote sites operational during network partitions, reducing operational overhead.
  • Standardised on Immutable Edge Nodes using Packer, which lowers Mean Time To Recovery (MTTR) and reduces the need for on-site expertise.
  • Codified all resources using Infrastructure as Code (Terraform) to make changes reviewable, auditable, and reproducible.
  • Structured services to be Offline-first, using local persistence and periodic syncs to tolerate flaky upstream network links.
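The offline-first pattern in the last point can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `OfflineQueue` class, the `store` interface (`load`/`save`), and the `send` callback are all hypothetical names, and the in-memory store stands in for whatever local persistence an edge node really uses.

```javascript
// Hypothetical offline-first buffer: records are persisted locally first,
// then flushed upstream in order whenever the link is available.
class OfflineQueue {
  constructor(store, send) {
    this.store = store; // assumed interface: load() -> array, save(array)
    this.send = send;   // assumed async function that throws when the link is down
    this.pending = store.load();
  }

  // Persist before returning, so a crash or power loss keeps the record.
  enqueue(record) {
    this.pending.push(record);
    this.store.save(this.pending);
  }

  // Deliver the backlog oldest-first; stop at the first failure and keep
  // the remainder for the next sync. Returns the number delivered.
  async sync() {
    let delivered = 0;
    while (this.pending.length > 0) {
      try {
        await this.send(this.pending[0]);
      } catch (err) {
        break; // link is down or upstream refused; try again on the next sync
      }
      this.pending.shift();
      this.store.save(this.pending);
      delivered += 1;
    }
    return delivered;
  }
}

// In-memory store for illustration only; a real node would write to disk.
function memoryStore() {
  let data = [];
  return {
    load: () => [...data],
    save: (items) => { data = [...items]; },
  };
}
```

A periodic timer would call `sync()` on each tick. Because a record is only dropped after `send` resolves, delivery is at-least-once, so the upstream endpoint in this sketch would need to tolerate duplicates.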

Implementation Milestones

A breakdown of the key tasks and milestones that brought this project to life.

The Nervous System

Complete

Solidified the telemetry and edge intake path so that remote sites can report in reliably, even when the network behaves badly.

Key Tasks Completed

  • Terraform Base Infrastructure

    Laid the foundation: VPCs, subnets, security groups, and Nomad server cluster topology defined as reusable Terraform modules. This became the single source of truth for all Nomad Edge environments.

  • Packer Image Pipeline

    Built the Packer pipeline that bakes the Nomad client, Node.js runtime, and baseline observability into immutable AMIs. Edge nodes now boot consistently and join the cluster cleanly.

  • Nomad Cluster Bootstrap

    Configured the three-node Nomad server cluster (The Brain) in us-east-1 as a Raft quorum that tolerates the loss of one server, with high-availability networking. The control plane is hardened and ready for distant clients.

  • VPN/WireGuard Bridge

    Set up secure tunnels between cloud and edge with aggressive retry logic and keepalives. The bridge tolerates flaky links and reconnects gracefully when connectivity returns.
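As a rough illustration of retry logic that suits flaky links, the sketch below uses capped exponential backoff with full jitter. It is an application-level example under stated assumptions, not the bridge's actual configuration: `backoffDelayMs`, `connectWithRetry`, and the default timings are hypothetical.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): exponential growth, capped,
// with full jitter so many sites do not reconnect in lockstep.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Call `connect` until it succeeds; rethrow the last error after maxAttempts.
async function connectWithRetry(connect, { maxAttempts = Infinity, sleep = defaultSleep, ...backoff } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await connect();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, backoff));
    }
  }
}
```

The jitter matters most after a regional outage, when many edge nodes regain connectivity at about the same time and would otherwise retry in synchronised bursts.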

The Brain

Complete

Turned raw telemetry into decisions, history, and alerts by layering analysis and storage on top of the Nomad Edge stream.

Key Tasks Completed

  • The Analyst

    Designed streaming analysis that prefers clear, explainable thresholds over clever magic. After enough high-pressure incidents, boring and reliable wins.

  • The Historian

    Built a history layer that keeps a truthful record of rural outages and recoveries so we can spot patterns, not just fight fires one at a time.
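To make the idea of an explainable threshold feeding an outage history concrete, here is a small sketch. It assumes each site emits periodic heartbeats with timestamps, which the source does not state; `findOutages` and the record shape are hypothetical.

```javascript
// Given sorted heartbeat timestamps (ms since epoch) for one site, return the
// gaps longer than `maxGapMs` as outage records. The rule is deliberately
// plain: no heartbeat for more than maxGapMs means the site was unreachable.
function findOutages(heartbeats, maxGapMs) {
  const outages = [];
  for (let i = 1; i < heartbeats.length; i += 1) {
    const gap = heartbeats[i] - heartbeats[i - 1];
    if (gap > maxGapMs) {
      outages.push({
        start: heartbeats[i - 1], // last heartbeat before the silence
        end: heartbeats[i],       // first heartbeat after recovery
        durationMs: gap,
      });
    }
  }
  return outages;
}
```

A single threshold like this is easy to explain during an incident review, and the resulting records can be stored per site to look for recurring patterns such as time-of-day or weather-related drops.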

Monitoring & Analysis

Cluster Health

Dashboards for Nomad server and client health, allocation status, and edge node connectivity, with alerts on job failures and node flaps.
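One simple way to define a node flap for alerting is to count status transitions inside a sliding window. The sketch below assumes a time-sorted list of `{ at, status }` events per node; that event shape, the `isFlapping` name, and the default window and limit are illustrative assumptions rather than the project's alert rules.

```javascript
// Flag a node as flapping when its status changed at least `maxTransitions`
// times within the last `windowMs`. `events` must be sorted by `at` (ms).
function isFlapping(events, nowMs, { windowMs = 10 * 60 * 1000, maxTransitions = 4 } = {}) {
  const recent = events.filter((e) => e.at > nowMs - windowMs && e.at <= nowMs);
  let transitions = 0;
  for (let i = 1; i < recent.length; i += 1) {
    if (recent[i].status !== recent[i - 1].status) transitions += 1;
  }
  return transitions >= maxTransitions;
}
```

Alerting on flaps rather than on every single status change keeps noise down for sites whose links are expected to drop occasionally.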

Edge Telemetry

Node.js services emit structured logs and metrics from distant clients so rural deployments can be observed without constant SSH access.
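A structured log line from an edge service might look something like the output of the sketch below. The field names (`site`, `service`, `level`, `msg`, `ts`) and the `createLogger` helper are assumptions for illustration; the source does not describe the actual schema.

```javascript
// Build one JSON log line. Core fields (ts, level, msg) are written last so
// caller-supplied fields cannot overwrite them.
function formatLog(level, msg, fields = {}, now = new Date()) {
  return JSON.stringify({ ...fields, ts: now.toISOString(), level, msg });
}

// Create a logger that stamps every line with site-level context.
// `write` defaults to stdout, where a log shipper can collect the lines.
function createLogger(base, write = (line) => process.stdout.write(line + "\n")) {
  const log = (level) => (msg, fields = {}) =>
    write(formatLog(level, msg, { ...base, ...fields }));
  return { info: log("info"), warn: log("warn"), error: log("error") };
}
```

For example, `createLogger({ site: "site-a", service: "sensor" }).warn("link degraded", { rttMs: 840 })` writes a single JSON object per line, which is straightforward to filter by site or level without shell access to the node.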

Cost & Capacity

Terraform-managed tagging plus cloud-native metrics to understand per-site capacity, cost, and utilisation trends as more clients are added.

Terraform – Resilient Nomad Edge node (rural Saskatchewan)
