IoT Streaming Analytics: Architecture, Stack & Delivery
Streaming analytics for device monitoring and failure detection, built with NATS, SQLMesh, RisingWave, and Python.
By Yogendra Raghuvanshi
Introduction
In this article I break down how I designed and delivered IoT Streaming Analytics — from the original business pain point through architecture, technology choices, implementation phases, and lessons learned. This is the same project featured in my portfolio's Built Solutions section, documented here in full technical depth for engineers, architects, and hiring managers who want to understand how the work was actually done.
I led this initiative as part of my broader program delivery work across enterprise AI, data platforms, and analytics transformation. The approach reflects how I operate: start with the business outcome, choose the minimum viable architecture, instrument everything, and iterate with real users.
Business problem
Device failures required near-real-time detection, not batch-only reporting.
To close that gap, I implemented a streaming analytics pipeline on NATS, SQLMesh, and RisingWave that monitors devices and detects failures as events arrive.
Architecture decisions
Key design choices that shaped reliability, performance, and maintainability of the solution.
- The hot path keeps alert latency for temperature anomalies under 1 s
- A replay topic supports backfill after device firmware updates
- SQLMesh keeps batch and stream results consistent for daily reports
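To make the batch/stream consistency idea concrete, here is a minimal sketch that compares per-device daily event counts from a streaming aggregate against a batch recomputation and flags the devices that disagree. It is an illustration only: the project handles this with SQLMesh, and the function name `reconcile_daily_counts`, the 1% tolerance, and the sample data are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def reconcile_daily_counts(
    stream_counts: Dict[str, int],
    batch_counts: Dict[str, int],
    tolerance: float = 0.01,  # assumed tolerance, not a project value
) -> List[Tuple[str, int, int]]:
    """Return (device_id, stream_count, batch_count) for devices that disagree.

    A device is flagged when the relative difference between the two counts
    exceeds `tolerance`, which also covers devices present on only one side.
    """
    mismatches = []
    for device_id in sorted(set(stream_counts) | set(batch_counts)):
        s = stream_counts.get(device_id, 0)
        b = batch_counts.get(device_id, 0)
        baseline = max(s, b)
        if baseline == 0:
            continue  # no events on either side, nothing to compare
        if abs(s - b) / baseline > tolerance:
            mismatches.append((device_id, s, b))
    return mismatches


# Hypothetical counts for one reporting day.
stream = {"dev-1": 1000, "dev-2": 500, "dev-3": 200}
batch = {"dev-1": 1000, "dev-2": 480, "dev-4": 50}
reconciliation_result = reconcile_daily_counts(stream, batch)
for row in reconciliation_result:
    print(row)
```

A relative tolerance is used here so that large fleets do not alert on a handful of late-arriving events; an exact-match check would be the stricter alternative.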
Technology stack in depth
This project was built with NATS, SQLMesh, RisingWave, and Python. Each technology was selected for a specific role in the architecture: not because it was trendy, but because it solved a measured bottleneck. All four run as production components with documented integration patterns and operational runbooks.
- NATS: messaging layer that carries device telemetry on a structured subject hierarchy
- SQLMesh: incremental models for batch reconciliation and daily device health summaries
- RisingWave: materialized views for rolling-window aggregates and threshold-breach alerts
- Python: the general-purpose programming language in the stack
Implementation timeline
Delivery followed phased milestones with explicit deliverables at each gate. This kept stakeholders aligned and made progress auditable for program reviews.
- Event contract design (1 week): Defined device telemetry schema and NATS subject hierarchy. Deliverables: Avro/JSON contracts, subject naming standard, retention policy.
- Stream pipelines (4 weeks): SQLMesh models and RisingWave materialized views for aggregates and alerts. Deliverables: streaming DAG, alert rules, dead-letter handling.
- Operations rollout (2 weeks): Dashboards and on-call runbooks for failure scenarios. Deliverables: ops dashboard, runbooks, SLO definitions.
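The event contract and dead-letter deliverables can be sketched together: validate each JSON payload against a contract and divert anything malformed to a dead-letter list along with the reason. The actual schema is not shown in this article, so the field names (`device_id`, `site`, `device_type`, `metric`, `value`, `ts`) and the helpers `parse_event` and `route` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TelemetryEvent:
    # Hypothetical fields; the real contract is not published here.
    device_id: str
    site: str
    device_type: str
    metric: str
    value: float
    ts: float  # event time as Unix epoch seconds


REQUIRED_FIELDS = {
    "device_id": str,
    "site": str,
    "device_type": str,
    "metric": str,
    "value": (int, float),
    "ts": (int, float),
}


def parse_event(raw: bytes) -> TelemetryEvent:
    """Parse and validate one JSON payload; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], expected):
            raise ValueError(f"wrong type for field: {name}")
    return TelemetryEvent(
        device_id=data["device_id"],
        site=data["site"],
        device_type=data["device_type"],
        metric=data["metric"],
        value=float(data["value"]),
        ts=float(data["ts"]),
    )


def route(payloads: Iterable[bytes]) -> Tuple[List[TelemetryEvent], List[Tuple[bytes, str]]]:
    """Split payloads into valid events and (raw, reason) dead-letter entries."""
    valid, dead_letter = [], []
    for raw in payloads:
        try:
            valid.append(parse_event(raw))
        except ValueError as exc:
            dead_letter.append((raw, str(exc)))
    return valid, dead_letter


payloads = [
    b'{"device_id": "dev-1", "site": "plant-a", "device_type": "chiller",'
    b' "metric": "temperature", "value": 71.5, "ts": 1700000000}',
    b'{"device_id": "dev-2", "site": "plant-a"}',
    b"not json",
]
valid_events, dead_letters = route(payloads)
print(len(valid_events), "valid;", len(dead_letters), "dead-lettered")
```

Keeping the rejection reason next to the raw payload is what makes a dead-letter dashboard useful: operators can group failures by cause instead of inspecting bytes.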
Streaming topology
Device telemetry flows through NATS subjects organized by site, device type, and metric. Producers publish JSON events; downstream, RisingWave computes sub-second aggregates and SQLMesh handles batch reconciliation and daily reporting.
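As a small illustration of how a subject hierarchy like iot.{site}.{device_type}.{metric} supports selective subscriptions, the sketch below builds subjects and re-implements NATS wildcard matching locally, where `*` matches exactly one token and `>` matches one or more trailing tokens. The helper names `build_subject` and `subject_matches` and the sample values are assumptions; in a real deployment the NATS server performs this matching.

```python
def build_subject(site: str, device_type: str, metric: str) -> str:
    """Build a subject of the form iot.{site}.{device_type}.{metric}."""
    tokens = [site, device_type, metric]
    for token in tokens:
        # Tokens must be non-empty and free of separators, wildcards, and whitespace.
        if not token or any(ch in token for ch in ".*> \t\r\n"):
            raise ValueError(f"invalid subject token: {token!r}")
    return "iot." + ".".join(tokens)


def subject_matches(pattern: str, subject: str) -> bool:
    """Local re-implementation of NATS wildcard matching, for illustration."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


subject = build_subject("plant-a", "chiller", "temperature")
print(subject)
print(subject_matches("iot.*.chiller.temperature", subject))  # all sites, one metric
print(subject_matches("iot.plant-a.>", subject))              # everything at one site
```

Putting the site first means a per-site consumer needs only one `>` subscription, while a fleet-wide temperature consumer uses `*` in the site position.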
The hot path targets under one second from event to notification. The cold path replays events from a retention-backed topic when firmware updates require backfill.
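The cold-path selection logic can be sketched as a filter over retained events: keep only events from affected devices, starting at the firmware rollout time or the oldest retained event, whichever is later. This is a simplified in-memory model, not the project's replay mechanism; `StoredEvent`, `select_for_replay`, and the sample timestamps are hypothetical, and the 7-day window mirrors the retention figure quoted in the SLO section.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple, Set


class StoredEvent(NamedTuple):
    device_id: str
    ts: datetime
    value: float


RETENTION = timedelta(days=7)


def select_for_replay(
    events: Iterable[StoredEvent],
    affected_devices: Set[str],
    rollout_start: datetime,
    now: datetime,
) -> Iterator[StoredEvent]:
    """Yield retained events from affected devices at or after the rollout start."""
    oldest_retained = now - RETENTION
    # Events older than the retention window are no longer available to replay.
    start = max(rollout_start, oldest_retained)
    for event in events:
        if event.device_id in affected_devices and start <= event.ts <= now:
            yield event


def utc(day: int) -> datetime:
    """Helper for the sample data: midnight UTC on a day in January 2024."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


log = [
    StoredEvent("dev-1", utc(1), 19.0),  # outside the 7-day window
    StoredEvent("dev-1", utc(5), 20.0),
    StoredEvent("dev-2", utc(6), 30.0),  # device not affected by the rollout
    StoredEvent("dev-1", utc(9), 21.5),
]
replayed = list(select_for_replay(log, {"dev-1"}, rollout_start=utc(4), now=utc(10)))
print([(e.device_id, e.ts.date().isoformat(), e.value) for e in replayed])
```

Clamping the start time to the retention boundary makes the limit explicit: a rollout that began before the window opened can only be partially backfilled.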
- NATS subject hierarchy: iot.{site}.{device_type}.{metric}
- RisingWave materialized views for rolling windows and threshold breaches
- SQLMesh incremental models for daily device health summaries
- Dead-letter queue for malformed payloads with ops dashboard
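To show what a rolling-window threshold breach means in practice, here is a plain-Python sketch of the computation: keep a time-bounded window of readings and flag a breach when the window average exceeds a threshold. In the project this logic lives in RisingWave materialized views; the class `RollingThreshold`, the 10-second window, and the threshold of 80 are assumptions for illustration, and the sketch assumes readings arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class RollingThreshold:
    """Tracks the average of readings in a sliding time window."""

    def __init__(self, window_seconds: float, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._samples: Deque[Tuple[float, float]] = deque()
        self._total = 0.0

    def add(self, ts: float, value: float) -> bool:
        """Record a reading; return True if the window average breaches the threshold."""
        self._samples.append((ts, value))
        self._total += value
        # Evict readings that have fallen out of the window ending at `ts`.
        while self._samples[0][0] <= ts - self.window_seconds:
            _, old_value = self._samples.popleft()
            self._total -= old_value
        return self._total / len(self._samples) > self.threshold


detector = RollingThreshold(window_seconds=10, threshold=80.0)
readings = [(0, 70.0), (4, 72.0), (8, 90.0), (12, 95.0), (16, 96.0)]
breaches = [detector.add(ts, value) for ts, value in readings]
print(breaches)
```

Averaging over a window rather than alerting on single readings trades a little latency for fewer false alarms from one-off sensor spikes.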
Operations and SLOs
Streaming systems fail silently without explicit SLOs. We defined alert latency, event drop rate, and replay completion time as first-class metrics with on-call runbooks for each failure mode.
- Temperature anomaly alerts: p99 < 1s from event to notification
- Replay topic supports 7-day retention for firmware rollout windows
- Batch/stream consistency validated via SQLMesh reconciliation jobs
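The latency and drop-rate metrics above could be evaluated along these lines. This is a sketch under stated assumptions: it uses the nearest-rank percentile method, the 1 s p99 target comes from the SLO list, and the 0.1% drop-rate budget, the function names, and the sample numbers are invented for the example.

```python
from typing import Dict, Sequence, Union


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Ceiling division in integers avoids floating-point rank errors.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def evaluate_slos(
    latencies_s: Sequence[float],
    published: int,
    delivered: int,
    p99_target_s: float = 1.0,
    max_drop_rate: float = 0.001,  # assumed budget, not a project value
) -> Dict[str, Union[float, bool]]:
    """Compute p99 alert latency and event drop rate, and compare to targets."""
    p99 = percentile(latencies_s, 99)
    drop_rate = (published - delivered) / published if published else 0.0
    return {
        "p99_s": p99,
        "p99_ok": p99 < p99_target_s,
        "drop_rate": drop_rate,
        "drop_ok": drop_rate <= max_drop_rate,
    }


# Hypothetical measurements: 100 alert latencies and one day's event counts.
latencies = [0.2] * 98 + [0.8, 1.5]
slo_report = evaluate_slos(latencies, published=100_000, delivered=99_950)
print(slo_report)
```

Note that the single 1.5 s outlier does not break a p99 target over 100 samples; whether that is acceptable is exactly the kind of decision an explicit SLO forces.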
Business outcomes
The platform enabled proactive operations by surfacing device insights in near real time.
Success was measured against adoption, latency/throughput targets, and stakeholder feedback — not just deployment dates. Program reviews tracked these KPIs alongside technical milestones.
Lessons learned
Streaming architecture must define retention, replay, and alert semantics upfront.
If I were starting again, I would invest even earlier in observability and golden test sets. The cost of retrofitting guardrails after pilot launch always exceeds building them in from day one.