Edge vs. Cloud for Predictive Maintenance: When to Use Which

The False Dichotomy

Every predictive maintenance vendor will tell you their approach is better. Edge vendors say cloud is too slow and too expensive. Cloud vendors say edge can't handle complex models. Both are wrong — or rather, both are right in specific contexts.

The real question isn't "edge or cloud?" It's "what computation should happen where, and why?"

The Decision Matrix

| Factor | Edge | Cloud | Hybrid |
|---|---|---|---|
| Latency requirement | <10ms (safety-critical) | <500ms (acceptable) | Mixed |
| Connectivity | Intermittent/none | Reliable | Variable |
| Model complexity | Simple (Isolation Forest, small ONNX models) | Complex (LSTM, Transformer) | Both |
| Data volume | >100K points/sec | Aggregated summaries | Pre-filtered |
| Regulatory | Data can't leave site | Cloud-compliant region | Processed locally, metadata to cloud |
| Cost at scale | Fixed CAPEX | Variable OPEX | Optimized |
| Update frequency | Manual/scheduled | Continuous | Staged |

When Edge Wins

1. Safety-Critical Latency

A compressor surge detection system cannot wait 200ms for a cloud round-trip. By the time the response arrives, mechanical damage is already done.

Rule of thumb: If the response time requirement is under 50ms, it must run at the edge. Period.

Real example: A centrifugal compressor running at 12,000 RPM completes a full revolution in 5ms. Surge detection needs to trigger within 2-3 revolutions. That's 10-15ms — impossible over a network round-trip to the cloud.
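The arithmetic behind that budget is easy to check. The following is a minimal sketch, not part of any product API: the function names are made up for illustration, and the 200 ms round-trip is simply the figure quoted earlier in this section.

```python
def revolution_period_ms(rpm: float) -> float:
    """Time for one shaft revolution, in milliseconds."""
    return 60_000.0 / rpm


def detection_budget_ms(rpm: float, revolutions: float) -> float:
    """Time available if detection must fire within `revolutions` turns."""
    return revolution_period_ms(rpm) * revolutions


period = revolution_period_ms(12_000)        # 5.0 ms per revolution
budget_lo = detection_budget_ms(12_000, 2)   # 10.0 ms
budget_hi = detection_budget_ms(12_000, 3)   # 15.0 ms

# Cloud round-trip taken from the 200 ms figure quoted in the text.
cloud_round_trip_ms = 200.0
fits_in_budget = cloud_round_trip_ms <= budget_hi   # False
```

Even the generous end of the budget is more than ten times shorter than the round-trip, which is why this class of detection stays local.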

2. Unreliable Connectivity

Offshore oil platforms, underground mines, rural manufacturing sites, and mobile equipment (fleet, cranes, ships) often have intermittent connectivity. Your PdM system must keep working when the network doesn't.

What works at the edge:

Isolation Forest anomaly detection (small model, fast inference)
ONNX-exported neural networks (pre-trained in cloud, deployed to edge)
Rule-based alerts as a fallback
Local data buffering with sync-when-connected
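The last item, buffering with sync-when-connected, is the piece most often reinvented. Here is a minimal in-memory sketch using only the standard library. The class name and the `send` callback are hypothetical, and a real agent would likely persist the buffer to disk so it survives a reboot.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForwardBuffer:
    """Bounded FIFO buffer: keep readings locally, upload when the link is up."""

    def __init__(self, max_items: int = 10_000) -> None:
        # deque(maxlen=...) silently drops the oldest item once full.
        self._items: Deque[Dict] = deque(maxlen=max_items)

    def record(self, reading: Dict) -> None:
        self._items.append(reading)

    def flush(self, send: Callable[[Dict], bool]) -> int:
        """Upload buffered readings oldest-first.

        `send` returns True on success. Stop at the first failure so
        nothing is lost or reordered; return the number uploaded.
        """
        sent = 0
        while self._items:
            if not send(self._items[0]):
                break
            self._items.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._items)
```

A reading is removed only after `send` confirms it, so a link that drops mid-flush leaves the remaining readings queued for the next attempt.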

3. Data Sovereignty

Some industries (defense, nuclear, pharmaceutical) prohibit sensor data from leaving the facility. Running inference at the edge and sending only aggregated health scores to a central dashboard satisfies both the ML need and the compliance requirement.

4. Bandwidth Economics

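A single vibration sensor sampling continuously at 25.6 kHz generates ~2 GB/day of raw data even at one byte per sample (25,600 samples/s × 86,400 s ≈ 2.2 billion samples), and 16-bit samples double that. Multiply by 200 sensors, and you're looking at 400 GB/day or more, which is expensive to stream to the cloud and unnecessary for most use cases.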
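To adapt the estimate to your own sensors, the calculation can be sketched as below. It assumes continuous sampling, and the sample width is a parameter because the figure depends heavily on it: one byte per sample reproduces the ~2 GB/day number, while 16-bit samples double it. The function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400


def raw_bytes_per_day(sample_rate_hz: int, bytes_per_sample: int, sensors: int = 1) -> int:
    """Raw data volume for continuously sampled sensors."""
    return sample_rate_hz * bytes_per_sample * SECONDS_PER_DAY * sensors


one_sensor_gb = raw_bytes_per_day(25_600, 1) / 1e9          # about 2.2 GB/day
fleet_gb = raw_bytes_per_day(25_600, 1, sensors=200) / 1e9  # about 442 GB/day
fleet_16bit_gb = raw_bytes_per_day(25_600, 2, sensors=200) / 1e9  # about 885 GB/day
```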

Edge preprocessing: Extract features locally (RMS, kurtosis, spectral peaks, bearing frequencies) and send a 20-byte feature vector instead of the 200 KB raw waveform. That's a 10,000x bandwidth reduction.
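As a rough illustration of how a waveform collapses into 20 bytes, the sketch below computes five time-domain features and packs them as float32 values. The choice of features and the wire format are assumptions for this example (spectral peaks and bearing frequencies would need an FFT and are left out), and the sine wave is a synthetic stand-in for real sensor data.

```python
import math
import struct
from typing import Sequence, Tuple


def extract_features(window: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (rms, peak, crest_factor, kurtosis, mean) for one window."""
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    crest = peak / rms if rms > 0 else 0.0
    var = sum((x - mean) ** 2 for x in window) / n
    fourth = sum((x - mean) ** 4 for x in window) / n
    # Population kurtosis: about 3 for Gaussian noise, 1.5 for a pure sine.
    kurt = fourth / (var * var) if var > 0 else 0.0
    return rms, peak, crest, kurt, mean


def pack_features(features: Sequence[float]) -> bytes:
    """Pack five values as little-endian float32: 5 x 4 = 20 bytes."""
    return struct.pack("<5f", *features)


# Synthetic waveform: 100,000 samples of a 50 Hz sine sampled at 25.6 kHz.
window = [math.sin(2 * math.pi * 50 * i / 25_600) for i in range(100_000)]
payload = pack_features(extract_features(window))

raw_size = len(window) * 2            # assuming 16-bit samples: 200,000 bytes
reduction = raw_size / len(payload)   # 10000.0
```

Rising kurtosis and crest factor are common early indicators of impulsive bearing faults, which is why they earn a place in such a small payload.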

When Cloud Wins

1. Complex Model Training

Training an LSTM autoencoder or a TranAD transformer model requires GPU compute, large datasets spanning months of history, and hyperparameter optimization. This doesn't happen at the edge.

The pattern:

Sensor data flows to cloud (raw or feature-extracted)
Models train on GPU clusters (Ray Train, distributed)
Trained models export to ONNX
ONNX models deploy to edge for inference

2. Cross-Asset Learning

The most powerful PdM capability is learning patterns across your entire fleet. "Pump 7A at Plant Chicago is showing the same degradation pattern that preceded the failure of Pump 3B at Plant Munich last month."

This requires centralized data from all assets — which means cloud. Edge devices only see their local sensors.

3. Advanced Analytics

Root Cause Analysis: PCMCI causal graphs need data from multiple related sensors and assets
Remaining Useful Life: Weibull-RNN models with confidence intervals need historical failure data
Feature-attribution explanations: Computing feature contributions for explainability is computationally expensive
Digital Twins: Physics-informed models require centralized simulation environments

4. Multi-Plant Dashboards

A VP of Operations needs a single view across 15 plants, 3,000 assets, and 20,000 sensors. That's a cloud problem — aggregation, visualization, and role-based access at scale.

The Hybrid Architecture

The best PdM systems use both. Here's how the layers work:

Layer 1: Sensor → Edge Gateway (μs)
  - Signal conditioning, sampling, FFT
  - Immediate safety shutdowns (hardwired, not software)

Layer 2: Edge Agent (ms)
  - Feature extraction (rolling stats, spectral features)
  - ONNX model inference (anomaly score, basic fault class)
  - Local alerting (SMS, relay output, local HMI)
  - Data buffering for batch upload

Layer 3: Cloud Platform (seconds)
  - Full ML pipeline (LSTM, TranAD, Weibull-RNN, CNN)
  - Cross-asset pattern matching
  - Feature-attribution explanations
  - RUL prediction with confidence intervals
  - Dashboard, reporting, CMMS integration

Layer 4: Cloud ML Ops (hours/days)
  - Model retraining on accumulated data
  - A/B testing new model versions
  - AutoML for tenant-specific fine-tuning
  - ONNX export → edge deployment
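To make Layer 2 concrete, here is a minimal sketch of one edge-agent step: keep a rolling window, score each new value, and raise a local alert. Everything here is illustrative. The class and parameter names are invented, the `model` callback merely stands in for an ONNX inference call, and the z-score rule is the kind of rule-based fallback that keeps working without any model at all.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Deque, Optional


class EdgeAgent:
    """One Layer 2 step: rolling stats, score, decide whether to alert."""

    def __init__(
        self,
        window: int = 60,
        threshold: float = 4.0,
        model: Optional[Callable[[float, float, float], float]] = None,
    ) -> None:
        self._history: Deque[float] = deque(maxlen=window)
        self._threshold = threshold
        # `model` stands in for an ONNX session; None means rule-based fallback.
        self._model = model

    def step(self, value: float) -> dict:
        """Score `value` against the rolling window, then add it to the window."""
        if len(self._history) < 2:
            self._history.append(value)   # still warming up
            return {"score": 0.0, "alert": False}
        mu, sigma = mean(self._history), pstdev(self._history)
        if self._model is not None:
            score = self._model(value, mu, sigma)
        elif sigma > 0:
            score = abs(value - mu) / sigma           # z-score
        else:
            # Flat history: any deviation at all is infinitely unusual.
            score = 0.0 if value == mu else float("inf")
        self._history.append(value)
        return {"score": score, "alert": score > self._threshold}
```

In a real deployment the returned dict would feed the local alerting and buffering steps listed under Layer 2.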

Data Flow in Practice

12 kHz vibration → Edge FFT → 256 spectral bins → Cloud (every 10 seconds)
1 Hz temperature/pressure → Edge rolling stats → Cloud (every 60 seconds)
Edge anomaly score → Cloud (real-time via MQTT) → Dashboard
Cloud LSTM prediction → Alert Engine → PagerDuty/ServiceNow
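The upload rates implied by these flows are worth computing once. The sketch below does that under stated assumptions: values travel as 4-byte floats, the rolling stats consist of four numbers per channel, and raw vibration samples are 2 bytes each. None of these widths come from the flows above, so substitute your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    values_per_message: int
    period_s: float
    bytes_per_value: int = 4   # assumption: float32 on the wire

    @property
    def bytes_per_second(self) -> float:
        return self.values_per_message * self.bytes_per_value / self.period_s


# 256 spectral bins every 10 seconds.
spectrum = Stream("vibration spectrum", values_per_message=256, period_s=10)
# Assumed four rolling statistics (mean, min, max, std) every 60 seconds.
stats = Stream("temperature/pressure stats", values_per_message=4, period_s=60)

raw_vibration_bps = 12_000 * 2   # 12 kHz at an assumed 2 bytes per sample
saving = raw_vibration_bps / spectrum.bytes_per_second   # roughly 234x
```

Under these assumptions the spectrum stream needs about 102 bytes per second against 24,000 for the raw signal, before any protocol overhead.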

Cost Optimization

The hybrid approach isn't just technically superior — it's cheaper:

| Architecture | Monthly Cost (200 assets) | Latency | Offline Capable |
|---|---|---|---|
| Cloud-only | ~€2,400 (compute + bandwidth) | 200-500ms | No |
| Edge-only | ~€8,000 (hardware CAPEX amortized) | <10ms | Yes |
| Hybrid | ~€1,800 (reduced bandwidth + smaller cloud) | <10ms local, <500ms cloud | Partially |

The bandwidth savings from edge preprocessing alone typically pay for the edge hardware within 6 months.
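You can run the payback check for your own numbers with a one-line formula. In the sketch below the monthly saving is the difference between the cloud-only and hybrid rows of the table, treated as pure saving, which is an assumption since the table does not say whether the hybrid figure already includes hardware. The hardware cost is a hypothetical value chosen only to show the calculation.

```python
def payback_months(hardware_cost_eur: float, monthly_saving_eur: float) -> float:
    """Months until a one-off hardware spend is recovered by monthly savings."""
    if monthly_saving_eur <= 0:
        raise ValueError("no monthly saving, so the hardware never pays back")
    return hardware_cost_eur / monthly_saving_eur


monthly_saving = 2_400 - 1_800   # cloud-only minus hybrid, from the table
hardware_cost = 3_600            # hypothetical one-off edge hardware spend
months = payback_months(hardware_cost, monthly_saving)   # 6.0
```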

Choosing Your Architecture

Start with cloud if:

You have reliable connectivity (>99% uptime)
Your latency requirement is >100ms
You have <50 assets (edge hardware CAPEX doesn't justify itself)
You want the fastest time-to-value

Start with edge if:

You have unreliable or no connectivity
You have safety-critical latency requirements (<50ms)
Data cannot leave your facility
You already have edge gateways (Raspberry Pi, Siemens IOT2050, etc.)

Start hybrid if:

You have 50+ assets across multiple sites
You need both fast local response AND advanced cloud analytics
You want cross-asset learning with local resilience
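The three checklists can be folded into a small decision function. This is a sketch of one possible reading, not an official selection tool. The `Site` fields and the order of the checks are assumptions, and because the lists leave the 50-100 ms range unspecified, the sketch conservatively treats any requirement of 100 ms or less as needing a local response.

```python
from dataclasses import dataclass


@dataclass
class Site:
    uptime_pct: float                 # network availability, e.g. 99.5
    latency_req_ms: float             # tightest response time the site needs
    assets: int
    sites: int = 1
    data_must_stay_on_site: bool = False
    needs_cloud_analytics: bool = False


def recommend_start(site: Site) -> str:
    """Return 'edge', 'cloud', or 'hybrid' as a starting architecture."""
    # Data sovereignty overrides everything else.
    if site.data_must_stay_on_site:
        return "edge"
    # Cloud needs >99% uptime and a latency requirement above 100 ms.
    needs_local = site.uptime_pct <= 99.0 or site.latency_req_ms <= 100
    if needs_local:
        return "hybrid" if site.needs_cloud_analytics else "edge"
    # Cloud is viable; larger multi-site fleets still benefit from hybrid.
    if site.assets >= 50 and site.sites > 1:
        return "hybrid"
    return "cloud"
```

For example, `recommend_start(Site(uptime_pct=99.9, latency_req_ms=500, assets=20))` returns `"cloud"`, matching the small, well-connected case in the first list.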

Prevly's Approach

Prevly supports all three architectures:

Edge deployment: Standalone edge agent with ONNX inference, a local dashboard, and Parquet batch sync, for sites that need on-premise autonomy and data sovereignty.
Cloud deployment: Full cloud SaaS with all ML models, explainability, RUL prediction, and integrations.
Hybrid deployment: Edge agents plus the cloud platform — fleet-wide learning with local resilience.

See current pricing for up-to-date plans and what each tier includes.

The edge agent runs on any Linux device with Python 3.10+, from a low-cost Raspberry Pi 5 to an industrial Advantech gateway. Models train in the cloud and deploy to the edge automatically via ONNX export.

Because the right answer to "edge or cloud?" is almost always "yes."
