SHAP Explained: How AI Tells Your Engineers Exactly Why It Fired an Alert

"Because the Model Said So" Isn't Good Enough

Picture this conversation at your morning maintenance standup:

"The AI says replace the bearing on Motor 12B."

"Why?"

"Because the model confidence is 91%."

"91% of what? What's actually wrong? Is it vibration? Temperature? Is it even a bearing or could it be alignment?"

Silence.

This is the moment where most AI-based maintenance tools lose the people who matter most: the reliability engineers who actually have to make the call. And they're right to push back. Decades of experience tell them that a number without context is useless — or worse, dangerous.

The Trust Problem Is Real

Reliability engineers aren't being stubborn when they reject black-box AI. They're being responsible. They know that:

A model trained on limited data can learn the wrong patterns
Sensor drift, calibration errors, and process changes create false signals
The cost of a wrong shutdown can be tens of thousands of euros per hour
The cost of a missed failure can be even higher

So when an AI system says "fault detected" without explaining its reasoning, it gets ignored. In predictive maintenance adoption, the bottleneck is rarely the technology. It is trust. Engineers need to understand the "why" before they'll act on the "what."

This is exactly what SHAP was designed to solve.

What SHAP Actually Is

SHAP stands for SHapley Additive exPlanations. The name comes from Lloyd Shapley, a Nobel Prize-winning economist who solved a fundamental problem in game theory: how do you fairly distribute the payout of a team effort among the individual players?

Applied to machine learning, the "team" is the set of input features (sensor readings, calculated metrics), and the "payout" is the model's prediction. SHAP answers the question: how much did each feature contribute to this specific prediction?

Here's a simple analogy. Imagine your plant had a perfect day: zero anomalies, everything running smoothly. That's your baseline. Now something changes: vibration goes up, temperature rises, current becomes erratic. SHAP asks, for each of those changes: "If we put this one feature back to its normal value, how much would the prediction change?" Because the answer depends on what the other features are doing, it asks that question across the different combinations of features and averages the results, producing one contribution score per feature.

The result isn't a vague "these features are important in general." It's specific to this prediction, this machine, this moment: "Vibration RMS contributed +0.34 toward the fault prediction. Temperature delta contributed +0.21. RPM standard deviation contributed -0.05, actually pushing away from the fault prediction."

That's not a black box. That's a diagnostic report.
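To make the combination-by-combination accounting concrete, here is a minimal brute-force sketch in Python. Everything in it is an assumption for illustration: the three features, their values, and the `toy_model` scoring function are made up, and a single "normal" baseline stands in for the background data a real SHAP implementation would average over.

```python
from itertools import combinations
from math import factorial

# Hypothetical readings: a "normal" baseline vs. what the sensors report now.
baseline = {"vibration": 1.0, "temperature": 40.0, "current": 10.0}
observed = {"vibration": 2.5, "temperature": 48.0, "current": 10.5}

def toy_model(x):
    """Made-up fault score with one interaction term (not a real model)."""
    vib = x["vibration"] - 1.0
    temp = (x["temperature"] - 40.0) / 10.0
    cur = x["current"] - 10.0
    return 0.1 + 0.2 * vib + 0.1 * temp + 0.05 * cur + 0.1 * vib * temp

def coalition_value(present):
    """Score when only the features in `present` take their observed values."""
    mixed = {k: (observed[k] if k in present else baseline[k]) for k in baseline}
    return toy_model(mixed)

def shapley_values():
    """Exact Shapley values: weighted average of each feature's marginal gain."""
    names = list(baseline)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = coalition_value(set(subset) | {name})
                without_feature = coalition_value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi

phi = shapley_values()
for name, value in phi.items():
    print(f"{name:12s} {value:+.3f}")
```

The contributions add up exactly to the gap between the score for the observed readings and the score for the baseline, which is what lets a waterfall chart start at a base value and land on the final prediction. The loop visits every subset of features, so its cost doubles with each feature added; tree-specific algorithms such as TreeSHAP exist precisely to avoid that.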

How Prevly applies this. Prevly uses SHAP directly for its gradient-boosted Remaining-Useful-Life model. For the deep-learning models — the LSTM autoencoder that detects anomalies and the CNN that classifies bearing faults — it uses a closely related attribution method, Integrated Gradients, which produces the same kind of per-feature contribution breakdown. The waterfall chart and how you read it are identical; only the underlying math differs (Shapley values vs. integrated gradients). So everything in this guide applies to every Prevly alert, whichever model fired it.
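For readers curious how Integrated Gradients arrives at the same kind of breakdown, here is a minimal sketch. It is an illustration under loud assumptions: a tiny logistic model with made-up weights stands in for a neural network, the all-zeros baseline is arbitrary, and nothing here reflects Prevly's actual models or code.

```python
import math

# Hypothetical weights for three scaled input features (illustrative only).
WEIGHTS = [0.8, 0.5, -0.3]
BIAS = -1.0

def predict(x):
    """Logistic score in (0, 1) for a feature vector x."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Analytic gradient of predict() with respect to each input."""
    p = predict(x)
    return [w * p * (1.0 - p) for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Average the gradient along the straight path from baseline to x,
    then scale by how far each feature moved (midpoint Riemann sum)."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]

baseline = [0.0, 0.0, 0.0]
x = [2.0, 1.5, 0.5]
attributions = integrated_gradients(x, baseline)

for i, a in enumerate(attributions):
    print(f"feature_{i}: {a:+.4f}")
print(f"sum of attributions: {sum(attributions):.4f}")
print(f"prediction gap:      {predict(x) - predict(baseline):.4f}")
```

The last two printed numbers agree up to the numerical integration error. That shared property, contributions that sum to the gap between prediction and baseline, is why the same waterfall chart works for both methods.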

Reading the Waterfall: A Bearing Fault Example

Let's walk through a real prediction. The model has flagged a bearing fault on a centrifugal pump with 87% confidence. Six features drove the prediction. Here's the SHAP waterfall:

Base value (healthy average): 0.12
───────────────────────────────────────────────
vibration_x_rms     ████████████████░  +0.34
temperature_delta   ██████████░        +0.21
current_kurtosis    █████░             +0.12
flow_rate_mean      ██░                +0.04
rpm_std             ▓░                 -0.05
pressure_slope      ▓▓░                -0.08
───────────────────────────────────────────────
Final prediction:                       0.70
                                   (displayed as 87% fault probability)

Reading this from top to bottom:

vibration_x_rms (+0.34): This is the biggest driver. Radial vibration RMS has increased beyond what the model considers normal for this machine under current operating conditions. Not just "above threshold" — above the model's learned baseline for this specific pump at this speed and load. An experienced engineer seeing this would immediately think: mechanical looseness, imbalance, or bearing defect.

temperature_delta (+0.21): This isn't absolute temperature — it's the difference between bearing temperature and housing temperature. A growing delta means the bearing is generating more heat than it should relative to its surroundings. This rules out ambient temperature changes and points to internal friction.

current_kurtosis (+0.12): Kurtosis measures the "spikiness" of the motor current signal. Elevated kurtosis means brief, sharp current fluctuations, the kind you get when a damaged bearing intermittently catches and makes the motor work harder in short bursts. A Gaussian signal has a kurtosis of 3.0; this motor's current is showing 4.8.

flow_rate_mean (+0.04): A small positive contribution. Flow has dropped slightly, consistent with increased mechanical resistance in the pump — but not enough on its own to flag anything.

rpm_std (-0.05): Here's where it gets interesting. The negative value means this feature is pushing away from the fault prediction. RPM is stable, which tells the engineer: this isn't a drive problem, VFD issue, or load variation. The motor speed is consistent. That actually helps narrow the diagnosis — the problem is downstream of the drive.

pressure_slope (-0.08): Discharge pressure trend is flat. Again, this is evidence against certain failure modes (like impeller erosion or cavitation, which would show pressure changes). The model accounts for this — it's not just looking at what's wrong, it's also considering what's normal.
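The feature names in the walkthrough map onto simple statistics over a window of raw samples. The sketch below shows one plausible way to compute them in plain Python. The function names, the synthetic signals, and the temperature values are assumptions for illustration; the actual feature pipeline behind these alerts is not described in this guide.

```python
import math
import random

def rms(samples):
    """Root mean square: overall energy of a vibration or current signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def std(samples):
    """Population standard deviation, e.g. for RPM stability."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

def kurtosis(samples):
    """Pearson kurtosis (not excess kurtosis): about 3.0 for Gaussian data."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)

def slope(samples):
    """Least-squares trend per sample step, e.g. for discharge pressure."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

# Synthetic demo data, not real sensor readings.
rng = random.Random(0)
smooth = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Same noise with a sharp spike injected every 200 samples.
spiky = [s + (6.0 if i % 200 == 0 else 0.0) for i, s in enumerate(smooth)]

print(f"kurtosis, smooth signal: {kurtosis(smooth):.2f}")
print(f"kurtosis, spiky signal:  {kurtosis(spiky):.2f}")

# temperature_delta is bearing minus housing temperature (made-up values).
bearing_temp, housing_temp = 71.5, 64.0
print(f"temperature_delta: {bearing_temp - housing_temp:.1f}")
```

A handful of brief spikes barely moves the RMS but lifts the kurtosis well above 3, which is why the two features carry different diagnostic information.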

Second Example: Motor Stator Winding Fault

Bearing faults are the textbook example, but SHAP works identically for electrical failure modes — where the feature contributions tell a completely different story.

The model flags Motor 22C with 79% fault probability. Here's the SHAP waterfall:

Base value (healthy average): 0.15
───────────────────────────────────────────────
current_rms         ██████████████░    +0.28
current_imbalance   █████████░         +0.19
temperature_stator  ██████░            +0.14
vibration_x_rms     ██░                +0.04
power_factor        ▓▓░                -0.07
rpm_std             ▓░                 -0.03
───────────────────────────────────────────────
Final prediction:                       0.70
                                   (displayed as 79% fault probability)

The pattern is immediately recognizable to any motor specialist: current-dominated, not vibration-dominated. Current RMS is elevated, phase imbalance is growing (indicating asymmetric winding resistance), and stator temperature is rising, the classic signature of an early-stage inter-turn short circuit. Meanwhile, vibration and RPM are essentially normal, ruling out mechanical causes.

Without SHAP, the alert would say "anomaly detected on Motor 22C." The engineer would default to checking the bearing (the most common failure mode). With SHAP, they go straight to electrical testing — megger insulation resistance, surge comparison, and thermal imaging of the winding — saving hours of misdirected investigation.

This example illustrates a critical point: SHAP doesn't just tell you something is wrong, it tells you what kind of wrong, guiding the diagnostic workflow before a technician even walks to the machine.
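One way to make "what kind of wrong" machine-readable is to sum the fault-ward contributions per sensor family. The sketch below does that for the Motor 22C numbers and also checks that the base value plus all contributions lands on the final prediction. The `FAMILY` mapping is a hypothetical grouping chosen for this example, not something the model itself provides.

```python
base_value = 0.15
contributions = {
    "current_rms": 0.28,
    "current_imbalance": 0.19,
    "temperature_stator": 0.14,
    "vibration_x_rms": 0.04,
    "power_factor": -0.07,
    "rpm_std": -0.03,
}

# Hypothetical mapping from feature name to sensor family.
FAMILY = {
    "current_rms": "electrical",
    "current_imbalance": "electrical",
    "power_factor": "electrical",
    "temperature_stator": "thermal",
    "vibration_x_rms": "mechanical",
    "rpm_std": "mechanical",
}

def fault_push_by_family(contribs):
    """Sum only the positive contributions (evidence toward a fault) per family."""
    totals = {}
    for feature, value in contribs.items():
        if value > 0:
            family = FAMILY[feature]
            totals[family] = totals.get(family, 0.0) + value
    return totals

# Additivity: the waterfall starts at the base value and ends at the prediction.
final_score = base_value + sum(contributions.values())
push = fault_push_by_family(contributions)
dominant = max(push, key=push.get)

print(f"final score: {final_score:.2f}")    # final score: 0.70
print(f"dominant family: {dominant}")       # dominant family: electrical
```

Here the electrical features push 0.47 toward the fault against 0.04 for the mechanical ones, which is the "current-dominated" pattern in numeric form.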

SHAP vs. Other Explainability Methods

SHAP isn't the only approach to ML explainability, but it has specific advantages for industrial use. LIME (Local Interpretable Model-agnostic Explanations) approximates the model locally with a simpler linear model. Because it relies on random perturbation sampling, LIME attributions can vary between runs for the same prediction, which undermines trust in a maintenance context where consistency matters. Attention weights from Transformer models show which timesteps the model focused on, but they don't provide per-feature attribution: you know when the model looked, but not what it saw. SHAP provides per-feature contributions grounded in game theory that always sum to the gap between the prediction and the baseline. For tree-based models such as gradient-boosted ensembles, TreeSHAP computes them exactly and deterministically, which makes SHAP the strongest choice when engineers need repeatable, auditable explanations they can act on.

What This Means in Practice

An engineer reading the bearing fault breakdown from the first example draws the same conclusion they would from a manual vibration analysis, but in seconds instead of hours:

Elevated radial vibration + bearing temperature rise = mechanical bearing defect
Current kurtosis confirms intermittent mechanical resistance
Stable RPM rules out drive/electrical issues
Stable pressure rules out hydraulic/cavitation issues
Most likely diagnosis: outer race defect, early-to-mid stage

They now have a specific, testable hypothesis. They can schedule an ultrasound inspection, check the vibration spectrum for the outer race defect frequency (BPFO, ball pass frequency outer), and make a data-backed decision about when to intervene.
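Checking the spectrum for BPFO starts with knowing where to look. The sketch below uses the standard kinematic formula for a bearing with a stationary outer race, then searches a list of spectrum peaks for that frequency and its harmonics. The bearing geometry, shaft speed, and peak list are illustrative values, not data from the pump in this example.

```python
import math

def bpfo_hz(shaft_rpm, n_rolling_elements, ball_diameter, pitch_diameter,
            contact_angle_deg=0.0):
    """Ball pass frequency, outer race, assuming a stationary outer race.
    Diameters only need to share the same unit."""
    shaft_hz = shaft_rpm / 60.0
    ratio = ball_diameter / pitch_diameter
    angle = math.radians(contact_angle_deg)
    return (n_rolling_elements / 2.0) * shaft_hz * (1.0 - ratio * math.cos(angle))

def matching_peaks(peaks_hz, target_hz, harmonics=3, tolerance_hz=1.0):
    """Return (harmonic, peak) pairs where a spectrum peak sits near k * target."""
    hits = []
    for k in range(1, harmonics + 1):
        for peak in peaks_hz:
            if abs(peak - k * target_hz) <= tolerance_hz:
                hits.append((k, peak))
    return hits

# Illustrative geometry (mm) and spectrum peaks (Hz).
target = bpfo_hz(shaft_rpm=1800, n_rolling_elements=9,
                 ball_diameter=7.94, pitch_diameter=39.04)
peaks = [30.0, 60.0, 107.4, 215.3, 400.0]
hits = matching_peaks(peaks, target)

print(f"expected BPFO: {target:.1f} Hz")
for harmonic, peak in hits:
    print(f"peak at {peak} Hz matches harmonic {harmonic}")
```

Real bearings slip slightly, so measured defect frequencies usually sit a little off the kinematic value; the tolerance parameter is there to absorb that.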

Compare this to "anomaly detected, confidence 87%." It's the difference between actionable intelligence and noise.

Why Plant Managers Should Care

Explainability isn't just an engineering nicety. For plant managers and operations leaders, SHAP attribution serves three critical business functions:

Audit trail. Every prediction comes with a complete record of what drove it. When leadership asks "why did we shut down Line 3 on Tuesday?" the answer isn't "the AI told us to." It's "bearing vibration RMS was 2.3x baseline, thermal delta was rising at 0.4 degrees per day, and current kurtosis indicated intermittent mechanical resistance — consistent with an outer race defect confirmed on inspection." That holds up in any review.

Compliance and standards. ISO 55000 (asset management) and ISO 27001 (information security) both emphasize documented decision-making processes. SHAP attribution gives you machine-generated, timestamped documentation for every maintenance decision that involved AI. When the auditor asks how your AI works, you can show them exactly what it considers and why.

Reduced false positive cost. When engineers trust the alerts — because they can verify the reasoning — they act on them faster and more accurately. No more "cry wolf" fatigue where valid alerts get dismissed because the system has a reputation for false alarms. Every alert comes with its own evidence, and engineers can quickly distinguish between a real degradation pattern and a sensor glitch.

How Prevly Implements This

In Prevly, explainability isn't an afterthought or a premium add-on. Every anomaly alert and every RUL (Remaining Useful Life) prediction automatically includes the top contributing features with their attribution values: SHAP for the RUL model, Integrated Gradients for the deep-learning models. The waterfall visualization is built into the alert detail view, so engineers see it the moment they open an alert.

The system computes the attribution in real time using the same model that generated the prediction. There's no separate explainability pipeline to maintain. And because Prevly learns a separate baseline per machine, the attribution values reflect what's abnormal for that specific asset, not some generic threshold from a standards book.

For teams that want to go deeper, the full SHAP feature vector is available via API, enabling integration with existing CMMS workflows, custom dashboards, or root cause analysis tools.
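As a rough idea of what such an integration could look like, the sketch below turns an attribution payload into a short note for a work order. This guide doesn't document the API's response format, so the payload shape and every field name here are invented purely for illustration, and `work_order_note` is a hypothetical helper, not part of any Prevly SDK.

```python
import json

# Entirely hypothetical payload: field names are illustrative, not a real schema.
payload = json.loads("""
{
  "asset": "Motor 22C",
  "base_value": 0.15,
  "attributions": {
    "current_rms": 0.28,
    "current_imbalance": 0.19,
    "temperature_stator": 0.14,
    "vibration_x_rms": 0.04,
    "power_factor": -0.07,
    "rpm_std": -0.03
  }
}
""")

def work_order_note(data, top_n=3):
    """Build a plain-text summary of the strongest drivers, by absolute value."""
    ranked = sorted(data["attributions"].items(),
                    key=lambda item: abs(item[1]), reverse=True)
    lines = [f"Alert on {data['asset']}: top {top_n} drivers"]
    for name, value in ranked[:top_n]:
        direction = "toward fault" if value > 0 else "away from fault"
        lines.append(f"- {name}: {value:+.2f} ({direction})")
    return "\n".join(lines)

note = work_order_note(payload)
print(note)
```

Ranking by absolute value keeps strong "away from fault" evidence visible too, since what the model ruled out is often as useful to the technician as what it flagged.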

Trust Is the Prerequisite

The best ML model in the world is worthless if nobody acts on its output. SHAP bridges the gap between what AI can detect and what engineers will actually trust. It turns a prediction into a conversation — one where the AI shows its work and the engineer decides what to do with it.

That's not AI replacing expertise. That's AI augmenting expertise with speed and consistency.

Start a free trial at prevly.org and see explainable AI predictions on your own equipment data. Every alert comes with the "why" built in.