RUL Prediction Explained — With Attribution You Can Audit

A remaining-useful-life (RUL) prediction is only useful if a reliability engineer believes it. "This bearing has 9 days left" is worthless if the next question — why? — gets a shrug. The models that move maintenance decisions are the ones that show their reasoning.

This post covers what RUL prediction actually produces, how the anomaly and RUL models behind it work, and why auditable attribution is the feature that turns a prediction into an action.

What RUL prediction outputs

RUL is an estimate of how much useful life an asset has left before it crosses into failure or unacceptable degradation — expressed in cycles, hours, or days depending on the asset.

A good RUL output is more than a single number:

A point estimate (e.g. "≈ 9 days") for triage.
A confidence band (e.g. P10/P50/P90) so you can plan against the pessimistic case, not just the median.
Feature attribution — which inputs drove this estimate down.

That third item is what separates a decision-support tool from a black box.
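As a rough illustration of the confidence band, the sketch below summarizes a set of sampled RUL values as P10/P50/P90. It assumes the model can produce multiple RUL samples for one asset (for example from an ensemble or Monte Carlo runs); the function name `rul_band` and the sample values are hypothetical, not taken from any particular product.

```python
from statistics import quantiles

def rul_band(samples_days):
    """Summarize sampled RUL values (in days) as P10/P50/P90."""
    # quantiles(n=10) returns the nine decile cut points, P10 through P90.
    cuts = quantiles(samples_days, n=10, method="inclusive")
    return {"p10": cuts[0], "p50": cuts[4], "p90": cuts[8]}

# Hypothetical RUL samples for one bearing (illustrative values only).
samples = [6.5, 7.0, 8.0, 8.5, 9.0, 9.0, 9.5, 10.0, 11.0, 12.5, 14.0]
band = rul_band(samples)
print(band)
```

Planning against `p10` rather than `p50` is what "plan against the pessimistic case" means in practice: it is the remaining life that roughly nine out of ten samples exceed.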

The models behind the number

Different jobs call for different model families. The honest answer is that no single model wins everywhere — you pick by data shape and sample count.

Anomaly detection — LSTM autoencoder. For catching "this doesn't look normal" on vibration and process signals, an LSTM autoencoder is trained only on healthy operation. It learns to reconstruct normal behavior; when reconstruction error spikes, something has changed. Because training needs only healthy data, it works without a labeled failure history, which most plants don't have. (For very small datasets, a simpler Isolation Forest is the cold-start fallback; for very large ones with a GPU available, a transformer-based detector such as TranAD is an option.)

RUL estimation — gradient-boosted trees and sequence models. For predicting remaining life from engineered features (rolling statistics over multiple windows, trend slopes, spectral features), gradient-boosted trees are a strong, fast, interpretable baseline. Where raw multi-sensor sequences are available, an LSTM learns the temporal degradation pattern directly. On the public NASA C-MAPSS turbofan dataset, a tuned LSTM RUL model reaches an RMSE of around 11.5 cycles with an MAE under 9 cycles, which is competitive with published results, and it trains in under a minute on a single GPU. For probabilistic life curves, Weibull-RNN and physics-informed models (e.g. Paris-law crack growth) add P10/P50/P90 bands.
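To make "engineered features" concrete, here is a minimal sketch of rolling statistics and a trend slope computed over more than one window. The function names, window sizes, and signal values are assumptions for illustration; a real pipeline would typically add spectral features and use a numerical library.

```python
from math import sqrt

def rolling_features(signal, window):
    """Mean, RMS, and least-squares slope over the last `window` samples."""
    w = signal[-window:]
    n = len(w)
    mean = sum(w) / n
    rms = sqrt(sum(v * v for v in w) / n)
    # Least-squares slope of value against sample index 0..n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (v - mean) for t, v in enumerate(w))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    return {"mean": mean, "rms": rms, "slope": slope}

def feature_row(signal, windows=(4, 8)):
    """One flat feature row combining every statistic at every window."""
    row = {}
    for win in windows:
        for name, value in rolling_features(signal, win).items():
            row[f"{name}_w{win}"] = value
    return row

# Hypothetical vibration RMS readings, oldest first (illustrative values).
vibration = [0.50, 0.52, 0.51, 0.55, 0.58, 0.62, 0.67, 0.73]
row = feature_row(vibration)
print(row)
```

A row like this is the kind of input a gradient-boosted model would consume. A slope that is steeper over the short window than over the long one suggests that degradation is accelerating.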

Fault classification — 1D-CNN. For diagnosing what is wrong (bearing outer-race, inner-race, imbalance, misalignment), a 1D convolutional network on high-frequency vibration windows classifies the fault signature directly.

The point isn't the acronyms. It's that these are the same architectures used in peer-reviewed condition-monitoring work — not simplified toys — and they run on-premise.

Why "vibration anomaly detection LSTM" keeps coming up

If you search the technical literature, the LSTM autoencoder shows up constantly for vibration anomaly detection, for three reasons:

It models sequence, not snapshots. Vibration is inherently temporal; an autoencoder over a sliding window captures how a signal evolves, not just its instantaneous value.
It needs no failure labels. Train on healthy data, flag deviations. That matches the reality of plants that haven't catalogued every failure mode.
The threshold is tunable. Reconstruction error gives you a continuous score; you set the alarm threshold to trade precision against recall, depending on how many false alarms you can tolerate.

The trade-off is that an anomaly score alone says "something's off" — not what or how urgent. That's why anomaly detection feeds RUL and fault models, rather than replacing them.
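The threshold trade-off can be sketched in a few lines. The scores below are hypothetical stand-ins for reconstruction errors (no autoencoder is trained here), and the small labeled validation set is an assumption used only to show how precision and recall move as the threshold changes.

```python
def threshold_from_healthy(healthy_errors, quantile=0.99):
    """Pick the alarm threshold as a high quantile of healthy-run errors."""
    ordered = sorted(healthy_errors)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]

def precision_recall(scores, labels, threshold):
    """Precision and recall of `score > threshold` against 0/1 labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Hypothetical reconstruction errors from healthy operation only.
healthy = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.14, 0.12, 0.11, 0.15]
threshold = threshold_from_healthy(healthy, quantile=0.9)

# Hypothetical validation scores with known outcomes (1 = real fault).
scores = [0.11, 0.13, 0.16, 0.22, 0.30, 0.12, 0.18, 0.45]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

low = precision_recall(scores, labels, 0.15)   # more alarms, fewer misses
high = precision_recall(scores, labels, 0.25)  # fewer alarms, more misses
print(threshold, low, high)
```

With these illustrative numbers, the lower threshold catches every fault at the cost of one false alarm, while the higher threshold raises no false alarms but misses half the faults.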

Attribution: the part that makes it auditable

Here's the requirement that separates production-grade predictive maintenance (PdM) from a science project: every prediction must be explainable feature by feature.

For the tree-based RUL models, per-feature contribution analysis (SHAP-style attribution) shows exactly how much each input — vibration RMS, temperature trend, pressure variance — pushed the estimate up or down for this specific prediction.
For the deep models (LSTM autoencoder, 1D-CNN), gradient-based attribution (Integrated Gradients) maps the model's output back onto the input signal, so you can see which part of the waveform or which window drove the anomaly score.
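For intuition on the tree-model case, the sketch below computes exact Shapley values by enumerating feature coalitions, which is the quantity SHAP-style methods estimate. It is not how SHAP libraries compute attributions for real tree ensembles (they use much faster model-specific algorithms), and `toy_rul` is a hypothetical stand-in for a trained model, with made-up coefficients and feature values.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

def toy_rul(f):
    """Hypothetical RUL model in days; coefficients are illustrative only."""
    wear = (8.0 * f["vib_rms"] + 0.5 * f["temp_trend"]
            + 4.0 * f["vib_rms"] * f["pressure_var"])
    return 30.0 - wear

baseline = {"vib_rms": 0.5, "temp_trend": 0.0, "pressure_var": 0.1}  # assumed healthy reference
current = {"vib_rms": 1.5, "temp_trend": 4.0, "pressure_var": 0.6}   # assumed current reading
phi = shapley_values(toy_rul, current, baseline)
print(phi)
```

The attributions sum to the difference between the current prediction and the baseline prediction, which is what lets an engineer read them as "how many days each input took off the estimate." For the deep models, Integrated Gradients satisfies an analogous completeness property with respect to its chosen baseline input.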

Why does this matter beyond engineering curiosity?

Triage. An engineer can sanity-check the model against domain knowledge in seconds. "RUL dropped and the top driver is bearing-frequency vibration energy" is actionable. "RUL dropped, reason unknown" gets ignored.
Trust. Maintenance teams adopt tools they can interrogate. Attribution is how a model earns standing orders instead of being overridden.
Audit. In regulated environments, "the model said so" isn't a defensible basis for a maintenance decision. A per-feature attribution record is.

A prediction without attribution is a number you have to take on faith. A prediction with attribution is a hypothesis you can verify — and that's the only kind reliability engineers act on.
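One way to picture a per-feature attribution record is as a small, serializable structure stored alongside each prediction. The sketch below is only an assumption about what such a record might contain; the class name, field names, and values are hypothetical and do not describe any specific product's schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PredictionRecord:
    asset_id: str
    model_version: str
    predicted_at: str      # ISO 8601 timestamp, supplied by the caller
    rul_days_p10: float
    rul_days_p50: float
    rul_days_p90: float
    attribution: dict      # feature name -> signed contribution in days

    def top_drivers(self, k=3):
        """Features ranked by absolute contribution, largest first."""
        ranked = sorted(self.attribution.items(),
                        key=lambda kv: abs(kv[1]), reverse=True)
        return ranked[:k]

    def to_json(self):
        # sort_keys gives a stable serialization that is easy to diff or hash.
        return json.dumps(asdict(self), sort_keys=True)

# Illustrative values only.
record = PredictionRecord(
    asset_id="pump-07",
    model_version="rul-example-0.0",
    predicted_at="2024-01-01T00:00:00Z",
    rul_days_p10=6.0,
    rul_days_p50=9.0,
    rul_days_p90=13.0,
    attribution={"vib_rms": -9.4, "temp_trend": -2.0, "pressure_var": -2.0},
)
print(record.top_drivers(1))
print(record.to_json())
```

Storing the model version and the attribution together with the estimate is what allows a later reviewer to reconstruct why a maintenance decision was made.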

What to ask about any RUL model

Does it output a confidence band, or just a point estimate?
Can it show per-feature attribution for an individual prediction? (Not global feature importance — this prediction.)
What does it do at cold start, before you have failure history?
What is the published benchmark, and on which public dataset was it measured? (Vague "highly accurate" claims aren't benchmarks.)
Does it run where your data lives?

The best RUL model isn't the one with the lowest error on a slide. It's the one whose every prediction your engineers can check — and therefore trust.

Prevly runs anomaly detection, RUL prediction, and fault classification on-premise, with per-feature attribution on every prediction. Try the interactive demo or request a technical walkthrough.