Choosing the Right PdM Solution: Build vs. Buy
You've decided predictive maintenance is worth investing in. The next question: build it yourself or buy a platform?
Both paths work. The right choice depends on your team, timeline, and what you're optimizing for.
The Build Path
What it takes: A data engineering team (2-3 engineers) plus an ML engineer, 6-12 months for MVP, ongoing maintenance.
Where Build Wins
- Full control over models. You train on your exact equipment and failure modes.
- Integration flexibility. Custom SCADA from 2008? You build the exact connectors.
- No per-asset pricing. At very large scale (10,000+ assets), marginal cost is infrastructure only.
Where Build Hurts
- Time to value. Equipment fails while you're building the pipeline.
- The 80% problem. Data engineering typically consumes around 80% of the effort. You're not building a model; you're building a distributed real-time data platform.
- Ongoing maintenance. Models drift. Kafka needs upgrades. This is permanent headcount.
- Cold start for ML. Without pre-trained baselines, your models start from zero.
The Buy Path
What it takes: Vendor evaluation (2-4 weeks), pilot deployment (1-2 weeks), production rollout (2-4 weeks).
Where Buy Wins
- Immediate time to value. Pre-trained models detect anomalies from day one.
- Operational maturity. The vendor has already solved edge collection, feature engineering, model serving, and alert deduplication.
- Cross-industry learning. Models trained across multiple customers transfer knowledge.
- Predictable cost. Per-asset pricing means you know the cost before you start.
Where Buy Hurts
- Vendor dependency. Your predictions depend on a third party.
- Less customization. You might want a specific model architecture the platform doesn't support.
- Data leaves your network. (Though edge-hybrid architectures address this.)
3-Year Total Cost of Ownership
The "build is cheaper at scale" argument is common but rarely backed by actual numbers. Here's an illustrative TCO comparison across three scale points — the point is the cost structure, not the exact euros (see current pricing for real SaaS rates):
| Cost Component | Build (100 assets) | Buy (100 assets) | Build (500 assets) | Buy (500 assets) | Build (2,000 assets) | Buy (2,000 assets) |
|---|---|---|---|---|---|---|
| Year 1: Development | €350,000 | €0 | €350,000 | €0 | €450,000 | €0 |
| Year 1: Infrastructure | €40,000 | Included | €80,000 | Included | €180,000 | Included |
| Year 1: Licensing/SaaS | €0 | €60,000 | €0 | €180,000 | €0 | €480,000 |
| Year 2: Maintenance/Dev | €180,000 | €0 | €220,000 | €0 | €280,000 | €0 |
| Year 2: Infrastructure | €40,000 | Included | €80,000 | Included | €180,000 | Included |
| Year 2: Licensing/SaaS | €0 | €60,000 | €0 | €180,000 | €0 | €480,000 |
| Year 3: Maintenance/Dev | €180,000 | €0 | €220,000 | €0 | €280,000 | €0 |
| Year 3: Infrastructure | €40,000 | Included | €80,000 | Included | €180,000 | Included |
| Year 3: Licensing/SaaS | €0 | €60,000 | €0 | €180,000 | €0 | €480,000 |
| 3-Year Total | €830,000 | €180,000 | €1,030,000 | €540,000 | €1,550,000 | €1,440,000 |
Key assumptions: Build costs include 2 FTE data engineers (€90K/yr each) + 1 FTE ML engineer (€100K/yr) for development, reduced to 1.5 FTE for ongoing maintenance. Buy is modeled at a representative per-asset SaaS rate that declines with volume — see current pricing for the rate at your scale. Infrastructure includes cloud compute, storage, Kafka, databases. These figures are illustrative of the cost structure, not a quote.
The structural takeaway holds regardless of the exact rate: Build front-loads a large fixed cost (development plus a standing ML/data team), while Buy is a predictable subscription with no build risk. The crossover, where building becomes cheaper in annual running cost, typically arrives only at large fleets (~1,500-2,000+ assets), and only if you already have the team. In the table, Build at 2,000 assets costs €460,000 per year from Year 2 against €480,000 for Buy, yet its 3-year total is still €110,000 higher because of the Year 1 development cost. Below that scale, Buy is usually cheaper on total cost; at any scale it wins on time-to-value (weeks vs. months). Your actual savings depend on your negotiated rate, so run the numbers against current pricing.
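To run the numbers yourself, here is a minimal Python sketch of the cost structure. The figures are the illustrative ones from the table (in thousands of euros), and the function and key names are made up for this example. Swap in your own salaries, infrastructure costs, and negotiated SaaS rate.

```python
# Illustrative figures in thousands of euros, copied from the TCO table.
# Keys are asset counts; the field names are hypothetical.
SCENARIOS = {
    100:  {"build_dev_y1": 350, "build_maint": 180, "infra": 40,  "saas": 60},
    500:  {"build_dev_y1": 350, "build_maint": 220, "infra": 80,  "saas": 180},
    2000: {"build_dev_y1": 450, "build_maint": 280, "infra": 180, "saas": 480},
}


def build_cost_in_year(s, year):
    """Build cost for one year: development in year 1, maintenance afterwards."""
    people = s["build_dev_y1"] if year == 1 else s["build_maint"]
    return people + s["infra"]


def three_year_tco(s, years=3):
    """Return (build_total, buy_total) over the given number of years."""
    build = sum(build_cost_in_year(s, y) for y in range(1, years + 1))
    buy = years * s["saas"]  # infrastructure is included in the subscription
    return build, buy


def breakeven_year(s, horizon=15):
    """First year in which cumulative Build cost is <= cumulative Buy cost.

    Returns None if Build never catches up within the horizon.
    """
    build_total = buy_total = 0
    for year in range(1, horizon + 1):
        build_total += build_cost_in_year(s, year)
        buy_total += s["saas"]
        if build_total <= buy_total:
            return year
    return None


for assets, s in SCENARIOS.items():
    build, buy = three_year_tco(s)
    print(f"{assets:>5} assets: build {build}k, buy {buy}k, "
          f"break-even year: {breakeven_year(s)}")
```

With the table's figures held flat, Build at 2,000 assets only catches up on cumulative cost in year 9, and at 100 or 500 assets it never does within the 15-year horizon. That result is sensitive to the assumed SaaS rate and team cost, which is exactly why it is worth parameterizing.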
The Integration Problem
Here's the number most build-vs-buy analyses miss: integration with existing systems consumes 60-70% of total project time, regardless of whether you build or buy the PdM platform itself.
CMMS Integration
Your predictive maintenance system needs to create work orders, check maintenance history, and update asset records. The major CMMS platforms each have their own integration challenges:
- SAP PM: RFC/BAPI interfaces are well-documented but complex. Creating a maintenance notification (IW21 equivalent) via API requires mapping to SAP's specific data structures (functional location, equipment number, damage codes). Budget 4-6 weeks for a reliable bidirectional integration.
- IBM Maximo: REST API is modern and well-designed, but field mapping between your asset model and Maximo's hierarchy takes time. Maximo's workflow engine means your work orders need to respect existing approval chains.
- Infor EAM / Hexagon: API quality varies significantly by version. Older installations may require custom middleware.
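Much of the CMMS work is mapping, not API calls. As a rough sketch of what that mapping layer looks like, the Python below translates a PdM alert into the identifiers a maintenance notification needs (functional location, equipment number, damage code). Every name here is hypothetical: the dataclass fields are not SAP's or Maximo's actual structures, the sample IDs are invented, and the 40-character text limit is an assumption to check against your own system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    asset_id: str     # ID in the PdM system's own asset model
    fault_class: str  # e.g. "bearing_wear"
    summary: str


@dataclass(frozen=True)
class NotificationDraft:
    # Hypothetical field names, not a real CMMS API structure.
    functional_location: str
    equipment_number: str
    damage_code: str
    short_text: str


def to_notification(alert, asset_map, damage_codes, short_text_max=40):
    """Translate an alert into CMMS terms, failing loudly on unmapped values."""
    if alert.asset_id not in asset_map:
        raise LookupError(f"no CMMS mapping for asset {alert.asset_id!r}")
    if alert.fault_class not in damage_codes:
        raise LookupError(f"no damage code for fault {alert.fault_class!r}")
    functional_location, equipment_number = asset_map[alert.asset_id]
    return NotificationDraft(
        functional_location=functional_location,
        equipment_number=equipment_number,
        damage_code=damage_codes[alert.fault_class],
        short_text=alert.summary[:short_text_max],  # assumed length limit
    )


# Invented sample mappings; in practice these come from your asset registry.
ASSET_MAP = {"pump-07": ("PLANT1-AREA2-PUMPS", "10004711")}
DAMAGE_CODES = {"bearing_wear": "BRG1"}

draft = to_notification(
    Alert("pump-07", "bearing_wear", "Vibration trend indicates bearing wear"),
    ASSET_MAP,
    DAMAGE_CODES,
)
print(draft)
```

Raising on an unmapped asset or fault class is a deliberate choice in this sketch: a notification filed against the wrong equipment is harder to clean up than one that was never created.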
Historian Integration
Most plants have years of valuable sensor data locked in a historian:
- OSIsoft PI (AVEVA PI): The PI Web API is capable but requires careful authentication setup (Kerberos or Basic Auth). Extracting historical data for ML training at scale (millions of readings) requires batched requests and rate limiting to avoid overloading the PI server. Budget 2-3 weeks.
- Honeywell PHD: Older API, often requires an on-premises integration server. Data extraction is slower than PI.
- GE Proficy Historian: OPC-HDA interface, generally straightforward but limited to pull-based data access.
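The batching and rate limiting mentioned for PI applies to any historian. Here is a minimal Python sketch of the pattern: split the time range into windows and pause between requests. The `fetch_window` callable is a placeholder for whatever client you use (PI Web API, an OPC-HDA wrapper, a vendor SDK); it is not a real library function, and the one-day window and 0.5-second pause are arbitrary starting points to tune with your historian admin.

```python
import time
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def extract_history(fetch_window, start, end,
                    step=timedelta(days=1), min_interval_s=0.5,
                    sleep=time.sleep):
    """Pull history one window at a time, pausing between requests.

    fetch_window(start, end) is a hypothetical callable that returns a list
    of readings for that window from your historian.
    """
    readings = []
    for i, (w_start, w_end) in enumerate(time_windows(start, end, step)):
        if i > 0:
            sleep(min_interval_s)  # simple fixed-delay throttle
        readings.extend(fetch_window(w_start, w_end))
    return readings


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without a historian.
    def fake_fetch(w_start, w_end):
        return [(w_start, 42.0)]

    data = extract_history(fake_fetch, datetime(2024, 1, 1),
                           datetime(2024, 1, 4), min_interval_s=0.0)
    print(len(data), "readings")
```

For a multi-year backfill you would likely also want checkpointing (so a failed run resumes at the last completed window) and retries with backoff, both left out here for brevity.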
ERP Integration
For spare parts procurement and cost tracking, you'll need ERP connectivity:
- SAP MM integration for automated purchase requisitions based on RUL predictions
- Cost center mapping for maintenance ROI tracking
- Plant hierarchy synchronization
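The logic behind an RUL-driven purchase requisition can be very small. The Python sketch below shows one possible rule: raise a requisition when the predicted remaining useful life no longer covers the part's lead time plus a safety margin, and there is no stock on hand. The `SparePart` fields, the 7-day margin, and the rule itself are assumptions for illustration; the actual requisition call into SAP MM is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparePart:
    part_number: str
    lead_time_days: int  # supplier lead time
    on_hand: int         # units currently in stock


def should_requisition(rul_days, part, safety_margin_days=7):
    """Return True if a part should be ordered now.

    Order when nothing is in stock and the predicted remaining useful life
    is no longer than lead time plus a safety margin (an assumed rule).
    """
    if rul_days < 0:
        raise ValueError("rul_days must be non-negative")
    if part.on_hand > 0:
        return False
    return rul_days <= part.lead_time_days + safety_margin_days


# Invented example part.
bearing = SparePart(part_number="BRG-6205", lead_time_days=21, on_hand=0)
print(should_requisition(rul_days=25, part=bearing))
print(should_requisition(rul_days=60, part=bearing))
```

In practice you would also account for uncertainty in the RUL estimate, for example by using the lower bound of the prediction interval rather than the point estimate.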
The implication: Whether you build or buy the PdM engine, budget significant time and expertise for integration. A platform with pre-built CMMS connectors and documented historian integration patterns saves 3-6 months compared to building these from scratch.
Decision Framework
| Factor | Lean Build | Lean Buy |
|--------|-----------|----------|
| Data engineering team? | Yes, 3+ engineers | No or < 3 |
| Timeline to first value? | 6+ months OK | Need results in weeks |
| Asset types | 1-2 well-understood | Many, diverse equipment |
| ML infrastructure | Kubernetes, MLflow, etc. | Starting from scratch |
| Scale | 10,000+ assets | 50-5,000 assets |
| Budget model | CapEx (build once) | OpEx (predictable monthly) |
Scored Decision Matrix
For a more structured evaluation, score each factor 1-5 for your organization:
| Factor | Weight | Score 1 (Lean Buy) | Score 5 (Lean Build) |
|---|---|---|---|
| Time to value | 25% | Need results in < 3 months | 12+ months acceptable |
| Customization needs | 20% | Standard equipment types | Unique processes/equipment |
| Team capability | 20% | No ML/data eng team | Experienced ML + data team |
| Data sensitivity | 15% | Cloud-OK, standard DPA | Air-gapped, classified data |
| Scale (asset count) | 10% | < 500 assets | > 5,000 assets |
| Budget structure | 10% | Prefer OpEx (monthly) | Prefer CapEx (one-time) |
Worked example — mid-size chemical plant (300 assets):
- Time to value: 2 (board wants results this quarter) × 25% = 0.50
- Customization: 3 (some proprietary reactors) × 20% = 0.60
- Team capability: 2 (one data analyst, no ML) × 20% = 0.40
- Data sensitivity: 3 (standard cloud OK, EU residency required) × 15% = 0.45
- Scale: 2 (300 assets) × 10% = 0.20
- Budget: 2 (OpEx preferred) × 10% = 0.20
- Total: 2.35 → Buy signal (just under the 2.5 threshold)
A score below 2.5 points toward Buy. Above 3.5 points toward Build. Between 2.5 and 3.5 is the Hybrid zone.
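If you want to score several sites or rerun the matrix as conditions change, it fits in a few lines of Python. This sketch uses the weights and thresholds from this section; the dictionary keys and function names are made up for the example.

```python
# Weights in percent, taken from the scored decision matrix.
WEIGHTS = {
    "time_to_value": 25,
    "customization": 20,
    "team_capability": 20,
    "data_sensitivity": 15,
    "scale": 10,
    "budget": 10,
}


def weighted_score(scores):
    """Weighted average of 1-5 factor scores (1 leans Buy, 5 leans Build)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the factors in WEIGHTS")
    for factor, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{factor} must be scored 1-5, got {value}")
    return sum(WEIGHTS[f] * scores[f] for f in WEIGHTS) / 100


def recommendation(score):
    """Map a weighted score onto the Buy / Hybrid / Build zones."""
    if score < 2.5:
        return "Buy"
    if score > 3.5:
        return "Build"
    return "Hybrid"


# The mid-size chemical plant from the worked example.
chemical_plant = {
    "time_to_value": 2,
    "customization": 3,
    "team_capability": 2,
    "data_sensitivity": 3,
    "scale": 2,
    "budget": 2,
}

score = weighted_score(chemical_plant)
print(score, recommendation(score))  # prints: 2.35 Buy
```

Keeping the weights as integer percentages avoids floating-point noise when a score lands right on a threshold.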
The Hybrid Option
Many teams start with buy, then build around it. This is often the pragmatic choice — especially in the 2.5-3.5 scoring range. Here's a phased timeline:
Month 1-3: Deploy and validate. Deploy a platform for immediate anomaly detection on your most critical assets. Use this phase to prove value to leadership, build internal data literacy, and establish baseline metrics (current failure rates, downtime costs, spare parts spend). This is your "quick win" phase — don't try to boil the ocean.
Month 3-6: Integrate and extend. Use the platform API to feed predictions into custom dashboards and existing CMMS/ERP workflows. Export training data for specialized analysis. Start building internal expertise by working alongside the platform's models — understanding what they detect, what they miss, and why.
Month 6-12: Customize and optimize. Train specialized models for equipment the platform doesn't handle well (proprietary processes, unusual failure modes). Build custom integrations with legacy systems. At this point, you have 6+ months of labeled data (predictions that were confirmed or corrected by your team), which is gold for training.
Year 2+: Evaluate independence. With 12+ months of labeled data, trained models, and internal ML capability, you can make an informed decision: continue with the platform (because the operational burden of running your own is clear by now), migrate to a self-hosted solution, or run a hybrid where the platform handles standard equipment and your team handles the specialized stuff.
What to Look For in a Platform
When evaluating PdM vendors, these capabilities separate production-grade platforms from demos:
- Explainability — feature attribution (SHAP for tree models, Integrated Gradients for deep models), per-sensor contributions. If the vendor can't explain their predictions, neither can you. Ask them to show a real alert with feature attribution — not a marketing slide.
- Multi-model — Anomaly detection, RUL estimation, and fault classification. A platform that only does anomaly detection is solving 30% of the problem. You need all three to go from "something is wrong" to "here's what's wrong and when it will fail."
- Cold start handling — Pre-trained baselines that work from day one, not "wait 6 months for enough data." Ask what benchmark datasets the models were trained on and what accuracy they achieve out of the box.
- Tenant isolation — Cryptographic data separation. Ask specifically about row-level security vs. application-level filtering. If the vendor says "application-level," that means a single bug can expose your data to other customers.
- Edge support — Low-latency inference, data sovereignty, offline capability. If your facility has intermittent connectivity or strict data residency requirements, edge inference isn't optional.
- API-first architecture — If you can't automate it via API, you'll outgrow it. Check that work order creation, model status, alert management, and data export are all available programmatically.
- Transparent pricing — No "contact sales" for basic info. Per-asset pricing you can model before signing anything. Hidden costs (data storage overage, API call limits, premium support tiers) should be visible upfront.
Prevly offers a free pilot for up to 10 machines. Start here — see real predictions on your equipment in under a week.