In Part 1, we built a predictive maintenance agent with sensor ingestion, hybrid anomaly detection, and LLM-driven root cause analysis. Now let's talk about what it costs to run and when you shouldn't use it.
Part 2 covers the questions every engineering manager asks before approving a production deployment: How much does this cost per asset? How do I debug false positives? Should my team use Python or C#? And most importantly — when is this approach overkill?
What You'll Learn
- Real token costs per asset per month for the hybrid detection approach
- Observability patterns for tracing anomaly detection decisions
- Python vs C# decision framework for industrial IoT workloads
- Azure infrastructure requirements and Foundry Agent Service
- Five scenarios where AI predictive maintenance is the wrong choice
Cost Analysis
Let's break down the real costs. The hybrid approach from Part 1 has three cost layers: sensor ingestion (fixed), statistical screening (negligible), and LLM analysis (variable).
Token Costs (GPT-4o)
The LLM only runs when Stage 1 statistical screening flags an anomaly. For well-tuned thresholds, that's roughly 5-15% of sensor batches.
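For concreteness, Stage 1 can be as simple as a Z-score gate. A minimal sketch (the threshold and baseline handling here are illustrative assumptions, not Part 1 verbatim):

```python
def stage1_flags_anomaly(
    readings: list[float],
    baseline_mean: float,
    baseline_std: float,
    z_threshold: float = 3.0,
) -> bool:
    """Return True if any reading sits more than z_threshold
    standard deviations from the asset's baseline."""
    if baseline_std == 0:
        return False  # degenerate baseline: never wake the LLM
    return any(
        abs(r - baseline_mean) / baseline_std > z_threshold for r in readings
    )
```

Only batches that return True proceed to GPT-4o; tightening `z_threshold` directly lowers the 5-15% pass-through figure.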
| Operation | Input Tokens | Output Tokens | Cost per Call | Daily Calls (per asset) |
|---|---|---|---|---|
| Anomaly analysis (Stage 2) | ~800 | ~150 | $0.0049 | 7-20 |
| Root cause diagnosis | ~2,000 | ~400 | $0.014 | 1-3 |
| RAG retrieval (embedding) | ~200 | — | $0.00002 | 1-3 |
Per-asset monthly token cost: roughly $2.30 at the nominal rates above (assuming 10 anomaly checks/day, 2 root cause diagnoses/day); budget $4-8 to cover retries, longer prompts, and occasional larger batches.
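The per-asset figure follows directly from the table. A quick back-of-envelope script (per-call costs and daily volumes are the nominal table values, so treat the result as a floor before retries and prompt growth):

```python
DAYS = 30
calls = {
    # name: (cost_per_call_usd, calls_per_day) -- nominal values from the table
    "anomaly_analysis": (0.0049, 10),
    "root_cause": (0.014, 2),
    "rag_embedding": (0.00002, 2),
}

monthly = sum(cost * per_day * DAYS for cost, per_day in calls.values())
print(f"~${monthly:.2f}/asset/month")  # ≈ $2.31 at nominal rates
```

Doubling the call volumes or prompt sizes roughly doubles the figure, which is where the $4-8 budget range comes from.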
Infrastructure Costs
| Service | Purpose | Monthly Cost |
|---|---|---|
| Azure IoT Hub (S1) | Sensor ingestion | $25/unit (400K messages/day) |
| Azure Stream Analytics (1 SU) | Real-time processing | ~$80 |
| Azure OpenAI (GPT-4o) | LLM inference | Pay-per-token (see above) |
| Azure AI Search (Basic) | Historical failure retrieval | ~$75 |
| Azure App Service (B2) | Agent hosting | ~$55 |
| Cosmos DB (serverless) | State persistence | ~$25-50 |
Cost Reality
Fixed infrastructure: ~$260-285/month regardless of asset count.
Variable (per-asset): ~$4-8/month in token costs.
For 50 assets: ~$460-685/month total (~$9-14/asset/month). The fixed costs dominate at low asset counts. This approach becomes cost-effective at 20+ monitored assets.
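The break-even claim above reduces to a one-line cost model (fixed and per-asset figures taken from the tables in this section):

```python
FIXED_LOW, FIXED_HIGH = 260, 285   # monthly infrastructure cost range (USD)
VAR_LOW, VAR_HIGH = 4, 8           # per-asset monthly token cost range (USD)

def monthly_total(assets: int, fixed: float, per_asset: float) -> float:
    """Total monthly cost for a fleet of `assets` monitored machines."""
    return fixed + assets * per_asset

for n in (10, 20, 50):
    low = monthly_total(n, FIXED_LOW, VAR_LOW)
    high = monthly_total(n, FIXED_HIGH, VAR_HIGH)
    print(f"{n} assets: ${low}-{high}/month (${low / n:.0f}-{high / n:.0f}/asset)")
```

At 10 assets the fixed costs push the per-asset figure past $30/month; at 50 it falls into the ~$9-14 range quoted above.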
Observability and Debugging
When a maintenance team gets a false alarm at 3 AM, you need to trace exactly why the system flagged it. That means structured logging at every pipeline stage with correlation IDs.
In Python, a `structlog` decorator can stamp every stage with the same correlation ID:

```python
import time
import uuid
from functools import wraps

import structlog

logger = structlog.get_logger()


def trace_pipeline_stage(stage_name: str):
    """Decorator that logs entry/exit for each pipeline stage."""
    def decorator(func):
        @wraps(func)
        async def wrapper(state: PipelineState, *args, **kwargs):
            trace_id = state.get("trace_id", str(uuid.uuid4()))
            state["trace_id"] = trace_id
            logger.info(
                "pipeline_stage_start",
                trace_id=trace_id,
                stage=stage_name,
                asset_id=state["asset_id"],
            )
            start = time.perf_counter()
            result = await func(state, *args, **kwargs)
            duration_ms = (time.perf_counter() - start) * 1000

            # Log key decisions for debugging
            if stage_name == "anomaly_detection":
                anomaly = result.get("anomaly")
                logger.info(
                    "anomaly_decision",
                    trace_id=trace_id,
                    duration_ms=round(duration_ms, 1),
                    is_anomaly=anomaly.is_anomaly if anomaly else False,
                    confidence=anomaly.confidence if anomaly else 0,
                    features_snapshot={
                        k: round(v, 4)
                        for k, v in (result.get("features") or {}).items()
                    },
                )
            return result
        return wrapper
    return decorator


# Usage (PipelineState is the state TypedDict defined in Part 1):
@trace_pipeline_stage("ingestion")
async def ingest_agent(state: PipelineState) -> PipelineState:
    # ... implementation
    return state
```
The equivalent pattern in C# with `ILogger`:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public class PipelineTracing
{
    private readonly ILogger<PipelineTracing> _logger;

    public PipelineTracing(ILogger<PipelineTracing> logger)
    {
        _logger = logger;
    }

    public async Task<PipelineState> TraceStageAsync(
        string stageName,
        PipelineState state,
        Func<PipelineState, Task<PipelineState>> stageFunc)
    {
        var traceId = state.TraceId ?? Guid.NewGuid().ToString();
        state.TraceId = traceId;

        var sw = Stopwatch.StartNew();
        _logger.LogInformation(
            "Pipeline stage {Stage} started for asset {AssetId}. " +
            "TraceId: {TraceId}",
            stageName, state.AssetId, traceId);

        var result = await stageFunc(state);
        sw.Stop();

        if (stageName == "anomaly_detection" && result.Anomaly is not null)
        {
            _logger.LogInformation(
                "Anomaly decision: {IsAnomaly}, " +
                "Confidence: {Confidence:F2}, " +
                "Duration: {Duration}ms. TraceId: {TraceId}",
                result.Anomaly.IsAnomaly,
                result.Anomaly.Confidence,
                sw.ElapsedMilliseconds,
                traceId);
        }
        return result;
    }
}
```
What to monitor in production:
- False positive rate — Track anomalies flagged vs. confirmed by maintenance teams. Target: <20%.
- Stage 1 pass-through rate — What percentage of sensor batches reach the LLM? Above 25% means your Z-score threshold is too loose.
- LLM latency (P95) — Root cause analysis should complete within 5 seconds. Alert if P95 exceeds 10s.
- Token consumption per asset — Track daily to catch cost anomalies early.
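The first two metrics fall out of four running counters. A minimal sketch (field names are illustrative, not from Part 1):

```python
from dataclasses import dataclass

@dataclass
class PipelineCounters:
    """Running counters from which the monitoring rates are derived."""
    batches_screened: int = 0
    batches_passed_to_llm: int = 0
    anomalies_flagged: int = 0
    anomalies_confirmed: int = 0

    @property
    def pass_through_rate(self) -> float:
        """Share of sensor batches that reach the LLM (target: under 25%)."""
        return self.batches_passed_to_llm / max(self.batches_screened, 1)

    @property
    def false_positive_rate(self) -> float:
        """Share of flagged anomalies NOT confirmed by maintenance (target: under 20%)."""
        if self.anomalies_flagged == 0:
            return 0.0
        return 1 - self.anomalies_confirmed / self.anomalies_flagged

c = PipelineCounters(batches_screened=1000, batches_passed_to_llm=120,
                     anomalies_flagged=40, anomalies_confirmed=34)
assert c.pass_through_rate < 0.25       # Z-score threshold is sane
assert c.false_positive_rate < 0.20     # under the 20% target
```

Emit both rates as gauges to Application Insights (or any metrics backend) and alert on the thresholds above.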
Debugging False Positives
When maintenance reports a false alarm, pull the trace by `trace_id` and check three things: (1) which features triggered Stage 1, (2) what the LLM prompt contained, and (3) what the LLM responded. Roughly 90% of false positives trace back to stale baselines: equipment whose operating conditions changed while the baseline wasn't updated.
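Since stale baselines dominate, it pays to check staleness automatically before trusting a Stage 1 flag. A sketch, with hypothetical age and drift policies:

```python
from datetime import datetime, timedelta

MAX_BASELINE_AGE = timedelta(days=30)  # hypothetical recompute policy
MAX_MEAN_DRIFT = 0.15                  # hypothetical: 15% drift vs recent operating mean

def baseline_is_stale(
    baseline_mean: float,
    baseline_updated_at: datetime,
    recent_mean: float,
    now: datetime,
) -> bool:
    """Flag a baseline for recomputation when it is too old or no longer
    matches the equipment's current operating regime."""
    too_old = now - baseline_updated_at > MAX_BASELINE_AGE
    if baseline_mean == 0:
        return too_old
    drift = abs(recent_mean - baseline_mean) / abs(baseline_mean)
    return too_old or drift > MAX_MEAN_DRIFT
```

Running this check before Stage 2 lets the agent annotate its alert with "baseline may be stale" instead of producing a confident false positive.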
Technology Choices: Python vs C#
Python Implementation
Why choose Python: If your team writes Python, you get access to the richest AI/ML and data engineering ecosystem.
- Library ecosystem — LangGraph, NumPy, pandas, scikit-learn for feature engineering and statistical analysis
- Rapid prototyping — Jupyter notebooks for tuning anomaly thresholds interactively
- Community — Most IoT + AI tutorials and examples are Python-first
- Data science integration — Easy to hand off feature pipelines to data scientists for model improvement
C#/.NET Implementation
Why choose C#: If your backend and industrial control systems run .NET, you get first-party Microsoft support and enterprise patterns.
- Native Azure IoT integration — First-party SDKs for IoT Hub, Event Hubs, Stream Analytics
- Enterprise patterns — Dependency injection, strong typing, mature error handling for 24/7 operations
- Performance — Better throughput for high-frequency sensor ingestion (sub-millisecond processing)
- Existing OT stack — Many SCADA/OPC-UA integrations are .NET-based
The Bottom Line
This is primarily a team and stack decision. Both approaches are production-ready.
Python team + data science focus? Use Python. You'll iterate faster on feature engineering.
C#/.NET team + enterprise OT stack? Use C#. You'll integrate with existing industrial systems more easily. Don't fight your stack.
Azure Infrastructure
Here's the minimum Azure setup for a production deployment:
| Service | Purpose | Starting Price |
|---|---|---|
| Azure IoT Hub (S1) | Sensor data ingestion, device management | $25/month per unit |
| Azure Stream Analytics | Real-time windowed aggregation | ~$80/month (1 SU) |
| Azure OpenAI Service | GPT-4o inference, embeddings | Pay-per-token |
| Azure AI Search (Basic) | Historical failure record retrieval | ~$75/month |
| Azure Cosmos DB (Serverless) | Pipeline state, baselines | Pay-per-RU |
| Azure App Service (B2) | Agent hosting | ~$55/month |
Azure AI Foundry Agent Service
Azure AI Foundry Agent Service is now generally available, providing managed orchestration for AI agent systems.
- Built-in routing and workflow orchestration
- Managed state persistence (no need for separate Cosmos DB for state)
- Native Azure OpenAI integration with automatic retry/rate limiting
- Observability through Azure Monitor and Application Insights
For predictive maintenance specifically, Foundry Agent Service can replace the custom LangGraph/Semantic Kernel orchestration layer, reducing code you maintain. The trade-off is less control over pipeline flow and potential vendor lock-in.
Check Azure AI Foundry Agent Service for current pricing.
When NOT to Use AI Predictive Maintenance
AI-based predictive maintenance is powerful, but it's not always the right tool. Here are five scenarios where simpler approaches win.
Skip AI Predictive Maintenance When:
- You have fewer than 10 critical assets. The fixed infrastructure cost ($260+/month) doesn't justify itself. Use simple threshold alerts with a spreadsheet tracking maintenance history.
- Your equipment has no sensor instrumentation. The AI agent needs data. If your machines don't have vibration, temperature, or pressure sensors, start with an IoT retrofit first — that's a 3-6 month project before you even build the AI layer.
- Failure modes are purely random. Some equipment fails unpredictably (lightning strikes, contamination events). If failures don't correlate with gradual sensor degradation, pattern detection won't help. Invest in redundancy instead.
- Your historical maintenance records are sparse. The root cause agent relies on RAG retrieval against past failures. If your CMMS has fewer than 100 documented failure events across your equipment types, the LLM won't have enough context for useful diagnoses. Build the data foundation first.
- A simple rules engine solves 90% of your cases. If "temperature above X for Y minutes" catches most failures, you don't need an LLM. Build a rules engine with configurable thresholds, and add AI later, when the remaining 10% of unpredictable failures start costing real money.
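For scale, the "temperature above X for Y minutes" rule from the last point fits in a couple dozen lines, with no LLM and no extra infrastructure. A minimal sketch (limits are hypothetical):

```python
class SustainedThresholdRule:
    """Fire only when a value stays above `limit` continuously
    for `hold_seconds` (i.e. 'temperature above X for Y minutes')."""

    def __init__(self, limit: float, hold_seconds: float):
        self.limit = limit
        self.hold_seconds = hold_seconds
        self._breach_start: float | None = None

    def update(self, value: float, timestamp: float) -> bool:
        """Feed one reading; returns True when the rule fires."""
        if value <= self.limit:
            self._breach_start = None  # breach ended, reset the clock
            return False
        if self._breach_start is None:
            self._breach_start = timestamp
        return timestamp - self._breach_start >= self.hold_seconds

# Hypothetical policy: 90 °C sustained for 5 minutes
rule = SustainedThresholdRule(limit=90.0, hold_seconds=300)
```

If a handful of rules like this cover your known failure modes, deploy them first and let the uncaught residual failures make the business case for the AI layer.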
The honest truth: for many small-to-medium operations, a well-configured condition-based monitoring system with manual review catches 80% of what AI predictive maintenance catches, at 20% of the cost. AI earns its keep when your downtime costs are high enough that catching the remaining 20% pays for the infrastructure.
Key Takeaways
- Hybrid detection controls costs: Statistical screening + LLM analysis keeps token spend to $4-8/asset/month by only invoking GPT-4o on flagged readings
- Observability is non-negotiable: Correlation IDs through every pipeline stage. When maintenance disputes a recommendation, you need to trace the full decision chain in seconds
- Start with the data: AI predictive maintenance is only as good as your sensor coverage and historical failure records. If either is weak, fix that first
- Know your break-even: At $260+ fixed monthly infrastructure, you need at least 20 assets or very high downtime costs for the economics to work
The best predictive maintenance system is the one your operations team actually trusts. Explainable recommendations with confidence scores beat opaque ML models every time.
If you haven't read Part 1 yet, start there for the architecture and core implementation: Part 1 - Architecture and Core Implementation.
This article covers production considerations for AI predictive maintenance. Actual costs vary by Azure region, negotiated pricing, and usage patterns. Always run a pilot with real sensor data before committing to full deployment.
Want More Practical AI Tutorials?
I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.
Subscribe to Newsletter →