From Working Demo to Running System
In Part 1, we built the core pipeline: a three-agent system that classifies incident root causes, clusters them against historical data using vector similarity, and generates prioritised recommendations. The code works. The architecture makes sense. Now for the questions that actually determine whether you ship this to production or shelve it after the demo.
This part covers the four things that matter most for production deployment: what it actually costs to run per incident, how to debug when an agent gives you a wrong answer, which technology stack you should choose based on your team's reality, and — most importantly — the honest assessment of when this system is the wrong tool entirely.
Cost Analysis: What This Actually Costs to Run
Every incident processed by the pipeline touches three LLM calls: the classifier, the cluster summariser, and the risk recommendation generator. Here's what that looks like in real token numbers against Azure OpenAI GPT-4o pricing (as of 2026):
| Agent | Input Tokens (avg) | Output Tokens (avg) | Cost / incident |
|---|---|---|---|
| Root Cause Classifier | ~800 | ~250 | ~$0.008 |
| Cluster Summariser | ~600 | ~100 | ~$0.005 |
| Risk & Recommendation | ~1,200 | ~400 | ~$0.012 |
| Total per incident | ~2,600 | ~750 | ~$0.025 |
At 2.5 cents per incident, a team running 200 incidents per month spends $5/month on LLM calls. A team running 2,000 incidents per month spends $50/month. The embeddings for vector search add another ~$0.001 per incident (text-embedding-3-large is very cheap).
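As a sanity check, the per-incident figure can be reproduced from the token counts and list prices. A minimal sketch, assuming GPT-4o list pricing of $5 per 1M input tokens and $15 per 1M output tokens (substitute your negotiated rates):

```python
# Assumed list rates: $5 / 1M input tokens, $15 / 1M output tokens
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

# (avg input tokens, avg output tokens) per agent, from the table above
AGENTS = {
    "classifier": (800, 250),
    "cluster_summariser": (600, 100),
    "risk_recommendation": (1_200, 400),
}

def cost_per_incident(agents: dict[str, tuple[int, int]]) -> float:
    """Sum LLM cost across all agent calls for one incident."""
    return sum(i * INPUT_RATE + o * OUTPUT_RATE for i, o in agents.values())

per_incident = cost_per_incident(AGENTS)
print(f"${per_incident:.3f} per incident")  # ≈ $0.024
print(f"${per_incident * 2_000:.2f}/month at 2,000 incidents")
```

Rerun this whenever you change models or prompt sizes; the monthly bill is linear in both.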
Where Cost Scales Up
The numbers above assume incidents with descriptions under ~500 words. If your team links full runbooks or Slack thread transcripts into incident tickets, input tokens can jump 5–10x. Either trim inputs at ingestion time (recommended) or switch to GPT-4o-mini for the classifier step, which handles structured classification just as well at roughly one-tenth the cost.
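Trimming at ingestion is a one-function job. A sketch with an illustrative 500-word budget and truncation marker (both are choices to tune, not requirements):

```python
def trim_description(text: str, max_words: int = 500) -> str:
    """Cap an incident description at a word budget before it reaches the LLM.

    Keeps the start of the description, which in practice carries the symptom
    summary; linked runbooks and pasted transcripts tend to pile up at the end.
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [truncated at ingestion]"
```

Apply it once, at the ingestion boundary, so every downstream agent sees the same bounded input.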
Infrastructure cost beyond LLM calls:
| Service | Tier | Monthly Cost (est.) |
|---|---|---|
| Azure Service Bus | Standard (1M ops/mo) | ~$10 |
| Azure Cosmos DB | Serverless (10 RU/s avg) | ~$5–$15 |
| Azure AI Search | Basic (1 replica) | ~$75 |
| Azure Container Apps (pipeline runtime) | Consumption plan | ~$5–$20 |
| Infrastructure total | | ~$95–$120/month |
The AI Search cost is the dominant line item. If you already have an Azure AI Search instance running for another purpose, this entire system becomes nearly free on the infrastructure side. The tricky decision is whether the Basic tier (1 replica, no SLA) is acceptable. For an internal tooling system where a few minutes of downtime doesn't cascade into user-facing issues, yes. For anything that feeds real-time on-call workflows, step up to Standard.
Observability & Debugging
The most common debugging scenario: an engineer looks at a recommendation and says "that's wrong, the root cause is clearly X not Y." How do you trace back through an agent pipeline to find where the reasoning went off track?
The answer is structured logging at every state transition. Each agent writes its full input, output, confidence score, and elapsed time to Application Insights. When you're debugging, you pull the trace for a specific incident ID and you can see exactly what each agent received and produced.
In Python, a tracing decorator keeps the instrumentation out of the agent logic:

```python
import os
import time
from functools import wraps

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(connection_string=os.environ["APPINSIGHTS_CONNECTION_STRING"])
tracer = trace.get_tracer("incident-pipeline")

def trace_agent(agent_name: str):
    """Decorator that wraps any agent function with tracing."""
    def decorator(fn):
        @wraps(fn)
        async def wrapper(state: IncidentState) -> IncidentState:
            with tracer.start_as_current_span(agent_name) as span:
                span.set_attribute("incident.id", state.signal.id)
                span.set_attribute("incident.severity", state.signal.severity)
                start = time.monotonic()
                try:
                    result = await fn(state)
                    elapsed = time.monotonic() - start
                    # Log what the agent produced
                    if result.classification:
                        span.set_attribute("classification.category", result.classification.root_cause_category.value)
                        span.set_attribute("classification.confidence", result.classification.confidence)
                    if result.cluster_match:
                        span.set_attribute("cluster.id", result.cluster_match.cluster_id)
                        span.set_attribute("cluster.score", result.cluster_match.similarity_score)
                    span.set_attribute("agent.elapsed_ms", int(elapsed * 1000))
                    return result
                except Exception as e:
                    span.record_exception(e)
                    state.errors.append(f"{agent_name}: {e}")
                    return state
        return wrapper
    return decorator

# Usage
@trace_agent("classify")
async def classify_agent(state: IncidentState) -> IncidentState:
    # ... implementation
    pass
```
The same instrumentation in C# with Semantic Kernel:

```csharp
using System.Diagnostics;
using Azure.Monitor.OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

// In Program.cs / Startup
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("IncidentPipeline")
        .AddAzureMonitorTraceExporter(o =>
            o.ConnectionString = config["AppInsights:ConnectionString"]));

// In each step
public class ClassifyStep : KernelProcessStep
{
    private static readonly ActivitySource _tracer = new("IncidentPipeline");

    [KernelFunction("Run")]
    public async Task RunAsync(KernelProcessStepContext context, IncidentState state)
    {
        using var activity = _tracer.StartActivity("classify");
        activity?.SetTag("incident.id", state.Signal.Id);
        activity?.SetTag("incident.severity", state.Signal.Severity);
        var sw = Stopwatch.StartNew();
        try
        {
            // ... classification logic ...
            activity?.SetTag("classification.category", state.Classification?.RootCauseCategory.ToString());
            activity?.SetTag("classification.confidence", state.Classification?.Confidence);
            activity?.SetTag("agent.elapsed_ms", sw.ElapsedMilliseconds);
        }
        catch (Exception ex)
        {
            activity?.RecordException(ex);
            state.Errors.Add($"classify: {ex.Message}");
        }
    }
}
```
The Debugging Query You'll Use Constantly
In Application Insights, this KQL query pulls the full agent trace for any incident:
```kusto
dependencies
| where customDimensions["incident.id"] == "INC-1234"
| project timestamp, name, duration,
    category = customDimensions["classification.category"],
    confidence = customDimensions["classification.confidence"],
    cluster = customDimensions["cluster.id"]
| order by timestamp asc
```
Beyond per-incident tracing, you'll want two Azure Monitor alerts set up from day one: one for pipeline latency (alert if p95 exceeds 30 seconds — that usually means the LLM is throttling), and one for classification confidence (alert if average confidence drops below 0.6 over a sliding window — that's a signal your incident descriptions have changed format and the prompt needs updating).
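The confidence alert can also be checked in-process, before the metric ever reaches Azure Monitor. A minimal sliding-window monitor; the window size of 50 is an illustrative choice, and 0.6 matches the threshold above:

```python
from collections import deque

class ConfidenceMonitor:
    """Track a rolling mean of classification confidence and flag drift."""

    def __init__(self, window: int = 50, threshold: float = 0.6):
        self.scores: deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, confidence: float) -> bool:
        """Add a score; return True when the rolling mean drops below threshold."""
        self.scores.append(confidence)
        mean = sum(self.scores) / len(self.scores)
        # Only alert once the window is full, so a few early low scores
        # don't trigger a false alarm
        return len(self.scores) == self.scores.maxlen and mean < self.threshold
```

Wire the `True` return into whatever paging or Slack notification path your team already uses.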
Technology Choices: Python vs C#
Python Implementation
Why choose Python: If your team writes Python, you get access to the richest AI/ML ecosystem available anywhere.
- Library ecosystem — LangChain, LangGraph, thousands of community tools; new AI patterns land in Python first
- Rapid iteration — prompt changes and agent rewiring can be tested in Jupyter without a full rebuild cycle
- Async support — LangGraph's async graph execution maps cleanly to the Service Bus event model
- Debugging ergonomics — LangGraph's built-in state inspection makes it easy to step through graph execution and see exactly where an agent's reasoning went wrong
C#/.NET Implementation
Why choose C#: If your incident tooling and existing backend runs on .NET, Semantic Kernel gives you first-party Microsoft support and enterprise integration patterns.
- Native Azure integration — Semantic Kernel is Microsoft-maintained and ships with Azure OpenAI, Azure AI Search, and Cosmos DB connectors out of the box
- Enterprise patterns — dependency injection, strong typing, and structured logging integrate with whatever ASP.NET Core stack you already have
- Deployment familiarity — if your ops team knows how to deploy .NET services to Azure Container Apps, you don't add a new runtime to the mix
- Process model — Semantic Kernel's Process abstraction maps naturally to the incident pipeline's sequential-with-conditional-routing pattern
The Bottom Line
Python team? Use LangGraph. C#/.NET team? Use Semantic Kernel. Don't fight your stack — the right framework is the one your team can debug at 2am during an actual incident.
Azure Infrastructure
The minimal Azure footprint for a production deployment:
- Azure OpenAI Service — GPT-4o deployment (gpt-4o, 2024-08-06 or later) + text-embedding-3-large. Use PTU (provisioned throughput) if you're processing more than 500 incidents/day to avoid rate-limit interruptions.
- Azure Service Bus — Standard tier for reliable event queuing. Create one topic per source system (pagerduty-incidents, jira-tickets, azure-alerts) so you can pause ingest from one source without affecting others.
- Azure Cosmos DB — Serverless for most teams. Move to provisioned throughput only when you're querying the incident store heavily for reporting — serverless handles write-heavy, read-light patterns very efficiently.
- Azure AI Search — Basic tier for up to ~500K incident documents. The vector index is the key component; use semantic ranking (semantic search add-on) to improve cluster search quality beyond pure cosine similarity.
- Azure Container Apps — Host the pipeline worker on the Consumption plan. Use KEDA's Service Bus trigger for scale-to-zero behaviour; the pipeline only runs when there are messages in the queue.
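Below the PTU threshold, the worker has to tolerate 429 throttling from Azure OpenAI. A minimal exponential-backoff retry sketch; the delays and attempt count are illustrative, and `RateLimitError` is a stand-in for the SDK's actual throttling exception:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's throttling exception."""

def with_backoff(fn, attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, retrying on throttling with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries; let the message return to the queue
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` keeps the function unit-testable; in production, rely on Service Bus redelivery as the backstop when all retries are exhausted.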
Azure AI Foundry Agent Service
Azure AI Foundry Agent Service is now generally available and worth considering as an alternative to self-managing LangGraph or Semantic Kernel orchestration.
- Built-in multi-agent routing and workflow management
- Managed state persistence — you don't manage Cosmos DB schemas for agent state
- Native Azure OpenAI integration with automatic retry and throttling handling
- Observability through Azure Monitor without custom instrumentation
The trade-off: less control over the exact prompt structure and routing logic compared to rolling your own LangGraph pipeline. For teams that want to move fast and don't need precise control over the classification prompt format, Foundry Agent Service removes significant infrastructure overhead.
Check the Azure AI Foundry pricing page for current Agent Service rates.
When NOT to Build This
The honest part: most of the teams asking about this system don't actually need it yet.
Don't Build This If...
- You have fewer than 20 incidents per month. At that volume, a human reading the incidents once a week in 30 minutes is faster, cheaper, and more accurate than an AI pipeline. The overhead of building and maintaining the system isn't justified.
- Your incident descriptions are too short to classify. If your team writes "server down, fixed" as the entire incident description, there's nothing for the classifier to work with. The prerequisite is structured incident writing — the AI amplifies existing process quality, it doesn't substitute for it.
- You don't have a baseline process. If there's no post-mortem process, no action item tracking, and no regular review of incident patterns — adding AI doesn't fix that. It surfaces patterns nobody will act on. Fix the process first.
- Your incidents are extremely high-sensitivity. Security incidents or compliance-related failures may have restrictions on what data can be sent to an LLM. Check before building.
A simpler alternative worth considering first: build a static report that groups incidents by service and severity, runs weekly, and sends it to a Slack channel. Takes a day to build. If the team actually reads it and acts on it, then you've validated the underlying value and you're ready to invest in AI-powered analysis. If nobody reads the simple report, nobody will act on the AI recommendations either.
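That baseline report needs nothing more than a group-by. A sketch, where the incident dict shape and field names are assumptions about your ticket export:

```python
from collections import Counter

def weekly_report(incidents: list[dict]) -> str:
    """Group incidents by (service, severity) and render a plain-text summary."""
    counts = Counter((i["service"], i["severity"]) for i in incidents)
    lines = ["Incident summary (last 7 days)", "-" * 32]
    for (service, severity), n in counts.most_common():
        lines.append(f"{service:<20} {severity:<8} {n:>3}")
    return "\n".join(lines)
```

Post the output to a Slack webhook on a weekly schedule and watch whether anyone reacts; that reaction is the signal that justifies the AI build.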
Key Takeaways
The Incident & Quality Intelligence Assistant works well in practice because it's solving a real bottleneck — the gap between incident data accumulating and pattern analysis actually happening. The AI doesn't replace engineering judgment; it does the data-correlation work that humans are too slow and too busy to do manually.
Production numbers to carry with you:
- ~$0.025 per incident in LLM costs — negligible against the engineering time it saves
- ~$95–$120/month in base infrastructure, dominated by Azure AI Search
- Semantic threshold of 0.82 with structural overlap filter — tune this against your own data before going live
- Classification confidence below 0.6 as a monitoring threshold — that's your canary for prompt drift
The technology choice is simpler than it looks: use the stack your team already knows. LangGraph and Semantic Kernel both deliver the full feature set. The differentiator isn't the framework — it's whether your team can maintain it at 2am.
If you haven't read Part 1 yet, start there — it covers the architecture diagram, the state model, and the three core agent implementations in full.
Want More Practical AI Tutorials?
I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.
Subscribe to Newsletter →