Board-Ready Insights Generator — Part 2: Production Considerations

March 26, 2026 · 10-12 min read · By Jaffar Kazi
Tags: AI Development, Production AI, Cost Analysis, Python, C#, Azure

The demo worked. Now let's talk about what it costs to run 50 insight decks a week, why a bad narrative is hard to debug, and five situations where you shouldn't build this at all.

In Part 1, I built the core pipeline: four agents that take a natural language question and produce a board-ready narrative. The architecture is clean, the output is genuinely useful, and consultants who've seen it want it immediately.

But production is different from demos. In production, you need to know what each deck costs. You need to trace a bad recommendation back to the agent that produced it. You need to make the Azure infrastructure choice before the client signs the contract. And you need to have an honest answer ready for when the partner asks "what happens if the AI gets this wrong?"

This is that article.

Real Cost Per Insight Deck

The pipeline makes five LLM calls per insight deck: one for the lightweight intent classifier, plus one for each of the four agents. Here are the real token counts and costs at current Azure OpenAI GPT-4o pricing ($2.50/M input tokens, $10.00/M output tokens as of March 2026):

Agent | Avg Input Tokens | Avg Output Tokens | Cost
Intent Classifier | 150 | 10 | $0.0005
NL Query Agent (SQL gen) | 2,800 | 400 | $0.011
Data Analyst Agent | 3,500 | 800 | $0.017
Strategist Agent | 2,200 | 600 | $0.012
Exec Reporter Agent | 3,000 | 700 | $0.015
Total per deck | 11,650 | 2,510 | $0.054

About $0.05 per insight deck. At 50 decks per week (10 analysts, 5 each), that's $2.70/week in LLM costs — or roughly $140/year. The Azure Synapse Analytics query costs are larger: approximately $0.05-0.15 per query execution, though the Cosmos DB cache eliminates repeat-question costs entirely.

The Schema Search Cost You'll Miss

The Azure AI Search semantic queries add up separately. Each NL Query Agent call runs one semantic search against the schema index. At ~$1.00 per 1,000 semantic queries, that's negligible at 50 decks/week but becomes $2-3/day at enterprise scale. Factor it in before quoting a client.
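Putting the three marginal costs together shows why the warehouse query, not the LLM, dominates. A sketch assuming one semantic search and one Synapse execution per uncached deck (the function name and the tuple return shape are my choices, not part of the pipeline):

```python
LLM_COST_PER_DECK = 0.054          # from the token table above
SEARCH_RATE = 1.00 / 1_000         # Azure AI Search semantic query
SYNAPSE_QUERY_COST = (0.05, 0.15)  # rough per-execution range

def marginal_deck_cost() -> tuple[float, float]:
    """(low, high) marginal cost of one uncached deck.

    A Cosmos DB cache hit skips the search and Synapse legs entirely,
    which is why repeat questions during board prep are nearly free.
    """
    low = LLM_COST_PER_DECK + SEARCH_RATE + SYNAPSE_QUERY_COST[0]
    high = LLM_COST_PER_DECK + SEARCH_RATE + SYNAPSE_QUERY_COST[1]
    return low, high
```

That puts an uncached deck at roughly $0.11-0.21 all-in, with the Synapse query accounting for half to three-quarters of it.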

Infrastructure costs (monthly estimates for a mid-size consulting team):

Azure Service | Configuration | Monthly Cost
Azure OpenAI (GPT-4o) | Pay-per-token (S0) | $12-20
Azure AI Search | Standard S1, schema index | $245
Cosmos DB (query cache) | Serverless, ~5GB | $8-15
Azure App Service (API) | P1v3 | $135
Azure Synapse Analytics | Client-provided or 100 DWU | $730 or client-owned
Total (excl. Synapse) | | ~$400/month

Observability — Tracing a Bad Narrative

When a board narrative is wrong, you need to know which of the four agents introduced the error. Was it a bad SQL query? A misidentified trend? A strategy framing that didn't match the client context? Or a Reporter that wrote confidently about the wrong thing?

The answer is structured logging at every agent boundary, with the full state snapshot attached. I use Azure Application Insights with custom events for each agent transition.

observability.py
import os
import time
from functools import wraps

from applicationinsights import TelemetryClient

# InsightsState is the pipeline state TypedDict defined in Part 1
tc = TelemetryClient(instrumentation_key=os.environ["APPINSIGHTS_KEY"])

def trace_agent(agent_name: str):
    """Decorator that logs agent input/output to App Insights."""
    def decorator(func):
        @wraps(func)
        async def wrapper(state: InsightsState) -> InsightsState:
            start = time.monotonic()
            try:
                result = await func(state)
                duration_ms = (time.monotonic() - start) * 1000

                tc.track_event(f"agent.{agent_name}.success", properties={
                    "session_id":     state["session_id"],
                    "nl_question":    state["nl_question"][:200],
                    "intent":         state.get("intent", ""),
                    "cached":         str(state.get("cached", False)),
                    "duration_ms":    str(int(duration_ms)),
                    # Snapshot key outputs for debugging
                    "sql_generated":  state.get("generated_sql", "")[:500],
                    "findings_count": str(len(state.get("analyst_findings", []))),
                    "narrative_len":  str(len(state.get("narrative", ""))),
                })
                tc.track_metric(f"agent.{agent_name}.duration_ms", duration_ms)
                return result

            except Exception as e:
                tc.track_exception()
                tc.track_event(f"agent.{agent_name}.error", properties={
                    "session_id": state["session_id"],
                    "error":      str(e)[:500],
                })
                raise
        return wrapper
    return decorator

# Usage:
@trace_agent("nl_query")
async def nl_query(state: InsightsState) -> InsightsState:
    # ... implementation
Observability/AgentTracer.cs
using System.Diagnostics;
using Microsoft.ApplicationInsights;

public class AgentTracer(TelemetryClient telemetry)
{
    public async Task<T> TraceAsync<T>(
        string agentName,
        InsightsState state,
        Func<Task<T>> operation)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            var result = await operation();
            sw.Stop();

            telemetry.TrackEvent($"agent.{agentName}.success",
                new Dictionary<string, string>
                {
                    ["session_id"]     = state.SessionId,
                    ["nl_question"]    = state.NlQuestion[..Math.Min(200, state.NlQuestion.Length)],
                    ["intent"]         = state.Intent,
                    ["cached"]         = state.Cached.ToString(),
                    ["duration_ms"]    = sw.ElapsedMilliseconds.ToString(),
                    ["sql_generated"]  = state.GeneratedSql[..Math.Min(500, state.GeneratedSql.Length)],
                    ["findings_count"] = state.AnalystFindings.Count.ToString(),
                    ["narrative_len"]  = state.Narrative.Length.ToString()
                });

            telemetry.TrackMetric($"agent.{agentName}.duration_ms",
                sw.ElapsedMilliseconds);

            return result;
        }
        catch (Exception ex)
        {
            telemetry.TrackException(ex,
                new Dictionary<string, string>
                {
                    ["session_id"] = state.SessionId,
                    ["agent"]      = agentName
                });
            throw;
        }
    }
}

With this in place, when a partner says "the narrative for the EMEA question was wrong last Tuesday", you filter App Insights by session_id, see the SQL that was generated, see what findings the Analyst produced, and identify exactly where the error was introduced. Debugging without this is archaeology.

Why Choose Python vs C# for This System

Factor | Python (LangGraph) | C# (Semantic Kernel)
Team background | Data science / ML teams | .NET enterprise teams
Graph visualisation | Native LangGraph Studio support | Manual implementation
Azure Synapse connector | pyodbc / azure-synapse-spark | Microsoft.Data.SqlClient (mature)
Structured output (JSON) | Pydantic + response_format | ResponseFormat = typeof(T)
Deployment | Azure Container Apps or Functions | Azure App Service or Functions
Azure Foundry Agent Service | SDK in preview | Native Semantic Kernel integration

My recommendation for this use case: If the team that will maintain this is already in the Microsoft ecosystem (.NET, Azure DevOps, existing C# services), use Semantic Kernel — the Azure Synapse connector is more mature and the Foundry Agent Service integration is cleaner. If the team comes from a data science background and wants to iterate quickly on prompts, Python with LangGraph has better tooling for debugging the graph.

One Specific Case for C#: Enterprise Client Data

Many large consulting clients have strict requirements around how their data is handled. If the client's warehouse is on Azure and the client IT team needs to audit the integration, a C# application running in their Azure subscription is far easier to hand over and maintain than a Python service the consulting firm controls. This is a practical consideration, not a technical one.

Azure Infrastructure

Here's the production setup I recommend. The key decision is whether to use Azure AI Foundry Agent Service or self-host the pipeline.

Azure AI Foundry Agent Service (recommended for teams that don't want to manage infrastructure): Foundry provides managed orchestration, built-in thread management, and first-class Semantic Kernel integration. You define agents as plugins and Foundry handles execution, retry, and state. The trade-off is less control over the pipeline and higher per-request cost than self-hosted.

Self-hosted on Azure Container Apps (recommended for teams that need full control): The LangGraph pipeline runs as a FastAPI app in Container Apps. This gives you full visibility into the graph execution, native LangGraph Studio debugging, and lower per-request cost. The trade-off is you manage the runtime.

Service | Purpose | Notes
Azure OpenAI (GPT-4o) | All LLM calls | Deploy in the client's Azure subscription if data sovereignty is required
Azure Synapse Analytics | Data warehouse queries | Client typically provides; use Synapse Serverless for cost control
Azure AI Search | Schema index retrieval | Standard S1; re-index when the warehouse schema changes
Cosmos DB (Serverless) | Query result cache, session state | 4-hour TTL on query cache; 24-hour TTL on session state
Azure Container Apps | API runtime | Scale to zero overnight; scale up during the morning board prep rush
Azure Key Vault | Secrets management | Never hardcode Synapse credentials or OpenAI keys
Application Insights | Observability | Agent traces, token usage, latency per deck
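The Cosmos DB caching rows assume a deterministic cache key per question. One way to sketch that key (the normalisation rules, the `schema_version` field, and the function name are my assumptions, not the Part 1 implementation):

```python
import hashlib
import re

QUERY_CACHE_TTL_S = 4 * 3600   # 4-hour TTL on cached query results
SESSION_TTL_S = 24 * 3600      # 24-hour TTL on session state

def cache_key(nl_question: str, schema_version: str) -> str:
    """Deterministic Cosmos DB id for a question against a given schema.

    Normalising whitespace and case lets 'Why did margin drop?' and
    'why did MARGIN  drop?' hit the same cached result; embedding the
    schema version invalidates the cache whenever the warehouse changes.
    """
    normalised = re.sub(r"\s+", " ", nl_question.strip().lower())
    digest = hashlib.sha256(f"{schema_version}:{normalised}".encode()).hexdigest()
    return f"qc-{digest[:32]}"
```

The schema version in the key matters more than it looks: without it, a stale cached answer can silently survive a warehouse migration.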

When NOT to Build This

Be Honest About the Limits

This system accelerates the mechanical parts of insight generation. It doesn't replace the consultant's judgment about what question to ask, what context matters, or whether the data source is trustworthy. If you deploy this and the client treats it as a replacement for human judgment, you've created a liability, not a product.

Five situations where you shouldn't build this:

  1. Your warehouse schema is unstable. If tables are renamed, columns added, or relationships change weekly, the schema index goes stale and SQL generation breaks. This system needs a reasonably stable schema to be reliable. If schema governance is weak, fix that first.
  2. The questions require cross-system reasoning. If answering "why did margin drop?" requires joining your financial warehouse with CRM data in Salesforce, external market data, and a manual Excel spreadsheet, NL-to-SQL won't cover it. You need a proper data integration layer first.
  3. You need regulatory sign-off on every output. For heavily regulated sectors (financial services audit, healthcare compliance), AI-generated narratives may require explicit human review and sign-off before they reach the board. The system can assist, but if every output must be manually certified anyway, the time saving collapses.
  4. The client has data sovereignty requirements that preclude Azure OpenAI. If client data cannot leave a specific jurisdiction and there's no in-region Azure OpenAI deployment, this architecture doesn't work without significant adaptation. Check data residency requirements before scoping.
  5. Your team doesn't have the prompting skills to maintain it. The quality of output depends entirely on prompt quality. If no one on the team can evaluate whether the Exec Reporter prompt is producing good board language vs mediocre board language, you'll ship a system that degrades quietly over time as prompts drift. Invest in prompt engineering skills before deploying.
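Point 1, at least, is detectable before it bites. A cheap guard is to fingerprint the schema the search index was built from and alert when the live warehouse no longer matches; a sketch under the assumption that the schema can be summarised as a table-to-columns mapping (the input format and function names are illustrative):

```python
import hashlib

def schema_fingerprint(tables: dict[str, list[str]]) -> str:
    """Stable hash of {table: [columns]}, independent of dict/list order."""
    canonical = "|".join(
        f"{name}:{','.join(sorted(cols))}" for name, cols in sorted(tables.items())
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

def index_is_stale(live_schema: dict[str, list[str]], indexed_fingerprint: str) -> bool:
    """True when the warehouse has drifted since the last re-index."""
    return schema_fingerprint(live_schema) != indexed_fingerprint
```

Run the check on a schedule and block SQL generation (or just page someone) when it returns True, rather than letting the NL Query Agent generate against tables that no longer exist.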

Key Takeaways

  • $0.05 per deck — The LLM cost is trivial compared to analyst time. The value equation is straightforward if the system is maintained properly.
  • Trace every agent boundary — Structured logging with state snapshots is the only reliable way to debug bad narratives. Do this from day one, not after the first incident.
  • Schema injection over fine-tuning — Injecting relevant table DDL at runtime beats fine-tuning for schema adherence, because schemas change and fine-tuned models don't update automatically.
  • Query caching is essential — Synapse queries are expensive and slow. The 4-hour Cosmos DB cache removes the cost and latency of repeat questions, which are common in board prep sessions.
  • Board language is a skill, not a format — The Exec Reporter prompt is the hardest prompt to get right. Invest time iterating on it with real senior consultants reviewing output.
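The "schema injection over fine-tuning" takeaway can be made concrete: at runtime, only the DDL retrieved from the schema index for this question goes into the SQL-generation prompt. A sketch (the prompt wording and function name are illustrative, not the Part 1 implementation):

```python
def build_sql_prompt(nl_question: str, relevant_ddl: list[str]) -> str:
    """Prompt for the NL Query Agent: the question plus only the retrieved DDL.

    Because the schema is injected per request, a warehouse change only
    requires re-indexing the schema, never retraining a model.
    """
    schema_block = "\n\n".join(relevant_ddl)
    return (
        "You are a T-SQL generator for Azure Synapse.\n"
        "Only use the tables and columns below.\n\n"
        f"### Schema\n{schema_block}\n\n"
        f"### Question\n{nl_question}\n\n"
        "Return a single SELECT statement."
    )
```

Keeping the injected DDL to the handful of tables the semantic search returned, rather than the whole warehouse, is also what keeps the NL Query Agent's input around the 2,800 tokens measured earlier.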

If you want to dig deeper into the Part 1 architecture — the NL-to-SQL generation, the analyst pipeline, and the full LangGraph/Semantic Kernel implementation — start there: Board-Ready Insights Generator Part 1 →


Cost figures are based on Azure OpenAI pricing as of March 2026 and will change. Always verify current pricing at the Azure pricing calculator before scoping a client engagement.

Want More Practical AI Tutorials?

I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.

Subscribe to Newsletter →

Written by Jaffar Kazi, a software engineer in Sydney building AI-powered applications. Connect on LinkedIn.