The demo worked. Now let's talk about what it costs to run 50 insight decks a week, why a bad narrative is hard to debug, and five situations where you shouldn't build this at all.
In Part 1, I built the core pipeline: four agents that take a natural language question and produce a board-ready narrative. The architecture is clean, the output is genuinely useful, and consultants who've seen it want it immediately.
But production is different from demos. In production, you need to know what each deck costs. You need to trace a bad recommendation back to the agent that produced it. You need to make the Azure infrastructure choice before the client signs the contract. And you need to have an honest answer ready for when the partner asks "what happens if the AI gets this wrong?"
This is that article.
Real Cost Per Insight Deck
The pipeline makes five LLM calls per insight deck: one for intent classification, plus one for each of the four agents. Here are the real token counts and costs at current Azure OpenAI GPT-4o pricing ($2.50/M input tokens, $10.00/M output tokens as of March 2026):
| Agent | Avg Input Tokens | Avg Output Tokens | Cost |
|---|---|---|---|
| Intent Classifier | 150 | 10 | $0.0005 |
| NL Query Agent (SQL gen) | 2,800 | 400 | $0.011 |
| Data Analyst Agent | 3,500 | 800 | $0.017 |
| Strategist Agent | 2,200 | 600 | $0.012 |
| Exec Reporter Agent | 3,000 | 700 | $0.015 |
| Total per deck | 11,650 | 2,510 | $0.054 |
About $0.05 per insight deck. At 50 decks per week (10 analysts, 5 each), that's $2.70/week in LLM costs — or roughly $140/year. The Azure Synapse Analytics query costs are larger: approximately $0.05-0.15 per query execution, though the Cosmos DB cache eliminates repeat-question costs entirely.
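The per-deck figure is easy to recompute as prices and prompt sizes drift. A small sanity-check script, using the token counts from the table above (treat it as a template for your own numbers, not a quote):

```python
GPT4O_INPUT_PER_M = 2.50    # USD per 1M input tokens (March 2026 list price)
GPT4O_OUTPUT_PER_M = 10.00  # USD per 1M output tokens

# (avg input tokens, avg output tokens) per call, matching the table above
CALLS = {
    "intent_classifier": (150, 10),
    "nl_query":          (2_800, 400),
    "data_analyst":      (3_500, 800),
    "strategist":        (2_200, 600),
    "exec_reporter":     (3_000, 700),
}

def deck_cost(calls: dict[str, tuple[int, int]]) -> float:
    """Total LLM cost in USD for one insight deck."""
    return sum(
        inp * GPT4O_INPUT_PER_M / 1e6 + out * GPT4O_OUTPUT_PER_M / 1e6
        for inp, out in calls.values()
    )

cost = deck_cost(CALLS)
print(f"${cost:.4f} per deck, ${cost * 50 * 52:.0f}/year at 50 decks/week")
# → $0.0542 per deck, $141/year at 50 decks/week
```

Re-run this whenever Azure repricing lands or the prompts grow; the schema context in the NL Query Agent prompt is usually the first thing to balloon.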
The Schema Search Cost You'll Miss
The Azure AI Search semantic queries add up separately. Each NL Query Agent call runs one semantic search against the schema index. At ~$1.00 per 1,000 semantic queries, that's negligible at 50 decks/week but becomes $2-3/day at enterprise scale. Factor it in before quoting a client.
Infrastructure costs (monthly estimates for a mid-size consulting team):
| Azure Service | Configuration | Monthly Cost |
|---|---|---|
| Azure OpenAI (GPT-4o) | Pay-per-token (S0) | $12-20 |
| Azure AI Search | Standard S1, schema index | $245 |
| Cosmos DB (query cache) | Serverless, ~5GB | $8-15 |
| Azure App Service (API) | P1v3 | $135 |
| Azure Synapse Analytics | Client-provided or 100 DWU | $730 or client-owned |
| Total (excl. Synapse) | — | ~$400/month |
Observability — Tracing a Bad Narrative
When a board narrative is wrong, you need to know which of the four agents introduced the error. Was it a bad SQL query? A misidentified trend? A strategy framing that didn't match the client context? Or a Reporter that wrote confidently about the wrong thing?
The answer is structured logging at every agent boundary, with the full state snapshot attached. I use Azure Application Insights with custom events for each agent transition.
```python
import os
import time
from functools import wraps

from applicationinsights import TelemetryClient

tc = TelemetryClient(instrumentation_key=os.environ["APPINSIGHTS_KEY"])

def trace_agent(agent_name: str):
    """Decorator that logs agent input/output to App Insights."""
    def decorator(func):
        @wraps(func)
        async def wrapper(state: InsightsState) -> InsightsState:
            start = time.monotonic()
            try:
                result = await func(state)
                duration_ms = (time.monotonic() - start) * 1000
                tc.track_event(f"agent.{agent_name}.success", properties={
                    "session_id": state["session_id"],
                    "nl_question": state["nl_question"][:200],
                    "intent": state.get("intent", ""),
                    "cached": str(state.get("cached", False)),
                    "duration_ms": str(int(duration_ms)),
                    # Snapshot key outputs for debugging
                    "sql_generated": state.get("generated_sql", "")[:500],
                    "findings_count": str(len(state.get("analyst_findings", []))),
                    "narrative_len": str(len(state.get("narrative", ""))),
                })
                tc.track_metric(f"agent.{agent_name}.duration_ms", duration_ms)
                return result
            except Exception as e:
                tc.track_exception()
                tc.track_event(f"agent.{agent_name}.error", properties={
                    "session_id": state["session_id"],
                    "error": str(e)[:500],
                })
                raise
        return wrapper
    return decorator

# Usage:
@trace_agent("nl_query")
async def nl_query(state: InsightsState) -> InsightsState:
    ...  # implementation from Part 1
```
The C# equivalent, using a C# 12 primary constructor:

```csharp
public class AgentTracer(TelemetryClient telemetry)
{
    public async Task<T> TraceAsync<T>(
        string agentName,
        InsightsState state,
        Func<Task<T>> operation)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            var result = await operation();
            sw.Stop();
            telemetry.TrackEvent($"agent.{agentName}.success",
                new Dictionary<string, string>
                {
                    ["session_id"] = state.SessionId,
                    ["nl_question"] = state.NlQuestion[..Math.Min(200, state.NlQuestion.Length)],
                    ["intent"] = state.Intent,
                    ["cached"] = state.Cached.ToString(),
                    ["duration_ms"] = sw.ElapsedMilliseconds.ToString(),
                    ["sql_generated"] = state.GeneratedSql[..Math.Min(500, state.GeneratedSql.Length)],
                    ["findings_count"] = state.AnalystFindings.Count.ToString(),
                    ["narrative_len"] = state.Narrative.Length.ToString()
                });
            telemetry.TrackMetric($"agent.{agentName}.duration_ms",
                sw.ElapsedMilliseconds);
            return result;
        }
        catch (Exception ex)
        {
            telemetry.TrackException(ex,
                new Dictionary<string, string>
                {
                    ["session_id"] = state.SessionId,
                    ["agent"] = agentName
                });
            throw;
        }
    }
}
```
With this in place, when a partner says "the narrative for the EMEA question was wrong last Tuesday", you filter App Insights by session_id, see the SQL that was generated, see what findings the Analyst produced, and identify exactly where the error was introduced. Debugging without this is archaeology.
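The triage itself can be scripted rather than clicked through. A sketch, assuming you've exported a day's custom events as a list of dicts (for example via the Application Insights REST API; the field names match the decorator above, and the event shapes here are synthetic):

```python
def session_timeline(events: list[dict], session_id: str) -> list[str]:
    """Order one session's agent events chronologically and summarise
    each step, so the first bad hand-off between agents is easy to spot."""
    steps = [e for e in events if e["properties"].get("session_id") == session_id]
    steps.sort(key=lambda e: e["timestamp"])
    return [
        f'{e["name"]:28s} {e["properties"].get("duration_ms", "?"):>6}ms '
        f'cached={e["properties"].get("cached", "?")} '
        f'findings={e["properties"].get("findings_count", "-")}'
        for e in steps
    ]

# Example with two synthetic exported events:
events = [
    {"name": "agent.nl_query.success", "timestamp": "2026-03-10T09:01:02Z",
     "properties": {"session_id": "s-42", "duration_ms": "1840", "cached": "False"}},
    {"name": "agent.data_analyst.success", "timestamp": "2026-03-10T09:01:09Z",
     "properties": {"session_id": "s-42", "duration_ms": "5210", "cached": "False",
                    "findings_count": "4"}},
]
for line in session_timeline(events, "s-42"):
    print(line)
```

A timeline that shows a plausible SQL step followed by zero analyst findings and a long, confident narrative tells you immediately that the Analyst, not the Reporter, is where to start digging.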
Choosing Python or C# for This System
| Factor | Python (LangGraph) | C# (Semantic Kernel) |
|---|---|---|
| Team background | Data science / ML teams | .NET enterprise teams |
| Graph visualisation | Native LangGraph Studio support | Manual implementation |
| Azure Synapse connector | pyodbc / azure-synapse-spark | Microsoft.Data.SqlClient (mature) |
| Structured output (JSON) | Pydantic + response_format | ResponseFormat = typeof(T) |
| Deployment | Azure Container Apps or Functions | Azure App Service or Functions |
| Azure Foundry Agent Service | SDK in preview | Native Semantic Kernel integration |
My recommendation for this use case: If the team that will maintain this is already in the Microsoft ecosystem (.NET, Azure DevOps, existing C# services), use Semantic Kernel — the Azure Synapse connector is more mature and the Foundry Agent Service integration is cleaner. If the team comes from a data science background and wants to iterate quickly on prompts, Python with LangGraph has better tooling for debugging the graph.
One Specific Case for C#: Enterprise Client Data
Many large consulting clients have strict requirements around how their data is handled. If the client's warehouse is on Azure and the client IT team needs to audit the integration, a C# application running in their Azure subscription is far easier to hand over and maintain than a Python service the consulting firm controls. This is a practical consideration, not a technical one.
Azure Infrastructure
Here's the production setup I recommend. The key decision is whether to use Azure AI Foundry Agent Service or self-host the pipeline.
Azure AI Foundry Agent Service (recommended for teams that don't want to manage infrastructure): Foundry provides managed orchestration, built-in thread management, and first-class Semantic Kernel integration. You define agents as plugins and Foundry handles execution, retry, and state. The trade-off is less control over the pipeline and higher per-request cost than self-hosted.
Self-hosted on Azure Container Apps (recommended for teams that need full control): The LangGraph pipeline runs as a FastAPI app in Container Apps. This gives you full visibility into the graph execution, native LangGraph Studio debugging, and lower per-request cost. The trade-off is you manage the runtime.
| Service | Purpose | Notes |
|---|---|---|
| Azure OpenAI (GPT-4o) | All four LLM calls | Deploy in the client's Azure subscription if data sovereignty required |
| Azure Synapse Analytics | Data warehouse queries | Client typically provides; use Synapse Serverless for cost control |
| Azure AI Search | Schema index retrieval | Standard S1; re-index when warehouse schema changes |
| Cosmos DB (Serverless) | Query result cache, session state | 4-hour TTL on query cache; 24-hour TTL on session state |
| Azure Container Apps | API runtime | Scale to zero overnight; scale up during morning board prep rush |
| Azure Key Vault | Secrets management | Never hardcode Synapse credentials or OpenAI keys |
| Application Insights | Observability | Agent traces, token usage, latency per deck |
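The query cache row above hinges on a stable cache key: two analysts asking the same question in slightly different words should hit the same cached Synapse result, and a schema re-index should invalidate everything. A minimal sketch of the keying logic (the normalisation rules here are illustrative, not the actual system's; the real entry would be stored as a Cosmos DB document with `ttl: 14400` for the 4-hour window):

```python
import hashlib
import re

def cache_key(nl_question: str, schema_version: str) -> str:
    """Deterministic cache key: collapse whitespace and case so trivial
    rephrasings map to the same key, and scope the key to the current
    schema index version so re-indexing naturally invalidates the cache."""
    normalised = re.sub(r"\s+", " ", nl_question.strip().lower())
    return hashlib.sha256(f"{schema_version}:{normalised}".encode()).hexdigest()

# Same question, different whitespace/case -> same key
k1 = cache_key("Show me  Q3 margin by region", "v12")
k2 = cache_key("show me q3 margin by region ", "v12")
assert k1 == k2

# A schema re-index changes the version, so old cached results are skipped
assert cache_key("show me q3 margin by region", "v13") != k1
```

Note this only catches surface-level rephrasings; "Q3 margin by region" and "regional margin for Q3" would still miss. Whether that's worth fixing with embedding-based matching depends on how repetitive your board-prep questions actually are.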
When NOT to Build This
Be Honest About the Limits
This system accelerates the mechanical parts of insight generation. It doesn't replace the consultant's judgment about what question to ask, what context matters, or whether the data source is trustworthy. If you deploy this and the client treats it as a replacement for human judgment, you've created a liability, not a product.
Five situations where you shouldn't build this:
- Your warehouse schema is unstable. If tables are renamed, columns added, or relationships change weekly, the schema index goes stale and SQL generation breaks. This system needs a reasonably stable schema to be reliable. If schema governance is weak, fix that first.
- The questions require cross-system reasoning. If answering "why did margin drop?" requires joining your financial warehouse with CRM data in Salesforce, external market data, and a manual Excel spreadsheet, NL-to-SQL won't cover it. You need a proper data integration layer first.
- You need regulatory sign-off on every output. For heavily regulated sectors (financial services audit, healthcare compliance), AI-generated narratives may require explicit human review and sign-off before they reach the board. The system can assist, but if every output must be manually certified anyway, the time saving collapses.
- The client has data sovereignty requirements that preclude Azure OpenAI. If client data cannot leave a specific jurisdiction and there's no in-region Azure OpenAI deployment, this architecture doesn't work without significant adaptation. Check data residency requirements before scoping.
- Your team doesn't have the prompting skills to maintain it. The quality of output depends entirely on prompt quality. If no one on the team can evaluate whether the Exec Reporter prompt is producing good board language vs mediocre board language, you'll ship a system that degrades quietly over time as prompts drift. Invest in prompt engineering skills before deploying.
Key Takeaways
- $0.05 per deck — The LLM cost is trivial compared to analyst time. The value equation is straightforward if the system is maintained properly.
- Trace every agent boundary — Structured logging with state snapshots is the only reliable way to debug bad narratives. Do this from day one, not after the first incident.
- Schema injection over fine-tuning — Injecting relevant table DDL at runtime beats fine-tuning for schema adherence, because schemas change and fine-tuned models don't update automatically.
- Query caching is essential — Synapse queries are expensive and slow. The 4-hour Cosmos DB cache removes the cost and latency of repeat questions, which are common in board prep sessions.
- Board language is a skill, not a format — The Exec Reporter prompt is the hardest prompt to get right. Invest time iterating on it with real senior consultants reviewing output.
If you want to dig deeper into the Part 1 architecture — the NL-to-SQL generation, the analyst pipeline, and the full LangGraph/Semantic Kernel implementation — start there: Board-Ready Insights Generator Part 1 →
Cost figures are based on Azure OpenAI pricing as of March 2026 and will change. Always verify current pricing at the Azure pricing calculator before scoping a client engagement.
Want More Practical AI Tutorials?
I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.
Subscribe to Newsletter →