Most AI tutorials stop at the demo. That's exactly where the real work starts.
In Part 1, I built the core architecture for a Financial Close & Variance Review Copilot: ERP data ingestion, variance calculation, RAG-augmented commentary generation, and triage routing. The code works. A demo runs end-to-end in under a minute.
But finance is one of the domains where "it works in demo" is the lowest possible bar. Auditors need to trace every commentary line back to the exact model call, prompt, and context that produced it. Finance directors need to know what this costs per close. And the CFO's office needs to understand what happens when the model is confidently wrong.
This part covers what I had to build before I'd feel comfortable running this in a real month-end close: cost analysis with actual numbers, observability for audit trails, the technology decision framework, and — honestly — the scenarios where you should skip AI entirely.
Cost Analysis
Let's work through a real example: a 300-account close with 180 accounts above the materiality threshold (60% of accounts generate commentary). Each commentary call uses roughly 800 input tokens (prompt + historical context) and produces 120 output tokens.
| Component | Per Close | Monthly (1 close) | Annual |
|---|---|---|---|
| GPT-4o input tokens (180 × 800) | 144,000 tokens | $1.80 | $21.60 |
| GPT-4o output tokens (180 × 120) | 21,600 tokens | $1.08 | $12.96 |
| Azure AI Search (vector queries) | 180 queries | ~$8.00 | ~$96.00 |
| Azure App Service (hosting) | — | ~$25.00 | ~$300.00 |
| Total infrastructure | — | ~$36/month | ~$430/year |
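The LLM line items above can be sanity-checked in a few lines. Note that the per-million-token prices below are back-derived from the table's dollar figures, not live Azure rates — treat them as assumptions and check current pricing before budgeting:

```python
# Sanity check of the per-close LLM cost rows in the table above.
# The $/1M-token prices are implied by the table, not live Azure rates.
accounts = 180            # accounts above the materiality threshold
tokens_in_per_call = 800
tokens_out_per_call = 120
price_in_per_m = 12.50    # assumed $/1M input tokens (back-derived)
price_out_per_m = 50.00   # assumed $/1M output tokens (back-derived)

input_cost = accounts * tokens_in_per_call / 1e6 * price_in_per_m
output_cost = accounts * tokens_out_per_call / 1e6 * price_out_per_m
print(round(input_cost, 2), round(output_cost, 2))  # 1.8 1.08
```

Swapping in your own account count, context size, and current token rates turns this into a quick pre-commitment cost model.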
Cost Reality Check
$36/month for a 300-account close is genuinely cheap. But costs scale with both the number of commentary accounts and the per-call context size, and the two multiply. A company with 1,500 accounts and richer historical context (3,000 input tokens per call) will pay ~$200–300/month — still a rounding error compared to the analyst hours saved, but worth modelling before you commit.
One cost lever worth knowing: you can use GPT-4o mini for accounts that are below your materiality threshold but still need basic commentary (pure factual statements, no context needed). At roughly 15× cheaper per token, this can halve your total LLM spend if you have a long tail of small-variance accounts.
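A minimal sketch of that routing decision. The model names are real Azure OpenAI deployments, but the function and threshold logic are illustrative assumptions, not code from the pipeline:

```python
# Hypothetical cost lever: route sub-materiality accounts to GPT-4o mini.
# The threshold logic and function name are illustrative, not from the
# actual pipeline in Part 1.
def pick_model(abs_variance: float, materiality_threshold: float) -> str:
    if abs_variance >= materiality_threshold:
        return "gpt-4o"       # full commentary with retrieved context
    return "gpt-4o-mini"      # short factual statement, no RAG context

print(pick_model(120_000, 50_000))  # above threshold -> gpt-4o
print(pick_model(8_000, 50_000))    # long tail -> gpt-4o-mini
```

If most of your chart of accounts sits below the threshold, this single branch is where the "halve your LLM spend" claim comes from.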
Observability & Debugging
Finance is auditable. Every number in the management report needs to be traceable to its source. For AI-generated commentary, that means being able to answer: "What context did the model see when it wrote this sentence?"
I implement this with a structured trace log on each commentary item, written alongside the output:
```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CommentaryTrace:
    trace_id: str
    account_code: str
    period: str
    model: str            # e.g. "gpt-4o-2024-11-20"
    input_tokens: int
    output_tokens: int
    temperature: float
    retrieved_docs: list  # Document IDs from Azure AI Search
    prompt_hash: str      # SHA-256 of the exact prompt sent (truncated)
    generated_at: str     # ISO 8601 UTC timestamp
    commentary: str

# Method on the commentary generator class (class definition elided)
async def generate_with_trace(self, variance: dict, context: list) -> CommentaryTrace:
    prompt = self._build_prompt(variance, context)
    response = await self.llm.chat_with_usage(
        [{"role": "user", "content": prompt}],
        temperature=0.3, max_tokens=150
    )
    return CommentaryTrace(
        trace_id=str(uuid.uuid4()),
        account_code=variance["account_code"],
        period=variance["period"],
        model=response.model,
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        temperature=0.3,
        retrieved_docs=[doc["id"] for doc in context],
        prompt_hash=hashlib.sha256(prompt.encode()).hexdigest()[:16],
        generated_at=datetime.now(timezone.utc).isoformat(),
        commentary=response.content
    )
```
```csharp
public record CommentaryTrace
{
    public string TraceId { get; init; } = Guid.NewGuid().ToString();
    public string AccountCode { get; init; } = string.Empty;
    public string Period { get; init; } = string.Empty;
    public string Model { get; init; } = string.Empty;
    public int InputTokens { get; init; }
    public int OutputTokens { get; init; }
    public float Temperature { get; init; }
    public List<string> RetrievedDocs { get; init; } = new();
    public string PromptHash { get; init; } = string.Empty;
    public DateTimeOffset GeneratedAt { get; init; } = DateTimeOffset.UtcNow;
    public string Commentary { get; init; } = string.Empty;
}

public async Task<CommentaryTrace> GenerateWithTraceAsync(
    VarianceResult variance, List<SearchResult> context)
{
    var prompt = BuildPrompt(variance, context);
    var result = await _kernel.InvokePromptAsync(prompt,
        executionSettings: new OpenAIPromptExecutionSettings
        {
            Temperature = 0.3f, MaxTokens = 150
        });
    var metadata = result.Metadata;
    return new CommentaryTrace
    {
        AccountCode = variance.AccountCode,
        Period = variance.Period,
        Model = metadata?["model"]?.ToString() ?? "unknown",
        InputTokens = (int)(metadata?["usage.prompt_tokens"] ?? 0),
        OutputTokens = (int)(metadata?["usage.completion_tokens"] ?? 0),
        Temperature = 0.3f,
        RetrievedDocs = context.Select(c => c.Id).ToList(),
        PromptHash = ComputeHash(prompt)[..16],
        Commentary = result.ToString().Trim()
    };
}
```
Store these traces in Azure Table Storage or Cosmos DB alongside the commentary. When an auditor asks "why did the system say revenue variance was timing-related?", you can pull the trace, re-fetch the retrieved documents by ID, and reconstruct exactly what the model saw. This is the difference between a system that finance leadership will trust and one they won't.
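For the Table Storage route, the main design decision is the partition scheme. A minimal sketch of the entity mapping, assuming the `azure-data-tables` package — the trimmed `CommentaryTrace` mirror, `trace_to_entity` helper, and partition choice (period as `PartitionKey`, trace ID as `RowKey`) are illustrative assumptions, not the article's production code:

```python
from dataclasses import dataclass, asdict

@dataclass
class CommentaryTrace:
    """Trimmed mirror of the trace record shown earlier, for illustration."""
    trace_id: str
    account_code: str
    period: str
    commentary: str

def trace_to_entity(trace: CommentaryTrace) -> dict:
    # PartitionKey = period groups one close per partition, so an auditor's
    # "show me everything from 2025-01" query hits a single partition.
    # RowKey = trace_id guarantees uniqueness within the partition.
    entity = {"PartitionKey": trace.period, "RowKey": trace.trace_id}
    entity.update(asdict(trace))
    return entity

# Upload (assumes azure-data-tables is installed and CONN_STR is configured):
# from azure.data.tables import TableClient
# table = TableClient.from_connection_string(CONN_STR, "commentarytraces")
# table.create_entity(trace_to_entity(trace))
```

With this layout, reconstructing an audit trail is a point query by period and trace ID rather than a scan.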
Why Choose Python or C#
Python Implementation
Why choose Python: If your data team already writes Python — which is common in finance analytics — LangGraph gives you the broadest AI orchestration ecosystem currently available.
- Library ecosystem — LangChain, LangGraph, Pandas, direct ERP connectors via PyODBC
- Data team fit — Finance analysts who know Python can extend the commentary prompts themselves
- Rapid iteration — Changing prompt logic or adding a new agent node takes minutes
- LangSmith integration — Managed tracing and prompt versioning if you want it
C#/.NET Implementation
Why choose C#: If this system lives inside a .NET ERP integration layer or a .NET finance application, Semantic Kernel fits naturally with zero runtime mismatch.
- Native Azure integration — First-party Microsoft SDKs, maintained in lockstep with Azure OpenAI releases
- Enterprise patterns — Dependency injection, strong typing, and the full ASP.NET stack if you're building a UI
- Finance system alignment — Most enterprise ERP connectors (SAP .NET SDK, Dynamics 365 SDK) are C#-native
- Azure AI Foundry — C# SDK is the primary language for Azure AI Foundry Agent Service
The Bottom Line
Data/analytics team? Use Python — LangGraph's ecosystem advantage is real. Backend/.NET team? Use C# — Semantic Kernel integrates with your existing stack without friction. Don't choose a language to learn it. Choose the language your team already ships in production.
Azure Infrastructure
A production financial close copilot needs these Azure services:
| Service | Purpose | Approx. Monthly Cost |
|---|---|---|
| Azure OpenAI (GPT-4o) | Commentary generation | $3–10/close |
| Azure AI Search (S1) | Historical commentary retrieval | ~$245/month |
| Azure App Service (B2) | API hosting | ~$55/month |
| Azure Table Storage | Trace log storage | <$1/month |
| Azure Key Vault | ERP credentials, API keys | ~$5/month |
Azure AI Foundry Agent Service
Azure AI Foundry Agent Service is now generally available and worth considering as an alternative to self-hosting the LangGraph or Semantic Kernel orchestration layer.
- Built-in agent routing and workflow management
- Managed state persistence — no Cosmos DB needed for pipeline state
- Native Azure OpenAI integration with built-in rate limiting
- Observability through Azure Monitor (though I'd still add the custom trace log above for audit specificity)
Check Azure AI Foundry Agent Service for current pricing — it may reduce your App Service cost significantly.
When NOT to Use This Approach
Skip the AI Copilot When:
- Your chart of accounts is a mess. If you can't build a reliable account mapping, the pipeline's first step fails. Fix the CoA before adding AI on top of it.
- You have fewer than 50 accounts requiring commentary. At that scale, a well-structured Excel template with pre-written commentary starters is faster to build and easier to audit.
- Your close cycle is already under 2 days. The gains are marginal. Spend the engineering effort on something with higher ROI.
- Your ERP has no API access. If you're manually exporting CSV files, solve the integration problem first. Building AI on top of a manual export process is fragile.
- The finance team won't review AI output. If leadership wants to auto-publish AI-generated commentary without human review, that's a governance problem, not a technology one. Don't build a system that removes the human from the loop entirely.
The copilot adds the most value in the 200–2,000 account range, with reliable ERP API access, a stable chart of accounts, and a finance team that's willing to treat the AI output as a first draft rather than final truth. Those constraints rule out more companies than you'd expect — but they also describe the majority of mid-market and enterprise finance teams.
Key Takeaways
- Cost is negligible relative to savings: ~$36/month for a 300-account close vs. $6,400/month in analyst time. The economics are compelling even at 10× cost.
- Traceability is non-negotiable in finance: Every AI-generated commentary line needs a trace log with prompt hash, retrieved documents, and model metadata. Auditors will ask for it.
- Start with the data quality problem: A clean chart of accounts mapping is the prerequisite for everything else. If that mapping is unreliable, the AI commentary will be too.
- Humans stay in the loop: The copilot generates first drafts. Finance professionals review and approve. This isn't about removing analysts — it's about giving them time to do analysis instead of transcription.
If you missed Part 1, it covers the full architecture, state model, ERP normalization logic, and RAG-augmented commentary generation — read it here.
Cost figures use Azure OpenAI pricing as of April 2026. Check Azure pricing for current rates — token prices have been declining steadily.
Want More Practical AI Tutorials?
I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.
Subscribe to Newsletter →