From Prototype to Production
Part 1 covered the architecture of the AI Launch & Narrative Studio — the four-agent pipeline, the brand voice RAG layer using Azure AI Search, and the multi-channel content adaptation strategy. If you followed along, you have a working system that generates press releases, blog posts, social posts, and email announcements from a single product brief.
Part 2 is about what happens when you try to run that system reliably, repeatedly, and at the cost that was implied in the pitch to your stakeholders. Specifically: what does a full launch package actually cost in Azure OpenAI tokens, how do you trace a multi-agent workflow when something produces unexpected output, what are the real trade-offs between Python and C# for this use case, and — most importantly — when should you not build this at all.
I'll give you real numbers where I have them, and honest uncertainty where I don't.
Cost Analysis: What a Launch Package Actually Costs
The cost of a single complete launch package (press release + blog post + LinkedIn post + email) depends on three variables: the length of your product brief, whether the brand checker requires revisions, and your Azure OpenAI deployment pricing tier.
Using GPT-4o at standard Azure OpenAI pricing ($2.50 per million input tokens, $10.00 per million output tokens), here's the per-agent breakdown for a typical 500-word product brief:
| Agent | Input Tokens | Output Tokens | Cost |
|---|---|---|---|
| Research Agent | ~2,200 | ~900 | $0.014 |
| Narrative Architect | ~3,100 | ~500 | $0.013 |
| Content Adaptor (4 channels) | ~6,400 | ~3,800 | $0.054 |
| Brand Checker | ~4,200 | ~400 | $0.015 |
| Total (no revisions) | ~15,900 | ~5,600 | $0.096 |
| Total (1 revision cycle) | ~24,100 | ~7,000 | $0.130 |
The Number That Will Surprise Your CFO
100 launch packages per month: approximately $13 in LLM costs. Add Azure AI Search (S1 tier, ~$250/month for the search infrastructure) and Azure OpenAI Provisioned Throughput if you need latency guarantees (~$1,400/month for 100 PTU). Your real production cost is the infrastructure, not the tokens. The LLM calls are nearly free at this scale.
The Content Adaptor dominates token consumption because it runs four separate LLM calls — one per channel. An optimisation worth considering is batch generation: a single prompt that requests all four channel formats in one response. In practice, I found this produces slightly lower quality results (the model trades off depth for breadth), but if token costs are a genuine constraint, it's a viable approach that cuts Content Adaptor costs by roughly 60%.
Azure AI Search query costs are negligible in comparison: at $0.003 per 1,000 queries on the S1 tier, 100 launches with 3 RAG queries each costs $0.90. The meaningful cost is the monthly infrastructure fee, not the per-query charge.
Observability: Tracing What Went Wrong
The tricky part with multi-agent pipelines isn't that they fail loudly — it's that they fail quietly. The brand checker approves content. The launch package looks complete. You post the press release and then someone notices that the LinkedIn post uses a competitor's trademarked term that GPT-4o helpfully inferred from context.
You need to be able to trace exactly what each agent received as input, what it produced as output, and why the brand checker passed it. This is non-negotiable in production.
The pattern I use is OpenTelemetry with Azure Monitor as the backend. Each agent gets a named span. The span carries the input token count, the output length, and the revision cycle number. For failed brand checks, the span includes the specific failure reason.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from azure.monitor.opentelemetry import configure_azure_monitor
import os
configure_azure_monitor(
connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
)
tracer = trace.get_tracer("launch-studio")
def traced_agent(agent_name: str):
"""Decorator that wraps any agent function with a named OTel span."""
def decorator(fn):
def wrapper(state: LaunchStudioState) -> dict:
with tracer.start_as_current_span(agent_name) as span:
span.set_attribute("revision_count", state.get("revision_count", 0))
span.set_attribute("brief_length", len(state["product_brief"]))
result = fn(state)
# Log key outputs for debugging
if "messaging_pillars" in result:
span.set_attribute(
"pillars_count", len(result["messaging_pillars"]))
if "brand_approved" in result:
span.set_attribute(
"brand_approved", result["brand_approved"])
if "brand_feedback" in result and result.get("brand_feedback"):
span.set_attribute(
"brand_failure_reason", result["brand_feedback"][:500])
return result
return wrapper
return decorator
# Apply to each agent
@traced_agent("research_agent")
def research_agent(state: LaunchStudioState) -> dict:
# ... existing implementation
pass
@traced_agent("narrative_architect")
def narrative_agent(state: LaunchStudioState) -> dict:
# ... existing implementation
pass
using System.Diagnostics;
using Azure.Monitor.OpenTelemetry.Exporter;
using OpenTelemetry;
using OpenTelemetry.Trace;
// Program.cs — configure once at startup
builder.Services.AddOpenTelemetry()
.WithTracing(b => b
.AddSource("LaunchStudio")
.AddAzureMonitorTraceExporter(o =>
o.ConnectionString =
config["ApplicationInsights:ConnectionString"]));
// Base class for all agents
public abstract class TracedAgent
{
private static readonly ActivitySource Source = new("LaunchStudio");
protected async Task<LaunchStudioState> TraceAsync(
string agentName,
LaunchStudioState state,
Func<LaunchStudioState, Task<LaunchStudioState>> work)
{
using var activity = Source.StartActivity(agentName);
activity?.SetTag("revision_count", state.RevisionCount);
activity?.SetTag("brief_length", state.ProductBrief.Length);
var result = await work(state);
activity?.SetTag("brand_approved", result.BrandApproved);
if (!string.IsNullOrEmpty(result.BrandFeedback))
activity?.SetTag("brand_failure_reason",
result.BrandFeedback[..Math.Min(500, result.BrandFeedback.Length)]);
return result;
}
}
// Usage in ResearchAgent
public class ResearchAgent : TracedAgent
{
public Task<LaunchStudioState> RunAsync(LaunchStudioState state) =>
TraceAsync("research_agent", state, async s =>
{
// ... existing implementation
return s;
});
}
What to Alert On
Set Azure Monitor alerts for: revision cycles exceeding 2 (indicates brand guidelines may need updating), Content Adaptor output below 100 words per channel (model may have truncated), and brand_approved = false on the final cycle (means the safety valve fired — review that launch package manually).
Technology Choices: Python or C#?
Python Implementation
Why choose Python: If your team writes Python, you get access to the richest AI/ML ecosystem available. For a launch studio specifically, the ability to prototype new agent types quickly and iterate on prompt engineering in Jupyter notebooks is a real productivity advantage.
- LangGraph — purpose-built for stateful multi-agent workflows; the conditional routing and state management map directly to the launch studio's branching logic
- LangChain integrations — Azure AI Search, Azure OpenAI, and dozens of other connectors are maintained by the community and production-tested
- Rapid prompt iteration — changing a system prompt, testing it, seeing the result in a notebook takes seconds
- Ecosystem fit — if you later want to add a sentiment analysis step or a readability scorer, there are Python libraries for both
C#/.NET Implementation
Why choose C#: If your backend is .NET and you're embedding the launch studio into an existing enterprise application, Semantic Kernel gives you first-party Microsoft support, native dependency injection, and the same deployment model as the rest of your stack.
- Native Azure integration — first-party SDKs, Microsoft-maintained, same support channels as your Azure subscription
- Enterprise patterns — dependency injection, typed configuration, strong typing across the entire agent interface; no runtime surprises from a dictionary that's missing a key
- Production-ready by default — the OpenTelemetry integration, structured logging, and health check patterns that take effort to set up in Python are idiomatic .NET from day one
- Team fit — if your team maintains a .NET API, they can maintain a .NET launch studio; no language context-switching
The Bottom Line
Python team? Use Python with LangGraph — faster to iterate, richer ecosystem for AI tooling. C#/.NET team? Use Semantic Kernel — better enterprise patterns, native Microsoft support, no language mismatch with the rest of your stack. Don't fight your team's existing expertise.
Azure Infrastructure
The minimum Azure footprint for a production launch studio:
| Service | Tier | Purpose | Approx. Monthly Cost |
|---|---|---|---|
| Azure OpenAI | Pay-as-you-go | GPT-4o for all agents | $1–$20 (token-based) |
| Azure AI Search | S1 | Brand guidelines + market research RAG | ~$250 |
| Azure Blob Storage | LRS | Generated launch packages, brand documents | <$5 |
| Application Insights | Pay-as-you-go | Agent tracing and alerting | $2–$15 |
| Azure Container Apps | Consumption | Host the orchestrator API | $0–$30 |
The S1 AI Search tier is the most significant fixed cost. If you're just starting out, the Basic tier (~$75/month) is enough while you're indexing and testing brand guidelines. Upgrade to S1 when you add the market research index and need higher storage and query throughput.
Azure AI Foundry Agent Service
Azure AI Foundry Agent Service is now generally available, providing managed orchestration for AI systems. For a launch studio, the main advantage is eliminating the infrastructure you'd otherwise manage yourself.
- Built-in routing and workflow management for multi-agent pipelines
- Managed state persistence — no need to handle state serialisation yourself
- Native Azure OpenAI integration with automatic retry and circuit-breaking
- Observability through Azure Monitor out of the box
Check Azure AI Foundry Agent Service for current pricing.
When NOT to Use This Approach
I want to be direct about this, because I've seen teams build AI content pipelines for the wrong reasons.
Don't Build This If:
- You launch less than once a month — The maintenance overhead (keeping brand guidelines indexed, updating market research, monitoring agent performance) costs more in time than it saves on infrequent launches.
- Your brand voice is still being defined — The RAG layer enforces consistency against a fixed brand guide. If your voice is evolving, you'll spend more time updating the index than generating content.
- You're handling sensitive or regulated communications — Financial disclosures, healthcare claims, legal statements: the brand checker is not a compliance engine. A human must own regulated content.
- Your launch strategy requires deep competitive response — If your positioning depends on real-time competitive intelligence (e.g., you're responding to a competitor's announcement), the Research Agent's indexed corpus won't be current enough to be useful.
- Your executives will rewrite everything anyway — If the final decision-maker fundamentally distrusts AI-generated copy, the system produces drafts that get discarded. That's not a technology problem; it's a people problem that technology can't solve.
The right use case is a team that launches regularly, has a well-defined brand voice, trusts the system enough to use the first draft as a genuine starting point (not a blank slate to rewrite from scratch), and is comfortable with the maintenance overhead of keeping the knowledge indexes current.
Key Takeaways
If you've read both parts, here's what I'd want you to carry forward:
- The pipeline design matters more than the model. GPT-4o is capable enough that the bottleneck is how you structure the workflow, not the model's raw ability. The Research → Narrative → Content → Brand Check sequence with explicit state passing is what produces coherent output.
- Brand voice consistency requires RAG, not just prompting. Putting your full style guide in every system prompt doesn't scale and produces lower quality retrievals than vector search over indexed chunks.
- The real cost is infrastructure, not tokens. At $0.10–$0.20 per launch package in LLM costs, you're paying almost nothing for the generation itself. Azure AI Search at the S1 tier is the dominant cost item.
- Multi-agent pipelines fail quietly. Invest in OpenTelemetry tracing from day one. You will need it the first time a brand checker silently passes content that should have failed.
- This is a first-draft machine, not a communications strategy. It removes the blank-page problem and the channel-by-channel reformatting work. It doesn't replace the human judgment about what to say and to whom.
Start Here
Build the pipeline in a Python notebook first, with your brand guidelines as a static text file (no RAG yet). Run it against two or three real product briefs. If the output quality is high enough that a human would use it as a genuine first draft, then invest in the Azure AI Search infrastructure and proper observability. Don't build the full system until you've validated the quality.
Part 1 gave you the architecture. Part 2 gave you the production reality. The rest is your brief, your brand, and your launch calendar.
Want More Practical AI Tutorials?
I write about building production AI systems with Azure, Python, and C#. Subscribe for practical tutorials delivered twice a month.
Subscribe to Newsletter →