Agents - Deep Dive
Agent Anatomy
An agent in Microsoft Agent Framework is a stateful conversational entity that combines:
- Model client: Connection to an LLM
- Instructions: System prompt defining behavior
- Tools: Functions the agent can call
- Context: Conversation history and state
- Middleware: Request/response interceptors
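Conceptually, these five pieces bundle into one object. A minimal plain-Python sketch (field names here are illustrative, not the framework's actual attributes):

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of what an Agent bundles; not the real class.
@dataclass
class AgentSketch:
    name: str                          # agent identifier
    model: str                         # stand-in for a model client
    instructions: str                  # system prompt
    tools: list[Callable] = field(default_factory=list)       # callable functions
    history: list[dict] = field(default_factory=list)         # conversation state
    middleware: list[Callable] = field(default_factory=list)  # interceptors

agent = AgentSketch(name="assistant", model="gpt-4",
                    instructions="You are helpful")
```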
Creating Agents
Basic Agent
from agents_framework import Agent, ModelClient
agent = Agent(
    name="assistant",                  # Agent identifier
    model=ModelClient(model="gpt-4"),  # LLM connection
    instructions="You are helpful"     # System prompt
)
Agent with Configuration
agent = Agent(
    name="code_reviewer",
    model=ModelClient(
        model="gpt-4-turbo",
        temperature=0.7,
        max_tokens=2000,
        timeout=60.0
    ),
    instructions="""You are a code reviewer. Focus on:
- Code correctness and bugs
- Performance issues
- Security vulnerabilities
- Best practices
Provide actionable feedback.""",
    tools=[analyze_code, suggest_fix],
    parallel_tool_calls=True,      # Call multiple tools concurrently
    response_format=ReviewReport   # Structured output (a Pydantic model)
)
C# Agent
using Microsoft.Agents.AI;
var agent = new Agent(
    name: "assistant",
    model: new ModelClient(
        model: "gpt-4",
        temperature: 0.7,
        maxTokens: 2000
    ),
    instructions: "You are a helpful assistant",
    tools: new[] { analyzeTool, suggestTool },
    parallelToolCalls: true
);
Agent Lifecycle
1. Initialization
# Agent created with configuration
agent = Agent(name="agent", model=model, instructions="...")
# Initialization happens once
# Tools are registered, middleware is set up
2. Message Processing
# Single-turn conversation
response = await agent.run(message="Hello")
# Multi-turn with thread
thread = Thread()
response1 = await agent.run(thread=thread, message="First message")
response2 = await agent.run(thread=thread, message="Follow-up")
Processing Flow:
1. Message arrives → middleware preprocessing
2. Thread retrieves conversation history
3. Context providers inject additional context
4. Model processes message + history + context
5. If tool calls are needed → execute tools → return to step 4
6. Generate response → middleware postprocessing
7. Update thread with new messages
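The loop above can be sketched in plain Python with stand-in functions. Everything here is illustrative (the toy model, the dict-based messages, the function names); the real framework wires this loop up internally:

```python
# Illustrative sketch of a single processing turn; all names are hypothetical.
def run_turn(message, history, tools, middleware_pre, middleware_post, model):
    msg = middleware_pre(message)                            # 1. preprocessing
    context = history + [{"role": "user", "content": msg}]   # 2-3. history + context
    while True:
        reply = model(context)                               # 4. model call
        if "tool_call" in reply:                             # 5. tool loop
            name, arg = reply["tool_call"]
            context.append({"role": "tool", "content": tools[name](arg)})
            continue
        break
    final = middleware_post(reply["content"])                # 6. postprocessing
    history.extend([{"role": "user", "content": msg},
                    {"role": "assistant", "content": final}])  # 7. update thread
    return final

# Toy model: requests one tool call, then answers with the tool result.
def toy_model(context):
    if context[-1]["role"] == "tool":
        return {"content": f"Result: {context[-1]['content']}"}
    return {"tool_call": ("lookup", "X")}

history = []
answer = run_turn("Get data for X", history,
                  tools={"lookup": lambda q: f"data({q})"},
                  middleware_pre=lambda m: m.strip(),
                  middleware_post=lambda r: r,
                  model=toy_model)
```

After the turn, `history` holds the user message and the final assistant reply, which is exactly the state a `Thread` carries between runs.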
3. Tool Execution
@function_tool
def get_data(query: str) -> dict:
    return {"result": "data"}
agent = Agent(model=model, tools=[get_data])
# Agent automatically decides when to call tools
response = await agent.run(message="Get data for X")
# Internally: agent calls get_data("X") → processes result → responds
4. State Management
from agents_framework import Thread
thread = Thread()
# Each run updates thread state
await agent.run(thread=thread, message="My name is Alice")
await agent.run(thread=thread, message="What's my name?")
# Agent remembers: "Your name is Alice"
# Access thread history
for message in thread.messages:
    print(f"{message.role}: {message.content}")
Advanced Agent Features
Structured Outputs
Force the agent to return data in a specific format:
from pydantic import BaseModel
class TaskBreakdown(BaseModel):
    tasks: list[str]
    priority: str
    estimated_hours: float

agent = Agent(
    model=model,
    instructions="Break down projects into tasks",
    response_format=TaskBreakdown
)
response = await agent.run(message="Plan website redesign")
breakdown: TaskBreakdown = response.parsed
print(breakdown.tasks)
print(f"Priority: {breakdown.priority}")
print(f"Est. hours: {breakdown.estimated_hours}")
C# Example:
public class TaskBreakdown
{
    public List<string> Tasks { get; set; }
    public string Priority { get; set; }
    public double EstimatedHours { get; set; }
}

var agent = new Agent(
    model: model,
    instructions: "Break down projects into tasks",
    responseFormat: typeof(TaskBreakdown)
);
var response = await agent.RunAsync("Plan website redesign");
var breakdown = response.Parsed<TaskBreakdown>();
Parallel Tool Calls
Allow the agent to call multiple tools simultaneously:
@function_tool
def get_weather(location: str) -> str:
    return f"Weather in {location}: Sunny"

@function_tool
def get_time(timezone: str) -> str:
    return f"Time in {timezone}: 3:00 PM"

agent = Agent(
    model=model,
    tools=[get_weather, get_time],
    parallel_tool_calls=True  # Enable parallel execution
)
# Agent can call both tools at once
response = await agent.run(message="What's the weather and time in Seattle?")
# Internally: get_weather("Seattle") and get_time("America/Los_Angeles") run concurrently
Streaming Responses
Stream agent responses as they're generated:
thread = Thread()
async for chunk in agent.run_stream(thread=thread, message="Explain quantum computing"):
    print(chunk.delta, end="", flush=True)
# Prints response incrementally
C# Example:
await foreach (var chunk in agent.RunStreamAsync(thread, "Explain quantum computing"))
{
    Console.Write(chunk.Delta);
}
Token Usage Tracking
Monitor token consumption:
response = await agent.run(message="Hello")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
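For long-running sessions it is often useful to accumulate usage across calls. A small sketch, with plain dicts standing in for the `response.usage` objects shown above:

```python
# Running totals across multiple responses; the dict arguments stand in
# for response.usage from each agent.run() call.
totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

def track(usage: dict) -> dict:
    for key in totals:
        totals[key] += usage.get(key, 0)
    return totals

track({"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42})
track({"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35})
```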
Temperature & Sampling Control
# Low temperature for deterministic outputs
code_agent = Agent(
    model=ModelClient(model="gpt-4", temperature=0.1),
    instructions="Generate code"
)

# Higher temperature for creative outputs
creative_agent = Agent(
    model=ModelClient(model="gpt-4", temperature=0.9),
    instructions="Write stories"
)
Max Tokens & Truncation
agent = Agent(
    model=ModelClient(
        model="gpt-4",
        max_tokens=500  # Limit response length
    ),
    instructions="Be concise"
)
Stop Sequences
agent = Agent(
    model=ModelClient(
        model="gpt-4",
        stop=["END", "DONE"]  # Stop generation at these sequences
    ),
    instructions="Generate text until END"
)
Agent Patterns
Chain-of-Thought Agents
agent = Agent(
    model=model,
    instructions="""Think step-by-step:
1. Understand the problem
2. Break down into sub-problems
3. Solve each sub-problem
4. Synthesize the solution
Show your reasoning at each step."""
)
Specialized Agents
# Research agent
researcher = Agent(
    model=model,
    instructions="Research topics thoroughly. Cite sources.",
    tools=[search_web, fetch_article]
)

# Writing agent
writer = Agent(
    model=model,
    instructions="Write clear, engaging content based on research.",
    tools=[check_grammar, suggest_synonyms]
)

# Review agent
reviewer = Agent(
    model=model,
    instructions="Review content for accuracy and clarity.",
    tools=[fact_check, readability_score]
)
Persona-Based Agents
expert_agent = Agent(
    model=model,
    instructions="""You are Dr. Smith, a senior software architect with 20 years of experience.
You provide detailed technical advice, reference design patterns, and consider scalability."""
)

beginner_agent = Agent(
    model=model,
    instructions="""You are a friendly tutor. Use simple language, provide examples,
and encourage learning. Avoid jargon."""
)
Error-Handling Agents
from agents_framework import RetryPolicy, ErrorHandler

agent = Agent(
    model=model,
    retry_policy=RetryPolicy(
        max_retries=3,
        backoff_factor=2.0,
        exceptions=[TimeoutError, ConnectionError]
    ),
    error_handler=ErrorHandler(
        fallback_response="I'm having trouble right now. Please try again.",
        log_errors=True
    )
)
Agent Communication
Agent-to-Agent Messages
# Agent 1 generates message for Agent 2
response1 = await agent1.run(message="Research AI trends")
# Pass to Agent 2
response2 = await agent2.run(message=f"Summarize this: {response1.content}")
Shared Thread
thread = Thread()
# Multiple agents share conversation history
await agent1.run(thread=thread, message="Hello")
await agent2.run(thread=thread, message="Continue the conversation")
# agent2 sees agent1's message in history
Agent Handoff
async def handoff_flow():
    thread = Thread()

    # Start with classifier agent
    classification = await classifier.run(
        thread=thread,
        message="User query here"
    )

    # Route to the appropriate specialist
    if "technical" in classification.content.lower():
        return await technical_agent.run(thread=thread, message="Handle this")
    else:
        return await general_agent.run(thread=thread, message="Handle this")
Best Practices
- Clear Instructions: Write specific, actionable system prompts
- Tool Naming: Use descriptive function names and docstrings
- Thread Management: Reuse threads for related conversations, create new threads for new topics
- Error Handling: Always implement retry policies for production agents
- Token Limits: Monitor usage and set max_tokens to prevent runaway costs
- Streaming: Use streaming for long-form responses to improve UX
- Structured Outputs: Use Pydantic models for consistent data parsing
- Testing: Test agents with diverse inputs to ensure reliability
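The "Token Limits" and "Thread Management" tips can be combined into a simple pruning helper. A sketch that uses message count as a crude proxy for tokens (a real implementation would count tokens with the model's tokenizer):

```python
# Keep the system prompt plus the most recent N messages; message count is
# an illustrative stand-in for real token counting.
def prune_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [{"role": "system", "content": "Be concise"}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(10)
]
pruned = prune_history(history, keep_last=3)
```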
Debugging Agents
import logging
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
# Run agent
response = await agent.run(message="Test")
# Inspect response
print(f"Model: {response.model}")
print(f"Role: {response.role}")
print(f"Content: {response.content}")
print(f"Tool calls: {response.tool_calls}")
print(f"Usage: {response.usage}")
Performance Optimization
- Parallel Tool Calls: Enable for independent operations
- Caching: Cache tool results for repeated queries
- Model Selection: Use smaller models for simple tasks
- Token Efficiency: Prune conversation history to reduce context
- Batch Processing: Process multiple messages concurrently
import asyncio

# Batch process
messages = ["Query 1", "Query 2", "Query 3"]
responses = await asyncio.gather(*[
    agent.run(message=msg) for msg in messages
])
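The "Caching" bullet above can be sketched with `functools.lru_cache` for a deterministic tool (the tool name and body are illustrative):

```python
from functools import lru_cache

calls = 0  # counts how many times the underlying tool actually runs

@lru_cache(maxsize=128)
def get_data(query: str) -> str:
    global calls
    calls += 1
    return f"data({query})"

get_data("X")
get_data("X")  # cache hit: underlying function is not re-run
get_data("Y")
```

Note this only suits pure, deterministic tools; for tools whose results go stale (weather, stock prices), prefer a cache with a TTL.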