A Practical Guide for Building Agent Capabilities
"Skills orchestrate the how. MCP Tools execute the what."
This guide consolidates learnings from building AI agent capabilities for Azure, focusing on the relationship between Skills (workflow orchestration) and MCP Tools (discrete operations). Whether you're creating a new Copilot Skill, building an MCP server, or integrating both, this document provides the architecture patterns, development best practices, and real-world case studies you need to build effective, non-conflicting agent capabilities. The core principle is simple: Skills act as the "brain" that orchestrates complex workflows, while MCP Tools serve as the "hands" that execute individual operations—and Skills should invoke MCP Tools, not duplicate them.
- Introduction
- Architecture Overview
- When to Use What
- Skills Development Guide
- Skill Organization Patterns
- MCP Tool Development Guide
- Integration Patterns
- DOs and DON'Ts
- Case Studies
- Testing & Evaluation
- References
- Appendices
| Concept | What It Is | When to Use |
|---|---|---|
| Skill | A folder with SKILL.md that teaches the agent how to do something | Multi-step workflows, decisions, code generation |
| MCP Tool | A function the agent can call to do something | Single operations, data retrieval, queries |
| The Pattern | Skills invoke MCP Tools, not the other way around | Always |
One-liner: Skills are onboarding guides for AI agents. MCP Tools are the buttons they press.
Is this a workflow with decisions?
├─ YES → Create a SKILL
└─ NO → Is it a single operation?
   ├─ YES → Create an MCP TOOL
   └─ BOTH needed → Skill orchestrates, MCP executes
Skills load content incrementally to stay lean:
- Level 1: Metadata — `name` + `description` (always in system prompt, ~50 tokens)
- Level 2: Instructions — Full SKILL.md (loaded when triggered, <500 tokens ideal)
- Level 3: Resources — `references/` + `scripts/` (loaded on demand)
- Match instruction detail to task risk — Fragile operations need exact scripts; flexible tasks need guidance
- Use checklists for multi-step workflows — Makes progress trackable and resumable
- Test first, document minimally — Write evals before writing docs (§4.1)
- Skills call MCP for patterns — Single source of truth, no duplication
my-skill/
├── SKILL.md # Workflow instructions + frontmatter
├── references/ # Deep-dive docs (loaded on demand)
└── scripts/ # Executable code (not loaded, just run)
As AI agents become the primary interface for developer workflows, the quality and consistency of agent guidance directly impacts developer success. Skills and MCP Tools are two distinct systems that provide capabilities to AI agents—and when they conflict or overlap without coordination, developers lose trust and productivity.
This guide provides a framework for building Skills and MCP Tools that work together rather than compete. The principles here apply to any domain where both systems coexist.
A Note on Scope and Examples: This guide uses Azure as the primary source of examples because that's where we conducted our research and validated these patterns. However, the guidance is agent-agnostic and domain-agnostic—the same principles apply whether you're building Skills for AWS, GCP, internal platforms, or any other domain. Similarly, while we reference GitHub Copilot as the implementation platform, the Skill pattern itself works across any agent that supports structured capability injection.
| Audience | What You'll Learn |
|---|---|
| Architects | How Skills and MCP fit together, when to recommend each, how to design for the hybrid pattern |
| Skill Developers | How to write Skills that complement (not compete with) MCP Tools, frontmatter best practices |
| MCP Contributors | When to add a new tool vs. defer to Skills, naming conventions, integration patterns |
| Platform Teams | How to evaluate existing Skills/Tools, identify overlaps, and improve routing |
Before diving in, here are the key questions this guide addresses:
- When do I create a Skill vs. an MCP Tool? → Section 3: When to Use What
- How do I write a Skill that invokes MCP Tools? → Section 6: Integration Patterns
- What does good Skill frontmatter look like? → Section 4: Skills Development Guide
- How do I avoid creating conflicting guidance? → Section 7: DOs and DON'Ts
- How do I test that my Skill routes correctly? → Section 9: Testing & Evaluation
AI agents need clear, unambiguous guidance to help users effectively. When two capability systems exist side-by-side, problems emerge:
| System | Purpose | Control Model |
|---|---|---|
| MCP Tools | Execute discrete operations via JSON-RPC | Model-controlled (LLM decides when to invoke) |
| Copilot Skills | Orchestrate multi-step workflows via prompts | User-controlled (explicit selection) |
The problem: When both are exposed to the LLM simultaneously without clear routing, you get:
- ❌ Duplicate invocations — LLM calls both systems for the same request
- ❌ Name collisions — Same capability name exists in both (e.g., "deploy")
- ❌ Conflicting guidance — Different systems suggest incompatible approaches
- ❌ Inconsistent experience — Same prompt yields different results
Real-World Example: In the Azure ecosystem, research found that MCP's best practices for Static Web Apps recommended the SWA CLI (`npx swa deploy`), while Skills recommended `azd up` with Bicep. No coordination layer existed—the agent picked randomly, leading to deployment failures when incompatible approaches were mixed. See Section 8: Case Studies for the full analysis and solution.
┌─────────────────────────────────────────────────────────────────────┐
│ USER REQUEST │
│ "Deploy my app to Azure" │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ LLM ROUTER │
│ Analyzes intent, context, and decides execution path │
└─────────────────────────────────────────────────────────────────────┘
│ │
┌─────────▼─────────┐ ┌─────────▼─────────┐
│ SKILL LAYER │ │ MCP TOOL LAYER │
│ 🧠 THE BRAIN │ │ 🖐️ THE HANDS │
│ │ │ │
│ • Workflows │ │ • CRUD Operations │
│ • Decisions │ │ • Queries │
│ • Best Practices │ │ • Direct API Calls│
│ • Multi-step │ │ • Single Actions │
│ Guidance │ │ │
└─────────┬─────────┘ └─────────▲─────────┘
│ │
└───────────────────────────┘
Skills INVOKE MCP tools
| Component | Role | Control Model | Analogy |
|---|---|---|---|
| Skills | Workflow orchestration | User-controlled (explicit selection) | The Brain |
| MCP Tools | Discrete operations | Model-controlled (LLM decides) | The Hands |
The Pattern:
User Request → SKILL (user-initiated workflow) → MCP TOOLS (model-executed actions)
This section explains the foundational architecture of MCP and Skills. Understanding these primitives is essential before building capabilities—the architecture dictates what goes where.
The Model Context Protocol (MCP) is an open standard created by Anthropic that defines how AI applications (hosts) connect to external data sources and tools (servers). It uses JSON-RPC 2.0 over stdio or HTTP for transport.
┌─────────────────────────────────────────────────────────────────────┐
│ MCP HOST │
│ (VS Code, Claude Desktop, Copilot CLI) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ MCP Client │ │ MCP Client │ │ MCP Client │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└──────────┼─────────────────┼─────────────────┼──────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ MCP Server │ │ MCP Server │ │ MCP Server │
│ (Azure) │ │ (GitHub) │ │(Filesystem)│
└────────────┘ └────────────┘ └────────────┘
Key architectural principle: One host can connect to multiple servers simultaneously. Each server exposes capabilities through three primitives, each with a distinct control model.
| Primitive | Purpose | Control Model | Who Decides When to Use |
|---|---|---|---|
| Tools | Executable functions for actions | Model-controlled | The LLM decides based on context |
| Prompts | Reusable interaction templates | User-controlled | The user explicitly selects |
| Resources | Data/context sources | Application-controlled | The host application decides |
Understanding control models is critical:
- Model-controlled (Tools): The LLM sees the tool's schema and decides autonomously when to call it. You cannot prevent the LLM from calling a tool—you can only influence its decision through descriptions.
- User-controlled (Prompts): The user must explicitly select a prompt. The LLM cannot invoke it on its own.
- Application-controlled (Resources): The host application determines when to load resources into context.
"Tools enable models to interact with external systems... Each tool is uniquely identified by a name and includes metadata describing its schema." — MCP Tools Documentation
"Prompts are designed to be user-controlled, meaning they are exposed from servers to clients with the intention of the user being able to explicitly select them for use." — MCP Prompts Documentation
Copilot Skills are conceptually similar to MCP Prompts—they're user-controlled workflow templates. However, Skills are implemented differently (as markdown files with frontmatter, not JSON-RPC endpoints). The key insight:
| Concept | MCP Term | Copilot Implementation | Control |
|---|---|---|---|
| Discrete operation | Tool | MCP Tool | Model-controlled |
| Workflow template | Prompt | Skill (SKILL.md) | User-controlled |
| Context data | Resource | Skill references/ | Application-controlled |
Architectural clarification: The MCP specification describes Tools as "model-controlled" and Prompts (the primitive Skills mirror) as "user-controlled," but in practice (particularly in Copilot's implementation), both use similar routing mechanisms:
- Both populate the context window with their descriptions
- Both are selected via embedding/semantic similarity matching
- Both appear as "tool execution" in agent logs
The practical difference is not how they're invoked but what they're designed for: Skills provide rich workflow orchestration with frontmatter (USE FOR, DO NOT USE FOR, INVOKES), while MCP Tools execute discrete operations. The coordination challenge is really about description collision—when a Skill and Tool have overlapping descriptions, either may be selected regardless of your intended workflow. This is why clear, differentiated descriptions matter more than the conceptual skill-vs-tool distinction.
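To make description collision concrete, here is a small self-contained sketch. Plain bag-of-words cosine similarity stands in for the host's embedding model (an assumption for illustration only), and all four descriptions are hypothetical:

```python
from collections import Counter
from math import sqrt

def similarity(a: str, b: str) -> float:
    """Cosine similarity over lowercase word counts (toy stand-in for embeddings)."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[t] * wb[t] for t in wa)
    norm = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

request = "deploy my app to azure"

# Overlapping descriptions: both score high, so routing is effectively a coin flip.
print(similarity(request, "Deploy applications to Azure"))
print(similarity(request, "Deploy an app to Azure"))

# Differentiated descriptions separate: the workflow skill still matches,
# while the narrowly-worded tool drops to zero for this request.
print(similarity(request, "WORKFLOW SKILL - full deployment workflow: prepare, validate, deploy"))
print(similarity(request, "EXECUTION TOOL - run a single azd command"))
```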
A Skill is a structured markdown file that provides workflow guidance to the LLM. Unlike MCP Tools (which execute code), Skills inject prompts and context that guide the LLM's behavior.
my-skill/
├── SKILL.md ◄── Primary skill definition (frontmatter + workflow)
│ ├── ---
│ │ name: my-skill
│ │ description: |
│ │ **WORKFLOW SKILL** - [description]
│ │ USE FOR: [triggers]
│ │ DO NOT USE FOR: [anti-triggers]
│ │ INVOKES: [mcp tools]
│ │ ---
│ └── [Workflow body with steps]
│
├── references/ ◄── Supplemental materials (deep-dive docs)
│ ├── services/ • Service-specific guidance
│ ├── recipes/ • Step-by-step procedures
│ └── patterns/ • Reusable patterns
│
└── scripts/ ◄── Automation scripts
Understanding when components load is critical for optimization:
| Component | When Loaded | Purpose | Token Budget |
|---|---|---|---|
| `SKILL.md` | Always (when skill matches) | Primary definition, routing logic | < 500 tokens (soft), < 5,000 (hard) |
| `references/*.md` | On-demand (LLM requests) | Deep-dive docs, patterns, recipes | < 1,000 tokens each |
| `scripts/` | Never (execution only) | Automation, not LLM context | N/A |
Why token budgets matter: Skills compete for context window space. A 5000-token skill leaves less room for user code, conversation history, and MCP tool schemas. Lean skills perform better.
Skills use progressive disclosure to keep context lean. Only load what's needed, when it's needed.
┌─────────────────────────────────────────────────────────────────────┐
│ LEVEL 1: METADATA (Always in System Prompt) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ name: azure-deploy │ │
│ │ description: Deploy applications to Azure... │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Pre-loaded at startup for ALL installed skills (~50 tokens) │
│ │
│ LEVEL 2: INSTRUCTIONS (Loaded on Trigger) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ # Azure Deploy │ │
│ │ ## Steps │ │
│ │ 1. Validate prerequisites... │ │
│ │ 2. Run deployment... │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Loaded when skill matches user request (<500-5000 tokens) │
│ │
│ LEVEL 3: RESOURCES (Loaded on Demand) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ references/container-apps.md ← Loaded if CA detected │ │
│ │ references/functions.md ← Loaded if Functions needed │ │
│ │ scripts/validate.py ← Executed, output only │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Loaded only when skill references them (unlimited) │
└─────────────────────────────────────────────────────────────────────┘
Why this matters: Level 1 is ~50 tokens per skill. Level 2 is 500-5000 tokens. Level 3 can be unlimited but only loads when referenced. This keeps agents fast and focused.
The description field in frontmatter determines whether your skill gets invoked. This is the LLM's only signal for routing decisions. A poor description means your skill won't trigger—or will trigger incorrectly.
Key frontmatter elements (detailed in Section 4):
- USE FOR: Trigger phrases that should activate this skill
- DO NOT USE FOR: Anti-triggers that should route elsewhere
- INVOKES: MCP tools this skill calls (helps LLM understand the relationship)
When building a skill ecosystem, not all skills are equal. Some orchestrate primary workflows (like deployment); others provide deep-dive knowledge for specific services. Separating these into tiers prevents confusion and improves maintainability.
┌────────────────────────────────────────────────────────────────────┐
│ TIER 1: CORE SKILLS │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PREPARE │ → │ VALIDATE │ → │ DEPLOY │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Purpose: Orchestrate the primary development workflow │
│ Ownership: Central skills team │
│ Invocation: User wants to build, prepare, validate, or deploy │
└────────────────────────────────────────────────────────────────────┘
│
│ references
▼
┌────────────────────────────────────────────────────────────────────┐
│ TIER 2: SERVICE-SPECIFIC SKILLS │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Azure │ │ Azure │ │ Azure │ │ Azure │ ... │
│ │ Functions │ │ Container │ │ Cosmos │ │ Redis │ │
│ │ Skill │ │ Apps Skill │ │ DB Skill │ │ Skill │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ Purpose: Deep-dive guidance for specific Azure services │
│ Ownership: Product and service teams │
│ Invocation: User asks service-specific questions (not workflow) │
└────────────────────────────────────────────────────────────────────┘
- Separation of concerns: Workflow logic (Tier 1) is different from domain knowledge (Tier 2). Mixing them creates bloated, hard-to-maintain skills.
- Ownership clarity: The team that owns deployment workflows shouldn't have to be experts in every Azure service. Tier 2 lets service teams own their domain.
- Routing precision: "Deploy my app" should always go to a core skill, even if the app uses Functions. "How do Functions triggers work?" should go to the Functions skill directly.
- Composability: Tier 1 skills can reference Tier 2 skills as needed, enabling modularity without duplication.
| Aspect | Tier 1 (Core) | Tier 2 (Service) |
|---|---|---|
| Scope | Primary workflow (build/deploy) | Service-specific knowledge |
| When Invoked | "Deploy my app", "Prepare for Azure" | "How do Functions triggers work?" |
| Can Reference | Tier 2 skills, MCP tools | MCP tools only |
| Ownership | Central team | Product teams |
| Update Frequency | Less frequent (workflow stability) | More frequent (service changes) |
Example from Azure: The `azure-prepare` skill (Tier 1) handles the workflow of preparing any app for Azure. When it detects a Functions app, it references the `azure-functions` skill (Tier 2) for service-specific configuration patterns, then calls MCP tools like `azure-functionapp` for resource queries.
This section answers the most common question: "Should this be a Skill or an MCP Tool?" The answer depends on intent, scope, and control model.
Incorrect routing leads to poor user experiences:
| Routing Error | Consequence | Example |
|---|---|---|
| Skill when Tool needed | Slow, over-engineered response | User asks "list my VMs" → Gets a workflow lecture instead of a list |
| Tool when Skill needed | Incomplete, context-free action | User asks "deploy my app" → Tool runs azd up without validation |
| Both invoked | Conflicting guidance, wasted tokens | Both Skill and Tool answer, giving different advice |
| Neither invoked | Capability gap | Request falls through without any response |
The goal is single, correct routing for every user request.
Ask: "Is this a workflow or an operation?"
User: "Deploy my app"
│
▼
┌────────────────┐
│ Is it a │
│ workflow task? │
└───────┬────────┘
│
┌─────────────┴─────────────┐
│ YES NO │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ SKILL │ │ MCP TOOL │
│ │ │ │
│ • Deploy │ │ • List │
│ • Create │ │ • Get │
│ • Set up │ │ • Query │
│ • Configure │ │ • Run command │
└───────────────┘ └───────────────┘
Workflow = Multiple steps, decisions required, generates artifacts
Operation = Single action, no decisions, returns data or executes command
The verb in a user request often signals the correct route:
| Verb | Route | Reason | Example Request |
|---|---|---|---|
| Deploy, Create, Set up, Configure | SKILL | Multi-step workflow | "Deploy my React app" |
| List, Get, Show, Query, Check | MCP TOOL | Data retrieval | "List my storage accounts" |
| Help, Guide, Walk through, Explain | SKILL | Guidance needed | "Help me set up CI/CD" |
| Run, Execute | MCP TOOL | Direct execution | "Run azd up" |
| Troubleshoot, Debug, Diagnose | SKILL first | Then MCP for data | "Why is my app failing?" |
| Optimize, Review, Analyze | SKILL | Analysis workflow | "Review my architecture" |
Not every request maps cleanly to Skill or Tool. Here's how to handle ambiguity:
| Ambiguous Request | Resolution | Rationale |
|---|---|---|
| "Create a storage account" | SKILL | "Create" implies workflow; user needs guidance on SKU, replication, etc. |
| "Create storage account named 'myacct' in eastus, Standard_LRS" | MCP TOOL | Fully specified; no decisions needed |
| "Deploy" (no context) | SKILL | Ask clarifying questions via workflow |
| "azd up" (explicit command) | MCP TOOL | User knows exactly what they want |
| "Set up monitoring" (broad) | SKILL | Workflow to determine what to monitor, how, and where |
| "Show me the metrics for my app" | MCP TOOL | Data retrieval, specific ask |
Rule of thumb: If the user provides all required parameters explicitly, route to Tool. If decisions remain, route to Skill.
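To make the heuristic concrete, here is a minimal routing sketch. The verb lists come from the table above; the regex check for "fully specified" requests is an illustrative assumption, not a production router:

```python
import re

SKILL_VERBS = {"deploy", "create", "set", "configure", "help", "guide",
               "walk", "explain", "troubleshoot", "debug", "diagnose",
               "optimize", "review", "analyze"}
TOOL_VERBS = {"list", "get", "show", "query", "check", "run", "execute"}

def route(request: str) -> str:
    """Return 'SKILL' or 'MCP_TOOL' using the verb heuristic from the table above."""
    first_verb = request.lower().split()[0]
    # Fully specified requests (explicit names, flags, regions) go to a tool
    # even when the verb suggests a workflow.
    fully_specified = bool(re.search(r"named '[^']+'|--\w+|\bin \w+,", request))
    if first_verb in TOOL_VERBS or fully_specified:
        return "MCP_TOOL"
    if first_verb in SKILL_VERBS:
        return "SKILL"
    return "SKILL"  # ambiguous -> prefer guidance over blind execution

assert route("List my storage accounts") == "MCP_TOOL"
assert route("Deploy my React app") == "SKILL"
assert route("Create storage account named 'myacct' in eastus, Standard_LRS") == "MCP_TOOL"
```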
Add this to LLM system prompts to improve routing consistency:
```markdown
## Tool Routing Rules

BEFORE invoking any capability, determine the correct route:

### Route to SKILL when:
- Request involves multiple steps: "deploy my app", "set up monitoring"
- Request needs decisions: "what should I use for...", "help me choose..."
- Request generates code: "create azure.yaml", "generate Bicep"
- Request follows workflow: prepare → validate → deploy
- User says: "help me", "guide me", "walk me through"

### Route to MCP TOOL when:
- Request is data retrieval: "list my...", "show me...", "get..."
- Request is single operation: "delete this", "query logs", "run azd up"
- Request targets specific resource: "storage account named X"
- Skill step explicitly invokes MCP tool
- User says: "just run", "execute", "check status"

### When BOTH are needed (Skill invokes Tool):
- Skill orchestrates the workflow
- Skill calls MCP Tool for specific operations within the workflow
- Example: azure-diagnostics (Skill) calls azure-applens (Tool) for diagnostic data
```

Real-world routing decisions:
| User Request | Route | Target | Why |
|---|---|---|---|
| "Deploy my app to Azure" | SKILL | azure-prepare |
New deployment workflow |
| "Run azd up" | MCP | azure-azd |
Direct command execution |
| "List my storage accounts" | MCP | azure-storage |
Data query |
| "Set up Key Vault" | SKILL | azure-security |
Workflow guidance |
| "Get secret 'api-key'" | MCP | azure-keyvault |
Direct operation |
| "What's wrong with my app" | SKILL | azure-diagnostics |
Analysis workflow |
| "Check resource health status" | MCP | azure-resourcehealth |
Status query |
| "Create a new Function App" | SKILL | azure-functions |
Creation workflow |
| "List my Function Apps" | MCP | azure-functionapp |
Data query |
| "How do I configure CORS for SWA?" | SKILL | azure-prepare |
Guidance needed |
This section provides practical guidance for building effective Skills. A well-designed Skill has a clear purpose, triggers correctly, and integrates smoothly with MCP Tools.
Principle: Build evaluations first, then write minimal instructions.
Don't document everything the agent might need. Document only what it gets wrong without guidance.
1. TEST WITHOUT SKILL → Run agent on representative tasks
↓
2. IDENTIFY GAPS → Where does it fail? What does it get wrong?
↓
3. CREATE EVALS → Structure test scenarios
↓
4. WRITE MINIMAL DOCS → Only what's needed to pass evals
↓
5. ITERATE → Refine based on results
"Building a skill is not the same as training a model. It is closer to writing an onboarding guide for a new hire." — Anthropic
The agent already knows how to code. Your skill teaches it your patterns, constraints, and preferences.
Use Waza for structured skill evaluation:
```bash
# Generate eval from existing skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml
```

Match instruction specificity to task fragility:
| Freedom Level | When to Use | Risk if Too Loose | Risk if Too Tight |
|---|---|---|---|
| High | Multiple valid approaches exist | Agent picks suboptimal path | Unnecessary constraints |
| Medium | Preferred pattern with variations | Inconsistent results | Missed edge cases |
| Low | Fragile, error-prone operations | Broken systems, data loss | Agent can't adapt |
Rule of thumb: The more damage a wrong approach can cause, the more prescriptive your instructions should be.
High freedom (guidance only):

```markdown
## Code Review Process
1. Analyze code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability
```

Low freedom (exact script):

````markdown
## Database Migration
Run exactly this script:

```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.
````
### 4.3 The Skill Development Process
Before writing code, answer these questions:
| Question | Why It Matters | Wrong Answer = Don't Build |
|----------|----------------|---------------------------|
| What workflow does this skill orchestrate? | Skills are for workflows, not single operations | "It lists resources" → That's an MCP Tool |
| What decisions does the user need help with? | Skills provide guidance | "None, just run the command" → MCP Tool |
| What MCP Tools will this skill invoke? | Skills complement Tools | "None" → May not need a Skill |
| What triggers should activate this skill? | Routing depends on triggers | "Everything" → Too broad, will conflict |
| What should NOT trigger this skill? | Anti-triggers prevent conflicts | "Nothing" → Will have false positives |
#### Skill Development Lifecycle
1. IDENTIFY → Is this a workflow? What decisions are involved?
2. DESIGN → Map the workflow steps. What MCP tools are needed?
3. WRITE → Frontmatter first (triggers). Then body (steps).
4. TEST → Trigger accuracy tests. Does it invoke correctly?
5. ITERATE → Refine anti-triggers based on false positives.
> **Tip:** For multi-step workflows, use the [Workflow Checklist Pattern (§7.6)](#76-workflow-checklist-pattern) to make progress trackable.
### 4.4 SKILL.md Structure
Every skill follows this structure:
#### Naming Constraints
| Field | Constraints |
|-------|-------------|
| `name` | • Max 64 characters<br>• Lowercase letters, numbers, hyphens only<br>• No XML tags<br>• No reserved words: `anthropic`, `claude`, `openai`, `copilot` |
| `description` | • Must be non-empty<br>• Max 1024 characters<br>• No XML tags<br>• Write in third person ("Deploys applications..." not "I deploy...") |
```yaml
---
name: my-skill-name
description: |
  **WORKFLOW SKILL** - One-line description of what the skill does.
  USE FOR: trigger phrase 1, trigger phrase 2, trigger phrase 3.
  DO NOT USE FOR: scenario1 (use other-skill), scenario2 (use mcp-tool).
  INVOKES: `mcp-tool-1`, `mcp-tool-2` for execution.
  FOR SINGLE OPERATIONS: Use `mcp-tool` directly for simple queries.
---
```

Skill Body Structure:

```markdown
# Skill Title
## When to Use This Skill
Activate when user wants to:
- Specific action 1
- Specific action 2
## Prerequisites
- Required MCP tools: `azure-xxx`, `azure-yyy`
- Required permissions: list
## MCP Tools Used
| Step | MCP Tool | Command | Purpose |
|------|----------|---------|---------|
| 1 | `azure-xxx` | `xxx_list` | Gather data |
| 3 | `azure-yyy` | `yyy_create` | Execute action |
## Steps
### Step 1: Action Name
**Using MCP (Preferred):**
Invoke `azure-xxx` MCP tool:
- Command: `command_name`
- Parameters: `subscription`, `resource-group`
**CLI Fallback (if MCP unavailable):**
az command --subscription X
## Related Skills
- For X: `azure-x-workflow`
- For Y: `azure-y-guide`
```

The frontmatter is the most critical part—it determines when your skill is invoked. The LLM uses the `description` field to decide whether to route a request to your skill.
1. User makes a request: "Deploy my React app to Azure"
2. LLM scans all available skills' `description` fields
3. LLM matches request keywords against skill descriptions
4. Best-matching skill is invoked (or none, if no match)
Implication: If your description doesn't contain the right trigger phrases, your skill won't be invoked—even if it's the right tool for the job.
| Element | Purpose | Example |
|---|---|---|
| `name` | Unique identifier | `azure-deploy` |
| `description` | Triggers + anti-triggers + relationships | See below |
```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator),
  image extraction (use image-extractor).
  INVOKES: pdf-tools MCP for extraction, file-system for I/O.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly for simple extractions.
```

Why each element matters:
| Element | Purpose | What Happens Without It |
|---|---|---|
| `**WORKFLOW SKILL**` | Signals multi-step nature | LLM may route single ops here |
| `USE FOR:` | Explicit triggers | Skill won't trigger on relevant requests |
| `DO NOT USE FOR:` | Anti-triggers | False positives, conflicts with other skills |
| `INVOKES:` | MCP relationship | LLM doesn't know skill uses tools |
| `FOR SINGLE OPERATIONS:` | Bypass guidance | Users confused about when to use skill vs. tool |
Add a prefix to clarify the skill type:
| Prefix | Use When |
|---|---|
| `**WORKFLOW SKILL**` | Multi-step orchestration |
| `**UTILITY SKILL**` | Single-purpose helper |
| `**ANALYSIS SKILL**` | Read-only analysis/reporting |
Effectiveness Note: These prefixes improve routing based on qualitative testing and observed behavior during development. Formal A/B testing with quantified metrics would strengthen these recommendations. The prefixes work because they add semantic signal to the description field, which LLMs use for routing decisions (see Appendix A). In practice, we've observed fewer false positives when prefixes clearly signal the skill's intent.
Skills are scored on compliance using the criteria below. Target: Medium-High or better.
Tooling Note: The scoring criteria described here were developed using internal evaluation frameworks (sensei for skill analysis, waza for trigger testing). These tools are not currently publicly available, but you can apply the same criteria manually or build equivalent tooling. The key is having a consistent rubric for evaluating skill quality before deployment.
| Score | Requirements |
|---|---|
| Low | Description < 150 chars OR no triggers |
| Medium | Description >= 150 chars AND has trigger keywords |
| Medium-High | Has "USE FOR:" AND "DO NOT USE FOR:" |
| High | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |
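A minimal scorer for this rubric; the field checks follow the table above, and treating the presence of `USE FOR:` as "has trigger keywords" is an assumption:

```python
def score_description(desc: str) -> str:
    """Score a skill description against the compliance rubric above."""
    has_use_for = "USE FOR:" in desc       # stands in for "has trigger keywords"
    has_anti = "DO NOT USE FOR:" in desc
    has_routing = "INVOKES:" in desc or "FOR SINGLE OPERATIONS:" in desc
    if has_use_for and has_anti and has_routing:
        return "High"
    if has_use_for and has_anti:
        return "Medium-High"
    if len(desc) >= 150 and has_use_for:
        return "Medium"
    return "Low"

print(score_description("Process PDF files"))  # -> Low
```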
Bad (Low compliance):

```yaml
description: 'Process PDF files'
```

Good (High compliance):

```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator).
  INVOKES: pdf-tools MCP for extraction.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly.
```

Keep skills within these token budgets:

| File | Soft Limit | Hard Limit |
|---|---|---|
| `SKILL.md` | 500 tokens | 5,000 tokens |
| `references/*.md` | 1,000 tokens | 5,000 tokens |
Use a `.token-limits.json` configuration:

```json
{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  },
  "overrides": {
    "README.md": 3000
  }
}
```

Enforcement Note: Token limit enforcement is currently a design pattern, not an automated gate. The `.token-limits.json` file serves as documentation and can be enforced via CI scripts (count tokens using tiktoken or similar). The limits are based on observed context window usage and agent performance degradation with oversized skills. If building automated enforcement, integrate token counting into your skill linting pipeline.
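A sketch of such a CI check, assuming the `.token-limits.json` layout above; the tiktoken `cl100k_base` encoding is an assumption, so match it to your model:

```python
import json
from pathlib import Path

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption
limits = json.loads(Path(".token-limits.json").read_text())

failures = []
for pattern, budget in limits["defaults"].items():
    for path in Path(".").glob(pattern):  # run from the skill's root directory
        # Per-file overrides win over glob defaults.
        file_budget = limits.get("overrides", {}).get(path.name, budget)
        tokens = len(enc.encode(path.read_text()))
        if tokens > file_budget:
            failures.append(f"{path}: {tokens} tokens (limit {file_budget})")

if failures:
    raise SystemExit("Token budget exceeded:\n" + "\n".join(failures))
print("All files within token budget.")
```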
Keep SKILL.md lean. Put deep content in references:
my-skill/
├── SKILL.md # Workflow orchestration only
└── references/
├── services/
│ └── static-web-apps.md # SWA-specific patterns
├── recipes/
│ └── deploy-react.md # Step-by-step for React
└── patterns/
└── error-handling.md # Common error resolutions
Reference them in SKILL.md:
See [SWA Configuration](references/services/static-web-apps.md) for framework-specific settings.

Scripts in `scripts/` are executed, not loaded into context. Write them defensively.
Be explicit about how scripts should be used:
| Instruction | Meaning |
|---|---|
"Run scripts/validate.py" |
Execute the script |
"See scripts/validate.py for the algorithm" |
Read as reference, don't execute |
Scripts should handle errors gracefully—the agent can't debug runtime failures:
```python
# Good: Recovers from errors
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''

# Bad: Crashes unexpectedly
def process_file(path):
    return open(path).read()
```

Document and verify dependencies:
**Requirements:** Python 3.11+, `pypdf` package
Install: `pip install pypdf`

Always use forward slashes: `scripts/helper.py` ✅, `scripts\helper.py` ❌
As the number of skills grows, organization becomes critical. This section covers patterns for structuring related skills to avoid trigger collisions, enable cross-cutting guidance, and maintain clear routing.
When building skills for a large domain (e.g., data services, compute platforms, messaging systems), you face a fundamental choice about granularity:
| Approach | Characteristics | Trade-offs |
|---|---|---|
| One skill per service | Each service gets its own skill with deep, focused content | Precise activation, smaller context; but no cross-service guidance |
| One consolidated skill | All related services in a single skill | Cross-service guidance, single entry point; but bloated context window |
| Orchestrator + service skills | Routing skill delegates to specialized skills | Best of both; but more complex to maintain |
The right choice depends on your domain's complexity and how often users need cross-service guidance.
skills/
├── database-postgres/
├── database-mysql/
├── database-mongodb/
└── storage-blob/
When to use:
- Services are distinct with minimal overlap
- Users rarely ask "which should I use?"
- Each skill is self-contained
Pros: Precise activation, smaller context per invocation, independent evolution
Cons: No cross-service guidance, potential duplication of shared patterns
skills/
└── data-services/
├── SKILL.md # All database + storage content
└── references/
├── postgres.md
├── mysql.md
└── mongodb.md
When to use:
- Services are tightly related
- Users frequently compare options
- Shared patterns dominate (auth, backup, networking)
Pros: Single entry point, cross-service guidance built-in
Cons: Large context window usage, trigger phrase collisions, monolithic maintenance
skills/
├── data-services/ # Orchestrator
│ ├── SKILL.md # Decision trees, comparisons, routing
│ └── references/
│ ├── selection-guide.md
│ └── migration-patterns.md
├── database-postgres/ # Service skill
├── database-mysql/ # Service skill
└── storage-blob/ # Service skill
When to use:
- Domain has both cross-cutting concerns AND deep service-specific content
- Users ask both "which should I use?" AND "how do I configure X?"
- You want to scale the number of services without bloating a single skill
Pros: Cross-service guidance without context bloat, clear routing, independent service skill evolution
Cons: More skills to maintain, requires careful trigger phrase design
The orchestrator skill handles cross-cutting concerns and routing decisions, while service skills handle implementation details.
Orchestrator responsibilities:
- Decision trees ("Which service should I use?")
- Comparison tables (Service A vs. Service B)
- Cross-service patterns (authentication, networking, migration)
- Explicit routing to service skills
Service skill responsibilities:
- Service-specific configuration
- Implementation guides
- Troubleshooting
- Service-specific best practices
The orchestrator's description must explicitly define boundaries:
```yaml
---
name: data-services
description: >
  Data service selection and cross-cutting patterns.
  USE FOR: compare databases, choose data store, data migration strategy,
  which database to use, Service A vs Service B decisions.
  DO NOT USE FOR: service-specific implementation (use database-postgres,
  database-mysql, storage-blob directly for configuration tasks).
---
```

Service skills include reciprocal boundaries:

```yaml
---
name: database-postgres
description: >
  PostgreSQL configuration, authentication, and operations.
  USE FOR: PostgreSQL setup, configuration, query optimization, auth setup.
  DO NOT USE FOR: comparing database options (use data-services).
---
```

Without clear boundaries, both orchestrator and service skills may activate for the same prompt, causing inconsistent behavior.
Problem pattern (collision):
```yaml
# Orchestrator
description: "Help with databases including PostgreSQL setup"

# Service skill
description: "Help with PostgreSQL setup and configuration"
```

Both match "help me set up PostgreSQL" → unpredictable routing.
Solution pattern (clear boundaries):
```yaml
# Orchestrator
description: >
  USE FOR: compare databases, choose data store
  DO NOT USE FOR: PostgreSQL setup (use database-postgres)

# Service skill
description: >
  USE FOR: PostgreSQL setup, configuration, optimization
  DO NOT USE FOR: comparing databases (use data-services)
```

"Help me set up PostgreSQL" → database-postgres
"Should I use PostgreSQL or MySQL?" → data-services
| User Intent | Activated Skill | Rationale |
|---|---|---|
| "Which database should I use?" | Orchestrator | Cross-service decision |
| "Compare PostgreSQL and MySQL" | Orchestrator | Comparison query |
| "Set up PostgreSQL authentication" | Service (postgres) | Service-specific implementation |
| "Optimize my MySQL queries" | Service (mysql) | Service-specific task |
| "Migrate from MySQL to PostgreSQL" | Orchestrator | Cross-service workflow |
When implementing the orchestrator pattern, tests must verify proper routing:
Orchestrator tests:
- Activates for cross-cutting prompts (comparisons, selection, migration)
- Does NOT activate for service-specific prompts
Service skill tests:
- Activates for service-specific prompts
- Does NOT activate for orchestrator prompts (add negative test cases)
```typescript
// Example: Service skill negative tests
// (assumes a triggerMatcher harness from your evaluation framework)
describe('Should NOT Trigger (Orchestrator Handles These)', () => {
  const orchestratorPrompts = [
    'Which database should I use?',
    'Compare PostgreSQL and MySQL',
    'Help me choose a data store',
  ];

  test.each(orchestratorPrompts)(
    'does not trigger on: "%s"',
    (prompt) => {
      const result = triggerMatcher.shouldTrigger(prompt);
      expect(result.triggered).toBe(false);
    }
  );
});
```

Consider adding an orchestrator when:
- Users frequently ask comparison questions — "Should I use A or B?"
- Multiple skills share patterns — Authentication, networking, backup strategies
- The domain is growing — Adding more services that need unified guidance
- Context window is a concern — Individual skills are getting too large
Don't add an orchestrator when:
- Services are unrelated (no cross-service questions)
- The domain is small (2-3 skills with minimal overlap)
- Maintenance overhead isn't justified
| Factor | Flat | Consolidated | Orchestrator |
|---|---|---|---|
| Cross-service guidance | None | Built-in | Via orchestrator |
| Context efficiency | Best | Worst | Good |
| Maintenance complexity | Low | Medium | Higher |
| Trigger collision risk | Low | High | Low (if designed well) |
| Scales with services | Yes | No | Yes |
Rule of thumb: Start with flat (Pattern A). When cross-service questions become common, introduce an orchestrator (Pattern C). Avoid consolidated (Pattern B) unless the domain is small and stable.
This section covers when and how to create MCP Tools. Remember: Tools are model-controlled—the LLM decides when to call them based on schema and description.
MCP Tools are for discrete, atomic operations. The decision framework:
| Criteria | Example | Why It's a Tool |
|---|---|---|
| Exposing a new API endpoint | Key Vault secret retrieval | Direct API wrapper |
| Operation is atomic | List storage accounts | Single request/response |
| Returns data for further processing | Get metrics | LLM needs the output |
| No decisions required | Delete a resource by ID | Parameters fully specify action |
| Can describe in one sentence | "Get the value of a secret from Key Vault" | Clear, bounded scope |
| Criteria | Example | What to Build Instead |
|---|---|---|
| Multi-step workflow | "Deploy my app" | Skill (orchestrates steps) |
| User decisions mid-process | "Set up monitoring" | Skill (guides decisions) |
| Needs context accumulation | "Troubleshoot this error" | Skill (maintains state) |
| Duplicates existing capability | Another way to list VMs | Nothing (use existing) |
1. Single Responsibility

One tool = one operation. Don't create a tool that "creates or updates or deletes" based on parameters. Create three tools.

2. Clear Naming

Names should be verb_noun or noun_verb patterns that clearly indicate the action:
- ✅ `secret_get`, `account_list`, `container_create`
- ❌ `handle_secret`, `manage_storage`, `do_operation`
3. Descriptive Schemas
The description field is how the LLM decides to use your tool. Be explicit:
- ✅ "Get the value of a specific secret from an Azure Key Vault. Returns the secret value and metadata."
- ❌ "Key Vault operations"
4. Skill References

Help the LLM understand when NOT to use your tool by referencing Skills:
FOR FULL WORKFLOW: Use `azure-security` skill for Key Vault setup and configuration.
Tools are defined using JSON Schema. The schema tells the LLM what parameters are available and required:
```json
{
"name": "keyvault_secret_get",
"title": "Get Key Vault Secret",
"description": "Retrieve the value of a specific secret from Azure Key Vault. Returns the secret value and metadata. FOR FULL WORKFLOW: Use azure-security skill for Key Vault setup.",
"inputSchema": {
"type": "object",
"properties": {
"vault_name": {
"type": "string",
"description": "Name of the Key Vault"
},
"secret_name": {
"type": "string",
"description": "Name of the secret to retrieve"
},
"version": {
"type": "string",
"description": "Optional: specific version of the secret"
}
},
"required": ["vault_name", "secret_name"]
}
}
```

Schema Best Practices:
| Element | Best Practice | Why |
|---|---|---|
| `name` | Use `resource_action` pattern | Predictable, searchable |
| `description` | Include skill cross-reference | Helps routing decisions |
| `required` | List only truly required params | Reduces friction |
| Property `description` | Be specific about format/constraints | LLM generates better calls |
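Because the schema is the contract the LLM codes against, it's worth validating incoming calls server-side as well. A minimal sketch using the `jsonschema` package against the Key Vault schema above; the `fetch_secret` helper and the error shape are hypothetical:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

KEYVAULT_SECRET_GET_SCHEMA = {
    "type": "object",
    "properties": {
        "vault_name": {"type": "string"},
        "secret_name": {"type": "string"},
        "version": {"type": "string"},
    },
    "required": ["vault_name", "secret_name"],
}

def fetch_secret(vault_name: str, secret_name: str, version: str = "") -> dict:
    # Hypothetical helper: the real Key Vault API call would go here.
    return {"isError": False, "content": "<secret value>"}

def handle_tool_call(arguments: dict) -> dict:
    """Reject malformed LLM-generated calls before they reach the API."""
    try:
        validate(instance=arguments, schema=KEYVAULT_SECRET_GET_SCHEMA)
    except ValidationError as e:
        # Structured errors let the LLM read the failure and self-correct.
        return {"isError": True, "content": f"Invalid arguments: {e.message}"}
    return fetch_secret(**arguments)
```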
Consistent naming helps both LLMs and developers find the right tool:
Namespace: {platform}-{service}
Command: {resource}_{action}
Examples:
- Namespace: azure-storage
- storage_account_list (list all storage accounts)
- storage_blob_get (get a specific blob)
- storage_container_create (create a container)
- Namespace: azure-keyvault
- keyvault_list (list all vaults)
- keyvault_secret_get (get a secret value)
- keyvault_secret_set (set a secret value)
Naming Rules:
- Namespace = service identity. All tools for a service share the namespace.
- Resource = what you're operating on. Usually the ARM resource type.
- Action = the operation. Use standard verbs: `list`, `get`, `create`, `update`, `delete`, `query`.
Include skill references in MCP tool descriptions to improve routing:
**EXECUTION TOOL** - [One sentence describing what it does].
USE FOR: [Specific operations this tool handles].
FOR FULL WORKFLOW: Use `skill-name` skill for [workflow description].
FOR GUIDANCE: Use `skill-name` skill to understand [concept].
Example:
**EXECUTION TOOL** - Execute Azure Developer CLI (azd) commands.
USE FOR: Running azd up, azd deploy, azd provision, getting deployment logs.
FOR FULL WORKFLOW: Use `azure-deploy` skill (prepare → validate → deploy chain).
FOR GUIDANCE: Use `azure-prepare` skill to configure azure.yaml before running azd.
For configuration patterns and reference material, create best practices files that MCP can serve:
Purpose: Centralize patterns that multiple Skills might need. Skills call `get_azure_bestpractices(resource="X")` instead of embedding duplicate content.
File: `azure-swa-best-practices.txt`

```text
# Azure Static Web Apps Best Practices

## azure.yaml Configuration
services:
  web:
    host: staticwebapp
    ...

## Bicep Patterns
resource staticWebApp 'Microsoft.Web/staticSites@2022-09-01' = {
  ...
}

## Build Output by Framework
| Framework | outputLocation  |
| React     | build           |
| Vue       | dist            |
| Angular   | dist/{project}  |
```

This section describes how Skills and MCP Tools work together. The key insight: Skills should orchestrate, MCP should execute. When this pattern is followed, you get consistent, maintainable, and testable workflows.
The hybrid pattern assigns clear responsibilities:
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ MCP = "WHAT" │ │ Skills = "HOW" │
│ (Patterns & Configurations) │ │ (Workflow Orchestration) │
│ │ │ │
│ • azure.yaml snippets │◄───│ • Detection logic │
│ • Bicep resource patterns │ │ • Workflow steps │
│ • SKU guidance │ │ • Error handling │
│ • Build output by framework │ │ • Decision trees │
│ • API references │ │ • User interaction │
└─────────────────────────────────┘ └──────────────────────────────┘
Single Source of Truth Invokes MCP for patterns
Why this works:
- No duplication: Patterns live in one place (MCP). Skills reference them.
- Easy updates: Change a pattern in MCP; all Skills get the update.
- Clear ownership: MCP team owns patterns; Skill team owns workflows.
- Testable: Test patterns independently from workflows.
Key Principle: Skills call `get_azure_bestpractices(resource="static-web-app")` instead of embedding duplicate content.

Real-World Example: The Static Web Apps routing fix (see Section 8) moved build output patterns from the `azure-prepare` skill into MCP's best practices file. The skill now calls MCP to get the patterns, ensuring consistency.
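A sketch of what the MCP side of this pattern might look like, assuming one best-practices file per resource. The directory layout and section-slicing logic are illustrative assumptions; only the `get_azure_bestpractices` name comes from the doc:

```python
from pathlib import Path

BESTPRACTICES_DIR = Path("bestpractices")  # assumed layout: bestpractices/static-web-app.txt

def get_azure_bestpractices(resource: str, action: str = "all") -> str:
    """Serve canonical pattern text so Skills never embed their own copies."""
    path = BESTPRACTICES_DIR / f"{resource}.txt"
    if not path.exists():
        return f"No best practices recorded for '{resource}'."
    text = path.read_text()
    if action == "all":
        return text
    # Naive per-section slicing by "## <action>" heading (illustrative only).
    for section in text.split("\n## "):
        if section.lower().startswith(action.lower()):
            return "## " + section
    return text
```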
The skill orchestrates the workflow; MCP tools execute the operations:
SKILL orchestrates → MCP executes → SKILL interprets → User output
This pattern ensures:
- Workflow logic stays in Skills — Decisions, branching, error handling
- Execution stays in MCP — API calls, data retrieval, resource operations
- Results are synthesized by Skills — Combine outputs into user-facing guidance
Example: Cost Optimization Workflow
```markdown
## Step 1: Load Best Practices
Use `azure-get_azure_bestpractices` MCP tool with:
- resource: "cost-optimization"
- action: "all"

## Step 2: Discover Resources
Use `azure-storage` MCP tool → `storage_account_list`
Use `azure-cosmos` MCP tool → `cosmos_account_list`

## Step 3: Run Compliance Check
Use `azure-extension_azqr` MCP tool for orphaned resources

## Step 4: Generate Report (Skill logic)
Synthesize MCP results into actionable recommendations
```

| Anti-Pattern | Problem | Correct Pattern |
|---|---|---|
| Skill embeds CLI commands | Bypasses MCP, creates duplication | Skill invokes MCP tool |
| MCP tool includes workflow logic | Tools should be atomic | Move logic to Skill |
| Skill duplicates MCP patterns | Two sources of truth, drift | Skill calls MCP for patterns |
| Tool has no skill reference | LLM doesn't know when to use Skill | Add FOR FULL WORKFLOW in tool description |
| Skill doesn't list MCP dependencies | Hard to maintain, unclear requirements | Add MCP Tools Used section |
The Preparation Manifest connects the three core skills and maintains state across workflow steps:
┌─────────────────────────────────────────────────────────────────────┐
│ PREPARE │
│ Get app Azure-ready │
│ Discovery → Architecture Planning → File Generation → Manifest │
└─────────────────────────────────────────────────────────────────────┘
│ outputs
▼
┌───────────────────────────────┐
│ PREPARATION MANIFEST │
│ .azure/preparation.md │
│ │
│ • Application components │
│ • Generated artifacts │
│ • Deployment config │
│ • Validation requirements │
│ • Decision log │
└───────────────────────────────┘
│ reads
▼
┌─────────────────────────────────────────────────────────────────────┐
│ VALIDATE │
│ Read Manifest → Execute Validation Checks → Update Manifest │
└─────────────────────────────────────────────────────────────────────┘
│ reads
▼
┌─────────────────────────────────────────────────────────────────────┐
│ DEPLOY │
│ Read Manifest → Execute Deployment → Record Outcome │
└─────────────────────────────────────────────────────────────────────┘
Why use a manifest?
- State persistence: Skills are stateless; the manifest maintains context
- Resumability: User can stop and restart the workflow
- Auditability: Decision log shows why choices were made
- Validation: Each skill can verify prerequisites from the manifest
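As a sketch of how skill scripts might interact with the manifest: the `.azure/preparation.md` path comes from the diagram above, while the entry format and helper names are assumptions:

```python
from datetime import datetime, timezone
from pathlib import Path

MANIFEST = Path(".azure/preparation.md")  # path from the diagram above

def log_decision(decision: str, rationale: str) -> None:
    """Append to the decision log so later skills can audit why choices were made."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    with MANIFEST.open("a") as f:
        f.write(f"\n- [{stamp}] {decision}: {rationale}")

def manifest_has(section: str) -> bool:
    """Let VALIDATE and DEPLOY verify prerequisites recorded by PREPARE."""
    return MANIFEST.exists() and section in MANIFEST.read_text()

# e.g. DEPLOY refuses to run until validation was recorded:
# assert manifest_has("Validation requirements"), "Run the validate skill first"
```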
Every skill should have an "MCP Tools Used" section that documents dependencies:
```markdown
## MCP Tools Used in This Skill

| Step | Tool | Command | Purpose |
|------|------|---------|---------|
| 1 | `azure-get_azure_bestpractices` | `get_bestpractices` | Load guidance |
| 3 | `azure-deploy` | `plan_get` | Analyze workspace |
| 5 | `azure-azd` | `up` | Execute deployment |

**If Azure MCP is not enabled:** Run `/mcp add azure` or use CLI fallback.
```

Benefits:
- Developers know what MCP tools to have enabled
- LLM understands the skill-tool relationship
- Maintenance is easier (clear dependencies)
For multi-step workflows, provide a copyable checklist that makes progress trackable:
```markdown
## Deployment Workflow

Copy this checklist and track progress:

Deployment Progress:
- [ ] Step 1: Validate prerequisites (azure.yaml, authentication)
- [ ] Step 2: Run pre-flight checks (azd validate)
- [ ] Step 3: Execute deployment (azd up)
- [ ] Step 4: Verify deployment succeeded
- [ ] Step 5: Run smoke tests

**Step 1: Validate prerequisites**
Check that azure.yaml exists and contains valid configuration...
```
Why this pattern works:
| Benefit | How |
|---|---|
| Verifiable progress | Each checkbox = completed state |
| Resumable | Agent can restart from failed step |
| Visible | User sees exactly where workflow is |
| Debuggable | Failed step is obvious |
When to use:
- Workflows with 3+ sequential steps
- Tasks that might fail mid-way
- Processes where order matters
Cross-reference: See §4.2 Degrees of Freedom for guidance on how prescriptive each step should be.
Good Pattern (azure-observability):

```markdown
| Service | Use When | MCP Tools | CLI |
|---------|----------|-----------|-----|
| Azure Monitor | Metrics, alerts | `azure__monitor` | `az monitor` |
```

Good Pattern (azure-security):

```markdown
### Key Vault
- `azure__keyvault` with command `keyvault_list` - List Key Vaults
- `azure__keyvault` with command `keyvault_secret_get` - Get secret value
```

Good Pattern (declare the skill type):

```yaml
description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation, execution.
```

Good Pattern (document the MCP relationship):

```yaml
description: |
  ...
  INVOKES: `azure-deploy` MCP tool, `azure-azd` MCP tool for execution.
  FOR SINGLE OPERATIONS: Use `azure-azd` MCP tool directly for single azd commands.
```

| Content Type | Belongs In |
|---|---|
| azure.yaml snippets | MCP best practices |
| Bicep patterns | MCP best practices |
| SKU guidance | MCP best practices |
| Detection logic | Skill |
| Workflow steps | Skill |
| Error handling | Skill |
Use Waza-style trigger testing:
```yaml
# trigger_tests.yaml
shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"
shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"
```

Enforce token budgets with a `.token-limits.json`:

```json
{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  }
}
```

Bad:
MCP: azure.yaml template with host: staticwebapp
SKILL: Also contains azure.yaml template with host: staticwebapp
Good:
MCP: Single source of truth for azure.yaml patterns
SKILL: Invokes MCP for patterns, focuses on workflow
Bad (azure-diagnostics problem):
```bash
# Current: Embeds CLI commands directly
az containerapp show --name APP -g RG --query "properties.configuration.registries"
```

Good:
Use azure-applens MCP tool for AI-powered diagnostics, or
Use azure-resourcehealth MCP tool to check availability status.
CLI Fallback (if MCP unavailable):
az containerapp show --name APP -g RGBad (SWA CLI vs azd issue):
MCP: "Use npx swa deploy"
SKILL: "Use azd up with Bicep"
→ Agent picks randomly → ~50% deployment failures
Good:
MCP: Patterns only (azure.yaml, Bicep templates)
SKILL: Workflow only (calls MCP for patterns)
→ Single path → Consistent results
Bad (Low compliance):
```yaml
description: 'Process PDF files'
```

Good (High compliance):

```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including extraction and merging.
  USE FOR: "extract PDF", "merge PDFs". DO NOT USE FOR: creating PDFs.
```

Bad:
```yaml
description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.
```

Good:

```yaml
description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.
  DO NOT USE FOR: listing resources (use azure-xxx MCP), querying logs (use azure-monitor MCP).
```

Bad:
- `azure-deploy` skill exists
- `azure-deploy` MCP tool exists
- No guidance on which to use
Good:
```yaml
# Skill description:
description: |
  **WORKFLOW SKILL** - Full deployment workflow.
  FOR SINGLE COMMANDS: Use `azure-azd` MCP tool directly.

# MCP tool description:
description: |
  **EXECUTION TOOL** - Execute deployment commands.
  FOR FULL WORKFLOW: Use `azure-deploy` skill.
```

Skills execute in the user's environment with significant privileges. Build defensively.
| Do | Don't |
|---|---|
| Use environment variables for secrets | Hardcode credentials or API keys |
| Validate inputs before passing to scripts | Trust user input blindly |
| Document required permissions | Request more permissions than needed |
| Handle errors gracefully | Let scripts fail silently |
```python
# Good: Explicit error handling, safe defaults
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''
    except PermissionError:
        print(f"Cannot access {path}")
        return ''

# Bad: Fails unexpectedly, no recovery
def process_file(path):
    return open(path).read()  # Crashes on missing file
```

Before approving a skill, check:
- No hardcoded secrets or credentials
- Scripts handle errors without data loss
- External URLs/APIs are documented and necessary
- Destructive operations require confirmation
- No unexpected network calls or data exfiltration
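For the first checklist item, a minimal sketch of reading a credential from the environment and failing fast; the variable name is illustrative:

```python
import os
import sys

def get_required_secret(var: str) -> str:
    """Fail fast with a readable message instead of hardcoding credentials."""
    value = os.environ.get(var)
    if not value:
        sys.exit(f"Missing required environment variable: {var}. "
                 "Set it before running this script; never hardcode it.")
    return value

api_key = get_required_secret("MY_SERVICE_API_KEY")  # illustrative variable name
```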
This section presents real examples of Skills and MCP Tools working together (and conflicts that arose when they didn't). These case studies are drawn from Azure ecosystem development but the patterns apply broadly.
This case study illustrates the core problem this guide addresses: conflicting guidance from uncoordinated systems.
When a user says "deploy my React app to Azure", two systems provided conflicting guidance:
| System | File | Guidance | Result |
|---|---|---|---|
| MCP | `azure-swa-best-practices.txt` | Use SWA CLI (`npx swa deploy`) | ❌ Non-IaC, unreliable |
| Skills | `azure-deploy` + `azure-prepare` | Use `azd up` with Bicep | ✅ IaC, reproducible |
No coordination layer existed → Agent picked randomly → Estimated ~50% deployment failures when conflicting approaches were mixed.
Key Insight: The observation that identified this issue: "I think I found part of the problem. Our azure best practices tool doesn't use azd for SWA guidance which conflicts with the other guidance in both skills and general deployment."
Hybrid Architecture:
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ MCP = "WHAT" │ │ Skills = "HOW" │
│ (Patterns & Configurations) │ │ (Workflow Orchestration) │
│ │ │ │
│ • azure.yaml snippets │◄───│ • Detection logic │
│ • Bicep resource patterns │ │ • Workflow steps │
│ • SKU guidance │ │ • Error handling │
│ • Build output by framework │ │ • Decision trees │
└─────────────────────────────────┘ └──────────────────────────────┘
Changes Made:
- MCP (`azure-swa-best-practices.txt`):
  - Replaced CLI-only guidance with comprehensive azd patterns
  - Added azure.yaml configurations
  - Added Bicep resource patterns
  - Kept SWA CLI as explicit-only alternative
- Skills (`azure-deploy`, `azure-prepare`):
  - Slimmed down, removed duplicate patterns
  - Added `get_azure_bestpractices(resource="static-web-app")` invocation
  - Added SWA detection signals
Metrics Note: The ~50% failure estimate is based on observed behavior during pre-fix testing where the agent would inconsistently apply SWA CLI vs azd approaches. Post-fix formal evaluation is in progress. Early qualitative observations show improved consistency (agent now reliably uses azd for deployment workflows), but quantified failure rate reduction requires controlled testing that is currently underway. We will update this section with hard metrics when available.
| User Prompt | Expected Behavior | Pass Criteria |
|---|---|---|
| "Deploy my React app" | Uses azd, NOT swa CLI | No npx swa commands |
| "Use SWA CLI to deploy" | Uses SWA CLI (explicit) | npx swa deploy allowed |
| "Preview my app locally" | Uses SWA CLI for preview | npx swa start |
Both the `azure-functions` skill and the `azure-functionapp` MCP tool exist.
| User Intent | Route | Target |
|---|---|---|
| "Create a new Function App" | SKILL | azure-functions (creation workflow) |
| "List my Function Apps" | MCP | azure-functionapp (data query) |
| "How do Functions triggers work?" | SKILL | azure-functions (knowledge) |
| "Get function app settings" | MCP | azure-functionapp (data retrieval) |
Skill Description Update:
```yaml
description: |
  **WORKFLOW SKILL** - Create and configure Azure Functions.
  USE FOR: "create function app", "add Azure Function", "set up serverless".
  DO NOT USE FOR: listing functions (use azure-functionapp MCP), querying logs.
  INVOKES: `azure-functionapp` MCP for queries, `azure-azd` for deployment.
```

The `azure-security` skill demonstrates ideal MCP cross-referencing:
MCP Server (Preferred):

- `azure__keyvault` with command `keyvault_list` - List Key Vaults
- `azure__keyvault` with command `keyvault_secret_get` - Get secret value
- `azure__keyvault` with command `keyvault_secret_set` - Set secret value

If Azure MCP is not enabled: Run `/azure:setup` or enable via `/mcp`.

CLI Fallback:

```bash
az keyvault list --subscription $SUB
az keyvault secret show --vault-name $VAULT --name $SECRET
```
This pattern:
- ✅ Prefers MCP tools
- ✅ Documents fallback path
- ✅ Maintains skill focus on workflow, not execution
Building Skills and Tools is only half the job—you need to verify they work correctly. This section covers testing strategies for both trigger accuracy (does the right thing get invoked?) and task completion (does it actually work?).
| Test Type | What It Catches | Consequence of Skipping |
|---|---|---|
| Trigger accuracy | False positives/negatives in routing | Wrong skill invoked; user frustration |
| Task completion | Broken workflows, missing steps | Deployment failures; data loss |
| Regression | Breaking changes from updates | Previously working flows break |
Waza is a framework for evaluating Agent Skills with task completion metrics and trigger accuracy testing:
```bash
# Install
pip install waza

# Generate eval from skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml
```

Test that your skill triggers on the right prompts:
```yaml
# trigger_tests.yaml
name: my-skill-triggers
skill: my-skill

shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"
  - "help me deploy"
  - "configure Azure hosting"

shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"
  - "get my subscription"
  - "query logs"
```

Define tasks with success criteria:
```yaml
# tasks/deploy-app.yaml
id: deploy-app-001
name: Deploy Container App
inputs:
  prompt: "Deploy my app to Azure Container Apps"
  context:
    files: ["Dockerfile", "app.py"]
expected:
  output_contains:
    - "container"
    - "deployed"
  tool_calls:
    required:
      - pattern: "az containerapp"
    forbidden:
      - pattern: "rm -rf"
```

Run evaluations automatically in CI:
```yaml
# .github/workflows/skill-eval.yaml
name: Skill Evaluation
on:
  pull_request:
    paths:
      - 'skills/**'

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install waza
        run: pip install waza
      - name: Run evaluations
        run: waza run evals/my-skill/eval.yaml --output results.json
      - name: Check thresholds
        run: |
          python -c "
          import json
          r = json.load(open('results.json'))
          assert r['summary']['composite_score'] >= 0.8
          "
```

Use Sensei to improve skill frontmatter:
```
# Run on a single skill
Run sensei on my-skill

# Run on all low-adherence skills
Run sensei on all Low-adherence skills
```

Sensei will:
- Score current compliance (Low → High)
- Add USE FOR trigger phrases
- Add DO NOT USE FOR anti-triggers
- Add INVOKES for tool relationships
- Verify token budget
- Run tests
This section provides authoritative sources for deeper learning. These references were used in creating this guide.
**Model Context Protocol (MCP)**

| Resource | URL | What You'll Learn |
|---|---|---|
| MCP Specification | https://modelcontextprotocol.io/specification/latest | Protocol details, message formats |
| MCP Architecture Overview | https://modelcontextprotocol.io/docs/concepts/architecture | Host/client/server relationships |
| MCP Tools Concepts | https://modelcontextprotocol.io/docs/concepts/tools | How tools work, schema definition |
| MCP Prompts Concepts | https://modelcontextprotocol.io/docs/concepts/prompts | User-controlled primitives |
| Code Execution with MCP (Anthropic) | https://www.anthropic.com/engineering/code-execution-with-mcp | Real-world MCP patterns |
**GitHub Copilot**

| Resource | URL | What You'll Learn |
|---|---|---|
| Copilot SDK Architecture | https://deepwiki.com/github/copilot-sdk/3-sdk-architecture | How Copilot integrates extensions |
| Awesome Copilot | https://github.com/github/awesome-copilot | Curated list of resources |
| Maximizing Copilot's Agentic Capabilities | https://github.blog/ai-and-ml/github-copilot/how-to-maximize-github-copilots-agentic-capabilities/ | Best practices for agent workflows |
**Azure Implementations**

| Resource | URL | What You'll Learn |
|---|---|---|
| Azure MCP Server | https://github.com/microsoft/mcp/tree/main/servers/Azure.Mcp.Server | Production MCP server implementation |
| GitHub Copilot for Azure Skills | https://github.com/microsoft/GitHub-Copilot-for-Azure/tree/main/plugin/skills | Production skill examples |
| MCP Commands Reference | https://github.com/microsoft/mcp/blob/main/servers/Azure.Mcp.Server/docs/azmcp-commands.md | Available Azure MCP commands |
**Evaluation Tooling**

| Resource | URL | What You'll Learn |
|---|---|---|
| Waza (Skill Evaluation) | https://github.com/spboyer/waza | Testing framework for skills |
| Sensei (Frontmatter Improvement) | https://github.com/spboyer/sensei | Automated skill compliance fixes |
**Agent Patterns & Skills**

| Resource | URL | What You'll Learn |
|---|---|---|
| Building Effective AI Agents | https://www.anthropic.com/research/building-effective-agents | Core agent patterns (routing, chaining, orchestration) |
| Agent Skills Engineering Blog | https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills | Progressive disclosure, skill authoring best practices |
| Claude Skills Cookbook | https://github.com/anthropics/claude-cookbooks/tree/main/skills | Practical skill examples |
| Agent Skills Open Standard | https://agentskills.io/specification | Specification for portable skills |
| User Says | Route To | Why |
|---|---|---|
| "Deploy my app" | SKILL | Workflow |
| "List my resources" | MCP | Data query |
| "Help me set up" | SKILL | Guidance |
| "Run this command" | MCP | Execution |
| "What went wrong" | SKILL | Diagnosis |
| Skill | Primary MCP Tools |
|---|---|
| `azure-prepare` | `azure-deploy` (plan_get), `azure-get_azure_bestpractices` |
| `azure-validate` | `azure-azd` (validate_azure_yaml) |
| `azure-deploy` | `azure-azd` (up, deploy), `azure-deploy` (app_logs_get) |
| `azure-diagnostics` | `azure-applens`, `azure-resourcehealth`, `azure-monitor` |
| `azure-functions` | `azure-functionapp`, `azure-azd` |
| `azure-observability` | `azure-monitor`, `azure-applicationinsights` |
| `azure-security` | `azure-keyvault`, `azure-role` |
```yaml
---
name: azure-{domain}
description: |
  **WORKFLOW SKILL** - {One-line description}.
  USE FOR: {trigger1}, {trigger2}, {trigger3}.
  DO NOT USE FOR: {scenario1} (use {other}), {scenario2}.
  INVOKES: `{mcp-tool-1}`, `{mcp-tool-2}`.
  FOR SINGLE OPERATIONS: Use `{mcp-tool}` directly.
---
```

This appendix explores how different AI platforms decide between using their own knowledge, invoking tools (like MCP), or activating skills/prompts. Understanding these mechanisms can help us write better descriptions and improve routing accuracy.
All LLM-based agents face the same fundamental question: Given a user prompt and a set of available capabilities, which (if any) should be invoked?
The answer varies by platform, but the core mechanisms are similar:
User Prompt → Intent Analysis → Capability Matching → Decision
↓
┌─────────────────────────────────────────┐
│ Answer directly (LLM knowledge) │
│ Invoke tool (function call) │
│ Activate skill/prompt (workflow) │
│ Request clarification (ask user) │
└─────────────────────────────────────────┘
**OpenAI (GPT)**

Mechanism: Function calling via trained policy + orchestration layer
| Component | How It Works |
|---|---|
| Function schemas | Developers define JSON schemas with name, description, parameters |
| Intent matching | Model analyzes prompt against function descriptions |
| Decision | Outputs tool_call message if function matches; otherwise answers directly |
| Confidence | Picks function with highest semantic similarity to prompt |
Key insight: The description field is critical. GPT uses it to decide whether to call your function. Poor descriptions = poor routing.
"The decision to call a function is made purely by the model, based on prompt-to-function intent matching and context." — OpenAI Function Calling Docs
**Anthropic (Claude)**

Mechanism: Tool use with progressive disclosure + MCP integration
| Component | How It Works |
|---|---|
| Tool discovery | Claude can search for tools dynamically (doesn't load all at once) |
| Progressive disclosure | Only loads schemas of relevant tools based on query |
| MCP integration | Uses tool_use blocks via MCP protocol |
| Code orchestration | Can generate code to orchestrate multi-tool workflows |
Key insight: Claude's "progressive disclosure" means it searches for the right tool rather than scanning all tools. Clear, distinct descriptions help Claude find your tool.
"Tools built for agents are most ergonomic—and effective—when they are intuitive for both non-deterministic agents and humans." — Anthropic: Writing Tools for Agents
**Google (Gemini)**

Mechanism: Function declarations with semantic routing
| Component | How It Works |
|---|---|
| Function declarations | Schema with name, description, parameters passed at runtime |
| Intent analysis | Compares user intent to function descriptions |
| Routing | Semantic similarity + context determines function selection |
| Parallel calls | Can call multiple functions simultaneously |
Key insight: Gemini emphasizes the quality of function descriptions—more precise descriptions yield better routing.
"The more descriptive and precise the function definitions are, the better Gemini can match them to user requests." — Gemini Function Calling Docs
**GitHub Copilot (Skills)**

Mechanism: Embedding-guided skill routing with semantic matching
| Component | How It Works |
|---|---|
| Skill frontmatter | YAML with name and description in SKILL.md |
| Embedding matching | Creates vector embeddings of user prompt and skill descriptions |
| Clustering | Groups skills by similarity to narrow candidates |
| On-demand loading | Only loads matching skill content into context |
Key insight: Copilot uses embedding-based semantic similarity. Your skill's description field is converted to a vector and compared against the user's prompt vector. Similar vectors = skill gets invoked.
"Copilot compares the user's prompt against each available skill's description using embedding-based semantic similarity." — GitHub Blog: Making Copilot Smarter
Despite implementation differences, all platforms share these routing principles:
| Principle | Description | Implication for Skill/Tool Authors |
|---|---|---|
| Description is king | The description field drives routing decisions | Write clear, specific descriptions |
| Semantic matching | Embeddings or intent classifiers compare prompt to description | Use the same words users would use |
| Negative examples help | Stating what something doesn't do prevents misrouting | Include "DO NOT USE FOR" sections |
| Context matters | Conversation history influences routing | Skills should be context-aware |
| Confidence thresholds | If no good match, LLM answers directly | Don't force routing—let LLM decide |
Based on this research, our guidance aligns well with how LLMs actually route:
| Our Recommendation | Why It Works |
|---|---|
| USE FOR: trigger phrases | Matches how embedding similarity works |
| DO NOT USE FOR: anti-triggers | Prevents false positives in semantic matching |
| INVOKES: tool list | Helps LLM understand skill-tool relationships |
| FOR SINGLE OPERATIONS | Provides fallback routing guidance |
| Clear, specific descriptions | Improves embedding quality and intent matching |
Some routing behaviors are opaque or model-specific:
| Factor | What We Know | What We Don't Know |
|---|---|---|
| Embedding models | Used for semantic similarity | Exact model, training data |
| Confidence thresholds | Exist, vary by platform | Specific values |
| Priority when tied | First match? Highest score? | Implementation details |
| Context window impact | More tools = more competition | Exact degradation curve |
Anthropic published specific guidance on writing effective tool descriptions:
1. **Be specific about function AND intent**
   - ❌ "Get information about weather"
   - ✅ "Retrieves current weather (temperature, precipitation, condition) for a city. Use only for present-day conditions, not forecasts."
2. **Highlight boundaries explicitly**
   - State what the tool doesn't do
   - Prevents misrouting to similar-sounding tools
3. **Provide usage examples**
   - Short canonical examples improve generalization
   - "Example: 'What's the weather in Paris right now?'"
4. **Namespace tools**
   - Use prefixes: `Weather_GetCurrent`, `Weather_GetForecast`
   - Helps LLM distinguish similar tools
5. **Return meaningful context**
   - Tool responses should enable good follow-up decisions
   - Balance detail and brevity
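Putting the five guidelines together, here is a hypothetical tool definition in Anthropic's tool-use shape; the `Weather_GetCurrent` tool and its schema are illustrative only:

```python
# Hypothetical tool definition applying the five guidelines above.
weather_get_current = {
    "name": "Weather_GetCurrent",  # namespaced (guideline 4)
    "description": (
        "Retrieves current weather (temperature, precipitation, condition) "
        "for a city. Use only for present-day conditions, not forecasts; "
        "for forecasts use Weather_GetForecast. "          # boundaries (2)
        "Example: 'What's the weather in Paris right now?'"  # example (3)
    ),
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}
# Guideline 5 applies to the response: return enough context (units,
# observation time) for the model to make good follow-up decisions.
```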
To verify your descriptions work across platforms:
```yaml
# routing_tests.yaml
tests:
  - prompt: "Deploy my React app to Azure"
    expected_route: skill
    expected_target: azure-prepare

  - prompt: "List my storage accounts"
    expected_route: mcp_tool
    expected_target: azure-storage

  - prompt: "What is Azure Functions?"
    expected_route: llm_knowledge
    expected_target: null  # No tool needed
```

Run these tests against multiple LLMs to ensure consistent routing.
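No standard runner exists for this file, so as a sketch: a small harness might load the YAML and compare expectations against whatever routing decision the platform under test exposes (`route_prompt` below is a hypothetical hook):

```python
import yaml  # pip install pyyaml

def check_routing(path: str, route_prompt) -> list[str]:
    """Compare expected routes in routing_tests.yaml against a platform's
    actual routing decision. `route_prompt(prompt)` is a hypothetical hook
    returning (route, target), e.g. ("skill", "azure-prepare")."""
    with open(path) as f:
        tests = yaml.safe_load(f)["tests"]
    failures = []
    for t in tests:
        route, target = route_prompt(t["prompt"])
        if route != t["expected_route"] or target != t["expected_target"]:
            failures.append(
                f"{t['prompt']!r}: expected {t['expected_route']}/"
                f"{t['expected_target']}, got {route}/{target}")
    return failures
```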
| Resource | URL |
|---|---|
| OpenAI Function Calling | https://platform.openai.com/docs/guides/function-calling |
| Anthropic Tool Use | https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview |
| Anthropic: Writing Tools for Agents | https://www.anthropic.com/engineering/writing-tools-for-agents |
| Anthropic: Advanced Tool Use | https://www.anthropic.com/engineering/advanced-tool-use |
| Gemini Function Calling | https://ai.google.dev/gemini-api/docs/function-calling |
| GitHub Copilot Skills | https://docs.github.com/en/copilot/concepts/agents/about-agent-skills |
| Copilot Embedding Routing | https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/ |
Document Version: 2.0 | Last Updated: 2026-02-05
What's New in v2.0:
- Added TL;DR Quick Start section
- Added Three Levels (Progressive Disclosure) diagram
- Added Evaluation-First Development (§4.1)
- Added Degrees of Freedom guidance (§4.2)
- Added Script Guidelines (§4.9)
- Added Workflow Checklist Pattern (§7.6)
- Added Security Considerations (§8.3)
- Added Anthropic Agent Skills references