A Practical Guide for Building Agent Capabilities
"Skills orchestrate the how. MCP Tools execute the what."
This guide consolidates learnings from building AI agent capabilities for Azure, focusing on the relationship between Skills (workflow orchestration) and MCP Tools (discrete operations). Whether you're creating a new Copilot Skill, building an MCP server, or integrating both, this document provides the architecture patterns, development best practices, and real-world case studies you need to build effective, non-conflicting agent capabilities. The core principle is simple: Skills act as the "brain" that orchestrates complex workflows, while MCP Tools serve as the "hands" that execute individual operations—and Skills should invoke MCP Tools, not duplicate them.
- Introduction
- Architecture Overview
- When to Use What
- Skills Development Guide
- Skill Organization Patterns
- MCP Tool Development Guide
- Integration Patterns
- DOs and DON'Ts
- Case Studies
- Testing & Evaluation
- References
- Appendices
| Concept | What It Is | When to Use |
|---|---|---|
| Skill | A folder with SKILL.md that teaches the agent how to do something | Multi-step workflows, decisions, code generation |
| MCP Tool | A function the agent can call to do something | Single operations, data retrieval, queries |
| The Pattern | Skills invoke MCP Tools, not the other way around | Always |
One-liner: Skills are onboarding guides for AI agents. MCP Tools are the buttons they press.
Is this a workflow with decisions?
├─ YES → Create a SKILL
└─ NO → Is it a single operation?
   ├─ YES → Create an MCP TOOL
   └─ BOTH needed → Skill orchestrates, MCP executes
Skills load content incrementally to stay lean:
- Level 1: Metadata — `name` + `description` (always in system prompt, ~50 tokens)
- Level 2: Instructions — Full SKILL.md (loaded when triggered, <500 tokens ideal)
- Level 3: Resources — `references/` + `scripts/` (loaded on demand)
- Match instruction detail to task risk — Fragile operations need exact scripts; flexible tasks need guidance
- Use checklists for multi-step workflows — Makes progress trackable and resumable
- Test first, document minimally — Write evals before writing docs (§4.1)
- Skills call MCP for patterns — Single source of truth, no duplication
my-skill/
├── SKILL.md # Workflow instructions + frontmatter
├── references/ # Deep-dive docs (loaded on demand)
└── scripts/ # Executable code (not loaded, just run)
As AI agents become the primary interface for developer workflows, the quality and consistency of agent guidance directly impacts developer success. Skills and MCP Tools are two distinct systems that provide capabilities to AI agents—and when they conflict or overlap without coordination, developers lose trust and productivity.
This guide provides a framework for building Skills and MCP Tools that work together rather than compete. The principles here apply to any domain where both systems coexist.
A Note on Scope and Examples: This guide uses Azure as the primary source of examples because that's where we conducted our research and validated these patterns. However, the guidance is agent-agnostic and domain-agnostic—the same principles apply whether you're building Skills for AWS, GCP, internal platforms, or any other domain. Similarly, while we reference GitHub Copilot as the implementation platform, the Skill pattern itself works across any agent that supports structured capability injection.
| Audience | What You'll Learn |
|---|---|
| Architects | How Skills and MCP fit together, when to recommend each, how to design for the hybrid pattern |
| Skill Developers | How to write Skills that complement (not compete with) MCP Tools, frontmatter best practices |
| MCP Contributors | When to add a new tool vs. defer to Skills, naming conventions, integration patterns |
| Platform Teams | How to evaluate existing Skills/Tools, identify overlaps, and improve routing |
Before diving in, here are the key questions this guide addresses:
- When do I create a Skill vs. an MCP Tool? → Section 3: When to Use What
- How do I write a Skill that invokes MCP Tools? → Section 6: Integration Patterns
- What does good Skill frontmatter look like? → Section 4: Skills Development Guide
- How do I avoid creating conflicting guidance? → Section 7: DOs and DON'Ts
- How do I test that my Skill routes correctly? → Section 9: Testing & Evaluation
AI agents need clear, unambiguous guidance to help users effectively. When two capability systems exist side-by-side, problems emerge:
| System | Purpose | Control Model |
|---|---|---|
| MCP Tools | Execute discrete operations via JSON-RPC | Model-controlled (LLM decides when to invoke) |
| Copilot Skills | Orchestrate multi-step workflows via prompts | User-controlled (explicit selection) |
The problem: When both are exposed to the LLM simultaneously without clear routing, you get:
- ❌ Duplicate invocations — LLM calls both systems for the same request
- ❌ Name collisions — Same capability name exists in both (e.g., "deploy")
- ❌ Conflicting guidance — Different systems suggest incompatible approaches
- ❌ Inconsistent experience — Same prompt yields different results
Real-World Example: In the Azure ecosystem, research found that MCP's best practices for Static Web Apps recommended the SWA CLI (`npx swa deploy`), while Skills recommended `azd up` with Bicep. No coordination layer existed—the agent picked randomly, leading to deployment failures when incompatible approaches were mixed. See Section 8: Case Studies for the full analysis and solution.
┌─────────────────────────────────────────────────────────────────────┐
│ USER REQUEST │
│ "Deploy my app to Azure" │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ LLM ROUTER │
│ Analyzes intent, context, and decides execution path │
└─────────────────────────────────────────────────────────────────────┘
│ │
┌─────────▼─────────┐ ┌─────────▼─────────┐
│ SKILL LAYER │ │ MCP TOOL LAYER │
│ 🧠 THE BRAIN │ │ 🖐️ THE HANDS │
│ │ │ │
│ • Workflows │ │ • CRUD Operations │
│ • Decisions │ │ • Queries │
│ • Best Practices │ │ • Direct API Calls│
│ • Multi-step │ │ • Single Actions │
│ Guidance │ │ │
└─────────┬─────────┘ └─────────▲─────────┘
│ │
└───────────────────────────┘
Skills INVOKE MCP tools
| Component | Role | Control Model | Analogy |
|---|---|---|---|
| Skills | Workflow orchestration | User-controlled (explicit selection) | The Brain |
| MCP Tools | Discrete operations | Model-controlled (LLM decides) | The Hands |
The Pattern:
User Request → SKILL (user-initiated workflow) → MCP TOOLS (model-executed actions)
This section explains the foundational architecture of MCP and Skills. Understanding these primitives is essential before building capabilities—the architecture dictates what goes where.
The Model Context Protocol (MCP) is an open standard created by Anthropic that defines how AI applications (hosts) connect to external data sources and tools (servers). It uses JSON-RPC 2.0 over stdio or HTTP for transport.
┌─────────────────────────────────────────────────────────────────────┐
│ MCP HOST │
│ (VS Code, Claude Desktop, Copilot CLI) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ MCP Client │ │ MCP Client │ │ MCP Client │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└──────────┼─────────────────┼─────────────────┼──────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ MCP Server │ │ MCP Server │ │ MCP Server │
│ (Azure) │ │ (GitHub) │ │(Filesystem)│
└────────────┘ └────────────┘ └────────────┘
Key architectural principle: One host can connect to multiple servers simultaneously. Each server exposes capabilities through three primitives, each with a distinct control model.
| Primitive | Purpose | Control Model | Who Decides When to Use |
|---|---|---|---|
| Tools | Executable functions for actions | Model-controlled | The LLM decides based on context |
| Prompts | Reusable interaction templates | User-controlled | The user explicitly selects |
| Resources | Data/context sources | Application-controlled | The host application decides |
Understanding control models is critical:
- Model-controlled (Tools): The LLM sees the tool's schema and decides autonomously when to call it. You cannot prevent the LLM from calling a tool—you can only influence its decision through descriptions.
- User-controlled (Prompts): The user must explicitly select a prompt. The LLM cannot invoke it on its own.
- Application-controlled (Resources): The host application determines when to load resources into context.
"Tools enable models to interact with external systems... Each tool is uniquely identified by a name and includes metadata describing its schema." — MCP Tools Documentation
"Prompts are designed to be user-controlled, meaning they are exposed from servers to clients with the intention of the user being able to explicitly select them for use." — MCP Prompts Documentation
Copilot Skills are conceptually similar to MCP Prompts—they're user-controlled workflow templates. However, Skills are implemented differently (as markdown files with frontmatter, not JSON-RPC endpoints). The key insight:
| Concept | MCP Term | Copilot Implementation | Control |
|---|---|---|---|
| Discrete operation | Tool | MCP Tool | Model-controlled |
| Workflow template | Prompt | Skill (SKILL.md) | User-controlled |
| Context data | Resource | Skill references/ | Application-controlled |
Architectural clarification: The MCP specification describes Tools as "model-controlled" and Prompts (the primitive Skills mirror) as "user-controlled," but in practice (particularly in Copilot's implementation), both use similar routing mechanisms:
- Both populate the context window with their descriptions
- Both are selected via embedding/semantic similarity matching
- Both appear as "tool execution" in agent logs
The practical difference is not how they're invoked but what they're designed for: Skills provide rich workflow orchestration with frontmatter (USE FOR, DO NOT USE FOR, INVOKES), while MCP Tools execute discrete operations. The coordination challenge is really about description collision—when a Skill and Tool have overlapping descriptions, either may be selected regardless of your intended workflow. This is why clear, differentiated descriptions matter more than the conceptual skill-vs-tool distinction.
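To make description collision concrete, here is a small self-contained sketch. Plain bag-of-words cosine similarity stands in for the host's embedding model (an assumption for illustration only), and all four descriptions are hypothetical:

```python
from collections import Counter
from math import sqrt

def similarity(a: str, b: str) -> float:
    """Cosine similarity over lowercase word counts (toy stand-in for embeddings)."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[t] * wb[t] for t in wa)
    norm = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

request = "deploy my app to azure"

# Overlapping descriptions: both score high, so routing is effectively a coin flip.
print(similarity(request, "Deploy applications to Azure"))
print(similarity(request, "Deploy an app to Azure"))

# Differentiated descriptions separate: the workflow skill still matches,
# while the narrowly-worded tool drops to zero for this request.
print(similarity(request, "WORKFLOW SKILL - full deployment workflow: prepare, validate, deploy"))
print(similarity(request, "EXECUTION TOOL - run a single azd command"))
```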
A Skill is a structured markdown file that provides workflow guidance to the LLM. Unlike MCP Tools (which execute code), Skills inject prompts and context that guide the LLM's behavior.
my-skill/
├── SKILL.md ◄── Primary skill definition (frontmatter + workflow)
│ ├── ---
│ │ name: my-skill
│ │ description: |
│ │ **WORKFLOW SKILL** - [description]
│ │ USE FOR: [triggers]
│ │ DO NOT USE FOR: [anti-triggers]
│ │ INVOKES: [mcp tools]
│ │ ---
│ └── [Workflow body with steps]
│
├── references/ ◄── Supplemental materials (deep-dive docs)
│ ├── services/ • Service-specific guidance
│ ├── recipes/ • Step-by-step procedures
│ └── patterns/ • Reusable patterns
│
└── scripts/ ◄── Automation scripts
Understanding when components load is critical for optimization:
| Component | When Loaded | Purpose | Token Budget |
|---|---|---|---|
| `SKILL.md` | Always (when skill matches) | Primary definition, routing logic | < 500 tokens (soft), < 5,000 (hard) |
| `references/*.md` | On-demand (LLM requests) | Deep-dive docs, patterns, recipes | < 1,000 tokens each |
| `scripts/` | Never (execution only) | Automation, not LLM context | N/A |
Why token budgets matter: Skills compete for context window space. A 5000-token skill leaves less room for user code, conversation history, and MCP tool schemas. Lean skills perform better.
Skills use progressive disclosure to keep context lean. Only load what's needed, when it's needed.
┌─────────────────────────────────────────────────────────────────────┐
│ LEVEL 1: METADATA (Always in System Prompt) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ name: azure-deploy │ │
│ │ description: Deploy applications to Azure... │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Pre-loaded at startup for ALL installed skills (~50 tokens) │
│ │
│ LEVEL 2: INSTRUCTIONS (Loaded on Trigger) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ # Azure Deploy │ │
│ │ ## Steps │ │
│ │ 1. Validate prerequisites... │ │
│ │ 2. Run deployment... │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Loaded when skill matches user request (<500-5000 tokens) │
│ │
│ LEVEL 3: RESOURCES (Loaded on Demand) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ references/container-apps.md ← Loaded if CA detected │ │
│ │ references/functions.md ← Loaded if Functions needed │ │
│ │ scripts/validate.py ← Executed, output only │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ ↓ Loaded only when skill references them (unlimited) │
└─────────────────────────────────────────────────────────────────────┘
Why this matters: Level 1 is ~50 tokens per skill. Level 2 is 500-5000 tokens. Level 3 can be unlimited but only loads when referenced. This keeps agents fast and focused.
The description field in frontmatter determines whether your skill gets invoked. This is the LLM's only signal for routing decisions. A poor description means your skill won't trigger—or will trigger incorrectly.
Key frontmatter elements (detailed in Section 4):
- USE FOR: Trigger phrases that should activate this skill
- DO NOT USE FOR: Anti-triggers that should route elsewhere
- INVOKES: MCP tools this skill calls (helps LLM understand the relationship)
When building a skill ecosystem, not all skills are equal. Some orchestrate primary workflows (like deployment); others provide deep-dive knowledge for specific services. Separating these into tiers prevents confusion and improves maintainability.
┌────────────────────────────────────────────────────────────────────┐
│ TIER 1: CORE SKILLS │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PREPARE │ → │ VALIDATE │ → │ DEPLOY │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Purpose: Orchestrate the primary development workflow │
│ Ownership: Central skills team │
│ Invocation: User wants to build, prepare, validate, or deploy │
└────────────────────────────────────────────────────────────────────┘
│
│ references
▼
┌────────────────────────────────────────────────────────────────────┐
│ TIER 2: SERVICE-SPECIFIC SKILLS │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Azure │ │ Azure │ │ Azure │ │ Azure │ ... │
│ │ Functions │ │ Container │ │ Cosmos │ │ Redis │ │
│ │ Skill │ │ Apps Skill │ │ DB Skill │ │ Skill │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ Purpose: Deep-dive guidance for specific Azure services │
│ Ownership: Product and service teams │
│ Invocation: User asks service-specific questions (not workflow) │
└────────────────────────────────────────────────────────────────────┘
- Separation of concerns: Workflow logic (Tier 1) is different from domain knowledge (Tier 2). Mixing them creates bloated, hard-to-maintain skills.
- Ownership clarity: The team that owns deployment workflows shouldn't have to be experts in every Azure service. Tier 2 lets service teams own their domain.
- Routing precision: "Deploy my app" should always go to a core skill, even if the app uses Functions. "How do Functions triggers work?" should go to the Functions skill directly.
- Composability: Tier 1 skills can reference Tier 2 skills as needed, enabling modularity without duplication.
| Aspect | Tier 1 (Core) | Tier 2 (Service) |
|---|---|---|
| Scope | Primary workflow (build/deploy) | Service-specific knowledge |
| When Invoked | "Deploy my app", "Prepare for Azure" | "How do Functions triggers work?" |
| Can Reference | Tier 2 skills, MCP tools | MCP tools only |
| Ownership | Central team | Product teams |
| Update Frequency | Less frequent (workflow stability) | More frequent (service changes) |
Example from Azure: The `azure-prepare` skill (Tier 1) handles the workflow of preparing any app for Azure. When it detects a Functions app, it references the `azure-functions` skill (Tier 2) for service-specific configuration patterns, then calls MCP tools like `azure-functionapp` for resource queries.
This section answers the most common question: "Should this be a Skill or an MCP Tool?" The answer depends on intent, scope, and control model.
Incorrect routing leads to poor user experiences:
| Routing Error | Consequence | Example |
|---|---|---|
| Skill when Tool needed | Slow, over-engineered response | User asks "list my VMs" → Gets a workflow lecture instead of a list |
| Tool when Skill needed | Incomplete, context-free action | User asks "deploy my app" → Tool runs azd up without validation |
| Both invoked | Conflicting guidance, wasted tokens | Both Skill and Tool answer, giving different advice |
| Neither invoked | Capability gap | Request falls through without any response |
The goal is single, correct routing for every user request.
Ask: "Is this a workflow or an operation?"
User: "Deploy my app"
│
▼
┌────────────────┐
│ Is it a │
│ workflow task? │
└───────┬────────┘
│
┌─────────────┴─────────────┐
│ YES NO │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ SKILL │ │ MCP TOOL │
│ │ │ │
│ • Deploy │ │ • List │
│ • Create │ │ • Get │
│ • Set up │ │ • Query │
│ • Configure │ │ • Run command │
└───────────────┘ └───────────────┘
Workflow = Multiple steps, decisions required, generates artifacts
Operation = Single action, no decisions, returns data or executes command
The verb in a user request often signals the correct route:
| Verb | Route | Reason | Example Request |
|---|---|---|---|
| Deploy, Create, Set up, Configure | SKILL | Multi-step workflow | "Deploy my React app" |
| List, Get, Show, Query, Check | MCP TOOL | Data retrieval | "List my storage accounts" |
| Help, Guide, Walk through, Explain | SKILL | Guidance needed | "Help me set up CI/CD" |
| Run, Execute | MCP TOOL | Direct execution | "Run azd up" |
| Troubleshoot, Debug, Diagnose | SKILL first | Then MCP for data | "Why is my app failing?" |
| Optimize, Review, Analyze | SKILL | Analysis workflow | "Review my architecture" |
Not every request maps cleanly to Skill or Tool. Here's how to handle ambiguity:
| Ambiguous Request | Resolution | Rationale |
|---|---|---|
| "Create a storage account" | SKILL | "Create" implies workflow; user needs guidance on SKU, replication, etc. |
| "Create storage account named 'myacct' in eastus, Standard_LRS" | MCP TOOL | Fully specified; no decisions needed |
| "Deploy" (no context) | SKILL | Ask clarifying questions via workflow |
| "azd up" (explicit command) | MCP TOOL | User knows exactly what they want |
| "Set up monitoring" (broad) | SKILL | Workflow to determine what to monitor, how, and where |
| "Show me the metrics for my app" | MCP TOOL | Data retrieval, specific ask |
Rule of thumb: If the user provides all required parameters explicitly, route to Tool. If decisions remain, route to Skill.
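To make the heuristic concrete, here is a minimal routing sketch. The verb lists come from the table above; the regex check for "fully specified" requests is an illustrative assumption, not a production router:

```python
import re

SKILL_VERBS = {"deploy", "create", "set", "configure", "help", "guide",
               "walk", "explain", "troubleshoot", "debug", "diagnose",
               "optimize", "review", "analyze"}
TOOL_VERBS = {"list", "get", "show", "query", "check", "run", "execute"}

def route(request: str) -> str:
    """Return 'SKILL' or 'MCP_TOOL' using the verb heuristic from the table above."""
    first_verb = request.lower().split()[0]
    # Fully specified requests (explicit names, flags, regions) go to a tool
    # even when the verb suggests a workflow.
    fully_specified = bool(re.search(r"named '[^']+'|--\w+|\bin \w+,", request))
    if first_verb in TOOL_VERBS or fully_specified:
        return "MCP_TOOL"
    if first_verb in SKILL_VERBS:
        return "SKILL"
    return "SKILL"  # ambiguous -> prefer guidance over blind execution

assert route("List my storage accounts") == "MCP_TOOL"
assert route("Deploy my React app") == "SKILL"
assert route("Create storage account named 'myacct' in eastus, Standard_LRS") == "MCP_TOOL"
```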
Add this to LLM system prompts to improve routing consistency:
```markdown
## Tool Routing Rules

BEFORE invoking any capability, determine the correct route:

### Route to SKILL when:
- Request involves multiple steps: "deploy my app", "set up monitoring"
- Request needs decisions: "what should I use for...", "help me choose..."
- Request generates code: "create azure.yaml", "generate Bicep"
- Request follows workflow: prepare → validate → deploy
- User says: "help me", "guide me", "walk me through"

### Route to MCP TOOL when:
- Request is data retrieval: "list my...", "show me...", "get..."
- Request is single operation: "delete this", "query logs", "run azd up"
- Request targets specific resource: "storage account named X"
- Skill step explicitly invokes MCP tool
- User says: "just run", "execute", "check status"

### When BOTH are needed (Skill invokes Tool):
- Skill orchestrates the workflow
- Skill calls MCP Tool for specific operations within the workflow
- Example: azure-diagnostics (Skill) calls azure-applens (Tool) for diagnostic data
```

Real-world routing decisions:
| User Request | Route | Target | Why |
|---|---|---|---|
| "Deploy my app to Azure" | SKILL | azure-prepare |
New deployment workflow |
| "Run azd up" | MCP | azure-azd |
Direct command execution |
| "List my storage accounts" | MCP | azure-storage |
Data query |
| "Set up Key Vault" | SKILL | azure-security |
Workflow guidance |
| "Get secret 'api-key'" | MCP | azure-keyvault |
Direct operation |
| "What's wrong with my app" | SKILL | azure-diagnostics |
Analysis workflow |
| "Check resource health status" | MCP | azure-resourcehealth |
Status query |
| "Create a new Function App" | SKILL | azure-functions |
Creation workflow |
| "List my Function Apps" | MCP | azure-functionapp |
Data query |
| "How do I configure CORS for SWA?" | SKILL | azure-prepare |
Guidance needed |
This section provides practical guidance for building effective Skills. A well-designed Skill has a clear purpose, triggers correctly, and integrates smoothly with MCP Tools.
Principle: Build evaluations first, then write minimal instructions.
Don't document everything the agent might need. Document only what it gets wrong without guidance.
1. TEST WITHOUT SKILL → Run agent on representative tasks
↓
2. IDENTIFY GAPS → Where does it fail? What does it get wrong?
↓
3. CREATE EVALS → Structure test scenarios
↓
4. WRITE MINIMAL DOCS → Only what's needed to pass evals
↓
5. ITERATE → Refine based on results
"Building a skill is not the same as training a model. It is closer to writing an onboarding guide for a new hire." — Anthropic
The agent already knows how to code. Your skill teaches it your patterns, constraints, and preferences.
Use Waza for structured skill evaluation:
```bash
# Generate eval from existing skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml
```

Match instruction specificity to task fragility:
| Freedom Level | When to Use | Risk if Too Loose | Risk if Too Tight |
|---|---|---|---|
| High | Multiple valid approaches exist | Agent picks suboptimal path | Unnecessary constraints |
| Medium | Preferred pattern with variations | Inconsistent results | Missed edge cases |
| Low | Fragile, error-prone operations | Broken systems, data loss | Agent can't adapt |
Rule of thumb: The more damage a wrong approach can cause, the more prescriptive your instructions should be.
High freedom (guidance only):

```markdown
## Code Review Process
1. Analyze code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability
```

Low freedom (exact script):

````markdown
## Database Migration
Run exactly this script:

```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.
````
### 4.3 The Skill Development Process
Before writing code, answer these questions:
| Question | Why It Matters | Wrong Answer = Don't Build |
|----------|----------------|---------------------------|
| What workflow does this skill orchestrate? | Skills are for workflows, not single operations | "It lists resources" → That's an MCP Tool |
| What decisions does the user need help with? | Skills provide guidance | "None, just run the command" → MCP Tool |
| What MCP Tools will this skill invoke? | Skills complement Tools | "None" → May not need a Skill |
| What triggers should activate this skill? | Routing depends on triggers | "Everything" → Too broad, will conflict |
| What should NOT trigger this skill? | Anti-triggers prevent conflicts | "Nothing" → Will have false positives |
#### Skill Development Lifecycle
1. IDENTIFY → Is this a workflow? What decisions are involved?
2. DESIGN → Map the workflow steps. What MCP tools are needed?
3. WRITE → Frontmatter first (triggers). Then body (steps).
4. TEST → Trigger accuracy tests. Does it invoke correctly?
5. ITERATE → Refine anti-triggers based on false positives.
> **Tip:** For multi-step workflows, use the [Workflow Checklist Pattern (§7.6)](#76-workflow-checklist-pattern) to make progress trackable.
### 4.4 SKILL.md Structure
Every skill follows this structure:
#### Naming Constraints
| Field | Constraints |
|-------|-------------|
| `name` | • Max 64 characters<br>• Lowercase letters, numbers, hyphens only<br>• No XML tags<br>• No reserved words: `anthropic`, `claude`, `openai`, `copilot` |
| `description` | • Must be non-empty<br>• Max 1024 characters<br>• No XML tags<br>• Write in third person ("Deploys applications..." not "I deploy...") |
```yaml
---
name: my-skill-name
description: |
  **WORKFLOW SKILL** - One-line description of what the skill does.
  USE FOR: trigger phrase 1, trigger phrase 2, trigger phrase 3.
  DO NOT USE FOR: scenario1 (use other-skill), scenario2 (use mcp-tool).
  INVOKES: `mcp-tool-1`, `mcp-tool-2` for execution.
  FOR SINGLE OPERATIONS: Use `mcp-tool` directly for simple queries.
---
```

Skill Body Structure:

```markdown
# Skill Title
## When to Use This Skill
Activate when user wants to:
- Specific action 1
- Specific action 2
## Prerequisites
- Required MCP tools: `azure-xxx`, `azure-yyy`
- Required permissions: list
## MCP Tools Used
| Step | MCP Tool | Command | Purpose |
|------|----------|---------|---------|
| 1 | `azure-xxx` | `xxx_list` | Gather data |
| 3 | `azure-yyy` | `yyy_create` | Execute action |
## Steps
### Step 1: Action Name
**Using MCP (Preferred):**
Invoke `azure-xxx` MCP tool:
- Command: `command_name`
- Parameters: `subscription`, `resource-group`
**CLI Fallback (if MCP unavailable):**
az command --subscription X
## Related Skills
- For X: `azure-x-workflow`
- For Y: `azure-y-guide`
```

The frontmatter is the most critical part—it determines when your skill is invoked. The LLM uses the `description` field to decide whether to route a request to your skill.
1. User makes a request: "Deploy my React app to Azure"
2. LLM scans all available skills' `description` fields
3. LLM matches request keywords against skill descriptions
4. Best-matching skill is invoked (or none, if no match)
Implication: If your description doesn't contain the right trigger phrases, your skill won't be invoked—even if it's the right tool for the job.
| Element | Purpose | Example |
|---|---|---|
| `name` | Unique identifier | `azure-deploy` |
| `description` | Triggers + anti-triggers + relationships | See below |
```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator),
  image extraction (use image-extractor).
  INVOKES: pdf-tools MCP for extraction, file-system for I/O.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly for simple extractions.
```

Why each element matters:
| Element | Purpose | What Happens Without It |
|---|---|---|
| `**WORKFLOW SKILL**` | Signals multi-step nature | LLM may route single ops here |
| `USE FOR:` | Explicit triggers | Skill won't trigger on relevant requests |
| `DO NOT USE FOR:` | Anti-triggers | False positives, conflicts with other skills |
| `INVOKES:` | MCP relationship | LLM doesn't know skill uses tools |
| `FOR SINGLE OPERATIONS:` | Bypass guidance | Users confused about when to use skill vs. tool |
Add a prefix to clarify the skill type:
| Prefix | Use When |
|---|---|
| `**WORKFLOW SKILL**` | Multi-step orchestration |
| `**UTILITY SKILL**` | Single-purpose helper |
| `**ANALYSIS SKILL**` | Read-only analysis/reporting |
Effectiveness Note: These prefixes improve routing based on qualitative testing and observed behavior during development. Formal A/B testing with quantified metrics would strengthen these recommendations. The prefixes work because they add semantic signal to the description field, which LLMs use for routing decisions (see Appendix A). In practice, we've observed fewer false positives when prefixes clearly signal the skill's intent.
Skills are scored on compliance using the criteria below. Target: Medium-High or better.
Tooling Note: The scoring criteria described here were developed using internal evaluation frameworks (sensei for skill analysis, waza for trigger testing). These tools are not currently publicly available, but you can apply the same criteria manually or build equivalent tooling. The key is having a consistent rubric for evaluating skill quality before deployment.
| Score | Requirements |
|---|---|
| Low | Description < 150 chars OR no triggers |
| Medium | Description >= 150 chars AND has trigger keywords |
| Medium-High | Has "USE FOR:" AND "DO NOT USE FOR:" |
| High | Medium-High + routing clarity (INVOKES/FOR SINGLE OPERATIONS) |
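A minimal scorer for this rubric; the field checks follow the table above, and treating the presence of `USE FOR:` as "has trigger keywords" is an assumption:

```python
def score_description(desc: str) -> str:
    """Score a skill description against the compliance rubric above."""
    has_use_for = "USE FOR:" in desc       # stands in for "has trigger keywords"
    has_anti = "DO NOT USE FOR:" in desc
    has_routing = "INVOKES:" in desc or "FOR SINGLE OPERATIONS:" in desc
    if has_use_for and has_anti and has_routing:
        return "High"
    if has_use_for and has_anti:
        return "Medium-High"
    if len(desc) >= 150 and has_use_for:
        return "Medium"
    return "Low"

print(score_description("Process PDF files"))  # -> Low
```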
Bad (Low compliance):

```yaml
description: 'Process PDF files'
```

Good (High compliance):

```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator).
  INVOKES: pdf-tools MCP for extraction.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly.
```

Keep skills within these token budgets:

| File | Soft Limit | Hard Limit |
|---|---|---|
| `SKILL.md` | 500 tokens | 5,000 tokens |
| `references/*.md` | 1,000 tokens | 5,000 tokens |
Use a `.token-limits.json` configuration:

```json
{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  },
  "overrides": {
    "README.md": 3000
  }
}
```

Enforcement Note: Token limit enforcement is currently a design pattern, not an automated gate. The `.token-limits.json` file serves as documentation and can be enforced via CI scripts (count tokens using tiktoken or similar). The limits are based on observed context window usage and agent performance degradation with oversized skills. If building automated enforcement, integrate token counting into your skill linting pipeline.
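A sketch of such a CI check, assuming the `.token-limits.json` layout above; the tiktoken `cl100k_base` encoding is an assumption, so match it to your model:

```python
import json
from pathlib import Path

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption
limits = json.loads(Path(".token-limits.json").read_text())

failures = []
for pattern, budget in limits["defaults"].items():
    for path in Path(".").glob(pattern):  # run from the skill's root directory
        # Per-file overrides win over glob defaults.
        file_budget = limits.get("overrides", {}).get(path.name, budget)
        tokens = len(enc.encode(path.read_text()))
        if tokens > file_budget:
            failures.append(f"{path}: {tokens} tokens (limit {file_budget})")

if failures:
    raise SystemExit("Token budget exceeded:\n" + "\n".join(failures))
print("All files within token budget.")
```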
Keep SKILL.md lean. Put deep content in references:
my-skill/
├── SKILL.md # Workflow orchestration only
└── references/
├── services/
│ └── static-web-apps.md # SWA-specific patterns
├── recipes/
│ └── deploy-react.md # Step-by-step for React
└── patterns/
└── error-handling.md # Common error resolutions
Reference them in SKILL.md:
See [SWA Configuration](references/services/static-web-apps.md) for framework-specific settings.

Scripts in `scripts/` are executed, not loaded into context. Write them defensively.
Be explicit about how scripts should be used:
| Instruction | Meaning |
|---|---|
"Run scripts/validate.py" |
Execute the script |
"See scripts/validate.py for the algorithm" |
Read as reference, don't execute |
Scripts should handle errors gracefully—the agent can't debug runtime failures:
```python
# Good: Recovers from errors
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''

# Bad: Crashes unexpectedly
def process_file(path):
    return open(path).read()
```

Document and verify dependencies:
**Requirements:** Python 3.11+, `pypdf` package
Install: `pip install pypdf`

Always use forward slashes: `scripts/helper.py` ✅, `scripts\helper.py` ❌
As the number of skills grows, organization becomes critical. This section covers patterns for structuring related skills to avoid trigger collisions, enable cross-cutting guidance, and maintain clear routing.
When building skills for a large domain (e.g., data services, compute platforms, messaging systems), you face a fundamental choice about granularity:
| Approach | Characteristics | Trade-offs |
|---|---|---|
| One skill per service | Each service gets its own skill with deep, focused content | Precise activation, smaller context; but no cross-service guidance |
| One consolidated skill | All related services in a single skill | Cross-service guidance, single entry point; but bloated context window |
| Orchestrator + service skills | Routing skill delegates to specialized skills | Best of both; but more complex to maintain |
The right choice depends on your domain's complexity and how often users need cross-service guidance.
skills/
├── database-postgres/
├── database-mysql/
├── database-mongodb/
└── storage-blob/
When to use:
- Services are distinct with minimal overlap
- Users rarely ask "which should I use?"
- Each skill is self-contained
Pros: Precise activation, smaller context per invocation, independent evolution
Cons: No cross-service guidance, potential duplication of shared patterns
skills/
└── data-services/
├── SKILL.md # All database + storage content
└── references/
├── postgres.md
├── mysql.md
└── mongodb.md
When to use:
- Services are tightly related
- Users frequently compare options
- Shared patterns dominate (auth, backup, networking)
Pros: Single entry point, cross-service guidance built-in
Cons: Large context window usage, trigger phrase collisions, monolithic maintenance
skills/
├── data-services/ # Orchestrator
│ ├── SKILL.md # Decision trees, comparisons, routing
│ └── references/
│ ├── selection-guide.md
│ └── migration-patterns.md
├── database-postgres/ # Service skill
├── database-mysql/ # Service skill
└── storage-blob/ # Service skill
When to use:
- Domain has both cross-cutting concerns AND deep service-specific content
- Users ask both "which should I use?" AND "how do I configure X?"
- You want to scale the number of services without bloating a single skill
Pros: Cross-service guidance without context bloat, clear routing, independent service skill evolution
Cons: More skills to maintain, requires careful trigger phrase design
The orchestrator skill handles cross-cutting concerns and routing decisions, while service skills handle implementation details.
Orchestrator responsibilities:
- Decision trees ("Which service should I use?")
- Comparison tables (Service A vs. Service B)
- Cross-service patterns (authentication, networking, migration)
- Explicit routing to service skills
Service skill responsibilities:
- Service-specific configuration
- Implementation guides
- Troubleshooting
- Service-specific best practices
The orchestrator's description must explicitly define boundaries:
```yaml
---
name: data-services
description: >
  Data service selection and cross-cutting patterns.
  USE FOR: compare databases, choose data store, data migration strategy,
  which database to use, Service A vs Service B decisions.
  DO NOT USE FOR: service-specific implementation (use database-postgres,
  database-mysql, storage-blob directly for configuration tasks).
---
```

Service skills include reciprocal boundaries:

```yaml
---
name: database-postgres
description: >
  PostgreSQL configuration, authentication, and operations.
  USE FOR: PostgreSQL setup, configuration, query optimization, auth setup.
  DO NOT USE FOR: comparing database options (use data-services).
---
```

Without clear boundaries, both orchestrator and service skills may activate for the same prompt, causing inconsistent behavior.
Problem pattern (collision):
```yaml
# Orchestrator
description: "Help with databases including PostgreSQL setup"

# Service skill
description: "Help with PostgreSQL setup and configuration"
```

Both match "help me set up PostgreSQL" → unpredictable routing.
Solution pattern (clear boundaries):
```yaml
# Orchestrator
description: >
  USE FOR: compare databases, choose data store
  DO NOT USE FOR: PostgreSQL setup (use database-postgres)

# Service skill
description: >
  USE FOR: PostgreSQL setup, configuration, optimization
  DO NOT USE FOR: comparing databases (use data-services)
```

"Help me set up PostgreSQL" → database-postgres
"Should I use PostgreSQL or MySQL?" → data-services
| User Intent | Activated Skill | Rationale |
|---|---|---|
| "Which database should I use?" | Orchestrator | Cross-service decision |
| "Compare PostgreSQL and MySQL" | Orchestrator | Comparison query |
| "Set up PostgreSQL authentication" | Service (postgres) | Service-specific implementation |
| "Optimize my MySQL queries" | Service (mysql) | Service-specific task |
| "Migrate from MySQL to PostgreSQL" | Orchestrator | Cross-service workflow |
When implementing the orchestrator pattern, tests must verify proper routing:
Orchestrator tests:
- Activates for cross-cutting prompts (comparisons, selection, migration)
- Does NOT activate for service-specific prompts
Service skill tests:
- Activates for service-specific prompts
- Does NOT activate for orchestrator prompts (add negative test cases)
```typescript
// Example: Service skill negative tests
// (assumes a triggerMatcher harness from your evaluation framework)
describe('Should NOT Trigger (Orchestrator Handles These)', () => {
  const orchestratorPrompts = [
    'Which database should I use?',
    'Compare PostgreSQL and MySQL',
    'Help me choose a data store',
  ];

  test.each(orchestratorPrompts)(
    'does not trigger on: "%s"',
    (prompt) => {
      const result = triggerMatcher.shouldTrigger(prompt);
      expect(result.triggered).toBe(false);
    }
  );
});
```

Consider adding an orchestrator when:
- Users frequently ask comparison questions — "Should I use A or B?"
- Multiple skills share patterns — Authentication, networking, backup strategies
- The domain is growing — Adding more services that need unified guidance
- Context window is a concern — Individual skills are getting too large
Don't add an orchestrator when:
- Services are unrelated (no cross-service questions)
- The domain is small (2-3 skills with minimal overlap)
- Maintenance overhead isn't justified
| Factor | Flat | Consolidated | Orchestrator |
|---|---|---|---|
| Cross-service guidance | None | Built-in | Via orchestrator |
| Context efficiency | Best | Worst | Good |
| Maintenance complexity | Low | Medium | Higher |
| Trigger collision risk | Low | High | Low (if designed well) |
| Scales with services | Yes | No | Yes |
Rule of thumb: Start with flat (Pattern A). When cross-service questions become common, introduce an orchestrator (Pattern C). Avoid consolidated (Pattern B) unless the domain is small and stable.
This section covers when and how to create MCP Tools. Remember: Tools are model-controlled—the LLM decides when to call them based on schema and description.
MCP Tools are for discrete, atomic operations. The decision framework:
| Criteria | Example | Why It's a Tool |
|---|---|---|
| Exposing a new API endpoint | Key Vault secret retrieval | Direct API wrapper |
| Operation is atomic | List storage accounts | Single request/response |
| Returns data for further processing | Get metrics | LLM needs the output |
| No decisions required | Delete a resource by ID | Parameters fully specify action |
| Can describe in one sentence | "Get the value of a secret from Key Vault" | Clear, bounded scope |
| Criteria | Example | What to Build Instead |
|---|---|---|
| Multi-step workflow | "Deploy my app" | Skill (orchestrates steps) |
| User decisions mid-process | "Set up monitoring" | Skill (guides decisions) |
| Needs context accumulation | "Troubleshoot this error" | Skill (maintains state) |
| Duplicates existing capability | Another way to list VMs | Nothing (use existing) |
1. Single Responsibility

One tool = one operation. Don't create a tool that "creates or updates or deletes" based on parameters. Create three tools.

2. Clear Naming

Names should be verb_noun or noun_verb patterns that clearly indicate the action:
- ✅ `secret_get`, `account_list`, `container_create`
- ❌ `handle_secret`, `manage_storage`, `do_operation`
3. Descriptive Schemas
The description field is how the LLM decides to use your tool. Be explicit:
- ✅ "Get the value of a specific secret from an Azure Key Vault. Returns the secret value and metadata."
- ❌ "Key Vault operations"
4. Skill References

Help the LLM understand when NOT to use your tool by referencing Skills:
FOR FULL WORKFLOW: Use `azure-security` skill for Key Vault setup and configuration.
Tools are defined using JSON Schema. The schema tells the LLM what parameters are available and required:
```json
{
"name": "keyvault_secret_get",
"title": "Get Key Vault Secret",
"description": "Retrieve the value of a specific secret from Azure Key Vault. Returns the secret value and metadata. FOR FULL WORKFLOW: Use azure-security skill for Key Vault setup.",
"inputSchema": {
"type": "object",
"properties": {
"vault_name": {
"type": "string",
"description": "Name of the Key Vault"
},
"secret_name": {
"type": "string",
"description": "Name of the secret to retrieve"
},
"version": {
"type": "string",
"description": "Optional: specific version of the secret"
}
},
"required": ["vault_name", "secret_name"]
}
}
```

Schema Best Practices:
| Element | Best Practice | Why |
|---|---|---|
| `name` | Use `resource_action` pattern | Predictable, searchable |
| `description` | Include skill cross-reference | Helps routing decisions |
| `required` | List only truly required params | Reduces friction |
| Property `description` | Be specific about format/constraints | LLM generates better calls |
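Because the schema is the contract the LLM codes against, it's worth validating incoming calls server-side as well. A minimal sketch using the `jsonschema` package against the Key Vault schema above; the `fetch_secret` helper and the error shape are hypothetical:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

KEYVAULT_SECRET_GET_SCHEMA = {
    "type": "object",
    "properties": {
        "vault_name": {"type": "string"},
        "secret_name": {"type": "string"},
        "version": {"type": "string"},
    },
    "required": ["vault_name", "secret_name"],
}

def fetch_secret(vault_name: str, secret_name: str, version: str = "") -> dict:
    # Hypothetical helper: the real Key Vault API call would go here.
    return {"isError": False, "content": "<secret value>"}

def handle_tool_call(arguments: dict) -> dict:
    """Reject malformed LLM-generated calls before they reach the API."""
    try:
        validate(instance=arguments, schema=KEYVAULT_SECRET_GET_SCHEMA)
    except ValidationError as e:
        # Structured errors let the LLM read the failure and self-correct.
        return {"isError": True, "content": f"Invalid arguments: {e.message}"}
    return fetch_secret(**arguments)
```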
Consistent naming helps both LLMs and developers find the right tool:
Namespace: {platform}-{service}
Command: {resource}_{action}
Examples:
- Namespace: azure-storage
- storage_account_list (list all storage accounts)
- storage_blob_get (get a specific blob)
- storage_container_create (create a container)
- Namespace: azure-keyvault
- keyvault_list (list all vaults)
- keyvault_secret_get (get a secret value)
- keyvault_secret_set (set a secret value)
Naming Rules:
- Namespace = service identity. All tools for a service share the namespace.
- Resource = what you're operating on. Usually the ARM resource type.
- Action = the operation. Use standard verbs: `list`, `get`, `create`, `update`, `delete`, `query`.
Include skill references in MCP tool descriptions to improve routing:
**EXECUTION TOOL** - [One sentence describing what it does].
USE FOR: [Specific operations this tool handles].
FOR FULL WORKFLOW: Use `skill-name` skill for [workflow description].
FOR GUIDANCE: Use `skill-name` skill to understand [concept].
Example:
**EXECUTION TOOL** - Execute Azure Developer CLI (azd) commands.
USE FOR: Running azd up, azd deploy, azd provision, getting deployment logs.
FOR FULL WORKFLOW: Use `azure-deploy` skill (prepare → validate → deploy chain).
FOR GUIDANCE: Use `azure-prepare` skill to configure azure.yaml before running azd.
For configuration patterns and reference material, create best practices files that MCP can serve:
Purpose: Centralize patterns that multiple Skills might need. Skills call `get_azure_bestpractices(resource="X")` instead of embedding duplicate content.
File: `azure-swa-best-practices.txt`

```text
# Azure Static Web Apps Best Practices

## azure.yaml Configuration
services:
  web:
    host: staticwebapp
    ...

## Bicep Patterns
resource staticWebApp 'Microsoft.Web/staticSites@2022-09-01' = {
  ...
}

## Build Output by Framework
| Framework | outputLocation  |
| React     | build           |
| Vue       | dist            |
| Angular   | dist/{project}  |
```

This section describes how Skills and MCP Tools work together. The key insight: Skills should orchestrate, MCP should execute. When this pattern is followed, you get consistent, maintainable, and testable workflows.
The hybrid pattern assigns clear responsibilities:
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ MCP = "WHAT" │ │ Skills = "HOW" │
│ (Patterns & Configurations) │ │ (Workflow Orchestration) │
│ │ │ │
│ • azure.yaml snippets │◄───│ • Detection logic │
│ • Bicep resource patterns │ │ • Workflow steps │
│ • SKU guidance │ │ • Error handling │
│ • Build output by framework │ │ • Decision trees │
│ • API references │ │ • User interaction │
└─────────────────────────────────┘ └──────────────────────────────┘
Single Source of Truth Invokes MCP for patterns
Why this works:
- No duplication: Patterns live in one place (MCP). Skills reference them.
- Easy updates: Change a pattern in MCP; all Skills get the update.
- Clear ownership: MCP team owns patterns; Skill team owns workflows.
- Testable: Test patterns independently from workflows.
Key Principle: Skills call `get_azure_bestpractices(resource="static-web-app")` instead of embedding duplicate content.

Real-World Example: The Static Web Apps routing fix (see Section 8) moved build output patterns from the `azure-prepare` skill into MCP's best practices file. The skill now calls MCP to get the patterns, ensuring consistency.
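A sketch of what the MCP side of this pattern might look like, assuming one best-practices file per resource. The directory layout and section-slicing logic are illustrative assumptions; only the `get_azure_bestpractices` name comes from the doc:

```python
from pathlib import Path

BESTPRACTICES_DIR = Path("bestpractices")  # assumed layout: bestpractices/static-web-app.txt

def get_azure_bestpractices(resource: str, action: str = "all") -> str:
    """Serve canonical pattern text so Skills never embed their own copies."""
    path = BESTPRACTICES_DIR / f"{resource}.txt"
    if not path.exists():
        return f"No best practices recorded for '{resource}'."
    text = path.read_text()
    if action == "all":
        return text
    # Naive per-section slicing by "## <action>" heading (illustrative only).
    for section in text.split("\n## "):
        if section.lower().startswith(action.lower()):
            return "## " + section
    return text
```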
The skill orchestrates the workflow; MCP tools execute the operations:
SKILL orchestrates → MCP executes → SKILL interprets → User output
This pattern ensures:
- Workflow logic stays in Skills — Decisions, branching, error handling
- Execution stays in MCP — API calls, data retrieval, resource operations
- Results are synthesized by Skills — Combine outputs into user-facing guidance
Example: Cost Optimization Workflow
```markdown
## Step 1: Load Best Practices
Use `azure-get_azure_bestpractices` MCP tool with:
- resource: "cost-optimization"
- action: "all"

## Step 2: Discover Resources
Use `azure-storage` MCP tool → `storage_account_list`
Use `azure-cosmos` MCP tool → `cosmos_account_list`

## Step 3: Run Compliance Check
Use `azure-extension_azqr` MCP tool for orphaned resources

## Step 4: Generate Report (Skill logic)
Synthesize MCP results into actionable recommendations
```

| Anti-Pattern | Problem | Correct Pattern |
|---|---|---|
| Skill embeds CLI commands | Bypasses MCP, creates duplication | Skill invokes MCP tool |
| MCP tool includes workflow logic | Tools should be atomic | Move logic to Skill |
| Skill duplicates MCP patterns | Two sources of truth, drift | Skill calls MCP for patterns |
| Tool has no skill reference | LLM doesn't know when to use Skill | Add FOR FULL WORKFLOW in tool description |
| Skill doesn't list MCP dependencies | Hard to maintain, unclear requirements | Add MCP Tools Used section |
The Preparation Manifest connects the three core skills and maintains state across workflow steps:
┌─────────────────────────────────────────────────────────────────────┐
│ PREPARE │
│ Get app Azure-ready │
│ Discovery → Architecture Planning → File Generation → Manifest │
└─────────────────────────────────────────────────────────────────────┘
│ outputs
▼
┌───────────────────────────────┐
│ PREPARATION MANIFEST │
│ .azure/preparation.md │
│ │
│ • Application components │
│ • Generated artifacts │
│ • Deployment config │
│ • Validation requirements │
│ • Decision log │
└───────────────────────────────┘
│ reads
▼
┌─────────────────────────────────────────────────────────────────────┐
│ VALIDATE │
│ Read Manifest → Execute Validation Checks → Update Manifest │
└─────────────────────────────────────────────────────────────────────┘
│ reads
▼
┌─────────────────────────────────────────────────────────────────────┐
│ DEPLOY │
│ Read Manifest → Execute Deployment → Record Outcome │
└─────────────────────────────────────────────────────────────────────┘
Why use a manifest?
- State persistence: Skills are stateless; the manifest maintains context
- Resumability: User can stop and restart the workflow
- Auditability: Decision log shows why choices were made
- Validation: Each skill can verify prerequisites from the manifest
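As a sketch of how skill scripts might interact with the manifest: the `.azure/preparation.md` path comes from the diagram above, while the entry format and helper names are assumptions:

```python
from datetime import datetime, timezone
from pathlib import Path

MANIFEST = Path(".azure/preparation.md")  # path from the diagram above

def log_decision(decision: str, rationale: str) -> None:
    """Append to the decision log so later skills can audit why choices were made."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    with MANIFEST.open("a") as f:
        f.write(f"\n- [{stamp}] {decision}: {rationale}")

def manifest_has(section: str) -> bool:
    """Let VALIDATE and DEPLOY verify prerequisites recorded by PREPARE."""
    return MANIFEST.exists() and section in MANIFEST.read_text()

# e.g. DEPLOY refuses to run until validation was recorded:
# assert manifest_has("Validation requirements"), "Run the validate skill first"
```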
Every skill should have an "MCP Tools Used" section that documents dependencies:
```markdown
## MCP Tools Used in This Skill

| Step | Tool | Command | Purpose |
|------|------|---------|---------|
| 1 | `azure-get_azure_bestpractices` | `get_bestpractices` | Load guidance |
| 3 | `azure-deploy` | `plan_get` | Analyze workspace |
| 5 | `azure-azd` | `up` | Execute deployment |

**If Azure MCP is not enabled:** Run `/mcp add azure` or use CLI fallback.
```

Benefits:
- Developers know what MCP tools to have enabled
- LLM understands the skill-tool relationship
- Maintenance is easier (clear dependencies)
For multi-step workflows, provide a copyable checklist that makes progress trackable:
```markdown
## Deployment Workflow

Copy this checklist and track progress:

Deployment Progress:
- [ ] Step 1: Validate prerequisites (azure.yaml, authentication)
- [ ] Step 2: Run pre-flight checks (azd validate)
- [ ] Step 3: Execute deployment (azd up)
- [ ] Step 4: Verify deployment succeeded
- [ ] Step 5: Run smoke tests

**Step 1: Validate prerequisites**
Check that azure.yaml exists and contains valid configuration...
```
Why this pattern works:
| Benefit | How |
|---|---|
| Verifiable progress | Each checkbox = completed state |
| Resumable | Agent can restart from failed step |
| Visible | User sees exactly where workflow is |
| Debuggable | Failed step is obvious |
When to use:
- Workflows with 3+ sequential steps
- Tasks that might fail mid-way
- Processes where order matters
Cross-reference: See §4.2 Degrees of Freedom for guidance on how prescriptive each step should be.
Good Pattern (azure-observability):

```markdown
| Service | Use When | MCP Tools | CLI |
|---------|----------|-----------|-----|
| Azure Monitor | Metrics, alerts | `azure__monitor` | `az monitor` |
```

Good Pattern (azure-security):

```markdown
### Key Vault
- `azure__keyvault` with command `keyvault_list` - List Key Vaults
- `azure__keyvault` with command `keyvault_secret_get` - Get secret value
```

Good Pattern (declare the skill type):

```yaml
description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation, execution.
```

Good Pattern (document the MCP relationship):

```yaml
description: |
  ...
  INVOKES: `azure-deploy` MCP tool, `azure-azd` MCP tool for execution.
  FOR SINGLE OPERATIONS: Use `azure-azd` MCP tool directly for single azd commands.
```

| Content Type | Belongs In |
|---|---|
| azure.yaml snippets | MCP best practices |
| Bicep patterns | MCP best practices |
| SKU guidance | MCP best practices |
| Detection logic | Skill |
| Workflow steps | Skill |
| Error handling | Skill |
Use Waza-style trigger testing:
```yaml
# trigger_tests.yaml
shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"
shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"
```

Enforce token budgets with a `.token-limits.json`:

```json
{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  }
}
```

Bad:
MCP: azure.yaml template with host: staticwebapp
SKILL: Also contains azure.yaml template with host: staticwebapp
Good:
MCP: Single source of truth for azure.yaml patterns
SKILL: Invokes MCP for patterns, focuses on workflow
Bad (azure-diagnostics problem):
```bash
# Current: Embeds CLI commands directly
az containerapp show --name APP -g RG --query "properties.configuration.registries"
```

Good:
Use azure-applens MCP tool for AI-powered diagnostics, or
Use azure-resourcehealth MCP tool to check availability status.
CLI Fallback (if MCP unavailable):
az containerapp show --name APP -g RGBad (SWA CLI vs azd issue):
MCP: "Use npx swa deploy"
SKILL: "Use azd up with Bicep"
→ Agent picks randomly → ~50% deployment failures
Good:
MCP: Patterns only (azure.yaml, Bicep templates)
SKILL: Workflow only (calls MCP for patterns)
→ Single path → Consistent results
Bad (Low compliance):
```yaml
description: 'Process PDF files'
```

Good (High compliance):

```yaml
description: |
  **WORKFLOW SKILL** - Process PDF files including extraction and merging.
  USE FOR: "extract PDF", "merge PDFs". DO NOT USE FOR: creating PDFs.
```

Bad:
```yaml
description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.
```

Good:

```yaml
description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.
  DO NOT USE FOR: listing resources (use azure-xxx MCP), querying logs (use azure-monitor MCP).
```

Bad:
- `azure-deploy` skill exists
- `azure-deploy` MCP tool exists
- No guidance on which to use
Good:
```yaml
# Skill description:
description: |
  **WORKFLOW SKILL** - Full deployment workflow.
  FOR SINGLE COMMANDS: Use `azure-azd` MCP tool directly.

# MCP tool description:
description: |
  **EXECUTION TOOL** - Execute deployment commands.
  FOR FULL WORKFLOW: Use `azure-deploy` skill.
```

Skills execute in the user's environment with significant privileges. Build defensively.
| Do | Don't |
|---|---|
| Use environment variables for secrets | Hardcode credentials or API keys |
| Validate inputs before passing to scripts | Trust user input blindly |
| Document required permissions | Request more permissions than needed |
| Handle errors gracefully | Let scripts fail silently |
```python
# Good: Explicit error handling, safe defaults
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''
    except PermissionError:
        print(f"Cannot access {path}")
        return ''

# Bad: Fails unexpectedly, no recovery
def process_file(path):
    return open(path).read()  # Crashes on missing file
```

Before approving a skill, check:
- No hardcoded secrets or credentials
- Scripts handle errors without data loss
- External URLs/APIs are documented and necessary
- Destructive operations require confirmation
- No unexpected network calls or data exfiltration
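For the first checklist item, a minimal sketch of reading a credential from the environment and failing fast; the variable name is illustrative:

```python
import os
import sys

def get_required_secret(var: str) -> str:
    """Fail fast with a readable message instead of hardcoding credentials."""
    value = os.environ.get(var)
    if not value:
        sys.exit(f"Missing required environment variable: {var}. "
                 "Set it before running this script; never hardcode it.")
    return value

api_key = get_required_secret("MY_SERVICE_API_KEY")  # illustrative variable name
```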
This section presents real examples of Skills and MCP Tools working together (and conflicts that arose when they didn't). These case studies are drawn from Azure ecosystem development but the patterns apply broadly.
This case study illustrates the core problem this guide addresses: conflicting guidance from uncoordinated systems.
When a user says "deploy my React app to Azure", two systems provided conflicting guidance:
| System | File | Guidance | Result |
|---|---|---|---|
| MCP | `azure-swa-best-practices.txt` | Use SWA CLI (`npx swa deploy`) | ❌ Non-IaC, unreliable |
| Skills | `azure-deploy` + `azure-prepare` | Use `azd up` with Bicep | ✅ IaC, reproducible |
No coordination layer existed → Agent picked randomly → Estimated ~50% deployment failures when conflicting approaches were mixed.
Key Insight: The observation that identified this issue: "I think I found part of the problem. Our azure best practices tool doesn't use azd for SWA guidance which conflicts with the other guidance in both skills and general deployment."
Hybrid Architecture:
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ MCP = "WHAT" │ │ Skills = "HOW" │
│ (Patterns & Configurations) │ │ (Workflow Orchestration) │
│ │ │ │
│ • azure.yaml snippets │◄───│ • Detection logic │
│ • Bicep resource patterns │ │ • Workflow steps │
│ • SKU guidance │ │ • Error handling │
│ • Build output by framework │ │ • Decision trees │
└─────────────────────────────────┘ └──────────────────────────────┘
Changes Made:
- MCP (`azure-swa-best-practices.txt`):
  - Replaced CLI-only guidance with comprehensive azd patterns
  - Added azure.yaml configurations
  - Added Bicep resource patterns
  - Kept SWA CLI as explicit-only alternative
- Skills (`azure-deploy`, `azure-prepare`):
  - Slimmed down, removed duplicate patterns
  - Added `get_azure_bestpractices(resource="static-web-app")` invocation
  - Added SWA detection signals
Metrics Note: The ~50% failure estimate is based on observed behavior during pre-fix testing where the agent would inconsistently apply SWA CLI vs azd approaches. Post-fix formal evaluation is in progress. Early qualitative observations show improved consistency (agent now reliably uses azd for deployment workflows), but quantified failure rate reduction requires controlled testing that is currently underway. We will update this section with hard metrics when available.
| User Prompt | Expected Behavior | Pass Criteria |
|---|---|---|
| "Deploy my React app" | Uses azd, NOT swa CLI | No npx swa commands |
| "Use SWA CLI to deploy" | Uses SWA CLI (explicit) | npx swa deploy allowed |
| "Preview my app locally" | Uses SWA CLI for preview | npx swa start |
Both the `azure-functions` skill and the `azure-functionapp` MCP tool exist.
| User Intent | Route | Target |
|---|---|---|
| "Create a new Function App" | SKILL | azure-functions (creation workflow) |
| "List my Function Apps" | MCP | azure-functionapp (data query) |
| "How do Functions triggers work?" | SKILL | azure-functions (knowledge) |
| "Get function app settings" | MCP | azure-functionapp (data retrieval) |
Skill Description Update:
```yaml
description: |
  **WORKFLOW SKILL** - Create and configure Azure Functions.
  USE FOR: "create function app", "add Azure Function", "set up serverless".
  DO NOT USE FOR: listing functions (use azure-functionapp MCP), querying logs.
  INVOKES: `azure-functionapp` MCP for queries, `azure-azd` for deployment.
```

The `azure-security` skill demonstrates ideal MCP cross-referencing:
MCP Server (Preferred):

- `azure__keyvault` with command `keyvault_list` - List Key Vaults
- `azure__keyvault` with command `keyvault_secret_get` - Get secret value
- `azure__keyvault` with command `keyvault_secret_set` - Set secret value

If Azure MCP is not enabled: Run `/azure:setup` or enable via `/mcp`.

CLI Fallback:

```bash
az keyvault list --subscription $SUB
az keyvault secret show --vault-name $VAULT --name $SECRET
```
This pattern:
- ✅ Prefers MCP tools
- ✅ Documents fallback path
- ✅ Maintains skill focus on workflow, not execution
Building Skills and Tools is only half the job—you need to verify they work correctly. This section covers testing strategies for both trigger accuracy (does the right thing get invoked?) and task completion (does it actually work?).
| Test Type | What It Catches | Consequence of Skipping |
|---|---|---|
| Trigger accuracy | False positives/negatives in routing | Wrong skill invoked; user frustration |
| Task completion | Broken workflows, missing steps | Deployment failures; data loss |
| Regression | Breaking changes from updates | Previously working flows break |
Waza is a framework for evaluating Agent Skills with task completion metrics and trigger accuracy testing:
```bash
# Install
pip install waza

# Generate eval from skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml
```

Test that your skill triggers on the right prompts:
```yaml
# trigger_tests.yaml
name: my-skill-triggers
skill: my-skill

shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"
  - "help me deploy"
  - "configure Azure hosting"

shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"
  - "get my subscription"
  - "query logs"
```

Define tasks with success criteria:
```yaml
# tasks/deploy-app.yaml
id: deploy-app-001
name: Deploy Container App
inputs:
  prompt: "Deploy my app to Azure Container Apps"
  context:
    files: ["Dockerfile", "app.py"]
expected:
  output_contains:
    - "container"
    - "deployed"
  tool_calls:
    required:
      - pattern: "az containerapp"
    forbidden:
      - pattern: "rm -rf"
```

Run evaluations automatically in CI:
```yaml
# .github/workflows/skill-eval.yaml
name: Skill Evaluation
on:
  pull_request:
    paths:
      - 'skills/**'

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install waza
        run: pip install waza
      - name: Run evaluations
        run: waza run evals/my-skill/eval.yaml --output results.json
      - name: Check thresholds
        run: |
          python -c "
          import json
          r = json.load(open('results.json'))
          assert r['summary']['composite_score'] >= 0.8
          "
```

Use Sensei to improve skill frontmatter:
```
# Run on a single skill
Run sensei on my-skill

# Run on all low-adherence skills
Run sensei on all Low-adherence skills
```

Sensei will:
- Score current compliance (Low → High)
- Add USE FOR trigger phrases
- Add DO NOT USE FOR anti-triggers
- Add INVOKES for tool relationships
- Verify token budget
- Run tests
This section provides authoritative sources for deeper learning. These references were used in creating this guide.
**Model Context Protocol (MCP)**

| Resource | URL | What You'll Learn |
|---|---|---|
| MCP Specification | https://modelcontextprotocol.io/specification/latest | Protocol details, message formats |
| MCP Architecture Overview | https://modelcontextprotocol.io/docs/concepts/architecture | Host/client/server relationships |
| MCP Tools Concepts | https://modelcontextprotocol.io/docs/concepts/tools | How tools work, schema definition |
| MCP Prompts Concepts | https://modelcontextprotocol.io/docs/concepts/prompts | User-controlled primitives |
| Code Execution with MCP (Anthropic) | https://www.anthropic.com/engineering/code-execution-with-mcp | Real-world MCP patterns |
**GitHub Copilot**

| Resource | URL | What You'll Learn |
|---|---|---|
| Copilot SDK Architecture | https://deepwiki.com/github/copilot-sdk/3-sdk-architecture | How Copilot integrates extensions |
| Awesome Copilot | https://github.com/github/awesome-copilot | Curated list of resources |
| Maximizing Copilot's Agentic Capabilities | https://github.blog/ai-and-ml/github-copilot/how-to-maximize-github-copilots-agentic-capabilities/ | Best practices for agent workflows |
**Azure Implementations**

| Resource | URL | What You'll Learn |
|---|---|---|
| Azure MCP Server | https://github.com/microsoft/mcp/tree/main/servers/Azure.Mcp.Server | Production MCP server implementation |
| GitHub Copilot for Azure Skills | https://github.com/microsoft/GitHub-Copilot-for-Azure/tree/main/plugin/skills | Production skill examples |
| MCP Commands Reference | https://github.com/microsoft/mcp/blob/main/servers/Azure.Mcp.Server/docs/azmcp-commands.md | Available Azure MCP commands |
**Evaluation Tooling**

| Resource | URL | What You'll Learn |
|---|---|---|
| Waza (Skill Evaluation) | https://github.com/spboyer/waza | Testing framework for skills |
| Sensei (Frontmatter Improvement) | https://github.com/spboyer/sensei | Automated skill compliance fixes |
**Agent Patterns & Skills**

| Resource | URL | What You'll Learn |
|---|---|---|
| Building Effective AI Agents | https://www.anthropic.com/research/building-effective-agents | Core agent patterns (routing, chaining, orchestration) |
| Agent Skills Engineering Blog | https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills | Progressive disclosure, skill authoring best practices |
| Claude Skills Cookbook | https://github.com/anthropics/claude-cookbooks/tree/main/skills | Practical skill examples |
| Agent Skills Open Standard | https://agentskills.io/specification | Specification for portable skills |
| User Says | Route To | Why |
|---|---|---|
| "Deploy my app" | SKILL | Workflow |
| "List my resources" | MCP | Data query |
| "Help me set up" | SKILL | Guidance |
| "Run this command" | MCP | Execution |
| "What went wrong" | SKILL | Diagnosis |
| Skill | Primary MCP Tools |
|---|---|
| `azure-prepare` | `azure-deploy` (plan_get), `azure-get_azure_bestpractices` |
| `azure-validate` | `azure-azd` (validate_azure_yaml) |
| `azure-deploy` | `azure-azd` (up, deploy), `azure-deploy` (app_logs_get) |
| `azure-diagnostics` | `azure-applens`, `azure-resourcehealth`, `azure-monitor` |
| `azure-functions` | `azure-functionapp`, `azure-azd` |
| `azure-observability` | `azure-monitor`, `azure-applicationinsights` |
| `azure-security` | `azure-keyvault`, `azure-role` |
```yaml
---
name: azure-{domain}
description: |
  **WORKFLOW SKILL** - {One-line description}.
  USE FOR: {trigger1}, {trigger2}, {trigger3}.
  DO NOT USE FOR: {scenario1} (use {other}), {scenario2}.
  INVOKES: `{mcp-tool-1}`, `{mcp-tool-2}`.
  FOR SINGLE OPERATIONS: Use `{mcp-tool}` directly.
---
```

This appendix explores how different AI platforms decide between using their own knowledge, invoking tools (like MCP), or activating skills/prompts. Understanding these mechanisms can help us write better descriptions and improve routing accuracy.
All LLM-based agents face the same fundamental question: Given a user prompt and a set of available capabilities, which (if any) should be invoked?
The answer varies by platform, but the core mechanisms are similar:
User Prompt → Intent Analysis → Capability Matching → Decision
↓
┌─────────────────────────────────────────┐
│ Answer directly (LLM knowledge) │
│ Invoke tool (function call) │
│ Activate skill/prompt (workflow) │
│ Request clarification (ask user) │
└─────────────────────────────────────────┘
**OpenAI (GPT)**

Mechanism: Function calling via trained policy + orchestration layer
| Component | How It Works |
|---|---|
| Function schemas | Developers define JSON schemas with name, description, parameters |
| Intent matching | Model analyzes prompt against function descriptions |
| Decision | Outputs tool_call message if function matches; otherwise answers directly |
| Confidence | Picks function with highest semantic similarity to prompt |
Key insight: The description field is critical. GPT uses it to decide whether to call your function. Poor descriptions = poor routing.
"The decision to call a function is made purely by the model, based on prompt-to-function intent matching and context." — OpenAI Function Calling Docs
**Anthropic (Claude)**

Mechanism: Tool use with progressive disclosure + MCP integration
| Component | How It Works |
|---|---|
| Tool discovery | Claude can search for tools dynamically (doesn't load all at once) |
| Progressive disclosure | Only loads schemas of relevant tools based on query |
| MCP integration | Uses tool_use blocks via MCP protocol |
| Code orchestration | Can generate code to orchestrate multi-tool workflows |
Key insight: Claude's "progressive disclosure" means it searches for the right tool rather than scanning all tools. Clear, distinct descriptions help Claude find your tool.
"Tools built for agents are most ergonomic—and effective—when they are intuitive for both non-deterministic agents and humans." — Anthropic: Writing Tools for Agents
**Google (Gemini)**

Mechanism: Function declarations with semantic routing
| Component | How It Works |
|---|---|
| Function declarations | Schema with name, description, parameters passed at runtime |
| Intent analysis | Compares user intent to function descriptions |
| Routing | Semantic similarity + context determines function selection |
| Parallel calls | Can call multiple functions simultaneously |
Key insight: Gemini emphasizes the quality of function descriptions—more precise descriptions yield better routing.
"The more descriptive and precise the function definitions are, the better Gemini can match them to user requests." — Gemini Function Calling Docs
**GitHub Copilot (Skills)**

Mechanism: Embedding-guided skill routing with semantic matching
| Component | How It Works |
|---|---|
| Skill frontmatter | YAML with name and description in SKILL.md |
| Embedding matching | Creates vector embeddings of user prompt and skill descriptions |
| Clustering | Groups skills by similarity to narrow candidates |
| On-demand loading | Only loads matching skill content into context |
Key insight: Copilot uses embedding-based semantic similarity. Your skill's description field is converted to a vector and compared against the user's prompt vector. Similar vectors = skill gets invoked.
"Copilot compares the user's prompt against each available skill's description using embedding-based semantic similarity." — GitHub Blog: Making Copilot Smarter
Despite implementation differences, all platforms share these routing principles:
| Principle | Description | Implication for Skill/Tool Authors |
|---|---|---|
| Description is king | The description field drives routing decisions | Write clear, specific descriptions |
| Semantic matching | Embeddings or intent classifiers compare prompt to description | Use the same words users would use |
| Negative examples help | Stating what something doesn't do prevents misrouting | Include "DO NOT USE FOR" sections |
| Context matters | Conversation history influences routing | Skills should be context-aware |
| Confidence thresholds | If no good match, LLM answers directly | Don't force routing—let LLM decide |
Based on this research, our guidance aligns well with how LLMs actually route:
| Our Recommendation | Why It Works |
|---|---|
| USE FOR: trigger phrases | Matches how embedding similarity works |
| DO NOT USE FOR: anti-triggers | Prevents false positives in semantic matching |
| INVOKES: tool list | Helps LLM understand skill-tool relationships |
| FOR SINGLE OPERATIONS | Provides fallback routing guidance |
| Clear, specific descriptions | Improves embedding quality and intent matching |
Some routing behaviors are opaque or model-specific:
| Factor | What We Know | What We Don't Know |
|---|---|---|
| Embedding models | Used for semantic similarity | Exact model, training data |
| Confidence thresholds | Exist, vary by platform | Specific values |
| Priority when tied | First match? Highest score? | Implementation details |
| Context window impact | More tools = more competition | Exact degradation curve |
Anthropic published specific guidance on writing effective tool descriptions:
1. **Be specific about function AND intent**
   - ❌ "Get information about weather"
   - ✅ "Retrieves current weather (temperature, precipitation, condition) for a city. Use only for present-day conditions, not forecasts."
2. **Highlight boundaries explicitly**
   - State what the tool doesn't do
   - Prevents misrouting to similar-sounding tools
3. **Provide usage examples**
   - Short canonical examples improve generalization
   - "Example: 'What's the weather in Paris right now?'"
4. **Namespace tools**
   - Use prefixes: `Weather_GetCurrent`, `Weather_GetForecast`
   - Helps LLM distinguish similar tools
5. **Return meaningful context**
   - Tool responses should enable good follow-up decisions
   - Balance detail and brevity
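Putting the five guidelines together, here is a hypothetical tool definition in Anthropic's tool-use shape; the `Weather_GetCurrent` tool and its schema are illustrative only:

```python
# Hypothetical tool definition applying the five guidelines above.
weather_get_current = {
    "name": "Weather_GetCurrent",  # namespaced (guideline 4)
    "description": (
        "Retrieves current weather (temperature, precipitation, condition) "
        "for a city. Use only for present-day conditions, not forecasts; "
        "for forecasts use Weather_GetForecast. "          # boundaries (2)
        "Example: 'What's the weather in Paris right now?'"  # example (3)
    ),
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}
# Guideline 5 applies to the response: return enough context (units,
# observation time) for the model to make good follow-up decisions.
```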
To verify your descriptions work across platforms:
```yaml
# routing_tests.yaml
tests:
  - prompt: "Deploy my React app to Azure"
    expected_route: skill
    expected_target: azure-prepare

  - prompt: "List my storage accounts"
    expected_route: mcp_tool
    expected_target: azure-storage

  - prompt: "What is Azure Functions?"
    expected_route: llm_knowledge
    expected_target: null  # No tool needed
```

Run these tests against multiple LLMs to ensure consistent routing.
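No standard runner exists for this file, so as a sketch: a small harness might load the YAML and compare expectations against whatever routing decision the platform under test exposes (`route_prompt` below is a hypothetical hook):

```python
import yaml  # pip install pyyaml

def check_routing(path: str, route_prompt) -> list[str]:
    """Compare expected routes in routing_tests.yaml against a platform's
    actual routing decision. `route_prompt(prompt)` is a hypothetical hook
    returning (route, target), e.g. ("skill", "azure-prepare")."""
    with open(path) as f:
        tests = yaml.safe_load(f)["tests"]
    failures = []
    for t in tests:
        route, target = route_prompt(t["prompt"])
        if route != t["expected_route"] or target != t["expected_target"]:
            failures.append(
                f"{t['prompt']!r}: expected {t['expected_route']}/"
                f"{t['expected_target']}, got {route}/{target}")
    return failures
```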
| Resource | URL |
|---|---|
| OpenAI Function Calling | https://platform.openai.com/docs/guides/function-calling |
| Anthropic Tool Use | https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview |
| Anthropic: Writing Tools for Agents | https://www.anthropic.com/engineering/writing-tools-for-agents |
| Anthropic: Advanced Tool Use | https://www.anthropic.com/engineering/advanced-tool-use |
| Gemini Function Calling | https://ai.google.dev/gemini-api/docs/function-calling |
| GitHub Copilot Skills | https://docs.github.com/en/copilot/concepts/agents/about-agent-skills |
| Copilot Embedding Routing | https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/ |
Document Version: 2.0 | Last Updated: 2026-02-05
What's New in v2.0:
- Added TL;DR Quick Start section
- Added Three Levels (Progressive Disclosure) diagram
- Added Evaluation-First Development (§4.1)
- Added Degrees of Freedom guidance (§4.2)
- Added Script Guidelines (§4.9)
- Added Workflow Checklist Pattern (§7.6)
- Added Security Considerations (§8.3)
- Added Anthropic Agent Skills references