Skills, Tools & MCP Development Guide

A Practical Guide for Building Agent Capabilities

"Skills orchestrate the how. MCP Tools execute the what."

This guide consolidates learnings from building AI agent capabilities for Azure, focusing on the relationship between Skills (workflow orchestration) and MCP Tools (discrete operations). Whether you're creating a new Copilot Skill, building an MCP server, or integrating both, this document provides the architecture patterns, development best practices, and real-world case studies you need to build effective, non-conflicting agent capabilities. The core principle is simple: Skills act as the "brain" that orchestrates complex workflows, while MCP Tools serve as the "hands" that execute individual operations—and Skills should invoke MCP Tools, not duplicate them.


Table of Contents

  1. Introduction
  2. Architecture Overview
  3. When to Use What
  4. Skills Development Guide
  5. Skill Organization Patterns
  6. MCP Tool Development Guide
  7. Integration Patterns
  8. DOs and DON'Ts
  9. Case Studies
  10. Testing & Evaluation
  11. References

Appendices


TL;DR — Quick Start

Core Concepts

| Concept | What It Is | When to Use |
|---------|------------|-------------|
| Skill | A folder with SKILL.md that teaches the agent how to do something | Multi-step workflows, decisions, code generation |
| MCP Tool | A function the agent can call to do something | Single operations, data retrieval, queries |
| The Pattern | Skills invoke MCP Tools, not the other way around | Always |

One-liner: Skills are onboarding guides for AI agents. MCP Tools are the buttons they press.

Quick Decision Tree

Is this a workflow with decisions?
├─ YES → Create a SKILL
└─ NO  → Is it a single operation?
         ├─ YES → Create an MCP TOOL
         └─ BOTH → Skill orchestrates, MCP executes

Progressive Disclosure (Three Levels)

Skills load content incrementally to stay lean:

  1. Level 1: Metadata — name + description (always in system prompt, ~50 tokens)
  2. Level 2: Instructions — Full SKILL.md (loaded when triggered, <500 tokens ideal)
  3. Level 3: Resources — references/ + scripts/ (loaded on demand)

Key Principles

  • Match instruction detail to task risk — Fragile operations need exact scripts; flexible tasks need guidance
  • Use checklists for multi-step workflows — Makes progress trackable and resumable
  • Test first, document minimally — Write evals before writing docs (§4.1)
  • Skills call MCP for patterns — Single source of truth, no duplication

Key Files

my-skill/
├── SKILL.md              # Workflow instructions + frontmatter
├── references/           # Deep-dive docs (loaded on demand)
└── scripts/              # Executable code (not loaded, just run)

1. Introduction

Why This Matters

As AI agents become the primary interface for developer workflows, the quality and consistency of agent guidance directly impacts developer success. Skills and MCP Tools are two distinct systems that provide capabilities to AI agents—and when they conflict or overlap without coordination, developers lose trust and productivity.

This guide provides a framework for building Skills and MCP Tools that work together rather than compete. The principles here apply to any domain where both systems coexist.

A Note on Scope and Examples: This guide uses Azure as the primary source of examples because that's where we conducted our research and validated these patterns. However, the guidance is agent-agnostic and domain-agnostic—the same principles apply whether you're building Skills for AWS, GCP, internal platforms, or any other domain. Similarly, while we reference GitHub Copilot as the implementation platform, the Skill pattern itself works across any agent that supports structured capability injection.

Who Should Read This

| Audience | What You'll Learn |
|----------|-------------------|
| Architects | How Skills and MCP fit together, when to recommend each, how to design for the hybrid pattern |
| Skill Developers | How to write Skills that complement (not compete with) MCP Tools, frontmatter best practices |
| MCP Contributors | When to add a new tool vs. defer to Skills, naming conventions, integration patterns |
| Platform Teams | How to evaluate existing Skills/Tools, identify overlaps, and improve routing |

What Questions This Guide Answers

Before diving in, here are the key questions this guide addresses:

  1. When do I create a Skill vs. an MCP Tool? → Section 3: When to Use What
  2. How do I write a Skill that invokes MCP Tools? → Section 7: Integration Patterns
  3. What does good Skill frontmatter look like? → Section 4: Skills Development Guide
  4. How do I avoid creating conflicting guidance? → Section 8: DOs and DON'Ts
  5. How do I test that my Skill routes correctly? → Section 10: Testing & Evaluation

What Problem Are We Solving?

AI agents need clear, unambiguous guidance to help users effectively. When two capability systems exist side-by-side, problems emerge:

| System | Purpose | Control Model |
|--------|---------|---------------|
| MCP Tools | Execute discrete operations via JSON-RPC | Model-controlled (LLM decides when to invoke) |
| Copilot Skills | Orchestrate multi-step workflows via prompts | User-controlled (explicit selection) |

The problem: When both are exposed to the LLM simultaneously without clear routing, you get:

  • Duplicate invocations — LLM calls both systems for the same request
  • Name collisions — Same capability name exists in both (e.g., "deploy")
  • Conflicting guidance — Different systems suggest incompatible approaches
  • Inconsistent experience — Same prompt yields different results

Real-World Example: In the Azure ecosystem, research found that MCP's best practices for Static Web Apps recommended SWA CLI (npx swa deploy), while Skills recommended azd up with Bicep. No coordination layer existed—the agent picked randomly, leading to deployment failures when incompatible approaches were mixed. See Section 9: Case Studies for the full analysis and solution.

The Solution: Clear Separation of Concerns

┌─────────────────────────────────────────────────────────────────────┐
│                         USER REQUEST                                 │
│                    "Deploy my app to Azure"                          │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         LLM ROUTER                                   │
│     Analyzes intent, context, and decides execution path             │
└─────────────────────────────────────────────────────────────────────┘
                     │                           │
          ┌─────────▼─────────┐       ┌─────────▼─────────┐
          │   SKILL LAYER     │       │   MCP TOOL LAYER  │
          │   🧠 THE BRAIN    │       │   🖐️ THE HANDS    │
          │                   │       │                   │
          │ • Workflows       │       │ • CRUD Operations │
          │ • Decisions       │       │ • Queries         │
          │ • Best Practices  │       │ • Direct API Calls│
          │ • Multi-step      │       │ • Single Actions  │
          │   Guidance        │       │                   │
          └─────────┬─────────┘       └─────────▲─────────┘
                    │                           │
                    └───────────────────────────┘
                         Skills INVOKE MCP tools

The Golden Rule

| Component | Role | Control Model | Analogy |
|-----------|------|---------------|---------|
| Skills | Workflow orchestration | User-controlled (explicit selection) | The Brain |
| MCP Tools | Discrete operations | Model-controlled (LLM decides) | The Hands |

The Pattern:

User Request → SKILL (user-initiated workflow) → MCP TOOLS (model-executed actions)

2. Architecture Overview

This section explains the foundational architecture of MCP and Skills. Understanding these primitives is essential before building capabilities—the architecture dictates what goes where.

2.1 MCP Protocol Fundamentals

The Model Context Protocol (MCP) is an open standard created by Anthropic that defines how AI applications (hosts) connect to external data sources and tools (servers). It uses JSON-RPC 2.0 over stdio or HTTP for transport.

┌─────────────────────────────────────────────────────────────────────┐
│                         MCP HOST                                     │
│              (VS Code, Claude Desktop, Copilot CLI)                 │
│                                                                      │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐               │
│   │ MCP Client  │   │ MCP Client  │   │ MCP Client  │               │
│   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘               │
└──────────┼─────────────────┼─────────────────┼──────────────────────┘
           │                 │                 │
           ▼                 ▼                 ▼
    ┌────────────┐    ┌────────────┐    ┌────────────┐
    │ MCP Server │    │ MCP Server │    │ MCP Server │
    │  (Azure)   │    │ (GitHub)   │    │(Filesystem)│
    └────────────┘    └────────────┘    └────────────┘

Key architectural principle: One host can connect to multiple servers simultaneously. Each server exposes capabilities through three primitives, each with a distinct control model.

MCP Primitives

| Primitive | Purpose | Control Model | Who Decides When to Use |
|-----------|---------|---------------|--------------------------|
| Tools | Executable functions for actions | Model-controlled | The LLM decides based on context |
| Prompts | Reusable interaction templates | User-controlled | The user explicitly selects |
| Resources | Data/context sources | Application-controlled | The host application decides |

Understanding control models is critical:

  • Model-controlled (Tools): The LLM sees the tool's schema and decides autonomously when to call it. You cannot prevent the LLM from calling a tool—you can only influence its decision through descriptions.
  • User-controlled (Prompts): The user must explicitly select a prompt. The LLM cannot invoke it on its own.
  • Application-controlled (Resources): The host application determines when to load resources into context.

"Tools enable models to interact with external systems... Each tool is uniquely identified by a name and includes metadata describing its schema." — MCP Tools Documentation

"Prompts are designed to be user-controlled, meaning they are exposed from servers to clients with the intention of the user being able to explicitly select them for use." — MCP Prompts Documentation

Why This Matters for Skills

Copilot Skills are conceptually similar to MCP Prompts—they're user-controlled workflow templates. However, Skills are implemented differently (as markdown files with frontmatter, not JSON-RPC endpoints). The key insight:

| Concept | MCP Term | Copilot Implementation | Control |
|---------|----------|------------------------|---------|
| Discrete operation | Tool | MCP Tool | Model-controlled |
| Workflow template | Prompt | Skill (SKILL.md) | User-controlled |
| Context data | Resource | Skill references/ | Application-controlled |

Architectural clarification: The MCP specification describes Tools as "model-controlled" and Skills as "user-controlled," but in practice (particularly in Copilot's implementation), both use similar routing mechanisms:

  • Both populate the context window with their descriptions
  • Both are selected via embedding/semantic similarity matching
  • Both appear as "tool execution" in agent logs

The practical difference is not how they're invoked but what they're designed for: Skills provide rich workflow orchestration with frontmatter (USE FOR, DO NOT USE FOR, INVOKES), while MCP Tools execute discrete operations. The coordination challenge is really about description collision—when a Skill and Tool have overlapping descriptions, either may be selected regardless of your intended workflow. This is why clear, differentiated descriptions matter more than the conceptual skill-vs-tool distinction.

2.2 Copilot Skill Anatomy

A Skill is a structured markdown file that provides workflow guidance to the LLM. Unlike MCP Tools (which execute code), Skills inject prompts and context that guide the LLM's behavior.

File Structure

my-skill/
├── SKILL.md              ◄── Primary skill definition (frontmatter + workflow)
│   ├── ---
│   │   name: my-skill
│   │   description: |
│   │     **WORKFLOW SKILL** - [description]
│   │     USE FOR: [triggers]
│   │     DO NOT USE FOR: [anti-triggers]
│   │     INVOKES: [mcp tools]
│   │   ---
│   └── [Workflow body with steps]
│
├── references/           ◄── Supplemental materials (deep-dive docs)
│   ├── services/             • Service-specific guidance
│   ├── recipes/              • Step-by-step procedures
│   └── patterns/             • Reusable patterns
│
└── scripts/              ◄── Automation scripts

Component Loading Behavior

Understanding when components load is critical for optimization:

| Component | When Loaded | Purpose | Token Budget |
|-----------|-------------|---------|--------------|
| SKILL.md | Always (when skill matches) | Primary definition, routing logic | < 500 tokens (soft), < 5,000 (hard) |
| references/*.md | On-demand (LLM requests) | Deep-dive docs, patterns, recipes | < 1,000 tokens each |
| scripts/ | Never (execution only) | Automation, not LLM context | N/A |

Why token budgets matter: Skills compete for context window space. A 5000-token skill leaves less room for user code, conversation history, and MCP tool schemas. Lean skills perform better.

Progressive Disclosure: Three Levels

Skills use progressive disclosure to keep context lean. Only load what's needed, when it's needed.

┌─────────────────────────────────────────────────────────────────────┐
│  LEVEL 1: METADATA (Always in System Prompt)                        │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │ name: azure-deploy                                           │    │
│  │ description: Deploy applications to Azure...                 │    │
│  └─────────────────────────────────────────────────────────────┘    │
│       ↓ Pre-loaded at startup for ALL installed skills (~50 tokens) │
│                                                                      │
│  LEVEL 2: INSTRUCTIONS (Loaded on Trigger)                           │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │ # Azure Deploy                                               │    │
│  │ ## Steps                                                     │    │
│  │ 1. Validate prerequisites...                                 │    │
│  │ 2. Run deployment...                                         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│       ↓ Loaded when skill matches user request (<500-5000 tokens)   │
│                                                                      │
│  LEVEL 3: RESOURCES (Loaded on Demand)                               │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │ references/container-apps.md   ← Loaded if CA detected      │    │
│  │ references/functions.md        ← Loaded if Functions needed │    │
│  │ scripts/validate.py            ← Executed, output only      │    │
│  └─────────────────────────────────────────────────────────────┘    │
│       ↓ Loaded only when skill references them (unlimited)          │
└─────────────────────────────────────────────────────────────────────┘

Why this matters: Level 1 is ~50 tokens per skill. Level 2 is 500-5000 tokens. Level 3 can be unlimited but only loads when referenced. This keeps agents fast and focused.

The Frontmatter Is Everything

The description field in frontmatter determines whether your skill gets invoked. This is the LLM's only signal for routing decisions. A poor description means your skill won't trigger—or will trigger incorrectly.

Key frontmatter elements (detailed in Section 4):

  • USE FOR: Trigger phrases that should activate this skill
  • DO NOT USE FOR: Anti-triggers that should route elsewhere
  • INVOKES: MCP tools this skill calls (helps LLM understand the relationship)

2.3 Two-Tier Skill Architecture

When building a skill ecosystem, not all skills are equal. Some orchestrate primary workflows (like deployment); others provide deep-dive knowledge for specific services. Separating these into tiers prevents confusion and improves maintainability.

┌────────────────────────────────────────────────────────────────────┐
│                        TIER 1: CORE SKILLS                         │
│                                                                    │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐          │
│   │   PREPARE    │ → │   VALIDATE   │ → │    DEPLOY    │          │
│   └──────────────┘   └──────────────┘   └──────────────┘          │
│                                                                    │
│   Purpose: Orchestrate the primary development workflow            │
│   Ownership: Central skills team                                   │
│   Invocation: User wants to build, prepare, validate, or deploy    │
└────────────────────────────────────────────────────────────────────┘
                                │
                                │ references
                                ▼
┌────────────────────────────────────────────────────────────────────┐
│                    TIER 2: SERVICE-SPECIFIC SKILLS                 │
│                                                                    │
│   ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐     │
│   │  Azure     │ │  Azure     │ │  Azure     │ │  Azure     │ ... │
│   │ Functions  │ │ Container  │ │  Cosmos    │ │   Redis    │     │
│   │   Skill    │ │ Apps Skill │ │  DB Skill  │ │   Skill    │     │
│   └────────────┘ └────────────┘ └────────────┘ └────────────┘     │
│                                                                    │
│   Purpose: Deep-dive guidance for specific Azure services          │
│   Ownership: Product and service teams                             │
│   Invocation: User asks service-specific questions (not workflow)  │
└────────────────────────────────────────────────────────────────────┘

Why Two Tiers?

  1. Separation of concerns: Workflow logic (Tier 1) is different from domain knowledge (Tier 2). Mixing them creates bloated, hard-to-maintain skills.

  2. Ownership clarity: The team that owns deployment workflows shouldn't have to be experts in every Azure service. Tier 2 lets service teams own their domain.

  3. Routing precision: "Deploy my app" should always go to a core skill, even if the app uses Functions. "How do Functions triggers work?" should go to the Functions skill directly.

  4. Composability: Tier 1 skills can reference Tier 2 skills as needed, enabling modularity without duplication.

Tier Responsibilities

| Aspect | Tier 1 (Core) | Tier 2 (Service) |
|--------|---------------|-------------------|
| Scope | Primary workflow (build/deploy) | Service-specific knowledge |
| When Invoked | "Deploy my app", "Prepare for Azure" | "How do Functions triggers work?" |
| Can Reference | Tier 2 skills, MCP tools | MCP tools only |
| Ownership | Central team | Product teams |
| Update Frequency | Less frequent (workflow stability) | More frequent (service changes) |

Example from Azure: The azure-prepare skill (Tier 1) handles the workflow of preparing any app for Azure. When it detects a Functions app, it references the azure-functions skill (Tier 2) for service-specific configuration patterns, then calls MCP tools like azure-functionapp for resource queries.


3. When to Use What

This section answers the most common question: "Should this be a Skill or an MCP Tool?" The answer depends on intent, scope, and control model.

Why Routing Matters

Incorrect routing leads to poor user experiences:

| Routing Error | Consequence | Example |
|---------------|-------------|---------|
| Skill when Tool needed | Slow, over-engineered response | User asks "list my VMs" → gets a workflow lecture instead of a list |
| Tool when Skill needed | Incomplete, context-free action | User asks "deploy my app" → tool runs azd up without validation |
| Both invoked | Conflicting guidance, wasted tokens | Both Skill and Tool answer, giving different advice |
| Neither invoked | Capability gap | Request falls through without any response |

The goal is single, correct routing for every user request.

3.1 The Core Routing Question

Ask: "Is this a workflow or an operation?"

                    User: "Deploy my app"
                            │
                            ▼
                   ┌────────────────┐
                   │ Is it a        │
                   │ workflow task? │
                   └───────┬────────┘
                          │
            ┌─────────────┴─────────────┐
            │ YES                   NO  │
            ▼                           ▼
    ┌───────────────┐          ┌───────────────┐
    │    SKILL      │          │   MCP TOOL    │
    │               │          │               │
    │ • Deploy      │          │ • List        │
    │ • Create      │          │ • Get         │
    │ • Set up      │          │ • Query       │
    │ • Configure   │          │ • Run command │
    └───────────────┘          └───────────────┘

Workflow = Multiple steps, decisions required, generates artifacts.
Operation = Single action, no decisions, returns data or executes a command.

3.2 Route by Verb

The verb in a user request often signals the correct route:

| Verb | Route | Reason | Example Request |
|------|-------|--------|-----------------|
| Deploy, Create, Set up, Configure | SKILL | Multi-step workflow | "Deploy my React app" |
| List, Get, Show, Query, Check | MCP TOOL | Data retrieval | "List my storage accounts" |
| Help, Guide, Walk through, Explain | SKILL | Guidance needed | "Help me set up CI/CD" |
| Run, Execute | MCP TOOL | Direct execution | "Run azd up" |
| Troubleshoot, Debug, Diagnose | SKILL first | Then MCP for data | "Why is my app failing?" |
| Optimize, Review, Analyze | SKILL | Analysis workflow | "Review my architecture" |
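As a rough illustration, the verb table can be turned into a deterministic pre-filter for smoke-testing routing expectations. This is a sketch only: real routing is semantic (the LLM matches request text against skill and tool descriptions, per Section 2.1), not keyword-based, so treat this as a test fixture rather than a router.

```python
# Hypothetical pre-filter derived from the verb table above -- useful for
# smoke-testing trigger design, not a replacement for LLM semantic routing.
SKILL_VERBS = {"deploy", "create", "set", "configure", "help", "guide",
               "walk", "explain", "troubleshoot", "debug", "diagnose",
               "optimize", "review", "analyze"}
TOOL_VERBS = {"list", "get", "show", "query", "check", "run", "execute"}

def route(request: str) -> str:
    words = request.lower().split()
    first = words[0] if words else ""
    if first in TOOL_VERBS:
        return "mcp-tool"
    if first in SKILL_VERBS:
        return "skill"
    return "skill"  # ambiguous requests default to workflow guidance (see 3.3)

assert route("List my storage accounts") == "mcp-tool"
assert route("Deploy my React app") == "skill"
```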

3.3 Edge Cases: When Routing Is Ambiguous

Not every request maps cleanly to Skill or Tool. Here's how to handle ambiguity:

| Ambiguous Request | Resolution | Rationale |
|-------------------|------------|-----------|
| "Create a storage account" | SKILL | "Create" implies workflow; user needs guidance on SKU, replication, etc. |
| "Create storage account named 'myacct' in eastus, Standard_LRS" | MCP TOOL | Fully specified; no decisions needed |
| "Deploy" (no context) | SKILL | Ask clarifying questions via workflow |
| "azd up" (explicit command) | MCP TOOL | User knows exactly what they want |
| "Set up monitoring" (broad) | SKILL | Workflow to determine what to monitor, how, and where |
| "Show me the metrics for my app" | MCP TOOL | Data retrieval, specific ask |

Rule of thumb: If the user provides all required parameters explicitly, route to Tool. If decisions remain, route to Skill.

3.4 Routing Rules for System Prompts

Add this to LLM system prompts to improve routing consistency:

## Tool Routing Rules

BEFORE invoking any capability, determine the correct route:

### Route to SKILL when:
- Request involves multiple steps: "deploy my app", "set up monitoring"
- Request needs decisions: "what should I use for...", "help me choose..."
- Request generates code: "create azure.yaml", "generate Bicep"
- Request follows workflow: prepare → validate → deploy
- User says: "help me", "guide me", "walk me through"

### Route to MCP TOOL when:
- Request is data retrieval: "list my...", "show me...", "get..."
- Request is single operation: "delete this", "query logs", "run azd up"
- Request targets specific resource: "storage account named X"
- Skill step explicitly invokes MCP tool
- User says: "just run", "execute", "check status"

### When BOTH are needed (Skill invokes Tool):
- Skill orchestrates the workflow
- Skill calls MCP Tool for specific operations within the workflow
- Example: azure-diagnostics (Skill) calls azure-applens (Tool) for diagnostic data

3.5 Disambiguation Examples

Real-world routing decisions:

| User Request | Route | Target | Why |
|--------------|-------|--------|-----|
| "Deploy my app to Azure" | SKILL | azure-prepare | New deployment workflow |
| "Run azd up" | MCP | azure-azd | Direct command execution |
| "List my storage accounts" | MCP | azure-storage | Data query |
| "Set up Key Vault" | SKILL | azure-security | Workflow guidance |
| "Get secret 'api-key'" | MCP | azure-keyvault | Direct operation |
| "What's wrong with my app" | SKILL | azure-diagnostics | Analysis workflow |
| "Check resource health status" | MCP | azure-resourcehealth | Status query |
| "Create a new Function App" | SKILL | azure-functions | Creation workflow |
| "List my Function Apps" | MCP | azure-functionapp | Data query |
| "How do I configure CORS for SWA?" | SKILL | azure-prepare | Guidance needed |

4. Skills Development Guide

This section provides practical guidance for building effective Skills. A well-designed Skill has a clear purpose, triggers correctly, and integrates smoothly with MCP Tools.

4.1 Evaluation-First Development

Principle: Build evaluations first, then write minimal instructions.

Don't document everything the agent might need. Document only what it gets wrong without guidance.

The Workflow

1. TEST WITHOUT SKILL  → Run agent on representative tasks
       ↓
2. IDENTIFY GAPS       → Where does it fail? What does it get wrong?
       ↓
3. CREATE EVALS        → Structure test scenarios
       ↓
4. WRITE MINIMAL DOCS  → Only what's needed to pass evals
       ↓
5. ITERATE             → Refine based on results

Key Insight

"Building a skill is not the same as training a model. It is closer to writing an onboarding guide for a new hire." — Anthropic

The agent already knows how to code. Your skill teaches it your patterns, constraints, and preferences.

Tooling

Use Waza for structured skill evaluation:

# Generate eval from existing skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml

4.2 Degrees of Freedom

Match instruction specificity to task fragility:

| Freedom Level | When to Use | Risk if Too Loose | Risk if Too Tight |
|---------------|-------------|-------------------|-------------------|
| High | Multiple valid approaches exist | Agent picks suboptimal path | Unnecessary constraints |
| Medium | Preferred pattern with variations | Inconsistent results | Missed edge cases |
| Low | Fragile, error-prone operations | Broken systems, data loss | Agent can't adapt |

Rule of thumb: The more damage a wrong approach can cause, the more prescriptive your instructions should be.

High Freedom Example (code review)

## Code Review Process
1. Analyze code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability

Low Freedom Example (database migration)

## Database Migration
Run exactly this script:
```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.


### 4.3 The Skill Development Process

Before writing code, answer these questions:

| Question | Why It Matters | Wrong Answer = Don't Build |
|----------|----------------|---------------------------|
| What workflow does this skill orchestrate? | Skills are for workflows, not single operations | "It lists resources" → That's an MCP Tool |
| What decisions does the user need help with? | Skills provide guidance | "None, just run the command" → MCP Tool |
| What MCP Tools will this skill invoke? | Skills complement Tools | "None" → May not need a Skill |
| What triggers should activate this skill? | Routing depends on triggers | "Everything" → Too broad, will conflict |
| What should NOT trigger this skill? | Anti-triggers prevent conflicts | "Nothing" → Will have false positives |

Skill Development Lifecycle

  1. IDENTIFY → Is this a workflow? What decisions are involved?
  2. DESIGN → Map the workflow steps. What MCP tools are needed?
  3. WRITE → Frontmatter first (triggers). Then body (steps).
  4. TEST → Trigger accuracy tests. Does it invoke correctly?
  5. ITERATE → Refine anti-triggers based on false positives.

Tip: For multi-step workflows, use the Workflow Checklist Pattern (§7.6) to make progress trackable.

4.4 SKILL.md Structure

Every skill follows this structure:

Naming Constraints

| Field | Constraints |
|-------|-------------|
| `name` | • Max 64 characters<br>• Lowercase letters, numbers, hyphens only<br>• No XML tags<br>• No reserved words: `anthropic`, `claude`, `openai`, `copilot` |
| `description` | • Must be non-empty<br>• Max 1024 characters<br>• No XML tags<br>• Write in third person ("Deploys applications..." not "I deploy...") |

```yaml
---
name: my-skill-name
description: |
  **WORKFLOW SKILL** - One-line description of what the skill does.
  USE FOR: trigger phrase 1, trigger phrase 2, trigger phrase 3.
  DO NOT USE FOR: scenario1 (use other-skill), scenario2 (use mcp-tool).
  INVOKES: `mcp-tool-1`, `mcp-tool-2` for execution.
  FOR SINGLE OPERATIONS: Use `mcp-tool` directly for simple queries.
---
```

Skill Body Structure:

```markdown
# Skill Title

## When to Use This Skill
Activate when user wants to:
- Specific action 1
- Specific action 2

## Prerequisites
- Required MCP tools: `azure-xxx`, `azure-yyy`
- Required permissions: list

## MCP Tools Used

| Step | MCP Tool | Command | Purpose |
|------|----------|---------|---------|
| 1 | `azure-xxx` | `xxx_list` | Gather data |
| 3 | `azure-yyy` | `yyy_create` | Execute action |

## Steps

### Step 1: Action Name

**Using MCP (Preferred):**
Invoke `azure-xxx` MCP tool:
- Command: `command_name`
- Parameters: `subscription`, `resource-group`

**CLI Fallback (if MCP unavailable):**
az command --subscription X

## Related Skills
- For X: `azure-x-workflow`
- For Y: `azure-y-guide`
```

4.5 Frontmatter Best Practices

The frontmatter is the most critical part—it determines when your skill is invoked. The LLM uses the description field to decide whether to route a request to your skill.

How Routing Works

  1. User makes a request: "Deploy my React app to Azure"
  2. LLM scans all available skills' description fields
  3. LLM matches request keywords against skill descriptions
  4. Best-matching skill is invoked (or none, if no match)

Implication: If your description doesn't contain the right trigger phrases, your skill won't be invoked—even if it's the right tool for the job.

Required Elements

| Element | Purpose | Example |
|---------|---------|---------|
| name | Unique identifier | azure-deploy |
| description | Triggers + anti-triggers + relationships | See below |

Description Pattern (High Compliance)

description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator),
  image extraction (use image-extractor).
  INVOKES: pdf-tools MCP for extraction, file-system for I/O.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly for simple extractions.

Why each element matters:

| Element | Purpose | What Happens Without It |
|---------|---------|--------------------------|
| **WORKFLOW SKILL** | Signals multi-step nature | LLM may route single ops here |
| USE FOR: | Explicit triggers | Skill won't trigger on relevant requests |
| DO NOT USE FOR: | Anti-triggers | False positives, conflicts with other skills |
| INVOKES: | MCP relationship | LLM doesn't know skill uses tools |
| FOR SINGLE OPERATIONS: | Bypass guidance | Users confused about when to use skill vs. tool |

Skill Classification Prefixes

Add a prefix to clarify the skill type:

| Prefix | Use When |
|--------|----------|
| **WORKFLOW SKILL** | Multi-step orchestration |
| **UTILITY SKILL** | Single-purpose helper |
| **ANALYSIS SKILL** | Read-only analysis/reporting |

Effectiveness Note: These prefixes improve routing based on qualitative testing and observed behavior during development. Formal A/B testing with quantified metrics would strengthen these recommendations. The prefixes work because they add semantic signal to the description field, which LLMs use for routing decisions (see Appendix A). In practice, we've observed fewer false positives when prefixes clearly signal the skill's intent.

4.6 Scoring Criteria

Skills are scored on compliance using the criteria below. Target: Medium-High or better.

Tooling Note: The scoring criteria described here were developed using internal evaluation frameworks (sensei for skill analysis, waza for trigger testing). These tools are not currently publicly available, but you can apply the same criteria manually or build equivalent tooling. The key is having a consistent rubric for evaluating skill quality before deployment.

| Score | Requirements |
|-------|--------------|
| Low | Description < 150 chars OR no triggers |
| Medium | Description >= 150 chars AND has trigger keywords |
| Medium-High | Has "USE FOR:" AND "DO NOT USE FOR:" |
| High | Medium-High + routing clarity (INVOKES / FOR SINGLE OPERATIONS) |

Before (Low Compliance)

description: 'Process PDF files'

After (High Compliance)

description: |
  **WORKFLOW SKILL** - Process PDF files including text extraction, rotation, and merging.
  USE FOR: "extract PDF text", "rotate PDF", "merge PDFs", "PDF to text".
  DO NOT USE FOR: creating PDFs from scratch (use document-creator).
  INVOKES: pdf-tools MCP for extraction.
  FOR SINGLE OPERATIONS: Use pdf-tools MCP directly.
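The rubric is mechanical enough to script. A minimal sketch, assuming the "trigger keywords" check is approximated by the presence of the USE FOR: marker; the naming constraints come from the table in §4.4:

```python
import re

RESERVED_WORDS = {"anthropic", "claude", "openai", "copilot"}

def validate_name(name: str) -> bool:
    """Check the naming constraints from Section 4.4."""
    return (0 < len(name) <= 64
            and re.fullmatch(r"[a-z0-9-]+", name) is not None
            and not any(word in name for word in RESERVED_WORDS))

def score_description(desc: str) -> str:
    """Apply the compliance rubric from Section 4.6."""
    has_triggers = "USE FOR:" in desc
    has_anti = "DO NOT USE FOR:" in desc
    has_routing = "INVOKES:" in desc or "FOR SINGLE OPERATIONS:" in desc
    if len(desc) < 150 or not has_triggers:
        return "Low"
    if has_anti:
        return "High" if has_routing else "Medium-High"
    return "Medium"

assert score_description("Process PDF files") == "Low"
```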

4.7 Token Budget Management

| File | Soft Limit | Hard Limit |
|------|------------|------------|
| SKILL.md | 500 tokens | 5,000 tokens |
| references/*.md | 1,000 tokens | 5,000 tokens |

Use a .token-limits.json configuration:

{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  },
  "overrides": {
    "README.md": 3000
  }
}

Enforcement Note: Token limit enforcement is currently a design pattern, not an automated gate. The .token-limits.json file serves as documentation and can be enforced via CI scripts (count tokens using tiktoken or similar). The limits are based on observed context window usage and agent performance degradation with oversized skills. If building automated enforcement, integrate token counting into your skill linting pipeline.
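A minimal sketch of such a CI check, assuming tiktoken's cl100k_base encoding as the counting proxy (the actual tokenizer varies by model) and the .token-limits.json layout shown above:

```python
import json
import pathlib
import sys

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
config = json.loads(pathlib.Path(".token-limits.json").read_text())
overrides = config.get("overrides", {})

failures = []
for pattern, limit in config["defaults"].items():
    for path in pathlib.Path(".").glob(pattern):
        budget = overrides.get(str(path), limit)  # per-file override wins
        tokens = len(enc.encode(path.read_text(encoding="utf-8")))
        if tokens > budget:
            failures.append(f"{path}: {tokens} tokens (limit {budget})")

if failures:
    print("\n".join(failures), file=sys.stderr)
    sys.exit(1)  # fail the CI step
```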

4.8 Reference Documentation Patterns

Keep SKILL.md lean. Put deep content in references:

my-skill/
├── SKILL.md                      # Workflow orchestration only
└── references/
    ├── services/
    │   └── static-web-apps.md    # SWA-specific patterns
    ├── recipes/
    │   └── deploy-react.md       # Step-by-step for React
    └── patterns/
        └── error-handling.md     # Common error resolutions

Reference them in SKILL.md:

See [SWA Configuration](references/services/static-web-apps.md) for framework-specific settings.

4.9 Script Guidelines

Scripts in scripts/ are executed, not loaded into context. Write them defensively.

Execute vs. Reference

Be explicit about how scripts should be used:

| Instruction | Meaning |
|-------------|---------|
| "Run scripts/validate.py" | Execute the script |
| "See scripts/validate.py for the algorithm" | Read as reference, don't execute |

Error Handling

Scripts should handle errors gracefully—the agent can't debug runtime failures:

# Good: Recovers from errors
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''

# Bad: Crashes unexpectedly
def process_file(path):
    return open(path).read()

Dependencies

Document and verify dependencies:

**Requirements:** Python 3.11+, `pypdf` package

Install: `pip install pypdf`

Path Conventions

Always use forward slashes: scripts/helper.py ✅, not scripts\helper.py ❌


5. Skill Organization Patterns

As the number of skills grows, organization becomes critical. This section covers patterns for structuring related skills to avoid trigger collisions, enable cross-cutting guidance, and maintain clear routing.

5.1 The Problem: Skill Proliferation

When building skills for a large domain (e.g., data services, compute platforms, messaging systems), you face a fundamental choice about granularity:

| Approach | Characteristics | Trade-offs |
|----------|-----------------|------------|
| One skill per service | Each service gets its own skill with deep, focused content | Precise activation, smaller context; but no cross-service guidance |
| One consolidated skill | All related services in a single skill | Cross-service guidance, single entry point; but bloated context window |
| Orchestrator + service skills | Routing skill delegates to specialized skills | Best of both; but more complex to maintain |

The right choice depends on your domain's complexity and how often users need cross-service guidance.

5.2 Three Organization Patterns

Pattern A: Flat (Service-Specific Skills Only)

skills/
├── database-postgres/
├── database-mysql/
├── database-mongodb/
└── storage-blob/

When to use:

  • Services are distinct with minimal overlap
  • Users rarely ask "which should I use?"
  • Each skill is self-contained

Pros: Precise activation, smaller context per invocation, independent evolution.
Cons: No cross-service guidance, potential duplication of shared patterns.

Pattern B: Consolidated (Single Domain Skill)

skills/
└── data-services/
    ├── SKILL.md           # All database + storage content
    └── references/
        ├── postgres.md
        ├── mysql.md
        └── mongodb.md

When to use:

  • Services are tightly related
  • Users frequently compare options
  • Shared patterns dominate (auth, backup, networking)

Pros: Single entry point, cross-service guidance built-in.
Cons: Large context window usage, trigger phrase collisions, monolithic maintenance.

Pattern C: Orchestrator + Service Skills (Recommended)

skills/
├── data-services/              # Orchestrator
│   ├── SKILL.md                # Decision trees, comparisons, routing
│   └── references/
│       ├── selection-guide.md
│       └── migration-patterns.md
├── database-postgres/          # Service skill
├── database-mysql/             # Service skill
└── storage-blob/               # Service skill

When to use:

  • Domain has both cross-cutting concerns AND deep service-specific content
  • Users ask both "which should I use?" AND "how do I configure X?"
  • You want to scale the number of services without bloating a single skill

Pros: Cross-service guidance without context bloat, clear routing, independent service skill evolution.
Cons: More skills to maintain, requires careful trigger phrase design.

5.3 The Orchestrator Pattern in Detail

The orchestrator skill handles cross-cutting concerns and routing decisions, while service skills handle implementation details.

Orchestrator responsibilities:

  • Decision trees ("Which service should I use?")
  • Comparison tables (Service A vs. Service B)
  • Cross-service patterns (authentication, networking, migration)
  • Explicit routing to service skills

Service skill responsibilities:

  • Service-specific configuration
  • Implementation guides
  • Troubleshooting
  • Service-specific best practices

Key Mechanism: USE FOR / DO NOT USE FOR

The orchestrator's description must explicitly define boundaries:

---
name: data-services
description: >
  Data service selection and cross-cutting patterns.
  USE FOR: compare databases, choose data store, data migration strategy,
  which database to use, Service A vs Service B decisions.
  DO NOT USE FOR: service-specific implementation (use database-postgres,
  database-mysql, storage-blob directly for configuration tasks).
---

Service skills include reciprocal boundaries:

---
name: database-postgres
description: >
  PostgreSQL configuration, authentication, and operations.
  USE FOR: PostgreSQL setup, configuration, query optimization, auth setup.
  DO NOT USE FOR: comparing database options (use data-services).
---

5.4 Preventing Trigger Collisions

Without clear boundaries, both orchestrator and service skills may activate for the same prompt, causing inconsistent behavior.

Problem pattern (collision):

# Orchestrator
description: "Help with databases including PostgreSQL setup"

# Service skill  
description: "Help with PostgreSQL setup and configuration"

Both match "help me set up PostgreSQL" → unpredictable routing.

Solution pattern (clear boundaries):

# Orchestrator
description: >
  USE FOR: compare databases, choose data store
  DO NOT USE FOR: PostgreSQL setup (use database-postgres)

# Service skill
description: >
  USE FOR: PostgreSQL setup, configuration, optimization
  DO NOT USE FOR: comparing databases (use data-services)

"Help me set up PostgreSQL" → database-postgres "Should I use PostgreSQL or MySQL?" → data-services

5.5 Routing Examples

| User Intent | Activated Skill | Rationale |
|-------------|-----------------|-----------|
| "Which database should I use?" | Orchestrator | Cross-service decision |
| "Compare PostgreSQL and MySQL" | Orchestrator | Comparison query |
| "Set up PostgreSQL authentication" | Service (postgres) | Service-specific implementation |
| "Optimize my MySQL queries" | Service (mysql) | Service-specific task |
| "Migrate from MySQL to PostgreSQL" | Orchestrator | Cross-service workflow |

5.6 Testing Implications

When implementing the orchestrator pattern, tests must verify proper routing:

Orchestrator tests:

  • Activates for cross-cutting prompts (comparisons, selection, migration)
  • Does NOT activate for service-specific prompts

Service skill tests:

  • Activates for service-specific prompts
  • Does NOT activate for orchestrator prompts (add negative test cases)
// Example: Service skill negative tests
describe('Should NOT Trigger (Orchestrator Handles These)', () => {
  const orchestratorPrompts = [
    'Which database should I use?',
    'Compare PostgreSQL and MySQL',
    'Help me choose a data store',
  ];

  test.each(orchestratorPrompts)(
    'does not trigger on: "%s"',
    (prompt) => {
      const result = triggerMatcher.shouldTrigger(prompt);
      expect(result.triggered).toBe(false);
    }
  );
});

5.7 When to Introduce an Orchestrator

Consider adding an orchestrator when:

  1. Users frequently ask comparison questions — "Should I use A or B?"
  2. Multiple skills share patterns — Authentication, networking, backup strategies
  3. The domain is growing — Adding more services that need unified guidance
  4. Context window is a concern — Individual skills are getting too large

Don't add an orchestrator when:

  • Services are unrelated (no cross-service questions)
  • The domain is small (2-3 skills with minimal overlap)
  • Maintenance overhead isn't justified

5.8 Summary: Choosing an Organization Pattern

| Factor | Flat | Consolidated | Orchestrator |
|--------|------|--------------|--------------|
| Cross-service guidance | None | Built-in | Via orchestrator |
| Context efficiency | Best | Worst | Good |
| Maintenance complexity | Low | Medium | Higher |
| Trigger collision risk | Low | High | Low (if designed well) |
| Scales with services | Yes | No | Yes |

Rule of thumb: Start with flat (Pattern A). When cross-service questions become common, introduce an orchestrator (Pattern C). Avoid consolidated (Pattern B) unless the domain is small and stable.


6. MCP Tool Development Guide

This section covers when and how to create MCP Tools. Remember: Tools are model-controlled—the LLM decides when to call them based on schema and description.

6.1 When to Create an MCP Tool

MCP Tools are for discrete, atomic operations. The decision framework:

Create a Tool When:

| Criteria | Example | Why It's a Tool |
|----------|---------|-----------------|
| Exposing a new API endpoint | Key Vault secret retrieval | Direct API wrapper |
| Operation is atomic | List storage accounts | Single request/response |
| Returns data for further processing | Get metrics | LLM needs the output |
| No decisions required | Delete a resource by ID | Parameters fully specify action |
| Can describe in one sentence | "Get the value of a secret from Key Vault" | Clear, bounded scope |

Do NOT Create a Tool When:

| Criteria | Example | What to Build Instead |
|----------|---------|-----------------------|
| Multi-step workflow | "Deploy my app" | Skill (orchestrates steps) |
| User decisions mid-process | "Set up monitoring" | Skill (guides decisions) |
| Needs context accumulation | "Troubleshoot this error" | Skill (maintains state) |
| Duplicates existing capability | Another way to list VMs | Nothing (use existing) |

6.2 Tool Design Principles

1. Single Responsibility: One tool = one operation. Don't create a tool that "creates or updates or deletes" based on parameters. Create three tools.

2. Clear Naming: Names should follow verb_noun or noun_verb patterns that clearly indicate the action:

  • ✅ secret_get, account_list, container_create
  • ❌ handle_secret, manage_storage, do_operation

3. Descriptive Schemas: The description field is how the LLM decides to use your tool. Be explicit:

  • ✅ "Get the value of a specific secret from an Azure Key Vault. Returns the secret value and metadata."
  • ❌ "Key Vault operations"

4. Skill References: Help the LLM understand when NOT to use your tool by referencing Skills:

FOR FULL WORKFLOW: Use `azure-security` skill for Key Vault setup and configuration.

6.3 Tool Schema Definition

Tools are defined using JSON Schema. The schema tells the LLM what parameters are available and required:

{
  "name": "keyvault_secret_get",
  "title": "Get Key Vault Secret",
  "description": "Retrieve the value of a specific secret from Azure Key Vault. Returns the secret value and metadata. FOR FULL WORKFLOW: Use azure-security skill for Key Vault setup.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "vault_name": {
        "type": "string",
        "description": "Name of the Key Vault"
      },
      "secret_name": {
        "type": "string",
        "description": "Name of the secret to retrieve"
      },
      "version": {
        "type": "string",
        "description": "Optional: specific version of the secret"
      }
    },
    "required": ["vault_name", "secret_name"]
  }
}

Schema Best Practices:

| Element | Best Practice | Why |
|---------|---------------|-----|
| name | Use resource_action pattern | Predictable, searchable |
| description | Include skill cross-reference | Helps routing decisions |
| required | List only truly required params | Reduces friction |
| Property description | Be specific about format/constraints | LLM generates better calls |
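For reference, here is roughly how the same tool looks when registered with the MCP Python SDK's FastMCP helper, which derives the JSON schema above from type hints and the docstring. The Azure call is stubbed out; treat this as a sketch of the registration pattern, not the Azure MCP server's actual implementation:

```python
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("azure-keyvault")

@mcp.tool()
def keyvault_secret_get(vault_name: str, secret_name: str,
                        version: str | None = None) -> str:
    """Retrieve the value of a specific secret from Azure Key Vault.
    Returns the secret value and metadata.
    FOR FULL WORKFLOW: Use azure-security skill for Key Vault setup."""
    # Stub: a real server would call the azure-keyvault-secrets SDK here.
    return f"<secret value for {secret_name} in {vault_name}>"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```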

6.4 Naming Conventions

Consistent naming helps both LLMs and developers find the right tool:

Namespace: {platform}-{service}
Command:   {resource}_{action}

Examples:
- Namespace: azure-storage
  - storage_account_list     (list all storage accounts)
  - storage_blob_get         (get a specific blob)
  - storage_container_create (create a container)

- Namespace: azure-keyvault
  - keyvault_list           (list all vaults)
  - keyvault_secret_get     (get a secret value)
  - keyvault_secret_set     (set a secret value)

Naming Rules:

  1. Namespace = service identity. All tools for a service share the namespace.
  2. Resource = what you're operating on. Usually the ARM resource type.
  3. Action = the operation. Use standard verbs: list, get, create, update, delete, query.

6.5 MCP Description Template

Include skill references in MCP tool descriptions to improve routing:

**EXECUTION TOOL** - [One sentence describing what it does].
USE FOR: [Specific operations this tool handles].
FOR FULL WORKFLOW: Use `skill-name` skill for [workflow description].
FOR GUIDANCE: Use `skill-name` skill to understand [concept].

Example:

**EXECUTION TOOL** - Execute Azure Developer CLI (azd) commands.
USE FOR: Running azd up, azd deploy, azd provision, getting deployment logs.
FOR FULL WORKFLOW: Use `azure-deploy` skill (prepare → validate → deploy chain).
FOR GUIDANCE: Use `azure-prepare` skill to configure azure.yaml before running azd.

6.6 Best Practices Files

For configuration patterns and reference material, create best practices files that MCP can serve:

Purpose: Centralize patterns that multiple Skills might need. Skills call get_azure_bestpractices(resource="X") instead of embedding duplicate content.

File: azure-swa-best-practices.txt

# Azure Static Web Apps Best Practices

## azure.yaml Configuration
services:
  web:
    host: staticwebapp
    ...

## Bicep Patterns
resource staticWebApp 'Microsoft.Web/staticSites@2022-09-01' = {
  ...
}

## Build Output by Framework
| Framework | outputLocation |
| React     | build          |
| Vue       | dist           |
| Angular   | dist/{project} |

7. Integration Patterns

This section describes how Skills and MCP Tools work together. The key insight: Skills should orchestrate, MCP should execute. When this pattern is followed, you get consistent, maintainable, and testable workflows.

7.1 The Hybrid Pattern (Recommended)

The hybrid pattern assigns clear responsibilities:

┌─────────────────────────────────┐    ┌──────────────────────────────┐
│ MCP = "WHAT"                    │    │ Skills = "HOW"               │
│ (Patterns & Configurations)     │    │ (Workflow Orchestration)     │
│                                 │    │                              │
│ • azure.yaml snippets           │◄───│ • Detection logic            │
│ • Bicep resource patterns       │    │ • Workflow steps             │
│ • SKU guidance                  │    │ • Error handling             │
│ • Build output by framework     │    │ • Decision trees             │
│ • API references                │    │ • User interaction           │
└─────────────────────────────────┘    └──────────────────────────────┘
         Single Source of Truth              Invokes MCP for patterns

Why this works:

  • No duplication: Patterns live in one place (MCP). Skills reference them.
  • Easy updates: Change a pattern in MCP; all Skills get the update.
  • Clear ownership: MCP team owns patterns; Skill team owns workflows.
  • Testable: Test patterns independently from workflows.

Key Principle: Skills call get_azure_bestpractices(resource="static-web-app") instead of embedding duplicate content.

Real-World Example: The Static Web Apps routing fix (see Section 9) moved build output patterns from the azure-prepare skill into MCP's best practices file. The skill now calls MCP to get the patterns, ensuring consistency.

7.2 Pattern: Skill as Orchestrator

The skill orchestrates the workflow; MCP tools execute the operations:

SKILL orchestrates → MCP executes → SKILL interprets → User output

This pattern ensures:

  1. Workflow logic stays in Skills — Decisions, branching, error handling
  2. Execution stays in MCP — API calls, data retrieval, resource operations
  3. Results are synthesized by Skills — Combine outputs into user-facing guidance

Example: Cost Optimization Workflow

## Step 1: Load Best Practices
Use `azure-get_azure_bestpractices` MCP tool with:
- resource: "cost-optimization"
- action: "all"

## Step 2: Discover Resources  
Use `azure-storage` MCP tool → `storage_account_list`
Use `azure-cosmos` MCP tool → `cosmos_account_list`

## Step 3: Run Compliance Check
Use `azure-extension_azqr` MCP tool for orphaned resources

## Step 4: Generate Report (Skill logic)
Synthesize MCP results into actionable recommendations

7.3 Anti-Patterns to Avoid

| Anti-Pattern | Problem | Correct Pattern |
|--------------|---------|-----------------|
| Skill embeds CLI commands | Bypasses MCP, creates duplication | Skill invokes MCP tool |
| MCP tool includes workflow logic | Tools should be atomic | Move logic to Skill |
| Skill duplicates MCP patterns | Two sources of truth, drift | Skill calls MCP for patterns |
| Tool has no skill reference | LLM doesn't know when to use Skill | Add FOR FULL WORKFLOW in tool description |
| Skill doesn't list MCP dependencies | Hard to maintain, unclear requirements | Add MCP Tools Used section |

7.4 Pattern: Preparation Manifest

The Preparation Manifest connects the three core skills and maintains state across workflow steps:

┌─────────────────────────────────────────────────────────────────────┐
│                              PREPARE                                │
│                     Get app Azure-ready                             │
│    Discovery → Architecture Planning → File Generation → Manifest  │
└─────────────────────────────────────────────────────────────────────┘
                                │ outputs
                                ▼
                ┌───────────────────────────────┐
                │    PREPARATION MANIFEST       │
                │    .azure/preparation.md      │
                │                               │
                │    • Application components   │
                │    • Generated artifacts      │
                │    • Deployment config        │
                │    • Validation requirements  │
                │    • Decision log             │
                └───────────────────────────────┘
                                │ reads
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                             VALIDATE                                │
│    Read Manifest → Execute Validation Checks → Update Manifest      │
└─────────────────────────────────────────────────────────────────────┘
                                │ reads
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                              DEPLOY                                 │
│    Read Manifest → Execute Deployment → Record Outcome              │
└─────────────────────────────────────────────────────────────────────┘

Why use a manifest?

  • State persistence: Skills are stateless; the manifest maintains context
  • Resumability: User can stop and restart the workflow
  • Auditability: Decision log shows why choices were made
  • Validation: Each skill can verify prerequisites from the manifest
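A minimal sketch of what .azure/preparation.md might contain. The five categories come from the diagram above; the field values are illustrative assumptions, not a prescribed schema:

```markdown
# Preparation Manifest

## Application Components
- web: React SPA at ./src/web → Azure Static Web Apps

## Generated Artifacts
- azure.yaml
- infra/main.bicep

## Deployment Config
- Environment: dev
- Region: eastus

## Validation Requirements
- [ ] Bicep template compiles
- [ ] azure.yaml passes schema validation

## Decision Log
- Chose Static Web Apps over App Service: static frontend, no server-side code
```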

7.5 MCP Cross-References in Skills

Every skill should have an "MCP Tools Used" section that documents dependencies:

## MCP Tools Used in This Skill

| Step | Tool | Command | Purpose |
|------|------|---------|---------|
| 1 | `azure-get_azure_bestpractices` | `get_bestpractices` | Load guidance |
| 3 | `azure-deploy` | `plan_get` | Analyze workspace |
| 5 | `azure-azd` | `up` | Execute deployment |

**If Azure MCP is not enabled:** Run `/mcp add azure` or use CLI fallback.

Benefits:

  • Developers know what MCP tools to have enabled
  • LLM understands the skill-tool relationship
  • Maintenance is easier (clear dependencies)

7.6 Workflow Checklist Pattern

For multi-step workflows, provide a copyable checklist that makes progress trackable:

## Deployment Workflow

Copy this checklist and track progress:

Deployment Progress:

- [ ] Step 1: Validate prerequisites (azure.yaml, authentication)
- [ ] Step 2: Run pre-flight checks (azd validate)
- [ ] Step 3: Execute deployment (azd up)
- [ ] Step 4: Verify deployment succeeded
- [ ] Step 5: Run smoke tests

**Step 1: Validate prerequisites**
Check that azure.yaml exists and contains valid configuration...

Why this pattern works:

| Benefit | How |
|---------|-----|
| Verifiable progress | Each checkbox = completed state |
| Resumable | Agent can restart from failed step |
| Visible | User sees exactly where workflow is |
| Debuggable | Failed step is obvious |

When to use:

  • Workflows with 3+ sequential steps
  • Tasks that might fail mid-way
  • Processes where order matters

Cross-reference: See §4.2 Degrees of Freedom for guidance on how prescriptive each step should be.


8. DOs and DON'Ts

8.1 DOs ✅

DO: Add MCP Cross-References in Skills

Good Pattern (azure-observability):

| Service | Use When | MCP Tools | CLI |
|---------|----------|-----------|-----|
| Azure Monitor | Metrics, alerts | `azure__monitor` | `az monitor` |

Good Pattern (azure-security):

### Key Vault
- `azure__keyvault` with command `keyvault_list` - List Key Vaults
- `azure__keyvault` with command `keyvault_secret_get` - Get secret value

DO: Use Skill Classification Prefix

description: |
  **WORKFLOW SKILL** - Orchestrates deployment through preparation, validation, execution.

DO: Include Routing Clarity

description: |
  ...
  INVOKES: `azure-deploy` MCP tool, `azure-azd` MCP tool for execution.
  FOR SINGLE OPERATIONS: Use `azure-azd` MCP tool directly for single azd commands.

DO: Consolidate Patterns in MCP, Workflows in Skills

| Content Type | Belongs In |
|--------------|------------|
| azure.yaml snippets | MCP best practices |
| Bicep patterns | MCP best practices |
| SKU guidance | MCP best practices |
| Detection logic | Skill |
| Workflow steps | Skill |
| Error handling | Skill |

DO: Test with Trigger Tests

Use Waza-style trigger testing:

# trigger_tests.yaml
shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"

shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"

DO: Use Token Limits

{
  "defaults": {
    "SKILL.md": 500,
    "references/**/*.md": 1000
  }
}

8.2 DON'Ts ❌

DON'T: Duplicate Configuration in Both MCP and Skills

Bad:

MCP: azure.yaml template with host: staticwebapp
SKILL: Also contains azure.yaml template with host: staticwebapp

Good:

MCP: Single source of truth for azure.yaml patterns
SKILL: Invokes MCP for patterns, focuses on workflow

DON'T: Embed CLI Commands Directly in Skills

Bad (azure-diagnostics problem):

# Current: Embeds CLI commands directly
az containerapp show --name APP -g RG --query "properties.configuration.registries"

Good:

Use the azure-applens MCP tool for AI-powered diagnostics, or the azure-resourcehealth MCP tool to check availability status.

CLI Fallback (if MCP unavailable):

az containerapp show --name APP -g RG

DON'T: Create Competing Guidance

Bad (SWA CLI vs azd issue):

MCP: "Use npx swa deploy"
SKILL: "Use azd up with Bicep"
→ Agent picks randomly → ~50% deployment failures

Good:

MCP: Patterns only (azure.yaml, Bicep templates)
SKILL: Workflow only (calls MCP for patterns)
→ Single path → Consistent results

DON'T: Write Descriptions Under 150 Characters

Bad (Low compliance):

description: 'Process PDF files'

Good (High compliance):

description: |
  **WORKFLOW SKILL** - Process PDF files including extraction and merging.
  USE FOR: "extract PDF", "merge PDFs". DO NOT USE FOR: creating PDFs.

DON'T: Omit Anti-Triggers

Bad:

description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.

Good:

description: |
  Deploy applications to Azure.
  USE FOR: azd up, azd deploy, push to Azure.
  DO NOT USE FOR: listing resources (use azure-xxx MCP), querying logs (use azure-monitor MCP).

DON'T: Create Name Collisions Without Routing Guidance

Bad:

  • azure-deploy skill exists
  • azure-deploy MCP tool exists
  • No guidance on which to use

Good:

# Skill description:
description: |
  **WORKFLOW SKILL** - Full deployment workflow.
  FOR SINGLE COMMANDS: Use `azure-azd` MCP tool directly.

# MCP tool description:
description: |
  **EXECUTION TOOL** - Execute deployment commands.
  FOR FULL WORKFLOW: Use `azure-deploy` skill.

8.3 Security Considerations

Skills execute in the user's environment with significant privileges. Build defensively.

For Skill Authors

| Do | Don't |
|----|-------|
| Use environment variables for secrets | Hardcode credentials or API keys |
| Validate inputs before passing to scripts | Trust user input blindly |
| Document required permissions | Request more permissions than needed |
| Handle errors gracefully | Let scripts fail silently |
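
The "validate inputs" row deserves special emphasis: any value interpolated into a shell command is an injection vector. A minimal sketch of the pattern, using an illustrative allow-list and an argument-list invocation so nothing is parsed by a shell:

import re
import subprocess

# Conservative allow-list for illustration, not the official Azure naming rules
RG_NAME = re.compile(r"^[\w.\-()]{1,90}$")

def show_resource_group(name: str) -> str:
    if not RG_NAME.match(name):
        raise ValueError(f"Refusing suspicious resource group name: {name!r}")
    # Pass arguments as a list (no shell=True) so the name is never shell-parsed
    result = subprocess.run(
        ["az", "group", "show", "--name", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout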

For Script Safety

# Good: Explicit error handling, safe defaults
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, using default")
        return ''
    except PermissionError:
        print(f"Cannot access {path}")
        return ''

# Bad: Fails unexpectedly, no recovery
def process_file(path):
    return open(path).read()  # Crashes on missing file

For Skill Reviewers

Before approving a skill, check:

  • No hardcoded secrets or credentials
  • Scripts handle errors without data loss
  • External URLs/APIs are documented and necessary
  • Destructive operations require confirmation
  • No unexpected network calls or data exfiltration

9. Case Studies

This section presents real examples of Skills and MCP Tools working together (and conflicts that arose when they didn't). These case studies are drawn from Azure ecosystem development but the patterns apply broadly.

9.1 Static Web Apps Routing Fix

This case study illustrates the core problem this guide addresses: conflicting guidance from uncoordinated systems.

The Problem

When a user says "deploy my React app to Azure", two systems provided conflicting guidance:

| System | File | Guidance | Result |
|--------|------|----------|--------|
| MCP | azure-swa-best-practices.txt | Use SWA CLI (`npx swa deploy`) | ❌ Non-IaC, unreliable |
| Skills | azure-deploy + azure-prepare | Use `azd up` with Bicep | ✅ IaC, reproducible |

No coordination layer existed → Agent picked randomly → Estimated ~50% deployment failures when conflicting approaches were mixed.

Key Insight: The issue was surfaced by a single observation: "I think I found part of the problem. Our azure best practices tool doesn't use azd for SWA guidance which conflicts with the other guidance in both skills and general deployment."

The Solution

Hybrid Architecture:

┌─────────────────────────────────┐    ┌──────────────────────────────┐
│ MCP = "WHAT"                    │    │ Skills = "HOW"               │
│ (Patterns & Configurations)     │    │ (Workflow Orchestration)     │
│                                 │    │                              │
│ • azure.yaml snippets           │◄───│ • Detection logic            │
│ • Bicep resource patterns       │    │ • Workflow steps             │
│ • SKU guidance                  │    │ • Error handling             │
│ • Build output by framework     │    │ • Decision trees             │
└─────────────────────────────────┘    └──────────────────────────────┘

Changes Made:

  1. MCP (azure-swa-best-practices.txt):

    • Replaced CLI-only guidance with comprehensive azd patterns
    • Added azure.yaml configurations
    • Added Bicep resource patterns
    • Kept SWA CLI as explicit-only alternative
  2. Skills (azure-deploy, azure-prepare):

    • Slimmed down, removed duplicate patterns
    • Added get_azure_bestpractices(resource="static-web-app") invocation
    • Added SWA detection signals

Results & Metrics

Metrics Note: The ~50% failure estimate is based on observed behavior during pre-fix testing, where the agent inconsistently applied SWA CLI vs azd approaches. Formal post-fix evaluation is in progress: early qualitative observations show improved consistency (the agent now reliably uses azd for deployment workflows), but a quantified reduction in failure rate requires controlled testing that is currently underway. This section will be updated with hard metrics when they are available.

Test Verification

| User Prompt | Expected Behavior | Pass Criteria |
|-------------|-------------------|---------------|
| "Deploy my React app" | Uses azd, NOT swa CLI | No `npx swa` commands |
| "Use SWA CLI to deploy" | Uses SWA CLI (explicit) | `npx swa deploy` allowed |
| "Preview my app locally" | Uses SWA CLI for preview | `npx swa start` |

9.2 Azure Functions: Skill vs MCP Disambiguation

The Problem

Both azure-functions skill and azure-functionapp MCP tool exist.

The Solution

| User Intent | Route | Target |
|-------------|-------|--------|
| "Create a new Function App" | SKILL | azure-functions (creation workflow) |
| "List my Function Apps" | MCP | azure-functionapp (data query) |
| "How do Functions triggers work?" | SKILL | azure-functions (knowledge) |
| "Get function app settings" | MCP | azure-functionapp (data retrieval) |

Skill Description Update:

description: |
  **WORKFLOW SKILL** - Create and configure Azure Functions.
  USE FOR: "create function app", "add Azure Function", "set up serverless".
  DO NOT USE FOR: listing functions (use azure-functionapp MCP), querying logs.
  INVOKES: `azure-functionapp` MCP for queries, `azure-azd` for deployment.

9.3 Key Vault Integration: Excellent MCP Reference Pattern

The azure-security skill demonstrates ideal MCP cross-referencing:

Key Vault Operations

MCP Server (Preferred):

  • azure__keyvault with command keyvault_list - List Key Vaults
  • azure__keyvault with command keyvault_secret_get - Get secret value
  • azure__keyvault with command keyvault_secret_set - Set secret value

If Azure MCP is not enabled: Run /azure:setup or enable via /mcp.

CLI Fallback:

az keyvault list --subscription $SUB
az keyvault secret show --vault-name $VAULT --name $SECRET

This pattern:

  • ✅ Prefers MCP tools
  • ✅ Documents fallback path
  • ✅ Maintains skill focus on workflow, not execution

10. Testing & Evaluation

Building Skills and Tools is only half the job—you need to verify they work correctly. This section covers testing strategies for both trigger accuracy (does the right thing get invoked?) and task completion (does it actually work?).

Why Testing Matters

| Test Type | What It Catches | Consequence of Skipping |
|-----------|-----------------|-------------------------|
| Trigger accuracy | False positives/negatives in routing | Wrong skill invoked; user frustration |
| Task completion | Broken workflows, missing steps | Deployment failures; data loss |
| Regression | Breaking changes from updates | Previously working flows break |

10.1 Waza Framework Overview

Waza is a framework for evaluating Agent Skills with task completion metrics and trigger accuracy testing:

# Install
pip install waza

# Generate eval from skill
waza generate --repo microsoft/GitHub-Copilot-for-Azure --skill azure-functions -o ./eval

# Run evaluation
waza run eval.yaml

10.2 Trigger Accuracy Testing

Test that your skill triggers on the right prompts:

# trigger_tests.yaml
name: my-skill-triggers
skill: my-skill

shouldTriggerPrompts:
  - "deploy my app to Azure"
  - "set up Azure deployment"
  - "prepare for Azure"
  - "help me deploy"
  - "configure Azure hosting"

shouldNotTriggerPrompts:
  - "list my storage accounts"
  - "run azd up"
  - "check resource health"
  - "get my subscription"
  - "query logs"

10.3 Task Completion Metrics

Define tasks with success criteria:

# tasks/deploy-app.yaml
id: deploy-app-001
name: Deploy Container App

inputs:
  prompt: "Deploy my app to Azure Container Apps"
  context:
    files: ["Dockerfile", "app.py"]

expected:
  output_contains:
    - "container"
    - "deployed"
  
  tool_calls:
    required:
      - pattern: "az containerapp"
    forbidden:
      - pattern: "rm -rf"
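
Conceptually, a runner grades the agent's transcript against these criteria with substring and regex checks. The sketch below illustrates the grading logic only; it is not waza's actual implementation, and the `result` shape is hypothetical:

import re

def grade(result: dict, expected: dict) -> list[str]:
    """Return failure messages; an empty list means the task passed."""
    failures = []
    output = result["output"].lower()
    for needle in expected.get("output_contains", []):
        if needle.lower() not in output:
            failures.append(f"missing expected output: {needle!r}")
    calls = " ".join(result["tool_calls"])  # hypothetical: flattened tool-call log
    for rule in expected.get("tool_calls", {}).get("required", []):
        if not re.search(rule["pattern"], calls):
            failures.append(f"required tool call not seen: {rule['pattern']}")
    for rule in expected.get("tool_calls", {}).get("forbidden", []):
        if re.search(rule["pattern"], calls):
            failures.append(f"forbidden tool call seen: {rule['pattern']}")
    return failures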

10.4 CI/CD Integration

# .github/workflows/skill-eval.yaml
name: Skill Evaluation

on:
  pull_request:
    paths:
      - 'skills/**'

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install waza
        run: pip install waza
      
      - name: Run evaluations
        run: waza run evals/my-skill/eval.yaml --output results.json
      
      - name: Check thresholds
        run: |
          python -c "
          import json
          r = json.load(open('results.json'))
          assert r['summary']['composite_score'] >= 0.8
          "

10.5 Sensei: Frontmatter Compliance

Use Sensei to improve skill frontmatter:

# Run on a single skill
Run sensei on my-skill

# Run on all low-adherence skills
Run sensei on all Low-adherence skills

Sensei will:

  1. Score current compliance (Low → High)
  2. Add USE FOR trigger phrases
  3. Add DO NOT USE FOR anti-triggers
  4. Add INVOKES for tool relationships
  5. Verify token budget
  6. Run tests

11. References

This section provides authoritative sources for deeper learning. These references were used in creating this guide.

MCP (Model Context Protocol)

| Resource | URL | What You'll Learn |
|----------|-----|-------------------|
| MCP Specification | https://modelcontextprotocol.io/specification/latest | Protocol details, message formats |
| MCP Architecture Overview | https://modelcontextprotocol.io/docs/concepts/architecture | Host/client/server relationships |
| MCP Tools Concepts | https://modelcontextprotocol.io/docs/concepts/tools | How tools work, schema definition |
| MCP Prompts Concepts | https://modelcontextprotocol.io/docs/concepts/prompts | User-controlled primitives |
| Code Execution with MCP (Anthropic) | https://www.anthropic.com/engineering/code-execution-with-mcp | Real-world MCP patterns |

GitHub Copilot

| Resource | URL | What You'll Learn |
|----------|-----|-------------------|
| Copilot SDK Architecture | https://deepwiki.com/github/copilot-sdk/3-sdk-architecture | How Copilot integrates extensions |
| Awesome Copilot | https://github.com/github/awesome-copilot | Curated list of resources |
| Maximizing Copilot's Agentic Capabilities | https://github.blog/ai-and-ml/github-copilot/how-to-maximize-github-copilots-agentic-capabilities/ | Best practices for agent workflows |

Azure Implementation Examples

| Resource | URL | What You'll Learn |
|----------|-----|-------------------|
| Azure MCP Server | https://github.com/microsoft/mcp/tree/main/servers/Azure.Mcp.Server | Production MCP server implementation |
| GitHub Copilot for Azure Skills | https://github.com/microsoft/GitHub-Copilot-for-Azure/tree/main/plugin/skills | Production skill examples |
| MCP Commands Reference | https://github.com/microsoft/mcp/blob/main/servers/Azure.Mcp.Server/docs/azmcp-commands.md | Available Azure MCP commands |

Tools & Frameworks

| Resource | URL | What You'll Learn |
|----------|-----|-------------------|
| Waza (Skill Evaluation) | https://github.com/spboyer/waza | Testing framework for skills |
| Sensei (Frontmatter Improvement) | https://github.com/spboyer/sensei | Automated skill compliance fixes |

Agent Skills & Patterns (Anthropic)

| Resource | URL | What You'll Learn |
|----------|-----|-------------------|
| Building Effective AI Agents | https://www.anthropic.com/research/building-effective-agents | Core agent patterns (routing, chaining, orchestration) |
| Agent Skills Engineering Blog | https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills | Progressive disclosure, skill authoring best practices |
| Claude Skills Cookbook | https://github.com/anthropics/claude-cookbooks/tree/main/skills | Practical skill examples |
| Agent Skills Open Standard | https://agentskills.io/specification | Specification for portable skills |

Quick Reference Card

Routing Cheat Sheet

| User Says | Route To | Why |
|-----------|----------|-----|
| "Deploy my app" | SKILL | Workflow |
| "List my resources" | MCP | Data query |
| "Help me set up" | SKILL | Guidance |
| "Run this command" | MCP | Execution |
| "What went wrong" | SKILL | Diagnosis |

Skill → MCP Tool Mapping

| Skill | Primary MCP Tools |
|-------|-------------------|
| azure-prepare | azure-deploy (plan_get), azure-get_azure_bestpractices |
| azure-validate | azure-azd (validate_azure_yaml) |
| azure-deploy | azure-azd (up, deploy), azure-deploy (app_logs_get) |
| azure-diagnostics | azure-applens, azure-resourcehealth, azure-monitor |
| azure-functions | azure-functionapp, azure-azd |
| azure-observability | azure-monitor, azure-applicationinsights |
| azure-security | azure-keyvault, azure-role |

Frontmatter Template

---
name: azure-{domain}
description: |
  **WORKFLOW SKILL** - {One-line description}.
  USE FOR: {trigger1}, {trigger2}, {trigger3}.
  DO NOT USE FOR: {scenario1} (use {other}), {scenario2}.
  INVOKES: `{mcp-tool-1}`, `{mcp-tool-2}`.
  FOR SINGLE OPERATIONS: Use `{mcp-tool}` directly.
---

Appendix A: How LLMs Decide What to Invoke

This appendix explores how different AI platforms decide between using their own knowledge, invoking tools (like MCP), or activating skills/prompts. Understanding these mechanisms can help us write better descriptions and improve routing accuracy.

A.1 The Universal Routing Problem

All LLM-based agents face the same fundamental question: Given a user prompt and a set of available capabilities, which (if any) should be invoked?

The answer varies by platform, but the core mechanisms are similar:

User Prompt → Intent Analysis → Capability Matching → Decision
                                                         ↓
                              ┌─────────────────────────────────────────┐
                              │  Answer directly (LLM knowledge)         │
                              │  Invoke tool (function call)             │
                              │  Activate skill/prompt (workflow)        │
                              │  Request clarification (ask user)        │
                              └─────────────────────────────────────────┘

A.2 Platform-Specific Routing Mechanisms

OpenAI (GPT-4, GPT-5)

Mechanism: Function calling via trained policy + orchestration layer

| Component | How It Works |
|-----------|--------------|
| Function schemas | Developers define JSON schemas with name, description, parameters |
| Intent matching | Model analyzes prompt against function descriptions |
| Decision | Outputs `tool_call` message if function matches; otherwise answers directly |
| Confidence | Picks function with highest semantic similarity to prompt |

Key insight: The description field is critical. GPT uses it to decide whether to call your function. Poor descriptions = poor routing.

"The decision to call a function is made purely by the model, based on prompt-to-function intent matching and context." — OpenAI Function Calling Docs

Anthropic (Claude)

Mechanism: Tool use with progressive disclosure + MCP integration

| Component | How It Works |
|-----------|--------------|
| Tool discovery | Claude can search for tools dynamically (doesn't load all at once) |
| Progressive disclosure | Only loads schemas of relevant tools based on query |
| MCP integration | Uses `tool_use` blocks via MCP protocol |
| Code orchestration | Can generate code to orchestrate multi-tool workflows |

Key insight: Claude's "progressive disclosure" means it searches for the right tool rather than scanning all tools. Clear, distinct descriptions help Claude find your tool.

"Tools built for agents are most ergonomic—and effective—when they are intuitive for both non-deterministic agents and humans." — Anthropic: Writing Tools for Agents

Google (Gemini)

Mechanism: Function declarations with semantic routing

| Component | How It Works |
|-----------|--------------|
| Function declarations | Schema with name, description, parameters passed at runtime |
| Intent analysis | Compares user intent to function descriptions |
| Routing | Semantic similarity + context determines function selection |
| Parallel calls | Can call multiple functions simultaneously |

Key insight: Gemini emphasizes the quality of function descriptions—more precise descriptions yield better routing.

"The more descriptive and precise the function definitions are, the better Gemini can match them to user requests." — Gemini Function Calling Docs

GitHub Copilot

Mechanism: Embedding-guided skill routing with semantic matching

| Component | How It Works |
|-----------|--------------|
| Skill frontmatter | YAML with name and description in SKILL.md |
| Embedding matching | Creates vector embeddings of user prompt and skill descriptions |
| Clustering | Groups skills by similarity to narrow candidates |
| On-demand loading | Only loads matching skill content into context |

Key insight: Copilot uses embedding-based semantic similarity. Your skill's description field is converted to a vector and compared against the user's prompt vector. Similar vectors = skill gets invoked.

"Copilot compares the user's prompt against each available skill's description using embedding-based semantic similarity." — GitHub Blog: Making Copilot Smarter

A.3 Common Patterns Across Platforms

Despite implementation differences, all platforms share these routing principles:

| Principle | Description | Implication for Skill/Tool Authors |
|-----------|-------------|------------------------------------|
| Description is king | The description field drives routing decisions | Write clear, specific descriptions |
| Semantic matching | Embeddings or intent classifiers compare prompt to description | Use the same words users would use |
| Negative examples help | Stating what something doesn't do prevents misrouting | Include "DO NOT USE FOR" sections |
| Context matters | Conversation history influences routing | Skills should be context-aware |
| Confidence thresholds | If no good match, LLM answers directly | Don't force routing; let the LLM decide |

A.4 Implications for This Guide

Based on this research, our guidance aligns well with how LLMs actually route:

| Our Recommendation | Why It Works |
|--------------------|--------------|
| USE FOR: trigger phrases | Matches how embedding similarity works |
| DO NOT USE FOR: anti-triggers | Prevents false positives in semantic matching |
| INVOKES: tool list | Helps LLM understand skill-tool relationships |
| FOR SINGLE OPERATIONS | Provides fallback routing guidance |
| Clear, specific descriptions | Improves embedding quality and intent matching |

A.5 What We Can't Control

Some routing behaviors are opaque or model-specific:

| Factor | What We Know | What We Don't Know |
|--------|--------------|--------------------|
| Embedding models | Used for semantic similarity | Exact model, training data |
| Confidence thresholds | Exist, vary by platform | Specific values |
| Priority when tied | First match? Highest score? | Implementation details |
| Context window impact | More tools = more competition | Exact degradation curve |

A.6 Best Practices from Anthropic Research

Anthropic published specific guidance on writing effective tool descriptions:

  1. Be specific about function AND intent

    • ❌ "Get information about weather"
    • ✅ "Retrieves current weather (temperature, precipitation, condition) for a city. Use only for present-day conditions, not forecasts."
  2. Highlight boundaries explicitly

    • State what the tool doesn't do
    • Prevents misrouting to similar-sounding tools
  3. Provide usage examples

    • Short canonical examples improve generalization
    • "Example: 'What's the weather in Paris right now?'"
  4. Namespace tools

    • Use prefixes: Weather_GetCurrent, Weather_GetForecast
    • Helps LLM distinguish similar tools
  5. Return meaningful context

    • Tool responses should enable good follow-up decisions
    • Balance detail and brevity
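
Putting these practices together, a definition in Anthropic's tool-use shape might look like the following; the weather tool extends the illustrative example from practice 1:

tool = {
    "name": "Weather_GetCurrent",  # namespaced, per practice 4
    "description": (
        "Retrieves current weather (temperature, precipitation, condition) for a city. "
        "Use only for present-day conditions, not forecasts (see Weather_GetForecast). "
        'Example: "What\'s the weather in Paris right now?"'
    ),
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}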

A.7 Testing Routing Accuracy

To verify your descriptions work across platforms:

# routing_tests.yaml
tests:
  - prompt: "Deploy my React app to Azure"
    expected_route: skill
    expected_target: azure-prepare
    
  - prompt: "List my storage accounts"
    expected_route: mcp_tool
    expected_target: azure-storage
    
  - prompt: "What is Azure Functions?"
    expected_route: llm_knowledge
    expected_target: null  # No tool needed

Run these tests against multiple LLMs to ensure consistent routing.

A.8 References

| Resource | URL |
|----------|-----|
| OpenAI Function Calling | https://platform.openai.com/docs/guides/function-calling |
| Anthropic Tool Use | https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview |
| Anthropic: Writing Tools for Agents | https://www.anthropic.com/engineering/writing-tools-for-agents |
| Anthropic: Advanced Tool Use | https://www.anthropic.com/engineering/advanced-tool-use |
| Gemini Function Calling | https://ai.google.dev/gemini-api/docs/function-calling |
| GitHub Copilot Skills | https://docs.github.com/en/copilot/concepts/agents/about-agent-skills |
| Copilot Embedding Routing | https://github.blog/ai-and-ml/github-copilot/how-were-making-github-copilot-smarter-with-fewer-tools/ |

Document Version: 2.0 | Last Updated: 2026-02-05

What's New in v2.0:

  • Added TL;DR Quick Start section
  • Added Three Levels (Progressive Disclosure) diagram
  • Added Evaluation-First Development (§4.1)
  • Added Degrees of Freedom guidance (§4.2)
  • Added Script Guidelines (§4.9)
  • Added Workflow Checklist Pattern (§7.6)
  • Added Security Considerations (§8.3)
  • Added Anthropic Agent Skills references