This guide uses everyday analogies to explain Machine Learning concepts and the MLOps ecosystem tools: Kubeflow, MLflow, and KServe.
Imagine you want to create a restaurant chain that serves the perfect dish to every customer. To achieve this, you need:
- A culinary school to train chefs (Kubeflow)
- A professional registry to catalog and version chefs (MLflow)
- Industrial kitchens where chefs work (KServe)
┌─────────────────────────────────────────────────────────────────────────────┐
│ REAL WORLD vs MACHINE LEARNING │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ RESTAURANT CHAIN MLOPS │
│ ──────────────── ───── │
│ Culinary school = Kubeflow (orchestrated training) │
│ Professional chef registry = MLflow (model registry) │
│ Industrial kitchen = KServe (model serving) │
│ │
│ INSIDE EACH RESTAURANT MACHINE LEARNING │
│ ────────────────────── ──────────────── │
│ Apprentice chef = Algorithm (code) │
│ Recipes and practice = Training data │
│ Experienced chef = Trained model │
│ Cooking a dish = Making an inference │
│ Customer order = Input (input data) │
│ Served dish = Output (prediction) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
TRAINING (happens once, before "opening the restaurant")
─────────────────────────────────────────────────────────
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ RECIPES │ │ PRACTICE │ │ EXPERIENCED │
│ (Data) │ ───▶│ (Training) │ ───▶│ CHEF │
│ │ │ │ │ (Model) │
└──────────────┘ └──────────────┘ └──────────────┘
Real example:
- Data: 1000 labeled photos of cats and dogs
- Training: Algorithm learns patterns (ears, snouts, etc.)
- Model: File that "knows" how to distinguish cats from dogs
INFERENCE (happens every time a customer places an order)
────────────────────────────────────────────────────────
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ ORDER │ │ CHEF │ │ DISH │
│ (Input) │ ───▶│ KITCHEN │ ───▶│ (Output) │
│ │ │ (Model) │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
Real example:
- Input: New photo of an animal
- Model: Analyzes the photo using what it learned
- Output: "It's a cat" (with 95% confidence)
| Aspect | Training | Inference |
|---|---|---|
| Analogy | Culinary school | Running restaurant |
| Frequency | Once (or periodically) | Thousands of times per day |
| Resources | Heavy CPU/GPU, hours/days | Light, milliseconds |
| Where it happens | Notebooks, training clusters | Production servers (KServe!) |
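In code, the two phases look like this. A minimal sketch using scikit-learn's bundled iris dataset (hyperparameters chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# TRAINING - happens once, offline ("culinary school")
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, max_depth=4)
model.fit(X, y)

# INFERENCE - happens per request, online ("running restaurant")
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))  # e.g. [0] -> Iris Setosa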
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ ML MODEL JOURNEY │
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ KUBEFLOW │ │ MLFLOW │ │ KSERVE │ │
│ │ │ │ │ │ │ │
│ │ "Culinary │ ───▶│ "Professional │ ───▶│ "Industrial │ │
│ │ School" │ │ Registry" │ │ Kitchen" │ │
│ │ │ │ │ │ │ │
│ │ Trains the │ │ Catalogs and │ │ Puts chef │ │
│ │ chef │ │ versions │ │ to work │ │
│ │ │ │ the chef │ │ │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
│ DEVELOPMENT MANAGEMENT/GOVERNANCE PRODUCTION │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ YOU WANT: A restaurant chain where each location serves perfect dishes, │
│ with certified chefs and standardized kitchens. │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ KUBEFLOW = CULINARY SCHOOL │
│ ────────────────────────── │
│ • Where chefs are trained │
│ • Has structured curriculum (Pipelines) │
│ • Laboratories for experiments (Notebooks) │
│ • Classrooms with heavy equipment (GPUs for training) │
│ • Can train multiple chefs simultaneously (distributed jobs) │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ MLFLOW = PROFESSIONAL CHEF REGISTRY │
│ ─────────────────────────────────── │
│ • History of all courses taken (experiment tracking) │
│ • Certificates and diplomas (model artifacts) │
│ • Versions: Junior Chef, Mid Chef, Senior Chef (model versions) │
│ • Who's working vs retired (staging, production, archived) │
│ • Compare performance between chefs (metrics comparison) │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ KSERVE = INDUSTRIAL KITCHEN │
│ ─────────────────────────── │
│ • Where the certified chef goes to work │
│ • Kitchen ready to use, just bring the chef │
│ • Serves thousands of orders per day │
│ • Opens more kitchens if needed (auto-scaling) │
│ • Closes at night if no customers (scale-to-zero) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Kubeflow is an ML platform for Kubernetes that orchestrates the entire model development lifecycle. It's like a complete culinary school.
┌─────────────────────────────────────────────────────────────────────────────┐
│ KUBEFLOW = CULINARY SCHOOL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ KUBEFLOW COMPONENT ANALOGY │
│ ────────────────── ─────── │
│ │
│ Kubeflow Notebooks = Experimental laboratory │
│ (where you test new recipes) │
│ │
│ Kubeflow Pipelines = Structured curriculum │
│ (sequence of classes until graduation) │
│ │
│ Training Operators = Specialized classrooms │
│ (TFJob, PyTorchJob) (with specific equipment) │
│ │
│ Katib = Optimization program │
│ (hyperparameter tuning) (finding the best teaching technique) │
│ │
│ KServe (integrated) = Internship program │
│ (putting graduates to work) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
A Pipeline is a sequence of steps that transform an apprentice into a chef:
PIPELINE = TRAINING CURRICULUM
──────────────────────────────
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Fetch │ │ Prepare │ │ Train │ │ Evaluate │ │ Register │
│ Data │──▶│ Data │──▶│ Model │──▶│ Model │──▶│in MLflow │
│ │ │ │ │ │ │ │ │ │
│"Buy │ │"Organize │ │"Intensive│ │"Final │ │"Issue │
│ingredi- │ │ pantry" │ │ course" │ │ exam" │ │ diploma" │
│ ents" │ │ │ │ │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Each box is a pipeline "component" that runs in a separate container.
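The components themselves (fetch_data, prepare_data, and so on) are not shown in full here. As a rough sketch, one of them might look like this with the KFP v2 SDK (the function body and base image are illustrative assumptions):

# component.py - one "lesson" as a containerized step (illustrative sketch)
from kfp import dsl

@dsl.component(base_image="python:3.11")
def fetch_data(source: str, data: dsl.Output[dsl.Dataset]):
    """Buy ingredients: copy the raw data into the pipeline's artifact store."""
    import urllib.request
    urllib.request.urlretrieve(source, data.path)  # assumes an HTTP(S)-reachable source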
# pipeline.py - Chef training curriculum
from kfp import dsl

@dsl.pipeline(name="train-iris-chef")
def train_chef():
    # Lesson 1: Get ingredients
    data = fetch_data(source="s3://data/iris.csv")

    # Lesson 2: Prepare ingredients
    prepared_data = prepare_data(data=data.output)

    # Lesson 3: Intensive course (training)
    model = train_model(
        data=prepared_data.output,
        algorithm="sklearn.RandomForest",
        epochs=100,
    )

    # Lesson 4: Final exam (evaluation)
    metrics = evaluate_model(model=model.output)

    # Lesson 5: Issue diploma (register in MLflow)
    register_mlflow(
        model=model.output,
        metrics=metrics.output,
        name="iris-chef",
    )

┌─────────────────────────────────────────────────────────────────────────────┐
│ TRAINING OPERATORS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ TFJob For training with TensorFlow │
│ (TensorFlow room) "Room with Google equipment" │
│ │
│ PyTorchJob For training with PyTorch │
│ (PyTorch room) "Room with Meta equipment" │
│ │
│ MPIJob For distributed training │
│ (distributed room) "Multiple connected rooms for large classes" │
│ │
│ XGBoostJob For training with XGBoost │
│ (XGBoost room) "Room specialized in tabular data" │
│ │
│ │
│ Why different rooms? │
│ Each framework has specific resource and configuration needs. │
│ Operators abstract this complexity away. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
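To actually run the curriculum, the pipeline defined above is compiled and submitted to the cluster. A minimal sketch with the KFP v2 SDK; the host URL is an assumption for an in-cluster Kubeflow Pipelines endpoint:

# run_pipeline.py - compile the curriculum and enroll the apprentice
from kfp import compiler, Client
from pipeline import train_chef

# Compile the @dsl.pipeline function into a portable YAML definition
compiler.Compiler().compile(train_chef, "train_chef_pipeline.yaml")

# Submit a run to the Kubeflow Pipelines API (host is an assumption)
client = Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")
client.create_run_from_pipeline_package("train_chef_pipeline.yaml")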
MLflow is a platform for managing the ML lifecycle. It's like the professional registry that keeps history, certificates, and versions of all chefs.
┌─────────────────────────────────────────────────────────────────────────────┐
│ MLFLOW = PROFESSIONAL CHEF REGISTRY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ MLFLOW COMPONENT ANALOGY │
│ ──────────────── ─────── │
│ │
│ MLflow Tracking = Grade book │
│ (all attempts, parameters, results) │
│ │
│ MLflow Models = Standard diploma format │
│ (certificate any kitchen accepts) │
│ │
│ MLflow Model Registry = Records office │
│ (official versions, status, approvals) │
│ │
│ MLflow Projects = Instruction manual │
│ (how to reproduce the training) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Each training attempt is recorded with all details:
┌─────────────────────────────────────────────────────────────────────────────┐
│ EXPERIMENT: train-iris-chef │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ RUN #1 (first attempt) │
│ ────────────────────── │
│ Parameters: n_estimators=10, max_depth=3 │
│ Metrics: accuracy=0.85, f1_score=0.83 │
│ Artifacts: model.pkl, confusion_matrix.png │
│ Status: ❌ Failed (accuracy < 0.90) │
│ │
│ RUN #2 (second attempt) │
│ ─────────────────────── │
│ Parameters: n_estimators=100, max_depth=5 │
│ Metrics: accuracy=0.92, f1_score=0.91 │
│ Artifacts: model.pkl, confusion_matrix.png │
│ Status: ❌ Failed (overfitting detected) │
│ │
│ RUN #3 (third attempt) │
│ ────────────────────── │
│ Parameters: n_estimators=50, max_depth=4 │
│ Metrics: accuracy=0.95, f1_score=0.94 │
│ Artifacts: model.pkl, confusion_matrix.png │
│ Status: ✅ Passed! Register in Model Registry │
│ │
│ Analogy: It's like keeping all exams, with grades and comments, │
│ to know exactly what worked and what didn't. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
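Runs like these can also be queried and ranked programmatically instead of through the UI. A minimal sketch, assuming the experiment name used above:

import mlflow

# Grade book query: list all runs of the experiment, best accuracy first
runs = mlflow.search_runs(
    experiment_names=["train-iris-chef"],
    order_by=["metrics.accuracy DESC"],
)
print(runs[["run_id", "params.max_depth", "metrics.accuracy"]])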
┌─────────────────────────────────────────────────────────────────────────────┐
│ MODEL REGISTRY: iris-chef │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ VERSION STAGE DESCRIPTION │
│ ─────── ───── ─────────── │
│ │
│ v1 Archived "Retired chef" │
│ First model, accuracy 0.85 │
│ Replaced by v2 │
│ │
│ v2 Production "Working chef" ← CURRENT IN KSERVE │
│ Current model, accuracy 0.95 │
│ In production since 2024-01-15 │
│ │
│ v3 Staging "Chef in testing" │
│ New model, accuracy 0.97 │
│ Awaiting approval for production │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ STAGES: │
│ │
│ None → Model registered but not classified │
│ Staging → In testing/validation (intern chef) │
│ Production → In active use (hired chef) │
│ Archived → Retired/historical (retired chef) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
import mlflow
from mlflow.tracking import MlflowClient

# During training - log experiment
with mlflow.start_run():
    # Parameters (recipe ingredients)
    mlflow.log_param("n_estimators", 50)
    mlflow.log_param("max_depth", 4)

    # ... train model ...

    # Metrics (exam grades)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.94)

    # Artifacts (diploma + portfolio)
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_artifact("confusion_matrix.png")

# After approval - register official version
client = MlflowClient()
client.create_registered_model("iris-chef")
client.create_model_version(
    name="iris-chef",
    source="runs:/abc123/model",
    run_id="abc123",
)

# Promote to production
client.transition_model_version_stage(
    name="iris-chef",
    version="2",
    stage="Production",
)

┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ COMPLETE MLOPS FLOW │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 1. KUBEFLOW │ │
│ │ (Culinary School) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Notebook │─▶│ Pipeline │─▶│ Training │─▶│ Trained │ │ │
│ │ │(experim.)│ │ (ETL) │ │ Job │ │ Model │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └─────┬────┘ │ │
│ │ │ │ │
│ └──────────────────────────────────────────────────────┼──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 2. MLFLOW │ │
│ │ (Professional Registry) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Track │─▶│ Compare │─▶│ Register │─▶│ Promote │ │ │
│ │ │experiment│ │ runs │ │ model │ │to "Prod" │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └─────┬────┘ │ │
│ │ │ │ │
│ └──────────────────────────────────────────────────────┼──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 3. KSERVE │ │
│ │ (Industrial Kitchen) │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Deploy │─▶│ Serve │─▶│ Scale │─▶│ Monitor │ │ │
│ │ │ model │ │ requests │ │ as needed│ │ & logs │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
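Once a version is promoted, any consumer can pull it straight from the registry by stage. A minimal sketch with the MLflow Python API (assumes the tracking URI is configured, e.g. via MLFLOW_TRACKING_URI):

import mlflow.pyfunc

# "models:/<name>/<stage>" resolves to whatever version is in that stage
model = mlflow.pyfunc.load_model("models:/iris-chef/Production")
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))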
KServe can fetch models directly from MLflow Model Registry:
# inference-service-mlflow.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: iris-chef-prod
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Artifact path of the "iris-chef" version promoted to "Production" in MLflow
      storageUri: "gs://mlflow-bucket/mlartifacts/123/abc/artifacts/model"

┌────────────────────────────────────────────────────────────────────────────┐
│ TOOLS COMPARISON │
├────────────────┬───────────────────┬───────────────────┬───────────────────┤
│ │ KUBEFLOW │ MLFLOW │ KSERVE │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Analogy │ Culinary │ Professional │ Industrial │
│ │ School │ Registry │ Kitchen │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Focus │ Training and │ Tracking and │ Production │
│ │ experimentation │ versioning │ serving │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Lifecycle │ Development │ Management/ │ Production │
│ phase │ │ Governance │ │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Main │ • Pipelines │ • Tracking │ • InferenceService│
│ features │ • Notebooks │ • Model Registry │ • Auto-scaling │
│ │ • Training Jobs │ • Artifacts │ • Canary deploy │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Runs on │ Kubernetes │ Standalone or K8s │ Kubernetes │
├────────────────┼───────────────────┼───────────────────┼───────────────────┤
│ Used by │ Data Scientists │ DS + ML Engineers │ ML Engineers + │
│ │ │ │ DevOps │
└────────────────┴───────────────────┴───────────────────┴───────────────────┘
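In practice the storageUri in the manifest above is looked up from the registry rather than hand-copied. A minimal sketch (note that get_latest_versions is deprecated in newer MLflow releases in favor of aliases, but still works):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Artifact location of the version currently in "Production" -
# this is what goes into the InferenceService storageUri
for mv in client.get_latest_versions("iris-chef", stages=["Production"]):
    print(mv.version, mv.source)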
DAY 1: EXPERIMENTATION (Data Scientist in Kubeflow)
───────────────────────────────────────────────────
1. Opens Kubeflow Notebook
2. Loads data, explores, tests algorithms
3. Finds a promising approach
4. MLflow automatically logs each experiment
DAY 2: TRAINING PIPELINE (Kubeflow + MLflow)
────────────────────────────────────────────
1. Creates Pipeline in Kubeflow with steps:
- Fetch data → Preprocess → Train → Evaluate → Register
2. Executes Pipeline
3. MLflow logs metrics and artifacts
4. Model approved → registers in Model Registry
DAY 3: DEPLOY TO PRODUCTION (KServe)
────────────────────────────────────
1. ML Engineer creates InferenceService pointing to MLflow
2. KServe downloads model from Model Registry
3. Model starts serving requests
4. Active monitoring
DAY 30: NEW MODEL (Cycle continues)
────────────────────────────────────
1. New model trained (v3) with better accuracy
2. Deploy as canary (10% of traffic)
3. Good metrics → promote to 100%
4. Old model (v2) archived
You trained your chef (model). Now you need a professional kitchen to serve thousands of customers. This is where KServe comes in.
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ WITHOUT KSERVE (improvised kitchen) WITH KSERVE (industrial kitchen) │
│ ─────────────────────────────────── ──────────────────────────────── │
│ │
│ - You set up everything manually - Kitchen ready to use │
│ - Configure stove, sink, fridge - Just bring the chef (model) │
│ - Hire waiters - Automatic service │
│ - Manage queues manually - Managed queues │
│ - If crowded, customers wait - Opens more kitchens (auto-scaling)│
│ - Kitchen always on - Turns off when empty (scale-zero) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
What you provide: What KServe provides:
──────────────────── ────────────────────────
┌──────────────────┐ ┌──────────────────────────────────┐
│ │ │ • Web server (waiter) │
│ YOUR MODEL │ │ • Load balancing │
│ (chef.pkl) │───────────▶│ • Auto-scaling │
│ │ │ • Health checks │
└──────────────────┘ │ • Logging and metrics │
│ • Automatic model download │
└──────────────────────────────────┘
The InferenceService is like a contract you sign with the industrial kitchen:
"I want to serve this model, it's stored at this address, and it's sklearn type"
# "Contract" with the KServe kitchen
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
name: sklearn-iris # Your "restaurant" name
spec:
predictor:
model:
modelFormat:
name: sklearn # Chef type (sklearn, tensorflow, pytorch...)
storageUri: "gs://..." # Where the chef is storedDifferent chefs need different kitchens:
┌─────────────────────────────────────────────────────────────────────────────┐
│ KITCHEN TYPES (ServingRuntimes) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Sklearn Chef ──▶ kserve-sklearnserver kitchen (simple, light) │
│ TensorFlow Chef ──▶ kserve-tensorflow kitchen (heavy, GPUs) │
│ PyTorch Chef ──▶ kserve-torchserve kitchen (flexible) │
│ XGBoost Chef ──▶ kserve-xgbserver kitchen (tabular data) │
│ HuggingFace Chef ──▶ kserve-huggingfaceserver kitchen (LLMs, NLP) │
│ │
│ You don't need to choose! KServe automatically detects based on │
│ the modelFormat you specified. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The model needs to be somewhere accessible:
┌─────────────────────────────────────────────────────────────────────────────┐
│ WHERE TO STORE THE MODEL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ gs://bucket/model Google Cloud Storage (Google cloud) │
│ s3://bucket/model AWS S3 or MinIO (Amazon cloud or self-hosted) │
│ http://server/model Public HTTP server │
│ pvc://my-pvc/model Volume inside Kubernetes │
│ │
│ Analogy: It's like the warehouse address where your chef is waiting │
│ to be picked up and taken to the kitchen. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
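Getting the chef into the warehouse is a plain object upload. A minimal sketch with boto3 against a self-hosted MinIO (endpoint, credentials, bucket, and file names are all assumptions):

import boto3

# Upload the serialized model to the "warehouse" KServe will download from
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.minio-system.svc:9000",  # assumed MinIO endpoint
    aws_access_key_id="minio",                          # assumed credentials
    aws_secret_access_key="minio123",
)
s3.upload_file("model.joblib", "models", "iris/model.joblib")
# The InferenceService would then use: storageUri: "s3://models/iris/"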
┌─────────────────────────────────────────────────────────────────────────────┐
│ ORDER FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ ORDER │ TRANSFORMER │ Prep cook (assistant) │
│ (request) ───▶ │ (optional) │ - Washes ingredients │
│ │ │ - Cuts, seasons │
│ └──────┬──────┘ - Prepares for the chef │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ PREDICTOR │ Head chef (REQUIRED) │
│ │ (model) │ - Makes the dish (inference) │
│ │ │ - Is your ML model │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ RESPONSE │ EXPLAINER │ Sommelier (optional) │
│ (response) ◀─── │ (optional) │ - Explains why this dish │
│ │ │ - "I chose this wine because..." │
│ └─────────────┘ - Useful for understanding ML decisions│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
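The prep cook can be implemented with the kserve Python SDK by overriding preprocess/postprocess. A rough sketch following the pattern in the KServe docs; the class name, scaling step, and label map are illustrative assumptions:

from kserve import Model, ModelServer

class IrisTransformer(Model):
    """Prep cook: adjusts each order before it reaches the chef (predictor)."""

    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host  # where the predictor pod listens
        self.ready = True

    def preprocess(self, payload: dict, headers: dict = None) -> dict:
        # Illustrative prep step: assume raw inputs arrive in millimeters
        instances = [[v / 10 for v in row] for row in payload["instances"]]
        return {"instances": instances}

    def postprocess(self, response: dict, headers: dict = None) -> dict:
        # Garnish: translate class indices into species names
        names = {0: "setosa", 1: "versicolor", 2: "virginica"}
        return {"predictions": [names[p] for p in response["predictions"]]}

if __name__ == "__main__":
    ModelServer().start([IrisTransformer("sklearn-iris", "sklearn-iris-predictor")])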
PROBLEM: Identify flower species based on petal/sepal measurements
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ INPUT (flower measurements): │
│ ┌─────────────────────────────────────────┐ │
│ │ sepal_length: 5.1 cm │ │
│ │ sepal_width: 3.5 cm │ ─────▶ Model │
│ │ petal_length: 1.4 cm │ │ │
│ │ petal_width: 0.2 cm │ │ │
│ └─────────────────────────────────────────┘ │ │
│ ▼ │
│ OUTPUT (prediction): ◀─────── Class: 0 │
│ ┌─────────────────────────────────────────┐ (Iris Setosa) │
│ │ 0 = Iris Setosa │ │
│ │ 1 = Iris Versicolor │ │
│ │ 2 = Iris Virginica │ │
│ └─────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
1. YOU APPLY THE YAML
─────────────────────────────────────────────────────────────────────
kubectl apply -f sklearn-iris.yaml
"Hey KServe, I want to open a restaurant called sklearn-iris"
2. KSERVE CREATES THE INFRASTRUCTURE
─────────────────────────────────────────────────────────────────────
┌──────────────────────────────────────────────────────────┐
│ KServe creates: │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Deployment │ │ Service │ │ Ingress │ │
│ │ (kitchen) │ │ (waiter) │ │ (entrance) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└──────────────────────────────────────────────────────────┘
3. STORAGE-INITIALIZER DOWNLOADS THE MODEL
─────────────────────────────────────────────────────────────────────
gs://bucket/model ────download────▶ /mnt/models/
"Go to the warehouse and bring the chef to the kitchen"
4. SERVER STARTS AND LOADS THE MODEL
─────────────────────────────────────────────────────────────────────
kserve-sklearnserver loads sklearn-iris.pkl into memory
"Chef is in the kitchen, ready to cook!"
5. CLIENT MAKES REQUEST
─────────────────────────────────────────────────────────────────────
curl -X POST .../v1/models/sklearn-iris:predict
-d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
"Table 7 wants a dish with these ingredients"
6. MODEL RETURNS PREDICTION
─────────────────────────────────────────────────────────────────────
{"predictions": [0]}
"Here it is: Iris Setosa! (class 0)"
# sklearn-iris.yaml - Each line explained
apiVersion: serving.kserve.io/v1beta1   # KServe "recipe" version
kind: InferenceService                  # Resource type: inference service
metadata:
  name: sklearn-iris                    # Your service name (how you'll call it)
spec:
  predictor:                            # "Head chef" configuration
    model:
      modelFormat:
        name: sklearn                   # Model type (sklearn, tensorflow, etc.)
                                        # KServe uses this to choose the right kitchen
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
                                        # Address where the model is stored
                                        # KServe will download it automatically
      resources:                        # How many resources the kitchen needs
        requests:                       # Minimum guaranteed
          cpu: "100m"                   # 0.1 CPU (100 millicores)
          memory: "256Mi"               # 256 MiB of RAM
        limits:                         # Maximum allowed
          cpu: "500m"                   # 0.5 CPU
          memory: "512Mi"               # 512 MiB of RAM

┌─────────────────────────────────────────────────────────────────────────────┐
│ V1 PROTOCOL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Endpoint: POST /v1/models/{model-name}:predict │
│ │
│ Request (order): │
│ { │
│ "instances": [ // List of "orders" │
│ [5.1, 3.5, 1.4, 0.2], // Order 1: one flower │
│ [6.7, 3.1, 4.4, 1.4] // Order 2: another flower │
│ ] │
│ } │
│ │
│ Response: │
│ { │
│ "predictions": [0, 1] // Answers: Setosa, Versicolor │
│ } │
│ │
│ Analogy: "I want these two dishes" → "Here are the two dishes" │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
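From Python, a V1 call is a single JSON POST. A minimal sketch with requests, assuming the port-forward shown in the cheat sheet later in this guide:

import requests

# Two "orders" in one request, as in the example above
url = "http://localhost:8080/v1/models/sklearn-iris:predict"
payload = {"instances": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.1, 4.4, 1.4]]}

resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # expected: {"predictions": [0, 1]}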
┌─────────────────────────────────────────────────────────────────────────────┐
│ V2 PROTOCOL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Endpoint: POST /v2/models/{model-name}/infer │
│ │
│ Request (more detailed): │
│ { │
│ "inputs": [{ │
│ "name": "input-0", // Ingredient name │
│ "shape": [1, 4], // Format: 1 order, 4 values │
│ "datatype": "FP32", // Type: decimal numbers │
│ "data": [[5.1, 3.5, 1.4, 0.2]] // The values │
│ }] │
│ } │
│ │
│ Advantage: More precise, industry standard, works with any │
│ framework. It's like an order with all specifications. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
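The same order in the V2 protocol carries explicit tensor metadata. A minimal sketch, again assuming a local port-forward:

import requests

url = "http://localhost:8080/v2/models/sklearn-iris/infer"
payload = {
    "inputs": [{
        "name": "input-0",        # tensor name
        "shape": [1, 4],          # one flower, four measurements
        "datatype": "FP32",       # 32-bit floats
        "data": [[5.1, 3.5, 1.4, 0.2]],
    }]
}
print(requests.post(url, json=payload, timeout=10).json())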
┌─────────────────────────────────────────────────────────────────────────────┐
│ INFERENCESERVICE STATES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ │
│ │ Unknown │ "Order received" │
│ └────┬────┘ │
│ │ │
│ ▼ │
│ ┌─────────┐ │
│ │ Pending │ "Setting up kitchen, fetching chef" │
│ └────┬────┘ │
│ │ │
│ ┌─────────┴─────────┐ │
│ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ │
│ │ Ready │ │ Failed │ │
│ │ ✓ │ │ ✗ │ │
│ └─────────┘ └─────────┘ │
│ "Restaurant open!" "Something went wrong" │
│ │
│ Check: kubectl get isvc sklearn-iris │
│ READY=True means it's working! │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
# See all conditions
kubectl get isvc sklearn-iris -o jsonpath='{range .status.conditions[*]}{.type}: {.status}{"\n"}{end}'
# Result:
# IngressReady: True ← Restaurant door is open
# PredictorReady: True ← Chef is in the kitchen
# Ready: True           ← Everything working!

┌─────────────────────────────────────────────────────────────────────────────┐
│ PROBLEM │ ANALOGY │ SOLUTION │
├───────────────────────────────┼────────────────────────┼────────────────────┤
│ │ │ │
│ Pod in "Pending" │ Kitchen doesn't have │ Check if there │
│ │ space/resources │ are resources in │
│ │ │ the cluster │
│ │ │ │
├───────────────────────────────┼────────────────────────┼────────────────────┤
│ │ │ │
│ Pod in "CrashLoopBackOff" │ Chef arrives and │ Check logs: │
│ │ faints │ kubectl logs ... │
│ │ │ │
├───────────────────────────────┼────────────────────────┼────────────────────┤
│ │ │ │
│ "Model not found" │ Chef isn't at the │ Check │
│ │ given address │ storageUri │
│ │ │ │
├───────────────────────────────┼────────────────────────┼────────────────────┤
│ │ │ │
│ Download timeout │ Warehouse too far │ Model too large │
│ │ or closed │ or URL │
│ │ │ inaccessible │
│ │ │ │
└─────────────────────────────────────────────────────────────────────────────┘
# 1. Check general status
kubectl get isvc sklearn-iris
# READY=False? Something's wrong!
# 2. See details and events
kubectl describe isvc sklearn-iris
# Look for "Events" at the bottom
# 3. Check pods
kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris
# STATUS other than "Running"? Problem!
# 4. Check model download logs (storage-initializer)
kubectl logs <pod-name> -c storage-initializer
# Download errors appear here
# 5. Check server logs (kserve-container)
kubectl logs <pod-name> -c kserve-container
# Model loading errors appear here

# ══════════════════════════════════════════════════════════════════════════════
# DAILY COMMANDS
# ══════════════════════════════════════════════════════════════════════════════
# APPLY a model
kubectl apply -f sklearn-iris.yaml
# LIST all models
kubectl get isvc
# SEE DETAILS of a model
kubectl describe isvc sklearn-iris
# CHECK LOGS of the model
kubectl logs -l serving.kserve.io/inferenceservice=sklearn-iris -c kserve-container
# DELETE a model
kubectl delete isvc sklearn-iris
# ══════════════════════════════════════════════════════════════════════════════
# TEST INFERENCE
# ══════════════════════════════════════════════════════════════════════════════
# 1. Find the pod
POD=$(kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris \
-o jsonpath='{.items[0].metadata.name}')
# 2. Open tunnel (in one terminal)
kubectl port-forward pod/$POD 8080:8080
# 3. Make request (in another terminal)
curl -X POST http://localhost:8080/v1/models/sklearn-iris:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
# Expected response: {"predictions": [0]}

| Term | Meaning | Analogy |
|---|---|---|
| Model | File with learned "knowledge" | Trained chef |
| Training | Process of creating the model | Culinary school course |
| Inference | Using model to predict | Cooking a dish |
| Serving | Making model available as service | Opening a restaurant |
| MLOps | Practices for operationalizing ML | Restaurant chain management |
| Term | Meaning | Analogy |
|---|---|---|
| Pipeline | Sequence of automated steps | Training curriculum |
| Component | Individual pipeline step | One class in the course |
| Experiment | Set of pipeline executions | Class of students |
| Run | One pipeline execution | One student taking the course |
| Notebook | Interactive development environment | Experimental laboratory |
| Training Operator | Manages training jobs | Specialized classroom |
| Katib | Hyperparameter optimization | Finding best teaching method |
| Term | Meaning | Analogy |
|---|---|---|
| Experiment | Group of related attempts | Course record |
| Run | One training attempt | One exam/evaluation |
| Parameter | Configuration used in training | Recipe ingredients |
| Metric | Performance measurement | Exam grade |
| Artifact | Generated file (model, charts) | Diploma + portfolio |
| Model Registry | Catalog of official models | Records office |
| Stage | Model status (Staging/Production) | Intern vs Hired |
| Term | Meaning | Analogy |
|---|---|---|
| InferenceService | Resource that defines serving | Restaurant contract |
| Predictor | Component that makes predictions | Head chef |
| Transformer | Pre/post-processing | Kitchen assistant |
| Explainer | Explains predictions | Sommelier |
| StorageUri | Where model is stored | Warehouse address |
| ServingRuntime | Execution environment | Kitchen type |
| Scale-to-zero | Turn off when no orders | Close kitchen at night |
| Auto-scaling | Automatically adjust capacity | Open more kitchens |
| Canary | Gradual deploy of new version | Test new chef with few customers |
Now that you understand the concepts, you can:
- Create a Notebook: Experiment with data in the Kubeflow environment
- Create a Pipeline: Automate the training flow
- Use Katib: Automatically optimize hyperparameters
- Instrument your code: Add `mlflow.log_*` calls in training
- Use Model Registry: Version and promote models
- Compare experiments: Use MLflow UI for analysis
- Try other models: XGBoost, TensorFlow
- Add Transformer: Preprocess data before inference
- Configure auto-scaling: Adjust replicas based on demand
- Canary deploy: Test new model with part of the traffic
- Complete pipeline: Kubeflow → MLflow → KServe automated
- Monitor: Prometheus/Grafana for inference metrics
- CI/CD for ML: GitHub Actions + ArgoCD for automatic deploy
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ MLOPS ECOSYSTEM = RESTAURANT CHAIN │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ KUBEFLOW MLFLOW KSERVE │ │
│ │ ════════ ══════ ══════ │ │
│ │ │ │
│ │ Culinary Professional Industrial │ │
│ │ School Registry Kitchen │ │
│ │ │ │
│ │ "Where the chef "Where the chef "Where the chef │ │
│ │ is trained" is cataloged" works" │ │
│ │ │ │
│ │ • Notebooks • Tracking • Serving │ │
│ │ • Pipelines • Model Registry • Auto-scaling │ │
│ │ • Training Jobs • Artifacts • Canary deploy │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Flow: Train (Kubeflow) → Register (MLflow) → Serve (KServe) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Remember: Kubeflow is the school, MLflow is the registry, KServe is the kitchen. Together, they form the complete infrastructure to take your model from notebook to production!