Gourav J. Shah (gouravjshah)


This Dockerfile builds a container image for running vLLM (a Large Language Model inference engine) on CPU, with a patch and a few CPU-specific optimizations. Here's a breakdown:

Base Image

FROM openeuler/vllm-cpu:0.9.1-oe2403lts

  • Uses OpenEuler Linux distribution's pre-built vLLM image (version 0.9.1)
  • Built for CPU inference (not GPU)
  • Based on OpenEuler 24.03 LTS

Critical Patch (Lines 4-5)
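The `sed` command in the Dockerfile (lines 4-5 of the Dockerfile itself) rewrites one expression in vLLM's `cpu_worker.py`. On hosts that report zero NUMA nodes, the original integer division raises `ZeroDivisionError`; the patched expression falls back to the full CPU count. A minimal sketch of the patched logic (the function wrapper is illustrative, not vLLM's actual code):

```python
def cpu_count_per_numa(cpu_count: int, numa_size: int) -> int:
    """Patched logic from the sed command.

    Original expression: cpu_count // numa_size
    (raises ZeroDivisionError when numa_size == 0)
    """
    return cpu_count // numa_size if numa_size > 0 else cpu_count

print(cpu_count_per_numa(8, 2))  # 4 CPUs per NUMA node
print(cpu_count_per_numa(8, 0))  # 8: no NUMA nodes reported, use all CPUs
```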

Lab: Using kubectl-ai --mcp-server with Cursor to Inspect the atharva-ml Namespace

0. Lab Goals

By the end of this lab you’ll be able to:

  • Run kubectl-ai as an MCP server.
  • Wire it into Cursor via mcp.json.
  • Use Cursor chat + kubectl-ai tools to:
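For the second step, a minimal `mcp.json` sketch (the server name and command path are assumptions; point `command` at wherever your `kubectl-ai` binary lives):

```json
{
  "mcpServers": {
    "kubectl-ai": {
      "command": "kubectl-ai",
      "args": ["--mcp-server"]
    }
  }
}
```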

0) Repo layout (GitOps view)

Add the existing code, then check the working tree:


git status
@gouravjshah
gouravjshah / Dockerfile
Created November 18, 2025 07:58
Dockerfile for vLLM with CPU-only Serving
FROM openeuler/vllm-cpu:0.9.1-oe2403lts

# Patch cpu_worker.py to handle hosts that report zero NUMA nodes
RUN sed -i 's/cpu_count_per_numa = cpu_count \/\/ numa_size/cpu_count_per_numa = cpu_count \/\/ numa_size if numa_size > 0 else cpu_count/g' \
    /workspace/vllm/vllm/worker/cpu_worker.py

ENV VLLM_TARGET_DEVICE=cpu \
    VLLM_CPU_KVCACHE_SPACE=1 \
    OMP_NUM_THREADS=2 \
    OPENBLAS_NUM_THREADS=1 \
@gouravjshah
gouravjshah / loki-values.yaml
Created November 9, 2025 14:33
Fixed Loki values.yaml
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
loki:
  commonConfig:
    replication_factor: 1
  # Required for new installs
@gouravjshah
gouravjshah / get_grafana_admin_pass.md
Created November 7, 2025 04:30
Get Grafana Admin Password
kubectl get secret -n monitoring prom-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
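The pipeline above just base64-decodes the `admin-password` field of the Secret (Kubernetes stores Secret values base64-encoded). The same decoding step in Python, with an illustrative encoded value:

```python
import base64

def decode_k8s_secret(encoded: str) -> str:
    """Decode a base64-encoded Kubernetes Secret value to plain text."""
    return base64.b64decode(encoded).decode("utf-8")

# Illustrative value: "prom-operator" as it would appear in the Secret's data field
print(decode_k8s_secret("cHJvbS1vcGVyYXRvcg=="))  # prom-operator
```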
@gouravjshah
gouravjshah / databases_on_kubernetes.md
Created October 29, 2025 07:30
Best Practices for Running Databases on Kubernetes

The following is a crisp, battle-tested playbook for running databases on Kubernetes: what to do, what to avoid, and how to keep them safe, fast, and recoverable.

Before you start

  • Default to managed DBs if possible (RDS/Aurora/Cloud SQL/AlloyDB/Atlas). Run on K8s only when you need: portability, custom extensions, tight sidecar/tooling, or cost control with commodity nodes.
  • Use an Operator, not raw manifests. Prefer mature operators (e.g., Crunchy/Percona for Postgres & MySQL, Vitess for MySQL sharding, PXC/MongoDB Enterprise/StackGres, RabbitMQ Operator for queues). Operators give sane HA, backups, upgrades, and day-2 ops.

Core architecture

  • StatefulSets + Headless Services for stable identities and volumes.
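The StatefulSet + headless Service pairing above can be sketched as follows (names, image, and sizes are illustrative, not from a specific operator):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres          # headless Service: gives each pod a stable DNS name
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres   # ties pod identities to the headless Service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:   # one PersistentVolume per replica, survives rescheduling
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```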
@gouravjshah
gouravjshah / airbnb_mcp-gemini.py
Created October 17, 2025 04:47
Agno Agent to search for AirBnB Listings
# airbnb_mcp.py
from textwrap import dedent
from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.mcp import MCPTools
from agno.tools.reasoning import ReasoningTools
from agno.os import AgentOS

Switch to the instavote namespace:

kubectl config set-context --current --namespace=instavote
helm uninstall -n dev instavote 
kubectl delete deploy vote redis db result worker  -n instavote 
kubectl delete svc vote redis db result -n instavote 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vote
  namespace: instavote
spec:
  ingressClassName: nginx
  rules:
  - host: vote.example.com