Gourav J. Shah (gouravjshah)


This Dockerfile builds a container image for running vLLM (a Large Language Model inference engine) on CPU, with a patch and a few CPU-specific optimizations. Here's a breakdown:

Base Image

FROM openeuler/vllm-cpu:0.9.1-oe2403lts

  • Uses OpenEuler Linux distribution's pre-built vLLM image (version 0.9.1)
  • Built for CPU inference (not GPU)
  • Based on OpenEuler 24.03 LTS

Critical Patch (Lines 4-5)
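The `sed` command in the Dockerfile (lines 4-5 of the Dockerfile itself) rewrites one expression in vLLM's `cpu_worker.py`. On hosts that report zero NUMA nodes, the original integer division raises `ZeroDivisionError`; the patched expression falls back to the full CPU count. A minimal sketch of the patched logic (the function wrapper is illustrative, not vLLM's actual code):

```python
def cpu_count_per_numa(cpu_count: int, numa_size: int) -> int:
    """Patched logic from the sed command.

    Original expression: cpu_count // numa_size
    (raises ZeroDivisionError when numa_size == 0)
    """
    return cpu_count // numa_size if numa_size > 0 else cpu_count

print(cpu_count_per_numa(8, 2))  # 4 CPUs per NUMA node
print(cpu_count_per_numa(8, 0))  # 8: no NUMA nodes reported, use all CPUs
```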

Lab: Using kubectl-ai --mcp-server with Cursor to Inspect the atharva-ml Namespace

0. Lab Goals

By the end of this lab you’ll be able to:

  • Run kubectl-ai as an MCP server.
  • Wire it into Cursor via mcp.json.
  • Use Cursor chat + kubectl-ai tools to:
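For the second step, a minimal `mcp.json` sketch (the server name and command path are assumptions; point `command` at wherever your `kubectl-ai` binary lives):

```json
{
  "mcpServers": {
    "kubectl-ai": {
      "command": "kubectl-ai",
      "args": ["--mcp-server"]
    }
  }
}
```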

0) Repo layout (GitOps view)

Add the existing code, then check the working tree:


git status
@gouravjshah
gouravjshah / Dockerfile
Created November 18, 2025 07:58
Dockerfile for vLLM with CPU-only Serving
FROM openeuler/vllm-cpu:0.9.1-oe2403lts

# Patch cpu_worker.py to handle hosts that report zero NUMA nodes
RUN sed -i 's/cpu_count_per_numa = cpu_count \/\/ numa_size/cpu_count_per_numa = cpu_count \/\/ numa_size if numa_size > 0 else cpu_count/g' \
    /workspace/vllm/vllm/worker/cpu_worker.py

ENV VLLM_TARGET_DEVICE=cpu \
    VLLM_CPU_KVCACHE_SPACE=1 \
    OMP_NUM_THREADS=2 \
    OPENBLAS_NUM_THREADS=1 \
@gouravjshah
gouravjshah / loki-values.yaml
Created November 9, 2025 14:33
Fixed Loki values.yaml
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
loki:
  commonConfig:
    replication_factor: 1
  # Required for new installs
@gouravjshah
gouravjshah / get_grafana_admin_pass.md
Created November 7, 2025 04:30
Get Grafana Admin Password
kubectl get secret -n monitoring prom-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
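The pipeline above just base64-decodes the `admin-password` field of the Secret (Kubernetes stores Secret values base64-encoded). The same decoding step in Python, with an illustrative encoded value:

```python
import base64

def decode_k8s_secret(encoded: str) -> str:
    """Decode a base64-encoded Kubernetes Secret value to plain text."""
    return base64.b64decode(encoded).decode("utf-8")

# Illustrative value: "prom-operator" as it would appear in the Secret's data field
print(decode_k8s_secret("cHJvbS1vcGVyYXRvcg=="))  # prom-operator
```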
@gouravjshah
gouravjshah / databases_on_kubernetes.md
Created October 29, 2025 07:30
Best Practices for Running Databases on Kubernetes

The following is a crisp, battle-tested playbook for running databases on Kubernetes: what to do, what to avoid, and how to keep them safe, fast, and recoverable.

Before you start

  • Default to managed DBs if possible (RDS/Aurora/Cloud SQL/AlloyDB/Atlas). Run on K8s only when you need: portability, custom extensions, tight sidecar/tooling, or cost control with commodity nodes.
  • Use an Operator, not raw manifests. Prefer mature operators (e.g., Crunchy/Percona for Postgres & MySQL, Vitess for MySQL sharding, PXC/MongoDB Enterprise/StackGres, RabbitMQ Operator for queues). Operators give sane HA, backups, upgrades, and day-2 ops.

Core architecture

  • StatefulSets + Headless Services for stable identities and volumes.
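The StatefulSet + headless Service pairing above can be sketched as follows (names, image, and sizes are illustrative, not from a specific operator):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres          # headless Service: gives each pod a stable DNS name
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres   # ties pod identities to the headless Service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:   # one PersistentVolume per replica, survives rescheduling
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```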
@gouravjshah
gouravjshah / airbnb_mcp-gemini.py
Created October 17, 2025 04:47
Agno Agent to search for AirBnB Listings
# airbnb_mcp.py
from textwrap import dedent
from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.mcp import MCPTools
from agno.tools.reasoning import ReasoningTools
from agno.os import AgentOS

Switch to the instavote namespace:

kubectl config set-context --current --namespace=instavote
helm uninstall -n dev instavote 
kubectl delete deploy vote redis db result worker  -n instavote 
kubectl delete svc vote redis db result -n instavote 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vote
  namespace: instavote
spec:
  ingressClassName: nginx
  rules:
  - host: vote.example.com