KIND vSR (vLLM Semantic Router) Deploy/Validation
# Create a Cluster #
$ kind create cluster --name semantic-router
Creating cluster "semantic-router" ...
✓ Ensuring node image (kindest/node:v1.35.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-semantic-router"
You can now use your cluster with:
kubectl cluster-info --context kind-semantic-router
Have a nice day! 👋
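Running `kind create cluster` again with a name that is already in use fails, so a repeatable setup may want to check for the cluster first. The following is a minimal sketch, assuming `kind get clusters` prints one cluster name per line; `cluster_exists` is a hypothetical helper and not part of kind.

```shell
# Hypothetical helper: succeed if the given name appears as a whole line
# on stdin (the expected input is the output of `kind get clusters`).
cluster_exists() {
  # -x: match the whole line, -F: treat the name as a fixed string, -q: quiet
  grep -qxF -- "$1"
}

# Example (not executed here): create the cluster only when it is missing.
# kind get clusters | cluster_exists semantic-router \
#   || kind create cluster --name semantic-router
```

Matching the whole line keeps a name such as `semantic-router-2` from being mistaken for `semantic-router`.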
# Deploy #
$ ./deploy/openshift/deploy-to-openshift.sh --kind --no-observability
[SUCCESS] Connected to cluster: kind-semantic-router
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system created
[SUCCESS] Namespace ready
[INFO] KServe CRD not found - using standalone deployment mode
[INFO] Deploying standalone simulator pods...
deployment.apps/vllm-model-a created
deployment.apps/vllm-model-b created
service/vllm-model-a created
service/vllm-model-b created
[INFO] Waiting for simulator services to get ClusterIPs...
[SUCCESS] Got ClusterIPs: model-a=10.96.101.161, model-b=10.96.143.105
[INFO] Creating PersistentVolumeClaims...
persistentvolumeclaim/semantic-router-models created
persistentvolumeclaim/semantic-router-cache created
[SUCCESS] PVCs created
[INFO] Generating configuration...
[SUCCESS] Configuration generated
[INFO] Creating ConfigMaps...
configmap/semantic-router-config created
configmap/envoy-config created
[SUCCESS] ConfigMaps created
[INFO] Deploying semantic-router...
deployment.apps/semantic-router created
[SUCCESS] Semantic-router deployment applied
[INFO] Creating services...
service/semantic-router created
service/semantic-router-metrics created
[SUCCESS] Services created
[INFO] Waiting for deployments to be ready...
[INFO] This may take several minutes as models are downloaded...
Waiting for deployment "vllm-model-a" rollout to finish: 0 of 1 updated replicas are available...
deployment "vllm-model-a" successfully rolled out
deployment "vllm-model-b" successfully rolled out
Waiting for deployment "semantic-router" rollout to finish: 0 of 1 updated replicas are available...
deployment "semantic-router" successfully rolled out
[SUCCESS] Deployment complete!
==================================================
Kind Deployment Summary
==================================================
Namespace: vllm-semantic-router-system
Access the services (run in a separate terminal):
kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080 8801:8801
Then test:
# Auto-routing (classifier picks the model)
curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
# STEM query -> routes to Model-A
curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Explain quantum physics"}]}'
# Humanities query -> routes to Model-B
curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Explain the elements of a contract under common law and give a simple example."}]}'
View logs:
kubectl logs -f deployment/semantic-router -c semantic-router -n vllm-semantic-router-system
kubectl logs -f deployment/semantic-router -c envoy-proxy -n vllm-semantic-router-system
View status:
kubectl get pods -n vllm-semantic-router-system
kubectl get svc -n vllm-semantic-router-system
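The port-forward from the summary above may take a moment before it accepts connections, so scripted validation can poll before sending the test requests. The following is a sketch only; `retry_until_ok` is a hypothetical helper, and the attempt count and delay in the example are arbitrary assumptions.

```shell
# Hypothetical helper: run a command until it succeeds, up to a fixed number
# of attempts, sleeping between tries.
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's own output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)"
  return 1
}

# Example (not executed here; assumes the port-forward is already running):
# retry_until_ok 30 2 curl -sf http://localhost:8801/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d '{"model": "auto", "messages": [{"role": "user", "content": "ping"}]}'
```

The example uses `curl -f` so that an HTTP error status counts as "not ready" as well as a refused connection.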
# Validation #
$ curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
{"id":"chatcmpl-431952b9-f369-4cd7-b398-c09f1425c774","created":1769578653,"model":"Model-A","usage":{"prompt_tokens":6,"completion_tokens":50,"total_tokens":56},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Testing@, #testing 1$ ,2%,3^, [4\u0026*5], 6~, 7-_ + (8 : 9) / \\ \u003c \u003e . Today it is partially cloudy and raining. The temperature here is "}}]}
$ curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Explain quantum physics"}]}'
{"id":"chatcmpl-2306494d-3681-481e-82b0-9e160b36d16c","created":1769578669,"model":"Model-A","usage":{"prompt_tokens":3,"completion_tokens":45,"total_tokens":48},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest The rest is silence. Today it is partially cloudy and raining. Testing@, #testing 1$ ,2%,3^, [4"}}]}
$ curl http://localhost:8801/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "auto", "messages": [{"role": "user", "content": "Explain the elements of a contract under common law and give a simple example."}]}'
{"id":"chatcmpl-3f759710-4064-4143-b5b2-402398fbda6b","created":1769578677,"model":"Model-B","usage":{"prompt_tokens":15,"completion_tokens":25,"total_tokens":40},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Today it is partially cloudy and raining. The temperature here is twenty-five degrees centigrade. Today it is partially cloudy and raining"}}]}$