$ ./deploy/openshift/deploy-to-openshift.sh --kserve --simulator --no-observability
[SUCCESS] Logged in as cluster-admin
[INFO] Creating namespace: vllm-semantic-router-system
namespace/vllm-semantic-router-system configured
[SUCCESS] Namespace ready
[INFO] Installing KServe and LLMInferenceService CRDs...
[INFO] InferenceService CRD already installed.
[INFO] LLMInferenceService CRD already installed.
[INFO] cert-manager namespace already present.
deployment.apps/cert-manager condition met
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/kserve-controller-manager condition met
[SUCCESS] KServe webhook service has ready endpoints
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:anyuid added: "llmisvc-controller-manager"
deployment.apps/llmisvc-controller-manager restarted
deployment.apps/llmisvc-controller-manager condition met
[SUCCESS] LLMInferenceService webhook has ready endpoints
[INFO] Ensuring LLMInferenceServiceConfig templates...
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-decode-template unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-decode-worker-data-parallel unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-prefill-template unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-prefill-worker-data-parallel unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-router-route unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-scheduler unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-template unchanged
llminferenceserviceconfig.serving.kserve.io/kserve-config-llm-worker-data-parallel unchanged
configmap/inferenceservice-config patched (no change)
[SUCCESS] All KServe CRDs already installed.
deployment.apps/llmisvc-controller-manager condition met
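At this point the control-plane pieces can be double-checked outside the script. A minimal sketch, assuming cert-manager runs in its default cert-manager namespace (as the output above indicates) and that the KServe/LLMInferenceService controllers may live in a different namespace depending on how KServe was installed:

$ oc get deploy -n cert-manager
$ # controller namespace varies by install; search for it rather than assuming one
$ oc get deploy -A | grep -E 'kserve|llmisvc'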
[INFO] Ensuring simulator service account and SCC...
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:anyuid added: "llmisvc-workload"
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "llmisvc-workload"
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "llmisvc-controller-manager"
[INFO] Deploying simulator LLMInferenceServices...
llminferenceservice.serving.kserve.io/model-a created
llminferenceservice.serving.kserve.io/model-b created
[INFO] Waiting for simulator LLMInferenceServices to be ready...
llminferenceservice.serving.kserve.io/model-a condition met
llminferenceservice.serving.kserve.io/model-b condition met
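Before the router itself is deployed, the simulator backends can be verified independently of the script. A minimal check, reusing the namespace from the run above (the resource name matches the llminferenceservice objects created above):

$ oc get llminferenceservice -n vllm-semantic-router-system
$ oc get pods -n vllm-semantic-router-system -o wide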
[INFO] KServe mode: Deploying semantic-router with KServe backend...
==================================================
vLLM Semantic Router - KServe Deployment
==================================================
Configuration:
Namespace: vllm-semantic-router-system
Simulator Mode: true
LLMInferenceService A: model-a
LLMInferenceService B: model-b
Model A Name: Model-A
Model B Name: Model-B
Embedding Model: all-MiniLM-L12-v2
Storage Class: <cluster default>
Models PVC Size: 10Gi
Cache PVC Size: 5Gi
Dry Run: false
Step 1: Validating prerequisites...
✓ OpenShift CLI found
✓ Logged in as cluster-admin
✓ Namespace exists: vllm-semantic-router-system
✓ LLMInferenceService exists: model-a
✓ LLMInferenceService is ready
✓ LLMInferenceService exists: model-b
✓ LLMInferenceService is ready
Creating stable ClusterIP service for predictor: model-a
✓ Predictor service ClusterIP A: 172.30.103.62 (stable across pod restarts)
Creating stable ClusterIP service for predictor: model-b
✓ Predictor service ClusterIP B: 172.30.6.32 (stable across pod restarts)
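The stable predictor Services exist so the router's upstream addresses survive predictor pod restarts. Their names are generated by the script, so to confirm the ClusterIPs reported above, list the namespace rather than guessing the Service names:

$ oc get svc -n vllm-semantic-router-system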
Step 2: Generating manifests...
✓ Generated: configmap-router-config.yaml
✓ Generated: configmap-envoy-config.yaml
✓ Generated: serviceaccount.yaml
✓ Generated: pvc.yaml
✓ Generated: peerauthentication.yaml
✓ Generated: deployment.yaml
✓ Generated: service.yaml
✓ Generated: route.yaml
Step 3: Deploying to OpenShift...
serviceaccount/semantic-router unchanged
persistentvolumeclaim/semantic-router-models created
persistentvolumeclaim/semantic-router-cache created
configmap/semantic-router-kserve-config created
configmap/semantic-router-envoy-kserve-config created
Skipping PeerAuthentication (Istio CRD not found).
deployment.apps/semantic-router-kserve created
service/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve created
route.route.openshift.io/semantic-router-kserve-api created
✓ Resources deployed successfully
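If routing behaves unexpectedly later, the rendered configuration that was just applied can be inspected directly; the ConfigMap names below are taken from the apply output above:

$ oc get configmap semantic-router-kserve-config -n vllm-semantic-router-system -o yaml
$ oc get configmap semantic-router-envoy-kserve-config -n vllm-semantic-router-system -o yaml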
Step 4: Waiting for deployment to be ready...
This may take a few minutes while models are downloaded...
Waiting for pod... (1/36)
Waiting for pod... (2/36)
Initializing... (downloading models)
Initializing... (downloading models)
Initializing... (downloading models)
Initializing... (downloading models)
Initializing... (downloading models)
Waiting for pod... (8/36)
Waiting for pod... (9/36)
Waiting for pod... (10/36)
Waiting for pod... (11/36)
Waiting for pod... (12/36)
Quick status (init logs):
Downloaded sentence-transformers/all-MiniLM-L12-v2
All models downloaded successfully!
Model download complete!
total 40
drwxrwsr-x. 8 root 1001240000 4096 Jan 30 05:52 .
drwxr-xr-t. 4 root root 33 Jan 30 05:51 ..
drwxr-sr-x. 6 1001240000 1001240000 4096 Jan 30 05:52 all-MiniLM-L12-v2
drwxr-sr-x. 3 1001240000 1001240000 4096 Jan 30 05:51 category_classifier_modernbert-base_model
drwxr-sr-x. 3 1001240000 1001240000 4096 Jan 30 05:52 jailbreak_classifier_modernbert-base_model
drwxrws---. 2 root 1001240000 16384 Jan 30 05:51 lost+found
drwxr-sr-x. 3 1001240000 1001240000 4096 Jan 30 05:51 pii_classifier_modernbert-base_model
drwxr-sr-x. 3 1001240000 1001240000 4096 Jan 30 05:52 pii_classifier_modernbert-base_presidio_token_model
Setting proper permissions...
Creating cache directories...
Model download complete!
Waiting for pod... (13/36)
Waiting for pod... (14/36)
Waiting for pod... (15/36)
Waiting for pod... (16/36)
Waiting for pod... (17/36)
Waiting for pod... (18/36)
Waiting for pod... (19/36)
Waiting for pod... (20/36)
Waiting for pod... (21/36)
Waiting for pod... (22/36)
Waiting for pod... (23/36)
✓ Pod is ready: semantic-router-kserve-5696479cbd-q7kl7
✓ External URL: https://semantic-router-kserve-vllm-semantic-router-system.apps.brent.pcbk.p1.openshiftapps.com
==================================================
Deployment Complete!
==================================================
Next steps:
1. Set the route:
ENVOY_ROUTE=semantic-router-kserve-vllm-semantic-router-system.apps.brent.pcbk.p1.openshiftapps.com
2. Test model auto-routing:
curl -k -X POST https://semantic-router-kserve-vllm-semantic-router-system.apps.brent.pcbk.p1.openshiftapps.com/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Explain the elements of a contract under common law and give a simple example."}]}'
3. View logs:
oc logs -l app=semantic-router -c semantic-router -n vllm-semantic-router-system -f
For more information, see: semantic-router/deploy/kserve/README.md
[SUCCESS] KServe deployment complete
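Rather than copying the hostname out of the log, the route host can be read back from the cluster. A small sketch, using the route name created in Step 3 above:

$ ENVOY_ROUTE=$(oc get route semantic-router-kserve \
    -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
$ oc get pods -l app=semantic-router -n vllm-semantic-router-system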
Validation
$ ENVOY_ROUTE=semantic-router-kserve-vllm-semantic-router-system.apps.brent.pcbk.p1.openshiftapps.com
$ curl -k -X POST https://semantic-router-kserve-vllm-semantic-router-system.apps.brent.pcbk.p1.openshiftapps.com/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Explain the elements of a contract under common law and give a simple example."}]}'
{
"id": "chatcmpl-0b73f7dc-0014-4e59-84c7-8dc0e2227241",
"created": 1769752597,
"model": "Model-B",
"usage": {
"prompt_tokens": 15,
"completion_tokens": 32,
"total_tokens": 47
},
"object": "chat.completion",
"do_remote_decode": false,
"do_remote_prefill": false,
"remote_block_ids": null,
"remote_engine_id": "",
"remote_host": "",
"remote_port": 0,
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "To be or not to be that is the question. Today it is partially cloudy and raining. Testing@, #testing 1$ ,2%,3^"
}
}
]
}
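The auto route selected Model-B for this prompt. To check that an explicitly named model also passes through the router (assuming explicit names are forwarded unchanged, and noting the simulator returns canned text either way), the same endpoint can be called with one of the configured model names from the configuration above:

$ curl -k -X POST https://${ENVOY_ROUTE}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Model-A","messages":[{"role":"user","content":"Say hello."}]}'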