# Upgrade Runbook
Procedure for upgrading an SIE cluster to a new release version. Covers Helm-managed deployments on GKE and EKS.
Components upgraded:
- Router (Deployment) — stateless, fast restart
- Worker pools (StatefulSets) — GPU pods, model cache in emptyDir
Version management: SIE uses release-please for unified versioning. A single version (e.g., 0.1.6) is applied to the Helm chart (Chart.yaml appVersion), all Python packages, and all TypeScript packages. The CHANGELOG.md at the repo root documents all changes per release.
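Because a single release-please version fans out to the chart, the Python packages, and the TypeScript packages, an upgrade script can guard against accidentally "upgrading" to an older or identical version. A minimal sketch (not part of the official tooling; the `ver_gt` helper and the hard-coded versions are illustrative):

```shell
# Semver guard for an upgrade script (sketch): refuse to proceed unless
# the target version is strictly newer than the deployed one.
ver_gt() {
  # True when $1 sorts strictly after $2 under GNU version ordering.
  [ "$1" != "$2" ] && \
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

current="0.1.6"   # e.g. the App Version column from: helm list -n sie
target="0.1.7"    # the version you are about to deploy

if ver_gt "$target" "$current"; then
  echo "ok: $current -> $target"
else
  echo "refusing: $target is not newer than $current" >&2
  exit 1
fi
```

`sort -V` handles multi-digit components correctly (0.1.10 sorts after 0.1.9), which naive string comparison does not.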
## 1. Pre-Upgrade Checklist

Complete all items before starting the upgrade.
### 1.1 Review the CHANGELOG

Read CHANGELOG.md for the target version. Pay attention to:
- Breaking changes in the router or server API
- Helm values changes (new required values, renamed keys, removed options)
- Model config changes (new or removed models, adapter changes)
```bash
# View changelog for the target version
git log v<CURRENT>..v<TARGET> --oneline
```

### 1.2 Record Current State
```bash
# Note current Helm release version
helm list -n sie

# Note current chart values (save for rollback reference)
helm get values sie -n sie -o yaml > /tmp/sie-values-backup.yaml

# Back up pool state (ConfigMaps + Leases in the sie namespace)
kubectl get configmap,lease -n sie -o yaml > /tmp/sie-pool-state-backup.yaml

# Record current image tags
kubectl get deployment -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'
kubectl get statefulset -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'

# Record Helm revision number
helm history sie -n sie --max 5
```

### 1.3 Verify Cluster Health
```bash
# All router pods should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=router

# All worker pods should be Running and Ready (if not scaled to zero)
kubectl get pods -n sie -l app.kubernetes.io/component=worker

# Router readiness (returns {"status": "ready", "healthy_workers": N})
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz

# Router detailed health (returns worker count, GPU count, loaded models)
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/health

# KEDA ScaledObjects should not be in Fallback mode
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A2 "Type.*Fallback"

# Check for recent errors in router logs
kubectl logs -n sie -l app.kubernetes.io/component=router --tail=50 | grep -i error

# Check for recent errors in worker logs
kubectl logs -n sie -l app.kubernetes.io/component=worker --tail=50 | grep -i error
```

### 1.4 Verify Observability Stack
```bash
# Prometheus is serving queries
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=up' 2>/dev/null | head -c 200

# Grafana is accessible
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
# Open http://localhost:3000 and verify SIE dashboards show data
```

### 1.5 Drain Active Workloads (Optional)
If upgrading during active traffic, consider pausing autoscaling first:

```bash
# Pause KEDA autoscaling to prevent scale events during upgrade.
# Each ScaledObject targets a specific StatefulSet, so freeze each one
# at its own replica count (pools may differ).
for so in $(kubectl get scaledobject -n sie -o jsonpath='{.items[*].metadata.name}'); do
  # Read the actual scale target from the ScaledObject spec
  sts=$(kubectl get scaledobject "$so" -n sie -o jsonpath='{.spec.scaleTargetRef.name}')
  replicas=$(kubectl get statefulset "$sts" -n sie -o jsonpath='{.spec.replicas}' 2>/dev/null)
  if [ -n "$replicas" ]; then
    kubectl annotate scaledobject "$so" -n sie \
      autoscaling.keda.sh/paused-replicas="$replicas" --overwrite
  fi
done
```

## 2. Upgrade Procedure
### 2.1 Prepare New Images

For clusters using custom image registries (not the default ghcr.io/superlinked), push the new images first:
```bash
# Build and push new images (adjust registry as needed)
REGISTRY="your-registry.example.com"
TAG="0.1.7"  # Target version

# Server image (one per bundle)
mise run docker -- --tag $TAG
docker tag sie-server:cuda12-default $REGISTRY/sie-server:$TAG-default
docker push $REGISTRY/sie-server:$TAG-default

# Router image
mise run docker -- --router --tag $TAG
docker tag sie-router:$TAG $REGISTRY/sie-router:$TAG
docker push $REGISTRY/sie-router:$TAG
```

### 2.2 Helm Upgrade
#### Option A: Upgrade from Local Chart

```bash
# Dry-run first to preview changes (requires the helm-diff plugin)
helm diff upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set router.image.tag="<TARGET_VERSION>"

# Apply the upgrade (--wait blocks until pods are ready; --timeout guards against hangs)
helm upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set router.image.tag="<TARGET_VERSION>" \
  --wait --timeout 10m
```

#### Option B: Upgrade from OCI Registry
Section titled “Option B: Upgrade from OCI Registry”# Dry-runhelm diff upgrade sie oci://ghcr.io/superlinked/sie-cluster \ -n sie \ --version <TARGET_CHART_VERSION> \ -f /tmp/sie-values-backup.yaml
# Applyhelm upgrade sie oci://ghcr.io/superlinked/sie-cluster \ -n sie \ --version <TARGET_CHART_VERSION> \ -f /tmp/sie-values-backup.yaml \ --wait --timeout 10mOption C: Terraform-Managed Clusters
```bash
# Update image tag in Terraform variables
# Edit your .tfvars or set TF_VAR:
export TF_VAR_sie_image_tag="<TARGET_VERSION>"

cd deploy/terraform/gcp/examples/<your-env>
terraform plan   # Review changes
terraform apply  # Apply
```

### 2.3 Expected Behavior During Rolling Update
Router (Deployment):

- Kubernetes rolls out new router pods one at a time (default `RollingUpdate` strategy).
- Router liveness probe: `GET /healthz` (returns 200 if the process is alive). `initialDelaySeconds: 5`, `periodSeconds: 10`.
- Router readiness probe: `GET /readyz` (returns 200 immediately — the router is ready even with 0 workers). `initialDelaySeconds: 5`, `periodSeconds: 5`.
- The router is stateless; new pods come up in seconds.
- Brief 503s are possible during the switchover window if all old pods are terminated before new ones pass readiness.
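Because of that brief 503 window, an automated deploy script should poll readiness rather than assume instant availability. A generic polling helper (a sketch; the `wait_until` name, timeout, and interval are illustrative, and the commented usage line assumes an active port-forward):

```shell
# Generic wait-until-ready helper (sketch): retry a command until it
# succeeds or a deadline passes. Useful for waiting out the switchover
# window while old router pods terminate and new ones pass readiness.
wait_until() {
  local timeout=$1 interval=$2; shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example usage: poll the router's /readyz through a port-forward
# wait_until 120 5 curl -sf http://localhost:8080/readyz >/dev/null
```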
Workers (StatefulSets):

- The default `RollingUpdate` strategy updates pods one at a time in reverse ordinal order. (`podManagementPolicy: Parallel` only affects pod ordering during scaling, not rolling updates.)
- Worker `terminationGracePeriodSeconds: 65`. `preStop` hook: `sleep 10` — gives the K8s endpoints controller 10 seconds to remove the pod from the service before SIGTERM.
- On SIGTERM, the server enters graceful shutdown: it rejects new requests with `503` (with a `Retry-After: 5` header), drains in-flight requests (25-second timeout), then exits.
- The readiness probe stops passing (`/readyz` returns 503) once shutdown begins, so the router stops sending new traffic to the draining pod.
- The router detects worker disconnection via WebSocket and removes it from the routing table.
- New worker pods must download model weights if the emptyDir cache is empty (the cache does not persist across pod restarts). Cold model loading can take 10-120 seconds depending on model size and cache state.
- PodDisruptionBudget: `maxUnavailable: 1` per worker pool — protects against external disruptions (e.g., `kubectl drain`, node autoscaler) but is not enforced by the StatefulSet controller during rolling updates.
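The shutdown budget above maps onto the worker pod spec roughly as follows. This is a sketch reconstructed from the values stated in this section, not copied from the chart templates; the container name is illustrative:

```yaml
# Worker pod spec fields implied by the shutdown sequence above (sketch)
terminationGracePeriodSeconds: 65   # budget: 10s preStop + 25s drain + margin
containers:
  - name: worker
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]  # let the endpoints controller remove the pod first
```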
Client Impact:

- SDK clients with automatic retry handle 503s transparently.
- Requests in flight during graceful shutdown complete normally (up to the 25-second drain timeout).
- If all workers in a pool are restarting simultaneously, the router returns `202 Accepted` (provisioning), and the SDK retries with backoff.
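Clients that do not use the SDK can approximate the same behavior. A hedged sketch of a retry loop (the `encode_with_retry` function and its backoff constants are assumptions; the status handling follows the 200/202/503 behavior described above):

```shell
# Retry loop approximating the SDK's rolling-update behavior (sketch):
# retry on 503 (draining/unavailable) and 202 (provisioning) with
# exponential backoff; succeed on 200; fail fast on anything else.
encode_with_retry() {
  local url=$1 body=$2 attempts=${3:-6} delay=1 code
  for _ in $(seq "$attempts"); do
    code=$(curl -s -o /tmp/encode-resp.json -w '%{http_code}' \
      -H 'Content-Type: application/json' -d "$body" "$url")
    case "$code" in
      200) cat /tmp/encode-resp.json; return 0 ;;
      202|503) sleep "$delay"; delay=$((delay * 2)) ;;  # workers restarting or provisioning
      *) echo "unexpected status $code" >&2; return 1 ;;
    esac
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}
```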
### 2.4 Monitor the Rollout

```bash
# Watch router rollout
kubectl rollout status deployment/sie-sie-cluster-router -n sie --timeout=120s

# Watch worker rollouts (one per pool)
kubectl get statefulsets -n sie -w

# Watch all pods
kubectl get pods -n sie -w

# Check KEDA ScaledObjects are still healthy (not Fallback)
kubectl get scaledobject -n sie -o custom-columns=NAME:.metadata.name,READY:.status.conditions[0].status,MIN:.spec.minReplicaCount,MAX:.spec.maxReplicaCount,REPLICAS:.status.currentReplicas

# Watch router logs for errors during transition
kubectl logs -n sie -l app.kubernetes.io/component=router -f --tail=20
```

## 3. Post-Upgrade Verification
### 3.1 All Pods Healthy

```bash
# All pods Running and Ready
kubectl get pods -n sie
# Expected: all router pods 1/1 Ready, all worker pods 1/1 Ready

# Verify new image tags are deployed
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'
```

### 3.2 Router Health
```bash
# Readiness check
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz
# Expected: {"status": "ready", "healthy_workers": N}

# Detailed health (worker count, models, GPU types)
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/health
# Expected: "status": "healthy", worker_count > 0 (if pools not scaled to zero)

# Model catalog is available
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/v1/models | head -c 500
```

### 3.3 Encode Request Smoke Test
```bash
# Port-forward to router
kubectl port-forward -n sie svc/sie-sie-cluster-router 8080:8080 &

# Test encode request (requires a running worker with GPU)
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'upgrade verification test'})
print(f'Dense embedding dim: {len(result[\"dense\"])}')
print('SUCCESS: Encode request returned 200')
"

# Or with curl (JSON fallback):
curl -s -X POST http://localhost:8080/v1/encode/BAAI%2Fbge-m3 \
  -H "Content-Type: application/json" \
  -d '{"items": [{"text": "upgrade verification test"}]}' | python3 -m json.tool | head -5
```

### 3.4 KEDA and Autoscaling
```bash
# Unpause KEDA if paused in step 1.5
kubectl annotate scaledobject -n sie --all autoscaling.keda.sh/paused-replicas- --overwrite

# Verify ScaledObjects are Ready (not Fallback)
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A3 "Conditions:"
# Expected: Ready=True, Active depends on load, Fallback=False
```

### 3.5 Metrics Flowing
```bash
# Verify Prometheus is scraping the new pods
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Verify router metrics
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_router_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Check Grafana dashboards show data for new pods
# Port-forward: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Navigate to SIE > Cluster Overview dashboard
```

### 3.6 Version Verification
```bash
# Check Helm release version
helm list -n sie
# Expected: Chart version and App version match target

# Check the server version header on a response
curl -s -I http://localhost:8080/healthz | grep -i x-sie
# Expected: X-SIE-Server-Version: <TARGET_VERSION>
```

## 4. Rollback Procedure
### 4.1 Identify Rollback Target

```bash
# List Helm release history
helm history sie -n sie --max 10
# Note the REVISION number of the last known-good release
```

### 4.2 Execute Rollback
```bash
# Roll back to a specific revision
helm rollback sie <REVISION> -n sie

# Or roll back to the immediately previous revision
helm rollback sie -n sie
```

For Terraform-managed clusters:

```bash
# Revert image tag to previous version
export TF_VAR_sie_image_tag="<PREVIOUS_VERSION>"
cd deploy/terraform/gcp/examples/<your-env>
terraform apply
```

### 4.3 Monitor Rollback
```bash
# Watch the rollback proceed
kubectl rollout status deployment/sie-sie-cluster-router -n sie --timeout=120s
kubectl get pods -n sie -w

# Verify the old image is restored
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'
```

### 4.4 Verify Rollback Succeeded
Run the same post-upgrade verification steps:

```bash
# Router health
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz

# Encode smoke test
kubectl port-forward -n sie svc/sie-sie-cluster-router 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'rollback verification'})
print(f'Dense dim: {len(result[\"dense\"])} - SUCCESS')
"

# KEDA health
kubectl get scaledobject -n sie
```

### 4.5 Known Caveats
- No schema migrations: SIE is stateless. Workers use emptyDir for the model cache, and the router stores pool state in ConfigMaps with Leases for TTL. There are no database migrations to worry about during rollback.
- Model cache invalidation: Worker pods use emptyDir volumes for the HuggingFace model cache. Rolling back means new pods start with an empty cache and must re-download model weights on the first request. If a cluster cache (S3/GCS) is configured, downloads come from there instead of HuggingFace Hub.
- Pool state: Resource pools are stored as ConfigMaps in the `sie` namespace. Pool leases survive upgrades and rollbacks. Active pools will continue to work, but if the pool API changed between versions, clients may need to recreate pools.
- KEDA ScaledObjects: Helm rollback re-applies the previous ScaledObject definitions. If KEDA version requirements changed between SIE versions, verify ScaledObjects are not in Fallback mode after rollback.
- Config drift: If the upgrade included changes to embedded model or bundle configs (baked into the Helm chart `files/` directory), rollback restores the previous configs. Ensure the previous configs are compatible with the previous server version.
- SDK version compatibility: The router returns an `X-SIE-Server-Version` header. If clients upgraded their SDK alongside the server, a server rollback may trigger version mismatch warnings in the SDK logs. The SDK remains functional but logs warnings for major.minor mismatches.
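The major.minor comparison behind that SDK warning can be replicated as a pre-flight check. A sketch (the `minor_matches` helper and hard-coded versions are illustrative; the exact comparison the SDK performs is an assumption based on the description above):

```shell
# Warn when client and server disagree on major.minor (sketch of the
# comparison the SDK's mismatch warning implies).
minor_matches() {
  # Strip the patch component ("0.1.7" -> "0.1") and compare major.minor.
  [ "${1%.*}" = "${2%.*}" ]
}

server_version="0.1.7"   # e.g. from the X-SIE-Server-Version response header
client_version="0.1.6"

if minor_matches "$server_version" "$client_version"; then
  echo "compatible: $client_version vs $server_version"
else
  echo "warning: major.minor mismatch ($client_version vs $server_version)" >&2
fi
```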
## Appendix: Key Resources

| Resource | Namespace | Type | Purpose |
|---|---|---|---|
| `sie-sie-cluster-router` | sie | Deployment | Stateless request router (2+ replicas) |
| `sie-sie-cluster-worker-<pool>` | sie | StatefulSet | GPU worker pool (one per pool) |
| `sie-sie-cluster-worker` | sie | Service (headless) | Worker DNS discovery |
| `sie-sie-cluster-router` | sie | Service (ClusterIP) | Router endpoint |
| `sie-sie-cluster-worker-<pool>-scaler` | sie | ScaledObject | KEDA autoscaler per pool |
| `sie-sie-cluster-worker-<pool>` | sie | PodDisruptionBudget | maxUnavailable: 1 per pool |
| `sie-sie-cluster-gpu-config` | sie | ConfigMap | Available GPU types / machine profiles |
| `sie-sie-cluster-config` | sie | ConfigMap | Shared cluster configuration |
### Health Endpoints

| Endpoint | Component | Returns |
|---|---|---|
| `GET /healthz` | Router | `{"status": "ok"}` — liveness probe |
| `GET /readyz` | Router | `{"status": "ready", "healthy_workers": N}` — readiness probe |
| `GET /health` | Router | Detailed cluster status (worker count, GPUs, models) |
| `GET /healthz` | Worker | `"ok"` — liveness probe |
| `GET /readyz` | Worker | `"ok"` or 503 — readiness probe |
| `GET /metrics` | Both | Prometheus metrics |
### Grafana Dashboards

| Dashboard | Purpose |
|---|---|
| Cluster Overview | QPS, latency (p50/p95/p99), GPU utilization |
| Model Performance | Per-model latency, throughput, batch sizes |
| Worker Health | Per-worker CPU/memory, GPU temp, queue depth |
## What’s Next

- Monitoring - metrics, alerts, and dashboards
- Scale-from-Zero - KEDA autoscaling and cold start handling
- Kubernetes in GCP - GKE deployment setup
- Kubernetes in AWS - EKS deployment setup