
Upgrade Runbook

Procedure for upgrading an SIE cluster to a new release version. Covers Helm-managed deployments on GKE and EKS.

Components upgraded:

  • Router (Deployment) — stateless, fast restart
  • Worker pools (StatefulSets) — GPU pods, model cache in emptyDir

Version management: SIE uses release-please for unified versioning. A single version (e.g., 0.1.6) is applied to the Helm chart (Chart.yaml appVersion), all Python packages, and all TypeScript packages. The CHANGELOG.md at the repo root documents all changes per release.
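Because release-please keeps the chart, Python, and TypeScript versions in lockstep, a quick pre-upgrade sanity check is to read the version straight out of Chart.yaml. A minimal sketch (the file path and extraction logic are assumptions; it is demonstrated on a sample file rather than the real chart):

```shell
# Read the unified version from a Chart.yaml before upgrading.
chart_version() {
  # Extract `appVersion: "X.Y.Z"` (quotes optional) from a Chart.yaml
  grep -E '^appVersion:' "$1" | sed -E 's/appVersion: *"?([0-9.]+)"?.*/\1/'
}

# Demo on a sample file; in the real repo, point this at
# deploy/helm/sie-cluster/Chart.yaml instead.
tmp=$(mktemp -d)
printf 'apiVersion: v2\nname: sie-cluster\nappVersion: "0.1.6"\n' > "$tmp/Chart.yaml"
v=$(chart_version "$tmp/Chart.yaml")
echo "chart appVersion: $v"
```

If this value does not match the tag you are about to deploy, stop and re-check the release.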


Complete all items before starting the upgrade.

Read CHANGELOG.md for the target version. Pay attention to:

  • Breaking changes in the router or server API
  • Helm values changes (new required values, renamed keys, removed options)
  • Model config changes (new or removed models, adapter changes)
# View changelog for the target version
git log v<CURRENT>..v<TARGET> --oneline
# Note current Helm release version
helm list -n sie
# Note current chart values (save for rollback reference)
helm get values sie -n sie -o yaml > /tmp/sie-values-backup.yaml
# Back up pool state (ConfigMaps + Leases in the sie namespace)
kubectl get configmap,lease -n sie -o yaml > /tmp/sie-pool-state-backup.yaml
# Record current image tags
kubectl get deployment -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'
kubectl get statefulset -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'
# Record Helm revision number
helm history sie -n sie --max 5
# All router pods should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=router
# All worker pods should be Running and Ready (if not scaled to zero)
kubectl get pods -n sie -l app.kubernetes.io/component=worker
# Router readiness (returns {"status": "ready", "healthy_workers": N})
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz
# Router detailed health (returns worker count, GPU count, loaded models)
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/health
# KEDA ScaledObjects should not be in Fallback mode
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A2 "Type.*Fallback"
# Check for recent errors in router logs
kubectl logs -n sie -l app.kubernetes.io/component=router --tail=50 | grep -i error
# Check for recent errors in worker logs
kubectl logs -n sie -l app.kubernetes.io/component=worker --tail=50 | grep -i error
# Prometheus is serving queries
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
'http://localhost:9090/api/v1/query?query=up' 2>/dev/null | head -c 200
# Grafana is accessible
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
# Open http://localhost:3000 and verify SIE dashboards show data

If the upgrade will run during active traffic, consider pausing autoscaling first:

# Pause KEDA autoscaling to prevent scale events during upgrade.
# Each ScaledObject targets a specific StatefulSet, so freeze each one
# at its own replica count (pools may differ).
for so in $(kubectl get scaledobject -n sie -o jsonpath='{.items[*].metadata.name}'); do
  # Read the actual scale target from the ScaledObject spec
  sts=$(kubectl get scaledobject "$so" -n sie -o jsonpath='{.spec.scaleTargetRef.name}')
  replicas=$(kubectl get statefulset "$sts" -n sie -o jsonpath='{.spec.replicas}' 2>/dev/null)
  if [ -n "$replicas" ]; then
    kubectl annotate scaledobject "$so" -n sie \
      autoscaling.keda.sh/paused-replicas="$replicas" --overwrite
  fi
done

For clusters using custom image registries (not the default ghcr.io/superlinked), push the new images first:

# Build and push new images (adjust registry as needed)
REGISTRY="your-registry.example.com"
TAG="0.1.7" # Target version
# Server image (one per bundle)
mise run docker -- --tag $TAG
docker tag sie-server:cuda12-default $REGISTRY/sie-server:$TAG-default
docker push $REGISTRY/sie-server:$TAG-default
# Router image
mise run docker -- --router --tag $TAG
docker tag sie-router:$TAG $REGISTRY/sie-router:$TAG
docker push $REGISTRY/sie-router:$TAG
# Dry-run first to preview changes
helm diff upgrade sie deploy/helm/sie-cluster/ \
-n sie \
-f /tmp/sie-values-backup.yaml \
--set workers.common.image.tag="<TARGET_VERSION>" \
--set router.image.tag="<TARGET_VERSION>"
# Apply the upgrade (--wait blocks until pods are ready; --timeout guards against hangs)
helm upgrade sie deploy/helm/sie-cluster/ \
-n sie \
-f /tmp/sie-values-backup.yaml \
--set workers.common.image.tag="<TARGET_VERSION>" \
--set router.image.tag="<TARGET_VERSION>" \
--wait --timeout 10m
# Dry-run
helm diff upgrade sie oci://ghcr.io/superlinked/sie-cluster \
-n sie \
--version <TARGET_CHART_VERSION> \
-f /tmp/sie-values-backup.yaml
# Apply
helm upgrade sie oci://ghcr.io/superlinked/sie-cluster \
-n sie \
--version <TARGET_CHART_VERSION> \
-f /tmp/sie-values-backup.yaml \
--wait --timeout 10m
# Update image tag in Terraform variables
# Edit your .tfvars or set TF_VAR:
export TF_VAR_sie_image_tag="<TARGET_VERSION>"
cd deploy/terraform/gcp/examples/<your-env>
terraform plan # Review changes
terraform apply # Apply

2.3 Expected Behavior During Rolling Update


Router (Deployment):

  • Kubernetes rolls out new router pods one at a time (default RollingUpdate strategy).
  • Router liveness probe: GET /healthz (returns 200 if process is alive). initialDelaySeconds: 5, periodSeconds: 10.
  • Router readiness probe: GET /readyz (returns 200 immediately — router is ready even with 0 workers). initialDelaySeconds: 5, periodSeconds: 5.
  • The router is stateless; new pods come up in seconds.
  • Brief 503s are possible during the switchover window if all old pods are terminated before new ones pass readiness.

Workers (StatefulSets):

  • The default RollingUpdate strategy updates pods one at a time in reverse ordinal order. (podManagementPolicy: Parallel only affects pod ordering during scaling, not rolling updates.)
  • Worker terminationGracePeriodSeconds: 65.
  • preStop hook: sleep 10 — gives the K8s endpoints controller 10 seconds to remove the pod from the service before SIGTERM.
  • On SIGTERM, the server enters graceful shutdown: rejects new requests with 503 (with Retry-After: 5 header), drains in-flight requests (25-second timeout), then exits.
  • Readiness probe stops passing (/readyz returns 503) once shutdown begins, so the router stops sending new traffic to the draining pod.
  • The router detects worker disconnection via WebSocket and removes it from the routing table.
  • New worker pods must download model weights if the emptyDir cache is empty (cache does not persist across pod restarts). Cold model loading can take 10-120 seconds depending on model size and cache state.
  • PodDisruptionBudget: maxUnavailable: 1 per worker pool — protects against external disruptions (e.g., kubectl drain, node autoscaler) but is not enforced by the StatefulSet controller during rolling updates.
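The shutdown timings above fit together deliberately: the preStop sleep plus the drain timeout must complete inside the termination grace period, with headroom for the process to exit. A quick arithmetic check using the values from the bullets:

```shell
# Sanity-check the worker shutdown timing budget described above.
PRESTOP=10   # preStop hook: sleep 10
DRAIN=25     # graceful-shutdown drain timeout
GRACE=65     # terminationGracePeriodSeconds
BUDGET=$((PRESTOP + DRAIN))
HEADROOM=$((GRACE - BUDGET))
echo "shutdown budget: ${BUDGET}s of ${GRACE}s grace (headroom: ${HEADROOM}s)"
```

If you tune any of these values (e.g. a longer drain timeout), keep the headroom positive, or Kubernetes will SIGKILL workers mid-drain.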

Client Impact:

  • SDK clients with automatic retry handle 503s transparently.
  • Requests in flight during graceful shutdown complete normally (up to 25-second drain timeout).
  • If all workers in a pool are restarting simultaneously, the router returns 202 Accepted (provisioning), and the SDK retries with backoff.
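For clients not using the SDK, the retry behavior sketched in these bullets can be reproduced with a small backoff loop. This is an illustrative sketch, not the SDK's implementation: a stub stands in for the real HTTP call (which would be something like `curl -s -o /dev/null -w '%{http_code}'` against the router), and sleeps are skipped to keep the demo fast.

```shell
# Retry on 503 (draining) and 202 (provisioning) with exponential backoff.
retry_with_backoff() {
  local max=$1; shift
  local delay=1 code
  for _ in $(seq 1 "$max"); do
    code=$("$@")
    case "$code" in
      200) echo "OK"; return 0 ;;
      202|503) : # real clients: sleep "$delay", honoring Retry-After when sent
               delay=$((delay * 2)) ;;
      *) echo "fatal: $code"; return 1 ;;
    esac
  done
  echo "gave up after $max attempts"; return 1
}

# Stub: returns 503 twice (workers restarting), then 200. A file holds the
# attempt count because command substitution runs the stub in a subshell.
cnt=$(mktemp); echo 0 > "$cnt"
stub() {
  local n; n=$(( $(cat "$cnt") + 1 )); echo "$n" > "$cnt"
  if [ "$n" -lt 3 ]; then echo 503; else echo 200; fi
}
retry_with_backoff 5 stub
```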
# Watch router rollout
kubectl rollout status deployment/sie-sie-cluster-router -n sie --timeout=120s
# Watch worker rollouts (one per pool)
kubectl get statefulsets -n sie -w
# Watch all pods
kubectl get pods -n sie -w
# Check KEDA ScaledObjects are still healthy (not Fallback)
kubectl get scaledobject -n sie -o custom-columns=NAME:.metadata.name,READY:.status.conditions[0].status,MIN:.spec.minReplicaCount,MAX:.spec.maxReplicaCount,REPLICAS:.status.currentReplicas
# Watch router logs for errors during transition
kubectl logs -n sie -l app.kubernetes.io/component=router -f --tail=20

# All pods Running and Ready
kubectl get pods -n sie
# Expected: all router pods 1/1 Ready, all worker pods 1/1 Ready
# Verify new image tags are deployed
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'
# Readiness check
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz
# Expected: {"status": "ready", "healthy_workers": N}
# Detailed health (worker count, models, GPU types)
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/health
# Expected: "status": "healthy", worker_count > 0 (if pools not scaled to zero)
# Model catalog is available
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/v1/models | head -c 500
# Port-forward to router
kubectl port-forward -n sie svc/sie-sie-cluster-router 8080:8080 &
# Test encode request (requires a running worker with GPU)
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'upgrade verification test'})
print(f'Dense embedding dim: {len(result[\"dense\"])}')
print('SUCCESS: Encode request returned 200')
"
# Or with curl (JSON fallback):
curl -s -X POST http://localhost:8080/v1/encode/BAAI%2Fbge-m3 \
-H "Content-Type: application/json" \
-d '{"items": [{"text": "upgrade verification test"}]}' | python3 -m json.tool | head -5
# Unpause KEDA if paused in step 1.5
kubectl annotate scaledobject -n sie --all autoscaling.keda.sh/paused-replicas-
# Verify ScaledObjects are Ready (not Fallback)
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A3 "Conditions:"
# Expected: Ready=True, Active depends on load, Fallback=False
# Verify Prometheus is scraping the new pods
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
'http://localhost:9090/api/v1/query?query=sie_requests_total' 2>/dev/null | python3 -m json.tool | head -20
# Verify router metrics
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
'http://localhost:9090/api/v1/query?query=sie_router_requests_total' 2>/dev/null | python3 -m json.tool | head -20
# Check Grafana dashboards show data for new pods
# Port-forward: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Navigate to SIE > Cluster Overview dashboard
# Check Helm release version
helm list -n sie
# Expected: Chart version and App version match target
# Check the server version header on a response
curl -s -I http://localhost:8080/healthz | grep -i x-sie
# Expected: X-SIE-Server-Version: <TARGET_VERSION>
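To turn the image-tag listing above into a pass/fail check, filter for any line whose image does not carry the target tag. A sketch with sample data standing in for the live listing (in a real cluster, capture the `kubectl get pods ... -o jsonpath` output into `images` instead; the pod names below are illustrative):

```shell
# Assert every container image carries the target tag.
TARGET="0.1.7"
images='sie-sie-cluster-router-abc: ghcr.io/superlinked/sie-router:0.1.7
sie-sie-cluster-worker-a100-0: ghcr.io/superlinked/sie-server:0.1.7-default'

# Any line that does not mention :<TARGET> is still on an old image.
stale=$(printf '%s\n' "$images" | grep -v ":${TARGET}" || true)
if [ -z "$stale" ]; then
  echo "all images on ${TARGET}"
else
  printf 'stale images:\n%s\n' "$stale"
fi
```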

# List Helm release history
helm history sie -n sie --max 10
# Note the REVISION number of the last known-good release
# Rollback to previous revision
helm rollback sie <REVISION> -n sie
# Or rollback to immediately previous version
helm rollback sie -n sie

For Terraform-managed clusters:

# Revert image tag to previous version
export TF_VAR_sie_image_tag="<PREVIOUS_VERSION>"
cd deploy/terraform/gcp/examples/<your-env>
terraform apply
# Watch the rollback proceed
kubectl rollout status deployment/sie-sie-cluster-router -n sie --timeout=120s
kubectl get pods -n sie -w
# Verify old image is restored
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'

Run the same post-upgrade verification steps:

# Router health
kubectl exec -n sie deploy/sie-sie-cluster-router -- wget -qO- http://localhost:8080/readyz
# Encode smoke test
kubectl port-forward -n sie svc/sie-sie-cluster-router 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'rollback verification'})
print(f'Dense dim: {len(result[\"dense\"])} - SUCCESS')
"
# KEDA health
kubectl get scaledobject -n sie
  • No schema migrations: SIE is stateless. Workers use emptyDir for model cache, and the router stores pool state in ConfigMaps with Leases for TTL. There are no database migrations to worry about during rollback.
  • Model cache invalidation: Worker pods use emptyDir volumes for the HuggingFace model cache. Rolling back means new pods start with an empty cache and must re-download model weights on first request. If cluster cache (S3/GCS) is configured, downloads come from there instead of HuggingFace Hub.
  • Pool state: Resource pools are stored as ConfigMaps in the sie namespace. Pool leases survive upgrades and rollbacks. Active pools will continue to work, but if the pool API changed between versions, clients may need to recreate pools.
  • KEDA ScaledObjects: Helm rollback re-applies the previous ScaledObject definitions. If KEDA version requirements changed between SIE versions, verify ScaledObjects are not in Fallback mode after rollback.
  • Config drift: If the upgrade included changes to embedded model or bundle configs (baked into the Helm chart files/ directory), rollback restores the previous configs. Ensure the previous configs are compatible with the previous server version.
  • SDK version compatibility: The router returns X-SIE-Server-Version headers. If clients upgraded their SDK alongside the server, a server rollback may trigger version mismatch warnings in the SDK logs. The SDK remains functional but logs warnings for major.minor mismatches.
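The document does not spell out the SDK's exact comparison rule beyond "major.minor mismatches", but one plausible implementation is to strip the patch component and compare what remains:

```shell
# Compare two versions on major.minor only (patch differences are ignored).
same_major_minor() {
  # ${1%.*} strips the final ".patch" component, e.g. "0.1.7" -> "0.1"
  [ "${1%.*}" = "${2%.*}" ]
}

same_major_minor "0.1.7" "0.1.6" && echo "0.1.7 vs 0.1.6: compatible"
same_major_minor "0.2.0" "0.1.6" || echo "0.2.0 vs 0.1.6: mismatch warning"
```

Under this rule, a rollback from 0.1.7 to 0.1.6 would produce no warnings, while a rollback that crosses a minor boundary would.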

Resource | Namespace | Type | Purpose
sie-sie-cluster-router | sie | Deployment | Stateless request router (2+ replicas)
sie-sie-cluster-worker-<pool> | sie | StatefulSet | GPU worker pool (one per pool)
sie-sie-cluster-worker | sie | Service (headless) | Worker DNS discovery
sie-sie-cluster-router | sie | Service (ClusterIP) | Router endpoint
sie-sie-cluster-worker-<pool>-scaler | sie | ScaledObject | KEDA autoscaler per pool
sie-sie-cluster-worker-<pool> | sie | PodDisruptionBudget | maxUnavailable: 1 per pool
sie-sie-cluster-gpu-config | sie | ConfigMap | Available GPU types / machine profiles
sie-sie-cluster-config | sie | ConfigMap | Shared cluster configuration

Endpoint | Component | Returns
GET /healthz | Router | {"status": "ok"} — liveness probe
GET /readyz | Router | {"status": "ready", "healthy_workers": N} — readiness probe
GET /health | Router | Detailed cluster status (worker count, GPUs, models)
GET /healthz | Worker | "ok" — liveness probe
GET /readyz | Worker | "ok" or 503 — readiness probe
GET /metrics | Both | Prometheus metrics

Dashboard | Purpose
Cluster Overview | QPS, latency (p50/p95/p99), GPU utilization
Model Performance | Per-model latency, throughput, batch sizes
Worker Health | Per-worker CPU/memory, GPU temp, queue depth