# CLI Reference

SIE provides five CLI tools for different roles: server operation, benchmarking, administration, monitoring, and routing. All tools use `typer` for argument parsing.
## sie-server

The inference server. Start with `sie-server serve`.

`sie-server serve [OPTIONS]`

Start the SIE inference server.
| Option | Default | Description |
|---|---|---|
| `--port, -p` | 8080 | Port to listen on |
| `--host` | 0.0.0.0 | Host to bind to |
| `--device, -d` | auto | Device for inference: `auto` (detect GPU), `cuda`, `mps`, `cpu` |
| `--models-dir` | ./models | Models config directory (local path, `s3://`, or `gs://`) |
| `--bundle, -b` | None | Bundle name to load from `bundles/` dir (e.g., `default`, `legacy`) |
| `--models, -m` | None | Comma-separated model names to load (mutually exclusive with `--bundle`) |
| `--local-cache` | HF_HOME | Local cache directory for model weights |
| `--cluster-cache` | None | Cluster cache URL for model weights (`s3://` or `gs://`) |
| `--hf-fallback/--no-hf-fallback` | true | Enable/disable HuggingFace Hub fallback for weight downloads |
| `--reload` | false | Enable auto-reload for development (uses uvicorn reload) |
| `--tracing` | false | Enable OpenTelemetry tracing (exports to localhost:4317) |
| `--verbose, -v` | false | Enable verbose logging |
| `--json-logs` | false | Enable structured JSON logging (for Loki compatibility) |
Examples:

```shell
# Start with defaults (auto-detect GPU, port 8080)
sie-server serve

# Specific port and device
sie-server serve --port 8081 --device cuda

# Load specific bundle
sie-server serve --bundle legacy

# Load specific models only
sie-server serve --models BAAI/bge-m3,BAAI/bge-reranker-v2-m3

# Use cloud model configs
sie-server serve --models-dir s3://my-bucket/sie-models/

# Development mode with auto-reload
sie-server serve --reload --verbose
```

### resolve-deps

`sie-server resolve-deps [OPTIONS]`

Resolve and print dependencies for a bundle or model list. Used by deployment scripts.
| Option | Description |
|---|---|
| `--bundle, -b` | Bundle name to resolve deps for |
| `--models, -m` | Comma-separated model names |
| `--models-dir` | Models directory |
| `--json` | Output as JSON |
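This reference gives no examples for `resolve-deps`; the invocations below are illustrative sketches built only from the flags in the table above (the `default` bundle name is borrowed from the serve examples):

```shell
# Print resolved dependencies for a bundle, as JSON for deployment scripts
sie-server resolve-deps --bundle default --json

# Resolve for an explicit model list from a local models directory
sie-server resolve-deps --models BAAI/bge-m3 --models-dir ./models
```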
## sie-bench

Evaluation and benchmarking CLI. Runs quality and performance evaluations.

`sie-bench eval MODEL --task TASK --type TYPE [OPTIONS]`

Run evaluation against multiple sources.
| Argument/Option | Description |
|---|---|
| `MODEL` | Model name (e.g., BAAI/bge-m3) |
| `--task, -t` | Namespaced task (e.g., `mteb/NFCorpus`, `beir/SciFact`) |
| `--type` | Evaluation type: `quality` or `perf` |
| `--sources, -s` | Comma-separated sources: `sie`, `tei`, `infinity`, `fastembed`, `benchmark`, `targets`, `measurements`, or a URL (default: `sie`) |
| `--batch-size, -b` | Batch size for performance evaluation (default: 1) |
| `--concurrency, -c` | Concurrency level (default: 16) |
| `--device, -d` | Device for inference (default: `cuda:0`) |
| `--output, -o` | Output format: `table`, `json`, `md` (default: `table`) |
| `--profile, -p` | Named profile from model config (e.g., `sparse`, `muvera`). Controls runtime options including output types. |
| `--lang` | Language filter (ISO 639-3, e.g., `eng` for English only). For multilingual tasks. |
| `--timeout` | Request timeout in seconds (default: 120; use 600+ for VLMs) |
| `--verbose, -v` | Enable verbose logging |
Target management:

| Option | Description |
|---|---|
| `--save-targets SOURCE` | Save results from SOURCE (e.g., `tei`, `benchmark`) as targets in model config |
| `--save-measurements SOURCE` | Save results from SOURCE (e.g., `sie`) as measurements in model config |
| `--check-targets` | Exit non-zero if SIE results are below targets. Requires `targets` in `--sources`. |
| `--check-measurements` | Exit non-zero if SIE results are below past measurements. Requires `measurements` in `--sources`. |
| `--print` | Print summary table of all targets and measurements from model configs |
| `--print-json` | Print JSON with task metadata and model results for website integration |
| `--models-dir` | Path to models directory (for target/measurement operations) |
Cluster options:

| Option | Description |
|---|---|
| `--cluster` | Cluster router URL for elastic cloud deployments (e.g., https://router.example.com) |
| `--gpu` | Target GPU type for cluster routing (e.g., `l4`, `a100-80gb`). Requires `--cluster`. |
| `--provision` | Wait for GPU capacity if not immediately available. Requires `--cluster`. |
| `--provision-timeout` | Max seconds to wait for GPU provisioning (default: 300) |
| `--wait-ready` | Wait for cluster GPU capacity before starting benchmark. Requires `--cluster`. |
Experiment tracking:

| Option | Description |
|---|---|
| `--wandb-project` | W&B project name |
| `--wandb-entity` | W&B entity/team name |
| `--mlflow-experiment` | MLflow experiment name |
| `--mlflow-uri` | MLflow tracking URI |
Examples:

```shell
# Quality evaluation
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality

# Compare SIE vs TEI vs published benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,tei,benchmark

# Performance benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf -s sie

# Save results as targets
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality --save-targets sie

# CI regression check
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,targets --check-targets

# Print summary of all configured targets
sie-bench eval --print --type quality

# Evaluate on cluster with specific GPU
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf \
  --cluster http://router:8080 --gpu l4 --provision
```

### matrix

`sie-bench matrix CONFIG --cluster URL [OPTIONS]`

Run matrix evaluation across models, profiles, tasks, and GPUs.
| Argument/Option | Description |
|---|---|
| `CONFIG` | Path to matrix config YAML |
| `--cluster, -c` | Cluster router URL (required) |
| `--workers, -w` | Number of parallel workers per GPU type (default: 1) |
| `--pool-timeout` | Timeout waiting for pools to become active, in seconds (default: 300) |
| `--models-dir` | Path to models directory |
| `--save-measurements/--no-save-measurements` | Save results to model configs (default: enabled) |
| `--output, -o` | Output format: `table`, `json`, `md` (default: `table`) |
| `--verbose, -v` | Enable verbose logging |
Example:

```shell
sie-bench matrix configs/eval-matrix.yaml --cluster http://router:8080 --workers 2
```

### loadtest

`sie-bench loadtest SCENARIO --cluster URL [OPTIONS]`

Run load test scenario against a SIE cluster.
| Argument/Option | Description |
|---|---|
| `SCENARIO` | Path to load test scenario YAML |
| `--cluster, -c` | Cluster router URL |
| `--duration, -d` | Override scenario duration (seconds) |
| `--output, -o` | Output directory for reports |
| `--verbose, -v` | Verbose output |
Example:

```shell
sie-bench loadtest scenario.yaml --cluster http://router:8080 --duration 300
```

## sie-admin

Cluster administration and cache management. Has three subcommand groups: `cache`, `cluster`, and `models`.

### cache populate

`sie-admin cache populate [MODEL] [OPTIONS]`

Download model weights to local cache or cluster cache.
| Argument/Option | Description |
|---|---|
| `MODEL` | Model ID to populate (e.g., BAAI/bge-m3) |
| `--bundle, -b` | Bundle name to populate all models |
| `--target, -t` | Target S3/GCS URL for cluster cache |
Examples:

```shell
# Download single model to local cache
sie-admin cache populate BAAI/bge-m3

# Download all models in a bundle
sie-admin cache populate --bundle default

# Download and upload to cluster cache
sie-admin cache populate BAAI/bge-m3 --target s3://my-bucket/sie-cache/
```

### cache sync

`sie-admin cache sync PATH --target URL [OPTIONS]`

Sync model configs from local path to cluster storage.
| Argument/Option | Description |
|---|---|
| `PATH` | Local path to model configs |
| `--target, -t` | Target S3/GCS URL |
| `--dry-run, -n` | Show what would be synced |
Examples:

```shell
# Sync configs to S3
sie-admin cache sync ./models --target s3://my-bucket/sie-models/

# Dry run to preview
sie-admin cache sync ./models -t s3://bucket/configs --dry-run
```

### cache status

`sie-admin cache status`

Show cache status including local and cluster cache contents, with model sizes and download status.

### cluster status

`sie-admin cluster status ROUTER [OPTIONS]`

Show cluster status (workers, GPUs, models).
| Argument/Option | Description |
|---|---|
| `ROUTER` | Router URL (e.g., router.example.com:8080) |
| `--json, -j` | Output as JSON |
Example:

```shell
sie-admin cluster status router:8080
```

### cluster models

`sie-admin cluster models ROUTER [OPTIONS]`

Show model availability across workers.
| Argument/Option | Description |
|---|---|
| `ROUTER` | Router URL |
| `--json, -j` | Output as JSON |
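No example is given for this subcommand; following the pattern of `cluster status` above, an illustrative invocation (the router address is a placeholder):

```shell
# Show which models are loaded on which workers
sie-admin cluster models router:8080

# JSON output for scripting
sie-admin cluster models router:8080 --json
```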
### models validate

`sie-admin models validate PATH`

Validate model config YAML files against the schema.
| Argument/Option | Description |
|---|---|
| `PATH` | Path to model config(s); supports glob patterns, local dirs, or cloud URLs |
Examples:

```shell
# Validate all models in a directory
sie-admin models validate ./models/

# Validate a single config
sie-admin models validate ./models/baai-bge-m3.yaml

# Validate configs in S3
sie-admin models validate s3://my-bucket/models/
```

### models list

`sie-admin models list PATH [OPTIONS]`

List models in a directory or bucket with their metadata.
| Argument/Option | Description |
|---|---|
| `PATH` | Path to model configs (local or S3/GCS) |
| `--json, -j` | Output as JSON |
Examples:

```shell
# List models in local directory
sie-admin models list ./models

# List models in S3 bucket
sie-admin models list s3://my-bucket/models/

# Output as JSON for scripting
sie-admin models list ./models --json
```

## sie-top

Real-time TUI monitor for SIE servers and clusters.

`sie-top [HOST:PORT] [OPTIONS]`

| Argument/Option | Default | Description |
|---|---|---|
| `HOST:PORT` | localhost:8080 | Server address |
| `--cluster, -c` | - | Force cluster mode (connect to router) |
| `--worker, -w` | - | Force worker mode (connect to single server) |

Mode is auto-detected by probing the router `/health` endpoint (falls back to worker mode if unavailable).
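The auto-detection logic described above can be sketched as follows. This is an illustration of the documented behavior, not sie-top's actual code; `probe_health` is a hypothetical stand-in for the HTTP GET against `/health`:

```python
def detect_mode(probe_health):
    """Sketch of the documented auto-detection: if the router's
    /health endpoint answers, use cluster mode; on any failure,
    fall back to worker mode."""
    try:
        ok = probe_health()  # stands in for: GET http://HOST:PORT/health
    except Exception:
        return "worker"  # endpoint unreachable -> single-server mode
    return "cluster" if ok else "worker"
```

Passing `--cluster` or `--worker` skips this probe and forces the mode directly.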
Examples:

```shell
# Monitor local server (auto-detect mode)
sie-top

# Monitor specific server
sie-top localhost:8080

# Force cluster mode (connect to router)
sie-top --cluster router.example.com:8080

# Force worker mode
sie-top --worker worker-0:8080
```

Installation:

The TUI requires optional dependencies:

```shell
pip install 'sie-admin[top]'
```

## sie-router

Stateless request router for elastic cloud deployments.

`sie-router serve [OPTIONS]`

Start the SIE Router server.
| Option | Default | Description |
|---|---|---|
| `--port, -p` | 8081 | Port to listen on |
| `--host` | 0.0.0.0 | Host to bind to |
| `--worker, -w` | None | Worker URLs (can specify multiple times) |
| `--kubernetes, -k` | false | Use Kubernetes service discovery |
| `--k8s-namespace` | default | Kubernetes namespace for discovery |
| `--k8s-service` | sie-worker | Kubernetes service name to discover |
| `--k8s-port` | 8080 | Worker port for K8s-discovered endpoints |
| `--log-level, -l` | info | Log level: `debug`, `info`, `warning`, `error` |
| `--json-logs` | false | Enable structured JSON logging (for Loki compatibility) |
| `--reload, -r` | false | Enable auto-reload for development |
Examples:

```shell
# Static worker discovery
sie-router serve -w http://worker-0:8080 -w http://worker-1:8080

# Kubernetes discovery
sie-router serve --kubernetes --k8s-service sie-worker

# Development with auto-reload
sie-router serve -w http://localhost:8080 --reload
```

### version

`sie-router version`

Show version information.

## Environment Variables

Many CLI options can be set via environment variables. CLI arguments override environment variables, which override defaults.
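That precedence (CLI > environment > default) can be sketched as a small helper. This is illustrative only, not SIE's actual resolution code:

```python
import os

def resolve(cli_value, env_var, default):
    """Resolve a setting with the documented precedence:
    an explicit CLI value wins, then the environment variable,
    then the built-in default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    return default

# e.g. resolving the inference device:
os.environ["SIE_DEVICE"] = "cuda"
assert resolve(None, "SIE_DEVICE", "auto") == "cuda"   # env beats default
assert resolve("cpu", "SIE_DEVICE", "auto") == "cpu"   # CLI beats env
```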
Server (`sie-server`):

| Variable | CLI Equivalent | Description |
|---|---|---|
| `SIE_DEVICE` | `--device` | Inference device (`cuda`, `mps`, `cpu`) |
| `SIE_MODELS_DIR` | `--models-dir` | Models config directory |
| `SIE_MODEL_FILTER` | `--models` | Comma-separated model names to load |
| `SIE_LOCAL_CACHE` | `--local-cache` | Local cache directory for weights |
| `SIE_CLUSTER_CACHE` | `--cluster-cache` | Cluster cache URL (`s3://` or `gs://`) |
| `SIE_HF_FALLBACK` | `--hf-fallback` | Enable HF Hub fallback (true/false) |
| `SIE_LOG_JSON` | `--json-logs` | Enable JSON logging (true/false) |
| `SIE_TRACING_ENABLED` | `--tracing` | Enable OpenTelemetry tracing |
| `SIE_GPU_TYPE` | - | Override detected GPU type |
| `SIE_MEMORY_PRESSURE_THRESHOLD_PCT` | - | GPU memory pressure threshold (0-100) |
| `SIE_MEMORY_CHECK_INTERVAL_S` | - | Memory check interval in seconds |
| `SIE_IMAGE_WORKERS` | - | Image preprocessing worker count (default: 4) |
| `SIE_INSTRUMENTATION` | - | Enable detailed instrumentation |
Router (`sie-router`):

| Variable | CLI Equivalent | Description |
|---|---|---|
| `SIE_ROUTER_WORKERS` | `--worker` | Comma-separated worker URLs |
| `SIE_ROUTER_KUBERNETES` | `--kubernetes` | Enable K8s discovery (true/false) |
| `SIE_ROUTER_K8S_NAMESPACE` | `--k8s-namespace` | K8s namespace |
| `SIE_ROUTER_K8S_SERVICE` | `--k8s-service` | K8s service name |
| `SIE_ROUTER_K8S_PORT` | `--k8s-port` | K8s worker port |
| `SIE_ROUTER_ENABLE_POOLS` | - | Enable resource pools (true/false) |
| `SIE_ROUTER_CONFIGURED_GPUS` | - | Comma-separated configured GPU types |
| `SIE_LOG_JSON` | `--json-logs` | Enable JSON logging |
See Configuration for the complete list.