# CLI Reference

SIE provides five CLI tools for different roles: server operation, benchmarking, administration, monitoring, and routing. All tools use `typer` for argument parsing.
## sie-server

The inference server. Start with `sie-server serve`.

`sie-server serve [OPTIONS]`

Start the SIE inference server.
| Option | Default | Description |
|---|---|---|
| `--port, -p` | 8080 | Port to listen on |
| `--host` | 0.0.0.0 | Host to bind to |
| `--device, -d` | auto | Device for inference: `auto` (detect GPU), `cuda`, `mps`, `cpu` |
| `--models-dir` | ./models | Models config directory (local path, `s3://`, or `gs://`) |
| `--bundle, -b` | None | Bundle name to load from `bundles/` dir (e.g., `default`, `legacy`) |
| `--models, -m` | None | Comma-separated model names to load (mutually exclusive with `--bundle`) |
| `--local-cache` | HF_HOME | Local cache directory for model weights |
| `--cluster-cache` | None | Cluster cache URL for model weights (`s3://` or `gs://`) |
| `--hf-fallback/--no-hf-fallback` | true | Enable/disable HuggingFace Hub fallback for weight downloads |
| `--reload` | false | Enable auto-reload for development (uses uvicorn reload) |
| `--tracing` | false | Enable OpenTelemetry tracing (exports to localhost:4317) |
| `--verbose, -v` | false | Enable verbose logging |
| `--json-logs` | false | Enable structured JSON logging (for Loki compatibility) |
Examples:

```shell
# Start with defaults (auto-detect GPU, port 8080)
sie-server serve

# Specific port and device
sie-server serve --port 8081 --device cuda

# Load specific bundle
sie-server serve --bundle legacy

# Load specific models only
sie-server serve --models BAAI/bge-m3,BAAI/bge-reranker-v2-m3

# Use cloud model configs
sie-server serve --models-dir s3://my-bucket/sie-models/

# Development mode with auto-reload
sie-server serve --reload --verbose
```

### resolve-deps

`sie-server resolve-deps [OPTIONS]`

Resolve and print dependencies for a bundle or model list. Used by deployment scripts.
| Option | Description |
|---|---|
| `--bundle, -b` | Bundle name to resolve deps for |
| `--models, -m` | Comma-separated model names |
| `--models-dir` | Models directory |
| `--json` | Output as JSON |
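This reference gives no examples for `resolve-deps`; the invocations below are illustrative sketches built only from the flags in the table above (the `default` bundle name is borrowed from the serve examples):

```shell
# Print resolved dependencies for a bundle, as JSON for deployment scripts
sie-server resolve-deps --bundle default --json

# Resolve for an explicit model list from a local models directory
sie-server resolve-deps --models BAAI/bge-m3 --models-dir ./models
```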
## sie-bench

Evaluation and benchmarking CLI. Runs quality and performance evaluations.

`sie-bench eval MODEL --task TASK --type TYPE [OPTIONS]`

Run evaluation against multiple sources.
| Argument/Option | Description |
|---|---|
| `MODEL` | Model name (e.g., BAAI/bge-m3) |
| `--task, -t` | Namespaced task (e.g., `mteb/NFCorpus`, `beir/SciFact`) |
| `--type` | Evaluation type: `quality` or `perf` |
| `--sources, -s` | Comma-separated sources: `sie`, `tei`, `infinity`, `fastembed`, `benchmark`, `targets`, `measurements`, or a URL (default: `sie`) |
| `--batch-size, -b` | Batch size for performance evaluation (default: 1) |
| `--concurrency, -c` | Concurrency level (default: 16) |
| `--device, -d` | Device for inference (default: `cuda:0`) |
| `--output, -o` | Output format: `table`, `json`, `md` (default: `table`) |
| `--profile, -p` | Named profile from model config (e.g., `sparse`, `muvera`). Controls runtime options including output types. |
| `--lang` | Language filter (ISO 639-3, e.g., `eng` for English only). For multilingual tasks. |
| `--timeout` | Request timeout in seconds (default: 120; use 600+ for VLMs) |
| `--verbose, -v` | Enable verbose logging |
Target management:

| Option | Description |
|---|---|
| `--save-targets SOURCE` | Save results from SOURCE (e.g., `tei`, `benchmark`) as targets in model config |
| `--save-measurements SOURCE` | Save results from SOURCE (e.g., `sie`) as measurements in model config |
| `--check-targets` | Exit non-zero if SIE results are below targets. Requires `targets` in `--sources`. |
| `--check-measurements` | Exit non-zero if SIE results are below past measurements. Requires `measurements` in `--sources`. |
| `--print` | Print summary table of all targets and measurements from model configs |
| `--print-json` | Print JSON with task metadata and model results for website integration |
| `--models-dir` | Path to models directory (for target/measurement operations) |
Cluster options:

| Option | Description |
|---|---|
| `--cluster` | Cluster router URL for elastic cloud deployments (e.g., https://router.example.com) |
| `--gpu` | Target GPU type for cluster routing (e.g., `l4`, `a100-80gb`). Requires `--cluster`. |
| `--provision` | Wait for GPU capacity if not immediately available. Requires `--cluster`. |
| `--provision-timeout` | Max seconds to wait for GPU provisioning (default: 300) |
| `--wait-ready` | Wait for cluster GPU capacity before starting benchmark. Requires `--cluster`. |
Experiment tracking:

| Option | Description |
|---|---|
| `--wandb-project` | W&B project name |
| `--wandb-entity` | W&B entity/team name |
| `--mlflow-experiment` | MLflow experiment name |
| `--mlflow-uri` | MLflow tracking URI |
Examples:

```shell
# Quality evaluation
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality

# Compare SIE vs TEI vs published benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,tei,benchmark

# Performance benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf -s sie

# Save results as targets
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality --save-targets sie

# CI regression check
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,targets --check-targets

# Print summary of all configured targets
sie-bench eval --print --type quality

# Evaluate on cluster with specific GPU
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf \
  --cluster http://router:8080 --gpu l4 --provision
```

### matrix

`sie-bench matrix CONFIG --cluster URL [OPTIONS]`

Run matrix evaluation across models, profiles, tasks, and GPUs.
| Argument/Option | Description |
|---|---|
| `CONFIG` | Path to matrix config YAML |
| `--cluster, -c` | Cluster router URL (required) |
| `--workers, -w` | Number of parallel workers per GPU type (default: 1) |
| `--pool-timeout` | Timeout waiting for pools to become active, in seconds (default: 300) |
| `--models-dir` | Path to models directory |
| `--save-measurements/--no-save-measurements` | Save results to model configs (default: enabled) |
| `--output, -o` | Output format: `table`, `json`, `md` (default: `table`) |
| `--verbose, -v` | Enable verbose logging |
Example:

```shell
sie-bench matrix configs/eval-matrix.yaml --cluster http://router:8080 --workers 2
```

### loadtest

`sie-bench loadtest SCENARIO --cluster URL [OPTIONS]`

Run load test scenario against a SIE cluster.
| Argument/Option | Description |
|---|---|
| `SCENARIO` | Path to load test scenario YAML |
| `--cluster, -c` | Cluster router URL |
| `--duration, -d` | Override scenario duration (seconds) |
| `--output, -o` | Output directory for reports |
| `--verbose, -v` | Verbose output |
Example:

```shell
sie-bench loadtest scenario.yaml --cluster http://router:8080 --duration 300
```

## sie-admin

Cluster administration and cache management. Has three subcommand groups: `cache`, `cluster`, and `models`.

### cache populate

`sie-admin cache populate [MODEL] [OPTIONS]`

Download model weights to local cache or cluster cache.
| Argument/Option | Description |
|---|---|
| `MODEL` | Model ID to populate (e.g., BAAI/bge-m3) |
| `--bundle, -b` | Bundle name to populate all models |
| `--target, -t` | Target S3/GCS URL for cluster cache |
Examples:

```shell
# Download single model to local cache
sie-admin cache populate BAAI/bge-m3

# Download all models in a bundle
sie-admin cache populate --bundle default

# Download and upload to cluster cache
sie-admin cache populate BAAI/bge-m3 --target s3://my-bucket/sie-cache/
```

### cache sync

`sie-admin cache sync PATH --target URL [OPTIONS]`

Sync model configs from local path to cluster storage.
| Argument/Option | Description |
|---|---|
| `PATH` | Local path to model configs |
| `--target, -t` | Target S3/GCS URL |
| `--dry-run, -n` | Show what would be synced |
Examples:

```shell
# Sync configs to S3
sie-admin cache sync ./models --target s3://my-bucket/sie-models/

# Dry run to preview
sie-admin cache sync ./models -t s3://bucket/configs --dry-run
```

### cache status

`sie-admin cache status`

Show cache status including local and cluster cache contents, with model sizes and download status.

### cluster status

`sie-admin cluster status ROUTER [OPTIONS]`

Show cluster status (workers, GPUs, models).
| Argument/Option | Description |
|---|---|
| `ROUTER` | Router URL (e.g., router.example.com:8080) |
| `--json, -j` | Output as JSON |
Example:

```shell
sie-admin cluster status router:8080
```

### cluster models

`sie-admin cluster models ROUTER [OPTIONS]`

Show model availability across workers.
| Argument/Option | Description |
|---|---|
| `ROUTER` | Router URL |
| `--json, -j` | Output as JSON |
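No example is given for this subcommand; following the pattern of `cluster status` above, an illustrative invocation (the router address is a placeholder):

```shell
# Show which models are loaded on which workers
sie-admin cluster models router:8080

# JSON output for scripting
sie-admin cluster models router:8080 --json
```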
### models validate

`sie-admin models validate PATH`

Validate model config YAML files against the schema.
| Argument/Option | Description |
|---|---|
| `PATH` | Path to model config(s); supports glob patterns, local dirs, or cloud URLs |
Examples:

```shell
# Validate all models in a directory
sie-admin models validate ./models/

# Validate a single config
sie-admin models validate ./models/baai-bge-m3.yaml

# Validate configs in S3
sie-admin models validate s3://my-bucket/models/
```

### models list

`sie-admin models list PATH [OPTIONS]`

List models in a directory or bucket with their metadata.
| Argument/Option | Description |
|---|---|
| `PATH` | Path to model configs (local or S3/GCS) |
| `--json, -j` | Output as JSON |
Examples:

```shell
# List models in local directory
sie-admin models list ./models

# List models in S3 bucket
sie-admin models list s3://my-bucket/models/

# Output as JSON for scripting
sie-admin models list ./models --json
```

## sie-top

Real-time TUI monitor for SIE servers and clusters.

`sie-top [HOST:PORT] [OPTIONS]`

| Argument/Option | Default | Description |
|---|---|---|
| `HOST:PORT` | localhost:8080 | Server address |
| `--cluster, -c` | - | Force cluster mode (connect to router) |
| `--worker, -w` | - | Force worker mode (connect to single server) |

Mode is auto-detected by probing the router `/health` endpoint (falls back to worker mode if unavailable).
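The auto-detection logic described above can be sketched as follows. This is an illustration of the documented behavior, not sie-top's actual code; `probe_health` is a hypothetical stand-in for the HTTP GET against `/health`:

```python
def detect_mode(probe_health):
    """Sketch of the documented auto-detection: if the router's
    /health endpoint answers, use cluster mode; on any failure,
    fall back to worker mode."""
    try:
        ok = probe_health()  # stands in for: GET http://HOST:PORT/health
    except Exception:
        return "worker"  # endpoint unreachable -> single-server mode
    return "cluster" if ok else "worker"
```

Passing `--cluster` or `--worker` skips this probe and forces the mode directly.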
Examples:

```shell
# Monitor local server (auto-detect mode)
sie-top

# Monitor specific server
sie-top localhost:8080

# Force cluster mode (connect to router)
sie-top --cluster router.example.com:8080

# Force worker mode
sie-top --worker worker-0:8080
```

Installation:

The TUI requires optional dependencies:

```shell
pip install 'sie-admin[top]'
```

## sie-router

Stateless request router for elastic cloud deployments.

`sie-router serve [OPTIONS]`

Start the SIE Router server.
| Option | Default | Description |
|---|---|---|
| `--port, -p` | 8081 | Port to listen on |
| `--host` | 0.0.0.0 | Host to bind to |
| `--worker, -w` | None | Worker URLs (can specify multiple times) |
| `--kubernetes, -k` | false | Use Kubernetes service discovery |
| `--k8s-namespace` | default | Kubernetes namespace for discovery |
| `--k8s-service` | sie-worker | Kubernetes service name to discover |
| `--k8s-port` | 8080 | Worker port for K8s-discovered endpoints |
| `--log-level, -l` | info | Log level: `debug`, `info`, `warning`, `error` |
| `--json-logs` | false | Enable structured JSON logging (for Loki compatibility) |
| `--reload, -r` | false | Enable auto-reload for development |
Examples:

```shell
# Static worker discovery
sie-router serve -w http://worker-0:8080 -w http://worker-1:8080

# Kubernetes discovery
sie-router serve --kubernetes --k8s-service sie-worker

# Development with auto-reload
sie-router serve -w http://localhost:8080 --reload
```

### version

`sie-router version`

Show version information.

## Environment Variables

Many CLI options can be set via environment variables. CLI arguments override environment variables, which override defaults.
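That precedence (CLI > environment > default) can be sketched as a small helper. This is illustrative only, not SIE's actual resolution code:

```python
import os

def resolve(cli_value, env_var, default):
    """Resolve a setting with the documented precedence:
    an explicit CLI value wins, then the environment variable,
    then the built-in default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    return default

# e.g. resolving the inference device:
os.environ["SIE_DEVICE"] = "cuda"
assert resolve(None, "SIE_DEVICE", "auto") == "cuda"   # env beats default
assert resolve("cpu", "SIE_DEVICE", "auto") == "cpu"   # CLI beats env
```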
Server (`sie-server`):

| Variable | CLI Equivalent | Description |
|---|---|---|
| `SIE_DEVICE` | `--device` | Inference device (`cuda`, `mps`, `cpu`) |
| `SIE_MODELS_DIR` | `--models-dir` | Models config directory |
| `SIE_MODEL_FILTER` | `--models` | Comma-separated model names to load |
| `SIE_LOCAL_CACHE` | `--local-cache` | Local cache directory for weights |
| `SIE_CLUSTER_CACHE` | `--cluster-cache` | Cluster cache URL (`s3://` or `gs://`) |
| `SIE_HF_FALLBACK` | `--hf-fallback` | Enable HF Hub fallback (true/false) |
| `SIE_LOG_JSON` | `--json-logs` | Enable JSON logging (true/false) |
| `SIE_TRACING_ENABLED` | `--tracing` | Enable OpenTelemetry tracing |
| `SIE_GPU_TYPE` | - | Override detected GPU type |
| `SIE_MEMORY_PRESSURE_THRESHOLD_PCT` | - | GPU memory pressure threshold (0-100) |
| `SIE_MEMORY_CHECK_INTERVAL_S` | - | Memory check interval in seconds |
| `SIE_IMAGE_WORKERS` | - | Image preprocessing worker count (default: 4) |
| `SIE_INSTRUMENTATION` | - | Enable detailed instrumentation |
Router (`sie-router`):

| Variable | CLI Equivalent | Description |
|---|---|---|
| `SIE_ROUTER_WORKERS` | `--worker` | Comma-separated worker URLs |
| `SIE_ROUTER_KUBERNETES` | `--kubernetes` | Enable K8s discovery (true/false) |
| `SIE_ROUTER_K8S_NAMESPACE` | `--k8s-namespace` | K8s namespace |
| `SIE_ROUTER_K8S_SERVICE` | `--k8s-service` | K8s service name |
| `SIE_ROUTER_K8S_PORT` | `--k8s-port` | K8s worker port |
| `SIE_ROUTER_ENABLE_POOLS` | - | Enable resource pools (true/false) |
| `SIE_ROUTER_CONFIGURED_GPUS` | - | Comma-separated configured GPU types |
| `SIE_LOG_JSON` | `--json-logs` | Enable JSON logging |
See Configuration for the complete list.