CLI Reference

SIE provides five CLI tools, one per role: sie-server (server operation), sie-bench (benchmarking), sie-admin (administration), sie-top (monitoring), and sie-router (routing). All tools use Typer for argument parsing.

sie-server is the inference server. Start it with sie-server serve.

sie-server serve [OPTIONS]

Start the SIE inference server.

| Option | Default | Description |
| --- | --- | --- |
| --port, -p | 8080 | Port to listen on |
| --host | 0.0.0.0 | Host to bind to |
| --device, -d | auto | Device for inference: auto (detect GPU), cuda, mps, cpu |
| --models-dir | ./models | Models config directory (local path, s3://, or gs://) |
| --bundle, -b | None | Bundle name to load from bundles/ dir (e.g., default, legacy) |
| --models, -m | None | Comma-separated model names to load (mutually exclusive with --bundle) |
| --local-cache | HF_HOME | Local cache directory for model weights |
| --cluster-cache | None | Cluster cache URL for model weights (s3:// or gs://) |
| --hf-fallback/--no-hf-fallback | true | Enable/disable HuggingFace Hub fallback for weight downloads |
| --reload | false | Enable auto-reload for development (uses uvicorn reload) |
| --tracing | false | Enable OpenTelemetry tracing (exports to localhost:4317) |
| --verbose, -v | false | Enable verbose logging |
| --json-logs | false | Enable structured JSON logging (for Loki compatibility) |
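Because --bundle and --models are mutually exclusive, a launch script that accepts both may want to reject the conflict before starting the server. The following is a minimal sketch under that assumption. The helper name serve_selection_args is hypothetical and not part of SIE; it only prints the selection flag to pass to sie-server serve.

```shell
# Hypothetical helper (not part of SIE): print the model-selection flag for
# `sie-server serve`, rejecting the combination the options table rules out.
serve_selection_args() {
  bundle="$1"   # bundle name, or empty
  models="$2"   # comma-separated model names, or empty
  if [ -n "$bundle" ] && [ -n "$models" ]; then
    echo "error: --bundle and --models are mutually exclusive" >&2
    return 2
  fi
  if [ -n "$bundle" ]; then
    printf '%s\n' "--bundle $bundle"
  elif [ -n "$models" ]; then
    printf '%s\n' "--models $models"
  fi
}

# Usage (assumes sie-server is installed):
#   sie-server serve $(serve_selection_args legacy "")
```

With neither argument set, the helper prints nothing, so the server falls back to its own defaults.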

Examples:

# Start with defaults (auto-detect GPU, port 8080)
sie-server serve
# Specific port and device
sie-server serve --port 8081 --device cuda
# Load specific bundle
sie-server serve --bundle legacy
# Load specific models only
sie-server serve --models BAAI/bge-m3,BAAI/bge-reranker-v2-m3
# Use cloud model configs
sie-server serve --models-dir s3://my-bucket/sie-models/
# Development mode with auto-reload
sie-server serve --reload --verbose

sie-server resolve-deps [OPTIONS]

Resolve and print dependencies for a bundle or model list. Used by deployment scripts.

| Option | Description |
| --- | --- |
| --bundle, -b | Bundle name to resolve deps for |
| --models, -m | Comma-separated model names |
| --models-dir | Models directory |
| --json | Output as JSON |
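As a rough illustration of how a deployment script might call resolve-deps, the sketch below wraps the command in a function. It uses only the flags listed above. The function name resolve_bundle_deps, the SIE_SERVER_BIN override, and the ./models default are assumptions made for this example, and the structure of the JSON output is not described here.

```shell
# Hypothetical wrapper (not part of SIE). SIE_SERVER_BIN is an assumed override
# that lets the sketch be dry-run with a stand-in such as `echo`.
resolve_bundle_deps() {
  bundle="$1"
  models_dir="${2:-./models}"   # assumed default, matching `sie-server serve`
  "${SIE_SERVER_BIN:-sie-server}" resolve-deps \
    --bundle "$bundle" --models-dir "$models_dir" --json
}

# Usage (assumes sie-server is installed):
#   resolve_bundle_deps default > deps.json
```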

sie-bench is the evaluation and benchmarking CLI. It runs quality and performance evaluations.

sie-bench eval MODEL --task TASK --type TYPE [OPTIONS]

Run an evaluation against one or more sources.

| Argument/Option | Description |
| --- | --- |
| MODEL | Model name (e.g., BAAI/bge-m3) |
| --task, -t | Namespaced task (e.g., mteb/NFCorpus, beir/SciFact) |
| --type | Evaluation type: quality or perf |
| --sources, -s | Comma-separated sources: sie, tei, infinity, fastembed, benchmark, targets, measurements, or a URL (default: sie) |
| --batch-size, -b | Batch size for performance evaluation (default: 1) |
| --concurrency, -c | Concurrency level (default: 16) |
| --device, -d | Device for inference (default: cuda:0) |
| --output, -o | Output format: table, json, md (default: table) |
| --profile, -p | Named profile from model config (e.g., sparse, muvera). Controls runtime options including output types. |
| --lang | Language filter (ISO 639-3, e.g., eng for English only). For multilingual tasks. |
| --timeout | Request timeout in seconds (default: 120, use 600+ for VLMs) |
| --verbose, -v | Enable verbose logging |

Target management:

| Option | Description |
| --- | --- |
| --save-targets SOURCE | Save results from SOURCE (e.g., tei, benchmark) as targets in model config |
| --save-measurements SOURCE | Save results from SOURCE (e.g., sie) as measurements in model config |
| --check-targets | Exit non-zero if SIE results are below targets. Requires targets in --sources. |
| --check-measurements | Exit non-zero if SIE results are below past measurements. Requires measurements in --sources. |
| --print | Print summary table of all targets and measurements from model configs |
| --print-json | Print JSON with task metadata and model results for website integration |
| --models-dir | Path to models directory (for target/measurement operations) |
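Since --check-targets signals a regression through a non-zero exit code, a CI job can gate on it directly. The sketch below shows one way to do that. The function name check_targets and the SIE_BENCH_BIN override are hypothetical, and a non-zero exit may also come from unrelated failures such as a connection error, so the message is deliberately generic.

```shell
# Hypothetical CI helper (not part of SIE). SIE_BENCH_BIN is an assumed
# override so the sketch can be exercised without sie-bench installed.
check_targets() {
  model="$1"
  task="$2"
  if "${SIE_BENCH_BIN:-sie-bench}" eval "$model" -t "$task" \
      --type quality -s sie,targets --check-targets; then
    echo "PASS: $model on $task meets its targets"
  else
    rc=$?   # exit status of the sie-bench invocation above
    echo "FAIL: target check for $model on $task exited with $rc" >&2
    return "$rc"
  fi
}

# Usage (assumes sie-bench is installed and targets exist in the model config):
#   check_targets BAAI/bge-m3 mteb/NFCorpus || exit 1
```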

Cluster options:

| Option | Description |
| --- | --- |
| --cluster | Cluster router URL for elastic cloud deployments (e.g., https://router.example.com) |
| --gpu | Target GPU type for cluster routing (e.g., l4, a100-80gb). Requires --cluster. |
| --provision | Wait for GPU capacity if not immediately available. Requires --cluster. |
| --provision-timeout | Max seconds to wait for GPU provisioning (default: 300) |
| --wait-ready | Wait for cluster GPU capacity before starting benchmark. Requires --cluster. |

Experiment tracking:

| Option | Description |
| --- | --- |
| --wandb-project | W&B project name |
| --wandb-entity | W&B entity/team name |
| --mlflow-experiment | MLflow experiment name |
| --mlflow-uri | MLflow tracking URI |

Examples:

# Quality evaluation
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality
# Compare SIE vs TEI vs published benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,tei,benchmark
# Performance benchmark
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf -s sie
# Save results as targets
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality --save-targets sie
# CI regression check
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type quality -s sie,targets --check-targets
# Print summary of all configured targets
sie-bench eval --print --type quality
# Evaluate on cluster with specific GPU
sie-bench eval BAAI/bge-m3 -t mteb/NFCorpus --type perf \
  --cluster http://router:8080 --gpu l4 --provision

sie-bench matrix CONFIG --cluster URL [OPTIONS]

Run matrix evaluation across models, profiles, tasks, and GPUs.

| Argument/Option | Description |
| --- | --- |
| CONFIG | Path to matrix config YAML |
| --cluster, -c | Cluster router URL (required) |
| --workers, -w | Number of parallel workers per GPU type (default: 1) |
| --pool-timeout | Timeout waiting for pools to become active, in seconds (default: 300) |
| --models-dir | Path to models directory |
| --save-measurements/--no-save-measurements | Save results to model configs (default: enabled) |
| --output, -o | Output format: table, json, md (default: table) |
| --verbose, -v | Enable verbose logging |

Example:

sie-bench matrix configs/eval-matrix.yaml --cluster http://router:8080 --workers 2

sie-bench loadtest SCENARIO --cluster URL [OPTIONS]

Run load test scenario against a SIE cluster.

| Argument/Option | Description |
| --- | --- |
| SCENARIO | Path to load test scenario YAML |
| --cluster, -c | Cluster router URL |
| --duration, -d | Override scenario duration (seconds) |
| --output, -o | Output directory for reports |
| --verbose, -v | Verbose output |

Example:

sie-bench loadtest scenario.yaml --cluster http://router:8080 --duration 300

sie-admin handles cluster administration and cache management. It has three subcommand groups: cache, cluster, and models.

sie-admin cache populate [MODEL] [OPTIONS]

Download model weights to local cache or cluster cache.

| Argument/Option | Description |
| --- | --- |
| MODEL | Model ID to populate (e.g., BAAI/bge-m3) |
| --bundle, -b | Bundle name to populate all models |
| --target, -t | Target S3/GCS URL for cluster cache |

Examples:

# Download single model to local cache
sie-admin cache populate BAAI/bge-m3
# Download all models in a bundle
sie-admin cache populate --bundle default
# Download and upload to cluster cache
sie-admin cache populate BAAI/bge-m3 --target s3://my-bucket/sie-cache/

sie-admin cache sync PATH --target URL [OPTIONS]

Sync model configs from local path to cluster storage.

| Argument/Option | Description |
| --- | --- |
| PATH | Local path to model configs |
| --target, -t | Target S3/GCS URL |
| --dry-run, -n | Show what would be synced |
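One cautious way to use --dry-run is to preview the sync and only apply it if the preview succeeds. The sketch below assumes that workflow. The function name sync_configs and the SIE_ADMIN_BIN override are hypothetical and exist only for this example.

```shell
# Hypothetical wrapper (not part of SIE). SIE_ADMIN_BIN is an assumed override
# that lets the sketch be dry-run with a stand-in such as `echo`.
sync_configs() {
  src="$1"
  target="$2"
  admin="${SIE_ADMIN_BIN:-sie-admin}"
  # Preview first; stop if the dry run itself fails.
  "$admin" cache sync "$src" --target "$target" --dry-run || return $?
  # Apply the same sync for real.
  "$admin" cache sync "$src" --target "$target"
}

# Usage (assumes sie-admin is installed and the bucket is writable):
#   sync_configs ./models s3://my-bucket/sie-models/
```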

Examples:

# Sync configs to S3
sie-admin cache sync ./models --target s3://my-bucket/sie-models/
# Dry run to preview
sie-admin cache sync ./models -t s3://bucket/configs --dry-run

sie-admin cache status

Show cache status including local and cluster cache contents, with model sizes and download status.

sie-admin cluster status ROUTER [OPTIONS]

Show cluster status (workers, GPUs, models).

| Argument/Option | Description |
| --- | --- |
| ROUTER | Router URL (e.g., router.example.com:8080) |
| --json, -j | Output as JSON |

Example:

sie-admin cluster status router:8080

sie-admin cluster models ROUTER [OPTIONS]

Show model availability across workers.

| Argument/Option | Description |
| --- | --- |
| ROUTER | Router URL |
| --json, -j | Output as JSON |

sie-admin models validate PATH

Validate model config YAML files against the schema.

| Argument/Option | Description |
| --- | --- |
| PATH | Path to model config(s): supports glob patterns, local dirs, or cloud URLs |

Examples:

# Validate all models in a directory
sie-admin models validate ./models/
# Validate a single config
sie-admin models validate ./models/baai-bge-m3.yaml
# Validate configs in S3
sie-admin models validate s3://my-bucket/models/

sie-admin models list PATH [OPTIONS]

List models in a directory or bucket with their metadata.

| Argument/Option | Description |
| --- | --- |
| PATH | Path to model configs (local or S3/GCS) |
| --json, -j | Output as JSON |

Examples:

# List models in local directory
sie-admin models list ./models
# List models in S3 bucket
sie-admin models list s3://my-bucket/models/
# Output as JSON for scripting
sie-admin models list ./models --json

sie-top is a real-time TUI monitor for SIE servers and clusters.

sie-top [HOST:PORT] [OPTIONS]

| Argument/Option | Default | Description |
| --- | --- | --- |
| HOST:PORT | localhost:8080 | Server address |
| --cluster, -c | - | Force cluster mode (connect to router) |
| --worker, -w | - | Force worker mode (connect to single server) |

Mode is auto-detected by probing the router /health endpoint (falls back to worker mode if unavailable).

Examples:

# Monitor local server (auto-detect mode)
sie-top
# Monitor specific server
sie-top localhost:8080
# Force cluster mode (connect to router)
sie-top --cluster router.example.com:8080
# Force worker mode
sie-top --worker worker-0:8080

Installation:

The TUI requires optional dependencies:

pip install 'sie-admin[top]'

sie-router is a stateless request router for elastic cloud deployments.

sie-router serve [OPTIONS]

Start the SIE Router server.

| Option | Default | Description |
| --- | --- | --- |
| --port, -p | 8081 | Port to listen on |
| --host | 0.0.0.0 | Host to bind to |
| --worker, -w | None | Worker URLs (can specify multiple times) |
| --kubernetes, -k | false | Use Kubernetes service discovery |
| --k8s-namespace | default | Kubernetes namespace for discovery |
| --k8s-service | sie-worker | Kubernetes service name to discover |
| --k8s-port | 8080 | Worker port for K8s-discovered endpoints |
| --log-level, -l | info | Log level: debug, info, warning, error |
| --json-logs | false | Enable structured JSON logging (for Loki compatibility) |
| --reload, -r | false | Enable auto-reload for development |

Examples:

# Static worker discovery
sie-router serve -w http://worker-0:8080 -w http://worker-1:8080
# Kubernetes discovery
sie-router serve --kubernetes --k8s-service sie-worker
# Development with auto-reload
sie-router serve -w http://localhost:8080 --reload

sie-router version

Show version information.


Many CLI options can be set via environment variables. CLI arguments override environment variables, which override defaults.
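The precedence rule can be pictured with nested shell parameter expansion, using --device and SIE_DEVICE as the example. This is only an illustration of the documented ordering, not SIE's implementation. The helper name resolve_device is hypothetical, and the sketch treats an empty value the same as an unset one.

```shell
# Illustrative only: mirrors the documented precedence
#   CLI argument > environment variable > built-in default
# resolve_device is a hypothetical helper, not SIE's implementation.
resolve_device() {
  cli_value="$1"   # value passed via --device, or empty if the flag was omitted
  printf '%s\n' "${cli_value:-${SIE_DEVICE:-auto}}"
}

# Usage:
#   resolve_device ""      # falls back to SIE_DEVICE, then to "auto"
#   resolve_device cuda    # the CLI value wins regardless of SIE_DEVICE
```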

Server (sie-server):

| Variable | CLI Equivalent | Description |
| --- | --- | --- |
| SIE_DEVICE | --device | Inference device (cuda, mps, cpu) |
| SIE_MODELS_DIR | --models-dir | Models config directory |
| SIE_MODEL_FILTER | --models | Comma-separated model names to load |
| SIE_LOCAL_CACHE | --local-cache | Local cache directory for weights |
| SIE_CLUSTER_CACHE | --cluster-cache | Cluster cache URL (s3:// or gs://) |
| SIE_HF_FALLBACK | --hf-fallback | Enable HF Hub fallback (true/false) |
| SIE_LOG_JSON | --json-logs | Enable JSON logging (true/false) |
| SIE_TRACING_ENABLED | --tracing | Enable OpenTelemetry tracing |
| SIE_GPU_TYPE | - | Override detected GPU type |
| SIE_MEMORY_PRESSURE_THRESHOLD_PCT | - | GPU memory pressure threshold (0-100) |
| SIE_MEMORY_CHECK_INTERVAL_S | - | Memory check interval in seconds |
| SIE_IMAGE_WORKERS | - | Image preprocessing worker count (default: 4) |
| SIE_INSTRUMENTATION | - | Enable detailed instrumentation |

Router (sie-router):

| Variable | CLI Equivalent | Description |
| --- | --- | --- |
| SIE_ROUTER_WORKERS | --worker | Comma-separated worker URLs |
| SIE_ROUTER_KUBERNETES | --kubernetes | Enable K8s discovery (true/false) |
| SIE_ROUTER_K8S_NAMESPACE | --k8s-namespace | K8s namespace |
| SIE_ROUTER_K8S_SERVICE | --k8s-service | K8s service name |
| SIE_ROUTER_K8S_PORT | --k8s-port | K8s worker port |
| SIE_ROUTER_ENABLE_POOLS | - | Enable resource pools (true/false) |
| SIE_ROUTER_CONFIGURED_GPUS | - | Comma-separated configured GPU types |
| SIE_LOG_JSON | --json-logs | Enable JSON logging |
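Worker URLs take two shapes: SIE_ROUTER_WORKERS is a single comma-separated string, while --worker is repeated once per URL. The sketch below converts the first form into the second to make the correspondence concrete. The helper name worker_flags is hypothetical, and how the router itself parses the variable is not shown here.

```shell
# Hypothetical helper (not part of SIE): turn a comma-separated URL list into
# repeated -w flags. The body runs in a subshell so the IFS and globbing
# changes do not leak into the caller.
worker_flags() (
  IFS=','
  set -f                      # disable globbing while splitting the list
  flags=""
  for url in $1; do
    if [ -n "$url" ]; then    # skip empty entries such as "a,,b"
      flags="${flags:+$flags }-w $url"
    fi
  done
  printf '%s\n' "$flags"
)

# Usage (assumes sie-router is installed):
#   sie-router serve $(worker_flags "http://worker-0:8080,http://worker-1:8080")
```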

See Configuration for the complete list.