# Adding Models
Add any HuggingFace model by creating a config file. No code changes required.
## Directory Layout

Model configs are flat YAML files in the `models` directory, named `{org}-{name}.yaml`. The filename uses dashes to separate the org from the model name.
```
models/
  baai-bge-m3.yaml
  my-org-my-custom-model.yaml
```

For Docker deployments, mount your custom models directory:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

## Config File Structure
Each model needs a config YAML file. Here is a minimal example:

```yaml
name: my-org/my-model
hf_id: my-org/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 768
max_sequence_length: 512
```

## Required Fields
Section titled “Required Fields”| Field | Type | Description |
|---|---|---|
| `name` | string | Model name used in API requests |
| `hf_id` | string | HuggingFace model ID for weight download |
| `adapter` | string | Adapter class path (see adapters below) |
| `inputs` | list | Input modalities: `text`, `image`, `audio`, `video` |
| `outputs` | list | Output types: `dense`, `sparse`, `multivector`, `score`, `extract` |
| `dims` | object | Embedding dimensions per output type |
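
When a model emits more than one output type, `dims` takes one entry per output. A hypothetical sketch (the model name, adapter choice, and dimension values below are illustrative, not taken from a shipped config):

```yaml
# Hypothetical multi-output config; name, adapter, and dims are illustrative.
name: my-org/my-multivector-model
hf_id: my-org/my-multivector-model
adapter: sie_server.adapters.bge_m3_flash:BGEM3FlashAdapter
inputs:
  - text
outputs:
  - dense
  - multivector
dims:
  dense: 1024
  multivector: 1024
```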
## Weight Source

At least one weight source is required (unless using `base_model`):
| Field | Description |
|---|---|
| `hf_id` | HuggingFace model ID (e.g., `BAAI/bge-m3`) |
| `weights_path` | Local path to weights (takes precedence over `hf_id`) |
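
For example, to serve weights from a local directory instead of downloading from the Hub (a sketch; the path below is a placeholder):

```yaml
# Sketch: weights_path takes precedence over hf_id when both are set.
name: my-org/my-model
hf_id: my-org/my-model               # optional once weights_path is set
weights_path: /app/weights/my-model  # placeholder local path
```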
## Adapter Resolution

Specify how the model should be loaded:
| Field | Description |
|---|---|
| `adapter` | Adapter path: `module:Class` or `file.py:Class` |
| `base_model` | Inherit adapter from another model |
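
For example, a fine-tune can reuse the adapter configured for its parent model via `base_model`; a sketch with a hypothetical fine-tune:

```yaml
# Hypothetical fine-tune that inherits its adapter from BAAI/bge-m3.
name: my-org/bge-m3-finetuned
hf_id: my-org/bge-m3-finetuned
base_model: BAAI/bge-m3
```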
## Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `max_sequence_length` | int | 512 | Maximum input tokens |
| `pooling` | string | null | Pooling strategy: `cls`, `mean`, `last_token`, `splade`, `none` |
| `normalize` | bool | true | L2-normalize output embeddings |
| `max_batch_tokens` | int | 16384 | Maximum tokens per batch |
| `compute_precision` | string | null | Override precision: `float16`, `bfloat16`, `float32` |
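
Any of these can be set at the top level of a config to override the defaults; an illustrative fragment (the particular values are arbitrary picks from the allowed values above):

```yaml
# Illustrative overrides; values chosen from the table above.
max_sequence_length: 1024
pooling: cls
normalize: false
max_batch_tokens: 8192
compute_precision: bfloat16
```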
## Profiles

Profiles define named combinations of runtime options. One profile must have `is_default: true`.

```yaml
profiles:
  default:
    is_default: true
  sparse:
    output_types:
      - sparse
  banking:
    lora: saivamshiatukuri/bge-m3-banking77-lora
    instruction: "Classify banking intent"
```

## Adapter Options
Options are split into loadtime options, which require a model reload to change, and runtime options, which can be overridden per request:

```yaml
adapter_options_loadtime:
  attn_implementation: sdpa
  compute_precision: bfloat16

adapter_options_runtime:
  query_template: 'Instruct: {instruction}\nQuery:{text}'
  default_instruction: "Retrieve relevant passages"
```

## Available Adapters
| Adapter | Use Case |
|---|---|
| `sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter` | Standard embedding models |
| `sie_server.adapters.bge_m3_flash:BGEM3FlashAdapter` | BGE-M3 with flash attention |
| `sie_server.adapters.cross_encoder:CrossEncoderAdapter` | Reranking models |
| `sie_server.adapters.gliner:GLiNERAdapter` | Entity extraction models |
| `sie_server.adapters.clip:CLIPAdapter` | CLIP vision-text models |
| `sie_server.adapters.colbert:ColBERTAdapter` | Multi-vector (ColBERT) models |
## Complete Example

A full config with profiles and runtime options:
```yaml
name: sentence-transformers/all-MiniLM-L6-v2
hf_id: sentence-transformers/all-MiniLM-L6-v2
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 384
max_sequence_length: 256
pooling: mean
normalize: true
max_batch_tokens: 16384

profiles:
  default:
    is_default: true

adapter_options_runtime:
  pooling: mean
  normalize: true
```

## Testing Your Model
After creating the config, verify the model loads and produces correct outputs.
### 1. Start the server

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie:latest
```

### 2. Check model is listed
```bash
curl http://localhost:8080/v1/models | jq '.data[].id'
```

### 3. Generate embeddings
Section titled “3. Generate embeddings”from sie_sdk import SIEClientfrom sie_sdk.types import Item
client = SIEClient("http://localhost:8080")result = client.encode("my-org/my-model", Item(text="test input"))print(result["dense"].shape) # Should match dims.dense4. Run quality eval
```bash
mise run eval my-org/my-model -t mteb/NanoFiQA2018Retrieval --type quality -s sie
```

## Hot Reload
The server monitors the models directory for changes. Add new configs without restarting:

- Create a new `models/{org}-{name}.yaml` file
- The server detects the new config automatically
- Model weights load on first request
For Docker deployments, updates to the mounted volume are detected. Changes to existing configs require a server restart.
## What’s Next

- Model Catalog - browse 85+ supported models
- Benchmarking - evaluate model quality and performance