# Choosing a Model

SIE supports 85+ models. This guide helps you pick the right one based on your use case, language requirements, and performance needs.

## Quick recommendations

| Use Case | Recommended Model | Why |
|---|---|---|
| English-only, balanced | NovaSearch/stella_en_400M_v5 | Strong MTEB scores, efficient size |
| English-only, max quality | nvidia/NV-Embed-v2 | Top MTEB scores, 4096 dims |
| Speed-optimized | sentence-transformers/all-MiniLM-L6-v2 | 22M params, 384 dims, very fast |
| Multilingual | BAAI/bge-m3 | 100+ languages, also supports sparse + multi-vector |
| Hybrid search | BAAI/bge-m3 or naver/splade-v3 | Dense + sparse from one model, or dedicated sparse |
| Late interaction (ColBERT) | jinaai/jina-colbert-v2 | Best ColBERT quality, multilingual |
| Vision / image search | google/siglip-so400m-patch14-384 | Image-text similarity |
| Multilingual, fast | Qwen/Qwen3-Embedding-0.6B | 1024 dims, 32K context, 100+ languages |
| Document vision (PDF) | vidore/colpali-v1.3-hf | Visual document retrieval |
| ColBERT reranking | answerdotai/answerai-colbert-small-v1 | Fast MaxSim reranking; also jina-colbert-v2, GTE-ModernColBERT-v1 |
| Reranking (multilingual) | BAAI/bge-reranker-v2-m3 | Strong cross-language reranking |
| Reranking (English) | mixedbread-ai/mxbai-rerank-large-v2 | High quality, 8192 max length |
| Entity extraction | urchade/gliner_multi-v2.1 | Zero-shot NER, multilingual |
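Switching between embedding models is typically just a change of model ID. A minimal sketch, assuming a running SIE endpoint and the `client`/`Item` API used in the examples later in this guide (the import path and constructor here are hypothetical; adjust them to your client setup):

```python
# Hypothetical import and constructor: substitute your actual SIE client setup.
from sie_client import Client, Item

client = Client("http://localhost:8000")  # assumed local SIE endpoint

# The call shape stays the same for any embedding model; only the ID changes.
result = client.encode(
    "NovaSearch/stella_en_400M_v5",  # the English-only default from the table
    Item(text="What is machine learning?"),
)
```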

## Models by use case

| Use Case | Scenario | Recommended Models |
|---|---|---|
| Semantic Search / RAG | English-only | stella_en_400M_v5, NV-Embed-v2, all-MiniLM-L6-v2 |
| | Multilingual | BAAI/bge-m3 |
| | Hybrid (dense + sparse) | BAAI/bge-m3 + naver/splade-v3 |
| Image Search | Text ↔ Image | SigLIP, CLIP |
| | Visual docs | ColPali |
| Reranking | Multilingual | BAAI/bge-reranker-v2-m3 |
| | English | mixedbread-ai/mxbai-rerank-large-v2 |
| Entity Extraction | NER | GLiNER |
| | Relations | GLiREL |
| | Classification | GLiClass |

## Performance comparison

| Model | Params | Dims | VRAM | Relative Speed | Quality |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 22M | 384 | ~200MB | Fastest | Good |
| stella_en_400M_v5 | 400M | 1024 | ~1.5GB | Fast | Very good |
| bge-m3 | 568M | 1024 | ~2GB | Fast | Very good |
| NV-Embed-v2 | 7B | 4096 | ~14GB | Slow | Best |

**Rule of thumb:** For English, start with stella_en_400M_v5. For multilingual or hybrid search, use BAAI/bge-m3. Only move to 7B+ models if benchmarks show a meaningful gap on your data.
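Measuring that gap does not require a full benchmark harness: recall on a small labeled set of query/document pairs is often enough to decide. A rough sketch, reusing the `client`/`Item` setup from the sketch above and assuming the encode response exposes its dense vector as `.dense` (an assumption; adjust to your client's response shape):

```python
import numpy as np

# Tiny illustrative eval set: relevant_idx[i] is the index of the doc
# that should rank highly for queries[i].
queries = ["how do transformers work", "reset a forgotten password"]
docs = ["An overview of transformer attention.", "Steps to reset your password."]
relevant_idx = [0, 1]

def recall_at_k(model_id: str, k: int = 5) -> float:
    # Assumption: result.dense holds the dense embedding as a list of floats.
    q = np.array([client.encode(model_id, Item(text=t)).dense for t in queries])
    d = np.array([client.encode(model_id, Item(text=t)).dense for t in docs])
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    topk = np.argsort(-(q @ d.T), axis=1)[:, :k]  # top-k doc indices per query
    return float(np.mean([r in row for r, row in zip(relevant_idx, topk)]))

for model_id in ["NovaSearch/stella_en_400M_v5", "nvidia/NV-Embed-v2"]:
    print(model_id, recall_at_k(model_id))
```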

## Output types

| Output Type | Storage | Search Speed | Quality | Best For |
|---|---|---|---|---|
| Dense | Small (1024 floats) | Fast | Good | Standard semantic search |
| Sparse | Variable | Fast | Good for keywords | Hybrid search, keyword matching |
| Multi-vector (ColBERT) | Large (N × 128 floats) | Slower | Best | When accuracy is critical |

**Recommendation:** Use dense for most cases. Add sparse for hybrid search if you need keyword matching. Use multi-vector only when you need the best possible retrieval quality and can afford the storage.
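For intuition on where the multi-vector cost comes from: ColBERT-style models score a query/document pair by comparing every query token vector with every document token vector and summing the per-query-token maxima (MaxSim). A self-contained NumPy sketch with illustrative shapes:

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT MaxSim: match each query token to its best document token,
    then sum those maxima. Inputs are L2-normalized token embeddings."""
    sims = query_vecs @ doc_vecs.T        # (query_tokens, doc_tokens) cosines
    return float(sims.max(axis=1).sum())  # best doc token per query token

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 128))    # 8 query tokens x 128 dims
d = rng.normal(size=(300, 128))  # a 300-token doc stores 300 x 128 floats
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))
```

The storage column above follows directly: a document keeps one 128-dim vector per token instead of a single pooled vector.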


## Language support

| Language Need | Models |
|---|---|
| English only | Stella, NV-Embed-v2, all-MiniLM, GTE-Qwen2 |
| Multilingual (100+ languages) | BGE-M3, multilingual-e5-large, Qwen3-Embedding-0.6B |
| Chinese-focused | GTE-Qwen2, BGE-M3 |

## GPU memory requirements

| GPU | VRAM | Models That Fit |
|---|---|---|
| T4 | 16GB | Most models up to ~1B params |
| L4 | 24GB | All standard models, 2-3 loaded simultaneously |
| A100 40GB | 40GB | Large models, 5+ loaded simultaneously |
| A100 80GB | 80GB | 7B+ parameter models (NV-Embed-v2, e5-mistral-7b) |

With LRU eviction, you can serve all 85+ models from a single GPU — only the most recently used models stay in memory.


## Should you use a reranker?

Almost always. Two-stage retrieval (retrieve with embeddings, then rerank with a cross-encoder) consistently improves quality:

  1. Retrieve 20-50 candidates with dense embeddings (fast)
  2. Rerank to top 5-10 with a cross-encoder (more accurate)

The reranker sees both query and document together, enabling deeper semantic matching than embedding similarity alone.

```python
# Stage 1: fast candidate retrieval with dense embeddings
# (vector_db and query_embedding come from your existing search setup)
results = vector_db.search(query_embedding, k=20)

# Stage 2: accurate reranking with a cross-encoder
reranked = client.score(
    "mixedbread-ai/mxbai-rerank-large-v2",
    query=Item(text="What is machine learning?"),
    items=[Item(text=r.text) for r in results],
)
```

## When to add sparse embeddings

Add sparse embeddings when your data has:

  • Domain-specific terminology that dense models might miss
  • Exact keyword matching requirements (product codes, identifiers)
  • Mixed content where some queries are keyword-like and others are semantic

```python
# Get both dense and sparse embeddings from a single bge-m3 call
result = client.encode(
    "BAAI/bge-m3",
    Item(text="your text"),
    output_types=["dense", "sparse"],
)
# Use the dense vector for semantic search and the sparse weights for
# keyword matching, then combine the scores for hybrid retrieval.
```
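One common way to combine the two signals is a weighted sum of the dense cosine similarity and the sparse dot product. A sketch, assuming the sparse output can be read as a token-id-to-weight mapping (the 0.7/0.3 split below is just an illustrative starting point to tune on your data):

```python
def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.7):
    """Weighted blend of dense and sparse relevance.

    dense_*: L2-normalized embedding vectors (sequences of floats).
    sparse_*: dicts mapping token id -> weight (assumed sparse format).
    alpha weights the dense (semantic) signal against the sparse (lexical) one.
    """
    dense_sim = sum(a * b for a, b in zip(dense_q, dense_d))
    sparse_sim = sum(w * sparse_d.get(tok, 0.0) for tok, w in sparse_q.items())
    return alpha * dense_sim + (1 - alpha) * sparse_sim
```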