Multi-Vector Reranking
ColBERT-style models can rerank via MaxSim scoring. This uses pre-computed multi-vector embeddings instead of cross-encoder forward passes.
MaxSim Scoring
MaxSim computes the maximum similarity between each query token embedding and all document token embeddings, then sums across query tokens. This gives a fine-grained relevance score without requiring a cross-encoder forward pass per document.
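To make the definition concrete, here is a minimal pure-Python sketch of the computation. It assumes dot-product similarity and uses hand-picked two-dimensional vectors rather than real model output. The name `maxsim_score` is hypothetical and is not the SDK's `maxsim` helper, whose internals may differ (for example in batching or normalization).

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Illustrative MaxSim: one score for one query/document pair."""
    # For each query token, keep only its best-matching document token,
    # then sum those maxima over all query tokens.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy "token embeddings" (assumed values, not produced by a model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a strong match for both query tokens
doc_b = [[1.0, 0.0], [0.5, 0.5]]              # strong match for the first token only

score_a = maxsim_score(query, doc_a)  # 1.0 + 1.0
score_b = maxsim_score(query, doc_b)  # 1.0 + 0.5
```

Because each query token contributes only its single best match, a document is rewarded for covering every query token somewhere in its text, which is why `doc_a` outscores `doc_b` here.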
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sie_sdk.scoring import maxsim

client = SIEClient("http://localhost:8080")

# Placeholder candidates for illustration; replace with your own documents.
documents = [
    Item(text="ColBERT is a late-interaction retrieval model."),
    Item(text="Cross-encoders score each query-document pair jointly."),
]

# Encode query and documents with multivector output
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)

doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    documents,
    output_types=["multivector"],
)

# Score with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank by score, highest first; each entry is (document index, score)
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
```

```typescript
import { SIEClient, maxsim } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Placeholder candidates for illustration; replace with your own documents.
const documents = [
  { text: "ColBERT is a late-interaction retrieval model." },
  { text: "Cross-encoders score each query-document pair jointly." },
];

// Encode query and documents with multivector output
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  documents,
  { outputTypes: ["multivector"] }
);

// Score with MaxSim using SDK helper
const queryMv = queryResult.multivector!;
const scores = docResults.map((r) => maxsim(queryMv, r.multivector!));

// Rank by score, highest first
const ranked = scores
  .map((score, idx) => ({ idx, score }))
  .sort((a, b) => b.score - a.score);

await client.close();
```

When to Use MaxSim vs Cross-Encoders
| Factor | MaxSim (ColBERT) | Cross-Encoder |
|---|---|---|
| Speed | Fast — reuses cached embeddings | Slower — forward pass per pair |
| Pre-computation | Document embeddings can be computed once and stored | None; every query-document pair is scored at query time |
| Quality | Strong token-level matching | Deeper cross-attention |
| Best for | Large candidate sets, real-time | Small candidate sets, max quality |
Use MaxSim when you already have multi-vector embeddings stored (e.g., from indexing with ColBERT). Use cross-encoders when you need the highest possible quality on a small candidate set.
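The speed difference in the table comes from where the work happens: document multi-vectors are produced once at indexing time, and each query only pays for cheap similarity arithmetic. The sketch below illustrates that split under stated assumptions. `doc_store` and `rerank` are hypothetical names, the vectors are hand-written stand-ins for stored ColBERT embeddings, and similarity is assumed to be a dot product.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
MultiVector = Sequence[Vector]


def maxsim_score(query_vecs: MultiVector, doc_vecs: MultiVector) -> float:
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(x * y for x, y in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


# Index time: document multi-vectors are computed once and stored.
# (Assumed values standing in for real stored embeddings.)
doc_store: Dict[str, MultiVector] = {
    "doc-a": [[1.0, 0.0], [0.6, 0.8]],
    "doc-b": [[0.0, 1.0]],
    "doc-c": [[0.8, 0.6]],
}


def rerank(query_vecs: MultiVector, candidate_ids: List[str], top_k: int) -> List[Tuple[str, float]]:
    """Score stored candidates against one query; no document is re-encoded."""
    scored = [(doc_id, maxsim_score(query_vecs, doc_store[doc_id])) for doc_id in candidate_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Query time: two different queries reuse the same stored document vectors.
top_1 = rerank([[1.0, 0.0]], list(doc_store), top_k=2)
top_2 = rerank([[0.0, 1.0]], list(doc_store), top_k=2)
```

A cross-encoder has no equivalent of `doc_store`: it needs the query text before it can process a document, so the per-candidate cost is a full model forward pass rather than a handful of dot products.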
See Multi-vector embeddings for details on encoding with ColBERT models.
What’s Next
- Reranker models: model selection guide
- Multi-vector embeddings: encoding with ColBERT models
- Full model catalog: all supported models