
Multi-Vector Reranking

ColBERT-style models can rerank via MaxSim scoring over pre-computed multi-vector embeddings, rather than running a cross-encoder forward pass for every query-document pair.

MaxSim computes the maximum similarity between each query token embedding and all document token embeddings, then sums across query tokens. This gives a fine-grained relevance score without requiring a cross-encoder forward pass per document.
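To make the scoring rule concrete, here is a minimal NumPy sketch of MaxSim over raw token embeddings. This is illustrative only, not the SDK's `maxsim` implementation; the embeddings are toy 2-dimensional vectors.

```python
import numpy as np

def maxsim_score(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: for each query token, take its best-matching doc token, then sum.

    query_mv: (num_query_tokens, dim) token embeddings for the query
    doc_mv:   (num_doc_tokens, dim) token embeddings for one document
    """
    sim = query_mv @ doc_mv.T        # (num_query_tokens, num_doc_tokens) similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed

# Toy example: two query tokens, two document tokens
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.5, 0.5]])
score = maxsim_score(q, d)  # per-query-token maxima are 1.0 and 0.5, so score = 1.5
```

Because the document embeddings enter only through a matrix product, they can be computed once at indexing time and reused for every incoming query.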

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sie_sdk.scoring import maxsim

client = SIEClient("http://localhost:8080")

# Candidate documents to rerank
documents = [
    Item(text="ColBERT is a late-interaction retrieval model."),
    Item(text="Cross-encoders jointly attend over the query and document."),
]

# Encode query and documents with multi-vector output
query_result = client.encode(
    "jinaai/jina-colbert-v2",
    Item(text="What is ColBERT?"),
    output_types=["multivector"],
    is_query=True,
)
doc_results = client.encode(
    "jinaai/jina-colbert-v2",
    documents,
    output_types=["multivector"],
)

# Score each document against the query with MaxSim
query_mv = query_result["multivector"]
doc_mvs = [r["multivector"] for r in doc_results]
scores = maxsim(query_mv, doc_mvs)

# Rank documents by descending score
ranked = sorted(enumerate(scores), key=lambda x: -x[1])
```
| Factor | MaxSim (ColBERT) | Cross-Encoder |
| --- | --- | --- |
| Speed | Fast — reuses cached embeddings | Slower — forward pass per pair |
| Pre-computation | Embeddings can be stored | Must recompute for each query |
| Quality | Strong token-level matching | Deeper cross-attention |
| Best for | Large candidate sets, real-time | Small candidate sets, max quality |

Use MaxSim when you already have multi-vector embeddings stored (e.g., from indexing with ColBERT). Use cross-encoders when you need the highest possible quality on a small candidate set.
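The two approaches also combine well: use cheap MaxSim scores to shortlist candidates, then spend cross-encoder compute only on the shortlist. Here is a small, self-contained sketch of that pattern; `maxsim_scores` and `cross_encoder_score` are stand-ins for your actual scorers, not SDK functions.

```python
def rerank_two_stage(doc_ids, maxsim_scores, cross_encoder_score, k=3):
    """Stage 1: keep the top-k documents by MaxSim score.
    Stage 2: reorder only that shortlist with a (more expensive) cross-encoder."""
    shortlist = sorted(zip(doc_ids, maxsim_scores), key=lambda x: -x[1])[:k]
    return sorted(shortlist, key=lambda x: -cross_encoder_score(x[0]))

# Toy data: four candidates with precomputed MaxSim scores
ids = ["a", "b", "c", "d"]
ms = [0.9, 0.2, 0.7, 0.5]

# Stand-in cross-encoder scorer (hypothetical fixed scores for the demo)
ce = {"a": 0.3, "c": 0.8, "d": 0.6}
ranked = rerank_two_stage(ids, ms, cross_encoder_score=lambda doc_id: ce[doc_id])
# Stage 1 shortlist is a, c, d; the cross-encoder reorders it to c, d, a
```

This keeps cross-encoder cost bounded by `k` regardless of how many candidates MaxSim screens.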

See Multi-vector embeddings for details on encoding with ColBERT models.