
Sparse & Hybrid Search

Sparse vectors capture lexical (keyword) signals. Unlike dense embeddings that compress meaning into fixed-size vectors (384 to 4096+ dimensions depending on the model), sparse vectors assign weights directly to vocabulary tokens. This enables exact term matching alongside semantic search.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

result = client.encode(
    "BAAI/bge-m3",
    Item(text="machine learning algorithms"),
    output_types=["sparse"],
)

# Sparse vector: token IDs -> weights
sparse = result["sparse"]
print(f"Non-zero tokens: {len(sparse['indices'])}")

Use sparse when:

  • Exact term matching matters (product names, proper nouns, acronyms)
  • You want hybrid search (combining dense + sparse)
  • Your domain has specialized vocabulary

Stick to dense when:

  • Pure semantic search is sufficient
  • Storage is constrained (storing sparse vectors alongside dense ones adds overhead)
  • You’re not using a vector database that supports sparse

Sparse vectors contain:

  • indices: Token IDs from the model’s vocabulary
  • values: Weights for each token (higher = more important)
result = client.encode("BAAI/bge-m3", Item(text="hello world"), output_types=["sparse"])
sparse = result["sparse"]
# {"indices": array([101, 2023, ...]), "values": array([0.45, 0.32, ...])}
# Reconstruct as dict
sparse_dict = dict(zip(sparse["indices"], sparse["values"]))

BGE-M3 produces dense, sparse, and multi-vector outputs simultaneously:

result = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    output_types=["dense", "sparse"],
)

# Dense: 1024-dimensional semantic embedding
print(f"Dense: {len(result['dense'])} dims")

# Sparse: lexical signal
print(f"Sparse: {len(result['sparse']['indices'])} non-zero terms")

This is more efficient than calling separate dense and sparse models.

Combine dense and sparse scores for retrieval:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="Python programming tutorial")

# Get both embeddings for the query
result = client.encode(
    "BAAI/bge-m3",
    query,
    output_types=["dense", "sparse"],
    is_query=True,
)

# Store both in your vector database.
# Most databases support hybrid search with a weighted combination:
#   final_score = alpha * dense_score + (1 - alpha) * sparse_score
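
How that weighted combination plays out is easy to sketch without any database: score the sparse side as a dot product over shared token IDs and blend it with a dense cosine similarity. The helper names below (sparse_dot, hybrid_score) are illustrative, not part of the SDK.

import numpy as np

def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    # Dot product over the token IDs shared by query and document.
    return sum(weight * d[token_id] for token_id, weight in q.items() if token_id in d)

def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.5) -> float:
    # final_score = alpha * dense_score + (1 - alpha) * sparse_score
    dense_q, dense_d = np.asarray(dense_q), np.asarray(dense_d)
    dense_score = float(dense_q @ dense_d / (np.linalg.norm(dense_q) * np.linalg.norm(dense_d)))
    return alpha * dense_score + (1 - alpha) * sparse_dot(sparse_q, sparse_d)

In practice dense and sparse scores live on different scales, so tune alpha on your own data, or normalize the scores (or use rank-based fusion) before blending.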

SPLADE models are purpose-built for sparse retrieval:

# SPLADE-v3
result = client.encode(
    "naver/splade-v3",
    Item(text="neural information retrieval"),
    output_types=["sparse"],
)

# OpenSearch neural sparse
result = client.encode(
    "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
    Item(text="search query"),
    output_types=["sparse"],
)

SPLADE models use a masked language model (MLM) head to predict term importance across the whole vocabulary, so the output can include expansion terms that never appear in the input text.
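
To see which terms a SPLADE model activates, you can map the sparse indices back to tokens with the model's tokenizer. A minimal sketch, assuming the Hugging Face transformers package is installed and that the sparse indices correspond to that tokenizer's vocabulary IDs:

from sie_sdk import SIEClient
from sie_sdk.types import Item
from transformers import AutoTokenizer

client = SIEClient("http://localhost:8080")
result = client.encode(
    "naver/splade-v3",
    Item(text="neural information retrieval"),
    output_types=["sparse"],
)
sparse = result["sparse"]

# Assumption: the sparse indices are vocabulary IDs of the same model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("naver/splade-v3")

# Sort tokens by weight and show the strongest lexical signals,
# including expansion terms that never occur in the input text.
ranked = sorted(zip(sparse["indices"], sparse["values"]), key=lambda pair: -pair[1])
for token_id, weight in ranked[:10]:
    print(f"{tokenizer.convert_ids_to_tokens(int(token_id))}: {weight:.3f}")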

| Model | Vocabulary size | Notes |
| --- | --- | --- |
| BAAI/bge-m3 | 250,002 | Also supports dense + multi-vector |
| naver/splade-v3 | 30,522 | Sparse-focused, BERT vocabulary |
| naver/splade-cocondenser-selfdistil | 30,522 | Balanced |
| opensearch-project/opensearch-neural-sparse-* | 30,522 | OpenSearch integration |

Sparse vectors require database support. Compatible options:

| Database | Sparse support |
| --- | --- |
| Elasticsearch | Yes (native) |
| OpenSearch | Yes (neural sparse) |
| Qdrant | Yes (sparse vectors) |
| Weaviate | Yes (hybrid) |
| Milvus | Yes (sparse index) |
| Pinecone | Yes (hybrid) |
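
As one concrete example, here is a minimal sketch of storing both outputs in Qdrant with the qdrant-client package; the collection name, vector names, and point ID are illustrative, and the exact client API may vary between versions.

from qdrant_client import QdrantClient, models
from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
qdrant = QdrantClient(url="http://localhost:6333")

# One named dense vector space plus one named sparse vector space per point.
qdrant.create_collection(
    collection_name="docs",
    vectors_config={"dense": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

result = sie.encode(
    "BAAI/bge-m3",
    Item(text="machine learning algorithms"),
    output_types=["dense", "sparse"],
)

qdrant.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(
            id=1,
            vector={
                "dense": [float(x) for x in result["dense"]],
                "sparse": models.SparseVector(
                    indices=[int(i) for i in result["sparse"]["indices"]],
                    values=[float(v) for v in result["sparse"]["values"]],
                ),
            },
            payload={"text": "machine learning algorithms"},
        )
    ],
)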

The server defaults to msgpack. For JSON, set the Accept header:

curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "sparse query"}], "params": {"output_types": ["sparse"]}}'