
Sparse & Hybrid Search

Sparse vectors capture lexical (keyword) signals. Unlike dense embeddings that compress meaning into fixed-size vectors (384 to 4096+ dimensions depending on the model), sparse vectors assign weights directly to vocabulary tokens. This enables exact term matching alongside semantic search.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

result = client.encode(
    "BAAI/bge-m3",
    Item(text="machine learning algorithms"),
    output_types=["sparse"],
)

# Sparse vector: token IDs -> weights
sparse = result["sparse"]
print(f"Non-zero tokens: {len(sparse['indices'])}")

Use sparse when:

  • Exact term matching matters (product names, proper nouns, acronyms)
  • You want hybrid search (combining dense + sparse)
  • Your domain has specialized vocabulary

Stick to dense when:

  • Pure semantic search is sufficient
  • Storage is constrained (storing sparse vectors alongside dense ones adds overhead)
  • You’re not using a vector database that supports sparse

Sparse vectors contain:

  • indices: Token IDs from the model’s vocabulary
  • values: Weights for each token (higher = more important)
result = client.encode("BAAI/bge-m3", Item(text="hello world"), output_types=["sparse"])
sparse = result["sparse"]
# {"indices": array([101, 2023, ...]), "values": array([0.45, 0.32, ...])}
# Reconstruct as dict
sparse_dict = dict(zip(sparse["indices"], sparse["values"]))

BGE-M3 produces dense, sparse, and multi-vector outputs simultaneously:

result = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    output_types=["dense", "sparse"],
)

# Dense: 1024-dimensional semantic embedding
print(f"Dense: {len(result['dense'])} dims")

# Sparse: lexical signal
print(f"Sparse: {len(result['sparse']['indices'])} non-zero terms")

This is more efficient than calling separate dense and sparse models.

Combine dense and sparse scores for retrieval:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="Python programming tutorial")

# Get both embeddings for the query
result = client.encode(
    "BAAI/bge-m3",
    query,
    output_types=["dense", "sparse"],
    is_query=True,
)

# Store both in your vector database.
# Most databases support hybrid search with a weighted combination:
#   final_score = alpha * dense_score + (1 - alpha) * sparse_score
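
How that weighted combination plays out is easy to sketch without any database: score the sparse side as a dot product over shared token IDs and blend it with a dense cosine similarity. The helper names below (sparse_dot, hybrid_score) are illustrative, not part of the SDK.

import numpy as np

def sparse_dot(q: dict[int, float], d: dict[int, float]) -> float:
    # Dot product over the token IDs shared by query and document.
    return sum(weight * d[token_id] for token_id, weight in q.items() if token_id in d)

def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.5) -> float:
    # final_score = alpha * dense_score + (1 - alpha) * sparse_score
    dense_q, dense_d = np.asarray(dense_q), np.asarray(dense_d)
    dense_score = float(dense_q @ dense_d / (np.linalg.norm(dense_q) * np.linalg.norm(dense_d)))
    return alpha * dense_score + (1 - alpha) * sparse_dot(sparse_q, sparse_d)

In practice dense and sparse scores live on different scales, so tune alpha on your own data, or normalize the scores (or use rank-based fusion) before blending.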

SPLADE models are purpose-built for sparse retrieval:

# SPLADE-v3
result = client.encode(
    "naver/splade-v3",
    Item(text="neural information retrieval"),
    output_types=["sparse"],
)

# OpenSearch neural sparse
result = client.encode(
    "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
    Item(text="search query"),
    output_types=["sparse"],
)

SPLADE models use a masked language model (MLM) head to predict term importance across the whole vocabulary, so the output can include expansion terms that never appear in the input text.
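
To see which terms a SPLADE model activates, you can map the sparse indices back to tokens with the model's tokenizer. A minimal sketch, assuming the Hugging Face transformers package is installed and that the sparse indices correspond to that tokenizer's vocabulary IDs:

from sie_sdk import SIEClient
from sie_sdk.types import Item
from transformers import AutoTokenizer

client = SIEClient("http://localhost:8080")
result = client.encode(
    "naver/splade-v3",
    Item(text="neural information retrieval"),
    output_types=["sparse"],
)
sparse = result["sparse"]

# Assumption: the sparse indices are vocabulary IDs of the same model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("naver/splade-v3")

# Sort tokens by weight and show the strongest lexical signals,
# including expansion terms that never occur in the input text.
ranked = sorted(zip(sparse["indices"], sparse["values"]), key=lambda pair: -pair[1])
for token_id, weight in ranked[:10]:
    print(f"{tokenizer.convert_ids_to_tokens(int(token_id))}: {weight:.3f}")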

| Model | Vocabulary size | Notes |
| --- | --- | --- |
| BAAI/bge-m3 | 250,002 | Also supports dense + multi-vector |
| naver/splade-v3 | 30,522 | Sparse-focused, BERT vocabulary |
| naver/splade-cocondenser-selfdistil | 30,522 | Balanced |
| opensearch-project/opensearch-neural-sparse-* | 30,522 | OpenSearch integration |

Sparse vectors require database support. Compatible options:

| Database | Sparse support |
| --- | --- |
| Elasticsearch | Yes (native) |
| OpenSearch | Yes (neural sparse) |
| Qdrant | Yes (sparse vectors) |
| Weaviate | Yes (hybrid) |
| Milvus | Yes (sparse index) |
| Pinecone | Yes (hybrid) |
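
As one concrete example, here is a minimal sketch of storing both outputs in Qdrant with the qdrant-client package; the collection name, vector names, and point ID are illustrative, and the exact client API may vary between versions.

from qdrant_client import QdrantClient, models
from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
qdrant = QdrantClient(url="http://localhost:6333")

# One named dense vector space plus one named sparse vector space per point.
qdrant.create_collection(
    collection_name="docs",
    vectors_config={"dense": models.VectorParams(size=1024, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

result = sie.encode(
    "BAAI/bge-m3",
    Item(text="machine learning algorithms"),
    output_types=["dense", "sparse"],
)

qdrant.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(
            id=1,
            vector={
                "dense": [float(x) for x in result["dense"]],
                "sparse": models.SparseVector(
                    indices=[int(i) for i in result["sparse"]["indices"]],
                    values=[float(v) for v in result["sparse"]["values"]],
                ),
            },
            payload={"text": "machine learning algorithms"},
        )
    ],
)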

The server defaults to msgpack. For JSON, set the Accept header:

curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "sparse query"}], "params": {"output_types": ["sparse"]}}'