## Overview
Dense embeddings are fixed-dimension float vectors that capture semantic meaning. They power similarity search, RAG pipelines, and recommendation systems.
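In similarity search, texts whose vectors point in similar directions are treated as semantically close, usually measured with cosine similarity. A minimal numpy sketch with made-up values:

```python
import numpy as np

# Illustrative 3-dim vectors; real embeddings come from a model (e.g., 1024 dims)
a = np.array([0.12, 0.54, -0.33])
b = np.array([0.10, 0.48, -0.21])

# Cosine similarity: 1.0 = identical direction, near 0.0 = unrelated
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"{similarity:.3f}")
```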
## Quick Example

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

result = client.encode("BAAI/bge-m3", Item(text="Your text here"))
print(f"Dimensions: {len(result['dense'])}")  # 1024
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.encode("BAAI/bge-m3", { text: "Your text here" });
console.log(`Dimensions: ${result.dense?.length}`); // 1024

await client.close();
```
## When to Use Dense Embeddings

Use dense embeddings when:
- You need semantic similarity (not exact keyword matching)
- Your vector database supports dense vectors (most do)
- Storage is not extremely constrained
Consider alternatives when:
- You need hybrid search → Sparse embeddings
- You need maximum retrieval quality → Multi-vector (ColBERT)
- You’re working with images → Multimodal embeddings
## Basic Usage

### Single Item

Pass a single `Item` to get a single result:
```python
result = client.encode("BAAI/bge-m3", Item(text="Hello world"))
print(result["dense"][:5])  # First 5 dimensions
# [0.0234, -0.0891, 0.1234, ...]
```

```typescript
const result = await client.encode("BAAI/bge-m3", { text: "Hello world" });
console.log(result.dense?.slice(0, 5)); // First 5 dimensions
// Float32Array [0.0234, -0.0891, 0.1234, ...]
```
### Batch Encoding

Pass a list of items for efficient batch processing:
```python
items = [
    Item(text="First document"),
    Item(text="Second document"),
    Item(text="Third document"),
]

results = client.encode("BAAI/bge-m3", items)

for i, result in enumerate(results):
    print(f"Doc {i}: {len(result['dense'])} dimensions")
```

```typescript
const items = [
  { text: "First document" },
  { text: "Second document" },
  { text: "Third document" },
];

const results = await client.encode("BAAI/bge-m3", items);

results.forEach((result, i) => {
  console.log(`Doc ${i}: ${result.dense?.length} dimensions`);
});
```

The server batches requests automatically for GPU efficiency.
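For very large corpora you may still want to cap request sizes client-side and let the server handle GPU batching within each request. A minimal sketch reusing the `client` and `Item` from above; the chunk size is an arbitrary choice, not a tuned value:

```python
def encode_corpus(client, texts, chunk_size=256):
    """Encode a large list of strings in fixed-size request chunks."""
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = [Item(text=t) for t in texts[start:start + chunk_size]]
        results.extend(client.encode("BAAI/bge-m3", chunk))
    return results
```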
### With Item IDs

Track which result corresponds to which input:
```python
items = [
    Item(id="doc-1", text="First document"),
    Item(id="doc-2", text="Second document"),
]

results = client.encode("BAAI/bge-m3", items)

for result in results:
    print(f"{result['id']}: {len(result['dense'])} dims")
```

```typescript
const items = [
  { id: "doc-1", text: "First document" },
  { id: "doc-2", text: "Second document" },
];

const results = await client.encode("BAAI/bge-m3", items);

for (const result of results) {
  console.log(`${result.id}: ${result.dense?.length} dims`);
}
```
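Because ids pass through to the results, a common follow-up is an id-to-embedding lookup. A small sketch, assuming every item was given an id:

```python
# Map each item's id to its dense embedding
embeddings_by_id = {result["id"]: result["dense"] for result in results}
print(embeddings_by_id["doc-1"].shape)  # (1024,)
```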
## Query vs Document Encoding

Many models perform better when you distinguish queries from documents. Queries are short, question-like inputs; documents are the content you search over.
### Asymmetric Models

For asymmetric models, set `is_query=True` (`isQuery: true` in TypeScript) when encoding the query side:
```python
# Encode query (short, question-like)
query = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    is_query=True,
)

# Encode documents (longer, content)
documents = client.encode(
    "BAAI/bge-m3",
    [Item(text="Machine learning is..."), Item(text="Deep learning uses...")],
)
```

```typescript
// Encode query (short, question-like)
const query = await client.encode(
  "BAAI/bge-m3",
  { text: "What is machine learning?" },
  { isQuery: true }
);

// Encode documents (longer, content)
const documents = await client.encode(
  "BAAI/bge-m3",
  [{ text: "Machine learning is..." }, { text: "Deep learning uses..." }]
);
```
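Once both sides are encoded, retrieval reduces to ranking documents by similarity to the query vector. A minimal numpy sketch reusing the `query` and `documents` results from above:

```python
import numpy as np

q = query["dense"]
docs = np.stack([d["dense"] for d in documents])

# Cosine similarity between the query and each document
scores = (docs @ q) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
best_first = np.argsort(scores)[::-1]  # highest-scoring document first
print(best_first, scores[best_first])
```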
### Instruction-Tuned Models

Some models accept explicit instructions:
```python
result = client.encode(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    Item(text="What is Python?"),
    instruction="Represent this query for retrieving programming tutorials:",
)
```

```typescript
const result = await client.encode(
  "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
  { text: "What is Python?" },
  { instruction: "Represent this query for retrieving programming tutorials:" }
);
```
## Output Types

By default, `encode` returns dense embeddings. Request additional output types with `output_types` (`outputTypes` in TypeScript):
```python
# Dense only (default)
result = client.encode("BAAI/bge-m3", Item(text="text"))
print(result["dense"])  # numpy array

# Multiple outputs
result = client.encode(
    "BAAI/bge-m3",
    Item(text="text"),
    output_types=["dense", "sparse", "multivector"],
)
print(result["dense"])        # 1024-dim float array
print(result["sparse"])       # {"indices": [...], "values": [...]}
print(result["multivector"])  # [num_tokens, 1024] array
```

```typescript
// Dense only (default)
const result = await client.encode("BAAI/bge-m3", { text: "text" });
console.log(result.dense); // Float32Array

// Multiple outputs
const multiResult = await client.encode(
  "BAAI/bge-m3",
  { text: "text" },
  { outputTypes: ["dense", "sparse", "multivector"] }
);
console.log(multiResult.dense);       // Float32Array (1024 dims)
console.log(multiResult.sparse);      // { indices: Int32Array, values: Float32Array }
console.log(multiResult.multivector); // Float32Array[] (per-token embeddings)
```

Not all models support all output types. BGE-M3 supports all three. Most models support only dense.
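Because `sparse` and `multivector` are `None` when a model does not produce them (see the response format below), guard before use. A sketch assuming dict-style access to the sparse result, as shown in the comments above:

```python
result = client.encode(
    "BAAI/bge-m3", Item(text="text"), output_types=["dense", "sparse"]
)

# sparse is None for models without sparse support
if result["sparse"] is not None:
    print(result["sparse"]["indices"][:5], result["sparse"]["values"][:5])
```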
## Response Format

The `EncodeResult` is a TypedDict containing:
| Field | Type | Description |
|---|---|---|
| `id` | `str \| None` | Item ID if provided |
| `dense` | `NDArray[float32]` | Dense embedding vector |
| `sparse` | `SparseResult \| None` | Sparse indices and values |
| `multivector` | `NDArray[float32] \| None` | Per-token embeddings |
| `timing` | `TimingInfo` | Request timing breakdown |
```python
result = client.encode("BAAI/bge-m3", Item(text="text"))

# Access fields (TypedDict syntax)
embedding = result["dense"]        # numpy array
dimensions = len(result["dense"])  # e.g., 1024
```

```typescript
const result = await client.encode("BAAI/bge-m3", { text: "text" });

// Access fields (object property syntax)
const embedding = result.dense;          // Float32Array
const dimensions = result.dense?.length; // e.g., 1024
```
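Every result also carries a `timing` field with the request timing breakdown; its exact fields come from the `TimingInfo` type:

```python
# Inspect the per-request timing breakdown (fields defined by TimingInfo)
print(result["timing"])
```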
## Starting Models

These models work well for general-purpose embedding. Run `mise run eval --print` for benchmark data, or see the full catalog.
| Model | Dims | Max Length | Notes |
|---|---|---|---|
| `BAAI/bge-m3` | 1024 | 8192 | Multilingual, also supports sparse and multivector |
| `intfloat/e5-base-v2` | 768 | 512 | Balanced quality and speed |
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | 256 | Fast, lightweight |
Models perform differently on different tasks. Identify a benchmark task similar to your problem, or create custom eval tasks. See Evals.
## HTTP API

The server defaults to msgpack for efficient numpy array transport. To use JSON instead, set the `Accept` header:
```bash
curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "Your text here"}]}'
```
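The same call works from any HTTP client. A sketch in Python using the third-party `requests` package; the exact JSON response shape is not documented here, so this just prints whatever the server returns:

```python
import requests  # pip install requests

resp = requests.post(
    "http://localhost:8080/v1/encode/BAAI/bge-m3",
    headers={"Accept": "application/json"},  # opt out of the msgpack default
    json={"items": [{"text": "Your text here"}]},
)
resp.raise_for_status()
print(resp.json())
```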
## What’s Next

- Sparse embeddings - for hybrid search
- Multi-vector embeddings - ColBERT and late interaction
- Model Catalog - all supported embedding models