## Overview
Dense embeddings are fixed-dimension float vectors that capture semantic meaning. They power similarity search, RAG pipelines, and recommendation systems.
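In similarity search, texts whose vectors point in similar directions are treated as semantically close, usually measured with cosine similarity. A minimal numpy sketch with made-up values:

```python
import numpy as np

# Illustrative 3-dim vectors; real embeddings come from a model (e.g., 1024 dims)
a = np.array([0.12, 0.54, -0.33])
b = np.array([0.10, 0.48, -0.21])

# Cosine similarity: 1.0 = identical direction, near 0.0 = unrelated
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"{similarity:.3f}")
```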
## Quick Example

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

result = client.encode("BAAI/bge-m3", Item(text="Your text here"))
print(f"Dimensions: {len(result['dense'])}")  # 1024
```

```typescript
import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

const result = await client.encode("BAAI/bge-m3", { text: "Your text here" });
console.log(`Dimensions: ${result.dense?.length}`); // 1024

await client.close();
```
## When to Use Dense Embeddings

Use dense embeddings when:
- You need semantic similarity (not exact keyword matching)
- Your vector database supports dense vectors (most do)
- Storage is not extremely constrained
Consider alternatives when:
- You need hybrid search → Sparse embeddings
- You need maximum retrieval quality → Multi-vector (ColBERT)
- You’re working with images → Multimodal embeddings
## Basic Usage

### Single Item

Pass a single `Item` to get a single result:
```python
result = client.encode("BAAI/bge-m3", Item(text="Hello world"))
print(result["dense"][:5])  # First 5 dimensions
# [0.0234, -0.0891, 0.1234, ...]
```

```typescript
const result = await client.encode("BAAI/bge-m3", { text: "Hello world" });
console.log(result.dense?.slice(0, 5)); // First 5 dimensions
// Float32Array [0.0234, -0.0891, 0.1234, ...]
```
### Batch Encoding

Pass a list of items for efficient batch processing:
```python
items = [
    Item(text="First document"),
    Item(text="Second document"),
    Item(text="Third document"),
]

results = client.encode("BAAI/bge-m3", items)

for i, result in enumerate(results):
    print(f"Doc {i}: {len(result['dense'])} dimensions")
```

```typescript
const items = [
  { text: "First document" },
  { text: "Second document" },
  { text: "Third document" },
];

const results = await client.encode("BAAI/bge-m3", items);

results.forEach((result, i) => {
  console.log(`Doc ${i}: ${result.dense?.length} dimensions`);
});
```

The server batches requests automatically for GPU efficiency.
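For very large corpora you may still want to cap request sizes client-side and let the server handle GPU batching within each request. A minimal sketch reusing the `client` and `Item` from above; the chunk size is an arbitrary choice, not a tuned value:

```python
def encode_corpus(client, texts, chunk_size=256):
    """Encode a large list of strings in fixed-size request chunks."""
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = [Item(text=t) for t in texts[start:start + chunk_size]]
        results.extend(client.encode("BAAI/bge-m3", chunk))
    return results
```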
### With Item IDs

Track which result corresponds to which input:
```python
items = [
    Item(id="doc-1", text="First document"),
    Item(id="doc-2", text="Second document"),
]

results = client.encode("BAAI/bge-m3", items)

for result in results:
    print(f"{result['id']}: {len(result['dense'])} dims")
```

```typescript
const items = [
  { id: "doc-1", text: "First document" },
  { id: "doc-2", text: "Second document" },
];

const results = await client.encode("BAAI/bge-m3", items);

for (const result of results) {
  console.log(`${result.id}: ${result.dense?.length} dims`);
}
```
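Because ids pass through to the results, a common follow-up is an id-to-embedding lookup. A small sketch, assuming every item was given an id:

```python
# Map each item's id to its dense embedding
embeddings_by_id = {result["id"]: result["dense"] for result in results}
print(embeddings_by_id["doc-1"].shape)  # (1024,)
```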
## Query vs Document Encoding

Many models perform better when you distinguish queries from documents. Queries are short, question-like inputs; documents are the content you search over.
### Asymmetric Models

For asymmetric models, set `is_query=True` (`isQuery: true` in TypeScript) when encoding the query side:
```python
# Encode query (short, question-like)
query = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    is_query=True,
)

# Encode documents (longer, content)
documents = client.encode(
    "BAAI/bge-m3",
    [Item(text="Machine learning is..."), Item(text="Deep learning uses...")],
)
```

```typescript
// Encode query (short, question-like)
const query = await client.encode(
  "BAAI/bge-m3",
  { text: "What is machine learning?" },
  { isQuery: true }
);

// Encode documents (longer, content)
const documents = await client.encode(
  "BAAI/bge-m3",
  [{ text: "Machine learning is..." }, { text: "Deep learning uses..." }]
);
```
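Once both sides are encoded, retrieval reduces to ranking documents by similarity to the query vector. A minimal numpy sketch reusing the `query` and `documents` results from above:

```python
import numpy as np

q = query["dense"]
docs = np.stack([d["dense"] for d in documents])

# Cosine similarity between the query and each document
scores = (docs @ q) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
best_first = np.argsort(scores)[::-1]  # highest-scoring document first
print(best_first, scores[best_first])
```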
### Instruction-Tuned Models

Some models accept explicit instructions:
```python
result = client.encode(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    Item(text="What is Python?"),
    instruction="Represent this query for retrieving programming tutorials:",
)
```

```typescript
const result = await client.encode(
  "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
  { text: "What is Python?" },
  { instruction: "Represent this query for retrieving programming tutorials:" }
);
```
## Output Types

By default, `encode` returns dense embeddings. Request additional output types with `output_types` (`outputTypes` in TypeScript):
```python
# Dense only (default)
result = client.encode("BAAI/bge-m3", Item(text="text"))
print(result["dense"])  # numpy array

# Multiple outputs
result = client.encode(
    "BAAI/bge-m3",
    Item(text="text"),
    output_types=["dense", "sparse", "multivector"],
)
print(result["dense"])        # 1024-dim float array
print(result["sparse"])       # {"indices": [...], "values": [...]}
print(result["multivector"])  # [num_tokens, 1024] array
```

```typescript
// Dense only (default)
const result = await client.encode("BAAI/bge-m3", { text: "text" });
console.log(result.dense); // Float32Array

// Multiple outputs
const multiResult = await client.encode(
  "BAAI/bge-m3",
  { text: "text" },
  { outputTypes: ["dense", "sparse", "multivector"] }
);
console.log(multiResult.dense);       // Float32Array (1024 dims)
console.log(multiResult.sparse);      // { indices: Int32Array, values: Float32Array }
console.log(multiResult.multivector); // Float32Array[] (per-token embeddings)
```

Not all models support all output types. BGE-M3 supports all three. Most models support only dense.
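Because `sparse` and `multivector` are `None` when a model does not produce them (see the response format below), guard before use. A sketch assuming dict-style access to the sparse result, as shown in the comments above:

```python
result = client.encode(
    "BAAI/bge-m3", Item(text="text"), output_types=["dense", "sparse"]
)

# sparse is None for models without sparse support
if result["sparse"] is not None:
    print(result["sparse"]["indices"][:5], result["sparse"]["values"][:5])
```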
## Response Format

The `EncodeResult` is a TypedDict containing:
| Field | Type | Description |
|---|---|---|
| `id` | `str \| None` | Item ID if provided |
| `dense` | `NDArray[float32]` | Dense embedding vector |
| `sparse` | `SparseResult \| None` | Sparse indices and values |
| `multivector` | `NDArray[float32] \| None` | Per-token embeddings |
| `timing` | `TimingInfo` | Request timing breakdown |
```python
result = client.encode("BAAI/bge-m3", Item(text="text"))

# Access fields (TypedDict syntax)
embedding = result["dense"]        # numpy array
dimensions = len(result["dense"])  # e.g., 1024
```

```typescript
const result = await client.encode("BAAI/bge-m3", { text: "text" });

// Access fields (object property syntax)
const embedding = result.dense;          // Float32Array
const dimensions = result.dense?.length; // e.g., 1024
```
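Every result also carries a `timing` field with the request timing breakdown; its exact fields come from the `TimingInfo` type:

```python
# Inspect the per-request timing breakdown (fields defined by TimingInfo)
print(result["timing"])
```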
## Starting Models

These models work well for general-purpose embedding. Run `mise run eval --print` for benchmark data, or see the full catalog.
| Model | Dims | Max Length | Notes |
|---|---|---|---|
| `BAAI/bge-m3` | 1024 | 8192 | Multilingual, also supports sparse and multivector |
| `intfloat/e5-base-v2` | 768 | 512 | Balanced quality and speed |
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | 256 | Fast, lightweight |
Models perform differently on different tasks. Identify a benchmark task similar to your problem, or create custom eval tasks. See Evals.
## HTTP API

The server defaults to msgpack for efficient numpy array transport. To use JSON instead, set the `Accept` header:
```bash
curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "Your text here"}]}'
```
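The same call works from any HTTP client. A sketch in Python using the third-party `requests` package; the exact JSON response shape is not documented here, so this just prints whatever the server returns:

```python
import requests  # pip install requests

resp = requests.post(
    "http://localhost:8080/v1/encode/BAAI/bge-m3",
    headers={"Accept": "application/json"},  # opt out of the msgpack default
    json={"items": [{"text": "Your text here"}]},
)
resp.raise_for_status()
print(resp.json())
```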
## What’s Next

- Sparse embeddings - for hybrid search
- Multi-vector embeddings - ColBERT and late interaction
- Model Catalog - all supported embedding models