# Chroma

The `sie-chroma` package (Python) and `@sie/chroma` package (TypeScript) provide embedding functions for ChromaDB. Use `SIEEmbeddingFunction` for dense embeddings in standard collections, and `SIESparseEmbeddingFunction` for hybrid search on Chroma Cloud.
## Installation

Python:

```bash
pip install sie-chroma
```

This installs `sie-sdk` and `chromadb` as dependencies.

TypeScript:

```bash
pnpm add @sie/chroma
```

This installs `@sie/sdk` and `chromadb` as dependencies.
## Start the Server

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

## Embedding Function

`SIEEmbeddingFunction` implements ChromaDB’s `EmbeddingFunction` protocol. Use it when creating or querying collections.
Python:

```python
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)
```

TypeScript:

```typescript
import { SIEEmbeddingFunction } from "@sie/chroma";

const embeddingFunction = new SIEEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});
```
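Because the function follows Chroma’s `EmbeddingFunction` protocol, it can also be called directly on a list of documents to inspect the vectors it produces. A minimal Python sketch (the printed dimensionality depends on the model):

```python
# Per ChromaDB’s EmbeddingFunction protocol: documents in, dense vectors out.
embeddings = embedding_function(["Hello world"])
print(len(embeddings))     # one vector per input document
print(len(embeddings[0]))  # dense vector dimensionality (model-dependent)
```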
## Configuration Options

Python:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `BAAI/bge-m3` | Model to use for embeddings |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
TypeScript:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `model` | `string` | `BAAI/bge-m3` | Model to use for embeddings |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
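For instance, a construction that exercises the routing and tuning parameters above might look like the sketch below. The parameter names come from the table; the `gpu` value and the `options` keys are illustrative assumptions, not documented values:

```python
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    gpu="a100",                   # hypothetical GPU type for routing
    options={"normalize": True},  # hypothetical model-specific option
    timeout_s=60.0,               # tighter than the 180.0 s default
)
```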
## Full Example

Create a ChromaDB collection with SIE embeddings and perform a similarity search:
Python:

```python
import chromadb
from sie_chroma import SIEEmbeddingFunction

# Initialize the embedding function
embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

# Create a Chroma client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="documents",
    embedding_function=embedding_function,
)

# Add documents
collection.add(
    documents=[
        "Machine learning is a subset of artificial intelligence.",
        "Neural networks are inspired by biological neurons.",
        "Deep learning uses multiple layers of neural networks.",
        "Python is popular for machine learning development.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4"],
)

# Query the collection
results = collection.query(
    query_texts=["What is deep learning?"],
    n_results=2,
)

for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"{distance:.4f}: {doc}")
```

TypeScript:

```typescript
import { ChromaClient } from "chromadb";
import { SIEEmbeddingFunction } from "@sie/chroma";

// Initialize the embedding function
const embeddingFunction = new SIEEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Create a Chroma client and collection
const client = new ChromaClient();
const collection = await client.createCollection({
  name: "documents",
  embeddingFunction,
});

// Add documents
await collection.add({
  documents: [
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
    "Python is popular for machine learning development.",
  ],
  ids: ["doc1", "doc2", "doc3", "doc4"],
});

// Query the collection
const results = await collection.query({
  queryTexts: ["What is deep learning?"],
  nResults: 2,
});

for (let i = 0; i < results.documents[0].length; i++) {
  const doc = results.documents[0][i];
  const distance = results.distances?.[0][i];
  console.log(`${distance?.toFixed(4)}: ${doc}`);
}
```

## With Persistent Storage
Python:

```python
import chromadb
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(model="BAAI/bge-m3")

# Use persistent storage
client = chromadb.PersistentClient(path="./chroma_db")

collection = client.get_or_create_collection(
    name="my_collection",
    embedding_function=embedding_function,
)
```

TypeScript:

```typescript
import { ChromaClient } from "chromadb";
import { SIEEmbeddingFunction } from "@sie/chroma";

const embeddingFunction = new SIEEmbeddingFunction({ model: "BAAI/bge-m3" });

// Use persistent storage (requires a running Chroma server)
const client = new ChromaClient({ path: "http://localhost:8000" });

const collection = await client.getOrCreateCollection({
  name: "my_collection",
  embeddingFunction,
});
```

## Sparse Embeddings (Chroma Cloud)
`SIESparseEmbeddingFunction` generates sparse embeddings for Chroma Cloud hybrid search. Use it with `SparseVectorIndexConfig`.

Python:

```python
from sie_chroma import SIESparseEmbeddingFunction

sparse_ef = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)
```

The sparse embedding function returns `dict[int, float]` mappings of token indices to weights. This format is compatible with Chroma Cloud’s hybrid search feature.
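A minimal usage sketch, assuming the sparse function is invoked like Chroma’s standard embedding functions (called directly on a list of documents):

```python
# Assumed call-style: documents in, one dict[int, float] per document out.
sparse_embeddings = sparse_ef(["Hello world"])

# Each result maps token indices to weights, e.g. {1: 0.5, 5: 0.3, ...}
for index, weight in sorted(sparse_embeddings[0].items()):
    print(index, weight)
```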
TypeScript:

```typescript
import { SIESparseEmbeddingFunction } from "@sie/chroma";

const sparseEf = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Generate sparse embeddings
const embeddings = await sparseEf.generate(["Hello world"]);
console.log(embeddings[0].indices); // [1, 5, 10, ...]
console.log(embeddings[0].values);  // [0.5, 0.3, 0.2, ...]

// Or as dict format for Chroma Cloud
const dictEmbeddings = await sparseEf.generateAsDict(["Hello world"]);
console.log(dictEmbeddings[0]); // { 1: 0.5, 5: 0.3, 10: 0.2, ... }
```

The sparse embedding function returns `{ indices: number[], values: number[] }` objects, or `Record<number, number>` dicts via `generateAsDict`. Both formats are compatible with Chroma Cloud’s hybrid search feature.
## What’s Next

- Encode Text - embedding API details and output types
- Model Catalog - all supported embedding models