
Overview

SIE provides three integration paths: native SDK for full feature access, framework adapters for RAG pipelines, and OpenAI compatibility for drop-in migration.

| Option | Features | Best For |
| --- | --- | --- |
| Framework Adapters | Dense, sparse, reranking, extraction | Chroma, CrewAI, DSPy, Haystack, LangChain, LlamaIndex, Qdrant, Weaviate |
| Native SDK | All features, full control | Custom pipelines, advanced use cases |
| OpenAI Compatibility | Dense only | Migrating existing OpenAI code |

SIE provides eight native packages for popular frameworks and vector stores in Python, and three in TypeScript.

| Framework | Package | Embeddings | Sparse | Reranking | Extraction |
| --- | --- | --- | --- | --- | --- |
| Chroma | sie-chroma | Yes | Yes | No | No |
| CrewAI | sie-crewai | No | Yes | Yes | Yes |
| DSPy | sie-dspy | Yes | Yes | Yes | Yes |
| Haystack | sie-haystack | Yes | Yes | No | No |
| LangChain | sie-langchain | Yes | Yes | Yes | Yes |
| LlamaIndex | sie-llamaindex | Yes | Yes | Yes | Yes |
| Qdrant | sie-qdrant | Yes | Yes | No | No |
| Weaviate | sie-weaviate | Yes | Yes | No | No |

| Framework | Package | Embeddings | Sparse | Reranking |
| --- | --- | --- | --- | --- |
| Chroma | @sie/chroma | Yes | Yes | No |
| LangChain.js | @sie/langchain | Yes | Yes | No |
| LlamaIndex.ts | @sie/llamaindex | Yes | Yes | No |

Use framework adapters when:

  • You’re building a RAG pipeline with one of these frameworks
  • You need sparse embeddings for hybrid search
  • You need reranking to improve retrieval quality
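
A common way to combine dense and sparse results in hybrid search is Reciprocal Rank Fusion (RRF). This is a generic sketch in plain Python, not part of any SIE adapter; the document IDs and rankings are illustrative:

```python
# Reciprocal Rank Fusion: merge two ranked result lists into one.
# Each input is a list of doc IDs ordered best-first.

def rrf(rankings, k=60):
    """Fuse ranked lists of doc IDs; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # 1 / (k + rank) damps the influence of low-ranked hits.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc2", "doc1", "doc3"]   # ranked by dense similarity
sparse_hits = ["doc1", "doc2", "doc4"]  # ranked by sparse (lexical) score
fused = rrf([dense_hits, sparse_hits])
```

Documents that rank well in both lists rise to the top; documents found by only one retriever still survive, which is the point of fusing dense and sparse signals.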

The native SDK provides full access to all SIE features: dense, sparse, multi-vector embeddings, reranking, and entity extraction.

```shell
pip install sie-sdk
```

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# All output types
result = client.encode(
    "BAAI/bge-m3",
    Item(text="Your text"),
    output_types=["dense", "sparse", "multivector"],
)

# Reranking
scores = client.score(
    "BAAI/bge-reranker-v2-m3",
    query=Item(text="What is AI?"),
    items=[Item(text="AI is..."), Item(text="Weather is...")],
)

# Entity extraction
entities = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="Tim Cook leads Apple."),
    labels=["person", "organization"],
)
```

Use the native SDK when:

  • You’re building a custom pipeline without a framework
  • You need multi-vector (ColBERT) output
  • You need entity extraction
  • You want fine-grained control over batching and timing
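
Multi-vector (ColBERT-style) retrieval scores a query against a document with MaxSim: each query token vector is matched against its best document token vector, and the per-token maxima are summed. A minimal sketch with toy 2-D vectors (real multivector output has one vector per token at the model's dimension):

```python
# MaxSim scoring for multi-vector (ColBERT-style) embeddings.
# Vectors here are toy examples, not actual model output.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Sum over query vectors of the best dot product against any doc vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # tokens closely aligned with the query
doc_b = [[0.1, 0.2], [0.3, 0.1]]  # tokens mostly unrelated to the query

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Because matching happens per token rather than on one pooled vector, MaxSim preserves fine-grained term interactions, which is why multi-vector output typically needs this custom scoring step rather than plain cosine similarity.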

SIE exposes /v1/embeddings matching OpenAI’s API format. Existing OpenAI code works with a URL change.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.embeddings.create(
    model="BAAI/bge-m3",
    input=["Your text here", "Another text"],
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")
```
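
Dense embeddings returned this way are typically compared with cosine similarity. A minimal helper in plain Python (the vectors below are toy stand-ins for `response.data[i].embedding`):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den

v1 = [0.1, 0.3, 0.5]   # toy embedding
v2 = [0.1, 0.3, 0.5]   # identical direction -> similarity 1.0
v3 = [0.5, -0.3, 0.1]  # different direction -> lower similarity
```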

Use OpenAI compatibility when:

  • You have existing code using the OpenAI SDK
  • You only need dense embeddings
  • You want zero code changes beyond the URL

Limitations: Only dense embeddings. No sparse, multi-vector, reranking, or extraction.


| Feature | Framework Adapters | Native SDK | OpenAI Compat |
| --- | --- | --- | --- |
| Dense embeddings | Most | Yes | Yes |
| Sparse embeddings | Yes | Yes | No |
| Multi-vector (ColBERT) | No | Yes | No |
| Reranking | CrewAI, DSPy, LangChain, LlamaIndex | Yes | No |
| Entity extraction | CrewAI, DSPy, LangChain, LlamaIndex | Yes | No |

  • Chroma - embedding functions for ChromaDB
  • CrewAI - sparse embeddings for hybrid search
  • DSPy - embedder for DSPy retrievers
  • Haystack - dense and sparse embedders
  • LangChain - embeddings, reranking, and extraction for LangChain
  • LlamaIndex - embeddings and reranking for LlamaIndex
  • Qdrant - dense and sparse embeddings for Qdrant
  • Weaviate - dense and named vectors for Weaviate
  • SDK Reference - full SDK documentation