
Self-hosted inference
for search & document processing

50x cheaper than managed model APIs
Quality boost from 85+ SOTA models
Your data never leaves your AWS/GCP account
# Configure
module "sie" {
  source = "superlinked/sie/aws"   # also available: superlinked/sie/gcp, superlinked/sie/local
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}

# Deploy
terraform apply
helm install sie oci://ghcr.io/superlinked/charts/sie

# Use
curl "$URL/v1/encode/bge-m3?lora=legal" \
    -d '{"text": "indemnification clause"}'

SIE: Superlinked Inference Engine

Run all your search and document processing inference in one centralized cluster, across teams and workloads.

SIE SDKs

Build your apps

> pip install sie-sdk
> npm install @sie/sdk

and 5+ framework integrations

Manage models & configurations via code

config.update()
SIE Cluster

Deploy the cluster

> helm install sie oci://ghcr.io/superlinked/charts/sie

Observe with cloud-native tools such as Grafana, and from the CLI:

> sie-admin top
SIE Infra

Create the infrastructure

module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "a10-spot"]
}

Deploy

> terraform apply
SIE Architecture

Plan your self-deployment

SIE deployment architecture

How SIE fits in your stack

See where SIE sits in a typical retrieval pipeline alongside vector databases, orchestration frameworks, and your application layer.
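As a toy illustration of the application-layer step in that pipeline: once SIE has produced embeddings (and a vector database has stored them), the application ranks documents by similarity to the query embedding. The vectors below are made-up values standing in for real model output.

```python
# Toy ranking step of a retrieval pipeline: rank documents by cosine
# similarity to a query embedding. In a real deployment both the query
# and document vectors would come from SIE's encode endpoint.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query_vec, docs):
    """docs: {doc_id: embedding}. Returns doc ids, most similar first."""
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)

query = [0.9, 0.1, 0.0]                  # made-up query embedding
docs = {
    "clause-12": [0.8, 0.2, 0.1],        # points the same way as the query
    "intro-01":  [0.1, 0.9, 0.3],
}
print(rank(query, docs))                 # ['clause-12', 'intro-01']
```

SIE handles the encode step; the vector database and ranking logic shown here sit alongside it in your stack.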

Cost Comparison

Compare across models, GPU types, and cloud providers.

Provider            Tokens per dollar    Complexity
OpenAI API          1M tokens            None
Modal + TEI         20M tokens           Medium
Your Cloud + SIE    47M tokens           Low
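The headline "50x cheaper" figure can be sanity-checked from the numbers above: 47M tokens for the same spend versus 1M via the managed API is roughly a 47x gain.

```python
# Back-of-envelope check of the cost comparison, using the table's
# figures (millions of tokens obtained for the same spend).
tokens_millions = {
    "OpenAI API": 1,        # baseline
    "Modal + TEI": 20,
    "Your Cloud + SIE": 47,
}
baseline = tokens_millions["OpenAI API"]
for provider, tokens in tokens_millions.items():
    print(f"{provider}: {tokens / baseline:.0f}x the tokens for the same spend")
# Your Cloud + SIE comes out ~47x, the basis of the rounded "50x" headline.
```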
See the deployment documentation for details.
