# LlamaIndex

The `sie-llamaindex` package (Python) and `@sie/llamaindex` package (TypeScript) provide drop-in components for LlamaIndex. Use `SIEEmbedding` for vector stores and `SIENodePostprocessor` for reranking.
## Installation

```bash
pip install sie-llamaindex
```

This installs `sie-sdk` and `llama-index-core` as dependencies.

```bash
pnpm add @sie/llamaindex
```

This installs `@sie/sdk` and `llamaindex` as dependencies.
## Start the Server

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```
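Once the container is running, you can sanity-check connectivity by embedding a short string with the `SIEEmbedding` client documented below. This is a minimal sketch that assumes the default model `BAAI/bge-m3` is available on the server:

```python
from sie_llamaindex import SIEEmbedding

# Embed one string against the local server to confirm it is reachable
embed_model = SIEEmbedding(base_url="http://localhost:8080", model_name="BAAI/bge-m3")
vector = embed_model.get_text_embedding("ping")
print(f"Server reachable; got a {len(vector)}-dimensional embedding")
```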
## Embeddings

`SIEEmbedding` implements LlamaIndex’s `BaseEmbedding` interface. Set it as the default embed model or use it directly.
```python
from llama_index.core import Settings
from sie_llamaindex import SIEEmbedding

# Set as default embedding model
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Or use directly
embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
embedding = embed_model.get_text_embedding("Your text here")
print(len(embedding))  # 1024
```

```typescript
import { Settings } from "llamaindex";
import { SIEEmbedding } from "@sie/llamaindex";

// Set as default embedding model
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Or use directly
const embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });
const embedding = await embedModel.getTextEmbedding("Your text here");
console.log(embedding.length); // 1024
```

### With VectorStoreIndex
```python
from llama_index.core import Settings, VectorStoreIndex, Document
from sie_llamaindex import SIEEmbedding

Settings.embed_model = SIEEmbedding(model_name="BAAI/bge-m3")

documents = [
    Document(text="Machine learning uses algorithms to learn from data."),
    Document(text="The weather is sunny today."),
]

index = VectorStoreIndex.from_documents(documents)
results = index.as_query_engine().query("What is machine learning?")
```

```typescript
import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { SIEEmbedding } from "@sie/llamaindex";

Settings.embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });

const documents = [
  new Document({ text: "Machine learning uses algorithms to learn from data." }),
  new Document({ text: "The weather is sunny today." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const results = await queryEngine.query({ query: "What is machine learning?" });
```

### Async Support
In Python, both sync and async methods are available:

```python
# Sync
embedding = embed_model.get_text_embedding(text)
embeddings = embed_model.get_text_embedding_batch(texts)

# Async
embedding = await embed_model.aget_text_embedding(text)
query_embedding = await embed_model.aget_query_embedding(query)
```

In TypeScript, all methods are async by default:

```typescript
// Single text
const embedding = await embedModel.getTextEmbedding(text);

// Multiple texts
const embeddings = await embedModel.getTextEmbeddings(texts);
```
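To embed a large corpus concurrently in Python, `BaseEmbedding` also defines a batched async method; this sketch assumes `SIEEmbedding` inherits it unchanged:

```python
import asyncio

from sie_llamaindex import SIEEmbedding

async def embed_corpus(texts: list[str]) -> list[list[float]]:
    embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
    # Batched async embedding inherited from LlamaIndex's BaseEmbedding
    return await embed_model.aget_text_embedding_batch(texts, show_progress=True)

embeddings = asyncio.run(embed_corpus(["first document", "second document"]))
```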
## Reranking

`SIENodePostprocessor` implements `BaseNodePostprocessor`. Use it to rerank retrieved nodes.
```python
from llama_index.core.schema import NodeWithScore, TextNode, QueryBundle
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=3
)

nodes = [
    NodeWithScore(node=TextNode(text="Machine learning is a subset of AI."), score=0.5),
    NodeWithScore(node=TextNode(text="The weather is sunny today."), score=0.6),
    NodeWithScore(node=TextNode(text="Deep learning uses neural networks."), score=0.4),
]

reranked = reranker.postprocess_nodes(nodes, QueryBundle(query_str="What is ML?"))

for node in reranked:
    print(f"{node.score:.3f}: {node.node.get_content()[:50]}")
```

```typescript
import { NodeWithScore, TextNode, QueryBundle } from "llamaindex";
import { SIENodePostprocessor } from "@sie/llamaindex";

const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 3,
});

const nodes = [
  new NodeWithScore({ node: new TextNode({ text: "Machine learning is a subset of AI." }), score: 0.5 }),
  new NodeWithScore({ node: new TextNode({ text: "The weather is sunny today." }), score: 0.6 }),
  new NodeWithScore({ node: new TextNode({ text: "Deep learning uses neural networks." }), score: 0.4 }),
];

const reranked = await reranker.postprocessNodes(nodes, new QueryBundle({ queryStr: "What is ML?" }));

for (const node of reranked) {
  console.log(`${node.score?.toFixed(3)}: ${node.node.getContent().slice(0, 50)}`);
}
```

### With Query Engine
```python
from llama_index.core import VectorStoreIndex
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5
)

# Create query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=20  # Retrieve 20, rerank to 5
)

response = query_engine.query("What is machine learning?")
```

```typescript
import { VectorStoreIndex } from "llamaindex";
import { SIENodePostprocessor } from "@sie/llamaindex";

const reranker = new SIENodePostprocessor({
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 5,
});

// Create query engine with reranking
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 20, // Retrieve 20, rerank to 5
});

const response = await queryEngine.query({ query: "What is machine learning?" });
```

## Hybrid Search
Use `SIESparseEmbeddingFunction` with vector stores that support hybrid search.
```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from sie_llamaindex import SIEEmbedding, SIESparseEmbeddingFunction

# Create sparse embedding function
sparse_embed_fn = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Create hybrid vector store
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_docs",
    enable_hybrid=True,
    sparse_embedding_function=sparse_embed_fn
)
```

```typescript
import { QdrantVectorStore } from "llamaindex";
import { QdrantClient } from "@qdrant/js-client-rest";
import { SIEEmbedding, SIESparseEmbeddingFunction } from "@sie/llamaindex";

// Create sparse embedding function
const sparseEmbedFn = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Create hybrid vector store
const client = new QdrantClient({ url: "http://localhost:6333" });
const vectorStore = new QdrantVectorStore({
  client,
  collectionName: "hybrid_docs",
  enableHybrid: true,
  sparseEmbeddingFunction: sparseEmbedFn,
});
```
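To index and query against the hybrid store, wire it into a `StorageContext` and request hybrid mode at query time. This sketch follows the standard LlamaIndex Qdrant flow and assumes `Settings.embed_model` is set to `SIEEmbedding` and `documents` is defined as in the earlier examples:

```python
from llama_index.core import StorageContext, VectorStoreIndex

# Index documents into the hybrid Qdrant collection
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query with dense and sparse retrieval combined
query_engine = index.as_query_engine(vector_store_query_mode="hybrid", sparse_top_k=10)
response = query_engine.query("What is machine learning?")
```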
## Full RAG Pipeline

Complete example combining embeddings, reranking, and LLM generation:
```python
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI
from sie_llamaindex import SIEEmbedding, SIENodePostprocessor

# 1. Configure SIE embeddings
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Create documents and index
documents = [
    Document(text="Machine learning is a branch of artificial intelligence."),
    Document(text="Neural networks are inspired by biological neurons."),
    Document(text="Deep learning uses multiple layers of neural networks."),
    Document(text="Python is popular for machine learning development."),
]

index = VectorStoreIndex.from_documents(documents)

# 3. Create reranker
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=2
)

# 4. Build query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=10  # Retrieve 10, rerank to 2
)

# 5. Query
response = query_engine.query("What is deep learning?")
print(response)
```

```typescript
import { Settings, VectorStoreIndex, Document, OpenAI } from "llamaindex";
import { SIEEmbedding, SIENodePostprocessor } from "@sie/llamaindex";

// 1. Configure SIE embeddings
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });

// 2. Create documents and index
const documents = [
  new Document({ text: "Machine learning is a branch of artificial intelligence." }),
  new Document({ text: "Neural networks are inspired by biological neurons." }),
  new Document({ text: "Deep learning uses multiple layers of neural networks." }),
  new Document({ text: "Python is popular for machine learning development." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);

// 3. Create reranker
const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 2,
});

// 4. Build query engine with reranking
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 10, // Retrieve 10, rerank to 2
});

// 5. Query
const response = await queryEngine.query({ query: "What is deep learning?" });
console.log(response.toString());
```

## Configuration Options
### SIEEmbedding

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model_name` | `str` | `BAAI/bge-m3` | Model to use |
| `instruction` | `str` | `None` | Instruction prefix for encoding |
| `output_dtype` | `str` | `None` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `embed_batch_size` | `int` | `10` | Batch size for embedding multiple texts |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `BAAI/bge-m3` | Model to use |
| `instruction` | `string` | `undefined` | Instruction prefix for encoding |
| `outputDtype` | `DType` | `undefined` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `embedBatchSize` | `number` | `10` | Batch size for embedding multiple texts |
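As an illustration, a Python configuration exercising the non-default options might look like the sketch below; the instruction string and dtype are example values, not requirements:

```python
from sie_llamaindex import SIEEmbedding

embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
    instruction="Represent this sentence for retrieval:",  # example prefix
    output_dtype="float16",   # one of: float32, float16, int8, binary
    timeout_s=60.0,           # fail faster than the 180 s default
    embed_batch_size=32,      # larger batches for bulk indexing
)
```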
### SIENodePostprocessor

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `top_n` | `int` | `None` | Number of nodes to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `model` | `string` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `topN` | `number` | `undefined` | Number of nodes to return |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
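Similarly, a reranker tuned for a constrained deployment might pin a GPU class and tighten the timeout. This is a sketch with illustrative values; the `gpu` label in particular is hypothetical and depends on how your SIE server is deployed:

```python
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5,        # keep only the 5 best nodes
    gpu="l4",       # hypothetical GPU label for routing
    timeout_s=30.0, # tighter than the 180 s default
)
```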
## What’s Next

- Rerank Results - cross-encoder reranking details
- Model Catalog - all supported embedding models