# LlamaIndex

The `sie-llamaindex` package (Python) and `@sie/llamaindex` package (TypeScript) provide drop-in components for LlamaIndex. Use `SIEEmbedding` for vector stores and `SIENodePostprocessor` for reranking.
## Installation

```bash
pip install sie-llamaindex
```

This installs `sie-sdk` and `llama-index-core` as dependencies.

```bash
pnpm add @sie/llamaindex
```

This installs `@sie/sdk` and `llamaindex` as dependencies.
## Start the Server

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```
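Once the container is running, you can sanity-check connectivity by embedding a short string with the `SIEEmbedding` client documented below. This is a minimal sketch that assumes the default model `BAAI/bge-m3` is available on the server:

```python
from sie_llamaindex import SIEEmbedding

# Embed one string against the local server to confirm it is reachable
embed_model = SIEEmbedding(base_url="http://localhost:8080", model_name="BAAI/bge-m3")
vector = embed_model.get_text_embedding("ping")
print(f"Server reachable; got a {len(vector)}-dimensional embedding")
```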
## Embeddings

`SIEEmbedding` implements LlamaIndex’s `BaseEmbedding` interface. Set it as the default embed model or use it directly.
```python
from llama_index.core import Settings
from sie_llamaindex import SIEEmbedding

# Set as default embedding model
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Or use directly
embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
embedding = embed_model.get_text_embedding("Your text here")
print(len(embedding))  # 1024
```

```typescript
import { Settings } from "llamaindex";
import { SIEEmbedding } from "@sie/llamaindex";

// Set as default embedding model
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Or use directly
const embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });
const embedding = await embedModel.getTextEmbedding("Your text here");
console.log(embedding.length); // 1024
```

### With VectorStoreIndex
```python
from llama_index.core import Settings, VectorStoreIndex, Document
from sie_llamaindex import SIEEmbedding

Settings.embed_model = SIEEmbedding(model_name="BAAI/bge-m3")

documents = [
    Document(text="Machine learning uses algorithms to learn from data."),
    Document(text="The weather is sunny today."),
]

index = VectorStoreIndex.from_documents(documents)
results = index.as_query_engine().query("What is machine learning?")
```

```typescript
import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { SIEEmbedding } from "@sie/llamaindex";

Settings.embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });

const documents = [
  new Document({ text: "Machine learning uses algorithms to learn from data." }),
  new Document({ text: "The weather is sunny today." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const results = await queryEngine.query({ query: "What is machine learning?" });
```

### Async Support
In Python, both sync and async methods are available:

```python
# Sync
embedding = embed_model.get_text_embedding(text)
embeddings = embed_model.get_text_embedding_batch(texts)

# Async
embedding = await embed_model.aget_text_embedding(text)
query_embedding = await embed_model.aget_query_embedding(query)
```

In TypeScript, all methods are async by default:

```typescript
// Single text
const embedding = await embedModel.getTextEmbedding(text);

// Multiple texts
const embeddings = await embedModel.getTextEmbeddings(texts);
```
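To embed a large corpus concurrently in Python, `BaseEmbedding` also defines a batched async method; this sketch assumes `SIEEmbedding` inherits it unchanged:

```python
import asyncio

from sie_llamaindex import SIEEmbedding

async def embed_corpus(texts: list[str]) -> list[list[float]]:
    embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
    # Batched async embedding inherited from LlamaIndex's BaseEmbedding
    return await embed_model.aget_text_embedding_batch(texts, show_progress=True)

embeddings = asyncio.run(embed_corpus(["first document", "second document"]))
```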
## Reranking

`SIENodePostprocessor` implements `BaseNodePostprocessor`. Use it to rerank retrieved nodes.
```python
from llama_index.core.schema import NodeWithScore, TextNode, QueryBundle
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=3
)

nodes = [
    NodeWithScore(node=TextNode(text="Machine learning is a subset of AI."), score=0.5),
    NodeWithScore(node=TextNode(text="The weather is sunny today."), score=0.6),
    NodeWithScore(node=TextNode(text="Deep learning uses neural networks."), score=0.4),
]

reranked = reranker.postprocess_nodes(nodes, QueryBundle(query_str="What is ML?"))

for node in reranked:
    print(f"{node.score:.3f}: {node.node.get_content()[:50]}")
```

```typescript
import { NodeWithScore, TextNode, QueryBundle } from "llamaindex";
import { SIENodePostprocessor } from "@sie/llamaindex";

const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 3,
});

const nodes = [
  new NodeWithScore({ node: new TextNode({ text: "Machine learning is a subset of AI." }), score: 0.5 }),
  new NodeWithScore({ node: new TextNode({ text: "The weather is sunny today." }), score: 0.6 }),
  new NodeWithScore({ node: new TextNode({ text: "Deep learning uses neural networks." }), score: 0.4 }),
];

const reranked = await reranker.postprocessNodes(nodes, new QueryBundle({ queryStr: "What is ML?" }));

for (const node of reranked) {
  console.log(`${node.score?.toFixed(3)}: ${node.node.getContent().slice(0, 50)}`);
}
```

### With Query Engine
```python
from llama_index.core import VectorStoreIndex
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5
)

# Create query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=20  # Retrieve 20, rerank to 5
)

response = query_engine.query("What is machine learning?")
```

```typescript
import { VectorStoreIndex } from "llamaindex";
import { SIENodePostprocessor } from "@sie/llamaindex";

const reranker = new SIENodePostprocessor({
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 5,
});

// Create query engine with reranking
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 20, // Retrieve 20, rerank to 5
});

const response = await queryEngine.query({ query: "What is machine learning?" });
```

## Hybrid Search
Use `SIESparseEmbeddingFunction` with vector stores that support hybrid search.
```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from sie_llamaindex import SIEEmbedding, SIESparseEmbeddingFunction

# Create sparse embedding function
sparse_embed_fn = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Create hybrid vector store
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_docs",
    enable_hybrid=True,
    sparse_embedding_function=sparse_embed_fn
)
```

```typescript
import { QdrantVectorStore } from "llamaindex";
import { QdrantClient } from "@qdrant/js-client-rest";
import { SIEEmbedding, SIESparseEmbeddingFunction } from "@sie/llamaindex";

// Create sparse embedding function
const sparseEmbedFn = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Create hybrid vector store
const client = new QdrantClient({ url: "http://localhost:6333" });
const vectorStore = new QdrantVectorStore({
  client,
  collectionName: "hybrid_docs",
  enableHybrid: true,
  sparseEmbeddingFunction: sparseEmbedFn,
});
```
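To index and query against the hybrid store, wire it into a `StorageContext` and request hybrid mode at query time. This sketch follows the standard LlamaIndex Qdrant flow and assumes `Settings.embed_model` is set to `SIEEmbedding` and `documents` is defined as in the earlier examples:

```python
from llama_index.core import StorageContext, VectorStoreIndex

# Index documents into the hybrid Qdrant collection
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query with dense and sparse retrieval combined
query_engine = index.as_query_engine(vector_store_query_mode="hybrid", sparse_top_k=10)
response = query_engine.query("What is machine learning?")
```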
## Full RAG Pipeline

Complete example combining embeddings, reranking, and LLM generation:
```python
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI
from sie_llamaindex import SIEEmbedding, SIENodePostprocessor

# 1. Configure SIE embeddings
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Create documents and index
documents = [
    Document(text="Machine learning is a branch of artificial intelligence."),
    Document(text="Neural networks are inspired by biological neurons."),
    Document(text="Deep learning uses multiple layers of neural networks."),
    Document(text="Python is popular for machine learning development."),
]

index = VectorStoreIndex.from_documents(documents)

# 3. Create reranker
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=2
)

# 4. Build query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=10  # Retrieve 10, rerank to 2
)

# 5. Query
response = query_engine.query("What is deep learning?")
print(response)
```

```typescript
import { Settings, VectorStoreIndex, Document, OpenAI } from "llamaindex";
import { SIEEmbedding, SIENodePostprocessor } from "@sie/llamaindex";

// 1. Configure SIE embeddings
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});
Settings.llm = new OpenAI({ model: "gpt-4o-mini" });

// 2. Create documents and index
const documents = [
  new Document({ text: "Machine learning is a branch of artificial intelligence." }),
  new Document({ text: "Neural networks are inspired by biological neurons." }),
  new Document({ text: "Deep learning uses multiple layers of neural networks." }),
  new Document({ text: "Python is popular for machine learning development." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);

// 3. Create reranker
const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 2,
});

// 4. Build query engine with reranking
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 10, // Retrieve 10, rerank to 2
});

// 5. Query
const response = await queryEngine.query({ query: "What is deep learning?" });
console.log(response.toString());
```

## Configuration Options
### SIEEmbedding

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model_name` | `str` | `BAAI/bge-m3` | Model to use |
| `instruction` | `str` | `None` | Instruction prefix for encoding |
| `output_dtype` | `str` | `None` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `embed_batch_size` | `int` | `10` | Batch size for embedding multiple texts |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `BAAI/bge-m3` | Model to use |
| `instruction` | `string` | `undefined` | Instruction prefix for encoding |
| `outputDtype` | `DType` | `undefined` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `embedBatchSize` | `number` | `10` | Batch size for embedding multiple texts |
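As an illustration, a Python configuration exercising the non-default options might look like the sketch below; the instruction string and dtype are example values, not requirements:

```python
from sie_llamaindex import SIEEmbedding

embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
    instruction="Represent this sentence for retrieval:",  # example prefix
    output_dtype="float16",   # one of: float32, float16, int8, binary
    timeout_s=60.0,           # fail faster than the 180 s default
    embed_batch_size=32,      # larger batches for bulk indexing
)
```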
### SIENodePostprocessor

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `top_n` | `int` | `None` | Number of nodes to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `model` | `string` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `topN` | `number` | `undefined` | Number of nodes to return |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
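Similarly, a reranker tuned for a constrained deployment might pin a GPU class and tighten the timeout. This is a sketch with illustrative values; the `gpu` label in particular is hypothetical and depends on how your SIE server is deployed:

```python
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5,        # keep only the 5 best nodes
    gpu="l4",       # hypothetical GPU label for routing
    timeout_s=30.0, # tighter than the 180 s default
)
```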
## What’s Next

- Rerank Results - cross-encoder reranking details
- Model Catalog - all supported embedding models