LangChain
The sie-langchain package (Python) and @sie/langchain package (TypeScript) provide drop-in components for LangChain. Use SIEEmbeddings for vector stores and SIEReranker for document compression.
Installation
```bash
pip install sie-langchain
```

This installs sie-sdk and langchain-core as dependencies.
```bash
pnpm add @sie/langchain
```

This installs @sie/sdk and @langchain/core as dependencies.
Start the Server
```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:latest

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:latest
```

Embeddings
SIEEmbeddings implements LangChain’s Embeddings interface. Use it with any vector store.
```python
from sie_langchain import SIEEmbeddings

embeddings = SIEEmbeddings(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# Embed documents
vectors = embeddings.embed_documents([
    "Machine learning uses algorithms to learn from data.",
    "The weather is sunny today."
])
print(len(vectors))  # 2

# Embed a query
query_vector = embeddings.embed_query("What is machine learning?")
print(len(query_vector))  # 1024
```

```typescript
import { SIEEmbeddings } from "@sie/langchain";

const embeddings = new SIEEmbeddings({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Embed documents
const vectors = await embeddings.embedDocuments([
  "Machine learning uses algorithms to learn from data.",
  "The weather is sunny today.",
]);
console.log(vectors.length); // 2

// Embed a query
const queryVector = await embeddings.embedQuery("What is machine learning?");
console.log(queryVector.length); // 1024
```

With ChromaDB
```python
from langchain_chroma import Chroma
from sie_langchain import SIEEmbeddings

embeddings = SIEEmbeddings(model="BAAI/bge-m3")

# Create vector store
vectorstore = Chroma.from_texts(
    texts=["Document one", "Document two"],
    embedding=embeddings
)

# Search
results = vectorstore.similarity_search("query", k=2)
```

```typescript
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { SIEEmbeddings } from "@sie/langchain";

const embeddings = new SIEEmbeddings({ model: "BAAI/bge-m3" });

// Create vector store
const vectorstore = await Chroma.fromTexts(
  ["Document one", "Document two"],
  [],
  embeddings
);

// Search
const results = await vectorstore.similaritySearch("query", 2);
```

Async Support
In Python, both sync and async methods are available:

```python
# Sync
vectors = embeddings.embed_documents(texts)
query_vec = embeddings.embed_query(text)

# Async
vectors = await embeddings.aembed_documents(texts)
query_vec = await embeddings.aembed_query(text)
```

In TypeScript, all methods are async by default:

```typescript
// All methods return Promises
const vectors = await embeddings.embedDocuments(texts);
const queryVec = await embeddings.embedQuery(text);
```

Reranking
SIEReranker implements BaseDocumentCompressor. Use it to rerank retrieved documents.
```python
from langchain_core.documents import Document
from sie_langchain import SIEReranker

reranker = SIEReranker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=3
)

documents = [
    Document(page_content="Machine learning is a subset of AI."),
    Document(page_content="The weather is sunny today."),
    Document(page_content="Deep learning uses neural networks."),
]

reranked = reranker.compress_documents(documents, "What is ML?")

for doc in reranked:
    score = doc.metadata.get("relevance_score", 0)
    print(f"{score:.3f}: {doc.page_content[:50]}")
```

```typescript
import { Document } from "@langchain/core/documents";
import { SIEReranker } from "@sie/langchain";

const reranker = new SIEReranker({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topK: 3,
});

const documents = [
  new Document({ pageContent: "Machine learning is a subset of AI." }),
  new Document({ pageContent: "The weather is sunny today." }),
  new Document({ pageContent: "Deep learning uses neural networks." }),
];

const reranked = await reranker.compressDocuments(documents, "What is ML?");

for (const doc of reranked) {
  const score = doc.metadata.relevanceScore ?? 0;
  console.log(`${score.toFixed(3)}: ${doc.pageContent.slice(0, 50)}`);
}
```

With ContextualCompressionRetriever
```python
from langchain.retrievers import ContextualCompressionRetriever
from sie_langchain import SIEReranker

reranker = SIEReranker(model="jinaai/jina-reranker-v2-base-multilingual", top_k=5)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)

# Retrieves 20 docs, reranks, returns top 5
results = compression_retriever.invoke("What is machine learning?")
```

```typescript
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { SIEReranker } from "@sie/langchain";

const reranker = new SIEReranker({
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topK: 5,
});

const compressionRetriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,
  baseRetriever: vectorstore.asRetriever({ k: 20 }),
});

// Retrieves 20 docs, reranks, returns top 5
const results = await compressionRetriever.invoke("What is machine learning?");
```

Hybrid Search
Use SIESparseEncoder with SIEEmbeddings for hybrid dense+sparse search.
```python
from langchain_pinecone import PineconeHybridSearchRetriever
from sie_langchain import SIEEmbeddings, SIESparseEncoder

retriever = PineconeHybridSearchRetriever(
    embeddings=SIEEmbeddings(model="BAAI/bge-m3"),
    sparse_encoder=SIESparseEncoder(model="BAAI/bge-m3"),
    index=pinecone_index
)

results = retriever.invoke("hybrid search query")
```

```typescript
import { PineconeHybridSearchRetriever } from "@langchain/pinecone";
import { SIEEmbeddings, SIESparseEncoder } from "@sie/langchain";

const retriever = new PineconeHybridSearchRetriever({
  embeddings: new SIEEmbeddings({ model: "BAAI/bge-m3" }),
  sparseEncoder: new SIESparseEncoder({ model: "BAAI/bge-m3" }),
  index: pineconeIndex,
});

const results = await retriever.invoke("hybrid search query");
```

Full RAG Pipeline
A complete example combining embeddings, reranking, and LLM generation:
```python
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from sie_langchain import SIEEmbeddings, SIEReranker

# 1. Create embeddings and vector store
embeddings = SIEEmbeddings(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

documents = [
    "Machine learning is a branch of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
    "Python is popular for machine learning development.",
]

vectorstore = Chroma.from_texts(texts=documents, embedding=embeddings)

# 2. Create two-stage retriever with reranking
reranker = SIEReranker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=2
)

retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

# 3. Build RAG chain
template = """Answer based on the context:

Context: {context}

Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4o-mini")

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 4. Query
answer = chain.invoke("What is deep learning?")
print(answer)
```

```typescript
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { SIEEmbeddings, SIEReranker } from "@sie/langchain";
import type { Document } from "@langchain/core/documents";

// 1. Create embeddings and vector store
const embeddings = new SIEEmbeddings({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

const documents = [
  "Machine learning is a branch of artificial intelligence.",
  "Neural networks are inspired by biological neurons.",
  "Deep learning uses multiple layers of neural networks.",
  "Python is popular for machine learning development.",
];

const vectorstore = await Chroma.fromTexts(documents, [], embeddings);

// 2. Create two-stage retriever with reranking
const reranker = new SIEReranker({
  baseUrl: "http://localhost:8080",
  model: "jinaai/jina-reranker-v2-base-multilingual",
  topK: 2,
});

const retriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,
  baseRetriever: vectorstore.asRetriever({ k: 10 }),
});

// 3. Build RAG chain
const template = `Answer based on the context:

Context: {context}

Question: {question}`;

const prompt = ChatPromptTemplate.fromTemplate(template);
const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

const formatDocs = (docs: Document[]): string => {
  return docs.map((doc) => doc.pageContent).join("\n");
};

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocs),
    question: new RunnablePassthrough(),
  },
  prompt,
  llm,
  new StringOutputParser(),
]);

// 4. Query
const answer = await chain.invoke("What is deep learning?");
console.log(answer);
```

Configuration Options
SIEEmbeddings
Python:

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | BAAI/bge-m3 | Model to use |
| instruction | str | None | Instruction prefix for encoding |
| output_dtype | str | None | Output dtype: float32, float16, int8, binary |
| gpu | str | None | Target GPU type for routing |
| timeout_s | float | 180.0 | Request timeout in seconds |

TypeScript:

| Parameter | Type | Default | Description |
|---|---|---|---|
| baseUrl | string | http://localhost:8080 | SIE server URL |
| model | string | BAAI/bge-m3 | Model to use |
| instruction | string | undefined | Instruction prefix for encoding |
| outputDtype | DType | undefined | Output dtype: float32, float16, int8, binary |
| gpu | string | undefined | Target GPU type for routing |
| timeout | number | 180000 | Request timeout in milliseconds |
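
For reference, here is a minimal Python sketch that sets these options explicitly; the values shown are illustrative, not defaults.

```python
from sie_langchain import SIEEmbeddings

# Illustrative configuration; defaults are listed in the tables above.
embeddings = SIEEmbeddings(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    instruction="Represent this sentence for retrieval:",  # example prefix, not a default
    output_dtype="float16",  # one of: float32, float16, int8, binary
    gpu=None,                # set to a GPU type to influence routing
    timeout_s=60.0,
)
```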
SIEReranker
Python:

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | jinaai/jina-reranker-v2-base-multilingual | Reranker model |
| top_k | int | None | Number of documents to return |
| gpu | str | None | Target GPU type for routing |
| timeout_s | float | 180.0 | Request timeout in seconds |

TypeScript:

| Parameter | Type | Default | Description |
|---|---|---|---|
| baseUrl | string | http://localhost:8080 | SIE server URL |
| model | string | jinaai/jina-reranker-v2-base-multilingual | Reranker model |
| topK | number | undefined | Number of documents to return |
| gpu | string | undefined | Target GPU type for routing |
| timeout | number | 180000 | Request timeout in milliseconds |
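
Likewise, a minimal Python sketch of an explicitly configured reranker; values are illustrative, not defaults.

```python
from sie_langchain import SIEReranker

# Illustrative configuration; defaults are listed in the tables above.
reranker = SIEReranker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=5,        # return only the five highest-scoring documents
    gpu=None,       # set to a GPU type to influence routing
    timeout_s=60.0,
)
```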
What’s Next
- Rerank Results - cross-encoder reranking details
- Model Catalog - all supported embedding models