# LangChain
The `sie-langchain` package (Python) and the `@sie/langchain` package (TypeScript) provide drop-in components for LangChain. The Python package supports embeddings, sparse search, reranking, and entity extraction; the TypeScript package supports embeddings and sparse search.
## Installation

Python:

```bash
pip install sie-langchain
```

This installs `sie-sdk` and `langchain-core` as dependencies.

TypeScript:

```bash
pnpm add @sie/langchain
```

This installs `@sie/sdk` and `@langchain/core` as dependencies.
## Start the Server

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:default
```

## Embeddings

`SIEEmbeddings` implements LangChain's `Embeddings` interface. Use it with any vector store.
```python
from sie_langchain import SIEEmbeddings

embeddings = SIEEmbeddings(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

# Embed documents
vectors = embeddings.embed_documents([
    "Machine learning uses algorithms to learn from data.",
    "The weather is sunny today.",
])
print(len(vectors))  # 2

# Embed a query
query_vector = embeddings.embed_query("What is machine learning?")
print(len(query_vector))  # 1024
```

```typescript
import { SIEEmbeddings } from "@sie/langchain";

const embeddings = new SIEEmbeddings({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Embed documents
const vectors = await embeddings.embedDocuments([
  "Machine learning uses algorithms to learn from data.",
  "The weather is sunny today.",
]);
console.log(vectors.length); // 2

// Embed a query
const queryVector = await embeddings.embedQuery("What is machine learning?");
console.log(queryVector.length); // 1024
```

Any model SIE supports for dense embeddings works; just change the `model` parameter:
```python
# Stella (1024-dim, strong quality)
embeddings = SIEEmbeddings(model="NovaSearch/stella_en_400M_v5")

# Nomic MoE (768-dim)
embeddings = SIEEmbeddings(model="nomic-ai/nomic-embed-text-v2-moe")

# E5 (1024-dim); SIE handles query vs. document encoding automatically
embeddings = SIEEmbeddings(model="intfloat/e5-large-v2")
```

See the Model Catalog for all 85+ supported models.
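The vectors come back as plain lists of floats, so you can sanity-check results without a vector store by ranking documents against a query with cosine similarity. A minimal pure-Python sketch (no SIE server needed; the toy 3-dim vectors stand in for real embedding output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 3-dim vectors in place of real 1024-dim embeddings
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.1, 0.0]
print(rank(query, docs))  # [2, 0, 1]
```

With real output, `query` would come from `embeddings.embed_query(...)` and `docs` from `embeddings.embed_documents(...)`.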
## With ChromaDB

```python
from langchain_chroma import Chroma
from sie_langchain import SIEEmbeddings

embeddings = SIEEmbeddings(model="BAAI/bge-m3")

# Create vector store
vectorstore = Chroma.from_texts(
    texts=["Document one", "Document two"],
    embedding=embeddings,
)

# Search
results = vectorstore.similarity_search("query", k=2)
```

```typescript
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { SIEEmbeddings } from "@sie/langchain";

const embeddings = new SIEEmbeddings({ model: "BAAI/bge-m3" });

// Create vector store
const vectorstore = await Chroma.fromTexts(
  ["Document one", "Document two"],
  [],
  embeddings
);

// Search
const results = await vectorstore.similaritySearch("query", 2);
```

## Async Support

In Python, both sync and async methods are available:
```python
# Sync
vectors = embeddings.embed_documents(texts)
query_vec = embeddings.embed_query(text)

# Async
vectors = await embeddings.aembed_documents(texts)
query_vec = await embeddings.aembed_query(text)
```

In TypeScript, all methods are async by default:

```typescript
// All methods return Promises
const vectors = await embeddings.embedDocuments(texts);
const queryVec = await embeddings.embedQuery(text);
```

## Reranking (Python only)

`SIEReranker` implements `BaseDocumentCompressor`. Use it to rerank retrieved documents.
```python
from langchain_core.documents import Document
from sie_langchain import SIEReranker

reranker = SIEReranker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=3,
)

documents = [
    Document(page_content="Machine learning is a subset of AI."),
    Document(page_content="The weather is sunny today."),
    Document(page_content="Deep learning uses neural networks."),
]

reranked = reranker.compress_documents(documents, "What is ML?")

for doc in reranked:
    score = doc.metadata.get("relevance_score", 0)
    print(f"{score:.3f}: {doc.page_content[:50]}")
```

## With ContextualCompressionRetriever
```python
from langchain.retrievers import ContextualCompressionRetriever
from sie_langchain import SIEReranker

reranker = SIEReranker(model="jinaai/jina-reranker-v2-base-multilingual", top_k=5)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)

# Retrieves 20 docs, reranks them, returns the top 5
results = compression_retriever.invoke("What is machine learning?")
```

## Hybrid Search

Use `SIESparseEncoder` together with `SIEEmbeddings` for hybrid dense + sparse search.
```python
from langchain_pinecone import PineconeHybridSearchRetriever
from sie_langchain import SIEEmbeddings, SIESparseEncoder

retriever = PineconeHybridSearchRetriever(
    embeddings=SIEEmbeddings(model="BAAI/bge-m3"),
    sparse_encoder=SIESparseEncoder(model="BAAI/bge-m3"),
    index=pinecone_index,
)

results = retriever.invoke("hybrid search query")
```

```typescript
import { PineconeHybridSearchRetriever } from "@langchain/pinecone";
import { SIEEmbeddings, SIESparseEncoder } from "@sie/langchain";

const retriever = new PineconeHybridSearchRetriever({
  embeddings: new SIEEmbeddings({ model: "BAAI/bge-m3" }),
  sparseEncoder: new SIESparseEncoder({ model: "BAAI/bge-m3" }),
  index: pineconeIndex,
});

const results = await retriever.invoke("hybrid search query");
```

## Full RAG Pipeline

A complete example combining embeddings, reranking, and LLM generation:
```python
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from sie_langchain import SIEEmbeddings, SIEReranker

# 1. Create embeddings and vector store
embeddings = SIEEmbeddings(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

documents = [
    "Machine learning is a branch of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
    "Python is popular for machine learning development.",
]

vectorstore = Chroma.from_texts(texts=documents, embedding=embeddings)

# 2. Create two-stage retriever with reranking
reranker = SIEReranker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=2,
)

retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10}),
)

# 3. Build RAG chain
template = """Answer based on the context:

Context: {context}

Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4o-mini")

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 4. Query
answer = chain.invoke("What is deep learning?")
print(answer)
```

## Entity Extraction (Python only)

`SIEExtractor` provides zero-shot entity extraction as a LangChain `Runnable`.
```python
from sie_langchain import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)

result = extractor.invoke("Tim Cook announced new products at Apple Park in Cupertino.")
for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']} ({entity['score']:.2f})")
# person: Tim Cook (0.96)
# organization: Apple (0.91)
# location: Cupertino (0.88)
```

## Configuration Options

### SIEEmbeddings
Python:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `"http://localhost:8080"` | SIE server URL |
| `model` | `str` | `"BAAI/bge-m3"` | Model to use |
| `instruction` | `str` | `None` | Instruction prefix for encoding |
| `output_dtype` | `str` | `None` | Output dtype: `float32`, `float16`, `int8`, or `binary` |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
TypeScript:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `"http://localhost:8080"` | SIE server URL |
| `model` | `string` | `"BAAI/bge-m3"` | Model to use |
| `instruction` | `string` | `undefined` | Instruction prefix for encoding |
| `outputDtype` | `DType` | `undefined` | Output dtype: `float32`, `float16`, `int8`, or `binary` |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
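The `output_dtype` / `outputDtype` setting trades precision for storage: `float16` halves vector memory, `int8` cuts it to a quarter, and `binary` to roughly a thirty-second. For intuition only, here is an illustrative symmetric int8 scheme; this is an assumption for illustration, not necessarily the quantization SIE itself applies:

```python
def quantize_int8(vec):
    """Map floats in [-1, 1] to ints in [-127, 127] (illustrative scheme only)."""
    return [round(x * 127) for x in vec]

def dequantize_int8(q):
    """Approximate inverse of quantize_int8."""
    return [x / 127 for x in q]

vec = [0.12, -0.98, 0.5]
q = quantize_int8(vec)
print(q)  # [15, -124, 64]

# Round-tripping loses at most about half a quantization step per component
print(max(abs(a - b) for a, b in zip(vec, dequantize_int8(q))))
```

Quantized vectors usually rank nearest neighbors almost identically to the float originals, which is why the smaller dtypes are attractive for large indexes.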
### SIEReranker (Python only)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `"http://localhost:8080"` | SIE server URL |
| `model` | `str` | `"jinaai/jina-reranker-v2-base-multilingual"` | Reranker model |
| `top_k` | `int` | `None` | Number of documents to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
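For intuition about the reranking contract behind `top_k`: a cross-encoder scores each (query, document) pair jointly, then the documents come back sorted by score and truncated to `top_k` (all of them when `top_k` is `None`). A toy stand-in that uses word overlap as the score (illustrative only; real relevance comes from the cross-encoder model, not this heuristic):

```python
def toy_rerank(query, docs, top_k=None):
    """Score each doc against the query and return (doc, score) pairs, best first."""
    q_words = set(query.lower().split())
    scored = [(d, len(q_words & set(d.lower().split())) / len(q_words)) for d in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable sort keeps ties in order
    return scored if top_k is None else scored[:top_k]

docs = [
    "machine learning is a subset of ai",
    "the weather is sunny today",
    "deep learning uses neural networks",
]
top = toy_rerank("what is machine learning", docs, top_k=2)
print(top[0])  # ('machine learning is a subset of ai', 0.75)
```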
### SIEExtractor (Python only)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `"http://localhost:8080"` | SIE server URL |
| `model` | `str` | `"urchade/gliner_multi-v2.1"` | Extraction model |
| `labels` | `list[str]` | `["person", "organization", "location"]` | Default entity labels |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
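Downstream code often wants extracted entities grouped by label rather than as a flat list. Assuming the output shape shown in the extraction example above (`result["entities"]` as dicts with `text`, `label`, and `score` keys), a small helper might look like:

```python
from collections import defaultdict

def group_entities(result, min_score=0.5):
    """Group extracted entity texts by label, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in result["entities"]:  # assumes the SIEExtractor output shape shown above
        if ent["score"] >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)

# Shaped like the extraction example output
result = {"entities": [
    {"text": "Tim Cook", "label": "person", "score": 0.96},
    {"text": "Apple", "label": "organization", "score": 0.91},
    {"text": "Cupertino", "label": "location", "score": 0.88},
]}
print(group_entities(result))
# {'person': ['Tim Cook'], 'organization': ['Apple'], 'location': ['Cupertino']}
```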
## What's Next

- Rerank Results - cross-encoder reranking details
- Model Catalog - all supported embedding models
- Troubleshooting - common errors and solutions