
Bundles

Python ML libraries often have conflicting dependency requirements; models loaded with trust_remote_code=True, for instance, may depend on specific transformers versions. SIE solves this with bundles: each bundle is a self-contained environment whose dependencies are mutually compatible.

For example:

  • sentence-transformers requires transformers>=4.57
  • gliner requires transformers>=4.51.3,<5
  • These cannot coexist in the same environment

Bundles group models with compatible dependencies into separate Docker images.
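The bundle-to-image relationship is a one-to-one lookup, which can be sketched as follows. The helper function and dictionary are illustrative, not part of SIE; the image tags are the ones documented on this page.

```python
# Sketch: one Docker image per bundle. The lookup helper is illustrative,
# not an SIE API; the tags themselves are the documented image tags.
BUNDLE_IMAGES = {
    "default": "ghcr.io/superlinked/sie:default",
    "gliner": "ghcr.io/superlinked/sie:gliner",
    "sglang": "ghcr.io/superlinked/sie:sglang",
    "florence2": "ghcr.io/superlinked/sie:florence2",
}

def image_for(bundle: str) -> str:
    """Return the Docker image tag for a bundle, or raise for unknown names."""
    try:
        return BUNDLE_IMAGES[bundle]
    except KeyError:
        raise ValueError(f"unknown bundle: {bundle!r}") from None

print(image_for("gliner"))  # ghcr.io/superlinked/sie:gliner
```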


| Bundle    | Purpose                 | Key Models                                  |
| --------- | ----------------------- | ------------------------------------------- |
| default   | Standard models         | BGE-M3, E5, Qwen3, Stella, GritLM, ColBERT  |
| gliner    | GLiNER ecosystem models | GLiNER, GLiREL, GLiClass, NuNER             |
| sglang    | Large LLM embeddings    | gte-Qwen2-7B, E5-Mistral-7B, Qwen3-4B       |
| florence2 | Vision-language models  | Florence-2, Donut                           |

Default bundle

The default bundle includes most models using transformers>=4.57. This is the recommended starting point.

Included models:

  • Dense: BAAI/bge-m3, intfloat/e5-*, Alibaba-NLP/gte-multilingual-base, Alibaba-NLP/gte-Qwen2-1.5B-instruct
  • Stella: NovaSearch/stella_en_400M_v5, NovaSearch/stella_en_1.5B_v5
  • GritLM: GritLM/GritLM-7B
  • Qwen3: Qwen/Qwen3-Embedding-0.6B
  • NVIDIA: nvidia/NV-Embed-v2
  • Sparse: OpenSearch neural sparse, SPLADE variants, Granite sparse
  • ColBERT: jinaai/jina-colbert-v2, answerdotai/answerai-colbert-small-v1
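Since SIE's request format is not shown on this page, here is a hedged sketch of how a client might build an embedding request body. The /v1/embeddings path and field names assume an OpenAI-compatible API, which is an assumption on my part; check SIE's API reference for the actual contract.

```python
import json

# Sketch of an embeddings request body. The field names ("model", "input")
# and the /v1/embeddings path assume an OpenAI-compatible API -- this is an
# assumption, not confirmed by this page.
def build_embed_request(model: str, texts: list[str]) -> dict:
    return {"model": model, "input": texts}

body = build_embed_request("BAAI/bge-m3", ["hello world"])
payload = json.dumps(body)
# Sending it would require the server to be running, e.g.:
#   curl -X POST localhost:8080/v1/embeddings \
#        -H 'Content-Type: application/json' -d "$payload"
print(payload)
```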

GLiNER bundle

Named entity recognition, relation extraction, and zero-shot classification models from the GLiNER ecosystem. Requires the gliner, glirel, and gliclass libraries with transformers>=4.51.3,<5.

Included models:

  • NER: urchade/gliner_*, EmergentMethods/gliner_large_news-v2.1
  • Biomedical NER: Ihor/gliner-biomed-large-v1.0
  • Relation extraction: jackboyla/glirel-large-v0
  • Zero-shot classification: knowledgator/gliclass-*
  • Span detection: numind/NuNER_Zero, numind/NuNER_Zero-span

SGLang bundle

Large LLM embedding models (4B+ parameters) served with the SGLang backend for memory efficiency.

Included models:

  • Alibaba-NLP/gte-Qwen2-7B-instruct
  • Qwen/Qwen3-Embedding-4B
  • intfloat/e5-mistral-7b-instruct
  • Linq-AI-Research/Linq-Embed-Mistral
  • Salesforce/SFR-Embedding-Mistral, Salesforce/SFR-Embedding-2_R
  • nvidia/llama-embed-nemotron-8b

Florence-2 bundle

Microsoft Florence-2 and Donut vision-language models. Requires timm for the DaViT vision encoder.

Included models:

  • microsoft/Florence-2-base, microsoft/Florence-2-large
  • microsoft/Florence-2-base-ft
  • mynkchaudhry/Florence-2-FT-DocVQA
  • naver-clova-ix/donut-base-finetuned-cord-v2 (receipt parsing)
  • naver-clova-ix/donut-base-finetuned-docvqa (document QA)
  • naver-clova-ix/donut-base-finetuned-rvlcdip (document classification)

Each bundle has a corresponding Docker image tag. One image per bundle.

```shell
# Default bundle (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie:default

# With GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:default

# GLiNER bundle for NER/relation extraction
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:gliner

# SGLang bundle for large LLM models
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:sglang

# Florence-2 bundle for vision models
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie:florence2
```

Choose a bundle based on the models you need:

  1. Start with default - covers most use cases including Stella, GritLM, and GTE-Qwen2-1.5B
  2. Use gliner for named entity recognition, relation extraction, or zero-shot classification
  3. Use sglang for memory-efficient large LLM embeddings (e.g. gte-Qwen2-7B)
  4. Use florence2 for document understanding and OCR
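The decision guide above amounts to a task-to-bundle mapping, sketched below. The task names and the helper are my paraphrase of this guidance for illustration, not an SIE API.

```python
# Illustrative task-to-bundle mapping paraphrasing the guidance above.
# The task names and helper are hypothetical, not part of SIE.
TASK_TO_BUNDLE = {
    "embedding": "default",
    "ner": "gliner",
    "relation-extraction": "gliner",
    "zero-shot-classification": "gliner",
    "large-llm-embedding": "sglang",
    "document-understanding": "florence2",
    "ocr": "florence2",
}

def bundle_for(task: str) -> str:
    # default covers most use cases, so it is the fallback
    return TASK_TO_BUNDLE.get(task, "default")

print(bundle_for("ner"))  # gliner
```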

Models are loaded on first request. The bundle only determines which models are available.
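The load-on-first-request behavior follows the standard lazy-initialization pattern, sketched below. This is an illustration of the pattern, not SIE's actual implementation; the loader here is a stand-in lambda rather than a real model load.

```python
from collections.abc import Callable

class LazyModelRegistry:
    """Illustrative load-on-first-request cache -- not SIE's actual code."""

    def __init__(self, loaders: dict[str, Callable[[], object]]):
        self._loaders = loaders            # model name -> loader function
        self._cache: dict[str, object] = {}

    def get(self, name: str) -> object:
        # Load lazily on first access; later requests hit the cache.
        if name not in self._cache:
            if name not in self._loaders:
                raise KeyError(f"model {name!r} is not in this bundle")
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

# Stand-in loader: a real registry would load model weights here.
registry = LazyModelRegistry({"BAAI/bge-m3": lambda: "loaded-bge-m3"})
print(registry.get("BAAI/bge-m3"))  # first call runs the loader
print(registry.get("BAAI/bge-m3"))  # second call is served from the cache
```

Note that the bundle fixes which names appear in the registry, mirroring how the bundle only determines which models are available.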