55 models
| Model | Type | Architecture | Params | Task | Quality | Throughput | Latency |
|---|---|---|---|---|---|---|---|
| vidore/colqwen2.5-v0.2 | Multi-Vec | Qwen2 | 7.0B | Encode | 0.7680 | 2 tps | 1.9s |
| vidore/colpali-v1.3-hf | Multi-Vec | PaliGemma | 3.0B | Encode | 0.7119 | 6 tps | 619.1ms |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Dense | Qwen2 | 1.8B | Encode | 0.6524 | 12.3K tps | 261.1ms |
| NovaSearch/stella_en_1.5B_v5 | Dense | Qwen2 | 1.5B | Encode | 0.9134 | 12.8K tps | 265.9ms |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | Dense | CLIP | 986M | Encode | 0.8624 | 321 tps | 503.8ms |
| google/siglip-so400m-patch14-384 | Dense | SigLIP | 878M | Encode | 0.9001 | 355 tps | 488.3ms |
| google/siglip-so400m-patch14-224 | Dense | SigLIP | 877M | Encode | 0.8382 | 348 tps | 439.8ms |
| Qwen/Qwen3-Embedding-0.6B | Dense | Qwen3 | 596M | Encode | 0.6538 | 20.6K tps | 156.9ms |
| BAAI/bge-m3 | Dense, Sparse, Multi-Vec | XLM-RoBERTa | 568M | Encode | 0.5726 | 32.3K tps | 94.1ms |
| BAAI/bge-reranker-large | Score | XLM-RoBERTa | 560M | Score | 0.6404 | 6.6K tps | 41.4ms |
| intfloat/multilingual-e5-large | Dense | XLM-RoBERTa | 560M | Encode | 0.5035 | 29.8K tps | 108.6ms |
| intfloat/multilingual-e5-large-instruct | Dense | XLM-RoBERTa | 560M | Encode | 0.5539 | 29.4K tps | 106.9ms |
| jinaai/jina-colbert-v2 | Multi-Vec | XLM-RoBERTa | 559M | Encode | 0.7615 | 24.9K tps | 146.1ms |
| nomic-ai/nomic-embed-text-v2-moe | Dense | NomicBERT | 475M | Encode | 0.5207 | 13.0K tps | 149.6ms |
| numind/NuNER_Zero | Entities | DeBERTa | 449M | Extract | 0.6122 | | |
| NovaSearch/stella_en_400M_v5 | Dense | ModernBERT | 435M | Encode | 0.8666 | 27.1K tps | 115.7ms |
| EmergentMethods/gliner_large_news-v2.1 | Entities | DeBERTa | 435M | Extract | 0.5527 | | |
| Ihor/gliner-biomed-large-v1.0 | Entities | DeBERTa | 435M | Extract | 0.6439 | | |
| jackboyla/glirel-large-v0 | Relations | DeBERTa | 435M | Extract | 0.2639 | | |
| mixedbread-ai/mxbai-rerank-large-v2 | Score | Qwen2 | 435M | Score | 0.6914 | 2.3K tps | 1.1s |
| urchade/gliner_large-v2.1 | Entities | DeBERTa | 435M | Extract | 0.5483 | | |
| urchade/gliner_multi-v2.1 | Entities | DeBERTa | 435M | Extract | 0.6007 | | |
| urchade/gliner_multi_pii-v1 | Entities | DeBERTa | 435M | Extract | 0.5357 | | |
| openai/clip-vit-large-patch14 | Dense | CLIP | 428M | Encode | 0.7824 | 706 tps | 298.1ms |
| mixedbread-ai/mxbai-colbert-large-v1 | Multi-Vec | BERT | 335M | Encode | 0.4833 | 42.7K tps | 77.3ms |
| intfloat/e5-large-v2 | Dense | BERT | 335M | Encode | 0.4531 | 33.2K tps | 86.6ms |
| Alibaba-NLP/gte-multilingual-base | Dense | ModernBERT | 305M | Encode | 0.5990 | 55.1K tps | 63.1ms |
| lightonai/GTE-ModernColBERT-v1 | Multi-Vec | ModernBERT | 305M | Encode | 0.7773 | 5.3K tps | 355.9ms |
| google/embeddinggemma-300m | Dense | Gemma 3 | 303M | Encode | 0.3876 | 79.6K tps | 55.7ms |
| jinaai/jina-reranker-v2-base-multilingual | Score | XLM-RoBERTa | 278M | Score | 0.6546 | 8.3K tps | 32.0ms |
| BAAI/bge-reranker-base | Score | XLM-RoBERTa | 278M | Score | 0.5926 | 5.0K tps | 33.2ms |
| IDEA-Research/grounding-dino-base | Bounding Boxes | Swin | 233M | Extract | 0.5809 | 0.8 mpix/s | 785.8ms |
| urchade/gliner_medium-v2.1 | Entities | DeBERTa | 195M | Extract | 0.6111 | | |
| IDEA-Research/grounding-dino-tiny | Bounding Boxes | Swin | 172M | Extract | 0.4860 | 0.9 mpix/s | 532.6ms |
| google/owlv2-base-patch16-ensemble | Bounding Boxes | CLIP | 155M | Extract | 0.5171 | 1.0 mpix/s | 954.6ms |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Dense | CLIP | 151M | Encode | 0.7744 | 1.2K tps | 178.6ms |
| openai/clip-vit-base-patch32 | Dense | CLIP | 151M | Encode | 0.7165 | 651 tps | 319.4ms |
| mixedbread-ai/mxbai-rerank-base-v2 | Score | Qwen2 | 150M | Score | 0.6638 | 6.0K tps | 454.0ms |
| Alibaba-NLP/gte-reranker-modernbert-base | Score | ModernBERT | 150M | Score | 0.6701 | 6.2K tps | 41.9ms |
| lightonai/Reason-ModernColBERT | Multi-Vec | ModernBERT | 149M | Encode | 0.7777 | 33.0K tps | 82.2ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Sparse | ModernBERT | 137M | Encode | 0.7470 | 34.2K tps | 93.7ms |
| naver/splade-cocondenser-selfdistil | Sparse | BERT | 110M | Encode | 0.3403 | 40.0K tps | 72.4ms |
| naver/splade-v3 | Sparse | BERT | 110M | Encode | 0.7393 | 29.6K tps | 83.7ms |
| numind/NuNER_Zero-span | Entities | DeBERTa | 110M | Extract | 0.6448 | | |
| intfloat/e5-base-v2 | Dense | BERT | 109M | Encode | 0.4603 | 53.2K tps | 57.9ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Sparse | DistilBERT | 67M | Encode | 0.3398 | 49.1K tps | 63.3ms |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Sparse | DistilBERT | 67M | Encode | 0.3399 | 50.1K tps | 60.7ms |
| opensearch-project/opensearch-neural-sparse-encoding-v2-distill | Sparse | DistilBERT | 67M | Encode | 0.3373 | 44.2K tps | 63.3ms |
| urchade/gliner_small-v2.1 | Entities | DeBERTa | 60M | Extract | 0.5959 | | |
| answerdotai/answerai-colbert-small-v1 | Multi-Vec | BERT | 33M | Encode | 0.7840 | 43.6K tps | 60.2ms |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Score | BERT | 33M | Score | 0.6145 | 8.2K tps | 31.7ms |
| intfloat/e5-small-v2 | Dense | BERT | 33M | Encode | 0.4299 | 58.3K tps | 49.7ms |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Multi-Vec | ModernBERT | 32M | Encode | 0.3376 | 45.9K tps | 59.7ms |
| sentence-transformers/all-MiniLM-L6-v2 | Dense | BERT | 23M | Encode | 0.8396 | 55.3K tps | 53.3ms |
| rasyosef/splade-mini | Sparse | BERT | 11M | Encode | 0.3090 | 56.3K tps | 56.0ms |

Throughput is reported in tokens per second (tps) for text models and megapixels per second (mpix/s) for detection models; throughput and latency are not reported for the GLiNER/GLiREL-style extraction models.

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.