
openai/clip-vit-large-patch14

Disclaimer: This model card is taken and modified from the official CLIP repository, which can be found here.

Architecture: CLIP
Parameters: 428M
Tasks: Encode
Outputs: Dense
Dimensions: 768 (dense)
Max Sequence Length: 77 tokens
License:
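The card itself carries no usage snippet, so here is a minimal sketch of encoding text and images into the 768-dimensional dense space with the Hugging Face transformers CLIP classes; the model ID matches this card, while the image path and caption are placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder path
inputs = processor(
    text=["a photo of a dog"],      # truncated to the 77-token limit
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=True,
)

outputs = model(**inputs)
print(outputs.text_embeds.shape)   # torch.Size([1, 768])
print(outputs.image_embeds.shape)  # torch.Size([1, 768])
```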

Benchmarks

Flickr30kI2TRetrieval

Tags: general, retrieval, en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783
Queries: 1,000
Quality
nDCG@10: 0.7824
MAP@10: 0.6816
MRR@10: 0.9111
Performance (L4 GPU, batch size 1, concurrency 16)
Corpus TPS: 706
Corpus p50 latency: 298.1 ms
Query TPS: 25
Query p50 latency: 389.6 ms
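The quality numbers above are standard ranked-retrieval metrics over cosine similarity between query (image) and corpus (caption) embeddings. As a reference for how such a figure is computed, here is a minimal sketch of MRR@10 over L2-normalized embeddings; the function and variable names are illustrative, not taken from the benchmark harness:

```python
import numpy as np

def mrr_at_10(query_embeds, corpus_embeds, relevant):
    """query_embeds: (Q, d), corpus_embeds: (N, d), both L2-normalized.
    relevant[i] is the set of corpus indices relevant to query i."""
    sims = query_embeds @ corpus_embeds.T          # cosine similarities
    top10 = np.argsort(-sims, axis=1)[:, :10]      # top-10 corpus ids per query
    reciprocal_ranks = []
    for ranked, rel in zip(top10, relevant):
        # 1/rank of the first relevant hit in the top 10, else 0
        rank = next((r + 1 for r, i in enumerate(ranked) if i in rel), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return float(np.mean(reciprocal_ranks))
```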
