
openai/clip-vit-base-patch32

Disclaimer: This model card is taken and modified from the official CLIP repository.

Architecture: CLIP
Parameters: 151M
Tasks: Encode
Outputs: Dense
Dimensions: 512 (dense)
Max Sequence Length: 77 tokens
License:
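For reference, a minimal encoding sketch (not part of the original card): one common way to run this checkpoint is via the Hugging Face `transformers` library, assuming `transformers`, `torch`, and `Pillow` are installed; the image URL and prompts below are placeholders.

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Text longer than the 77-token max sequence length is truncated.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=True,
)

outputs = model(**inputs)
print(outputs.text_embeds.shape)   # (2, 512) -- dense text embeddings
print(outputs.image_embeds.shape)  # (1, 512) -- dense image embeddings
```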

Benchmarks

Flickr30kI2TRetrieval (general retrieval, English)

Image-to-text retrieval: retrieve captions from images.
Corpus: 31,783 · Queries: 1,000

Quality:
nDCG@10: 0.7165
MAP@10: 0.6029
MRR@10: 0.8521

Performance (L4 GPU, batch size 1, 16 concurrent clients):
Corpus TPS: 651
Corpus p50: 319.4 ms
Query TPS: 24
Query p50: 317.9 ms
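The benchmark's image-to-text setup can be reproduced in miniature as follows. This is a hedged sketch, assuming the same `transformers` stack as above; the three hard-coded captions stand in for the Flickr30k caption corpus, and the full evaluation would additionally compute nDCG/MAP/MRR over the ranked lists.

```python
import torch
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder caption corpus; the benchmark uses the Flickr30k captions.
captions = [
    "two cats sleeping on a pink couch",
    "a dog running through a field",
    "a plate of food on a table",
]
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    text_inputs = processor(
        text=captions, return_tensors="pt", padding=True, truncation=True
    )
    text_embeds = model.get_text_features(**text_inputs)
    image_inputs = processor(images=image, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)

# Normalize so the dot product is cosine similarity, then rank the
# caption corpus against the query image (image-to-text retrieval).
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
scores = (image_embeds @ text_embeds.T).squeeze(0)

for i in scores.argsort(descending=True).tolist():
    print(f"{scores[i].item():.4f}  {captions[i]}")
```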
