google/siglip-so400m-patch14-224
SigLIP model pre-trained on WebLI at a resolution of 224x224. It was introduced in the paper "Sigmoid Loss for Language Image Pre-Training" by Zhai et al. and first released in this repository.
Benchmarks
Flickr30kI2TRetrieval
Image-to-text retrieval: retrieve captions from images
Corpus: 31,783 Queries: 1,000
Quality
nDCG@10: 0.8382
MAP@10: 0.7479
MRR@10: 0.9353
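The three quality metrics above are standard rank-based retrieval scores computed over the top 10 results for each query. The following is a minimal sketch of how such scores are commonly computed for a single query with binary relevance. The function names, the example IDs, and the choice to normalize MAP by min(number of relevant items, k) are assumptions for illustration; the benchmark's actual evaluation tooling is not described here and may use different conventions.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: discounted gain of hits divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_ids[:k], start=1)
        if item in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def ap_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision@k; assumed normalization: min(len(relevant_ids), k)."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this hit position
    denom = min(len(relevant_ids), k)
    return precision_sum / denom if denom > 0 else 0.0


# Hypothetical query: captions ranked for one image, two of them relevant.
ranked = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4"}

ndcg = ndcg_at_k(ranked, relevant)
mrr = mrr_at_k(ranked, relevant)
ap = ap_at_k(ranked, relevant)
```

A benchmark-level figure is then typically the mean of the per-query values across all queries (1,000 in this case).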
Performance L4-SPOT b1 c8
Corpus TPS: 223
Corpus p50: 395.0 ms
Query TPS: 11
Query p50: 392.1 ms
Performance L4 b1 c16
Corpus TPS: 473
Corpus p50: 484.7 ms
Query TPS: 22
Query p50: 425.9 ms