
openai/clip-vit-base-patch32

Disclaimer: This model card is taken and modified from the official CLIP repository.

Architecture: CLIP
Parameters: 151M
Tasks: Encode
Outputs: Dense
Dimensions: 512 (dense)
Max Sequence Length: 77 tokens
License:
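For reference, a minimal encoding sketch (not part of the original card): one common way to run this checkpoint is via the Hugging Face `transformers` library, assuming `transformers`, `torch`, and `Pillow` are installed; the image URL and prompts below are placeholders.

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Text longer than the 77-token max sequence length is truncated.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=True,
)

outputs = model(**inputs)
print(outputs.text_embeds.shape)   # (2, 512) -- dense text embeddings
print(outputs.image_embeds.shape)  # (1, 512) -- dense image embeddings
```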

Benchmarks

Flickr30kI2TRetrieval (general retrieval, English)

Image-to-text retrieval: retrieve captions from images.
Corpus: 31,783 · Queries: 1,000

Quality:
nDCG@10: 0.7165
MAP@10: 0.6029
MRR@10: 0.8521

Performance (L4 GPU, batch size 1, 16 concurrent clients):
Corpus TPS: 651
Corpus p50: 319.4 ms
Query TPS: 24
Query p50: 317.9 ms
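The benchmark's image-to-text setup can be reproduced in miniature as follows. This is a hedged sketch, assuming the same `transformers` stack as above; the three hard-coded captions stand in for the Flickr30k caption corpus, and the full evaluation would additionally compute nDCG/MAP/MRR over the ranked lists.

```python
import torch
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder caption corpus; the benchmark uses the Flickr30k captions.
captions = [
    "two cats sleeping on a pink couch",
    "a dog running through a field",
    "a plate of food on a table",
]
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    text_inputs = processor(
        text=captions, return_tensors="pt", padding=True, truncation=True
    )
    text_embeds = model.get_text_features(**text_inputs)
    image_inputs = processor(images=image, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)

# Normalize so the dot product is cosine similarity, then rank the
# caption corpus against the query image (image-to-text retrieval).
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
scores = (image_embeds @ text_embeds.T).squeeze(0)

for i in scores.argsort(descending=True).tolist():
    print(f"{scores[i].item():.4f}  {captions[i]}")
```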
