
openai/clip-vit-large-patch14

Disclaimer: This model card is taken and modified from the official CLIP repository, which can be found here.

Architecture: CLIP
Parameters: 428M
Tasks: Encode
Outputs: Dense
Dimensions: 768 (dense)
Max Sequence Length: 77 tokens
License:
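The card itself carries no usage snippet, so here is a minimal sketch of encoding text and images into the 768-dimensional dense space with the Hugging Face transformers CLIP classes; the model ID matches this card, while the image path and caption are placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder path
inputs = processor(
    text=["a photo of a dog"],      # truncated to the 77-token limit
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=True,
)

outputs = model(**inputs)
print(outputs.text_embeds.shape)   # torch.Size([1, 768])
print(outputs.image_embeds.shape)  # torch.Size([1, 768])
```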

Benchmarks

Flickr30kI2TRetrieval

Tags: general, retrieval, en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783
Queries: 1,000
Quality
nDCG@10: 0.7824
MAP@10: 0.6816
MRR@10: 0.9111
Performance (L4 GPU, batch size 1, concurrency 16)
Corpus TPS: 706
Corpus p50 latency: 298.1 ms
Query TPS: 25
Query p50 latency: 389.6 ms
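The quality numbers above are standard ranked-retrieval metrics over cosine similarity between query (image) and corpus (caption) embeddings. As a reference for how such a figure is computed, here is a minimal sketch of MRR@10 over L2-normalized embeddings; the function and variable names are illustrative, not taken from the benchmark harness:

```python
import numpy as np

def mrr_at_10(query_embeds, corpus_embeds, relevant):
    """query_embeds: (Q, d), corpus_embeds: (N, d), both L2-normalized.
    relevant[i] is the set of corpus indices relevant to query i."""
    sims = query_embeds @ corpus_embeds.T          # cosine similarities
    top10 = np.argsort(-sims, axis=1)[:, :10]      # top-10 corpus ids per query
    reciprocal_ranks = []
    for ranked, rel in zip(top10, relevant):
        # 1/rank of the first relevant hit in the top 10, else 0
        rank = next((r + 1 for r, i in enumerate(ranked) if i in rel), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return float(np.mean(reciprocal_ranks))
```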
