vidore/colqwen2.5-v0.2
ColQwen is a model built on a novel architecture and training strategy that uses Vision Language Models (VLMs) to efficiently index documents from their visual features.
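As a rough illustration of how retrieval over visual features can work, the sketch below assumes ColBERT-style late interaction: each query and each page is represented by several vectors, and a page's score is the sum, over query vectors, of the best-matching page vector. The function name `maxsim_score` and the tiny 2-dimensional vectors are hypothetical and are not taken from the model's code.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector, take the highest
    dot product against any page vector, then sum those maxima."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_vecs)
    return total


# Hypothetical toy embeddings (real embeddings have many more dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.3]],
}

scores = {name: maxsim_score(query, vecs) for name, vecs in pages.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here `page_a` ranks first because each query vector finds a close match among its page vectors.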
Benchmarks
Vidore3ComputerScienceRetrieval
Visual document retrieval on computer science papers and slides
Quality
ndcg at 10 0.7680
map at 10 0.6543
mrr at 10 0.8726
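For readers unfamiliar with the three quality metrics, the following is a minimal sketch of how they can be computed for a single query; the reported figures would then be averages over all queries. The document IDs and relevance labels are invented for illustration, and the average-precision normalization (dividing by the total number of relevant documents) is one common convention that may differ from the one used by the benchmark harness.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """Normalized discounted cumulative gain over the top k results."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def ap_at_k(ranked_ids, relevance, k=10):
    """Average precision over the top k (assumed normalization:
    total number of relevant documents)."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked_ids[:k]):
        if relevance.get(d, 0) > 0:
            hits += 1
            total += hits / (i + 1)
    n_rel = sum(1 for r in relevance.values() if r > 0)
    return total / n_rel if n_rel else 0.0


def rr_at_k(ranked_ids, relevance, k=10):
    """Reciprocal rank of the first relevant result in the top k."""
    for i, d in enumerate(ranked_ids[:k]):
        if relevance.get(d, 0) > 0:
            return 1.0 / (i + 1)
    return 0.0


# Hypothetical query: two relevant pages, only one retrieved (at rank 2).
ranked = ["d3", "d1", "d7"]
relevance = {"d1": 1, "d2": 1}

ndcg = ndcg_at_k(ranked, relevance)
ap = ap_at_k(ranked, relevance)   # 0.25
rr = rr_at_k(ranked, relevance)   # 0.5
print(ndcg, ap, rr)
```

Averaging `ap` and `rr` over all queries gives MAP and MRR respectively.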
Performance L4 b1 c4
Corpus TPS 2
Corpus p50 1.9s
Query TPS 139
Query p50 527.1ms
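The p50 figures are median latencies: half of the measured requests finished faster and half slower. The sketch below shows the calculation on a handful of made-up latency samples (these are not measurements from this benchmark) and why the median is less sensitive to a slow outlier than the mean.

```python
import statistics

# Hypothetical per-request latencies in milliseconds, including one slow outlier.
latencies_ms = [480.0, 510.0, 527.0, 560.0, 900.0]

p50 = statistics.median(latencies_ms)       # middle value of the sorted samples
mean_latency = statistics.mean(latencies_ms)

print(f"p50: {p50} ms, mean: {mean_latency} ms")
```

The single 900 ms request pulls the mean up while leaving the median unchanged.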
Vidore3FinanceEnRetrieval
Visual document retrieval on financial reports
Quality
ndcg at 10 0.6207
map at 10 0.5008
mrr at 10 0.7416
Performance L4 b1 c4
Corpus TPS 2
Corpus p50 1.9s
Query TPS 150
Query p50 547.2ms
english
Visual document retrieval on HR-related documents
Quality
ndcg at 10 0.6034
map at 10 0.4666
mrr at 10 0.7046
Performance L4 b1 c4
Corpus TPS 2
Corpus p50 2.0s
Query TPS 110
Query p50 769.7ms
Vidore3PharmaceuticalsRetrieval
Visual document retrieval on pharmaceutical documents
Quality
ndcg at 10 0.6274
map at 10 0.5173
mrr at 10 0.7234
Performance L4 b1 c4
Corpus TPS 2
Corpus p50 1.9s
Query TPS 138
Query p50 565.3ms