
google/owlv2-base-patch16-ensemble

The OWLv2 model (OWL is short for Open-World Localization) was proposed in Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby.

Architecture: CLIP
Parameters: 155M
Tasks: Extract
Outputs: Bounding Boxes
License: apache-2.0

Benchmarks

COCO (general detection, en)

Object detection on COCO natural images. Corpus: 5,000; Queries: 5,000.

Quality:
- ap: 0.5171
- ap50: 0.7172
- ap75: 0.5738
- ar@100: 0.6315

Performance configurations (b = batch size, c = concurrency): A10G (b1, c4), L4 (b1, c4), L4-SPOT (b1, c4), RTX-4090 (b1, c16).
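For readers unfamiliar with the metric names: ap50 and ap75 are average precision where a prediction counts as correct when its IoU (intersection over union) with a ground-truth box is at least 0.50 or 0.75, and ar@100 is average recall over the top 100 detections per image. A minimal sketch of the IoU computation these thresholds rely on (plain Python, corner-format boxes; the helper name is my own):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlapping boxes: intersection 25, union 175, IoU = 25/175 ≈ 0.143,
# so this prediction would fail both the 0.50 and 0.75 thresholds.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

ap averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the standard COCO protocol), which is why it sits below ap50 here.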
