
google/owlv2-base-patch16-ensemble

The OWLv2 model (OWL is short for Open-World Localization) was proposed in Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby.

Architecture: CLIP
Parameters: 155M
Tasks: Extract
Outputs: Bounding Boxes
License: apache-2.0

Benchmarks

COCO (general detection, en)

Object detection on COCO natural images. Corpus: 5,000; Queries: 5,000.

Quality:
- ap: 0.5171
- ap50: 0.7172
- ap75: 0.5738
- ar@100: 0.6315

Performance configurations (b = batch size, c = concurrency): A10G (b1, c4), L4 (b1, c4), L4-SPOT (b1, c4), RTX-4090 (b1, c16).
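For readers unfamiliar with the metric names: ap50 and ap75 are average precision where a prediction counts as correct when its IoU (intersection over union) with a ground-truth box is at least 0.50 or 0.75, and ar@100 is average recall over the top 100 detections per image. A minimal sketch of the IoU computation these thresholds rely on (plain Python, corner-format boxes; the helper name is my own):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Overlapping boxes: intersection 25, union 175, IoU = 25/175 ≈ 0.143,
# so this prediction would fail both the 0.50 and 0.75 thresholds.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

ap averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the standard COCO protocol), which is why it sits below ap50 here.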
