nomic-ai/nomic-embed-text-v2-moe
Benchmarks
CQADupstackPhysicsRetrieval
Duplicate question retrieval from StackExchange Physics
Corpus: 38,314 Queries: 1,039
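Each retrieval benchmark embeds every corpus text and every query, then ranks the corpus by similarity for each query. The sketch below shows a minimal brute-force version of that ranking step. It assumes the embeddings are already computed and that cosine similarity is the scoring function. The toy vectors and ids are hypothetical and are not model outputs.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], corpus: dict[str, Sequence[float]], k: int = 10) -> list[str]:
    """Return the ids of the k corpus vectors most similar to query_vec (exhaustive scan)."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()]
    # Highest score first; ties broken by id so the ranking is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional "embeddings" for three corpus questions.
corpus = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.7, 0.7, 0.0],
    "q3": [0.0, 0.0, 1.0],
}
print(top_k([0.9, 0.1, 0.0], corpus, k=2))  # ['q1', 'q2']
```

At the corpus sizes listed here, tens of thousands of texts, an exhaustive scan is still practical. Larger collections usually switch to an approximate nearest-neighbour index.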
Performance L4 b1 c16
Corpus TPS 13.0K
Corpus p50 149.6ms
Query TPS 1.2K
Query p50 143.2ms
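The page does not define its performance columns. This sketch reads "L4 b1 c16" as an NVIDIA L4 GPU with batch size 1 and 16 concurrent clients. It reads TPS as tokens per second and p50 as median per-request latency. All three readings are assumptions. Under them, a harness that produces those two numbers could look like this. `fake_embed` is a hypothetical placeholder for a real single-text embedding request, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for one batch-size-1 embedding request."""
    time.sleep(0.001)  # simulate model/network latency
    return [float(len(text))]


def run_benchmark(texts: list[str], concurrency: int = 16) -> dict[str, float]:
    """Send one text per request from `concurrency` workers; report p50 latency and throughput."""
    latencies: list[float] = []

    def timed_call(text: str) -> int:
        start = time.perf_counter()
        fake_embed(text)
        latencies.append(time.perf_counter() - start)  # list.append is atomic in CPython
        return len(text.split())  # crude token count (assumption: whitespace words)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        tokens = sum(pool.map(timed_call, texts))
    wall = time.perf_counter() - wall_start

    return {
        "p50_ms": statistics.median(latencies) * 1000.0,
        "tokens_per_s": tokens / wall,
        "requests": float(len(latencies)),
    }


stats = run_benchmark(["what is angular momentum"] * 64, concurrency=16)
print(f"p50 {stats['p50_ms']:.1f} ms, {stats['tokens_per_s']:.0f} tokens/s")
```

Under this reading, longer corpus documents would raise tokens per second at similar per-request latency. That would be consistent with the corpus and query rows above, which show similar p50 values but very different TPS.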
CosQA
Code search with natural language queries
Corpus: 6,267 Queries: 500
Performance L4 b1 c16
Corpus TPS 807
Corpus p50 595.7ms
Query TPS 139
Query p50 634.4ms
NanoFiQA2018Retrieval
Smaller subset of the FiQA-2018 financial question-answering dataset
Quality
nDCG@10 0.5207
MAP@10 0.4283
MRR@10 0.5634
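These quality scores are standard top-10 ranking metrics, presumably averaged over all queries. The per-query sketch below assumes binary relevance judgements and the common log2 discount for nDCG. MAP is normalised by the number of relevant documents, although some implementations cap that count at k. The benchmark's own evaluator may differ in details such as graded relevance or tie handling. The document ids are hypothetical.

```python
import math


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def map_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Average precision over the top k, divided by the number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            hits += 1
            precision_sum += hits / (i + 1)  # precision at this relevant rank
    return precision_sum / len(relevant) if relevant else 0.0


def mrr_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k (0 if none)."""
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0


ranked = ["d3", "d1", "d7", "d2"]  # hypothetical system ranking for one query
relevant = {"d1", "d2"}            # hypothetical relevance judgements
print(round(ndcg_at_k(ranked, relevant), 4), map_at_k(ranked, relevant), mrr_at_k(ranked, relevant))
# 0.6509 0.5 0.5
```

The three metrics reward different things. MRR@10 looks only at the first hit. MAP@10 rewards every relevant document ranked early. nDCG@10 applies a smooth logarithmic position discount.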
Performance L4 b1 c16
Corpus TPS 20.1K
Corpus p50 135.4ms
Query TPS 1.7K
Query p50 119.2ms
SCIDOCS
Citation prediction, document classification, and recommendation for scientific papers
Corpus: 25,656 Queries: 1,000
Performance L4 b1 c16
Corpus TPS 2.4K
Corpus p50 1.3s
Query TPS 74
Query p50 1.7s
StackOverflowQA
Programming question answering from Stack Overflow
Corpus: 19,931 Queries: 1,994
Performance L4 b1 c16
Corpus TPS 24.1K
Corpus p50 145.6ms
Query TPS 33.4K
Query p50 142.9ms