lightonai/GTE-ModernColBERT-v1
This is a PyLate model trained on the ms-marco-en-bge-gemma dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors, one vector per token, and can be used for semantic textual similarity using the MaxSim operator.
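To make the MaxSim operator concrete, here is a minimal pure-Python sketch of the standard late-interaction score: for each query token vector, take the highest dot product against any document token vector, then sum those maxima. The function names `dot` and `maxsim` and the 2-dimensional toy vectors are illustrative assumptions, not part of PyLate's API; the real model emits 128-dimensional vectors and PyLate ships its own batched scoring.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Hypothetical helper: sum over query tokens of the best-matching
    document token similarity (the MaxSim late-interaction score)."""
    if not query_vecs or not doc_vecs:
        raise ValueError("both query and document need at least one token vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token embeddings (assumption: the model itself uses 128 dims).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query token
doc_b = [[0.5, 0.5]]                          # only a partial match for each query token

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

Under this scoring, `doc_a` ranks above `doc_b` because every query token finds a closer match among its token vectors.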
Benchmarks
CQADupstackPhysicsRetrieval
Duplicate question retrieval from StackExchange Physics
CosQA
Code search with natural language queries
FiQA2018
Financial opinion mining and question answering
LegalBenchConsumerContractsQA
Question answering on consumer contracts
NFCorpus
Biomedical literature search from NutritionFacts.org
NanoFiQA2018Retrieval
Smaller subset of the FiQA financial QA dataset
SCIDOCS
Citation prediction, document classification, and recommendation for scientific papers
SciFact
Scientific claim verification using research literature
StackOverflowQA
Programming question answering from Stack Overflow