Overview
The extract primitive pulls structured data from unstructured content. It handles named entity recognition, relation extraction, text classification, and vision tasks like captioning and OCR.
Quick Example

Python:

    from sie_sdk import SIEClient
    from sie_sdk.types import Item

    client = SIEClient("http://localhost:8080")

    text = Item(text="Apple CEO Tim Cook announced the iPhone 16 in Cupertino.")

    result = client.extract(
        "urchade/gliner_multi-v2.1",
        text,
        labels=["person", "organization", "product", "location"]
    )

    for entity in result["entities"]:
        print(f"{entity['label']}: {entity['text']} (score: {entity['score']:.2f})")
    # organization: Apple (score: 0.95)
    # person: Tim Cook (score: 0.93)
    # product: iPhone 16 (score: 0.89)
    # location: Cupertino (score: 0.87)

TypeScript:

    import { SIEClient } from "@sie/sdk";

    const client = new SIEClient("http://localhost:8080");

    const text = { text: "Apple CEO Tim Cook announced the iPhone 16 in Cupertino." };

    const result = await client.extract(
      "urchade/gliner_multi-v2.1",
      text,
      { labels: ["person", "organization", "product", "location"] }
    );

    for (const entity of result.entities) {
      console.log(`${entity.label}: ${entity.text} (score: ${entity.score.toFixed(2)})`);
    }
    // organization: Apple (score: 0.95)
    // person: Tim Cook (score: 0.93)
    // product: iPhone 16 (score: 0.89)
    // location: Cupertino (score: 0.87)

    await client.close();

Named Entity Recognition (NER)
GLiNER models extract entities with zero-shot label support. Define your own entity types at query time.
Custom Entity Types

No predefined schema. Specify any labels you need:

Python:

    # Domain-specific entities
    result = client.extract(
        "urchade/gliner_multi-v2.1",
        Item(text="The merger between Acme Corp and Beta Inc requires FTC approval."),
        labels=["company", "regulatory_body", "legal_action"]
    )

    for entity in result["entities"]:
        print(f"{entity['label']}: {entity['text']}")
    # company: Acme Corp
    # company: Beta Inc
    # regulatory_body: FTC

TypeScript:

    // Domain-specific entities
    const result = await client.extract(
      "urchade/gliner_multi-v2.1",
      { text: "The merger between Acme Corp and Beta Inc requires FTC approval." },
      { labels: ["company", "regulatory_body", "legal_action"] }
    );

    for (const entity of result.entities) {
      console.log(`${entity.label}: ${entity.text}`);
    }
    // company: Acme Corp
    // company: Beta Inc
    // regulatory_body: FTC

Entity Positions
Each entity includes character positions for highlighting or further processing. In the output below, `end` is exclusive, so slicing the input as `text[start:end]` yields the entity text:

Python:

    result = client.extract(
        "urchade/gliner_multi-v2.1",
        Item(text="Tim Cook works at Apple."),
        labels=["person", "organization"]
    )

    for entity in result["entities"]:
        print(f"{entity['label']}: '{entity['text']}' at positions [{entity['start']}:{entity['end']}]")
    # person: 'Tim Cook' at positions [0:8]
    # organization: 'Apple' at positions [18:23]

TypeScript:

    const result = await client.extract(
      "urchade/gliner_multi-v2.1",
      { text: "Tim Cook works at Apple." },
      { labels: ["person", "organization"] }
    );

    for (const entity of result.entities) {
      console.log(`${entity.label}: '${entity.text}' at positions [${entity.start}:${entity.end}]`);
    }
    // person: 'Tim Cook' at positions [0:8]
    // organization: 'Apple' at positions [18:23]

Batch Extraction
Pass a list of items to process multiple documents in a single call. Give each item an `id` so that results can be matched back to their documents:

Python:

    documents = [
        Item(id="doc-1", text="Microsoft acquired Activision for $69 billion."),
        Item(id="doc-2", text="Sundar Pichai leads Google's AI initiatives."),
    ]

    results = client.extract(
        "urchade/gliner_multi-v2.1",
        documents,
        labels=["person", "organization", "money"]
    )

    for result in results:
        print(f"\n{result['id']}:")
        for entity in result["entities"]:
            print(f"  {entity['label']}: {entity['text']}")

TypeScript:

    const documents = [
      { id: "doc-1", text: "Microsoft acquired Activision for $69 billion." },
      { id: "doc-2", text: "Sundar Pichai leads Google's AI initiatives." },
    ];

    const results = await client.extract(
      "urchade/gliner_multi-v2.1",
      documents,
      { labels: ["person", "organization", "money"] }
    );

    for (const result of results) {
      console.log(`\n${result.id}:`);
      for (const entity of result.entities) {
        console.log(`  ${entity.label}: ${entity.text}`);
      }
    }

Response Format
The `ExtractResult` contains different fields depending on the extraction type:

| Field | Type | When Present |
|---|---|---|
| `id` | `str \| None` | Always (if provided) |
| `entities` | `list[Entity]` | NER models (GLiNER) |
| `relations` | `list[Relation]` | Relation extraction (GLiREL) |
| `classifications` | `list[Classification]` | Classification models (GLiClass) |
| `objects` | `list[DetectedObject]` | Object detection (GroundingDINO, OWLv2) |
| `data` | `dict` | Document models (Donut) |
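Because the populated fields depend on the model, downstream code may need to check which ones a given result carries. The following is a minimal sketch, assuming a result can be read like a plain dict (as the `result["entities"]` access elsewhere on this page suggests) and that fields a model does not produce are either absent or empty. `present_fields` and `RESULT_FIELDS` are hypothetical helpers for illustration, not part of the SDK.

```python
# Field names taken from the Response Format table.
RESULT_FIELDS = ("entities", "relations", "classifications", "objects", "data")


def present_fields(result: dict) -> list[str]:
    """Return the names of the result fields that carry any content."""
    # Assumption: unused fields are missing, None, or empty containers.
    return [name for name in RESULT_FIELDS if result.get(name)]


# Hand-written sample shaped like an NER result; not real model output.
sample = {
    "id": "doc-1",
    "entities": [
        {"text": "Tim Cook", "label": "person", "score": 0.93, "start": 0, "end": 8},
    ],
    "relations": [],
}

print(present_fields(sample))
```

A check like this lets one code path handle results from several model families without hard-coding which model produced them.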
Entity Fields

| Field | Type | Description |
|---|---|---|
| `text` | `str` | Extracted text span |
| `label` | `str` | Entity type |
| `score` | `float` | Confidence score (0-1) |
| `start` | `int` | Start character position |
| `end` | `int` | End character position |
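As an illustration of how these fields combine, the sketch below filters entities by `score` and uses `start` and `end` to mark up the original text. It assumes that `end` is exclusive (consistent with the `[0:8]` span reported for "Tim Cook"), that offsets index the Python string directly, and that spans do not overlap. `highlight` and its `min_score` parameter are hypothetical names, and the entity dicts are hand-written sample data rather than real model output.

```python
def highlight(text: str, entities: list[dict], min_score: float = 0.5) -> str:
    """Wrap each entity span in [label: ...] markers, skipping low-score entities."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from the end of the string so earlier offsets stay valid.
    for entity in sorted(kept, key=lambda e: e["start"], reverse=True):
        start, end = entity["start"], entity["end"]
        marked = f"[{entity['label']}: {text[start:end]}]"
        text = text[:start] + marked + text[end:]
    return text


text = "Tim Cook works at Apple."
entities = [
    {"text": "Tim Cook", "label": "person", "score": 0.93, "start": 0, "end": 8},
    {"text": "Apple", "label": "organization", "score": 0.95, "start": 18, "end": 23},
]

print(highlight(text, entities))
# [person: Tim Cook] works at [organization: Apple].
```

Processing spans right to left avoids recomputing offsets after each insertion; overlapping spans would need to be resolved before calling a helper like this.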
HTTP API

The server responds with msgpack by default. To get a JSON response, set the `Accept` header to `application/json`:

    curl -X POST http://localhost:8080/v1/extract/urchade/gliner_multi-v2.1 \
      -H "Content-Type: application/json" \
      -H "Accept: application/json" \
      -d '{
        "items": [{"text": "Tim Cook is the CEO of Apple."}],
        "params": {"labels": ["person", "organization"]}
      }'

Response:

    {
      "model": "urchade/gliner_multi-v2.1",
      "items": [
        {
          "entities": [
            {"text": "Tim Cook", "label": "person", "score": 0.93, "start": 0, "end": 8},
            {"text": "Apple", "label": "organization", "score": 0.95, "start": 23, "end": 28}
          ]
        }
      ]
    }

What’s Next
- Vision tasks: image captioning, OCR, and document understanding
- Relations & classification: relation extraction and text classification
- Full model catalog: all supported models