
Overview

The extract primitive pulls structured data from unstructured content. It handles named entity recognition, relation extraction, text classification, and vision tasks like captioning and OCR.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

text = Item(text="Apple CEO Tim Cook announced the iPhone 16 in Cupertino.")

result = client.extract(
    "urchade/gliner_multi-v2.1",
    text,
    labels=["person", "organization", "product", "location"],
)

for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']} (score: {entity['score']:.2f})")

# organization: Apple (score: 0.95)
# person: Tim Cook (score: 0.93)
# product: iPhone 16 (score: 0.89)
# location: Cupertino (score: 0.87)

GLiNER models support zero-shot entity extraction: there is no predefined schema, so you can define your own entity types at query time. Specify any labels you need:

# Domain-specific entities
result = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="The merger between Acme Corp and Beta Inc requires FTC approval."),
    labels=["company", "regulatory_body", "legal_action"],
)

for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']}")

# company: Acme Corp
# company: Beta Inc
# regulatory_body: FTC

Entities include character positions for highlighting or further processing:

result = client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="Tim Cook works at Apple."),
    labels=["person", "organization"],
)

for entity in result["entities"]:
    print(f"{entity['label']}: '{entity['text']}' at positions [{entity['start']}:{entity['end']}]")

# person: 'Tim Cook' at positions [0:8]
# organization: 'Apple' at positions [18:23]

Process multiple documents efficiently:

documents = [
    Item(id="doc-1", text="Microsoft acquired Activision for $69 billion."),
    Item(id="doc-2", text="Sundar Pichai leads Google's AI initiatives."),
]

results = client.extract(
    "urchade/gliner_multi-v2.1",
    documents,
    labels=["person", "organization", "money"],
)

for result in results:
    print(f"\n{result['id']}:")
    for entity in result["entities"]:
        print(f"  {entity['label']}: {entity['text']}")

The ExtractResult contains different fields based on the extraction type:

Field            Type                  When present
id               str | None            Always (if provided)
entities         list[Entity]          NER models (GLiNER)
relations        list[Relation]        Relation extraction (GLiREL)
classifications  list[Classification]  Classification models (GLiClass)
objects          list[DetectedObject]  Object detection (GroundingDINO, OWLv2)
data             dict                  Document models (Donut)
Each Entity has the following fields:

Field   Type   Description
text    str    Extracted text span
label   str    Entity type
score   float  Confidence score (0-1)
start   int    Start character position
end     int    End character position
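Because the populated field depends on the model family, a small dispatcher can normalize results before downstream processing. A minimal sketch (the helper name `summarize_result` is illustrative, not part of the SDK):

```python
# Illustrative helper -- not part of the SDK. The field the server
# populated tells you which extraction type the model performed.
def summarize_result(result: dict) -> list[str]:
    if "entities" in result:          # NER models (GLiNER)
        return [f"entity {e['label']}: {e['text']}" for e in result["entities"]]
    if "classifications" in result:   # classification models (GLiClass)
        return [f"class {c['label']}" for c in result["classifications"]]
    if "data" in result:              # document models (Donut)
        return [f"data: {result['data']}"]
    return []

ner_result = {"entities": [{"label": "person", "text": "Tim Cook"}]}
print(summarize_result(ner_result))
# ['entity person: Tim Cook']
```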

The server defaults to msgpack. For JSON responses, set the Accept header:
curl -X POST http://localhost:8080/v1/extract/urchade/gliner_multi-v2.1 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "items": [{"text": "Tim Cook is the CEO of Apple."}],
    "params": {"labels": ["person", "organization"]}
  }'

Response:

{
  "model": "urchade/gliner_multi-v2.1",
  "items": [
    {
      "entities": [
        {"text": "Tim Cook", "label": "person", "score": 0.93, "start": 0, "end": 8},
        {"text": "Apple", "label": "organization", "score": 0.95, "start": 23, "end": 28}
      ]
    }
  ]
}
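The same request can be built from Python without the SDK. A sketch using only the standard library, assuming the endpoint accepts the same JSON body as the curl command above (the POST itself is commented out since it needs a running server):

```python
import json

# Mirror the curl request body and headers.
payload = {
    "items": [{"text": "Tim Cook is the CEO of Apple."}],
    "params": {"labels": ["person", "organization"]},
}
headers = {"Content-Type": "application/json", "Accept": "application/json"}
body = json.dumps(payload).encode()

# To actually send it:
# from urllib.request import Request, urlopen
# req = Request(
#     "http://localhost:8080/v1/extract/urchade/gliner_multi-v2.1",
#     data=body, headers=headers, method="POST",
# )
# with urlopen(req) as resp:
#     result = json.loads(resp.read())
```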