TypeScript SDK Reference

The TypeScript SDK provides an async client for interacting with the SIE server from Node.js and browser environments.

pnpm add @sie/sdk

Or with npm:

npm install @sie/sdk

SIEClient is the async client for the SIE server. All methods return Promises.

import { SIEClient } from "@sie/sdk";

const client = new SIEClient(
  baseUrl: string,              // Server URL (e.g., "http://localhost:8080")
  options?: {
    timeout?: number,           // Request timeout in milliseconds (default: 30000)
    apiKey?: string,            // API key for authentication
    gpu?: string,               // Default GPU type for routing
    pool?: PoolSpec,            // Resource pool configuration
    waitForCapacity?: boolean,  // Auto-retry on 202 (default: false)
    provisionTimeout?: number,  // Max wait for provisioning in ms (default: 300000)
  }
);
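For example, a client configured with authentication and automatic capacity handling (the URL and option values here are illustrative):

import { SIEClient } from "@sie/sdk";

const client = new SIEClient("http://router.example.com", {
  timeout: 60000,                  // allow slower requests
  apiKey: process.env.SIE_API_KEY, // if your server requires auth
  gpu: "l4",                       // default GPU type for routing
  waitForCapacity: true,           // retry on 202 while the cluster scales up
  provisionTimeout: 120000,        // give up after 2 minutes of provisioning
});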

Generate embeddings.

async encode(
  model: string,                // Model name
  items: Item | Item[],         // Items to encode
  options?: {
    outputTypes?: OutputType[], // ["dense", "sparse", "multivector"]
    instruction?: string,       // Task instruction for instruction-tuned models
    outputDtype?: DType,        // "float32", "float16", "int8", "binary"
    isQuery?: boolean,          // Query vs document encoding
    gpu?: string,               // GPU routing
    waitForCapacity?: boolean,  // Wait for scale-up
  }
): Promise<EncodeResult | EncodeResult[]>

Returns: A single EncodeResult if a single item is passed, otherwise an array.

Example:

// Single item
const result = await client.encode("BAAI/bge-m3", { text: "Hello" });
console.log(result.dense?.slice(0, 5)); // Float32Array
// Batch
const results = await client.encode("BAAI/bge-m3", [
  { text: "First" },
  { text: "Second" },
]);
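Additional representations can be requested with outputTypes. A sketch using BAAI/bge-m3, which supports both dense and sparse outputs:

// Request dense and sparse embeddings together
const mixed = await client.encode(
  "BAAI/bge-m3",
  { text: "Hello" },
  { outputTypes: ["dense", "sparse"] }
);

// Sparse output is a pair of parallel arrays (see SparseResult below)
if (mixed.sparse) {
  console.log(mixed.sparse.indices[0], mixed.sparse.values[0]);
}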

Rerank items against a query using a cross-encoder or late interaction model. Returns items sorted by relevance score (highest first).

async score(
  model: string, // Model name (e.g., "BAAI/bge-reranker-v2-m3")
  query: Item,   // Query item with text or multivector
  items: Item[], // Items to score against query
  options?: {
    topK?: number, // Return only top K results
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ScoreResult>

Example:

const result = await client.score(
  "BAAI/bge-reranker-v2-m3",
  { text: "What is Python?" },
  [{ text: "Python is..." }, { text: "Java is..." }]
);

// Scores are sorted by relevance (rank 0 = most relevant)
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}

Note: For ColBERT-style models, you can pass pre-computed multivectors to score client-side without a server round-trip. See the Scoring Utilities section.

Extract entities or structured data from text. Supports Named Entity Recognition (NER) models like GLiNER.

async extract(
  model: string,        // Model name (e.g., "urchade/gliner_multi-v2.1")
  items: Item | Item[], // Items to extract from
  options: {
    labels: string[],   // Entity types to extract (e.g., ["person", "org"])
    threshold?: number, // Minimum confidence (0-1)
    gpu?: string,
    waitForCapacity?: boolean,
  }
): Promise<ExtractResult | ExtractResult[]>

Returns: A single ExtractResult if a single item is passed, otherwise an array.

Example:

const result = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Tim Cook leads Apple." },
  { labels: ["person", "organization"] }
);

for (const entity of result.entities) {
  console.log(`${entity.label}: ${entity.text} (score: ${entity.score.toFixed(2)})`);
}
// Output:
// person: Tim Cook (score: 0.95)
// organization: Apple (score: 0.92)

Get available models.

async listModels(): Promise<ModelInfo[]>

Example:

const models = await client.listModels();
for (const model of models) {
  console.log(`${model.name}: ${model.outputs.join(", ")}`);
}

Get cluster capacity information.

async getCapacity(gpu?: string): Promise<CapacityInfo>

Example:

const capacity = await client.getCapacity();
console.log(`Workers: ${capacity.workerCount}, GPUs: ${capacity.liveGpuTypes}`);

// Check if L4 GPUs are available
const l4Capacity = await client.getCapacity("l4");
if (l4Capacity.workerCount > 0) {
  console.log("L4 workers available");
}

Wait for GPU capacity to become available. This is useful for pre-warming the cluster before running benchmarks.

async waitForCapacity(
  gpu: string,
  options?: {
    model?: string,        // If provided, sends a warmup encode request
    timeout?: number,      // Default: 300000ms
    pollInterval?: number, // Default: 5000ms
  }
): Promise<CapacityInfo>

Example:

// Wait for L4 capacity before running benchmarks
const capacity = await client.waitForCapacity("l4", { timeout: 300000 });
console.log(`Ready with ${capacity.workerCount} L4 workers`);
// Wait and pre-load a model
const capacityWithModel = await client.waitForCapacity("l4", { model: "BAAI/bge-m3" });

Close the client and clean up resources.

async close(): Promise<void>
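A common pattern is to pair the client with a try/finally block so resources are released even when a request throws:

const client = new SIEClient("http://localhost:8080");
try {
  const result = await client.encode("BAAI/bge-m3", { text: "Hello" });
  console.log(result.dense?.length);
} finally {
  await client.close();
}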

Input item for encode, score, and extract operations.

interface Item {
  id?: string;                        // Client-provided ID (echoed in response)
  text?: string;                      // Text content
  images?: Uint8Array[];              // Image data as byte arrays (for multimodal models)
  multivector?: Float32Array[];       // Pre-computed vectors (for client-side MaxSim)
  metadata?: Record<string, unknown>; // Custom metadata
}

Common patterns:

// Simple text
{ text: "Hello world" }

// With ID for tracking
{ id: "doc-1", text: "Document text" }

// Multimodal (for CLIP, ColPali, etc.)
{ text: "Description", images: [imageBytes] }

interface EncodeResult {
  id?: string;                  // Echoed item ID
  dense?: Float32Array;         // Dense embedding
  sparse?: SparseResult;        // Sparse embedding
  multivector?: Float32Array[]; // Per-token embeddings
  timing?: TimingInfo;          // Timing breakdown
}

interface SparseResult {
  indices: Int32Array;  // Token IDs
  values: Float32Array; // Token weights
}

interface ScoreResult {
  model?: string;       // Model used for scoring
  queryId?: string;     // Query ID (if provided in request)
  scores: ScoreEntry[]; // Sorted by score descending
}

interface ScoreEntry {
  itemId: string; // ID of the item
  score: number;  // Relevance score
  rank: number;   // Position (0 = most relevant)
}

interface ExtractResult {
  id?: string;        // Echoed item ID
  entities: Entity[]; // Extracted entities
}

interface Entity {
  text: string;    // Extracted span
  label: string;   // Entity type
  score: number;   // Confidence (0-1)
  start?: number;  // Start character offset
  end?: number;    // End character offset
  bbox?: number[]; // Bounding box [x, y, width, height] for vision models
}

interface ModelInfo {
  name: string;               // Model name/identifier
  loaded: boolean;            // Whether model weights are in memory
  inputs: string[];           // Input types: ["text"], ["text", "image"], etc.
  outputs: string[];          // Output types: ["dense"], ["dense", "sparse"], etc.
  dims?: ModelDims;           // Dimension info for each output type
  maxSequenceLength?: number; // Maximum input sequence length
}

interface CapacityInfo {
  status: string;               // "healthy", "degraded", "no_workers"
  workerCount: number;          // Number of healthy workers
  gpuCount: number;             // Number of GPUs available
  modelsLoaded: number;         // Unique models loaded across workers
  configuredGpuTypes: string[]; // GPU types configured in cluster
  liveGpuTypes: string[];       // GPU types currently running
  workers: WorkerInfo[];        // Worker details
}

interface TimingInfo {
  totalMs?: number;        // Total request time
  queueMs?: number;        // Time waiting in queue
  tokenizationMs?: number; // Tokenization time
  inferenceMs?: number;    // Model inference time
}
type OutputType = "dense" | "sparse" | "multivector";
type DType = "float32" | "float16" | "bfloat16" | "int8" | "uint8" | "binary" | "ubinary";

// Convert typed arrays to regular number arrays (for JSON serialization)
function toNumberArray(arr: Float32Array | Int32Array): number[];

// Convert number array to Float32Array
function toFloat32Array(arr: number[]): Float32Array;
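A sketch of round-tripping an embedding through JSON with these helpers (result is assumed to be an EncodeResult from an earlier encode() call):

import { toNumberArray, toFloat32Array } from "@sie/sdk";

// Typed arrays do not serialize cleanly; convert to number[] first
const plain = toNumberArray(result.dense!);
const json = JSON.stringify(plain);

// Restore a Float32Array after parsing
const restored = toFloat32Array(JSON.parse(json));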

Client-side scoring for multi-vector embeddings.

Compute MaxSim scores for ColBERT-style retrieval. MaxSim finds the maximum similarity between each query token and any document token, then sums these maximums.

function maxsim(
  query: Float32Array[],   // [numQueryTokens][dim]
  document: Float32Array[] // [numDocTokens][dim]
): number
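For intuition, a naive equivalent using dot-product similarity (illustrative only; the SDK's implementation may differ in details such as vectorization):

// Not the SDK's code: a readable re-statement of the MaxSim definition
function naiveMaxsim(query: Float32Array[], document: Float32Array[]): number {
  let total = 0;
  for (const q of query) {
    // Maximum similarity between this query token and any document token
    let best = -Infinity;
    for (const d of document) {
      let dot = 0;
      for (let i = 0; i < q.length; i++) dot += q[i] * d[i];
      if (dot > best) best = dot;
    }
    total += best; // sum the per-token maximums
  }
  return total;
}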

Example:

import { SIEClient, maxsim } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

// Encode query with isQuery=true for ColBERT models
const queryResult = await client.encode(
  "jinaai/jina-colbert-v2",
  { text: "What is ColBERT?" },
  { outputTypes: ["multivector"], isQuery: true }
);

// Encode documents (no isQuery needed for documents)
const docResults = await client.encode(
  "jinaai/jina-colbert-v2",
  documents,
  { outputTypes: ["multivector"] }
);

// Compute MaxSim scores client-side
const queryMv = queryResult.multivector!;
const scores = docResults.map((r) => maxsim(queryMv, r.multivector!));

// Rank by score (higher is more relevant)
const ranked = scores
  .map((score, idx) => ({ score, idx }))
  .sort((a, b) => b.score - a.score);

Score a query against multiple documents.

function maxsimDocuments(
  query: Float32Array[],
  documents: Float32Array[][]
): number[]
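Reusing queryMv and docResults from the example above:

// One call scores the query against every document
const docScores = maxsimDocuments(
  queryMv,
  docResults.map((r) => r.multivector!)
);
// docScores[i] is the score for documents[i]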

Batch version for multiple queries against multiple documents.

function maxsimBatch(
  queries: Float32Array[][],
  documents: Float32Array[][]
): Float32Array // Flattened [numQueries * numDocuments]
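To read an individual score out of the flattened array (assuming query-major layout, consistent with the [numQueries * numDocuments] shape; queryMvs and docMvs are hypothetical arrays of multivectors):

const flat = maxsimBatch(queryMvs, docMvs);
const numDocs = docMvs.length;

// Score of query q against document d
const scoreAt = (q: number, d: number) => flat[q * numDocs + d];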

Exception hierarchy for SDK errors.

Base class for all SDK errors.

class SIEError extends Error {
  name: "SIEError";
}

Cannot connect to server.

class SIEConnectionError extends SIEError {
  name: "SIEConnectionError";
}

Invalid request (4xx responses).

class RequestError extends SIEError {
  name: "RequestError";
  code?: string;
  statusCode?: number;
}

Server error (5xx responses).

class ServerError extends SIEError {
  name: "ServerError";
  code?: string;
  statusCode?: number;
}

No capacity available or timeout waiting for scale-up.

class ProvisioningError extends SIEError {
  name: "ProvisioningError";
  gpu?: string;
  retryAfter?: number;
}

Resource pool operation failed.

class PoolError extends SIEError {
  name: "PoolError";
  poolName?: string;
  state?: string;
}

LoRA adapter loading timeout.

class LoraLoadingError extends SIEError {
  name: "LoraLoadingError";
  lora?: string;
  model?: string;
}
Example:

import { SIEClient, RequestError, ProvisioningError } from "@sie/sdk";

const client = new SIEClient("http://localhost:8080");

try {
  const result = await client.encode("unknown-model", { text: "test" });
} catch (error) {
  if (error instanceof RequestError) {
    console.log(`Invalid request: ${error.code} (${error.statusCode})`);
  } else if (error instanceof ProvisioningError) {
    console.log(`No capacity for GPU ${error.gpu}, retry after ${error.retryAfter}ms`);
  }
}

For cluster deployments with multiple GPU types, specify the target GPU:

// Per-request GPU selection
const result = await client.encode(
"BAAI/bge-m3",
items,
{ gpu: "a100-80gb" }
);
// Default GPU for all requests
const client = new SIEClient("http://router.example.com", {
gpu: "l4"
});

Available GPU types depend on your cluster configuration.
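You can discover what the cluster exposes at runtime via getCapacity():

const capacity = await client.getCapacity();
console.log(`Configured: ${capacity.configuredGpuTypes}`);
console.log(`Live: ${capacity.liveGpuTypes}`);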


Create isolated worker sets for testing or tenant isolation:

import { SIEClient } from "@sie/sdk";
const client = new SIEClient("http://router.example.com");
await client.createPool("my-test-pool", { l4: 2, "a100-40gb": 1 });
// Route requests to the pool
const result = await client.encode(
"BAAI/bge-m3",
items,
{ gpu: "my-test-pool/l4" }
);
// Check pool status
const pool = await client.getPool("my-test-pool");
console.log(`Pool state: ${pool?.status.state}`);
console.log(`Workers: ${pool?.status.assignedWorkers.length}`);
// Clean up
await client.deletePool("my-test-pool");
await client.close();

A complete example combining embedding, reranking, and extraction:

import { SIEClient } from "@sie/sdk";

// Initialize client
const client = new SIEClient("http://localhost:8080", { timeout: 60000 });

// Dense embeddings
const documents = [
  "Machine learning is a subset of artificial intelligence.",
  "Python is a popular programming language.",
  "Neural networks are inspired by the human brain.",
];
const embeddings = await client.encode(
  "BAAI/bge-m3",
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Store in vector database
for (const result of embeddings) {
  if (result.dense) {
    // vectorDb.insert(result.id, result.dense);
    console.log(`Stored ${result.id}: ${result.dense.length} dimensions`);
  }
}

// Query with reranking
const query = { text: "What is machine learning?" };

// Stage 1: Vector search
const queryEmb = await client.encode("BAAI/bge-m3", query, { isQuery: true });
// const candidates = await vectorDb.search(queryEmb.dense, { topK: 100 });

// Stage 2: Rerank (using documents directly for this example)
const rerankResult = await client.score(
  "BAAI/bge-reranker-v2-m3",
  query,
  documents.map((text, i) => ({ id: `doc-${i}`, text }))
);

// Top results
console.log("\nTop results:");
for (const entry of rerankResult.scores.slice(0, 3)) {
  console.log(`  ${entry.rank + 1}. ${entry.itemId} (score: ${entry.score.toFixed(3)})`);
}

// Entity extraction
const extractResult = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: "Elon Musk founded SpaceX and leads Tesla." },
  { labels: ["person", "organization"] }
);
console.log("\nExtracted entities:");
for (const entity of extractResult.entities) {
  console.log(`  ${entity.label}: ${entity.text}`);
}

// Clean up
await client.close();