API Reference

All application API endpoints are under /api/v1/. Backend execution calls authenticate with an API key in the Authorization header. Dashboard control-plane endpoints use a session cookie managed by the dashboard itself.

Application code should always use API keys — they are project-scoped, traceable, and independently revocable. Session credentials are for the dashboard UI and are not intended for backend services or automated scripts.

Authorization: Bearer ak_<key_id>.<secret>   # API key — backend / execution

See Authentication for credential types, key format, and security practices.
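
As a minimal sketch of the header shown above, the helpers below validate the key shape and build the `Authorization` value. The function names are hypothetical, and splitting on the first `.` is an assumption based on the `ak_<key_id>.<secret>` layout rather than documented SDK behaviour.

```python
def split_api_key(raw_key: str) -> tuple[str, str]:
    """Split a full key into (key_id, secret).

    Assumes the layout ak_<key_id>.<secret>, with the first "." as separator.
    """
    if not raw_key.startswith("ak_") or "." not in raw_key:
        raise ValueError("expected a key shaped like ak_<key_id>.<secret>")
    key_id, secret = raw_key.split(".", 1)
    return key_id, secret


def auth_headers(raw_key: str) -> dict[str, str]:
    """Build the request headers for a backend / execution call."""
    split_api_key(raw_key)  # validate the shape before sending it anywhere
    return {"Authorization": f"Bearer {raw_key}"}
```

In real code the key would come from an environment variable or secrets manager, never from a literal in source.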

How it works — the trace model

Before diving into endpoints, it helps to understand how Olyx structures work.

Every AI call belongs to a trace. A trace is a container for one logical unit of work — a user message, a background job, an agentic task. Inside a trace are steps: each model call, safety check, tool invocation, or log event is recorded as a step. This structure is what lets Olyx calculate cost, latency, grades, and security flags per request.

The typical lifecycle looks like this:

flowchart LR
  TRACE[POST /TRACES]
  EXEC[POST /EXECUTIONS]
  MORE[OPTIONAL MORE STEPS]
  COMPLETE[PATCH /TRACES/:ID/COMPLETE]
  SUMMARY[TRACE SUMMARY]
  TRACE --> EXEC --> MORE --> COMPLETE --> SUMMARY
  EXEC --> COMPLETE

POST /api/v1/traces creates the trace and returns a lightweight trace object.
POST /api/v1/executions runs the AI call, records a step, and returns output.
Repeat execution/log/check steps for multi-turn or agentic flows.
PATCH /api/v1/traces/:id/complete seals the trace and triggers grading.

You always create a trace first, then attach execution calls to it via trace_id. Think of it like opening a tab at a restaurant, ordering items, and then closing the tab at the end.
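
The lifecycle above can be sketched as an ordered list of requests. This example never touches the network: `lifecycle_requests` is a hypothetical helper, and the trace ID is passed in because in practice it comes from the `POST /api/v1/traces` response.

```python
import json


def lifecycle_requests(trace_id: str, inputs: list[str]) -> list[tuple[str, str, str]]:
    """Return (method, path, json_body) tuples for one trace lifecycle."""
    # 1. Open the trace (metadata and revenue are optional, so the body is empty here).
    requests = [("POST", "/api/v1/traces", json.dumps({}))]
    # 2. Attach one execution per input, each carrying the trace_id.
    for text in inputs:
        body = json.dumps({"trace_id": trace_id, "input": text})
        requests.append(("POST", "/api/v1/executions", body))
    # 3. Seal the trace, which triggers grading.
    requests.append(("PATCH", f"/api/v1/traces/{trace_id}/complete", ""))
    return requests


for method, path, body in lifecycle_requests("550e8400-e29b-41d4-a716-446655440000", ["Hello"]):
    print(method, path, body)
```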

Step types

Type	What it records
`check`	A safety-check-only call (PII, injection, secrets).
`run`	A model call — stores input, output, model, cost, latency.
`tool_call`	A tool invoked by the model during an agentic step.
`log`	A custom event you pushed with `POST /api/v1/logs` — user rating, A/B label, etc.

Projects

All project endpoints require a session token.

A project is an isolated environment with its own API keys, routing rules, model registry, and spend limits. Use separate projects for separate environments (production, staging) or separate products.

Method	Path	Description
`GET`	`/api/v1/projects`	List projects visible to your role.
`POST`	`/api/v1/projects`	Create a project (admin/owner only).
`GET`	`/api/v1/projects/:id`	Get a single project.
`PATCH`	`/api/v1/projects/:id`	Update name, description, status, or settings.
`DELETE`	`/api/v1/projects/:id`	Delete a project and its related project data (admin/owner only).

Create a project

POST /api/v1/projects
Content-Type: application/json

{
  "project": {
    "name": "Production",
    "description": "Live traffic"
  }
}

Response fields

Field	Type	Description
`id`	integer	Project ID.
`name`	string	Project name.
`description`	string	Optional description.
`status`	string	`active` or `archived`.
`settings`	object	Routing tiers, shadow model config.
`total_monthly_spend`	float	Cumulative spend this month (USD).
`created_at`	ISO8601	Creation timestamp.

Update routing tiers

Routing tiers tell Olyx which model to use for different kinds of requests. The gateway classifies each request as simple, complex, or secure based on content and sends it to the right model automatically — you don’t need to pick a model per request.

PATCH merges the settings you send into the stored settings; keys you omit are preserved.

PATCH /api/v1/projects/:id
Content-Type: application/json

{
  "project": {
    "settings": {
      "routing_tiers": {
        "secure":  "my-private-model",
        "simple":  "gpt-4o-mini",
        "complex": "gpt-4o"
      }
    }
  }
}

Tier	When it applies
`simple`	Short, low-stakes prompts. Routed to your cheapest model.
`complex`	Long context, reasoning-heavy, or multi-step prompts. Routed to your most capable model.
`secure`	Requests flagged as sensitive. Point this to a selected private-agent or internal model route when one is configured.
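
One way to picture the merge behaviour of the settings PATCH is the sketch below. It only illustrates "existing keys are preserved": whether the server merges nested objects recursively is an assumption, and `merge_settings` is a hypothetical helper, not part of any Olyx SDK.

```python
def merge_settings(existing: dict, patch: dict) -> dict:
    """Return a new dict with patch merged into existing, recursing into nested dicts."""
    merged = dict(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


current = {"routing_tiers": {"simple": "gpt-4o-mini", "complex": "gpt-4o"}}
patch = {"routing_tiers": {"secure": "my-private-model"}}
print(merge_settings(current, patch))
```

Under that assumption, sending only the `secure` tier leaves the `simple` and `complex` tiers untouched.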

Model Registry

All model registry endpoints require a session token.

The model registry is where you tell Olyx about every AI model your projects can use — including private or self-hosted models. Olyx needs this so it can route to the right endpoint, track costs accurately, and apply per-model retention policies.

Method	Path	Description
`GET`	`/api/v1/projects/:project_id/models`	List model definitions.
`POST`	`/api/v1/projects/:project_id/models`	Add a model.
`GET`	`/api/v1/projects/:project_id/models/:model_id`	Get a model definition.
`PATCH`	`/api/v1/projects/:project_id/models/:model_id`	Update a model definition.
`DELETE`	`/api/v1/projects/:project_id/models/:model_id`	Remove a model.

Add a model

POST /api/v1/projects/:project_id/models
Content-Type: application/json

{
  "model_definition": {
    "name": "Internal Llama 3",
    "identifier": "my-llama-3",
    "provider": "openai",
    "base_url": "https://llm.internal.corp/v1",
    "api_key": "sk-...",
    "is_public": false,
    "input_cost_per_1k": 0.0002,
    "output_cost_per_1k": 0.0002,
    "data_retention_days": 7,
    "region_restriction": "us-east",
    "currency": "USD"
  }
}

All fields

Field	Type	Description
`name`	string	Human-readable label shown in the dashboard.
`identifier`	string	The string you pass as `model` in API calls. Must be unique per project. This is the key you reference in routing tier config.
`provider`	string	`openai`, `anthropic`, `gemini`, `bedrock`, `azure`, or `internal`. Selects the provider adapter and wire format. Use `openai` for any model that speaks the OpenAI API format — including Groq, vLLM, Ollama, and LM Studio. Use `azure` for Azure OpenAI deployments, which use a different auth header and deployment-based URL.
`base_url`	string	The HTTP endpoint Olyx sends requests to. For hosted models like `gpt-4o` this is already known — only needed for custom or private endpoints.
`api_key`	string	Stored encrypted. Never returned in responses — use `has_api_key` to check if one is stored.
`is_public`	boolean	`false` marks the model as private/internal. Use it with the private-agent route when the provider endpoint is not publicly reachable.
`input_cost_per_1k`	float	Cost per 1,000 prompt tokens (USD). Used for cost tracking and spend limit enforcement.
`output_cost_per_1k`	float	Cost per 1,000 completion tokens (USD).
`data_retention_days`	integer	Days before trace data associated with this model is purged. Default: 30.
`region_restriction`	string	Optional region tag (e.g. `us-east`, `eu-west`). Default: `global`.
`currency`	string	3-letter currency code for cost tracking. Default: `CAD`.
`additional_config`	object	JSONB bag — e.g. `{ "infrastructure": "internal", "fallback_identifier": "gpt-4o-mini" }`.
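
To make the per-1,000-token pricing fields concrete, here is a small arithmetic sketch. The exact formula Olyx applies is not stated in this reference; this is the natural reading of "cost per 1,000 tokens", and `step_cost` is a hypothetical name.

```python
def step_cost(prompt_tokens: int, completion_tokens: int,
              input_cost_per_1k: float, output_cost_per_1k: float) -> float:
    """Cost of one model call, in the model's configured currency."""
    input_cost = (prompt_tokens / 1000) * input_cost_per_1k
    output_cost = (completion_tokens / 1000) * output_cost_per_1k
    return input_cost + output_cost


# 1,500 prompt tokens and 500 completion tokens at 0.0002 per 1k each.
print(round(step_cost(1500, 500, 0.0002, 0.0002), 6))
```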

Read-only response fields

Field	Type	Description
`has_api_key`	boolean	Whether a key is stored. The value is never returned.
`fallback_identifier`	string	Shortcut read of `additional_config.fallback_identifier`.

API Keys

All key endpoints require a session token. Keys are project-scoped in the current closed beta.

Method	Path	Description
`GET`	`/api/v1/keys`	List keys for your organization.
`POST`	`/api/v1/keys`	Create a project-scoped key.
`PATCH`	`/api/v1/keys/:id`	Update status or hourly limit.
`DELETE`	`/api/v1/keys/:id`	Revoke a key.

Create a key

POST /api/v1/keys
Content-Type: application/json

{
  "name": "Production",
  "project_id": 12,
  "hourly_limit": 10.00
}

project_id is required. A key created for staging cannot authenticate production traces, and a production key cannot write into staging.

hourly_limit is a rolling spend cap in USD. Once a key crosses the threshold in the current hourly window, its status flips to tripped and requests with that key are rejected until an admin reviews and resets the key. This is a safety net for runaway agentic loops — set it to a value just above your expected peak, not zero.
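
The tripping behaviour can be illustrated with a local simulation. How Olyx tracks the window internally is not documented here, so treat this as a sketch under the assumption of a sliding 60-minute window; the class and method names are hypothetical.

```python
from collections import deque


class HourlySpendCap:
    """Track spend in a rolling 60-minute window and trip once the cap is crossed."""

    WINDOW_SECONDS = 3600

    def __init__(self, hourly_limit: float):
        self.hourly_limit = hourly_limit
        self.status = "active"
        self._events = deque()  # (timestamp_seconds, cost) pairs, oldest first

    def record(self, timestamp: float, cost: float) -> str:
        """Record one charge and return the resulting key status."""
        if self.status == "tripped":
            return self.status  # stays tripped until reset() is called
        self._events.append((timestamp, cost))
        # Drop charges that have aged out of the window.
        while self._events and self._events[0][0] <= timestamp - self.WINDOW_SECONDS:
            self._events.popleft()
        if sum(c for _, c in self._events) > self.hourly_limit:
            self.status = "tripped"
        return self.status

    def reset(self) -> None:
        """Mimic an admin resetting the key to active."""
        self._events.clear()
        self.status = "active"
```

A key with `hourly_limit` 10.00 stays `active` while the charges inside any one-hour span total 10.00 or less, and stays `tripped` after crossing it until `reset()` is called.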

Response — create only

{
  "id": "ak_3f8a1c...",
  "name": "Production",
  "raw_key": "ak_3f8a1c....fde29b84...",
  "masked": "ak_3f8a...84",
  "status": "active",
  "hourly_limit": 10.00,
  "project": { "id": 12, "name": "Production" },
  "created_at": "2026-04-12T09:00:00Z"
}

raw_key is the full secret. It is shown exactly once — store it immediately in an environment variable or secrets manager. It cannot be recovered after this response.

List response fields

Field	Type	Description
`id`	string	The `key_id` portion (`ak_<hex>`).
`name`	string	Label you gave the key.
`masked`	string	First 6 + last 4 characters — safe to display in UIs.
`status`	string	See below.
`hourly_limit`	float\|null	Spend cap per rolling hour (USD). `null` means no cap is configured.
`project`	object	Project the key is scoped to.
`expires_at`	ISO8601\|null	Hard expiry timestamp if set.
`loop_intervention_count`	integer	How many times an agent loop was auto-halted on this key. A non-zero value is a signal to review your agent’s stop conditions.
`last_used`	ISO8601\|null	Last authenticated request timestamp.

Key statuses

Status	What it means	What to do
`active`	Key is working normally.	—
`tripped`	Hourly spend cap was exceeded. Requests are blocked while the key remains tripped.	Review the traffic, raise the `hourly_limit` if appropriate, then reset the key status to `active`.
`loop_detected`	Olyx detected a repeating request pattern and paused the key.	Check the trace for the looping pattern, fix the caller, then reset the key status to `active`.

Traces

All trace endpoints require an API key.

Method	Path	Description
`POST`	`/api/v1/traces`	Create a trace.
`GET`	`/api/v1/traces`	List traces (newest first, paginated).
`GET`	`/api/v1/traces/:id`	Get a trace with steps and summary.
`PATCH`	`/api/v1/traces/:id/complete`	Mark a trace completed and trigger grading.

Create a trace

POST /api/v1/traces
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50
}

metadata is optional and must be a JSON object when supplied — use it to attach any context you want to see in the dashboard (user ID, session ID, feature flag, task type). revenue (USD) enables request-margin calculation: Olyx subtracts model cost from revenue after the trace is completed.

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "created_at": "2026-04-12T09:00:00Z",
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50
}

Save the id — you’ll pass it as trace_id in every subsequent execution call.

import Olyx from "@olyx-labs/olyx";

const client = new Olyx({ apiKey: process.env.OLYX_API_KEY! });

const trace = await client.traces.create({
  metadata: { user_id: "user_123", task: "translation" },
  revenue: 0.50,
});

console.log(trace.data.id); // pass to execute()

import os

import olyx

client = olyx.Olyx(api_key=os.environ["OLYX_API_KEY"])

trace = client.traces.create(
    metadata={"user_id": "user_123", "task": "translation"},
    revenue=0.50,
)

print(trace.id)  # pass to execute()

require "olyx"

client = Olyx.new(api_key: ENV.fetch("OLYX_API_KEY"))

trace = client.traces.create(
  metadata: { user_id: "user_123", task: "translation" },
  revenue: 0.50
)

puts trace.id  # pass to execute()

Complete a trace

PATCH /api/v1/traces/:id/complete
Authorization: Bearer ak_<key_id>.<secret>

Completing a trace marks the unit of work as finished, triggers grading, and returns the computed cost summary.

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2026-04-12T09:00:00Z",
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50,
  "optimization_grade": "B",
  "grades": { "overall": "B", "waste": "A", "latency": "B" },
  "total_cost": 0.00318,
  "summary": {
    "total_cost": 0.00318,
    "revenue": 0.50,
    "gross_margin": 0.49682,
    "by_model": { "gpt-4o": 0.00318 },
    "by_infrastructure": { "public_cloud": 0.00318, "private": 0.0 }
  }
}

const summary = await client.traces.complete(trace.data.id);

console.log(summary.data.optimization_grade); // "B"
console.log(summary.data.total_cost);         // 0.00318

summary = client.traces.complete(trace.id)

print(summary.optimization_grade)  # "B"
print(summary.total_cost)          # 0.00318

summary = client.traces.complete(trace.id)

puts summary.optimization_grade  # "B"
puts summary.total_cost          # 0.00318

Get a trace

GET /api/v1/traces/:id
Authorization: Bearer ak_<key_id>.<secret>

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2026-04-12T09:00:00Z",
  "summary": {
    "total_latency_ms": 1240.5,
    "total_cost": 0.00318,
    "revenue": 0.50,
    "gross_margin": 0.49682,
    "latency_p50": 870.0,
    "latency_p95": 1200.0,
    "latency_p99": 1238.0,
    "optimization_grade": "B",
    "grades": { "overall": "B", "waste": "A", "latency": "B" },
    "step_count": 2,
    "chain_depth": 0,
    "tool_overhead_ms": null,
    "stall_probability": 0.02
  },
  "security": {
    "pii_detected": false,
    "injection_attempt": false,
    "secret_leaked": false,
    "tool_fidelity_score": 0.98,
    "shadow_score": null
  },
  "routing_decision": { "tier": "simple", "model": "gpt-4o" },
  "steps": [
    { "type": "check", "output": { "allowed": true }, "latency_ms": 12.4 },
    { "type": "run",   "model": "gpt-4o", "output": "Bonjour le monde.", "latency_ms": 1228.1, "cost": 0.00318 }
  ]
}

Understanding summary fields

Field	What it means
`total_latency_ms`	Wall-clock time from trace creation to the last step completing.
`total_cost`	Sum of token costs across all `run` steps (USD).
`gross_margin`	`revenue − total_cost`. How much profit this request generated.
`latency_p50 / p95 / p99`	The 50th, 95th, and 99th percentile step latencies in this trace. P95 is a good number to watch — it captures slow outlier steps without being skewed by the absolute worst case.
`optimization_grade`	A–F letter grade. See grade table below.
`grades`	Grade broken down by dimension: `overall`, `waste` (over-spending on too-capable a model), `latency` (slow steps).
`chain_depth`	How many nested tool-call cycles occurred. 0 = no tool calls. Deeper chains mean more latency and cost.
`tool_overhead_ms`	Milliseconds spent waiting on tool execution (not model time). `null` if no tools were called.
`stall_probability`	A 0–1 score of how likely this trace would have stalled or looped if left unchecked. Scores above 0.7 are flagged in the dashboard.
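
The `gross_margin` arithmetic from the table above, worked through with the sample values. `Decimal` is used here only to avoid binary floating-point noise in the illustration; it says nothing about how Olyx stores money values.

```python
from decimal import Decimal


def gross_margin(revenue: str, total_cost: str) -> Decimal:
    """revenue - total_cost, computed exactly on decimal strings."""
    return Decimal(revenue) - Decimal(total_cost)


print(gross_margin("0.50", "0.00318"))  # 0.49682
```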

Optimization grades

Grades reflect how efficiently this trace used the available models relative to the complexity of the task.

Grade	What it means	What to do
A	Optimal model selection, minimal latency, no waste.	Nothing — this is the target.
B	Slightly over-engineered — a simpler model probably suffices.	Consider moving this task pattern to the `simple` routing tier.
C	Noticeable waste or latency. Two or more sub-optimal steps.	Review routing configuration and check for redundant steps.
D	Significant over-spend or slow chain. Multiple issues.	Investigate which steps are slow or expensive and refactor the prompt or model selection.
F	Severely inefficient — likely a loop, runaway chain, or wrong model for every step.	Check for agent loops (`chain_depth` > 5), missing stop conditions, or misconfigured routing.

Understanding security fields

Field	What it means
`pii_detected`	One or more inputs contained recognisable personal data (email, phone, name, etc.). The data was flagged but not modified at this level — use the `secure` routing tier to prevent it from reaching public models.
`injection_attempt`	Input matched a prompt injection pattern — an attempt to override the system prompt or exfiltrate data via the model.
`secret_leaked`	An API key, password, or credential pattern was detected in the input or output.
`tool_fidelity_score`	0–1. How closely the model’s tool calls matched valid tool schemas. A score below 0.8 means the model is hallucinating tool names or arguments — review your tool definitions.
`shadow_score`	A comparison score from a shadow model run, if configured. `null` if shadow mode is not enabled.
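
The thresholds quoted in the tables above can be folded into a simple triage helper that works on the JSON returned by `GET /api/v1/traces/:id`. This is a sketch: `review_flags` is a hypothetical name, and treating grades D and F as worth flagging is an assumption, not a documented rule.

```python
def review_flags(trace: dict) -> list[str]:
    """Return human-readable warnings for one trace response body."""
    summary = trace.get("summary", {})
    security = trace.get("security", {})
    flags = []
    if summary.get("stall_probability", 0) > 0.7:
        flags.append("stall_probability above 0.7")
    if summary.get("chain_depth", 0) > 5:
        flags.append("chain_depth above 5")
    if summary.get("optimization_grade") in ("D", "F"):
        flags.append("optimization grade D or F")
    fidelity = security.get("tool_fidelity_score")
    if fidelity is not None and fidelity < 0.8:
        flags.append("tool_fidelity_score below 0.8")
    # Boolean security fields are reported by name when set.
    for field in ("pii_detected", "injection_attempt", "secret_leaked"):
        if security.get(field):
            flags.append(field)
    return flags
```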

List traces — query parameters

Parameter	Description
`page`	Page number (default 1).
`per_page`	Results per page (default 50, max 100).
`status`	Filter by `pending`, `completed`, `replay`, or `failed`.
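
A short sketch of building the list query from these parameters with the standard library. `list_traces_path` is a hypothetical helper; clamping `per_page` on the client is a choice made for this example, since the server's response to values above 100 is not described here.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "completed", "replay", "failed"}


def list_traces_path(page: int = 1, per_page: int = 50, status: Optional[str] = None) -> str:
    """Build the path and query string for GET /api/v1/traces."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status filter: {status}")
    params = {"page": max(1, page), "per_page": min(max(1, per_page), 100)}
    if status is not None:
        params["status"] = status
    return "/api/v1/traces?" + urlencode(params)


print(list_traces_path(page=2, status="failed"))
```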

Execution

All execution endpoints require an API key.

Which endpoint should I use?

Situation	Use
Standard prompt → response	`POST /api/v1/executions`
You need tokens to stream in real time	`GET /api/v1/executions/stream`
Existing OpenAI SDK and you want to drop Olyx in	`POST /api/v1/chat/completions`
You need vector embeddings with guardrails	`POST /api/v1/embeddings`
Multi-turn conversation with memory	`POST /api/v1/assistants`
Image generation	`POST /api/v1/images/generations`
You need a specific model, no smart routing	`POST /api/v1/runs`
Check if input is safe before doing anything	`POST /api/v1/checks`
Test what model routing would pick	`POST /api/v1/simulate`

Full pipeline — recommended

Handles safety check, smart routing, model call, and cost recording in one request.

POST /api/v1/executions
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "input": "Translate to French: Hello, world."
}

Optional fields

Field	Type	Description
`tools`	array	Tool definitions in OpenAI function-calling format. Translated per-provider automatically.
`tool_results`	array	Results from a prior `tool_calls_pending` response. See the tool call loop below.
`parent_step_id`	integer	The `step_id` of the prior `tool_call` step. Required when sending `tool_results`.

Response — text

{ "output": "Bonjour le monde.", "model": "gpt-4o", "step_id": 42 }

Response — tool calls pending

{
  "tool_calls": [{ "id": "call_1", "name": "get_weather", "arguments": { "city": "London" } }],
  "step_id": 43,
  "status": "tool_calls_pending"
}

Response — blocked

{ "reason": "No private model configured for sensitive routing", "step_id": 44 }

A blocked response means the safety layer stopped the request before any model was called. The reason tells you what triggered it (PII, injection, missing secure model, etc.). No tokens were spent.

const result = await client.execute({
  traceId: trace.data.id,
  input: "Translate to French: Hello, world.",
});

if (result.data.output) {
  console.log(result.data.output); // "Bonjour le monde."
  console.log(result.data.model);  // "gpt-4o"
}

result = client.execute(
    trace_id=trace.id,
    input="Translate to French: Hello, world.",
)

if result.output:
    print(result.output)  # "Bonjour le monde."
    print(result.model)   # "gpt-4o"

result = client.execute(
  trace_id: trace.id,
  input: "Translate to French: Hello, world."
)

if result.output
  puts result.output  # "Bonjour le monde."
  puts result.model   # "gpt-4o"
end
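
Because the executions endpoint can answer with three different shapes (text, tool calls pending, blocked), calling code usually branches on the body. The sketch below works on the raw JSON dictionaries shown above; `classify_execution` is a hypothetical helper and the detection order is an assumption based only on those sample responses.

```python
def classify_execution(body: dict) -> str:
    """Return "tool_calls_pending", "text", or "blocked" for an executions response body."""
    if body.get("status") == "tool_calls_pending":
        return "tool_calls_pending"
    if "output" in body:
        return "text"
    if "reason" in body:
        return "blocked"
    raise ValueError("unrecognised response shape")


print(classify_execution({"output": "Bonjour le monde.", "model": "gpt-4o", "step_id": 42}))
```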

The tool call loop — step by step

When a model needs to call a tool (e.g. look up the weather, query a database), the flow involves multiple back-and-forth requests. Here is the complete pattern:

Step 1 — Send the initial request with your tool definitions:

POST /api/v1/executions
{
  "trace_id":  "...",
  "input":     "What's the weather in London right now?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}

Step 2 — The model responds asking you to call the tool:

{
  "status": "tool_calls_pending",
  "step_id": 43,
  "tool_calls": [{ "id": "call_1", "name": "get_weather", "arguments": { "city": "London" } }]
}

Step 3 — You run the tool in your own code:

weather = get_weather("London")   # → "15°C, partly cloudy"

Step 4 — Send the result back, linking to step 43 via parent_step_id:

POST /api/v1/executions
{
  "trace_id":       "...",
  "input":          "What's the weather in London right now?",
  "parent_step_id": 43,
  "tool_results": [
    { "tool_call_id": "call_1", "content": "15°C, partly cloudy" }
  ]
}

Step 5 — The model now has the tool result and gives you the final answer:

{ "output": "It's 15°C and partly cloudy in London right now.", "step_id": 44 }

The model may request multiple tools in one response, or chain tool calls across multiple rounds — keep looping until the response has an output instead of tool_calls_pending.

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

let result = await client.execute({ traceId: trace.data.id, input: "Weather in London?", tools });

while (result.data.status === "tool_calls_pending") {
  const toolResults = result.data.tool_calls.map((call) => ({
    toolCallId: call.id,
    content: get_weather(call.arguments.city), // your function
  }));
  result = await client.execute({
    traceId: trace.data.id,
    input: "Weather in London?",
    parentStepId: result.data.step_id,
    toolResults,
  });
}

console.log(result.data.output);

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

result = client.execute(trace_id=trace.id, input="Weather in London?", tools=tools)

while result.status == "tool_calls_pending":
    tool_results = [
        {"tool_call_id": c["id"], "content": get_weather(c["arguments"]["city"])}
        for c in result.tool_calls
    ]
    result = client.execute(
        trace_id=trace.id,
        input="Weather in London?",
        parent_step_id=result.step_id,
        tool_results=tool_results,
    )

print(result.output)

tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"]
      }
    }
  }
]

result = client.execute(trace_id: trace.id, input: "Weather in London?", tools: tools)

while result.status == "tool_calls_pending"
  tool_results = result.tool_calls.map do |call|
    { tool_call_id: call[:id], content: get_weather(call[:arguments][:city]) }
  end
  result = client.execute(
    trace_id: trace.id,
    input: "Weather in London?",
    parent_step_id: result.step_id,
    tool_results: tool_results
  )
end

puts result.output
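
When several tools are registered, the loop needs to dispatch each pending call by name, and a round limit guards against runaway chains. The sketch below shows one way to do that against plain dictionaries shaped like the JSON responses above. `run_tool_loop`, `handlers`, `max_rounds`, and the `fake_execute` stub are all hypothetical; a real integration would pass a function that calls `POST /api/v1/executions`.

```python
def run_tool_loop(execute, trace_id, user_input, tools, handlers, max_rounds=5):
    """Drive the tool call loop until the response carries an output.

    execute  -- callable taking keyword arguments and returning a response dict
    handlers -- maps each tool name to the local function that implements it
    """
    result = execute(trace_id=trace_id, input=user_input, tools=tools)
    rounds = 0
    while result.get("status") == "tool_calls_pending":
        if rounds == max_rounds:
            raise RuntimeError("tool loop did not finish within max_rounds")
        rounds += 1
        # Run every requested tool and pair each result with its call id.
        tool_results = [
            {"tool_call_id": call["id"], "content": handlers[call["name"]](**call["arguments"])}
            for call in result["tool_calls"]
        ]
        result = execute(
            trace_id=trace_id,
            input=user_input,
            parent_step_id=result["step_id"],
            tool_results=tool_results,
        )
    return result


def fake_execute(**kwargs):
    """Stub standing in for the API: asks for one tool call, then answers."""
    if "tool_results" not in kwargs:
        return {
            "status": "tool_calls_pending",
            "step_id": 43,
            "tool_calls": [{"id": "call_1", "name": "get_weather", "arguments": {"city": "London"}}],
        }
    return {"output": "Weather: " + kwargs["tool_results"][0]["content"], "step_id": 44}


final = run_tool_loop(fake_execute, "trace-1", "Weather in London?", [],
                      {"get_weather": lambda city: "15°C, partly cloudy"})
print(final["output"])
```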

Streaming

Stream output token-by-token. Returns text/event-stream in the OpenAI chat.completion.chunk wire format. Each server-sent event is a JSON object on a data: line; the stream terminates with data: [DONE].

Use streaming when you want text to appear in the UI as the model generates it, rather than waiting for the full response.

GET /api/v1/executions/stream?trace_id=550e8400-...&input=Translate+to+French%3A+Hello
Authorization: Bearer ak_<key_id>.<secret>

Note: this is a GET request, not POST. The input and trace_id go in the query string (URL-encoded). Use encodeURIComponent or your HTTP library’s query builder — never build query strings by hand.
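
For instance, Python's standard library can build the encoded query string. `stream_path` is a hypothetical helper; only `urllib.parse.urlencode` is a real library call.

```python
from urllib.parse import urlencode


def stream_path(trace_id: str, text: str) -> str:
    """Build the path and URL-encoded query for the streaming endpoint."""
    return "/api/v1/executions/stream?" + urlencode({"trace_id": trace_id, "input": text})


print(stream_path("550e8400-e29b-41d4-a716-446655440000", "Translate to French: Hello"))
```

`urlencode` turns spaces into `+` and the colon into `%3A`, which matches the encoding used in the request line above.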

Event format

Each line that starts with data: is one chunk. Extract the delta.content field from each chunk and append it to your output buffer.

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{"content":"Bon"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{"content":"jour"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
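
The extraction rule can be sketched in a few lines of Python. The function works on any iterable of already-decoded lines, so it is independent of the HTTP client; `collect_stream_text` is a hypothetical name.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate delta.content from an iterable of server-sent event lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The final chunk has an empty delta, so content may be missing.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


events = [
    'data: {"choices":[{"delta":{"content":"Bon"},"index":0,"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":"jour"},"index":0,"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(events))  # Bonjour
```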

Consuming with curl

curl -N "https://olyx.ai/api/v1/executions/stream?trace_id=550e8400-...&input=Hello" \
  -H "Authorization: Bearer ak_..."

-N disables buffering so chunks print as they arrive.

Consuming with the Fetch API (server-side Node 18+)

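const apiKey = process.env.OLYX_API_KEY;
// Build the query string with URLSearchParams so values are URL-encoded.
const params = new URLSearchParams({ trace_id: '...', input: 'Hello' });
// Server-side fetch needs an absolute URL; the host matches the curl example above.
const res = await fetch(`https://olyx.ai/api/v1/executions/stream?${params}`, {
  headers: { Authorization: `Bearer ${apiKey}` },
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep a partial trailing line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6));     // strip the "data: " prefix
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}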

Do not expose an Olyx API key in browser code. For browser streaming, call your own backend route and have that route connect to Olyx server-side.

The trace_id must be a trace you created beforehand. The step is recorded and costs accrue the same as a non-streaming execution — the only difference is delivery mode.

Chat completions — OpenAI-compatible

PII scrubbing and MCP tool shimming applied on every call.

Use this endpoint when you have existing code written against the OpenAI API and want to route it through Olyx without rewriting the call sites. The request shape is the OpenAI chat/completions format plus a trace_id.

POST /api/v1/chat/completions
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "My email is user@example.com. Can you help?" }
  ],
  "model": "gpt-4o"
}

model is optional — the gateway routes to the project-configured model if omitted. PII in messages is redacted before any provider sees the content. Tool definitions in MCP inputSchema, flat hash, or OpenAI format are normalised automatically.

Embeddings

Anti-leak guardrail applied before any embedding is generated.

Embeddings convert text into a vector that captures semantic meaning. They’re used for similarity search, RAG (retrieval-augmented generation), and clustering. The Olyx embeddings endpoint adds a guardrail before the model call: if the input contains a secret or sensitive data, the request is blocked and no data leaves the gateway.

The model identifier determines which provider handles the request. Supported embedding providers: OpenAI (text-embedding-3-small, text-embedding-3-large), AWS Bedrock (amazon.titan-embed-text-v2:0, cohere.embed-english-v3), Google Gemini (text-embedding-004), and any OpenAI-compatible internal endpoint configured in the registry. Unsupported providers return HTTP 422.

POST /api/v1/embeddings
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "input": "The quarterly revenue exceeded expectations by 12%.",
  "model": "text-embedding-3-small"
}

input may be a single string or an array of strings. model defaults to text-embedding-3-small. Pass a Bedrock or Gemini model identifier to route to those providers without changing any other request field.

Response

{ "embeddings": [[0.0023, -0.0048, "..."]], "model": "text-embedding-3-small", "usage_tokens": 12, "step_id": 55 }

Blocked (403)

{ "error": "Input blocked by data-ingestion guardrail", "reason": "profanity", "step_id": 56 }
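
Once vectors come back in the `embeddings` array, similarity search reduces to comparing them. The sketch below shows plain cosine similarity in standard-library Python; it is a generic illustration and not something the Olyx API computes for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```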

Assistants — audited multi-turn

Thread auditing and budget guardrails on every turn.

Use assistants when you need audited multi-turn conversations. Send the full message history on each turn and pass parent_step_id to link turns in the trace graph; Olyx records each turn so you can inspect the conversation in the dashboard.

POST /api/v1/assistants
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "messages": [
    { "role": "system", "content": "You are a project manager." },
    { "role": "user",   "content": "Help me plan a product launch." }
  ],
  "model": "gpt-4o"
}

For subsequent turns, include the full message history (system + all prior user and assistant messages) and pass parent_step_id to link turns in the trace graph.

Response

{ "output": "Let's start with a timeline...", "model": "gpt-4o", "step_id": 61, "thread_turn": 1 }

thread_turn is the 1-based index of this turn within the trace. Turn 1 is the first user message, turn 2 is the follow-up, and so on.
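
A follow-up turn can be assembled from the previous response roughly like this. `next_turn_body` is a hypothetical helper; the field names (`trace_id`, `messages`, `parent_step_id`, `model`) are the ones used in this section.

```python
def next_turn_body(trace_id, history, assistant_output, parent_step_id, user_message, model=None):
    """Build the request body for the next assistants turn without mutating history."""
    messages = list(history)
    # Append the previous assistant reply, then the new user message.
    messages.append({"role": "assistant", "content": assistant_output})
    messages.append({"role": "user", "content": user_message})
    body = {"trace_id": trace_id, "messages": messages, "parent_step_id": parent_step_id}
    if model is not None:
        body["model"] = model
    return body


history = [
    {"role": "system", "content": "You are a project manager."},
    {"role": "user", "content": "Help me plan a product launch."},
]
body = next_turn_body("550e8400-e29b-41d4-a716-446655440000", history,
                      "Let's start with a timeline...", 61, "What comes first?")
print(len(body["messages"]))  # 4
```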

Image generations

Brand compliance and copyright protection before generation.

Supported image generation providers: OpenAI (dall-e-3, dall-e-2), AWS Bedrock Stability AI (stability.stable-diffusion-xl-v1 and variants), and AWS Bedrock Titan Image (amazon.titan-image-generator-v2:0). Unsupported providers return HTTP 422.

POST /api/v1/images/generations
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt": "A futuristic cityscape at dusk, photorealistic",
  "model": "dall-e-3",
  "size": "1024x1024",
  "n": 1
}

model defaults to dall-e-3. size defaults to 1024x1024. Pass a Bedrock model identifier to route image generation through AWS without changing any other request field.

Response

{ "images": ["https://..."], "model": "dall-e-3", "step_id": 70 }

Blocked (403)

{ "error": "Prompt blocked: copyright violation detected", "violations": [{ "type": "copyright", "term": "mickey" }], "step_id": 71 }

Run a model directly

Call a specific model by identifier, bypassing smart routing. Use this when you need deterministic model selection — e.g. a pipeline step that must always use your fine-tuned model regardless of routing tier configuration.

The difference from /api/v1/executions: routing is skipped and your explicit model value is used directly. Safety checks and cost recording still apply.

POST /api/v1/runs
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "model":    "gpt-4o",
  "input":    "Summarise this document: ..."
}

Field	Type	Description
`trace_id`	string	Required. UUID of an existing trace.
`model`	string	Required. Registry identifier — must be registered on the project.
`input`	string	Required. The prompt text.
`parent_step_id`	integer	Optional. Link this direct run to an earlier trace step.

Response

{ "output": "The document covers...", "model": "gpt-4o", "step_id": 47 }

const result = await client.runs.create({
  traceId: trace.data.id,
  model: "gpt-4o",
  input: "Summarise this document: ...",
});

console.log(result.data.output);
console.log(result.data.model); // always "gpt-4o" — no routing

result = client.runs.create(
    trace_id=trace.id,
    model="gpt-4o",
    input="Summarise this document: ...",
)

print(result.output)
print(result.model)  # always "gpt-4o" — no routing

result = client.runs.create(
  trace_id: trace.id,
  model: "gpt-4o",
  input: "Summarise this document: ..."
)

puts result.output
puts result.model  # always "gpt-4o" — no routing

Safety check only

Run the full security pipeline against an input without invoking any model. Returns the risk decision so your code can gate on it before doing other work. Useful as a pre-flight check before calling an external tool, storing user content, or branching a workflow.

The check evaluates: PII detection, prompt injection patterns, secret/credential leakage, and configured policy rules. A check step is recorded on the trace.

POST /api/v1/checks
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{ "trace_id": "550e8400-e29b-41d4-a716-446655440000", "input": "Your input text" }

Response — allowed

{
  "allowed": true,
  "step_id": 48,
  "meta": {
    "pii_detected":       false,
    "injection_attempt":  false,
    "secret_leaked":      false,
    "risk_score":         0.03
  }
}

Response — blocked (200, not 4xx)

{
  "allowed": false,
  "step_id": 49,
  "reason":  "pii_detected",
  "meta": {
    "pii_detected":      true,
    "pii_entities":      [{ "type": "email", "start": 12, "end": 30 }],
    "injection_attempt": false,
    "secret_leaked":     false,
    "risk_score":        0.91
  }
}

The response is always 200 OK — allowed: false is a normal risk decision, not an HTTP error. The HTTP status code only tells you whether the API call itself succeeded, not whether the content was safe. Check allowed in your code.

pii_entities tells you exactly where in the string the PII was found (start and end are character offsets), so you can highlight it in a UI or log it for review.
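
One way to use those offsets is to redact the detected spans before logging or storing the text. The sketch below is illustrative only: `redact` is a hypothetical helper, the sample string is invented, and it assumes `end` is exclusive (Python slice semantics), which the reference does not state.

```python
def redact(text, entities):
    """Replace each detected span with a [TYPE] placeholder.

    Assumes `end` is exclusive; adjust by one if the API turns out
    to report inclusive end offsets.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"[{entity['type'].upper()}]"
        text = text[:entity["start"]] + label + text[entity["end"]:]
    return text


sample = "Contact me: jane@example.co.uk"  # hypothetical user input
entities = [{"type": "email", "start": 12, "end": 30}]
print(redact(sample, entities))  # Contact me: [EMAIL]
```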

risk_score is a 0–1 number. Scores below 0.2 are clean, 0.2–0.7 are borderline (flagged for review), and above 0.7 are blocked by default. You can adjust the threshold in project settings.
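
If you want to treat borderline scores differently from clean ones (for example, by queueing them for review), the default bands can be expressed as a small helper. This is a sketch under assumptions: `risk_band` and its parameter names are hypothetical, and treating a score of exactly 0.7 as borderline is a guess, since the text only says "above 0.7" is blocked. The `allowed` field in the response remains the authoritative decision.

```python
def risk_band(score, review_threshold=0.2, block_threshold=0.7):
    """Map a 0-1 risk_score onto the default bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk_score out of range: {score}")
    if score < review_threshold:
        return "clean"
    if score <= block_threshold:  # boundary handling is an assumption
        return "borderline"
    return "blocked"


print(risk_band(0.03))  # clean
print(risk_band(0.45))  # borderline
print(risk_band(0.91))  # blocked
```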

JavaScript

const check = await client.checks.create({
  traceId: trace.data.id,
  input: userInput,
});

if (!check.data.allowed) {
  console.error("Blocked:", check.data.reason);
  return;
}

// safe to proceed

Python

check = client.checks.create(
    trace_id=trace.id,
    input=user_input,
)

if not check.allowed:
    print("Blocked:", check.reason)
    return

# safe to proceed

Ruby

check = client.checks.create(
  trace_id: trace.id,
  input: user_input
)

unless check.allowed
  puts "Blocked: #{check.reason}"
  return
end

# safe to proceed

Log step

Append a custom structured event to a trace without calling any model or running a safety check. Use it to record outcomes that happen outside Olyx — user ratings, A/B assignment labels, downstream system results, or any metadata you want correlated with the execution.

POST /api/v1/logs
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "output": {
    "user_rating":  5,
    "comment":      "Great translation",
    "ab_variant":   "B",
    "accepted":     true
  }
}

output accepts any JSON object. It is stored as-is on the log step and is queryable in the dashboard.

Response

{
  "step_id": 52,
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "log",
  "parent_step_id": null,
  "created_at": "2026-04-12T09:00:00Z"
}

The response is a receipt for the step Olyx recorded. Use step_id when you want to attach later steps to the same point in the trace graph, or when support needs to find the exact event in the dashboard.

The response intentionally does not echo output. Log payloads often contain user feedback, downstream identifiers, or application metadata, so Olyx stores the payload on the trace step and returns only linkage metadata from the create call.

Log steps appear in the trace step list alongside run, check, and tool_call steps and are included in optimization grade calculations when a user_rating key is present.

JavaScript

await client.logs.create({
  traceId: trace.data.id,
  output: {
    user_rating: 5,
    comment: "Great translation",
    ab_variant: "B",
    accepted: true,
  },
});

Python

client.logs.create(
    trace_id=trace.id,
    output={
        "user_rating": 5,
        "comment": "Great translation",
        "ab_variant": "B",
        "accepted": True,
    },
)

Ruby

client.logs.create(
  trace_id: trace.id,
  output: {
    user_rating: 5,
    comment: "Great translation",
    ab_variant: "B",
    accepted: true
  }
)

Simulate — dry-run routing

Resolve which model and tier Olyx would select for a given input, without invoking any model or creating an execution. Returns the routing decision, estimated cost, and fallback path.

This is useful in two scenarios:

CI/CD — verify your routing configuration is correct before deploying a change.
UX — show users which model will handle their request before they submit it.

Requires an API key and SDK mode — this endpoint is not available through the OpenAI-compatible gateway path described in Quick Start.

POST /api/v1/simulate
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "input":      "Analyse the attached contract for liability clauses",
  "project_id": 12
}

Response — resolved

{
  "status":         "resolved",
  "tier":           "complex",
  "model":          "gpt-4o",
  "estimated_cost": 0.0042,
  "fallback_path":  ["gpt-4o", "gpt-4o-mini"]
}

fallback_path shows the chain of models Olyx would try in order if the primary model fails or is unavailable.

Response — blocked

{
  "status": "blocked",
  "reason": "PII detected — would require Secure tier, no private model configured"
}

Response — unconfigured

{
  "status": "unconfigured",
  "tier":   "complex",
  "reason": "No model configured for the complex tier"
}
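
For the CI/CD scenario, the three response shapes above can be folded into a single pass/fail check. The sketch below works on already-parsed response bodies; `routing_gate` is a hypothetical helper, not part of any Olyx SDK.

```python
def routing_gate(decision, expected_model):
    """Return (ok, message) for a parsed simulate response."""
    status = decision.get("status")
    if status == "resolved":
        if decision.get("model") == expected_model:
            return True, f"routes to {expected_model} ({decision.get('tier')} tier)"
        return False, f"expected {expected_model}, got {decision.get('model')}"
    if status in ("blocked", "unconfigured"):
        # Both statuses carry a human-readable reason.
        return False, f"{status}: {decision.get('reason')}"
    return False, f"unexpected status: {status!r}"


resolved = {"status": "resolved", "tier": "complex", "model": "gpt-4o",
            "estimated_cost": 0.0042, "fallback_path": ["gpt-4o", "gpt-4o-mini"]}
unconfigured = {"status": "unconfigured", "tier": "complex",
                "reason": "No model configured for the complex tier"}

for body in (resolved, unconfigured):
    ok, message = routing_gate(body, expected_model="gpt-4o")
    print("PASS" if ok else "FAIL", message)
```

In a pipeline, a failing result would typically end the job with a non-zero exit code.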

JavaScript

const decision = await client.simulate.create({
  input: "Analyse the attached contract for liability clauses",
  projectId: 12,
});

if (decision.data.status === "resolved") {
  console.log(decision.data.tier);           // "complex"
  console.log(decision.data.model);          // "gpt-4o"
  console.log(decision.data.estimated_cost); // 0.0042
}

Python

decision = client.simulate.create(
    input="Analyse the attached contract for liability clauses",
    project_id=12,
)

if decision.status == "resolved":
    print(decision.tier)            # "complex"
    print(decision.model)           # "gpt-4o"
    print(decision.estimated_cost)  # 0.0042

Ruby

decision = client.simulate.create(
  input: "Analyse the attached contract for liability clauses",
  project_id: 12
)

if decision.status == "resolved"
  puts decision.tier            # "complex"
  puts decision.model           # "gpt-4o"
  puts decision.estimated_cost  # 0.0042
end

Replay

Re-run an existing trace with optional overrides. Requires an API key.

Why would I replay a trace? Replays answer questions like: “Would this request have been cheaper on a different model? Would it have been faster? Would the output quality have been equivalent?” Instead of running new live traffic, you replay a trace you already have — so you can compare cost and latency across models without affecting end-user experience.

Fast path vs slow path

Olyx caches replay results for one hour. If you replay the same trace with the same overrides within the TTL, you get the cached result back immediately (200 OK). If no cached result exists, the job is queued and you need to poll for it (202 Accepted).

First replay:
  POST /api/v1/replay   →   202 { job_id: "a3f9...", status: "queued" }
  GET  /api/v1/replay/a3f9...  →  { status: "running" }
  GET  /api/v1/replay/a3f9...  →  { status: "completed", comparison: {...} }

Same replay within 1 hour:
  POST /api/v1/replay   →   200 { status: "completed", comparison: {...} }  (cache hit)
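
Client code therefore has to handle both outcomes of the POST. The sketch below shows one way to do that, assuming you already have the POST status code and parsed body; `resolve_replay` and the injected `fetch_job` callable (a wrapper around GET /api/v1/replay/:job_id) are hypothetical names, and the poll limit is an arbitrary safety cap.

```python
import time


def resolve_replay(status_code, body, fetch_job, interval=1.5, max_polls=40):
    """Return the completed replay result for either path.

    status_code, body: the POST /api/v1/replay response.
    fetch_job: callable(job_id) -> dict for GET /api/v1/replay/:job_id.
    """
    if status_code == 200:      # fast path: cached result is in the body
        return body
    if status_code != 202:
        raise RuntimeError(f"unexpected status {status_code}")
    for _ in range(max_polls):  # slow path: poll the queued job
        job = fetch_job(body["job_id"])
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "replay failed"))
        time.sleep(interval)
    raise TimeoutError(f"replay {body['job_id']} not finished after {max_polls} polls")


# Stubbed GET responses standing in for real HTTP calls.
responses = iter([{"status": "running"},
                  {"status": "completed", "comparison": {}}])
result = resolve_replay(202, {"job_id": "a3f9c1d8e72b", "status": "queued"},
                        fetch_job=lambda job_id: next(responses), interval=0)
print(result["status"])  # completed
```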

POST /api/v1/replay

POST /api/v1/replay
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id":       "550e8400-e29b-41d4-a716-446655440000",
  "force_model":    "gpt-4o-mini",
  "max_cost":       0.005
}

All override fields are optional and passed flat at the top level. force_model and compare_models are mutually exclusive.

Field	Type	Description
`force_model`	string	Replace every run step with this single model. Use this to test one cheaper alternative.
`compare_models`	array	Benchmark N models simultaneously — returns a multi-model comparison table. Use this to find the best option across several candidates in one job.
`force_models`	array	Override the model list for parallel/fanout steps.
`max_cost`	float	Skip any step whose estimated cost exceeds this USD value. Useful for large traces where you only want to replay cheap steps.
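
Because the overrides are flat, optional, and partly mutually exclusive, it can help to validate the body before sending it. A minimal sketch, assuming a hypothetical `build_replay_payload` helper; whether the server rejects a body containing both `force_model` and `compare_models` is not stated here, so the check is done client-side.

```python
def build_replay_payload(trace_id, force_model=None, compare_models=None,
                         force_models=None, max_cost=None):
    """Assemble a flat POST /api/v1/replay body, omitting unset overrides."""
    if force_model is not None and compare_models is not None:
        raise ValueError("force_model and compare_models are mutually exclusive")
    overrides = {
        "force_model": force_model,
        "compare_models": compare_models,
        "force_models": force_models,
        "max_cost": max_cost,
    }
    payload = {"trace_id": trace_id}
    payload.update({k: v for k, v in overrides.items() if v is not None})
    return payload


payload = build_replay_payload(
    "550e8400-e29b-41d4-a716-446655440000",
    force_model="gpt-4o-mini",
    max_cost=0.005,
)
print(payload)
```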

Response — async (202)

{ "job_id": "a3f9c1d8e72b", "status": "queued" }

JavaScript

const job = await client.replays.create({
  traceId: trace.data.id,
  forceModel: "gpt-4o-mini",
  maxCost: 0.005,
});

// Poll until complete
let result = await client.replays.get(job.data.job_id);
while (result.data.status === "queued" || result.data.status === "running") {
  await new Promise((r) => setTimeout(r, 1500));
  result = await client.replays.get(job.data.job_id);
}

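// A failed job has no comparison, so stop before destructuring
if (result.data.status === "failed") throw new Error(`Replay failed: ${result.data.error}`);

const { source, replay } = result.data.comparison;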
console.log(`Cost: ${source.total_cost} → ${replay.total_cost}`);
console.log(`Grade: ${source.optimization_grade} → ${replay.optimization_grade}`);

Python

import time

job = client.replays.create(
    trace_id=trace.id,
    force_model="gpt-4o-mini",
    max_cost=0.005,
)

# Poll until complete
result = client.replays.get(job.job_id)
while result.status in ("queued", "running"):
    time.sleep(1.5)
    result = client.replays.get(job.job_id)

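# A failed job has no comparison, so stop before reading it
if result.status == "failed":
    raise RuntimeError(f"Replay failed: {result.error}")

src, rep = result.comparison["source"], result.comparison["replay"]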
print(f"Cost: {src['total_cost']} → {rep['total_cost']}")
print(f"Grade: {src['optimization_grade']} → {rep['optimization_grade']}")

Ruby

job = client.replays.create(
  trace_id: trace.id,
  force_model: "gpt-4o-mini",
  max_cost: 0.005
)

# Poll until complete
result = client.replays.get(job.job_id)
while %w[queued running].include?(result.status)
  sleep 1.5
  result = client.replays.get(job.job_id)
end

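# A failed job has no comparison, so stop before reading it
raise "Replay failed: #{result.error}" if result.status == "failed"

src = result.comparison[:source]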
rep = result.comparison[:replay]
puts "Cost: #{src[:total_cost]} → #{rep[:total_cost]}"
puts "Grade: #{src[:optimization_grade]} → #{rep[:optimization_grade]}"

Response — cache hit (200)

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "replay_trace_id": "replay_4a2c9f",
  "overrides":       { "force_model": "gpt-4o-mini" },
  "comparison": {
    "source": { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"], "grades": {} },
    "replay": { "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms":  870.0, "models_used": ["gpt-4o-mini"], "grades": {} }
  }
}

GET /api/v1/replay/:job_id

Poll for async job status. Call this in a loop with a short sleep (1–2 seconds) until status is "completed" or "failed". Returns one of "queued", "running", "completed", or "failed".

GET /api/v1/replay/a3f9c1d8e72b
Authorization: Bearer ak_<key_id>.<secret>

Polling example (JavaScript)

async function waitForReplay(jobId, apiKey) {
  while (true) {
    const res = await fetch(`/api/v1/replay/${jobId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    // Stop on HTTP-level errors instead of polling forever
    if (!res.ok) throw new Error(`Replay poll failed: HTTP ${res.status}`);
    const data = await res.json();
    if (data.status === 'completed') return data;
    if (data.status === 'failed')    throw new Error(data.error);
    await new Promise(r => setTimeout(r, 1500));  // wait 1.5s between polls
  }
}

Response — completed, single-model

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "replay_trace_id": "replay_4a2c9f",
  "comparison": {
    "source": { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"],      "grades": {} },
    "replay": { "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms":  870.0, "models_used": ["gpt-4o-mini"], "grades": {} }
  }
}

Reading the comparison: source is the original production run. replay is the re-run with your override. Compare total_cost and total_latency_ms to decide if the cheaper model is worth switching to.
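
To make that comparison concrete, the deltas can be computed directly from the `comparison` object. The sketch below uses the figures from the sample response; `compare_runs` is a hypothetical helper and the output keys are illustrative names, not API fields.

```python
def compare_runs(comparison):
    """Summarise cost and latency change between source and replay."""
    source, replay = comparison["source"], comparison["replay"]
    cost_saving = source["total_cost"] - replay["total_cost"]
    latency_saving = source["total_latency_ms"] - replay["total_latency_ms"]
    # Guard against a zero-cost source to avoid dividing by zero.
    pct = 100 * cost_saving / source["total_cost"] if source["total_cost"] else 0.0
    return {
        "cost_saving_pct": round(pct, 1),
        "latency_saving_ms": round(latency_saving, 1),
        "grade_change": f"{source['optimization_grade']} -> {replay['optimization_grade']}",
    }


comparison = {
    "source": {"total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0},
    "replay": {"total_cost": 0.003, "optimization_grade": "A", "total_latency_ms": 870.0},
}
print(compare_runs(comparison))
# {'cost_saving_pct': 75.0, 'latency_saving_ms': 370.0, 'grade_change': 'B -> A'}
```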

Response — completed, multi-model (compare_models)

When you pass compare_models, the result has a replays array instead of a single replay object.

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "comparison": {
    "source":  { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"], "grades": {} },
    "replays": [
      { "model": "gpt-4o-mini",               "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms": 870.0,  "models_used": ["gpt-4o-mini"],               "grades": {} },
      { "model": "gpt-3.5-turbo",             "total_cost": 0.001, "optimization_grade": "A", "total_latency_ms": 620.0,  "models_used": ["gpt-3.5-turbo"],             "grades": {} },
      { "model": "claude-haiku-4-5-20251001", "total_cost": 0.002, "optimization_grade": "A", "total_latency_ms": 740.0,  "models_used": ["claude-haiku-4-5-20251001"], "grades": {} }
    ]
  }
}
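
A multi-model result is usually reduced to one recommendation. The sketch below picks the cheapest candidate whose grade is no worse than the source. It assumes grades are single letters with "A" as the best, so that plain string comparison orders them; `cheapest_acceptable` is a hypothetical helper, and the data is trimmed from the sample response.

```python
def cheapest_acceptable(comparison):
    """Pick the cheapest replay graded at least as well as the source.

    Assumes letter grades where "A" < "B" means A is better.
    Returns None when no candidate qualifies.
    """
    source_grade = comparison["source"]["optimization_grade"]
    candidates = [r for r in comparison["replays"]
                  if r["optimization_grade"] <= source_grade]
    return min(candidates, key=lambda r: r["total_cost"], default=None)


comparison = {
    "source": {"total_cost": 0.012, "optimization_grade": "B"},
    "replays": [
        {"model": "gpt-4o-mini", "total_cost": 0.003, "optimization_grade": "A"},
        {"model": "gpt-3.5-turbo", "total_cost": 0.001, "optimization_grade": "A"},
        {"model": "claude-haiku-4-5-20251001", "total_cost": 0.002, "optimization_grade": "A"},
    ],
}
best = cheapest_acceptable(comparison)
print(best["model"], best["total_cost"])  # gpt-3.5-turbo 0.001
```

Cost is only one axis; latency and output quality from the replay traces still need a look before changing routing.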

Response — failed

{ "status": "failed", "error": "Source trace not found" }

Job status keys expire after 1 hour. Completed comparison results are cached for 1 hour — re-submitting the same trace + overrides within the TTL returns a 200 OK cache hit.

Stats

Latency percentiles, cost, security metrics, and agent health. Requires a session token for dashboard requests or an API key for SDK/server-side reads.

GET /api/v1/stats?window=24&project_id=12

Query parameters

Parameter	Description
`project_id`	Filter to a specific project. Omit to use the API key’s project or the first accessible dashboard project.
`window`	Rolling hours (e.g. `24` = last 24 hours, `168` = last 7 days).
`start_date`	ISO date — start of a custom range.
`end_date`	ISO date — end of a custom range.

window and start_date/end_date are mutually exclusive. Use window for live dashboards and start_date/end_date for reports.
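
A small helper can enforce that rule before the request is made. This is a sketch: `stats_query` is a hypothetical name, and the date strings in the usage example are invented.

```python
from urllib.parse import urlencode


def stats_query(project_id=None, window=None, start_date=None, end_date=None):
    """Build the query string for GET /api/v1/stats."""
    if window is not None and (start_date or end_date):
        raise ValueError("window and start_date/end_date are mutually exclusive")
    params = {"window": window, "project_id": project_id,
              "start_date": start_date, "end_date": end_date}
    # Drop unset parameters so the server applies its defaults.
    return urlencode({k: v for k, v in params.items() if v is not None})


print("/api/v1/stats?" + stats_query(project_id=12, window=24))
print("/api/v1/stats?" + stats_query(start_date="2026-04-01", end_date="2026-04-12"))
```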

Response

{
  "period": { "label": "Last 24 hours", "start": "...", "end": "..." },
  "latency": { "p50": 870, "p95": 1200, "p99": 1238, "avg_ms": 940 },
  "cost": {
    "total": 1.284,
    "by_model": [{ "model": "gpt-4o", "cost": 0.94 }],
    "by_infrastructure": [{ "infrastructure": "openai", "cost": 1.18 }, { "infrastructure": "internal", "cost": 0.10 }]
  },
  "revenue": 12.50,
  "gross_margin": 11.216,
  "gross_margin_pct": 89.7,
  "security": {
    "total_requests": 4820,
    "pii_block_rate": 3.2,
    "injection_shadow_rate": 0.4,
    "tool_fidelity_score": 0.97,
    "loop_interventions": 2,
    "secret_leakage_rate": 0.0
  },
  "agent": {
    "mcp_latency": {
      "avg_model_think_ms": 812.4,
      "p95_model_think_ms": 1210.0,
      "avg_tool_exec_ms": 142.3,
      "p95_tool_exec_ms": 240.0,
      "tool_call_count": 28,
      "overhead_ratio": 14.9
    },
    "chain_depth": { "avg_tool_cycles": 2.1, "max_tool_cycles": 8, "deep_loop_count": 1 },
    "bias_drift": [],
    "stall": { "avg_probability": 0.04, "alert_count": 3, "monitor_count": 4, "alert_rate_pct": 0.6 }
  }
}
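
The reference does not spell out how `gross_margin` is derived, but the sample figures are consistent with `revenue` minus `cost.total`, with `gross_margin_pct` as that margin over revenue. A quick check under that assumption:

```python
revenue = 12.50
total_cost = 1.284

# Assumed relationship, inferred from the sample response only.
gross_margin = round(revenue - total_cost, 3)
gross_margin_pct = round(100 * gross_margin / revenue, 1)

print(gross_margin, gross_margin_pct)  # 11.216 89.7
```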

Latency fields

Field	What it means
`p50`	The median response time — half of requests were faster than this.
`p95`	95% of requests were faster than this. The number to watch for latency targets.
`p99`	99% of requests were faster than this; only the slowest 1% took longer. High p99 with low p50 means occasional slow outliers, often caused by model cold starts or tool latency.
`avg_ms`	Arithmetic mean. Less useful than p50/p95 because a few very slow requests can inflate it.
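
The difference between the percentile fields and `avg_ms` is easiest to see on a small sample. The sketch below uses the nearest-rank definition of a percentile, which is an assumption; the reference does not say which method Olyx uses. The latency values are invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: 19 fast requests and one slow outlier.
latencies = [800] * 19 + [9000]

print(percentile(latencies, 50))        # 800
print(percentile(latencies, 95))        # 800
print(percentile(latencies, 99))        # 9000
print(sum(latencies) / len(latencies))  # 1210.0
```

A single outlier moves the mean from 800 to 1210 ms while p50 and p95 stay at 800 ms, which is why the percentiles are the better target metrics.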

Security fields

Field	What it means
`pii_block_rate`	Percentage of traces where PII was detected. `3.2` means 3.2% of requests contained personal data.
`injection_shadow_rate`	Percentage of traces where a prompt injection pattern was detected. Even a low number is worth investigating.
`tool_fidelity_score`	Average 0–1 score across all tool-calling steps. Below 0.8 means models are hallucinating tool names or arguments frequently — review your tool schemas.
`loop_interventions`	How many times the loop-detection system halted an agent during this period. Non-zero means agents are getting stuck — investigate `chain_depth` on individual traces.
`secret_leakage_rate`	Percentage of traces where a secret or credential pattern was found in inputs or outputs.

Agent fields

Field	What it means
`mcp_latency`	Object containing average and p95 model-think time, tool-execution time, tool-call count, and overhead ratio.
`avg_tool_cycles`	Average number of tool-call rounds per trace. A value above 3–4 often indicates the agent is looping or the task decomposition is inefficient.
`max_tool_cycles`	The deepest chain observed in the period. A very high number (> 10) is a strong signal of a runaway agent.
`deep_loop_count`	Number of traces where `chain_depth` exceeded the configured threshold.
`bias_drift`	Array of per-model drift reports when enough samples exist. Empty means no model crossed the sample threshold in the window.
`stall.avg_probability`	Average `stall_probability` across all traces in the period. A rising value suggests a systematic issue.
`stall.alert_count`	Number of individual traces that crossed the stall alert threshold.
`stall.alert_rate_pct`	`alert_count` as a percentage of total requests.
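
The rule-of-thumb limits in the security and agent tables can be turned into an automated check on the stats response. A minimal sketch: `health_warnings` is a hypothetical helper, and using 4 as the cutoff for `avg_tool_cycles` is a choice made here from the "3–4" range given above. The input is trimmed from the sample response.

```python
def health_warnings(stats):
    """Flag metrics that cross the rule-of-thumb limits described above."""
    security = stats["security"]
    depth = stats["agent"]["chain_depth"]
    warnings = []
    if security["tool_fidelity_score"] < 0.8:
        warnings.append("tool_fidelity_score below 0.8: review tool schemas")
    if security["loop_interventions"] > 0:
        warnings.append("loop_interventions non-zero: agents are getting stuck")
    if depth["avg_tool_cycles"] > 4:  # assumed cutoff
        warnings.append("avg_tool_cycles above 4: possible looping")
    if depth["max_tool_cycles"] > 10:
        warnings.append("max_tool_cycles above 10: possible runaway agent")
    return warnings


stats = {
    "security": {"tool_fidelity_score": 0.97, "loop_interventions": 2},
    "agent": {"chain_depth": {"avg_tool_cycles": 2.1, "max_tool_cycles": 8,
                              "deep_loop_count": 1}},
}
for warning in health_warnings(stats):
    print(warning)
```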

Insights

Project-level optimization alerts derived from real trace data. Requires a session token.

Insights are automatically generated by Olyx by analysing patterns in your traces — you don’t need to configure them. They surface concrete, actionable recommendations (e.g. “you’re sending summarisation tasks to gpt-4o but gpt-4o-mini handles them equally well at 68% lower cost”).

Method	Path	Description
`GET`	`/api/v1/projects/:project_id/insights`	List active insights.
`POST`	`/api/v1/projects/:project_id/insights/refresh`	Trigger insight computation immediately instead of waiting for the next scheduled run.
`PATCH`	`/api/v1/projects/:project_id/insights/:insight_id/dismiss`	Dismiss an insight once you’ve acted on it (or decided not to).

Response

[
  {
    "id": 7,
    "insight_type": "migration_alert",
    "status": "active",
    "data": {
      "intent": "summarization",
      "current_model": "gpt-4o",
      "suggested_model": "gpt-4o-mini",
      "savings_pct": 68.2
    },
    "estimated_savings_usd": 210.00,
    "created_at": "2026-04-10T08:00:00Z"
  }
]

Insight types

`insight_type`	What triggered it	What to do
`migration_alert`	Olyx detected a task pattern (e.g. summarisation, translation) where a cheaper model produces equivalent output based on your trace history.	Use Replay to verify the cheaper model’s output quality, then update the routing tier.
`latency_alert`	P99 latency for a detected task pattern exceeds your configured threshold.	Check which steps are slow — usually tool latency or an oversized prompt.
`intent_pattern`	A task type is dominating traffic but has no dedicated routing rule, so it’s falling through to the default model.	Add a routing rule for this intent so it lands on the right tier automatically.

estimated_savings_usd is projected over 30 days based on your current traffic volume. It is an estimate — actual savings depend on traffic fluctuation and whether the cheaper model meets your quality bar.
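
When a project has many insights, a short ranking helps decide which to verify with Replay first. The sketch below works on the parsed list response; `top_savings` is a hypothetical helper, and the second insight in the sample is invented for illustration.

```python
def top_savings(insights, limit=3):
    """Return active migration alerts, largest projected 30-day saving first."""
    alerts = [i for i in insights
              if i["status"] == "active" and i["insight_type"] == "migration_alert"]
    alerts.sort(key=lambda i: i.get("estimated_savings_usd", 0), reverse=True)
    return alerts[:limit]


insights = [
    {"id": 7, "insight_type": "migration_alert", "status": "active",
     "estimated_savings_usd": 210.00},
    {"id": 8, "insight_type": "latency_alert", "status": "active"},  # hypothetical
]
for insight in top_savings(insights):
    print(insight["id"], insight["estimated_savings_usd"])
```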

Audit Events

Last 50 control-plane events for your organisation. Requires admin or owner role.

These are account-level security events (logins, MFA changes, key revocations) — not trace-level events. Use this endpoint to build a security audit UI or export events to a SIEM.

GET /api/v1/audit-events

[
  {
    "id": 201,
    "event": "login_success",
    "ip": "203.0.113.42",
    "user_agent": "Mozilla/5.0 ...",
    "metadata": {},
    "occurred_at": "2026-04-12T09:00:00Z"
  }
]

`event`	Description
`login_success` / `login_failed`	Authentication attempt result.
`logout`	Session ended.
`mfa_verified` / `mfa_failed`	MFA challenge result.
`mfa_enabled` / `mfa_disabled`	MFA configuration changed.
`email_verified`	Email address confirmed.
`member_invited`	Team invite sent.
`perimeter_block`	Request blocked by the Olyx perimeter — could be an invalid key, a revoked key, or a request from a blocked IP.
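
As an example of what a security audit view might compute from this list, the sketch below counts failed logins per source IP. `failed_logins_by_ip` is a hypothetical helper, and the events are invented samples that follow the response shape above.

```python
from collections import Counter


def failed_logins_by_ip(events):
    """Count login_failed events per source IP."""
    return Counter(e["ip"] for e in events if e["event"] == "login_failed")


events = [
    {"id": 201, "event": "login_success", "ip": "203.0.113.42"},
    {"id": 202, "event": "login_failed", "ip": "198.51.100.7"},
    {"id": 203, "event": "login_failed", "ip": "198.51.100.7"},
    {"id": 204, "event": "mfa_failed", "ip": "203.0.113.42"},
]
for ip, count in failed_logins_by_ip(events).most_common():
    print(ip, count)  # 198.51.100.7 2
```

Because the endpoint returns only the last 50 events, a periodic export needs to de-duplicate on `id`.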

Private Agent Routes

For selected closed-beta deployments, your backend can point API calls at an Olyx Agent running inside your network. The agent exposes the same API shape as the hosted gateway and forwards requests outbound through your normal network controls.

Do I need this? Most closed-beta teams should start with the hosted gateway. Add the agent only when the hosted path cannot reach an internal provider endpoint, or when your deployment needs an internal egress point.

Prerequisites

An Olyx Agent deployment configured for your project.
A project-scoped API key stored in your secret manager.
Network access from your backend to the agent hostname.
Outbound access from the agent to the configured Olyx control plane and provider endpoints.

Pointing requests at the agent

Replace the hosted gateway host with the agent hostname in server-side calls:

# Hosted gateway
curl -X POST https://olyx.ai/api/v1/executions \
  -H "Authorization: Bearer ak_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

# Agent route — identical request shape, different host
curl -X POST http://olyx-agent:4000/api/v1/executions \
  -H "Authorization: Bearer ak_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

The path, headers, and body are identical — only the host changes.
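
In application code this usually means making the host configurable and leaving everything else untouched. A minimal sketch, assuming a hypothetical `OLYX_BASE_URL` environment variable and `api_url` helper (neither is defined by Olyx):

```python
import os


def api_url(path, base_url=None):
    """Join the configured host with an API path.

    Falls back to the OLYX_BASE_URL environment variable (an assumed
    name), then to the hosted gateway.
    """
    base = base_url or os.environ.get("OLYX_BASE_URL", "https://olyx.ai")
    return base.rstrip("/") + "/" + path.lstrip("/")


print(api_url("/api/v1/executions", "https://olyx.ai"))
print(api_url("/api/v1/executions", "http://olyx-agent:4000"))
```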

TLS & custom certificate authority

If your agent route presents a certificate signed by an internal CA, tell your HTTP client to trust that CA. Without this, the client will fail before the request reaches Olyx.

curl

curl --cacert /etc/ssl/certs/internal-ca.pem \
  -X POST https://olyx-agent.internal/api/v1/executions \
  -H "Authorization: Bearer ak_ent_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

Python — requests

import requests

resp = requests.post(
    "https://olyx-agent.internal/api/v1/executions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"trace_id": trace_id, "input": "Hello"},
    verify="/etc/ssl/certs/internal-ca.pem",
    timeout=30,
)

Python — httpx

import ssl

import httpx

# httpx takes a custom SSLContext through `verify`; it has no `ssl_context` argument
ctx = ssl.create_default_context(cafile="/etc/ssl/certs/internal-ca.pem")
client = httpx.Client(base_url="https://olyx-agent.internal", verify=ctx, timeout=30.0)

resp = client.post(
    "/api/v1/executions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"trace_id": trace_id, "input": "Hello"},
)

Ruby — Net::HTTP

require "json"
require "net/http"
require "openssl"

uri  = URI("https://olyx-agent.internal/api/v1/executions")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl     = true
http.ca_file     = "/etc/ssl/certs/internal-ca.pem"
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.read_timeout = 30

req = Net::HTTP::Post.new(uri,
  "Content-Type"  => "application/json",
  "Authorization" => "Bearer #{api_key}"
)
req.body = { trace_id: trace_id, input: "Hello" }.to_json
res = http.request(req)

For Kubernetes in-cluster deployments, reference the agent by its service DNS name and mount the CA bundle your cluster or service mesh expects.

Connectivity verification

Ping the health endpoint before your first execution to confirm the agent is reachable and ready:

curl -f http://olyx-agent:4000/up
# → { "status": "ok" }

Use this as a readiness check in your deployment pipeline and container startup probe. A connection failure usually means the service name, port, container health, or network policy needs attention.
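
The same readiness check can be run from application startup code. The sketch below retries the `/up` probe a fixed number of times; `agent_is_up` and `wait_until_ready` are hypothetical helpers, the retry counts are arbitrary, and the probe treats only HTTP 200 as ready, since the reference shows a body but no status code.

```python
import time
import urllib.error
import urllib.request


def agent_is_up(url, timeout=2.0):
    """Return True if GET <url> answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and non-2xx replies.
        return False


def wait_until_ready(check, attempts=10, delay=1.0):
    """Call check() until it returns True or attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Real usage: wait_until_ready(lambda: agent_is_up("http://olyx-agent:4000/up"))
# Stubbed probe that succeeds on the third call, so the example runs offline.
calls = iter([False, False, True])
print(wait_until_ready(lambda: next(calls), delay=0))  # True
```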

Private model behavior

Models registered with is_public: false represent internal or private routes. In closed beta, pair those model definitions with an agent deployment only when the provider endpoint is not reachable from the hosted gateway.

If a private model is selected but the network path cannot reach it, the execution fails like any other provider connectivity error. Keep one known-good public fallback in staging until the private route has enough trace history.

{
  "error": "Provider request failed",
  "code":  "provider_unreachable"
}

Cost reporting groups internal/private models under their configured infrastructure label when additional_config sets one:

{
  "model_definition": {
    "identifier": "my-private-llama",
    "is_public": false,
    "additional_config": {
      "infrastructure": "internal"
    }
  }
}

Regional routing

If you run services in multiple regions, register a model definition for each region and configure fallbacks between them:

PATCH /api/v1/projects/:project_id/models/:primary_model_id
Content-Type: application/json

{
  "model_definition": {
    "additional_config": {
      "fallback_identifier": "my-private-llama-eu-west"
    }
  }
}

At the network layer, route your application to the nearest healthy agent or hosted gateway using your own load balancer, service mesh, or DNS policy. Keep this simple during closed beta; add regional routing once trace latency shows a clear need.