API Reference

All application API endpoints are under /api/v1/. Backend execution calls authenticate with an API key in the Authorization header. Dashboard control-plane endpoints use a session cookie managed by the dashboard itself.

Application code should always use API keys — they are project-scoped, traceable, and independently revocable. Session credentials are for the dashboard UI and are not intended for backend services or automated scripts.

Authorization: Bearer ak_<key_id>.<secret>   # API key — backend / execution

See Authentication for credential types, key format, and security practices.
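
As a minimal sketch of building that header in backend code, the helpers below split a key of the `ak_<key_id>.<secret>` shape and produce the `Authorization` value. The function names `split_api_key` and `auth_header` are hypothetical, and the shape check only mirrors the format shown above; it is not the validation Olyx itself performs.

```python
def split_api_key(raw_key: str) -> tuple[str, str]:
    """Split 'ak_<key_id>.<secret>' into (key_id, secret)."""
    key_id, sep, secret = raw_key.partition(".")
    if not key_id.startswith("ak_") or not sep or not secret:
        raise ValueError("expected a key shaped like ak_<key_id>.<secret>")
    return key_id, secret


def auth_header(raw_key: str) -> dict[str, str]:
    """Build the Authorization header for backend / execution calls."""
    split_api_key(raw_key)  # fail early on a malformed key
    return {"Authorization": f"Bearer {raw_key}"}
```

In practice the raw key would come from an environment variable or a secrets manager rather than a literal in source code.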


How it works — the trace model

Before diving into endpoints, it helps to understand how Olyx structures work.

Every AI call belongs to a trace. A trace is a container for one logical unit of work — a user message, a background job, an agentic task. Inside a trace are steps: each model call, safety check, tool invocation, or log event is recorded as a step. This structure is what lets Olyx calculate cost, latency, grades, and security flags per request.

The typical lifecycle looks like this:

Lifecycle diagram: POST /traces → POST /executions → optional more steps → PATCH /traces/:id/complete → trace summary. A single execution can also go straight from POST /executions to PATCH /traces/:id/complete.

  1. POST /api/v1/traces creates the trace and returns a lightweight trace object.
  2. POST /api/v1/executions runs the AI call, records a step, and returns output.
  3. Repeat execution/log/check steps for multi-turn or agentic flows.
  4. PATCH /api/v1/traces/:id/complete seals the trace and triggers grading.

You always create a trace first, then attach execution calls to it via trace_id. Think of it like opening a tab at a restaurant, ordering items, and then closing the tab at the end.
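
The lifecycle above can be sketched as a small driver function. This is an illustration only: `send` is a hypothetical transport callable (method, path, JSON body) that returns the decoded JSON response, standing in for whatever HTTP client or SDK you use, and `run_trace` is not part of any Olyx library.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
Send = Callable[[str, str, dict], dict]


def run_trace(send: Send, inputs: list[str], metadata: Optional[dict] = None) -> dict:
    """Open a trace, attach one execution per input, then complete the trace."""
    body = {"metadata": metadata} if metadata is not None else {}
    trace = send("POST", "/api/v1/traces", body)
    for text in inputs:
        # Every execution carries the trace_id returned by the create call.
        send("POST", "/api/v1/executions", {"trace_id": trace["id"], "input": text})
    # Completing the trace seals it and returns the summary.
    return send("PATCH", f"/api/v1/traces/{trace['id']}/complete", {})
```

Passing the transport in as a parameter keeps the ordering logic testable without network access.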

Step types

| Type | What it records |
| --- | --- |
| check | A safety-check-only call (PII, injection, secrets). |
| run | A model call: stores input, output, model, cost, latency. |
| tool_call | A tool invoked by the model during an agentic step. |
| log | A custom event you pushed with POST /api/v1/logs (user rating, A/B label, etc.). |

Projects

All project endpoints require a session token.

A project is an isolated environment with its own API keys, routing rules, model registry, and spend limits. Use separate projects for separate environments (production, staging) or separate products.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/projects | List projects visible to your role. |
| POST | /api/v1/projects | Create a project (admin/owner only). |
| GET | /api/v1/projects/:id | Get a single project. |
| PATCH | /api/v1/projects/:id | Update name, description, status, or settings. |
| DELETE | /api/v1/projects/:id | Delete a project and its related project data (admin/owner only). |

Create a project

POST /api/v1/projects
Content-Type: application/json

{
  "project": {
    "name": "Production",
    "description": "Live traffic"
  }
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | integer | Project ID. |
| name | string | Project name. |
| description | string | Optional description. |
| status | string | active or archived. |
| settings | object | Routing tiers, shadow model config. |
| total_monthly_spend | float | Cumulative spend this month (USD). |
| created_at | ISO8601 | Creation timestamp. |

Update routing tiers

Routing tiers tell Olyx which model to use for different kinds of requests. The gateway classifies each request as simple, complex, or secure based on content and sends it to the right model automatically — you don’t need to pick a model per request.

PATCH merges settings; existing keys are preserved.
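
To make the merge behaviour concrete, here is a minimal sketch of a key-preserving merge. The source only states that existing keys are preserved; treating nested objects such as routing_tiers as merged key by key is an assumption, and both `merge_settings` and the `shadow` key used in the usage note are hypothetical names for illustration.

```python
def merge_settings(existing: dict, patch: dict) -> dict:
    """Return a new dict: keys in patch win, keys absent from patch are kept."""
    merged = dict(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested objects are merged recursively, not replaced.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Under that assumption, patching `{"routing_tiers": {"complex": "gpt-4o"}}` onto settings that already hold a `simple` tier and a `shadow` entry leaves both of those untouched and adds the `complex` tier.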

PATCH /api/v1/projects/:id
Content-Type: application/json

{
  "project": {
    "settings": {
      "routing_tiers": {
        "secure":  "my-private-model",
        "simple":  "gpt-4o-mini",
        "complex": "gpt-4o"
      }
    }
  }
}

| Tier | When it applies |
| --- | --- |
| simple | Short, low-stakes prompts. Routed to your cheapest model. |
| complex | Long context, reasoning-heavy, or multi-step prompts. Routed to your most capable model. |
| secure | Requests flagged as sensitive. Point this to a selected private-agent or internal model route when one is configured. |

Model Registry

All model registry endpoints require a session token.

The model registry is where you tell Olyx about every AI model your projects can use — including private or self-hosted models. Olyx needs this so it can route to the right endpoint, track costs accurately, and apply per-model retention policies.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/projects/:project_id/models | List model definitions. |
| POST | /api/v1/projects/:project_id/models | Add a model. |
| GET | /api/v1/projects/:project_id/models/:model_id | Get a model definition. |
| PATCH | /api/v1/projects/:project_id/models/:model_id | Update a model definition. |
| DELETE | /api/v1/projects/:project_id/models/:model_id | Remove a model. |

Add a model

POST /api/v1/projects/:project_id/models
Content-Type: application/json

{
  "model_definition": {
    "name": "Internal Llama 3",
    "identifier": "my-llama-3",
    "provider": "openai",
    "base_url": "https://llm.internal.corp/v1",
    "api_key": "sk-...",
    "is_public": false,
    "input_cost_per_1k": 0.0002,
    "output_cost_per_1k": 0.0002,
    "data_retention_days": 7,
    "region_restriction": "us-east",
    "currency": "USD"
  }
}

All fields

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable label shown in the dashboard. |
| identifier | string | The string you pass as model in API calls. Must be unique per project. This is the key you reference in routing tier config. |
| provider | string | openai, anthropic, gemini, bedrock, azure, or internal. Selects the provider adapter and wire format. Use openai for any model that speaks the OpenAI API format, including Groq, vLLM, Ollama, and LM Studio. Use azure for Azure OpenAI deployments, which use a different auth header and deployment-based URL. |
| base_url | string | The HTTP endpoint Olyx sends requests to. For hosted models like gpt-4o this is already known; only needed for custom or private endpoints. |
| api_key | string | Stored encrypted. Never returned in responses; use has_api_key to check if one is stored. |
| is_public | boolean | false marks the model as private/internal. Use it with the private-agent route when the provider endpoint is not publicly reachable. |
| input_cost_per_1k | float | Cost per 1,000 prompt tokens (USD). Used for cost tracking and spend limit enforcement. |
| output_cost_per_1k | float | Cost per 1,000 completion tokens (USD). |
| data_retention_days | integer | Days before trace data associated with this model is purged. Default: 30. |
| region_restriction | string | Optional region tag (e.g. us-east, eu-west). Default: global. |
| currency | string | 3-letter currency code for cost tracking. Default: CAD. |
| additional_config | object | JSONB bag, e.g. { "infrastructure": "internal", "fallback_identifier": "gpt-4o-mini" }. |
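
The two per-1k cost fields imply a simple per-call cost formula. The sketch below follows directly from the field definitions; the function name `call_cost` is hypothetical, and rounding or provider-specific billing rules are not covered by the source.

```python
def call_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_1k: float,
    output_cost_per_1k: float,
) -> float:
    """Cost of one model call, in the same currency unit as the per-1k rates."""
    input_cost = (prompt_tokens / 1000) * input_cost_per_1k
    output_cost = (completion_tokens / 1000) * output_cost_per_1k
    return input_cost + output_cost
```

With the rates from the example above (0.0002 for both input and output), a call using 1,500 prompt tokens and 500 completion tokens costs about 0.0004.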

Read-only response fields

| Field | Type | Description |
| --- | --- | --- |
| has_api_key | boolean | Whether a key is stored. The value is never returned. |
| fallback_identifier | string | Shortcut read of additional_config.fallback_identifier. |

API Keys

All key endpoints require a session token. Keys are project-scoped in the current closed beta.

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/keys | List keys for your organization. |
| POST | /api/v1/keys | Create a project-scoped key. |
| PATCH | /api/v1/keys/:id | Update status or hourly limit. |
| DELETE | /api/v1/keys/:id | Revoke a key. |

Create a key

POST /api/v1/keys
Content-Type: application/json

{
  "name": "Production",
  "project_id": 12,
  "hourly_limit": 10.00
}

project_id is required. A key created for staging cannot authenticate production traces, and a production key cannot write into staging.

hourly_limit is a rolling spend cap in USD. Once a key crosses the threshold in the current hourly window, its status flips to tripped and requests with that key are rejected until an admin reviews and resets the key. This is a safety net for runaway agentic loops — set it to a value just above your expected peak, not zero.
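
To illustrate the tripping behaviour described above, here is a toy model of a rolling one-hour spend cap. It is a sketch of the described semantics, not Olyx's implementation: the class name `HourlySpendCap`, the choice to let the request that crosses the threshold through before tripping, and clearing recorded spend on reset are all assumptions.

```python
from collections import deque


class HourlySpendCap:
    """Toy model of a rolling one-hour spend cap that trips and stays tripped."""

    WINDOW_SECONDS = 3600

    def __init__(self, hourly_limit: float) -> None:
        self.hourly_limit = hourly_limit
        self.status = "active"
        self._spend = deque()  # (timestamp_seconds, cost) pairs, oldest first

    def record(self, now: float, cost: float) -> bool:
        """Record a request's cost; return False if the key rejects it."""
        if self.status == "tripped":
            return False  # blocked until an admin resets the key
        # Drop spend that has aged out of the rolling window.
        while self._spend and self._spend[0][0] <= now - self.WINDOW_SECONDS:
            self._spend.popleft()
        self._spend.append((now, cost))
        if sum(c for _, c in self._spend) > self.hourly_limit:
            self.status = "tripped"
        return True

    def reset(self) -> None:
        """Admin action: put the key back into service."""
        self.status = "active"
        self._spend.clear()
```

The key point the model captures is that a tripped key does not recover on its own when the window rolls forward; it stays blocked until it is reset.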

Response — create only

{
  "id": "ak_3f8a1c...",
  "name": "Production",
  "raw_key": "ak_3f8a1c....fde29b84...",
  "masked": "ak_3f8a...84",
  "status": "active",
  "hourly_limit": 10.00,
  "project": { "id": 12, "name": "Production" },
  "created_at": "2026-04-12T09:00:00Z"
}

raw_key is the full secret. It is shown exactly once — store it immediately in an environment variable or secrets manager. It cannot be recovered after this response.

List response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | The key_id portion (ak_<hex>). |
| name | string | Label you gave the key. |
| masked | string | First 6 + last 4 characters, safe to display in UIs. |
| status | string | See below. |
| hourly_limit | float or null | Spend cap per rolling hour (USD). null means no cap is configured. |
| project | object | Project the key is scoped to. |
| expires_at | ISO8601 or null | Hard expiry timestamp if set. |
| loop_intervention_count | integer | How many times an agent loop was auto-halted on this key. A non-zero value is a signal to review your agent’s stop conditions. |
| last_used | ISO8601 or null | Last authenticated request timestamp. |

Key statuses

| Status | What it means | What to do |
| --- | --- | --- |
| active | Key is working normally. | |
| tripped | Hourly spend cap was exceeded. Requests are blocked while the key remains tripped. | Review the traffic, raise the hourly_limit if appropriate, then reset the key status to active. |
| loop_detected | Olyx detected a repeating request pattern and paused the key. | Check the trace for the looping pattern, fix the caller, then reset the key status to active. |

Traces

All trace endpoints require an API key.

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/traces | Create a trace. |
| GET | /api/v1/traces | List traces (newest first, paginated). |
| GET | /api/v1/traces/:id | Get a trace with steps and summary. |
| PATCH | /api/v1/traces/:id/complete | Mark a trace completed and trigger grading. |

Create a trace

POST /api/v1/traces
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50
}

metadata is optional and must be a JSON object when supplied — use it to attach any context you want to see in the dashboard (user ID, session ID, feature flag, task type). revenue (USD) enables request-margin calculation: Olyx subtracts model cost from revenue after the trace is completed.

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "created_at": "2026-04-12T09:00:00Z",
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50
}

Save the id — you’ll pass it as trace_id in every subsequent execution call.

TypeScript

import Olyx from "@olyx-labs/olyx";

const client = new Olyx({ apiKey: process.env.OLYX_API_KEY! });

const trace = await client.traces.create({
  metadata: { userId: "user_123", task: "translation" },
  revenue: 0.50,
});

console.log(trace.data.id); // pass to execute()

Python

import os, olyx

client = olyx.Olyx(api_key=os.environ["OLYX_API_KEY"])

trace = client.traces.create(
    metadata={"user_id": "user_123", "task": "translation"},
    revenue=0.50,
)

print(trace.id)  # pass to execute()

Ruby

require "olyx"

client = Olyx.new(api_key: ENV.fetch("OLYX_API_KEY"))

trace = client.traces.create(
  metadata: { user_id: "user_123", task: "translation" },
  revenue: 0.50
)

puts trace.id  # pass to execute()

Complete a trace

PATCH /api/v1/traces/:id/complete
Authorization: Bearer ak_<key_id>.<secret>

Completing a trace marks the unit of work as finished, triggers grading, and returns the computed cost summary.

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2026-04-12T09:00:00Z",
  "metadata": { "user_id": "user_123", "task": "translation" },
  "revenue": 0.50,
  "optimization_grade": "B",
  "grades": { "overall": "B", "waste": "A", "latency": "B" },
  "total_cost": 0.00318,
  "summary": {
    "total_cost": 0.00318,
    "revenue": 0.50,
    "gross_margin": 0.49682,
    "by_model": { "gpt-4o": 0.00318 },
    "by_infrastructure": { "public_cloud": 0.00318, "private": 0.0 }
  }
}

TypeScript

const summary = await client.traces.complete(trace.data.id);

console.log(summary.data.optimization_grade); // "B"
console.log(summary.data.total_cost);         // 0.00318

Python

summary = client.traces.complete(trace.id)

print(summary.optimization_grade)  # "B"
print(summary.total_cost)          # 0.00318

Ruby

summary = client.traces.complete(trace.id)

puts summary.optimization_grade  # "B"
puts summary.total_cost          # 0.00318

Get a trace

GET /api/v1/traces/:id
Authorization: Bearer ak_<key_id>.<secret>

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2026-04-12T09:00:00Z",
  "summary": {
    "total_latency_ms": 1240.5,
    "total_cost": 0.00318,
    "revenue": 0.50,
    "gross_margin": 0.4968,
    "latency_p50": 870.0,
    "latency_p95": 1200.0,
    "latency_p99": 1238.0,
    "optimization_grade": "B",
    "grades": { "overall": "B", "waste": "A", "latency": "B" },
    "step_count": 2,
    "chain_depth": 0,
    "tool_overhead_ms": null,
    "stall_probability": 0.02
  },
  "security": {
    "pii_detected": false,
    "injection_attempt": false,
    "secret_leaked": false,
    "tool_fidelity_score": 0.98,
    "shadow_score": null
  },
  "routing_decision": { "tier": "simple", "model": "gpt-4o" },
  "steps": [
    { "type": "check", "output": { "allowed": true }, "latency_ms": 12.4 },
    { "type": "run",   "model": "gpt-4o", "output": "Bonjour le monde.", "latency_ms": 1228.1, "cost": 0.00318 }
  ]
}

Understanding summary fields

| Field | What it means |
| --- | --- |
| total_latency_ms | Wall-clock time from trace creation to the last step completing. |
| total_cost | Sum of token costs across all run steps (USD). |
| gross_margin | revenue − total_cost. How much profit this request generated. |
| latency_p50 / p95 / p99 | The 50th, 95th, and 99th percentile step latencies in this trace. P95 is a good number to watch: it captures slow outlier steps without being skewed by the absolute worst case. |
| optimization_grade | A–F letter grade. See grade table below. |
| grades | Grade broken down by dimension: overall, waste (over-spending on too-capable a model), latency (slow steps). |
| chain_depth | How many nested tool-call cycles occurred. 0 = no tool calls. Deeper chains mean more latency and cost. |
| tool_overhead_ms | Milliseconds spent waiting on tool execution (not model time). null if no tools were called. |
| stall_probability | A 0–1 score of how likely this trace would have stalled or looped if left unchecked. Scores above 0.7 are flagged in the dashboard. |
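
The cost fields can be reproduced from a trace's steps with a few lines of arithmetic. The sketch below follows the definitions of total_cost and gross_margin given above; the function name `summarise_costs` is hypothetical, and it deliberately does not attempt the latency percentiles, since the source does not say how they are computed.

```python
def summarise_costs(steps: list[dict], revenue: float) -> dict:
    """Sum the cost of run steps and derive gross margin from revenue."""
    total_cost = sum(s.get("cost", 0.0) for s in steps if s.get("type") == "run")
    return {
        "total_cost": total_cost,
        "revenue": revenue,
        "gross_margin": revenue - total_cost,  # revenue minus total_cost
    }
```

For the example trace (one check step, one run step costing 0.00318, revenue 0.50) this gives a gross margin of roughly 0.49682.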

Optimization grades

Grades reflect how efficiently this trace used the available models relative to the complexity of the task.

| Grade | What it means | What to do |
| --- | --- | --- |
| A | Optimal model selection, minimal latency, no waste. | Nothing; this is the target. |
| B | Slightly over-engineered: a simpler model probably suffices. | Consider moving this task pattern to the simple routing tier. |
| C | Noticeable waste or latency. Two or more sub-optimal steps. | Review routing configuration and check for redundant steps. |
| D | Significant over-spend or slow chain. Multiple issues. | Investigate which steps are slow or expensive and refactor the prompt or model selection. |
| F | Severely inefficient: likely a loop, runaway chain, or wrong model for every step. | Check for agent loops (chain_depth > 5), missing stop conditions, or misconfigured routing. |

Understanding security fields

| Field | What it means |
| --- | --- |
| pii_detected | One or more inputs contained recognisable personal data (email, phone, name, etc.). The data was flagged but not modified at this level; use the secure routing tier to prevent it from reaching public models. |
| injection_attempt | Input matched a prompt injection pattern, i.e. an attempt to override the system prompt or exfiltrate data via the model. |
| secret_leaked | An API key, password, or credential pattern was detected in the input or output. |
| tool_fidelity_score | 0–1. How closely the model’s tool calls matched valid tool schemas. A score below 0.8 means the model is hallucinating tool names or arguments; review your tool definitions. |
| shadow_score | A comparison score from a shadow model run, if configured. null if shadow mode is not enabled. |
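
A small triage helper can turn these fields into a list of things worth reviewing. This is a sketch, assuming the trace has the shape of the GET /api/v1/traces/:id response shown earlier. The thresholds (0.8 for tool_fidelity_score, 0.7 for stall_probability) come from the field descriptions; the function name `review_flags` and the labels `low_tool_fidelity` and `likely_stall` are hypothetical.

```python
def review_flags(trace: dict) -> list[str]:
    """Return the names of security and stall signals that need a human look."""
    security = trace.get("security", {})
    summary = trace.get("summary", {})
    flags = [
        name
        for name in ("pii_detected", "injection_attempt", "secret_leaked")
        if security.get(name)
    ]
    score = security.get("tool_fidelity_score")
    if score is not None and score < 0.8:
        flags.append("low_tool_fidelity")  # hypothetical label
    stall = summary.get("stall_probability")
    if stall is not None and stall > 0.7:
        flags.append("likely_stall")  # hypothetical label
    return flags
```

An empty list means none of the documented signals fired for that trace.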

List traces — query parameters

| Parameter | Description |
| --- | --- |
| page | Page number (default 1). |
| per_page | Results per page (default 50, max 100). |
| status | Filter by pending, completed, replay, or failed. |
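
As a sketch of how these parameters combine into a request path, the helper below builds the query string with the standard library. The function name `list_traces_path` is hypothetical, and clamping per_page on the client side is an assumption about convenient behaviour; the source does not say how the server treats out-of-range values.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "completed", "replay", "failed"}


def list_traces_path(page: int = 1, per_page: int = 50, status: Optional[str] = None) -> str:
    """Build the path and query string for GET /api/v1/traces."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    # Assumption: clamp locally to the documented 1..100 range.
    params = {"page": page, "per_page": min(max(per_page, 1), 100)}
    if status is not None:
        params["status"] = status
    return "/api/v1/traces?" + urlencode(params)
```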

Execution

All execution endpoints require an API key.

Which endpoint should I use?

| Situation | Use |
| --- | --- |
| Standard prompt → response | POST /api/v1/executions |
| You need tokens to stream in real time | GET /api/v1/executions/stream |
| Existing OpenAI SDK and you want to drop Olyx in | POST /api/v1/chat/completions |
| You need vector embeddings with guardrails | POST /api/v1/embeddings |
| Multi-turn conversation with memory | POST /api/v1/assistants |
| Image generation | POST /api/v1/images/generations |
| You need a specific model, no smart routing | POST /api/v1/runs |
| Check if input is safe before doing anything | POST /api/v1/checks |
| Test what model routing would pick | POST /api/v1/simulate |

POST /api/v1/executions handles the safety check, smart routing, model call, and cost recording in one request.

POST /api/v1/executions
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "input": "Translate to French: Hello, world."
}

Optional fields

| Field | Type | Description |
| --- | --- | --- |
| tools | array | Tool definitions in OpenAI function-calling format. Translated per-provider automatically. |
| tool_results | array | Results from a prior tool_calls_pending response. See the tool call loop below. |
| parent_step_id | integer | The step_id of the prior tool_call step. Required when sending tool_results. |

Response — text

{ "output": "Bonjour le monde.", "model": "gpt-4o", "step_id": 42 }

Response — tool calls pending

{
  "tool_calls": [{ "id": "call_1", "name": "get_weather", "arguments": { "city": "London" } }],
  "step_id": 43,
  "status": "tool_calls_pending"
}

Response — blocked

{ "reason": "No private model configured for sensitive routing", "step_id": 44 }

A blocked response means the safety layer stopped the request before any model was called. The reason tells you what triggered it (PII, injection, missing secure model, etc.). No tokens were spent.
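
Since one endpoint can return three different shapes, calling code usually needs to branch on the response body. The sketch below distinguishes them using only the fields visible in the three examples above; the function name `classify_execution` is hypothetical, and whether other fields or HTTP status codes also distinguish these cases is not stated in the source.

```python
def classify_execution(body: dict) -> str:
    """Classify an executions response as text, tool_calls_pending, or blocked."""
    if body.get("status") == "tool_calls_pending":
        return "tool_calls_pending"  # run the tools, then send tool_results
    if "output" in body:
        return "text"  # final model output
    if "reason" in body:
        return "blocked"  # safety layer stopped the request
    return "unknown"
```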

TypeScript

const result = await client.execute({
  traceId: trace.data.id,
  input: "Translate to French: Hello, world.",
});

if (result.data.output) {
  console.log(result.data.output); // "Bonjour le monde."
  console.log(result.data.model);  // "gpt-4o"
}

Python

result = client.execute(
    trace_id=trace.id,
    input="Translate to French: Hello, world.",
)

if result.output:
    print(result.output)  # "Bonjour le monde."
    print(result.model)   # "gpt-4o"

Ruby

result = client.execute(
  trace_id: trace.id,
  input: "Translate to French: Hello, world."
)

if result.output
  puts result.output  # "Bonjour le monde."
  puts result.model   # "gpt-4o"
end

The tool call loop — step by step

When a model needs to call a tool (e.g. look up the weather, query a database), the flow involves multiple back-and-forth requests. Here is the complete pattern:

Step 1 — Send the initial request with your tool definitions:

POST /api/v1/executions
{
  "trace_id":  "...",
  "input":     "What's the weather in London right now?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}

Step 2 — The model responds asking you to call the tool:

{
  "status": "tool_calls_pending",
  "step_id": 43,
  "tool_calls": [{ "id": "call_1", "name": "get_weather", "arguments": { "city": "London" } }]
}

Step 3 — You run the tool in your own code:

weather = get_weather("London")   # → "15°C, partly cloudy"

Step 4 — Send the result back, linking to step 43 via parent_step_id:

POST /api/v1/executions
{
  "trace_id":       "...",
  "input":          "What's the weather in London right now?",
  "parent_step_id": 43,
  "tool_results": [
    { "tool_call_id": "call_1", "content": "15°C, partly cloudy" }
  ]
}

Step 5 — The model now has the tool result and gives you the final answer:

{ "output": "It's 15°C and partly cloudy in London right now.", "step_id": 44 }

The model may request multiple tools in one response, or chain tool calls across multiple rounds — keep looping until the response has an output instead of tool_calls_pending.

TypeScript

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

let result = await client.execute({ traceId: trace.data.id, input: "Weather in London?", tools });

while (result.data.status === "tool_calls_pending") {
  const toolResults = result.data.tool_calls.map((call) => ({
    toolCallId: call.id,
    content: get_weather(call.arguments.city), // your function
  }));
  result = await client.execute({
    traceId: trace.data.id,
    input: "Weather in London?",
    parentStepId: result.data.step_id,
    toolResults,
  });
}

console.log(result.data.output);

Python

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

result = client.execute(trace_id=trace.id, input="Weather in London?", tools=tools)

while result.status == "tool_calls_pending":
    tool_results = [
        {"tool_call_id": c["id"], "content": get_weather(c["arguments"]["city"])}
        for c in result.tool_calls
    ]
    result = client.execute(
        trace_id=trace.id,
        input="Weather in London?",
        parent_step_id=result.step_id,
        tool_results=tool_results,
    )

print(result.output)

Ruby

tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"]
      }
    }
  }
]

result = client.execute(trace_id: trace.id, input: "Weather in London?", tools: tools)

while result.status == "tool_calls_pending"
  tool_results = result.tool_calls.map do |call|
    { tool_call_id: call[:id], content: get_weather(call[:arguments][:city]) }
  end
  result = client.execute(
    trace_id: trace.id,
    input: "Weather in London?",
    parent_step_id: result.step_id,
    tool_results: tool_results
  )
end

puts result.output

Streaming

Stream output token-by-token. Returns text/event-stream in the OpenAI chat.completion.chunk wire format. Each server-sent event is a JSON object on a data: line; the stream terminates with data: [DONE].

Use streaming when you want text to appear in the UI as the model generates it, rather than waiting for the full response.

GET /api/v1/executions/stream?trace_id=550e8400-...&input=Translate+to+French%3A+Hello
Authorization: Bearer ak_<key_id>.<secret>

Note: this is a GET request, not POST. The input and trace_id go in the query string (URL-encoded). Use encodeURIComponent or your HTTP library’s query builder — never build query strings by hand.
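
As a sketch of letting a library do the encoding, the helper below builds the stream URL with Python's `urllib.parse.urlencode`. The function name `stream_url` is hypothetical, and the base URL is passed in by the caller rather than assumed.

```python
from urllib.parse import urlencode


def stream_url(base_url: str, trace_id: str, text: str) -> str:
    """Build the GET URL for /api/v1/executions/stream with an encoded query."""
    query = urlencode({"trace_id": trace_id, "input": text})
    return f"{base_url.rstrip('/')}/api/v1/executions/stream?{query}"
```

Encoding the input "Translate to French: Hello" this way yields `input=Translate+to+French%3A+Hello`, the same form as the request line shown above.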

Event format

Each line that starts with data: is one chunk. Extract the delta.content field from each chunk and append it to your output buffer.

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{"content":"Bon"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{"content":"jour"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1713520800,"model":"gpt-4o","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
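
The extraction rule above can be sketched as a small parser over already-split lines. This is an illustration, not an official client: the function name `collect_stream_text` is hypothetical, and it assumes each event arrives as one complete `data:` line, as in the sample.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from data: lines until data: [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and anything else
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Fed the three sample events above, the function returns "Bonjour"; the final chunk with an empty delta contributes nothing.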

Consuming with curl

curl -N "https://olyx.ai/api/v1/executions/stream?trace_id=550e8400-...&input=Hello" \
  -H "Authorization: Bearer ak_..."

-N disables buffering so chunks print as they arrive.

Consuming with the Fetch API (server-side Node 18+)

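const apiKey = process.env.OLYX_API_KEY;
// Node's fetch needs an absolute URL; the host matches the curl example above.
const res = await fetch('https://olyx.ai/api/v1/executions/stream?trace_id=...&input=Hello', {
  headers: { Authorization: `Bearer ${apiKey}` },
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6)); // strip the "data: " prefix
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}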

Do not expose an Olyx API key in browser code. For browser streaming, call your own backend route and have that route connect to Olyx server-side.

The trace_id must be a trace you created beforehand. The step is recorded and costs accrue the same as a non-streaming execution — the only difference is delivery mode.

Chat completions — OpenAI-compatible

PII scrubbing and MCP tool shimming applied on every call.

Use this endpoint when you have existing code written against the OpenAI API and want to route it through Olyx without rewriting the call sites. The request shape is the OpenAI chat/completions format plus a trace_id.

POST /api/v1/chat/completions
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "My email is user@example.com. Can you help?" }
  ],
  "model": "gpt-4o"
}

model is optional — the gateway routes to the project-configured model if omitted. PII in messages is redacted before any provider sees the content. Tool definitions in MCP inputSchema, flat hash, or OpenAI format are normalised automatically.

Embeddings

Anti-leak guardrail applied before any embedding is generated.

Embeddings convert text into a vector that captures semantic meaning. They’re used for similarity search, RAG (retrieval-augmented generation), and clustering. The Olyx embeddings endpoint adds a guardrail before the model call: if the input contains a secret or sensitive data, the request is blocked and no data leaves the gateway.

The model identifier determines which provider handles the request. Supported embedding providers: OpenAI (text-embedding-3-small, text-embedding-3-large), AWS Bedrock (amazon.titan-embed-text-v2:0, cohere.embed-english-v3), Google Gemini (text-embedding-004), and any OpenAI-compatible internal endpoint configured in the registry. Unsupported providers return HTTP 422.

POST /api/v1/embeddings
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "input": "The quarterly revenue exceeded expectations by 12%.",
  "model": "text-embedding-3-small"
}

input may be a single string or an array of strings. model defaults to text-embedding-3-small. Pass a Bedrock or Gemini model identifier to route to those providers without changing any other request field.

Response

{ "embeddings": [[0.0023, -0.0048, "..."]], "model": "text-embedding-3-small", "usage_tokens": 12, "step_id": 55 }

Blocked (403)

{ "error": "Input blocked by data-ingestion guardrail", "reason": "profanity", "step_id": 56 }

Assistants — audited multi-turn

Thread auditing and budget guardrails on every turn.

Use assistants when you need audited multi-turn conversations. Send the full message history on each turn and pass parent_step_id to link turns in the trace graph; Olyx records each turn so you can inspect the conversation in the dashboard.

POST /api/v1/assistants
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "messages": [
    { "role": "system", "content": "You are a project manager." },
    { "role": "user",   "content": "Help me plan a product launch." }
  ],
  "model": "gpt-4o"
}

For subsequent turns, include the full message history (system + all prior user and assistant messages) and pass parent_step_id to link turns in the trace graph.

Response

{ "output": "Let's start with a timeline...", "model": "gpt-4o", "step_id": 61, "thread_turn": 1 }

thread_turn is the 1-based index of this turn within the trace. Turn 1 is the first user message, turn 2 is the follow-up, and so on.
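
Building the follow-up request can be sketched as a pure function over the message history. The function name `next_turn_payload` is hypothetical, and placing parent_step_id at the top level of the JSON body (as in the executions endpoint) is an assumption; the source only says to pass it.

```python
def next_turn_payload(
    trace_id: str,
    history: list[dict],
    assistant_output: str,
    user_message: str,
    parent_step_id: int,
    model: str,
) -> dict:
    """Build the request body for the next assistants turn."""
    # Full history: prior messages, the assistant's last reply, the new user turn.
    messages = list(history) + [
        {"role": "assistant", "content": assistant_output},
        {"role": "user", "content": user_message},
    ]
    return {
        "trace_id": trace_id,
        "messages": messages,
        "model": model,
        "parent_step_id": parent_step_id,  # assumed top-level field
    }
```

Here `parent_step_id` would be the step_id returned by the previous turn (61 in the response above), and `assistant_output` its output text.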

Image generations

Brand compliance and copyright protection before generation.

Supported image generation providers: OpenAI (dall-e-3, dall-e-2), AWS Bedrock Stability AI (stability.stable-diffusion-xl-v1 and variants), and AWS Bedrock Titan Image (amazon.titan-image-generator-v2:0). Unsupported providers return HTTP 422.

POST /api/v1/images/generations
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt": "A futuristic cityscape at dusk, photorealistic",
  "model": "dall-e-3",
  "size": "1024x1024",
  "n": 1
}

model defaults to dall-e-3. size defaults to 1024x1024. Pass a Bedrock model identifier to route image generation through AWS without changing any other request field.

Response

{ "images": ["https://..."], "model": "dall-e-3", "step_id": 70 }

Blocked (403)

{ "error": "Prompt blocked: copyright violation detected", "violations": [{ "type": "copyright", "term": "mickey" }], "step_id": 71 }

Run a model directly

Call a specific model by identifier, bypassing smart routing. Use this when you need deterministic model selection — e.g. a pipeline step that must always use your fine-tuned model regardless of routing tier configuration.

The difference from /api/v1/executions: routing is skipped and your explicit model value is used directly. Safety checks and cost recording still apply.

POST /api/v1/runs
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "model":    "gpt-4o",
  "input":    "Summarise this document: ..."
}

| Field | Type | Description |
| --- | --- | --- |
| trace_id | string | Required. UUID of an existing trace. |
| model | string | Required. Registry identifier; must be registered on the project. |
| input | string | Required. The prompt text. |
| parent_step_id | integer | Optional. Link this direct run to an earlier trace step. |

Response

{ "output": "The document covers...", "model": "gpt-4o", "step_id": 47 }

TypeScript

const result = await client.runs.create({
  traceId: trace.data.id,
  model: "gpt-4o",
  input: "Summarise this document: ...",
});

console.log(result.data.output);
console.log(result.data.model); // always "gpt-4o" (no routing)

Python

result = client.runs.create(
    trace_id=trace.id,
    model="gpt-4o",
    input="Summarise this document: ...",
)

print(result.output)
print(result.model)  # always "gpt-4o" (no routing)

Ruby

result = client.runs.create(
  trace_id: trace.id,
  model: "gpt-4o",
  input: "Summarise this document: ..."
)

puts result.output
puts result.model  # always "gpt-4o" (no routing)

Safety check only

Run the full security pipeline against an input without invoking any model. Returns the risk decision so your code can gate on it before doing other work. Useful as a pre-flight check before calling an external tool, storing user content, or branching a workflow.

The check evaluates: PII detection, prompt injection patterns, secret/credential leakage, and configured policy rules. A check step is recorded on the trace.

POST /api/v1/checks
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{ "trace_id": "550e8400-e29b-41d4-a716-446655440000", "input": "Your input text" }

Response — allowed

{
  "allowed": true,
  "step_id": 48,
  "meta": {
    "pii_detected":       false,
    "injection_attempt":  false,
    "secret_leaked":      false,
    "risk_score":         0.03
  }
}

Response — blocked (200, not 4xx)

{
  "allowed": false,
  "step_id": 49,
  "reason":  "pii_detected",
  "meta": {
    "pii_detected":      true,
    "pii_entities":      [{ "type": "email", "start": 12, "end": 30 }],
    "injection_attempt": false,
    "secret_leaked":     false,
    "risk_score":        0.91
  }
}

A blocked check still returns 200 OK, because allowed: false is a normal risk decision, not an HTTP error. The HTTP status code only tells you whether the API call itself succeeded, not whether the content was safe. Check allowed in your code.

pii_entities tells you exactly where in the string the PII was found (start and end are character offsets), so you can highlight it in a UI or log it for review.
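
As a rough illustration of using these offsets, the sketch below masks each detected span before the text is logged or displayed. `redact_pii` is a hypothetical helper, not part of any Olyx SDK, and it assumes `start` is inclusive and `end` is exclusive, which the response above does not state.

```python
def redact_pii(text: str, entities: list, mask: str = "*") -> str:
    """Replace each detected PII span with mask characters.

    Assumption: offsets are half-open [start, end). Adjust the range
    if the API turns out to treat `end` as inclusive.
    """
    chars = list(text)
    for entity in entities:
        # Clamp offsets so a stale or malformed entity cannot raise IndexError
        start = max(0, entity["start"])
        end = min(len(chars), entity["end"])
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


sample = "Contact me: jane.doe@example.com today"
entities = [{"type": "email", "start": 12, "end": 32}]
print(redact_pii(sample, entities))
```

Masking character by character keeps the string length unchanged, so offsets for any remaining entities stay valid.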

risk_score is a 0–1 number. Scores below 0.2 are clean, 0.2–0.7 are borderline (flagged for review), and above 0.7 are blocked by default. You can adjust the threshold in project settings.
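
The bands above can be expressed as a small client-side helper, for example to colour-code results in an internal review tool. This is only a sketch: `classify_risk` and its parameter names are hypothetical, the defaults mirror the documented 0.2 and 0.7 boundaries, and the `allowed` field in the response remains the authoritative decision.

```python
def classify_risk(risk_score: float, review_at: float = 0.2, block_at: float = 0.7) -> str:
    """Map a 0-1 risk_score onto the three documented bands."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"risk_score must be between 0 and 1, got {risk_score}")
    if risk_score < review_at:
        return "clean"
    if risk_score <= block_at:
        return "review"   # borderline: flagged for review, not blocked
    return "blocked"


print(classify_risk(0.03))
print(classify_risk(0.91))
```

If you change the threshold in project settings, pass the same value as `block_at` so the local labels stay in step with the server.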

JavaScript

const check = await client.checks.create({
  traceId: trace.data.id,
  input: userInput,
});

if (!check.data.allowed) {
  console.error("Blocked:", check.data.reason);
  return;
}

// safe to proceed

Python

check = client.checks.create(
    trace_id=trace.id,
    input=user_input,
)

if not check.allowed:
    print("Blocked:", check.reason)
    return

# safe to proceed

Ruby

check = client.checks.create(
  trace_id: trace.id,
  input: user_input
)

unless check.allowed
  puts "Blocked: #{check.reason}"
  return
end

# safe to proceed

Log step

Append a custom structured event to a trace without calling any model or running a safety check. Use it to record outcomes that happen outside Olyx — user ratings, A/B assignment labels, downstream system results, or any metadata you want correlated with the execution.

POST /api/v1/logs
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "output": {
    "user_rating":  5,
    "comment":      "Great translation",
    "ab_variant":   "B",
    "accepted":     true
  }
}

output accepts any JSON object. It is stored as-is on the log step and is queryable in the dashboard.

Response

{
  "step_id": 52,
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "log",
  "parent_step_id": null,
  "created_at": "2026-04-12T09:00:00Z"
}

The response is a receipt for the step Olyx recorded. Use step_id when you want to attach later steps to the same point in the trace graph, or when support needs to find the exact event in the dashboard.

The response intentionally does not echo output. Log payloads often contain user feedback, downstream identifiers, or application metadata, so Olyx stores the payload on the trace step and returns only linkage metadata from the create call.

Log steps appear in the trace step list alongside run, check, and tool_call steps and are included in optimization grade calculations when a user_rating key is present.

JavaScript

await client.logs.create({
  traceId: trace.data.id,
  output: {
    user_rating: 5,
    comment: "Great translation",
    ab_variant: "B",
    accepted: true,
  },
});

Python

client.logs.create(
    trace_id=trace.id,
    output={
        "user_rating": 5,
        "comment": "Great translation",
        "ab_variant": "B",
        "accepted": True,
    },
)

Ruby

client.logs.create(
  trace_id: trace.id,
  output: {
    user_rating: 5,
    comment: "Great translation",
    ab_variant: "B",
    accepted: true
  }
)

Simulate — dry-run routing

Resolve which model and tier Olyx would select for a given input, without invoking any model or creating an execution. Returns the routing decision, estimated cost, and fallback path.

This is useful in two scenarios:

  • CI/CD — verify your routing configuration is correct before deploying a change.
  • UX — show users which model will handle their request before they submit it.

Requires an API key and SDK mode — this endpoint is not available through the OpenAI-compatible gateway path described in Quick Start.

POST /api/v1/simulate
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "input":      "Analyse the attached contract for liability clauses",
  "project_id": 12
}

Response — resolved

{
  "status":         "resolved",
  "tier":           "complex",
  "model":          "gpt-4o",
  "estimated_cost": 0.0042,
  "fallback_path":  ["gpt-4o", "gpt-4o-mini"]
}

fallback_path shows the chain of models Olyx would try in order if the primary model fails or is unavailable.

Response — blocked

{
  "status": "blocked",
  "reason": "PII detected — would require Secure tier, no private model configured"
}

Response — unconfigured

{
  "status": "unconfigured",
  "tier":   "complex",
  "reason": "No model configured for the complex tier"
}

JavaScript

const decision = await client.simulate.create({
  input: "Analyse the attached contract for liability clauses",
  projectId: 12,
});

if (decision.data.status === "resolved") {
  console.log(decision.data.tier);           // "complex"
  console.log(decision.data.model);          // "gpt-4o"
  console.log(decision.data.estimated_cost); // 0.0042
}

Python

decision = client.simulate.create(
    input="Analyse the attached contract for liability clauses",
    project_id=12,
)

if decision.status == "resolved":
    print(decision.tier)            # "complex"
    print(decision.model)           # "gpt-4o"
    print(decision.estimated_cost)  # 0.0042

Ruby

decision = client.simulate.create(
  input: "Analyse the attached contract for liability clauses",
  project_id: 12
)

if decision.status == "resolved"
  puts decision.tier            # "complex"
  puts decision.model           # "gpt-4o"
  puts decision.estimated_cost  # 0.0042
end
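
For the CI/CD scenario, one possible shape for a routing check is a pure function over the simulate response that returns a list of problems, which a pipeline step can fail on if non-empty. `routing_problems`, `expected_model`, and `max_cost` are hypothetical names chosen for this sketch; the response fields are the ones shown above.

```python
def routing_problems(decision: dict, expected_model: str, max_cost: float) -> list:
    """Return human-readable problems found in a simulate response."""
    status = decision.get("status")
    if status != "resolved":
        # Covers both "blocked" and "unconfigured" responses
        return [f"routing {status}: {decision.get('reason', 'no reason given')}"]

    problems = []
    if decision.get("model") != expected_model:
        problems.append(f"expected {expected_model}, got {decision.get('model')}")
    if decision.get("estimated_cost", 0.0) > max_cost:
        problems.append(f"estimated cost {decision['estimated_cost']} exceeds {max_cost}")
    return problems


resolved = {
    "status": "resolved",
    "tier": "complex",
    "model": "gpt-4o",
    "estimated_cost": 0.0042,
    "fallback_path": ["gpt-4o", "gpt-4o-mini"],
}
print(routing_problems(resolved, expected_model="gpt-4o", max_cost=0.01))
```

Returning a list rather than raising on the first mismatch lets a single CI run report every routing issue at once.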

Replay

Re-run an existing trace with optional overrides. Requires an API key.

Why would I replay a trace? Replays answer questions like: “Would this request have been cheaper on a different model? Would it have been faster? Would the output quality have been equivalent?” Instead of running new live traffic, you replay a trace you already have — so you can compare cost and latency across models without affecting end-user experience.

Fast path vs slow path

Olyx caches replay results for one hour. If you replay the same trace with the same overrides within the TTL, you get the cached result back immediately (200 OK). If no cached result exists, the job is queued and you need to poll for it (202 Accepted).

First replay:
  POST /api/v1/replay   →   202 { job_id: "a3f9...", status: "queued" }
  GET  /api/v1/replay/a3f9...  →  { status: "running" }
  GET  /api/v1/replay/a3f9...  →  { status: "completed", comparison: {...} }

Same replay within 1 hour:
  POST /api/v1/replay   →   200 { status: "completed", comparison: {...} }  (cache hit)
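
Because a cache hit returns the finished comparison with no job to poll, client code needs to branch on the status code before polling. The sketch below shows one way to do that. `post_replay` and `get_job` are hypothetical transport callables you would supply (for example thin wrappers around your HTTP client); they are not Olyx SDK functions.

```python
import time
from typing import Callable, Tuple


def run_replay(
    post_replay: Callable[[dict], Tuple[int, dict]],  # returns (HTTP status, JSON body)
    get_job: Callable[[str], dict],                   # returns the job status JSON
    payload: dict,
    poll_seconds: float = 1.5,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a replay and return the completed result on either path."""
    status_code, body = post_replay(payload)
    if status_code == 200:
        return body  # fast path: cached comparison, nothing to poll
    if status_code != 202:
        raise RuntimeError(f"Unexpected replay response: HTTP {status_code}")

    job_id = body["job_id"]  # slow path: the job was queued
    for _ in range(max_polls):
        result = get_job(job_id)
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            raise RuntimeError(result.get("error", "replay failed"))
        sleep(poll_seconds)
    raise TimeoutError(f"Replay job {job_id} did not finish after {max_polls} polls")
```

The `max_polls` cap is a design choice of this sketch so that a stuck job cannot block the caller indefinitely.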

POST /api/v1/replay

POST /api/v1/replay
Authorization: Bearer ak_<key_id>.<secret>
Content-Type: application/json

{
  "trace_id":       "550e8400-e29b-41d4-a716-446655440000",
  "force_model":    "gpt-4o-mini",
  "max_cost":       0.005
}

All override fields are optional and passed flat at the top level. force_model and compare_models are mutually exclusive.

| Field | Type | Description |
| --- | --- | --- |
| force_model | string | Replace every run step with this single model. Use this to test one cheaper alternative. |
| compare_models | array | Benchmark N models simultaneously; returns a multi-model comparison table. Use this to find the best option across several candidates in one job. |
| force_models | array | Override the model list for parallel/fanout steps. |
| max_cost | float | Skip any step whose estimated cost exceeds this USD value. Useful for large traces where you only want to replay cheap steps. |

Response — async (202)

{ "job_id": "a3f9c1d8e72b", "status": "queued" }

JavaScript

const job = await client.replays.create({
  traceId: trace.data.id,
  forceModel: "gpt-4o-mini",
  maxCost: 0.005,
});

// Poll until complete
let result = await client.replays.get(job.data.job_id);
while (result.data.status === "queued" || result.data.status === "running") {
  await new Promise((r) => setTimeout(r, 1500));
  result = await client.replays.get(job.data.job_id);
}

const { source, replay } = result.data.comparison;
console.log(`Cost: ${source.total_cost} → ${replay.total_cost}`);
console.log(`Grade: ${source.optimization_grade} → ${replay.optimization_grade}`);

Python

import time

job = client.replays.create(
    trace_id=trace.id,
    force_model="gpt-4o-mini",
    max_cost=0.005,
)

# Poll until complete
result = client.replays.get(job.job_id)
while result.status in ("queued", "running"):
    time.sleep(1.5)
    result = client.replays.get(job.job_id)

src, rep = result.comparison["source"], result.comparison["replay"]
print(f"Cost: {src['total_cost']} → {rep['total_cost']}")
print(f"Grade: {src['optimization_grade']} → {rep['optimization_grade']}")

Ruby

job = client.replays.create(
  trace_id: trace.id,
  force_model: "gpt-4o-mini",
  max_cost: 0.005
)

# Poll until complete
result = client.replays.get(job.job_id)
while %w[queued running].include?(result.status)
  sleep 1.5
  result = client.replays.get(job.job_id)
end

src = result.comparison[:source]
rep = result.comparison[:replay]
puts "Cost: #{src[:total_cost]} → #{rep[:total_cost]}"
puts "Grade: #{src[:optimization_grade]} → #{rep[:optimization_grade]}"

Response — cache hit (200)

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "replay_trace_id": "replay_4a2c9f",
  "overrides":       { "force_model": "gpt-4o-mini" },
  "comparison": {
    "source": { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"], "grades": {} },
    "replay": { "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms":  870.0, "models_used": ["gpt-4o-mini"], "grades": {} }
  }
}

GET /api/v1/replay/:job_id

Poll for async job status. Call this in a loop with a short sleep (1–2 seconds) until status is "completed" or "failed". Returns one of "queued", "running", "completed", or "failed".

GET /api/v1/replay/a3f9c1d8e72b
Authorization: Bearer ak_<key_id>.<secret>

Polling example (JavaScript)

async function waitForReplay(jobId, apiKey) {
  while (true) {
    const res  = await fetch(`/api/v1/replay/${jobId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
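    // Surface HTTP-level failures instead of polling forever on an error body
    if (!res.ok) throw new Error(`Replay status request failed: ${res.status}`);
    const data = await res.json();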
    if (data.status === 'completed') return data;
    if (data.status === 'failed')    throw new Error(data.error);
    await new Promise(r => setTimeout(r, 1500));  // wait 1.5s between polls
  }
}

Response — completed, single-model

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "replay_trace_id": "replay_4a2c9f",
  "comparison": {
    "source": { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"],      "grades": {} },
    "replay": { "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms":  870.0, "models_used": ["gpt-4o-mini"], "grades": {} }
  }
}

Reading the comparison: source is the original production run. replay is the re-run with your override. Compare total_cost and total_latency_ms to decide if the cheaper model is worth switching to.
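
To make that comparison concrete, the sketch below turns a single-model comparison object into percentage savings. `summarize_comparison` is a hypothetical helper; the field names and sample numbers come from the response above.

```python
def summarize_comparison(comparison: dict) -> dict:
    """Percentage reduction in cost and latency from source to replay."""
    source, replay = comparison["source"], comparison["replay"]

    def pct_drop(before: float, after: float) -> float:
        # Guard against a zero baseline (for example a fully cached source run)
        return round((before - after) / before * 100, 1) if before else 0.0

    return {
        "cost_saving_pct": pct_drop(source["total_cost"], replay["total_cost"]),
        "latency_saving_pct": pct_drop(source["total_latency_ms"], replay["total_latency_ms"]),
    }


comparison = {
    "source": {"total_cost": 0.012, "total_latency_ms": 1240.0},
    "replay": {"total_cost": 0.003, "total_latency_ms": 870.0},
}
print(summarize_comparison(comparison))
```

For the sample response this works out to a 75% cost reduction and roughly a 30% latency reduction. A negative value would mean the replay was more expensive or slower than the source.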

Response — completed, multi-model (compare_models)

When you pass compare_models, the result has a replays array instead of a single replay object.

{
  "status":          "completed",
  "source_trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "comparison": {
    "source":  { "total_cost": 0.012, "optimization_grade": "B", "total_latency_ms": 1240.0, "models_used": ["gpt-4o"], "grades": {} },
    "replays": [
      { "model": "gpt-4o-mini",               "total_cost": 0.003, "optimization_grade": "A", "total_latency_ms": 870.0,  "models_used": ["gpt-4o-mini"],               "grades": {} },
      { "model": "gpt-3.5-turbo",             "total_cost": 0.001, "optimization_grade": "A", "total_latency_ms": 620.0,  "models_used": ["gpt-3.5-turbo"],             "grades": {} },
      { "model": "claude-haiku-4-5-20251001", "total_cost": 0.002, "optimization_grade": "A", "total_latency_ms": 740.0,  "models_used": ["claude-haiku-4-5-20251001"], "grades": {} }
    ]
  }
}
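
One way to consume the replays array is to pick the cheapest candidate whose grade you consider acceptable. The sketch below does that. `cheapest_acceptable` is a hypothetical helper, and the set of acceptable grades is supplied by the caller because the grading scale is not defined in this section.

```python
from typing import Optional


def cheapest_acceptable(comparison: dict, acceptable_grades: set) -> Optional[dict]:
    """Return the lowest-cost replay whose grade is acceptable, or None."""
    candidates = [
        r for r in comparison["replays"]
        if r["optimization_grade"] in acceptable_grades
    ]
    return min(candidates, key=lambda r: r["total_cost"], default=None)


multi = {
    "source": {"total_cost": 0.012, "optimization_grade": "B"},
    "replays": [
        {"model": "gpt-4o-mini", "total_cost": 0.003, "optimization_grade": "A"},
        {"model": "gpt-3.5-turbo", "total_cost": 0.001, "optimization_grade": "A"},
        {"model": "claude-haiku-4-5-20251001", "total_cost": 0.002, "optimization_grade": "A"},
    ],
}
best = cheapest_acceptable(multi, acceptable_grades={"A"})
print(best["model"] if best else "no acceptable candidate")
```

Cost is only one axis. You may prefer to sort by total_latency_ms, or to compare outputs by hand before switching models.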

Response — failed

{ "status": "failed", "error": "Source trace not found" }

Job status keys expire after 1 hour. Completed comparison results are cached for 1 hour — re-submitting the same trace + overrides within the TTL returns a 200 OK cache hit.


Stats

Latency percentiles, cost, security metrics, and agent health. Requires a session token for dashboard requests or an API key for SDK/server-side reads.

GET /api/v1/stats?window=24&project_id=12

Query parameters

| Parameter | Description |
| --- | --- |
| project_id | Filter to a specific project. Omit to use the API key’s project or the first accessible dashboard project. |
| window | Rolling hours (e.g. 24 = last 24 hours, 168 = last 7 days). |
| start_date | ISO date. Start of a custom range. |
| end_date | ISO date. End of a custom range. |

window and start_date/end_date are mutually exclusive. Use window for live dashboards and start_date/end_date for reports.
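
A client can enforce that rule before sending the request. The sketch below builds the query string and rejects mixed parameters locally. `stats_query` is a hypothetical helper, and how the server itself responds to mixed parameters is not specified here.

```python
from typing import Optional
from urllib.parse import urlencode


def stats_query(
    window: Optional[int] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    project_id: Optional[int] = None,
) -> str:
    """Build the /api/v1/stats path, enforcing window XOR a custom date range."""
    if window is not None and (start_date or end_date):
        raise ValueError("Use either window or start_date/end_date, not both")

    params = {}
    if window is not None:
        params["window"] = window
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    if project_id is not None:
        params["project_id"] = project_id

    path = "/api/v1/stats"
    return f"{path}?{urlencode(params)}" if params else path


print(stats_query(window=24, project_id=12))
print(stats_query(start_date="2026-04-01", end_date="2026-04-30"))
```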

Response

{
  "period": { "label": "Last 24 hours", "start": "...", "end": "..." },
  "latency": { "p50": 870, "p95": 1200, "p99": 1238, "avg_ms": 940 },
  "cost": {
    "total": 1.284,
    "by_model": [{ "model": "gpt-4o", "cost": 0.94 }],
    "by_infrastructure": [{ "infrastructure": "openai", "cost": 1.18 }, { "infrastructure": "internal", "cost": 0.10 }]
  },
  "revenue": 12.50,
  "gross_margin": 11.216,
  "gross_margin_pct": 89.7,
  "security": {
    "total_requests": 4820,
    "pii_block_rate": 3.2,
    "injection_shadow_rate": 0.4,
    "tool_fidelity_score": 0.97,
    "loop_interventions": 2,
    "secret_leakage_rate": 0.0
  },
  "agent": {
    "mcp_latency": {
      "avg_model_think_ms": 812.4,
      "p95_model_think_ms": 1210.0,
      "avg_tool_exec_ms": 142.3,
      "p95_tool_exec_ms": 240.0,
      "tool_call_count": 28,
      "overhead_ratio": 14.9
    },
    "chain_depth": { "avg_tool_cycles": 2.1, "max_tool_cycles": 8, "deep_loop_count": 1 },
    "bias_drift": [],
    "stall": { "avg_probability": 0.04, "alert_count": 3, "monitor_count": 4, "alert_rate_pct": 0.6 }
  }
}

Latency fields

| Field | What it means |
| --- | --- |
| p50 | The median response time. Half of requests were faster than this. |
| p95 | 95% of requests were faster than this. The number to watch for latency targets. |
| p99 | 99% of requests were faster than this; only the slowest 1% exceeded it. High p99 with low p50 means occasional slow outliers, often caused by model cold starts or tool latency. |
| avg_ms | Arithmetic mean. Less useful than p50/p95 because a few very slow requests can inflate it. |

Security fields

| Field | What it means |
| --- | --- |
| pii_block_rate | Percentage of traces where PII was detected. 3.2 means 3.2% of requests contained personal data. |
| injection_shadow_rate | Percentage of traces where a prompt injection pattern was detected. Even a low number is worth investigating. |
| tool_fidelity_score | Average 0–1 score across all tool-calling steps. Below 0.8 means models are hallucinating tool names or arguments frequently; review your tool schemas. |
| loop_interventions | How many times the loop-detection system halted an agent during this period. Non-zero means agents are getting stuck; investigate chain_depth on individual traces. |
| secret_leakage_rate | Percentage of traces where a secret or credential pattern was found in inputs or outputs. |

Agent fields

| Field | What it means |
| --- | --- |
| mcp_latency | Object containing average and p95 model-think time, tool-execution time, tool-call count, and overhead ratio. |
| avg_tool_cycles | Average number of tool-call rounds per trace. A value above 3–4 often indicates the agent is looping or the task decomposition is inefficient. |
| max_tool_cycles | The deepest chain observed in the period. A very high number (> 10) is a strong signal of a runaway agent. |
| deep_loop_count | Number of traces where chain_depth exceeded the configured threshold. |
| bias_drift | Array of per-model drift reports when enough samples exist. Empty means no model crossed the sample threshold in the window. |
| stall.avg_probability | Average stall_probability across all traces in the period. A rising value suggests a systematic issue. |
| stall.alert_count | Number of individual traces that crossed the stall alert threshold. |
| stall.alert_rate_pct | alert_count as a percentage of total requests. |
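
The rules of thumb in the security and agent tables can be turned into a simple alerting check over a stats response. The sketch below is one possible reading: `agent_health_warnings` is a hypothetical helper, and the 4.0 default for average tool cycles is an assumption taken from the upper end of the "3–4" guidance.

```python
def agent_health_warnings(stats: dict, max_avg_cycles: float = 4.0) -> list:
    """Flag stats values that cross the thresholds described in the tables."""
    security = stats["security"]
    depth = stats["agent"]["chain_depth"]
    warnings = []

    if security["tool_fidelity_score"] < 0.8:
        warnings.append("tool_fidelity_score below 0.8: review tool schemas")
    if security["loop_interventions"] > 0:
        warnings.append(f"{security['loop_interventions']} loop interventions: agents are getting stuck")
    if depth["avg_tool_cycles"] > max_avg_cycles:
        warnings.append("avg_tool_cycles is high: possible looping or inefficient decomposition")
    if depth["max_tool_cycles"] > 10:
        warnings.append("max_tool_cycles above 10: possible runaway agent")
    return warnings


sample_stats = {
    "security": {"tool_fidelity_score": 0.97, "loop_interventions": 2},
    "agent": {"chain_depth": {"avg_tool_cycles": 2.1, "max_tool_cycles": 8, "deep_loop_count": 1}},
}
for warning in agent_health_warnings(sample_stats):
    print(warning)
```

With the sample response above, only the loop-intervention rule fires.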

Insights

Project-level optimization alerts derived from real trace data. Requires a session token.

Insights are automatically generated by Olyx by analysing patterns in your traces — you don’t need to configure them. They surface concrete, actionable recommendations (e.g. “you’re sending summarisation tasks to gpt-4o but gpt-4o-mini handles them equally well at 68% lower cost”).

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/projects/:project_id/insights | List active insights. |
| POST | /api/v1/projects/:project_id/insights/refresh | Trigger insight computation immediately instead of waiting for the next scheduled run. |
| PATCH | /api/v1/projects/:project_id/insights/:insight_id/dismiss | Dismiss an insight once you’ve acted on it (or decided not to). |

Response

[
  {
    "id": 7,
    "insight_type": "migration_alert",
    "status": "active",
    "data": {
      "intent": "summarization",
      "current_model": "gpt-4o",
      "suggested_model": "gpt-4o-mini",
      "savings_pct": 68.2
    },
    "estimated_savings_usd": 210.00,
    "created_at": "2026-04-10T08:00:00Z"
  }
]

Insight types

| insight_type | What triggered it | What to do |
| --- | --- | --- |
| migration_alert | Olyx detected a task pattern (e.g. summarisation, translation) where a cheaper model produces equivalent output based on your trace history. | Use Replay to verify the cheaper model’s output quality, then update the routing tier. |
| latency_alert | P99 latency for a detected task pattern exceeds your configured threshold. | Check which steps are slow, usually tool latency or an oversized prompt. |
| intent_pattern | A task type is dominating traffic but has no dedicated routing rule, so it’s falling through to the default model. | Add a routing rule for this intent so it lands on the right tier automatically. |

estimated_savings_usd is projected over 30 days based on your current traffic volume. It is an estimate — actual savings depend on traffic fluctuation and whether the cheaper model meets your quality bar.
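
If you pull the insights list into a script, one reasonable first step is to rank active migration alerts by projected savings so the largest opportunities are replayed first. `migration_candidates` below is a hypothetical helper operating on the response shape shown above.

```python
def migration_candidates(insights: list) -> list:
    """Active migration_alert insights, largest estimated saving first."""
    active = [
        i for i in insights
        if i["insight_type"] == "migration_alert" and i["status"] == "active"
    ]
    return sorted(active, key=lambda i: i.get("estimated_savings_usd", 0.0), reverse=True)


insights = [
    {
        "id": 7,
        "insight_type": "migration_alert",
        "status": "active",
        "data": {"current_model": "gpt-4o", "suggested_model": "gpt-4o-mini", "savings_pct": 68.2},
        "estimated_savings_usd": 210.00,
    },
]
for insight in migration_candidates(insights):
    data = insight["data"]
    print(f"{data['current_model']} -> {data['suggested_model']}: ~${insight['estimated_savings_usd']:.2f} over 30 days")
```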


Audit Events

Returns the last 50 control-plane events for your organisation. Requires admin or owner role.

These are account-level security events (logins, MFA changes, key revocations) — not trace-level events. Use this endpoint to build a security audit UI or export events to a SIEM.

GET /api/v1/audit-events

Response

[
  {
    "id": 201,
    "event": "login_success",
    "ip": "203.0.113.42",
    "user_agent": "Mozilla/5.0 ...",
    "metadata": {},
    "occurred_at": "2026-04-12T09:00:00Z"
  }
]

| event | Description |
| --- | --- |
| login_success / login_failed | Authentication attempt result. |
| logout | Session ended. |
| mfa_verified / mfa_failed | MFA challenge result. |
| mfa_enabled / mfa_disabled | MFA configuration changed. |
| email_verified | Email address confirmed. |
| member_invited | Team invite sent. |
| perimeter_block | Request blocked by the Olyx perimeter; could be an invalid key, a revoked key, or a request from a blocked IP. |
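
For the SIEM export case, many log pipelines accept newline-delimited JSON. The sketch below serialises audit events in that form and can optionally keep only failure-type events. `to_ndjson` and `FAILURE_EVENTS` are hypothetical names, and which events count as failures is this sketch's own choice based on the table above.

```python
import json

# Event names taken from the table above; treating these as "failures" is an assumption
FAILURE_EVENTS = {"login_failed", "mfa_failed", "perimeter_block"}


def to_ndjson(events: list, only_failures: bool = False) -> str:
    """Serialise audit events as one JSON object per line."""
    selected = [
        e for e in events
        if not only_failures or e["event"] in FAILURE_EVENTS
    ]
    return "\n".join(json.dumps(e, sort_keys=True) for e in selected)


events = [
    {"id": 201, "event": "login_success", "ip": "203.0.113.42", "metadata": {}, "occurred_at": "2026-04-12T09:00:00Z"},
    {"id": 202, "event": "login_failed", "ip": "203.0.113.42", "metadata": {}, "occurred_at": "2026-04-12T09:01:00Z"},
]
print(to_ndjson(events, only_failures=True))
```

Because the endpoint returns only the last 50 events, a real exporter would also need to track the highest id it has already shipped to avoid duplicates.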

Private Agent Routes

For selected closed-beta deployments, your backend can point API calls at an Olyx Agent running inside your network. The agent exposes the same API shape as the hosted gateway and forwards requests outbound through your normal network controls.

Do I need this? Most closed-beta teams should start with the hosted gateway. Add the agent only when the hosted path cannot reach an internal provider endpoint, or when your deployment needs an internal egress point.

Prerequisites

  • An Olyx Agent deployment configured for your project.
  • A project-scoped API key stored in your secret manager.
  • Network access from your backend to the agent hostname.
  • Outbound access from the agent to the configured Olyx control plane and provider endpoints.

Pointing requests at the agent

Replace the hosted gateway host with the agent hostname in server-side calls:

# Hosted gateway
curl -X POST https://olyx.ai/api/v1/executions \
  -H "Authorization: Bearer ak_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

# Agent route — identical request shape, different host
curl -X POST http://olyx-agent:4000/api/v1/executions \
  -H "Authorization: Bearer ak_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

The path, headers, and body are identical — only the host changes.

TLS & custom certificate authority

If your agent route presents a certificate signed by an internal CA, tell your HTTP client to trust that CA. Without this, the client will fail before the request reaches Olyx.

curl

curl --cacert /etc/ssl/certs/internal-ca.pem \
  -X POST https://olyx-agent.internal/api/v1/executions \
  -H "Authorization: Bearer ak_ent_..." \
  -H "Content-Type: application/json" \
  -d '{ "trace_id": "...", "input": "Hello" }'

Python — requests

import requests

resp = requests.post(
    "https://olyx-agent.internal/api/v1/executions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"trace_id": trace_id, "input": "Hello"},
    verify="/etc/ssl/certs/internal-ca.pem",
    timeout=30,
)

Python — httpx

import ssl

import httpx

# httpx takes a custom SSLContext through the verify argument
ctx = ssl.create_default_context(cafile="/etc/ssl/certs/internal-ca.pem")
client = httpx.Client(base_url="https://olyx-agent.internal", verify=ctx, timeout=30.0)

resp = client.post(
    "/api/v1/executions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"trace_id": trace_id, "input": "Hello"},
)

Ruby — Net::HTTP

require "json"      # needed for Hash#to_json below
require "net/http"
require "openssl"

uri  = URI("https://olyx-agent.internal/api/v1/executions")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl     = true
http.ca_file     = "/etc/ssl/certs/internal-ca.pem"
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.read_timeout = 30

req = Net::HTTP::Post.new(uri,
  "Content-Type"  => "application/json",
  "Authorization" => "Bearer #{api_key}"
)
req.body = { trace_id: trace_id, input: "Hello" }.to_json
res = http.request(req)

For Kubernetes in-cluster deployments, reference the agent by its service DNS name and mount the CA bundle your cluster or service mesh expects.

Connectivity verification

Ping the health endpoint before your first execution to confirm the agent is reachable and ready:

curl -f http://olyx-agent:4000/up
# → { "status": "ok" }

Use this as a readiness check in your deployment pipeline and container startup probe. A connection failure usually means the service name, port, container health, or network policy needs attention.
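
A startup script might wrap that check in a bounded retry loop. The sketch below uses only the Python standard library. `agent_is_up` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary defaults rather than recommended values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def agent_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/up answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/up", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and non-2xx responses
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Usage would look like `wait_until_ready(lambda: agent_is_up("http://olyx-agent:4000"))`, exiting non-zero when it returns False so the deployment step fails fast.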

Private model behavior

Models registered with is_public: false represent internal or private routes. In closed beta, pair those model definitions with an agent deployment only when the provider endpoint is not reachable from the hosted gateway.

If a private model is selected but the network path cannot reach it, the execution fails like any other provider connectivity error. Keep one known-good public fallback in staging until the private route has enough trace history.

{
  "error": "Provider request failed",
  "code":  "provider_unreachable"
}

Cost reporting groups internal/private models under their configured infrastructure label when additional_config sets one:

{
  "model_definition": {
    "identifier": "my-private-llama",
    "is_public": false,
    "additional_config": {
      "infrastructure": "internal"
    }
  }
}

Regional routing

If you run services in multiple regions, register a model definition for each region and configure fallbacks between them:

PATCH /api/v1/projects/:project_id/models/:primary_model_id
Content-Type: application/json

{
  "model_definition": {
    "additional_config": {
      "fallback_identifier": "my-private-llama-eu-west"
    }
  }
}

At the network layer, route your application to the nearest healthy agent or hosted gateway using your own load balancer, service mesh, or DNS policy. Keep this simple during closed beta; add regional routing once trace latency shows a clear need.
