Model Providers

Provider setup turns a real model endpoint into a project registry entry. The SDK call does not change when you switch providers; the registry and routing settings decide which model path handles the request.

This page is a closed-beta configuration guide, not a live provider catalog. Model identifiers, prices, and API versions change over time — verify exact values against the provider before updating production routing.

Provider Selection

Start from the model endpoint you already trust, then register only the fields Olyx needs to route and account for it.

flowchart TD NEED[WHAT MODEL PATH DO YOU NEED?] OPENAI[OPENAI-COMPATIBLE] ANTHROPIC[ANTHROPIC] GEMINI[GOOGLE GEMINI] BEDROCK[AWS BEDROCK] AZURE[AZURE OPENAI] PRIVATE[PRIVATE OR SELF-HOSTED] REGISTRY[MODEL REGISTRY ENTRY] ROUTING[ROUTING TIER] NEED --> OPENAI NEED --> ANTHROPIC NEED --> GEMINI NEED --> BEDROCK NEED --> AZURE NEED --> PRIVATE OPENAI --> REGISTRY ANTHROPIC --> REGISTRY GEMINI --> REGISTRY BEDROCK --> REGISTRY AZURE --> REGISTRY PRIVATE --> REGISTRY REGISTRY --> ROUTING

Provider path	Use when
OpenAI-compatible	The endpoint accepts OpenAI-style chat/completions requests.
Anthropic	You want the native Anthropic Messages API and schema translation.
Google Gemini	You are calling the Gemini API directly with a Google API key.
AWS Bedrock	Your organization uses AWS model access and IAM controls.
Azure OpenAI	Your organization routes OpenAI models through an Azure deployment.
Private / self-hosted	The model is reachable through an internal endpoint or outbound agent route.

Common Fields

Every provider entry has the same core shape. Values differ by provider, but the registry model stays consistent.

Field	What to provide
`name`	Human-readable label, such as `GPT-4o Mini Production`.
`identifier`	The exact model or deployment name used by the provider.
`provider`	`openai`, `anthropic`, `gemini`, `bedrock`, `azure`, or `internal`.
`base_url`	Provider endpoint, deployment URL, or internal endpoint.
`api_key`	Provider credential. Stored server-side; not returned after creation.
`is_public`	`true` for public provider APIs; `false` for private routes.
`input_cost_per_1k`	Prompt-token rate used for project cost summaries.
`output_cost_per_1k`	Completion-token rate used for project cost summaries.
`additional_config`	Provider-specific fields: API version, region, or fallback identifier.

Use the dashboard for routine setup. Use the API for automated model provisioning.

Operation Support

Not every provider supports every operation. The gateway enforces this at the request boundary — unsupported operations return a clear error rather than routing silently to a different provider.

Provider	Chat	Streaming	Embeddings	Image generation
OpenAI	✓	✓	✓	✓ (DALL-E 2 / 3)
Anthropic	✓	✓	—	—
Google Gemini	✓	✓	✓	—
AWS Bedrock	✓	✓	✓ (Titan, Cohere)	✓ (Stability AI, Titan Image)
Azure OpenAI	✓	✓	✓	—
Internal / private	✓	✓	✓	—

Cells marked — return HTTP 422 with a structured error message. The request is not silently rerouted.

OpenAI-Compatible Providers

Use the OpenAI-compatible path for OpenAI itself and providers that expose an OpenAI-style API: Groq, vLLM, Ollama, LM Studio, or an internal gateway.

{
  "name": "GPT-4o Mini",
  "identifier": "gpt-4o-mini",
  "provider": "openai",
  "base_url": "https://api.openai.com/v1",
  "is_public": true,
  "input_cost_per_1k": 0.00015,
  "output_cost_per_1k": 0.0006,
  "data_retention_days": 30
}

Provider	What changes
OpenAI	Use OpenAI model identifiers and `https://api.openai.com/v1`.
Groq	Use Groq’s OpenAI-compatible base URL and Groq model identifier.
Ollama / vLLM	Use the internal base URL reachable from the configured Olyx path.

Anthropic

Use provider: "anthropic" when calling Anthropic directly. Olyx uses the native Messages API — tool definitions are translated from OpenAI function-calling format automatically. Anthropic does not support embeddings or image generation.

{
  "name": "Claude Sonnet",
  "identifier": "claude-sonnet-4-6",
  "provider": "anthropic",
  "base_url": "https://api.anthropic.com",
  "is_public": true,
  "input_cost_per_1k": 0.003,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 30
}

Model identifiers starting with claude- are automatically inferred as Anthropic without pre-registration. Use a registry entry when you need per-model credentials, custom cost rates, or a fallback chain.

Google Gemini

Use provider: "gemini" when calling the Gemini API directly. Authentication uses a Google API key passed as a request parameter — no OAuth flow is required for the direct API path.

{
  "name": "Gemini 2.0 Flash",
  "identifier": "gemini-2.0-flash",
  "provider": "gemini",
  "base_url": "https://generativelanguage.googleapis.com/v1beta/models",
  "is_public": true,
  "input_cost_per_1k": 0.0001,
  "output_cost_per_1k": 0.0004,
  "data_retention_days": 30
}

Model identifiers starting with gemini- are automatically inferred as Gemini without pre-registration. Use a registry entry when you need a stored API key, custom cost rates, or a fallback chain.

Behavior	Detail
System prompt	Sent as the top-level `systemInstruction` field, not inside the messages array.
Tool calling	OpenAI function-calling format is translated to Gemini `functionDeclarations` automatically.
Embeddings	Single text calls use `embedContent`; arrays use `batchEmbedContents`.
Image generation	Not supported on this path. Use Vertex AI Imagen separately.

AWS Bedrock

Use provider: "bedrock" for Bedrock runtime access. In production, prefer IAM roles over long-lived static credentials.

{
  "name": "Bedrock Claude",
  "identifier": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "provider": "bedrock",
  "base_url": "https://bedrock-runtime.us-east-1.amazonaws.com",
  "is_public": false,
  "input_cost_per_1k": 0.003,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 7,
  "additional_config": {
    "aws_region": "us-east-1"
  }
}

All requests are signed with AWS SigV4. No Authorization header is sent.

Choice	Guidance
IAM role	Preferred for AWS-hosted deployments.
Static credentials	Development and tightly controlled CI only.
Region	Keep model, gateway, and workloads in the same region to meet latency targets.
Inference profiles	Register the profile identifier after testing expected behavior in staging.

Bedrock model identifiers follow the pattern {provider}.{model} — for example amazon.titan-embed-text-v2:0 or stability.stable-diffusion-xl-v1. Cross-region variants use a region prefix: us.anthropic.claude-3-5-sonnet-20241022-v2:0.

Embedding models on Bedrock:

Model family	Identifier prefix
Amazon Titan Embed	`amazon.titan-embed-*`
Cohere Embed	`cohere.embed-*`

Image generation models on Bedrock:

Model family	Identifier prefix
Stability AI	`stability.*`
Amazon Titan Image	`amazon.titan-image-*`

Azure OpenAI

Use provider: "azure" when your organization routes OpenAI models through an Azure deployment. Azure requires a registered ModelDefinition — the model name alone cannot distinguish an Azure deployment from OpenAI direct.

The base_url must be the deployment root for your resource:

https://{resource-name}.openai.azure.com/openai/deployments/{deployment-id}

{
  "name": "Azure GPT-4o",
  "identifier": "gpt-4o",
  "provider": "azure",
  "base_url": "https://my-resource.openai.azure.com/openai/deployments/my-gpt4o",
  "is_public": false,
  "input_cost_per_1k": 0.005,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 30
}

Behavior	Detail
Auth	`api-key` header, not `Authorization: Bearer`. Set `api_key` in the registry entry.
API version	Appended automatically (`2024-08-01-preview`). No manual override needed.
Wire format	Identical to OpenAI — response parsing and SSE streaming use the same path.
Embeddings	Supported. Point `base_url` at the embeddings deployment root.

Private and Self-Hosted Models

Private and self-hosted models typically run an OpenAI-compatible server — vLLM, Ollama, LM Studio, or an internal gateway. Use provider: "internal" for endpoints without a standard public API key.

{
  "name": "Internal vLLM",
  "identifier": "mistralai/Mistral-7B-Instruct-v0.3",
  "provider": "internal",
  "base_url": "http://vllm.internal:8000/v1/chat/completions",
  "api_key": "internal-token",
  "is_public": false,
  "input_cost_per_1k": 0.0,
  "output_cost_per_1k": 0.0,
  "data_retention_days": 7
}

Do not use localhost unless the Olyx process or agent is running on the same host. Use a hostname reachable from the gateway or agent.

For private embeddings, the base_url should point to the embeddings endpoint of the private server (for example, http://vllm.internal:8000/v1/embeddings). Configure a separate registry entry if the embedding and chat endpoints differ.

Cost and Retention

Olyx uses your configured token rates to calculate trace cost. It does not know your private GPU costs, reserved capacity discounts, or enterprise contract rates.

Field	Guidance
Public provider rates	Keep aligned with your current provider agreement.
Private model rates	Use your internal estimate, or `0.0` while validating token counts.
Retention days	Use the shortest window that still supports debugging and evaluation.

Cost Intelligence becomes more useful once rates reflect the way your team actually pays for model usage.

Provider Reference

Provider	`provider` value	Credential	Auto-inferred from identifier
OpenAI	`openai`	API key	—
OpenAI-compatible	`openai`	Provider key or internal token	—
Anthropic	`anthropic`	API key	`claude-*`
Google Gemini	`gemini`	Google API key	`gemini-*`
AWS Bedrock	`bedrock`	IAM role or AWS credentials	`amazon.`, `anthropic.`, `meta.`, `cohere.`, `mistral.`, `stability.`, and others
Azure OpenAI	`azure`	Azure API key	— (requires registry entry)
Self-hosted / internal	`internal`	Internal token or none	—

Auto-inferred providers do not require a registry entry for basic use. Register the model explicitly when you need stored credentials, custom cost rates, or a configured fallback.

After Registration

After a provider entry exists, assign it to a routing tier and run a trace through the SDK.

const trace = await client.traces.create({
  metadata: { feature: "provider-smoke-test" },
});

const result = await client.execute({
  traceId: trace.data.id,
  input: "Reply with the configured model name if available.",
});

await client.traces.complete(trace.data.id);
console.log(result.data.model);

trace = client.traces.create(
    metadata={"feature": "provider-smoke-test"}
)

result = client.execute(
    trace_id=trace.id,
    input="Reply with the configured model name if available.",
)

client.traces.complete(trace.id)
print(result.model)

trace = client.traces.create(
  metadata: { feature: "provider-smoke-test" }
)

result = client.execute(
  trace_id: trace.id,
  input: "Reply with the configured model name if available."
)

client.traces.complete(trace.id)
puts result.model

Check the trace to confirm the selected model, cost, latency, and any provider error before assigning the model to a live routing tier.