Model Providers
Provider setup turns a real model endpoint into a project registry entry. The SDK call does not change when you switch providers; the registry and routing settings decide which model path handles the request.
This page is a closed-beta configuration guide, not a live provider catalog. Model identifiers, prices, and API versions change over time — verify exact values against the provider before updating production routing.
Provider Selection
Start from the model endpoint you already trust, then register only the fields Olyx needs to route and account for it.
| Provider path | Use when |
|---|---|
| OpenAI-compatible | The endpoint accepts OpenAI-style chat/completions requests. |
| Anthropic | You want the native Anthropic Messages API and schema translation. |
| Google Gemini | You are calling the Gemini API directly with a Google API key. |
| AWS Bedrock | Your organization uses AWS model access and IAM controls. |
| Azure OpenAI | Your organization routes OpenAI models through an Azure deployment. |
| Private / self-hosted | The model is reachable through an internal endpoint or outbound agent route. |
Common Fields
Every provider entry has the same core shape. Values differ by provider, but the registry model stays consistent.
| Field | What to provide |
|---|---|
name | Human-readable label, such as GPT-4o Mini Production. |
identifier | The exact model or deployment name used by the provider. |
provider | openai, anthropic, gemini, bedrock, azure, or internal. |
base_url | Provider endpoint, deployment URL, or internal endpoint. |
api_key | Provider credential. Stored server-side; not returned after creation. |
is_public | true for public provider APIs; false for private routes. |
input_cost_per_1k | Prompt-token rate used for project cost summaries. |
output_cost_per_1k | Completion-token rate used for project cost summaries. |
additional_config | Provider-specific fields: API version, region, or fallback identifier. |
Use the dashboard for routine setup. Use the API for automated model provisioning.
Operation Support
Not every provider supports every operation. The gateway enforces this at the request boundary — unsupported operations return a clear error rather than routing silently to a different provider.
| Provider | Chat | Streaming | Embeddings | Image generation |
|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ (DALL-E 2 / 3) |
| Anthropic | ✓ | ✓ | — | — |
| Google Gemini | ✓ | ✓ | ✓ | — |
| AWS Bedrock | ✓ | ✓ | ✓ (Titan, Cohere) | ✓ (Stability AI, Titan Image) |
| Azure OpenAI | ✓ | ✓ | ✓ | — |
| Internal / private | ✓ | ✓ | ✓ | — |
Cells marked — return HTTP 422 with a structured error message. The request is not silently rerouted.
OpenAI-Compatible Providers
Use the OpenAI-compatible path for OpenAI itself and providers that expose an OpenAI-style API: Groq, vLLM, Ollama, LM Studio, or an internal gateway.
{
"name": "GPT-4o Mini",
"identifier": "gpt-4o-mini",
"provider": "openai",
"base_url": "https://api.openai.com/v1",
"is_public": true,
"input_cost_per_1k": 0.00015,
"output_cost_per_1k": 0.0006,
"data_retention_days": 30
}
| Provider | What changes |
|---|---|
| OpenAI | Use OpenAI model identifiers and https://api.openai.com/v1. |
| Groq | Use Groq’s OpenAI-compatible base URL and Groq model identifier. |
| Ollama / vLLM | Use the internal base URL reachable from the configured Olyx path. |
Anthropic
Use provider: "anthropic" when calling Anthropic directly. Olyx uses the native Messages API — tool definitions are
translated from OpenAI function-calling format automatically. Anthropic does not support embeddings or image generation.
{
"name": "Claude Sonnet",
"identifier": "claude-sonnet-4-6",
"provider": "anthropic",
"base_url": "https://api.anthropic.com",
"is_public": true,
"input_cost_per_1k": 0.003,
"output_cost_per_1k": 0.015,
"data_retention_days": 30
}
Model identifiers starting with claude- are automatically inferred as Anthropic without pre-registration. Use a
registry entry when you need per-model credentials, custom cost rates, or a fallback chain.
Google Gemini
Use provider: "gemini" when calling the Gemini API directly. Authentication uses a Google API key passed as a
request parameter — no OAuth flow is required for the direct API path.
{
"name": "Gemini 2.0 Flash",
"identifier": "gemini-2.0-flash",
"provider": "gemini",
"base_url": "https://generativelanguage.googleapis.com/v1beta/models",
"is_public": true,
"input_cost_per_1k": 0.0001,
"output_cost_per_1k": 0.0004,
"data_retention_days": 30
}
Model identifiers starting with gemini- are automatically inferred as Gemini without pre-registration. Use a registry
entry when you need a stored API key, custom cost rates, or a fallback chain.
| Behavior | Detail |
|---|---|
| System prompt | Sent as the top-level systemInstruction field, not inside the messages array. |
| Tool calling | OpenAI function-calling format is translated to Gemini functionDeclarations automatically. |
| Embeddings | Single text calls use embedContent; arrays use batchEmbedContents. |
| Image generation | Not supported on this path. Use Vertex AI Imagen separately. |
AWS Bedrock
Use provider: "bedrock" for Bedrock runtime access. In production, prefer IAM roles over long-lived static
credentials.
{
"name": "Bedrock Claude",
"identifier": "anthropic.claude-3-5-sonnet-20241022-v2:0",
"provider": "bedrock",
"base_url": "https://bedrock-runtime.us-east-1.amazonaws.com",
"is_public": false,
"input_cost_per_1k": 0.003,
"output_cost_per_1k": 0.015,
"data_retention_days": 7,
"additional_config": {
"aws_region": "us-east-1"
}
}
All requests are signed with AWS SigV4. No Authorization header is sent.
| Choice | Guidance |
|---|---|
| IAM role | Preferred for AWS-hosted deployments. |
| Static credentials | Development and tightly controlled CI only. |
| Region | Keep model, gateway, and workloads in the same region to meet latency targets. |
| Inference profiles | Register the profile identifier after testing expected behavior in staging. |
Bedrock model identifiers follow the pattern {provider}.{model} — for example amazon.titan-embed-text-v2:0 or
stability.stable-diffusion-xl-v1. Cross-region variants use a region prefix: us.anthropic.claude-3-5-sonnet-20241022-v2:0.
Embedding models on Bedrock:
| Model family | Identifier prefix |
|---|---|
| Amazon Titan Embed | amazon.titan-embed-* |
| Cohere Embed | cohere.embed-* |
Image generation models on Bedrock:
| Model family | Identifier prefix |
|---|---|
| Stability AI | stability.* |
| Amazon Titan Image | amazon.titan-image-* |
Azure OpenAI
Use provider: "azure" when your organization routes OpenAI models through an Azure deployment. Azure requires a
registered ModelDefinition — the model name alone cannot distinguish an Azure deployment from OpenAI direct.
The base_url must be the deployment root for your resource:
https://{resource-name}.openai.azure.com/openai/deployments/{deployment-id}
{
"name": "Azure GPT-4o",
"identifier": "gpt-4o",
"provider": "azure",
"base_url": "https://my-resource.openai.azure.com/openai/deployments/my-gpt4o",
"is_public": false,
"input_cost_per_1k": 0.005,
"output_cost_per_1k": 0.015,
"data_retention_days": 30
}
| Behavior | Detail |
|---|---|
| Auth | api-key header, not Authorization: Bearer. Set api_key in the registry entry. |
| API version | Appended automatically (2024-08-01-preview). No manual override needed. |
| Wire format | Identical to OpenAI — response parsing and SSE streaming use the same path. |
| Embeddings | Supported. Point base_url at the embeddings deployment root. |
Private and Self-Hosted Models
Private and self-hosted models typically run an OpenAI-compatible server — vLLM, Ollama, LM Studio, or an internal
gateway. Use provider: "internal" for endpoints without a standard public API key.
{
"name": "Internal vLLM",
"identifier": "mistralai/Mistral-7B-Instruct-v0.3",
"provider": "internal",
"base_url": "http://vllm.internal:8000/v1/chat/completions",
"api_key": "internal-token",
"is_public": false,
"input_cost_per_1k": 0.0,
"output_cost_per_1k": 0.0,
"data_retention_days": 7
}
Do not use localhost unless the Olyx process or agent is running on the same host. Use a hostname reachable from the
gateway or agent.
For private embeddings, the base_url should point to the embeddings endpoint of the private server (for example,
http://vllm.internal:8000/v1/embeddings). Configure a separate registry entry if the embedding and chat endpoints
differ.
Cost and Retention
Olyx uses your configured token rates to calculate trace cost. It does not know your private GPU costs, reserved capacity discounts, or enterprise contract rates.
| Field | Guidance |
|---|---|
| Public provider rates | Keep aligned with your current provider agreement. |
| Private model rates | Use your internal estimate, or 0.0 while validating token counts. |
| Retention days | Use the shortest window that still supports debugging and evaluation. |
Cost Intelligence becomes more useful once rates reflect the way your team actually pays for model usage.
Provider Reference
| Provider | provider value | Credential | Auto-inferred from identifier |
|---|---|---|---|
| OpenAI | openai | API key | — |
| OpenAI-compatible | openai | Provider key or internal token | — |
| Anthropic | anthropic | API key | claude-* |
| Google Gemini | gemini | Google API key | gemini-* |
| AWS Bedrock | bedrock | IAM role or AWS credentials | amazon.*, anthropic.*, meta.*, cohere.*, mistral.*, stability.*, and others |
| Azure OpenAI | azure | Azure API key | — (requires registry entry) |
| Self-hosted / internal | internal | Internal token or none | — |
Auto-inferred providers do not require a registry entry for basic use. Register the model explicitly when you need stored credentials, custom cost rates, or a configured fallback.
After Registration
After a provider entry exists, assign it to a routing tier and run a trace through the SDK.
const trace = await client.traces.create({
metadata: { feature: "provider-smoke-test" },
});
const result = await client.execute({
traceId: trace.data.id,
input: "Reply with the configured model name if available.",
});
await client.traces.complete(trace.data.id);
console.log(result.data.model);trace = client.traces.create(
metadata={"feature": "provider-smoke-test"}
)
result = client.execute(
trace_id=trace.id,
input="Reply with the configured model name if available.",
)
client.traces.complete(trace.id)
print(result.model)trace = client.traces.create(
metadata: { feature: "provider-smoke-test" }
)
result = client.execute(
trace_id: trace.id,
input: "Reply with the configured model name if available."
)
client.traces.complete(trace.id)
puts result.modelCheck the trace to confirm the selected model, cost, latency, and any provider error before assigning the model to a live routing tier.