Model Providers

Provider setup turns a real model endpoint into a project registry entry. The SDK call does not change when you switch providers; the registry and routing settings decide which model path handles the request.

This page is a closed-beta configuration guide, not a live provider catalog. Model identifiers, prices, and API versions change over time — verify exact values against the provider before updating production routing.


Provider Selection

Start from the model endpoint you already trust, then register only the fields Olyx needs to route and account for it.

flowchart TD NEED[WHAT MODEL PATH DO YOU NEED?] OPENAI[OPENAI-COMPATIBLE] ANTHROPIC[ANTHROPIC] GEMINI[GOOGLE GEMINI] BEDROCK[AWS BEDROCK] AZURE[AZURE OPENAI] PRIVATE[PRIVATE OR SELF-HOSTED] REGISTRY[MODEL REGISTRY ENTRY] ROUTING[ROUTING TIER] NEED --> OPENAI NEED --> ANTHROPIC NEED --> GEMINI NEED --> BEDROCK NEED --> AZURE NEED --> PRIVATE OPENAI --> REGISTRY ANTHROPIC --> REGISTRY GEMINI --> REGISTRY BEDROCK --> REGISTRY AZURE --> REGISTRY PRIVATE --> REGISTRY REGISTRY --> ROUTING
Provider pathUse when
OpenAI-compatibleThe endpoint accepts OpenAI-style chat/completions requests.
AnthropicYou want the native Anthropic Messages API and schema translation.
Google GeminiYou are calling the Gemini API directly with a Google API key.
AWS BedrockYour organization uses AWS model access and IAM controls.
Azure OpenAIYour organization routes OpenAI models through an Azure deployment.
Private / self-hostedThe model is reachable through an internal endpoint or outbound agent route.

Common Fields

Every provider entry has the same core shape. Values differ by provider, but the registry model stays consistent.

FieldWhat to provide
nameHuman-readable label, such as GPT-4o Mini Production.
identifierThe exact model or deployment name used by the provider.
provideropenai, anthropic, gemini, bedrock, azure, or internal.
base_urlProvider endpoint, deployment URL, or internal endpoint.
api_keyProvider credential. Stored server-side; not returned after creation.
is_publictrue for public provider APIs; false for private routes.
input_cost_per_1kPrompt-token rate used for project cost summaries.
output_cost_per_1kCompletion-token rate used for project cost summaries.
additional_configProvider-specific fields: API version, region, or fallback identifier.

Use the dashboard for routine setup. Use the API for automated model provisioning.


Operation Support

Not every provider supports every operation. The gateway enforces this at the request boundary — unsupported operations return a clear error rather than routing silently to a different provider.

ProviderChatStreamingEmbeddingsImage generation
OpenAI✓ (DALL-E 2 / 3)
Anthropic
Google Gemini
AWS Bedrock✓ (Titan, Cohere)✓ (Stability AI, Titan Image)
Azure OpenAI
Internal / private

Cells marked — return HTTP 422 with a structured error message. The request is not silently rerouted.


OpenAI-Compatible Providers

Use the OpenAI-compatible path for OpenAI itself and providers that expose an OpenAI-style API: Groq, vLLM, Ollama, LM Studio, or an internal gateway.

{
  "name": "GPT-4o Mini",
  "identifier": "gpt-4o-mini",
  "provider": "openai",
  "base_url": "https://api.openai.com/v1",
  "is_public": true,
  "input_cost_per_1k": 0.00015,
  "output_cost_per_1k": 0.0006,
  "data_retention_days": 30
}
ProviderWhat changes
OpenAIUse OpenAI model identifiers and https://api.openai.com/v1.
GroqUse Groq’s OpenAI-compatible base URL and Groq model identifier.
Ollama / vLLMUse the internal base URL reachable from the configured Olyx path.

Anthropic

Use provider: "anthropic" when calling Anthropic directly. Olyx uses the native Messages API — tool definitions are translated from OpenAI function-calling format automatically. Anthropic does not support embeddings or image generation.

{
  "name": "Claude Sonnet",
  "identifier": "claude-sonnet-4-6",
  "provider": "anthropic",
  "base_url": "https://api.anthropic.com",
  "is_public": true,
  "input_cost_per_1k": 0.003,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 30
}

Model identifiers starting with claude- are automatically inferred as Anthropic without pre-registration. Use a registry entry when you need per-model credentials, custom cost rates, or a fallback chain.


Google Gemini

Use provider: "gemini" when calling the Gemini API directly. Authentication uses a Google API key passed as a request parameter — no OAuth flow is required for the direct API path.

{
  "name": "Gemini 2.0 Flash",
  "identifier": "gemini-2.0-flash",
  "provider": "gemini",
  "base_url": "https://generativelanguage.googleapis.com/v1beta/models",
  "is_public": true,
  "input_cost_per_1k": 0.0001,
  "output_cost_per_1k": 0.0004,
  "data_retention_days": 30
}

Model identifiers starting with gemini- are automatically inferred as Gemini without pre-registration. Use a registry entry when you need a stored API key, custom cost rates, or a fallback chain.

BehaviorDetail
System promptSent as the top-level systemInstruction field, not inside the messages array.
Tool callingOpenAI function-calling format is translated to Gemini functionDeclarations automatically.
EmbeddingsSingle text calls use embedContent; arrays use batchEmbedContents.
Image generationNot supported on this path. Use Vertex AI Imagen separately.

AWS Bedrock

Use provider: "bedrock" for Bedrock runtime access. In production, prefer IAM roles over long-lived static credentials.

{
  "name": "Bedrock Claude",
  "identifier": "anthropic.claude-3-5-sonnet-20241022-v2:0",
  "provider": "bedrock",
  "base_url": "https://bedrock-runtime.us-east-1.amazonaws.com",
  "is_public": false,
  "input_cost_per_1k": 0.003,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 7,
  "additional_config": {
    "aws_region": "us-east-1"
  }
}

All requests are signed with AWS SigV4. No Authorization header is sent.

ChoiceGuidance
IAM rolePreferred for AWS-hosted deployments.
Static credentialsDevelopment and tightly controlled CI only.
RegionKeep model, gateway, and workloads in the same region to meet latency targets.
Inference profilesRegister the profile identifier after testing expected behavior in staging.

Bedrock model identifiers follow the pattern {provider}.{model} — for example amazon.titan-embed-text-v2:0 or stability.stable-diffusion-xl-v1. Cross-region variants use a region prefix: us.anthropic.claude-3-5-sonnet-20241022-v2:0.

Embedding models on Bedrock:

Model familyIdentifier prefix
Amazon Titan Embedamazon.titan-embed-*
Cohere Embedcohere.embed-*

Image generation models on Bedrock:

Model familyIdentifier prefix
Stability AIstability.*
Amazon Titan Imageamazon.titan-image-*

Azure OpenAI

Use provider: "azure" when your organization routes OpenAI models through an Azure deployment. Azure requires a registered ModelDefinition — the model name alone cannot distinguish an Azure deployment from OpenAI direct.

The base_url must be the deployment root for your resource:

https://{resource-name}.openai.azure.com/openai/deployments/{deployment-id}
{
  "name": "Azure GPT-4o",
  "identifier": "gpt-4o",
  "provider": "azure",
  "base_url": "https://my-resource.openai.azure.com/openai/deployments/my-gpt4o",
  "is_public": false,
  "input_cost_per_1k": 0.005,
  "output_cost_per_1k": 0.015,
  "data_retention_days": 30
}
BehaviorDetail
Authapi-key header, not Authorization: Bearer. Set api_key in the registry entry.
API versionAppended automatically (2024-08-01-preview). No manual override needed.
Wire formatIdentical to OpenAI — response parsing and SSE streaming use the same path.
EmbeddingsSupported. Point base_url at the embeddings deployment root.

Private and Self-Hosted Models

Private and self-hosted models typically run an OpenAI-compatible server — vLLM, Ollama, LM Studio, or an internal gateway. Use provider: "internal" for endpoints without a standard public API key.

{
  "name": "Internal vLLM",
  "identifier": "mistralai/Mistral-7B-Instruct-v0.3",
  "provider": "internal",
  "base_url": "http://vllm.internal:8000/v1/chat/completions",
  "api_key": "internal-token",
  "is_public": false,
  "input_cost_per_1k": 0.0,
  "output_cost_per_1k": 0.0,
  "data_retention_days": 7
}

Do not use localhost unless the Olyx process or agent is running on the same host. Use a hostname reachable from the gateway or agent.

For private embeddings, the base_url should point to the embeddings endpoint of the private server (for example, http://vllm.internal:8000/v1/embeddings). Configure a separate registry entry if the embedding and chat endpoints differ.


Cost and Retention

Olyx uses your configured token rates to calculate trace cost. It does not know your private GPU costs, reserved capacity discounts, or enterprise contract rates.

FieldGuidance
Public provider ratesKeep aligned with your current provider agreement.
Private model ratesUse your internal estimate, or 0.0 while validating token counts.
Retention daysUse the shortest window that still supports debugging and evaluation.

Cost Intelligence becomes more useful once rates reflect the way your team actually pays for model usage.


Provider Reference

Providerprovider valueCredentialAuto-inferred from identifier
OpenAIopenaiAPI key
OpenAI-compatibleopenaiProvider key or internal token
AnthropicanthropicAPI keyclaude-*
Google GeminigeminiGoogle API keygemini-*
AWS BedrockbedrockIAM role or AWS credentialsamazon.*, anthropic.*, meta.*, cohere.*, mistral.*, stability.*, and others
Azure OpenAIazureAzure API key— (requires registry entry)
Self-hosted / internalinternalInternal token or none

Auto-inferred providers do not require a registry entry for basic use. Register the model explicitly when you need stored credentials, custom cost rates, or a configured fallback.


After Registration

After a provider entry exists, assign it to a routing tier and run a trace through the SDK.

const trace = await client.traces.create({
  metadata: { feature: "provider-smoke-test" },
});

const result = await client.execute({
  traceId: trace.data.id,
  input: "Reply with the configured model name if available.",
});

await client.traces.complete(trace.data.id);
console.log(result.data.model);
trace = client.traces.create(
    metadata={"feature": "provider-smoke-test"}
)

result = client.execute(
    trace_id=trace.id,
    input="Reply with the configured model name if available.",
)

client.traces.complete(trace.id)
print(result.model)
trace = client.traces.create(
  metadata: { feature: "provider-smoke-test" }
)

result = client.execute(
  trace_id: trace.id,
  input: "Reply with the configured model name if available."
)

client.traces.complete(trace.id)
puts result.model

Check the trace to confirm the selected model, cost, latency, and any provider error before assigning the model to a live routing tier.

Was this page helpful?