Model providers

This page covers LLM/model providers (not chat channels like WhatsApp/Telegram). For model selection rules, see /concepts/models.

Quick rules

Model refs use provider/model (example: opencode/claude-opus-4-6).
agents.defaults.models acts as an allowlist when set.
CLI helpers: fluffbuzz onboard, fluffbuzz models list, fluffbuzz models set <provider/model>.
models.providers.*.models[].contextWindow is native model metadata; contextTokens is the effective runtime cap.
For fallback rules, cooldown probes, and session-override persistence, see Model failover.
OpenAI-family routes are prefix-specific: openai/<model> uses the direct OpenAI API-key provider in PI, openai-codex/<model> uses Codex OAuth in PI, and openai/<model> plus agents.defaults.embeddedHarness.runtime: "codex" uses the native Codex app-server harness. See OpenAI and Codex harness.
GPT-5.5 is currently available through subscription/OAuth routes: openai-codex/gpt-5.5 in PI or openai/gpt-5.5 with the Codex app-server harness. The direct API-key route for openai/gpt-5.5 is supported once OpenAI enables GPT-5.5 on the public API; until then use API-enabled models such as openai/gpt-5.4 for OPENAI_API_KEY setups.
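
The first two rules can be sketched as follows. This is a minimal illustration, not FluffBuzz code: parseModelRef and isAllowed are hypothetical helper names. It assumes the provider id is everything before the first slash (consistent with refs on this page such as vercel-ai-gateway/anthropic/claude-opus-4.6, where the model id itself contains a slash) and that allowlist keys are full provider/model refs.

```typescript
// Hypothetical helpers for illustration; not part of the FluffBuzz API.
interface ModelRef {
  provider: string;
  model: string;
}

// Split at the FIRST slash only, so model ids may themselves contain slashes.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected provider/model, got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Assumption: when agents.defaults.models is set, only refs that appear as
// keys are selectable; when it is unset, every ref is allowed.
function isAllowed(ref: string, allowlist?: Record<string, unknown>): boolean {
  if (allowlist === undefined) return true;
  return Object.prototype.hasOwnProperty.call(allowlist, ref);
}
```

With this sketch, parseModelRef("kilocode/kilo/auto") yields provider kilocode and model kilo/auto.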

Plugin-owned provider behavior

Most provider-specific logic lives in provider plugins (registerProvider(...)) while FluffBuzz keeps the generic inference loop. Plugins own onboarding, model catalogs, auth env-var mapping, transport/config normalization, tool-schema cleanup, failover classification, OAuth refresh, usage reporting, thinking/reasoning profiles, and more. The full list of provider-SDK hooks and bundled-plugin examples lives in Provider plugins. A provider that needs a totally custom request executor is a separate, deeper extension surface.

Provider runtime capabilities are shared runner metadata (provider family, transcript/tooling quirks, transport/cache hints). They are not the same as the public capability model, which describes what a plugin registers (text inference, speech, etc.).

API key rotation

FluffBuzz supports generic API key rotation for selected providers.
Configure multiple keys via:
- FLUFFBUZZ_LIVE_<PROVIDER>_KEY (single live override, highest priority)
- <PROVIDER>_API_KEYS (comma or semicolon list)
- <PROVIDER>_API_KEY (primary key)
- <PROVIDER>_API_KEY_* (numbered list, e.g. <PROVIDER>_API_KEY_1)
For Google providers, GOOGLE_API_KEY is also included as a fallback.
Key selection order preserves priority and deduplicates values.
Requests are retried with the next key only on rate-limit responses (for example 429, rate_limit, quota, resource exhausted, Too many concurrent requests, ThrottlingException, concurrency limit reached, workers_ai ... quota limit exceeded, or periodic usage-limit messages).
Non-rate-limit failures fail immediately; no key rotation is attempted.
When all candidate keys fail, the final error is returned from the last attempt.
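
The rotation rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: candidateKeys, isRateLimit, and withKeyRotation are hypothetical names, the environment is passed in as a plain record, numbered keys are assumed to be tried in ascending suffix order, and rate-limit detection is reduced to a substring match on a few of the examples listed above.

```typescript
type Env = Record<string, string | undefined>;

// Collect candidate keys in the documented priority order, dropping duplicates.
// envPrefix is the <PROVIDER> part of the variable names, e.g. "OPENAI".
function candidateKeys(env: Env, envPrefix: string, fallbackVars: string[] = []): string[] {
  const numberedPrefix = `${envPrefix}_API_KEY_`;
  const numbered = Object.keys(env)
    .filter((name) => name.indexOf(numberedPrefix) === 0)
    .filter((name) => /^\d+$/.test(name.slice(numberedPrefix.length)))
    // Assumption: numbered keys are tried in ascending suffix order.
    .sort((a, b) => Number(a.slice(numberedPrefix.length)) - Number(b.slice(numberedPrefix.length)));

  const raw: (string | undefined)[] = [
    env[`FLUFFBUZZ_LIVE_${envPrefix}_KEY`],         // single live override
    ...(env[`${envPrefix}_API_KEYS`] ?? "").split(/[,;]/), // comma or semicolon list
    env[`${envPrefix}_API_KEY`],                     // primary key
    ...numbered.map((name) => env[name]),            // numbered list
    ...fallbackVars.map((name) => env[name]),        // e.g. GOOGLE_API_KEY
  ];

  const seen = new Set<string>();
  const keys: string[] = [];
  for (const value of raw) {
    const key = (value ?? "").trim();
    if (key !== "" && !seen.has(key)) {
      seen.add(key);
      keys.push(key);
    }
  }
  return keys;
}

// Simplified classifier: substring match on a subset of the documented hints.
const RATE_LIMIT_HINTS = ["429", "rate_limit", "quota", "resource exhausted",
  "too many concurrent requests", "throttlingexception", "concurrency limit reached"];

function isRateLimit(error: unknown): boolean {
  const message = String(error instanceof Error ? error.message : error).toLowerCase();
  return RATE_LIMIT_HINTS.some((hint) => message.indexOf(hint) !== -1);
}

// Try each key in order; rotate only on rate-limit errors.
async function withKeyRotation<T>(keys: string[], call: (key: string) => Promise<T>): Promise<T> {
  if (keys.length === 0) throw new Error("No API keys configured");
  let lastError: unknown;
  for (const key of keys) {
    try {
      return await call(key);
    } catch (error) {
      lastError = error;
      if (!isRateLimit(error)) throw error; // non-rate-limit failure: fail immediately
    }
  }
  throw lastError; // every key was rate-limited: surface the last attempt's error
}
```

For Google providers, the sketch would be called as candidateKeys(env, "GEMINI", ["GOOGLE_API_KEY"]).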

Built-in providers (pi-ai catalog)

FluffBuzz ships with the pi-ai catalog. These providers require no models.providers config; set auth and pick a model.

OpenAI

Provider: openai
Auth: OPENAI_API_KEY
Optional rotation: OPENAI_API_KEYS, OPENAI_API_KEY_1, OPENAI_API_KEY_2, plus FLUFFBUZZ_LIVE_OPENAI_KEY (single override)
Example models: openai/gpt-5.4, openai/gpt-5.4-mini
Direct API-key access to openai/gpt-5.5 will work through this provider once OpenAI exposes GPT-5.5 on the public API
CLI: fluffbuzz onboard --auth-choice openai-api-key
Default transport is auto (WebSocket-first, SSE fallback)
Override per model via agents.defaults.models["openai/<model>"].params.transport ("sse", "websocket", or "auto")
OpenAI Responses WebSocket warm-up defaults to enabled via params.openaiWsWarmup (true/false)
OpenAI priority processing can be enabled via agents.defaults.models["openai/<model>"].params.serviceTier
/fast and params.fastMode map direct openai/* Responses requests to service_tier=priority on api.openai.com
Use params.serviceTier when you want an explicit tier instead of the shared /fast toggle
Hidden FluffBuzz attribution headers (originator, version, User-Agent) apply only on native OpenAI traffic to api.openai.com, not generic OpenAI-compatible proxies
Native OpenAI routes also keep Responses store, prompt-cache hints, and OpenAI reasoning-compat payload shaping; proxy routes do not
openai/gpt-5.3-codex-spark is intentionally suppressed in FluffBuzz because live OpenAI API requests reject it and the current Codex catalog does not expose it

{
  agents: { defaults: { model: { primary: "openai/gpt-5.4" } } },
}

Anthropic

Provider: anthropic
Auth: ANTHROPIC_API_KEY
Optional rotation: ANTHROPIC_API_KEYS, ANTHROPIC_API_KEY_1, ANTHROPIC_API_KEY_2, plus FLUFFBUZZ_LIVE_ANTHROPIC_KEY (single override)
Example model: anthropic/claude-opus-4-6
CLI: fluffbuzz onboard --auth-choice apiKey
Direct public Anthropic requests support the shared /fast toggle and params.fastMode, including API-key and OAuth-authenticated traffic sent to api.anthropic.com; FluffBuzz maps that to Anthropic service_tier (auto vs standard_only)
Anthropic note: Anthropic staff told us FluffBuzz-style Claude CLI usage is allowed again, so FluffBuzz treats Claude CLI reuse and claude -p usage as sanctioned for this integration unless Anthropic publishes a new policy.
Anthropic setup-token remains available as a supported FluffBuzz token path, but FluffBuzz now prefers Claude CLI reuse and claude -p when available.

{
  agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
}

OpenAI Codex OAuth

Provider: openai-codex
Auth: OAuth (ChatGPT)
PI model ref: openai-codex/gpt-5.5
Native Codex app-server harness ref: openai/gpt-5.5 with agents.defaults.embeddedHarness.runtime: "codex"
Legacy model refs: codex/gpt-*
CLI: fluffbuzz onboard --auth-choice openai-codex or fluffbuzz models auth login --provider openai-codex
Default transport is auto (WebSocket-first, SSE fallback)
Override per PI model via agents.defaults.models["openai-codex/<model>"].params.transport ("sse", "websocket", or "auto")
params.serviceTier is also forwarded on native Codex Responses requests (chatgpt.com/backend-api)
Hidden FluffBuzz attribution headers (originator, version, User-Agent) are only attached on native Codex traffic to chatgpt.com/backend-api, not generic OpenAI-compatible proxies
Shares the same /fast toggle and params.fastMode config as direct openai/*; FluffBuzz maps that to service_tier=priority
openai-codex/gpt-5.5 keeps native contextWindow = 1000000 and a default runtime contextTokens = 272000; override the runtime cap with models.providers.openai-codex.models[].contextTokens
Policy note: OpenAI Codex OAuth is explicitly supported for external tools/workflows like FluffBuzz.
Current GPT-5.5 access uses this OAuth/subscription route until OpenAI enables GPT-5.5 on the public API.

{
  agents: { defaults: { model: { primary: "openai-codex/gpt-5.5" } } },
}

{
  models: {
    providers: {
      "openai-codex": {
        models: [{ id: "gpt-5.5", contextTokens: 160000 }],
      },
    },
  },
}
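
The relationship between the native contextWindow and the runtime contextTokens cap can be pictured with a small resolver. This is a sketch only: effectiveContextTokens and both interfaces are hypothetical, and the precedence shown (user override, then catalog default, then native window) is an assumption rather than documented behavior.

```typescript
// Hypothetical shapes for illustration; field names follow this page's config keys.
interface CatalogModel {
  id: string;
  contextWindow: number;  // native model metadata
  contextTokens?: number; // default runtime cap shipped with the catalog
}

interface ModelOverride {
  id: string;
  contextTokens?: number; // from models.providers.<id>.models[]
}

// Assumption: a user override wins over the catalog default, and the native
// window is used only when neither sets a runtime cap.
function effectiveContextTokens(model: CatalogModel, overrides: ModelOverride[] = []): number {
  const override = overrides.find((entry) => entry.id === model.id);
  return override?.contextTokens ?? model.contextTokens ?? model.contextWindow;
}
```

Under these assumptions, the override in the config above would lower the runtime cap for gpt-5.5 from 272000 to 160000 while the native contextWindow stays at 1000000.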

Other subscription-style hosted options

Qwen Cloud: Qwen Cloud provider surface plus Alibaba DashScope and Coding Plan endpoint mapping
MiniMax: MiniMax Coding Plan OAuth or API key access
GLM Models: Z.AI Coding Plan or general API endpoints

OpenCode

Auth: OPENCODE_API_KEY (or OPENCODE_ZEN_API_KEY)
Zen runtime provider: opencode
Go runtime provider: opencode-go
Example models: opencode/claude-opus-4-6, opencode-go/kimi-k2.5
CLI: fluffbuzz onboard --auth-choice opencode-zen or fluffbuzz onboard --auth-choice opencode-go

{
  agents: { defaults: { model: { primary: "opencode/claude-opus-4-6" } } },
}

Google Gemini (API key)

Provider: google
Auth: GEMINI_API_KEY
Optional rotation: GEMINI_API_KEYS, GEMINI_API_KEY_1, GEMINI_API_KEY_2, GOOGLE_API_KEY fallback, and FLUFFBUZZ_LIVE_GEMINI_KEY (single override)
Example models: google/gemini-3.1-pro-preview, google/gemini-3-flash-preview
Compatibility: legacy FluffBuzz config using google/gemini-3.1-flash-preview is normalized to google/gemini-3-flash-preview
CLI: fluffbuzz onboard --auth-choice gemini-api-key
Direct Gemini runs also accept agents.defaults.models["google/<model>"].params.cachedContent (or legacy cached_content) to forward a provider-native cachedContents/... handle; Gemini cache hits surface as FluffBuzz cacheRead

Google Vertex and Gemini CLI

Providers: google-vertex, google-gemini-cli
Auth: Vertex uses gcloud ADC; Gemini CLI uses its OAuth flow
Caution: Gemini CLI OAuth in FluffBuzz is an unofficial integration. Some users have reported Google account restrictions after using third-party clients. Review Google terms and use a non-critical account if you choose to proceed.
Gemini CLI OAuth is shipped as part of the bundled google plugin.
- Install Gemini CLI first:
  - brew install gemini-cli
  - or npm install -g @google/gemini-cli
- Enable: fluffbuzz plugins enable google
- Login: fluffbuzz models auth login --provider google-gemini-cli --set-default
- Default model: google-gemini-cli/gemini-3-flash-preview
- Note: you do not paste a client id or secret into fluffbuzz.json. The CLI login flow stores tokens in auth profiles on the gateway host.
- If requests fail after login, set GOOGLE_CLOUD_PROJECT or GOOGLE_CLOUD_PROJECT_ID on the gateway host.
- Gemini CLI JSON replies are parsed from response; usage falls back to stats, with stats.cached normalized into FluffBuzz cacheRead.

Z.AI (GLM)

Provider: zai
Auth: ZAI_API_KEY
Example model: zai/glm-5.1
CLI: fluffbuzz onboard --auth-choice zai-api-key
- Aliases: z.ai/* and z-ai/* normalize to zai/*
- zai-api-key auto-detects the matching Z.AI endpoint; zai-coding-global, zai-coding-cn, zai-global, and zai-cn force a specific surface
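
The alias rule above amounts to rewriting the provider segment of a ref. A minimal sketch, assuming the match is exact and case-sensitive; normalizeZaiRef is a hypothetical helper name:

```typescript
// Provider aliases that this page says normalize to "zai".
const ZAI_ALIASES = ["z.ai", "z-ai"];

// Rewrite z.ai/<model> and z-ai/<model> to zai/<model>; leave other refs untouched.
function normalizeZaiRef(ref: string): string {
  const slash = ref.indexOf("/");
  if (slash === -1) return ref;
  const provider = ref.slice(0, slash);
  return ZAI_ALIASES.indexOf(provider) !== -1 ? `zai${ref.slice(slash)}` : ref;
}
```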

Vercel AI Gateway

Provider: vercel-ai-gateway
Auth: AI_GATEWAY_API_KEY
Example models: vercel-ai-gateway/anthropic/claude-opus-4.6, vercel-ai-gateway/moonshotai/kimi-k2.6
CLI: fluffbuzz onboard --auth-choice ai-gateway-api-key

Kilo Gateway

Provider: kilocode
Auth: KILOCODE_API_KEY
Example model: kilocode/kilo/auto
CLI: fluffbuzz onboard --auth-choice kilocode-api-key
Base URL: https://api.kilo.ai/api/gateway/
Static fallback catalog ships kilocode/kilo/auto; live https://api.kilo.ai/api/gateway/models discovery can expand the runtime catalog further.
Exact upstream routing behind kilocode/kilo/auto is owned by Kilo Gateway, not hard-coded in FluffBuzz.

See /providers/kilocode for setup details.

Other bundled provider plugins

| Provider | Id | Auth env | Example model |
| --- | --- | --- | --- |
| BytePlus | `byteplus` / `byteplus-plan` | `BYTEPLUS_API_KEY` | `byteplus-plan/ark-code-latest` |
| Cerebras | `cerebras` | `CEREBRAS_API_KEY` | `cerebras/zai-glm-4.7` |
| Cloudflare AI Gateway | `cloudflare-ai-gateway` | `CLOUDFLARE_AI_GATEWAY_API_KEY` | - |
| GitHub Copilot | `github-copilot` | `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN` | - |
| Groq | `groq` | `GROQ_API_KEY` | - |
| Hugging Face Inference | `huggingface` | `HUGGINGFACE_HUB_TOKEN` or `HF_TOKEN` | `huggingface/deepseek-ai/DeepSeek-R1` |
| Kilo Gateway | `kilocode` | `KILOCODE_API_KEY` | `kilocode/kilo/auto` |
| Kimi Coding | `kimi` | `KIMI_API_KEY` or `KIMICODE_API_KEY` | `kimi/kimi-code` |
| MiniMax | `minimax` / `minimax-portal` | `MINIMAX_API_KEY` / `MINIMAX_OAUTH_TOKEN` | `minimax/MiniMax-M2.7` |
| Mistral | `mistral` | `MISTRAL_API_KEY` | `mistral/mistral-large-latest` |
| Moonshot | `moonshot` | `MOONSHOT_API_KEY` | `moonshot/kimi-k2.6` |
| NVIDIA | `nvidia` | `NVIDIA_API_KEY` | `nvidia/nvidia/llama-3.1-nemotron-70b-instruct` |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | `openrouter/auto` |
| Qianfan | `qianfan` | `QIANFAN_API_KEY` | `qianfan/deepseek-v3.2` |
| Qwen Cloud | `qwen` | `QWEN_API_KEY` / `MODELSTUDIO_API_KEY` / `DASHSCOPE_API_KEY` | `qwen/qwen3.5-plus` |
| StepFun | `stepfun` / `stepfun-plan` | `STEPFUN_API_KEY` | `stepfun/step-3.5-flash` |
| Together | `together` | `TOGETHER_API_KEY` | `together/moonshotai/Kimi-K2.5` |
| Venice | `venice` | `VENICE_API_KEY` | - |
| Vercel AI Gateway | `vercel-ai-gateway` | `AI_GATEWAY_API_KEY` | `vercel-ai-gateway/anthropic/claude-opus-4.6` |
| Volcano Engine (Doubao) | `volcengine` / `volcengine-plan` | `VOLCANO_ENGINE_API_KEY` | `volcengine-plan/ark-code-latest` |
| xAI | `xai` | `XAI_API_KEY` | `xai/grok-4` |
| Xiaomi | `xiaomi` | `XIAOMI_API_KEY` | `xiaomi/mimo-v2-flash` |

Quirks worth knowing:

OpenRouter applies its app-attribution headers and Anthropic cache_control markers only on verified openrouter.ai routes. As a proxy-style OpenAI-compatible path, it skips native-OpenAI-only shaping (serviceTier, Responses store, prompt-cache hints, OpenAI reasoning-compat). Gemini-backed refs keep proxy-Gemini thought-signature sanitation only.
Kilo Gateway Gemini-backed refs follow the same proxy-Gemini sanitation path; kilocode/kilo/auto and other proxy-reasoning-unsupported refs skip proxy reasoning injection.
MiniMax API-key onboarding writes explicit M2.7 model definitions with input: ["text", "image"]; the bundled catalog keeps chat refs text-only until that config is materialized.
xAI uses the xAI Responses path. /fast or params.fastMode: true rewrites grok-3, grok-3-mini, grok-4, and grok-4-0709 to their *-fast variants. tool_stream defaults on; disable via agents.defaults.models["xai/<model>"].params.tool_stream=false.
Cerebras GLM models use zai-glm-4.7 / zai-glm-4.6; OpenAI-compatible base URL is https://api.cerebras.ai/v1.
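
The xAI fast-mode rewrite described above can be sketched as a lookup against the four listed ids. This is an illustration only: applyXaiFastMode is a hypothetical name, and the sketch assumes each fast variant id is the base id with a -fast suffix.

```typescript
// Model ids that this page lists as eligible for the /fast rewrite.
const XAI_FAST_ELIGIBLE = ["grok-3", "grok-3-mini", "grok-4", "grok-4-0709"];

// Assumption: the fast variant is "<base id>-fast"; other ids pass through unchanged.
function applyXaiFastMode(modelId: string, fastMode: boolean): string {
  if (!fastMode) return modelId;
  return XAI_FAST_ELIGIBLE.indexOf(modelId) !== -1 ? `${modelId}-fast` : modelId;
}
```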

Providers via `models.providers` (custom/base URL)

Use models.providers (or models.json) to add custom providers or OpenAI/Anthropic-compatible proxies. Many of the bundled provider plugins below already publish a default catalog. Use explicit models.providers.<id> entries only when you want to override the default base URL, headers, or model list.

Moonshot AI (Kimi)

Moonshot ships as a bundled provider plugin. Use the built-in provider by default, and add an explicit models.providers.moonshot entry only when you need to override the base URL or model metadata:

Provider: moonshot
Auth: MOONSHOT_API_KEY
Example model: moonshot/kimi-k2.6
CLI: fluffbuzz onboard --auth-choice moonshot-api-key or fluffbuzz onboard --auth-choice moonshot-api-key-cn

Kimi K2 model IDs:

moonshot/kimi-k2.6
moonshot/kimi-k2.5
moonshot/kimi-k2-thinking
moonshot/kimi-k2-thinking-turbo
moonshot/kimi-k2-turbo

{
  agents: {
    defaults: { model: { primary: "moonshot/kimi-k2.6" } },
  },
  models: {
    mode: "merge",
    providers: {
      moonshot: {
        baseUrl: "https://api.moonshot.ai/v1",
        apiKey: "${MOONSHOT_API_KEY}",
        api: "openai-completions",
        models: [{ id: "kimi-k2.6", name: "Kimi K2.6" }],
      },
    },
  },
}

Kimi Coding

Kimi Coding uses Moonshot AI’s Anthropic-compatible endpoint:

Provider: kimi
Auth: KIMI_API_KEY
Example model: kimi/kimi-code

{
  env: { KIMI_API_KEY: "sk-..." },
  agents: {
    defaults: { model: { primary: "kimi/kimi-code" } },
  },
}

Legacy kimi/k2p5 remains accepted as a compatibility model id.

Volcano Engine (Doubao)

Volcano Engine (火山引擎) provides access to Doubao and other models in China.

Provider: volcengine (coding: volcengine-plan)
Auth: VOLCANO_ENGINE_API_KEY
Example model: volcengine-plan/ark-code-latest
CLI: fluffbuzz onboard --auth-choice volcengine-api-key

{
  agents: {
    defaults: { model: { primary: "volcengine-plan/ark-code-latest" } },
  },
}

Onboarding defaults to the coding surface, but the general volcengine/* catalog is registered at the same time. In onboarding/configure model pickers, the Volcengine auth choice prefers both volcengine/* and volcengine-plan/* rows. If those models are not loaded yet, FluffBuzz falls back to the unfiltered catalog instead of showing an empty provider-scoped picker.

Available models:

volcengine/doubao-seed-1-8-251228 (Doubao Seed 1.8)
volcengine/doubao-seed-code-preview-251028
volcengine/kimi-k2-5-260127 (Kimi K2.5)
volcengine/glm-4-7-251222 (GLM 4.7)
volcengine/deepseek-v3-2-251201 (DeepSeek V3.2 128K)

Coding models (volcengine-plan):

volcengine-plan/ark-code-latest
volcengine-plan/doubao-seed-code
volcengine-plan/kimi-k2.5
volcengine-plan/kimi-k2-thinking
volcengine-plan/glm-4.7
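
The picker behavior described above (prefer the provider's own rows, fall back to the unfiltered catalog when none are loaded) can be sketched as a filter with a fallback. providerOf and pickerRows are hypothetical names, and the catalog is simplified to a list of provider/model refs.

```typescript
// Hypothetical helper: the provider id is the segment before the first slash.
function providerOf(ref: string): string {
  const slash = ref.indexOf("/");
  return slash === -1 ? ref : ref.slice(0, slash);
}

// Keep rows from the preferred providers; if none are loaded yet,
// return the full catalog rather than an empty picker.
function pickerRows(catalog: string[], preferred: string[]): string[] {
  const scoped = catalog.filter((ref) => preferred.indexOf(providerOf(ref)) !== -1);
  return scoped.length > 0 ? scoped : catalog;
}
```

For the Volcengine auth choice, preferred would be ["volcengine", "volcengine-plan"]; the BytePlus case works the same way with its own two provider ids.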

BytePlus (International)

BytePlus ARK provides access to the same models as Volcano Engine for international users.

Provider: byteplus (coding: byteplus-plan)
Auth: BYTEPLUS_API_KEY
Example model: byteplus-plan/ark-code-latest
CLI: fluffbuzz onboard --auth-choice byteplus-api-key

{
  agents: {
    defaults: { model: { primary: "byteplus-plan/ark-code-latest" } },
  },
}

Onboarding defaults to the coding surface, but the general byteplus/* catalog is registered at the same time. In onboarding/configure model pickers, the BytePlus auth choice prefers both byteplus/* and byteplus-plan/* rows. If those models are not loaded yet, FluffBuzz falls back to the unfiltered catalog instead of showing an empty provider-scoped picker.

Available models:

byteplus/seed-1-8-251228 (Seed 1.8)
byteplus/kimi-k2-5-260127 (Kimi K2.5)
byteplus/glm-4-7-251222 (GLM 4.7)

Coding models (byteplus-plan):

byteplus-plan/ark-code-latest
byteplus-plan/doubao-seed-code
byteplus-plan/kimi-k2.5
byteplus-plan/kimi-k2-thinking
byteplus-plan/glm-4.7

Synthetic

Synthetic provides Anthropic-compatible models behind the synthetic provider:

Provider: synthetic
Auth: SYNTHETIC_API_KEY
Example model: synthetic/hf:MiniMaxAI/MiniMax-M2.5
CLI: fluffbuzz onboard --auth-choice synthetic-api-key

{
  agents: {
    defaults: { model: { primary: "synthetic/hf:MiniMaxAI/MiniMax-M2.5" } },
  },
  models: {
    mode: "merge",
    providers: {
      synthetic: {
        baseUrl: "https://api.synthetic.new/anthropic",
        apiKey: "${SYNTHETIC_API_KEY}",
        api: "anthropic-messages",
        models: [{ id: "hf:MiniMaxAI/MiniMax-M2.5", name: "MiniMax M2.5" }],
      },
    },
  },
}

MiniMax

MiniMax is configured via models.providers because it uses custom endpoints:

MiniMax OAuth (Global): --auth-choice minimax-global-oauth
MiniMax OAuth (CN): --auth-choice minimax-cn-oauth
MiniMax API key (Global): --auth-choice minimax-global-api
MiniMax API key (CN): --auth-choice minimax-cn-api
Auth: MINIMAX_API_KEY for minimax; MINIMAX_OAUTH_TOKEN or MINIMAX_API_KEY for minimax-portal

See /providers/minimax for setup details, model options, and config snippets.

On MiniMax’s Anthropic-compatible streaming path, FluffBuzz disables thinking by default unless you explicitly set it, and /fast on rewrites MiniMax-M2.7 to MiniMax-M2.7-highspeed.

Plugin-owned capability split:

Text/chat defaults stay on minimax/MiniMax-M2.7
Image generation is minimax/image-01 or minimax-portal/image-01
Image understanding is plugin-owned MiniMax-VL-01 on both MiniMax auth paths
Web search stays on provider id minimax

LM Studio

LM Studio ships as a bundled provider plugin that uses LM Studio’s native API:

Provider: lmstudio
Auth: LM_API_TOKEN
Default inference base URL: http://localhost:1234/v1

Set a model (replace with one of the IDs returned by http://localhost:1234/api/v1/models):

{
  agents: {
    defaults: { model: { primary: "lmstudio/openai/gpt-oss-20b" } },
  },
}

FluffBuzz uses LM Studio’s native /api/v1/models and /api/v1/models/load for discovery + auto-load, with /v1/chat/completions for inference by default. See /providers/lmstudio for setup and troubleshooting.

Ollama

Ollama ships as a bundled provider plugin and uses Ollama’s native API:

Provider: ollama
Auth: None required (local server)
Example model: ollama/llama3.3
Installation: https://ollama.com/download

# Install Ollama, then pull a model:
ollama pull llama3.3

{
  agents: {
    defaults: { model: { primary: "ollama/llama3.3" } },
  },
}

Ollama is detected locally at http://127.0.0.1:11434 when you opt in with OLLAMA_API_KEY, and the bundled provider plugin adds Ollama directly to fluffbuzz onboard and the model picker. See /providers/ollama for onboarding, cloud/local mode, and custom configuration.

vLLM

vLLM ships as a bundled provider plugin for local/self-hosted OpenAI-compatible servers:

Provider: vllm
Auth: Optional (depends on your server)
Default base URL: http://127.0.0.1:8000/v1

To opt in to auto-discovery locally (any value works if your server doesn’t enforce auth):

export VLLM_API_KEY="vllm-local"

Then set a model (replace with one of the IDs returned by /v1/models):

{
  agents: {
    defaults: { model: { primary: "vllm/your-model-id" } },
  },
}

See /providers/vllm for details.

SGLang

SGLang ships as a bundled provider plugin for fast self-hosted OpenAI-compatible servers:

Provider: sglang
Auth: Optional (depends on your server)
Default base URL: http://127.0.0.1:30000/v1

To opt in to auto-discovery locally (any value works if your server does not enforce auth):

export SGLANG_API_KEY="sglang-local"

Then set a model (replace with one of the IDs returned by /v1/models):

{
  agents: {
    defaults: { model: { primary: "sglang/your-model-id" } },
  },
}

See /providers/sglang for details.

Local proxies (LM Studio, vLLM, LiteLLM, etc.)

Example (OpenAI-compatible):

{
  agents: {
    defaults: {
      model: { primary: "lmstudio/my-local-model" },
      models: { "lmstudio/my-local-model": { alias: "Local" } },
    },
  },
  models: {
    providers: {
      lmstudio: {
        baseUrl: "http://localhost:1234/v1",
        apiKey: "${LM_API_TOKEN}",
        api: "openai-completions",
        models: [
          {
            id: "my-local-model",
            name: "Local Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 200000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Notes:

For custom providers, reasoning, input, cost, contextWindow, and maxTokens are optional. When omitted, FluffBuzz defaults to:
- reasoning: false
- input: ["text"]
- cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
- contextWindow: 200000
- maxTokens: 8192
Recommended: set explicit values that match your proxy/model limits.
For api: "openai-completions" on non-native endpoints (any non-empty baseUrl whose host is not api.openai.com), FluffBuzz forces compat.supportsDeveloperRole: false to avoid provider 400 errors for unsupported developer roles.
Proxy-style OpenAI-compatible routes also skip native OpenAI-only request shaping: no service_tier, no Responses store, no prompt-cache hints, no OpenAI reasoning-compat payload shaping, and no hidden FluffBuzz attribution headers.
If baseUrl is empty/omitted, FluffBuzz keeps the default OpenAI behavior (which resolves to api.openai.com).
For safety, an explicit compat.supportsDeveloperRole: true is still overridden on non-native openai-completions endpoints.
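
The defaulting and developer-role rules in these notes can be sketched as two small functions. This is an illustration under assumptions, not FluffBuzz internals: applyModelDefaults, hostOf, and supportsDeveloperRole are hypothetical names, the interfaces mirror only the fields named on this page, and the host is extracted with a simple regex that does not handle every URL form.

```typescript
interface ModelCost { input: number; output: number; cacheRead: number; cacheWrite: number; }

interface CustomModel {
  id: string;
  name?: string;
  reasoning?: boolean;
  input?: string[];
  cost?: ModelCost;
  contextWindow?: number;
  maxTokens?: number;
}

interface ResolvedModel extends CustomModel {
  reasoning: boolean;
  input: string[];
  cost: ModelCost;
  contextWindow: number;
  maxTokens: number;
}

// Fill in the documented defaults for any omitted optional field.
function applyModelDefaults(model: CustomModel): ResolvedModel {
  return {
    ...model,
    reasoning: model.reasoning ?? false,
    input: model.input ?? ["text"],
    cost: model.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: model.contextWindow ?? 200000,
    maxTokens: model.maxTokens ?? 8192,
  };
}

// Simplified host extraction: scheme://host[:port]/...
function hostOf(baseUrl: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(baseUrl.trim());
  return (match?.[1] ?? "").toLowerCase();
}

// On non-native openai-completions endpoints the flag is forced to false,
// even when the config explicitly says true. Otherwise the explicit value
// (or undefined, meaning "keep the default behavior") is returned.
function supportsDeveloperRole(api: string, baseUrl: string | undefined, explicit?: boolean): boolean | undefined {
  const hasBaseUrl = baseUrl !== undefined && baseUrl.trim() !== "";
  const nonNative = api === "openai-completions" && hasBaseUrl && hostOf(baseUrl ?? "") !== "api.openai.com";
  return nonNative ? false : explicit;
}
```

Setting explicit values, as recommended above, simply means applyModelDefaults has nothing left to fill in.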

CLI examples

fluffbuzz onboard --auth-choice opencode-zen
fluffbuzz models set opencode/claude-opus-4-6
fluffbuzz models list

See also: /gateway/configuration for full configuration examples.

Related

Models — model configuration and aliases
Model Failover — fallback chains and retry behavior
Configuration Reference — model config keys
Providers — per-provider setup guides
