This page covers LLM/model providers (not chat channels like WhatsApp/Telegram). For model selection rules, see /concepts/models.

Quick rules

  • Model refs use provider/model (example: opencode/claude-opus-4-6).
  • agents.defaults.models acts as an allowlist when set.
  • CLI helpers: fluffbuzz onboard, fluffbuzz models list, fluffbuzz models set <provider/model>.
  • models.providers.*.models[].contextWindow is native model metadata; contextTokens is the effective runtime cap.
  • Fallback rules, cooldown probes, and session-override persistence: Model failover.
  • OpenAI-family routes are prefix-specific: openai/<model> uses the direct OpenAI API-key provider in PI, openai-codex/<model> uses Codex OAuth in PI, and openai/<model> plus agents.defaults.embeddedHarness.runtime: "codex" uses the native Codex app-server harness. See OpenAI and Codex harness.
  • GPT-5.5 is currently available through subscription/OAuth routes: openai-codex/gpt-5.5 in PI or openai/gpt-5.5 with the Codex app-server harness. The direct API-key route for openai/gpt-5.5 is supported once OpenAI enables GPT-5.5 on the public API; until then use API-enabled models such as openai/gpt-5.4 for OPENAI_API_KEY setups.
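As a rough illustration of the provider/model convention, the sketch below splits a ref on its first slash, so refs whose model id itself contains slashes (such as vercel-ai-gateway/anthropic/claude-opus-4.6) keep the remainder as the model id. parseModelRef and ModelRef are hypothetical names used only for this example; FluffBuzz's actual parser may differ.

```typescript
// Hypothetical helper, not part of FluffBuzz: illustrates "provider/model" refs.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelRef(ref: string): ModelRef {
  // Assumption: only the FIRST slash separates provider from model,
  // so "kilocode/kilo/auto" yields provider "kilocode", model "kilo/auto".
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}
```

Under this assumption, openai/gpt-5.4 and openai-codex/gpt-5.5 resolve to different providers even though both target OpenAI-family models, which matches the prefix-specific routing described above.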

Plugin-owned provider behavior

Most provider-specific logic lives in provider plugins (registerProvider(...)) while FluffBuzz keeps the generic inference loop. Plugins own onboarding, model catalogs, auth env-var mapping, transport/config normalization, tool-schema cleanup, failover classification, OAuth refresh, usage reporting, thinking/reasoning profiles, and more. The full list of provider-SDK hooks and bundled-plugin examples lives in Provider plugins. A provider that needs a totally custom request executor is a separate, deeper extension surface.
Provider runtime capabilities are shared runner metadata (provider family, transcript/tooling quirks, transport/cache hints). They are not the same as the public capability model, which describes what a plugin registers (text inference, speech, etc.).

API key rotation

  • FluffBuzz supports generic API-key rotation for selected providers.
  • Configure multiple keys via:
    • FLUFFBUZZ_LIVE_<PROVIDER>_KEY (single live override, highest priority)
    • <PROVIDER>_API_KEYS (comma or semicolon list)
    • <PROVIDER>_API_KEY (primary key)
    • <PROVIDER>_API_KEY_* (numbered list, e.g. <PROVIDER>_API_KEY_1)
  • For Google providers, GOOGLE_API_KEY is also included as a fallback.
  • Key selection order preserves priority and deduplicates values.
  • Requests are retried with the next key only on rate-limit responses (for example 429, rate_limit, quota, resource exhausted, Too many concurrent requests, ThrottlingException, concurrency limit reached, workers_ai ... quota limit exceeded, or periodic usage-limit messages).
  • Non-rate-limit failures fail immediately; no key rotation is attempted.
  • When all candidate keys fail, the final error is returned from the last attempt.
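The rotation rules above can be sketched roughly as follows. This is a minimal illustration, not FluffBuzz's implementation: collectApiKeys, isRateLimitError, and withKeyRotation are hypothetical names, the candidate order simply follows the order of the list above (an assumption), and the pattern list covers only some of the rate-limit signals mentioned.

```typescript
type KeyEnv = Record<string, string | undefined>;

// Collect candidate keys in the listed order, trimmed and deduplicated.
// extraFallbacks models cases like GOOGLE_API_KEY for Google providers.
function collectApiKeys(envPrefix: string, env: KeyEnv, extraFallbacks: string[] = []): string[] {
  const candidates: string[] = [];
  const push = (value: string | undefined): void => {
    const trimmed = value?.trim();
    if (trimmed) candidates.push(trimmed);
  };

  push(env[`FLUFFBUZZ_LIVE_${envPrefix}_KEY`]); // single live override
  for (const part of (env[`${envPrefix}_API_KEYS`] ?? "").split(/[,;]/)) push(part);
  push(env[`${envPrefix}_API_KEY`]); // primary key
  const numbered = Object.keys(env)
    .filter((name) => name.startsWith(`${envPrefix}_API_KEY_`))
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true }));
  for (const name of numbered) push(env[name]);
  for (const name of extraFallbacks) push(env[name]);

  return [...new Set(candidates)]; // Set keeps first-seen order
}

// Partial pattern list, based on the examples named in the text above.
const RATE_LIMIT_PATTERNS: RegExp[] = [
  /\b429\b/,
  /rate[_ ]limit/i,
  /quota/i,
  /resource exhausted/i,
  /too many concurrent requests/i,
  /ThrottlingException/,
  /concurrency limit reached/i,
];

function isRateLimitError(message: string): boolean {
  return RATE_LIMIT_PATTERNS.some((pattern) => pattern.test(message));
}

// Try each key in turn; only rate-limit failures advance to the next key.
async function withKeyRotation<T>(keys: string[], call: (key: string) => Promise<T>): Promise<T> {
  let lastError: unknown = new Error("no API keys configured");
  for (const key of keys) {
    try {
      return await call(key);
    } catch (err) {
      lastError = err;
      const message = err instanceof Error ? err.message : String(err);
      if (!isRateLimitError(message)) throw err; // fail immediately, no rotation
    }
  }
  throw lastError; // every key was rate-limited: surface the last attempt's error
}
```

For Google providers, a call such as collectApiKeys("GEMINI", env, ["GOOGLE_API_KEY"]) would append the fallback key after the Gemini-specific ones.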

Built-in providers (pi-ai catalog)

FluffBuzz ships with the pi‑ai catalog. These providers require no models.providers config; just set auth + pick a model.

OpenAI

  • Provider: openai
  • Auth: OPENAI_API_KEY
  • Optional rotation: OPENAI_API_KEYS, OPENAI_API_KEY_1, OPENAI_API_KEY_2, plus FLUFFBUZZ_LIVE_OPENAI_KEY (single override)
  • Example models: openai/gpt-5.4, openai/gpt-5.4-mini
  • Direct API-key support for openai/gpt-5.5 is already wired in and takes effect once OpenAI exposes GPT-5.5 on the public API
  • CLI: fluffbuzz onboard --auth-choice openai-api-key
  • Default transport is auto (WebSocket-first, SSE fallback)
  • Override per model via agents.defaults.models["openai/<model>"].params.transport ("sse", "websocket", or "auto")
  • OpenAI Responses WebSocket warm-up defaults to enabled via params.openaiWsWarmup (true/false)
  • OpenAI priority processing can be enabled via agents.defaults.models["openai/<model>"].params.serviceTier
  • /fast and params.fastMode map direct openai/* Responses requests to service_tier=priority on api.openai.com
  • Use params.serviceTier when you want an explicit tier instead of the shared /fast toggle
  • Hidden FluffBuzz attribution headers (originator, version, User-Agent) apply only on native OpenAI traffic to api.openai.com, not generic OpenAI-compatible proxies
  • Native OpenAI routes also keep Responses store, prompt-cache hints, and OpenAI reasoning-compat payload shaping; proxy routes do not
  • openai/gpt-5.3-codex-spark is intentionally suppressed in FluffBuzz because live OpenAI API requests reject it and the current Codex catalog does not expose it
{
  agents: { defaults: { model: { primary: "openai/gpt-5.4" } } },
}
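The interaction between /fast, params.fastMode, and params.serviceTier described in the bullets above might be modelled like this. It is a sketch under stated assumptions: resolveOpenAiServiceTier and OpenAiModelParams are hypothetical names, and the precedence (an explicit serviceTier wins over the fast toggle) is an assumption, not confirmed behavior.

```typescript
interface OpenAiModelParams {
  serviceTier?: string;
  fastMode?: boolean;
}

// Returns the service_tier value to send, or undefined to omit the field.
function resolveOpenAiServiceTier(
  host: string,
  params: OpenAiModelParams,
  fastToggle: boolean, // state of the shared /fast toggle
): string | undefined {
  // Proxy-style OpenAI-compatible routes skip native-only shaping.
  if (host !== "api.openai.com") return undefined;
  // Assumption: an explicit tier takes precedence over /fast and fastMode.
  if (params.serviceTier) return params.serviceTier;
  return fastToggle || params.fastMode === true ? "priority" : undefined;
}
```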

Anthropic

  • Provider: anthropic
  • Auth: ANTHROPIC_API_KEY
  • Optional rotation: ANTHROPIC_API_KEYS, ANTHROPIC_API_KEY_1, ANTHROPIC_API_KEY_2, plus FLUFFBUZZ_LIVE_ANTHROPIC_KEY (single override)
  • Example model: anthropic/claude-opus-4-6
  • CLI: fluffbuzz onboard --auth-choice apiKey
  • Direct public Anthropic requests support the shared /fast toggle and params.fastMode, including API-key and OAuth-authenticated traffic sent to api.anthropic.com; FluffBuzz maps that to Anthropic service_tier (auto vs standard_only)
  • Anthropic note: Anthropic staff told us FluffBuzz-style Claude CLI usage is allowed again, so FluffBuzz treats Claude CLI reuse and claude -p usage as sanctioned for this integration unless Anthropic publishes a new policy.
  • Anthropic setup-token remains available as a supported FluffBuzz token path, but FluffBuzz now prefers Claude CLI reuse and claude -p when available.
{
  agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
}

OpenAI Codex OAuth

  • Provider: openai-codex
  • Auth: OAuth (ChatGPT)
  • PI model ref: openai-codex/gpt-5.5
  • Native Codex app-server harness ref: openai/gpt-5.5 with agents.defaults.embeddedHarness.runtime: "codex"
  • Legacy model refs: codex/gpt-*
  • CLI: fluffbuzz onboard --auth-choice openai-codex or fluffbuzz models auth login --provider openai-codex
  • Default transport is auto (WebSocket-first, SSE fallback)
  • Override per PI model via agents.defaults.models["openai-codex/<model>"].params.transport ("sse", "websocket", or "auto")
  • params.serviceTier is also forwarded on native Codex Responses requests (chatgpt.com/backend-api)
  • Hidden FluffBuzz attribution headers (originator, version, User-Agent) are only attached on native Codex traffic to chatgpt.com/backend-api, not generic OpenAI-compatible proxies
  • Shares the same /fast toggle and params.fastMode config as direct openai/*; FluffBuzz maps that to service_tier=priority
  • openai-codex/gpt-5.5 keeps native contextWindow = 1000000 and a default runtime contextTokens = 272000; override the runtime cap with models.providers.openai-codex.models[].contextTokens
  • Policy note: OpenAI Codex OAuth is explicitly supported for external tools/workflows like FluffBuzz.
  • Current GPT-5.5 access uses this OAuth/subscription route until OpenAI enables GPT-5.5 on the public API.
{
  agents: { defaults: { model: { primary: "openai-codex/gpt-5.5" } } },
}
{
  models: {
    providers: {
      "openai-codex": {
        models: [{ id: "gpt-5.5", contextTokens: 160000 }],
      },
    },
  },
}
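To make the contextWindow versus contextTokens distinction concrete, here is a small sketch of how an effective runtime cap could be derived. The function name and the precedence order (user override, then catalog contextTokens, then native contextWindow) are assumptions made for illustration.

```typescript
interface ContextModelEntry {
  id: string;
  contextWindow?: number; // native model metadata
  contextTokens?: number; // effective runtime cap
}

// Assumed precedence: user override > catalog runtime cap > native window.
function effectiveContextCap(
  catalog: ContextModelEntry,
  userOverride?: ContextModelEntry,
): number | undefined {
  return userOverride?.contextTokens ?? catalog.contextTokens ?? catalog.contextWindow;
}

// Values taken from the openai-codex/gpt-5.5 bullets and config above.
const codexCatalogEntry: ContextModelEntry = {
  id: "gpt-5.5",
  contextWindow: 1000000,
  contextTokens: 272000,
};
const codexUserEntry: ContextModelEntry = { id: "gpt-5.5", contextTokens: 160000 };
```

With these values, the default cap would be 272000 and the configured override would lower it to 160000, while the native contextWindow stays at 1000000.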

Other subscription-style hosted options

  • Qwen Cloud: Qwen Cloud provider surface plus Alibaba DashScope and Coding Plan endpoint mapping
  • MiniMax: MiniMax Coding Plan OAuth or API key access
  • GLM Models: Z.AI Coding Plan or general API endpoints

OpenCode

  • Auth: OPENCODE_API_KEY (or OPENCODE_ZEN_API_KEY)
  • Zen runtime provider: opencode
  • Go runtime provider: opencode-go
  • Example models: opencode/claude-opus-4-6, opencode-go/kimi-k2.5
  • CLI: fluffbuzz onboard --auth-choice opencode-zen or fluffbuzz onboard --auth-choice opencode-go
{
  agents: { defaults: { model: { primary: "opencode/claude-opus-4-6" } } },
}

Google Gemini (API key)

  • Provider: google
  • Auth: GEMINI_API_KEY
  • Optional rotation: GEMINI_API_KEYS, GEMINI_API_KEY_1, GEMINI_API_KEY_2, GOOGLE_API_KEY fallback, and FLUFFBUZZ_LIVE_GEMINI_KEY (single override)
  • Example models: google/gemini-3.1-pro-preview, google/gemini-3-flash-preview
  • Compatibility: legacy FluffBuzz config using google/gemini-3.1-flash-preview is normalized to google/gemini-3-flash-preview
  • CLI: fluffbuzz onboard --auth-choice gemini-api-key
  • Direct Gemini runs also accept agents.defaults.models["google/<model>"].params.cachedContent (or legacy cached_content) to forward a provider-native cachedContents/... handle; Gemini cache hits surface as FluffBuzz cacheRead

Google Vertex and Gemini CLI

  • Providers: google-vertex, google-gemini-cli
  • Auth: Vertex uses gcloud ADC; Gemini CLI uses its OAuth flow
  • Caution: Gemini CLI OAuth in FluffBuzz is an unofficial integration. Some users have reported Google account restrictions after using third-party clients. Review Google terms and use a non-critical account if you choose to proceed.
  • Gemini CLI OAuth is shipped as part of the bundled google plugin.
    • Install Gemini CLI first:
      • brew install gemini-cli
      • or npm install -g @google/gemini-cli
    • Enable: fluffbuzz plugins enable google
    • Login: fluffbuzz models auth login --provider google-gemini-cli --set-default
    • Default model: google-gemini-cli/gemini-3-flash-preview
    • Note: you do not paste a client id or secret into fluffbuzz.json. The CLI login flow stores tokens in auth profiles on the gateway host.
    • If requests fail after login, set GOOGLE_CLOUD_PROJECT or GOOGLE_CLOUD_PROJECT_ID on the gateway host.
    • Gemini CLI JSON replies are parsed from response; usage falls back to stats, with stats.cached normalized into FluffBuzz cacheRead.

Z.AI (GLM)

  • Provider: zai
  • Auth: ZAI_API_KEY
  • Example model: zai/glm-5.1
  • CLI: fluffbuzz onboard --auth-choice zai-api-key
    • Aliases: z.ai/* and z-ai/* normalize to zai/*
    • zai-api-key auto-detects the matching Z.AI endpoint; zai-coding-global, zai-coding-cn, zai-global, and zai-cn force a specific surface
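The alias rule above (z.ai/* and z-ai/* normalize to zai/*) amounts to rewriting the provider prefix of a model ref. A minimal sketch, assuming exact, case-sensitive prefix matching; normalizeProviderAlias and the alias table are hypothetical names for this example only.

```typescript
const PROVIDER_ALIASES: Record<string, string> = {
  "z.ai": "zai",
  "z-ai": "zai",
};

function normalizeProviderAlias(ref: string): string {
  const slash = ref.indexOf("/");
  if (slash < 0) return ref; // not a provider/model ref; leave untouched
  const provider = ref.slice(0, slash);
  const canonical = PROVIDER_ALIASES[provider] ?? provider;
  return canonical + ref.slice(slash); // keep "/model..." as-is
}
```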

Vercel AI Gateway

  • Provider: vercel-ai-gateway
  • Auth: AI_GATEWAY_API_KEY
  • Example models: vercel-ai-gateway/anthropic/claude-opus-4.6, vercel-ai-gateway/moonshotai/kimi-k2.6
  • CLI: fluffbuzz onboard --auth-choice ai-gateway-api-key

Kilo Gateway

  • Provider: kilocode
  • Auth: KILOCODE_API_KEY
  • Example model: kilocode/kilo/auto
  • CLI: fluffbuzz onboard --auth-choice kilocode-api-key
  • Base URL: https://api.kilo.ai/api/gateway/
  • Static fallback catalog ships kilocode/kilo/auto; live https://api.kilo.ai/api/gateway/models discovery can expand the runtime catalog further.
  • Exact upstream routing behind kilocode/kilo/auto is owned by Kilo Gateway, not hard-coded in FluffBuzz.
See /providers/kilocode for setup details.

Other bundled provider plugins

| Provider | Id | Auth env | Example model |
| --- | --- | --- | --- |
| BytePlus | byteplus / byteplus-plan | BYTEPLUS_API_KEY | byteplus-plan/ark-code-latest |
| Cerebras | cerebras | CEREBRAS_API_KEY | cerebras/zai-glm-4.7 |
| Cloudflare AI Gateway | cloudflare-ai-gateway | CLOUDFLARE_AI_GATEWAY_API_KEY | |
| GitHub Copilot | github-copilot | COPILOT_GITHUB_TOKEN / GH_TOKEN / GITHUB_TOKEN | |
| Groq | groq | GROQ_API_KEY | |
| Hugging Face Inference | huggingface | HUGGINGFACE_HUB_TOKEN or HF_TOKEN | huggingface/deepseek-ai/DeepSeek-R1 |
| Kilo Gateway | kilocode | KILOCODE_API_KEY | kilocode/kilo/auto |
| Kimi Coding | kimi | KIMI_API_KEY or KIMICODE_API_KEY | kimi/kimi-code |
| MiniMax | minimax / minimax-portal | MINIMAX_API_KEY / MINIMAX_OAUTH_TOKEN | minimax/MiniMax-M2.7 |
| Mistral | mistral | MISTRAL_API_KEY | mistral/mistral-large-latest |
| Moonshot | moonshot | MOONSHOT_API_KEY | moonshot/kimi-k2.6 |
| NVIDIA | nvidia | NVIDIA_API_KEY | nvidia/nvidia/llama-3.1-nemotron-70b-instruct |
| OpenRouter | openrouter | OPENROUTER_API_KEY | openrouter/auto |
| Qianfan | qianfan | QIANFAN_API_KEY | qianfan/deepseek-v3.2 |
| Qwen Cloud | qwen | QWEN_API_KEY / MODELSTUDIO_API_KEY / DASHSCOPE_API_KEY | qwen/qwen3.5-plus |
| StepFun | stepfun / stepfun-plan | STEPFUN_API_KEY | stepfun/step-3.5-flash |
| Together | together | TOGETHER_API_KEY | together/moonshotai/Kimi-K2.5 |
| Venice | venice | VENICE_API_KEY | |
| Vercel AI Gateway | vercel-ai-gateway | AI_GATEWAY_API_KEY | vercel-ai-gateway/anthropic/claude-opus-4.6 |
| Volcano Engine (Doubao) | volcengine / volcengine-plan | VOLCANO_ENGINE_API_KEY | volcengine-plan/ark-code-latest |
| xAI | xai | XAI_API_KEY | xai/grok-4 |
| Xiaomi | xiaomi | XIAOMI_API_KEY | xiaomi/mimo-v2-flash |

Quirks worth knowing:
  • OpenRouter applies its app-attribution headers and Anthropic cache_control markers only on verified openrouter.ai routes. As a proxy-style OpenAI-compatible path, it skips native-OpenAI-only shaping (serviceTier, Responses store, prompt-cache hints, OpenAI reasoning-compat). Gemini-backed refs keep proxy-Gemini thought-signature sanitation only.
  • Kilo Gateway Gemini-backed refs follow the same proxy-Gemini sanitation path; kilocode/kilo/auto and other proxy-reasoning-unsupported refs skip proxy reasoning injection.
  • MiniMax API-key onboarding writes explicit M2.7 model definitions with input: ["text", "image"]; the bundled catalog keeps chat refs text-only until that config is materialized.
  • xAI uses the xAI Responses path. /fast or params.fastMode: true rewrites grok-3, grok-3-mini, grok-4, and grok-4-0709 to their *-fast variants. tool_stream defaults on; disable via agents.defaults.models["xai/<model>"].params.tool_stream=false.
  • Cerebras GLM models use zai-glm-4.7 / zai-glm-4.6; OpenAI-compatible base URL is https://api.cerebras.ai/v1.
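The xAI fast-mode rewrite noted above can be pictured as a lookup against the eligible model ids. This sketch assumes the fast variant is formed by appending "-fast" to the id, which the text implies but does not spell out; applyXaiFastMode is a hypothetical name.

```typescript
// Model ids named in the xAI quirk above.
const XAI_FAST_ELIGIBLE = new Set<string>(["grok-3", "grok-3-mini", "grok-4", "grok-4-0709"]);

function applyXaiFastMode(modelId: string, fastMode: boolean): string {
  // Assumption: "*-fast variants" means the original id plus a "-fast" suffix.
  return fastMode && XAI_FAST_ELIGIBLE.has(modelId) ? `${modelId}-fast` : modelId;
}
```

Ids outside the eligible set pass through unchanged, so enabling /fast would not affect other xAI models under this reading.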

Providers via models.providers (custom/base URL)

Use models.providers (or models.json) to add custom providers or OpenAI/Anthropic‑compatible proxies. Many of the bundled provider plugins below already publish a default catalog. Use explicit models.providers.<id> entries only when you want to override the default base URL, headers, or model list.

Moonshot AI (Kimi)

Moonshot ships as a bundled provider plugin. Use the built-in provider by default, and add an explicit models.providers.moonshot entry only when you need to override the base URL or model metadata:
  • Provider: moonshot
  • Auth: MOONSHOT_API_KEY
  • Example model: moonshot/kimi-k2.6
  • CLI: fluffbuzz onboard --auth-choice moonshot-api-key or fluffbuzz onboard --auth-choice moonshot-api-key-cn
Kimi K2 model IDs:
  • moonshot/kimi-k2.6
  • moonshot/kimi-k2.5
  • moonshot/kimi-k2-thinking
  • moonshot/kimi-k2-thinking-turbo
  • moonshot/kimi-k2-turbo
{
  agents: {
    defaults: { model: { primary: "moonshot/kimi-k2.6" } },
  },
  models: {
    mode: "merge",
    providers: {
      moonshot: {
        baseUrl: "https://api.moonshot.ai/v1",
        apiKey: "${MOONSHOT_API_KEY}",
        api: "openai-completions",
        models: [{ id: "kimi-k2.6", name: "Kimi K2.6" }],
      },
    },
  },
}

Kimi Coding

Kimi Coding uses Moonshot AI’s Anthropic-compatible endpoint:
  • Provider: kimi
  • Auth: KIMI_API_KEY
  • Example model: kimi/kimi-code
{
  env: { KIMI_API_KEY: "sk-..." },
  agents: {
    defaults: { model: { primary: "kimi/kimi-code" } },
  },
}
Legacy kimi/k2p5 remains accepted as a compatibility model id.

Volcano Engine (Doubao)

Volcano Engine (火山引擎) provides access to Doubao and other models in China.
  • Provider: volcengine (coding: volcengine-plan)
  • Auth: VOLCANO_ENGINE_API_KEY
  • Example model: volcengine-plan/ark-code-latest
  • CLI: fluffbuzz onboard --auth-choice volcengine-api-key
{
  agents: {
    defaults: { model: { primary: "volcengine-plan/ark-code-latest" } },
  },
}
Onboarding defaults to the coding surface, but the general volcengine/* catalog is registered at the same time. In onboarding/configure model pickers, the Volcengine auth choice prefers both volcengine/* and volcengine-plan/* rows. If those models are not loaded yet, FluffBuzz falls back to the unfiltered catalog instead of showing an empty provider-scoped picker.
Available models:
  • volcengine/doubao-seed-1-8-251228 (Doubao Seed 1.8)
  • volcengine/doubao-seed-code-preview-251028
  • volcengine/kimi-k2-5-260127 (Kimi K2.5)
  • volcengine/glm-4-7-251222 (GLM 4.7)
  • volcengine/deepseek-v3-2-251201 (DeepSeek V3.2 128K)
Coding models (volcengine-plan):
  • volcengine-plan/ark-code-latest
  • volcengine-plan/doubao-seed-code
  • volcengine-plan/kimi-k2.5
  • volcengine-plan/kimi-k2-thinking
  • volcengine-plan/glm-4.7

BytePlus (International)

BytePlus ARK provides access to the same models as Volcano Engine for international users.
  • Provider: byteplus (coding: byteplus-plan)
  • Auth: BYTEPLUS_API_KEY
  • Example model: byteplus-plan/ark-code-latest
  • CLI: fluffbuzz onboard --auth-choice byteplus-api-key
{
  agents: {
    defaults: { model: { primary: "byteplus-plan/ark-code-latest" } },
  },
}
Onboarding defaults to the coding surface, but the general byteplus/* catalog is registered at the same time. In onboarding/configure model pickers, the BytePlus auth choice prefers both byteplus/* and byteplus-plan/* rows. If those models are not loaded yet, FluffBuzz falls back to the unfiltered catalog instead of showing an empty provider-scoped picker.
Available models:
  • byteplus/seed-1-8-251228 (Seed 1.8)
  • byteplus/kimi-k2-5-260127 (Kimi K2.5)
  • byteplus/glm-4-7-251222 (GLM 4.7)
Coding models (byteplus-plan):
  • byteplus-plan/ark-code-latest
  • byteplus-plan/doubao-seed-code
  • byteplus-plan/kimi-k2.5
  • byteplus-plan/kimi-k2-thinking
  • byteplus-plan/glm-4.7

Synthetic

Synthetic provides Anthropic-compatible models behind the synthetic provider:
  • Provider: synthetic
  • Auth: SYNTHETIC_API_KEY
  • Example model: synthetic/hf:MiniMaxAI/MiniMax-M2.5
  • CLI: fluffbuzz onboard --auth-choice synthetic-api-key
{
  agents: {
    defaults: { model: { primary: "synthetic/hf:MiniMaxAI/MiniMax-M2.5" } },
  },
  models: {
    mode: "merge",
    providers: {
      synthetic: {
        baseUrl: "https://api.synthetic.new/anthropic",
        apiKey: "${SYNTHETIC_API_KEY}",
        api: "anthropic-messages",
        models: [{ id: "hf:MiniMaxAI/MiniMax-M2.5", name: "MiniMax M2.5" }],
      },
    },
  },
}

MiniMax

MiniMax is configured via models.providers because it uses custom endpoints:
  • MiniMax OAuth (Global): --auth-choice minimax-global-oauth
  • MiniMax OAuth (CN): --auth-choice minimax-cn-oauth
  • MiniMax API key (Global): --auth-choice minimax-global-api
  • MiniMax API key (CN): --auth-choice minimax-cn-api
  • Auth: MINIMAX_API_KEY for minimax; MINIMAX_OAUTH_TOKEN or MINIMAX_API_KEY for minimax-portal
See /providers/minimax for setup details, model options, and config snippets. On MiniMax’s Anthropic-compatible streaming path, FluffBuzz disables thinking by default unless you explicitly set it, and /fast on rewrites MiniMax-M2.7 to MiniMax-M2.7-highspeed.
Plugin-owned capability split:
  • Text/chat defaults stay on minimax/MiniMax-M2.7
  • Image generation is minimax/image-01 or minimax-portal/image-01
  • Image understanding is plugin-owned MiniMax-VL-01 on both MiniMax auth paths
  • Web search stays on provider id minimax

LM Studio

LM Studio ships as a bundled provider plugin that uses LM Studio’s native API:
  • Provider: lmstudio
  • Auth: LM_API_TOKEN
  • Default inference base URL: http://localhost:1234/v1
Then set a model (replace with one of the IDs returned by http://localhost:1234/api/v1/models):
{
  agents: {
    defaults: { model: { primary: "lmstudio/openai/gpt-oss-20b" } },
  },
}
FluffBuzz uses LM Studio’s native /api/v1/models and /api/v1/models/load for discovery + auto-load, with /v1/chat/completions for inference by default. See /providers/lmstudio for setup and troubleshooting.

Ollama

Ollama ships as a bundled provider plugin and uses Ollama’s native API:
# Install Ollama, then pull a model:
ollama pull llama3.3
{
  agents: {
    defaults: { model: { primary: "ollama/llama3.3" } },
  },
}
Ollama is detected locally at http://127.0.0.1:11434 when you opt in with OLLAMA_API_KEY, and the bundled provider plugin adds Ollama directly to fluffbuzz onboard and the model picker. See /providers/ollama for onboarding, cloud/local mode, and custom configuration.

vLLM

vLLM ships as a bundled provider plugin for local/self-hosted OpenAI-compatible servers:
  • Provider: vllm
  • Auth: Optional (depends on your server)
  • Default base URL: http://127.0.0.1:8000/v1
To opt in to auto-discovery locally (any value works if your server doesn’t enforce auth):
export VLLM_API_KEY="vllm-local"
Then set a model (replace with one of the IDs returned by /v1/models):
{
  agents: {
    defaults: { model: { primary: "vllm/your-model-id" } },
  },
}
See /providers/vllm for details.

SGLang

SGLang ships as a bundled provider plugin for fast self-hosted OpenAI-compatible servers:
  • Provider: sglang
  • Auth: Optional (depends on your server)
  • Default base URL: http://127.0.0.1:30000/v1
To opt in to auto-discovery locally (any value works if your server does not enforce auth):
export SGLANG_API_KEY="sglang-local"
Then set a model (replace with one of the IDs returned by /v1/models):
{
  agents: {
    defaults: { model: { primary: "sglang/your-model-id" } },
  },
}
See /providers/sglang for details.

Local proxies (LM Studio, vLLM, LiteLLM, etc.)

Example (OpenAI‑compatible):
{
  agents: {
    defaults: {
      model: { primary: "lmstudio/my-local-model" },
      models: { "lmstudio/my-local-model": { alias: "Local" } },
    },
  },
  models: {
    providers: {
      lmstudio: {
        baseUrl: "http://localhost:1234/v1",
        apiKey: "${LM_API_TOKEN}",
        api: "openai-completions",
        models: [
          {
            id: "my-local-model",
            name: "Local Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 200000,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}
Notes:
  • For custom providers, reasoning, input, cost, contextWindow, and maxTokens are optional. When omitted, FluffBuzz defaults to:
    • reasoning: false
    • input: ["text"]
    • cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
    • contextWindow: 200000
    • maxTokens: 8192
  • Recommended: set explicit values that match your proxy/model limits.
  • For api: "openai-completions" on non-native endpoints (any non-empty baseUrl whose host is not api.openai.com), FluffBuzz forces compat.supportsDeveloperRole: false to avoid provider 400 errors for unsupported developer roles.
  • Proxy-style OpenAI-compatible routes also skip native OpenAI-only request shaping: no service_tier, no Responses store, no prompt-cache hints, no OpenAI reasoning-compat payload shaping, and no hidden FluffBuzz attribution headers.
  • If baseUrl is empty/omitted, FluffBuzz keeps the default OpenAI behavior (which resolves to api.openai.com).
  • For safety, an explicit compat.supportsDeveloperRole: true is still overridden on non-native openai-completions endpoints.
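The notes above describe two mechanical rules: filling in defaults for omitted custom-model fields, and forcing compat.supportsDeveloperRole to false on non-native openai-completions endpoints. The sketch below illustrates both. All type and function names are hypothetical, and the host check uses a simple regular expression rather than whatever URL handling FluffBuzz actually performs.

```typescript
interface ModelCost { input: number; output: number; cacheRead: number; cacheWrite: number; }

interface CustomModel {
  id: string;
  name?: string;
  reasoning?: boolean;
  input?: string[];
  cost?: ModelCost;
  contextWindow?: number;
  maxTokens?: number;
}

// Fill omitted fields with the defaults listed in the notes above.
function withModelDefaults(model: CustomModel) {
  return {
    ...model,
    reasoning: model.reasoning ?? false,
    input: model.input ?? ["text"],
    cost: model.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: model.contextWindow ?? 200000,
    maxTokens: model.maxTokens ?? 8192,
  };
}

// Simplified host extraction for this example (scheme://[user@]host[:port]/...).
function hostOf(baseUrl: string): string | undefined {
  const match = /^[a-z][a-z0-9+.-]*:\/\/(?:[^/@]*@)?([^/:?#]+)/i.exec(baseUrl);
  return match?.[1]?.toLowerCase();
}

// Returns false when the value is forced; otherwise passes the configured
// value through (undefined means "leave the default behavior alone").
function resolveSupportsDeveloperRole(
  api: string,
  baseUrl: string | undefined,
  configured: boolean | undefined,
): boolean | undefined {
  const nonNative =
    api === "openai-completions" && !!baseUrl && hostOf(baseUrl) !== "api.openai.com";
  return nonNative ? false : configured; // explicit true is still overridden
}
```

Under these rules, an LM Studio entry at http://localhost:1234/v1 would always resolve to supportsDeveloperRole: false, while an entry with no baseUrl keeps whatever was configured.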

CLI examples

fluffbuzz onboard --auth-choice opencode-zen
fluffbuzz models set opencode/claude-opus-4-6
fluffbuzz models list
See also: /gateway/configuration for full configuration examples.