openai-completions API.
FluffBuzz can also auto-discover available models from vLLM when you opt in with VLLM_API_KEY (any value works if your server does not enforce auth) and you do not define an explicit models.providers.vllm entry.
FluffBuzz treats vllm as a local OpenAI-compatible provider that supports
streamed usage accounting, so status/context token counts can update from
stream_options.include_usage responses.
| Property | Value |
|---|---|
| Provider ID | vllm |
| API | openai-completions (OpenAI-compatible) |
| Auth | VLLM_API_KEY environment variable |
| Default base URL | http://127.0.0.1:8000/v1 |
Getting started
Start vLLM with an OpenAI-compatible server
Your base URL should expose
/v1 endpoints (e.g. /v1/models, /v1/chat/completions). vLLM commonly runs on:Model discovery (implicit provider)
WhenVLLM_API_KEY is set (or an auth profile exists) and you do not define models.providers.vllm, FluffBuzz queries:
If you set
models.providers.vllm explicitly, auto-discovery is skipped and you must define models manually.Explicit configuration (manual models)
Use explicit config when:- vLLM runs on a different host or port
- You want to pin
contextWindowormaxTokensvalues - Your server requires a real API key (or you want to control headers)
Advanced configuration
Proxy-style behavior
Proxy-style behavior
vLLM is treated as a proxy-style OpenAI-compatible
/v1 backend, not a native
OpenAI endpoint. This means:| Behavior | Applied? |
|---|---|
| Native OpenAI request shaping | No |
service_tier | Not sent |
Responses store | Not sent |
| Prompt-cache hints | Not sent |
| OpenAI reasoning-compat payload shaping | Not applied |
| Hidden FluffBuzz attribution headers | Not injected on custom base URLs |
Custom base URL
Custom base URL
If your vLLM server runs on a non-default host or port, set
baseUrl in the explicit provider config:Troubleshooting
Server not reachable
Server not reachable
Check that the vLLM server is running and accessible:If you see a connection error, verify the host, port, and that vLLM started with the OpenAI-compatible server mode.
Auth errors on requests
Auth errors on requests
If requests fail with auth errors, set a real
VLLM_API_KEY that matches your server configuration, or configure the provider explicitly under models.providers.vllm.No models discovered
No models discovered
Auto-discovery requires
VLLM_API_KEY to be set and no explicit models.providers.vllm config entry. If you have defined the provider manually, FluffBuzz skips discovery and uses only your declared models.Related
Model selection
Choosing providers, model refs, and failover behavior.
OpenAI
Native OpenAI provider and OpenAI-compatible route behavior.
OAuth and auth
Auth details and credential reuse rules.
Troubleshooting
Common issues and how to resolve them.