/api/chat) for hosted cloud models and local/self-hosted Ollama servers. You can use Ollama in three modes: Cloud + Local through a reachable Ollama host, Cloud only against https://ollama.com, or Local only against a reachable Ollama host.
Getting started
Choose your preferred setup method and mode.- Onboarding (recommended)
- Manual setup
Best for: fastest path to a working Ollama cloud or local setup.Optionally specify a custom base URL or model:
Choose your mode
- Cloud + Local — local Ollama host plus cloud models routed through that host
- Cloud only — hosted Ollama models via
https://ollama.com - Local only — local models only
Select a model
Cloud only prompts for OLLAMA_API_KEY and suggests hosted cloud defaults. Cloud + Local and Local only ask for an Ollama base URL, discover available models, and auto-pull the selected local model if it is not available yet. Cloud + Local also checks whether that Ollama host is signed in for cloud access.Non-interactive mode
Cloud models
- Cloud + Local
- Cloud only
- Local only
Cloud + Local uses a reachable Ollama host as the control point for both local and cloud models. This is Ollama’s preferred hybrid flow.Use Cloud + Local during setup. FluffBuzz prompts for the Ollama base URL, discovers local models from that host, and checks whether the host is signed in for cloud access with ollama signin. When the host is signed in, FluffBuzz also suggests hosted cloud defaults such as kimi-k2.5:cloud, minimax-m2.7:cloud, and glm-5.1:cloud.If the host is not signed in yet, FluffBuzz keeps the setup local-only until you run ollama signin.Model discovery (implicit provider)
When you setOLLAMA_API_KEY (or an auth profile) and do not define models.providers.ollama, FluffBuzz discovers models from the local Ollama instance at http://127.0.0.1:11434.
| Behavior | Detail |
|---|---|
| Catalog query | Queries /api/tags |
| Capability detection | Uses best-effort /api/show lookups to read contextWindow and detect capabilities (including vision) |
| Vision models | Models with a vision capability reported by /api/show are marked as image-capable (input: ["text", "image"]), so FluffBuzz auto-injects images into the prompt |
| Reasoning detection | Marks reasoning with a model-name heuristic (r1, reasoning, think) |
| Token limits | Sets maxTokens to the default Ollama max-token cap used by FluffBuzz |
| Costs | Sets all costs to 0 |
If you set
models.providers.ollama explicitly, auto-discovery is skipped and you must define models manually. See the explicit config section below.Vision and image description
The bundled Ollama plugin registers Ollama as an image-capable media-understanding provider. This lets FluffBuzz route explicit image-description requests and configured image-model defaults through local or hosted Ollama vision models. For local vision, pull a model that supports images:--model must be a full <provider/model> ref. When it is set, fluffbuzz infer image describe runs that model directly instead of skipping description because the model supports native vision.
To make Ollama the default image-understanding model for inbound media, configure agents.defaults.imageModel:
models.providers.ollama.models manually, mark vision models with image input support:
/api/show reports a vision capability.
Configuration
- Basic (implicit discovery)
- Explicit (manual models)
- Custom base URL
The simplest local-only enablement path is via environment variable:
Model selection
Once configured, all your Ollama models are available:Ollama Web Search
FluffBuzz supports Ollama Web Search as a bundledweb_search provider.
| Property | Detail |
|---|---|
| Host | Uses your configured Ollama host (models.providers.ollama.baseUrl when set, otherwise http://127.0.0.1:11434) |
| Auth | Key-free |
| Requirement | Ollama must be running and signed in with ollama signin |
fluffbuzz onboard or fluffbuzz configure --section web, or set:
For the full setup and behavior details, see Ollama Web Search.
Advanced configuration
Legacy OpenAI-compatible mode
Legacy OpenAI-compatible mode
If you need to use the OpenAI-compatible endpoint instead (for example, behind a proxy that only supports OpenAI format), set This mode may not support streaming and tool calling simultaneously. You may need to disable streaming with
api: "openai-completions" explicitly:params: { streaming: false } in model config.When api: "openai-completions" is used with Ollama, FluffBuzz injects options.num_ctx by default so Ollama does not silently fall back to a 4096 context window. If your proxy/upstream rejects unknown options fields, disable this behavior:Context windows
Context windows
For auto-discovered models, FluffBuzz uses the context window reported by Ollama when available, otherwise it falls back to the default Ollama context window used by FluffBuzz.You can override
contextWindow and maxTokens in explicit provider config:Reasoning models
Reasoning models
FluffBuzz treats models with names such as No additional configuration is needed — FluffBuzz marks them automatically.
deepseek-r1, reasoning, or think as reasoning-capable by default.Model costs
Model costs
Ollama is free and runs locally, so all model costs are set to $0. This applies to both auto-discovered and manually defined models.
Memory embeddings
Memory embeddings
The bundled Ollama plugin registers a memory embedding provider for
memory search. It uses the configured Ollama base URL
and API key.
To select Ollama as the memory search embedding provider:
| Property | Value |
|---|---|
| Default model | nomic-embed-text |
| Auto-pull | Yes — the embedding model is pulled automatically if not present locally |
Streaming configuration
Streaming configuration
FluffBuzz’s Ollama integration uses the native Ollama API (
/api/chat) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.For native /api/chat requests, FluffBuzz also forwards thinking control directly to Ollama: /think off and fluffbuzz agent --thinking off send top-level think: false, while non-off thinking levels send think: true.Troubleshooting
Ollama not detected
Ollama not detected
Make sure Ollama is running and that you set Verify that the API is accessible:
OLLAMA_API_KEY (or an auth profile), and that you did not define an explicit models.providers.ollama entry:No models available
No models available
If your model is not listed, either pull the model locally or define it explicitly in
models.providers.ollama.Connection refused
Connection refused
Check that Ollama is running on the correct port:
More help: Troubleshooting and FAQ.
Related
Model selection
Overview of all providers, model refs, and failover behavior.
Model selection
How to choose and configure models.
Ollama Web Search
Full setup and behavior details for Ollama-powered web search.
Configuration
Full config reference.