Skip to main content
FluffBuzz integrates with Ollama’s native API (/api/chat) for hosted cloud models and local/self-hosted Ollama servers. You can use Ollama in three modes: Cloud + Local through a reachable Ollama host, Cloud only against https://ollama.com, or Local only against a reachable Ollama host.
Remote Ollama users: Do not use the /v1 OpenAI-compatible URL (http://host:11434/v1) with FluffBuzz. This breaks tool calling and models may output raw tool JSON as plain text. Use the native Ollama API URL instead: baseUrl: "http://host:11434" (no /v1).

Getting started

Choose your preferred setup method and mode.

Cloud models

Cloud + Local uses a reachable Ollama host as the control point for both local and cloud models. This is Ollama’s preferred hybrid flow.Use Cloud + Local during setup. FluffBuzz prompts for the Ollama base URL, discovers local models from that host, and checks whether the host is signed in for cloud access with ollama signin. When the host is signed in, FluffBuzz also suggests hosted cloud defaults such as kimi-k2.5:cloud, minimax-m2.7:cloud, and glm-5.1:cloud.If the host is not signed in yet, FluffBuzz keeps the setup local-only until you run ollama signin.

Model discovery (implicit provider)

When you set OLLAMA_API_KEY (or an auth profile) and do not define models.providers.ollama, FluffBuzz discovers models from the local Ollama instance at http://127.0.0.1:11434.
BehaviorDetail
Catalog queryQueries /api/tags
Capability detectionUses best-effort /api/show lookups to read contextWindow and detect capabilities (including vision)
Vision modelsModels with a vision capability reported by /api/show are marked as image-capable (input: ["text", "image"]), so FluffBuzz auto-injects images into the prompt
Reasoning detectionMarks reasoning with a model-name heuristic (r1, reasoning, think)
Token limitsSets maxTokens to the default Ollama max-token cap used by FluffBuzz
CostsSets all costs to 0
This avoids manual model entries while keeping the catalog aligned with the local Ollama instance.
# See what models are available
ollama list
fluffbuzz models list
To add a new model, simply pull it with Ollama:
ollama pull mistral
The new model will be automatically discovered and available to use.
If you set models.providers.ollama explicitly, auto-discovery is skipped and you must define models manually. See the explicit config section below.

Vision and image description

The bundled Ollama plugin registers Ollama as an image-capable media-understanding provider. This lets FluffBuzz route explicit image-description requests and configured image-model defaults through local or hosted Ollama vision models. For local vision, pull a model that supports images:
ollama pull qwen2.5vl:7b
export OLLAMA_API_KEY="ollama-local"
Then verify with the infer CLI:
fluffbuzz infer image describe \
  --file ./photo.jpg \
  --model ollama/qwen2.5vl:7b \
  --json
--model must be a full <provider/model> ref. When it is set, fluffbuzz infer image describe runs that model directly instead of skipping description because the model supports native vision. To make Ollama the default image-understanding model for inbound media, configure agents.defaults.imageModel:
{
  agents: {
    defaults: {
      imageModel: {
        primary: "ollama/qwen2.5vl:7b",
      },
    },
  },
}
If you define models.providers.ollama.models manually, mark vision models with image input support:
{
  id: "qwen2.5vl:7b",
  name: "qwen2.5vl:7b",
  input: ["text", "image"],
  contextWindow: 128000,
  maxTokens: 8192,
}
FluffBuzz rejects image-description requests for models that are not marked image-capable. With implicit discovery, FluffBuzz reads this from Ollama when /api/show reports a vision capability.

Configuration

The simplest local-only enablement path is via environment variable:
export OLLAMA_API_KEY="ollama-local"
If OLLAMA_API_KEY is set, you can omit apiKey in the provider entry and FluffBuzz will fill it for availability checks.

Model selection

Once configured, all your Ollama models are available:
{
  agents: {
    defaults: {
      model: {
        primary: "ollama/gpt-oss:20b",
        fallbacks: ["ollama/llama3.3", "ollama/qwen2.5-coder:32b"],
      },
    },
  },
}
FluffBuzz supports Ollama Web Search as a bundled web_search provider.
PropertyDetail
HostUses your configured Ollama host (models.providers.ollama.baseUrl when set, otherwise http://127.0.0.1:11434)
AuthKey-free
RequirementOllama must be running and signed in with ollama signin
Choose Ollama Web Search during fluffbuzz onboard or fluffbuzz configure --section web, or set:
{
  tools: {
    web: {
      search: {
        provider: "ollama",
      },
    },
  },
}
For the full setup and behavior details, see Ollama Web Search.

Advanced configuration

Tool calling is not reliable in OpenAI-compatible mode. Use this mode only if you need OpenAI format for a proxy and do not depend on native tool calling behavior.
If you need to use the OpenAI-compatible endpoint instead (for example, behind a proxy that only supports OpenAI format), set api: "openai-completions" explicitly:
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://ollama-host:11434/v1",
        api: "openai-completions",
        injectNumCtxForOpenAICompat: true, // default: true
        apiKey: "ollama-local",
        models: [...]
      }
    }
  }
}
This mode may not support streaming and tool calling simultaneously. You may need to disable streaming with params: { streaming: false } in model config.When api: "openai-completions" is used with Ollama, FluffBuzz injects options.num_ctx by default so Ollama does not silently fall back to a 4096 context window. If your proxy/upstream rejects unknown options fields, disable this behavior:
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://ollama-host:11434/v1",
        api: "openai-completions",
        injectNumCtxForOpenAICompat: false,
        apiKey: "ollama-local",
        models: [...]
      }
    }
  }
}
For auto-discovered models, FluffBuzz uses the context window reported by Ollama when available, otherwise it falls back to the default Ollama context window used by FluffBuzz.You can override contextWindow and maxTokens in explicit provider config:
{
  models: {
    providers: {
      ollama: {
        models: [
          {
            id: "llama3.3",
            contextWindow: 131072,
            maxTokens: 65536,
          }
        ]
      }
    }
  }
}
FluffBuzz treats models with names such as deepseek-r1, reasoning, or think as reasoning-capable by default.
ollama pull deepseek-r1:32b
No additional configuration is needed — FluffBuzz marks them automatically.
Ollama is free and runs locally, so all model costs are set to $0. This applies to both auto-discovered and manually defined models.
The bundled Ollama plugin registers a memory embedding provider for memory search. It uses the configured Ollama base URL and API key.
PropertyValue
Default modelnomic-embed-text
Auto-pullYes — the embedding model is pulled automatically if not present locally
To select Ollama as the memory search embedding provider:
{
  agents: {
    defaults: {
      memorySearch: { provider: "ollama" },
    },
  },
}
FluffBuzz’s Ollama integration uses the native Ollama API (/api/chat) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.For native /api/chat requests, FluffBuzz also forwards thinking control directly to Ollama: /think off and fluffbuzz agent --thinking off send top-level think: false, while non-off thinking levels send think: true.
If you need to use the OpenAI-compatible endpoint, see the “Legacy OpenAI-compatible mode” section above. Streaming and tool calling may not work simultaneously in that mode.

Troubleshooting

Make sure Ollama is running and that you set OLLAMA_API_KEY (or an auth profile), and that you did not define an explicit models.providers.ollama entry:
ollama serve
Verify that the API is accessible:
curl http://localhost:11434/api/tags
If your model is not listed, either pull the model locally or define it explicitly in models.providers.ollama.
ollama list  # See what's installed
ollama pull gemma4
ollama pull gpt-oss:20b
ollama pull llama3.3     # Or another model
Check that Ollama is running on the correct port:
# Check if Ollama is running
ps aux | grep ollama

# Or restart Ollama
ollama serve
More help: Troubleshooting and FAQ.

Model selection

Overview of all providers, model refs, and failover behavior.

Model selection

How to choose and configure models.

Ollama Web Search

Full setup and behavior details for Ollama-powered web search.

Configuration

Full config reference.