## Quick start
Paste this into `fluffbuzz.json` for a safe-default setup — plugin on, scoped to
the main agent, direct-message sessions only, inherits the session model
when available:
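A sketch of that `fluffbuzz.json` fragment (the nesting is inferred from the dotted key paths in this doc, so verify it against your actual schema):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "agents": ["main"],
          "allowedChatTypes": ["direct"],
          "promptStyle": "balanced"
        }
      }
    }
  }
}
```

`config.model` is left unset here so recall inherits the current session model.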
- `plugins.entries.active-memory.enabled: true` turns the plugin on
- `config.agents: ["main"]` opts only the `main` agent into active memory
- `config.allowedChatTypes: ["direct"]` scopes it to direct-message sessions (opt in groups/channels explicitly)
- `config.model` (optional) pins a dedicated recall model; unset inherits the current session model
- `config.modelFallback` is used only when no explicit or inherited model resolves
- `config.promptStyle: "balanced"` is the default for `recent` mode
- Active memory still runs only for eligible interactive persistent chat sessions
## Speed recommendations

The simplest setup is to leave `config.model` unset and let Active Memory use
the same model you already use for normal replies. That is the safest default
because it follows your existing provider, auth, and model preferences.
If you want Active Memory to feel faster, use a dedicated inference model
instead of borrowing the main chat model. Recall quality matters, but latency
matters more here than on the main answer path, and Active Memory's tool
surface is narrow (it calls only `memory_search` and `memory_get`).
Good fast-model options:

- `cerebras/gpt-oss-120b` for a dedicated low-latency recall model
- `google/gemini-3-flash` as a low-latency fallback without changing your primary chat model
- your normal session model, by leaving `config.model` unset
## Cerebras setup

Add a Cerebras provider and point Active Memory at it. Make sure the key has
`chat/completions` access for the chosen model — `/v1/models` visibility alone
does not guarantee it.
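The provider-definition side is elided here; as a sketch, this points Active Memory's recall model at the Cerebras ref listed above (nesting inferred from the dotted key paths in this doc):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "model": "cerebras/gpt-oss-120b"
        }
      }
    }
  }
}
```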
## How to see it

Active memory injects a hidden, untrusted prompt prefix for the model. It does
not expose raw `<active_memory_plugin>...</active_memory_plugin>` tags in the
normal client-visible reply.
## Session toggle

Use the plugin command when you want to pause or resume active memory for the
current chat session without editing config. It does not touch
`plugins.entries.active-memory.enabled`, agent targeting, or other global
configuration.
If you want the command to write config and pause or resume active memory for
all sessions, use the explicit global form, which writes
`plugins.entries.active-memory.config.enabled`. It leaves
`plugins.entries.active-memory.enabled` on so the command remains available to
turn active memory back on later.
If you want to see what active memory is doing in a live session, turn on the
session toggles that match the output you want:
- an active memory status line such as `Active Memory: status=ok elapsed=842ms query=recent summary=34 chars` when `/verbose on` is set
- a readable debug summary such as `Active Memory Debug: Lemon pepper wings with blue cheese.` when `/trace on` is set

With `/trace raw`, the traced Model Input (User Role) block will show the
hidden Active Memory prefix (the raw
`<active_memory_plugin>...</active_memory_plugin>` block).
## When it runs

Active memory uses two gates:

- **Config opt-in.** The plugin must be enabled, and the current agent id must appear in `plugins.entries.active-memory.config.agents`.
- **Strict runtime eligibility.** Even when enabled and targeted, active memory only runs for eligible interactive persistent chat sessions.
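The two gates compose as a simple AND. A minimal TypeScript sketch, using hypothetical config and session shapes (these are illustrative, not FluffBuzz's real types):

```typescript
// Hypothetical shapes; field names are illustrative only.
interface ActiveMemoryConfig {
  enabled: boolean;   // plugins.entries.active-memory.enabled
  agents: string[];   // plugins.entries.active-memory.config.agents
}

interface ChatSession {
  agentId: string;
  interactive: boolean;
  persistent: boolean;
}

function activeMemoryRuns(cfg: ActiveMemoryConfig, s: ChatSession): boolean {
  // Gate 1: config opt-in (plugin enabled and current agent targeted).
  if (!cfg.enabled || !cfg.agents.includes(s.agentId)) return false;
  // Gate 2: strict runtime eligibility (interactive persistent chat only).
  return s.interactive && s.persistent;
}
```

Both gates must pass; failing either one silently skips recall for that turn.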
## Session types

`config.allowedChatTypes` controls which kinds of conversations may run Active
Memory at all.
The default is:
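As an illustration, explicitly opting group chats in alongside direct messages might look like this (the `"group"` value is hypothetical; use whatever chat-type ids your deployment actually emits):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "allowedChatTypes": ["direct", "group"]
        }
      }
    }
  }
}
```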
## Where it runs

Active memory is a conversational enrichment feature, not a platform-wide inference feature.

| Surface | Runs active memory? |
|---|---|
| Control UI / web chat persistent sessions | Yes, if the plugin is enabled and the agent is targeted |
| Other interactive channel sessions on the same persistent chat path | Yes, if the plugin is enabled and the agent is targeted |
| Headless one-shot runs | No |
| Heartbeat/background runs | No |
| Generic internal agent-command paths | No |
| Sub-agent/internal helper execution | No |
## Why use it

Use active memory when:

- the session is persistent and user-facing
- the agent has meaningful long-term memory to search
- continuity and personalization matter more than raw prompt determinism

It is a good fit for surfacing:

- stable preferences
- recurring habits
- long-term user context that should surface naturally

Avoid it for:

- automation
- internal workers
- one-shot API tasks
- places where hidden personalization would be surprising
## How it works

The runtime shape is: before the main model replies, a blocking memory
sub-agent inspects the conversation and searches long-term memory. The
blocking memory sub-agent can use only:

- `memory_search`
- `memory_get`

It returns either `NONE` or a compact user-fact summary that becomes the
hidden prefix for the main model.
## Query modes

`config.queryMode` controls how much conversation the blocking memory sub-agent
sees. Pick the smallest mode that still answers follow-up questions well;
timeout budgets should grow with context size (`message` < `recent` < `full`).
- `message`
- `recent`
- `full`

In `message` mode, only the latest user message is sent. Use this when:

- you want the fastest behavior
- you want the strongest bias toward stable preference recall
- follow-up turns do not need conversational context
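A sketch of a latency-first setup in that mode (nesting inferred from the dotted key paths in this doc; the timeout value follows the budget guidance in this section):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "queryMode": "message",
          "timeoutMs": 3000
        }
      }
    }
  }
}
```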
A reasonable starting budget is 3000 to 5000 ms for `config.timeoutMs`.

## Prompt styles
`config.promptStyle` controls how eager or strict the blocking memory sub-agent
is when deciding whether to return memory.

Available styles:

- `balanced`: general-purpose default for `recent` mode
- `strict`: least eager; best when you want very little bleed from nearby context
- `contextual`: most continuity-friendly; best when conversation history should matter more
- `recall-heavy`: more willing to surface memory on softer but still plausible matches
- `precision-heavy`: aggressively prefers `NONE` unless the match is obvious
- `preference-only`: optimized for favorites, habits, routines, taste, and recurring personal facts
When `config.promptStyle` is unset, a mode-appropriate default applies
(`balanced` for `recent` mode). If you set `config.promptStyle` explicitly,
that override wins.
Example:
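For instance, pinning a preference-focused style in the fastest query mode (a sketch; nesting inferred from the dotted key paths in this doc):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "queryMode": "message",
          "promptStyle": "preference-only"
        }
      }
    }
  }
}
```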
## Model fallback policy

If `config.model` is unset, Active Memory tries to resolve a model in this order:

1. the current session model, when available
2. `config.modelFallback`, when set

`config.modelFallback` controls the configured fallback step.
Optional custom fallback:
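A sketch using the low-latency fallback suggested earlier (nesting inferred from the dotted key paths in this doc):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "modelFallback": "google/gemini-3-flash"
        }
      }
    }
  }
}
```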
`config.modelFallbackPolicy` is retained only as a deprecated compatibility
field for older configs. It no longer changes runtime behavior.
## Advanced escape hatches

These options are intentionally not part of the recommended setup.

`config.thinking` can override the blocking memory sub-agent thinking level.
`config.promptAppend` adds extra operator instructions after the default Active
Memory prompt and before the conversation context.
`config.promptOverride` replaces the default Active Memory prompt. FluffBuzz
still appends the conversation context afterward, and an override must still
instruct the blocking memory sub-agent to return either `NONE`
or compact user-fact context for the main model.
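A combined sketch of these escape hatches (nesting inferred from the dotted key paths in this doc; the appended instruction text is purely illustrative):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "thinking": "low",
          "promptAppend": "Prefer dietary and scheduling facts over general chit-chat."
        }
      }
    }
  }
}
```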
## Transcript persistence

Active memory blocking memory sub-agent runs create a real `session.jsonl`
transcript during the blocking memory sub-agent call.
By default, that transcript is temporary:
- it is written to a temp directory
- it is used only for the blocking memory sub-agent run
- it is deleted immediately after the run finishes
To keep transcripts instead, set `config.persistTranscripts: true` and
optionally choose a location with `config.transcriptDir`.
Use this carefully:
- blocking memory sub-agent transcripts can accumulate quickly on busy sessions
- `full` query mode can duplicate a lot of conversation context
- these transcripts contain hidden prompt context and recalled memories
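A sketch of opting into persistence (the directory name is illustrative; `config.transcriptDir` is resolved relative to the agent sessions folder):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "persistTranscripts": true,
          "transcriptDir": "active-memory-transcripts"
        }
      }
    }
  }
}
```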
## Configuration

All active memory configuration lives under `plugins.entries.active-memory`:

| Key | Type | Meaning |
|---|---|---|
| `enabled` | boolean | Enables the plugin itself |
| `config.agents` | string[] | Agent ids that may use active memory |
| `config.model` | string | Optional blocking memory sub-agent model ref; when unset, active memory uses the current session model |
| `config.queryMode` | `"message"` \| `"recent"` \| `"full"` | Controls how much conversation the blocking memory sub-agent sees |
| `config.promptStyle` | `"balanced"` \| `"strict"` \| `"contextual"` \| `"recall-heavy"` \| `"precision-heavy"` \| `"preference-only"` | Controls how eager or strict the blocking memory sub-agent is when deciding whether to return memory |
| `config.thinking` | `"off"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` \| `"adaptive"` \| `"max"` | Advanced thinking override for the blocking memory sub-agent; default off for speed |
| `config.promptOverride` | string | Advanced full prompt replacement; not recommended for normal use |
| `config.promptAppend` | string | Advanced extra instructions appended to the default or overridden prompt |
| `config.timeoutMs` | number | Hard timeout for the blocking memory sub-agent, capped at 120000 ms |
| `config.maxSummaryChars` | number | Maximum total characters allowed in the active-memory summary |
| `config.logging` | boolean | Emits active memory logs while tuning |
| `config.persistTranscripts` | boolean | Keeps blocking memory sub-agent transcripts on disk instead of deleting temp files |
| `config.transcriptDir` | string | Relative blocking memory sub-agent transcript directory under the agent sessions folder |
| `config.recentUserTurns` | number | Prior user turns to include when `queryMode` is `recent` |
| `config.recentAssistantTurns` | number | Prior assistant turns to include when `queryMode` is `recent` |
| `config.recentUserChars` | number | Max chars per recent user turn |
| `config.recentAssistantChars` | number | Max chars per recent assistant turn |
| `config.cacheTtlMs` | number | Cache reuse for repeated identical queries |
## Recommended setup

Start with `recent`.
Use `/verbose on` for the normal status line and `/trace on` for the
active-memory debug summary instead of looking for a separate active-memory
debug command. In chat channels, those diagnostic lines are sent after the
main assistant reply rather than before it.
Then move to:

- `message` if you want lower latency
- `full` if you decide extra context is worth the slower blocking memory sub-agent
## Debugging

If active memory is not showing up where you expect:

- Confirm the plugin is enabled under `plugins.entries.active-memory.enabled`.
- Confirm the current agent id is listed in `config.agents`.
- Confirm you are testing through an interactive persistent chat session.
- Turn on `config.logging: true` and watch the gateway logs.
- Verify memory search itself works with `fluffbuzz memory status --deep`.
If recall is too slow or summaries run long:

- lower `maxSummaryChars`
- use a smaller `queryMode`
- lower `timeoutMs`
- reduce recent turn counts
- reduce per-turn char caps
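A sketch that turns several of those knobs down at once (values are illustrative starting points, not recommendations):

```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "config": {
          "maxSummaryChars": 500,
          "timeoutMs": 3000,
          "recentUserTurns": 2,
          "recentAssistantTurns": 1
        }
      }
    }
  }
}
```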
## Common issues

Active Memory rides on the normal `memory_search` pipeline under
`agents.defaults.memorySearch`, so most recall surprises are embedding-provider
problems, not Active Memory bugs.
### Embedding provider switched or stopped working
If `memorySearch.provider` is unset, FluffBuzz auto-detects the first
available embedding provider. A new API key, quota exhaustion, or a
rate-limited hosted provider can change which provider resolves between
runs. If no provider resolves, `memory_search` may degrade to lexical-only
retrieval; runtime failures after a provider is already selected do not
fall back automatically.

Pin the provider (and an optional fallback) explicitly to make selection
deterministic. See Memory Search for the full list of providers and pinning
examples.
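A sketch of pinning the embedding provider (the provider id is hypothetical; see Memory Search for real values and the fallback key):

```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "openai"
      }
    }
  }
}
```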
### Recall feels slow, empty, or inconsistent
- Turn on `/trace on` to surface the plugin-owned Active Memory debug summary in the session.
- Turn on `/verbose on` to also see the `🧩 Active Memory: ...` status line after each reply.
- Watch gateway logs for `active-memory: ... start|done`, `memory sync failed (search-bootstrap)`, or provider embedding errors.
- Run `fluffbuzz memory status --deep` to inspect the memory-search backend and index health.
- If you use `ollama`, confirm the embedding model is installed (`ollama list`).