Everett 37a2be26f3 feat: port hermes-agent session-search, osv-check, clarify, and SSRF guard (#335)
* feat: port hermes-agent session-search, osv-check, clarify, and SSRF guard

Adds four catch-up features from nousresearch/hermes-agent:

- `session_search`: FTS5 full-text search over stored messages for
  cross-conversation recall. New migration v21 introduces a `messages_fts`
  virtual table with triggers that keep it synced on INSERT/UPDATE/DELETE
  and backfills existing rows; the tool returns ranked snippets with chat
  metadata.
- `osv_check`: queries api.osv.dev for advisories across npm, PyPI,
  crates.io, RubyGems, Maven, NuGet, Packagist, Hex, Pub, and Go. Flags
  MAL-* malware advisories explicitly.
- `clarify`: structured multi-choice or open-ended question tool that
  delivers the question through the caller's channel and releases the
  turn so the next user message supplies the answer. Capped at 4
  predefined choices plus an automatic "Other" option.
- SSRF pre-flight guard on `web_fetch`: new `block_private_ips` field on
  `web_fetch_url_validation` (default on) rejects loopback, link-local,
  private, CGNAT, unique-local IPv6, documentation, benchmarking, and
  cloud-metadata targets. Runs on the initial URL and every redirect hop.
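The range check above can be sketched std-only for a subset of the IPv4 cases (the real guard also covers IPv6 unique-local and more); the function name is hypothetical:

```rust
use std::net::Ipv4Addr;

/// Sketch: reject a handful of the blocked IPv4 ranges. The actual
/// guard covers additional ranges and IPv6.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_loopback()                                   // 127.0.0.0/8
        || ip.is_link_local()                          // 169.254.0.0/16, incl. cloud metadata
        || ip.is_private()                             // RFC 1918 ranges
        || (o[0] == 100 && (64..128).contains(&o[1]))  // CGNAT 100.64.0.0/10
        || (o[0] == 192 && o[1] == 0 && o[2] == 2)     // documentation 192.0.2.0/24
        || (o[0] == 198 && (o[1] == 18 || o[1] == 19)) // benchmarking 198.18.0.0/15
}
```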

Wired into `ToolRegistry::new` and, for read-only tools, into
`ToolRegistry::new_sub_agent`. Generated docs updated to list 50 built-in
tools (was 47). Covered by unit tests for FTS search, SSRF ranges, and
OSV ecosystem canonicalization; full `cargo clippy -D warnings` and
`cargo test --all-targets` pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(session_search): scope default to caller's chat, gate cross-chat access

Previous default of searching all chats when chat_id was omitted leaked
messages across DMs, groups, and channels on the same microclaw
deployment — a caller in chat A could FTS5-search a snippet that only
existed in chat B.

Changes:
- Register `session_search` in `should_inject_default_chat_id` so the
  runtime injects the caller's chat_id when missing.
- In the tool, explicit `chat_id` is gated by `authorize_chat_access`
  (same caller or control chat only).
- Add `all_chats: true` opt-in that only control chats may use; it
  drops the chat scope entirely for audit/admin workflows.
- Update tool description so the agent knows the scope semantics.
- Add four unit tests: cross-chat denial, all_chats denial for
  non-control, all_chats success for control, default-scope is caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add multimedia tool suite (image gen/vision/TTS/STT)

Four OpenAI-compatible multimedia tools, all disabled by default and
opt-in per `media.<tool>.enabled`:

- `generate_image`: POST /v1/images/generations, saves PNG under
  <data_dir>/media/images/, delivers via channel send_attachment when
  supported. Supports b64_json + URL response shapes.
- `describe_image`: POST /v1/chat/completions with image content block.
  Accepts file paths inside working_dir, https:// URLs, or data: URIs;
  remote URLs are SSRF-checked then re-encoded as data: URIs so the
  provider always sees inline bytes.
- `text_to_speech`: POST /v1/audio/speech, saves audio to
  <data_dir>/media/audio/, delivers via channel send_attachment.
  Allowlists voices (alloy/echo/fable/...) and formats (mp3/opus/wav/...).
- `transcribe_audio`: POST /v1/audio/transcriptions as multipart/form-data.
  Accepts the same location forms as describe_image.

Shared `MediaClient` (microclaw-tools crate):
- Enforces the SSRF guard on the configured base URL (prevents an
  operator from pointing media traffic at loopback/private/metadata
  addresses)
- Redacts API keys from Debug output
- Resolves credentials in priority order: media.api_key (plaintext,
  discouraged) -> MICROCLAW_OPENAI_API_KEY -> OPENAI_API_KEY -> existing
  config.openai_api_key (for zero-config on existing deployments)

New config section `media` with per-tool knobs (model, default size/voice/
format, language) and a shared `openai_base_url` override. Defaults match
OpenAI's current catalog (gpt-image-1, gpt-4o-mini, tts-1, whisper-1).

Built-in tool count goes 50 -> 54. Schema unchanged. 5 new integration
tests cover SSRF guard on base URL. Existing 949 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tools): insights — usage summary over trailing window

Aggregates llm_usage_logs and per-model breakdown into a markdown
report. Scoped to caller's chat by default; all_chats=true requires
control chat (same pattern as session_search).

Ported from hermes-agent's /insights [days] command, adapted to
microclaw's existing usage-tracking schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(title): auto-generate session titles via LLM

New module `src/title_generator.rs` with
`generate_and_save_title(config, db, chat_id)`. Loads the first ~8
messages of a chat (sessions.messages_json length >= 4), asks the
configured LLM for a 3-8 word title, strips quotes/trailing
punctuation, and writes it to sessions.label.
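The quote/punctuation stripping step can be sketched as a small pure function (name hypothetical, not the module's actual helper):

```rust
/// Sketch: clean an LLM-generated title before writing sessions.label.
fn clean_title(raw: &str) -> String {
    raw.trim()
        // strip wrapping straight or smart quotes
        .trim_matches(|c| c == '"' || c == '\'' || c == '\u{201C}' || c == '\u{201D}')
        // then drop trailing punctuation
        .trim_end_matches(|c: char| matches!(c, '.' | '!' | '?' | ','))
        .trim()
        .to_string()
}
```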

Also adds two small Database helpers:
- set_session_label(chat_id, label)
- get_session_label_and_length(chat_id) -> (Option<String>, usize)

Ported from hermes-agent's agent/title_generator.py. No automatic
scheduler hook yet — callers (web UI, admin CLI, or a future cron
task) invoke the function; the agent loop is never blocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cache): tool result cache (schema v22) + osv_check wiring

New `tool_result_cache` table (migration v22) keyed by SHA-256 of
(tool_name + normalized input JSON). Auth-context fields are stripped
from the key so identical requests from different callers dedupe.
Default TTLs: web_fetch/search/osv/describe_image 15m-1h,
session_search 60s.

Helpers on microclaw-tools::tool_cache:
- cache_key(tool_name, &input)
- normalize_input_for_key (key-sorted, auth-stripped)
- default_ttls() catalog

Helpers on Database:
- get_cached_tool_result, put_cached_tool_result, prune_tool_result_cache

Wired into `osv_check` as a proof of concept — repeated queries on
the same package/ecosystem are served from SQLite within the TTL.
Other network tools can opt in the same way.
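The keying scheme can be sketched with std-only pieces; the real helper hashes SHA-256 over canonical JSON, so `DefaultHasher` stands in here, and the stripped field names are assumptions:

```rust
use std::collections::{hash_map::DefaultHasher, BTreeMap};
use std::hash::{Hash, Hasher};

/// Sketch: BTreeMap iteration is key-sorted, which gives the
/// "normalized input" property; assumed auth-context fields are
/// skipped so identical requests from different callers dedupe.
fn cache_key(tool_name: &str, input: &BTreeMap<String, String>) -> u64 {
    let mut h = DefaultHasher::new();
    tool_name.hash(&mut h);
    for (k, v) in input {
        if k == "chat_id" || k == "caller_channel" {
            continue; // auth context stripped from the key
        }
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}
```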

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): redact module for PII / credential scrubbing

New microclaw-core::redact with `redact(&str) -> String` that replaces
well-known credential and PII patterns:
- sk-* / sk-ant-* / sk-proj-* API keys
- "Bearer <token>" headers
- GitHub PATs (ghp_/gho_/ghu_/ghs_/ghr_)
- AWS access keys (AKIA*, ASIA*)
- Slack tokens (xox[baprs]-*)
- Google API keys (AIza*)
- api_key=... in JSON/YAML bodies
- Emails (masked user, domain kept for debugging)
- Phone numbers (E.164 / CN 11-digit)

Ported from hermes-agent's agent/redact.py. Compiled regexes are
cached in a Lazy<Vec<...>>. 7 unit tests cover each branch plus a
multi-secret case and a plain-text passthrough.
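A heavily simplified, std-only sketch of the idea (the real module uses compiled regexes, matches mid-string, and preserves spacing; this token-level version normalizes whitespace):

```rust
/// Sketch: replace whitespace-delimited tokens that start with a
/// well-known credential prefix. Not the module's actual matcher.
fn redact_sketch(text: &str) -> String {
    const PREFIXES: [&str; 5] = ["sk-", "ghp_", "AKIA", "xoxb-", "AIza"];
    text.split_whitespace()
        .map(|w| {
            if PREFIXES.iter().any(|p| w.starts_with(p)) {
                "[REDACTED]"
            } else {
                w
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```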

Module is opt-in: callers apply redact at boundaries that might emit
sensitive data to logs or error messages. Wiring into the tracing
subscriber is deliberately deferred to keep the diff small.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): robots.txt consultation for web_fetch

New module crates/microclaw-tools/src/website_policy.rs:
- parse_robots_txt(text, user_agent) with UA-specific and '*' fallbacks
- Longest-prefix match between Allow and Disallow (standard semantics)
- Crawl-Delay surfaced in CrawlHint so callers can pace requests
- Per-host cache (30min TTL, 500KB body cap)
- Fail-open on network errors, 4xx, 5xx
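The longest-prefix resolution step can be sketched standalone (standard robots.txt semantics; UA grouping and parsing omitted, function name hypothetical):

```rust
/// rules: (is_allow, path_prefix) pairs taken from Allow:/Disallow: lines.
/// Longest matching prefix wins; Allow wins a length tie; no match = allowed.
fn is_allowed(rules: &[(bool, &str)], path: &str) -> bool {
    let mut best: Option<(usize, bool)> = None;
    for &(allow, prefix) in rules {
        if !path.starts_with(prefix) {
            continue;
        }
        let len = prefix.len();
        let better = match best {
            None => true,
            Some((blen, _)) if len > blen => true,
            Some((blen, ballow)) if len == blen && allow && !ballow => true,
            _ => false,
        };
        if better {
            best = Some((len, allow));
        }
    }
    best.map_or(true, |(_, allow)| allow)
}
```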

consult_robots(client, url, user_agent) -> CrawlHint returns
{allowed, reason, crawl_delay_secs}. Ported from hermes-agent's
tools/website_policy.py. Integration with web_fetch left as a follow-up
to keep the diff small; the module is a pure helper today.

6 unit tests cover empty/disallow/allow-overrides/ua-specific/
crawl-delay/comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): reqwest redirect hook enforces SSRF on every hop

Adds url_safety::ssrf_redirect_policy(max_hops) that returns a
reqwest::redirect::Policy re-validating each redirect target against
check_url_private_ip. A blocked hop short-circuits the chain with a
descriptive error; the normal limit kicks in at max_hops.

MediaClient now uses this policy so provider-side redirects or any
third-party SDK that internally follows Location headers cannot slip
traffic into loopback/private/metadata ranges. web_fetch's manual
redirect loop remains unchanged (it already validates each hop); the
new policy gives the same guarantee for clients that can't do manual
redirect handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tools): truncate oversized tool results to artifacts + fetch_artifact

Tool results above tool_result_truncation_threshold_chars (default 4000)
now keep head + tail in the message history and spill the full body to a
new tool_result_artifacts table with a TTL. The agent reads further into
the body via the new fetch_artifact tool, which is scoped to the chat
that produced the artifact.

Stops bash/web_fetch/read_file blasts from inflating every subsequent
turn's prompt cost, while keeping the full body recoverable for the rest
of the session.
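The head+tail split can be sketched as follows (marker text and function name hypothetical; the real path also records the artifact row and its TTL):

```rust
/// Sketch: keep head and tail in history, spill the middle to an artifact.
fn truncate_to_head_tail(body: &str, threshold: usize) -> String {
    let total = body.chars().count();
    if total <= threshold {
        return body.to_string(); // under the threshold: keep verbatim
    }
    let keep = threshold / 2;
    let head: String = body.chars().take(keep).collect();
    let tail: String = body.chars().skip(total - keep).collect();
    format!("{head}\n[... truncated; full body stored as artifact ...]\n{tail}")
}
```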

* feat(memory): per-row TTL + recency decay for ranking

Adds an `expires_at` column to memories so the agent can mark
time-bounded facts (e.g. "working from Tokyo this week") for auto-prune.
write_memory accepts `ttl_days`; the reflector tick deletes anything
past its expiry along with stale tool-result artifacts.

L1 ranking in build_db_memory_context now multiplies confidence by an
exponential recency-decay (configurable half-life, default 30d) so stale
EVENT/KNOWLEDGE rows fall behind durable ones. PROFILE memories are
exempt — they describe the user, not transient state.
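The decay multiplier is a plain half-life curve; a minimal sketch (PROFILE rows would skip this and keep their raw confidence):

```rust
/// Exponential recency decay: confidence * 0.5^(age_days / half_life_days).
/// At age == half_life the score is halved; at age 0 it is unchanged.
fn decayed_rank(confidence: f64, age_days: f64, half_life_days: f64) -> f64 {
    confidence * 0.5_f64.powf(age_days / half_life_days)
}
```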

* feat(agent): per-tool duplicate-call circuit breaker

Tracks the last N (tool_name, args_hash) keys across iterations. When the
same call would run for the (limit+1)th time inside the window, the agent
loop short-circuits it with an error tool_result that nudges the model to
change approach instead of repeating itself. Defaults to a 10-call window
with a limit of 3; both knobs are configurable.

Distinct from the existing whole-turn-fingerprint streak guard
(MAX_IDENTICAL_TOOL_USE_STREAK), which aborts the loop. The breaker is
softer: only the offending call fails, so the model can self-correct in
the same turn. fetch_artifact is exempted because paginated reads of one
artifact look like duplicates by design.
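The window mechanics can be sketched std-only (struct and method names hypothetical):

```rust
use std::collections::VecDeque;

/// Sketch of the breaker: last `window` (tool, args_hash) keys;
/// a call repeating more than `limit` times inside the window trips.
struct DupBreaker {
    window: usize,
    limit: usize,
    recent: VecDeque<(String, u64)>,
}

impl DupBreaker {
    fn new(window: usize, limit: usize) -> Self {
        Self { window, limit, recent: VecDeque::new() }
    }

    /// Returns true when this call should be short-circuited with an
    /// error tool_result instead of executing.
    fn record(&mut self, tool: &str, args_hash: u64) -> bool {
        if tool == "fetch_artifact" {
            return false; // paginated artifact reads look like duplicates by design
        }
        let key = (tool.to_string(), args_hash);
        let prior = self.recent.iter().filter(|k| **k == key).count();
        if self.recent.len() >= self.window {
            self.recent.pop_front();
        }
        self.recent.push_back(key);
        prior >= self.limit // (limit+1)th identical call trips
    }
}
```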

* feat(skills): structured tool trajectory for skill review

Replace the lossy "[sender]: content" message dump fed to the skill-review
LLM with a step-numbered trajectory built from the structured Vec<Message>
loaded from sessions.messages_json. Each tool_use block is rendered with
its name + truncated JSON input; each tool_result with its head + error
flag. Image blocks are dropped, oversized payloads are head-truncated
with a "+N chars" suffix.

Also replace the messages.len() / 3 tool-call estimate with an exact
count of tool_use blocks. Skips review when no session row exists rather
than running on degraded data.

* feat(skills): success-signal filter before review LLM call

Skill review now consults a cheap heuristic (assess_success) before
spending tokens on the review LLM. Conversations are flagged Unlikely —
and skipped — when:
  - the duplicate-call circuit breaker fired during the turn
  - the agent ran tool calls but emitted no closing text
  - more than half of tool_results errored
  - the closing assistant text contains apology/failure phrasing
    (English + Chinese)

Saves the LLM call on obvious failures and prevents codifying broken
approaches as reusable skills.
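The heuristic's shape can be sketched as follows (struct, field names, and phrase list are illustrative assumptions, not the actual `assess_success` signature):

```rust
/// Sketch of the cheap pre-review gate described above.
struct TurnSignals {
    breaker_fired: bool,
    tool_calls: usize,
    errored_results: usize,
    closing_text: String,
}

fn likely_success(t: &TurnSignals) -> bool {
    if t.breaker_fired {
        return false; // the model was looping on one call
    }
    if t.tool_calls > 0 && t.closing_text.trim().is_empty() {
        return false; // ran tools but never answered
    }
    if t.errored_results * 2 > t.tool_calls {
        return false; // more than half of tool results errored
    }
    let lower = t.closing_text.to_lowercase();
    // illustrative English + Chinese failure phrases
    ["i apologize", "i was unable", "failed to", "抱歉", "无法"]
        .iter()
        .all(|p| !lower.contains(p))
}
```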

* feat(skills): trigger skill review at end-of-turn instead of reflector tick

Replace the periodic reflector-driven review with an on-completion handoff:
the agent loop enqueues chat_id to AppState.skill_review_queue right
after persisting the final session, and a dedicated worker task drains
the queue (deduping bursts) and runs the review pipeline.

Why this is better than the old path:
  - reviews fire seconds after a turn ends, not up to reflector_interval_mins
    later, so context is fresh and feedback loops are tight
  - each conversation is reviewed once per completion, not once per
    reflector tick (no more re-reviewing the same chat on every tick)
  - the agent loop never blocks on review work; the queue is non-blocking
    and the worker runs out of band
  - dedup batching collapses multiple enqueues for the same chat (e.g.
    rapid user turns) into a single review

Implementation: skill_review.rs gains run_skill_review,
build_skill_review_channel, and spawn_skill_review_worker. AppState owns
the SkillReviewQueue handle. Scheduler's reflector tick no longer
initiates reviews.

* feat(skills): review can edit / patch existing skills, not just create

Replace the create-or-skip review verdict with a four-action enum
({"action": "create" | "edit" | "patch" | "none"}). The review LLM now
sees existing skills with descriptions + a mutability tag and chooses to:

  - create: brand-new skill (version: 1)
  - edit:   full rewrite of an existing agent-created skill (version + 1)
  - patch:  single-occurrence find/replace inside agent-created skill
            (version + 1, ambiguous matches refused)
  - none:   no-op

Human-curated skills (source != "agent-created") are immutable from this
path. Each agent-created skill carries a monotonic version counter in its
frontmatter, surfaced in the apply-action log line. Legacy
{"create": true|false, ...} responses are still accepted as a transitional
shape so older prompts and self-hosted models keep working.

Frontmatter version line is updated in place by patch (preserving the
rest of the YAML), or rewritten wholesale by create/edit.

* feat(skills): activation tracking + auto-archive of inactive skills

Track every successful activate_skill call to a new
skill_activation_logs table (schema v25). The reflector tick walks the
skills directory once per cycle and moves agent-created skills that
haven't been activated within skill_archive_after_days (default 30) to
<skills_dir>/.archived/<name>-<timestamp>/, where the discoverer can't
see them but the move is reversible.

The archive policy is split into a pure decision rule
(should_archive_skill) and the IO-only sweep (archive_inactive_agent_skills),
so the policy is exhaustively unit-testable without mtime gymnastics.

Guards against false-positive archival:
  - human-curated skills (source != "agent-created") are never touched
  - freshly-written skills (mtime within threshold) are kept regardless
    of activation history, so a never-yet-activated new skill survives
    the next sweep
  - threshold_days = 0 disables the sweep entirely

Also exposes Database::skill_activation_counts_since for the insights
tool to consume in a follow-up.

* feat(skills): retrieval-gated catalog — inline top-K hot matches by query

build_skills_catalog_for_query scores every skill's name+description
against the current user query (keyword overlap with CJK n-gram
support, reused from memory_service::tokenize_for_relevance) and
splits the catalog into:

  - Hot bucket (top skills_catalog_top_k matches with score > 0):
    full SKILL.md body inlined, capped at 1500 chars per skill.
  - Cold bucket: name + truncated description only, with the standard
    "use activate_skill to load" hint.

Trades a bigger token slice for the most-relevant skills (so the agent
has procedural knowledge inline and skips an activate_skill round-trip)
against keeping the long tail cheap. Falls back to the flat catalog
when top_k = 0, the query is empty, or no skill scores > 0 — all the
old behavior is preserved as the degenerate path.

skills_catalog_top_k defaults to 3.
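The scoring side can be sketched with plain whitespace tokenization (the real code reuses memory_service::tokenize_for_relevance, which adds CJK n-gram support):

```rust
use std::collections::HashSet;

/// Sketch: count query tokens that also appear in the skill's
/// name + description. Skills with score > 0 compete for the hot bucket.
fn overlap_score(query: &str, name_and_description: &str) -> usize {
    let q = query.to_lowercase();
    let query_tokens: HashSet<&str> = q.split_whitespace().collect();
    let hay = name_and_description.to_lowercase();
    hay.split_whitespace()
        .filter(|w| query_tokens.contains(w))
        .count()
}
```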

* feat(skills): enable autonomous skill review by default

Flip skill_review_min_tool_calls from 0 (disabled) to 5 — the whole
end-of-turn skill review pipeline now runs out of the box, no config
needed. The 5-tool-call threshold is the same minimum the prompt itself
asks the review LLM to look for, so smaller turns still skip review and
don't burn the LLM call.

Operators who don't want autonomous skill creation can still opt out by
setting skill_review_min_tool_calls: 0.

Also fix the integration test minimal_config() to include the nine
config fields added in the recent feature commits — it had drifted
because cargo test --lib doesn't catch tests/ misses.

* feat(security): media tools honor media.allowed_read_dirs allowlist

Vision and STT tools previously rejected any file path outside
working_dir. That blocks legitimate setups where media lives on a
mounted volume or shared cache directory. Add an explicit allowlist
parameter to load_bytes_from_location and thread media.allowed_read_dirs
through describe_image and transcribe_audio so operators can opt in to
extra roots without weakening the default guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(voice): cross-channel inbound voice transcription

Telegram already auto-transcribed `voice` messages, but Discord audio
attachments and Feishu `audio` events were dropped or surfaced as a
"not yet supported" placeholder, and Slack audio file uploads were
ignored. Hoist the STT dispatch + inbound formatting out of telegram
into a shared `voice` module, then plug it into the Discord/Slack/Feishu
attachment paths so every platform that receives audio routes it
through the same OpenAI/local STT provider with a uniform
`[voice message from <user>]: <text>` shape the agent already handles.

Web is unchanged — its frontend doesn't capture microphone input today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(context): project-level context files in system prompt

Hermes Agent surfaces a workspace-wide "Context Files" layer that sits
above per-chat memory: facts that should shape every conversation in a
deployment but are not personality (SOUL.md) and not user-curated
recall. MicroClaw had no equivalent — operators had to inline such
notes into SOUL.md or rely on the reflector to discover them.

Add a Project Context layer: load all `*.md` files (alphabetical) from
`<data_dir>/context/` plus `<runtime_data_dir>/groups/<chat_id>/context/`
for chat-scoped overrides, concatenate them, cap at
`context_max_chars` (default 8000), and inject into the system prompt
between the identity preamble and the dynamic Memories section so the
prefix stays cache-friendly. Path is overridable via `context_dir`;
setting `context_max_chars: 0` disables the layer entirely.
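The concatenate-and-cap step can be sketched as a pure function (files arrive pre-sorted; the section header and per-file separator shown here are illustrative, not the actual prompt layout):

```rust
/// Sketch: fold loaded context files into one capped block, or None
/// when the layer is disabled (max_chars == 0) or there is nothing to inject.
fn build_project_context(files: &[(String, String)], max_chars: usize) -> Option<String> {
    if max_chars == 0 || files.is_empty() {
        return None;
    }
    let mut out = String::from("# Project Context\n\n");
    for (name, body) in files {
        out.push_str(&format!("<!-- {name} -->\n{body}\n\n"));
    }
    Some(out.chars().take(max_chars).collect())
}
```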

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): bash command-content gate + readable approval preview

The HITL approval flow already paused high-risk tools waiting for
operator confirmation, but the existing gate had two gaps relative to
Hermes-style command approval:

1. Approval was bound to the *tool name* and the *chat type* — bash
   running in a non-control chat slipped through with no inspection of
   what command was about to execute.
2. The "waiting for confirmation" message only named the tool, leaving
   the operator to scroll back through tool-call JSON to see the
   actual command before approving.

Add `bash_dangerous_patterns` (case-insensitive regex list, with a
curated default covering rm -rf /, pipe-to-shell installers, sudo, dd,
forkbombs, mkfs, recursive chmod/chown of root) compiled into the
BashTool. When a command matches, bash returns `approval_required`
even outside control chats; the existing auto-retry path then handles
re-execution after the operator approves. Capture the bash command (or
truncated input JSON for future high-risk tools) into
`waiting_approval_preview` so the pause message includes a fenced code
block of exactly what the agent intends to run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(skills): agentskills.io spec compatibility

The SKILL.md parser already understood `name` and `description` plus
MicroClaw's nested `compatibility.{os,deps}` form, but the
agentskills.io standard is now adopted across Claude, Claude Code,
OpenCode, Codex, Cursor, Goose, and a growing list of clients. Skills
authored against that spec used three fields the parser silently
dropped — `license`, `compatibility` (flat string form), and
`allowed-tools` — and skill names with characters allowed by MicroClaw
but disallowed by the spec couldn't round-trip to other clients.

Make the parser accept the spec's flat fields alongside MicroClaw's
nested forms (untagged enum on `compatibility`), surface the new
fields on SkillMetadata for downstream consumers, and tighten the
skill_manage create/edit name validator to the spec's character rules
(lowercase a-z, digits, hyphens; no leading/trailing/consecutive
hyphens; ≤64 chars). Pre-existing skills with underscores or uppercase
names still load — the stricter rule only applies to skills the agent
creates from now on, so they're portable to other clients out of the
box.
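The character rules as stated translate directly into a validator sketch (function name hypothetical; the spec's names are ASCII, so byte length suffices for the 64-char cap):

```rust
/// Spec rule as described above: lowercase a-z, digits, hyphens;
/// no leading/trailing/consecutive hyphens; at most 64 chars.
fn is_valid_skill_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 64
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
        && !name.starts_with('-')
        && !name.ends_with('-')
        && !name.contains("--")
}
```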

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(memory): per-chat USER.md user-model layer

Hermes splits a single curated USER.md narrative from the bag of atomic
memories so the agent always has a coherent description of who the
user is, regardless of which atomic facts happen to rank high for the
current query. MicroClaw had PROFILE-category memories injected at L0,
but those still arrive as fragmented rows that compete for budget and
can contradict each other as they accumulate.

Add a curated user-model layer:

- New `<runtime_data_dir>/groups/<channel>/<chat_id>/USER.md` per chat,
  with read/write helpers on MemoryManager.
- `load_user_model` reads the file with a `user_model_max_chars` cap
  (default 1500, matching Hermes); returns None when the layer is
  disabled (`user_model_max_chars: 0`) or the file is missing.
- System prompt grows a `# User Model` section between the soul/identity
  preamble and Project Context, so the user model anchors the
  prefix-cache prefix above query-driven memory ranking.
- Reflector ends each `reflect_for_chat` with `curate_user_model_for_chat`,
  which calls a small dedicated LLM with the current USER.md + the
  chat's PROFILE memories + a recent conversation excerpt and rewrites
  the file. The curator is gated by `user_model_curation_due` so it
  amortizes across `reflector_interval_mins * user_model_curation_interval`
  (default ~3 reflector ticks) instead of firing every tick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(reflector): fold USER.md curation into single LLM call

The previous USER.md commit added a second LLM round trip per reflector
tick to curate the user model. That doubled reflector cost for chats
where PROFILE memories were extracted.

Extend the existing reflector JSON output schema with an optional
user_model field, include the current USER.md in the reflector's user
message, and persist whatever the LLM returns (or null when no rewrite
is needed). Drop the standalone curator function and the
user_model_curation_interval knob — the LLM itself decides when a
rewrite is warranted by emitting null, so the per-tick amortization
gate is no longer load-bearing.

Also tighten the response parser: legacy top-level arrays previously
fell through into the embedded-object scan, which would silently
match the first array element's braces and drop the rest. Branch on
the parsed JSON type so arrays take the legacy memories path
unambiguously.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context): per-chat context dir uses channel/chat_id layout

The project-context loader I added earlier wrote per-chat overlays at
runtime/groups/<chat_id>/context/, but every other per-chat artifact
(AGENTS.md, USER.md, soul overrides) lives at
runtime/groups/<channel>/<chat_id>/. That mismatch forces operators to
remember two layouts and would silently pick up the wrong overlay when
the same numeric chat_id appeared on different channels.

Thread caller_channel through load_project_context and join it before
the chat_id segment so context directories sit alongside the other
per-chat files, with a regression test that confirms a chat scoped to
telegram doesn't leak its overlay to a discord chat with the same id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(skills): surface agentskills.io fields + warn on legacy names

The previous compat commit started parsing license / compatibility /
allowed-tools but those fields stayed invisible to operators, and
skill_review's name validator still accepted underscores while
skill_manage's already enforced the spec — so reviewer-proposed skills
could land with names that round-tripped fine inside MicroClaw but
broke the moment they were published to other Agent Skills clients.

- Skill listings (`microclaw skill list` / available output) now
  indent license, compatibility (string form), and allowed-tools
  beneath each available skill so operators can audit declared
  metadata at a glance.
- Discovery emits a one-shot warn per non-spec-compliant skill name —
  silent for legacy installs that already passed loading, but loud
  enough to nudge a rename before publication.
- skill_review's reviewer delegates to validate_agentskills_name so
  proposed names match the same rule skill_manage enforces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): diagnostics for context cap, USER.md, bash patterns

The new system-prompt layers (project context, USER.md) and the bash
command-content gate landed without doctor coverage, so a misconfigured
deployment — context_max_chars=0, user_model_max_chars=0, or an
invalid regex in bash_dangerous_patterns — would silently degrade
behavior with no preflight signal.

Add three checks to `microclaw doctor`:

- `context.max_chars` warns when 0 (layer disabled) or absurdly large
  (prefix-cache risk).
- `user_model.max_chars` warns when 0 or above the curation budget
  Hermes treats as the upper bound.
- `bash.dangerous_patterns` compiles each entry and FAILs loudly when
  any regex is invalid — the runtime currently swallows compile errors,
  which leaves the gate weaker than operators expect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): redact PII before writes to USER.md and memory rows

The redact module already scrubs OpenAI/Anthropic/GitHub/AWS/Slack/Google
keys, bearer tokens, and emails for log output, but it wasn't on the
write path for any persisted memory artifact. Reflector-extracted
memories quote conversation content verbatim, so a user pasting an API
key into chat would land that key in long-lived storage and the
embedding store, and the new USER.md curator could verbatim-quote the
same secrets into a per-chat narrative file.

Apply `redact::redact` at three boundaries:

- MemoryManager::write_chat_user_model — USER.md content gets scrubbed
  before hitting disk.
- MemoryManager::{write_global,write_chat,write_bot}_memory — same
  treatment for AGENTS.md narrative files.
- memory_service::apply_reflector_extractions — DB memory rows get
  scrubbed after normalization but before topic-key dedup and
  insertion, so neither the topic key nor the embedding payload sees
  the secret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(commands): /user shows and clears per-chat USER.md

USER.md is curated by the reflector and silently injected into every
system prompt for the chat, but operators had no way to inspect what
the curator had decided about them or to nudge it back to a clean
slate when the narrative drifted.

Add a slash command:

- `/user` prints the current USER.md with a `(used/cap chars)` header
  so the operator can see how much room is left, or a friendly hint
  when the file is empty.
- `/user clear` removes the file via a new
  `MemoryManager::clear_chat_user_model` helper; the reflector
  rebuilds it on its next tick.
- Anything else after `/user` falls through to a one-line usage hint.

The handler is a free function so the command logic doesn't need a
full AppState fixture to test — the storage helper that does the
on-disk work has its own unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(voice): outbound TTS round-trip for voice-inbound turns

Inbound voice was already transcribed across Telegram/Discord/Slack/Feishu
in an earlier commit, but the bot's reply always came back as text — a
jarring asymmetry for users on a phone who expected to listen to the
response on the same surface they spoke into.

Add an opt-in `voice_round_trip` config flag that, when paired with
`media.tts.enabled`, renders the reply text as audio via the existing
OpenAI-compatible /audio/speech endpoint and ships it back through the
channel:

- New `voice::synth_speech_to_temp` and `voice::round_trip_enabled`
  helpers so each channel pulls in two thin wrappers instead of
  fabricating a tool-input shape just to play back text.
- Telegram tracks `voice_inbound`, then uses `bot.send_voice` so the
  client renders the reply as a native voice bubble.
- Discord tracks `voice_inbound`, then attaches the audio file to a
  follow-up message via serenity's CreateMessage builder.
- Slack threads `voice_inbound` through the audio-injection path and
  uploads the synthesized reply via files.upload (mirroring how the
  SlackAdapter delivers attachments).
- Feishu deferred — its audio message type requires a separate
  resource-upload + tenant-token round trip.

Defaults to false because each round-trip burns one extra TTS call;
the operator must explicitly opt in alongside `media.tts.enabled`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): clippy, audit, docs, and test-compile fixes for #335

- agent_engine.rs: allow clippy::too_many_arguments on build_system_prompt
  (now 9 args after user_model + project_context); matches existing usage
  in setup.rs / slack.rs / feishu.rs / db.rs.
- doctor.rs: collapse two if_same_then_else branches in context/USER.md
  cap checks (warn on 0 OR over-cap) — same status, single arm.
- Cargo.lock: bump rustls-webpki 0.103.10 -> 0.103.13 to clear
  RUSTSEC-2026-0104 (reachable panic in CRL parsing).
- tests/config_validation.rs: add bash_dangerous_patterns, context_dir,
  context_max_chars, voice_round_trip, user_model_max_chars to the
  minimal_config helper. The fields were added in earlier commits on the
  branch but the test wasn't updated, breaking Rust and Coverage CI on
  all platforms.
- docs/generated/config-defaults.md: regenerate to match the new fields
  so the docs --check gate passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(providers): add Xiaomi MiMo preset (MiMo-V2.5-Pro / MiMo-V2.5)

Xiaomi exposes its MiMo line through an OpenAI-compatible endpoint at
https://api.xiaomimimo.com/v1, so this is a one-row addition to
PROVIDER_PRESETS — `provider_protocol`, `default_model_for_provider`,
the setup picker, and the generated provider matrix all derive from
that table.

Default model is MiMo-V2.5-Pro; MiMo-V2.5 is offered as the second
option in the model picker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(providers): xiaomi MiMo model ids are lowercase

The /v1/models endpoint returns lowercase ids (`mimo-v2.5-pro`,
`mimo-v2.5`, `mimo-v2-pro`, `mimo-v2-omni`); the camel-case names I
shipped in the previous commit ("MiMo-V2.5-Pro") got rejected as
"Not supported model" when the setup wizard ran its model test.

Also expand the model list with mimo-v2-pro and mimo-v2-omni so the
picker reflects the full non-TTS lineup. TTS variants are excluded
because microclaw's chat path doesn't drive them.

Verified against https://token-plan-cn.xiaomimimo.com/v1 (the coding
plan gateway) — chat completion succeeds with model id mimo-v2.5-pro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(providers): add mimo-v2-flash to xiaomi preset

User confirmed mimo-v2-flash is a real model on the Xiaomi MiMo line
even though /v1/models on the coding-plan gateway doesn't currently
list it (likely tier-gated behind that endpoint). Added between
mimo-v2-pro and mimo-v2-omni so the picker reflects pro -> flash ->
omni from heaviest to lightest within the v2 generation.
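
The resulting preset row looks roughly like this — the struct shape, field names, and preset id are assumed for illustration, not microclaw's actual `PROVIDER_PRESETS` schema:

```rust
// Hypothetical shape of the Xiaomi MiMo preset row (schema assumed).
struct ProviderPreset {
    id: &'static str,
    base_url: &'static str,
    default_model: &'static str,
    models: &'static [&'static str],
}

const XIAOMI_MIMO: ProviderPreset = ProviderPreset {
    id: "xiaomi-mimo", // assumed id
    base_url: "https://api.xiaomimimo.com/v1",
    default_model: "mimo-v2.5-pro",
    // v2 ordering in the picker: pro -> flash -> omni, heaviest to lightest.
    models: &[
        "mimo-v2.5-pro",
        "mimo-v2.5",
        "mimo-v2-pro",
        "mimo-v2-flash",
        "mimo-v2-omni",
    ],
};

fn main() {
    // The default model must be one of the listed picker options.
    assert!(XIAOMI_MIMO.models.contains(&XIAOMI_MIMO.default_model));
    println!("{} -> {}", XIAOMI_MIMO.id, XIAOMI_MIMO.base_url);
}
```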

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(read_file): clamp offset to line count to avoid slice panic

When the agent passed an offset beyond the file's line count, the
slice `lines[offset..end]` panicked with "range start index N out of
range for slice of length M" because end was clamped to lines.len()
but offset was not — leaving offset > end. The panic surfaced from
inside a tokio worker, which is bad: a malformed tool input
shouldn't crash the runtime.

Clamp offset to lines.len() so an out-of-range offset yields an
empty (offset..offset) slice, and switch offset+limit to
saturating_add to be safe against pathological inputs. Added a
regression test that reproduces the original panic (offset=369 on a
3-line file).
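
A minimal sketch of the clamping logic — helper name and signature are assumed, not the actual read_file implementation:

```rust
// Sketch: clamp BOTH ends of the range so offset > lines.len() yields an
// empty (start..start) slice instead of panicking mid-tokio-worker.
fn read_lines(lines: &[&str], offset: usize, limit: usize) -> String {
    let start = offset.min(lines.len());
    // saturating_add guards against offset + limit overflowing usize.
    let end = offset.saturating_add(limit).min(lines.len());
    lines[start..end].join("\n")
}

fn main() {
    let lines = ["a", "b", "c"];
    // The regression case: offset=369 on a 3-line file now returns empty
    // output rather than "range start index out of range".
    assert_eq!(read_lines(&lines, 369, 10), "");
    assert_eq!(read_lines(&lines, 1, 2), "b\nc");
    println!("ok");
}
```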

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(reflector): downgrade benign no-op responses, log preview on real failures

When the reflector LLM declined to extract anything (returning empty,
"null", "[]", "{}", "none", or a short refusal), the parser fell
through all four JSON strategies and logged ERROR — flooding the log
with what is actually a benign "nothing to update" signal. Operators
saw repeated "parse failed for chat ...: no valid JSON found" errors
even though the runtime was behaving correctly.

Distinguish the two cases now:
- Empty / explicit-no-op shapes (length < 16, or matching the common
  refusal tokens) log at info — the model just had nothing to say.
- Anything else still logs at warn (downgraded from error) and
  includes a 200-char response preview, so when the prompt schema
  drifts or a provider misbehaves we can see the actual payload
  without rerunning with LLM debug streams enabled.

Added two regression tests covering both branches.
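
The classification can be sketched as follows — the 16-char threshold and token list mirror the description above, not necessarily the exact reflector source:

```rust
// Sketch of the benign-no-op classifier (threshold and tokens assumed
// from the commit description, not copied from the real code).
fn is_benign_noop(response: &str) -> bool {
    let t = response.trim().to_ascii_lowercase();
    // Short responses and explicit no-op shapes log at info;
    // everything else falls through to the warn-with-preview path.
    t.len() < 16 || matches!(t.as_str(), "null" | "[]" | "{}" | "none")
}

fn main() {
    assert!(is_benign_noop(""));
    assert!(is_benign_noop("[]"));
    assert!(is_benign_noop("I cannot help.")); // short refusal
    assert!(!is_benign_noop(
        r#"{"memory": "user prefers Rust", "confidence": 0.9}"#
    ));
    println!("ok");
}
```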

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(providers): align preset list with OpenClaw + sort A→Z

Add 17 missing OpenClaw-supported LLM providers and sort the entire
PROVIDER_PRESETS table alphabetically by id. The setup wizard's preset
picker and the generated provider matrix both mirror this order
verbatim, so users now see a predictable A→Z list.

New providers (all OpenAI-compatible):
- arcee, cerebras, cloudflare-ai-gateway, deepinfra, fireworks, groq,
  inferrs, kilocode, litellm, lmstudio, qianfan, sglang, stepfun,
  venice, vercel-ai-gateway, vllm, volcengine

Skipped:
- Pure-multimedia providers (azure-speech, comfy, deepgram, elevenlabs,
  fal, gradium, inworld, runway, senseaudio, vydra) — microclaw routes
  multimedia through `media.*` tools, not a separate provider concept.
- glm / zai — already covered by the existing `zhipu` preset (label is
  "Zhipu AI (GLM / Z.AI)").
- qwen — already covered by `aliyun-bailian` and `alibaba`.
- opencode / opencode-go — OpenClaw-internal catalogs.
- github-copilot — needs an OAuth/token-exchange flow that doesn't fit
  the simple preset shape.
- perplexity — a web-search plugin, not an LLM provider.
- bedrock-mantle — variant of bedrock; the existing `bedrock` entry
  already covers the OpenAI-compat surface.
- claude-max-api-proxy — community proxy; the existing `custom` preset
  is the right shape for any OpenAI-compat localhost endpoint.

Added a regression test enforcing the A→Z invariant so future additions
don't drift.
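
The invariant test reduces to a pairwise strict-ordering check over the preset ids; a sketch with placeholder ids (the real test would iterate `PROVIDER_PRESETS`):

```rust
// Sketch of the A→Z invariant: every adjacent pair of ids must be in
// strictly ascending order (strict also forbids duplicate ids).
fn is_sorted_strict(ids: &[&str]) -> bool {
    ids.windows(2).all(|w| w[0] < w[1])
}

fn main() {
    // Placeholder ids; the regression test checks the full preset table.
    let preset_ids = ["arcee", "cerebras", "groq", "vllm", "zhipu"];
    assert!(is_sorted_strict(&preset_ids));
    assert!(!is_sorted_strict(&["groq", "arcee"])); // out of order fails
    println!("ok");
}
```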

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): surface hermes-era features and new tools

Update Features and Tools sections to reflect what shipped on this
branch — per-chat USER.md user model, cross-channel voice, multimedia
suite, defensive web_fetch defaults, tool-result truncation +
fetch_artifact, skill lifecycle, plus the nine new built-in tools
(session_search, clarify, osv_check, insights, fetch_artifact,
generate_image, describe_image, text_to_speech, transcribe_audio).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:55:07 -07:00


# Generated Built-in Tools

This file is generated by scripts/generate_docs_artifacts.mjs. Do not edit manually.

Total built-in tools: 56

- a2a_list_peers
- a2a_send
- activate_skill
- bash
- browser
- calculate
- cancel_scheduled_task
- clarify
- compare_time
- describe_image
- edit_file
- export_chat
- fetch_artifact
- generate_image
- get_current_time
- get_task_history
- glob
- grep
- insights
- knowledge_graph_add
- knowledge_graph_query
- list_scheduled_task_dlq
- list_scheduled_tasks
- osv_check
- pause_scheduled_task
- read_file
- read_memory
- replay_scheduled_task_dlq
- resume_scheduled_task
- schedule_task
- send_message
- session_search
- sessions_spawn
- skill_manage
- structured_memory_delete
- structured_memory_search
- structured_memory_update
- subagents_focus
- subagents_focused
- subagents_info
- subagents_kill
- subagents_list
- subagents_log
- subagents_orchestrate
- subagents_retry_announces
- subagents_send
- subagents_unfocus
- sync_skills
- text_to_speech
- todo_read
- todo_write
- transcribe_audio
- web_fetch
- web_search
- write_file
- write_memory