Extended Thinking

Enable model reasoning ("thinking") tokens with a configurable token budget across providers.

Extended thinking lets models "think" before responding, producing internal reasoning tokens that can improve answer quality on complex tasks. noumen provides a single configuration that maps to each provider's native thinking support.

Configuration

import { Agent } from "noumen";
import { LocalSandbox } from "noumen/local";

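const code = new Agent({
  provider, // a provider instance (Anthropic, OpenAI, or Gemini) created elsewhere
  sandbox: LocalSandbox({ cwd: "/my/project" }),
  options: {
    // Enable extended thinking with a reasoning budget of 10,000 tokens
    thinking: { type: "enabled", budgetTokens: 10000 },
  },
});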

ThinkingConfig

type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };
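Because ThinkingConfig is a discriminated union on `type`, TypeScript narrows it automatically once you check that field. The sketch below assumes you want to derive the config from an optional budget, for example one parsed from a CLI flag; `thinkingFromBudget` and `describeThinking` are hypothetical helpers, not part of noumen, and treating a missing or non-positive budget as "disabled" is an assumption.

```typescript
// Mirrors the ThinkingConfig union documented above.
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

// Hypothetical helper (not part of noumen): build a ThinkingConfig from an
// optional budget. Assumption: undefined or <= 0 means "thinking off".
function thinkingFromBudget(budget?: number): ThinkingConfig {
  if (budget === undefined || budget <= 0) {
    return { type: "disabled" };
  }
  if (!Number.isInteger(budget)) {
    throw new RangeError(`budgetTokens must be an integer, got ${budget}`);
  }
  return { type: "enabled", budgetTokens: budget };
}

// Checking `type` narrows the union, so budgetTokens is only readable
// in the "enabled" branch.
function describeThinking(config: ThinkingConfig): string {
  if (config.type === "enabled") {
    return `thinking on, budget ${config.budgetTokens} tokens`;
  }
  return "thinking off";
}

console.log(describeThinking(thinkingFromBudget(10000))); // thinking on, budget 10000 tokens
console.log(describeThinking(thinkingFromBudget()));      // thinking off
```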

Provider behavior

| Provider | Implementation |
|---|---|
| Anthropic | Maps to thinking.type = "enabled" with budget_tokens. Requires a model that supports extended thinking (e.g., Claude 3.7 Sonnet or later). |
| OpenAI | Maps to reasoning_effort for o-series models, or max_completion_tokens for others. |
| Gemini | Maps to thinkingConfig.thinkingBudget in the Google GenAI SDK. |
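To make the Anthropic and Gemini mappings concrete, here is a minimal sketch that turns a ThinkingConfig into request fragments. The field names (`thinking.budget_tokens`, `thinkingConfig.thinkingBudget`) come from the provider behavior listed above; the helper functions and interfaces are hypothetical, and representing "disabled" by omitting the field is an assumption. OpenAI is left out because this page does not say how a token budget is translated into reasoning_effort.

```typescript
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

// Hypothetical request fragments; only the thinking-related fields are shown.
interface AnthropicThinkingParams {
  thinking?: { type: "enabled"; budget_tokens: number };
}
interface GeminiThinkingParams {
  thinkingConfig?: { thinkingBudget: number };
}

// Assumption: "disabled" is expressed by leaving the field out entirely.
function toAnthropicParams(config: ThinkingConfig): AnthropicThinkingParams {
  if (config.type === "disabled") return {};
  return { thinking: { type: "enabled", budget_tokens: config.budgetTokens } };
}

function toGeminiConfig(config: ThinkingConfig): GeminiThinkingParams {
  if (config.type === "disabled") return {};
  return { thinkingConfig: { thinkingBudget: config.budgetTokens } };
}

const config: ThinkingConfig = { type: "enabled", budgetTokens: 10000 };
console.log(JSON.stringify(toAnthropicParams(config)));
// {"thinking":{"type":"enabled","budget_tokens":10000}}
console.log(JSON.stringify(toGeminiConfig(config)));
// {"thinkingConfig":{"thinkingBudget":10000}}
```

Per Anthropic's API documentation, budget_tokens must be at least 1024 and less than max_tokens, so very small budgets may be rejected by that provider.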

Stream events

Thinking content is streamed as thinking_delta events, separate from the text_delta events that carry the response:

// `thread` is assumed to be a conversation thread created from the agent configured above
for await (const event of thread.run("...")) {
  if (event.type === "thinking_delta") {
    process.stdout.write(event.text); // thinking (reasoning) tokens
  }
  if (event.type === "text_delta") {
    process.stdout.write(event.text); // final response tokens
  }
}
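If you want to keep reasoning and the final answer apart (for example, to log thinking but display only the answer), you can collect the two event types into separate buffers. This sketch assumes events carry only the `type` and `text` fields used in the loop above (real streams may include other event types), and uses a hypothetical `fakeRun` generator in place of `thread.run()` so it runs without a provider.

```typescript
// Assumed event shape, limited to the two event types documented above.
type StreamEvent =
  | { type: "thinking_delta"; text: string }
  | { type: "text_delta"; text: string };

// Hypothetical stand-in for thread.run(): yields a few fixed events.
async function* fakeRun(): AsyncGenerator<StreamEvent> {
  yield { type: "thinking_delta", text: "The user wants a sum. " };
  yield { type: "thinking_delta", text: "2 + 2 = 4." };
  yield { type: "text_delta", text: "The answer " };
  yield { type: "text_delta", text: "is 4." };
}

// Accumulate reasoning and answer text into separate strings
// instead of interleaving them on stdout.
async function collect(
  events: AsyncIterable<StreamEvent>,
): Promise<{ thinking: string; answer: string }> {
  let thinking = "";
  let answer = "";
  for await (const event of events) {
    if (event.type === "thinking_delta") {
      thinking += event.text;
    } else if (event.type === "text_delta") {
      answer += event.text;
    }
  }
  return { thinking, answer };
}

collect(fakeRun()).then(({ thinking, answer }) => {
  console.log(`[thinking] ${thinking}`); // [thinking] The user wants a sum. 2 + 2 = 4.
  console.log(`[answer] ${answer}`);     // [answer] The answer is 4.
});
```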

Retry interaction

When thinking is enabled and the retry engine adjusts max_tokens after a context overflow, the thinking budget serves as a floor: the adjusted max_tokens is never set below budgetTokens, so the model keeps enough room to reason.
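A minimal sketch of that floor, assuming the adjusted value is simply the space left in the context window. `adjustMaxTokens` and its parameters are hypothetical and do not reflect noumen's actual retry code.

```typescript
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

// Hypothetical retry helper: after a context overflow, shrink max_tokens to
// what fits in the remaining context window...
function adjustMaxTokens(
  contextWindow: number,
  inputTokens: number,
  thinking: ThinkingConfig,
): number {
  const available = Math.max(0, contextWindow - inputTokens);
  // ...but when thinking is enabled, never go below the thinking budget.
  if (thinking.type === "enabled") {
    return Math.max(available, thinking.budgetTokens);
  }
  return available;
}

const thinking: ThinkingConfig = { type: "enabled", budgetTokens: 10000 };
console.log(adjustMaxTokens(200000, 195000, thinking)); // 10000 (floor applies)
console.log(adjustMaxTokens(200000, 150000, thinking)); // 50000 (fits without the floor)
```

When the floor wins, the request may still exceed the context window; how noumen handles that case is not described on this page.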