noumen — coding agents are products. This is the library.

Enable model reasoning (thinking) tokens with a configurable budget across providers.

Extended thinking lets models "think" before responding, producing internal reasoning tokens that improve quality on complex tasks. noumen provides a unified configuration that maps to each provider's native thinking support.

Configuration

import { Agent } from "noumen";
import { LocalSandbox } from "noumen/local";

const code = new Agent({
  provider, // a provider instance configured elsewhere
  sandbox: LocalSandbox({ cwd: "/my/project" }),
  options: {
    // Allow up to 10,000 thinking tokens
    thinking: { type: "enabled", budgetTokens: 10000 },
  },
});

ThinkingConfig

type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };
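
Because `ThinkingConfig` is a discriminated union on `type`, TypeScript only allows access to `budgetTokens` after narrowing to the `"enabled"` variant. The sketch below illustrates this with two helpers, `thinkingFromBudget` and `describeThinking`. Both are hypothetical names made up for this example and are not part of noumen. The type is redeclared locally so the snippet stands on its own.

```typescript
// Local copy of the type shown above so the sketch is self-contained.
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

// Hypothetical helper (not a noumen API): build a config from an optional budget.
// A missing or non-positive budget is treated as "disabled" (an assumption).
function thinkingFromBudget(budgetTokens?: number): ThinkingConfig {
  if (budgetTokens === undefined || budgetTokens <= 0) {
    return { type: "disabled" };
  }
  return { type: "enabled", budgetTokens: Math.floor(budgetTokens) };
}

// Hypothetical helper: narrowing on `type` makes `budgetTokens` safe to read.
function describeThinking(config: ThinkingConfig): string {
  switch (config.type) {
    case "enabled":
      return `thinking enabled (budget: ${config.budgetTokens} tokens)`;
    case "disabled":
      return "thinking disabled";
  }
}
```

The result of `thinkingFromBudget(...)` could then be passed as the `thinking` option shown in the Configuration section.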

Provider behavior

| Provider | Implementation |
| --- | --- |
| Anthropic | Maps to `thinking.type = "enabled"` with `budget_tokens`. Requires a model that supports extended thinking (e.g., Claude with thinking). |
| OpenAI | Maps to `reasoning_effort` for o-series models, or `max_completion_tokens` for others. |
| Gemini | Maps to `thinkingConfig.thinkingBudget` in the Google GenAI SDK. |
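
To make the mapping concrete, here is a rough sketch of how a unified config might be translated into provider-specific request fields. It is not noumen's actual implementation: `toProviderParams`, `effortForBudget`, and the budget thresholds used to pick a `reasoning_effort` level are all assumptions for illustration. Only the o-series path is shown for OpenAI; the `max_completion_tokens` path is left out.

```typescript
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

type ProviderName = "anthropic" | "openai" | "gemini";

// Assumed thresholds: the source does not say how a budget becomes an effort level.
function effortForBudget(budgetTokens: number): "low" | "medium" | "high" {
  if (budgetTokens < 4000) return "low";
  if (budgetTokens < 16000) return "medium";
  return "high";
}

// Hypothetical translation from the unified config to provider request fields.
function toProviderParams(
  provider: ProviderName,
  thinking: ThinkingConfig,
): Record<string, unknown> {
  // With thinking disabled, this sketch adds no extra fields.
  if (thinking.type === "disabled") return {};

  switch (provider) {
    case "anthropic":
      return { thinking: { type: "enabled", budget_tokens: thinking.budgetTokens } };
    case "openai":
      // o-series models only in this sketch.
      return { reasoning_effort: effortForBudget(thinking.budgetTokens) };
    case "gemini":
      return { thinkingConfig: { thinkingBudget: thinking.budgetTokens } };
  }
}
```

Keeping the translation in one function means callers only ever deal with `ThinkingConfig`, which is the point of a unified configuration.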

Stream events

Thinking content is streamed as `thinking_delta` events, separate from the `text_delta` events that carry the response:

// `thread` is an existing noumen thread
for await (const event of thread.run("...")) {
  if (event.type === "thinking_delta") {
    process.stdout.write(event.text); // thinking tokens
  } else if (event.type === "text_delta") {
    process.stdout.write(event.text); // response tokens
  }
}
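
If the thinking and response text need to be kept apart, for example to log the reasoning but show only the answer, the two event types can be accumulated into separate buffers. The sketch below assumes events carry only `type` and `text` as in the loop above. `collectStream` and `fakeRun` are hypothetical names, and `fakeRun` is a stand-in for `thread.run(...)` so the snippet runs without noumen. A real stream may emit other event types, which this sketch does not model.

```typescript
// Minimal event shape assumed from the loop above.
type StreamEvent =
  | { type: "thinking_delta"; text: string }
  | { type: "text_delta"; text: string };

// Hypothetical helper: gather thinking and response text into separate strings.
async function collectStream(
  events: AsyncIterable<StreamEvent>,
): Promise<{ thinking: string; text: string }> {
  let thinking = "";
  let text = "";
  for await (const event of events) {
    if (event.type === "thinking_delta") {
      thinking += event.text;
    } else if (event.type === "text_delta") {
      text += event.text;
    }
  }
  return { thinking, text };
}

// Stand-in for thread.run(...): yields a fixed sequence of events.
async function* fakeRun(): AsyncGenerator<StreamEvent> {
  yield { type: "thinking_delta", text: "Compare both options. " };
  yield { type: "text_delta", text: "Use option A." };
}
```

Calling `await collectStream(fakeRun())` resolves to an object with the reasoning in `thinking` and the answer in `text`.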

Retry interaction

When thinking is enabled and the retry engine adjusts `max_tokens` after a context overflow, the thinking budget is kept as a floor, so the model still has room to reason.
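
One way to read this rule is as a simple clamp: whatever value the retry engine proposes for `max_tokens`, the result is never lower than the thinking budget. The sketch below shows that reading only. `clampMaxTokens` is a hypothetical name, and the real retry engine may apply extra margins or provider-specific limits that are not shown here.

```typescript
type ThinkingConfig =
  | { type: "enabled"; budgetTokens: number }
  | { type: "disabled" };

// Hypothetical helper: apply the thinking budget as a lower bound on max_tokens.
function clampMaxTokens(
  proposedMaxTokens: number,
  thinking: ThinkingConfig,
): number {
  // Without thinking there is no floor to enforce.
  if (thinking.type === "disabled") return proposedMaxTokens;
  // Assumed rule: never drop below the configured thinking budget.
  return Math.max(proposedMaxTokens, thinking.budgetTokens);
}
```

With a budget of 10000, a proposed value of 4000 would be raised to 10000, while a proposed value of 20000 would pass through unchanged.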
