Providers

Ollama

Run models locally with Ollama — no API key, no cloud dependency.

Setup

Install Ollama, then pull a model with tool-calling support:

ollama pull qwen2.5-coder:32b
ollama serve

Add the Vercel AI SDK provider for Ollama:

pnpm add ollama-ai-provider-v2

Then configure the provider and pass it to noumen:

import { AiSdkProvider } from "noumen";
import { createOllama } from "ollama-ai-provider-v2";

const ollama = createOllama({
  // Point at a remote Ollama server by overriding baseURL:
  // baseURL: "http://192.168.1.10:11434/api",
});

const provider = new AiSdkProvider({
  model: ollama("qwen2.5-coder:32b"),
});

Options

All connection-level options come from createOllama.

Option     Source         Description
baseURL    createOllama   Ollama server URL. Defaults to http://localhost:11434/api.
headers    createOllama   Custom headers sent with every request.
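
For example, to target a remote server and attach a custom header on every request (the header name below is illustrative, not something Ollama or noumen requires):

import { createOllama } from "ollama-ai-provider-v2";

const ollama = createOllama({
  // Remote Ollama server; note the /api suffix expected by the provider.
  baseURL: "http://192.168.1.10:11434/api",
  // Custom headers are sent with every request.
  headers: {
    "X-Request-Source": "noumen",
  },
});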

Environment variables

Variable      Description
OLLAMA_HOST   Override the Ollama server address (e.g. http://192.168.1.10:11434). The CLI shorthand (provider: "ollama") reads this to derive baseURL.

When OLLAMA_HOST is set, the CLI auto-detects Ollama as the provider.
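
The derivation is roughly equivalent to the sketch below. This is an illustrative approximation, not noumen's actual CLI code; the /api suffix follows the default baseURL shown in the options table above.

import { createOllama } from "ollama-ai-provider-v2";

// Hypothetical sketch: fall back to the local default, strip any trailing
// slash, and append the /api suffix the provider expects.
const host = process.env.OLLAMA_HOST ?? "http://localhost:11434";
const ollama = createOllama({
  baseURL: `${host.replace(/\/$/, "")}/api`,
});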

CLI usage

The CLI auto-detects a running Ollama server when no cloud API keys are configured:

# Ollama is detected automatically
noumen "refactor this module"

# Or be explicit
noumen -p ollama -m llama3.1:70b "explain this code"

Ollama models must support tool calling (function calling) for the full agent loop. Recommended models:

  • qwen2.5-coder:32b — strong coding and tool-calling (default).
  • qwen2.5-coder:14b — lighter alternative with good tool-calling.
  • llama3.1:70b — general-purpose with function calling.
  • llama3.1:8b — fast local option for simpler tasks.
  • mistral:7b — compact with tool support.
  • command-r:35b — Cohere's model with native tool use.

Streaming

ollama-ai-provider-v2 speaks LanguageModelV2 directly — no OpenAI-compat layer, no HTTP-level translation in noumen. Streaming, tool calls, and structured output work identically to any other AI SDK provider.
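
Because the model plugs into the AI SDK like any other provider, you can also use it directly with the AI SDK's streamText, outside of noumen's agent loop. A minimal sketch, with a placeholder prompt:

import { streamText } from "ai";
import { createOllama } from "ollama-ai-provider-v2";

const ollama = createOllama();

// Stream tokens from a local model as they are generated.
const result = streamText({
  model: ollama("qwen2.5-coder:32b"),
  prompt: "Explain what a Promise is in one paragraph.",
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}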