Inference-native Tokenmaxxing Agent Harness for Loop Engineering

The mismatch

The Gap

Most agents, routers, and inference engines are designed as separate layers. The agent keeps sending generic chat traffic, while prefix cache stability, route choice, serving behavior, and context pressure stay invisible to it. Inferoa brings those tokenmaxxing surfaces into the agent harness itself.

Loop engineering surfaces

The Goal Loop

01

Loop Engineering

Run /goal to start a long-horizon recursive goal that keeps inspecting, changing, testing, and reflecting.

02

Tokenmaxxing discipline

Stable prompt epochs and deterministic tool schemas protect reusable session prefixes.

03

Context + routing

Compression, graph-shaped repo context, bounded tool output, and route choice reduce token waste while preserving evidence.

04

Inference-native serving

vLLM Engine and Omni keep cache, latency, cost, and multimodal signals native to the harness.
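
To make the "deterministic tool schemas" idea above concrete, here is a minimal sketch of one way a harness could keep its prompt prefix byte-stable across turns. It is not Inferoa's implementation: the function names `canonical_tool_schemas` and `prefix_fingerprint`, and the dictionary shape of a tool, are assumptions made for illustration. The point is only that sorting tools and keys before serializing yields the same bytes every turn, which is what a prefix cache needs in order to be reused.

```python
import hashlib
import json


def canonical_tool_schemas(tools):
    """Serialize tool schemas to a byte-stable string.

    Hypothetical helper: tools are sorted by name and every object's keys
    are sorted, so the same set of tools always produces identical text
    regardless of registration order or dict insertion order.
    """
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


def prefix_fingerprint(system_prompt, tools):
    """Hash the reusable prefix (system prompt plus canonical tool schemas).

    Hypothetical helper: if the fingerprint changes between turns, the
    prefix changed and any cached prefix computation is likely lost.
    """
    payload = system_prompt + "\n" + canonical_tool_schemas(tools)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A harness could compare `prefix_fingerprint` before and after each turn and treat any change as the start of a new prompt epoch, rather than letting small reorderings silently invalidate the cache.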

Mission

Engineer the loop, not the next prompt.

Inferoa starts with coding because coding exposes long-horizon pressure clearly: large repos, changing goals, tool failures, repeated model calls, context limits, and proof through tests. The goal is to co-design the agent harness, goal loop, and inference stack so every turn spends context, cache, route choice, and serving capacity more deliberately.

01 Goal mode drives the loop

One durable outcome expands through horizons, evidence, reflection, and completion reports.

02 Tokenmaxxing keeps it stable

Stable prompt epochs, bounded context, and fixed tool schemas keep long sessions warm.

03 Routing and serving stay native

vLLM SR (Semantic Router) chooses paths, while vLLM Engine supplies high-throughput, memory-efficient serving.
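
As an illustration of the "bounded context" point above, the sketch below caps a tool's output at a fixed character budget while keeping both the beginning and the end, on the assumption that command output tends to carry its most useful evidence there (the invocation at the top, the failure summary at the bottom). The function name `bound_tool_output` and its parameters are hypothetical and not part of Inferoa.

```python
def bound_tool_output(text, max_chars=4000):
    """Cap tool output at roughly max_chars, keeping the head and the tail.

    Hypothetical helper: the middle of an over-long output is replaced by
    a marker stating how many characters were dropped, so the model can
    see that the output was truncated and by how much.
    """
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    if len(text) <= max_chars:
        return text

    head_len = max_chars // 2
    tail_len = max_chars - head_len
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters omitted ...]\n"
    return text[:head_len] + marker + text[-tail_len:]
```

Budgeting in characters is a simplification; a real harness would more likely count tokens with the serving model's tokenizer.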

Quick Look

Inside a Session

01

Welcome

A restrained entry point for the configured model, workspace, and core commands.

[Demo: Inferoa Welcome session]

02

Goal Mode

Run /goal to start a long-horizon recursive goal with horizons, evidence, and reflection.

[Demo: Inferoa Goal Mode session]

03

Plan Mode

Ambiguous scope becomes an inspectable plan before execution starts.

[Demo: Inferoa Plan Mode session]

04

Autoresearch

Benchmark runs, failures, fixes, and metrics stay in one research loop.

[Demo: Inferoa Autoresearch session]

Cross-stack path

Across the Tokenmaxxing Stack

  1. Goal Loop: recursive horizons + reflection
  2. Agent Harness: sessions, tools, evidence
  3. Tokenmaxxing: prefix, context, routing
  4. vLLM Serving: Engine + Omni