Loop Engineering
Run /goal to start a long-horizon recursive goal that keeps inspecting, changing, testing, and reflecting.
Most agents, routers, and inference engines are designed as separate layers. The agent keeps sending generic chat traffic, while prefix-cache stability, route choice, serving behavior, and context pressure stay invisible to it. Inferoa brings those tokenmaxxing surfaces into the agent harness itself.
Stable prompt epochs and deterministic tool schemas protect reusable session prefixes.
Compression, graph-shaped repo context, bounded tool output, and route choice reduce token waste while preserving evidence.
vLLM Engine and Omni keep cache, latency, cost, and multimodal signals native to the harness.
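To see why deterministic tool schemas help protect a reusable prefix, here is a minimal sketch in Python. It is not Inferoa's implementation: the helper names (render_prefix, prefix_fingerprint) and the tool definitions are hypothetical, and it assumes only that a prefix cache can reuse work when the leading prompt text is byte-identical across turns.

```python
import hashlib
import json


def render_prefix(system_prompt: str, tools: list[dict]) -> str:
    """Render the system prompt and tool schemas in a canonical form.

    Hypothetical helper: tools are sorted by name and JSON keys are sorted,
    so the same tool set always produces byte-identical text.
    """
    canonical_tools = sorted(tools, key=lambda tool: tool["name"])
    tools_json = json.dumps(canonical_tools, sort_keys=True, separators=(",", ":"))
    return f"{system_prompt}\n{tools_json}"


def prefix_fingerprint(prefix: str) -> str:
    """Hash the rendered prefix so drift between turns is easy to detect."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


# Illustrative tool definitions: same tools, different registration and key order.
tools_a = [
    {"name": "read_file", "parameters": {"path": "string"}},
    {"name": "run_tests", "parameters": {"target": "string", "timeout_s": "integer"}},
]
tools_b = [
    {"parameters": {"timeout_s": "integer", "target": "string"}, "name": "run_tests"},
    {"parameters": {"path": "string"}, "name": "read_file"},
]

fp_a = prefix_fingerprint(render_prefix("You are a coding agent.", tools_a))
fp_b = prefix_fingerprint(render_prefix("You are a coding agent.", tools_b))
print(fp_a == fp_b)  # True: canonical rendering ignores ordering differences
```

Without the canonical ordering, the two tool lists would serialize differently, and an otherwise unchanged session could lose its shared prefix.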
Inferoa starts with coding because coding exposes long-horizon pressure clearly: large repos, changing goals, tool failures, repeated model calls, context limits, and proof through tests. The goal is to co-design the agent harness, goal loop, and inference stack so every turn spends context, cache, route choice, and serving capacity more deliberately.
One durable outcome expands through horizons, evidence, reflection, and completion reports.
Stable prompt epochs, bounded context, and fixed tool schemas keep long sessions warm.
vLLM SR chooses paths while vLLM Engine supplies high-throughput, memory-efficient serving.
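One way to picture bounded tool output that still preserves evidence is a head-and-tail truncation with an explicit omission marker. The sketch below is an assumption for illustration only: bound_tool_output and its parameters are hypothetical and are not part of Inferoa's documented interface.

```python
def bound_tool_output(text: str, max_lines: int = 40, head: int = 10) -> str:
    """Keep the first `head` lines and the last `max_lines - head` lines.

    Hypothetical helper: the middle of a long output is replaced with a
    marker that records how many lines were dropped, so the model can see
    that evidence was omitted rather than silently lost.
    """
    if not 0 < head < max_lines:
        raise ValueError("head must be positive and smaller than max_lines")
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # Short outputs pass through unchanged.
    tail = max_lines - head
    omitted = len(lines) - max_lines
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(lines[:head] + [marker] + lines[-tail:])


# Illustrative input: a 100-line test log bounded to 20 kept lines plus a marker.
log = "\n".join(f"line {i}" for i in range(100))
bounded = bound_tool_output(log, max_lines=20, head=5)
print(len(bounded.splitlines()))  # 21
```

Keeping the tail matters for test runs in particular, since failure summaries usually appear at the end of the output.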
A restrained entry point for the configured model, workspace, and core commands.

Run /goal to start a long-horizon recursive goal with horizons, evidence, and reflection.

Ambiguous scope becomes an inspectable plan before execution starts.

Benchmark runs, failures, fixes, and metrics stay in one research loop.


High-performance serving is the base. Inferoa treats prefix-cache stability and endpoint signals as agent state.

Routing belongs in the loop. Cost, safety, privacy, capability, and session pressure can choose the model path.
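As a rough sketch of how such signals could choose a model path, the Python below filters routes by hard constraints (privacy, capability, context size) and then picks the cheapest one. Everything in it is assumed for illustration: the Route and TurnSignals types, the route names, and the numbers are hypothetical and do not describe vLLM SR or Inferoa's actual routing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    cost_per_1k_tokens: float
    max_context_tokens: int
    on_premises: bool
    capability: int  # Higher means more capable (illustrative scale).


@dataclass(frozen=True)
class TurnSignals:
    context_tokens: int          # Session pressure for this turn.
    contains_private_data: bool  # Privacy constraint.
    required_capability: int     # Minimum capability the task needs.


def choose_route(routes: list[Route], signals: TurnSignals) -> Route:
    """Return the cheapest route that satisfies every hard constraint."""
    eligible = [
        route for route in routes
        if route.max_context_tokens >= signals.context_tokens
        and route.capability >= signals.required_capability
        and (route.on_premises or not signals.contains_private_data)
    ]
    if not eligible:
        raise ValueError("no route satisfies the constraints for this turn")
    return min(eligible, key=lambda route: route.cost_per_1k_tokens)


# Illustrative routes; names and prices are made up.
ROUTES = [
    Route("small-local", 0.1, 32_000, True, 1),
    Route("large-local", 0.6, 128_000, True, 3),
    Route("large-hosted", 0.4, 200_000, False, 3),
]

choice = choose_route(ROUTES, TurnSignals(10_000, True, 3))
print(choice.name)  # large-local: private data rules out the hosted route
```

Treating privacy and capability as filters rather than weights keeps the policy easy to audit. A weighted score would be a reasonable alternative when the constraints are soft.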

Multimodal work stays native. Image, video, and audio understanding or generation live in the same durable session.