<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Inferoa Blog</title>
        <link>https://inferoa.agentic-in.ai/blog</link>
        <description>Inferoa Blog</description>
        <lastBuildDate>Mon, 08 Jun 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Inferoa: Inference-native Tokenmaxxing Agent Harness for Loop Engineering]]></title>
            <link>https://inferoa.agentic-in.ai/blog/announcing-inferoa</link>
            <guid>https://inferoa.agentic-in.ai/blog/announcing-inferoa</guid>
            <pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Inferoa is an Inference-native Tokenmaxxing Agent Harness for Loop Engineering. It covers recursive long-horizon goals, context optimization, routing, and high-throughput model serving.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="Inferoa: Inference-native Tokenmaxxing Agent Harness for Loop Engineering" src="https://inferoa.agentic-in.ai/assets/images/inferoa-banner-fa9fa47c10be3a18e8247a81c60ad18e.png" width="1672" height="941" class="img_ev3q"></p>
<p>Most agents call models as if inference were a black box.</p>
<p>The agent loop lives in one place, routing policy in another, serving behavior
somewhere else, and context management becomes a last-minute fight with the
context window. That split is tolerable for one-turn chat. It breaks down when agents
run for hours, recover from failures, compress context, warm the prefix cache, route
between model paths, and still need to prove the work at the end.</p>
<blockquote>
<p>Prefix cache stability is ignored. Routing is bolted on later. Context is
pasted until it fits. Users pay for that gap.</p>
</blockquote>
<p>Inferoa = <strong>Infer</strong> (Inference-native) + <strong>o</strong> (Tokenmaxxing Loop
Engineering) + <strong>a</strong> (Agent Harness).</p>
<p>Inferoa is an <strong>Inference-native Tokenmaxxing Agent Harness for Loop
Engineering</strong>. It is built for <strong>recursive long-horizon goals</strong>: define the
outcome once, then the agent loop keeps inspecting, changing, testing,
reflecting, and continuing until the work is proven.</p>
<p>That is what <strong>inference-native</strong> means here: Inferoa starts from the inference
stack and co-designs loop engineering around <strong>tokenmaxxing</strong>. In practice that covers
<strong>prefix-cache discipline</strong>; <strong>context optimization</strong> with
<a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> and
<a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a>;
<strong>intelligent routing</strong> through
<a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class="">vLLM Semantic Router</a>;
<strong>high-throughput serving</strong> with
<a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class="">vLLM Engine</a>; multimodal capability through <strong>vLLM Omni</strong>;
and native <strong>goal</strong>, <strong>plan</strong>, and <strong>autoresearch</strong> loops with
<strong>tokenmaxxing observability</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa welcome session" src="https://inferoa.agentic-in.ai/assets/images/welcome-c6cfc1ba62eccb15647a4a5c59316e95.gif" width="1864" height="1080" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-breaks">What Breaks</h2>
<p>Long-horizon agents are not one prompt. They are many turns of planning, repo
inspection, shell commands, edits, retries, compaction, cache warmup, route
selection, and verification. If the harness treats every turn as generic chat
traffic, it throws away the optimization surface underneath it.</p>
<p><img decoding="async" loading="lazy" alt="What breaks when long-horizon agents treat inference as a black box" src="https://inferoa.agentic-in.ai/assets/images/inferoa-what-breaks-2bf3415b4d15e8ec724d3188fee2f2df.png" width="1672" height="941" class="img_ev3q"></p>
<p>The failure modes are familiar:</p>
<ul>
<li class="">prompt shape drifts, so prefix cache cannot be reused reliably;</li>
<li class="">context selection becomes "paste more" instead of "select better";</li>
<li class="">turns that are cheap, privacy-sensitive, or purely mechanical still take expensive model paths;</li>
<li class="">compression preserves a summary but loses continuity;</li>
<li class="">multimodal work becomes a disconnected side call;</li>
<li class="">serving and cache signals arrive too late to shape the next action.</li>
</ul>
<p>Inferoa treats those as harness design problems, not analytics problems.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-changes">What Changes</h2>
<p>Inferoa makes inference behavior visible to the agent loop. The point is not to
add another dashboard. The point is to let the runtime choose better prompts,
better context, better routes, and better recovery behavior while the task is
still running.</p>
<p><img decoding="async" loading="lazy" alt="What changes when inference signals become native to the agent loop" src="https://inferoa.agentic-in.ai/assets/images/inferoa-what-changes-d06084e1b0d64855136724ff9dd79d1a.png" width="1672" height="941" class="img_ev3q"></p>
<table><thead><tr><th>Surface</th><th>Substrate</th><th>What Inferoa Makes Native</th><th>Why It Matters</th></tr></thead><tbody><tr><td>Loop Engineering</td><td><a href="https://inferoa.agentic-in.ai/docs/workflows/goal-mode" target="_blank" rel="noopener noreferrer" class="">Inferoa Goal Mode</a></td><td>Recursive long-horizon goals, horizons, candidate work, reflection, and completion evidence</td><td>The engineering loop keeps running until the work is proven</td></tr><tr><td>Agent Harness</td><td><a href="https://github.com/agentic-in/inferoa" target="_blank" rel="noopener noreferrer" class="">Inferoa</a></td><td>Sessions, tools, plans, autoresearch, resources, recovery, and prefix-cache discipline</td><td>Long work gets a durable runtime while preserving reusable prompt prefixes</td></tr><tr><td>Context Optimization</td><td><a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a>, <a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a></td><td>Compression, graph-shaped repo context, bounded tool output, and evidence selection</td><td>The model sees evidence, not raw sprawl</td></tr><tr><td>Intelligent Routing</td><td><a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class="">vLLM Semantic Router</a></td><td>Model paths respond to cost, safety, privacy, capability, and session pressure</td><td>Turns can route between self-hosted vLLM models and external frontier models</td></tr><tr><td>Model Serving</td><td><a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class="">vLLM Engine</a>, <a href="https://github.com/vllm-project/vllm-omni" target="_blank" rel="noopener noreferrer" class="">vLLM Omni</a></td><td>High-throughput, memory-efficient serving and multimodal endpoints stay visible to the harness</td><td>Self-hosted paths make cost, safety, privacy, and data 
sovereignty controllable when an external frontier model is unnecessary</td></tr></tbody></table>
<p>This is the core design: the agent is not merely calling an inference system.
It is shaped by it.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="goal-mode-loop-engineering-for-long-horizon-work">Goal Mode: Loop Engineering For Long-Horizon Work</h2>
<p>Prompt engineering improves the next answer. Loop engineering designs the
system that keeps deciding what to do after that answer. In Inferoa, <code>/goal</code> is
the entry point: it starts a recursive long-horizon goal, expands work through
horizons, preserves evidence, and requires reflection before completion.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa goal mode" src="https://inferoa.agentic-in.ai/assets/images/goal-e1796eba6011e766acc88c7dc5e006d3.gif" width="1860" height="1080" class="img_ev3q"></p>
<p>Goal Mode is deliberately not just a persistent note in the prompt. It gives
the harness a durable outcome, a visible Horizon 0 orientation, a strategy,
candidate work, step status, verification evidence, and a completion report.
That is the difference between asking an agent for the next step and engineering
the loop that keeps taking the next step.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="inferoa-at-a-glance">Inferoa At A Glance</h2>
<p>Inferoa is a terminal-first harness, but the product surface is not just a
shell. It makes long-horizon state visible while the agent works.</p>
<p>Run <code>/goal</code> to start a recursive long-horizon goal. The agent can decompose
work, update steps, attach evidence, reflect between horizons, and avoid
mistaking an empty checklist for a finished goal.</p>
<p>Plan mode turns ambiguous scope into an inspectable decision. A plan can stay in
drafting, move to approval, or become executable context, and none of those
states causes a hard runtime failure.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa plan mode" src="https://inferoa.agentic-in.ai/assets/images/plan-7ffc3bc78df51fb34e073c06ff4fec05.gif" width="1884" height="1080" class="img_ev3q"></p>
<p>Autoresearch mode makes the evaluation loop native: define the experiment, run
the harness, record failures, patch the implementation, and keep the metric
trail inside the same session.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa autoresearch iteration" src="https://inferoa.agentic-in.ai/assets/images/research-a235db555c704536d347c73b71fe19e4.gif" width="1864" height="1080" class="img_ev3q"></p>
<p>The tokenmaxxing report is the savings ledger for prefix-cache reuse, context optimization,
<a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> tool-output savings, recent turn usage, and
model-selection pressure. This is the place to see whether the harness is
actually saving tokens during the session, not just reporting token usage after the
fact.</p>
<p><img decoding="async" loading="lazy" alt="Inferoa tokenmaxxing report" src="https://inferoa.agentic-in.ai/assets/images/tokenmaxxing-61155fb440155901f7e98a9294e9f329.png" width="3840" height="2100" class="img_ev3q"></p>
<p>The core command surface stays small: <code>/goal</code> for durable objectives, <code>/plan</code>
for inspectable scope, <code>/autoresearch</code> for metric-driven iteration, and
<code>/tokenmaxxing</code> for the savings ledger across prefix cache,
<a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a> and
<a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> context savings, recent turn usage, and
model-selection cost pressure.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="proof-of-value">Proof Of Value</h2>
<p>The value story is not one benchmark score. It is whether the tokenmaxxing path
stays stable, measurable, and cheaper as the horizon grows. The public eval is
deliberately split into measured stress runs and calibrated projections: measured
runs check runtime invariants and continuity; projections ask what happens if the
measured shape is carried to 1k-10k loops.</p>
<p>Key results:</p>
<ul>
<li class=""><strong>Prefix cache and continuity</strong>: measured profiles kept <strong>one prompt epoch,
one tool schema hash, and one cache salt</strong> while cache reuse improved after
warmup. A <strong>256-turn compression regression</strong> preserved continuity markers and
archive pointers. The 1k-10k projections were calibrated from the measured tail
slope; they are not claimed as live 10k-request runs.</li>
<li class=""><strong>CodeGraph context reduction</strong>:
<a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class="">CodeGraph</a>-style
symbol/range selection saved <strong>80.8%</strong> of inspected context.</li>
<li class=""><strong>RTK tool-output reduction</strong>: <a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class="">RTK</a> command
records saved <strong>61.4%</strong> of the command-token footprint.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Inferoa tokenmaxxing surfaces" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI5ODAiIGhlaWdodD0iMzgwIiB2aWV3Qm94PSIwIDAgOTgwIDM4MCIgcm9sZT0iaW1nIiBhcmlhLWxhYmVsbGVkYnk9InRpdGxlIGRlc2MiPgogIDx0aXRsZSBpZD0idGl0bGUiPlRva2VubWF4eGluZyByZWR1Y2VzIHRva2VuIHByZXNzdXJlPC90aXRsZT4KICA8ZGVzYyBpZD0iZGVzYyI+SG9yaXpvbnRhbCBiYXIgY2hhcnQgc2hvd2luZyBwcmVmaXggY2FjaGUsIENvZGVHcmFwaCwgYW5kIFJUSyB0b2tlbiBvciBjb250ZXh0IHJlZHVjdGlvbnMuPC9kZXNjPgogIDxyZWN0IHdpZHRoPSI5ODAiIGhlaWdodD0iMzgwIiBmaWxsPSIjZmZmZmZmIi8+CiAgPHRleHQgeD0iMjgiIHk9IjM4IiBmb250LWZhbWlseT0iRGVqYVZ1IFNhbnMsIEFyaWFsLCBzYW5zLXNlcmlmIiBmb250LXNpemU9IjIxIiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij5Ub2tlbm1heHhpbmcgcmVkdWNlcyB0b2tlbiBwcmVzc3VyZTwvdGV4dD4KICA8ZyBmb250LWZhbWlseT0iRGVqYVZ1IFNhbnMsIEFyaWFsLCBzYW5zLXNlcmlmIj48bGluZSB4MT0iMjg2IiB5MT0iNzgiIHgyPSIyODYiIHkyPSIzMDgiIHN0cm9rZT0iI2U4ZWVmMiIvPjx0ZXh0IHg9IjI4NiIgeT0iMzMwIiB0ZXh0LWFuY2hvcj0ibWlkZGxlIiBmb250LXNpemU9IjEyIiBmaWxsPSIjNjA3ZDhiIj4wJTwvdGV4dD4KPGxpbmUgeDE9IjQzOSIgeTE9Ijc4IiB4Mj0iNDM5IiB5Mj0iMzA4IiBzdHJva2U9IiNlOGVlZjIiLz48dGV4dCB4PSI0MzkiIHk9IjMzMCIgdGV4dC1hbmNob3I9Im1pZGRsZSIgZm9udC1zaXplPSIxMiIgZmlsbD0iIzYwN2Q4YiI+MjUlPC90ZXh0Pgo8bGluZSB4MT0iNTkyIiB5MT0iNzgiIHgyPSI1OTIiIHkyPSIzMDgiIHN0cm9rZT0iI2U4ZWVmMiIvPjx0ZXh0IHg9IjU5MiIgeT0iMzMwIiB0ZXh0LWFuY2hvcj0ibWlkZGxlIiBmb250LXNpemU9IjEyIiBmaWxsPSIjNjA3ZDhiIj41MCU8L3RleHQ+CjxsaW5lIHgxPSI3NDUiIHkxPSI3OCIgeDI9Ijc0NSIgeTI9IjMwOCIgc3Ryb2tlPSIjZThlZWYyIi8+PHRleHQgeD0iNzQ1IiB5PSIzMzAiIHRleHQtYW5jaG9yPSJtaWRkbGUiIGZvbnQtc2l6ZT0iMTIiIGZpbGw9IiM2MDdkOGIiPjc1JTwvdGV4dD4KPGxpbmUgeDE9Ijg5OCIgeTE9Ijc4IiB4Mj0iODk4IiB5Mj0iMzA4IiBzdHJva2U9IiNlOGVlZjIiLz48dGV4dCB4PSI4OTgiIHk9IjMzMCIgdGV4dC1hbmNob3I9Im1pZGRsZSIgZm9udC1zaXplPSIxMiIgZmlsbD0iIzYwN2Q4YiI+MTAwJTwvdGV4dD48L2c+CiAgPGxpbmUgeDE9IjI4NiIgeTE9IjMwOCIgeDI9Ijg5OCIgeTI9IjMwOCIgc3Ryb2tlPSIjYjBiZWM1Ii8+CiAgPGcgZm9udC1mYW1pbHk9IkRlamFWdSBTYW5zLCBBc
mlhbCwgc2Fucy1zZXJpZiI+PHRleHQgeD0iMjY4IiB5PSIxMjEuMyIgdGV4dC1hbmNob3I9ImVuZCIgZm9udC1zaXplPSIxNCIgZmlsbD0iIzI2MzIzOCI+UHJlZml4IGNhY2hlIGNhY2hlZC10b2tlbiBkaXNjb3VudDwvdGV4dD4KPHJlY3QgeD0iMjg2IiB5PSI5Ny4zIiB3aWR0aD0iNTUwLjgiIGhlaWdodD0iMzguMCIgcng9IjQiIGZpbGw9IiMxOTc2ZDIiLz4KPHRleHQgeD0iODQ2LjgiIHk9IjEyMS4zIiBmb250LXNpemU9IjE1IiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij45MC4wJTwvdGV4dD4KPHRleHQgeD0iMjY4IiB5PSIxOTguMCIgdGV4dC1hbmNob3I9ImVuZCIgZm9udC1zaXplPSIxNCIgZmlsbD0iIzI2MzIzOCI+Q29kZUdyYXBoIGNvbnRleHQgcmVkdWNlZDwvdGV4dD4KPHJlY3QgeD0iMjg2IiB5PSIxNzQuMCIgd2lkdGg9IjQ5NC43IiBoZWlnaHQ9IjM4LjAiIHJ4PSI0IiBmaWxsPSIjMDA3OTZiIi8+Cjx0ZXh0IHg9Ijc5MC43IiB5PSIxOTguMCIgZm9udC1zaXplPSIxNSIgZm9udC13ZWlnaHQ9IjcwMCIgZmlsbD0iIzI2MzIzOCI+ODAuOCU8L3RleHQ+Cjx0ZXh0IHg9IjI2OCIgeT0iMjc0LjciIHRleHQtYW5jaG9yPSJlbmQiIGZvbnQtc2l6ZT0iMTQiIGZpbGw9IiMyNjMyMzgiPlJUSyB0b29sIG91dHB1dCByZWR1Y2VkPC90ZXh0Pgo8cmVjdCB4PSIyODYiIHk9IjI1MC43IiB3aWR0aD0iMzc1LjgiIGhlaWdodD0iMzguMCIgcng9IjQiIGZpbGw9IiM4ZTI0YWEiLz4KPHRleHQgeD0iNjcxLjgiIHk9IjI3NC43IiBmb250LXNpemU9IjE1IiBmb250LXdlaWdodD0iNzAwIiBmaWxsPSIjMjYzMjM4Ij42MS40JTwvdGV4dD48L2c+CiAgPHRleHQgeD0iMjg2IiB5PSIzNTgiIGZvbnQtZmFtaWx5PSJEZWphVnUgU2FucywgQXJpYWwsIHNhbnMtc2VyaWYiIGZvbnQtc2l6ZT0iMTMiIGZpbGw9IiM2MDdkOGIiPlNvdXJjZXM6IHByZWZpeC1jYWNoZSBjb3N0IG1vZGVsLCBDb2RlR3JhcGggcHJvamVjdGlvbiwgYW5kIFJUSyByZWNvcmRzLjwvdGV4dD4KPC9zdmc+Cg==" width="980" height="380" class="img_ev3q"></p>
<ul>
<li class=""><strong>Routing economics</strong>: the
<a href="https://routeworks.github.io/?p=/leaderboard" target="_blank" rel="noopener noreferrer" class="">Routeworks leaderboard</a> makes the
inference-cost tradeoff visible on a log scale. At similar accuracy, routed
paths can sit at <strong>1/10</strong> or even <strong>1/100</strong> of a frontier-heavy route's cost.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Routeworks routing leaderboard" src="https://inferoa.agentic-in.ai/assets/images/routeworks-routing-leaderboard-3f39d4d783a3b77e17e1adf0eee0d63a.png" width="2112" height="1298" class="img_ev3q"></p>
<p>The exact numbers will move with the workload, model pricing, and the local RTK command
corpus. The direction is the important part: long-horizon agents need a harness
that protects prefix-cache stability, preserves continuity through compression, and uses
every inference surface available.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="built-with-the-inference-stack">Built With The Inference Stack</h2>
<p><img decoding="async" loading="lazy" alt="Inferoa built with the inference stack" src="https://inferoa.agentic-in.ai/assets/images/inferoa-stack-e103c330b4c3be4729f894c68af109ca.png" width="1672" height="941" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="vllm-ecosystem">vLLM Ecosystem</h3>
<p>Inferoa starts with the vLLM ecosystem because vLLM exposes the right surfaces:
serving behavior, routing, multimodal paths, endpoint signals, and prefix-cache
economics.</p>
<ul>
<li class=""><a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Engine</strong></a> provides
high-performance OpenAI-compatible inference and the prefix-cache behavior
Inferoa protects across long sessions.</li>
<li class=""><a href="https://github.com/vllm-project/semantic-router" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Semantic Router</strong></a>
brings model routing into the agent loop so routes can respond to cost,
safety, privacy, capability, and session pressure.</li>
<li class=""><a href="https://github.com/vllm-project/vllm-omni" target="_blank" rel="noopener noreferrer" class=""><strong>vLLM Omni</strong></a> brings image,
video, and audio understanding or generation into the same durable agent
contract.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="context-optimization">Context Optimization</h3>
<p>Inferoa also uses the context optimization projects that make long-horizon
agent loops practical:</p>
<ul>
<li class=""><a href="https://www.npmjs.com/package/@colbymchenry/codegraph" target="_blank" rel="noopener noreferrer" class=""><strong>CodeGraph</strong></a>
turns repository context into graph-shaped symbol and range evidence.</li>
<li class=""><a href="https://github.com/rtk-ai/rtk" target="_blank" rel="noopener noreferrer" class=""><strong>RTK</strong></a> rewrites command-heavy tool output
into compact records that preserve evidence while reducing token pressure.</li>
</ul>
<p>Inferoa is the harness layer above that stack: the place where long-horizon
agent behavior and inference behavior meet.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="try-it">Try It</h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#d6dde8;--prism-background-color:#101419"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#d6dde8;background-color:#101419"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#d6dde8"><span class="token plain">npm install -g inferoa@dev</span><br></div><div class="token-line" style="color:#d6dde8"><span class="token plain">inferoa setup</span><br></div><div class="token-line" style="color:#d6dde8"><span class="token plain">inferoa</span><br></div></code></pre></div></div>
<p>The larger goal is simple: agents should not waste the inference stack they are
already paying for. Inferoa makes that stack's signals native to the loop.</p>]]></content:encoded>
            <category>inferoa</category>
            <category>tokenmaxxing</category>
            <category>agents</category>
            <category>inference</category>
            <category>vllm</category>
        </item>
    </channel>
</rss>