We started Aomi Labs in May 2025. The journey toward effective AI automation began with a narrow, practical need: turning natural-language intents into safe, reproducible on-chain transactions without bespoke integration of protocol SDKs. This seemingly straightforward task, however, revealed a series of deep architectural challenges that render simple chatbot-like approaches ineffective.
The translation from a user's intent to a transaction is inherently brittle, fanning out into numerous low-level steps such as resolving addresses, choosing function signatures, building calldata, and computing allowances, slippage, and gas strategies. Furthermore, the system must contend with a fragmented landscape of "truths," as contract addresses and ABIs are scattered across documentation, deployment records, and upgradeable proxies, demanding a deterministic discovery path rather than ad-hoc scraping. Given that mainnet transactions are irreversible and expensive, a robust simulation-first methodology is non-negotiable to dry-run operations and detect potential failures before they result in real losses.
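To make the fan-out concrete, here is a minimal sketch of how one swap intent decomposes into the low-level steps listed above. All names (`SwapIntent`, `plan_swap`) are illustrative assumptions, not the actual Aomi pipeline.

```python
from dataclasses import dataclass

@dataclass
class SwapIntent:
    token_in: str    # symbol extracted from the user's request
    token_out: str
    amount_in: float

def plan_swap(intent: SwapIntent) -> list[str]:
    """Each step below is a distinct failure point in intent translation."""
    return [
        f"resolve contract address for {intent.token_in}",
        f"resolve contract address for {intent.token_out}",
        "select router function signature (e.g. exactInput vs exactOutput)",
        "encode calldata from resolved addresses and amounts",
        "check or set the ERC-20 allowance for the router",
        "derive a slippage bound from the quoted price",
        "choose a gas strategy (base fee plus priority tip)",
    ]

steps = plan_swap(SwapIntent("USDC", "WETH", 100.0))
```

A mistake at any single step, such as a stale router address or a missing allowance, invalidates the whole transaction, which is why each step needs a deterministic source of truth.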
The initial architectural approach was to expose battle-tested EVM tools through MCPs, wrapping them in typed endpoints to be called by an LLM. Specifically, we wrap Foundry behind typed tool endpoints so an LLM can ask for a balance, simulate a swap, or compose calldata, while the runtime enforces schemas, sessions, and broadcast boundaries. For target-address resolution, we borrow from L2Beat’s discovery system to resolve protocols (with proxy/upgrade awareness) and return validated sources for execution.
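The schema enforcement described above can be sketched as a small validation layer in front of the tools. The tool names and schema format here are assumptions for illustration, not the actual runtime's API.

```python
# Declared schemas for each tool the LLM may call; anything that does not
# match is rejected before it reaches Foundry. Names are illustrative.
TOOL_SCHEMAS = {
    "get_balance": {"address": str},
    "simulate_swap": {"token_in": str, "token_out": str, "amount_in": int},
}

def validate_call(tool: str, args: dict) -> dict:
    """Type-check an LLM-generated tool call against its declared schema."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise ValueError(f"unknown tool: {tool}")
    for field, typ in schema.items():
        if field not in args:
            raise ValueError(f"{tool}: missing field {field!r}")
        if not isinstance(args[field], typ):
            raise TypeError(f"{tool}: {field!r} must be {typ.__name__}")
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"{tool}: unexpected fields {sorted(extra)}")
    return args
```

The point of the gate is that the LLM's output is treated as untrusted input: a malformed call fails loudly at the boundary instead of producing a malformed transaction.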
For safety, we keep a persistent fork so the emulator captures the exact transactions to replay and audit. Our LLM calls pass through a type checker to ensure the generated operations are well-typed, and even then we remain extra cautious, verifying that simulation results are safe before anything is broadcast.
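The simulate-then-broadcast gate can be sketched as follows. `fork_simulate` is a stand-in for a call into a persistent Anvil/Foundry fork; its name and return shape are assumptions for this sketch, not a real Foundry API.

```python
audit_log: list[dict] = []  # replayable record of every dry run

def fork_simulate(tx: dict) -> dict:
    # A real implementation would execute `tx` against persistent forked
    # mainnet state; here we stub a successful dry run for illustration.
    return {"success": True, "gas_used": 120_000, "revert_reason": None}

def safe_broadcast(tx: dict, gas_budget: int = 500_000) -> bool:
    """Broadcast only when the dry run succeeds within the gas budget."""
    result = fork_simulate(tx)
    if not result["success"]:
        raise RuntimeError(f"simulation reverted: {result['revert_reason']}")
    if result["gas_used"] > gas_budget:
        raise RuntimeError("simulated gas exceeds budget; refusing broadcast")
    audit_log.append({"tx": tx, "simulation": result})
    return True  # in the real system: hand off to the broadcaster
```

Because the fork is persistent, the audit log doubles as a replay script: the same transactions can be re-executed against the same state to reproduce any failure.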
This architecture was conceived during the MCP hype cycle of early 2025, when the promise of a "USB-C for everything used by AI" captivated builders at the application layer. However, the practical application of this model quickly revealed its fundamental limitations. Many AI engineers later realized that MCP is essentially a session-based handshake protocol between a tool server and an agent runtime, which is an overly restrictive and often inefficient way to serve tools. This design forces the agent runtime to follow a rigid sequence of steps and, by operating over HTTP, introduces significant latency that is unacceptable for performance-critical tasks.

Furthermore, its session-based nature assumes a persistent, well-defined agentic lifecycle, making it overkill for one-shot data-processing calls. To the underlying LLM, this complex serving mechanism is irrelevant: it only needs a serialized tool call and its returned value. Builders were ultimately misled by the promise of "universality," overlooking its practical implications, and this led to widespread malpractice as they adopted an architecture that tends to be too slow and rigid for their business use cases.
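The point that the LLM only sees serialized calls and results can be shown directly: for a one-shot call, an in-process dispatch table suffices, with no session handshake or HTTP round-trip. The tool name and dispatch table here are hypothetical.

```python
import json

# What the model actually consumes and produces: JSON in, JSON out.
TOOLS = {
    "eth_blockNumber": lambda args: {"block": 19_000_000},  # stubbed result
}

def dispatch(raw_call: str) -> str:
    call = json.loads(raw_call)                      # the LLM's tool call
    result = TOOLS[call["name"]](call.get("args", {}))
    return json.dumps(result)                        # what the LLM reads back

dispatch('{"name": "eth_blockNumber"}')  # → '{"block": 19000000}'
```

Everything MCP layers on top of this exchange (sessions, capability negotiation, transport) is serving-side machinery, which is exactly why it is wasted overhead for one-shot calls.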