On the Road of Agent Development, We Are Don Quixote

A few months ago, when we published our roadmap for agent security products, there was barely an echo.

Antiy's initial security-agent product roadmap

So we sought out practitioners and asked for their opinions one on one. Our new product “matrix” (AVL Code, AVL Claw, AVL MCP, AVL Skill) drew almost nothing but negative feedback:

These are not even the same kind of thing. How can Skill sit alongside Claw?
What on earth is AVL MCP? Are you going to roll your own proprietary protocol?
There is already an open-source Claw — just use it. Stop reinventing the wheel!
AVL Code? So you are challenging Claude Code? Is your Landi model even up to it? And what about Cursor? You couldn't even take on OpenCode.

We had no idea how to answer these questions, so we decided to trust our gut and start with AVL Code, even though Claw was the hottest thing around at the time.

You Only Know by Doing

The IT department dug out a pile of antique laptops, filled a whole room with “shrimp rigs,” and piloted IT operations on them first. Yet as development progressed, the first product to be halted and knocked out was, of all things, AVL Claw.

The company's mandate for the Landi team was to build reliable, AI-driven security productivity tools. But the slacking and deceit of those “shrimp rigs” (one “shrimp” wandered off every day to catch up on the NBA...) made us realize that building our own Claw would not make such problems go away. Full system control and direct custody of piles of login credentials ran up against our security red lines. More importantly, we were not setting out to build an all-purpose life assistant; we needed stable productivity output. So we decisively halted AVL Claw and accelerated AVL Code.

AVL Code, as we define it, is an AI desktop assistant for coding and security analysis. Its usage model is clear: engineers use it as a tool, stating the task clearly and letting the AI do the work. Its security fence is clear: as a client, it can enforce strict behavior-control policies, and it runs under the host's own security mechanisms, so it will never run amok like some brain-hijacked lobster. Its benchmarks are clear: Claude Code, Codex, and Cursor stand ahead like mountains, there to learn from and to climb. Its opening direction is clear too. The limits of our model meant from the start that we could not compete on pure coding ability, though we could not do without it either. So we would build strength where we are strongest, in security analysis and reverse engineering, and then shore up coding step by step. We built in executable-format parsing, hashing, entropy calculation, a hex view, disassembly, decompilation, a security-domain knowledge base, and more.

As we charged forward, whatever was fuzzy became clear, and every dead end revealed itself as one. The controversies around Skill and MCP sorted themselves out quickly too.

The original vision for AVL Skill was a skill ecosystem with built-in security scanning and certificate signature verification, forming a chain of trust. Signature verification using China's national cryptographic algorithms was implemented in no time. Yet even as skill-poisoning incidents kept making headlines, the entire AI toolchain ecosystem kept hurtling forward on a model of simple trust. So the mechanism naturally settled into AVL Code as a built-in capability.

AVL MCP went much the same way. When first proposed, it was meant to be an extension component of the antivirus engine. Big Beard said, “Our antivirus engine does object detection; in the AI era it must extend to process detection.” So we considered embedding security mechanisms into the agent's invocation flow as a new middleware product. But in an era of breakneck growth, people rarely pay for security, and so AVL MCP was absorbed into AVL Code as well.

AVL Code has not yet proven to be a dead end, so we keep accelerating. We committed to at least one release a day, though updates were triggered less often by new features than by prompt fixes for all kinds of small bugs. Our record: six releases in a single day.

Eat Your Own Dog Food First

Finding trial users was not easy. Among veteran engineers, you can hand someone an invite code loaded with 1,000 credits and he will stay as calm as still water, without a ripple. A true brother is not the one who likes your post to make you feel good, but the one who tells you straight: I dual-wield Codex and CC (Claude Code), on the MAX plan every month. Among heavy users of Codex and CC, many consider using any other coding agent a waste of life.

And the cybersecurity crowd? Hand someone an invite code, and minutes later a brazen injection attempt shows up in the logs. Then again, our internet-facing services see floods of attack events every day, and the IT department handles automated response as routine. We long ago accepted “any open service will be attacked” as the normal state of affairs.

Before you ask others to use it, use it yourself. Antiy has always preached dogfooding, an idea learned from Microsoft. In 1988, Microsoft's DOS all but ruled the personal computer, but its LAN operating system, LAN Manager, was getting thrashed by Novell and had no external users. Microsoft executive Paul Maritz wrote an email titled "Eating our own Dogfood." The point was simple: if nobody else uses it, use it yourself first.

Of course we wanted a full internal rollout of AVL Code too. The worry was GPU capacity: external testers' experience had to come first. So no internal download link was ever published, and when internal applications showed up at the external portal, the product manager mostly refused to wave them through. But colleagues discovered that AVL Code worked with a plain company SSO login. So instead of applying for download links, they began passing the executable around person to person, and internal trials took off.

The test department was the first to start burning tokens. They were testing a small helper tool called “Net Admin's Hand,” which has very fine-grained OS compatibility requirements. In the compatibility analysis, AVL Code finished in minutes what took hours by hand, turning a vague “it should run” into a quantified “runs, doesn't run, and here is why.” The result was a panoramic compatibility map that could directly drive test design, technology selection, and deployment planning. They then used AVL Code to install and configure cve-mcp-server, going from natural-language instructions to a ready MCP service: fully automated, closed-loop delivery across development and test environments.

The security services team picked up AVL Code through a small live-fire exercise. One evening an order came down from the supervising authority: in response to an iPhone vulnerability, Antiy had to quickly publish an online check page guiding users on older versions to upgrade. The guy on duty could not write code, so he threw the work order into AVL Code, and out came the page tool.

The colleagues we most hoped to win over, Antiy CERT, were not the first to come aboard: CERT did not initially believe AI could beat them anywhere along the chain from reverse engineering to attribution. The change came this March. While responding to the npm registry poisoning incident in which Axios was trojanized, an engineer, almost out of pure skepticism, dropped the sample into AVL Code for analysis and attribution. Minutes later, AVL Code produced: “Attribution conclusion: 40–50% likely a professional cybercrime organization or an emerging APT; 25–30% likely a branch of the Lazarus Group.” CERT's own manual analysis had concluded: Lazarus. After that, AVL Code users inside CERT multiplied almost overnight.

AVL Code's attribution triage of the npm poisoning sample (plain-crypto-js), with confidence-tiered output

And in the analysis of Shai-Hulud (the TeamPCP worm), AVL Code pointed the engineers to “the SessionStart hook in claude/settings.json,” supporting the judgment that the worm's core functionality was written with Claude Code.

The takeaway from all the trials: the teams that got the best results were the ones that embedded AVL Code into their workflows. And the workflow restructuring was never designed on paper; sensible patterns emerged naturally as people worked the agent into their daily routine.

Hardware Is a Big Problem, but Intelligence Is Not

How many GPUs would it take to support internal use plus external testing? Gritting our teeth, we scraped together a grand total of eight. We assumed a dozen or so concurrent users would saturate them, yet the eight cards somehow held the line. The secret comes down to one word, “optimize,” or four: “optimize, then optimize again.”

There are never enough GPUs. But if we are short on GPUs, do we just give up on AI?

Last year, while I was waiting for a flight, a guru in the next seat was holding forth on AI to a small crowd. I was traveling to defend the regulatory filing of the Landi vertical LLM; he was explaining how to judge an AI project. Then he suddenly noticed me typing code.

“Brother, stop working. Soon nobody will need to work — the big models will do it all.”

“Oh, we do large models too. We even built a foundation model.”

“You built a foundation model?”

“A vertical one... binary...” I said, a little sulkily.

“Don't give me this base-one, base-two business. How many GPUs have you got...”

Though I never worked out how “base one” would even count, I gritted my teeth, decided to include even our old and infirm 3090s, and quavered: twenty-odd...

“Without 10,000 GPUs, don't even talk about foundation models.”

In one breath the man wiped out our reason for existing. I seethed inside (10,000 GPUs: the only base he ever counted in was base ten thousand), but I said nothing and went back to my work.

The Prophets Won't Walk with You, but AI Will

Ever since ChatGPT burst onto the scene, many people have lived wrapped in anticipation and anxiety at once. So have we, and at times like these even our past misjudgments and wrong choices loom larger in the mind.

In 2011, when GPU acceleration was already drawing intense attention, our first idea was to put it into our fastest network-detection appliances to speed up the protocol stack and virus matching. The validation was disappointing: GPUs excel at massively parallel throughput, while network traffic detection demands extreme real-time performance. And when we weighed their value for automated sample analysis, the unanimous internal conclusion was that GPU versus CPU made no essential difference. So we filed a patent, “A GPU- and buffer-based method and system for network data processing,” while settling into the basic judgment that “GPUs are of little use to network security.”

Was that a case of trying too early and misjudging as a result? Looking back, we think not. Finding the right road requires trial and error, and the trial and error must be continuous.

In this era of upheaval, thinking can hardly keep up with the explosion of information; the information crosses and contradicts itself; everyone is using AI, and no two people's AI is the same AI. Every brainstorm sprays out a pile of ideas “worth doing,” and every idea then finds a pile of reasons it “won't go well.”

We too craved prophets to point the way, and so we went round in circles: Hinton says..., then Fei-Fei Li says..., then Yann LeCun says.... Each master sounded right in turn, except that the masters did not quite agree with one another, and what they said never quite seemed to touch our field anyway...

“Better one Don Quixote than ten Hamlets.” When I joined Antiy, that line was printed on a sheet of A4 paper taped up beside my workstation. It shaped me far more than the days of onboarding training did.

No era has offered better aids to thinking than the AI era, but deliberate and hesitate in an endless loop and you will be paralyzed among the possibilities. The AI era has also lowered the barrier to action as never before: act first, fail fast, and imagination gets pushed into reality.

Hamlet, or Don Quixote? That is the choice. We choose to be Don Quixote and Sancho, with a continuous feedback loop strapped on.

We build AVL Code hoping more people will become the doers of the AI era.

Claude Code, Codex, Cursor: you are mighty beyond measure. But here we come, riding a donkey!