Chinese AI is no longer just answering prompts: it has now cleared a decade-old math barrier that humans had not managed to clear on their own

Published On: April 21, 2026 at 6:00 PM

A Peking University-led research team says it built an AI framework that can tackle a genuine open problem in commutative algebra and then formally verify the result in Lean 4.

In a preprint posted on April 4, 2026, the authors describe a two-agent workflow that produced a counterexample and generated a machine-checkable proof, with the formalization completed in about 80 hours of agent runtime.

The bigger story is not just that “AI did math.” It is that the claim comes bundled with something closer to a receipt: a proof that can be compiled and audited, even though the work is still a preprint and the math community is debating what to make of it.

A counterexample, not a neat resolution

The question traces back to 2014 and is tied to Dan D. Anderson, a mathematician then at the University of Iowa, who posed an implication question in commutative algebra. The preprint frames it as an open problem about whether a weaker “approximation” property of certain rings automatically forces a stronger one.

In the paper’s formal statement, the issue is whether weak quasi-completeness implies quasi-completeness for Noetherian local rings. The AI system claims a negative answer by constructing a ring that satisfies the weak condition but fails the stronger one, which means the implication does not hold in general.

That can still be meaningful because counterexamples often redirect a field, telling researchers where a “nice” rule breaks. But mathematicians online have also pointed out a reality check: a decade-old open question is not automatically a famously hard or widely studied one.

Two agents, one workflow

The first agent is called Rethlas, and it is designed for informal reasoning in natural language. It tries out multiple approaches and leans on a semantic theorem search engine named Matlas to pull relevant results from the literature, in a workflow the authors compare to how human mathematicians explore strategies.

A key moment in the authors’ own narrative is retrieval rather than raw invention. Rethlas uses Matlas to discover and apply a technical result by Jensen from 2006, showing how cross-domain theorem search can unlock a path that might not be obvious from inside one subfield.

Then Archon takes over for formal verification. It translates the informal argument into a Lean 4 project, breaks the work into many smaller proof obligations, and fills in gaps that informal math writing often leaves implicit.

Why Lean matters for trust

If you have ever watched a software build fail because of a tiny missing character, you already get the core idea of formal verification. Lean 4 is a proof assistant that forces every step to be explicit, so the end result is not just convincing prose but a proof the computer kernel accepts.
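As a toy illustration (not taken from the paper), here is what a machine-checked statement looks like in Lean 4. The theorem name and statement below are invented for the example; the point is that if any step were missing or wrong, the file simply would not compile:

```lean
-- A toy Lean 4 proof (not from the paper): the kernel accepts it
-- only because every step type-checks, with no gaps left to trust.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Replace the proof term with anything that does not fit the statement and Lean rejects the whole file, which is exactly the property that makes a compiled proof auditable.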

The authors also report using a tool called Comparator to check that the theorem statements match a human-reviewed simplified specification and that the proof uses only standard Lean axioms without sneaking in extra assumptions. That kind of plumbing is not glamorous, but it is exactly what separates an impressive-sounding argument from something you can recheck.
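Lean itself ships a command that supports this kind of audit. The paper's Comparator is the authors' own tooling, but the underlying check can be sketched with Lean 4's built-in `#print axioms`, which lists every axiom a compiled theorem ultimately depends on (the theorem below is a stand-in for illustration):

```lean
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Ask the kernel which axioms, if any, the proof relies on, so a
-- reviewer can confirm nothing nonstandard was smuggled in.
#print axioms add_comm_example
```

If the output names only the standard axioms (or none), a reviewer knows the proof rests on ordinary foundations rather than hidden assumptions.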

Just as important, the team published the Lean formalization in a public repository, including a short statement file intended to be read first by anyone who wants to validate what is actually being proved. For outsiders, that is a rare kind of transparency in research-level math, where “trust me” has often been the default.

The 80-hour figure in plain terms

This is not a one-page proof sketch. The paper reports roughly 19,000 lines of Lean 4 code across 42 files, and it estimates that an experienced formalization expert typically produces about 150 to 250 lines of Lean per day.

On that rough math, the formalization output corresponds to several person-months of specialized effort. Yet the authors say their system completed the formalization in about 80 hours of agent runtime, which is closer to a long weekend than a season of work.
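The conversion is simple division. A quick sketch using only the figures reported in the paper (19,000 lines, 150 to 250 lines per expert-day) shows where the person-months estimate comes from:

```python
# Back-of-the-envelope estimate from the figures reported in the paper.
lean_lines = 19_000                # total Lean 4 lines across 42 files
rate_slow, rate_fast = 150, 250    # expert output in lines per day

days_max = lean_lines / rate_slow  # ~127 working days at the slow rate
days_min = lean_lines / rate_fast  # 76 working days at the fast rate

# At roughly 20 working days per month, that is about 3.8 to 6.3
# person-months of specialized effort, versus ~80 hours of agent runtime.
print(f"{days_min:.0f}-{days_max:.0f} expert days "
      f"({days_min / 20:.1f}-{days_max / 20:.1f} person-months)")
```

Even at the fastest expert rate, the human-equivalent effort lands in the multi-month range, which is why the 80-hour figure stands out.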

The cost details are also unusually concrete. The paper says the run used three Claude Code Max subscriptions priced at $200 per month each, and each account consumed about 70% of its weekly quota during the one-week project.

What “no human intervention” really means

Some headlines frame this as fully autonomous, and the authors do make a strong claim about minimal human involvement. They write that the only human intervention was downloading paywalled PDF files the system could not retrieve, after which the system handled OCR and organized the content for its own use.

Crucially, they add that “no mathematical judgment” was required from the human operator. That is a specific and testable kind of autonomy, and it matters more than vague talk about “AGI-level math.”

Still, it also highlights a practical dependency that companies and labs will recognize immediately. If a key step depends on access to restricted documents, then permissions, provenance, and secure document handling become part of the AI system, not an afterthought.

A business signal hiding inside a math story

For most executives, the takeaway is not commutative algebra. It is the idea that advanced reasoning can be packaged into a verifiable artifact, which changes both the speed and the risk profile of research and product decisions.

Think about sectors where one wrong assumption can get expensive fast, like chip design, cryptography, logistics optimization, or financial modeling that quietly ends up affecting a balance sheet. A workflow that pushes arguments into machine-checked form could help teams move faster while keeping a tighter leash on hidden errors.

But it is not plug and play. Formalization still depends on mature libraries, good tooling, and high-quality access to prior results, and this paper itself shows that information access can become a real bottleneck even when the “math” part is automated.

Defense and security implications

Defense and national security communities already use formal methods in narrow, high-value contexts, such as validating safety-critical software and analyzing cryptographic protocols. If systems like this can scale, they could lower the barrier to producing proofs and verification artifacts that are suitable for audit, procurement, and certification workflows.

Math is also dual use by default. Faster theorem search and verification can help harden systems, but it can also accelerate discovery in areas that shift real capabilities, including cryptography, optimization, and signal processing.

There is also a quieter operational lesson. When proof pipelines depend on external documents and automated OCR, the integrity of the reference chain starts to matter as much as the proof engine itself.

What to watch next

First, expect a split conversation between correctness and significance. A Lean-checked proof can make “does it follow” easier to answer, but it does not automatically tell you whether the problem was central, widely studied, or likely to open new research directions. (mathoverflow.net)

Second, watch for replication and generalization. The same preprint discusses research-level evaluation efforts like FrontierMath, and outside this paper Epoch AI reported that an AI system produced the first solution to one of FrontierMath’s unpublished open problems in March 2026, which suggests momentum is building around verifiable research outputs.

At the end of the day, the shift is simple. We are moving from AI that can talk through math to AI that can ship math that survives auditing, and that is when businesses and governments will start paying much closer attention.

The study was published on arXiv.
