The post Is AGI Here? Not Even Close, New AI Benchmark Suggests appeared on BitcoinEthereumNews.com.

Is AGI Here? Not Even Close, New AI Benchmark Suggests

In brief

  • ARC-AGI-3 exposes a massive gap between AGI claims and reality, with top AI models scoring below 1% while humans achieve perfect performance.
  • The benchmark tests true generalization—requiring agents to explore, plan, and learn from scratch in unknown environments rather than recall trained patterns.
  • Despite industry hype, current AI systems remain far from AGI, lacking the reasoning and adaptability that even young humans display naturally.

Nvidia CEO Jensen Huang went on Lex Fridman’s podcast last week and said, plainly, “I think we’ve achieved AGI.” Two days later, the most rigorous test in AI research dropped its newest artificial general intelligence benchmark—and every frontier model scored below 1%.

The ARC Prize Foundation released ARC-AGI-3 this week, and the results are brutal. Google’s Gemini 3.1 Pro led the pack at 0.37%. OpenAI’s GPT-5.4 came in at 0.26%. Anthropic’s Claude Opus 4.6 managed 0.25%, while xAI’s Grok-4.20 scored exactly zero. Humans, meanwhile, solved 100% of environments.

This isn’t a trivia test, a coding exam, or even a set of ultra-hard PhD-level questions. ARC-AGI-3 is something entirely different from anything the AI industry has faced before.

The benchmark was built by François Chollet and Mike Knoop’s foundation, which set up an in-house game studio and created 135 original interactive environments from scratch. The idea is to drop an AI agent into an unfamiliar game-like world with zero instructions, zero stated goals, and no description of the rules. The agent has to explore, figure out what it’s supposed to do, form a plan, and execute it.

If that sounds like something any five-year-old can do, you’re starting to understand the problem. If you want to see whether you are better than AI, you can play the same games featured in the test yourself. We tried one; it felt weird at first, but after a few seconds we got the hang of it.

It is also the clearest example of what the “G” in AGI stands for. To generalize is to create new knowledge (how a weird game works) without having been trained on it in advance.

Previous versions of ARC tested static visual puzzles—show a pattern, predict the next one. They were hard at first. Then the labs threw compute power and training at them until the benchmarks were effectively dead. ARC-AGI-1, introduced in 2019, fell to test-time training and reasoning models. ARC-AGI-2 lasted about a year before Gemini 3.1 Pro hit 77.1%. The labs are very good at saturating benchmarks they can train against.

Version 3 was designed specifically to prevent that. With 110 of the 135 environments kept private—55 semi-private for API testing, 55 fully locked for competition—there’s no dataset to memorize. You can’t brute-force your way through novel game logic you’ve never seen.

Scoring isn’t pass/fail either. ARC-AGI-3 uses what the foundation calls RHAE—Relative Human Action Efficiency. The baseline is the second-best, first-run human performance. An AI that takes ten times as many actions as a human scores 1% for that level, not 10%. The formula squares the penalty for inefficiency. Wandering around, backtracking, and guessing your way to an answer gets punished hard.
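The squared penalty is easier to see with numbers. The sketch below is a minimal illustration of the per-level arithmetic described above, not the foundation's actual scoring code: the function name, the cap at 100% for agents that beat the human baseline, and the sample action counts are all assumptions made for the example.

```python
def rhae_level_score(human_actions: int, agent_actions: int) -> float:
    """Hypothetical per-level score: squared ratio of human to agent actions.

    Assumption: an agent that uses fewer actions than the human baseline
    is capped at 1.0 rather than rewarded above 100%.
    """
    if human_actions <= 0 or agent_actions <= 0:
        raise ValueError("action counts must be positive")
    # Efficiency relative to the human baseline, capped at 1.0 (assumed).
    efficiency = min(1.0, human_actions / agent_actions)
    # Squaring turns a 10x action overhead into a 100x score penalty.
    return efficiency ** 2


# Illustrative numbers only: a human baseline of 20 actions.
print(f"{rhae_level_score(20, 200):.2%}")  # 10x the actions -> 1.00%
print(f"{rhae_level_score(20, 40):.2%}")   # 2x the actions  -> 25.00%
```

Under these assumptions, even an agent that needs only twice as many actions as the human baseline keeps just a quarter of the available score for that level.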

The best AI agent in the month-long developer preview scored 12.58%. Frontier LLMs tested through the official API, with no custom tooling, couldn’t crack 1%. Ordinary humans solved all 135 environments with no prior training and no instructions. If that’s the bar, then the current crop of models isn’t clearing it.

There is one real methodological debate here. ARC’s report says a Duke-built custom harness pushed Claude Opus 4.6 from 0.25% to 97.1% on a single environment variant called TR87. That does not mean Claude scored 97.1% on ARC-AGI-3 overall; its official benchmark score remained 0.25%, but the shift is still worth noting.

The official benchmark feeds agents JSON, not visuals. That’s either a methodological flaw or a demonstration that today’s models are better at processing human-friendly information than raw structured data. Chollet’s foundation has acknowledged the debate, but isn’t changing the format.

“Frame content perception and API format are not limiting factors for frontier model performance on ARC-AGI-3,” the paper reads. In other words, they seem to reject the idea that models fail because they “can’t see” the tasks properly, arguing instead that perception is already sufficient—and the real gap lies in reasoning and generalization.

The AGI reality check arrived during a week when the hype machine was running at full speed. Besides Huang’s comment, Arm named its new data center chip the “AGI CPU.” OpenAI’s Sam Altman has said they’ve “basically built AGI,” and Microsoft is already marketing a lab focused on building ASI, artificial superintelligence, the stage that would come after AGI is achieved. The term, it appears, is being stretched until it means whatever is commercially convenient.

Chollet’s position is simpler. If a normal human with no instructions can do it, and your system can’t, then you don’t have AGI—you have a very expensive autocomplete that needs a lot of help.

ARC Prize 2026 is offering $2 million across three competition tracks, all hosted on Kaggle. Every winning solution must be open-sourced. The clock is running, and right now, the machines aren’t even close.

Source: https://decrypt.co/362496/is-agi-here-not-even-close-ai-benchmark

