Silicon Valley’s AI arms race has never lacked super buyers waving checks; what’s missing are the people who know how to forge the future with this computing power.
By Ada, Deep潮 TechFlow
Pang Ruiming hadn’t even settled into his seat at Meta before leaving.
In July 2025, Zuckerberg secured this highly sought-after Chinese AI infrastructure engineer from Apple with a multi-year compensation package worth over $200 million. Pang was assigned to Meta’s Superintelligence Lab to build the infrastructure for the next-generation AI models.
Seven months later, OpenAI poached him.
According to The Information, OpenAI launched a months-long recruitment campaign for Pang Ruiming. Despite telling colleagues he was “very happy working at Meta,” he ultimately chose to leave. Bloomberg reports that his Meta compensation was tied to milestones, and leaving early meant forfeiting most of his unvested equity.
$200 million couldn’t buy seven months of loyalty.
This isn’t just a simple job switch.
One person’s departure, a signal to many
Pang Ruiming isn’t the first to leave.
Last week, Mat Velloso, head of Meta’s Superintelligence Lab developer platform, also announced his departure. He had joined Meta in July 2025 after leaving Google DeepMind, meaning he lasted less than eight months. Going further back, in November 2025, Yann LeCun, a Turing Award winner and Meta’s Chief AI Scientist of 12 years, announced his departure to start his own venture, built around the “world model” he has long championed. Recently, Russ Salakhutdinov, Vice President of Generative AI Research at Meta and a core disciple of Geoffrey Hinton, also publicly announced his exit.
To understand the talent drain at Meta AI, we must first grasp how damaging the Llama 4 episode really was.
In April 2025, Meta boldly released the Llama 4 series, including the Scout and Maverick models. The official benchmark tables boasted impressive numbers, claiming to outperform GPT-4.5 and Claude Sonnet 3.7 on core benchmarks like MATH-500 and GPQA Diamond.
However, this flagship model carrying Meta’s ambitions quickly showed its true colors in third-party blind tests run by the open-source community, which revealed a stark gap between its actual generalization and reasoning abilities and the hype. Facing intense community skepticism, Chief AI Scientist Yann LeCun finally admitted that during testing, “different model versions were used for different test sets to optimize the final scores.”
In rigorous AI research and engineering circles, this crosses an unforgivable line. In effect, Meta tuned Llama 4 into a “test-taking machine” that excels only on past exam questions, rather than a genuinely cutting-edge model. It is as if one contestant sat the math exam and a different contestant sat the programming exam: each score looks strong on its own, but they do not come from the same model.
This practice is called “cherry-picking” in AI academia, and “cheating” in exam-oriented education.
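Why per-benchmark checkpoint selection inflates scores can be shown with a toy simulation (the benchmark names and score distributions below are illustrative, not Llama 4’s actual numbers): give every checkpoint the same underlying ability plus random evaluation noise, then compare reporting one checkpoint honestly against reporting each benchmark’s luckiest checkpoint.

```python
import random

random.seed(0)

BENCHMARKS = ["MATH-500", "GPQA-Diamond", "MMLU", "HumanEval"]
N_CHECKPOINTS = 8

# Hypothetical scores: identical underlying ability (70), plus
# benchmark-specific noise from tuning and evaluation variance.
scores = {
    bench: [70 + random.gauss(0, 3) for _ in range(N_CHECKPOINTS)]
    for bench in BENCHMARKS
}

# Honest reporting: pick ONE checkpoint and report it on every benchmark.
honest = [scores[b][0] for b in BENCHMARKS]

# Cherry-picked reporting: for each benchmark, report whichever
# checkpoint happened to score highest on that benchmark.
cherry = [max(scores[b]) for b in BENCHMARKS]

print(f"honest average:        {sum(honest) / len(honest):.1f}")
print(f"cherry-picked average: {sum(cherry) / len(cherry):.1f}")
```

The cherry-picked average is always at least as high as the honest one, and with several checkpoints per benchmark it is reliably a few points higher, even though no checkpoint is actually better.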
For Meta, which has always positioned itself as an “open-source lighthouse,” this scandal directly shattered its most valuable trust asset within the developer ecosystem. The immediate consequence was that Zuckerberg lost confidence in the original GenAI team’s engineering standards, leading to subsequent appointments of external executives and sidelining core infrastructure departments.
He spent between $14.3 billion and $15 billion on a 49% stake in data-annotation company Scale AI, parachuted 28-year-old Scale CEO Alexandr Wang in as Meta’s Chief AI Officer, and established Meta’s Superintelligence Lab (MSL). Turing Award winner LeCun now reports to this 28-year-old. In October, Meta laid off about 600 MSL staff, including members of FAIR, the research division LeCun founded.
Meanwhile, the flagship model originally scheduled for release in summer 2025, Llama 4 Behemoth, was repeatedly delayed—from summer to fall, and ultimately indefinitely shelved.
Meta shifted focus to developing next-gen models codenamed “Avocado” (text) and “Mango” (image/video). Reports suggest Avocado aims to compete with GPT-5 and Gemini 3 Ultra. Originally scheduled for late 2025, its release was pushed to Q1 2026 due to underperformance in testing and training optimization. Meta is considering a closed-source release, abandoning its traditional open-source approach for the Llama series.
Meta made two fatal errors in its AI effort: first, faking benchmark results, which destroyed trust in the developer community; second, forcing FAIR, a foundational research lab that takes a decade to mature, into a product-oriented organization driven by quarterly KPIs. These two issues are the root causes of the current talent exodus.
Self-developed chips: another broken leg
The talent is walking out the door, and the chips are in trouble too.
According to The Information, Meta recently canceled its most advanced internal AI training chip project.
Meta’s self-developed chip plan is called MTIA (Meta Training and Inference Accelerator). The initial roadmap was ambitious: MTIA v4 “Santa Barbara,” v5 “Olympus,” and v6 “Universal Core,” scheduled for delivery between 2026 and 2028. Olympus was designed as Meta’s first 2nm chiplet-based chip, aiming to cover high-end model training and real-time inference, ultimately replacing Nvidia in Meta’s training clusters.
Now, this cutting-edge training chip has been canceled.
Meta has made some progress—its inference chips, codenamed “Iris,” have been deployed at scale in data centers, mainly for Facebook Reels and Instagram recommendation systems, reportedly reducing overall costs by 40-44%. But inference and training are different beasts. Inference runs models; training develops models. Meta can produce inference chips but cannot yet build training chips capable of competing directly with Nvidia.
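The inference/training distinction above can be made concrete with a toy sketch (pure Python, a single parameter; real training adds distributed optimizers, checkpointing, and far heavier memory traffic, which is why training silicon is so much harder to build):

```python
# Minimal sketch of why inference and training are different workloads:
# a one-parameter linear model, y = w * x.

def predict(w, x):
    # Inference: a single forward pass. No gradients, no weight updates.
    return w * x

def train_step(w, x, y_true, lr=0.1):
    # Training: forward pass, loss gradient, AND a weight update.
    y_pred = w * x
    grad = 2 * (y_pred - y_true) * x   # d/dw of (w*x - y_true)^2
    return w - lr * grad

w = 0.0
for _ in range(50):
    w = train_step(w, x=1.0, y_true=3.0)

print(predict(w, 2.0))  # approaches 6.0 as w converges toward 3.0
```

Inference is the cheap, read-only half of that loop; training is the whole loop, repeated trillions of times across thousands of chips that must exchange gradients constantly. A chip that handles the former does not automatically handle the latter.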
This isn’t the first time. In 2022, Meta attempted to develop inference chips internally but failed in small-scale deployment and gave up, turning instead to Nvidia for large orders.
The setback in self-developed chips has accelerated Meta’s rush to buy externally.
$135 billion panic procurement
In January 2026, Meta announced capital expenditure plans of $115 billion to $135 billion—almost double last year’s $72.2 billion. The majority of this will be spent on chips.
Within ten days, three major deals were finalized:
February 17: Meta signed a multi-year, cross-generational strategic partnership with Nvidia. Meta will deploy “millions” of Nvidia Blackwell and new Vera Rubin GPUs, plus Grace CPUs. Analysts estimate the deal is worth hundreds of billions of dollars, making Meta the first supercomputing customer to deploy Nvidia’s Grace CPUs at scale.
February 24: Meta signed a multi-year chip deal with AMD valued between $60 billion and $100 billion. Meta will purchase AMD’s latest MI450 series GPUs and sixth-generation EPYC CPUs. As part of the deal, AMD issued Meta warrants for up to 160 million common shares—about 10% of AMD’s stock—at $0.01 per share, vesting in stages based on delivery milestones.
February 26: The Information reports Meta signed a multi-billion-dollar multi-year agreement with Google to rent Google Cloud’s TPU chips for training and running its next-gen large language models. Discussions are underway for Meta to directly purchase TPU deployments starting in 2027.
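The AMD warrant terms above lend themselves to quick back-of-the-envelope arithmetic; the sketch below uses a hypothetical market price, not a quote, so the dollar figures are illustrative only.

```python
# Back-of-the-envelope on the reported AMD warrant terms.
# The market price passed in below is HYPOTHETICAL, not a quote.

SHARES = 160_000_000      # warrant shares reported in the deal
STRIKE = 0.01             # reported exercise price per share

def warrant_value(market_price, vested_fraction=1.0):
    # Intrinsic value of the vested portion: (price - strike) per share,
    # floored at zero if the warrant is out of the money.
    vested = SHARES * vested_fraction
    return vested * max(market_price - STRIKE, 0.0)

# At an assumed $200/share, fully vested:
print(f"${warrant_value(200.0) / 1e9:.1f}B")
# Half vested at the same assumed price:
print(f"${warrant_value(200.0, vested_fraction=0.5) / 1e9:.1f}B")
```

At a $0.01 strike the warrant is effectively a free equity grant, which is why the milestone-based vesting matters: AMD only hands over that upside as Meta actually takes delivery.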
In just ten days, a social media giant placed orders totaling over $100 billion across three chip suppliers.
This isn’t diversification. It’s panic buying.
Three layers of compute anxiety
Why is Meta in such a rush?
First, self-developed chips are no longer reliable. The most advanced training chip project was canceled, meaning Meta will have to rely on external suppliers for AI training needs in the foreseeable future. While their inference chips can handle recommendation systems, training cutting-edge models like Avocado—aiming to rival GPT-5—requires Nvidia or equivalent hardware.
Second, competitors won’t wait. OpenAI has secured massive resources from Microsoft, SoftBank, and Abu Dhabi’s sovereign fund. Anthropic has locked in up to a million TPUs and Trainium chips from Google and Amazon. Google’s Gemini 3 was trained entirely on TPUs. If Meta can’t secure enough compute, it risks losing its place in the race.
Third, perhaps most fundamentally, Zuckerberg needs to use “purchasing power” to compensate for “R&D shortcomings.” The failures of Llama 4, talent attrition, and chip development setbacks have made Meta’s AI narrative fragile on Wall Street. Signing big deals with Nvidia, AMD, and Google at this moment signals: “We have the money, we’re buying, we’re still in the game.”
Meta’s current strategy is: if software can’t be fixed, then buy hardware; if talent can’t be retained, then buy chips. But AI isn’t a game you win just by writing checks. Compute power is necessary but not sufficient. Without top-tier model teams and a clear technical roadmap, even the most expensive chips are just costly inventory in warehouses.
Buyers’ dilemma
Looking back at Meta’s three deals in February, one interesting detail is often overlooked.
Meta is buying current Blackwell and future Vera Rubin GPUs from Nvidia; from AMD, it’s buying MI450 and future MI455X; from Google, it’s renting current Ironwood TPUs with plans to purchase directly next year.
Three suppliers, three completely different hardware architectures and software ecosystems.
This means Meta must constantly switch between Nvidia’s CUDA, AMD’s ROCm, and Google’s XLA/JAX. A multi-supplier strategy diversifies supply chain risk and lowers hardware costs, but it multiplies engineering complexity.
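A rough way to see that multiplication (the backend and feature names below are illustrative placeholders, not Meta’s actual stack): the work grows with the cross-product of vendors and framework features, because every kernel, collective, and tooling integration must be ported and validated per backend.

```python
# Illustrative sketch: why three vendors multiply engineering work.
# Each backend pairs a hardware target with its own compiler/runtime
# stack; names here are placeholders for the real software surfaces.

BACKENDS = {
    "nvidia": {"runtime": "CUDA",     "collectives": "NCCL"},
    "amd":    {"runtime": "ROCm/HIP", "collectives": "RCCL"},
    "google": {"runtime": "XLA/JAX",  "collectives": "XLA"},
}

FEATURES = ["matmul kernels", "attention kernels", "all-reduce",
            "mixed precision", "checkpointing", "profiling"]

def support_matrix(backends, features):
    # Every (backend, feature) pair is a separate port-and-validate task.
    return [(b, f) for b in backends for f in features]

tasks = support_matrix(BACKENDS, FEATURES)
print(len(tasks))  # 3 backends x 6 features = 18 validation surfaces
```

Add a backend and the whole feature column must be redone; add a feature and it must land on every backend. That cross-product, not the hardware invoices, is where the scarce engineering talent gets spent.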
This is Meta’s most critical weakness: enabling a trillion-parameter model to train efficiently across three fundamentally different hardware and software stacks requires more than engineers familiar with CUDA; it demands architects capable of building cross-platform training frameworks from scratch.
Such talent likely numbers fewer than 100 worldwide. Pang Ruiming is one of them.
Spending $100 billion to acquire the world’s most complex hardware ecosystem, while losing the brains capable of wielding it—that’s the most surreal scene in Zuckerberg’s high-stakes gamble.
Zuckerberg’s gamble
Zooming out, Zuckerberg’s AI strategy over the past 18 months closely mirrors his all-in approach to the Metaverse:
Spot the trend, pour in money, hire aggressively, face setbacks, pivot strategy, then pour in more money.
From 2021 to 2023, it was the Metaverse: losses of more than ten billion dollars a year, with the stock price dropping from roughly $380 to $88. From 2024 to 2026, it’s AI: again the heavy spending, the frequent reorganizations, and a narrative of “trust me, I have a vision.”
The difference is, this time AI is a more tangible opportunity than the Metaverse. Meta has the cash to burn, with ad revenue generating abundant cash flow—Q4 2025 revenue hit $59.9 billion, up 24% year-over-year.
The problem: money can buy chips, compute, and even seats at the table, but not the people who stay.
Pang Ruiming chose OpenAI; Russ Salakhutdinov left; LeCun started his own venture.
Zuckerberg’s current bet is that as long as he can buy enough chips, build enough data centers, and spend enough money, he can find or cultivate the talent to use these resources.
This gamble might pay off. Meta remains one of the wealthiest tech companies globally, with over $100 billion in operating cash flow as its strongest moat. From OpenAI to Anthropic, from Google to other competitors, Meta continues to poach talent. According to QuantumBit, nearly 40% of Meta’s Superintelligence team of 44 came from OpenAI.
But the brutal truth of AI competition is that compute reserves, talent rosters, and model benchmarks are all public. The Llama 4 benchmark scandal proved that in this industry, you can’t sustain a lead on slide decks and PR alone.
Ultimately, the market only cares about one thing: how good is your model?
Position in the food chain
As the AI arms race enters 2026, the hierarchy is becoming clearer:
At the top are OpenAI and Google. OpenAI boasts the strongest models, largest user base, and most aggressive funding. Google has full vertical integration—self-developed chips, models, and cloud infrastructure. Anthropic follows closely, leveraging Claude’s product strength and dual compute supply from Google and Amazon, firmly in the first tier.
Meta? It has spent the most, signed the biggest chip contracts, and reorganized most often, but so far it has yet to produce a frontier model convincing enough for the market.
Meta’s AI story is somewhat like Yahoo in 2005. Once among the richest internet companies, it was aggressively acquiring and spending but couldn’t produce a search engine like Google’s. Money isn’t everything. Zuckerberg needs to clarify what Meta’s AI goal really is, rather than chasing every hot trend.
Of course, it’s too early to write Meta’s obituary. With 3.58 billion monthly active users, $59.9 billion quarterly revenue, and the world’s largest social data set, Meta’s assets are hard for any competitor to replicate.
If the next-generation model codenamed Avocado ships on schedule in 2026 and puts Meta back in the top tier, Zuckerberg’s spending and restructuring will be remembered as “strategic resilience.” If it falls short once again, the $135 billion will have bought little more than warehouses of expensive, idle silicon.
After all, Silicon Valley’s AI arms race has never lacked super buyers waving checks. What’s missing are the people who know how to forge the future with this compute power.
Meta: Can afford a trillion in computing power but can't keep the key people