Lao Huang wins! Nvidia's H100 orders are booked out to 2024, and even Musk can't sit still

Original source: Qubit

The NVIDIA H100, the best GPU for training large models, is completely sold out!

Even if you order one now, it will not arrive until Q1 or even Q2 of 2024.

That is the latest news that CoreWeave, a cloud provider closely tied to Nvidia, revealed to the Wall Street Journal.

Supply has been extremely tight since early April. In just one week, expected delivery times jumped from reasonable levels to the end of the year.

Amazon AWS, the world’s largest cloud vendor, also confirmed the news. CEO Adam Selipsky recently said:

A100 and H100 are state of the art… hard to get even for AWS.

Earlier, Musk also quipped on a talk show that GPUs are now harder to get than drugs.

If you buy through a "scalper," the premium runs as high as 25%.

On eBay, for example, prices have climbed from the roughly US$36,000 factory price to US$45,000, and supply remains scarce.

Against this backdrop, major Chinese technology companies such as Baidu, ByteDance, Alibaba, and Tencent have placed orders with Nvidia totaling US$5 billion for A800 and other chips.

Of that, only US$1 billion worth of goods can be delivered this year; the other 80% will have to wait until 2024.

So who are the existing high-end GPUs sold to? Where is this wave of production capacity stuck?

Who gets to buy the H100? Lao Huang has the final say

Since ChatGPT exploded in popularity, Nvidia's A100 and H100, the chips best suited for training large models, have been in hot demand.

H100s can even serve as collateral for startups seeking loans from investment funds.

Demand is enormous: AI companies like OpenAI and Meta, cloud providers like Amazon and Microsoft, private clouds such as CoreWeave and Lambda, and every technology company that wants to train its own large model.

However, it is essentially Nvidia CEO Jensen Huang (Huang Renxun) who has the final say on who gets to buy.

According to The Information, H100 supply is so tight that Nvidia has allocated large numbers of new cards to CoreWeave while limiting supply to established cloud computing companies such as Amazon and Microsoft.

(Nvidia has also invested directly in CoreWeave.)

Outside analysts believe this is because these established companies are developing their own AI accelerator chips in hopes of reducing their dependence on Nvidia, so Lao Huang is less inclined to support them.

Inside Nvidia, Lao Huang also controls every aspect of daily operations, even including "reviewing what sales representatives are going to say to small potential customers."

About 40 executives report directly to Lao Huang, more than the direct reports of Meta's Zuckerberg and Microsoft's Nadella combined.

A former Nvidia manager revealed, “At Nvidia, Huang Renxun is actually the chief product officer of every product.”

A while ago, it was also rumored that Lao Huang did something over the top: he asked some small cloud computing companies for their customer lists, wanting to know who the GPUs' end users are.

Outside analysts say the move helps Nvidia better understand customer demand for its products, but it has also raised concerns that Nvidia might use the information for its own advantage.

Others think there is another reason: Lao Huang wants to know who is actually using the cards and who is merely hoarding them unused.

Why do Nvidia and Lao Huang hold so much power right now?

The main reason is the severe imbalance between supply and demand for high-end GPUs. According to calculations by the GPU Utils website, the H100 shortfall runs as high as 430,000 units.

Its author, Clay Pascal, estimated the number of H100s various players in the AI industry will need in the near future, based on known information and rumors.

AI companies:

  • OpenAI may need 50,000 H100s to train GPT-5
  • Meta is said to need 100,000
  • Inflection AI has announced plans for a 22,000-card compute cluster
  • Major AI startups such as Anthropic, Character.ai, Mistral AI, and Europe's Helsing each need on the order of 10,000.

Cloud computing companies:

  • Among the large public clouds, Amazon, Microsoft, Google, and Oracle are each estimated at 30,000, for a total of 120,000
  • Private clouds such as CoreWeave and Lambda need a total of about 100,000

That adds up to 432,000; the quick tally below reruns the arithmetic.
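As a sanity check, the line items above do reproduce the headline figure. Here is a minimal tally in Python; the numbers are the article's estimates (with the four startups grouped at roughly 10,000 cards each), not independent data:

```python
# Tally of the H100 demand estimates quoted above (GPU Utils / Clay Pascal).
demand = {
    "OpenAI (GPT-5 training)": 50_000,
    "Meta": 100_000,
    "Inflection AI cluster": 22_000,
    "AI startups, ~10k each x 4": 40_000,   # Anthropic, Character.ai, Mistral AI, Helsing
    "Public clouds, 30k x 4": 120_000,      # Amazon, Microsoft, Google, Oracle
    "Private clouds": 100_000,              # CoreWeave, Lambda, etc.
}
print(sum(demand.values()))  # 432000 -- the ~430k shortfall cited above
```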

And this is before counting financial firms and other industry players such as JPMorgan Chase and Two Sigma, which have also begun building their own compute clusters.

So the question is, with such a large supply gap, can’t we produce more?

Lao Huang has thought about that too, but production capacity is stuck.

Where is production capacity stuck this time?

In fact, TSMC has already adjusted its production plan for Nvidia.

However, it still failed to fill such a huge gap.

Charlie Boyle, vice president and general manager of Nvidia's DGX systems, said the bottleneck this time is not wafer supply but the production capacity of TSMC's CoWoS packaging technology.

Competing with Nvidia for that capacity is Apple, which needs its A17 chips ready for the next-generation iPhone ahead of the September launch event.

TSMC recently said it expects to need about a year and a half to work the packaging backlog back to normal.

CoWoS packaging is TSMC's signature technology, and it is a key reason TSMC beat Samsung to become Apple's exclusive chip foundry.

Chips packaged with this technology deliver high performance and strong reliability, which is why the H100 can reach a memory bandwidth of 3TB/s (or even higher).

The full name of CoWoS is Chip-on-Wafer-on-Substrate, a 2.5D integration technology that assembles chips on a silicon interposer at the wafer level before mounting them on a substrate.

The technology can package multiple chips onto a silicon interposer only 100μm thick.

According to reports, the next-generation interposer will reach six times the reticle area, around 5,000mm² (a lithography reticle is at most about 26mm × 33mm, or 858mm², so six of them come to roughly 5,148mm²).

So far, apart from TSMC, no manufacturer has this level of packaging capability.

CoWoS is certainly powerful, but could Nvidia do without it? Could other manufacturers step in?

Not to mention that Lao Huang has already stated that “we will not consider adding a second H100 foundry”.

In reality, it might not be possible.

Nvidia has worked with Samsung before, but Samsung has never produced H100-series products for it, nor any other chips on a 5nm-class process.

Based on this, some speculate that Samsung's process may not yet meet Nvidia's requirements for cutting-edge GPUs.

As for Intel… its 5nm-class products don't seem to have materialized yet.

Since switching foundries is not feasible for Lao Huang, what about users switching directly to AMD?

AMD, can it work?

In terms of performance alone, AMD is indeed slowly catching up.

AMD's latest MI300X has 192GB of HBM3 memory and 5.2TB/s of bandwidth, and can run 80-billion-parameter models.

By comparison, the GH200 that Nvidia just announced has 141GB of HBM3e memory and 5TB/s of bandwidth.

But this does not mean AMD can immediately fill the gap left by Nvidia's cards——

Nvidia’s real “moat” lies in the CUDA platform.


CUDA has built up a complete development ecosystem, which means that users who buy AMD hardware face a longer porting and debugging cycle.
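To see why the switch is not plug-and-play, here is a minimal sketch of where the friction actually lives. It assumes a standard PyTorch setup; on AMD hardware, the ROCm build of PyTorch answers to the same "cuda" device string for built-in ops, but hand-written CUDA kernels still have to be ported (ROCm ships hipify tools for this) and then re-validated for correctness and speed:

```python
import torch

# On Nvidia hardware this selects the CUDA backend; on AMD, PyTorch's
# ROCm build reuses the "cuda" device name for its built-in operators.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4096, 4096, device=device)
y = x @ x  # library ops like matmul generally port cleanly...

# ...but custom CUDA kernels (e.g. compiled via torch.utils.cpp_extension
# or written as raw .cu files) are where the porting and re-debugging
# time described below actually goes.
print(device, y.shape)
```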

An executive of a private cloud company said that no one would dare to risk spending $300 million to deploy 10,000 AMD GPUs experimentally.

The executive believes that the development and debugging cycle may take at least two months.

With AI products being replaced at breakneck speed, a two-month gap could be fatal for any vendor.

However, Microsoft extended an olive branch to AMD.

Previously, there were rumors that Microsoft was preparing to jointly develop an AI chip code-named “Athena” with AMD.

Earlier, when the MI200 was released, Microsoft was the first to announce a purchase, deploying it on its Azure cloud platform.

For example, MSRA trained RetNet, its new large-model architecture, on 512 AMD MI200s a while ago.

With Nvidia occupying almost the entire AI compute market, someone has to lead the charge: a full large-scale AMD compute cluster needs to be proven out before anyone else dares to follow.

For the short term, though, the Nvidia H100 and A100 remain the mainstream choices.

One More Thing

A while ago, when Apple released the new M2 Ultra chip supporting up to 192GB of memory, many practitioners dreamed of using it to fine-tune large models.

After all, Apple's M-series chips unify RAM and video memory: 192GB of memory is 192GB of video memory, 2.4 times the 80GB of an H100, or 8 times the 24GB of an RTX 4090. The quick check below runs the numbers.
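Those ratios, and the reason the headroom matters, are quick to verify. A small hypothetical helper, assuming 2 bytes per parameter for half-precision weights:

```python
# Memory ratios quoted above.
m2_ultra, h100, rtx4090 = 192, 80, 24  # GB
print(m2_ultra / h100, m2_ultra / rtx4090)  # 2.4 8.0

def fp16_weights_gb(params_billion: float) -> float:
    """GB needed to hold just the weights at 2 bytes per parameter."""
    return params_billion * 2  # 1e9 params * 2 bytes = 2 GB per billion

# A 70B-parameter model needs ~140GB for weights alone: it fits in 192GB
# of unified memory but not in a single 80GB H100. Fine-tuning also needs
# gradients, optimizer state, and activations, so headroom shrinks fast.
print(fp16_weights_gb(70))  # 140.0
```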

However, when people actually bought the machine and tested it, training speed turned out to be no better than an Nvidia RTX 3080 Ti: not cost-effective for fine-tuning, let alone full training.

After all, the M-series chips' compute is not specifically optimized for AI workloads, and big memory alone is not enough.

It seems training large models still mainly depends on the H100, and the H100 is hard to come by.

Faced with this situation, a magical "GPU song" has even begun circulating online.

It's very catchy; listen at your own risk.

