Cambricon, Makers of Huawei's Kirin NPU IP, Build A Big AI Chip and PCIe Card

by Ian Cutress


Cambricon Technologies, the company that collaborated with HiSilicon / Huawei by licensing specialist AI silicon intellectual property for the Kirin 970 smartphone chipset, has gone solo and created its own series of chips for the data center.

The IP inside the Kirin 970 is known as Cambricon-1A, the company's first licensable IP. At the time, finding information on Cambricon was difficult: its website was a series of static images with the Chinese text embedded in the images themselves. Funnily enough, we used the AI-accelerated translate feature on the Huawei Mate 10 to translate what the website said. Fast forward 12-18 months, and the Cambricon website is now interactive and has information about upcoming products, a few of which were announced recently.

The Big Chip: Going for Data Center

Built on TSMC's 16FF process, the MLU100 is an 80W chip capable of 64 TFLOPS at traditional half precision, or 128 TOPS using the 8-bit integer metric commonly used in machine learning algorithms. These figures are for the 1.0 GHz 'standard' mode. Cambricon's CEO, Dr Chen Tianshi, stated that the new chip also has a high-performance mode at 1.30 GHz, which allows for 83.2 TFLOPS (16-bit float) or 166.4 TOPS (8-bit int), but power rises to 110W. This lowers performance per watt (from 0.80 to roughly 0.76 TFLOPS/W at half precision), but gives a faster chip. All of these figures rely on sparse data modes being enabled.
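The quoted figures follow simple linear clock scaling: 64 x 1.3 = 83.2 and 128 x 1.3 = 166.4. The minimal Python sketch below reproduces that arithmetic and the resulting performance per watt, assuming peak throughput scales linearly with clock and treating the 80W and 110W ratings as the power draw. The helper names are illustrative only and have nothing to do with Cambricon's SDK.

```python
# Sketch: peak throughput of the MLU100 at its two clock modes, assuming
# throughput scales linearly with clock (which matches the quoted figures).
# Numbers come from the article; the helper names are illustrative only.

BASE_CLOCK_GHZ = 1.0      # 'standard' mode
PERF_CLOCK_GHZ = 1.3      # high-performance mode

BASE_FP16_TFLOPS = 64.0   # half precision at 1.0 GHz
BASE_INT8_TOPS = 128.0    # 8-bit integer at 1.0 GHz


def scale_to_clock(value_at_base: float, clock_ghz: float) -> float:
    """Scale a peak-throughput figure linearly from the 1.0 GHz base clock."""
    return value_at_base * clock_ghz / BASE_CLOCK_GHZ


def per_watt(throughput: float, watts: float) -> float:
    """Throughput per watt of rated power."""
    return throughput / watts


perf_fp16 = scale_to_clock(BASE_FP16_TFLOPS, PERF_CLOCK_GHZ)
perf_int8 = scale_to_clock(BASE_INT8_TOPS, PERF_CLOCK_GHZ)

modes = [
    ("standard", BASE_FP16_TFLOPS, BASE_INT8_TOPS, 80.0),
    ("high-performance", perf_fp16, perf_int8, 110.0),
]
for name, fp16, int8, watts in modes:
    print(f"{name}: {fp16:.1f} TFLOPS FP16, {int8:.1f} TOPS INT8, "
          f"{per_watt(fp16, watts):.3f} TFLOPS/W")
```

The 1.3x clock bump buys 30% more throughput for 37.5% more power (110W versus 80W), which is why efficiency drops slightly in the high-performance mode.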


The technology behind the chip is Cambricon's latest MLUv01 architecture, which is understood to be a variant of the Cambricon-1A used in the Kirin chipsets, scaled up to something bigger and faster. Additional rules for data and power management obviously have to be implemented compared with the mobile IP. Cambricon also has its 1H architecture and the newly announced 1M architecture, although there has been no disclosure as to how these might relate to the chip.

David Schor from WikiChip (the main source of this article) states that this could be NVIDIA's first major ASIC competition for machine learning, if made available to commercial partners. To that end, Cambricon is also manufacturing a PCIe card.

Specification Comparison
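| AnandTech | Cambricon MLU100-Base | Cambricon MLU100-Perf | Tesla V100 (SXM2) | Tesla V100 (PCIe) |
|---|---|---|---|---|
| CUDA Cores | - | - | 5120 | 5120 |
| Tensor Cores | - | - | 640 | 640 |
| Core Clock | 1.0 GHz | 1.3 GHz | ? | ? |
| Boost Clock | - | - | 1455 MHz | 1370 MHz |
| Memory Clock | DDR4-3200 | DDR4-3200 | 1.75 Gbps HBM2 | 1.75 Gbps HBM2 |
| Memory Bus Width | 256-bit | 256-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 102.4 GB/sec | 102.4 GB/sec | 900 GB/sec | 900 GB/sec |
| VRAM | 16 GB / 32 GB | 16 GB / 32 GB | 16 GB / 32 GB | 16 GB / 32 GB |
| L2 Cache | - | - | 6 MB | 6 MB |
| Half Precision | 64.0 TFLOPS | 83.2 TFLOPS | 30 TFLOPS | 28 TFLOPS |
| Single Precision | - | - | 15 TFLOPS | 14 TFLOPS |
| Double Precision | - | - | 7.5 TFLOPS | 7 TFLOPS |
| Deep Learning | 128.0 TOPS | 166.4 TOPS | 120 TFLOPS | 112 TFLOPS |
| GPU | - | - | GV100 | GV100 |
| Transistor Count | ? | ? | 21B | 21B |
| TDP | 80 W | 110 W | 300 W | 250 W |
| Form Factor | PCIe | PCIe | SXM2 | PCIe |
| Cooling | Active | Active | Passive | Passive |
| Process | TSMC 16FF | TSMC 16FF | TSMC 12FFN | TSMC 12FFN |
| Architecture | Cambricon-1? | Cambricon-1? | Volta | Volta |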
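To put the table's numbers on a common footing, the sketch below divides each column's peak throughput by its TDP. It is only a rough comparison under stated assumptions: the Cambricon figures assume sparse data modes are enabled, TDP is a rating rather than measured power, and the "Deep Learning" row mixes 8-bit integer TOPS (Cambricon) with tensor-core TFLOPS (NVIDIA), so that row is not like-for-like. The `Accelerator` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical record holding a few figures copied from the table."""
    name: str
    fp16_tflops: float    # half-precision peak
    dl_throughput: float  # INT8 TOPS (Cambricon) or tensor TFLOPS (V100)
    tdp_watts: float

    def fp16_per_watt(self) -> float:
        return self.fp16_tflops / self.tdp_watts

    def dl_per_watt(self) -> float:
        # Not like-for-like across vendors: INT8 ops vs FP16 tensor FLOPs.
        return self.dl_throughput / self.tdp_watts


CARDS = [
    Accelerator("MLU100-Base", 64.0, 128.0, 80.0),
    Accelerator("MLU100-Perf", 83.2, 166.4, 110.0),
    Accelerator("Tesla V100 (SXM2)", 30.0, 120.0, 300.0),
    Accelerator("Tesla V100 (PCIe)", 28.0, 112.0, 250.0),
]

for card in CARDS:
    print(f"{card.name:<18} {card.fp16_per_watt():.3f} FP16 TFLOPS/W  "
          f"{card.dl_per_watt():.3f} 'deep learning' per W")
```

On paper the MLU100 delivers several times the half-precision throughput per watt of either V100, although the V100 also offers single- and double-precision throughput for which the table lists no Cambricon figures.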

NVIDIA, of course, has a strong user base and multiple generations of experience here, along with the software needed to take advantage of its hardware. Cambricon did not go into detail about how it plans to support SDKs for the new chip; however, it does offer a series of SDKs on its website supporting TensorFlow, Caffe, and MXNet.

Getting Into the Data Center: PCIe

The best way to be plug-and-play in a data center is through a PCIe card. Cambricon's MLU100 accelerator card is just that: a PCIe 3.0 x16 implementation with either 16 or 32 GB of DDR4-3200 memory on a 256-bit bus, which is good for 102.4 GB/s of bandwidth. Getting that much memory from NVIDIA requires its high-end cards, but those offer several times the memory bandwidth. The memory on the MLU100 card also has ECC enabled.
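Peak DRAM bandwidth is the bus width in bytes multiplied by the transfer rate, which is how a 256-bit DDR4-3200 interface arrives at 102.4 GB/s. The sketch below applies the same formula to the V100's 4096-bit HBM2 at 1.75 Gbps per pin. `peak_bandwidth_gbs` is an illustrative helper, and the result is a theoretical peak rather than sustained bandwidth.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: total interface width in bits
    data_rate_mtps: transfers per second per pin, in millions (MT/s)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps / 1000  # MB/s -> GB/s


# MLU100: DDR4-3200 (3200 MT/s) on a 256-bit bus
mlu100 = peak_bandwidth_gbs(256, 3200)
# Tesla V100: HBM2 at 1.75 Gbps per pin (1750 MT/s) on a 4096-bit bus
v100 = peak_bandwidth_gbs(4096, 1750)

print(f"MLU100: {mlu100:.1f} GB/s")
print(f"V100:   {v100:.1f} GB/s ({v100 / mlu100:.2f}x the MLU100)")
```

The V100 works out to 896 GB/s, close to the 900 GB/s quoted in the spec table and roughly 8.75 times the MLU100's 102.4 GB/s.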


Reports so far state that Lenovo is offering the cards as add-ons to its ThinkSystem SR650 dual Intel Xeon servers, up to two per machine. Judging by the Lenovo website, they do not appear to be available quite yet. Given Huawei's big enterprise presence, we may well see the chips in its systems as well.

Next Generation: 5 TOPS/Watt

Also reported was the new Cambricon-1M product IP, although the company was not forthcoming with details. WikiChip states that this new IP is built primarily for 7nm, so we are likely to see it when Huawei/HiSilicon starts shipping 7nm mobile processors, and later in the next generation of server-focused products. The goal for this IP is to hit 5 TOPS/Watt, compared to the 3 TOPS/Watt advertised for ARM's IP. David also states that Cambricon has a training and inference chip planned for later this year, with another update in 2019.
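An efficiency target is easiest to read as power for a fixed amount of work. The sketch below asks how much power 128 TOPS (the MLU100-Base's peak) would need at 5 TOPS/Watt versus 3 TOPS/Watt, alongside the MLU100-Base's own 1.6 TOPS/W at its 80W rating. This is back-of-the-envelope arithmetic only: the 5 and 3 TOPS/W figures describe IP blocks rather than whole boards with memory, so treat it as an assumption-laden illustration, not a prediction for any Cambricon product.

```python
MLU100_BASE_TOPS = 128.0   # peak INT8 throughput at 1.0 GHz
MLU100_BASE_WATTS = 80.0   # rated power of the whole chip


def watts_needed(tops: float, tops_per_watt: float) -> float:
    """Power required to sustain `tops` at a given efficiency."""
    return tops / tops_per_watt


efficiencies = {
    "MLU100-Base (chip)": MLU100_BASE_TOPS / MLU100_BASE_WATTS,  # 1.6 TOPS/W
    "ARM IP (advertised)": 3.0,
    "Cambricon-1M (target)": 5.0,
}

for label, eff in efficiencies.items():
    print(f"{label:<22} {eff:.1f} TOPS/W -> "
          f"{watts_needed(MLU100_BASE_TOPS, eff):.1f} W for 128 TOPS")
```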


Source: WikiChip, Cambricon 1, Cambricon 2
