Social Media Company, Meta, has announced that it’s building the world’s fastest AI computer which will be completed in mid-2022.
“Meta is announcing that we’ve designed and built the AI Research SuperCluster (RSC) — which we believe is among the fastest AI supercomputers running today and will be the fastest AI supercomputer in the world when it’s fully built out in mid-2022,” the company said in a blogpost.
According to Meta, its researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one-day training models with trillions of parameters.
The company said the supercomputer will help its AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more. Our researchers will be able to train the largest models needed to develop advanced AI for computer vision, NLP, speech recognition, and more.
Meta also plans to use the AI computer to fast track its inroads into the Metaverse. It said. “We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together. Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform — the metaverse, where AI-driven applications and products will play an important role”
RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs — with each A100 GPU being more powerful than the V100 used in our previous system.
“Early benchmarks on RSC, compared with Meta’s legacy production and research infrastructure, have shown that it runs computer vision workflows up to 20 times faster, runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. That means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before,” it stated.
When RSC is complete, Meta says the InfiniBand network fabric will connect 16,000 GPUs as endpoints, making it one of the largest such networks deployed to date. Additionally, we designed a caching and storage system that can serve 16 TB/s of training data, and we plan to scale it up to 1 exabyte.