NVIDIA to Construct UK's Most Powerful AI Supercomputer

02-11-2020 | By Sam Brown

NVIDIA recently announced plans to construct the UK's most powerful AI supercomputer, Cambridge-1. What are supercomputers, what hardware will NVIDIA be using, and how will it help advance the medical field?

What are supercomputers?

Supercomputers are computing systems that have significantly more processing power than the general-purpose computers of their day. This makes defining a supercomputer somewhat tricky, as the definition depends on the technology available at the time. For example, the Cray-1, launched in the 1970s, was significantly more powerful than any other machine of its era, but by today's standards it is on par with a 64-bit ARM microcontroller (the Cray-1 was a 64-bit machine with a clock frequency of 80MHz).

As the most technologically advanced computing systems of their time, supercomputers are used in the most demanding applications, those that involve large amounts of complex data processing. For example, supercomputers are often found in physics, medicine, geology, meteorology, molecular modelling, and engineering simulation. However, it is important to note that such a system cannot necessarily execute a piece of generic code (such as a word processor or spreadsheet) any faster than a desktop PC. Modern supercomputers do not use customised CPUs with higher specs than standard CPUs, but instead use the same off-the-shelf parts in great numbers. This means that modern supercomputers are geared around massively parallel computing, whereby a problem is broken down into many individual calculations that can all be executed at the same time.
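The idea of breaking a problem into independent calculations can be sketched in a few lines of Python. This is only an illustration of the decomposition, not of how a supercomputer is actually programmed: the function names are hypothetical, and the standard-library thread pool is used purely to show the split/compute/combine pattern (for CPU-bound Python code it would not by itself give a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """One independent unit of work: it needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def split(values, parts):
    """Split values into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    """Split the problem, compute each piece separately, then combine."""
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(1_000))
total = parallel_sum_of_squares(data)
print(total)
```

Because no chunk depends on the result of another, the same pattern scales from four workers on a desktop to thousands of processors; only the final combining step needs all of the partial results.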

Why do AI systems require large amounts of processing power?

Artificial Intelligence, or AI, is the ability of a computing system to apply intelligence to a problem by learning from past information to recognise new patterns. To illustrate the difference between an AI system and a hardcoded pattern recognition system: an AI can identify a cat in a photo without ever having seen that photo before, whereas a hardcoded pattern recognition system would only be able to find specific cats that it had seen previously. The AI, understanding that what it has detected is a cat, can then adjust itself to better recognise other cats in future pictures. What makes AI very powerful is that it can recognise patterns that have similarities (such as the number of eyes, nose shape, and teeth) while ignoring others (such as fur colour).

Creating effective AI systems requires large amounts of data so that the AI has the maximum opportunity to learn. This is why AI face and speech recognition have only recently become integrated into technology: tech companies now have large amounts of facial and speech data to learn from. But this data is far too great for a single computer to feed into a machine learning algorithm. Thus, datacentres are used, as they allow for large-scale parallel processing, which significantly cuts data processing time. However, even large datacentres are not fully optimised to perform AI learning tasks, as these involve deep neural nets with complex matrices, vectors, and large numbers of floating-point operations. While generic CPUs are not efficient at such tasks, GPUs are, which is why many AI systems are designed to run on graphics processing units, especially those designed by NVIDIA. Thus, an AI supercomputer shifts its hardware focus onto GPUs as opposed to CPUs.
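To see why neural nets map well onto GPUs, consider a minimal sketch of a single fully connected layer written in plain Python. The function name, the example numbers, and the choice of a ReLU activation are all assumptions made for illustration; real systems use optimised GPU libraries rather than Python loops.

```python
def dense_layer(inputs, weights, biases):
    """Forward pass of one fully connected layer with a ReLU activation.

    inputs:  list of n floats
    weights: list of m rows, each holding n floats (one row per output neuron)
    biases:  list of m floats
    """
    outputs = []
    for row, bias in zip(weights, biases):
        # Each output is a multiply-accumulate over the inputs. No output
        # depends on any other, so all rows could be computed at once.
        activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, activation))  # ReLU: clamp negatives to zero
    return outputs


inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0, 0.25],
           [1.0, 1.0, 1.0]]
biases = [1.0, -2.0]

result = dense_layer(inputs, weights, biases)
print(result)  # [0.25, 4.0]
```

The loop body is the same small floating-point calculation repeated for every output, which is the kind of workload a GPU, with its thousands of simple arithmetic units, is built to run in parallel.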

What is the Cambridge-1 AI Supercomputer?

Recognising the importance of GPUs in AI applications, NVIDIA has announced that it will build the UK's most powerful AI supercomputer, which will also rank as the 29th most powerful in the world. The system, which will be ready by the end of the year (2020), will provide over 400 petaflops of AI processing capability and 8 petaflops of Linpack performance. The computer system has four main focuses: joint industry research, university-granted compute time, support for AI start-ups, and education on AI systems.

The supercomputer is constructed using 80 NVIDIA DGX A100 systems connected by NVIDIA Mellanox InfiniBand networking. Each NVIDIA DGX A100 system integrates 8 NVIDIA A100 GPUs with 320GB of total GPU memory, 6 NVIDIA NVSwitches which provide 4.8TB/s of bi-directional switching bandwidth, 450GB/s peak bi-directional networking connections, dual 64-core AMD CPUs, and a 15TB Gen4 NVMe SSD with a peak data transfer rate of 25GB/s. The NVIDIA A100 GPUs integrate Tensor Cores, including double-precision support, which accelerate the matrix operations used by AI neural nets.
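The per-system figures quoted above can be scaled up to give a rough picture of the whole machine. The short calculation below uses only numbers from this article; the variable names are illustrative, decimal units (1TB = 1,000GB) are assumed, and the per-system AI figure is a lower bound derived from the "over 400 petaflops" total rather than an official specification.

```python
# Figures quoted in the article
DGX_SYSTEMS = 80
GPUS_PER_SYSTEM = 8
GPU_MEMORY_PER_SYSTEM_GB = 320
CPUS_PER_SYSTEM = 2
CORES_PER_CPU = 64
AI_PETAFLOPS_TOTAL = 400  # quoted as "over 400", so treat as a lower bound

# Derived totals for the full machine
total_gpus = DGX_SYSTEMS * GPUS_PER_SYSTEM
memory_per_gpu_gb = GPU_MEMORY_PER_SYSTEM_GB // GPUS_PER_SYSTEM
total_gpu_memory_tb = DGX_SYSTEMS * GPU_MEMORY_PER_SYSTEM_GB / 1000  # decimal TB
total_cpu_cores = DGX_SYSTEMS * CPUS_PER_SYSTEM * CORES_PER_CPU
ai_petaflops_per_system = AI_PETAFLOPS_TOTAL / DGX_SYSTEMS

print(f"GPUs in total:           {total_gpus}")
print(f"Memory per GPU:          {memory_per_gpu_gb} GB")
print(f"GPU memory in total:     {total_gpu_memory_tb} TB")
print(f"CPU cores in total:      {total_cpu_cores}")
print(f"AI petaflops per system: at least {ai_petaflops_per_system}")
```

In other words, the machine combines 640 GPUs with roughly 25.6TB of GPU memory, and each DGX A100 system contributes at least 5 petaflops of AI performance.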

What will the Cambridge-1 AI Supercomputer help with?

While the supercomputer is designed to be used for a wide range of AI-based applications including research, start-ups, and joint ventures, one of its main applications is to help accelerate medical development in the UK. NVIDIA Clara for Computational Drug Discovery is an example of a tool that the system will run to help find effective drugs by utilising data from radiology, genomics, and imaging. Such a tool helps to remove the need for laboratory discovery of new drugs, and thus allows researchers to narrow their field of search when exploring the effects of new compounds. The same system also integrates natural language processing, which allows it to look through large numbers of research papers, literature, and databases to find important information. The Cambridge-1 AI Supercomputer will also be used to supercharge healthcare research to help improve patient care, diagnosis, and delivery of medical supplies. For example, such a system could be used to determine more accurately how viruses spread, and to guide targeted vaccines and/or lockdowns to help contain the pathogen.
