08-04-2025 | Keysight Technologies | Test & Measurement
Keysight Technologies, Inc. has introduced the Keysight AI (KAI) Data Centre Builder, an advanced software suite that emulates real-world workloads to evaluate how new algorithms, components, and protocols affect the performance of AI training. The suite's workload emulation capability brings LLM and other AI model training workloads into the design and validation of AI infrastructure components: networks, hosts, and accelerators. The solution enables closer alignment between hardware design, protocols, architectures, and AI training algorithms, which boosts system performance.
AI operators use a range of parallel processing strategies, also known as model partitioning, to speed up AI model training. Aligning model partitioning with AI cluster topology and configuration improves training performance. During the AI cluster design phase, critical questions are best answered through experimentation, and many of them concern the efficiency of data movement between GPUs. Key considerations include:
Scale-up design of GPU interconnects inside an AI host or rack
Scale-out network design, including bandwidth per GPU and topology
Configuration of network load balancing and congestion control
Tuning of the training framework parameters
The workload emulation solution reproduces the network communication patterns of real-world AI training jobs. This accelerates experimentation, shortens the learning curve, and provides deeper insight into the causes of performance degradation, which is difficult to obtain from real AI training jobs alone. Keysight customers can access a library of LLM workloads such as GPT and Llama, with a selection of popular model partitioning schemas such as Data Parallel (DP), Fully Sharded Data Parallel (FSDP), and 3D parallelism.
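To illustrate why model partitioning has to be aligned with cluster topology, the following is a minimal Python sketch of how a 3D-parallel job might assign GPU ranks to data-, pipeline-, and tensor-parallel coordinates. It is not Keysight code and does not use any KAI Data Centre Builder API. The function names, the rank ordering (tensor-parallel index varying fastest), and the assumption that ranks fill hosts in order are all hypothetical choices made for illustration.

```python
from itertools import product


def rank_to_coords(rank, dp, pp, tp):
    """Map a flat GPU rank to (dp_idx, pp_idx, tp_idx).

    Assumed ordering: the tensor-parallel index varies fastest,
    then pipeline-parallel, then data-parallel.
    """
    if not 0 <= rank < dp * pp * tp:
        raise ValueError("rank out of range for dp * pp * tp GPUs")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


def tensor_parallel_groups(dp, pp, tp):
    """List the ranks that form each tensor-parallel group."""
    groups = []
    for d, p in product(range(dp), range(pp)):
        base = (d * pp + p) * tp
        groups.append(list(range(base, base + tp)))
    return groups


def groups_spanning_hosts(groups, gpus_per_host):
    """Return the groups whose ranks land on more than one host.

    Assumes ranks are placed on hosts in order, gpus_per_host at a time.
    """
    return [g for g in groups if len({r // gpus_per_host for r in g}) > 1]


if __name__ == "__main__":
    groups = tensor_parallel_groups(dp=2, pp=2, tp=4)
    print(rank_to_coords(13, dp=2, pp=2, tp=4))
    print(groups_spanning_hosts(groups, gpus_per_host=8))
    print(groups_spanning_hosts(groups, gpus_per_host=6))
```

Under these assumptions, a tensor-parallel group that stays within one host communicates over the scale-up interconnect, while a group that spans hosts sends its traffic over the scale-out network. Changing the partition sizes or the host size changes which groups cross that boundary.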
The workload emulation application in the solution enables AI operators to:
Experiment with parallelism parameters, including partition sizes and their distribution over the available AI infrastructure (scheduling)
Understand the impact of communications within and among partitions on overall job completion time (JCT)
Identify low-performing collective operations and drill down to identify bottlenecks
Analyse network utilisation, tail latency, and congestion to understand their impact on JCT
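The link between tail latency and JCT can be sketched with a simple model: a collective operation finishes only when its slowest flow finishes, so a single slow flow delays the whole training step. The Python sketch below is an illustration under stated assumptions, not Keysight's measurement method. It assumes compute and communication do not overlap, uses a nearest-rank percentile, and all function names and flow times are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with 0 < pct <= 100."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    k = math.ceil(pct / 100 * len(ordered))
    return ordered[k - 1]


def collective_time(flow_times_ms):
    """A collective completes only when its slowest flow completes."""
    return max(flow_times_ms)


def job_completion_time(compute_ms, iterations_flow_times):
    """Total time over all iterations.

    Assumes each iteration is compute followed by one collective,
    with no overlap between the two.
    """
    return sum(compute_ms + collective_time(f) for f in iterations_flow_times)


if __name__ == "__main__":
    # Hypothetical flow completion times in milliseconds.
    congested = [10, 10, 10, 50]
    clean = [10, 10, 10, 10]
    print(percentile(congested, 50), percentile(congested, 99))
    print(job_completion_time(100, [congested, clean]))
```

In this model the median flow time of the congested iteration looks healthy, yet the step takes as long as the slowest flow. This is why tail latency, rather than average latency, is the metric to relate to JCT.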
The solution's new workload emulation capabilities allow AI operators, GPU cloud providers, and infrastructure vendors to bring realistic AI workloads into their lab setups to validate the evolving designs of AI clusters and new components. They can also experiment to fine-tune model partitioning schemas, parameters, and algorithms to optimise the infrastructure and improve AI workload performance.
Ram Periakaruppan, vice president and general manager of Network Test and Security Solutions, Keysight, said: "As AI infrastructure grows in scale and complexity, the need for full-stack validation and optimisation becomes crucial. To avoid costly delays and rework, it's essential to shift validation to earlier phases of the design and manufacturing cycle. KAI Data Centre Builder's workload emulation brings a new level of realism to AI component and system design, optimising workloads for peak performance."
KAI Data Centre Builder is the foundation of the Keysight Artificial Intelligence (KAI) architecture, a portfolio of end-to-end solutions designed to help customers scale AI processing capacity in data centres by validating AI cluster components using real-world AI workload emulation.