- Spirent announced a solution to emulate AI workloads over Ethernet
- VP of Product Management Aniket Khosla said packet loss is a “death knell” for Ethernet on the back-end network
- Analyst Roy Chua said Spirent’s solution is relevant for telcos if they want to invest in AI model training
Spirent wants to get artificial intelligence (AI) out of traffic jams across the Ethernet highway.
The company at the end of February announced a solution to test AI workloads over Ethernet. Essentially, customers can emulate 400G xPU workloads for AI environments without having to build actual labs for testing.
Spirent VP of Product Management Aniket Khosla said “nobody knows how to test with Ethernet right now.” The company wants to change that.
He explained that in a traditional data center there are front-end and back-end networks. The front end "is more of your traditional switching network.” The back-end network is where AI resides, and where concerns about packet loss come into play. Packet loss occurs when packets of data fail to reach their intended destination.
If a front-end network experiences packet loss, “it’s not catastrophic,” Khosla said. “If you lose a packet, there’s high latency. The network just works fine and recovers.”
The same can’t be said for a back-end network.
“In an AI data center network specifically, if there is packet loss or latency, it’s kind of a death knell to that network,” he said. “Everything comes to a grinding halt.”
The reason comes down to this: Ethernet “wasn’t designed to be a lossless network,” one where the devices that make up the fabric are configured to prevent packet loss.
RELATED: DriveNets revamps Ethernet to connect AI megaclusters in the cloud
“That’s not what it was built to do,” Khosla stated. “People are deploying Ethernet on their back-end networks right now, but there is no way for them to test how that Ethernet fabric performs and what the implications are to the back-end data center if there is packet loss and latency.”
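Why does one lost packet grind everything to a halt? In synchronous distributed training, each step typically ends with a collective operation that acts as a barrier, so the step finishes only when the slowest network flow finishes. The toy sketch below (not Spirent's tooling; the flow counts and timeout value are illustrative assumptions) shows how a single retransmit delay on one flow stalls the entire job:

```python
# Illustrative sketch only: a collective op (e.g. all-reduce) is a
# barrier, so step time is set by the slowest flow in the fabric.

def step_time_ms(flow_times_ms):
    """Step completes only when every flow completes."""
    return max(flow_times_ms)

# 1,024 flows, each normally taking 10 ms (hypothetical numbers)
flows = [10.0] * 1024
print(step_time_ms(flows))   # 10.0 -- healthy fabric

# One dropped packet triggers, say, a 200 ms retransmit timeout
flows[42] += 200.0
print(step_time_ms(flows))   # 210.0 -- every GPU in the job waits
```

A 2% slowdown on one flow would barely matter on a front-end network; here it becomes a 21x slowdown for the whole step, which is the "death knell" dynamic Khosla describes.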
AI training workloads are massive, so traffic has to be split into smaller chunks and distributed across the entire data center network. Spirent mimics those distributed traffic patterns from its hardware, “which no one’s been able to do so far," he said.
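The chunk-and-distribute pattern described above can be sketched in a few lines. This is a hypothetical illustration of ring-all-reduce-style traffic spraying, not Spirent's emulation logic; the payload size and node count are made up:

```python
# Hypothetical sketch: one worker's large payload split into equal
# per-peer chunks and distributed across every node in the fabric.

def chunk_plan(payload_bytes, num_nodes):
    """Return (peer, chunk_size) pairs covering the payload."""
    chunk = payload_bytes // num_nodes
    return [(peer, chunk) for peer in range(num_nodes)]

# 8 GiB of gradients sprayed across a 1,024-node cluster
plan = chunk_plan(payload_bytes=8 * 2**30, num_nodes=1024)
print(len(plan), plan[0][1] // 2**20)   # 1024 chunks of 8 MiB each
```

Emulating that many simultaneous, synchronized flows is what distinguishes AI back-end traffic from ordinary front-end switching traffic.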
Most of the customers Spirent has talked to about AI and the data center back-end have been “the big U.S. hyperscalers,” Khosla noted. Telcos? Not so much – yet.
Of the 30 customers Khosla has spoken to, there has been one telco. And that telco is “still trying to figure out how to use this stuff and build out their AI capabilities,” he said.
“They’re still figuring out how they’re going to use and monetize these AI pieces, whether they choose to build the back-end data center networks themselves, whether they choose to go to a third-party cloud provider to provide these services. I think it’s still early days,” said Khosla.
Getting the job done quicker
AvidThink analyst Roy Chua said Spirent’s AI traffic emulation is important for anyone “who wants to run a medium to large distributed training setup with hundreds/thousands to tens of thousands of CPUs/GPUs” and understand how their Ethernet network fabric will perform in that environment.
Customers can leverage Spirent’s solution to pick between fabric vendors or to work out “appropriate configurations and architectures” that can help reduce job completion time (JCT).
RELATED: AI demands 5x more fiber in the data center
“For any model training activity, reducing job completion time can help improve the utilization of a very expensive resource, plus it reduces the time needed to train models with a large number of parameters,” Chua told Fierce.
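Chua's utilization point is simple arithmetic. The figures below are illustrative assumptions (not from the article), but they show why even a modest JCT reduction matters on a reserved GPU fleet:

```python
# Back-of-the-envelope math: JCT x fleet size x hourly rate.
# All numbers are hypothetical, for illustration only.

def job_cost_usd(jct_hours, num_gpus, gpu_hour_usd):
    """Cost of one training job on a dedicated GPU fleet."""
    return jct_hours * num_gpus * gpu_hour_usd

baseline = job_cost_usd(jct_hours=100, num_gpus=1024, gpu_hour_usd=2.0)
tuned    = job_cost_usd(jct_hours=90,  num_gpus=1024, gpu_hour_usd=2.0)
print(baseline - tuned)   # a 10% JCT cut saves $20,480 on this one job
```

Scaled across repeated training runs, fabric tuning that trims JCT pays for itself quickly.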
As for how Spirent’s solution impacts telcos, Chua said it depends on how much they want to invest in AI model training or fine-tuning. It’s relevant if telcos are looking to “benchmark Ethernet fabrics” they're considering for their own AI data center buildout.
Emulating AI traffic would also allow telcos to estimate the impact on JCT before acquiring “a large number of servers with CPUs/GPUs.”
Still early innings for telco AI
“It's still unclear if many telcos will be training their own models at scale,” Chua noted.
Some operators, such as South Korea’s SK Telecom, are “pushing aggressively” into generative AI. In fact, SK Telecom has invested $100 million in Anthropic, one of AI’s hottest startups.
Deutsche Telekom also “appears to be very interested” in this space, Chua added, as the operator recently landed its first Business GPT customer.
“Other telcos are waiting to see early outcomes before jumping in,” he said. “These telcos are content to let the foundation model providers and other parts of the ecosystem (like hyperscalers) invest in the training infrastructure.”
Separately, Spirent last week announced it is being acquired by network test company Viavi Solutions. Perhaps Viavi can help bring Spirent’s product to the forefront for telcos.