At Supercomputing 2024 (SC24), Enfabrica Corporation unveiled a milestone in AI data center networking: the Accelerated Compute Fabric (ACF) SuperNIC chip. This 3.2 Terabit-per-second (Tbps) Network Interface Card (NIC) SoC targets large-scale AI and machine learning (ML) operations by enabling massive scalability, supporting clusters of over 500,000 GPUs. Enfabrica also raised $115 million in funding and is expected to release its ACF SuperNIC chip in Q1 2025.
Addressing AI Networking Challenges
As AI models grow increasingly large and complex, data centers face mounting pressure to connect large numbers of specialized processing units, such as GPUs. These GPUs are essential for high-speed computation in training and inference but are often left idle due to inefficient data movement across existing network architectures. The challenge lies in interconnecting thousands of GPUs effectively to ensure optimal data transfer without bottlenecks or performance degradation.
Traditional networking approaches can link roughly 100,000 AI computing chips in a data center before inefficiencies and slowdowns become significant. According to Enfabrica’s CEO, Rochan Sankar, the company’s new technology supports up to 500,000 chips in a single AI/ML system, enabling larger and more reliable AI model computations. By overcoming the limitations of conventional NIC designs, Enfabrica’s ACF SuperNIC maximizes GPU utilization and minimizes downtime.
Key Innovations in the ACF SuperNIC
The ACF SuperNIC boasts several industry-first features tailored to modern AI data center needs:
- High-Bandwidth, Multi-Port Connectivity: The ACF SuperNIC delivers multi-port 800-Gigabit Ethernet to GPU servers, quadrupling the bandwidth compared to other GPU-attached NICs. This setup provides unprecedented throughput and enhances multipath resiliency, ensuring robust communication across AI clusters.
- Efficient Two-Tier Network Design: With a high-radix configuration of 32 network ports and up to 160 PCIe lanes, the ACF SuperNIC simplifies the overall architecture of AI data centers. This allows operators to build massive clusters using fewer network tiers, reducing latency and improving data transfer efficiency across GPUs.
- Scaling Up and Scaling Out: The Enfabrica ACF SuperNIC, with its high-radix, high-bandwidth, and concurrent PCIe/Ethernet multipathing and data mover capabilities, can uniquely scale up and scale out four to eight latest-generation GPUs per server system. This significantly increases AI clusters’ performance, scale, and resiliency, ensuring optimal resource utilization and network efficiency.
- Integrated PCIe Interface: The chip supports 128 to 160 PCIe lanes, delivering speeds over 5 Tbps. This design allows multiple GPUs to connect to a single CPU while maintaining high-speed communication with data center spine switches. The result is a more efficient and flexible architecture that supports large-scale AI workloads.
- Resilient Message Multipathing (RMM): Enfabrica’s proprietary RMM technology boosts the reliability of AI clusters. By mitigating the impact of network link failures or flaps, RMM prevents job stalls, ensuring smoother and more efficient AI training processes. Sankar notes the importance of this feature, especially in large setups where failures of links to switches become frequent.
- Software-Defined RDMA Networking: This feature gives data center operators full-stack programmability and debuggability, bringing the benefits of software-defined networking (SDN) to Remote Direct Memory Access (RDMA) setups. It allows customization of the transport layer, which can optimize cloud-scale network topologies without sacrificing performance.
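The bandwidth figures in the list above can be sanity-checked with simple arithmetic. The sketch below is a rough check, not a vendor specification. It assumes PCIe 5.0 signalling at 32 GT/s per lane, which the article does not state, and counts raw one-direction bandwidth before encoding and protocol overhead. The function names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the article.
# Assumption (NOT stated in the article): PCIe 5.0 at 32 GT/s per lane,
# counted as raw one-direction bandwidth with no encoding/protocol overhead.

NIC_TOTAL_TBPS = 3.2            # from the article: 3.2 Tbps SoC
ETHERNET_PORT_GBPS = 800        # from the article: 800-Gigabit Ethernet ports
NETWORK_PORTS = 32              # from the article: 32 network ports
PCIE_GTPS_PER_LANE = 32         # assumption: PCIe 5.0 raw rate per lane


def ports_at_full_speed(total_tbps: float, port_gbps: int) -> int:
    """How many ports of the given speed add up to the total bandwidth."""
    return round(total_tbps * 1000 / port_gbps)


def pcie_raw_tbps(lanes: int, gtps_per_lane: int = PCIE_GTPS_PER_LANE) -> float:
    """Raw aggregate PCIe bandwidth in Tbps for a given lane count."""
    return lanes * gtps_per_lane / 1000


# 3.2 Tbps is equivalent to four ports running at 800 Gbps each.
full_speed_ports = ports_at_full_speed(NIC_TOTAL_TBPS, ETHERNET_PORT_GBPS)

# If all 32 ports shared the 3.2 Tbps evenly, each would carry 100 Gbps.
gbps_per_port_if_all_used = NIC_TOTAL_TBPS * 1000 / NETWORK_PORTS

# Under the PCIe 5.0 assumption, only the 160-lane figure exceeds 5 Tbps.
pcie_128 = pcie_raw_tbps(128)
pcie_160 = pcie_raw_tbps(160)

print(f"800G ports needed for 3.2 Tbps: {full_speed_ports}")
print(f"Per-port rate with all 32 ports in use: {gbps_per_port_if_all_used:.0f} Gbps")
print(f"Raw PCIe bandwidth: {pcie_128:.3f} Tbps (128 lanes), {pcie_160:.2f} Tbps (160 lanes)")
```

Under these assumptions, the "over 5 Tbps" figure corresponds to the 160-lane configuration, and the 3.2 Tbps network side can be presented either as a few 800 Gbps ports or as many slower ports.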
Enhanced Resiliency and Efficiency
Traditional systems often require one-to-one connections between GPUs and various components, such as PCIe switches and RDMA NICs. However, as the number of GPUs in a system increases, the likelihood of failures on links to switches grows, with potential disruptions occurring as often as every 23 minutes in setups with over 100,000 GPUs, according to Sankar.
The ACF SuperNIC addresses this issue by enabling multiple connections from GPUs to switches. This redundancy minimizes the impact of individual component failures, boosting system uptime and reliability.
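To see why link failures dominate at scale and why redundant paths help, here is a minimal sketch of the arithmetic. It takes the 23-minute figure from the article and adds several simplifying assumptions that are not from the source: failures are independent, the cluster-wide failure rate grows linearly with GPU count, each GPU has one link in the single-path case, and each path has a hypothetical 0.1% unavailability. It is an illustrative model, not a description of how RMM works internally.

```python
# Illustrative failure arithmetic. Only the 23-minute / 100,000-GPU figure
# comes from the article; everything else is a simplifying assumption.

MINUTES_BETWEEN_FAILURES_AT_REF = 23   # from the article
REFERENCE_GPUS = 100_000               # from the article
MINUTES_PER_YEAR = 60 * 24 * 365


def minutes_between_failures(gpus: int) -> float:
    """Expected cluster-wide gap between link failures, assuming the
    failure rate scales linearly with the number of GPUs."""
    return MINUTES_BETWEEN_FAILURES_AT_REF * REFERENCE_GPUS / gpus


def prob_all_paths_down(path_unavailability: float, paths: int) -> float:
    """Probability that all independent paths are down at the same time."""
    return path_unavailability ** paths


# Implied mean time between failures for a single link (one link per GPU).
per_link_mtbf_years = (
    MINUTES_BETWEEN_FAILURES_AT_REF * REFERENCE_GPUS / MINUTES_PER_YEAR
)

# Scaling the same per-link reliability to a 500,000-GPU cluster.
gap_at_500k = minutes_between_failures(500_000)

# Hypothetical per-path unavailability of 0.1%, single path vs. four paths.
HYPOTHETICAL_UNAVAILABILITY = 0.001
single_path = prob_all_paths_down(HYPOTHETICAL_UNAVAILABILITY, 1)
four_paths = prob_all_paths_down(HYPOTHETICAL_UNAVAILABILITY, 4)

print(f"Implied per-link MTBF: {per_link_mtbf_years:.2f} years")
print(f"Expected gap between failures at 500,000 GPUs: {gap_at_500k:.1f} minutes")
print(f"Chance a GPU is cut off: {single_path:.0e} (1 path), {four_paths:.0e} (4 paths)")
```

The point of the sketch is that even very reliable individual links (years between failures) produce a failure every few minutes across hundreds of thousands of GPUs, while a GPU with several independent paths is cut off only when all of them fail together.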
The SuperNIC also introduces the Collective Memory Zoning feature, which supports zero-copy data transfers and optimizes host memory management. By reducing latency and improving memory efficiency, this technology maximizes the floating-point operations per second (FLOPs) utilization of GPU server fleets.
Scalability and Operational Benefits
The ACF SuperNIC’s design is not only about scale but also about operational efficiency. It provides a software stack that integrates with standard communication, existing interfaces, and RDMA networking operations. This compatibility ensures efficient deployment across diverse AI compute environments composed of GPUs and accelerators (AI chips) from different vendors. Data center operators benefit from streamlined networking infrastructure, reducing complexity and improving the flexibility of their AI data centers.
Availability and Future Prospects
Enfabrica’s ACF SuperNIC will be available in limited quantities in Q1 2025, with both the chips and pilot systems now open for orders through Enfabrica and selected partners. As AI models demand higher performance and larger scales, Enfabrica’s innovative approach could play a pivotal role in shaping the next generation of AI data centers designed to support frontier AI models.