CCIX: Low Latency, Coherent Interconnect
CCIX (Cache Coherent Interconnect for Accelerators) is an interconnect protocol that enables low-latency access to memory and devices across the PCIe bus. CCIX implementations are available with system support from hardware manufacturers and a stable software ecosystem.

The CCIX® standard allows processors based on different instruction set architectures to extend the benefits of cache-coherent peer processing to a range of acceleration devices, including FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs.

CCIX simplifies development and adoption by extending well-established data center hardware and software infrastructure. This allows system designers to integrate the right combination of heterogeneous components to address their specific system needs.

Key Features

  • CCIX implementations are available from silicon manufacturers, with infrastructure supported by IP providers and a stable software ecosystem
  • Low latency access to data, whether it resides in the host memory or accelerator memory
  • Enables direct addressing of memory buffers that reside on accelerator hardware, thereby eliminating PCIe I/O address remapping and BAR configuration
  • Leverages the existing PCIe PHY and data link infrastructure
  • An additional transfer rate of 25 Gbps, beyond the PCIe Gen4 specification, is available
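
To put the 25 Gbps figure in context, the sketch below compares it with the PCIe Gen4 rate of 16 GT/s per lane. It is illustrative arithmetic only: it assumes the 25 Gbps rate is per lane, that 128b/130b line encoding applies at both rates, and that the link is 16 lanes wide. The function name `link_bandwidth_gbytes` is made up for this example.

```python
# Illustrative arithmetic only.
# Assumptions: rates are per lane, and 128b/130b encoding applies at both rates.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(rate_gtps: float, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s after line encoding."""
    return rate_gtps * ENCODING_EFFICIENCY * lanes / 8


pcie_gen4_x16 = link_bandwidth_gbytes(16.0, 16)  # PCIe Gen4: 16 GT/s per lane
ccix_25g_x16 = link_bandwidth_gbytes(25.0, 16)   # 25 Gbps rate from the list above
speedup = ccix_25g_x16 / pcie_gen4_x16

print(f"PCIe Gen4 x16: {pcie_gen4_x16:.1f} GB/s")
print(f"25 Gbps x16:   {ccix_25g_x16:.1f} GB/s")
print(f"Ratio:         {speedup:.2f}x")
```

Under these assumptions the ratio is simply 25/16, because the lane count and encoding overhead cancel out.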

Use Case #1:

Host Memory Expansion over PCIe Bus with Computational Storage Services

Data-path acceleration for offloading security, authentication and authorization, such as:

  • In-line data compression and correction with custom implementations
  • In-line encryption and decryption using user-modifiable keys
  • Key management functions for authorization and session management

Offloading parts of video or NLP (natural language processing) applications to the compute engine (FPGA) inside the device:

  • Vector atomics such as MAX(), MIN(), SORT() and MEAN(), which help improve CPU load/store efficiency for DSP applications.
  • Extending processor cache coherency to the memory/storage device, which reduces latency.
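
The vector atomics listed above can be pictured with a small software model. This is a sketch only, not a CCIX API: `vector_atomic` and `VECTOR_OPS` are hypothetical names, and the "device" is an ordinary Python function. The point it illustrates is that the host issues one request for the whole buffer and receives only the result, rather than loading and storing every element itself.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Union

Result = Union[float, List[float]]

# Hypothetical operation table; the names mirror the list above,
# not a real CCIX or vendor API.
VECTOR_OPS: Dict[str, Callable[[Sequence[float]], Result]] = {
    "MAX": max,
    "MIN": min,
    "SORT": sorted,
    "MEAN": fmean,
}


def vector_atomic(op: str, buffer: Sequence[float]) -> Result:
    """Model a single device-side operation applied to a whole buffer."""
    if op not in VECTOR_OPS:
        raise ValueError(f"unsupported vector operation: {op}")
    if len(buffer) == 0:
        raise ValueError("buffer must not be empty")
    return VECTOR_OPS[op](buffer)


samples = [3.0, -1.5, 8.25, 0.0]
peak = vector_atomic("MAX", samples)      # one request instead of a per-element loop
ordered = vector_atomic("SORT", samples)
average = vector_atomic("MEAN", samples)
print(peak, ordered, average)
```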

Use Case #2:

Enabling Low-Latency Memory Access from Accelerator to Host Memory

Examples:

  • SmartNIC and network packet filtering, where large flow tables reside in host memory and the SmartNIC uses these rules to filter packets
  • FinTech applications, where specialized compute engines such as FPGAs and GPUs on an accelerator card operate directly on large data sets in host memory
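
The packet-filtering example can be sketched as a lookup against a table that stays in host memory. Everything here is hypothetical and for illustration only: `FlowKey`, `FlowRule`, `host_flow_table` and `filter_packet` are invented names, and a Python dictionary stands in for the host-resident flow table that the accelerator would read over the coherent link.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class FlowRule:
    action: str  # "allow" or "drop"
    hits: int = 0


# Stands in for a large flow table resident in host memory.
host_flow_table: Dict[FlowKey, FlowRule] = {
    FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, "tcp"): FlowRule("allow"),
    FlowKey("10.0.0.9", "10.0.0.2", 40001, 23, "tcp"): FlowRule("drop"),
}


def filter_packet(table: Dict[FlowKey, FlowRule], key: FlowKey,
                  default_action: str = "drop") -> str:
    """Model the NIC consulting the host table in place, with no local copy."""
    rule = table.get(key)
    if rule is None:
        return default_action
    rule.hits += 1  # the update lands in the same table the host sees
    return rule.action


https_flow = FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
unknown_flow = FlowKey("192.168.1.5", "10.0.0.2", 5555, 80, "tcp")
print(filter_packet(host_flow_table, https_flow))
print(filter_packet(host_flow_table, unknown_flow))
```

Because the rule objects are shared rather than copied, the hit counter updated by the filter is immediately visible to host code, which is the property a coherent interconnect provides in hardware.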

Use Case #3:

Fine-Grained Data Sharing between Host Applications and Accelerator Functions

Examples:

  • Data analytics and video processing, where an algorithm is split into multiple stages that run on different hardware, such as the host CPU and GPU- or FPGA-based accelerators. Results from each computational stage can be pipelined by snooping memory across the fabric using CCIX hardware.
  • In-memory databases or caching applications, where relational data resides in host memory and accelerator engines offload the CPU by running queries without CPU intervention.
  • Graph search algorithms, where information collected from various IoT devices is held in large host memory, and a GPU- or FPGA-based accelerator processes the data piece by piece without copying it into its local buffers.
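
The staged, copy-free processing described in these examples can be modeled in software with a shared buffer. This is only an analogy, since CCIX provides coherence in hardware: here a `bytearray` stands in for host memory, `memoryview` slices stand in for coherent windows onto it, and the functions `host_stage` and `accelerator_stage` are hypothetical.

```python
shared = bytearray(range(16))  # stands in for a buffer in host memory


def host_stage(view: memoryview) -> None:
    """Stage 1 (host CPU): scale each sample in place."""
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256


def accelerator_stage(view: memoryview) -> int:
    """Stage 2 (accelerator): reduce the data without making a local copy."""
    return sum(view)


chunk = 4
partial_sums = []
whole = memoryview(shared)
for start in range(0, len(whole), chunk):
    piece = whole[start:start + chunk]  # a window onto shared memory, not a copy
    host_stage(piece)                   # stage 1 writes its results in place
    partial_sums.append(accelerator_stage(piece))  # stage 2 reads the same bytes

total = sum(partial_sums)
print(partial_sums, total)
```

Each stage sees the other's writes because both operate on the same underlying bytes, and the data is handled piece by piece without ever being duplicated.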
