CCIX: Low-Latency, Coherent Interconnect

09/10/2020

CCIX (Cache Coherent Interconnect for Accelerators) is an interconnect protocol that enables low-latency access to memory and devices across the PCIe bus. CCIX implementations are available with system support from hardware manufacturers and a stable software ecosystem.

The CCIX® standard allows processors based on different instruction set architectures to extend the benefits of cache-coherent peer processing to a range of acceleration devices, including FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs.

CCIX simplifies development and adoption by extending well-established data center hardware and software infrastructure. This ultimately allows system designers to seamlessly integrate the right combination of heterogeneous components to address their specific system needs.

 

Use Case #1:

Host Memory Expansion over PCIe Bus with Computational Storage Services

Data-path acceleration for offloading security, authentication and authorization, such as:

  • In-line data compression and correction with custom implementations
  • In-line encryption and decryption using user-modifiable keys
  • Key management functions for authorization and session management

Offloading parts of video or natural language processing (NLP) applications to the compute engine (FPGA) inside the device:

  • Vector atomics such as MAX(), MIN(), SORT() and MEAN(), which help improve CPU load/store efficiency for DSP applications
  • Extending processor cache coherency to the memory/storage device, reducing latency
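
As a rough illustration of why operations such as MAX(), MIN() and MEAN() benefit from running next to the data, the sketch below models a device-side compute engine in plain Python. The `ComputeEngine` class, its methods and the sample values are hypothetical names invented for this example; they are not part of CCIX or of any SMART product API. The only point is that the host receives a few scalar results instead of loading every element itself.

```python
from statistics import fmean


class ComputeEngine:
    """Hypothetical stand-in for an FPGA compute engine next to device memory."""

    def __init__(self, samples):
        self._samples = list(samples)   # data resident in device memory
        self.values_returned_to_host = 0

    def reduce(self, op):
        """Run a reduction on the device and return a single scalar."""
        ops = {"MAX": max, "MIN": min, "MEAN": fmean}
        result = ops[op](self._samples)
        self.values_returned_to_host += 1
        return result

    def sort(self):
        """Sort the samples in place; nothing is sent back to the host."""
        self._samples.sort()


engine = ComputeEngine([4.0, -1.5, 9.25, 0.5])
summary = {op: engine.reduce(op) for op in ("MAX", "MIN", "MEAN")}
print(summary)
```

In this model the host issues three requests and gets three values back, regardless of how many samples the device holds.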

 

 

Use Case #2:

Enabling Low-Latency Memory Access from Accelerator to Host Memory

Examples:

  • SmartNIC and network packet filtering, where large flow tables reside in host memory and the SmartNIC uses these rules for packet filtering
  • FinTech applications, where specialized compute engines such as FPGAs and GPUs on an accelerator card operate directly on large data sets present in host memory
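
The packet-filtering example can be sketched as a simple table lookup. This is a minimal model, assuming a flow table keyed by the usual 5-tuple; `FlowKey`, `host_flow_table` and `filter_packet` are hypothetical names for illustration, and a Python dictionary stands in for the table that a real SmartNIC would read from host memory over the coherent link.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying a flow (assumed key layout for this sketch)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


# Flow table resident in host memory; real tables can hold millions of rules.
host_flow_table = {
    FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, "tcp"): "allow",
    FlowKey("10.0.0.9", "10.0.0.2", 40001, 23, "tcp"): "drop",
}


def filter_packet(key, flow_table, default_action="drop"):
    """Return the action for a packet; unknown flows get the default action."""
    return flow_table.get(key, default_action)


packet = FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
print(filter_packet(packet, host_flow_table))
```

The design choice being illustrated is that the rules stay in one place, host memory, and the filtering engine consults them directly rather than keeping its own copy.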

 

 

Use Case #3:

Fine-Grained Data Sharing between Host Applications and Accelerator Functions

Examples:

  • Data analytics and video processing, where an algorithm is split into multiple stages that are spawned on different hardware, such as the host CPU and GPU- or FPGA-based accelerators. Results from each computational stage can be pipelined by snooping memory across the fabric using CCIX hardware
  • In-memory databases or caching applications, where relational data is resident in host memory and accelerator engines offload the CPU by running queries without CPU intervention
  • Graph search algorithms, where information collected from various IoT devices resides in large host memory and a GPU- or FPGA-based accelerator processes the data piece-wise, without copying it into its local buffers
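
The idea of processing host-resident data piece-wise without copying it can be modeled on a single machine with a `memoryview`, which gives a second party direct access to a buffer it does not own. This is only a software analogy under stated assumptions: `accelerator_stage` and `host_stage` are hypothetical names, the doubling step is a placeholder for real work, and actual CCIX coherency is handled in hardware rather than by code like this.

```python
# Stands in for a data set that lives in host memory.
host_buffer = bytearray(range(16))


def accelerator_stage(view, chunk_size):
    """Process the buffer chunk by chunk, in place, through a memoryview."""
    for start in range(0, len(view), chunk_size):
        chunk = view[start:start + chunk_size]  # slicing a memoryview does not copy
        for i in range(len(chunk)):
            chunk[i] = (chunk[i] * 2) % 256     # placeholder computation


def host_stage(buffer):
    """Next pipeline stage: the host consumes the results from the same buffer."""
    return sum(buffer)


accelerator_stage(memoryview(host_buffer), chunk_size=4)
total = host_stage(host_buffer)
print(total)
```

Because both stages operate on the same underlying bytes, the host stage sees the accelerator stage's results without any transfer step in between.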

 

 

For more information on the CCIX platform and SMART's development products, please contact us.