
Features

Comprehensive feature overview of the CIMFlow compiler and simulator

CIMFlow provides a complete toolchain for evaluating CIM accelerators, from neural network compilation through detailed performance analysis.


Key Capabilities

MLIR-Based Compilation
Robust IR with optimization passes built on LLVM/MLIR infrastructure
Cycle-Accurate Simulation
Precise timing model for all hardware components using SystemC
Energy Estimation
Energy consumption tracking for CIM arrays, memory, and data movement
Configurable Architecture
Parameterizable CIM macro size, memory hierarchy, and core count
Profiling & Analysis
Detailed breakdowns of execution time and resource utilization
Unified CLI
Single command interface for compilation, simulation, and batch runs

Compiler Features

The CIMFlow compiler transforms ONNX models into optimized instruction sequences through a two-level pipeline: the Computational Graph (CG) level and the Operator (OP) level.

Supported Operations

Supported CNN operations:

Compute
Conv2D (standard and depthwise), Linear/Gemm (fully-connected layers)
Pooling
MaxPool, AvgPool (OP-level templates available)
Element-wise
Add (residual connections), Mul (with quantization support)
Activation
ReLU (OP-level template)

Operations such as BatchNorm are fused during ONNX preprocessing. For the complete operator library, see DSL Reference.

CG-Level Optimization

The Computational Graph level handles workload partitioning:

Dependency analysis

Identifies data dependencies between operators for correct execution ordering.
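To make the ordering step concrete, here is a minimal sketch of dependency analysis as a topological sort (Kahn's algorithm). The graph representation, a list of operator names plus producer-to-consumer edges, is a hypothetical stand-in for CIMFlow's IR, not its actual data structure.

```python
from collections import deque


def topological_order(ops, edges):
    """Return an execution order for `ops` that respects every (src, dst) edge.

    `ops` is a list of operator names and `edges` a list of producer -> consumer
    pairs; both are a hypothetical stand-in for the compiler's graph IR.
    """
    in_degree = {op: 0 for op in ops}
    successors = {op: [] for op in ops}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Start from operators that have no unresolved inputs.
    ready = deque(op for op in ops if in_degree[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in successors[op]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:  # every producer of `nxt` is now ordered
                ready.append(nxt)

    if len(order) != len(ops):
        raise ValueError("operator graph contains a cycle")
    return order


# A residual block: the Add consumes both the block input and conv2's output.
ops = ["input", "conv1", "relu1", "conv2", "add"]
edges = [("input", "conv1"), ("conv1", "relu1"), ("relu1", "conv2"),
         ("conv2", "add"), ("input", "add")]
print(topological_order(ops, edges))  # ['input', 'conv1', 'relu1', 'conv2', 'add']
```

Kahn's algorithm also detects cycles for free: if some operators never reach in-degree zero, the graph cannot be executed in any order.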

Core mapping

Distributes operators across available cores for parallelism.
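One simple way to distribute operators is greedy longest-processing-time (LPT) load balancing, sketched below. The per-operator cycle estimates are made-up inputs, and the sketch ignores dependencies between operators, which a real core mapper must also respect; it is not CIMFlow's mapping algorithm.

```python
import heapq


def map_to_cores(op_costs, num_cores):
    """Greedily assign operators to cores, largest estimated cost first (LPT).

    `op_costs` maps operator name -> estimated cycles (hypothetical numbers).
    Returns (assignment, per-core load). Operator dependencies are ignored.
    """
    # Min-heap of (accumulated load, core id): the least-loaded core pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    loads = [0] * num_cores
    for op, cost in sorted(op_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)
        assignment[op] = core
        loads[core] = load + cost
        heapq.heappush(heap, (loads[core], core))
    return assignment, loads


costs = {"conv1": 8, "conv2": 6, "conv3": 5, "fc": 3, "pool": 2}  # hypothetical cycles
assignment, loads = map_to_cores(costs, num_cores=2)
print(assignment)  # {'conv1': 0, 'conv2': 1, 'conv3': 1, 'fc': 0, 'pool': 0}
print(loads)       # [13, 11]
```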

Communication planning

Schedules inter-core data transfers via NoC.
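The sketch below shows the kind of bookkeeping communication planning involves: a transfer is needed only when a tensor's producer and consumer sit on different cores, and its cost grows with hop distance. It assumes a 2D mesh with row-major core numbering and XY routing; the mesh width, link width, and hop latency are hypothetical parameters, since the actual NoC topology is configurable.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    src_op: str
    dst_op: str
    src_core: int
    dst_core: int
    num_bytes: int
    hops: int


def mesh_hops(a, b, mesh_width):
    """Manhattan (XY-routing) distance between two cores on an assumed 2D mesh."""
    ax, ay = a % mesh_width, a // mesh_width
    bx, by = b % mesh_width, b // mesh_width
    return abs(ax - bx) + abs(ay - by)


def plan_transfers(assignment, tensor_edges, mesh_width):
    """List NoC transfers for edges whose producer and consumer are on different cores."""
    transfers = []
    for src, dst, num_bytes in tensor_edges:
        s, d = assignment[src], assignment[dst]
        if s != d:  # same-core edges stay in that core's local memory
            transfers.append(Transfer(src, dst, s, d, num_bytes, mesh_hops(s, d, mesh_width)))
    return transfers


def transfer_cycles(t, link_bytes_per_cycle, hop_latency):
    """First-order estimate: serialization time plus per-hop router delay."""
    return math.ceil(t.num_bytes / link_bytes_per_cycle) + t.hops * hop_latency


assignment = {"conv1": 0, "conv2": 3, "fc": 3}
edges = [("conv1", "conv2", 1024), ("conv2", "fc", 256)]  # (producer, consumer, bytes)
plan = plan_transfers(assignment, edges, mesh_width=2)
print(len(plan), plan[0].hops)  # 1 2
print(transfer_cycles(plan[0], link_bytes_per_cycle=32, hop_latency=2))  # 36
```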

Strategy selection

Chooses the partitioning strategy best suited to the hardware constraints.
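As an illustration of strategy selection, the following sketch filters hypothetical candidate partitionings by core count and per-core CIM capacity, then picks the one with the lowest estimated latency. The `Candidate` fields and cost numbers are assumptions; CIMFlow's own cost model is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cores_used: int
    weight_bytes_per_core: int  # weights each core must hold in its CIM arrays
    est_cycles: int             # hypothetical cost-model estimate


def select_strategy(candidates, num_cores, cim_bytes_per_core):
    """Pick the fastest candidate that fits the core count and CIM capacity."""
    feasible = [
        c for c in candidates
        if c.cores_used <= num_cores and c.weight_bytes_per_core <= cim_bytes_per_core
    ]
    if not feasible:
        raise ValueError("no partitioning fits the hardware constraints")
    # Break latency ties in favour of using fewer cores.
    return min(feasible, key=lambda c: (c.est_cycles, c.cores_used))


KIB = 1024
candidates = [
    Candidate("single-core", 1, 96 * KIB, 1000),
    Candidate("split-2", 2, 48 * KIB, 600),
    Candidate("split-4", 4, 24 * KIB, 450),
]
print(select_strategy(candidates, num_cores=2, cim_bytes_per_core=64 * KIB).name)  # split-2
print(select_strategy(candidates, num_cores=4, cim_bytes_per_core=64 * KIB).name)  # split-4
```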

OP-Level Code Generation

The Operator level generates executable instructions:

Memory allocation

Assigns buffers to local memory, global memory, and CIM arrays.
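A minimal sketch of buffer placement within one memory region is a bump-pointer allocator with alignment. The region name, capacity, and 32-byte alignment are illustrative assumptions, and a production allocator would also reuse space once a buffer's lifetime ends.

```python
class BumpAllocator:
    """Linear (bump-pointer) allocator for one memory region.

    Region names, capacities, and alignment are illustrative assumptions.
    Buffers are never freed individually in this sketch.
    """

    def __init__(self, name, capacity, align=32):
        self.name = name
        self.capacity = capacity
        self.align = align
        self.top = 0  # first free byte

    def alloc(self, size):
        # Round the current top up to the next multiple of `align`.
        offset = (self.top + self.align - 1) // self.align * self.align
        if offset + size > self.capacity:
            raise MemoryError(f"{self.name}: cannot fit {size} bytes at offset {offset}")
        self.top = offset + size
        return offset


local = BumpAllocator("local", capacity=1024, align=32)  # hypothetical region size
print(local.alloc(100))  # 0
print(local.alloc(10))   # 128: offset 100 rounded up to the next 32-byte boundary
try:
    local.alloc(900)     # would need bytes 160..1059, beyond the 1024-byte region
except MemoryError as err:
    print(err)           # local: cannot fit 900 bytes at offset 160
```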

Instruction scheduling

Orders operations to maximize throughput.
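One common way to raise throughput is to overlap independent work on different functional units, for example loading the next tile while the CIM array computes on the current one. The sketch below is a greedy as-soon-as-possible scheduler under that assumption; the unit names (`mem`, `cim`), latencies, and `Instr` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str         # functional unit, e.g. "mem" or "cim" (hypothetical)
    latency: int      # cycles the unit stays busy
    deps: tuple = ()  # names of instructions whose results this one needs


def schedule(instrs):
    """Greedy ASAP scheduling with one in-flight instruction per unit.

    `instrs` must already be in dependency order. Returns per-instruction
    start cycles and the total cycle count (makespan).
    """
    unit_free = {}  # cycle at which each unit becomes idle
    finish = {}
    start = {}
    for ins in instrs:
        operands_ready = max((finish[d] for d in ins.deps), default=0)
        start[ins.name] = max(operands_ready, unit_free.get(ins.unit, 0))
        finish[ins.name] = start[ins.name] + ins.latency
        unit_free[ins.unit] = finish[ins.name]
    return start, max(finish.values(), default=0)


# Two tiles: load each tile, then run a CIM matrix-vector multiply on it.
program = [
    Instr("load0", "mem", 4),
    Instr("mvm0", "cim", 8, deps=("load0",)),
    Instr("load1", "mem", 4),
    Instr("mvm1", "cim", 8, deps=("load1",)),
]
start, total = schedule(program)
print(start)  # {'load0': 0, 'mvm0': 4, 'load1': 4, 'mvm1': 12}
print(total)  # 20, versus 24 if every instruction ran back to back
```

Here `load1` runs on the memory unit while `mvm0` occupies the CIM unit, which is where the four saved cycles come from.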

Address calculation

Computes memory addresses for all data accesses.
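Address calculation for a densely packed tensor reduces to a dot product of the index with row-major strides. The sketch assumes an NCHW layout and a configurable element size; the base address and layout CIMFlow actually uses are not specified here.

```python
def row_major_strides(shape):
    """Element strides for a densely packed row-major tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def element_address(base, shape, index, elem_bytes=1):
    """Byte address of tensor[index], stored row-major starting at `base` (assumed layout)."""
    if len(index) != len(shape) or any(not 0 <= i < d for i, d in zip(index, shape)):
        raise IndexError(f"index {index} out of bounds for shape {shape}")
    offset = sum(i * s for i, s in zip(index, row_major_strides(shape)))
    return base + offset * elem_bytes


shape = (1, 3, 4, 4)  # N, C, H, W
print(row_major_strides(shape))                         # [48, 16, 4, 1]
print(element_address(0x1000, shape, (0, 1, 2, 3)))     # 4123 = 0x1000 + 27
print(element_address(0x1000, shape, (0, 1, 2, 3), 2))  # 4150 = 0x1000 + 54
```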

ISA emission

Produces the final 32-bit instruction stream.
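Emitting a 32-bit instruction amounts to packing fields into a word and serializing it. The field layout below (6-bit opcode, three 5-bit register fields, 11-bit immediate) and the little-endian byte order are purely hypothetical, chosen only to show the bit manipulation; CIMFlow's ISA defines its own formats.

```python
import struct

# Hypothetical 32-bit format, most significant field first: opcode | rs | rt | rd | imm.
FIELDS = [("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("imm", 11)]
assert sum(width for _, width in FIELDS) == 32


def encode(**values):
    """Pack named fields into one 32-bit word; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def decode(word):
    """Inverse of encode(): split a 32-bit word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


def emit(words):
    """Serialize words as 32-bit little-endian values (byte order is an assumption)."""
    return b"".join(struct.pack("<I", w) for w in words)


w = encode(opcode=1, rs=2, rt=3, rd=4, imm=5)
print(hex(w))             # 0x4432005
print(decode(w))          # {'imm': 5, 'rd': 4, 'rt': 3, 'rs': 2, 'opcode': 1}
print(len(emit([w, w])))  # 8
```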


Simulator Features

The SystemC-based simulator executes compiled programs with cycle-accurate timing.

Timing Model

Cycle-accurate execution
Precise timing for all instruction types
Memory latency modeling
Configurable access times for different memory levels
NoC simulation
Bandwidth constraints and routing delays for inter-core communication
Pipeline modeling
Instruction fetch, decode, and execute stages
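The interaction between pipeline stages and memory latency can be illustrated with a small recurrence for a three-stage in-order pipeline (fetch, decode, execute). The latencies per memory level are illustrative values, and the sketch assumes buffering between stages; it is far simpler than the SystemC model.

```python
def pipeline_cycles(exec_latencies, fetch=1, decode=1):
    """Cycle at which the last instruction leaves a 3-stage in-order pipeline.

    A stage can start once this instruction has finished the previous stage and
    the preceding instruction has finished this stage (buffering is assumed).
    """
    prev = [0, 0, 0]  # finish cycle of each stage for the previous instruction
    for ex in exec_latencies:
        done = 0
        cur = []
        for stage, latency in enumerate((fetch, decode, ex)):
            done = max(done, prev[stage]) + latency
            cur.append(done)
        prev = cur
    return prev[-1]


# Illustrative access times per memory level (not CIMFlow's defaults).
MEM_LATENCY = {"local": 1, "global": 5}

ideal = pipeline_cycles([1] * 10)
with_global_load = pipeline_cycles([MEM_LATENCY["global"]] + [MEM_LATENCY["local"]] * 9)
print(ideal)             # 12 = 3 stages + 10 instructions - 1
print(with_global_load)  # 16: the slow first access delays every instruction behind it
```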

Energy Estimation

CIM Array Energy
Per-operation energy based on macro configuration
Memory Access Energy
Read/write costs for local and global memory
Data Movement Energy
NoC transfer costs based on distance and bandwidth
Static Energy
Leakage consumption over execution time
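The four energy components combine additively. The sketch below assumes per-event energies for CIM operations and memory accesses, a per-byte-per-hop NoC cost, and constant leakage power; all values are illustrative, not CIMFlow's calibrated parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyModel:
    """Per-event energies in pJ and static power in mW (illustrative values)."""
    cim_op_pj: float
    local_access_pj: float
    global_access_pj: float
    noc_byte_hop_pj: float
    static_mw: float


def energy_breakdown(m, cim_ops, local_accesses, global_accesses, transfers, cycles, clock_ghz):
    """Return energy per component in pJ.

    `transfers` is a list of (num_bytes, hops). Static energy is leakage power
    times runtime, using 1 mW * 1 ns = 1 pJ.
    """
    runtime_ns = cycles / clock_ghz
    return {
        "cim": cim_ops * m.cim_op_pj,
        "memory": local_accesses * m.local_access_pj + global_accesses * m.global_access_pj,
        "noc": sum(b * h for b, h in transfers) * m.noc_byte_hop_pj,
        "static": m.static_mw * runtime_ns,
    }


model = EnergyModel(cim_op_pj=2.0, local_access_pj=0.5, global_access_pj=10.0,
                    noc_byte_hop_pj=0.1, static_mw=10.0)
parts = energy_breakdown(model, cim_ops=1000, local_accesses=500, global_accesses=100,
                         transfers=[(64, 2)], cycles=1000, clock_ghz=1.0)
print(round(sum(parts.values()), 1))  # 13262.8
```

In this sketch bandwidth only affects energy indirectly, through runtime and therefore static energy; the dynamic NoC term depends on bytes moved and distance.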

Profiling Support

Enable profiling with the --profile flag:

cimflow run pipeline -m model.onnx -o output --profile=from-config

Profiling output includes:

  • Per-layer execution time breakdown
  • Memory bandwidth utilization
  • Core activity and idle time statistics
  • Energy consumption by component

CLI Features

The cimflow CLI provides unified access to all framework functionality.

cimflow run pipeline -m model.onnx -o output -t 8 -b 16
cimflow run from-file config/batch.json
cimflow compile cg -m model.onnx -o output -t 8 -k 16
cimflow sim -i instructions.bin -o output -t 8 -b 16

For complete CLI documentation, see CLI Reference.


Audience Perspectives

Design Space Exploration

CIMFlow enables systematic evaluation of CIM architecture variants. The configurable hardware model supports experiments with:

  • Macro group configuration (T, K parameters)
  • Memory hierarchy sizes and bandwidth
  • Core count and NoC topology
  • Quantization bit-widths

Profiling capabilities provide data for performance modeling and optimization studies, enabling validation of analytical models against cycle-accurate simulation results.
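A design space sweep can be scripted by generating one CLI invocation per configuration. This sketch only builds and prints `cimflow compile cg` commands using the flags from the CLI examples above; it assumes `-t` and `-k` set the T and K parameters, which should be checked against the CLI Reference.

```python
import itertools
import shlex


def sweep_commands(model, t_values, k_values, out_root="dse"):
    """Build one `cimflow compile cg` invocation per (T, K) combination.

    Assumption: -t and -k set the T and K macro-group parameters, as the
    `cimflow compile cg ... -t 8 -k 16` example suggests.
    """
    commands = []
    for t, k in itertools.product(t_values, k_values):
        commands.append([
            "cimflow", "compile", "cg",
            "-m", model,
            "-o", f"{out_root}/t{t}_k{k}",  # separate output directory per design point
            "-t", str(t),
            "-k", str(k),
        ])
    return commands


# Dry run: print the commands instead of executing them.
for cmd in sweep_commands("model.onnx", t_values=[4, 8], k_values=[8, 16]):
    print(shlex.join(cmd))
```

To execute the commands rather than print them, pass each list to `subprocess.run(cmd, check=True)`.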

Pre-Silicon Validation

CIMFlow provides early performance estimates before hardware implementation. The cycle-accurate simulator models:

  • Instruction execution timing
  • Memory subsystem behavior
  • Inter-core communication overhead
  • Energy consumption by component

This enables identification of bottlenecks and validation of architectural decisions against target workloads before RTL implementation.

Deployment Evaluation

CIMFlow helps evaluate neural network deployments on CIM hardware. The framework supports:

  • ONNX model import from standard training frameworks
  • Performance comparison across model architectures
  • Quantization impact analysis
  • Batch size and throughput optimization

This enables assessment of whether models meet latency and energy requirements on target CIM accelerators.
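Checking a deployment against its requirements is mostly unit conversion. The sketch below turns a batch's simulated cycle count and energy into latency, throughput, and energy per inference; the function names, clock frequency, and requirement thresholds are hypothetical, and the inputs would come from the simulator's profiling output.

```python
def deployment_metrics(batch_cycles, batch_energy_pj, batch_size, clock_ghz):
    """Convert simulated cycles and energy for one batch into deployment metrics."""
    latency_s = batch_cycles / (clock_ghz * 1e9)
    return {
        "latency_ms": latency_s * 1e3,
        "throughput_ips": batch_size / latency_s,                       # inferences per second
        "energy_per_inference_uj": batch_energy_pj / batch_size / 1e6,  # 1 uJ = 1e6 pJ
    }


def meets_requirements(metrics, max_latency_ms, max_energy_uj):
    """True if both the latency and per-inference energy budgets are met."""
    return (metrics["latency_ms"] <= max_latency_ms
            and metrics["energy_per_inference_uj"] <= max_energy_uj)


# Hypothetical simulator results for a batch of 16 at an assumed 1 GHz clock.
m = deployment_metrics(batch_cycles=2_000_000, batch_energy_pj=4.8e8, batch_size=16, clock_ghz=1.0)
print({k: round(v, 3) for k, v in m.items()})
# {'latency_ms': 2.0, 'throughput_ips': 8000.0, 'energy_per_inference_uj': 30.0}
print(meets_requirements(m, max_latency_ms=5.0, max_energy_uj=50.0))  # True
```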
