Features

CIMFlow provides a complete toolchain for CIM accelerator evaluation, spanning neural network compilation to detailed performance analysis.

Key Capabilities

MLIR-Based Compilation

Robust IR with optimization passes built on LLVM/MLIR infrastructure

Cycle-Accurate Simulation

Precise timing model for all hardware components using SystemC

Energy Estimation

Energy consumption tracking for CIM arrays, memory, and data movement

Configurable Architecture

Parameterizable CIM macro size, memory hierarchy, and core count

Profiling & Analysis

Detailed breakdowns of execution time and resource utilization

Unified CLI

Single command-line interface for compilation, simulation, and batch runs

Compiler Features

The CIMFlow compiler transforms ONNX models into optimized instruction sequences through a two-level pipeline: CG-level optimization followed by OP-level code generation.

Supported Operations

The compiler supports the following CNN operations:

Compute

Conv2D (standard and depthwise), Linear/Gemm (fully-connected layers)

Pooling

MaxPool, AvgPool (OP-level templates available)

Element-wise

Add (residual connections), Mul (with quantization support)

Activation

ReLU (OP-level template)

Operations such as BatchNorm are fused during ONNX preprocessing. For the complete operator library, see DSL Reference.
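To illustrate what fusing BatchNorm can look like, here is a minimal sketch of the standard per-channel BatchNorm folding into a preceding Conv2D or Linear layer. It assumes inference-mode statistics, and the function name `fold_batchnorm` and its argument layout are hypothetical; CIMFlow's actual preprocessing may differ.

```python
import math

def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into the preceding layer.

    weights: one list of weights per output channel
    biases, gamma, beta, mean, var: one float per output channel
    (Hypothetical helper; not part of the CIMFlow API.)
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)             # per-channel BN scale
        folded_w.append([w * scale for w in w_c])  # scale every weight of the channel
        folded_b.append((b_c - mu) * scale + bt)   # shift the bias accordingly
    return folded_w, folded_b
```

After folding, the layer produces the same output as the original layer followed by BatchNorm, so no separate BatchNorm operator has to be compiled.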

CG-Level Optimization

The Computational Graph (CG) level handles workload partitioning:

Dependency analysis

Identifies data dependencies between operators for correct execution ordering.

Core mapping

Distributes operators across available cores for parallelism.

Communication planning

Schedules inter-core data transfers via NoC.

Strategy selection

Chooses the partitioning strategy that best fits the hardware constraints.
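The first two steps above can be sketched as follows. This is a simplified illustration, not CIMFlow's partitioning algorithm: it orders operators with a topological sort and then assigns them to cores round-robin, which is a naive placeholder for a real mapping strategy. The function name and the operator names in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter

def schedule_and_map(deps, num_cores):
    """Order operators by dependency, then map them to cores.

    deps maps each operator name to the set of operators it depends on.
    (Hypothetical helper; round-robin mapping is an assumption.)
    """
    # Dependency analysis: predecessors always come before their consumers.
    order = list(TopologicalSorter(deps).static_order())
    # Core mapping: naive round-robin assignment over the execution order.
    mapping = {op: i % num_cores for i, op in enumerate(order)}
    return order, mapping

if __name__ == "__main__":
    deps = {
        "conv1": set(),
        "relu1": {"conv1"},
        "conv2": {"relu1"},
        "add": {"conv1", "conv2"},  # residual connection
    }
    order, mapping = schedule_and_map(deps, num_cores=2)
    print(order)
    print(mapping)
```

A real mapper would also weigh compute load and inter-core traffic, which is where communication planning and strategy selection come in.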

OP-Level Code Generation

The Operator (OP) level generates executable instructions:

Memory allocation

Assigns buffers to local memory, global memory, and CIM arrays.

Instruction scheduling

Orders operations to maximize throughput.

Address calculation

Computes memory addresses for all data accesses.

ISA emission

Produces the final 32-bit instruction stream.
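As a rough illustration of address calculation and ISA emission, the sketch below computes a byte address in a row-major CHW tensor and packs one 32-bit instruction word. The tensor layout and the 6/5/5/16-bit field split are assumptions made for this example only; the actual CIMFlow memory layout and instruction encoding are not described on this page.

```python
import struct

def element_address(base, c, h, w, height, width, elem_bytes=1):
    """Byte address of element (c, h, w) in a row-major CHW tensor (assumed layout)."""
    return base + ((c * height + h) * width + w) * elem_bytes

def encode_instruction(opcode, rd, rs, imm):
    """Pack a HYPOTHETICAL 6/5/5/16-bit field layout into one 32-bit word."""
    if not (0 <= opcode < 64 and 0 <= rd < 32 and 0 <= rs < 32 and 0 <= imm < 65536):
        raise ValueError("field out of range for the assumed encoding")
    word = (opcode << 26) | (rd << 21) | (rs << 16) | imm
    return struct.pack("<I", word)  # 4 bytes, little-endian
```

Whatever the real field layout is, the key property shown here is that each emitted instruction occupies exactly four bytes in the output stream.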

Simulator Features

The SystemC-based simulator executes compiled programs with cycle-accurate timing.

Timing Model

Cycle-accurate execution

Precise timing for all instruction types

Memory latency modeling

Configurable access times for different memory levels

NoC simulation

Bandwidth constraints and routing delays for inter-core communication

Pipeline modeling

Instruction fetch, decode, and execute stages
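To make the NoC part of the timing model more concrete, here is a minimal first-order latency estimate. It assumes a 2D mesh with XY routing, a fixed per-hop delay, and a fixed link bandwidth; all of these, along with the function and parameter names, are assumptions for illustration and not the simulator's actual model.

```python
import math

def noc_transfer_cycles(src, dst, num_bytes, bytes_per_cycle, hop_delay):
    """Estimate cycles for one transfer between cores at (x, y) coordinates.

    Assumes a 2D mesh with XY routing (hypothetical topology).
    """
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])     # routing distance
    serialization = math.ceil(num_bytes / bytes_per_cycle)  # bandwidth limit
    return hops * hop_delay + serialization
```

A cycle-accurate simulator additionally accounts for contention between concurrent transfers, which a closed-form estimate like this ignores.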

Energy Estimation

CIM Array Energy

Per-operation energy based on macro configuration

Memory Access Energy

Read/write costs for local and global memory

Data Movement Energy

NoC transfer costs based on distance and bandwidth

Static Energy

Leakage consumption over execution time
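The four components above can be combined as event counts multiplied by per-event costs, plus leakage power integrated over execution time. The sketch below shows that arithmetic; the component names, units (pJ, mW, MHz), and function name are assumptions chosen for the example, not the simulator's actual interface.

```python
def energy_breakdown_pj(counts, costs_pj, leakage_mw, cycles, clock_mhz):
    """Return per-component and total energy in picojoules.

    counts and costs_pj share keys (e.g. "cim_op", "mem_read"); all names
    here are hypothetical.
    """
    # Dynamic energy: number of events times energy per event.
    breakdown = {name: counts[name] * costs_pj[name] for name in counts}
    # Static energy: mW * (cycles / MHz) microseconds = nJ, times 1e3 for pJ.
    breakdown["static"] = leakage_mw * cycles * 1e3 / clock_mhz
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

Because static energy grows with execution time, a mapping that shortens the run can lower total energy even when the dynamic event counts stay the same.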

Profiling Support

Enable profiling with the --profile flag:

cimflow run pipeline -m model.onnx -o output --profile=from-config

Profiling output includes:

Per-layer execution time breakdown
Memory bandwidth utilization
Core activity and idle time statistics
Energy consumption by component

CLI Features

The cimflow CLI provides unified access to all framework functionality.

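Run the pipeline on a model:

cimflow run pipeline -m model.onnx -o output -t 8 -b 16

Run a batch from a configuration file:

cimflow run from-file config/batch.json

Compile at the CG level:

cimflow compile cg -m model.onnx -o output -t 8 -k 16

Simulate a compiled instruction file:

cimflow sim -i instructions.bin -o output -t 8 -b 16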

For complete CLI documentation, see CLI Reference.

Audience Perspectives

Design Space Exploration

CIMFlow enables systematic evaluation of CIM architecture variants. The configurable hardware model supports experiments with:

Macro group configuration (T, K parameters)
Memory hierarchy sizes and bandwidth
Core count and NoC topology
Quantization bit-widths

Profiling capabilities provide data for performance modeling and optimization studies, enabling validation of analytical models against cycle-accurate simulation results.
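One simple way to drive such experiments is to generate one CLI invocation per configuration. The sketch below builds commands using only the `cimflow run pipeline` flags shown on this page (`-m`, `-o`, `-t`, `-b`); the helper name, the output directory naming scheme, and the swept values are hypothetical, and the exact meaning of each flag should be checked against the CLI Reference.

```python
import itertools
import shlex

def sweep_commands(model, out_root, t_values, b_values):
    """Build one `cimflow run pipeline` command per (t, b) combination."""
    commands = []
    for t, b in itertools.product(t_values, b_values):
        out_dir = f"{out_root}/t{t}_b{b}"  # hypothetical naming scheme
        commands.append(["cimflow", "run", "pipeline", "-m", model,
                         "-o", out_dir, "-t", str(t), "-b", str(b)])
    return commands

if __name__ == "__main__":
    for cmd in sweep_commands("model.onnx", "output", [4, 8], [8, 16]):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Writing each run to its own output directory keeps the results of different configurations separate for later comparison.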

Pre-Silicon Validation

CIMFlow provides early performance estimates before hardware implementation. The cycle-accurate simulator models:

Instruction execution timing
Memory subsystem behavior
Inter-core communication overhead
Energy consumption by component

This enables identification of bottlenecks and validation of architectural decisions against target workloads before RTL implementation.

Deployment Evaluation

CIMFlow helps evaluate neural network deployments on CIM hardware. The framework supports:

ONNX model import from standard training frameworks
Performance comparison across model architectures
Quantization impact analysis
Batch size and throughput optimization

This enables assessment of whether models meet latency and energy requirements on target CIM accelerators.
