Examples

This section provides hands-on tutorials for using CIMFlow to compile and simulate deep learning models on CIM hardware. Follow the tutorials in order for the best learning experience.

The Docker image includes all dependencies and example models pre-configured.

Pull the tutorial image

docker pull ghcr.io/buaa-ci-lab/cimflow-tutorial:latest

The download is approximately 3 GB. Ensure at least 15 GB of disk space is available for the unpacked image.

If running locally, ensure CIMFlow is installed and download example models:

Download models

./scripts/download_models.sh

Local installation requires Ubuntu 22.04 or later (see system requirements) and building LLVM/MLIR from source, which takes roughly 20 to 30 minutes.

Quick Start with Docker

Start an interactive container

Linux/macOS (bash):

docker run -it --rm \
  -v "$(pwd)/output:/app/output" \
  ghcr.io/buaa-ci-lab/cimflow-tutorial:latest

Windows (PowerShell):

docker run -it --rm `
  -v "${PWD}/output:/app/output" `
  ghcr.io/buaa-ci-lab/cimflow-tutorial:latest

Windows (Command Prompt):

docker run -it --rm -v "%cd%/output:/app/output" ghcr.io/buaa-ci-lab/cimflow-tutorial:latest

This mounts a local output directory to persist results outside the container.
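
On Linux, if the host directory named in `-v` does not exist, Docker creates it, and it is then typically owned by root. The following is a minimal sketch, assuming a POSIX shell, that creates the directory up front so results stay writable by the current user. `prepare_output_dir` is a hypothetical helper name used for illustration, not part of CIMFlow.

```shell
# Create the host-side output directory before starting the container.
# The default name "output" matches the bind mount shown above.
# prepare_output_dir is a hypothetical helper, not a CIMFlow command.
prepare_output_dir() {
  dir="${1:-output}"
  # mkdir -p succeeds if the directory already exists;
  # the -w test confirms the current user can write to it.
  mkdir -p "$dir" && [ -w "$dir" ]
}
```

Calling `prepare_output_dir` with no argument in the directory where `docker run` will be started should be enough for the mount shown above.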

Run the tutorial scripts

Inside the container, execute the demo scripts in order:

./tutorial/demo0_setup.sh       # Verify installation
./tutorial/demo1_quickstart.sh  # End-to-end pipeline
./tutorial/demo2_stages.sh      # Step-by-step compilation
./tutorial/demo3_exploration.sh # Design space exploration
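
Because the demos are meant to run in order, it can help to stop as soon as one of them fails. The sketch below wraps the four scripts in a loop. `run_demos` and its optional directory argument are assumptions made for illustration; only the script names come from the list above.

```shell
# Run the four tutorial scripts in order and stop at the first failure.
# run_demos is a hypothetical helper; the optional first argument overrides
# the ./tutorial directory used in the commands above.
run_demos() {
  dir="${1:-./tutorial}"
  for script in demo0_setup.sh demo1_quickstart.sh \
                demo2_stages.sh demo3_exploration.sh; do
    echo "==> $script"
    # A non-zero exit status aborts the sequence so later demos
    # do not run against an incomplete earlier step.
    "$dir/$script" || { echo "FAILED: $script" >&2; return 1; }
  done
}
```

Inside the container, `run_demos` with no argument would execute the same four commands listed above.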

Key Terminology

CG (Compute Graph): High-level representation of neural network operations

OP (Operator): Low-level CIM hardware operations

ISA: Instruction Set Architecture executed by the simulator

NoC (Network-on-Chip): On-chip interconnect for inter-core communication

Macro Group: A group of CIM macro units processing data in parallel

Flit: Flow control unit, the basic data transfer unit in the NoC

Hardware Parameters Reference

These parameters configure the CIM architecture for compilation and simulation:

Parameter	Flag	Description	Typical Values
T	`-t`	Macro group size (rows per macro)	8, 16
K	`-k`	Macro group number (macros per core)	16, 32
B	`-b`	NoC bandwidth (flits per cycle)	8, 16
C	`-c`	Total core count in the accelerator	64, 128
batch_size	`--batch-size`	Inference batch size	1, 2, 4, 8
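
To illustrate how these flags combine, the sketch below enumerates every combination of the typical values for T, K, B, and C from the table and prints one flag set per line. `sweep_args` is a hypothetical helper, and the command that accepts these flags is not named in this section, so it is deliberately left out. The batch size defaults to 1 purely as an illustrative choice.

```shell
# Print one flag set per line for each combination of the typical values
# in the table above (2 x 2 x 2 x 2 = 16 combinations).
# sweep_args is a hypothetical helper; the optional first argument sets
# --batch-size (default 1, an illustrative choice only).
sweep_args() {
  batch="${1:-1}"
  for t in 8 16; do
    for k in 16 32; do
      for b in 8 16; do
        for c in 64 128; do
          echo "-t $t -k $k -b $b -c $c --batch-size $batch"
        done
      done
    done
  done
}
```

Each printed line can then be appended to whichever CIMFlow command takes these flags, for example by piping `sweep_args` into a `while read -r args` loop.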