# Quick Start
Run an end-to-end pipeline with one command
This tutorial demonstrates running ResNet-18 through the complete CIMFlow pipeline with a single command.
Prerequisite: Complete the Docker Tutorial first, or ensure CIMFlow is installed locally.
## The Pipeline Command
CIMFlow compiles ONNX models to CIM hardware instructions and simulates their execution. The `cimflow run pipeline` command handles the complete flow.
## Running the Pipeline
### Configure Parameters
For this example, we use the following configuration:
| Parameter | Value | Description |
|---|---|---|
| Model | ResNet-18 | Standard CNN architecture |
| T | 8 | Macro group size |
| K | 16 | Macro group number |
| B | 16 | NoC bandwidth (flits) |
| C | 64 | Core count |
| Batch Size | 8 | Inference batch |
### Execute the Pipeline
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/quickstart \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 8
```

What happens during execution:
- CG stage: Progress bar shows graph partitioning into execution stages
- OP stage: Progress bar shows per-core instruction generation
- Simulation: Cycle-accurate execution across all cores
### View Results
After completion, check the output:
```shell
ls output/quickstart/
```

## Understanding the Results
The simulation produces a report with key performance metrics:
```text
Simulation Result:
- latency: 2.08837 ms
- average power: 4420.4895 mW
- total energy: 9231608896.0073 pJ/it
- TOPS: 0.5319
- TOPS/W: 0.1203
```

### Metric Definitions
| Metric | Description |
|---|---|
| Latency | End-to-end inference time for one batch |
| Average Power | Power consumption during inference |
| Total Energy | Energy consumed per inference iteration |
| TOPS | Tera operations per second (throughput) |
| TOPS/W | Energy efficiency (throughput per watt) |
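These metrics are internally consistent: total energy is average power times latency, and TOPS/W is TOPS divided by power in watts. A quick sketch checking this with `awk`, using the values copied from the report above:

```shell
# Sanity-check the reported metrics against each other.
# Values copied from the simulation report above.
latency_ms=2.08837
power_mw=4420.4895
tops=0.5319

# total energy [pJ/it] = power [mW] * latency [ms] * 1e6
# prints a value matching the reported 9231608896 pJ/it up to report rounding
awk -v p="$power_mw" -v l="$latency_ms" \
    'BEGIN { printf "energy: %.0f pJ/it\n", p * l * 1e6 }'

# TOPS/W = TOPS / (power in W)
awk -v t="$tops" -v p="$power_mw" \
    'BEGIN { printf "TOPS/W: %.4f\n", t / (p / 1000) }'
```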
## What Just Happened

The pipeline executed three stages:

- **CG stage** (compute graph compilation)
  - Parsed the ONNX model
  - Created the compute graph representation
  - Partitioned computation for the CIM architecture
- **OP stage** (instruction generation)
  - Generated ISA instructions for each core
  - Handled memory allocation and data movement
  - Optimized instruction scheduling
- **Simulation**
  - Ran a cycle-accurate simulation of the CIM hardware
  - Modeled all 64 cores in parallel
  - Tracked power and energy consumption
## Adjusting Parameters
Experiment with different configurations:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/test_b8 \
  -t 8 -k 16 -b 8 -c 64 \
  --batch-size 8
```

Lower bandwidth increases communication latency but reduces hardware cost.
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/test_batch16 \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 16
```

Larger batches improve throughput by better utilizing the multi-core parallelism.
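Sweeps like the two above are easy to script. A minimal sketch that only prints each invocation (drop the leading `echo` to actually run; assumes `cimflow` is on `PATH`):

```shell
#!/bin/sh
# Print one pipeline invocation per (bandwidth, batch size) combination.
# Remove the leading "echo" to actually execute the sweep.
for b in 8 16; do
  for bs in 8 16; do
    echo cimflow run pipeline \
      -m data/models/resnet18.onnx \
      -o "output/sweep_b${b}_bs${bs}" \
      -t 8 -k 16 -b "$b" -c 64 \
      --batch-size "$bs"
  done
done
```

Each run writes to its own output directory, so the resulting reports can be compared side by side.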
## Trying Different Models
Run with MobileNetV2:
```shell
cimflow run pipeline \
  -m data/models/mobilenetv2.onnx \
  -o output/mobilenet \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 8
```

## Keeping Intermediate Files
Use `--keep-ir` to preserve all intermediate representations:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/debug \
  -t 8 -b 16 \
  --keep-ir
```

### What gets saved
- CG instructions (JSON) - High-level compute graph operations
- ISA instructions (JSON and assembly) - Per-core instruction sequences
- Detailed logs - Compilation and simulation traces
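The exact filenames under the output directory are not fixed here (they depend on the CIMFlow version), so a generic way to see what `--keep-ir` preserved is to list files by extension:

```shell
# Enumerate the preserved intermediate files (JSON IR dumps and logs).
# Assembly dumps may use a different extension; adjust the patterns as needed.
find output/debug -type f \( -name '*.json' -o -name '*.log' \) | sort
```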
## Verbose Output
Enable detailed logging to see compilation progress:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/verbose \
  -t 8 -b 16 \
  -l VERBOSE
```

Log levels (from most to least verbose): `TRACE` | `DEBUG` | `VERBOSE` | `INFO` | `WARNING` | `ERROR`
## What You Learned
- Pipeline Execution — Run end-to-end compilation and simulation with one command
- Results Interpretation — Understand latency, power, TOPS, and efficiency metrics