# Quick Start
Run an end-to-end pipeline with one command
This tutorial demonstrates running ResNet-18 through the complete CIMFlow pipeline with a single command.
Prerequisite: Complete the Docker Tutorial first, or ensure CIMFlow is installed locally.
## The Pipeline Command
CIMFlow compiles ONNX models to CIM hardware instructions and simulates their execution. The `cimflow run pipeline` command handles the complete flow.
## Running the Pipeline
### Configure Parameters
For this example, we use the following configuration:
| Parameter | Value | Description |
|---|---|---|
| Model | ResNet-18 | Standard CNN architecture |
| T | 8 | Macro group size |
| K | 16 | Macro group number |
| B | 16 | NoC bandwidth (flits) |
| C | 64 | Core count |
| Batch Size | 8 | Inference batch |
### Execute the Pipeline
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/quickstart \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 8
```

What happens during execution:
- CG stage: Progress bar shows graph partitioning into execution stages
- OP stage: Progress bar shows per-core instruction generation
- Simulation: Cycle-accurate execution across all cores
### View Results
After completion, check the output:
```shell
ls output/quickstart/
```

## Understanding the Results
The simulation produces a report with key performance metrics:
```text
Simulation Result:
- latency: 2.08837 ms
- average power: 4420.4895 mW
- total energy: 9231608896.0073 pJ/it
- TOPS: 0.5319
- TOPS/W: 0.1203
```

### Metric Definitions
| Metric | Description |
|---|---|
| Latency | End-to-end inference time for one batch |
| Average Power | Power consumption during inference |
| Total Energy | Energy consumed per inference iteration |
| TOPS | Tera operations per second (throughput) |
| TOPS/W | Energy efficiency (throughput per watt) |
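These metrics are internally consistent: total energy is average power times latency, and TOPS/W is TOPS divided by power in watts. A quick sketch checking this with `awk`, using the values copied from the report above:

```shell
# Sanity-check the reported metrics against each other.
# Values copied from the simulation report above.
latency_ms=2.08837
power_mw=4420.4895
tops=0.5319

# total energy [pJ/it] = power [mW] * latency [ms] * 1e6
# prints a value matching the reported 9231608896 pJ/it up to report rounding
awk -v p="$power_mw" -v l="$latency_ms" \
    'BEGIN { printf "energy: %.0f pJ/it\n", p * l * 1e6 }'

# TOPS/W = TOPS / (power in W)
awk -v t="$tops" -v p="$power_mw" \
    'BEGIN { printf "TOPS/W: %.4f\n", t / (p / 1000) }'
```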
## What Just Happened

The pipeline executed three stages:

- **CG stage** (compute graph compilation)
  - Parsed the ONNX model
  - Created the compute graph representation
  - Partitioned computation for the CIM architecture
- **OP stage** (instruction generation)
  - Generated ISA instructions for each core
  - Handled memory allocation and data movement
  - Optimized instruction scheduling
- **Simulation**
  - Ran a cycle-accurate simulation of the CIM hardware
  - Modeled all 64 cores in parallel
  - Tracked power and energy consumption
## Adjusting Parameters
Experiment with different configurations:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/test_b8 \
  -t 8 -k 16 -b 8 -c 64 \
  --batch-size 8
```

Lower bandwidth increases communication latency but reduces hardware cost.
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/test_batch16 \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 16
```

Larger batches improve throughput by better utilizing the multi-core parallelism.
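Sweeps like the two above are easy to script. A minimal sketch that only prints each invocation (drop the leading `echo` to actually run; assumes `cimflow` is on `PATH`):

```shell
#!/bin/sh
# Print one pipeline invocation per (bandwidth, batch size) combination.
# Remove the leading "echo" to actually execute the sweep.
for b in 8 16; do
  for bs in 8 16; do
    echo cimflow run pipeline \
      -m data/models/resnet18.onnx \
      -o "output/sweep_b${b}_bs${bs}" \
      -t 8 -k 16 -b "$b" -c 64 \
      --batch-size "$bs"
  done
done
```

Each run writes to its own output directory, so the resulting reports can be compared side by side.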
## Trying Different Models
Run with MobileNetV2:
```shell
cimflow run pipeline \
  -m data/models/mobilenetv2.onnx \
  -o output/mobilenet \
  -t 8 -k 16 -b 16 -c 64 \
  --batch-size 8
```

## Keeping Intermediate Files
Use `--keep-ir` to preserve all intermediate representations:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/debug \
  -t 8 -b 16 \
  --keep-ir
```

### What gets saved
- CG instructions (JSON) - High-level compute graph operations
- ISA instructions (JSON and assembly) - Per-core instruction sequences
- Detailed logs - Compilation and simulation traces
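The exact filenames under the output directory are not fixed here (they depend on the CIMFlow version), so a generic way to see what `--keep-ir` preserved is to list files by extension:

```shell
# Enumerate the preserved intermediate files (JSON IR dumps and logs).
# Assembly dumps may use a different extension; adjust the patterns as needed.
find output/debug -type f \( -name '*.json' -o -name '*.log' \) | sort
```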
## Verbose Output
Enable detailed logging to see compilation progress:
```shell
cimflow run pipeline \
  -m data/models/resnet18.onnx \
  -o output/verbose \
  -t 8 -b 16 \
  -l VERBOSE
```

Log levels (from most to least verbose): `TRACE` | `DEBUG` | `VERBOSE` | `INFO` | `WARNING` | `ERROR`
## What You Learned
- Pipeline Execution — Run end-to-end compilation and simulation with one command
- Results Interpretation — Understand latency, power, TOPS, and efficiency metrics