[Paper] Bring Your Own Codegen to Deep Learning Compiler

Edge Computing

  • DNNs have high computation and memory requirements, and not every device can reach a high-end server to run tensor computation
  • Network latency and data-privacy concerns motivate edge computing: execute inference on the end device
  • Challenge: edge devices have tight cost and power budgets

DNN Workload

  • Data type: tensors
  • A stack of compute-intensive convolution and matrix-multiplication operators, expressed as dataflow
  • Complex control flow with a non-linear structure

  • Sorting is challenging
  • If-style control flow is challenging on a dataflow architecture

Challenges in Deep learning Compiler

  • Each vendor designs its own
    • Model representation
    • Optimization sequence
    • Hardware code generation
    • Runtime for model inference (graph execution and data transfers)
  • The paper targets unification of these steps
  • Issues with the compiler stack
    • A change in the model can break execution
    • No automatic classification of the workload into serial host blocks and parallel accelerator blocks
    • Every change requires reoptimization
  • How the paper addresses these issues (see the sketch after this list)
    • Divide the workload at the IR level across hardware targets, based on cost
    • Use annotations in the model code to mark regions for offloading
    • A framework split into hardware-independent optimization, hardware-specific processing, and code generation
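
A minimal sketch of this unified flow, assuming TVM's Relay BYOC passes (TVM hosts the paper's reference implementation); the codegen name `myaccel` is a placeholder registered by the vendor:

```python
# Sketch: annotate supported ops, merge them into regions, lift regions
# into external functions, then build. "myaccel" is a placeholder name.
from tvm import relay
from tvm.relay import transform

def partition_and_build(mod, params):
    mod = transform.AnnotateTarget(["myaccel"])(mod)  # mark vendor-supported ops
    mod = transform.MergeCompilerRegions()(mod)       # grow maximal offloadable regions
    mod = transform.PartitionGraph()(mod)             # split out accelerator functions
    # Whatever remains is compiled for the host by the default codegen.
    return relay.build(mod, target="llvm", params=params)
```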

Graph Partition

  • Users can partition the model graph into regions via annotations
  • Regions are classified as accelerator-friendly or host-only
  • A multi-level IR is the data structure for partitioning; each op carries attributes describing its computation

Pattern Based Grouping

  • Match a sequence of nodes and replace it with a composite operation (see the sketch below)
  • TODO Grouping
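
A minimal sketch of pattern-based grouping, assuming TVM's Relay pattern language and its `MergeComposite` pass; the composite name `myaccel.conv2d_bias_relu` is a placeholder:

```python
# Sketch: fuse conv2d -> bias_add -> relu into one composite function
# that a vendor codegen can map to a single accelerator instruction.
from tvm.relay import transform
from tvm.relay.dataflow_pattern import is_op, wildcard

def conv2d_bias_relu_pattern():
    conv = is_op("nn.conv2d")(wildcard(), wildcard())
    bias = is_op("nn.bias_add")(conv, wildcard())
    return is_op("nn.relu")(bias)

pattern_table = [("myaccel.conv2d_bias_relu", conv2d_bias_relu_pattern())]

def group_composites(mod):
    # Matched subgraphs become functions tagged with the composite name.
    return transform.MergeComposite(pattern_table)(mod)
```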

Annotation

  • Vendors provide annotation functions that define per-operator metadata, as sketched below
  • The metadata is used when forming regions and when optimizing operations
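
A minimal sketch of a per-operator annotation function, assuming TVM's `register_op_attr` hook (which the `AnnotateTarget` pass consults); the backend name `myaccel` and the float32-only rule are illustrative:

```python
# Sketch: declare which nn.conv2d calls the placeholder "myaccel"
# backend supports; unsupported calls stay on the host.
import tvm

@tvm.ir.register_op_attr("nn.conv2d", "target.myaccel")
def conv2d_is_supported(expr):
    # Illustrative rule: offload only float32 convolutions.
    return expr.args[0].checked_type.dtype == "float32"
```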

Cost based partition

  • Partitioning accounts for the cost of moving data to and from the accelerator (a toy heuristic is sketched below)
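
A toy version of such a decision (hypothetical names and a made-up link bandwidth, not the paper's exact cost model): offload a region only when the compute saving beats the transfer cost.

```python
from dataclasses import dataclass

@dataclass
class RegionCost:
    host_time: float   # estimated run time on the host, seconds
    accel_time: float  # estimated run time on the accelerator, seconds
    bytes_moved: int   # input + output bytes crossing the host/accel link

def should_offload(r: RegionCost, link_bw: float = 16e9) -> bool:
    # Offload only if compute savings exceed the data-movement cost.
    transfer_cost = r.bytes_moved / link_bw  # seconds over a 16 GB/s link
    return (r.host_time - r.accel_time) > transfer_cost
```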

Accelerator Specific Processing

  • Quantization: restrict tensors to the data types the accelerator supports → reduces resource utilization and thus energy
    • User-defined quantization vs. compiler-derived quantization
  • Layout transformation (see the sketch after this list)
    • Tensor layout affects access latency
    • Layout transformation nodes are added at partition-function boundaries
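
A minimal sketch of requesting an accelerator-preferred layout, assuming TVM's `ConvertLayout` pass; choosing NHWC for `nn.conv2d` is an illustrative assumption:

```python
# Sketch: ask for NHWC conv2d; TVM inserts layout_transform nodes
# wherever producer and consumer layouts disagree, which in the BYOC
# flow lands them at the partition-function boundaries.
import tvm
from tvm import relay

def to_accel_layout(mod, layout="NHWC"):
    seq = tvm.transform.Sequential(
        [relay.transform.ConvertLayout({"nn.conv2d": [layout, "default"]})]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```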

Runtime

Notes

  • Classic CNNs offload well with cost-based partitioning; the same did not work for object-detection models such as Fast R-CNN or SSD
  • Some operators cannot be grouped with other compute-intensive ops, e.g., transpose, maximum, reshape

References

  • Chen et al., "Bring Your Own Codegen to Deep Learning Compiler," arXiv, 2021.