###### tags: `chalmers`

# Imsys (guest lecture)

Within Trancoso's Sustainable Computing course.

## Levels (bottom-up)

### Cloud
- data inference ($\neq$ training, which requires global data sharing)

### DNN
- problem: complexity
- solution: a "Cambrian explosion" of many different families (and species) of NN solutions

### Devices
- problem: energy efficiency!

### Sensors
Sensors everywhere.
- problem: energy efficiency, again!
- solutions:
  - edge computing
  - distribution, with two possible approaches:
    - a well-defined interface between the NN and the edge devices
    - extend the NN to include the edge devices (one big black box)

## Trends

### NN complexity
- exponential increase in the number of parameters in the average model (doubling every 3.5 months; see the arithmetic below)
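
Taking the lecture's 3.5-month doubling time at face value, the implied annual growth factor is roughly

$$
2^{12/3.5} \approx 10.8\times \text{ per year}
$$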

### Hardware development
- end of Moore's Law since 2012
- memory has become the most limiting factor -> avoid moving data around

### NN learning algorithms
- convolution: depthwise (within a single layer) vs. point-wise ("across channels"* kernels); see the sketch below
  *channels like R, G and B for a single pixel
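
A minimal NumPy sketch of why the depthwise + point-wise factorisation pays off in parameter count; the shapes are assumed for illustration, not taken from the lecture:

```python
import numpy as np

C_in, C_out, K = 32, 64, 3  # channels in/out and kernel size (assumed values)

# Standard convolution: one K x K kernel per (input, output) channel pair.
standard = K * K * C_in * C_out

# Depthwise: one K x K kernel per input channel (stays "on the same layer").
depthwise = K * K * C_in
# Point-wise: 1 x 1 kernels that mix "across channels" (e.g. R, G, B of a pixel).
pointwise = 1 * 1 * C_in * C_out

print(f"standard:  {standard}")               # 18432 parameters
print(f"separable: {depthwise + pointwise}")  # 2336 parameters, ~8x fewer
```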

### Hardware architecture (accelerators)
- hard to program (the algorithms are almost set in stone)
- System on Chip (SoC)
- on/off operation with high energy demand -> sudden spikes in energy consumption (energy spiking)
- Network on Chip (NoC) -> fast data transfer between topologically near cores

## Work distribution among different hardware

```
CPU    -> GPU    -> TPU    -> ΣPU
scalar    vector    tensor    kernel
```
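
A rough NumPy illustration of what the scalar/vector/tensor granularities mean for the same multiply-accumulate workload (the ΣPU "kernel" level is not sketched; the shapes are made up):

```python
import numpy as np

A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 4).astype(np.float32)

# Scalar (CPU-style): one multiply-accumulate at a time.
C_scalar = np.zeros((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        for k in range(8):
            C_scalar[i, j] += A[i, k] * B[k, j]

# Vector (GPU/SIMD-style): one row-times-column dot product per step.
C_vector = np.array([[A[i] @ B[:, j] for j in range(4)] for i in range(4)])

# Tensor (TPU-style): the whole matrix product as a single operation.
C_tensor = A @ B

assert np.allclose(C_scalar, C_vector) and np.allclose(C_scalar, C_tensor)
```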

### 8-bit data quantization
Approximate floats using unsigned integers; see the sketch below.
- pros: much more efficient
- cons: less accurate, has some overhead (time)
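
A minimal sketch of generic affine (scale + zero-point) quantization to `uint8`, in the spirit of the Wikipedia article linked at the bottom; this is a common textbook formulation, not necessarily Imsys' scheme:

```python
import numpy as np

def quantize(x):
    """Map float values onto the uint8 grid [0, 255] via a scale and a zero point."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = int(np.round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximately recover the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize(x)
err = np.abs(x - dequantize(q, s, z)).max()
print(f"max rounding error: {err:.4f} (scale = {s:.4f})")  # the "less accurate" cost
```

The worst-case rounding error stays within one quantization step (the scale), which is the accuracy cost listed above.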

### Optimization
- operation fusion (basically multiply-and-accumulate; sketched below)
- layer fusion (save memory, basically by reordering the loops that compute the convolutions)
- skipping layers -> saves memory
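
A toy Python sketch of the intuition behind operation fusion: computing $y = wx + b$ without materialising an intermediate array (illustrative only; a hardware multiply-accumulate unit fuses the two operations per element):

```python
import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)
w, b = np.float32(0.5), np.float32(1.0)

# Unfused: w*x is written out as a full temporary array, then re-read to add b.
tmp = w * x
y_unfused = tmp + b

# "Fused" (sketched): reuse a single buffer, so no temporary is allocated; a real
# fused kernel goes further and applies multiply and add in one pass per element.
y_fused = np.empty_like(x)
np.multiply(x, w, out=y_fused)
np.add(y_fused, b, out=y_fused)

assert np.allclose(y_unfused, y_fused)
```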

A typical toolchain:

```
e.g. TensorFlow -> NN Compiler -> Debugger + Simulator/FPGA-Emulator
```

## Future

### Research opportunities
- Accelerator Evaluation Platforms

### Industry
- Iterative Algorithm Development @Imsys

Reference: https://en.wikipedia.org/wiki/Quantization_(signal_processing)