# Optimization course planning
## Course description
### Node-Level Performance Optimization
This course covers advanced topics in code optimization for x86 platforms (Intel and AMD CPUs). We discuss techniques for analyzing and maximizing both single-core and multi-core performance within a single node. The topics include instruction-level parallelism, vectorization, and efficient utilization of cache and memory. The course consists of lectures and hands-on exercises.

#### Learning outcomes
- Awareness of features and internal workings of x86 CPUs
- Ability to analyze and assess single-node performance
- Ability to vectorize computations
- Ability to optimize cache and memory access

#### Prerequisites
- Good knowledge of C/C++ or Fortran
- Good knowledge of threading using OpenMP
- Basic knowledge of modern CPU architectures

#### Agenda
Day 1
- Overview of performance engineering
- General overview of modern multicore CPUs
- Main memory performance
- Performance analysis tools
Day 2
- Deeper dive into caches
- Detailed look into Intel and AMD CPUs
- Advanced vectorization
- Additional optimization topics
## Detailed agenda
Time: Thu-Fri 23.5. - 24.5.
## Day 1
- 09:00 - 09:10 Welcome to the course, introductions
- 09:10 - 09:40 Overview of performance engineering (Martti)
  - principles, sampling, tracing
  - hardware counters, PAPI
  - roofline model
- 09:40 - 09:50 Break
- 09:50 - 10:30 General overview of modern multicore CPUs (Jussi)
  - front-end, back-end
  - fetch-decode-execute, pipelining
  - vectorization
  - short overview of cache hierarchy and NUMA
- 10:30 - 11:15 Exercise
  - peak flops, effects of pipelining, registers
  - vectorization
  - hw counters with perf?
- 11:15 - 12:00 Main memory performance (Jussi)
  - main memory bandwidth
  - brief mention of the cache hierarchy
  - memory controllers
  - NUMA
  - single vs. multiple threads
  - first touch
  - thread affinity
- 12:00 - 13:00 Lunch break
- 13:00 - 13:45 Exercise
  - STREAM with different numbers of threads / affinities
  - first touch effects
- 13:45 - 14:15 AMD uProf? (Igor)
- 14:15 - 14:30 Coffee
- 14:30 - 15:15 Intel VTune and Advisor (Mikko)
- 15:15 - 16:00 Exercises
  - Playing around with the tools
- 16:00 - 16:15 Summary / Q&A
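
As a possible aid for the roofline model item in the Day 1 overview lecture, here is a minimal sketch of the roofline bound: attainable performance is the smaller of the peak compute rate and memory bandwidth times arithmetic intensity. The function name `attainable_gflops` and the machine numbers (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are illustrative assumptions, not measurements from any course system.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound in GFLOP/s.

    intensity is the arithmetic intensity in flop/byte, so
    bandwidth_gbs * intensity is the memory-bound ceiling.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical node, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# STREAM triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration and 3 * 8 bytes of traffic
# (write-allocate traffic is ignored here for simplicity).
TRIAD_INTENSITY = 2 / 24

if __name__ == "__main__":
    kernels = [("triad", TRIAD_INTENSITY), ("compute-heavy kernel", 20.0)]
    for name, intensity in kernels:
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        limit = "memory" if bound < PEAK_GFLOPS else "compute"
        print(f"{name}: at most {bound:.1f} GFLOP/s ({limit}-bound)")
```

With these assumed numbers, the ridge point sits at 10 flop/byte (peak divided by bandwidth); kernels below it are limited by memory bandwidth, kernels above it by peak compute.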
## Day 2
- 09:00 - 10:00 Deeper dive into caches (Martti)
  - cache line, alignment, associativity
  - basic optimization ideas
  - interactive demos with HW counters
- 10:00 - 11:00 Exercises
- 11:00 - 12:00 Deeper dive into Intel and AMD CPUs (Mikko + Igor)
- 12:00 - 13:00 Lunch
- 13:00 - 13:45 Deeper dive into vectorization (Mikko)
- 13:45 - 14:30 Exercises
- 14:30 - 14:45 Coffee break
- 14:45 - 15:15 Other optimization topics (Jussi)
  - loop transformations etc.
- 15:15 - 16:00 Exercises
- 16:00 - 16:15 Summary / Q&A
## Potentially useful links
- https://github.com/google/benchmark
- https://crd.lbl.gov/departments/computer-science/par/research/roofline/
- https://github.com/Kobzol/hardware-effects
- https://gcc.godbolt.org