Task 1 [COMPLETED]: Set up the experimentation environment.
- Create a new folder
- Move the "dummy data"
- Move the TinyLlama model
- Move the launch.sh and train.py scripts
- Update the Singularity container to include Nsight Systems (nsys)
- Create a bare git repo
- Fix the Singularity container
- Run the first job without errors

Task 2 [COMPLETED]: Analyze nsys reports
- Download the first nsys report and get familiar with some of the details
- Mark the forward and backward passes with ranges to measure their time
- Find one potential way to reduce the time/improve efficiency (maybe by using pinned memory instead of pageable memory?)

Task 3: Optimize
- Find out how to utilize all 32 CPU cores
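
For Task 3, a possible first step is to check how many cores the job can actually use, since a scheduler or container may expose fewer than the 32 on the node. The sketch below uses only the Python standard library. The helper names `usable_cpu_count` and `suggest_num_workers`, and the idea of reserving one core for the main process, are assumptions for illustration and are not taken from train.py.

```python
import os


def usable_cpu_count() -> int:
    """Return the number of CPU cores this process is allowed to run on."""
    try:
        # Linux only: respects affinity masks set by the scheduler or container.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (e.g. macOS, Windows).
        return os.cpu_count() or 1


def suggest_num_workers(reserve: int = 1) -> int:
    """Hypothetical helper: worker count that leaves `reserve` cores free
    for the main training process, never going below one worker."""
    return max(1, usable_cpu_count() - reserve)


if __name__ == "__main__":
    print(f"usable cores: {usable_cpu_count()}")
    print(f"suggested workers: {suggest_num_workers()}")
```

If train.py loads data through a PyTorch `DataLoader` (an assumption), the suggested value could be passed as `num_workers`, and `pin_memory=True` could be set at the same time to try the pinned-memory idea from Task 2. If the reported count is lower than 32, the limit is probably set in the job request in launch.sh rather than in the training script.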