Task 1 [COMPLETED]: Set up the experimentation environment
- Create a new folder
- Move the "dummy data"
- Move the TinyLlama model
- Move the launch.sh and train.py scripts
- Update the Singularity image to include nsight-systems
- Create a bare git repo
- Fix the Singularity image
- Run the first job without errors
Task 2 [COMPLETED]: Analyze nsys reports
- Download the first nsys report and get familiar with some of the details
- Mark ranges (e.g., NVTX) around the forward and backward passes to measure the time each takes
- Find one potential way to reduce run time or improve efficiency (for example, pinned host memory instead of pageable memory?)
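The range marking from Task 2 could look roughly like the sketch below. It assumes train.py uses PyTorch and that the ranges are NVTX ranges, which nsys shows as named spans on the timeline. The helper name `profiled_range` and the `timings` dict are made up for illustration and are not part of the actual scripts. Without PyTorch or a GPU the helper falls back to wall-clock timing only.

```python
import time
from contextlib import contextmanager

try:
    import torch
    _HAVE_CUDA = torch.cuda.is_available()
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None
    _HAVE_CUDA = False

timings = {}  # hypothetical: accumulated wall-clock seconds per range name


@contextmanager
def profiled_range(name):
    """Open a named NVTX range (when CUDA is available) and time the block."""
    if _HAVE_CUDA:
        torch.cuda.nvtx.range_push(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        if _HAVE_CUDA:
            # Wait for queued GPU work so the wall-clock number is meaningful.
            # This sync perturbs the pipeline; drop it if only the nsys
            # timeline matters.
            torch.cuda.synchronize()
            torch.cuda.nvtx.range_pop()
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# Usage inside a training step (model, batch are placeholders):
# with profiled_range("forward"):
#     loss = model(**batch).loss
# with profiled_range("backward"):
#     loss.backward()
```

Using a context manager keeps each push paired with its pop even if the step raises an exception, so the ranges in the report stay balanced.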
Task 3: Optimize
- Find out how to utilize all 32 CPU cores
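A possible first step for Task 3, sketched under the assumption that train.py feeds the model through a PyTorch `DataLoader`: check how many cores the job can actually use, then derive loader settings from that number. The function names and the `reserve_for_main` parameter are hypothetical. `num_workers`, `pin_memory`, and `persistent_workers` are real `DataLoader` arguments, and `pin_memory=True` also covers the pinned-memory idea from Task 2.

```python
import os


def usable_cpu_count():
    """Return the number of CPU cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects the CPU affinity mask, which a job scheduler may
        # restrict to fewer cores than the node physically has.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback: all cores the OS reports


def dataloader_kwargs(reserve_for_main=1):
    """Suggest DataLoader settings; reserve_for_main is a hypothetical knob."""
    workers = max(usable_cpu_count() - reserve_for_main, 0)
    return {
        "num_workers": workers,             # worker processes loading batches
        "pin_memory": True,                 # page-locked host buffers
        "persistent_workers": workers > 0,  # only valid when num_workers > 0
    }


if __name__ == "__main__":
    print("usable cores:", usable_cpu_count())
    print("suggested DataLoader kwargs:", dataloader_kwargs())
```

If the reported count is below 32 inside the job, the limit probably comes from the job request rather than from the training script. On a Slurm cluster, for instance, that would mean raising `--cpus-per-task` in launch.sh; whether this setup uses Slurm is an assumption.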