# Matlab Advanced May 2021
:::success
## Course practicalities
- **Course organizer**: Enrico Glerean
- **Contact**: enrico.glerean@aalto.fi
- **Zoom Link**: SENT VIA EMAIL
- **hackMD for async course chat**:
- https://hackmd.io/LlTt0oBMRZmqn_0G7a__0A?both
- **Data to play with**: https://drive.google.com/drive/folders/1PHT3eonbNnAVHu9uCEqdLaccgmoq5RjG?usp=sharing
- **Times**: **11, 12, 18, and 19 May at 11:55-15:00** (we always start *5 minutes before 12:00* to have an informal tea/coffee together). 2 x 15 minutes breaks at the beginning of each hour following the first one (i.e. at 13:00 and 14:00)
- **Learning goals**: The goals for this course are ***practical***: to have people to actually solve small advanced matlab problems while learning new tricks.
- **Course structure**: each part starts with a short introduction/background, and is then followed by hands-on session of "doing/coding together". The course is tailored for researchers and students at Aalto university and users of the Triton cluster, however similar workflows might be adapted to other environments.
- **Target audience**: anyone working with Matlab, whether you need to code new scripts or just read/reuse other's people tools.
- **Course credit**: For those who need, the course gives 1ETCS credit (equivalent to 27 hours of work, of which about half are attending the contact sessions). The credit is registered automatically for those who have an Aalto student number. Other participants can request a certificate that can be converted into 1 ECTS.
- **To get the 1 ECTS**: attend all contact sessions, complete the hands-on parts during the sessions, complete the homeworks. If you need to skip more than one lecture, you will be asked to compensate with extra homeworks (this is just to be fair towards other participants, not to punish you!).
- **Homework**: complete
:::
## Day 1 (Tue 11/05/2021) - Matlab workflows at Aalto. Loading and handling datasets efficiently
:::spoiler
Learning Outcomes:
- you know how to start Matlab in multiple systems at Aalto and are able to request more resources when you need
- you are able to load datasets in a format that is not already available in Matlab
- you are able to handle very large datasets efficiently
### Timetable for day 1
| Start | End | Topic | Notes |
| ------| ---- | -------- | -------- |
| 11:55 | 12:00 | Informal tea/coffee break | |
| 12:00 | 12:15 | Practicalities+Icebreaker |
| 12:15 | 12:50 | Matlab workflows |
| 12:50 | 13:00 | Q&A
| 13:00 | 13:15 | Break
| 13:15 | 14:00 | Loading data pt1
| 14:00 | 14:15 | Break
| 14:15 | 15:00 | Loading (big) data pt2
## Workflows at Aalto
Link will be here as well as commands to log in to various sytems
## Exercise 1.1: matlab workflows
1. Connect to https://vdi.aalto.fi and pick ubuntu linux
2. start matlab and run `rng(0);rand` write the number in the hackmd
3. run the command `version` and paste the output on hackmd
4. Type `pwd` and paste the output on hackmd
5. Type `mkdir code; cd code`. What happened?
6. Close Matlab. Then, connect to triton with graphical tunnel and start an interactive session (remember that nothing should on the login node)
7. module load an older version of matlab
8. Do you get the same random number? What is the output of command version?
9. Type `pwd`, is it different than before?
10. Can you go to the "code" subfolder you created earlier?
Use /m/cs/work/USERNAME on VDI or /scratch/work/USERNAME on triton
## Matlab and data
- The experiment workflow
- https://se.mathworks.com/help/matlab/data-import-and-analysis.html?s_tid=CRUX_lftnav
- The MAT file format (save, load, whos). Exercise 1.2
## Exercise 1.2: working with (big) mat files
From the file `bigdata.mat`:
1) count how many variables are stored inside bigdata and write it on hackMD
2) Find the variable with name starting with letter 'x' and ending with letter 'y', load its content in memory
3) compute the global mean of the variable
4) write the result on hackmd
```matlab
variables=whos('-file','bigdata.mat');
for v=1:length(variables)
out=regexp(variables(v).name,'^x.*y$','match');
if(~isempty(out))
disp(out)
load('bigdata.mat',out{1});
break
end
end
eval(['mean(' out{1} '(:))'])
```
---
- Load other types of data. Exercise 1.3
## Exercise 1.3: images timeseries
1. Download and unzip the file "spatio_temporal.zip": each image is a time point
2. load the first 100 images in memory
3. compute the mean across time
4. Write on hackmd the mean value for location (100,100)
5. optional: make a video visualization
```matlab
%% Solution
close all
clear all
N=100;
for n=1:N
filein=['spatio_temporal/IMG' num2str(n,'%05.f') '.png'];
temp=imread(filein);
data(:,:,n)=temp;
figure(1)
imagesc(temp)
colorbar
pause(0.1)
end
%% results
size(data)
M=mean(data,3);
M(100,100)
```
---
## Large datasets.
- Overview https://se.mathworks.com/help/matlab/large-files-and-big-data.html?s_tid=CRUX_lftnav
- MemoryMapping example (let's code together!)
```
%% load the slices in slices.zip and store them as memorymap
% more code will come here
clear all
close all
% let's start with making sure that we can load the slices
temp=imread('slices/200.png');
% we need to count how many images
Z=length(dir('slices/*.png'));
% since we use memoryMapping we need to pre-estimate the size of our data
% that means that we need to know also the size of our images
[X Y]=size(temp);
% I can now prepare the binary file that will be used for memory mapping
fileID = fopen('my_binary_MRI.bin','w');
% here I will load all the slices and store them in the memMap file
for z=1:Z
disp(num2str(z));
data=imread(['slices/' num2str(z) '.png']);
fwrite(fileID,data,'uint8');
end
% close the memmap file
fclose(fileID)
%% another day start from the memMap file directly
clear all
close all
load memMapMRI
% explore the struct
m
% show one slice
imagesc(squeeze(m.Data.x(170,:,:))')
axis xy
colormap('gray')
% compare the memory syze
whos('m').bytes
temp=m.Data.x;
whos('temp').bytes
%% some 3D viz
close all
figure
datatoplot=smooth3(double(m.Data.x),'box',5);
p=patch(isosurface(datatoplot,20));
p.FaceColor = [1 .8 .8];
p.EdgeColor = 'none';
daspect([1 1 1])
view(3);
axis tight
camlight
lighting gouraud
```
### Exercise 1.4 (optional): big data with memoryMaps
1. Use the images in spatio_temporal.zip
2. create a memorymap file for all images
3. extract the time series for coordinate (100,100) and plot it
4. Write 5th 50th and 95th percentiles on hackmd
5. optional: compare memory size of the memory map object, versus memory size of the whole data
:::
## Day 2 (Wed 12/05/2021) - Matlab advanced computing techniques
:::spoiler
Learning Outcomes:
- you know how to profile your code and optimize for speed without parallelization
- you know how to compile C++ code
- you know how to perform machine learning
### Timetable for day 2
| Start | End | Topic | Notes |
| ------| ---- | -------- | -------- |
| 11:55 | 12:00 | Informal tea/coffee break | |
| 12:00 | 12:15 | Practicalities+Icebreaker |
| 12:15 | 13:00 | Matlab code profiling and debugging |
| 13:00 | 13:15 | Break
| 13:15 | 14:00 | Embedding and compiling c++ into Matlab functions
| 14:00 | 14:15 | Break
| 14:15 | 15:00 | Machine learning with Matlab
## Code profiling and improving performance
- Overview of "Software Development Tools" https://se.mathworks.com/help/matlab/software-development.html?s_tid=CRUX_lftnav
- Let's test them! Copy the code block below and save it as a matlab script
- Let's "Run and Time"
- "Code analyzer" in the editor
- Follow tips at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html
### Exercise: write a better Pearson's corrlation
```matlab
%% Code performance
% Compare the performance of Pearson's correlation with tic toc
% Run the code profiler to spot what takes most resources
% Use the editor to spot where the code can be improved
% Follow the recomendations at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html
clear all
%% Generate two random variables with some autocorrelation
tic
rng(0)
x=0;
y=0;
for n=2:1e4
x(n,1)=randn+x(n-1);
y(n,1)=randn+y(n-1);
end
data_generation_time=toc
%% My correlation
tic
for n=1:1e4
mx=mean(x);
my=mean(y);
sx=std(x);
sy=std(y);
if(sx == 0 | sy == 0)
error('we cannot divide by zero')
end
tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end
r=sum(tempr)/(length(tempr)-1)
my_correlation_time=toc
%% My correlation as a function
% % Task: Create a function called 'myCorrelation' with a copy paste of the
% % previous section (without the tic toc lines)
% tic
% r_fun=myCorrelation(x,y)
% my_correlation_function_time=toc
%% My better correlation
% use the help of the editor and/or the tips at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html
% tic
% ADD CODE HERE for a better version of the correlation block
% my_better_correlation_time = toc
%% Matlab correlation
% Let's check the performance of Matlab's Pearson's correlation implementation
tic
r_matlab = corr(x,y)
matlabs_correlation_time = toc
%% code as a function
% Turn the correlation section into a function, add the function here below
function r = myCorrelation(x,y)
% ADD CODE HERE
end
```
```
%% SOLUTION
%% Code performance
% Compare the performance of Pearson's correlation with tic toc
% Run the code profiler to spot what takes most resources
% Use the editor to spot where the code can be improved
% Follow the recomendations at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html
clear all
close all
%% Generate two random variables with some autocorrelation
tic
rng(0)
x=0;
y=0;
for n=2:1e4
x(n,1)=randn+x(n-1);
y(n,1)=randn+y(n-1);
end
data_generation_time=toc
%% My correlation
tic
for n=1:1e4
mx=mean(x);
my=mean(y);
sx=std(x);
sy=std(y);
if(sx == 0 | sy == 0)
error('we cannot divide by zero')
end
tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end
r=sum(tempr)/(length(tempr)-1)
my_correlation_time=toc
%% My correlation as a function
tic
% % Task: Create a function called 'myCorrelation' with a copy paste of the
% % previous section (without the tic toc lines)
r_fun=myCorrelation(x,y)
my_correlation_function_time=toc
%% My better correlation
tic
% Write a better version of the correlation
mx=mean(x);
my=mean(y);
sx=std(x);
sy=std(y);
if(sx == 0 || sy == 0);
error('we cannot divide by zero');
end
r_better = ((x-mx)'*(y-my))/(sx*sy*(length(x)-1))
my_better_correlation_time = toc
%% Matlab correlation
% Let's check the performance of Matlab's Pearson's correlation implementation
tic
r_matlab = corr(x,y)
matlabs_correlation_time = toc
%% plot the results
figure
h = barh([matlabs_correlation_time my_correlation_time my_correlation_function_time my_better_correlation_time])
yticklabels({'Matlab''s Pearson''s correlation ','my corr','my corr function','my better corr'})
set(gca,'XScale','log')
xlabel('Time [s]')
title('Timings')
figure
h = barh(matlabs_correlation_time./[matlabs_correlation_time my_correlation_time my_correlation_function_time my_better_correlation_time])
yticklabels({'Matlab''s Pearson''s correlation ','my corr','my corr function','my better corr'})
set(gca,'XScale','log')
xlabel('How many times faster')
title('Speed vs Matlab''s function')
%% code as a function
% Turn the correlation into a function, add the function here below
function r = myCorrelation(x,y)
N=length(x);
for n=1:N
mx=mean(x);
my=mean(y);
sx=std(x);
sy=std(y);
if(sx == 0 | sy == 0)
error('we cannot divide by zero')
end
tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end
r=sum(tempr)/(length(tempr)-1);
end
```
## Embedding c/++ into Matlab
- Reference page https://se.mathworks.com/help/matlab/external-language-interfaces.html?s_tid=CRUX_lftnav
- Exercise together based on https://se.mathworks.com/help/matlab/matlab_external/standalone-example.html
C code to save as `arrayProduct.c`:
```c
/*==========================================================
* arrayProduct.c - example in MATLAB External Interfaces
*
* Multiplies an input scalar (multiplier)
* times a 1xN matrix (inMatrix)
* and outputs a 1xN matrix (outMatrix)
*
* The calling syntax is:
*
* outMatrix = arrayProduct(multiplier, inMatrix)
*
* This is a MEX-file for MATLAB.
* Copyright 2007-2012 The MathWorks, Inc.
*
*========================================================*/
#include "mex.h"
/* The computational routine */
void arrayProduct(double x, double *y, double *z, mwSize n)
{
mwSize i;
/* multiply each element y by x */
for (i=0; i<n; i++) {
z[i] = x * y[i];
}
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
double multiplier; /* input scalar */
double *inMatrix; /* 1xN input matrix */
size_t ncols; /* size of matrix */
double *outMatrix; /* output matrix */
/* check for proper number of arguments */
if(nrhs!=2) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs","Two inputs required.");
}
if(nlhs!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs","One output required.");
}
/* make sure the first input argument is scalar */
if( !mxIsDouble(prhs[0]) ||
mxIsComplex(prhs[0]) ||
mxGetNumberOfElements(prhs[0])!=1 ) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notScalar","Input multiplier must be a scalar.");
}
/* make sure the second input argument is type double */
if( !mxIsDouble(prhs[1]) ||
mxIsComplex(prhs[1])) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notDouble","Input matrix must be type double.");
}
/* check that number of rows in second input argument is 1 */
if(mxGetM(prhs[1])!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notRowVector","Input must be a row vector.");
}
/* get the value of the scalar input */
multiplier = mxGetScalar(prhs[0]);
/* create a pointer to the real data in the input matrix */
inMatrix = mxGetPr(prhs[1]);
/* get dimensions of the input matrix */
ncols = mxGetN(prhs[1]);
/* create the output matrix */
plhs[0] = mxCreateDoubleMatrix(1,(mwSize)ncols,mxREAL);
/* get a pointer to the real data in the output matrix */
outMatrix = mxGetPr(plhs[0]);
/* call the computational routine */
arrayProduct(multiplier,inMatrix,outMatrix,(mwSize)ncols);
}
```
And the same in matlab to save as `arrProd.m`:
```matlab
function out=arrProd(s,v);
% Make a function that takes as input parameters:
% A scalar
% A vector of size N
% Checks that the first is a scalar
if(length(s)~=1)
error('first input is not a scalar');
end
% Checks that the second is a vector
if(~(length(v)>1 && any(size(v)==1) && length(size(v)==2)))
error('second input is not a vector');
end
% Multiplies each element of the vector with a for loop
N=length(v);
out=zeros(N,1);
for n=1:N
out(n)=s*v(n);
end
```
- Let's compile and compare! (on VDI)
- How about doing it on Triton?
## Machine learning and Matlab
Matlab or Python? Why not both?
### Getting started quickly with classificationLearner
1. Pick a dataset and load it to workspace
https://se.mathworks.com/help/stats/select-data-and-validation-for-classification-problem.html#buxgihu-1
2. Then follow the steps
https://se.mathworks.com/help/stats/train-classification-models-in-classification-learner-app.html#bu3xete
- open classification learner
- new session from workspace
- select all quick to train
- press train
Recomended from:
https://matlabacademy.mathworks.com/ "Onramp" courses are fast, there are also deeper courses if you want.
:::
## Day 3 (Tue 18/05/2021) - Matlab parallelization
:::spoiler
Learning Outcomes:
- you know how to parallelize your code
- you know how to use Matlab parallel toolbox
- you know how to use the Triton cluster for running parallel Matlab tasks
- you know how to implement GPU processing
### Timetable for day 3
| Start | End | Topic | Notes |
| ------| ---- | -------- | -------- |
| 11:55 | 12:00 | Informal tea/coffee break | |
| 12:00 | 12:15 | Practicalities+Icebreaker |
| 12:15 | 13:00 | Matlab Parallel Computing Toolbox |
| 13:00 | 13:15 | Break
| 13:15 | 14:00 | Running parallel Matlab jobs on Triton
| 14:00 | 14:15 | Break
| 14:15 | 15:00 | GPU computing
## Basic concepts
- Some old slides but still relevant https://www.mathworks.com/content/dam/mathworks/mathworks-dot-com/images/events/matlabexpo/uk/2016/introduction-to-parallel-computing.pdf
- Some video https://www.mathworks.com/videos/parallel-computing-with-matlab-81694.html
- Multithread: some functions are written with parallelization in mind
- https://www.mathworks.com/discovery/matlab-multicore.html
- Multitread on triton
```matlab
srun --cpus-per-task=4 --mem=4G --time=02:00:00 --pty bash
# sinteractive --cpus-per-task=4 --mem=4G --time=02:00:00 -p batch
# ssh to the node to use top
module load matlab
matlab
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
quit
matlab_multithread
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
```
- Parallel computing (parallel pool) https://www.mathworks.com/help/parallel-computing/parallel-computing-fundamentals.html
- without communication (parfor)
- without synchrony (parfeval)
- with communication and synchrony (spmd)
- Starting the parallel pool and understanding parfor
```matlab
parpool
gcp
tic;for i=1:16;pause(1);end;toc
tic;parfor i=1:16;pause(1);end;toc
```
- Data matters
```matlab
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
tic;for i=1:siz(3);fft(data(:,:,i));end;toc
tic;parfor i=1:siz(3);fft(data(:,:,i));end;toc
% repeat with
siz=[8000 8000 4];
```
- Limitations of parfoor
- loop index must be integer and sequential
- you cannot have nested parfors
- you cannot have dependent loop body
- parameter sweeping with parfoor
```matlab
tic
X=1000000;
Y=10;
val=zeros(X,Y);
for x = 1:X
for y = 1:Y
val(x,y)=myparfun(x,y);
end
end
toc
% prepare the sweeping
input=zeros(X*Y,2);
counter=1;
for x = 1:X
for y = 1:Y
input(counter,:)=[x y];
counter=counter+1;
end
end
valout=zeros(size(input,1),1);
tic
for i=1:size(input,1)
valout(i)=myparfun(input(i,1),input(i,2));
end
toc
tic
parfor i=1:size(input,1)
valout(i)=myparfun(input(i,1),input(i,2));
end
toc
function val=myparfun(x,y)
A=randn(16);
val=sum(A(:));
end
```
- Matlab Distributed Computing Server and other "cloud" options (Aalto does not have the license, but we can recreate the same with our Triton cluster)
- gpu examples
```bash
srun --cpus-per-task=4 --mem=8G --gres=gpu:1 --reservation=elgrean_course --pty bash
```
```matlab
d = gpuDevice
CPU = zeros(30,1);
GPU = zeros(30,1);
% Read example image
I = imread("tumor_091R.tif");
% Compute 30 ffts on CPU
for i = 1:size(CPU,1)
tic
fft(I);
CPU(i) = toc;
end
% Create gpuArray
I = gpuArray(I);
% Compute 30 ffts on GPU
for j = 1:size(GPU,1)
tic
fft(I);
wait(gpuDevice)
GPU(j) = toc;
end
% Compare average computing time
avCPU = mean(CPU)
avGPU = mean(GPU)
% Plot results
disp(['The average iteration runtime for CPU is ' num2str(avCPU)])
disp(['The average iteration runtime for GPU is ' num2str(avGPU)])
percentDifference = avCPU/avGPU;
sprintf('Mean speedup factor of %i',round(percentDifference))
figure
histogram(CPU,10)
hold on
histogram(GPU,10)
legend('CPU','GPU')
set(gca,'TickDir','out','FontSize',18)
box off
xlabel('Iteration time [s]')
ylabel('#Occurences')
```
:::
## Day 4 (Tue 19/05/2021) - Matlab advanced graphics OR other topics decided by the participants
:::spoiler {state="open"}
Learning Outcomes:
- you know how to visualize complex data in 2D and 3D
- you know how to create videos with Matlab
- we can cover other topics depending on participants' requests
### Timetable for day 4
| Start | End | Topic | Notes |
| ------| ---- | -------- | -------- |
| 11:55 | 12:00 | Informal tea/coffee break | |
| 12:00 | 12:15 | Practicalities+Icebreaker |
| 12:15 | 13:00 | Matlab figures and handles |
| 13:00 | 13:15 | Break
| 13:15 | 14:00 | Mastering the Patch command
| 14:00 | 14:15 | Break
| 14:15 | 15:00 | Other topics
TL;DR for today: go through all the pages and examples at https://www.mathworks.com/help/matlab/graphics.html?s_tid=CRUX_lftnav :)
:::