Try   HackMD

Matlab Advanced May 2021

Course practicalities

  • Course organizer: Enrico Glerean
  • Contact: enrico.glerean@aalto.fi
  • Zoom Link: SENT VIA EMAIL
  • hackMD for async course chat:
  • Data to play with: https://drive.google.com/drive/folders/1PHT3eonbNnAVHu9uCEqdLaccgmoq5RjG?usp=sharing
  • Times: 11, 12, 18, and 19 May at 11:55-15:00 (we always start 5 minutes before 12:00 to have an informal tea/coffee together). 2 x 15 minutes breaks at the beginning of each hour following the first one (i.e. at 13:00 and 14:00)
  • Learning goals: The goals for this course are practical: to have people to actually solve small advanced matlab problems while learning new tricks.
  • Course structure: each part starts with a short introduction/background, and is then followed by hands-on session of "doing/coding together". The course is tailored for researchers and students at Aalto university and users of the Triton cluster, however similar workflows might be adapted to other environments.
  • Target audience: anyone working with Matlab, whether you need to code new scripts or just read/reuse other's people tools.
  • Course credit: For those who need, the course gives 1ETCS credit (equivalent to 27 hours of work, of which about half are attending the contact sessions). The credit is registered automatically for those who have an Aalto student number. Other participants can request a certificate that can be converted into 1 ECTS.
  • To get the 1 ECTS: attend all contact sessions, complete the hands-on parts during the sessions, complete the homeworks. If you need to skip more than one lecture, you will be asked to compensate with extra homeworks (this is just to be fair towards other participants, not to punish you!).
  • Homework: complete

Day 1 (Tue 11/05/2021) - Matlab workflows at Aalto. Loading and handling datasets efficiently

Learning Outcomes:

  • you know how to start Matlab in multiple systems at Aalto and are able to request more resources when you need
  • you are able to load datasets in a format that is not already available in Matlab
  • you are able to handle very large datasets efficiently

Timetable for day 1

Start End Topic Notes
11:55 12:00 Informal tea/coffee break
12:00 12:15 Practicalities+Icebreaker
12:15 12:50 Matlab workflows
12:50 13:00 Q&A
13:00 13:15 Break
13:15 14:00 Loading data pt1
14:00 14:15 Break
14:15 15:00 Loading (big) data pt2

Workflows at Aalto

Link will be here as well as commands to log in to various sytems

Exercise 1.1: matlab workflows

  1. Connect to https://vdi.aalto.fi and pick ubuntu linux
  2. start matlab and run rng(0);rand write the number in the hackmd
  3. run the command version and paste the output on hackmd
  4. Type pwd and paste the output on hackmd
  5. Type mkdir code; cd code. What happened?
  6. Close Matlab. Then, connect to triton with graphical tunnel and start an interactive session (remember that nothing should on the login node)
  7. module load an older version of matlab
  8. Do you get the same random number? What is the output of command version?
  9. Type pwd, is it different than before?
  10. Can you go to the "code" subfolder you created earlier?

Use /m/cs/work/USERNAME on VDI or /scratch/work/USERNAME on triton

Matlab and data

Exercise 1.2: working with (big) mat files

From the file bigdata.mat:

  1. count how many variables are stored inside bigdata and write it on hackMD
  2. Find the variable with name starting with letter 'x' and ending with letter 'y', load its content in memory
  3. compute the global mean of the variable
  4. write the result on hackmd
variables=whos('-file','bigdata.mat');

for v=1:length(variables)
    out=regexp(variables(v).name,'^x.*y$','match');
    if(~isempty(out))
        disp(out)
        load('bigdata.mat',out{1});
        break
    end
end

eval(['mean(' out{1} '(:))'])

  • Load other types of data. Exercise 1.3

Exercise 1.3: images timeseries

  1. Download and unzip the file "spatio_temporal.zip": each image is a time point
  2. load the first 100 images in memory
  3. compute the mean across time
  4. Write on hackmd the mean value for location (100,100)
  5. optional: make a video visualization
%% Solution

close all
clear all

N=100;

for n=1:N
    filein=['spatio_temporal/IMG' num2str(n,'%05.f') '.png'];
    temp=imread(filein);
    data(:,:,n)=temp;
    figure(1)
    imagesc(temp)
    colorbar
    pause(0.1)
end
    
%% results
size(data)
M=mean(data,3);
M(100,100)



Large datasets.

%% load the slices in slices.zip and store them as memorymap
% more code will come here
clear all
close all

% let's start with making sure that we can load the slices

temp=imread('slices/200.png');

% we need to count how many images

Z=length(dir('slices/*.png'));

% since we use memoryMapping we need to pre-estimate the size of our data
% that means that we need to know also the size of our images
[X Y]=size(temp);

% I can now prepare the binary file that will be used for memory mapping
fileID = fopen('my_binary_MRI.bin','w');
% here I will load all the slices and store them in the memMap file
for z=1:Z
   disp(num2str(z));
   data=imread(['slices/' num2str(z) '.png']); 
   fwrite(fileID,data,'uint8');
end
% close the memmap file
fclose(fileID)

%% another day start from the memMap file directly
clear all
close all
load memMapMRI

% explore the struct
m

% show one slice
imagesc(squeeze(m.Data.x(170,:,:))')
axis xy
colormap('gray')

% compare the memory syze
whos('m').bytes
temp=m.Data.x;
whos('temp').bytes

%% some 3D viz
close all
figure
datatoplot=smooth3(double(m.Data.x),'box',5);
p=patch(isosurface(datatoplot,20));
p.FaceColor = [1 .8 .8];
p.EdgeColor = 'none';
daspect([1 1 1])
view(3); 
axis tight
camlight 
lighting gouraud

Exercise 1.4 (optional): big data with memoryMaps

  1. Use the images in spatio_temporal.zip
  2. create a memorymap file for all images
  3. extract the time series for coordinate (100,100) and plot it
  4. Write 5th 50th and 95th percentiles on hackmd
  5. optional: compare memory size of the memory map object, versus memory size of the whole data

Day 2 (Wed 12/05/2021) - Matlab advanced computing techniques

Learning Outcomes:

  • you know how to profile your code and optimize for speed without parallelization
  • you know how to compile C++ code
  • you know how to perform machine learning

Timetable for day 2

Start End Topic Notes
11:55 12:00 Informal tea/coffee break
12:00 12:15 Practicalities+Icebreaker
12:15 13:00 Matlab code profiling and debugging
13:00 13:15 Break
13:15 14:00 Embedding and compiling c++ into Matlab functions
14:00 14:15 Break
14:15 15:00 Machine learning with Matlab

Code profiling and improving performance

Exercise: write a better Pearson's corrlation

%% Code performance
% Compare the performance of Pearson's correlation with tic toc
% Run the code profiler to spot what takes most resources
% Use the editor to spot where the code can be improved
% Follow the recomendations at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html

clear all

%% Generate two random variables with some autocorrelation
tic
rng(0)
x=0;
y=0;
for n=2:1e4
    x(n,1)=randn+x(n-1);
    y(n,1)=randn+y(n-1);
end
data_generation_time=toc

%% My correlation
tic
for n=1:1e4
    mx=mean(x);
    my=mean(y);
    sx=std(x);
    sy=std(y);
    if(sx == 0 | sy == 0)
        error('we cannot divide by zero')
    end
    tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end

r=sum(tempr)/(length(tempr)-1)
my_correlation_time=toc

%% My correlation as a function


% % Task: Create a function called 'myCorrelation' with a copy paste of the
% % previous section (without the tic toc lines)

% tic
% r_fun=myCorrelation(x,y)
% my_correlation_function_time=toc


%% My better correlation

% use the help of the editor and/or the tips at  https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html

% tic
% ADD CODE HERE  for a better version of the correlation block
% my_better_correlation_time = toc


%% Matlab correlation

% Let's check the performance of Matlab's Pearson's correlation implementation
tic
r_matlab = corr(x,y)
matlabs_correlation_time = toc



%% code as a function

% Turn the correlation section into a function, add the function here below

function r = myCorrelation(x,y)

% ADD CODE HERE 

end
%% SOLUTION
%% Code performance
% Compare the performance of Pearson's correlation with tic toc
% Run the code profiler to spot what takes most resources
% Use the editor to spot where the code can be improved
% Follow the recomendations at https://se.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html

clear all
close all

%% Generate two random variables with some autocorrelation
tic
rng(0)
x=0;
y=0;
for n=2:1e4
    x(n,1)=randn+x(n-1);
    y(n,1)=randn+y(n-1);
end
data_generation_time=toc

%% My correlation
tic
for n=1:1e4
    mx=mean(x);
    my=mean(y);
    sx=std(x);
    sy=std(y);
    if(sx == 0 | sy == 0)
        error('we cannot divide by zero')
    end
    tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end

r=sum(tempr)/(length(tempr)-1)
my_correlation_time=toc

%% My correlation as a function

tic
% % Task: Create a function called 'myCorrelation' with a copy paste of the
% % previous section (without the tic toc lines)

r_fun=myCorrelation(x,y)

my_correlation_function_time=toc


%% My better correlation
tic

% Write a better version of the correlation

mx=mean(x);
my=mean(y);
sx=std(x);
sy=std(y);
if(sx == 0 || sy == 0);
    error('we cannot divide by zero');
end
r_better = ((x-mx)'*(y-my))/(sx*sy*(length(x)-1))

my_better_correlation_time = toc


%% Matlab correlation

% Let's check the performance of Matlab's Pearson's correlation implementation
tic
r_matlab = corr(x,y)
matlabs_correlation_time = toc

%% plot the results
figure
h = barh([matlabs_correlation_time my_correlation_time my_correlation_function_time my_better_correlation_time])
yticklabels({'Matlab''s Pearson''s correlation ','my corr','my corr function','my better corr'})
set(gca,'XScale','log')
xlabel('Time [s]')
title('Timings')

figure
h = barh(matlabs_correlation_time./[matlabs_correlation_time my_correlation_time my_correlation_function_time my_better_correlation_time])
yticklabels({'Matlab''s Pearson''s correlation ','my corr','my corr function','my better corr'})
set(gca,'XScale','log')
xlabel('How many times faster')
title('Speed vs Matlab''s function')



%% code as a function

% Turn the correlation into a function, add the function here below

function r = myCorrelation(x,y)
N=length(x);
for n=1:N
    mx=mean(x);
    my=mean(y);
    sx=std(x);
    sy=std(y);
    if(sx == 0 | sy == 0)
        error('we cannot divide by zero')
    end
    tempr(n,1)=(x(n) - mx)*(y(n) -my)/(sx*sy);
end

r=sum(tempr)/(length(tempr)-1);

end

Embedding c/++ into Matlab

C code to save as arrayProduct.c:

/*==========================================================
 * arrayProduct.c - example in MATLAB External Interfaces
 *
 * Multiplies an input scalar (multiplier) 
 * times a 1xN matrix (inMatrix)
 * and outputs a 1xN matrix (outMatrix)
 *
 * The calling syntax is:
 *
 *		outMatrix = arrayProduct(multiplier, inMatrix)
 *
 * This is a MEX-file for MATLAB.
 * Copyright 2007-2012 The MathWorks, Inc.
 *
 *========================================================*/

#include "mex.h"

/* The computational routine */
void arrayProduct(double x, double *y, double *z, mwSize n)
{
    mwSize i;
    /* multiply each element y by x */
    for (i=0; i<n; i++) {
        z[i] = x * y[i];
    }
}

/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[])
{
    double multiplier;              /* input scalar */
    double *inMatrix;               /* 1xN input matrix */
    size_t ncols;                   /* size of matrix */
    double *outMatrix;              /* output matrix */

    /* check for proper number of arguments */
    if(nrhs!=2) {
        mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs","Two inputs required.");
    }
    if(nlhs!=1) {
        mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs","One output required.");
    }
    /* make sure the first input argument is scalar */
    if( !mxIsDouble(prhs[0]) || 
         mxIsComplex(prhs[0]) ||
         mxGetNumberOfElements(prhs[0])!=1 ) {
        mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notScalar","Input multiplier must be a scalar.");
    }
    
    /* make sure the second input argument is type double */
    if( !mxIsDouble(prhs[1]) || 
         mxIsComplex(prhs[1])) {
        mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notDouble","Input matrix must be type double.");
    }
    
    /* check that number of rows in second input argument is 1 */
    if(mxGetM(prhs[1])!=1) {
        mexErrMsgIdAndTxt("MyToolbox:arrayProduct:notRowVector","Input must be a row vector.");
    }
    
    /* get the value of the scalar input  */
    multiplier = mxGetScalar(prhs[0]);

    /* create a pointer to the real data in the input matrix  */
    inMatrix = mxGetPr(prhs[1]);

    /* get dimensions of the input matrix */
    ncols = mxGetN(prhs[1]);

    /* create the output matrix */
    plhs[0] = mxCreateDoubleMatrix(1,(mwSize)ncols,mxREAL);

    /* get a pointer to the real data in the output matrix */
    outMatrix = mxGetPr(plhs[0]);

    /* call the computational routine */
    arrayProduct(multiplier,inMatrix,outMatrix,(mwSize)ncols);
}

And the same in matlab to save as arrProd.m:

function out=arrProd(s,v);

% Make a function that takes as input parameters:
% A scalar
% A vector of size N
% Checks that the first is a scalar

if(length(s)~=1) 
    error('first input is not a scalar'); 
end

% Checks that the second is a vector
if(~(length(v)>1 && any(size(v)==1) && length(size(v)==2))) 
    error('second input is not a vector'); 
end


% Multiplies each element of the vector with a for loop
N=length(v);
out=zeros(N,1);
for n=1:N
   out(n)=s*v(n); 
end
  • Let's compile and compare! (on VDI)
  • How about doing it on Triton?

Machine learning and Matlab

Matlab or Python? Why not both?

Getting started quickly with classificationLearner

  1. Pick a dataset and load it to workspace
    https://se.mathworks.com/help/stats/select-data-and-validation-for-classification-problem.html#buxgihu-1

  2. Then follow the steps
    https://se.mathworks.com/help/stats/train-classification-models-in-classification-learner-app.html#bu3xete

    • open classification learner
    • new session from workspace
    • select all quick to train
    • press train

Recomended from:
https://matlabacademy.mathworks.com/ "Onramp" courses are fast, there are also deeper courses if you want.

Day 3 (Tue 18/05/2021) - Matlab parallelization

Learning Outcomes:

  • you know how to parallelize your code
  • you know how to use Matlab parallel toolbox
  • you know how to use the Triton cluster for running parallel Matlab tasks
  • you know how to implement GPU processing

Timetable for day 3

Start End Topic Notes
11:55 12:00 Informal tea/coffee break
12:00 12:15 Practicalities+Icebreaker
12:15 13:00 Matlab Parallel Computing Toolbox
13:00 13:15 Break
13:15 14:00 Running parallel Matlab jobs on Triton
14:00 14:15 Break
14:15 15:00 GPU computing

Basic concepts

srun --cpus-per-task=4 --mem=4G --time=02:00:00 --pty bash
# sinteractive --cpus-per-task=4 --mem=4G --time=02:00:00 -p batch 
# ssh to the node to use top

module load matlab
matlab
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
quit

matlab_multithread
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
parpool
gcp
tic;for i=1:16;pause(1);end;toc
tic;parfor i=1:16;pause(1);end;toc
  • Data matters
siz=[8 8 4000000];
data=randn(siz);
tic;fft(data);toc
tic;for i=1:siz(3);fft(data(:,:,i));end;toc
tic;parfor i=1:siz(3);fft(data(:,:,i));end;toc

% repeat with
siz=[8000 8000 4];

  • Limitations of parfoor

    • loop index must be integer and sequential
    • you cannot have nested parfors
    • you cannot have dependent loop body
  • parameter sweeping with parfoor


tic
X=1000000;
Y=10;
val=zeros(X,Y);
for x = 1:X
    for y = 1:Y
        val(x,y)=myparfun(x,y);
    end
end
toc

% prepare the sweeping

input=zeros(X*Y,2);
counter=1;
for x = 1:X
    for y = 1:Y
        input(counter,:)=[x y];
        counter=counter+1;
    end
end

valout=zeros(size(input,1),1);
tic
for i=1:size(input,1)
    valout(i)=myparfun(input(i,1),input(i,2));
end
toc

tic
parfor i=1:size(input,1)
    valout(i)=myparfun(input(i,1),input(i,2));
end
toc


function val=myparfun(x,y)
    A=randn(16);
    val=sum(A(:));
end
  • Matlab Distributed Computing Server and other "cloud" options (Aalto does not have the license, but we can recreate the same with our Triton cluster)
  • gpu examples
srun --cpus-per-task=4 --mem=8G --gres=gpu:1 --reservation=elgrean_course --pty bash
d = gpuDevice
CPU = zeros(30,1);
GPU = zeros(30,1);
% Read example image
I = imread("tumor_091R.tif");

% Compute 30 ffts on CPU
for i = 1:size(CPU,1)
    tic
    fft(I);
    CPU(i) = toc;
end
% Create gpuArray
I = gpuArray(I);
% Compute 30 ffts on GPU
for j = 1:size(GPU,1)
    tic
    fft(I);
    wait(gpuDevice)
    GPU(j) = toc;
end
% Compare average computing time
avCPU = mean(CPU)
avGPU = mean(GPU)
% Plot results
disp(['The average iteration runtime for CPU is ' num2str(avCPU)])
disp(['The average iteration runtime for GPU is ' num2str(avGPU)])
percentDifference = avCPU/avGPU;
sprintf('Mean speedup factor of %i',round(percentDifference))
figure
histogram(CPU,10)
hold on
histogram(GPU,10)
legend('CPU','GPU')
set(gca,'TickDir','out','FontSize',18)
box off
xlabel('Iteration time [s]')
ylabel('#Occurences')

Day 4 (Tue 19/05/2021) - Matlab advanced graphics OR other topics decided by the participants

Learning Outcomes:

  • you know how to visualize complex data in 2D and 3D
  • you know how to create videos with Matlab
  • we can cover other topics depending on participants' requests

Timetable for day 4

Start End Topic Notes
11:55 12:00 Informal tea/coffee break
12:00 12:15 Practicalities+Icebreaker
12:15 13:00 Matlab figures and handles
13:00 13:15 Break
13:15 14:00 Mastering the Patch command
14:00 14:15 Break
14:15 15:00 Other topics

TL;DR for today: go through all the pages and examples at https://www.mathworks.com/help/matlab/graphics.html?s_tid=CRUX_lftnav :)