# 2023-03-14-NCL
## N8 CIR WHPC HPC Training
# https://hackmd.io/@rseteam-ncl/2023-03-14-NCL

## Attendance

Name, Email, University

1. Jannetta Steyn, jannetta.steyn@newcastle.ac.uk, Newcastle University
2. Frances Turner, frances.hutchings@newcastle.ac.uk
3. Sarah Griffin, s.griffin@newcastle.ac.uk, Newcastle University
5. Heather Davies, h.h.davies@liverpool.ac.uk, University of Liverpool
6. Filip Biały, bialy@europa-uni.de, European New School of Digital Studies
7. Hallie Jakolins, h.m.jakolins1@ncl.ac.uk, Newcastle University
8. Xingna Zhang, xingna.zhang@liverpool.ac.uk, University of Liverpool
9. Emily Hickinbotham, e.hickinbotham@newcastle.ac.uk, Newcastle University
10. Han Hu, han.hu@liverpool.ac.uk, University of Liverpool
11. Dahui Yu, d.yu10@newcastle.ac.uk, Newcastle University
12. Caroline Jeffery, c.jeffery@liverpool.ac.uk, University of Liverpool
13. Katy Hollands, katy.hollands@york.ac.uk, University of York
14. Yuzheng Zhang, yuzheng.zhang@durham.ac.uk, Durham University
15. Neha M Ramteke, nmr519@york.ac.uk, University of York
16. Vera Vinken, v.a.vinken2@newcastle.ac.uk, Newcastle University
17. Jack Hardman, j.hardman@newcastle.ac.uk, Newcastle University
18. Hannah Lloyd-hartley, h.lloyd-hartley2@ncl.ac.uk
19. Julie Jacob Thomas, j.j.thomas3@ncl.ac.uk, Newcastle University
20. Shamil Al-Ameen, s.q.i.al-ameen2@newcastle.ac.uk, Newcastle University
21. Dorcas Ojo, doo513@york.ac.uk, University of York
22. shahram.mesdaghi@liverpool.ac.uk, University of Liverpool
23. Abdulrahman Dallak, a.dallak2@newcastle.ac.uk, Newcastle University
24. P-J Noble, rtnorle@liverpool.ac.uk, University of Liverpool
25. Francesca Ridley, f.ridley1@ncl.ac.uk, Newcastle University

[Course webpage](https://nclrse-training.github.io/2023-03-14-NCL/)

[Link to HackMD](https://hackmd.io/@rseteam-ncl/2023-03-14-NCL)

[Link to SAFE](https://safe.epcc.ed.ac.uk)

[Cirrus Documentation](https://cirrus.readthedocs.io/en/main/)

## Python code: pi-mpi-cirrus.py

```
#!/usr/bin/env python3

"""Parallel example code for estimating the value of π.

We can estimate the value of π by a stochastic algorithm. Consider a
circle of radius 1, inside a square that bounds it, with vertices at
(1,1), (1,-1), (-1,-1), and (-1,1). The area of the circle is just π,
whereas the area of the square is 4. So, the fraction of the area of the
square which is covered by the circle is π/4.

A point selected at random uniformly from the square thus has a
probability π/4 of being within the circle.

We can estimate π by examining a large number of randomly-selected
points from the square, and seeing what fraction of them lie within the
circle. If this fraction is f, then our estimate for π is π ≈ 4f.

Thanks to symmetry, we can compute points in one quadrant, rather
than within the entire square, and arrive at equivalent results.

This task lends itself naturally to parallelization -- the task of
selecting a sample point and deciding whether or not it's inside the
circle is independent of all the other samples, so they can be done
simultaneously. We only need to aggregate the data at the end to compute
our fraction f and our estimate for π.
"""

import mpi4py.rc
# Turn off automatic MPI initialisation - the MPI initialization
# is invoked explicitly by calling MPI.Init().
mpi4py.rc.initialize = False

import numpy as np
import sys
import datetime
from mpi4py import MPI


def inside_circle(total_count):
    """Single-processor task for a group of samples.

    Generates uniform random x and y arrays of size total_count, on the
    interval [0,1), and returns the number of the resulting (x,y) pairs
    which lie inside the unit circle.
    """
    # `rank` is the module-level variable set in the main block below.
    host_name = MPI.Get_processor_name()
    print(f"Rank {rank} generating {total_count:n} samples on host {host_name}.")

    x = np.float64(np.random.uniform(size=total_count))
    y = np.float64(np.random.uniform(size=total_count))

    radii = np.sqrt(x*x + y*y)

    count = len(radii[np.where(radii<=1.0)])

    return count


if __name__ == '__main__':
    """Main executable.

    This function runs the 'inside_circle' function with a defined number
    of samples. The results are then used to estimate π.

    An estimate of the required memory, elapsed calculation time, and
    accuracy of calculating π are also computed.
    """

    # Initialise MPI explicitly
    MPI.Init()

    # Declare an MPI Communicator for the parallel processes to talk through
    comm = MPI.COMM_WORLD

    # Read the number of parallel processes tied into the comm channel
    cpus = comm.Get_size()

    # Find out the index ("rank") of *this* process
    rank = comm.Get_rank()

    n_samples = int(sys.argv[1])

    if rank == 0:
        # Time how long it takes to estimate π.
        start_time = datetime.datetime.now()
        # Rank zero builds two arrays with one entry for each rank:
        # one for the number of samples they should run, and
        # one to store the count info each rank returns.
        partitions = [ n_samples // cpus ] * cpus
        counts = [ int(0) ] * cpus
    else:
        partitions = None
        counts = None

    # All ranks participate in the "scatter" operation, which hands each
    # rank its own entry of the list built on rank zero.
    # partition_item is the number of samples this rank should generate,
    # and count_item is the place to put the number of counts we see.
    partition_item = comm.scatter(partitions, root=0)

    # Each rank locally populates its count_item variable.
    count_item = inside_circle(partition_item)

    # All ranks participate in the "gather" operation, which collects
    # every rank's count_item into the list "counts" on rank zero.
    counts = comm.gather(count_item, root=0)

    if rank == 0:
        # Only rank zero holds the gathered counts, so only it computes
        # and writes the result.
        my_pi = 4.0 * sum(counts) / sum(partitions)
        end_time = datetime.datetime.now()
        elapsed_time = (end_time - start_time).total_seconds()

        # Memory required is dominated by the size of x, y, and radii from
        # inside_circle(), calculated in GiB
        size_of_float = np.dtype(np.float64).itemsize
        memory_required = 3 * sum(partitions) * size_of_float / (1024**3)

        # accuracy is calculated as a percent difference from a known estimate of π.
        pi_specific = np.pi
        accuracy = 100*(1-my_pi/pi_specific)

        print(f"Pi: {my_pi:6f}, memory: {memory_required:6f} GiB, time: {elapsed_time:6f} s, error: {accuracy:6f}%")
```

## Linux Command Line Cheat Sheet

HPC systems typically run Linux, and the main way you interact with them is through the command line. Here is a quick reminder of the common commands you will need:

`ls` list what is in the current directory

`ls path/to/folder` list what is in the directory you have given the path to

`ls -a` list what is in the current directory, including hidden files

`cd path/to/folder` change directory to the specified folder

`cd ..` change directory to the folder above the current folder

`pwd` print the file path of the current working directory (to find out where you are)

`cp file1 file2` make a copy of a file; in this example, copy file1 and create a duplicate called file2. You can give a full file path to make the copy in a different location.

`mv file1 new/location/file1` move a file; in this case file1 is moved to a new location in a subfolder.

`mv file1 file2` if you move a file within the same directory, this renames the file. This example renames file1 to file2.

`curl weblink` download contents from a given location on the internet; replace 'weblink' with the location.

`wget weblink` download contents from a given location on the internet. An alternative to curl; not available on Windows by default.

`nano filename` open a file with the text editor nano. NB: nano's shortcuts are shown at the bottom of the nano screen, with `^` meaning Ctrl.
For example, `^X` to exit means press Ctrl+X.

## Windows PowerShell Cheat Sheet

If you are using Windows and don't have a Linux command line available (e.g. Git Bash or the Windows Subsystem for Linux), you can launch Windows PowerShell by searching your programs for 'PowerShell'. Searching for 'cmd' opens the older Command Prompt instead, where the commands below also work. Below is a quick cheat sheet of common commands for Windows PowerShell:

`dir` list what is in the current directory

`cd` as with Linux, this lets you change directory

`notepad filename` nano is not installed by default on Windows, so to edit a file from the PowerShell command line you will need to use Notepad, unless you have an alternative editor installed.
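
If you switch between Linux and Windows, the file operations from both cheat sheets can also be scripted in Python, where the same calls work on either system. The following is only a minimal sketch, assuming Python 3 is installed; the helper names `list_dir`, `copy_file`, and `move_file` are made up for this illustration and are not part of the course material.

```python
import shutil
import tempfile
from pathlib import Path


def list_dir(folder="."):
    """Rough equivalent of `ls -a` or `dir`: sorted names in a folder,
    hidden files included (but without the `.` and `..` entries)."""
    return sorted(p.name for p in Path(folder).iterdir())


def copy_file(src, dst):
    """Rough equivalent of `cp file1 file2`."""
    shutil.copy(str(src), str(dst))


def move_file(src, dst):
    """Rough equivalent of `mv file1 file2` (moving within the same
    folder acts as a rename)."""
    shutil.move(str(src), str(dst))


if __name__ == "__main__":
    # Work in a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        (work / "file1").write_text("hello\n")

        copy_file(work / "file1", work / "file2")   # like `cp file1 file2`
        move_file(work / "file2", work / "file3")   # like `mv file2 file3`

        print(Path.cwd())        # like `pwd`
        print(list_dir(work))    # like `ls`
```

Running the script creates, copies, and renames a file inside a temporary folder, then prints the current working directory and the folder listing.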