kosl (@kosl)

Joined on Sep 29, 2021

  • Intro to parallel programming

    1.1 Welcome to Week 1

    Welcome to the course Introduction to parallel programming! I will be your guide through the weeks. In the first week we introduce the basic principles and paradigms of parallel programming. We will not go deeply into each topic here, since the upcoming weeks are dedicated to them, touching even on some advanced topics in parallel programming. The more advanced topics are described with the purpose of telling you what is important to know and giving you a starting point on where to learn more. We hope that at some point you will be able to do your own coding and advance your learning beyond this course.

    Many scientific and engineering challenges can be tackled with computing, and in general there are two views on it. One is distributed serial computing: you have many independent problems to solve, each small enough for a single computer, and the total time to solve them is not critical. The other is parallel computing, where one problem must be divided across many compute cores or nodes, essentially many computers, because it is too big to fit into one computer or one computer would be too slow to solve it. The first approach is often called grid computing, while the second is generally referred to as supercomputing, since you need a supercomputer to solve your problem. You have probably also heard of cloud computing, which is a commercial offering of grid or supercomputing resources. If you are a serious user, you will quickly learn that, to use an analogy, buying a car is cheaper than renting it in the long term: no commercial offering will match the cost of owning a machine that is 90 or 100% utilized. Supercomputers are not bought for fun; they are usually very heavily utilized, which means that during their lifetime they operate near peak performance.

    A program is usually written and compiled into instructions for serial execution on one processor. To achieve speedup, you have to divide the problem into separate subproblems that can be executed in parallel on the many processors available, as sketched in the example below.
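    A minimal sketch of this idea in plain C (an illustration, not code from the course): the same sum is first computed serially, then by splitting the loop range into independent chunks, one per worker, whose partial results are combined at the end. The chunks are the subproblems that a parallel programming model (threads, MPI processes, GPU threads) would execute concurrently.

    ```c
    #include <stdio.h>

    #define N 1000000
    #define WORKERS 4   /* imagine each chunk running on its own core */

    int main(void)
    {
        static double a[N];
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Serial version: one processor walks through the whole range. */
        double serial_sum = 0.0;
        for (int i = 0; i < N; i++)
            serial_sum += a[i];

        /* Decomposed version: the range is split into WORKERS independent
           subproblems; each partial sum could be computed in parallel. */
        double partial[WORKERS] = {0.0};
        for (int w = 0; w < WORKERS; w++) {
            int lo = w * (N / WORKERS);
            int hi = (w == WORKERS - 1) ? N : lo + N / WORKERS;
            for (int i = lo; i < hi; i++)
                partial[w] += a[i];
        }
        double parallel_sum = 0.0;
        for (int w = 0; w < WORKERS; w++)
            parallel_sum += partial[w];

        printf("serial = %f, decomposed = %f\n", serial_sum, parallel_sum);
        return 0;
    }
    ```

    Here the chunks still run one after another; the following weeks show how OpenMP, MPI and GPUs actually execute such independent chunks at the same time.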
  • Beyond OpenMP and MPI: GPU parallelisation

    5.1 Welcome to Week 5

    In this week we go beyond the classical parallelisation paradigms (OpenMP and MPI), which are generally associated with CPUs. We will talk about Graphics Processing Units (GPUs), which were originally developed for executing graphical tasks (including rendering in computer games) but can also be used for general-purpose computing in many fields. We already introduced GPUs in the first week; through the steps of this week we will give some more details. We shall present the architecture of modern GPUs and then the available GPU execution models for computing acceleration. OpenMP as a means of "off-loading" parallel tasks to GPUs will be briefly introduced, while the main focus will be on two extensions of C for programming GPUs: CUDA and OpenCL. Directive-based GPU off-loading is simple to use but quite limited; direct GPU programming, in contrast, requires more effort but offers more flexibility. Hopefully, the provided hands-on examples will introduce you to the topic of GPU programming as smoothly as possible.

    5.2 GPU Architecture

    To better understand the capabilities of GPUs in terms of computing acceleration, we will have a look at the typical architecture of a modern GPU. As we already pointed out, GPUs were originally designed to accelerate graphics. They excel at operations (such as shading and rendering) on the graphical primitives that constitute a 3D graphical object. The main characteristic of these primitives is that they are independent or, in other words, they can be processed independently in a parallel fashion. Thus, GPU acceleration of graphics was designed for the execution of inherently parallel tasks. CPUs, on the other hand, are designed to execute the workflow of any general-purpose program, which may not involve many parallel tasks. These different design principles are reflected in the hardware: GPUs have many more processing units and higher memory bandwidth, while CPUs offer more sophisticated processing of individual instructions and faster clock rates.
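    As a first taste of what CUDA C looks like (a minimal sketch, not one of the course's hands-on examples), the program below adds two vectors on the GPU. Each GPU thread handles one element, which is exactly the kind of independent, inherently parallel work described above. It would live in a .cu file and be compiled with nvcc.

    ```c
    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Kernel: each GPU thread adds one pair of elements. */
    __global__ void vector_add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        /* Allocate device memory and copy the inputs to the GPU. */
        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        /* Launch enough blocks of 256 threads to cover all n elements. */
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

        /* Copy the result back to the host. */
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", h_c[0]);   /* expect 3.0 */

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }
    ```

    Run on a CUDA-capable GPU, it should print c[0] = 3.0; OpenCL and OpenMP off-loading express the same pattern (copy data to the device, launch many lightweight threads, copy results back) with different syntax.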
  • Message Passing Interface (MPI)

    3.1 Welcome to Week 3

    The Message Passing Interface (MPI) presented this week will introduce communicators and how processes communicate through messages. Sending and receiving messages is used for one-to-one communication. With one-to-many broadcast and scatter operations data can be distributed, and it can be collected back with reduce and gather operations. All these frequently used collective operations are provided by the MPI library. We will present the differences between these MPI functions through many examples and exercises that you may explore further. The structure of this week was inspired by HLRS MPI courses (courtesy: Rolf Rabenseifner (HLRS)).

    3.2 Communicator in MPI

    In the introduction to MPI in the first week we already saw the simple Hello world exercise. Yet, in order to write actually useful applications, we will need to learn some basic routines. To start, we need to understand what a communicator in MPI is. When we launch an MPI application, the environment takes all the processes (and the cores they run on) involved in the application and binds them together in what is referred to as a communicator. A communicator is like a set that binds processes together and ensures that only processes belonging to the same communicator can exchange messages with each other. The default communicator that we will use most often is MPI_COMM_WORLD, which contains all the processes started with the application.
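    A minimal sketch in C (not one of the course's own exercises) showing MPI_COMM_WORLD together with one collective operation: every process reports its rank within the communicator, and MPI_Reduce sums the ranks onto process 0.

    ```c
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        /* Every process asks the default communicator who it is
           and how many processes there are in total. */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Hello from rank %d of %d\n", rank, size);

        /* Collective operation: sum all ranks onto process 0. */
        int local = rank, total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum of all ranks = %d\n", total);

        MPI_Finalize();
        return 0;
    }
    ```

    Compiled with mpicc and started with, for example, mpirun -np 4, each process prints its rank and rank 0 reports the sum 0+1+2+3 = 6.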
  • Getting started with OpenMP

    2.1 Welcome to Week 2

    OpenMP is a shared-memory threaded programming model that can be used to incrementally upgrade existing code for parallel execution. The upgraded code can still be compiled serially, which is a great feature of OpenMP: it gives you the opportunity to check whether parallel execution produces the same results as the serial version. To better understand the automatic parallelisation, we will initially take a look at the runtime functions, even though they are usually not needed in simple OpenMP programs. Some basic compiler (pragma) directives and the scope of variables will be introduced to understand the logic of threaded access to memory. Work-sharing directives and synchronisation of threads will be discussed within a few examples. How to collect results from threads will be shown with the common reduction clauses. At the end of this week we will present the interesting task-based parallelism approach. Don't forget to experiment with the exercises, because they are your main learning opportunity. Let's dive in. The structure of this week was inspired by HLRS OpenMP courses (courtesy: Rolf Rabenseifner (HLRS)).

    2.2 Runtime functions

    The purpose of the runtime functions is the management or modification of the parallel threads that we want to use in our code, for example setting or querying their number. They come with the OpenMP library.
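    A minimal sketch (assuming gcc with the -fopenmp flag; not one of the week's exercises) combining a runtime function, a work-sharing directive and a reduction clause. The runtime call is guarded by the _OPENMP macro so that the same file also compiles without OpenMP support.

    ```c
    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>     /* the runtime functions live here */
    #endif

    #define N 1000000

    int main(void)
    {
    #ifdef _OPENMP
        /* Runtime function: how many threads may the parallel regions use? */
        printf("Maximum number of threads: %d\n", omp_get_max_threads());
    #endif

        static double a[N];
        for (int i = 0; i < N; i++)
            a[i] = 0.5;

        double sum = 0.0;
        /* Work-sharing directive: loop iterations are divided among threads;
           the reduction clause combines each thread's private partial sum. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }
    ```

    Compiled with gcc -fopenmp it runs threaded; compiled without the flag the pragma is ignored and the same code runs serially, so the two results can be compared directly, as described above.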