# 3. Multiprocessor Multicore Programming

###### tags: `Massive computing`

## Problems in multiple cores

* **Size** -> matters both in terms of physical space on the die and the speed of the electrical signals, which take longer to travel across a larger chip.
* **Electrical consumption** -> increases linearly with the number of cores.
* **Heat** -> as the electrical consumption increases, the heat inside the processor also increases. If this heat is not dissipated, the CPU might burn or melt.

## Superscalar processor cores

**Superscalar processors** execute, on average, one or more instructions per clock cycle. Since it is not always possible to execute a whole instruction in a single cycle, *execution pipelines* were introduced to overcome this limitation: instruction processing is split into stages (fetch, decode, execute, write-back) so that several instructions are in flight at the same time. Although they were a great advancement, some designs had very important security flaws that allowed for the execution of malicious code, causing their production to be stopped.

## Multicore-Multiprocessor architecture

A multicore-multiprocessor architecture comprises more than one processor per motherboard. Extra hardware components synchronize I/O and memory access, together with a special multichannel memory. It also features a **dedicated external MMU** (Memory Management Unit).

### Pros

* The number of cores can be increased, yielding faster execution.

### Cons

* Needs a separate controller for I/O and memory.
* Needs a more complex OS.

## Multicore Programming

> [!info]
> Both constants and global variables MUST always be defined outside a parallel environment.

### Single thread

Many common programs are single threaded, i.e., they execute one instruction at a time in a sequential manner. This does not take full advantage of all the PC's resources, but it is the safest programming method, as it avoids I/O and memory access conflicts. Critical systems (e.g., the power grid or some airplane systems) are programmed in this fashion.

### Parallel programming

Parallel programs can be classified depending on the relation between data and instructions in the following ways:

|               | Single instruction | Multiple instructions |
| ------------- | ------------------ | --------------------- |
| Single data   | SISD (sequential)  | MISD                  |
| Multiple data | SIMD               | MIMD                  |

#### Examples of cases

* The **dot product** is considered to be SIMD, as we apply a single operation sequence (first multiplication, then addition) to many entries. For vectors of length $n$, we apply the **multiplication** operation $n$ times and then sum the results. If we assume that each operation completes in a single clock cycle, we need $n$ clock cycles for multiplying the terms and another $n$ clock cycles for the additions.

  If we had 4 CPU cores to parallelize with, we could perform four multiplications at a time, reducing the multiplication stage from $n$ to $\frac{n}{4}$ clock cycles. By summing in pairs (a tree reduction), the addition stage drops from $n$ clock cycles to roughly $\log_2(n)$ parallel steps, since each step halves the number of partial sums. This is merely an approximation, as we would still perform slightly worse due to overhead and other considerations. A code sketch of this idea is given at the end of this section.

### Types (by granularity)

Different parallel programming types can be defined by the granularity used, i.e., the size of the code blocks that are to be executed by the parallel threads (a short code sketch follows this list). We have:

- **Fine granularity**: parallelize small operations;
- **Middle granularity**: parallelize a group of operations;
- **Coarse granularity**: parallelize a big group of operations.
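As a minimal sketch of the difference, assuming C with OpenMP (the array size and chunk sizes are illustrative, not prescribed by the notes), the same loop can be scheduled with fine or coarse granularity:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N], b[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* Fine granularity: threads take one iteration at a time.
       Good load balancing, but more scheduling overhead. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < N; i++)
        a[i] += b[i];

    /* Coarse granularity: each thread takes one large contiguous chunk.
       Little scheduling overhead, but a slow thread cannot be helped. */
    #pragma omp parallel for schedule(static, N / 4)
    for (int i = 0; i < N; i++)
        a[i] += b[i];

    printf("a[0] = %.1f\n", a[0]);  /* 1.0 + 2.0 + 2.0 = 5.0 */
    return 0;
}
```

Compile with `gcc -fopenmp`. The `schedule` clause is what controls the granularity here; the trade-offs it exposes are exactly the considerations below.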
#### Considerations

- **Fine granularity** requires the use of control routines to synchronize our processes, which produces more overhead.
- **Coarse granularity** does not need as many control routines, but it might not exploit the full computational power of the PC, and the data exchanges can be costly (due to their size).
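Returning to the dot-product example above, here is a minimal sketch in C with OpenMP (array length and values are illustrative): the $n$ multiplications are split across the cores, and the `reduction(+:sum)` clause gives each thread a private partial sum that OpenMP then combines, conceptually the pairwise tree described earlier.

```c
#include <stdio.h>
#include <omp.h>

#define N 1024

static double x[N], y[N];

int main(void) {
    for (int i = 0; i < N; i++) { x[i] = 0.5; y[i] = 2.0; }

    double sum = 0.0;
    /* Each thread multiplies and accumulates its share of the entries
       (~n/p steps each); the per-thread partial sums are then combined
       by the reduction at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i] * y[i];

    printf("dot = %.1f (expected %.1f)\n", sum, (double)N); /* 0.5 * 2.0 * 1024 */
    return 0;
}
```

Note that, in line with the callout above, the arrays are defined as globals outside the parallel region; only the loop body runs in parallel.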