# 1. Introduction to Massive Computing

###### tags: `Massive computing`

## What is massive computing?

It consists of processing large **amounts of data** and/or data that requires a **long computation time**.

> The goal is to process all the required data using as little time and as few resources as possible.

Our main objective is to perform all the necessary calculations within a limited amount of time using a finite amount of resources. We face two main restrictions in our quest for quick and massive computations: we have to optimize our instructions and fit all the data in memory, and we have to manage all of that with limited resources.

## Terminology

* **Multi-tasking**: Execution of several tasks at once, either by processing them simultaneously or by switching between them (not necessarily simultaneously).
    > If a computer has only one core, the OS will alternate between tasks. Thus, with a single core we have multi-tasking but not simultaneous execution; for that we would need two or more cores.
* **Multi-processing**: The case of *multi-tasking* in which the tasks are actually carried out at the same time (simultaneously). This can only be done with multi-core processors, where each core and thread executes a task in parallel.

**Multi-processing** is thus the ability to execute multiple tasks simultaneously; as mentioned above, this requires multiple processor cores (CPU cores).

**CPU core (Central Processing Unit core)**: the hardware that interprets and executes a set of instructions. Depending on the architecture, a core can be very powerful or very simple.

![](https://i.imgur.com/xDYc8LG.png)

### Difference between CPU cores and CPU threads

%%Insert the diagram here%%

A core can expose several hardware threads that share its execution units; the threads do not add raw computing power, but they keep the core busy: when one thread has to wait for data, another thread takes control of the CPU.

## Pros and Cons of parallel processing

### Gustafson's law

Gustafson's law states that, if the size of the problem is allowed to grow with the available resources, the achievable speedup grows almost linearly with the number of processors: $S(N) = N - \alpha\,(N - 1)$, where $N$ is the number of processors and $\alpha$ is the fraction of the work that must be done sequentially. In other words, adding parallel resources keeps reducing the runtime of a suitably scaled problem.

### Well-known dangers in parallel environments

As multiple cores work on the same data at the same time, problems can arise when they access memory. These are the classic data hazards (a minimal sketch of this kind of race is included at the end of these notes):

- **Read after write**: a task reads a value before the task that produces it has finished writing it, so it works with stale data.
- **Write after read**: a task overwrites a value before another task has finished reading the old one, so the reader gets the new value instead of the one it expected.
- **Write after write**: two tasks write to the same location and the writes complete in the wrong order, so the final value is not the intended one.

![](https://i.imgur.com/u2UlROF.png)

## Limitations in parallel processing

### I/O Bound

Processes limited by I/O capabilities: they spend most of their time waiting to receive data from I/O ports. Moreover, if we use too much memory, the OS will swap it to disk, drastically reducing performance.

### CPU Bound

Processes that use the CPU all the time, except when they are loading data from memory. There are different levels of memory and all of them are limited, especially the fastest ones; the idea is not to exceed the third level (the L3 cache).

## Parallel execution

Parallel execution consists of splitting a calculation into parts and joining the partial results afterwards to obtain the final result.

### Requirements

For us to be able to use parallel execution, each of the sections MUST be independent of the others.

Furthermore, some computations have to proceed sequentially. This is the case with the Fibonacci sequence, where each term depends on the previous two. Other tasks can be easily parallelized; others only *partially*.

### Problems with parallel execution

Pre-conditions limit the optimal parallelization of tasks: some of them MUST be performed sequentially. This is a clear *bottleneck*.
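To make the split-and-join idea and the sequential bottleneck concrete, here is a minimal sketch in Python (the function names and the numbers are illustrative, not part of the course material). It uses the standard `multiprocessing.Pool` to sum squares over a range by splitting the range into independent chunks, and contrasts it with the Fibonacci sequence, which cannot be split because each term depends on the previous ones.

```python
from multiprocessing import Pool

def sum_of_squares(chunk):
    """Each chunk is independent of the others, so it can run in parallel."""
    start, end = chunk
    return sum(i * i for i in range(start, end))

def fibonacci(n):
    """Each term depends on the previous two, so this MUST run sequentially."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # Split the range [0, 1_000_000) into 4 independent chunks.
    chunks = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)  # parallel part
    total = sum(partial_sums)                            # join step
    print("parallel sum of squares:", total)

    # The Fibonacci loop cannot be split into independent chunks: a sequential bottleneck.
    print("fib(30):", fibonacci(30))
```

The `pool.map` call is the parallel part; the final `sum` of the partial results is the join step and, like any sequential pre- or post-processing, it is part of the bottleneck that limits the achievable speedup.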
## Appendix: History of high-performance computation

The first high-performance computations were done on "*mainframes*" (big, shared computers), where only scalar and linear operations were performed (no vectors, lists or arrays could be used).

Then, clusters of workstations started to be used, grouped in networks.

<span class="centerImg">![](https://i.imgur.com/gywg944.png)</span>

Nowadays, clusters of servers are used, connected through high-speed communication networks.
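Closing example, referenced from the *Well-known dangers in parallel environments* section: a minimal sketch (the worker functions and the counts are illustrative) of how an unprotected read-modify-write on shared memory loses updates, which is essentially a write-after-write race. It uses `multiprocessing.Value`; even though `Value` carries a lock by default, an operation like `+=` is a separate read and a separate write, so the lock has to be taken explicitly around the whole update.

```python
from multiprocessing import Process, Value

def unsafe_add(counter, n):
    """Read-modify-write without holding the lock: increments can overwrite each other."""
    for _ in range(n):
        counter.value += 1  # not atomic: read, add and store can interleave across processes

def safe_add(counter, n):
    """Holding the lock makes the whole read-modify-write atomic."""
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

def run(worker, n_procs=4, n_increments=50_000):
    counter = Value('i', 0)  # an integer shared between processes
    procs = [Process(target=worker, args=(counter, n_increments)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print("expected:", 4 * 50_000)
    print("unsafe  :", run(unsafe_add))  # usually smaller: some updates were lost
    print("safe    :", run(safe_add))    # always the expected value
```

On a multi-core machine the unsafe version almost always prints a total smaller than expected, because increments coming from different processes overwrite each other.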