Final presentation

# Final presentation

Target WPM: 130

* Linn (30 s = 65 words): Simulations and games need high-performing and easy-to-use engines
    * What is an engine?
    * Why are they important?
    * Science: AtomECS et al.
    * Games: Overwatch, Minecraft
* Linn (30 s = 65 words): We have developed RECS, a simulation engine which uses the ECS architecture...
    * Weave in our purpose
* Linn (4 min = 520 words): Explain ECS
    * Entities, Components and Systems
    * How it differs from the norm of OOP
        * ECS decouples logic and data, while OOP couples them in a class
    * Drawbacks of OOP, benefits of ECS
        * Modern CPUs (i.e. cache and cores) are better utilized in ECS
        * Better design through composition of components on entities
        * Decoupled data and logic, allowing systems to apply to anything
* Linn (30 s = 65 words): Possibly recount our purpose and goal
* Martin (3 min = 390 words): Why is ECS fast?
    * Concurrency
        * Thanks to the decoupling of logic and data, it becomes easy to parallelize system execution across available threads
        * System scheduler, and determining which systems are safe to execute concurrently
    * Cache
        * Things get faster when they are contiguous in memory
        * Put stuff together in memory, avoid creating "holes"
        * Compare to OOP: classes store data that isn't necessarily accessed together next to each other, while ECS only stores data that will be accessed together closely => better use of cache in ECS
        * RECS makes use of entity archetypes to achieve this
* Mathias (1 min 30 s = 195 words): Method
    * Prototyping
    * MVP
    * Engine development
    * Case studies
        * N-body simulation
        * Rain simulation
* Mathias (4 min = 520 words): Overview of engine functionality (weave in the case studies as examples)
    * Definition and registration of user-defined systems
        * Show how a system is declared and how parameters are used
        * Comparison of how simple RECS is vs. how complicated EnTT is
        * Movement systems (listings 4.1.2 and 4.1.3)
    * Component definition and querying
        * Show that components are just simple structs
        * Explain Read and Write parameters; mention how these declarations help the scheduler determine which systems are safe to run concurrently
    * Addition and removal of components
        * During execution, the ECS-world can be modified by buffering commands, which are then executed in between update cycles
        * These commands allow for creation and removal of entities
    * Instantiation of entities and components
        * You can add and remove entities to/from the world
        * You can add and remove components from entities
    * Rendering of the ECS-world
        * You can easily enable graphical rendering of the world
        * The update rates of rendering and simulation are entirely decoupled, meaning that the graphics may run at 60 FPS while the simulation runs at 1000 TPS
* Martin (3 min = 390 words): Performance
    * N-body simulation results
        * Figure 6.1.1 shows how all ECS engines are faster than traditional OOP
        * All of the engines have differing levels of overhead, which can be seen on the left-hand side of the graph
        * RECS is 678 times faster than Unity GameObjects in the n-body simulation for 2^14 bodies
        * RECS is fast for computationally heavy systems, like n-body
        * Figure 6.1.2 shows that RECS is faster than both Bevy and EnTT, but Unity ECS is still the fastest
    * Rain simulation results
        * Figure 6.2.1 shows that RECS still has room to improve when it comes to certain functionality
        * RECS is so slow in this case study because it relies heavily on frequent addition and removal of entities. This functionality is not yet optimized in RECS and can be developed further.
* Linn (1 min = 130 words): Further work
    * The archetypes implementation could be optimized further for the scenario of frequent addition and removal
    * It could also be parallelized; currently it's single-threaded
    * Other types of component storage, better suited to other scenarios (such as frequent adding and removing), could be implemented
* Mathias (1 min = 130 words): Conclusion
    * RECS is good for applications which have computationally heavy systems and large numbers of bodies, but not for applications which need to frequently add and remove entities or components
    * The goal has been achieved and we're happy with the end result
    * Questions?

# Manuscript

### (Linn) Simulations and games need high-performing and easy-to-use engines

High-performing computer simulations are highly sought after in both academia and industry. They allow results to be generated quickly and make a wide variety of real-time applications possible, such as video games. Behind these simulations are powerful simulation engines: software frameworks that manage everything from data storage to CPU utilization. Traditionally, these engines have used an object-oriented approach. But because of its poor performance and the difficulty of working with it, alternative designs are on the rise. One of them is the Entity Component System architecture, or ECS for short.

### (Linn) We have developed RECS, a simulation engine which uses the ECS architecture...

We have developed RECS, a simulation engine that uses the ECS architecture. The purpose was to shield the developer from the underlying complexities, like concurrent scheduling and data storage. This makes the engine easier to use without sacrificing performance.

### (Linn) Explain ECS

So what is ECS? ECS is a software architecture based on data-oriented design, meaning that the focus is on laying out data so that it can be retrieved efficiently.
It splits the engine into three parts:

* Entities, which represent objects in the simulated world.
* Components, which correspond to the different attributes that an entity may have, such as position, velocity and mass.
* Systems, which correspond to behaviours and contain the logic of the program. They operate on subsets of components every update cycle. For example, a gravity system operates on entities that have velocity, mass and position components.

Take, for example, this figure, which visualizes a simple example of ECS. Over here we have a system that updates the position of every movable object in the world. To do this, it iterates over all velocity components to read how much each entity's position should change. It then updates every position component accordingly. With the ECS architecture, this iteration is very fast, since all of these components are packed closely together in memory.

As I mentioned earlier, many people think of object-oriented designs when they think of game and simulation development. However, the encapsulation of data and logic that object-oriented programming relies on can cause problems, for example when systems are scaled up. In large programs, classes risk becoming intertwined with each other, so that changing one thing can affect something that should be far removed.

ECS is a particularly good alternative for large-scale and computationally heavy simulations. It promotes plain data objects that keep data and logic separate. This makes it easier to optimize the use of CPU caches and, ultimately, to achieve high performance on modern multicore processors. Components can be packed closely together in memory, which makes processing the data much faster and reduces cache misses. ECS can also improve the maintainability of large code bases, thanks to its decoupled design and the high reusability of both systems and components.
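The position-update example from the figure can be sketched in plain Rust. This is a minimal, hypothetical ECS for illustration only, not the RECS API. Entities are plain indices, each component type is stored in its own array, and the movement system is an ordinary function that reads velocities and writes positions. Storing `Option`s leaves gaps for entities that lack a component, which is the kind of hole that archetype storage is designed to avoid.

```rust
// A hypothetical, minimal ECS for illustration only; this is not the RECS API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Position {
    x: f32,
    y: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Velocity {
    x: f32,
    y: f32,
}

/// An entity is just an index into the component arrays.
type Entity = usize;

/// Each component type is stored in its own array; `None` means that the
/// entity at that index does not have the component.
#[derive(Default)]
struct World {
    positions: Vec<Option<Position>>,
    velocities: Vec<Option<Velocity>>,
}

impl World {
    fn spawn(&mut self, pos: Option<Position>, vel: Option<Velocity>) -> Entity {
        self.positions.push(pos);
        self.velocities.push(vel);
        self.positions.len() - 1
    }
}

/// Movement system: reads each Velocity and writes the matching Position.
/// Entities that lack either component are skipped.
fn movement(world: &mut World, dt: f32) {
    for (pos, vel) in world.positions.iter_mut().zip(&world.velocities) {
        if let (Some(p), Some(v)) = (pos, vel) {
            p.x += v.x * dt;
            p.y += v.y * dt;
        }
    }
}

fn main() {
    let mut world = World::default();
    let ball = world.spawn(
        Some(Position { x: 0.0, y: 10.0 }),
        Some(Velocity { x: 1.0, y: -2.0 }),
    );
    // A wall has a position but no velocity, so the movement system ignores it.
    let wall = world.spawn(Some(Position { x: 5.0, y: 0.0 }), None);
    movement(&mut world, 0.5);
    println!("{:?} {:?}", world.positions[ball], world.positions[wall]);
}
```

The system never needs to know what kind of object it is moving. Any entity that has both components is processed, which is the composition-over-inheritance idea in miniature.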
Additionally, ECS has the potential to simulate classic computationally heavy problems, such as the n-body problem, more efficiently.

### (Martin) Why is ECS fast?

What makes ECS efficient for computationally heavy problems? There are two main aspects of ECS which make it well-suited for taking advantage of modern processors:

* concurrency, and
* caching.

Concurrency is about utilizing the processor's ability to execute two or more tasks simultaneously. ECS allows for automatic parallelization of systems thanks to its data-oriented design, that is, its decoupling of logic and data. In object-oriented programming, it is common to couple program logic to the specific classes that store data. This in turn makes it more difficult to split processing across multiple cores.

There are two kinds of concurrency in ECS:

* inter-system parallelism, and
* intra-system parallelism.

As the names imply, inter-system parallelism is about executing different systems concurrently. Intra-system parallelism is about parallelizing the internal execution of a single system.

Independent systems, that is, systems that do not access the same components, are easy to run in parallel. They cannot cause race conditions, so they cannot leave the data in an inconsistent state. Parallelizing within a single system is trickier. It requires careful segmentation of the data to ensure that no concurrent reads and writes occur in the same memory location. Together, these two kinds of concurrency allow ECS engines to distribute work evenly across all available CPU cores. Our engine, RECS, implements both.

Another important aspect of modern CPU architectures is multi-level caching. Caching is paramount to program performance, because the main bottleneck of most programs is transferring data between main memory and the processor.
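To make this concrete, the sketch below runs the same movement update on two layouts. The first is an object-oriented layout, where each body is one object holding all of its data. The second is a component-array layout, as in ECS. It is plain Rust and purely illustrative. The fields `mass`, `name` and `color` are hypothetical stand-ins for data that the movement update does not need. Both versions compute the same result, but the object version pulls every unused field through the cache along with the data it actually uses.

```rust
/// Object-oriented style: one object per body holding all of its data,
/// including fields that the movement update never reads.
#[derive(Clone)]
#[allow(dead_code)] // `mass`, `name` and `color` only exist to take up space
struct BodyObject {
    position: [f32; 3],
    velocity: [f32; 3],
    mass: f32,
    name: String,
    color: [f32; 4],
}

/// Array-of-structs update: every iteration pulls a whole BodyObject
/// through the cache, although only position and velocity are used.
fn move_objects(bodies: &mut [BodyObject], dt: f32) {
    for b in bodies.iter_mut() {
        for (p, v) in b.position.iter_mut().zip(b.velocity) {
            *p += v * dt;
        }
    }
}

/// ECS style: positions and velocities live in separate contiguous arrays,
/// so the loop only streams the data it actually needs.
fn move_components(positions: &mut [[f32; 3]], velocities: &[[f32; 3]], dt: f32) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        for (pi, vi) in p.iter_mut().zip(v) {
            *pi += vi * dt;
        }
    }
}

fn main() {
    let n = 1000;
    let template = BodyObject {
        position: [0.0; 3],
        velocity: [1.0, 2.0, 3.0],
        mass: 1.0,
        name: String::from("body"),
        color: [1.0; 4],
    };
    let mut objects = vec![template; n];
    let mut positions = vec![[0.0f32; 3]; n];
    let velocities = vec![[1.0f32, 2.0, 3.0]; n];

    move_objects(&mut objects, 0.5);
    move_components(&mut positions, &velocities, 0.5);

    // Both layouts compute the same result; only the memory traffic differs.
    assert!(objects.iter().zip(&positions).all(|(o, p)| o.position == *p));
    println!(
        "bytes per body: object = {}, position + velocity = {}",
        std::mem::size_of::<BodyObject>(),
        2 * std::mem::size_of::<[f32; 3]>()
    );
}
```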
A related concept is data prefetching: the processor predicts which data will be accessed next and loads it before it is needed, which saves time. To benefit from prefetching, data needs to be laid out in a predictable manner, for example stored contiguously in an array. Component data should be laid out this way so that systems can iterate over it quickly.

To achieve this, some data structures are carefully designed to "patch" any holes that removals and transfers can leave in the data. One such data structure, commonly used in ECS, is the archetype, which stores entities that have the same set of components together in memory. RECS implements archetypes. This benefits its iteration speed but hurts the performance of dynamic addition and removal.

There are many different ways of implementing concurrency and cache efficiency, so the project was split into phases. The earlier phases were more exploratory, and the later ones focused on implementing specific solutions.

### (Martin) Method

These phases were:

* a prototyping phase,
* a minimum viable product (MVP) phase, and
* a main engine development phase.

Development started with the prototyping phase, where three different prototypes were developed to explore different aspects of an ECS engine: system scheduling, data storage and querying. When the prototypes were considered done, they were combined into a single working minimum viable product. This was followed by the main engine development phase, where more complex features and optimizations were implemented. During this phase, two case studies were implemented in RECS, both to guide feature development and to evaluate engine performance more realistically. Each case study made use of different features of the engine.
The first case study was an N-body simulation, a gravitational simulation of a collection of bodies that attract each other. The second case study was a simple rain simulation, in which raindrops spawn, fall while being affected by wind, and are then destroyed. As previously mentioned, the case studies were used to decide which features to implement in the engine. Next, we'll present the main features of the engine.

### (Mathias) Overview of engine functionality (weave in the case studies)

Our engine has five main features:

* Instantiation of entities and components
* User-defined systems
* Querying of components
* Addition and removal of components
* Rendering of the ECS-world

**Instantiation of entities and components**

Components in RECS are user-defined structs. They can, for example, be marker components containing no data, or contain simple floats, ints, vectors or a combination of these. An entity is created by asking the engine for a new one, and components can then easily be added to it, as shown in the code at the bottom right.

**User-defined systems**

A system in RECS is a Rust function that takes the components as parameters. Here we have the movement system for the N-body simulation, which operates on entities that have Position and Velocity components. And here is an equivalent system written using EnTT, a popular C++ ECS framework. We can see that RECS requires a lot less code to achieve the same result.

**Querying of components**

The two most basic types of querying are Read and Write. They determine whether a system gets immutable or mutable access to a component, which allows the ECS engine to schedule systems concurrently. A component can be written by only one system at a time, but read by several systems at once. In the movement system, Position is declared as Write, since the system updates it. Velocity is declared as Read, since it is not updated.
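The Read/Write declarations are what make automatic scheduling possible. The following is a minimal sketch of the kind of conflict check a scheduler can perform, assuming each system declares the component types it reads and writes. The `Access` type, its methods and the extra `stats` system are hypothetical, not the RECS API. Two systems may run concurrently unless one of them writes a component that the other reads or writes.

```rust
use std::any::TypeId;
use std::collections::HashSet;

// Hypothetical component types from the n-body example (data omitted).
struct Position;
struct Velocity;
struct Mass;

/// The component types one system declares as Read and Write.
/// Illustrative only; this is not the RECS API.
#[derive(Default)]
struct Access {
    reads: HashSet<TypeId>,
    writes: HashSet<TypeId>,
}

impl Access {
    fn read<T: 'static>(mut self) -> Self {
        self.reads.insert(TypeId::of::<T>());
        self
    }

    fn write<T: 'static>(mut self) -> Self {
        self.writes.insert(TypeId::of::<T>());
        self
    }

    /// Two systems conflict if either writes a component that the other
    /// reads or writes. Reading the same component never conflicts.
    fn conflicts_with(&self, other: &Access) -> bool {
        let self_writes_shared = self
            .writes
            .iter()
            .any(|t| other.reads.contains(t) || other.writes.contains(t));
        let other_writes_shared = other.writes.iter().any(|t| self.reads.contains(t));
        self_writes_shared || other_writes_shared
    }
}

fn main() {
    // movement: Write<Position>, Read<Velocity>
    let movement = Access::default().write::<Position>().read::<Velocity>();
    // gravity: Write<Velocity>, Read<Position>, Read<Mass>
    let gravity = Access::default()
        .write::<Velocity>()
        .read::<Position>()
        .read::<Mass>();
    // A hypothetical statistics system that only reads Velocity.
    let stats = Access::default().read::<Velocity>();

    // Both only read Velocity, so a scheduler may run them at the same time.
    println!("movement vs stats:   conflict = {}", movement.conflicts_with(&stats));
    // gravity writes Velocity, which movement reads, so they must not overlap.
    println!("movement vs gravity: conflict = {}", movement.conflicts_with(&gravity));
}
```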
Because the movement system only reads velocity, another system could use velocity for other computations at the same time.

**Addition and removal of components**

Another feature RECS supports is the ability to add and remove components and entities at runtime. This is done with commands that are buffered and executed between update cycles. Doing this while systems are executing could cause inconsistencies and errors.

**Rendering of the ECS-world**

The last feature we will talk about is rendering. The engine takes care of almost everything related to rendering; the only things the user needs to do are:

1. Enable rendering when starting the engine
2. Load a model to render
3. Create a renderable entity

This code would produce something that looks like this.

The rendering system of RECS was designed to be plug-and-play, which makes it very simple to incorporate 3D graphics into a simulation. The two case studies are examples of this: only minor modifications to the instantiation of entities were needed to render them. So that the benchmarks would measure simulation performance rather than rendering performance, rendering was completely disabled during benchmarking.

### (Edvin) Performance

RECS was benchmarked against other contemporary engines to evaluate how well it performs. This was done thoroughly with the n-body simulation, but there was only time to compare the rain simulation against one other engine.

<img alt="figure 6.1.1" width="500" src="https://hackmd.io/_uploads/SkafaieS2.png">

This graph shows how the n-body simulation performs in different engines when simulating varying numbers of bodies. Note that both axes are logarithmic. The y-axis shows the total time taken to simulate a single tick, and the x-axis shows how many bodies were simulated. Our engine is highlighted as the dotted green line.
For smaller inputs, that is, on the left-hand side of the graph, you can see the varying levels of overhead imposed by the different engines. The yellow line above all of the rest is the standard Unity game engine, which is object-oriented and based on an older architecture. Note how it is slower than all of the other engines, which are ECS engines. For 2^14 bodies, RECS is about 700 times faster than standard Unity.

Another notable fact is that the different ECS engines seem to converge in performance as the number of bodies grows. This is to be expected: at that point, the computationally heavy gravity system, and not the engine itself, is the dominating factor. It is also clear that RECS, in green, is comparable to, and sometimes even faster than, other contemporary ECS engines.

<img alt="figure 6.1.2" width="500" src="https://hackmd.io/_uploads/ByBm6jxH2.png">

This chart shows a single vertical slice of the previous graph, at the largest tested input of 2^14 bodies. Here we can see that while RECS beats both Bevy and EnTT, it does not outperform Unity's ECS solution. While the n-body simulation benchmarks show the strengths of RECS, it also has some features which are not as optimized...

<img alt="figure 6.2.1" width="500" src="https://hackmd.io/_uploads/HkmwTslHh.png">

This is a comparison between RECS, in green, and Bevy, in red. The y-axis shows time taken per tick, and it is clear that Bevy vastly outperforms RECS in this benchmark. :(((

The reason is that the rain simulation relies heavily on adding and removing entities at runtime. Every raindrop in the simulation is spawned from a cloud and removed when it touches the ground. This is an aspect of RECS with a lot of optimization opportunities, as is evident from the performance...

### (Edvin) Further work

There are areas in which RECS can be improved for even better performance.
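As mentioned earlier, archetypes favour iteration speed over fast addition and removal. A deliberately simplified sketch shows why. It assumes, hypothetically, that every component is a single `f32` identified by an integer id, whereas a real engine stores typed columns. `World`, `Archetype` and `add_component` are illustrative names, not the RECS API. Adding a component changes an entity's set of components, so its whole row must be copied out of its current archetype and into another one. `swap_remove` fills the resulting hole with the last row so the columns stay contiguous.

```rust
use std::collections::HashMap;

// Hypothetical simplification: every component is one f32 identified by an
// integer id. A real engine stores typed columns instead.
type ComponentId = u32;
type Entity = u32;

const POSITION: ComponentId = 0;
const VELOCITY: ComponentId = 1;

/// All entities with exactly the same set of components share one archetype.
/// Each component is a contiguous column; row `i` of every column belongs
/// to `entities[i]`.
struct Archetype {
    signature: Vec<ComponentId>, // sorted component ids
    entities: Vec<Entity>,
    columns: Vec<Vec<f32>>, // one column per id in `signature`
}

#[derive(Default)]
struct World {
    archetypes: Vec<Archetype>,
    location: HashMap<Entity, (usize, usize)>, // entity -> (archetype, row)
    next_entity: Entity,
}

impl World {
    /// Finds the archetype with this exact (sorted) signature, creating it if needed.
    fn archetype_for(&mut self, signature: &[ComponentId]) -> usize {
        if let Some(i) = self.archetypes.iter().position(|a| a.signature == signature) {
            return i;
        }
        self.archetypes.push(Archetype {
            signature: signature.to_vec(),
            entities: Vec::new(),
            columns: vec![Vec::new(); signature.len()],
        });
        self.archetypes.len() - 1
    }

    fn spawn(&mut self) -> Entity {
        let e = self.next_entity;
        self.next_entity += 1;
        let a = self.archetype_for(&[]);
        self.archetypes[a].entities.push(e);
        let row = self.archetypes[a].entities.len() - 1;
        self.location.insert(e, (a, row));
        e
    }

    /// Adding a component changes the entity's signature, so its whole row is
    /// copied out of the old archetype and into another one.
    fn add_component(&mut self, e: Entity, id: ComponentId, value: f32) {
        let (old, row) = self.location[&e];
        let mut values: HashMap<ComponentId, f32> = HashMap::new();
        let old_arch = &mut self.archetypes[old];
        // swap_remove fills the hole with the last row, keeping columns contiguous.
        for (col, &cid) in old_arch.columns.iter_mut().zip(&old_arch.signature) {
            values.insert(cid, col.swap_remove(row));
        }
        old_arch.entities.swap_remove(row);
        // The entity that was moved into `row` (if any) needs its location updated.
        if let Some(&moved) = old_arch.entities.get(row) {
            self.location.insert(moved, (old, row));
        }
        values.insert(id, value);
        let mut signature: Vec<ComponentId> = values.keys().copied().collect();
        signature.sort_unstable();
        let new = self.archetype_for(&signature);
        let arch = &mut self.archetypes[new];
        for (col, cid) in arch.columns.iter_mut().zip(&arch.signature) {
            col.push(values[cid]);
        }
        arch.entities.push(e);
        self.location.insert(e, (new, arch.entities.len() - 1));
    }

    fn get(&self, e: Entity, id: ComponentId) -> Option<f32> {
        let (a, row) = *self.location.get(&e)?;
        let arch = &self.archetypes[a];
        let col = arch.signature.iter().position(|&c| c == id)?;
        Some(arch.columns[col][row])
    }
}

fn main() {
    let mut world = World::default();
    let a = world.spawn();
    let b = world.spawn();
    world.add_component(a, POSITION, 1.0);
    world.add_component(b, POSITION, 2.0);
    // Moves `a` from the [POSITION] archetype to [POSITION, VELOCITY];
    // `b` is swapped into the row that `a` left behind.
    world.add_component(a, VELOCITY, 0.5);
    println!("{:?} {:?} {:?}", world.get(a, POSITION), world.get(a, VELOCITY), world.get(b, POSITION));
}
```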
The current implementation of archetypes could be further optimized for frequent addition and removal of components. Currently, every time a component is added to or removed from an entity, the entity has to be moved from its current archetype to a new one, which takes time.

Alternatively, instead of modifying the archetype implementation, a different storage solution could be beneficial. Bevy, for example, offers sparse sets as an alternative, since they are faster at adding and removing components.

In theory, iteration over components in archetypes can be parallelized. This is another possible improvement to RECS, since it is currently single-threaded.

### (Mathias) Conclusion

To summarize, RECS performs well for applications which have computationally heavy systems and large numbers of entities. However, there is still room for improvement, specifically in the dynamic addition and removal of entities and components, as we saw in our rain simulation. Overall, the goal has been achieved: we have created an ECS engine which is easy to use and has decent performance, and we are happy with what we have done during this project.

Thank you for listening! Any questions?