PHAS0100 - Week 2 (20/01/23)

# PHAS0100 - Week 2 (20/01/23) ## Introduction to Modern C++ This week we'll be exploring various aspects of the C++ standard library. If you've learned C++ in an older style then some aspects of this, such as I/O and containers, will be familiar; however we will also see how these are typically used in the modern C++ style, making use of newer features such as type inference, range based loops, smart pointers, and anonymous functions. ### Timetable | Start | End | Topic | | -------- | -------- | -------- | | 14:00 | 14:10 | Intro & Menti quiz 1| | 14:10 | 14:50 | Section 1: Containers| | 14:50 | 15:00 | Break | | 15:00 | 15:10 | Menti quiz 2| | 15:10 | 15:50 | Section 2: Introducing Random Numbers | | 15:50 | 15:55 | Early Module Feedback Form | | 15:55 | 16:05 | Break | | 16:05 | 16:15 | Menti quiz 3| | 16:15 | 17:00 | Section 3: Smart Pointers| ## Before You Begin **Please clone the [Week 2 Exercises repository here](https://github.com/UCL-PHAS0100-22-23/week2_cpp_exercises).** If you are using windows with docker remember to clone the repo with `core.autocrlf` turned off: ```bash= git config --global core.autocrlf false git clone ... git config --global core.autocrlf true ``` The `.devcontainer` folder is included in the top level folder, so open this folder in VSCode to re-open in the container. ## Section 1: Working with Containers and the Standard Library In this section we will mostly explore containers, and some of the things that we can do with them using other parts of the standard library. We will also introduce the idea of a unit test for checking the correctness of our code. ### Exercise 1: - Open your terminal and move into the `Section_1_Vector_Functionality` folder. ```bash= cd Section_1_Vector_Functionality ``` - Explore the folder structure to familiarise yourself with the files. - `src` contains the source files for the vector functions that we will be writing and the main function. - `include` contains the header file for the vector functions. - `test` contains both a `src` and `include` folder. The `include` folder contains a header which defines the testing library - you won't need to worry about this. `src` contains the source file where the tests are defined. #### Part 1: - Add a function which prints out the elements of a vector of integers (`vector<int>`). - Start by adding the function declaration to the header file `vector_functions.hpp` - Then write the function definition in the source file `vector_functions.cpp` - Have you passed by value or reference? Why? - Have you use the `const` keyword or not? Why? - In the `main` function in `exercise_main.cpp` declare a vector which contains some integers, and use your function to print them out. Compile and run your program and check that you see the output you expect to see. To compile from the `Section_1_Vector_Functionality` folder: ```bash= cmake -B build cmake --build build ``` and then to run: ```bash= ./build/bin/vector_exercises ``` #### Part 2: - Complete the function `countMultiplesOfFive` in `vector_functions.cpp` using a range-based loop to determine the number of elements in a `vector<int>` which are equal to a multiple of `5`. (Recall that you can use the `%` operator to get the remainder of an integer division e.g. `10 % 3` would return `1`.) - Go into the `tests/src` folder and open `test_exercises.cpp`. Here we will write a _unit test_ to check if our function works the way we expect. We will be using a library called [Catch2](https://github.com/catchorg/Catch2/tree/v2.x). We'll go into more detail on how this works in the future, but for now you can just fill in the function that we give you. - A _unit test_ is code that we run to check the functionality of a small piece of code (a "unit"), which does a single, well defined thing. For example, we will test a single function to see if it correctly counts the entries in our vector. - Note that there is already a `#include "vector_functions.h"` statement at the top of the file, so that the tests have access to the declaration of the function that we need to test. - If you look at the CMakeLists.txt you will see that the test target is also linked to the `vector_functions` library, so it has access to the full definition of the function. - To write a test, inside the curly-braces following `TEST_CASE`, we can write a `REQUIRE` statement. (There are other kinds of statements which can be written, but we won't cover those this week.) Inside the `REQUIRE` statement you should place an expression that evaluates to a bool. For example: ```cpp // The arguments for this test are first a name / description, // and second an identifier that you can use to run individual tests TEST_CASE( "Counting with loop is correct", "[ex1]" ) { std::vector<int> example = {5}; REQUIRE( countMultiplesOfFive(example) == 1); } ``` - When writing tests, we want to cover multiple test cases which fully explore the behaviour of our function under different circumstances. This is an important part of designing robust software. - In the test case code, declare a vector of ints which has multiples of `5` in it (including `0` as a multiple of 5), as well as other values. In the assertion statement, check that your count function returns the correct number. - Declare a vector which contains some integers but no multiples of `5`, and check that your function returns `0`. - Declare a vector which is empty, and check that your count function returns `0`. - Build your tests using cmake. You can run the tests for exercise 1 in two ways: 1. Using the following terminal command from the top level folder: ```bash= ./build/bin/test_exercises [ex1] ``` 2. Using the following terminal command from _inside the build folder_. ```bash= ctest --verbose -R test_ex1 ``` [`ctest`](https://cmake.org/cmake/help/latest/manual/ctest.1.html) is a testing functionality provided by cmake. If you want to explore the CMakeLists.txt you will see that we add the test `test_ex1` which corresponds to the command `test_exercises [ex1]`, but we won't be going into the how to write CMakeLists.txt for testing today. - **Optional Homework**: If you want to find out more about using the Catch2 library, they have an [introductory tutorial](https://github.com/catchorg/Catch2/blob/v2.x/docs/tutorial.md#top). For simplicity we are using the Catch2 header only library (version 2.x), whereas the most recent version (3.x) is statically compiled library. #### Part 3: - Rewrite your function to count the number of multiples of 5 in a `vector<int>`, this time using the standard library function [`std::count_if`](https://en.cppreference.com/w/cpp/algorithm/count). - `std::count_if` should take three arguments: a start iterator, an end iterator, and a _functor_ (usually a function pointer or lambda expression). It returns the number of elements between these iterators for which the function evaluates to `true`. - Remember that `vector` has the methods `.begin()` and `.end()` to return the first and last iterators in a vector. - Write your function as a lambda expression (remember the `[](){}` syntax) inside the arguments of `count_if`. - Recompile your code and run your unit tests to check that your function still works! ### Exercise 2: - The `vector_functions` library defines a function `addElements(vector<int> v, int x, int n)` which adds `n` elements to the vector `v`, each containing the value `x`. - Run the tests for exercise 2 by running the following in the command line: ```bash= ./build/bin/test_exercises [ex2] ``` or in the build folder: ```bash= ctest --verbose -R test_ex2 ``` - What happens? Look at the output of the test and try to understand why the test fails. - Go to the definition of `addElements` in `vector_functions.hpp` and `vector_functions.cpp` and fix the function. Re-compile and run the test again to see that your correction has worked. ### Exercise 3: - Look up the C++ documentation for `array`, `vector` and `map`. - What is the main difference between `array` and `vector`? - Try to think of an example (or more!) of when you would use an `array` instead of a `vector`. - We'll gather some suggestions on menti after the break! ------------------------------------------------------- ## Section 2: Building an Integrator using the Standard Library The exercises in this section will continue to build our knowledge of the standard library and practice our C++ programming. We will explore how to read C++ documentation and apply this to a numerical problem. ### Exercise 4: #### Part 1. The method of integration that we'll be using will feature random numbers. Random numbers in computing is a broad and complex topic, so this week we'll just focus on the practicalities of _how_ to generate random numbers in C++, which has changed since the introduction of the `<random>` library. - In your terminal, move into the `Section_2_Monte_Carlo_Integrator` folder, which is where we'll do our coding for this exercise. - This code calculates the volume of a sphere by generating random points within a three dimensional space and checking if they are inside the sphere. - In this case it will be a unit sphere (radius = 1), and the points should be generated inside the cube $-1 \le x \le 1$, $-1 \le y \le 1$, $-1 \le z \le 1$. - The volume of the sphere is the proportion of points inside the sphere multiplied by the volume of the total space in which you are generating the points. - The volume of the cube is $V_\text{cube} = 2 \times 2 \times 2 = 8$. - So the volume of the sphere is $V_\text{sphere} = n_\text{inside} / n_\text{outside} \times 8$ - We have provided the code to test if points are within the sphere and calculate the volume. - You will be adding code to generate the random points inside the box. - First, open up the [documentation for `random`](https://www.cplusplus.com/reference/random/), which we'll be looking at for this exercise. - Add an `#include <random>` statement at the top of the file source file in `src`. - To generate random numbers we need two things: a random number _generator_ and a _distribution_. - Start by taking a look at the list of distributions that you can use in the `random` documentation. In this exercise we want to generate random `double` (real as opposed to integer) numbers from a _uniform_ distribution. - Find the appropriate distribution and click on it to go to the documentation for this particular distribution. - You should see at the top of the page that the appropriate header is `<random>` and the namespace is `std::`. - To see how to declare a variable of this type, go to the list of member functions and click on the constructor. - You should see that the constructor takes two parameters: a lower and an upper bound for the random numbers. (Remember that in this example we want to generate numbers between `-1.0` and `1.0`.) - You can also see that you can the distribution specifies some type inside angle brackets `<>`, similar to a vector. We want to generate numbers of type `double`, so use `<double>`, or you can just use empty brackets `<>` since it says that `double` is the default type. - You should now be able to declare your distribution in the C++ code. Declare the generator inside the `IntegrateMonteCarlo3D` function, and give it a sensible name like `uniform_dist`. - You can look at the example code in the documentation page for the distribution if you're not sure how to declare it! - Next we need to declare the generator. There are many different random number generators, and the differences between them is a complex topic, so for today we'll just select the [_Mersenne Twister 19937_](https://cplusplus.com/reference/random/mt19937/). - You can declare the generator with a specific random seed or not. - `std::mt19937 rng_mt;` will work just fine for now. - Random number generators are not really random! The sequences of number that they generate are deterministic, but have statistically similar distributions to random. The sequence generated depends on the _seed_, so if you want different sequences, you need to run with different seeds. - `std::mt19937 rng_mt(1)`; will generate a different sequence from `std::mt19937 rng_mt(2);` - Declare the generator inside the same function. - Once we've declared both a generator and a distribution, we can start generating random numbers! - We actually generate random numbers by calling the _distribution_ rather than the generator. - The [generator object is callable](https://cplusplus.com/reference/random/uniform_real_distribution/operator()/), which means that the `()` operator is defined for it. - You pass the random number generator to the distribution, and the distribution uses this to generate a random value and then transform it to match that distribution and type. - So `uniform_dist(rng_mt)` will return a uniform random number between your max and min value, using your Mersenne Twister generator. - Inside the loop in `IntegrateMonteCarlo3D`, complete the code to assign random values to `x`, `y`, and `z` between `-1` and `1`. - Compile and run the integrator. Do you get the value you expect? (Volume of a sphere is $\frac{4}{3} \pi r^3$.) You can compile and run using ```bash= cmake -B build cmake --build build ./build/bin/monte_carlo_int ``` #### Part 2. - We can use [std::bind](https://cplusplus.com/reference/functional/bind/) to avoid passing the generator to the distribution every time. - `std::bind` is a little complex, but the examples in the documentation help to clarify the usage. - It can be used to bind an argument to a function, to make a function of fewer (possibly no) arguments. - e.g. if I have a function `add(int a, int b)` which adds two integers, then `auto add5 = std::bind(add, 5, std::placeholders::_1);` will create a new object which is callable and adds 5 to a single argument e.g. `add5(12)` will return `17`. - `std::bind(uniform_dist, rng_mt)` will create a new callable object (which takes no arguments) which is equivalent to calling `uniform_dist(rng_mt)`. - Use `auto` when declaring a variable to get the compiler to infer the type of this object! - Use bind to create a new object which binds together your generator and your distribution, and use this to generate your random points instead. - Make sure you've checked what header you need to add to your source file in order to do this! - Recompile and re-run to check that your program still works. #### Part 3. - Having a program that only integrates on function (in our case, the volume of a unit sphere) is not terribly useful! Next we modify `IntegrateMonteCarlo3D` to accept callable `TestPoint` function (e.g. a function pointer or lambda-expression) so that the Monte Carlo integrator can be used to integrate arbitrary volumes / functions. - We need to take a look at the documentation for [std::function](https://en.cppreference.com/w/cpp/utility/functional/function) - We can declare objects of the type `std::function<...>`, and these can also be function parameters - The angle brackets `<>` contain the return type and parameter types of the function. First the return type, then the parameter type in brackets: <return_type(param_type1, param_type2 ...)> - e.g. a function which takes two `int`s and return an `int` could have the type: `std::function<int(int,int)>`. - Add a parameter to the `IntegrateMonteCarlo3D` definition so that it can take an `std::function` which takes 3 integers and returns a boolean. - Replace `InsideUnitSphere` with a call to this code instead. - In `main`, you will need to change how you call `IntegrateMonteCarlo3D`. Try first by passing it a pointer to `InsideUnitSphere`. Re-compile and run your code to check that it still works as before. - You can now pass your integrator any function which checks if a point is inside some 3D volume or set! #### Optional Extension (if there's time) <details> - Add some code to allow a user to determine the number of points that are generated to do the integration without having to recompile the code. You can either use command line arguments or use `std::cin` to read a number in from the terminal at runtime. Use this information to set `N_points`. - Monte Carlo Integration is most useful for integrating in large dimensional spaces where linear sampling methods become inefficient. Generalise your `IntegrateMonteCarlo3D` to an arbitrary n-dimensional case (the number of dimensions should be a argument for `IntegrateMonteCarloND`). </details> ---------------------------------------------------------------- ### Feedback session As part of UCL's [Coninuted Module Dialogue](https://www.ucl.ac.uk/teaching-learning/student-partnership/student-voice-and-surveys/continuous-module-dialogue), please take a few minutes to fill out the [feedback questionnaire on Moodle](https://moodle.ucl.ac.uk/mod/questionnaire/view.php?id=4634413). This feedback will be used this term to improve the teaching in later weeks. ## Section 3: Smart Pointers In this section we will explore some of the properties of smart pointers. We recommend that you work in pairs or small groups so that you can discuss the ideas raised in this section with one another. ### Exercise 4: In this exercise we'll explore how smart pointers work, and why they are useful. We'll be contrasting them with _raw pointers_ to illustrate their differences, but remember that we only want to use raw pointers when we need performant, non-owning pointers in safe contexts. #### Part 1: - Discuss with one another why smart pointers are useful and the key features of `unique_ptr`, `shared_ptr`, and `weak_ptr`. Can you think of any examples of when you would want to use each one? - Participate in the menti quiz! #### Part 2: - In the terminal, move into the folder `Smart_Pointers`. Write your code in the source file in the `src` sub-folder. - We've provided a simple class called `Reporter`, which prints to the screen when the constructor and destructor are called. This kind of class can be very useful when trying to understand how objects are managed in C++! - Declare two different instances of this class on the heap (i.e. as pointers): one as a _raw pointer_ using (`Reporter *` and `new`), and another as a _unique pointer_. Compile and run your code. - The `Reporter` constructor takes a string argument which is an identifier. Give each object a unique identifier so you can tell from the output when each one is being created and destroyed. - Look at your output: what difference is there between your raw pointer and your smart pointer? - If an object is created (the constructor is called) but not destroyed (destructor is not called) then you have created a memory leak! Memory continues to be occupied by this object until your program terminates and all the memory allocated to that program is freed. - Memory leaks which happen throughout a program can build up and cause you to run out of memory, especially when running large scale computations that run for long periods of time. #### Part 3: - Write a function which takes a `unique_ptr`. - Try passing by value; what happens? - Try passing by reference; what happens? - Write a function which creates and returns a `unique_ptr` (using a `return` statement). - In the `main` function, declare a new `unique_ptr` and assign it by making a call to this function. - Given that `unique_ptr` can't be copied, why is this `return` call allowed? - Hint: look a the notes on `return` statents in Passing by Reference and by Value. #### Part 4: - Write a `void` function (no return statement) which declares two `unique_ptr<Reporter>` variables. (Give each `Reporter` a unique ID so that you can tell when each is created and destroyed.) - Update your function so that you pass in a `std::vector<unique_ptr<Reporter>>` (a vector of unique pointers to Reporter objects) by reference. - In your function, use `std::move` to add _one_ of your unique pointers to the vector, but not the other. - In `main`, declare a `vector<unique_ptr<Reporter>>` (you can leave it empty), and call your function, passing in your vector. - Add a `std::cout` after your function call to print to the terminal when you function has finished but before the `main` ends. - Compile and run your code to see what the output is. When is each object created and destroyed? Do you understand why? - Moving ownership of data from one object to can be an important thing to model. This can be especially true when the object that we want to take over control of the data has a wider scope than the original object did: moving the ownership is necessary to make sure that the lifetime of the data is properly managed. #### Part 5: - Add a member variable of type `shared_ptr<Reporter>` to the `Reporter` class and a function for setting it. - In `main`, we want to create two `Reporter` objects using smart pointers, and set them to reference one another. - How should you create these two objects? - Create the two objects and set their shared pointer member variables to point to one another. - Compile and run your code, and observe the calls to the constructors and destructors. What happens? Why? - Rewrite the class to avoid memory leaks or any other problems. #### Part 6: **Optional (if there is time)** <details> - Consider the case of a binary tree; this is a kind of graph data structure with nodes of two kinds: - A leaf node, which is just an empty node with a pointer to its parent node. - A branch node, which has a stored value, a pointer to its parent node and pointers to two child nodes (left and right). - The root of the tree is a branch that has no parent (which can be represented by a null pointer, for example). - Branches always have two children; the children can be branches or leaves. - What kind of pointers should you use in this example? Should you use different kinds of pointers to point to children and parents? How do we ensure memory safety? - Consider how a tree is deleted: when a node is deleted, the entire sub-tree beneath that node should be deleted (children, children of children etc.) since they are no longer accessible. - Consider whether there are circular references in this data structure and how that impacts your memory management. - What is the most appropriate model of ownership? - Are there any performance considerations for large trees? - Would your solution change if two trees can share a sub-tree? </details>