# Local Testing Hands-on Session

---

## Presentation plan

* Part 1: The tools (less hands-on)
  * Introduction to local testing
  * Tox
  * Pytest
* Part 2: Writing tests (more hands-on)
  * Basics of testing
  * Python vs ansible tests
  * Unit tests
  * Functional tests

---

# Local Testing Hands-On Session

## Part 1: the tools

Please don't assume that you already know everything I say here, especially the parts that seem trivial. I'm setting a common ground for what we'll see in the second part.

---

# Preparation

```bash
# Prerequisites
dnf install -y git python3 python3-devel python3-virtualenv python2 python2-devel @'Development tools'

# Choose your temp dir
export TOX_SESSION_DIR=/tmp
cd $TOX_SESSION_DIR

python3 -m virtualenv -p python3 tox_session_venv
source tox_session_venv/bin/activate
pip install tox

git clone https://review.rdoproject.org/r/rdo-infra/ci-config
cd ci-config
git fetch https://review.rdoproject.org/r/rdo-infra/ci-config refs/changes/38/26138/1 && git checkout FETCH_HEAD
```

---

We will start from zero, from scratch. I assume nothing.

---

# What is testing?

Testing is the act of comparing the outcome of something we developed with the expectation we had of that outcome.

e.g. The promoter should promote, so I run the promoter and I expect to see dlrn updated and the containers uploaded. The collect-logs role should collect logs, so I run the role and check that the logs are there.

Often what we develop covers various use cases: for example, the promoter should promote both centos8 and centos7, and collect-logs should collect logs for both quickstart and infrared.

So to test our creature completely we have to run it with the different use cases, checking that each outcome matches the corresponding expectation: I use collect-logs in quickstart, all quickstart logs should be collected; in infrared, all infrared logs should be collected.

This is called positive testing: we check that the role is doing what it is supposed to do.

On the other hand, we could make sure that when we run a promotion on RHEL, we don't push containers to docker.io. This is called negative testing: we check that the role is not doing something it is not supposed to do, and that it responds correctly to error conditions.

Now, if this run and the comparison between outcome and expectation are done by a human, it is called manual testing. If the run and the comparison are done by a machine, it's called automatic testing.

## Are manual tests bad?

OpenStack is tested by a set of automated tests called tempest. Tempest has more than 1200 tests, so you can imagine that running them manually would take ages.

Manual tests are acceptable only while you're exploring a solution. They need to be replaced by automatic tests as soon as you can.

---

# First panda principle

If you have a creature that you feel lends itself only to manual testing, change it so that it can be tested automatically.

---

# A typical development

How do you test something like the promoter, or the log collection role?

In most of our cases, the design phase should have produced a generic workflow:

1) on component A do a1, a2, a3
2) on component B do b1, b2, b3

On this workflow you identify the tasks that can be done in a particular component A:

1) do a1
2) do a2
3) do a3

From this design you extract a sequence of instructions that implement the workflow.

## First important step

Group the instructions into functions that each serve a single purpose. If you understand that a is doing something different than b, put a and b in two different functions/methods.

This is called the single responsibility principle. It sounds trivial, but it's essential for testing: you can't test anything smaller than a function, so if you put two workflow items in the same unit, the test becomes complicated or even impossible.
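As a tiny illustration of the principle, here is a hypothetical sketch (all names invented for this example, not taken from our code):

```python
# Hard to test: two responsibilities hidden in a single function.
def setup():
    # installs packages AND writes the configuration
    pass


# Easier to test: one responsibility per function,
# each one can be verified (and mocked) on its own.
def install_packages():
    pass


def write_config():
    pass


# The workflow is then just the composition of the two.
def setup_split():
    install_packages()
    write_config()
```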
---

# Where do we start testing?

Essentially, the first goal a developer has when implementing the design is to reach basic functionality.

Suppose you have implemented the workflow with the code below. At this point you have very rough functions, but they represent your idea of the workflow, and you want to check the status of what you wrote.

```python
state = "init"


def a1():
    # do a
    pass


def a2():
    # do b
    pass


def a3():
    # do c
    pass
```

```yaml
---
# role: install
- name: Install packages
  package:
    name: a
```

```yaml
---
# role: configure
- name: Configure packages
  copy:
    src: config-example.ini
    dest: config.ini
```

The first thing you want to test is whether the workflow outcome matches the expectation.

For python: ATTENTION, if your "test" runs

```python
a1()
a2()
a3()
```

and it completes, THIS IS NOT AN automatic TEST. This is an automatic run.

1) If the run completes, it doesn't mean that the program works. It just means that the program doesn't fail.
2) If you have to look manually at the logs or the output to see whether the program did what it is supposed to, this is a manual test.

To be an automatic test, after the run phase you must include a check phase. The first test will look something like this:

```python
a1()
a2()
a3()

if state == "endstate":
    print("Test successful")
else:
    print("Test Failed")
```

At this point you launch all your rough functions and try to understand at which point your program blocks. It will be the first of several iterations in which your program will change, improve, degrade, grow, shrink; sometimes you'll have to review the design. And eventually you'll reach your basic functionality.

This is a very important step, because you have demonstrated that you can close the gap between the initial state of the program and the final state with the functionality you implemented.

Remember the other use cases: you need to add at least one functional test for each use case your software supports.

```python
a1()
a4()
a3()
if state ...
```

Each test will run on a certain **execution path** <- remember this word, as it will be very important.

We will cover negative tests in part 2 of the training.
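---

To make the notion of **execution path** a bit more concrete before we move on, here is a minimal, hypothetical sketch (names and values invented for illustration): each use case drives the workflow through a different branch of the code, so each check exercises a different execution path.

```python
# Hypothetical sketch: the same entry point takes a different path
# depending on the use case it is called with.
def collect(use_case):
    if use_case == "quickstart":       # execution path 1
        return "quickstart logs"
    else:                              # execution path 2
        return "infrared logs"


# One check per use case: each one walks a different execution path.
if collect("quickstart") == "quickstart logs":
    print("quickstart use case: Test successful")
else:
    print("quickstart use case: Test Failed")

if collect("infrared") == "infrared logs":
    print("infrared use case: Test successful")
else:
    print("infrared use case: Test Failed")
```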
---

# Classification of automatic tests

Based on the scope and the type of expectation of the tests, we can identify various layers of tests:

1) Static code analysis (also called linters)
   They statically look at the whole code in the repository. The code is not really run: a parser checks syntax, code style, unused variables.
   These checks are usually ANNOYING and HATED, but:
   1) They enforce a style, which is important for collaboration.
   2) It happens rarely, but they are able to identify bugs at a very early stage.
2) Unit tests
   They test single functions/methods in isolation (covered in detail in Part 2).
3) Functional tests
   As we've seen, they test a sequence of functions inside the same component, and verify the outcome at the end of the workflow. They check that the single functions are interacting correctly together.
4) 1-1 integration tests
   They are the functional tests applied to multiple components of a solution. The promoter needs dlrn to work: we need to check that the functions that interact with dlrn work correctly with dlrn, which is an external component.
5) End-to-end integration tests
   These are the most extensive tests that can be done. Their scope is multiple functions in multiple components and, like the functional tests, they usually test the whole workflow from the very beginning to the final completion. Most of the time they require specialized hardware, or many machines to run, and they take a lot of time to complete, because they touch a lot of functions in the workflow.

If you look at what we do, what we work on, what our product and maintenance target is: the check, gate and periodic jobs that we launch on zuul are almost all end-to-end integration tests for tripleo. That's our main mission.

---

Quickstart e2e integration tests and tripleo e2e integration tests are the same. But while tripleo has the other layers, historically quickstart did not.

---

# Layer differences

A graph with two axes:
* x -> number of functions touched
* y -> number of execution paths touched (coverage)

Time to write:
* x -> type of test
* y -> time to write tests

Functions tested:
* x -> type of test
* y -> functions covered

The pyramid:
* y -> cost in resources and time
* x -> scope of a single test

The smaller the scope of a test, the more tests you need to cover the code.

On the promoter code, running 141 unit tests locally takes 13 seconds. The longest functional test takes 90 seconds. Zuul jobs can be queued for an arbitrary amount of time; when they are started, they take a minimum of 5 minutes until they start user code. This is an x^x graph (unit tests take more ti...)

---

## Why do we need local testing?

Can't we just use zuul, or run VMs in RDO/Vexxhost, and then test in the **REAL** environment?

<!-- .slide: data-background="https://i7.pngguru.com/preview/294/941/89/marty-mcfly-dr-emmett-brown-back-to-the-future-jeff-radio-radio-80-dr-emmett-brown.jpg" -->

You have to start thinking quadrimensionally

<!-- ![](https://i7.pngguru.com/preview/294/941/89/marty-mcfly-dr-emmett-brown-back-to-the-future-jeff-radio-radio-80-dr-emmett-brown.jpg) -->

Every piece of software needs a combination of all the layers of tests to be tested correctly. But let's consider the change-test cycle in iterations, with some examples.

For the first patches in the promoter I ran no less than 2000 iterations. Rapid calculations for the run time of the tests (it's the time you wait to get feedback):

* 2000 runs of unit tests -> 2000 seconds ~ 30 minutes
* 2000 runs of functional tests: 2000 * 90 s = 180000 s ~= 6 working days
* 2000 runs of e2e integration tests: 2000 * 3 h = 750 working days
* 2000 runs of simple tests, but run in zuul (5 + 2 minutes) = 29 working days

Running a unit test is roughly 2700 times faster than running an e2e integration test.

From a different perspective, working hours in a sprint: 13 * 8 = 104. In a single sprint, if you do NOTHING else at all, you can run:

* 34 e2e integration tests
* 896 small zuul jobs
* 365000 unit tests

34 e2e integration tests. You need to make them count ...

## What is local testing?

If end-to-end integration tests are so comprehensive, what need is there for the other layers?

The goal is to get feedback on your change as fast as possible. The faster you can iterate, the faster you can develop. At least in the first phase, the one in which you need to move the idea forward: exclude zuul, exclude e2e integration tests.

The definition varies from case to case. Local testing is everything you can test on your own machine to understand whether your code is good enough to be moved to the tests you can't do locally.

---

Time, feedback, confidence

---

Do we really need to test this much??

Time taken to write the tests could be used to do something else ...

## "Yeah, but .."s
## ... I don't have the time to test all these corner cases

---

## Third problem: you cannot cheat on the time required

The time is waiting for you, and it will bite you.

The development of a set of functions requires an ideal time T to be done correctly. If you're spending less time now, you're incurring a debt. Time cannot be bartered: if you don't spend the required time at the beginning, you may spend double the time later.

### Sculpt this into your mind: your software will change

Whether you want it or not, your software will need to change. A change in design, a bug, a new feature, an optimization.

If you're not writing automatic tests, or you're writing only trivial functionality tests, the future you, or a member of the team, will change a1, a2, a3 and that will introduce a bug. If there is no automatic test, the manual test will take too long, and will be overlooked. If you have trivial automatic tests, they will pass even if the change has a destructive bug.

You may be fast at the start, but as soon as you have to take a step back and change what you already did, if you don't have tests, you need to redo lots of things, or just test less. The amount of time spent writing tests pales in comparison to the amount of time you save.

The best thing you can do is to prepare your code for the changes that will come. And the best way is to write tests.

----

## Do they cover everything?

<span><!-- .element: class="fragment" data-fragment-index="1" -->
NO, but good tests can easily cover 90% of the possible scenarios in a fraction of the time.
</span>

<span><!-- .element: class="fragment" data-fragment-index="2" -->
The rest can be covered by integration tests with zuul and manual exploration.
</span>

---

## Enough talking about why, let's start talking about how

Run the first command:

```cli
tox -epy27 -- ci-scripts/dlrnapi_promoter/test_dlrn_hash_unit.py
```

The first run usually takes a while.

---

## What is tox?

Tox is a generic virtualenv manager:

* It creates a disposable virtualenv (which is not your starting virtualenv created in the preparation)
* It installs all the specified dependencies in it
* It runs any command you want inside the virtualenv

---

Where are the .tox environments?

---

## Tox command line arguments

```cli
tox -epy27 -- ci-scripts/dlrnapi_promoter/test_dlrn_hash_unit.py
```

* `-epy27` defines the virtual (-e)nvironment to use, in this case a python2.7 virtualenv
  <aside class="notes">Look at tox.ini</aside>
  tox.ini usually defines these environments in detail, but this one in particular is an embedded environment and inherits from the generic configuration (a minimal sketch of a tox.ini follows below)
* The part after the double dash is passed directly to pytest; here we specify to which test or group of tests to limit the run
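---

To give an idea of what tox reads, here is a minimal sketch of a tox.ini. This is not the real tox.ini from ci-config (which is more elaborate, look at it in the repo); the environment names and file paths here are only illustrative.

```ini
[tox]
envlist = py27,py35,linters

[testenv]
# dependencies installed into the disposable virtualenv
deps = -r{toxinidir}/test-requirements.txt
# {posargs} is replaced by whatever you pass after the double dash
commands = pytest {posargs}
```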
---

## What is pytest?

pytest is a testing framework, with tools and libraries that help you write and run tests:

* It collects the tests in your repo, following specified patterns
* It generates input and starting artifacts for the functions under test
* It runs tests following specified rules
* It gathers stdout, stderr, logs, tracebacks
* It reports results and the gathered information

---

Pytest collection - how does it work?

---

### tox - pytest interaction

When pytest terminates, the tox-created virtualenv is not destroyed; it's reused for the next run. Tox virtualenvs are rebuilt at any relevant config change (e.g. adding a dependency).

---

### What and how does pytest report

Leaving aside the tox output, which reports the installation steps, looking at the pytest output we see:

* Information on pytest itself
* Plugin information. This is important, as lately we have been broken when these plugins were not at the correct version. In particular take note of html, molecule, and cov
* How many tests it collected from the file(s) specified
* For every test, the status and the run progress

----

### Report plugins

* The reporting plugins kick in: they tell where the html report is (the one we see in zuul too)
* The coverage report: for each file, how many lines of code were called
* Last, the summary: number of tests passed and how long it took

---

## Running a single test

```cli
tox -epy27 -- ci-scripts/dlrnapi_promoter/test_dlrn_hash_unit.py::TestDlrnHashSubClasses::test_build_valid_from_source
```

----

### A note about coverage

Coverage is not a direct report of quality; it tells you what you're NOT testing.

* If it's below 90%, the tests are not covering enough
* If it's above 90%, it doesn't mean that the tests are doing everything right

---

## How does it look when things are failing?

<aside class="notes">Alter a test to make it fail</aside>

---

## Multiple tox environments

<aside class="notes">Show how tests fail in py35</aside>

```cli
tox -epy27,py35 -- ci-scripts/dlrnapi_promoter/test_dlrn_hash_unit.py::TestDlrnHashSubClasses::test_build_valid_from_source
```

```cli
tox -elinters
```

The linters environment doesn't take the file selection into consideration.

---

## Is that it with tox and pytest?

Sometimes tests break, and we need to adjust tox.ini or some pytest argument. But for the majority of the work, this knowledge of tox and pytest as tools is sufficient.

Knowledge of pytest as a framework, however, is definitely part of the daily testing routine.

---

# End of Part 1

## In the next episode: test writing for dummies

---

# Local Testing Hands-On [WIP]

### Part 2: Writing tests

---

# The basics of testing

```python
from local_lib import f


def test_f_0():
    a = f(0)
    working = (a == 1)
    if not working:
        raise Exception("f(0) did not return 1")
```

---

# The assert statement

```python
from local_lib import f


def test_f_0():
    a = f(0)
    working = (a == 1)
    error_msg = "f(0) did not return 1"
    assert working, error_msg
```

---

# Testing ansible

```yaml
- hosts: localhost
  name: Test role roley_mcroleface with var input 0
  vars:
    input: 0
  tasks:
    - name: Call role roley_mcroleface
      include_role:
        name: roley_mcroleface
    - name: Check role artifacts
      assert:
        that: role_output == 1
        fail_msg: role did not generate 1 as output
        success_msg: role generated 1 correctly
```

---

# You can test even without a testing framework

# Test frameworks "just" make this part bearable

---

# Example of a unit test

# Functions and methods are the smallest things you can test

---

# Example of a functional test

---

Don't add options just for testing: if the testing workflow diverges from the production workflow, you're not testing the right thing.

You're only allowed to change the configuration variables during testing. (And no, the code can't act differently when it recognizes them as testing values.)

---

You need to be at least test-aware.

What is the atom of testing? A function for python and bash, a role for ansible.

If you put logic in a playbook, **THAT LOGIC WILL NOT BE TESTED**.

If a function does two things at a time, it will be difficult to test, because you'll have to check two sets of outputs/artifacts. The more your code is modularized, the more testable it is.

---

# Testing workflow

* Write the code and tests together
* e.g. the main: it's not testable inside `if __name__ == "__main__":`
  * create a main() function
  * how do you test the reaction to the command line then?
  * pass the command line as an argument to main() (see the sketch below)
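A minimal sketch of that pattern (the option name and values are invented for illustration):

```python
import argparse
import sys


def main(argv=None):
    # the command line is now a parameter, so a test can call main() directly
    parser = argparse.ArgumentParser()
    parser.add_argument("--distro", default="centos7")
    args = parser.parse_args(argv)
    return args.distro


# only the call lives here, so no logic escapes the tests
if __name__ == "__main__":
    main(sys.argv[1:])


# in the test file:
def test_main_distro_option():
    assert main(["--distro", "centos8"]) == "centos8"
```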
Don't think you can add the tests later. It will require double the time and effort. Imagine you made a big function that does everything, it's not testable, and it's merged; now imagine you have to test it after 2 months:

* You forgot everything.
* The code is working production code, so you'll have Wes's breath on your neck the whole time, worrying about how much you're going to break production.
* Someone else will have built their functionality on top of your patch, and you'll have to change their code too.

---

# Why start with the functional tests?

The first goal is usually to arrive at something that works end to end as fast as possible, and then reiterate with different cases and scenarios.

---

# Ensure the code fails correctly when it's supposed to fail

---

The importance of test coverage.

---

# The second step in testing: unit tests

Unit tests are usually full of negative checks. As the name suggests, unit tests exercise the fundamental blocks of your software: in our case, methods/functions for what we write in python, and roles for what we write in ansible.

In our example:

```python
state = "init"
a1()
# "state after a1" is a placeholder for whatever state a1 alone is expected to produce
if state == "state after a1":
    print("Test successful")
```

The important thing you need to understand about unit testing is that the unit is tested in isolation. It means that if your function or role uses another function or role to work correctly, the dependent function/role is usually mocked.

For example, function X uses Y as a dependency. You can't call Y, because you're unit testing X: at this stage you don't care whether Y works or not, you care whether X works or not. So you just assume Y is working and is returning Z.
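As an illustration, here is a minimal sketch of that idea using unittest.mock (Python 3 standard library; under python2 the same API comes from the external `mock` package). The functions are invented for the example:

```python
from unittest import mock


def y():
    # dependency of x(): imagine it talks to an external service (e.g. dlrn)
    raise RuntimeError("the real y() must not run in a unit test")


def x():
    # unit under test: builds its result on top of whatever y() returns
    return "x got " + y()


def test_x_in_isolation():
    # assume Y works and returns "z": patch it away for the duration of the test
    with mock.patch(__name__ + ".y", return_value="z"):
        assert x() == "x got z"
```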
{"metaMigratedAt":"2023-06-15T05:40:57.642Z","metaMigratedFrom":"YAML","title":"Local Testing Hands-On Session","breaks":true,"description":"Hands-On session on local testing for python and ansible.","slideOptions":"{\"transition\":\"slide\"}","contributors":"[{\"id\":\"bdcd7bf1-0d19-41a3-8f33-3295c97307f7\",\"add\":34820,\"del\":15076}]"}
    355 views
   Owned this note