# Report Assignment 3 group 28
#### Report for the repository The Algorithms/Python by group 28.
* Joel:
  * insert_repair@ /data_structures/binary_tree/red_black_tree.py
  * remove_repair@ /data_structures/binary_tree/red_black_tree.py
* Aïssata:
  * knapsack/greedy_knapsack
  * maths/jaccard
* Yen:
  * cycle_sort@ /sorts/cycle_sort.py
  * strand_sort@ /sorts/strand_sort.py
* Gautam:
  * polynom_for_points@ /linear_algebra/src/polynom_for_points.py
  * binomial_heap@ /data_structures/heap/binomial_heap.py
* Jacob:
  * kgV@ /maths/primelib.py
  * palindromic_string@ /string/manacher.py
#### Aiming for P+:
* Joel: [report](https://hackmd.io/@ztick/SJM3j7bx9#P)
* Jacob: [report](https://hackmd.io/JYzgIghpQNWe-DYxFnLEwA?view#P)
* [refactors](https://hackmd.io/EZB_A9-9QGi0PnG5LHjj7A?view#Carried-out-refactoring-optional-P)
## Project
Name: The Algorithms/Python
URL: https://github.com/TheAlgorithms/Python
*One or two sentences describing it*:
Implementations of algorithms in Python for various applications (CS, math, finance, etc.). The README warns that these algorithms might be less efficient than their counterparts in the Python standard library.
## Onboarding experience
*Did it build and run as documented?*
The repository is a collection of small programs that we did not need to build as a whole. The parts that we examined ran smoothly right away; plug and play, so to speak.
## Complexity
1. What are your results for ten complex functions?
* Did all methods (tools vs. manual count) get the same result?
* This differed within the group: several of us got the same results, but not all. We also used different techniques from [the wiki page](https://en.wikipedia.org/wiki/Cyclomatic_complexity). That is, since tools were authorized, we used a graph generator, and we also counted the if-statements manually (see the sketch after this list).
* Are the results clear?
* Yes, barring some discrepancies from doing the calculations by hand, it is quite clear which functions are complex and which are not.
2. Are the functions just complex, or also long?
* We found that most of the functions we examined further were not very long, but still complex. A lot of logic tended to be packed into as small a space as possible. However, there were also cases of overly verbose code, and of outright errors.
3. What is the purpose of the functions?
* They are all parts of common algorithms or data structures, such as cycle sort or red-black trees. Apart from that, they vary widely: the repository targets a broad range of users, with algorithms for finance and blockchain, to name two. We tried to focus on algorithms we were familiar with from our studies.
4. Are exceptions taken into account in the given measurements?
* We found surprisingly few exceptions, even though an algorithm library seems like a less scary place to put them than other production code. Perhaps this is because of a mindset of trying to hide one's minuscule mistakes, or perhaps they are frowned upon in this repository for some legitimate reason; we do not know.
5. Is the documentation clear w.r.t. all the possible outcomes?
* Not really. Only one of us was satisfied with the documentation they found in their deep dive. Strong documentation is evidently not the main priority in the project: docstrings and further comments are recommended in the "Contributing Guidelines", but they are not enforced.
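For those of us counting by hand, the technique amounts to "decision points + 1". A minimal sketch of how that count could be automated with Python's `ast` module (this is an illustration, not one of our submitted tools):
```python
import ast

# AST nodes that each add one decision point to the complexity count.
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)

def approximate_cc(source: str) -> int:
    """Approximate cyclomatic complexity as decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # each extra operand of an 'and'/'or' chain adds one decision
            decisions += len(node.values) - 1
    return decisions + 1

code = "def f(x):\n    if x > 0 and x < 10:\n        return x\n    return 0"
print(approximate_cc(code))  # 3: one 'if', one 'and', plus 1
```
This should match what the tools report for simple functions, though tools disagree on some constructs (e.g. `try/except`), which explains part of the discrepancies mentioned above.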
[Results Aïssata](https://hackmd.io/QYRc-8IFT4aXcNI46_YfCA)
[Results Joel](https://hackmd.io/@ztick/SJM3j7bx9)
[Results Yen](https://hackmd.io/@C3YQYHNLTx2wfzQQ5I8MeA/HkuL8mGxq)
[Results Jacob](https://hackmd.io/JYzgIghpQNWe-DYxFnLEwA?view)
[Results Gautam](https://hackmd.io/@EbEt_YxYS8u3kwVeuT3y9Q/ryH9ULzgc)
### Lizard tool
The `lizard` tool is well documented and performs as expected out of the box.
For example:
`lizard --CCN 10 -l python -x */venv* -s cyclomatic_complexity`
would list all functions with CC over 10, sorted by cyclomatic complexity, excluding virtual-environment files (which have very high CC). But simply navigating to a folder and typing:
`lizard`
is enough for it to analyze the folder with default settings. This makes it very easy to use, as you immediately get results and can then focus on tweaking them to your liking.
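The tool can also be used as a library, which makes it easy to script comparisons against the manual counts. A short sketch based on the usage shown in lizard's README (the file path here is just an example):
```python
import lizard

# Analyze one source file and list each function's complexity.
analysis = lizard.analyze_file("sorts/cycle_sort.py")
for func in analysis.function_list:
    print(f"{func.long_name}: CCN={func.cyclomatic_complexity}, NLOC={func.nloc}")
```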
### Onboarding experience with other tools
We also tried `radon` which gives a [ranking](https://radon.readthedocs.io/en/latest/commandline.html) rather than a CC score.
Another tool we used for CC was [staticfg](https://github.com/coetaur0/staticfg), which was extremely helpful in generating control-flow graphs from which to count the nodes N, edges E, and connected components P for the [formula](https://en.wikipedia.org/wiki/Cyclomatic_complexity):
M = E - N + 2P
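For instance, a control-flow graph with E = 9 edges, N = 8 nodes, and P = 1 connected component gives M = 9 - 8 + 2·1 = 3. Generating the graph itself takes only a few lines; a sketch following staticfg's README (the file names here are illustrative):
```python
from staticfg import CFGBuilder

# Build the control-flow graph of a source file and render it as a PDF
# (requires Graphviz); E, N and P can then be counted from the drawing.
cfg = CFGBuilder().build_from_file("cycle_sort", "sorts/cycle_sort.py")
cfg.build_visual("cycle_sort_cfg", "pdf")
```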
Some programs would not build, and the error was not documented in a troubleshooting section. We discovered that staticfg cannot build the graph if the main function contains `input()` and `print()` statements.
Those needed to be commented out and replaced with:
```python
import doctest
doctest.testmod()
```
This error was not documented (unsurprisingly, as we were using `staticfg` on a repository other than its own), and good practice would be to inform the authors in an issue.
## Refactoring
### Plan for refactoring complex code:
We had some different plans regarding refactoring, and they were mainly split between two camps:
1. Helper functions. We found that highly complex functions with low LOC packed a lot of logic into a small container. This made them difficult to read and therefore difficult to understand, so they would benefit from being split into smaller functions with more thorough documentation (see the sketch after this list).
2. Optimizations. There were instances of code repetition and inefficient solutions whose complexity could be decreased by introducing proper data structures and/or removing duplicate logic. One instance of code repetition even introduced an error when we implemented a test.
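As an illustration of the first camp, here is a hypothetical before/after of the kind of extraction we had in mind (the function is made up for the example; the real candidates were functions like the red-black tree repairs):
```python
# Before: all the branching logic inline, hard to scan.
def categorize(values):
    result = []
    for v in values:
        if isinstance(v, int) and v % 2 == 0 and v > 0:
            result.append("positive even")
        elif isinstance(v, int) and v % 2 == 0:
            result.append("non-positive even")
        else:
            result.append("other")
    return result

# After: the shared predicate extracted into a named, documented helper.
def _is_even_int(v):
    """Return True if v is an int divisible by two."""
    return isinstance(v, int) and v % 2 == 0

def categorize_refactored(values):
    result = []
    for v in values:
        if _is_even_int(v) and v > 0:
            result.append("positive even")
        elif _is_even_int(v):
            result.append("non-positive even")
        else:
            result.append("other")
    return result
```
The helper removes the duplicated condition, so the CC of the main function drops, and a future fix to the predicate only has to happen in one place.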
### Estimated impact of refactoring (lower CC, but other drawbacks?).
The main point would be for developers to be able to read and understand the code more easily. For us, clear documentation and low CC accomplish that, but that is not necessarily the case for all users of these algorithms. Some of the functions are math-heavy and probably have no one but engineers as their intended users. But the repository is targeted at users in "education", so lowering the complexity can be useful to the community. On the other hand, verifying that the code does what you want may be more of a hassle when it is spread out over several functions.
### Carried out refactoring (optional, P+):
[Joel's report](https://hackmd.io/@ztick/SJM3j7bx9#P), [Joel's refactoring PR](https://github.com/ztick/Python/pull/7)
`git diff 364f1831f2e8bc431b50c68dff3365caa5b4d8c8 d6660e52a61833c1059791cfd71aeabd13cbf885`
Jacob's incremental refactors: [1](https://github.com/jacobmimms/Python/tree/refactor), [2](https://github.com/jacobmimms/Python/tree/refactor2), [3](https://github.com/jacobmimms/Python/tree/refactor3) and [report](https://hackmd.io/JYzgIghpQNWe-DYxFnLEwA?view#P)
## Coverage
### Tools
*Document your experience in using a "new"/different coverage tool. How well was the tool documented? Was it possible/easy/difficult to integrate it with your build environment?*
Coverage.py has excellent documentation, including for [branch coverage](https://coverage.readthedocs.io/en/6.3.1/branch.html#branch).
```bash
coverage run --branch <yourprogram.py>
coverage report
coverage html  # writes an HTML report to htmlcov/
```
After installation it was extremely easy to compute the branch coverage and see the result in an HTML page. The page uses colors to make it easy to identify which branches were run or not; partially run branches are shown in yellow.
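To make "partially run" concrete: a branch is partial when only one of its two outcomes is ever exercised. A minimal, hypothetical example:
```python
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

# This single call takes only the False arm of both 'if' statements,
# so coverage.py marks both branches as partial (yellow) in the report.
assert clamp(5, 0, 10) == 5
```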
#### Coverage runs

**Fig. 1:** Coverage report for one example function in a generated HTML page.
### Your own coverage tool
#### Show a patch (or link to a branch) that shows the instrumented code to gather coverage measurements. The patch is probably too long to be copied here, so please add the git command that is used to obtain the patch instead:
Joel: `git diff 7a9b3c7292cbd71fdc7723f449b9bbcbefbf9747 f4b91318ef6d1f333fb65b319414a00dadeeaac3 ` [on master in this repo.](https://github.com/ztick/Python)
Yen: `git diff 21db824cc5b6aa695dc47019dfa028a422730873 7a9b3c7292cbd71fdc7723f449b9bbcbefbf9747` [on coverage_measurement in this repo](https://github.com/b06902047/Python)
Aïssata: Link to [Github PR](https://github.com/TheAlgorithms/Python/pull/6019/files)
Jacob: [PR](https://github.com/jacobmimms/Python/pull/6)
#### What kinds of constructs does your tool support, and how accurate is its output?
Our tools are primitive by design and only support the environment for which they were created. Typically, they append a flag to a list whenever a branch is taken. Within that narrow scope, however, they are very precise.
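As a generic illustration of the pattern (this sketch is not any one member's actual instrumentation; see the linked diffs for those):
```python
# One flag per instrumented branch; all start as not taken.
branch_taken = [False] * 4

def sign(x):
    if x > 0:
        branch_taken[0] = True
        return 1
    else:
        branch_taken[1] = True
        if x == 0:
            branch_taken[2] = True
            return 0
        else:
            branch_taken[3] = True
            return -1

sign(3)
sign(0)
print(f"branch coverage: {sum(branch_taken)}/{len(branch_taken)}")
# -> 3/4, since the x < 0 branch was never taken
```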
### Evaluation
1. How detailed is your coverage measurement?
2. What are the limitations of your own tool?
3. Are the results of your tool consistent with existing coverage tools?
As we were instructed to implement our own tools, they are all different and evaluated in separate reports to minimize the bloat in this text. They are linked below:
[Code coverage, Aïssata](https://hackmd.io/QYRc-8IFT4aXcNI46_YfCA#Coverage)
[Code coverage, Joel](https://hackmd.io/@ztick/SJM3j7bx9#Task-1-DIY-Implement-branch-coverage-by-manual-instrumentation-of-the-source-code-for-five-functions-with-high-cyclomatic-complexity)
[Code coverage, Yen](https://hackmd.io/@C3YQYHNLTx2wfzQQ5I8MeA/HkuL8mGxq#Part-2-Coverage-measurement-amp-improvement)
[Code coverage, Jacob](https://hackmd.io/JYzgIghpQNWe-DYxFnLEwA?view#Task-1-DIY-Implement-branch-coverage-by-manual-instrumentation-of-the-source-code)
[Code coverage, Gautam](https://hackmd.io/@EbEt_YxYS8u3kwVeuT3y9Q/ryH9ULzgc#Part-2)
## Coverage improvement
For the same reasons as above, the answers to the coverage questions can be found in our individual reports, linked above. Below are links to view the old and new coverage, either as HTML files (to download and open in a browser) or as images.
Report of old coverage: [Joel's old coverage, 74%](https://github.com/ztick/Python/blob/master/data_structures/binary_tree/coverage_before_improvement.html)
Report of new coverage: [Joel's new coverage, 81%](https://github.com/ztick/Python/blob/master/data_structures/binary_tree/coverage_improved.html)
*Number of test cases added: two per team member (P) or at least four (P+).*
Aïssata: 2 (described in [linked report](https://hackmd.io/QYRc-8IFT4aXcNI46_YfCA)).
Gautam: 2 (described [here](https://github.com/gmm97/Python/commit/b8c688af504b6fc2e53fb809a5053681242e8661) and in [linked report](https://hackmd.io/@EbEt_YxYS8u3kwVeuT3y9Q/ryH9ULzgc)).
Joel: 4 (described in [this PR](https://github.com/ztick/Python/pull/6) and in [P+ section of the report](https://hackmd.io/@ztick/SJM3j7bx9#P))
Yen: 2 (described in [this PR](https://github.com/b06902047/Python/pull/3) and in [linked report](https://hackmd.io/@C3YQYHNLTx2wfzQQ5I8MeA/HkuL8mGxq#Task-2-Coverage-improvement)).
Jacob: 4 (described in this [PR](https://github.com/jacobmimms/Python/pull/6) and in the [linked report](https://hackmd.io/JYzgIghpQNWe-DYxFnLEwA?view#post-coverage))
## Self-assessment: Way of working
### Current state according to the Essence standard:
This project differed from the previous two, as we were able to delegate the coding work, and our individual progress was largely independent of the progress of the group. As such, we did not need to use the group GitHub repo to manage the workflow, which was a departure from the original way of working.
We established new "foundations" for this project in a team meeting, and adapted a new way of working to the differing requirements of the project. In this regard, we had to begin in the "foundations established" state for this project.
In one word, the team is adaptive: we do not fall cleanly into one state; rather, we adjust to the changing requirements of the projects and the needs of the team members. For example, there are no "non-negotiable tools"; we are willing to try new tools, such as HackMD, when the need arises.
While we might not fall cleanly into one state, the fact that the group can adapt indicates we have meaningfully reached the "In Use" state, as we are able to give and receive feedback constructively.
It is hard to say that we have not reached the "In Place" state, because our way of working is so flexible (perhaps this project group does not require or benefit from very explicit "practices and tools" in the way a product team would). Flexibility is inescapable in our situation, but it also has its problems, as our different windows of availability make communication and pair debugging difficult to organize.
To reach the "Working well" state, we would have to learn to work together on larger projects over a longer period of time (as the "working well" state stresses using practices "naturally"). The obstacle for reaching this state is not having the time to mature as a team.
### Personal Reflections:
* Joel: As this assignment differed quite a bit from the earlier ones, we have mostly had opportunities to work on sporadic communication. That was an agreeable experience and one we can learn from in terms of setting communication expectations.
* Aïssata: The assignment was interesting in the sense that I could explore new tools and reflect on which parts of the code are actually used. I would of course never use my own tool again, but probably lizard or coverage in the future. Communication-wise, I find that our group is very adaptive, but because of our heavy and conflicting schedules it can be hard to coordinate when needed (e.g. for debugging). Given our heavy workload outside of this course, I think this is working well.
* Yen: It's pretty cool to learn about coverage by utilizing new tools such as lizard and coverage.py. Besides, our group members are willing to share the obstacles they met doing their own work, as well as some useful tools. I got a lot of help and learned from that.
* Gautam: I liked that we could work independently on this assignment, and we just had an initial team meeting to iron out the requirements of the project. We helped each other with suggesting useful tools for visualization, and to highlight suitable functions which were lacking in test coverage. This style of working prevents unnecessary duplication of effort, so it was a fun experience.
* Jacob: I mostly agree with the assessment, but if we were to contextualize the standards within a course setting and consider the different group members' backgrounds and goals, I think it could be said that we have reached the "working well" state.
*Was the self-assessment unanimous? Any doubts about certain items?*
The assessment is unanimous in the sense that the exercise was an eye-opener, and the tools are something we will use in the future.
*How have you improved so far?*
Communication has improved since last time (the lower workload of this assignment really helped).
*Where is potential for improvement?*
We do not have enough overlapping time. Addressing this is our main concern for the next assignment.
## Overall experience
*What are your main take-aways from this project? What did you learn?*
Branch coverage is a very interesting tool for checking which part of your code is executed. A metric such as cyclomatic complexity is not only a good tool for checking the robustness of code, but is also a great proxy for how easy it is for a human to interact with that code. Following a standard of writing "low CC" code probably results in more readable code that follows better design patterns.
Different tools (and manual calculations!) give very different results, so it is important to consider what the tool is measuring and to choose a tool that fits the needs of the project.
We discovered, after testing different approaches, that the best way to choose a function to analyse and amend was to:
- run the static-analysis tool;
- install and run the automated coverage tool to identify functions with poor coverage, since many functions already have coverage over 95% and leave almost no room for improvement:
```bash
coverage run --branch <program>
coverage report
```
#### Coverage branches

**Fig. 2:** Excellent coverage, difficult to improve.

**Fig. 3:** Branches that can be covered by our own tool.