Software Engineering 2

--- tags: Bachelor ---

# Software Engineering 2

## Version control

### What

Version control software keeps track of every modification to the code in a special kind of database. The code for a project, app or software component is typically organized in a folder structure or **"file tree"**.

### Why

- If a **mistake** is made, developers can turn back the clock and compare earlier versions of the code
- **Everyone can have the latest version** of the source code easily
- Develop multiple features **in parallel** or fix bugs while developing features
- **Track every change** and contribution to the project
- Know **what has been deployed** to users
- Each developer has a **backup**

### How

- **Git is a Distributed VCS**, a category known as **DVCS**.
- All the past versions are saved inside the VCS
- When you need it, **you can request any version at any time** and you'll have a snapshot of the complete project
- It's important to create **atomic commits so that it's easy to track down bugs and revert changes** with minimal impact on the rest of the project

## GIT

A Git repository (repo for short) is a virtual storage of your project. It allows you to save versions of your code, which you can access when needed.

`git init` to initialize a repo.

### Git general usage

`git add` and `git commit` are used in combination to save a snapshot of a Git project's current state. They are the means to record versions of a project into the repository's history. Developing a project revolves around the **basic edit/stage/commit pattern**.

### Git commit

- **Captures a snapshot**
- Git can be thought of as a timeline management utility
- Git snapshots are **always committed to the local repository**.
- Developers work in an **isolated environment until they are ready to merge**
- Git records **the entire contents of each file in every commit**

#### Commands

`git commit -a` Commit a snapshot of all changes in the working directory, **only tracked files**.
`git commit --amend` **Replace** the last commit **with a new commit** that also contains the **currently staged changes**.

`git status` Show untracked, unstaged and staged changes.

`git log` Show the list of commits (and the pointers that reference them).

### Git add

The `git add` command adds a change in the working directory to the staging area. `git add` doesn't really affect the repository in any significant way: changes are not actually recorded until you run `git commit`.

The staging area is a buffer between the working directory and the project history. It lets you group related changes into highly focused snapshots before actually committing them to the project history. This means you can make all sorts of edits to unrelated files, then go back and split them up into logical commits.

#### Commands:

`git add <filename>` Adds a file or folder to the staging area.

`git add -A` Adds all changes to the staging area.

### Branching and merging

In Git, a **branch is a pointer to a commit**. The default branch is called **master**. Git also has a pointer called **HEAD**, which points to the current commit. When you commit, Git advances the branch and HEAD pointers to the new commit. Use `git checkout` to move between commits (and branches too, since they are pointers to commits).

#### Commands:

`git branch <name>` Create a new branch with the given name (starting from the current commit).

`git checkout <name>` Switch to another branch or commit; this basically moves HEAD. Git refuses the checkout if it would overwrite uncommitted local changes (stash them first).

### Merge

After a new feature or fix is finished, you can merge the branch you are working on into another one and safely delete the branch without losing any history. E.g. (currently on branch master) `git merge develop`

- **Fast-forward merge:** Occurs when there is a **linear path** from the tip of the current branch to the tip of the target branch. Git simply moves the master and HEAD pointers forward to match develop.
- **Three-way merge:** If the two branches **have diverged**, you cannot use a fast-forward merge.
**Git uses three commits** (the tips of the two branches and their common ancestor) **to create the merge commit**. You might have to **resolve conflicts manually**.

### Syncing with git

#### Git remote

When you clone a repository with `git clone`, Git automatically creates a **remote connection called origin** pointing back to the cloned repository. If you create a branch, it will exist only in your local repo. To push it to the remote server, run `git push origin <branch-name>`.

#### Git fetch vs git pull

- Fetch: downloads all new remote commits. **It doesn't force you to actually merge the changes into your local repository.**
- Pull = fetch + merge

#### Push

`git push` uploads local repository content to the remote repository.

*To prevent you from overwriting commits, __git won't let you push when it results in a non-fast-forward merge__ in the destination repository. If the remote history has diverged from your history, you need to pull/fetch the remote branch and merge it into your local one, then try pushing again.*

### .gitignore

A file listing the files and directories that Git should ignore. **The .gitignore file should be part of the repo**.

### Stash

`git stash` **temporarily shelves** (or stashes) changes you've made to your working copy **so you can work on something else** (including checking out other branches), and then **come back and re-apply them later on** (using `git stash pop`).

### Rebase

`git rebase <branch or commit>`:

- finds the closest common ancestor between the current branch and the new base
- lists all commits made on the current branch after that ancestor
- applies them in sequence, starting from the new base
- **NB: this generates brand new commits**
- usually done interactively using `git rebase <branch> -i`

### Tag

A tag is an alias for a specific commit.

`git tag -a v1.4 -m "my version 1.4"`

**Tags are not pushed automatically**, and a plain fetch only brings in tags that point to commits it downloads.
Use `git push --tags` to push all tags or `git fetch --tags` to fetch all of them.

### Blame

The high-level function of `git blame` is to display the **author metadata attached to specific committed lines** in a file. It is used to examine specific points of a file's history and to find out **who was the last author** to modify a line.

### Configuration

???

### Diff

By default, `git diff` shows the changes in the working directory that have not been staged yet; `git diff HEAD` shows every change since the last commit. `git diff` can also compare files between any two commits or branches.

### Reset

`git reset <commit>` moves the tip of the current branch (and HEAD with it) to a previous commit. **Commits that come after the new HEAD become dangling and are eventually garbage-collected by Git**. By default the working directory is left untouched, so local changes are kept. If you want to delete commits while also keeping those changes staged, use `git reset <commit> --soft`; `--hard` discards them instead.

`git reset` (without a commit) resets the staging area to the latest commit, which unstages everything.

### Revert

`git revert <bad-commit>` creates a new commit that is the inverse of \<bad-commit>. If \<bad-commit> is the latest commit, the resulting state is identical to the commit before it. Passing the **-n option** adds the inverse changes to the staging index and working directory instead of creating a commit.

### Git RM (remove)

Used to untrack files.

`git rm --cached <file-name>` untracks the file but keeps the local copy.

`git rm -r <dir-name>` removes all files inside the directory and its subdirectories, from both the index and the working directory (add `--cached` to keep the local files).

The changes **need to be committed**.

### Amend

`git commit --amend` replaces the last commit with a brand new one that includes the previous changes and the currently staged ones.

### Pull request

A pull request is a request for **initiating code review** and general discussion about a set of changes **before they are merged** into a branch.

- Commonly used by teams and organizations collaborating using the *Shared Repository Model*, where everyone shares a single repository and **feature branches** are used.
- Useful as a way to **notify others about changes** one has made and wants to integrate into the main branch.
- A pull request doesn't have to end directly with a merge: **new commits can be pushed** to the feature branch after the pull request has started, and it updates automatically.
- Useful for **asking for suggestions** if the development of a feature gets stuck
- People can review your commits, give suggestions and stay up to date with new commits in the meantime.
- Two options:
  - Pull request from a **forked repository**
  - Pull request from a **branch** within a repository

## COMPARING WORKFLOWS

### Centralized

- Master only
- `pull --rebase`, etc...
- **very small teams**
- The central repository represents the official project, so its **commit history should be treated as sacred and immutable**

### Feature Branch

- Master, feature branches
- Multiple developers can work on a particular feature without disturbing the main codebase
- **Master branch will never contain broken code**, which is a huge advantage for **continuous integration environments**
- Perfect if used with **pull requests**.

### Gitflow

![](https://i.imgur.com/giETraI.png)

## Agile Processes

### What?

- It's an **iterative** approach to development with **regular feedback intervals**
- It's **not built in phases**

### Why?

- Split development of different features
- Constantly build and maintain the project
- **Easy to adapt to market changes and new requirements**

### How?

- Using frameworks like **scrum** and **kanban**
- **Graphs and charts to identify bottlenecks** and other problems
- **Discuss** ideas and concerns about the project in person on a **regular basis**
- The team **needs to have trust in each member**

In general, the product owner should **prioritize the most important work** first and not push the development team with arbitrary deadlines and too much work. **The team can accept new work only when the old one is complete**.

## Scrum

### Sprint

With Scrum, the product is built in a series of **fixed-length iterations called sprints**.
**Frequent milestones (end of a sprint) give a sense of tangible progress**. The sprint is a **time-limited effort** to implement a set of features described in the **sprint backlog**. At the end of the sprint these features are showcased.

### Sprint ceremonies

- **Sprint planning** A meeting to decide what to do in the coming sprint and to define the sprint backlog and the sprint team.
- **Daily stand-up** aka **Daily scrum**. A quick meeting (15 min) where every member says what they did the day before and what they are planning to do today. The team stands, so the meeting **has to be fast**.
- **Sprint Demo** A meeting to showcase what the team has shipped in the last sprint
- **Sprint retrospective** A review of what did and did not go well in the last sprint, to improve the next one.

### The three roles in Scrum

#### The product owner

- Builds and manages the **product backlog**
- Makes sure that everyone understands the product backlog
- Gives **priority to features**
- Decides when to ship the product
- **It's not a project manager!**
- Needs an understanding of the **current market situation** to deliver the most value out of the project.
- There must be **only one** product owner

#### Scrum Master (project manager?)

- **Coach** the team, the product owner and the business on the scrum process
- **Schedule needed resources** (human and logistical)
- **Resolve impediments and distractions** for the development team

**NB:** it's important to remember that **Scrum uses a pull model**: the work is pulled from the backlog to **maintain quality, performance and team morale. The scrum master does not push work to the team!**

#### Scrum Team

- 5 to 7 members
- Different skill sets
- The team needs to be sure nobody is a bottleneck
- Mutual help
- **The team needs to forecast how much work they believe they can complete in the sprint**

### The Backlog

The backlog is a **prioritized list of work**. The most important items are at the top.
The prioritization is influenced by:

- Customer priority (e.g. urgent bug fixing)
- Urgency of getting feedback
- Relative implementation difficulty
- Symbiotic relations between items (to do A we need B)

It is important to keep the backlog healthy and to review it before each iteration planning. The regular **review of the backlog is called backlog grooming**.

In a large backlog we can have two types of items:

- **Near-term** A precise and well-defined item with complete user stories and an already established collaboration between design and development.
- **Long-term** These items can be a bit vague, but a rough idea of them is useful for prioritization.

### User Stories

A user story is an **end goal** expressed from the **user's perspective**. **A key component of agile development is putting people first**. Stories use **non-technical language** to provide context for the team. After reading a user story you should know what you are going to implement.

- Stories keep the **focus on the user** The team is focused on solving problems for real users
- Stories **enable collaboration** With a goal defined, the team can work together to serve the future user's goals
- Stories **drive creative solutions** They encourage the team to think creatively and critically
- Stories **create momentum** Small challenges bring small wins that increase team morale

### Epics

**A large body of work that can be broken down into smaller stories**. Epics are delivered **over a set of sprints**. As the team learns more about an epic, stories are added to it, or removed once they are resolved.
- **Reporting**, to the project manager
- **Storytelling**, how you arrived at the current state
- **Culture**
- **Time**
- **User role or persona**: a unique story for each user persona
- **Ordered steps**: a story for each step

### Estimation

**There is no requirement to work weekends in order to compensate for underestimating a piece of work.**

It is better to **use points**, whose **abstraction helps** the team make tougher decisions around difficult work.

Velocity is the measure of the **amount of work a team can tackle in a single sprint: the sum of the points of all fully completed user stories**.

`person_days * focus_factor = estimated_velocity`

The focus factor is computed from experience:

`focus_factor = actual_velocity / person_days`

Example: the previous sprint completed 18 story points in 45 person-days, so the focus factor is 18 / 45 = 0.4. Multiply this factor by the person-days available in the next sprint: 0.4 * 50 person-days = 20 story points.

#### Points vs hours

More abstract

#### Planning poker

- each member chooses points without knowing the other members' choices
- the members with the highest and lowest estimates discuss their reasons

### Running Late

Running late in a sprint needs proper solutions. Since Scrum is a time-boxed process, we **cannot extend the sprint**, nor add more members while the sprint is running. We also **cannot ship features that were not tested**.

**Solution: ship only the user stories that are fully completed. The remaining stories go back into the backlog.**

## Application Programming Interfaces and Service-Oriented Architectures

### APIs

An API defines **how other programs can interact with your software**. If a user interacts with your code via code of their own, you are building an API.

Good API:

- **Simplicity** **Don't add useless complexity**
- **Useful Abstractions** You need to **hide details from users, leaving only the essentials**
- **Consistent** Name the same thing the same way, **have a common style**.
Give opposite methods opposite names (`open` and `close`, not `open` and `destroy`)
- **Principle of Least Astonishment** **no surprises, minimize the learning curve**
- **Think of your API as a product** Think of the user/customer side

### REST

REST is an **architectural style for building distributed systems based on hypermedia**. It's **not tied to HTTP** but usually uses it as the application protocol.

RESTful APIs are designed around **resources**. Every resource has a unique URI. Entities are grouped into **collections**; a collection is a separate resource with its own URI. Clients interact with services by exchanging **representations** of resources (JSON, XML, ...).

RESTful APIs use a **stateless request model**. The only place where information is stored is the resource itself.

#### Addressability

- expose the interesting aspects of the dataset as **resources**
- **a URI for every piece of information**
- **potentially infinite number of URIs**

#### Uniform Interface

- a small set of **verbs** (methods) applied to a large set of **nouns** (resources)
- Safe methods: GET and HEAD can be **ignored or repeated** without side effects
- Idempotent methods: PUT and DELETE can be **repeated**; sending the same request several times has the same effect as sending it once
- Unsafe and non-idempotent methods: POST should be treated with **care**

![](https://i.imgur.com/W8v2xF9.png)

#### Connectedness

- resource representations are hypermedia
- served documents contain not just data, but also **links to other resources**

#### Statelessness

- every HTTP request executes in complete isolation
- **does not mean "stateless applications"**

State is moved:

- into resources (server database)
- into clients (cookies) (for login/logout, etc...)

### HATEOAS

**Hypertext As The Engine Of Application State**.
Idea: it should be possible to navigate the entire set of resources without prior knowledge of the URI scheme. Each HTTP GET request should return the information necessary to find all resources directly related to the requested object **through hyperlinks included in the response**.

For example, with customers and orders, the representation of an order could include links that identify the available operations for the customer:

```json
{
  "orderID": 3,
  "productID": 2,
  "quantity": 4,
  "orderValue": 16.60,
  "links": [
    {
      "rel": "customer",
      "href": "http://adventure-works.com/customers/3",
      "action": "GET",
      "types": ["text/xml", "application/json"]
    },
    {
      "rel": "customer",
      "href": "http://adventure-works.com/customers/3",
      "action": "PUT",
      "types": ["application/x-www-form-urlencoded"]
    }
  ]
}
```

### Versioning

Backward compatibility is essential => versioning

#### No versioning

This is the simplest approach: big changes can be represented as **new resources or new links**. If a change only adds a field to a representation, existing client applications might continue functioning correctly as long as they are capable of **ignoring unrecognized fields**, while new client applications can be designed to handle the new field.

#### URI versioning

Each time you modify the web API or change the schema of resources, you **add a version number to the URI for each resource**.

`http://adventure-works.com/v2/customers/3`

This scheme complicates the implementation of HATEOAS, as all links need to include the version number in their URIs.

#### Query String Versioning

Rather than providing multiple URIs, you can specify the version of the resource by using a **parameter within the query string** appended to the HTTP request.

`http://adventure-works.com/customers/3?version=2`

This approach has the semantic advantage that the same **resource is always retrieved from the same URI**, but it depends on the code that handles the request to parse the query string and send back the appropriate HTTP response.
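
The parsing step that query string versioning relies on could look roughly like the sketch below. It is only an illustration: the names `requested_version`, `render_customer`, `DEFAULT_VERSION` and `SUPPORTED_VERSIONS`, the default of version 1, and the two response schemas are all assumptions made for this example, not part of any real API.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_VERSION = 1          # assumption: no ?version= means version 1
SUPPORTED_VERSIONS = {1, 2}  # assumption: hypothetical set of live versions


def requested_version(url: str) -> int:
    """Return the API version requested in the query string of `url`."""
    params = parse_qs(urlparse(url).query)
    raw = params.get("version", [str(DEFAULT_VERSION)])[0]
    version = int(raw)  # raises ValueError if the value is not a number
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {version}")
    return version


def render_customer(customer: dict, version: int) -> dict:
    """Build the representation for the given version (hypothetical schemas)."""
    if version == 1:
        return {"id": customer["id"], "name": customer["name"]}
    # Hypothetical version 2: same resource and URI, one extra field.
    return {"id": customer["id"], "name": customer["name"], "email": customer["email"]}


customer = {"id": 3, "name": "Contoso LLC", "email": "info@contoso.example"}
url = "http://adventure-works.com/customers/3?version=2"
print(render_customer(customer, requested_version(url)))
```

The URI path stays the same for every version; only the handler's choice of representation changes, which is the burden on the request-handling code mentioned above.
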
#### Header Versioning

Rather than appending the version number as a query string parameter, you could implement a **custom header** that indicates the version of the resource. This approach requires the client application to add the appropriate header to every request.

```
Version 1:

GET http://adventure-works.com/customers/3 HTTP/1.1
Custom-Header: api-version=1
```

## Testing

Testing is the process of executing a program with **the intent of finding errors**

- add value to a program
- **don't test to show that it works**
- raise reliability (find and fix bugs/possible exploits)
- **start by assuming it does contain errors and find as many as possible**

If our goal is to demonstrate that a program has errors, our test data will have a higher probability of finding errors.

It is usually impossible to test all cases and find all errors.

A program may be incorrect because of missing paths, and exhaustive path testing, of course, would not detect the absence of necessary paths. Moreover, an exhaustive path test might not uncover data-sensitivity errors.

### Principles

- A necessary part of a test case is the definition of the expected output/result
- Avoid attempting to test your own program (you make implicit assumptions)
- A programming organization should not test its own programs
- Thoroughly inspect the results of each test
- Write tests for input conditions that are **invalid and unexpected, as well as for those that are valid and expected**
- Checking whether the program **DOES NOT do what it is SUPPOSED to do** is only half the battle; the other half is **seeing whether the program DOES what it is NOT SUPPOSED to do**
- Avoid throwaway test cases unless the program is truly a throwaway program
- Do not test assuming no errors will be found
- Errors tend to cluster: code where errors were found is likely to contain more (often the same developer wrote it)
- Testing is an **extremely creative and intellectually challenging task**

What subset of all possible test cases has the highest probability of detecting the most errors?
The study of test-case design methodologies supplies answers to this question.

![](https://i.imgur.com/VDDYXfa.png)

[intro and black](https://drive.google.com/file/d/1ofEqiOsgTSV0UFMNhro5cwZ3OBq7FDgo/view)

> Testing can show the presence of bugs but never their absence. - Dijkstra

### Types of software testing

- Unit testing: test a function in isolation. Very often done by the implementer. Google calls them small tests. When we write unit tests we make no hypothesis about how the function will be used.
- Integration testing: we take many functions (already tested) and test them in combination. Essentially, we test that the people writing the different modules made compatible assumptions about what the other modules provide.
- System or validation testing: tests whether the overall system meets its goals.

### Purpose and conditions of testing

- Functionality
- Performance/stress testing: test the system at the boundaries of expected usage conditions, or beyond.
- Security
- Usability
- Reliability (also via stress testing)
- Acceptance: a test conducted to determine whether the requirements of a contract are met.
- Regression: re-running existing test cases on a new implementation of a component

### Black-box testing

#### Equivalence partitioning

![](https://i.imgur.com/cuIro5W.png)

**Errors are more likely on class boundaries**

![](https://i.imgur.com/T5kTTJN.png)

### White-box testing

#### Are our partitions good? TEST COVERAGE

Test coverage tries to do what partitioning does, but in a different way. It is a semi-automatic way of partitioning the input domain based on observable features of the source code.
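
To make the idea of partitions derived from the source concrete, here is a small sketch. The function `shipping_cost`, its thresholds and its prices are hypothetical, invented only for this illustration: each condition in the code splits the input domain into classes, and the test values take one representative per class plus the values sitting on each boundary.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical function: flat fee up to 10 kg, surcharge per extra kg."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 10:
        return 5.0
    return 5.0 + 0.5 * (weight_kg - 10)


# The two conditions above induce three partitions of the input domain.
# For each one we pick a representative value and the value on its boundary.
partitions = {
    "invalid, weight <= 0": [-1, 0],
    "light, 0 < weight <= 10": [0.5, 10],
    "heavy, weight > 10": [10.5, 30],
}

for name, values in partitions.items():
    for weight in values:
        try:
            print(name, weight, shipping_cost(weight))
        except ValueError as error:
            print(name, weight, "rejected:", error)
```

A test suite that exercises every partition listed here also executes every statement and every branch outcome of `shipping_cost`, which is the link between partitioning and coverage.
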
#### Types of coverage

Ex:

```python
def foo(x, y):
    # first decision
    if x == 0:
        y += 1
    # second decision (sees the possibly updated y)
    if y == 0:
        x += 1
```

- **Function or statement coverage**
  Percentage of functions or statements of our source code that are executed by our set of test cases.
  **foo(0, -1)** covers every line.

  Good
  - Objective measure
  - We know what it takes to get full coverage

  Bad
  - Can't find **bugs of omission**
  - **100% coverage does not mean bug-free**
  - Is 80% coverage good or bad?
- **Branch coverage**
  Coverage of each outcome of every condition evaluation.
  100% of foo(x,y):
  **foo(0, 1)** first if, second else
  **foo(1, 0)** first else, second if
  or
  **foo(0, -1)** first if, second if
  **foo(10, 10)** first else, second else
- **Loop coverage**
  Each loop is executed 0 times, once, and more than once.
  **Loop boundary conditions** are an extremely frequent source of bugs in real code.
- **MC/DC**
  **Modified Condition/Decision Coverage**
  **Required in some safety-critical domains** (for example avionics software). It is
  - branch coverage (decision coverage) +
  - condition coverage: every term involved in a decision takes on every possible outcome +
  - modified: every term used in a decision independently affects its outcome

  To test foo(x,y), branch coverage is enough because each if() has only 1 term.
- **Path coverage**
  A path through a program is a sequence of decisions made by operators in the program.
  Path coverage cares about how you got to a certain piece of code.

### WHAT ABOUT CODE NOT COVERED?

3 possibilities

- infeasible code
- not worth covering
- incomplete test suite

[white](https://drive.google.com/file/d/1EoMkf83jBxq54XsNrV5JpiB9C8Ff7zjo/view)
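
The difference between statement and branch coverage for the `foo` example can be checked with a small hand-instrumented sketch. This variant of `foo` is an assumption made for illustration: unlike the original, it returns `(x, y)` and records which branch outcomes ran in a `taken` set, and the helper `branches_covered` is hypothetical. Real projects would normally use a coverage tool instead of manual instrumentation.

```python
ALL_BRANCHES = {"first-if", "first-else", "second-if", "second-else"}


def foo(x, y, taken):
    """Same logic as foo above, instrumented to record branch outcomes."""
    if x == 0:
        taken.add("first-if")
        y += 1
    else:
        taken.add("first-else")
    if y == 0:
        taken.add("second-if")
        x += 1
    else:
        taken.add("second-else")
    return x, y


def branches_covered(test_inputs):
    """Run foo on every (x, y) pair and return the set of outcomes seen."""
    taken = set()
    for x, y in test_inputs:
        foo(x, y, taken)
    return taken


# foo(0, -1) executes every statement of the original foo,
# yet it exercises only two of the four branch outcomes.
print(branches_covered([(0, -1)]))

# Either pair of inputs listed under branch coverage reaches all four.
print(branches_covered([(0, 1), (1, 0)]) == ALL_BRANCHES)
print(branches_covered([(0, -1), (10, 10)]) == ALL_BRANCHES)
```
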