# Web Application Security: hands-on intro
## Intro Guide
Due to the time constraints of the workshop this is just a very brief intro to _Dynamic (Web) Application Security Testing_ aka _(web) pentesting_ aka _ethical (web) hacking_. We hope it gives you enough insights but also leaves us enough time for the practical challenges.
In this guide we give you a little context and theoretical background, some tips on how to use the tools in the course, and a tip on how to approach the challenges in the practice sessions.
## Web Applications

> (Source: https://xkcd.com/869/ , under [CC BY-NC 2.5 license](https://creativecommons.org/licenses/by-nc/2.5/))
Web sites and web applications can take on a wide variety of forms and be based on many different scripting or programming languages. They range from a very simple site with plain static HTML files to highly scaled and dynamic applications with distributed microservices in the backend.
Here we want to establish just the basics of how websites and web applications work.
### How does a web server work
A web server is basically a program that listens on one or more TCP ports (usually 80 and 443), waits for requests from web clients and provides whatever information the client requested, if it is available on the server.
For the web server and the web client to talk to each other they follow the [HTTP protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol), short for _HyperText Transfer Protocol_. Today most sites on the web already use HTTPS, which is HTTP over a TLS-encrypted session. For the server and the client this usually does not matter: they just speak HTTP, while the TLS layer underneath makes sure that everything exchanged between the two systems connected through TCP/IP is encrypted.
HTTP is a plaintext protocol. So you could just use `telnet` on the command line to connect to a web server and type in the appropriate text to retrieve some file from the server. The above linked Wikipedia article has a nice example of what this might look like:

> (Source: https://commons.wikimedia.org/wiki/File:Http_request_telnet_ubuntu.png , under public domain)
The request (the thing the client sends to the server) consists of the request line which tells the server what kind of request it is, what resource to get and which protocol version to apply. In the above example it is a `GET` request for the resource `/wiki/Main_Page` with the protocol version `http/1.1`. After the request line the client can send zero to many header fields, then an empty line and an optional request body.
The response (the thing sent back by the server) starts with a status line, consisting of the protocol version, the status code and optionally a text describing the status code. In the case above this is the protocol `HTTP/1.0` with a status code `200` which means `OK`. Then the server adds zero to many header fields, depending on context. Usually it at least tells you what server it is, the date, size and the content-type and language of the requested resource. Following the header fields comes an empty line and an optional response body (e.g. the content of the HTML file the client requested).
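To make the structure of request and response more tangible, here is a minimal Python sketch. It only builds and parses text, without opening a network connection. The helper names `build_request` and `parse_response` and the sample response are made up for illustration and are not part of any library.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble a raw HTTP/1.1 request as the text a client would send."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line separates the header fields from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, *reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Simplification: a repeated header name overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "status": int(code),
        "reason": reason[0] if reason else "",
        "headers": headers,
        "body": body,
    }


request = build_request("GET", "/wiki/Main_Page", "en.wikipedia.org")

# A canned response (an assumption for this sketch, not a real server reply).
sample_response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)
parsed = parse_response(sample_response)

print(request)
print(parsed["status"], parsed["reason"], parsed["headers"])
```

Printing `request` shows exactly the kind of text you would otherwise type into `telnet` by hand.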
The response header can also contain [Cookies](https://en.wikipedia.org/wiki/HTTP_cookie), which are set by the server with header lines like these:
```
Set-Cookie: sessionToken=abcdef01234567890; Expires=Tue, 31 Aug 2021 12:34:56 GMT
Set-Cookie: foo=bar
Set-Cookie: chocolate=good
Set-Cookie: raisins=evil
```
The browser would then store these cookies (until their expiry date, if one was supplied) and use them in all follow-up requests to the same web site, by inserting the following header line into the request:
```
Cookie: sessionToken=abcdef01234567890; foo=bar; chocolate=good; raisins=evil
```
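As a rough sketch of what the browser does here, the following Python snippet reads the four `Set-Cookie` values from above with the standard library's `http.cookies.SimpleCookie` and assembles a `Cookie` request header from them. The `jar` dictionary is a deliberately simplified stand-in for a real cookie store, which would also track expiry dates, domains and paths.

```python
from http.cookies import SimpleCookie

# The values of the Set-Cookie headers sent by the server.
set_cookie_headers = [
    "sessionToken=abcdef01234567890; Expires=Tue, 31 Aug 2021 12:34:56 GMT",
    "foo=bar",
    "chocolate=good",
    "raisins=evil",
]

# Simplified cookie store: name -> value (attributes like Expires are ignored).
jar = {}
for header_value in set_cookie_headers:
    cookie = SimpleCookie()
    cookie.load(header_value)
    for name, morsel in cookie.items():
        jar[name] = morsel.value

# Only the name=value pairs are sent back, never the attributes.
cookie_header = "Cookie: " + "; ".join(f"{name}={value}" for name, value in jar.items())
print(cookie_header)
```

Note that attributes such as `Expires` stay in the browser. The server only ever sees the plain name/value pairs, which is one reason why it cannot trust them blindly.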
For more details on what cookies can do and how they work, take a look at the extensive Wikipedia page [HTTP cookie](https://en.wikipedia.org/wiki/HTTP_cookie). For a more detailed description of the HTTP protocol, the different request methods (like GET, POST, PUT, DELETE), and the range of status codes go to the Wikipedia article [Hypertext Transfer Protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol). This page is well suited as a first reference point whenever you are wondering what is going on at the HTTP level.
### So what happens when I actually visit a website?
HTTP itself is not actually very complicated. The things mentioned above would already allow you to write your own simple web servers and clients. The quick expansion of the web in the 1990s was in part only possible because of the clarity and simplicity of HTTP.
But in practice, when we visit websites nowadays, there is usually a lot more going on than just a simple exchange of one HTTP request and response between a server and a client.
Let's take a short excursion to the [Sending Passwords on Postcards](https://tantemalkah.at/2021/sending-passwords-on-postcards) talk, specifically to the [slides about what happens when I visit a web site](https://tantemalkah.at/2021/sending-passwords-on-postcards/#/3/10/5). (There is also a [recording of this talk on YouTube](https://www.youtube.com/watch?v=64DyVw5Qz-k); the section we are interested in starts at 28:10 and ends at 37:40.)
In the above example we did a simple `GET` request to `diebin.at` on port 80 and got a successful `200` response with some headers and a response body. While we could be done with that, usually a lot more than this simple request-response exchange happens. In this case the server sent a `Location` header pointing to a different URL, and also a response body in HTML that just tells us that this document has moved to this other URL. The other URL is basically the same, but beginning with `https://`. Fortunately many public web servers nowadays use TLS encryption and redirect every request on a non-encrypted channel (usually port 80 with http://) to an encrypted one (usually port 443 with https://).
For a convenient browsing experience our web browser will then just make another request to this secured page. Once it has got the final content, it can display it to us. If this is an HTML file, it may for example contain embedded images, e.g. a tag like the following: `<img src="chockie.png" alt="Picture of a chocolate cookie" />`. At this point the browser could just tell us that at this position in the document a picture with the filename "chockie.png" should be displayed, and that it also has a description, or alternative text (alt), that says: "Picture of a chocolate cookie". Instead, the browser (unless it is e.g. a text browser) will automatically send another HTTP request to the server to retrieve this chockie.png file, so that it can also display it in the HTML page loaded before.
In a similar way a lot of other embedded media can be automatically requested by the browser, as well as CSS files to style the page properly or JavaScript files to add interactivity to the page. All of that usually happens without the user explicitly noticing.
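The following Python sketch illustrates this follow-up behaviour: it parses a small HTML page with the standard library's `html.parser` and lists the URLs a browser would typically request next. The class name `ResourceCollector`, the sample page and the base URL `https://example.org/shop/index.html` are assumptions made up for this example. A real browser considers many more tags and attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script"):
            target = attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower():
            target = attributes.get("href")
        else:
            target = None
        if target:
            # Relative references are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, target))


page = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body>
<img src="chockie.png" alt="Picture of a chocolate cookie" />
</body></html>
"""

collector = ResourceCollector("https://example.org/shop/index.html")
collector.feed(page)
for url in collector.resources:
    print("browser would also request:", url)
```

Each of the collected URLs results in its own HTTP request, which is what you will see in the network section of your browser's developer tools.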
To see just one example of how many things can be loaded to get one web page, open your developer tools, go to the network section and then visit the [Hypertext Transfer Protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) Wikipedia page.
### Web sites vs. web applications
Put in a very exaggerated way, a web site is just a collection of mostly static hyperlinked content, including different media like text, images, audio and video. A web app(lication), on the other hand, is a highly interactive programme accessible through a web browser. While the first can be maintained rather easily by changing the contents of files and adding or removing some files in the web root, the latter needs code adaptations to change even small parts and might also need a whole (re)deployment process to run through before the change is online.
In practice, of course, it is not that simple and there is no clear boundary or guide to differentiate between a web site and a web application. There are a lot of opinions out there. Sometimes the answer is context-dependent. Take for example the following two short articles, which are both good summaries and comparisons of the two concepts:
* Hillary Nyakundi on freecodecamp.org: [What is the Difference Between a Website and a Web Application?](https://www.freecodecamp.org/news/difference-between-a-website-and-a-web-application/)
* Essential Design on medium.com: [Website vs Web App: What’s the Difference?](https://medium.com/@essentialdesign/website-vs-web-app-whats-the-difference-e499b18b60b4)
While the first lists Wikipedia as an example of a web site, the second one argues with more context that it is actually a web app. Both might be valid statements, depending on which reference point is important in a specific situation or for a specific use case.
For our use case - this seminar, where you will try to hack a web site/application - it is mostly important to differentiate between what is happening when a web server just serves static files and what happens when the web server response is processed by a script/programme, before it is sent to the client. Another important factor is, whether the response contains JavaScript (or other) code that can be executed in the browser.
Let's take a look at the following diagram, which describes what happens when a browser is used to retrieve a web page from a web application:

> (Source: https://de.wikipedia.org/wiki/Datei:Webanwendung_client_server_01.png , under [CC BY-SA 3.0 license](https://creativecommons.org/licenses/by-sa/3.0/legalcode))
Most of this we already described in the two sections above. The browser sends an HTTP request to the server, the server checks if it has the content and sends back an HTTP response (either with the content or an error code/message). And after the response and for all future requests, the browser might store some Cookie data.
But in the above examples we also said: the web server itself only checks if the requested file is there and then sends its contents back in the response body. And this is the basic responsibility of the web server. It opens a file, reads its contents and sends it to the client in the HTTP response body. If it can't find the file, it sends back an error response (usually with the error code _404 Not Found_).
If the server is now configured to serve a web application, let's say a PHP based web application, some other magic has to happen. If, instead of a request for the `index.html` page, the client requests the `index.php` page, the server should not just send back the contents of the _index.php_ file. Rather it will start or communicate with another program (as displayed in the diagram above), in this case the PHP interpreter. The PHP interpreter then reads the file, executes all PHP code inside it and returns the file with all PHP tags replaced with the output of the PHP code.
While this code is being executed many other things might happen. E.g. the PHP code could open a connection to a database to store or retrieve some information. It could also load other PHP scripts, e.g. some libraries to scale images. Or it could open a connection to another web service to retrieve some information from there, before it is able to pass the final output to the web server.
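The difference between serving a static file and handing a request over to application code can be sketched in a few lines of Python. This is a toy model, not how a real web server or the PHP interpreter is implemented: `STATIC_FILES`, `DYNAMIC_HANDLERS`, `greeting_page` and `handle_request` are hypothetical names, and the "web root" is just an in-memory dictionary.

```python
import html

# Hypothetical in-memory web root: path -> file content.
STATIC_FILES = {"/index.html": "<h1>Welcome</h1>"}


def greeting_page(query):
    """Application code: the output depends on the request parameters."""
    name = query.get("name", "stranger")
    # Escape user input before putting it into HTML output.
    return f"<h1>Hello {html.escape(name)}</h1>"


# Paths that are not sent back as files but executed as code.
DYNAMIC_HANDLERS = {"/index.php": greeting_page}


def handle_request(path, query=None):
    """Return (status code, response body) for a requested path."""
    if path in DYNAMIC_HANDLERS:
        # Dynamic: run the code and send back its output, not its source.
        return 200, DYNAMIC_HANDLERS[path](query or {})
    if path in STATIC_FILES:
        # Static: send back the file content unchanged.
        return 200, STATIC_FILES[path]
    return 404, "Not Found"


print(handle_request("/index.html"))
print(handle_request("/index.php", {"name": "Jane"}))
print(handle_request("/missing.html"))
```

From a testing perspective the dynamic branch is the interesting one: it is the place where data from the request influences what code does on the server.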
Our example here uses PHP, because the _Damn Vulnerable Web Application_, which we will use in the practice examples, is using PHP as well. But in general the web application's functionality could be provided by any other scripting or even compiled programming language. Common web application languages besides PHP are Python, Ruby, Perl, ASP, JSP, Go and many more. The only thing needed is a defined interface between the web server and the script/programming language.
Additionally, not explicitly shown in the diagram above, most web applications also contain extensive JavaScript code that is executed in the user's browser to add the needed interactivity for a versatile user interface (UI) and a smooth user experience (UX).
## Some security basics

> (Source: https://xkcd.com/538/ , under [CC BY-NC 2.5 license](https://creativecommons.org/licenses/by-nc/2.5/))
* The "CIA triad" of information security (for deeper insights go to the "Key concepts" section of Wikipedia's article on [Information security](https://en.wikipedia.org/wiki/Information_security#Key_concepts)):
  - **Confidentiality**: a property of an information system that ensures that users can only get the information from the system that they are authorized to read
  - **Integrity**: a property of an information system that ensures that data can be changed only by users who are authorized to do so. Or, in a slightly different framing: the property of a system "assuring the accuracy and completeness of data over its entire lifecycle" (ibid)
  - **Availability**: a property (or the degree thereof) that an information system is providing functions whenever it is supposed to provide those functions, or in other words: that "the information must be available when it is needed" (ibid)
* Differentiating between different users of an information system relies on three important concepts (for more details: ibid, section [Access control](https://en.wikipedia.org/wiki/Information_security#Access_control)):
  - **Identification**: a process of claiming that someone is who they say they are. E.g. if a person says "I'm Jane Doe", we would want to make sure that this person really is Jane Doe, e.g. by using an ID card. Or a username (once logged in), because we know the username can only belong to this specific person (unless it was stolen, like a stolen ID card). Or we apply more complicated and different approaches to identify someone (e.g. by a token the person is carrying with them, or the person's body itself). In all of these cases the person carries the means of identification with them.
  - **Authentication**: a process of actually verifying that the identification is valid. E.g. by the user typing in a password which belongs to the username. Or by checking some biometric data stored with the user account against the user's body characteristics. Usually we can differentiate between 3 categories of how to authenticate:
    - by something you know: e.g. a password or passphrase, a PIN or some other code, an answer to a question, ...
    - by something you have: e.g. a token, an ID card, an RFID chip, ...
    - by something you are: e.g. a fingerprint, the scan of your retina, ...
    - to make authentication stronger we can also use more than one property, ideally from two or more of the above categories. This is what 2FA (two-factor authentication) is all about.
  - **Authorization**: a process to check whether a specific user is allowed to execute a specific action (e.g. read a document, upload a file, change an existing file, send a message, ...). Usually this happens after a user has already been authenticated (so we already know they are who they say they are).
* **Vulnerability**: "A vulnerability is a hole or a weakness in the application, which can be a design flaw or an implementation bug, that allows an attacker to cause harm to the stakeholders of an application." ([OWASP: Vulnerabilities](https://owasp.org/www-community/vulnerabilities/))
* **Exploit**: "a piece of software, a chunk of data, or a sequence of commands that takes advantage of a bug or vulnerability to cause unintended or unanticipated behavior" ([Wikipedia: Exploit (computer security)](https://en.wikipedia.org/wiki/Exploit_(computer_security)))
* **Zero Day** (Exploit): An exploit that is "unknown to everyone but the people that found and developed them" (ibid).
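
To see how identification, authentication and authorization play together, consider the following minimal Python sketch. It is an illustration under simplifying assumptions, not a recommended login implementation: the user store `USERS`, the permission table `PERMISSIONS`, the role names and the iteration count are all made up for this example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 (iteration count chosen for the demo)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user store: username -> salt, password hash and roles.
_salt = os.urandom(16)
USERS = {
    "jane": {
        "salt": _salt,
        "hash": hash_password("correct horse", _salt),
        "roles": {"editor"},
    }
}

# Hypothetical permission table: action -> roles allowed to perform it.
PERMISSIONS = {
    "read_document": {"viewer", "editor"},
    "change_file": {"editor"},
    "delete_user": {"admin"},
}


def authenticate(username, password):
    """Verify the claimed identity (username) with something the user knows."""
    user = USERS.get(username)  # identification: who does the client claim to be?
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, user["hash"])


def authorize(username, action):
    """Check whether an already authenticated user may perform an action."""
    roles = USERS[username]["roles"]
    return bool(roles & PERMISSIONS.get(action, set()))


print(authenticate("jane", "correct horse"))  # identity verified?
print(authorize("jane", "change_file"))       # allowed to change a file?
print(authorize("jane", "delete_user"))       # allowed to delete users?
```

Many web vulnerabilities come from mixing these steps up, e.g. checking that a user is logged in (authentication) but forgetting to check whether that user may perform the requested action (authorization).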
### Where do vulnerabilities come from?
Some reasons why so many applications out there are insecure:
* faulty/buggy/insecure code
  - because developers do not know how to write secure code
  - because developers are working under too much pressure to satisfy deadlines and deliver code
* faulty/insecure use of code
  - because administrators do not know how to secure the service
  - because documentation is bad and it is unclear what a clean (and secure) setup looks like
  - because everyone likes to throw around the buzzword _DevOps_, but in practice there is no systematically engineered process in place to assure the quality and security of complex application systems (this is why there now is also _DevSecOps_)
* faulty/insecure systematic approach to application development
  - no one really cares about security
  - management layers are ignorant of the importance of security and do not allocate the appropriate resources
  - security is not integrated from the start (e.g. with a secure development life cycle, SDLC for short)
There are tools and processes available to integrate security from the start. But those need time, experience, and in consequence also a lot of money. So in recent decades many companies/organisations just did not care much, as long as they were lucky enough that neither they nor their applications had been hacked. So there were many good reasons, but not enough pressing ones, to develop a security mindset throughout the organisation and from the start.
Another important aspect is that *information security* is not a specific *state* you can reach in any context. It is more a *chance/probability* that you can increase by appropriate measures. In consequence none of the applied tools and methods guarantees that you produce secure code. But they can greatly increase the likelihood that your code is secure, and they help to tackle potential sources of insecure code at every stage of production and maintenance of applications.
The following are examples of where - or through what - insecure code and vulnerabilities are often introduced in the software life cycle:
* At the stage of planning and whenever design decisions have to be made to continue with development:
  - Trusting the user's browser (and therefore the user) to check the validity of processed data, instead of doing it on the server
  - Using outdated or self-coded encryption algorithms, or worse, none at all
  - Using insecure protocols
  - Hardcoding credentials instead of keeping them in separate configuration files outside of version control
* At the stage where actual coding gets done
  - Trusting user data with no or only incomplete input validation and output sanitization
  - Using insecure methods and queries to access a database (e.g. SQL)
  - Parsing user-provided files (which also includes configuration files set by administrators) without taking precautions (e.g. when parsing XML files), or with insecure external libraries/tools (e.g. when doing data compression)
* At run time, when the web application is getting deployed and running on a (publicly) accessible web server
  - Using default configuration snippets without changing sensitive information (e.g. default passwords)
  - Leaving debugging mode on in production systems
  - Using TLS (or maybe still SSL) in an insecure way (e.g. to be "backwards compatible")
  - Not using suggested security features, because they are not needed for the app to be functional, but they would require more time to set up and maintain and add complexity
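
One of the coding-stage issues listed above, insecure database queries built from unvalidated user data, can be demonstrated in a few lines. The following Python sketch uses an in-memory SQLite database. The table layout, the function names and the sample input are assumptions made up for illustration.

```python
import sqlite3

# Throwaway in-memory database with two example users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users (name, secret) VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_insecure(name):
    """Insecure: user input is pasted directly into the SQL statement."""
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    """Safer: a placeholder keeps the input strictly in the role of data."""
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Input that changes the meaning of the insecure query.
malicious = "nobody' OR '1'='1"

print("insecure:", find_user_insecure(malicious))
print("safe:    ", find_user_safe(malicious))
```

With the insecure variant the `WHERE` clause becomes `name = 'nobody' OR '1'='1'`, which is true for every row, so all secrets are returned. With the parameterized variant the same input is just an unusual name that matches no user.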
## Web application security testing
In IT security there are two major categories of how to test applications for weaknesses and vulnerabilities:
* **SAST**, short for Static Application Security Testing, which means looking at an application's source code to check it for security issues (either manually in a code review or audit, or by automated tools)
* **DAST**, short for Dynamic Application Security Testing, which means checking an application for security issues while it is running. There are also tools to do this, but a prominent example of DAST is offensive security testing a.k.a. pentesting, where security specialists try to hack a running application, to find security issues that should then be fixed by the developers.
In our practice sessions we will be doing DAST for a web application, so we could also be more specific and say **DWAST**. An important thing to know about manual dynamic testing is that it can often take a lot of time, even for experienced testers.
In this course we do not have that much time, and we want to encourage you to use us lecturers as a short-cut to the otherwise often time-consuming and sometimes even tedious research involved in security testing. We do not have a solution for everything either, but we can probably nudge you towards a quick win.
For the exercises we would like you to (roughly) follow (or adapt) the following process:
1. Understand what the page is doing
2. Understand how it is doing that
3. Analyse what can be modified
4. Try to modify stuff to provoke unexpected/unintended responses
5. Create your payload / modified request to exploit the vulnerability
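
Steps 3 and 4 often boil down to taking a request you have observed and changing one part of it at a time. As a small sketch of that idea, the following Python helper produces variants of a URL in which a single query parameter is replaced by different test values. The function name `vary_parameter`, the URL and the test values are hypothetical. In the practice sessions you would use a URL from the training application instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def vary_parameter(url, parameter, test_values):
    """Return copies of url with one query parameter set to each test value."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    for test_value in test_values:
        modified = [
            (key, test_value if key == parameter else value)
            for key, value in pairs
        ]
        # urlencode takes care of escaping special characters in the values.
        variants.append(urlunsplit(parts._replace(query=urlencode(modified))))
    return variants


# Hypothetical observed request and a few unexpected inputs for the id parameter.
observed = "http://localhost:8080/item.php?id=1&Submit=Submit"
variants = vary_parameter(observed, "id", ["0", "-1", "1'", "abc"])

for variant in variants:
    print(variant)
```

Comparing the responses to such variants with the response to the original request is a simple way to spot the unexpected behaviour that step 4 is looking for.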
At any stage please do not hesitate to talk to your peers and also ask the lecturers, if you are stuck somewhere or if you are just missing an idea how to continue. We are here to use this cooperative learning environment to get some first insights and quick wins. This should hone your security mindset, whether you continue to code, work on coding related projects, or even decide to continue with a more security-focused career path.
One more important thing: when there is code to look at (e.g. because the DVWA offers it, or you are testing open-source software, or you could acquire the code otherwise), don't hesitate to look at it. Code is an important source of ideas for how to hack something. And DWAST is not about having one ingenious moment after another and magically typing the right things into a web form. Rather it is about analysing all aspects of a system that you can examine, and finding clever ways to reconfigure and manipulate the system.
Now, let's move to the practical part. Happy hacking!