![image](https://hackmd.io/_uploads/SyWL9dNtC.png)

# Introduction to Web Hacking

## Walking An Application

### View Page Source

To find files (directories), open Inspect -> Source.
![image](https://hackmd.io/_uploads/S1N_-F4KA.png)

> External files such as CSS, JavaScript and Images can be included using the HTML code. In this example, you'll notice that these files are all stored in the same directory. If you view this directory in your web browser, there is a configuration error.

Everything is stored under /assets. To view the directory, browse to ip/assets.
![image](https://hackmd.io/_uploads/Hy05ZK4KA.png)

Reference: https://otter-security.com/walkthrough/wk_post/12/

### Dev Tool

Every modern browser includes developer tools; this is a tool kit used to aid web developers in debugging web applications, and it gives you a peek under the hood of a website to see what is going on. As pentesters, we can **leverage (make use of)** these tools to gain a much better understanding of the web application. We're specifically focusing on three features of the developer tool kit: **Inspector**, **Debugger** and **Network**.

* **Inspector**
Flags can be hidden behind the page's interface; tick or untick each CSS property to check its effect.
![image](https://hackmd.io/_uploads/SJCh4tVFA.png)

* **Debugger**
![image](https://hackmd.io/_uploads/H1CsUKNYC.png)

> This little bit of JavaScript is what is removing the red popup from the page. We can utilise another feature of debugger called breakpoints. These are points in the code where we can force the browser to stop processing the JavaScript and pause the current execution.

* **Network**
![image](https://hackmd.io/_uploads/HyKydK4KA.png)
AJAX is a method **for sending and receiving network data** in the background of a web application without changing the current web page.

## Content Discovery

### What is Content Discovery?

Firstly, we should ask, in the context of web application security, what is content?
Content can be many things: a file, video, picture, backup or a website feature. When we talk about content discovery, we're not talking about the obvious things we can see on a website; **it's the things that aren't immediately presented to us and that weren't always intended (meant) for public access.** This content could be, for example, pages or portals intended for staff usage, older versions of the website, backup files, configuration files, administration panels, etc.

There are three main ways of discovering content on a website which we'll cover: **Manually**, **Automated** and **OSINT** (Open-Source Intelligence).

### Manually

* **Robots.txt**
The robots.txt file is a document that tells search engines which pages they are and aren't allowed to show in their search results, or bans specific search engines from crawling the website altogether. It is common practice to restrict certain website areas so they aren't displayed in search engine results. These pages may be areas such as administration portals or files meant for the website's customers. For us as penetration testers, this file is a great list of locations on the website that the owners don't want us to discover.

* **Favicon** aka Favorite icon
The favicon is a small icon displayed in the browser's address bar or tab, used for branding a website.

* **Sitemap.xml**
Unlike the robots.txt file, which restricts what search engine crawlers can look at, the sitemap.xml file gives a list of every file the website owner wishes to be listed on a search engine. It can sometimes contain areas of the website that are a bit more difficult to navigate to, or even list old webpages that the current site no longer uses but that are still working behind the scenes.
![image](https://hackmd.io/_uploads/HJk0BcNFA.png)

* **HTTP Headers**
To view the HTTP headers:
> curl ip -v

![image](https://hackmd.io/_uploads/rJZ68qEtA.png)

* **Framework Stack**
Once you've established the framework of a website, either from the above favicon example or by looking for clues in the page source such as comments, copyright notices or credits, you can then locate the framework's website. From there, we can learn more about the software and other information, possibly leading to more content we can discover.

### OSINT

Also known as Information Gathering or Recon.

* **Google Hacking/ Dorking**
![image](https://hackmd.io/_uploads/HyHh_qVFR.png)
```
utilize: to use for a purpose
pick out: to select
```

* **Wappalyzer**
Wappalyzer (https://www.wappalyzer.com/) is an online tool and browser extension that helps identify what technologies a website uses, such as frameworks, Content Management Systems (CMS), payment processors and much more, and it can even find version numbers as well.

* **Wayback Machine**
The Wayback Machine (https://archive.org/web/) is a historical archive of websites that dates back to the late 90s. You can search a domain name, and it will show you all the times the service scraped the web page and saved the contents. This service can help uncover old pages that may still be active on the current website.

* **Github**
To understand GitHub, you first need to understand Git. Git is a **version control system** that tracks changes to files in a project. Working in a team is easier because you can see what each team member is editing and what changes they made to files. When users have finished making their changes, they commit them with a message and then push them back to a central location (repository) so that the other users can pull those changes to their local machines. GitHub is a hosted version of Git on the internet. Repositories can be set to either public or private and have various access controls.
You can use GitHub's search feature to look for company names or website names to try and locate repositories belonging to your target. Once discovered, you may have access to source code, passwords or other content that you hadn't yet found.
`track: to follow or monitor`

* **S3 Buckets**
S3 Buckets are a storage service provided by Amazon AWS, allowing people to save files and even static website content in the cloud, accessible over HTTP and HTTPS. The owner of the files can set access permissions to make files public, private or even writable. Sometimes these access permissions are incorrectly set and inadvertently allow access to files that shouldn't be available to the public. The format of the S3 buckets is http(s)://{name}.s3.amazonaws.com where {name} is decided by the owner, such as tryhackme-assets.s3.amazonaws.com. S3 buckets can be discovered in many ways, such as finding the URLs in the website's page source, in GitHub repositories, or even by automating the process. One common automation method is to use the company name followed by common terms such as {name}-assets, {name}-www, {name}-public, {name}-private, etc.

### Automated

Use tools such as **ffuf**, **dirb** and **Gobuster**.
![image](https://hackmd.io/_uploads/Hyqc39VKC.png)

* **ffuf**
> ffuf -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt -u http://10.10.97.43/FUZZ

![image](https://hackmd.io/_uploads/B1s9a5EFC.png)
Purpose: ffuf is an HTTP fuzzing tool, mainly used to find hidden directories and files on a web server by sending many HTTP requests built from the entries of a wordlist.

* **dirb**
> dirb http://10.10.97.43/ /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt

![image](https://hackmd.io/_uploads/S1BOa9EKC.png)
Purpose: dirb is a dictionary-based directory and file scanner, designed to find hidden resources on a web server by sending HTTP requests based on a wordlist.
* **Gobuster**
> gobuster dir --url http://10.10.97.43/ -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt

![image](https://hackmd.io/_uploads/BkY86cEY0.png)
Purpose: gobuster is a URL and DNS brute-force tool, designed to find hidden directories and files on a web server, as well as subdomains, using wordlists.

## Subdomain Enumeration

Enumeration: listing items one by one.
Subdomain enumeration is the process of **finding valid subdomains** for a domain.
We will explore three different subdomain enumeration methods: **Brute Force**, **OSINT** (Open-Source Intelligence) and **Virtual Host**.

### OSINT

* **OSINT - SSL/TLS Certificates**
SSL/TLS: Secure Sockets Layer/Transport Layer Security
When an SSL/TLS certificate is created for a domain by a CA (Certificate Authority), CAs take part in what's called "Certificate Transparency (CT) logs". These are publicly accessible logs of every SSL/TLS certificate created for a domain name. The purpose of Certificate Transparency logs is to stop malicious and accidentally made certificates from being used. We can use this service to our advantage to discover subdomains belonging to a domain: sites like https://crt.sh and https://ui.ctsearch.entrust.com/ui/ctsearchui offer a searchable database of certificates that shows current and historical results.
Example:
![image](https://hackmd.io/_uploads/B1DlfsEtC.png)

* **OSINT - Search Engines**
![image](https://hackmd.io/_uploads/ryKizsNFR.png)

* **OSINT - Sublist3r**
To speed up the process of OSINT subdomain discovery, we can automate the above methods with the help of tools like **Sublist3r**.
```
./sublist3r.py -d acmeitsupport.thm
```
![image](https://hackmd.io/_uploads/Skvo7iVYC.png)

### Brute Force

* **DNS BruteForce**
Bruteforce DNS (Domain Name System) enumeration is the method of trying tens, hundreds, thousands or even millions of different possible subdomains from a pre-defined list of commonly used subdomains.
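
To make the idea concrete, here is a minimal Python sketch of DNS brute forcing. It is only an illustration of the technique, not how any particular tool works: the function names (`resolve_host`, `brute_force_subdomains`) are made up for this example, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def brute_force_subdomains(
    domain: str,
    words: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve_host,
) -> Dict[str, str]:
    """Try word.domain for every word and keep the names that resolve."""
    found: Dict[str, str] = {}
    for word in words:
        word = word.strip().lower()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in the wordlist
        hostname = f"{word}.{domain}"
        address = resolver(hostname)
        if address is not None:
            found[hostname] = address
    return found


# Example (needs network access and authorization to test the target):
# with open("subdomains.txt") as wordlist:   # hypothetical wordlist file
#     print(brute_force_subdomains("acmeitsupport.thm", wordlist))
```

If the domain uses wildcard DNS, every candidate resolves, so it is worth checking a random name first before trusting the results.
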
Because this method requires many requests, we automate it with tools to make the process quicker. In this instance, we are using a tool called **dnsrecon** to perform this.
![image](https://hackmd.io/_uploads/S1dGmjNtA.png)

### Virtual Host

Some subdomains aren't always hosted in publicly accessible DNS results, such as development versions of a web application or administration portals. Instead, the DNS record could be kept on a private DNS server or recorded on the developer's machines in their /etc/hosts file (or c:\windows\system32\drivers\etc\hosts file for Windows users), which maps domain names to IP addresses.
Because web servers can host multiple websites from one server, when a website is requested by a client, the server knows which website the client wants from the **Host** header. We can utilise this Host header by making changes to it and monitoring the response to see if we've discovered a new website.
Like with DNS Bruteforce, we can automate this process by using a wordlist of commonly used subdomains.
![image](https://hackmd.io/_uploads/B1QMEoVt0.png)

## Authentication Bypass

### Username Enumeration

A helpful exercise to complete when trying to find authentication vulnerabilities is **creating a list of valid usernames**, which we'll use later in other tasks.

> ffuf **-w** /usr/share/wordlists/SecLists/Usernames/Names/names.txt **-X** POST **-d** "username=FUZZ&email=x&password=x&cpassword=x" **-H** "Content-Type: application/x-www-form-urlencoded" **-u** http://10.10.48.180/customers/signup **-mr** "username already exists"

![image](https://hackmd.io/_uploads/HyOfJ6VKA.png)
-w: selects the location of the file that contains the list of usernames whose existence we're going to check.
-X: specifies the request method; this is a GET request by default, but it is a POST request in our example.
-d: specifies the data that we are going to send. In our example, we have the fields username, email, password and cpassword.
We've set the value of the username to FUZZ. In the ffuf tool, the FUZZ keyword marks where the contents of our wordlist will be inserted in the request.
-H: adds additional headers to the request. Here we set Content-Type so that the web server knows we are sending form data.
-u: specifies the URL we are making the request to.
-mr: the text on the page we are looking for to validate that we've found a valid username.

### Brute Force with ffuf

Create a text file on Linux:
> touch file_name.txt -> nano file_name.txt

![image](https://hackmd.io/_uploads/HyeIEaNt0.png)

> ffuf -w valid_usernames.txt:W1,/usr/share/wordlists/SecLists/Passwords/Common-Credentials/10-million-password-list-top-100.txt:W2 -X POST -d "username=W1&password=W2" -H "Content-Type: application/x-www-form-urlencoded" -u http://10.10.48.180/customers/login -fc 200

Previously we used the FUZZ keyword to select where in the request the data from the wordlist would be inserted, but because we're using multiple wordlists, we have to specify our own keywords in place of FUZZ. In this instance, we've chosen **W1** for our list of valid usernames and **W2** for the list of passwords we will try. The multiple wordlists are again specified with the **-w** argument but separated with a comma. For a positive match, we use the **-fc** argument to filter out responses with HTTP status code 200, so only responses with a different status code are shown.

### Logic Flaw

#### What is a Logic Flaw?

Sometimes authentication processes contain logic flaws. A logic flaw is when the typical logical path of an application is either bypassed, **circumvented** **(gotten around)** or **manipulated (tampered with)** by a hacker. Logic flaws can exist in any area of a website, but we're going to concentrate on examples relating to authentication in this instance.
![image](https://hackmd.io/_uploads/BJcIUTVYC.png)
![image](https://hackmd.io/_uploads/r1228a4YC.png)
Because the above PHP code example uses three equals signs (===), it's looking for an exact match on the string, including the same letter casing.
The code presents a logic flaw because an unauthenticated user requesting **/adMin** will not have their privileges checked and will have the page displayed to them, totally bypassing the authentication checks.

#### Logic Flaw Practical (70%)

https://www.youtube.com/watch?v=7esqKdp9aoE

### Cookie Tampering (70%)

## Network Security

> Learn the basics of passive and active network reconnaissance. Understand how common protocols work and their attack vectors

### Passive Recon

![image](https://hackmd.io/_uploads/ByO2jmBY0.png)
We use **whois** to query WHOIS records, while we use **nslookup** and **dig** to query DNS database records. These are all publicly available records and hence do not alert the target.
`query: to request information`

We will also learn the usage of two online services:
* DNSDumpster
* Shodan.io

These two online services allow us to collect information about our target without directly connecting to it.

Before the dawn of computer systems and networks, in the Art of War, Sun Tzu taught, “**If you know the enemy and know yourself, your victory will not stand in doubt**.” If you are playing the role of an attacker, you need to gather information about your target systems. If you are playing the role of a defender, you need to know what your **adversary (enemy)** will discover about your systems and networks.

Passive reconnaissance covers many activities, for instance:
* Looking up DNS records of a domain from a public DNS server.
* Checking job ads related to the target website.
* Reading news articles about the target company.

#### Whois

WHOIS is a request and response protocol that follows the [RFC 3912](https://www.ietf.org/rfc/rfc3912.txt) specification. A WHOIS server listens on TCP port 43 **for incoming requests**. The domain registrar is responsible for maintaining the WHOIS records for the domain names it is leasing. The WHOIS server replies with various information related to the domain requested.
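
The request and response exchange described above can be sketched in a few lines of Python. This is a rough illustration rather than a full WHOIS client: the function names (`build_query`, `whois`, `parse_fields`) are invented for this example, and `whois.iana.org` is only an assumed default server, since the right server depends on the domain being looked up.

```python
import socket
from typing import Dict, List


def build_query(domain: str) -> bytes:
    """RFC 3912: the request is the query text terminated by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois(domain: str, server: str = "whois.iana.org", port: int = 43,
          timeout: float = 10.0) -> str:
    """Send one query to a WHOIS server and read until it closes the connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines; a key may repeat, so values are lists."""
    fields: Dict[str, List[str]] = {}
    for line in reply.splitlines():
        if line.lstrip().startswith(("%", "#", ">>>")):
            continue  # skip comment and footer lines
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Example (needs network access):
# print(parse_fields(whois("example.com")))
```

The reply format is not standardized, so the `Key: value` parsing here is a best-effort assumption that fits many, but not all, WHOIS servers.
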
Of particular interest, we can learn:
* Registrar: Via which registrar was the domain name registered?
* Contact info of registrant: Name, organization, address, phone, among other things (unless hidden via a privacy service).
* Creation, update, and expiration dates: When was the domain name first registered? When was it last updated? And when does it need to be renewed?
* Name Server: Which server to ask to resolve the domain name?

> registrar: the organization through which a domain name is registered

> whois domain_name

![image](https://hackmd.io/_uploads/ByX8WNHYC.png)
![image](https://hackmd.io/_uploads/BkHEu4HtA.png)

#### Summary:

![image](https://hackmd.io/_uploads/SJLLOESY0.png)
whois, nslookup, dig, DNSDumpster and Shodan are all passive recon tools.

### Active Recon

> Learn how to use simple tools such as traceroute, ping, telnet, and a web browser to gather information.

Active reconnaissance requires you to make some kind of **contact with your target**. This contact can be a phone call or a visit to the target company under some pretence to gather more information, usually as part of social engineering. Alternatively, it can be a direct connection to the target system, whether visiting their website or checking if their firewall has an SSH port open. Think of it as closely inspecting windows and door locks. Hence, it is essential to remember not to engage in active reconnaissance work before getting signed legal authorization from the client.

In this room, we focus on active reconnaissance. Active reconnaissance begins with direct connections made to the target machine. Any such connection might leave information in the logs showing the client IP address, time of the connection, and duration of the connection, among other things. However, not all connections are suspicious. It is possible to make your active reconnaissance appear as regular client activity. Consider web browsing; no one would suspect a browser connected to a target web server among hundreds of other legitimate users.
**You can use such techniques to your advantage when working as part of the red team (attackers) and don’t want to alarm the blue team (defenders).**
![image](https://hackmd.io/_uploads/rkC4ySHFA.png)

## Vulnerability Search