---
tags: devtools2022
---

# 05-The Internet & Internet Layering Model

**[Day2 PM]**

There is __a lot__ to learn about Computer Networks, and today we will briefly go through some of the most important points for development.

> If you have learned these before, treat it as a **recap**.

To give you some context, Computer Networking in SUTD is taught as a 14-week full time module. These three hours contain a high-level overview of the most important concepts in Computer Networking, with cherry-picked content that is most relevant to _developer tooling_. If you are interested in learning more about the subject, I recommend the book _Computer Networking: A Top Down Approach_ by Kurose et al.

## Schedule and Learning Objectives

* **1330-1400**: Network Protocol Stack
* **1400-1430**: Transport Layer Protocol, NAT
* **1430-1500**: Network diagnostic tools: netstat, dig, traceroute, wireshark
* **1500-1530**: Break
* **1530-1630** **Lab**:
    * Wireshark
    * AWS Route 53
    * Host a simple site with NGINX
* **1630-1700**: Discussion, Q&A

## Overview

The Internet is a computer network that interconnects billions of computing devices throughout the world. A "computing" device can be your phone, laptop, TV, or the server machine that you stream Netflix from. In Internet jargon, we call each of these devices **hosts**.

> There are an estimated 25 billion devices connected to the Internet at any time.

![Overview of the Internet](https://www.cs.utexas.edu/~mitra/csFall2015/cs329/lectures/fig1.gif)

Hosts are **connected** together by a network of **communication links**.

* These communication links are **abstracted** behind different protocols and can comprise different physical media.
* However, because there are so many devices connected to the Internet, it is physically **impossible** to have a **direct** connection between every single device.

> Why?

We thus have a network where we delegate the passing of messages to other devices through one or more intermediate **hops**.

**Think about Netflix on your phone.** Your phone does not have a direct physical line to the overseas Netflix servers.

* To reach Netflix, data on your phone is first passed to your home's WiFi router,
* then to your Internet Service Provider (ISP) such as Starhub, Singtel, or M1;
* then your ISP passes it to a regional ISP that connects different countries together;
* it then passes to an ISP in an overseas country, and finally the overseas ISP passes the message to the Netflix servers.

## `traceroute`

`traceroute` (UNIX) or `tracert` (Windows) is a network **diagnostic** **command** (a.k.a. system program) for displaying possible routes and measuring transit delays of packets across an Internet Protocol network.

To see the "hopping" with our own eyes, let's examine how many intermediate hops it takes for a packet to travel to different hosts: a website hosted in Singapore, a website hosted overseas, and your tablemate's computer.

1. Open your terminal
2. Traceroute to `gov.sg`:
    - Windows: `tracert gov.sg`
    - Mac: `traceroute gov.sg`

    > From the output, who do you think is hosting the `gov.sg` website?

3. Traceroute to `washingtonpost.com`

    > Do you notice that the first few hops are the same? Why do you think this is so?
    >
    > The first few addresses are all private addresses (more on that in the later section). The first line is your default gateway (the first router your packets reach); the subsequent lines are other intermediate devices in SUTD's network such as routers and firewalls.

4. Traceroute to your friend's device (find your friend's **local** address)

## Network Protocols

Hosts use **protocols** to talk to each other.

==A protocol is a formally defined set of rules governing the format and order of messages exchanged between two or more communicating entities, as well as the actions taken on the transmission or receipt of a message or event.==

There are **many** different protocols used on the Internet:

* HTTP for web browsing,
* RTMP for livestreaming,
* SMTP for email, to mention a few.

To ensure different applications can still work on the same established backbone that is the Internet (e.g. you can check email regardless of whether you are using WiFi or Ethernet), the **Internet Layering Model** exists to group protocols into distinct layers.

![Overview of the Internet](https://www.cs.utexas.edu/~mitra/csFall2015/cs329/lectures/fig2.gif)

A brief summary of the layers is as follows:

### Application Layer

The **highest** level, closest to the end user; it powers end user applications. Examples include `HTTP` for web browsing (implemented by the browser), `SMTP` for email (implemented by the mail server), and `DNS` for hostname resolution (implemented by the DNS server).

Defines the **messages** to be used in the protocol, for example a command that retrieves a user's current inbox.

### Transport Layer

This is **implemented** by the host OS.

Facilitates the transport of **arbitrary** **application** layer **messages** between two **sockets**, something like a *magic* portal you can send data into and read data from without knowing how the data gets there.

Encapsulates (wraps) application layer messages into **datagrams**. (Strictly speaking, TCP calls its units *segments* and UDP calls them datagrams; these notes use "datagram" for both.)

TCP and UDP are the major transport layer protocols in use, and now we have [QUIC](https://en.m.wikipedia.org/wiki/QUIC) as well.

> QUIC runs on top of UDP, but it is considered a transport layer protocol. It was adopted as a standard by the IETF in May 2021.

### Network Layer

Facilitates the transport of **arbitrary** **transport** layer **datagrams** between two addressable **hosts**, such as your phone and Netflix servers.

Encapsulates transport layer datagrams into **packets**.

The Internet Protocol (IP) version 4 is the dominant network layer protocol in use, with IPv6 slowly being introduced.

> Routers support up to this layer by default.

### Link Layer

Facilitates the transport of **arbitrary** **network** layer **packets** between two physical **interfaces**, such as the WiFi connection between your phone and your home router. Other Link Layer protocols include Ethernet (wired connections) and Bluetooth.

Encapsulates network layer packets into **frames**.

> Switches support up to this layer by default.

### Physical Layer

Facilitates the **conversion** between **digital** messages and a physical (analog) signal, such as **pulses** of light in a fiber optic cable or **changes** in frequency in radio waves.

Converts link layer frames into a stream of **bits** (1s and 0s) and vice versa.

### Layered Protocol

A protocol lives on each layer and **only needs to care about the layer directly below it**.

Take `HTTP` and web browsers like Google Chrome as an example.

* The Google Chrome developers **created** a web browser that loads web pages by writing code that sends messages through `TCP` sockets.
* Chrome **does not need to know** how many *intermediate* hops there are between you and the web server, nor whether you use **WiFi** or **Ethernet** to connect to your home router.
* The lower layers are handled by your device's **operating system** and device **drivers**.

The average developer usually works in the **application** and **transport** layers. In today's course we will cover a few key protocols and concepts:

- `TCP` / `UDP`
- `NAT`
- `DNS`

## TCP / UDP

TCP and UDP are the dominant transport layer protocols. Transport layer protocols provide developers with _sockets_:

> Recall: a socket is a magic door through which developers can send and receive data without worrying about how the data *actually* gets there. Sockets are entirely managed by our OS.

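
To make the "magic door" idea concrete, here is a minimal sketch in Python of two TCP sockets talking over the loopback interface. The function names `echo_once` and `roundtrip` are made up for this illustration, and the sketch assumes short, non-empty messages so that a single `recv` call is enough. Notice that the code never mentions routers, WiFi, or Ethernet: it only names an address and a port.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back whatever the client sent."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)  # assumes the whole (short) message arrives at once
        conn.sendall(data)


def roundtrip(message: bytes) -> bytes:
    """Send a non-empty message to a local echo server and return the reply."""
    # Port 0 asks the OS to pick any free port for the listening socket.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        host, port = server_sock.getsockname()
        worker = threading.Thread(target=echo_once, args=(server_sock,))
        worker.start()
        # The client only needs the server's address and port number.
        with socket.create_connection((host, port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(roundtrip(b"hello through the magic door"))
```

The server side is the part that listens on an address and port; the client side simply connects to that pair and starts writing bytes.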

To accept incoming data, a server **creates** a process that _listens_ on a certain address and port number. A host may be running **different** services that listen for TCP/UDP messages, hence the need for `port` numbers to distinguish between these services.

> It's similar to calling the same phone line and pressing a different **extension** number (1,2,3) to get redirected to a different department.

![](https://ipwithease.com/wp-content/uploads/2020/06/COMMON-TCP-IP-WELL-KNOWN-PORT-NUMBERS-TABLE.jpg)

Most **firewalls** let you allow or deny traffic based on:

1. The source/destination address,
2. The transport protocol, and
3. The port number.

For example, in the morning, in order for your **web** app to be **accessible** on EC2 you had to **modify** the security groups to allow **TCP** port `8080` from anywhere (`0.0.0.0/0`), because `HTTP` is built on top of TCP.

### TCP - Transmission Control Protocol

TCP is a transport layer protocol that **preserves** the **order** of datagrams and guarantees that each datagram is read from the receiving socket exactly once. It also has other features such as **flow control** and **congestion** control, which automatically limit the rate of data being sent if the receiving end has trouble reading the datagrams in time or the underlying network is congested.

![](https://notes.shichao.io/unp/figure_4.1.png)

### UDP - User Datagram Protocol

UDP is a _connectionless_ protocol that strips away all the bells and whistles that TCP offers. Unlike TCP, UDP is an **unreliable** transport protocol: it does not guarantee that datagrams will arrive in order, or even arrive _at all_.

However, it has **significantly** less overhead than TCP: no connection setup, no acknowledgements, and no retransmissions. It is thus used in applications where low latency is more important than receiving every datagram, such as **teleconferencing** applications that can **afford** to lose a few frames of real-time video.

![](https://i.stack.imgur.com/G5nUi.png)

## Monitoring network connections with `netstat`

1. Reopen your Cloud9 environment
2. Run your nodejs app, e.g. `node example.js`
3. In another terminal window, run `netstat`.
    * Look for sockets in the `LISTEN` / `LISTENING` state (on Linux, `netstat -tln` lists the listening TCP sockets).
4. Try to access your webapp at your EC2 instance's IP address.
5. Run `netstat` again:
    * Observe there is now an active TCP connection between the EC2 instance and some address. That address is actually your computer, which has connected to your EC2 instance.

Your web browser **opened** a TCP connection to your EC2 instance to **send** and **receive** HTTP messages.

## NAT

Back when the Internet Protocol was created decades ago, it was originally thought that it was **possible** to assign every connected device a unique IP address. However, as technology advanced, there came to be **MORE** Internet-capable devices than available IP addresses.

To overcome this issue, networks are split into:

1. Local Area Networks (LAN)
2. Wide Area Networks (WAN)

![](https://purple.ai/wp-content/uploads/2021/01/Lan-Vs-Wan.png)

### WAN

Wide Area Networks span a wide area (across countries, regions etc) as opposed to a LAN. In a WAN, host devices have **publicly routable** addresses that are (ideally) **globally unique**.

> The Internet Assigned Numbers Authority (IANA) allocates blocks of IP addresses to regional registries, which in turn assign them to ISPs, governments and companies worldwide.

Public routers in the WAN are capable of **forwarding** an IP packet to another host by looking up the destination address in their **route tables**.

### LAN

On the other hand, devices in LANs have **private** addresses. There are [three defined ranges](https://en.wikipedia.org/wiki/Private_network) of IP addresses designated for **private** use. Unlike publicly routable addresses, private use addresses are **not** unique (in the sense that there are other LANs in the world with the same IP address as yours, e.g: `192.168.1.1`) and **are valid only within the LAN**. The owner of the LAN can **freely** assign these private addresses to devices in the network **without** registering with the IANA.
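
If you want to check whether an address falls in one of the three private ranges, the sketch below uses Python's standard `ipaddress` module. The ranges listed are the three IPv4 blocks reserved for private use (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`); the helper name `is_private_lan_address` is just an illustrative choice.

```python
import ipaddress

# The three IPv4 blocks reserved for private use.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_lan_address(address: str) -> bool:
    """Return True if the address lies inside one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "172.16.0.1", "172.32.0.1", "1.1.1.1"):
        print(candidate, is_private_lan_address(candidate))
```

You can try it on the first few hops from your earlier `traceroute` output to confirm that they are private addresses.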

> For instance, my home's `192.168.1.1` will not be the same as your `192.168.1.1`.

LANs are widely deployed in **homes** and **offices** to save the cost of **purchasing public addresses**; instead, all the devices share the same public address.

### Replying to private addresses

The consequence is that if I, `192.168.1.1`, try to send a packet to some public host, e.g: `1.1.1.1`, my packet might be able to reach the host, but the host will not be able to send back a reply because my address is a **private** address.

> There might be thousands of devices in the world that have the same private address.

### NAT Details

To overcome this issue, NAT (**Network Address Translation**) is used. NAT **gateways** sit between the LAN and the WAN.

* A NAT gateway has **both** a private address (so other devices in the LAN can talk to it) and a **public** address (so that devices in the outside world can talk to it).
* Whenever a local device tries to send a packet to the outside world, it goes through the NAT gateway. The NAT gateway then **rewrites** the private source address in the outgoing packet, **replacing** it with its own **public** address, before sending it on to *another* public router.

For example, in your home network, **your WiFi router is also the gateway**:

* When your smartphone tries to access the Internet, its packets first go through your WiFi router, which **replaces** the source address with its WAN address (assigned by your ISP) before forwarding them to another public router.
* When the destination host receives the packet, it is able to send a **reply** back to the NAT gateway **because the NAT gateway has a publicly routable address**. The gateway then rewrites the **incoming** packet's destination address to the private address of the local device before finally forwarding it to the local device.
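
The rewriting described above can be sketched as a toy model in Python. This is a simplification, not how any particular router is implemented: the `Packet` and `NatGateway` names, the starting port `40000`, and the example addresses are all assumptions made for illustration. The sketch also tracks port numbers, which the description above leaves out; typical home gateways rewrite the source port as well so that they can tell apart replies meant for different local devices.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class Packet:
    src: Endpoint
    dst: Endpoint
    payload: bytes = b""


class NatGateway:
    """Toy NAT gateway: one public address shared by many private hosts."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._outgoing: Dict[Endpoint, int] = {}  # private endpoint -> public port
        self._incoming: Dict[int, Endpoint] = {}  # public port -> private endpoint

    def outbound(self, packet: Packet) -> Packet:
        """Rewrite the private source address to the gateway's public address."""
        if packet.src not in self._outgoing:
            self._outgoing[packet.src] = self._next_port
            self._incoming[self._next_port] = packet.src
            self._next_port += 1
        return replace(packet, src=(self.public_ip, self._outgoing[packet.src]))

    def inbound(self, packet: Packet) -> Optional[Packet]:
        """Rewrite the destination back to the private host, or drop the packet."""
        ip, port = packet.dst
        if ip != self.public_ip or port not in self._incoming:
            return None  # no local device is waiting for this packet
        return replace(packet, dst=self._incoming[port])


if __name__ == "__main__":
    gateway = NatGateway("203.0.113.5")
    request = Packet(src=("192.168.1.10", 51000), dst=("1.1.1.1", 443))
    sent = gateway.outbound(request)
    print("leaves the LAN as:", sent.src)
    reply = Packet(src=("1.1.1.1", 443), dst=sent.src)
    print("delivered to:", gateway.inbound(reply).dst)
```

The public host only ever sees the gateway's public address, which is why its reply can find its way back.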

![](https://www.firewall.cx/images/stories/nat-concept-1.gif)

### Seeing NAT in action at SUTD

Open [whoismyisp.org](https://whoismyisp.org) to find out your public address. Notice that you and I, when connected to the same network, have the same public IP address. However, we have different private IP addresses. You can find your private address using the command `ifconfig` (UNIX) or `ipconfig` (Windows):

![](https://i.imgur.com/wQoqK6t.png)

## DNS

==DNS (Domain Name System) is the Internet's **yellow** pages.==

Hostnames such as `google.com` are easy to remember, but the **IP stack** requires hosts to have an IP address such as `1.1.1.1`, not a name. DNS is thus commonly employed by other application-layer protocols (such as HTTP in your web browser) to translate user-supplied hostnames to IP addresses.

> For example, when you type `google.com/about` in your web browser, in order for the web browser to send an HTTP request message to Google's web server, the user's host must first obtain the IP address of the destination web server.

This is done as follows:

1. The user host (Windows, Mac etc) runs a DNS **client**
2. The browser extracts the hostname portion of the URL (`google.com`) and passes it to the DNS client
3. The DNS client sends a **query** containing the hostname to a DNS **server** (by default, this is your ISP's DNS server)
4. The DNS client eventually receives a reply, which includes the IP address for the hostname `google.com`, such as `203.0.113.10` (a placeholder address for illustration)
5. The browser is now ready to initiate a TCP connection to Google's web servers to send HTTP requests using the resolved address.

In practice, you'll need to know how to create DNS records in DNS servers for your company so that `www.mycompany.com` **points** to your company's web server or other services.

### Wireshark

Wireshark is a **powerful** tool used to capture packets sent over a network and analyse the content of the packets retrieved.
Please [install](https://www.wireshark.org) it from here. The file [dnsrealtrace.pcapng](https://drive.google.com/file/d/118Z03KnN7mNchsIs3G-DUdtf1zJV3NVI/view?usp=sharing) contains a trace of the packets sent and received when a web page is downloaded from a web server over the SUTD network.

> In the process of downloading the web page, DNS is used to find the IP address of the server.

Open `dnsrealtrace.pcapng` in Wireshark and you should see this interface:

![](https://natalieagus.github.io/50005/assets/images/nslab3/5.png)

Now answer the following questions.

1. Locate the DNS query and response messages. Are they sent over UDP or TCP?
2. What is the destination port for the DNS query message? What is the source port of the DNS response message?
3. What is the IP address to which the DNS query message was sent? Run one of these commands to determine the IPv4 address of your local DNS server. Are these two addresses the same?
    * `scutil --dns` (**macOS**), or
    * `cat /etc/resolv.conf |grep -i '^nameserver'|head -n1|cut -d ' ' -f2` (**Ubuntu**), or
    * `ipconfig /all` (**Windows**, look under the DNS Servers field)
4. Examine the second DNS **query** message in the Wireshark capture. What type of DNS query is it?
5. Locate a TCP SYN packet sent by your host subsequent to the above (second) DNS response.

    > This packet opens a TCP connection between your host and the web server.

    Does the destination IP address of the SYN packet correspond to any of the IP addresses provided in the DNS response message?

Now you can use Wireshark to capture packets for analysis. Once the program is launched, select the network interface to capture and click on the shark fin icon at the top left of the application, right under the menu bar, to begin capturing packets. If you click on each packet, you can see each layer's header and the application layer payload.

![](https://natalieagus.github.io/50005/assets/images/nslab3/6.png)

To explore the interface, specify the network interface (e.g. eth0, wlan) in the capture options.

* There are display filters to analyse the packets.
    * Protocols: TCP, UDP, ARP, SMTP, etc.
    * Protocol fields: port, source address, length, etc. (e.g. `ip.src == 192.168.1.1`)

For more detailed instructions on Wireshark, refer to its [official homepage](https://www.wireshark.org/).

### AWS Route53

Route53 is a **managed DNS service and registrar**.

### DNS Registrar

A DNS Registrar is a company that you can **pay** to **register** a domain name such as `mycompany.com` that will be globally recognised as belonging to you. Once you **own** the domain name, you are **free** to create as many subdomains as you want under the domain; just like how the Government has `gov.sg` as the main domain and many websites such as `vaccine.gov.sg` and `tokengowhere.gov.sg`.

### Managed DNS Service

A managed DNS service is a service where you only need to update the **DNS records**, while the **menial** tasks like **installing** a DNS server and keeping it online are done by the provider (Amazon).

For today's course, we are using subdomains under a domain name purchased by the Instructor, `sutdacademytools.net`. Each student will temporarily have their own domain name `studentname.sutdacademytools.net` that can be **managed** in Route53 of your AWS account. We will practise creating records in the Route53 console.

> **Note:** The Instructor will temporarily grant you access to the Route53 console for this exercise. Once registered, the access will be revoked.

### Part 1: Create a hosted zone

In the Route53 console, create a **new public hosted zone** with the following name:

```
studentname.sutdacademytools.net
```

You will see the domains of 4 nameservers, such as `xyz.awsdns.net`.

> **Copy these addresses somewhere.**

### Part 2: A Record

In the morning, you used nodejs to create a simple web server. Now let's see how to make it accessible with a domain name of your choice.

First, **find** the IP address of your EC2 instance.
Next, in the Route53 console, **create a new record**:

- Name: www
- Value: Your IP address
- Type: A
- TTL: Default (300)

### Part 3: TTL

The TTL option that we left as default previously stands for "Time To Live". It tells the DNS **client** how long, in seconds, it should cache (remember) the result of a DNS query. The reasons for setting a finite caching period are:

1. DNS queries are (relatively) slow, so caching avoids repeating them,
2. IP addresses of hostnames are bound to change, so cached results must eventually expire.

If you have a very high TTL, your clients will make **fewer** DNS queries, but if you ever want to **change** the value of your DNS records (such as changing the IP address), your clients will be **stuck with outdated information until the TTL is over**. For this reason, a good starting point for TTL is 5 minutes (300 seconds).

### Part 4: Create the NS records in the parent hosted zone

Finally, we need to create NS records in the **parent** hosted zone `sutdacademytools.net`. This basically tells clients: *if you want to resolve domain names under `studentname.sutdacademytools.net`, please contact these nameservers*.

Open the `sutdacademytools.net` hosted zone and create an `NS` record for your domain name with the 4 nameservers you copied in Part 1.

### Test with `dig`

You can now `dig` your hostname to check that it resolves to your EC2 instance's IP address over the Internet:

![](https://i.imgur.com/rbhnZQM.png)

When successful, you can try accessing your sample webserver using your newly created hostname instead of the public IP address, e.g: `http://studentname.sutdacademytools.net:8080`.

### Site Hosting with NGINX

[Nginx](https://www.nginx.com) is a web server that can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache.

Suppose you have set your domain name and now your site is reachable at: `http://studentname.sutdacademytools.net:8080`. Obviously, you wouldn't want people to have to type your **port number**.
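
To see why the port number can be left out of a URL at all, here is a small sketch in Python of how a client might decide which port to connect to. The `DEFAULT_PORTS` table and the `effective_port` helper are illustrative names, not part of any library; `urlsplit` comes from the standard `urllib.parse` module. The sketch only knows about `http` and `https`, so other schemes raise a `KeyError`.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port in the URL always wins
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the scheme default


if __name__ == "__main__":
    print(effective_port("http://studentname.sutdacademytools.net:8080"))
    print(effective_port("http://studentname.sutdacademytools.net/http-serverjs"))
```

So if something answers on the default port, visitors never need to type a port number.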

Port **80** is the default port assigned to the Hypertext Transfer Protocol (`http`), so browsers use it whenever a URL does not specify a port. It is common for a webserver to **host multiple sites**, and normally only one process can listen on a given port at a time, so we can't just run our website directly on port 80. In addition:

* Ports below 1024 can be opened only by **root**.
* Running your webserver, e.g: `node server.js`, as root every time is not good practice (imagine the security issues).

As such, we can use a **reverse proxy** like NGINX. Firstly, install it:

```bash
sudo apt-get update
sudo apt-get install nginx
```

Then, run the http server first (assume at port `8080`), e.g (using `nohup` so that it keeps running after you log out, with any logged output saved to `nohup.out`):

```bash
nohup node index.js &
```

After `nginx` is installed, you need to configure it:

```bash
sudo nano /etc/nginx/sites-available/default
```

Paste the following and save (ctrl+x, save):

```
server {
    listen 80;
    # Replace with your own hostname, e.g. studentname.sutdacademytools.net
    server_name natalieagus.net www.natalieagus.net;

    # Forward requests for /http-serverjs to the nodejs app on port 8080
    location /http-serverjs {
        proxy_pass http://localhost:8080;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```

Verify that the config is valid (`sudo nginx -t` performs the same check):

```bash
sudo service nginx configtest
```

Then start it (if nginx is already running, use `restart` instead of `start` so that the new configuration is loaded):

```bash
sudo service nginx start
```

In your EC2 instance, we have already set an inbound rule to allow HTTP connections:

![](https://i.imgur.com/StlzaiX.png)

You can now test reaching your site using the URL `http://studentname.sutdacademytools.net/http-serverjs` in your web browser, or curl it:

```bash
curl -v http://studentname.sutdacademytools.net/http-serverjs
```

Here's an example output from the instructor's personal domain name:

![](https://i.imgur.com/o2zeHKS.png)

# Summary

Today we have learned the following:

- [x] Basics of Computer Networks: internet protocol stack, LAN, WAN, NAT
- [x] A few useful internet tools: traceroute, dig, wireshark, netstat
- [x] Understanding how DNS works
- [x] Hands on with AWS Route 53
- [x] Hosting a simple web server with a reverse proxy

In the next session, we will expand our knowledge about Internet Security.