---
tags: devtools2021
---
# Fundamentals of Operating System (Day 1 - Afternoon Session)
We must **choose** a suitable Operating System and hardware to successfully deploy our server `html.js` or `server.js`. Surely we can deploy it on our personal computer, but that's not an ideal machine for long-running servers (unreliable, lacking proper security measures, not to mention **expensive** to maintain).
As such, we typically rely on existing services such as AWS EC2, Google Compute Engine, DigitalOcean Droplets, and Azure VMs, among many others. All of these are web services that provide secure, resizable compute capacity in the cloud. They're **ideal** for running our server code reliably.
## Amazon Web Services setup (50 mins)
For the purpose of this course, we will be using AWS. We have **created** an account for each of you. Log in to your AWS account and go to [EC2 homepage](http://console.aws.amazon.com/ec2/)
Then, choose a region of your choice. **Remember this region** because that's where your instance is hosted. In this example, we use "Ohio".
![](https://i.imgur.com/WyBv75y.png)
Right now we do not have any instances yet, so let's create one. Click on **Launch Instances**, and select `Ubuntu Server 20.04 LTS` for the **Operating System** (also known as machine image) option:
![](https://i.imgur.com/ocGFZNT.png)
For the instance type, select `t2.micro`. It is **free tier** eligible. The *instance type* simply defines the hardware capacity of our computer.
![](https://i.imgur.com/JBEQEQ2.png)
Click "Review and Launch" to immediately launch the Instance. We can do other settings later. You will see this page:
![](https://i.imgur.com/EizoQbK.png)
When you click "Launch", you will be prompted with key-pair generation. Select create new pair and give it a name, then download the `.pem` file.
![](https://i.imgur.com/l3q0Hvc.png)
This is important if you want to remotely access your AWS EC2 instance via `ssh`. We will do this later. For now, you will see that you have one instance in the dashboard:
![](https://i.imgur.com/tJ9xZdK.png)
Click on the instance and then click the **connect** button on the next page:
![](https://i.imgur.com/BH54epI.png)
### Connect to EC2 using InstanceConnect
AWS EC2 supports direct access to your instance via the web browser.
![](https://i.imgur.com/JyUjm9w.png)
Clicking on "connect" opens a new tab that shows the command prompt of your newly made instance. From here, you can type various commands and use the computer as per normal.
![](https://i.imgur.com/S9BhEke.png)
**There's no GUI here**, we are simply accessing the Operating System **services** via the **command line interface** (CLI). We have been doing quite a bit of that earlier to run `node` and various `git` commands, but we will learn more about the CLI and OS real soon.
### Connect to EC2 using SSH client
If you wish to use your own SSH client, then you can follow these steps:
1. Navigate to the private key downloaded earlier, and make it readable only by you (the owner) using the command `chmod 400 <filename>.pem`. `ssh` refuses to use a private key file that other users can access.
2. Execute the `ssh` command to connect to your instance. The details can be found in the SSH Client tab:
![](https://i.imgur.com/umOO1di.png)
The output should look similar to that of EC2InstanceConnect:
![](https://i.imgur.com/p3Ou2sr.png)
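To see what the `chmod 400` in step 1 does, here is a minimal sketch that runs on a throwaway file instead of your real key. The file name `demo-key.pem` is made up for illustration, and a POSIX-style shell (bash or sh) is assumed.

```shell
# Create a throwaway file standing in for the downloaded .pem key (hypothetical name).
workdir=$(mktemp -d)
keyfile="$workdir/demo-key.pem"
echo "not a real key" > "$keyfile"

# 400 = the owner may read; nobody may write or execute.
chmod 400 "$keyfile"

# The first 10 characters of `ls -l` are the file type and permission bits.
perms=$(ls -l "$keyfile" | cut -c1-10)
echo "$perms"    # -r--------

rm -rf "$workdir"
```

The single `r` means the file is read-only for its owner, and the dashes mean that the group and everyone else have no access at all.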
### Connect to EC2 using Cloud9
Another convenient and recommended way to connect to your EC2 instance is to use AWS Cloud9. Cloud9 is a free web IDE that allows you to connect and access your EC2 instances. Select Cloud9 services from the "Services" tab of your AWS console:
![](https://i.imgur.com/Tvf5xEg.png)
Afterwards, click on *create environment*:
![](https://i.imgur.com/zA8OIRm.png)
Give your environment a nice name:
![](https://i.imgur.com/19UOBj2.png)
Configure the environment using your EC2 username (which will be `ubuntu` by default) and hostname:
![](https://i.imgur.com/ZGfjYKh.png)
The EC2 hostname can be found in the EC2 dashboard we saw earlier
![](https://i.imgur.com/E0Spphb.png)
**Click "Copy key to clipboard" at Cloud9 environment setting**, and head to your EC2 InstanceConnect terminal or your own terminal (that has SSH access to your EC2 instance from earlier) and paste in the following commands in succession:
```shell=
sudo apt-get update
sudo apt-get -y upgrade
echo <paste the copied key> >> ~/.ssh/authorized_keys
sudo apt-get install -y nodejs
```
The first two commands update the Ubuntu OS in your EC2. The third command adds the Cloud9 public key to your EC2 `~/.ssh/authorized_keys` file. The complete command will look something like this:
```
echo ssh-rsa 5QGOtWtUJqiBnZQb0ZqPXzgCmd0JgH0H/pSZSr2qFqUjHgYC/yknqCQ6O0WhmeK9ond8vR40zk4aZKnuvE+ZTZDQhMtQSuBcyQPzAqLmXmWAHF5vPAsdLmCXVRTAsQ6aYeDom8viwiZ1yHTuWWAuxeNm7SQTIxho0KSjwRc3OO1gIU6UhFK/jpWZ8X+vEppMRKYMiB1+OTUbrcL9UPUizXp5GIkNO7bgqRtvIRzRriTkiXg+zOSIEcg7KkXXPhoeLUHcAUGpqKqesGTy9jJars8hRAczRaynDL5lHiiL3Ah4UXAvl8bO+WosohPai/4nZ6TX45yOMWEIKidCGal8VGjwnEmsWMmWduwquXey4T6tcGK7cLmM9hPymdwyxgZMTv05Cop2d3XemeH/NNu9BNWucvoZjbCieKih2wcqqxYdwxNFhKzc+TydJGa7ALrS+BznJMwb+HrfDpVzsWBFAwOUseC1vgVKOCpkeriDp+PVMwjly0ABwtJkww5/73zmVewet7eN+EAYPusd9SJAF+CbevRb18dhFxUSaFFetKBbcns9hzBZNR6S7GQ8vATnsrhj9Qzd/krWmFmsX+mMRqzSDng74WVnwBtFWM253ZEdCcRl8RFY0ia0ff0upJ++dwrNCm9k/Y/U+Km9fuYaC9NX12MNB4vwdQ== nat+581994641164@cloud9.amazon.com >> ~/.ssh/authorized_keys
```
You can read the file by typing the command:
```
cat ~/.ssh/authorized_keys
```
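Note that the third command uses `>>` (append) rather than `>` (overwrite). A small sketch of the difference, run on a scratch file rather than the real `~/.ssh/authorized_keys`; the two key strings are placeholders, not real keys:

```shell
# Work on a scratch file, not the real ~/.ssh/authorized_keys.
workdir=$(mktemp -d)
keys="$workdir/authorized_keys"

echo "ssh-rsa AAAA-first-key user@laptop" > "$keys"     # pretend this key already existed
echo "ssh-rsa AAAA-second-key cloud9" >> "$keys"        # >> adds a new line at the end

count=$(wc -l < "$keys" | tr -d ' ')
echo "$count"    # 2
cat "$keys"

rm -rf "$workdir"
```

Had the second `echo` used a single `>`, the file would contain only the Cloud9 key, and the key you use for your own SSH access would be lost.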
Finally, the fourth command installs `node` on the EC2 instance. Cloud9 requires `node` to run. You might see a prompt like the one below asking to restart some outdated daemons. Simply select all of them (navigate with the arrow keys and press space to select), then tab to `OK` to restart.
![](https://i.imgur.com/vRiHXCl.png)
Once you're done, head back to the Cloud9 dashboard and click on "next step":
![](https://i.imgur.com/yVdx4NE.png)
This will bring you to the next page where you can simply click "Create Environment". You will then be prompted with this:
![](https://i.imgur.com/CPa5Kc3.png)
You can click "next" to allow Cloud9 to install automatically, but that's rather a hassle and you might be met with unexpected errors. It's best to install this manually.
> For now, just ignore this and head back to your EC2 SSH console.
Right click on the C9 install link and **get the link address**: https://d1q2hgnv37wylw.cloudfront.net/static/c9-install.sh
> If it's still the same as the above, copy the commands below to continue. Otherwise, replace the link for `wget` with the new link. We never know when they will update their CDN.
Head back to your EC2 terminal in InstanceConnect or your own SSH client. Type the following commands in succession:
```shell=
wget https://d1q2hgnv37wylw.cloudfront.net/static/c9-install.sh
chmod a+x c9-install.sh
sudo apt-get -y install python2
sudo apt-get -y install build-essential
./c9-install.sh
```
The first command downloads the shell script `c9-install.sh` from the cloudfront website. We then change the **file permission** of this script into **executable**. The next two commands install the necessary libraries (Python and some basic utilities -- generally includes the GCC/g++ compilers and libraries). The last command executes the script to install C9 into your EC2.
Wait for a few minutes and once it is done, return to your Cloud9 webpage and click **refresh** to **relaunch** the IDE. You can also find your environments in the Cloud9 dashboard:
![](https://i.imgur.com/8QRSZT3.png)
The IDE looks like this:
![](https://i.imgur.com/RTEbQSH.png)
If you're faced with outdated package warning e.g: `tmux`, just click `Update`:
![](https://i.imgur.com/SRX7FZU.png)
Right now, you only have **one file** in your Root directory, which is the `c9-install.sh` script you downloaded earlier using `wget`. With this, you're all connected with a convenient editor GUI to traverse and manipulate the EC2 filesystem.
One last thing to do is to **run** the code you pushed at the remote github repository in this EC2.
## Exercise: Run the code in your EC2 instance (5 mins)
You can `git clone <repository-url>` in your EC2 instance. You should then see the repository in the C9 IDE and be able to run the server `html.js`:
![](https://i.imgur.com/NTuhIn7.png)
But wait! How do we access it? We can't just open http://0.0.0.0:8000 in our browser because the server is now hosted on another machine (our EC2).
> In the context of servers, 0.0.0.0 means *all IPv4 addresses on the local machine*. For instance, if a host is assigned two IP addresses, 192.168.1.1 and 10.1.2.21, and a server running on the host is configured to listen on 0.0.0.0, it will be reachable at both of those IP addresses. Note that this is common to hosts that have more than one *[network interface](https://goinbigdata.com/demystifying-ifconfig-and-network-interfaces-in-linux/)*, as they have one Internet address for each interface.
This means we must find out the **PUBLIC IP address** of our EC2 instance. It is somewhere in your *instance*'s EC2 dashboard (IPv4). **Find it**.
Afterwards, construct the url: `http://<your-EC2-public-IPv4>:8000`. We need to also tell our EC2 to **accept** incoming traffic. Go to the **security** tab in your instance's EC2 dashboard, and click on the security group entry highlighted in blue:
![](https://i.imgur.com/5lXloxU.png)
Click on "edit inbound rules" in the new page:
![](https://i.imgur.com/jWpSRF9.png)
Then add the new rule to allow **custom TCP** connection from any IPv4 address (so the public can access this server) and give it a nice description:
![](https://i.imgur.com/jJOoyi2.png)
After saving the inbound rules, try the url to your server in your computer's browser. You should see the nice webpage as before, but this time round it is hosted from your EC2:
![](https://i.imgur.com/NODbsyc.png)
## Exercise: make changes to your file and pull from the EC2 instance (5 mins)
Open `index.html` in your local repository and make some small changes to it:
```htmlmixed=
<body>
    <div class="center">
        <h1>Hello Again!</h1>
        <p>This is served from a file</p>
        <p>Have a nice Day!</p>
    </div>
</body>
```
Save it and `push` to the remote repository (of course it's implied that you need to `add` then `commit`). Afterwards, attempt to `pull` it on the EC2 instance, and re-serve the webpage. After you refresh the site's URL in your web browser, you should see this new text instead:
![](https://i.imgur.com/wNXH0Mw.png)
Similarly, if you want to `push` directly from the EC2 instance, you need to generate a new **personal access token** and save it in the EC2's github login credentials.
Find out how to do this (10 mins). You can review the previous notes for hints.
## Possible error: EADDRINUSE
If you're met with certain errors like this when running node for the second time:
![](https://i.imgur.com/g0OMZk8.png)
..it means that there's a process that is already using that exact port 8000. We need to find that **process** and **kill** it manually. First, type the command:
`ps aux | grep node`. This lists out all processes with the name "node":
![](https://i.imgur.com/88krzhB.png)
The first entry indicates that there's a `node` process that still runs `html.js`. We need to kill that process with the `kill` command: `kill -9 <pid>`, where `<pid>` is the process ID number (51934 in this case). Afterwards, you can run `node html.js` again.
> Always press ctrl+c to kill the `node` process properly.
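The find-then-kill routine above can be rehearsed safely with a harmless stand-in process. In this sketch `sleep 300` plays the role of the forgotten `node html.js`. It assumes a POSIX-style shell, and the variable names are only for illustration.

```shell
# `sleep 300` stands in for a forgotten `node html.js` that still holds the port.
sleep 300 &
pid=$!                      # $! is the PID of the most recent background process
echo "started process $pid"

# `kill -0` sends no signal; it only checks whether the PID exists.
if kill -0 "$pid" 2>/dev/null; then alive_before=yes; else alive_before=no; fi

kill -9 "$pid"                      # same command as above, with the real PID filled in
wait "$pid" 2>/dev/null || true     # let the shell collect the terminated child

if kill -0 "$pid" 2>/dev/null; then alive_after=yes; else alive_after=no; fi
echo "alive before: $alive_before, alive after: $alive_after"
```

With a real server you would not have `$!` at hand, which is why the PID has to be looked up with `ps aux | grep node` first.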
## Introduction to Operating System (60 mins)
Now that we have successfully "deployed" our web server code onto the EC2 instance, it is time to understand a little bit about what's going on under the hood. In particular, we want to shed some light on these questions:
1. What is the command-line-interface? What does it have to do with the Operating System? Can I use the same commands on different OS?
2. We needed to install git earlier on our machines, why don't we need to do the same in the EC2?
3. How does my web browser get the HTML file hosted at the EC2?
4. What is the deal with that process ID?
5. What is that number 8000?
Let's begin with the Basics of Operating System.
### The Operating System
An operating system is a **special** program that acts as an **intermediary** between **users** of the computer and the computer **hardware**.
> The operating system is part of the computer system and is analogous to a government.
The **goal** of an operating system is such that we have a dedicated program to fulfil the following essential roles:
* Resource allocator and coordinator: controls hardware and input/output requests, manages conflicting requests, manages interrupts
* Controls program execution:
    * Storage hierarchy manager
    * Process manager
* Limits program execution and ensures security: prevents illegal access to the hardware or improper usage of the hardware

Once we have an operating system, it becomes **easier** for users to run programs, or to write new programs for other purposes, **within** a computer system.
There are a lot of things that make up an operating system, but they are generally divided into three categories:
- [x] The Kernel
- [x] System programs
- [x] User programs
![](https://i.imgur.com/9pfIhTW.png)
The OS provides an **environment** such that **user programs** such as the text editor, web browser, compiler, database system, music player, video editor, etc can do **useful** work.
> Since each user program runs in a **virtual** machine (i.e: it is written in a manner that the entire machine belongs to itself), there has to be some sort of another **program** that manages and oversees all programs that live on the RAM and reside on disk, as well as managing the memory hierarchy.
This special program, which is part of the operating system, is called the **kernel**. It provides essential **services**, such as interprocess communication and file system management.
> There are a whole lot of other details about Operating System, OS services, and Kernel that are omitted for this course. It is recommended for you to do further reading with [this book](https://www.os-book.com/OS10/) if you're interested.
### The Kernel
The kernel is software that forms the **heart** of an operating system. Its size varies greatly depending on the architecture, for [example](https://en.wikipedia.org/wiki/Comparison_of_operating_system_kernels),
* ==Ubuntu OS **(our EC2 OS)**==
    * Size: about 2.7GB,
    * Kernel: Linux, size about 70MB
* Windows OS
    * Size: about 20-45GB,
    * Kernel: Windows NT, size varies hugely depending on architecture
* macOS
    * Size: about 15 GB (Big Sur)
    * Kernel: [XNU](https://en.wikipedia.org/wiki/XNU), size varies hugely depending on version and extensions
* [Oracle Solaris](https://en.wikipedia.org/wiki/Oracle_Solaris)
    * Size: 2-4GB
    * Kernel: Solaris Kernel, size varies depending on version
    * Used and actively developed in specialized/high-end

One of the most famous kernels, used by many OSes, is the Linux kernel. A few examples of [Linux *distributions*](https://en.wikipedia.org/wiki/Linux_distribution) (an operating system made from a software collection that is based upon the Linux kernel) are Ubuntu, Debian, Fedora, Android, and Chrome OS among many others.
See for yourself the code for the Linux Kernel originally developed by Linus Torvalds [here](https://github.com/torvalds/linux).
> Fun fact: Torvalds also initially developed [Git](https://en.wikipedia.org/wiki/Git) to maintain the development of the Linux kernel. You can find the source code [here](https://github.com/git/git).
## Operating System Services
### Operating System User Interface
The OS interface gives users **convenient** access to various OS services. They are programs that can execute specialised commands and help users perform appropriate system calls in order to navigate and utilise the computer system.
There are two general ways for users to conveniently access OS services:
* Using Graphical User Interface (GUI)
* Using Command Line Interface (CLI)
#### The OS GUI
The OS GUI is what we usually call our “home screen” or “desktop”. It characterises the **feel** and **look** of an operating system. We use our mouse and keyboard everyday to interact with the OS GUI and make various system calls:
* Opening or closing an app
* File creation or deletion
* Get attached device input or output
* Install new programs, etc
#### The Command Line Interface
The OS CLI is what we usually know as the “**terminal**” or “**command line**”.
A command-line interface (CLI) is a means of **interacting** with a computer program where the user issues successive commands to the program in the form of **text**. The program which handles this interface feature is called a command-line interpreter.
#### Command Line Interpreter
The programs that act as interpreters of these commands are known as **shells**. A typical OS might come with multiple command line interpreters. A user may choose among several different shells, including the Bourne shell, C-shell, Bourne-Again shell, Korn shell, and others. Users typically interact with a shell via a **terminal emulator**, or by writing a shell script that contains a sequence of commands to be executed.
> A terminal emulator is a text-based user interface (UI) to provide easy access for the users to issue commands. Examples of terminal emulators that we may have encountered before are [iTerm](https://iterm2.com), MacOS terminal, Terminus, and Windows Terminal.
Common shells that we may have encountered:
* Bourne-Again shell (bash): written as part of the GNU Project to provide a superset of Bourne Shell functionality. It is the default interactive shell for most Linux distros, and was the default on macOS before 10.15 Catalina.
![](https://i.imgur.com/7aq51oh.png)
* Z shell (zsh): a relatively modern shell that is largely compatible with bash. It has been the default shell in macOS since 10.15 Catalina.
![](https://i.imgur.com/dLWw1Lb.png)
* PowerShell – An object-oriented shell developed originally for Windows OS and now available to macOS and Linux.
![](https://i.imgur.com/oQuDgNc.jpg)
### UNIX-like Operating System
The **commands** that are valid on macOS might not be valid on Windows. For instance, the command `ls` can be used in macOS to list the items in the current **directory**, but the same command won't work in the Windows Command Prompt. The command `dir` must be used instead.
![](https://i.imgur.com/wWFXGNu.png)
For practical purposes, there has to be some coherence between various OSes, such that they at least conform to the same set of commands. This is how the family of *UNIX-like operating systems* came about.
> A Unix-like (sometimes referred to as UN\*X or \*nix) operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification. **A Unix-like application is one that behaves like the corresponding Unix command or shell**. There is **no** standard for defining the term, and some difference of opinion is possible as to the degree to which a given operating system or application is "Unix-like".
#### POSIX
Unix was selected as the basis for a standard system interface partly because it was "*manufacturer-neutral*" (it is not tied to any single vendor, so users can pick the technology best suited to their needs at any time). However, several major versions of Unix **existed**, so there was a **need** to develop a **common-denominator system**.
The standard to which these UNIX OSes conform is called the **POSIX** standard.
> The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining **compatibility** between operating systems.
**A brief history:**
The POSIX specifications for Unix-like operating systems originally consisted of a single document for the core programming interface, but eventually grew to many separated documents. The standardized user command line and scripting interface were based on the UNIX System V shell.
Many user-level programs, services, and utilities (including `awk`, `echo`, `ed`) were also standardized, along with required program-level services (including basic I/O: file, terminal, and network). POSIX also defines a standard threading library API which is supported by most modern operating systems. In 2008, most parts of POSIX were combined into a **single standard**.
> POSIX defines both the system- and user-level application programming interfaces (API), along with command line shells and utility interfaces, for software compatibility (portability) with variants of Unix and other operating systems. POSIX is also a trademark of the IEEE, intended to be used by both application and system developers.
Ubuntu, macOS, and Oracle Solaris are all UNIX-like and largely POSIX compliant. Therefore similar sets of UNIX commands can be used to access the services of all these OSes. Windows, however, is not UNIX-like, so it has an entirely different set of commands.
Here's a list of common UNIX commands. We have seen some of them before:
![](https://i.imgur.com/NN3eYoi.jpg)
And here's a list of common PowerShell commands:
![](https://i.imgur.com/YDvJlZI.png)
### Commands
Through the CLI, we can conveniently type out commands such as `git`, `echo`, `node`, etc. Some commands are "default" (they come with the OS without the need to install anything else), and some require installation (like `git`).
So how exactly do these commands work? By now we know that the commands that can be interpreted depend heavily on the OS type (UNIX-like, or not). The shell primarily interprets a command from the user and **executes** it. There are two ways to implement commands:
* **Built-in**: the shell itself contains the code to execute the command.
* **System programs**: command interpreter does not understand the command in any way; it merely uses the command to identify a file to be loaded into memory and be executed. This is used by UNIX, among other operating systems.
> You can type `echo $PATH` in your terminal to find out where these system programs may be located.
![](https://i.imgur.com/fA5FjN7.jpg)
In short, when we **install** something new, like `python`, its **binary** (executable) is placed in one of these paths, e.g. `/usr/bin/python`.
- When we type the command `python file.py`, our shell attempts to find a program named `python` (the name must match exactly) in each of the paths shown in the output of `echo $PATH`
- If `python` exists in any of these paths, the shell will **execute** that program with the argument `file.py`
- If not, a `command not found` error will be shown
![](https://i.imgur.com/viyyghi.png)
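The lookup described above can be reproduced by hand. The sketch below assumes a bash or sh shell (zsh splits variables differently) and uses `ls` as the command to search for, since it should exist on any UNIX-like system. The name `no-such-command-hopefully` is deliberately made up.

```shell
# Ask the shell where it would find a command.
found=$(command -v ls)
echo "ls resolves to: $found"

# Do the same search by hand: split PATH on ':' and test each directory in order.
manual=""
old_ifs=$IFS
IFS=:
for dir in $PATH; do
    if [ -z "$manual" ] && [ -x "$dir/ls" ]; then
        manual="$dir/ls"
    fi
done
IFS=$old_ifs
echo "first match by hand: $manual"

# A name that is in none of the PATH directories cannot be run (exit status 127).
status=0
no-such-command-hopefully 2>/dev/null || status=$?
echo "exit status: $status"
```

Exit status 127 is the shell's way of reporting `command not found` to scripts.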
==[Here's a very handy and useful website](https://explainshell.com) to learn what a particular command does. Simply paste a command such as `git log --graph --all --oneline --decorate` into it and observe some magic in action:==
![](https://i.imgur.com/jC2ikRV.png)
#### Environment Variable
What if we want the shell to search in other locations? We can modify the `PATH` environment variable in a file in our home directory called `.zshenv` (if we use zsh) or `.bash_profile` (if we use bash).
![](https://i.imgur.com/XIgUPiz.png)
For instance, suppose we write and compile a C program that prints "Hello World" as follows:
![](https://i.imgur.com/dN8vADn.png)
We tell zsh to execute the program from the *current directory*, signified by the "dot" in `./nat-cprog`, thus printing `Hello, World!`. Attempting to execute the program by name alone results in a `command not found` error, because zsh only searches the directories listed in `PATH`.
We can then add this current `~/Desktop/SUTD Acad` path to the `PATH` environment variable, and this allows zsh to execute the program *as a command*:
![](https://i.imgur.com/lr2YjLh.png)
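The same experiment can be sketched without a C compiler. Here a two-line shell script plays the role of the compiled program. The name `hello-demo` and the scratch directory are made up for illustration, and a bash or sh shell is assumed.

```shell
# Scratch directory containing a tiny executable (hypothetical names).
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "Hello, World!"\n' > "$bindir/hello-demo"
chmod a+x "$bindir/hello-demo"

# The directory is not on PATH yet, so the bare name cannot be resolved.
if command -v hello-demo >/dev/null 2>&1; then before=found; else before=missing; fi

# Append the directory to PATH for this shell session and its child processes.
export PATH="$PATH:$bindir"

after=$(hello-demo)
echo "before: $before"
echo "after:  $after"
```

The `export` only lasts for the current shell session. Placing that line in `.zshenv` or `.bash_profile` makes the change apply to every new session.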
> We will learn more about common commands and shell *scripting* in Day2. Shell scripting is a great way to **automate** repetitive processes, such as creating or deleting thousands of files daily.
### Processes
There are **hundreds** of processes running in our computer at any given time. You can type the command `ps -ef` to list all running processes in the system.
![](https://i.imgur.com/ujlzUAn.png)
A process is formally defined as **a program in execution**. A program is *not* a process.
* A process is an active, dynamic entity -- i.e: it changes state over time during execution, while a program is a *passive*, static entity.
* If we open Microsoft Word *twice*, we have **two** MS Word *processes*, but a single MS Word *program*.
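The "one program, two processes" idea can be checked from the shell. This sketch starts the `sleep` program twice as a stand-in for opening MS Word twice; it assumes a POSIX-style shell.

```shell
# Start the same program (`sleep`) twice in the background.
sleep 30 &
pid1=$!
sleep 30 &
pid2=$!

echo "first process:  $pid1"
echo "second process: $pid2"

# One program on disk, but two separate processes with their own IDs.
if [ "$pid1" != "$pid2" ]; then distinct=yes; else distinct=no; fi
echo "distinct PIDs: $distinct"

# Clean up both processes.
kill "$pid1" "$pid2"
wait 2>/dev/null || true
```

Each run prints different numbers, because the kernel hands out a process ID whenever a new process is created.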
A process couples **two abstractions**: **concurrency** and **protection**. Each process runs in a *different address space* and sees itself as running in a **virtual** machine -- unaware of the presence of other processes in the machine.
* The execution of multiple processes in a single machine is concurrent, managed by the scheduler, which is part of the kernel code.
* Processes that wish to **communicate** with each other must utilise the Kernel services as well.
==**Important**: Each process is given a `pid` (process ID). We can give commands to our Kernel via the CLI to `kill` any existing process using the `pid`. This is extremely handy to do if we need to kill a process that *hangs* or takes too much resources during runtime.==
## OS Service: Interprocess Communication with Message Passing
Every message passed back and forth between writer and reader (server and client) must be done using kernel’s help. One of the ways is via *message passing* (the other is using *shared memory*).
*A **socket** is one of the message passing interfaces.*
A socket is one endpoint of a two-way communication link between two programs running on the network:
* It is a concatenation of an IP address, e.g: 127.0.0.1 for localhost
* ...plus a TCP (connection-oriented) or UDP (connectionless) *port*, e.g: 8080.
> We will learn more about UDP and TCP as network communication protocols in the later part of today.
* When concatenated together, they form a socket, e.g: 127.0.0.1:8080
* All socket connections between two communicating processes must be unique
> This is why we cannot run our `server.js` twice: both instances would try to use the same socket.

In short, a **socket** can be used for two processes in the same machine to communicate locally, or for two processes in *different* machines to communicate (over the internet):
![](https://i.imgur.com/RiBTp8P.png)
### Port
A port is a virtual point where network connections **start** and **end**. Ports are software-based and **managed** by a computer's operating system. Each port is associated with a specific process that *listens* to it.
> Of course not all processes are assigned a port, only those that request one, like our webserver.
Ports allow computers to easily differentiate between different kinds of **traffic**: emails go to a different port than webpages, for instance, even though both reach a computer over the same Internet connection.
Port numbers are **standardized** across all network-connected devices, with each port assigned a **number** between 0 and 65535. Many low-numbered ports are reserved by convention for certain protocols: for example, Hypertext Transfer Protocol (HTTP) messages go to port 80 by default. Samples of standard ports include:
* **Port 22:** Secure Shell (SSH). SSH is one of many tunneling protocols that create secure network connections.
* **Port 25:** Simple Mail Transfer Protocol (SMTP). SMTP is used for email. If we have an email server, we need to bind to this port.
* **Port 80:** Hypertext Transfer Protocol (HTTP). If we are hosting a web server, we need to bind to this port. If we host the webserver at other ports such as `8080`, we need to *specify* it when keying the url in the address bar like we did earlier.
* **Port 53:** Domain Name System (DNS).
    * Preview: DNS is an essential process for the modern Internet; it matches human-readable domain names to machine-readable IP addresses, enabling users to load websites and applications without memorizing a long list of IP addresses.
    * For instance, the IP of one of Google's servers is `172.217.194.102`. We can access Google by typing `https://172.217.194.102`. We can also access Google by typing `https://google.com`.
    * Our computer must somehow **map** `google.com` to `172.217.194.102` so that our queries to Google can be routed properly. The system that is responsible for this hostname-address translation is DNS. We will learn more about it later today.
As a **summary**, ==**IP addresses** enable messages to go to and from specific **devices**, port numbers allow targeting of specific **processes** within those devices.==
> In our sample server, we use arbitrary ports such as `8080` and `8000` (ports not reserved by a standard protocol) so as not to interfere with standard protocols.
## OS Service: The File System
Another important service provided by our OS and its Kernel is file system management. A file system controls how data is stored and retrieved in a system. **It is a set of rules (and features) that determine the way the data is stored.** A physical disk can be separated into multiple partitions, where each partition can have a *different file system*.
The purpose of a file system is to maintain and organize secondary storage.
Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins.
The File System operates using a **specific** data structure and has a specific **format**. *Its interface is part of the OS, so it varies between operating systems.*
> We can perform operations on files, and these operations are possible through the file interface or methods, analogous to the interface/methods we implement for classes in OOP.
> Operations that can be performed on files as we know it already include: create, read, write, and delete. Other operations that are not that common are reposition and truncate.
**Examples of common file systems include:**
* File allocation table (FAT) is supported by the Microsoft Windows OS
* New Technology File System (NTFS)-- is the default file system for Windows products from Windows NT 3.1 OS onward
* ext4 is a file system for many Linux Distributions
* Universal Disk Format (UDF) is a vendor-neutral file system used on optical media and DVDs
* Hierarchical file system (HFS) was developed for use with Mac operating systems. HFS is succeeded by HFS+
* Apple File System (APFS) for macOS
### A File
The File System manages the collection of files in our storage device.
A file is a **logical** storage unit in the computer, defined by the operating system. In layman's terms, a file is a group of data bytes which are stored neatly in a known location with a unique name (path).
Consider the **file** with name: `classrecording.mp4` below. The file consists of file **attributes** and file **content**. File attributes contain important information such as name, size, datetime of creation, user ID, etc. Its content is essentially a group of data bytes (~536MB). When we use this file, we don’t really care about its **physical** address (where it is actually stored on **disk**).
==We only care about its path. That’s what we mean by “logical” storage unit.==
![](https://i.imgur.com/rburbCf.png)
### Directories
Directories are metadata that organize files in a **structured name space**. In layman's terms, directories are **lists** of names assigned to files. You can think of a directory like a *mall directory*, where we have **names** and a **mall location, e.g: \#03-68**. The *mall location* in the case of a File System contains the ID of that file so that our computer can know **where** our file content is physically located.
We can change the **names** of each file by changing the **content** of the directory, while keeping the ID the same. This is analogous to changing the name of a Shop (we need to change the mall directory), but the "address" within that mall is still the same, e.g: \#03-68.
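A rename can be observed from the shell. In this sketch the file names are made up, and it assumes a UNIX-like system where `ls -i` prints a file's ID (usually called the *inode number* on Linux file systems) in front of its name.

```shell
workdir=$(mktemp -d)
echo "some content" > "$workdir/old-name.txt"

# `ls -i` prints the file's ID before its name; awk keeps only the ID.
id_before=$(ls -i "$workdir/old-name.txt" | awk '{print $1}')

# Renaming only rewrites the directory entry; the file content is not moved.
mv "$workdir/old-name.txt" "$workdir/new-name.txt"
id_after=$(ls -i "$workdir/new-name.txt" | awk '{print $1}')

echo "ID before rename: $id_before"
echo "ID after rename:  $id_after"

rm -rf "$workdir"
```

Both lines print the same number: the shop got a new name in the mall directory, but it still sits at the same unit.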
Note that directory is very similar to the definition of a **folder**. **However folder is a GUI concept**, associated with the common folder icons to represent **collection** of files. If you are referring to a container of documents, then the term folder can be used -- related to the GUI. The term directory refers more *broadly* to the way a structured list of document files and folders is stored on the computer.
### Windows vs Linux File System
In Microsoft Windows, files are stored in folders on different data drives such as C:, D:, and E:. In Linux, however, files are ordered in a single tree structure starting at the root directory (denoted by a forward slash, `/`).
This root directory can be considered the start of the file system, and it branches out into various subdirectories. A general tree file system on your UNIX-like OS may look like this:
![](https://i.imgur.com/vR1IWM8.png)
The full path of the folder `Documents` is `/Users/Ubuntu/Documents`. A file located inside Documents, for example: `homework.py` has a full path of `/Users/Ubuntu/Documents/homework.py`
### Working directory
The **working** directory is dynamically associated with each process and gives the process a "starting point" for navigating the file system. Our **currently running shell** is also a process, so it has a working directory too.
When the process refers to a file using a **simple** file name or **relative** path (as opposed to a file designated by a *full path* from a root directory), the reference is interpreted relative to the working directory of the process. For example, a process with working directory `/Users/Ubuntu/Documents` that asks to create the file `foo.txt` will end up creating the file `/Users/Ubuntu/Documents/foo.txt`.
We can change the **current working directory** of our POSIX-compliant shell with the command `cd`, and see the current working directory with the command `pwd`. The command `ls` lists out all files in the current working directory (one level).
![](https://i.imgur.com/IM8Ek9C.png)
When `ls` was first executed in the above example, the current working directory is at `/Users/natalie_agus/Desktop`, thus listing all first-level files at that path. The second execution of `ls` lists different outputs because it was executed at a different working directory `/Users/natalie_agus/Desktop/SUTD Acad`.
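The `foo.txt` example can be tried directly. This sketch uses a scratch directory created by `mktemp` in place of `/Users/Ubuntu/Documents`, and assumes a POSIX-style shell.

```shell
# Scratch directory standing in for /Users/Ubuntu/Documents.
docs=$(mktemp -d)

cd "$docs"                 # change the shell's working directory
echo "now in: $(pwd)"

# A simple file name is resolved relative to the working directory...
echo "hi" > foo.txt

# ...so the same file is also reachable through its full path.
if [ -f "$docs/foo.txt" ]; then created=yes; else created=no; fi
echo "created at $docs/foo.txt: $created"

cd /                       # leave the directory before cleaning up
rm -rf "$docs"
```

Running `echo "hi" > foo.txt` from a different working directory would create a different file, even though the command text is identical.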
## Summary
We have learned a lot in just a few hours, and the information might be overwhelming. It is useful to search for terms that are unclear, such as *virtualisation* and *memory*, to piece the knowledge together. The purpose of all the knowledge above is to help you utilise OS services better:
- [x] Using appropriate commands in the CLI depending on the OS
- [x] Establishing sockets for Interprocess Communication, and realising the importance of Network Security
- [x] Knowing the basic concept of processes, working directory, environment variables to utilise other OS services better
- [x] Spawning and killing processes in the system
- [x] Navigating the file system to manipulate files: creation, deletion, truncation, etc
- [x] Installing useful **tools** like `git` and `node`, and know how to troubleshoot (we will have more fun with shell scripting in Day2)
Next, we will learn the basics of Computer Networks to get a basic idea of what goes on when data is transferred **between two machines** via the Internet. There exist many **protocols** to ensure smooth delivery of data between the client and server, much like the protocols for package delivery.
![](https://i.imgur.com/GEARwN8.png)
Follow this [link](https://hackmd.io/@Crimsonlycans/BydSP-UWF) to proceed.