We must choose a suitable operating system and hardware to successfully deploy our server (`html.js` or `server.js`). We could certainly deploy it on our personal computer, but that's not an ideal machine for a long-term server: it is unreliable, lacks proper security measures, and is expensive to maintain.
As such, we typically rely on existing services such as AWS EC2, Google Compute Engine, DigitalOcean Droplets, and Azure VMs, among many others. All of these are web services that provide secure, resizable compute capacity in the cloud, making them ideal for running our server code reliably.
For the purpose of this course, we will be using AWS. We have created an account for each of you. Log in to your AWS account and go to the EC2 homepage.
Then, choose a region. Remember this region, because that's where your instance is hosted. In this example, we use "Ohio".
We do not have any instances yet, so let's create one. Click on "Launch Instances", and select `Ubuntu Server 20.04 LTS` for the operating system (also known as machine image) option:
For the instance type, select `t2.micro`. It is free-tier eligible. The instance type simply defines the hardware capacity of our computer.
Click "Review and Launch" to immediately launch the Instance. We can do other settings later. You will see this page:
When you click "Launch", you will be prompted with key-pair generation. Select create new pair and give it a name, then download the `.pem` file.
This is important if you want to remotely access your AWS EC2 instance via `ssh`. We will do this later. For now, you will see that you have one instance in the dashboard:
Click on the instance and then click the connect button on the next page:
AWS EC2 supports direct access to your instance via the web browser.
Clicking on "connect" opens a new tab that shows the command prompt of your newly made instance. From here, you can type various commands and use the computer as per normal.
There's no GUI here; we are simply accessing the operating system's services via the command line interface (CLI). We have been doing quite a bit of that earlier to run `node` and various `git` commands, but we will learn more about the CLI and OS real soon.
If you wish to use your own SSH client, you can follow these steps:
1. Restrict the permissions of your key file: `chmod 400 <filename>.pem`
2. Connect with the `ssh` command to your instance. The details can be found in the "SSH Client" tab:

The output should look similar to that of EC2 Instance Connect:
Another convenient and recommended way to connect to your EC2 instance is to use AWS Cloud9. Cloud9 is a free web IDE that allows you to connect to and access your EC2 instances. Select the Cloud9 service from the "Services" tab of your AWS console:
Afterwards, click on create environment:
Give your environment a nice name:
Configure the environment using your EC2 username (which is `ubuntu` by default) and hostname:
The EC2 hostname can be found in the EC2 dashboard we saw earlier.
Click "Copy key to clipboard" at Cloud9 environment setting, and head to your EC2 InstanceConnect terminal or your own terminal (that has SSH access to your EC earlier) and paste in the following commands in succession:
```bash
sudo apt-get update
sudo apt-get -y upgrade
echo <paste the copied key> >> ~/.ssh/authorized_keys
sudo apt-get install -y nodejs
```
The first two commands update the Ubuntu OS on your EC2 instance. The third command adds the Cloud9 public key to your EC2's `~/.ssh/authorized_keys` file. The complete command will look something like this:
```bash
echo ssh-rsa 5QGOtWtUJqiBnZQb0ZqPXzgCmd0JgH0H/pSZSr2qFqUjHgYC/yknqCQ6O0WhmeK9ond8vR40zk4aZKnuvE+ZTZDQhMtQSuBcyQPzAqLmXmWAHF5vPAsdLmCXVRTAsQ6aYeDom8viwiZ1yHTuWWAuxeNm7SQTIxho0KSjwRc3OO1gIU6UhFK/jpWZ8X+vEppMRKYMiB1+OTUbrcL9UPUizXp5GIkNO7bgqRtvIRzRriTkiXg+zOSIEcg7KkXXPhoeLUHcAUGpqKqesGTy9jJars8hRAczRaynDL5lHiiL3Ah4UXAvl8bO+WosohPai/4nZ6TX45yOMWEIKidCGal8VGjwnEmsWMmWduwquXey4T6tcGK7cLmM9hPymdwyxgZMTv05Cop2d3XemeH/NNu9BNWucvoZjbCieKih2wcqqxYdwxNFhKzc+TydJGa7ALrS+BznJMwb+HrfDpVzsWBFAwOUseC1vgVKOCpkeriDp+PVMwjly0ABwtJkww5/73zmVewet7eN+EAYPusd9SJAF+CbevRb18dhFxUSaFFetKBbcns9hzBZNR6S7GQ8vATnsrhj9Qzd/krWmFmsX+mMRqzSDng74WVnwBtFWM253ZEdCcRl8RFY0ia0ff0upJ++dwrNCm9k/Y/U+Km9fuYaC9NX12MNB4vwdQ== nat+581994641164@cloud9.amazon.com >> ~/.ssh/authorized_keys
```
You can read the file back by typing the command `cat ~/.ssh/authorized_keys`.
Finally, the fourth command installs `node` on the EC2 instance; Cloud9 requires `node` to run. You might be faced with warnings like these asking to restart some outdated daemons. Simply select them all (navigate with the arrow keys and press space to select), then tab to `OK` to restart.
Once you're done, head back to the Cloud9 dashboard and click on "next step":
This will bring you to the next page where you can simply click "Create Environment". You will then be prompted with this:
You can click "Next" to let Cloud9 attempt the installation for you, but that's rather troublesome and you might be met with unexpected errors. It's best to install this manually.
For now, just ignore this and head back to your EC2 SSH console.
Right click on the C9 install link and get the link address: https://d1q2hgnv37wylw.cloudfront.net/static/c9-install.sh
If it's still the same as the above, copy the commands below to continue. Otherwise, replace the link given to `wget` with the new one; we never know when they will update their CDN.
Head back to your EC2 terminal in Instance Connect or your own SSH client. Type the following commands in succession:
```bash
wget https://d1q2hgnv37wylw.cloudfront.net/static/c9-install.sh
chmod a+x c9-install.sh
sudo apt-get -y install python2
sudo apt-get -y install build-essential
./c9-install.sh
```
The first command downloads the shell script `c9-install.sh` from the CloudFront website. We then change the file permission of this script to executable. The next two commands install the necessary libraries (Python and some basic build utilities, which generally include the GCC/g++ compilers and libraries). The last command executes the script to install Cloud9 into your EC2 instance.
Wait for a few minutes and once it is done, return to your Cloud9 webpage and click refresh to relaunch the IDE. You can also find your environments in the Cloud9 dashboard:
The IDE looks like this:
If you're faced with an outdated-package warning, e.g. for `tmux`, just click `Update`:
Right now, you only have one file in your root directory, which is the `c9-install.sh` script you downloaded earlier using `wget`. With this, you're all connected, with a convenient editor GUI to traverse and manipulate the EC2 file system.
One last thing to do is to run the code you pushed to the remote GitHub repository on this EC2 instance.
You can `git clone <repository-url>` into your EC2 instance. You should then have it in the Cloud9 IDE; run the server `html.js`:
But wait! How do we access it? We can't just type http://0.0.0.0:8000 in our browser, because the server is now hosted on another machine (our EC2 instance).
In the context of servers, 0.0.0.0 means all IPv4 addresses on the local machine. For instance, if a host is assigned two IP addresses, 192.168.1.1 and 10.1.2.21, and a server running on the host is configured to listen on 0.0.0.0, it will be reachable at both of those IP addresses. Note that this is common to hosts that have more than one network interface, as they have one Internet address for each interface.
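You can see this binding for yourself on the EC2 instance. A quick sanity check, assuming the server is already running on port 8000 (`ss` ships with Ubuntu):

```bash
# list listening TCP sockets: -t TCP, -l listening, -n numeric, -p owning process
ss -tlnp | grep 8000
```

A line showing `0.0.0.0:8000` means the server accepts connections on all of the machine's IPv4 addresses.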
Since the server listens on all of the EC2 instance's addresses, we must find out the PUBLIC IP address of our EC2 instance. It is somewhere in your instance's EC2 dashboard (IPv4). Find it.
Afterwards, construct the URL: `http://<your-EC2-public-IPv4>:8000`. We also need to tell our EC2 instance to accept incoming traffic. Go to the "Security" tab in your instance's EC2 dashboard, and click on the security group entry highlighted in blue:
Click on "edit inbound rules" in the new page:
Then add the new rule to allow custom TCP connection from any IPv4 address (so the public can access this server) and give it a nice description:
After saving the inbound rules, try the URL to your server in your computer's browser. You should see the nice webpage as before, but this time round it is hosted from your EC2 instance:
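You can also test it from a terminal instead of the browser; for instance, with the placeholder replaced by your instance's actual public IPv4 address:

```bash
# fetch the page over plain HTTP and print the HTML to the terminal
curl http://<your-EC2-public-IPv4>:8000
```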
Open `index.html` in your local repository and make some small changes:
```html
<body>
  <div class="center">
    <h1>Hello Again!</h1>
    <p>This is served from a file</p>
    <p>Have a nice Day!</p>
  </div>
</body>
```
Save it and `push` to the remote repository (of course, it's implied that you need to `add` and then `commit` first). Afterwards, `pull` it on the EC2 instance and re-serve the webpage; the full sequence is sketched below. Refreshing the site's URL in your web browser should show this new text instead:
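The round trip looks something like this (a sketch; the commit message is an example):

```bash
# on your local machine
git add index.html
git commit -m "update landing page text"
git push

# on the EC2 instance, inside the cloned repository
git pull
node html.js
```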
Similarly, if you want to `push` directly from the EC2 instance, you need to generate a new personal access token and save it in the EC2's GitHub login credentials.
Find out how to do this (10 mins). You can review the previous notes for hints.
If you're met with errors like this when running `node` for the second time:
...it means that there's still a process using that exact port, 8000. We need to manually kill that process. First, find it by typing the command `ps aux | grep node`, which lists out all processes whose name contains "node":
The first entry indicates that there's a `node` process still running `html.js`. We need to kill that process with the `kill` command: `kill -9 <pid>`, where `pid` is the process ID number (51934 in this case). Afterwards, you can run `node html.js` again.
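As an alternative sketch, you can look the process up by the port it occupies rather than by name (`pgrep` ships with Ubuntu; `lsof` may need a `sudo apt-get install lsof` first):

```bash
# show the process listening on port 8000
lsof -i :8000

# kill it in one line by matching the script name on its command line
kill -9 $(pgrep -f html.js)
```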
Always press ctrl+c to kill the `node` process properly.
Now that we have successfully "deployed" our web server code onto the EC2 instance, it is time to understand a little bit about what's going on under the hood. In particular, we want to shed some light on these questions:
Let's begin with the basics of operating systems.
An operating system is a special program that acts as an intermediary between users of the computer and the computer hardware.
The operating system is part of the computer system and is analogous to a government.
The goal of an operating system is to have a dedicated program that fulfils the following essential roles:
Once we have an operating system, it becomes easier for users to use a program, or to code another program for other purposes, within a computer system.
There are a lot of things that make up an operating system, but they are generally divided into three categories:
The OS provides an environment such that user programs such as the text editor, web browser, compiler, database system, music player, video editor, etc can do useful work.
Since each user program runs in a virtual machine (i.e. it is written as if the entire machine belongs to it alone), there has to be some other program that manages and oversees all programs that live in RAM and reside on disk, as well as managing the memory hierarchy.
This special program is part of the operating system called the kernel. It provides essential services, such as interprocess communication and file system management.
There are a whole lot of other details about operating systems, OS services, and kernels that are omitted from this course. It is recommended that you do further reading with this book if you're interested.
The kernel is the software that forms the heart of an operating system. Its size varies greatly depending on the architecture; for example:
One of the most famous kernels, used by many operating systems, is the Linux kernel. A few examples of Linux distributions (operating systems made from a software collection based upon the Linux kernel) are Ubuntu, Debian, Fedora, Android, and Chrome OS, among many others.
See for yourself the code for the Linux Kernel originally developed by Linus Torvalds here.
Fun fact: Torvalds also initially developed Git to maintain the development of the Linux kernel. You can find the source code here.
The OS interface gives users convenient access to various OS services. They are programs that can execute specialised commands and help users perform appropriate system calls in order to navigate and utilise the computer system.
There are two general ways for users to conveniently access OS services:
The OS GUI is what we usually call our “home screen” or “desktop”. It characterises the feel and look of an operating system. We use our mouse and keyboard everyday to interact with the OS GUI and make various system calls:
The OS CLI is what we usually know as the “terminal” or “command line”.
A command-line interface (CLI) is a means of interacting with a computer program where the user issues successive commands to the program in the form of text. The program which handles this interface feature is called a command-line interpreter.
The particular programs that act as the interpreters of these commands are known as shells. A typical OS might come with multiple command-line interpreters; a user may choose among several different shells, including the Bourne shell, C shell, Bourne-Again shell, Korn shell, and others. Users typically interact with a shell via a terminal emulator, or by directly writing a shell script that contains a series of commands to be executed in succession.
A terminal emulator is a text-based user interface (UI) to provide easy access for the users to issue commands. Examples of terminal emulators that we may have encountered before are iTerm, MacOS terminal, Terminus, and Windows Terminal.
Common shells that we may have encountered:
Bourne-Again shell (bash): written as part of the GNU Project to provide a superset of Bourne shell functionality. It comes preinstalled and is the default interactive shell on most Linux distros (and on older macOS systems).
Z shell (zsh) is a relatively modern shell that is backward compatible with bash. It's the default shell in macOS since 10.15 Catalina.
PowerShell – An object-oriented shell developed originally for Windows OS and now available to macOS and Linux.
The commands that are valid for macOS might not be valid for Windows users. For instance, the command `ls` can be used in macOS to list out the items in the current directory, but the same command won't work on Windows; the command `dir` must be used instead.
For practical purposes, there has to be coherence between the various OSes such that they at least conform to the same sets of commands. Therefore, a family of UNIX-like operating systems was born.
A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-like application is one that behaves like the corresponding Unix command or shell. There is no standard for defining the term, and some difference of opinion is possible as to the degree to which a given operating system or application is "Unix-like".
Unix was selected as the basis for a standard system interface partly because it was "manufacturer-neutral" (users are able to use the technology best suited to their needs at any time). However, several major versions of Unix existed, so there was a need to develop a common-denominator system.
The standard to which these UNIX-like OSes conform is called the POSIX standard.
The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.
A brief history:
The POSIX specifications for Unix-like operating systems originally consisted of a single document for the core programming interface, but eventually grew to many separated documents. The standardized user command line and scripting interface were based on the UNIX System V shell.
Many user-level programs, services, and utilities (including `awk`, `echo`, and `ed`) were also standardized, along with required program-level services (including basic I/O: file, terminal, and network). POSIX also defines a standard threading library API, which is supported by most modern operating systems. In 2008, most parts of POSIX were combined into a single standard.
POSIX defines both the system- and user-level application programming interfaces (API), along with command line shells and utility interfaces, for software compatibility (portability) with variants of Unix and other operating systems. POSIX is also a trademark of the IEEE, intended to be used by both application and system developers.
Ubuntu, macOS, and SolarisOS are all UNIX-like and POSIX-compliant. Therefore, similar sets of UNIX commands can be used to access the services of all these OSes. Windows, however, is not UNIX-like, resulting in it having an entirely different set of commands.
Here's a list of common UNIX commands. We have seen some of them before:
And here's a list of common Powershell commands:
Through the CLI, we can conveniently type out commands such as `git`, `echo`, `node`, etc. Some commands are "default" (they come with the OS without the need to install anything else), and some require installation (like `git`).
So how exactly do these commands work? By now we know that the commands that can be interpreted depend heavily on the OS type (UNIX-like or not). The shell primarily interprets a command from the user and executes it. There are two ways to implement commands:
You can type `echo $PATH` in your terminal to find out the possible places where these system programs are stored.
In short, when we install something new, like `python`, its binary (executable) is installed in one of these paths, e.g. `/usr/bin/python`.
When we type `python file.py`:
- Our shell attempts to find a program named `python` (the name must match) in any of the paths shown in the output of `echo $PATH`.
- If it finds `python` in one of these paths, the shell executes that program with the argument `file.py`.
- Otherwise, `command not found` will be shown.

Here's a very handy and useful website to learn what a particular command does: simply paste a command such as `git log --graph --all --oneline --decorate` into it and observe some magic in action:
What if we want the shell to search in other locations? We can modify the `PATH` environment variable in a file in your home directory called `.zshenv` (if you use zsh) or `.bash_profile` (if you use bash).
For instance, suppose we write and compile a C program that prints "Hello World" as follows:
We tell zsh to execute the program from this directory, signified by the "dot" in `./nat-cprog`, thus printing `Hello, World!`. Attempting to execute the program name directly results in a `command not found` error, because zsh doesn't know where this program is located.
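A sketch of that interaction (assuming the source file is named `nat-cprog.c`; the file name is hypothetical):

```bash
# compile the C source into an executable named nat-cprog
gcc nat-cprog.c -o nat-cprog

./nat-cprog    # works: "./" tells zsh exactly where the program is
nat-cprog      # command not found: this directory is not in PATH
```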
We can then add this current path, `~/Desktop/SUTD Acad`, to the `PATH` environment variable, and this allows zsh to execute the program as a command:
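A minimal sketch of the change, assuming you use zsh (append the same line to `~/.bash_profile` instead if you use bash):

```bash
# in ~/.zshenv: append the directory containing nat-cprog to PATH;
# the quotes matter because the path contains a space
export PATH="$PATH:$HOME/Desktop/SUTD Acad"
```

Open a new terminal (or run `source ~/.zshenv`) for the change to take effect.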
We will learn more about common commands and shell scripting in Day 2. Shell scripting is a great way to automate repetitive processes, such as creating or deleting thousands of files daily; a tiny taste of it is sketched below.
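For instance, a script like this (the file names are made up for illustration) creates a thousand files in one go:

```bash
#!/bin/bash
# create log-1.txt through log-1000.txt in the current directory
for i in $(seq 1 1000); do
    touch "log-$i.txt"
done
```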
There are hundreds of processes running on our computer at any given time. You can type the command `ps -ef` to list all running processes in the system.
A process is formally defined as a program in execution. A program is not a process.
A process couples two abstractions: concurrency and protection. Each process runs in a different address space and sees itself as running in a virtual machine – unaware of the presence of other processes in the machine.
Important: each process is given a `pid` (process ID). We can issue commands to our kernel via the CLI to `kill` any existing process using its `pid`. This is extremely handy if we need to kill a process that hangs or takes up too many resources during runtime.
Every message passed back and forth between writer and reader (server and client) must be done with the kernel's help. One of the ways is via message passing (the other is using shared memory). A socket is one such message-passing interface.
A socket is one endpoint of a two-way communication link between two programs running on the network:
We will learn more about UDP and TCP as network communication protocols in the later part of today.
This is why we cannot run our `server.js` twice, because both would be using the same socket.
In short, sockets can be used by two processes on the same machine to communicate locally, or by two processes on different machines to communicate over the internet:
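You can try the local case yourself with `netcat` (available as `nc` on Ubuntu). This sketch uses two terminals on the same machine and an arbitrary unused port:

```bash
# terminal 1: listen on port 9000 (one endpoint of the socket)
nc -l 9000

# terminal 2: connect to it (the other endpoint); anything typed
# in either terminal now appears in the other
nc localhost 9000
```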
A port is a virtual point where network connections start and end. Ports are software-based and managed by a computer's operating system. Each port is associated with a specific process that listens to it.
Of course, not all processes are assigned a port; ports are only assigned to processes that request them, like our web server.
Ports allow computers to easily differentiate between different kinds of traffic: emails go to a different port than webpages, for instance, even though both reach a computer over the same Internet connection.
Ports are standardized across all network-connected devices, with each port assigned a number between 0 and 65535. Most ports are reserved for certain protocols; for example, all Hypertext Transfer Protocol (HTTP) messages go to port 80. Samples of standard ports include:

Note also:
- If a server uses a non-standard port such as `8080`, we need to specify it when keying the URL in the address bar, like we did earlier.
- One of Google's IP addresses is `172.217.194.102`. We can access Google by typing https://172.217.194.102, and we can also access Google by typing https://google.com.
- Something must therefore translate `google.com` to `172.217.194.102` so that our queries to Google can be routed properly. The system responsible for this hostname-address translation is DNS. We will learn more about it later today.

As a summary, IP addresses enable messages to go to and from specific devices, while port numbers allow targeting of specific processes within those devices.
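You can perform this hostname-to-address translation yourself from the CLI. For instance (`dig` comes with Ubuntu's `dnsutils` package; the addresses returned will likely differ from the example above):

```bash
# ask DNS for google.com's current IPv4 address(es)
dig +short google.com
```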
In our sample servers, we use the arbitrary ports `8080` and `8000` (some unused ports) so as not to affect other standard protocols.
Another important service provided by our OS and its kernel is file system management. A file system controls how data is stored and retrieved in a system. It is a set of rules (and features) that determines the way data is stored. A physical disk can be separated into multiple physical partitions, where each partition can have a different file system.
The purpose of a file system is to maintain and organize secondary storage.
Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins.
A file system operates using specific data structures and has a specific format. Its interface is part of the OS, so file systems vary between operating systems.
We can perform operations on files, and these operations are possible through the file interface or methods, analogous to the interface/methods we implement for classes in OOP.
Operations that can be performed on files as we know it already include: create, read, write, and delete. Other operations that are not that common are reposition and truncate.
Examples of common file systems include:

The file system manages the collection of files on our storage device.
A file is a logical storage unit in the computer, defined by the operating system. In layman terms, a file is a group of data bytes which are stored neatly in a known location with a unique name (path).
Consider the file named `classrecording.mp4` below. The file consists of file attributes and file content. File attributes contain important information such as name, size, datetime of creation, user ID, etc. Its content is essentially a group of data bytes (~536 MB). When we use this file, we don't really care about its physical address (where it is actually stored on disk); we only care about its path. That's what we mean by a "logical" storage unit.
Directories are metadata that organize files in a structured name space. In layman's terms, directories are lists of names assigned to each file. You can think of it like a mall directory, where we have shop names and mall locations, e.g. #03-68. The "mall location" in the case of a file system contains the ID of that file, so that our computer knows where the file's content is physically located.
We can change the names of each file by changing the content of the directory, while keeping the ID the same. This is analogous to changing the name of a Shop (we need to change the mall directory), but the "address" within that mall is still the same, e.g: #03-68.
Note that a directory is very similar to the definition of a folder. However, a folder is a GUI concept, associated with the familiar folder icons that represent collections of files. If you are referring to a container of documents, then the term folder can be used, as it relates to the GUI. The term directory refers more broadly to the way a structured list of document files and folders is stored on the computer.
Comparing the file systems of Windows and Linux: in Microsoft Windows, files are stored in folders on different data drives like C:, D:, and E:. In Linux, files are ordered in a tree structure starting from the root directory (denoted by a forward slash, /).
This root directory can be considered as the start of the file system, and it further branches out various other subdirectories. A general tree file system on your UNIX-like OS may look like this:
The full path of the folder `Documents` is `/Users/Ubuntu/Documents`. A file located inside `Documents`, for example `homework.py`, has the full path `/Users/Ubuntu/Documents/homework.py`.
A working directory is dynamically associated with each process (our currently running shell is also a process). It gives the "starting point" from which a process navigates the file system.
When a process refers to a file using a simple file name or relative path (as opposed to a file designated by a full path from the root directory), the reference is interpreted relative to the working directory of that process. For example, a process whose working directory is `/Users/Ubuntu/Documents` that asks to create the file `foo.txt` will end up creating the file `/Users/Ubuntu/Documents/foo.txt`.
We can change the current working directory of our POSIX-compliant shell with the command `cd`, and see the current working directory with the command `pwd`. The command `ls` lists out all files in the current working directory (one level).
When `ls` was first executed in the example above, the current working directory was `/Users/natalie_agus/Desktop`, so it listed all first-level files at that path. The second execution of `ls` gives different output because it was executed in a different working directory, `/Users/natalie_agus/Desktop/SUTD Acad`.
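A short sketch of such a session, using the example tree from earlier (`/Users/Ubuntu`):

```bash
pwd              # prints /Users/Ubuntu
cd Documents     # change the working directory
pwd              # now prints /Users/Ubuntu/Documents
ls               # lists homework.py and any other files at this level
touch foo.txt    # relative name resolves to /Users/Ubuntu/Documents/foo.txt
```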
We have learned a lot in just a few hours, and the information might be overwhelming. It is useful to search for terms that are unclear, such as virtualisation and memory, to piece the knowledge together. The purpose of all the knowledge above is to help you utilise OS services better:
- Run commands such as `git` and `node`, and know how to troubleshoot (we will have more fun with shell scripting in Day 2).

Next, we will learn the basics of computer networks to get a basic idea of what goes on when data is transferred between two machines via the internet. There exist many protocols to ensure smooth delivery of data between the client and server, much like the protocols for package delivery.
Follow this link to proceed.