# API setup in cPouta
To create a REST API in cPouta, you should first be clear about what you want the API to do. This documentation details only one architecture that has already been implemented; other architectures are possible. For a good API, you will need
1. a server
2. a database
3. an API design
As a total beginner in the field, it took me about two weeks to go through all the steps. You will need to get acquainted with SQL relational databases, remote connections, and API design.
## Server
As a member of a research institution in Finland, you are allowed to use the CSC infrastructure to compute, store, and retrieve data.
For more information about CSC, see the [CSC webpages](https://www.csc.fi).
In order to setup your own space online, follow the [guidelines](https://docs.csc.fi/cloud/pouta/) from CSC.
A set of video tutorials is available from their [support pages](https://docs.csc.fi/cloud/pouta/pouta-videos/)
A good tutorial to start with and get all the tools you need was developed in our unit and can be found [here](https://hackmd.io/bCR_WaNJRJ2bO68eT0bpKA).
NOTE: if you install Ubuntu, do not try to connect via
```
ssh -i .ssh/keypair.pem cloud-user@hostIP
```
but instead
```
ssh -i .ssh/keypair.pem ubuntu@hostIP
```
By going through their materials you should be able to set everything up quite nicely, and also configure security measures to open or restrict access to your virtual machine.
## Database
The second step is to work with a database. A database is basically a set of tables stored on your computer (localhost) or on a remote server (IP address). A common database tool is [PostgreSQL](https://www.postgresql.org): it is free, open source, and has a graphical interface, but most likely you will just need the command-line options.
If you are a beginner and do not know where to start, one of the best step-by-step tutorials is available [here](https://www.youtube.com/watch?v=qw--VYLpxG4&t=4306s).
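To make the "set of tables" idea concrete, here is a minimal sketch using SQLite, which is built into Python, instead of PostgreSQL; the SQL is essentially the same, and the table and column names are invented for the example:

```python
import sqlite3

# A database is just a set of tables; SQLite makes a quick local
# illustration -- with PostgreSQL only the connection step changes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table and add a couple of rows
cur.execute("CREATE TABLE emissions (country TEXT, value REAL)")
cur.executemany("INSERT INTO emissions VALUES (?, ?)",
                [("FI", 120.5), ("SE", 98.2)])

# Query it back with plain SQL
cur.execute("SELECT country, value FROM emissions WHERE value > 100")
rows = cur.fetchall()
print(rows)  # [('FI', 120.5)]
conn.close()
```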
### Installation
Right, let's assume now that you are aware of the tool and have found out how to work with it; we need to install PostgreSQL on our virtual machine in cPouta.
1. connect to your VM
2. Install PostgreSQL
```
sudo apt install postgresql postgresql-contrib
```
3. Allow remote access to the server. Change the commented `#listen_addresses` line to `listen_addresses = '*'`
```
sudo vim /etc/postgresql/12/main/postgresql.conf
listen_addresses = '*'
```
4. Allow access with a password from a remote host.
```
sudo vim /etc/postgresql/12/main/pg_hba.conf
```
Add the following line to the end of the file
```
host database_name user_name ip_of_remote_computer/32 md5
```
5. Add user
```
sudo -u postgres createuser --interactive
```
```
username: user_name
yes/no choices: no
```
6. Create database
```
sudo -u postgres createdb database_name
```
7. Add password to user
```
sudo -u postgres psql -c "alter user user_name with password 'example_password';"
```
8. Change the ownership of database_name
```
sudo -u postgres psql -c "alter database database_name owner to user_name;"
```
9. Restart PostgreSQL
```
sudo systemctl restart postgresql
```
After that you may need to open port 5432 from the cPouta interface.
You can test the connection from a client with the command:
```
psql -h host_name -d database_name -U user_name
```
Change host_name, database_name, user_name and example_password to whatever you like; just remember them or store them in a safe place for future use.
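The same credentials are reused later by the API, combined into a single connection URL. A sketch of how that URL is assembled (the values are placeholders; substitute the ones you chose above):

```python
# Build the connection URL that SQLAlchemy/psycopg2 use for PostgreSQL.
# All values below are placeholders from the setup steps above.
usrname = 'user_name'
pswd = 'example_password'
host = 'hostIP'
port = '5432'
database = 'database_name'

# Format: postgresql://user:password@host:port/database
url = 'postgresql://' + usrname + ':' + pswd + '@' + host + ':' + port + '/' + database
print(url)
# With SQLAlchemy installed on the client, connecting is then:
#   engine = sqlalchemy.create_engine(url); engine.connect()
```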
You are now ready to interact with your database remotely: you can store and query data from anywhere. But since this is impractical and only available to you, it would be good to open database access publicly. To do that, you will need to set up your API. The API will allow others to send queries and obtain results from the database automatically.
## API design
First, you will need to design your API. Designing an API means defining the roles, the queries, and the forms in which queries can be made. It is also very important to document your API so that others can access it without wondering how it works.
Designing the API can be done in multiple ways, but these two platforms are usually favored:
- [Stoplight](https://stoplight.io)
- [Swagger](https://swagger.io)
I have personally used Stoplight since it has a nice user interface where you can create all your relationships dynamically and still output your YAML[^a] document.
[^a]: YAML is a document format that stores all the data about your API; it plays a similar role to XML.
Since I was a beginner in this, I found [this lecture](https://www.youtube.com/watch?v=ROVI2G8eH78) quite useful for setting up the basics, but the most important aspects are
1. to set up the paths, i.e. how a certain type of data is accessed
2. to set the responses
3. to set each parameter that can be passed; some parameters can be made mandatory and others optional
This part is crucial to understand how everything works and how you can relate your databases to your API.
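As an illustration of paths, responses, and parameters, a hypothetical OpenAPI (YAML) fragment for a date-filtered endpoint might look like this (the path matches the implementation later in this document; the parameter details are invented for the example):

```yaml
paths:
  /api/v1/resources/emissions/findByDate:
    get:
      summary: Retrieve emission entries between two dates
      parameters:
        - name: startdate
          in: query
          required: true        # a mandatory parameter
          schema:
            type: string
            format: date
        - name: country
          in: query
          required: false       # an optional parameter
          schema:
            type: string
      responses:
        '200':
          description: Matching entries as a JSON document
        '404':
          description: No entry found for the given arguments
```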
## API implementation
Before deployment, the API needs to be tested locally; it can then be deployed on the cPouta server.
You can try to implement the API with Python3, [Flask](https://flask.palletsprojects.com/en/2.0.x/installation/), [Flask-SQLAlchemy](https://flask-sqlalchemy.palletsprojects.com/en/2.x/) and SQLAlchemy. You can first create test environment in your own computer before you release it in cPouta server.
Go through the installation on Anaconda and work through Flask tutorials, for example:
[Tutorial 1](https://programminghistorian.org/en/lessons/creating-apis-with-python-and-flask)
### Query design
Here is, for example, a shortened version of two queries I designed: one retrieves the latest entry in the database and the other selects all entries between two dates. Both accept optional parameters.
```python
import flask
from flask import request
import sqlalchemy as db
import json

app = flask.Flask(__name__)
# app.config["DEBUG"] = True


def loginaccess(tablenm):
    """Return the database credentials and the table to query."""
    usrname = 'username'
    pswd = 'password'
    host = 'IP address of your host'
    port = 'database port, usually 5432'
    database = 'database_name'
    table_load = tablenm
    return usrname, pswd, host, port, database, table_load


##### Error management #####
@app.errorhandler(404)
def page_not_found(e):
    return "<h1>404</h1><p>The resource could not be found.</p>", 404


def incorrect_arg(e):
    return ("<h1>" + str(e) + "</h1><p>The resource could not be found. <br> "
            "Check the arguments if they were given correctly</p>"), 404
##### Error management #####

### Three types of requests ###

###### GET ALL THE DATA ######
@app.route('/api/v1/resources/emissions/all', methods=['GET'])
def api_all():
    usrname, pswd, host, port, database, table_load = loginaccess('emissions')
    query = "SELECT * FROM " + table_load + ";"
    to_call = 'postgresql://' + usrname + ':' + pswd + '@' + host + ':' + port + '/' + database
    engine = db.create_engine(to_call)
    conn = engine.connect()
    all_rows = conn.execution_options(isolation_level="SERIALIZABLE").execute(query).fetchall()
    result_list = []
    for row in all_rows:
        result_list.append({
            'id': row.id,
            'date_time': row.date_time,
            'country': row.country,
            'emdb': row.emdb,
            'em_prod': row.emissionintprod,
            'em_cons': row.emissionintcons
        })
    result_dict = {'results': result_list}
    jsonString = json.dumps(result_dict, indent=4, sort_keys=True, default=str)
    conn.close()
    return jsonString
###### GET ALL THE DATA ######


###### GET DATA BY DATE ######
@app.route('/api/v1/resources/emissions/findByDate', methods=['GET'])
def api_filter_em_findByDate():
    usrname, pswd, host, port, database, table_load = loginaccess('emissions')
    query_parameters = request.args
    country = query_parameters.get('country')
    EmDB = query_parameters.get('EmDB')
    startdate = query_parameters.get('startdate')
    enddate = query_parameters.get('enddate')
    # Build the WHERE clause only from the parameters that were given;
    # %s placeholders keep the values parameterized.
    query = "SELECT * FROM " + table_load + " WHERE"
    to_filter = []
    if country:
        query += ' country=%s AND'
        to_filter.append(country)
    if EmDB:
        query += ' emdb=%s AND'
        to_filter.append(EmDB)
    if startdate:
        query += ' date_time>=%s AND'
        to_filter.append(startdate)
    if enddate:
        query += ' date_time<=%s AND'
        to_filter.append(enddate)
    if not (country or EmDB or startdate or enddate):
        return page_not_found(404)
    # Drop the trailing ' AND' and terminate the statement
    query = query[:-4] + ';'
    to_call = 'postgresql://' + usrname + ':' + pswd + '@' + host + ':' + port + '/' + database
    engine = db.create_engine(to_call)
    conn = engine.connect()
    all_rows = conn.execution_options(isolation_level="SERIALIZABLE").execute(query, to_filter).fetchall()
    if len(all_rows):
        result_list = []
        for row in all_rows:
            result_list.append({
                'id': row.id,
                'date_time': row.date_time,
                'country': row.country,
                'emdb': row.emdb,
                'em_prod': row.emissionintprod,
                'em_cons': row.emissionintcons
            })
        result_dict = {'results': result_list}
        jsonString = json.dumps(result_dict, indent=4, sort_keys=True, default=str)
        conn.close()
    else:
        conn.close()
        return incorrect_arg(404)
    return jsonString
###### GET DATA BY DATE ######


###### GET LATEST DATA ######
@app.route('/api/v1/resources/emissions/latest', methods=['GET'])
def api_filter_em_latest():
    usrname, pswd, host, port, database, table_load = loginaccess('emissions')
    query_parameters = request.args
    ############ Load the last entry ############
    query = "SELECT * FROM " + table_load + " ORDER BY date_time DESC LIMIT 1;"
    to_filter = []
    to_call = 'postgresql://' + usrname + ':' + pswd + '@' + host + ':' + port + '/' + database
    engine = db.create_engine(to_call)
    conn = engine.connect()
    all_rows = conn.execution_options(isolation_level="SERIALIZABLE").execute(query, to_filter).fetchall()
    results = [list(row) for row in all_rows]
    ############ Load the last entry ############
    # Select every row sharing the latest timestamp
    # (date_time is assumed to be the second column)
    query = "SELECT * FROM " + table_load + " WHERE"
    query += ' date_time=%s AND'
    to_filter.append(results[0][1].strftime("%Y-%m-%d %H:%M:%S"))
    country = query_parameters.get('country')
    EmDB = query_parameters.get('EmDB')
    if country:
        query += ' country=%s AND'
        to_filter.append(country)
    if EmDB:
        query += ' emdb=%s AND'
        to_filter.append(EmDB)
    # Drop the trailing ' AND' and terminate the statement
    query = query[:-4] + ';'
    all_rows = conn.execution_options(isolation_level="SERIALIZABLE").execute(query, to_filter).fetchall()
    result_list = []
    for row in all_rows:
        result_list.append({
            'id': row.id,
            'date_time': row.date_time,
            'country': row.country,
            'emdb': row.emdb,
            'em_prod': row.emissionintprod,
            'em_cons': row.emissionintcons
        })
    result_dict = {'results': result_list}
    jsonString = json.dumps(result_dict, indent=4, sort_keys=True, default=str)
    conn.close()
    return jsonString
###### GET LATEST DATA ######
```
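Once the API is running, its query URLs can be assembled programmatically on the client side. A sketch with Python's standard library ('hostIP' is a placeholder for your server's address; the parameter names follow the findByDate endpoint above):

```python
from urllib.parse import urlencode

# Hypothetical client-side sketch: assemble a findByDate query URL.
# 'hostIP' stands for your server's floating IP address.
base = 'http://hostIP/api/v1/resources/emissions/findByDate'
params = {'country': 'FI', 'startdate': '2021-01-01', 'enddate': '2021-12-31'}
url = base + '?' + urlencode(params)
print(url)
# The server answers with JSON; urllib.request.urlopen(url) or the
# requests library can then fetch and decode the response.
```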
## Deployment
### Python installations
To deploy your Flask app on the cPouta server, a few steps are required. First, we need to install Apache together with its Python WSGI module
```
sudo apt-get install python3-pip apache2 libapache2-mod-wsgi-py3
```
Create the api folder where we will store some of the files
```
sudo mkdir -p /var/www/flask_api/api
```
Transfer the api.py file that you created earlier and rename it '__init__.py'.
If you cannot transfer it directly, just create the init file and copy/paste the contents of your api.py into it. You must do this in the api folder "/var/www/flask_api/api/"
```
sudo touch __init__.py
```
Remove the `app.run()` call (under Apache, the WSGI module runs the app instead).
Then you need to install Python and some dependency packages on the server
```
cd /var/www/flask_api
python3 -m venv env
. env/bin/activate
pip install flask flask-sqlalchemy sqlalchemy psycopg2-binary
```
Within the flask_api folder, we declared a virtual environment *env* using Python 3.X (depending on the Python version available for the OS you are running). Within this virtual environment, we install the packages used in the script: *flask*, *flask-sqlalchemy*, *sqlalchemy*, and *psycopg2*. If you need any other packages, install them at this point inside the virtual environment.
### Apache deployment
Then we need to create the conf file for Apache
```
sudo touch /etc/apache2/sites-available/flask_api.conf
```
and then edit the file, using nano for example
```
sudo nano /etc/apache2/sites-available/flask_api.conf
```
Once the file is created and you are in editing mode, add the following lines to it
```
<VirtualHost *:80>
ServerName ip-address
WSGIDaemonProcess api python-home="/var/www/flask_api/env"
WSGIProcessGroup api
WSGIApplicationGroup %{GLOBAL}
WSGIScriptAlias / /var/www/flask_api/api.wsgi
<Directory /var/www/flask_api/api/>
Order allow,deny
Allow from all
</Directory>
ErrorLog ${APACHE_LOG_DIR}/error.log
LogLevel warn
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
```
Create the api.wsgi file. This lets the server know what it is going to run
```
sudo touch /var/www/flask_api/api.wsgi
```
Edit your api.wsgi file in vim
```
sudo vim /var/www/flask_api/api.wsgi
```
and insert the following text (the last line imports the Flask app so that Apache has an application to serve)
```
#!/var/www/flask_api/env/bin/python
import sys
import logging
logging.basicConfig(stream=sys.stderr)
sys.path.insert(0, "/var/www/flask_api")

from api import app as application
```
Remember to save and exit with the ':wq' command in vim.
To enable the site, type the following instructions in the terminal window:
```
sudo a2ensite flask_api
```
Restart Apache
```
sudo systemctl restart apache2
```
### PostgreSQL tuning
Some of the information in the *pg_hba.conf* file might need to be fine-tuned in order to grant access from outside.
```
sudo vim /etc/postgresql/12/main/pg_hba.conf
```
Within the *pg_hba.conf* file, you will need to add password-authenticated access to the database for all users. It is important to place this line high up (e.g. as the second entry) so it is not shadowed by other access rules.
```
local all all md5
```

Then remove or comment out any conflicting entries further down in the file.
After every change to the PostgreSQL configuration, it is good to restart the service
```
sudo systemctl restart postgresql
```
### Folder's rights
You should create a user on the server, give it ownership of the /var/www/flask_api directory, and allow it read-only access to the database. This should remove database access issues.
To add a user to the server and create a password you will remember for this user
```
sudo adduser someusername
```
Check that the user was created successfully
```
grep '^someusername' /etc/passwd
```
Change the ownership of the flask_api folder and its subfolders
```
sudo chown -R someusername: /var/www/flask_api
```
### Server's rights
On the server's side in CSC Pouta, you must open port 80 to all users. Port 80 will be used to access the database remotely through queries via the API we just created.
In the security group of your server, add the following exceptions with the open IP range 0.0.0.0/0.

### API access
If everything works, you can type the floating IP address of your server in a browser and you should see the description message you wrote down in the CSC portal.

Then you can write your queries
```
IP/QUERY
e.g.
http://128.214.253.150/api/v1/resources/emissions/latest
```
In that case, my code returns a JSON-formatted document.
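The returned document follows the `{'results': [...]}` shape built in the Flask code above, so a client can parse it with the standard library; the sample payload below is invented for illustration:

```python
import json

# Sample payload in the {'results': [...]} shape the API produces;
# all values here are made up for the example.
payload = '''{
    "results": [
        {"id": 1, "date_time": "2021-06-01 12:00:00",
         "country": "FI", "emdb": "exampledb",
         "em_prod": 120.5, "em_cons": 98.2}
    ]
}'''

data = json.loads(payload)
for entry in data['results']:
    print(entry['country'], entry['em_prod'])  # FI 120.5
```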

### Some tips
It is good to restart PostgreSQL and Apache2 after you edit a file, just to make sure the updated files are picked up by the system.
If you have problems with Python, you can debug the Python file using the following commands
```
. env/bin/activate
tail -f /var/log/apache2/error.log
```
Within the virtual environment you created, you can watch your Apache application's error log in real time, including Python errors (Ctrl+C to exit the tailing mode).
Make sure there is no syntax error throughout the files that are declared in the conf, wsgi, or python script.
## Working example
You can try an existing API using the design I created for the Making city project here:
https://app.swaggerhub.com/apis/jean-nicolas.louis/emission-and_power_grid_status/1.0.0