# Finlab_Data_Center by Django
## Purpose
Use the Django framework to build a data-center website that schedules crawlers and ETL functions with django-Q, a multiprocessing task queue for Django that offers high performance and is easy to control from the Django admin site.
To monitor the ETL processes, we store each hook result (log) in the django_admin database; these data can then be pushed to and analyzed on our website or by third parties.
## Data Flow

## Data center basic env framework

## MySQL Auth
```
mysql -u root
CREATE USER 'finlab'@'localhost' IDENTIFIED BY 'finlab';
GRANT ALL PRIVILEGES ON *.* TO 'finlab'@'localhost';
# change the password
ALTER USER 'finlab'@'localhost' IDENTIFIED BY 'new_password';
```
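A minimal sketch of how these credentials could be wired into Django's `DATABASES` setting; the alias, database name, and host below are placeholders, not the project's actual configuration (the real values live in config_json and the settings dir):
```
# settings/base.py -- illustrative only; alias, name, and host are placeholders
DATABASES = {
    "admin_db": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "admin_db",
        "USER": "finlab",
        "PASSWORD": "finlab",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    },
}
```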
## TW_stock_setup
```
git clone https://gitlab.com/finlab_company_class/tw_stock.git
cp -r tw_stock/data finlab_data_center/tw_stock_class/
```
### GCP Cloud SQL:
Add your public [IPv4 address](http://ipv4.whatismyv6.com/) to the Cloud SQL instance's authorized networks in GCP.
## Django command
### static file
[Run this command once locally if the Django admin CSS does not show when the site is served by Gunicorn.](https://zhuanlan.zhihu.com/p/31475023)
`python manage.py collectstatic`
### Set env and install module
```
virtualenv venv
source venv/bin/activate
# top-level dependencies
pip install -r requirements-to-freeze.txt
# freeze the full dependency tree into requirements.txt
pip freeze > requirements.txt
```
### DB related command
Only the admin_db router can be migrated, because the stock databases need to stay elastic. Use raw SQL to define tables in the stock databases, and remember to record the raw SQL files in the sql-records directory.
```
# default setting is dev
python manage.py migrate --database=admin_db
python manage.py migrate --settings=finlab_data_center.settings.pro --database=admin_db
```
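Because only admin_db accepts migrations, the router in db_router.py has to send Django's own apps there and block `migrate` on the stock databases. The sketch below is illustrative; the project's actual router and app labels may differ:
```
# db_router.py -- illustrative sketch; app labels and alias are assumptions
class AdminRouter:
    """Send Django's built-in apps to admin_db and block migrations elsewhere."""

    admin_apps = {"admin", "auth", "contenttypes", "sessions", "django_q"}

    def db_for_read(self, model, **hints):
        return "admin_db" if model._meta.app_label in self.admin_apps else None

    def db_for_write(self, model, **hints):
        return "admin_db" if model._meta.app_label in self.admin_apps else None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only admin_db is migrated; stock databases are defined with raw SQL.
        return app_label in self.admin_apps if db == "admin_db" else False
```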
### Create superuser
`python manage.py createsuperuser --database=admin_db`
### Run Jupyter with the shell_plus kernel
`python manage.py shell_plus --notebook`
#### Add the code below to the notebook to connect to the Django environment
```
import os
import sys

sys.path.append("..")  # make the project root importable from the notebook's directory
# Assumed settings path; adjust if the project's settings module differs.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "finlab_data_center.settings.dev")
import django
django.setup()
```
### Run python shell
`python manage.py shell`
### Run Server
```
# development/testing only
python manage.py runserver
# use Gunicorn in production (runserver is not production-safe)
gunicorn finlab_data_center.wsgi:application
```
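For more control than the bare command, Gunicorn can also read its options from a Python config file. A minimal, illustrative `gunicorn.conf.py`; the values are assumptions to tune per deployment:
```
# gunicorn.conf.py -- illustrative values only
bind = "0.0.0.0:8001"   # matches the PORT used in the Docker examples below
workers = 2             # rough rule of thumb: 2 * CPU cores + 1
timeout = 120           # allow slow requests before the worker is killed
accesslog = "-"         # write access logs to stdout for container logging
```
Start it with `gunicorn -c gunicorn.conf.py finlab_data_center.wsgi:application`.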
### Start app
`python manage.py startapp us_data`
### [Inspect DB](https://docs.djangoproject.com/en/3.1/howto/legacy-databases/)
Use the legacy database to export Django model definitions into the models file.
`python manage.py inspectdb --database=us_db > us_data/models.py`
## Create New Table Flow
1. Define the raw SQL and save it in the sql-records directory.
2. Use inspectdb to generate the Django model code (a sketch of the output appears after this list), for example:
```
# print the models to the terminal
python manage.py inspectdb --database=us_db
# export them to the models file
python manage.py inspectdb --database=us_db > us_data/models.py
```
3. Write the crawlers and use `CrawlerProcess.specified_date_crawl` or `SqlImporter.add_to_sql` to create and test the initial data. **Do not start with `CrawlerProcess.auto_update_crawl`.**
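As an illustration of steps 1-2: for a hypothetical `us_price` table defined by raw SQL, `inspectdb` produces roughly the following unmanaged model (the field names here are invented; `managed = False` keeps migrations away from the table):
```
# us_data/models.py -- roughly what inspectdb emits for a hypothetical us_price table
from django.db import models


class UsPrice(models.Model):
    date = models.DateField()
    stock_id = models.CharField(max_length=16)
    close = models.DecimalField(max_digits=12, decimal_places=4, blank=True, null=True)

    class Meta:
        managed = False              # the table is created by raw SQL, not by migrations
        db_table = 'us_price'
        unique_together = (('date', 'stock_id'),)
```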
## Crawlers
Use etl/import_sql.py to control the crawlers that import data.
### Demo
1. tw_data

2. us_data

## [Django Q](https://django-q.readthedocs.io/en/latest/index.html)
### Run django-Q
`python manage.py qcluster`
### Use task funcs
We have two task funcs (one for time-series data and one for non-time-series data) and one hook func.
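The project's real task funcs live in each app's tasks.py; the sketch below only shows their general shape with placeholder names, using Django Q's `async_task` and a hook that receives the finished Task object:
```
# tasks.py -- placeholder names, illustrating the shape of a task func and a hook func
from django_q.tasks import async_task


def crawl_time_series(table_name, date):
    """Placeholder task func: crawl one date of a time-series table."""
    ...


def log_result(task):
    """Hook func: Django Q passes the finished Task object here; its log stays in admin_db."""
    print(task.name, task.success, task.result)


# enqueue the task; the hook runs after the task finishes
async_task(crawl_time_series, "us_price", "2020-01-01", hook=log_result)
```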

### [Set django-Q cluster](https://django-q.readthedocs.io/en/latest/configure.html)
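An illustrative `Q_CLUSTER` block for the settings; the values below are assumptions, not the project's actual configuration:
```
# settings/base.py -- illustrative Django Q cluster settings
Q_CLUSTER = {
    "name": "finlab_data_center",
    "workers": 4,           # worker processes in the cluster
    "recycle": 500,         # restart a worker after this many tasks
    "timeout": 3600,        # kill long-running crawlers after an hour
    "retry": 4000,          # must be larger than timeout
    "queue_limit": 50,
    "save_limit": 250,      # how many successful task results to keep
    "orm": "admin_db",      # use the Django ORM broker on the admin database
}
```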

### [Set django-Q admin_site](https://django-q.readthedocs.io/en/latest/admin.html?highlight=admin#admin-pages)

### Schedule board

### CRUD for tasks
Args only accept strings or numbers.
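Schedules can also be created from code instead of the admin; a hedged example, where the dotted paths are placeholders and the args are passed as plain strings:
```
# illustrative only -- the dotted task and hook paths are placeholders
from django_q.models import Schedule
from django_q.tasks import schedule

schedule(
    "us_data.tasks.crawl_time_series",   # dotted path to the task func
    "us_price", "2020-01-01",            # args: strings or numbers only
    hook="us_data.tasks.log_result",
    schedule_type=Schedule.DAILY,
    repeats=-1,                          # repeat indefinitely
)
```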


## File framework for app
1. crawlers.py: the crawler implementations
2. tasks.py: the scheduled task functions
3. models.py: the data model definitions
## Notifications
Use email to send a scheduled daily report.
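A hedged sketch of such a report task, built on Django's `send_mail` and the task results Django Q stores; the email addresses are placeholders:
```
# illustrative daily report task -- email addresses are placeholders
from django.core.mail import send_mail
from django_q.models import Failure, Success


def send_daily_report():
    body = (
        f"Successful tasks: {Success.objects.count()}\n"
        f"Failed tasks: {Failure.objects.count()}"
    )
    send_mail(
        subject="Data center daily report",
        message=body,
        from_email="data-center@example.com",
        recipient_list=["team@example.com"],
    )
```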
## Server (Gunicorn)
## Deploy
### Docker commands (see /scripts)
#### Build, tag, push
```
docker build -t finlab-data-center -f Dockerfile .
docker tag finlab-data-center asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3
docker push asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3
```
#### run
```
docker run --env PORT=8001 --env DJANGO_ENV=pro -it -p 8001:8001 finlab-data-center
```
#### Use a volume to sync with local changes (no rebuild needed)
```
docker run --env PORT=8001 --env DJANGO_ENV=pro -it -p 8001:8001 -v "$(pwd):/app" finlab-data-center /bin/bash
```
### [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app?hl=zh-tw)
#### Create Cluster
```
# init
gcloud config set project rare-mender-288209
gcloud config set compute/zone asia-east1-a
gcloud container clusters create finlab-micro-service --num-nodes=1
gcloud container clusters get-credentials finlab-micro-service
kubectl create deployment finlab-data-center --image=asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3
kubectl expose deployment finlab-data-center --type LoadBalancer \
--port 80 --target-port 8001
kubectl delete service finlab-data-center
gcloud container clusters delete finlab-micro-service
```
#### [GKE connect to Cloud SQL Proxy](https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine)
##### [medium reference](https://medium.com/google-cloud/connecting-cloud-sql-kubernetes-sidecar-46e016e07bb4)
##### 1. Use a secret to set the environment variables; the SQL proxy host must be localhost (127.0.0.1)
```
kubectl create secret generic cloudsql-db-credentials --from-literal=DBACCOUNT=finlab_admin --from-literal=DBPASSWORD=qetuadgj --from-literal=DBHOST=127.0.0.1
```
##### [2. Create a service account](https://blog.johnwu.cc/article/gcp-kubernetes-connect-to-cloudsql.html)
##### 3. Set the IAM service account key
```
gcloud iam service-accounts keys create ~/key.json \
--iam-account cloudsql-instance@rare-mender-288209.iam.gserviceaccount.com
kubectl create secret generic cloudsql-instance-credentials --from-file=service_account.json="/Users/benbilly3/key.json"
```
### First deploy (use k8s_deploy.yaml)
#### [How to write a k8s deploy YAML?](https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/master/examples/kubernetes/proxy_with_sa_key.yaml)
`kubectl create -f k8s_deploy.yaml `
### Rolling update (change the image version)
`kubectl set image deployment/finlab-data-center finlab-data-center=asia.gcr.io/rare-mender-288209/finlab-data-center:v0.1.3`
## Other notes
1. config_json: stores environment values such as DB connections and API keys
2. db_router.py: controls routing across the multiple database connections
3. settings dir: the default environment is dev; both environments inherit from base.py, and the Django controller is configured in base.py
4. CHANGELOG.md: records the work notes for every version
5. /scripts: stores the command shell scripts (.sh)
## Reference
- [Django](https://www.djangoproject.com/)
- [Django-Q](https://django-q.readthedocs.io/en/latest/)
- [Django-Q introduction at PyCon](https://www.youtube.com/watch?v=Ljs9e0kI7Ow)
- [Asynchronous tasks in Django with Django Q](https://www.valentinog.com/blog/django-q/)
- [GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app?hl=zh-tw#cloud-shell_3)