# ASMPT
## alert server
```bash=
# trigger model_update
curl -X POST -H "Content-Type: application/json" -d '{
"commonLabels": {
"dag_id": "model_update",
"lot_ids": 5908,
"dates": "2024-08-09"
}
}' http://10.121.252.189:30888/api/v1/alerts
# get dagruns (model_update)
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "model_update",
"run_id": "model_update_2024-08-22T09:39:37.812110Z"
}' http://10.121.252.189:30888/api/v1/dags/runs
# trigger data_analyzer
curl -X POST -H "Content-Type: application/json" -d '{
"commonLabels": {
"dag_id": "data_analyzer",
"lot_ids": 5908,
"dates": "2024-08-09"
}
}' http://10.121.252.189:30888/api/v1/alerts
# get dagruns
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "data_analyzer",
"run_id": "data_analyzer_2024-08-22T09:26:23.667200Z"
}' http://10.121.252.189:30888/api/v1/dags/runs
# get result
curl -X GET "http://10.121.252.189:30888/api/v1/dags/driftreport?dag_id=data_analyzer&run_id=9"
```
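Assuming this deployment runs the same alert-service handler documented under API References below, a successful trigger should answer:
```bash=
"Alert received successfully!", 200
```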
## Grafana migration
```bash=
curl -X GET -H "Content-Type: application/json" http://10.121.252.189:30890/list_alert >> /home/jerry2024/MLOps-ASMPT/model-monitoring/examples/templates/alerts.json
# example alert IDs returned by list_alert:
#   bdu7ejskde5fkf
#   ddu7ejs6my8zka
curl -X POST -H "Content-Type: application/json" -d '{"alert_id": "bdu7ejskde5fkf"}' http://10.121.252.189:30890/delete_alert
curl -X POST -H "Content-Type: application/json" -d @../examples/templates/Hadoop.json http://172.18.0.2:30890/modify_dashboard
curl -X GET http://10.121.252.184:3000/api/search?query=Hadoop \
-H "Authorization: Bearer glsa_0YTZLoVuQ8OWcMFdVvi0ev65MtqQLElT_e94148c8" \
-H "Content-Type: application/json"
curl -X GET http://10.121.252.184:3000/api/dashboards/uid/ae8da2e5-e12a-4842-b244-e59f452d661e \
-H "Authorization: Bearer glsa_0YTZLoVuQ8OWcMFdVvi0ev65MtqQLElT_e94148c8" \
-H "Content-Type: application/json" >> /home/jerry2024/MLOps-ASMPT/model-monitoring/examples/templates/Hadoop.json
```
## Demo
### data analysis
```bash=
# trigger
curl -X POST -H "Content-Type: application/json" -d '[{
"labels": {
"dag_id": "data_analysis",
"lot_id": "ATWLOT-020123-0343-136-001",
"date": "2023-02-01"
}
}]' http://10.121.252.194:5888/api/v1/alerts
```
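The run state can then be polled the same way as for model_evaluate below; the run_id here is a placeholder to be filled in from the trigger response or the Airflow UI:
```bash=
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "data_analysis",
"run_id": "<run_id of the triggered run>"
}' http://10.121.252.194:5888/api/v1/dags/runs
```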
### model_evaluate
```bash=
# trigger
curl -X POST -H "Content-Type: application/json" -d '[{
"labels": {
"dag_id": "model_evaluate"
}
}]' http://10.121.252.194:5888/api/v1/alerts
# get dagruns
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "model_evaluate",
"run_id": "1719565873"
}' http://10.121.252.194:5888/api/v1/dags/runs
# see whether state is success / running
# get result
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "model_evaluate",
"run_id": "1719548649"
}' http://10.121.252.194:5888/api/v1/dags/response
```
### collector
```bash=
# list alerts
curl -X GET http://10.121.252.194:5222/list_alert >> /home/jerry2024/MLOps-ASMPT/model-monitoring/examples/logs/alerts.json
# list panels
curl -X POST -H "Content-Type: application/json" -d '{"dashboard": "MLOPS"}' http://10.121.252.194:5222/list_panel >> /home/jerry2024/MLOps-ASMPT/model-monitoring/examples/logs/panels.json
# create alert
curl -X POST http://10.121.252.194:5222/create_alert \
-H "Content-Type: application/json" \
-d @/home/jerry2024/MLOps-ASMPT/model-monitoring/examples/templates/alert.json
# delete alert
curl -X POST -H "Content-Type: application/json" -d '{"alert_id": "bd89b397-0b5b-4b7b-9676-e9446c6354a1"}' http://10.121.252.194:5222/delete_alert
# modify dashboards
curl -X POST -H "Content-Type: application/json" -d @/home/jerry2024/MLOps-ASMPT/model-monitoring/examples/templates/dashboard.json http://10.121.252.194:5222/modify_dashboard
# delete dashboards
curl -X POST -H "Content-Type: application/json" -d '{"dashboard": "DEMO"}' http://10.121.252.194:5222/delete_dashboard
```
## Architecture
* 20 cores, 16 GB RAM, 256 GB disk, Ubuntu 23.04 live server
* 10.121.252.7
* ports:
* grafana 3000
* prometheus 9090
* mlflow 8080
* http-server (mlflow exporter) 8001
* airflow 8000
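A quick reachability check over these ports (a sketch: it only asserts that something answers HTTP there, not that each service is healthy):
```bash=
# probe each service port and print the HTTP status code
for port in 3000 9090 8080 8001 8000; do
  printf 'port %s -> ' "$port"
  curl -s -o /dev/null -m 3 -w '%{http_code}\n' "http://10.121.252.7:${port}/"
done
```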
## API References
### mlflow exporter (to update metrics)
#### usage
```bash=
curl -X GET http://10.121.252.7:5111/update_score
```
#### expected result
```bash=
"Score updated successfully", 200
```
### alert service (to trigger DAGs)
#### usage
* The action type (`dag_id`) can be `model_evaluate`, `model_update`, or `data_analysis`.
```bash=
curl -X POST -H "Content-Type: application/json" -d '[{
"labels": {
"dag_id": "data_analysis",
"lot_id": "ATWLOT-020123-0343-136-001",
"date": "2023-02-01"
}
}]' http://10.121.252.7:5888/api/v1/alerts
curl -X POST -H "Content-Type: application/json" -d '[{
"labels": {
"dag_id": "model_evaluate"
}
}]' http://10.121.252.7:5888/api/v1/alerts
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "data_analysis",
"run_id": ""
}' http://10.121.252.7:5888/api/v1/dags/response
curl -X POST -H "Content-Type: application/json" -d '{
"dag_id": "model_evaluate",
"run_id": "1718183652"
}' http://10.121.252.7:5888/api/v1/dags/runs
curl -X POST -H "Content-Type: application/json" -d '{"run_id": "17174832558"}' http://10.121.252.7:5888/api/v1/dags/response/put
```
#### expected result
```bash=
"Alert received successfully!", 200
```
#### Todos
* see the result of each DAG
* may need to pass more data to the DAG
### Collectors (control Grafana)
```bash=
# list alert
curl -X GET http://10.121.252.7:5222/list_alert
# list panels
curl -X POST -H "Content-Type: application/json" -d '{"dashboard": "HDFS"}' http://10.121.252.7:5222/list_panel
# delete alerts
curl -X POST -H "Content-Type: application/json" -d '{"alert_id": "a4e4c499-e7c0-445e-bbf5-d7972b3154d3"}' http://10.121.252.7:5222/delete_alert
# create alerts
curl -X POST http://10.121.252.7:5222/create_alert \
-H "Content-Type: application/json" \
-d @alert_template.json
# modify dashboards
curl -X POST -H "Content-Type: application/json" -d @/home/jerry2024/MLOps-ASMPT/model-monitoring/examples/templates/dashboard.json http://10.121.252.7:5222/modify_dashboard
# delete dashboards
curl -X POST -H "Content-Type: application/json" -d '{"dashboard": "HDFS"}' http://10.121.252.7:5222/delete_dashboard
# --- test commands ---
# test
curl -s -X 'GET' -u admin:admin 'http://10.121.252.7:3000/api/v1/provisioning/alert-rules/export' -H 'accept: application/json' | jq --sort-keys '.groups[].rules[]' > process_memory_copy_group_rules.json
curl -s -X 'GET' -u admin:admin 'http://10.121.252.7:3000/api/folders' -H 'accept: application/json'
curl -s -X 'GET' -u admin:admin 'http://10.121.252.7:3000/api/search?query=HDFS' -H 'accept: application/json'
# get a dashboard by uid
curl -s -X 'GET' -u admin:admin 'http://10.121.252.7:3000/api/dashboards/uid/dd75c664-4c59-417f-a388-ba918a0cf820' -H 'accept: application/json' > dashboard_template2.json
curl -s -X POST -u admin:admin 'http://10.121.252.7:3000/api/dashboards/db' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d @dashboard_template.json
```
## Project Deployments
### Hive inotify
* code
* https://github.com/kevin1010607/MLOps-ASMPT/tree/model-monitor/model-monitoring/examples/inotify
* prometheus.yml (/etc/prometheus/prometheus.yml)
```yaml=
- job_name: 'hive-inotify'
  static_configs:
    - targets: ['10.121.240.106:9605']
```
```bash=
sudo systemctl reload prometheus
bash Startall.sh
```
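To confirm Prometheus picked up the new job after the reload (a sketch, assuming Prometheus is reachable on localhost:9090 and jq is installed):
```bash=
# query the targets API and print the health of the hive-inotify target
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | select(.labels.job=="hive-inotify") | .health'
```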
### ALERT SERVER
#### build
```bash=
docker build -t johnson684/alert-server:latest .
docker push johnson684/alert-server:latest
```
#### Pods
```bash=
kubectl delete -f alert_server.yaml
kubectl apply -f alert_server.yaml
```
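A quick status check after re-applying (a sketch; `deployment/alert-server` assumes that is the Deployment name defined in alert_server.yaml):
```bash=
kubectl get pods -o wide
kubectl logs deployment/alert-server --tail=20
```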
### Metrics collector & mlflow-exporter
```bash=
python metric-exporter.py
python collector.py
```
---
## Installations
### kafka
The install follows the official quickstart tutorial:
https://kafka.apache.org/quickstart
```bash=
bash kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic hive_table_events
```
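To sanity-check the topic, the quickstart's console producer and consumer can be used (run from the Kafka bin directory, like the command above):
```bash=
# type messages into the producer, read them back with the consumer
bash kafka-console-producer.sh --topic hive_table_events --bootstrap-server localhost:9092
bash kafka-console-consumer.sh --topic hive_table_events --from-beginning --bootstrap-server localhost:9092
```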
### kafka-python
```bash=
pip install git+https://github.com/dpkp/kafka-python.git
```
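A minimal import check to confirm the GitHub build installed correctly:
```bash=
python -c 'import kafka; print(kafka.__version__)'
```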
### qemu
```bash=
sudo apt-get install qemu-guest-agent
sudo systemctl start qemu-guest-agent
```
### Docker
#### install
```bash=
# remove unofficial packages
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
# install the latest version
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# verify
sudo docker run hello-world
```
#### permission denied
* permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json": dial unix /var/run/docker.sock: connect: permission denied
```bash=
sudo chmod 666 /var/run/docker.sock
```
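Note that the chmod is reset whenever the socket is recreated and opens the daemon to every local user; the durable fix is membership in the docker group:
```bash=
sudo usermod -aG docker $USER
newgrp docker   # or log out and back in
docker ps       # should now work without sudo
```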
### Kind
#### install
```bash=
# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
```
#### kubectl
```bash=
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```
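With both installed, a throwaway cluster can be brought up to verify the toolchain (the cluster name `mlops` is just an example):
```bash=
kind create cluster --name mlops
kubectl cluster-info --context kind-mlops
kind delete cluster --name mlops
```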
### prometheus & grafana
```bash=
sudo apt install -y prometheus prometheus-node-exporter
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana-enterprise
sudo systemctl enable --now grafana-server
```
#### graphite
- used to export Spark metrics
- add the following job to prometheus.yml, then reload Prometheus
- mapping file:
```yaml=
mappings:
- match: '*.*.executor.filesystem.*.*'
name: spark_app_filesystem_usage
labels:
application: $1
executor_id: $2
fs_type: $3
qty: $4
- match: '*.*.jvm.*.*'
name: spark_app_jvm_memory_usage
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.executor.jvmGCTime.count'
name: spark_app_jvm_gcTime_count
labels:
application: $1
executor_id: $2
- match: '*.*.jvm.pools.*.*'
name: spark_app_jvm_memory_pools
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.executor.threadpool.*'
name: spark_app_executor_tasks
labels:
application: $1
executor_id: $2
qty: $3
- match: '*.*.BlockManager.*.*'
name: spark_app_block_manager
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.DAGScheduler.*.*'
name: spark_app_dag_scheduler
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.CodeGenerator.*.*'
name: spark_app_code_generator
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.HiveExternalCatalog.*.*'
name: spark_app_hive_external_catalog
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.*.StreamingMetrics.*.*'
name: spark_app_streaming_metrics
labels:
application: $1
executor_id: $2
app_name: $3
type: $4
qty: $5
- match: '*.*.executor.filesystem.*.*'
name: filesystem_usage
labels:
application: $1
executor_id: $2
fs_type: $3
qty: $4
- match: '*.*.executor.threadpool.*'
name: executor_tasks
labels:
application: $1
executor_id: $2
qty: $3
- match: '*.*.executor.jvmGCTime.count'
name: jvm_gcTime_count
labels:
application: $1
executor_id: $2
- match: '*.*.executor.*.*'
name: executor_info
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.jvm.*.*'
name: jvm_memory_usage
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.jvm.pools.*.*'
name: jvm_memory_pools
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.BlockManager.*.*'
name: block_manager
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.driver.DAGScheduler.*.*'
name: DAG_scheduler
labels:
application: $1
type: $2
qty: $3
- match: '*.driver.*.*.*.*'
name: task_info
labels:
application: $1
task: $2
type1: $3
type2: $4
qty: $5
```
```yaml=
- job_name: 'graphite_exporter'
  static_configs:
    - targets: ['10.121.251.37:9108']
```
- How to start it:
```bash=
./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping
```
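To verify the mapping, a test metric can be pushed over the Carbon plaintext protocol (a sketch, assuming the exporter's default ports: :9109 for Graphite ingest, :9108 for the Prometheus scrape endpoint; nc flags vary slightly between netcat implementations):
```bash=
# matches '*.*.jvm.*.*' and should appear as spark_app_jvm_memory_usage
echo "app_1.1.jvm.heap.used 42 $(date +%s)" | nc -w1 localhost 9109
curl -s http://localhost:9108/metrics | grep spark_app_jvm_memory_usage
```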
### jmx
```bash=
java -jar jmx_prometheus_httpserver-0.20.0.jar 12345 config.yaml
```
* config
```yaml=
hostPort: localhost:36892
rules:
- pattern: ".*"
```
* hadoop-env.sh
```sh=
export HDFS_NAMENODE_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=36892"
```
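Once the NameNode is restarted with these JMX options and the exporter is running, the metrics should be visible on the HTTP port given on the command line (12345 above):
```bash=
curl -s http://localhost:12345/metrics | head -n 20
```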
### py-venv
```bash=
sudo apt install python3-pip
sudo apt install python3-venv
# create and activate the virtual environment, then install dependencies
python3 -m venv env
source env/bin/activate
python -m ensurepip --default-pip
pip install -r requirements.txt
```
## Background (tmux)
```bash=
source /home/jerry2024/deepdata/env/bin/activate
mlflow server --host 127.0.0.1 --port 8080
python prometheus.py
airflow webserver --port 8000
airflow scheduler
```
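These can be kept alive in one tmux session in the same style as the hadoop exporters below; the session name `mlops` and pane layout are illustrative:
```bash=
tmux new-session -s mlops -d
tmux send-keys -t mlops:0 'source /home/jerry2024/deepdata/env/bin/activate && mlflow server --host 127.0.0.1 --port 8080' C-m
tmux split-window -h -t mlops
tmux send-keys -t mlops:0.1 'source /home/jerry2024/deepdata/env/bin/activate && airflow scheduler' C-m
```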
## airflow (in env)
```bash=
export AIRFLOW_HOME=~/airflow
AIRFLOW_VERSION=2.8.2
# Extract the version of Python you have installed. If you're currently using a Python version that is not supported by Airflow, you may want to set this manually.
# See above for supported versions.
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example this would install 2.8.2 with python 3.8: https://raw.githubusercontent.com/apache/airflow/constraints-2.8.2/constraints-3.8.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db migrate
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
# password (entered at the prompt): jerry2024
airflow webserver --port 8000
airflow scheduler
# stop the airflow webserver
ps -ef | grep 'airflow' | grep 'webserver' | awk '{print $2}' | xargs kill -9
cd $AIRFLOW_HOME
rm -rf airflow-webserver.pid
rm -rf airflow-webserver-monitor.pid
# stop the airflow scheduler
ps -ef | grep 'airflow' | grep 'scheduler' | awk '{print $2}' | xargs kill -9
cd $AIRFLOW_HOME
rm -rf airflow-scheduler.pid
```
## hadoop tmux
```bash=
# exporter for hadoop
cd /home/hdoop/jmx_exporter
java -jar jmx_prometheus_httpserver-0.20.0.jar 12345 config.yaml
# exporter for spark
cd /usr/local/graphite_exporter
./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping
## open
tmux new-session -s exporters -d
tmux send-keys -t exporters:0 'cd /home/hdoop/jmx_exporter' C-m
tmux send-keys -t exporters:0 'java -jar jmx_prometheus_httpserver-0.20.0.jar 12345 config.yaml' C-m
tmux split-window -h -t exporters
tmux send-keys -t exporters:1 'cd /usr/local/graphite_exporter' C-m
tmux send-keys -t exporters:1 './graphite_exporter --graphite.mapping-config=graphite_exporter_mapping' C-m
## detach (the exporters keep running); kill-session stops them for good
tmux detach -s exporters
# tmux kill-session -t exporters
```
## npm
```bash=
sudo apt install npm
```
## java & spark
[Notion](https://dasbd72.notion.site/Spark-Client-Installization-d2159c74379b4e45a279247de6649bc3)
---
## tmux usage
[tutorial](https://andyyou.github.io/2017/11/27/tmux-notes/)