# Midware Construction Manual
###### tags: `By_Ivan`
## Intro
This document describes Osprey's midware. Leaving aside how data is collected from the clients, it focuses on how the collected information is transported into the database.
How to build the midware: [following paragraph](https://hackmd.io/XT7qc-bvR92wXQdtGVgRYg?view#Modules-Design-Pseudo-codes)
Osprey data workflow:
```plantuml
start
-Probes on ~~remote machine~~ (data collection)
-**Midware** (data parsing / adds machine info)
-Central machine (MySQL Database)
-Osprey (AIOPs)
-other features
```
## Input regulations
Osprey accepts CSV files as input. It also requires a JSON config file that defines the header.
Each input file must carry two timestamps. The first, encoded in the filename, is used by the input mechanism; the second, stored in each CSV line, records when the data point was taken.
```htmlembedded
#config.json
{"xxx_probe":{ # defines the output filename
    "probe_name_idx": [1,2], # indexes of the value's names, can be multiple
    "probe_attr_idx": [0], # indexes of the values, can be multiple
    "probe_time_idx": 4, # index of the data time (zero-based, the fifth column)
    "probe_server_idx": 3, # index of the client's (probe server) ip
    "title_time_format": "%Y%m%d_%H_%M", # defines the file's time, *MUST* be precise to the minute
    "data_time_format": "%Y%m%d_%H:%M:%S", # defines the data's time, *CAN* be precise to the second
    "header":["value", "index", "keyword_input", "probe_server", "datetime"]
}}
```
```bash
[root@VM probe]$ ls
xxx_probe20210619_15_20.csv
[root@VM probe]$ cat xxx_probe20210619_15_20.csv
1, "test_name1", "test_name2", "127.0.0.1", "20210619_15:20:30"
```
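The two timestamps can be read back with the format strings from the config. The following is a minimal sketch, not the actual Osprey parser: the helper names `parse_file_time` and `parse_row` are hypothetical, and the datetime is assumed to sit in the last column of the sample row (index 4, counting from zero).

```python
import csv
import io
from datetime import datetime

# Values taken from the config.json example
PROBE_NAME = "xxx_probe"
TITLE_TIME_FORMAT = "%Y%m%d_%H_%M"
DATA_TIME_FORMAT = "%Y%m%d_%H:%M:%S"
PROBE_TIME_IDX = 4  # assumption: zero-based position of the datetime column


def parse_file_time(filename):
    # Strip the probe name and the ".csv" suffix, then parse what is left
    stem = filename[len(PROBE_NAME):-len(".csv")]
    return datetime.strptime(stem, TITLE_TIME_FORMAT)


def parse_row(line):
    # skipinitialspace lets the reader accept the ', "quoted"' style of the sample
    row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    return row, datetime.strptime(row[PROBE_TIME_IDX], DATA_TIME_FORMAT)


file_time = parse_file_time("xxx_probe20210619_15_20.csv")
row, data_time = parse_row('1, "test_name1", "test_name2", "127.0.0.1", "20210619_15:20:30"')
print(file_time, data_time, row)
```

The filename time is precise to the minute while the data time keeps its seconds, which matches the two format strings.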
## Module Design concepts
There are three goals we wish to accomplish:
1. Scalability
2. Real time config adjustment
3. Real time monitoring
### 1. Scalability
As the product develops, new midware will be needed to collect information from various sources.
First, a basic framework is essential to lower the development effort. The framework does not strictly prescribe functions or algorithms, but most of the argument handling and driver functions are pre-implemented. The developer can therefore focus on how to retrieve the data, how to parse the text, and how to present it to the user.
Second, server resource management becomes a must as the number of monitored objects grows. In the current design, every probe uses the same number of threads and ports, and the same tool packages. This uniform setup should make the program easy to manage.
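One way to picture such a framework is a base class that owns the driver loop and leaves only retrieval and parsing to each new midware. This is a sketch under assumptions: `ProbeMidware`, `retrieve`, `parse`, `run_once` and `EchoProbe` are hypothetical names, not part of the existing code.

```python
from abc import ABC, abstractmethod


class ProbeMidware(ABC):
    """Hypothetical base class: the driver is shared, the data handling is per probe."""

    @abstractmethod
    def retrieve(self, target):
        """Return the raw text for one monitored target."""

    @abstractmethod
    def parse(self, raw):
        """Turn the raw text into the values of one CSV row."""

    def run_once(self, targets):
        # Pre-implemented driver: the same loop for every probe type
        return [self.parse(self.retrieve(target)) for target in targets]


class EchoProbe(ProbeMidware):
    # A toy probe that only has to say how to retrieve and how to parse
    def retrieve(self, target):
        return f"{target}=1"

    def parse(self, raw):
        name, value = raw.split("=")
        return [value, name]


rows = EchoProbe().run_once(["cpu", "mem"])
print(rows)
```

A new probe type would then only add a subclass, while threading, ports and logging stay in the shared part.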
### 2. Real time config adjustment
The midware should run as a background service and send the queried data over the connection automatically. Preferably, the program should not have to be shut down when a setting is adjusted, such as adding or deleting a monitor target.
To make this work, a dictionary object that controls the query behavior is shared between the main module and the UI module.
Users can configure this object through the UI endpoints. Whenever they modify it, the update changes what is queried and how.
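The sharing works because both sides hold a reference to the same dictionary, so an in-place change made by one thread is seen by the other. A minimal sketch, where `ui_update` and `query_targets` are hypothetical stand-ins for an endpoint and for the query side:

```python
import threading

config = {"Names": ["host-a"]}  # hypothetical probe section of the config


def ui_update(cfg, name):
    # Stand-in for an update endpoint: mutates the shared dict in place
    cfg["Names"].append(name)


def query_targets(cfg):
    # Stand-in for the query side: reads the same dict on every cycle
    return list(cfg["Names"])


before = query_targets(config)
ui_thread = threading.Thread(target=ui_update, args=(config, "host-b"))
ui_thread.start()
ui_thread.join()
after = query_targets(config)
print(before, after)
```

Note that this only holds while the dictionary is modified in place; rebinding the name to a new dictionary on one side would silently break the link.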
#### Room for improvement: Overhead in preventing error
The query section is expected to use one or more for loops. To keep a modification from breaking a loop iteration, the dictionary object is therefore copied at the start of every query action.
This could possibly be solved by signals or some "check update" mechanism, so that the copy is executed only when there is a request from the endpoint. However, preventing users from accidentally or intentionally spamming the copy mechanism is an important and somewhat tricky problem.
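One possible "check update" mechanism is a dirty flag guarded by a lock: endpoints only set the flag, and the query side copies at most once per cycle no matter how many updates arrived. This is a sketch of that idea, not the current implementation; `SharedConfig` and its `copies` counter are hypothetical.

```python
import copy
import threading


class SharedConfig:
    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()
        self._dirty = True      # force a copy on first use
        self._snapshot = None
        self.copies = 0         # only here to make the behavior visible

    def update(self, key, value):
        # Called by the endpoints: cheap, never copies
        with self._lock:
            self._data[key] = value
            self._dirty = True

    def snapshot(self):
        # Called at the start of a query: copies only if something changed
        with self._lock:
            if self._dirty:
                self._snapshot = copy.deepcopy(self._data)
                self._dirty = False
                self.copies += 1
            return self._snapshot


cfg = SharedConfig({"Names": ["host-a"]})
first = cfg.snapshot()
second = cfg.snapshot()                               # nothing changed: same copy reused
cfg.update("Names", ["host-a", "host-b"])
cfg.update("Names", ["host-a", "host-b", "host-c"])   # repeated updates still cost one copy
third = cfg.snapshot()
print(cfg.copies, first, third)
```

With this layout, spamming the endpoint raises the number of flag writes but not the number of copies, since copying is driven by the query cycle.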
### 3. Real time monitoring
Osprey is designed to record values at 1-minute intervals. Considering the number of monitored objects expected in the future, I run the timer loop and the query functions in separate threads. This is meant to keep the query tasks from pushing the loop past its 1-minute limit.
Upon each iteration, the main thread spawns a sub-thread in charge of querying and CSV recording. While the main thread waits for the next minute, the sub-thread does the following: extracts the target list from the config, sends messages to the probes through some other method (not covered in this document), parses the returned messages, and writes them to a new CSV file.
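The timing part can be isolated into a small function that computes how long the main thread should sleep to wake up on the next minute boundary. The sketch below mirrors the arithmetic used in the main module; the names `seconds_until_next_minute` and `tick` are hypothetical, and `done.append` merely stands in for the query job.

```python
import datetime
import threading


def seconds_until_next_minute(now):
    # Time already spent inside the current minute
    elapsed = datetime.timedelta(seconds=now.second, microseconds=now.microsecond)
    return (datetime.timedelta(minutes=1) - elapsed).total_seconds()


def tick(job, *args):
    # Spawn the sub-thread for one cycle and return immediately
    thd = threading.Thread(target=job, args=args)
    thd.start()
    return thd


done = []
thd = tick(done.append, "cycle 1")
thd.join()  # the real loop would sleep instead of joining
wait = seconds_until_next_minute(datetime.datetime(2021, 6, 19, 15, 20, 30))
print(done, wait)
```

Because the sleep is recomputed from the wall clock on every cycle, small delays in spawning the thread do not accumulate over time.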
#### Room for improvement: Thread handling and async functions
In the current design, the sub-thread exits when its tasks are finished. Creating a new thread every minute is resource-heavy and can be improved.
In my view, with async functions and some real-time multithreading techniques, it should be possible to time the query action correctly while keeping the sub-thread alive. This is yet to be built by someone with the relevant experience, since I do not have it.
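One simple way to keep the sub-thread alive, without async functions, is to let it block on a queue and have the timer loop hand it one item per minute. This is only a sketch of that alternative, using the standard `queue` and `threading` modules; the `worker` function and the sentinel convention are assumptions.

```python
import queue
import threading


def worker(ticks, results):
    # Long-lived sub-thread: blocks until the timer hands it a tick
    while True:
        tick = ticks.get()
        if tick is None:  # sentinel: shut down cleanly
            break
        results.append(f"queried at tick {tick}")


ticks = queue.Queue()
results = []
thd = threading.Thread(target=worker, args=(ticks, results), daemon=True)
thd.start()

for n in range(3):  # the real loop would sleep until the next minute between puts
    ticks.put(n)
ticks.put(None)
thd.join()
print(results)
```

A side effect of this layout is that a query running longer than a minute delays the next one instead of overlapping with it, which may or may not be the desired behavior.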
## Modules Design and codes
The graph of the midware pipeline:

#### Packages and requirements
This framework is built with these packages:
Flask, json, logging, threading, atexit, argparse, datetime
os, sys, time
### Midware.conf.json:
```json
{
    "server": {
        "host": "Machine_IPaddress",
        "password": "PSWd",
        "user": "USEr",
        "UI_address": {
            "host": "0.0.0.0",
            "port": 8888
        }
    },
    "probe": {
        "Names": [ "Objects" ]
    }
}
```
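Since the main module reads `server` and `probe` directly from this file, it can help to check the expected keys right after loading. The following is a hedged sketch: `load_config` and `REQUIRED_SERVER_KEYS` are hypothetical, and the key list is inferred from the sample above rather than from a formal schema.

```python
import json

# Assumption: the keys shown in the sample file are all required
REQUIRED_SERVER_KEYS = ("host", "user", "password", "UI_address")


def load_config(text):
    config = json.loads(text)
    missing = [key for key in REQUIRED_SERVER_KEYS if key not in config.get("server", {})]
    if missing:
        raise KeyError(f"server section is missing: {missing}")
    if "probe" not in config:
        raise KeyError("probe section is missing")
    return config


sample = """
{
    "server": {
        "host": "Machine_IPaddress",
        "password": "PSWd",
        "user": "USEr",
        "UI_address": {"host": "0.0.0.0", "port": 8888}
    },
    "probe": {"Names": ["Objects"]}
}
"""
config = load_config(sample)
ui_kwargs = config["server"]["UI_address"]  # later passed to the UI thread as keyword arguments
print(ui_kwargs)
```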
### Main module:
```python=
import argparse
import atexit
import datetime
import json
import logging
import os
import sys
import threading
import time
from logging.handlers import RotatingFileHandler

import xxx_query as q
import xxx_UI as ui
from daemonize import daemon


class D(daemon):
    def run(self):
        # Set up config
        self.config_setup()
        server = self.config["server"]
        q.login(host=server["host"], user=server["user"], password=server["password"])
        # Start UI thread (UI_SETUP is a placeholder, expanded below)
        UI_SETUP()
        ui_th.start()
        # Loop runs every 1 minute. It is expected to stop on KeyboardInterrupt or SIGTERM
        while True:
            thd = threading.Thread(target=q.bulk_query, args=(self.config["probe"], self.out_Dir))
            thd.start()
            # 1 minute timer: sleep until the start of the next minute
            now = datetime.datetime.now()
            delta = datetime.timedelta(minutes=1) - datetime.timedelta(seconds=now.second, microseconds=now.microsecond)
            time.sleep(delta.total_seconds())

    def __init__(self, args):
        # Init and arguments setup (INIT_SETUP is a placeholder, expanded below)
        INIT_SETUP()
        # Rotating log and logging level: each -v lowers the level by 10, down to DEBUG
        logging.getLogger().addHandler(RotatingFileHandler(args.log))
        level = max(logging.DEBUG, logging.WARNING - 10 * args.v)
        logging.getLogger().setLevel(level)

    def config_setup(self):
        # Load the json file and register a hook that records the config when exiting
        with open(self.conf_Path) as fp:
            self.config = json.load(fp)
        atexit.register(self._config_record)

    def _config_record(self):
        # Write the (possibly modified) config back to disk
        if self.config:
            with open(self.conf_Path, "w") as fp:
                json.dump(self.config, fp, indent=4)


if __name__ == "__main__":
    # MIN_VERSION and MAX_VERSION are version tuples defined elsewhere
    assert MIN_VERSION < sys.version_info < MAX_VERSION
    main_path = os.path.realpath(__file__)
    prj_dir_path = os.path.dirname(main_path)
    # Argument parsing (PARSER_SETUP is a placeholder, expanded below)
    parser = argparse.ArgumentParser(description='Description.', usage='%(prog)s [-h] [options] {start,stop,restart,daemon}')
    PARSER_SETUP()
    args = parser.parse_args()
    # Init application
    APP = D(args)
    if args.command == "daemon":
        APP.start()
    elif args.command == "restart":
        APP.restart()
    elif args.command == "stop":
        APP.stop()
    elif args.command == "start":
        APP.start(if_daemon=False)
```
:::spoiler INIT_SETUP()
```python=
super(D,self).__init__()
self.pidfile = args.pidfile
self.conf_Path = args.config
self.out_Dir = args.out_Dir
self.config = {}
```
:::
:::spoiler UI_SETUP()
```python
ui.app._config = self.config["probe"] # share the same dict with the UI (a reference, not a copy)
ui_th = threading.Thread(target=ui.app.run, kwargs=server["UI_address"])
ui_th.daemon = True # setDaemon() is deprecated since Python 3.10
```
:::
:::spoiler PARSER_SETUP()
```python
parser.add_argument('command', choices=["start","stop","restart","daemon"])
parser.add_argument("-c","--config", metavar="path", default=os.path.join(prj_dir_path,"midware.conf"))
parser.add_argument("-l","--log", metavar="path", default=os.path.join(prj_dir_path,".midware.log"))
parser.add_argument("-o","--out_Dir", metavar="path", default=os.path.join(prj_dir_path,"probe/"))
parser.add_argument("-p","--pidfile", metavar="path", default=os.path.join(prj_dir_path,".pidfile"))
parser.add_argument("-v", action="count", default=0)
```
:::
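The `-v` counter ties into the logging level computed in `__init__`: each `-v` lowers the level by 10, starting from `WARNING`. A reduced sketch with only the two relevant arguments, where `level_for` is a hypothetical helper:

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("command", choices=["start", "stop", "restart", "daemon"])
parser.add_argument("-v", action="count", default=0)


def level_for(argv):
    # Same arithmetic as in the main module: WARNING minus 10 per -v
    args = parser.parse_args(argv)
    return args.command, logging.WARNING - 10 * args.v


print(level_for(["start"]))
print(level_for(["daemon", "-v"]))
print(level_for(["daemon", "-vv"]))
```

So no flag keeps `WARNING`, `-v` gives `INFO`, and `-vv` gives `DEBUG`.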
### Query module:
```python=
```
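The query module is left empty above. Based only on how the main module calls it and on the steps described under "Real time monitoring", a `bulk_query` could look roughly like the sketch below. Everything here is an assumption: `query_probe` is a hypothetical stand-in for the real probe call (which this document does not cover), the `now` and `probe_name` parameters are added for illustration, and the column layout is simplified.

```python
import copy
import csv
import os
import tempfile
from datetime import datetime

TITLE_TIME_FORMAT = "%Y%m%d_%H_%M"
DATA_TIME_FORMAT = "%Y%m%d_%H:%M:%S"


def query_probe(target):
    # Hypothetical stand-in for the real probe request
    return {"value": 1, "name": target, "server": "127.0.0.1"}


def bulk_query(probe_config, out_dir, now=None, probe_name="xxx_probe"):
    now = now or datetime.now()
    # Work on a snapshot so a UI update cannot break the loop below
    targets = copy.deepcopy(probe_config).get("Names", [])
    filename = probe_name + now.strftime(TITLE_TIME_FORMAT) + ".csv"
    path = os.path.join(out_dir, filename)
    with open(path, "w", newline="") as fp:
        writer = csv.writer(fp)
        for target in targets:
            reply = query_probe(target)
            writer.writerow([reply["value"], reply["name"], reply["server"],
                             now.strftime(DATA_TIME_FORMAT)])
    return path


out_dir = tempfile.mkdtemp()
path = bulk_query({"Names": ["host-a", "host-b"]}, out_dir,
                  now=datetime(2021, 6, 19, 15, 20, 30))
print(path)
```

The filename carries the minute-precision time and each row carries the second-precision time, following the input regulations.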
### UI module:
```python=
from flask import Flask, jsonify, request

app = Flask(__name__)
app._config = {}  # replaced by the main module with the shared probe config


@app.route('/', methods=['GET'])
def index():
    return "Explanation of the port usage"


@app.route('/show', methods=['GET'])
def show():
    # Show the current config
    return jsonify(app._config)


@app.route('/reset', methods=['GET'])
def reset():
    # Set the config to default (set_to_Default is a placeholder, not defined here)
    set_to_Default(app._config)
    return "status"


@app.route('/update', methods=['POST'])
def update():
    # Update a particular object
    obj = request.form.get("Target_name", default=None)
    app._config.update({"Target_name": obj})
    return "status"


@app.route('/delete', methods=['POST'])
def delete():
    # Delete a particular object; dict.pop takes a key, not a dict
    app._config.pop("Target_name", None)
    return "status"
```
### Daemonize module:
Please refer to [this document](https://hackmd.io/@mcnlab538/BJNV9Zs8D).