MOD Log Survey
===

Key points:
1. Log parameter classification.
2. Study the difference between the specific words used in logs and those used in general articles.
3. Influence of word abbreviations.
4. Log level classification and instance mapping (e.g. 140.114.213.70 <-> elk.es.ntu.edu.tw).

Features of log files:
1. Short sentences holding dense information
2. Small word pool
3. Sentences with timestamps
4. Scheduled messages

Potential issues:
1. Abbreviations

Testing:
1. Extract the important words from the log file
2. Extract values
3. Overlapping / sentence structure

Input files:

Processing flow:
1. Log parameter classification: (140.114.175.38 -> IP, /var/data/ -> DIR)
2. Word abbreviations:

   |Origin                 |Abbreviation|
   |-----------------------|------------|
   |Central Processing Unit|CPU         |
   |Memory                 |Mem         |

3. Use log-level information to remove useless log entries
4. Label:
   - Probe-based label: use probes with specific information for classification (CPU load, network bandwidth, ...)
   - Type-based label: use general system information for classification (network, file system, ...)
5. Classification:
   - Baseline: bag of words with one-hot encoding
   - Embedding: use an embedding-based method
6. Combine the classification result with the extracted values

___

## Process Survey
1. Web log, preprocessing:<br>https://airccj.org/CSCP/vol1/cscp0101.pdf
2. Event log, batch, production stage:<br>https://www.researchgate.net/profile/Niels_Martin/publication/287198292_Batch_processing_definition_and_event_log_identification/links/567291bd08aeb8b21c70c44f/Batch-processing-definition-and-event-log-identification.pdf
3. LogCluster
4. Drain
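
As a rough illustration of steps 1, 2, 3 and the baseline of step 5 in the processing flow, here is a minimal Python sketch. It is not a settled design. Everything below is an assumption made for illustration: the regular expressions, the `<IP>` / `<DIR>` placeholder tags, the `timestamp LEVEL message` line layout, the set of levels to drop, and all function names. The classifier itself (step 5) and the labels (step 4) are left out; the sketch only produces the binary bag-of-words vectors a classifier would consume, alongside the extracted values needed for step 6.

```python
import re

# Hypothetical parameter patterns; the notes only give IP and DIR as examples.
PARAM_PATTERNS = [
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("DIR", re.compile(r"(?:/[\w.\-]+)+/?")),
]

# Abbreviation table from the notes, keyed by lowercase abbreviation.
ABBREVIATIONS = {"cpu": "central processing unit", "mem": "memory"}

# Assumed levels to discard; the notes do not say which levels count as useless.
DROP_LEVELS = {"DEBUG", "TRACE"}

# Assumed line layout: "<date> <time> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def classify_parameters(message):
    """Replace each parameter with its type tag and collect the raw values."""
    values = []
    for tag, pattern in PARAM_PATTERNS:
        def repl(match, tag=tag):
            values.append((tag, match.group(0)))
            return f"<{tag}>"
        message = pattern.sub(repl, message)
    return message, values


def tokenize(template):
    """Lowercase the words, expand abbreviations, and keep parameter tags as-is.

    Bare numbers are skipped here; they are treated as values, not words.
    """
    tokens = []
    for raw in re.findall(r"<[A-Z]+>|[A-Za-z]+", template):
        if raw.startswith("<"):
            tokens.append(raw)
        else:
            word = raw.lower()
            tokens.extend(ABBREVIATIONS.get(word, word).split())
    return tokens


def preprocess(lines):
    """Drop unparsable or low-level lines; return (template, values, tokens) rows."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("level") in DROP_LEVELS:
            continue
        template, values = classify_parameters(match.group("msg"))
        rows.append((template, values, tokenize(template)))
    return rows


def build_vocabulary(rows):
    """Sorted list of every distinct token seen in the preprocessed rows."""
    return sorted({token for _, _, tokens in rows for token in tokens})


def one_hot_bag(tokens, vocabulary):
    """Binary bag of words: 1 if the vocabulary word occurs in the message."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]
```

Calling `preprocess` on a list of raw lines, then `build_vocabulary` on the result, gives the fixed word order that `one_hot_bag` uses for each message. Keeping the extracted `(tag, value)` pairs next to each vector is one way to support step 6, where the predicted class and the values are combined.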