IBM-NTU - HackMD

# IBM-NTU ## Resource Constrained Pattern Matching Paper link (NTU Template). https://www.overleaf.com/read/vpjhctznsksr ### Paper Structure 1. Introduction * Importantce of pattern matching * as key component of deep packer inspection * Breif intro to pattern matching * different approaches and their pros, cons * resource consumption * Deploy DPI on resource constrained is an issue * on smart NIC, for example * Shortcoming of prior work * DFA-based Rule Grouping requires much more memory * Intro and summarize Rule Balancer * Contribution 2. Background * Intro to Automaton-based pattern matching **(Ongoing**) * RegEx and pcre * DFA vs NFA, processing time and memory cost * Related Work * Rule grouping * Other complementary optimization approaches 3. Methodology * Problem Formulation :::danger Comment. So $k$ is the maximum number of $R_x$ ? ::: :::success Response. Nope, it's the number of cores we have. ::: :::danger Comment. It looks like $R_1 \sim R_k$ are the input parameters, but it should be the output of your optimization problem. Please check. ::: :::success Response. They are inputs of our objective function and the output of the optimization. There is no contradiciton. ::: :::danger Comment. R is previously defined as regular expression, why are we using it "r" ? ::: :::success Response. The captital letter R is used to represent **the set of regular expressions** while the small letter r is used to be a single regex. ::: * Evolutionary Rule Balancer * Overview :::danger Comment. It is not clear of $a_i$. Is this a group or a rule? ::: :::success Response. $a_i$ is the group ID for the $i$-th rule. The above sentence is the definition and the following is the explanation of the notation. ::: :::danger Comment. "fitness of chromosome" would be a new concept for a reader. You might include some backgroudn explanation about the GA algorithm in high level. ::: :::success Response. Fix the order of the sentences ::: :::danger Comment. Use vectorized image for 3.2 and 3.3 ::: :::success Response. What does vectorized image mean? ::: :::danger Comment. Figures are not referenced in the text. ::: :::success Response. Add reference in the context! ::: * Regex Embedding * Case Study. NFA * Minimal Total NFA Size <--> Max-k Cut :::danger Comment. What is N? ::: :::success Response. The number of states of NFA for input regular expressions, as explained in the table 3.1. Add more discriptions in the proposition. ::: :::danger Comment. You may want to mention that K, the number of partitions, is given. ::: :::success Response. Yes, the number $k$ is supposed to be given. ::: :::danger Comment. This notation would be wrong, Please check. I think $V_i={r_{i_1}... r_{i_k}}$ would be correct. ::: :::success Response. The expression is correct. $k$ is the number of cores, not the maximal possible number of regex in a group. ::: :::danger Comment. Can we simplify this equation? You can use references of previous equations. ::: :::success Response. Unfortunately, the answer is no. This is just direct corollary from the definition (3.7). We didn't make any concrete constrcution, So the expression cannot be simplified at the moment. ::: :::danger Comment. What do you mean by "Hope to have" ? :) ::: :::success Response. Notice that the those paragraphs consists of not only the proofs, but also the motivations. The following expression is equivalent to reduce the problem to Max k-cut, which is our goal. That's why I said "Hope to have" :) ::: :::danger Comment. You may want to use different index for diferent i? r_{i,i} is a very confusion notation. ::: :::success Response. As a rule will be decomposed into many sub-patterns, I would like to use such notation. Maybe there is a more proper way to do so. ::: :::danger Comment. This equation is very hard to follow. You may want to simplify the equation and put the details in the appendix. this comment applies to 3.13-3.15. ::: :::success Response. Agree, still thinking ... ::: * Minimal Largest NFA Size <--> Multi-depot Salesman * Algorithm design 4. Experiment * Space-time tradeoff * Comparison of grouping algorithms * Evaluate NFA Size Estimator * Evaluate RegEx Embedding 5. Conclusion