---
title: PLs-and-Degrees-of-Freedom
author: bynx
tags: pl
---
# Programming Language Resolutions
## What is _Resolution_?
_Resolution_ is defined here à la _display resolution_: as the degrees of freedom vary, so too does our observable. In QFT this notion appears as _dimensionality_ or _cardinality_; we use the two terms interchangeably.
Classical representations of programming languages aim to reduce dimensionality in order to ease lexical, syntactic, and semantic analysis. The result is a _discrete resolution_ of a provided input sample program, $P$. Examples of such discrete resolutions can be seen in the [Canonical Representations](/kLE1slUeS1qZyjOAoeV4Xg##Canonical-Representations) section below.
## A Modest Proposition
Given a vector representation of a sample program, $P_{\lvert\psi\rangle}$, renormalization-group transformations of $\lvert P \rangle$ will validate universality for the canonical representations $P_{\gamma}$, $P_{\rho}$, $P_{\tau}$. As a result, dimensionality can be preserved while maintaining a discrete resolution, i.e., for every semantic mutation of $P$, a logically equivalent mutation exists at every resolution of $P$.
### Challenges
Assigning a semantic label to a code snippet (such as a name to a method) is an example of a class of problems that require a compact semantic descriptor of a snippet. The question is how to represent code snippets in a way that captures some semantic information, is reusable across programs, and can be used to predict properties such as a label for the snippet. This leads to two challenges:
1. Representing a snippet in a way that enables RG transformations
1. Learning which parts of the representation are relevant to predicting the desired property, and learning the order of importance of those parts.
## Canonical Representations
### Source Code
### Concrete Syntax Tree / Parse Tree / Derivation Tree
### Abstract Syntax Tree
A tree representation for the abstract _syntactic structure_ of source code
$\quad \rightarrow$ **_Node_**_: construct, such as statement, loop_
$\quad \rightarrow$ **_Edge_**_: containment relationship_
This makes it possible to apply all kinds of **syntax-directed** translations and transformations. An AST is a simplified parse tree: it retains the syntactic structure of the code while dropping concrete details such as punctuation.
**Definition 1 (Abstract Syntax Tree).** An Abstract Syntax Tree $(AST)$ for a program sample $P$ is a tuple:
$$
\Large\color{orchid}{P_{AST} = \lgroup\ \eta,\ \tau,\ \chi,\ s,\ \delta,\ \phi\ \rgroup}
$$
- $\eta$ is a **set of nonterminal nodes**
- $\tau$ is a **set of terminal nodes**
- $\chi$ is a **set of values**
- $s \in \eta$ is the **root node**
- $\delta : \eta \rightarrow (\eta \cup \tau)^{\ast}$ is a function that maps a nonterminal node to the list of its children
- $\phi : \tau \rightarrow \chi$ is a function that maps a terminal node to an associated value.
Every node except the root appears exactly once across all the lists of children. Next, we define $AST$ paths. For convenience, in the rest of this section we assume that all definitions refer to a single $P_{AST}$.
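As a sketch, Definition 1 can be written down directly as plain data; the node names (`AssignExpr`, `NameExpr`, `IntLit`) and their values are made up for illustration:

```python
# A minimal sketch of Definition 1: an AST as the tuple (η, τ, χ, s, δ, φ),
# held as plain dicts and sets. Node names are illustrative, not from any
# real parser.
nonterminals = {"AssignExpr"}            # η
terminals    = {"NameExpr", "IntLit"}    # τ
values       = {"x", 7}                  # χ
root         = "AssignExpr"              # s ∈ η
delta = {"AssignExpr": ["NameExpr", "IntLit"]}   # δ: nonterminal → children
phi   = {"NameExpr": "x", "IntLit": 7}           # φ: terminal → value

# Sanity check: every node except the root appears exactly once as a child.
children = [c for kids in delta.values() for c in kids]
assert all(children.count(n) == 1
           for n in (nonterminals | terminals) - {root})
```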
An $AST$ path is a path between nodes in the AST, starting from one terminal, ending in another terminal, and passing through an intermediate nonterminal in the path which is a common ancestor of both terminals. More formally:
**Definition 2 (AST path).** An $AST$-path of length $\kappa$ is a sequence of the form $n_{1}d_{1} \dots n_{\kappa}d_{\kappa} n_{\kappa+1}$ where:
- $n_{1},n_{\kappa+1} \in \tau$ are terminals
- $\forall i \in [2..\kappa]: n_{i} \in \eta$ are nonterminals
- $\forall i \in [1..\kappa]: d_{i} \in {\{\uparrow,\downarrow\}}$ are movement directions (either up or down in the tree), e.g.,
\begin{align}
(d_{i}\ &=\ \uparrow) \ \Rightarrow\ n_{i} \in \delta (n_{i+1} ) \\
(d_{i}\ &=\ \downarrow)\ \Rightarrow\ n_{i+1} \in \delta (n_{i})
\end{align}
For an $AST$-path $\rho \in P$, we use $start(\rho)$ to denote $n_{1}$ $-$ the starting terminal of $\rho$, and $end(\rho)$ to denote $n_{\kappa+1}$ $-$ its final terminal.
Using this definition we define a path-context as a tuple of an AST path and the values associated with its terminals:
**Definition 3 (AST path-context).** Given an $AST$ path $\rho$, its path-context is the triplet $\lgroup \chi_{s}, \rho, \chi_{\tau} \rgroup$ where $\chi_{s} = \phi(start(\rho))$ and $\chi_{\tau} = \phi(end(\rho))$ are the values associated with the start and end terminals of $\rho$, respectively.
That is, a path-context describes two actual tokens with the syntactic path between them.
*Example 1.* A possible path-context representing the statement $x = 7;$ would be:
$$
\color{orchid}{\lgroup\ x,\ (NameExpr\ \uparrow\ AssignExpr\ \downarrow\ IntegerLiteralExpr),\ 7\ \rgroup}
$$
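The same path-context can be recovered mechanically. As a sketch, here is the extraction for `x = 7` using Python's stdlib `ast` module in place of the Java-style node names above:

```python
import ast

# Extract the path-context for "x = 7": the path runs from one terminal
# (the Name) up to their common ancestor (the Assign) and down to the
# other terminal (the Constant), paired with the terminal values.
tree = ast.parse("x = 7").body[0]          # the Assign node
start, end = tree.targets[0], tree.value   # Name('x') and Constant(7)

path_context = (start.id,
                f"{type(start).__name__} ↑ {type(tree).__name__} ↓ {type(end).__name__}",
                end.value)
print(path_context)   # ('x', 'Name ↑ Assign ↓ Constant', 7)
```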
### Directed Acyclic Graph
$DAG$s are similar to an $AST$ but with _a unique node for each value._
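A quick sketch of that sharing via interning (hash-consing): building $x + x \cdot y$ yields four nodes, with both uses of $x$ referring to the same node.

```python
# DAG vs. AST: interning gives each distinct value exactly one node,
# so the two uses of `x` in  x + x * y  share a single node.
nodes = {}
def node(*key):
    # One node per distinct value: reuse the stored node if it exists.
    return nodes.setdefault(key, key)

x, y = node("var", "x"), node("var", "y")
expr = node("+", x, node("*", x, y))

assert node("var", "x") is x    # the same node object is reused
assert len(nodes) == 4          # x, y, x*y, and the sum — not 5
```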
### Control Flow Graph
A representation, using graph notation, of all paths that might be traversed through a program during its execution. A directed graph (usually for a single procedure) in which:
- Each node is a single basic block
- There is an edge $b_1 \rightarrow b_2$ if control _may_ flow from the last statement of $b_1$ to the first statement of $b_2$ in some execution.
_Note: $CFG$s are actually conservative approximations of the control flow_
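As a sketch, a CFG for the hypothetical snippet `if (c) { A } else { B }; C` can be held as an adjacency map, and "paths that might be traversed" becomes plain graph reachability:

```python
# A hypothetical CFG for:  if (c) { A } else { B }; C
# Nodes are basic blocks; an edge b1 → b2 means control MAY flow from
# the last statement of b1 to the first statement of b2.
cfg = {
    "entry": ["cond"],
    "cond":  ["A", "B"],   # conservative: both branches are kept
    "A":     ["C"],
    "B":     ["C"],
    "C":     ["exit"],
    "exit":  [],
}

def reachable(cfg, src):
    """All blocks reachable from src, i.e., possible executions."""
    seen, work = set(), [src]
    while work:
        b = work.pop()
        if b not in seen:
            seen.add(b)
            work.extend(cfg[b])
    return seen

print(sorted(reachable(cfg, "cond")))  # ['A', 'B', 'C', 'cond', 'exit']
```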
### Code Property Graph
A code property graph is a property graph, $G = (V,E,λ,μ)$, constructed from the $AST$, $CFG$, and $PDG$ of source code with:
\begin{align}
\color{orchid}{V} &\color{orchid}{= V_{A}} \\
\color{orchid}{E} &\color{orchid}{= E_{A} \cup E_{C} \cup E_{P}} \\
\color{orchid}{λ} &\color{orchid}{= λ_{A} \cup λ_{C} \cup λ_{P}} \\
\color{orchid}{μ} &\color{orchid}{= μ_{A} \cup μ_{P}}
\end{align}
where we combine the labeling and property functions with a slight abuse of notation.
[J. Ferrante, et al. _The program dependence graph and its use in optimization_. (1987)][pdg-doi]
[pdg-doi]: https://dl.acm.org/doi/abs/10.1145/24039.24041
### Program Dependence Graph
A _directed_ graph representing dependencies among:
- Code **Control** dependence
    - $A$ is control-dependent on $B$ if $B$'s execution decides whether or not $A$ is executed
- Code **Data** dependence (Data Dependence Graph $-$ $DDG$)
    - $A$ is data-dependent on $B$ if $A$ uses a variable defined in $B$
A $PDG$ contains both **_control dependence edges_** and **_data dependence edges_**.
### Points-to Graph
For a given program location and any object `reference/pointer`, compute all the possible `objects/variables` it may/must refer/point to:
- Connect together analyzed program semantics for individual methods
- Essential to expand intra-procedural analysis to inter-procedural
- Detect consistent usage of resources
- File `open`/`close`, `lock`/`unlock`, `malloc`/`free`
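As a rough sketch, a flow-insensitive may-point-to analysis over just two statement forms, `p = &x` and `p = q` (the program and the forms are made up for illustration), boils down to fixed-point propagation:

```python
# A toy, flow-insensitive may-point-to analysis. Statements:
#   address_of: p = &x   (p may point to x)
#   copies:     p = q    (p may point to whatever q points to)
address_of = [("p", "x"), ("q", "y")]   # p = &x ; q = &y
copies     = [("p", "q")]               # p = q

pts = {}
for p, x in address_of:
    pts.setdefault(p, set()).add(x)

changed = True
while changed:                 # propagate until a fixed point
    changed = False
    for dst, src in copies:    # p = q  ⇒  pts(p) ⊇ pts(q)
        before = len(pts.setdefault(dst, set()))
        pts[dst] |= pts.get(src, set())
        changed |= len(pts[dst]) != before

print(sorted(pts["p"]))   # ['x', 'y'] — p MAY point to either
```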
### Property Graph
A _property graph_ $G=(V,E,λ,μ)$, is a _directed, edge-labeled, attributed multigraph_ where $V$ is a **set of nodes**, $E$ is a **set of directed edges**, and $λ : E → Σ$ is an edge labeling function assigning a label from the alphabet $Σ$ to each edge.
Properties can be assigned to edges and nodes by the function:
$$
\large{\color{orchid}{μ:(V\ \bigcup E)×K→S}}
$$
where $K$ is the set of _property keys_ and $S$ the set of _property values_.
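A minimal sketch of such a property graph as plain data; the node ids, the `DECLARES` label, and the property keys are all illustrative:

```python
# G = (V, E, λ, μ) for a two-node graph with one labeled edge.
V = {1, 2}
E = {("e1", 1, 2)}                      # directed edge e1: 1 → 2
lam = {"e1": "DECLARES"}                # λ : E → Σ
mu  = {(1, "code"): "int x;",           # μ : (V ∪ E) × K → S
       (2, "code"): "x",
       ("e1", "line"): 3}

# Both nodes and edges can carry properties under the same function μ.
assert mu[(1, "code")] == "int x;"
assert mu[("e1", "line")] == 3
```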
### Call Graph
A directed graph representing _caller-callee relationship_ between methods/functions
$\quad \rightarrow$ **Node**: methods/functions
$\quad \rightarrow$ **Edges**: calls
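As a sketch, a caller-to-callee graph for direct calls can be recovered from Python source with the stdlib `ast` module (methods, lambdas, and indirect calls are deliberately ignored here):

```python
import ast

# Build a call graph: nodes are function definitions, edges are the
# direct name calls made inside each body.
src = """
def f(): g(); h()
def g(): h()
def h(): pass
"""

calls = {}
for fn in ast.walk(ast.parse(src)):
    if isinstance(fn, ast.FunctionDef):
        calls[fn.name] = sorted(
            n.func.id for n in ast.walk(fn)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name))

print(calls)   # {'f': ['g', 'h'], 'g': ['h'], 'h': []}
```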
### Static Single Assignment Form
A program is in $SSA$ form $iff$:
1. **each variable** is assigned a value in **exactly one** statement
1. **each variable** use is **dominated by the definition**
The $SSA$ Graph is a _directed graph_ in which:
$\quad \rightarrow$ **Node**: All definitions and uses of $SSA$ variables
$\quad \rightarrow$ **Edges**: $\{\ (d,\ u)\ :\ u$ uses the $SSA$ variable defined in $d\ \}$
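A sketch of SSA renaming for straight-line code (no control flow, hence no φ-functions; the textual rewriting is deliberately naive):

```python
# Rename straight-line code into SSA form: every assignment gets a fresh
# subscripted name, and every use is rewritten to the latest definition.
stmts = [("x", "1"), ("x", "x + 1"), ("y", "x * 2")]

counter, current, ssa = {}, {}, []
for var, rhs in stmts:
    for v, name in current.items():          # rewrite uses first (naive
        rhs = rhs.replace(v, name)           # string substitution)
    counter[var] = counter.get(var, 0) + 1   # then mint a fresh definition
    current[var] = f"{var}{counter[var]}"
    ssa.append((current[var], rhs))

print(ssa)   # [('x1', '1'), ('x2', 'x1 + 1'), ('y1', 'x2 * 2')]
```

Each resulting variable is now assigned in exactly one statement, satisfying condition 1 above.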
### Three-Address Code
A term used to describe many different representations of the form:
$$
\Large{\color{orchid}{\psi\ =\ \theta_{fn}\ \ op1\ \ op2\ \ op3}}
$$
where $\psi \in \Psi$ and $\theta \in \Theta$ are our statement and operation values, respectively; $op\star$ are _optional_ operands passed to $\theta_{fn}$.
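For instance, lowering the hypothetical statement `a = b + c * d` into this form introduces temporaries for the intermediate results:

```python
# Three-address lowering of  a = b + c * d  as (ψ, θ, op1, op2) tuples;
# t1 and t2 are temporaries introduced by the translation.
tac = [
    ("t1", "mul", "c", "d"),    # t1 = c * d
    ("t2", "add", "b", "t1"),   # t2 = b + t1
    ("a",  "mov", "t2", None),  # a  = t2
]
for dst, op, a1, a2 in tac:
    print(dst, "<-", op, a1, a2 or "")
```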
### Domain-Specific Languages
We define $A+B$ as the _disjoint union_ of two sets, $A$ and $B$:
\begin{align}
\large \color{orchid}{A+B} &\color{orchid}{\equiv \large A \sqcup B} \\
&\color{orchid}{\equiv \large (\ A×{\{0\}}\ ) \ \cup \ (\ B×{\{1\}}\ )}
\end{align}
_Note: sequences in the **free monoid** are denoted as a vector: $\large{\vec{v} \in A^{\ast}}$_
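The tagging with $\{0\}$ and $\{1\}$ is what keeps shared elements apart; a quick sketch:

```python
# Disjoint union A + B ≡ (A × {0}) ∪ (B × {1}): the tags prevent elements
# that A and B share from being conflated.
A, B = {"x", "y"}, {"y", "z"}
disjoint = {(a, 0) for a in A} | {(b, 1) for b in B}

assert len(disjoint) == len(A) + len(B)   # 4, not 3: 'y' survives twice
assert ("y", 0) in disjoint and ("y", 1) in disjoint
```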
### Context-free Grammar
A context-free grammar is a tuple $(\Sigma, X, R, s)$ where $\Sigma$ and $X$ are finite sets called **_terminals_** and the **_non-terminals_**, respectively.
- $R \subseteq X \times (X + \Sigma)^\star$ is a finite set of **_production rules_**
- $s \in X$ is called the **_start symbol_**
The **_language_** of a context-free grammar is given by:
$$
\large{\color{orchid}{L(G) = \{\ \vec{u} \in \Sigma^\star\ \vert\ s \to_R \vec{u}\ \}}}
$$
where the rewriting relation, $(\to_R) \subseteq (\Sigma + X)^\star \times (\Sigma + X)^\star$, is traditionally defined as the reflexive, transitive closure of the following directed graph:
$$
\large\color{orchid}{\{\ \
(\vec{u}\,x\,\vec{w},\ \ \vec{u}\,\vec{v}\,\vec{w})\quad
\Big\vert \quad
\vec{u}, \vec{w} \in (\Sigma+X)^{\ast},\ \ (x, \vec{v}) \in R\ \
\large\}}
$$
Note: Given a set $S$, the **_free monoid_** on $S$ is the set $S^{\ast}$ of all finite sequences of elements $s \in S$, made into a monoid using concatenation.
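As a sketch, membership in $L(G)$ for a toy grammar ($S \to aSb \mid ab$, so $L(G) = \{a^n b^n : n \ge 1\}$) can be decided by searching along the rewriting relation:

```python
# One nonterminal S with rules S → aSb | ab. A sentential form rewrites
# by replacing one occurrence of a nonterminal with a rule's right side.
rules = {"S": ["aSb", "ab"]}

def derives(sent, target, depth=6):
    """Depth-limited search for target along the rewriting relation →R."""
    if sent == target:
        return True
    if depth == 0 or len(sent) > len(target):   # rules never shrink here
        return False
    for i, sym in enumerate(sent):
        for rhs in rules.get(sym, []):
            if derives(sent[:i] + rhs + sent[i+1:], target, depth - 1):
                return True
    return False

print(derives("S", "aabb"), derives("S", "abab"))   # True False
```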
### Intermediate Representations
### Assembly Instructions
### Machine Code
---
## Language Translation Sample
Let's solidify that with an example in JavaScript:
### JavaScript
Source Code
```javascript=
/*
* ECMA-262/11
*/
const v0 = 1337;
const v1 = {a: v0};
for (let v2 in v1) {
    v2 = 1337;
    {
        const v4 = {a: v1};
        v2 = v4;
    }
}
```
### Abstract Syntax Tree (Domain-Specific Language)
```==
/*
* Custom ES AST
*/
'const' BindExpr ';'
'const' BindExpr ';'
ForInExpr '{' BindExpr ';' '{' 'const' BindExpr ';' BindExpr ';' '}'
'}'
```
### FuzzIL (Intermediate Representation)
Three-Address Code
```swift=
/*
* FuzzIL Translation
*/
v0 <- LoadInteger 1337
v1 <- CreateObject ['a':v0]
BeginForIn v1 -> v2
BeginBlockStatement
v3 <- LoadInteger 1337
Reassign v2, v3
BeginBlockStatement
v4 <- CreateObject ['a':v1]
Reassign v2, v4
EndBlockStatement
EndBlockStatement
EndForIn
```
### JavaScriptCore Bytecode
```==
bb#1
[ 0] enter
[ 1] get_scope dst:loc4
[ 3] mov dst:loc5, src:loc4
[ 6] check_traps
[ 7] mov dst:loc6, src:Undefined(const0)
[ 10] resolve_scope dst:loc7, scope:loc4, var:0, resolveType:GlobalProperty, localScopeDepth:0
[ 17] put_to_scope scope:loc7, var:0, value:Int32: 1337(const1), getPutInfo:1049600<DoNotThrowIfNotFound|GlobalProperty|ConstInitialization|NotStrictMode>, symbolTableOrScopeDepth:0, offset:0
[ 25] resolve_scope dst:loc7, scope:loc4, var:1, resolveType:GlobalProperty, localScopeDepth:0
[ 32] new_object dst:loc8, inlineCapacity:1
[ 36] resolve_scope dst:loc9, scope:loc4, var:1, resolveType:GlobalProperty, localScopeDepth:0
[ 43] get_from_scope dst:loc10, scope:loc9, var:1, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[ 51] put_by_id base:loc8, property:2, value:loc10, flags:IsDirect
[ 57] put_to_scope scope:loc7, var:1, value:loc8, getPutInfo:1049600<DoNotThrowIfNotFound|GlobalProperty|ConstInitialization|NotStrictMode>, symbolTableOrScopeDepth:0, offset:0
[ 65] mov dst:loc6, src:Undefined(const0)
[ 68] mov dst:loc7, src:<JSValue()>(const2)
[ 71] resolve_scope dst:loc8, scope:loc4, var:1, resolveType:GlobalProperty, localScopeDepth:0
[ 78] get_from_scope dst:loc9, scope:loc8, var:1, getPutInfo:2048<ThrowIfNotFound|GlobalProperty|NotInitialization|NotStrictMode>, localScopeDepth:0, offset:0
[ 86] mov dst:loc8, src:loc9
[ 89] get_property_enumerator dst:loc9, base:loc8
[ 92] get_enumerable_length dst:loc10, base:loc9
[ 95] mov dst:loc11, src:Int32: 0(const3)
Successors: [ #2 ]
```