# INF222 - Programming Languages ## Learning outcomes ### Knowledge * explain the concepts of _concrete_ and _abstract syntax_ of a language * syntax: grammatical arangement of primitives * abstract syntax: what is necessary and relevant for executing the program (ignored comments, ignored layout) * concrete syntax: precise syntax of the program construct (e.g. what the programmer writes) * explain the concerns of designing syntax that can be parsed effectively * we defined language syntax as a grammar in Backus-Naur-Form, wellformedness and welltypedness criteria were defined seperately * language constructs are represented with unique keywords (If vs Ife vs Expr-If) * select parsing tools and approaches according to problem context * use preexisting parser generators * explain notions and approaches to _defining semantics_ of programming languages ![](https://i.imgur.com/WdtvirX.png) * denotational semantics: * every construct in the AST has a denotation (meaning) * semantic domain is mathematical * axiomatic semantics * uses logical assertions about program state (hoare logic) * designed for reasoning about algorithms * explain what _type safety_ of a programming language means * [type systems](#Type-systems) * explain different variations of _polymorphism_ and relate them to features of contemporary programming languages * subtyping allows inheritance of an interface, featured in object oriented PLs by using subclassing * explain the essence of _important programming language constructs_ and concepts * If-Then-Else, While-Loop, Stmt, Expr, Expr-If, Procedure, Operators, Variable??? * describe their implementation approaches, purpose and productive use in programming * ??? * and variations in different mainstream programming languages * ??? ### Skills * define an abstract grammar for a small programming language and _implement a parser_ for it * define an operational semantics for a small language and _implement it as an interpreter_ * define and _implement_ a type system for a small language ### General competence * make justified decisions about the _use of different programming language constructs_ in programming * make justified decisions about _selecting programming languages_ for software projects * _follow new developments_ in programming languages * read and _understand_, to a useful degree, _scholarly articles_ in the area of programming languages ## Software languages (not necessarily Programming Languages) * non-natural language formulated for software use * used as program I/O and for saving data (serialization) * must have a precise syntax * e.g. HTML, JSON, XML, etc. ## Programming languages (subset of Software languages) * Software languages for instructing or specifying computations * define data structures * express algorithms * clarify properties * Communicate ideas of computiations between humans (human readable) * should have a precise syntax and semantics * defined independently of any specific implementation * AND/OR defined by a reference implementation ## Programming language classification A coarse approach to understanding language property bundling Language properties: * Paradigm * Type system * Generality * Representation kind * Declarative (?) * Close to metal * Artefact / purpose ### Paradigms * Imperative * can modify data in-place * often coupled with aliasing (i.e. references, pointers) * often close to metal (e.g. C) * Functional * expression oriented * often coupled with higher order and a terse notation (Verbose means lots of symbols. Terse means fewer symbols.) * often with an advances type system * Logical / rulebased * difficult to understand runtime properties (depend on solver) * Object-oriented * traces back to SIMULA 1967 * uses objects which bundle data as fields (properties/attributes) and code as procedures (methods) * often considered industry-standard ### Type systems * statically typed (types known at compile/edit time) * dynamically typed (types depend on run time) * strong/weak typing [(what does this even mean?)](https://stackoverflow.com/questions/2690544/what-is-the-difference-between-a-strongly-typed-language-and-a-statically-typed) * duck typing (if it walks like a duck and it quacks like a duck, then it must be a duck) * type depends on the available operations/methods (dynamic) * structural typing (name of type is not important, e.g. Haskell) * types are the same if they have the same data structure * nominal typing * types are distinguished by name only ### Generality * General purpose programming language (GPL) * expressive: "Turing complete" * abstraction mechanisms * operation (procedure/function) astraction * data structure abstraction * abstract data types * module system [(?)](https://en.wikipedia.org/wiki/Modular_programming) * Domain-specific language (DSL) for programming * concepts for a specific domain * embedded in GPL: library for a specific domain * external (full toolset) * notation adapted to the domain * knowledge of the domain embedded in tools * lack of abstraction mechanisms and module system * no interoperability with other DSLs ### Representation Kinds * textual language * usual case, can use plain text editing tools * understood as representing underlying structure * parsing tools to recover structure * tree language * nesting * nested expressions and statements * nested data structures * editing * structured editing: tree operations * projective editing: emulates text editing * graph language * visual (graphical) presentation and editing ### Declarativeness * declarativeness: degree of lack of resource control * "what" vs "how" * agorithmic control * rulebased/logic * functional * imparative * variable updates * oder of updates to variables * multiway dataflow constraint systems * evaluation order related to "side effects" * "good": variable update from procedure calls * "bad": variable updates in expression evaluation ### Close to metal * degree to which the language maps to hardware * system programming languages * minute control of hardware details: C, C++ * features readily maps to hardware * language features based on formal computational model * functional prog.: Haskell * logical prog.: Prolog * close to metal languages often embody hardware from 1970s * pointers to memory addresses * no distributed memory and parallel processing * memory access considered cheap compared to evaluation ### purposes of software languages * programming language * query language * transformation language * modeling language * specification language * documentation language * configuration language ## defining a programming language * syntax and static semantics * specification language * dynamic semantics * execution model * pragmatics * support tools * IDE * Compiler - interpreter * verifiers * test systems ### concrete syntax vs abstract syntax * concrete syntax: what does the programmer write * abstract syntax: what is necessary and relevant for executing the program * topic originally developed for compiling * ingored comments * ignored layout * syntax * lexical: character stream to token stream * context free syntax * grammars: concrete and abstract syntax * signature declarations: mostly abstract syntax ### syntax vs static semantics * syntax definition * may embody operator precedence (often handled outside of grammar) * may embody type system (i.e. branches in expr-if) * static semantics: additional wellformedness properties * decl before use (needs context) * typing often delegated to static semantics * grammar can be defined in backus-naur-form ### programming and specification language * programming * defining data structures, abstract data types * defining algorithms * specification * defining (intended) properties of algorithms * pre/post conditions for each procedure (assertions) * assert invariants for steps in algorithm * defining (intended) properties of abstract data types * preconditions * axioms * procedure (no side effect) with assertion * blends with generic programming ### dynamic semantics * what is the meaning of wellformed programs * defined in language manual (language standard) * formalised english (standardise) * typically gets out of hand for a commercial language * ada programmers actually refer to the standard * formal approaches * denotational semantics * operational semantics * big step * small step * axiomatic semantics * abstract machine ### execution model * how does the program execute on the machine * storage layout of data * storage requirements for data * execution order for statements * implies running time and data space needs * recursion * stack frames * whether tail optimisation is required or allowed * optimiser allowed to rearrange code * may cause concurrency issues * interaction with modern CPUs * out of order execution of instructions in the CPU ### pragmatics * how to use the language * effectively (programmer time) * efficiently (machine time) * organisation * libraries * modules * code * guidelines for coding style * security related * interpeson communication related ### support tools * IDE * concrete syntax * with comments * with layout * projective editing * navigation of the module system * presentation of modular dependencies * access to documentation of external libraries * Compiler / Interpreter * needs abstract syntax * feedback should be to original source * downgrades program structure * to data structures and algorithms only * to machine level * memory and instruction set ### programing correctness assurance tools * verifiers * needs integration between program and specification * embedded specification language using assertions * external specification language and mapping * from programming to specification domain * from specification to programming domain * needs abstract syntax for both * feedback should be to original source * test systems * unit testing: careful selection of test cases * axioms as test oracles: just add data * needs access to assertion mechanism * feedback should be from original source ## abstract syntax ### concrete and abstract syntax * two forms of syntax * concrete syntax (what the programmer writes) * abstract syntax (what is essential for the semantics) ### Ralf Lämmel's abstract syntax notation * BSL (Basic Signature Language) constructs * function types: * function symbol, list of argument sorts, result sort * sorts(=types) implicit * ESL (Extended Signature Language) constructs * BSL symbol declarations * type declarations list types t*,t+ * optional types t? * tuple types * primitive types * boolean, integer, float, string, term ### BTL (basic TAPL(Types and Programming Languages) language) * basic expression language * basic version * "low resolution": does not check types * fully typed version * better resolution, far more complicated ### BIPL (basic imperative prog. lang.) * extend BTL with assign, ifstmt, while, block, skip and seq(uence) statements ### environment for interpreters * env maps variable names to values * simple env model in haskell: type Env = [(String, Integer)] ## Parsing ### Grammar * Grammars are the language of specifying parsers * Grammar G = <T, N,->, S> * T: Terminals * N: Non-Terminals * ->:production rules * S: start symbol, an element of N ### Grammar levels * phrase structure grammar: turing-acceptable * context free grammar: pushdown automaton * popular for defining syntax of programming languages * regular grammar: finite state machine * mostly too limited for prog. lang. ### Usefulness of CFG * not powerful enough to capture * type constraints * decl before use * c++ * too powerful * efficient tools exist for restrictions of CFG * in many cases there are transformations to useful restrictions * hardcoded parser can circumvent many problems ### Grammar Tools * Grammar G, L(G) is language defined by G * Grammars define concrete syntax of a language, can resemble the abstract syntax of a prog.lang. * Tools: * Recogniser/acceptor * tells if a string is a member of the grammars language * parser * answers - how is this string derived from the grammar? * builds a corresponding tree: * parse tree or concrete syntax tree ### toolset for a prog. lang. * string related tools * acceptor * parser: concrete syntax -> CST * pretter-printer/unparser: CST -> concrete syntax * tree related tools * translator from CST to AST (abstract syntax tree) * translator from AST to CST * wellformedness tools on AST (static semantics) * type checker * decl before use * semantics related tools * evaluator: from AST and env/store to value domain ### grammer tooling problems * grammar tools are sensitive to how the rules are formulated * parser needs to decide which rule to apply * binary operator precedence * binary operator association * dangling else problem ### operator precedence * not a problem for acceptors * can be handled * by precedence aware parsing techniques * by systematic rewrites of the grammar ### operator association * assiciativity is not a problem for acceptors * can be handled * by associativity aware parsing techniques * by systematic rewrites of the grammar ### dangling "else" problem * possibility to leave out else block can make grammar ambiguous: [wiki](https://en.wikipedia.org/wiki/Dangling_else) ### grammar operations * union * renaming symbols * grammar operations may change properties * grammars can be normalized for specific properties * grammars can be transformed to handle ambiguities ## scoping ### studying a new proglang construct * lots of examples, relevant? ### scoping principles * runtime scoping (python?) * variables are declared when seen at runtime * dynamic scoping (function calls prefer local variables) * contexts bind at runtime * static / lexical scoping \(C\) * contexts bind at compile-time * constant scoping (weird) * values bind at compile-time ### implementing environments for scopes * unscoped environment * just one environment * grows as new bindings are added * runtime-scoping * scoped environment * list of environments for each scope * local scopes can be added and removed * local scope is captured at declaration time * static scoping * binding is looked up in scope of declaration * dynamic scoping * binding is looked up in scope of runtime ## Domain Specific Languages (DSLs) * see assignment ## DSL imperative lang. (DIPL) ## Variables ## Assert ## property based testing * unit tests rely heavily on assertions * test cases are often hand-chosen ### example: java language spec * rules for using java interfaces are specified in text form * intricate corner cases and clumsily written descriptions leads to erroneous implementations * axioms can be formalised as java code ### parametrized testing * uses axioms as test oracles (implemented as methods) * must work for any combination of input data * must cover exceptions & corner cases * test by calling axiom methods with relevant data * carefully selected data sets * typical cases, boundary cases * random data sets * outperform small selected test sets * helpful in discovering unexpected corner cases * generating relevant terms can be non-trivial ### example: finite integers * ordered ring properties fail * widening is okay * compute and narrowing is commutative * modulus and computer numbers does not commute * alternative representations: * bounded integers: boundary checks * saturation integers: stops at ceiling and floor * finite range natural numbers (without minus): commutative semiring, total order, ordered addition * finite range signed integers * not distributive * not associative for addition * minus is not inverse to addition ### example: floating point numbers * associativity and distributivity cannot be guaranteed ## Generics * is a mapping from required DSL to defined DSL * Generic code * user declared parameters (requirement r-DSL) * syntactic declarations of types, operations * leave semantivs open * user defined DSL (d-DSL) * defines types, operations (in terms of r-DSL) * example: generic sorting algorithm, collection classes ### typesafe generics * declare r-DSL * provide rules for expected behaviour * procedures with assertions * declare d-DSL in terms of r-DSL * typesafety ensures that no other operations can be used * provide rules for expected behaviour * procedures with assertions * obersations * typesafety can be used to ensure many properties * maybe the rules for a d-DSL match those for an r-DSL ### prog. lang. with generics * Java generics: only for classes, interfaces limited but with some typesafety * C++ generics: very flexible, not really typesafe ## typing * bottom-up type inference * C, C++, Java, Fortran * allows overloading * explicit declaration of variable types and operation profiles (???) * top-down type inference * many functional languages like haskell * finds most general type for an expression/function definition * efficient * strong typing, not annotations needed * exotic type systems * dependent types: value embedded into type * subtyping (e.g. subclassing, inheritance) * type system forms a partial order * linear types: resource handling ## Lambda calculus * universal model of computation * variable binding * abstract syntax: E ::= X | λ X. E | E E * alpha rename / alpha conversion: if you substitute with a value v in a function f and v contains a free variable x that is bound in f (free variable capture), x must be renamed in v example: (λ a b . a)(b) != λ b . b rename to (λ a b . a)(c\) -> λ b . c * nested lambda expressions are to be treated like nested scopes example: (λ x . x (λ x . x))(y) -> y (λ x . x) + shorthands * λ x y z. e for λ x. λ y. λ z. e * e1 e2 e3 for ((e1 e2) e3 * untyped lambda calculus: interpreting pure lambda terms is cumbersome -> extend with values E ::= X | λ X. E | E E | "True" | "False" | "Zero" | "Succ" E | "Pred" E | "IsZero" E | "Ifte" E E E * compute calue terms where possible, partially evaluated terms elswhere * simply typed lambda calculus: extended with typed proper values * semantics: remove types and use untyped semantics * is rigid, reuse of untyped has been lost * System F: type parameter id = ΛX.λx:X.x * can use untyped lambda calculus semantics