owned this note
owned this note
Published
Linked with GitHub
# INF222 - Programming Languages
## Learning outcomes
### Knowledge
* explain the concepts of _concrete_ and _abstract syntax_ of a language
* syntax: grammatical arangement of primitives
* abstract syntax: what is necessary and relevant for executing the program (ignored comments, ignored layout)
* concrete syntax: precise syntax of the program construct (e.g. what the programmer writes)
* explain the concerns of designing syntax that can be parsed effectively
* we defined language syntax as a grammar in Backus-Naur-Form, wellformedness and welltypedness criteria were defined seperately
* language constructs are represented with unique keywords (If vs Ife vs Expr-If)
* select parsing tools and approaches according to problem context
* use preexisting parser generators
* explain notions and approaches to _defining semantics_ of programming languages

* denotational semantics:
* every construct in the AST has a denotation (meaning)
* semantic domain is mathematical
* axiomatic semantics
* uses logical assertions about program state (hoare logic)
* designed for reasoning about algorithms
* explain what _type safety_ of a programming language means
* [type systems](#Type-systems)
* explain different variations of _polymorphism_ and relate them to features of contemporary programming languages
* subtyping allows inheritance of an interface, featured in object oriented PLs by using subclassing
* explain the essence of _important programming language constructs_ and concepts
* If-Then-Else, While-Loop, Stmt, Expr, Expr-If, Procedure, Operators, Variable???
* describe their implementation approaches, purpose and productive use in programming
* ???
* and variations in different mainstream programming languages
* ???
### Skills
* define an abstract grammar for a small programming language and _implement a parser_ for it
* define an operational semantics for a small language and _implement it as an interpreter_
* define and _implement_ a type system for a small language
### General competence
* make justified decisions about the _use of different programming language constructs_ in programming
* make justified decisions about _selecting programming languages_ for software projects
* _follow new developments_ in programming languages
* read and _understand_, to a useful degree, _scholarly articles_ in the area of programming languages
## Software languages (not necessarily Programming Languages)
* non-natural language formulated for software use
* used as program I/O and for saving data (serialization)
* must have a precise syntax
* e.g. HTML, JSON, XML, etc.
## Programming languages (subset of Software languages)
* Software languages for instructing or specifying computations
* define data structures
* express algorithms
* clarify properties
* Communicate ideas of computiations between humans (human readable)
* should have a precise syntax and semantics
* defined independently of any specific implementation
* AND/OR defined by a reference implementation
## Programming language classification
A coarse approach to understanding language property bundling
Language properties:
* Paradigm
* Type system
* Generality
* Representation kind
* Declarative (?)
* Close to metal
* Artefact / purpose
### Paradigms
* Imperative
* can modify data in-place
* often coupled with aliasing (i.e. references, pointers)
* often close to metal (e.g. C)
* Functional
* expression oriented
* often coupled with higher order and a terse notation (Verbose means lots of symbols. Terse means fewer symbols.)
* often with an advances type system
* Logical / rulebased
* difficult to understand runtime properties (depend on solver)
* Object-oriented
* traces back to SIMULA 1967
* uses objects which bundle data as fields (properties/attributes) and code as procedures (methods)
* often considered industry-standard
### Type systems
* statically typed (types known at compile/edit time)
* dynamically typed (types depend on run time)
* strong/weak typing [(what does this even mean?)](https://stackoverflow.com/questions/2690544/what-is-the-difference-between-a-strongly-typed-language-and-a-statically-typed)
* duck typing (if it walks like a duck and it quacks like a duck, then it must be a duck)
* type depends on the available operations/methods (dynamic)
* structural typing (name of type is not important, e.g. Haskell)
* types are the same if they have the same data structure
* nominal typing
* types are distinguished by name only
### Generality
* General purpose programming language (GPL)
* expressive: "Turing complete"
* abstraction mechanisms
* operation (procedure/function) astraction
* data structure abstraction
* abstract data types
* module system [(?)](https://en.wikipedia.org/wiki/Modular_programming)
* Domain-specific language (DSL) for programming
* concepts for a specific domain
* embedded in GPL: library for a specific domain
* external (full toolset)
* notation adapted to the domain
* knowledge of the domain embedded in tools
* lack of abstraction mechanisms and module system
* no interoperability with other DSLs
### Representation Kinds
* textual language
* usual case, can use plain text editing tools
* understood as representing underlying structure
* parsing tools to recover structure
* tree language
* nesting
* nested expressions and statements
* nested data structures
* editing
* structured editing: tree operations
* projective editing: emulates text editing
* graph language
* visual (graphical) presentation and editing
### Declarativeness
* declarativeness: degree of lack of resource control
* "what" vs "how"
* agorithmic control
* rulebased/logic
* functional
* imparative
* variable updates
* oder of updates to variables
* multiway dataflow constraint systems
* evaluation order related to "side effects"
* "good": variable update from procedure calls
* "bad": variable updates in expression evaluation
### Close to metal
* degree to which the language maps to hardware
* system programming languages
* minute control of hardware details: C, C++
* features readily maps to hardware
* language features based on formal computational model
* functional prog.: Haskell
* logical prog.: Prolog
* close to metal languages often embody hardware from 1970s
* pointers to memory addresses
* no distributed memory and parallel processing
* memory access considered cheap compared to evaluation
### purposes of software languages
* programming language
* query language
* transformation language
* modeling language
* specification language
* documentation language
* configuration language
## defining a programming language
* syntax and static semantics
* specification language
* dynamic semantics
* execution model
* pragmatics
* support tools
* IDE
* Compiler - interpreter
* verifiers
* test systems
### concrete syntax vs abstract syntax
* concrete syntax: what does the programmer write
* abstract syntax: what is necessary and relevant for executing the program
* topic originally developed for compiling
* ingored comments
* ignored layout
* syntax
* lexical: character stream to token stream
* context free syntax
* grammars: concrete and abstract syntax
* signature declarations: mostly abstract syntax
### syntax vs static semantics
* syntax definition
* may embody operator precedence (often handled outside of grammar)
* may embody type system (i.e. branches in expr-if)
* static semantics: additional wellformedness properties
* decl before use (needs context)
* typing often delegated to static semantics
* grammar can be defined in backus-naur-form
### programming and specification language
* programming
* defining data structures, abstract data types
* defining algorithms
* specification
* defining (intended) properties of algorithms
* pre/post conditions for each procedure (assertions)
* assert invariants for steps in algorithm
* defining (intended) properties of abstract data types
* preconditions
* axioms
* procedure (no side effect) with assertion
* blends with generic programming
### dynamic semantics
* what is the meaning of wellformed programs
* defined in language manual (language standard)
* formalised english (standardise)
* typically gets out of hand for a commercial language
* ada programmers actually refer to the standard
* formal approaches
* denotational semantics
* operational semantics
* big step
* small step
* axiomatic semantics
* abstract machine
### execution model
* how does the program execute on the machine
* storage layout of data
* storage requirements for data
* execution order for statements
* implies running time and data space needs
* recursion
* stack frames
* whether tail optimisation is required or allowed
* optimiser allowed to rearrange code
* may cause concurrency issues
* interaction with modern CPUs
* out of order execution of instructions in the CPU
### pragmatics
* how to use the language
* effectively (programmer time)
* efficiently (machine time)
* organisation
* libraries
* modules
* code
* guidelines for coding style
* security related
* interpeson communication related
### support tools
* IDE
* concrete syntax
* with comments
* with layout
* projective editing
* navigation of the module system
* presentation of modular dependencies
* access to documentation of external libraries
* Compiler / Interpreter
* needs abstract syntax
* feedback should be to original source
* downgrades program structure
* to data structures and algorithms only
* to machine level
* memory and instruction set
### programing correctness assurance tools
* verifiers
* needs integration between program and specification
* embedded specification language using assertions
* external specification language and mapping
* from programming to specification domain
* from specification to programming domain
* needs abstract syntax for both
* feedback should be to original source
* test systems
* unit testing: careful selection of test cases
* axioms as test oracles: just add data
* needs access to assertion mechanism
* feedback should be from original source
## abstract syntax
### concrete and abstract syntax
* two forms of syntax
* concrete syntax (what the programmer writes)
* abstract syntax (what is essential for the semantics)
### Ralf Lämmel's abstract syntax notation
* BSL (Basic Signature Language) constructs
* function types:
* function symbol, list of argument sorts, result sort
* sorts(=types) implicit
* ESL (Extended Signature Language) constructs
* BSL symbol declarations
* type declarations list types t*,t+
* optional types t?
* tuple types
* primitive types
* boolean, integer, float, string, term
### BTL (basic TAPL(Types and Programming Languages) language)
* basic expression language
* basic version
* "low resolution": does not check types
* fully typed version
* better resolution, far more complicated
### BIPL (basic imperative prog. lang.)
* extend BTL with assign, ifstmt, while, block, skip and seq(uence) statements
### environment for interpreters
* env maps variable names to values
* simple env model in haskell: type Env = [(String, Integer)]
## Parsing
### Grammar
* Grammars are the language of specifying parsers
* Grammar G = <T, N,->, S>
* T: Terminals
* N: Non-Terminals
* ->:production rules
* S: start symbol, an element of N
### Grammar levels
* phrase structure grammar: turing-acceptable
* context free grammar: pushdown automaton
* popular for defining syntax of programming languages
* regular grammar: finite state machine
* mostly too limited for prog. lang.
### Usefulness of CFG
* not powerful enough to capture
* type constraints
* decl before use
* c++
* too powerful
* efficient tools exist for restrictions of CFG
* in many cases there are transformations to useful restrictions
* hardcoded parser can circumvent many problems
### Grammar Tools
* Grammar G, L(G) is language defined by G
* Grammars define concrete syntax of a language, can resemble the abstract syntax of a prog.lang.
* Tools:
* Recogniser/acceptor
* tells if a string is a member of the grammars language
* parser
* answers - how is this string derived from the grammar?
* builds a corresponding tree:
* parse tree or concrete syntax tree
### toolset for a prog. lang.
* string related tools
* acceptor
* parser: concrete syntax -> CST
* pretter-printer/unparser: CST -> concrete syntax
* tree related tools
* translator from CST to AST (abstract syntax tree)
* translator from AST to CST
* wellformedness tools on AST (static semantics)
* type checker
* decl before use
* semantics related tools
* evaluator: from AST and env/store to value domain
### grammer tooling problems
* grammar tools are sensitive to how the rules are formulated
* parser needs to decide which rule to apply
* binary operator precedence
* binary operator association
* dangling else problem
### operator precedence
* not a problem for acceptors
* can be handled
* by precedence aware parsing techniques
* by systematic rewrites of the grammar
### operator association
* assiciativity is not a problem for acceptors
* can be handled
* by associativity aware parsing techniques
* by systematic rewrites of the grammar
### dangling "else" problem
* possibility to leave out else block can make grammar ambiguous: [wiki](https://en.wikipedia.org/wiki/Dangling_else)
### grammar operations
* union
* renaming symbols
* grammar operations may change properties
* grammars can be normalized for specific properties
* grammars can be transformed to handle ambiguities
## scoping
### studying a new proglang construct
* lots of examples, relevant?
### scoping principles
* runtime scoping (python?)
* variables are declared when seen at runtime
* dynamic scoping (function calls prefer local variables)
* contexts bind at runtime
* static / lexical scoping \(C\)
* contexts bind at compile-time
* constant scoping (weird)
* values bind at compile-time
### implementing environments for scopes
* unscoped environment
* just one environment
* grows as new bindings are added
* runtime-scoping
* scoped environment
* list of environments for each scope
* local scopes can be added and removed
* local scope is captured at declaration time
* static scoping
* binding is looked up in scope of declaration
* dynamic scoping
* binding is looked up in scope of runtime
## Domain Specific Languages (DSLs)
* see assignment
## DSL imperative lang. (DIPL)
## Variables
## Assert
## property based testing
* unit tests rely heavily on assertions
* test cases are often hand-chosen
### example: java language spec
* rules for using java interfaces are specified in text form
* intricate corner cases and clumsily written descriptions leads to erroneous implementations
* axioms can be formalised as java code
### parametrized testing
* uses axioms as test oracles (implemented as methods)
* must work for any combination of input data
* must cover exceptions & corner cases
* test by calling axiom methods with relevant data
* carefully selected data sets
* typical cases, boundary cases
* random data sets
* outperform small selected test sets
* helpful in discovering unexpected corner cases
* generating relevant terms can be non-trivial
### example: finite integers
* ordered ring properties fail
* widening is okay
* compute and narrowing is commutative
* modulus and computer numbers does not commute
* alternative representations:
* bounded integers: boundary checks
* saturation integers: stops at ceiling and floor
* finite range natural numbers (without minus): commutative semiring, total order, ordered addition
* finite range signed integers
* not distributive
* not associative for addition
* minus is not inverse to addition
### example: floating point numbers
* associativity and distributivity cannot be guaranteed
## Generics
* is a mapping from required DSL to defined DSL
* Generic code
* user declared parameters (requirement r-DSL)
* syntactic declarations of types, operations
* leave semantivs open
* user defined DSL (d-DSL)
* defines types, operations (in terms of r-DSL)
* example: generic sorting algorithm, collection classes
### typesafe generics
* declare r-DSL
* provide rules for expected behaviour
* procedures with assertions
* declare d-DSL in terms of r-DSL
* typesafety ensures that no other operations can be used
* provide rules for expected behaviour
* procedures with assertions
* obersations
* typesafety can be used to ensure many properties
* maybe the rules for a d-DSL match those for an r-DSL
### prog. lang. with generics
* Java generics: only for classes, interfaces limited but with some typesafety
* C++ generics: very flexible, not really typesafe
## typing
* bottom-up type inference
* C, C++, Java, Fortran
* allows overloading
* explicit declaration of variable types and operation profiles (???)
* top-down type inference
* many functional languages like haskell
* finds most general type for an expression/function definition
* efficient
* strong typing, not annotations needed
* exotic type systems
* dependent types: value embedded into type
* subtyping (e.g. subclassing, inheritance)
* type system forms a partial order
* linear types: resource handling
## Lambda calculus
* universal model of computation
* variable binding
* abstract syntax: E ::= X | λ X. E | E E
* alpha rename / alpha conversion: if you substitute with a value v in a function f and v contains a free variable x that is bound in f (free variable capture), x must be renamed in v
example: (λ a b . a)(b) != λ b . b
rename to (λ a b . a)(c\) -> λ b . c
* nested lambda expressions are to be treated like nested scopes
example: (λ x . x (λ x . x))(y) -> y (λ x . x)
+ shorthands
* λ x y z. e for λ x. λ y. λ z. e
* e1 e2 e3 for ((e1 e2) e3
* untyped lambda calculus: interpreting pure lambda terms is cumbersome -> extend with values
E ::= X | λ X. E | E E | "True" | "False" | "Zero" | "Succ" E | "Pred" E | "IsZero" E | "Ifte" E E E
* compute calue terms where possible, partially evaluated terms elswhere
* simply typed lambda calculus: extended with typed proper values
* semantics: remove types and use untyped semantics
* is rigid, reuse of untyped has been lost
* System F: type parameter
id = ΛX.λx:X.x
* can use untyped lambda calculus semantics