### Table of Contents
1. [Link](#glean) : Simon Marlow - Glean - Facts about code
2. [Link](#correct-cloud) : Thomas Ball - The Correct Cloud - Logic and Languages
3. [Link](#graalvm) : Dhruv Makwana - One VM to Rule Them All? Lessons Learned with GraalVM
---
### Glean
#### Summary
With codebases growing beyond 50 million lines of code, it is very difficult to analyze code at scale. The traditional approach of generating metadata locally, in an IDE or static analyzer, does not scale when the code base is that large and multiple versions of it need to exist simultaneously.
Glean is a system designed to analyze code, generate metadata and act as a central database which incrementally stores metadata about your code. It has a query engine which lets you derive facts about your code to perform different types of analysis. These facts range from simple ones, such as the call sites of a function, to more advanced uses such as dead code elimination. The talk gives a broad overview of the problem, presents Glean as the solution, and then shows a demo of Glean followed by some implementation details.
#### Code Metadata
If your IDE understands your code, then you can do interesting things like
* Jump to call sites of functions
* Jump to function definitions
* Type-safe refactoring
* Finding functions which are called with a certain overloading
* Finding classes which inherit from a certain class
So the more metadata you have about your code, the more intelligent things you can do. This can lead to improving the code search and code exploration experience.
#### Storing the metadata
Having a single database for storing all your metadata can help you improve how you use metadata about your code. You can query the database to understand things like -
* static analysis results
* defect reports
* coverage and quality reports
Traditional methods of analyzing code metadata depend on the IDE locally analyzing the code and generating metadata. But this gets difficult when code bases reach 50 million lines of code. Moreover, you need multiple copies or versions of this database to be able to move back and forth across changes in your code base.
#### Glean - Facts about code
Glean is a system being developed at Facebook as an engine which lets you query a code metadata database to derive facts. It uses -
* **indexers** - To create metadata about code, e.g. a C++ indexer or a Java indexer
* **database** - An incremental database to store this metadata across versions
* **queries** - A system to query the database, to be used by clients such as IDEs
But doing this raises a challenge -
##### How much detail to store?
At what level of detail should we represent a language? The approach taken is to store information up to the AST of a language and derive higher-level properties from it. Inspiration can be taken from Datalog.
If we have a predicate called `bike` which denotes that something is a bike, e.g. `bike("road bike")` means that `road bike` is a `bike`, and we do the same thing for `car`, then we can derive the fact that something is a `transport` if it is a `car` or a `bike`; querying `transport(X)` then returns all four facts below. Now, instead of bikes and cars, imagine C++ or Java declarations: we can derive abstract properties to represent data structures and types, and store them incrementally.
```
bike("road bike")
bike("mountain bike")
car("suv")
car("sedan")
transport(X) -> car(X) | bike(X)
```
#### Demo of Glean
A demo of Glean is given, showing how to represent a simple functional language, create a program in that language and inspect the resulting facts. As a glimpse into the implementation, Glean generates Apache Thrift types which can then be converted to native types in different programming languages.
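To make that concrete, the kind of plain Java value such a generated type could map to might look like the sketch below; the predicate name and fields are hypothetical, not Glean's actual schema, and real Thrift-generated classes carry extra serialization machinery.
```java
// Hypothetical shape of a native type backing a Glean-style predicate;
// the name and fields are invented purely for illustration.
public final class FunctionDeclaration {
    private final String file;  // source file containing the declaration
    private final String name;  // fully qualified function name
    private final int line;     // line number of the declaration

    public FunctionDeclaration(String file, String name, int line) {
        this.file = file;
        this.name = name;
        this.line = line;
    }

    public String file() { return file; }
    public String name() { return name; }
    public int line() { return line; }
}
```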
The talk also discusses using RocksDB as the storage backend, and designing predicates efficiently so that you can search by filename, declaration name, usage, expression and other details.
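As a rough illustration of why key layout matters for that kind of lookup, here is a hand-rolled sketch over an ordered map standing in for an ordered key-value store such as RocksDB; the key format is an assumption made purely for illustration, not Glean's actual schema.
```java
import java.util.Map;
import java.util.TreeMap;

public class PrefixIndexSketch {
    public static void main(String[] args) {
        // Ordered map standing in for an ordered key-value store like RocksDB.
        // Keys are "<predicate>/<file>/<declaration>" so related facts sort together.
        TreeMap<String, String> facts = new TreeMap<>();
        facts.put("decl/src/a.cpp/bar", "line 42");
        facts.put("decl/src/a.cpp/foo", "line 10");
        facts.put("decl/src/b.cpp/baz", "line 7");
        facts.put("xref/src/a.cpp/foo", "called from main");

        // Prefix (range) scan: all declarations in src/a.cpp; nothing else is read.
        String prefix = "decl/src/a.cpp/";
        for (Map.Entry<String, String> e :
                facts.subMap(prefix, prefix + Character.MAX_VALUE).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```
Choosing which field comes first in the key is exactly the design question: a key ordered by declaration name instead would make "find by name" cheap and "find by file" expensive.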
---
### Correct Cloud
#### Summary
A multitude of tools at Microsoft are used to prove the reliability and correctness of systems. This talk gives a broad overview of various classes of logic problems and of their mechanized proofs, in terms of expressiveness and the degree of automation available. The main topics covered are -
* Using SLAM to solve Boolean programs modelling finite-state machines
* Using Z3 to test and prove network policies
* Using F* to verify the EverCrypt library
* An introduction to IVy
#### Classes of logic
There are various classes of logic that can be mechanized with varying degrees of automation and expressiveness -
* Boolean Satisfiability - NP-complete
* Quantified Boolean Formula - PSPACE
* Effectively Propositional (EPR) - NEXPTIME
* First Order Logic - Semi-decidable
* First Order Logic + Integer Linear Arithmetic - Undecidable
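As a flavour of the fully automated end of this spectrum, and of the network-policy checking mentioned above, the sketch below uses Z3's Java bindings to compare two toy "allow" policies over a symbolic destination port; the policies themselves are invented for illustration, not the ones from the talk. The solver either returns a packet the policies disagree on or proves them equivalent.
```java
import com.microsoft.z3.BitVecExpr;
import com.microsoft.z3.BoolExpr;
import com.microsoft.z3.Context;
import com.microsoft.z3.Solver;
import com.microsoft.z3.Status;

public class PolicyCheck {
    public static void main(String[] args) {
        Context ctx = new Context();

        // Symbolic packet field: destination port as a 16-bit value
        BitVecExpr port = ctx.mkBVConst("dst_port", 16);

        // Policy A: allow exactly ports 80 and 443
        BoolExpr policyA = ctx.mkOr(
            ctx.mkEq(port, ctx.mkBV(80, 16)),
            ctx.mkEq(port, ctx.mkBV(443, 16)));

        // Policy B: allow any port up to 443 (unsigned comparison)
        BoolExpr policyB = ctx.mkBVULE(port, ctx.mkBV(443, 16));

        // Ask for a packet on which the two policies disagree
        Solver s = ctx.mkSolver();
        s.add(ctx.mkNot(ctx.mkEq(policyA, policyB)));

        if (s.check() == Status.SATISFIABLE) {
            System.out.println("Policies differ, e.g. dst_port = "
                + s.getModel().eval(port, true));
        } else {
            System.out.println("Policies are equivalent");
        }
    }
}
```
Because the formula lives in a decidable fragment (quantifier-free bit-vectors), the check is push-button: no proof hints are needed, which is what makes this class of problem attractive for network verification.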
---
### GraalVM
#### Summary
Goldman Sachs has created a general purpose language called **Slang**. Slang is syntactically between C and Lisp. It is a gradually typed scripting language. The code can self-modify the AST. It is executed by an interpreter written in C++ which proceeds by walking the AST and evaluating it.
Goldman Sachs has **150 MILLION** lines of code in Slang. They want to modernize the environment around Slang, such as the debugger and profiler, and also improve performance. What have they considered?
* Stop using Slang
* Slang 2.0
* Rewrite the interpreter and runtime
* Compile Slang to another language
None of these ideas seemed practical so they are now exploring the use of Graal. This talk discusses how Goldman Sachs is migrating Slang to Graal.
#### What is Graal?
While Graal is a specific technology, Goldman Sachs is using a host of things to port Slang. The tools are -
* Graal - It is a JIT compiler for JVM bytecode. JVM bytecode is the intermediate representation that Java is compiled to. The Java Virtual Machine then executes this bytecode
* Truffle - It is a framework for writing programming languages. You write an AST interpreter using the Truffle API and then code generation is handled for you (see the sketch below)
* Sulong - It is a LLVM bitcode interpreter, which has been written using Truffle, so that LLVM based languages can run on GraalVM
* GraalVM - It is the JVM but with Graal as the JIT compiler instead of the default JIT compiler
* SubstrateVM - It provides a minimal runtime which your code can be AOT compiled against and linked with, so that applications run as native images without the JVM/GraalVM.
Most of the focus of the GraalVM project is to provide a common, language-agnostic platform for managed languages with runtimes. Just as LLVM/GCC strove to solve the MxN problem of M languages on N machines, GraalVM is trying to tackle the problem of M languages plus runtimes on N machines. By unifying the runtime for all languages, common optimizations and tooling can be obtained for free.
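To make the Truffle point concrete, here is a minimal, hypothetical sketch of what AST-interpreter nodes for a toy expression language might look like; the class names are made up, and a real Truffle language would also define a `TruffleLanguage`, a `RootNode` and type specializations.
```java
import com.oracle.truffle.api.frame.VirtualFrame;
import com.oracle.truffle.api.nodes.Node;

// Hypothetical expression nodes for a toy language, written against the Truffle API.
abstract class ExprNode extends Node {
    abstract long execute(VirtualFrame frame);
}

final class LongLiteralNode extends ExprNode {
    private final long value;
    LongLiteralNode(long value) { this.value = value; }
    @Override long execute(VirtualFrame frame) { return value; }
}

final class AddNode extends ExprNode {
    @Child private ExprNode left;   // @Child marks sub-nodes Truffle may rewrite
    @Child private ExprNode right;
    AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    @Override long execute(VirtualFrame frame) {
        // Plain Java: the interpreter just walks the tree and evaluates it
        return left.execute(frame) + right.execute(frame);
    }
}
```
Because the interpreter is ordinary Java, Graal can partially evaluate a tree of `execute` calls into optimized machine code, which is what would make a Truffle port of Slang faster than the existing AST-walking C++ interpreter.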
#### Implementation Details and Results
The rest of the talk goes into how Slang is mapped to Graal.
* One approach is to generate LLVM bitcode from the C++ interpreter and then run it on Graal using Sulong
* The other approach is to lift parts of Slang into Truffle, which generates JVM bytecode that can be run on GraalVM
* Then the results are discussed, comparing the performance of code written in Python, Java, Slang, Slang on Sulong and Slang on GraalVM