Try   HackMD

Nested Modules in an Interactive World

tags: Juvix juvix-project

The Juvix module system is one that wishes to leverage the benefit of a nested module system (rust, ocaml) while maintaining a workflow that can support working interactively or through a more static approach. In order to achieve this goal, we analyze different approaches of module management, and distill the qualities that are positive to a project and user behavior.

The Haskell World View

The Haskell view of modules is an interesting one in that the language's modules look nested, while continuing to be flat.

An argument over the Flat World Model

Consider the following example

-- Compilation.hs module Juvix.Backends.LLVM.Compilation import qualified Juvix.Backends.LLVM.Codegen.Block as Block import qualified Juvix.Backends.LLVM.Codegen.Closure as Closure import qualified Juvix.Backends.LLVM.Codegen.Record as Record import qualified Juvix.Backends.LLVM.Codegen.Sum as Sum import qualified Juvix.Backends.LLVM.Codegen.Types as Types import qualified Juvix.Backends.LLVM.Codegen.Types.CString as CString ... preProcess :: ErasedAnn.AnnTerm PrimTy RawPrimVal -> Types.Annotated Types.TermLLVM preProcess = Conversion.op compileProgram :: Monad m => -- | Term to compile. ErasedAnn.AnnTerm PrimTy RawPrimVal -> Feedback.FeedbackT [] P.String m Text compileProgram t = case Block.runEnvState (register t) mempty of ...

Here we import the path Juvix.Backends.LLVM.Codegen over and over in a truly nested world we would be able to say

import Juvix.Backends.LLVM.Codegen preProcess :: ErasedAnn.AnnTerm PrimTy RawPrimVal -> Types.Annotated Types.TermLLVM preProcess = Conversion.op compileProgram :: Monad m => -- | Term to compile. ErasedAnn.AnnTerm PrimTy RawPrimVal -> Feedback.FeedbackT [] P.String m Text compileProgram t = case Block.runEnvState (register t) mempty of ...

as what exists within Juvix.Backends.LLVM.Codegen is the Block, Closure, Sum, Types, CString module. We could even take this a step further and qualify Juvix.Backends.LLVM.Codegen as C, and be able to say C.Block.runEnvState.

This is very much so what the java programming language does, which in some ways has similar syntatical convention (.) for it's "module" lookups as Haskell

public class test { public static void main (String args[]) { System.out.print("Test"); } } // we can also write import java.lang.System.* // for our purposes we could have also wrote // import static java.lang.System.out; public class test2 { public static void main (String args[]) { out.print("Test"); } }

Not only would a nested module system allow Juvix to save on imports, it will also allow more effective organization of libraries and business code.

Take for example the organization of a standard library.

-- psuedo Haskell -- Protolude.hs module Protolude where type Int = Int.T type Word = Word.T ------------------------------ -- Protolude/List.hs ------------------------------ module ProtoLude.List where filter :: (a -> Bool) -> [a] -> [a] filter = ... zip :: [a] -> [b] -> [(a,b)] -- Currently named nonEmpty toNonEmpty :: [a] -> Maybe (NonEmpty a) toNonEmpty = ... unfoldr :: (b -> Maybe (a, b)) -> b -> [a] sort :: Ord a => [a] -> [a] -------------------------------- -- Protolude/Int.hs -------------------------------- module Protolude.Int where type T = Prelude.Int -- this is gcdInt' in Protolude gcd :: T -> T -> T gcd = ... ------------------------------- -- Protolude/Word.hs ------------------------------- module Protolude.Word where type T = Prelude.Word -- this is gcdWord' in Protolude gcd :: T -> T -> T gcd = ...

The above example shows a few positive points over the current protolude that we use today. The first being is that functions which have multiple variants like gcdInt' and gcdWord' can now be used as the following

import Protolude myLogic :: Word myLogic = Word.gcd 3 5

Where the disambiguation can happen on the module level. However this leads into another important point. For functions like filter we could see this function as working on many different data structures, and maybe even a type class! To figure out what the version in protolude is, the user would have to type in the prompt :t filter to confirm the version. However in the case of writing List.filter, it becomes readily apparent that the filter function will be on the list data type.

Thus the standard library is setup in such a way that encourages operations pertaining to a particular type to be in the same module as that type, combined with no extra imports from the user means that users are more readily available to be using qualified names (which in Ocaml, Clojure, is considered good practice) and have lower resistance for trying other data structures in the standard that is not directly in prelude (every inconvenience discourages users from using [1] [2]).

Another advantage comes in the large scale packaging of code and the flexibility of later renaming.

Let us argue this point from the perspective of two different library authors working on a similar problem. They don't know each other, and decide to focus on different parts of the problem. Let's say the domain pertains to proving different properties of stream fusion. Thus they both define

  • Data.Stream.Proofs
  • Data.Stream.Proofs.Combinations
  • Data.Stream.Proofs

However maybe one defines Data.Stream.Proof.SingularPass, while the other defines Data.Stream.Proof.ConstantSpace.

This is all fine, as long as no one requires both of these packages at once. However, say we are a startup who wants to prove properties about stream fusion as we need hard real time guarantees. Thus we'd wnat a proof about the passes all being fused, and a proof that the code is taking a constnat amount of space as any GC would make our application not hard real time.

Thus we wish to use both of these. However in the flat representation, we could only use both packages if we rename each shared module in one of the packages (that is if the language even allows us!). Thus I have to rename one package as follows

  • Data.Stream.Proofs.Singular
  • Data.Stream.Proofs.Singular.Combinations
  • Data.Stream.Proofs.Singular

For each module given. However in a nested setting our choices get a lot more intersting. If the package manager allows us we can simpily isolate this issue at the package level and state, and import both depencies as qualified

-- build file qualified [Singular, Constant] -- File Level in our project module Data.Stream.Proofs where module Singular = Singular.Data.Stream.Proofs module Constant = Constant.Data.Stream.Proofs

This ability can further be automated away, if we treat modules as values, as we can request particular handlers to say concatenate the modules if no conflicting symbols are found etc.

An Arugment on Small Worlds, or how Small Worlds now can grow to nice Colonies later

Another interseting point is that Haskell enforces a system of one module per file and that if one does not import a module, then it can't be accessed at all. This has the upshot of Haskell's build system not requiring explicit ordering, however in times where a particular task demands multiple functions for one "kind" of item in the system, then we have little recourse.

For example:

Let us say we are making LLVM data types, we need to

  1. Make a name for the type
  2. Get the named reference in the system to that type
  3. Lay out the structure that is generated
  4. make a convinence name for the pointer

Below is the code that does precisely that.

portTypeNameRef :: Type portTypeNameRef = Type.NamedTypeReference portTypeName portTypeName :: IsString p => p portTypeName = "graph_port" portType :: Type -> Type portType nodePtr = StructureType { isPacked = True, ... } portPointer :: Type portPointer = pointerOf portTypeNameRef portTypeSize :: Num p => p portTypeSize = numPortsSize + pointerSizeInt -- | Construct a 32 bit port space so we can put many inside a node cheaply -- The pointer points to the beginning of a node and an offset nodePointer :: Type nodePointer = pointerOf nodeTypeNameRef nodeTypeNameRef :: Type nodeTypeNameRef = Type.NamedTypeReference nodeTypeName nodeTypeName :: IsString p => p nodeTypeName = "graph_node" nodeType :: Type -> [Type] -> Type nodeType tag extraData | bitSizeEncodingPoint = abstracted { elementTypes = ... } | otherwise = abstracted { elementTypes = ... } where abstracted = StructureType { isPacked = True, elementTypes = [] }

We make *type, *typeName, *typeNameRef, etc etc etc. If Haskell had a system where the user could make an adhoc module on the spot then the code would have been put in

module Port where nameRef = ... type' = ... typename = ... pointer = ... size = ... module Node where nameRef = ... type' = ... typename = ... pointer = ... size = ...

as it would be easy to do so. When tasks become laborious to do the right way users will often take lazy short cuts. Some examples of the right way to do it in haskell include:

  1. make the files for Node and Port up front
  2. Make a typeclass for the structure, and make a Void token that entails the implementation.

Some other benefits of the more Module focused way, I can safely factor Node and Port to their own files with no differnece in the caller interface. This very explicit structure, encoruages me to write a module signautre for them. And lastly I've increased the regularity of the codebase as this easy to form idiom has a concrete meaning.

Conclusion

Thus to sum up the implications of Haskell's Module View

  • Negatives
  1. A flat module sturcture leads to a lot of unnecessary imports
  2. A flat module structure leads to a suboptimal orgnaization of the standard library
  3. A flat module structure leads to a more rigid package structure
  4. The Correspondence between Modules and Files leads to less uniformity and more refactoring in the codebase
  • Positives
  1. The Simplicity leads to

The Unix World View or how Worse is Truly Better

Although Unix is not a programming language, it would be remis to neglect the correspondence between the file system of unix and programming lanugages.

The Unix file system has a few notions.

  1. We can be in a Module (directory).
  2. Modules (directories) are nested.
  3. Modules are in the same namespace as everything else (files)
  4. Modules can access any other Module that exists in the world (on the system)
  5. Modules form a graph, not a tree.

Let us demonstrate each point

cd ~/holder/example # We are now in ~/holder/example module/directory 2 katya@Babylon-4:~/holder/example %
# go to the module above via home dir cd ~/holder # get to example again by the module in ~/holder cd example
2 katya@Babylon-4:~/holder/example % echo "" > bin 2 katya@Babylon-4:~/holder/example % LANG=en_US.UTF-8 mkdir bin mkdir: cannot create directory ‘bin’: File exists # Modules can't be the same name as files!
# can go to the top directory just fine cd / # Can go to work just fine as well!!! cd ~/Documents/Work
# lets get weird about this 3 katya@Babylon-4:~/holder/example % mkdir Foo 3 katya@Babylon-4:~/holder/example % mkdir Foo/Bin 3 katya@Babylon-4:~/holder/example % cd Foo/Bin 2 katya@Babylon-4:~/holder/example/Foo/Bin % ln -s -d ~/holder/example/Foo . 2 katya@Babylon-4:~/holder/example/Foo/Bin % cd Foo 2 katya@Babylon-4:~/holder/example/Foo/Bin/Foo % cd Bin/Foo/Bin 2 katya@Babylon-4:~/holder/example/Foo/Bin/Foo/Bin/Foo/Bin %

From an interactive point of view the Unix file system is quite nice, in that I can access the entire unix world from any directory, there is no arbitrary cut off of access to other modules depending on what module I am in.

Further, since the Unix filesystem allows symlinks (which can be viewed as module aliases) and directories are in the same namespace as files, we can form a graph from what was a tree. This is an important implication that falls out of the rules:

  1. Modules are the same as other items in the system
  2. Modules can be Aliased

For a programming language this too would be a natural implication, although checks can be had if one really insits on the matter

Conclusion

  1. The Unix Filesystem serves as a good model for an interactive module system.
  2. The Entire System can be accessed from any other part of the system.

The Closed World View (Agda, MiniJuvix)

Agda/MiniJuvix is interesting to analyze here, as it's explictily looking for a static development model.

A good way to describe the MiniJuvix's Module view is as follows.

  1. There exists Global Modules
  2. There exists Local Modules within Global Modules
  3. Global modules can not be in other Global Modules
  4. Until a module is imported (required), it does not exist as far as that module is concerned.

An Argument on Conflicting Definitions

Let us give an example of interesting behavior that is entailed by the above rules.

Say we are a business, and we have some very vital business functions. it's so important in fact that we have named a module BusinessLogic.Important to handle all the important requests. We can think of it as a common API.

Now BusinessLogic.Important is a bit verbose to say, so, we have another Global module, BusinessLogic which rexports Important, and perhaps adds a few other functions that are handy with the less important functions of BusinessLogic..

Now say we browse the codebase and see the pattern of Application.Platform all over the codebase, people import BusinessLogic, and carry on with important and non important functions.

Now for our task, let's make an app, but we are still in the experimentation phase, so we just import the Important functions to begin.

-- BusinessLogic/Importnat.mju ---------------------------------- module BusinessLogic.Important .... key-change does not exist in here ... -- BusinessLogic.mju ---------------------------------- module BusinessLogic import BusinessLogic.Important module Important = open public BusinessLogic.Important def key-change = .... -- Application/Platform.mju ---------------------------------- module Application.Platform import BusinessLogic def fi = BusinessLogic.Important.key-change module BL = BusinessLogic def vital = ... BL.Important.key-change ... -- MyFirstApp.Ju ---------------------------------- module MyFirstApp import BusinessLogic.Important -- lets use our key-change! def lets-business = ... BusinessLogic.Important.key-change ....

Please take a second and think about what you expect from module MyFirstApp.

Does this code work? The answer is no. the Global module BusinessLogic.Important does not have key-change, the only module which does is

BusinessLogic.Important, sorry let me rephrase, BusinessLogic's local module, Important.

Thus Although the pathing looks somewhat nested, it simply isn't. Thus the following code would lead to an ambiguity error being shown to the user

-- MyFirstApp.Ju ---------------------------------- module MyFirstApp import BusinessLogic.Important import BusinessLogic -- lets use our key-change! def lets-business = ... BusinessLogic.Important.key-change ....

as these two modules are in fact not the same! Thus the modules both live in the same namesapce as everything else, but also don't on a global level.

An argument on User Extensions

Futher issues arise when we want user contributions in the STDlib. For example, the user can not extend the base standard library.

It is nice when a user can extend the language in the same way as the language developer can, thus user extensions feel built in as they look no different than what the language authors have wrote down.

In the case of the standard library this might occur like in the example in the section An argument over the Flat World Model. Namely in the Data.Stream.Proofs example.

In the MiniJuvix world, this would not so easily be gotten, and thus the Prelude may export Stream

module Prelude import Prelude.Stream module Stream = open public Prelude.Stream

and in the normal module of a truly nested system we can get

import Prelude open Prelude fi = Stream.Proofs.occurence

however in the MiniJuvix case, this would not be possible, and we'd have to explicitly import the extension

import Prelude import Prelude.Stream.Proofs -- I'm unsure of how minijvuix handles the Prelude here, as they both have it... open Prelude fi = Prelude.Stream.Proofs.occurence

Thus user extensions to the language do not feel built in, and getting it to be so, would require users to write their own prelude for each project, which tries to hookup whatever extensions they want.

An Argument Of Egg Shell Interaction

This import behavior of global modules to be recognized in the current scope, has another wide ranging implication. Namely that:

  1. The system is not accessible from any place in time.
  2. Changing the current module if that is even allowed, resets the places of the system where you can see.

For example, let us play in the MyFirstApp.

JU-User> import MyFirstApp JU-User> in-module MyFirstApp MyFirstApp> lets-business ... Gave wrong result ...

It seems lets-business gave the wrong result. Let us check out our tests to see what is up

MyFirstApp> Test.MyFirstApp

here I can auto complete Test.My as the system knows about the code in the system, but it is not accessible, thus I should get an error like

error Test is not a module in MyFirstApp

Thus we have to import it

MyFirstApp> import Test.MyFirstApp MyFirstApp> Test.MyFirstApp.run-the-tests ... 5 tests out of 10 failing ...

and now we can run it. If we were to investigate the issue and deduce it comes from Business.LessImportant instead then we can maybe play with the data like

MyFirstApp> import Business.LessImportant MyFirstApp> in-module Business.LessImportant Business.LessImportant> ... play around for a while and play with behavior ... Business.LessImportant> import MyFirstApp -- time to go back, but first import where we are Business.LessImportant> in-module MyFirstApp

Note here we had to remember to rebring MyFirstApp into scope, as by the very act of switching what module our current environment is, has made us forget about MyFirstApp even though it exists perfectly well in the environment, we simply can't talk about that namespace in MyFirstApp [3].

Conclusion

  1. MiniJuvix's module system allows multiple modules with conflicting definitions being accessible (import BusinessLogic.Importat, import BusinessLogic. Multiple BusinessLogic.Important!).
  2. In the State of extending Say Shared Structures, the entire structure must be rexported.
  3. The System combines the Flat Nature of Haskell's Module system with some local nested Systems
  4. The System can be checked without explicit ordering declarations
  5. The System is not accessible from any point in the system.

Point 5. leads to

  1. Trying to move around and interact with the system in an interactive way is relegated to an unergonomic second class citizen.

And point 3. leads to

  1. User extensions on Libraries look less first class than in Haskell

This is mainly due to requiring an extra import, that builtins do not need.

The Rust World View

The OCaml World View

Desirable Values

  1. Modules are locatable in the world not only through the top level flat table, but also in other modules.
  2. Modules exist in the same namespace as values (Juvix-1!)
  3. Modules approach first class values, if not first class values
  4. Modules are defined incrementally

The Juvix World View

Out of all these models. The closest world view on modules to the Desirable values table above is the Unix view with Ocaml characteristics.

Thus the design for Juvix's module system is quite simple.

  1. Modules exist in the same namespace
  2. Modules are nested
  3. Modules can be switched at any point and be added to incrementally
  4. The Session is always in some module

Most of the properties of the Juvix Module system fall out of fact 1., 2. and 3..

For example

  • Graphs may form because if one rexports their ancestory, they can then by some path trace back to themselves. This is a rare occurrence but fall out of the rules. Some arbitrary rules can be applied to stop this, as the system knows enough to fully stop this behavior if need be.

  • From the Interactive Sesssion, the user can hop to any module without recompilation or stating explicitly what they need to bring into scope as the user has the full access to the system.

The Juvix Module system will have special code to handle

  1. Includes
  2. Opens (Implicit and Explicit)
  3. Private values

Where an implicit open is something opened in the environment automatically.

Other feautres such as Aliases, are derived from these two features above.

Thus the lookup rule for the Current module of Juvix can now be explained

  1. Lookup the value in the Private Table
  2. Lookup the value in the Pubically exported table
  3. Lookup the value in the Explicit Opens and Includes
  4. Lookup the value in the implicit Opens

For Global lookups the rule changes slightly

  1. Lookup the vlaue in the Pubically exported table
  2. Lookup the value in the Includes

Examples

TODO :: how it works just falls out of the implications, so just have fun with design - mariari

Future Update Notes

  • Add an explicit point about first class environemnts
  • Make a point about how you can have primitives that split your global environment
    • You can then express this as a a package system like asdf over your code
    • The build system can also rename the packages that come into the environment, which means that we can properly deal with ambiguities of different library versions
    • Make a point about how this scales better in a nested world (this is hinted at in a point in this document, refer back to it when making this point)
    • Make a point about how doing this at a file/module level is too fine grained and will cause more confusion as the concepts get confused

Foot Notes

[1] https://www.youtube.com/watch?v=-J8YyfrSwTk I believe it is this talk where Yaron Minsky explains that if tests aren't easy to make people won't do it, even if it is important.
I don't have the exact time stamp, and it may be a different talk. Update this reference when it is precisely found.

[2] Hence + for string concatenation in java is horrible and ought to be taxed. Even making it be ++ to make it slightly more taxing to write may indeed lower some usage.

[3] Note it has to be like this, as many shortcuts we can make, can run into an ambiguity and frustrating of wanting to go to BusinessLogic's Important rather than BusinessLogic.Important.