OCaml is a statically and strongly typed programming language. It is also an expression-oriented language: everything is a value, and every value has a type. Functions and types are the two foundational principles of OCaml. The OCaml type system is highly expressive, providing many advanced constructs, yet it is easy to use and unobtrusive. Thanks to type inference, programs can be written without type annotations, except for documentation purposes and a few corner cases. The basic types and the type combination operations enable a vast range of possibilities.
This tutorial begins with a section presenting the types which are predefined in OCaml. It starts with atomic types such as integers and booleans. It continues by presenting predefined compound types such as strings and lists. The tutorial ends with a section about user-defined types: variants and records.
OCaml provides several other types, but they are all extensions of those presented in this tutorial. The types in the scope of this tutorial are all the basic constructors and the most common predefined types.
This is an intermediate-level tutorial. The only prerequisite is to have completed the Get Started series of tutorials.
The goal of this tutorial is to provide the following capabilities:
Here is an integer:
The int
type is the default and basic type of integer numbers in OCaml. It represents platform-dependent signed integers. This means int does not always have the same number of bits, depending on underlying platform characteristics such as processor architecture or operating system. Operations on int
values are provided by the Stdlib
and the Int
modules.
Usually int has 31 bits on 32-bit architectures and 63 bits on 64-bit architectures, because one bit is reserved for OCaml's runtime operation. The standard library also provides the Int32 and Int64 modules, which support platform-independent operations on 32-bit and 64-bit signed integers. These modules are not detailed in this tutorial.
There are no dedicated types for unsigned integers in OCaml; bitwise operations on int just ignore the sign bit. Binary operators use standard symbols; the signed remainder is written mod. There is no predefined power operator on integers in OCaml.
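The points above can be sketched as follows, using only operators from Stdlib and one function from the Int module. The helper pow is hypothetical: it is written here only because no integer power operator is predefined.

```ocaml
(* Integer arithmetic uses the usual symbols; the remainder is written mod. *)
let quotient = 17 / 5                  (* integer division truncates *)
let remainder = 17 mod 5
let negative_remainder = (-17) mod 5   (* the sign follows the dividend *)

(* Bitwise operations are named operators such as land, lor, lxor, lsl. *)
let masked = 0b1100 land 0b1010

(* The Int module provides the same operations as plain functions. *)
let absolute = Int.abs (-5)

(* Hypothetical helper: integer power by repeated squaring.
   Assumes n >= 0; overflow is not detected. *)
let rec pow base n =
  if n = 0 then 1
  else
    let half = pow base (n / 2) in
    if n mod 2 = 0 then half * half else base * half * half

let () =
  Printf.printf "%d %d %d %d %d %d\n"
    quotient remainder negative_remainder masked absolute (pow 2 10)
```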
Fixed-size floating-point numbers have type float. Operations on float comply with the IEEE 754 standard, with 53 bits of mantissa and an exponent ranging from -1022 to 1023.
OCaml does not perform any implicit type conversion between values. Therefore, arithmetic expressions can't mix integers and floats; operands are either all int or all float. Arithmetic operators on floats are not the same as those on integers: they are written with a dot suffix: +.
, -.
, *.
, /.
.
Operations on float
are provided by the Stdlib
and the Float
modules.
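As a small illustration of the dotted operators and of explicit conversion, here is a sketch that only uses functions from Stdlib. The names circumference, half_of and truncated are chosen for the example.

```ocaml
(* Float operators carry a dot suffix. *)
let circumference = 2.0 *. 3.14159 *. 1.5

(* Mixing int and float requires an explicit conversion. *)
let half_of n = float_of_int n /. 2.0

(* Converting back truncates toward zero. *)
let truncated = int_of_float 2.9

(* An expression such as 2 + 1.5 is rejected by the type checker. *)
let () = Printf.printf "%f %f %d\n" circumference (half_of 3) truncated
```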
Boolean values are represented by the type bool
.
Operations on bool
are provided by the Stdlib
and the Bool
modules. Conjunction (“and”) is written && and disjunction (“or”) is written ||; neither evaluates its right argument if the value of its left argument is sufficient to decide the value of the whole expression.
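The short-circuit behaviour can be observed with a right argument that would fail if it were evaluated. In this sketch, boom is a hypothetical helper introduced only for that purpose.

```ocaml
(* Hypothetical helper: raises an exception as soon as it is called. *)
let boom () : bool = failwith "the right argument was evaluated"

(* The left argument decides the result, so boom is never called. *)
let a = false && boom ()
let b = true || boom ()

(* Negation is the function not. *)
let c = not (a || b)

let () = Printf.printf "%b %b %b\n" a b c
```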
Values of type char
correspond to the 256 symbols defined in the ISO/IEC 8859-1 standard. Character literals are surrounded by single quotes. Here is an example.
Operations on char
values are provided by the Stdlib
and the Char
modules.
The module Uchar
provides support for Unicode characters.
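A brief sketch of character values, using functions from the Char and Uchar modules. The particular characters are arbitrary choices for the example.

```ocaml
(* A character literal is surrounded by single quotes. *)
let letter = 'a'

(* Conversion between a character and its numeric code. *)
let code = Char.code letter
let from_code = Char.chr 233   (* code 233 is e with acute accent in ISO/IEC 8859-1 *)

let upper = Char.uppercase_ascii letter

(* Uchar.t covers Unicode scalar values beyond the 256 codes of char. *)
let lambda = Uchar.of_int 0x03BB

let () = Printf.printf "%c %d %c %d\n" letter code upper (Uchar.to_int lambda)
```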
Strings are finite and fixed-sized sequences of values of type char
. Strings are immutable: it is impossible to change a character inside a string. The string concatenation operator is written ^.
Indexed access to string characters is possible using the following syntax:
Operations on string
values are provided by the Stdlib
and the String
modules.
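Concatenation and indexed access can be sketched as follows. The string contents are arbitrary; the s.[i] syntax and the String functions are the standard ones.

```ocaml
(* Concatenation with the ^ operator. *)
let greeting = "hello" ^ " " ^ "world"

(* Indexed access: indices start at 0. *)
let first = greeting.[0]

let length = String.length greeting
let shout = String.uppercase_ascii greeting

(* Accessing an index outside the string raises Invalid_argument. *)
let out_of_bounds =
  try Some greeting.[length] with Invalid_argument _ -> None

let () = Printf.printf "%s %c %d %s\n" greeting first length shout
```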
Byte sequences are finite and fixed-sized sequences of bytes. Each individual byte is represented by a char
value. Byte sequences are mutable: they can't be extended or shortened, but each component byte may be updated. Essentially, a value of type bytes is a mutable string that can't be printed. There is no way to write a bytes literal; it must be produced by a function.
Operations on bytes
values are provided by the Stdlib
and the Bytes
modules. Only the function Bytes.get allows direct access to the characters contained in a byte sequence. There is no direct access operator on byte sequences.
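Since there is no literal syntax, a byte sequence is obtained from a function such as Bytes.of_string or Bytes.make. The following sketch shows in-place update and access; the contents are arbitrary.

```ocaml
(* A byte sequence must be produced by a function. *)
let buffer = Bytes.of_string "hello"
let blank = Bytes.make 3 '-'

(* Each byte can be updated in place; the length never changes. *)
let () = Bytes.set buffer 0 'j'

(* Reading goes through Bytes.get. *)
let second = Bytes.get buffer 1

(* Converting to a string makes the content printable. *)
let as_string = Bytes.to_string buffer

let () = print_endline as_string
```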
Arrays are finite and fixed-sized sequences of values of the same type. Here are a couple of examples:
Arrays may contain values of any type. Here the arrays are int array, char array and string array, but any type of data can be used in an array. Usually, array is said to be a polymorphic type. Strictly speaking, it is a type operator: it accepts a type as parameter (here int, char and string) to form another type (those inferred here). This is the empty array.
Here 'a
means “any type”. It is called a type variable and is usually pronounced as if it were the Greek letter α (“alpha”). This is the type parameter, meant to be replaced by another type.
Like string
and bytes
, arrays support direct access, but the syntax is not the same.
Arrays are mutable: they can't be extended or shortened, but each component value may be updated.
Operations on arrays are provided by the Array module. There is a dedicated tutorial on Arrays.
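The array features described in this section can be sketched together: literals, direct access with the .( ) syntax, and in-place update. The values are arbitrary examples.

```ocaml
(* Array literals are written between [| and |]. *)
let numbers = [| 1; 2; 3 |]            (* int array *)
let letters = [| 'a'; 'b'; 'c' |]      (* char array *)
let words = [| "foo"; "bar" |]         (* string array *)

(* The empty array has length 0. *)
let empty_length = Array.length [||]

(* Direct access uses a dot and parentheses, unlike strings. *)
let first = numbers.(0)

(* Components can be updated in place. *)
let () = numbers.(1) <- 20

(* Array.map builds a new array and leaves the original untouched. *)
let doubled = Array.map (fun x -> 2 * x) numbers

let () = Array.iter (Printf.printf "%d ") doubled
let () = print_newline ()
```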
As literals, lists are very much like arrays. Here are the same examples as previously, turned into lists.
Like arrays, lists are finite sequences of values of the same type. They are polymorphic too. However, lists are extensible, immutable and don't support direct access to all the values they contain. Lists play a central role in functional programming; they are the subject of a dedicated tutorial.
Operations on lists are provided by the List
module. The List.append function, which concatenates two lists, can also be used as an operator with the symbol @.
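A short sketch of list literals and concatenation. The values mirror the kind of examples used for arrays and are otherwise arbitrary.

```ocaml
(* List literals are written between square brackets. *)
let numbers = [1; 2; 3]          (* int list *)
let letters = ['a'; 'b'; 'c']    (* char list *)
let words = ["foo"; "bar"]       (* string list *)

(* Concatenation builds a new list; numbers itself is unchanged. *)
let joined = numbers @ [4; 5]
let same = List.append numbers [4; 5]

let () = List.iter (Printf.printf "%d ") joined
let () = print_newline ()
```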
Two symbols are of special importance with respect to lists.
[]
: the empty list; it has type 'a list and is pronounced “nil”
::
: pronounced “cons”; it is used to add a value at the head of a list
Together, they are the basic means to build lists and access the data stored in lists. For instance, here is how lists are built by successively applying the cons operator.
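Building a list one element at a time with the cons operator can be sketched like this; the intermediate names are illustrative.

```ocaml
(* Each application of :: adds one value at the head of a list. *)
let one = 3 :: []
let two = 2 :: one
let three = 1 :: two

(* The same list written in one expression and as a literal. *)
let direct = 1 :: 2 :: 3 :: []
let literal = [1; 2; 3]

let () = Printf.printf "%b\n" (three = literal)
```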
Pattern matching provides the basic means to access data stored inside a list.
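A minimal sketch of such a match, applied to the value [1; 2; 3] with the patterns [] and x :: u. The names described and sum are chosen for the example.

```ocaml
(* Matching directly on a list value. *)
let described =
  match [1; 2; 3] with
  | [] -> "empty"
  | x :: u -> Printf.sprintf "head %d, %d more" x (List.length u)

(* The same two patterns drive a recursive function over any int list. *)
let rec sum list =
  match list with
  | [] -> 0
  | x :: u -> x + sum u

let () = Printf.printf "%s, sum %d\n" described (sum [1; 2; 3])
```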
In the above expressions, [1; 2; 3] is the value which is matched over. Each expression between the | and -> symbols is a pattern. Patterns are expressions of type list, formed using only [], :: and variable names; they represent the various shapes a list may have. When the pattern is [], it means “if the list is empty”. When the pattern is x :: u, it means “if the list contains data, let x be the first element of the list and u be the rest of the list.” Expressions to the right of the -> symbols are the results returned in each corresponding case.
Operations on lists are provided by the List
module. There is a dedicated tutorial on Lists.
The option
type is also a polymorphic type. Option values can store any kind of data, or represent absence of any such data. Option values can only be constructed in two different ways; either None
when no data is available or Some
otherwise.
Here is an example of pattern matching on an option value.
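In the sketch below, safe_div and describe are hypothetical helpers: the first produces an option value, the second consumes it by pattern matching.

```ocaml
(* Hypothetical helper: division that reports absence of a result with None. *)
let safe_div a b = if b = 0 then None else Some (a / b)

(* Pattern matching covers the only two possible shapes of an option. *)
let describe = function
  | None -> "no result"
  | Some q -> "quotient " ^ string_of_int q

(* Option.value extracts the content or falls back to a default. *)
let or_zero = Option.value (safe_div 1 0) ~default:0

let () = print_endline (describe (safe_div 7 2))
```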
Operations on options are provided by the Option
module. Options are discussed in the Error Handling guide.
When it makes sense to mark the outcomes of a function as being either failure or success, the result type can do it. There are only two ways to build a result value: using either Ok or Error, with the intended meaning. Both constructors can hold any kind of data. The result type is polymorphic, but it has two type parameters: one for Ok values, another for Error values.
Operations on results are provided by the Result
module. Results are discussed in the Error Handling guide.
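As an illustration, here is a sketch of a function returning a result. The function parse_age and its error messages are invented for the example; int_of_string_opt and Result.is_ok come from the standard library.

```ocaml
(* Hypothetical parser: Ok carries an int, Error carries a string. *)
let parse_age text =
  match int_of_string_opt text with
  | None -> Error ("not a number: " ^ text)
  | Some n when n < 0 -> Error "negative age"
  | Some n -> Ok n

(* Both constructors are handled by pattern matching. *)
let report = function
  | Ok age -> "age " ^ string_of_int age
  | Error message -> "error: " ^ message

let () = print_endline (report (parse_age "42"))
let () = print_endline (report (parse_age "forty-two"))
```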
Here is a tuple, actually a pair.
This is a pair containing the integer 3 and the character 'a'; its type is int * char. The * symbol stands for product type.
This generalizes to tuples with 3 or more components. For instance, (6.28, true, "hello") has type float * bool * string. The types int * char and float * bool * string are called product types. The *
symbol is used to
The predefined function fst
returns the first component of a pair, while snd
returns the second component of a pair.
In the standard library, both are defined using pattern matching. Here is how a function extracting the third component of a product of four types can be defined.
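A sketch of such definitions follows. The names first and third_of_four are hypothetical; first behaves like the predefined fst.

```ocaml
let pair = (3, 'a')                    (* int * char *)
let triple = (6.28, true, "hello")     (* float * bool * string *)

(* A projection is a pattern that names one component and ignores the others. *)
let first (x, _) = x
let third_of_four (_, _, z, _) = z

let () =
  Printf.printf "%d %c %s\n"
    (first pair) (snd pair) (third_of_four (1, 2.0, "three", '4'))
```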
Note that the product type operator *
is not associative. The types int * char * bool, int * (char * bool) and (int * char) * bool are not the same, and the values (42, 'a', true), (42, ('a', true)) and ((42, 'a'), true) are not equal.
The type of functions from type a
to type b
is written a -> b
. Here are a few examples:
The first expression is an anonymous function of type int -> int. The type is inferred from the expression x * x, which must be of type int since * is an operator which returns an int. The <fun> printed in place of the value is a token meaning functions don't have a value to be displayed. This is because, once they have been compiled, their code may not be available.
The second expression is a function application: the function is applied to the argument 9 and the result 81 is returned.
The first expression is another anonymous function: the identity function, which returns its argument unchanged. This function can be applied to anything, since anything can be returned unchanged. This means the parameter of that function can be of any type, and the result must have the same type. This is called polymorphism: the same code can be applied to data of different types.
This is what is indicated by the 'a in the type (pronounced as the Greek letter α, “alpha”). This is a type variable. It means values of any type can be passed to the function. When that happens, their type is substituted for the type variable. This also expresses that identity has the same input and output type, whatever it may be.
The two following expressions show that the identity function can indeed be applied to parameters of different types.
Defining a function is the same as giving a name to any value. This is what is illustrated in the first expression.
When writing in OCaml, a lot of functions are written. The function g is defined here using a shorter, more common and maybe more intuitive syntax.
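The expressions discussed in this section can be sketched as compilable definitions. The names f, id and g are illustrative; the inferred types are given in comments.

```ocaml
(* An anonymous function applied directly to the argument 9. *)
let eighty_one = (fun x -> x * x) 9

(* The identity function is polymorphic: 'a -> 'a. *)
let id = fun x -> x
let number = id 42
let text = id "forty-two"

(* Naming a function is the same as naming any other value: int -> int. *)
let f = fun x -> x * x

(* The shorter and more common syntax for the same definition. *)
let g x = x * x

let () = Printf.printf "%d %d %s %d %d\n" eighty_one number text (f 9) (g 9)
```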
In OCaml, a function may terminate without returning a value of the expected type by throwing an exception; this does not appear in its type. There is no way to know if a function may raise an exception without inspecting its code.
Functions may have several parameters.
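A minimal sketch of functions with several parameters. The names add, increment and apply_twice are chosen for the example; the types in comments are those inferred by OCaml.

```ocaml
(* Two parameters: int -> int -> int *)
let add x y = x + y

(* Supplying only the first argument yields a function: int -> int *)
let increment = add 1

(* A function taking another function as parameter: ('a -> 'a) -> 'a -> 'a *)
let apply_twice f x = f (f x)

let () = Printf.printf "%d %d %d\n" (add 2 3) (increment 41) (apply_twice increment 0)
```

The type int -> int -> int is read as int -> (int -> int), which is why add 1 is itself a function.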
Like the product type symbol *, the function type symbol -> is not associative. These two types are not the same:
(int -> int) -> int
: this is a function taking a function of type int -> int as parameter, and returning an int as result
int -> (int -> int)
: this is a function taking an int as parameter and returning a function of type int -> int as result
A unique value has type unit; it is written () and pronounced “unit”.
The unit
type has several usages. One of its main roles is to serve as a token when a function does not need to be passed data or doesn't have any data to return once it has completed its computation. This happens when functions have side effects such as OS-level I/O. Functions need to be applied to something for their computation to be triggered, and they must also return something. When nothing meaningful can be passed or returned, ()
should be used.
Function read_line
reads an end-of-line terminated sequence of characters from standard input and returns it as a string. Reading input begins when ()
is passed.
Function print_endline
prints the string followed by a line ending on standard output. The return of the unit value means the output request has been queued by the operating system.
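The role of the unit value can be sketched with a function that neither needs input data nor produces output data. The name greet is hypothetical; print_endline is the predefined function discussed above.

```ocaml
(* The only value of type unit. *)
let nothing = ()

(* greet has type unit -> unit: it is triggered by passing (). *)
let greet () = print_endline "hello"

(* Nothing is printed until greet is applied to (). *)
let outcome = greet ()
```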
The simplest form of a variant type corresponds to an enumerated type. It is defined by an explicit list of named values. Defined values are called constructors and must be capitalized.
For example, here is how a variant data type could be defined to represent Dungeons & Dragons character classes and alignments.
Such variant types can also be used to represent week days, cardinal directions or any other fixed-size set of values that can be given names. A total ordering is defined on values, following the definition order (e.g. Druid < Ranger).
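The following sketch shows what such definitions could look like. The type names come from this tutorial; the exact constructor lists are assumptions, picked so that there are 12 classes and 9 alignments.

```ocaml
(* Assumed list of 12 classes, in alphabetical order. *)
type character_class =
  | Barbarian | Bard | Cleric | Druid | Fighter | Monk
  | Paladin | Ranger | Rogue | Sorcerer | Warlock | Wizard

(* Assumed list of 9 alignments. *)
type character_alignment =
  | Lawful_good | Neutral_good | Chaotic_good
  | Lawful_neutral | True_neutral | Chaotic_neutral
  | Lawful_evil | Neutral_evil | Chaotic_evil

(* Comparison follows the order in which constructors are declared. *)
let druid_before_ranger = Druid < Ranger
let first_alignment_is_smallest = compare Lawful_good Chaotic_evil < 0

let () = Printf.printf "%b %b\n" druid_before_ranger first_alignment_is_smallest
```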
Here is how pattern matching can be done on types defined as such.
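As a sketch, the function below matches on an alignment type. The constructor names and the function is_good are assumptions made for the example, and the type is repeated so that the block stands alone.

```ocaml
(* Assumed definition of the alignment type. *)
type character_alignment =
  | Lawful_good | Neutral_good | Chaotic_good
  | Lawful_neutral | True_neutral | Chaotic_neutral
  | Lawful_evil | Neutral_evil | Chaotic_evil

(* Several constructors can share one result by separating them with |. *)
let is_good alignment =
  match alignment with
  | Lawful_good | Neutral_good | Chaotic_good -> true
  | Lawful_neutral | True_neutral | Chaotic_neutral -> false
  | Lawful_evil | Neutral_evil | Chaotic_evil -> false

let () = Printf.printf "%b %b\n" (is_good Neutral_good) (is_good Chaotic_evil)
```

Listing every constructor, rather than using a catch-all pattern, lets the compiler warn when a constructor is later added to the type and not handled.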
Note that:
unit
is an enumerated type: it is a variant with a unique constructor, ().
bool
is also an enumerated type: it is a variant with two constructors, true and false.
A pair (x, y) has type a * b, where a is the type of x and b is the type of y. Some may find it intriguing that a * b is called a product. Although this is not a complete explanation, here is a remark which may help understanding. Consider the product type character_class * character_alignment. There are 12 classes and 9 alignments. Any pair of values from those types inhabits the product type. Therefore, in the product type, there are 9 × 12 = 108 values, which also is a product.
It is possible to wrap data in constructors. The following type has several constructors with data and some without. It represents the different means to refer to a Git commit.
Here is how pattern matching can be used to write a function from commit to string.
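A possible shape for such a type and function is sketched below. The type name commit comes from the text; the constructor names and the function commit_to_string are assumptions made for the example.

```ocaml
(* Assumed constructors: three carry data, four do not. *)
type commit =
  | Hash of string
  | Tag of string
  | Branch of string
  | Head
  | Fetch_head
  | Orig_head
  | Merge_head

(* The function construct matches on an argument that is never named. *)
let commit_to_string = function
  | Hash sha -> sha
  | Tag name -> name
  | Branch name -> name
  | Head -> "HEAD"
  | Fetch_head -> "FETCH_HEAD"
  | Orig_head -> "ORIG_HEAD"
  | Merge_head -> "MERGE_HEAD"

let () = print_endline (commit_to_string (Tag "v1.0"))
```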
Here, the function ... construct is used instead of the match ... with ... construct. Previously, example functions had the form let f x = match x with ... and the variable x did not appear after any of the -> symbols. When this is the case, the function ... construct can be used instead. It stands for fun x -> match x with ... and saves finding a name which is used right after and only once.
A variant definition referring to itself is recursive. A constructor may wrap data from the type being defined.
This is the case of the following definition, which can be used to store JSON values. Here is what it can look like:
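A sketch of such a definition is given below. The type name json and the constructors Array and Object appear in this tutorial; the other constructors and the value example are assumptions.

```ocaml
(* Array and Object refer back to json, which makes the type recursive. *)
type json =
  | Null
  | Bool of bool
  | Int of int
  | Float of float
  | String of string
  | Array of json list
  | Object of (string * json) list

(* A small document built from the constructors. *)
let example =
  Object [ ("name", String "OCaml"); ("tags", Array [ String "typed"; Null ]) ]
```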
Both constructors Array
and Object
contain values of type json
.
Functions defined using pattern matching on recursive variants are often recursive too. This function checks if a name is present in a whole JSON tree.
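One way to write such a function is sketched here. The type is repeated so that the block stands alone; the name has_field and the constructors other than Array and Object are assumptions.

```ocaml
type json =
  | Null
  | Bool of bool
  | Int of int
  | Float of float
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Recursion follows the recursive constructors of the type. *)
let rec has_field name = function
  | Array items -> List.exists (has_field name) items
  | Object fields ->
      List.exists (fun (key, value) -> key = name || has_field name value) fields
  | _ -> false

let document = Object [ ("a", Array [ Int 1; Object [ ("b", Null) ] ]) ]

let () = Printf.printf "%b %b\n" (has_field "b" document) (has_field "c" document)
```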
Here, the last pattern uses the symbol _, which catches everything. It allows returning false
on all data which is neither Array
nor Object
.
The predefined type option
is defined as a variant type, with two constructors: Some
and None
. It can contain values of any type, such as Some 42
or Some "hola"
. The variant option
is polymorphic. Here is how it is defined in the standard library:
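The definition has the following shape. It is wrapped here in a module with the hypothetical name Option_sketch, only to avoid shadowing the predefined type.

```ocaml
module Option_sketch = struct
  (* 'a is the type parameter: the type of the data wrapped by Some. *)
  type 'a option = None | Some of 'a

  let answer = Some 42          (* int option *)
  let greeting = Some "hola"    (* string option *)
  let nothing = None            (* 'a option *)
end
```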
The predefined type list
is also a polymorphic variant with two constructors. Here is how it is defined in the standard library:
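The definition can be sketched as below. The wrapping module List_sketch and the function length are hypothetical, added so that the predefined list type is not shadowed outside the example.

```ocaml
module List_sketch = struct
  (* Two constructors: [] for the empty list, (::) for a head and a tail. *)
  type 'a list = [] | (::) of 'a * 'a list

  (* Inside this module, both notations build values of the type above. *)
  let numbers = 1 :: 2 :: 3 :: []
  let same = [1; 2; 3]

  let rec length = function
    | [] -> 0
    | _ :: rest -> 1 + length rest
end
```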
The only bit of magic here is the trick turning constructors into symbols. This is left unexplained in this tutorial. The types bool
and unit
also are regular variants, with the same magic:
Implicitly, product types also behave as variant types. For instance, pairs can be seen as inhabitants of this type:
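Such a type can be sketched as follows. The constructor Pair and the type name pair are those used in the text; the conversion functions to_tuple and of_tuple are hypothetical additions showing that both forms carry the same data.

```ocaml
(* A variant with a single constructor wrapping two components. *)
type ('a, 'b) pair = Pair of 'a * 'b

let boxed : (int, bool) pair = Pair (42, true)

(* Pattern matching on the constructor gives access to the components. *)
let to_tuple (Pair (x, y)) = (x, y)
let of_tuple (x, y) = Pair (x, y)
```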
Where (int, bool) pair would be written int * bool and Pair (42, true) would be written (42, true). From the developer's perspective, everything happens as if such a type were declared for every possible product shape. This is what allows pattern matching on products.
Even integers and floats can be seen as enumerated-like variant types, with many constructors and funky syntactic sugar. This is what allows pattern matching on those types.
In the end, the only type construction which does not reduce to a variant is the function arrow type. There is no pattern matching on functions.
Here is an example of a variant type which combines constructors with data and without data, polymorphism and recursion.
It can be used to represent arbitrary labelled binary trees. Using pattern matching, here is how a map function can be defined on this type:
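A sketch of such a type and of its map function follows. The type name tree appears in this tutorial; the constructor names Leaf and Node and the variable names are assumptions.

```ocaml
(* Leaf carries no data; Node carries a label and two subtrees. *)
type 'a tree =
  | Leaf
  | Node of 'a * 'a tree * 'a tree

(* map applies f to every label and preserves the shape of the tree. *)
let rec map f = function
  | Leaf -> Leaf
  | Node (label, left, right) -> Node (f label, map f left, map f right)

let small = Node (1, Leaf, Node (2, Leaf, Leaf))
let doubled = map (fun x -> 2 * x) small
let as_text = map string_of_int small
```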
Remark: OCaml has something called Polymorphic Variants. Although the types option, list and tree are variants and polymorphic, they aren't polymorphic variants; they are type-parametrized variants. Among the functional programming community, the word “polymorphism” is used loosely, whenever anything can be applied to various types. We stick to this usage and say the variants in this section are polymorphic. OCaml polymorphic variants are covered in another tutorial.
Records are like tuples: several values are bundled together. In a tuple, components are identified by their position in the corresponding product type. They are either first, second, third or at some other position. In a record, each component has a name. That's why record types must be declared before being used.
For instance, here is the definition of a record type meant to partially represent a Dungeons & Dragons character.
This is using the types character_class
and character_alignment
defined earlier. Values of type character
carry the same data as inhabitants of this product: string * int * string * character_class * character_alignment * int
.
Access to the fields is done using the dot notation. Here is an example:
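The sketch below declares such a record and reads its fields. The names character, level and ghorghor_bey appear in this tutorial; the other field names, the constructor lists and the field values are assumptions chosen to match the product type given above.

```ocaml
(* Assumed definitions of the two variant types. *)
type character_class =
  | Barbarian | Bard | Cleric | Druid | Fighter | Monk
  | Paladin | Ranger | Rogue | Sorcerer | Warlock | Wizard

type character_alignment =
  | Lawful_good | Neutral_good | Chaotic_good
  | Lawful_neutral | True_neutral | Chaotic_neutral
  | Lawful_evil | Neutral_evil | Chaotic_evil

(* Each component of the record has a name and a type. *)
type character = {
  name : string;
  level : int;
  race : string;
  class_type : character_class;
  alignment : character_alignment;
  armor_class : int;
}

(* Illustrative values only. *)
let ghorghor_bey = {
  name = "Ghorghor Bey";
  level = 17;
  race = "half-ogre";
  class_type = Fighter;
  alignment = Chaotic_good;
  armor_class = -8;
}

(* Fields are read with the dot notation. *)
let () = Printf.printf "%s, level %d\n" ghorghor_bey.name ghorghor_bey.level
```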
To some extent, records also are variants, with a single constructor carrying all the fields as a tuple. Here is how to alternatively define the character record as a variant.
One function is defined for each field, to get the data it contains. This provides the same functionality as the dot notation.
Writing level ghorghor_bey' is the same as ghorghor_bey.level.
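This encoding can be sketched as follows. The names level and ghorghor_bey' come from the text; the type name character', the constructor Character, the other accessors and the values are assumptions, and the variant types are repeated so that the block stands alone.

```ocaml
type character_class =
  | Barbarian | Bard | Cleric | Druid | Fighter | Monk
  | Paladin | Ranger | Rogue | Sorcerer | Warlock | Wizard

type character_alignment =
  | Lawful_good | Neutral_good | Chaotic_good
  | Lawful_neutral | True_neutral | Chaotic_neutral
  | Lawful_evil | Neutral_evil | Chaotic_evil

(* A single constructor carrying all the fields as a tuple. *)
type character' =
  | Character of
      string * int * string * character_class * character_alignment * int

(* One accessor per field, defined by pattern matching. *)
let name (Character (x, _, _, _, _, _)) = x
let level (Character (_, x, _, _, _, _)) = x
let race (Character (_, _, x, _, _, _)) = x
let class_type (Character (_, _, _, x, _, _)) = x
let alignment (Character (_, _, _, _, x, _)) = x
let armor_class (Character (_, _, _, _, _, x)) = x

(* Illustrative values only. *)
let ghorghor_bey' =
  Character ("Ghorghor Bey", 17, "half-ogre", Fighter, Chaotic_good, -8)

let () = Printf.printf "%s, level %d\n" (name ghorghor_bey') (level ghorghor_bey')
```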
Remarks
To be true to facts, it is not possible to encode all records as variants, since OCaml provides a means to define fields whose value can be updated, which isn't available when defining variant types. This is detailed in the tutorial on imperative programming.
Records SHOULD NOT be defined using this technique. It is only demonstrated here to further illustrate the expressive strength of OCaml variants.
This way of defining records MAY be applied to Generalized Algebraic Data Types, which are the subject of another tutorial.
Just like values, any type can be given a name.
This is mostly useful as a means of documentation or as a means to shorten long type expressions.
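A minimal sketch of a type alias: the names point and distance are invented for the example. The alias and the type it abbreviates are interchangeable.

```ocaml
(* point is another name for float * float. *)
type point = float * float

(* The annotations document intent; plain pairs of floats are accepted. *)
let distance ((x1, y1) : point) ((x2, y2) : point) : float =
  let dx = x2 -. x1 and dy = y2 -. y1 in
  sqrt ((dx *. dx) +. (dy *. dy))

let origin : point = (0.0, 0.0)

let () = Printf.printf "%f\n" (distance origin (3.0, 4.0))
```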
This tutorial has provided a comprehensive overview of the basic data types in OCaml and their usage. We have explored the built-in types, such as integers, floats, characters, lists, tuples and strings, and user-defined types: records and variant types. Records and tuples are mechanisms for grouping heterogeneous data into cohesive units. Variants are a mechanism for exposing heterogeneous data as coherent alternatives.
From the data point of view, records and tuples are like conjunction (logical “and”), while variants are like disjunction (logical “or”). This analogy goes very deep, with records and tuples on one side as products and variants on the other side as unions. These are true mathematical operations on data types. Records and tuples play the role of multiplication; that is why they are called product types. Variants play the role of addition. Putting it all together, basic OCaml types are said to be algebraic.
Going further, there are several advanced topics related to data types in OCaml that you can explore to deepen your understanding and enhance your programming skills.