The purpose of this lab is to learn how to translate between languages (English, Spanish, German, …) using Grammatical Framework.
See also last week's introductory notes on grammar.
See below for Homework, Due 2/16.
To follow along with the lab, go to https://cloud.grammaticalframework.org/gfse/ or, if you have GF installed, create a new folder called HelloGrammar.
All the languages we translate between share the same abstract grammar, but every language has its own concrete grammar. Translation works, very roughly, as follows:
concrete grammar English -> abstract gr -> concrete gr Spanish
The algorithms behind -> are implemented in GF; all we have to do is write the grammars.
The arrow -> on the left does what is called parsing; the arrow on the right does what is called linearization.
Aim: At the end of these notes, you will be able to translate between a simple fragment of English and Spanish.
When using Grammatical Framework, you'll almost always need two or more grammars working in tandem to produce the desired result. The first is the abstract grammar, essentially a generalized definition of your language that is more machine readable than human readable. The goal of the abstract grammar is to define which abstract syntax trees are possible in your language. Here is a very simple abstract grammar that lets you greet someone:
abstract Hello = {
flags coding = utf8 ;
flags startcat = Greeting;
cat
Greeting; Recipient;
fun
Hello : Recipient -> Greeting;
Welcome : Recipient -> Greeting;
Friends : Recipient;
Mom : Recipient;
World : Recipient;
}
abstract Hello = {}
This is simply the header for an abstract grammar: the keyword abstract followed by the name of your grammar. If you were writing a grammar for, say, a calculator, you might write abstract Calculator instead.
flags coding = utf8 ;
flags startcat = Greeting;
Flags handle meta information about your grammar. The first one sets the character encoding that the grammar file uses. The second flag sets the start category, that is, the category GF expects by default when it parses a sentence.
cat
Greeting; Recipient;
The keyword cat (category) lets you declare all the "categories" present in the grammar. Categories are similar to parts of speech or sentence constituents. For example, a noun phrase would be a category, and so would a noun. You might expect that a noun phrase and a noun can be used in the same context.
In this case we have only two categories: Recipient, which represents who is being greeted, and Greeting, which combines a greeting phrase with a recipient to produce a complete sentence.
fun
Hello : Recipient -> Greeting;
Welcome : Recipient -> Greeting;
Functions (fun) define how categories are pieced together and where specific words are included. The specifics of how these functions work are handled in the concrete versions of the grammar. For now, we can think of funs in an abstract grammar as headers that will be filled in later, once we get to the concrete grammars.
When a function type lists several categories, the final category is the "return value" and every category before it is a parameter. So in this case, the Hello function takes a Recipient and turns it into a Greeting.
Friends : Recipient;
Mom : Recipient;
World : Recipient;
These functions don't take any parameters; each one simply represents a single word of type Recipient.
Concrete grammars are the second type of grammar you'll be working with. Each one is an implementation of an abstract grammar. GF lets you write concrete grammars for many different languages, but for the sake of ease, let's start with English:
concrete HelloEng of Hello = {
flags coding = utf8 ;
lincat Greeting, Recipient = {s : Str} ;
lin
Hello recip = {s = "hello" ++ recip.s} ;
Welcome recip = {s = "welcome" ++ recip.s} ;
World = {s = "world"} ;
Mom = {s = "mom"} ;
Friends = {s = "friends"} ;
}
concrete HelloEng of Hello = {}
This is the header for a concrete grammar. The convention for the name of the grammar is [AbstractName + Language Abbreviation]. You must specify of [AbstractName] as well.
lincat Greeting, Recipient = {s : Str} ;
The lincat keyword defines the linearization type of each category, that is, the data the concrete grammar stores for it. More complex data can be stored in lincats, but for now Greeting and Recipient are both given a single piece of data: a string called s.
lin
Hello recip = {s = "hello" ++ recip.s} ;
Welcome recip = {s = "welcome" ++ recip.s} ;
The lin keyword defines "linearizations" for the functions declared in the abstract grammar. Every function needs a linearization. These two lines are the linearizations of the two greeting functions declared in Hello. Since both functions take a Recipient as a parameter, the function name is followed by a local variable, here called recip, that stands for that Recipient. Each linearization then says that the string s is some word concatenated with the string stored in recip. As you can see, the concatenation operator is ++. A space is inserted automatically.
World = {s = "world"} ;
Mom = {s = "mom"} ;
Friends = {s = "friends"} ;
Finally, all we need to define for the three Recipient functions is the word they represent.
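If you have GF installed, you can try the grammar out in the GF shell. The following is a minimal sketch, assuming the two modules above are saved as Hello.gf and HelloEng.gf in the same folder (GF expects the file name to match the module name) and that you started the shell by running gf in that folder:

```gf
import HelloEng.gf
parse "hello world"
linearize Welcome Mom
```

With the grammar as given, parse should print the abstract syntax tree Hello World, and linearize should print the string welcome mom.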
GF allows concrete implementations in a variety of languages. Let's implement the same grammar in Spanish. (Note that, as before, you can provide definitions for multiple lincats on the same line.) With a simple grammar like this, it's easy enough to do:
concrete HelloSpa of Hello = {
flags coding = utf8 ;
lincat Greeting, Recipient = {s : Str} ;
lin
Hello recip = {s = "hola" ++ recip.s} ;
Welcome recip = {s = "bienvenidos" ++ recip.s} ;
World = {s = "mundo"} ;
Mom = {s = "mamá"} ;
Friends = {s = "amigos"} ;
}
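With both concrete grammars in place, translation is just parsing in one language piped into linearization in the other. Here is a sketch for the GF shell, assuming the files are named HelloEng.gf and HelloSpa.gf after their modules and sit next to Hello.gf:

```gf
import HelloEng.gf HelloSpa.gf
parse -lang=HelloEng "hello friends" | linearize -lang=HelloSpa
parse -lang=HelloSpa "bienvenidos mundo" | linearize -lang=HelloEng
```

The first pipeline should print hola amigos and the second should print welcome world.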
Let's see some more functionality of GF by adding adjectives to our grammar. It's a good idea to always start by updating the abstract grammar and then move on to the concrete implementations.
abstract Hello = {
flags coding = utf8 ;
flags startcat = Greeting;
cat
Adjective; Greeting; Recipient;
fun
Hello : Recipient -> Greeting;
Welcome : Recipient -> Greeting;
RecipPhrase : Adjective -> Recipient -> Recipient;
Friends : Recipient;
Mom : Recipient;
World : Recipient;
Awesome : Adjective;
Happy : Adjective;
}
Let's dissect what our new abstract grammar means.
cat Adjective; ...
As discussed, we need a new category, so best to include it with the others.
RecipPhrase : Adjective -> Recipient -> Recipient;
This function is the core of our new grammatical construct. Essentially it is saying that the combination of a single Adjective and a single Recipient is now considered a Recipient. This has the added bonus of allowing for multiple adjectives to be added together, but we'll see why that is in a bit.
Awesome : Adjective;
Happy : Adjective;
Just like before, we need to define some words as well, this time ones that evaluate to Adjectives.
With these changes, updating our concrete English grammar is easy enough:
concrete HelloEng of Hello = {
flags coding = utf8 ;
lincat
Greeting = {s : Str};
Recipient = {s : Str};
Adjective = {s : Str};
lin
Hello recip = {s = "hello" ++ recip.s};
Welcome recip = {s = "welcome" ++ recip.s};
World = {s = "world"};
Mom = {s = "mom"};
Friends = {s = "friends"};
RecipPhrase adj recip = {s = adj.s ++ recip.s};
Awesome = {s = "awesome"};
Happy = {s = "happy"};
}
As you can see, all it takes is defining a simple lincat for Adjective and some linearizations for RecipPhrase and our new adjectives.
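Because RecipPhrase returns a Recipient, its result can be passed as the second argument of another RecipPhrase, which is why adjectives can be stacked. A quick check in the GF shell, assuming the updated Hello.gf and HelloEng.gf are saved under those names in the current folder:

```gf
import HelloEng.gf
linearize Hello (RecipPhrase Happy (RecipPhrase Awesome Friends))
parse "welcome awesome mom"
```

Here linearize should print hello happy awesome friends, and parse should print the tree Welcome (RecipPhrase Awesome Mom).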
Unfortunately, Spanish isn't quite as simple because it makes use of two grammatical concepts that English doesn't have: grammatical gender, and number agreement between nouns and adjectives. For example, we want "Hello happy friends" to map to "Hola amigos alegres" and we want "Welcome awesome world" to yield "Bienvenidos mundo maravilloso".
Luckily, Grammatical Framework gives concrete grammars another tool: param. Params (or parameters) can be things like verb tense, number, gender, formality, etc.: essentially anything that can inflect (change the form of) a word.
GF cloud has a cool tool where you can explore all combinations of inflections for many different languages.
For our Spanish grammar, we need to define two parameters: Number and Gender. Let's see that in action.
concrete HelloSpa of Hello = {
flags coding = utf8 ;
lincat
Adjective = {s : Gender => Str};
Greeting = {s : Str};
Recipient = {s : Str; g : Gender};
lin
Hello recip = {s = "hola" ++ recip.s};
Welcome recip = {s = "bienvenidos" ++ recip.s};
World = {s = "mundo"; g = Masc Sg};
Mom = {s = "mamá"; g = Fem Sg};
Friends = {s = "amigos"; g = Masc Pl};
RecipPhrase adj recip = {s = recip.s ++ (adj.s ! recip.g); g = recip.g};
Awesome = {s = table {
Masc Sg => "maravilloso";
Masc Pl => "maravillosos";
Fem Sg => "maravillosa";
Fem Pl => "maravillosas"
}};
Happy = {s = table {
Masc Sg => "alegre";
Masc Pl => "alegres";
Fem Sg => "alegre";
Fem Pl => "alegres"
}};
param
Number = Sg | Pl;
Gender = Masc Number | Fem Number;
}
param
Number = Sg | Pl;
Gender = Masc Number | Fem Number;
First, we need to define two parameters: Number and Gender. As you can see, Gender is a composite of both its own data and data from Number. This makes the rest of the code significantly simpler as we inflect our adjectives.
At their core, params are simply extra bits of data that are made available to categories. This is very similar to member variables in classes.
Recipient = {s : Str; g : Gender};
Among all the categories, it's the Recipient category that needs to store its number and gender. Since our Gender param knows about number too, you just need to include one more field, g, as part of the Recipient category.
Adjective = {s : Gender => Str};
Here, we need to let GF know that Adjectives are more than just a simple string. The type Gender => Str says that the string form of an adjective depends on a Gender param. This changes how the linearizations for our adjectives will look.
Awesome = {s = table {
Masc Sg => "maravilloso";
Masc Pl => "maravillosos";
Fem Sg => "maravillosa";
Fem Pl => "maravillosas"
}};
Happy = {s = table {
Masc Sg => "alegre";
Masc Pl => "alegres";
Fem Sg => "alegre";
Fem Pl => "alegres"
}};
To provide a value of type Gender => Str, we need a table. (In fact, X => Y in Grammatical Framework is the data type of tables.) As you can see in each of the tables, we need to define what the adjective should evaluate to for every combination of gender and number.
Also, literal Gender values have exactly the same format as in our param definition: either Masc or Fem, followed by Sg or Pl.
World = {s = "mundo"; g = Masc Sg};
Mom = {s = "mamá"; g = Fem Sg};
Friends = {s = "amigos"; g = Masc Pl};
These lines of code show how to encode the gender and number in our Recipient linearizations. As you can see, the different fields of a record are separated by ;.
RecipPhrase adj recip = {s = recip.s ++ (adj.s ! recip.g); g = recip.g};
Finally, this line ties everything together by showing how to pick the correct form of the adjective for a given recipient. The expression adj.s ! recip.g evaluates to the adjective string. Once again, adj.s is a table of type Gender => Str. The ! operator in GF is how you "call" a table. So the table is adj.s and the param value is recip.g. Note also that the adjective now comes after the noun, as Spanish word order requires.
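To see the agreement at work, you can translate the two example sentences in the GF shell. This sketch assumes the adjective versions of Hello.gf, HelloEng.gf and HelloSpa.gf shown above are saved under those file names:

```gf
import HelloEng.gf HelloSpa.gf
parse -lang=HelloEng "welcome awesome world" | linearize -lang=HelloSpa
parse -lang=HelloEng "hello happy friends" | linearize -lang=HelloSpa
linearize -lang=HelloSpa Hello (RecipPhrase Awesome Mom)
```

With the grammar as written, these commands should print bienvenidos mundo maravilloso, hola amigos alegres and hola mamá maravillosa, in that order. In each case the adjective form is the one selected by the g field of the recipient.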
As is the case with many languages, the ability to define reusable code is very important for writing readable, scalable GF code. Whereas Python has functions and Java has methods, GF has operations (opers). Let's see how defining just two opers can make our Spanish grammar much shorter while still accomplishing the same goal.
concrete HelloSpa of Hello = {
flags coding = utf8 ;
lincat
Adjective = Adj;
Greeting = {s : Str};
Recipient = {s : Str; g : Gender};
lin
Hello recip = {s = "hola" ++ recip.s};
Welcome recip = {s = "bienvenidos" ++ recip.s};
World = {s = "mundo"; g = Masc Sg};
Mom = {s = "mamá"; g = Fem Sg};
Friends = {s = "amigos"; g = Masc Pl};
RecipPhrase adj recip = {s = recip.s ++ (adj.s ! recip.g); g = recip.g};
Awesome = mkAdj "maravilloso" "maravillosos" "maravillosa" "maravillosas";
Happy = mkAdj "alegre" "alegres" "alegre" "alegres";
param
Number = Sg | Pl;
Gender = Masc Number | Fem Number;
oper
Adj : Type = {s : Gender => Str};
mkAdj : Str -> Str -> Str -> Str -> Adj = \ms,mp,fs,fp -> {
s = table {
Masc Sg => ms;
Masc Pl => mp;
Fem Sg => fs;
Fem Pl => fp
}
};
}
Adj : Type = {s : Gender => Str};
Our first oper is just a simple type definition. You may notice that the first part of the line looks very similar to the funs in our abstract grammar (e.g. Friends : Recipient). Essentially we're saying that the oper Adj is of type Type. After the equals sign is where we actually define what the type "looks like". In this case, we once again have a record whose field s is a table from Gender to Str. (Note: the standard convention in GF is to capitalize the first letter of types.)
This kind of oper is useful if you have a bunch of lincats that all have the same structure. But a much more important use is so you can define more complex opers that can make use of your new type as we'll see in a bit.
Adjective = Adj;
Type opers can simply be substituted for the full type in a lincat. If we wanted to include another grammatical concept that depends on gender and number, we could very easily do so without writing out the record type again.
mkAdj : Str -> Str -> Str -> Str -> Adj = \ms,mp,fs,fp -> {
s = table {
Masc Sg => ms;
Masc Pl => mp;
Fem Sg => fs;
Fem Pl => fp
}
};
Opers can be much more than simple type definitions! This oper defines a "constructor" of sorts that lets us very easily make a new Adjective.
First off, the type of this oper is specified, once again, between the : and the =. In this case, it takes four strings and returns an Adj. After the = is something you may not have seen before: a lambda expression. If you've never seen lambda expressions, don't worry; in this case it simply gives names to each of the parameters. The full function header translated into Java would be:
Adj mkAdj(String ms, String mp, String fs, String fp) {...}
Second, the body of the oper should look familiar! It's the same table definition format that we used before, where each gender/number combo is given a unique string.
Awesome = mkAdj "maravilloso" "maravillosos" "maravillosa" "maravillosas";
Happy = mkAdj "alegre" "alegres" "alegre" "alegres";
Here is how we actually use our new oper. If you've worked with Haskell before, this syntax should look familiar. If not: functional languages don't use parentheses to call functions the way imperative languages do. Instead, a function is called by writing its name followed by each of the arguments, separated by spaces. You can place the whole function call in parentheses, which would be especially useful if you wanted to use the resulting Adj as the argument of another oper. But in this case, it's not necessary.
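Opers can also compute forms instead of just storing them. As an illustration that goes slightly beyond what the lab needs, here is a sketch of a hypothetical oper regAdj that guesses all four forms from the masculine singular. The names SpaAdjSketch and regAdj are made up for this example and are not part of the lab's grammar. The sketch relies on GF's string patterns (stem + "o") and on the gluing operator +, and it only covers the two patterns our two adjectives follow, so an adjective like "feliz" would still need the four-argument mkAdj.

```gf
resource SpaAdjSketch = {
  param
    Number = Sg | Pl ;
    Gender = Masc Number | Fem Number ;
  oper
    Adj : Type = {s : Gender => Str} ;

    -- Four-argument constructor, same as in HelloSpa above.
    mkAdj : Str -> Str -> Str -> Str -> Adj = \ms,mp,fs,fp -> {
      s = table {
        Masc Sg => ms ;
        Masc Pl => mp ;
        Fem Sg  => fs ;
        Fem Pl  => fp
      }
    } ;

    -- Hypothetical helper: derive the forms from the masculine singular.
    -- Adjectives ending in "o" swap it for "a" in the feminine;
    -- anything else is assumed to have one form per number.
    regAdj : Str -> Adj = \ms -> case ms of {
      stem + "o" => mkAdj ms (ms + "s") (stem + "a") (stem + "as") ;
      _          => mkAdj ms (ms + "s") ms (ms + "s")
    } ;
}
```

With opers like these available in HelloSpa, the two adjective linearizations could shrink to Awesome = regAdj "maravilloso" and Happy = regAdj "alegre".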
(due before the start of class 2/16/22)
Greet : Type =
Recip : Type =
mkGreet : Str -> Recip -> Greet = \txt,recip ->
mkRecip : Str -> Gender -> Recip = \txt,g ->
mkRecipPhr : Adj -> Recip -> Recip = \adj,recip ->
If you want to add a German concrete grammar to translate from and to English and Spanish, here are some hints. For more, ask on the Slack channel.
German word order, for now, is as in English rather than as in Spanish.
Here is a little dictionary:
English | German |
---|---|
Hello | Hallo |
Welcome | Willkommen |
World | Welt |
Mom | Mama |
Friends | Freunde |
awesome | wunderbar |
happy | freundlich |
German nouns can have one of 3 grammatical genders: masculine, feminine, or neuter. These do not need to align with what we would expect from their meanings. E.g., "Mädchen" for English "girl" is neuter.
The endings of the adjectives (these work for the ones listed above; I didn't check an official grammar):
Gender Number | Ending |
---|---|
Masc Sg | -er |
Masc Pl | -e |
Fem Sg | -e |
Fem Pl | -e |
Neut Sg | -es |
Neut Pl | -e |
For example, "awesome friends" becomes "wunderbare Freunde", adding the "e" at the end of "wunderbar" to indicate plural.
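One possible way to encode the endings table as an oper is sketched below. The module name GerAdjSketch and the oper regAdj are hypothetical, and the three-gender Gender param is an assumption about how you might extend the Spanish version. The endings are exactly the ones from the table above, so they carry the same caveat.

```gf
resource GerAdjSketch = {
  param
    Number = Sg | Pl ;
    Gender = Masc Number | Fem Number | Neut Number ;
  oper
    Adj : Type = {s : Gender => Str} ;

    -- Hypothetical helper: glue the endings from the table onto a stem.
    regAdj : Str -> Adj = \stem -> {
      s = table {
        Masc Sg => stem + "er" ;
        Neut Sg => stem + "es" ;
        _       => stem + "e"    -- Fem Sg and all plural forms
      }
    } ;
}
```

For instance, regAdj "wunderbar" selected with Masc Pl would give "wunderbare", matching the example above.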