Lambda expressions for Table functions (addendum to DCIC)

Addendum to Section 4.1.4

Section 4.1.4 of your textbook gives some examples of using filter-with and build-column in a format slightly different from the one we are going to use in class.

For 4.1.4.1 Finding Rows:

The textbook writes:
filter-with(shuttle, below-1K)
We want you to use:
filter-with(shuttle, lam(r): below-1K(r) end)

For 4.1.4.3 Adding New Columns:

The textbook writes:
build-column(employees, "total-wage", compute-wages)
We want you to use:
build-column(employees, "total-wage", lam(r): compute-wages(r) end)

Why are we doing it this way? A lambda is just a function without a name (called an "anonymous function"). It allows us to express a computation that we want to happen on some input, for example, each Row of a Table. For now, we'll only use them in one specific context: as the input for filter-with and build-column. Since build-column and filter-with require us to provide a computation that gets applied to each Row, think of breaking up a lambda expression such as lam(r): compute-wages(r) end into two pieces:

An expression on any Row of the table, assuming the Row is named r, e.g. compute-wages(r). In this case, we are calling compute-wages on r, which we know we can do because compute-wages was written to take in a Row with the specific columns of the employees Table.
The piece with lam(r) and end, which is just the syntax we use for defining an anonymous function (which filter-with and build-column will call on each Row of the Table).

By using the lambda syntax, we make it explicit that we are using an expression that computes on each row r of the Table. This will come in handy when we tackle more complicated programming tasks, and it can also save you from having to write a function every time you have to process a table, as we'll see below.
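To see that the two styles behave the same way, here is a minimal sketch using the shuttle example. It assumes filter-with is already available in your file (as in the textbook's setup), and the table contents and the body of below-1K are illustrative stand-ins modeled on the textbook's example.

```pyret
# Assumes filter-with is already available, as in the textbook's setup.
# The table data below is illustrative.
shuttle = table: month, riders
  row: "Jan", 1123
  row: "Feb", 1045
  row: "Mar", 1087
  row: "Apr", 999
end

fun below-1K(r :: Row) -> Boolean:
  doc: "determine whether a row has fewer than 1000 riders"
  r["riders"] < 1000
where:
  below-1K(shuttle.row-n(2)) is false
  below-1K(shuttle.row-n(3)) is true
end

# Textbook style: pass the named function directly
by-name = filter-with(shuttle, below-1K)

# Class style: wrap the same call in a lambda that names each Row r
by-lambda = filter-with(shuttle, lam(r): below-1K(r) end)

check:
  # Both tables keep only the April row
  by-name.row-n(0)["month"] is "Apr"
  by-lambda.row-n(0)["month"] is "Apr"
end
```

Both calls keep the same rows; the lambda version simply spells out that the computation happens on each Row r.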

Simplifying the textbook example without writing a helper function

In our preferred method of calling build-column with a lambda expression, the lambda expression can be any expression on a Row of the table, not just a helper function call! Let's see how we can use that idea to rewrite the example in 4.1.4.3 without a helper function.

Trying out expressions on `Row`s

When we were writing functions, we started by writing example expressions on different, specific pieces of data and highlighting the similarities and differences in order to create a function that can take in custom inputs (like the three stripe flag example). We can do the same thing with expressions on Rows of a Table. Let's do this with the total wage example from 4.1.4, by computing the total wage for some example rows. First, let's grab some example rows from the table using row-n:

harley-row = employees.row-n(0)
obi-row = employees.row-n(1)

How do we compute Harley's total wage and Obi's total wage using the information stored in these rows? We know that we can use the notation row-name[column-name] to extract the value of a specific column within a row (Section 4.1.2). If we wanted to compute Harley's total wage, we would multiply the value of the "hourly-wage" cell by the value of the "hours-worked" cell:

harley-row["hourly-wage"] * harley-row["hours-worked"]

Same idea for Obi:

obi-row["hourly-wage"] * obi-row["hours-worked"]

We can run these expressions in Pyret to check our work.
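One way to check our work is to put the expressions in a check block. The sketch below assumes an employees table whose first two rows hold the values shown; your actual table may contain different numbers, so adjust the expected results accordingly.

```pyret
# Illustrative table; the wage and hour values are assumptions
employees = table: name, hourly-wage, hours-worked
  row: "Harley", 15, 40
  row: "Obi", 20, 45
end

harley-row = employees.row-n(0)
obi-row = employees.row-n(1)

check:
  # 15 * 40
  harley-row["hourly-wage"] * harley-row["hours-worked"] is 600
  # 20 * 45
  obi-row["hourly-wage"] * obi-row["hours-worked"] is 900
end
```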

From specific expressions to lambdas

Looking at our examples above, we see that if we have any Row in the employees table (let's call it r), the total wage will be:

r["hourly-wage"] * r["hours-worked"]

If we want to build a column with the total wage for every employee, this is exactly the computation we need! build-column applies the computation of its third input (the lambda expression) to every row of the input table, in order to compute the value that goes into the new column of that row. The syntax lam(r) helps that happen, by telling Pyret/build-column that each row can be called r:

build-column(employees, "total-wage", lam(r): r["hourly-wage"] * r["hours-worked"] end)

This means that we do not need a separate compute-wages function and can accomplish our task in just one line.
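As a quick sanity check, we can look up the new column in individual rows of the result. This is a sketch that assumes build-column is already available (as in the textbook's setup) and uses an illustrative two-row employees table; the name with-wages is just a label chosen for this example.

```pyret
# Assumes build-column is already available, as in the textbook's setup.
# Illustrative table; the wage and hour values are assumptions.
employees = table: name, hourly-wage, hours-worked
  row: "Harley", 15, 40
  row: "Obi", 20, 45
end

# build-column calls the lambda once per row to fill in "total-wage"
with-wages = build-column(employees, "total-wage",
  lam(r): r["hourly-wage"] * r["hours-worked"] end)

check:
  with-wages.row-n(0)["total-wage"] is 600
  with-wages.row-n(1)["total-wage"] is 900
  # The original columns are still present in the new table
  with-wages.row-n(0)["name"] is "Harley"
end
```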

Helper functions with inputs other than rows

Let's think outside of the context of Tables for a minute and say we had a function to compute a total wage from two Numbers:

fun get-total-wage(hour-wage :: Number, hours-worked :: Number) -> Number:
    hour-wage * hours-worked
where:
    get-total-wage(100, 0) is 0
    get-total-wage(15.50, 8) is 124
end

Why might we have such a function?

The example above is admittedly extremely simple, but often we have functions that are written outside of the context of tabular data but work on data we see in a Table. These functions might be used in other computations, or might be easier to test than functions on Tables/Rows, since they just require us to provide example values such as Strings and Numbers.

How might we use get-total-wage to rewrite our build-column? We need to pay attention to how we "stitch together" the data in a row (because build-column works on Rows) with the data the function takes in. Again, let's start with an example for Harley's Row. To get their total wage using the helper function, we need to know their hourly wage and their hours worked. But we can get that using the bracket notation:

get-total-wage(harley-row["hourly-wage"], harley-row["hours-worked"])

Generalizing to any row r in the table, our helper function call looks like:

get-total-wage(r["hourly-wage"], r["hours-worked"])

So, our build-column can be written a third way:

build-column(employees, "total-wage", lam(r): get-total-wage(r["hourly-wage"], r["hours-worked"]) end)

When should we use a helper function vs. putting the whole computation in a lambda expression?

The expressions

build-column(employees, "total-wage", lam(r): compute-wages(r) end)

and

build-column(employees, "total-wage", lam(r): r["hourly-wage"] * r["hours-worked"] end)

and

build-column(employees, "total-wage", lam(r): get-total-wage(r["hourly-wage"], r["hours-worked"]) end)

will produce equivalent tables, so any of the three would be correct. However, one form may be preferable to the others in the following cases:

If the expression is a short (1 line) expression over a row, it is probably simpler/more readable to directly write the computation in the lambda expression rather than to use a helper function. Some examples include short arithmetic expressions such as (r["col1"] + r["col2"]) / 2, String expressions such as string-length(r["col3"]), Boolean expressions such as r["name"] == "Blueno", and very simple if-expressions such as if r["discount"] == "student": r["price"] * 0.9 else: r["price"] end.

If you want to test the computation separately, use a helper function (either on Rows or other data). You cannot write tests for lambda expressions, but you could write tests for a helper function such as compute-wages to verify that it is working correctly.

If the computation is more complex than a 1-line expression, use a helper function (either on Rows or other data). Not only will your code be more readable, but more complex expressions should also be tested. An example would be a computation that requires an if-expression with else ifs (for instance, if the wages example needed to tax an employee's salary at different rates).

If the computation gets used in a context outside of Tables, use a helper function that takes in data other than Rows (such as our get-total-wage function).
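To illustrate the last three cases together, here is a sketch of a helper with else if branches that is tested on plain Numbers and then called from a lambda. The function wage-after-tax and its rate tiers are hypothetical, invented only for this example, and the employees table is illustrative. The sketch also assumes build-column is already available, as in the textbook's setup.

```pyret
# Assumes build-column is already available, as in the textbook's setup.
# Illustrative table; the wage and hour values are assumptions.
employees = table: name, hourly-wage, hours-worked
  row: "Harley", 15, 40
  row: "Obi", 20, 45
end

# Hypothetical helper: the thresholds and rates are made up
fun wage-after-tax(total-wage :: Number) -> Number:
  doc: "apply a made-up tiered tax rate to a total wage"
  if total-wage < 500:
    total-wage
  else if total-wage < 800:
    total-wage * 0.9
  else:
    total-wage * 0.8
  end
where:
  # One test per branch, using plain Numbers instead of Rows
  wage-after-tax(400) is 400
  wage-after-tax(600) is 540
  wage-after-tax(900) is 720
end

# The lambda stitches the Row's data together with the helper's inputs
taxed = build-column(employees, "after-tax",
  lam(r): wage-after-tax(r["hourly-wage"] * r["hours-worked"]) end)

check:
  taxed.row-n(0)["after-tax"] is 540
  taxed.row-n(1)["after-tax"] is 720
end
```

Because the helper takes a Number rather than a Row, each branch can be tested without building a table first.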

Using lambda expressions when working with multiple tables

In this section, we are using the problem statement and plan from Part 1 of this document. The plan was:

Find the row in the student table that matches the given student
Extract the semester from the row
Find the row in the advisor table that matches the extracted semester
Extract the advisor name from the row

How do we translate this to working with our Table operations? First, we define our function:

fun get-advisor(student-name :: String, student-table :: Table, advisor-table :: Table) -> String

We will be writing the body of this function, so we have the following names available to us from the inputs: student-name, student-table, and advisor-table.

1. Find the row in the student table that matches the given student

To find a row in a table, we filter the table (using filter-with) with a lambda expression that will result in a table that only contains the row we need. If we assume that student names are unique in the table, the criterion is simply that the "name" column of the row matches the given student name. Once we have a Table with one Row, we can use row-n(0) to get that row (since the row in question is guaranteed to be the first row):

student-row = filter-with(student-table, lam(r): r["name"] == student-name end).row-n(0)

Note that we can use function inputs in lambda expressions, as long as the whole line of code appears in the body of the function.

2. Extract the semester from the row

We simply use bracket notation:

student-semester = student-row["semester"]

Note that we could have combined the expressions from steps 1 and 2 into one expression, e.g. student-semester = filter-with(student-table, lam(r): r["name"] == student-name end).row-n(0)["semester"]. Regardless of how we code the steps, it is still worthwhile to break the problem into two steps to remember that we are first finding a row, and then extracting a value from the row.

3. Find the row in the advisor table that matches the extracted semester

Now we have a new named piece of data to work with, student-semester. We can use the same searching idea from Step 1, but recognize now that we are searching within advisor-table for the "sem" column of a row to match student-semester:

advisor-row = filter-with(advisor-table, lam(r): r["sem"] == student-semester end).row-n(0)

4. Extract the advisor name from the row

Again, we use bracket notation: advisor-row["name"]. The final function looks like:

fun get-advisor(student-name :: String, student-table :: Table, advisor-table :: Table) -> String:
  student-row = filter-with(student-table, lam(r): r["name"] == student-name end).row-n(0)
  student-semester = student-row["semester"]
  advisor-row = filter-with(advisor-table, lam(r): r["sem"] == student-semester end).row-n(0)
  advisor-row["name"]
end
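To try the function out, we can call it on two small tables. The sketch below repeats the function so that it runs on its own; the tables students and advisors and all of their contents are hypothetical, chosen only to match the column names the function expects ("name" and "semester" in the student table, "sem" and "name" in the advisor table). It assumes filter-with is already available, as in the textbook's setup.

```pyret
# Assumes filter-with is already available, as in the textbook's setup.
# Hypothetical example data
students = table: name, semester
  row: "Ana", "S24"
  row: "Ben", "F24"
end

advisors = table: sem, name
  row: "S24", "Prof. Kim"
  row: "F24", "Prof. Diaz"
end

fun get-advisor(student-name :: String, student-table :: Table, advisor-table :: Table) -> String:
  # Step 1: find the student's row
  student-row = filter-with(student-table, lam(r): r["name"] == student-name end).row-n(0)
  # Step 2: extract the semester
  student-semester = student-row["semester"]
  # Step 3: find the advisor row for that semester
  advisor-row = filter-with(advisor-table, lam(r): r["sem"] == student-semester end).row-n(0)
  # Step 4: extract the advisor's name
  advisor-row["name"]
where:
  get-advisor("Ana", students, advisors) is "Prof. Kim"
  get-advisor("Ben", students, advisors) is "Prof. Diaz"
end
```

Note that the function relies on the student name and the semester each matching a row; if no row matches, the filtered table is empty and row-n(0) has nothing to return.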

When juggling multiple tables, writing down a plan helps us keep track of which information we need from which table. Then, it is a matter of using operations we've already learned for single tables to code up the steps.
