Lab 8: Tables, Lists, Datatypes, and Trees – Oh My!

This lab is an important junction for two reasons: it's where those aiming for 200 should start ramping up to harder problems, and it's our last Pyret lab before we start Python. As a result, this lab is structured with some options for you to choose from.

Everyone will do the first set of problems. Further down, you'll see two options:

Option 1 (writing functions over lists of datatypes). If you are still getting the hang of writing functions over lists, work on these problems (we will continue to write programs over lists in Python, so this is useful practice).
Option 2 (programming over trees). If you are headed to 200, work on Option 2. Those not heading to 200 but who feel comfortable with writing list functions are also welcome to work on option 2.

Learning Objectives

Explain differences between data structures
Choose the correct data structures based on program goals and specifications

Resources

Lab 8 Presentation Slides

Designing Data Structures (Required for Everyone)

In lecture, we have done a couple of problems where we give you a problem scenario and ask you to figure out how to use tables, lists, and datatypes to capture the data. And now we have trees too!

The ability to figure out the structure of data needed for a problem is a critical skill in computing. Even if you never write code again after you leave CS111, you may need to think about a collection of data for a project and what shape it has (with someone else writing the code to process that shape). The problems in this section have you practice this skill (which you can expect to be tested on the the final exam).

You are not writing any functions for this section.

Warmup – Reviewing Data Structure Features

When we are matching a data structure to a scenario, questions we ask ourselves include:

Does the data feature a fixed number of components, or an unknown/inconsistent number of components? For example, say you're recording data on dogs. In one scenario, you are guaranteed to get each dog's name, age and breed. In another scenario, there are no guarantees about which dog attributes you will receive. For example:

Scenario 1:

Name	Age	Breed
Bowser	4	Bulldog
Crowley	2	Ferret
Spot	10	Lab

Scenario 2:

Bowser
Crowley, 2 years old, ferret, white and brown stripes, loves long walks on the beach
Spot, Lab, blue collar, owner is Ashley

Is the data meant to represent a single object (one-dimensional) or some collection of n objects (two-dimensional)? For example, say we're representing an Item in a store versus a Shopping Cart, which contains some indefinite number of Items. What's different about how we need to represent these?
If there are sequence-style relationships across parts of the data, are there branches/options in those relationships? For example, say we're representing a Shopping Cart versus a Family Tree. Both need to contain some indefinite sequence of data, but are different in how the items in the sequence relate.

Task 1: For each of the above three questions, write out which data structure(s) you should use in each case of the question. When you are done, check your work against the summary in the expandable section below.

Data structure summary

Use a datatype when the number of components is fixed or a list if the number of components is not fixed
Use a table when the data are two-dimensional with fixed number of components, datatype when one-dimensional with fixed number of components. Lists within lists can capture multiple dimensions with non-fixed numbers of components
Trees when there are branches/options, lists when there are no options and the sequences can vary in length, otherwise datatypes can work

Scenario Exercises

You'll read some scenarios for this part of the lab. As you read each scenario, make sure to document your choices and thoughts in this Google Form. For each of these scenarios, complete the following two tasks:

Task 2: Which combination of tables, lists, datatypes, and trees might you use to capture the data in the scenario? You might find it useful to sketch a picture of the scenario to help you think through this.

Task 3: Expand the section marked "Data Structure Proposals" to see some specific datatype proposals for the scenario. Talk through an evaluation of whether the data structure makes sense, citing strengths and weaknesses of the proposal.

A datatype "makes sense" if it:

reflects the key relationships among data in the scenario
can accommodate larger versions of the scenario
makes choices consistent with the strengths of each data structure (the criteria from the warmup)

Scenario 1: An Elimination-Round Tournament

Consider a tournament that is played in elimination rounds: Some number of teams (usually a power of 2 such as 2, 4, 8, 16, …) enters the tournament. Teams are paired up for round 1. The winner from each round advances to the next, playing another winner from that same round. Each team plays at most once per round. The tournament finishes after a round with only two teams (which gives the overall winner).

Data structure proposals

# option 1 (a feeder match refers to a previous match 
that some team won to qualify for their current match)
datatype Match:
  | first-round(team1 :: String, 
                team2 :: String)
  | later-round(team1 :: String, 
                team2 :: String,
                feeder1 :: Match, 
                feeder2 :: Match)
end

# option 2
table: round :: Number, team1 :: String, team2 :: String
  ...
end

Scenario 2: A Round-Robin Tournament

Now consider a round-robin tournament, in which every team plays every other team, and the team with the best statistics (combination of points scored, score differences, etc) is the winner.

Data structure proposals

# option 1
table: team1 :: String, team2 :: String, score1 :: Number, score2 :: Number
  ...
end

# option2
data Match:
  | match(team1 :: String, team2 :: String, score1 :: Number, score2 :: Number)
end
Tournament is a List<Match>

Scenario 3: A Guessing Game

Imagine a guessing game, in which a player is asked to think of an object/item/etc. The computer then asks a series of yes/no questions before making a guess as to what the object might be. Here are two sample sequences of questions and answers.

Computer: Does it have four legs?
Player: yes
Computer: Does it fit in your hand?
Player: yes
Computer: I guess a mouse

Computer: Does it have four legs?
Player: no
Computer: Does it have two legs?
Player: yes
Computer: Is it active at night?
Player: yes
Computer: I guess an owl

The guessing game program needs a data structure to capture the questions that the computer will ask, including the different "next" questions depending on the answer (yes or no). The question sequences could be long.

Data structure proposals

datatype GuessingGame1:
  | questions(first-ques :: String,
              yes-questions :: List<String>,
              no-questions :: List<String>)
end

datatype GuessingGame2:
  | guess(item :: String)
  | question(ques :: String, 
             if-yes :: GuessingGame2, 
             if-no :: GuessingGame2)
end

Scenario 4: Multiple-choice Guessing Game

Imagine a similar guessing game, but now the questions are multiple choice, with different numbers of answers per question. For example, now the computer might ask "How many legs does it have", with answers 0, 2, 4, and 100 all being acceptable answers.

Data structure proposals

# option 1
data MCGuessing1:
  | guess(item :: String)
  | question(choice1 :: MCGuessing1,
             choice2 :: MCGuessing1,
             choice3 :: MCGuessing1,
             choice4 :: MCGuessing1)
end

# option 2
data MCGuessing2:
  | guess(item :: String)
  | question(ques :: String, 
             options :: List<MCGuessing2>)
end

# option3
data MCGuessing3:
  | guess(item :: String)
  | question(ques :: String, 
             options :: List<Choice>)
end
data Choice:
  | choice(label :: String, 
           nextques :: MCGuessing3)
end

Consider the connections between friends in a social network. Each person has some number of friends, who in turn have other friends, and so on. There may be multiple "friendship chains" that connect one person to another.

Data structure proposals

# option 1
data Person:
  | person(name :: String, friends :: List<Person>)
end
A social network is a List<Person>

# option 2
table name :: String, friends :: List<String>
  ...
end

These are all examples of real problems (or types of problems) that programmers work on, and choosing the appropriate data structure is a big part of approaching the programming tasks for these problems. If you go on to take CS200, you will get more practice programming with these sorts of data structures.

CHECKPOINT: Call over a TA once you reach this point.

Data Structures Reflection

Task 4: Read this short excerpt from “Generalizing Functions: A Powerful Tool in Programming.”:

“Generalizing functions is a powerful technique in programming. It allows us to design functions that can be applied to various scenarios by factoring out common patterns. By creating higher-order functions, we can define general methods of computation that can be shared and reused. This not only improves code efficiency but also enhances code maintainability.”

Task 5: Take a moment to think about the summary_generator function you coded in Project 1. And discuss the following questions with your lab partner:

What are the benefits of creating more generalized functions (like summary_generator) as opposed to specific functions (such as a function that specifically generates plots of mean vs. categories and another function that specifically generates plots of median vs. various categories)?
On the other hand, what are the drawbacks of doing this? What/who could be left out?

Task 6: Now take a moment to think about the data structure exercises you just completed. And discuss the following questions with your lab partner:

What are the benefits of creating more generalized data structures as opposed to specific data structures?
On the other hand, what are the drawbacks of doing this? What/who could be left out?

Matching Ads, One Last Time

The trees for the guessing game are examples of what are called decision trees: a series of questions about characteristics of an object or situation are asked, working down the tree until a decision (or guess) is reached at the bottom of the tree.

Some real-world ad-matching systems are based on decision trees. When a user is logged into a site, a program asks the site (e.g. Google) the questions in the tree about the user (e.g., "what gender are they", "what age-range are they", "where do they live", etc): the bottom of the tree contains the ads to show with users with traits from the answers.

In theory, we have shown you enough programming at this point to write an ad-matcher based on decision trees. In practice, that program would be a bit harder than what you're prepared for. Those who come to CS200, however, will do a course project that constructs a decision-tree from a table of data about ad-like information.

Option 1: Practice with Lists in Datatypes

Consider the following datatype, which represents information about a single course:

data Course:
  | lecture(name :: String, roster :: List<String>)
end

For now, we're just considering lecture-based courses, where each course contains the list of names of students who are taking the course.

Task 7: Copy this datatype into your file and make a couple of example courses.

Task 8: Write a function that takes a Course and a student name and returns a course. The returned course should have the student name added to the roster.

Task 9: Some years, it feels like certain students' first names appear many times. Write a function count-name which takes a Course and a name and returns the number of students with that first name who are in the course roster. Assume every name in the roster is formatted like this 'FIRSTNAME LASTNAME'.

Do this with explicit recursion, rather than built-ins.

Hint: The question here is how to write a recursive function to process a list that is inside of a datatype. You should use two functions here: a recursive one on a list of strings, and a non-recursive one on the datatype that calls your recursive one.

Task 10: We want to represent more information about our students than just their names!

Create a new Student datatype that holds the student's name and semester level.
Create a new CourseStudent datatype. This should look very similar to the Course datatype, but it should use your Student datatype.

Task 11: Where in your count-name function would you need to make edits in order to accommodate this change in datatype? You don't need to write the code (unless you want the practice). The point is to think about what parts of your original code would change or stay the same with this extended definition.

Option 2: Working with Trees (200-bound/Challenge)

You will write several tree programs as part of the bridge work for CS200. For this lab, we want to make sure that you understand how recursion works on trees before you start writing programs to process them.

Consider the following program, which returns the name of one person with the given eye color from the given tree. Your goal is to understand which name gets returned and why.

Task 12: Look at the program and the where cases. Without running the code, predict what the code will produce on each case. It might be helpful to you to draw a tree diagram!

data AncTree:
  | noInfo
  | person(name :: String, born :: Number, eye :: String, parent-1 :: AncTree, parent-2 :: AncTree)
end

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
bill-tree = person("Bill", 1946, "green", noInfo, noInfo)
susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)
charlie-tree = person("Charlie", 1972, "green", noInfo, noInfo)
anna-tree = person("Anna", 1997, "blue", susan-tree, charlie-tree)

NOBODY-FOUND = "Nobody was found!"

fun find-eye(at :: AncTree, with-color :: String) -> String:
  doc: "return name of one person with given eye color (assume there is one)"
  cases (AncTree) at:
    | noInfo => NOBODY-FOUND
    | person(n, b, e, p1, p2) =>
      if (e == with-color): n
      else:
        find-eye-p1 = find-eye(p1, with-color)
        if not(find-eye-p1 == NOBODY-FOUND): find-eye-p1
        else: find-eye(p2, with-color)
        end
      end
  end
where:
  find-eye(anna-tree, "green") is ...
  find-eye(anna-tree, "brown") is ...
end

Task 13: In lecture, we showed using an execution diagram to track the evolution of recursive function calls. Draw an execution diagram for the program execution from the first where example, until the point when the if-expression evaluates to true. Indicate the order in which the functions cals get called and in which order they finish running and return a value. (If you color-code your diagram, this is easier to list off).

Execution diagram for has-horse from class

The function calls get called in the order: red, black, blue. The functions finish in the order: black, blue, red. If Princequillo had parents, we would have had to draw the recursive calls the code would make on those parents' parents, and we would have to be careful in remembering the order in which function calls get made and finish.

Use this image as a starting point:

Task 14: What is the sequence of values for p1 across the calls to find-eye, assuming we search for a name that is not in the tree? Assume we start with the call find-eye(anna-tree, "X"). Make a list of the calls to find-eye that occur in order, in a form like:

find-eye(anna-tree, "X")
find-eye(susan-tree, "X")
…

Then annotate each call with the value that p1 will have in the directory at the time that the call gets made. What do you notice about that sequence after you've finished it?

Task 15: Write a function all-names that takes an AncTree and returns a list of the names of all people.

Task 16: Make a copy of all-names and modify it so that it returns a list of only the names of people with blue eyes. Name this function all-blue-eyed.

Task 17: A predicate is a function that returns a boolean value. For example, whether or not a person has blue eyes could be written as a predicate like below. Write a couple of predicates that decides which people to include in the output list (like a form of filter on trees).

fun has-blue-eyes(at :: AncTree) -> Boolean:
  doc: "return whether or not a person has blue eyes"
  cases (AncTree) at:
    | noInfo => false
    | person(n, b, e, p1, p2) => e == "blue"
  end
where:
  has-blue-eyes(anna-tree) is true
  has-blue-eyes(charlie-tree) is false
end

Task 18: Make a copy of all-blue-eyed and modify it to take a predicate. Name this function all-predicate. Test this function using the predicates you wrote in Task 17.

CHECKPOINT: Call over a TA once you reach this point.

Install Python/VSCode if you haven't already

We start working in Python soon. If you never did the Python setup part of lab 0, please go back and do that now.

Brown University CSCI 0111 (Spring 2025)
Check out our course Spotify playlist!

Lab 8: Tables, Lists, Datatypes, and Trees – Oh My!

Learning Objectives

Resources

Designing Data Structures (Required for Everyone)

Warmup – Reviewing Data Structure Features

Scenario Exercises

Scenario 1: An Elimination-Round Tournament

Scenario 2: A Round-Robin Tournament

Scenario 3: A Guessing Game

Scenario 4: Multiple-choice Guessing Game

Scenario 5: Friends in a Social Network

CHECKPOINT: Call over a TA once you reach this point.

Data Structures Reflection

CHECKPOINT: Call over a TA once you reach this point. Be sure to share your takeaways from Task 5.

Matching Ads, One Last Time

Option 1: Practice with Lists in Datatypes

Option 2: Working with Trees (200-bound/Challenge)

CHECKPOINT: Call over a TA once you reach this point.

Install Python/VSCode if you haven't already

Read more

Homework 6

Working with VSCode

From CS111 to CS200 (Current Version)

Project 2 Spring-2025