Try   HackMD

CS20 Bridge Assignment: Binary Search Trees

[Work in Pyret, putting your work in a file called bsts-code.arr]

In class, we saw ancestor trees, in which we linked horses to their parents. The datatype we had for this was:

data HorseAncestry:
  | horse(name :: String, 
      year :: Number,
      country :: String,
      mom :: HorseAncestry,
      dad :: HorseAncestry )
  | no-info
end

Tree-shaped data (data in which multiple "next" pieces of data branch off a single datum) arises in various contexts in CS. Think of the layers of menus of options you select through when trying to call customer service: first you indicate the general nature of your call, then you get a more refined set of questions, and so on. Each question is a data point that branches off to other questions, each of which has their own set of questions, and so on.

Ancestor trees and phone-call menus are examples where our data is inherently tree-shaped: the relationships we want to capture branch off between data points. But in CS we also use trees in a more artificial manner: we impose a branching structure on data to allow us to work with that data more efficiently.

This assignment has you work with binary search trees (henceforth abbreviated as BST), a way of organizing a collection of data with order to help us find data items faster than searching in lists.

BSTs (on Numbers)

Imagine that you had a set of numbers such as

8, 5, 7, 1, 3, 12, 6, 10, 2, 4

and you wanted to organize it in a way that showed the ordering relationship among the numbers. You could make a sorted list, but as we have seen, checking whether a specific number is in a list can take time linear in the size of the list.

A binary search tree (BST) is a binary tree (meaning a tree with at most two branches from each node/point) in which every node has the following property:

Every element in the left subtree is smaller than 
the element at the root, every element in the right 
subtree is larger than the element at the root, and 
both the left and right subtrees are BSTs.

(this statement assumes no duplicates, but that is okay since we are modeling sets. If you want duplicates, change less-than to less-than-or-equal in the definition).

Here is one example BST on our sample set:

         10
       /    \
      4      12
    /   \
   2     7
 /  \   /  \
1   3  6    8
      /
     5

Practice (not for turn in): Draw two more BSTs on the same collection of numbers. Start with a different number at the top (called the root) and see what you come up with. Can you make distinctly different shapes?

Practice (not for turn in): Which of the following are BSTs on the numbers from 1 through 5?

      4             3            4          3
     / \           / \          / \       /   \
    5   3         2   5        2   5     1      5 
       / \       /   /        /           \    /
      1   2     1   4        1             2  4
                              \
                               3

A Datatype for BSTs

Generalizing from our ancestor tree, here's a general datatype for BSTs on numbers:

data BST:
  | leaf
  | node(val :: Any, 
         left :: BST,
         right :: BST)
end 

This has the same shape as the AncTree definition, but it is not specific to relationships on people. Instead, it allows any kind of data (through the type Any). It also uses the term leaf, instead of unknown, for the empty tree.

Searching in BSTs

Imagine you were given a BST representing a set of numbers, and were asked whether a specific number is in the set. How would you proceed?

  • If the tree is a leaf, return false
  • If the tree is not a leaf
    • if the number you're seeking is at the root, return true
    • If the number you are seeking is smaller than the number at the root, search in the left subtree
    • If the number you are seeking is larger than the number at the root, search in the right subtree

Practice (not for turn in): Using the same tree we provided, work through a couple of example searches, some of numbers in the tree and some not in the tree.

Task (to turn in): When you understand the algorithm, implement the function. Don't forget to include test cases.

fun has-elt(num :: number, t :: BST) -> Boolean

Note: even though the definition of BST is more general than just numbers, you can assume we are working with trees of numbers when writing your code.

Creating BSTs

How do we build-up a BST, however? For lists, we know how to use link to add an element. How do we add an element to a BST?

Assume you are given a BST and a number. You add an element to the BST by looking through the tree for the leaf where the number should be, then putting the number in that spot. For example, assume you wanted to insert the number 4 into the following tree:

         8
       /   \
      5     9
    /   \
   3     7

We'd start at the root, and ask whether 4 is less than or greater than 8. Less than, so we should insert 4 on the left (so that we maintain the requirement on a BST). Now we compare 4 to 5. Less than, so we insert on the left. Now compare 4 to 3. Larger than, so we insert on the right. But the right is a leaf, so that is where we put the 4. Here's the resulting BST:

         8
       /   \
      5     9
    /   \
   3     7
    \
     4

Notice that in this algorithm, new elements are always inserted at the leaves, not in the middle of the tree. This disrupts the tree as little as possible, while still maintaining the BST requirement.

Practice (not for turn in): By hand, follow the above algorithm to create BSTs for each of the following setups on the same numbers:

  • start with a leaf, then add 3, 1, 2, 4 in that order

  • start with a leaf, then add 4, 3, 2, 1 in that order

  • start with a leaf, then add 4, 2, 1, 3 in that order

Task: Implement add-elt (our description explained the algorithm, which inserts at leaves). Don't forget to write test cases (and writing them first could really help you here, especially in the leaf case).

fun add-elt(num :: number, to :: BST) -> BST

Also, follow the template code skeleton that we discussed on trees in lecture. It is easy to overthink this problem and get your code in a mess. The code should follow fairly naturally from the examples and template if you use them carefully.

Using BSTs for sorting

You thought you were done with sorting in the previous assignment, but it turns out there are some sorting algorithms that use secondary data structures to help sort the list. One such approach is "tree sort." Given a list, tree sort works in two steps:

  1. Build a BST by adding each element of the list to a tree (starting from the empty tree)
  2. Turn the BST into an ordered list by turning the left subtree into an ordered list, turning the right subtree into an ordered list, and appending these two lists along with the root element in the middle.

Task: Write a function with the definition

fun tree-sort(num-list :: List<Number>) -> List<Number>

that implements tree sort. You can assume that the elements in num-list will be unique.

Hint: the two steps in tree sort lend themselves well to two helper functions, one that takes in a List<Number> and outputs a BST and one that takes in a BST and outputs a List<Number>. Since the two functions run indepedently, code and test each of them in turn before combining them. For the second function in particular, it might help to draw some diagrams of what the following five BSTs should look like, turned into Lists (again, you do not have to turn the diagrams in)

4      4     4         6          4
      /       \      /   \      /   \
     2         6    5     7    2     6
                                   /   \
                                  5     7

Reflection

In a comment, describe any questions you have (if any) based on these exercises.

Handin

Turn in a file bsts-code.arr with your code.

Information about the Autograder

After you submit your work, Gradescope will run our autograder tests and display the results. It is your responsibility to make sure that all autograder tests pass.

  • If you get an error message saying the autograder failed to run, check to make sure that your code file is named bsts-code.arr and that the function names and inputs are exactly as asked for in the handout. Specifically for this assignment, make sure that your BST data declaration is exactly the same as in this handout (with a leaf and a node with the relevant fields)
  • If you are failing some tests, go back and write some tests of your own and reason through them in your code. Part of the transition from 111 to 200 is getting more practice reasoning about what a comprehensive set of tests looks like. What kind of input trees should you try to make sure that your algorithms are working correctly?

The autograder runs our tests on your add-elt, has-elt, and tree-sort implementations. Getting a full autograder score does not guarantee that you passed the assignment Milda will manually grade your code to make sure that the algorithms were implemented as described in the handout.