--- tags: Homeworks --- # Homework 4: Hashtables **Due Date: Monday, March 8th, anywhere on Earth time (Tuesday, March 9th, 8AM EDT)** ## Learning Objectives This assignment has you practice the following course objectives: * State the key structural and performance characteristics of core data structures: lists, trees, arrays, hashtables, heaps, and graphs * Understand what makes a data structure mutable or immutable at the level of implementation * Develop classes that protect how their data is used through access modifiers and appropriate choice of provided methods * Comment code effectively with Javadocs * Develop a collection of examples and tests that exercise key behavioral requirements aspects of a problem * Write programs that can compile and run against a provided interface * Write programs that behave as expected based on a provided description or collection of test cases ## Assignment Link Here is the [github classroom link](https://classroom.github.com/a/tDmUBbxR). Please make sure to read [our gradescope submission guide](https://hackmd.io/jC-SEMn_RpiGqMH_EhsDMw) before handing in your assignment! ## Handing In The source code files include: * `AbsHashTable.java` containing `public abstract class AbsHashTable<K, V> implements IDictionary<K, V>` * `IDictionary.java` containing `public interface IDictionary<K, V>` * `IGrep.java` containing `public interface IGrep` * `KeyNotFoundException.java` containing `public class KeyNotFoundException` * `KeyAlreadyExistsException.java` containing `public class KeyAlreadyExistsException` Do not alter any of the files in the `src` folder. That is, you should only alter the files that are in the `sol` folder: * `Grep.java`, containing `public class Grep implements IGrep` * `Chaining.java`, containing `public class Chaining<K, V> extends AbsHashTable<K, V>` * `Homework4TestSuite.java` containing `public class Homework4TestSuite` * `written-questions.txt`. This should have your answers to the written portions of this assignment. * `grep-test-files/`. This is a folder containing `.txt` files that you used while testing your Grep class. To hand in your files, submit them to Gradescope. - Submit `Grep.java`, `Chaining.java`, and `written-questions.txt` to the Homework 4: Implementation submission - Submit `Homework4TestSuite.java` and the `.txt` files in `grep-test-files` to the Homework 4: Testing submission. Once you have handed in your homework, you should receive an email, more or less immediately, confirming that fact. If you don't receive this email, try handing in again, or ask the TAs what went wrong. **Note:** If your code does not work with the autograder, you will be penalized. If there are any complications, follow these steps: * Make sure that your Testing submission passes the wheat. If it doesn't, you should use the autograder output and the message in the failed autograder test to debug your test suite. * If your test suite passes the wheat, run your implementation against your test suite. If this crashes, you should fix your solution code so that it is compatible with your test suite. * If you are still having issues, please reach out to us at TA Hours or Piazza. ## The Hunt Continues (Theme) You learn from reading Blueno's old love letters that, back when they first met, Blueno left various comments on *Untitled Lamp/Bear of Hamad International Airport*'s Piazza posts, marked by a characteristic yellow square. You wonder if following this digital paper trail might lead to another clue. You're reading Piazza to prepare for your homework, and you notice a resolved Piazza post with an unresolved followup containing the following limerick: *In search, you are, of the Bear of Blue?* *An array of light you so fervently pursue* *Linked key-value pairs* *Yield the blue n’ yellow bears* *...Hidden deep in the [Piazza FAQ!](https://piazza.com/class/kjqgmueegkh6j2?cid=933)* ## Java’s Built-in Hash Tables In this homework, you will be writing an application (DIY Grep) using Java’s built-in hashtables,and completing an implementation of hashtables from scratch (via chaining). Java has two styles of built-in hash tables, specifically, `HashMap` or `HashSet`. These classes differ in that the former uses a hash table to represent a dictionary (a mapping from some key to values), while the latter uses a hash table to represent a set, which is a special case of a *dictionary*; the keys are the elements of the set, and there are no values (Recall the invariant that dictionaries do not allow duplicate keys. That is why it makes sense to view sets as dictionaries). Consequently, it is straightforward to use a hash table to represent a set. You can use Java’s `HashMap` class with `import java.util.HashMap;`. Documentation on how to use Java’s HashMap can be found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html). Likewise, you can use Java’s HashSet class by importing `java.util.HashSet`. Documentation on how to use Java’s `HashSet` can be found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html). You should only use these in the Grep problem ## Searching in Files (Grep) ### (Background) Command-Line Grep The UNIX command grep is an extremely powerful and useful tool. You can use grep to search a file for a given pattern, and report where that pattern appears in the file, as follows: `grep -n <pattern> <file>` The -n is an optional parameter that tells grep that we want it to report line numbers. It prints out each line of text next to the line number. This is but one of many, many grep features, which are fully documented on its man page (which you can access, if you want to learn more, by typing ‘man grep’ into a terminal). Here’s a couple of examples of how you’d use grep with this line number option: `grep -n treasure myFile` This command would print out something like: `1: The treasure is under the lamp` `4: treasure` `9: They found a hint that led them to another piece of the treasure map` **Your output format will look different (see below), but this is the essence of what you are trying to replicate with your code.** ### DIY Grep Now we’re going to implement (a limited version of) `grep` for ourselves! **Task:** Explain how a hashtable could be used to make it easy to look up the line numbers associated with a given word in a multi-file. Your answer to this part is just in prose. Write your answer to this question in `written-questions.txt` **Task:** Write a class `Grep` with a constructor and a single method, `lookup`. This class should implement the `IGrep` interface in the source files for this assignment. Your constructor should take as input a filename and perform any necessary preprocessing. The `lookup` method should take as input a word and return a set of the line numbers on which that word appears in the file. It should operate in expected constant time. As noted above, you can (and should) make use of Java’s built-in hash table data structure, which is called `HashMap`, to solve this problem **Notes:** * If the same word appears more than once on the same line, you should include the line number only once in your output. * You should treat words as sequences of characters separated by whitespace; so "glitter" and "glitter!" are distinct words. Also, you can assume words are case sensitive; so "mystery" and "Mystery" are distinct words. **Hints:** * As part of the preprocessing step, you may want to use the `split` method in the `String` class, which splits up a string into pieces each time it encounters a specific character, and stores those pieces in an array of strings. For example ```java= String sentence = "This is a sentence." String[] words = sentence.split(" ") // words is an array: ["This", "is", "a", "sentence"] ``` * Look at Lab 4 for how to read from files. * Also you may need to catch and handle relevant exceptions. Think about exceptions like `FileNotFoundException` and/or `IOException` (Lab 4 Section 3.3) * The Java syntax to declare a `HashMap` that maps, for example, from a `String` to a Set of type `Integer` is `new HashMap<String, Set<Integer>>()`. Likewise, the syntax to declare a `HashSet` of `Integers` is `new HashSet<Integer>()`. **Task:** Write a main method for the Grep class. The `String[]` args should correspond to a file name, and then at least 1 other word to look for, in that order. **Note:** To take in arguments using IntelliJ, press the drop-down menu near the green “run” button and select edit configurations. In the “Program Arguments” field, put your program arguments (space separated). These arguments will appear in the `String[]` args variable in your main method. ![](https://i.imgur.com/wpTlDtb.png) ![](https://i.imgur.com/DhlJFz8.png) The output of running this should look something like: ``` traveller was found on lines: [1] on was found on lines: [3, 7, 9, 11] ``` **Task:** Add tests to `Homework4TestSuite.java` that test your `lookup` function. You do not need to test for exceptions that might be thrown on I/O errors (as you are catching those and printing a message, which can’t be tested). **Note:** We are only testing `lookup` here, not the main Grep program, because the latter simply prints output to the screen. In such cases, you test the inner logic, then manually inspect output build from the results of the inner logic. That is, you only need to test your `lookup` function, not the Grep constructor. **Hint:** We have a sample `.txt` file in `grep-test-files` for you to use. Please do not be afraid to create your own text files for testing, especially to try to catch edge cases! If you do this, make sure to hand in these files along with your `Homework4TestSuite.java` ## Chaining Hash Tables In this problem, you will implement a hash table using chaining. Recall that the internal data structure of a hash table implemented using chaining is an array of lists. You will use Java's `LinkedList` class in particular for the list implementation. The point of this problem is for you to see how hashtables are built when they don't already exist in a language. Therefore, you may not use Java's hashmaps or hashsets in your implementation. The `src` directory contains an abstract class named `AbsHashTable.java`. It defines a class for combining keys and values (as shown in Monday's lecture). Look at the `KVPair<K, V>` class stored inside. `AbsHashTable.java` implements an interface `IDictionary<K, V>`, which has the definitions of the methods we need to implement. **Task:** Write a class, `Chaining<K, V>` that extends `AbsHashTable`. This will be your hashtable implementation. The `Chaining` constructor should take as input a `size` variable, which specifies the size (number of slots) of the table. Your `Chaining` class needs to implement each of the abstract methods in `AbsHashTable` (abstract methods have headers, but not yet bodies). You are welcome to implement additional (helper) methods if you wish. - **Hint:** The key method to write in `Chaining` is `findKVPair`. Write this method first, and use it as a helper when you write `insert` and `delete`. - **Hint:** You will see that some of the abstract methods throw exceptions. The names of the exceptions should make it clear what each exception is used for. You should throw these exceptions where appropriate. If your code can reasonably handle an exception, you should catch it. But if you think that an exception should be passed back out to the user of your hashtable class, you may leave it uncaught. - **Hint:** Recall that hash tables are backed by a hashing function, which maps a key to an index in the array. You should use the built-in `hashCode` method on your keys for hashing. The Java built-in hash function returns a signed (either positive or negative) integer. Taking the absolute value of the integer will let you map it into an array index. **Note:** In Java, you cannot create an array of a generic type. In order to circumvent this limitation, you should do the following to create your hash table: ```java= this.data = (LinkedList<KVPair<K,V>>[]) new LinkedList[size]; ``` This situation is one of a few exceptions (`equals` is another) when casting is acceptable; in general, however, it's still discouraged. This line of code will generate a warning that there is an unchecked cast. To get rid of the warning, above any methods that cast like this, you should write: ```java= @SuppressWarnings("unchecked") ``` Typically, warnings such as these indicate a problem with your code, so you should not suppress them; you should pay attention to them! In this specific instance, however, we justify its use as we are trying to get around a `Java` limitation. **Task:** Test your `insert` and `delete` methods. You do not need to test `findKVPair`, or any helper methods that you write in order to write `insert` and `delete`. With this in mind, consider the access modifiers that you should put on fields and methods in your `Chaining` class. Put your tests in the `Homework4TestSuite.java` file. - Since your class is generic, be sure to use multiple types in your testing. Additionally, since we are working with a mutable data structure, be sure to have setup methods. - You might find the `lookup` and `update` methods in `AbsHashTable` helpful in testing `insert` and `delete` - Don't forget to test exceptions! You should test for the `KeyNotFound` and `KeyAlreadyExists` exceptions ## Hash Table Iterator :::info [**Added for clarification 3/4, 10am**] When we implemented our own `LinkList` class in lecture, we talked about wanting to enable a programmer to write `for`-loops over our lists. To do this, we needed to have our `LinkList` class implement an interface called `Iterable` and to provide an `iterator` object. In this part of the assignment, we want you to support `for`-loops over hashtables, where the loop would **iterate over all of the keys** that have values in the hashtable. This could be used to count how many keys are in the hashtable, or to count how many hashed values satisfied some condition, for example. To be clear, your goal is to support `for`-loops over the **keys**, *not the slots of the underlying array*. Section 2.3 of the notes for lecture 12 gives step-by-step instructions for adding support for iterators to a class. Follow those steps to make your `Chaining` class support `for`-loops. ::: Whenever you implement a collection (like a list or a hashtable) you should override `iterator`. In this problem, you will think about how to do this for hash tables. (You may assume that the hash table does not change while you are iterating; no items are inserted or deleted.) One way to iterate through a chained hashtable (at the implementation level) is to visit all the slots in the hash table: if a slot is empty, you can skip right over it; but if it is not empty, you would then iterate over the bucket stored at that slot. **Task**: Implement an iterator for your hashtable implementation based on the method just described. **Hint:** See [Lecture 12 Section 2.2](https://brown-cs18-master.github.io/content/lectures/12iterators/12iterators.pdf) for how to write an iterator method. **Task:** Use your iterator to implement `equals` and `toString` functions in`Chaining.java`. Write tests for your `equals` method in `Homework4TestSuite.java`. You do not need to write tests for your `toString` method. (Added 3/4/21: the format of your `toString` method is up to you; it should present the key-value pairs present in the hashtable). **Hint:** See [Lecture 6](https://brown-cs18-master.github.io/content/lectures/06fieldmutation/06fieldmutation.pdf) for how to implement your own equals method. **Task:** The iterator you just implemented examines all *n* slots in the chaining hash table, even though there might not be data stored at all, or even most, of them. Describe a more efficient chaining iterator that only examines slots which store data. Your iterator should not affect the run time of the hash table’s basic operations; that is, the runtimes of `lookup`, `insert`, `update`, and `delete` should not change. Describe your propsed new iterator in `written-questions.txt` **You don't have to implement this more efficient iterator. We are just asking you to explain how it would work** **Note:** You cannot simply store the keys in Java’s HashSet and iterate through this set. Your solution should not involve iterating through another HashSet or HashTable (note that these two data structures would have similar iterators) as this would be using the solution to the problem to solve the problem! **Hint:** Consider augmenting the key-value pairs stored in your dictionary with additional fields. **Task:** Discuss the trade-offs between the naive iterator that you coded, and the more efficient iterator that we asked you to write about. Write your thoughts in `written-questions.txt` --- ## FAQ **Can the input for grep be multiple words?** Yes. **Should we test constructor exceptions?** No (TODO: tentative-double check) **If I write a helper for grep, will that be constant time?** Not necessarily, and you shouldn't need one. Do any preprocessing in the constructor. Then your actual grep function should be able to be constant time. (Remember what constant time means: an operation is constant time if it takes the same amount of time regardless of the size of the data structure.) [TODO: UPDATE BASED ON THE CLARIFICATIONS ADDED TO THE HW] Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any other CS 18 document by filling out our [anonymous feedback form](https://forms.gle/ym89rAZyBk8Hg3LNA).