--- tags: Homeworks-Summer21 --- # Homework 4: Hashtables **Due Date: Monday, June 28th, anywhere on Earth time (Tuesday, June 29th, 8AM EDT)** ## Learning Objectives This assignment has you practice the following course objectives: * State the key structural and performance characteristics of core data structures: lists, trees, arrays, hashtables, heaps, and graphs * Understand what makes a data structure mutable or immutable at the level of implementation * Develop classes that protect how their data is used through access modifiers and appropriate choice of provided methods * Comment code effectively with Javadocs * Develop a collection of examples and tests that exercise key behavioral requirements aspects of a problem * Write programs that can compile and run against a provided interface * Write programs that behave as expected based on a provided description or collection of test cases ## Assignment Link Here is the [Github Classroom link](https://classroom.github.com/a/1_5j8sUZ). If you're having trouble with IntelliJ, check out our [IntelliJ Common Problems](https://hackmd.io/webHvA8yTCqGOAKXYgrI_A) handout. Please make sure to read [our gradescope submission guide](https://hackmd.io/jC-SEMn_RpiGqMH_EhsDMw) before handing in your assignment! ## Handing In The source code files include: * `IDictionary.java` containing `public interface IDictionary<K, V>` * `IGrep.java` containing `public interface IGrep` * `KeyNotFoundException.java` containing `public class KeyNotFoundException` * `KeyAlreadyExistsException.java` containing `public class KeyAlreadyExistsException` Do not alter any of the files in the `src` folder. That is, you should only alter the files that are in the `sol` folder: * `Grep.java`, containing `public class Grep implements IGrep` * `Chaining.java`, containing `public class Chaining<K, V> implements IDictionary<K, V>` * `Homework4TestSuite.java` containing `public class Homework4TestSuite` * `grep-test-files/`. This is a folder containing `.txt` files that you used while testing your Grep class. To hand in your files, submit them to Gradescope. - Submit `Grep.java` and `Chaining.java` to the **Homework 4: Implementation** submission - Submit `Homework4TestSuite.java` and the `.txt` files in `grep-test-files` to the **Homework 4: Testing** submission. - Complete the **Homework 4: Written** assignment on Gradescope (in a quiz format, but untimed and homework collaboration policy still applies, just like normal written questions). Once you have handed in your homework, you should receive an email, more or less immediately, confirming that fact. If you don't receive this email, try handing in again, or ask the TAs what went wrong. **Note:** If your code does not work with the autograder, you will be penalized. If there are any complications, follow these steps: * Make sure that your Testing submission passes the wheat. If it doesn't, you should use the autograder output and the message in the failed autograder test to debug your test suite. * If your test suite passes the wheat, run your implementation against your test suite. If this crashes, you should fix your solution code so that it is compatible with your test suite. * If you are still having issues, please reach out to us at TA Hours or Ed. ## The Hunt Continues... :::spoiler Optional theme material You learn from reading Blueno's old love letters that, back when they first met, Blueno left various comments on *Untitled Lamp/Bear of Hamad International Airport*'s Ed posts, marked by a characteristic yellow square. You wonder if following this digital paper trail might lead to another clue. You're reading Ed to prepare for your homework, and you notice a resolved Ed post with an unresolved followup containing the following limerick: *In search, you are, of the Bear of Blue?* *An array of light you so fervently pursue* *Linked key-value pairs* *Yield the blue n’ yellow bears* *...Hidden deep in the [Ed FAQ!](https://edstem.org/us/courses/5623/discussion/490971)* ::: ## Java’s Built-in Hash Tables In this homework, you will be writing an application (in the Grep class) using Java’s built-in hashtables, and completing an implementation of hashtables from scratch (in the Chaining class). Java has two styles of built-in hash tables, specifically, `HashMap` or `HashSet`. These classes differ in that a `HashMap` uses a hash table to represent a *dictionary* (a mapping from some keys to values) as you have seen in lecture. Hashing has other uses, too. A `HashSet` uses a hash table to represent a set by implementing the Java interface `Set` (the same way `LinkedList` or `ArrayList` implement `List`) The `Set` interface is similar to other collections of objects, with the extra requirement that no two elements are the same (as determined by `.equals` equality). So, if we call a `Set`'s `add` method on a object already contained within the `Set`, the `Set` will not change (see the documentation for more details). A hash table is a straightforward way to represent a set, as a set can be seen as a special case of a *dictionary*. Since dictionaries do not allow duplicate keys, we can view sets as dictionaries whose keys do not map to any values! You can use Java’s `HashMap` class by copy pasting `import java.util.HashMap;` at the top of your file (though IntelliJ may also automatically import it if you use `HashMap` in your code). Documentation on how to use Java’s HashMap can be found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html). Likewise, you can use Java’s `HashSet` class by importing `java.util.HashSet` (documentation found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)) and Java's `Set` interface by importing `java.util.Set` (documentation found [here](https://docs.oracle.com/javase/8/docs/api/java/util/Set.html)). **You are allowed to use these Java built-ins in `Grep.java`, but not in `Chaining.java`.** ## Searching in Files (Grep) ### (Background) Command-Line Grep The UNIX command `grep` is an extremely powerful and useful tool (see lab 1 or lab 4's Just for Fun sections if you're not familiar with the command line). You can use `grep` to search a file for a given pattern, and report where that pattern appears in the file, as follows: `grep -n <pattern> <file>` The -n is an optional parameter that tells grep that we want it to report line numbers. It prints out each line of text next to the line number. This is but one of many, many grep features, which are fully documented on its man page (which you can access, if you want to learn more, by typing `man grep` into a terminal). Here’s a couple of examples of how you’d use grep with this line number option: `grep -n treasure myFile` This command would print out something like: `1: The treasure is under the lamp` `4: treasure` `9: They found a hint that led them to another piece of the treasure map` **Your output format will look different (see below), but this is the essence of what you are trying to replicate with your code.** ### DIY Grep Now we’re going to implement (a limited version of) `grep` for ourselves! **Task:** Write a class `Grep` with a constructor and a single method, `lookup`. This class should implement the `IGrep` interface in the source files for this assignment. Your constructor should take as input a filename and perform any necessary preprocessing. The `lookup` method should take as input a word and return a set of the line numbers on which that word appears in the file. *Note that `lookup` should operate in expected constant time.* As noted above, you can (and should) make use of Java’s built-in hash table data structure, which is called `HashMap`, to solve this problem. **Notes:** * If the same word appears more than once on the same line, you should include the line number only once in your output. * You should treat words as sequences of characters separated by whitespace; so "glitter" and "glitter!" are distinct words. Also, you can assume words are case sensitive; so "mystery" and "Mystery" are distinct words. **Hints:** * Start by creating a short (2-3 line file) example on paper and drawing out what the resulting hashmap contents would be. This will help you figure out what types you need for your hashmap before you start coding. * As part of the preprocessing step, you may want to use the `split` method in the `String` class, which splits up a string into pieces each time it encounters a specific character, and stores those pieces in an array of strings. For example ```java= String sentence = "This is a sentence." String[] words = sentence.split(" ") // words is an array: ["This", "is", "a", "sentence"] ``` * Look at Lab 4 for how to read from files. * Ensure that you catch and handle relevant exceptions by printing an informative message and "exiting gracefully" (meaning that you end the method without throwing the exception). Think about exceptions like `FileNotFoundException` and/or `IOException` (Lab 5) * The Java syntax to declare a `HashMap` that maps, for example, from a `String` to a `Set` of type `Integer` is `new HashMap<String, Set<Integer>>()`. Likewise, the syntax to declare a `HashSet` of `Integers` is `new HashSet<Integer>()`. **Task:** Write a main method for the Grep class. The `String[]` args should correspond to a file name, and then at least 1 other word to look for, in that order. The output of running the main method should look something like: ``` traveller was found on lines: [1] on was found on lines: [3, 7, 9, 11] ``` **Your main method should not throw any exceptions;** make sure it prints out an informative message and exits gracefully instead of throwing any exceptions that may arise from incorrect user input. **Note:** To take in arguments using IntelliJ, press the drop-down menu near the green “run” button and select edit configurations. In the “Program Arguments” field, put your program arguments (space separated). These arguments will appear in the `String[]` args variable in your main method. ![](https://i.imgur.com/wpTlDtb.png) ![](https://i.imgur.com/DhlJFz8.png) **Task:** Add tests to `Homework4TestSuite.java` that test your `lookup` function. You do not need to test for exceptions that might be thrown on I/O errors (as you are catching those and printing a message, which can’t be tested). **Note:** We are only testing `lookup` here, not the main Grep program, because the latter simply prints output to the screen. In such cases, you test the inner logic, then manually inspect output build from the results of the inner logic. That is, you only need to test your `lookup` function, not the Grep constructor. **Note:** We have a sample `.txt` file in `grep-test-files` for you to use. Please do not be afraid to create your own text files for testing, especially to try to catch edge cases! If you do this, **make sure your filepaths are of the format `"grep-test-files/filename.txt"`** (they shouldn't look like `"/Users/yourname/morefolders/grep-test-files/ozymandias.txt"`) and **make sure to hand in these text files** along with your `Homework4TestSuite.java` to the Testing submission. ## Chaining Hash Tables :::warning This will be covered in lecture **Monday, June 21** ::: In this problem, you will implement a hash table using chaining. Recall that the internal data structure of a hash table implemented using chaining is an array of lists. You will use Java's `LinkedList` class in particular for the list implementation. The point of this problem is for you to see how hashtables are built when they don't already exist in a language. Therefore, you may not use Java's hashmaps or hashsets in your implementation. The `sol` directory contains a class named `Chaining.java`. It defines a class for combining keys and values (as shown in Monday's lecture) called `KVPair<K, V>` (stored inside the `Chaining` class). `Chaining.java` implements an interface `IDictionary<K, V>`, which has the definitions of the methods we need to implement. **Task:** Write a class, `Chaining<K, V>` that implements `IDictionary<K, V>`. This will be your hashtable implementation. We have already written `lookup` and `update` for you; you need to write the constructor, `insert`, and `delete`. The `Chaining` constructor should take as input a `size` variable, which specifies the size (number of slots) of the table. Attempts to make a hashtable with a negative or 0 size should throw an `IllegalArgumentException` with the error message `"Invalid size."`. You are welcome to implement additional (helper) methods if you wish. - **Hint:** The key method to write in `Chaining` is `findKVPair`. Write this method first, and use it as a helper when you write `insert` and `delete`. - **Hint:** You will see that some of the stencil methods throw exceptions. The names of the exceptions should make it clear what each exception is used for. You should throw these exceptions where appropriate. If your code can reasonably handle an exception, you should catch it. But if you think that an exception should be passed back out to the user of your hashtable class, you may leave it uncaught. - **Hint:** Recall that hash tables are backed by a hashing function, which maps a key to an index in the array. You should use the built-in `hashCode` method on your keys for hashing. The Java built-in hash function returns a signed (either positive or negative) integer. Taking the absolute value of the integer will let you map it into an array index. **Note:** In Java, you cannot create an array of a generic type. In order to circumvent this limitation, you should do the following to create your hash table: ```java= this.data = (LinkedList<KVPair<K,V>>[]) new LinkedList[size]; ``` This situation is one of a few exceptions (`equals` is another) when casting is acceptable; in general, however, it's still discouraged. This line of code will generate a warning that there is an unchecked cast. To get rid of the warning, above any methods that cast like this, you should write: ```java= @SuppressWarnings("unchecked") ``` Typically, warnings such as these indicate a problem with your code, so you should not suppress them; you should pay attention to them! In this specific instance, however, we justify its use as we are trying to get around a `Java` limitation. **Task:** Test your `insert` and `delete` methods. You do not need to test `findKVPair`, or any helper methods that you write in order to write `insert` and `delete`. With this in mind, consider the access modifiers that you should put on fields and methods in your `Chaining` class. Put your tests in the `Homework4TestSuite.java` file. - Since your class is generic, be sure to use multiple types in your testing. Additionally, since we are working with a mutable data structure, be sure to have setup methods. - You might find the `lookup` and `update` methods helpful in testing `insert` and `delete` - Don't forget to test exceptions! You should test for the `KeyNotFound` and `KeyAlreadyExists` exceptions ## Written Questions In Lecture 13, we looked at defining *iterators* to enable using a for-loop to visit all elements of a data structure. Rather than have you write an iterator as part of this assignment, we instead want you to think about them more conceptually by answering written questions, which we have put into a Gradescope quiz form. However, the assignment is not a quiz --- there are no time limits, and you can use normal homeowrk collaboration policy rules to complete it! **Task:** Complete the assignment on Gradescope under Homework 4: Written. ## FAQ **Can the input for grep be multiple words?** Yes. **If I write a helper for grep, will that be constant time?** Not necessarily, and you shouldn't need one. Do any preprocessing in the constructor. Then your actual grep function should be able to be constant time. (Remember what constant time means: an operation is constant time if it takes the same amount of time regardless of the size of the data structure.) --- *Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any other CS18 document by filling out the (mostly) [anonymous feedback form](https://cs.brown.edu/courses/cs018/feedback)!*