--- tags: hw, spr22 --- # Homework 3: Hashtables ### Due: Thursday, March 10th, 2022 at 11:59pm ET **Collaboration Policy:** This assignment uses the default course [collaboration policy](https://hackmd.io/wStcp9Z4TH641sgw7r6Zyw?view). Roughly, you may discuss conceptual issues, but you must code alone. See the policy for details and examples of what is and isn't allowed. This [HW3 FAQ](https://edstem.org/us/courses/16807/discussion/1238725) post will be maintained on Ed. ## Learning Objectives This assignment has you practice the following course objectives: - Implementing common data structures - Working with exceptions - Understanding hashtables - Writing good test cases - Exploring social impacts of technology ## Assignment Link Use this [Github Classroom link](https://classroom.github.com/a/k5p_Gt8H) to get the stencil code for this assignment. ## Setup If you're having trouble with IntelliJ, our [First Time Setup Guide](https://hackmd.io/7azYQstzS4a6udptmaADCw) includes a common bugs/FAQ section. Please make sure to read [our Gradescope submission guide](https://hackmd.io/oTbF1IUuRs27VgVJF4z2rA) before handing in your assignment! ## Style Expectations For all future assignments, including this one, we expect your code to conform to the [CS200 Java Style Guide](https://hackmd.io/io7Gock2TZqC6UnBwpjP2w). ## Stencil Overview The `src` folder has the following files (**which you should not modify**): * `IDictionary.java` containing `public interface IDictionary<K, V>` * `IGrep.java` containing `public interface IGrep` * `KeyNotFoundException.java` containing `public class KeyNotFoundException` * `KeyAlreadyExistsException.java` containing `public class KeyAlreadyExistsException` You will, however, alter the files that are in the `sol` folder: * `Grep.java`, containing `public class Grep implements IGrep` * `Chaining.java`, containing `public class Chaining<K, V> implements IDictionary<K, V>` AND `private static class KVPair<K, V>` * `Homework3TestSuite.java` containing `public class Homework3TestSuite` You will also alter the `grep-test-files/` directory. This is a folder containing one sample `.txt` file that you can use to test your Grep class. We recommend making more files in that directory to help with your testing (think edge cases!). <!--- If you're having trouble with IntelliJ, check out our [IntelliJ Common Problems](https://hackmd.io/webHvA8yTCqGOAKXYgrI_A) handout. Please make sure to read [our gradescope submission guide](https://hackmd.io/jC-SEMn_RpiGqMH_EhsDMw) before handing in your assignment! ---> ## Java’s Built-in Hash Tables In this homework, you will be writing an application (in the Grep class) using Java’s built-in hashtables, and completing an implementation of hashtables from scratch (in the Chaining class). Java has two styles of built-in hash tables, specifically, `HashMap` or `HashSet`. These classes differ in that a `HashMap` uses a hash table to represent a *dictionary* (a mapping from some keys to values) as you have seen in lecture. Hashing has other uses, too. A `HashSet` uses a hash table to represent a set by implementing the Java interface `Set` (the same way `LinkedList` or `ArrayList` implement `List`) The `Set` interface is similar to other collections of objects, with the extra requirement that no two elements are the same (as determined by `.equals` equality). So, if we call a `Set`'s `add` method on a object already contained within the `Set`, the `Set` will not change (see the documentation for more details). A hash table is a straightforward way to represent a set, as a set can be seen as a special case of a *dictionary*. Since dictionaries do not allow duplicate keys, we can view sets as dictionaries whose keys do not map to any values! You can use Java’s `HashMap` class by copy pasting `import java.util.HashMap;` at the top of your file (though IntelliJ may also automatically import it if you use `HashMap` in your code). Documentation on how to use Java’s HashMap can be found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html). Likewise, you can use Java’s `HashSet` class by importing `java.util.HashSet` (documentation found [here](https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html)) and Java's `Set` interface by importing `java.util.Set` (documentation found [here](https://docs.oracle.com/javase/8/docs/api/java/util/Set.html)). **You are allowed to use these Java built-ins in `Grep.java`, but not in `Chaining.java`.** ## Searching in Files (Grep) ([Gimme More Line Numbers](https://www.youtube.com/watch?v=THuJja1hFqc)) ### (Background) Command-Line Grep The UNIX command `grep` is an extremely powerful and useful tool (see [this activity](https://hackmd.io/D6jv55ieSJC6_8hOlRPXFQ?both) if you're not familiar with the command line). You can use `grep` to search a file for a given pattern, and report where that pattern appears in the file, as follows: `grep -n <pattern> <file>` The -n is an optional parameter that tells grep that we want it to report line numbers. It prints out each line of text next to the line number. This is but one of many, many grep features, which are fully documented on its man page (which you can access, if you want to learn more, by typing `man grep` into a terminal). Here’s a couple of examples of how you’d use grep with this line number option: `grep -n treasure myFile` This command would print out something like: `1: The treasure is under the lamp` `4: treasure` `9: They found a hint that led them to another piece of the treasure map` **Your input/output format will look different (see below), but this is the essence of what you are trying to replicate with your code.** ### DIY Grep Now we’re going to implement (a limited version of) `grep` for ourselves! In your `sol` directory, we've given you a class `Grep` that implements the `IGrep` interface. This class has: * A constructor with one parameter, a `String` with the filename that we'll be `grep`ing * A method `lookup` with one parameter, a `String` representing the word to look up. This method returns a `Set` of `Integer`s representing the line number(s) that the word appears on We'll be working within this class to write our own implementation of `grep`. *As noted above, you can (and should) make use of Java’s built-in hash table data structure, which is called `HashMap`, to solve this problem.* **Task:** Process the input file to your `Grep` to set up some data structure (*wink, wink*) that stores each of the words in the file with the line number(s) in which that word appears. * As part of this processing step, you'll likely want to use the `split` method in the `String` class, which splits up a string into pieces each time it encounters a specific character, and stores those pieces in an array of strings. For example ```java= String sentence = "This is a sentence." String[] words = sentence.split(" ") // words is an array: ["This", "is", "a", "sentence"] ``` * The Java syntax to declare a `HashMap` that maps, for example, from a `String` to a `Set` of type `Integer` is `new HashMap<String, Set<Integer>>()`. Likewise, the syntax to declare a `HashSet` of `Integers` is `new HashSet<Integer>()`. **Notes:** * If the same word appears more than once on the same line, you should include the line number only once in your output. (*Hint: think about what your data structure choices do to help here before writing too much code for this!*) * You should treat words as sequences of characters separated by whitespace; so "glitter" and "glitter!" are distinct words. Also, you can assume words are case sensitive; so "mystery" and "Mystery" are distinct words. * If the word does not appear anywhere in the file, you should be returning an empty `Set` **Hints & Tips:** * Start by creating a short (2-3 line file) example on paper and drawing out what the resulting hashmap contents would be. This will help you figure out what types you need for your hashmap before you start coding. * Look at [Lab 5](https://hackmd.io/KjVh7MI9R_OzhOG6-58Gvg?view#Readers-Writers-and-Buffers) for how to read from files. * You'll probably need to use two classes from the Java IO library (`java.io`) to accomplish this goal: one to read a file, and one to read specific lines from that file * Ensure that you catch and handle relevant exceptions by printing an informative message and "exiting gracefully" (meaning that you end the method without throwing the exception). Remember, we want error messages to be as informative as possible, so we want to give unique information about each error / error type that might pop up. * Think about exceptions like `FileNotFoundException` and/or `IOException` **Task:** Fill in the `lookup` method. As a reminder, this method should return a `Set` of `Integer`s representing the line number(s) that the word passed in as an argument appears on **Task:** Write a `main` method for the Grep class. The `String[]` args should correspond to a file name, and then at least 1 other word to look for, in that order. The output of correctly running the main method should look something like: ``` traveller was found on lines: [1] on was found on lines: [3, 7, 9, 11] ``` **Your main method should not throw any exceptions**; make sure it prints out an informative message and exits gracefully instead of throwing any exceptions that may arise from incorrect user input. :::spoiler **Reminder: How to take in arguments using IntelliJ** To take in arguments using IntelliJ, press the drop-down menu near the green “run” button and select edit configurations. In the “Program Arguments” field, put your program arguments (space separated). These arguments will appear in the `String[]` args variable in your main method. ![](https://i.imgur.com/S0DdTpy.png) ![](https://i.imgur.com/cHSenAE.png) ::: <br> **Task:** Add tests to `Homework3TestSuite.java` that test your `lookup` function. You do not need to test for exceptions that might be thrown on I/O errors (as you are catching those and printing a message, which can’t be tested). **Note:** We are only testing `lookup` here, not the main Grep program, because the latter simply prints output to the screen. In such cases, you test the inner logic, then manually inspect output build from the results of the inner logic. That is, you only need to test your `lookup` function, not the Grep constructor. **Note:** We have a sample `.txt` file in `grep-test-files` for you to use. Please do not be afraid to create your own text files for testing, especially to try to catch edge cases! If you do this, **make sure your filepaths are of the format `"grep-test-files/filename.txt"`** (they shouldn't look like `"/Users/yourname/morefolders/grep-test-files/ozymandias.txt"`) and **make sure to hand in these text files** along with your `Homework3TestSuite.java` to the Testing submission. ### FAQ **Can the input for grep be multiple words?** Yes. Each input word should be separated by a space following the filepath as the arguments to your main method. A valid input to your main method would follow the structure: ```java: <mandatoryFilePath> <mandatoryWord1> <optionalWord2> <optionalWord3> etc. ``` **If I write a helper for grep, will that be constant time?** Not necessarily. Your preprocessing may not be constant time, but your `lookup` certainly should be. <!--- Not necessarily, and you shouldn't need one. Do any preprocessing in the constructor. Then your actual grep function should be able to be constant time. (Remember what constant time means: an operation is constant time if it takes the same amount of time regardless of the size of the data structure.) ---> **What should the lookup method in Grep do upon looking up a word that does not exist in the text?** It should return an empty set. **For the file passed into Grep, is the first line 0 or 1?** The first line should be considered as 1. ## Chaining Hash Tables ([Chaining Cars](https://www.youtube.com/watch?v=GemKqzILV4w)) In this problem, you will implement a hash table as we outlined in lecture. Recall that the internal data structure we saw for a hash table is an array of lists. You will use Java's `LinkedList` class in particular for the list implementation. The point of this problem is for you to see how hashtables are built when they don't already exist in a language. Therefore, **you may not use Java's hashmaps or hashsets in your implementation.** <!-- :::info **Lila edit:** I restructured the description of what can be found in `Chaining.java` below to match the structure of `Grep.java` as I described in the above section. The original paragraph about this is commented out in case we want to revert it ::: --> In your `sol` directory, we've given you a class `Chaining` that implements the `IDictionary<K, V>` interface. This class has: * A constructor that takes in the size of your hashtable * An inner class `KVPair<K, V>` for combining keys and values (as shown in lecture Friday 3/4) <!-- * There are a few accessor/mutator methods in this class that we've used in stencil methods that are given to you and can be used when implementing your hashtable. *Specifically, you may want to consider how one of these methods might help with your implementation of `findKVPair`* * You should **not** modify anything in the `KVPair <K, V>` class --> * Several methods from the `IDictionary` interface that we must implement: * `lookup`: we've implemented this method for you, but it will be your job (later) to implement its helper method `findKVPair` * `update`: we've written for you in its entirety * `insert` and `delete`: you must write these methods on your own (more info later in the handout) * Overridden methods for `equals` and `toString` that you should implement. Think about what it means for two hashtables to be equal <!--- The `sol` directory contains a class named `Chaining.java` (this name refers to the style of implementation we are using). It defines a class for combining keys and values (as shown in lecture) called `KVPair<K, V>` (stored inside the `Chaining` class). `Chaining.java` implements an interface `IDictionary<K, V>`, which has the definitions of the methods we need to implement. We have already written `lookup` and `update` for you; you need to write the constructor, `insert`, and `delete`. ---> ### The Constructor **Task:** Fill in the constructor for `Chaining.java`. The `Chaining` constructor should take as input a `size` variable, which specifies the size (number of slots) of the table. Attempts to make a hashtable with a negative or 0 size should throw an `IllegalArgumentException` with the error message `"Invalid size."`. When the constructor is done, you should have a valid empty hashtable. **Note:** This task requires you to set up fields for the underlying data structures for the hashtable (which we discussed in class) and create the corresponding objects. The subtlety here is that you have to **completely set up the objects**. We're leaving it to you to figure out what that means (that's one of the learning goals of this assignment), but we have later tasks that will explicitly hint you towards this. * Look back at the lectures on hashtables (Wednesday 3/2 and Friday 3/4) for an idea of what underlying data structure could be helpful here) #### Details on Creating Arrays of Generics (parameterized types) **Note:** In Java, you cannot create an array of a generic type (generic means "with type parameters"). In order to circumvent this limitation, you should do the following to create your hash table: ```java= this.data = (LinkedList<KVPair<K,V>>[]) new LinkedList[size]; ``` This line of code will generate a warning that there is an unchecked cast. To get rid of the warning, above any methods that cast like this, you should write: ```java= @SuppressWarnings("unchecked") ``` Typically, warnings such as these indicate a problem with your code, so you should not suppress them; you should pay attention to them! In this specific instance, however, we justify its use as we are trying to get around a `Java` limitation. ### Finding key-value pairs **Task:** Now that your data structures are set up, fill in the definition of `findKVPair`, which searches your data structure for the `KVPair` for a given key value. This method should throw a `KeyNotFoundException` if the given key is not in the hashtable. * You are welcome to implement additional (helper) methods if you wish. If they are only for use within your class, mark them as `private`. **Hint:** Recall that hash tables are backed by a hashing function, which maps a key to an index in the array. You should use the built-in `hashCode` method on your keys for hashing. The Java built-in hash function returns a signed (either positive or negative) integer. Taking the absolute value of the integer will let you map it into an array index. Once you have `findKVPair` implemented, you should have an empty hashtable on which the `lookup` and `update` methods (which we provided) can run. **Write (and run) your empty-hashtable tests for `lookup` and `update` NOW.** The subtlety we warned about in the constructor regarding setting up the objects can show up on these empty tests if you missed something. Making sure the empty case works now could save you a lot of debugging headache later. ### Implement `insert` and `delete` <!--- Recall that hash tables are backed by a hashing function, which maps a key to an index in the array. You should use the built-in `hashCode` method on your keys for hashing. The Java built-in hash function returns a signed (either positive or negative) integer. Taking the absolute value of the integer will let you map it into an array index. ---> **Task:** Fill in the `insert` method, which takes a key and a value and results in a hashtable in which the given key maps to the given value, unless the key was already in the hashtable. If the key was already in the hastable, throw a `KeyAlreadyExistsException` and leave the hashtable contents unmodified. **Task:** Write some initial tests for `insert`. Start with tests for keys that are NOT already in the table. If you get an unexpected problem, ask yourself whether you could avoid it by modifying the constructor (this is another place where the constructor subtlety we warned about could arise). **OO Design Warning:** We have mentioned in lecture how good OO design avoids code that has to ask what type something is. Asking whether something is equal to `null` is often a type check in disguise. If your code asks whether something is `null`, consider whether that is actually necessary (this again connects to the constructor subtlety). **Task:** Fill in the `delete` method, which removes a `KVPair` from the hashtable. The method header in the stencil indicates which exceptions this method can throw. You are welcome to use [`LinkedList` methods](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/LinkedList.html) if you think they will help (but this is not required). #### Handling Exceptions Some of the stencil methods throw exceptions. The names of the exceptions should make it clear what each exception is used for. Your code should throw the exceptions that label the methods under the corresponding conditions. Some of the thrown exceptions are really for the user of the hashtable, and aren't useful for your own implementation. You don't need to catch these exceptions (you will test for them, however). You may find situations in which the thrown exceptions are useful, in which case you should catch them and do something appropriate (we're intentionally leaving this for you to think about). **Note:** You may **NOT** modify the `throws` annotations (or anything else) in the `src` files. Work with the `throws` annotations that we have provided. **Note:** You are **NOT** expected to print out messages to the user just because an exception got thrown. If you don't see a use of an exception for the logic of your code, just pass them along to the user/test suite (by not catching them). ## Implementing `toString` and `equals` :::info This section was added on Saturday 3/5. ::: **Task:** Fill in `toString` in the stencil file. Use a format such as: ``` [key1: value1, key2:value2, ...] ``` (small variations on this are also fine -- the goal is to print the key/value mappings, but not the internal indicies). The order in which the keys print out doesn't matter. ***Hint:** Consider nested `for` loops*. **Task:** Fill in the `equals` method in the stencil file. Two `Chaining` objects should be considered equal if they have the same set of key-value pairs. They could have different underlying array sizes and different organizations of `KVpairs` within the arrays (including the same `KVPair` being stored in different array slots in otherwise equal arrays). There are no running-time restrictions on your solution. ***Hint:** This problem requires you to figure out how to iterate through one hashmap while checking its contents against the other. `findKVpair` may be helpful depending on how you break this down, but using it is not required.* ## Testing ([If I Ain't Got Passing Tests](https://www.youtube.com/watch?v=Ju8Hr50Ckwk)) **Task:** Test your `insert` and `delete` methods, including their interactions with `lookup` and `update`. :::info The next task was added on Saturday 3/5. ::: **Task:** Test your `equals` method. Think about how each of array size, allocation of pairs to slots, and different relationships between sets of `KVpairs` should figure into the definition of equality (and hence your tests). You do **NOT** need to test `findKVPair`, any helper methods that you write in order to write `insert` and `delete`, or `toString`. Put your tests in the `Homework3TestSuite.java` file. - Since your class is generic, be sure to use multiple key and value types in your testing. - Since we are working with a mutable data structure, don't forget about `@Before` methods that can be used to set up your data across tests. - Don't forget to test exceptions! You should test for the `KeyNotFound` and `KeyAlreadyExists` exceptions We will run wheat/chaff testing on your test suites. Aim to catch at least 75% of the chaffs. ## Handing In Before handing in your files, check that your solution is compatible with Gradescope. To do this, run `AutograderCompatibility.java`. Make sure that it compiles and runs without any errors. To hand in your files, submit them to Gradescope. - Submit `Grep.java` and `Chaining.java` to the **Homework 3: Implementation** submission - Submit `Homework3TestSuite.java` and the `.txt` files in `grep-test-files` to the **Homework 3: Testing** submission. Once you have handed in your homework, you should receive an email, more or less immediately, confirming that fact. If you don't receive this email, try handing in again, or ask the TAs what went wrong. **Note:** If your code does not work with the autograder, you will be penalized. If there are any complications, follow these steps: * Make sure that your Testing submission passes the wheat. If it doesn't, you should use the autograder output and the message in the failed autograder test to debug your test suite. * If your test suite passes the wheat, run your implementation against your test suite. If this crashes, you should fix your solution code so that it is compatible with your test suite. * If you are still having issues, please reach out to us at TA Hours or Ed. --- *Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any other document by filling out the (mostly) [anonymous feedback form](https://docs.google.com/forms/d/e/1FAIpQLSdFzM6mpDD_1tj-vS0SMYAohPAtBZ-oZZH0-TbMKv-_Bw5HeA/viewform)!*