---
tags: labs-summer21
---
# Lab 04: Java IO

## Learning Objectives

In this lab, you will learn to:
* read from and write to files, buffers, and streams in Java
* manipulate `File`s in Java

## Setup

Before beginning this assignment, make your personal repository for this lab using the following GitHub classroom link.

https://classroom.github.com/a/3GmSk0Bh

You will be referring to this source code throughout the lab. Open your repository in IntelliJ to get started!

## The Java IO Library

The Java Application Programming Interface (API) is the set of interfaces of the Java libraries. In this lab, you will learn about the `java.io` package, a standard Java library. IO stands for input/output, and this library handles input and output for Java programs.

You are welcome to browse through the [`java.io` package summary](http://docs.oracle.com/javase/7/docs/api/java/io/package-summary.html) for your reference. You will notice that there are **lots** and **lots** of classes in this library. Don't spend too much time trying to understand everything in this package in detail right now; learning to read library documentation takes practice. But do come back to this later, because this particular skill (which requires an investment up front) may save you more time than any other in your future computer science endeavors. You will also need the `java.io` library in your assignments in CSCI0180.

In today's lab, you will learn about program arguments and how to use the `java.io` library to read from and write to files, buffers, and streams. You will be interacting with the following seven classes: `FileReader`, `FileWriter`, `BufferedReader`, `BufferedWriter`, `InputStreamReader`, `OutputStreamWriter`, and `File`. Of primary interest to us will be the methods for working with these classes.

## Program Arguments

Much like how functions and methods take arguments as input, programs can also take arguments as input.
Programs accept inputs through the `main` method's argument, `String[] args`, which declares that your program takes as input an array of `String`s called `args`. Before this lab, the programs you have written haven't taken in any arguments, so the `args` array has always been empty. The `args` array holds any strings that are handed to a program when it is launched.

Here is an example of a class, `PrintArgs`, with only a single method `main`, that simply prints its program arguments.

```java
public class PrintArgs {
    public static void main(String[] args) {
        // Print each program argument on its own line.
        for (String arg : args) {
            System.out.println(arg);
        }
    }
}
```

With IntelliJ, we've been able to compile and execute code by pressing the green play button, but this process can also be done from the command line. `javac` is a command that compiles a `.java` file into a `.class` file of Java bytecode, and `java` is a command that runs the class you compiled with `javac`. Because we can supply program arguments on the command line, they are also called command-line arguments.

In the following example, we demonstrate compiling `PrintArgs` from the command line and running it with the arguments `"hi"`, `"there"`, and `"everybody"`:

```
$ javac PrintArgs.java
$ java PrintArgs hi there everybody
```

With the input `hi there everybody`, the 0th, 1st, and 2nd strings in the `args` array are set to `"hi"`, `"there"`, and `"everybody"`, respectively. Hence, when we run the program, we see these words printed out one after another:

```
hi
there
everybody
```

You can also supply arguments when running a program in IntelliJ. To run your program with arguments in IntelliJ, go to `Run` $\rightarrow$ `Edit Configurations`.

![](https://i.imgur.com/FzOdhdo.png)

On the left-hand menu, make sure to select the program you want to run. Then, type your arguments in the `Program Arguments` box (separated by spaces), click `Apply`, and then run your program.
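
Since `args` is empty whenever no arguments are supplied, it can help to check its length before indexing into it. The following is a minimal sketch of that idea; the class `ArgsSummary` and its `summarize` helper are hypothetical names used only for illustration and are not part of the lab stencil.

```java
class ArgsSummary {
    // Hypothetical helper: describes the arguments a program received
    // without assuming that any were supplied.
    static String summarize(String[] args) {
        if (args.length == 0) {
            // Reading args[0] here would throw ArrayIndexOutOfBoundsException.
            return "no arguments given";
        }
        return args.length + " argument(s); the first is " + args[0];
    }

    public static void main(String[] args) {
        System.out.println(summarize(args));
    }
}
```

Running `java ArgsSummary hi there everybody` would print `3 argument(s); the first is hi`, while running it with no arguments would print `no arguments given`.
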
![](https://i.imgur.com/U9zC4sB.png)

**Task**: Open the `PrintArgs` class in the provided lab files and try running it in IntelliJ with different arguments.

::: spoiler If you're not able to find edit configurations or add program arguments when you look at `Edit Configurations`, click here!
When you open `Edit Configurations`, you might see something like this:

![](https://i.imgur.com/nTccNOb.png)

To solve this, you can run your program by clicking the green arrow next to a `main` method to force IntelliJ to create a configuration for you. For example, if you run the main method in the `PrintArgs` class, when you open `Edit Configurations` again you should see something that looks like this:

![](https://i.imgur.com/rCjAMIb.png)

You can then add your program arguments in the box that says `Program arguments`.
:::

## Readers, Writers, and Buffers

For your reference throughout this lab, here are links to the [FileReader](http://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html), [FileWriter](http://docs.oracle.com/javase/7/docs/api/java/io/FileWriter.html), [BufferedReader](http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html), and [BufferedWriter](http://docs.oracle.com/javase/7/docs/api/java/io/BufferedWriter.html) APIs.

We explained how these work in the presentation, but feel free to refer to the following sections if you get stuck at the next checkpoint.

:::spoiler **Why do we need buffering?**
When you read and write data, you are reading from and writing to computer memory. This is expensive in terms of time. **Buffering** speeds up the transfer of data by limiting the number of calls to these expensive computations. A buffer acts as a temporary data store so that data can be read or written in batches from the buffer rather than the underlying data stream.
Essentially, rather than reading or writing every single piece of data from the data stream to another part of memory individually, we can store larger chunks of it in a buffer, resulting in fewer calls to read from an input stream. A buffered input stream reads data from a buffer, refilling when the buffer is empty. A buffered output stream stores data in a buffer, then writes the data into computer memory in one go; we say the data has been *flushed*.

For example, if you are writing to a file, the writer will not write each individual line to the file. Instead, it will wait until a lot of lines are waiting to be written and, at that point, write all of those lines to the file.
:::

:::spoiler **Reading and Writing: How do I do it?**
Reading/writing from/to a file consists of a few basic steps:
* Construct a `FileReader`/`FileWriter` with a filename as a `String` (including its path) as an argument. A `FileReader` will read from this file and a `FileWriter` will write to this file.
* Construct a `BufferedReader`/`BufferedWriter` from this `FileReader`/`FileWriter`.
* Read from/write to the file using the `BufferedReader`/`BufferedWriter`.
* Close the reader/writer.

These steps are explained in more detail below.

To read from a file, you construct a `FileReader`. One way to do this is to pass in to the constructor the name of the file you wish to read:

```
FileReader fReader = new FileReader("/course/cs0180/src/poems/howl");
```

When a `FileReader` is constructed, the file is opened.

**Note:** Ensure that you use the full path of the file as the filename. Alternatively, if you use a relative path, ensure that you are running the program in the correct working directory (you can check this in Run > Edit Configurations in IntelliJ).

Any time you want to read anything line-by-line (including user input, which we will get to later), you'll want to use a `BufferedReader`.
Using a `BufferedReader` instead of just a `FileReader` ensures that your data is read in appropriately sized blocks. Given a `FileReader`, you can create a `BufferedReader` like this:

```
BufferedReader bReader = new BufferedReader(fReader);
```

And here is how you use a `BufferedReader` to read a file line-by-line:

```
// Read and print lines until readLine() signals EOF by returning null.
String line = bReader.readLine();
while (line != null) {
    System.out.println(line);
    line = bReader.readLine();
}
bReader.close();
```

The `readLine` method, which does exactly that ("read lines"), allows you to read from a file line-by-line. Along the way, it returns each line it reads as a `String`, until it encounters an `EOF` (i.e. **e**nd **o**f **f**ile), at which point there is no further data to read, and it returns `null`.

When you are done manipulating your file, you must close it. To close a `FileReader`, you use the `close` method:

```
fReader.close();
```

In order to read from a file, we use a `FileReader` wrapped in a `BufferedReader`. Similarly, in order to write to a file, we use a `FileWriter` wrapped in a `BufferedWriter`:

```
FileWriter fWriter = new FileWriter("/course/cs0180/src/lab05/file.txt");
BufferedWriter bWriter = new BufferedWriter(fWriter);
```

Whenever you want to write to a file, you should wrap a `FileWriter` in a `BufferedWriter` like above. In order to write, you should use the `write` method.

```
bWriter.write("I am writing to a file!");
bWriter.flush();
bWriter.close();
```

After you are done writing to your buffer, you need to flush it. That is, you must tell Java to write everything that is currently stored in your buffer to the `FileWriter`. If you properly `close` your `BufferedWriter`, it will be flushed automatically. So you should not usually need to call the `flush` method explicitly; instead, flush your buffers implicitly by closing them when you are done reading or writing.
:::

### Concatenate

The word "concatenate" means "to chain things together, one after the other."
The Linux command `cat` (which you can call in a terminal), short for "concatenate," concatenates files. (If you are using Windows, you can use `type` to accomplish the same task.)

**Task:** Try out the `cat` command in your terminal on a few of your own files to see how it works, like so:

```
$ cat my-file.txt my-other-file.txt
```

**Note:** You can also use the `cat` command on a single file! Try that and see what it does.

**Task:** In the `Cat` class from the stencil, fill out the `cat` method. It takes as input a `BufferedReader` and a `BufferedWriter`. It should continuously read lines from the buffered reader argument and write those lines to the buffered writer argument until the end of the file (called `EOF`) is reached. `readLine()` represents `EOF` as `null`. Remember that you will also want to write a new line once you have written a full line. Use the `BufferedWriter`'s `newLine()` method to accomplish this.

**Hint:** A `while` loop will be helpful here! Look at the code snippets in *Reading and Writing: How do I do it?* for tips on how to construct the loop.

**Task:** In your `Cat` class, fill out the `main` method that takes as input two file names. Have it invoke your `cat` method to read the contents of the first argument and write it to the second.

**Notes:** We have provided you a test class to test your `cat` method; be sure to use it to ensure that your method works! **Fill in the paths with files that you test with.** We also provided sample text files for you to test with. Please download the [poems folder at this link](https://drive.google.com/file/d/1TkaPSIZ5gRJdbTxccELvOLmizAeeO3k8/view?usp=sharing) and put it in your CSCI0180 directory on your computer. The path to the poems on your device will be something like `/course/cs0180/poems`.

:::spoiler Having trouble finding files or getting `FileNotFoundException`?
If you're having trouble finding the files (i.e. getting a `FileNotFoundException`), try using the **absolute path**.
This is a path that starts with `/`. If you need help finding the absolute path, try running `pwd` (or "print working directory"). This shows you what your path looks like.

```
$ pwd
/Users/<username>/Desktop/CS18/lab4
```

You can also try `ls` to find the *exact* file names on your computer. Note that extensions like `.txt` do matter!
:::

::: warning
Note that IntelliJ may ask you to mark your methods with tags that indicate that they may throw an exception. This will be due to the fact that you have yet to cover how to cleanly handle exceptions that sometimes arise in the Java IO library. It will be covered in the next lab!
:::

:::info
**Checkpoint**! Call over a TA to check your work.
:::

## Streams

### Interactivity

The buffered readers and writers you have been using are objects that are responsible for buffering *streams* of data (much like when you stream video online, and you need to wait for it to buffer). The stream of data being buffered might originate from a file, as we saw already. However, user input to and from the console also travels on data streams! In this part of the lab, you will learn how to read user input from the console in order to make your programs interactive.

Although you may not know it, you've probably used input and output streams before. `System.out` is the standard output stream. Whenever you use `System.out.println`, you're sending a string over the `System.out` stream.

Guess what? There's also a `System.in` input stream! So we can create a reader (say, an `InputStreamReader`) that reads from `System.in`. Then, if we create a `BufferedReader` with this `InputStreamReader`, we'll be able to call `readLine` to get a line of user input! Here's how to create such a reader:

```
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
```

### Fun With Words

Now that you know about streams, file readers, and file writers, it's time to combine them to create interactive programs!
Combining multiple different readers and writers allows you to read from a file and write out to `System.out`, or vice versa.

**Task:** Write a method, `toDocument`, inside of the `Doc` class, that repeatedly prompts the user for a word from the console and writes those words to a file, with a space separating each word, until `null`/`EOF` is reached. The method should just take in a `BufferedWriter` which will write to a file.

**Hint:** You can use `readLine()`, just like you did above!

**Task:** Write a `main` method which takes in an argument denoting which file to write to. Then, it should create an appropriate writer (which writes to the file that was specified) and pass that to `toDocument`.

**Hint:** To tell the console that you're done sending input, you must send it an `EOF`: i.e., a signal that indicates the end of the stream. You do this by typing `Ctrl-d`. When a `BufferedReader` gets `Ctrl-d`, it will translate it as `null`. You *must* make your program stop accepting input after `EOF`.

**Note:** `Ctrl-d` sometimes does not work in IntelliJ. We are not sure why, but you may experience issues as you work on Search and other projects that rely on user input. For this reason, if you are having trouble with `EOF` in IntelliJ, you should try running your program in the command line, where `EOF` *will* work. There is a guide on running from the command line on our course website. However, it is not strictly necessary for you to do this in order to get checked off; we understand this IntelliJ issue is outside of your control.

:::info
**Checkpoint**! Call over a TA to check your work.
:::

## Just For Fun: CSV Parsing

::: spoiler CSV Parsing
### CSV Files

CSV stands for comma-separated values. A CSV file contains rows of data, each row containing several datapoints separated by commas. You can think of a CSV as a spreadsheet: each datapoint in a row is the value in one cell of that row.

We've provided the CIT elevator logs in the lab04 repo.
The swipe data you've been given has three columns. The first contains the name of a person who swiped into the upper floors of the CIT on Sunday. The second contains the time they swiped into the elevator to go up, and the third contains the time they swiped into the elevator to go down. These times are stored in a standard CS format called Unix time, meaning they are stored as the number of seconds elapsed since midnight UTC on January 1, 1970.

We want you to look for everyone in the upper floors of the CIT on February 16, 2020 at 2:38 PM, or in Unix time, at 1581881945, i.e. everyone who entered before and exited after this timestamp.

The file is clearly too long to look through yourself. You know how to read through a file using a `BufferedReader`, so finding this datapoint should be manageable, but it might take a while to write. Luckily for you, CSV parsing is an incredibly common task, so there are packages that provide CSV parsers for you. You can just use one of these to quickly parse the dataset!

### Setting up the Apache CSVParser

To run this project, you'll need to import `org.apache.commons:commons-csv:1.0` from Maven into your IntelliJ project. To do this, go to File > Project Structure, select Modules on the left sidebar, select the Dependencies tab, then click `+`. From there select `Library`; you may then need to select `New Library`, then `From Maven`. Then go ahead and enter `org.apache.commons:commons-csv:1.0` into the search bar and select the option from the dropdown (you may need to click search and then click the dropdown button). Select all four boxes for Transitive Dependencies, Javadocs, Sources and Annotations. After that you can just click through to install!

### Using the Apache CSVParser

In order to parse our CSVs, we'll be working with the Apache CSVParser. Using a new package can be overwhelming at first, so we'll walk you through the process of using it. First, you'll need to import the classes we'll be using.
**Task:** Create a class called `SuspectFinder`. At the top, import `org.apache.commons.csv.CSVFormat`, `org.apache.commons.csv.CSVParser`, and `org.apache.commons.csv.CSVRecord`. Create a method called `findSuspects` that takes in a file name and returns a `List<String>` containing the names of anyone in the CIT at the time of the crime.

First, we'll have to tell the parser what specific format our CSV file is in, since there are multiple different valid formats. We do this through the use of a `CSVFormat` object.

**Task:** Create a variable `format` of type `CSVFormat` in `findSuspects`.

We want to use the standard CSV format, which is called `RFC4180`. This format is built in as a static field of `CSVFormat`: `CSVFormat.RFC4180`. However, this isn't enough for our format. The first row of the CSV you're using contains the names of each column in the dataset. This is called a `header`, and it needs to be accounted for in our format. Luckily, this is easy in Apache as well. All we'll have to do is set `format` to `CSVFormat.RFC4180.withHeader()`.

Now, we'll need to set up our `CSVParser`. The constructor of `CSVParser` takes two arguments: a `FileReader` of the file you want to parse, and the `CSVFormat` object you just created.

**Task:** Create a variable `parser` of type `CSVParser` in your method.

Now, the `CSVParser` can parse the file into a series of `CSVRecord`s, one for each row, each of which in turn acts like a list of the `String`s from its row. You can find the documentation for `CSVParser` [here](https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html) and the documentation for `CSVRecord` [here](https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVRecord.html).

**Task:** Finish the method, returning the names of everyone who was in the CIT at the specified time.
:::
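
As a small sketch of the time-window check described in the CSV exercise above ("entered before and exited after this timestamp"), the comparison might look like the following. It uses only the standard library; the class name `SwipeWindow`, the method `wasInside`, and the swipe times in `main` are made up for illustration. It also assumes each time arrives as a `String` of Unix seconds, which is why the values are converted with `Long.parseLong` first.

```java
class SwipeWindow {
    // Assumption: all three values are Unix times in seconds.
    // Returns true if `moment` falls strictly between the two swipes.
    static boolean wasInside(long wentUp, long wentDown, long moment) {
        return wentUp < moment && moment < wentDown;
    }

    public static void main(String[] args) {
        long moment = 1581881945L; // timestamp of interest from the lab text

        // Hypothetical swipe times, given as strings the way a parsed row would hold them.
        long wentUp = Long.parseLong("1581880000");
        long wentDown = Long.parseLong("1581885000");

        System.out.println(wasInside(wentUp, wentDown, moment));      // prints true
        System.out.println(wasInside(1581882000L, wentDown, moment)); // prints false
    }
}
```

The strict comparisons mirror the wording "before" and "after"; whether a swipe at exactly the timestamp should count is a choice the lab text does not settle.
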