In this lab, we will talk about binary I/O and how to manipulate binary files. Every file on your computer is a binary file; files whose bytes encode text are simply so common that we treat them as their own category, text files.
You can view any file as binary data using a hex editor, a program that displays the raw bytes of a file. You could even view your whole hard drive or RAM as one big chunk of binary data.
You could use this online hex editor, or you could install a program such as HxD.
The binary data of an Image.
Note: We read binary data as hex because it's easier for us: programs usually handle data in bytes, and each byte is exactly two hex digits.
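To make the byte-to-hex relationship concrete, here is a minimal sketch that prints a few bytes as two-digit hex values. The class name HexDemo and the method toHex are made up for illustration.

```java
class HexDemo {
    // Formats each byte as exactly two uppercase hex digits, separated by spaces.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF maps Java's signed byte (-128..127) to the range 0..255.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0, 10, 65, (byte) 255};
        System.out.println(toHex(data)); // 00 0A 41 FF
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, values above 127 would be formatted as negative numbers.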
The main classes for reading and writing binary data are InputStream and OutputStream. These two abstract classes define the common methods for reading and writing binary data (whose source or destination could be something other than a file).
Please read the documentation here:
https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html
https://docs.oracle.com/javase/8/docs/api/java/io/OutputStream.html
As an example, we will take a look at the FileOutputStream class, which writes binary data to a file.
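A minimal sketch of such a program might look like the following. The file name example1.bin and the byte values are assumptions chosen for illustration, so they will not necessarily match the bytes in any screenshot.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteBytesExample {
    // Writes a few raw bytes to the file at the given path.
    static void writeBytes(String path) throws IOException {
        // try-with-resources closes the stream even if a write fails.
        try (OutputStream out = new FileOutputStream(path)) {
            out.write(72);                       // a single byte: 0x48
            out.write(new byte[] {105, 33, 10}); // several bytes: 0x69 0x21 0x0A
        }
    }

    public static void main(String[] args) throws IOException {
        writeBytes("example1.bin"); // hypothetical file name
    }
}
```

Note that write(int) stores only the lowest eight bits of its argument, so it always writes exactly one byte.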
And here's the file in a hex editor:
Example 1 Output
Now we can use FileInputStream to read the data back from that file.
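Reading could be sketched like this. To stay runnable on its own, the sketch first writes a small file; the file name and byte values are again assumptions for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ReadBytesExample {
    // Reads the file one byte at a time and returns the values as 0..255 integers.
    static List<Integer> readBytes(String path) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (InputStream in = new FileInputStream(path)) {
            int b;
            // read() returns the next byte as 0..255, or -1 at end of file.
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Create a small file first so the sketch runs on its own.
        try (FileOutputStream out = new FileOutputStream("example1.bin")) {
            out.write(new byte[] {72, 105, 33, 10});
        }
        System.out.println(readBytes("example1.bin")); // [72, 105, 33, 10]
    }
}
```

read() returns an int rather than a byte precisely so that -1 can signal end of file without colliding with a real byte value.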
Writing data byte by byte is useful, but sometimes we need something bigger than a byte, such as an int or a float. We could assemble those by hand, but there is an easier way: the DataInputStream and DataOutputStream classes.
Please take a look at the documentation here:
https://docs.oracle.com/javase/8/docs/api/java/io/DataOutputStream.html
https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html
Notice that we've created a FileOutputStream, then constructed the DataOutputStream from that object. The two are separate because a DataOutputStream can write to something other than a file (e.g. a network packet!).
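As a side illustration of that last point, the sketch below wraps a DataOutputStream around an in-memory ByteArrayOutputStream instead of a file. The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class WrapStreamExample {
    // Encodes one int through a DataOutputStream that wraps an in-memory stream.
    static byte[] encodeInt(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // always 4 bytes, most significant byte first
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : encodeInt(1000000)) {
            System.out.printf("%02X ", b & 0xFF); // prints 00 0F 42 40
        }
        System.out.println();
    }
}
```

The same DataOutputStream code would work unchanged if the wrapped stream were a FileOutputStream or a socket's output stream.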
So the example's output file will contain the 10,000 numbers from 1000000 up to (but not including) 1010000, and the output size will be 40,000 bytes because each int takes four bytes.
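If we created a text file with the same data, one number per line, it would take about 90,000 bytes with Windows line endings (seven digit characters plus \r\n per number), or 80,000 bytes with a single \n.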
Writing numeric data in binary form is generally faster to write and read than text, and takes much less space.
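The size difference can be checked with a small sketch. It assumes the numbers run from 1000000 up to (but not including) 1010000 and that the text version uses one number per line ending in \n; it measures sizes in memory rather than creating files.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SizeComparison {
    // Size in bytes when each number in [from, to) is written with writeInt.
    static int binarySize(int from, int to) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = from; i < to; i++) {
                out.writeInt(i);
            }
        }
        return buffer.size();
    }

    // Size in bytes when each number is written as decimal text followed by '\n'.
    static int textSize(int from, int to) {
        StringBuilder text = new StringBuilder();
        for (int i = from; i < to; i++) {
            text.append(i).append('\n');
        }
        return text.toString().getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(binarySize(1000000, 1010000)); // 40000
        System.out.println(textSize(1000000, 1010000));   // 80000
    }
}
```

The binary size depends only on how many numbers there are, while the text size grows with the number of digits in each value.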
In this example, we will save and read the data of students using DataOutputStream and DataInputStream.
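One possible shape for such a program is sketched below. The record layout (name, id, GPA), the sample values, and the file name students.bin are all assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class StudentDataExample {
    // Hypothetical record layout: UTF-encoded name, int id, double GPA.
    static void save(String path, String[] names, int[] ids, double[] gpas) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(names.length); // record count first, so the reader knows when to stop
            for (int i = 0; i < names.length; i++) {
                out.writeUTF(names[i]);
                out.writeInt(ids[i]);
                out.writeDouble(gpas[i]);
            }
        }
    }

    // Reads the fields back in exactly the order they were written.
    static String load(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                sb.append(in.readUTF()).append(' ')
                  .append(in.readInt()).append(' ')
                  .append(in.readDouble()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        save("students.bin", new String[] {"Ali", "Sara"}, new int[] {1, 2}, new double[] {3.5, 3.9});
        System.out.print(load("students.bin"));
    }
}
```

The reader must call the read methods in the same order and with the same types as the writer, because the file itself stores no field names or type information.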
In this section, we will use ObjectOutputStream and ObjectInputStream, classes that automatically serialize and deserialize objects. They make the previous example trivial, and they help a lot whenever you want to save the data of an object.
A serializable class must implement the Serializable interface, a marker interface with no methods that tells the output stream the class may be serialized.
In this example, we will recreate the previous example using object streams.
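A sketch of the object-stream version could look like this. The SerializableStudent class, its fields, and the file name students.ser are assumptions made for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical student record; the fields are assumptions for illustration.
class SerializableStudent implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int id;
    final double gpa;

    SerializableStudent(String name, int id, double gpa) {
        this.name = name;
        this.id = id;
        this.gpa = gpa;
    }
}

class ObjectStreamExample {
    static void save(String path, List<SerializableStudent> students) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            // ArrayList is itself Serializable, so the whole list is written in one call.
            out.writeObject(new ArrayList<>(students));
        }
    }

    @SuppressWarnings("unchecked")
    static List<SerializableStudent> load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (List<SerializableStudent>) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        List<SerializableStudent> students = new ArrayList<>();
        students.add(new SerializableStudent("Ali", 1, 3.5));
        students.add(new SerializableStudent("Sara", 2, 3.9));
        save("students.ser", students);
        for (SerializableStudent s : load("students.ser")) {
            System.out.println(s.name + " " + s.id + " " + s.gpa);
        }
    }
}
```

Compared with the data-stream approach, there is no field-by-field code at all; the trade-off is that the file format is tied to the Java class definition.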
Notice that in the previous examples we accessed the data sequentially, but sometimes we need random access, which means reading or writing at whatever position in the file we want.
Notice that RandomAccessFile is not a stream: it extends neither InputStream nor OutputStream.
Notice that the constructor of RandomAccessFile takes the file path and a mode. The mode specifies whether to open the file read-only ("r") or for reading and writing ("rw"); there is no write-only mode.
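The two modes can be sketched as follows. This is a separate illustration with its own hypothetical file, random.bin, and arbitrary byte values; it uses the seek method to jump to a byte offset.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessExample {
    // Reads the single byte stored at the given offset, without reading what comes before it.
    static int byteAt(String path, long offset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) { // "r" = read-only
            file.seek(offset);  // move the file pointer to the offset
            return file.read(); // 0..255, or -1 if the offset is past the end
        }
    }

    public static void main(String[] args) throws IOException {
        // "rw" opens for reading and writing, creating the file if it does not exist.
        try (RandomAccessFile file = new RandomAccessFile("random.bin", "rw")) {
            file.setLength(0);                       // start from an empty file
            file.write(new byte[] {10, 20, 30, 40});
            file.seek(1);                            // jump back to offset 1
            file.write(99);                          // overwrite 20 with 99
        }
        System.out.println(byteAt("random.bin", 1)); // 99
        System.out.println(byteAt("random.bin", 3)); // 40
    }
}
```

Opening a file that does not exist in "r" mode throws FileNotFoundException, whereas "rw" creates it.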
Please take a look at the documentation:
https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html
Notice that we are reading the output file of Example 1.
The seek method moves the file pointer to a given byte offset, measured from the beginning of the file.
Q: What should be the output of this program?
In this task, you should create a program that takes a file name as an argument and prints a hex dump of that file. Here is an example from my solution:
Task 1 Expected Output
In this task, you should create a data recovery program, which recovers files from a hard disk after they have been deleted. Such programs search for a file signature, a byte sequence that identifies a file type, and then extract the file from the raw disk data.
Take a look here: https://en.wikipedia.org/wiki/List_of_file_signatures
Your program should take a file as input (to avoid destroying your hard drive, you will read a dump file rather than the disk itself) and output the files it finds.
The program should extract PDF files. A PDF file always starts with these bytes (the ASCII text %PDF):
25 50 44 46
And ends with these bytes (the ASCII text %%EOF):
25 25 45 4F 46
Your program should search for this pattern and extract the PDF files from this dump file:
https://drive.google.com/file/d/1BaPR6Qg6_Xwjt7C7uQ4Um93bgqO8fqY8/view?usp=sharing
This dump file contains 2 PDF files, so your program should output two files!
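As a starting hint rather than a full solution, the core of the search is finding a byte pattern inside a byte array. The helper below is a simple sketch of that one step; the class name, method name, and sample data are made up for illustration.

```java
class PatternSearch {
    // Returns the first index >= from where pattern occurs in data, or -1 if it does not occur.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] pdfStart = {0x25, 0x50, 0x44, 0x46};                 // "%PDF"
        byte[] data = {0x00, 0x11, 0x25, 0x50, 0x44, 0x46, 0x2D};   // made-up sample bytes
        System.out.println(indexOf(data, pdfStart, 0)); // 2
        System.out.println(indexOf(data, pdfStart, 3)); // -1
    }
}
```

Passing the previous match position plus one as the from argument lets you continue the search for further occurrences in the same dump.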