## What is an image?
An image (for our purposes) is a sequence of bytes which represents a grid of pixels and associated metadata (type of camera, colorspace, etc). Images can be **lossless**, which means they preserve the original data exactly, or **lossy**, which means they discard some detail to make the file smaller. Since image forensics is about finding data hidden in an image, and lossy compression tends to destroy such data, challenges will typically (but not always) involve a lossless image.
### Vector vs Raster
Most "images" that you see are **raster** images, which means they are composed of pixels. This type of image will often look "grainy" when resized to a larger size, and if you zoom in enough, you will be able to see individual pixels. **Vector** images, on the other hand, are composed of a sequence of lines and curves, and thus can be resized arbitrarily without losing quality. Vector images are useful when the thing being displayed is mainly monochrome and not composed of complex colors (like a logo), while raster images are better for displaying highly detailed content (like photographs).

### Image formats
There are a *ton* of image formats, but two of them are most common: PNG and JPEG. Most other image formats are either not well supported or have other disadvantages. JPEG files are lossy, sacrificing quality for file size, which means that they are typically far smaller than PNG files, which are lossless and therefore larger. We will mainly focus on PNG files, since their structure is extremely simple.
## Metadata
Almost all images contain metadata, which is any extra information other than what gets rendered on your screen. Often, challenges will store some information in the metadata of a file, so it is important to be able to view metadata. Thankfully, many image formats use a common format, called **EXIF**, to store this information, while others (PNG included) also have their own mechanisms such as text chunks. The **exiftool** command understands most of them and can be used to see and modify the metadata of an image:
```
> exiftool steamroller.png
ExifTool Version Number : 12.33
File Name : steamroller.png
File Type : PNG
File Type Extension : png
MIME Type : image/png
Image Width : 960
Image Height : 639
Bit Depth : 8
...
Title : 124663936
Creator : Dmitry Kalinovsky
Creator Tool : Adobe Photoshop CS5 Macintosh
Image Size : 960x639
Megapixels : 0.613
```
## File signatures
Almost every type of file can be identified by a short sequence of bytes at the start. This is how the `file` utility works, and is also how we can identify an image. A list of signatures is available [here](https://www.garykessler.net/library/file_sigs.html). The extension in the name of a file (.png, .jpg) is not necessarily accurate, but we can check the signature to determine what the file's actual type probably is. If the signature is not in the list, you can check if it closely matches another one.
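As a rough illustration of the idea, the sketch below compares the first bytes of some data against a small signature table. The table is deliberately incomplete, and the names `SIGNATURES`, `identify`, and `identify_file` are made up for this example. The real `file` utility uses a far larger database.

```python
# A tiny, incomplete table of file signatures ("magic bytes").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"PK\x03\x04": "zip",
}

def identify(data: bytes) -> str:
    """Return the type whose signature matches the start of data."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(16))

# The extension is irrelevant here: only the leading bytes matter.
print(identify(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(identify(b"\xff\xd8\xff\xe0" + b"\x00" * 8))
```

If `identify` returns `unknown` for a file that is supposed to be an image, comparing its first bytes by eye against the closest signature is often the next step.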
## PNG files
A PNG file is composed of an 8-byte signature followed by chunks, with some mandatory and some optional. Chunks are a "container" around some sort of data, with different types of chunks to store different data. The structure of a chunk is defined as follows:
```
Length: 4 bytes in big-endian order specifying the length of the chunk's data (excluding type/checksum)
Chunk type: 4 bytes of ASCII text specifying how to interpret the data
Data: the information which the chunk contains
CRC: 4 bytes which store the CRC32 of the chunk type + data to verify the data's integrity
```
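The layout above is simple enough to walk by hand. The following is a minimal sketch using only the standard library, not a full PNG parser. The helper names `make_chunk` and `iter_chunks` are invented for this example, and the demo image is a hand-built 1x1 grayscale PNG rather than a real challenge file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Serialize one chunk: length, type, data, CRC32(type + data)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk up to and including IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("bad PNG signature")
    offset = len(PNG_SIGNATURE)
    while offset + 12 <= len(png):
        (length,) = struct.unpack(">I", png[offset:offset + 4])
        end = offset + 12 + length  # 4 length + 4 type + data + 4 CRC
        if end > len(png):
            raise ValueError(f"chunk at offset {offset:#x} runs past end of file")
        chunk_type = png[offset + 4:offset + 8]
        data = png[offset + 8:end - 4]
        (stored_crc,) = struct.unpack(">I", png[end - 4:end])
        crc_ok = zlib.crc32(chunk_type + data) == stored_crc
        yield chunk_type.decode("latin-1"), data, crc_ok
        if chunk_type == b"IEND":
            break
        offset = end

# Demo: a hand-built 1x1, 8-bit grayscale PNG.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x7f")  # filter byte 0, then one gray pixel
demo_png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND"))

for name, data, ok in iter_chunks(demo_png):
    print(name, len(data), "CRC ok" if ok else "CRC MISMATCH")
```

To try it on a real file, read the file in binary mode and pass the bytes to `iter_chunks`.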
Three chunks are required for every PNG: IHDR (specifies information about color type/pixel information/image layout), IDAT (stores all pixels, compressed using zlib), and IEND (a chunk with no data that confirms the image is over). Anything after an IEND chunk is ignored. If any chunks fail to be read correctly, or the file signature is incorrect, then an image utility will throw an error.

### IHDR
The IHDR stores the following data:
```
Width: 4 bytes
Height: 4 bytes
Bit depth: 1 byte
Color type: 1 byte
Compression method: 1 byte
Filter method: 1 byte
Interlace method: 1 byte
```
The most important fields are width, height, color type, and bit depth. Compression method and filter method are always 0 in a valid PNG, and interlace method is either 0 (none) or 1 (Adam7). Width and height store the dimensions of the image in pixels, and sometimes will need to be modified. Color type stores information about the pixels of the image, such as whether they allow transparency, use a palette (the PLTE chunk), and use greyscale or color. Bit depth gives the number of bits per sample (or per palette index); this value will typically be 8. The allowed combinations are:
```
Color Allowed Interpretation
Type Bit Depths
0 1,2,4,8,16 Each pixel is a grayscale sample.
2 8,16 Each pixel is an R,G,B triple.
3 1,2,4,8 Each pixel is a palette index;
a PLTE chunk must appear.
4 8,16 Each pixel is a grayscale sample,
followed by an alpha sample.
6 8,16 Each pixel is an R,G,B triple,
followed by an alpha sample.
```
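The 13 bytes of IHDR data map directly onto a `struct` format string. The sketch below decodes them and checks the bit depth against the table above. The function name `parse_ihdr` and the dictionary keys are choices made for this example, and the sample values are taken from the pngcheck output shown later in this guide.

```python
import struct

COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette",
               4: "grayscale+alpha", 6: "RGBA"}
ALLOWED_DEPTHS = {0: {1, 2, 4, 8, 16}, 2: {8, 16}, 3: {1, 2, 4, 8},
                  4: {8, 16}, 6: {8, 16}}

def parse_ihdr(data: bytes) -> dict:
    """Decode the 13-byte IHDR payload (big-endian fields)."""
    if len(data) != 13:
        raise ValueError("IHDR data must be exactly 13 bytes")
    width, height, depth, color, comp, filt, interlace = struct.unpack(">IIBBBBB", data)
    return {
        "width": width,
        "height": height,
        "bit_depth": depth,
        "color_type": COLOR_TYPES.get(color, f"invalid ({color})"),
        "compression": comp,
        "filter": filt,
        "interlace": interlace,
        # True only if this color type / bit depth pair appears in the table.
        "valid_depth": depth in ALLOWED_DEPTHS.get(color, set()),
    }

# Example payload: 1366 x 770, 8-bit palette, non-interlaced.
sample = struct.pack(">IIBBBBB", 1366, 770, 8, 3, 0, 0, 0)
for key, value in parse_ihdr(sample).items():
    print(f"{key}: {value}")
```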
### IDAT
This chunk stores the actual pixels of the image after filtering and compression using zlib. Feel free to research this on your own, but it is often not particularly important to know how the data is stored. Instead, you can use a library such as [Pillow](https://pypi.org/project/Pillow/) to load the pixel data rather than decoding it manually.
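For the curious, the manual route is less mysterious than it sounds. The data of all IDAT chunks, concatenated, forms one zlib stream. Once decompressed, each row of pixels is preceded by a single filter-type byte. The sketch below assumes a non-interlaced image and does not undo the filters. `unpack_scanlines` is a name invented here, and the demo data is synthetic.

```python
import zlib

def unpack_scanlines(idat_parts, width, height, channels=3, bit_depth=8):
    """Decompress concatenated IDAT data into (filter_type, row_bytes) pairs.

    Assumes a non-interlaced image. The row bytes are still filtered; a
    filter type of 0 means they are the raw samples.
    """
    raw = zlib.decompress(b"".join(idat_parts))
    row_len = (width * channels * bit_depth + 7) // 8
    stride = 1 + row_len  # one filter byte in front of every row
    if len(raw) != stride * height:
        raise ValueError(f"expected {stride * height} bytes, got {len(raw)}")
    return [(raw[y * stride], raw[y * stride + 1:(y + 1) * stride])
            for y in range(height)]

# Demo: a 2x2 RGB image, unfiltered, split across two IDAT chunks.
rows = [bytes([255, 0, 0, 0, 255, 0]), bytes([0, 0, 255, 255, 255, 255])]
stream = zlib.compress(b"".join(b"\x00" + row for row in rows))
idat_parts = [stream[:5], stream[5:]]

for filter_type, row in unpack_scanlines(idat_parts, width=2, height=2):
    print(filter_type, list(row))
```

A length mismatch here is a useful hint in its own right: it can mean the width or height in IHDR does not match the pixel data that is actually stored.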
### IEND
This chunk is static, storing no information. It always appears as follows:
```
00 00 00 00 49 45 4e 44 ae 42 60 82
```
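The bytes are always the same because the length is zero and the CRC covers only the four type bytes. A short check, with `build_chunk` as a helper name invented for this example:

```python
import struct
import zlib

def build_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Length (big-endian) + type + data + CRC32 over type and data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

iend = build_chunk(b"IEND")
print(iend.hex(" "))  # 00 00 00 00 49 45 4e 44 ae 42 60 82
```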
For more exact information about PNG files and chunk types, see the [PNG specification](http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html), especially the [list of chunks](http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html).
## Solving challenges
The most important thing to know about image forensics is that jumping straight into trying various tools will lead to you being very frustrated at challenges. Instead, look for irregularities in the image before assuming it uses a specific technique.
### Checking validity
When you first download a PNG file from a forensics challenge, you should first check that it is a valid, working file. You can do this using a tool called **pngcheck**:
```
# check test.png for validity in verbose mode
> pngcheck -v test.png
File: test.png (8760 bytes)
chunk IHDR at offset 0x0000c, length 13
1366 x 770 image, 8-bit palette, non-interlaced
chunk PLTE at offset 0x00025, length 120: 40 palette entries
chunk IDAT at offset 0x000a9, length 8571
zlib: deflated, 32K window, maximum compression
chunk IEND at offset 0x02230, length 0
No errors detected in test.png (4 chunks, 99.2% compression).
```
If the image is not valid, pngcheck will tell you exactly what is wrong:
```
> pngcheck -v test_broken.png
File: test_broken.png (8760 bytes)
chunk IHDR at offset 0x0000c, length 13
1366 x 770 image, 8-bit palette, non-interlaced
CRC error in chunk IHDR (computed b5286dfb, expected 00000000)
ERRORS DETECTED in test_broken.png
```
These errors can often be fixed by editing the bytes of the file at the location of the error. I use the tool **hexedit**, but VSCode also has an extension to add a hex editor.
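For CRC errors specifically, the fix can also be scripted. The sketch below recomputes every chunk's CRC and writes it back. It assumes the chunk lengths are intact and that the data is what you want to keep. A mismatch can equally mean that the data (for example the width or height) was altered and the stored CRC is the trustworthy part, so treat this as one possible repair. The names `make_chunk` and `fix_crcs` are invented for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Build a valid chunk (used only to create demo data)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def fix_crcs(png: bytes) -> bytes:
    """Return a copy of png with every chunk CRC recomputed from type + data."""
    out = bytearray(png)
    offset = len(PNG_SIGNATURE)
    while offset + 12 <= len(out):
        (length,) = struct.unpack(">I", out[offset:offset + 4])
        crc_pos = offset + 8 + length
        if crc_pos + 4 > len(out):
            break  # length field is damaged; CRC patching cannot help here
        crc = zlib.crc32(out[offset + 4:crc_pos])
        out[crc_pos:crc_pos + 4] = struct.pack(">I", crc)
        if out[offset + 4:offset + 8] == b"IEND":
            break
        offset = crc_pos + 4
    return bytes(out)

# Demo: zero out the IHDR CRC of a tiny hand-built PNG, then repair it.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
good = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x7f")) + make_chunk(b"IEND"))
broken = good[:29] + b"\x00\x00\x00\x00" + good[33:]  # IHDR CRC sits at bytes 29..32

fixed = fix_crcs(broken)
print("repaired:", fixed == good)
```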
### Metadata
Sometimes secret information will be hidden in the file's metadata. If anything in the output of exiftool looks odd, consider whether that data is important to the challenge.
### Steganography
Typically, a pixel is stored as a triple of (red, green, blue) values, each from 0 to 255, representing how "on" each color should be. If you embed data in the last bit of some color in each pixel, it will barely affect the image while allowing you to store data:
```
(100, 200, 150)
0b1100100 0b11001000 0b10010110
|
v
0b1100101 0b11001000 0b10010110
(101, 200, 150)
```
In this example, we used the last bit of the red channel of the image to hide information. You can store it in other bits, but it becomes more and more obvious the higher the bit you choose: a message encoded in bit 0 changes each value by at most 1, while the same message in bit 7 can change it by 128.

Because the least significant bit hides the data better, the technique is typically called **least-significant-bit steganography**. To check for this, you can use **zsteg** to check for steganography in all planes:
```
> zsteg -a image.png
...
b7p,bgr,msb,xy,prime.. file: OpenPGP Public Key
b8,g,lsb,xy,prime .. file: OpenPGP Public Key
b8,b,lsb,xy,prime .. text: "lfltfksqotveigkdf_ezunu"
b8,rgb,msb,xy,prime .. text: ".Ku~K\rA+-"
b1,r,msb,yx .. text: "Soon after his succession, probably in 1058, Guiscard separated from his wife Alberada because they were related within the prohibited degrees. Shortly after, he married Sichelgaita, the sister of Gisulf II of Salerno, Gu"
...
```
Then, you can look through the results for any legible text.
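To make the bit manipulation concrete, here is a minimal sketch that hides a message in the lowest bit of the red channel and reads it back. It works on a plain list of `(r, g, b)` tuples so it needs no image library. The function names, the choice of the red channel, and the MSB-first bit order are all assumptions of this example. Real challenges vary the channel, bit order, and pixel order, which is why zsteg tries so many combinations.

```python
def embed_lsb(pixels, message: bytes):
    """Hide message bits (most significant bit first) in the red LSBs."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        r, g, b = out[i]
        out[i] = ((r & 0xFE) | bit, g, b)  # clear bit 0, then set it to the message bit
    return out

def extract_lsb(pixels, n_bytes: int) -> bytes:
    """Read n_bytes back out of the red LSBs."""
    bits = [r & 1 for r, _, _ in pixels[:n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

cover = [(100, 200, 150)] * 16      # a flat 16-pixel "image"
stego = embed_lsb(cover, b"hi")

print(extract_lsb(stego, 2))
print(max(abs(a[0] - b[0]) for a, b in zip(cover, stego)))  # largest change to any red value
```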
## Appending after end
For many image formats, especially PNG, there is a clear indication of when the file ends. Sometimes, challenges will have images where data is tacked on to an image after the actual image has ended. For PNG images, pngcheck will let you know if this is the case, and you can extract any resulting files via **binwalk**:
```
> binwalk --dd='.*' test_appended.png
```
Note that out of the files it "finds," one will be the zlib-compressed image data, which is a bit of a false positive.
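For a quick look without binwalk, you can also cut the file at the end of the IEND chunk yourself. The sketch below searches for the fixed 12-byte IEND chunk and returns whatever follows it. It is a shortcut that assumes the first occurrence of those bytes is the real IEND; walking the chunks one by one is more robust. `split_after_iend` is a name invented for this example, and the demo bytes are synthetic.

```python
IEND_CHUNK = bytes.fromhex("0000000049454e44ae426082")

def split_after_iend(blob: bytes):
    """Return (image_bytes, trailing_bytes), split right after the IEND chunk."""
    idx = blob.find(IEND_CHUNK)
    if idx == -1:
        raise ValueError("no IEND chunk found")
    cut = idx + len(IEND_CHUNK)
    return blob[:cut], blob[cut:]

# Demo: a fake "PNG" followed by the start of a ZIP archive.
demo = (b"\x89PNG\r\n\x1a\n" + b"<chunks would go here>"
        + IEND_CHUNK + b"PK\x03\x04 appended archive")
image, extra = split_after_iend(demo)
print(len(extra), extra[:4])
```

The first bytes of the trailing data can then be checked against the file signature list to work out what was appended.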
## Corrupted images
Challenges will sometimes give you an image that does not open in any viewing software. To do this type of challenge, **use pngcheck judiciously**. Read what it complains about and fix the problem, usually with a hex editor. This will save you *lots* of time. Randomly editing things until the file opens is a waste of energy.
## Know thy enemy
The best way to learn forensics is to actually *understand* the files you are looking at. I highly recommend [this blog post](https://www.da.vidbuchanan.co.uk/blog/hello-png.html) if you want to understand how and why PNG files work, and [this one](https://parametric.press/issue-01/unraveling-the-jpeg) for JPEG files.
## Resources
- [PNG specification](http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html)
- [File signatures list](https://www.garykessler.net/library/file_sigs.html)