# Traboda

## File Structure Of Some Commonly Used Files

### PNG

PNG is a raster image file format that supports lossless compression.

### PNG File Signature

Magic Number -> **89 50 4E 47 0D 0A 1A 0A**

![](https://i.imgur.com/QzznMyD.png)

As we have already seen, there are two types of chunks in a PNG image - Critical Chunks and Ancillary Chunks.

### Chunk Layout

Each chunk consists of four parts:

**Length**

A 4-byte unsigned integer giving the number of bytes in the chunk's data field. The length counts only the data field, not itself, the chunk type code, or the CRC. Zero is a valid length. Although encoders and decoders should treat the length as unsigned, its value must not exceed (2^31)-1 bytes.

**Chunk Type**

A 4-byte chunk type code. For convenience in description and in examining PNG files, type codes are restricted to consist of uppercase and lowercase ASCII letters (A-Z and a-z, or 65-90 and 97-122 decimal). However, encoders and decoders must treat the codes as fixed binary values, not character strings. For example, it would not be correct to represent the type code IDAT by the EBCDIC equivalents of those letters. Additional naming conventions for chunk types are discussed in the next section.

**Chunk Data**

The data bytes appropriate to the chunk type, if any. This field can be of zero length.

**CRC**

A 4-byte CRC (Cyclic Redundancy Check) calculated on the preceding bytes in the chunk, including the chunk type code and chunk data fields, but not including the length field. The CRC is always present, even for chunks containing no data. See CRC algorithm.

The chunk data length can be any number of bytes up to the maximum; therefore, implementors cannot assume that chunks are aligned on any boundaries larger than bytes.

### Critical Chunks

A valid PNG image must contain an IHDR chunk, one or more IDAT chunks, and an IEND chunk.

**IHDR** -> Describes the image dimensions, colour type, bit depth, etc.
It must always be the first chunk.

**PLTE** -> Contains the colour palette (the list of colours); required only for indexed-colour images.

**IDAT** -> Contains the image data.

**IEND** -> Marks the end of the image.

### Ancillary Chunks

Ancillary chunks are optional chunks. A decoder can safely ignore any ancillary chunk it does not recognize.

**bKGD** -> Gives the default background colour.

**dSIG** -> Stores the digital signature of the image.

**pHYs** -> Holds the intended pixel size or aspect ratio of the image.

### CRC algorithm

Chunk CRCs are calculated using standard CRC methods with pre and post conditioning, as defined by ISO 3309 [ISO-3309] or ITU-T V.42 [ITU-V42]. The CRC polynomial employed is

x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1

The 32-bit CRC register is initialized to all 1's, and then the data from each byte is processed from the least significant bit (1) to the most significant bit (128). After all the data bytes are processed, the CRC register is inverted (its ones complement is taken). This value is transmitted (stored in the file) MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x^31 term.

### JPG/JPEG

JPG or JPEG is an image file format that supports lossy compression.

Magic Number -> **FF D8 FF E0 00 10 4A 46 49 46 00 01** (JFIF variant; every JPEG file starts with FF D8)

![](https://i.imgur.com/ZwQCWIS.png)

### File Structure

A JPEG image is represented as a sequence of segments, where each segment begins with a marker. Each marker starts with a 0xFF byte followed by a marker byte that identifies the type of marker. The payload that follows a marker depends on the marker type. Common JPEG marker types are listed below:

![](https://i.imgur.com/nkvkh2f.png)

### ZIP

ZIP is a file format that supports lossless data compression. It compresses one or more files using one of a number of compression algorithms.
DEFLATE is the most commonly used compression algorithm. Zip files have the file extension .zip or .ZIP.

Magic Number -> **50 4B 03 04** and **50 4B 05 06** (for empty zip files)

![](https://i.imgur.com/KlnIw0j.png)

### ZIP File Format

Each Zip file is structured in the following manner:

![](https://i.imgur.com/vuNdpGK.png)

The ZIP file format stores a 32-bit CRC (CRC-32) for each file so that readers can verify the integrity of the extracted data.

A ZIP archive holds a directory at its end that records every contained file and its location in the archive. This directory carries the information a reader needs to locate and extract the compressed files, so ZIP readers can load the list of files without reading the entire archive. The format stores each file's metadata twice, once in its local file header and once in the directory, which provides greater protection against loss of data.

Each file in a ZIP archive is represented as an individual entry, where each entry consists of a Local File Header followed by the compressed file data. The Directory at the end of the archive holds the references to all these file entries.

ZIP readers should not build the file listing from the local file headers; the listing should always be read from the Directory. The Directory is the only source of valid file entries, because files can be appended to the end of the archive and older entries can be dropped from the Directory while their data stays in place. A reader that scans local headers from the beginning of the archive may therefore pick up invalid (deleted) entries that are no longer part of the Directory.
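
To make the role of the Directory more concrete, here is a minimal Python sketch that finds the End of Central Directory record (signature 50 4B 05 06) by searching backwards from the end of the archive, then walks the directory entries it points to. The helper names `list_central_directory` and `build_demo_archive` are made up for this illustration. The sketch assumes a plain single-disk archive without ZIP64 extensions and without the signature bytes appearing inside the archive comment; a robust reader would need to handle those cases.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # 50 4B 05 06: End of Central Directory record
CDFH_SIG = b"PK\x01\x02"  # 50 4B 01 02: central directory file header

# Fixed-size parts of both records, little-endian (22 and 46 bytes).
EOCD_FMT = "<4sHHHHIIH"
CDFH_FMT = "<4sHHHHHHIIIHHHHHII"
CDFH_SIZE = struct.calcsize(CDFH_FMT)


def list_central_directory(data: bytes):
    """Return (filename, crc32, local_header_offset) for each directory entry.

    Simplified: no ZIP64, no multi-disk archives, and the last occurrence
    of the EOCD signature is assumed to be the real record.
    """
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("End of Central Directory record not found")

    fields = struct.unpack_from(EOCD_FMT, data, eocd_pos)
    total_entries, cd_offset = fields[4], fields[6]

    entries = []
    pos = cd_offset
    for _ in range(total_entries):
        header = struct.unpack_from(CDFH_FMT, data, pos)
        if header[0] != CDFH_SIG:
            raise ValueError("bad central directory signature at %d" % pos)
        crc32 = header[7]
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        local_offset = header[16]

        name_start = pos + CDFH_SIZE
        name = data[name_start:name_start + name_len].decode("utf-8", "replace")
        entries.append((name, crc32, local_offset))

        # Variable-length name, extra field and comment follow the fixed part.
        pos = name_start + name_len + extra_len + comment_len
    return entries


def build_demo_archive() -> bytes:
    """Create a tiny in-memory archive so the sketch has something to read."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("b.txt", "world")
    return buf.getvalue()


if __name__ == "__main__":
    for name, crc32, offset in list_central_directory(build_demo_archive()):
        print(f"{name}: crc32={crc32:08x}, local header at offset {offset}")
```

Note that the sketch never reads a Local File Header: the names, CRCs and offsets all come from the Directory, which is why stale data left earlier in the archive does not show up in the listing.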