# 3rd Year Project Meetings (October, November)
## 22 October 2020
The aim of the project is to check whether the Square Root Law holds. Informally, I will verify the relation between the cover size and the maximal undetectable payload size. We suppose that doubling the cover size allows only $\sqrt{2}$ times more payload. Below you can find a diagram of the whole pipeline.
```mermaid
graph LR;
Cover(Cover .pgm) -- HILL --> Costs(Costs)
Cover -- my code --> Stego(Stego .pgm)
Costs(Costs .costs) -- my code --> Stego
MS(Payload size) --> Stego
Cover -- SRMQ1 --> CF(Cover features .fea)
Stego -- SRMQ1 --> SF(Stego features .fea)
CF --> LC
SF --> LC(Low complexity linear classifier)
```
### Technical details
- work with .pgm files
- use covers of different sizes
- repeat the pipeline 1-10k times
- just simulate hiding
- helpful link http://dde.binghamton.edu/download/
### Things to remember
- keep notes of any experiment
- include how data was stored
- include time spent executing (get credit for your work)
- hypothesis, design, experiment, conclusions
### From the Advanced Security Lecture Notes
- Embedding and extraction functions ($C$ covers, $K$ keys, $M$ messages):
$Emb : C \times K \times M \to C, Ext : C \times K \to M$
- Types of errors
  - **Type I, false positive, $\alpha$** = the Warden detects steganography where none is present
- **Type II, false negative, $\beta$** = Warden failed to detect steganography
- decision function $D : C → \{Positive, Negative\}$
- **embedding efficiency** = # bits hidden / average # changes needed
- **capacity** = payload / size (e.g. 3 bits per pixel)
- false/true positive/negative table:
| D | Positive | Negative |
|:--------:|:--------:|:--------: |
| Carrier | $1-\beta$ | $\beta$ |
| Innocent| $\alpha$ | $1 - \alpha$ |
- when $\alpha$ increases, $1-\beta$ also increases
### Finding the right probabilities
Suppose that with each pixel $i$ in the image we associate a cost $d_i$, and we want to find change probabilities $p_i$ such that the total entropy is at least *m* (the payload size in bits) while the distortion is minimised. More precisely, we want to
$$ \min_p\ p^T d $$
$$ \text{subject to } \sum_{i}H(p_i) = m $$
It can be easily shown that the solution to this minimisation problem is
$$ p_i = \frac{1}{1 + e^{\lambda d_i}} $$
for some $\lambda$. Moreover, note that $p_i$ is strictly decreasing as $\lambda$ increases, and $p_i = \frac{1}{2}$ for $\lambda = 0$. $H(x)$ is strictly increasing for $x < 1/2$ and strictly decreasing for $x > 1/2$. Therefore, our condition $T = \sum_{i}H(p_i) - m$ is strictly increasing on $(-\infty, 0]$ and strictly decreasing on $[0, \infty)$. The maximum of $T$ is reached for $\lambda = 0$ with the value $n - m$, where $n$ is the total number of bits/pixels that can be changed. Therefore, we can find two solutions of $T = 0$: $\lambda_1 < 0$ and $\lambda_2 > 0$. We want to minimise the change probabilities, so $\lambda_2$ is the solution we are looking for. Therefore, I suggest the following algorithm:
1. Set $l = 0$ and $r = 1$.
2. While $T > 0$ for $\lambda = r$, do $r \leftarrow c r$ for a constant $c > 1$.
3. While $r - l > \varepsilon$ do
   - set $mid \leftarrow \frac{l + r}{2}$
   - if $T > 0$ for $\lambda = mid$ then $l \leftarrow mid$
   - otherwise $r \leftarrow mid$
4. Return $\lambda = \frac{l + r}{2}$
Alternatively, we can use Newton's method for faster convergence, but we need to make sure that we don't find the negative solution by mistake. **How can we avoid such cases? Is Newton's method better, or might it jump very far from the solution? [Question 1]** I suppose that first running the interval bisection algorithm with $\varepsilon = 0.1$ should be enough to get a safe interval for Newton to converge quadratically. I computed the derivative as
$$ \frac{d}{d\lambda}H(p_i) = -d_i^2 \frac{\lambda e^{\lambda d_i}}{(1 + e^{\lambda d_i})^2} $$
(taking $H$ in nats; divide by $\ln 2$ for bits).
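Steps 1-4 above can be sketched in Python as follows. This is a minimal sketch: the helper names are mine, and I take the entropy $H$ in bits so the constraint is directly comparable to the payload $m$.

```python
import math

def change_probs(costs, lam):
    # p_i = 1 / (1 + e^(lambda * d_i))
    return [1.0 / (1.0 + math.exp(lam * d)) for d in costs]

def H2(p):
    # binary entropy in bits, with H2(0) = H2(1) = 0
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def total_entropy(costs, lam):
    return sum(H2(p) for p in change_probs(costs, lam))

def find_lambda(costs, m, eps=1e-9, c=2.0):
    # assumes 0 < m < len(costs), so the positive root of T = 0 exists
    l, r = 0.0, 1.0
    # step 2: grow r geometrically until the total entropy drops below m
    while total_entropy(costs, r) > m:
        r *= c
    # step 3: interval bisection on [l, r]
    while r - l > eps:
        mid = (l + r) / 2.0
        if total_entropy(costs, mid) > m:
            l = mid
        else:
            r = mid
    return (l + r) / 2.0
```

For example, with 100 equal-cost pixels and a 50-bit payload, the returned $\lambda$ is positive and makes each $H(p_i)$ equal 0.5 bits.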
### Questions for the next meeting
1. [Question 1] above
**A**: We should use interval bisection and optimise the slow parts - maybe help the compiler to use vector instructions
2. I see that "hill.py" contains the cost generation "HILL(self, cover)" so I suppose that the next step (after generating the costs) is to write a random embedder based on my computed $\lambda$ (which determines my probabilities). Do I understand it correctly?
**A**: That seems correct.
3. I suppose that changing each pixel independently with its own probability is enough. Is that correct?
**A**: Yes. Change them independently based on that probability.
4. What does a change mean? Is it just XOR-ing with 1?
**A**: A change is a random +/-1, except for the boundary when 255 -> 254 and 0 -> 1.
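Putting the last two answers together, the random embedder's per-pixel step can be sketched like this (the function names are mine, for illustration only):

```python
import random

def flip(pixel):
    # a change is a random +/-1, except at the boundaries: 255 -> 254, 0 -> 1
    if pixel == 0:
        return 1
    if pixel == 255:
        return 254
    return pixel + random.choice((-1, 1))

def embed(pixels, probs):
    # change each pixel independently with its own probability p_i
    return [flip(v) if random.random() < p else v
            for v, p in zip(pixels, probs)]
```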
## 02 November 2020
A very short discussion with answers to my previous questions (see above). One nice thing to remember is that the curve of $(\alpha, 1 - \beta)$ points is always concave, because any point on the line connecting two points on the curve can be achieved by randomising between the two corresponding detectors.
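The randomisation argument can be illustrated with a toy sketch (hypothetical extreme detectors, not part of the pipeline):

```python
import random

def mix(det_a, det_b, t):
    # randomised detector: use det_b with probability t, otherwise det_a;
    # its (alpha, 1 - beta) point is the convex combination of the two
    # detectors' points, so the whole segment between them is achievable
    def det(x):
        return det_b(x) if random.random() < t else det_a(x)
    return det

always_negative = lambda x: False  # (alpha, 1 - beta) = (0, 0)
always_positive = lambda x: True   # (alpha, 1 - beta) = (1, 1)

random.seed(0)
d = mix(always_negative, always_positive, 0.3)
# the positive rate is approximately 0.3 on any input distribution
rate = sum(d(None) for _ in range(100_000)) / 100_000
```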
### What I did:
- Connected to the server
- Created a simple image cropper
- Modified the HILL script to generate costs (requires imageio package installed)
### Calling conventions
- Cropper: [width] [height] [source] [destination] [delta row 0-100] [delta column 0-100]
- HILL: --input [source] --output [destination]
- CostDisplay: [source] [destination]
### Current pipeline
```mermaid
graph LR;
Cover(cover.pgm) -- Crop --> Cropped(cropped.pgm)
Cropped -- HILL --> Costs(cover.cost)
Costs -- CostDisplay --> Display(costs.pgm)
```
### Questions for the next meeting:
1. What size should we use for our images? Should we crop multiple images from each one? Are we going to use only 8 bits/pixel?
**A**: We are going to use ~10 sets of images cropped from the initial set. Images in one set will have the same size and roughly 10%, 20%, ..., 100% of the maximum number of pixels. Each pixel will be represented by exactly 8 bits. Crop at least 32 pixels from each edge. Maximum size something like 3k x 2k (a multiple of 8 is preferred). Ratio 3:2.
2. How should I train the linear classifier? Maybe keep 30% of the data for testing? When can I conclude that some payload is easily detectable?
**A**: It is probably enough to use the training error reported by the optimiser. We don't need any threshold but instead evaluate the accuracy for each set with payload $O(1)$, $O(n)$ and $O(\sqrt{n})$.
3. I remember you said something about organising images in folders. What are the constraints in this case?
**A**: Consider the following structure:
```
.
├── size-5MP
├── size-6MP
│ ├── cover
| ├── costs
| ├── stego-128-bits
| ├── stego-256-bits
│ └── stego-512-bits
├── size-7MP
...
```
4. Could you install git, please? I might also need some python packages later. I am currently working on my laptop. Later I will migrate to your server.
**A**: Have a look at how to install python packages without admin rights.
## 09 November 2020
General information about image processing:
- 8-10 image sets cropped from initial set
- crop at least 32 pixels from each side
- each set sized at $x \times 10\%$ of the maximum pixel count (3k x 2k)
- image size multiple of 8 in both directions
- ratio 3:2 approximately
- regularisation is very important!
### Expected result of the first part of the project
- the square root relation between payload and size
- x: size, y: accuracy

### Possible extension
- find the ratio $payload^2/size$ that maintains detectability (payload not only $O(\sqrt{size})$ but $f(image) \cdot \sqrt{size}$, where $f$ must be roughly constant)
- it is probably $\sim \sum_i\frac{1}{c_i}$ or similar (maybe the average?)
- x: guessed ratio, y: detectability
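My current reading of the extension, as a sketch. The cost statistic is only a guess taken from the notes above (sum of reciprocal costs, possibly the average instead), not something confirmed yet:

```python
def detectability_ratio(payload_bits, n_pixels):
    # the quantity that should stay roughly constant at fixed detectability
    # if payload = f(image) * sqrt(size)
    return payload_bits ** 2 / n_pixels

def cost_statistic(costs):
    # candidate for f(image)^2 from the notes: the sum of reciprocal costs
    # (maybe their average instead -- still to be checked experimentally)
    return sum(1.0 / c for c in costs)
```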

### Calling conventions
- Cropper: [width] [height] [source] [destination] [delta row 0-100] [delta column 0-100]
- HILL: --input [source] --output [destination]
- CostDisplay: [source] [destination]
- Stego: [source] [cost] [destination] [diff destination] [payload]
### Current pipeline
```mermaid
graph LR;
Cover(cover.pgm) -- Crop --> Cropped(cropped.pgm)
Cropped -- HILL --> Costs(cover.cost)
Costs -- CostDisplay --> Display(costs.pgm)
Costs --> Stego(Stego)
Cropped -- Random Embedder --> Stego
Payload(Payload) --> Stego
Stego --> Diff(Difference)
Cropped --> Diff
```
### Todo
- search for python install without admin rights
- test git on server
- read about the next pipeline parts
### Questions
1. Did I understand the desired outcome of the project extension correctly? Is that constant valid only when payload ~= sqrt(size), or should it be true in general? Should it also be proportional to the detectability? I am afraid I might have misunderstood what you said last time.
**A**: Have a look at the ADK72B.
2. The download webpage is no longer accessible. However, I got a Windows copy of the SRMQ1 feature extractor from Radostin. Do you have a copy of the Linux version? Is the version relevant in our case?
**A**: The website is working now. ADK also copied the files to my home folder.
3. I think I need some help to understand how to use the feature extractor. There are many groups of ~325 features each. Should I use all of them? What does 325 mean? Is it a constant for each image?
**A**: Each set has a specific convolution matrix. Just concatenate all features and input them into the linear classifier. See [Rich Models for Steganalysis of Digital Images](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.6997&rep=rep1&type=pdf).
4. What does "quantization q=1c" mean?
**A**: It means that we consider only the integral part.
5. All my tools are currently processing one image/run. Should I make them receive a folder of images and process all of them in the same run?
**A**: It would probably be better to process a sequence of images from the folder, not the entire folder in one run.
## 16 November 2020
### Some more details/hints
- The real correlation between payload and size is actually $payload = \sqrt{size} \cdot \log(size)$
- Use "screen" to avoid killing the ssh session (read manual)
### Calling conventions (updated to handle more files at once)
- Cropper
- [width] [height] [source dir] [destination dir] [delta row] [delta col] [regex match]
- ./crop 300 500 ../images/ cropped/ 50 70 .*pgm
- HILL
- --input [source] --output [destination]
- python3 cost.py --input "../images/*" --output /costs
- CostDisplay
- [source dir] [destination dir] [regex match]
- ./costDisplay costs/ display/ .*cost
- Stego
- [source dir] [cost dir] [destination dir] [diff destination dir] [payload bits] [regex match]
- ./stego images/ cost/ res/ diff/ 10000 .*pgm
- SRMQ1 feature extractor
- matlab -r "extract [sourceDir] [destinationDir] [fileMatch]"
- matlab -r "extract '../Pipe/Cropped/' './' '*.pgm'"
### Current pipeline
```mermaid
graph LR;
Cover(Cover .pgm) -- Crop --> Cropped(Cropped .pgm)
Cropped -- HILL --> Costs(Costs .cost)
Costs -- CostDisplay --> Display(CostDisplay .pgm)
Costs --> Stego(Stego .pgm)
Cropped -- Random Embedder --> Stego
Payload(Payload bits) --> Stego
Stego --> Diff(Difference .pgm)
Cropped --> Diff
Cropped -- SRMQ1 --> CFea(Cover Features .fea)
Stego -- SRMQ1 --> SFea(Stego Features .fea)
```
### Questions
1. Are we going to use SRM or SRMQ1? What is the difference? Is SRMQ1 using only quantization 1? Why are the features real numbers when using quantization?
**A**: We will use SRMQ1 (quantization 1). The features are computed as the average of all quantized features of some kind.
2. You can find all my code in the home directory under /3rd-year-project. If you run "make test" it will push ~10 images through the pipe. The results can be found in the folder Pipe/. Could you have a look over the generated files and let me know if you see anything suspicious, please? In particular, the cost display seems to follow some patterns that might come from the calculation and not from the image itself.
**A**: The cost looks suspicious. That might be because of some previous conversion applied to those images.
3. I suppose that I will train one linear classifier for each (image size) set. Is that correct?
**A**: Yes.
4. Should I use the Linear Classifier from the download website (at least for the first part)? http://dde.binghamton.edu/download/LCLSMR/
**A**: Yes.
## 23 November 2020
### About the linear classifier
- Requires
- A training set
- A testing set
- Returns for the testing set
- PFA = False positive rates
- PMD = False negative rates
- PE = Minimum error ((PFA + PMD) / 2)
- There are
- 12753 features for each image
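The returned quantities can be computed from classifier scores like this. A minimal sketch with a hypothetical score convention (positive score = classified as stego); the real classifier's conventions may differ:

```python
def error_rates(cover_scores, stego_scores, threshold=0.0):
    # PFA: fraction of covers classified as stego (false positives)
    pfa = sum(s > threshold for s in cover_scores) / len(cover_scores)
    # PMD: fraction of stego images classified as cover (missed detections)
    pmd = sum(s <= threshold for s in stego_scores) / len(stego_scores)
    # PE: the average of the two error rates
    pe = (pfa + pmd) / 2.0
    return pfa, pmd, pe
```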
### Calling conventions
- Cropper
- [width] [height] [source dir] [destination dir] [delta row] [delta col] [regex match]
- ./crop 300 500 ../images/ cropped/ 50 70 .*pgm
- HILL
- --input [source] --output [destination]
- python3 cost.py --input "../images/*" --output /costs
- CostDisplay
- [source dir] [destination dir] [regex match]
- ./costDisplay costs/ display/ .*cost
- Stego
- [source dir] [cost dir] [destination dir] [diff destination dir] [payload bits] [regex match]
- ./stego images/ cost/ res/ diff/ 10000 .*pgm
- SRMQ1 feature extractor
- matlab -nodisplay < exit -r "extract [sourceDir] [destinationDir] [fileMatch]"
- matlab -nodisplay < exit -r "extract '../Pipe/Cropped/' './' '*.pgm'"
- echo "exit()" > Tools/exit
- Linear Classifier
- matlab -nodisplay < exit -r "classify [sourceDir] [destinationDir] [fileMatch]"
- matlab -nodisplay < exit -r "classify('./CoverFeatures/', './StegoFeatures/', '*.fea')"
### Current pipeline
```mermaid
graph LR;
Cover(Cover .pgm) -- Crop --> Cropped(Cropped .pgm)
Cropped -- HILL --> Costs(Costs .cost)
Costs -- CostDisplay --> Display(CostDisplay .pgm)
Costs --> Stego(Stego .pgm)
Cropped -- Random Embedder --> Stego
Payload(Payload bits) --> Stego
Stego --> Diff(Difference .pgm)
Cropped --> Diff
Cropped -- SRMQ1 --> CFea(Cover Features .fea)
Stego -- SRMQ1 --> SFea(Stego Features .fea)
CFea --> Lin(Linear Classifier)
SFea --> Lin
```
### Experiments
#### Useful commands
- { time make test > results; } 2> time &
- { time make test_classify >> extra; } 2>> time_extra &
- jobs, kill %1, bg %1, fg %1
#### Setup
| Id | Width | Height | Payload | Images |
| - | - | - | - | - |
| 1 | 1024 | 512 | 60000 bits | 001*.pgm (98) |
| 2 | ... | ... | 200000 bits | ... |
| 3 | ... | ... | 30000 bits | ... |
| 4 | ... | ... | 15000 bits | 0[0\|1]*.pgm (1997) |
| 5 | 512 | 512 | 14000 bits | ... |
| 6 | 1024 | 512 | 20000 bits | ... |
| 7 | 1024 | 1024 | 28000 bits | ... |
| 8 | 2048 | 1024 | 40000 bits | ... |
#### Time
| Id | Crop | Cost | Cost Display | Stego | Features | Classifier | Total |
| - | - | - | - | - | - | - | - |
| 1 | 0s | 1m4s | 0s | 2m40s | 3m17s x2 | 7s | 7m |
| 2 | ... | ... | ... | ... | ... | ... | 7m |
| 3 | ... | ... | ... | ... | ... | ... | 7m |
| 4 | 5s | 20m | 5s | 50m | 67m x2 | - | 140m |
| 5 | - | - | - | - | - | 6m | 87m |
| 6 | - | - | - | - | 73m x2 | 6m | - |
| 7 | - | - | - | - | - | 6m | 450m |
| 8 | - | - | - | - | - | 6m | 882m |
#### Results
| Id | Error | Obs |
| - | - | - |
| 1 | 0.0867 | image00046.pgm has fewer bytes than declared in the header |
| 2 | 0.0000 | ... |
| 3 | 0.1071 | ... |
| 4 | 0.1374, 0.2181 x3| 00046 & 01914 broken (!! MUST INSPECT) |
| 5 | 0.2604, 0.1769 x4 | 01851 cropped to white image - my HILL crashes (!! MUST INSPECT) |
| 6 | 0.3345 x4 | ... |
| 7 | 0.3215 x5 | ... |
| 8 | 0.2116 x5 | ... |
### Questions
1. The linear classifier has some hard-coded tolerances (a vector: `Tolerance = 2.^(-0:1:20)*1e-6`). I see that it is used to approximate the solution of systems $Ax = b$. Why is it useful to use multiple tolerances? Should I do the same?
**A**: Inspect it further and understand how the linear classifier works.
2. I suppose that my training set is going to be the same as the testing set. Is that correct?
**A**: Yes.
3. I will probably use 10 sets of images (of different size), each one with 3 or 4 payload sizes. Should I train 30-40 linear classifiers or just 10?
**A**: 30-40 initially, then maybe fewer
4. Are the results above expected?
**A**: Yes-ish.
5. The error is currently calculated as (1/2) * (False_Positive + False_Negative). Is that what we want?
**A**: Just choose one metric and use that.
6. The last test yielded 0.1374 error once and then 0.2181 three times. Is that normal, or did I do something wrong?
**A**: It is not expected. Look inside the linear classifier code. The result should be deterministic.
## 30 November 2020
### Estimating the time and memory requirements
#### Size groups
| Group | Width | Height |
| - | - | - |
| 0.5 MB | 864 | 576 |
| 1.0 MB | 1248 | 832 |
| 1.5 MB | 1536 | 1024 |
| 2.0 MB | 1752 | 1168 |
| 2.5 MB | 1968 | 1312 |
| 3.0 MB | 2160 | 1440 |
| 3.5 MB | 2328 | 1552 |
| 4.0 MB | 2496 | 1664 |
| 4.5 MB | 2640 | 1760 |
| 5.0 MB | 2784 | 1856 |
#### Size and time estimation
| Stage | Crop | Cost | Cover Features | Stego | Stego Features | Total |
| - | - | - | - | - | - | - |
| Image count | 100,000 | 100,000 | 100,000 | 300,000 | 300,000 | |
| Average size | 3MB | 6MB | 0.15MB | 3MB | 0.15MB | |
| Total size in | 300GB | 300GB | 300GB | 900GB | 900GB | |
| Total size out | 300GB | 600GB | 15GB | 900GB | 45GB | 1.8TB |
| Time/GB in | - | 24m | 68m | 55m | 68m | |
| Time/GB out | - | 12m | 1360m | 55m | 1360m | |
| Time/Image | 0.0125s | - | - | - | - | |
| Total time | 20m | 120h | 340h | 825h | 1020h | 2325h = 97 days |
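As a sanity check, the per-stage totals follow from input size times time per input GB (numbers copied from the table above; stage names are just labels):

```python
# hours per stage = (input size in GB) * (minutes per input GB) / 60
stages = {            # (GB in, min per GB in)
    "Cost":           (300, 24),
    "Cover Features": (300, 68),
    "Stego":          (900, 55),
    "Stego Features": (900, 68),
}
hours = {name: gb * minutes / 60 for name, (gb, minutes) in stages.items()}
# these four stages sum to 2305 h (about 96 days); cropping and rounding
# account for the remainder of the table's total
total_hours = sum(hours.values())
```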
### Experiments
- THE IMAGE SET WAS CHANGED BEFORE RUNNING THE FOLLOWING EXPERIMENTS
- RUNNING TWO INSTANCES OF FEATURE EXTRACTORS (1 IN BACKGROUND, 1 IN FOREGROUND) MAY LEAD TO THE SITUATION WHEN THE CLASSIFIER STARTS BEFORE THE FIRST EXTRACTOR FINISHES
- INF COST BUG FIXED AFTER EXPERIMENT 14
#### Useful commands
- { time make test > results; } 2> time &
- notif "( time make test > results ) 2> time" &
- alias: run_pipe
- { time make test_classify >> extra; } 2>> time_extra &
- jobs, kill %1, bg %1, fg %1
- notif [command] - will send a Telegram notification when a process finishes
#### Setup
| Id | Width | Height | Payload | Images |
| - | - | - | - | - |
| 9 | 512 | 512 | 14000 bits | 000*.pgm (99) |
| 10 | 512 | 512 | 14000 bits | 00*.pgm (999) |
| 11 | 512 | 512 | 14000 bits | 0[0\|3]*.pgm (1999) |
| 12 | 1024 | 512 | 20000 bits | 0[0\|3]*.pgm (1999) |
| 13 | 1024 | 1024 | 28000 bits | 0[0\|3]*.pgm (1999) |
| 14 | 2048 | 1024 | 40000 bits | 0[0\|3]*.pgm (1999) |

| Id | Width | Height | Payload | Images |
| - | - | - | - | - |
| 15 | 512 | 512 | 14000 bits | 0[0\|3]*.pgm (1999) |
| 16 | 1024 | 512 | 20000 bits | 0[0\|3]*.pgm (1999) |
| 17 | 1024 | 1024 | 28000 bits | 0[0\|3]*.pgm (1999) |
| 18 | 2048 | 1024 | 40000 bits | 0[0\|3]*.pgm (1999) |
#### Time
| Id | Crop | Cost | Cost Display | Stego | Features | Classifier | Total |
| - | - | - | - | - | - | - | - |
| 9 | 1s | 1m | 0s | 2m40s | 3m30s x2 | 9s | 7m34s |
| 10 | - | - | - | - | - | 2m | 40m |
| 11 | 21s| 11m | 2m | 28m | 40m x2 | 5m30s | 85m |
| 12 | 23s | 23m | 3s | 57m | 1h13m x2 | 5m30s | 160m |
| 13 | 23s | 46m | 7s | 1h50m | 2h16m x2 | 5m30s | 300m |
| 14 | 26s | 1m30s | 14s | 3h40s | 4h35m x2 | 6m20s | 600m |

| Id | Crop | Cost | Cost Display | Stego | Features | Classifier | Total |
| - | - | - | - | - | - | - | - |
| 15 | 21s| 11m | 2m | 28m | 40m x2 | 5m30s | 85m |
| 16 | 23s | 23m | 3s | 57m | 1h13m x2 | 5m30s | 160m |
| 17 | 23s | 46m | 7s | 1h50m | 2h16m x2 | 5m30s | 300m |
| 18 | 26s | 1m30s | 14s | 3h40s | 4h35m x2 | 6m20s | 600m |
#### Results
| Id | Error | Obs |
| - | - | - |
| 9 | 0.0707 | - |
| 10 | 0.33 | - |
| 11 | 0.3164 | - |
| 12 | 0.2691 | - |
| 13 | 0.1116 | - |
| 14 | 0.0955 | - |

| Id | Error | Obs |
| - | - | - |
| 15 | 0.3217 | - |
| 16 | 0.2838 | - |
| 17 | 0.1148 | Why?? |
| 18 | 0.3952 | - |
### Questions
1. Please have a look at my suggested size groups and their time/size estimation. Is that reasonable? I think that 2TB should be enough for this experiment and it should be completed in 100 days without any parallelisation (~15 days with multiple cores, maybe?).
**A**: That seems reasonable.
2. I expected the results of experiments 15-18 to be similar. Why is experiment 17 an outlier?
**A**: No idea... Maybe some bug? Try feeding with constant cost
3. Would you recommend to run the experiment with only 1000 images first and then use the full set?
**A**: Yes. The whole experiment will probably be repeated next term anyway.
# Next notes: [December onwards](https://hackmd.io/@W9LPPdtDTQCwdxGEjX9N8A/Sy2JaP0iv)