# Version Reconstruction
#### Setup Analysis
1. Check out the `master` branch of the chrome-auditor repo.
2. `cd analysis`
3. Set `uri = bolt://guangliang.gtisc.gatech.edu` in `config.cfg`
4. Install the requirements in `requirements.txt`.
#### Recovering Versions from a Domain
In the analysis stage, run the following command to reconstruct the versions for a domain. For example, to reconstruct the versions of `https://www.americandancefestival.org`, run:
```
python3 analysis.py domain-profiler https://www.americandancefestival.org label-0 --use-sld --verbose
```
#### Output
```
Security Origin: https://www.americandancefestival.org
Iteration 0:
Domains-Used 10
Frames-Filtered 33 (0.75){'privy.com', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', 'facebook.net', 'sharethis.com', 'instagram.com', 'facebook.com', 'addthisedge.com'}
Iteration 1:
Domains-Used 1
Frames-Filtered 11 (0.25){'gstatic.com'}
Total Frames: 44
Filtered Frames: 44
Total Frames Filtered: 1.00
Domains Used: 11/11 1.00
================================================================================
**NOTE**: Domains that show up in one day only are excluded.
**NOTE**: Domains that have daily average request count less than 0 are excluded.
================================================================================
Generating domain profile chronology...
domain min_date max_date daily_avg_req_cnt
gstatic.com 2020-02-23 2020-03-12 2
instagram.com 2020-02-23 2020-03-13 1
None 2020-02-22 2020-03-13 6
privy.com 2020-02-22 2020-03-13 5
addthis.com 2020-02-22 2020-03-13 3
facebook.net 2020-02-22 2020-03-13 3
sharethis.com 2020-02-22 2020-03-13 3
google-analytics.com 2020-02-22 2020-03-13 2
addthisedge.com 2020-02-22 2020-03-13 1
moatads.com 2020-02-22 2020-03-13 1
facebook.com 2020-02-22 2020-03-13 1
googleapis.com 2020-02-22 2020-03-13 1
================================================================================
Generating domain profile versions...
Version Domain Set
0 [{'privy.com', 'facebook.net', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', None, 'sharethis.com', 'facebook.com', 'addthisedge.com'}, neotime.Date(2020, 2, 22)]
1 [{'gstatic.com', 'instagram.com'}, neotime.Date(2020, 2, 23)]
```
For the version reconstruction evaluation, the only relevant part of this output is the final section:
```
Generating domain profile versions...
Version Domain Set
0 [{'privy.com', 'facebook.net', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', None, 'sharethis.com', 'facebook.com', 'addthisedge.com'}, neotime.Date(2020, 2, 22)]
1 [{'gstatic.com', 'instagram.com'}, neotime.Date(2020, 2, 23)]
```
In the above example, the number of versions is two (versions 0 and 1).
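The version count can also be read from a saved output file programmatically. The following is a minimal sketch, not part of the chrome-auditor code: the function name `count_versions` is hypothetical, and it assumes the output always has the layout shown above, with one row per version starting with the version number followed by `[`.

```python
import re

# Header line that precedes the version table in the profiler output.
VERSIONS_HEADER = "Generating domain profile versions..."

# Assumption: each version row starts with an integer, whitespace, then "[".
VERSION_ROW = re.compile(r"^\s*\d+\s+\[")


def count_versions(output: str) -> int:
    """Count the version rows that follow the versions header."""
    lines = [line.strip() for line in output.splitlines()]
    if VERSIONS_HEADER not in lines:
        return 0  # No version table found in this output.
    start = lines.index(VERSIONS_HEADER) + 1
    return sum(1 for line in lines[start:] if VERSION_ROW.match(line))


if __name__ == "__main__":
    sample = "\n".join([
        "Generating domain profile chronology...",
        "gstatic.com 2020-02-23 2020-03-12 2",
        "Generating domain profile versions...",
        "Version Domain Set",
        "0 [{'privy.com', 'facebook.net'}, neotime.Date(2020, 2, 22)]",
        "1 [{'gstatic.com', 'instagram.com'}, neotime.Date(2020, 2, 23)]",
    ])
    print(count_versions(sample))
```

Rows in the chronology table are ignored because only lines after the versions header are inspected.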
Tasks
====
1. For each securityOrigin in `security_origin.csv`, run the `domain-profiler` with the parameters shown in the above example.
2. Save the __entire__ output to a file. The name of the file can be the name of the security origin.
3. Using the URL-to-category mapping in [1], determine the average version count for each category. This is the purpose of the evaluation.
[1] `https://github.com/jallen89/chrome-auditor/blob/master/evaluation/weblinks-collector/weblinkCategoryMapping.txt`
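The tasks above could be scripted roughly as follows. This is a sketch under several assumptions, not the project's own tooling: all function names are hypothetical, `security_origin.csv` is assumed to hold one origin in the first column of each row with no header row, and the category mapping is passed in as a plain dict because the format of the mapping file in [1] is not described here. The script is assumed to run from the `analysis` directory.

```python
import csv
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def profiler_command(origin):
    """Build the domain-profiler command shown in the example above."""
    return ["python3", "analysis.py", "domain-profiler", origin,
            "label-0", "--use-sld", "--verbose"]


def output_filename(origin):
    """Turn a security origin into a file name that is safe on disk."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", origin).strip("_") + ".txt"


def run_all(csv_path, out_dir):
    """Run the profiler for every origin and save the entire output."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if not row or not row[0].strip():
                continue  # Skip blank rows.
            origin = row[0].strip()
            result = subprocess.run(profiler_command(origin),
                                    capture_output=True, text=True)
            (out_dir / output_filename(origin)).write_text(result.stdout)


def average_versions_by_category(version_counts, categories):
    """Average the per-origin version counts within each category.

    version_counts: dict of origin -> number of versions
    categories:     dict of origin -> category (assumed built from [1])
    """
    grouped = defaultdict(list)
    for origin, count in version_counts.items():
        category = categories.get(origin)
        if category is not None:  # Origins without a category are skipped.
            grouped[category].append(count)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}
```

For example, `run_all("security_origin.csv", "profiles")` would write one output file per origin into a `profiles` directory, after which the version counts taken from those files can be passed to `average_versions_by_category`.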