# Version Reconstruction

#### Setup Analysis

1. Check out the `master` branch of the chrome-auditor repo.
2. `cd analysis`
3. Set `uri = bolt://guangliang.gtisc.gatech.edu` in `config.cfg`.
4. Install the requirements in `requirements.txt`.

#### Recovering Versions from a Domain

In the analysis stage, you can reconstruct the versions for a domain with the `domain-profiler` command. For example, to reconstruct the versions of `https://www.americandancefestival.org`, run:

```
python3 analysis.py domain-profiler https://www.americandancefestival.org label-0 --use-sld --verbose
```

#### Output

```
Security Origin: https://www.americandancefestival.org
Iteration 0: Domains-Used 10 Frames-Filtered 33 (0.75){'privy.com', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', 'facebook.net', 'sharethis.com', 'instagram.com', 'facebook.com', 'addthisedge.com'}
Iteration 1: Domains-Used 1 Frames-Filtered 11 (0.25){'gstatic.com'}
Total Frames: 44
Filtered Frames: 44
Total Frames Filtered: 1.00
Domains Used: 11/11 1.00
================================================================================
**NOTE**: Domains that show up in one day only are excluded.
**NOTE**: Domains that have daily average request count less than 0 are excluded.
================================================================================
Generating domain profile chronology...
domain                min_date    max_date    daily_avg_req_cnt
gstatic.com           2020-02-23  2020-03-12  2
instagram.com         2020-02-23  2020-03-13  1
None                  2020-02-22  2020-03-13  6
privy.com             2020-02-22  2020-03-13  5
addthis.com           2020-02-22  2020-03-13  3
facebook.net          2020-02-22  2020-03-13  3
sharethis.com         2020-02-22  2020-03-13  3
google-analytics.com  2020-02-22  2020-03-13  2
addthisedge.com       2020-02-22  2020-03-13  1
moatads.com           2020-02-22  2020-03-13  1
facebook.com          2020-02-22  2020-03-13  1
googleapis.com        2020-02-22  2020-03-13  1
================================================================================
Generating domain profile versions...
Version Domain Set
0 [{'privy.com', 'facebook.net', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', None, 'sharethis.com', 'facebook.com', 'addthisedge.com'}, neotime.Date(2020, 2, 22)]
1 [{'gstatic.com', 'instagram.com'}, neotime.Date(2020, 2, 23)]
```

In this scenario, the relevant information for completing the version reconstruction evaluation is the following:

```
Generating domain profile versions...
Version Domain Set
0 [{'privy.com', 'facebook.net', 'addthis.com', 'googleapis.com', 'moatads.com', 'google-analytics.com', None, 'sharethis.com', 'facebook.com', 'addthisedge.com'}, neotime.Date(2020, 2, 22)]
1 [{'gstatic.com', 'instagram.com'}, neotime.Date(2020, 2, 23)]
```

In the above example, the number of versions is two.

Tasks
====

1. For each securityOrigin in `security_origin.csv`, run `domain-profiler` with the parameters shown in the above example.
2. Save the __entire__ output to a file. The file can be named after the security origin.
3. Using the URL-to-category mapping in [1], determine the average version count for each category. This average is the purpose of the evaluation.

[1] `https://github.com/jallen89/chrome-auditor/blob/master/evaluation/weblinks-collector/weblinkCategoryMapping.txt`
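
The tasks above could be scripted roughly as follows. This is a minimal sketch, not part of the chrome-auditor repo: the helper names (`run_domain_profiler`, `count_versions`, `average_versions_by_category`) are hypothetical, and it assumes each version row is printed on its own line after the `Generating domain profile versions...` header, as in the example output. The layouts of `security_origin.csv` and the mapping file in [1] are not described here, so the sketch takes an already-loaded `dict` from origin to category instead of parsing those files.

```python
import re
import subprocess
from collections import defaultdict
from statistics import mean

VERSION_HEADER = "Generating domain profile versions..."
# Assumption: a version row starts with an integer index followed by "[".
VERSION_ROW = re.compile(r"^\s*(\d+)\s+\[")


def run_domain_profiler(origin, label="label-0"):
    """Run the command from the example for one origin and return its stdout."""
    cmd = ["python3", "analysis.py", "domain-profiler", origin, label,
           "--use-sld", "--verbose"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def count_versions(output):
    """Count version rows after the last version header in the saved output."""
    _, found, tail = output.rpartition(VERSION_HEADER)
    if not found:
        return 0  # header missing: no versions were reported
    return sum(1 for line in tail.splitlines() if VERSION_ROW.match(line))


def average_versions_by_category(version_counts, category_of):
    """Average the version counts of all origins that share a category."""
    grouped = defaultdict(list)
    for origin, count in version_counts.items():
        # Assumption: origins absent from the mapping are grouped as "unknown".
        grouped[category_of.get(origin, "unknown")].append(count)
    return {category: mean(counts) for category, counts in grouped.items()}
```

To cover task 2, the string returned by `run_domain_profiler` can be written to a file named after the origin before it is passed to `count_versions`, so the entire output is kept alongside the derived counts.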