# CPR Progress:
- [Slides & Experiment Results](https://docs.google.com/presentation/d/1Joq3tmL4Dzb59zLqbBacW7FqBl8MLFQX-SCx6vvGxKU/edit?usp=sharing)

## 06/30
- Data paths:
    - **DATASETS**: /tmp2/yzliu/for_Jacky_senbai/CPR_revision/datasets
    - **RESULTS**: /tmp2/yzliu/for_Jacky_senbai/CPR_revision/results

### Current problems:
- Original paper settings:
    - 200 update times (aka 2*10^8 samples)
- 200 epochs is too low for BPR/Hop-Rec
    - 500 is normal
    - [ ] Measure and plot metric line charts
- Some baseline > CPR

### TODO:
- Save the embeddings every **100** epochs and run eval on them
- Plot line charts
- Run multiple times and compute mean & variance
- Possibly change the item aggregation

### Survey:
- Since CMF, there has been almost no cross-domain recommendation research that avoids external information (e.g., text, images).
    - Even EMCDR uses some form of user information
    - Most research also focuses on shared users
    - The only work addressing cold start is probably [CATN](https://arxiv.org/pdf/2005.10549.pdf) (SIGIR 2020), but CATN also uses text

### Summary:
- Problem: experiment scores differ greatly between 200 and 500 epochs. E.g., at 200 epochs CPR is far ahead of BPR, but at 500 epochs the gap is more reasonable.
- Cause: BPR+, Hop-Rec+, etc. concatenate the two graphs, so they need more epochs to converge.
- Solution: since CPR converges comparatively faster, a line chart can illustrate its effect.

## 07/08:
- Current situation:
    - Potential claim: CPR converges faster than other methods.
    - Need to compare under the same training settings.
        - 1. There is no concept of epoch in smore
        - 2. Cross-Domain vs. Single Domain
    - After comparison:
        - 1. For CPR, we set a step to the total number of edges in the target domain.
        - 2. CPR's convergence does not seem faster than the others. (Based on results by epoch)
- Evaluation is slow:
    - Need multi-processing evaluation
- TODO:
    - [x] Epoch-based smore training
    - [x] Multi-processing evaluation
    - [ ] Experiments:
        - [ ] CPR:
        - [ ] Baselines:
            - [ ] BPR
            - [ ] Hop-Rec
            - [ ] Bi-TGCF
            - [ ] CMF
            - [ ] EMCDR
- Issues:
    - Significant performance drop from target users to shared users.
        - Intuitively, in a cross-domain recommendation scenario, shared users (users that occur in both the source domain and the target domain) have sufficient information, which should imply better recommendation performance than for users who only occur in the target domain (i.e., target users).
        - However, in all of our datasets, the performance on shared users is significantly worse than on target users (even worse than on cold-start users).
        - We suspect two possible reasons:
            - (1) The low proportion of shared users among all users. However, in the tv-vod dataset most users are shared users, and it still shows a large performance drop (0.9 vs. 0.7 on recall).
            - (2) Sample bias.
- Summary:
    - 1. Experiment alignment across methods.
    - 2. Evaluation acceleration.
    - 3. Still working on experiments.

## 07/15:
- Finished:
    - [x] Accelerate evaluation with multi-processing
    - [x] Add comparison plots to the slides (some methods are still waiting for evaluation)
- TODO:
    - [ ] Calculate variance
    - [x] CPR's parameter adjustment (since performance is not as expected)
    - [ ] Experiments:
        - [x] CPR:
        - [ ] Baselines:
            - [x] BPR
            - [x] Hop-Rec
            - [ ] Bi-TGCF
            - [ ] CMF
            - [ ] EMCDR
- Discovery:
    - Since we changed the step of CPR to the **Target-Domain's edge size** (while the step of LightGCN+ is the **Target+Source domains' edge size**), it is possible that 500 epochs are not enough for convergence.
    - The answer is **negative**: scores at 600 epochs are not higher. Scores at 100 epochs are as good as those at 500 epochs, or even better.

## 07/22:
- Spent a lot of time adapting the CMF/EMCDR code so that the experiments can be reproduced on our 10-core datasets (which differ from the original datasets).
- Finished CPR's parameter adjustment
    - Best parameter combination: ug0.01, ig0.06
        - In TVVOD, recall/NDCG increase by **1~1.2** points
        - In CSJHK, recall/NDCG increase by **1~2.8** points
        - In MTB, recall/NDCG increase by **3~3.3** points
    - However, CPR only performs best on MTB. CPR is in 2nd or 3rd place on the other datasets.
        - **LightGCN**, **Bi-TGCF**, **BPR** and **HopRec** are strong enough.
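
The recall/NDCG comparisons and the pending "Calculate Variance" item could be computed along the following lines. This is only a minimal sketch, not the project's evaluation code: the function names `recall_at_k`, `ndcg_at_k`, and `summarize_runs` are hypothetical, relevance is assumed to be binary, recall is assumed to be normalized by the number of held-out items, and each ranked list is assumed to contain no duplicates.

```python
import math
from statistics import mean, pvariance


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of a user's held-out items that appear in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: DCG of the top-k list over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def summarize_runs(scores):
    """Mean and population variance of one metric over repeated runs."""
    return mean(scores), pvariance(scores)
```

Per-user scores would be averaged within a run first, and `summarize_runs` would then be applied to the per-run averages (one value per random seed).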

## 07/29
CPR vs LightGCN:
- Aggregation:
    - Only on user + 1 layer
    - Equally aggregated from both domains
- Optimization:
    - User:
        - Similar
    - Item:
- Source of new datasets: [Amz Review Data 2018](https://nijianmo.github.io/amazon/index.html)

## 08/05
- New dataset preprocessing & experiments (LightGCN can't run on datasets that are too big):
    - (Electronics -> Cellphones and Accessories) **No need to filter**
    - (Sports and Outdoors -> Clothing, Shoes and Jewelry) **10-core**
- Figured out a LOO bug
    - Users with only one log were not added to the training set, which triggered a bug in CPR.
    - ![](https://i.imgur.com/Qx1QsjM.png)
- Implementing CoNet
- Meeting Notes
    - Add @20
    - LightGCN too low?
    - After completing the big table, check and adjust the weird scores. List interesting examples.
    - Explain why CPR does better on cold-start
        - We hope cold-start users end up close to target items
        - Figure from 志明 (senior labmate): https://www.geogebra.org/m/epvjwaJG

## 08/13
- Meeting Notes
    - 10-core elcpa
    - Using the same large graph for all models is easier to explain
    - The small graph could serve as an application study: are there situations (different kinds of users) where the small graph is more suitable?
    - Start preparing the numbers for the paper

## 08/19
- Finished 10-core elcpa; elcpa becomes an extremely small dataset.
- Does LightGCN perform better on small datasets? (CPR's score on raw-data elcpa is better than LightGCN's)
![](https://i.imgur.com/1rZRtql.png)
- So CPR may recommend better for users who have few interactions.

## Time Table:
- 7/23-7/30:
    - 1. New dataset & preprocessing (remove CSJ-HK)
    - 2. All baseline methods (remove Hop-Rec)
    - 3. Fixed epoch number (200 epochs)
- 7/31-8/6:
    - Test multiple times (mean, var, ...)
    - t-SNE?

<Finish all basic experiments by 8/6>
- 8/7-13:
    - Other experiments
- 8/13-8/20:
    - Abstract
    - Methodology

<Finish all experiments by 8/23>
<8/30 Abstract Deadline>
<9/8 Submission Deadline>
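
For reference, the LOO (leave-one-out) issue noted under 08/05 can be illustrated with a small split routine. This is a sketch under stated assumptions rather than the project's preprocessing code: the function name `leave_one_out_split` and the `(user, item, timestamp)` input format are hypothetical, the most recent item of each user is assumed to be the held-out one, and users with a single log are assumed to keep that log in train instead of being dropped.

```python
from collections import defaultdict


def leave_one_out_split(interactions):
    """Split (user, item, timestamp) logs into train and test pairs.

    Assumption: the latest item per user is held out for testing, and
    single-log users stay in train so every user has a training edge.
    """
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))

    train, test = [], []
    for user, logs in by_user.items():
        logs.sort(key=lambda log: log[0])  # oldest first
        if len(logs) == 1:
            # Single-log user: nothing to hold out, keep the log in train.
            train.append((user, logs[0][1]))
            continue
        *history, (_, held_out) = logs
        train.extend((user, item) for _, item in history)
        test.append((user, held_out))
    return train, test
```

With this handling, a single-log user still appears in the training graph but contributes no test case, so the evaluation loop has to skip users without a held-out item.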