# CPR Progress:
- [Slides & Experiment Results](https://docs.google.com/presentation/d/1Joq3tmL4Dzb59zLqbBacW7FqBl8MLFQX-SCx6vvGxKU/edit?usp=sharing)
## 06/30
- Data paths:
- **DATASETS**: /tmp2/yzliu/for_Jacky_senbai/CPR_revision/datasets
- **RESULTS**: /tmp2/yzliu/for_Jacky_senbai/CPR_revision/results
### Current problems:
- Original paper settings:
- 200 update times (i.e., 2*10^8 samples)
- 200 epochs is too few for BPR/Hop-Rec
- 500 is more reasonable
- [ ] Evaluate and plot metric curves
- Some baselines > CPR
### TODO:
- Save the embeddings every **100** epochs and run evaluation (see the sketch after this list)
- Plot metric curves
- Run multiple times and compute mean & variance
- Possibly modify item aggregation
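
A minimal sketch of the checkpoint-and-evaluate loop from the TODO above, assuming a trainer object that exposes `train_one_epoch()` and its embedding matrices, plus an external `evaluate_fn`; every name here is a hypothetical placeholder, not the actual CPR code.

```python
import os
import numpy as np

def train_with_checkpoints(model, evaluate_fn, test_set,
                           total_epochs=500, eval_every=100,
                           ckpt_dir="results/checkpoints"):
    """Train, saving embeddings and evaluating every `eval_every` epochs.

    Assumes `model` exposes train_one_epoch(), user_embeddings and
    item_embeddings, and that evaluate_fn(model, test_set, k) returns
    (recall, ndcg).
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    history = []
    for epoch in range(1, total_epochs + 1):
        model.train_one_epoch()
        if epoch % eval_every == 0:
            # Persist embeddings so curves can be re-evaluated without retraining.
            np.save(os.path.join(ckpt_dir, f"user_emb_{epoch}.npy"), model.user_embeddings)
            np.save(os.path.join(ckpt_dir, f"item_emb_{epoch}.npy"), model.item_embeddings)
            recall, ndcg = evaluate_fn(model, test_set, k=10)
            history.append((epoch, recall, ndcg))
    return history
```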
### Survey:
- Since CMF, there has been almost no cross-domain recommendation research that avoids external information (e.g., text, images).
- Even EMCDR uses some form of user-side information
- Most existing work also focuses on shared users
- The only one addressing cold start is probably [CATN](https://arxiv.org/pdf/2005.10549.pdf) (SIGIR 2020), but CATN also uses text
### Summary:
- Problem: experiment scores differ a lot between 200 and 500 epochs.
E.g., at 200 epochs CPR is far ahead of BPR, but at 500 epochs the gap becomes reasonable
- Cause: BPR+, Hop-Rec+, etc. concatenate the two graphs, so they need more epochs to converge
- Solution: since CPR converges comparatively faster, plot metric curves to show this (see the plotting sketch below)
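
A possible plotting helper for the metric curves mentioned above, assuming per-epoch scores have already been collected (e.g., by the checkpoint loop sketched earlier); names and file paths are illustrative only.

```python
import matplotlib.pyplot as plt

def plot_metric_curves(curves, metric_name="Recall@10", out_path="recall_curves.png"):
    """Plot metric-vs-epoch curves for several methods on one figure.

    `curves` maps a method name (e.g. "CPR", "BPR+") to a list of
    (epoch, score) pairs.
    """
    plt.figure()
    for method, points in curves.items():
        epochs, scores = zip(*sorted(points))
        plt.plot(epochs, scores, marker="o", label=method)
    plt.xlabel("Epoch")
    plt.ylabel(metric_name)
    plt.legend()
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()
```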
## 07/08:
- Current situation:
- Potential claim: CPR converges faster than other methods.
- Need to compare under the same training settings.
- 1. There is no concept of epoch in smore
- 2. Cross-Domain vs. Single Domain
- After comparison:
- 1. For CPR, we set one step to the total number of edges in the target domain.
- 2. CPR's convergence does not seem faster than the others (based on per-epoch results).
- Evaluation is slow:
- Need multi-processing evaluation (a sketch is at the end of this section)
- TODO:
- [x] Epoch-based smore training
- [x] multi-processing evaluation
- [ ] Experiments:
- [ ] CPR:
- [ ] Baselines:
- [ ] BPR
- [ ] Hop-Rec
- [ ] Bi-TGCF
- [ ] CMF
- [ ] EMCDR
- Issues:
- Significant performance drop from target users to shared users.
- Intuitively, in a cross-domain recommendation scenario, shared users (users that occur in both the source and target domains) have sufficient information, implying better recommendation performance than users who occur only in the target domain (i.e., target users).
- However, in all of our datasets, the performance of shared users is significantly worse than that of target users (and even worse than that of cold-start users).
- We suspect two possible reasons:
- (1) The low proportion of shared users among all users. However, in the tv-vod dataset most users are shared users, and there is still a large performance drop (0.9 vs. 0.7 recall).
- (2) Sampling bias.
- Summary:
- 1. Experiment alignment on each method.
- 2. Evaluation acceleration.
- 3. Still working on experiments.
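
For the multi-processing evaluation item above, here is one possible way to parallelize per-user Recall@k with Python's `multiprocessing.Pool`; the dense `score_matrix` / `ground_truth` layout is an assumption for illustration, not the project's actual code.

```python
from multiprocessing import Pool

import numpy as np

def _recall_for_user(args):
    """Recall@k for one user given a dense score row and held-out items."""
    scores, true_items, k = args
    topk = np.argpartition(-scores, k)[:k]          # indices of the k highest scores
    hits = len(set(topk.tolist()) & set(true_items))
    return hits / len(true_items)

def parallel_recall(score_matrix, ground_truth, k=10, workers=8):
    """Average Recall@k computed with a process pool.

    `score_matrix` is a (num_users, num_items) array with training items
    already masked out; `ground_truth` is a list where entry u holds the
    held-out items of user u (possibly empty).
    """
    jobs = [(score_matrix[u], ground_truth[u], k)
            for u in range(len(ground_truth)) if ground_truth[u]]
    with Pool(workers) as pool:
        recalls = pool.map(_recall_for_user, jobs, chunksize=256)
    return float(np.mean(recalls))
```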
## 07/15:
- Finished:
- [x] Accelerate evaluation with multi-processing
- [x] Add comparison plots to the slides
(some methods are still waiting for evaluation)
- TODO:
- [ ] Calculate Variance
- [x] CPR's Parameter Adjustment (since performance was not as expected)
- [ ] Experiments:
- [x] CPR:
- [ ] Baselines:
- [x] BPR
- [x] Hop-Rec
- [ ] Bi-TGCF
- [ ] CMF
- [ ] EMCDR
- Discovery:
- Since we changed CPR's step to the **target domain's edge count** (while LightGCN+'s step is the **target + source domains' edge count**), it is possible that 500 epochs is not enough for convergence (see the comparison sketch below).
- The answer is **negative**: scores are not higher at 600 epochs. Scores at 100 epochs are as good as those at 500 epochs, or even better.
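
To make the step-definition mismatch above concrete, a tiny sketch (with made-up edge counts) comparing how many samples one "epoch" draws under each definition:

```python
def samples_per_epoch(target_edges, source_edges):
    """Samples drawn per 'epoch' under the two step definitions above."""
    return {
        "CPR": target_edges,                                     # step = |E_target|
        "merged-graph baselines": target_edges + source_edges,   # step = |E_target| + |E_source|
    }

# Hypothetical edge counts, only to illustrate the ratio:
counts = samples_per_epoch(target_edges=1_000_000, source_edges=3_000_000)
print(counts["merged-graph baselines"] / counts["CPR"])  # baseline epoch draws 4x more samples here
```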
## 07/22:
- Spent a lot of time reworking the CMF/EMCDR code so that, for this meeting, the experiments could be reproduced on our 10-core datasets (which differ from the original datasets).
- Finished CPR's Parameter Adjustment
- Best Parameter Combination: ug0.01, ig0.06
- On TVVOD, recall/NDCG improves by **1~1.2** points
- On CSJHK, recall/NDCG improves by **1~2.8** points
- On MTB, recall/NDCG improves by **3~3.3** points
- However, CPR performs best only on MTB; it is in 2nd or 3rd place on the other datasets.
- **LightGCN**, **Bi-TGCF**, **BPR** and **HopRec** are strong enough.
## 07/29
CPR vs LightGCN:
- Aggregation:
- Only on the user side, 1 layer
- Aggregated equally from both domains (a minimal sketch follows at the end of this section)
- Optimization:
- User:
- Similar
- Item:
- Source of new Datasets: [Amz Review Data 2018](https://nijianmo.github.io/amazon/index.html)
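
A minimal sketch of the aggregation idea noted above (user side only, one layer, both domains weighted equally); this is an illustration of the description, not the exact CPR or LightGCN update rule.

```python
import numpy as np

def aggregate_user(user_emb, src_item_embs, tgt_item_embs):
    """Single-layer, user-side aggregation with equal domain weights.

    `user_emb` is the user's vector; `src_item_embs` / `tgt_item_embs` are
    (n, d) arrays of the user's item neighbours in the source / target domain.
    Each domain contributes equally regardless of its neighbourhood size.
    """
    d = user_emb.shape[0]
    src_mean = src_item_embs.mean(axis=0) if len(src_item_embs) else np.zeros(d)
    tgt_mean = tgt_item_embs.mean(axis=0) if len(tgt_item_embs) else np.zeros(d)
    neighbour_part = 0.5 * src_mean + 0.5 * tgt_mean   # equal weight per domain
    return 0.5 * (user_emb + neighbour_part)           # mix self and neighbours once
```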
## 08/05
- New dataset preprocessing & experiments (LightGCN cannot handle datasets that are too large):
- (Electronics -> Cellphones and Accessories) **No need to filter**
- (Sports and Outdoors -> Clothing, Shoes and Jewelry) **10-core**
- Found a leave-one-out (LOO) bug
- Users with only one log are not added to the training set, which triggered a bug in CPR (see the split sketch at the end of this section)
- Implementing CoNet
- Meeting Notes
- add @20
- LightGCN too low?
- After completing the big table, adjust the weird scores and list interesting examples.
- Explain why CPR does better on cold-start users
- Hopefully cold-start users end up close to target items
- 志明學長's figure: https://www.geogebra.org/m/epvjwaJG
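
One possible leave-one-out split that avoids the bug noted above, keeping single-interaction users entirely in the training set; the function and data layout are hypothetical.

```python
import random
from collections import defaultdict

def leave_one_out_split(interactions, seed=0):
    """Leave-one-out split that keeps single-interaction users in training.

    `interactions` is a list of (user, item) pairs. One item per user is held
    out for testing, except for users with a single log: holding out their
    only record would drop them from the training graph (the situation that
    triggered the bug above), so they stay entirely in train.
    """
    random.seed(seed)
    by_user = defaultdict(list)
    for u, i in interactions:
        by_user[u].append(i)

    train, test = [], []
    for u, items in by_user.items():
        if len(items) < 2:
            train.extend((u, i) for i in items)   # keep the lone interaction in train
            continue
        held_out = random.choice(items)
        test.append((u, held_out))
        train.extend((u, i) for i in items if i != held_out)
    return train, test
```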
## 08/13
- Meeting Notes
- 10-core elcpa
- Using the same large graph for all models is easier to explain
- The small graph could serve as an application study: are there situations (different users) where the small graph is more suitable?
- Start preparing the numbers for the paper
## 08/19
- Finished 10-core elcpa; it becomes an extremely small dataset.
- Does LightGCN perform better on small datasets? (CPR's score on raw-data elcpa is better than LightGCN's.)

- So, maybe CPR recommends better for users with few interactions (see the sketch below).
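
A small helper sketch for checking this hypothesis by grouping per-user recall by training-interaction count; the inputs are assumed to be plain dictionaries and are not tied to any specific codebase.

```python
from collections import defaultdict

import numpy as np

def recall_by_activity(per_user_recall, train_counts, edges=(1, 5, 10, 20)):
    """Average per-user Recall@k grouped by training-interaction count.

    `per_user_recall[u]` is user u's recall and `train_counts[u]` the number
    of training interactions; useful for checking whether CPR's advantage
    concentrates on low-activity users.
    """
    buckets = defaultdict(list)
    for u, r in per_user_recall.items():
        c = train_counts.get(u, 0)
        label = next((f"<={e}" for e in edges if c <= e), f">{edges[-1]}")
        buckets[label].append(r)
    return {label: float(np.mean(scores)) for label, scores in buckets.items()}
```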
## Time Table:
- 7/23-7/30:
- 1. New dataset & preprocessing (remove CSJ-HK)
- 2. All baseline Methods (remove Hop-Rec)
- 3. Fixed Epoch Number (200 epochs)
- 7/31-8/6:
- Test for multiple times (mean, var, ...)
- t-SNE?
<Finish all basic experiments before 8/6>
- 8/7-13:
- Other Experiments
- 8/13-8/20:
- Abstract
- Methodology
<Finish all experiments before 8/23>
<8/30 Abstract Deadline>
<9/8 Submission Deadline>