# 0518-Meeting

<!-- ## Proportion of last clicks on the same item
![](https://i.imgur.com/5ydq5tM.png)
-->

## Rosetta.ai

<!-- ### config.transformed_dummy_item
![](https://i.imgur.com/vZPkN6Q.png)

### continuous_features
![](https://i.imgur.com/YuRum1i.png)
-->

![](https://i.imgur.com/nSpbv4A.png)

<!-- ![](https://i.imgur.com/TZ2FYtu.png) -->

**Blue block:**
![](https://i.imgur.com/woPfPFn.png)

**Orange block: Previous interacted**
![](https://i.imgur.com/rypqe6S.png)

**Yellow block: past interaction types (click, etc.)**
![](https://i.imgur.com/gN3KAqq.png)

<!-- Previously interacted items
![](https://i.imgur.com/7px0XZw.png)
-->

**Green block:**
The impressions in this session, excluding the item to be predicted
![](https://i.imgur.com/mfTLdCW.png)

Continuous Feature
![](https://i.imgur.com/9kHpKab.png)

V dot U
![](https://i.imgur.com/zRQuU2t.png)

Max-pooling

Original approach:
![](https://i.imgur.com/Ot0XRNl.png)

Click-out items (the leftmost feature of the model)
Take a single max-pooled item and action embedding
![](https://i.imgur.com/49M1vVp.png)

## Concat All

Splitting the file causes the newly mapped ids to shift
![](https://i.imgur.com/PQ4b73C.png)

## Process time

train 250k sessions, val 50k
![](https://i.imgur.com/FUBJhME.png)

| Process | Time |
| -------- | -------- |
| Preprocess | 1249s |
| Train | 813s |
| Val | 47s |

---

train 500k sessions, val 50k
![](https://i.imgur.com/QZCt7Iu.png)

| Process | Time |
| -------- | -------- |
| Preprocess | 2542s |
| Train | 1646s |
| Val | 58s |

---

## Attentive RNN

![](https://i.imgur.com/cpKaJkm.png)
![](https://i.imgur.com/bpjyXq3.png)

The 6 preprocessing scripts produce 34 CSV files in total

### Pandas vs Pickle

![](https://i.imgur.com/leNnO4t.png)

Saving with pandas is too slow (30 minutes to write only 2 GB)
![](https://i.imgur.com/luK5KMm.png)

trade-off
![](https://i.imgur.com/aCiHe9x.png)
![](https://i.imgur.com/lUmmwBq.png)

A single DataFrame still exceeds 30 GB even when only 30% of the sessions are used
![](https://i.imgur.com/fQmrPSH.png)

### Performance

100k sessions (10%): MRR 0.65, claimed 0.678
![](https://i.imgur.com/PanMPT9.png)

<!--
| Process | Time |
| -------- | -------- |
| Preprocess | more than 4hr |
| Train | 1hr |
| Val | > 5 mins |
-->

### Project student progress schedule

> https://share.clickup.com/c/h/3fjyn-159/c4e47a282ec9173
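
Note on the max-pooling step in the Rosetta.ai section: the screenshots are not reproduced here, so the following is only a minimal sketch of what "take a single embedding after the max" could look like, assuming an element-wise max over the embeddings of past click-out items. The function name `max_pool` and the sample vectors are hypothetical and not taken from the Rosetta.ai code.

```python
def max_pool(embeddings):
    """Element-wise max over a list of equal-length vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    # zip(*embeddings) groups the values of each dimension together
    return [max(column) for column in zip(*embeddings)]


# Hypothetical 3-dimensional embeddings for three past click-out items
clickout_embeddings = [
    [0.1, -0.4, 0.7],
    [0.5, 0.2, -0.3],
    [-0.2, 0.9, 0.0],
]

# One pooled vector summarises the whole click-out history
pooled = max_pool(clickout_embeddings)
print(pooled)
```

The pooled vector has the same dimension as a single embedding regardless of how many click-outs the session contains, which is what lets it be fed to a fixed-size input.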
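
Note on the id-shift issue under Concat All: a minimal sketch of why mapping each split separately breaks the ids, assuming ids are assigned as consecutive integers in order of first appearance. The helper `build_id_map` and the item names are hypothetical; the actual preprocessing code may assign ids differently.

```python
def build_id_map(values):
    """Assign consecutive integer ids in order of first appearance."""
    id_map = {}
    for value in values:
        if value not in id_map:
            id_map[value] = len(id_map)
    return id_map


# Hypothetical raw item ids, split into two chunks
chunk_a = ["item_9", "item_3", "item_9"]
chunk_b = ["item_3", "item_7"]

# Mapping each chunk on its own: the same raw id gets different indices
map_a = build_id_map(chunk_a)
map_b = build_id_map(chunk_b)
print(map_a["item_3"], map_b["item_3"])

# Mapping the concatenated data once keeps every index consistent
shared_map = build_id_map(chunk_a + chunk_b)
print(shared_map)
```

Building the mapping once over the concatenated data (or saving one mapping and reusing it for every split) avoids the mismatch.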
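
Note on the MRR figures under Performance: for reference, a minimal sketch of mean reciprocal rank, assuming each session has exactly one clicked item and that a session whose clicked item is missing from the ranking contributes 0. The function name and the sample data are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, clicked_items):
    """Average of 1 / rank of the clicked item over all sessions."""
    if not clicked_items:
        raise ValueError("need at least one session")
    total = 0.0
    for ranked, clicked in zip(ranked_lists, clicked_items):
        if clicked in ranked:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (ranked.index(clicked) + 1)
    return total / len(clicked_items)


# Hypothetical rankings for three sessions; the clicked item is "a" each time
rankings = [
    ["a", "b", "c"],  # rank 1
    ["b", "a", "c"],  # rank 2
    ["c", "b", "a"],  # rank 3
]
clicked = ["a", "a", "a"]

mrr = mean_reciprocal_rank(rankings, clicked)
print(round(mrr, 4))
```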