AnyClip Analysis

###### tags: `customers`

## WebApp2

**EN model**

1) **First, we uploaded their models and integrated them with the system.**
![](https://hackmd.io/_uploads/BJZ2hF_iK.png)

2a) **We reviewed the model's metrics on the train, validation and test sets.**
![](https://hackmd.io/_uploads/BkvZMFdiK.png)

2b) **As part of the feature exploration, the TF-IDF-related features were examined and found to be strongly correlated with the loss: samples with low TF-IDF values tend to have high loss.**
![](https://hackmd.io/_uploads/SJdCGKuiF.png)

3a) **Looking at the embedding space created by the system, some features, such as `people_dot_with_prob`, play a large role in determining the embedding.**
![](https://hackmd.io/_uploads/BJpBf9usF.png)

3b) **Others, such as feed and content owner, have little effect on it.**
![](https://hackmd.io/_uploads/ByK17c_jt.png)
![](https://hackmd.io/_uploads/Hk3bX9djt.png)

4) **To make these two features more useful, we replaced the categorical coding of their strings with an embedding layer (dim=2) and retrained the model.**
![](https://hackmd.io/_uploads/Skav8c_sF.png)

5) **This improved the train, validation and test metrics (the test metric decreased from 0.776 to 0.736 on the EN2DE model).**
![](https://hackmd.io/_uploads/BycmBFOsY.png)

6) **To illustrate the benefit, the feed variables are much more separable in the latent space.**
The embedding of two separate feeds in the embedding-layer model:
![](https://hackmd.io/_uploads/BJNarKOiF.png)
The embedding of two separate feeds in the original model:
![](https://hackmd.io/_uploads/Skr0BF_jY.png)
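The TF-IDF/loss relationship in step 2b can be put into numbers with a correlation coefficient. The plain-Python sketch below assumes the per-sample TF-IDF scores and losses have been exported as two lists; the values are made up for illustration. A strongly negative coefficient corresponds to the "low TF-IDF, high loss" pattern, and the Spearman (rank) variant also picks up monotonic trends that are not linear.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Rank correlation: captures monotonic, not only linear, trends."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-sample values exported from the analysis.
tfidf = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
loss = [1.9, 1.7, 1.2, 0.8, 0.5, 0.3]
print(f"pearson={pearson(tfidf, loss):.3f} spearman={spearman(tfidf, loss):.3f}")
```

On real data, computing the coefficient separately for the train, validation and test sets would show whether the relationship holds across splits.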
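For step 4, the note does not say which framework the model is built in, so here is a framework-agnostic sketch of what an embedding layer does for a string feature such as feed or content owner: each distinct string gets an integer id and its own trainable row in a `(vocab_size, 2)` table, so training can place related categories near each other instead of relying on a fixed code. The class name, the reserved "unknown" id 0 and the feed names are illustrative assumptions. In practice this is what `tf.keras.layers.Embedding(input_dim, output_dim=2)` or `torch.nn.Embedding(num_embeddings, 2)` provides.

```python
import random


class CategoricalEmbedding:
    """Hypothetical minimal embedding layer for a string feature.

    Each distinct string gets an integer id (1..N); id 0 is reserved for
    values not seen at construction time. Every id owns a trainable row of
    `dim` floats, and an update only touches the rows that were looked up.
    """

    def __init__(self, categories, dim=2, seed=0):
        rng = random.Random(seed)
        # Sorting makes the string -> id assignment deterministic.
        self.index = {c: i + 1 for i, c in enumerate(sorted(set(categories)))}
        self.table = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(len(self.index) + 1)  # +1 for the unknown row
        ]

    def ids(self, values):
        """Map strings to ids, sending unseen strings to 0."""
        return [self.index.get(v, 0) for v in values]

    def __call__(self, values):
        """Forward pass: look up one dense vector per input string."""
        return [self.table[i] for i in self.ids(values)]

    def sgd_step(self, values, grads, lr=0.1):
        """Plain SGD update of the looked-up rows, given d(loss)/d(vector)."""
        for i, g in zip(self.ids(values), grads):
            self.table[i] = [w - lr * gk for w, gk in zip(self.table[i], g)]


feeds = ["feed_a", "feed_b", "feed_a", "feed_c"]  # hypothetical feed names
emb = CategoricalEmbedding(feeds, dim=2)
print(emb.ids(["feed_b", "unseen_feed"]))  # [2, 0]
print(emb(["feed_b"]))  # one 2-D vector (random until trained)
```

With dim=2 the learned vectors can be plotted directly, which is convenient for inspecting how feeds are arranged after training.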
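The separability in step 6 is judged visually from the plots. One way to quantify it (a suggestion, not a metric reported in this analysis) is the mean silhouette coefficient of the latent points grouped by feed. The sketch assumes the latent coordinates can be exported as tuples; the points below are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of points grouped by label (needs >= 2 labels).

    Per point: a = mean distance to the rest of its own group, b = mean
    distance to the nearest other group, s = (b - a) / max(a, b). Values near
    1 mean compact, well-separated groups; values near 0 or below mean overlap.
    """
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two groups")
    scores = []
    for p, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in groups.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D latent coordinates for two feeds.
separated = [(0, 0), (0, 1), (10, 0), (10, 1)]
overlapping = [(0, 0), (1, 0), (0.5, 0), (1.5, 0)]
feeds = ["feed_a", "feed_a", "feed_b", "feed_b"]
print(silhouette(separated, feeds), silhouette(overlapping, feeds))
```

Computing the score on the same samples for the original and the embedding-layer model would turn the side-by-side plots into a single comparable number.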
[New embedding (embedding-layer model)](https://anyclip.tensorleap.ai/process/model/61c1f685bd03d9001972ee63#%28Process:%28showAnalyzer:!t,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.00040766066254326934,x:3753.967621133087,y:738.5102962857736%29%29,_store:%28versions/currentIds:!%28%29%29,scatter/panel-1640612661045:%28ds:loss,hm:metadata_meta_feed,st:training,t:%28k:0.5,x:60.58236053744578,y:59.94368364089803%29%29,scatter/panel-1640674337823:%28ds:loss,hm:metadata_meta_feed,st:training,t:%28k:0.6925547340554626,x:51.94740625818852,y:22.613189264304825%29%29,scatter/panel-training-61c1f685bd03d9001972ee63}:%28ds:loss,hm:metadata_row_num,st:training,t:%28k:1,x:0,y:0%29%29%29) [Old embedding (original model)](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#%28Process:%28showAnalyzer:!t,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.00040766066254326934,x:3753.967621133087,y:738.5102962857736%29%29,_store:%28versions/currentIds:!%28%29%29,scatter/panel-1639665892093:%28ds:loss,hm:metadata_meta_feed,projection:XZ,st:training,t:%28k:0.5,x:97.95624712922184,y:46.74575272147575%29%29,scatter/panel-1640106878747:%28ds:loss,hm:metadata_row_num,st:training,t:%28k:1,x:0,y:0%29%29,scatter/panel-1640107223210:%28ds:loss,hm:metadata_row_num,st:validation,t:%28k:0.5,x:34.69229820682588,y:35.07590860211384%29%29,scatter/panel-1640612661045:%28ds:loss,hm:metadata_meta_feed,st:training,t:%28k:0.5,x:60.58236053744578,y:59.94368364089803%29%29,scatter/panel-1640674337823:%28ds:loss,hm:metadata_meta_feed,st:training,t:%28k:0.6925547340554626,x:51.94740625818852,y:22.613189264304825%29%29,scatter/panel-training-61c1f685bd03d9001972ee63}:%28ds:loss,hm:metadata_row_num,st:training,t:%28k:1,x:0,y:0%29%29%29)

7) **Finally, we noticed some labeling ambiguity: identical feature values are mapped to different ground-truth values.**
![](https://hackmd.io/_uploads/BkwTlhuoF.png)

<h2> Additional comments </h2>

- There seems to be a domain gap between the train/validation sets and the test set.
  This shows up when the model is fine-tuned for too long: the validation metric stays constant while the test metric increases.
- The gap is also visible in the distribution of selected feed values:
  ![](https://hackmd.io/_uploads/BkPtvtdot.png)
- There seems to be some difference between the EN and the EN2DE models; mainly, the EN2DE model seems less accurate. The main observable difference is how the TF-IDF variables are distributed in the two models.
  - The EN model:
    ![](https://hackmd.io/_uploads/BkuIuFOjt.png)
  - The EN2DE model:
    ![](https://hackmd.io/_uploads/ryQROFOjK.png)

## Previous

**EN model**

1) **First, we uploaded their models and integrated them with the system.**
[Model's block diagram](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#%28DashboardTabs:%28dashboardId:'61add1226bc7cd00125b8cba'%29,Process:%28showDashboards:!f,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.0003808534582977946,x:3742.475613678166,y:595.6057028344246%29%29,_store:%28versions/currentIds:!%28%29%29%29)
![](https://hackmd.io/_uploads/HkApZFuot.png)

2a) **We reviewed the model's metrics on the train, validation and test sets.**
![](https://hackmd.io/_uploads/BkvZMFdiK.png)
[metrics](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0004082280703008075,x:3747.260682123536,y:749.159477104938)),_store:(versions/currentIds:!())))

2b) **As part of the feature exploration, the TF-IDF-related features were examined and found to be strongly correlated with the loss: samples with low TF-IDF values tend to have high loss.**
![](https://hackmd.io/_uploads/SJdCGKuiF.png)
[TF-IDF
metrics](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0004082280703008075,x:3747.260682123536,y:749.159477104938)),_store:(versions/currentIds:!())))

3a) **Looking at the embedding space created by the system, some features, such as `people_dot_with_prob`, play a large role in determining the embedding.**
![](https://hackmd.io/_uploads/rJ5fVKOjK.png)
[Informative feature example](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showAnalyzer:!t,showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0004082280703008075,x:3747.260682123536,y:749.159477104938)),_store:(versions/currentIds:!()),scatter/panel-1639665892093:(ds:loss,hm:input_people_dot_with_prob_value,st:validation,t:(k:1,x:0,y:0)),scatter/panel-1640106878747:(ds:loss,hm:metadata_row_num,st:training,t:(k:1,x:0,y:0)),scatter/panel-1640107223210:(ds:loss,hm:metadata_row_num,st:validation,t:(k:1,x:0,y:0))))

3b) **Others, such as feed and content owner, have little effect on it.**
![](https://hackmd.io/_uploads/SJhL4tOiK.png)
![](https://hackmd.io/_uploads/SyId4tdjK.png)
[Content owner effect on embedding](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showAnalyzer:!t,showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0004082280703008075,x:3747.260682123536,y:749.159477104938)),_store:(versions/currentIds:!()),scatter/panel-1639665892093:(ds:loss,hm:input_content_owner_value,st:validation,t:(k:1,x:0,y:0)),scatter/panel-1640106878747:(ds:loss,hm:metadata_meta_feed,projection:XY,st:validation,t:(k:1,x:0,y:0)),scatter/panel-1640107223210:(ds:loss,hm:metadata_row_num,st:validation,t:(k:1,x:0,y:0))))
[Feed effect on
embedding](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showAnalyzer:!t,showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0004082280703008075,x:3747.260682123536,y:749.159477104938)),_store:(versions/currentIds:!()),scatter/panel-1639665892093:(ds:loss,hm:metadata_meta_feed,st:validation,t:(k:1,x:0,y:0)),scatter/panel-1640106878747:(ds:loss,hm:metadata_meta_feed,projection:XY,st:validation,t:(k:1,x:0,y:0)),scatter/panel-1640107223210:(ds:loss,hm:metadata_row_num,st:validation,t:(k:1,x:0,y:0))))

4) **To make these two features more useful, we replaced the categorical coding of their strings with an embedding layer and retrained the model.**
![](https://hackmd.io/_uploads/HJWxrFuiY.png)
[New model block diagram](https://anyclip.tensorleap.ai/process/model/61c1f685bd03d9001972ee63#%28DashboardTabs:%28dashboardId:'61add1226bc7cd00125b8cba'%29,Process:%28showAnalyzer:!f,showDashboards:!f,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.000363225650326053,x:4922.07011652321,y:420.9780148438694%29%29,VisPanel:%28panel-1640107223210:1%29,_store:%28versions/currentIds:!%28%29%29,scatter/panel-1639665892093:%28ds:loss,hm:input_people_dot_with_prob_value,st:validation,t:%28k:0.863339558574412,x:24.995947043405693,y:14.196308881647681%29%29,scatter/panel-1640106878747:%28ds:loss,hm:metadata_row_num,st:training,t:%28k:0.7453551933994602,x:76.63132019226151,y:18.411145461280455%29%29,scatter/panel-1640107223210:%28ds:loss,hm:metadata_row_num,st:validation,t:%28k:0.6434946236506354,x:57.3662499858836,y:50.713001525152464%29%29%29)

5) **This improved the train, validation and test metrics (the test metric decreased from 0.776 to 0.736 on the EN2DE model).**
![](https://hackmd.io/_uploads/BycmBFOsY.png)
[New model
metrics](https://anyclip.tensorleap.ai/process/model/61c9bf85db3ce10019142405#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showDashboards:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.00040766066254326934,x:3753.967621133087,y:744.5102962857736)),_store:(versions/currentIds:!())))

6) **To illustrate the benefit, the feed variables are much more separable in the latent space.**
The embedding of two separate feeds in the embedding-layer model:
![](https://hackmd.io/_uploads/BJNarKOiF.png)
The embedding of two separate feeds in the original model:
![](https://hackmd.io/_uploads/Skr0BF_jY.png)
[New embedding](https://anyclip.tensorleap.ai/process/model/61c1f685bd03d9001972ee63#(Process:(showAnalyzer:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.00040766066254326934,x:3753.967621133087,y:738.5102962857736)),_store:(versions/currentIds:!()),scatter/panel-1640612661045:(ds:loss,hm:metadata_meta_feed,st:training,t:(k:0.7067481390580825,x:69.5890937215301,y:52.03893651923481)),scatter/panel-1640674337823:(ds:loss,hm:metadata_meta_feed,st:training,t:(k:0.6925547340554626,x:70.60833307703618,y:5.425490900047009)),scatter/panel-training-61c1f685bd03d9001972ee63}:(ds:loss,hm:metadata_row_num,st:training,t:(k:1,x:78.78437042236328,y:405.8634262084961))))
[Old
embedding](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#(Process:(showAnalyzer:!t,showJobsModelSlide:!f),TLEditor:(transform:(k:0.00040766066254326934,x:3753.967621133087,y:738.5102962857736)),_store:(versions/currentIds:!()),scatter/panel-1639665892093:(ds:loss,hm:metadata_meta_feed,projection:XZ,st:training,t:(k:0.5,x:97.95624712922184,y:46.74575272147575)),scatter/panel-1640106878747:(ds:loss,hm:metadata_row_num,st:training,t:(k:1,x:0,y:0)),scatter/panel-1640107223210:(ds:loss,hm:metadata_row_num,st:validation,t:(k:0.5,x:34.69229820682588,y:35.07590860211384)),scatter/panel-1640612661045:(ds:loss,hm:metadata_meta_feed,st:training,t:(k:0.5,x:60.58236053744578,y:59.94368364089803)),scatter/panel-1640674337823:(ds:loss,hm:metadata_meta_feed,st:training,t:(k:0.6925547340554626,x:51.94740625818852,y:22.613189264304825)),scatter/panel-training-61c1f685bd03d9001972ee63}:(ds:loss,hm:metadata_row_num,st:training,t:(k:1,x:0,y:0))))

7) **Finally, we noticed some labeling ambiguity: identical feature values are mapped to different ground-truth values.**
![](https://hackmd.io/_uploads/rkWnUtdjF.png)
[Labeling
ambiguity](https://anyclip.tensorleap.ai/process/model/61c445951327e1001917a681#(DashboardTabs:(dashboardId:'61add1226bc7cd00125b8cba'),Process:(showAnalyzer:!t,showDashboards:!f,showJobsModelSlide:!f),TLEditor:(transform:(k:0.0010869565217391304,x:0,y:0)),VisPanel:(panel-1640618875817:0),_store:(versions/currentIds:!()),scatter/panel-1640268876341:(ds:loss,hm:input_people_dot_with_prob_value,projection:XZ,st:training,t:(k:1,x:0,y:0)),scatter/panel-1640614132952:(ds:similarity,hm:metadata_gt,st:validation,t:(k:0.623300597137549,x:39.42131118541644,y:16.167518364754983)),scatter/panel-1640617189877:(ds:similarity,hm:metadata_gt,st:validation,t:(k:0.5,x:65.88571829326732,y:13.721666310084856)),scatter/panel-1640618701959:(ds:similarity,hm:metadata_gt,st:validation,t:(k:0.6708211124411309,x:64.68616017700134,y:32.547083988332616)),scatter/panel-1640618875817:(ds:similarity,hm:metadata_gt,st:validation,t:(k:0.5,x:48.49075468787269,y:31.663205960619052)),scatter/panel-training-61c445951327e1001917a681}:(ds:loss,hm:metadata_SUBSETrow_num,st:training,t:(k:1,x:0,y:0))))

<h2> Additional comments (Not sure we should discuss) </h2>

- There seems to be a domain gap between the train/validation sets and the test set. This shows up when the model is fine-tuned for too long: the validation metric stays constant while the test metric increases.
- The gap is also visible in the distribution of selected feed values:
  ![](https://hackmd.io/_uploads/BkPtvtdot.png)
  [link](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#%28DashboardTabs:%28dashboardId:'61add1226bc7cd00125b8cba'%29,Process:%28showDashboards:!t,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.00040766066254326934,x:3753.967621133087,y:738.5102962857736%29%29,_store:%28versions/currentIds:!%28%29%29%29)
- There seems to be some difference between the EN and the EN2DE models; mainly, the EN2DE model seems less accurate.
  The main observable difference is how the TF-IDF variables are distributed in the two models.
  - The EN model:
    ![](https://hackmd.io/_uploads/BkuIuFOjt.png)
    [link](https://anyclip.tensorleap.ai/process/model/61bb4d7e3b68340019a796b7#%28DashboardTabs:%28dashboardId:'61add1226bc7cd00125b8cba'%29,Process:%28showDashboards:!t,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.00032506575710113697,x:3876.7862476591567,y:787.5563787348929%29%29,_store:%28versions/currentIds:!%28%29%29%29)
  - The EN2DE model:
    ![](https://hackmd.io/_uploads/ryQROFOjK.png)
    [link](https://anyclip.tensorleap.ai/process/model/61c34a72880fc40019f2d37c#%28DashboardTabs:%28dashboardId:'61add1226bc7cd00125b8cba'%29,Process:%28showDashboards:!t,showJobsModelSlide:!f%29,TLEditor:%28transform:%28k:0.0003960710468986956,x:3758.450031069285,y:774.5401385348595%29%29,_store:%28versions/currentIds:!%28%29%29%29)
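To compare how the TF-IDF values are distributed for the EN and EN2DE models, a two-sample Kolmogorov-Smirnov statistic gives one number for the difference between two empirical distributions. This plain-Python sketch assumes the TF-IDF values behind each plot can be exported as lists; the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of a and b (0 = identical samples,
    1 = the two samples do not overlap at all)."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so the supremum is attained at one of those values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical TF-IDF values behind the EN and EN2DE plots.
tfidf_en = [0.10, 0.20, 0.30, 0.40, 0.50]
tfidf_en2de = [0.30, 0.40, 0.50, 0.60, 0.70]
print(ks_statistic(tfidf_en, tfidf_en2de))
```

The statistic only measures how different the distributions are; whether that difference explains the lower EN2DE accuracy would still need to be checked, for example against per-sample loss.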
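The domain gap between the train/validation and test sets can likewise be summarized per categorical variable. The sketch below computes the total variation distance between the feed-value frequencies of two splits; both the choice of metric and the feed values are assumptions for illustration, not part of the original analysis.

```python
from collections import Counter


def category_distribution(values):
    """Relative frequency of each category value."""
    if not values:
        raise ValueError("cannot build a distribution from an empty sample")
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}


def total_variation(sample_p, sample_q):
    """Total variation distance between two categorical samples:
    0 = same frequencies, 1 = no category in common."""
    p = category_distribution(sample_p)
    q = category_distribution(sample_q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical feed values for the train/validation and test splits.
train_val_feeds = ["feed_a", "feed_a", "feed_b", "feed_b"]
test_feeds = ["feed_a", "feed_b", "feed_c", "feed_c"]
print(total_variation(train_val_feeds, test_feeds))  # 0.5
```

Running this for each categorical input (feed, content owner) would show which variables contribute most to the gap.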
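The labeling ambiguity noted in step 7 can be checked directly by grouping samples with identical inputs and listing those that carry more than one ground-truth value. A minimal sketch, assuming each sample's features are available as a tuple; the optional rounding parameter `ndigits` is a hypothetical knob for float features that should count as identical.

```python
from collections import defaultdict


def ambiguous_inputs(features, labels, ndigits=None):
    """Return {feature_tuple: sorted distinct labels} for every input that
    appears with more than one ground-truth label.

    If ndigits is given, float features are rounded before grouping so that
    near-identical rows fall into the same group.
    """
    labels_by_input = defaultdict(set)
    for x, y in zip(features, labels):
        key = tuple(
            round(v, ndigits) if ndigits is not None and isinstance(v, float) else v
            for v in x
        )
        labels_by_input[key].add(y)
    return {k: sorted(ys) for k, ys in labels_by_input.items() if len(ys) > 1}


# Hypothetical rows: (a numeric feature, feed) with their ground truth.
rows = [(0.2, "feed_a"), (0.2, "feed_a"), (0.7, "feed_b")]
gts = [1, 0, 1]
print(ambiguous_inputs(rows, gts))  # {(0.2, 'feed_a'): [0, 1]}
```

The size of the returned dictionary relative to the dataset gives a rough lower bound on the error no model can avoid on these inputs.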