Improving SmartQuotes Product Recommendation

# Improving SmartQuotes Product Recommendation

<span style="color:#F49AC2">Drishti Bhasin</span>

<span style="color:#85A7FF">Mentor : Karthik S V</span>

<span style="color:#85A7FF">Manager : Vidhyasagar Alvarsamy</span>

---

## <span style="color:#85A7FF">Problem Statement</span>

<div style="text-align: left">

</br>

#### <span style = "color:#F5F5DC"> The vision behind introducing product recommendation within SmartQuotes is to: </span>

* Finalize a quote for an enterprise customer in the <span style = "color:#F49AC2"> minimum amount of time </span>.
* <span style = "color:#F49AC2"> Reduce the cognitive load </span> on sellers, who otherwise go through customer history manually to draft a quote.
* Provide <span style = "color:#F49AC2"> accurate </span> renewal quote recommendations.

As part of my project, I explored new techniques for predicting a renewal quote from its corresponding base quote.

</div>

---

## <span style="color:#85A7FF">Current Approach</span>

<div style="text-align: left">

</br>

### <span style = "color:#F5F5DC"> Architecture Overview </span>

A deep neural network is trained with the base quote (i.e. the purchase data from the expiring agreement term) as input and the product list of the renewal quote as output.

</div>

</br>
</br>

<div style="text-align: left">

### <span style = "color:#F5F5DC"> How are the quotes represented? </span>

</div>

<img style="float: center;" src="https://i.imgur.com/o7eoKsc.png" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="700" height = "300" class="left">

<style> .reveal { font-size: 14px; } </style>

---

## <span style="color:#85A7FF">Solution </span>

<div style="text-align: left">

### <span style = "color:#F5F5DC"> Treating products as labels </span>

* Treat this problem as a <span style = "color:#F49AC2">Multi-label Classification</span> problem.
</br>

* Each product is a label, and each base quote is assigned multiple labels.
</br>

* These labels become the product list of the renewal quote.

The main challenge with this approach is <span style = "color:#F49AC2">the number of choices we have.</span> Conventional multi-label algorithms can barely scale to problems involving five thousand choices.

</div>

<div style="text-align: left">

### <span style = "color:#F5F5DC"> Enter Extreme Classification </span>

* This paradigm tackles multi-label problems involving an extremely large number of choices.
* Our focus was on two algorithms in this area, <span style = "color:#F49AC2">DeepXML</span> and <span style = "color:#F49AC2">GalaxC</span>.
* Both are architectures for annotating data points<span style = "color:#F49AC2"> (base quotes)</span> with the most relevant subset of labels<span style = "color:#F49AC2"> (renewal quote)</span> from an extremely large label set<span style = "color:#F49AC2"> (product list)</span>.

</div>

<style> .reveal { font-size: 20px; } </style>

---

## <span style="color:#85A7FF">DeepXML</span>

<br clear="left"/>

<div style="text-align: left">

#### <span style = "color:#F5F5DC"> Decomposes the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently</span>

</div>

<img src="https://i.imgur.com/gUvqB8X.jpg" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="500" height = "200">

</br>
<br clear="left"/>

<div style="text-align: left">

This algorithm was deployed on the Bing search engine for a number of short-text applications, ranging from matching user queries to advertiser bid phrases to showing personalized ads.

</div>

<br clear="left"/>

<div style="text-align: left">

#### <span style = "color:#F49AC2"> Why is it suitable for our use case? </span>

It provides a modular framework in which any of the four modules can be replaced to customise the model for a given problem statement.
One of its offshoots, **GalaxC**, is explored in the slides ahead.

</div>

<style> .reveal { font-size: 20px; } </style>

</br>

---

## <span style="color:#85A7FF">Architecture Overview</span>

| <span style = "color:#F5F5DC">Module 1</span> </br> <span style = "color:#F49AC2"> Intermediate Representations </span> | <span style = "color:#F5F5DC">Module 2</span> </br> <span style = "color:#F49AC2"> Label </br> Shortlisting </span> | <span style = "color:#F5F5DC">Module 3</span> </br> <span style = "color:#F49AC2"> Learning Final Representations </span> | <span style = "color:#F5F5DC">Module 4</span> </br> <span style = "color:#F49AC2"> Joint Classifier Training </span> |
| -------- | -------- | -------- | --- |
| * A feature extractor is trained on a non-extreme surrogate task. </br></br> * The motivation is to train our classifiers in logarithmic time. | * Eliminate all labels which don't contribute much to learning. </br></br> * The motivation is to reduce the problem from an extreme problem to a traditional classification problem. | * Use transfer learning to obtain the final representations. </br></br> * The motivation is to get all the accuracy gains from fine-tuning while maintaining logarithmic costs. | * Jointly train a classifier along with the final feature representations. </br></br> * Only the shortlisted labels from Module 2 will be used. |

<img style="float: center;" src="https://i.imgur.com/i1JMLJL.png" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="400" height ="250">

---

## <span style="color:#85A7FF">GalaxC </span>

<div style="text-align: left">

#### <span style = "color:#F5F5DC"> Thinking of Extreme Classification as a link prediction problem in vast bipartite graphs </span>

</div>

<img src="https://i.imgur.com/e3RYJuE.png" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="400"/>

<br clear="left"/>

<div style="text-align: left">

There is an edge (𝑑, 𝑙) ∈ E between base quote 𝑑 ∈ D and label 𝑙 ∈ L if 𝑙 is a positive label for 𝑑.

</div>

<div style="text-align: left">

#### <span style = "color:#F49AC2"> This joint graph enables: </span>

</div>

<br clear="left"/>

<div style="text-align: left">

* Rich correlations to be learnt between the data points and the labels. For example, suppose base quotes 𝑑1 and 𝑑2 share a common label 𝑙1. If there is another label 𝑙2 relevant to 𝑑2, it can be inferred that 𝑙2 might be relevant to 𝑑1 as well.
* Better handling of scenarios where the data point representation is not very expressive.

</div>

<style> .reveal { font-size: 20px; } </style>

---

## <span style="color:#85A7FF">Architecture Overview</span>

| <span style = "color:#F5F5DC">Module 1</span> </br> <span style = "color:#F49AC2"> Learning Node </br> Representations </span> | <span style = "color:#F5F5DC">Module 2</span> </br> <span style = "color:#F49AC2"> Label Shortlisting </span> | <span style = "color:#F5F5DC">Module 3</span> </br> <span style = "color:#F49AC2"> Fine-Tuning </br> Parameters </span> | <span style = "color:#F5F5DC">Module 4</span> </br> <span style = "color:#F49AC2"> Joint Classifier Training </span> |
| -------- | -------- | -------- | --- |
| * Refine initial one-hot encoded representations using a Graph Neural Network. </br> * Assign a weight to each representation based on its importance. </br> * Use them in a one-vs-all classifier to score each label. | Same as DeepXML | * Parameters of the </br> Graph Neural Network </br> remain fixed. </br></br> * Weights for the </br> one-vs-all classifiers </br> and the attention network are fine-tuned. | Same as DeepXML |

<img style="float: right;" src="https://i.imgur.com/mB1uX3r.png" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="400" height="250">
<img style="float: left;" src="https://i.imgur.com/4cF4G27.png" alt="u denotes the set of base quotes and v denotes the corresponding set of labels" width="400" height ="250">

---

## <span style="color:#85A7FF">Results</span>

<br clear="left"/>

<div style="text-align: left">

### <span style = "color:#F5F5DC"> Metrics </span>

Shown below are the results obtained from both algorithms, each trained for 20 epochs on a CPU, compared with the metrics of the original approach (the ANN column).

| Metric    | ANN | DeepXML | GalaxC |
| --------- | --- | ------- | ------ |
| Recall    | 40  | 30      | 19     |
| Precision | 56  | 55      | 51     |

### <span style = "color:#F5F5DC"> Analysis </span>

* Neither algorithm has yet yielded better results than the original approach.
* Using label metadata, training for longer, and hyperparameter tuning can be explored to improve the results.

</br>

### <span style = "color:#F5F5DC"> Challenges </span>

* Working with a very sparse dataset.
* Frequent memory management issues with the GalaxC implementation.
* Both algorithms were devised with textual inputs in mind, so using them for our tabular dataset was quite challenging.

</div>

<style> .reveal { font-size: 20px; } </style>

</br>

---

## <span style="color:#85A7FF">Looking Forward</span>

* Using textual metadata for the base quote representations.
* Modularising the code so that the team can further explore this paradigm and experiment with different datasets.
* Trying Siamese Networks for clustering the base quotes and the labels.
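
As one possible building block for such a modular experimentation setup, a shared evaluation helper could score any of the models (ANN, DeepXML, GalaxC) in the same way. The slides do not state how the recall and precision in the Results table were computed, so the sketch below assumes a top-k formulation averaged over base quotes. The function names `precision_recall_at_k` and `mean_precision_recall` and the cutoff `k` are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    scores: Sequence[float], relevant: Set[int], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for a single base quote.

    scores[i] is the model's score for product (label) i, and `relevant`
    holds the indices of the products that appear in the renewal quote.
    This top-k definition is an assumption, not the deck's stated metric.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank labels by descending score; the top k form the recommendation.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    hits = sum(1 for i in top_k if i in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(
    all_scores: Sequence[Sequence[float]],
    all_relevant: Sequence[Set[int]],
    k: int,
) -> Tuple[float, float]:
    """Average precision@k and recall@k over a set of base quotes."""
    pairs = [
        precision_recall_at_k(s, r, k) for s, r in zip(all_scores, all_relevant)
    ]
    if not pairs:
        raise ValueError("no base quotes to evaluate")
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


if __name__ == "__main__":
    # Two toy base quotes scored against a four-product and a three-product list.
    print(precision_recall_at_k([0.9, 0.1, 0.8, 0.3], {0, 3}, k=2))
    print(mean_precision_recall([[0.9, 0.1, 0.8, 0.3], [0.2, 0.7, 0.1]],
                                [{0, 3}, {1}], k=1))
```

Keeping the metric independent of any one model means the same function can be reused when swapping modules or datasets, so numbers stay comparable across experiments.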
---

## <span style="color:#85A7FF">Learnings</span>

* Working with real datasets and adapting research papers from different domains to the data I had.
* Being introduced to a paradigm I had never worked with before and understanding its nuances.
* Developing good programming practices and building an experimentation framework out of my research paper implementations, for ease of use by my team members.
* Lastly, I'm so grateful to have gotten this opportunity to work in a team of highly passionate individuals, where I got to be a part of really fun brainstorming sessions and learnt something new every single day.

<style> .reveal { font-size: 20px; } </style>

---

### Thank You

</br>