# MLE - Grocery Recommendation

## Motivation

For this analysis, I will help a grocery store discover insights from sales data that could be used for:

- targeted direct mail marketing (specific coupons mailed to customers)
- targeted email marketing ("An item you like has gone on sale!")
- online 'add to cart' recommendations, based on similar items and on items purchased by other people who bought the same item

## Dataset

- Link: https://www.kaggle.com/psparks/instacart-market-basket-analysis

#### The data is divided into 6 files:

- aisles.csv: 134 unique aisle numbers and descriptions
- department.csv: 21 unique department numbers and descriptions
- products.csv: 49,688 product ids with description, aisle id and department id
- orders.csv: 3,421,083 unique order ids, with user id, order number, order_dow, order_hour_of_day and days_since_prior_order
- Order_products__train.csv: order id, product id, add-to-cart order and reorder indicator
- Order_products__prior.csv: order id, product id, add-to-cart order and reorder indicator

## Steps to tackle the problem:

### EDA:

*Goal:* Check for and fix any abnormal data, and identify patterns in the data, such as:

- purchasing patterns across orders
- which aisles and departments are ordered from the most, down to the product level
- the typical number of items in each order and how many days users go before their next order

### Clustering:

*Goal:* Cluster users with K-means to find groups of similar users for targeted email marketing.

- create and calculate metrics that add dimensions to each data point for clustering
- estimate and visualise how the groups differ from each other

### NLP Search Engine:

*Goal:* A search engine that takes any text a user enters and returns the products relevant to that text.

- stemmed the text data (basically removing suffixes) and used a count vectorizer to generate numerical representations of the words
- when a search is entered, the query goes through the same stemming and vectorizing and is compared to the existing product base using cosine similarity (the cosine of the angle between the two word vectors)
- some reference links:
  - https://towardsdatascience.com/a-beginners-guide-to-stemming-in-natural-language-processing-34ddee4acd37
  - https://towardsdatascience.com/understanding-cosine-similarity-and-its-application-fd42f585296a

### Recommendation system:

*Goal:* Recommend products for the buyer to add to the cart.

- created a "model-based" recommendation system using Singular Value Decomposition (SVD) that generates predictions of a user's rating of an item that they have not previously ordered
- I will use the number of times a user has purchased an item as a stand-in for an actual rating. This generated a rating scale of 1 - 100 for each user and product combination

Reference link: https://towardsdatascience.com/recommender-system-singular-value-decomposition-svd-truncated-svd-97096338f361

### Market Basket Analysis:

*Goal:* Based on the products already in the cart, recommend products that the buyer is likely to add next.

Reference link: https://towardsdatascience.com/affinity-analysis-market-basket-analysis-c8e7fcc61a21
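
To make the market basket idea concrete, here is a minimal sketch in plain Python. It is not this project's implementation: the helper names `build_pair_counts` and `recommend`, the toy baskets, and the choice of pairwise confidence, P(candidate | item in cart), as the ranking score are all assumptions made for illustration. A real analysis would more likely run a full association-rule algorithm over the order-product files.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(baskets):
    """Count how many baskets contain each item and each unordered item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def recommend(cart, item_counts, pair_counts, top_n=3):
    """Rank items not in the cart by their best confidence P(candidate | cart item)."""
    scores = {}
    for in_cart in set(cart):
        if item_counts[in_cart] == 0:
            continue  # item never seen in the historical baskets
        for (a, b), together in pair_counts.items():
            if in_cart == a:
                candidate = b
            elif in_cart == b:
                candidate = a
            else:
                continue
            if candidate in cart:
                continue
            # confidence = baskets with both items / baskets with the cart item
            confidence = together / item_counts[in_cart]
            scores[candidate] = max(scores.get(candidate, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for stable output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy baskets with made-up product names (hypothetical, not from the dataset).
baskets = [
    ["banana", "milk", "bread"],
    ["banana", "milk"],
    ["banana", "yogurt"],
    ["milk", "bread"],
    ["bread", "butter"],
]
item_counts, pair_counts = build_pair_counts(baskets)
recommendations = recommend(["banana"], item_counts, pair_counts)
print(recommendations)
```

With these toy baskets, milk ranks first for a cart containing banana, because two of the three banana baskets also contain milk. Confidence alone favours products that are popular overall, so a measure such as lift may be a better ranking signal on the real data.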