# Shopee competition

## Date: 11/21

## 20201115

- Quantify the labels described in data_description.txt (convert them to numbers)
- One-hot encode string columns (this adds more labels/columns, and solves the issue that strings cannot be used for training)

## 20200919

## Python preprocessing lib

- csv: https://docs.python.org/3/library/csv.html
- NumPy
- pandas

## Data preprocessing: reduce data and increase computing efficiency

- How to select the labels used to compute the result via machine learning
- How to deal with missing data
  - Fill in another value
  - Use the average
  - Drop the data
- Use the original labels to create new labels
  - e.g. use height and weight to calculate BMI
- ML cannot deal with strings
  - If strings are converted to numbers, the model may read the numbers as an order and produce misleading results
  - Use one-hot encoding -> increases the number of columns
- Column reduction
  - A correlation coefficient may help determine whether two columns should be merged

## Procedure

- Data preprocessing -> split the data 20/80 or 30/70 (test/train) -> ML

## ML

- Library: sklearn
- Parameter tuning: grid search

## How to evaluate model

- MSE
- F-score
- Precision
- Accuracy
- Recall

## Shopee Training

- https://www.happycoding.today/
- https://www.sharecourse.net/sharecourse/

## 2020

- https://www.kaggle.com/c/shopee-product-detection-open
- https://www.kaggle.com/c/shopee-sentiment-analysis

## 2019

- https://www.kaggle.com/c/undrg-rd1-listings
- https://www.kaggle.com/c/ungrd-rd2-auo

## Emsber TODO

- Study algorithms and statistics
- Find Kaggle practice problems
- Python and ML study
- Jupyter
- Colab

## Reference

- https://www.youtube.com/watch?v=Mq_Ksga9uHY
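## Example: preprocessing to evaluation (sketch)

A minimal sketch tying the data preprocessing, procedure, ML, and evaluation notes together, assuming pandas and scikit-learn are installed. It shows mean-filling vs. dropping missing values, deriving BMI from height and weight, one-hot encoding a string column, flagging highly correlated column pairs, an 80/20 train/test split, grid search, and the listed metrics. The column names (`height_cm`, `weight_kg`, `gender`), the helper functions, the "BMI > 23" target, the 0.9 correlation threshold, and the decision-tree model with its parameter grid are all hypothetical choices for illustration; the data is randomly generated, not competition data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_toy_data(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical, randomly generated rows used only to exercise the pipeline."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "height_cm": rng.normal(165, 10, n),
        "weight_kg": rng.normal(60, 12, n),
        "gender": rng.choice(["F", "M"], n),
    })
    # Blank out ~5% of the weights to simulate missing data.
    df.loc[rng.choice(n, n // 20, replace=False), "weight_kg"] = np.nan
    return df


def preprocess(df: pd.DataFrame, drop_missing: bool = False) -> pd.DataFrame:
    """Clean the hypothetical height/weight/gender table."""
    df = df.copy()
    if drop_missing:
        # Option A: drop every row that has a missing value.
        df = df.dropna()
    else:
        # Option B: fill missing numeric values with the column average.
        for col in ["height_cm", "weight_kg"]:
            df[col] = df[col].fillna(df[col].mean())
    # Create a new feature from existing ones: BMI = kg / m^2.
    df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
    # One-hot encode the string column instead of mapping it to 0, 1, 2, ...
    # so the model does not read an artificial order into the categories.
    return pd.get_dummies(df, columns=["gender"], dtype=int)


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, |r|) for column pairs that may be worth merging."""
    corr = df.corr().abs()  # Pearson correlation; assumes all columns are numeric
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r >= threshold:  # NaN (constant column) compares False
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


def train_and_evaluate(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> dict:
    """80/20 split, grid search over a decision tree, then score on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_grid={"max_depth": [1, 2, 3, None], "min_samples_leaf": [1, 5]},
        cv=3,
    )
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        # MSE is mainly a regression metric; on 0/1 labels it equals 1 - accuracy.
        "mse": mean_squared_error(y_test, pred),
    }


if __name__ == "__main__":
    clean = preprocess(make_toy_data())
    print("Correlated pairs:", highly_correlated_pairs(clean))
    # Hypothetical binary target: 1 if BMI is above 23, else 0.
    y = (clean["bmi"] > 23).astype(int)
    print(train_and_evaluate(clean, y))
```

Mean filling keeps every row but can flatten the distribution of that column, while `dropna` is simpler but discards data. Changing `test_size=0.2` to `0.3` gives the 30/70 split option instead.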