# Shopee competition
## Date: 11/21
## 20201115
- Convert the labels in data_description.txt to numeric values
- Apply one-hot encoding to string columns (this adds label columns, which works around the problem that strings cannot be used for training)
## 20200919
## Python preprocessing libraries
- csv (standard library): https://docs.python.org/3/library/csv.html
- NumPy
- pandas
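A minimal sketch of reading tabular data with the standard-library `csv` module. The column names and the in-memory sample are made up for illustration; a real file would be opened with `open(path, newline="")` instead.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file
sample = io.StringIO(
    "height_cm,weight_kg,category\n"
    "170,65,shoes\n"
    "160,,bags\n"
)

# DictReader maps each row to a dict keyed by the header line
rows = list(csv.DictReader(sample))

# Every value is read as a string; an empty cell comes back as ""
missing = [r for r in rows if r["weight_kg"] == ""]
print(len(rows), len(missing))
```

Because `csv` returns strings only, numeric conversion and missing-value handling still have to be done afterwards, which is where NumPy and pandas usually come in.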
## Data preprocessing: Reduce data and increase computing efficiency
- How to select the labels (columns) used to compute the result via machine learning
- How to deal with missing data
  - Add a substitute value
  - Use the average
  - Drop the row
- Use the original labels to create new labels
  - e.g. use height and weight to calculate BMI
- ML models cannot deal with strings directly
  - If strings are converted to numbers, the model may read the numbers as an order and produce misleading results
  - Use one-hot encoding -> increases the number of columns
- Column reduction
  - A correlation coefficient may help determine whether two columns should be merged
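The preprocessing ideas above can be sketched with pandas roughly as follows. The toy data, the column names, and the `"unknown"` placeholder are assumptions for illustration, not part of the competition data.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions
df = pd.DataFrame({
    "height_cm": [170.0, 160.0, 180.0, 175.0],
    "weight_kg": [65.0, None, 80.0, 70.0],
    "category": ["shoes", "bags", "shoes", None],
})

# Missing data, three options from the notes
filled_mean = df["weight_kg"].fillna(df["weight_kg"].mean())  # use the average
filled_placeholder = df["category"].fillna("unknown")         # add a substitute value
dropped = df.dropna()                                         # drop incomplete rows

# Continue with the filled version so no rows are lost
clean = df.assign(weight_kg=filled_mean, category=filled_placeholder)

# New label from existing ones: BMI = weight (kg) / height (m)^2
clean["bmi"] = clean["weight_kg"] / (clean["height_cm"] / 100) ** 2

# One-hot encoding: one new column per distinct string value
encoded = pd.get_dummies(clean, columns=["category"])

# Pearson correlation between numeric columns, as a hint for column reduction
corr = encoded[["height_cm", "weight_kg", "bmi"]].corr()
print(encoded.columns.tolist())
print(corr.round(2))
```

A pair of columns with a correlation close to 1 or -1 carries largely the same information, so one of them is a candidate for removal or merging. What threshold counts as "close" is a judgment call.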
## Procedure
- Data preprocessing > split the data 20-80 or 30-70 > ML
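A minimal sketch of the split step, assuming "20-80" means 20% of the rows held out for testing and 80% used for training. The toy `X` and `y` are made up; `train_test_split` is from scikit-learn.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy features and labels
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

# test_size=0.2 gives the 20-80 split; use 0.3 for 30-70
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible between runs, which helps when comparing models.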
## ML
- Library: scikit-learn (sklearn)
- Parameter tuning: grid search
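One way grid search could look with scikit-learn's `GridSearchCV`. The iris dataset, the decision tree model, and the parameter grid are stand-ins chosen for illustration, not what the competition used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Every combination in the grid is trained and scored with 5-fold cross-validation
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # best combination found on the training folds
print(search.score(X_test, y_test))  # accuracy of the refit model on held-out data
```

Grid search cost grows with the product of the list lengths in `param_grid` (here 3 x 2 = 6 combinations, each fitted 5 times), so small grids are a sensible starting point.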
## How to evaluate a model
- MSE
- F-score
- precision
- accuracy
- recall
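To make the metrics concrete, here is a small plain-Python sketch for the binary case. The function names and sample labels are hypothetical, and the F-score shown is the F1 variant; `sklearn.metrics` provides equivalents for real use.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions
m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(m)
print(mse([3.0, 5.0], [2.0, 7.0]))
```

MSE applies to regression outputs, while accuracy, precision, recall, and F-score apply to classification. Which one matters most depends on how costly false positives are compared with false negatives.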
## Shopee Training
- https://www.happycoding.today/
- https://www.sharecourse.net/sharecourse/
## 2020
- https://www.kaggle.com/c/shopee-product-detection-open
- https://www.kaggle.com/c/shopee-sentiment-analysis
## 2019
- https://www.kaggle.com/c/undrg-rd1-listings
- https://www.kaggle.com/c/ungrd-rd2-auo
## Emsber TODO
- Study algorithms and statistics
- Find Kaggle practice problems
- Study Python and ML
- Jupyter
- colab
## Reference
- https://www.youtube.com/watch?v=Mq_Ksga9uHY