# Quality Adjustment
## To-do List
* Compile all of the functions together
* Cleaning Processes
* Import the data and delete all unnecessary rows and columns
  * The unnecessary cases are as follows
    * Completely empty rows and columns
    * Rows and columns that exist only for convenience in Excel
* Rename the columns to the following
  * operator
  * promotion_name
  * promotion_type
  * promotion_duration
  * promotion_beginning
  * promotion_end
  * promotion_price
  * voice_minute_quota
  * sms_number
  * mms_number
  * internet_quota_min
  * internet_quota_mb
  * internet_quota_gb
    * Note: This feature will later be merged into internet_quota_mb, leaving only internet_quota_mb; the formula is internet_quota_mb = internet_quota_mb + 1024 * internet_quota_gb
  * sms_price
  * mms_price
  * excessive_internet_price_per_min
    * Note: This feature will later be merged into excessive_internet_price_per_hour, leaving only the hourly price; the formula is excessive_internet_price_per_hour = min(excessive_internet_price_per_min * 60, excessive_internet_price_per_hour)
  * excessive_internet_price_per_hour
  * first_min_voice_price
    * Note: This feature will later be dropped, leaving only avg_voice_price
  * second_min_voice_price
    * Note: This feature will later be dropped, leaving only avg_voice_price
  * third_min_voice_price
    * Note: This feature will later be dropped, leaving only avg_voice_price
  * fourth_min_voice_price
    * Note: This feature will later be dropped, leaving only avg_voice_price
  * note
    * Note: More information will be extracted from this column later on
* Fix the "Merge & Center" problem from Excel by forward-filling the NaN values in the following columns
  * operator
  * promotion_name
  * promotion_type
  * promotion_duration
  * promotion_beginning
  * promotion_end
* Keep only postpaid packages
  * That is, eliminate all other packages, then check whether any additional fix is needed
* Remove packages with strange or erratic prices
  * Example: non-numeric values
* Replace any value consisting only of '-' with NaN
* Convert all capital letters to lowercase so that fewer cases need to be fixed
* Remove the unit from every value
  * For example: change '300 บ.' to '300' ('บ.' is the abbreviation for baht)
* Add the following new columns
  * avg_voice_price: The unweighted average of first_min_voice_price, second_min_voice_price, third_min_voice_price, and fourth_min_voice_price
    * Formula: (first_min_voice_price + second_min_voice_price + third_min_voice_price + fourth_min_voice_price) / 4
    * Reasoning: Avoid a potential multicollinearity problem caused by very similar values across the four voice prices
  * na_avg_voice_price: A dummy variable equal to 1 when avg_voice_price is NaN, and 0 otherwise
  * na_sms_price: A dummy variable equal to 1 when sms_price is NaN, and 0 otherwise
  * na_mms_price: A dummy variable equal to 1 when mms_price is NaN, and 0 otherwise
  * na_excessive_internet_price_per_hour: A dummy variable equal to 1 when excessive_internet_price_per_hour is NaN, and 0 otherwise
  * na_excessive_internet_price_per_mb: A dummy variable equal to 1 when excessive_internet_price_per_mb is NaN, and 0 otherwise
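
The cleaning steps above could be sketched in pandas roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the columns have already been renamed as listed, the helper names (`clean_cell`, `strip_unit`, `clean_packages`) and the `NUMERIC_COLUMNS` subset are hypothetical, and the unit pattern assumes each value is a plain number optionally followed by unit text. Treating a missing quota or price as absent rather than propagating NaN in the two merge formulas is also an assumption. The postpaid filter is left out because the labels used in promotion_type are not given here.

```python
import re

import numpy as np
import pandas as pd

# Columns left partly empty by merged Excel cells (taken from the list above).
FFILL_COLUMNS = [
    "operator", "promotion_name", "promotion_type",
    "promotion_duration", "promotion_beginning", "promotion_end",
]
# Illustrative subset of the numeric columns; extend as needed.
NUMERIC_COLUMNS = [
    "promotion_price", "internet_quota_mb", "internet_quota_gb",
    "excessive_internet_price_per_min", "excessive_internet_price_per_hour",
]
# Assumed value shape: a plain number, optionally followed by unit text.
NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*\D*")


def clean_cell(value):
    """Lower-case strings and turn a lone '-' into NaN."""
    if not isinstance(value, str):
        return value
    text = value.strip().lower()
    return np.nan if text == "-" else text


def strip_unit(value):
    """Change '300 บ.' to '300'; leave anything else untouched."""
    if not isinstance(value, str):
        return value
    match = NUMBER_WITH_UNIT.fullmatch(value)
    return match.group(1) if match else value


def clean_packages(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a DataFrame with the renamed columns."""
    df = df.copy()
    # Undo "Merge & Center": carry the last seen value down into NaN rows.
    df[FFILL_COLUMNS] = df[FFILL_COLUMNS].ffill()
    df = df.apply(lambda col: col.map(clean_cell))
    # Strip units, then coerce; non-numeric leftovers become NaN.
    for col in NUMERIC_COLUMNS:
        df[col] = pd.to_numeric(df[col].map(strip_unit), errors="coerce")
    # internet_quota_mb = internet_quota_mb + 1024 * internet_quota_gb
    # (assumption: a missing side counts as 0 unless both are missing).
    df["internet_quota_mb"] = df["internet_quota_mb"].add(
        1024 * df["internet_quota_gb"], fill_value=0
    )
    # min(per_min * 60, per_hour), ignoring a missing side.
    hourly = pd.concat(
        [df["excessive_internet_price_per_min"] * 60,
         df["excessive_internet_price_per_hour"]],
        axis=1,
    ).min(axis=1)
    df["excessive_internet_price_per_hour"] = hourly
    df = df.drop(columns=["internet_quota_gb", "excessive_internet_price_per_min"])
    # Packages whose price is not numeric are removed.
    return df.dropna(subset=["promotion_price"])
```

Unit stripping is applied only to the numeric columns so that a promotion name that happens to start with a number is not truncated.
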
## List of Included Features
* Operator
The operator name of each package
* Package Price (Y variable)
The price is taken as provided in the original dataset
* Voice Quota (Unit of minutes)
The number of voice minutes that a user can use under the given promotion
* SMS Quota (Unit of number of messages)
The number of SMS messages that a user can send under the given promotion
* MMS Quota (Unit of number of messages)
The number of MMS messages that a user can send under the given promotion
* Internet Time Quota (Unit of minutes)
The number of internet minutes that a user can use for the given promotion
* Internet Memory Quota (Unit of MBs)
The amount of internet data, in MB, that a user can use under the given promotion
* Average Voice Price (Unit of Baht/minute)
The average price per minute that a user pays for calls after the voice quota has been used up
Note: The original data give a separate price for each of the first four excess minutes. However, most of these prices are identical across the four minutes, so including them separately would cause a collinearity issue.
* NA Average Voice Price (Flag - 1: True/0: False)
A flag that is 1 if the average voice price is missing or cannot be calculated, and 0 otherwise
* SMS Price (Unit of Baht/message)
The price of each SMS sent by a user after the provided quota has been used up
* NA SMS Price (Flag - 1: True/0: False)
A flag that is 1 if the SMS price is missing or cannot be calculated, and 0 otherwise
* MMS Price (Unit of Baht/message)
The price of each MMS sent by a user after the provided quota has been used up
* NA MMS Price (Flag - 1: True/0: False)
A flag that is 1 if the MMS price is missing or cannot be calculated, and 0 otherwise
* Excessive Internet Price Per Hour (Unit of Baht/hour)
The price of each hour of internet use beyond the quota
* NA Excessive Internet Price Per Hour (Flag - 1: True/0: False)
A flag that is 1 if the excessive internet price per hour is missing or cannot be calculated, and 0 otherwise
* Excessive Internet Price Per MB (Unit of Baht/MB)
The price of each MB of internet use beyond the quota
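
The average voice price and the four NA flags listed above could be derived along these lines. This is a rough sketch under stated assumptions: the function name `add_derived_features` is hypothetical, the input columns are assumed to be numeric already, and the average is left as NaN whenever any of the four per-minute prices is missing, which is what the plain sum-divided-by-four formula implies.

```python
import pandas as pd

VOICE_PRICE_COLUMNS = [
    "first_min_voice_price", "second_min_voice_price",
    "third_min_voice_price", "fourth_min_voice_price",
]
# Features that have an NA flag in the list above.
FLAGGED_COLUMNS = [
    "avg_voice_price", "sms_price", "mms_price",
    "excessive_internet_price_per_hour",
]


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add avg_voice_price and the na_* flags, then drop the four minute prices."""
    df = df.copy()
    # Unweighted mean of the four prices; NaN if any of them is missing.
    df["avg_voice_price"] = df[VOICE_PRICE_COLUMNS].mean(axis=1, skipna=False)
    for col in FLAGGED_COLUMNS:
        # 1 when the value is missing, 0 otherwise.
        df[f"na_{col}"] = df[col].isna().astype(int)
    return df.drop(columns=VOICE_PRICE_COLUMNS)
```
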
## List of Network Providers
The carriers appear in the data files under the following names.
* AIS - AWN (2013, 2014, 2016, 2017), AIS (2013, 2014, 2015), DPC (2015) - subsidiary of AIS
* DTAC - DTAC (2014, 2015, 2016, 2017), dtac trinet (2013, 2014)
* TRUE - RF (2013, 2014), True Mobile (2015, 2016), TRUEMOVE H (2017)
* CAT - CAT (2016), My By Cat (2017)
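
One way to collapse these name variants into a single operator label is a lookup table. The sketch below is illustrative only: the function name `normalize_operator` is hypothetical, and matching on trimmed, lower-cased names is an assumption about how the raw values are spelled in the files.

```python
import pandas as pd

# Raw carrier names from the list above (lower-cased) mapped to their operator.
CARRIER_TO_OPERATOR = {
    "awn": "AIS", "ais": "AIS", "dpc": "AIS",
    "dtac": "DTAC", "dtac trinet": "DTAC",
    "rf": "TRUE", "true mobile": "TRUE", "truemove h": "TRUE",
    "cat": "CAT", "my by cat": "CAT",
}


def normalize_operator(names: pd.Series) -> pd.Series:
    """Map raw carrier names to AIS, DTAC, TRUE, or CAT (NaN if unknown)."""
    keys = names.str.strip().str.lower()
    return keys.map(CARRIER_TO_OPERATOR)
```

Names that are not in the table come back as NaN, which makes unexpected spellings easy to spot before modeling.
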