# Quality Adjustment

## To-do List

* Compile all of the functions together
* Cleaning processes
    * Import the data and delete all unnecessary rows and columns
        * Unnecessary rows and columns include:
            * Completely empty rows and columns
            * Rows and columns added only for convenience in Excel
    * Rename the columns as follows:
        * operator
        * promotion_name
        * promotion_type
        * promotion_duration
        * promotion_beginning
        * promotion_end
        * promotion_price
        * voice_minute_quota
        * sms_number
        * mms_number
        * internet_quota_min
        * internet_quota_mb
        * internet_quota_gb
            * Note: This column will later be merged into internet_quota_mb, and only internet_quota_mb will be kept. The merged value is internet_quota_mb = internet_quota_mb + 1024 * internet_quota_gb
        * sms_price
        * mms_price
        * excessive_internet_price_per_min
            * Note: This column will later be combined with excessive_internet_price_per_hour, and only the hourly column will be kept. The combined value is min(excessive_internet_price_per_min * 60, excessive_internet_price_per_hour)
        * excessive_internet_price_per_hour
        * first_min_voice_price
            * Note: This column will later be dropped, leaving only avg_voice_price
        * second_min_voice_price
            * Note: This column will later be dropped, leaving only avg_voice_price
        * third_min_voice_price
            * Note: This column will later be dropped, leaving only avg_voice_price
        * fourth_min_voice_price
            * Note: This column will later be dropped, leaving only avg_voice_price
        * note
            * Note: We will extract more information from this column later on
    * Fix the "Merge & Center" problem from Excel by forward filling the NaN cells in the following columns:
        * operator
        * promotion_name
        * promotion_type
        * promotion_duration
        * promotion_beginning
        * promotion_end
    * Keep only postpaid packages
        * Drop all other package types, then check whether any additional fixes are needed
    * Remove packages with strange/erratic prices
        * Example: non-numeric prices
    * Replace any value consisting only of '-' with NaN
    * Convert all text to lowercase so that fewer spelling variants need fixing
    * Remove units from values
        * For example, change '300 บ.' to '300' (บ. is the Thai abbreviation for baht)
    * Add the following new columns:
        * avg_voice_price: The unweighted average of first_min_voice_price, second_min_voice_price, third_min_voice_price, and fourth_min_voice_price
            * Formula: (first_min_voice_price + second_min_voice_price + third_min_voice_price + fourth_min_voice_price) / 4
            * Reasoning: Avoid a potential multicollinearity problem caused by very similar values across the four voice prices
        * na_avg_voice_price: A dummy variable equal to 1 when avg_voice_price is NaN, and 0 otherwise
        * na_sms_price: A dummy variable equal to 1 when sms_price is NaN, and 0 otherwise
        * na_mms_price: A dummy variable equal to 1 when mms_price is NaN, and 0 otherwise
        * na_excessive_internet_price_per_hour: A dummy variable equal to 1 when excessive_internet_price_per_hour is NaN, and 0 otherwise
        * na_excessive_internet_price_per_mb: A dummy variable equal to 1 when excessive_internet_price_per_mb is NaN, and 0 otherwise

## List of Included Features

* Operator
    * The name of the operator offering each package
* Package Price (Y variable, i.e. the dependent variable)
    * The original price as given in the source dataset
* Voice Quota (Unit of minutes)
    * The number of call minutes included in the promotion
* SMS Quota (Unit of messages)
    * The number of SMS messages included in the promotion
* MMS Quota (Unit of messages)
    * The number of MMS messages included in the promotion
* Internet Time Quota (Unit of minutes)
    * The number of minutes of internet use included in the promotion
* Internet Memory Quota (Unit of MB)
    * The amount of internet data, in MB, included in the promotion
* Average Voice Price (Unit of Baht/minute)
    * The average price per minute that a user has to pay in order
      to call someone after the voice quota has been used up
    * Note: The original data give a separate price for each of the first four minutes beyond the quota. Because most of these prices are identical across the four minutes, keeping them as separate features would cause a collinearity issue.
* NA Average Voice Price (Flag - 1: True/0: False)
    * Flag indicating whether the average voice price is missing or cannot be calculated
    * The value is 1 if it is missing, and 0 otherwise.
* SMS Price (Unit of Baht/message)
    * The price of each SMS sent after the provided quota has been used up
* NA SMS Price (Flag - 1: True/0: False)
    * Flag indicating whether the SMS price is missing or cannot be calculated
    * The value is 1 if it is missing, and 0 otherwise.
* MMS Price (Unit of Baht/message)
    * The price of each MMS sent after the provided quota has been used up
* NA MMS Price (Flag - 1: True/0: False)
    * Flag indicating whether the MMS price is missing or cannot be calculated
    * The value is 1 if it is missing, and 0 otherwise.
* Excessive Internet Price Per Hour (Unit of Baht/hour)
    * The price of each hour of internet use beyond the quota
* NA Excessive Internet Price Per Hour (Flag - 1: True/0: False)
    * Flag indicating whether the excessive internet price per hour is missing or cannot be calculated
    * The value is 1 if it is missing, and 0 otherwise.
* Excessive Internet Price Per MB (Unit of Baht/MB)
    * The price of each MB of internet use beyond the quota

## List of Network Providers

The carriers in the data files are listed below by parent operator, with the years in which each name appears in parentheses.

* AIS - AWN (2013, 2014, 2016, 2017), AIS (2013, 2014, 2015), DPC (2015, a subsidiary of AIS)
* DTAC - DTAC (2014, 2015, 2016, 2017), dtac trinet (2013, 2014)
* TRUE - RF (2013, 2014), True Mobile (2015, 2016), TRUEMOVE H (2017)
* CAT - CAT (2016), My By Cat (2017)
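The List of Network Providers amounts to a lookup from each raw carrier name to one of four parent operators. The following is a minimal Python sketch, assuming the raw names in the data are spelled as listed there (apart from case and extra spaces); `OPERATOR_ALIASES` and `normalize_operator` are hypothetical names, not part of the project.

```python
# Hypothetical lookup from raw carrier names (lowercased, as in the To-do
# List) to their parent operators, taken from "List of Network Providers".
OPERATOR_ALIASES = {
    "awn": "ais",
    "ais": "ais",
    "dpc": "ais",
    "dtac": "dtac",
    "dtac trinet": "dtac",
    "rf": "true",
    "true mobile": "true",
    "truemove h": "true",
    "cat": "cat",
    "my by cat": "cat",
}


def normalize_operator(raw_name: str) -> str:
    """Map a raw carrier name to its parent operator.

    Raises KeyError for names not in the table, so unexpected spellings
    surface during cleaning instead of passing through silently.
    """
    # Lowercase, trim, and collapse repeated spaces before the lookup.
    key = " ".join(raw_name.strip().lower().split())
    return OPERATOR_ALIASES[key]
```

For example, `normalize_operator("TRUEMOVE H")` returns `"true"`. Raising on unknown names is a deliberate choice; `OPERATOR_ALIASES.get(key, key)` would instead keep unrecognized names unchanged.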
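The cell-level cleaning steps in the To-do List (forward filling the merged-cell columns, mapping '-' to NaN, lowercasing, and stripping units) could look roughly like the sketch below. It assumes each row has already been read into a plain Python dict keyed by the renamed columns, that empty Excel cells load as None or NaN, and that a numeric cell is a number followed by an optional unit containing no digits. All function names are hypothetical.

```python
import math
import re

# Columns left blank by Excel's "Merge & Center" (from the To-do List).
FILL_DOWN_COLUMNS = [
    "operator", "promotion_name", "promotion_type",
    "promotion_duration", "promotion_beginning", "promotion_end",
]

# Assumption: a number, then an optional digit-free unit, e.g. "300 บ." or "1,024 mb".
_NUMBER_WITH_UNIT = re.compile(r"\s*(\d[\d,]*(?:\.\d+)?)\s*\D*")


def is_missing(value):
    """True for None or float NaN (how empty cells are assumed to load)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def forward_fill(rows, columns=FILL_DOWN_COLUMNS):
    """Copy the last non-missing value down into missing cells of `columns`."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            if is_missing(new_row.get(col)):
                if col in last_seen:
                    new_row[col] = last_seen[col]
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


def clean_cell(value, numeric=False):
    """Lowercase text, map '-' to NaN, and strip units from numeric cells."""
    if not isinstance(value, str):
        return value                  # numbers and NaN pass through unchanged
    text = value.strip().lower()      # fewer spelling variants to handle later
    if text == "-":
        return math.nan
    if numeric:
        match = _NUMBER_WITH_UNIT.fullmatch(text)
        if match:
            # Drop the unit and any thousands separators.
            return float(match.group(1).replace(",", ""))
    return text
```

Following the order of the To-do List, the forward fill runs before '-' becomes NaN, so cells that genuinely read '-' are not overwritten by the value above them. Units are stripped only when `numeric=True`, so text columns such as promotion_name are only lowercased. In pandas, the fill-down step corresponds to `df[FILL_DOWN_COLUMNS] = df[FILL_DOWN_COLUMNS].ffill()`.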
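The new and combined columns described in the To-do List can be derived one row at a time, as in the hedged sketch below. The formulas come from the To-do List, but it does not say how missing values should be handled inside the internet-quota sum or the `min(...)`. The choices here (a missing part counts as 0 unless both parts are missing; if only one internet price is known, that price is used) are assumptions, and `add_derived_features` is a hypothetical name.

```python
import math

# Column names taken from the To-do List.
VOICE_PRICE_COLUMNS = [
    "first_min_voice_price", "second_min_voice_price",
    "third_min_voice_price", "fourth_min_voice_price",
]
NA_FLAG_SOURCES = [
    "avg_voice_price", "sms_price", "mms_price",
    "excessive_internet_price_per_hour", "excessive_internet_price_per_mb",
]


def _missing(value):
    """True for None or float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def add_derived_features(row):
    """Return a copy of one cleaned row with merged, averaged, and flag columns."""
    out = dict(row)

    # internet_quota_mb = internet_quota_mb + 1024 * internet_quota_gb; drop the GB column.
    # Assumption: a missing part counts as 0 unless both parts are missing.
    mb = out.get("internet_quota_mb", math.nan)
    gb = out.pop("internet_quota_gb", math.nan)
    if _missing(mb) and _missing(gb):
        out["internet_quota_mb"] = math.nan
    else:
        out["internet_quota_mb"] = (0 if _missing(mb) else mb) + 1024 * (0 if _missing(gb) else gb)

    # min(per_min * 60, per_hour); drop the per-minute column.
    # Assumption: if only one of the two prices is known, use that one.
    per_min = out.pop("excessive_internet_price_per_min", math.nan)
    per_hour = out.get("excessive_internet_price_per_hour", math.nan)
    candidates = []
    if not _missing(per_min):
        candidates.append(per_min * 60)
    if not _missing(per_hour):
        candidates.append(per_hour)
    out["excessive_internet_price_per_hour"] = min(candidates) if candidates else math.nan

    # Unweighted mean of the four voice prices (dropped afterwards); NaN if any is missing.
    voice = [out.pop(col, math.nan) for col in VOICE_PRICE_COLUMNS]
    out["avg_voice_price"] = math.nan if any(_missing(v) for v in voice) else sum(voice) / 4

    # na_<column> dummies: 1 when the value is missing, 0 otherwise.
    for col in NA_FLAG_SOURCES:
        out["na_" + col] = int(_missing(out.get(col)))
    return out
```

The returned row no longer contains internet_quota_gb, excessive_internet_price_per_min, or the four per-minute voice prices, matching the notes in the column list; the input row is not modified.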