大數據行銷作業二 b05504066

大數據行銷作業二 b05504066 === ## Data input 1. static_reports.csv 2. dynamic_reports.csv 3. real_time_reports.csv ```python import pandas as pd import numpy as np %config Completer.use_jedi = False %config IPCompleter.greedy=True pd.set_option('display.max_rows', 200) ##Read three databases## static_db = pd.read_csv("static_reports.csv") dynamic_db = pd.read_csv("dynamic_reports.csv") real_time_db = pd.read_csv("real_time_reports.csv") # real_time_db ``` ## Useful information get some information that is useful for building RMF table Change the datetime format in real_time_table Since last recorded date is 2007-12-31, we assign "*today*" to be 2008-01-01 ```python ## take unique categories and its appearance from one cloumn appearance = real_time_db['Customer_ID'].value_counts() Customer_lst = real_time_db['Customer_ID'].value_counts().index ## change time format real_time_db['Date'] = pd.to_datetime(real_time_db['Date'],format='%Y%m%d') real_time_db['Date'].sort_values today = pd.to_datetime("2008-01-01") real_time_db['Date'] ``` 0 2006-01-01 1 2006-01-02 2 2006-01-04 3 2006-01-04 4 2006-01-04 ... 3072 2007-12-30 3073 2007-12-30 3074 2007-12-31 3075 2007-12-31 3076 2007-12-31 Name: Date, Length: 3077, dtype: datetime64[ns] ## Working on monetary add all amount for every customer_id ```python Monetary_dict={} for id in Customer_lst: idx = real_time_db.index[real_time_db['Customer_ID']==id] sum = real_time_db.iloc[idx]['Amount'].sum() Monetary_dict[id]=sum M_series = pd.Series(Monetary_dict) # M_series ``` ## Working on recency calculate the recency for every customer_id Recency_mth for bob stone ```python Recency_dict={} Recency_mth_dict = {} for id in Customer_lst: idx = real_time_db.index[real_time_db['Customer_ID']==id] dif = (today-real_time_db.iloc[idx]['Date'].max()).days dif_mth = (today-real_time_db.iloc[idx]['Date'].max())/ np.timedelta64(1, "M") Recency_dict[id]=dif Recency_mth_dict[id] = dif_mth R_series = pd.Series(Recency_dict) # R_series.sort_values() ``` ## Working on frequency calculate appeared times for every customer_id (Already done in "*appearance*") ```python F_series = appearance # F_series ``` ## Five equal groups Each category is divided into 5 segments. 5- highest 1- lowest ```python Cus_num = len(R_series) R_score_dic={} F_score_dic={} M_score_dic={} score= [5,4,3,2,1] step = int(Cus_num/5) bound = range(0,Cus_num,step) count=0 score_idx=0 R_series = R_series.sort_values() for i in range(Cus_num): R_score_dic[R_series.index[i]]=score[score_idx] F_score_dic[F_series.index[i]]=score[score_idx] M_score_dic[M_series.index[i]]=score[score_idx] count += 1 if (count == step): count=0 score_idx += 1 R_score = pd.Series(R_score_dic) F_score = pd.Series(F_score_dic) M_score = pd.Series(M_score_dic) ``` ## Bob Stone's scale Index | scoring rule | weights :-----: |:-------------: |:-------: R | < 3 months: 24 pts 3 ~ 6 months: 12 pts 6 ~ 9 months: 6 pts 9 ~ 12 months: 3 pts > 12 months: 0 pt| medium F | Frequency *　4 | high M | Monetary * 0.3 % (max = 9pts) | low ```python R_series_mth = pd.Series(Recency_mth_dict).sort_values() R_score_bob_dic={} F_score_bob_dic={} M_score_bob_dic={} for idx in Customer_lst: # R mth = R_series_mth[idx] if mth <= 3: R_score_bob_dic[idx]=24 elif mth <= 6: R_score_bob_dic[idx]=12 elif mth <= 9: R_score_bob_dic[idx]=6 elif mth <= 12: R_score_bob_dic[idx]=3 else: R_score_bob_dic[idx]=0 # F F_score_bob_dic[idx] = F_series[idx] * 4 # M a = M_series[idx] * 0.003 if a <= 9: M_score_bob_dic[idx] = a else: M_score_bob_dic[idx] = 9 R_score_bob = pd.Series(R_score_bob_dic) F_score_bob = pd.Series(F_score_bob_dic) M_score_bob = pd.Series(M_score_bob_dic) ``` ## Concatenation of series concatenate all columns in one dataframe. Index | Definition ---------- | -------------- R_raw, F_raw, M_raw | Original value R, F, M|Ranking based on equally-divided strategy R_bob, F_bob, M_bob | Score based on Bob Stone strategy rmf_equal | Total of original RFM ranking rfm_bs | Total of Bob Stone score equal_rank, bs_rank | The real ranking of the customer, less means valuable equal_rank_scaleup | Since the maximmum ranking in original srategy is 15, we multiply it by 6 to compare with BS_rankng. rank_diff | Difference between scale-up equal_rank and bs_rank R_score_my, F_score_my, M_score_my | score based on my rules rfm_my | total score based on my rules my_rank | ranking based on my rules ```python result = pd.concat([R_series,F_series,M_series,R_score,F_score,M_score,R_score_bob,F_score_bob,M_score_bob],\ axis=1,keys=['R_raw','F_raw','M_raw','R','F','M','R_bob','F_bob','M_bob'],\ names='Customer_ID') result['rfm_equal'] = result['R'] + result['F'] + result['M'] result['rfm_bs'] = result['R_bob'] + result['F_bob'] + result['M_bob'] result.sort_values('rfm_equal',ascending = False) result['equal_rank']=result['rfm_equal'].rank(method='dense',ascending=False) result['bs_rank']=result['rfm_bs'].rank(method='dense',ascending=False) # print(result['bs_rank'].max()) result['equal_rank_scaleup']=result['equal_rank']*6 result['rank_diff'] = abs(result['equal_rank_scaleup'] - result['bs_rank']) # print(result) ``` ```python result.query('rank_diff > 18') a =result.query('M_bob == 9 ').index # import matplotlib.pyplot as plt print("The minimum monetary value to receive 9 pts:",result.loc[a]['M_raw'].min()) ``` The minimum monetary value to receive 9 pts: 3024 ## My scale Index | scoring rule | weights :-----: |:-------------: | :-------: R | < 1 months: 20 pts 1 ~ 3 months: 13 pts 3 ~ 6 months: 2 pts 6 ~ months: 0 pts | medium F | Frequency *　4 | high M | Monetary * 0.05 % (max = 10pts) | low ## My scoring rules I doesn't change the frequency's rule of Bob Stone since this index is also important in our supermarket cases. We would first discuss Monetary index: First of all, Bob stone scale is based on USD, therefore we change the percentage from 10% to 0.3%. We knew that Bob's scale doesn't care about the monetary value, so it's quite normal that almost every customer has the highest score, 9 in this case. However, although the monetary value is also the least important index in supermarket, about 3/4 is max score. The lowest value in this segmanet is 3024NTD, which is pretty low for a total spending among two years. Due to this unbalance distribution, the scale is revised down from 0.3% to 0.05%, and the highest point is adjusted to 10 , which in result reduces the count of the max-point segment to 71 and implies that customers who are marked 10 spend about 20000NTD among this two years, a much reasonable number. Next, we would discuss on Recency index: The ordinary cut-off in Bob's scale is three months, which is apparently too long for supermarket's customer. Buying groceries should be weekly or even daily routine. Based on this concept, I revised the breakpoint from 3, 6, 9, 12 months to 1, 3, 6 months. Customers who don't buy things for a while will loss their score from 20 ,13 to 5 and 0. After applying these two changes, if we set the customer whose ranking is below 40 as VIP, we found out that no customer raise their ranking, but about 40 customers of which ranking is below 40 in Bob's scale fall upon 70 in our scale. **This implies that we have delete 61 people out of our VIP list, and remain about 47 people.** These customer should be well treated with customized strategies, and for others, maybe we could just send same coupons to them. ```python R_score_my_dic={} F_score_my_dic={} M_score_my_dic={} for idx in Customer_lst: # R mth = R_series_mth[idx] if mth <= 1: R_score_my_dic[idx]=20 elif mth <= 3: R_score_my_dic[idx]=13 elif mth <= 6: R_score_my_dic[idx]=5 else: R_score_my_dic[idx]=0 # F F_score_my_dic[idx] = F_series[idx] * 4 # M a = M_series[idx] * 0.0005 if a <= 9: M_score_my_dic[idx] = a else: M_score_my_dic[idx] = 10 # print(R_score_my_dic) result['R_score_my'] = result.index.to_series().map(R_score_my_dic) result['F_score_my'] = result.index.to_series().map(F_score_my_dic) result['M_score_my'] = result.index.to_series().map(M_score_my_dic) result['rfm_my'] = result['R_score_my'] + result['F_score_my'] + result['M_score_my'] result.query('M_score_my ==10').shape[0] result['my_rank']=result['rfm_my'].rank(method='dense',ascending=False) # result.loc[:,['bs_rank','equal_rank_scaleup','my_rank']] result.iloc[:,6:] print("people whose rank increases:",result.query('bs_rank > my_rank ').shape[0]) print("people whose original rank is below 40 but falls out to 70 in my scale:",result.query('bs_rank < 40 & my_rank > 70').shape[0]) print("people whose original rank is lower than 40:",result.query('bs_rank < 40').shape[0]) print("people whose rank is lower than 40 in our scale:",result.query('my_rank < 40').shape[0]) result.to_csv("rfmtable.csv",index=False) ``` people whose rank increases: 0 people whose original rank is below 40 but falls out to 70 in my scale: 16 people whose original rank is lower than 40: 106 people whose rank is lower than 40 in our scale: 47 ```python ``` ###### tags: `大數據行銷`