Response Model (價量關係研究)

--- GA: UA-159972578-2 --- ###### tags: `計量經濟` `Econometrics` `Response Model` `價量關係` # Response Model (價量關係研究) ## 前情提要 ### 何謂 Response Model + 一種價量關係的研究模型 + 使用的模型方法與一般線性/非線性無異 ### Response Model 被用來 1. 調整產品售價 2. 最佳化市場策略 3. 測試有效市場策略 4. 計劃未來行銷活動 ### 總體/個體經濟的假設 + 個經是可以直接做lm，因為個經假設個人可以不受環境影響自行定價 + 總經不行(因為會有內生性問題)，總經才有供需曲線（價量互相影響） ### 控制變數的意義 1. 商管角度情境：沒有做實驗但有一筆資料，只能碰運氣假設該資料的控制變數有控制力 (1) 想研究x對Y(知道還有其他x會影響，但現在不關心) (2) 先用控制變數(已知的會有重大變化的變數)去對Y看效果 (3) 再去加上自變數(想探討的) (4) 在已經控制下，加進去還會有很大的變化嗎？ 2. 實驗設計角度情境：實質意義上的控制 + 知道這個藥是性別有差異，所以研究藥的效果要控制性別 + 只對男生做或只對女生做 + 缺點：只對一個做就沒有一般性 + 讓treatment對control是獨立的 + Random Block + 用每種control變數的組合去定義block + Randomly assign in treatment + 所以control不會影響treatment effect + 但如果只是把問卷搜集來的變數加進去就說有控制，而做問卷的實驗設計根本沒有考慮到的話，該變數其實沒有控制力 ### 名詞補充：PDF & CDF + PDF機率密度函數 + CDF累積分佈函數 ![](https://i.imgur.com/qw8KENs.png) ## 1. Response Model (價量模型) ### 資料介紹 + DisplayCoupon和Display/Coupon不會有同時為1的情況 ```{r} str(sales) ``` ``` ## Classes 'data.table' and 'data.frame': 124 obs. of 6 variables: ## $ OBS : int 1 2 3 4 5 6 7 8 9 10 ... ## $ SALES : num 61.8 11.5 61.6 38.3 31.7 ... ## $ PRICE : num 1.09 1.27 1.27 1.27 1.27 ... ## $ DISPLAY : int 0 0 0 0 0 0 0 0 0 0 ... ## $ COUPON : int 0 0 0 0 0 0 0 0 0 1 ... ## $ DISPLAYCOUPON: int 0 0 0 0 0 0 0 0 0 0 ... ``` ### 價量關係圖 + 價量相關的變數取log後散布圖會比較正常(R裡的log函數是以e為底數的ln) ```{r} plot(log(SALES)~PRICE, data = sales) abline(log.model) ``` ![](https://i.imgur.com/G5qWHK9.png) + 成微負相關，合理 ### Linear Response Ｍodel ```{r} log.model <- lm(log(SALES)~PRICE, data = sales) summary(log.model) ``` ``` ## Call: ## lm(formula = log(SALES) ~ PRICE, data = sales) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.9605 -0.6181 -0.1357 0.5485 2.8918 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 5.0844 0.9139 5.564 1.59e-07 *** ## PRICE -0.6623 0.7866 -0.842 0.401 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.9007 on 122 degrees of freedom ## Multiple R-squared: 0.005776, Adjusted R-squared: -0.002373 ## F-statistic: 0.7088 on 1 and 122 DF, p-value: 0.4015 ``` + 價格上升1%，銷量下降0.66% ### 討論共線性問題 + DISPLAY, COUPON, DISPLAYCOUPON三個DV對銷量之影響 ```{r} aggregate(log(SALES) ~ DISPLAY + COUPON + DISPLAYCOUPON, FUN = mean, data = sales) ``` ``` ## DISPLAY COUPON DISPLAYCOUPON log(SALES) ## 1 0 0 0 3.797571 ## 2 1 0 0 4.657477 ## 3 0 1 0 5.557327 ## 4 0 0 1 5.946818 ``` + DisplayCoupon和Display/Coupon是否有共線性問題? + 須做correlation確認 ```{r} cor(sales$DISPLAY, sales$DISPLAYCOUPON) cor(sales$COUPON, sales$DISPLAYCOUPON) ``` > [1] -0.1473 > [1] -0.08006 + 皆小於0.3，關係極微弱 + 看起來沒有太嚴重的共線性問題 ### 討論遺漏變數問題 + 價量關係與價格息息相關，上述模型可能有遺漏變數問題，例如：Price ```{r} dummy.model <- lm(log(SALES)~ DISPLAY + COUPON + DISPLAYCOUPON, data = sales) coef(dummy.model) ``` ``` ## (Intercept) DISPLAY COUPON DISPLAYCOUPON ## 3.7975707 0.8599068 1.7597567 2.1492468 ``` ```{r} update(dummy.model, . ~ . + PRICE) ``` ``` ## Call: ## lm(formula = log(SALES) ~ DISPLAY + COUPON + DISPLAYCOUPON + ## PRICE, data = sales) ## ## Coefficients: ## (Intercept) DISPLAY COUPON DISPLAYCOUPON PRICE ## 3.4310 0.8747 1.7646 2.1630 0.3123 ``` + 三個DV之係數加入Price後並無重大的改變，Price不是遺漏變數 + Coupon不會影響Price + Price還是原本的Price，降價的錢是算在行銷成本裡 + 會真實影響消費者心理 + 只對價格敏感的人有效果，第三級差別取價(williness to pay)，跟直接降價的意義不同 ### 討論交互作用 + 前提假設： + DisplayCoupon和Display/Coupon之間不完全獨立 + Display和Coupon有交互作用(係數可相加) + 若要做成交互作用項，得先將data改成： + DisplayCoupon和Display/Coupon會有同時為1的情況 ```{r} sales$COU = ifelse(sales$COUPON==1 | sales$DISPLAYCOUPON==1, 1, 0) sales$DIS = ifelse(sales$DISPLAY==1 | sales$DISPLAYCOUPON==1, 1, 0) inter.model <- lm(log(SALES)~ DIS*COU, data = sales) ``` + Dummy模型 vs 交互作用模型： ```{r} rbind(coef(dummy.model), coef(inter.model)) ``` ``` (Intercept) DISPLAY COUPON DISPLAYCOUPON [1,] 3.798 0.8599 1.76 2.1492 [2,] 3.798 0.8599 1.76 -0.4704 ``` + Display/Coupon係數相加的效果，反而比DisplayCoupon還低 + 有負的交互作用 + 邊際效用遞減，可能是又是折價又是廣告很煩人 ### 加入 Dynamic Variables (Lagged effect) + 假設價量之間有carry-over effect(滯後影響) 1. Time span between marketing activities and response 2. Evaluation of several time periods by back shifting + 動態變數函數 + lag()：前期變數(整排延後一期) + lead()：後期變數(整排領先一期) ```{r} head(cbind(sales$PRICE, lag(sales$PRICE))) ``` ``` ## [,1] [,2] ## [1,] 1.090000 NA ## [2,] 1.271818 1.090000 ## [3,] 1.271818 1.271818 ## [4,] 1.271818 1.271818 ## [5,] 1.271818 1.271818 ## [6,] 1.271818 1.271818 ``` #### 1. 看價格改變在跨期之影響 * 當期價格：lag(price) * 前期價格：price ```{r} lag.model <- lm(log(SALES) ~ lag(PRICE) + PRICE, data = sales) summary(lag.model) ``` ``` ## Call: ## lm(formula = log(SALES) ~ lag(PRICE) + PRICE, data = sales) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.7875 -0.5700 -0.1230 0.6069 2.2250 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.9059 0.9163 4.263 4.05e-05 *** ## lag(PRICE) 4.9349 1.2282 4.018 0.000103 *** ## PRICE -4.5790 1.2249 -3.738 0.000285 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.8523 on 120 degrees of freedom ## (1 observation deleted due to missingness) ## Multiple R-squared: 0.1239, Adjusted R-squared: 0.1093 ## F-statistic: 8.482 on 2 and 120 DF, p-value: 0.0003585 ``` + 因為都是價格變數(差了一期反應期)，以降價的角度來看：每降價1%的時候銷售量會變高4.58%。 + 降價的當期：銷量上升 + 負負得正：每次降價(負的)*負向係數 + 降價的前期：需求下降(顯著) + 可能是預期心理，購買行為hold住 + 例如雙十一就快開始，十月的銷量減少 + 降價的後期：需求下降(不顯著) + 降價效果沒有帶來什麼特別的好處，只是防守性的策略。 1. 可能大家都降價，所以只好跟著降價 2. 該公司有經濟規模，可以透過降價策略去打敗小公司 #### 2. 看COUPON的跨期影響 ```{r} update(lag.model, . ~ . + COUPON + lag(COUPON)) ``` ``` ## Call: ## lm(formula = log(SALES) ~ lag(PRICE) + PRICE + COUPON + lag(COUPON), ## data = sales) ## ## Coefficients: ## (Intercept) lag(PRICE) PRICE COUPON lag(COUPON) ## 3.833 4.843 -4.505 1.354 -0.384 ``` + 折價券在第一期正相關，第二期是負相關 + 代表折價券有邊際效果遞減現象，不該接連著發折價券 ### 選擇模型變數(Dropping predictors) + AIC(Akaike Information Criterion) + 引入懲罰項 + 降低overfitting + AIC越低越好 ```{r} AIC(lm(log(SALES) ~ PRICE + lag(PRICE) + DISPLAY + lag(DISPLAY) + COUPON + DISPLAYCOUPON + lag(DISPLAYCOUPON), data = sales)) ``` > [1] 189.1284 + Backward Selection + 起始是full model : y=b0+b1x1+b2x2+b3x3….+e + 減少變數找出RSS最小或R2最大 + 針對各個模型找出AIC,BIC,adjusted R2表現最好的模型組合 ``` final.model <- stepAIC(extended.model, direction = "backward", trace = F) summary(final.model) ``` ## 2. Nonlinear Response Model (個人選擇模型) ### 資料介紹 ```{r} str(choice) ``` ``` ## Classes 'data.table' and 'data.frame': 2798 obs. of 13 variables: ## $ OBS : int 1 2 3 4 5 6 7 8 9 10 ... ## $ HOUSEHOLDID : int 1 1 1 1 1 1 1 1 1 1 ... ## $ LASTPURCHASE : int 0 0 0 0 0 0 0 0 0 0 ... ## $ BUD : int 1 1 1 1 1 1 1 1 1 1 ... ## $ HOPPINESS : int 0 0 0 0 0 0 0 0 0 0 ... ## $ PRICE.BUD : num 0.052 0.052 0.046 0.052 0.046 0.047 0.046 0.046 0.047 0.05 ... ## $ PRICE.HOP : num 0.034 0.044 0.048 0.034 0.048 0.03 0.041 0.031 0.031 0.028 ... ## $ DISPL.BUD : int 0 0 0 0 0 0 0 0 0 0 ... ## $ DISPL.HOP : int 0 0 0 0 0 0 0 0 0 1 ... ## $ FEAT.BUD : int 0 0 1 0 1 0 0 0 0 0 ... ## $ FEAT.HOP : int 0 0 0 0 0 0 0 0 0 0 ... ## $ FEATDISPL.BUD: int 0 0 0 0 0 0 0 0 0 0 ... ## $ FEATDISPL.HOP: int 0 0 0 0 0 0 1 0 0 0 ... ``` ### 兩品牌的購買狀況及價格 ```{r} colMeans(choice[,c("HOPPINESS","BUD","PRICE.HOP","PRICE.BUD")]) ``` ``` ## HOPPINESS BUD PRICE.HOP PRICE.BUD ## 0.10007148 0.89992852 0.03355468 0.03482761 ``` ### 兩品牌價格比率 ```{r} choice<-choice %>% mutate(price.ratio = log(choice$PRICE.HOP/choice$PRICE.BUD)) #加上price.ratio variable head(cbind(choice$price.ratio, choice$PRICE.HOP, choice$PRICE.BUD)) ``` ``` ## [,1] [,2] [,3] ## [1,] -0.42488319 0.034 0.052 ## [2,] -0.16705408 0.044 0.052 ## [3,] 0.04255961 0.048 0.046 ## [4,] -0.42488319 0.034 0.052 ## [5,] 0.04255961 0.048 0.046 ## [6,] -0.44895022 0.030 0.047 ``` + price.ratio=hope.price/bud.price + Ratio小：表示HOP價格 < BUD價格 + Ratio大：表示HOP價格 > BUD價格 ### 關係圖 ```{r} plot(HOPPINESS ~ price.ratio, data = choice) abline(probability.model) ``` ![](https://i.imgur.com/R4IQINe.png) + 比Linear Model還fit data，因為這裡是選擇模型，所以用適合分類的Logistic Model ### Logistic Response Model ```{r} logistic.model <- glm(HOPPINESS ~ price.ratio, family = binomial, data = choice) summary(logistic.model) ``` ``` ## Call: ## glm(formula = HOPPINESS ~ price.ratio, family = binomial, data = choice) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.0750 -0.4280 -0.2353 -0.0765 3.3212 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -3.5727 0.1418 -25.19 <2e-16 *** ## price.ratio -6.7388 0.4006 -16.82 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1820.0 on 2797 degrees of freedom ## Residual deviance: 1290.4 on 2796 degrees of freedom ## AIC: 1294.4 ## ## Number of Fisher Scoring iterations: 7 ``` + 選擇是一種類別，所以用Logistic而非Linear Model + 在Logistic Model裡coef指的是log(odds_ratio) + 其中odds_ratio= P(x)/1-P(x) + 取完exponential後，還要做點轉換才能直接使用該數字解釋 ### margins() 計算係數 + 利用margins()可把模型的係數算成能夠直接解釋的量 + 因為y的機率隨著x而變(參考下面cplot)，因此margin算出來的值是一個平均值 ```{r} margins(logistic.model) ``` > price.ratio > -0.4585 + 如果price.ratio下降1單位，HOPPINESS的購買機率將上升46%。 ### cplot ```{r} x <- seq(from = -2, to = 2, by = 0.5) cplot(logistic.model, "price.ratio", xvals = x) ``` ``` ## xvals yvals upper lower ## 1 -2.0 9.999501e-01 1.000017e+00 9.998832e-01 ## 2 -1.5 9.985511e-01 9.999247e-01 9.971774e-01 ## 3 -1.0 9.595381e-01 9.814489e-01 9.376272e-01 ## 4 -0.5 4.493509e-01 5.011249e-01 3.975768e-01 ## 5 0.0 2.731358e-02 3.469811e-02 1.992906e-02 ## 6 0.5 9.653428e-04 1.589547e-03 3.411382e-04 ## 7 1.0 3.324924e-05 6.762153e-05 -1.123048e-06 ## 8 1.5 1.144171e-06 2.773075e-06 -4.847335e-07 ## 9 2.0 3.937192e-08 1.108188e-07 -3.207499e-08 ``` ![](https://i.imgur.com/lUPG3dM.png) + 根據X(price.ration)的變動單位，Y的預測機率主要效果落在-1~0之間 + 不用降價/升價太多就可以預測Y的機率 ### 解釋模型的預測值可能很不準 * 模型要做準就不能只用OLS這種最簡單好算的方式 * 預測性模型要用maximum likelihood（逼近） * 模型愈複雜likelihood就會愈大，所以會用AIC（每增加一個變數，自由度會扣掉一個penalty）或BIC去看，怕過度適配 * 給定看的到係數是多少，給定模型出現data的機率 * 一直改變模型係數，想辦法找到一個係數組合看到的機率最大 ### Predictive Performance + 混淆矩陣(Confusing Matrix) ```{r} predicted <- ifelse(fitted(extended.model) >= 0.5, 1, 0) observed <- choice$HOPPINESS prop.table(table(predicted, observed)) ``` ``` ## observed ## predicted 0 1 ## 0 0.88849178 0.07755540 ## 1 0.01143674 0.02251608 ``` + ROC(Receiver operating characteristics) curve * Sensitivity=TP/(TP+FN) * Specificity=TN/(TN+FP) ![](https://i.imgur.com/bgImviG.png) ```{r} ROC <- roc(predictor = fitted(extended.model), response = observed) plot(ROC) ``` ![](https://i.imgur.com/ruS3NNM.png) ### 驗證模型 ```{r} train.data <- subset(choice, subset = LASTPURCHASE ==0) test.data <-subset(choice, subset = LASTPURCHASE == 1) train.model <- glm(HOPPINESS ~ price.ratio + FEAT.HOP + FEATDISPL.HOP, family = binomial, data = train.data) probability <- predict(train.model, test.data, type = "response") predicted <- ifelse(probability >=0.5, 1, 0) observed <- test.data$HOPPINESS prop.table(table(predicted, observed)) ``` ``` ## observed ## predicted 0 1 ## 0 0.923333333 0.063333333 ## 1 0.006666667 0.006666667 ``` ### Cutoff選擇延伸 * 機器學習預測完會有一個Distribution Predict Probability（y=0和y=1的兩個常態機率分佈） * 將這個Distribution Predictive Probility選定了一個cutoff後，才會變成一個action * actual的混淆矩陣 * 將他與domain know how 找出來的一個cost矩陣相乘加總 * 找出一個最大預期報償矩陣 ## 3. Probit Response Model + 假設wiliness to pay是常態分佈或logistic分佈使用 + ![](https://i.imgur.com/TVCqCKm.png =35%x)其為累積標準常態分類函數 ```{r} probit.model <- glm(HOPPINESS ~ price.ratio, family = binomial(link = probit), data = choice) summary(probit.model) ``` ``` ## Call: ## glm(formula = HOPPINESS ~ price.ratio, family = binomial(link = probit), ## data = choice) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -1.9149 -0.4482 -0.2266 -0.0408 3.6124 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -1.95409 0.06728 -29.04 <2e-16 *** ## price.ratio -3.54755 0.20609 -17.21 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1820.0 on 2797 degrees of freedom ## Residual deviance: 1291.5 on 2796 degrees of freedom ## AIC: 1295.5 ## ## Number of Fisher Scoring iterations: 7 ``` + Probit Model中的coef為Z-value的計算。 + 當Y為二元時，其條件期望值為Y=1的條件機率。 + 因此由X變動引起Y的期望變動為Y=1的機率變動。 + 給定X1,X2,X3,…..Xk下，Y=1的預測機率值可利用計算Z值，Z=b0+b1x1+b2x2+….+bkxk，然後查常態分配表的Z值。 + 計算自變數變動的效應可利用： (1) 計算自變數初始值時的預測機率值 (2) 計算自變數新值或變動後的值的預測機率值 (3) 求他們的差 ### 比較Probit Model & Logistic Model的係數 ```{r} cbind(coef(probit.model), coef(logistic.model)) ``` ``` ## [,1] [,2] ## (Intercept) -1.954092 -3.572678 ## price.ratio -3.547546 -6.738768 ``` + logistic coef = log-odds + probit coef = z-values ### margins() 計算係數 ```{r} margins(probit.model) ``` > price.ratio > -0.4503 + 與Logistic Model差異不大