# Cosmed Sale Research 20200815

## Field feedback

- After the sale opened, the category page still showed "not yet on sale"; after the item sold out, the category page did not show "sold out"
- Two layers of CDN caused a 4–6 min delay
- The product page kept showing "Notify me when on sale"
- Favorites failed to load
- iOS home page showed a blank (white) page
- Error / latency seen in monitoring went down

## What to look into

```
Baseline

墊腳石 8/12 flash sale:
- time (UTC+8) 0812 17:50 ~ 0812 18:10
- max RPM 93k

康是美 8/15 flash sale:
- time (UTC+8) 0815 10:50 ~ 0815 11:10
- max RPM 75k
```

- [ ] RealTimeData responses are slow / failing
- [ ] Favorites page responses are slow / failing
- [ ] Home page responses are slow / failing
- [x] Did changing the timeout produce more timeout errors, or do requests that time out do so regardless of whether the limit is 5s or 20s?
- [x] Did errors increase?
- [ ] Would adding a CDN help?

## Items to bring up for discussion

- [ ] Hypothesis for the rise in timeout errors (p100 -> p90, 10x timeouts)

## Research

### Did errors increase?

> Researcher: `@tomaz`

- Comparison of the two days
  - 康是美 8/15 flash sale: max RPM 75k, error sum 1500 * 10 = 15000
  - 墊腳石 8/12 flash sale: max RPM 93k, error sum 2819
- The absolute count is much higher, but the proportions are similar
- Breaking the errors down by cause also shows a similar distribution

**康是美 8/15 flash sale - error count**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-15T02:50:00.000Z`
- end-time: `2020-08-15T03:10:00.000Z`
- query-string:

```
filter ( type = "ERROR" and tag = "OperationLoggingApolloServerPlugin" )
| stats count() as cnt by data.operationName
| sort cnt desc
```
-------------------------------------------
| data.operationName | cnt |
|----------------------------------|------|
| iOS_salePageRealTimeData | 740 |
| android_salePage_realtime_info | 614 |
| iOS_salePageAdditionalInfo | 27 |
| cms_shopCategory_default_orderby | 27 |
| iOS_salePageInfo | 27 |
| cms_shopCategory | 24 |
| android_salePage | 18 |
| android_salePage_extra | 8 |
| iOS_shopCategory | 6 |
| `<no-operation>` | 3 |
| cms_shopCategory_promotion_list | 2 |
| android_getSalePageInit | 2 |
| android_searchHotKeywords | 1 |
| cms_layoutTemplate_spCatAd_list | 1 |
| total | 1500 |
-------------------------------------------

**康是美 8/15 flash sale - breakdown by cause**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-15T02:50:00.000Z`
- end-time: `2020-08-15T03:10:00.000Z`
- query-string:

```
filter ( type = "ERROR" and tag = "OperationLoggingApolloServerPlugin" )
| parse err.message /text: (?<reason>.+)/
| stats count() as cnt by data.operationName, data.error.type, reason
| sort cnt desc
```
------------------------------------------------------------------------------
| data.operationName | data.error.type | reason | cnt |
|----------------------------------|-----------------|-----------------|-----|
| iOS_salePageRealTimeData | Error | request timeout | 732 |
| android_salePage_realtime_info | Error | request timeout | 608 |
| cms_shopCategory_default_orderby | Error | request timeout | 24 |
| cms_shopCategory | Error | request timeout | 19 |
| iOS_salePageAdditionalInfo | Error | request timeout | 18 |
| iOS_salePageInfo | Error | request timeout | 18 |
| android_salePage | PythiaError | | 9 |
| iOS_salePageAdditionalInfo | PythiaError | | 9 |
| iOS_salePageInfo | PythiaError | | 9 |
| android_salePage | Error | request timeout | 9 |
| iOS_salePageRealTimeData | PythiaError | | 8 |
| android_salePage_extra | Error | request timeout | 7 |
| android_salePage_realtime_info | PythiaError | | 6 |
| iOS_shopCategory | Error | request timeout | 6 |
| cms_shopCategory | PythiaError | | 5 |
| cms_shopCategory_default_orderby | PythiaError | | 3 |
| `<no-operation>` | Error | request timeout | 3 |
| android_getSalePageInit | PythiaError | | 2 |
| cms_shopCategory_promotion_list | Error | request timeout | 2 |
| android_searchHotKeywords | Error | request timeout | 1 |
| android_salePage_extra | PythiaError | | 1 |
| cms_layoutTemplate_spCatAd_list | Error | request timeout | 1 |
------------------------------------------------------------------------------

**墊腳石 8/12 flash sale - error count**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-12T09:50:00.000Z`
- end-time: `2020-08-12T10:10:00.000Z`
- query-string:

```
filter ( type = "ERROR" and tag = "OperationLoggingApolloServerPlugin" )
| stats count() as cnt by data.operationName
| sort cnt desc
```
-------------------------------------------
| data.operationName | cnt |
|----------------------------------|------|
| iOS_salePageRealTimeData | 1216 |
| android_salePage_realtime_info | 913 |
| iOS_salePageAdditionalInfo | 182 |
| iOS_salePageInfo | 180 |
| cms_shopCategory | 122 |
| cms_shopCategory_default_orderby | 104 |
| android_salePage_extra | 33 |
| android_salePage | 30 |
| iOS_shopCategory | 14 |
| iOS_leftMenu | 11 |
| android_getSalePageInit | 8 |
| CouponListPage | 2 |
| cms_shopCategory_promotion_list | 1 |
| iOS_couponList | 1 |
| iOS_appAnnouncementList | 1 |
| CouponList | 1 |
| total | 2819 |
-------------------------------------------

**墊腳石 8/12 flash sale - breakdown by cause**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-12T09:50:00.000Z`
- end-time: `2020-08-12T10:10:00.000Z`
- query-string:

```
filter ( type = "ERROR" and tag = "OperationLoggingApolloServerPlugin" )
| parse err.message /text: (?<reason>.+)/
| stats count() as cnt by data.operationName, data.error.type, reason
| sort cnt desc
```
-------------------------------------------------------------------------------
| data.operationName | data.error.type | reason | cnt |
|----------------------------------|-----------------|-----------------|------|
| iOS_salePageRealTimeData | Error | request timeout | 1048 |
| android_salePage_realtime_info | Error | request timeout | 892 |
| iOS_salePageInfo | PythiaError | | 158 |
| iOS_salePageAdditionalInfo | PythiaError | | 158 |
| iOS_salePageRealTimeData | PythiaError | | 157 |
| cms_shopCategory | PythiaError | | 103 |
| cms_shopCategory_default_orderby | PythiaError | | 83 |
| android_salePage_extra | PythiaError | | 22 |
| android_salePage | PythiaError | | 22 |
| cms_shopCategory_default_orderby | Error | request timeout | 21 |
| android_salePage_realtime_info | PythiaError | | 21 |
| cms_shopCategory | Error | request timeout | 19 |
| iOS_salePageAdditionalInfo | Error | request timeout | 13 |
| iOS_salePageAdditionalInfo | Error | request failed | 11 |
| iOS_salePageInfo | Error | request timeout | 11 |
| iOS_salePageRealTimeData | Error | request failed | 11 |
| android_salePage_extra | Error | request timeout | 11 |
| iOS_salePageInfo | Error | request failed | 11 |
| iOS_leftMenu | PythiaError | | 10 |
| iOS_shopCategory | Error | request timeout | 8 |
| android_salePage | Error | request timeout | 7 |
| iOS_shopCategory | PythiaError | | 6 |
| android_getSalePageInit | PythiaError | | 6 |
| CouponListPage | Error | request timeout | 2 |
| android_getSalePageInit | Error | request timeout | 2 |
| iOS_couponList | Error | request timeout | 1 |
| iOS_appAnnouncementList | Error | request timeout | 1 |
| android_salePage | Error | request failed | 1 |
| iOS_leftMenu | Error | request timeout | 1 |
| cms_shopCategory_promotion_list | Error | request timeout | 1 |
| CouponList | Error | request timeout | 1 |
-------------------------------------------------------------------------------

### Did shortening the timeout cause the sharp rise in timeout errors?

> Researcher: `@tomaz`

- Under normal traffic, timeout errors barely differ before and after the change (requests that don't normally time out all respond within 5s; anything slower than 5s times out anyway, whether we wait 5s or 20s)
- During flash sales, timeout errors rose several-fold, but the distribution across APIs is similar
  - 墊腳石 2823
  - 康是美 3245 * 10 = 32450
- Inference: **before the change, only the users at p100 hit the timeout; after the change, users from p90 up hit it, roughly 10x as many people**
- So what did we gain?
  - 墊腳石 latency p95 7s, avg 1.2s
  - 康是美 latency p95 4.2s, avg < 300ms

**墊腳石 timeout errors**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-12T09:50:00.000Z`
- end-time: `2020-08-12T10:10:00.000Z`
- query-string:

```
filter ( data.status = "timeout" and type = "ERROR" )
| parse data.url /^https?:\/\/(?<service>.+?)\/(?<path>([^\.]*?\/){0,3}).*?$/
| stats count() as _count by service, path
| sort _count desc
```
--------------------------------------------------------------------------------------
| service | path | _count |
|------------------|--------------------------------------------------------|--------|
| webapi.91mai.com | webapi/SalePage/GetSalePageRealTimeData/ | 1925 |
| webapi.91mai.com | webapi/TraceSalePageList/ | 229 |
| webapi.91mai.com | webapi/SalePageV2/GetSalePageV2Info/ | 188 |
| webapi.91mai.com | webapi/ShopCategory/GetSalePageList/ | 75 |
| webapi.91mai.com | webapi/Shop/ | 55 |
| webapi.91mai.com | webapi/LayoutTemplateData/GetLayoutTemplateData/ | 55 |
| webapi.91mai.com | webapi/SearchV2/ | 36 |
| webapi.91mai.com | webapi/SalePage/GetSalePageHotListByShopCategoryId/ | 27 |
| webapi.91mai.com | webapi/SalePageV2/ | 25 |
| webapi.91mai.com | webapi/APPNotification/ | 23 |
| webapi.91mai.com | webapi/shop/getCustomizedBrandIdentityDisplaySettings/ | 22 |
| webapi.91mai.com | webapi/shop/getForcedLogoutVersionList/ | 21 |
| webapi.91mai.com | webapi/ShopCategory/GetPromotionList/ | 20 |
| webapi.91mai.com | webapi/SalePageV2/GetSalePageAdditionalInfo/ | 18 |
| webapi.91mai.com | webapi/ShopStaticSetting/ | 15 |
| webapi.91mai.com | webapi/AppAnnouncement/ | 15 |
| webapi.91mai.com | webapi/AppNotification/GetMobileAppSettings/ | 14 |
| webapi.91mai.com | webapi/Shop/GetShopCategoryListV3/ | 12 |
| webapi.91mai.com | webapi/AppAnnouncement/getAppAnnouncementList/ | 10 |
| webapi.91mai.com | webapi/ecoupon/ | 9 |
| webapi.91mai.com | webapi/Shop/GetShopintroduction/ | 8 |
| api2.91mai.com | o2o/api/coupon/ | 7 |
| webapi.91mai.com | webapi/PromotionV2/GetList/ | 5 |
| webapi.91mai.com | webapi/Activity/ | 4 |
| webapi.91mai.com | webapi/HotSaleRanking/GetHotSaleRankingList/ | 2 |
| webapi.91mai.com | webapi/shop/getShopContractSetting/ | 2 |
| webapi.91mai.com | webapi/InfoModule/ | 1 |
--------------------------------------------------------------------------------------

**康是美 timeout errors**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-15T02:50:00.000Z`
- end-time: `2020-08-15T03:10:00.000Z`
- query-string:

```
filter ( data.status = "timeout" and type = "ERROR" )
| parse data.url /^https?:\/\/(?<service>.+?)\/(?<path>([^\.]*?\/){0,3}).*?$/
| stats count() as _count by service, path
| sort _count desc
```
---------------------------------------------------------------------------------------------------
| service | path | _count |
|-------------------------------|--------------------------------------------------------|--------|
| webapi.91mai.com | webapi/SalePage/GetSalePageRealTimeData/ | 1324 |
| webapi.91mai.com | webapi/TraceSalePageList/ | 1182 |
| d38tzu0atxk400.cloudfront.net | webapi/SalePageV2/GetSalePageV2Info/ | 152 |
| d38tzu0atxk400.cloudfront.net | webapi/ShopCategory/GetSalePageList/ | 84 |
| webapi.91mai.com | webapi/Shop/ | 63 |
| webapi.91mai.com | webapi/SalePage/GetSalePageHotListByShopCategoryId/ | 57 |
| webapi.91mai.com | webapi/SearchV2/ | 35 |
| webapi.91mai.com | webapi/SalePageV2/ | 32 |
| webapi.91mai.com | webapi/APPNotification/ | 31 |
| webapi.91mai.com | webapi/shop/getCustomizedBrandIdentityDisplaySettings/ | 31 |
| d38tzu0atxk400.cloudfront.net | webapi/ShopCategory/GetPromotionList/ | 30 |
| webapi.91mai.com | webapi/ShopStaticSetting/ | 30 |
| webapi.91mai.com | webapi/AppAnnouncement/ | 29 |
| webapi.91mai.com | webapi/shop/getForcedLogoutVersionList/ | 23 |
| d38tzu0atxk400.cloudfront.net | webapi/LayoutTemplateData/GetLayoutTemplateData/ | 22 |
| webapi.91mai.com | webapi/Activity/ | 21 |
| webapi.91mai.com | webapi/AppNotification/GetMobileAppSettings/ | 19 |
| webapi.91mai.com | webapi/AppAnnouncement/getAppAnnouncementList/ | 19 |
| webapi.91mai.com | webapi/ecoupon/ | 17 |
| webapi.91mai.com | webapi/Shop/GetShopintroduction/ | 15 |
| d38tzu0atxk400.cloudfront.net | webapi/SalePageV2/GetSalePageAdditionalInfo/ | 13 |
| webapi.91mai.com | webapi/HotSaleRanking/GetHotSaleRankingList/ | 7 |
| webapi.91mai.com | webapi/shop/getShopContractSetting/ | 3 |
| d38tzu0atxk400.cloudfront.net | webapi/Shop/GetShopCategoryListV3/ | 3 |
| webapi.91mai.com | webapi/InfoModule/ | 1 |
| api2.91mai.com | o2o/api/coupon/ | 1 |
| webapi.91mai.com | webapi/PromotionV2/GetList/ | 1 |
---------------------------------------------------------------------------------------------------

**Normal-day timeout errors (after the change)**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-14T12:50:00.000Z`
- end-time: `2020-08-14T13:10:00.000Z`
- query-string:

```
filter ( data.status = "timeout" and type = "ERROR" )
| stats count()
#| parse data.url /^https?:\/\/(?<service>.+?)\/(?<path>([^\.]*?\/){0,3}).*?$/
#| stats count() as _count by service, path
#| sort _count desc
```
-----------
| count() |
|---------|
| 47 |
-----------

**Normal-day timeout errors (before the change)**

- region: `ap-northeast-1`
- log-group-names: `/eks-ap-northeast-1-tw-91app-io/prod-bff`
- start-time: `2020-08-07T12:50:00.000Z`
- end-time: `2020-08-07T13:10:00.000Z`
- query-string:

```
filter ( data.status = "timeout" and type = "ERROR" )
| stats count()
#| parse data.url /^https?:\/\/(?<service>.+?)\/(?<path>([^\.]*?\/){0,3}).*?$/
#| stats count() as _count by service, path
#| sort _count desc
```
-----------
| count() |
|---------|
| 59 |
-----------
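The notes say the 8/15 error count is much higher while the per-operation proportions are similar. This can be checked by turning the two error-count tables (康是美 8/15 and 墊腳石 8/12) into shares. The sketch below hard-codes the counts from those tables. The 8/15 figures are the raw query results without the ×10 factor from the notes, which does not change the shares. The helper names `shares` and `compare` are illustrative, not part of any existing tooling.

```python
from collections import Counter

# Per-operation error counts copied from the "error count" tables.
# 康是美 8/15: raw query result; the notes scale the total by 10, which leaves shares unchanged.
SALE_0815 = Counter({
    "iOS_salePageRealTimeData": 740,
    "android_salePage_realtime_info": 614,
    "iOS_salePageAdditionalInfo": 27,
    "cms_shopCategory_default_orderby": 27,
    "iOS_salePageInfo": 27,
    "cms_shopCategory": 24,
    "android_salePage": 18,
    "android_salePage_extra": 8,
    "iOS_shopCategory": 6,
    "<no-operation>": 3,
    "cms_shopCategory_promotion_list": 2,
    "android_getSalePageInit": 2,
    "android_searchHotKeywords": 1,
    "cms_layoutTemplate_spCatAd_list": 1,
})

# 墊腳石 8/12
SALE_0812 = Counter({
    "iOS_salePageRealTimeData": 1216,
    "android_salePage_realtime_info": 913,
    "iOS_salePageAdditionalInfo": 182,
    "iOS_salePageInfo": 180,
    "cms_shopCategory": 122,
    "cms_shopCategory_default_orderby": 104,
    "android_salePage_extra": 33,
    "android_salePage": 30,
    "iOS_shopCategory": 14,
    "iOS_leftMenu": 11,
    "android_getSalePageInit": 8,
    "CouponListPage": 2,
    "cms_shopCategory_promotion_list": 1,
    "iOS_couponList": 1,
    "iOS_appAnnouncementList": 1,
    "CouponList": 1,
})


def shares(counts: Counter) -> dict:
    """Map each operation to its fraction of all errors, largest first."""
    total = sum(counts.values())
    return {op: n / total for op, n in counts.most_common()}


def compare(a: Counter, b: Counter, top: int = 5) -> None:
    """Print the top operations of `a` next to the same operations' share in `b`."""
    share_a, share_b = shares(a), shares(b)
    for op in list(share_a)[:top]:
        print(f"{op:34s} {share_a[op]:7.1%} {share_b.get(op, 0.0):7.1%}")


if __name__ == "__main__":
    compare(SALE_0815, SALE_0812)
```

Each printed row shows an operation's share on 8/15 next to its share on 8/12.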
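The error-breakdown queries (`filter ... | parse err.message /text: (?<reason>.+)/ | stats count() ... by data.operationName, data.error.type, reason`) can be mirrored offline, for example to sanity-check exported log events. This is a minimal sketch. It assumes each event is a JSON object with nested `type`, `tag`, `data.operationName`, `data.error.type` and `err.message` fields, as the query's field names suggest. A message without `text: ` gets an empty reason, which is presumably what the blank `reason` cells in the tables reflect. The demo event is hypothetical.

```python
import re
from collections import Counter
from typing import Any, Iterable, List, Mapping, Tuple

# Python spelling of the Insights pattern /text: (?<reason>.+)/ (named groups are written ?P<...>).
REASON_PATTERN = re.compile(r"text: (?P<reason>.+)")

Key = Tuple[str, str, str]


def error_breakdown(events: Iterable[Mapping[str, Any]]) -> List[Tuple[Key, int]]:
    """Count matching ERROR events by (operationName, error type, reason), largest first.

    Missing fields become "", and a message without "text: " yields an empty reason.
    """
    counts: Counter = Counter()
    for event in events:
        # filter ( type = "ERROR" and tag = "OperationLoggingApolloServerPlugin" )
        if event.get("type") != "ERROR" or event.get("tag") != "OperationLoggingApolloServerPlugin":
            continue
        data = event.get("data", {})
        # parse err.message /text: (?<reason>.+)/  -- an unanchored search, not a full match
        match = REASON_PATTERN.search(event.get("err", {}).get("message", ""))
        key = (
            data.get("operationName", ""),
            data.get("error", {}).get("type", ""),
            match.group("reason") if match else "",
        )
        counts[key] += 1
    # stats count() as cnt by ... | sort cnt desc
    return counts.most_common()


if __name__ == "__main__":
    demo = [{
        "type": "ERROR",
        "tag": "OperationLoggingApolloServerPlugin",
        "data": {"operationName": "iOS_salePageRealTimeData", "error": {"type": "Error"}},
        "err": {"message": "hypothetical upstream failure, text: request timeout"},
    }]
    print(error_breakdown(demo))
```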
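Several comparisons in the notes come down to a few ratios. They are worked through below with the figures exactly as written there, including the ×10 factor on the 康是美 8/15 counts. This is plain arithmetic, and the dictionary keys are illustrative. The flash-sale timeout ratio comes out at about 11.5x, close to the ~10x in the p100 → p90 inference. Peak RPM on 8/15 was about 0.81x that of 8/12.

```python
# Figures as written in the notes; 康是美 8/15 counts include the x10 factor used there.
SALES = {
    "0812": {"max_rpm": 93_000, "errors": 2819, "timeouts": 2823},            # 墊腳石 8/12
    "0815": {"max_rpm": 75_000, "errors": 1500 * 10, "timeouts": 3245 * 10},  # 康是美 8/15
}

# Normal-day timeout errors in the same 20-minute window (20:50-21:10 UTC+8),
# before (8/07) and after (8/14) the timeout was shortened.
NORMAL_DAY_TIMEOUTS = {"before": 59, "after": 47}


def sale_ratio(metric: str) -> float:
    """How many times larger `metric` was on 8/15 than on 8/12."""
    return SALES["0815"][metric] / SALES["0812"][metric]


def normal_day_change() -> float:
    """Relative change in normal-day timeout errors after the timeout was shortened."""
    before, after = NORMAL_DAY_TIMEOUTS["before"], NORMAL_DAY_TIMEOUTS["after"]
    return (after - before) / before


if __name__ == "__main__":
    print(f"peak RPM: {sale_ratio('max_rpm'):.2f}x")            # 75k / 93k
    print(f"errors:   {sale_ratio('errors'):.2f}x")             # 15000 / 2819
    print(f"timeouts: {sale_ratio('timeouts'):.2f}x")           # 32450 / 2823
    print(f"normal-day timeouts: {normal_day_change():+.1%}")   # 59 -> 47
```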
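The p100 → p90 inference is arithmetic on the latency distribution. If the timeout sits at the pN latency, roughly the slowest (100 − N)% of requests are cut off. The sketch below illustrates this with a hypothetical, evenly spread set of 100 latencies, which is not real data. It reads "p100" loosely as the extreme tail and uses p99 for it, an assumption made for illustration. Moving the cutoff from p99 to p90 takes the number of timed-out requests from 1 to 10. `percentile` uses the nearest-rank definition; interpolating definitions give slightly different cut points.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need at least one value and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank; multiply first to avoid float drift
    return ordered[rank - 1]


def timed_out(latencies: Sequence[float], timeout: float) -> int:
    """Number of requests slower than `timeout`, i.e. the ones a client timeout would cut off."""
    return sum(1 for latency in latencies if latency > timeout)


if __name__ == "__main__":
    # Hypothetical latencies (arbitrary units), evenly spread from 1 to 100.
    latencies = list(range(1, 101))
    for p in (99, 90):
        cutoff = percentile(latencies, p)
        print(f"timeout at p{p} ({cutoff}): {timed_out(latencies, cutoff)} requests time out")
```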
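The timeout tables group failing upstream calls by host and by up to three leading path segments. Porting the `parse data.url` pattern to Python shows which row a given URL lands in. For instance, `https://webapi.91mai.com/webapi/Shop/12` has only two `segment/` parts, so it is counted under `webapi/Shop/`. This is a minimal sketch. Python writes named groups as `(?P<name>...)` and the rest of the pattern is kept as-is. The event shape (`type`, `data.status`, `data.url`) is assumed from the query's field names, and the sample URLs are hypothetical.

```python
import re
from collections import Counter
from typing import Any, Iterable, List, Mapping, Optional, Tuple

# Insights: /^https?:\/\/(?<service>.+?)\/(?<path>([^\.]*?\/){0,3}).*?$/
# Host up to the first "/", then at most three dot-free "segment/" parts; the rest is ignored.
URL_PATTERN = re.compile(r"^https?://(?P<service>.+?)/(?P<path>([^.]*?/){0,3}).*?$")


def service_and_path(url: str) -> Optional[Tuple[str, str]]:
    """Split a URL into (service, path prefix) the way the timeout queries do."""
    match = URL_PATTERN.match(url)
    return (match.group("service"), match.group("path")) if match else None


def timeouts_by_endpoint(events: Iterable[Mapping[str, Any]]) -> List[Tuple[Tuple[str, str], int]]:
    """filter (data.status = "timeout" and type = "ERROR") | parse data.url ...
    | stats count() as _count by service, path | sort _count desc

    Events whose URL does not match the pattern are skipped in this sketch.
    """
    counts: Counter = Counter()
    for event in events:
        data = event.get("data", {})
        if event.get("type") != "ERROR" or data.get("status") != "timeout":
            continue
        parsed = service_and_path(data.get("url", ""))
        if parsed is not None:
            counts[parsed] += 1
    return counts.most_common()


if __name__ == "__main__":
    print(service_and_path("https://webapi.91mai.com/webapi/Shop/12"))
    # ('webapi.91mai.com', 'webapi/Shop/')
```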