# Subscription Prediction
>Main compiler: Cynthia
>Motivation: gain valuable insights for subscription prediction
>Research note:
>
[2020-03-19]
* prediction of transactions (excluding ISP and auto-subscription users)
    * xgboost model
    * data preprocessing: shift variables according to their correlation
    * period A (12 weeks)
    * period B (12 weeks)
* Pearson correlation with transactions (excluding ISP and auto-subscription users) (2010~2019)

* Pearson correlation with transactions (excluding ISP and auto-subscription users) (2017~2019)

* Pearson correlation with transactions (excluding ISP and auto-subscription users) (2019~)
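
One possible reading of "shift variables according to their correlation" is to lag each feature by the number of weeks at which its Pearson correlation with transactions is strongest. The sketch below illustrates that reading only; the notes do not say how the shift was actually chosen. The function names, the `max_lag` value and the `downloads` / `transactions` numbers are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_correlation(feature, target, lag):
    """Correlate feature[t - lag] with target[t]."""
    if lag == 0:
        return pearson(feature, target)
    return pearson(feature[:-lag], target[lag:])


def best_lag(feature, target, max_lag):
    """Return the lag with the largest absolute correlation, plus all scores."""
    scores = {lag: lagged_correlation(feature, target, lag)
              for lag in range(max_lag + 1)}
    return max(scores, key=lambda lag: abs(scores[lag])), scores


# Hypothetical weekly data: transactions follow downloads three weeks later.
downloads = [12, 15, 11, 18, 25, 22, 17, 14, 19, 27,
             31, 24, 20, 16, 21, 29, 35, 28, 23, 18]
transactions = [40, 42, 38] + [2 * d + 5 for d in downloads[:-3]]

lag, scores = best_lag(downloads, transactions, max_lag=6)
print("best lag (weeks):", lag)
for k in sorted(scores):
    print(k, round(scores[k], 3))
```

Each feature would then be shifted by its own lag before being passed to the model, so that row `t` holds the feature value from `t - lag`.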

[2020-02-27]
* total transactions form a seasonal time series
* split by user type: old / new users
    * discovery: old -> seasonal, new -> irregular
* split by autorenew / non-autorenew
    * discovery: autorenew -> seasonal; non-autorenew or first transaction of autorenew -> irregular
    * autorenew:
    * non-autorenew or first transaction of autorenew:
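
The notes do not say how "seasonal" versus "irregular" was judged (it may have been read off the plots). One way to make the distinction quantitative is the autocorrelation at the seasonal lag, sketched below. The period of 12, the 0.6 threshold, the function names and both series are assumptions made for illustration, not values from the analysis.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def looks_seasonal(series, period, threshold=0.6):
    """Flag a series as seasonal if it correlates strongly with itself one period back.

    The threshold is a hypothetical cut-off, not a value from the notes.
    """
    return autocorrelation(series, period) >= threshold


# Hypothetical data: a 12-step pattern repeated four times, and seeded noise.
pattern = [30, 28, 35, 40, 38, 45, 60, 58, 42, 37, 33, 50]
seasonal = pattern * 4
rng = random.Random(0)
irregular = [rng.randint(20, 60) for _ in range(48)]

print("seasonal  acf@12:", round(autocorrelation(seasonal, 12), 3))
print("irregular acf@12:", round(autocorrelation(irregular, 12), 3))
print("seasonal flagged:", looks_seasonal(seasonal, 12))
```

Applied to the splits above, the same check would be run separately on each segment (old / new, autorenew / non-autorenew) with the period matching the data's granularity.
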
[2020-02-20]
* motivation:
    * predict the number of conversions in advance (e.g. a couple of months ahead)
    * find insights correlated with that number
* dataset
    * internal
        * downloads, registrations, churn, transactions, user count, stream count
    * external
        * downloads of competing products
        * PTT / Dcard discussion
        * commercial cost
        * app store ratings
        * fan page posts and likes
* process (internal only)
    * data exploration
        * discovery: feature importance changes a lot between 2017, 2018 and 2019
    * model
        * xgboost
        * VAR (vector autoregression)
        * RMSE: about 10,000 (conversion: 40,000)
* difficulties
    * the current features may not be representative enough for prediction
    * the monthly data (36x80) is too small and may be biased
* restart with weekly data
    * transactions prediction
        * model: AR (lags: 24 weeks)
        * result: RMSE: 41586.133
    * difference prediction
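
The weekly restart above (an AR model scored by RMSE, plus a variant that predicts differences) can be sketched as follows. This is a minimal illustration rather than the implementation behind the reported numbers: it fits the AR coefficients by ordinary least squares through the normal equations so that it needs only the standard library, it uses 2 lags where the notes use 24, and the series is synthetic. All function names are hypothetical.

```python
import math


def fit_ar(series, p):
    """Fit y[t] = c + a1*y[t-1] + ... + ap*y[t-p] by ordinary least squares.

    Returns the coefficients as [c, a1, ..., ap].
    """
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = series[p:]
    n = p + 1
    # Build the normal equations (X^T X) beta = X^T y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - rest) / aug[i][i]
    return beta


def forecast(series, beta, steps):
    """Roll the fitted AR model forward, feeding predictions back in."""
    p = len(beta) - 1
    history = list(series)
    for _ in range(steps):
        history.append(beta[0] + sum(beta[k] * history[-k]
                                     for k in range(1, p + 1)))
    return history[len(series):]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted))
                     / len(actual))


def difference(series):
    """Week-over-week changes, the target for 'difference prediction'."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Rebuild levels from a starting value and a list of changes."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


# Synthetic series that follows an exact AR(2) process (no noise).
series = [10.0, 12.0]
for _ in range(38):
    series.append(4.0 + 1.5 * series[-1] - 0.9 * series[-2])

train, test = series[:30], series[30:]
beta = fit_ar(train, p=2)
predicted = forecast(train, beta, steps=len(test))
error = rmse(test, predicted)
print("coefficients:", [round(b, 4) for b in beta])
print("test RMSE:", error)
```

For difference prediction, the same `fit_ar` and `forecast` calls would be applied to `difference(train)`, and the forecast changes turned back into transaction levels with `undifference(train[-1], ...)`. An RMSE is easier to interpret next to the typical size of the target, as in the monthly result above (about 10,000 against 40,000 conversions).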