# CTR Forecasting PRD
A CTR (click-through rate) prediction model built on historical campaign behaviour, using context and category signals.
The objective of this project is to forecast CTR from campaign-specific factors and historical data.
In practice, the model supports CTR modelling on the Doceree platform by mining (1) historical campaigns, (2) category, (3) platform, and (4) possibly user interest, drawn from the rich historical campaign data and, in the future, from behaviour data.

Driven by advances in deep learning, deep CTR models with carefully designed architectures for user interest modelling have been proposed, and they have brought remarkable improvements on offline metrics. Deploying these complex models to online serving systems for real-time inference under massive traffic, however, takes considerable effort. Long sequential user behaviour data makes this harder still, because system latency and storage cost grow approximately linearly with the length of the user behaviour sequence. User behaviour modelling is therefore deferred to phase 2 of this project.
The release is planned in two phases, each with the major steps listed below.
Phase 1:
1. Data Collection
1. Data Pre-processing
1. Research & EDA
1. Model Building & Training
1. Model Testing & Evaluation
1. Data Pipeline creation
1. UI Changes that are needed to connect the model
1. Backend services that need to be written
1. Alpha Model version deployment
Phase 2:
1. Additional data collection (behaviour, user interest, etc.)
1. Feature Engineering
1. Research & EDA
1. Model Building & Training
1. Evaluation and Fine-tuning
1. Reevaluation
1. Data pipeline
1. UI changes
1. Backend
1. Beta version deployment
### Features we have from historical campaigns
```
Index(['DS', 'HH', 'TYPE_OF_AD', 'CREATED_AT', 'CREATIVE_ID', 'SUBCAMPAIGN_ID',
'CAMPAIGN_ID', 'BRAND_ID', 'ADVERTISER_ID', 'CODE_SNIPPET_ID',
'ASSET_ID', 'PLATFORM_ID', 'PUBLISHER_ID', 'HCP_ID', 'BID_TYPE',
'TYPE_OF_EVENT', 'CHANNEL_TYPE', 'DEVICE_TYPE', 'DIMENSION',
'PLATFORM_TYPE', 'BIDREQUESTID', 'EVENTSTATUS', 'BID_AMOUNT',
'USER_TAXONOMY', 'USER_ZIP'],
dtype='object')
```
Of these, the usable columns are:
```
1. TYPE_OF_EVENT
2. BID_AMOUNT
3. CHANNEL_TYPE
4. DIMENSION
5. DEVICE_TYPE
6. PLATFORM_TYPE
7. USER_ZIP
8. USER_TAXONOMY (extended taxonomies are still needed)
9. EVENTSTATUS
10. TYPE_OF_AD
```
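As a minimal sketch of how the usable columns could be pulled out of the raw event data, assuming the data is loaded into a pandas DataFrame with the column names shown in the `Index` listing above. The constant `USABLE_COLUMNS` and the helper `select_usable` are hypothetical names introduced here for illustration; they are not part of any existing pipeline.

```python
import pandas as pd

# Column names copied from the Index listing of historical campaign data.
USABLE_COLUMNS = [
    "TYPE_OF_EVENT",
    "BID_AMOUNT",
    "CHANNEL_TYPE",
    "DIMENSION",
    "DEVICE_TYPE",
    "PLATFORM_TYPE",
    "USER_ZIP",
    "USER_TAXONOMY",
    "EVENTSTATUS",
    "TYPE_OF_AD",
]


def select_usable(events: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `events` restricted to the usable columns.

    Hypothetical helper: fails loudly if an expected column is absent,
    so a changed export schema is caught before any modelling starts.
    """
    missing = [c for c in USABLE_COLUMNS if c not in events.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return events[USABLE_COLUMNS].copy()
```

Identifier columns such as `CAMPAIGN_ID` are left out here only because the list above does not name them as features; they would still be needed as grouping keys when CTR is aggregated.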
Basic EDA was performed on 10 campaigns but did not yield a result, so it needs to be repeated on fresh data.
We also need hourly data that includes impressions: CTR by category and by hour, with the data for all campaigns delivered in a single extract and identified by campaign ID.
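The hourly CTR requested here could be derived from event-level rows roughly as follows. This is a sketch under stated assumptions, not the agreed pipeline: it assumes `DS` is the event date and `HH` the hour of day, and that `TYPE_OF_EVENT` marks impressions and clicks with the labels `"impression"` and `"click"`. The actual label values are not given in this document, so they are parameters. The function name `hourly_ctr` is hypothetical.

```python
import pandas as pd


def hourly_ctr(
    events: pd.DataFrame,
    impression_label: str = "impression",  # assumed label, not confirmed
    click_label: str = "click",            # assumed label, not confirmed
) -> pd.DataFrame:
    """Aggregate event rows into impressions, clicks and CTR per campaign-hour."""
    flagged = events.assign(
        impressions=(events["TYPE_OF_EVENT"] == impression_label).astype(int),
        clicks=(events["TYPE_OF_EVENT"] == click_label).astype(int),
    )
    counts = flagged.groupby(["CAMPAIGN_ID", "DS", "HH"], as_index=False)[
        ["impressions", "clicks"]
    ].sum()
    # CTR = clicks / impressions; left as NaN where an hour has no impressions.
    counts["ctr"] = (counts["clicks"] / counts["impressions"]).where(
        counts["impressions"] > 0
    )
    return counts
```

Grouping by an additional category column would give CTR by category in the same way, once such a column is available in the extract.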
We need a sample of at least a few hundred thousand rows of data.
Next steps:
Data gathering.