# Top User Engagement Analysis

To correlate user engagement on our portals with the revenue users generate, we want to find the overlap between the following two groups:

1. Users that generate high revenue. We call this group Millionaires.
2. Users that engage highly with the portals. We call this group Fans.

Mathematically, we want to find the size of the overlap between `Millionaires` and `Fans`:

$$ overlap = \big| \ \text{Millionaires} \ \cap \ \text{Fans} \ \big| $$

Following from this analysis https://hackmd.io/fNjlwdxNQj26emjt3DMD1A?both, we are hypothesising that

$$ \frac{\big| \ \text{Millionaires} \ \cap \ \text{Fans} \ \big|}{\big| \ \text{Fans} \ \big|} \ \gg \ \frac{\big| \ \text{Poors} \ \cap \ \text{Fans} \ \big|}{\big| \ \text{Fans} \ \big|} $$

where:

- $\cap$ is the intersection symbol (think `INNER JOIN`)
- $| \ A \ |$ means the size of set $A$ (`COUNT(*)`)
- $\gg$ means "much greater than"

We will test below whether this hypothesis holds.

```sql
WITH T AS (
    SELECT
        us.rockman_id,
        count(cu.rockman_id) AS interaction_score,
        max(CASE
                WHEN date_diff('days', us.sale_timestamp, current_date) > 7
                THEN coalesce(us.tb_first_week_revenue, 0)
                ELSE NULL
            END) :: float AS revenue_week_1
    FROM
        -- user_subscriptions 1-to-* content_usage,
        -- but |user_subscriptions| < |content_usage| (grouped by rockman_id)
        user_subscriptions us
        LEFT JOIN content_usage cu
            ON cu.rockman_id = us.rockman_id
            AND cu.rockman_id NOT IN ('test_rockman_id', 'fdf098fcc6', '', 'undefined', '(not-set)')
            AND cu.rockman_id IS NOT NULL
    WHERE country_code = 'MY'
        AND operator_code = 'MY_MAXIS'
        AND us.timestamp < '2020-01-01'
        AND us.timestamp >= '2019-12-01'
    GROUP BY us.rockman_id
    ORDER BY interaction_score DESC
),
T2 AS (
    SELECT
        T.*,
        row_number() OVER (ORDER BY interaction_score DESC) AS rank,
        percent_rank() OVER (ORDER BY interaction_score) AS interaction_prank,
        percent_rank() OVER (ORDER BY revenue_week_1) AS revenue_week_1_prank
    FROM T
)
SELECT
    sum(CASE WHEN
        revenue_week_1_prank >= 0.9 AND interaction_prank >= 0.5 THEN 1 ELSE 0 END) AS millionaires_and_high_engagers,
    sum(CASE WHEN revenue_week_1_prank >= 0.9 THEN 1 ELSE 0 END) AS millionaires,
    sum(CASE WHEN interaction_prank >= 0.5 THEN 1 ELSE 0 END) AS high_engagers,
    sum(CASE WHEN revenue_week_1_prank < 0.5 AND interaction_prank < 0.5 THEN 1 ELSE 0 END) AS poors_and_low_engagers,
    sum(CASE WHEN revenue_week_1_prank < 0.5 THEN 1 ELSE 0 END) AS poors,
    sum(CASE WHEN interaction_prank < 0.5 THEN 1 ELSE 0 END) AS low_engagers,
    sum(CASE WHEN interaction_score = 0 THEN 1 ELSE 0 END) AS users_with_no_interaction,
    sum(CASE WHEN revenue_week_1_prank < 0.5 AND interaction_prank >= 0.5 THEN 1 ELSE 0 END) AS poors_and_high_engagers
FROM T2
```

This query returns:

| millionaires_and_high_engagers | millionaires | high_engagers | poors_and_low_engagers | poors | low_engagers | users_with_no_interaction | poors_and_high_engagers |
|---|---|---|---|---|---|---|---|
| 11 | 5996 | 37 | 37992 | 37995 | 60398 | 60398 | 3 |

Let's denote Fans by $F$, Millionaires by $M$, and Poors by $P$. From conditional probability:

$$ P(M \mid F) = \frac{P(M \cap F)}{P(F)} = \frac{11}{37} \approx 30\% $$

So the chance that a user is a Millionaire, given that they are a Fan, is about 30%. Let's also find:

$$ P(P \mid F) = \frac{P(P \cap F)}{P(F)} = \frac{3}{37} \approx 8\% $$

So if you're a Fan, you're almost 4x more likely to be a Millionaire than a Poor.

Our goal is to increase $P(M)$. Bayes' theorem tells us

$$ P(M \mid F) = \frac{P(M) \, P(F \mid M)}{P(F)} $$

so

$$ P(M) = \frac{P(M \mid F) \, P(F)}{P(F \mid M)} = \frac{P(F \cap M)}{P(F \mid M)} $$

Assuming $P(F \mid M)$ is constant, since we do not have control over it, we need to increase $P(F)$ and $P(F \cap M)$. In other words, we need to increase the number of Fans on the portals and the number of Fans who are also Millionaires. The following are some hypotheses we think will help increase these numbers:

- Force auto-redirection to the portal from the Thank You page after subscription, so that users land on the portal.
- Target billable users to increase the billability of subscribers.
- Optimize product features after gaining more insights from users inside the portals.

Next, we need to test these hypotheses.
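As a sanity check on the arithmetic above, the conditional probabilities can be recomputed directly from the query output. This is a minimal sketch with the counts hard-coded from the result table; nothing here queries the database.

```python
# Counts taken from the query result table above.
fans = 37                   # high_engagers: interaction_prank >= 0.5
millionaires_and_fans = 11  # revenue_week_1_prank >= 0.9 AND interaction_prank >= 0.5
poors_and_fans = 3          # revenue_week_1_prank < 0.5 AND interaction_prank >= 0.5

# P(M | F) = P(M ∩ F) / P(F): chance of being a Millionaire, given you are a Fan.
p_m_given_f = millionaires_and_fans / fans

# P(P | F) = P(P ∩ F) / P(F): chance of being a Poor, given you are a Fan.
p_p_given_f = poors_and_fans / fans

print(f"P(M|F) = {p_m_given_f:.0%}")                  # 30%
print(f"P(P|F) = {p_p_given_f:.0%}")                  # 8%
print(f"ratio  = {p_m_given_f / p_p_given_f:.1f}x")   # 3.7x, i.e. almost 4x
```

Note that the ratio $P(M \mid F) / P(P \mid F)$ reduces to the raw count ratio $11 / 3$, so the "almost 4x" claim does not depend on the size of the Fans group.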