# Behavior Lift PRD

## Doceree Behavior Lift

### Vision

To measure the impact of a campaign and the change in behavior that the campaign has caused. Beyond that, the endgame is to apply AI and data science to predict and improve an ad campaign's overall impact.

### Patentability?

This is something we will identify along the way. Since business has confirmed this as a novel opportunity, we will document our approach and work out our own way to patent it as an innovative disruption in this space.

1. Proprietary algorithms
2. Approach & process
3. Outcome and study: paper publishing opportunity?

#### Problem statement

Currently, if a campaign has not received a good number of clicks, or has a low CTR, that campaign is considered unsuccessful. This is not true: the campaign may still have made a significant impact on the mind of the physician. To capture that change, we need a behavioural lift solution.

**USP:** To help advertisers understand what changes their campaigns have made in physicians' minds, and how physicians' perception of their brand has changed. The value is that an advertiser can check, out of the total population reached, how many are aware of the brand, how many will prescribe it, and how the trend towards the brand has changed.

**Users of the product:** Our advertiser clients.

**What is Behavioural Lift?** The Behavioural Lift solution measures the moments that matter, from initial impression to final conversion, with the metrics that matter, like brand awareness, ad recall, and consideration. You will glean insights within a few days, so you can adjust your Doceree campaigns according to what is working in near real time.

**When to use Behavioural Lift?**

- You're running one campaign for one brand or product.
- You're planning to keep the creative and budget the same throughout the campaign.
**How can we do it?**

**Surveys:** To measure the moments along the consumer journey, including brand awareness, ad recall, consideration, favourability, and purchase intent, we first isolate a randomized control group that was not shown your ad and an exposed group that did see your ad. About a day after seeing (or not seeing) your ad, we deliver a one-question survey to both groups. Since the only effective difference between the two groups is whether they saw your ad, we can accurately attribute the measured lift to your campaign.

**Contextual Targeting/Retargeting:** Behavioural Lift will also measure the impact your campaign has on creating interest in your brand, using contextual targeting on the Doceree network. As with surveys, we randomly pick a group that saw your ad and a control group that did not. We then compare the contextual targeting behaviour of both groups, looking at how often they search for keywords related to your brand or campaign. The difference in searches can be attributed to your campaign.

**Where we will run the surveys:**

- On the Doceree network.
- We need an HTML approach for the surveys. Since this might take time, we should explore SurveyMonkey and Typeform integrations with our platform to route out surveys.

**What do we need to capture?**

Metrics: The survey questions will be bucketed under the metrics below. This will help us assess the campaign from initial impression to final impression, and lets us derive a behaviour quotient (the Behavioural Lift number). We need to quantify the survey responses so that they can be used to calculate the quotient.

1. **Awareness metrics:**
   a. **Attention:** This can be measured by asking the respondents in a survey if they recall seeing an unbranded visual representation of the ad.
   b. **Brand linkage:** This can be measured by asking respondents in a survey if they can name the brand sponsoring the unbranded visual representation of the ad.
2. 
**Interest metrics:**
   a. **Message communication:**
      i. Is the ad's strategic message really being conveyed by the ad and received by the viewers? To measure this, survey respondents are often asked directly, "Other than getting you to buy the product, what was the main idea of the ad?"
      ii. In addition, it is important to measure how differentiated, relevant, and believable the main idea is.
   b. **Brand attributes:** How does the ad improve perceptions on key brand attributes, for example, "faster speed," "understands what's important to me," or "safer for my family"? Best practice in surveys is to assess respondents' views on these versus a non-exposed, matched control group.
   c. **Ad diagnostics:** Why is the ad performing the way it is? Ratings often look to understand an ad's likability, newsworthiness, entertainment and/or informational value, importance, and uniqueness, as well as whether it is confusing, believable, humorous, factual, buzzworthy, shareable, relevant, or annoying.
3. **Desire metrics:** Desire metrics should measure the "response" to the ad, i.e., a change in intention or behaviour, or perhaps a change in attitudes toward the brand, which shows that consumers are more motivated toward the brand after hearing and seeing the message.
   a. **Persuasion:**
      i. Here, one is most concerned about a change in attitudes related to behaviour: are people more likely to buy the brand (intent) and/or use the brand more often (frequency) after being exposed to the advertising?
      ii. This is most often characterized as a "likelihood to purchase, consider, test drive, sign up," and so on. In addition, purchase and usage frequency allow for a more nuanced understanding of the persuasion of an ad.
   b. **Brand favourability:**
      i. One can also consider desire in terms of brand equity: whether people feel more positive or more favourable toward a brand after experiencing the advertising.
      ii. Brands will often look at two components to understand their equity: affinity (brand closeness) and relevance (how well the brand meets a consumer's personal needs).

Once we capture the survey responses bucketed into the above-mentioned cohorts, the results will be displayed in the Analytics section of our platform. The survey analytics will be a mix of direct responses displayed to the end users and derived metrics, as described below:

- **Lifted users:** The estimated number of users in a sample survey whose perception of your brand changed because of your ads, extrapolated to the overall reach of the campaign. It shows the difference in positive responses to your brand or product surveys between the group of users who saw your ad and the group who didn't.
- **Cost per lifted user:** The average cost for a lifted user who is now thinking about your brand after seeing your ads. It is calculated by dividing the total cost of your campaign by the number of lifted users. Use this metric to understand the cost of changing someone's mind about your brand in terms of brand consideration, ad recall, or brand awareness.
- **Absolute brand lift:** How much your ads influenced your audience's positive feelings towards your brand or product. It is calculated by subtracting the positive response rate of the baseline group from that of the exposed group. For example, an increase from 20% to 40% in positive survey responses between the two surveyed groups represents an absolute lift of 20%.
- **Headroom brand lift:** The impact your ads had on increasing positive feelings towards your brand or product, compared to the growth potential your brand or product had left. It is calculated by dividing the absolute lift by 1 minus the positive response rate of the baseline group.
  For example, an increase from 20% to 40% in positive survey responses between the exposed and baseline groups represents a headroom lift of 25%.
- **Relative brand lift:** The difference in positive responses to brand or product surveys between users who saw your ads and users who were held back from seeing them, divided by the number of positive responses from the group who didn't see your ads. The result measures how much your ads influenced your audience's positive perception of your brand. For example, an increase from 20% to 40% in positive survey responses between the two surveyed groups represents a relative lift of 100%.
- **Baseline positive response rate:** How often users who were held back from seeing your ads responded positively to your brand. Use this metric to understand how positive responses to your brand were influenced by general media exposure and other factors, rather than by the ads in your campaigns.
- **Exposed positive response rate:** How often users who saw your ads responded positively to your brand.
- **Overall behavioural quotient:** Logic for this metric still needs to be defined.

**Duration of the surveys:** The surveys generally last at least the whole campaign duration, and longer if the advertiser needs it. We can start a survey as soon as 2 days after the campaign starts, or run a post-campaign survey as well. We can visit a single customer multiple times on different platforms to increase the chances of them filling in more responses; we should ensure that each time we show them a survey, a new question is visible.

**To be done:**

- Survey design templates
- Algorithm to quantify the survey responses (the metrics to cover are listed above)
- Calculation of control and test groups
- Calculation of the behavioural quotient
- Behavioural lift analytics (described above)

**Survey design template:** <Aman working on it> </Aman>

### Data Science Section

**Algorithms**

a. 
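The worked examples above pin down the arithmetic, so a small sketch may help. The function below is a hypothetical illustration of the derived metrics, not the production algorithm; the cost and reach figures are made up:

```python
def lift_metrics(exposed_positive, exposed_total, baseline_positive, baseline_total,
                 campaign_cost, campaign_reach):
    """Compute the derived brand-lift metrics from two surveyed groups.

    All rates are fractions (0.40 == 40%). Hypothetical helper for
    illustration only; not the production logic.
    """
    exposed_rate = exposed_positive / exposed_total
    baseline_rate = baseline_positive / baseline_total

    absolute_lift = exposed_rate - baseline_rate
    headroom_lift = absolute_lift / (1 - baseline_rate)
    relative_lift = absolute_lift / baseline_rate

    # Extrapolate the per-respondent lift to the campaign's overall reach.
    lifted_users = absolute_lift * campaign_reach
    cost_per_lifted_user = campaign_cost / lifted_users

    return {
        "exposed_rate": exposed_rate,
        "baseline_rate": baseline_rate,
        "absolute_lift": absolute_lift,
        "headroom_lift": headroom_lift,
        "relative_lift": relative_lift,
        "lifted_users": lifted_users,
        "cost_per_lifted_user": cost_per_lifted_user,
    }

# The 20% -> 40% example from the text, with made-up cost and reach.
m = lift_metrics(exposed_positive=40, exposed_total=100,
                 baseline_positive=20, baseline_total=100,
                 campaign_cost=5000.0, campaign_reach=10000)
```

For the 20% to 40% example used above, this yields an absolute lift of 20%, a headroom lift of 25%, and a relative lift of 100%.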
**Extrapolation:** The behaviour lift survey will have two types of data input: numerical, and non-numerical (which can also be time/date/categorical). Data gathered from the surveys will be converted from a CSV into a `DataFrame` so that it can be extrapolated using `scipy`.

#### 1. Extrapolation based on numerical inputs

(Some more work has to be done here; this is a working interim.)

Extrapolating, in general, requires making certain assumptions about the data being extrapolated. One way is to curve-fit some general parameterized equation to the data to find the parameter values that best describe it, and then use those to calculate values that extend beyond the range of the data. The difficult and limiting issue with this approach is that some assumption about the trend must be made when the parameterized equation is selected. A suitable equation can be found through trial and error, or it can sometimes be inferred from the source of the data. The dataset at hand is really not large enough to obtain a well-fit curve; however, it is good enough for illustration. A later section shows an example of extrapolating the DataFrame with a 3rd-order polynomial, f(x) = a·x³ + b·x² + c·x + d. First, simple linear extrapolation.

**Using linear interpolation** (the original snippet imported `arange`, `array`, and `exp` from `scipy`; those re-exports were removed from SciPy long ago, so NumPy is used directly here):

```
import numpy as np
from scipy.interpolate import interp1d

def extrap1d(interpolator):
    """Wrap an interp1d object so that points outside its x-range are
    linearly extrapolated from the two nearest data points."""
    xs = interpolator.x
    ys = interpolator.y

    def pointwise(x):
        if x < xs[0]:
            return ys[0] + (x - xs[0]) * (ys[1] - ys[0]) / (xs[1] - xs[0])
        elif x > xs[-1]:
            return ys[-1] + (x - xs[-1]) * (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        else:
            return interpolator(x)

    def ufunclike(xs):
        return np.array(list(map(pointwise, np.array(xs))))

    return ufunclike
```
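As an aside on the snippet above: recent SciPy versions can perform the same linear extrapolation directly with `interp1d(..., fill_value="extrapolate")`, avoiding the manual wrapper. A self-contained sketch with made-up sample points:

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up data points standing in for quantified survey output.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

f = interp1d(x, y, fill_value="extrapolate")
# Beyond x=3 the last segment (slope (9-4)/(3-2) = 5) is extended
# linearly, so f(4) = 9 + 5 = 14.
```

This is the simplest option when only a linear extension of the end segments is needed.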
#### 2. Non-numeric or time-based extrapolation

([Source](https://stackoverflow.com/questions/34159342/extrapolate-pandas-dataframe/35960833#35960833))

Extrapolating a DataFrame with a `DatetimeIndex` can be done in two steps: extend the `DatetimeIndex`, then extrapolate the data.

**Extend the index.** Overwrite `df` with a new DataFrame whose data is placed onto a new, extended index built from the original index's start, period count, and frequency. This allows the original `df` to come from anywhere, as in the CSV case, and the new rows are conveniently filled with NaNs.

#### Fake DataFrame for the example (could come from anywhere; to be replaced with actual behaviour lift survey output data)

```
import pandas as pd

X1 = range(10)
X2 = [x ** 2 for x in X1]  # Python 3: map() returns an iterator, so use a list
df = pd.DataFrame({'x1': X1, 'x2': X2},
                  index=pd.date_range('20130101', periods=10, freq='M'))
```

#### Number of months to extend

```
extend = 5
```

#### Extrapolate the index first, based on the original index

```
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)
```

#### Display

```
print(df)
             x1    x2
2013-01-31    0     0
2013-02-28    1     1
2013-03-31    2     4
2013-04-30    3     9
2013-05-31    4    16
2013-06-30    5    25
2013-07-31    6    36
2013-08-31    7    49
2013-09-30    8    64
2013-10-31    9    81
2013-11-30  NaN   NaN
2013-12-31  NaN   NaN
2014-01-31  NaN   NaN
2014-02-28  NaN   NaN
2014-03-31  NaN   NaN
```

**Extrapolate the data.** Most extrapolators require the inputs to be numeric rather than dates. This can be done with:

```
# Temporarily remove dates and make the index numeric
di = df.index
df = df.reset_index(drop=True)
```

The linked answer shows how to extrapolate the values of each column of a DataFrame with a 3rd-order polynomial.
Snippet from the answer (`func` is the 3rd-order polynomial model and `guess` its initial parameter estimate; neither was defined in the original snippet, so illustrative definitions are included here, and `fit_df` is the NaN-free portion of `df`):

### Curve fit each column

```
from scipy.optimize import curve_fit

# 3rd-order polynomial model and an initial parameter guess
def func(x, a, b, c, d):
    return a * x**3 + b * x**2 + c * x + d

guess = (0.5, 0.5, 0.5, 0.5)
col_params = {}
fit_df = df.dropna()  # fit only on the rows that have data

for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, p0=guess)
    # Store optimized parameters
    col_params[col] = params[0]
```

### Extrapolate each column

```
for col in df.columns:
    # Get the index values for NaNs in the column
    mask = pd.isnull(df[col])
    x = df[mask].index.astype(float).values
    # Extrapolate those points with the fitted function
    df.loc[mask, col] = func(x, *col_params[col])
```

Once the columns are extrapolated, put the dates back.

### Put the date index back

```
df.index = di
```

### Display

```
print(df)
             x1    x2
2013-01-31    0     0
2013-02-28    1     1
2013-03-31    2     4
2013-04-30    3     9
2013-05-31    4    16
2013-06-30    5    25
2013-07-31    6    36
2013-08-31    7    49
2013-09-30    8    64
2013-10-31    9    81
2013-11-30   10   100
2013-12-31   11   121
2014-01-31   12   144
2014-02-28   13   169
2014-03-31   14   196
```

### Calculation of Control and test group

#### A probabilistic approach

What is the probability that your campaign is going to influence a behavioural change? Physician behaviour is influenced by many different factors, and a marketer should try to understand them. The major factor groups that influence consumer behaviour are:

1. Awareness factors
2. Desire: social and cultural factors, persuasion, and brand factors
3. Interest: learning, motivation, attitude and beliefs, perception

Idea of a basic survey design to gauge the above: the Physician Brand Behaviour (PBB) depends on four factors:

1. New Research Material (NRM)
2. Brand Prescription (BP)
3. Promotional Tools (PT)
4. Decision Sample (DS)

After extended research and a literature survey, several papers have been considered to prepare this approach.
As per a [paper](https://www.researchgate.net/publication/232050460_Factors_That_Influence_Physicians'_Prescribing_of_Pharmaceuticals_A_Literature_Review), regression analysis and factor analysis were selected to address the research objectives. The relationship between the dependent and independent variables was found using regression. The research presents the impact of different factors on physician prescription behaviour. The paper used the Statistical Package for the Social Sciences (SPSS) to estimate the relationship between the dependent and independent variables; we will use an equivalent `scipy` workflow plus our own proprietary algorithms.

**Model Equation**

**PBB = α + β₁·NRM + β₂·BP + β₃·PT + β₄·DS + e**

Here, PBB stands for physician brand behaviour; NRM stands for new research material or media introduced for the first time in India/any country (India, since we will be conducting the survey in India to start with); BP stands for branded drugs; PT stands for promotional tools (medium of promotion); DS stands for decision samples given by physicians during their campaign interaction or while filling in our survey; and e stands for the error term.

First, we will test whether the brand is known to the doctor and he prescribes it. Second, we will test the hypothesis that the brand is not known to the doctor and he does not prescribe it. Based on the outcome we can decide between retargeting and targeting. Hence we devise an approach to this evaluation.

**Model Hypotheses**

- H0 1: New research molecule is significant to physician prescription behaviour.
- H0 2: Branded prescription behaviour is significant to physician prescription behaviour.
- H0 3: Promotional tool is significant to physician prescription behaviour.
- H0 4: Drug sample is significant to physician prescription behaviour.
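To make the model equation concrete, here is a minimal ordinary-least-squares sketch with `numpy`. All data below is synthetic and the coefficients are illustrative; it only shows the shape of the estimation, not our proprietary algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic 1-5 Likert-style scores for the four predictors (made-up data).
NRM = rng.integers(1, 6, n).astype(float)
BP = rng.integers(1, 6, n).astype(float)
PT = rng.integers(1, 6, n).astype(float)
DS = rng.integers(1, 6, n).astype(float)

# Design matrix with a column of ones for the intercept (alpha).
X = np.column_stack([np.ones(n), NRM, BP, PT, DS])

# Generate PBB from chosen coefficients plus noise standing in for e.
true_beta = np.array([0.5, 0.8, 0.3, 0.4, 0.6])  # alpha, beta1..beta4
PBB = X @ true_beta + rng.normal(0.0, 0.1, n)

# Ordinary least squares estimate of (alpha, beta1..beta4).
beta_hat, *_ = np.linalg.lstsq(X, PBB, rcond=None)
```

With real survey data, `PBB` and the four predictor columns would come from the quantified questionnaire responses instead of being simulated.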
**Control & Measure: A/B testing**

In every A/B test, we formulate the null hypothesis, which is that the conversion rates for the control design (p_c) and the new tested design (p_n) are equal:

H₀: p_c = p_n

The null hypothesis is tested against the alternative hypothesis, which is that the two conversion rates are not equal:

H₁: p_c ≠ p_n

Before we start running the experiment, we establish three main criteria:

- **The significance level for the experiment:** A 5% significance level means that if we declare a winner in the A/B test (reject the null hypothesis), we have a 95% chance of being correct in doing so. It also means the difference between the control and the variation is significant at 95% "confidence." This threshold is, of course, an arbitrary one, chosen when designing the experiment.
- **The minimum detectable effect:** The smallest difference between the rates that we would consider relevant and would like to detect.
- **The test power:** The probability of detecting that difference between the original rate and the variant conversion rate.

Using the statistical analysis of the results, we either reject or fail to reject the null hypothesis. Rejecting the null hypothesis means the data shows a statistically significant difference between the two conversion rates. We could also use other methods such as ANOVA, or calculate standard errors or coefficients; however, in the initial draft at least, it will come down to rejecting or not rejecting the null hypothesis.

Failing to reject the null hypothesis means one of three things:

1. There is no difference between the two conversion rates of the control and the variation (they are EXACTLY the same).
2. The difference between the two conversion rates is too small to be relevant.
3. There is a difference between the two conversion rates, but we don't have enough sample size (power) to detect it.
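The hypothesis test above is commonly implemented as a two-proportion z-test, and the three criteria (significance level, minimum detectable effect, power) plug into a standard sample-size approximation. A minimal sketch; the conversion counts and rates below are made up:

```python
import math
from statistics import NormalDist

def two_proportion_z(conv_c, n_c, conv_n, n_n):
    """Two-proportion z-test for H0: p_c == p_n vs H1: p_c != p_n.

    Returns the z statistic and the two-sided p-value.
    """
    p_c, p_n = conv_c / n_c, conv_n / n_n
    p_pool = (conv_c + conv_n) / (n_c + n_n)      # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_n))
    z = (p_n - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

def required_sample_size(p_base, mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size needed to detect an absolute
    difference `mde` from baseline rate `p_base` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_var = p_base * (1 - p_base) + (p_base + mde) * (1 - p_base - mde)
    return math.ceil((z_alpha + z_beta) ** 2 * p_var / mde ** 2)

# Made-up example: 200/1000 control vs 260/1000 variant conversions.
z, p = two_proportion_z(200, 1000, 260, 1000)  # reject H0 if p < alpha

# Per-group sample size to detect a 5-point lift from a 20% baseline.
n_needed = required_sample_size(p_base=0.20, mde=0.05)
```

For instance, detecting a 5-point absolute lift from a 20% baseline at 5% significance and 80% power needs roughly 1,100 respondents per group under this approximation.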
The first case is very rare, since the two conversion rates usually differ. The second case is fine, since we are not interested in differences smaller than the threshold we established for the experiment (e.g. 0.01%). The worst-case scenario is the third one: we are unable to detect a difference between the two conversion rates even though it exists and, given the data, we are completely unaware of it. To prevent this problem, we need to calculate the sample size of the experiment before conducting it.

It is important to remember that there is a difference between the population conversion rate and the observed sample conversion rate. The population conversion rate is the conversion rate for the control across all visitors that will ever come to the page. The sample conversion rate is the control conversion rate measured while conducting the test. We use the sample conversion rate to draw conclusions about the population conversion rate. This is how statistics works: we draw conclusions about the population based on what we see in our sample. Making a mistake in our analysis based on faulty data (point 3) will impact the decisions we make for the population.

Type II errors occur when we are unable to reject a hypothesis that should be rejected. With Type I errors, we might reject a hypothesis that should NOT be rejected, concluding that there is a significant difference between the tested rates when in fact there isn't. These two situations are illustrated below:

![](https://i.imgur.com/AqwMNoN.png)

### Behaviour Quotient

#### Extracting behavioural traits of a physician based on the survey

1. Awareness factors
2. Desire: social and cultural factors, persuasion, and brand factors
3. 
Interest: learning, motivation, attitude and beliefs, perception

**Behaviour Index Quotient**

| Values  | Level of behaviour                             | Colour |
| ------- | ---------------------------------------------- | ------ |
| -1 to 0 | Brand has a negative image among physicians    | Maroon |
| 0 to 1  | Brand is completely unknown                    | Red    |
| 0 to 3  | Low influence of the brand                     | Amber  |
| 3 to 5  | Moderate influence of the brand                | Yellow |
| 5 to 7  | Above-average impact of the brand on behaviour | Amber  |
| 8 to 9  | Brand has very high impact                     | Green  |

**Data collection instruments:** The data will be collected through a questionnaire given to the target population (doctors); the questionnaire can use a five-point Likert scale:

1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree

## Project Timeline

```mermaid
gantt
    title A Gantt Diagram
    section Section
    Survey Phase 1 :a1, 2021-08-08, 30d
    Survey Phase 2 :after a1 , 20d
    section Another
    Phase 1 assessment :2021-09-08, 18d
    Phase 2 assessment : 24d
```

**Task breakdown timelines:**

- Survey questions: Aman
- Survey template: Aman
- Survey banner: Aman
- Survey backend (cookie code/script): Jeril
- Survey frontend: Aman + Jeril
- Survey other tech work: Jeril + Maddy
- Survey QA: Who?

### Roadblocks/Challenges

1. Evaluate retargeting data for India
2. Evaluate 3rd-party platform usage
3. Evaluate whether we can use a 3rd-party platform with HCPID mapping

Note: This is a work-in-progress document. Please reach out to Aman or Madhusudhan Anand with questions.