# 11 Lab [en] - Predictive modelling, actions in dashboards
###### tags: `Data Visualization` `predictive modelling`
[TOC]
# Introduction - Purpose of the exercise
This exercise covers predictive modeling techniques using Tableau's built-in functions:
* MODEL_QUANTILE,
* MODEL_PERCENTILE.
The source data for the exercise is the data source stored in Tableau: **World Indicators**. The exercise is to examine the relationship between the following measures contained in the above file:
* health expenditure per capita,
* female life expectancy,
* birth rate.
We assume that there is a relationship (trend) between the following measures:
* **a** - female life expectancy and health expenditure,
* **b** - female life expectancy and birth rate (per 1000 population).
In this exercise we will try to examine which countries fit well into the observed trend, and which are moving away from it.
# 1: Life expectancy and health expenditure
We examine the relationship between the indicators:
* Life Expectancy Female,
* Health Exp/Capita (health expenditure per person).
Plot the correlation (using the AVG and MEDIAN aggregation functions) as shown in Fig.1. In addition, to check the trend between the measures in the figure we have added a best-fit trend line (please choose which one).

<center><small>Rys.1 </small></center></div>
In order to eliminate the effect of a large difference in the value of the coefficient associated with per capita expenditures, a logarithmic scale on the OX axis can be used - Fig.2.

<center><small>Rys.2 </small></center></div>
Assuming that the logarithmic relationship is a trend that describes the analyzed data set well, let us assume that further analysis will be conducted between:
* AVG(Life Expectancy Female)
* LOG(MEDIAN(Health Exp/Capita)).
To do this, define a calculation field, e.g. **LifeExpPerc** according to the formula:
```
MODEL_PERCENTILE(AVG([Life Expectancy Female]), LOG(MEDIAN([Health Exp/Capita])))
```
Next, add the defined calculation field MODEL_PERCENTILE to the view, according to the following points.
1. Drag LifeExpPerc to Color on the Marks card.
Click the drop-down arrow on the pill and select Compute Using > Country/Region.
1. Click Color on the Marks card, and then click Edit Colors.
2. Under Palette, select Orange-Blue Diverging.
Select the Stepped Color checkbox.
1. Select the Reversed checkbox.
2. Click OK.

In Fig. you can see the distribution of countries where life expectancy is both higher and lower than expected based on spending levels. Note that, in general, dark red indicates that life expectancy is high relative to health care spending, dark blue indicates that life expectancy is low relative to health care spending, and gray indicates that life expectancy is close to that expected by the model, based on the level of health care spending.
> **MODEL_PERCENTILE returns the probability that the unobserved value is less than or equal to the observed marker, as defined by target_expression and based on other predictors** (predictor - a statistical model variable, used in forecasting; independent variable, explanatory variable) that the user can select. This is the a posteriori prediction function, also known as the cumulative density function (CDF). This calculation will return a value between 0 and 1.
> **Percentile**, also called centile (**centile**) in measurement practice, is a statistical unit that describes the position of a given result relative to a whole group of results. Percentile** is a statistical unit which describes the position of a given result in relation to the whole group of results.
## Grouping by colors
To simplify the analysis, use the prediction results in the calculation box to visually group the results. We will build the groups so that we group together markers above the 90th and below the 10th percentile, scores between the 80th and 90th percentile, and scores between the 10th and 20th percentile, and so on.
```
IF
ISNULL([LifeExpPerc])
THEN "Null"
ELSEIF [LifeExpPerc] >=0.9 OR
[LifeExpPerc] <=0.1
THEN "<10th & >90th percentile"
ELSEIF [LifeExpPerc] >=0.8 OR
[LifeExpPerc] <=0.2
THEN "<20th & >80th percentile"
ELSEIF [LifeExpPerc] >=0.7 OR
[LifeExpPerc] <=0.3
THEN "<30th & >70th percentile"
ELSEIF [LifeExpPerc] >=0.6 OR
[LifeExpPerc] <=0.4
THEN "<40th & >60th percentile"
ELSE "50th percentile +-10"
END
```
Then add the above mentioned calculation field to the color tag of the popup menu by choosing Compute Using > Country/Region. Also, please select the **Traffic Light** color palette and gray for Nulls.

Looking at the orange mark in the corner, notice that the United States spends $8895 per woman with a life expectancy of 81 years. Moving along the X axis to the left, you can see that other countries spend less and have the same life expectancy.
The model estimates the strength of the relationship for each country.
# 2. Life expectancy vs. birth rate
In the next step, we will examine the trend between female life expectancy and birth rate.

Following the previous procedure, we will implement the modeling using the MODEL_PERCENTILE function

Unfortunately, not all points fit the trend.
The **MODEL_QUANTILE** function can be used to verify more subtle relationships. MODEL_QUANTILE is used to generate numerical predictions based on target percentile, target expression, and predictors. It is the inverse of **MODEL_PERCENTILE**.
Let's define the variable **quantile of life expectancy vs. expenditure**.
```
MODEL_QUANTILE(0.5, AVG([Life Expectancy Female]), LOG(MEDIAN([Health Exp/Capita])))
```
Let's explain the calculation to better understand it:
* We start with MODEL_QUANTILE, where the first argument is 0.5, specifying what percentile to predict.
* The target expression is female life expectancy.
* The predictor is the median health expenditure per capita.
Add the above expression to the tooltip as **Model predicted female life expectancy**. Please use this variable to describe each point (State) in the tooltip.