# 8 Lab [en] - Data visualization, Trend analysis
###### tags: `Data Visualization` `trend analysis` `data mining`
[TOC]
# Introduction - Purpose of the exercise
The purpose of this exercise is to learn about the visual analytics features offered by Tableau.
# 1 Background information
Two popular approaches to visual data analysis include **data visualization** and **visual analytics**. Both methods play a very important role in data mining.
**Data visualization** is the graphical representation of data in the form of dashboards or reports. Individual visualizations present views of data that answer "what?" and "how?" questions, e.g., "What are our sales and profits for different regions and months or years?"
Data visualization answers a finite set of questions and provides some interactivity during the data mining process.
Answering the "what?" and "how?" questions is the first step in data mining - the next is the "why?" question. To make the data mining process deeper and more complete, one should use analytics methods. **Visual analytics** is a more user-friendly and technology-enabled branch of data analytics.
**Table 1** Differences between Data Visualization and Visual Analytics processes.
| Features | Data visualization | Visual analytics |
| ------------------------------ |:-------------------:| :------------------:|
| Answering questions: what?, how? | X | X |
|Illustration of data points, series, KPIs| X|X|
|Visual presentation of data: dashboards, reports|X|X|
|Support for interactivity: filters, highlighters, tooltips, data drilling|X|X|
|Deeper analysis, support for question: why?||X|
| Support for advanced analytical methods||X|
|Unification of the visualization process, data mining and queries||X|
|Helps to think visually about problems and questions. Leads to unexpected insights and allows identification of outliers in the data.||X|
|Helps share key information and provides tools for group work with data||X|
# 2 Visual analytics tools in Tableau
The following figure illustrates the visual analytics tools available in the Tableau tool cockpit.

## 2.1 Trend Lines
Trend lines are used to predict how the trend of an analyzed variable is likely to continue. They also help to identify the correlation between two variables. There are many mathematical models for establishing trend lines - Tableau provides the five most commonly used ones: linear, logarithmic, exponential, power and polynomial.

The following material demonstrates how to use trend lines to analyze the dependence of energy output on wind speed in wind turbines.
{%youtube 9z-uTFk86p0 %}
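
Most of the trend models mentioned above reduce to a straight-line fit once one or both variables are log-transformed. The sketch below illustrates that idea in plain Python; it is not Tableau's implementation, the function names `fit_line` and `fit_trend` are hypothetical, the sample values are generated from a formula rather than measured, and the polynomial model is left out to keep the example short.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def fit_trend(xs, ys, model):
    """Return a prediction function for one of four trend models.

    linear:      y = a + b*x
    logarithmic: y = a + b*ln(x)    (needs x > 0)
    exponential: y = exp(a + b*x)   (needs y > 0)
    power:       y = exp(a) * x**b  (needs x > 0 and y > 0)
    """
    if model == "linear":
        a, b = fit_line(xs, ys)
        return lambda x: a + b * x
    if model == "logarithmic":
        a, b = fit_line([math.log(x) for x in xs], ys)
        return lambda x: a + b * math.log(x)
    if model == "exponential":
        a, b = fit_line(xs, [math.log(y) for y in ys])
        return lambda x: math.exp(a + b * x)
    if model == "power":
        a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
        return lambda x: math.exp(a) * x ** b
    raise ValueError(f"unknown model: {model}")

# Illustration only: values generated from y = 2 * x**3, not real turbine data.
wind_speed = [2.0, 4.0, 6.0, 8.0, 10.0]
power_out = [2 * v ** 3 for v in wind_speed]

predict = fit_trend(wind_speed, power_out, "power")
print(round(predict(5.0), 3))  # the generating formula gives 2 * 5**3 = 250
```

Because the sample follows a power law exactly, the power model recovers it; fitting the same values with `"linear"` would leave large, systematically curved errors.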
## 2.2 Statistics
Tableau reports two statistics that help assess how well the chosen trend line fits the data:
* p-value (probability value, test probability) - [the probability of observing a relationship at least as strong as the one in the sample purely by chance, through random sampling variability, when no such relationship exists in the population. It is a tool for basic error control only, and indicates the evidential value of data only indirectly.](https://pl.wikipedia.org/wiki/Warto%C5%9B%C4%87_p)
* coefficient of determination (R²) - [Tells how much of the variation (variance) of the explained variable in the sample is accounted for by the variables in the model. It is therefore a measure of the degree to which the model fits the sample. The coefficient of determination takes values in the interval [0, 1] if the model includes an intercept (constant term) and the parameters were estimated with the least squares method. Its values are usually expressed as a percentage. The better the model fit, the closer the R² value is to 1.](https://pl.wikipedia.org/wiki/Wsp%C3%B3%C5%82czynnik_determinacji)
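
Both statistics can be illustrated with a few lines of plain Python. The sketch below computes R² for a straight-line fit and estimates a p-value with a permutation test, which is a swapped-in technique chosen because it mirrors the "could this have happened by chance?" definition directly; it is not how Tableau calculates its reported value. The function names and the sample numbers are made up for illustration.

```python
import random

def linear_r2(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of random shufflings of y that fit at least as well as the real data."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = linear_r2(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between x and y
        if linear_r2(xs, shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed ordering itself
    return (hits + 1) / (trials + 1)

# Made-up sample: roughly y = 2x with small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

r2 = linear_r2(xs, ys)
p = permutation_p_value(xs, ys)
print(f"R^2 = {r2:.4f}, permutation p-value = {p:.4f}")
```

A high R² together with a small p-value suggests that the fitted line describes the sample well and that such a fit would be unlikely if the two variables were unrelated.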
## 2.3 Additional tools
### 2.3.1 Histogram

Histogram - a type of graph that uses rectangles (bars) to show how many items, or what proportion of items, fall within each of a set of ranges (bins). The bars differ in height and sometimes also in width, and there are usually no gaps between them.
[More information](https://en.wikipedia.org/wiki/Histogram)
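
The counting behind a histogram can be sketched in a few lines. The example below assumes equal-width bins spanning the minimum to the maximum of the data; the function name `histogram` and the sample values are hypothetical.

```python
def histogram(values, bin_count):
    """Count values in `bin_count` equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are equal; cannot form bins")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # the maximum value is placed in the last bin instead of overflowing
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up sample
edges, counts = histogram(values, 4)

# Text rendering: one row per bin, one '#' per item in that bin.
for i, count in enumerate(counts):
    print(f"{edges[i]:4.1f} to {edges[i + 1]:4.1f} | {'#' * count}")
```

Changing `bin_count` changes the picture considerably, which is why it is worth trying several bin sizes before drawing conclusions from a histogram.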

# 3 Exercise
## 3.1 Blood pressure vs age
Using systolic blood pressure (SBP) data for different age groups, fit the best-fitting trend lines for women (1) and men (2). Prepare a dashboard that shows these trend lines along with a plot of the residual values. You can also add your own commentary to the dashboard, for example a hypothesis about how age influences the health of the average woman/man.
Data for the exercise are presented in the figure below (they should be transcribed into a format that Tableau can read).
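
A residual is the difference between an observed value and the value predicted by the trend line. The sketch below shows the calculation for a straight-line fit; the numbers are placeholders for illustration only and are NOT the values from the exercise figure, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b

def residuals(xs, ys):
    """Observed minus predicted value for every data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Placeholder numbers, not the data from the exercise figure.
age = [20, 30, 40, 50, 60, 70]
sbp = [112, 115, 121, 128, 137, 148]

res = residuals(age, sbp)
for years, r in zip(age, res):
    print(f"age {years}: residual {r:+.2f}")
```

If the residuals scatter randomly around zero, the chosen model is adequate. A systematic pattern, such as positive residuals at both ends and negative ones in the middle, suggests trying a curved model instead.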

## 3.2 Temperature vs number of sunspots
[Climate explained: Sunspots do affect our weather, a bit, but not as much as other things...](https://theconversation.com/climate-explained-sunspots-do-affect-our-weather-a-bit-but-not-as-much-as-other-things-145101)
Try to analyse the influence of solar activity (the number of sunspots) on global warming. You can download the data from Kaggle (search for suitable datasets).
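
Sunspot counts and temperature records usually come from separate datasets, so they have to be aligned on a common key (for example the year) before any relationship can be measured. The sketch below joins two series by year and computes the Pearson correlation coefficient; the dictionary names and all values are placeholders, not real measurements.

```python
import math

def join_by_year(series_a, series_b):
    """Keep only the years present in both series; return two aligned lists."""
    years = sorted(set(series_a) & set(series_b))
    return [series_a[y] for y in years], [series_b[y] for y in years]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values for illustration only.
sunspots_by_year = {2001: 30.0, 2002: 50.0, 2003: 20.0, 2004: 40.0}
temperature_by_year = {2002: 0.4, 2003: 0.5, 2004: 0.3, 2005: 0.6}

spots, temps = join_by_year(sunspots_by_year, temperature_by_year)
print(f"common years: {len(spots)}, r = {pearson_r(spots, temps):+.3f}")
```

A correlation coefficient only measures linear association; it says nothing about causation, so other drivers of temperature still need to be considered when interpreting the result.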
# 4 Exercise: carbon emissions analysis
We have developed a complex system of producing more and more animals that use more and more of our resources, while leaving a massive amount of waste, pollution and adverse climate change in their wake. This exercise focuses on the environmental impacts of food.

Prepare the following visualizations and analysis based on the Kaggle project [Choose your food wisely!](https://www.kaggle.com/code/selfvivek/choose-your-food-wisely):
1. Histograms of measures.
2. Total emission vs Food product. Visualize the [Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle) (the idea that a small share of causes accounts for a large share of the effects, for example a few food products accounting for most of the total emissions).

3. Stages of supply chain (Land usage, Farm, Animal Feed, Processing, Transport, Retail, Packaging).

4. Scatter plot: Fresh Water use per 1000 kcal vs Fresh Water use per kg.
5. Cluster analysis of food products by their total impact on the environment.
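
A Pareto chart combines bars sorted from largest to smallest with a line showing the cumulative share of the total. The calculation behind it can be sketched as follows; the function name `pareto_table` is hypothetical and the product names and values are placeholders, not figures from the Kaggle dataset.

```python
def pareto_table(totals):
    """Sort items by value, largest first, and add the cumulative share of the total."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    running = 0.0
    rows = []
    for name, value in ranked:
        running += value
        rows.append((name, value, running / grand_total))
    return rows

# Placeholder values for illustration only.
emissions = {
    "product C": 10.0,
    "product A": 50.0,
    "product E": 4.0,
    "product B": 30.0,
    "product D": 6.0,
}

rows = pareto_table(emissions)
for name, value, share in rows:
    print(f"{name}: {value:5.1f}  cumulative {share:6.1%}")
```

In Tableau the same result is typically obtained by sorting the products in descending order of total emission and adding a running-total table calculation expressed as a percentage of the total.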
