8 Lab [en] - Data visualization, Trend analysis

tags: Data Visualization, trend analysis, data mining

Introduction - Purpose of the exercise

The purpose of this exercise is to learn about the visual analytics features offered by Tableau.

1 Background information

Two popular approaches to visual data analysis include data visualization and visual analytics. Both methods play a very important role in data mining.

Data visualization is the graphical representation of data in the form of dashboards or reports. Individual visualizations present views of data that answer "what?" and "how?" questions, e.g., "What are our sales and profits for different regions and months or years?" Data visualization answers a finite set of questions and offers some interactivity during the data mining process.

Answering the "what?" and "how?" questions is the first step in data mining; the next step is to ask "why?". To make the data mining process deeper and more complete, one should use analytical methods. Visual analytics is a more user-friendly, technology-enabled branch of data analytics.

Table 1. Differences between the data visualization and visual analytics processes.

| Feature | Data visualization | Visual analytics |
|---|---|---|
| Answering questions: what?, how? | X | X |
| Illustration of data points, series, KPIs | X | X |
| Visual presentation of data: dashboards, reports | X | X |
| Support for interactivity: filters, highlighters, tooltips, data drilling | X | X |
| Deeper analysis, support for the question: why? | | X |
| Support for advanced analytical methods | | X |
| Unification of the visualization process, data mining and queries | | X |
| Helps to think visually about problems and questions. Leads to unexpected insights and allows identification of outliers in the data. | | X |
| Helps share key information and provides tools for group work with data | | X |

2 Visual analytics tools in Tableau

The following figure illustrates the visual analytics tools available in the Tableau tool cockpit.

[Figure: visual analytics tools in the Tableau cockpit]

2.1 Trend Lines

Trend lines are used to predict the continuation of a trend in an analyzed variable. They also help to identify the correlation between two variables. There are many mathematical models for establishing trend lines; Tableau provides the five most commonly used options: linear, logarithmic, exponential, power, and polynomial.
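To make the trend models concrete, here is a minimal sketch of how four of them can be fitted with ordinary least squares after transforming the data. This is one common textbook approach, not necessarily what Tableau does internally. The function names (`fit_line`, `fit_trend`, `r_squared`) and the demo numbers are hypothetical; the demo data follow a made-up cubic relationship chosen only for illustration. The polynomial model is left out because it needs a multi-coefficient solver.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_trend(xs, ys, model):
    """Return a prediction function for the chosen trend model.

    Non-linear models are fitted on log-transformed data, so they
    require strictly positive x and/or y values.
    """
    if model == "linear":        # y = a + b * x
        a, b = fit_line(xs, ys)
        return lambda x: a + b * x
    if model == "logarithmic":   # y = a + b * ln(x)
        a, b = fit_line([math.log(x) for x in xs], ys)
        return lambda x: a + b * math.log(x)
    if model == "exponential":   # ln(y) = a + b * x
        a, b = fit_line(xs, [math.log(y) for y in ys])
        return lambda x: math.exp(a + b * x)
    if model == "power":         # ln(y) = a + b * ln(x)
        a, b = fit_line([math.log(x) for x in xs],
                        [math.log(y) for y in ys])
        return lambda x: math.exp(a + b * math.log(x))
    raise ValueError(f"unknown model: {model}")


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Made-up demo data (NOT measurements): output = 2 * speed ** 3
speeds = [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
outputs = [2.0 * v ** 3 for v in speeds]

for model in ("linear", "logarithmic", "exponential", "power"):
    predict = fit_trend(speeds, outputs, model)
    score = r_squared(outputs, [predict(v) for v in speeds])
    print(f"{model:12s} R^2 = {score:.4f}")
```

Comparing the printed scores shows why the choice of model matters: data generated by a power law are matched almost exactly by the power model, while the other models leave a visible error.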

The following material demonstrates how to use trend lines to analyze the dependence of energy output on wind speed in wind turbines.

2.2 Statistics

Tableau reports two parameters whose values let us assess how well the chosen trend line fits the data:
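A minimal sketch, assuming the two parameters meant here are R-squared and the p-value, both of which Tableau shows when it describes a trend model. R-squared is computed directly for a linear fit. The p-value below is estimated with a permutation test, which is a stand-in technique: Tableau derives its p-value from the fitted model rather than from shuffling, so the numbers need not match. The function names and the demo data are hypothetical.

```python
import random


def r_squared_linear(xs, ys):
    """R-squared of a straight-line fit (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data fit at least as well as the real data."""
    rng = random.Random(seed)      # fixed seed keeps the result reproducible
    observed = r_squared_linear(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)      # break any real link between x and y
        if r_squared_linear(xs, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself,
    # so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Made-up demo data (NOT from the lab): roughly y = 2x + 1 with noise
xs = list(range(1, 11))
ys = [3.1, 4.8, 7.2, 9.1, 10.8, 13.3, 14.9, 17.2, 18.8, 21.1]

r2 = r_squared_linear(xs, ys)
p = permutation_p_value(xs, ys)
print(f"R^2 = {r2:.4f}, permutation p-value = {p:.4f}")
```

An R-squared close to 1 means the line explains most of the variation in the data, and a small p-value means a fit this good is unlikely to arise by chance.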

2.3 Additional tools

2.3.1 Histogram

A histogram is a graph that shows how many items fall within each range (bin) of values. It uses rectangles of different heights, and often different widths, to show the count or rate of something within each range. A histogram usually has no gaps between the columns.
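The binning behind a histogram can be sketched in a few lines. This is a minimal illustration with equal-width bins, assuming the last bin includes the maximum value; the function name `histogram` and the demo values are hypothetical.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min..max.

    Returns (edges, counts), where edges has bins + 1 entries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0                               # all values are identical
        else:
            # Clamp so that the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Made-up demo values (NOT from the lab data)
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram(values, 4)

# Text rendering: one row per bin, one '#' per item
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f}) {'#' * count}")
```

Changing the number of bins changes the shape of the picture, which is why it is worth trying several bin sizes before drawing conclusions.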

3 Exercise

3.1 Blood pressure vs age

Using systolic blood pressure (SBP) data for different age groups, fit optimal trend lines for women (1) and men (2). Prepare a dashboard that shows these trend lines along with a plot of the residual values. You can also add your own commentary to the dashboard, for example a hypothesis about how age influences the health of the average woman or man.

The data for the exercise are presented in the figure below (they should be transcribed into a format acceptable to Tableau).
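A residual is the difference between an observed value and the value predicted by the trend line. The sketch below shows the calculation for a straight-line fit; the function names and the demo numbers are hypothetical placeholders and are not the exercise data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, predict):
    """Observed value minus predicted value, one per data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]


# Placeholder numbers only (NOT the SBP data from the exercise)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x_demo, y_demo)
res = residuals(x_demo, y_demo, lambda x: intercept + slope * x)

for x, r in zip(x_demo, res):
    print(f"x = {x}: residual = {r:+.2f}")
```

If the residuals scatter randomly around zero, the chosen model is a reasonable description of the data; a curved or otherwise systematic pattern suggests trying a different trend model.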

[Figure: systolic blood pressure (SBP) data by age group for women and men]

3.2 Temperature vs number of sunspots

Background reading: "Climate explained: Sunspots do affect our weather, a bit, but not as much as other things"

Analyze the influence of solar activity (the number of sunspots) on global warming. You can download the data from Kaggle (search for a suitable dataset).

4 Exercise: carbon emissions analysis

We have developed a complex system of producing more and more animals that use more and more of our resources, while leaving a massive amount of waste, pollution and adverse climate change in their wake. This exercise focuses on the environmental impacts of food.

Prepare the following visualizations and analysis based on the Kaggle project Choose your food wisely!:

  1. Histograms of measures.

  2. Total emission vs Food product. Visualize the Pareto principle (the idea that a small share of causes accounts for a large share of the effects, often quoted as the 80/20 rule).

  3. Stages of supply chain (Land usage, Farm, Animal Feed, Processing, Transport, Retail, Packaging).

  4. Scatter plot: fresh water use per 1000 kcal vs fresh water use per kg.

  5. Cluster analysis of food products by their total impact on the environment.

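For the Pareto visualization in item 2 of the list above, the underlying calculation is a ranking plus a running share of the total. The following is a minimal sketch; the function names, product labels and numbers are hypothetical placeholders and are not taken from the Kaggle dataset.

```python
def pareto_table(totals):
    """Rank items by value and attach the cumulative share of the total.

    `totals` maps an item name to its value. Returns a list of
    (name, value, cumulative_share) tuples, largest value first.
    """
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    rows = []
    running = 0.0
    for name, value in ranked:
        running += value
        rows.append((name, value, running / grand_total))
    return rows


def items_covering(rows, share=0.8):
    """Number of top-ranked items needed to reach the given share."""
    for count, (_, _, cumulative) in enumerate(rows, start=1):
        if cumulative >= share:
            return count
    return len(rows)


# Placeholder values only (NOT from the Kaggle dataset)
emissions = {"A": 60, "B": 25, "C": 8, "D": 4, "E": 3}

rows = pareto_table(emissions)
for name, value, cumulative in rows:
    print(f"{name}: {value:3d}  cumulative share = {cumulative:.0%}")
print("items needed for 80%:", items_covering(rows))
```

In a Pareto chart the sorted values are drawn as bars and the cumulative share as a line on a second axis, so the point where the line crosses 80% shows how few items dominate the total.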