# 01 [en] - Introduction, workshop

###### tags: `Data Visualization` `Tableau` `charts`

[TOC]

# 1. Basic information

## 1.1 Analytical platform - common core functionalities

```markmap
# Analytical Platform (Tableau)

## Data Sources
- Database Connectors
- APIs
- Cloud Storage
- Local Files (CSV, Excel)
- Web Data Connectors

## Data Transformation & Preparation
- Data Cleaning
- Data Blending
- Data Aggregation
- Joins & Unions
- Calculated Fields

## Visualization & Analytics
- Charts & Graphs
- Dashboards
- Storytelling
- Analysis
- Statistical Analysis
- Machine Learning Analysis
- Mapping & Geo Analysis

## Collaboration & Sharing
- Exporting
- Publishing
- User Permissions
- Commenting

## Extensions & Integrations
- API Integrations
- Custom Visualizations
- Embedded Analytics
- Third-party Add-ons
```
<center><small>Fig. Analytical platform - common core functionalities</small></center>

![image](https://hackmd.io/_uploads/B1Oqxgpl1x.png)
<center><small>Fig. Gartner Magic Quadrant for Analytics and Business Intelligence Platforms 2023.</small></center>

Gartner Magic Quadrants are research reports produced by Gartner, a leading global research and advisory firm. These reports provide a graphical representation of a market's direction, maturity, and participants, assessing vendors based on their ability to execute and their completeness of vision.

The Magic Quadrant is divided into four quadrants:
* Leaders: Vendors in this quadrant demonstrate a strong ability to execute and have a comprehensive vision for their market. They are well-established and have a significant market presence.
* Challengers: These vendors have a strong ability to execute but may lack a complete vision for future market trends. They are often larger companies with solid products but may not be innovating as quickly as leaders.
* Visionaries: Vendors in this quadrant exhibit a clear vision of future trends and innovations but may struggle with execution or lack the necessary market presence.
* Niche Players: These vendors focus on a specific segment of the market and may excel in that niche. However, they often have limited ability to execute across the broader market or lack a comprehensive vision.

Gartner Magic Quadrants are widely used by businesses to assess technology vendors and make informed purchasing decisions.

## 1.2 The process of preparing a visualisation - Tableau

[Fig.1](#rys1) presents the main steps that make up the process of preparing a visualisation in Tableau.

<a id="rys1"></a>![](https://i.imgur.com/s5wiUvw.png)
<center><small>Fig.1 BI visualization preparation process.</small></center>

## 1.3 Management of visualisation project files

A typical visualization project implemented within BI requires a repository consisting of several types of files:

1. *Data source* files - files that store data, e.g. ***csv, xls, txt, ...***
1. Tableau *workbook* files - files with the extension ***twb***.
1. Tableau *extract* data source files - files that store subsets of data extracted from a data source to optimize data access and provide functionality not supported by the original data source. [Detailed information from the vendor's website.](https://help.tableau.com/current/pro/desktop/en-us/extracting_data.htm) File extension - **hyper**.
1. *Packaged workbook* files - files that combine a Tableau workbook with an *extract* data source into one entity - extension **twbx**.

:::warning
**Note** :warning:
It is recommended that you create a dedicated, hierarchical file structure to store the data for all projects carried out in class. As a rule, all files related to one project should be stored in one folder. Below is a proposed example of a file organisation structure.
```graphviz
digraph hierarchy {
  // Common styling for all folders/files (nodes) and links (edges)
  node [color=Red,fontname=Courier,shape=box]
  edge [color=Grey, style=dashed]

  // Top level: one folder per student, then one folder per lab
  NameSurname->{"01 lab" "02 lab" ".."}
  // Each lab holds one folder per exercise
  "01 lab"->{"01 exercise" "02 exercise" "03 exercise"}
  // Each exercise folder keeps all files of a single project
  "01 exercise"->{"data source \nxls,csv,.." "Tableau workbook\n*.twb" "Tableau extract \n *.hyper" "Tableau packaged \nworkbook *.twbx"}
}
```
:::

## 1.4 Tableau Public - creating an account on the server

In order to publish your prepared visualizations, you must have an account on a Tableau server. Tableau offers two types of servers:
1. commercial servers - license required; they offer extensive methods for managing permissions and organizing published visualizations;
1. a free server provided by Tableau, the so-called [*Tableau Public*](https://public.tableau.com/) - a non-commercial server with limited ability to manage published visualizations.

<br>

# 2. Sample visualisations in Tableau

## 2.1 Preparing the data source

[Download the data source](https://docs.google.com/spreadsheets/d/1tj-HaKkxR-sEwHgpR6jYjVMC7NKUysb5/edit?usp=sharing&ouid=111325505720189090582&rtpof=true&sd=true) - a file from an online store containing an orders database.

## 2.2 Connect the data source to the worksheet

![image](https://hackmd.io/_uploads/B12lwgTgJx.png)
<center><small>Fig. Connection to data source: Excel spreadsheet or directly to Google Drive</small></center>

## 2.3 Data Types, Dimensions and Measures, Blue and Green

Tableau identifies each field as a dimension or measure in the Data pane, depending on the type of data the field contains.

Data fields are made from the columns in your data source. Each field is automatically assigned a data type such as integer, string, or date, and a role: a discrete dimension or continuous measure (or less commonly, a continuous dimension or discrete measure).

* **Dimensions** contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data.
Dimensions affect the level of detail in the view.
* **Measures** contain numeric, quantitative values that you can measure. Measures are aggregated by default. When you drag a measure into the view, Tableau applies an aggregation on the pill.

### Blue versus green fields

Tableau represents data differently in the view depending on whether the field is discrete or continuous. Continuous and discrete are mathematical terms.

* Continuous means "forming an unbroken whole, without interruption". **<font color=green>These fields are colored green.</font>** When a continuous field is put on the Rows or Columns shelf, an axis is created in the view.
* Discrete means "individually separate and distinct". **<font color=blue>These fields are colored blue.</font>** When a discrete field is put on the Rows or Columns shelf, a header is created in the view.

### Tableau Desktop workflow

![image](https://hackmd.io/_uploads/BJiwlbW-kg.png)
<center><small>Fig. Sample ETL process - zero step in the Tableau workflow</small></center>

To share dashboards online you can use:
- Tableau Server - a commercial software component,
- [Tableau Public](https://public.tableau.com/app/discover) - a free component, which requires creating a user account in Tableau Public.

## 2.4 Sample visualisations - types of charts

Common components of the user interface:
- Rows and Columns shelves,
- **Marks** panel and visual attributes (e.g. color, size, label, tooltip, shape, ...),
- **Filters**,
- **Data** tab (measures and dimensions).

![image](https://hackmd.io/_uploads/r1O2D-Z-ke.png)
<center><small>Fig. User interface - common components</small></center>

### 2.4.1 Line chart / Bar chart

Prepare a line chart visualising sales (Sales) over time (Order date). Set up:
1. labels (values of *Sales*),
1. different colors for different product categories (*Category*),
1. the number format for the Sales axis (without the **K** suffix),
1. the workbook locale.

![image](https://hackmd.io/_uploads/HyoVoiGZkx.png)
<center><small>Fig.
Line chart</small></center>

:::info
#### Bar chart - exercise

Practise using filters and changing the order of fields on the Rows/Columns shelves to prepare Fig.A.

![image](https://hackmd.io/_uploads/rJNQcGZWyg.png)
<center><small>Fig.A Sum(Sales) versus categories and selected countries </small></center>
:::

### 2.4.2 Point chart and how to avoid overplotting

Visualizing large data sets with x-y plots can be difficult due to overplotting, where points overlap and create a cluttered display.

Prepare a **Circle** or **Shape chart** visualising total sales in product categories for different regions.

![image](https://hackmd.io/_uploads/H10YJsGW1l.png)
<center><small>Fig. Overplotting</small></center>

Try to avoid overplotting by:
1. scattering points along an additional measure (*random()* function),
2. replacing the visualization of individual points with a visualization of statistical distributions (box plot, histogram),
3. using the data drilling technique - defining hierarchies in the data source.

![image](https://hackmd.io/_uploads/Bkm3Ejz-1x.png)

### 2.4.3 Maps

Prepare a map for the **EU Market** that illustrates the number of customers (by colour and label) in each country.

![image](https://hackmd.io/_uploads/BknYtoMbyg.png)
<center><small>Fig. Number of customers in different countries</small></center>

### 2.4.4 Pie chart / Donut chart / Sunburst chart

Prepare a **Pie chart** visualising the number of orders per category. Set up:
1. filters for markets: EU and EMEA,
2. measure: COUNTD(Order ID), dimension: Category,
3. labels; for the percentage value use a Quick Table Calculation.

![image](https://hackmd.io/_uploads/r1X9e7b-1x.png)
<center><small>Fig. Pie/Donut/Sunburst chart</small></center>

## 2.5 Dashboards and visualisation interactivity

* show and customize filters
* define tooltips
* connect sheets on the dashboard

![image](https://hackmd.io/_uploads/H1dVRsGZ1l.png)
<center><small>Fig.
Sample of the final dashboard</small></center>

## 2.6 Publishing to the *Tableau Public* server

The final step is to publish the visualization on the *Tableau Public* server. To do this:
* on the *Data source* tab, change the connection type from *Live* to *Extract*,
* in the top menu choose Server -> Tableau Public -> Save to Tableau Public; after providing your login details, choose a name for the visualization on the server and publish it.

# Do-it-yourself task

---

*[BDL]: Local Data Bank
*[BI]: Business Intelligence
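
As a rough illustration of the first overplotting remedy from section 2.4.2 (scattering points along an additional random measure), here is a minimal Python sketch. It is an analogue of the idea, not Tableau code: the helper name `jitter`, the `width` and `seed` parameters, and the sample sales values are all hypothetical and chosen only for this example.

```python
import random


def jitter(values, width=0.4, seed=0):
    """Pair each value with a random horizontal offset in [-width, width].

    Hypothetical helper: it mimics plotting a measure against a random
    number so that identical values no longer sit on top of each other.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [(rng.uniform(-width, width), value) for value in values]


# Made-up sales figures with repeated values (these would overplot).
sales = [120.0, 120.0, 120.0, 75.5, 75.5]

# Without jitter every point shares the same x position.
plain_points = [(0.0, value) for value in sales]
# With jitter each point gets its own random x position.
jittered_points = jitter(sales)

print(len(set(plain_points)))     # distinct positions without jitter
print(len(set(jittered_points)))  # distinct positions with jitter
```

The measure values themselves are left unchanged; only the position on the extra axis is randomised, which is why this remedy suits views where the second axis carries no meaning of its own.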