---
title: Introduction
description:
duration: 200
card_type: cue_card
---
### **DataViz-Lecture 02-VideoGames (1 hour 30 minutes)**
#### **Content**
- Quizzes
- Quiz 1 (Barplot)
- Quiz 2 (Scatterplot)
- Bivariate
- Continuous-Continuous
- Line plot
- Scatterplot
- Categorical-Categorical
- Dodged countplot
- Stacked countplot
- Categorical-Continuous
- Multiple BoxPlots
- Barplots
- Subplots
---
title: Bivariate Data Visualisation intro, Line plot
description:
duration: 1200
card_type: cue_card
---
#### **Importing the data**
Code:
``` python=
!gdown https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/021/299/original/final_vg1_-_final_vg_%281%29.csv?1670840166 -O vgsales.csv
```
> Output:
```
Downloading...
From: https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/021/299/original/final_vg1_-_final_vg_%281%29.csv?1670840166
To: /content/vgsales.csv
100% 2.04M/2.04M [00:01<00:00, 1.76MB/s]
```
Code:
``` python=
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```
Code:
``` python=
data = pd.read_csv('vgsales.csv')
data.head()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/932/original/1.png?1695752105" width="700" height="150">
Code:
``` python=
data.describe()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/933/original/2.png?1695752161" width="700" height="250">
### **Bivariate Data Visualization**
#### **Continuous-Continuous** (30-40 Minutes)
So far we have been analyzing only a single feature.
But what if we want to visualize two features at once?
#### What kind of questions can we ask regarding a continuous-continuous pair of features?
- Maybe show the relation between two features, like **how do the sales vary over the years**?
- Or show **how the features are associated, positively or negatively**?
...And so on
Let's go back to the line plot we plotted at the very beginning
#### **Line Plot**
- A line chart in data visualization is a type of **graph** that displays data points as connected line segments.
- It is commonly used to show **trends**, **patterns**, or changes in data over time or across categories, with the x-axis typically representing **time or categories** and the y-axis representing **values or quantities**.
- Line charts are useful for visualizing continuous data and making it easier to understand how variables relate to each other.
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/322/original/Line.png?1695495606" width="400" height="300">
#### How can we plot the sales trend over the years for the longest running game?
First, let's find the longest running game
Code:
``` python=
data['Name'].value_counts()
```
> Output:
```
Ice Hockey 41
Baseball 17
Need for Speed: Most Wanted 12
Ratatouille 9
FIFA 14 9
..
Indy 500 1
Indy Racing 2000 1
Indycar Series 2005 1
inFAMOUS 1
Zyuden Sentai Kyoryuger: Game de Gaburincho!! 1
Name: Name, Length: 11493, dtype: int64
```
Great, so `Ice Hockey` appears in more rows (41) than any other game, so we will take it as the longest running game
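Note that `value_counts()` counts rows per name, which is only a proxy for how long a game has been running. A minimal sketch of checking the span of years directly, on a small made-up frame that only assumes the `Name` and `Year` columns (`toy`, `row_counts` and `year_span` are names introduced here for illustration):

```python
import pandas as pd

# Toy stand-in for `data`; only the `Name` and `Year` columns are assumed
toy = pd.DataFrame({
    'Name': ['Ice Hockey', 'Ice Hockey', 'Ice Hockey', 'Baseball', 'Baseball'],
    'Year': [1981, 1995, 2010, 1983, 1990],
})

# Number of rows per game (what value_counts() reports)
row_counts = toy['Name'].value_counts()

# Number of years between the first and last entry of each game
year_span = toy.groupby('Name')['Year'].agg(lambda s: s.max() - s.min())

print(row_counts)
print(year_span.sort_values(ascending=False))
```

The same two lines should work on `data` itself, as long as `Year` holds numeric values.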
Let's plot its sales trend in North America across the years
Code:
``` python=
ih = data.loc[data['Name']=='Ice Hockey']
sns.lineplot(x='Year', y='NA_Sales', data=ih)
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/945/original/4.png?1695752926" width="400" height="300">
#### What can we infer from this graph?
- The sales across North America seem to have been boosted in the years of 1995-2005
- Post 2010 though, the sales seem to have taken a dip
Line plots are great for representing trends over time, such as the one above
#### Style and Labelling
We already learnt how to add **titles, x-labels and y-labels** when working with barplots
Let's add the same here
Code:
``` python=
plt.title('Ice Hockey Sales Trend')
plt.xlabel('Year')
plt.ylabel('Sales')
sns.lineplot(x='Year', y='NA_Sales', data=ih)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/946/original/5.png?1695753007" width="400" height="300">
- **Labels** give meaning to the values on the x and y axes
- The **title** states the purpose of the plot
#### Now what if we want to change the colour of the curve ?
`sns.lineplot()` accepts an argument **color**
- It takes a matplotlib color
OR
- a string for some predefined colours like:
- black: `k`/`black`
- red: `r`/`red` etc
**But what all colours can we use ?**
Matplotlib provides a lot of colours
Check the documentation for more colours
<https://matplotlib.org/2.0.2/api/colors_api.html>
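To get a feel for which colour strings are accepted, here is a small sketch using `matplotlib.colors`. It converts a few equivalent spellings of red to RGBA and checks whether a string is a valid colour. The example strings are arbitrary choices, not part of the lecture data.

```python
from matplotlib import colors

# Single-letter codes, full names, and hex strings are all accepted
for spec in ['r', 'red', '#ff0000']:
    print(spec, colors.to_rgba(spec))

# is_color_like() checks whether a value would be accepted as a colour
print(colors.is_color_like('teal'))         # a valid named colour
print(colors.is_color_like('not-a-color'))  # not a valid colour
```

Any string that passes `is_color_like()` can be given to the `color` argument.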
Code:
``` python=
plt.title('Ice Hockey Sales Trend')
plt.xlabel('Year')
plt.ylabel('Sales')
sns.lineplot(x='Year', y='NA_Sales', data=ih, color='r')
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/947/original/6.png?1695753295" width="400" height="300">
Now, let's say we only want to show the values from the years 1995-2010
#### How can we limit our plot to a specific range of years?
This requires changing the range of x-axis
#### But how can we change the range of an axis in matplotlib ?
We can use:
- `plt.xlim()`: x-axis
- `plt.ylim()`: y-axis
Both functions take 2 arguments:
1. `left` (`bottom` for `plt.ylim()`): Starting point of the range
2. `right` (`top` for `plt.ylim()`): End point of the range
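As a quick sketch of both functions together: `plt.xlim()` names its arguments `left` and `right`, while `plt.ylim()` names them `bottom` and `top`. The plotted values below are made up, and `x_limits`/`y_limits` are names introduced for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1990, 2000, 2010], [0.2, 0.9, 0.4])  # made-up values

# x-axis uses left/right, y-axis uses bottom/top
plt.xlim(left=1995, right=2010)
plt.ylim(bottom=0, top=1)

# Called without arguments, both functions return the current limits
x_limits = plt.xlim()
y_limits = plt.ylim()
print(x_limits, y_limits)
```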
Code:
``` python=
plt.title('Ice Hockey Sales Trend')
plt.xlabel('Year')
plt.ylabel('NA Sales')
plt.xlim(left=1995,right=2010)
sns.lineplot(x='Year', y='NA_Sales', data=ih)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/948/original/7.png?1695753488" width="400" height="300">
So far we have visualised a single plot to understand it
**What if we want to compare it with some other plot?**
Say, we want to compare the same sales trend between two games
- Ice Hockey
- Baseball
Let's first plot the trend for "Baseball"
Code:
``` python=
baseball = data.loc[data['Name']=='Baseball']
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/949/original/8.png?1695753579" width="400" height="300">
Now, to compare these, we will have to draw both plots in the same figure
#### How can we plot multiple plots in the same figure ?
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih)
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/950/original/9.png?1695753641" width="400" height="300">
We can use multiple `sns.lineplot()` calls
Observe:
Seaborn automatically created 2 plots with **different colors**
#### However, how can we know which colour belongs to which plot?
- `sns.lineplot()` has another argument **label** to do so
- We can simply set the label of each plot
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih, label='Ice Hockey')
sns.lineplot(x='Year', y='NA_Sales', data=baseball, label='Baseball')
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/951/original/10.png?1695753784" width="400" height="300">
We can also pass these labels to `plt.legend()` as a list, in the order in which the plots are drawn
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih)
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
plt.legend(['Ice Hockey','Baseball'])
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/952/original/11.png?1695754010" width="400" height="300">
#### Now can we change the position of the legend, say, to bottom-right corner?
- Matplotlib automatically decides the best position for the legends
- But we can also change it using the `loc` parameter
- `loc` takes input as 1 of following strings:
- upper center
- upper left
- upper right
- lower right etc
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih)
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
plt.legend(['Ice Hockey','Baseball'], loc='lower right')
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/953/original/12.png?1695754113" width="400" height="300">
#### Now what if we want the legend to be outside the plot?
Maybe the plot is too congested to show the legend
We can use the same `loc` parameter for this too
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih)
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
plt.legend(['Ice Hockey','Baseball'], loc=(-0.5,0.5))
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/954/original/13.png?1695754238" width="500" height="300">
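The pair of floats gives the (x, y) position of the legend's lower-left corner in axes coordinates, where (0, 0) is the bottom-left and (1, 1) the top-right of the plot area. The negative x value is what pushes the legend outside the plot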
==> From this we can conclude `loc` takes **two types of arguments**:
- The location in the **form of string**
- The location in the **form of coordinates**
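The two forms can also be combined. As a sketch that goes slightly beyond what is covered here, `legend()` accepts a `bbox_to_anchor` argument, in which case the `loc` string says which corner of the legend is pinned to that point. The data and labels below are made up.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], label='series A')  # made-up data
ax.plot([1, 2, 3], [2, 3, 5], label='series B')

# Pin the legend's upper-left corner just to the right of the plot area
legend = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1))

labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```

This keeps the legend outside the plot without having to guess negative coordinates.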
#### What if we want to add other stylings to legends ?
For eg:
- Specify the **number of rows/cols**
- Use the parameter `ncol` for this (also spelled `ncols` from Matplotlib 3.6 onwards)
- The number of **rows is decided automatically**
- Decide if we want the box of legends to be displayed
- Use the bool param `frameon`
and so on.
Code:
``` python=
sns.lineplot(x='Year', y='NA_Sales', data=ih)
sns.lineplot(x='Year', y='NA_Sales', data=baseball)
plt.legend(['Ice Hockey','Baseball'], loc='lower right', ncol = 2, frameon = False)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/955/original/14.png?1695754327" width="400" height="300">
Now say we want to highlight a point on our curve.
For e.g.
#### How can we highlight the maximum "Ice Hockey" sales across all years?
Let's first find this point
Code:
``` python=
print(max(ih['NA_Sales']))
```
> Output:
```
0.9
```
---
title: Scatter Plot
description:
duration: 400
card_type: cue_card
---
#### **Scatter Plot**
- A scatter plot in data visualization is a graph that displays individual data points as dots on a two-dimensional plane.
- It helps show **how two variables are related** or how they vary together, with one variable plotted on the horizontal **(x-axis)** and the other on the vertical **(y-axis)**.
- This type of chart is useful for **identifying patterns, trends, or correlations** in data.
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/323/original/diag2.png?1695495919" width="400" height="350"> <br />
Now suppose we want to find the relation between `Rank` and `Sales` of all games.
#### Are `Rank` and `Sales` positively or negatively correlated?
In this case, unlike the line plot, there may be multiple points on the y-axis for each point on the x-axis
``` python=
data.head()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/956/original/16.png?1695754815" width="600" height="150">
#### How can we plot the relation between `Rank` and `Global Sales`?
Can we use lineplot?
Let's try it out
``` python=
sns.lineplot(data=data, x='Rank', y='Global_Sales')
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/957/original/17.png?1695754945" width="400" height="300">
The plot itself looks very messy and it's hard to find any patterns from it.
#### Is there any other way we can visualize this relation?
Use scatter plot
Code:
``` python=
sns.scatterplot(data=data, x='Rank', y='Global_Sales')
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/958/original/18.png?1695755106" width="400" height="300">
Compared to lineplot, we are able to see the patterns and points more distinctly now!
Notice,
- The two variables are negatively correlated with each other
- As the rank number increases, the sales tend to go down, implying that games ranked closer to 1 have higher sales overall!
Scatter plots help us visualize these relations and find any patterns in the data
Key Takeaways:
- For **Continuous-Continuous Data** => **Scatter Plot**, **Line Plot**
Sometimes, people also like to display the linear trend between two variables using a Regression Plot, do check that out
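As a rough sketch of such a regression plot, `sns.regplot()` draws the scatter points together with a fitted straight line. The data below is synthetic (a made-up negative linear relationship plus noise), and `toy` and `corr` are names introduced for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a roughly linear, negative relationship
rng = np.random.default_rng(0)
x = np.arange(1, 101)
toy = pd.DataFrame({'x': x, 'y': 50 - 0.4 * x + rng.normal(0, 2, size=100)})

# Scatter points plus a fitted straight line
ax = sns.regplot(data=toy, x='x', y='y', scatter_kws={'s': 10})

# The sign of the correlation coefficient matches the slope of the line
corr = toy['x'].corr(toy['y'])
print(round(corr, 2))
```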
---
title: Quiz-1
description:
duration: 60
card_type: quiz_card
---
# Question
Apple wants to conduct an analysis to find the relationship between the price and the number of units sold for its products. Which of the following plots will we prefer?
# Choices
- [x] Scatter Plot
- [ ] Pie Chart
- [ ] Boxplot
- [ ] Line Plot
---
title: Quiz-1 explanation, Categorical categorical
description:
duration: 1200
card_type: cue_card
---
#### Quiz-1 explanation
Since we are comparing two numerical variables (price and units sold) to find their relationship pattern, we will use a scatterplot
### **Categorical-Categorical** (20 Minutes)
Earlier we saw how to work with a continuous-continuous pair of data
Now let's come to the second type of pair of data:
**Categorical-Categorical**
#### What questions come to your mind when we say categorical-categorical pair?
Questions related to distribution of a category within another category
- What is the **distribution of genres for top-3 publishers**?
- Which **platforms do these top publishers use?**
#### Which plot can we use to show distribution of one category with respect to another?
-> We can **have multiple bars for each category**
- These multiple bars can be stacked together - **Stacked Countplot**
Or
- Can be placed next to each other - **Dodged Countplot**
#### **Dodged Count Plot**
- A **Dodged Count Plot** in data visualization is a chart that displays the **frequency of different categories** within two or more groups **side by side**, making it easy to compare the distribution of data across these groups.
- Each category is represented by a **separate set of bars or columns**, with each group's data visually separated for clarity.
- It's commonly used to show how categorical variables are distributed across different conditions or categories.
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/325/original/diag3.png?1695496338" width="400" height="300">
#### How can we compare the top 3 platforms these publishers use?
We can use a dodged countplot in this case
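The `top3_data` frame used in the code here is not defined in these notes. A minimal sketch of one way such a subset could be built, assuming it keeps only the rows of the three most frequent publishers and, within those, the three most frequent platforms. The helper `top_n_subset` and the toy rows are hypothetical.

```python
import pandas as pd

def top_n_subset(df, column, n=3):
    """Keep only the rows whose `column` value is among the n most frequent."""
    top_values = df[column].value_counts().index[:n]
    return df.loc[df[column].isin(top_values)]

# Toy stand-in for `data`; the rows are made up
toy = pd.DataFrame({
    'Publisher': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'Platform':  ['PS2', 'PS2', 'DS', 'PS2', 'PS3', 'DS', 'PS3', 'Wii'],
})

# Top-3 publishers first, then the top-3 platforms within that subset
top3_pub = top_n_subset(toy, 'Publisher')
top3_toy = top_n_subset(top3_pub, 'Platform')
print(top3_toy)
```

On the real dataset, the same helper could be applied to `data` in place of `toy`.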
Code:
``` python=
plt.figure(figsize=(10,8))
sns.countplot(x='Publisher',hue='Platform',data=top3_data)
plt.ylabel('Count of Games')
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/959/original/20.png?1695755420" width="400" height="300">
#### What can we infer from the dodged countplot?
- EA releases PS2 games way more than any other publisher, or even platform!
- Activision has almost the same count of games for all 3 platforms
- EA is leading in PS3 and PS2, but Namco leads when it comes to DS platform
#### **Stacked Countplot**
- A stacked count plot in data visualization is a chart that displays the count of different categories or groups in a dataset, with each category represented as a separate bar or column.
- The bars are stacked on top of each other, showing the total count while also highlighting the distribution of counts within each category.
- This type of plot is useful for comparing the composition of data across multiple categories or subgroups.
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/328/original/Dav5.png?1695499000" width="400" height="300">
#### How can we visualize the distribution of genres for top-3 publishers?
We can use a `stacked countplot`
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/021/545/original/download_%281%29.png?1671006217" width="400" height="300">
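The code behind a stacked chart like this is not shown in these notes. One possible way to draw it, sketched on made-up rows, is to count the publisher-genre pairs with `pd.crosstab()` and plot the result as stacked bars. The names `toy` and `counts` are introduced for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for `top3_data`; the rows are made up
toy = pd.DataFrame({
    'Publisher': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Genre':     ['Sports', 'Sports', 'Action', 'Action', 'Racing', 'Sports'],
})

# Rows = publishers, columns = genres, cells = number of games
counts = pd.crosstab(toy['Publisher'], toy['Genre'])

# One bar per publisher, with the genre counts stacked on top of each other
ax = counts.plot(kind='bar', stacked=True)
ax.set_ylabel('Count of Games')
print(counts)
```

Seaborn offers a similar result through `sns.histplot(..., multiple='stack')`.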
But stacked countplots can be misleading
Some may find it difficult to tell whether each segment is measured from the baseline or from the top of the segment below it
#### How do we decide between a Stacked countplot and a Dodged countplot?
- Stacked countplots are a good way to represent totals
- While dodged countplots help us compare values between various categories, and within a category too
---
title: Continuous categorical
description:
duration: 600
card_type: cue_card
---
### **Continuous-Categorical** (10 Minutes)
Now let's look at our 3rd type of data pair
#### What kind of questions may we have regarding a continuous-categorical pair?
- We might want to calculate some numbers category-wise
- Like **What are the average sales for every genre?**
- Or we might be interested in checking the distribution of the data category-wise
- **What is the distribution of sales for the top-3 publishers?**
#### What kind of plot can we make for every category?
-> Either KDE plot or Box Plot per category
#### **Boxplot**
- A box plot, also known as a box-and-whisker plot, is a simple and effective way to visualize the distribution of a dataset.
- It displays the median, quartiles, and potential outliers of the data in a box-like graph.
#### Box plots show the five-number summary of data:
1. Minimum score
2. First (lower) quartile
3. Median
4. Third (upper) quartile
5. Maximum score
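The five numbers above can be computed directly. Here is a short sketch with `np.percentile()` on made-up values, along with the 1.5 * IQR fences that seaborn uses by default to decide which points are drawn as outliers.

```python
import numpy as np

sales = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # made-up values

# Minimum, Q1, median, Q3, maximum
five_numbers = np.percentile(sales, [0, 25, 50, 75, 100])
print(five_numbers)

# Points beyond 1.5 * IQR from the box are drawn as outliers by default
q1, q3 = five_numbers[1], five_numbers[3]
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(lower_fence, upper_fence)
```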
#### **Diagram**
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/043/original/Box_Plot.png?1695311672" width="600" height="300">
#### What is the distribution of sales for the top3 publishers?
Code:
``` python=
sns.boxplot(x='Publisher', y='Global_Sales', data=top3_data)
plt.xticks(rotation=90,fontsize=12)
plt.yticks(fontsize=12)
plt.title('Sales for top3 publisher', fontsize=15)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/961/original/22.png?1695755725" width="400" height="350">
#### What can we infer from this plot?
- The overall sales of EA are higher, with a much larger spread than other publishers
- Activision doesn't have many outliers, and if you notice, even though its spread is smaller than EA's, the median is almost the same
#### **Barplot**
What if we want to compare the sales between the genres?
We have to use:
- Genre (categorical)
- Mean of global sales per genre (numerical)
#### How to visualize which genres bring higher average global sales?
Code:
``` python=
sns.barplot(data=top3_data, x="Genre", y="Global_Sales", estimator=np.mean)
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/962/original/23.png?1695755818" width="400" height="300">
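The bar heights in a plot like this are simply the per-category means, so they can be cross-checked with a `groupby`. A small sketch on made-up numbers (`toy` and `mean_sales` are names introduced for illustration):

```python
import pandas as pd

# Toy stand-in for `top3_data`; the values are made up
toy = pd.DataFrame({
    'Genre': ['Sports', 'Sports', 'Action', 'Action', 'Racing'],
    'Global_Sales': [2.0, 4.0, 1.0, 2.0, 1.5],
})

# Mean of Global_Sales per genre, highest first
mean_sales = toy.groupby('Genre')['Global_Sales'].mean().sort_values(ascending=False)
print(mean_sales)
```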
If you remember, we had earlier seen that EA had a larger market share of sales
Along with this, the majority of games EA made were sports games
Together, these suggest that Sports has a high market share in the industry, as shown in the barchart
---
title: Quiz-2
description:
duration: 60
card_type: quiz_card
---
# Question
For the company "Toyota", we want to find which type of vehicle has made the maximum sales. Which plot will we prefer to use here?
# Choices
- [x] Bar Plot
- [ ] Pie Chart
- [ ] Boxplot
- [ ] Line Plot
---
title: Quiz-2 explanation, Subplots
description:
duration: 1200
card_type: cue_card
---
#### Quiz-2 explanation
We are comparing a numerical (sales) and a categorical (type of vehicle) variable. Hence we will use a barplot here
### **Subplots (15-20 Minutes)**
So far we have **shown only 1 plot** using `plt.show()`
Say, we want to plot the trend of NA and every other region separately in a single figure
#### How can we plot multiple smaller plots at the same time?
We will use **subplots**, i.e., **divide the figure into smaller plots**
We will be using `plt.subplots()`. It takes mainly 2 arguments:
1. **No. of rows** we want to **divide our figure** into
2. **No. of columns** we want to **divide our figure** into
It returns 2 things:
- Figure
- NumPy array of subplots (axes objects)
Code:
``` python=
fig = plt.figure(figsize=(15,10))
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=top3_data)
fig.suptitle('Main title')
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/963/original/24.png?1695756046" width="600" height="500">
Code:
``` python=
fig = plt.figure(figsize=(15,10))
plt.subplot(2, 3, 1)
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=top3_data)
plt.subplot(2, 3, 3)
sns.scatterplot(x='NA_Sales', y='JP_Sales', data=top3_data, color='red')
fig.suptitle('Main title')
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/964/original/25.png?1695756128" width="600" height="400">
Code:
``` python=
fig, ax = plt.subplots(2, 2, figsize=(15,10))
ax[0,0].scatter(top3_data['NA_Sales'], top3_data['EU_Sales'])
ax[0,1].scatter(top3_data['NA_Sales'], top3_data['JP_Sales'])
ax[1,0].scatter(top3_data['NA_Sales'], top3_data['Other_Sales'])
ax[1,1].scatter(top3_data['NA_Sales'], top3_data['Global_Sales'])
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/965/original/26.png?1695756295" width="600" height="400">
Notice, we are using 2 numbers to pick the subplot for each plot
Think of the subplots as a 2x2 grid, with the two numbers denoting the `row, column` position of each subplot
#### What is this `ax` object exactly?
Code:
``` python=
print(ax)
```
> Output:
```
[[<matplotlib.axes._subplots.AxesSubplot object at 0x7f5aad891850>
<matplotlib.axes._subplots.AxesSubplot object at 0x7f5aad82b340>]
[<matplotlib.axes._subplots.AxesSubplot object at 0x7f5aaddefa60>
<matplotlib.axes._subplots.AxesSubplot object at 0x7f5aade221c0>]]
```
Notice,
- It's a 2x2 matrix of multiple axes objects
We are plotting each plot on a single `axes` object.
Hence, we are using a 2D notation to access each grid/axes object of the subplot
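Seaborn functions can draw on these axes objects too. As a sketch on made-up values, the `ax=` argument of `sns.scatterplot()` selects which subplot to draw on, and each axes object has its own `set_title()` method.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Toy stand-in for `top3_data`; the values are made up
toy = pd.DataFrame({'NA_Sales': [0.1, 0.5, 0.9],
                    'EU_Sales': [0.2, 0.4, 0.7],
                    'JP_Sales': [0.0, 0.1, 0.3]})

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# `ax=` tells seaborn which axes object to draw on
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=toy, ax=ax[0])
sns.scatterplot(x='NA_Sales', y='JP_Sales', data=toy, ax=ax[1], color='red')

# Each axes object has its own set_title / set_xlabel / set_ylabel methods
ax[0].set_title('NA vs EU Sales')
ax[1].set_title('NA vs JP Sales')
print(ax.shape)
```

With a single row, `ax` is one-dimensional, so one index is enough.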
Instead of accessing the individual axes using `ax[0, 0]`, `ax[1, 0]`, there is another method we can use too
Code:
``` python=
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(20,12)).suptitle("NA Sales vs regions",fontsize=20)
# Using a 2x3 subplot
plt.subplot(2, 3, 1)
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=top3_data)
plt.subplot(2, 3, 3)
sns.scatterplot(x='NA_Sales', y='JP_Sales', data=top3_data, color='red')
plt.subplot(2, 3, 4)
sns.scatterplot(x='NA_Sales', y='Other_Sales', data=top3_data, color='green')
plt.subplot(2, 3, 6)
sns.scatterplot(x='NA_Sales', y='Global_Sales', data=top3_data, color='orange')
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/966/original/27.png?1695756574" width="600" height="400">
`suptitle` adds a title to the whole figure
#### We need to observe a few things here
1. The 3rd parameter defines the position of the plot
2. The position/numbering starts from 1
3. It goes on row-wise from start of row to its finish
4. Positions we never plot on (2 and 5 here) stay blank, without any axes
#### But how do we know which plot belongs to which category?
Basically the context of each plot
We can use `title`, `x/y label` and every other functionality for the subplots too
Code:
``` python=
plt.figure(figsize=(20,12)).suptitle("NA Sales vs regions",fontsize=20)
# Using a 2x3 subplot
plt.subplot(2, 3, 1)
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=top3_data)
plt.title('NA vs EU Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('EU', fontsize=12)
plt.subplot(2, 3, 3)
sns.scatterplot(x='NA_Sales', y='JP_Sales', data=top3_data, color='red')
plt.title('NA vs JP Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('JP', fontsize=12)
plt.subplot(2, 3, 4)
sns.scatterplot(x='NA_Sales', y='Other_Sales', data=top3_data, color='green')
plt.title('NA vs Other Region Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('Other', fontsize=12)
plt.subplot(2, 3, 6)
sns.scatterplot(x='NA_Sales', y='Global_Sales', data=top3_data, color='orange')
plt.title('NA vs Global Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('Global', fontsize=12)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/967/original/28.png?1695756654" width="600" height="400">
#### What if we want a plot to span the full height of the figure?
Think of this in **terms of a grid.**
Currently we are **dividing our figure into 2 rows and 3 columns**
But we want one plot to cover the whole middle column, i.e., grids 2 and 5
Together, these two grids form a **single column**
So, this problem can be simplified to drawing the plot in the **second column of a 1 row 3 column subplot**
Code:
``` python=
plt.figure(figsize=(20,12)).suptitle("Video Games Sales Dashboard",fontsize=20)
# Using a 2x3 subplot
plt.subplot(2, 3, 1)
sns.scatterplot(x='NA_Sales', y='EU_Sales', data=top3_data)
plt.title('NA vs EU Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('EU', fontsize=12)
plt.subplot(2, 3, 3)
sns.scatterplot(x='NA_Sales', y='JP_Sales', data=top3_data, color='red')
plt.title('NA vs JP Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('JP', fontsize=12)
# Countplot of publishers
plt.subplot(1,3,2)
sns.countplot(x='Publisher', data=top3_data)
plt.title('Count of games by each Publisher', fontsize=12)
plt.xlabel('Publisher', fontsize=12)
plt.ylabel('Count of games', fontsize=12)
plt.subplot(2, 3, 4)
sns.scatterplot(x='NA_Sales', y='Other_Sales', data=top3_data, color='green')
plt.title('NA vs Other Region Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('Other', fontsize=12)
plt.subplot(2, 3, 6)
sns.scatterplot(x='NA_Sales', y='Global_Sales', data=top3_data, color='orange')
plt.title('NA vs Global Sales', fontsize=12)
plt.xlabel('NA', fontsize=12)
plt.ylabel('Global', fontsize=12)
plt.show()
```
> Output:
<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/050/968/original/29.png?1695756726" width="600" height="400">
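An alternative that some may find more explicit than mixing two grid shapes is `plt.subplot2grid()`, which takes the grid shape, a 0-based `(row, column)` position, and an optional `rowspan`. The sketch below only lays out empty axes, and the variable names are introduced for illustration.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))

# (2, 3) is the grid shape; the second tuple is the 0-based (row, column) cell
top_left = plt.subplot2grid((2, 3), (0, 0))
bottom_left = plt.subplot2grid((2, 3), (1, 0))
# rowspan=2 makes the middle plot cover both rows of the middle column
middle = plt.subplot2grid((2, 3), (0, 1), rowspan=2)
top_right = plt.subplot2grid((2, 3), (0, 2))
bottom_right = plt.subplot2grid((2, 3), (1, 2))

middle.set_title('Spans both rows')
print(len(fig.axes))
```

Each returned axes object can then be passed to seaborn through the `ax=` argument.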