---
title: Content
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>**Content**</font>
- **Population vs Sample**
- **Sample Statistics**
- Sample Mean
- Sample variance
- Sample Standard Deviation
- **Point Estimates**
- **Standard Error**
- **Sampling techniques**
- Random Sampling
- **Uniform Distribution**
---
title: Introduction, Population vs Sample
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>**Introduction**</font> (3 mins)
In our last class, we were primarily concerned with Binomial and Bernoulli distributions, looking at scenarios with known population parameters.
Also, we have gained ample knowledge of Descriptive statistics. Now it's time to study inferential statistics.
Today we will explore a different dimension of statistics—making estimations or inferences based on sample data.
So, let's get started.
> <font color='purple'>**What happens when we face real-world scenarios with immense populations or when collecting data from every single element seems impractical?**</font>
**Scenarios:**
- <font color='purple'>Imagine we want to understand the average income of all professionals in Bangalore,</font>
- <font color='purple'>Or we're curious about the percentage of people who prefer coffee over tea in an entire country</font>.
Collecting data from every individual in these scenarios is difficult and often impossible.
This is where the concept of <font color='purple'>**Samples**</font> comes into play.
From those samples, we can use **point estimates** to estimate the population parameters.
<br>
- Point estimates offer us a way to draw insights and make decisions using information from a <font color='purple'>**sample (a smaller group of observations)**</font>, which can be more feasible and cost-effective.
By analyzing a well-chosen sample, we can make educated guesses or **point estimates** about population parameters.
<br>
Before we dive into point estimates, it's crucial to understand the difference between populations and samples.
So, let's start by differentiating populations from samples.
## <font color='blue'>**Population vs Sample**</font> (5 mins)
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/005/original/pvs.png?1700131161 height = 300 width = 600>
### <font color='purple'>**Population:**</font>
- Think of a population as the <font color='purple'>entire set of items under study</font>.
- This could be the entire group of interest, whether it's people, objects, data points, or any other relevant entities.
- The population is the group you ultimately want to draw conclusions about.
**For example:**
If we were conducting a survey to understand the average income of all residents in a city,
- the population in this case would be every single resident in that city.
**Note:**
Obtaining the data from an <font color='purple'>entire population is often impractical, time-consuming, and expensive</font>, especially when the population is vast.
Therefore, we turn to the concept of sampling.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/006/original/samp1.png?1700131349 height = 500 width = 500>
### <font color='purple'>**Sample:**</font>
- A sample is a smaller, manageable subset of the population.
- Ideally, the sample is an <font color='purple'>unbiased subset of the population that represents the whole data well</font>.
To overcome the constraints of studying a whole population, you can collect data from a subset of it and generalize the findings to the population.
<br>
**Continuing the above example:**
- <font color='purple'>Suppose you have selected and collected data from 500 residents from that city, this group of 500 individuals would represent your sample</font>.
The idea here is that the characteristics of the sample should resemble the characteristics of the entire population, allowing us to make **inferences** or **estimates** about the population using the sample data.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/007/original/sam2.png?1700131475 height = 300 width = 350>
The process of collecting data from a small subsection of the population and then using it to generalize over the entire set is called **Sampling**.
Later in the lecture, we will discuss a few sampling techniques.
<br>
<font color='orange'>**Conclusion**</font>
> <font color='purple'>Why should we use samples instead of studying entire populations?</font>
- **Practicality**:
- As mentioned earlier, it's often not feasible to study an entire population due to constraints such as time, resources, and cost.
- **Reduced Error**:
- A well-designed sample can provide an accurate estimate of the population characteristics with significantly less time and effort.
- <font color='purple'>In some cases, testing or studying an entire population is destructive or impractical.</font>
- For instance, if you're inspecting a shipment of fresh fruit, you can't inspect every single piece; instead, you take a sample to represent the whole.
With that, we can say that <font color='purple'>**sampling is a practical way** to obtain data when studying an entire population is challenging or impossible</font>.
---
title: Sample statistics
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>**Sample Statistics**</font> (7-9 mins)
When you collect data from a population or a sample, there are various measurements and numbers you can calculate from the data.
- A <font color='purple'>**parameter**</font> is a measure that describes the **whole population**
Population data has a few population parameters like:
1. Population Mean
2. Population Variance
3. Population SD
<br>
A <font color='purple'>**statistic**</font> is a measure that describes the **sample**.
Sample statistics are descriptive values calculated from sample data.
Sample statistics include measures like:
1. Sample Mean
2. Sample Variance
3. Sample Standard Deviation
Let's explore each of these sample statistics and understand how they differ from population parameters:
### <font color='purple'>**Sample Mean $(\bar{x})$**</font>
The sample mean, denoted as **$\bar x$**, is the average value of a set of data points within a sample.
**Formula:**
$\Large\bar{x} = \frac{∑x_i}{n}$
Where,
- $\bar{x}$ = Sample mean
- $∑x_i$ = sum of each value in the sample
- $n$ = number of values in the sample
**Difference from Population Mean (μ):**
- The sample mean is an estimate of the population mean and is calculated from sample data.
- <font color='purple'>It may vary from one sample to another but is expected to be close to the population mean</font> when using a sufficiently large, random sample.
**Example:**
Suppose you want to estimate the average income of people in a city.
Instead of surveying the entire population, <font color='purple'>you randomly select 100 individuals and calculate their average income, which is the sample mean</font>.
### <font color='purple'>**Sample Variance $(s^2)$**</font>
The sample variance, denoted as $s^2$, measures the spread or dispersion of data within a sample.
The sample variance is used to make estimates or inferences about the population variance.
**Formula:**
$\Large s^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
Where,
- $s^2$ = Sample variance
- $x_i$ = represents individual data points.
- $\bar{x}$ = sample mean
- $n$ = number of data points in the sample
<br>
**How it is different from population variance**.
- The population variance is a parameter that describes the variability in the entire population.
- The sample variance is a statistic used to estimate the population variance based on a sample.
<font color='red'>***Instructor Note:***</font>
> <font color='purple'>**Q. Why have we used n-1 here instead of n?**</font>
We use "n - 1" in the sample variance formula to correct for bias and obtain an unbiased estimate of the population variance.
This adjustment is called <font color='purple'>**Bessel's correction**</font>.
**Bessel's correction**
Bessel's correction is a tweak to the variance formula that we apply when estimating the spread of a whole group from a sample of that group.
Deviations are measured from the sample mean, which sits closer to the sample's own points than the true population mean does, so dividing by n would underestimate the spread on average; dividing by n - 1 compensates for this.
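The effect of Bessel's correction can be seen in a small simulation. This is a minimal sketch using only the Python standard library; the normal population with a standard deviation of 2 (variance 4), the sample size of 5, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
TRUE_SD = 2.0                   # assumed population SD, so the true variance is 4.0
n = 5                           # small sample size, where the bias is most visible

biased, corrected = [], []
for _ in range(20_000):
    sample = [rng.gauss(0.0, TRUE_SD) for _ in range(n)]
    biased.append(statistics.pvariance(sample))    # divides by n
    corrected.append(statistics.variance(sample))  # divides by n - 1

avg_biased = statistics.fmean(biased)
avg_corrected = statistics.fmean(corrected)
print(f"true variance      : {TRUE_SD ** 2:.2f}")
print(f"average with n     : {avg_biased:.2f}")
print(f"average with n - 1 : {avg_corrected:.2f}")
```

Averaged over many samples, the version that divides by n should land near (n - 1)/n × 4 = 3.2, while the corrected version should land near the true variance of 4.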
### <font color='purple'>**Sample Standard Deviation $(s)$**</font>
The sample standard deviation, denoted as $s$, is a measure of how spread out the data in a sample is.
**Formula**
$\Large s = \sqrt\frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
<br>
**How it is different from Population Standard Deviation (σ)**:
- The sample standard deviation estimates the population standard deviation but may not match it exactly due to the finite size of the sample.
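The three sample statistics can be computed with Python's built-in `statistics` module, whose `variance` and `stdev` functions use the n - 1 denominator. The income values below are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical sample: monthly incomes (in thousands) of 5 randomly chosen people
sample = [42, 55, 38, 61, 49]

x_bar = statistics.mean(sample)      # sum(x_i) / n
s2 = statistics.variance(sample)     # sum((x_i - x_bar)^2) / (n - 1)
s = statistics.stdev(sample)         # square root of the sample variance

print(x_bar, s2, round(s, 3))
```

For this sample the mean is 49 and the squared deviations add up to 350, so the sample variance is 350 / 4 = 87.5.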
<font color='orange'>**Conclusion:**</font>
In summary,
Sample statistics are essential tools for summarizing and analyzing data from a sample to <font color='purple'>**make inferences**</font> about the corresponding population parameters.
Now, let's solve some quizzes
---
title: Quiz-1
description:
duration: 60
card_type: quiz_card
---
# Question
There are 45 students in a class.
5 students were randomly selected from this class and their heights (in cm) were recorded as follows:
[131, 150, 140, 142, 152]
Calculate the sample mean and the sample variance.
# Choices
- [x] $\bar x = 143$, $s^2 = 71$
- [ ] $\bar x = 142.4$, $s^2 = 66.3$
- [ ] $\bar x = 147$, $s^2 = 73.2$
- [ ] $\bar x = 152$, $s^2 = 64.5$
---
title: Quiz 1 explanation
description:
duration: 5400
card_type: cue_card
---
Given,
Sample size (n) = 5
Sample mean:
- $\frac{131 + 150 + 140 + 142 + 152}{5} = 143$
Sample variance:
- $\frac{[(131-143)^2 + (150-143)^2 + (140-143)^2 + (142-143)^2 + (152-143)^2]}{4} = 71$
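The arithmetic above can be double-checked in Python. This short sketch uses the standard `statistics` module, whose `variance` function divides by n - 1.

```python
import statistics

heights = [131, 150, 140, 142, 152]   # sample from the quiz

print(statistics.mean(heights))       # sample mean
print(statistics.variance(heights))   # sample variance (n - 1 in the denominator)
```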
---
title: Point Estimates, Sampling techniques, Standard error
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>**Point Estimates**</font> (8 mins)
<font color='purple'>Now, we will use these sample statistics as a point estimate to estimate population parameters</font>.
- It serves as the best guess for the true, but often unknown, population parameters.
- For instance, the sample mean estimates the population mean, and the sample variance estimates the population variance.
<br>
> **Imagine a scenario:**
```
We want to estimate the mean weight of a certain species of turtle in Florida.
```
Measuring each turtle individually across the entire state is <font color='purple'>impractical in terms of time and resources.</font>
Instead, we opt to select a **random sample of 50 turtles** and use the mean weight of this sample to estimate the true population mean
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/011/original/bw2.png?1700132243 height = 300 width = 600>
Assume the <font color='purple'>sample mean is **150.4 pounds**</font>,
Then our <font color='purple'>point estimate for the true population mean of the entire species would be 150.4 pounds</font>.
### <font color='purple'>**Uncertainty in the estimation**</font>
It is important to recognize that due to sample variability, our estimate might not be exactly on target.
For instance,
> **Q. What if we pick a sample full of low-weight turtles?**
- Then the <font color='purple'>sample mean weight will be on the lower side</font> and our estimate will be too low.
> **Q. What if we pick a sample full of heavy turtles?**
- Then the <font color='purple'>sample mean weight will be on the higher side</font> and our estimate will be too high.
<br>
To capture this <font color='purple'>variability we can create a range of values</font>
- If our sample mean is 150.4 pounds, we can say that the true population parameter could reasonably fall within a certain range, accounting for potential variations.
For example, we are fairly confident that the average weight is around 150.4 pounds, but it could be a bit less or a bit more.
**In conclusion**,
To account for this uncertainty, we recognize a range of values within which the true parameter is likely to fall.
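To see how a point estimate moves from sample to sample, here is a minimal simulation sketch. The turtle population is entirely hypothetical: its size (10,000), mean (150 pounds), and standard deviation (20 pounds) are assumptions made only so there is something to sample from.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 turtle weights (pounds); the mean of 150
# and SD of 20 are assumptions made purely for illustration.
population = [rng.gauss(150, 20) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Draw several random samples of 50 turtles; each gives its own point estimate.
estimates = []
for _ in range(5):
    sample = rng.sample(population, 50)
    estimates.append(statistics.fmean(sample))

print(f"population mean: {true_mean:.1f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} point estimate: {est:.1f}")
```

Each sample gives a slightly different estimate, scattered around the population mean, which is why a range of plausible values is more informative than a single number.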
> <font color='purple'>**Importance of representative samples**</font>
To get accurate insights about a whole group, we need a sample that reflects the group's main traits.
If our sample closely resembles the population, we can trust that estimates drawn from it are reliable and unbiased reflections of the entire population.
> <font color='purple'>**Q. But how to select the sample from the population?**</font>
## <font color='blue'>**Sampling Techniques**</font> (12-15 mins)
As we know, when our population is large, geographically dispersed, or difficult to contact, it’s necessary to use a sample.
<font color='purple'>Sampling methods/techniques refer to how we select members from the population to be in the study</font>.
There are two primary types of sampling methods that you can use in your research:
1. **Probability sampling**
- In this method, every member of the population has a chance of being selected, allowing us to make strong statistical inferences about the whole group.
2. **Non-probability sampling**
- In this method, individuals are selected based on non-random criteria, and not every individual has a chance of being included.
- This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias.
<br>
Ideally, a sample should be randomly selected and representative of the population.
<font color='red'>***Instructor Note:***</font> **This is only for reference**
There are four main types of probability samples.
1. **Simple random sampling**
2. **Systematic sampling**
3. **Stratified sampling**
4. **Cluster sampling**
There are three main types of non-probability sampling.
1. **Convenience sampling**
2. **Purposive (judgmental) sampling**
3. **Snowball sampling**
Start from here:
In this module, we are going to <font color='purple'>study probability sampling, specifically Simple Random Sampling</font>; we will see other sampling techniques in the ML module.
### <font color='purple'>**Simple Random Sampling**</font>
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/013/original/bl89.png?1700132420 height = 400 width = 400>
A simple random sample is a <font color='purple'>randomly selected subset of a population</font>.
- In this sampling method, <font color='purple'>each member of the population has an exactly equal chance of being selected</font> which tends to produce representative, unbiased samples.
**For instance**,
If you want to pick 1000 individuals from a town with 100,000 residents, <font color='purple'>each person has a 0.01 probability of being selected</font>.
This straightforward calculation doesn't require in-depth knowledge of the population's composition; hence the name simple random sampling.
> <font color='purple'>**Q What are some prerequisites before using this method?**</font>
- You have a complete list of every member of the population.
- You can contact or access each member of the population if they are selected.
- You have the time and resources to collect data from the necessary sample size
Simple random sampling works best if you have a lot of time and resources to conduct your study,
Or if you are studying a limited population that can easily be sampled.
> <font color='purple'>**Q. How to perform simple random sampling?**</font>
There are 4 key steps to select a simple random sample.
> **Step 1: Define the population**
Imagine you're conducting a survey to study the eating habits of people in a particular city.
- Your <font color='purple'>defined population is all the residents of that city who are aged 18 to 60</font>.
- Let the size of the population be 100,000 in this case
<br>
> **Step 2: Determine the Sample Size**
Now, you need to decide on the size of your sample.
<font color='purple'>Larger samples enhance statistical confidence but they also come with increased costs and effort</font>.
- <font color='purple'>The sample size is based on factors like:</font>
- The city's population size (let's say, 100,000),
- How sure do you want to be in your results?
- Let's say we're aiming for 95% **certainty**.
- How precise do you need to be?
- Let's say we're cool with a 5% **margin of error**.
- An estimated standard deviation.
- After crunching those numbers, it turns out we'll need to survey about 500 people.
- That should give us a good balance between getting enough info and keeping things manageable.
<font color='red'>**Instructor Note:**</font>
Certainty is often expressed as a confidence level. The statement "aiming for 95% certainty" indicates a confidence level of 95%.
Margin of error indicates the precision or range of uncertainty associated with the estimate. A smaller margin of error suggests a more precise estimate.
We will discuss these topics in the upcoming classes.
<br>
> **Step 3: Randomly Select Your Sample**
You have two options for random selection:
1. <font color='purple'>The lottery method</font>
In the lottery method, think of it like picking names from a wheel. Imagine you have a wheel/bowl with everyone's name in the group written on separate pieces of paper.
To choose a random sample, mix the papers well and roll the wheel, and get a few names.
It's like randomly selecting people without any specific order, just like how a lottery randomly picks winners.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/015/original/roun.png?1700132544 height = 350 width = 350>
2. <font color='purple'>The random number method.</font>
- You assign a unique number to each potential participant.
- Using a random number generator, you then select 500 numbers at random from the list of assigned numbers.
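The random number method can be sketched with Python's standard `random` module. The population size of 100,000 and the sample size of 500 follow the running example; the seed is an assumption added only to make the output repeatable.

```python
import random

rng = random.Random(7)                        # seeded only to make the example repeatable

population_ids = range(1, 100_001)            # one unique number per resident
sample_ids = rng.sample(population_ids, 500)  # 500 draws without replacement

print(sample_ids[:10])                        # first few selected IDs
print(len(sample_ids), len(set(sample_ids)))  # 500 selections, all distinct
print(500 / 100_000)                          # chance that any given resident is chosen
```

`random.sample` draws without replacement, so no resident can be picked twice and every resident has the same chance of ending up in the sample.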
<br>
> **Step 4: Collect Data from Your Sample**
You proceed to collect data from the 500 individuals in your sample.
- You distribute questionnaires and conduct interviews, ensuring that each person actively participates in the survey.
- Your data collection methods include reminders, follow-up phone calls, and in-person visits if necessary.
- This way, you aim for a high response rate and minimize potential biases.
Through this range of methods, you can understand the eating habits of city residents efficiently and reliably.
<font color='red'>***Instructor Note:***</font>
Random Sample Generator Interactive tool: https://epitools.ausvet.com.au/randomnumbers
How to use it:
- Choose sampling without replacement; sampling with replacement corresponds to bootstrapping
- Enter minimum and maximum value as per your choice
- Click 'submit'
### <font color='orange'>**Conclusion:**</font>
- Simple random sampling is a powerful method for selecting a representative sample from a larger population.
By carefully following all the steps, we can collect data that mirrors the characteristics of our population.
After this, we will have sample data.
- Now, we can calculate sample statistics, which we have covered earlier in this lecture,
- Which will enable us to make valuable inferences about the population based on our well-chosen sample.
Till now, we have seen what a sample is, how it differs from the population, and the different statistics to analyse sample data.
We also discussed sampling techniques to choose the sample and looked into the concept of estimating true population parameters using sample statistics.
However, <font color='purple'>there's a critical issue we need to address</font>.
> <font color='purple'>**Q. What if we collect multiple samples from the same population?**</font>
It allows us to <font color='purple'>understand how sample statistics, such as the sample mean, vary from one sample to another</font>.
Here's what happens,
1. **Variation in Sample Statistics**:
- <font color='purple'>Different samples drawn from the same population will produce different sample statistics</font>.
- For example, the sample mean will vary from one sample to another because of the inherent randomness in the sampling process.
2. **Estimation of Population Parameters**:
- <font color='purple'>These sample statistics can be used to estimate the population parameters</font> (e.g., population mean) with greater accuracy.
---
title: Standard Error
description:
duration: 600
card_type: cue_card
---
## <font color='blue'>**Standard Error**</font> (10 mins)
> <font color='purple'>**Q. How to quantify the variability which is caused by the variation in sample statistics across different samples?**</font>
Here, the concept of **Standard Error** comes into play.
- The standard error (SE) quantifies this variability, <font color='purple'>indicating how much the sample mean is expected to deviate from the population mean</font> when different samples are drawn.
Now, if you can recall
> <font color='purple'>**If we want to assess the accuracy/reliability of the estimates**.</font>
- <font color='purple'>The standard error helps us</font> assess the reliability of sample-based estimates.
- A **smaller standard error** suggests that the sample statistic is **likely to be close to the population parameter**
- While a **larger standard error** indicates **more variability** and less precision in the estimate.
<font color='purple'>**Formula**</font>
The standard error of an estimate can be calculated as the standard deviation divided by the square root of the sample size:
$\Large SE = \frac{σ}{\sqrt n}$
where:
$σ$ = The population standard deviation
$\sqrt n$ = The square root of the sample size
- When $σ$ is unknown, the sample standard deviation $s$ is used in its place.
- The SE is inversely proportional to the square root of the sample size.
- The larger the sample size, the smaller the standard error will be. As the sample size grows, the sample statistic tends to get closer to the actual value of the population parameter.
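The formula is simple enough to express as a one-line function. In this sketch, the function name `standard_error` and the standard deviation of 10 are illustrative assumptions.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation
for n in (25, 100, 400):
    print(f"n = {n:>3}  SE = {standard_error(sigma, n):.2f}")
```

Because of the square root, quadrupling the sample size only halves the standard error: 2.00, then 1.00, then 0.50.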
<br>
<font color='purple'>We can say that there are 3 types of standard deviation.</font>
1. **Population standard deviation**:
- $σ = \sqrt{\frac{\displaystyle\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$, where $N$ is the population size
2. **Sample standard deviation: It is the SD of the data points in a sample**
- $s = \sqrt\frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
3. **Standard Error: It is the standard deviation of the distribution of sample means**
- $SE = \frac{σ}{\sqrt n}$
> <font color='purple'>**Q. What is a sample mean distribution?**</font>
When we **collect samples** from the population (let's say **5 samples**), calculate **the mean of each, and repeat this process numerous times**, the means **form a distribution of sample means**.
- This is also known as the <font color='purple'>**sampling distribution**</font> of the mean.
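A small simulation can show that the spread of the sample means matches the standard error formula. This is a sketch under stated assumptions: the population of 50,000 ages between 18 and 60, the sample size of 30, and the 5,000 repetitions are all made up for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 50,000 ages drawn uniformly between 18 and 60.
population = [rng.uniform(18, 60) for _ in range(50_000)]
sigma = statistics.pstdev(population)    # population standard deviation

n = 30                                   # size of each sample
sample_means = [
    statistics.fmean(rng.sample(population, n))
    for _ in range(5_000)                # repeat the sampling many times
]

sd_of_means = statistics.pstdev(sample_means)
theoretical = sigma / math.sqrt(n)
print(f"SD of the sample means : {sd_of_means:.3f}")
print(f"sigma / sqrt(n)        : {theoretical:.3f}")
```

The two printed values should be close to each other, which is exactly what the standard error formula claims.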
<br>
Let's take an example:
### <font color='purple'>**Example**</font>
- Let's say there are 300 million people in the USA. To determine this population's average age, the <font color='purple'>statistician takes a sample of 1000 people</font>.
- He determined the **average age of the sample was 37.5 years**.
- The **actual average age** of the entire population is **36.9** which is different from the determined average age of sample.
The standard error is an estimate of the accuracy of the sample average.
Here,
<font color='purple'>If the statistician had taken a sample of 5000 people instead of 1000, the standard error would have been smaller and the average age of the sample would likely have been closer to the actual population's average age</font>.
<br>
> <font color='purple'>Q. What is the difference between Standard deviation and standard error?</font>
### <font color='purple'>**Difference between standard deviation and standard error.**</font>
- <font color='purple'>Standard deviation measures the variability of individual data points</font> within a single sample or a population,
- <font color='purple'>Standard error measures the variability of sample means</font> when multiple samples are drawn from a population.
<br>
So, just as standard deviation describes the spread of individual data points, standard error describes the spread of sample means and is used to assess the reliability of your estimates.
<br>
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/024/original/abc3.png?1700133107 height = 300 width = 550>
---
title: Quiz-2
description:
duration: 60
card_type: quiz_card
---
# Question
A sample of the 30 latest returns on XYZ stock reveals a mean return of \$4
with a sample standard deviation of \$0.13.
Estimate the SE of the sample mean.
# Choices
- [ ] 0.01
- [x] 0.02
- [ ] 0.08
- [ ] 0.65
---
title: Quiz 2 explanation
description:
duration: 5400
card_type: cue_card
---
Given,
Sample size (n) = 30
Sample mean ($\bar{x}$) = 4
Sample standard deviation (s) = 0.13
$SE = \frac{s}{\sqrt n}$
$SE = \frac{0.13}{\sqrt{30}}$
$SE \approx 0.0237 \approx 0.02$
<font color='orange'>**Conclusion**</font>:
If we were to draw many more samples of 30 returns on XYZ stock and construct a sample mean distribution, we would estimate it to be centered near 4 with a standard deviation (the standard error) of about 0.02.
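The calculation can be verified in a couple of lines of Python, using the numbers given in the quiz.

```python
import math

s = 0.13   # sample standard deviation from the quiz
n = 30     # sample size

se = s / math.sqrt(n)
print(round(se, 4), round(se, 2))
```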
---
title: Uniform distribution
description:
duration: 5400
card_type: cue_card
---
Till now, we've explored various concepts like the sample statistics, point estimates, and standard error. These topics are essential components of statistical analysis and decision-making.
Now, let's broaden our horizons and look at a probability distribution: the Uniform Distribution.
Consider some situations,
> <font color='purple'>**Q. Have you ever participated in a lottery, raffle, or game of chance?**</font>
<font color='red'>**Note**</font>:
A raffle is a gambling competition in which people obtain numbered tickets, each of which has the chance of winning a prize.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/031/original/tck.png?1700133445 height = 300 width = 500>
Imagine you're in a raffle where you have an equal chance of winning any of the available prizes.
In such cases, each ticket holder has an equal probability of winning any prize, making it a classic example of a Uniform Distribution.
<br>
> <font color='purple'>**Q. But how do we analyze the likelihood of winning or the overall fairness of such events?**</font>
That's where understanding the Uniform Distribution becomes essential.
## <font color='blue'>**Uniform Distribution**</font> (15-17 mins)
A Uniform Distribution is a probability distribution where all possible outcomes are equally likely to occur.
In this distribution, each value within a specified range has the same probability of occurring.
**Example**
- Drawing a card from a standard deck gives a uniform distribution over the suits, because drawing a heart, a club, a diamond, or a spade is equally likely.
- A coin also has a uniform distribution because the probability of getting either heads or tails in a coin toss is the same.
### **Types of uniform distribution**
1. <font color='purple'>Discrete Uniform Distributions</font>:
- The possible results of rolling a die provide an example of a discrete uniform distribution
- In discrete uniform distribution: **$P(x) = 1/n$**.
Where, P(x) = Probability of a discrete variable, n = Number of values in the range
- It is possible to roll a 1, 2, 3, 4, 5, or 6, but it is not possible to roll a 2.3, 4.7, or 5.5.
- Therefore, the roll of a die generates a discrete distribution with p = 1/6 for each outcome.
2. <font color='purple'>Continuous Uniform Distributions</font>:
- An idealized random number generator is an example of a continuous uniform distribution.
- Suppose we generate random numbers between 0.0 and 1.0,
- With this type of distribution, every point in the continuous range has an equal opportunity of appearing, yet there is an infinite number of points between 0.0 and 1.0
### <font color='purple'>The Distribution function of discrete uniform distribution(PMF)</font>
In a discrete uniform distribution, the probability is calculated using the probability mass function (PMF)
To calculate the probability of a specific value in a discrete uniform distribution:
- PMF(x) = **P(X = x) = 1 / (b - a + 1)** for a ≤ x ≤ b
PMF(x) = **P(X = x) = 0** otherwise
Where:
- PMF(x) is the probability mass function, representing the probability of a random variable having a specific value within the range [a, b].
- "a" and "b" are the minimum and maximum values within the range.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/032/original/per7.png?1700133693 height = 300 width = 550>
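The PMF above translates directly into a small function. In this sketch the name `discrete_uniform_pmf` is illustrative, and `Fraction` is used so the probabilities print exactly rather than as rounded decimals.

```python
from fractions import Fraction

def discrete_uniform_pmf(x: int, a: int, b: int) -> Fraction:
    """P(X = x) for a discrete uniform distribution on the integers a..b."""
    if a <= x <= b:
        return Fraction(1, b - a + 1)
    return Fraction(0)

# A fair six-sided die: a = 1, b = 6
print(discrete_uniform_pmf(3, 1, 6))                            # inside the range
print(discrete_uniform_pmf(7, 1, 6))                            # outside the range
print(sum(discrete_uniform_pmf(x, 1, 6) for x in range(1, 7)))  # total probability
```

For the die, each face gets 1/6, anything outside 1 to 6 gets 0, and the six probabilities add up to 1.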
<font color='red'>***Instructor Note:***</font> Derivation of PMF function
The PMF describes the probability of observing each specific value.
Here's the derivation:
- Since the outcome must be one of the values in the range, the probabilities of all possible outcomes must add up to 1.
- So, $\sum_{x=a}^{b} P(X=x) = 1$
- Deriving $P(X=x)$:
- We know that in a discrete uniform distribution, the probability is given by **$P(X=x) = 1/n$**.
- $n$ is the number of possible outcomes within the range $[a,b]$.
- So there are $b-a +1$ possible outcomes within the range.
- Therefore, $P(X=x) = \frac{1}{b-a+1}$ for each individual outcome $x$ in range $[a,b]$
<br>
Why the "+ 1" is necessary:
- If we only considered $b-a$ it would represent the number of integers between $a$ and $b$. It won't include $a$
- For example, if a=3 and b=6, b-a would be 3 representing the integers 4, 5, and 6.
However, in a discrete uniform distribution, we want to include the starting value $a$ and the ending value $b$ making it a total of 4 values (3, 4, 5, 6).
- Therefore, to include $a$ in the count, we add 1: **$b-a+1$**
### <font color='purple'>The Distribution function of continuous uniform distribution(PDF)</font>
In a continuous uniform distribution, the probability is calculated using probability density function (PDF) which is a constant value within a given range, and it's defined as:
- PDF(x) = 1 / (b - a) for a ≤ x ≤ b
PDF(x) = 0 otherwise
Where:
- PDF(x) is the probability density function, giving the probability density (the height of the curve) at x; probabilities come from the area under it over an interval.
- "a" and "b" are the minimum and maximum values within the range.
- Every value between "a" and "b" is equally likely to occur and any value outside of those bounds has a density of zero.
**The constant value 1 / (b - a) is also known as the height of the graph**
<br>
If a random variable X follows a uniform distribution, then the probability that X takes on a value between x1 and x2 can be found by the following formula:
**$P(x_1 < X < x_2) = \Large \frac{(x_2 - x_1)}{(b - a)}$**
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/034/original/rec56.png?1700133783 height = 300 width = 500>
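The density and the interval formula can be sketched as two small functions. The function names and the waiting-time example are hypothetical; clipping the interval to [a, b] is an added convenience so that intervals reaching outside the range still give a valid probability, and is not part of the formula above.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_interval_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 < X < x2), with the interval clipped to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0.0) / (b - a)

# Hypothetical example: waiting time uniformly distributed between 0 and 10 minutes
print(uniform_pdf(4.0, 0.0, 10.0))                 # height of the rectangle
print(uniform_interval_prob(2.0, 5.0, 0.0, 10.0))  # (5 - 2) / (10 - 0)
```

Here the height of the rectangle is 1/10 and the probability of waiting between 2 and 5 minutes is 3/10.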
<font color='red'>***Instructor Note:***</font> Derivation of PDF function
Here's the derivation:
1. Area of the Rectangle:
- The probability density function $f(x)$ represents the height of the rectangle. Since $X$ must fall somewhere within the range $[a,b]$, the total area under the PDF curve must be 1.
- The area of a rectangle is given by the product of its base and height. In this context, the base is $b-a$ and $f(x)$ is height.
- Therefore, $(b-a) * f(x) = 1$.
<br>
- So, height will be:
- $f(x) = \frac{1}{b-a}$.
- This expression represents the height of the rectangle, which is also the constant probability density within the range $[a,b]$
In simple terms, the derivation shows that to make sure the total probability within the range is 1, the height of the rectangle representing the PDF($f(x)$) needs to be $\frac{1}{b-a}$.
This ensures that the area under the PDF curve (rectangle) equals 1, which aligns with the requirement for a valid probability distribution.
Let's solve some quizzes.
---
title: Quiz-3
description:
duration: 60
card_type: quiz_card
---
# Question
Suppose the weight of dolphins is uniformly distributed between 100 pounds and 150 pounds.
If we select a dolphin at random, determine the probability that the chosen
dolphin will weigh between 120 and 130 pounds.
# Choices
- [ ] 0.5
- [x] 0.2
- [ ] 0.75
- [ ] 0.8
---
title: Quiz 3 explanation
description:
duration: 5400
card_type: cue_card
---
We want to find, P(120 < X < 130)
Now, by using the formula,
P(x1 < X < x2) = (x2 – x1)/(b – a)
= (130 – 120) / (150 – 100)
= 10/50
= 0.2
The probability that the chosen dolphin will weigh between 120 and 130 pounds is 0.2
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/038/original/q356.png?1700133916 height = 300 width = 500>
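The answer can also be checked by simulation. This sketch compares the exact formula with the fraction of simulated weights that fall in the interval; the number of simulated dolphins (100,000) and the seed are arbitrary assumptions.

```python
import random

rng = random.Random(3)

a, b = 100, 150                      # weight range from the quiz (pounds)
exact = (130 - 120) / (b - a)        # (x2 - x1) / (b - a)

# Simulate 100,000 dolphin weights and count how many fall in (120, 130)
weights = [rng.uniform(a, b) for _ in range(100_000)]
simulated = sum(120 < w < 130 for w in weights) / len(weights)

print(exact, round(simulated, 3))
```

The simulated fraction should come out very close to the exact value of 0.2.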
---
title: Visualisation and properties of Uniform distribution
description:
duration: 5400
card_type: cue_card
---
Let's plot the uniform distribution
## <font color='blue'>Visualizing Uniform Distribution</font> (5 mins)
Under a uniform distribution, each value in the set of possible values has the same probability of occurring.
When displayed as a bar or line graph, this distribution has the same height for each potential outcome.
- In this way, it can look like a rectangle and therefore is sometimes described as a rectangular distribution.
<br>
<font color='purple'>**Example: Rolling a fair six sided die**</font>
As we discussed, the probability of landing on any one of them is equal i.e. (1/6).
- When plotted on a graph, the distribution is represented as a horizontal line, with each possible outcome captured on the x-axis, at the fixed point of probability along the y-axis.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/039/original/br86.png?1700134045 height = "400" width = "600">
Let's try to plot it using Python
Code:
```python=
import numpy as np
import matplotlib.pyplot as plt
# Generate random data following a uniform distribution
data = np.random.uniform(1, 7, 10000)  # 10,000 continuous values in [1, 7); each unit-wide bin stands for one die face
# Create a histogram
plt.hist(data, bins=6, edgecolor='black', align='mid', rwidth=0.8, density=True, color='skyblue')
# Set labels and title
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.title('Histogram of a Uniform Distribution (Die Roll Simulation)')
# Show the plot
plt.grid(True)
plt.show()
```
>Output:
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/041/original/ver76.png?1700134153 height = 400 width = 500>
In this code, we generate 10,000 random values from a continuous uniform distribution between 1 and 7 and group them into six unit-wide bins, one per face, to simulate a fair six-sided die roll.
The resulting histogram is roughly flat because each outcome has an equal probability of occurring.
<br>
- We can notice that the visualization may not appear perfectly uniform due to the random nature of the process and the finite sample size.
- By rolling the die a larger number of times (increasing the sample size), the distribution will tend towards a perfect uniform distribution.
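The effect of the number of rolls can be checked numerically. This is a minimal sketch that, unlike the histogram code above, draws integer faces directly with `random.randint`; the helper name `max_deviation` and the roll counts are illustrative assumptions.

```python
import random
from collections import Counter

rng = random.Random(0)

def max_deviation(num_rolls: int) -> float:
    """Largest gap between an observed face frequency and 1/6."""
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return max(abs(counts[face] / num_rolls - 1 / 6) for face in range(1, 7))

for num_rolls in (100, 10_000, 200_000):
    print(f"{num_rolls:>7} rolls: max deviation from 1/6 = {max_deviation(num_rolls):.4f}")
```

With more rolls, the largest deviation from 1/6 typically shrinks, so the observed frequencies look more and more like a flat line.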
Let's see the statistics of uniform distribution
## <font color='blue'>Properties of the Uniform Distribution</font> (3-5 mins)
The uniform distribution has the following properties:
- Mean: $\Large \frac{(a + b)}{2}$
- Variance: $\Large \frac{(b - a)^2}{12}$, for Continuous uniform distributions
- Variance: $\Large \frac{(b - a+1)^2 -1}{12}$, for Discrete uniform distributions
- Standard Deviation: $\Large \frac {(b - a)}{\sqrt{12}}$, for Continuous uniform distributions.
- Standard Deviation: $\Large \sqrt{\frac{(b - a+1)^2 -1}{12}}$, for Discrete uniform distributions.
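These formulas can be sanity-checked against a fair die. The sketch below uses `statistics.pvariance` (the population variance, dividing by the number of values) because the six faces are the entire set of outcomes, not a sample.

```python
import statistics

# Discrete case: a fair die with a = 1, b = 6
a, b = 1, 6
faces = list(range(a, b + 1))
print(statistics.mean(faces), (a + b) / 2)                       # mean
print(statistics.pvariance(faces), ((b - a + 1) ** 2 - 1) / 12)  # variance

# Continuous case on [0, 1]: the formulas give mean 0.5 and variance 1/12
a_c, b_c = 0.0, 1.0
print((a_c + b_c) / 2, (b_c - a_c) ** 2 / 12)
```

For the die, both approaches give a mean of 3.5 and a variance of 35/12.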
---
title: Conclusion
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>Conclusion</font>
With that, we come to the end of the lecture.
Today, we have learned about samples, sample statistics, point estimates, and standard error, and explored the uniform distribution.
Keep revising the topics and keep solving the questions.
See you in the next class.