---
title: Content
description:
duration: 5400
card_type: cue_card
---
### <font color='blue'>**Content**</font>
- Log Normal Distribution.
- Poisson Distribution
---
title: Log Normal distribution, Key Characteristics of Log Normal Distribution
description:
duration: 5400
card_type: cue_card
---
### <font color='blue'>**Introduction**</font> (2 mins)
Greetings everyone,
In our ongoing journey to explore the world of probability distributions, we will look into two more intriguing distributions: the Log-Normal distribution and the Poisson distribution.
Each of these distributions plays a unique and vital role in various fields.
Today, we'll explore the characteristics, applications, and real-world significance of these two distributions.
## <font color='blue'>**Log Normal Distribution**</font> (10-12 mins)
```
Imagine that you are a data scientist at Amazon/Swiggy/Zomato
You've collected a bunch of data on delivery times,
```
Generally, how much time does a delivery take?
- Let's assume around 30 mins, sometimes less and sometimes more
Now, if we take thousands of these delivery time data points and plot a histogram,
- It may be a bit skewed to the right. Most deliveries cluster around 30 minutes, but a few take much longer, which creates a long right tail.
<font color='purple'>**The lognormal distribution is a continuous probability distribution that models this type of right-skewed data.**</font>
<br>
<font color='purple'>Suppose $X$ is the actual data</font>
- Now the beauty of log normal is when you take <font color='purple'>**the logarithm (log) of the actual delivery time data**</font> and plot a new histogram,
- The new <font color='purple'>**histogram tends to be more symmetrical**</font>, like a bell curve.
In simple terms, <font color='purple'>the Log-Normal Distribution takes the original data, does some math magic (logarithm), and makes it look more like a normal, symmetrical distribution</font>.
<br>
So, in the language of distributions, we say the <font color='purple'>"original delivery time data (X) is log-normal."</font>
- It means if X follows a log-normal distribution, log(X) follows a normal (bell-shaped) distribution.
<br>
Conversely, if $Y$ follows a normal distribution, you can **exponentiate it ($\exp(Y)$) to obtain a log-normal distribution**.
In this manner, you can transform back and forth between a log-normal distribution and its related normal distribution.
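A minimal sketch of this round trip, using simulated values rather than the lecture dataset. The parameters `mu = 3.4` and `sigma = 0.5` and the seed are arbitrary choices for illustration (they put the typical value near 30, like the delivery-time story).

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed so the run is repeatable

mu, sigma = 3.4, 0.5                   # assumed parameters of the normal variable
y = rng.normal(loc=mu, scale=sigma, size=10_000)   # Y ~ Normal(mu, sigma)

x = np.exp(y)        # exponentiating gives log-normal values
y_back = np.log(x)   # taking the log recovers the normal values

print(np.median(x))              # close to exp(mu), roughly 30
print(np.allclose(y, y_back))    # the two transformations undo each other
```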
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/458/original/pd1.png?1700493114 height = 250 width = 300 >
We can see in this image that the original data follows a log-normal distribution, and if we take the log of this distribution, it looks more symmetrical, like a bell-shaped curve.
We will implement this on a real-life dataset.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/459/original/pd2.png?1700493273 height = 300 width = 400 >
### <font color='purple'>**Real Life dataset**</font>
Let's have a look into the dataset which has waiting time records
Code:
```python=
!wget --no-check-certificate https://drive.google.com/uc?id=1SIZC1FZvZAhVzRvnZ7IFWBUDavvzIafJ -O waiting_time.csv
```
>Output:
```
waiting_time.csv 100%[===================>] 1.58M --.-KB/s in 0.01s
2024-01-18 10:09:25 (134 MB/s) - ‘waiting_time.csv’ saved [1656272/1656272]
```
> <font color='purple'>Importing Libraries</font>
> Code:
```python=
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import poisson, binom
```
Code:
```python=
data = pd.read_csv("/content/waiting_time.csv")
data.head()
```
>Output:
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/461/original/pd3.png?1700493434 height = 200 width = 130 >
Let's plot this data
Code:
```python=
sns.histplot(data,bins=100)
```
>Output:
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/462/original/pd4.png?1700493536 height = 350 width = 500 >
<font color='purple'>**Observation**</font>
We can observe that it is right skewed.
Now,
> <font color='purple'>Q. How can we answer questions related to the data which is distributed in this way?</font>
We can transform this data using a **log** and let's see the distribution of transformed data.
### <font color='purple'>**Log Normal Distribution Parameters**</font>
As we know, the random variable for the original data is $X$, and after transforming it using log, the random variable of the transformed data is $\log(X)$.
If **$μ$ and $σ$** (mean and standard deviation) of $\log(X)$ are given and we want to find the mean and variance of the original data $X$ (log-normal distribution), they are given by:
- **Mean of original $X$** = ${\displaystyle \exp \left(\mu +{\frac {\sigma ^{2}}{2}}\right)}$
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/522/original/Screenshot_2023-11-21_at_6.17.01_PM.png?1700571046'>
- **Variance of original $X$** = ${\displaystyle [\exp(\sigma ^{2})-1]\exp(2\mu +\sigma ^{2})}$.
<img src ='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/523/original/Screenshot_2023-11-21_at_6.18.07_PM.png?1700571088' width=300>
We don't need to remember these formulas as we have functions in python to carry out our analysis.
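As a quick check of the two formulas, the sketch below compares them with `scipy.stats.lognorm`. The values `mu = 3.0` and `sigma = 0.5` are assumed for illustration; SciPy's parameterisation uses `s = sigma` and `scale = exp(mu)`.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 3.0, 0.5   # assumed mean and SD of log(X)

# Mean and variance of X from the formulas in the text
mean_formula = np.exp(mu + sigma**2 / 2)
var_formula = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

# The same log-normal distribution expressed in SciPy's parameterisation
dist = lognorm(s=sigma, scale=np.exp(mu))

print(mean_formula, dist.mean())
print(var_formula, dist.var())
```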
<font color='purple'>Let's transform our original data:</font>
Code:
```python=
data_log = np.log(data)
data_log
```
>Output:
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/464/original/pd6.png?1700493830 height = 400 width = 160 >
Code:
```python=
sns.histplot(data_log, bins=100)
```
>Output:
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/465/original/pd7.png?1700493949 height = 400 width = 550 >
<font color='purple'>**Observation**</font>
- We can observe that after applying logarithm to the right skewed data, we get the distribution which is approximately normal.
- We converted our data into a form that lets us use the properties of the Gaussian distribution.
This is known as <font color='purple'>**log normal transformation**</font>
<br>
> <font color='purple'>Q. Why did we specifically choose log?</font>
1. <font color='purple'>**Symmetry:**</font>
- Our original data is right-skewed, with a long tail on the right side indicating occasional very long delivery times.
- The **logarithmic transformation** compresses larger values more than smaller values.
- The extreme right tail is pulled in, making the distribution more symmetric.
2. <font color='purple'>**Stabilizing Variance:**</font>
- In the original delivery time data, you might observe that the variance (spread) of delivery times increases as the mean delivery time increases.
- Taking the logarithm can **stabilize the variance**.
After the transformation, the spread of delivery times is more consistent across small and large values.
In summary,
Applying a logarithmic transformation to the right skewed data can make the distribution more symmetric and stabilize the variance, making it potentially more useful for certain statistical analyses.
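One way to put a number on "more symmetric" is the sample skewness, which is near 0 for symmetric data. This sketch uses simulated right-skewed times as a stand-in for the lecture dataset; the parameters are assumptions.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=0)

# Simulated right-skewed "waiting times" (not the lecture CSV)
times = rng.lognormal(mean=3.4, sigma=0.6, size=20_000)

skew_raw = skew(times)            # clearly positive: long right tail
skew_log = skew(np.log(times))    # close to 0: roughly symmetric

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```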
<font color='red'>**Instructor's Note:**</font> (you can use these examples to explain the above points if required)
Let's understand this with the help of an example:
Suppose we have values like,
```
X: 1, 10, 100, 1000, 10000
```
Now let's take a log of all these values, we will get:
```
- ln(1) = 0
- ln(10) = 2.30
- ln(100) = 4.61
- ln(1000) = 6.91
- ln(10000) = 9.21
```
**Observation**:
- We can clearly observe that taking the log compresses larger values more than smaller values.
- 10,000 got transformed into 9.21, which shows how strongly the large values are compressed.
- This can bring symmetry to the distribution.
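The same calculation in NumPy, as a small sketch. `np.log` is the natural logarithm.

```python
import numpy as np

x = np.array([1, 10, 100, 1000, 10000])
log_x = np.log(x)

print(np.round(log_x, 2))        # roughly 0, 2.30, 4.61, 6.91, 9.21
print(x.max() - x.min())         # range of the raw values: 9999
print(log_x.max() - log_x.min()) # range after the log: about 9.21
```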
**Example on stabilizing variance:**
We can also observe that after applying the log, variance also got stabilized.
- Let's consider a simple example to illustrate this:
Suppose you have a set of positive numbers with increasing variance:
Original Data: $1,2,4,8,16,32,…$
- If you observe the differences between consecutive values, you'll see that the differences increase:
Differences: $1,2,4,8,16,…$.
Now, if you take the logarithm of the original data:
- Log-Transformed Data: $ln(1),ln(2),ln(4),ln(8),ln(16),…$
The differences between the log-transformed values are now constant, each equal to $\ln(2) \approx 0.693$
Differences: $ln(2)−ln(1),ln(4)−ln(2),ln(8)−ln(4),ln(16)−ln(8),…$
This constant difference suggests a stabilized variance, which can be beneficial in statistical analyses.
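A simulated sketch of the same idea with noisy data: two hypothetical groups of trips whose spread grows with their typical duration. The group names and parameters are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Both groups have the same spread on the log scale (sigma = 0.3),
# but very different typical durations (about 10 vs about 60 minutes).
short_trips = rng.lognormal(mean=np.log(10), sigma=0.3, size=5_000)
long_trips = rng.lognormal(mean=np.log(60), sigma=0.3, size=5_000)

for name, grp in [("short", short_trips), ("long", long_trips)]:
    print(name,
          "raw SD:", round(float(grp.std()), 2),
          "log SD:", round(float(np.log(grp).std()), 2))
```

On the raw scale the longer trips have a much larger standard deviation; after the log, both groups show roughly the same spread.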
## <font color='blue'>**Key Characteristics of a Log-Normal Distribution**</font> (3-5 mins)
Let's understand the key characteristics of a log-normal distribution.
1. **Positivity:**
- All values in a log-normal distribution are <font color='purple'>positive, because the logarithm is defined only for positive numbers</font> (equivalently, $e^{y} > 0$ for every real $y$).
2. **Skewness:**
- If the original data is right-skewed, the log-normal transformation can make it more symmetric and bell-shaped.
3. **Multiplicative Processes:**
- Log-normal distributions are suitable for modelling scenarios where the final outcome is influenced by the product of independent factors.
- In our dataset, we are aware that <font color='purple'>delivery times may get affected by various independent factors like traffic, order processing time</font>, etc.
<font color='red'>***Instructor note:***</font> (if you want to explain the above points using an example, you can refer to this)
Example for Positivity:
> Suppose you have a set of numbers following a normal distribution with a mean (μ) of 0 and a standard deviation (σ) of 1. The log-normal distribution is obtained by exponentiating these normal distribution values.
**Normal Distribution:**
Let's generate some random values from a normal distribution:
- Random Values from Normal Distribution: $-1.5, 0.8, -0.3, 1.2, -0.7$.
Exponentiate to Obtain Log-Normal Distribution:
Now, exponentiate each of these values:
- Log-Normal Distribution Values: $e^{-1.5}, e^{0.8}, e^{-0.3}, e^{1.2}, e^{-0.7}$
Calculating these values:
- Log-Normal Distribution Values: $0.223, 2.226, 0.741, 3.320, 0.497$
As we can see, all values in the resulting log-normal distribution are positive.
The exponentiation ensures that even if the original values from the normal distribution could be positive or negative, the transformation to the log-normal distribution makes them all positive.
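The five values can be reproduced with `np.exp`, as a small sketch:

```python
import numpy as np

normal_values = np.array([-1.5, 0.8, -0.3, 1.2, -0.7])
lognormal_values = np.exp(normal_values)   # e raised to each value

print(np.round(lognormal_values, 3))
print((lognormal_values > 0).all())        # every exponentiated value is positive
```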
**Example for multiplicative process:**
> Imagine you have a population of bacteria in a controlled environment, and the daily growth of this population is influenced by various independent factors.
Each day, the number of new bacteria added is not a fixed amount but is instead a percentage increase based on these factors.
**Daily Growth in Logarithmic Scale:**
- Let's say you measure the daily growth of the bacteria population in a logarithmic scale. The daily growth, when expressed in this logarithmic scale, follows a normal distribution.
**Total Population:**
- Now, if you're interested in predicting the total population over time, you would consider the cumulative effect of daily growth.
- The total population on a given day $(P_t)$ can be expressed as the product of the previous day's population $(P_{t-1})$ and the exponential of the daily growth in the logarithmic scale $(e^{X_t})$.
$P_t = P_{t-1} \times e^{X_t}$
The distribution of the total population $(P_t)$ over time would follow a log-normal distribution.
In this scenario, the log-normal distribution is appropriate because the growth of the population is influenced by multiple independent factors that operate on a multiplicative scale.
Each day's growth is not just an addition but a percentage increase based on the current population size and the cumulative effects of various factors.
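A simulation sketch of this multiplicative process. The starting population, the number of days, and the mean and SD of the daily log-growth are all assumed values chosen for illustration.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=7)

n_paths, n_days = 10_000, 30
p0 = 100.0               # assumed starting population
mu, sigma = 0.05, 0.10   # assumed mean and SD of the daily log-growth X_t

# X_t ~ Normal(mu, sigma); one row per simulated population
log_growth = rng.normal(mu, sigma, size=(n_paths, n_days))

# P_t = P_{t-1} * exp(X_t)  implies  P_T = P_0 * exp(X_1 + ... + X_T)
final_pop = p0 * np.exp(log_growth.sum(axis=1))

print("skewness of P_T:     ", round(float(skew(final_pop)), 2))
print("skewness of log(P_T):", round(float(skew(np.log(final_pop))), 2))
```

The final populations are right-skewed, while their logarithms are roughly symmetric, which is what a log-normal variable looks like.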
(till here)
In summary, a log-normal distribution is a good fit for positively skewed data: it **guarantees positivity and aligns with multiplicative processes** often seen in real-world scenarios.
Now, let's see what the Poisson distribution is.
---
title: Poisson Distribution
description:
duration: 5400
card_type: cue_card
---
### <font color='blue'>**Poisson Distribution**</font> (10-12 mins)
> <font color='purple'>**Scenario: Traffic at a Toll Booth**</font>
```
Imagine you're at a toll booth on a highway,
observing the number of vehicles passing through the toll booth in a given time period.
```
The Poisson distribution comes into play when we want to <font color='purple'>**model the number of events that occur in a fixed interval of time or space**</font>.
- In this case, <font color='purple'>vehicles passing through the toll booth are our event</font>.
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/524/original/toll.gif?1700571863' height=200>
<br>
<font color='purple'>**Explanation:**</font>
The Poisson distribution is a **discrete probability distribution** particularly useful when dealing with events that occur randomly and independently, but with a known average rate.
In our toll booth scenario, we can make a few key observations:
1. <font color='purple'>**Fixed Interval:**</font>
- Let's say we want to study the number of vehicles passing through the toll booth in a specific time period, <font color='purple'>say 1 hour</font>.
2. <font color='purple'>**Average Rate:**</font>
- We have an average rate of vehicles passing through the toll booth, <font color='purple'>let's say 30 vehicles per hour</font>.
- It is denoted as $λ$ (lambda), which represents the average rate of occurrence of the event within a given interval.
- Here, <font color='purple'>λ is 30 vehicles per hour</font>.
<br>
Now, the poisson distribution **helps us answer some questions** like:
> <font color='purple'>Q. What is the probability of exactly 25 vehicles passing through the toll booth in the next hour?</font>
> <font color='purple'>Q. What is the probability of more than 40 vehicles passing through the toll booth in the next hour?</font>
This toll booth scenario is just one example of how the Poisson distribution is applied in various fields.
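The two toll booth questions can be answered with `scipy.stats.poisson`. This is a sketch that assumes arrivals really do follow a Poisson distribution with λ = 30 per hour.

```python
from scipy.stats import poisson

lam = 30   # average vehicles per hour, from the scenario

p_exactly_25 = poisson.pmf(k=25, mu=lam)          # P[X = 25]
p_more_than_40 = 1 - poisson.cdf(k=40, mu=lam)    # P[X > 40] = 1 - P[X <= 40]

print(p_exactly_25)
print(p_more_than_40)
```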
The graph below shows examples of Poisson distributions with different values of λ.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/470/original/pd13.png?1700495067 height = 400 width = 600 >
- When **λ is low, the distribution is much longer on the right side of its peak than on its left**.
- As **λ increases, the spread of distribution also increases**
- If you <font color='purple'>keep increasing λ, the distribution looks more and more similar to a normal distribution</font>.
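These three observations can be checked numerically. For a Poisson distribution the mean and the variance both equal λ, and the skewness is $1/\sqrt{λ}$, so it shrinks towards 0 as λ grows. The λ values below are arbitrary examples.

```python
import numpy as np
from scipy.stats import poisson

for lam in [1, 4, 10, 30]:
    # "mvs" asks SciPy for the mean, variance and skewness
    mean, var, skewness = poisson.stats(mu=lam, moments="mvs")
    print(f"lambda={lam:>2}  mean={float(mean):.1f}  "
          f"variance={float(var):.1f}  skewness={float(skewness):.3f}")
```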
### <font color='blue'>**Poisson Distribution Formula**</font>
If a random variable $X$ follows a Poisson distribution, then the probability that $X = k$ successes can be found by the following formula:
- **$P[X=k] = \Large \frac{λ^k \ * \ e^{-λ}}{k!}$**
where:
- **λ**: rate or mean number of successes that occur during a specific interval
- **k**: number of successes
- **e**: a constant equal to approximately 2.71828
This is also known as the **Probability Mass Function (PMF)** of the Poisson distribution, since it gives the probability of an exact number of events.
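The formula translates directly into a few lines of Python. The helper name `poisson_pmf` is made up for this sketch; it is compared against SciPy's `poisson.pmf` for an arbitrary rate.

```python
from math import exp, factorial
from scipy.stats import poisson

def poisson_pmf(k, lam):
    """P[X = k] for a Poisson variable with rate lam, straight from the formula."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4   # arbitrary rate for the comparison
for k in range(5):
    print(k, poisson_pmf(k, lam), poisson.pmf(k=k, mu=lam))
```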
<br>
### Example 1:
```
Suppose a particular hospital experiences an average of 2 births per hour.
We can use the formula above to determine
the probability of experiencing 0, 1, 2, 3 births, etc. in a given hour:
```
Here,
rate ($λ$) = 2
e = constant= 2.71828
$P[X=0] = \Large \frac{2^0 \ * \ e^{-2}}{0!} = 0.1353$
$P[X=1] = \Large \frac{2^1 \ * \ e^{-2}}{1!} = 0.2707$
$P[X=2] = \Large \frac{2^2 \ * \ e^{-2}}{2!} = 0.2707$
$P[X=3] = \Large \frac{2^3 \ * \ e^{-2}}{3!} = 0.1804$
<br>
Here, we can also find the probability using the function in python i.e. **poisson.pmf()** as it is asking for the probability of an exact value
- We have to pass 2 parameters in this function, **k:** number of events and **mu:** average or rate
Code:
```python=
# P[x=0]
poisson.pmf(k=0, mu=2)
```
>Output:
```
0.1353352832366127
```
Code:
```python=
# P[x=1]
poisson.pmf(k=1, mu=2)
```
>Output:
```
0.2706705664732254
```
Code:
```python=
# P[x=2]
poisson.pmf(k=2, mu=2)
```
>Output:
```
0.2706705664732254
```
Code:
```python=
# P[x=3]
poisson.pmf(k=3, mu=2)
```
>Output:
```
0.18044704431548356
```
Let's look into more examples.
### <font color='purple'>**Applications:**</font>
**1) Football Match Goals**
Imagine we have collected data for all the football matches ever played, and now we want to analyze the distribution of goals.
- We observe that the average goal per 90 mins match is 2.5
- So the rate will be 2.5 goals per match (λ = 2.5).
> <font color='purple'>Q. What is the probability of exactly 1 goal in the last 30 mins?</font>
This is where the Poisson distribution comes into play.
Here, the rate is 2.5 goals per 90 mins (per match), which is the average number of goals in 90 mins.
- What will be the average number of goals in 45 mins?
2.5 goals -> 90 mins
Average goals for half of the time will be half of the total average rate
Rate: 2.5/2 = 1.25 goals per 45 mins
Similarly, we can find the rate for 30 mins,
1.25 goals -> 45 mins
x goals? -> 30 mins
x = (30 * 1.25)/45
Rate = 0.833 goals/30 mins
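The scaled rate can be plugged into `poisson.pmf` to get the probability of exactly one goal in a 30-minute window. This sketch assumes the scoring rate is constant across the match, so the last 30 minutes behave like any other 30 minutes.

```python
from scipy.stats import poisson

rate_per_match = 2.5                  # goals per 90 minutes
rate_30 = rate_per_match * 30 / 90    # scale the rate to a 30-minute window

p_one_goal = poisson.pmf(k=1, mu=rate_30)   # P[exactly 1 goal in 30 minutes]

print(round(rate_30, 3))      # 0.833
print(round(p_one_goal, 3))   # about 0.362
```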
A related question:
> **Q: How long should you stay to witness a goal on average?**
- The expected number of goals grows in proportion to the time you watch, so it reaches 1 after $90 / 2.5 = 36$ minutes.
- <font color='purple'>**Staying 45 minutes gives an expected 1.25 goals**</font>, so on average you would see at least one goal in that time.
- Staying longer always increases the probability of witnessing a goal; no particular duration maximizes it. With λ = 1.25, the probability of at least one goal in 45 minutes is $1 - e^{-1.25} \approx 0.71$.
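A minimal sketch of how the chance of seeing at least one goal changes with the time spent watching, assuming goals arrive at a constant rate of 2.5 per 90 minutes. It uses $P[X \ge 1] = 1 - P[X = 0] = 1 - e^{-λ}$.

```python
import numpy as np

rate_per_min = 2.5 / 90   # goals per minute, assuming a constant rate

# Minutes needed for the expected number of goals to reach 1 (about 36)
print(1 / rate_per_min)

for minutes in [15, 30, 36, 45, 90]:
    lam = rate_per_min * minutes            # expected goals in this window
    p_at_least_one = 1 - np.exp(-lam)       # 1 - P[X = 0]
    print(minutes, round(lam, 3), round(float(p_at_least_one), 3))
```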
Next example,
<br>
**2) Support Phone Calls**
Think about a support centre that receives 100 calls per hour.
- So the average call received per minute will be,
100 calls -> 60 mins (1 hr)
x calls -> 1 min
Rate: 100/60 ≈ 1.667 calls/min
This allows us to analyze the probability of receiving a certain number of calls within a specific time frame.
- The call center management can use this rate to determine the optimal number of customer service representatives to have on duty during different time periods.
- For instance, during peak times, when the call rate is high, more staff may be required to handle the increased volume.
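As a sketch of how such a rate might feed a staffing decision: the "more than 3 calls" question and the 95% target below are hypothetical choices, not part of the original example. `poisson.ppf` returns the smallest count whose cumulative probability reaches the given level.

```python
from scipy.stats import poisson

rate_per_min = 100 / 60   # about 1.667 calls per minute

# P[more than 3 calls in one minute] = 1 - P[X <= 3]
p_busy_minute = 1 - poisson.cdf(k=3, mu=rate_per_min)

# Smallest call count c with P[X <= c] >= 0.95 (95% is an assumed target)
capacity_95 = poisson.ppf(0.95, mu=rate_per_min)

print(p_busy_minute)
print(capacity_95)
```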
One more example
<br>
**3) Hospital OPD Patients**
Consider a hospital's Outpatient Department (OPD) where, on average, 200 patients visit in a day (λ = 200).
- The average hourly rate of patient arrivals can be calculated by dividing the daily rate by the number of working hours.
- For example, if the facility operates for 8 hours, the hourly rate would be $\frac {200}{8} = 25$ patients per hour.
- The facility can use this information for resource planning, such as determining the optimal number of staff, doctors, and examination rooms needed to handle the expected patient load efficiently.
<br>
These are some real-life examples where the Poisson distribution can help us understand the likelihood of an event occurring in a specific interval of time or space.
## <font color='blue'>**Rules of Poisson Distribution**</font> (5 mins)
**Key rules that govern the Poisson distribution:**
1. <font color='purple'>**Counting:**</font>
- The Poisson distribution is tailored for **counting the number of discrete events happening within a fixed interval**
- The events can take on values like 0, 1, 2, 3, and so on.
2. <font color='purple'>**Independence:**</font>
- The occurrence of one event should not affect the occurrence of another event.
- Events are considered to be independent if the probability of one event happening doesn't change based on whether another event has occurred.
> **For example**,
- <font color='purple'>**if an accident occurs in Delhi at 4 PM, it will have no impact on the occurrence of an accident in Mumbai at the same time**</font>.
- Each event is independent of the other, and the outcome in one location does not influence or affect the outcome in the other location.
<br>
3. <font color='purple'>**Rate (λ or μ):**</font>
- The distribution is defined by a single parameter often denoted as λ (lambda) or μ (mu), which represents the average rate of occurrence of the event within the given interval.
- This rate remains constant throughout the interval and doesn't change based on the occurrences.
4. <font color='purple'>**No Simultaneous Events:**</font>
- The Poisson distribution assumes that there cannot be more than one occurrence of the event at exactly the same time or within an infinitesimally small interval of time or space.
- For instance, <font color='purple'>if a family of five people enters a store, it's counted as a single event, not five separate events.</font>
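A simulation sketch of why these rules lead to the Poisson distribution. Each interval is cut into many tiny slots; every slot holds at most one event, slots are independent, and each has the same small chance of an event. The rate, slot count, and number of simulated intervals are assumed values.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(seed=3)

lam = 3.0              # assumed average number of events per interval
n_slots = 1_000        # tiny slots per interval, at most one event in each
n_intervals = 20_000   # number of simulated intervals

# Count the events in each interval: independent slots, constant probability
counts = rng.binomial(n=n_slots, p=lam / n_slots, size=n_intervals)

for k in range(6):
    empirical = (counts == k).mean()
    print(k, round(float(empirical), 4), round(float(poisson.pmf(k=k, mu=lam)), 4))
```

The simulated frequencies should sit close to the Poisson probabilities in the last column.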
Let's look at some examples using the Poisson distribution.
---
title: Examples of poisson
description:
duration: 5400
card_type: cue_card
---
## <font color='purple'>Example 2:</font> (5 mins)
```
A city sees 3 accidents per day on average.
Find the probability that there will be 5 accidents tomorrow.
```
Solution:
Given,
The rate is given as 3 accidents per day on average,
- $λ = 3$
Let “$X$” denote the number of accidents tomorrow.
- We say “$X$” is Poisson distributed with rate ($λ$) = 3
<br>
So, the probability that there will be 5 accidents tomorrow is $P[X=5]$
By using the formula,
- $P[X=5] = \Large \frac{λ^5 \ * \ e^{-λ}}{5!} = \frac{3^5 \ * \ e^{-3}}{5!}$.
Using python,
Code:
```python=
# P[X=5]
poisson.pmf(k=5, mu=3)
```
>Output:
```
0.10081881344492458
```
There is a 10% chance that there will be 5 accidents tomorrow.
<br>
**Next question**
> **Q. Find the probability that there will be 5 or fewer accidents tomorrow**.
Here we want to calculate $P[X≤5]$,
We will use **poisson.cdf()** here as we want to calculate cumulative probability.
- $P[X≤5] = P[X=0] + P[X=1] + P[X=2]+ P [X=3]+ P[X=4] + P[X=5]$
We can directly find it using poisson.cdf()
Code:
```python=
# P[X ≤ 5]
poisson.cdf(k=5, mu=3)
```
>Output:
```
0.9160820579686966
```
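As a sanity check, summing the six PMF terms gives the same value as the CDF. This sketch passes an array of `k` values to `poisson.pmf`, which evaluates all of them at once.

```python
import numpy as np
from scipy.stats import poisson

lam = 3
ks = np.arange(0, 6)                          # k = 0, 1, ..., 5

by_summing = poisson.pmf(ks, mu=lam).sum()    # P[X=0] + ... + P[X=5]
by_cdf = poisson.cdf(k=5, mu=lam)             # P[X <= 5]

print(by_summing, by_cdf)
```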
## <font color='purple'>Example 3:</font> (3 mins)
```
Let “X” be the number of typos in a page in a printed book, with mean of 3 typos per page.
What is the probability that a randomly selected page has at most 1 typo?
```
Here, rate ($λ$) = 3
We want the probability of at most 1 typo, so we need to find
$P[X≤1]$ which will be $P[X=0] + P[X=1]$.
We can directly use **poisson.cdf()** here
Code:
```python=
# P[x≤1]
poisson.cdf(k=1, mu=3)
```
>Output:
```
0.1991482734714558
```
Code:
```python=
prob = poisson.pmf(k=0, mu=3) + poisson.pmf(k=1, mu=3)
prob
```
>Output:
```
0.1991482734714558
```
There is about a 20% chance (19.9%) that a randomly selected page has at most 1 typo.
## <font color='purple'>Example 4:</font> (3 mins)
```
The shop is open for 8 hours a day. The average number of customers per day is 74 - assume Poisson distributed.
(a) What is the probability that in 2 hours, there will be at most 15 customers?
(b) What is the probability that in 2 hours, there will be at least 7 customers?
```
For the first question, we need to find $P[X≤15]$ in 2 hrs
- The rate for this scenario will be:
74 customers -> 8 hrs
x customers -> 2 hrs
Rate = 2 * 74/8 = 74/4
Rate = 18.5 (for 2 hrs)
So, $λ$ = 18.5, $k$ = 15 so **poisson.cdf** will be
Code:
```python=
poisson.cdf(k=15, mu=18.5)
```
>Output:
```
0.24902769151284776
```
> <font color='purple'>What is the probability that in 2 hours, there will be at least 7 customers?</font>
Here we want to find $P[X≥7]$ which will be $1 - P[X≤6]$
We find the cumulative probability up to 6 customers and subtract it from 1, which gives the probability that there will be at least 7 customers.
Code:
```python=
# P[X≥7]
1 - poisson.cdf(k=6, mu=18.5)
```
>Output:
```
0.9992622541111789
```
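Both parts can also be written in one short sketch. It uses `poisson.sf`, SciPy's survival function, which returns $P[X > k]$ directly and so is an alternative to `1 - poisson.cdf(...)`. The variable names are illustrative.

```python
from scipy.stats import poisson

customers_per_day, hours_open = 74, 8
lam_2h = customers_per_day / hours_open * 2   # 18.5 customers per 2 hours

p_at_most_15 = poisson.cdf(k=15, mu=lam_2h)   # (a) P[X <= 15]
p_at_least_7 = poisson.sf(6, mu=lam_2h)       # (b) P[X >= 7] = P[X > 6]

print(lam_2h, p_at_most_15, p_at_least_7)
```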
---
title: Quiz 1
description:
duration: 60
card_type: quiz_card
---
# Question
It is known that a certain website makes an average of 10 sales per hour.
In a given hour, what is the probability that the site makes exactly 8 sales?
# Choices
- [x] 0.1126
- [ ] 0.3544
- [ ] 0.25
- [ ] 0.674
---
title: Quiz 1 explanation
description:
duration: 5400
card_type: cue_card
---
### Quiz 1 explanation
We want to find $P[X=8]$
Given,
λ = 10 and x = 8
Code:
```python=
prob = poisson.pmf(k=8, mu=10)
prob
```
>Output:
```
0.11259903214902009
```
---
title: Quiz 2
description:
duration: 60
card_type: quiz_card
---
# Question
It is known that a certain hospital experiences an average of 4 births per hour.
In a given hour, what is the probability that 4 or fewer births occur?
# Choices
- [ ] 0.585
- [x] 0.6288
- [ ] 0.4723
- [ ] 0.82
---
title: Quiz 2 explanation
description:
duration: 5400
card_type: cue_card
---
### Quiz 2 explanation
Here we want to find $P[X≤4]$
Given,
λ = 4 and x = 4,
Code:
```python=
prob = poisson.cdf(k=4, mu=4)
prob
```
>Output:
```
0.6288369351798734
```
---
title: Quiz 3
description:
duration: 60
card_type: quiz_card
---
# Question
An e-commerce website experiences an average of 10 credit card transactions per day.
What is the probability that there will be at least 12 credit card transactions in a given day?
# Choices
- [ ] 0.2381
- [ ] 0.1263
- [x] 0.3032
- [ ] 0.1755
---
title: Quiz 3 explanation
description:
duration: 5400
card_type: cue_card
---
### Quiz 3 explanation
Here we want to find $P[X\ge12]$ where $X$ is the number of credit card transactions in a day.
Given,
$λ$ = 10, $x$ = 12
Code:
```python=
prob = 1 - poisson.cdf(k=11, mu=10)
prob
```
>Output:
```
0.30322385369689386
```
poisson.cdf(k=11, mu=10) gives the **probability of having 11 or fewer transactions**. To get the probability of at least 12 transactions, **we subtract this probability from 1**.
---
title: Poisson approximation to Binomial
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>**Poisson approximation to Binomial**</font> (5-7 mins)
```
There are 80 students in a kindergarten class.
Each one of them has 0.015 probability of forgetting their lunch on any given day.
(a) What is the average or expected number of students who forget lunch in the class?
(b) What is the probability that exactly 3 of them will forget their lunch today?
```
Solution:
First question,
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/471/original/pd14.png?1700496007 height = 150 width = 500 >
Code:
```python=
rate = 80*0.015 # average or rate
rate
```
>Output:
```
1.2
```
**Conclusion**:
This implies that, on average, 1.2 students forget their lunch on a given day.
<br>
> <font color='purple'>(b) What is the probability that exactly 3 of them will forget their lunch today?</font>
here, k = 3 and lambda = 1.2
We can directly use **poisson.pmf()**
Code:
```python=
poisson.pmf(k=3, mu=1.2)
```
>Output:
```
0.08674393303071422
```
There is an 8.67% chance that exactly 3 of them will forget their lunch today.
<br>
> <font color='purple'>**Q. Can I model this question into binomial distribution?**</font>
We have 80 students, and we can define two probabilities here
- probability of success $P(s)$ = student forgot the lunch = $0.015$
- probability of failure $P(f)$ = $1 - P(s) = 1 - 0.015 = 0.985$
We want $P[X=3]$,
- we can represent it as **out of 80 trials, I want 3 success**
Using binomial formula, it will be
- $^{80}C_3 (0.015)^3 (1-0.015)^{77}$
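The binomial expression can be evaluated term by term with `math.comb`, shown here next to SciPy's `binom.pmf` as a cross-check.

```python
from math import comb
from scipy.stats import binom

n, p, k = 80, 0.015, 3

# 80C3 * (0.015)^3 * (1 - 0.015)^77
by_formula = comb(n, k) * p**k * (1 - p)**(n - k)
by_scipy = binom.pmf(k=k, n=n, p=p)

print(by_formula, by_scipy)
```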
We have just modelled this question as a binomial problem.
Code:
```python=
binom.pmf(k=3, n=80, p=0.015) # Large n, small p, np=mu
```
>Output:
```
0.08660120920447566
```
We get similar answers using both **Poisson and binomial**.
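The agreement is not limited to $k = 3$. This sketch lines up the two PMFs for the first few values of $k$, using the same $n$ and $p$ as the lunch problem.

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 80, 0.015
lam = n * p                    # expected number of students, 1.2

ks = np.arange(0, 6)           # k = 0, 1, ..., 5
binom_probs = binom.pmf(ks, n=n, p=p)
poisson_probs = poisson.pmf(ks, mu=lam)

for k, b, q in zip(ks, binom_probs, poisson_probs):
    print(k, round(float(b), 4), round(float(q), 4))

print("largest gap:", np.abs(binom_probs - poisson_probs).max())
```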
<br>
<font color='purple'>In binomial</font>
- we are counting the number of successes in $n$ trials where $P(s)$ = $p$
<font color='purple'>In poisson</font>
- Counting number of occurrences in a given time interval.
<br>
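- Now, each trial has success probability p, so it contributes p expected successes. Across n trials, the expected number of successes will be
1 trial -> p expected successes
n trials -> ?
n trials -> np expected successes
Here, $P(s)$ for 1 student is $0.015$. So the expected number of students who forget their lunch among 80 students is $80 * 0.015 = 1.2$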
So we can say that <font color='purple'>**$\Large λ = np$**</font>
This approximation is known as the <font color='purple'>**Poisson approximation to the binomial distribution**</font>.
### <font color='purple'>**Conditions for a reasonable approximation:**</font>
- When the number of trials $(n)$ is large and the probability of success $(p)$ is small, the binomial distribution can be approximated by a Poisson distribution.
- For a reasonable approximation:
- $np ≤ 10$
- $p ≤ 0.1$
<font color='red'>***Instructor Note:***</font>
The concept of "large enough" for the number of trials $(n)$ in the context of the Poisson approximation to the binomial distribution doesn't have a fixed, universally agreed-upon threshold.
However, a commonly used guideline is that
$n$ should be such that $np≤10$
- If the above conditions are met, then we can use the Poisson distribution to estimate the probabilities of different event counts.
<br>
So, **in the context of our problem**, <font color='purple'>$n=80$ and $p=0.015$, the conditions $np\le10$ and $p≤0.1$ are satisfied.</font>
- We can use the Poisson distribution with $λ=80×0.015$ as an approximation to the binomial distribution.
<img src = https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/057/472/original/pd15.png?1700496171 height = 100 width = 600 >
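To see why the conditions matter, this sketch measures the largest gap between the binomial and Poisson PMFs for a few $(n, p)$ pairs. Only the first pair comes from the lecture; the other two are illustrative choices, and `max_gap` is a helper name made up for this example.

```python
import numpy as np
from scipy.stats import binom, poisson

def max_gap(n, p):
    """Largest absolute difference between Binomial(n, p) and Poisson(n*p) PMFs."""
    ks = np.arange(0, n + 1)
    return np.abs(binom.pmf(ks, n=n, p=p) - poisson.pmf(ks, mu=n * p)).max()

# The gap grows as p gets larger, even when np stays at or below 10
for n, p in [(80, 0.015), (80, 0.1), (20, 0.5)]:
    print(f"n={n:>3}  p={p:<5}  np={n * p:<5.1f}  max gap={max_gap(n, p):.4f}")
```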
---
title: Conclusion
description:
duration: 5400
card_type: cue_card
---
## <font color='blue'>Conclusion</font>
With that, we wrap up this lecture. Please go through the lecture once to clearly understand all the topics that we have covered today.