# Where is probability in all this?
slide: https://hackmd.io/@ccornwell/probability1
---
<h3>Filling in the gaps<sup>1</sup></h3>
- <font size=+2>When modeling data, we are rarely interested in the points already *in* the data. We care instead about the points *not* in the data -- the "yet unseen" data.</font>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<font size=+2><sup>1</sup>This phrase may be a common way to describe the idea discussed here (I've seen it in multiple places), notably used by Jesse Johnson.</font>
----
<h3>Filling in the gaps</h3>

<font size=+2>
<div style="text-align:right;">(from The Cartoon Introduction to Statistics)</div></font>
----
<h3>Filling in the gaps<sup>1</sup></h3>
- <font size=+2>When modeling data, we are rarely interested in the points already *in* the data. We care instead about the points *not* in the data -- the "yet unseen" data.</font>
- <font size=+2>**Population probability distribution**: what we want our model to approximate. We only have the observed data to work with, a *sample* from the population.</font>
- <font size=+2 style="color:#181818;">Statistics $=$ using sample data to draw confident conclusions about the population. We harness probability theory to make this work.</font>
- <font size=+2 style="color:#181818;">Machine learning: we also need probability to improve models (making them robust, with good predictions on unseen data) and to understand what a model's output *means*.</font>
<font size=+2><sup>1</sup>This phrase may be a common way to describe the idea discussed here (I've seen it in multiple places), notably used by Jesse Johnson.</font>
----
<h3>Filling in the gaps<sup>1</sup></h3>
- <font size=+2>When modeling data, we are rarely interested in the points already *in* the data. We care instead about the points *not* in the data -- the "yet unseen" data.</font>
- <font size=+2>**Population probability distribution**: what we want our model to approximate. We only have the observed data to work with, a *sample* from the population.</font>
- <font size=+2>Statistics $=$ using sample data to draw confident conclusions about the population. We harness probability theory to make this work.</font>
- <font size=+2 style="color:#181818;">Machine learning: we also need probability to improve models (making them robust, with good predictions on unseen data) and to understand what a model's output *means*.</font>
<font size=+2><sup>1</sup>This phrase may be a common way to describe the idea discussed here (I've seen it in multiple places), notably used by Jesse Johnson.</font>
----
<h3>Filling in the gaps<sup>1</sup></h3>
- <font size=+2>When modeling data, we are rarely interested in the points already *in* the data. We care instead about the points *not* in the data -- the "yet unseen" data.</font>
- <font size=+2>**Population probability distribution**: what we want our model to approximate. We only have the observed data to work with, a *sample* from the population.</font>
- <font size=+2>Statistics $=$ using sample data to draw confident conclusions about the population. We harness probability theory to make this work.</font>
- <font size=+2>Machine learning: we also need probability to improve models (making them robust, with good predictions on unseen data) and to understand what a model's output *means*.</font>
<font size=+2><sup>1</sup>This phrase may be a common way to describe the idea discussed here (I've seen it in multiple places), notably used by Jesse Johnson.</font>
---
<h3>Basics of Probability</h3>
Something is being "observed"...
- <font size=+2 style="color:#181818;">**Sample space** $\Omega$: the set of all possible outcomes.</font>
- <font size=+2 style="color:#181818;">**Event**: a subset of $\Omega$. It could consist of a single outcome, $\{\omega\}$ for some $\omega\in\Omega$, or of more than one.</font>
- <font size=+2 style="color:#181818;">*For now* (assuming all outcomes are equally likely): ${\bf P}(A) = |A|/|\Omega|$ (the probability of $A$ is the number of outcomes in $A$ divided by the number of all possible outcomes).</font>
----
<h3>Basics of Probability</h3>
Something is being "observed"...
- <font size=+2>**Sample space** $\Omega$: the set of all possible outcomes.</font>
- <font size=+2>**Event**: a subset of $\Omega$. It could consist of a single outcome, $\{\omega\}$ for some $\omega\in\Omega$, or of more than one.</font>
- <font size=+2 style="color:#181818;">*For now* (assuming all outcomes are equally likely): ${\bf P}(A) = |A|/|\Omega|$ (the probability of $A$ is the number of outcomes in $A$ divided by the number of all possible outcomes).</font>
----
<h3>Basics of Probability</h3>
Something is being "observed"...
- <font size=+2>**Sample space** $\Omega$: the set of all possible outcomes.</font>
- <font size=+2>**Event**: a subset of $\Omega$. It could consist of a single outcome, $\{\omega\}$ for some $\omega\in\Omega$, or of more than one.</font>
- <font size=+2>*For now* (assuming all outcomes are equally likely): ${\bf P}(A) = |A|/|\Omega|$ (the probability of $A$ is the number of outcomes in $A$ divided by the number of all possible outcomes).</font>
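
A minimal sketch of this counting definition in Python (the die example is my own illustration, not from the slides), using a fair six-sided die as the sample space:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

# Event A: "the roll is even" -- a subset of omega.
A = {w for w in omega if w % 2 == 0}

# P(A) = |A| / |omega|, valid because all six outcomes are equally likely.
p_A = Fraction(len(A), len(omega))
print(p_A)  # prints 1/2
```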
---
<h3>Example</h3>
- <font size=+2>Say that we draw one playing card from a standard deck of $52$ cards.</font>
- <font size=+2>Event: "an Ace is drawn". What is the probability?</font>
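
One way to check the answer is to count outcomes directly. A hedged sketch in Python (the deck construction below is my own, not from the slides):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# Sample space Omega: all 52 cards, each equally likely to be drawn.
deck = list(product(ranks, suits))

# Event: the drawn card is an Ace (4 favorable outcomes).
aces = [card for card in deck if card[0] == "A"]

# P(Ace) = |event| / |Omega| = 4/52 = 1/13.
p_ace = Fraction(len(aces), len(deck))
print(p_ace)  # prints 1/13
```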
---
<h3>Example</h3>
- <font size=+2>Say that we draw two playing cards from a standard deck of $52$ cards.</font>
- <font size=+2>Event: "at least one Queen is drawn". What is the probability?</font>
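
Here the sample space is the set of unordered two-card hands, and counting "at least one Queen" directly is easier to verify via the complement (hands with *no* Queen). A sketch in Python, again with my own deck construction:

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# Sample space: all unordered 2-card hands, C(52, 2) = 1326 of them.
hands = list(combinations(deck, 2))

# Event: at least one Queen appears in the hand.
with_queen = [h for h in hands if any(card[0] == "Q" for card in h)]
p = Fraction(len(with_queen), len(hands))
print(p)  # prints 33/221

# Complement check: P(at least one Q) = 1 - C(48,2)/C(52,2).
assert p == 1 - Fraction(comb(48, 2), comb(52, 2))
```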
---
<h3>Probability functions: Discussion</h3>