-
Notifications
You must be signed in to change notification settings - Fork 63
/
Copy pathTutorialX_Probability.Rmd
379 lines (258 loc) · 12.9 KB
/
TutorialX_Probability.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
---
title : Probability Refresher
subtitle : DPI StatChat
author : Jared Knowles
job : Research Analyst, Wisconsin DPI
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : prettify # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [mathjax, bootstrap] # {mathjax, quiz, bootstrap}
mode : standalone # {standalone, draft}
---
<style>
.title-slide {
background-color: #FFFFFF;
/* background-image:url(http://goo.gl/EpXln); */
}
.title-slide hgroup > h1{
font-family: 'Oswald', 'Helvetica', sanserif;
}
.title-slide hgroup > h1,
.title-slide hgroup > h2 {
color: #EF5150; /* ; #EF5150*/
}
em {
font-style: italic
}
strong {
font-weight: bold;
}
pre {
font-size: 18px;
line-height: 20px;
}
.left{
float:left;
}
.right{
float:right;
}
.center{
text-align:center;
}
</style>
## StatChat!
<div class="center">
<img class="acentre" src="img/StatChat.png" height="240" width="280"/>
</div>
- Probability Refresher (today)
- Moving from Data to Analysis (moved to July 29th)
- Designing Evaluations - Getting the Comparison Right (moved to August 5th)
--- bg:white
## Goals for Today
- Provide an overview of probability theory
- A brief refresher on the math of probabilities
- Demonstrate probability in effect
- Discuss why probability is relevant in decision making and problem solving
- Demonstrate statistical distributions
- Discuss their implications for policy making
--- bg:white
<p><q> Why does probability <span class = 'red'>matter to me</span>? </q></p>
---
<q class='shout'>The laws of probability, so true in general, <span class = 'key'>so fallacious in particular.</q>
Edward Gibbon
---
<q class='shout'>The 50-50-90 rule: Anytime you have a 50-50 chance of getting something right, there's a 90% probability you'll get it wrong. </q>
Andy Rooney
---
## Definitions
- $$p = \dfrac{X Occurs}{X Could Occur}$$
- A probability is quantified statement about **how likely** something will happen, ranging between 0 (absolutely certain not to happen), and 1 (absolutely certain to happen)
- To create this ratio we need to know the numerator (how many times X occurred) and the denominator (the universe of possibile trials in which X could have occurred)
- Sometimes the universe is simply a measure of time, others there are rules that describe the discrete universe of events
- In practicality, proving a probability of 0 or 1 is quite difficult, except the probability that you will enjoy this session!
--- bg:white
## Some Motivation for Probability
- Probabilities are inherently tied to the concept of **repeatability**, a probability represents the proportion of times we expect something to be successful over a reasonable number of trials
- Probability allows us to **quantify** how rare **rare events** really are, especially in cases when we cannot observe a long run of outcomes
- Probability also allows us to quantify how surprised or **unusual** the occurrence of a rare event is when we do observe them
- Probability is used to identify a **signal** within a noisy measurement
- The fundamental particles of physics behave according to properties of probability
- Probability is often framed alternatively in terms of **risk** or odds
--- bg:white
## Chained Probabilities
<div class="center">
<img class="acentre" src="img/G42009tree.png" height="375" width="750"/>
</div>
--- bg:white
<p><q> What's wrong with a little <span class = 'red'>math</span>? </q></p>
---
## Probability
Five simple formulas
> - $P(A \cap B) = P(A)P(B)$ (if A and B are independent)
> - $P(A \cup B) = P(A) + P(B)$ (mutually exclusive A and B)
> - $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (not mutually exclusive)
> - $P(A|B) = \cfrac{P(A \cap B)}{P(B)} = \cfrac{P(B|A)P(A)}{P(B)}$
> - $P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$ (if A and B are dependent)
We use them everday.
---
## Independent Events (Joint Probability)
$$P(A \cap B) = P(A)P(B)$$
- We simply multiply the probabilities of independent events together.
- Rolling a 3 and a coin landing on tails = $\cfrac{1}{6} \times \cfrac{1}{2}$
- What is another example?
---
## Probability of A or B (mutually exclusive events)
$$P(A \cup B) = P(A) + P(B)$$
- We can simply add probabilities in these cases.
- Rolling a 3 or a 4 = $\cfrac{1}{6} + \cfrac{1}{6}$
- What is another example of mutually exclusive events?
---
## Probability of A or B (non-mutually exclusive events)
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
- Just like above we add the probabilities, but we must subtract the probability
off both events occurring.
- Rolling a 3 or a coin landing on tails = $\cfrac{1}{6} + \cfrac{1}{2} - (\cfrac{1}{6} \times \cfrac{1}{2})$
- What is another example of events that are not mutually exclusive?
---
## Conditional Probability aka the Real World
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}$$
- We may want to know the probability of event A, given a fixed outcome of probability B.
- When drawing two aces from the same deck, the probability of drawing an ace on the first draw is $\cfrac{4}{52}$
- but the probability of drawing a second ace conditional on already drawing an ace is $\cfrac{\frac{3}{51} \times \frac{4}{52}}{\frac{4}{52}}$.
- What is another example?
---
## Probability of A and B (dependent events)
$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$
- Just like the previous example about the probability of the second ace. However,
this does not take into account the probability of drawing an ace in the first place!
- Thus the probability reduces to: $\cfrac{\frac{4}{52} \times \frac{3}{51}}{\frac{4}{52}} \times \cfrac{4}{52} = \frac{4}{52} \times \frac{3}{51}$
- First each draw from the deck we would have a 7.8% chance of an ace. Then, once we drew that first ace, we would have a 5.88% chance at another ace.
- Together, we have a 0.45% chance of drawing two aces sequentially.
--- bg:white
<p><q> What do cards and dice have to do with <span class = 'red'>policy</span>? </q></p>
---
## Real world
- Conditional probabilities are all around us, and are the most interesting to us
- What is the chance a student graduates high school, conditional on having a test score of X?
- What is the chance a student develops an alcohol problem, conditional on having Z type of friends?
- What is the chance a school meets AYP, conditional on receiving additional state and federal aid to improve instruction?
- EXAMPLE
---
## Asiana Airlines Flight 214 Tragedy
> - What are some policy recommendations or priorities that might be recommended after such a tragedy?
> - What is the probability of a commercial airline crash in the US?
> - There are **.466** crashes per 100,000 departures on average from 1983-2000 [NTSB Report](http://www.ntsb.gov/doclib/safetystudies/SR0101.pdf)
> - There were **.0076** accidents per 1 million miles flown (roughly 7 billion miles flown per year nationwide)
> - Given an airline crash, what is the probability that there are fatalities?
> - **0.125**
> - What percentage of airline crash occupants survive?
> - 95.7%
---
## Compare to Automobiles
> - What is the probability of an automobile crash in Wisconsin?
> - **2** crashes per 1 million miles driven [DOT Report](http://www.dot.wisconsin.gov/safety/motorist/crashfacts/docs/crash-crashes.pdf)
> - Given a crash in Wisconsin, what is the chance that it is fatal?
> - **0.004577**
> - How many fatal accidents occur per 100 million miles?
> - **0.8518**
> - What is the probability of a highway fatality in Wisconsin? [NHTSA Report](http://www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/STSI/55_WI/2011/55_WI_2011.htm)
> - About 1 per 100 million miles driven
---
## Graph It!
```{r, message=FALSE, echo=FALSE, warning=FALSE, comment=NA, fig.width=8, fig.height=4, fig.align='center'}
library(eeptools)
names <- c("Airline Crashes", "Airline Fatal | Crash", "Auto Crashes", "Auto Fatality | Crash")
probs <- c(7.6e-09, .125, 2e-05, .00457)
qplot(names, probs, geom='bar') + theme_dpi() +
labs(title="Probability of Event",
x="Event", y="Probability") + scale_y_sqrt() +
geom_text(label=probs, vjust=-1)
```
- Which mode of transportation is safer?
- Per million miles there are 100 fatal car accidents and .0004 fatal air crashes
- If you knew you were going to be in a crash, which would you prefer?
- What extra information would help you make this decision?
---
## Exercise
<p><q> Imagine you are from the National Transportation Safety Board and you are reviewing
the crash. Your agency knows the long run safety information above. What are some
<span class = 'red'>policy actions</span> or questions you should be asking?</q></p>
---
## How do we apply this to our work?
> - What is the probability a student in Wisconsin will graduate from high school?
> - Is this a useful question?
> - What additional information would help answer the question better?
> - The list of conditional probabilities is quite large!
> - This is the basis for statistical modeling and statistical analysis
---
<q class='shout'> Probabilistic events generate data <span class = 'red'>in particular patterns</span>, these are called distributions. </q>
---
## Exploring Distributions
- One way to think of distributions is that they capture uncertainty in cases of measurement error. Let's look at some examples of measurement error:
- Write down your estimate of the following three things on a piece of paper and hand them in, make a copy for yourself, keep your estimate to yourself for now:
* How tall in inches (to the nearest quarter inch) is Jared?
* What is the US Census estimate of the median household income in Illinois from 2007-2011, in dollars?
* What is the current world record time in the women's marathon, in minutes?
---
## Graphing your Answers
- While we input your answers, talk with your neighbor about your estimates. Was your neighbor higher or lower than you? Would you change your estimate based on sharing information with your neighbor?
- Talk to another neighbor, would you change your estimate further?
> - How many of you would change your answers now?
> - Let's look at the responses from the group!
---
## Look at the answers
> - Jared is 68.25 inches tall
> - Illinois household income is $56,576 [source](http://quickfacts.census.gov/qfd/states/17000.html)
> - Marathon time is 2:15:25, or 135.42 minutes [source](http://en.wikipedia.org/wiki/Marathon_world_record_progression)
---
## Defining Distributions
- [Hey Look a Web App!](http://glimmer.rstudio.com/wisconsindpi/distribution)
---
## Why do distributions matter?
---
## Further Examples
<p><q> Gameshows and lotteries!</q></p>
---
## Deal or No Deal
- [Let's Play](http://www.nbc.com/Deal_or_No_Deal/game/flash.shtml)
- Discussion
---
## Demonstrate this Through A Lottery
- The lottery is one of the most classic cases of a conditional probability
- Let's run our own lottery
- We'll 'draw' 3 numbers from a pool of numbers 1-6 without replacing the numbers. And we'll draw one "powerball" from a pool of numbers 1-4.
- Everyone pick 3 numbers from 1 to 6; and 1 powerball from 1 to 4
- Order does not matter except for the powerball
- Write them down on a piece of paper with your name and hand them in
---
## Exercise
- While we input your picks, think about what your odds of winning are
- Remember, after a number is drawn, it is not replaced so there is a $\cfrac{1}{6}$ then $\cfrac{1}{5}$, then $\cfrac{1}{4}$ choice set for three balls, plus the $\cfrac{1}{4}$ for the powerball.
- Why or why not might your answer be correct?
---
## Answers
> - Chaining together multiple probabilities really complicates things!
> - If we had to get the balls correctly in order, then the probability would be
very straightforward: $\frac{1}{6} \times \frac{1}{5} \times \frac{1}{4} \times \frac{1}{4}$
> - This is $\frac{1}{480}$, is that too low or too high?
> - But we can get them right in any order -- why does this matter?
> - How many unique combinations of your three numbers are there?
> - 6
> - So out of 120 possibilities for those three numbers, you have a chance at 6, and out of the powerball you have the same chance, 1 in 4
> - What's the new probability?
> - $\frac{6}{120} \times \frac{1}{4}$ or 0.0125 or $\frac{1}{80}$
---
## An Aside on Combinations
- Calculating complex conditional probabilities relies on a type of math known as combinatorics
- To figure out the number of ways to arrange **k** choices drawn from **n** objects,
a simple formula exists: $\cfrac{n!}{(n-k)!}$
- where: ! represents the **factorial** operator: $4! = 4 \times 3 \times 2 \times 1$
---
## Let's see how much you won!
- For each of you, we'll assume you picked the same numbers each time for 1,000 separate lottery drawings
- We'll see how long it took you to win your first lottery, how many times you won, and how many times you would have won without the powerball being included
- Demo
---
## The OP - Double Slit Electron Experiments
<p align="center"><img src="img/Double-slit_experiment_results_Tanamura_four.jpg" height="450" width="750"></p>