So, we're taking a stats class and reading the text. We're learning about probability, and all is going well. Of course flipping a coin has a heads probability of .50! Of course the probability of rolling a 6 on a fair 6-sided die is 1/6! Stats is supposed to be difficult?
Then we find out that we were just learning about ONE flip of the coin and ONE roll of the die. Dr. Quantoid tells us, "It is estimated that 20% of the fish in my lake are large-mouth bass. What is the probability of catching a large-mouth bass?
0.2, right? Right!
But, what is the probability of catching 3 bass out of 5 fish caught?"
Uh-oh....the neurons in this classroom have just started firing....and possibly, not down the correct paths!
The professor throws up this formula on the board:
I will use "typing notation" to restate this as
Pr(Y=k) = (nCk) * (p^k) * ((1-p)^(n-k))
Either way, the students' eyes go blurry and class is followed by a surge in optometrist appointments (possibly saving the day from a predicted sharp economic downturn).
But, the formula REALLY ISN'T COMPLICATED. It just looks that way. It's kinda' like those new-fangled cell phone PDA mp3 abc pdq xyz all-in-one devices. Hey, if you can find "Play" or "Send", it's not all that bad, right?
(Caution: I think the equation is blurry anyway but I can't tell...I'm not wearing my contacts. Please report any suspected true blurriness to the webmaster before visiting your optometrist.)
Moving forward...let's look at the first part:
Pr( Y=k) =
This is just saying that the probability that Y=k (for us, Y=3) is equal to something. What is Y=3? Well, each fish we catch is a binary variable, and you only have two options for binary variables....0 and 1. A 0 is usually meant to denote the "failure", the "miss", the "tails", the "no" while the 1 is usually meant to denote the "success", the "hit", the "heads", the "yes". Y is just the number of 1's we end up with.
So, Pr(Y=3) is the probability of catching 3 bass....of having 3 successes....three hits....etc....is equal to something.
That something looks like a messy room full of n's , k's , and p's. (I thought I told you kids to clean...that...mess...up!)
Ok, let's do it.
First, we should befriend those letters. (If you don't believe that letters can be your friend then you definitely didn't watch enough Sesame Street as a kid!)
n is just our sample size....you should be ok with this by now. n=5
k is the number of Y=1's of interest. We want to know the probability of catching 3 bass out of 5. So, k=3
Finally, if we didn't know something about the probability of catching a bass then, yes, this problem would be VERY difficult to solve. But, we do know. (whew)
This probability is 0.2. So, p=.2
So, we plug in all these numbers and get the answer right? Right. But, that doesn't help you understand why the equation is so easy.
Let's look at the second part of the equation...(p^k).
This is asking you to raise the probability to the success power. So, we are raising .20 to the 3rd power. Why? Well, we already should know that the joint probability of two independent events is simply their product. For example, the probability of flipping two heads is .5 * .5 = .25.
So, what is the probability of buying 20 gallons of rotten milk in a row when the probability that any one gallon is rotten is .05?
Well, that's .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05 * .05.
Now, why on Earth would we want to write all of that down when it's the same thing as saying .05 to the 20th power, which is .05^20? And, for that matter, why on Mars? Why on Jupiter? Why on any planet?
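Here is a minimal Python sketch of that shortcut. The variable names (p_rotten, gallons, long_way, short_way) are made up for illustration, and it assumes each gallon is rotten independently of the others. It multiplies .05 by itself 20 times the long way, then checks that raising .05 to the 20th power gives the same thing.

```python
import math

p_rotten = 0.05   # probability that any one gallon is rotten (from the text)
gallons = 20      # number of gallons, assumed independent of each other

# The long way: multiply .05 by itself 20 times.
long_way = 1.0
for _ in range(gallons):
    long_way *= p_rotten

# The short way: raise .05 to the 20th power.
short_way = p_rotten ** gallons

print(long_way, short_way)
# The two agree up to floating-point rounding.
print(math.isclose(long_way, short_way))
```

Either way the number is tiny (about 9.5 in the 27th decimal place), so don't lose sleep over the milk.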
(I will save other possibilities such as "Why in Heaven?" and "Why in Hell?" for the metaphysics community. It's just a bit too 'out there' for statisticians already dealing with issues of causality, etc...)
Many students solve p^k and turn that in as the answer. How callous, rude, and thoughtless! What about the other 2 fish? Y'know....those 2 non-bass?!? Do you really think that just because we are interested in the probability of 3 bass out of 5 fish that we can just ignore the 2 that aren't bass? If you ignore them, aren't you just talking about the probability of 3 bass out of 3 fish? (And discrimination of anything is SO pre-Y2K!)
There is a take-home point here! When we talk about the 0's, failures, misses, tails, non-bass, etc...we are not making a moral indictment on them. They are still INFORMATION. They still must be included in our calculations (even if we are upset that we only caught 3 bass and have 5 mouths at home to feed and don't have a cat to kick and feel compelled to blame Flipper).
So, this all ties into the third part of the equation...(1-p)^(n-k). Well, if the p=.2 is the probability of catching a bass then (1-.2), which is .8, is the probability of catching a non-bass. Like before, we wouldn't want to multiply .8 over and over and over by hand. If there are n=5 in the sample and there are k=3 bass, then we are now interested in 5-3 = 2 non-bass. So, we just raise .8 to the 2nd power.
Combining p^k and (1-p)^(n-k), all we are doing is asking for .2 * .2 * .2 * .8 * .8
It's just the joint probability solved by multiplying the individual probabilities for both successes AND failures. Why didn't Dr. Quantoid just put THAT in the textbook? Well, look back at the rotten milk example. Or think about this: What is the probability of 400 students out of 1,200 getting into graduate school when the probability is .14? Would you like to write that out? Or would it be better to just say (.14^400) * ((1-.14)^(1200 - 400))?
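To see that the exponents really are just shorthand for fish-by-fish multiplication, here is a rough Python sketch. The function name sequence_probability and the True/False coding of each catch are my own assumptions, not anything from the textbook. It multiplies the probabilities for one specific ordered catch and compares the result with (p^k) * ((1-p)^(n-k)).

```python
import math

p = 0.2      # probability of catching a bass (from the text)
q = 1 - p    # probability of catching a non-bass

def sequence_probability(catches):
    """Multiply the per-fish probabilities for ONE specific ordered catch.

    catches is a list of booleans: True = bass, False = non-bass.
    """
    prob = 1.0
    for is_bass in catches:
        prob *= p if is_bass else q
    return prob

# Two different orderings with 3 bass and 2 non-bass.
bass_first = [True, True, True, False, False]
bass_last = [False, False, True, True, True]

shortcut = (p ** 3) * (q ** 2)   # (p^k) * ((1-p)^(n-k)) with n=5, k=3

print(sequence_probability(bass_first))
print(sequence_probability(bass_last))
print(shortcut)
```

All three values come out to roughly .00512. The order of the fish doesn't change the product, which is why the formula can lump all the successes together and all the failures together.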
Finally, what about the first part of the equation? Y'know that oversized parenthesis with an n on top of a k?
Well, we read that as "n choose k". So, it's 5 choose 3. Huh?
Of course we want to choose 3 from 5. But, guess what? There are a lot of ways to do this. Below are my fishing results for 3 different days.
Day 1: Bass Bass Bass Bluegill Carp
Day 2: Carp Bass Bass Shark Bass
Day 3: Whale Dolphin Bass Bass Bass
The point here is that there are many ways to catch 3 bass out of 5 fish. The bass could be the first 3 fish as in day 1. They could be the last 3 fish as in day 3.
I could write out all of the possibilities and do some math but it's a lot easier to know that "n choose k" will give me the answer right away.
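If you'd like to see all of the possibilities written out anyway, here is a small Python sketch using the standard library's itertools.combinations. The labels "Bass" and "Other" and the variable names are placeholders for illustration. It picks which 3 of the 5 casts land a bass and prints every arrangement.

```python
from itertools import combinations

n, k = 5, 3   # 5 fish caught, 3 of them bass (from the text)

# Each combination is a set of positions (0 through 4) where the bass show up.
arrangements = []
for bass_positions in combinations(range(n), k):
    row = ["Bass" if i in bass_positions else "Other" for i in range(n)]
    arrangements.append(row)

for row in arrangements:
    print(" ".join(row))

# How many distinct arrangements did we find?
print(len(arrangements))
```

The first row printed matches day 1 (bass in the first 3 spots) and the last row matches day 3 (bass in the last 3 spots), with every other arrangement in between.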
"n choose k" is equal to n! / (k! * (n-k)!)
n! is (5*4*3*2*1)
k! is (3*2*1)
(n-k)! is (5-3)! = 2! = (2*1)
So, 5 choose 3 is (5*4*3*2*1) / ((3*2*1) * (2*1)) = 120 / 12 = 10, and the full solution is 10 * (.2^3) * (.8^2) = 10 * .008 * .64 = .0512
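The factorial arithmetic for 5 choose 3 can be double-checked with a few lines of Python. This is only a sketch: the helper name n_choose_k is hypothetical, and the comparison against math.comb assumes Python 3.8 or later. It divides n! by the product of k! and (n-k)!.

```python
import math

def n_choose_k(n, k):
    """n! divided by (k! times (n-k)!), using integer division."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(math.factorial(5))   # n!
print(math.factorial(3))   # k!
print(math.factorial(2))   # (n-k)!
print(n_choose_k(5, 3))    # 120 / (6 * 2)

# The standard library has the same thing built in (Python 3.8+).
print(math.comb(5, 3))
```

Both the hand-rolled helper and math.comb give 10 ways to fit 3 bass into 5 catches.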
(Note: there was actually another bass inside of the whale but that violates the binomial assumption of identical trials so we'll pretend that didn't happen.)
(Note: After catching the whale, I almost caught a bass but the dolphin ate it....that violates the binomial assumption of independent trials so we'll pretend that didn't happen.)
So, what have we learned?
1. We are just multiplying the probabilities of "successes".
2. We are also multiplying the probabilities of "failures".
3. We are multiplying this by all the ways we can have those "successes".
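Those three points can be sketched as one short Python function. The name binomial_probability and its argument names are assumptions for illustration, and math.comb assumes Python 3.8 or later. Each line of the function lines up with one of the three pieces of the formula.

```python
import math

def binomial_probability(n, k, p):
    """Pr(Y=k) = (nCk) * (p^k) * ((1-p)^(n-k))."""
    successes = p ** k                 # 1. multiply the "success" probabilities
    failures = (1 - p) ** (n - k)      # 2. multiply the "failure" probabilities
    ways = math.comb(n, k)             # 3. count the ways to arrange the successes
    return ways * successes * failures

# 3 bass out of 5 fish when 20% of the fish in the lake are bass.
answer = binomial_probability(5, 3, 0.2)
print(round(answer, 4))

# Sanity check: the probabilities for 0, 1, 2, 3, 4, and 5 bass should add up to 1.
total = sum(binomial_probability(5, k, 0.2) for k in range(6))
print(round(total, 4))
```

So the probability of catching 3 bass out of 5 fish works out to about .0512, a little over 5%.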