An urn has 100 marbles: 50 red and 50 white. You draw 4 marbles (with replacement). What is the probability that all 4 marbles have the same color?
Case 1: All red
P(1 red marble) = 50/100 = 1/2 = 0.5
P(4 red marbles) = 0.5 ^ 4 = 0.0625
Case 2: All white
P(1 white marble) = 50/100 = 1/2 = 0.5
P(4 white marbles) = 0.5 ^ 4 = 0.0625
The two cases cannot happen together, so their probabilities add:
P(all red or all white) = P(4 red marbles) + P(4 white marbles) = 0.0625 + 0.0625 = 0.125 = 12.5%
Now suppose you draw 7 marbles instead of 4 (again with replacement). What is the probability that all 7 marbles have the same color?
Case 1: All red
P(1 red marble) = 50/100 = 1/2 = 0.5
P(7 red marbles) = 0.5 ^ 7 = 0.0078125
Case 2: All white
P(1 white marble) = 50/100 = 1/2 = 0.5
P(7 white marbles) = 0.5 ^ 7 = 0.0078125
The two cases cannot happen together, so their probabilities add:
P(all red or all white) = P(7 red marbles) + P(7 white marbles) = 0.0078125 + 0.0078125 = 0.015625 = 1.56%
Drawing 4 marbles of the same color is 8 times (0.125 / 0.015625) as likely as drawing 7 marbles of the same color.
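The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `prob_all_same_color` and its `p_red` parameter are made up for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def prob_all_same_color(draws, p_red=Fraction(1, 2)):
    """Probability that every draw (with replacement) shows the same color."""
    p_white = 1 - p_red
    # "All red" and "all white" are mutually exclusive, so the probabilities add.
    return p_red ** draws + p_white ** draws

p4 = prob_all_same_color(4)  # 1/8
p7 = prob_all_same_color(7)  # 1/64
print(float(p4), float(p7), p4 / p7)  # prints: 0.125 0.015625 8
```

Trying larger values of `draws` shows how quickly an all-one-color result becomes rare as the sample grows.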
The key takeaway is that extreme outcomes (both high and low) are more likely to be found in small samples than in large ones.
Why is this important?
Consider the following statement. The counties in which the incidence of kidney cancer is lowest are mostly rural, sparsely populated and located in traditionally Republican states in the Midwest, the South and the East. What do you make of this?
Most of us would reason along these lines:
- No air or water pollution.
- No stress.
- Healthy, fresh food.
Now consider a slightly different statement. The counties in which the incidence of kidney cancer is highest tend to be mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South and the East. What do you make of this?
Most of us would reason along these lines:
- High stress due to poverty.
- Drinking alcohol.
- Chewing tobacco.
The explanations look reasonable for both statements, but they are wrong.
Excerpt from ‘Thinking, Fast and Slow’:
Something is wrong, of course. The rural lifestyle cannot explain both very high and very low incidence of kidney cancer. The key factor is not that the counties were rural or predominately Republican. It is that rural counties have small populations.
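A small simulation makes the point concrete. The sketch below gives every county exactly the same underlying rate, so any difference in observed rates is pure chance. All the numbers are hypothetical and chosen only to keep the run fast: the rate `TRUE_RATE`, the populations `SMALL_POP` and `LARGE_POP`, and the county count `N_EACH` are not real data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_RATE = 0.01                    # hypothetical: identical underlying rate everywhere
SMALL_POP, LARGE_POP = 500, 10_000  # hypothetical county populations
N_EACH = 100                        # number of counties of each size

def observed_rate(population):
    """Simulate one county: each resident is a case with probability TRUE_RATE."""
    cases = sum(random.random() < TRUE_RATE for _ in range(population))
    return cases / population

counties = [("small", observed_rate(SMALL_POP)) for _ in range(N_EACH)]
counties += [("large", observed_rate(LARGE_POP)) for _ in range(N_EACH)]
counties.sort(key=lambda county: county[1])  # lowest observed rate first

lowest, highest = counties[:10], counties[-10:]
small_in_lowest = sum(size == "small" for size, _ in lowest)
small_in_highest = sum(size == "small" for size, _ in highest)
print(f"small counties among the 10 lowest rates:  {small_in_lowest}")
print(f"small counties among the 10 highest rates: {small_in_highest}")
```

Under these assumptions the small counties should crowd both ends of the ranking, even though nothing about them differs except their size.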
Why did we not consider sample size?
Thinking about sample size requires mental effort, and most of us will not put in that effort. Instead, we depend on causal explanations.
Excerpt from ‘Thinking, Fast and Slow’:
The exaggerated faith in small samples is only one example of a more general illusion – we pay more attention to the content of messages than to information about their reliability, and as a result end up with a view of the world around us that is simpler and more coherent than the data justify. Jumping to conclusions is a safer sport in the world of our imagination than it is in reality.
Are small schools better than larger ones?
One study concluded that the most successful schools, on average, are small. Based on this finding, the Gates Foundation made a substantial investment in the creation of small schools. Are small schools really better?
Excerpt from ‘Thinking, Fast and Slow’:
This probably makes intuitive sense to you. It is easy to construct a causal story that explains how small schools are able to provide superior education and thus produce high-achieving scholars by giving them more personal attention and encouragement than they could get in larger schools. Unfortunately, the causal analysis is pointless because the facts are wrong. If the statisticians who reported to the Gates Foundation had asked about the characteristics of the worst schools, they would have found that bad schools also tend to be smaller than average. The truth is that small schools are not better on average; they are simply more variable.
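The "more variable, not better" claim can be illustrated with a rough sketch. Every student's score below is drawn from the same distribution regardless of school size, so no school is truly better than another. The score scale (`MEAN_SCORE`, `SCORE_SD`) and the school sizes are hypothetical values picked for the example, not figures from the research.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

MEAN_SCORE, SCORE_SD = 500, 100   # hypothetical: same score distribution everywhere
SMALL_SIZE, LARGE_SIZE = 20, 500  # hypothetical number of students per school
N_EACH = 100                      # number of schools of each size

def school_average(n_students):
    """Average score of one school whose students share one score distribution."""
    scores = [random.gauss(MEAN_SCORE, SCORE_SD) for _ in range(n_students)]
    return statistics.mean(scores)

small = [school_average(SMALL_SIZE) for _ in range(N_EACH)]
large = [school_average(LARGE_SIZE) for _ in range(N_EACH)]

# Compare the typical school average and how widely the averages spread out.
print(f"small schools: mean {statistics.mean(small):.1f}, spread {statistics.stdev(small):.1f}")
print(f"large schools: mean {statistics.mean(large):.1f}, spread {statistics.stdev(large):.1f}")
```

With these assumptions the two groups should have nearly the same mean, while the averages of the small schools spread out several times more widely. That wider spread is what puts small schools at both the top and the bottom of a ranking.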