FAQ 6: What are random or spurious correlations?
Spurious correlations are correlations that are strong but arise purely by chance. The smaller the sample size of the two variables, the higher the chance of a random correlation.
Tyler Vigen maintains a fantastic collection of spurious correlations.
Consider an example: over 10 years, mozzarella consumption correlates with the number of civil engineering doctorates awarded at 95.9%.
We use these examples in every class, and often as safety shares: do not trust high correlations if you have a small sample size!
Which sample size is safe? We recommend at least 20 records.
[Technical Note] This is the number of records such that the 95th percentile of the distribution of correlations between random pairs of vectors is below 50%.
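The technical note can be checked with a small simulation. The sketch below is illustrative only and is not the calculation behind the 20-record rule: it assumes Pearson correlation, independent standard normal vectors, and the absolute value of the correlation, none of which are stated above. The function names, the number of trials, and the seed are hypothetical choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def percentile_95_abs_corr(n_records, n_trials=5000, seed=0):
    """Empirical 95th percentile of |r| over pairs of unrelated random vectors.

    Assumption: both vectors are independent draws from a standard normal.
    """
    rng = random.Random(seed)
    corrs = []
    for _ in range(n_trials):
        x = [rng.gauss(0, 1) for _ in range(n_records)]
        y = [rng.gauss(0, 1) for _ in range(n_records)]
        corrs.append(abs(pearson(x, y)))
    corrs.sort()
    # Value at or below which 95% of the simulated |r| values fall.
    return corrs[(95 * n_trials) // 100 - 1]


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        p95 = percentile_95_abs_corr(n)
        print(f"{n:>3} records: 95th percentile of |r| = {p95:.2f}")
```

Under these assumptions, the value for 20 records should come out below 0.5, while very small samples such as 5 records should come out well above it, which is why a high correlation from only a handful of records deserves little trust.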