The 'clustering illusion' is the natural human tendency to "see patterns where actually none exist." However, according to a branch of mathematics known as Ramsey theory, complete disorder is impossible in any sufficiently large system. It may therefore be more accurate to say that the 'clustering illusion' refers to the natural human tendency to attach meaning to certain types of patterns that must inevitably appear in any large enough data set.
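Van der Waerden's theorem, one of the classic results of Ramsey theory, gives a small, checkable instance of this idea: every two-symbol sequence of length 9 contains three identical symbols at equally spaced positions, while some sequences of length 8 do not. The brute-force sketch below confirms this for X/O sequences; the function names are illustrative only and do not come from any library.

```python
from itertools import product


def has_monochromatic_3ap(seq: str) -> bool:
    """True if some positions i, i+d, i+2d (d >= 1) all hold the same symbol."""
    n = len(seq)
    for i in range(n):
        # Largest step d that still keeps i + 2d inside the sequence.
        for d in range(1, (n - 1 - i) // 2 + 1):
            if seq[i] == seq[i + d] == seq[i + 2 * d]:
                return True
    return False


def all_xo_sequences(length: int):
    """Yield every X/O sequence of the given length (2**length of them)."""
    for symbols in product("XO", repeat=length):
        yield "".join(symbols)


# Length 9: no X/O sequence escapes an evenly spaced triple.
print(all(has_monochromatic_3ap(s) for s in all_xo_sequences(9)))  # True

# Length 8: a few sequences still avoid one, e.g. "XXOOXXOO".
avoiders = [s for s in all_xo_sequences(8) if not has_monochromatic_3ap(s)]
print(avoiders)
```

No matter how a length-9 X/O sequence is arranged, some "pattern" of this kind is guaranteed to be present, so its presence alone says nothing about how the sequence was produced.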
For instance, most people say that the sequence "OXXXOXXXOXXOOOXOOXXOO" (Gilovich, 1993) is non-random, when in fact it has several qualities one would expect of a "random" stream: it contains a nearly equal number of each result (11 X's and 10 O's), and adjacent results are as likely to repeat as to alternate (10 repetitions and 10 alternations). In sequences like this, people seem to expect a greater number of alternations than one would predict statistically. In fact, over a small number of trials, variability and non-random-looking "streaks" are quite probable.
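The basic statistics of the example sequence, and the likelihood of streaks in short sequences, can be checked with a short script. This is a minimal sketch assuming fair, independent 50/50 outcomes; `describe` and `prob_run_at_least` are illustrative names, not taken from Gilovich. The run probability is computed exactly by dynamic programming over the length of the current run.

```python
from fractions import Fraction

# Example sequence attributed to Gilovich (1993).
SEQUENCE = "OXXXOXXXOXXOOOXOOXXOO"


def describe(seq: str) -> dict:
    """Count each symbol, plus how many adjacent pairs alternate or repeat."""
    counts = {sym: seq.count(sym) for sym in sorted(set(seq))}
    alternations = sum(a != b for a, b in zip(seq, seq[1:]))
    repetitions = (len(seq) - 1) - alternations
    return {"counts": counts, "alternations": alternations, "repetitions": repetitions}


def prob_run_at_least(n: int, k: int) -> Fraction:
    """Exact probability that n fair flips contain a run of at least k identical results."""
    if n <= 0:
        return Fraction(0)
    if k <= 1:
        return Fraction(1)  # any single flip is already a run of length 1
    # ways[r]: number of sequences so far with no run >= k whose final run has length r
    ways = [0] * k  # index 0 is unused
    ways[1] = 2  # length-1 sequences: "X" and "O"
    for _ in range(n - 1):
        new = [0] * k
        new[1] = sum(ways)  # next flip differs: a fresh run of length 1 starts
        for r in range(1, k - 1):
            new[r + 1] = ways[r]  # next flip repeats: the final run grows by one
        ways = new
    return 1 - Fraction(sum(ways), 2 ** n)


print(describe(SEQUENCE))
# {'counts': {'O': 10, 'X': 11}, 'alternations': 10, 'repetitions': 10}
print(float(prob_run_at_least(21, 3)))  # about 0.983
```

For 21 flips, the chance of at least one run of three or more identical results is roughly 98%, so a sequence of that length with no such streak would itself be unusual.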
As another example, the answer keys of the SAT (an important multiple-choice standardized test in the United States) are specifically chosen not to contain any long runs of the same letter, because experience has shown test designers that students believe such runs are unlikely to occur. Faced with a long run, a student may feel pressured into choosing a wrong answer just to break it.
Whether or not patterns exist in a data set can often be determined by statistical analysis, or even by methods of computational cryptanalysis. The sequence "XXXOXOXOOOXOXOOOXOX" may appear random to most viewers, but it is clearly non-random once one notices that, counting positions from 1, the X's occupy position 1 and every prime-numbered position, while the O's occupy every composite position.
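The prime-position rule can be verified directly by regenerating the sequence. This is a minimal sketch; `is_prime` and `prime_pattern` are hypothetical helper names, and position 1, which is neither prime nor composite, is assumed to be marked X because that is how the example sequence begins.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small positions used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_pattern(length: int) -> str:
    """X at position 1 and at prime positions, O at composite positions (1-based)."""
    # Assumption: position 1 is marked X, matching the example sequence.
    return "".join(
        "X" if pos == 1 or is_prime(pos) else "O" for pos in range(1, length + 1)
    )


print(prime_pattern(19))  # XXXOXOXOOOXOXOOOXOX
```

A generic frequency count would not reveal this rule; it only becomes obvious once the right hypothesis (primality of the position) is tested.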
Lossless data compression algorithms are designed, in a sense, to "look for patterns" in data and to produce a shorter representation from which the original data can be reconstructed exactly. Large data sets that contain "clusters" of a non-random nature can in general be expected to compress well, given the right encoding algorithm. On the other hand, if there is no real clustering, or pattern, in a particular data set, one would expect it to compress poorly, if at all.
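The contrast can be seen with Python's built-in `zlib` module (DEFLATE), used here only as one convenient general-purpose compressor. In this sketch the "random" input comes from a seeded pseudorandom generator and the "patterned" input is a hypothetical repeating X/O block; both inputs and the `compression_ratio` helper are assumptions chosen for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(42)  # fixed seed so the run is reproducible
noise = bytes(rng.getrandbits(8) for _ in range(10_000))
patterned = b"XXXO" * 2_500  # 10,000 bytes with an obvious repeating structure

print(f"random:    {compression_ratio(noise):.3f}")
print(f"patterned: {compression_ratio(patterned):.3f}")
```

The pseudorandom bytes typically do not shrink at all (the ratio sits slightly above 1 because of format overhead), while the repeating block shrinks to a tiny fraction of its size. A poor ratio under one compressor does not prove the absence of pattern: the "random" bytes here come from a short deterministic program, a structure zlib cannot detect.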
Scientific American's Michael Shermer suggests that the clustering illusion is a byproduct of the human brain's capability for pattern recognition.[1]
The clustering illusion was central to a widely reported 1985 study by Thomas Gilovich, Robert Vallone and Amos Tversky. They concluded that the "hot hand" in basketball, the idea that players make shots in "streaks", is indistinguishable from chance. Famous coaches, including Bobby Knight, reportedly scoffed at the study.
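One simple kind of check in this line of research compares a shooter's hit rate just after a hit with the hit rate just after a miss, then asks how often random reorderings of the same shots produce a gap at least as large. The sketch below is not a reproduction of the 1985 analysis; the shot record `HHMHMMHHHM` is hypothetical, and the one-sided permutation test and the helper names are illustrative choices.

```python
import random
from fractions import Fraction


def conditional_hit_rates(shots: str) -> tuple:
    """Return (P(hit | previous hit), P(hit | previous miss)) for a string of 'H'/'M'."""
    if shots.count("H") < 2 or shots.count("M") < 2:
        # With at least two of each, both groups below are non-empty in any ordering.
        raise ValueError("need at least two hits and two misses")
    pairs = list(zip(shots, shots[1:]))
    after_hit = [cur for prev, cur in pairs if prev == "H"]
    after_miss = [cur for prev, cur in pairs if prev == "M"]
    return (Fraction(after_hit.count("H"), len(after_hit)),
            Fraction(after_miss.count("H"), len(after_miss)))


def streak_p_value(shots: str, trials: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test: share of shuffles whose hit-after-hit
    advantage is at least as large as the observed one."""
    after_hit, after_miss = conditional_hit_rates(shots)
    observed = after_hit - after_miss
    rng = random.Random(seed)
    pool = list(shots)
    at_least_as_streaky = 0
    for _ in range(trials):
        rng.shuffle(pool)  # keeps the number of hits and misses fixed
        a, b = conditional_hit_rates("".join(pool))
        if a - b >= observed:
            at_least_as_streaky += 1
    return at_least_as_streaky / trials


record = "HHMHMMHHHM"  # hypothetical shot record: H = hit, M = miss
print(conditional_hit_rates(record))  # (Fraction(1, 2), Fraction(2, 3))
print(streak_p_value(record))
```

Because the observed gap is negative here (the shooter hits less often right after a hit), this tiny record offers no support for a hot hand. Real analyses need far longer shot sequences.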
Applying this cognitive bias to causal reasoning may result in the Texas sharpshooter fallacy. It may also often be a cause of the gambler's fallacy (see representativeness heuristic).
See also
★ Illusion of control
★ List of cognitive biases
★ Pareidolia
★ Pattern recognition
★ Apophenia
References
1. http://www.sciam.com/article.cfm?chanID=sa006&colID=13&articleID=000EB977-12BE-1264-8F9683414B7FFE9F
★ Gilovich, T., Vallone, R. & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17, 295-314.
★ Gilovich, T. (1993). How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life. New York: The Free Press. ISBN 0-02-911706-2
External links
★ Skeptic's Dictionary: The clustering illusion
★ Hot Hand website: Statistical analysis of sports streakiness