When I was finishing up my Master’s Degree in Philosophy, I sat in on a tutorial on Mind with a few Cognitive Science students. We all had to give individual presentations, and one woman talked about Bayesian reasoning and about the taxicab problem. I found the example massively counter-intuitive, and ended up arguing over it by e-mail with a couple of students until everyone got sick of it. This impacted me in two ways:
1) It led to me having a great distrust of Bayesian probability.
2) It confirmed something that I had already held to be true about the “Gambler’s Fallacy”: these are what I classify as “Obi-Wan Fallacies”, cases where what action you should take, or what you should believe, depends greatly on your point of view.
I was thinking about this again yesterday while hanging around the university waiting for the Alumni office to open, and came to a conclusion about what exactly was wrong with the taxicab problem and why it didn’t work. And then, while searching for a good summary of the taxicab problem, I found this paper from 1999 that sums it up precisely. Before I summarize the paper, let me summarize the problem itself, taken from the appropriate sections here:
In another study done by Tversky and Kahneman, subjects were given the following problem:
“A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.
A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?”
Most subjects gave probabilities over 50%, and some gave answers over 80%. The correct answer, found using Bayes’ theorem, is lower than these estimates:
* There is a 12% chance (15% times 80%) of the witness correctly identifying a blue cab.
* There is a 17% chance (85% times 20%) of the witness incorrectly identifying a green cab as blue.
* There is therefore a 29% chance (12% plus 17%) the witness will identify the cab as blue.
* This results in a 41% chance (12% divided by 29%) that the cab identified as blue is actually blue.
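For anyone who wants to see the machinery behind those bullet points, this is just Bayes’ theorem written out. Writing B for “the cab was Blue” and W for “the witness said Blue”:

```latex
P(B \mid W) = \frac{P(W \mid B)\,P(B)}{P(W \mid B)\,P(B) + P(W \mid \neg B)\,P(\neg B)}
            = \frac{0.80 \times 0.15}{0.80 \times 0.15 + 0.20 \times 0.85}
            = \frac{0.12}{0.29} \approx 0.41
```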
No, to me, the right answer was 80%: the probability that the witness identified the cab correctly. But, regardless, the answers being given as over 50% seem to reflect this reasoning: it can’t be the case that someone, under the appropriate conditions, can identify the colour of a cab reliably and yet somehow be more likely to have identified the colour incorrectly in this particular case. It’s only the Bayesian calculations that say otherwise, so surely applying Bayes’ theorem here is the wrong way to solve this problem. At the time, I conceded that over many cases these numbers might work out, because the differing numbers of cabs would result in more mistakes being made identifying blue cabs than green ones, but for any individual event it can’t work out that way. So an insurance company might want to use the Bayesian numbers, while a judge looking only at a specific case couldn’t. That, then, made it an Obi-Wan Fallacy. Even trying to run a computer model ran into issues, because the result depended on how you counted, as the sketch below shows.
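Here is a minimal sketch of the sort of computer model I mean; it is my own reconstruction, not the one I ran back then. It shows the two ways of counting coming apart: tallied per identification, the witness is right 80% of the time, but tallied only over the cases where the witness said “Blue”, the cab is actually blue only about 41% of the time.

```python
import random

random.seed(42)
TRIALS = 100_000

correct = 0               # identifications that were right, over all trials
said_blue = 0             # trials where the witness said "Blue"
blue_and_said_blue = 0    # ...and the cab really was blue

for _ in range(TRIALS):
    cab_is_blue = random.random() < 0.15    # 15% of cabs are Blue
    witness_right = random.random() < 0.80  # witness is right 80% of the time
    says_blue = cab_is_blue if witness_right else not cab_is_blue

    correct += witness_right
    if says_blue:
        said_blue += 1
        blue_and_said_blue += cab_is_blue

print(f"P(witness correct)         ~ {correct / TRIALS:.2f}")                # ~0.80
print(f"P(cab is Blue | said Blue) ~ {blue_and_said_blue / said_blue:.2f}")  # ~0.41
```

Both numbers are “the” accuracy of the witness, depending on which denominator you pick, which is exactly the counting problem.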
Michael Levin, in his article, sums up how I came to understand the problem yesterday, with some additional nice mathematics for people who like that sort of thing. The key part is here:
“Reliability” should be explicated so as to preserve the apparent truism that someone equally reliable at two tasks - such as shooting for two different regiments, or identifying cabs of different colors - is equally likely to succeed at both. This principle is violated by the “Bayesian” analysis I have criticized. For let us assume, as does the received analysis, that Witness is precisely as reliable about Greens as about Blues, i.e., (5) and (6). To evaluate the probability that the errant cab was Green if Witness says it was, switch h with -h and w with -w in (7); P(-h|-w) is then (.8 x .85) / [(.8 x .85) + (.2 x .15)] = .95. That P(-h|-w) ≫ P(h|w) - the cab is more likely to have been Green if Witness says Green than to have been Blue if Witness says Blue - shows that, whatever we are discussing, it is not the probability that Witness is right.
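Filling in the references: (5) and (6) are Levin’s assumptions that the witness is 80% accurate on each colour separately, and (7) is the Bayes calculation given above. Restating his arithmetic for the Green side in cleaner notation (the exact value is 68/71, so about .96; Levin rounds it down):

```latex
P(\text{Green} \mid \text{says Green})
  = \frac{0.80 \times 0.85}{0.80 \times 0.85 + 0.20 \times 0.15}
  = \frac{0.68}{0.71} \approx 0.96
```

Either way, it is far above the .41 for Blue, which is his point: a witness stipulated to be equally reliable about both colours comes out wildly unequal on the “Bayesian” reading.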
What I had thought for a long time was that the Bayesian analysis couldn’t be right, because the probability of the cab being blue or green had to, logically, be identical to the probability that the witness had identified the cab properly; that’s what saying they can identify the colour of a cab reliably 80% of the time means. What I should be able to do, then, is take the final probability of the cab being blue given that the witness identified it as blue and substitute it in as the probability that the witness identified the colour of the cab correctly (in this case, as blue). But remember that the probability that the witness identified the colour correctly was our initial probability, which means that to do this properly you’d have to run it through the Bayesian analysis again, which would change the result, leading to an infinite regress that only terminates at 0, which can’t be what you wanted. The little loop below shows what I mean.
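To make the regress concrete, here is a quick sketch (my own illustration) of what happens if you keep feeding the posterior back in as the witness’s “reliability”: the numbers spiral down toward 0.

```python
def posterior_blue(reliability: float, base_rate: float = 0.15) -> float:
    """P(cab is Blue | witness says Blue) for a given witness accuracy."""
    true_pos = reliability * base_rate
    false_pos = (1 - reliability) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

p = 0.80  # start from the measured reliability
for step in range(6):
    p = posterior_blue(p)
    print(f"after pass {step + 1}: {p:.4f}")
# 0.4138, 0.1108, 0.0215, 0.0039, 0.0007, 0.0001 -> heading to 0
```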
To work around this, you have to argue one of two things:
1) That the probability that the witness can identify the colour of the cab correctly isn’t what was measured, but is the result of the Bayesian analysis. This leads to the looping above and makes the measurement pointless and suspect.
2) That the probability that the witness identified the colour of the cab correctly is not the probability that the cab was the colour the witness identified it as. But written out like this, it seems obvious that the probability that the witness identified the colour of the cab correctly is identical to the probability that the cab was the colour they said it was. That seems to be what that means, most of the time.
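For clarity about what the two sides are claiming, here is how the Bayesian analysis separates the two quantities with the problem’s numbers; the dispute in 2) is over whether this separation is legitimate. On that analysis, “identifies colours correctly 80% of the time” is an accuracy averaged over all cabs, while “the cab was blue given the witness said blue” is conditioned on one particular answer:

```latex
\begin{aligned}
P(\text{correct}) &= P(\text{says B} \mid B)\,P(B) + P(\text{says G} \mid G)\,P(G) \\
                  &= 0.80 \times 0.15 + 0.80 \times 0.85 = 0.80, \\
P(B \mid \text{says B}) &= \frac{0.80 \times 0.15}{0.29} \approx 0.41.
\end{aligned}
\]
```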
So, as Levin says:
What we are discussing, when Bayes’s Theorem comes into play, is the cab’s likely color when we do not know the probability that a cab is the color Witness says it is. Background information, including base rates, then becomes pertinent. If most cabs are Green, the cab Witness saw very likely was Green, all else equal. If in addition most of the time Witness will say a cab is Green when it is, and say it is Blue when it is, the cab he saw is almost certain to have been Green if he says Green, but less certain to have been Blue if he says Blue. Many situations, like this one, involve an indicator of unknown trustworthiness. We know the odds that a subject with clogged arteries will feel fatigue, and the odds that a subject with normal arteries will feel fatigue. What we would like to know is the specificity of fatigue, the probability that someone feeling fatigue has clogged arteries. In such cases we should not say we know how well fatigue predicts clogged arteries. Did we know that, further information would be superfluous. Indeed, knowing an indicator’s trustworthiness and what the received analysis calls “trustworthiness” would allow us to solve for the base rate.
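Levin’s closing remark can be made concrete (my own working, using his labels): write b for the base rate, s = P(w|h) = .8 for the hit rate, f = P(w|¬h) = .2 for the false-positive rate, and p = P(h|w) for the indicator’s genuine trustworthiness. Bayes’ theorem ties them together, so knowing the other three forces b:

```latex
p = \frac{s\,b}{s\,b + f\,(1-b)}
\quad\Longrightarrow\quad
b = \frac{p\,f}{p\,f + s\,(1-p)}
```

Plugging in p = 12/29 with s = .8 and f = .2 returns exactly b = .15, the base rate we started from, which is why Levin says that knowing both “trustworthinesses” would make further information about base rates superfluous.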
You don’t and can’t use Bayesian analysis when one of the probabilities you are using in and of itself determines what the final probability is. That’s precisely the mistake that’s being made here. So if you are going to use Bayesian analysis, you need to be very careful to ensure that you don’t fall into this trap. If you do, you will end up with very counter-intuitive results that look right mathematically but fail logically. Which explains my problem with it, since I’m far stronger logically than mathematically, and so insisted that the logic couldn’t be violated even though the mathematics said it could.