Archive for May, 2014

They count only blue cabs …

May 21, 2014

When I was finishing up my Master’s Degree in Philosophy, I sat in on a tutorial on Mind with a few Cognitive Science students. We all had to give individual presentations, and one woman talked about Bayesian reasoning and the taxicab problem. I found the example massively counter-intuitive, and ended up arguing about it by E-mail with a couple of students until everyone got sick of it. This impacted me in two ways:

1) It led to me having a great distrust of Bayesian probability.
2) It confirmed for me something that I had already held to be true about the “Gambler’s Fallacy”: that it, like this problem, is what I call an “Obi-Wan Fallacy”, where what action you should take, or what you should believe, depends greatly on your point of view.

I was thinking about this again yesterday while hanging around the university waiting for the Alumni office to open, and came to a conclusion about what exactly was wrong with the taxicab problem and why it didn’t work. Then, while searching for a good summary of the taxicab problem, I found this paper from 1999 that sums it up precisely. Before I summarize it, let me summarize the problem, taken from the appropriate sections here:

In another study done by Tversky and Kahneman, subjects were given the following problem:

“A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.

A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?”

Most subjects gave probabilities over 50%, and some gave answers over 80%. The correct answer, found using Bayes’ theorem, is lower than these estimates:

* There is a 12% chance (15% times 80%) of the witness correctly identifying a blue cab.
* There is a 17% chance (85% times 20%) of the witness incorrectly identifying a green cab as blue.
* There is therefore a 29% chance (12% plus 17%) the witness will identify the cab as blue.
* This results in a 41% chance (12% divided by 29%) that the cab identified as blue is actually blue.
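The four steps above are Bayes’ theorem applied to the two colours. Here is a minimal Python sketch of the same arithmetic, assuming (as the problem states) that the witness is equally reliable on both colours; the function name `posterior_blue` is illustrative, not from the study:

```python
def posterior_blue(base_rate_blue: float, reliability: float) -> float:
    """P(cab was Blue | witness says Blue), assuming the witness names the
    correct colour with the same probability whether the cab is Blue or Green."""
    blue_and_says_blue = base_rate_blue * reliability               # 15% x 80% = 12%
    green_and_says_blue = (1 - base_rate_blue) * (1 - reliability)  # 85% x 20% = 17%
    says_blue = blue_and_says_blue + green_and_says_blue            # 29%
    return blue_and_says_blue / says_blue                           # 12% / 29%


print(f"{posterior_blue(0.15, 0.80):.3f}")  # prints 0.414
```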

No, to me, the right answer was 80%. That is the probability that the witness identified the colour correctly. But regardless, the fact that most answers were over 50% seems to indicate this reasoning: it can’t be the case that someone, under the appropriate conditions, can identify the colour of the cab reliably and yet somehow be more likely to be identifying it incorrectly in this case. It’s only the Bayesian calculations that say otherwise, but then surely applying Bayes’ theorem here is the wrong way to solve this problem. At the time, I conceded that over many cases these numbers might work out, because the differing numbers of cabs would result in more mistakes made identifying blue cabs than green ones, but for every individual event it can’t work out that way. So an insurance company might want to use the Bayesian numbers, while a judge looking only at a specific case couldn’t. That, then, made it an Obi-Wan Fallacy. Even trying to run a computer model ran into issues, because the result depended on how you counted.
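One way such a computer model might be set up is a Monte Carlo simulation, sketched below as a hypothetical reconstruction (not the model from back then). It draws many accidents from a fleet that is 15% Blue, has the witness call each colour correctly with probability 0.8, and then counts two different ways: the share of “Blue” reports that really were Blue, and the share of all reports that were correct. The first should come out close to the Bayesian 41% and the second close to 80%, which illustrates how much the answer depends on what is being counted.

```python
import random


def simulate(trials: int, base_rate_blue: float = 0.15,
             reliability: float = 0.80, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo model of the taxicab problem (hypothetical setup).

    Returns (fraction of 'Blue' reports where the cab really was Blue,
             fraction of all reports that were correct).
    """
    rng = random.Random(seed)
    says_blue = blue_and_says_blue = correct_reports = 0
    for _ in range(trials):
        is_blue = rng.random() < base_rate_blue   # draw a cab from the fleet
        correct = rng.random() < reliability      # does the witness get it right?
        reports_blue = is_blue if correct else not is_blue
        if correct:
            correct_reports += 1
        if reports_blue:
            says_blue += 1
            if is_blue:
                blue_and_says_blue += 1
    return blue_and_says_blue / says_blue, correct_reports / trials


per_blue_report, overall_accuracy = simulate(200_000)
print(f"Blue reports that were really Blue: {per_blue_report:.2f}")
print(f"All reports that were correct:      {overall_accuracy:.2f}")
```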

Michael Levin, in his article, sums up how I came to understand the problem yesterday, with some additional nice mathematics for people who like that sort of thing. The key part is here:

“Reliability” should be explicated so as to preserve the apparent truism that someone equally reliable at two tasks, such as shooting for two different regiments or identifying cabs of different colors, is equally likely to succeed at both. This principle is violated by the “Bayesian” analysis I have criticized. For let us assume, as does the received analysis, that Witness is precisely as reliable about Greens as about Blues, i.e., (5) and (6). To evaluate the probability that the errant cab was Green if Witness says it was, switch $h$ with $\neg h$ and $w$ with $\neg w$ in (7); $P(\neg h \mid \neg w)$ is then $\frac{.8 \times .85}{(.8 \times .85) + (.2 \times .15)} = .95$. That $P(\neg h \mid \neg w) > P(h \mid w)$ (the cab is more likely to have been Green if Witness says Green than to have been Blue if Witness says Blue) shows that, whatever we are discussing, it is not the probability that Witness is right.
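Levin’s comparison is easy to check numerically. The sketch below (the helper name `posterior` is illustrative) applies the same Bayes formula twice: once as stated, and once with Blue/Green and the two testimonies swapped, as Levin describes. The Green figure comes out at about 0.958, which Levin’s text gives as .95; the Blue figure is the familiar 0.414.

```python
def posterior(prior_h: float, p_w_given_h: float, p_w_given_not_h: float) -> float:
    """Bayes' theorem: P(h | w) = P(w|h)P(h) / [P(w|h)P(h) + P(w|not h)P(not h)]."""
    joint = p_w_given_h * prior_h
    return joint / (joint + p_w_given_not_h * (1 - prior_h))


reliability = 0.80   # Witness is right 80% of the time on either colour
p_blue = 0.15        # base rate of Blue cabs

# h = "cab was Blue", w = "Witness says Blue"
blue_given_says_blue = posterior(p_blue, reliability, 1 - reliability)
# Swap h with not-h and w with not-w: "cab was Green" given "Witness says Green"
green_given_says_green = posterior(1 - p_blue, reliability, 1 - reliability)

print(f"P(Blue | says Blue)   = {blue_given_says_blue:.3f}")    # 0.414
print(f"P(Green | says Green) = {green_given_says_green:.3f}")  # 0.958
```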

What I had thought of for a long time was the idea that the Bayesian analysis couldn’t be right because the probability of it being a blue cab or a green cab had to, logically, be identical to the probability that the witness had identified the cab properly. That’s what saying that they can identify the colour of a cab reliably 80% of the time means. What I should be able to do, then, is take the final probability of the cab being blue given that the witness identified it as blue and sub it into the probability that the witness identified the colour of the cab correctly (in this case, as blue). But remember that the probability that the witness identified the colour of the cab correctly was our initial probability, which means that to do that properly you’d have to run it through the Bayesian analysis again, which would change the results, which would lead to an infinite progression until you got to 0, which can’t be what you wanted.
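The regress described above can be made concrete by taking it literally: feed the Bayesian answer back in as the witness’s reliability (for both colours) and recompute, over and over. A minimal sketch under that assumption, with illustrative function names. Starting from 0.80, the values drop to about 0.414, then roughly 0.11, then about 0.02, and keep shrinking toward 0.

```python
def posterior_blue(base_rate_blue: float, reliability: float) -> float:
    """P(Blue | witness says Blue) for a witness equally reliable on both colours."""
    joint = base_rate_blue * reliability
    return joint / (joint + (1 - base_rate_blue) * (1 - reliability))


def regress(base_rate_blue: float, reliability: float, steps: int) -> list[float]:
    """Repeatedly substitute the posterior back in as the witness's reliability."""
    history = [reliability]
    for _ in range(steps):
        reliability = posterior_blue(base_rate_blue, reliability)
        history.append(reliability)
    return history


for step, value in enumerate(regress(0.15, 0.80, 6)):
    print(step, f"{value:.4f}")
```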

To work around this, you have to argue one of two things:

1) That the probability that the witness can identify the colour of the cab correctly isn’t what was measured, but is the result of the Bayesian analysis. This leads to the looping above and makes the measurement pointless and suspect.

2) That the probability that the witness identified the colour of the cab correctly is not the probability that the cab was the colour the witness identified it as. But written out like this, it seems obvious that the probability that the witness identified the colour of the cab correctly is identical to the probability that the cab was the colour they said it was. That seems to be what that means, most of the time.

So, as Levin says:

What we are discussing, when Bayes’s Theorem comes into play, is the cab’s likely color when we do not know the probability that a cab is the color Witness says it is. Background information, including base rates, then becomes pertinent. If most cabs are Green, the cab Witness saw very likely was Green, all else equal. If in addition most of the time Witness will say a cab is Green when it is, and say it is Blue when it is, the cab he saw is almost certain to have been Green if he says Green, but less certain to have been Blue if he says Blue. Many situations, like this one, involve an indicator of unknown trustworthiness. We know the odds that a subject with clogged arteries will feel fatigue, and the odds that a subject with normal arteries will feel fatigue. What we would like to know is the specificity of fatigue, the probability that someone feeling fatigue has clogged arteries. In such cases we should not say we know how well fatigue predicts clogged arteries. Did we know that, further information would be superfluous. Indeed, knowing an indicator’s trustworthiness and what the received analysis calls “trustworthiness” would [allow] us to solve for the base rate.
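Levin’s last point can be checked with a little algebra. If the witness is right with probability r on either colour, and the received analysis’s “trustworthiness” is q = P(Blue | says Blue), then q = rb / (rb + (1 - r)(1 - b)) for base rate b, which rearranges to b = q(1 - r) / (r(1 - q) + q(1 - r)). The small sketch below (the helper name `base_rate_from` is illustrative) runs the cab numbers forward and then recovers the 15% base rate:

```python
def base_rate_from(reliability: float, posterior: float) -> float:
    """Solve q = r*b / (r*b + (1 - r)*(1 - b)) for the base rate b."""
    r, q = reliability, posterior
    return q * (1 - r) / (r * (1 - q) + q * (1 - r))


r, b = 0.80, 0.15
q = r * b / (r * b + (1 - r) * (1 - b))   # forward direction: about 0.414
print(f"{base_rate_from(r, q):.2f}")       # prints 0.15
```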

You don’t and can’t use Bayesian analysis when one of the probabilities you are using in and of itself determines what the final probability is. That’s precisely the mistake that’s being made here. So if you are going to use Bayesian analysis, you need to be very careful to ensure that you don’t fall into this trap. If you do, you will end up with very counter-intuitive results that look right mathematically but fail logically. Which explains my problem with it, since I’m far stronger logically than mathematically, and so insisted that the logic couldn’t be violated even though the mathematics said it could.

I’ve got a bad feeling about this …

May 18, 2014

So, after deciding to back Montreal in the East because I “had a feeling” … they dropped the first game 7 – 2 and Carey Price might be injured.

Oh, what a feeling …

NHL Playoff Predictions: Round 3

May 17, 2014

Well, I had a pretty poor second round, going 1 – 3, which leaves me at 6 – 6 for the year. At least I’m still at .500. Of course, these are the rounds that are always harder to pick, so let’s see how I do down the stretch.

Eastern Conference:

Montreal vs Rangers Incorrect

This is a very tight series to call. Both teams have been good, both teams have great goaltending, both teams have beaten teams they shouldn’t have … really, they’ve both had great runs. So, you could really flip a coin here and be no more likely to pick the right team than a deep analysis would. So, I’m going to go with Montreal. No real overwhelming reason; it just feels right.

Western Conference:

Chicago vs L.A. Incorrect

Picking the home teams killed me in the second round; picking all the visiting teams would have had me go 3 – 1. And the Kings have shown a lot of determination in these playoffs. But Chicago are the champions and are at least as good a team, and are at least a bit rested, which should be a factor. So I think Chicago should pull it off.

Round: 0 – 2
Overall Record: 6 – 8

Net Neutrality and the Core Network

May 8, 2014

Reading a tweet on Shamus Young’s site, I was directed to this YouTube video by Vi Hart on Net Neutrality. Watching it, I noticed a few misconceptions that make sense from the perspective of someone who isn’t at a major ISP (meaning the people who buy the hardware and maintain it to get all of that traffic from one place to another), but when you know what’s happening behind the scenes you can see that they aren’t quite right. Since I work in telecommunications myself (not at an ISP, but at a company that supplies the ISPs, particularly with software that manages all of the equipment you need to get traffic from one place to another), I thought I’d try to explain some of what goes on behind the scenes, as far as I can without, well, putting my job in jeopardy. Note that I don’t plan to say that major ISPs absolutely aren’t playing games in order to make more money, just to point out things that make the analysis and analogy misleading, and reasons why even ISPs that are playing things completely straight won’t like strict Net Neutrality.

The main analogy in the video is of a delivery company, delivering books. It starts by setting it up so that you have a person who is asking for delivery of books from two different companies, where one is a chain bookstore and one is a small bookstore. The chain bookstore ships a lot more things through them than the small bookstore, and at some point the delivery company says that the chain bookstore is shipping too much stuff, so they’ll have to delay its deliveries while they still ship the book from the small bookstore, even if they’re going to the same person. It is then suggested that they just buy more delivery trucks, but this doesn’t appease them, and the company asks for more money from the chain bookstore instead, which is presented as being completely and totally unreasonable: after all, isn’t it the case that more people buying more stuff brings them more business? Then why would they want more money on top of that?

And then it gets into all sorts of stuff about the FCC that I don’t know much about. But to explain how this ends up being misleading, I first want to talk about where the complaint is. The video talks about main roads and stuff like that, but it mainly talks about driveways, which would be the last bit of fibre from the main line to your house. It also talks about ISPs simply being able to run more cable to solve the problem (which is why this is limited to major ISPs that do lay cable, as opposed to those that simply use the existing infrastructure). All of this misses the point that the complaint is not about the edge of the network (i.e. the part directly attached to you) but is instead about the core of the network, which is what ships massive amounts of data between cities, across countries, and around the world.

So, let’s start there. Imagine that you have 100 units of bandwidth available for any application that wants to get its data to your customers. This bandwidth has to be shared amongst all applications, and if I understand Net Neutrality properly the idea is that all applications should, ideally, be treated the same. So, let’s say that we have 10 applications that want to use that bandwidth. Ideally, we’d want all of them to use 10 units each, because then the line is used to its full capacity and everyone still gets what they need when they need it. In practice, pretty much all applications will be “bursty” in some way, busier at some times than at others (and, of course, there won’t just be 10, but let’s live with that simplification for now). But that’s an ideal breakdown.

Now, imagine that one particular application starts getting more popular or bandwidth intensive, and so starts using more than its 10 units on a regular basis. Let’s say that it starts using 30 units. This is maintainable as long as everyone else isn’t using their 10 units, their bursts are at low-usage times, or the data isn’t critically time sensitive and so it can wait for a while if everything is busy. So, for example, E-mails and texts tend to be easily scalable this way, because if they start using too much bandwidth and things get full, all that happens is that they get delayed for a few minutes or hours until things clear out, and most of the time few will really notice. Of course, separating out these cases immediately breaks strict Net Neutrality; we have to introduce the notion of priorities to know what traffic can be delayed for a bit and what has to be sent right now.
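One way to read “all applications treated the same” is max-min fairness, a standard allocation rule that is assumed here rather than taken from the video: split the capacity equally, and hand any share an application doesn’t need to the others. The sketch below uses the 100-unit line from the example. With ten applications each wanting 10 units, everyone gets 10; if one wants 30 while the other nine still want their 10, it is held to 10; if the other nine only want 5 each, the heavy application gets its full 30 out of the slack.

```python
def max_min_fair(capacity: float, demands: list[float]) -> list[float]:
    """Water-filling allocation: equal shares, with unused share redistributed."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 1e-9:
        share = remaining / len(active)      # equal split of what's left
        still_wanting = []
        for i in active:
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            remaining -= give
            if demands[i] - alloc[i] > 1e-9:
                still_wanting.append(i)
        active = still_wanting
    return alloc


print(max_min_fair(100, [10] * 10)[0])        # 10.0: the ideal even split
print(max_min_fair(100, [30] + [10] * 9)[0])  # 10.0: heavy app held to its share
print(max_min_fair(100, [30] + [5] * 9)[0])   # 30.0: others are quiet, slack goes to it
```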

Which leads us to Netflix. Video and voice are incredibly high priority in a network, because for them to be useful you need to make sure that the next segment of video (a packet, in IP) makes it there with a minimum of delay, at least relative to the last one you sent. If not, you get stuttering and a huge decrease in the quality of the service (in terms of the video, it gets “slow”). Voice, however, is fairly small, especially with all of the data compression that has been used for it over the past few decades (which is the main reason why TDM and ATM networks tended to find T1-level bandwidth acceptable for phone calls, with OC3-level required for their core, both of which, as far as I can tell, are very small today). Video, however, uses a lot of bandwidth, and it’s bandwidth that has to get there as quickly as possible and cannot be delayed without greatly affecting service.

So, going back to the example above, we have one application that can take up 30 to 50 units of our bandwidth, or possibly even more, and is also of the highest priority, so it will bump out everything else. Thus, to return to the delivery truck analogy, what this risks is that the chain bookstore will fill up all of the trucks so that the small bookstore simply can’t get its books delivered, and since this is in the core and not on the edge, that would be true even if they were delivering to the same person. (Part of this is because at the core itself no one really knows where the data is going to end up, and since the core is servicing all customers and is trying to move data between cities, there’s no real sense in trying to figure out who the end user is. You’re trying to get the data to London at that point, not 123 Baker Street.) And this is obviously not a good thing.
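The “bump out everything else” effect falls straight out of strict priority scheduling, assumed here as the simplest way to model priorities (real core equipment uses more elaborate schedulers). In this sketch the flow names, priorities and demands are all hypothetical; the line again has 100 units, and video’s demand is stepped from 30 to 50 to 90 units. At 30, everyone gets what they asked for; at 50, e-mail is squeezed to 15 of its 20 units (tolerable, since it can wait); at 90, the small site gets only 5 of its 30 and e-mail gets nothing.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    priority: int   # lower number is served first
    demand: float   # bandwidth units wanted in this interval


def strict_priority(capacity: float, flows: list[Flow]) -> dict[str, float]:
    """Serve flows in priority order; each takes all it wants until capacity runs out."""
    alloc: dict[str, float] = {}
    remaining = capacity
    for flow in sorted(flows, key=lambda f: f.priority):  # stable sort keeps ties in order
        alloc[flow.name] = min(flow.demand, remaining)
        remaining -= alloc[flow.name]
    return alloc


for video_demand in (30, 50, 90):
    flows = [
        Flow("video", 0, video_demand),
        Flow("voice", 0, 5),
        Flow("small-site", 1, 30),
        Flow("email", 2, 20),
    ]
    print(video_demand, strict_priority(100, flows))
```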

Now, the comment is that this is increasing the business for the ISP, so why can’t they analogously simply buy more trucks? In this specific case, why can’t they lay more cable? Well, in general, laying cable’s not that easy, but even then it’s not just about laying cable. The biggest part of the expansion is buying all of the switching equipment that figures out all of the important things like how to get the data to London and what traffic has to be sent now and what can wait. This equipment is not cheap, and each of these switches can only handle a certain amount of traffic itself before you need a new one. So there’s a significant amount of capital that you have to expend to expand the network, and to do that you have to believe that that expenditure will make you more money.

But wait, doesn’t the Netflix explosion make the ISPs more money? Well, not necessarily. For many if not most people, their ISP plans give them a certain speed, a certain bandwidth and a certain amount of usage per month. While video uses up a ton of bandwidth, most of the time that’s within the rate they’re supposed to get … and if it isn’t, then at the edge they themselves are slowed down and the problem is solved for them. So most existing customers are already paying for enough bandwidth to watch videos, if they use all or most of it, and so won’t actually pay the ISP any more unless they go on a splurge on a limited plan … and if they notice this, then they’ll cut back once they hit their limit. That doesn’t stop people from all deciding to watch a great Netflix video at the same time and flooding the core, and the ISP gets no more money from that than it is already getting. And the intermittent “Use it heavily until we hit our limit and then drop it” pattern makes the expenditure worse, because they might end up with an infrastructure that they need for two weeks out of a month and that doesn’t get used for the other two weeks … and they still didn’t get paid any more for having it.

That is the thinking behind charging high-priority, high-bandwidth applications (again, video in general, but perhaps Netflix in particular) a fee to support additional infrastructure in the core, so that those applications get the priority they need without screwing over everyone else. A gatekeeper at your driveway, as the video described when it talked about the fastlane, wouldn’t make sense, because at that point they already have one. I don’t claim that ISPs aren’t putting one there, and I’d agree that that isn’t sane. What they can do is allocate a fastlane in the core out of their existing bandwidth, which would have a similar effect to the gatekeeper at the door but would ensure that the high-priority, high-bandwidth applications get what they need (as long as they pay for it), that other applications get what they need, and that they can tell when they need to add more infrastructure (i.e. either the smaller applications are still getting crowded out by themselves, or those who are paying for the fastlane need more bandwidth to get the service that they’re paying for), all inside a structure where they actually do get more money the more these services come on-line. But to the end user, all they’d notice is that the site was getting slow or stuttery, which looks exactly the same as if there were a gatekeeper at the edge (or the driveway).
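A fastlane in the core could be modelled, very roughly, as a slice of capacity reserved for paying high-priority traffic, with the rest shared equally (max-min) among everyone else. The sketch below is an illustration under those assumptions, not how any particular ISP does it; the function names and numbers are hypothetical, and unused fastlane capacity is deliberately not lent out, to keep the two slices separate. Its two congestion flags stand in for the “they can tell when they need to add more infrastructure” signal in the paragraph above.

```python
def max_min(capacity: float, demands: list[float]) -> list[float]:
    """Max-min fair split: serve the smallest demands first, each capped at an equal share."""
    alloc = [0.0] * len(demands)
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = capacity
    for served, i in enumerate(order):
        share = remaining / (len(order) - served)   # equal share of what's left
        alloc[i] = float(min(demands[i], share))
        remaining -= alloc[i]
    return alloc


def fastlane_report(capacity: float, reserved: float,
                    paid_demand: float, other_demands: list[float]) -> dict[str, object]:
    """Split the link into a reserved fastlane and a shared remainder, and flag congestion."""
    shared_capacity = capacity - reserved
    return {
        "fastlane": min(paid_demand, reserved),
        "shared": max_min(shared_capacity, other_demands),
        # Paying traffic wants more than its lane: time to expand the fastlane.
        "fastlane_congested": paid_demand > reserved,
        # Small applications crowd each other out even without video: grow the shared part.
        "shared_congested": sum(other_demands) > shared_capacity,
    }


print(fastlane_report(100, 50, 70, [10, 10, 15, 10]))
```

Here the paid video traffic asks for 70 units but has only 50 reserved, so the fastlane is flagged as congested, while the other applications’ 45 units fit within their 50.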

Look, I’m as cynical about big business as the next guy. I’m not here to praise or bury the major ISPs. My goal here was to show the impact that services and applications like Netflix can have on a network, and to show why treating them differently might not be so radical a notion after all. I mean, they would indeed want to be treated differently themselves, because their traffic has to get there right away while E-mails and file transfers don’t, and so it’s also reasonable for ISPs to say that that (and the large bandwidth requirements) gives them specific problems that they want to be able to resolve by treating them differently. At the end of the day, I’m not advocating for or against Net Neutrality or the ISPs’ “fastlane” ideas, but am instead just pointing out a technological issue from the other side that might have an impact on the discussion.

A fangirl by any other name …

May 1, 2014

So, there have been a lot of controversies in the geek/nerd/whatever-we’re-calling-it-these-days sphere, over what seems to be the most popular topic in most areas lately: sexism. It seems that some T-shirt company made a set of shirts that read as follows:

Shirt 1 – “I like fangirls like I like my coffee. I HATE coffee”.
Shirt 2 – “I like fanboys like I like my coffee. I HATE coffee”.

Cue angry denunciations of the first shirt, mostly for being sexist and discouraging women from being geeks and tying it all back to the old “fake geek girl controversy”. The company responded to the comments with a post on their Facebook page, saying this:

So, we’ve apparently received some bad word on our fan girl shirt, with accusations of sexism being thrown at us from a certain few bloggers…

…who have completely ignored our other variant shirt on display or didn’t even bother to ask our take on it.

Apparently it’s only sexism if it is insulting to one gender. Woo double standards. …

Anyways, the fangirl/fanboy shirts can best be explained like this: fangirls/boys =/= fans. Fans are people who like and genuinely respect a fandom, and it’s creators. Fangirls/boys are like those who have an unhealthy obsession who make us all collectively cringe in pain at what they do to the things we love.

No one should ever defend these kinds of people. Seriously, they make the rest of us look bad.

Before I get into the blog posts: if you read the comments, one of the objections is that while they have a fanboy shirt, “fanboy” does mean the sort of obsessive fan that they talk about here, but “fangirl” just means any girl who is a fan, and so it’s a problem. Well, let’s check whether that’s true, shall we:

From dictionary.reference.com (note that the entry I’m using here is one that combines fanboy and fangirl into one entry):

a person obsessed with an element of video or electronic culture, such as a game, sci-fi movie, comic or animé, music, etc; a person obsessed with any other single subject or hobby

From Oxford American English:

• informal • derogatory An obsessive female fan (usually of movies, comic books, or science fiction).

And from Oxford World English:

A female fan, especially one who is obsessive about comics, film, music, or science fiction

Only the last definition even hints at it applying to all female fans, and still makes it clear that in general it’s meant to apply to obsessive ones. And before anyone uses that to support the claim of a difference between the terms “fanboy” and “fangirl”, here’s the Oxford World English definition of fanboy:

A male fan, especially one who is obsessive about comics, music, film, or science fiction.

So, no, if the fangirl T-shirt is a problem, then so is the fanboy T-shirt, at least in terms of terminology. They mean the same thing.

Now, some have commented that they aren’t really taking exception to the sexism, but to the shirt implying things about how people ought to be fans. The problem is that the terms fanboy/fangirl are usually given to people who … try to tell people how to be fans of a work. Most commonly, what made the terms derogatory is that they refer to people who jump into any conversation about a work and rant about what people should like about it, insisting that it’s the best thing ever, that no one should ever find any flaws or problems with it, and that no one should ever, God forbid, not like the work. That’s just inconceivable for the stereotypical fanboy/fangirl. These are the people who give the hobby a bad name, not those who are saying that that sort of obsession isn’t a good thing. So those complaining that this is telling people how to be fans of a genre or work should be the ones who hate fanboys/fangirls the most.

But, aside from that, the sexism really is the big complaint here, and the Facebook post’s charge of a double standard, that a criticism only seems to count as sexist when it is applied to women, seems to be valid. Besides most of the comments on that page, we have this article from Rebecca Pahle. She starts off in the title talking about “Fake Geek Crap”, which is odd, since no one has claimed, or does claim, that fanboys/fangirls are fake geeks. They can be legitimate geeks; they’re just bad ones. Making that accusation is like saying that alcoholics are fake drinkers; yes, they are still drinkers, and just drink too much. The same can be said for fanboys/fangirls; they’re still fans, but take it too far.

Now, she does manage to stay somewhat focused on telling fans how to like a work, but she does link it to sexism directly here:

…that rightfully got a lot of people ticked off because of the way it perpetuates the toxic “there’s only one right way to be a fan of something” attitude that’s long infected geek culture and often manifests specifically in a way that’s intended to push girls out of geek spaces.

This would seem to suggest that the shirt carries an implication that’s worse for women, and note that her rewrite of the shirt into a more accurate version replaces “hate” with “scared of”, which is a common complaint aimed at supposedly sexist geeks who don’t want women to get into the hobby because they’re scared of them. But at least she does say multiple times that it’s about not telling fans how to like a work, which is better than the original post by Greg Rucka: its title starts by linking the shirt to the gatekeeping of women in geek culture, it spends most of its length talking about the trials of his daughter, and it ends with this:

And some asshole thinks selling a shirt that, essentially, says, GURLS STAY OUT is funny. He’s talking to my wife. He’s talking to my daughter. He’s talking to my friends. He’s talking to my fans. He’s talking to some of the best writers in the industry, some of the most gifted artists, some of the most talented creators in the arts.

GURLS STAY OUT. Heh heh heh.

Since Pahle references Rucka’s article to claim that the creators of the T-shirt ignored the main issue of telling people how to be fans, one would assume she’d, well, read the article. And anyone who read that article would certainly forgive them for thinking that the main issue was sexism, not “telling people how to be fans”. In that sense, it almost sounds like “moving the goalposts” is in play here: once the “fanboy” T-shirt was “revealed”, sexism wasn’t as easy a case anymore, so it switches to the real issue being about telling people how to be fans. Again, this wouldn’t be an issue if Pahle hadn’t referenced Rucka’s post, which is clearly more about sexism than about telling people how to be fans.

The facts of the matter are these:

It isn’t sexist to use the term “fangirl” to describe an overly obsessive female fan, particularly one who is annoyingly vocal about that obsession in a way that implies that if you don’t like what she likes, then there’s something wrong with you, or you aren’t really a fan, or you don’t know what you’re talking about. It is less sexist to do that than to try to lump all of those fans, male and female, into the term “fanboy”, which, as anyone who knows anything about feminism knows, normalizes the male and so is incredibly sexist. While it may be debatable, a good case can be made that overly obsessive fans of any gender are a problem for the geek community, precisely because they end up telling people how to enjoy the works or the genres that they refer to, and that is indeed bad for the community (the objections on that point are valid, as far as they go). In the Facebook quote, could the creators of the T-shirt be doing that (some earlier comment/version of the post might have made reference to Hetalia shippers and something else, but it’s not there now)? Maybe, and for that they’d deserve criticism. The shirts, however, don’t actually say things like that, and so to harp on that would be nothing more than a distraction from the issues around the shirts, which started the mess in the first place.

There’s nothing wrong with the shirts, as far as I can see. And if people disagree then they can … post comments here (no swearing, please) telling me why I’m wrong.

NHL Playoff Predictions: Round 2

May 1, 2014

So, my record in the first round was a respectable 5 – 3. At least I was over .500. The interesting thing about that was that in the East I went with all of the teams that had home ice advantage and went 3 – 1, while in the West I went with all of the teams that didn’t except for one — Colorado — and went 2 – 2. You could say that maybe I should have picked more teams with home ice advantage except that the one team that I did pick that had home ice advantage lost their series.

Anyway, moving on to Round 2:

Eastern Conference:

Boston vs Montreal Incorrect
Pittsburgh vs Rangers Incorrect

Boston is just an overall better team than Montreal, so they should be able to pull it off.

Fleury wasn’t exactly stellar in the first round, but Pittsburgh still has a lot of offense and the Rangers just came through a really tough series against what I’d at least consider to be a weaker opponent and almost lost it. Pittsburgh should be able to pull it off unless Lundqvist stones them.

Western Conference:

Chicago vs Minnesota Correct
Anaheim vs L.A. Incorrect

Chicago is an overall better team than Colorado was, so they should be able to beat Minnesota, especially since Minnesota would have just come through a tough series.

Anaheim and L.A. is a close one. L.A. won’t give up since they know they can come back from anything, and Anaheim has choked in the past. But I think that Anaheim will be able to run L.A. out of gas and win the series, although if it goes to Game 7 my money would be on L.A.

Round: 1 – 3
Overall Record: 6 – 6