Theory-driven vs Data-driven

So, last time, I talked about how Stephanie Zvan was at least mistaken in claiming that the rationalists she encounters are rationalists in the philosophical sense, like Descartes and Plato. Here, I’m trying to come up with a way to characterize the differences between her views and theirs without relying on the term “rationalism” and so avoiding that confusion. And I think the difference might be that they are theory-driven while Zvan is data-driven.

So what do I mean by that? I’ll call “theory-driven” the view that we ought to determine which propositions to accept by formulating theories about the world and fitting data into those theories. The main idea here is that data by itself has no inherent meaning and is given meaning only when it is interpreted rationally, in the context of a well-formed theory. Theories can be changed by data or even invalidated by it, but this view would, it seems to me, deny that going out and gathering as much data as one can is a useful way to gain knowledge. Instead, one should gather enough data to form an initial theory and then absorb new data into the model slowly, altering the theory at each step and using it to predict both what data is needed and what that data ought to be. In a theory-driven approach, if the data doesn’t fit the model, one might at least initially question the data.

The “data-driven” approach, on the other hand, finds huge value in gathering as much data as possible, and only then tries to formulate a theory based on that data. This view holds that data in and of itself can push toward or support an interpretation, and that one’s interpretation improves mostly with better data and not with better theorizing. If the data conflicts with the basic theory, then toss the theory out.

I think the best way to describe this is in terms of fitting a curve to a set of data points. The theory-driven approach would take a few points, build a curve, and then fit other points onto that initial curve, altering it as little as possible and changing it radically only when forced to; it would also extend the curve past the last data point based on the theory behind it. The data-driven approach would generate as many data points as possible before drawing the curve, and would be hesitant to go beyond that data set without good reason or more data. As a result, the theory-driven approach tends toward depth, while the data-driven approach tends toward breadth. Both end up covering more data sets, but for different reasons: the theory-driven approach extends to new data sets in an attempt to apply the theory to more problems and so make it a deeper theory, while the data-driven approach covers more data sets simply because it has gathered more, and more varied, data. Likewise, the theory-driven approach takes a more guided view of gathering data, collecting the data that the theory says it needs rather than all available data, while the data-driven approach gathers lots and lots of data and worries later about what really fits together.

At this point, you should be able to see the pitfalls of each. The theory-driven approach tends toward rationalization: once you’ve settled on an interpretation of a set of data, you keep rationalizing away the new data you encounter and making minor tweaks to the theory until you are finally forced to discard it. The data-driven approach, on the other hand, tends toward shallow interpretations, because it isn’t easy to take a massive set of data and come up with a common interpretation that works for all of it, and a data-driven approach is loath to throw away data if it can help it. Philosophically, the theory-driven approach aligns best with “armchair” philosophizing, while the data-driven approach aligns best with naturalized philosophizing, where you gather up all of the instances of a concept that you can find, then figure out which characteristics matter and filter the instances based on those.

I think that the debate Zvan is engaged in reflects this fairly well. I think that a lot of those “rationalists” have a theory in mind and point out that her interpretations don’t fit that theory. She then insists that they need to look at the data because it’s obvious, and they reply that the data doesn’t support her interpretation as obviously as she thinks it does. Some may argue that the social and political issues she talks about are best explained by those on the social justice side formulating theories and then insisting that they be used despite the data. I see them instead as best described as examples of the data-driven model, for two reasons. First, their concepts have many potential theoretical issues that they don’t consider particularly important (intersectionality with privilege, for example, or cultural influences and what role they play beyond simply being present), and so they are generally theoretically shallow; patriarchy, for instance, is not very complex theoretically. Second, they constantly try to justify those theories not by appeals to the logical consistency of the theory, but by the data they have. Rape culture might be the best example of this: it incorporates a widely diverse set of phenomena, but the exact theory of rape culture, including how one decides whether something is an example of it, is rather vague.

Science, as it turns out, is neither theory-driven nor data-driven. Arguably the most successful scientific field is physics, and physics explicitly incorporates both approaches: theoretical physics is theory-driven, while experimental physics is data-driven. And while the two do clash at times, overall the success of physics comes from the interaction between them, with theoretical work informing experimentation and experimental results feeding back into theoretical work. In general, both approaches have their place, and which one takes precedence in a given situation depends greatly on the context of the problem. This means that those who advocate for science cannot really claim that science supports only the theory-driven or only the data-driven approach; it supports and uses both, when appropriate.

Without specific examples of the “naive errors” that Zvan’s “rationalists” are making when they “think but don’t study”, I can’t say whether this really captures the distinction that Zvan wants to make. But it seems to fit, especially her description of what frustrates her, and if it does fit, then this is a less confusing and hopefully less judgemental way to approach the issues. Remember, neither of these views is bad; both are required at the proper time and in the proper place. There is no shame in being theory-driven or data-driven, as long as you understand that if you want full knowledge, someone has to take the other approach if you aren’t going to take it yourself.
