October 13, 2020

Causality in the social sciences: Forget about regularities

I believe that social science is about people doing stuff. I borrowed this phrase from Paul Benneworth, a brilliant colleague and a good friend who left this world much too soon. ‘People doing stuff’ draws attention to the role of human agency in causal explanation. Unfortunately, we do not see a lot of that in social-science research. Instead, we get abstract mathematical models in which human agents disappear behind variables. The following is only a slight simplification of causal explanation in mainstream social science. Variable-based researchers believe in ontological determinism: that causes connect to outcomes via fixed paths, processes or mechanisms. They further believe that these fixed paths and mechanisms between causes and outcomes may be observed as empirical regularities between independent and dependent variables. If they had flawless data, variable-based researchers believe, they would find perfect regularities evidencing the underlying causal mechanisms.

If human agency matters in causal explanation, the variable-based view of causality is deeply problematic. First, ontological determinism suggests that human agents trigger a causal mechanism simply because a cause is present, which completely ignores the spontaneity and intentionality of human agency. For example, one may find that higher levels of education have an effect on income levels; however, people do not take a higher-paid job just because they are higher educated. In fact, many higher-educated people are happy to sacrifice pay to do what they consider meaningful work.

Second, empirical regularities between independent and dependent variables are not objective or neutral observations but theory- and value-laden constructs. What counts as ‘higher educated’ is perspectival: it means something different in developed versus developing countries, in upper-class versus working-class communities, and for a 19th-century sociologist versus a 21st-century sociologist. At least partially, the regularity you ‘find’ in the data depends on the definitions of your concepts. Moreover, using different indicators or control variables in a statistical analysis changes the correlation coefficients between independent and dependent variables, but, obviously, this does not change the underlying relationship between cause and outcome; the regression sketch below illustrates this. Effectively, this means that variable-based researchers who believe that empirically observed regularities evidence underlying causal mechanisms actually believe that empirical observations affect the nature of the underlying causality. Such a position would suit social constructionists, but it does not at all square with the ontological determinism of variable-based research.

Third, variable-based researchers will obviously argue that their causal claims are probabilistic and pertain to a population, not to individual cases. That is, the effect of education on income means that level of education correlates significantly with level of income in a population. Consequently, an increase in an individual’s level of education only increases the probability of their achieving a higher level of income. Great. Now agency is transferred from human agents to variables. Causality is now a matter of an independent variable ‘doing something’ to a dependent variable (‘producing a causal effect’, a variable-based researcher would say). However, variables do not exist; they are statistical abstractions. Variables do not do anything, much less have a causal effect on anything. To think of causality as an independent variable ‘doing something’ to a dependent variable is to anthropomorphize statistical abstractions. It has no bearing whatsoever on people doing stuff.
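To make the second point concrete, here is a minimal simulation sketch; all variables and coefficients are hypothetical, invented purely for illustration. The estimated ‘effect’ of education on income shifts when a control is added, even though the process that generated the data never changes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
ability = rng.normal(size=n)                        # unobserved confounder
education = 0.5 * ability + rng.normal(size=n)      # education partly reflects ability
income = education + ability + rng.normal(size=n)   # true effect of education: 1.0

# Same data, two specifications (intercept plus regressors).
X_plain = np.column_stack([np.ones(n), education])
X_controlled = np.column_stack([np.ones(n), education, ability])
b_plain, *_ = np.linalg.lstsq(X_plain, income, rcond=None)
b_controlled, *_ = np.linalg.lstsq(X_controlled, income, rcond=None)

print(f"education coefficient without control: {b_plain[1]:.2f}")       # ~1.4
print(f"education coefficient with control:    {b_controlled[1]:.2f}")  # ~1.0
```

The regularity moves with the specification; the underlying relationship does not.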

The conflation of empirical regularities with underlying causal mechanisms and the anthropomorphization of statistical abstractions lead to spectacular derailments in the practice of variable-based social science. My favourite is the paper by Alfonso Gambardella and colleagues on openness and regional performance, published in Regional Studies in 2009. They argue, quite reasonably, that the socio-cultural openness of a region strengthens its economic performance because openness produces new ideas, which lead to innovation, which, in turn, strengthens economic performance. That is, the openness of a region is connected to its economic performance through an innovation mechanism. To indicate regional economic performance, Gambardella and colleagues use labour productivity, which is a slight stretch but never mind. But now this: to indicate openness, they use the number of hotel guests relative to the region’s population. Of course, North Korea has few hotel guests and is not particularly open. But is the number of hotel guests a suitable indicator of openness in European regions, the context of the study by Gambardella and colleagues? Moreover, what do they want us to believe? That hotel guests spend their time discussing new ideas with the regional workforce rather than shopping, dining, sightseeing, etc.? Do we really believe that an empirical correlation between hotel guests and labour productivity evidences an innovation mechanism connecting new ideas to economic performance? And if not, why do we accept this work as a contribution to science instead of dismissing it as fiction?

The mistaken belief in mainstream social science that empirical regularities evidence underlying causal mechanisms gave birth to an obsession with statistical models. The obsession is such that the mathematical sophistication deployed to estimate the ‘correct’ causal effect overshadows not only people doing stuff but also common sense. If empirical considerations allow the use of indicators that have no realistic bearing on the concepts they are supposed to measure, statistical modelling stops being science and becomes scientism, an anti-scientific scientism. To me, the paper by Gambardella and colleagues is a requiem for logic.

Social science must be about people doing stuff, or it is not social. So do not hide people and the stuff they do behind variables. Stop talking about indicators and get real about concepts. Put differently, ontology (concepts and meaning) comes first; epistemology (indicators and measurement) derives from ontology. Deleting an indicator from a measurement scale may increase Cronbach’s alpha (epistemology); however, ‘being in a stable relationship’, ‘having many friends’ and ‘being materially well-off’ measure a very different kind of happiness than only ‘being in a stable relationship’ and ‘having many friends’ (ontology); the short calculation below illustrates the point. Finding empirical regularities is important because it is the only way for social scientists to know that there may be an underlying connection between a putative cause and an outcome. However, whether an empirical regularity reflects a causal relationship requires substantive interpretation; that is, thinking in terms of people doing stuff: how and why the presence of a cause makes it possible for people to do the kind of stuff that achieves the outcome.
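Here is a minimal sketch of that alpha-versus-meaning trade-off, with made-up survey scores; the respondents and numbers are hypothetical, chosen only so that dropping the third item raises alpha:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores of five respondents on three happiness indicators.
relationship = [1, 2, 3, 4, 5]  # 'being in a stable relationship'
friends      = [1, 2, 3, 4, 5]  # 'having many friends'
well_off     = [3, 1, 4, 2, 5]  # 'being materially well-off'

three_items = np.column_stack([relationship, friends, well_off])
two_items = np.column_stack([relationship, friends])

print(f"alpha, three items: {cronbach_alpha(three_items):.2f}")  # ~0.86
print(f"alpha, two items:   {cronbach_alpha(two_items):.2f}")    # 1.00
```

Dropping ‘being materially well-off’ makes the scale look more reliable, but what it now measures is a narrower, purely relational kind of happiness.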

Of course, variable-based researchers know that correlation is not causation. The problem is how to distinguish between empirical regularities that are causally interpretable and those that are not. Using something like the P-value is not a good way to distinguish between genuine and spurious correlations because the P-value, too, is a correlational metric. This makes using the P-value, or any other correlational metric, to decide which correlations are causally interpretable a bit like saying: I believe this colour is green because I believe it is green. Fundamentally, empirical analysis describes patterns in the data. However, one description is not better than another because it is empirically more robust, because it has a lower P-value. There is no meaningful difference between a P-value of 0.1, 0.05 and 0.01. From a probabilistic perspective, they all reflect valid descriptions of the data. What makes a good, i.e. causally interpretable, description is not the robustness of the empirical regularity but whether it makes sense from the perspective of people doing stuff. The small simulation below makes the point.
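As an illustration, entirely synthetic and hypothetical: two variables that never influence each other, but share a hidden common cause, produce a correlation with a vanishingly small P-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000
z = rng.normal(size=n)            # hidden common cause
x = 0.8 * z + rng.normal(size=n)  # x depends on z only
y = 0.8 * z + rng.normal(size=n)  # y depends on z only; x and y never interact

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.1e}")  # a clearly 'significant' correlation
```

No correlational metric computed on x and y can reveal that x does nothing to y; only substantive interpretation, asking what the people behind the variables are actually doing, can.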
