This essay will explore a case study that I will argue exemplifies an area of scientific practice being influenced by contextual values. By contextual values I mean, for example, the various biases, interests, and ideological and political values, and so forth, that individual scientists harbor. Specifically, I will be looking at the research program in social psychology that has aimed to develop and empirically test the psychometric instrument known as the ‘Implicit Association Test’ (henceforth referred to as IAT), which purports to uncover the unconscious racial biases of individuals (in addition to other unconscious biases). The primary argument I will try to make is that, for a time, contextual values allowed the IAT to gain acceptance within social psychology despite, I contend, the instrument itself being evidentially impoverished and turning out to be fatally flawed. In addition to affecting the credibility process of scientific practice in relation to the IAT, I’ll also argue that contextual values have affected the discovery process, too, via at least one of the IAT’s primary advocates. From an ethical point of view, the intrusion of contextual values into this area of research can be seen as all the more troubling given the relatively widespread use of the IAT not only by a vast number of individuals, but also by a large number of organizations in the wider society—and all, seemingly, without any protestations by its chief advocates. To put the matter bluntly, it can be seen as a form of ethical malpractice to state or otherwise imply that a psychological instrument does X—in this case, diagnose whether and to what extent an individual harbors implicit racial bias—when, in fact, it cannot do so. Before attempting to make the case for the intrusion of contextual values into this area of research, however, I’ll very briefly survey the IAT and its critical flaws.
2. Assessing the IAT
The IAT’s chief architects are the psychologists Mahzarin Banaji and Anthony Greenwald, who unveiled it in 1998 as a scientific instrument that allegedly could assess the degree to which an individual harbored any of a number of implicit biases (Greenwald, Nosek & Banaji, 2003). Most researchers who have used the IAT to study implicit bias have focused on racial implicit bias in particular (usually pertaining to blacks and whites). Very generally speaking, the IAT attempts to generate an indicator of one’s supposed implicit bias by assessing the degree to which a test-taker is quicker to associate a positive term, such as ‘good’ or ‘nice’, with pictures of white faces than with pictures of black faces, and also the degree to which a test-taker is quicker to associate a negative term, such as ‘bad’ or ‘scary’, with pictures of black faces than with pictures of white faces. (It is worth noting that in the experimental setup of the IAT, faces and words are displayed simultaneously.) The IAT also makes for an interesting case study because it is widely recognized among the general public, with over 17 million unique test sessions having been taken at the test’s official website (Project Implicit, 2017). As alluded to, the IAT is designed to measure implicit bias with regard to a number of different targets (e.g., elderly people, racial minorities, overweight individuals), but most of the research focus and public attention has been directed to the racial variant of the test, and, unless stated otherwise, my discussion will be exclusively about the racial version.
To understand why the IAT is manifestly not a good measure of what it purports to measure—namely implicit bias—we need to examine how it fares in relation to two key concepts: reliability and validity (Kalat, 2014, ch. 9). Psychometrically, the concept of reliability, broadly and roughly speaking, refers to the degree to which a given measurement instrument yields similar measurements when administered on separate occasions—for instance, when the test is administered to the same individual, say, two hours apart, four weeks apart, one year apart, and so on. The concept of validity, on the other hand—and again, broadly and roughly speaking—refers to how well the test in question can account for or predict what it alleges to account for or predict. So, if a large number of people are administered a given test instrument on many different occasions, the test is said to be reliable to the extent that individuals’ results are similar on all of the occasions tested, and unreliable to the extent that individuals’ results vary between the occasions tested. And a given test instrument is held to be valid to the extent that it can account for or predict the phenomenon it purports to account for or predict.
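To make the notion of test-retest reliability concrete, the following is a minimal sketch in Python, assuming that reliability is estimated as the Pearson correlation between scores from two administrations of the same test. The function name and the scores are hypothetical and invented purely for illustration; they are not IAT data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary within each session")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five test-takers on two occasions (illustration only).
first_session = [0.10, 0.45, 0.30, 0.80, 0.55]
second_session = [0.50, 0.20, 0.35, 0.60, 0.15]

test_retest_reliability = pearson_r(first_session, second_session)
print(round(test_retest_reliability, 2))
```

On this reading, a value near 1 would mean that individuals keep roughly the same relative standing across occasions, while a value near 0 would mean that a person’s score on one occasion says little about their score on the next.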
We can now ask how the IAT fares in relation to these two critical features. As it turns out, the IAT performs abysmally. Although there is a rather involved and technical back-and-forth between its proponents and detractors in the peer-reviewed literature, we can for our purposes zoom in on some of the key recent overviews and meta-analytical studies of both the reliability and validity of the IAT. For instance, in a review of the literature, two of the authors of which were Banaji and Greenwald, the IAT was reported as having a reliability ranging from .32 to .65 (Lane, Banaji, Nosek & Greenwald, 2007). In particular, the reliability was .32 when comparing tests performed two weeks apart (which included four tests in total), .65 when comparing tests performed twenty-four hours apart (which included two tests), and .39 when comparing tests performed within a single testing session (which included two tests). Bar-Anan and Nosek (2013) also reported an IAT reliability of .40. Now, with these results in view, it is important to ask how to make sense of the values reported and, in particular, what they say about the reliability of the IAT itself. Broadly speaking, a test instrument in psychological science is generally regarded as acceptable if it reaches a reliability of approximately .80 (although this value can vary). Clearly, none of the best and most comprehensive estimates for the IAT reaches that threshold. Indeed, there is wide divergence between the highest and lowest values reported, and, rather anomalously, reliability is considerably higher between tests taken twenty-four hours apart (.65) than between tests taken within a single session (.39).
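The comparison between the reported estimates and the rough .80 benchmark can be laid out in a short Python sketch. The reliability figures are the ones cited above; the dictionary labels, the constant name, and the `shortfall` helper are my own illustrative choices, and the benchmark itself is only the approximate convention mentioned in the text.

```python
# Rough conventional benchmark mentioned in the text (it can vary by context).
ACCEPTABLE_RELIABILITY = 0.80

# Reliability estimates as cited in the text.
reported = {
    "two weeks apart (Lane et al., 2007)": 0.32,
    "twenty-four hours apart (Lane et al., 2007)": 0.65,
    "within a single session (Lane et al., 2007)": 0.39,
    "Bar-Anan & Nosek (2013)": 0.40,
}


def shortfall(reliability, threshold=ACCEPTABLE_RELIABILITY):
    """How far an estimate falls below the benchmark (0.0 if it meets it)."""
    return round(max(0.0, threshold - reliability), 2)


for label, value in reported.items():
    print(f"{label}: reliability {value:.2f}, shortfall {shortfall(value):.2f}")
```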
With regard to validity, Greenwald, Poehlman, Uhlmann, and Banaji (2009) reported that IAT scores accounted for approximately 5.5 percent of the variation in the discriminatory behavior measured in the lab. A separate meta-analysis disputed this figure, arguing it was an overestimate predicated on a methodological error and should instead be adjusted downward (Oswald, Mitchell, Blanton, Jaccard & Tetlock, 2013). In any case, such a low figure is especially noteworthy for at least three reasons. Firstly, it pales in comparison to how powerful its proponents have alleged the IAT to be as a predictor of discriminatory behavior on the whole. Secondly, even finding correlations between IAT scores on the one hand, and discriminatory behavior within a lab setting on the other, leaves it an open question whether such correlations can account for any discriminatory behavior in real-world contexts. And thirdly, as a matter of rudimentary statistical reasoning, it remains the case, even within those lab settings, that correlations do not necessarily entail causal processes (as the correlations could potentially be accounted for by other variables once they are identified and controlled for). Just as noteworthy, however, even the main advocates of the IAT have recently conceded that it is not reliable as a measure of an individual’s implicit bias, and hence shouldn’t be used as such (Greenwald, Banaji & Nosek, 2015).
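If the 5.5 percent figure is read as a share of variance explained (an assumption on my part about how the figure is defined), it can be converted into the magnitude of a correlation coefficient by taking its square root. The following Python sketch shows the arithmetic; the function name is hypothetical.

```python
import math


def correlation_from_variance_explained(share):
    """Magnitude of r implied by a share of variance explained (r squared)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share of variance must lie between 0 and 1")
    return math.sqrt(share)


# 5.5 percent of variance, as cited in the text.
r = correlation_from_variance_explained(0.055)
print(round(r, 3))  # prints 0.235
```

On this reading, the figure corresponds to a correlation of about .23, which leaves roughly 94.5 percent of the variation in the measured behavior unaccounted for by IAT scores.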
3. A Case for Political Bias
This last concession turns out to be of importance. For, despite the terrible psychometric and empirical standing of the IAT, its two chief proponents, Greenwald and Banaji, still appear to persist in advocating for its utility. As an elementary matter of scientific ethics, it is troubling that the IAT, despite its inability to indicate anything about an individual’s proclivity to behave in a racially discriminatory manner, is nonetheless still available for public use at the Project Implicit website. It stands to reason, then, that test-takers might be led to believe that the test results do indicate something about their unconscious capacity to act in biased, discriminatory ways toward black people, for example (including, even, black people interacting with other black people). One way of attempting to assess whether the choice by the researchers to leave the IAT accessible to the public is a choice contaminated by contextual values is to evaluate it in light of a thought-experiment proposed by Tetlock (1994) (who incidentally is one of the IAT’s most active critics), namely the ‘turnabout test’. For our purposes, we can use the turnabout test to ask the following question: Would it be acceptable to the community of psychologists if a psychometric instrument that was widely viewed as concordant with and advancing a politically conservative agenda, after having been shown to be as badly supported as the IAT, were nonetheless still accessible to the public, widely used by various organizations, and still advocated for by its originators? I will leave it to the reader to attempt to honestly answer this query. However, to help put the thought-experiment in context, one should be made aware of the ideological breakdown of current psychologists.
As it turns out, psychologists overwhelmingly report being left-of-center politically (i.e., liberal, progressive, socialist), and there are very few who report as being, say, libertarian, and vanishingly few reporting as politically conservative (although there is some reason to think that the number of libertarians and conservatives is an underestimate, given pressures to not ‘out’ oneself as such) (e.g., Inbar & Lammers, 2012; Duarte et al., 2014). One might plausibly argue that if the field were (counterfactually) dominated by libertarians and conservatives, there might be a tendency to be less worried about empirically unsupported test instruments being administered and widely applied in the way that the IAT apparently is, where the test instruments in question were perceived as being friendly to a libertarian or conservative agenda. Assuming this kind of symmetry in the way in which political values might govern a tendency to go lightly on test instruments (and theories, etc.) that broadly support or are consistent with the political views of the regnant majority of a field, the thought-experiment may also serve as an argument in favor of incorporating more political and ideological diversity in the field. Of direct relevance here are arguments advanced by feminist philosophers such as Longino (1990), Harding (1991), and others to the effect that the processes of both scientific discovery and credibility are enhanced substantively, in general, to the extent that a plurality of perspectives is reflected among scientific practitioners. Indeed, Duarte et al. (2014) provide cogent illustrations of this perspective in a recent paper that argues in its favor within social psychological science.
Specifically, given the current domination of psychology by those with political stances that are left-of-center, they argue that the discovery and credibility dimensions of scientific practice would benefit greatly if more viewpoint (i.e., political) diversity were incorporated into the discipline.
As applied to the current case study of the IAT, one could plausibly argue that increased political diversity in psychology might very well have caught the troubles with the IAT at a much earlier stage of the credibility process. That is to say, had there been, say, more psychologists with libertarian and conservative political views, the IAT might have been scrutinized more rigorously and critically by journal editors, peer-reviewers, Banaji and Greenwald’s colleagues and students, and so forth. Because it is widely construed as being friendly to and advancing a progressive political agenda, the IAT appears to have been evaluated less stringently during the credibility process by journal editors, peer-reviewers, and others. In any case, we might surmise that the internal error-detecting and error-correcting mechanisms of the scientific credibility process were delayed in being fully brought to bear against the IAT, by dint of the relatively few non-liberals and non-progressives in psychology—although it should be pointed out that seemingly none of the IAT’s critics are self-identified conservatives.
Apart from the credibility process, however, it is evident that the major advocates of the IAT, particularly Banaji and Greenwald, were quite vocal champions of it long before adequate evidence of its reliability or validity had been demonstrated (e.g., Singal, 2017). Now, it is important to be fair and charitable to both Banaji and Greenwald and thus not impugn their motives without good evidence that they were motivated to champion the IAT at least partly for political reasons. Although I will not draw a firm conclusion on whether either or both of them are motivated to champion the IAT for political reasons, I will at least offer some reasons to think that this may be the case. In order to examine this possibility, I will deploy the argument structure used by the philosopher of science Sesardic (2005), who, in a nutshell, attempts to ground claims of politically-motivated scientific errors by simply looking to statements made by the scientists found guilty of such errors. A clear virtue of this approach is that it sidesteps the tricky and morally fraught issue of baselessly accusing people of politically-motivated bias (baseless in the sense of making accusations without evidence). In trying to ascertain whether a scientist has committed an error due to political bias, Sesardic (2005) suggests the strategy of trying to verify the following:
(1) that scientist X who is accused of a mistake did actually commit the mistake, (2) that the mistake is a serious blunder, rather than one of those bona fide errors that are expected to happen sporadically in the course of normal scientific work, (3) that X also had a particular political attitude, and (4) that the mistake was really due to the influence of that political attitude. (p.186)
So far as trying to verify steps (3) and (4), one suggestion offered by Sesardic is to look for clear statements made by a given scientist that betray one (or more) of their political beliefs, and then additionally show that their error was influenced by that belief. Most plainly, this can be revealed by statements indicating that the scientist’s promulgation of an erroneous view is connected in some way to that political belief.
Since we have seen that the IAT is unfounded as an indicator of an individual’s unconscious bias, we can look to a statement made by Banaji after it had become clear that the IAT was unreliable and not a valid predictor of bias, and after she had conceded as much, in order to test whether her scientific position is in any way influenced by her political beliefs. To begin with, we can note that in an interview published in October 2014, she declared that “I was raised to be a progressive” (Philip, 2014). This, I presume, is a plausible yet fallible indicator of her current political orientation as well (but of course, technically speaking, this may be an erroneous inference). In another recent statement made to a journalist, published in January 2017, and in the context of replying to some of her academic critics, she states:
There’s too much interesting stuff to do and too many amazing people doing it for me to justify worrying about a small group of aggrieved individuals who think that Black people have it easy in American society and that the IAT work might make their lives easier. (Singal, 2017)
This clearly indicates that Banaji believes that the IAT might make the lives of black Americans easier, a sentiment that connects with her progressive political orientation. So, given everything up to this point, we can plausibly but fallibly (!) infer the following: that Banaji is someone who is politically progressive; is of the mind that the IAT might make the lives of black Americans easier; publicly championed (along with her colleague Greenwald) the IAT long before adequate evidence of its reliability and validity was in hand; and, even after the IAT has been shown to rest on unsupportable foundations, and even after having personally conceded as much, still advocates for its social utility. From this, we can attempt to infer, again fallibly, that she may have been led to commit a scientific error in a way that was connected to her progressive political beliefs. I should add that fielding such a case against Banaji should in no way be taken as necessarily impugning her political beliefs, or as impugning the idea that discrimination on the basis of arbitrary characteristics is immoral.
To sum up, this essay has tried to make the case that contextual values have intruded into the scientific discovery and credibility processes in relation to the IAT. With regard to the credibility process, it appears as if the overwhelmingly left-of-center political views of social psychologists have allowed the IAT to be scrutinized less intensely. Quite plausibly this is because the IAT is generally viewed as supporting and advancing a politically progressive agenda. Finally, with regard to the discovery process, a reasonable but fallible case can be made that the IAT was developed, championed, and is still supported by one of its originators, Mahzarin Banaji, at least partly because of her political beliefs, and despite adequate evidence of its reliability and validity never having been produced.
 See Grinnell (2009) for an exposition of the discovery and credibility processes of scientific practice.
 My analysis of the IAT draws from Singal (2017).
 In his case, Sesardic (2005) asks how we might detect political motives among scientists, philosophers of science, and others when it comes to debates in behavior genetics, including the one between ‘hereditarians’ and ‘environmentalists’ with regard to racial differences in general cognitive ability.
Bar-Anan, Y., & Nosek, B. (2013). A comparative investigation of seven indirect attitude measures. Behavior Research Methods, 46(3), 668-688. http://dx.doi.org/10.3758/s13428-013-0410-6
Duarte, J., Crawford, J., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. (2014). Political Diversity Will Improve Social Psychological Science. Behavioral And Brain Sciences, 1-54. http://dx.doi.org/10.1017/s0140525x14000430
Greenwald, A., Banaji, M., & Nosek, B. (2015). Statistically small effects of the Implicit Association Test can have societally large effects. Journal Of Personality And Social Psychology, 108(4), 553-561. http://dx.doi.org/10.1037/pspa0000016
Greenwald, A., Nosek, B., & Banaji, M. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal Of Personality And Social Psychology, 85(2), 197-216. http://dx.doi.org/10.1037/0022-3514.85.2.197
Greenwald, A., Poehlman, T., Uhlmann, E., & Banaji, M. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal Of Personality And Social Psychology, 97(1), 17-41. http://dx.doi.org/10.1037/a0015575
Grinnell, F. (2009). Everyday practice of science. Oxford: Oxford University Press.
Harding, S. (1991). Whose science? Whose knowledge?. Ithaca, NY: Cornell Univ. Press.
Inbar, Y., & Lammers, J. (2012). Political Diversity in Social and Personality Psychology. Perspectives On Psychological Science, 7(5), 496-503. http://dx.doi.org/10.1177/1745691612448792
Kalat, J. (2014). Introduction to psychology (9th ed.). Belmont, CA: Wadsworth Cengage Learning.
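Lane, K. A., Banaji, M. R., Nosek, B. A., & Greenwald, A. G. (2007). Understanding and using the Implicit Association Test: IV. What we know (so far) about the method. In B. Wittenbrink & N. Schwarz (Eds.), Implicit measures of attitudes (pp. 59-102). New York: Guilford Press.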
Longino, H. (1990). Science as social knowledge. Princeton, N.J.: Princeton University Press.
Oswald, F., Mitchell, G., Blanton, H., Jaccard, J., & Tetlock, P. (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal Of Personality And Social Psychology, 105(2), 171-192. http://dx.doi.org/10.1037/a0032734
Philip, J. (2014). Mahzarin Banaji – Zooming in on blind spots. Live Mint. Retrieved 12 August 2017, from http://www.livemint.com/Companies/FLdG6A8ft9MJhASrJvwNpO/Mahzarin-Banaji–Zooming-in-on-blind-spots.html
Project Implicit. (2017). Retrieved 12 August 2017, from https://implicit.harvard.edu/implicit/
Sesardic, N. (2005). Making sense of heritability. Cambridge: Cambridge University Press.
Singal, J. (2017). Psychology’s Favorite Tool for Measuring Racism Isn’t Up to the Job. Science of Us. Retrieved 12 August 2017, from http://nymag.com/scienceofus/2017/01/psychologys-racism-measuring-tool-isnt-up-to-the-job.html
Tetlock, P. (1994). Political Psychology or Politicized Psychology: Is the Road to Scientific Hell Paved with Good Moral Intentions?. Political Psychology, 15(3), 509. http://dx.doi.org/10.2307/3791569