
Second Workshop Perspectives on Scientific Error
19-21 September 2018, Groningen, the Netherlands


‘The data’ and the replication crisis in psychology.
A qualitative case analysis of validity of data collection in psychotherapy research


Short Abstract
The replication crisis in psychological science was exposed by re-analysis of ‘the data’. Explanations often focus on method, ranging from selective publication to erroneous or even fraudulent research conduct. We explore a more basic empirical explanation: we scrutinize the validity of ‘the data’, as data has to be valid for every subsequent step of evidence generation to be valid. We use a qualitative case study to examine how patients’ stories are translated into quantitative data in psychotherapy research. We discuss the epistemic consequences of the validity issues we found and argue for the need for a concept of ‘validity of data collection’.

Extended Abstract
The replication crisis in psychological science came to the fore through re-analysis of ‘the data’. Explanations for the lack of replicability often focus on methodological design and analysis,[1] ranging from biased interpretations and selective publication to questionable or even fraudulent research conduct. This focus carries the implicit assumption that if the analysis of ‘the data’ were conducted properly, the resulting evidence would be valid in principle.[2] However, this assumption leaves ‘the data’ itself untouched, while in fact the validity of the data is key to the validity of all subsequent steps of analysis: if the data is invalid, any analysis, and the evidence it yields, will be invalid in principle. Therefore, in this paper, we scrutinize the validity of ‘the data’ to explore a more basic empirical explanation for the replication crisis in psychological research in general and psychotherapy research in particular. We base our argumentation on our analysis of a case in which we scrutinized how participants’ stories are translated into the quantitative data that travels through all the subsequent steps of evidence generation.[3]

In psychotherapy research, the collected data is predominantly quantitative, as the default means of data collection are validated self-report questionnaires scored on numerical scales.[4] Although evidence for treatment efficacy could be gained using alternative assessment methods, statistical analyses are key to the ‘top methods’ in the hierarchy of evidence,[5] and therefore quantitative data collection remains the default.[6] Given that probability statistics are fundamental to the definition of causation entertained in the widely used randomized controlled research design,[7] and that the evaluation of evidence in systematic reviews and meta-analyses is predominantly pursued statistically, the choice of numerical data at the basis of psychotherapy research seems vital for all its proceedings up to the level of the Evidence-Based Treatment (EBT).

In this paper, we zoom in on that basic numerical level of research. We use the findings of a case study to provide an empirical illustration of validity issues in ‘the data’ and its collection, and we argue for the need for a broader concept of validity that properly captures the validity of the data collection process. Specifically, we discuss the case of a patient who participated in our psychotherapy study[8] and thus completed a battery of self-report questionnaires that are widely used for outcome measurement. June,[9] the patient, annotated her paper-and-pencil questionnaires in a very noticeable way, thus providing a particularly lucid insight into the story behind the numbers. We combined a qualitative analysis of her narrated experience with a visual analysis of the annotated questionnaires. In this way, we aimed not only to analyse June’s experience of telling her story via questionnaires, but also to go beyond the surface and scrutinize how this experience was embedded in the complaints that brought June to therapy in the first place.

For June, the questionnaire administration was by no means self-evident. Her experience echoes critiques in the literature on a number of levels. First, June expressed a troubled interpretation of the questionnaires themselves. She felt conflicted about the meaning of items, regarding the terms and definitions used in the items, the generality of the instructions, and the scoring options on the response scales. Both her narrated experiences and the multitude of written and drawn re-interpretations, annotations, and re-scalings on the paper-and-pencil questionnaires give a vivid illustration of the cognitive challenges described by Schwarz.[10] These may at the very least cause differences in interpretation between respondents – thus harming between-subjects comparability – and at worst cause an entirely invalid measurement of the intended construct. Second, the communicative problems of scales that Schwarz (ibid.) put forward as a threat to questionnaire validity also played a substantial role in June’s experience, as she appeared to be very aware of the ‘audience’ of the questionnaires. This awareness showed in her persistent alteration and annotation of the provided scales, to guarantee that the administrator would understand the nuance of the story each time. Although this is a straightforward attempt by June to make her responses more valid, in practice it forms a significant threat to the validity of research, as it forces a nearly impossible decision onto the researcher: to choose between losing meaningful and valuable information or losing comparability and generalizability. Third, June appeared to be highly aware of the evaluation by the other who would read her questionnaires, which made her fear the verdict that might come out of the scores. Her awareness of the intentions of the administrator also set in motion a process in which June struggled with her own aversion to giving socially desirable responses, which she would find dishonest.

At this point, June’s experiences go beyond the communicative and cognitive problems that Norbert Schwarz described in his renowned psychometric critique of self-report questionnaires. Schwarz’s description concerns the problems in gaining a valid report, that is, in extracting sound information from the respondent that could give the administrator an accurate image of the inquired state of the respondent. However, June’s case makes clear that questionnaire administration in psychotherapy research is not simply an extraction of information from ‘the object of interest’ but involves an active process of meaning-making by the subject who is responding.[11] Asking questions – in particular questionnaires or in a research procedure at large – can produce an effect of its own, especially in psychological or psychotherapeutic research.[12] In June’s case it became clear that the very act of asking the question interfered with her answers. In that way, questionnaire administration becomes performative. Such an effect is particularly salient in psychotherapy research: if the act of symptom assessment is able to change the symptomatology under study, the measure may no longer be able to measure the ‘intended’ variable, as that variable was changed. Consequently, the data collected by these questionnaires may become invalid because the administration itself became performative towards the variable under study.

In the current paper, we discuss the epistemic consequences of the validity issues that we found in June’s collected data, and we argue for the need for a concept of validity that can properly capture the validity of data within the epistemic endeavour. By discussing the performativity of data collection, we reflect on the feasibility of generalizing data that is loaded with more, and more diffuse, meaning than its numerical language would suggest. Further, we discuss the consequences of our findings for the way that this data travels through the steps of analysis up to the level of treatment efficacy.[13] In this way, we elaborate how validity issues in ‘the data’ may yield a particular empirical explanation for the problem of replicability in psychotherapy research in particular and in psychological research at large.



[1] Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.

[2] This argumentation resonates with what Douglas (2009) calls the ideal of ‘procedural objectivity’. In Truijens (2017), this ideal is shown to be problematic when applied in psychotherapy research. Cf. Douglas, H. E. (2009). Science, policy, and the value-free ideal. Pittsburgh: University of Pittsburgh Press; and Truijens, F. L. (2017). Do the numbers speak for themselves? A critical analysis of procedural objectivity in psychotherapeutic efficacy research. Synthese, 194, 4721-4740.

[3] Truijens, F. L., Desmet, M., De Coster, E., Uyttenhove, H., & Deeren, B. (forthcoming). When quantitative measures become a qualitative storybook. A qualitative and visual analysis of the validity and performativity of questionnaire administration for a case in psychotherapy research. Manuscript under review.

[4] Wampold, B. E., & Imel, Z. E. (2015). The great psychotherapy debate. The evidence for what makes psychotherapy work (2nd ed.). New York: Routledge.

[5] Cf. Cartwright, N. (2007). Are RCTs the gold standard? BioSocieties, 2, 11-20.

[6] Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7-18; cf. Cartwright, N. (2007). Are RCTs the gold standard? BioSocieties, 2, 11-20.

[7] Hyde, P. (2004). Fool’s gold: Examining the use of gold standards in the production of research evidence. British Journal of Occupational Therapy, 67, 89-94.

[8] ‘SCS’, cf. Cornelis, S., Desmet, M., Meganck, R., Cauwe, J., Inslegers, R., Willemsen, J., Van Nieuwenhove, K., et al. (2017). Interactions between obsessional symptoms and interpersonal dynamics: An empirical case study. Frontiers in Psychology, 8, 960.

[9] ‘June’ is a pseudonym. All identifiable information has been changed.

[10] Schwarz, N. (1999). Self-reports. How the questions shape the answers. American Psychologist, 54, 93-105; Schwarz, N. (2007). Cognitive aspects of survey methodology. Applied Cognitive Psychology, 21, 227-287.

[11] McCambridge, J. (2015). From question-behaviour effects in trials to the social psychology of research participation. Psychology and Health, 30, 72-84.

[12] McClimans, L. (2010). A theoretical framework for patient-reported outcome measures. Theoretical Medicine and Bioethics, 31, 225-240; McClimans, L. (2011). The art of asking questions. International Journal of Philosophical Studies, 19, 521-538.

[13] Truijens, F. L., Cornelis, S., & Desmet, M. (2018). Validity beyond measurement. On epistemic validity of test validity in psychotherapy research. Manuscript under review; Truijens, F. L. (2017). Do the numbers speak for themselves? A critical analysis of procedural objectivity in psychotherapeutic efficacy research. Synthese, 194, 4721-4740.

