The evaluation of psychological treatment


Paul Crits-Christoph

Mary Beth Connolly Gibbons



Introduction

Psychotherapy continues to be a widely practised treatment for psychiatric disorders and other problems in living. Since publication in 1952 of the well-known article by Hans Eysenck,(1) in which he claimed that there was no evidence that psychotherapy was effective, there has been an accelerating literature concerned with methodologies for evaluating psychotherapy, as well as specific studies demonstrating the efficacy, or lack thereof, of various psychotherapies. In more recent years, pressures from the government agencies and insurance companies that bear much of the cost of mental health treatments have added to the call for accountability regarding psychotherapeutic treatment.

Despite a vast literature of over 1000 outcome studies of the effects of psychotherapy, questions remain about the role of psychotherapy as a treatment for mental disorders. Extensive meta-analytical reviews of the psychotherapy outcome literature have provided evidence that, generally speaking, psychotherapy appears to be efficacious.(2) While encouraging, this global conclusion is of limited clinical use. As with any medical problem or disorder, the relevant public health clinical question is whether a treatment is beneficial for the presenting problem or psychiatric disorder for which help is sought. Along these lines, a number of efforts have been made at summarizing the results of the psychotherapy outcome literature in terms of what works for different disorders or problems.(3, 4) For example, these efforts have arrived at conclusions such as ‘cognitive therapy is efficacious in the treatment of major depressive disorder’.
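Meta-analytical reviews of this kind typically aggregate a standardized mean difference (Cohen's d) across studies. As a minimal sketch of how that statistic is computed (the scores below are invented for illustration, not drawn from any cited study):

```python
import math

def cohens_d(treated, control):
    """Standardized mean difference between two groups,
    using the pooled standard deviation as the scaling unit."""
    n1, n2 = len(treated), len(control)
    m1 = sum(treated) / n1
    m2 = sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treated) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical improvement scores (higher = more improvement)
d = cohens_d([12, 14, 10, 13, 11], [10, 12, 9, 11, 8])   # ≈ 1.26
```

A review then averages such d values over studies (usually weighting by sample size) to reach a conclusion like 'psychotherapy appears to be efficacious'.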

The simplicity and clinical appeal of such conclusions, about which psychotherapy treatments work for which patient problems, belies a host of more complex issues regarding how one evaluates psychotherapy and decides whether a treatment ‘works’ or not. Other treatments within psychiatry, such as pharmacotherapy, lend themselves to rather straightforward designs (namely placebo-controlled randomized clinical trials) that permit clear inferences about the efficacy of a treatment approach. In contrast, research on psychotherapy as a verbal interchange between two or more participants does not have the luxury of such straightforward designs. Instead, psychotherapy outcome research is characterized by a variety of research designs and methods that, while often limited in the strength of the scientific inferences they permit about treatment efficacy, can provide incremental scientific advances in the understanding of the usefulness of psychotherapeutic treatments. The aim of the current chapter is to provide an overview of approaches to the evaluation of psychological treatments. We begin with a discussion of specific research designs employed in psychotherapy outcome research, together with some of the broad issues that currently guide the selection among these designs. This is followed by a selective review of assessment strategies for outcome evaluation, with examples of instruments.


Issues in planning research evaluating psychotherapy

A number of other sources provide a detailed discussion of issues involved in planning a study on psychotherapy, as well as explication of various research designs. In particular, our presentation draws heavily from Kazdin,(5) supplemented with writings that illustrate more recent trends in both design and methodology. There are, of course, a wide range of decisions to be made in designing an evaluation study of psychotherapy. These decisions affect the choice of patients, therapists, control groups, data analytical strategies, etc. Table 6.1.2.1 presents a list of the types of questions that need to be asked in designing or evaluating a study of psychotherapy outcome. A discussion of some of the key methodological issues that cut across many of the questions raised in Table 6.1.2.1 follows.


Internal versus external validity

An initial important decision in planning an evaluation of psychotherapy outcome, or any intervention, is the relative emphasis on internal versus external validity of the inferences from the investigation. Internal validity refers to the extent to which inferences can be attributed to the intervention per se, as opposed to other factors. In order to maximize internal validity, the investigator attempts to control as many of the extraneous factors as possible through a variety of procedures including, among others, random assignment, the use of control groups, assessing subjects in the same ways and at the same point in time, and careful selection of a relatively homogeneous subject sample. With as many factors as possible held constant across treatment groups except for the nature of the intervention, an outcome difference detected between the experimental and control group(s) can be attributed to the intervention, rather than other factors.

External validity, in contrast, refers to the extent to which the results of a study can be generalized to subjects, settings, treatment durations, and treatment providers other than those used within the specific study. In regard to the evaluation of psychotherapy outcomes, external validity is often invoked to raise the question of whether study results pertain to the ‘real world’ in which psychotherapy is practised—the diverse set of patients, therapists, and settings occurring in the community that may be quite different from the conditions of an investigation conducted in a research setting.

Clearly, both internal and external validity are important, but it is difficult to maximize each within the context of the same study. Studies of homogeneous patient samples, for example, may have high internal validity, but generalize poorly to the mix of heterogeneous patients seen in clinical practice. The relative merits of studies with high internal versus external validity have been a source of ongoing debate among psychotherapy outcome researchers.

Different research designs are more or less appropriate depending upon the scientific question of interest. For example, the process of developing and testing new treatments generally proceeds stepwise, beginning with individual case reports and then progressing to an ‘open-label’ trial (a term derived from pharmacotherapy research) involving the application of a single treatment to a relatively small group of patients. Following an open-label trial, a promising treatment would then be tested within the context of a controlled efficacy trial, in which the treatment is tested under ideal circumstances (for instance, by highly trained clinicians). If an effect is found in the controlled efficacy trial, a controlled effectiveness trial is the next step, in which the treatment is tested under more ‘real world’ conditions. This line of research is oriented towards understanding whether the treatment per se is responsible for change (efficacy trial) and whether the effect generalizes (effectiveness trial).

Naturalistic studies represent an alternative type of effectiveness trial in which the scientific question is usually not focused on the type of treatment. Instead, such studies might examine the relationship of patient characteristics, therapist factors, or length (dose) of treatment to outcome.


Selection criteria for psychotherapy outcome studies

The choice of selection criteria for a psychotherapy outcome study depends, of course, on the nature of the research question to be asked. From a public health perspective, samples are usually chosen based upon the presence of a discrete disorder or problem that has significance to society. The selection of the target disorder, however, is only the beginning of the selection process. For studies of DSM Axis I non-psychotic disorders, it is typical that major psychotic disorders such as schizophrenia and bipolar disorder are excluded from the study. However, there is wide variability across research studies in the extent to which other Axis I and Axis II disorders are included.

This aspect of selection criteria relates primarily to the internal versus external validity distinction discussed above. Studies that emphasize internal validity will probably exclude many comorbid diagnoses, while studies that maximize external validity will tend to be more inclusive. As the comorbidities among Axis I diagnoses can be high, the impact on the nature of the patient sample selected can be considerable.

Naturalistic studies that focus on psychotherapy per se, rather than public health concerns, are oriented towards external validity and typically do not have restrictive selection criteria. For these studies, the question is ‘how effective is psychotherapy for the types of patients that end up in psychotherapeutic treatment in the community?’ Thus, few, if any, selection criteria are specified.

One particular selection problem that affects any type of psychotherapy outcome study is whether or not patients currently treated with psychotropic medication are included in the evaluation study. Once again, from the point of view of internal validity—attempting to attribute the treatment outcome to the psychotherapy treatment per se—patients on concurrent medication treatment are usually excluded. In contrast, external validity concerns would lead to the inclusion of patients on medications, since increasing numbers of patients in the community with anxiety and affective disorders are receiving psychotropic medications for their problems. Often, a compromise is struck: patients on medications are eligible for the psychotherapy study as long as they (and their prescribing doctor) agree to maintain a stable dosage of the medication for the duration of the psychotherapy study.








Table 6.1.2.1 Selected questions to raise in planning a study of psychotherapy

Sample characteristics
1 Who are the subjects and how many of them are there in this study?
2 Why was this sample selected in light of the research goals?
3 How was this sample obtained, recruited, and selected?
4 What are the subject and demographic characteristics of the sample (e.g. sex, age, ethnicity, race, socio-economic status)?
5 What, if any, inclusion and exclusion criteria were invoked (i.e. selection rules to obtain participants)?
6 How many of those subjects eligible or recruited actually were selected and participated in the study?
7 With regard to clinical dysfunction or subject and demographic characteristics, is this a relatively homogeneous or heterogeneous sample?

Design
1 How were subjects assigned to groups or conditions?
2 How many groups were included in the design?
3 How are the groups similar and different in how they are treated in the study?
4 Why are these groups critical for addressing the questions of interest?

Procedures
1 Where was the study conducted (setting)?
2 What measures, materials, equipment, and/or apparatus were used in the study?
3 What is the chronological sequence of events to which subjects were exposed?
4 What intervals elapsed between different aspects of the study (assessment, treatment, follow-up)?
5 What variation in administration of conditions emerged over the course of the study that may introduce variation within and between conditions?
6 What procedural checks were completed to avert potential sources of bias in implementing the manipulation and assessment of dependent measures?
7 What checks were made to ensure that the conditions were carried out as intended?
8 What other information does the reader need to know to understand how subjects were treated and what conditions were provided?

Therapists
1 Who are the therapists, and why are these individuals selected?
2 Can the influence of the therapist be evaluated in the design as a ‘factor’ (as in a factorial design) or can therapist effects be evaluated within a condition?
3 Are the therapists adequately trained? By what criteria?
4 Can the quantity and quality of their training and implementation of treatment be measured?

Treatment
1 What characteristics of the clinical problem or cases make this particular treatment a reasonable approach?
2 Does the version of treatment represent the treatment as it is usually carried out?
3 Does the investigation provide a strong test of treatment? On what basis has one decided that this is a strong test?
4 Has treatment been specified in manual form or have explicit guidelines been provided?
5 Has the treatment been carried out as intended? (Integrity is examined during the study but evaluated after it is completed.)
6 Can the degree of adherence of therapists to the treatment manual be codified?
7 What defines a completed case (e.g. completion of so many sessions)?

Assessment
1 If specific processes in the clients or their interpersonal environment are hypothesized to change with treatment, are these to be assessed?
2 If therapy is having the intended effect on these processes, how would performance be evident on the measure? How would groups differ on this measure?
3 Are there additional processes in therapy that are essential or facilitative to this treatment, and are these being assessed?
4 Does the outcome assessment battery include a diverse range of measures to reflect different perspectives, methods, and domains of functioning?
5 What data can be brought to bear regarding pertinent types of reliability and validity for these measures?
6 Are treatment effects evident in measures of daily functioning (e.g. work, social activities)?
7 Are outcomes being assessed at different times after treatment?

Data evaluation
1 What are the primary measures and data upon which the predictions depend?
2 What statistical analyses are to be used and how specifically do these address the original hypotheses and purposes?
3 Are the assumptions of the data analyses met?
4 What is the likely effect size that will be found based on other treatment studies or meta-analyses?
5 Given the likely effect size, how large a sample is needed to provide a sufficiently powerful test of treatment (e.g. power ≥ 0.80)?
6 Are there subdivisions of the sample that will be made to reduce the power of tests of interest to the investigator?
7 What is the likely rate of attrition over the course of treatment, and post-treatment and follow-up assessments?
8 With the anticipated loss of cases, is the test likely to be sufficiently powerful to demonstrate differences between groups if all cases complete treatment?
9 If multiple tests are used, what means are provided to control error rates?
10 Prior to the experimental conditions, were groups similar on variables that might otherwise explain the results (e.g. diagnosis, age)?
11 Are data missing due to incomplete measures (not filled out completely by the subject(s) or loss of subjects)? If so, how are these handled in the data analyses?
12 Will the clinical significance of client improvement be evaluated and if so by what method(s)?
13 Are there ancillary analyses that might further inform the primary analyses or exploratory analyses that might stimulate further work?

Reproduced with permission from A. Kazdin. Methodology, design and evaluation in psychotherapy research. In Handbook of Psychotherapy and Behavior Change (eds. A.E. Bergin and S.L. Garfield), pp. 19–71. Copyright 1994, John Wiley & Sons, Inc.
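The effect-size and power questions under 'Data evaluation' translate directly into a sample-size calculation. A minimal sketch, using the standard normal approximation for a two-arm comparison of means (the conventional 'medium' and 'large' effect sizes below follow Cohen's benchmarks; an exact t-test calculation would give slightly larger numbers):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-arm trial to
    detect standardized effect size d at two-sided alpha with the
    given power (normal approximation to the two-sample test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A 'medium' effect (d = 0.5) needs roughly 63 patients per arm,
# a 'large' effect (d = 0.8) roughly 25 — before any attrition.
medium, large = n_per_group(0.5), n_per_group(0.8)
```

Attrition (question 7) then inflates these targets: a study expecting 20 per cent dropout would recruit accordingly.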




Treatment standardization

Psychotherapy efficacy research, like pharmacotherapy research, requires that the treatment be standardized. Such standardization serves two related purposes. First, from a clinical point of view, it is necessary that the treatment be clearly specified, so that any conclusions about differential treatment efficacy can be translated into clear treatment recommendations. From a research point of view, treatment standardization allows studies to be replicated. In addition, by making the delivery of a treatment more standardized, differences between therapists and the statistical problems that result from the non-independence that ‘therapist effects’ introduce can be avoided.(6)

Standardization of pharmacological interventions is relatively straightforward—a per-day dosage (or range of dosages) is set in advance. But for psychotherapy, how can something as complex as patient–therapist dialogue be standardized? The central ingredient in standardization of a psychosocial treatment is a treatment manual. A psychotherapy manual describes the treatment in detail, with case examples and instructions for psychotherapists. Some treatment manuals, particularly those coming from a cognitive behavioural perspective, present a highly systematized step-by-step programme which therapists follow over the course of therapy. The relative success of treatment manuals in standardizing psychotherapy has been supported by a meta-analysis,(7) which documented that studies employing treatment manuals had fewer outcome differences between therapists than studies that did not. Thus, when a treatment manual is used, therapists appear to produce relatively uniform outcomes. In contrast, when no treatment manual is used, therapists differ considerably in their typical outcomes with patients, suggesting that different therapists are likely to be conducting sessions in discrepant ways, with some producing more favourable outcomes and others less favourable ones.
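The 'therapist effects' at issue here are commonly quantified as an intraclass correlation: the proportion of outcome variance attributable to which therapist a patient saw. A minimal one-way ANOVA sketch, using invented, balanced data (three therapists, four patients each); this illustrates the statistic, not the method of the meta-analysis cited above:

```python
def therapist_icc(groups):
    """One-way ANOVA intraclass correlation, ICC(1), for a balanced
    design: patients (rows) nested within therapists (groups)."""
    k = len(groups)                 # number of therapists
    n = len(groups[0])              # patients per therapist
    grand = sum(sum(g) for g in groups) / (k * n)
    ms_between = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum(
        sum((x - sum(g) / n) ** 2 for x in g) for g in groups
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)

# Hypothetical outcome scores, one list per therapist
icc = therapist_icc([[5, 6, 5, 6], [3, 4, 3, 4], [4, 5, 4, 5]])  # ≈ 0.73
```

A non-trivial ICC means patients treated by the same therapist have correlated outcomes, violating the independence assumption of ordinary between-group tests.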

Treatment standardization, however, does not simply translate to the use of a treatment manual. A variety of steps are needed to ensure that therapists are delivering the intended treatment (Table 6.1.2.2), including: the careful selection of therapists; training of therapists in the intended modality using a treatment manual; certification of therapists based upon their adherence to the treatment model during training; and continuing adherence and competence monitoring of therapists during a clinical trial.

Concerns have been raised about applying the ‘treatment manual’ concept to less directive treatments such as psychodynamic therapy. The concern is that session-by-session manuals would remove the essence of good psychotherapy, and good dynamic therapy in particular, by making treatment artificially rigid and taking away necessary clinical flexibility and creativity. Psychodynamic treatment manuals, however, are perhaps better described as ‘guides’, which specify the principles of treatment without overly constraining the clinician. Flexibility is retained through the principle of tailoring interventions to the specific idiosyncratic issues that are salient for each patient. The actual learning of the practice of treatment is accomplished through supervision in the application of the treatment manual. Because dynamic treatment manuals are less like ‘cookbooks’, there may be a greater reliance on the supervision process compared with more straightforward behavioural treatments.








Table 6.1.2.2 Steps involved in the standardization of psychotherapy for outcome research

1 Selection of therapists
2 Training of therapists using a treatment manual
3 Certification of therapists based upon adherence to the treatment model
4 Continued assessment of therapist adherence and competence during a clinical trial



Research designs

Unlike pharmacological research where a single form of research design (placebo-controlled study) dominates the literature, psychotherapy researchers have employed a host of different research designs to understand the effects of psychotherapy. Some of the more common designs are listed in Table 6.1.2.3 and are explicated in the next section.


Single-case designs

Clinical evaluation of the effects of psychotherapy dates back to Freud’s descriptions of individual cases in treatment. However, methods to systematically examine the effects of interventions with individual patients were developed by behaviour therapists.(8) These single-case experiments rely on comparing patient responses to differing experimental conditions over time. Typically, such single-case studies begin with an extended baseline period where patient behaviours or symptoms are recorded without any intervention. Then, different intervention phases are introduced, usually followed by more baseline (no intervention) assessment phases.

While experimental single-case designs lend themselves well to the investigation of behavioural treatments that focus on immediate overt behaviour, such designs have rarely been employed with other verbal psychotherapies that emphasize longer-term processes such as patient psychological growth and functioning. The limited generalizability of findings from a single case is a further limitation of this form of research.
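The phase logic of such designs can be made concrete with a small sketch. The weekly symptom counts below are hypothetical; the inference in an A-B-A-B experiment rests on the outcome tracking the phase changes—improving when the intervention is introduced and reverting towards baseline when it is withdrawn:

```python
from statistics import mean

# Hypothetical weekly counts of a target behaviour (e.g. panic
# episodes) across the four phases of an A-B-A-B single-case design
phases = {
    "A1 (baseline)":     [8, 9, 8, 10],
    "B1 (intervention)": [5, 4, 4, 3],
    "A2 (withdrawal)":   [7, 8, 7, 8],
    "B2 (intervention)": [3, 3, 2, 3],
}

# Phase means: A1 = 8.75, B1 = 4.0, A2 = 7.5, B2 = 2.75 —
# the symptom falls in each B phase and rebounds in A2,
# the pattern that supports attributing change to the intervention
phase_means = {name: mean(obs) for name, obs in phases.items()}
```

Visual inspection of such phase plots, rather than group statistics, is the traditional mode of analysis in this tradition, though statistical tests for single-case data also exist.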






Table 6.1.2.3 Common research designs for the evaluation of psychotherapy
