Screening for deficits
In the previous section we reviewed the nature, pervasiveness, and magnitude of the cognitive difficulties experienced by patients with MDD. We have argued that these deficits are likely to be clinically relevant and thus represent a significant area of unmet need. Identifying deficits at de novo presentation is not part of the typical management of patients, unless the presenting patient is elderly and dementia is suspected. Objective cognitive testing is sometimes employed, though it usually includes only brief screening tests such as the Mini-Mental State Examination (MMSE) (Folstein, Folstein, & McHugh, 1975) and the Montreal Cognitive Assessment (MoCA) (Nasreddine et al., 2005). Both tests are deficient with respect to key psychometric requirements and are very unlikely to be fit for the purpose of screening patients with MDD for cognitive difficulties. In the following sections we will consider alternative courses of action, discuss the potential benefits of computer-assisted screening, and conclude with some recommendations.
The studies reviewed in the first section suggest that clinically relevant cognitive deficits are a feature of the average patient with MDD. The magnitude of these deficits appears to be around 0.8 standard deviations (SDs) below expected performance. A difference of this size is readily detected in cohorts of patients with MDD when compared with controls. However, detecting deficits of this magnitude in individual patients represents a significant challenge. A deficit of −0.8 SDs falls well short of the −2.00 SD threshold typically employed by psychologists as the cut-off for abnormal performance. The vast majority of tests employed clinically allow judgments about individual performance to be made against normative data corrected for age and years of education (YoE). This approach goes a long way toward managing individual differences in performance. However, a pervasive and significant challenge in dealing with individuals whose cognition has declined as a consequence of illness or injury is the absence of information regarding their premorbid levels of function. Estimates of likely performance can be made on the basis of YoE and of tests such as the National Adult Reading Test (NART) (Nelson, 1982) and the AmNART (Grober & Sliwinski, 1991). However, such methods provide less than perfect estimates of true premorbid IQ.
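To make the discrepancy logic concrete, the following minimal sketch (in Python) expresses an observed raw score against age/YoE-corrected norms and compares it with an estimated premorbid level of the kind a NART or YoE method might supply. All of the figures are purely illustrative; they are not drawn from any published normative table or regression equation.

```python
def z_score(raw_score, norm_mean, norm_sd):
    """Express a raw score against age/YoE-corrected normative data."""
    return (raw_score - norm_mean) / norm_sd

def estimated_decline(observed_z, premorbid_z):
    """Discrepancy approach: observed performance minus estimated premorbid level."""
    return observed_z - premorbid_z

# Hypothetical case: a recall test with a normative mean of 10 (SD 2.5) for this
# age/education band, and a reading-based premorbid estimate of +1.0 SD.
observed = z_score(raw_score=8, norm_mean=10.0, norm_sd=2.5)   # -0.8 SD
print(f"Observed z-score: {observed:+.2f}")
print(f"Flagged at a -2.0 SD cut-off? {observed < -2.0}")
print(f"Estimated decline from premorbid level: {estimated_decline(observed, 1.0):+.2f} SD")
```

On these hypothetical figures the patient would not be flagged by an absolute −2.0 SD rule, yet the discrepancy from the premorbid estimate is substantial, which is precisely why the accuracy of that estimate matters so much.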
Our challenge in detecting cognitive deficits in patients with MDD is to determine the extent to which cognition has declined since onset. Objective testing on first presentation requires that we set a statistical cut-off for detecting impairment on our selected measures. This is analogous to the process originally proposed for detecting prodromal Alzheimer’s disease, often termed mild cognitive impairment, where a −1.5 SD cut-off was proposed and has been routinely employed (Petersen et al., 1999). Astute readers will recognize that adopting this threshold will necessarily classify slightly less than 7 percent of the healthy population as possible false positive cases. Just as importantly, high functioning individuals who have suffered a precipitous loss of function will constitute a significant proportion of false negative cases. To illustrate this, let us take the example of a college professor whose normal level of function is +2 SDs above the expected mean. A decline of 0.8 SDs would leave her well above our threshold for detection; indeed, even a decline of 3 SDs would leave her at −1 SD and still undetected. When dealing with first presentation detection of cognitive difficulties in patients with MDD we face a similar problem: employing a liberal criterion such as a cut-off of −0.8 SDs yields a significant number of false positives and does little to address the false negative issue outlined above. However, there are other possible remedies to assist with identifying true positive cases of cognitive deficits in MDD, the first of which, discussed in the next paragraph, utilizes subjective accounts of cognitive difficulties.
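The arithmetic behind these false positive and false negative concerns can be checked directly against the normal distribution. The short Python sketch below reproduces the figures used in this paragraph; the cut-offs and the professor's premorbid level are taken from the text, while everything else is illustrative.

```python
from statistics import NormalDist

norm = NormalDist()  # standardized normative distribution (mean 0, SD 1)

# False positives: the share of an unimpaired population expected below each cut-off.
for cutoff in (-2.0, -1.5, -0.8):
    print(f"Cut-off {cutoff:+.1f} SD: {norm.cdf(cutoff):.1%} of healthy individuals fall below")
# A -1.5 SD cut-off captures roughly 6.7% of the unimpaired population; -0.8 SD captures about 21%.

# False negatives: the high functioning professor with a premorbid level of +2 SDs.
premorbid = 2.0
for decline in (0.8, 3.0):
    observed = premorbid - decline
    print(f"Decline of {decline} SDs -> observed {observed:+.1f} SDs; "
          f"flagged at a -1.5 SD cut-off? {observed < -1.5}")
```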
Introspection as a methodology for understanding and interpreting human behavior has something of a bad reputation among psychologists. However, clinical anecdote and experience suggest that asking patients with MDD about their cognitive symptoms has utility. In the previous section, mention was made of Conradi et al.’s (2011) study in which patients were asked whether they had “a diminished ability to think or concentrate, or indecisiveness.” This suggests some level of insight and an awareness of cognitive difficulties. Intuitively it seems reasonable to suppose that high functioning individuals will be aware of what might be very slight changes in their cognitive acuity. Soliciting views regarding cognitive changes therefore seems a potentially useful means of screening first presentation individuals for evidence of cognitive dysfunction. One possible method for screening using both objective cognitive testing and guided self-report will be discussed later in this chapter.
Repeated assessment
A significant proportion of the cognitive tests employed as measures in clinical drug trials were originally designed for the detection of impairment. Often, therefore, little thought was given to their suitability for repeated assessment. Issues of temporal reliability were seldom addressed prior to publication, and even for tests likely to be prone to repeated assessment effects, such as practice and item familiarity, little or no provision was made for the development of equivalent parallel versions to mitigate these effects. Because the emphasis has so often been on detecting impairment, tests have tended to be designed on the assumption that normal performance is either perfect (an “absolute ceiling effect”) or close to perfect (an “effective ceiling effect”). The assessment community has been aware of the fundamental psychometric deficits of popularly employed cognitive measures for some time. For indications such as Alzheimer’s disease, guidance has been available since 1997 regarding the use of objective psychometric testing (Ferris et al., 1997). This article provides robust guidance for test construction and selection, emphasizing the need for reliable, valid, and sensitive measures. The authors also offer helpful guidance on best practice for computerized cognitive assessment, a theme explored more recently by Harrison and Maruff (2008).
Human performance on tests of cognition is prone to a number of sources of variance, and successful test administration is largely an exercise in error reduction. Some of these sources of variation are intrinsic to the study participants themselves. One key idiosyncratic dimension is cognitive style with regard to speed and accuracy. Psychologists have long been aware that individuals differ with respect to where they set the speed/accuracy trade-off. Whilst this source of between-subject variability can never be entirely managed, the judicious use of task instructions can induce study participants to approximate a common trade-off setting. Motivation is also a key source of inter-subject variability, and an issue whose importance might well be magnified in patients with MDD. A further key factor is the effect of diurnal variation. It is well known that performance varies across the day, with some individuals (“larks”) apparently best able to perform earlier in the day and others (“owls”) tending to perform more successfully later in the day. Recent post-hoc analyses of clinical trial data from studies of patients with Cognitive Impairment Associated with Schizophrenia (CIAS) have suggested that the efficacy detected is heavily influenced by the consistency of assessment timings (Hufford et al., 2014), with consistent timing yielding greater evidence of efficacy.
A further source of variance relates to factors extraneous to the study participant. For example, certain environmental factors may impinge on performance. These include obvious influences such as visual and auditory distraction, but also extend to ambient illumination, temperature, and so on. Another key source of variance relates to so-called “experimenter effects.” These effects can sometimes be manifest as involuntary cues that influence test performance. Experience suggests that administration of the ADAS-cog can be prone to these effects. Sometimes this might be as mundane as pantomiming a response in the “Commands” subtest, such as physically making a fist when the instruction is simply to tell the study participant to “Make a fist.” These effects are sometimes more subtle. One example from the ADAS-cog is the administration of the “Word Recognition” subtest. Here study participants are required to view and read words and then, in the test phase, identify the 12 original words mixed in with 12 new “foils.” The forced choice is “yes” to words that were seen before and “no” to new words. Test administrators familiar with the stimuli may telegraph the correct answer with nonverbal cues. The demeanor of the test administrator can also affect performance.
Other disciplines have routinely concerned themselves with managing error and have built precautions into their procedures. For example, physics laboratories in the USA follow guidance designed to maximize the reliability of their measurement procedures. Appendix D of this guidance (Taylor & Kuyatt, 1994) specifies a list of “repeatability conditions,” which include employing:
the same measurement procedure;
the same observer;
the same measuring instrument, under the same conditions;
the same location.
These precautions seem eminently sensible and worthwhile, and intuitively it seems reasonable to include them in both screening and efficacy testing. The fourth condition is particularly relevant in the context of orientation questions such as those contained in the MMSE. In this measure five of the possible 30 points are achieved by correctly telling the rater five elements of the geographical location at which the testing is occurring. If assessment has been carried out repeatedly at the same location and the location then changes, this may dramatically influence the study participant’s score.
There are some further considerations attached to the third of these conditions, specifically the use of the “same measuring instrument.” A few cognitive measures, such as tests of verbal fluency, appear to be relatively immune to practice effects (Harrison, Buxton, Husain, & Wise, 2000). However, other measures can be very prone to marked practice effects. For example, the Wisconsin Card Sorting Test (WCST) requires study participants to sort cards according to rules that they must determine for themselves through trial and error. In the course of performing this test, study participants can come to realize that the rules extend only to color, shape, and number, and this realization can yield significantly improved performance on subsequent administrations.
Tests that are prone to practice effects are unhelpful measures for a number of reasons. A key issue concerns the construct under investigation. For example, the Word Registration component of the MMSE requires the study participant to remember the words “apple, penny, table.” The test authors clearly intended these items as tests of immediate and delayed episodic memory. However, repeated assessment and/or prior tuition can lead to study participants learning the word registration stimuli. Once learnt, these items become tests of semantic memory, a very different cognitive construct to episodic memory. Note that the attention subtest of the MMSE, whether it be “Serial 7 subtraction” or recalling the letter order of the word “World” backward, is prone to exactly the same problem. Thus 11 of the 30 points on the MMSE (three for registration, three for delayed recall, and five for attention) can be achieved by methods not intended by the authors, placing limitations on the test’s validity and reliability.
A further issue is the differential effect of practice on group performance. In testing the efficacy of interventions we are concerned with differences in measures of central tendency. However, to maximize the probability of detecting true differences we would do well to be as attentive to the denominator of our inferential test statistics, the error variance, as we are to the numerator. Reducing both between- and within-subject variability can beneficially affect our analyses and requires only modest precautions. One method by which this can be achieved is exposing study participants to pre-baseline assessments. This is fairly readily achieved in the time available at study screening visits and often has clear benefits. Brief testing facilitates this procedure, and two exposures to the selected measures often account for the most obvious test familiarity effects and ensure that performance is at an effective asymptote when the study baseline measure is taken. Screening assessments reduce both within- and between-subject variability, thus yielding tighter confidence intervals around our estimates of central tendency.
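As a rough illustration of why this matters, the sketch below computes the width of an approximate 95% confidence interval around a treatment-control difference under two assumed levels of residual variability. The specific SD values and sample size are hypothetical, chosen only to show the direction of the effect of a pre-baseline practice exposure.

```python
import math

def ci_width_for_group_difference(sd, n_per_arm, z=1.96):
    """Width of an approximate 95% CI around a between-group difference in means."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return 2 * z * standard_error

# Hypothetical assumption: one pre-baseline practice exposure removes much of the
# practice-related noise, shrinking the change-score SD from 1.0 to 0.7.
for label, sd in [("no pre-baseline exposure", 1.0), ("one pre-baseline exposure", 0.7)]:
    width = ci_width_for_group_difference(sd=sd, n_per_arm=50)
    print(f"{label}: 95% CI width of about {width:.2f} SD units")
```

On these assumed figures the interval narrows in direct proportion to the residual SD, which is exactly the gain that pre-baseline familiarization is intended to secure.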