Measurement and Assessment in Late-Life Depression

Measure	Number of items	Response format	Features (can be mapped to DSM criteria; includes item on suicidal ideation)	Possible cut points for MDD or clinically significant depressive symptoms^a	Average completion times
PHQ-9	9	4-Point Likert	Yes	≥10, or algorithm	<5 min
GDS-15	15	Yes/no		≥5	<5 min
CES-D-10	10	Yes/no, or 4-point Likert		≥4 for yes/no format; ≥10 for Likert format	<5 min
BDI-II	21	4-Point Likert	Yes	Variable; Beck et al. [28] suggest score ranges: minimal (0–13), mild (14–19), moderate (20–28), and severe (29–63)	5–10 min
CES-D	20	4-Point Likert		≥16	5–10 min
CES-D-revised	20	4-Point Likert	Yes	≥16	5–10 min

^aRefer to Table 6.2 in this chapter for more details regarding the validated cut points for these measures when compared to diagnostic, “gold-standard” interviews

The Centers for Epidemiologic Studies Depression Scale, or CES-D [1], is among the most popular depression screening measures in research. The CES-D can also be used to screen for depression in very old populations [2]. Long forms (20 items) and short forms (e.g., 10 items) [3, 4] are available; the abbreviated ten-item version of the CES-D [4] takes <5 min to complete on average. Furthermore, both Likert-style [3] and binary (yes/no) [4] formats are available for the ten-item version, further increasing ease of administration. Reliability of the CES-D, as assessed by measures such as internal consistency, is reported to be high [4, 5]. Additionally, cut points for clinically significant depressive symptoms have been created for the long version (≥16 points) and for short versions of the CES-D (≥10 points is suggested for the 10-item Likert-style version [3], and ≥4 points for the binary format version [4, 6]); however, as illustrated in Table 6.2 of this chapter, variation has been observed in the optimal cut points for identifying depression among older adults [7–9]. A key feature of the CES-D is that its items focus heavily on core affective aspects of depression, such as feeling sad, depressed, fearful, or hopeless. Thus, a potential limitation of the CES-D is that depressed individuals who do not endorse these strongly affective items may go undetected. This is a particular challenge among older adults, among whom anhedonia, as opposed to expressed, subjective sadness, may be the dominant feature of depression [4, 6].

A related limitation is that of item bias, or differential item functioning. For example, in an analysis [10] of a large-scale Australian cohort of women, there was notable response variation on the CES-D: older women, compared to younger women, were substantially more likely to be missing responses to some items (e.g., “I felt hopeful about the future”). Although the precise explanation for this pattern of missing responses is not clear, it may have been more challenging for the oldest women than for younger women to answer a question that requires speculating about the future. Other investigators have found evidence of item-response bias among older adults by race/ethnicity on the CES-D [11, 12]. Furthermore, the development of the CES-D pre-dated the advent of the modern criteria for major depression (which emerged with the publication of the Diagnostic and Statistical Manual of Mental Disorders III, or DSM-III, in 1980); thus, the items do not explicitly map to all of the DSM criteria for major depressive disorder (MDD). Consequently, a revised version of the CES-D was recently developed by Gallo [13]. The CES-D-revised is a 20-item scale with a similar Likert response format; it places increased emphasis on symptoms of depression as they may be experienced by older persons (e.g., more questions related to anhedonia and less reliance on subjective sadness) and also includes questions about thoughts of death/self-harm. In addition, unlike the original CES-D, the CES-D-revised can be mapped directly to the criteria of the DSM-IV [14].
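The version-specific CES-D cut points cited above can be written as a small lookup. This is a minimal sketch rather than a validated scoring tool: the version labels and the function name `cesd_screens_positive` are illustrative assumptions, the function expects a total score that has already been computed, and the thresholds are the conventional ones cited in the text, which Table 6.2 suggests may not be optimal in every older population.

```python
# Conventional CES-D cut points cited in the text. The version labels are
# illustrative names chosen for this sketch, not official identifiers.
CESD_CUT_POINTS = {
    "cesd_20_likert": 16,  # long (20-item) form
    "cesd_10_likert": 10,  # 10-item Likert-style version [3]
    "cesd_10_binary": 4,   # 10-item yes/no version [4, 6]
}


def cesd_screens_positive(version: str, total_score: int) -> bool:
    """Return True if the total score meets or exceeds the cut point."""
    if version not in CESD_CUT_POINTS:
        raise ValueError(f"unknown CES-D version: {version!r}")
    return total_score >= CESD_CUT_POINTS[version]


# The same raw total of 9 falls on different sides of the threshold
# depending on which 10-item format was administered.
print(cesd_screens_positive("cesd_10_likert", 9))  # False
print(cesd_screens_positive("cesd_10_binary", 9))  # True
```

Keeping the thresholds in one table makes it easy to substitute a population-specific cut point where a validation study supports one.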

Table 6.2

Comparisons of common brief depression scales with diagnostic gold standards in late-life depression^a

Study authors (location)	Year	Study population	N	Brief scale	Gold standard	Author findings
Haringsma et al. (Leiden, The Netherlands)	2004	Dutch community residents aged 55–85 years who referred themselves to depression prevention program	318	CES-D	MINI	ROC analysis found cut-off score of 25 to be optimal for MDD (sensitivity = 0.85, specificity = 0.64) and 22 for clinically relevant depression (sensitivity = 0.84, specificity = 0.6)
Beekman et al. (Amsterdam, The Netherlands)	2002	Dutch adults between 55 and 85 years old, enrolled in the Longitudinal Aging Study Amsterdam	277	CES-D	DIS	Using a cutoff score of ≥16 for MDD, sensitivity was 1 and specificity = 0.88
Lyness et al. (Rochester, NY)	1997	Patients ≥60 of 3 primary care practices in Rochester, NY	130	GDS, GDS-15, CES-D	SCID	ROC analyses found that using a cutoff point of ≥21 on the CES-D for MDD, sensitivity = 0.92 and specificity = 0.87. A cutoff point for MDD of ≥10 on the GDS resulted in a sensitivity of 1 and specificity of 0.84. A cutoff of ≥5 on the GDS-15 had a sensitivity of 0.92 and a specificity of 0.81. All scales lost accuracy for detection of minor depression
Irwin et al. (San Diego, CA)	1999	Patients ≥60 enrolled in university-affiliated primary care clinics	68	CES-D-10	SCID	An optimal cutoff point of ≥4 was found for major depression (sensitivity = 1, specificity = 0.92). The ten-item CES-D appeared as effective at screening for depression as the original CES-D
Blank et al. (Hartford, CT)	2004	Patients ≥60 from urban primary care practices, nursing homes and a hospital	360	CES-D-10, GDS-15	DIS	The GDS-15 had the highest sensitivity/specificity when used to screen for major depression in nursing homes (sensitivity = 0.86, specificity = 0.82 using cutoff of ≥6). The CES-D-10 performed better than other instruments in clinics (sensitivity = 0.79, specificity = 0.81 using cutoff of ≥4) and hospital settings (sensitivity = 0.92, specificity = 0.77)
Nyunt et al. (Singapore)	2009	Adults ≥60 living in communities or nursing homes	4,253	GDS-15	SCID	Using cutoff of ≥5 for MDD, sensitivity was 0.97 and specificity was 0.95
Chen et al. (Hangzhou, China)	2010	Patients (aged ≥60) of primary care clinics in Hangzhou	77	PHQ-9, PHQ-2	SCID	When using a cutoff point of ≥9 for major depression, the PHQ-9 had a sensitivity of 0.86 and specificity of 0.77. The PHQ-2 was also found to be an effective screening tool, with sensitivity = 0.84 and specificity = 0.9 using the cutoff of ≥3
Marc et al. (Westchester, NY)	2008	Adults aged ≥65 receiving home care from a visiting nurse service agency	526	GDS-15	SCID	The optimal cutoff point for major depression was found to be ≥5, which yielded sensitivity = 0.72 and specificity = 0.78
Lamers et al. (Maastricht, The Netherlands)	2008	Patients ≥60 of general practices in The Netherlands	105	PHQ-9	MINI	For ADD (any depression), a cutoff of ≥6 for the summed PHQ-9 score was used, and corresponded to sensitivity = 0.96 and specificity = 0.81. The cutoff point ≥7 was used for MDD, with sensitivity = 0.92 and specificity = 0.78. When using the algorithm-based score of the PHQ-9 instead of the summed score, the test for both ADD and MDD had low sensitivity
Ros et al. (Albacete, Spain)	2011	Older adults with or without cognitive impairment	623	CES-D	CIDI	Among participants without cognitive impairment, the optimal cut-off point was found to be 28, with sensitivity = 0.82 and specificity = 0.94. The optimal cut-off point among those with cognitive impairment was just 13, however, which resulted in sensitivity = 0.86 and specificity = 0.72
Mitchell et al. (Leicester, UK)	2010	Older adults in medical inpatient/outpatient settings and in nursing homes	4,969	GDS-30, GDS-15, GDS-4	Semi-structured psychiatric interviews	Meta-analysis found the GDS-30 to have a sensitivity of 0.82 and a specificity of 0.78. The GDS-15 had sensitivity = 0.84 and specificity = 0.74, and the GDS-4 was found to be the most effective at screening for depression in medical settings, with sensitivity = 0.93 and specificity = 0.77
Watson et al. (Chapel Hill, NC)	2009	Adults ≥65 in residential care/assisted living communities	112	GDS-15, PHQ-2	SCID	Using the cutoff point of one positive answer as criterion for depression, the PHQ-2 had sensitivity = 0.80 and specificity = 0.71. The GDS-15 had sensitivity = 0.60 and specificity = 0.75 with a cutoff point of ≥5, and yielded sensitivity = 0.73 and specificity = 0.65 when using a cutoff of ≥4
Bunevicius et al. (Palanga, Lithuania)	2012	Mid-life and older adults with coronary disease (mean age = 58 years; range 32–77 years)	522	BDI-II	MINI	A cutoff point of ≥14 produced a sensitivity of 0.89 and specificity of 0.74 for MDD as diagnosed by the MINI
Frasure-Smith and Lespérance (Montreal, QC, Canada)	2008	Mid-life and older adults with coronary disease (mean age = 60 years; SD = 10 years)	811	BDI-II	SCID	A cutoff point of ≥14 produced a sensitivity of 0.91 and specificity of 0.78 for DSM-IV MDD
Huffman et al. (Boston, MA)	2010	Mid-life and older adults with coronary disease (mean age = 62 years; SD = 12.5 years)	131	BDI-II	SCID	A cutoff point of ≥16 produced a sensitivity of 0.88 and specificity of 0.92 for DSM-IV MDD

^aStudies assessing the BDI-II did not focus exclusively on the late-life population but included samples with a mean age of approximately 60 years or greater
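The sensitivity and specificity figures reported in Table 6.2 come from comparing each brief scale, dichotomized at a cutoff, against the gold-standard diagnosis. The sketch below shows that arithmetic under stated assumptions: the function name, the argument names, and the ten scores and diagnoses are all hypothetical and are not drawn from any study in the table.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int],
    has_mdd: Sequence[bool],
    cutoff: int,
) -> Tuple[float, float]:
    """Compare a scale dichotomized at `cutoff` with a gold-standard diagnosis.

    A score >= cutoff counts as a positive screen.
    """
    if len(scores) != len(has_mdd):
        raise ValueError("scores and has_mdd must have the same length")
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_mdd):
        positive = score >= cutoff
        if diagnosed and positive:
            tp += 1  # case correctly flagged
        elif diagnosed:
            fn += 1  # case missed by the screen
        elif positive:
            fp += 1  # non-case flagged by the screen
        else:
            tn += 1  # non-case correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: scale totals and gold-standard MDD status.
scores = [3, 5, 8, 12, 2, 6, 4, 9, 1, 7]
has_mdd = [False, True, True, True, False, False, False, True, False, True]

print(sensitivity_specificity(scores, has_mdd, cutoff=5))  # (1.0, 0.8)
print(sensitivity_specificity(scores, has_mdd, cutoff=8))  # (0.6, 1.0)
```

Raising the cutoff from 5 to 8 in this toy data trades sensitivity for specificity, the same pattern seen in Table 6.2 where a study reports more than one cutoff for the same scale.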

Like the CES-D, the Geriatric Depression Scale (GDS) has achieved widespread use [15]. The GDS is a self-report instrument measuring depressive symptoms in older adults [15, 16], and it was designed to identify symptoms that distinguish depression in the elderly from dementia. The GDS comprises the affective (e.g., sadness, apathy, crying) and cognitive (e.g., thoughts of hopelessness, helplessness, guilt, worthlessness) symptoms of depression in the elderly [17]. It downplays the somatic items (e.g., disturbances in appetite, sleep, energy level, and sexual interest) that many suggest are important confounds among older adults [18, 19]. For example, compared with younger depressed adults, depressed older persons tend to present more often with somatic symptoms (e.g., general somatic and gastrointestinal symptoms) and hypochondriasis [20].

The GDS is available in both a long (30-item) version and shorter versions. The commonly used 15-item version (GDS-15) [15], which has a yes/no response format, was developed to decrease the fatigue or lack of focus that older persons may experience [21]. The GDS has several advantages, and its 15-item format is primarily used in the context of brief screening and assessment. First, the yes/no response format allows for brevity in administration, and scoring is very straightforward for the busy clinician [7, 22]. Second, validated cut points have been published both for older adults in primary care settings and for older adults with specific co-morbidities (see Table 6.2 for details) [6, 7, 17]. Moreover, as its name implies, the GDS-15 has particular value for detecting late-life depression because its items emphasize symptoms that may be most relevant to the experience of depression in older persons, especially several questions homing in on anhedonia (e.g., items on dropping activities, not doing new activities, boredom, emptiness) [6, 17]. Potential limitations include less dimensionality (i.e., ability for symptom scores to be expressed on a broader continuum) than is possible with the Likert-style CES-D and other measures, as well as the inability to map directly to DSM diagnostic criteria for MDD; the latter is not surprising, however, because, like the CES-D, the GDS was originally developed before the era of the modern DSM criteria. Another potential limitation of the GDS-15 is that its items do not emphasize appetite/weight or sleep disturbance; such symptoms are not only potentially relevant to detecting depression but also allow for discrimination of subtypes of depression, for example, typical (reduced appetite/weight, decreased sleep, etc.) versus atypical (increased appetite/weight, hypersomnia, etc.) [23–25]. Nevertheless, there is some evidence that dimensionality in the GDS-15 is likely adequate. For example, in a recent analysis of long-term depression patterns in a large cohort of older women, Byers et al. were able to employ repeated measures of the GDS-15 to derive distinct latent subgroups of depression trajectories [26].

The Beck Depression Inventory, or BDI, is another widely used measure [27]. Although the original BDI was developed by Aaron Beck in the 1960s, the most commonly used form is the 21-item BDI-II [28]. Importantly, the BDI-II, unlike its predecessor, was developed in light of the newly available diagnostic criteria for MDD in the DSM nosology. The BDI is a valid instrument for the diagnosis of depression in older adults [29]; there is also a seven-item BDI-PC, an abbreviated primary care version of the BDI-II, but it has not been specifically tested in elderly medical patients [30]. Validation studies have found the BDI-II to be effective at screening for depression in middle-aged and older adults with coronary disease, even though it contains several questions regarding somatic symptoms that could act as confounders [27, 31, 32]. Like the GDS, the BDI-II incorporates more aspects relevant to the behaviors and cognitions in depression, as opposed to the relatively greater emphasis on the affective elements seen in the CES-D [32]. The BDI-II enjoys advantages of reliability, validity, dimensionality, sensitivity to change (useful in the context of symptom monitoring), and applicability in assessing a broad range of aspects of depression, including the elements of typicality versus atypicality [33, 34], although its 21-item length renders it less practical than other screeners, especially in the primary care context [29]. Also, less is known about differential item functioning on the BDI and about its use in culturally diverse contexts. Finally, there have been concerns about overlapping symptoms between medical conditions and the depressive somatic symptoms on the BDI [33], and some elderly individuals may not be able to complete the scale due to reading difficulty, physical debility, or compromised cognitive functioning [34].
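The score ranges that Beck et al. [28] suggest for the BDI-II, listed in the first table of this chapter, can be expressed as a simple banding function. This is a sketch under stated assumptions: the function name `bdi_ii_severity` is hypothetical, the input is an already-summed total, and the bands are descriptive ranges rather than diagnostic thresholds.

```python
def bdi_ii_severity(total_score: int) -> str:
    """Map a BDI-II total (0-63) to the range labels suggested by Beck et al. [28]."""
    if not 0 <= total_score <= 63:
        raise ValueError("BDI-II totals range from 0 to 63")
    if total_score <= 13:
        return "minimal"   # 0-13
    if total_score <= 19:
        return "mild"      # 14-19
    if total_score <= 28:
        return "moderate"  # 20-28
    return "severe"        # 29-63


print(bdi_ii_severity(13))  # minimal
print(bdi_ii_severity(14))  # mild
```

The cutoff of ≥14 used in two of the coronary disease studies in Table 6.2 coincides with the lower bound of the mild range.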

A measure that has rapidly emerged as a screener of choice among many experts [35] is the Patient Health Questionnaire-9 (PHQ-9). Originally developed in the 1990s as part of the PRIME-MD (Primary Care Evaluation of Mental Disorders) by Spitzer, Williams and Kroenke [36], the PHQ-9 is now publicly available and features a Likert-style response format for easy self-administration and rapid scoring by the clinician. The PHQ-9 has been shown to be effective and efficient as a depression screening tool for older adults in primary care settings [37] and has been successfully used to identify depression in homebound older persons [38]. It has high reliability and has been extensively validated in a variety of contexts [39–45]; both a diagnostic algorithm and a cut point (of 10 points or above) have been validated for the PHQ. Furthermore, the measure has dimensionality and demonstrated sensitivity to change [40, 41], both of which are useful in the context of tracking symptoms among persons under treatment or monitoring persons at risk for depression. Among commonly used measures, the PHQ-9 was designed to map most directly to the modern criteria for major depression in the DSM [46]. To increase applicability, an eight-item version (PHQ-8) is also available; it removes the item pertaining to thoughts of death/suicide, as this item may be inappropriate in certain large-scale screening contexts where rapid clarification and follow-up are not possible. Current evidence shows that the validated cutoff of the PHQ-8 has sensitivity and specificity for depression comparable to those of the original nine-item version [47]. A two-item version of the PHQ has also performed well in validation studies and has been recommended for initial depression screening due to its brevity [48, 49]. Unlike most other screeners, the PHQ has not shown evidence of the differential item functioning observed for other measures [42, 43].

In part because of the attributes noted above, the PHQ has gained traction as a screener and has been recommended for this purpose by a variety of agencies/organizations; Medicare has integrated the PHQ-9 into the revised Minimum Data Set assessments (MDS 3.0) for nursing home residents [50]. (For more on policy, see Table 9.1 in Chap. 9.)
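The summed-score use of the PHQ-9 can be sketched as follows. Several details here are assumptions not spelled out in this chapter: each of the nine items is taken to be coded 0 to 3 on the 4-point response format, the item on thoughts of death/suicide is taken to be the ninth item, and the function and key names are hypothetical. The diagnostic algorithm mentioned above is not described in this chapter and is not implemented.

```python
from typing import Dict, Sequence, Union

PHQ9_CUT_POINT = 10  # summed-score cut point cited in the text


def score_phq9(item_responses: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Sum nine PHQ-9 item responses and compare the total with the cut point.

    Assumes each response is coded 0-3 and that the ninth item is the one
    about thoughts of death/suicide (both are assumptions of this sketch).
    """
    if len(item_responses) != 9:
        raise ValueError("the PHQ-9 has nine items")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each response must be coded 0, 1, 2, or 3")
    total = sum(item_responses)
    return {
        "total": total,
        "screens_positive": total >= PHQ9_CUT_POINT,
        # Flagged separately so it can prompt follow-up regardless of the total.
        "suicide_item_endorsed": item_responses[8] > 0,
    }


print(score_phq9([2, 2, 1, 1, 1, 1, 1, 1, 0]))
# {'total': 10, 'screens_positive': True, 'suicide_item_endorsed': False}
```

Reporting the ninth item separately reflects the point made above that endorsement of thoughts of death/suicide calls for rapid clarification and follow-up, whatever the summed score.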
