Cognitive Function Clinic Walton Centre for Neurology and Neurosurgery, Liverpool, UK
Abstract
This chapter examines some possible future prospects for diagnostic test accuracy studies in dementia, in terms of both potential problems and opportunities. A proposal for the role of pragmatic diagnostic test accuracy studies concludes these considerations.
Keywords
Dementia, Diagnosis, Diagnostic accuracy studies
In the future, tests or testing strategies may become available which reliably diagnose the various forms of dementia and differentiate them from one another at all stages of disease severity. That future prospect is not, in this author’s view, imminent, not least because of the incomplete understanding of the pathogenesis of these diseases, despite the great advances that have been made in recent decades.
For this reason, diagnostic test accuracy studies in dementia will continue to be required for the foreseeable future. It therefore behoves clinicians to conduct such studies meticulously, guided by international initiatives (Noel-Storr et al. 2014) and, perhaps, the ideas outlined in this book, in order to better inform themselves and their clinical colleagues about test diagnostic utility and applicability. Better diagnostic tests, as informed by test accuracy studies, should produce more homogeneous patient cohorts for randomised controlled trials of new treatments. This will be of critical importance, especially if treatments delivered via more invasive routes, such as intravenous antibodies (Panza et al. 2014) or other blood derivatives (Villeda et al. 2014), are to be contemplated.
Considering the future prospects for diagnostic test accuracy studies in dementia, both problems and opportunities may be envisaged.
6.1 Problems
Some problems are generic to diagnostic test accuracy studies per se; others are specific to the dementia field. Clearly the results of any diagnostic test accuracy study will be undermined if the study is conducted with insufficient rigour, whether in patient selection, administration of the test and the reference standard, data collection, or analysis. Poor quality diagnostic test accuracy studies may lead to consequences such as incorrect diagnosis, delivery of inappropriate treatment, and ultimately to poor commissioning, planning and financing of services.
The dementia syndrome is often not a point diagnosis: the underlying disorders are chronic and often encompass long preclinical and prodromal phases. Delayed verification (longitudinal) studies are therefore needed as well as the more typical clinical paradigm of cross-sectional studies, and the former entail an increase in time, expense and complexity.
6.1.1 Bias
Bias in its many forms (Sect. 1.4) is inherent in diagnostic test accuracy studies; although it may be minimised by careful study planning, it cannot be entirely eliminated. Some of these sources of bias may be addressed in pragmatic diagnostic test accuracy studies, but not all.
Any study based within a clinical service, be it primary or secondary care, will be subject to selection bias by the very act of patients presenting themselves to clinical attention (referral filter), and also probably clinical review bias on the basis of clinical information already known to the clinician. This may be exacerbated in secondary care by the provision of the referral letter, which a clinician cannot (or would be very ill-advised to) ignore prior to consultation. If insufficient clinical personnel are available, as is often the case outside research-based settings, diagnostic review bias due to inadequate blinding may also be a significant issue. On the other hand, verification bias and incorporation bias can be minimised, by ensuring that all patients receive both index test and reference standard, and by examining tests which are not part of the reference standard.
Spectrum bias is a factor related in part to clinical setting which affects demographic (patient age) and clinical (disease severity) features, and this may be exacerbated in secondary care by the application of rigorous inclusion/exclusion criteria. By using consecutive patient cohorts, pragmatic diagnostic test accuracy studies may minimise spectrum bias by evaluating all patients irrespective of age, educational levels, and cognitive impairment, and hence may have high external validity. Low educational levels increase the risk of cognitive decline in later life, but sample spectrum bias might exclude those with illiteracy from diagnostic accuracy studies of some cognitive screening instruments (e.g. the Mini-Mental State Examination is heavily oriented to language). Tests designed for use with patients who are illiterate have been described (Carnero-Pardo et al. 2011).
6.1.2 Index Test and Reference Standard
Administration of the index test must be standardized. If some interpretation of test results is required (e.g. qualitative as well as quantitative assessment of performance on cognitive screening instruments, or appearances on brain imaging studies) then observer variability needs to be taken into account and if necessary some form of training put in place to ensure that all raters or observers interpret in a similar way in order to minimise variability.
Similar considerations apply to administration of the reference standard. The lack of a “gold” standard, or the existence of variable standards (e.g. different sets of diagnostic criteria; see Box 2.1) may undermine diagnostic accuracy studies unless a clear protocol is determined before study onset.
6.1.3 Blinding
As previously mentioned (Sect. 2.2.4), blinding of raters or observers may represent the most significant problem for the validity of pragmatic diagnostic test accuracy studies in dementia.
Unlike the research setting where a plenitude of personnel may be available to undertake separate aspects of patient assessment (index test, reference standard), this is less easy to arrange in the outpatient setting where clinicians may work in (relative) isolation. Furthermore, the setting of a dedicated memory or cognitive disorders clinic inevitably introduces observer bias since practically all attendees will have at minimum a subjective memory complaint (in the author’s experience only occasional patients without cognitive complaint are seen, for example because of a strong family history of cognitive disorder when the issue of neurogenetic testing has arisen, or when some form of structural brain imaging has been previously undertaken for another reason such as headache and a report of “atrophy” has been made, usually by a general radiologist). Hence “unblinding” will have occurred even before the clinic letter is read, or any patient contact is established. Even cursory patient examination may disclose clinical signs suggestive of the presence (e.g. head turning sign; Larner 2012a) or absence (attended alone sign; Larner 2014a) of cognitive impairment.
6.1.4 Reproducibility
As previously mentioned (Sect. 4.3.6), measuring reproducibility or reliability is a particular problem for diagnostic test accuracy studies of cognitive screening instruments: repeating a test after a short period of time, as is desirable to measure intra- and inter-rater reliability, risks practice effects which might result in an underestimate of reproducibility. Hence reproducibility for these tests can only be assessed by examining independent patient cohorts, which may be time consuming. Automated tests are usually associated with quality control procedures to ensure reliability, but variation between laboratories can occur, necessitating some kind of harmonisation procedure (Hort et al. 2010).
6.1.5 Wrong Paradigm
In some situations, data from diagnostic test accuracy studies may have serious limitations, and indeed may not be able to answer the clinical question (e.g. Burch et al. 2012). If a (so-called) diagnostic test accuracy study cannot provide information about the diagnostic accuracy of the test being evaluated for whatever reasons (e.g. a rare condition with resultant small sample size, inadequate blinding to test results, uncertain reference standard or test outcome measures), then this study design may be inappropriate to answer the diagnostic question.
6.2 Opportunities
There are many future opportunities for diagnostic test accuracy studies in dementia, some of which are considered here, in terms of the questions which might be addressed, the settings in which such questions might be addressed, and the analysis of such studies.
6.2.1 Questions Which Might Be Addressed
6.2.1.1 Clinical Signs Associated with Dementia
As pointed out some years ago (Larner 2001:xii), the diagnostic accuracy of most clinical signs, neurological or otherwise, remains to be defined, despite the fact that many are routinely taught to medical students and postgraduate trainees. Although an initiative to rectify this was commenced (CARE: Clinical Assessment of the Reliability of the Examination; McAlister et al. 1999), relatively little seems to have emerged from this programme, at least in the field of neurological signs.
Some attempt has been made to assess the accuracy of a few signs relevant to dementia, including the “attended alone” sign, the head turning sign, and the applause sign, all of which may be defined as non-canonical neurological signs (Larner 2014b; Table 6.1). However, this remains a neglected field which invites further studies, ideally suited to the day-to-day workings of an outpatient clinic.
Table 6.1
Diagnostic accuracy of three non-canonical neurological signs for the diagnosis of any cognitive impairment when examined in consecutive patient cohorts seen in a cognitive disorders clinic
Measure | Attended alone sign | Head turning sign | Head turning sign | Applause sign |
---|---|---|---|---|
N | 726 | 207 | 191 | 240 |
Prevalence of any cognitive impairment | 0.32 | 0.40 | 0.45 | 0.45 |
Sensitivity | 0.93 | 0.63 | 0.68 | 0.36 |
Specificity | 0.45 | 0.95 | 0.94 | 0.88 |
PPV | 0.47 | 0.94 | 0.96 | 0.73 |
NPV | 0.93 | 0.64 | 0.58 | 0.61 |
Reference | Larner (2014a) | Larner (2012a) | Ghadiri-Sani and Larner (2013) | Bonello and Larner (2015) |
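The measures reported in Table 6.1 are all derived from a 2 × 2 cross-classification of index test result against reference standard. As a minimal sketch of those standard definitions, the following Python fragment computes them from raw counts. The class name `TwoByTwo` and the counts used are hypothetical illustrations, not data from any of the studies cited in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying index test result against reference standard."""

    tp: int  # test positive, target condition present
    fp: int  # test positive, target condition absent
    fn: int  # test negative, target condition present
    tn: int  # test negative, target condition absent

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def prevalence(self) -> float:
        # Proportion of the cohort with the target condition.
        return (self.tp + self.fn) / self.n

    def sensitivity(self) -> float:
        # Proportion of those with the condition who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of those without the condition who test negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Proportion of positive tests that are true positives.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Proportion of negative tests that are true negatives.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only (NOT taken from Table 6.1).
example = TwoByTwo(tp=40, fp=10, fn=20, tn=30)
for name in ("prevalence", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(example, name)():.2f}")
```

Sensitivity and specificity are calculated within the columns of the table (condition present or absent), whereas the predictive values are calculated within the rows (test positive or negative) and therefore depend on the prevalence in the cohort studied.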
6.2.1.2 Preclinical Disease
It is clear from many disparate studies that the neurobiological changes which underpin Alzheimer’s disease commence many years, indeed decades, before the emergence of clinical symptomatology (Amieva et al. 2005; Bateman et al. 2012; Jack et al. 2013). The need for tools to detect the preclinical and presymptomatic stages of AD, as well as mild cognitive impairment due to AD, is well recognised (Snyder et al. 2014). Diagnostic accuracy studies for any tests developed to identify these very earliest disease stages will be of paramount importance if meaningful trials of disease-modifying agents are to be undertaken in AD. Delayed verification studies of tests measuring disease biomarkers, such as amyloid PET imaging and CSF biomarkers (Sect. 3.3), represent a start in this process. A small study has suggested that CSF Abeta42 and hippocampal volume measurement may be used in combination to best identify prodromal AD (Prestia et al. 2013) and certain neuropsychological test batteries might also predict incident AD (Wolfsgruber et al. 2014).
6.2.1.3 Combinations of Tests
Diagnostic test accuracy studies assessing different sequences or batteries of tests, rather than a single test in isolation, might be a fruitful undertaking. The dementia syndrome encompasses not only cognitive but behavioural, functional, and neurovegetative changes (American Psychiatric Association 2000) and hence diagnostic tests addressing more than one of these domains or constructs might be appropriate (Larner 2014c:184–9). Some attempts to combine the results of tests from different diagnostic domains have been made (Sect. 4.3.3.3), and this might be extended.
6.2.1.4 Extending the Scope of Test Use: “Phase IIIb”
In the nomenclature describing the architecture of clinical research devised by Sackett and Haynes (2002), phase III questions ask whether diagnostic test results distinguish those with and without the target disorder amongst those in whom the test would be used (my italics), i.e. those in whom it is clinically sensible to suspect the target disorder (Sect. 1.3.1.2). Phase III studies therefore address the clinical problem for which the test should be evaluated. It might be possible to extend this (perhaps to be denoted “Phase IIIb”?) to ask whether diagnostic test results distinguish those with and without the target disorder among those in whom the test could be used, or to address the clinical problem for which the test could be evaluated. For example, since the dementia syndrome encompasses many domains, could cognitive scales be used to identify patients with cognitive complaints who in fact have depression rather than dementia, even though this was not what the scale was originally designed for? This is simply to recognise the clinical truism that cognitive complaints and impairments have a differential diagnosis which encompasses both cognitive and affective disorders.
Although at first glance this might seem somewhat counterintuitive or even inappropriate, there are some examples available of such a “phase IIIb” approach, where the utility of a cognitive scale in differentiating cognitive and psychiatric causes of memory disturbance has been shown, for example using a computerized battery (CANTAB-PAL; Swainson et al. 2001), the Addenbrooke’s Cognitive Examination (ACE; Dudas et al. 2005) and its revision, ACE-R (P Hancock, cited in Larner 2014c:168–9; Rotomskis et al. 2015). Conversely, depression screening instruments have been examined in pragmatic test accuracy studies for the diagnosis of dementia, such as the Patient Health Questionnaire-9 (PHQ-9; Hancock and Larner 2009) and the Cornell Scale for Depression in Dementia (CSDD; Hancock and Larner 2015). Although both the latter proved inadequate for dementia diagnosis in terms of sensitivity and specificity, nevertheless they were of pragmatic value in identifying patients who might benefit from trials of antidepressant medication (Hancock and Larner 2009, 2015).
Such diagnostic test accuracy studies have the potential to broaden or extend the scope of test use. For example, the Montreal Cognitive Assessment (MoCA), originally designed to diagnose Alzheimer’s disease and mild cognitive impairment (Nasreddine et al. 2005), has found application for the diagnosis of cognitive impairment in many other neurological disorders including cerebrovascular disease, Parkinson’s disease, Huntington’s disease, brain tumours, systemic lupus erythematosus, substance use disorders, obstructive sleep apnoea, and epilepsy (Julayanont et al. 2013). MoCA might also have a role as a cognitive screener in psychiatric disease (Musso et al. 2014) and has been recommended as one of the screening tools to detect fluctuations in cognition in the context of delirium (Inouye et al. 2014). All of these scenarios require “phase IIIb” diagnostic test accuracy studies. Mini-Mental Parkinson (MMP), originally designed to identify cognitive impairments in Parkinson’s disease (Mahieux et al. 1995), may also have utility as a general cognitive screening instrument (Larner 2012b).
6.2.1.5 Prognostic Diagnostic Test Accuracy Studies: Phase IV/“IVb”
Phase IV questions (Sackett and Haynes 2002) ask whether patients undergoing a diagnostic test fare better in their ultimate health outcomes than similar patients who do not (prognostic test impact), and hence constitute an evaluation of the entire diagnostic test-treatment pathway for clinical effectiveness (Ferrante di Ruffano et al. 2012). Such phase IV studies remain unexplored in dementia, undoubtedly because of the currently very limited treatment options (symptomatic treatment with cholinesterase inhibitors and/or memantine). When more meaningful dementia treatments become available, this will be an important area for diagnostic test accuracy studies. Due to the nature of the disease, however, such studies will be of long duration, especially if commenced at early, even presymptomatic, stages, with many years of follow-up required to test the efficacy of interventions. They might be easier to undertake in dementia disorders with currently poor survival rates, such as prion disease or frontotemporal dementia with motor neurone disease.
Phase IV questions add layers of complexity to diagnostic test accuracy studies. For example, a study of which diagnostic modalities should be used to identify the epileptogenic zone in patients with refractory epilepsy being considered for surgery found that extant diagnostic test accuracy studies could not provide information on diagnostic accuracy or clinical utility, with all likelihood ratios close to unity (Burch et al. 2012). Dichotomizing the data into a standard 2 × 2 table proved difficult, and indeed may be inappropriate when the object of tests is something other than identifying the presence or absence of disease (Burch et al. 2012).
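To illustrate why likelihood ratios close to unity indicate a test of little diagnostic value, the following Python sketch uses the standard definitions of the positive and negative likelihood ratios and the odds form of Bayes’ theorem. The function names and the sensitivity, specificity and pre-test probability values are hypothetical, chosen only for illustration; they are not taken from Burch et al. (2012).

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 < specificity < 1.0 and 0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must lie in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical test with sensitivity 0.55 and specificity 0.50.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.55, specificity=0.50)

# With a hypothetical pre-test probability of 0.50, neither result moves
# the probability far from where it started.
after_positive = post_test_probability(0.50, lr_pos)
after_negative = post_test_probability(0.50, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"post-test probability: {after_positive:.2f} (positive), {after_negative:.2f} (negative)")
```

In this hypothetical case both likelihood ratios lie near 1 (approximately 1.1 and 0.9), so a positive or a negative result shifts the probability of the target condition by only a few percentage points in either direction.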
It is inevitable, in the author’s opinion, that phase IV questions will also eventually attract health economic evaluations (“Phase IVb”?), as for randomised controlled trials of dementia treatments (Wimo et al. 2013). In times of economic recession or retrenchment, cost implications will factor highly with those charged with commissioning services, over and above issues of efficacy which are of most concern to clinicians and patients.
6.2.1.6 Policy Decisions
Because of the societal and economic impact of the increasing prevalence of dementia associated with the demographic ageing of populations, dementia has become a subject of interest to politicians as well as clinicians and neurobiologists (e.g. in the UK, the Prime Minister’s Challenge on Dementia; Department of Health 2012a). A consequence of this interest is the promulgation of policies, ostensibly aimed at improving the lot of dementia sufferers, which may have consequences rather different to those anticipated (if contemplated at all) by the policy formulators.