Introduction




Cognitive Function Clinic, Walton Centre for Neurology and Neurosurgery, Liverpool, UK

Abstract

This chapter examines the introductory elements in the report of a diagnostic test accuracy study. Central to this is the definition of the research question to be examined. An important distinction needs to be drawn between proof-of-concept or experimental studies, which are particularly appropriate for new diagnostic tests, and which may be undertaken in ideal or extreme contrast settings; and pragmatic studies which recruit consecutive patients and hence are more reflective of the idiom of day-to-day clinical practice.


Keywords
Dementia · Diagnostic test accuracy studies · Research question · Bias



1.1 Prologue


The need for diagnostic test accuracy studies is self-evident to any clinician. Although some diagnoses can be made on history from patient and informant alone (perhaps particularly in neurology and psychiatry), more often than not further testing by means of examination and investigation is needed to confirm or refute diagnostic hypotheses emerging from the history (Larner et al. 2011). Clinicians need to know the diagnostic accuracy of such examination signs and diagnostic tests. Hence the requirement for diagnostic test accuracy studies is well-recognised (Cordell et al. 2013). Studies to generate such data require methodological rigour to ensure their utility and applicability. This is not some sterile academic exercise in arid numeration, but a vital process to appreciate the benefits and limitations of diagnostic tests and to promote their intelligent, rather than indiscriminate, use. Evidently, reliable diagnosis will pave the way for many processes, including but not limited to the giving of information to patients and their relatives, the initiation of symptomatic and/or disease modifying treatment, and the planning of care needs.

The quality of diagnostic test accuracy studies may be evaluated using methodological quality assessment tools (e.g., Scottish Intercollegiate Guidelines Network 2007), of which the best known and most widely adopted are the STAndards for the Reporting of Diagnostic accuracy studies (STARD; Bossuyt et al. 2003; Ochodo and Bossuyt 2013) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS; Whiting et al. 2004) and its revision (QUADAS-2; Whiting et al. 2011). These initiatives were in part a consequence of the perception that diagnostic test accuracy study methodology was of poorer quality than that used in studies of therapeutic interventions (randomised double-blind placebo-controlled trials).

STARD is a prospective tool which may be used to plan and implement well-designed studies, relatively free of bias. It includes a checklist of 25 items and a flow chart which should be followed to optimise study design and reporting. QUADAS is a retrospective instrument used to assess the methodological rigour of diagnostic accuracy studies, using 14 criteria to assess the quality of research studies. High quality diagnostic test accuracy studies not only inform clinicians in their decision making but also may be suitable for inclusion in meta-analyses, which have their own guidelines for performance and reporting (the PRISMA statement; Liberati et al. 2009; Moher et al. 2009; Shamseer et al. 2015).

More recently, guidelines for diagnostic test accuracy studies specific to dementia have been published by the Cochrane Dementia and Cognitive Improvement Group (Noel-Storr et al. 2014), based on the original STARD guidelines. This STARDdem initiative (www.starddem.org) acknowledged areas where revisions of STARD pertinent to dementia and cognitive disorders were required, as well as highlighting areas in which reporting has hitherto been poor.

The diagnosis of dementia poses many challenges. Dementia and cognitive impairment are syndromes with many potential causes (Mendez and Cummings 2003; Kurlan 2006; Ames et al. 2010; Dickerson and Atri 2014; Quinn 2014), including many neurological and non-neurological diseases (Larner 2013a). The clinical heterogeneity of the casemix in clinics dedicated to the assessment of cognitive disorders is a given, unless clinicians impose significant selection at the referral stage, for example through exacting clinical inclusion and exclusion criteria.

Moreover, cognitive impairment is a process rather than an event (with the possible exception of strategic infarct dementia, a fairly rare occurrence) and hence often of changing severity over time. The evolution of cognitive decline (illustrated both beautifully and harrowingly in Matthew Thomas’s novel We Are Not Ourselves, 2014) means that early signs are often passed off or explained away, and hence delay in presentation for clinical assessment is common. Patients with dementia disorders may therefore present at different stages of disease, of variable clinical severity.

An added complication in diagnosis, and one brought more sharply into focus by the drive to early diagnosis and initiation of disease-modifying drugs (when these become available), is correct identification of patients in early disease stages, before criteria for dementia are fulfilled. Various terminologies have been used for such states (mild cognitive impairment, cognitive impairment no dementia, mild cognitive dysfunction); indeed, a lexicon has been proposed (Dubois et al. 2010). Certainly the old binary classification for the diagnosis of Alzheimer’s disease (Is it dementia? If so, is it Alzheimer’s disease? McKhann et al. 1984) has been rejected in favour of diagnosis based on disease biomarkers (Dubois et al. 2007; McKhann et al. 2011), a move from understanding AD as a clinicopathological entity to understanding it as a clinicobiological entity (Dubois et al. 2014). Disease biomarkers may be positive long before clinical features become apparent (Bateman et al. 2012; Jack et al. 2013). Diagnostic studies in dementia may therefore be either cross-sectional, the typical paradigm of clinical practice, or longitudinal, the delayed verification paradigm. Passage of time is certainly one of the most informative diagnostic tests for dementia syndromes, but its application may result in opportunities for early treatment being missed.

Diagnostic test accuracy studies which score highly on the STARD/QUADAS ratings may not necessarily reflect the situations encountered by clinicians in daily practice. For example, such studies may have been undertaken in populations selected for a known diagnosis and compared with normal controls, a situation alien to day-to-day clinical practice. Pragmatic diagnostic test accuracy studies (Larner 2012a, 2014a:33–5) may therefore also be required, to provide information supporting or refuting a given diagnosis suspected on clinical assessment. This is analogous to the need for pragmatic studies of treatments to supplement the findings of double-blind placebo-controlled randomized controlled trials (Marson et al. 2005). This book examines some of the practicalities of performing diagnostic test accuracy studies, particularly from a pragmatic perspective.

A note should be appended here about whether tests are being used for diagnosis or for screening. Some authorities appear to envisage screening as a process applied to asymptomatic individuals with early disease (Sackett and Haynes 2002a:33), although the widely accepted World Health Organization (WHO) Screening Criteria (Wilson and Jungner 1968) do not seem to require that the condition being screened for is asymptomatic, merely that it has a “recognised latent or presymptomatic stage”. Many tests used in the evaluation of patients with memory complaints which may possibly be a consequence of brain disease are not diagnostic per se, but may indicate those patients who are, or might be (“at risk”), in an early symptomatic phase and require further investigation to confirm or refute a suspected diagnosis. This is perhaps particularly the case for cognitive screening instruments (Larner 2013b), hence this nomenclature. Many factors other than the presence of a dementia disorder may conspire to produce poor patient performance on these measures, such as sleep disturbance, primary sensory deficits, or affective disorder. In other words, tests which are not examining biomarkers may be influenced by factors other than the disease per se. Hence these tests may be able to do no more than screen patients for the possible presence of a dementing disorder (although some claim to be relatively specific for Alzheimer’s disease). Debate about the value of screening of whole populations for cognitive impairment, which will inevitably include testing of large numbers of asymptomatic individuals, continues (e.g., Fox et al. 2013).

With increasing efforts to define neurodegenerative disorders such as Alzheimer’s disease as clinicobiological, rather than clinicopathological, entities (Dubois et al. 2007, 2014), it may be that truly diagnostic tests, addressing the biology of disease, will be forthcoming, such as CSF and neuroimaging biomarkers (some of which are considered in Sect. 4.3). Even if this is so, such sophisticated tests may not be universally, or indeed widely, available, and hence the use of cognitive screening instruments rather than diagnostic (biomarker) tests may persist. Both screening and biomarker tests require assessment using test accuracy studies, but in these circumstances the former may be better denoted as “screening test accuracy studies” rather than “diagnostic test accuracy studies”. In the interests of simplicity the latter term has been used throughout this book (although the screening utility of clinical signs and cognitive instruments has been acknowledged in previous publications, e.g., Larner 2007a, 2012b, c, 2014b). The issue of developing tests to screen for asymptomatic individuals who may be harbouring dementing disorders, and the nature of the test accuracy studies required for them, is one of the key areas for the future (Sect. 6.2.1.2).


1.2 Title/Abstract/Keywords


Little advice need be proffered on the subject of the title of an article reporting a diagnostic test accuracy study. Whereas in case reports or case series a catchy, alliterative, interrogative, ambiguous, or teasing title may be important in order to garner reader attention for what is essentially anecdotal evidence (Ghadiri-Sani and Larner 2014a), in diagnostic test accuracy studies no such gimmickry is required (or desired). The article title should be entirely informative, and should perhaps use the exact terminology (“a diagnostic test accuracy study”) to alert potential readers and to avoid all ambiguity. At the time of writing (January 2015), searching PubMed for the title words “diagnostic accuracy” coupled with “dementia” or “Alzheimer’s” achieves few hits (<50).

Similar considerations inform the content of the abstract and the choice of keywords. Many journals require structured abstracts which facilitate the inclusion of key information items, for example on the background, aims/objectives, setting, results, and conclusions of the study.

Both STARD and STARDdem recommend the use of “sensitivity and specificity” amongst the keywords of papers reporting diagnostic test accuracy studies. At the time of writing (January 2015), searching PubMed for the title words “sensitivity” and “specificity” coupled with “dementia” or “Alzheimer’s” achieves few hits (<50). In addition, it would seem appropriate to use as keywords the name of the test(s) under investigation (although new tests may not yet have acquired MeSH terms) and the setting of the study. These considerations not only inform the individual reader but should also facilitate the inclusion of studies in systematic reviews and meta-analyses.


1.3 Introduction


The purpose of the introduction is to give some background to the current study by contextualising it within the framework of previous studies looking at similar or related issues. This should be a brief critical appraisal (for which methodologies exist: Crombie 1996) of the current state of knowledge (“What is already known”), as well as pointing out any lacunae or shortcomings of the current knowledge base and hence what further information might be desirable. Critical appraisal of the state of current knowledge helps to define the optimal research question (Knottnerus and Muris 2002:43).

In the context of studies related to the diagnosis of dementia, it is often the case that authors will allude to the current frequency of the disorder as estimated in epidemiological studies, the likely growth in the numbers of dementia patients as a consequence of projected demographic changes (i.e., ageing of the global population), and the economic consequences of these demographic changes. All of these topics have been the subject of many national and international studies and consensus statements in recent years (e.g., Wimo and Prince 2010; World Health Organization 2012; Prince et al. 2013; Alzheimer’s Society 2014), and have galvanised policy makers and clinicians to initiate programmes to try to address these issues (e.g., the National Dementia Strategy in England; Department of Health 2009, 2012a, b). The long preclinical phase of some dementia disorders, particularly Alzheimer’s disease, with evidence for change in biomarkers from 15 to 25 years before predicted onset of clinical symptoms or diagnosis (Bateman et al. 2012; Jack et al. 2013), and the potential preventability of a significant percentage of cases (Norton et al. 2014), may also be alluded to.

In the case of a diagnostic test accuracy study, the potential benefits of early diagnosis of dementia may be discussed in the introduction, as well as the possible costs of delayed diagnosis (Prince et al. 2011). The existence of a “dementia diagnosis gap”, the discrepancy between the observed (diagnosed) versus expected frequency of dementia (based on epidemiological expectations), which has been well documented in the United Kingdom (Alzheimer’s Society 2011, 2013), may also be cited as leverage for the importance of using diagnostic tests (Cagliarini et al. 2013; Ghadiri-Sani and Larner 2014b; Larner 2010a, 2014c; Menon and Larner 2011) and hence for the importance of diagnostic test accuracy studies in dementia.

Some information on the particular diagnostic test which is being examined will also likely feature in the introduction, such as details of its original development or description and subsequent modification as necessary (with appropriate citations), and perhaps a summary of previous test accuracy studies, their findings and deficiencies, and hence the rationale for further studies (e.g., in a different clinical setting, or with a different patient cohort). Occasionally, for the more frequently used diagnostic tests, meta-analyses of relevant test accuracy studies may have been undertaken.

The introduction gives the background which should lead seamlessly into the aims or objectives of the diagnostic test accuracy study, framed in terms of the research question.


1.3.1 Research Question


Framing the research question is pivotal in diagnostic test accuracy studies.

Sackett and Haynes (2002a, b) have described an “architecture of diagnostic research” which divides research questions into four relevant types or phases. (A similar nomenclature, but with different meanings for each phase, is also proposed by Gluud and Gluud 2005.) Sackett and Haynes suggested that the “question is everything”, which is true if clinically meaningful answers are to be reached. The suggested questions or categories may overlap to some extent with the levels in the hierarchical model of diagnostic test evaluation described previously by Fryback and Thornbury (1991). Other such frameworks have also been described (Lijmer et al. 2009). Possible further developments of the framework outlined here may be required in the future (see Sects. 6.2.1.4 and 6.2.1.5).


1.3.1.1 Phase I/II


Phase I and II questions ask respectively whether patients with the target disorder have different test results from normal individuals, and whether patients with certain test results are more likely to have the target disorder than patients with other test results. Both these questions may potentially be answered from the same dataset (Sackett and Haynes 2002a). These phases may be akin to level 2 of the hierarchical model of Fryback and Thornbury (1991), “diagnostic accuracy efficacy”, defined as the determination of diagnostic outcomes such as sensitivity, specificity and predictive values in patients with and without the target disorder.
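For reference, these outcome measures derive from the standard 2 × 2 contingency table. Using the conventional cell labelling (assumed here to match that anticipated in Sect. 3.2 and Fig. 3.1: a = true positives, b = false positives, c = false negatives, d = true negatives), they may be written:

```latex
\text{Sensitivity} = \frac{a}{a+c}, \qquad \text{Specificity} = \frac{d}{b+d}, \qquad
\text{PPV} = \frac{a}{a+b}, \qquad \text{NPV} = \frac{d}{c+d}
```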

Studies asking questions of this type may be characterised as experimental, case-control, case-referent, or proof-of-concept studies. They may be carried out retrospectively, i.e., after the course and final status (e.g., diagnosis) of patients are known (Knottnerus and van Weel 2002:13), and hence may be described as taking place in “ideal circumstances” (Sackett and Haynes 2002a:19; Hancock and Larner 2011:976) or “extreme contrast” settings (Knottnerus and Muris 2002:39). Specifically, subjects for such studies are often recruited on the basis of known disease status, that is, they either have the target disease or they are normal. These studies may occur in research, rather than specifically clinical, settings.

This approach is eminently suitable for the assessment of new diagnostic tests, particularly when test results require some degree of interpretative skill. Relatively small sample sizes may be sufficient to answer phase I and II questions because of the marked contrast between diseased and normal subjects, although such studies are vulnerable to selection and spectrum bias (Sects. 1.3.1.1 and 1.3.1.2). For example, the inclusion of normal controls (i.e., true negatives) in the “extreme contrast” study setting will inflate test specificity (item d in a 2 × 2 table; see Sect. 3.2 and Fig. 3.1).
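To make this inflation concrete, the following minimal sketch (with wholly hypothetical counts) shows how recruited healthy controls, who contribute disproportionately to cell d, push the specificity estimate upwards:

```python
# Minimal sketch (hypothetical counts) of how recruiting normal controls
# in an "extreme contrast" design inflates specificity.

def specificity(fp: int, tn: int) -> float:
    """Specificity = d / (b + d): true negatives over all non-diseased."""
    return tn / (fp + tn)

# Suppose the non-diseased attendees of a pragmatic clinic cohort yield
# 20 false positives (cell b) and 80 true negatives (cell d):
print(f"Clinic cohort specificity:   {specificity(fp=20, tn=80):.2f}")           # 0.80

# Now add 100 recruited healthy controls, of whom 95 test negative:
print(f"With healthy controls added: {specificity(fp=20 + 5, tn=80 + 95):.2f}")  # 0.88
```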

Although entirely valid in the assessment of new diagnostic tests, this approach is problematic in that it is entirely alien to the idiom of clinical practice: patients generally do not present themselves to clinicians with known diagnoses. Patients present to clinicians to ask whether their symptoms are indicative of disease, and if so whether they are amenable to treatment. In the context of dementia services and memory clinics, a patient with a presenting complaint of a “terrible memory” often wants to know, even if this is not overtly stated, whether these symptoms are the beginnings of dementia or Alzheimer’s disease, particularly if there is a family history of these conditions (Larner 2013c). Furthermore, individuals with subjectively normal memory do not often drop into clinics just to provide a control population (in my experience cognitively normal individuals are only referred to a cognitive clinic if there is a family history of dementia, particularly of early-onset disease, in the hope of having a diagnostic test undertaken to show whether or not they will develop dementia). At minimum, virtually all patients referred to a memory or cognitive disorders clinic will have subjective memory impairment which may or may not reflect an underlying brain disease (Larner 2014a:278–81) – often not, since subjective memory complaints are common, and lack of correlation between subjective and objective memory impairment is often found (Kapur and Pearson 1983), a conclusion tempered by the finding that older people with subjective memory complaints are at increased risk of dementia even in the absence of objective impairment (Mitchell et al. 2014). Hence an approach different from that of phase I and II studies is required for diagnostic test accuracy studies aiming to answer questions relevant to day-to-day clinical practice.


1.3.1.2 Phase III: Pragmatic Diagnostic Test Accuracy Studies


Phase III questions, in the nomenclature of Sackett and Haynes (2002a:24), ask whether the test result distinguishes those with and without the target disorder among those in whom it is clinically sensible to suspect the target disorder (my italics). These questions examine a population of subjects whose diagnosis is not known in advance, hence addressing the clinical problem for which the test should be evaluated (i.e., avoiding spectrum bias: Knottnerus and van Weel 2002:8) and comparable to the experience of a clinical practice setting (Knottnerus and Muris 2002:39). This may be described as the indicated, candidate, or intended population for the test, and may correspond in meaning to “field use”. There seems to be no analogous level in the hierarchy of Fryback and Thornbury (1991) to the phase III question, but there may be overlap with what Gluud and Gluud (2005) call a “phase IIc” study in their proposal for “evidence based diagnosis”.

In these phase III studies discrimination is more difficult because the contrast between normal and abnormal is less than in phase I/II (proof-of-concept) studies: estimates of diagnostic test accuracy are higher with non-consecutive patient inclusion (Rutjes et al. 2006). This is the challenge that clinicians face on a daily basis, and hence they require tests which have been applied in this situation to help answer diagnostic questions. Consecutive series of patients may be required to answer such questions, and a larger sample size is generally required than in phase I and II questions. The phase III nomenclature has been used in some prior studies of diagnostic test accuracy (e.g., Sundar et al. 2007; Bresani et al. 2013), including cognitive screening instruments (e.g., Carnero-Pardo et al. 2011) and functional imaging modalities (e.g., McKeith et al. 2007).
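To illustrate the sample size point, a rough calculation is sketched below (a standard normal-approximation formula; the expected sensitivity, precision, and prevalence figures are assumptions for illustration, not values from the text). The key driver is that only a fraction of consecutively recruited patients will turn out to have the target disorder:

```python
# Rough sample-size sketch for a phase III study recruiting consecutive
# patients, using the standard normal-approximation formula for a
# binomial proportion. All numerical inputs below are hypothetical.
from math import ceil

def n_for_sensitivity(expected_sens: float, precision: float,
                      prevalence: float, z: float = 1.96) -> int:
    """Total consecutive patients needed to estimate sensitivity to within
    +/- `precision` (95% CI half-width), given that only a fraction
    `prevalence` of recruits will have the target disorder."""
    n_cases = (z ** 2) * expected_sens * (1 - expected_sens) / precision ** 2
    return ceil(n_cases / prevalence)

# E.g., estimating an expected sensitivity of 0.80 to within +/- 0.05
# when roughly 40% of clinic attendees have the target disorder:
print(n_for_sensitivity(0.80, 0.05, prevalence=0.40))  # 615 patients
```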

These considerations are also applicable to what have been described elsewhere as “pragmatic diagnostic accuracy studies” (Larner 2012a, 2014a:33–5), and in this book as pragmatic diagnostic test accuracy studies. The term “pragmatic” was first used by the author in this sense as a descriptor in a study of the Addenbrooke’s Cognitive Examination (Larner 2007b:491), and has subsequently been used in the description and/or discussion (Hancock and Larner 2007:134,138; 2008:23,25; 2009a:526; 2009b:passim, 2009c:1237,1240; 2011:passim; Larner 2007c:685; 2010b:393; 2012b:391,392; 2012c:138; 2013d:e426; Sells and Larner 2011:18,21) and titles (Abdel-Aziz and Larner 2015; Larner 2012d, e, 2013e, f, 2014a, d, 2015a, b, c, d) of other studies from this clinic.

Since in clinical practice tests are used to provide arguments for a given diagnosis suspected on clinical assessment, it may also be appropriate for pragmatic diagnostic test accuracy studies to recruit only patients in whom the target diagnosis features in the differential diagnosis, hence a situation-specific (Larner 2013e:107) rather than a consecutive cohort may be recruited. For example, instruments developed to identify specific conditions, rather than just dementia/cognitive impairment, may mandate a more restricted test cohort, such as possible synucleinopathy (Larner 2012e), behavioural variant frontotemporal dementia (Larner 2013e), or amnestic mild cognitive impairment/mild AD (Larner 2015c).

The importance of a “pragmatic clinical perspective” in diagnostic accuracy studies has been advocated by other authors (Richard et al. 2013). The analogous pragmatic randomised controlled treatment trial is also reported to be increasingly popular (Burch et al. 2012:1299), and its possible application to issues of importance in the management of dementia has also been advocated (Larner and Marson 2011).

It might be asked what difference there is between a pragmatic diagnostic test accuracy study and an audit of test use (Larner 2014a:xvi). Clearly there are areas of overlap in methodology, since both have an observational aspect, surveying test use, and may be prospective or retrospective in nature. However, audit seeks to address what is happening in practice, often with respect to compliance with a pre-specified external standard (National Institute for Clinical Excellence 2002). As an illustrative example of such a standard, one might consider the single screening question promulgated in the UK Government Dementia Commissioning for Quality and Innovation (Dementia CQUIN) document, which aimed to promote a proactive approach to dementia diagnosis (Department of Health 2012b). All individuals aged 75 years or over presenting to primary or secondary care for whatever reason are required to be asked: “Have you been more forgetful in the past 12 months to the extent that it has significantly affected your life?” Answers in the affirmative should trigger a “Dementia Risk Assessment”. The precise details of this assessment were not specified, but the recommendations came with financial incentives for compliance. Unsurprisingly then, attempts have been made to improve the screening process to ensure this question is administered (e.g., Mills et al. 2014), but this gives no indication of the sensitivity or specificity of the screening question. Specifically, do those answering yes include large numbers of false positives (we have some empirical evidence that this is in fact the case: Aji and Larner 2015), and those answering no large numbers of false negatives? A pragmatic diagnostic accuracy study would seek to generate such probabilities through inferential statistics (see Chapter 3). The nature of the research question in audit (Is practice compliant?) thus differs significantly from the phase III question of a pragmatic diagnostic test accuracy study: the former is an observational study whereas the latter may be considered to be part of an experimental paradigm. The author has previously and incorrectly, according to these definitions, used the term “audit” in the titles of papers which report diagnostic test accuracy studies (Doran et al. 2005; Larner 2005, 2006a, b), as well as appropriately in reports examining compliance with external standards (Larner 2011). In the former examples, “audit of established practice” may be cognate with “diagnostic test accuracy study” (and hence the study exempt from institutional ethical or research board review).
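By way of illustration, the inferential output such a study would aim for might look like the following minimal sketch (the 2 × 2 counts are invented for the purpose; the Wilson score interval is one standard choice of binomial confidence interval, not necessarily that used in the studies cited):

```python
# Minimal sketch: point estimates and Wilson score 95% confidence
# intervals for the sensitivity and specificity of a screening question.
# The 2 x 2 counts below are hypothetical.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

tp, fp, fn, tn = 30, 45, 10, 115  # hypothetical screening-question counts
sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)
print(f"Sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"Specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```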
