Evidence-Based Medicine in Neurology



Over the past 20 years more attention has been given to the quality of evidence underlying medical decision making. While each discipline in medicine may have different disease entities and treatments for these diseases, basic approaches to analyzing diagnostic tests, prognosis, medications, and surgical interventions cross disciplinary lines. The science of designing, implementing, and evaluating the outcomes of clinical research falls under the rubric of evidence-based medicine, a term coined by physicians at McMaster University in Canada to emphasize the primacy of excellent-quality evidence in determining medical decisions.

In the past, much of medical teaching revolved around passing on the experience of senior physicians during an apprenticeship period. It has become clear that the observation of individual patients and outcomes, while important in medical care, can be deceiving when it comes to understanding the efficacy of medical interventions such as surgical therapies, medications, and clinical practices. Clinical trial design has become more sophisticated in an attempt to provide data that are as free of bias as possible and that, when reviewed by experts in the clinical area and in trial design, can be judged to be valid and a true measure of efficacy for that population.

It also has been shown that theoretical approaches to disease management, such as extrapolating from animal model results or reasoning from pathophysiology, are rife with error. Patients are not lab animals, and randomized trials may not reproduce how we cogitate about disease entities. Time and time again randomized trials have produced results that were counterintuitive but scientifically robust.

Although it is beyond the scope of this text to fully review evidence-based medicine, it may be useful to have a description of approaches to analyzing the medical literature.

For diagnostic tests:



  • Has this test been analyzed in a patient sample that is representative of the spectrum of disease and the types of patients who would be tested in a real-world setting?


  • Is there a “gold standard” that this test is compared to independently and in a blind fashion?



  • If used, does this test change practice in some positive fashion (e.g., speed diagnosis, reduce costs, reduce inconvenience, reduce pain, improve sensitivity or specificity of diagnosis, lead to improvement of treatments, etc.)?
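The sensitivity and specificity mentioned in the last question come from comparing the test with the gold standard in a two-by-two table. The following is a minimal sketch of that arithmetic; the class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test against the gold standard."""

    true_pos: int   # test positive, disease present
    false_pos: int  # test positive, disease absent
    false_neg: int  # test negative, disease present
    true_neg: int   # test negative, disease absent

    def sensitivity(self) -> float:
        # Proportion of patients with the disease whom the test flags.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of patients without the disease whom the test clears.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts, invented for illustration only.
example = TwoByTwo(true_pos=90, false_pos=20, false_neg=10, true_neg=180)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

These figures are only as trustworthy as the sample and the gold standard behind them, which is why the first two questions come before the arithmetic.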

For treatment studies:



  • Were patients randomly assigned to treatment or comparison groups? Was there allocation bias? Were the patients representative of the ultimate target treatment group?


  • Were all patients entered in the study accounted for at the end of the study?


  • Were the outcomes significantly different between treatment and comparison groups? Were these differences clinically important?


  • Are the cost, inconvenience, side-effect profile, and other operational features of the treatment justified by the change in outcome?


  • Were the outcomes measured in patient-oriented parameters that mattered (death, morbidity, patient satisfaction, cost, etc.) rather than disease-oriented outcomes (blood pressure, blood urea nitrogen, carotid luminal diameter, etc.)?

For review articles:



  • Were explicit, systematic methods used to determine and assess the articles reviewed? If this is a systematic review, were the methods of review explicitly expressed?


  • Does the reviewer have biases related to his or her own research and publication or links to pharmaceutical or medical equipment manufacturers that may sway the review?

In neurology, evidence is just as important as it is in other areas. At times using EBM approaches in neurology creates problems that should at least be recognized.



  • Rare diseases are common in neurology, and rare diseases may be difficult to study using randomized trial designs because large numbers of patients may be hard to come by.


  • Outcome measures may be soft in some areas of neurology with slowly developing change (e.g., progression in multiple sclerosis, worsening in Alzheimer disease) so that surrogate markers such as MRI or cognitive testing are used, which may or may not express meaningful changes in disease outcomes.


  • Time constraints in acute neurologic disease may mean that only a small sample of a population at risk can enter studies,
    threatening the generalizability and utility of treatments based on such a population (e.g., acute stroke trials).

Having said this, many of the new treatments referenced in this text are based on sound clinical trials methodology and will, it is hoped, stand up to the test of time. Where possible we have referenced levels of evidence or grades of recommendation using the Oxford Centre for Evidence-Based Medicine guidelines for such grading (Table 36.1). We have also tried to note where the evidence is lacking in an area of treatment.

A note about ARR, RRR, and NNT, and their use in neurology: A key element in assessing whether a treatment is “worthwhile,” even if it is statistically better than the comparator, is the absolute risk reduction (ARR) of the treatment, which can also be expressed as a number needed to treat (NNT). The ARR is the absolute difference in the rate of an adverse event of interest between the control group and the treatment group. It is expressed arithmetically as the absolute value of the event rate in the treatment group minus the event rate in the comparison group. For example, if 10% of patients in the control group have a stroke and 5% of patients in the treatment group do, the absolute risk reduction is 5%.

Such results are often expressed as relative risk reduction (RRR), which is the proportion of the adverse events that would have occurred in the control group that are avoided by the treatment, that is, the ARR divided by the control event rate. In the example given above, the RRR would be 50% (5% divided by 10%). Pharmaceutical reports often emphasize RRR, which sounds larger than ARR and may give an inflated impression of the absolute risk reduction, which is after all the “bang for the buck” of treatment.

Another way of expressing the absolute risk reduction is by using number needed to treat (NNT). This is the inverse of the ARR. For example, with an absolute risk reduction of 5%, one needs to treat 20 people to avoid one adverse outcome (1/ARR). With an absolute risk reduction of 1%, one would need to treat 100 people to avoid the outcome. NNT is a useful measure of clinical utility.
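The three measures can be computed side by side. The following is a minimal sketch using the stroke example from the text (10% of control patients and 5% of treated patients with an event); the function names are illustrative and not drawn from any particular library.

```python
def absolute_risk_reduction(control_rate: float, treatment_rate: float) -> float:
    """ARR: absolute difference in event rates between the two groups."""
    return abs(control_rate - treatment_rate)


def relative_risk_reduction(control_rate: float, treatment_rate: float) -> float:
    """RRR: the ARR as a proportion of the control event rate."""
    return absolute_risk_reduction(control_rate, treatment_rate) / control_rate


def number_needed_to_treat(control_rate: float, treatment_rate: float) -> float:
    """NNT: the inverse of the ARR."""
    return 1 / absolute_risk_reduction(control_rate, treatment_rate)


# Event rates from the worked example in the text.
control, treatment = 0.10, 0.05

arr = absolute_risk_reduction(control, treatment)
rrr = relative_risk_reduction(control, treatment)
nnt = number_needed_to_treat(control, treatment)

print(f"ARR = {arr:.0%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

The same 50% RRR would result from event rates of 2% and 1%, yet the ARR would then be only 1% and the NNT 100, which is why the absolute figures say more about clinical utility.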


GLOSSARY OF EVIDENCE-BASED MEDICINE TERMS

Absolute risk reduction: the absolute difference in event rates of an adverse event of interest between the control group and treatment group.

Blinding: any of the patients, clinicians, pharmacy staff, technicians, assessors, and so on can be blinded, and a study may blind several of these groups at once (multiple levels of blinding). Blinding means that some or all of these individuals were unaware of which study group the patient was assigned to.










TABLE 36.1. Oxford Centre for Evidence-based Medicine Levels of Evidence (May 2001)

Level 1a:
  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of RCTs
  • Prognosis: SR (with homogeneity*) of inception cohort studies; CDR validated in different populations
  • Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; CDR with 1b studies from different clinical centers
  • Differential Diagnosis/Symptom Prevalence Study: SR (with homogeneity*) of prospective cohort studies
  • Economic and Decision Analyses: SR (with homogeneity*) of Level 1 economic studies

Level 1b:
  • Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval)
  • Prognosis: Individual inception cohort study with ≥80% follow-up; CDR validated in a single population
  • Diagnosis: Validating** cohort study with good††† reference standards; or CDR tested within one clinical center
  • Differential Diagnosis/Symptom Prevalence Study: Prospective cohort study with good follow-up****
  • Economic and Decision Analyses: Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses

Level 1c:
  • Therapy/Prevention, Aetiology/Harm: All or none§
  • Prognosis: All or none case-series
  • Diagnosis: Absolute SpPins and SnNouts††
  • Differential Diagnosis/Symptom Prevalence Study: All or none case-series
  • Economic and Decision Analyses: Absolute better-value or worse-value analyses††††

Level 2a:
  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
  • Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
  • Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
  • Differential Diagnosis/Symptom Prevalence Study: SR (with homogeneity*) of 2b and better studies
  • Economic and Decision Analyses: SR (with homogeneity*) of Level >2 economic studies

Level 2b:
  • Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
  • Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; Derivation of CDR or validated on split-sample§§§ only
  • Diagnosis: Exploratory** cohort study with good††† reference standards; CDR after derivation, or validated only on split-sample§§§ or databases
  • Differential Diagnosis/Symptom Prevalence Study: Retrospective cohort study, or poor follow-up
  • Economic and Decision Analyses: Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses

Level 2c:
  • Therapy/Prevention, Aetiology/Harm: “Outcomes” Research; Ecological studies
  • Prognosis: “Outcomes” Research
  • Diagnosis: no entry
  • Differential Diagnosis/Symptom Prevalence Study: Ecological studies
  • Economic and Decision Analyses: Audit or outcomes research

Level 3a:
  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies
  • Prognosis: no entry
  • Diagnosis: SR (with homogeneity*) of 3b and better studies
  • Differential Diagnosis/Symptom Prevalence Study: SR (with homogeneity*) of 3b and better studies
  • Economic and Decision Analyses: SR (with homogeneity*) of 3b and better studies

Level 3b:
  • Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
  • Prognosis: no entry
  • Diagnosis: Non-consecutive study; or without consistently applied reference standards
  • Differential Diagnosis/Symptom Prevalence Study: Non-consecutive cohort study, or very limited population
  • Economic and Decision Analyses: Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations

Level 4:
  • Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies§§)
  • Prognosis: Case-series (and poor quality prognostic cohort studies***)
  • Diagnosis: Case-control study, poor or non-independent reference standard
  • Differential Diagnosis/Symptom Prevalence Study: Case-series or superseded reference standards
  • Economic and Decision Analyses: Analysis with no sensitivity analysis

Level 5:
  • Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  • Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  • Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  • Differential Diagnosis/Symptom Prevalence Study: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  • Economic and Decision Analyses: Expert opinion without explicit critical appraisal, or based on economic theory or “first principles”

Notes


Users can add a minus-sign “-” to denote a level of evidence that fails to provide a conclusive answer because of:
• EITHER a single result with a wide Confidence Interval (such that, for example, an ARR in an RCT is not statistically significant but whose confidence intervals fail to exclude clinically important benefit or harm)
• OR a Systematic Review with troublesome (and statistically significant) heterogeneity.
• Such evidence is inconclusive, and therefore can only generate Grade D recommendations.
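
The wide-confidence-interval case in the note above can be made concrete. The following is a minimal sketch that assumes a normal-approximation (Wald) 95% interval for the ARR; that choice of interval, the function names, the trial counts, and the 2% threshold for a clinically important effect are all assumptions made for illustration, not part of the Oxford Centre guidance.

```python
import math


def arr_confidence_interval(events_control: int, n_control: int,
                            events_treatment: int, n_treatment: int,
                            z: float = 1.96) -> tuple[float, float]:
    """Wald interval for the ARR (control rate minus treatment rate)."""
    p_c = events_control / n_control
    p_t = events_treatment / n_treatment
    arr = p_c - p_t
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    return arr - z * se, arr + z * se


def is_inconclusive(interval: tuple[float, float], important_effect: float) -> bool:
    """True when the result is not significant yet an important effect is not excluded."""
    low, high = interval
    not_significant = low <= 0.0 <= high
    # The interval still reaches a clinically important benefit or harm.
    cannot_exclude = high >= important_effect or low <= -important_effect
    return not_significant and cannot_exclude


# Hypothetical trials with the same event rates (10% vs 5%) but different sizes.
small_trial = arr_confidence_interval(10, 100, 5, 100)
large_trial = arr_confidence_interval(1000, 10000, 500, 10000)

print(small_trial, is_inconclusive(small_trial, important_effect=0.02))
print(large_trial, is_inconclusive(large_trial, important_effect=0.02))
```

Under these assumptions the small trial would carry the minus sign, because its interval spans zero while still reaching a clinically important benefit; the large trial would not.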


* By homogeneity we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a “-” at the end of their designated level.

CDR: Clinical Decision Rule. (These are algorithms or scoring systems which lead to a prognostic estimation or a diagnostic category.)

See note #2 for advice on how to understand, rate and use trials or other studies with wide confidence intervals.

§ Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.

§§ By poor quality cohort study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and non-exposed individuals and/or failed to identify or appropriately control known confounders and/or failed to carry out a sufficiently long and complete follow-up of patients. By poor quality case-control study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or failed to identify or appropriately control known confounders.

§§§ Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into “derivation” and “validation” samples.

†† An “Absolute SpPin” is a diagnostic finding whose Specificity is so high that a Positive result rules-in the diagnosis. An “Absolute SnNout” is a diagnostic finding whose Sensitivity is so high that a Negative result rules-out the diagnosis.
‡‡ Good, better, bad and worse refer to the comparisons between treatments in terms of their clinical risks and benefits.

††† Good reference standards are independent of the test, and applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a non-independent reference standard (where the ‘test’ is included in the ‘reference’, or where the ‘testing’ affects the ‘reference’) implies a level 4 study.

†††† Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive.

** Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g. using a regression analysis) to find which factors are ‘significant’.

*** By poor quality prognostic cohort study we mean one in which sampling was biased in favour of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.

**** Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (e.g., 1-6 months acute, 1-5 years chronic)


Grades of Recommendation
A: consistent level 1 studies
B: consistent level 2 or 3 studies or extrapolations from level 1 studies
C: level 4 studies or extrapolations from level 2 or 3 studies
D: level 5 evidence or troublingly inconsistent or inconclusive studies of any level


“Extrapolations” are where data are used in a situation which has potentially clinically important differences from the original study situation.
Produced by Bob Phillips, Chris Ball, Dave Sackett, Doug Badenoch, Sharon Straus, Brian Haynes, Martin Dawes since November 1998.
Reproduced with permission, 2007
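
The grades of recommendation listed with Table 36.1 can be read as a lookup from level of evidence to grade. The following is a minimal sketch of that reading; the function name and its parameters are hypothetical, and the handling of extrapolations from level 4 studies, which the listed grades do not cover, is an assumption of this sketch.

```python
def grade_of_recommendation(level: str, extrapolated: bool = False,
                            consistent: bool = True) -> str:
    """Map a level such as '1a', '2b-' or '4' to a grade from A to D."""
    inconclusive = level.endswith("-")  # minus sign marks inconclusive evidence
    core = level.rstrip("-")
    if not core or core[0] not in "12345":
        raise ValueError(f"unrecognized level of evidence: {level!r}")
    major = int(core[0])

    # Grade D: level 5, or inconsistent or inconclusive studies of any level.
    if major == 5 or inconclusive or not consistent:
        return "D"
    if major == 1:
        return "B" if extrapolated else "A"
    if major in (2, 3):
        return "C" if extrapolated else "B"
    # Level 4. Extrapolation from level 4 is not covered by the listed grades;
    # falling back to D here is this sketch's own conservative assumption.
    return "D" if extrapolated else "C"


for example_level in ("1a", "2b", "4", "5", "1b-"):
    print(example_level, grade_of_recommendation(example_level))
```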


Oct 20, 2016 | Posted by in NEUROLOGY | Comments Off on Evidence-Based Medicine in Neurology
