Evidence-Based Medicine: A Conceptual Framework
Stephen J. Haines, Joyce S. Nicholas


What Is Evidence-Based Medicine?


Evidence-based medicine is a way of practicing medicine, of making clinical decisions and solving patients’ problems. It is a discipline (in the sense of a set or system of rules and regulations) that recognizes the importance of assessing the quality of evidence which is incorporated into the complex mixture of science, clinical experience, intuition, and values in the practice of medicine.1 Guyatt and colleagues,2 in coining the term, defined it as a paradigm change that elevated the examination of evidence from clinical research over intuition, pathophysiologic reasoning, and unsystematic clinical experience, defined a set of rules for examining such evidence, and reduced the role of “authority” in clinical decision making.


We propose the following formal definition of evidence-based neurosurgery: a paradigm of neurosurgical practice in which the best available evidence is consistently consulted first to establish principles of diagnosis and treatment that are artfully applied, in light of the neurosurgeon’s training and experience and informed by the patient’s individual circumstances and preferences, to regularly produce the best possible health outcomes.


It is our goal in this book to reinforce the concepts of evidence-based medicine for neurosurgical practitioners by discussing principles, techniques, and examples.


Why Evidence-Based Neurosurgery?


Neurosurgeons typically accept no compromises when it comes to the technical aspects of their surgery, the surgical environment, and the instruments with which they operate. The principles of evidence-based medicine provide tools and information that allow us to make the same demand of the technical aspects of our clinical decision making.


It is difficult to imagine an objection to the concept that the best available evidence should be consistently applied to clinical decision making in neurosurgery. The problem arises in the word consistently. The wide variation in practice among neurosurgeons is perhaps best demonstrated by the studies of Wennberg and colleagues regarding the frequency of laminectomy in Maine.3 In the aftermath of the New York City jogger incident, the variation in head injury management was highlighted.4 Controversy on subjects as basic as the value of resection of malignant brain tumors attests to inconsistency in the application of evidence across the profession.


Neurosurgeons are accustomed to making critical decisions with incomplete information; they are trained to become comfortable doing so. This is necessary in a field dealing with uncommon conditions for which a large evidence base may not exist. Unfortunately, the convenience of making decisions based primarily on past experience, training, and reasoning from basic principles, and the natural tendency to treat a highly trained specialist in an arcane field as an authority, can lead to a habit of assuming that evidence of high quality does not exist and therefore does not need to be incorporated into the decision-making process. Despite the uncommon nature of many neurosurgical illnesses, there is a substantial and rapidly increasing body of quality evidence available to inform neurosurgical practice. The application of the principles of evidence-based medicine to neurosurgical practice allows this evidence to be applied to neurosurgical decision making, reducing the unexplained variation in neurosurgical practice and making neurosurgical care consistently better.


Evidence in Medicine: A Brief History


Progress in medicine has always depended upon the interaction of observations and their interpretation in the context of current belief and knowledge. Progress has been associated with the alteration of the theoretical structure of medical intervention by new observations. The idea that the combined observations of more than one physician might lead to deeper understandings that could be generalized to the practice of many physicians is impossible to attribute to a single person or era. One of its earliest manifestations in Western medicine, however, is attributable to the French physician, Pierre Charles Alexandre Louis, who proposed in 1829 that the routine tabulation of treatments and outcomes for tuberculosis, followed by statistical summarization, could lead to new insights and improvement in treatment.5


The development of the mathematics of probability and of sophisticated statistical methods in the late 19th and early 20th centuries provided tools for more accurately interpreting collected numerical observations. These techniques were readily applicable to easily measured phenomena such as the number, height, and weight of plants growing in a given plot of ground; the height, weight, and mortality rate of people; and, as more and more techniques for measuring chemical values in blood became available, parameters such as sodium and glucose.


The application of such techniques to less easily measured but clinically very important parameters such as pain, emotion, and functional ability developed gradually by necessity in the quantitative social sciences, and in recognition of the psychosocial aspects of patient experience. The use of these techniques to measure clinical phenomena is well summarized by Alvin Feinstein.6


Simultaneous with the development of methods for more reliably measuring the semiquantitative information that makes up much of the physician’s database, techniques for applying the experimental method, so successfully used in the laboratory and in agriculture, were under development in the clinical setting. Much of the early history of the development of randomized clinical trials has been reviewed previously.7,8 The first generally acknowledged true randomized clinical trials are those of Sir Richard Doll studying the treatment of pulmonary tuberculosis and Hart and Daniels studying the treatment of whooping cough.9 The first identified neurosurgical trial is that of McKissock et al, published in 1960.10


The two lines of investigation in the development of clinical research, improving the reliability of semiquantitative measurement and applying the experimental method to the clinical situation, converged in the latter half of the 20th century in the discipline of clinical epidemiology. Called by David Sackett “a basic science for clinical medicine,” clinical epidemiology embraces the full range of clinical observation, measurement, and experimentation.11 The discipline provides a set of tools for applying scientific methods to the clinical practice of medicine in the same way that a different set of tools has been applied to fundamental biological questions.


As the end of the 20th century approached, both basic and clinical medical sciences had established fundamental paradigms for continuously investigating and improving knowledge about biology and disease. There remained the problem of properly applying the results of these investigative techniques to the daily practice of medicine. This was the vision of the Evidence-Based Medicine Working Group at McMaster University when they coined the term and defined evidence-based medicine as a new way of teaching the practice of medicine.2 Evidence-based medicine seeks to provide a set of tools for the practicing physician to artfully interpret the best available research evidence in light of the individual patient’s unique situation and the physician’s past experience to optimize the outcome. In this sense, evidence-based medicine is a discipline of medical practice that insists on rigor and understanding of what is and is not known about a particular disease. It requires a detailed understanding of the individual patient’s condition so that the general information, drawn from the best available evidence, can be appropriately applied to the individual patient.


Quality of Evidence


The fundamental insight of evidence-based medicine and its basic science, clinical epidemiology, is that the quality of evidence is more important than its quantity. Evidentiary quality has several dimensions, each of which must be examined if quality is to be assessed. These dimensions are the clarity of the question that the evidence addresses, the reliability of the measurements, the appropriateness of the analysis, and the soundness of the conclusion.


Clarity is required because the question asked determines what must be measured and how it should be analyzed. A broad question such as, What happens to people with head injuries? leads either to an unfocused set of observations that are ultimately very difficult to apply to individual situations or to a large number of much more restricted and clearly defined questions (for example, What are the functional outcomes one year after minor head injury in children not treated in a hospital?) that become subprojects in the more broadly defined overall investigation. The art of clearly defining a clinical research question is analogous to the art of clearly defining a patient’s chief complaint. The patient may start with a complaint as general as I don’t feel good or My back hurts. Through a series of questions, the clinician must refine that complaint to one that can be appropriately addressed. Below we will discuss the different types of clinical research questions that provide a basis for collecting high-quality clinical evidence to support clinical practice.


Reliability refers to both the validity and the reproducibility of measurement. These concepts are familiar in the laboratory and the carpenter’s shop. If one needs to measure extracellular sodium, the measurement device must measure sodium, not all positively charged ions. Length is measured with a ruler, not a scale. The rule “measure twice, cut once” reflects the carpenter’s demand that repeated measurements be the same (i.e., reproducible within a predetermined acceptable range of error) if they are to be used to guide action.


These concepts are just as important in clinical measurement but much more difficult to implement. The validity of a new instrument can be assessed by comparison with an existing gold standard, but frequently no gold standard exists. Validity must then be inferred by comparison with other indirect measures or a combination of direct measures.6


In clinical applications, reproducibility is measured both between observations made by the same observer (intraobserver or intrarater reliability) and those made by different observers (interobserver or interrater reliability). The measurement of reproducibility is a complex field in and of itself. The measured reproducibility of clinical assessment tools should be known to anyone who uses the tools, or the information generated by them, for research or clinical practice.6
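For readers who wish to see how such agreement is commonly quantified, the sketch below computes Cohen’s kappa, a widely used chance-corrected agreement statistic; it is only one of several reproducibility measures, and the ratings shown are hypothetical rather than data from any study cited in this chapter.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings
    (e.g., a clinical grade) to the same set of patients."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)

    # Observed agreement: proportion of patients given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical example: two examiners grading motor power in 10 patients.
examiner_1 = [5, 4, 4, 3, 5, 5, 2, 4, 3, 5]
examiner_2 = [5, 4, 3, 3, 5, 4, 2, 4, 3, 5]
print(f"kappa = {cohens_kappa(examiner_1, examiner_2):.2f}")
```

A kappa near 1 indicates agreement well beyond chance, whereas a value near 0 indicates agreement no better than chance.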


The appropriateness of the analysis is also important in evidence-based medicine. Reliably acquired measurements resulting from a clear clinical research question can normally be interpreted if the measurements are appropriately analyzed. The choice and conduct of appropriate analysis of clinical research data are an extraordinarily complex and rapidly changing discipline. Few, if any, active clinicians engaged in clinical research should attempt to develop high-quality clinical evidence without consulting an expert in clinical biostatistics or clinical epidemiology. Markers of quality begin with the design of the investigation, which minimizes the introduction of bias. Other markers of quality are the collection of the data, the verification of its reliability, and the use of appropriate analytic techniques, as well as tests to ensure that the planned protection against bias has been successful. The biostatistical consultant or clinical epidemiologist must participate in all phases of design and analysis if the quality of evidence is to be optimized. These issues have been reviewed in detail previously, and several summaries are available.12–14


The soundness of the conclusion of a clinical investigation is a direct outgrowth of the quality of evidence gathered and the degree to which the conclusion is directly related to that evidence. It is a serious temptation for the investigator to draw conclusions that go beyond the limits of available evidence (for example, claiming safety for a new procedure when all that has been demonstrated is a low rate of complications in a relatively small number of cases). Fortunately, if well reported, the data speak for themselves and the alert reader can identify such inappropriate extrapolations.


Quality Assessment of Individual Studies


The largest number of measurable determinants of the quality of a clinical research study is found in the design and conduct of the study. Therefore, the best-accepted schemes that classify a study’s quality are based on the hierarchy of study design. This was first done formally in the formulation of recommendations for the use of antithrombotic therapy.15 Studies of therapy received the earliest focus and the most attention; hence, their rating system is best developed. Similar classifications of studies of prognosis, diagnosis, symptom prevalence, and economic and decision analysis have been developed. These are presented in detail at the Centre for Evidence-Based Medicine Web site (http://www.cebm.net/levels_of_evidence.asp). In these schemes, single studies are generally classified into one of five levels of evidence, with level 1 being evidence of highest quality. Levels 1, 2, and 3 are subdivided to indicate that under some unusual circumstances studies that would not ordinarily be given that quality rating may be included (for example, a formerly fatal disease in which survival is now reliably reported with a new treatment).


For therapeutic studies, level 1 evidence generally comes from well-designed and well-conducted randomized clinical trials with small confidence intervals around the treatment effect. Level 2 evidence comes from randomized clinical trials of lesser quality or well-designed and well-conducted prospective cohort studies (studies of defined, but not randomized, groups of patients followed forward in time with rigorous methodology). Level 3 evidence is generally associated with well-designed retrospective cohort or case control studies, whereas level 4 evidence comes generally from case series or lower quality cohort and case control studies. Level 5 evidence is derived from expert opinion or authoritative statements.


Many attempts have been made to develop scales to rate the quality of clinical trials. One of the earliest was by Chalmers et al in 1981 (Table 1–1).16 The complexity and subjectivity of some of the required judgments have kept this scale from achieving widespread acceptance. For randomized trials, the only validated quality measurement is that of Jadad et al.17 That scale has the advantage of brevity and the disadvantage of assessing only three aspects of study design (Table 1–2).


Level 1 studies of prognosis involve the prospective study of an inception cohort (patients who enter the study at a well-defined and similar stage of disease) with a high percentage of follow-up (>80%). Level 2 evidence comes from retrospective cohort studies or the untreated subgroup of a randomized clinical trial. The next level of prognostic study is the case series (level 3); poorly designed or poorly conducted cohort studies are considered level 4 evidence, and level 5 evidence is again based on expert opinion.


Table 1–1 Chalmers’ Randomized Clinical Trial Assessment Scheme

Dimension                        Number of Items    Possible Points
Basic descriptive information     9                  0
Study protocol                   14                 60
Statistical analysis              9                 30
Presentation of results           4                 10

Source: Adapted from Chalmers TC, Smith H Jr, Blackburn B, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials 1981;2:31–49. Adapted by permission.


Note: Different numbers of points are assigned to each item. Points within the Study Protocol section are allocated differently depending on the type of study end point used.


Table 1–2 Jadad et al Randomized Clinical Trial Assessment Scheme

Question                                                 Points for “Yes” Answer
Was the study described as randomized?                   1
Was the study described as double-blind?                 1
Was there a description of withdrawals and dropouts?     1

Note: An extra point is awarded for the first question if the method of randomization is described and appropriate, and for the second question if the method of blinding is described and appropriate. One point is deducted for each of these items if the method was described but inappropriate.
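To make the scheme in Table 1–2 and its note concrete, the following sketch scores a trial report on the resulting 0-to-5 scale; the report described in the usage example is invented, and the function name and arguments are ours.

```python
def jadad_score(described_randomized, described_double_blind,
                withdrawals_described,
                randomization_method=None, blinding_method=None):
    """Jadad score (0-5) for a report of a randomized trial.

    randomization_method / blinding_method: None if the method is not
    described, otherwise "appropriate" or "inappropriate".
    """
    score = 0
    score += 1 if described_randomized else 0
    score += 1 if described_double_blind else 0
    score += 1 if withdrawals_described else 0

    # Extra point if the method was described and appropriate;
    # one point deducted if it was described but inappropriate.
    for method in (randomization_method, blinding_method):
        if method == "appropriate":
            score += 1
        elif method == "inappropriate":
            score -= 1
    return max(score, 0)

# Hypothetical report: randomized and double-blind, appropriate
# randomization described, blinding method not described, dropouts reported.
print(jadad_score(True, True, True,
                  randomization_method="appropriate"))  # -> 4
```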


 


For studies of diagnostic methods, level 1 evidence comes from cohort studies that blindly apply an independent gold standard or “reference test” to a well-selected and diverse population of patients with and without the disease to be diagnosed. Level 2 evidence comes from exploratory cohort studies that may examine several factors to determine which are most closely associated with the diagnosis made by a gold standard reference test. Level 3 evidence comes from studies of nonconsecutive patients or studies with inconsistently applied reference standards, whereas level 4 evidence may use a case control technique or lack a gold standard altogether. Level 5 evidence comes from expert opinion. For more detail, the Levels of Evidence table at the Centre for Evidence-Based Medicine Web site should be consulted (http://www.cebm.net/levels_of_evidence.asp).


Rating Quality of Cumulative Evidence


One can rarely draw firm conclusions from a single clinical study, even one of excellent quality. Recommendations regarding best treatment, best methods of diagnosis, most-confident predictions of prognosis, and definitive statements about the safety or risks of an intervention come from consistent results accumulated across several studies of the same question by different groups of investigators on different groups of patients. When a great diversity in patient population is associated with high consistency (homogeneity) of results, the confidence in the conclusion and its generality is greatest.


Recommendations made by summarizing cumulative evidence are also graded according to the quality of evidence that supports them. Grade A recommendations are based on level 1 evidence. Those conclusions may be supported by evidence of lower quality, but a grade A recommendation cannot be made without level 1 evidence supporting it. Recommendations are given grade B when the highest level of evidence supporting them is consistent level 2 or 3 evidence. Level 4 evidence supports grade C recommendations; all others are grade D (http://www.cebm.net/levels_of_evidence.asp). Mercifully, only four grades of recommendation have been proposed, and therefore no recommendation is given the grade F.
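The grading rule just summarized can be written out explicitly. The sketch below is a minimal illustration of that mapping; the function name and its single input (the highest level of evidence consistently available for the question) are ours, not part of the published scheme, and in practice graders also judge the consistency of results before assigning a grade.

```python
def recommendation_grade(highest_consistent_level):
    """Grade of recommendation implied by the highest level of evidence
    consistently supporting it, per the scheme summarized in the text:
    grade A requires level 1, B requires level 2 or 3, C requires level 4,
    and everything else is grade D."""
    if highest_consistent_level == 1:
        return "A"
    if highest_consistent_level in (2, 3):
        return "B"
    if highest_consistent_level == 4:
        return "C"
    return "D"

for level in (1, 2, 3, 4, 5):
    print(f"level {level} evidence -> grade {recommendation_grade(level)}")
```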


The process of summarizing evidence has been the object of study over the past 2 decades, and rigorous methodologies, including critically appraised topics (CATs) and systematic reviews, have emerged as scientific replacements for the traditional authority-based “review article” that synthesizes the literature without any attempt to address its quality (http://www.cebm.net/cats.asp).


Markers of quality for evidence summaries are similar to those for individual clinical research studies. There should be clarity in the question reviewed. The process should be reliable with the expectation that if a different set of reviewers performed the same process they would reach the same conclusions. The techniques of evidence identification and summarization should be appropriate. This applies particularly to methods of literature searching and data pooling, such as meta-analysis. The conclusion should be sound and directly related to the accumulated evidence and its quality.
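As one illustration of data pooling, the sketch below combines study results by inverse-variance (fixed-effect) weighting, a common first step in meta-analysis. The log risk ratios and standard errors are invented for the example, and a real review would also assess heterogeneity before pooling.

```python
import math

def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of study estimates,
    e.g., log risk ratios. Returns the pooled estimate and its 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical log risk ratios from three studies (negative favors the new treatment).
log_rr = [-0.35, -0.10, -0.22]
se = [0.20, 0.15, 0.25]
pooled, ci = fixed_effect_pool(log_rr, se)
print(f"pooled RR = {math.exp(pooled):.2f}, "
      f"95% CI {math.exp(ci[0]):.2f}-{math.exp(ci[1]):.2f}")
```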


The techniques of systematic review are arduous because they are rigorous. They begin to apply the same scientific rigor to clinical practice that physicians expect in the basic science laboratory. (See the Cochrane Library Reviewer’s Manual at http://www.update-software.com./cochrane/ for detailed instructions on systematic review.)


The Practice Guidelines development process adopted by organized neurosurgery has made some practical adaptations of these detailed schemes for single article assessment and recommendations based on summarized evidence. Utilizing the classification of evidence promulgated by the American Medical Association and the American Academy of Neurology, evidence for both single studies and evidence summaries is placed in one of three categories and referred to as class I, II, or III evidence.18


Improving Evidence Quality


CLARITY IN ASKING QUESTIONS

Scientific questions are generally asked in one of two formats: the testing of a hypothesis or the estimation of a parameter.


Hypothesis testing is familiar to most physicians. Typically, a comparison is being made (for example, operation A cures more people with disease X than does operation B or no operation). The hypothesis that there is no difference between the treatments (the null hypothesis) is established and tested. If the observed difference between the results of the treatments is sufficiently large (as determined by appropriate statistical techniques), that hypothesis is rejected, and it is concluded that the results of the treatments are different. It is the job of statisticians to determine how large a difference is necessary for confident rejection of the null hypothesis.
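A minimal worked example of this logic follows, using invented cure counts for two operations and a chi-square test of the resulting 2 × 2 table; the counts and the choice of test are illustrative only.

```python
from scipy.stats import chi2_contingency

# Hypothetical trial: cures / failures with operation A versus operation B.
table = [[70, 30],   # operation A: 70 of 100 patients cured
         [55, 45]]   # operation B: 55 of 100 patients cured

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
# A small p-value leads us to reject the null hypothesis of no difference;
# a large one means only that a difference was not demonstrated.
```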


Hypothesis testing, particularly with the sophisticated application of statistical power analysis, has served medical statistics well, but the logic is convoluted (trying to disprove the hypothesis that the difference that you were looking for does not exist) and the opportunities for misinterpretation (particularly when the null hypothesis is not rejected) are many.


In recent years, there has been increasing use of estimation techniques, in which confidence intervals are constructed around measures of treatment effect, likelihood of diagnosis, or rates of survival. These techniques still allow sample sizes to be planned so that estimates of specified precision can be achieved, and the interpretation of the results is more straightforward. In either case, framing questions so that precise estimates can be made or hypotheses tested helps the investigator formulate clear research questions.
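Using the same invented counts as in the hypothesis-testing sketch above, the estimation approach reports the absolute risk difference with a normal-approximation 95% confidence interval rather than a p-value.

```python
import math

def risk_difference_ci(cured_a, n_a, cured_b, n_b, z=1.96):
    """Absolute risk difference and its normal-approximation 95% CI."""
    p_a, p_b = cured_a / n_a, cured_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)

diff, (lo, hi) = risk_difference_ci(70, 100, 55, 100)
print(f"risk difference = {diff:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes zero conveys the same decision as a significant hypothesis test, while also showing how large or small the effect might plausibly be.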


RELIABILITY: VALIDITY AND REPRODUCIBILITY

The scientific basis of reliability in clinical research rests on techniques for developing and validating measures of clinical parameters. Tools used to assess commonly measured clinical parameters such as muscle power, level of consciousness, functional ability, mood, and pain should be put through the same rigorous process of testing that we expect from the surgical instruments we depend on in the operating room. These techniques are available and should supplant the ad hoc creation of measurement tools by investigators at the outset of their study. These techniques are a discipline of their own; for details the reader is again referred to Feinstein’s Clinimetrics.6 However, investigators should report the reliability of the measuring instruments that they use. Readers of clinical research reports should also require information on reliability to interpret the results of such studies.


APPROPRIATENESS: CONTROL, NOISE, AND BIAS

Control Most scientific observations require a comparison. This is most obvious in therapeutic research. It does us little good to know that an operation was successful 70% of the time in a series of patients unless we know how successful it was in another group of patients or in the hands of another surgeon. The best controls most closely resemble the treated patients with the exception of the treatment applied. Most of the techniques for minimizing bias in therapeutic research are directed at achieving this goal (see below).


In diagnostic research, the concept of a control is embodied in the gold standard. Such a standard has previously been shown to be the most valid identifier of the disease in question. The gold standard is applied blindly and independently to the same patients to whom the new diagnostic test is applied so that its ability to identify the disease can be compared.


Even in prognostic research, the prognosis of patient groups identified by important variables such as age, gender, histologic class, or even presence or absence of the disease is frequently compared. Isolated observations without a standard of comparison are of relatively little value.


Bias Error in measurement comes in two forms: bias and noise. Bias is a systematic error introduced by some factor not related to the phenomenon being measured. Bias is what happens when a scale reads 1 pound with nothing on it and therefore reports every weight as 1 pound heavier than it actually is. The opportunities to introduce bias into the measurement of clinical phenomena are nearly unlimited. A 1979 catalogue of bias ran to 57 varieties.19 Many more have subsequently been identified.20 The major categories, however, include biases introduced by the passage of time (chronology bias), by unequal opportunities for observation (observation bias), by different susceptibility to disease or its treatment (susceptibility bias), and by deliberate or inadvertent lack of compliance with the protocol by patient and/or physician (compliance bias).


The fundamental technique for controlling chronology bias is the use of contemporaneous controls. Control of susceptibility bias is accomplished when patients are stratified as they enter the study according to factors known to affect or predict susceptibility to the disease or treatment under study. This can also be addressed during analysis, but when factors are known to affect the outcome of the study, stratification upon entry is the most secure method of eliminating this form of bias. Observation bias is controlled by the classic technique of blinding. The history of clinical science is replete with examples of unintentional and intentional subversion of clinical research when awareness of the intervention affected the observations made during the study. Even when practicalities prevent the investigator from being unaware of the intervention, those who collect and analyze the data can be kept in the dark to minimize the risk. Compliance bias is controlled through careful design and implementation of the study protocol with specific monitoring for and reporting of compliance.


The ultimate tool for controlling bias is randomization. With sufficient numbers of patients, randomization equalizes the probability that any known or unknown factor that may affect an outcome will be present in the comparison groups. The ability of randomization to equalize the risk of an unknown factor affecting outcomes gives it its unusual power and special place in experimental design. Whereas stratification and analysis can be used to control for factors known to affect outcome, we have no other technique for controlling something that we do not know about. It is for this reason, and this reason alone, that randomization is part of the requirement for the highest levels of evidentiary quality in therapeutic research.
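A simple way to combine stratification with randomization is permuted-block allocation within each stratum, which keeps the arms balanced within strata while leaving any individual assignment unpredictable. The sketch below is illustrative only; the stratum labels and block size are arbitrary, and a real trial would also conceal the allocation sequence from investigators.

```python
import random

def permuted_block(block_size=4, arms=("A", "B")):
    """One block containing equal numbers of each arm, in random order."""
    block = list(arms) * (block_size // len(arms))
    random.shuffle(block)
    return block

def stratified_allocator(strata, block_size=4):
    """Return a function that assigns the next patient within a stratum."""
    pending = {s: [] for s in strata}
    def assign(stratum):
        if not pending[stratum]:                       # start a new block
            pending[stratum] = permuted_block(block_size)
        return pending[stratum].pop()
    return assign

# Hypothetical strata defined by admission Glasgow Coma Scale score.
assign = stratified_allocator(["GCS 3-8", "GCS 9-12"])
for patient in range(1, 7):
    print(f"patient {patient} (GCS 3-8) -> arm {assign('GCS 3-8')}")
```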


The control of bias is one of the primary goals of the design of clinical investigations. Statistical techniques can be used to adjust for known bias, but known bias is generally designed out of good studies. The role of statistics is to deal with random variation (noise) that is an inherent feature of measuring natural phenomena.


Noise No measurement is perfect, and no natural phenomena exactly repeat themselves endlessly. There is random variation in both measurement and the phenomenon itself. This random variation, or noise, fortunately follows predictable patterns. Statistics is the study of these patterns and the science of analyzing them and considering them when drawing conclusions from observation.


Most physicians are familiar with the concept of the “normal” or Gaussian distribution, which describes the pattern of random variation in many natural phenomena. The application of this understanding has allowed the development of parametric statistics, which can estimate the likelihood that a particular observation comes from a population with a known mean and a normal distribution of random error. The more complex error distributions associated with other types of phenomena and observations, and the wide variety of such distributions, are the reasons that statistical testing is a discipline unto itself. It is sufficient that the physician understand that statistical analysis takes this random variation into account.


The reliability of measurement tools is an important factor in determining the amount of noise, or random variation, that can be expected in a study. The more reliable the instruments, the smaller the amount of noise and the easier it will be to identify treatment effects, estimate their magnitude, validate a new diagnostic test, or define the prognosis of a specific disease.
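This relationship can be shown with a short simulation in which the same true treatment effect is observed once through a relatively precise instrument and once through a noisy one; the standard error of the observed effect widens with the noise. The effect size, baseline score, and noise levels below are arbitrary.

```python
import math
import random
import statistics

def simulated_difference(true_effect=5.0, measurement_sd=2.0, n=50, seed=1):
    """Observed treatment effect and its standard error when each patient's
    outcome is read through an instrument with the given noise level."""
    rng = random.Random(seed)
    control = [rng.gauss(50.0, measurement_sd) for _ in range(n)]
    treated = [rng.gauss(50.0 + true_effect, measurement_sd) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n +
                   statistics.variance(control) / n)
    return diff, se

# The same true effect, measured with a precise and then a noisy instrument.
for sd in (2.0, 10.0):
    diff, se = simulated_difference(measurement_sd=sd)
    print(f"noise SD {sd:>4}: observed effect {diff:.1f} +/- {se:.1f}")
```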


The appropriateness of the analysis of a clinical study therefore depends on both the ability of the design of the study to control for bias and the use of the right statistical techniques to manage random variation or noise in the data. The concepts have been incorporated into the levels of evidentiary quality discussed above.


Soundness of Conclusions


Some would argue that the soundness of conclusions expressed by the authors of single reports of clinical investigation is irrelevant to the process of critical analysis and evidence synthesis. Ask an appropriate question, address it with proper methods performed correctly, assess the quality of evidence provided, and the conclusions that flow from the data will be evident to all who read the report. Unfortunately, in the pressure of day-to-day practice, many neurosurgeons do not have enough time to carry out these tasks, and they rely on the conclusions summarized by the authors in their article or abstract for the “take home” message. Too often, those conclusions extrapolate well beyond the limits of the data presented, assume that others will have similar results, or simply state a conclusion believed by the authors that has little or nothing to do with the data presented. It is always, therefore, important to test the conclusion against the question originally asked, the methods used, and the data collected.


The soundness of conclusions reached by synthesizing cumulated evidence must also be checked against the question posed, the quality of the evidence-collection process, and the evidence collected. Judgments will necessarily be made about the weight of evidence, ideally determined primarily by its quality. In a well-conducted and well-reported evidence synthesis, however, the reader can see clearly how those judgments were made and reach conclusions about their validity.


Different Types of Questions Require Different Types of Evidence


Just as different operations require different tools, different clinical questions require different analytic techniques. We cannot define prognosis with a randomized clinical trial; ethically, we cannot deliberately allocate people to be given a disease. (We can learn something about the prognosis of patients eligible for the trial from the control group, however.) We cannot adequately evaluate the difference between two treatments by following a carefully selected group of patients who have the same disease and receive the same treatment, although we may learn much about their treated prognosis in this way. Matching the evidentiary technique to the question being asked is an important research and evidence synthesis tool. Understanding this and watching for mismatches when critically analyzing the literature are important skills for every neurosurgeon (Table 1–3).


Patient Assessment: Agreement among Observations


We examine patients to obtain clues to their underlying disease. We obviously must be able to rely on the findings of examination if they are to point us in the right direction in diagnosis and management. We want to know that the findings are reproducible. If a patient is examined under the same conditions at two different times and nothing else has changed, the findings should be the same. Likewise, if two or more different clinicians examine the patient under identical situations and at approximately the same time, the findings should be the same. This desire for reproducibility defines the type of question that must be asked of methods of patient assessment.


Table 1–3 Types of Clinical Research Questions

Questions of assessment    Patient assessment (clinical exam)
                           Diagnostic test assessment
Questions of prognosis     Natural history (prognosis without treatment)
                           Prognosis with treatment
Questions of causation     Etiology studies
                           Safety and harm studies
Questions of treatment     Therapeutic efficacy
                           Therapeutic effectiveness

 


Diagnostic Testing: Agreement with the Gold Standard


The purpose of a diagnostic test is to help us identify a disease or condition. We might hope that magnetic resonance (MR) spectroscopy could accurately identify malignant glioma, for example. We should ask the question: When the diagnosis made by MR spectroscopy is compared with the current gold standard (histopathologic diagnosis from biopsied or resected brain), how well do they match?
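That comparison is conventionally summarized as sensitivity, specificity, and predictive values derived from a 2 × 2 table of the new test against the gold standard. The sketch below uses invented counts purely for illustration.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Summary measures for a new test judged against a gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # disease present, test positive
        "specificity": tn / (tn + fp),  # disease absent, test negative
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical MR spectroscopy results versus histopathology in 200 patients.
results = diagnostic_accuracy(tp=85, fp=10, fn=15, tn=90)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```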


Natural History and Prognosis: Observation over Time


All predictions of the usefulness of therapeutic intervention imply a comparison to what happens if there is no intervention. The course of disease without intervention is its natural history. The question asked for a natural history investigation is straightforward: What happens over time to patients with a specific established diagnosis?


Obtaining such information is more difficult than it would seem because we are unlikely to have no opinion about what may happen in the future to a patient once a diagnosis is made. Even in the absence of good evidence of therapeutic benefit, we often intervene with an educated guess based on experience, logic, or dogma. Therefore, we frequently must make do with evidence of what happens to patients with a certain diagnosis under specified conditions (a particular type of treatment having been administered, the disease having been modified in some way by age, gender, availability of medical care, or any of a host of factors).


In either case it is necessary to reliably identify the disease and how advanced it is in a defined group of patients recognizably similar to others with the disease, and carefully follow them long enough to determine important outcomes. Ethically we cannot deliberately give people diseases; therefore, we cannot control the allocation of the disease state. This fundamentally affects how the question is asked and answered when natural history or prognostic information is required.


Treatment Efficacy and Effectiveness: Comparison of Outcome


Treatment attracts the greatest attention from physicians and patients; it is what patients seek and physicians provide. This is where medical intervention can change the course of disease for the good of the patient. The very concept of intervention demands a comparison of the results of treatment with the results of some alternative. This leads to the fundamental question, Compared to what?, that is the crux of therapeutic evaluation.


Efficacy refers to the ability of an intervention to provide a superior outcome under the carefully controlled circumstances of a clinical experiment. Precautions are taken to be sure that the intervention is tested only on the group of patients for which it has been designed. The outcome comparison should be to the best-proven therapy for that condition.


Effectiveness is studied once efficacy has been demonstrated. The question changes from, Does this intervention work as it was intended? to, Does this intervention work as well in real-world practice as it did in the efficacy trials?


Causation: Outcome Dependence


Causation often suggests an association between an event and the outcome, but it can only be proven when it is shown that the outcome is contingent on the prior event. The problem of identifying that dependence (i.e., the cause of a disease) is even more difficult than clarifying its diagnosis, knowing its natural history, or verifying the benefit of its treatment. Many factors may be involved in causation, and we cannot ethically cause a disease to study it. Causation is different from association: both the incidence of AIDS and the value of the stock market rose continuously through the 1990s, but it is unlikely that either caused the other. Many aspects of the apparent relationship between putative cause and outcome must be assessed (timing, dose-dependence, consistency, biological feasibility, etc.).


The technical details of how best to ask and answer these questions have been discussed elsewhere.21 The important concept is that different questions demand different types of data and different techniques of analysis. There is no “one size fits all” method of clinical investigation. Many failures of published studies to answer the question originally posed result from choosing the wrong technique and seeking the wrong type of data to provide the answer.


Conclusion


The methods of evidence-based medicine allow a rigor in clinical decision making comparable to the rigor of laboratory science and surgical technique. Their application to neurosurgery is difficult, but it allows the neurosurgeon to be as demanding in deciding to perform surgery and in evaluating its outcome as he or she is in performing it. We have reviewed the concepts that underlie evidence-based medicine. The following chapters present neurosurgical examples of the application of these principles that should help the reader improve his or her decision making and outcomes.


References



  1. Stein, J. Random House Dictionary of the English Language. New York: Random House; 1966
  2. Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 1992;268:2420–2425
  3. Keller, RB, Soule, DN, Wennberg, JE, Hanley, DF. Dealing with geographic variations in the use of hospitals: the experience of the Maine Medical Assessment Foundation Orthopaedic Study Group. J Bone Joint Surg Am 1990;72:1286–1293
  4. Bulger, EM, Nathens, AB, Rivara, FP, Moore, M, MacKenzie, EJ, Jurkovich, GJ. Management of severe head injury: institutional variations in care and effect on outcome. Crit Care Med 2002;30:1870–1876
  5. Louis, PCA. Anatomic, pathologic and therapeutic research on the disease known by the name of gastroenteritis, putrid fever, adynamic, ataxic, typhoid, etc. Am J Med Sci 1829;4:403
  6. Feinstein, AR. Clinimetrics. New Haven, CT: Yale University Press; 1987
  7. Haines, SJ. Randomized clinical trials in the evaluation of surgical innovation. J Neurosurg 1979;51:5–11
  8. Haines, SJ. Randomized clinical trials in neurosurgery. Neurosurgery 1983;12:259–264
  9. Doll, R. Controlled trials: the 1948 watershed. BMJ 1998;317:1217–1220
  10. McKissock, W, Richardson, A, Walsh, L. Posterior-communicating aneurysms: a controlled trial of the conservative and surgical treatment of ruptured aneurysms of the internal carotid artery at or near the point of origin of the posterior communicating artery. Lancet 4 June 1960;1:7136–7139
  11. Sackett, DL, Haynes, RB, Guyatt, GH, Tugwell, P. Clinical Epidemiology. A Basic Science for Clinical Medicine. 2nd ed. Boston: Little, Brown; 1991
  12. Riegelman, RK. Studying a Study and Testing a Test: How to Read the Medical Evidence. 4th ed. Philadelphia: Lippincott, Williams & Wilkins; 2000
  13. Hulley, SB, Cummings, SR, Browner, WS, Grady, D, Hearst, N, Newman, TB. Designing Clinical Research: An Epidemiologic Approach. 2nd ed. Philadelphia, PA: Lippincott, Williams & Wilkins; 2001
  14. Feinstein, AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia, PA: WB Saunders; 1985
  15. Sackett, DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95(Suppl 2):2S–4S
  16. Chalmers, TC, Smith, H Jr, Blackburn, B, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials 1981;2:31–49
  17. Jadad, AR, Moore, RA, Carroll, D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1–12
  18. Walters, BC. Clinical practice parameter development in neurosurgery. In: Neurosurgery in Transition: The Socioeconomic Transformation of Neurological Surgery. Baltimore, MD: Williams and Wilkins; 1998:99–111
  19. Sackett, DL. Bias in analytic research. J Chronic Dis 1979;32:51–63
  20. Choi, BC, Noseworthy, AL. Classification, direction, and prevention of bias in epidemiologic research. J Occup Med 1992;34:265–271
  21. Haines, SJ. Evidence-based neurosurgery. Neurosurgery 2003;52:36–47
