49 Clinical Outcome Analyses
Abstract
Increasingly, attention has been focused on the tenets of evidence-based medicine in an effort to improve health care delivery and patient management by evaluating the best evidence along with the different parameters that may affect present-day management of spine disorders. A variety of outcome measures have been developed over the years to evaluate the effectiveness of treatments. These outcome measures are discussed in this chapter, as are levels of evidence and types of clinical research and methodology. More specifically, and as stated by Sackett et al, “Evidence-based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research.”
Keywords: patient outcome measures, Oswestry, SF-36, research design, methodology, clinical evidence, levels of evidence
49.1 Introduction
The principal objectives of spine surgery include the following: relief and stabilization of pain, appropriate decompression, and restoration of function. To that end, spine surgery has undergone a tremendous evolution over the past 30 years in terms of the development and use of instrumentation, refined or newly designed surgical techniques, use of different graft substrates and biologic agents, implementation of surgical adjuncts, and the application of tissue engineering and gene therapy. Such advances in spine biotechnology have been fueled by the urgent need to improve patient outcomes. However, costs associated with patient management continue to increase and are a growing concern directly affecting physicians’ practices, insurance rates, and hospital expenses.
As a result, clinical outcomes have become even more significant in terms of monitoring and evaluating various clinical practices within the discipline of spine surgery and their correlation with the quality of health care delivery. Therefore, much attention has been focused on the tenets of evidence-based medicine in an effort to improve health care and patient management by evaluating the best evidence along with the different parameters that may affect present-day management of spine disorders. More specifically, and as stated by Sackett et al,1 “Evidence-based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research.” To determine the best evidence with respect to treatment efficacy in improving a patient’s health status, outcomes research is an integral component in the decision-making armamentarium of the spine surgeon. In essence, outcomes research is research that strives to compare the effects of one or more treatment regimens among various groups. In spine surgery, physical examination and radiographic analysis have been instrumental in evaluating various outcome parameters. However, the past decade has seen the development of a variety of outcome assessment tools addressing a wide range of factors that are indicative of a patient’s condition as well as the preoperative and postoperative course. These outcome measurement tools may have the potential to affect outcomes and provide a foundation for developing the best treatment strategy for each patient.
49.2 Criteria for Standardized Outcomes Assessment
The past decade has been witness to a paradigm shift away from an emphasis on disease to a focus on the patient’s health, ability to function, and sense of well-being.2 Incorporating health-related quality-of-life (HRQL) assessments into practice enables the clinician, hospital, and patient to devise and adopt optimal management strategies. The primary focus of HRQL assessments is to address the effects of the patient’s medical condition and treatment on his or her physical, emotional, and social well-being. More specifically, and according to Patrick and Deyo,3 HRQL measurements define the scope of the following: impairment, functional status, health perceptions, social interaction, and duration of life. To properly address clinical outcomes or progression of a medical condition and the effects of treatment, various aspects of assessment should be considered. It has been recommended that the following five measures be incorporated into any treatment plan: the patient’s physical condition or health status, the effect of the condition on the patient’s quality of life, general assessments of health status (not specifically linked to the patient’s condition), patient expectations and satisfaction with treatment, and covariates needed to identify subgroups of patients who might respond differently to treatment.4 Because of the multitude of factors that may affect outcome, a proper measurement tool should be employed. However, the breadth of coverage of the selected outcomes instrument must be balanced with an appropriate length of time needed for completion as well as an ability to measure the outcome of interest.
49.3 Classification of Health Status Instruments
Implementation of instruments for proper assessment of health status and clinical outcomes is key, and there are a variety of appropriate measurement tools from which to choose (Table 49.1). We will limit our discussion to the generic or disease-specific instruments that are commonly used in spine surgery.5 Generic instruments are more comprehensive in scope and address the overall impact of an illness or intervention on conditions across a wide variety of populations. However, generic instruments lack specificity because they fail to isolate variables of interest, and thus certain aspects of treatment and additional health-related dimensions affected by a particular condition may be overlooked. Alternatively, disease-specific instruments strive to identify the various domains associated with a condition and intervention. Such an outcome measurement tool is either clinical in scope (i.e., focuses on signs, symptoms, and direct sequelae) or experimental in design (i.e., addresses the impact of an illness or problem).5
Table 49.1 Types of outcome measures
Type of measure | Description |
Dimension specific | Focus is on a particular aspect of health (e.g., Beck’s depression inventory54) |
Disease/population specific | Measures several health domains and focuses on aspects of health that are relevant to particular health problems |
Generic | Measures outcomes across diseases and different patient populations |
Individualized | Measures the importance of certain aspects of the respondent’s life and assigns weights to produce a single score (e.g., patient-generated index scores55) |
Role specific | A more specific generic tool that captures aspects of working life (e.g., Occupational Role Questionnaire19) |
Utility | Developed for economic evaluation; entails preferences for health states and yields a single index (e.g., EuroQol EQ-5D56) |
A plethora of tools for assessing outcomes have been designed to address spine-related pathology. However, development of such instruments has been a daunting task because of the wide variability in therapeutic interventions and the challenge of properly quantifying and assessing the aspects of pain and the functional disabilities of a given condition. A number of assessment instruments do exist, including methods of measuring pain, back-specific disability scales, neck pain and disability scales, and instruments for evaluating general functional status (Table 49.2). The strength of a particular outcome-measuring tool is based on its specificity with regard to a given condition or population, sensitivity to various factors stemming from the patient’s condition, reproducibility, validity, responsiveness, and interpretability (Table 49.3). In recent years, such tools have been specifically designed not only to address localized manifestations stemming from the spinal pathology, but also to include assessment of patient-specific factors that may affect health status and treatment outcomes, including the following: level of education, employment history and job satisfaction, psychological considerations, worker’s compensation and third-party claims, expectations, and level of satisfaction (see Table 49.2). Whatever the case, the practitioner and the researcher should both be aware of the many factors that may adversely affect clinical outcomes and select the health status assessment tool that best addresses the pathology in question and its associated domains. Several examples of outcome measurement tools that are common in spine surgery are discussed in this chapter.
Table 49.2 Various conditions and outcome measurement tools
Category | Type of measurement |
Pain scales | Verbal Rating Scale; Visual Analog Scale; Numerical Rating Scale; Wisconsin Brief Pain Questionnaire; Memorial Pain Questionnaire; McGill Pain Questionnaire; Patient Outcome Questionnaire; Medical Outcomes Study; Descriptor Differential Scale; Integrated Pain Scale; Pain Perception Profile; West Haven-Yale Multidimensional Pain Inventory; Brief Pain Inventory; Unmet Analgesic Needs Questionnaire; City of Hope Mayday Pain Resource Center Pain Audit Tools; City of Hope Mayday Pain Resource Center Patient Pain Questionnaire; Dallas Pain Questionnaire; Northwick Park Neck Pain Questionnaire; Neck Pain and Disability Scale |
Disability: lower back questionnaires | Modified Oswestry Low Back Pain Disability Questionnaire; Million Disability Questionnaire; Roland–Morris Disability Questionnaire; Waddell Disability Index; Low Back Pain Type Specifications |
Disability: cervical questionnaires | Neck Disability Index; Neck Pain and Disability Scale; Headache Disability Index |
Psychometric questionnaires | Illness Behavior Questionnaire; Psychosocial Pain Inventory; Waddell Nonorganic Low Back Pain Signs; Modified Somatic Perception Questionnaire; Somatic Amplification Rating Scale; Modified Self-Rating Zung Depression Scale; Minnesota Multiphasic Personality Inventory; Health Status Questionnaire; Fear Avoidance Beliefs Questionnaire |
Patient satisfaction questionnaires | Patient Satisfaction Questionnaire; Group Health Association of America Consumer Satisfaction Survey; Chiropractic Satisfaction Questionnaire |
Combined assessment scales | Edmonton Symptom Assessment System; Symptom Distress Scale; Memorial Symptoms Assessment Scale; Symptom Scale; Voices; Rotterdam Symptom Checklist; Support Team Assessment Schedule; National Hospice Study; COOP Charts; Hospice Quality of Life Index; McGill Quality of Life Index; Quality of Well-Being Scale; EORTC QOL-30; VITAS Quality of Life Index; SF-36, SF-12; Health Status Questionnaire; RAND 36-Item Health Survey; Sickness Impact Profile; Nottingham Health Profile; Scoliosis Follow-Up Questionnaire; Cervical Spine Outcomes Questionnaire; North American Spine Society Lumbar Spine Outcome Assessment Instrument |
49.4 Outcome Assessment Instruments
49.4.1 Short Form-36
The Short Form-36 (SF-36) is a 36-item questionnaire that was developed to evaluate how the health care system affects patient health.6 The SF-36 was initially formulated to address a variety of psychometric standards for group comparisons and stemmed from concepts proposed by the Medical Outcomes Study (MOS) of the late 1980s.7 Relying on various health assessment instruments of the 1970s and 1980s, such as the Health Insurance Experiment,8 the Health Perceptions Questionnaire,9 the General Psychological Well-Being Inventory,10 the Functioning and Well-Being Profile,11 and other physical and functioning measures, the SF-36 health survey has been a widely used, comprehensive, generic outcome assessment tool that quantitatively measures physical and psychological dimensions. The SF-36 survey has been the subject of numerous investigations and has been translated into different languages in more than 40 countries as part of the International Quality of Life Assessment (IQOLA) Project.12 Based on the standard SF-36 survey, which was introduced in the early 1990s,6 a second version of the IQOLA questionnaire now includes the following components: physical functioning, limitations due to physical or emotional problems, bodily pain, social interaction, general mental health (psychological distress and well-being), vitality (energy/fatigue), and general health perceptions. In essence, the SF-36 is a reliable and valid tool that is commonly used because of its brevity, psychometric assessment, and applicability to patients with a variety of medical conditions and demographics.13,14,15,16 Moreover, the SF-36 questionnaire is self-administered, is sensitive to differences in disease severity, and distinguishes between sick and healthy populations.7 In addition, to further improve efficiency and decrease the costs associated with administering the SF-36, a shorter version of the questionnaire was created in the mid-1990s and aptly named the SF-12. 
This shorter questionnaire continues to use the same eight-scale profile as the SF-36, but with fewer levels and less precise scoring in comparison to its more in-depth counterpart.17
Table 49.3 Critical analyses models for review of outcome tools
Type | Criteria |
Conceptual and measurement | Does the scale measure a single or distinct domain? Is the variability of the scale reported? What is the intended level of measurement (categorical, ordinal, interval, or ratio)? |
Reliability | Did the instrument address internal consistency? Did the instrument address reproducibility? |
Validity | Did the instrument address content- and construct-related factors as well as criterion validity? |
Responsiveness | Was the scale used as an outcome measure before and in what populations? |
Interpretability | Who is the population being tested? Can the score be translated to a relevant clinical event? Is the score predictive of outcome events? |
49.4.2 Sickness Impact Profile
Another generic assessment tool, the Sickness Impact Profile (SIP), was also developed to evaluate the functional consequences of health care.18 The SIP is a behavior-based measurement tool that presents a set of items, in a “yes/no” dichotomous format, that relate to the type of work chronically ill patients will perform to accommodate limitations with respect to their illnesses and also addresses how these individuals will respond in their working environments as a result of their medical conditions. Although the SIP attempts to elicit a descriptive profile of the changes in patients’ behaviors resulting from their illnesses, it fails to capture the potential dynamics between a person’s health and his or her occupation. This is likely due to the inadequate depiction of a general set of work activities as well as the potential categorical limitations that the form presents because of the yes/no response format, which weakens the sensitivity of such a tool. However, the SIP and certain other measurement tools, such as the Occupational Role Questionnaire (ORQ),19 the Work Limitations Questionnaire (WLQ),20,21 and the Work Limitation Questionnaire (WL-26)22 have become the foundation as well as the impetus for more precise role-specific assessments as they relate to a patient’s working life.
49.4.3 North American Spine Society Lumbar Spine Outcome Assessment Instrument
The North American Spine Society Lumbar Spine Outcome Assessment Instrument is a self-administered tool that is designed to measure disabilities and neurogenic symptoms related to back pain.23 This instrument is region specific and assesses five broad categories: demographics; medical history; pain, neurogenic symptoms, and function; employment history; and treatment outcomes. Although the questionnaire strives to address influential outcome factors, such as sociodemographics and work-related issues, it fails to examine the impact of the disability and does not address health-related concerns inherent in the pediatric population with spine pathology.23,24
49.4.4 McGill Pain Questionnaire
Considered a benchmark for standardization in the evaluation of pain, the McGill Pain Questionnaire (MPQ) is a reliable, valid, and sensitive tool in the assessment of pain relief and treatment.25 The MPQ relies on descriptors to measure subjective pain experiences. Four major groups, each consisting of five items, represent these descriptors and entail the following: sensory, affective, evaluative, and miscellaneous. Each descriptor contains a rank value, which is based on its position in the word set. The sum of the rank values results in the pain rating index. A pain rating intensity is also implemented and is based on a scale of 1 to 5. In addition, a short form of the MPQ (SF-MPQ) has been developed that entails 15 questions; 11 of these address sensory dimensions and 4 are related to affective dimensions. The intensity scale in the SF-MPQ has been reduced to 4 points, and the pain rating index is incorporated as a visual analog scale.
49.4.5 Oswestry Disability Index
One of the first disease-specific instruments, the Oswestry Disability Index (ODI), is a self-administered tool that measures disability due to low back pain.26 The development of this tool was initiated in 1976 by Dr. John O’Brien. At that time, the questionnaire was administered by an orthopedic surgeon and an occupational therapist. The questionnaire was eventually published in 1980,26 and after the 1981 annual meeting of the International Society for the Study of the Lumbar Spine, this instrument gained widespread attention. Since then, this assessment tool has undergone revisions.27,28,29 In general, the ODI is divided into the following 10 sections: pain intensity, personal care, lifting, walking, sitting, standing, sleeping, social life, traveling, and changing degree of pain. Each section comprises six statements, each describing a progressively greater degree of disability in that particular activity. The results are scored as a percentage and provide insight into how conditions of the low back affect everyday function. However, this questionnaire fails to address aspects of occupation and psychometric properties that may affect outcome. Nonetheless, this instrument is a reliable and reproducible tool for the assessment of disability due to low back pain.26,30
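The percentage scoring described above can be illustrated with a minimal sketch. It assumes the conventional ODI scoring in which the six statements of each section are scored 0 to 5 in order of increasing disability, and in which unanswered sections are dropped from the denominator; the function name `odi_percent` and the use of `None` for a skipped section are illustrative choices, not part of the instrument.

```python
from typing import Optional, Sequence

# The 10 ODI sections as listed in the text.
ODI_SECTIONS = (
    "pain intensity", "personal care", "lifting", "walking", "sitting",
    "standing", "sleeping", "social life", "traveling",
    "changing degree of pain",
)

MAX_PER_SECTION = 5  # assumption: six statements scored 0..5


def odi_percent(responses: Sequence[Optional[int]]) -> float:
    """Return the ODI score as a percentage of the maximum possible score.

    `responses` holds one entry per section: the 0-5 score of the chosen
    statement, or None if the section was left unanswered (assumption:
    unanswered sections are excluded from the denominator).
    """
    if len(responses) != len(ODI_SECTIONS):
        raise ValueError("expected one response per ODI section")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("at least one section must be answered")
    for r in answered:
        if not 0 <= r <= MAX_PER_SECTION:
            raise ValueError("section scores must lie between 0 and 5")
    return 100.0 * sum(answered) / (MAX_PER_SECTION * len(answered))
```

For example, a patient who selects the third statement (score 2) in nine sections and skips one would score 18 of a possible 45 points, that is, 40%.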
49.4.6 Neck Pain and Disability Scale
The Neck Pain and Disability Scale (NPDS) is a comprehensive tool that is used to measure neck pain and associated functional status.31 This instrument is an extension of the Neck Disability Index (NDI), developed by Vernon and Mior32 in 1991, which consisted of 10 sections of distinct activity with increasing severity addressing that activity. The NPDS consists of 10 items in the same vein as the NDI, but with the exception of an ordinal scale to measure neck pain. Nonetheless, the NPDS addresses the severity of pain and its interference with vocational, recreational, social, and functional aspects of living. However, this tool fails to address psychological factors, patient satisfaction, and the secondary economic gain that may affect patient outcome and interpretation.
49.4.7 Prolo Anatomic–Economic–Functional Rating System
Regarded as a simple outcome measure that provides a semi-quantitative method for denoting the patient’s progress and outcome, the Prolo Anatomic–Economic–Functional Rating System is a useful measure that addresses economic and functional status before and after treatment.33 Based on this assessment, an economic grade is determined that aids in establishing the patient’s capacity for employment or ability to participate in other types of activities. This rating system also helps determine the effect of pain on daily activities but overall evaluates five criteria before and after treatment. Responses to these criteria are based on scores ranging from 2 to 10, where 2 is defined as incapacitating and 10 as perfect. As a result, the following four main grades are possible: excellent (10, 9), good (8, 7), fair (6, 5), and poor (4–2).
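The grading bands quoted above lend themselves to a short sketch. It takes the total Prolo score (2 to 10) as input and returns the corresponding grade; the function name `prolo_grade` is hypothetical, and how the total is assembled from the individual criteria is deliberately left outside the sketch.

```python
def prolo_grade(total_score: int) -> str:
    """Map a total Prolo score (2-10) to the grade bands given in the text."""
    if not 2 <= total_score <= 10:
        raise ValueError("Prolo total score must lie between 2 and 10")
    if total_score >= 9:
        return "excellent"  # 10, 9
    if total_score >= 7:
        return "good"       # 8, 7
    if total_score >= 5:
        return "fair"       # 6, 5
    return "poor"           # 4-2
```

Comparing `prolo_grade` before and after treatment gives a coarse, semi-quantitative view of a patient's progress.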
49.4.8 Cervical Spine Outcomes Questionnaire
Although various generic and disease-specific outcome health assessment tools exist to evaluate conditions in the neck, a comprehensive tool for evaluating the cervical spine did not exist until the Cervical Spine Outcomes Questionnaire (CSOQ) was developed by BenDebba et al.34 The CSOQ has its roots in past instruments, such as the NDI, the NPDS, the ODI, and the North American Spine Society Cervical Spine Outcome Assessment tool, and covers a range of factors organized in a 57-item format. The CSOQ was designed not only to address levels of pain severity, functional disability, physical symptoms, health care utilization, and patient satisfaction but also to encompass psychometric parameters that are deemed essential in the architecture of a proper health care strategy and outcome assessment.35,36
49.4.9 American Spinal Injury Association Impairment Scale
A grading system for classifying spinal cord injuries was developed in the early 1990s by the American Spinal Injury Association (ASIA) and the International Society of Paraplegia. It is known as the American Spinal Injury Association Impairment Scale.37 This scale is, in essence, a modified form of Frankel’s classification denoting various levels of functional disability and has been universally accepted for classification of spinal cord injury.38 The scale is composed of five categories or grades. Grade A represents a complete injury where no sensory or motor function is preserved in the lowest sacral level. Grade B denotes an incomplete injury where sensory, but not motor, function is preserved below the level of injury. Based on preservation of motor function below the neurologic level, grades C and D represent incomplete spinal cord injuries with muscle grades less than 3 and 3 or greater, respectively. Finally, grade E represents normal sensory and motor functions.
49.4.10 Modified Japanese Orthopedic Association Scale
In recent years, the modified Japanese Orthopedic Association (mJOA) scale has been applied and validated to assess outcomes in patients undergoing surgical treatment for various forms of degenerative cervical myelopathy (DCM), including cervical spondylotic myelopathy and ossification of the posterior longitudinal ligament. The mJOA rates patients on a 0- to 18-point scale across the domains of hand and upper extremity function, sensation, gait, and bladder function. Patients’ level of impairment is classified as mild (mJOA 15–17), moderate (12–14), or severe (0–11). The minimal clinically important difference (MCID) of the mJOA has also been established and varies according to the level of baseline impairment (1 point for mild DCM, 2 points for moderate DCM, and 3 points for severe DCM).39,40
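As a worked illustration of the severity bands and MCID thresholds above, the following sketch classifies a baseline mJOA score and checks whether a follow-up score shows at least the minimal clinically important improvement. The function names are hypothetical, and treating a score of 18 as "no impairment" is an assumption drawn from the 0 to 18 range, since the text assigns bands only up to 17.

```python
# MCID thresholds by baseline severity, as quoted in the text.
MCID_BY_SEVERITY = {"mild": 1, "moderate": 2, "severe": 3}


def mjoa_severity(score: int) -> str:
    """Classify an mJOA score into the severity bands given in the text."""
    if not 0 <= score <= 18:
        raise ValueError("mJOA score must lie between 0 and 18")
    if score == 18:
        return "none"      # assumption: a full score means no impairment
    if score >= 15:
        return "mild"      # 15-17
    if score >= 12:
        return "moderate"  # 12-14
    return "severe"        # 0-11


def meets_mcid(baseline: int, follow_up: int) -> bool:
    """Return True if improvement reaches the MCID for the baseline severity."""
    severity = mjoa_severity(baseline)
    mjoa_severity(follow_up)  # validates the follow-up score range
    if severity == "none":
        return False  # no impairment at baseline, so no room to improve
    return (follow_up - baseline) >= MCID_BY_SEVERITY[severity]
```

For instance, a patient improving from 13 to 15 meets the 2-point threshold for moderate DCM, whereas a patient improving from 10 to 12 falls short of the 3-point threshold for severe DCM.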
49.5 Methodological Issues Relevant to Outcomes Research
The basic premise of outcomes research is that, when recommending management, medical practitioners need to know the likely outcomes of different treatments and what matters to patients. Moreover, outcomes research can further elaborate on how patients value alternative outcomes that can have a direct effect on or guide policy decisions and practice guidelines. However, lack of good-quality data or access to data, small sample sizes, self-serving manipulation of data, inappropriate and inadequate statistical analyses, biases, and confounding variables may impede proper analysis and the quality of outcomes research. Various study designs can address such concerns, to some extent; however, meta-analysis and, to a greater extent, systematic reviews are considered studies associated with the highest level of evidence.41 Systematic reviews focus on a clinical question encompassing a comprehensive and explicit search strategy with a uniformly applied criterion-based selection that undergoes rigorous critical appraisal and is synthesized with formal rules (that may include meta-analysis), the inferences of which are evidence based and can be replicated.41 Moreover, systematic reviews evaluate the “quality” of the study design, methodology, and analyses, thereby endeavoring to integrate valid and reliable information to establish or facilitate rational decision making.
A hierarchy of evidence for grading research was recommended in the late 1970s as part of the initiative by the Canadian Task Force on the Periodic Health Examination.42 The levels of evidence (Table 49.4) and associated grading (Table 49.5) contemplated by this group were further refined by Sackett et al at McMaster University, and at the University of Oxford in the ensuing years.1,43,44,45,46,47 Applying such levels and grades of evidence, Freedman et al48 examined 500 published research papers in two highly regarded orthopedic journals and found only 33 randomized controlled trials (Fig. 49.1).
Table 49.4 Levels of evidence
Level | Sources of evidence |
I | Meta-analysis of multiple well-designed, controlled studies; randomized trials with low false-positive and low false-negative errors (high power) |
II | At least one well-designed experimental study; randomized trials with high false-positive or high false-negative errors or both (low power) |
III | Well-designed, quasi-experimental studies, such as nonrandomized, controlled, single-group, preoperative/postoperative comparison, cohort, time, or matched case/control series |
IV | Well-designed, nonexperimental studies, such as comparative and correlational descriptive and case studies |
V | Case series, case reports, and clinical examples |
Table 49.5 Grade of recommendations of levels of evidence
Grade | Grade of recommendation |
A | Evidence of type I or consistent findings from multiple studies of type II, III, or IV |
B | Evidence of type II, III, or IV and generally consistent findings |
C | Evidence of type II, III, or IV, but inconsistent findings |
D | Little or no systematic empiric evidence |
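One possible reading of Table 49.5 can be sketched as a small decision function. The names `Consistency` and `recommendation_grade` are hypothetical, and two interpretive assumptions are made that the table itself does not state: "multiple studies" is taken to mean more than one study of level II to IV, and a body of evidence consisting only of level V reports is treated as grade D.

```python
from enum import Enum
from typing import Sequence


class Consistency(Enum):
    CONSISTENT = "consistent"
    GENERALLY_CONSISTENT = "generally consistent"
    INCONSISTENT = "inconsistent"


def recommendation_grade(levels: Sequence[int], consistency: Consistency) -> str:
    """Assign a grade (A-D) from study evidence levels (1-5 for I-V)."""
    if any(level not in (1, 2, 3, 4, 5) for level in levels):
        raise ValueError("evidence levels must be integers from 1 (I) to 5 (V)")
    if 1 in levels:
        return "A"  # evidence of type I
    mid_level = [level for level in levels if level in (2, 3, 4)]
    if not mid_level:
        return "D"  # assumption: only level V reports, or no studies at all
    if consistency is Consistency.CONSISTENT and len(mid_level) > 1:
        return "A"  # consistent findings from multiple type II-IV studies
    if consistency is Consistency.INCONSISTENT:
        return "C"  # type II-IV evidence with inconsistent findings
    return "B"      # type II-IV evidence, generally consistent findings
```

Under these assumptions, `recommendation_grade([2, 3], Consistency.GENERALLY_CONSISTENT)` returns grade B, while the same studies with inconsistent findings would be graded C.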