Evaluation is the basis for improving care for people with mental illness. It is vital to know whether interventions are beneficial or harmful, and whether they offer value for money. Mental health interventions need to be understood both in terms of their active ingredients and how they fit within their context.(1) Such combined interventions, often including pharmacological, psychological, and social elements, are the epitome of ‘complex interventions’(2) and their evaluation poses considerable challenges. In this chapter we shall first discuss definitions of evaluation, and then consider why, what, and how to evaluate in mental health services. In our conclusion we shall indicate the most important trends in this field for the coming years. Our overall approach is centred upon the idea that ongoing evaluative research is of fundamental importance in discovering which interventions are effective, neutral, or harmful, and that such information is essential to deliver better mental health care.
Evaluation: definitions and conceptual framework
The Concise Oxford English Dictionary(3) gives the following definitions of ‘evaluation’:
evaluate (verb transitive) 1. assess, appraise; 2a. find or state the number or amount of; 2b. find a numerical expression for.
The etymological root of the word therefore refers directly to ‘value’, although in common usage ‘evaluation’ now has a more technical connotation. In our view evaluation necessarily requires both the precise measurement of the effects of treatments or services and a contextual understanding of the meaning and value of such results.
A conceptual model that can be used to clarify key issues related to the evaluation of mental health services is the Matrix Model.(4,5) The two dimensions of this model are place and time (see Table 7.6.1). Place refers to three geographical levels: (1) country/regional, (2) local, and (3) individual. Time refers to three phases: (A) inputs, (B) processes, and (C) outcomes. In this framework inputs refers to all those resources which are necessary before health care can take place (such as financial and human resources, policies, and treatment guidelines), processes refers to all those activities which constitute the delivery of health care (such as outpatient consultations or hospital admissions), while outcomes refers to the consequences of health care (such as changes in symptoms, disability, and quality of life). In relation to the evaluation of mental health services, we shall illustrate in this chapter how inputs and processes need to be measured and understood in terms of their contribution to the outcomes of care.
Historically, the first attempts to evaluate psychiatric practice originated in the mid-nineteenth century as the tabulation of admissions, discharges, and deaths in mental hospitals, simply describing the inputs and processes of care. In recent decades, as more sophisticated research methodologies and more valid and reliable research measures have been developed, so the evaluation of mental health services has increasingly focussed upon the analysis of the outcomes of care. As Sartorius has put it, ‘In its most classical form, evaluation denotes a comparison between results and goals of activity’,(6) indicating that evaluation has now become a purposeful exercise in which measurements are used as tools to answer specific questions, usually defined a priori at the beginning of a scientific study.
Why evaluate mental health services?
In our view, the main purposes of mental health service evaluation are to assess the effectiveness and cost-effectiveness of care, either at the organizational (local) or at the patient (individual) level. In the long term such evidence can be used to provide better services for people with mental illness. For example, evaluation can be used to compare differing models of care, as in studies in England showing that home-treatment teams can provide a realistic alternative to emergency hospital admission.(7,8,9) Evaluation therefore measures the impact of care (outcomes) and also aims to increase understanding of the active ingredients (inputs and processes) which contribute to better outcomes.(1) In fact, a wider range of purposes can be served by the evaluation of mental health services, as shown in Table 7.6.2.
What to evaluate in mental health services?
In our view the most important focus of evaluation is upon the outcomes of care.(10,11) The outcome chosen for any particular evaluation will depend upon the central question addressed and the level at which outcomes are assessed, as shown in Table 7.6.3.
Table 7.6.1 Overview of the Matrix Model, with examples of inputs, processes, and outcomes

(2) Local level
2A Inputs: local service budgets and balance for hospital and community services; local population needs assessment; staff numbers and mix
2B Processes: clinical and non-clinical services; working relationships between teams; service contacts and patterns of service use; pathways to care and continuity; targeting of services to special groups
2C Outcomes: suicide rates among people with mental illness; employment rates; physical morbidity rates

(3) Individual level
3A Inputs: assessments of individual needs made by staff, service users, and families; therapeutic expertise of staff; information for service users; information for family members
3B Processes: content of therapeutic interventions (psychological, social, and pharmacological); continuity of clinical staff; frequency of appointments
3C Outcomes: symptom severity; impact on caregivers; satisfaction with services; quality of life; disability; met and unmet needs
Directly in relation to the population level, a frequently used outcome measure is suicide rate (see cell 1C in Table 7.6.1). Rates of homelessness among mentally ill people (or rates of mental illness among the homeless) can also be used as an outcome indicator of the effectiveness of mental illness policies at the national (or regional) level.
At the local level, outcome indicators useful for evaluation can be derived in three ways: (i) by interpolating from regional/national data; (ii) by measuring directly at the local level; and (iii) by aggregating individual-level information up to the local level. For example, rates of suicide and unemployment can be estimated using the first method, or directly measured using the second approach if the appropriate data and resources exist, which provides more accurate and up-to-date information. The third approach is to aggregate information gathered from individual patients up to the local level, provided that the institutions caring for those patients are willing to cooperate in integrating their datasets.
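The third, aggregation-based approach can be sketched in a few lines of code. This is a minimal illustration only: the field names (`area`, `employed`) and the records are hypothetical, not drawn from any standard dataset.

```python
from collections import defaultdict

def local_rates(patients):
    """Aggregate hypothetical individual-level records up to the
    local (service-area) level, yielding an unemployment rate per area."""
    totals = defaultdict(lambda: {"n": 0, "unemployed": 0})
    for p in patients:
        area = totals[p["area"]]          # assumed field: service-area code
        area["n"] += 1
        area["unemployed"] += p["employed"] is False
    return {a: t["unemployed"] / t["n"] for a, t in totals.items()}

# Illustrative individual-level records
records = [
    {"area": "North", "employed": False},
    {"area": "North", "employed": True},
    {"area": "South", "employed": False},
    {"area": "South", "employed": False},
]
print(local_rates(records))  # {'North': 0.5, 'South': 1.0}
```

In practice the same pattern applies to any individual-level outcome (symptom scores, service contacts) that the cooperating institutions can link by locality.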
Table 7.6.2 Main purposes of mental health service evaluation
To assess the outcomes of services in experimental conditions (efficacy)
To investigate whether interventions which have demonstrated efficacy under experimental conditions are also effective in ordinary, routine clinical conditions
To understand the mechanism of action (i.e. active ingredients) of interventions
To inform mental health service investment decisions, for example using health economic data on cost-effectiveness
To raise awareness among planners, policy makers, and politicians of service gaps
To test a priori or to check post hoc the value of planning decisions (for example, the closure of mental hospitals)
At the individual level mental health service evaluation increasingly acknowledges the importance of outcomes other than symptom severity.(10,11) Traditionally, symptom severity measures were used most often to assess the effectiveness of early mental health treatments, and psychiatrists and psychologists contributed to the early development of such assessment scales to allow this research to take place.(10,11) While the primary symptoms are clearly important, most of the more severe mental disorders involve symptom persistence, and it is at present unrealistic to see symptom eradication as the sole aim of treatment. Very often, therefore, after the point of maximum symptom relief, when the extent of the ongoing impairments is clear, the clinical task becomes one of attempting to minimize the consequent disability and handicap.
Table 7.6.3 Outcome measures suitable for use in routine clinical practice

Outcome measure                   Country level   Local level   Individual level
Employment status                 ✓               ✓             ✓✓
Physical morbidity                ✓               ✓             ✓
Suicide and self-harm             ✓✓              ✓             ✓✓
Homelessness                      ✓               ✓✓
Standardized mortality ratios     ✓               ✓
Symptom severity                                  ✓             ✓✓
Impact on caregivers                              ✓             ✓
Satisfaction with services                        ✓             ✓✓
Quality of life                                   ✓             ✓
Disability                                        ✓             ✓✓
Met and unmet needs for care                      ✓             ✓

Key: ✓ = suitable for use as an outcome; ✓✓ = commonly used as an outcome.
The importance of the impact of caring for people with mental illnesses upon family members and others who provide informal care has long been recognized, but has only been subjected to concerted research relatively recently.(11,12,13) Such research has shown that it is common for carers themselves to suffer from mental illnesses, most commonly depression and anxiety, and to worry about the future when they may no longer be able to cope. Moreover, many family members are most distressed by the patient’s underactivity, and are often poorly informed about the clinical condition, its treatment, and the likely prognosis, as well as being inadequately provided with a practical action plan of what to do should a crisis occur. Indeed, some services continue to convey to families the outmoded idea that carers, especially parents, are in some way to blame for the disorder or for relapses of the condition. The regular provision of information sessions for family members is now a hallmark of good practice.(14,15)
Patients’ satisfaction with services is a further domain that has recently become established as a legitimate, important, and feasible area of outcome assessment.(16) This is a recognition of the contribution that service users and their carers can make to outcome assessment. Psychometrically adequate scales in this field are those that adopt a multidimensional approach, assess the full range of service characteristics, are independently administered (so that patient ratings have no consequences upon their future clinical care), and have established validity and reliability.(17)
Quality of life ratings have also become prominent during the last decade, and several scales have been constructed that reflect different basic approaches to the topic.(18) The first distinction is between scales that address subjective well-being, compared with those that also measure objective elements of quality of life. The second main point of differentiation is between scales constructed for the general population and those designed for patients suffering from specific disorders, including the more severe mental illnesses.(19) One advantage of quality of life data is that they tend to be popular with politicians, for whom the concept often has powerful face validity.
Among people with longer-term or more complex mental illnesses, the measurement of disability is often an important consideration.(20) Increasing importance is also being attached to the needs of people with mental illness, where a met need is a difficulty for which an appropriate intervention is being provided, and an unmet need is a difficulty for which it is not.(21) Needs (both met and unmet) may be defined by professionals/experts or by service users, and there is emerging evidence that service user ratings may be more informative, for example in predicting quality of life.(22,23,24)
Psychometric properties of outcome measures
Establishing the psychometric qualities of scales used for service evaluation is a central issue.(4) Among the most important characteristics of outcome scales are validity and reliability. Validity refers to whether a scale actually measures what it is intended to measure. It is conventionally assessed in terms of face validity, content validity, consensual validity, criterion-related validity, and construct validity.
In addition, a rating scale must give repeatable results for the same subject when used under different conditions, i.e. it must be reliable. There are four widely used methods to gauge reliability: inter-rater reliability, test–retest reliability, parallel-form reliability, and split-half reliability. The main issue for the evaluation of mental health services is to use wherever possible scales with known and adequate psychometric properties.
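Inter-rater reliability for a categorical rating scale is often summarized with Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The following sketch is illustrative only; the two raters and their ‘case’/‘non-case’ judgements are hypothetical:

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa).
    Ratings are category labels; the two lists must be the same length."""
    n = len(rater_a)
    # Observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    cats = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in cats
    )
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of six subjects by two clinicians
a = ["case", "case", "non-case", "case", "non-case", "non-case"]
b = ["case", "non-case", "non-case", "case", "non-case", "case"]
print(round(cohens_kappa(a, b), 2))  # 0.33
```

A kappa of 0.33 here illustrates why raw percentage agreement (4/6 ≈ 0.67 in this example) overstates reliability: much of that agreement would occur by chance alone. Analogous correlation-based statistics serve test–retest, parallel-form, and split-half reliability.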
How to evaluate mental health services
In this section we consider research designs that may be applicable to the range of contexts used in mental health service evaluation.(1) Different types of evidence produced using these designs cannot be considered as equivalent. A hierarchical order has been proposed by Geddes and Harrison(25) as shown in Table 7.6.4.
In terms of research methods or designs which can be used to produce such evidence, they can be considered as: (i) randomized controlled trials (RCTs), (ii) quasi-experimental studies, (iii) case-control studies, (iv) cohort studies (prospective or retrospective), (v) cross-sectional studies, and (vi) case series and single case studies. Since evaluations of mental health services are usually concerned with complex interventions, it is helpful to have an overall scheme linking different stages of research to test treatment interventions. The Medical Research Council (MRC) framework for the evaluation of complex interventions sets out one such sequence, as shown in relation to anti-stigma interventions in Table 7.6.5. The elements in this scheme can be considered as sequential, or stages 0, 1, and 2 can be seen as one larger iterative activity.(1) Nevertheless, although this gives salience to randomized controlled trial designs, it is important to appreciate that research study designs need to be matched to the purpose of each type of evaluation, as shown in Table 7.6.6.
Evidence from a meta-analysis of randomized controlled trials
Meta-analysis can be defined as ‘the quantitative synthesis of the results of systematic overviews of previous studies’, while systematic overviews, in turn, are methods of collating and synthesizing all the available evidence on a particular scientific question.(26) Since randomized controlled trials are often considered to produce the most sophisticated evidence on the efficacy of medical treatments, a meta-analysis conducted on well selected and relevant randomized controlled trials can be seen as the highest order of knowledge. It follows that the quality of systematic overviews is limited by the quality and quantity of the contributory trials (see Table 7.6.7).(27)
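The core quantitative step of such a synthesis can be sketched with the standard fixed-effect, inverse-variance method: each trial's effect size is weighted by the inverse of its variance, so that more precise trials contribute more to the pooled estimate. The three trial results below are hypothetical, used only to illustrate the arithmetic:

```python
import math

def pooled_effect(effects, std_errors):
    """Fixed-effect inverse-variance pooling of study effect sizes.
    Returns the pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]          # precision weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical standardized mean differences from three trials
effects = [-0.40, -0.25, -0.55]
ses = [0.20, 0.15, 0.25]
est, se = pooled_effect(effects, ses)
print(f"pooled effect {est:.2f} "
      f"(95% CI {est - 1.96 * se:.2f} to {est + 1.96 * se:.2f})")
```

Real systematic overviews add further steps this sketch omits, notably heterogeneity assessment (and random-effects models where heterogeneity is substantial) and appraisal of each trial's risk of bias.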
Table 7.6.4 Hierarchy of evidence
1a
Evidence from a meta-analysis of RCTs
1b
Evidence from at least one RCT
2a
Evidence from at least one controlled study without randomization
2b
Evidence from at least one other type of quasi-experimental study
3
Evidence from non-experimental descriptive studies, such as comparative studies, correlation studies, and case-control studies
4
Evidence from expert committee reports or opinions and/or clinical experience of respected authorities
(Reproduced from J.R. Geddes, and P.J. Harrison, Closing the gap between research and practice, The British Journal of Psychiatry, 171, 220–5, copyright 1997, The Royal College of Psychiatrists.)
Table 7.6.5 Phases of the Medical Research Council framework for the evaluation of complex interventions(1,2)

0 Preclinical: explore relevant theory to ensure the best choice of intervention and hypothesis, and to predict major confounders and strategic design issues
1 Modelling/manualization: identify the components of the intervention and the underlying mechanisms by which they will influence outcomes, to provide evidence that you can predict how they relate to and interact with each other
2 Exploratory: describe the constant and variable components of a replicable intervention and a feasible protocol for comparing the intervention with an appropriate alternative
3 Definitive trial: compare a fully defined intervention with an appropriate alternative using a protocol that is theoretically defensible, reproducible, and adequately controlled, in a study with appropriate statistical power
4 Long-term implementation: determine whether others can reliably replicate the intervention and its results in uncontrolled settings over the long term

Example: anti-stigma intervention in schools study