THE USE OF STANDARDIZED TESTS
Standardized tests assess aspects of current functioning in a systematic way, providing results that can be compared with normative data1. They are developed with the aim of maximizing reliability and validity; that is, they should provide a consistent measure (reliability) that genuinely reflects the construct of interest (validity). Standardized tests are always administered in the same way, following administration procedures set out in the test manual. Thus, the instructions given to each person taking the test, and all the parameters of the test such as the length of time for which stimuli are presented, are identical. These conditions are exactly the same as those used to obtain data from the reference group or normative sample against which the individual’s test scores will be compared, ensuring a valid comparison. Norms typically take into account the effects of age, providing stratified norms for different age groups, and adjustment may be made for other demographic factors that affect test performance, such as level of education. It follows that familiarity with the test procedures is essential in order to administer the test appropriately. Similarly, familiarity with the scientific basis and technical properties of the test, as well as the ability to consider the complex range of factors that may affect test performance, is necessary for accurate interpretation of test results. For this reason, use of many standardized tests is restricted to professionals who can demonstrate, either through their professional education or through further training, that they are specifically trained in test administration and interpretation.
Neuropsychological assessment of older people combines two traditions of standardized testing: psychometric testing and clinical neuropsychological assessment. These two approaches have been described as population-based and deficit-oriented, respectively2. In practice, the two approaches are typically used alongside one another, with an emphasis on considering patterns of scores across a range of tasks.
Psychometric testing assumes that the ability or factor being tested is normally distributed in the general population, and seeks to establish how the individual’s scores relate to those found in a representative sample of that population; for example, whether scores are close to the average for the reference group, or whether they are exceptionally high or low, and thus unusual. The classic example of this would be the measurement of IQ. Standardized tests of this kind typically include items across the range of difficulty, such that there will be some easy items that just about everyone should be able to do and some very challenging items that hardly anyone will be able to answer. Some tests identify different starting and finishing points for different age groups so that a range of items of appropriate levels of difficulty is administered, and some tests have rules specifying that the task should be discontinued in the event of incorrect answers being given on a specified number of items, on the basis that further, more difficult items would not be expected to be answered correctly. Therefore, these tests are suitable for individuals across the ability range and provide a challenging test for even those individuals with the highest levels of ability.
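The discontinue rule described above can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that items are ordered from easiest to hardest and that testing stops after a set number of consecutive errors; the function name, the parameter `max_consecutive_errors` and its default value are hypothetical, and the actual rule for any given test is the one set out in its manual.

```python
def apply_discontinue_rule(responses, max_consecutive_errors=3):
    """Score a run of items, stopping after a run of consecutive errors.

    responses: list of booleans (True = correct), ordered easiest to hardest.
    max_consecutive_errors: hypothetical threshold; real tests define their own.
    Returns (raw_score, items_administered).
    """
    raw_score = 0
    consecutive_errors = 0
    administered = 0
    for correct in responses:
        administered += 1
        if correct:
            raw_score += 1
            consecutive_errors = 0  # a correct answer resets the error run
        else:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                break  # harder items are not administered
    return raw_score, administered


# Illustrative response pattern (invented data, not from any real test).
example = [True, True, False, True, False, False, False, True]
score, n_items = apply_discontinue_rule(example)
```

In this invented example the final item is never presented, because three consecutive errors have already occurred by the seventh item.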
Clinical neuropsychological assessment sets out to determine whether there is evidence of impairment in a particular ability, such that the individual’s scores are different to what they would have been if illness or injury had not intervened. This may be determined by comparing performance on abilities thought to be affected by the illness or injury with performance in areas thought to remain intact, or may be inferred from a comparison of the individual’s scores with scores obtained by a normative sample, taking into account what would be expected on the basis of the individual’s background. In the latter case, normative data will typically have been used to identify a cut-off score, signifying that scores more extreme than this value are unlikely to occur within the normal range. Standardized tests based on this approach rest on the assumption that the normal population will generally perform well and that poor scores are indicative of neurological abnormality.
Scores on standardized tests may be expressed in a number of ways. Most frequently, raw scores are converted to either a standard score or a percentile rank, allowing comparison across subtests and against norms. The standard scores corresponding to given raw scores on a test are calculated on the basis of normative data collected from a large sample, taking into account the distribution of scores in the sample and the mean level of achievement. Where age is likely to affect performance, raw scores are converted to standard scores using the norms for the relevant age group. The use of percentile ranks serves a similar purpose, indicating whether a score is in the average range or is very infrequent and therefore unusual. Standard scores provide a consistent scale with a defined mean and standard deviation; for example, in the Wechsler Adult Intelligence Scale (WAIS-III)3, a scale with a mean of 100 and a standard deviation of 15 is used. Subtest scores are converted to this scale and overall scores are calculated on the same scale. Scores that are two standard deviations above or below the mean are generally considered exceptional, in that they arise very rarely. Scores of 70 or below (two standard deviations below the mean of 100) occur in approximately 2% of a normally distributed population, well within the lowest 5%. Some tests use a mean of 10 and a standard deviation of 3; in this case, scores of 5 or below occur in approximately 5% of the population.
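As a rough illustration of the arithmetic behind standard scores and percentile ranks, the following Python sketch converts a raw score to a standard score through its z-score and then looks up the corresponding percentile rank. It assumes a linear conversion and a perfectly normal distribution; published tests normally supply conversion tables in the manual instead. The function names and the normative mean and standard deviation in the example are hypothetical.

```python
from statistics import NormalDist


def to_standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Rescale a raw score onto a standard-score scale via its z-score."""
    z = (raw - norm_mean) / norm_sd  # distance from the normative mean in SD units
    return scale_mean + scale_sd * z


def percentile_rank(standard_score, scale_mean=100.0, scale_sd=15.0):
    """Percentage of a normal population scoring at or below the given score."""
    return 100.0 * NormalDist(scale_mean, scale_sd).cdf(standard_score)


# Hypothetical normative sample: mean raw score 40, standard deviation 8.
standard = to_standard_score(24, norm_mean=40, norm_sd=8)  # two SDs below the mean
rank_100_scale = percentile_rank(standard)                 # about 2.3
rank_10_scale = percentile_rank(5, scale_mean=10, scale_sd=3)  # about 4.8
```

Under these assumptions a standard score of 70 falls at roughly the 2nd percentile, and a score of 5 on the scale with a mean of 10 falls just below the 5th percentile.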
With tests that use population-based norms, the fact that a score is unusual does not in itself indicate that there is an impairment. The observed score may simply reflect the individual’s long-standing ability level: the individual may be one of the 5% of the population who score very poorly on the task. Therefore, information about the expected level of performance is needed in order to interpret the results. If the individual’s performance is widely discrepant from what would be expected for that individual, this suggests there may be a difficulty or impairment resulting from illness or injury. In practice, expectations about likely performance are usually based on consideration of a number of factors such as the individual’s educational and occupational background and level of achievement. For a very high-achieving individual, a score that is in the low average range may reflect a significant decline from previous, superior performance levels. In some cases, however, tests are available that can provide an estimate of expected performance levels. For example, in the English language, the ability to read irregularly spelled words can be tested with the Wechsler Test of Adult Reading (WTAR)4 or the National Adult Reading Test (NART)5. Since this is considered a ‘crystallized’ ability that is fairly resistant to many forms of brain injury and to the very early stages of dementia, scores on this task can be converted to obtain a predicted score on the WAIS-III3. If comparing this predicted score with the observed score yields a significant discrepancy, with the observed score lower than the predicted score, this is suggestive of a decline in overall intellectual functioning.
Similarly, a comparison might be made between the WAIS IQ score and the Wechsler Memory Scale (WMS-III)6 score; normative data are available to indicate whether or not a discrepancy in scores is large enough, and unusual enough, to be considered a significant indicator of an impairment in memory relative to what would be expected on the basis of the IQ score.
With deficit-oriented tests, a score that would be unusual in the reference group is generally considered to provide an indication of impairment, provided that the poor score cannot be explained by other, contextual factors. The 5th percentile often serves, in effect, as a cut-off for identifying impairment. Scores at or below the 5th percentile in the normal population may be defined as falling into the impaired range. An example of a test using this approach is the Visual Object and Space Perception Battery7. For tests using standard scores, as noted above, a score at or below the 5th percentile approximately translates into a score of 70 or below (on the scale with a mean of 100 and standard deviation of 15) or a score of 5 or below (on the scale with a mean of 10 and standard deviation of 3). Some tests use their own classifications, indicating whether a particular score is ‘impaired’, ‘poor’, or ‘normal’. An example of a test using this approach is the Rivermead Behavioural Memory Test8. Some tests reflect tasks that should be achievable by anyone with normal functioning and thus the presence of errors in itself reflects likely impairments; for example, the Behavioural Inattention Test9 identifies scores as impaired if they are below those of the lowest performing normal controls.
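The relationship between a percentile cut-off and a standard score can be checked with a short Python sketch. It assumes scores are exactly normally distributed, which real normative samples only approximate, and the function names and classification labels are hypothetical. Under that assumption the 5th percentile falls at about 75 on the scale with a mean of 100 and at about 5 on the scale with a mean of 10, so a threshold of 70 is a somewhat more conservative round figure.

```python
from statistics import NormalDist


def cutoff_for_percentile(percentile, scale_mean, scale_sd):
    """Standard score at the given percentile of a normal distribution."""
    return NormalDist(scale_mean, scale_sd).inv_cdf(percentile / 100.0)


def classify(score, cutoff):
    """Label a score relative to a cut-off (labels are illustrative only)."""
    return "impaired range" if score <= cutoff else "above cut-off"


cutoff_100 = cutoff_for_percentile(5, scale_mean=100, scale_sd=15)  # about 75.3
cutoff_10 = cutoff_for_percentile(5, scale_mean=10, scale_sd=3)     # about 5.1

label_low = classify(70, cutoff_100)   # at or below the 5th-percentile cut-off
label_mid = classify(85, cutoff_100)   # above the cut-off
```

A label produced this way is only a starting point: as the text notes, a low score counts as evidence of impairment only when other, contextual explanations have been ruled out.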
Provision of appropriate test norms is not straightforward, and this is especially the case with older age groups (for a fuller discussion of this issue, see Busch et al.2). Where tests were initially developed for younger adults, norms for older people may be unavailable, or may relate to a restricted age range. They may be based on more limited sample sizes than those for younger age groups, or may have drawn upon a non-representative sample containing only those highly motivated to assist with research. Some measures have only one set of norms available, while other frequently used measures offer numerous sets of norms among which the clinician can select those that are most appropriate for the patient and for the questions being addressed in the assessment. Details of many of these can be found in the compendium compiled by Strauss et al.10. Cultural differences may mean that the available norms are not directly appropriate for the individual; for example, an individual who has grown up in a developing country and immigrated to the United Kingdom or United States as an adult will have had different formative and educational experiences from the indigenous population, and thus norms based on a United Kingdom or United States indigenous population may be inappropriate. This in itself does not necessarily preclude the use of the test, but the lack of appropriate norms will place constraints on the interpretation of results. Availability of neuropsychological tests in languages other than English is limited, and use of interpreters during testing is unsatisfactory, especially where these are family members. Translation of tests is a highly skilled and complex task, and cultural, as well as linguistic, equivalence must be established. For a fuller discussion of cultural issues in assessment, see the review by Manly11.


