Psychological and Neuropsychological Assessment of Children
Katherine D. Tsatsanis, Ph.D.
Associate Research Scientist, Yale Child Study Center, Yale University School of Medicine, 230 S. Frontage Road, New Haven, CT 06520
Email: katherine.tsatsanis@yale.edu
Nature and Use of Psychological and Neuropsychological Assessment
The broad aims of a psychological and/or neuropsychological assessment are twofold: i) to provide a more complete description and understanding of the child, and ii) to inform strategies for intervention. This is accomplished in part through the use of psychological tests that offer an objective and standardized measure of a sample of behavior, one that allows performance to be evaluated on the basis of empirical data (1). However, it must be emphasized that psychological or neuropsychological tests and resulting test scores are but one part of the assessment process. Test selection and administration are important factors; test interpretation, above all, is critical. The final analysis is formed from multiple lines of converging evidence and takes into consideration the developmental and environmental context.
Psychological Assessment
Psychological tests were developed as a means to measure individual differences. Although diverse with regard to content, such measures shared a common use: to categorize and classify individuals based on observations of their behavior under uniform conditions (1). At the outset, such measures were applied toward educational, personnel, and military classification. Differential diagnosis was also identified as a concern amid the changes taking place in nineteenth-century institutional care, and test development was intended to aid in the educational placement of children, specifically in the study and instruction of children with mental retardation. Experimental psychology at this time was concerned with universal descriptions of human behavior, particularly the physiology of sensory responses; however, its general emphasis on the need for controlled conditions when making observations has remained at the heart of the standardization of procedures (uniform conditions) in psychological testing (1).
Early interest in educational testing led to the development of more sophisticated principles and measurement techniques that are now used to assess a wide variety of domains of functioning, including social, emotional, neuropsychological, and adaptive behavior. However, it is intellectual assessment that holds a place of notoriety in the history of psychology. Two of the more fundamental issues that have beset intelligence testing are the definition of intelligence and the use and interpretation of measures of intelligence. As illustrated in Table 4.2.4.1, theories of intelligence abound. More than this, each theorist posits multiple components or abilities as part of his account of intelligence. As such, it is worth keeping in mind that intelligence is a construct that is neither unitary nor fixed. Additionally, there is a distinction to be made between theories of intelligence and psychometric intelligence. Whereas the former provides conceptualizations of the nature of intelligence, the latter represents the measurement of general mental ability using standardized tests. The global scores yielded by these measures are usually stable and have general predictive value for educational, social, and job outcomes. The instruments are limited to what they measure, their interpretation is contingent on valid use, and they are of course subject to misuse.
The use and interpretation of IQ scores is an important matter and a lengthy subject. In brief, early approaches to psychometric intelligence focused on quantifying a general level of intelligence as represented by a single number (the IQ score) and assignment to a descriptive classification (e.g., “dull” or “very bright”). Subsequent methods have involved profile analysis, or a consideration of individual areas of strength and weakness. This approach may be most powerful when integrated with theories of cognitive abilities (13). Indeed, the cross-battery assessment approach (14) that has emerged of late in the arena of psychological assessment emphasizes the usefulness of identifying
cognitive processes versus, for example, reporting cognitive functioning in the context of a single IQ number. This represents a shift in thinking of cognitive activity in terms of a single function— intelligence— to a multifaceted entity.
TABLE 4.2.4.1 FACTOR ANALYTIC THEORIES OF INTELLIGENCE
Psychological Assessment Goals
A fundamental first step toward treatment planning is gaining a full understanding of the individual child. As noted, the psychological/neuropsychological examination is considered to be an integral part of this process. The psychologist/neuropsychologist is also in a unique position to consider the influence of the child’s cognitive functioning on academic and social emotional functioning. One purpose for seeking an assessment is that of diagnosis and/or differential diagnosis. The referral question may focus on a diagnostic ambiguity or the question may be one of level of functioning or development of a specific skill. A second major purpose for an evaluation is to gain information about a child’s cognitive and academic profile and/or an augmented understanding of his/her behavioral and emotional functioning. Diagnosis is often emphasized, but what may be needed to design educational as well as treatment objectives is a more detailed assessment of the child’s strengths and weaknesses in several areas. For example, language deficits may interfere with the child’s ability to form a personal narrative; memory deficits may account for challenges in learning or treatment gains; a child’s learning strengths/difficulties may inform the best modality for presenting information. Third, clinically and in research, assessment measures may be used for pre- and post-comparisons (in the case of brain trauma, in the evaluation of medication or a treatment program). Measurement through well constructed tests further serves an important function in research toward the identification of environmental and biological factors associated with behavioral differences (e.g., gene–brain–behavior relationships).
Neuropsychological Assessment
The traditional neuropsychological assessment is distinguished by its emphasis on producing a description and understanding of the relationship between brain and behavior. A fundamental approach to neuropsychological assessment is measurement of multiple ability domains sufficient to i) represent the principal areas of functioning thought to be mediated by the brain and ii) gather the information needed to address the clinical problems presented by the child (15). The assessment is typically quite comprehensive, as it is designed to sample a broad range of skills and abilities in the child. Given the emphasis on brain and behavior, it has been the longstanding practice of neuropsychologists to consider cognitive functioning as multidimensional. In describing the
brain–behavior relationship, there is the implicit recognition that cognition as an operation of the brain is complex and any inferences that are made about behavior conceptualized in terms of cognition should reflect this complexity (16).
The basic neuropsychological framework for understanding dimensions of behavior reflects the functional systems of the brain. These divisions may be represented broadly as cognitive, emotional, and control processes, and as connected systems in the brain they can be thought to have reciprocal influence. The domains for assessment include: i) alertness/arousal; ii) sensory perception; iii) attention; iv) memory or the encoding, storage, and retrieval of information; v) information processing, such as analysis and synthesis of information, problem solving, concept formation, etc.; vi) motor activity; and vii) intentional or goal-directed activity, i.e., the organizational programs of behavior, sometimes referred to as executive functions. Alterations in motivation and emotional capacity are also evidenced in brain injury or disease and should be considered for their impact on these other systems.
A neuropsychological assessment specifically may be sought to: i) ascertain the likelihood that the child’s problems in adaptation are the result of compromised brain functioning (versus, for example, the result of a psychiatric disturbance); ii) enhance understanding of the child’s psychosocial behavior by examining cognitive and control processes, such as how information is received, processed, and expressed by the child; and ultimately iii) identify the pattern or constellation of neuropsychological assets and deficits displayed by the child toward developing strategies for behavioral or educational intervention.
The Assessment Process
The psychological/neuropsychological assessment involves: i) clarifying the referral question, ii) selection and administration of psychological tests, iii) observation, iv) interpretation, and v) diagnostic formulation and recommendations.
Referral Question and Background History
The referral question(s) are initially identified by the parents and/or referring professional involved in the child’s care. They are further refined by obtaining information from multiple sources, including interviews with key people in the child’s life (parents, teachers, and other professionals), a review of past records (school reports, previous testing, and medical information), thorough history-taking, and talking with the child.
Selection and Administration of Psychological Tests
The types of assessment methods used and the breadth of the battery formed are key to test selection. Typically, a comprehensive evaluation will make use of a variety of assessment methods and assess a range of domains of functioning. One reason for sampling a range of functions lies in the fact that most psychological measures are not “pure”; that is, they do not assess one ability domain alone. It is important to discern whether, for example, on a timed task in which the child is asked to copy figures, poor performance is related to a motor, visual perceptual, attentional, and/or speed of processing issue. Difficulty on a measure of math skills may reflect limits in understanding numerical concepts, remembering math facts, understanding the language of mathematics (symbol use), knowing which operations to apply, sequencing (e.g., performing the correct steps in the correct order), copying errors, and/or attending to meaningful visual details (operational sign, place value, columns of numbers). Test selection is guided by evaluation of the test itself and related constructs such as norm groups, reliability, and validity (see Table 4.2.4.2 and discussion below).
TABLE 4.2.4.2 CRITICAL VARIABLES IN TEST SELECTION
Test administration variables include the environmental setting (e.g., quiet, well lit room, free of interruptions), establishing rapport with the child, and engaging the child in a manner so as to obtain the best possible performance. The rationale for creating optimal performance conditions is related to the purpose of the evaluation; that is, to determine if the child has the component cognitive skills or abilities necessary to function adequately (or more than adequately) at home, at school, or with others. Standardization, which refers to the uniformity of procedure in administering and scoring a test, is also a key concept in test administration. The examiner must know and adhere to the test procedures, including presentation of directions, use of materials, response to queries, etc. In all, the assessment must be conducted effectively to obtain information regarding the level of performance that the child is capable of but also in a standardized manner to ensure comparability of the scores obtained.
Observation of Test Behavior
In addition to obtaining test data, the examiner makes qualitative observations of the child’s presentation and performance during the test sessions. Clinical observation is an essential aspect of test interpretation. Test scores represent how a child performs on a particular test at a particular time. Qualitative observations must be integrated with the quantitative information or test scores to provide a more complete understanding of the results and conditions under which they were obtained. This includes variables such as attention, motivation, persistence, fatigue, illness, and rapport, as well as observations regarding how the child approached the tasks (e.g., use of verbal mediation, trial and error, a slow but accurate style).
Interpretation of Results
Test interpretation typically involves an analysis of levels and patterns of test performance. The clinician engages in a dynamic process of hypothesis testing and information gathering, reasoning deductively and inductively from the data collected. Interpretation of test results also requires taking into consideration the behavior observed during testing and other relevant behavioral data (e.g., suspicions of a primary visual or hearing impairment) and case history information (cultural, economic, family variables).
Summary and Recommendations
Assessment conclusions and recommendations should be based on all sources of information. The comprehensive assessment is designed to identify the child’s assets and deficits in a variety of domains of functioning. This approach promotes an understanding of the challenges the child faces and why, but also the strengths he/she possesses and how these can be used to help remediate areas of weakness. Inferences are made from these data to determine the services and strategies that will facilitate the child’s social, emotional, and academic functioning.
Principles of Assessment
The psychometric principles of assessment influence test selection, administration, and interpretation. These measurement issues are outlined below to familiarize the reader with the basic constructs and related issues.
Standardization Sampling/Developmental Norms
The raw scores obtained from tests are for practical purposes meaningless without a basis of comparison. As such, the data obtained from psychological tests are interpreted with reference to a norm group. This aspect of test development and use is fundamental because it means that evaluation of a child’s behavior need not rely on subjective interpretation alone. Rather, such norm-referenced tests offer: 1) quantification of the child’s level of performance with reference to his/her peer group; 2) an ipsative (“of the self”) comparison, or analysis of the child’s performance across different measures to determine areas of personal asset and deficit; and 3) longitudinal comparison, or assessment of gains/losses over time.
Norms are developed empirically on the basis of the performance of the normative sample, also sometimes referred to as the standardization sample. Standardization sampling refers to the procedure used: the normative data are obtained under standard conditions with regard to consistency of item content, administration procedures, and scoring criteria. The norm group should be evaluated for representativeness, size, and relevance (17). It should be a representative group of the child’s peers and large enough to ensure stability of the test scores; Sattler (17) recommends at least 100 subjects for each age group in the normative sample. In most cases, test developers will draw from U.S. Census Bureau data to determine the composition of this sample based on stratification variables such as age, gender, socioeconomic level, race, and geographic region. Many instruments also offer normative data obtained from special populations to permit comparison of the child to other children with the same disorder (a peer group as well).
Reliability
The reliability of a test refers to the consistency of measurement; as such, it also speaks to the degree to which test scores are free from random fluctuations of measurement (18). The example of a scale used to measure weight vividly illustrates the importance of the stability of test scores as concerns accuracy or dependability of measurement. Let us say that on one day when you step on the scale it shows a weight of 150 pounds, 100 pounds on the next day, and then 175 pounds the following day. The scale could not be considered a meaningful or accurate measure of your weight. The same could be said for a psychological test that is not reliable. Test results are not interpretable if the test is not reliable, making reliability a fundamental factor in test selection and interpretation.
There are different types of reliability, each of which reflects a different aspect of how a test score is reproducible (Table 4.2.4.3). The reliability coefficient, symbolized by the letter r with two identical subscripts, is used to express the degree of consistency of test scores. It is a particular kind of correlation coefficient with a range of .00 (indicating no association or consistency between scores) to 1.00 (perfect reliability). No assessment measure is 100% reliable, and as such, some error of measurement is to be expected. The reliability coefficient can be used to determine the degree of error variance or random or unsystematic variation in the measurement instrument (1). The error variance of a test is calculated by subtracting the reliability coefficient from 1.00, where 1.00 indicates perfect reliability. Thus, a reliability coefficient of .80 indicates 80% reliability and 20% error variance. Typically, a minimum acceptable level of reliability is .80 (17).
The reliability coefficient is an important number as a measure of consistency, but also as a source of the amount of reliable variance associated with the test. As such, it is used in the calculation of the standard error of measurement (SEM) of a test score and in turn the confidence interval. Scores for a psychological test (e.g., an IQ score) are often reported as falling within a specific range of scores, or within a confidence interval. The psychometric properties of a test are such that although they may aim to quantify level of functioning in a real way, the obtained test score is actually composed of a true score (hypothetical) and an error score (1). The confidence interval represents the range of scores surrounding the obtained score within which the true score is likely to lie. For example, if a child obtains an IQ score of 90, we can state at the 95% confidence level (the usual reported level) that her IQ score on any single administration of the test will lie between 84 and 96. That is, 95 times out of 100, her IQ score will fall within this band of values. The confidence interval is determined by the SEM for the instrument, which in turn is computed from the reliability coefficient.
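To make the arithmetic concrete, the following is a minimal Python sketch (not from the original chapter) of how error variance, the SEM, and a 95% confidence interval all follow from the reliability coefficient. The reliability value of .96 is a hypothetical figure chosen so that the output reproduces the IQ 90, 84 to 96 example above.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(obtained, sd, reliability, z=1.96):
    """Range around the obtained score within which the (hypothetical)
    true score is likely to lie at the given confidence level."""
    margin = z * sem(sd, reliability)
    return obtained - margin, obtained + margin

# Hypothetical test on the standard-score metric (SD = 15), reliability .96
r_xx = 0.96
print(f"error variance: {1.0 - r_xx:.0%}")   # 4% error variance
print(f"SEM: {sem(15, r_xx):.1f}")           # 3.0 points
low, high = confidence_interval(90, 15, r_xx)
print(f"95% CI: {low:.0f} to {high:.0f}")    # 84 to 96, as in the text
```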
The major point to underscore here is that the obtained test score is not precise or definitive, as each test inherently contains measurement error that should be taken into consideration when decisions are being made based on a single score (e.g., IQ score for qualification of services). The second and related point is that measurement error must be accounted for when reported scores are compared over time or across instruments. The difference between two test scores may be due to chance factors or the error variance associated with each test. Correspondingly, the reliability coefficient for each test is taken into consideration in the calculation of discrepancy scores.
Validity
Test validity is a term used to represent the meaning or relevance of a test, specifically, whether it measures what it is purported to measure (1). It is a fundamental psychometric concept that can be approached in several ways, as detailed in Table 4.2.4.4. The validity of a test is relevant when assessing what is being measured and how completely, as well as how to use a test appropriately. Validity coefficients are a type of correlation coefficient and accordingly are impacted by the range of attributes being measured (the narrower the range, the lower the value of the validity coefficient). Examinee variables also can impact validity; if an examinee presents with severe test-taking anxiety, extreme fatigue or illness, a hearing or vision impairment (and for example forgets to wear her glasses), or fails to understand the instructions, these factors are likely to render the test scores invalid, as the test is no longer measuring the characteristic it is intended to measure. As such,
psychological reports include a section on observations of test behavior and a description of any presenting factors that are a threat to validity. Extrinsic or environmental factors such as socioeconomic status, access to quality teaching or textbooks, or cultural experiences can similarly impact validity and are addressed in the interpretation of test scores.
TABLE 4.2.4.3 OVERVIEW OF TYPES OF RELIABILITY
Interpretation of Test Scores
Derived Scores
A basic feature of interpretation of test scores is the comparison of scores to some standard or norm. As mentioned above, raw test scores, whether the number of points earned, items successfully completed, or symptoms endorsed, are meaningless on their own. Rather, the raw test score is evaluated relative to the test performance of the standardization sample. The question that is answered in this process is where this particular child’s score falls relative to the distribution of scores produced by the standardization sample, where the mean represents the average and the standard deviation (SD) represents the variability. There are a variety of derived scores or ways in which this comparison can be reported.
Standard Scores
Standard scores are the most typical and often the most suitable kind of score to report. Standard scores are particularly useful for making comparisons across tests, as the mean and SD are set and there are equal units along the scale (18). For all standard scores, a score falling 1 SD below the mean (below the average range) or 2 SD above the mean (well above average) occupies the same position relative to the group regardless of the instrument used. Comparability of scores in this manner is achieved through a transformation of the raw data. The usual types of reported scores are standard scores, scaled scores, and T-scores (Table 4.2.4.5). The typical standard score has a mean set at 100 and a standard deviation of 15. Most major cognitive and achievement assessment batteries report global scores in this format. Individual subtest scores, however, may be represented as scaled scores with a mean of 10 and a standard deviation
of 3. Some tests and many behavior checklists yield T-scores, which have a mean of 50 and a standard deviation of 10.
TABLE 4.2.4.4 OVERVIEW OF TYPES OF VALIDITY
TABLE 4.2.4.5 Z-SCORES AND DERIVED SCORE EQUIVALENTS

| z-score | Standard score (M = 100, SD = 15) | Scaled score (M = 10, SD = 3) | T-score (M = 50, SD = 10) | Percentile rank |
|---|---|---|---|---|
| −2.0 | 70 | 4 | 30 | 2 |
| −1.0 | 85 | 7 | 40 | 16 |
| 0.0 | 100 | 10 | 50 | 50 |
| +1.0 | 115 | 13 | 60 | 84 |
| +2.0 | 130 | 16 | 70 | 98 |
Percentile Scores
Percentile scores are a popular means for reporting test performance as they are easy to understand; a percentile rank is a way of positioning the child’s performance relative to the norm group in familiar terms (17). For example, a percentile rank of 84 indicates that the child scored as well as or better than 84% of the norm group. There are some caveats. Naïve consumers of these test scores may confuse percentile ranks with percent of items passed (such as an 84% on a test). A major concern also with regard to percentile ranks (versus standard scores) is that the units are unequal. The numbers can be deceptive and may overemphasize or underemphasize differences between standard scores. For example, note that scores between the 25th and 75th percentile are all within the average range.
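Assuming the scores fit a normal distribution, a percentile rank is simply the cumulative normal distribution evaluated at the child’s score. The short sketch below, using Python’s standard library NormalDist, illustrates both the 84th percentile example and the unequal-units caveat; the specific scores are illustrative only.

```python
from statistics import NormalDist

def percentile_rank(score, mean, sd):
    """Percent of the norm group scoring at or below this score,
    assuming the scores are normally distributed."""
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100

print(round(percentile_rank(115, 100, 15)))  # 84: as well as or better than 84%
print(round(percentile_rank(85, 100, 15)))   # 16

# Unequal units: the 25th and 75th percentiles both fall within the
# average range of standard scores (roughly 90 to 110).
nd = NormalDist(mu=100, sigma=15)
print(round(nd.inv_cdf(0.25)), round(nd.inv_cdf(0.75)))  # ~90 110
```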
Age and Grade Equivalent Scores
Test scores in some cases are also reported in terms of age and grade equivalents. An age equivalent score is determined based on the performance of each age group in the norm sample. If the average raw score of the 8-year-olds in the sample is 14, then a raw score of 14 yields an age equivalent of 8 years. Age equivalent scores thus describe the raw score obtained and do not necessarily correspond to the child’s level of functioning, nor do they represent equal units.
Grade equivalents are similar in that grade norms are computed from the mean raw score obtained by children in each grade in the standardization sample. Again, if the average raw score on a reading test for single words corresponds to 25 for 4th graders, then a raw score of 25 corresponds to a grade equivalent of 4. Grade units are unequal and do not represent the variability between subject areas at different levels. It is also important to recognize that when a fourth grade child obtains a grade equivalent score of 6.5 on an arithmetic test, it does not necessarily indicate that the child is capable of grade 6 arithmetical processes or should be placed in the seventh grade curriculum (1). Rather, the child’s total raw score may reflect superior performance on fourth grade arithmetic. The point is that psychological tests are typically constructed to provide a range of scores. If we consider the standardization sample of children in the fourth grade, there will be a distribution of scores for this group of children; an average raw score for this group will represent the average score on the test at the fourth grade level. The children who perform well above average within this distribution will produce a well above average score on the test for their comparison group and may in turn share a total raw score with the average sixth grader. Additionally, the same raw score yielding the same age or grade equivalent could be obtained in a very different way relative to the individual items of the test and thus have a different meaning.
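As a sketch of the lookup logic just described, an age equivalent is simply the age group whose average raw score matches the child’s raw score, interpolating between adjacent ages where needed. The norm-table values below are entirely hypothetical.

```python
# Hypothetical norm-table values: age in years -> average raw score
AGE_NORMS = {6: 8, 7: 11, 8: 14, 9: 16, 10: 18}

def age_equivalent(raw_score):
    """Return the age whose average raw score matches the child's raw
    score, interpolating between adjacent age groups when needed."""
    ages = sorted(AGE_NORMS)
    for lo, hi in zip(ages, ages[1:]):
        s_lo, s_hi = AGE_NORMS[lo], AGE_NORMS[hi]
        if s_lo <= raw_score <= s_hi:
            return lo + (raw_score - s_lo) / (s_hi - s_lo) * (hi - lo)
    return None  # raw score falls outside the normed range

print(age_equivalent(14))  # 8.0, matching the example in the text
print(age_equivalent(17))  # 9.5: says nothing about which items were passed
```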
Although appealing, age and grade equivalent scores are easily subject to misinterpretation; they do not necessarily reflect a particular level of knowledge but are rather another means of indicating where a child falls relative to a particular kind of reference group (18). On the other hand, the advantage of reporting scores in this format is that they are easily understandable and place performance within a familiar developmental context. In this case, correct interpretation is paramount.
Descriptive Levels
In addition to a quantitative representation of performance, classification of ability levels is applied to standard scores, based on whether the score corresponds to the average performance of the normative group or, for example, the upper or lower extreme end of the distribution. These descriptions of performance are widely used to represent how far a child’s score deviates above or below the mean (Table 4.2.4.6). In the event that a child’s file contains several different reports, one should keep in mind that derived scores can be converted to a uniform metric for comparison. Knowing the child’s test score and the mean and SD of the test, and assuming that the scores on the test fit a normal distribution (which would be true of most major assessment instruments), it is easy to determine where the child’s scores fall relative to the mean. This is achieved by taking the child’s test score, subtracting the mean, and dividing by the SD. Thus, if a child obtained an IQ score of 115 on a test that has a mean of 100 and SD of 15, his or her score would be 1 SD above the mean. Similarly, if the same child obtained a T-score of 60 on a different measure (with a mean of 50 and SD of 10), his or her score would also be 1 SD above the mean. If percentile scores are reported on a given test, these too can represent the child’s position relative to the standardization sample (see Table 4.2.4.5).

A special note about IQ scores: The traditional means of obtaining an IQ score (or intelligence quotient) was to take the ratio of mental age (MA) divided by chronological age (CA) and multiply by 100. The problem with this procedure is that ratio IQ scores at different ages are not comparable and thus not psychometrically sound in practice. Although the term IQ has been retained, current measures of IQ do not derive scores based on the above formula; rather, these so-called deviation IQs are a type of standard score and as described above fit a distribution with a mean of 100 and SD of 15. However, knowledge of the ratio IQ is handy
when there is limited information and a need to make a rough approximation of the child’s ability level.
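The conversion to a uniform metric described above can be captured in a few lines. The sketch below (illustrative values only) also includes the traditional ratio IQ formula for completeness.

```python
def z_score(score, mean, sd):
    """Express any derived score in SD units: z = (score - mean) / SD."""
    return (score - mean) / sd

# An IQ of 115 (mean 100, SD 15) and a T-score of 60 (mean 50, SD 10)
# occupy the same position: 1 SD above the mean.
print(z_score(115, 100, 15))  # 1.0
print(z_score(60, 50, 10))    # 1.0

def ratio_iq(mental_age, chronological_age):
    """Traditional ratio IQ = MA / CA x 100; a rough approximation only,
    as ratio IQs are not comparable across ages."""
    return mental_age / chronological_age * 100

print(ratio_iq(10, 8))  # 125, for a hypothetical MA of 10 and CA of 8
```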
TABLE 4.2.4.6 CLASSIFICATION LEVELS
Significant Difference
Derived scores and classification of ability levels provide a means to compare a child’s performance relative to his or her peers as defined by the norm group. However, test interpretation also involves a comparison of the child’s different ability levels across domains of functioning and in some cases across time. For example, we may want to know whether Sam is more able on verbal versus visual spatial tasks, whether Alice’s reading skills are consistent with expectations given her overall IQ, or whether ratings of Justin’s behavior at home and school are significantly different. There are two considerations to keep in mind when comparing whether two scores are different or not: a) statistical significance and b) unusualness or abnormality of difference. The first, statistical significance, answers whether the results differ from what would be expected based on chance alone (17). The usual p-value for this calculation is .05, meaning that, if the difference between two scores is significant (not due to chance factors), we accept a 5 out of 100 chance of being wrong. Test publishers will report domain and subtest score differences in their manuals or computerized printouts. When making comparisons between two different tests, either of the following two calculations can be made, which take into account the error variance of each test:
SEdiff = √(SEM₁² + SEM₂²)

SEdiff = SD √(2 − r₁₁ − r₂₂)
The standard error of the difference (SEdiff) is then multiplied by 1.96 to determine how large a score difference could be obtained by chance at the .05 level (1).
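A brief sketch of both calculations, using hypothetical SDs and reliability coefficients; the two formulas are equivalent when each SEM is computed as SD√(1 − r) and both tests share the same SD.

```python
import math

def se_diff_from_sems(sem1, sem2):
    """First formula: SEdiff = sqrt(SEM1^2 + SEM2^2)."""
    return math.sqrt(sem1**2 + sem2**2)

def se_diff_from_reliabilities(sd, r11, r22):
    """Second formula: SEdiff = SD * sqrt(2 - r11 - r22); assumes both
    tests are expressed on the same metric (same SD)."""
    return sd * math.sqrt(2.0 - r11 - r22)

# Hypothetical pair of tests on the standard-score metric (SD = 15)
sd, r11, r22 = 15.0, 0.90, 0.85
se_diff = se_diff_from_reliabilities(sd, r11, r22)  # 7.5 points
critical = 1.96 * se_diff                           # ~14.7 points
print(f"A difference of at least {critical:.1f} points is significant at .05")
```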
In addition to answering whether two scores are significantly different in statistical terms, the second part of interpretation lies in determining whether the difference is clinically significant. One way to address this particular question is to examine base rate frequency, that is, to ask how unusual is it to find this difference in scores. Test publishers of cognitive and achievement test batteries will typically provide this information in supplementary tables. A difference of scores that is found in only 5% of the norm sample can be considered unusual and highly unusual when present in only 1% of the sample.
Sometimes standard scores in a domain of functioning may decline over time. This finding does not necessarily represent a deterioration or regression, but rather, may reflect a failure to make age-appropriate gains (rate of gain slower than rate of change in chronological age). If standard scores on the same test are found to be significantly different across time, it would be important to look at the raw scores and pattern of scores obtained on each test before assuming there has been a loss of skill. Additionally, of course, it will be important to consider other variables that might have affected test performance (such as fatigue, illness, compliance). Table 4.2.4.7 lists several other factors to consider when test scores for the same child on (ostensibly) the same kind of test differ (the child scores in the mentally retarded range on one cognitive test but not another). These factors are important to consider, particularly in the interpretation of IQ scores, as they impact diagnosis and the procurement of services.