| Assessment measures | Age suitability | Administration time |
| --- | --- | --- |
| Screening instruments | | |
| Checklist for Autism in Toddlers (CHAT) | 18 months and above | 10–15 min |
| Modified Checklist for Autism in Toddlers (M-CHAT) | 16–30 months | 5–10 min |
| Screening Tool for Autism in Two-Year-Olds (STAT) | 24–35 months | 15–20 min |
| Pervasive Developmental Disorders Screening Test-Second Edition (PDDST-II) | 14–48 months | Varies based on form used |
| Social Communication Questionnaire (SCQ) | 4 years and above (mental age of 2 years or older) | 10 min |
| First Year Inventory (FYI) | 12 months and above | Not reported |
| Observational measures | | |
| Childhood Autism Rating Scale (CARS) | 2 years and above | 30–45 min |
| Childhood Autism Rating Scale-Second Edition (CARS2) | 2 years and above | 30–45 min |
| Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS) | 6 years or younger | 30 min |
| Autism Diagnostic Observation Schedule-Generic (ADOS-G) | 15 months or older with a mental age of 20 months or older | Approximately 30 min |
| Parent/caregiver structured interviews | | |
| Autism Diagnostic Interview-Revised (ADI-R) | Mental age of 2 years or above | 1.5–2 h |
| Diagnostic Interview for Social and Communication Disorders (DISCO) | Entire lifespan | 2–4 h |
| Informant-based behavior checklists | | |
| Gilliam Autism Rating Scale-Second Edition (GARS-2) | 3–22 years | 20 min |
| Pervasive Developmental Disorders in Mentally Retarded Persons (PDD-MRS) | 2–55 years | 10–20 min |
| Baby and Infant Screen for Children with aUtIsm Traits (BISCUIT) | 17–37 months | 20–30 min |
Methods of Assessment
While many measures are available to assess for ASD, they typically use one of three methods to collect information about the child: clinician-rated observational measures, diagnostic interviews, and informant-based (typically parent or caregiver) behavior checklists. Each method offers unique benefits and has some weaknesses. The methods are briefly discussed in turn, and the strengths and weaknesses of each are reviewed.
One popular form of assessment for ASD in young children is based on clinician ratings of the child’s behavior. The items on these observational measures represent behaviors that the clinician aims to elicit during the observation/assessment session. Some of these measures are highly structured and standardized, with specific toys and objects being used during the assessment (Lord et al. 2002). One benefit of this method is that it allows a clinician who has training in developmental disabilities and is familiar with developmental norms to make informed judgments based on those observations. This should increase the validity of the behavior ratings and avoids some of the weaknesses of the other assessment methods, which depend heavily upon parent or caregiver reports. On the other hand, the limited observation time means that certain low-frequency behaviors may not be elicited from the child even though they occur outside of the assessment session. For example, repetitive behaviors and restricted interests may not be observed during the assessment but may be reported by the parents. For this reason, some measures do not take repetitive behavior and restricted interests into account (Lord et al. 2002), and other measures allow observational information from the clinician to be supplemented with parent report (Schopler et al. 2010).
A second method of assessment, one that does use parents and caregivers as informants, is the diagnostic interview. Diagnostic interviews, such as the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al. 2003b), rely on informants to provide detailed information in response to structured interview questions. Unlike the informant-based behavior checklists (reviewed below), which obtain a numerical rating from the parents to represent the presence of specific behaviors, structured diagnostic interviews allow the clinician to obtain detailed information about specific areas. Informants can elaborate on their responses, and if needed, the clinician can ask follow-up questions to acquire other important, related information. Another possible benefit of the diagnostic interview is that the additional detail may allow the clinician to judge whether parents are under- or overreporting. For example, if the parent reports “severe self-injurious behavior” but describes the behavior as occurring only a few times a week with no injury, the clinician can take that into account and may conclude that the parent is overreporting symptoms. The diagnostic interview is not without its weaknesses, however. Compared to some other assessment methods (e.g., behavior checklists), it requires more intensive training of the clinician; sometimes this includes attending dedicated trainings on administering one specific measure. This can become time-consuming and expensive for the clinician. Additionally, these interviews can require up to 2 h to administer, whereas behavior checklists may require only 20 min (Table 5.1).
The last commonly used method to assess symptoms of ASD in children is the informant-based behavior checklist. These measures ask parents or caregivers to answer items about the child’s behavior and symptoms, typically using a Likert scale for responding. Likert scales use ordinal ratings to indicate the strength or level of the behavior in question. In the developmental disorders literature, using an informant is common because the individual being assessed often cannot report on these behaviors. Benefits of behavior checklists include relatively short administration times and limited assistance needed from the clinician. Hence, only limited training, mostly on the scoring and interpretation of the instrument, is required. Furthermore, with this method, two informants can easily complete the questionnaire independently so that results may be compared and inconsistencies identified and addressed. Using multiple informants independently is not typically feasible with some other methods of assessment because of the length of time needed to administer the measure. While using a parent or caregiver to glean information about the child’s symptoms provides the benefit of sampling behavior over a long time period, this method does have some limitations. As mentioned earlier, it may be difficult to determine whether the information reported is an accurate representation of the child’s behavior. Parents may over- or underreport symptoms, either because they want a certain outcome from the assessment or because they lack knowledge about what is considered typical behavior. Some assessments ask the informant to compare the child’s behavior to that of typically developing peers, which can be difficult if the informant has limited experience with other children.
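The comparison of two independent informants mentioned above can be illustrated with a small Python sketch. It is not taken from any published checklist: the item names, the 4-point scale, and the `max_gap` tolerance are all hypothetical and serve only to show how item-level inconsistencies might be flagged for follow-up.

```python
from typing import Dict, List


def flag_discrepancies(
    informant_a: Dict[str, int],
    informant_b: Dict[str, int],
    max_gap: int = 1,
) -> List[str]:
    """Return items on which two informants' ordinal ratings differ by more than max_gap."""
    shared_items = informant_a.keys() & informant_b.keys()
    return sorted(
        item
        for item in shared_items
        if abs(informant_a[item] - informant_b[item]) > max_gap
    )


# Hypothetical ratings on a 4-point Likert scale (1 = never ... 4 = often).
mother = {"eye_contact": 2, "hand_flapping": 4, "pretend_play": 1}
father = {"eye_contact": 2, "hand_flapping": 1, "pretend_play": 2}

# Items the clinician may want to discuss with both informants.
print(flag_discrepancies(mother, father))
```

A tolerance of one scale point is an arbitrary choice here; what counts as a meaningful disagreement would depend on the instrument.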
Screening Instruments
Now that different methods for assessment of ASD have been discussed, special attention to one class of assessment tools is warranted: screening instruments. Often, before a formal diagnostic assessment takes place, a screening measure is administered to parents or caregivers to determine if a child is at risk for ASD. Screening all children for ASD allows children who are not at risk (i.e., those who pass the screener) to avoid a more thorough, extensive assessment while also identifying children who are at risk (i.e., those who fail the screener) because they are exhibiting some symptoms of ASD. Within these screening instruments, there are two levels of assessment. Level 1 screeners are the broadest form and are typically administered to all children even if there is no current concern regarding development (e.g., during well-child visits at a pediatrician’s office). These brief questionnaires are usually filled out by parents with little assistance from the clinician, though clinician observations are sometimes integrated into the screening process. Because these instruments are administered to such a wide range of children, their goal is only to determine if the child meets the threshold for a developmental disability, not to differentiate ASD from other disorders. Some states have begun requiring that parents of all infants and toddlers who are at risk for ASD be offered these screenings as part of an effort to ensure early intervention for children with developmental disabilities.
In comparison to Level 1 screeners, Level 2 screening instruments offer a more specific look at ASD. While still broad in comparison to diagnostic tools, Level 2 screeners are used for children who are already suspected of having a developmental disorder of some sort. These instruments often use observations by clinicians, who are more familiar with the behaviors of typically developing children and of children with developmental disorders. Additionally, Level 2 screening instruments should be able to differentiate ASD from other developmental disorders such as language delay and intellectual disability. Unlike Level 1 screening tools, Level 2 screening tools do not depend solely on parent report, which may be beneficial since parents may over- or underreport symptoms. On the other hand, screenings that rely more heavily on clinician observation may be inaccurate if behaviors during the observation are not representative of the child’s typical behavior.
For both Level 1 and Level 2 screening instruments, as well as for the other assessment measures reviewed in this chapter, usefulness is often evaluated based on sensitivity and specificity. That is, how often does the tool accurately classify children as having ASD who go on to be diagnosed with ASD later in childhood (i.e., sensitivity)? And how often does the instrument identify children as not having ASD who do not receive later diagnoses (i.e., specificity)? False positives and false negatives on these screening instruments can have implications for the families. Children who pass the screener but truly do have the disorder (i.e., false negatives) will likely be delayed in receiving services, while children who fail the screener but do not have the disorder (i.e., false positives) will likely be subjected to additional testing, which can be lengthy and expensive for the parents. In general, when discussing screening measures, it is more acceptable to have a higher level of false positives than false negatives. That is to say, it is better to unnecessarily complete a diagnostic workup for a child who does not have the disorder than to let a child with the disorder go unassessed and untreated.
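The definitions above reduce to simple ratios over the four possible screening outcomes. The Python sketch below is a minimal illustration with made-up counts; the class and function names are assumptions introduced here, and the numbers do not come from any study discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Outcome counts from comparing a screener with later diagnoses."""
    true_pos: int   # failed the screener, later diagnosed with ASD
    false_pos: int  # failed the screener, not diagnosed
    false_neg: int  # passed the screener, later diagnosed with ASD
    true_neg: int   # passed the screener, not diagnosed


def sensitivity(c: ScreeningCounts) -> float:
    """Share of children with ASD whom the screener identified."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ScreeningCounts) -> float:
    """Share of children without ASD whom the screener cleared."""
    return c.true_neg / (c.true_neg + c.false_pos)


def positive_predictive_value(c: ScreeningCounts) -> float:
    """Share of failed screens that were confirmed by a later diagnosis."""
    return c.true_pos / (c.true_pos + c.false_pos)


def negative_predictive_value(c: ScreeningCounts) -> float:
    """Share of passed screens that were not followed by a diagnosis."""
    return c.true_neg / (c.true_neg + c.false_neg)


# Hypothetical counts for 1,000 screened children (illustration only).
counts = ScreeningCounts(true_pos=8, false_pos=20, false_neg=2, true_neg=970)

print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
print(f"PPV = {positive_predictive_value(counts):.2f}")
print(f"NPV = {negative_predictive_value(counts):.2f}")
```

In this invented example most failed screens are false positives even though specificity is high, which is the trade-off the text describes as acceptable for screening purposes.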
Measures of ASD in Young Children
Having discussed the general methods of assessment available for screening and diagnosing ASD, specific measures under each method will now be reviewed. The focus of this review is on the most widely used and researched measures. Additionally, as Asperger’s syndrome is not typically diagnosed until a later age, no measures specific to Asperger’s syndrome will be discussed. Screening measures will be discussed first, followed by observational measures, diagnostic interviews, and informant-based behavior checklists.
Screening Instruments
The Checklist for Autism in Toddlers
The Checklist for Autism in Toddlers (CHAT; Baron-Cohen et al. 1992) is a Level 1 screening tool that can be easily administered by a pediatrician or other clinician with minimal training. Composed of two parts, the CHAT requires 10–15 min to administer. Part 1 of the CHAT includes nine items that are answered by the parent during a brief interview. Five of these items are key items. If all of the key items are failed, the presence of an ASD is likely. A subset of the key items (e.g., assessing protodeclarative pointing) indicates a moderate risk for ASD if failed. The remaining items are meant to aid in differentiating among developmental disorders. The second part of the CHAT is a clinician-completed observation of five behaviors suggestive of developmental disorders.
To examine the usefulness of the CHAT, the instrument was administered to 50 children aged 18 months in order to determine which items were normally passed by typically developing children (Baron-Cohen et al. 1992). The CHAT was then administered to 41 children who had siblings with ASD and were therefore at higher risk. The CHAT accurately identified all four of the children in the sample who went on to receive later diagnoses of ASD.
Following this study, a larger sample of 16,000 children aged 18 months in England was used to examine the usefulness of three key items (i.e., “protodeclarative pointing,” “gaze-monitoring,” and “pretend play”) in identifying children who go on to be diagnosed with ASD (Baron-Cohen et al. 1996). Of the sample, 12 children failed all three items, and 10 of them went on to receive diagnoses of ASD, which resulted in a true positive rate of 83.3 % and a false positive rate of 16.7 %. Of the children who failed the protodeclarative pointing and/or pretend play items, 68.2 % received a diagnosis of language delay, but none were diagnosed with ASD. Overall, these findings indicate that the three key items adequately identify children who go on to be diagnosed with ASD.
Baird et al. (2000) examined the effectiveness of the CHAT in a 6-year follow-up study using a sample of 16,235 children. Nineteen children were identified as having ASD at the first administration of the CHAT. At follow-up, based on the children’s current diagnoses, 50 children had received a diagnosis of ASD. This resulted in a specificity of 98 % but a low sensitivity of 38 %. When the CHAT was readministered at 19 months of age, the sensitivity was once again low at 20 %, but the specificity remained high at 100 %, with an overall positive predictive value of 75 %. These findings highlight the weakness of the CHAT in terms of its sensitivity, though sensitivity is improved in high- versus moderate-risk samples of children. In addition, the inability of the CHAT to accurately assess children with PDD-NOS as opposed to autistic disorder (AD) has been highlighted as a weakness (Robins et al. 2001; Scambler et al. 2001).
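Some of the percentages reported for the CHAT can be reproduced from the counts given in the text. The short Python check below uses only those counts; the variable names are introduced here for illustration, and the sensitivity calculation assumes that the 19 children flagged at the first administration are all among the 50 later diagnosed.

```python
# Baron-Cohen et al. (1996): 12 children failed all three key items,
# and 10 of them later received an ASD diagnosis.
failed_all_key_items = 12
later_diagnosed = 10

confirmed_share = later_diagnosed / failed_all_key_items
unconfirmed_share = 1 - confirmed_share

# Baird et al. (2000): 19 children identified at first administration,
# 50 children with an ASD diagnosis at follow-up.
# Assumption: the 19 identified children are a subset of the 50 diagnosed.
identified_at_first_screen = 19
diagnosed_at_follow_up = 50

first_screen_sensitivity = identified_at_first_screen / diagnosed_at_follow_up

print(f"confirmed after failing all key items: {confirmed_share:.1%}")   # 83.3%
print(f"not confirmed: {unconfirmed_share:.1%}")                         # 16.7%
print(f"sensitivity at first screen: {first_screen_sensitivity:.0%}")    # 38%
```

The reported specificity cannot be recomputed this way because the number of children who passed the screener and were never diagnosed is not given in the text.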
The Modified Checklist for Autism in Toddlers (M-CHAT; Robins et al. 2001)
In response to some of the criticisms and weaknesses of the CHAT, a modified version was created, the Modified Checklist for Autism in Toddlers (M-CHAT; Robins et al. 2001). The changes in the M-CHAT include an extended age range, now appropriate for children 16–30 months, and the elimination of the observational component (Dumont-Mathieu and Fein 2005). The observational component was removed in an effort to make the M-CHAT usable across a variety of cultures in which that component was not as feasible. Because of this removal, the parent-report questions are broader so that they sample a wider range of behaviors. The 23 items, six of which are critical items, are answered in a yes/no format. The full measure requires about 5 min to administer. The screener is failed if the child fails two of the critical items or three of the 23 items.
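The failure rule just described (two critical items or three items overall) can be written as a short check. The Python sketch below is illustrative only: the function name is an assumption, and the item numbers treated as critical are placeholders, not the published M-CHAT scoring key.

```python
from typing import Iterable, Set

# PLACEHOLDER item numbers: the text states that six of the 23 items are
# critical but does not say which ones, so these are not the real key.
PLACEHOLDER_CRITICAL_ITEMS: Set[int] = {1, 2, 3, 4, 5, 6}


def screen_failed(
    failed_items: Iterable[int],
    critical_items: Set[int] = PLACEHOLDER_CRITICAL_ITEMS,
    critical_cutoff: int = 2,
    total_cutoff: int = 3,
) -> bool:
    """Return True if the screen is failed under the two-part rule.

    The screen is failed when at least `critical_cutoff` critical items
    are failed, or at least `total_cutoff` items are failed overall.
    """
    failed = set(failed_items)
    critical_failed = failed & critical_items
    return len(critical_failed) >= critical_cutoff or len(failed) >= total_cutoff


print(screen_failed([1, 2]))        # two critical items failed
print(screen_failed([10, 11]))      # two non-critical items failed
print(screen_failed([10, 11, 12]))  # three items failed overall
```

Keeping the cutoffs as parameters makes it easy to see how a stricter or more lenient rule would change which response patterns count as a failed screen.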
Several researchers have examined the psychometrics of the M-CHAT. Robins et al. (2001) used a sample of 1,122 children aged 18–24 months seen during well-baby checkups and 171 children with previously diagnosed DSM-IV disorders. In terms of reliability, internal consistency was 0.85 for all items and 0.83 for critical items. Further investigation revealed a positive predictive power of 0.80, a negative predictive power of 0.99, a sensitivity of 0.87, and a specificity of 0.99. The M-CHAT attempted to strengthen its sensitivity by decreasing the cutoff score for a positive screen when compared to the CHAT (Coonrod and Stone 2005; Robins et al. 2001). A limitation of this study, however, was that diagnoses were not confirmed with follow-up evaluations. As a result, conclusions from this study should be interpreted with caution. Other studies examining the psychometrics of the M-CHAT have revealed fair to excellent internal consistency, 0.77 for critical items and 0.92 for total scores; however, specificity was still found to be lacking, 0.43 and 0.27, respectively (Eaves et al. 2006b). The M-CHAT should therefore be used with the understanding that the likelihood of a false positive is relatively high and that a thorough diagnostic workup should be completed to confirm diagnoses.
While the M-CHAT is commonly used in English-speaking countries, it has also been translated into other languages and used across the world. As of 2011, the M-CHAT had been translated into 28 different languages (Robins 2011). One example of these translations and adaptations is the CHAT-23, which was translated and adapted for Chinese populations (Wong et al. 2004). In adapting the scale, the authors used a 4-point Likert scale as opposed to the original yes/no format and also included the five observational items from the original CHAT. As with the original measure, the scoring distinguishes critical items from noncritical items. The CHAT-23 is considered to be failed when two of the seven critical items or six of the 23 total items are failed. The authors proposed that the questionnaire portion of the CHAT-23 be used as a first-tier assessment, with the observational component given only if the first portion is failed. As with any translated measure, the psychometrics of the scale need to be reestablished in the new language. The authors reported a sensitivity of 0.74–0.93, a specificity of 0.77–0.91, and a positive predictive value of 0.74–0.85 with the new version of the measure.
Screening Tool for Autism in Two-Year-Olds
The Screening Tool for Autism in Two-Year-Olds (STAT; Stone and Ousley 1997) is a Level 2 screening tool to be administered by health-care workers or other service providers. This 12-item measure is designed for children aged 24–35 months and is completed during play interactions between the child and the clinician. The entire measure requires 15–20 min to complete. The 12 items cover several domains of behavior: two items regarding play, four examining imitation, four regarding directing attention, and two unscored items on response to requests. Each item is scored based on whether the child completes the target behavior, and each of the scored areas (i.e., play, imitation, and attention) has its own cutoff score. If two of the three scored areas are failed, the total screen is considered to be failed.
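The area-based rule above (a failed screen when two of the three scored areas are failed) can be sketched in Python as follows. The per-area cutoff values are hypothetical, since the text does not report them, and the sketch assumes that an area score is the number of items passed and that an area is failed when that score falls below its cutoff.

```python
from typing import Dict

# HYPOTHETICAL cutoffs: minimum number of passed items needed to pass each area.
# The real STAT cutoffs are not given in the text.
HYPOTHETICAL_AREA_CUTOFFS: Dict[str, int] = {
    "play": 1,                 # 2 items in this area
    "imitation": 2,            # 4 items in this area
    "directing_attention": 2,  # 4 items in this area
}


def stat_screen_failed(
    items_passed: Dict[str, int],
    area_cutoffs: Dict[str, int] = HYPOTHETICAL_AREA_CUTOFFS,
    areas_to_fail: int = 2,
) -> bool:
    """Return True if at least `areas_to_fail` scored areas fall below their cutoff."""
    failed_areas = [
        area for area, cutoff in area_cutoffs.items()
        if items_passed[area] < cutoff
    ]
    return len(failed_areas) >= areas_to_fail


# Example: the child passes play but falls below the cutoff in the other two areas.
child = {"play": 2, "imitation": 1, "directing_attention": 0}
print(stat_screen_failed(child))
```

The two unscored response-to-request items are deliberately left out, since the text states that they do not contribute to the score.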
Few studies have examined the psychometrics of the STAT. Based on the scoring criteria proposed by the authors, the sensitivity was 0.83 and the specificity was 0.86 (Stone et al. 2000). A later study using different scoring criteria found improved sensitivity (0.92) with similar specificity (0.85; Stone et al. 2004). Stone et al. (2008) also examined whether the STAT would be a useful screening instrument for children aged 12–23 months. Findings indicated promising sensitivity, 0.95, and specificity, 0.73. The positive predictive value was somewhat low at 0.56, while the negative predictive value was 0.97. Further investigation revealed that sensitivity and specificity were particularly low for children aged 12–13 months, indicating that this measure should be used with caution for children this young. Additionally, studies examining the reliability and validity of this scale are lacking and should be conducted to support the use of the measure in clinical settings.
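A high sensitivity and a moderate specificity can coexist with a low positive predictive value because predictive values also depend on how common ASD is in the sample. The Python sketch below applies Bayes' rule to the sensitivity (0.95) and specificity (0.73) reported for the younger sample; the prevalence values are assumptions chosen for illustration and are not reported in the text.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screener applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed prevalence values, from a general population to a high-risk sample.
for prevalence in (0.01, 0.10, 0.27):
    ppv, npv = predictive_values(0.95, 0.73, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With an assumed prevalence near 0.27, the formula gives predictive values close to the reported 0.56 and 0.97, whereas at general-population prevalence the same screener would produce mostly false positives.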
Pervasive Developmental Disorders Screening Test-Second Edition
The Pervasive Developmental Disorders Screening Test-Second Edition (PDDST-II; Siegel 2004) is a screening tool that differs from some of the previously discussed measures in that it can function as a Level 1 or Level 2 measure. This instrument, appropriate for children aged 12–48 months, has three forms that are administered in different stages: the Primary Care Screener (23 items), the Developmental Clinic Screener (14 items), and the Autism Clinic Severity Screener (12 items). The forms range from a general screening tool that detects any developmental problem to higher-level forms that differentiate among disorders on the autism spectrum. The measure was standardized with children who had other neurodevelopmental disorders (e.g., ADHD, ID, language disorders) as well as children with ASD. Because this sample was used to norm the PDDST-II, ASD can be differentiated from these other disorders common in early childhood. For the first stage of the assessment, items are scored on a three-point scale representing how often the behaviors occur, with total scores of five or greater indicating that more extensive screening should be completed with Stages 2 and 3 of the PDDST-II.
Once again, limited information is available on the psychometrics of this instrument. There are some promising data on Stage 1 (i.e., the Primary Care Screener), with a sensitivity of 0.92 and a specificity of 0.91; however, sensitivity and specificity are not as promising for Stages 2 and 3, with sensitivities of 0.73 and 0.58 and specificities of 0.49 and 0.60, respectively. Based on this information, the PDDST-II does not seem to have adequate power to differentiate among the ASDs. While the stage structure of this measure seems good in theory, further research is needed to support its utility.
Social Communication Questionnaire
The Social Communication Questionnaire (SCQ; Rutter et al. 2003b) is a screening tool that was developed using items from the ADI-R (Rutter et al. 2003b), which is reviewed in depth in the diagnostic interview section. Previously named the Autism Screening Questionnaire (ASQ), the SCQ is a 40-item parent-report questionnaire that requires 10 min to administer and can be used to assess children as young as 4 years with a mental age of 2 years. Like the ADI-R, the SCQ has three subscales: social development and play, communication, and repetitive and restricted behaviors. There are also different forms of the SCQ, including the lifetime form, which takes all developmental history into account, and the current form, which examines behavior over the past 3 months only. A score of 15 or above indicates risk for ASD and the need for a more comprehensive assessment. Using this cutoff, the SCQ is adequately able to discriminate between ASD and non-ASD across all cognitive levels. While 15 is the cutoff point determined by Berument et al. (1999), other researchers have suggested that different cut points may be useful depending on the sample and purpose of the assessment (e.g., research versus clinical purposes; Lee et al. 2007).
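The effect of choosing a different cut point can be made concrete with a small simulation. The Python sketch below uses invented total scores, not SCQ data, to show how lowering the cutoff raises sensitivity at the cost of specificity and how raising it does the opposite; the function name and all score values are assumptions.

```python
from typing import Sequence, Tuple


def sens_spec_at_cutoff(
    scores_asd: Sequence[int], scores_non_asd: Sequence[int], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= cutoff count as a positive screen."""
    sensitivity = sum(score >= cutoff for score in scores_asd) / len(scores_asd)
    specificity = sum(score < cutoff for score in scores_non_asd) / len(scores_non_asd)
    return sensitivity, specificity


# HYPOTHETICAL total scores for children with and without a later ASD diagnosis.
asd_scores = [9, 13, 16, 18, 21, 24, 27, 30]
non_asd_scores = [2, 4, 6, 8, 11, 12, 14, 17]

for cutoff in (11, 15, 19):
    sens, spec = sens_spec_at_cutoff(asd_scores, non_asd_scores, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A research team that wants few false positives might prefer the higher cut point, while a clinic that wants to miss as few children as possible might prefer the lower one.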
The psychometrics of this measure seem promising overall. Not surprisingly, the developers of the scale found that the SCQ correlates highly with the ADI-R (Berument et al. 1999); however, due to methodological flaws, the results must be interpreted with caution. Berument et al. (1999) also examined how well the measure discriminated between ASD and non-ASD, which was acceptable, with a sensitivity of 0.85 and a specificity of 0.75. Using both clinical and general population samples with ASD, Chandler et al. (2007) found similar results, though it should be noted that an older sample of children was used. In this study, when differentiating between ASD and non-ASD, sensitivity was 0.88 and specificity was 0.72. These statistics remained high even when differentiating AD from non-AD (all other ASD and non-ASD; sensitivity 0.90, specificity 0.86). On the other hand, another study found weaker results, with a reduced sensitivity of 0.71 and a reduced specificity of 0.54 (Eaves et al. 2006b). Overall, this measure seems to have some utility as a screening measure, though other psychometric properties, such as reliability, need to be evaluated in more depth.
First Year Inventory
The First Year Inventory (FYI; Reznick et al. 2007) is a relatively new screening tool that aims to assess children beginning at the age of 12 months. This measure is meant to identify children who are at risk for atypical development, with a specific focus on ASD characteristics. The items for the FYI were drawn from a pool of items developed from extensive literature reviews and current theories regarding ASD. While the two main factors of this scale are social-communication and sensory-regulatory functions, there are also some items that target general developmental problems and problems associated with autism. The whole measure includes 63 items; the first group of items is scored on a 4-point Likert scale (1 = “Never,” 2 = “Seldom,” 3 = “Sometimes,” and 4 = “Often”), while the second group of items is multiple choice. While limited research has been conducted on the FYI, the pilot study, which used mailings to families, suggests that the measure is easy to use and may be useful to pediatricians as a screening tool. Some preliminary data also suggest the FYI can discriminate among children with ASD, other developmental disorders, and typical development (Watson et al. 2007). However, the FYI is used more often in research, and the measure is also longer than most other screening measures, which may limit its utility in clinical settings.
Diagnostic Measures
Having reviewed the common measures used to screen for ASD in young children, the discussion now moves to the measures used during more comprehensive evaluations. These measures, in some cases, can reliably distinguish among the ASDs (Mahoney et al. 1998). As mentioned above, several types of these assessments exist, including observational measures, structured parent/caregiver interviews, and parent/caregiver-report behavior checklists. Specific measures within each of these categories are reviewed.
Observational Measures
Observations by the clinician can provide invaluable information during the assessment process. Because these clinicians have specific training and a background in ASD, they know what behaviors to look for and try to elicit during the assessment. With the use of structured observational assessments, this process can be standardized, and scores can be assigned that can then be interpreted based on norms. Some of the more commonly used observational scales are reviewed here, including the Childhood Autism Rating Scale (CARS; Schopler et al. 1988), the Childhood Autism Rating Scale-Second Edition (CARS2; Schopler et al. 2010), the Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS; DiLavore et al. 1995), and the Autism Diagnostic Observation Schedule-Generic (ADOS-G; Lord et al. 2002).