                   Diagnostic result
Screening result   Disorder                   No disorder
Positive           True positive 40           False positive 95            PPV 40/135 = 0.30
Negative           False negative 10          True negative 855            NPV 855/865 = 0.99
                   Sensitivity 40/50 = 0.80   Specificity 855/950 = 0.90   Hit rate (40 + 855)/1000 = 0.90
Within the screening paradigm, two errors result: false positives and false negatives. Each is associated with untoward outcomes. For false negatives, the screening error does not allow for detection of a condition and receipt of appropriate intervention and may produce a misleading assumption that additional screening is unnecessary in the future. For false positives, the screening error wastes time and resources for individuals who do not need assessment and interventions. False positives may also produce undue stress for those undergoing additional assessment. Given the general purpose of screening, however, false positives are typically viewed as more acceptable errors.
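The indices reported in the table above can be recomputed directly from its four cell counts. The following is a minimal sketch, not part of the original chapter; the class and method names are illustrative assumptions, and only the counts (40, 95, 10, 855) come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Cell counts of a 2x2 screening-by-diagnosis table (names are illustrative)."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg

    def sensitivity(self) -> float:
        # Proportion of children with the disorder who screen positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of children without the disorder who screen negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        # Proportion of positive screens that are true cases.
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        # Proportion of negative screens that are true non-cases.
        return self.true_neg / (self.true_neg + self.false_neg)

    def hit_rate(self) -> float:
        # Proportion of all screening decisions that are correct.
        return (self.true_pos + self.true_neg) / self.total


# Counts taken from the table above.
table = ScreeningTable(true_pos=40, false_pos=95, false_neg=10, true_neg=855)
for name in ("sensitivity", "specificity", "ppv", "npv", "hit_rate"):
    print(f"{name}: {getattr(table, name)():.3f}")
```

Note that the hit rate sums the two correct cells before dividing by the total, which gives 0.895 and rounds to the tabled value of 0.90.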
Screening for Autism in Young Children
Presently, ASD is a neurodevelopmental disorder defined by (a) social-communicative (SC) impairments and (b) impairing restrictive/repetitive behaviors or interests (RRB) present early in development. Parents often identify concerns about their children’s development in the first 2 years of life. Concerns are often shared with healthcare providers when children are 14–18 months old, with some concerns being conveyed as early as 11 months (Chawarska, Klin, Paul, & Volkmar, 2007; Coonrod & Stone, 2004). First symptoms often involve language delay accompanied by social communication delays or deficits. For example, infants and toddlers with ASD are often less responsive to their name being called, have difficulties with eye contact, demonstrate less social smiling, and show poor imitation skills or lack imitation skills altogether. During children’s early development, caregivers often report concerns that their child may be deaf due to the lack of social response to their name being called. Early symptoms of ASD also include poor pretend play skills and impairments in joint attention, both in its initiation and appropriate response. The social communicative and play skills that are impaired in many young children with ASD are part of the repertoire of typically developing children by the age of 18 months. The presence of these symptoms also discriminates between young children with ASD and those with language and developmental delays.
Early symptoms of RRB include unusual toy play (e.g., repetitive play with toys; lining up toys), repetitive interests (e.g., watching same videotape or video clip), and repetitive movements (e.g., hand flapping). Approximately one third of those with ASD experience a period of developmental regression, whereby acquired skills are lost. Regression is most often reported in the area of language development and most often during the ages of 20–24 months (Barger, Campbell, & McDonough, 2013). Despite the presence of early parental concerns and symptoms, the average age of ASD diagnosis in the United States is often reported at 4–5 years of age (e.g., Centers for Disease Control [CDC], 2012). Wiggins, Baio, and Rice (2006) further documented that the average time delay between initial evaluation for developmental concerns and diagnosis of ASD was 13 months. Given these findings, it is important that research and clinical practice continue to focus on reducing the time between initial parental concerns, age of initial evaluation for ASD, and age of diagnosis. By screening for ASD in young children, clinicians have the opportunity to promote earlier evaluation, diagnosis, and access to specialized interventions, which have been shown to improve social, emotional, cognitive, and behavioral functioning in young children with ASD (Dawson et al., 2010; Eaves & Ho, 2004).
Overview of Screening Measures for Early Childhood
Due to the importance of early assessment and targeted interventions for young children with ASD, the field has developed and validated, with some success, screening measures designed to identify autism-specific symptoms in young children. By utilizing screening tools with young children, clinicians are better able to identify children at risk for developmental delays and ASD in order to refer them for more comprehensive evaluations (Meisels, 1985). The current section reviews Level 1 screeners, which are designed to identify children at risk for developmental disorders from unselected, generally low-risk populations, as well as Level 2 screeners, which are used to differentiate children at risk for autism versus those at risk for other developmental disorders.
Screening measures differ in purpose and usability across settings (Zwaigenbaum & Stone, 2006). Specifically, Level 1 screeners tend to be used commonly in pediatric or primary healthcare settings at well-child visits, thus suggesting that these screeners should be quick and easy to administer and score given the limited time clinicians can typically spend with each child. On the other hand, Level 2 screeners are used more frequently in community settings that serve children with a range of disabilities such as early intervention programs or diagnostic centers, which tend to have more time to conduct more interactive, time-consuming evaluations. Despite the differences in the types of screeners, researchers have suggested that multilevel models of screening and a combination of screening tools may be more effective than a single screener in some cases (Miller et al., 2011; Roux et al., 2012). For example, a risk-prevention model, in which Level 2 interactive screeners are used to assess children identified as at risk for autism during Level 1 screening, is designed to increase children’s access to earlier, specialized interventions (Ibañez, Stone, & Coonrod, 2014).
Level 1 Screening Measures
To identify children at risk for ASD within the general population, two approaches can be used. One strategy, referred to as general developmental screening, identifies children at risk for a variety of developmental problems including ASD. In contrast, Level 1, autism-specific screeners are used to screen the general population to identify ASD symptoms within a child’s overall developmental profile. In the following section, both types of Level 1 screening measures are described, including a brief overview of validity and reliability information presented in peer-reviewed publications.
General Developmental Screening
Researchers have found that most pediatricians (82 %) screen for general developmental delays; however, fewer than half of these pediatricians used validated procedures (dosReis, Weiner, Johnson, & Newschaffer, 2006; Self, Parham, & Rajagopalan, 2014). It is crucial for healthcare providers who serve young children to use general developmental screeners in order to identify children with cognitive, language, or social delays. By using general developmental screening measures, healthcare providers can make referrals to specialty clinics or early intervention centers if children are identified as at risk for a developmental delay or disorder. Many broad developmental screeners play a role in the early identification process; three measures are briefly reviewed in this section. Two widely used general developmental measures are the Ages and Stages Questionnaire, Third Edition (ASQ-3; Squires & Bricker, 2009) and the Parents’ Evaluation of Developmental Status (PEDS; Glascoe, 2003). A third tool, the Infant/Toddler Checklist (ITC) component of the Communication and Symbolic Behavior Scales Developmental Profile (CSBS DP; Wetherby & Prizant, 2002), focuses more specifically on children’s communication and symbolic functioning.
Ages and Stages Questionnaire, Third Edition
The Ages and Stages Questionnaire, Third Edition (ASQ-3) is a 30-item parent-report measure designed to examine developmental functioning in children ages 1–66 months in the following five domains: communication, fine motor, gross motor, personal-social, and problem solving (Bricker & Squires, 1999; Squires & Bricker, 2009). The ASQ-3 includes age-specific questions and identifies children as “at risk,” “not at risk,” or in the “monitoring zone,” which indicates their development should continue to be monitored over time. For risk classification, the ASQ-3 has high test-retest reliability (0.92) and inter-rater reliability (0.93). Sensitivity ranges from 0.83 to 0.89 and specificity ranges from 0.80 to 0.92 across ages (Squires & Bricker, 2009). Overall, the ASQ-3 seems to screen appropriately for general developmental functioning; however, it will not identify specific cases of ASD or ASD symptoms, such as impaired joint attention or limited interest in peers.
Parents’ Evaluation of Developmental Status
The Parents’ Evaluation of Developmental Status (PEDS) is a brief, 10-item yes/no parent questionnaire that assesses developmental concerns for children ages 1–95 months in the following five domains: global/cognitive, expressive language, receptive language, social-emotional, and other (Glascoe, 1998, 2003). Responses to the PEDS are divided into “predictive” or “non-predictive” concerns. The PEDS was validated on a sample of 771 children ages 0–8 from urban, rural, and suburban areas across the United States. Sensitivity ranges from 0.74 to 0.79 while specificity ranges from 0.70 to 0.80. Currently, mixed findings have been reported regarding the PEDS’ ability to identify children at risk for ASD among the general population. One group of researchers found that the PEDS failed to identify a large portion of children who were identified using the Modified Checklist for Autism in Toddlers (M-CHAT; Robins, Fein, Barton, & Green, 2001), which is an autism-specific screener. In conclusion, the PEDS meets the recommended psychometric properties for a general developmental screener, it has been standardized and validated, and it is commonly used in settings that serve young children. Future research should continue to explore the usability and psychometric properties of the PEDS as it relates to the identification of ASD.
Infant Toddler Checklist
Another tool focused on identifying children at risk for language, social communication, and general developmental delays is the Communication and Symbolic Behavior Scales Developmental Profile (CSBS DP; Wetherby et al., 2004; Wetherby & Prizant, 2002). Based on Wetherby and Prizant’s (1993) work, the CSBS DP comprises three separate measures that can be used for a variety of purposes depending on the setting and particular needs of the population. The Infant/Toddler Checklist (ITC) is reviewed here as it is considered to be a broad, population screener; the other two tools, the CSBS DP Caregiver Questionnaire and the CSBS Behavior Sample, are follow-up assessment measures typically employed after children have been identified. For a comprehensive review of these two measures, refer to Wetherby and Prizant (2002). Based on standard scores across a 4-month interval for a normative sample, the CSBS DP has internal consistency ranging from 0.86 to 0.92 and good test-retest reliability (Wetherby, Brosnan-Maddox, Peace, & Newton, 2008).
The ITC component of the CSBS DP is a standardized instrument, consisting of 24 yes/no parent-report items and one open-ended parent concern question. Specifically, parents are asked to describe their child’s developmental concerns if they answer “yes” to the following question: “Do you have any concerns about your child’s development?” The ITC screens for deficits in communication and symbolic skills among 6–24-month-old infants (Wetherby et al., 2008; Wetherby & Prizant, 2002). The ITC not only features screening cutoff scores but also has related standard scores at monthly intervals based on a normative sample of 2188 children ages 6–24 months (Wetherby & Prizant, 2002). In one study, Wetherby et al. (2004) examined the validity of the ITC in detecting communication delays in over 3000 children ages 6–24 months who were screened from a general population sample as part of the FIRST WORDS® Project. The following two samples were asked to receive further evaluation using the CSBS DP Behavior Sample after they were initially screened with the ITC: (a) children who scored in the bottom tenth percentile on the ITC and (b) randomly selected children functioning within normal limits on the ITC. After further evaluation, children were diagnosed with ASD, diagnosed with developmental delay, or identified as typically developing.
When the ASD and DD groups were combined together and compared to the typically developing group, sensitivity was estimated to be 88.9 %. However, sensitivity increased to 94.4 % when the ASD group was solely examined with the typically developing group. Overall specificity was 88.9 %. Thus, the ITC had good sensitivity and specificity to be used as a general population screener for developmental abnormalities, including ASD and other DDs. More recently, researchers used similar procedures as Wetherby et al. (2004) to further validate the ITC. Results suggested that the ITC is valid for screening children ages 9–24 months, but it fails to accurately assess parental concerns at 6–8 months (Wetherby et al., 2008). Specifically, the PPV and NPV, which were above 70 %, both support validity of the ITC for children 9–24 months; however, the false negative rate was high for 6- to 8-month-old infants. Additionally, less than half of the parents in the sample reported concerns between 6 and 15 months; however, 75 % reported concerns between 21 and 24 months.
Currently, a positive screen on the ITC does not necessarily differentiate children with ASD from those with other developmental problems; however, some researchers suggest the ITC is more capable of screening a heterogeneous sample of children with ASD that is more inclusive of high-functioning individuals. Specifically, the ITC was able to identify children with higher composite scores and greater variability on the Mullen Early Learning Scales (Wetherby et al., 2008) than were identified in a lower-scoring sample screened using another parent-rated screener (Kleinman et al., 2008). If children screen positive on the ITC screener, then clinicians may consider referral for further communication evaluation using the CSBS Behavior Sample or an autism-specific Systematic Observation of Red Flags for Autism (SORF). If children screen negative on the ITC, then they should consistently participate in developmental screening every 3 months until age 24 months (Wetherby et al., 2008). Future research should continue to examine the validity of the ITC in determining which children should receive ASD diagnoses within a large, general sample.
Summary
Although differences in population makeup and sampling may explain the varied results, general consensus suggests that broad-based measures do not sufficiently identify all children who may be at risk for ASD. Thus, general developmental screeners should be utilized in pediatric primary care settings to identify children with a range of developmental concerns; however, they do not seem to replace first-stage, autism-specific measures. If general developmental measures are to be used as first-stage screeners, further research is needed to validate their use in detecting children with ASD and other DDs. Currently, the most accurate approach is to use a broadband measure followed by an ASD-specific tool when screening children ages 18–24 months in the general population (Ibañez et al., 2014).
Level 1 Autism-Specific Screening
In order to identify unique behavioral symptoms indicative of ASD, Level 1, autism-specific measures have been developed for screening general populations. The American Academy of Pediatrics (AAP) recommends that these measures be used at 18- and 24-month preventive pediatric healthcare visits (Johnson & Myers, 2007); however, pediatricians often do not screen for ASD and, if they do, they often do not adhere to the AAP guidelines (e.g., Self et al., 2014). Comprehensive reviews of published autism-specific screeners are available to supplement our review (Mawle & Griffiths, 2006; Robins & Dumont-Mathieu, 2006); select peer-reviewed Level 1 autism-specific screeners are reviewed in the following section.
Checklist for Autism in Toddlers
Over two decades ago, Baron-Cohen, Allen, and Gillberg (1992) developed the Checklist for Autism in Toddlers (CHAT), which was the first autism-specific measure designed for general population screening during 18-month-olds’ routine healthcare visits. The CHAT is a nine-item parent-report measure combined with five items to be observed by health professionals. The CHAT samples children’s functioning in several areas, with particular focus on early signs of ASD, such as gaze monitoring, use of protodeclarative pointing (i.e., initiating joint attention), and pretend play (Baron-Cohen et al., 1992). In the first publication establishing the CHAT’s psychometric properties, the measure was used to screen 50 infants during routine, 18-month checkups as well as a sample of 41 young siblings of children with autism, a high-risk sample (Baron-Cohen et al., 1992). Using a cutoff criterion of failing two or more skill areas, the CHAT correctly identified four children who were later diagnosed with ASD while none of the typically developing siblings were identified using the CHAT.
A later study used the number of passes and failures within each of the three domains to place 16,000 18-month children into one of three groups: Autism, Developmental Delay (DD), or Typically Developing (Baron-Cohen et al., 1996). Out of the 12 children placed in the Autism group, ten later received a diagnosis of autism and two received a diagnosis of DD, which were confirmed 3.5 years after initial evaluations. A follow-up study conducted 6 years later rescreened the sample and established scoring thresholds for groups identified as either high or medium risk for autism (Baird et al., 2000). The high-risk criteria required children to fail items about protodeclarative pointing and pretend play on both parent and observer portions of the CHAT as well as gaze monitoring items when observed by the clinician. However, the medium-risk criteria required children to fail the protodeclarative pointing parent and observer portions but pass one of the other items.
Using the high-risk criteria, the CHAT identified 10 of 50 children with ASD in the population sample of 16,235. As such, the CHAT produced a sensitivity of 0.20 and specificity of 0.998. Using medium-risk criteria, sensitivity was 0.38, specificity was 0.98, and the PPV was 0.05. When children were screened twice using the CHAT, the PPV increased to 0.75 and the sensitivity decreased to 0.18 (Baird et al., 2000). Although the CHAT identified some children who later received diagnoses of ASD, it did not identify a majority of the children. Overall, the poor sensitivity and high false negative rates associated with the CHAT suggest that future research is needed to determine its effectiveness in screening for ASD symptoms in 18-month-old infants. Additionally, the CHAT may not represent the ideal screening tool for all settings as it requires both clinician observation of children’s behaviors and parental report.
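The low PPV under the medium-risk criteria follows largely from the low base rate of ASD in the screened population. The sketch below applies Bayes' theorem to the rounded sensitivity and specificity reported above and assumes a base rate of 50 cases in 16,235 children; because the inputs are rounded, the result only approximates the published PPV of 0.05. The function name is an illustrative assumption.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value implied by sensitivity, specificity, and base rate."""
    # Expected proportion of the population that is a true positive.
    true_pos = sensitivity * prevalence
    # Expected proportion of the population that is a false positive.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed base rate: 50 children with ASD among 16,235 screened.
prevalence = 50 / 16235

# Medium-risk criteria, using the rounded values reported in the text.
medium_risk_ppv = ppv_from_rates(sensitivity=0.38, specificity=0.98, prevalence=prevalence)
print(f"base rate = {prevalence:.4f}, implied PPV = {medium_risk_ppv:.3f}")
```

With a base rate near 0.3 %, even a specificity of 0.98 yields many more false positives than true positives, which is why the implied PPV falls between 0.05 and 0.06.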
Modified Checklist for Autism in Toddlers
The Modified Checklist for Autism in Toddlers (M-CHAT) is a modified version of the CHAT adapted for use as a Level 1 screener in pediatric settings in the United States (Robins et al., 2001). The M-CHAT consists only of parent-rated items; however, physicians can “flag” the screener when they suspect autism despite responses on the parent checklist. The M-CHAT comprises 23 questions, including nine items from the parent-report CHAT and 14 other items specifically related to symptoms of autism present in young children, such as repetitive behaviors, which are not included on the CHAT. The following six critical items are included on the M-CHAT: protodeclarative pointing, following a point, showing objects, imitation, interest in other children, and response to name (Robins et al., 2001). Internal consistency reliabilities for the entire screener (α = 0.85) and six critical items (α = 0.83–0.84) are adequate (Kleinman et al., 2008; Robins et al., 2001). A Chinese version of the M-CHAT, known as the CHAT-23, has recently been developed; however, the measure should continue to be examined for its utility across settings and in other countries (Wong et al., 2004). The English version of the M-CHAT is reviewed in the following section.
To examine initial psychometric properties of the M-CHAT, 1122 children were screened in primary care settings and 171 children in early intervention sites using the M-CHAT screener (Robins et al., 2001). Robins et al. (2001) utilized follow-up interviews to confirm the presence of symptoms in children who met the cutoff criteria, which were either failing two or more critical items or failing any three items. Children who failed the screener after the interview participated in further evaluation. In this sample, 58 children received evaluations, 74 parents completed follow-up interviews that did not end in their children failing the M-CHAT, and 1161 children did not require follow-up interviews.
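The cutoff rule described above (failing two or more critical items, or failing any three items) can be expressed as a short function. This is a hedged sketch: the item labels are hypothetical identifiers chosen for illustration and do not correspond to the official item numbers or scoring materials.

```python
from typing import Iterable, Set

# Hypothetical labels for the six critical items named in the text.
CRITICAL_ITEMS: Set[str] = {
    "protodeclarative_pointing",
    "following_a_point",
    "showing_objects",
    "imitation",
    "interest_in_other_children",
    "response_to_name",
}


def meets_mchat_cutoff(failed_items: Iterable[str],
                       critical_items: Set[str] = CRITICAL_ITEMS) -> bool:
    """Return True if the failed items meet either cutoff criterion."""
    failed = set(failed_items)
    critical_failures = len(failed & critical_items)
    # Criterion 1: two or more critical items failed.
    # Criterion 2: any three items failed, critical or not.
    return critical_failures >= 2 or len(failed) >= 3


# Two critical failures meet the cutoff.
print(meets_mchat_cutoff({"imitation", "response_to_name"}))
# One critical and one non-critical failure do not (non-critical label is hypothetical).
print(meets_mchat_cutoff({"imitation", "unusual_finger_movements"}))
```

Under the procedure described above, a child who meets the cutoff would proceed to the follow-up interview rather than directly to evaluation.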
Most children diagnosed with ASD were referred from early interventionists, indicating the sample was initially a high-risk group. Results varied depending on the cutoff criteria (i.e., failing two critical items or three total items) as well as whether or not children who passed after follow-up interviews were labeled as false positives. Initial results, which examined the checklist and follow-up interview combined, revealed sensitivity ranging from 0.95 to 0.97, specificity ranging from 0.95 to 0.99, PPV from 0.36 to 0.80, and NPV reported at 0.99 (Robins et al., 2001).
Another study examined the M-CHAT by screening 3309 children in a low-risk sample at well-child checkups and 484 children in a high-risk sample who were either referred by specialists for further evaluation or screened by early intervention providers (Kleinman et al., 2008). Identical to Robins et al.’s (2001) initial results, the PPV of the entire sample was 0.36; however, the PPV of the low-risk sample alone was 0.11 compared to 0.60 for the high-risk sample. When examining the children whose initial failed screens were confirmed during follow-up interviews, the PPV of the entire sample rose to 0.74. Similarly, the PPVs of both groups (low risk, 0.65; high risk, 0.76) also increased when solely including the children whose responses on the screener were confirmed via interview. When children were rescreened and re-evaluated at around age 4, seven children were diagnosed with autism who did not fail the M-CHAT screen at a younger age (Kleinman et al., 2008). Thus, seven false negatives were identified out of the total sample of 1416 from combined low- and high-risk samples when children’s symptoms were monitored in longitudinal studies.
In a large, recent follow-up study, 18,989 toddlers between the ages of 16 and 30 months were screened during well-child visits (Chlebowski, Robins, Barton, & Fein, 2013). Of the 1737 children who screened positive on the initial M-CHAT, 74.6 % participated in the follow-up interview, and 1023 children screened negative after the interview. However, 272 continued to screen positive after the phone interview and were referred for further evaluation. The PPV for the initial M-CHAT screening alone was 0.06, and the PPV was 0.53 for the M-CHAT combined with the follow-up phone interview. Overall, results indicate that it is crucial to combine the M-CHAT screener with a follow-up telephone interview to reduce false positives and avoid unnecessary referrals and parent concerns (Chlebowski et al., 2013; Kleinman et al., 2008). This factor is especially important in settings that serve large numbers of families, where healthcare providers have limited time and resources to spend on each child.
The use of a brief, follow-up interview, either on the phone or in person at a healthcare provider’s office, improves accurate referral for further diagnosis and screening for suspected ASD. Recently, a revised version, referred to as the Modified Checklist for Autism in Toddlers—Revised, with Follow-Up (M-CHAT-R/F; Robins, Fein, & Barton, 2009), was developed to allow physicians to review responses on the M-CHAT-R checklist with parents in greater detail. The follow-up interview serves as a Level 2 screener within the M-CHAT-R/F screener and it is discussed in the Level 2 screening section. In conclusion, mixed results regarding the sensitivity and specificity of the M-CHAT suggest that future research should continue to provide support for the utility, reliability, and validity of this common autism-specific Level 1 screener. However, the M-CHAT is the most commonly used and researched tool for screening for ASD in the general population.
Pervasive Developmental Disorders Screening Test, Second Edition
The Pervasive Developmental Disorders Screening Test, Second Edition (PDDST-II) is a parent-report screening measure for autism and other pervasive developmental disorders designed for children ages 12 to 48 months (Siegel, 2004). The PDDST-II comprises three forms, which include both Level 1 and Level 2 screeners; the appropriate form is selected depending on the clinical purpose: (a) a Stage 1: Primary Care Setting form, (b) a Stage 2: Developmental Clinic Setting form, and (c) a Stage 3: Autism Clinic Severity Setting form. Each stage is associated with its own cutoff score, and the forms can be used in conjunction or individually.
The Primary Care Setting (PCS) form, which consists of 22 parent-report items, is most likely to be utilized by general pediatricians and primary care physicians to identify 12- to 48-month-old children at risk for autism (Siegel, 2004). When 681 children at risk for ASD and 256 children with other developmental disorders were screened using the PCS, sensitivity and specificity were found to be 0.92 and 0.91, respectively (Siegel, 2004). The Developmental Clinic Setting (DCS) form includes 14 items that can be used to identify children in specialized developmental settings who are more likely to have autism than a range of other developmental disorders. When the DCS form was used to compare functioning of 490 children diagnosed with ASD to 194 diagnosed with other disorders, sensitivity and specificity were found to be 0.73 and 0.49, respectively, when an associated cutoff score of 5 was utilized (Siegel, 2004).
Lastly, the Autism Clinic Severity Setting (ACSS) form consists of 12 items that assess early symptoms to predict severity levels of ASD. When the ACSS form was used to compare 355 children with ASD to 99 children with either PDD-NOS or Asperger’s disorder, sensitivity and specificity were found to be 0.58 and 0.60, when an associated cutoff score of eight was utilized. The Level 1 PCS form correctly classified over 90 % of cases; however, the sample presented in the manual was a selected sample of children at high risk at the time of screening. Thus, the PCS form of the PDDST-II should be validated by screening children in the general population rather than those who have already been identified as at risk to be fully endorsed as an appropriate Level 1 screener. Additionally, the sensitivity and specificity of the DCS form fall below generally acceptable levels for a screener; therefore, the DCS needs additional validation before it is recommended as a Level 2 screener. Overall, additional studies exploring the psychometric properties and usability of the entire PDDST-II rating system are needed prior to its endorsement.
Summary
Many children falsely identified by autism-specific screeners meet criteria for other developmental delays; therefore, children without ASD but with other delays may benefit from early screening using ASD-specific or broad-based tools. Young children should be screened for ASD at 18- and 24-month checkups as well as whenever parental concerns are expressed. When examining sensitivity and specificity, some of the measures (i.e., M-CHAT and M-CHAT-R/F) appear promising; however, results of many studies are difficult to generalize. For example, some studies included high-risk samples when assessing general population screeners, failed to validate cutoff criteria before conducting studies, and refrained from following up with children who passed the screeners after their initial screening. Even for measures, such as the M-CHAT, that have generated promising psychometric support, there are limitations associated with imperfect measures designed to identify relatively low base rate disorders, such as ASD. The impact of low base rate is discussed further in subsequent sections of the chapter.
To address potential concerns and reduce false positives, healthcare providers should follow up with parents whose children fail the screening by reviewing any flagged items or concerns. Additionally, clinicians should continue to screen children who pass an initial screening for developmental concerns that may arise in the future. Overall, the M-CHAT and M-CHAT-R serve as the strongest Level 1 autism-specific tools; however, the Level 2, follow-up interview (i.e., M-CHAT-R/F) should be included as part of the initial screening to confirm positive screens. The follow-up interviews can take place over the phone or in physicians’ offices, especially if an electronic version of the M-CHAT is utilized. The electronic version of the M-CHAT, which has been researched preliminarily in a primary care setting (Harrington, Bai, & Perkins, 2013), is unique in that it can be scored instantly, enabling physicians to conduct follow-up questioning at the same time as the developmental screening and well-child visits.
Level 2 Screening Measures
The following section contains a brief overview of measures designed to identify children with autism after developmental concerns have already been noted. Level 2 autism-specific screeners are most commonly used in community settings, such as early intervention centers or evaluation clinics, to help differentiate children at risk for autism from those at risk for other disorders. Peer-reviewed, published measures that utilize a variety of formats (i.e., follow-up interviews, standardized observations, rating scales) are reviewed in this section. The rating scales are relatively easy to score and administer while standardized observations tend to be more time-consuming and require a higher level of clinician training.
Modified Checklist for Autism in Toddlers: Revised/Follow-Up
As discussed above, the M-CHAT is one of the most commonly used Level 1 screeners; however, research suggests clinicians should utilize the follow-up interview to reduce the false positive screens that result from using the M-CHAT alone (Chlebowski et al., 2013). The Modified Checklist for Autism in Toddlers, Revised/Follow-Up (M-CHAT-R/F) is a two-step screener for detecting symptoms of ASD in children ages 16 to 30 months. Although the M-CHAT-R/F is similar to the M-CHAT (Robins et al., 2001), several changes have been incorporated, including dropping three items that performed poorly, reorganizing the placement of items, simplifying language, and clarifying items by using examples and adding context. In its present form, the M-CHAT-R/F has 20 items and classifies children as low (total score < 3), medium (total score 3–7), or high risk (total score ≥ 8) for autism based on parental responses (Robins et al., 2009). If children are classified as medium risk at initial M-CHAT-R/F screening, the follow-up interview can be completed via telephone or in person to confirm failed items (Robins et al., 2009). Children who continue to be classified as medium risk after interviews should be referred for further diagnostic evaluation. However, children initially classified as high risk should immediately be referred for further evaluation and/or early intervention services.
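The score bands and referral logic described above can be sketched as a small decision function. This is an illustration under stated assumptions rather than the official scoring software: the function names and returned strings are hypothetical, the interview outcome is simplified to a yes/no flag, and the action for low-risk scores is an assumption, since the text specifies actions only for medium- and high-risk classifications.

```python
from typing import Optional


def classify_risk(total_score: int) -> str:
    """Map a 20-item checklist total to the risk bands given in the text."""
    if not 0 <= total_score <= 20:
        raise ValueError("total score must be between 0 and 20")
    if total_score < 3:
        return "low"
    if total_score <= 7:
        return "medium"
    return "high"


def next_step(total_score: int, still_at_risk_after_interview: Optional[bool] = None) -> str:
    """Suggest the next step; the interview flag is None until the interview is done."""
    risk = classify_risk(total_score)
    if risk == "high":
        # High risk bypasses the interview.
        return "refer for evaluation and/or early intervention"
    if risk == "medium":
        if still_at_risk_after_interview is None:
            return "conduct follow-up interview"
        if still_at_risk_after_interview:
            return "refer for diagnostic evaluation"
    # Assumption: no referral is triggered by this screen alone.
    return "no referral indicated by this screen"


print(next_step(5))
print(next_step(5, still_at_risk_after_interview=True))
print(next_step(9))
```

Separating classification from the referral decision keeps the score bands in one place, so a change in thresholds would not require rewriting the follow-up logic.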
Robins et al. (2014) report that 7% of 16,071 children screened at medium or high risk, compared to 9% of children on earlier versions of the M-CHAT. The overall autism detection rate was higher for the M-CHAT-R/F (67 cases per 10,000 screened) than for the earlier version (45 cases per 10,000 screened). Overall, the modified instrument seems to have several advantages over the earlier versions; however, preliminary data suggest that the screening performance of the M-CHAT-R/F does not differ significantly from that of the original version as long as the follow-up interview is used (Robins et al., 2014). Further research on the M-CHAT-R/F is needed if it is intended to replace the original M-CHAT in primary healthcare settings. The M-CHAT-R/F rating form, follow-up interview, and scoring software are freely available at www.mchatscreen.com.
Screening Tool for Autism in Toddlers
The Screening Tool for Autism in Toddlers (STAT) is a Level 2 screener involving a 20-min, play-based interactive session with children ages 24 to 36 months (Stone, Coonrod, & Ousley, 2000; Stone, Coonrod, Turner, & Pozdol, 2004). The 12 items administered during the session assess the following four domains of social communication: play (two items), motor imitation (four items), directing attention (four items), and requesting (two items). Assessment of the four domains does not require language comprehension. The domain scores are equally weighted and combined to derive a total score ranging from 0 to 4, with higher scores representing greater impairment and a cutoff score of 2 indicating “risk for ASD.”
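A rough sketch of how equally weighted domain scores could combine into a 0 to 4 total is shown below. It assumes, for illustration only, that each domain score is the proportion of items failed in that domain and that a total at or above the cutoff counts as "at risk"; neither detail is specified in this section, and the names used are hypothetical.

```python
# Number of items per domain, as listed in the text.
DOMAIN_ITEMS = {
    "play": 2,
    "motor_imitation": 4,
    "directing_attention": 4,
    "requesting": 2,
}

STAT_CUTOFF = 2.0  # cutoff reported for children 24 to 36 months


def stat_total(failed_items: dict) -> float:
    """Sum per-domain failure proportions into a 0-4 total.

    Assumption: each domain score is (items failed / items in domain),
    so every domain contributes at most 1 point regardless of length.
    """
    total = 0.0
    for domain, n_items in DOMAIN_ITEMS.items():
        failed = failed_items[domain]
        if not 0 <= failed <= n_items:
            raise ValueError(f"invalid failed-item count for {domain}")
        total += failed / n_items
    return total


def at_risk(total: float, cutoff: float = STAT_CUTOFF) -> bool:
    """Assumption: scores at or above the cutoff indicate risk."""
    return total >= cutoff


# Hypothetical child: fails 1 play item, 2 imitation items,
# all 4 directing-attention items, and no requesting items.
example = {"play": 1, "motor_imitation": 2,
           "directing_attention": 4, "requesting": 0}
score = stat_total(example)
print(score, at_risk(score))
```

Under this weighting, one failed play item counts as much as two failed motor imitation items, which is what equal domain weighting implies when domains differ in length.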
To assess the validity of the STAT, Stone et al. (2000) randomly assigned 24- to 35-month-old children to one of two groups: (a) a development sample and (b) a validation sample. The development sample consisted of seven children with ASD and 33 with disorders other than ASD while the validation sample included 12 children with ASD and 21 with other disorders. When diagnosis based on DSM-IV criteria was used as the standard, the sensitivity and specificity of the development sample were 1.00 and 0.91, respectively. Examination of the validation sample alone yielded sensitivity and specificity of 0.83 and 0.86, as well as PPV of 0.77 and NPV of 0.90. When subgroups of children with and without autism were created and matched on mental age, the sensitivity and specificity were both 0.83.
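The validation-sample figures can be checked against the definitions of the four indices. The cell counts below are not reported in this section; they are inferred as the whole-number counts consistent with the stated sample sizes (12 children with ASD, 21 with other disorders) and the reported rates, so they should be read as an assumption rather than as published data.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four standard indices from a 2x2 screening table."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among those with the disorder
        "specificity": tn / (tn + fp),  # cleared among those without it
        "ppv": tp / (tp + fp),          # disorder present among positive screens
        "npv": tn / (tn + fn),          # disorder absent among negative screens
    }


# Inferred (not reported) counts for the Stone et al. (2000) validation sample:
# 12 children with ASD -> 10 screened positive, 2 screened negative
# 21 children without  ->  3 screened positive, 18 screened negative
metrics = screening_metrics(tp=10, fp=3, fn=2, tn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the function returns values that round to the reported sensitivity (0.83), specificity (0.86), PPV (0.77), and NPV (0.90). In a sample this small, a single reclassified child moves sensitivity by about 0.08.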
Using an approach similar to the one above, Stone et al. (2004) matched two groups consisting of 26 children with autism and 26 children with other developmental delays or language impairments. These children were randomly assigned to either a development sample or a validation sample to further examine the validity of the STAT. The authors used clinical diagnosis as the standard to create cutoff scores for the development sample before testing the cutoff criteria on the validation sample. Using this approach, the validation sample produced a sensitivity of 0.92, specificity of 0.85, PPV of 0.86, and NPV of 0.92. Concurrent validity of the STAT was examined by comparing STAT risk category (i.e., ASD risk/no risk) with diagnosis on the Autism Diagnostic Observation Schedule-Generic (ADOS-G; i.e., ASD/no ASD). The resulting Cohen’s kappa of 0.77 and 89% agreement between the measures provided support for the validity of the STAT. Inter-rater agreement for risk category, as measured by Cohen’s kappa, was 0.88 when 30 children were assessed. Additionally, test-retest reliability was 0.88 when 18 children were screened by two different examiners 2–3 weeks apart, and the correlation between the STAT scores across both times was 0.85 (Stone et al., 2004).
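Cohen’s kappa is lower than raw percent agreement because it discounts the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement. The sketch below computes kappa from a 2x2 agreement table and, separately, the chance agreement implied by the reported values. The table counts are hypothetical and chosen only to illustrate the calculation; they are not data from Stone et al. (2004).

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 agreement table.

    table[i][j] = number of children placed in category i by
    measure A and category j by measure B.
    """
    n = sum(sum(row) for row in table)
    observed = (table[0][0] + table[1][1]) / n
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    # Chance agreement: product of marginal proportions, summed over categories.
    expected = sum(row_totals[k] * col_totals[k] for k in range(2)) / n ** 2
    return (observed - expected) / (1 - expected)


def implied_chance_agreement(kappa: float, observed: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (observed - kappa) / (1 - kappa)


# Hypothetical counts (rows: measure A risk / no risk; columns: measure B).
hypothetical = [[20, 3],
                [2, 25]]
print(round(cohen_kappa(hypothetical), 3))

# Reported values: kappa = 0.77 with 89% observed agreement.
print(round(implied_chance_agreement(0.77, 0.89), 2))
```

The second calculation suggests that roughly half of the agreement between the two measures would be expected by chance alone, which is typical when two binary classifications split a sample about evenly.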
Although the STAT was initially developed and validated on children 24 to 36 months of age, exploratory research suggests that the STAT may be suitable for children under the age of 2 (Stone, McMahon, & Henderson, 2008). Researchers examined the validity of the STAT for screening 71 children below 24 months of age in a high-risk sample, of whom 59 had an older sibling with ASD and 12 were referred for evaluation for suspected ASD. In this study, the original STAT cutoff score of 2 for “at risk” was increased to 2.75 in order to maintain adequate sensitivity and specificity for children 12–23 months old. The revised cutoff score produced a sensitivity of 0.95, specificity of 0.73, PPV of 0.56, and NPV of 0.97. When 12–13-month-olds were removed from the sample because of high false-positive rates (38%), the sensitivity was 0.93, specificity was 0.83, PPV was 0.68, and NPV was 0.97. Thus, the PPV and specificity improved when younger infants were excluded from the sample, while the NPV and sensitivity remained acceptable. Preliminary evidence suggests that the original STAT may be used to screen children under 2 years old; however, results need to be replicated in larger samples and cutoff scores need to be validated for younger children.