No. 20-619

IN THE
SUPREME COURT OF THE UNITED STATES

A.S., a 9-year-old child with Autism Spectrum Disorder (ASD) entitled to Special Education and Related Services per IDEA, represented by his parents R.S., pro se, and E.S., pro se,
Plaintiffs-Petitioners
-v.-
Board of Education Shenendehowa Central School District,
Interim Commissioner Betty Rosa, of The University of the State of New York
Defendants-Respondents

Petition for Rehearing
On Writ of Certiorari
To the U.S. Court of Appeals for the 2nd Circuit

PETITION FOR REHEARING ON SUBSTANTIAL GROUNDS NOT PREVIOUSLY PRESENTED

RECEIVED FEB 11 2021

Petition for Rehearing of the denial of the Petition for a Writ of Certiorari appealing the Decision, Order and Judgment of the United States Court of Appeals for the Second Circuit by Judges Pierre N. Leval, Raymond J. Lohier, Jr. and Michael H. Park to dismiss for lack of jurisdiction the Appeal from the Memorandum-Decision and Order and Judgment of the United States District Court for the Northern District of New York by Judge Lawrence E. Kahn entered February 20, 2019, and Motion to Reopen granted on March 16, 2020 and postmarked on March 16, 2020, where FRAP suggests the 14-day timeline begins on March 19, 2020, in Action No. 20-1153.

NEW QUESTIONS PRESENTED IN THIS PETITION FOR REHEARING

1. Is the Supreme Court aware that an Autism Gene Therapy clinical trial is likely less than 5 years away as a result of the advent of CRISPR/Cas9-based gene editing technologies such as Base Editing and Prime Editing?

2. Is the Supreme Court aware that when an Autism Gene Therapy clinical trial commences it will have to hold itself to the same standard that the Lovaas UCLA Early Autism Program (Lovaas, O. I., 1987, Journal of Consulting and Clinical Psychology, 55:3-9) and High Fidelity Replications (Cohen, H., Amerine-Dickens, M.
& Smith, T., 2006, Developmental and Behavioral Pediatrics, 27:S145-S155; Howard, J. S., Stanislaw, et al., 2014, Research in Developmental Disabilities, 35:3326-3344; Sallows, G. O., & Graupner, T. D., 2005, AJMR, 110:417-438) held themselves to, because those standards most closely parallel one’s ability to achieve “further education, employment and independent living” (20 U.S.C. § 1400(d)(1)(A)); that is, achievement of an IQ in the normal range and achievement of a Vineland Adaptive Behavior Scales (VABS) Composite Score in the normal range, which would be expected to be followed by a normal classroom placement? Thus, Autism Gene Therapy will need an intensive ABA framework in place to ensure that the outcomes of any program can be attributed to that program and not to an IBI or intensive ABA program completed in parallel?

3. Is the Supreme Court aware that if a viable and national effort to use proven approaches to autism is not in place, this country will soon (in about 2 decades) be stuck in permanent or long-term recession, removing our position as the world’s leading power, as a result of the increasing incidence of autism?

QUESTIONS PRESENTED IN ORIGINAL PETITION FOR WRIT OF CERTIORARI

1. Whether an appellate court may sua sponte dismiss an appeal which has been filed within the time limitations stated in the Federal Rules of Appellate Procedure (FRAP), where Rule 26(c) adds 3 days for service by mail, to file an appeal for which the motion has been granted to reopen the time to file an appeal under Rule 4(a)(6) of FRAP?

2.
Whether non-attorney pro se parents can reasonably have been expected to know of unwritten rules that lawyers take for granted, namely that FRAP Rule 26(c) does not apply to mailed motions that are granted to reopen the time to file an appeal under Rule 4(a)(6) of FRAP, when that is impossible to determine from reading the Federal Rules of Appellate Procedure?

3. Whether the interpretation of FRAP is intended to be based on the stand-alone document, and whether supplementary rules are required for its interpretation where such supplementary rules are referenced within FRAP, as to the particular application of FRAP Rule 26(c) to FRAP Rule 4(a)(6)?

4. Is Intensive Behavioral Intervention or its equivalent, intensive Applied Behavior Analysis (ABA), required for a specific period of time for a child with autism in order for the IEP to be “reasonably calculated” for the child to make progress in light of their circumstances?

5. In light of question 4, is there any other way to raise measures by “technically sound instruments that may assess the relative contribution of cognitive and behavioral factors, in addition to physical or developmental factors” (20 U.S.C. § 1414(b)(2)(C); 8 N.Y.C.R.R. § 200.6(6)(ii)(x)), such as IQ and Vineland Adaptive Behavior Scales (VABS), such that “further education, employment and independent living” (20 U.S.C. § 1400(d)(1)(A)) is a reasonable expectation for at least half of all school-aged children with autism?

6. Can a court defer to the opinion of a lower judicial body when there is an alleged bias of that lower judicial body?

7. Are the rules, regulations and laws of 8 N.Y.C.R.R. § 200 et seq. and also the IDEA, 20 U.S.C.
§§ 1400-1482, especially as they relate to persons with autism, written so that they are unconstitutionally vague, such that they cause confusion and variation in opinion in the courts, absent expensive expert testimony, and unlawfully empower school personnel, schools, school districts and other Local Education Agencies (LEAs) to broadly interpret the education law themselves, especially on such pertinent matters as Least Restrictive Environment (LRE) determinations and the appropriateness of a particular educational approach, such that this permits the curtailing of the rights of students receiving special education and their parents, consistently results in a denial of a FAPE and a denial of access to the student’s LRE to the maximum extent appropriate, and also results in confusion amongst the appellate courts on how to interpret the education law and render a judgment?

8. Given the nature of the common developmental delays found in nearly all autism spectrum disorder (ASD) diagnoses, if a student with an ASD is entitled to an Individualized Education Plan (IEP) and special education and related services, should the three measures of 1) expressive language, 2) conversational ability (measured in the number of peer-aged exchanges that a student can consistently demonstrate) with typically developing peers if in their LRE and 3) a reduction in prompt dependence be guaranteed goals on the student’s IEP, since these measures are necessary to the purpose of the Individuals with Disabilities Education Act (the IDEA) (20 U.S.C. §§ 1400-1482), which is “to ensure that students with disabilities have available to them a FAPE in the LRE to the maximum extent appropriate that emphasizes special education and related services designed to meet their unique needs and prepare them for further education, employment, and independent living” (20 U.S.C.
§ 1400(d)(1)(A))?

9. If Question 8 (corrected) is not answered in the affirmative, does 20 U.S.C. § 1400(d)(1)(A) have any meaning for a child with autism?

TABLE OF CONTENTS

A. ORIGINAL QUESTIONS PRESENTED IN PETITION FOR A WRIT OF CERTIORARI
B. NEW QUESTIONS PRESENTED IN THIS PETITION FOR REHEARING
C. ORIGINAL TABLE OF AUTHORITIES FOR PETITION FOR A WRIT OF CERTIORARI
   Cases
   Statutes, Rules and Regulations
   Legislative Materials
   Publications
D. NEW TABLE OF AUTHORITIES FOR PETITION FOR REHEARING
   Statutes, Rules and Regulations
   Nonprofit and Government Sources
   Videos
   Publications
E. REASONS TO GRANT THE PETITION FOR REHEARING
   I. THE FUTURE OF AUTISM GENE THERAPY MAY DEPEND ON THIS PETITION
   II. THE ABILITY OF AUTISM GENE THERAPY TO SERVE THE GREATEST GOOD LIKELY REQUIRES THAT INTENSIVE ABA METHODOLOGY BE AN EDUCATIONAL RIGHT TO PERSONS WITH AUTISM FOR 2 TO 3 CONSECUTIVE UNINTERRUPTED YEARS
   III. BASE EDITING AND PRIME EDITING GENE THERAPY TECHNOLOGY PLATFORM HAS BEEN DEVELOPED MAKING AUTISM GENE THERAPY POSSIBLE AND THERE IS ALREADY AN UNRELATED FDA APPROVED GENE THERAPY TARGETING THE BRAIN
   IV. THE INCIDENCE OF AUTISM IS INCREASING AND NOT FINDING THAT INTENSIVE ABA METHODOLOGY BE AN EDUCATIONAL RIGHT TO PERSONS WITH AUTISM WILL PUT THIS COUNTRY ON A COURSE TO PERMANENT OR LONG-TERM RECESSION
I. CONCLUSION
J. APPENDICES

Appendix 1, Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children.
Journal of Consulting and Clinical Psychology, 55 (1), 3-9.

Appendix 2, McEachin, J.J., Smith, T., Lovaas, O.I. (1993). Long-term outcome for children with autism who received early intensive behavioral treatment. AJMR, 97 (4), 359-372.

Appendix 3, Sallows, G. O., & Graupner, T. D. (2005). Intensive behavioral treatment for children with autism: Four-year outcome and predictors. AJMR, 110, 417-438.

Appendix 4, Cohen, H., Amerine-Dickens, M., & Smith, T. (2006). Early intensive behavioral treatment: Replication of the UCLA model in a community setting. Developmental and Behavioral Pediatrics, 27, S145-S155.

Appendix 5, Howard, J. S., Stanislaw, H., Green, G., Sparkman, C. R., & Cohen, H. G. (2014). Comparison of behavior analytic and eclectic early interventions for young children with autism after three years. Research in Developmental Disabilities, 35 (12), 3326-3344.

Appendix 6, Thusberg, J., Olatubosun, A. & Vihinen, M. Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat. 2011 32 (4):358-68.

Appendix 7, Gerasimavicius, L., Liu, X. & Marsh, J.A. Identification of pathogenic missense mutations using protein stability predictors. 2020, Sci Rep 10, 15387.

TABLE OF AUTHORITIES FOR PETITION FOR REHEARING

STATUTES, RULES AND REGULATIONS

20 U.S.C. § 1400(d)(1)(A)

GOVERNMENT AND NONPROFIT SOURCES

Mindspec: Informatics for Autism and Neurodevelopmental Disorders
Web Address: http://www.mindspec.org/
Specific Referenced Webpage Within: Autism Informatics Portal
Web Address: http://autism.mindspec.org/autdb/submitsearch?selfld_0=GENES_GENE_SYMBOL&selfldv_0=&numOfFields=1&userAction=viewall&tableName=AUT_HG&submit2=View+All

Centers for Disease Control and Prevention: Data & Statistics on Autism Spectrum Disorder
Web Address: https://www.cdc.gov/ncbddd/autism/data.html

Autism Speaks: Autism Statistics and Facts
Web Address: https://www.autismspeaks.org/autism-statistics

VIDEO PRESENTATIONS

Tristram Smith Keynote Presentation on Evidence-Based Practices for Children with ASD, May 30, 2014
https://www.youtube.com/watch?v=tQ2fA32vsZQ

PUBLICATIONS

Cohen, H., Amerine-Dickens, M., & Smith, T. (2006). Early intensive behavioral treatment: Replication of the UCLA model in a community setting. Developmental and Behavioral Pediatrics, 27, S145-S155

Howard, J. S., Stanislaw, H., Green, G., Sparkman, C. R., & Cohen, H. G. (2014). Comparison of behavior analytic and eclectic early interventions for young children with autism after three years. Research in Developmental Disabilities, 35 (12), 3326-3344

Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55(1), 3-9

McEachin, J.J., Smith, T., Lovaas, O.I. (1993). Long-term outcome for children with autism who received early intensive behavioral treatment. AJMR, 97, 359-372

Sallows, G. O., & Graupner, T. D. (2005). Intensive behavioral treatment for children with autism: Four-year outcome and predictors. AJMR, 110, 417-438

Thusberg, J., Olatubosun, A. & Vihinen, M. Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat. 2011 32 (4):358-68

Gerasimavicius, L., Liu, X. & Marsh, J.A. Identification of pathogenic missense mutations using protein stability predictors. 2020, Sci Rep 10, 15387

AUTHORITIES ORIGINALLY USED FOR PETITION FOR A WRIT OF CERTIORARI

CASES

(Application of a Student with a Disability, NYSED SRO Decision No. 17-008)
A.M. v. New York City Dep't of Educ., 845 F.3d 523, 541-45 (2d Cir. 2017)
Amanda J. ex rel. Annette J. v. Clark County School Dist. 267 F.3d 877 (9th Cir.
2001)
Bd. of Educ. v. Rowley, 458 U.S. 176 (1982)
Deal v. Hamilton County Board of Educ., 392 F.3d 840 (6th Cir. 2004)
T.R. v. Kingwood Twp. Bd. of Educ., 205 F.3d 572, 577 (3d Cir. 2000)
L.G. ex rel. E.G. v. Fair Lawn Bd. of Educ. 486 Fed. Appx. 967 (3rd Cir. 2012)
L.B. and J.B. on behalf of K.B. v. Nebo Sch. Dist., 379 F.3d 966 (10th Cir. 2004)
Madison Board of Education v. S.V. 2020 WL 5055149 (U.S. NJ 2020)
R.E.B. v. State of Hawaii Department of Education 870 F.3d 1025 (9th Cir. 2017)
R.E.B. v. State of Hawaii Department of Education 770 Fed. Appx. 796 (9th Cir. 2019)
Renee J. v. Houston Independent School District 333 F.Supp.3d 674 (S.D.TX.)
R.S. v. Bd. of Educ. Shenendehowa Cent. Sch. Dist., 1:17-CV-0501 (LEK/CFH) (N.D.N.Y. Feb. 20, 2019)
Sumter County School Dist. 17 v. Heffernan ex rel. TH 642 F.3d 478 (4th Cir. 2011)
T.H. v. Board of Education of Palatine, 55 F.Supp.2d 830 (N.D. Ill. 1999)
Winkelman v. Parma City School District, 127 S. Ct. 1994 (2007)
Wittenberg v. Winston-Salem/Forsyth County Board of Education, 2008 WL 11189389 (M.D.N.C. 2008)
Z.F. v. South Harrison Community School Corp. 2005 WL 2373729 (S.D. IA. 2005)

STATUTES, RULES AND REGULATIONS

FOURTEENTH AMENDMENT SECTION II
“Void for Vagueness” Doctrine of the U.S. Constitution
IDEA
Daubert Standard
Education of the Handicapped Act Amendments of 1990, Pub. L. No. 101-476, 104 Stat. 1103 (1990)
Individuals with Disabilities Education Act Amendments of 1997, Pub. L. No. 105-17, 111 Stat. 37 (1997)
Individuals with Disabilities Education Act Amendments of 2004, Pub. L. No. 108-446, 118 Stat. 2647 (2004)
20 U.S.C. §§ 1400-1482 et seq.
20 U.S.C. § 1400(c)(1)
20 U.S.C. § 1400(c)(1) (2000 & Supp. IV 2004)
20 U.S.C. § 1400(d)(1)(A)
20 U.S.C. § 1400(d)(1)(A-B)
20 U.S.C. § 1412(a)(5)(A)
20 U.S.C. § 1414(b)(2)(C)
20 U.S.C.
§ 1414(d)
20 U.S.C. § 1414(d)(1)(A)(i)(IV)
28 U.S.C. § 1254
8 N.Y.C.R.R. § 200 et seq.
8 N.Y.C.R.R. § 200.4(d)(2)(v)(b)
8 N.Y.C.R.R. § 200.6(6)(ii)(x)
Fed. R. App. P. 4(a)(6)
Fed. R. App. P. 26(c)

LEGISLATIVE MATERIALS

S. Rep. No. 94-168 (1975), as reprinted in 1975 U.S.C.C.A.N. 1425
Cong. Rec. 19492 (1975)

PUBLICATIONS

Cohen, H., Amerine-Dickens, M., & Smith, T. (2006). Early intensive behavioral treatment: Replication of the UCLA model in a community setting. Developmental and Behavioral Pediatrics, 27, S145-S155

Howard, J. S., Stanislaw, H., Green, G., Sparkman, C. R., & Cohen, H. G. (2014). Comparison of behavior analytic and eclectic early interventions for young children with autism after three years. Research in Developmental Disabilities, 35 (12), 3326-3344

Howard, J. S., Sparkman, C. R., Cohen, H. G., Green, G., & Stanislaw, H. (2005). A comparison of intensive behavior analytic and eclectic treatments for young children with autism. Research in Developmental Disabilities, 26, 359-383

Koegel, R. L., Werner, G. A., Vismara, L. A., & Koegel, L. K. (2005). The effectiveness of contextually supported play date interactions between children with autism and typically developing peers. Research and Practice for Persons with Severe Disabilities, 30, 93-102

Lee, P. F., Thomas, R. E., & Lee, P. A. (2015). Approach to autism spectrum disorder: Using the new DSM-V diagnostic criteria and the CanMEDS-FM framework. Canadian family physician Medecin de famille canadien, 61(5), 421-424

Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55(1), 3-9

McEachin, J.J., Smith, T., Lovaas, O.I. (1993). Long-term outcome for children with autism who received early intensive behavioral treatment. AJMR, 97, 359-372

Sallows, G.
O., & Graupner, T. D. (2005). Intensive behavioral treatment for children with autism: Four-year outcome and predictors. AJMR, 110, 417-438

REASONS FOR GRANTING THE PETITION FOR REHEARING

I. THE FUTURE OF AUTISM GENE THERAPY MAY DEPEND ON THIS PETITION.

The advent of CRISPR/Cas9-based gene therapies will soon, hopefully within 5 years, enable researchers to pursue Autism Gene Therapy and gene therapies for related neurodevelopmental disorders. The picture for autism, however, is complex. For example, there are nearly 1100 genes associated with autism. http://autism.mindspec.org/autdb/submitsearch?selfld_0=GENES_GENE_SYMBOL&selfldv_0=&numOfFields=1&userAction=viewall&tableName=AUT_HG&submit2=View+All (Autism Informatics Portal) For any given gene there is a large number of potential autism-causing mutations, and causality is not always easy to establish. In some instances the prediction that a mutation of a gene causes autism is nearly certain, and in other instances it is not. Causal links may be easily made with nonsense mutations or protein-truncating variants (which reduce the length of the protein) and with frameshift mutations (which change virtually every amino acid, in comparison to the natural functioning form, that follows the location of the frameshift), both of which materially change the protein and impair its function, and often also in cases involving in-frame deletions (the loss of amino acids) or insertions (the addition of amino acids). For missense mutations, which change a single amino acid, a causal link to autism can be more difficult to establish. Structural biology combined with statistics and computational science, including machine learning, has made it possible to predict the likelihood that a missense mutation would affect protein function.
(Thusberg, J., Olatubosun, A., Vihinen, M., Hum Mutat, 2011, 32:358-68; Gerasimavicius, L., Liu, X. & Marsh, J.A., Sci Rep, 2020, 10:15387) Even with these tools, in most cases of autism scientists can predict only with low likelihood that a specific missense mutation was the cause of the autism. In tens of thousands of instances there are 3 or fewer documented cases of autism for one specific mutation, often a missense mutation, but multiple different mutations on the same gene. Autism, while being a spectrum disorder, also has a separate spectrum for each of the 1100 genes. A single gene can have a mutation in one of a number of places. The degree to which the mutation impairs the protein’s function determines the severity of the autism within the gene’s spectrum. Additionally, individuals who express less protein than average will be more greatly impacted by the mutation of one of typically 2 [1] functioning copies of a gene, a condition referred to as haploinsufficiency [2]. The exceptions are the X and Y chromosomes in males, where generally there is one functioning copy. Factors such as multiple functional domains on a single protein can also contribute to the broad spectrum. This web of complexity, that is, the broadness of the autism spectrum for any particular gene, creates an ethical dilemma: correcting a supposed autism-causing mutation before ruling out that the individual can achieve typical levels of IQ and Vineland Adaptive Behavior Scales (VABS) from intensive behavioral intervention (IBI) or intensive Applied Behavior Analysis (ABA).

[1] Especially when the protein encoded by the gene plays a more essential function.
[2] In haploinsufficiency, one of two copies of a gene is sufficiently nonfunctional that there is an observable difference in the individual.

Why should an individual who can achieve typical levels of IQ and VABS, and thus be indistinguishable from typically developing peers, be subjected to gene therapy in its early stages if they can achieve the intended outcome without it? If this Court finds that an IEP that does not include IBI or intensive ABA for persons with autism cannot be “reasonably calculated” to confer educational benefit, making IBI or intensive ABA a matter of right for persons with autism, then any aspect of the autism gene therapy ethical dilemma that relates to ruling out that the individual can achieve typical levels from IBI or intensive ABA is in principle resolved.

II. THE ABILITY OF AUTISM GENE THERAPY TO SERVE THE GREATEST GOOD LIKELY REQUIRES THAT INTENSIVE ABA METHODOLOGY BE AN EDUCATIONAL RIGHT TO PERSONS WITH AUTISM FOR 2 TO 3 CONSECUTIVE UNINTERRUPTED YEARS.

The ethical dilemma is further complicated because it would not be ethically correct to limit a program to the mutations that leave persons worst off. Such instances carry a host of challenges that reduce the likelihood of success in the early stages of autism gene therapy, so limiting a program to them does not serve the greatest good. One might argue that individuals with a particular autism-causing gene are profoundly affected and would clearly not recover without gene therapy. However, a program only for those individuals is less likely to succeed in a relatively short time window, for the reasons discussed above, and thus would not serve the greatest good. What serves the greatest good early on in a program is to initially commence gene therapy with those who can reasonably be expected to achieve typical levels of IQ and VABS in a relatively short time window with a successful correction of the autism-causing gene, but who cannot do so without such a gene therapy program.
This is consistent with a prevailing view on a limited professional staffing scenario [3] in IBI or intensive ABA (see Tristram Smith Keynote Presentation on Evidence-Based Practices for Children with ASD, https://www.youtube.com/watch?v=tQ2fA32vsZQ (5/30/2014), at 1:44:30 - 1:45:23). We have considered the principles behind that view and we agree with it. We also envision that autism gene therapy clinical trials would take place across all correctable mutations on a given gene, including those less likely to quickly recover from gene therapy, where there is a sizable percentage (40+%) of individuals with the mutation who could be expected to recover within 2 years from a successful correction of the underlying gene while being unable to do so solely from IBI or intensive ABA. This gene targeting has the added benefit that those less likely to quickly recover from gene therapy could participate in a gene therapy program, while keeping with the principle of initially targeting for gene therapy those who can reasonably be expected to recover from gene therapy in a relatively short time window.

[3] In this scenario there are not enough trained personnel to provide intensive ABA or IBI to all persons with autism, so those who are projected to benefit most are given priority over those who are expected to benefit minimally, who instead receive the typical program offered by the school district.

Autism Gene Therapy will early on be based on a limited-resource model, due to limited initial investment, in which early and rapid success determines the amount and speed with which further funds are invested in the industry. In other words, early success of autism gene therapy in recovering persons with autism to typically developing levels will mean that an increasing amount of funds is poured into the industry in a rather short time window, thereby increasing the number of people recovered from autism as a function of time.
These substantial grounds not previously presented further establish the Supreme Court’s role in granting the Petition for a Writ of Certiorari. If the Supreme Court can, within its jurisdiction, hear a case whose outcome can have far-reaching implications that benefit practically all members of humanity, then the Court ought to hear this case, as the greatest good is served both for those who can achieve typical levels from intensive ABA and for those who will require gene therapy to do so. No person with autism will be left behind.

Establishing that a mutation cannot be corrected with IBI requires that school-aged persons with autism have IBI or intensive ABA as a matter of right for 3 years, followed by 2 to 3 years of part-time transitional ABA. This decision falls within the jurisdiction of the Supreme Court, as it is the basis of the Petition and the U.S. Courts of Appeals are scattered in their treatment of it (see Petition for Writ of Certiorari 20-619). This Court can imagine a situation where children receive Autism Gene Therapy in the future, but in some instances the child has access to IBI or intensive ABA and in other instances not. Further, there is wide programming variation across IEPs written to support unproven eclectic intervention programs. This creates an efficacy nightmare, because one may not be able to determine the cause of an improvement in IQ and VABS. Was it the child’s intervention program, or was it the gene therapy, that brought about the improvement to typical levels?

Similarly, there is a challenge associated with the placement being a source that negatively impacts the potential gains associated with a gene therapy program. Did the environment itself, e.g. a highly restrictive placement that includes settings that do not provide access to model typically developing peers, bring about the less-than-desired outcome from a gene therapy program?
An environment lacking model peers for even part of the day can equally be the source that negatively impacts the IQ and VABS outcome. If one typically developing child were placed alone in an educational placement that included only peers with autism, would that child be expected to develop normally? The answer is almost certainly no! So then, if a child with autism in a self-contained placement receives autism gene therapy, how is that child expected to recover? This matter can be decided by the Supreme Court because the very foundations of IBI or intensive ABA support an intervention model that takes measures to avoid detrimental self-contained placements.

It has been well established that the placement itself can make an otherwise perfectly designed intervention program ineffective. Lovaas noted that a program in which all intervention was provided in a self-contained environment led to results that did not allow persons with autism to recover, while the identical program in environments that did not include others with autism led to 47% achieving typically developing levels, which were sustained by all but one participant. (McEachin, J.J., Smith, T., Lovaas, O.I., 1993, AJMR, 97:359-372)

Not hearing this case will potentially stifle an Autism Gene Therapy program that could correct the autism for those persons who cannot sufficiently benefit from IBI to reach typical levels, a number that is about 53% of persons with autism. It has been 34 years since Lovaas reported on IBI (Lovaas, 1987). No replication or effort to improve upon the results has been able to improve the outcome. Across all studies ever reported, no program has achieved results that exceed the Lovaas Program or its Replications (Sallows, 2005; Cohen, 2006; Howard, 2014).
As further support, in the Wisconsin Early Autism Program, Sallows and Graupner (Sallows, 2005) noted two types of participants in their Lovaas Program Replication: Rapid Learners and Moderate Learners. Interestingly, they cannot reliably [4] be distinguished from each other at program onset, as their starting points are similar. See Table 3 (p. 426, Sallows, 2005) (Pet. Reh. App. 31). Rapid Learners had an average intake IQ of 55.3 and VABS of 61.73. Moderate Learners had an average intake IQ of 47.8 and VABS of 58.7. However, at follow-up the Rapid Learners achieved a mean IQ of 103.73 and a mean VABS of 88.6, while Moderate Learners achieved a mean IQ of 50.4 and a mean VABS of 49.1 [5]. From these results it becomes readily apparent who would be potential candidates for autism gene therapy following 3 years of IBI or intensive ABA. Further, it has also been shown that eclectic intervention programs, or special education as usual, do not allow one to distinguish Rapid Learners from Moderate Learners except for the top 10% to 20% of Rapid Learners. See Table 3 (p. 7, Lovaas, 1987) (Pet. Reh. App. 5). Thus, the 80% to 90% of Rapid Learners who would not need autism gene therapy to recover from autism cannot be identified from special education as usual.

[4] Stronger social engagement skills at program onset were correlated with better outcomes.

We explained to this Court that, in a country that spends $250 to $300 billion a year on autism, finding that an IEP that does not specify intensive ABA methodology cannot be reasonably calculated to confer educational benefit for persons with autism would result in savings of $100 billion annually in the long term. Autism gene therapy will likely be effective on both adults and children. But adults who receive autism gene therapy will have an entirely new challenge, closing the developmental gap, that is more

[5] It should be noted that there is a broad spectrum among the moderate learners.
Many moderate learners saw gains in IQ and VABS, but not to typical levels.

difficult to close with increasing age, and finding a way to fit into society. This may be easily surmounted by those in financially prominent families, while those in families with quite limited financial resources will find this challenge to be significant. There is also the complicated question of the underlying psychology after recovering from autism in adulthood. Thus, it would obviously be preferable for autism gene therapy to be completed in childhood.

III. PRIME EDITING AND BASE EDITING TECHNOLOGY PLATFORMS HAVE BEEN DEVELOPED MAKING AUTISM GENE THERAPY POSSIBLE AND THERE IS AN UNRELATED FDA APPROVED GENE THERAPY TARGETING THE BRAIN

In 2019 the FDA approved Novartis’s gene therapy Zolgensma. https://www.zolgensma.com/what-is-zolgensma. Zolgensma works by delivering to the brain episomal DNA (DNA that does not integrate into the host genomic DNA) that makes a new copy of a gene known as human Survival Motor Neuron 1 (SMN1). Persons with spinal muscular atrophy typically have two nonfunctioning copies of the SMN1 gene, a pattern referred to as autosomal recessive.

Most cases of autism are due to haploinsufficiency. Because thought is more fine-tuned than almost any other function in the body, one can imagine that the amount of protein that needs to be expressed depends on a number of factors, and that our molecular machinery must be sufficiently sensitive to detect when more protein is required. There are a host of other elements in the genome, not part of the gene itself, that can be activated to express more of the gene in the cell when more is required or to stop expression when there is a sufficient surplus.
Thus, gene therapy platforms based on CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR associated Protein 9), such as Prime Editing and Base Editing, may be necessary, since these technologies correct mutations in the genomic DNA.

IV. THE INCIDENCE OF AUTISM IS INCREASING, AND NOT FINDING THAT INTENSIVE ABA METHODOLOGY IS AN EDUCATIONAL RIGHT OF PERSONS WITH AUTISM WILL PUT THIS COUNTRY ON A COURSE TO PERMANENT OR LONG-TERM RECESSION.
Many wonder whether the increase in the incidence of autism (now 1 in 54 births) is real. https://www.cdc.gov/ncbddd/autism/data.html If it is real, then it will only get progressively worse. Why might this be? There are a tremendous number of genes associated with the functioning of the nervous system. A benign mutation in one protein that functions in the nervous system may not be noticeable. However, multiple benign mutations may be noticeable. The increase is manifested through the many mutations that arise per generation in genes associated with brain function. Because the mutations associated with autism are mostly random and the prevalence is increasing, the question arises whether the incidence of autism could suddenly take off in the next two generations. This may seem like a long time. However, because there is such an extensive number of autism-causing mutations, the investigation and correction of every pathological autism-causing mutation could take 30 years. So, if this matter is delayed much longer, the consequences for mankind will be catastrophic.
Not hearing this case will send this country on a course into certain bankruptcy that will begin within 20 years, because once the incidence becomes too far out of control the financial impact will be devastating. Gene therapy may not be able to bridge the gap quickly enough to avoid economic devastation unless the framework is in place soon.
Once things fall outside of a specific parameter, chaos results. The butterfly-effect threshold is probably an incidence of autism equal to about half the rate of unemployment that has economic consequences such as recession, which is about 10% unemployment. Thus, an incidence of autism of 1 in 20 births, or 1 in 10 families, will result in a recession-like situation. That is because a person with autism generally has one of the parents as the case manager, https://www.autismspeaks.org/autism-statistics, which makes it very difficult for that parent to hold a full-time job while meeting the needs of their son or daughter with autism. Consider the following: if the reported unemployment rate is 2.5% (which reflects good economic conditions) and the incidence of autism is 5%, so that 1 in 20 parents cannot maintain competitive or full-time employment, and if an intervention provider must support each person with autism at an effort equivalent to 50% of full time, then one can imagine a 5% incidence of autism producing 5% unemployment among parents, on top of the 2.5% unemployment rate and a further 2.5% of employment devoted to supporting persons with autism. That is effectively equivalent to 10% unemployment. We are likely less than one generation away from an autism incidence of 1 in 20 births.
It is also important to point out that if the economics of being a BCBA provider do not sufficiently improve, or if baseless policies are put in place that reduce their numbers, the manpower may not be in place in the future to provide IBI. We now have what functions as an ideal situation: sufficient manpower from BCBA providers and unprecedented advances in Prime Editing and Base Editing gene therapy technologies.
For this reason, the Supreme Court must act now to hear this matter, which falls within the jurisdiction of the Court, so that all persons with autism, their siblings, and their parents can benefit.
In conclusion, this Petition for Rehearing and Writ of Certiorari should be granted.
Respectfully submitted on February 5, 2021.

R.S. Pro Se on behalf of A.S.
E.S. Pro Se on behalf of A.S.

CERTIFICATION PETITION IS PRESENTED IN GOOD FAITH AND NOT FOR DELAY

As required by Supreme Court Rule 44.1, I certify that the PETITION FOR REHEARING in the above Case No. 20-619 is presented in good faith and not for delay.
Respectfully submitted on February 5, 2021

By: R.S. Pro Se on behalf of A.S.
By: E.S. Pro Se on behalf of A.S.

CHRISTINE MALEO
Notary Public - State of New York
No. 01AL6307204
Qualified in Albany County
My Commission Exp. 06/30/2022

CERTIFICATE OF COMPLIANCE

As required by Supreme Court Rule 44.2, I certify that the PETITION FOR REHEARING in the above Case No. 20-619 is filed on Substantial Grounds Not Previously Presented.
Respectfully submitted on February 5, 2021

By: R.S. Pro Se on behalf of A.S.
By: E.S. Pro Se on behalf of A.S.

CHRISTINE MALEO
Notary Public - State of New York
No. 01AL6307204
Qualified in Albany County
My Commission Exp. 06/30/2022

TABLE OF CONTENTS FOR APPENDIX TO PETITION FOR REHEARING

Appendix 1, Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55 (1), 3-9.
App 1

Appendix 2, McEachin, J.J., Smith, T., Lovaas, O.I. (1993). Long-term outcome for children with autism who received early intensive behavioral treatment. AJMR, 97 (4), 359-372.
App 8

Appendix 3, Sallows, G. O., & Graupner, T. D. (2005).
Intensive behavioral treatment for children with autism: Four-year outcome and predictors. AJMR, 110, 417-438.
App 22

Appendix 4, Cohen, H., Amerine-Dickens, M., & Smith, T. (2006). Early intensive behavioral treatment: Replication of the UCLA model in a community setting. Developmental and Behavioral Pediatrics, 27, S145-S155.
App 44

Appendix 5, Howard, J. S., Stanislaw, H., Green, G., Sparkman, C. R., & Cohen, H. G. (2014). Comparison of behavior analytic and eclectic early interventions for young children with autism after three years. Research in Developmental Disabilities, 35 (12), 3326-3344.
App 55

Appendix 6, Thusberg, J., Olatubosun, A. & Vihinen, M. Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat. 2011 32 (4):358-68.
App 74

Appendix 7, Gerasimavicius, L., Liu, X. & Marsh, J.A. Identification of pathogenic missense mutations using protein stability predictors. Sci Rep 10, 15387 (2020).
App 85

Pet. Reh. App. 1

Journal of Consulting and Clinical Psychology
1987, Vol. 55, No. 1, 3-9
Copyright 1987 by the American Psychological Association, Inc. 0022-006X/87/$00.75

Behavioral Treatment and Normal Educational and Intellectual Functioning in Young Autistic Children
O. Ivar Lovaas
University of California, Los Angeles

Autism is a serious psychological disorder with onset in early childhood. Autistic children show minimal emotional attachment, absent or abnormal speech, retarded IQ, ritualistic behaviors, aggression, and self-injury. The prognosis is very poor, and medical therapies have not proven effective. This article reports the results of behavior modification treatment for two groups of similarly constituted, young autistic children.
Follow-up data from an intensive, long-term experimental treatment group (n = 19) showed that 47% achieved normal intellectual and educational functioning, with normal-range IQ scores and successful first grade performance in public schools. Another 40% were mildly retarded and assigned to special classes for the language delayed, and only 10% were profoundly retarded and assigned to classes for the autistic/retarded. In contrast, only 2% of the control group children (n = 40) achieved normal educational and intellectual functioning; 45% were mildly retarded and placed in language-delayed classes, and 53% were severely retarded and placed in autistic/retarded classes.

Kanner (1943) defined autistic children as children who exhibit (a) serious failure to develop relationships with other people before 30 months of age, (b) problems in development of normal language, (c) ritualistic and obsessional behaviors (“insistence on sameness”), and (d) potential for normal intelligence. A more complete behavioral definition has been provided elsewhere (Lovaas, Koegel, Simmons, & Long, 1973). The etiology of autism is not known, and the outcome is very poor. In a follow-up study on young autistic children, Rutter (1970) reported that only 1.5% of his group (n = 63) had achieved normal functioning. About 35% showed fair or good adjustment, usually required some degree of supervision, experienced some difficulties with people, had no personal friends, and showed minor oddities of behavior. The majority (more than 60%) remained severely handicapped and were living in hospitals for mentally retarded or psychotic individuals or in other protective settings. Initial IQ scores appeared stable over time.
Other studies (Brown, 1969; DeMyer et al., 1973; Eisenberg, 1956; Freeman, Ritvo, Needleman, & Yokota, 1985; Havelkova, 1968) report similar data. Higher scores on IQ tests, communicative speech, and appropriate play are considered to be prognostic of better outcome (Lotter, 1967).
Medically and psychodynamically oriented therapies have not proven effective in altering outcome (DeMyer, Hingtgen, & Jackson, 1981). No abnormal environmental etiology has been identified within the children's families (Lotter, 1967). At present, the most promising treatment for autistic persons is behavior modification as derived from modern learning theory (DeMyer et al., 1981). Empirical results from behavioral intervention with autistic children have been both positive and negative. On the positive side, behavioral treatment can build complex behaviors, such as language, and can help to suppress pathological behaviors, such as aggression and self-stimulatory behavior. Clients vary widely in the amount of gains obtained but show treatment gains in proportion to the time devoted to treatment. On the negative side, treatment gains have been specific to the particular environment in which the client was treated, substantial relapse has been observed at follow-up, and no client has been reported as recovered (Lovaas et al., 1973).
The present article reports a behavioral-intervention project (begun in 1970) that sought to maximize behavioral treatment gains by treating autistic children during most of their waking hours for many years. Treatment included all significant persons in all significant environments.
Furthermore, the project focused on very young autistic children (below the age of 4 years) because it was assumed that younger children would be less likely to discriminate between environments and therefore more likely to generalize and to maintain their treatment gains. Finally, it was assumed that it would be easier to successfully mainstream a very young autistic child into preschool than it would be to mainstream an older autistic child into primary school.

This study was supported by Grant MH-11440 from the National Institute of Mental Health. Aspects of this study were presented at the 1982 convention of the American Psychological Association, Washington, DC, by Andrea Ackerman, Paula Firestone, Gayle Goldstein, Ronald Leaf, John McEachin, and the author. The author expresses his deep appreciation to the many undergraduate students at the University of California, Los Angeles, who served as student therapists on the project, to the many graduate students who served as clinic supervisors, and to the many parents who trusted their children to our care. Special thanks to Laura Schreibman and Robert Koegel, who collaborated in the early stages of this research project. Donald Baer, Bruce Baker, Bradley Bucher, Arthur Woodward, and Haikang Shen provided statistical advice and help in manuscript preparation. B. J. Freeman's help in arranging access to Control Group 2 data is also appreciated.
Correspondence concerning this article should be addressed to O. Ivar Lovaas, Psychology Department, University of California, 405 Hilgard Avenue, Los Angeles, California 90024.

Pet. Reh. App. 2

It may be helpful to hypothesize an outcome of the present study from a developmental or learning point of view. One may assume that normal children learn from their everyday environments most of their waking hours.
Autistic children, conversely, do not learn from similar environments. We hypothesized that construction of a special, intense, and comprehensive learning environment for very young autistic children would allow some of them to catch up with their normal peers by first grade.

Method

Subjects
Subjects were enrolled for treatment if they met three criteria: (a) independent diagnosis of autism from a medical doctor or a licensed PhD psychologist, (b) chronological age (CA) less than 40 months if mute and less than 46 months if echolalic, and (c) prorated mental age (PMA) of 11 months or more at a CA of 30 months. The last criterion excluded 15% of the referrals.
The clinical diagnosis of autism emphasized emotional detachment, extreme interpersonal isolation, little if any toy or peer play, language disturbance (mutism or echolalia), excessive rituals, and onset in infancy. The diagnosis was based on a structured psychiatric interview with parents, on observations of the child's free-play behaviors, on psychological testing of intelligence, and on access to pediatric examinations. Over the 15 years of the project, the exact wording of the diagnosis changed slightly in compliance with changes in the Diagnostic and Statistical Manual of Mental Disorders (DSM-III; American Psychiatric Association, 1980). During the last years, the diagnosis was made in compliance with DSM-III criteria (p. 87). In almost all cases, the diagnosis of autism had been made prior to family contact with the project. Except for one case each in the experimental group and Control Group 1, all cases were diagnosed by staff of the Department of Child Psychiatry, University of California, Los Angeles (UCLA) School of Medicine. Members of that staff have contributed to the writing of the DSM-III and to the diagnosis of autism adopted by the National Society for Children and Adults with Autism.
If the diagnosis of autism was not made, the case was referred elsewhere. In other words, the project did not select its cases. More than 90% of the subjects received two or more independent diagnoses, and agreement on the diagnosis of autism was 100%. Similarly high agreement was not reached for subjects who scored within the profoundly retarded range on intellectual functioning (PMA < 11 months); these subjects were excluded from the study.

Treatment Conditions
Subjects were assigned to one of two groups: an intensive-treatment experimental group (n = 19) that received more than 40 hours of one-to-one treatment per week, or the minimal-treatment Control Group 1 (n = 19) that received 10 hours or less of one-to-one treatment per week. Control Group 1 was used to gain further information about the rate of spontaneous improvement in very young autistic children, especially those selected by the same agency that provided the diagnostic work-up for the intensive-treatment experimental group. Both treatment groups received treatment for 2 or more years. Strict random assignment (e.g., based on a coin flip) to these groups could not be used due to parent protest and ethical considerations. Instead, subjects were assigned to the experimental group unless there was an insufficient number of staff members available to render treatment (an assessment made prior to contact with the family). Two subjects were assigned to Control Group 1 because they lived further away from UCLA than a 1-hr drive, which made sufficient staffing unavailable to those clients. Because fluctuations in staff availability were not associated in any way with client characteristics, it was assumed that this assignment would produce unbiased groups. A large number of pretreatment measures were collected to test this assumption. Subjects did not change group assignment.
Except for two families who left the experimental group within the first 6 months (this group began with 21 subjects), all families stayed with their groups from beginning to end.

Assessments
Pretreatment mental age (MA) scores were based on the following scales (in order of the frequency of their use): the Bayley Scales of Infant Development (Bayley, 1955), the Cattell Infant Intelligence Scale (Cattell, 1960), the Stanford-Binet Intelligence Scale (Thorndike, 1972), and the Gesell Infant Development Scale (Gesell, 1949). The first three scales were administered to 90% of the subjects, and relative usage of these scales was similar in each group. Testing was carried out by graduate students in psychology who worked under the supervision of clinical psychologists at UCLA or licensed PhD psychologists at other agencies. The examiner chose the test that would best accommodate each subject’s developmental level, and this decision was reached independently of the project staff. Five subjects were judged to be untestable (3 in the experimental group and 2 in Control Group 1). Instead, the Vineland Social Maturity Scale (Doll, 1953) was used to estimate their MAs (with the mother as informant). To adjust for variations in MA scores as a function of the subject’s CA at the time of test administration, PMA scores were calculated for a CA at 30 months (MA/CA × 30).
Behavioral observations were based on videotaped recordings of the subject's free-play behavior in a playroom equipped with several simple early-childhood toys.
These videotaped recordings were subsequently scored for amount of (a) self-stimulatory behaviors, defined as prolonged ritualistic, repetitive, and stereotyped behavior such as body-rocking, prolonged gazing at lights, excessive hand-flapping, twirling the body as a top, spinning or lining of objects, and licking or smelling of objects or wall surfaces; (b) appropriate play behaviors, defined as those limiting the use of toys in the playroom to their intended purposes, such as pushing the truck on the floor, pushing buttons on the toy cash register, putting a record on the record player, and banging with the toy hammer; and (c) recognizable words, defined to include any recognizable word, independent of whether the subject used it in a meaningful context or for communicative purposes. One observer who was naive about subjects' group placement scored all tapes after being trained to agree with two experienced observers (using different training tapes from similar subjects). Interobserver reliability was scored on 20% of the tapes (randomly selected) and was computed for each category of behavior for each subject by dividing the sum of observer agreements by the sum of agreements and disagreements. These scores were then summed and averaged across subjects. The mean agreement (based both on occurrences and nonoccurrences) was 91% for self-stimulatory behavior, 85% for appropriate play behavior, and 100% for recognizable words. A more detailed description of these behavioral recordings has been provided elsewhere (Lovaas et al., 1973).
A 1-hr parent interview about the subjects’ earlier history provided some diagnostic and descriptive information.
Subjects received a score of 1 for each of the following variables parents reported: no recognizable words; no toy play (failed to use toys for their intended function); lack of emotional attachment (failed to respond to parents' affection); apparent sensory deficit (parents had suspected their child to be blind or deaf because the child exhibited no or minimal eye contact and showed an unusually high pain threshold); no peer play (subject did not show interactive play with peers); self-stimulatory behavior; tantrums (aggression toward family members or self); and no toilet training. These 8 measures from parents' intake interviews were summed to provide a sum pathology score. The intake interview also provided information about abnormal speech (0 = normal and meaningful language, however limited; 1 = echolalic language used meaningfully [e.g., to express needs]; 2 = echolalia; and 3 = mute); age of walking; number of siblings in the family; socioeconomic status of the father; sex; and neurological examinations (including EEGs and CAT scans) that resulted in findings of pathology. Finally, CA at first diagnosis and at the beginning of the present treatment were recorded. This yielded a total of 20 pretreatment measures, 8 of which were collapsed into 1 measure (sum pathology).

Pet. Reh. App. 3

A brief clinical description of the experimental group at intake follows (identical to that for Control Group 1): Only 2 of the 19 subjects obtained scores within the normal range of intellectual functioning; 7 scored in the moderately retarded range, and 10 scored in the severely retarded range. No subject evidenced pretend or imaginary play, only 2 evidenced complex (several different or heterogeneous behaviors that together formed one activity) play, and the remaining subjects showed simple (the same elementary but appropriate response made repeatedly) play.
One subject showed minimal appropriate speech, 7 were echolalic, and 11 were mute. According to the literature that describes the developmental delays of autistic children in general, the autistic subjects in the present study constituted an average (or below average) sample of such children.
Posttreatment measures were recorded as follows: Between the ages of 6 and 7 years (when a subject would ordinarily have completed first grade), information about the subjects’ first-grade placement was sought and validated; about the same time, an IQ score was obtained. Testing was carried out by examiners who were naive about the subjects’ group placement. Different scales were administered to accommodate different developmental levels. For example, a subject with a regular educational placement received a Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler, 1974) or a Stanford-Binet Intelligence Scale (Thorndike, 1972), whereas a subject in an autistic/retarded class received a nonverbal test like the Merrill-Palmer Pre-School Performance Test (Stutsman, 1948). In all instances of subjects having achieved a normal IQ score, the testing was eventually replicated by other examiners. The scales (in order of the frequency of usage) included the WISC-R (Wechsler, 1974), the Stanford-Binet (Thorndike, 1972), the Peabody Picture Vocabulary Test (Dunn, 1981), the Wechsler Pre-School Scale (Wechsler, 1987), the Bayley Scales of Infant Development (Bayley, 1955), the Cattell Infant Intelligence Scale (Cattell, 1960), and the Leiter International Performance Scale (Leiter, 1959). Subjects received a score of 3 for normal functioning if they received a score on the WISC-R or Stanford-Binet in the normal range, completed first grade in a normal class in a school for normal children, and were advanced to the second grade by the teacher.
Subjects received a score of 2 if they were placed in first grade in a smaller aphasia (language delayed, language handicapped, or learning disabled) class. Placement in the aphasia class implied a higher level of functioning than placement in classes for the autistic/retarded, but the diagnosis of autism was almost always retained. A score of 1 was given if the first-grade placement was in a class for the autistic/retarded and if the child’s IQ score fell within the severely retarded range.

Treatment Procedure
Each subject in the experimental group was assigned several well-trained student therapists who worked (part-time) with the subject in the subject’s home, school, and community for an average of 40 hr per week for 2 or more years. The parents worked as part of the treatment team throughout the intervention; they were extensively trained in the treatment procedures so that treatment could take place for almost all of the subjects’ waking hours, 365 days a year. A detailed presentation of the treatment procedure has been presented in a teaching manual (Lovaas et al., 1980). The conceptual basis of the treatment was reinforcement (operant) theory; treatment relied heavily on discrimination-learning data and methods. Various behavioral deficiencies were targeted, and separate programs were designed to accelerate development for each behavior. High rates of aggressive and self-stimulatory behaviors were reduced by being ignored; by the use of time-out; by the shaping of alternate, more socially acceptable forms of behavior; and (as a last resort) by the delivery of a loud “no” or a slap on the thigh contingent upon the presence of the undesirable behavior.
Contingent physical aversives were not used in the control group because inadequate staffing in that group did not allow for adequate teaching of alternate, socially appropriate behaviors.
During the first year, treatment goals consisted of reducing self-stimulatory and aggressive behaviors, building compliance to elementary verbal requests, teaching imitation, establishing the beginnings of appropriate toy play, and promoting the extension of the treatment into the family. The second year of treatment emphasized teaching expressive and early abstract language and interactive play with peers. Treatment was also extended into the community to teach children to function within a preschool group. The third year emphasized the teaching of appropriate and varied expression of emotions; preacademic tasks like reading, writing, and arithmetic; and observational learning (learning by observing other children learn). Subjects were enrolled only in those preschools where the teacher helped to carry out the treatment program. Considerable effort was exercised to mainstream subjects in a normal (average and public) preschool placement and to avoid initial placement in special education classes with the detrimental effects of exposure to other autistic children. This occasionally entailed withholding the subject’s diagnosis of autism. If the child became known as autistic (or as “a very difficult child”) during the first year in preschool, the child was encouraged to enroll in another, unfamiliar school (to start fresh). After preschool, placement in public education classes was determined by school personnel. All children who successfully completed normal kindergarten successfully completed first grade and subsequent normal grades.
Children who were observed to be experiencing educational and psychological problems received their school placement through Individualized Educational Plan (IEP) staffings (attended by educators and psychologists) in accordance with the Education For All Handicapped Children Act of 1975.
All subjects who went on to a normal first grade were reduced in treatment from the 40 hr per week characteristic of the first 2 years to 10 hr or less per week during kindergarten. After a subject had started first grade, the project maintained a minimal (at most) consultant relationship with some families. In two cases, this consultation and the subsequent correction of problem behaviors were judged to be essential in maintaining treatment gains. Subjects who did not recover in the experimental group received 40 hr or more per week of one-to-one treatment for more than 6 years (more than 14,000 hr of one-to-one treatment), with some improvement shown each year but with only 1 subject recovering.
Subjects in Control Group 1 received the same kind of treatment as those in the experimental group but with less intensity (less than 10 hr of one-to-one treatment per week) and without systematic physical aversives. In addition, these subjects received a variety of treatments from other sources in the community such as those provided by small special education classes.
Control Group 2 consisted of 21 subjects selected from a larger group (N = 62) of young autistic children studied by Freeman et al. (1985). These subjects came from the same agency that diagnosed 95% of our other subjects. Data from Control Group 2 helped to guard against the possibility that subjects who had been referred to us for treatment constituted a subgroup with particularly favorable or unfavorable outcomes.
To provide a group of subjects similar to those in the experimental group and Control Group 1, subjects for Control Group 2 were selected if they were 42 months old or younger when first tested, had IQ scores above 40 at intake, and had follow-up testing at 6 years of age. These criteria resulted in the selection of 21 subjects. Subjects in Control Group 2 were treated like Control Group 1 subjects but were not treated by the Young Autism Project described here.

Results

Pretreatment Comparisons

Pet. Reh. App. 4

Table 1
Means and F Ratios From Comparisons Between Groups on Intake Variables

Group         Diagnosis CA  Treatment CA  PMA    Recognizable words  Toy play  Self-stimulation  Sum pathology  Abnormal speech
Experimental  32.0          34.6          18.8   .42                 28.2      12.1              6.9            2.4
Control 1     35.3          40.9          17.1   .58                 20.2      19.6              6.4            2.2
F (a)         1.58          4.02*         1.49   .92                 2.76      3.37              .82            .36

Note. CA = chronological age; PMA = prorated mental age. Experimental group, n = 19; Control Group 1, n = 19.
(a) df = 1, 36.
* p < .05.

Eight pretreatment variables from the experimental group and Control Group 1 (CA at first diagnosis, CA at onset of treatment, PMA, sum pathology, abnormal speech, self-stimulatory behavior, appropriate toy play, and recognizable words) were subjected to a multivariate analysis of variance (MANOVA; Brecht & Woodward, 1984). The means and F ratios from this analysis are presented in Table 1. As can be seen, there were no significant differences between the groups except for CA at onset of our treatment (p < .05). Control subjects were 6 months older on the average than experimental subjects (mean CAs of 35 months vs. 41 months, respectively).
These differences probably reflect the delay of control subjects in their initiation into the treatment project because of staff shortages; analysis will show that differential CAs are not significantly related to outcome. To ascertain whether another test would reveal a statistically significant difference between the groups on toy play, descriptions of the subjects’ toy play (taken from the videotaped recordings) were typed on cards and rated for their developmental level by psychology students who were naive about the purpose of the ratings and subject group assignment. The ratings were reliable among students (r = .79, p < .001), and an F test showed no significant difference in developmental levels of toy play between the two groups.

The respective means from the experimental group and Control Group 1 on the eight variables from the parent interview were .89 and .74 for sensory deficit, .63 and .42 for adult rejection, .58 and .47 for no recognizable words, .53 and .63 for no toy play, 1.0 and 1.0 for no peer play, .95 and .89 for body self-stimulation, .89 and .79 for tantrums, and .68 and .63 for no toilet training. The experimental group and Control Group 1 were also similar in onset of walking (6 vs. 8 early walkers; 1 vs. 2 late walkers), number of siblings in the family (1.26 in each group), socioeconomic status of the father (Level 49 vs. Level 54 according to 1950 Bureau of the Census standards), boys to girls (16:3 vs. 11:8); and number of subjects referred for neurological examinations (10 vs. 15) who showed signs of damage (0 vs. 1). The numbers of favorable versus unfavorable prognostic signs (directions of differences) on the pretreatment variables divide themselves equally between the groups. In short, the two groups appear to have been comparable at intake.

Follow-Up Data

Subjects’ PMA at intake, follow-up educational placement, and IQ scores were subjected to a MANOVA that contrasted the experimental group with Control Groups 1 and 2. At intake, there were no significant differences between the experimental group and the control groups. At follow-up, the experimental group was significantly higher than the control groups on educational placement (p < .001) and IQ (p < .01). The two control groups did not differ significantly at intake or at follow-up. In short, data from Control Group 2 replicate those from Control Group 1 and further validate the effectiveness of our experimental treatment program. Data are given in Table 2 that show the group means from pretreatment PMA and posttreatment educational placement and IQ scores. The table also shows the F ratios and significance levels of the three group comparisons.

Table 2
Means and F Ratios for Measures at Pretreatment and Posttreatment

Group                       Intake PMA  Follow-up EDP  Follow-up IQ
Means
  Experimental              18.8        2.37           83.3
  Control 1                 17.1        1.42           52.2
  Control 2                 17.6        1.57           57.5
F ratios^a
  Experimental x Control 1  1.47        23.6**         14.4**
  Experimental x Control 2  0.77        17.6**         10.4*
  Control 1 x Control 2     0.14        0.63           0.45

Note. PMA = prorated mental age; EDP = educational placement. Experimental group, n = 19; Control Group 1, n = 19; Control Group 2, n = 21.
^a df = 1, 56.
* p < .01. ** p < .001.

In descriptive terms, the 19-subject experimental group shows 9 children (47%) who successfully passed through normal first grade in a public school and obtained an average or above average score on IQ tests (M = 107, range = 94-120). Eight subjects (42%) passed first grade in aphasia classes and obtained a mean IQ score within the mildly retarded range of intellectual functioning (M = 70, range = 56-95). Only two children (10%) were placed in classes for autistic/retarded children and scored in the profoundly retarded range (IQ < 30).

Pet. Reh. App.5

Table 3
Educational Placement and Mean and Range of IQ at Follow-Up

Group                Recovered  Aphasic  Autistic/Retarded
Experimental
  N                  9          8        2
  M IQ               107        70       30
  Range              94-120     56-95    --^a
Control Group 1
  N                  0          8        11
  M IQ               --         74       36
  Range              --         30-102   20-73
Control Group 2
  N                  1          10       10
  M IQ               99         67       44
  Range              --         49-81    35-54

Note. Dashes indicate no score or no entry.
^a Both children received the same score.

There were substantial increases in the subjects’ levels of intellectual functioning after treatment. The experimental group subjects gained on the average of 30 IQ points over Control Group 1 subjects. Thus the number of subjects who scored within the normal range of intellectual functioning increased from 2 to 12, whereas the number of subjects within the moderate-to-severe range of intellectual retardation dropped from 10 to 3. As of 1986, the achievements of experimental group subjects have remained stable.
Only 2 subjects have been reclassified: 1 subject (now 18 years old) was moved from an aphasia to a normal classroom after the sixth grade; 1 subject (now 13 years old) was moved from an aphasia to an autistic/retarded class placement.

The MA and IQ scores of the two control groups remained virtually unchanged between intake and follow-up, consistent with findings from other studies (Freeman et al., 1985; Rutter, 1970). The stability of the IQ scores of the young autistic children, as reported in the Freeman et al. study, is particularly relevant for the present study because it reduces the possibility of spontaneous recovery effects. In descriptive terms, the combined follow-up data from the control groups show that their subjects fared poorly: Only 1 subject (2%) achieved normal functioning as evidenced by normal first-grade placement and an IQ of 99 on the WISC-R; 18 subjects (45%) were in aphasia classes (mean IQ = 70, range = 30-101); and 21 subjects (53%) were in classes for the autistic/retarded (mean IQ = 40, range = 20-73). Table 3 provides a convenient descriptive summary of the main follow-up data from the three groups.

One final control procedure subjected 4 subjects in the experimental group (Ackerman, 1980) and 4 subjects in Control Group 1 (McEachin & Leaf, 1984) to a treatment intervention in which one component of treatment (the loud “no” and occasional slap on the thigh contingent on self-stimulatory, aggressive, and noncompliant behavior) was at first withheld and then introduced experimentally. A within-subjects replication design was used across subjects, situations, and behaviors, with baseline observations varying from 3 weeks to 2 years after treatment had started (using contingent positive reinforcement only).
During baseline, when the contingent-aversive component was absent, small and unstable reductions were observed in the large amount of inappropriate behaviors, and similar small and unstable increases were observed in appropriate behaviors such as play and language. These changes were insufficient to allow for the subjects’ successful mainstreaming. Introduction of contingent aversives resulted in a sudden and stable reduction in the inappropriate behaviors and a sudden and stable increase in appropriate behaviors. This experimental intervention helps to establish two points: First, at least one component in the treatment program functioned to produce change, which helps to reduce the effect of placebo variables. Second, this treatment component affected both the experimental and control groups in a similar manner, supporting the assumption that the two groups contained similar subjects.

Analyses of variance were carried out on the eight pretreatment variables to determine which variables, if any, were significantly related to outcome (gauged by educational placement and IQ) in the experimental group and Control Group 1. Prorated mental age was significantly (p < .03) related to outcome in both groups, a finding that is consistent with reports from other investigators (DeMyer et al., 1981). In addition, abnormal speech was significantly (p < .01) related to outcome in Control Group 1. Chronological age at onset of our treatment was not related to outcome, which is important because the two groups differed significantly on this variable at intake (by 6 months). The failure of CA to relate to outcome may be based on the very young age of all subjects at onset of treatment.

Conceivably, a linear combination of pretreatment variables could have predicted outcome in the experimental group.
Using a discriminant analysis (Ray, 1982) with the eight variables used in the first multivariate analysis, it was possible to predict perfectly the 9 subjects who did achieve normal functioning, and no subject was predicted to achieve this outcome who did not. In this analysis, PMA was the only variable that was significantly related to outcome. Finally, when this prediction equation was applied to Control Group 1 subjects, 8 were predicted to achieve normal functioning with intensive treatment; this further verifies the similarity between the experimental group and Control Group 1 prior to treatment.

Discussion

This article reports the results of intensive behavioral treatment for young autistic children. Pretreatment measures revealed no significant differences between the intensively treated experimental group and the minimally treated control groups. At follow-up, experimental group subjects did significantly better than control group subjects. For example, 47% of the experimental group achieved normal intellectual and educational functioning in contrast to only 2% of the control group subjects. The study incorporated certain methodological features designed to increase confidence in the effectiveness of the experimental group treatment:

1. Pretreatment differences between the experimental and control groups were minimized in four ways. First, the assignment of subjects to groups was as random as was ethically possible. The assignment apparently produced unbiased groups as evidenced by similar scores on the 20 pretreatment measures and by the prediction that an equal number of Control Group 1 and experimental group subjects would have achieved normal functioning had the former subjects received intensive treatment.
Second, the experimental group was not biased by receiving subjects with a favorable diagnosis or biased IQ testing because both diagnosis and IQ tests were constant across groups. Third, the referral process did not favor the project cases because there were no significant differences between Control Groups 1 and 2 at intake or follow-up, even though Control Group 2 subjects were referred to others by the same agency. Fourth, subjects stayed within their groups, which preserved the original (unbiased) group assignment.

Pet. Reh. App.6

2. A favorable outcome could have been caused not by the experimental treatment but by the attitudes and expectations of the staff. There are two findings that contradict this possibility of treatment agency (placebo) effects. First, because Control Group 2 subjects had no contact with the project, and because there was no difference between Control Groups 1 and 2 at follow-up, placebo effects appear implausible. Second, the within-subjects study showed that at least one treatment component contributed to the favorable outcome in the intensive treatment (experimental) group.

3. It may be argued that the treatment worked because the subjects were not truly autistic. This is counterindicated by the high reliability of the independent diagnosis and by the outcome data from the control groups, which are consistent with those reported by other investigators (Brown, 1969; DeMyer et al., 1973; Eisenberg, 1956; Freeman et al., 1985; Havelkova, 1968; Rutter, 1970) for groups of young autistic children diagnosed by a variety of other agencies.

4. The spontaneous recovery rate among very young autistic children is unknown, and without a control group the favorable outcome in the experimental group could have been attributed to spontaneous recovery.
However, the poor outcome in the similarly constituted Control Groups 1 and 2 would seem to eliminate spontaneous recovery as a contributing factor to the favorable outcome in the experimental group. The stability of the IQ test scores in the young autistic children examined by Freeman et al. (1985) attests once again to the chronicity of autistic behaviors and serves to further negate the effects of spontaneous recovery.

5. Posttreatment data showed that the effects of treatment (a) were substantial and easily detected, (b) were apparent on comprehensive, objective, and socially meaningful variables (IQ and school placement), and (c) were consistent with a very large body of prior research on the application of learning theory to the treatment and education of developmentally disabled persons and with the very extensive (100-year-old) history of psychology laboratory work on learning processes in man and animals. In short, the favorable outcome reported for the intensive-treatment experimental group can in all likelihood be attributed to treatment.

A number of measurement problems remain to be solved. For example, play, communicative speech, and IQ scores define the characteristics of autistic children and are considered predictors of outcome. Yet the measurement of these variables is no easy task. Consider play. First, play undoubtedly varies with the kinds of toys provided.
Second, it is difficult to distinguish low levels of toy play (simple and repetitive play associated with young, normal children) from high levels of self-stimulatory behavior (a psychotic attribute associated with autistic children). Such problems introduce variability that needs immediate attention before research can proceed in a meaningful manner.

The term normal functioning has been used to describe children who successfully passed normal first grade and achieved an average IQ on the WISC-R. But questions can be asked about whether these children truly recovered from autism. On the one hand, educational placement is a particularly valuable measure of progress because it is sensitive to both educational accomplishments and social-emotional functions. Also, continual promotion from grade to grade is made not by one particular teacher but by several teachers. School personnel describe these children as indistinguishable from their normal friends. On the other hand, certain residual deficits may remain in the normal functioning group that cannot be detected by teachers and parents and can only be isolated on closer psychological assessment, particularly as these children grow older. Answers to such questions will soon be forthcoming in a more comprehensive follow-up (McEachin, 1987).

Several questions about treatment remain. It is unlikely that a therapist or investigator could replicate our treatment program for the experimental group without prior extensive theoretical and supervised practical experience in one-to-one behavioral treatment with developmentally disabled clients as described here and without demonstrated effectiveness in teaching complex behavioral repertoires as in imitative behavior and abstract language.
In the within-subjects studies that were reported, contingent aversives were isolated as one significant variable. It is therefore unlikely that treatment effects could be replicated without this component. Many treatment variables are left unexplored, such as the effect of normal peers. Furthermore, the successful mainstreaming of a 2-4-year-old into a normal preschool group is much easier than the mainstreaming of an older autistic child into the primary grades. This last point underscores the importance of early intervention and places limits on the generalization of our data to older autistic children.

Historically, psychodynamic theory has maintained a strong influence on research and treatment with autistic children, offering some hope for recovery through experiential manipulations. By the mid-1960s, an increasing number of studies reported that psychodynamic practitioners were unable to deliver on that promise (Rimland, 1964). One reaction to those failures was an emphasis on organic theories of autism that offered little or no hope for major improvements through psychological and educational interventions. In a comprehensive review of research on autism, DeMyer et al. (1981) concluded that “[in the past] psychotic children were believed to be potentially capable of normal functioning in virtually all areas of development . . . during the decade of the 1970s it was the rare investigator who even gave lip-service to such previously held notions . . . infantile autism is a type of developmental disorder accompanied by severe and, to a large extent, permanent intellectual/behavioral deficits” (p. 432).

The following points can now be made. First, at least two distinctively different groups emerged from the follow-up data in the experimental group. Perhaps this finding implies different etiologies.
If so, future theories of autism will have to identify these groups of children. Second, on the basis of testing to date, the recovered children show no permanent intellectual or behavioral deficits and their language appears normal, contrary to the position that many have postulated (Rutter, 1974; Churchill, 1978) but consistent with Kanner’s (1943) position that autistic children possess potentially normal or superior intelligence. Third, at intake, all subjects evidenced deficiencies across a wide range of behaviors, and during treatment they showed a broad improvement across all observed behaviors. The kind of (hypothesized) neural damage that mediates a particular kind of behavior, such as language (Rutter, 1974), is not consistent with these data.

Pet. Reh. App.7

Although serious problems remain for exactly defining autism or identifying its etiology, one encouraging conclusion can be stated: Given a group of children who show the kinds of behavioral deficits and excesses evident in our pretreatment measures, such children will continue to manifest similar severe psychological handicaps later in life unless subjected to intensive behavioral treatment that can indeed significantly alter that outcome.

These data promise a major reduction in the emotional hardships of families with autistic children. The treatment procedures described here may also prove equally effective with other childhood disorders, such as childhood schizophrenia. Certain important, practical implications in these findings may also be noted. The treatment schedule of subjects who achieved normal functioning could be reduced from 40 hr per week to infrequent visits even after the first 2 years of treatment.
The assignment of one full-time special-education teacher for 2 years would cost an estimated $40,000, in contrast to the nearly $2 million incurred (in direct costs alone) by each client requiring life-long institutionalization.

References

Ackerman, A. B. (1980). The contribution of punishment to the treatment of preschool aged children. Unpublished doctoral dissertation, University of California, Los Angeles.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Bayley, N. (1955). On the growth of intelligence. American Psychologist, 10, 805-818.
Brecht, M. L., & Woodward, J. A. (1984). GANOVA: A univariate/multivariate analysis of variance program for the personal computer. Educational and Psychological Measurement, 44, 169-173.
Brown, J. (1969). Adolescent development of children with infantile psychosis. Seminars in Psychiatry, 1, 79-89.
Cattell, P. (1960). The measurement of intelligence of infants and young children. New York: Psychological Corporation.
Churchill, D. W. (1978). Language: The problem beyond conditioning. In M. Rutter & E. Schopler (Eds.), Autism: A reappraisal of concepts and treatment (pp. 71-85). New York: Plenum.
DeMyer, M. K., Barton, S., DeMyer, W. E., Norton, J. A., Allen, J., & Steele, R. (1973). Prognosis in autism: A follow-up study. Journal of Autism and Childhood Schizophrenia, 3, 199-246.
DeMyer, M. K., Hingtgen, J. N., & Jackson, R. K. (1981). Infantile autism reviewed: A decade of research. Schizophrenia Bulletin, 7, 388-451.
Doll, E. A. (1953). The measurement of social competence. Minneapolis, MN: Minneapolis Educational Test Bureau.
Dunn, L. M. (1981). Peabody Picture Vocabulary Test. Circle River, MI: American Guidance Service.
Education for All Handicapped Children Act of 1975. Washington, DC: Congressional Record.
Eisenberg, L. (1956). The autistic child in adolescence. American Journal of Psychiatry, 112, 607-612.
Freeman, B. J., Ritvo, E. R., Needleman, R., & Yokota, A. (1985). The stability of cognitive and linguistic parameters in autism: A 5-year study. Journal of the American Academy of Child Psychiatry, 24, 290-311.
Gesell, A. (1949). Gesell Developmental Schedules. New York: Psychological Corporation.
Havelkova, M. (1968). Follow-up study of 71 children diagnosed as psychotic in preschool age. American Journal of Orthopsychiatry, 38, 846-857.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217-250.
Leiter, R. G. (1959). Part I of the manual for the 1948 revision of the Leiter International Performance Scale: Evidence of the reliability and validity of the Leiter tests. Psychology Service Center Journal, 11, 1-72.
Lotter, V. (1967). Epidemiology of autistic conditions in young children: II. Some characteristics of the parents and children. Social Psychiatry, 1, 163-173.
Lovaas, O. I., Ackerman, A. B., Alexander, D., Firestone, P., Perkins, J., & Young, D. (1980). Teaching developmentally disabled children: The me book. Austin, TX: Pro-Ed.
Lovaas, O. I., Koegel, R. L., Simmons, J. Q., & Long, J. (1973). Some generalization and follow-up measures on autistic children in behavior therapy. Journal of Applied Behavior Analysis, 6, 131-166.
McEachin, J. J. (1987). Outcome of autistic children receiving intensive behavioral treatment: Residual deficits. Unpublished doctoral dissertation, University of California, Los Angeles.
McEachin, J. J., & Leaf, R. B. (1984, May). The role of punishment in motivation of autistic children. Paper presented at the convention of the Association for Behavior Analysis, Nashville, TN.
Ray, A. A. (1982). Statistical Analysis System user's guide: Statistics, 1982 edition. Cary, NC: SAS Institute.
Rimland, B. (1964). Infantile autism. New York: Appleton-Century-Crofts.
Rutter, M. (1970). Autistic children: Infancy to adulthood. Seminars in Psychiatry, 2, 435-450.
Rutter, M. (1974). The development of infantile autism. Psychological Medicine, 4, 147-163.
Stutsman, R. (1948). Guide for administering the Merrill-Palmer Scale of Mental Tests. New York: Harcourt, Brace & World.
Thorndike, R. L. (1972). Manual for Stanford-Binet Intelligence Scale. Boston: Houghton Mifflin.
Wechsler, D. (1967). Manual for the Wechsler Pre-School and Primary Scale of Intelligence. New York: Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: Psychological Corporation.

Received October 10, 1985
Revision received March 28, 1986

Pet. Reh. App.8

American Journal on Mental Retardation
1993, Vol. 97, No. 4, 359-372
© 1993 American Association on Mental Retardation

Long-Term Outcome for Children With Autism Who Received Early Intensive Behavioral Treatment

John J. McEachin, Tristram Smith, and O. Ivar Lovaas
University of California, Los Angeles

After a very intensive behavioral intervention, an experimental group of 19 preschool-age children with autism achieved less restrictive school placements and higher IQs than did a control group of 19 similar children by age 7 (Lovaas, 1987). The present study followed-up this finding by assessing subjects at a mean age of 11.5 years. Results showed that the experimental group preserved its gains over the control group. The 9 experimental subjects who had achieved the best outcomes at age 7 received particularly extensive evaluations indicating that 8 of them were indistinguishable from average children on tests of intelligence and adaptive behavior.
Thus, behavioral treatment may produce long-lasting and significant gains for many young children with autism.

Infantile autism is a condition marked by severe impairment in intellectual, social, and emotional functioning. Its onset occurs in infancy, and the prognosis appears to be extremely poor (Lotter, 1978). For example, in the longest prospective follow-up study with a sound methodological design, Rutter (1970) found that only 1 of 64 subjects with autism (fewer than 2%) could be considered free of clinically significant problems by adulthood, as evidenced by holding a job, living independently, and maintaining an active and age-appropriate social life. The remaining subjects showed numerous dysfunctions, such as marked oddities in behavior, social isolation, and florid psychopathology. The majority of subjects required supervised living conditions.

This study was supported by Grant No. MH-11440 from the National Institute of Mental Health. The study was based on a dissertation submitted to the University of California, Los Angeles, Department of Psychology, in partial fulfillment of the requirements for the doctoral degree. The authors express their deep appreciation to the many students at UCLA who served as therapists and helped to make this study possible. Special thanks to Bruce Baker and Duane Buhrmester, who helped in the design of this study. Requests for reprints of this article, copies of the Clinical Rating Scale, or additional information about this study should be sent to O. Ivar Lovaas, 405 Hilgard Ave., UCLA, Department of Psychology, Los Angeles, CA 90024-1563.

Professionals have attempted a wide variety of interventions in an effort to help children with autism. For many years, no scientific evidence showed that any of these interventions brightened the children’s long-term prognosis (DeMyer et al., 1981). However, since the 1960s, one of these interventions, behavioral treatment, has appeared promising. Behavioral treatment has been found to increase adaptive behaviors such as language and social skills, while decreasing disruptive behaviors such as aggression (DeMyer, Hingtgen, & Jackson, 1981; Newsom & Rincover, 1989; Rutter, 1985). Furthermore, behavioral treatment has been continuously refined and improved as a result of ongoing research efforts at a number of sites (Lovaas & Smith, 1988).

Pet. Reh. App.9

Some recent evidence has indicated that behavioral treatment has developed to the point that it can produce substantial improvements in the overall functioning of young children with autism (Simeonnson, Olley, & Rosenthal, 1987). Lovaas (1987) provided approximately 40 hours per week of one-on-one behavioral treatment for a period of 2 years or more to an experimental group of 19 children with autism who were under 4 years of age. This intervention also included parent training and mainstreaming into regular preschool environments. When re-evaluated at a mean age of 7 years, subjects in the experimental group had gained an average of 20 IQ points and had made major advances in educational achievement. Nine of the 19 subjects completed first grade in regular (nonspecial education) classes entirely on their own and had IQs that increased to the average range. By contrast, two control groups totalling 40 children, also diagnosed as autistic and comparable to the experimental group at intake, did not fare nearly as well. Only one of the control subjects (2.5%) attained normal levels of intellectual and educational functioning.

These data suggest that behavioral treatment is effective. However, the durability of treatment gains is uncertain.
In one prior major study, Lovaas, Koegel, Simmons, and Long (1973) found that children with autism regressed following the termination of treatment. Other studies have shown that children with autism may display increased difficulties when they enter adolescence (Kanner, 1971; Waterhouse & Fein, 1984). Also, as was stated in the first follow-up (Lovaas, 1987), “Certain residual deficits may remain in the normal-functioning group that cannot be detected by teachers and parents and can only be isolated on closer psychological assessment, particularly as these children grow older” (p. 8). This possibility points to the need for a more detailed assessment and for continued follow-ups of the group over time.

The present investigation contained two parts: In the first part we examined whether several years after the evaluation at age 7, the experimental group in Lovaas’s (1987) study had maintained its treatment gains. Subjects in the experimental group and one of the control groups completed standardized tests of intellectual and adaptive functioning. The groups were then contrasted with each other, and their current performance was compared to their performance on previous assessments. The second part of the investigation focused on those subjects who had achieved the best outcome at the end of first grade in the Lovaas (1987) study (i.e., the 9 subjects who were classified as normal functioning out of the 19 in the experimental group). We examined the extent to which these best-outcome subjects could be considered free of autistic symptomatology.
A test battery was constructed to assess a variety of possible deficits: for example, idiosyncratic thought patterns, mannerisms, and interests; lack of close relationships with family and friends; difficulty in getting along with people; relative weaknesses in certain areas of cognitive functioning, such as abstract reasoning; not working up to ability in school; flatness of affect; absence or peculiarity in sense of humor. Possible strengths to be identified included normal intellectual functioning, good relationships with family members, ability to function independently, appropriate use of leisure time, and adequate socialization with peers. Numerous methodological precautions were taken to ensure objectivity of the follow-up examination.

Pet. Reh. App.10

Method

Subjects and Background

Characteristics of the subjects and their treatment have been described elsewhere (Lovaas, 1987) and will only be summarized here. The initial treatment study contained 38 children who, at the time of intake, were very young (less than 40 months if mute, less than 46 months if echolalic) and had received a diagnosis of autism from a licensed clinical psychologist or psychiatrist not involved in the study. These 38 subjects were divided into an experimental group and a control group. The assignment to groups was made on the basis of staff availability. At the beginning of each academic quarter, treatment teams were formed. The clinic director and staff members then determined whether any opening existed for intensive treatment. If so, the next referral received would enter the experimental group; otherwise, the subject entered the control group. The experimental group contained 19 children who received 40 or more hours per week of one-to-one behavioral treatment for 2 or more years.
The control group was\ncomprised of 19 children who received a\nmuch less intensive intervention (10 hours a\nweek or less of one-to-one behavioral treatment\nin addition to a variety of treatments\nprovided by community agencies, such as\nparent training or special education classes).\nThe initial study also included a second\ncontrol group, consisting of 21 children with\nautism who were followed over time by a\nnearby agency but who were never referred\nfor this study. However, these 21 subjects\nwere not available for the present investigation.\nOn standardized measures of intelligence,\nthe second control group did not\ndiffer from either the experimental group or\nthe first control group at intake, nor did it\ndiffer from the first control group when\nevaluated again when the subjects were 7\nyears old. These findings suggest that, as\nmeasured by standardized tests, (a) the children\nwith autism who were referred to us for\ntreatment were comparable to children with\nautism seen elsewhere and (b) the minimal\ntreatment provided to the first control group\ndid not alter intellectual functioning.\nStatistical analysis of an extensive range\nof pretreatment measures confirmed that the\nexperimental group and control group were\ncomparable at intake and closely matched on\nsuch important variables as IQ and severity\nof disturbance. The mean chronological age\n(CA) at diagnosis for subjects in the experimental\ngroup was 32 months. Their mean IQ\nwas 53 (range 30 to 82; all IQs are given as\ndeviation scores). The mean CA of subjects\nin the control group was 35 months; their\nmean IQ was 46 (range 30 to 80). Most of the\nsubjects were mute, all had gross deficiencies\nin receptive language, none played with\npeers or showed age-appropriate toy play,\nall were emotionally withdrawn, most had\nsevere tantrums, and all showed extensive\nritualistic and stereotyped (self-stimulatory)\nbehaviors. 
Thus, they appeared to be a\nrepresentative sample of children with autism\n(Lovaas, Smith, & McEachin, 1989). A\nmore complete presentation of the intake\ndata was reported by Lovaas (1987).\nThe children in the experimental group\nand control group received their respective\ntreatments from trained student therapists\nwho worked in the child\'s home. The parents\nalso worked with their child, and they received\nextensive instruction and supervision\non appropriate treatment techniques. Whenever\npossible, the children were integrated\ninto regular preschools. The treatment focused\nprimarily on developing language,\nincreasing social behavior, and promoting\ncooperative play with peers along with independent\nand appropriate toy play. Concurrently,\nsubstantial efforts were directed at\ndecreasing excessive rituals, tantrums, and\naggressive behavior. (For a more detailed\ndescription of the intervention program, see\nthe treatment manual [Lovaas et al., 1980] and\ninstructional videotapes that supplement the\nmanual [Lovaas & Leaf, 1981].)\nAt the time of the present follow-up\n(1984-1985), the mean CA of the experimental\n\nMcEachin, Smith, and Lovaas\n\n\x0cPet. Reh. App.11\n\ngroup children was 13 years (range = 9 to\n19 years). 
All children who had achieved\nnormal functioning by the age of 7 years had\nended treatment by that point. (Normal functioning was operationally defined as scoring\nwithin the normal range on standardized\nintelligence tests and successfully completing first grade in a regular, nonspecial education class entirely on one\xe2\x80\x99s own.) On the\nother hand, some of the children who had\nnot achieved normal functioning at 7 years of\nage had, at the request of their parents,\nremained in treatment. The length of time\nthat experimental subjects had been out of\ntreatment ranged from 0 to 12 years (mean = 5), with the normal-functioning children\nhaving been out for 3 to 9 years (mean = 5).\nThe mean age of subjects in the control\ngroup was 10 years (range 6 to 14). The\nlength of time that these children had been\nout of treatment ranged from 0 to 9 years\n(mean = 3). Thus, experimental subjects\ntended to be older and had been out of\ntreatment longer than had control subjects.\nThis difference in age occurred because the\nfirst referrals for the study were all assigned\nto the experimental group due to the fact that\nreferrals came slowly (7 in the first 3.5 years)\nand therapists were available to treat all of\nthem. (As noted earlier, subjects were assigned to the experimental group if therapists were available to treat them; otherwise,\nthey entered the control group.)\nStatistical analyses were conducted to\ntest whether a bias resulted from the tendency for the first referrals to go into the\nexperimental group. For example, it is conceivable that the first referrals could have\nbeen higher functioning at intake or could\nhave had a better prognosis than subsequent\nreferrals. If so, the subject assignment procedure could have favored the experimental\ngroup. To assess this possibility, we correlated the order of referral with intake IQ and\nwith IQ at the first follow-up (age 7 years).\nPearson correlations were computed across\nboth groups and within each group. 
These\nanalyses indicated that the order in which\nsubjects were referred was not associated\nwith intake IQ or outcome IQ. Consequently,\nalthough the tendency for the first referrals to\nenter the experimental group created a\npotential bias, the data indicate that this is\nunlikely.\nProcedure\nThe assessment procedure included\nascertaining school placement and administering\nthree standardized tests. Information\non school placement was obtained from\nsubjects\xe2\x80\x99 parents, who classified them as\nbeing in either a regular or a special education\nclass (e.g., a class for children with\nautism or mental retardation, language\ndelays, multihandicaps, or learning disabilities).\nThe three standardized tests were as\nfollows:\n1. Intelligence test. The Wechsler Intelligence\nScale for Children-Revised (Wechsler,\n1974) was administered when subjects were\nable to provide verbal responses. This\nincluded all 9 best-outcome experimental subjects\nplus 8 of the remaining 10 experimental\nsubjects and 6 of the 19 control subjects. For\nsubjects who were not able to provide verbal\nresponses, the Leiter International Performance\nScale (Leiter, 1959) and the Peabody\nPicture Vocabulary Test-Revised (Dunn, 19[illegible])\nwere administered. All of these tests have\nbeen widely used for the assessment of\nintellectual functioning in children with autism\n(Short & Marcus, 1986).\n2. The Vineland Adaptive Behavior\nScales (Sparrow, Balla, & Cicchetti, 19[illegible]).\nThe Vineland is a structured interview\nadministered to parents assessing the extent\nto which their child exhibits behaviors that\nare needed to cope effectively with the\neveryday environment.\n3. The Personality Inventory for Children\n(Wirt, Lachar, Klinedinst, & Seat, 19[illegible]).\nThis measure is a 600-item true-false questionnaire\nfilled out by parents that assesses\nthe extent to which their children show\nvarious forms of psychological disturbance\n(e.g., anxiety, depression, hyperactivity, and\npsychotic behavior).\n\n\x0cPet. Reh. 
App.12\n\nThese three tests were intended to provide a comprehensive evaluation of intellectual, social, and emotional functioning. All of\nthe tests have been standardized on average\npopulations. Hence, they provide an objective basis for comparing subjects to children\nwithout handicaps across the various areas\nthat they assess.\nData were obtained on all subjects except one girl in the control group, who was\nknown to be institutionalized and functioning very poorly. The 9 best-outcome subjects\n(those who had been classified as normal\nfunctioning at age 7) received particularly\nextensive evaluations, as outlined later. Of\nthe 28 remaining subjects, 17 were evaluated\nby staff members in our treatment program,\nand 11 received evaluations from outside\nagencies such as schools or psychology\nclinics. (In some cases, the outside agencies\ndid not administer all of the measures in this\nbattery.)\nEvaluation of Best-Outcome Subjects.\nTo ensure objectivity in the evaluation of the\nbest-outcome subjects, we arranged for blind\nadministration and scoring of all tests for\nthese subjects as follows. A psychologist not\nassociated with the study recruited advanced\ngraduate students in clinical psychology to\nadminister the tests. The examiners were not\nfamiliar with the history of the children, and\nthe psychologist told them simply that the\ntesting was part of a research study on\nassessment of children. The psychologist\nadvised them that the nature of the study\nnecessitated providing only certain standard\nbackground information: age, school placement and grade, and parent\'s name and\nphone number. To increase the heterogeneity of the sample and to control for any\nexaminer bias, each examiner also tested\none or more subjects who were matched in\nage to the experimental subjects and had no\nhistory of behavioral disturbance. 
The examiners were randomly assigned an approximately equal number of subjects for testing\nin the experimental group and the comparison group. Two experimental subjects were\nnot living in the local area. Therefore, for\neach of them, the psychologist recruited a\ntester from the subject\xe2\x80\x99s hometown area as\nwell as an age-matched control subject, and\ndata were collected as just described. In\naddition, the child\xe2\x80\x99s examiner filled out a\nclinical rating scale following a structured\ninterview that covered a list of standard\ntopics, including friendships, family relations, and school and community activities.\nThe interview was designed both for eliciting content and for sampling interpersonal\nstyle. The rating scale consisted of 22 items,\neach scored 0 (best clinical status) to 3\n(marked deviance) points. The items were\ndesigned to include likely areas of difficulty\nfor children with autism of average intelligence (e.g., compulsive or ritualistic behavior, empathy for and interest in others, a\nsense of humor) as well as areas of potential\ndifficulty for the general child population\n(e.g., depressed mood, anxiety, hyperactivity).\n(The complete scale and a copy of\ninstructions for the clinical interview can be\nobtained by writing to the third author.)\nResults\nExperimental Versus Control Group\nThis first section examines the overall\neffects of treatment through comparison of\nthe follow-up data from the 19 subjects who\nreceived the intensive (experimental) treatment to the data from those who received the\nminimal (control) treatment. Data were obtained from all subjects on school placement\nand from all but one subject in the control\ngroup on IQ. On the Vineland, scores were\nobtained for 18 of 19 experimental subjects\nand 15 of 19 control subjects. 
The lowest\navailability of follow-up scores was on the\nPersonality Inventory forChildren,with scores\nfor 15 experimental subjects and 12 control\nsubjects.\nThe subjects in the control group who\nhad Personality Inventory forChildren scores\ndid not appear to differ from subjects who\nwere missing these scores, as compared on\n363\n\n\x0cPet. Reh. App.13\n\n\xe2\x80\xa2<P*\n\nyearsokT ^/io\'in\'thVn mtake\n*mrnm\n\n\xc2\xbb\xe2\x96\xa0 t\n\nwas not evaluated. To check whether Project Tabl*1\nl\nstaff members were biased in their evaluaandSDs by Qrpup and Measur^\nmm\ntions or in their selccttonof which subjects ~~\xe2\x80\x94 " ^\nGroup\nto evaluate, we used t tests to compare\nsubjects they evaluated to those evaluated\nExperimental\nConi\nMeasure\nby outside agencies on intake IQ, IQ at age \xe2\x80\x94\nMean\nSO\nMean\n7 years, and IQ in the present study. No IQ\n84.5\n32.4\nsignificant differences between subjects\ni\nS.1\n26.4\nevaluated by Projectstaffmembersand those\noaiiy Living skins\n73.1\n26.9\nevaluated by outside agencies were found\nSocialization\n75.5\n26.6\nSchool Placement. In the experimental Ac^ehaV\xe2\x80\x98\xc2\xb0r\n716 26a\ngroup, 1 of the 9 subjects from the best- Maladaptive Behavior 106\n82\n\n\xc2\xa3 nI\n4\n\n\\W\n\n\xe2\x96\xa01\nb\xe2\x96\xa0\nrj..\n\ni\nIi\n\nSTafLTnwM\xe2\x80\x9c \xe2\x96\xa0TT* a\n\nfi".\n\nciass at age 7 (J. L.) was now in a special\neducation class. However, 1 of the other 10\nsubjects had gone from a special education\nclass to a regular class and was enrolled in a\n\ni\nL fi\n\ni\n\nCS\n?1:\n\\l-\n\nScales > 70\n"\xe2\x96\xa0*\n\n61.8\n4.0\n\n10.2\n3.9\n\nVineland Adaptive Behavior Scale. \xe2\x80\x9cPefionaltv\'invtmi\nfor Children.\nx**\xc2\xab*\xc2\xab\xc2\xbb|\n\nnot changed their classification. 
Overall, then| 00^\n72\nthe proportion of exoerimental cnhirwc\nP0811\xc2\xae score was 72 in the expenmdf\nregular classes did nJTchange from th?age SSmr^for rh*6 \xe2\x80\x9c"\xe2\x80\x98T* 8T\xc2\xb0YP\xe2\x80\x98 <1\n7 evaluation (9 of 19, or 47%). In the control EStolOO\ngroup, none of the 19 children were in a\ndeviation tl\nregular class, as had been true at the age 7\ni ^\nevaluation. The difference in classroom dace- l^S yi,LlVln^, fn?1SoClah2ation\xe2\x80\x94\nIt\nment between the experimental group and\nthe control group was statistically sjgnifirant\nbetween thc groupsandi,\nX*0, N- 38)- 19.05,p< .05. * ^\n,ndfeaUn\xc2\xae$\nIntellectual Functioning The test srorpc SC0SS ^\n\xe2\x80\x98^bscales^ the experiment\n\nm\n\ni\nm\n\nI\n\xe2\x96\xa0m\n3\n\n\\wi\n\nT\n\nI\n\n;\xe2\x96\xa0\n\n\xe2\x96\xa0<;\n\nof 83and\'\n\n=\xe2\x80\x9cft2asSS! ^=&=\'sc\'7\xe2\x80\x9cc,,\xe2\x80\x9d\'l\n\n$1%*\'\n\nV\n\n^groupatage7^mean\n\n\'*\n\nHl-\'.H! .is\n\nSfel\n\n^ at 7\n\ni\ni.=ti.\n\n.\xe2\x96\xa0.\xe2\x96\xa0if\n\n:\xe2\x96\xa0?\nl\xc2\xa3m\n\nI\nm\nm\n\nlife\nA\n\ni\n\niw\n\ns 1I\n\nton did to comSgroup ffi dSS\n\nIif\n\n^^RZSSSsr5\n\xe2\x80\x94,\xe2\x80\x94. v.,\n... U1C tuniroi\nQia\n\nm\n1 a\xe2\x80\xa2?\nm\nIM\n\nm\nw% \xc2\xbbs\nM.3 L-\n\nm\n\nm\n\n1\n\nI:\n\nwell. The scores were similar to those ob!\netod by to experimental gronp ,\xe2\x80\x9ed eo\xe2\x80\x9e.\n\n\xc2\xbbJ5ET\n\n\xe2\x80\x9cf- \xc2\xaecnr=d|\n\n-\'"\xe2\x80\x94--\xe2\x80\x9c\xe2\x80\xa2\xe2\x96\xa0-\xe2\x96\xa0"I\n\nll10\xe2\x80\x99 W! and 10 or above, at 14 yeas\nih^extSm^f\nfinJn8s .\xe2\x80\x9cwhale tin!\ntoSCS.";iB,\'SZCb3S:\n\n%\n364\nAutmm and Early Inlervnnlrsjl\n\n1\n\n\x0cPet. Reh. App.14\n\niors than did the control group.\nPersonality Functioning, Scores for the\nexperimental group and control group did\nnot differ on overall scale elevation, with\nmean {scores of 62 and 65, respectively. (On\nthis test, the mean t score for the general\npopulation is approximately 50 [5D = 101.) 
T scores above 60 are considered indicative of\npossible or mild deviance, whereas T scores\nabove 70 are viewed as suggesting a clinically\nsignificant problem, namely, one that\nmay require professional attention. There\nwas a significant interaction between the\ngroups and the individual scales on this test,\nF(15, 390) = 2.36, p < .01. Results of the\nTukey test indicated that the most reliable\ndifference between groups occurred on the\nPsychosis scale, on which the experimental\nsubjects had a mean of 78 and the control\nsubjects had a mean of 104, F(1, 26) = 8.53,\np < .01. Seven subjects in the experimental\ngroup scored in the clinically preferred range\n(below 70), whereas no subjects in the control\ngroup scored that low. Only one other\nscale showed a significant difference, Somatic\nConcerns, F(1, 26) = 4.60, p < .05. The\ncontrol subjects tended to display a below\naverage level of somatic complaints (mean of\n45 as compared to 54 for the experimental\nsubjects).\nBest-Outcome Versus Nonclinical\nComparison Group\nA t test indicated no significant difference\nin age between the best-outcome group\nand the comparison group of children without\na history of clinically significant behavioral\ndisturbance. Subjects in the best-outcome\ngroup had a mean age of 12.42 years\n(range 10.0 to 16.25) versus 12.92 years\n(range 9.0 to 15.17) for the nonclinical comparison\ngroup. Scores on the WISC-R and\nclinical rating scale were obtained for all\nsubjects; 1 experimental subject and 2\nnonclinical comparison subjects were missing\nVineland scores, and 2 experimental\nsubjects and 1 nonclinical comparison subject\nwere missing Personality Inventory for\nChildren scores. Both the Vineland and Personality\nInventory for Children were completed\nby parents. 
In cases where these\nscores were not obtained, the parents had\ndeclined to participate.\nOn the measures that provide standardized\nscores, the functioning of the best-outcome\nsubjects was measured most precisely\nby comparing the best-outcome group\nagainst the test norms. Therefore, this analysis\nis of primary interest. Data for the\nnonclinical comparison group are mainly\nuseful in confirming that the assessment\nprocedures were valid and in providing a\ncontrast group for the one measure without\nnorms, the Clinical Rating Scale. For the\nnonclinical comparison group, it will suffice\nto summarize the results as follows: On the\nWISC-R this group had mean IQs of 116\nVerbal, 118 Performance, and 119 Full-Scale.\nOn the Vineland the group obtained mean\nstandard scores of 102 Communication, 100\nDaily Living Skills, 102 Socialization, and 101\nComposite. The mean scale score on the\nPersonality Inventory for Children was 49.\nThus, the nonclinical comparison group displayed\nabove-average or average functioning\nacross all areas that were assessed.\nThe next section is focused on the\nfunctioning of the best-outcome group on\nIQ, adaptive and maladaptive behavior, and\npersonality measures and contrasts the best-outcome\nsubjects with the comparison subjects\non the Clinical Rating Scale.\nIntellectual Functioning. Table 2 presents\nthe IQ data for each subject in the best-outcome\ngroup and the mean scores for the\ngroup. This table shows that, as a whole, the\n9 best-outcome subjects performed well on\nthe WISC-R. Their IQs placed them in the\nhigh end of the normal range, about two\nthirds of an SD above the mean. Their Full-Scale\nIQs ranged from 99 to 136.\nSubjects\' scores were evenly distributed\nacross a range from 80 to 125 on Verbal IQ\nand from 88 to 138 on Performance IQ. The\nsubjects averaged 3 points higher on Performance\nIQ than Verbal IQ. Two of them (J. L.\nand A. G.) 
had at least a 20-point difference\n\n\x0cPet. Reh. App.15\n\nNote. Infrm = Information, Simil = Similarities, Arith = Arithmetic, Vocab = Vocabulary, Compr = Comprehension,\nPicC = Picture Completion, PicA = Picture Arrangement, BlkD = Block Design, ObjA = Object Assembly, Cod\n= Coding, VIQ = Verbal IQ, PIQ = Performance IQ, and Full = Full-Scale IQ.\n\nbetween Verbal and Performance IQ.\nOn each subtest of the WISC-R, the\nmean for the general population is 10 (SD =\n3). It can be seen from Table 2 that the best-outcome subjects scored highest on Similarities, Block Design, and Object Assembly.\nThey scored lowest on Picture Arrangement\nand Arithmetic. Thus, the subjects consistently scored at or above average.\nAdaptive and Maladaptive Behavior.\nTable 3 presents the data for the best-outcome group on the Vineland Adaptive Behavior Scales. It can be seen that the best-outcome group scored about average on the\nComposite Scale and on the subscales for\nCommunication, Daily Living, and Socialization. However, Table 3 shows that some of\nthe best-outcome subjects had marginal\nscores, including J. L., B. W., and M. M. Even\nso, all of the best-outcome subjects had\nComposite scores within the normal range.\nAs can be seen in Table 3, on the\nMaladaptive Behavior Scale (Parts I and II),\nthe mean score for the best-outcome group\nindicated that, on average, these subjects did\nnot display clinically significant levels of\nmaladaptive behavior. Three of them scored\nin the clinically significant range versus one\nsubject in the nonclinical comparison group,\nwhich had a mean of 7.7 on this scale.\nPersonality Functioning. The results of\nthe Personality Inventory for Children are\nsummarized in Table 4. 
The best-outcome\nsubjects obtained valid profiles on the Personality\nInventory for Children, as measured\nby the three validity scales (Lie, Frequency,\nand Defensiveness). As can be seen from the\ntable, the subjects scored in the normal range\nacross all scales. They tended to score highest on Intellectual-Screening, Psychosis, and\nFrequency. Intellectual-Screening assesses\nslow intellectual development, and Psychosis and Frequency assess unusual or strange\nbehaviors. Only Intellectual-Screening was\nabove the normal range, and this scale is\naffected by subjects\xe2\x80\x99 early history. For example, the scale contains statements such as\n\xe2\x80\x9cMy child first talked before he (she) was two\nyears old,\xe2\x80\x9d which would be false for the best-outcome subjects regardless of their current\nlevel of functioning.\nAs Table 4 indicates, 4 best-outcome\nsubjects had a single scale elevated beyond\n\nTable 3\nScores on the Vineland Adaptive Behavior Scale\nfor the Best-Outcome Subjects\n\nAdaptive behavior\nSubject Com DLS Soc Comp\n63\n98\n102\n92\nR-S\xe2\x80\x98\nM.C.\n119\n93\n88\n98\nM.M.\n119\n79 114\n105\n107\n108 112\n108\n*j-\xc2\xae77 103\n94\n88\n\xe2\x96\xa1\xc2\xa3.\n93\n81\n82\n80\nA.G.\n\nB.W.\n\n101\n83\n\n97\n74\n\n99\n105\n\n98\n83\n\nMaladaptive\nbehavior\n6\n16\n2\n4\n13\n15\n5\n9\n\nB.R.\n98 92\nMean\n99\n94\n8.8\nNote. Com = Communication, DLS = Daily Living Skills, Soc\n= Socialization, Comp = Adaptive Behavior Composite.\n\n\x0cPet. Reh. 
App.16\nTable 4\nT Scores on the Personality Inventory for Children for the Best-Outcome Subjects\n\nthe clinically significant range and a 5th (J. L.)\nhad nine scales elevated, including the highest scores in the best-outcome group on\nIntellectual-Screening, Psychosis, and Frequency. Thus, this subject appeared to account for much of the elevation in scores on\nthese scales. By comparison, there were 3\nsubjects in the nonclinical comparison group\nwith at least one scale elevated.\nClinical Rating Scale. On this scale, 8 of\nthe best-outcome subjects scored between 0\nand 10, and the 9th (J. L.) scored 42. The\nmean was 8.8, with a standard deviation of\n12.9. The nonclinical comparison subjects all\nscored between 0 and 5 (mean = 1.7, SD =\n2.1). Because these SDs are unequal, we\nused a nonparametric statistic, a Mann-Whitney U test, revealing a significant difference between groups, U = 19, p < .05. Thus,\nthe best-outcome subjects displayed more\ndeviance than did the comparison subjects,\nbut most of the deviance appeared to come\nfrom one subject, J. L.\n\nDiscussion\nThis study is a later and more extensive\nfollow-up of two groups of young subjects\nwith autism who were previously studied by\nLovaas (1987): (a) an experimental group (n\n= 19) that had received very intensive behavioral treatment and (b) a control group (n =\n19) that had received minimal behavioral\ntreatment. In the present study we have\nreported data on these children at a mean age\nof 13 years for subjects in the experimental\ngroup and 10 years for those in the control\ngroup. 
The data were obtained from a comprehensive assessment battery.\nThe main findings from the test battery\nwere as follows: First, subjects in the experimental\ngroup had maintained their level of\nintellectual functioning between their previous assessment at age 7 and the present\nevaluation at a mean age of 13, as measured\nby standardized intelligence tests. Their mean\nIQ was about 30 points higher than that of\ncontrol subjects. Second, experimental subjects also displayed significantly higher levels of functioning than did control subjects\non measures of adaptive behavior and personality. Third, in a particularly rigorous\nevaluation of the 9 subjects in the experimental group who had been classified as\nbest-outcome (normal-functioning) in the\nearlier study (Lovaas, 1987), the test results\nconsistently indicated that the subjects exhibited\naverage intelligence and average\nlevels of adaptive functioning. Some deviance\nfrom average was found on the personality test and the clinical ratings. However,\nthis deviance appeared to derive from the\nextreme scores of one subject, J. L. (see Tables\n2, 3, and 4). This subject also had been\nremoved from nonspecial education classes\nand placed in a class for children with\nlanguage delays, and he obtained relatively\n\n\x0cPet. Reh. App.17\n\nlow scores (about 80) on the Verbal section\nof the intelligence test and the Communication section of the measure of adaptive\nbehavior. Thus, he no longer appeared to be\nnormal-functioning. 
However, the remaining 8 subjects who had previously been\nclassified as normal-functioning demonstrated\naverage IQ, with intellectual performance\nevenly distributed across subtests, were able\nto hold their own in regular classes, did not\nshow signs of emotional disturbance, and\ndemonstrated adequate development of adaptive and social skills within the normal range.\nIn addition, subjective clinical impressions\nof blind examiners did not discriminate them\nfrom children with no history of behavioral\ndisturbance. These 8 subjects (42% of the\nexperimental group) may be judged to have\nmade major and enduring gains and may be\ndescribed as \xe2\x80\x9cnormal-functioning.\xe2\x80\x9d By contrast, none of the control group subjects\nachieved such a favorable outcome, consistent with the poor prognosis for children\nwith autism reported by other investigators\n(Freeman, Ritvo, Needleman, & Yokota, 1985).\nIn order to evaluate this outcome, we\nmust pay close attention to whether or not\nour methodology was sound. The adequacy\nof our methodology is crucial because the\noutcome in the present study represents a\nmajor improvement over outcomes obtained\nin previous experimental studies on the\ntreatment of children with autism (Rutter,\n1985). The only reports of comparable outcomes have come from uncontrolled case\nstudies (e.g., Bettelheim, 1967), and subsequent investigations have indicated that these\ncase studies grossly overestimated the outcomes obtainable with the treatment that\nwas provided. Similarly, reports of major\ngains in other populations, such as large IQ\nincreases in children from impoverished\nbackgrounds, also have been based on highly\nquestionable evidence (Kamin, 1974; Spitz,\n1986). 
Such reports have the potential to\ncause a great deal of harm by misleading\nconsumers and professionals.\nA detailed description of all the methodological safeguards that should be built\ninto a treatment study is beyond the scope of\nthe present report (see Kazdin, 1980; Kendall\n& Norton-Ford, 1982; Spitz, 1986). However,\nwe note that we incorporated a large number\nof methodological safeguards in both the\noriginal study (Lovaas, 1987) and the present\ninvestigation:\n1. The experimental group and the\ncontrol group received equivalent assessment\nbatteries at intake and were found to be\nvery similar on a multitude of important\nvariables. Moreover, the number of control\ngroup subjects who were predicted to achieve\nnormal functioning, had they received intensive\ntreatment, was approximately equal to\nthe number of experimental subjects who\nactually did achieve normal functioning with\nintensive treatment (Lovaas & Smith, 19[illegible]).\nThus, the subject assignment procedure\nyielded groups that were comparable prior\nto treatment. This provided a strong indication\nthat the superior functioning of the\nexperimental group after treatment was a\nresult of the treatment itself rather than a\nbiased procedure for assigning subjects to\nthe experimental group.\n2. All subjects remained in the groups to\nwhich they were assigned at intake. Only [illegible]\nsubjects dropped out, and they were not\nreplaced. Therefore, the original composition\nof the groups was essentially preserved.\n3. All subjects were independently diagnosed\nas autistic by PhD or MD clinicians,\nand there was high agreement on the diagnosis\nbetween the independent clinicians.\nThis provided evidence that subjects met\ncriteria for a diagnosis of autism.\n4. Prior to treatment, these subjects\nappeared to be comparable to those diagnosed\nas having autism in other research\ninvestigations. Evidence for this comes from\nthe second control group that was incorporated\ninto the initial treatment study. 
This\ngroup was evaluated by another research\nteam (independent of ours), had similar IQs\nat intake based on the same measures of\nintelligence that we used, yet showed similar\noutcome data to those reported by other\ninvestigators. Additional evidence can be\n\n\x0cPet. Reh. App.18\n\nderived from the similarity of our intake data\nto data reported by other investigators (Lovaas\net al., 1989). For example, although Schopler\nand his associates (Schopler, Short, & Mesibov,\n1989) suggested that our sample had a higher\nmean IQ than did other samples of children\nwith autism, their own data do not appear to\ndiffer from ours (Lord & Schopler, 1989).\nThus, there is evidence that our subjects\nwere a typical group of preschool-age children with autism rather than a select group\nof high-level children with autism who would\nhave been expected to achieve normal functioning with little or no treatment.\n5. The first control group, which received up to 10 hours a week of one-to-one\nbehavioral treatment, did not differ at posttreatment from the second control group,\nwhich received no treatment from us. Both\ngroups achieved substantially less favorable\noutcomes than did the experimental group.\nBecause all groups were similar at pretreatment, this result confirms that our subjects\nhad problems that responded only to intensive treatment rather than problems such as\nbeing noncompliant or holding back (masking an underlying, essentially average intellectual functioning that would respond to\nsmaller-scale interventions).\n6. Subjects\xe2\x80\x99 families ranged from high to\nlow socioeconomic status, and, on average,\nthey did not differ from the general population (Lovaas, 1987). Thus, although our treatment required extensive family participation, a diverse group of families was\napparently able to meet this requirement.\n7. 
The treatment has been described in\ndetail (Lovaas et al., 1980; Lovaas & Leaf,\n1981), and the effectiveness of many components of the treatment has been demonstrated experimentally by a large number of\ninvestigators over the past 30 years (cf.\nNewsom & Rincover, 1989). Hence, our treatment may be replicable, a point that is\ndiscussed in greater detail later.\n8. The results of the present follow-up,\nwhich extended several years beyond discharge from treatment for most subjects, are\nan encouraging sign that treatment gains\nhave been maintained for an extended period of time.\n9. A wide range of measures was administered, avoiding overreliance on intelligence\ntests, which have limitations if used in isolation (e.g., bias resulting from teaching to the\ntest, selecting a test that would yield especially favorable results, failing to assess other\naspects of functioning such as social competence or school performance) (Spitz, 1986;\nZigler & Trickett, 1978).\n10. The use at follow-up of a normal\ncomparison group, standardized testing, and\nblind rating allowed for an objective, detailed, and quantifiable assessment of treatment effectiveness. A particularly rigorous\nassessment was given to those subjects who\nshowed the most improvement.\nTaken together, these safeguards provide\nconsiderable assurance that the favorable outcome of the experimental subjects\ncan be attributed to the treatment they received rather than to extraneous factors such\nas improvement that would have occurred\nregardless of treatment, biased procedures\nfor selecting subjects or assigning them to\ngroups, or narrow or inappropriate assessment batteries.\nDespite the numerous precautions that\nwe have taken, several concerns may be\nraised about the validity of the results.
Perhaps the most important is that the assignment to the experimental or control group\nwas made on the basis of therapist availability rather than a more arbitrary procedure\nsuch as alternating referrals (assigning the\nfirst referral to the experimental group, the\nsecond to the control group, the third to the\nexperimental group, and so forth). However,\nit seems unlikely that the assignment was\nbiased in view of the pretreatment data we\nhave presented on the similarity between the\nexperimental and control groups. On the\nother hand, we do not know as yet whether\nthere exists a pretreatment variable that does\npredict outcome but was not among the 19\nwe chose, yet could have discriminated between\ngroups. In an earlier publication\n(Lovaas et al., 1989), we responded in some\n\n\x0cPet. Reh. App.19\n\ndetail to the concern about subject assignment as well as other possible problems\nassociated with the original study. There are\ncertain additional questions that may be\nraised by this follow-up investigation:\n1. The experimental group was older\nthan the control group at the time of this\nfollow-up evaluation. We explained this finding earlier and noted that data analyses\nindicated that it was unlikely that this age\ndifference reflected a bias in subject assignments.\n2. The follow-up assessments for 17 of\nthe lower functioning subjects in this study\nwere conducted by staff members from our\nProject, who could have biased the test\nresults. However, as noted previously, a\ncheck revealed no evidence of such a bias.\n3. The Clinical Rating Scale, based on an\ninterview with subjects who had been classified as normal-functioning in the original\nstudy, has no norms or data on reliability and\nvalidity.
However, we regard the interview\nsimply as an extra check on whether the\nexaminers detected residual signs of autism\nor other behavior problems that were somehow overlooked in the three other (well-standardized) measures in the study and\ntheir 30 subscales. We do not regard the\ninterview as an instrument that by itself\nyields conclusive results. No other interview\nthat suited our purposes currently exists. In\nfuture investigations, we plan to use an\ninterview that Michael Rutter and his associates are now developing for the purpose of\ndetecting residual signs of autism in individuals with average intelligence.\n4. As in most long-term follow-up studies, we had some missing data. However,\nthere is no evidence that the missing data\nwould have changed the overall results.\n5. In our analysis of the best-outcome\ngroup, we noted that the group averages\ndeviated from \xe2\x80\x98normal\xe2\x80\x99 on one subscale of\nthe Personality Inventory for Children and\non the Clinical Rating Scale. We then attributed this deviance to the extreme scores of\none subject rather than to general problems\nwithin this group. We recognize that group\naverages are seldom interpreted this way.\nHowever, as statisticians and methodologists have pointed out (e.g., Barlow & Hersen,\n1984), there are many times when group\naverages represent the performance of few\nor no subjects within the group. This was one\nof those times, as is clearly shown by the data\non individual subjects (Tables 2, 3, and 4).\nDeviance was found almost exclusively in\none subject, not evenly distributed across all\nsubjects, and we have presented the results\naccordingly.\nThe most important void for research to\nfill at this time is replication by independent\ninvestigators who employ sound methodologies.
Given the objective assessment instruments\nthat we used and the detailed\ndescription that we have provided of the\ntreatment (Lovaas et al., 1980), such a replication should be possible. However, the\ntreatment is complex and to replicate it\nproperly, an investigator probably needs to\npossess (a) a strong foundation in learning\ntheory research; (b) a detailed knowledge of\nthe treatment manual we used; (c) a supervised practicum of at least 6 months in one-to-one work with clients who have developmental delays, emphasizing discrimination\nlearning and building complex language;\nand (d) a commitment to provide 40 hours of\none-to-one treatment to each client per week, 50\nweeks per year, for at least 2 years. Our best-outcome subjects all required a minimum of\n2 years of intensive treatment to achieve\naverage levels of functioning (another indication\nthat those subjects had pervasive\ndisabilities and were not merely noncompliant).\nA second void to fill concerns the majority of children who did not benefit to the\npoint of achieving normal functioning with\nintensive treatment. Perhaps an earlier start\nin treatment would have been all that was\nneeded to obtain favorable outcomes with\nmany of these children. More pessimistically,\nperhaps such children require new and different interventions that have yet to be\ndiscovered and implemented. In any case, it\nis essential to develop more appropriate\n\n\x0cPet. Reh. App.20\n\nservices for these children.\nFinally, a rather speculative but promising\narea for research is to determine the\nextent to which early intervention alters\nneurological structures in young children\nwith autism.
Autism is almost certainly the\nresult of deficits in such neurological structures\n(Rutter & Schopler, 1987). However,\nlaboratory studies on animals have shown\nthat alterations in neurological structure are\nquite possible as a result of changes in the\nenvironment in the first years of life (Sirevaag\n& Greenough, 1988), and there is reason to\nbelieve that alterations are also possible in\nyoung children. For example, children under\n3 years of age overproduce neurons, dendrites,\naxons, and synapses. Huttenlocher\n(1984) hypothesized that, with appropriate\nstimulation from the environment, this overproduction might allow infants and\npreschoolers to compensate for neurological\nanomalies much more completely than do\nolder children. Caution is needed in generalizing\nfrom these findings on average children\nto early intervention with children with\nautism, particularly because the exact nature\nof the neurological anomalies of children\nwith autism is unclear at present (e.g., Rutter\n& Schopler, 1987). Nevertheless, the findings\nsuggest that intensive early intervention could\ncompensate for neurological anomalies in\nsuch children. Finding evidence for such\ncompensation would help explain why the\ntreatment in this study was effective. More\ngenerally, it might contribute to an understanding\nof brain-behavior relations in young\nchildren.\n\nReferences\nBarlow, D. H., & Hersen, M. (1984). Single case\nexperimental design: Strategies for studying\nbehavior change (2nd ed.). New York: Pergamon Press.\nBettelheim, B. (1967). The empty fortress. New\nYork: The Free Press.\nDeMyer, M. K., Hingtgen, J. N., & Jackson, R.\nK. (1981). Infantile autism reviewed: A decade\nof research. Schizophrenia Bulletin, 7,\n388-451.\nDunn, L. M. (1981). Peabody Picture Vocabulary\nTest-Revised. Circle Pines, MN: American\nGuidance Service.\nFreeman, B. J., Ritvo, E.
R., Needleman, R., &\nYokota, A. (1985). The stability of cognitive\nand linguistic parameters in autism: A 5-year\nstudy. Journal of the American Academy of\nChild Psychiatry, 24, 290-311.\nHuttenlocher, P. R. (1984). Synapse elimination\nand plasticity in developing human cerebral\ncortex. American Journal of Mental Deficiency,\n88, 488-496.\nKamin, L. J. (1974). The science and politics of\nI.Q. New York: Wiley.\nKanner, L. (1971). Follow-up study of 11 autistic\nchildren originally reported in 1943. Journal\nof Autism and Childhood Schizophrenia,\n1, 119-145.\nKazdin, A. (1980). Research design in clinical\npsychology. New York: Harper & Row.\nKendall, P. C., & Norton-Ford, J. D. (1982).\nTherapy outcome research methods. In P. C.\nKendall & J. N. Butcher (Eds.), Handbook of\nresearch methods in clinical psychology (pp.\n429-460). New York: Wiley.\nLeiter, R. G. (1959). Part I of the manual for the\n1948 revision of the Leiter International Performance\nScale: Evidence of the reliability and\nvalidity of the Leiter tests. Psychology Service\nCenter Journal, 11, 1-72.\nLord, C., & Schopler, E. (1989). The role of age\nat assessment, developmental level, and test in\nthe stability of intelligence scores in young\nautistic children. Journal of Autism and Developmental\nDisorders, 19, 483-499.\nLotter, V. (1978). Follow-up studies. In M.\nRutter & E. Schopler (Eds.), Autism: A reappraisal\nof concepts and treatment. London:\nPlenum Press.\nLovaas, O. I. (1987). Behavioral treatment and\nnormal educational and intellectual functioning\nin young autistic children. Journal of Consulting\nand Clinical Psychology, 55, 3-9.\nLovaas, O. I., Ackerman, A. B., Alexander, D.,\nFirestone, P., Perkins, J., & Young, D.\n(1980). Teaching developmentally disabled\nchildren: The me book. Austin, TX: Pro-Ed.\nLovaas, O. I., Koegel, R. L., Simmons, J. Q., &\nLong, J. S. (1973).
Some generalization and\nfollow-up measures on autistic children in\nbehavior therapy. Journal of Applied Behavior\nAnalysis, 6, 131-166.\nLovaas, O. I., & Leaf, R. L. (1981). Five video\n\n\x0cPet. Reh. App.21\n\ntapes for teaching developmentally disabled\nchildren. Baltimore: University Park Press.\nLovaas, O. I., & Smith, T. (1988). Intensive\nbehavioral treatment with young autistic children. In B. B. Lahey & A. E. Kazdin (Eds.),\nAdvances in clinical child psychology (Vol. 11,\npp. 285-324). New York: Plenum Press.\nLovaas, O. I., Smith, T., & McEachin, J. J.\n(1989). Clarifying comments on the young\nautism study: Reply to Schopler, Short and\nMesibov. Journal of Consulting and Clinical\nPsychology, 57, 165-167.\nMcEachin, J. J. (1987). Outcome of autistic\nchildren receiving intensive behavioral treatment:\nPsychological status 3 to 12 years later.\nUnpublished doctoral dissertation, University\nof California, Los Angeles.\nNewsom, C., & Rincover, A. (1989). Autism. In\nE. J. Mash & R. A. Barkley (Eds.), Treatment of\nchildhood disorders (pp. 286-346). New York:\nGuilford Press.\nRutter, M. (1970). Autistic children: Infancy to\nadulthood. Seminars in Psychiatry, 2, 435-450.\nRutter, M. (1985). The treatment of autistic\nchildren. Journal of Child Psychology & Psychiatry,\n26, 193-214.\nRutter, M., & Schopler, E. (1987). Autism and\npervasive developmental disorders: Concepts\nand diagnostic issues. Journal of Autism and\nDevelopmental Disorders, 17, 159-186.\nSchopler, E., Short, A., & Mesibov, G. (1989).\nRelation of behavioral treatment to \xe2\x80\x98normal\nfunctioning\xe2\x80\x99: Comment on Lovaas. Journal of\nConsulting and Clinical Psychology, 57,\n162-164.\nShort, A., & Marcus, L. (1986). Psychoeducational\nevaluation of autistic children and adolescents.\nIn S. S. Strichart & P.
Lazarus (Eds.),\nPsychoeducational evaluation of school-aged\nchildren with low-incidence disorders (pp.\n155-180). Orlando, FL: Grune & Stratton.\nSimeonsson, R. J., Olley, J. G., & Rosenthal,\nS. L. (1987). Early intervention for children\nwith autism. In M. J. Guralnick & P. C. Bennett\n(Eds.), The effectiveness of early intervention\nfor at-risk and handicapped children (pp.\n275-296). Orlando, FL: Academic Press.\nSirevaag, A. M., & Greenough, W. T. (1988). A\nmultivariate statistical summary of synaptic\nplasticity measures in rats exposed to complex,\nsocial and individual environments. Brain\nResearch, 441, 386-392.\nSparrow, S. S., Balla, D. A., & Cicchetti, D. V.\n(1984). Interview Edition Survey Form Manual.\nCircle Pines, MN: American Guidance Service.\nSpitz, H. H. (1986). The raising of intelligence.\nHillsdale, NJ: Erlbaum.\nWaterhouse, L., & Fein, D. (1984). Developmental\ntrends in cognitive skills for children\ndiagnosed as autistic and schizophrenic. Child\nDevelopment, 55, 236-248.\nWechsler, D. (1974). Manual for the Wechsler\nIntelligence Scale for Children-Revised. New\nYork: Psychological Corp.\nWirt, R. D., Lachar, D., Klinedinst, J. K., &\nSeat, P. O. (1977). Multidimensional descriptions\nof child personality: A manual for the\nPersonality Inventory for Children. Los Angeles:\nWestern Psychological Services.\nZigler, E., & Trickett, P. K. (1978). IQ, social\ncompetence, and evaluation of early childhood\nintervention programs. American Psychologist,\n33, 789-798.\nReceived: 5/15/91; first decision: 10/16/91; accepted:\n1/23/92.\n\n\x0cPet. Reh. App.22\n\nVOLUME 110, NUMBER 6: 417-438 | NOVEMBER 2005\n\nAMERICAN JOURNAL ON MENTAL RETARDATION\n\nIntensive Behavioral Treatment for Children With\nAutism: Four-Year Outcome and Predictors\nGlen O. Sallows and Tamlynn D.
Graupner\nWisconsin Early Autism Project (Madison)\n\nAbstract\nTwenty-four children with autism were randomly assigned to a clinic-directed group, replicating\nthe parameters of the early intensive behavioral treatment developed at UCLA, or\nto a parent-directed group that received intensive hours but less supervision by equally\nwell-trained supervisors. Outcome after 4 years of treatment, including cognitive, language,\nadaptive, social, and academic measures, was similar for both groups. After combining\ngroups, we found that 48% of all children showed rapid learning, achieved average posttreatment\nscores, and at age 7, were succeeding in regular education classrooms. Treatment\noutcome was best predicted by pretreatment imitation, language, and social responsiveness.\nThese results are consistent with those reported by Lovaas and colleagues (Lovaas, 1987;\nMcEachin, Smith, & Lovaas, 1993).\n\nBehavioral approaches for addressing the delays\nand deficits common in autism have been\nrecognized by many as the most effective treatment\nmethods to date (Green, 1996; Maine Administrators\nof Service for Children With Disabilities,\n2000; New York State Department of\nHealth, 1999; Schreibman, 1988; Smith, 1993).\nThe intervention developed at UCLA in the\n1960s and 1970s is perhaps the best known and\nbest documented (e.g., Dawson & Osterling,\n1997; Green, 1996; Smith, 1993).
Building on earlier\nresearch (e.g., Lovaas, Koegel, Simmons, &\nLong, 1973), Lovaas and staff of the UCLA\nYoung Autism Project (1970 to 1984) began treatment\nwith children under 4 years of age using a\ncurriculum emphasizing language development,\nsocial interaction, and school integration skills.\nAfter 2 to 3 years of treatment, 47% of the experimental\ngroup (9 of 19 children) versus 2% of the\ncomparison group (1 of 40 children) were reported\nto have achieved \xe2\x80\x9cnormal functioning\xe2\x80\x9d (Lovaas,\n1987; McEachin et al., 1993).\nThese findings demonstrated that many children\nwith autism could make dramatic improvement,\neven achieve \xe2\x80\x9cnormalcy,\xe2\x80\x9d and many researchers\nnow agree that intensive behavioral\ntreatment can result in substantial gains for a large\nproportion of children (e.g., Harris, Handleman,\nGordon, Kristoff, & Fuentes, 1991; Mundy, 1993).\n\n\xc2\xa9 American Association on Mental Retardation\n\nHowever, the UCLA findings also created considerable\ncontroversy, and the studies were criticized\non methodological and other grounds (e.g.,\nGresham & MacMillan, 1998; Schopler, Short, &\nMesibov, 1989). One criticism was that the UCLA\ngroup used the term recovered to describe children\nwho had achieved IQ in the average range and\nplacement in regular classrooms.
Mundy (1993)\nsuggested that children diagnosed with high functioning\nautism might achieve similar outcomes\nand pointed out that several of the recovered children\nin the follow-up study of the UCLA children\nat age 13 (McEachin et al., 1993) had clinically\nsignificant scores on some behavioral measures.\nThe UCLA team responded by noting that (a)\nevaluators blind to background information had\nnot identified the recovered children as different\nfrom neurotypical children and (b) a few elevated\nscores may not imply abnormality because several\nof the neurotypical peers had them as well (Smith,\nMcEachin, & Lovaas, 1993). Questions were also\n\n\x0cPet. Reh. App.23\n\nraised regarding whether or not the UCLA results\ncould be fully replicated without the use of aversives, which were part of the UCLA protocol, but\nare not acceptable in most communities (Schreibman, 1997). Some have questioned the feasibility\nof implementing the program without the resources\nof a university research center to train and supervise\ntreatment staff (Sheinkopf & Siegel, 1998)\nand to help defray the cost of the program, which,\ndue to the many hours of weekly treatment, can\nexceed $50,000 per year (although it has been argued\nthat the cost of not providing treatment may\nbe much greater over time: Jacobson, Mulick, &\nGreen, 1998). Finally, because only about half of\nthe children showed marked gains, the need for\npredictors to determine which children will benefit\nhas been raised (Kazdin, 1993).
Lovaas and\nhis colleagues responded to these and other criticisms\n(Lovaas, Smith, & McEachin, 1989; Smith\net al., 1993; Smith & Lovaas, 1997), but agreed\nwith others that replication and further research\nwere necessary.\nThere have now been several reports of partial\nreplication without using aversives (Anderson, Avery,\nDi Pietro, Edwards, & Christian, 1987; Birnbrauer & Leach, 1993; Eikeseth, Smith, Jahr, &\nEldevik, 2002; Smith, Groen, & Wynn, 2000).\nMost found, as did Lovaas and his colleagues, that\na subset of children showed marked improvement\nin IQ. Although fewer children reached average\nlevels of functioning, the treatment provided in\nthese studies differed from the UCLA model in\nseveral ways (e.g., lower intensity and duration of\ntreatment, different sample characteristics and curriculum,\nand less training and supervision of\nstaff).\nAnderson et al. (1987) provided 15 hours per\nweek for 1 to 2 years (parents provided another 5\nhours) and found that 4 of 14 children achieved\nan IQ over 80 and were in regular classes, but all\nneeded some support. Birnbrauer and Leach\n(1993) provided 19 hours per week for 1.5 to 2\nyears and found that 4 of 9 children achieved an\nIQ over 80 (classroom placement was not reported),\nbut all had poor play skills and self-stimulatory\nbehaviors. The authors noted, however, that\ntheir treatment program had not addressed these\nareas. Smith et al. (2000) provided 25 hours per\nweek for 33 months and reported that 4 of 15\nchildren achieved an IQ over 85 and were in regular\nclasses, but one had behavior problems. The\nauthors noted that their sample had an atypically\nhigh number of mute children, 13 of 15, considerably\nhigher than the commonly cited figure of\n50% (Smith & Lovaas, 1997), and they hypothesized\nthat this was the reason for the relatively low\nnumber of children functioning in the average\nrange following treatment. Eikeseth et al. (2002)\nprovided 28 hours per week for 1 year. In their\nsample, 7 of 13 children with pretreatment IQ\nover 50 achieved IQ over 85 and were in regular\nclasses with some support. Data beyond the first\nyear have not yet been reported.\nFour groups of investigators discussed results\nbased on behavioral treatment in classroom settings,\nwhich typically include a mix of 1:1 treatment\nand group activities, so that time in school\nmay not be comparable to hours reported in\nhome-based studies. Following 4 years of treatment,\nFenske, Zalenski, Krantz, and McClannahan (1985) found that 4 of 9 children were\nplaced in regular classes. However, neither pre-posttreatment\ntest scores nor amount of support\nin school were reported. Harris et al. (1991) provided\n5.5 hours per day in class and instructed\nparents to provide an additional 10 to 15 hours\nat home (no data were collected on actual hours\nparents provided). After 1 year of treatment, 6 of\n9 children achieved IQ over 85, but were still in\nclasses for students with learning disabilities. A later\nreport (Harris & Handleman, 2000) found that\n9 of 27 children achieved IQ over 85 and were\nplaced in regular classes (time in treatment was\nnot reported), but most required some support.\nMeyer, Taylor, Levin, and Fisher (2001) provided\n30 hours of class time per week for at least 2 years\nand reported that 7 of 26 children were placed in\npublic schools after 3.5 years of treatment, but 5\nrequired support services. Pre-post IQ was not reported.
Romanczyk, Lockshin, and Matey (2001)\nprovided 30 hours of class time per week for 3.3\nyears and reported that 15% of the children were\ndischarged to regular classrooms. No information\non posttreatment test scores or the need for supports\nwas provided.\nIn two studies researchers examined the effects\nof behavioral treatment for children with low\npretreatment IQ. Smith, Eikeseth, Klevstrand, and\nLovaas (1997) provided children who had pretreatment\nIQ less than 35 (M = 28) with 30 hours\nper week for 35 months and reported an increase\nin IQ of 8 points (3 of 11 children achieved increases\nof over 15 points) and 10 of 11 achieved\nsingle-word expressive speech. Eldevik, Eikeseth,\nJahr, and Smith (in press) provided children who\nhad an average pretreatment IQ of 41 with 22\n\n\x0cPet. Reh. App.24\n\nhours per week of 1:1 treatment for 20 months\nand reported an increase in IQ of 8 points and an\nincrease in language standard scores of 11 points.\nIn three studies researchers examined results\nof behavioral treatment provided by clinicians\nworking outside university settings in what has\nbeen termed parent-managed treatment because parents\nimplement treatment designed by a workshop\nconsultant, who supervises less frequently (e.g.,\nonce every 2 to 4 months) than the supervision\nthat occurs in programs supervised by a local autism\ntreatment center (e.g., twice per week).
Sheinkopf and Siegel (1998) reported results for children\nwho received 19 hours of treatment per week\nfor 16 months supervised by three local providers.\nSix of 11 children achieved IQ over 90 and 5 were\nin regular classes, but still had residual symptoms.\nHowever, these children may not be comparable\nto high achievers in other studies because intelligence\ntests included the Merrill-Palmer, a measure\nof primarily nonverbal skills, known to yield\nscores about 15 points higher than standard intelligence\ntests that include both verbal and nonverbal\nscales. In the second study, Bibby, Eikeseth, Martin, Mudford, and Reeves (2002) described\nresults for children who received 30 hours\nof treatment per week (range = 14 to 40) for 32\nmonths (range = 17 to 43) supervised by 25 different\nconsultants, who saw the children several\ntimes per year (median = 4, range = 0 to 26).\nTen of 66 children achieved IQ over 85, and 4\nwere in regular classes without help. However, as\nthe authors noted, their sample was unlike\nUCLA\xe2\x80\x99s in several ways: 15% had a pretreatment\nIQ under 37, 57% were older than 48 months,\nmany received fewer than 20 hours per week, 80%\nof the providers were not UCLA-trained, and no\nchild received weekly supervision. Weiss (1999) reported\nthe results of a study in which children did\nreceive high hours: 40 hours of treatment per\nweek for 2 years. She saw each child every 4 to 6\nweeks, reviewed videos of their performance every\n2 to 3 weeks, and spoke with parents weekly. Following\ntreatment, 9 of 20 children achieved scores\non the Vineland Applied Behavior Composite\n(ABC) of over 90, were placed in regular classes,\nand had scores on the Childhood Autism Rating\nScale in the nonautistic range (under 30).
No pre- or\nposttreatment IQ data were reported.\nSeveral researchers have described pretreatment\nvariables that seem to predict (are highly\ncorrelated with) later outcome. Although findings\nhave not always been consistent, the most commonly\nnoted predictors have been IQ (Bibby et\nal., 2002; Eikeseth et al., 2002; Goldstein, 2002;\nLovaas, 1987; Newsom & Rincover, 1989), presence\nof imitation ability (Goldstein, 2002; Lovaas\n& Smith, 1988; Newsom & Rincover, 1989;\nWeiss, 1999), language (Lord & Paul, 1997; Venter,\nLord, & Schopler, 1992), younger age at intervention\n(Bibby et al., 2002; Fenske et al., 1985;\nGoldstein, 2002; Harris & Handleman, 2000), severity\nof symptoms (Venter et al., 1992), and social\nresponsiveness or \xe2\x80\x9cjoint attention\xe2\x80\x9d (Bono,\nDaley, & Sigman, 2004; L. Koegel, Koegel, Shoshan, & McNerney, 1999; Lord & Paul, 1997).\nMultiple regression has been used to determine\ncombinations of pretreatment variables with\nstrong relationships with outcome. Goldstein\n(2002) reported that verbal imitation plus IQ plus\nage resulted in an R\xc2\xb2 of .78 with acquisition of\nspoken language. Rapid learning during the first 3\nor 4 months of treatment has also been associated\nwith positive outcome (Lovaas & Smith, 1988;\nNewsom & Rincover, 1989; Weiss, 1999). Weiss\nreported that rapid acquisition of verbal imitation\nplus nonverbal imitation plus receptive instructions\nresulted in an R\xc2\xb2 of .71 with Vineland ABC\nand .73 with Childhood Autism Rating Scale\nscores 2 years later.\nWe designed the present study to examine\nseveral questions.
Can a community-based program\noperating without the resources, support, or\nsupervision of a university center, implement the\nUCLA program with a similar population of children\nand achieve similar results without using aversives? Do significant residual symptoms of autism\nremain among children who achieve posttreatment\ntest scores in the average range? Can\npretreatment variables be identified that accurately\npredict outcome? We also examined the comparative\neffectiveness of a less costly parent-directed\ntreatment model.\n\nMethod\nParticipants\nResearchers at the Wisconsin site worked in\ncollaboration with and observed the guidelines set\nby the National Institutes of Mental Health\n(NIMH) for Lovaas\xe2\x80\x99 Multi-Site Young Autism\nProject. Children were recruited through local\nbirth to three (special education) programs. All\nchildren were screened for eligibility according to\nthe following criteria: (a) age at intake between 24\nand 42 months, (b) ratio estimate (mental age\n\n\x0cPet. Reh. App.25\n\n[MA] divided by chronological age [CA]) of the\nMental Development Index of 35 or higher (the\nratio estimate was used because almost all children\nscored below the lowest Mental Development Index\nof 50 from the Bayley Scales of Infant Development\nSecond Edition (Bayley, 1993)), (c)\nneurologically within \xe2\x80\x9cnormal\xe2\x80\x9d limits (children\nwith abnormal EEGs or controlled seizures were\naccepted) as determined by a pediatric neurologist\n(no children were excluded based on this criterion),\nand (d) a diagnosis of autism by independent\nchild psychiatrists well known for their experience\nand familiarity with autism.
All children also met\nthe criteria for autism based on the Diagnostic\nand Statistical Manual of Mental Disorders\xe2\x80\x94\nDSM-IV (American Psychiatric Association, 1994)\nand the Autism Diagnostic Interview-Revised\n(Lord, Rutter, & LeCouteur, 1994), both administered\nby a trained examiner. There were no parental\ncriteria for involvement beyond agreeing to\nthe conditions in the informed consent document,\none of which was accepting random assignment\nto treatment conditions. The parents of all\nscreened children agreed to participate, and none\ndropped out upon learning of their group assignment,\nminimizing bias in selection of participants\nand group composition.\nThirteen children began treatment in 1996, 11\nin 1997, and 14 in 1998-1999. The last group had\nnot completed treatment when the data from the\nfirst two groups were analyzed, and their data will\nbe reported in a subsequent paper. The 24 children\nadmitted during the first 2 years were 19\nboys and 5 girls. One girl was placed in foster care\nafter 1 year of treatment, and the foster parents\ndid not wish to continue treatment for her. Her\ndata were, therefore, excluded from the analysis.\nThe remaining 23 children had completed 4 years\nof treatment (or had \xe2\x80\x9cgraduated\xe2\x80\x9d earlier) at the\ntime of this report, although 1 child switched to\nanother provider of behavioral treatment after 1\nyear.\n\nDesign\nIn accordance with the research protocol approved\nby NIMH, we matched children on pretreatment\nIQ (Bayley MA divided by CA). They\nwere randomly assigned by a UCLA statistician to\nthe clinic-directed group (n = 13), replicating the\nparameters of the UCLA intensive behavioral\ntreatment (Lovaas, 1987) or to the parent-directed\ngroup (n = 10), intended to be a less intensive\nalternative treatment.\nAll children received treatment based on the\nUCLA model.
Parents in both groups were instructed to attend weekly team meetings and were encouraged to extend the impact of treatment by practicing newly learned material with their child throughout the day. Demographic information as well as hours of treatment and supervision are shown in Table 1. Children averaged 33 to 34 months of age at pretest and began treatment at 35 to 37 months.

Table 1. Demographic Information and Hours of Service by Group

Descriptor                Clinic-directed               Parent-directed
Boys, girls               11, 2                         8, 2
One-parent families       0 of 13                       1 of 10
Income
  Median ($)              62,000                        59,000
  (Range)                 (35-100+)                     (30-100+)
Education (BA)
  Mothers                 9 of 12                       9 of 10
  Fathers                 10 of 12                      6 of 9
Siblings (mean)           2                             2
No. nonverbal (%)         8/13 (62)                     2/10 (20)
Age (months) (SD)
  Pretest                 33.23 (3.89)                  34.20 (5.06)
  Treatment               35.00 (4.86)                  37.10 (5.36)
  Posttest                83.23 (8.92)                  82.50 (6.61)
1:1 hours per week (SD)
  Year 1                  38.60 (2.91)                  31.67 (5.81)
  Year 2                  36.55 (3.83)                  30.88 (4.04)
Senior therapist          6-10 hrs per week             6 hrs per month
                          (3, 2- to 3-hr sessions)      (1, 3-hr session per 2 wks)
Team meetings             1 hr per week                 1 hr per 1 or 2 weeks
Progress review           1 hr per wk for 1-2 years,    1 hr every other month
                          then 1 hr per 2 months

Note. The 1:1 hours for parent-directed children excludes one child who received 14 hours per week.

Pet. Reh. App.26

Children in the clinic-directed group were to receive 40 hours per week of direct treatment. The actual average was 39 during Year 1 and 37 during Year 2, with gradually decreasing hours thereafter as children entered school. Parents in the parent-directed group chose the number of weekly treatment hours provided by therapists.
The average was 32 hours during Year 1 and 31 during Year 2, with the exception of one family that chose to have 14 hours both years. Because the parent-directed children as a group received more intensive treatment than was provided in most previous studies, only 6 to 7 hours less than the clinic-directed group, our ability to examine the effect of differences in treatment intensity was limited.

The clinic-directed group received 6 to 10 hours per week of in-home supervision from a senior therapist and weekly consultation by the senior author or clinic supervisor. Parent-directed children received 6 hours per month of in-home supervision from a senior therapist (typically a 3-hour session every other week) and consultation every 2 months by the senior author or clinic supervisor.

Direct treatment staff, referred to as therapists, were hired by Wisconsin Early Autism Project staff members for both the clinic- and parent-directed groups. Funding for 35 hours of 1:1 treatment per week was provided through the Wisconsin Medical Assistance program. Treatment hours in excess of 35 were funded through project funds.

Measures

We used the Bayley Scales of Infant Development, Second Edition, to determine pretreatment IQ. In addition, we used the Merrill-Palmer Scale of Mental Tests (Stutsman, 1948), an older test of intelligence recommended for use with nonverbal children (Howlin, 1998), as a measure of nonverbal intelligence but not pre- or posttreatment IQ. We employed the Reynell Developmental Language Scales (Reynell & Gruber, 1990) to assess language ability, because of its extensive psychometric data for preschool-age children, and the Vineland Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984) to measure adaptive functioning.
Subscales of the Vineland assess Communication in Daily Life, Daily Living Skills, and Social Skills. Information regarding developmental history (including loss of language and other skills), use of supplemental treatments, and pretreatment presence of functional speech was gathered from parent interviews, reports from other professionals, and direct observation.

Follow-up testing was administered annually for 4 years. As children grew older or became too advanced for the norms of pretreatment tests, we used other age-appropriate tests. Cognitive functioning of older children was assessed using Wechsler tests for 20 children (Wechsler Preschool and Primary Scale of Intelligence-Revised, WPPSI [Wechsler, 1989]; Wechsler Intelligence Scale for Children, WISC-III [Wechsler, 1991]) and the Bayley II for 3 children. Although we assessed nonverbal cognitive functioning, it was not used as a measure of posttreatment IQ; we employed the Leiter-R for 11 children (Roid & Miller, 1995, 1997) and the Merrill-Palmer for 12 children. Language was measured using the Clinical Evaluation of Language Fundamentals, Third Edition, CELF III (Semel, Wiig, & Secord, 1995) for 11 children and the Reynell for 12 children. We administered the Vineland to all children for assessment of adaptive functioning.

To assess posttreatment social functioning, we readministered the Autism Diagnostic Interview-Revised and used the Personality Inventory for Children (Wirt, Lachar, Klinedinst, & Seat, 1977), which was completed by parents of all 23 children after 3 years of treatment.
After 4 years of treatment, when the children were approximately 7 years old, parents and teachers completed the Child Behavior Checklist (Achenbach, 1991a, 1991b) and Vineland for all 23 children. Bierman and Welsh (1997) noted that “teacher ratings are superior to those of other informants and provide information regarding peer interaction and group acceptance that are closest to those of peers” (p. 348). Information was obtained from teachers on classroom placement (regular, regular with modified curriculum, partial special education [e.g., pullout/resource room], or full special education) and supportive/therapeutic services (e.g., classroom aide, speech or occupational therapy) when the children were 7 years old. We used the Woodcock-Johnson III Tests of Achievement (Woodcock, McGrew, & Mather, 2001) to measure academic skills of children placed in regular education classes at age 7.

Pet. Reh. App.27

The second author administered the pretreatment assessment battery prior to children being assigned to treatment groups. She received training in assessment at UCLA and met criterion for satisfactory intertester reliability. One fourth of the children in the current study were tested prior to treatment by unaffiliated community psychologists. These children earned a ratio IQ of 50.3 on the Bayley administered by the independent psychologists and 47.3 from the Wisconsin Project evaluator. The mean absolute difference was three points, r = .83, indicating absence of bias by the Wisconsin Project evaluator.
Children who achieved IQs of 85 or higher at annual follow-up testing were thereafter referred for assessment by psychologists who had extensive experience testing children with autism at hospital-based assessment clinics that were not affiliated with the Wisconsin Project. These psychologists, who were unaware of group assignment or length of time in treatment, used the tests listed above. Follow-up testing of most children whose IQ remained delayed was conducted by the second author to reduce cost.

One experimental assessment procedure, the Early Learning Measure developed at UCLA (Smith, Buch, & Gamby, 2000), was administered to measure the rate of acquisition of skills during the first several months of treatment. Every 3 weeks for 3 months leading up to the beginning of treatment and for 6 months after treatment started, the same list of 40 items (10 each of verbal imitation, nonverbal imitation, following verbal instructions, and expressive object labeling), which was known only to the experimenter, was presented to the children. Two sets of scores were obtained from the Early Learning Measure. The first was the number of items the child performed correctly prior to the onset of treatment. The second set of scores was the number of weeks required for the child to learn 90% of the verbal imitation items once treatment had begun, thereby providing a measure of the child’s rate of acquisition. This criterion was selected based on earlier research with the Early Learning Measure, which suggested the predictive validity of rapid acquisition of verbal imitation (Lovaas & Smith, 1988).

Treatment Procedure

The treatment procedure and curriculum were those initially described by Lovaas (Lovaas et al., 1981), except that no aversives were used, with the addition of procedures supported by subsequent research (e.g., R.
Koegel & Koegel, 1995), which have been widely disseminated (e.g., Maurice, Green, & Luce, 1996). Positive interactions were built by engaging in favorite activities and responding to the gestures used by each child to indicate desires. Anticipation of success and motivation to attend were increased by employing brief, standard instructions and tasks requiring only visual attending (e.g., matching), using familiar materials (e.g., the child’s own ring stacker), prompting success (physically assisting him or her to place a ring on the pole if a demonstration was not sufficient), presenting only two or three trials at a time, and reinforcing each response immediately with powerful reinforcers (e.g., edibles, physical play, or enthusiastic proclamations of success such as “Fantastic!”). Between these brief (initially 30 seconds long) learning periods, staff members played with the children to keep the process more like play than work, generalize learned material into more natural settings, and continue to build social responsiveness.

Receptive language was generally targeted before expressive language. We used familiar instructions where success was easily prompted, such as “sit down” or “come here.” Expressive language began with imitation training, first nonverbal then vocal imitation, beginning with single sounds and gradually progressing to words.
Requesting was taught as early as possible, initially using nonverbal strategies if necessary (e.g., gesturing, signing, or the Picture Exchange Communication System, PECS [Bondy & Frost, 1994]), in order to reduce frustration (Carr & Durand, 1985) and increase the child’s frequency of communicative initiations (Hart & Risley, 1975). Children who showed more modest gains in treatment, referred to as visual learners by the UCLA group, denoting difficulty in processing language, took longer to acquire verbal imitation and language.

Having learned many labels, children were taught more complex concepts and skills, such as categorization and speaking in full sentences. Social interaction and cooperative play were taught as part of the in-home program, expanding from playing with staff, to playing with siblings, and then peers for up to 2 hours per day (this was more successful with the subgroup of rapidly learning children). As the children acquired social skills, they began mainstream (as opposed to special education) preschool, usually for just 1 or 2 half-days (2.5 hours each) per week. A trained shadow (one of the home treatment team members) initially accompanied the child to assist with attending to the teacher’s instructions, joining others on the playground, and noting social errors to be addressed in 1:1 sessions at home.

Pet. Reh. App.28

Those children who progressed at a rapid pace were taught the beginnings of inferential thought (e.g., “Why does he feel sad?”).
Social and conversation skills, such as topic maintenance and asking appropriate questions, were taught using role-playing (e.g., Jahr, Eldevik, & Eikeseth, 2000), video modeling (Charlop & Milstein, 1989), social stories (Gray, 1994), straightforward discussion of social rules and etiquette, and in-vivo prompting.

Academic skills were also targeted, raising the level of proficiency of rapidly learning children to first grade levels. Common classroom rules and school “survival skills” (e.g., responding to group instructions and raising one’s hand to be called on; Dawson & Osterling, 1997) were taught through “mock school” exercises with several peers at home.

Staff training. Therapists were at least 18 years old, had completed a minimum of 1 year of college, and were screened for prior police contacts. Therapists received 30 hours of training, which included a minimum of 10 hours of one-to-one training and feedback while working with their assigned child. Each therapist worked at least 6 hours per week (usually three 2-hour shifts) and attended weekly or bi-weekly team meetings. Senior therapists had at least a 4-year college degree and experience consisting of 1 year as a therapist with at least two children, followed by an intensive 16-week internship program modeled after that at UCLA, for a total of 2,000 hours.

Treatment fidelity. Senior therapists and clinic-directed therapists were required to meet quality control criteria set at UCLA. This involved passing two tests. The first was a written test designed to assess knowledge of basic behavioral principles and treatment procedures described in The Me Book (Lovaas et al., 1981).
Second, they were required to pass a videotaped review of their work (conducted by Tristram Smith, research director of the Multi-Site Project, who used the protocol described by R. Koegel, Russo, and Rincover, 1977). All senior therapists also received weekly supervision by the senior author.

Progress reviews, which the child, parents, and senior therapist attended, were held weekly for clinic-directed children and every 2 months for parent-directed children. At these reviews, the senior author or the UCLA-trained clinic supervisor observed the child’s performance and recommended appropriate changes in the program. Both the senior author and clinic supervisor had met the UCLA criteria for Level Two Therapist, denoting sufficient experience and expertise in program implementation to work independent of supervision. The senior author had directed a behaviorally oriented inpatient unit for children with autism for 14 years and had trained at UCLA for 6 months. The clinic supervisor had a BA in psychology, 1 year of experience as a therapist, 2 years of full-time experience as a senior therapist, and had completed a 9-month internship at UCLA.

Data Analysis

Data analysis was carried out by a fourth year graduate student from the University of Wisconsin Department of Statistics, with consultation from a university research psychologist. We conducted an ANOVA with a least squares solution for unequal group size, used to examine treatment effects. To compare the clinic-directed and parent-directed groups, we used 2 × 2 ANOVAs (Clinic-Directed vs. Parent-Directed × Pre- vs. Posttest scores as repeated measures). An initial examination of pre-post IQ data showed that the distribution of scores was bimodal.
As can be seen in Figure 1, children showed either rapid progress or more moderate progress, with no overlap between outcome distributions. This is consistent with earlier research (Birnbrauer & Leach, 1993; Howard, Sparkman, Cohen, Green, & Stanislaw, 2005; O. I. Lovaas, personal communication, August 27, 2003). Consequently, changes in scores for rapid learners and moderate learners were analyzed separately.

Figure 1. Changes in Full Scale IQ during 4 years of behavioral treatment.

Pet. Reh. App.29

In examining pretreatment scores of children who would later be identified as rapid learners, we found that those in the clinic-directed group had a higher mean IQ (60.40, standard deviation [SD] = 8.31) compared to those in the parent-directed group (51.00, SD = 7.02), t(9) = 1.84, p < .05 (one tailed), higher Vineland scores (clinic-directed = 64.8, SD = 2.32; parent-directed = 59.83, SD = 3.34), t(9) = 2.31, p < .05 (one tailed), and higher Verbal Imitation scores (clinic-directed = 3.88; parent-directed = 1.67), W(4, 6) = 31, p = .03 (Wilcoxon test). Because these pretreatment differences would interfere with clear interpretation of posttreatment differences between subgroups (e.g., clinic-directed vs. parent-directed rapid learners), these comparisons were omitted. We used linear and logistic regression (best subset selection approach; Hosmer, Jovanovic, & Lemeshow, 1989) to develop prediction models using pretreatment measures as predictors of 3-year outcome.

Results

The average Full Scale IQ for all 23 children increased from 51 to 76, a 25-point increase.
Eight of the children achieved IQs of 85 or higher after 1 year of treatment (5 clinic-directed and 3 parent-directed), and 3 more reached this level after 3 to 4 years (3 parent-directed) for a total of 11, or 48%, of the 23 children. Children with higher pretreatment IQs were more likely to reach 4-year IQs in the average range (75% of children with IQs between 55 and 64 versus 17%, 1 of 6 children with IQs between 35 and 44).

As shown in Table 2, there were no significant differences between groups at pre- or posttest. Combining children in both groups, we found that pretest to posttest gains were significant for Full Scale IQ, F(1, 21) = 18.77, p < .01, Verbal IQ, F(1, 18) = 13.39, p < .01, Performance IQ, F(1, 18) = 46.79, p < .01, receptive language, F(1, 21) = 9.18, p < .01, Vineland Communication, F(1, 21) = 7.57, p < .05, Vineland Socialization, F(1, 21) = 10.30, p < .01, Autism Diagnostic Interview-Revised Social Skills, F(1, 18) = 19.15, p < .01, and Communication, F(1, 18) = 41.19, p < .01.

Rapid and Moderate Learners

A group of rapid learners showed much larger improvements than did moderate learners (analogous to the terms best outcome and non-best outcome used in UCLA reports). Figure 1 shows Full Scale IQs prior to treatment and over the next 4 years for all 23 children. Eleven of them (5 clinic-directed and 6 parent-directed) showed a large increase in IQ, from a mean of 55 prior to treatment to 104 after 4 years. These rapid learners represented 48% of all 23 children.
The IQ of the remaining 12 children (8 clinic-directed and 4 parent-directed) did not show a significant increase, consistent with earlier UCLA reports (e.g., Smith et al., 2000).

Pre- and posttreatment scores of rapid and moderate learners are shown in Table 3. Rapid learners showed significant gains in all areas measured (i.e., Full Scale IQ, F(1, 21) = 143.19, p < .01, Verbal IQ, F(1, 18) = 70.76, p < .01, Performance IQ, F(1, 18) = 165.27, p < .01, Nonverbal IQ, F(1, 19) = 16.69, p < .01, Receptive Language, F(1, 20) = 217.76, p < .01, Expressive Language, F(1, 20) = 77.76, p < .01, and all Vineland subscales: Communication, F(1, 21) = 147.07, p < .01, Daily Living Skills, F(1, 21) = 20.50, p < .01, Socialization, F(1, 21) = 42.89, p < .01, and Adaptive Behavior Composite, F(1, 21) = 54.17, p < .01). However, the rate of increase over time, skill areas, and children was not uniform. As can be seen in Figure 2, during the first year, Performance IQ of rapid learners rose to the average range (a 40-point increase, WPPSI-R), whereas Verbal IQ and Vineland Socialization scores rose to around 80 (a 25-point increase) and language scores (Reynell and Clinical Evaluation of Language Fundamentals) rose only to the 60s. Changes during the second year of treatment were comparatively modest, perhaps reflecting the effect of having acquired speech during the first year but still lacking more complex language. The rate of improvement increased again during the third and fourth years, and all scores increased to the average range.

The gradual decrease in the slope of the graphs in Years 3 and 4 is largely an artifact of increasing age and does not reflect a decrease in rate of MA growth, which, except for the large increase during Year 1, averaged 18 months per year throughout the study.
This rate of growth in skills is necessary for children with pretreatment scores below 60 to “catch up” to peers. Although some writers have noted a rate of growth among treated children of 10 to 12 months per year, this is not enough for them to reach scores in the average range within just a few years (Howard et al., 2005), and the longer that children are delayed, the more skills they must learn to catch up.

Pet. Reh. App.30

Table 2. Pretreatment and Outcome Scores of Clinic- (CD) and Parent-Directed (PD) Groups

                       Pretreatment          Posttreatment         ANOVA, combined groups,
Measure         Group  Mean       SD         Mean       SD         pre- vs. posttreatment (df)
Full Scale IQ   CD     50.85      10.57      73.08      33.08      18.77 (1,21)**
                PD     52.10      8.98       79.60      21.80
Verbal IQ       CD                           78.00      33.48      13.39 (1,18)**
                PD                           76.30      26.66
Perform IQ      CD                           84.90      25.86      46.79 (1,18)**
                PD                           90.70      20.72
Nonverbal IQ    CD     70.58      16.54      77.58      25.24      2.07 (1,21)
                PD     82.67      14.94      89.44      18.35
Rec Language    CD     38.85      6.09       55.85      36.23      9.18 (1,21)**
                PD     38.78      6.44       65.78      25.81
Exp Language    CD     47.92      6.17       53.38      31.91      1.30 (1,20)
                PD     48.44      6.96       59.22      25.13
Vineland Com    CD     57.46      4.97       73.69      32.32      7.57 (1,21)*
                PD     63.20      5.58       81.40      24.33
Vineland DLS^a  CD     63.92      5.53       66.23      25.95      .11 (1,21)
                PD     64.20      3.68       64.20      12.42
Vineland Soc    CD     58.38      6.17       73.92      23.49      10.30 (1,21)**
                PD     60.30      5.76       68.90      10.11
Vineland ABC^b  CD     59.54      5.31       69.00      28.04      2.81 (1,21)
                PD     60.90      5.94       66.70      14.68
ADI-R^c Social  CD     17.54      3.73       12.33      10.58      19.15 (1,18)**
                PD     18.90      1.14       13.10      9.42
ADI-R^c Com     CD     12.85      2.44       8.08       6.91       41.19 (1,18)**
                PD     12.90      1.22       8.80       7.43
ADI-R^c Ritual  CD     5.38       1.69       5.08       3.75       1.72 (1,18)
                PD     6.40       1.11       5.60       3.50

Note. CD n = 13; PD n = 10 except for Verbal IQ and Performance IQ, where n was 10 for both groups because 3 CD children had only Bayley tests. Neither the main effect of groups (CD vs. PD) nor the interaction of groups by time was significant for any variable. Full Scale IQs at pretreatment are Bayley scores.
^a Daily living skills. ^b Adaptive Behavior Composite. ^c Autism Diagnostic Interview-Revised.
*p < .05. **p < .01.

Pet. Reh. App.31

Table 3. Pretreatment and Outcome Scores of Rapid (R) and Moderate (M) Learners

                       Pretreatment          Posttreatment         ANOVA pre-post
Measure         Group  Mean       SD         Mean       SD         comparisons (df)
Full Scale IQ   R      55.27      8.96       103.73     13.35      143.19 (1,21)**
                M      47.83      9.37       50.42      6.98       0.45 (1,21)
Verbal IQ       R                            101.45     18.72      70.76 (1,18)**
                M                            47.44      2.06       .02 (1,18)
Perform IQ      R                            107.55     9.44       165.27 (1,18)**
                M                            63.67      8.43       11.81 (1,18)**
Nonverbal IQ    R      83.56      14.84      108.78     10.96      16.69 (1,19)**
                M      69.83      15.93      67.70      12.35      0.19 (1,19)
Rec Language    R      39.30      6.91       93.60      12.64      217.76 (1,20)**
                M      38.42      5.59       31.83      9.87       3.84 (1,20)
Exp Language    R      49.90      7.75       85.70      15.07      77.76 (1,20)**
                M      47.50      6.54       30.83      5.89       20.24 (1,20)**
Vineland Com    R      60.82      4.02       105.09     12.83      147.07 (1,21)**
                M      59.17      7.22       51.33      10.94      5.07 (1,21)*
Vineland DLS^a  R      66.45      4.25       82.27      16.34      20.50 (1,21)**
                M      61.83      4.20       49.83      10.61      12.87 (1,21)**
Vineland Soc    R      61.55      6.58       87.73      14.94      42.89 (1,21)**
                M      57.08      4.63       57.08      6.40       0.00 (1,21)
Vineland ABC^b  R      61.73      4.59       88.64      15.68      54.17 (1,21)**
                M      58.67      6.09       49.08      7.76       7.51 (1,21)*
ADI-R^c Social  R      16.45      3.26       4.18       4.37       46.89 (1,21)**
                M      19.67      1.55       21.18      6.28       0.43 (1,21)
ADI-R^c Com     R      11.00      3.54       2.00       2.73       52.04 (1,21)**
                M      13.75      0.60       14.81      3.59       1.26 (1,21)
ADI-R^c Ritual  R      5.91       1.62       2.73       2.67       16.46 (1,21)**
                M      5.92       1.44       7.91       2.47       4.87 (1,21)*

Note. R n = 11; M n = 12. Posttreatment language scores for moderate learners are Reynell ratio scores (AE/CA), which are about 10 points lower than standard scores. Effect size expressed as proportion of variance was .88 for Full Scale IQ, .90 for receptive language, .84 for expressive language, and .73 for Vineland ABC, all quite large (Cohen, 1988). Full Scale IQs at pretreatment are Bayley scores.
^a Daily living skills. ^b Adaptive Behavior Composite. ^c Autism Diagnostic Interview-Revised.
*p < .05. **p < .01.

Pet. Reh. App.32

Figure 2. Mean IQ, language, and socialization scores during treatment for rapid (RL) and moderate (ML) learners, plotted by year in treatment. Initial IQ and language scores are ratio scores as are all language scores of moderate learners.

Most parents waited until their children were 6 years old to enter kindergarten, per our recommendation, in order to allow them more time to acquire social interaction skills. At an average age of 7.67, the 11 rapidly learning children were succeeding in regular first or second grade classes following the regular curriculum.
On the Woodcock-Johnson III Tests of Achievement, Oral Expression averaged 102 (SD = 11.9, 1 scored below 85), Listening Comprehension averaged 101 (SD = 15.27, 2 scored below 85), Broad Reading averaged 105 (SD = 11.9, all scored over 85), Broad Math averaged 104 (SD = 18.4, 1 scored below 85), Spelling averaged 112 (SD = 18.83, all scored over 85), and general Academic Knowledge averaged 98 (SD = 18.1, 2 scored below 85). Three children had aides because of inattentiveness and 3 received speech therapy, although all spoke fluently.

The 12 moderate learners showed a significant improvement in Performance IQ, F(1, 18) = 11.81, p < .01, as shown in Table 3, but the posttreatment mean score (63.67) was over two SDs below the average range. Although these children did not “catch up” to peers, they did show increases in developmental age equivalents. Cognitive skills increased from 16 to 44 months; adaptive skills, from 16 to 37 months; language skills, from less than 12 months to 27 months; and social skills, from 10 to 31 months. At the end of the study, these children were continuing to gain skills at a rate of 3.4 to 4.3 months per year in expressive language and social skills, respectively. All but 2 of them acquired speech, allowing them to communicate basic needs while also reducing frustration. Two thirds learned to read simple stories (e.g., first grade level words with two sentences per page). Most acquired the ability to relate to others and to play with peers. Four of the children were in regular classes with an aide, but all had a modified curriculum.
Six children had a mixture of some time in regular class and some time in special education, and 2 were in full-time special education classes (one for students with cognitive disabilities and the other for those with emotional disturbances).

Assessment of Residual Symptoms in Rapidly Learning Children

Parents completed the Personality Inventory for Children for all 23 children. As shown in Table 4, rapidly learning children as a group scored in the average range on all factor scales, although 2 scored in the clinically significant range on Factor III (they tended to worry). Moderate learners were rated as having more tantrums (Factor I), more difficulty interacting with others (Factor II), and more learning problems (Factor IV).

Parents and teachers completed the Child Behavior Checklist for all 23 children. Results were analyzed using 2 × 2 ANOVAs (Rapid Learners vs. Moderate Learners × Parent vs. Teacher as repeated measures). As shown in Tables 4 and 5, rapid learners as a group scored in the nonclinically significant range on all scales, although they did score less normally than did moderate learners on Scale 3 (they worried more). Moderate learners were rated as less interactive (Scale 1), more preoccupied (Scale 5), less attentive (Scale 6), and more easily frustrated (Scale 8).

The differences in Child Behavior Checklist ratings between parents and teachers were small, reaching significance on two scales (1 and 8). However, these results largely reflected differences within the average range. Parents did not rate any children in the clinically significant range on either scale, and teachers rated only 2 children on Scale 1 (both moderate learners) and 3 on Scale 8 in the significant range (1 rapid and 2 moderate learners).

Pet. Reh. App.33

Table 4. Mean Scores of Rapid and Moderate Learners on Posttreatment Only Tests of Residual Symptoms: Parent Ratings

Measure              Rapid (R), n = 11    Moderate (M), n = 12    R vs. M^c
                     Mean (SD)            Mean (SD)               t
PIC^a factor
  I                  53.45 (9.38)         66.83 (12.93)           3.43**
  II                 62.36 (8.34)         79.25 (9.42)            4.86**
  III                55.27 (13.90)        49.73 (8.77)            1.06
  IV                 64.18 (13.65)        97.55 (18.77)           5.13**
Child Behavior Checklist^b scale
  1                  59.09 (6.26)         58.83 (6.27)            0.01
  3                  55.40 (6.14)         51.75 (3.06)            1.80*
  4                  57.82 (7.49)         61.92 (7.35)            1.61
  5                  65.64 (9.87)         70.42 (7.92)            1.64
  6                  62.64 (9.12)         67.67 (8.17)            1.73*
  8                  52.91 (4.98)         53.33 (4.62)            0.08

^a Personality Inventory for Children and Child Behavior Checklist scores ≥70 are clinically significant and scores ≥67 are borderline. Scores below those levels are not reliably different from “normal” (Achenbach, 1991b; Lachar, 1982). Factor I = Undisciplined/Poor Self Control, II = Social Incompetence, III = Internalizing/Somatic Symptoms, IV = Cognitive Development. ^b Scale 1 = Withdrawn, 3 = Anxious/Depressed, 4 = Social Problems, 5 = Thought Problems, 6 = Attention Problems, 8 = Aggression. ^c t tests are one-tailed, with a df of 19.
*p < .05. **p < .01.

Whereas checklists such as the Personality Inventory for Children and the Child Behavior Checklist can be used to assess the presence of problems, the Classroom Edition of the Vineland is used to assess the presence of skills (e.g., “initiates conversation,” “responds to hints or indirect cues in conversation”). Teachers completed this measure for all 23 children except the 2 who were among the highest functioning. As shown in Table 5, teacher ratings of Communication and Socialization for the remaining 9 rapid learners were in the average range.
Moderate learners were rated as having deficiencies in both areas.

Table 5. Mean Scores of Rapid and Moderate Learners on Posttreatment Only Tests of Residual Symptoms: Teacher Ratings

                          Vineland                  Child Behavior Checklist scales(a)
Learners              Comm.    Social       1        3        4        5        6        8
Rapid (R), n = 11     94.44    89.89      57.00    55.90    56.73    65.55    59.36    57.60
  (SD)               (13.97)  (18.36)     (7.34)   (6.93)   (6.30)  (11.37)  (12.33)   (6.11)
Moderate (M), n = 12  58.58    61.58      64.33    55.17    58.00    72.58    63.25    61.25
  (SD)                (7.90)   (6.02)     (6.03)   (6.56)   (5.57)   (7.06)   (7.94)   (7.45)
R vs. M(b)             6.84**   4.60**     2.93**   0.36     0.37     2.41*    1.33     2.86**

(a) Child Behavior Checklist scores ≥67 are borderline. Scores below these levels are not reliably different from “normal” (Achenbach, 1991b; Lachar, 1982). t tests are one-tailed. Scale 1 = Withdrawn, 3 = Anxious/Depressed, 4 = Social Problems, 5 = Thought Problems, 6 = Attention Problems, 8 = Aggression. (b) t tests are one-tailed, with a df of 19.
*p < .05. **p < .01.

We examined changes in behavior related to diagnosis by comparing the Autism Diagnostic Interview-Revised administered prior to and after 3 years of treatment using 2 X 2 ANOVAs (Rapid Learners vs. Moderate Learners X Pretreatment vs. Posttreatment as repeated measures). As shown in Table 3, rapid learners as a group showed significant improvements on all three Autism Diagnostic Interview scales: Communication, F(1, 21) = 52.04, p < .01, Reciprocal Interaction, F(1, 21) = 46.89, p < .01, and stereotyped behaviors, F(1, 21) = 16.46, p < .01. Eight of 11 rapid learners scored in the nonautistic range in all three areas, and many had their diagnoses removed by the referring child psychiatrists. Of the rapid learners who had remaining problems, 1 still had some language delays, 1 was rigid in play, and 1 was elevated in all three areas. The latter child had received treatment from a non-UCLA affiliated provider after the first year.

Table 6. Combined Parents’ and Teachers’ Ratings of Residual Symptoms of Rapid Learners

Child(a); Social Skills: VABS(b) Com, Soc; Isolates: PIC 1 & 2; Not liked: CBC 1, 4; Anxious: CBC 3, PIC 3; Inattention: CBC 5, 6; Moody: CBC 8

50  50  51  57.5  51
50  50  50  68.3  56.3
47.7  48.3  51.3  52  60
59
55.3  57.3  60  63.8  61.3  62.3
68.3  46.3  51.3  63.7  67.0  51.3
CD 1, 2, 3, 4, 5
104  115.5  115  101.3  95.5
50  50  55  79.5  62.5
50  50  50  65.5  53
PD 1, 2, 3, 4, 5, 6
107.5
54  54.5  67.5  54.5
77.5  64.8  61.5  77.5  69  70.8  58
86.5  67  67.8  51
99.5  64  65  55.5
79.5
54.5  67.5

(a) CD = clinic directed, PD = parent directed. (b) Vineland Adaptive Behavior Scales (VABS) scores below 85 are moderately low and 116-130, moderately high. (c) Personality Inventory for Children (PIC) and Child Behavior Checklist (CBC) scores ≥70 are clinically significant; and ≥67, borderline; below these levels, are not reliably different from “normal” (Achenbach, 1991b; Lachar, 1982).

Combined measures of residual symptoms are shown in Table 6. Eight of 11 rapid learners showed increases in social skills to the adequate range (above 85), although 3 had some borderline problems, including 1 who had significant problems with Preoccupation/Inattention.
The remaining 3 rapid learners showed moderately low social skills (below 85), and 2 had problems with Preoccupation/Inattention, one of which was clinically significant. All 3 of these latter children were in the parent-directed group and took longer than 2 years to achieve IQ in the average range. These results are similar to those described in UCLA reports, where 3 of 8 best outcome children scored below 85 on Vineland Communication, 3 were elevated on the Vineland Maladaptive Behavior scale, and 5 had at least one significant elevation on the Personality Inventory for Children. In interpreting these results, McEachin et al. (1993) noted that 3 of their nonclinical children also had significant Personality Inventory elevations.

Predicting Outcome
Early Learning Measure. Performance of rapid and moderate learners on each of the four subscales of the Early Learning Measure is shown in Figure 3. As can be seen, the difference in their rates of learning was evident early in treatment. Thirteen of 23 children passed the Early Learning Measure (90% correct on verbal imitation). All 11 who later achieved scores in the average range passed by 16 weeks of treatment (9 children) or before reaching 42 months of age (2 children).
Pretreatment variables.
Table 7 shows correlations between pretreatment variables and three outcome variables following 3 years of treatment: (a) Full Scale IQ; (b) Language, defined as the mean of three measures (Vineland Communication scores from parents and teachers representing language usage at home and school and language scores from the Reynell or Clinical Evaluation of Language Fundamentals); (c) Social Skills, defined as the mean of three measures (Vineland Socialization scores from parents and teachers and Factor II, Social Incompetence, from the Personality Inventory for Children).
The ability to imitate on the Early Learning Measure was highly correlated with outcome in all three areas. Seven children were able to imitate 3 of 20 sounds prior to treatment (mean total sounds imitated during the first three Early Learning Measures was 2.43, range = 0 to 15, SD = 4.04), and all went on to achieve IQs in the average range.

[Figure 3: four panels plotting Rapid Learners and Moderate Learners across Baseline and Treatment phases, x-axis Three Week Probes (1 to 13). Panels: Verbal Imitation (90% Correct), Non Verbal Imitation (90% Correct), Receptive Instructions (80% Correct), Expressive Labels (80% Correct).]
Figure 3. Performance of rapid (RL) and moderate (ML) learners on the Early Learning Measure.

We used linear regression using the best subset approach (Hosmer et al., 1989) to select the most powerful predictors for each outcome area. Based on previous research, potential predictor variables included IQ, imitation, language, social relatedness, and severity of symptoms. Posttreatment IQ was best predicted by the subset of variables including pretreatment Early Learning Measure (receptive language, nonverbal imitation, and verbal imitation), pretreatment IQ, Autism Diagnostic Interview Impairment in Social Interaction (low social interest, unresponsive to others’ approaches, lack of shared attention), and Autism Diagnostic Interview Communication scores.
This set of variables yielded a correlation of .83 with posttreatment IQ, which is a strong relationship. The amount of variation in posttreatment IQ explained by this subset of pretreatment variables was 70%.

Table 7. Correlations Between Pretreatment and Posttreatment Measures

Follow-up columns: One year (IQ, IQ change); Three year (IQ, Language, Social)
Pretreatment measure(a)

.46*
.37  .19
.35  .24
.41  .27
.45*  .31
.41  .54*  .27  .59**
.71**  .46*
.69**  .56**  .56**  .69**
.81**
Reynell: Expressive, Comprehension
.30
ELM: Nonverbal Imitation, Exp. Labeling, Rec. Instructions, Verbal Imitation
.59**  .48*  .47*
.62**
.56**  .65**
.65**  .67**  .80**
VABS: Communication, DLS(b), Motor, Socialization, Composite; Merrill-Palmer IQ; Bayley Ratio IQ
.49*  .57*
.20  .51*
.35  .40  .16  .31  .32  -.01  -.01
.33  .57**  .17  .41*  .37  .08  .45*
-.49*  -.22  -.12
-.35  -.18  -.17
-.59**  -.63**
.36  .44*
.56*
44*
.41
60**
.63**
22  43*  43*  34
.27  .47*  .46*  -.07  .28
-.52*  -.50*  -.10  92**  84**
-.57**  -.52*  -.10  82**  75**
06
ADI-R: Communication, Socialization, Ritualistic; First year IQ change; IQ at one year
.86**  .86**
-.12  .87**  .75**

(a) Reynell = Reynell Developmental Language Scales, ELM = Early Learning Measure, VABS = Vineland Adaptive Behavior Scales, ADI-R = Autism Diagnostic Interview-Revised. (b) Daily Living Skills.
*p < .05. **p < .01.

Social skill acquisition was also predicted by the pretreatment ability to imitate.
The subset of variables, including pretreatment Early Learning Measure scores (receptive language, nonverbal imitation, and verbal imitation) and Autism Diagnostic Interview Communication yielded a correlation of .90 with posttreatment social skill scores, a strong relationship. The amount of variance in posttreatment social skill scores explained by this subset of pretreatment variables was 82%.
Finally, language skill acquisition was also predicted by the pretreatment ability to imitate. The subset of variables including pretreatment Early Learning Measure scores (receptive language, nonverbal imitation, and verbal imitation), Vineland Daily Living Skills, and Autism Diagnostic Interview Communication yielded a correlation of .87 with posttreatment language scores, a strong relationship. The amount of variance in posttreatment language scores explained by this subset of pretreatment variables was 75%.
Parents of 6 children (26%) reported acquisition of 5 to 25 words, all of which were later lost between 15 and 26 months of age. Language regression in other studies has varied between 20% and 50% (Howlin, 1998), with a mean near 30% (Shinnar et al., 2001) and median age of 18 months (Tuchman & Rapin, 1997). Shinnar et al. reported that among those children who regained some language, only 8% achieved typical language. In the present study, loss of speech was not related to outcome. Three rapid learners and 3 moderate learners had a clear loss, and 6 rapid learners and 2 moderate learners had no loss (Rapid Learners vs. Moderate Learners X Pre- vs.
Posttreatment, χ2 (1, N = 14) = .16, ns). Three of 6 children with clear regression (50%) achieved typical language. However, having no speech at the start of treatment (age 36 months), whether from earlier loss (and not having recovered any) or never having developed speech, was associated with slower learning.
We used logistic regression to develop models to predict the probability of achieving 3-year outcome scores in the average range based on pretreatment measures. The most accurate model for the current set of data combined pretreatment Verbal Imitation from the Early Learning Measure and pretreatment Autism Diagnostic Interview Communication as follows: p/(1 - p) = e^y, where e = (approximately) 2.718284 and y = [1.76 (total verbal imitation items correct out of 20 trials from standard set administered three times, 3 weeks apart) - 2.64 (Autism Diagnostic Interview-Communication score) + 32.57]. Using a score above 0.5 to classify children as potentially “best outcome,” this model correctly predicted 10 of 11 such children (sensitivity = 10/11 = .91), with one false positive and one false negative (specificity = 21/23 = .91). Predictive power was .91.
Hours of treatment. Table 8 shows the distribution of direct intervention hours for rapid learners during treatment. Most children received predominantly 1:1 intervention during the first year, and then gradually spent more time in school. Once children were able to use language, treatment was focused increasingly on building the social skills necessary to function in school and to interact with peers.
The number of weekly hours of treatment seemed less related to outcome than did pretreatment variables.
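The loss-of-speech comparison and the logistic model reported above can be checked with a short calculation. The following is a minimal sketch in Python and is not part of the published study. The function names and the example scores (5 verbal imitation items, Communication score of 14) are hypothetical. The true-negative count of 11 is an assumption derived from 23 children minus the 10 correct predictions, 1 false negative, and 1 false positive. Applying the Yates continuity correction is also an assumption, chosen because it reproduces the reported value of .16.

```python
import math


def best_outcome_probability(verbal_imitation_correct, adi_communication):
    """Probability implied by the reported model p / (1 - p) = e**y."""
    y = 1.76 * verbal_imitation_correct - 2.64 * adi_communication + 32.57
    # Solving p / (1 - p) = e**y for p gives the logistic function.
    return 1.0 / (1.0 + math.exp(-y))


def classification_summary(tp, fn, fp, tn):
    """Conventional sensitivity, specificity, and overall accuracy."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


def yates_chi_square(table):
    """Chi-square for a 2 x 2 table with the Yates continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (abs(observed - expected) - 0.5) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Hypothetical child: 5 verbal imitation items correct, ADI-R Communication = 14.
    print(round(best_outcome_probability(5, 14), 3))
    # Counts from the text; tn = 11 is inferred, not stated in the source.
    print(classification_summary(tp=10, fn=1, fp=1, tn=11))
    # Rows: clear loss, no loss. Columns: rapid learners, moderate learners.
    print(round(yates_chi_square([[3, 3], [6, 2]]), 2))
```

Under these assumptions the corrected chi-square rounds to .16, matching the reported statistic. The quantity 21/23 is the overall proportion classified correctly, while the conventional specificity for the same counts would be 11/12; both are close to .91.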
Rapid learners averaged 34 hours per week during the first year (range = 25 to 40) and 31 during the second year (range = 20 to 39). Those who learned at a more moderate rate had identical averages, although they had less peer play due to limited play and language skills.
The hours shown in Table 8 do not include time spent by parents generalizing gains made in therapy, which they found quite difficult to estimate. In an effort to assess the impact of parental involvement, senior therapists rated parents on the percentage of involvement in their child’s treatment during the first year. Although the correlation with outcome, r = .32, was not significant, the real impact of parental involvement may not be seen until formal treatment has ceased, when parents who were more involved all along and, therefore, acquired more skills, may be better prepared to help their child deal with new challenges.

Table 8. Average Allocation of Treatment Hours Over Time for Rapid Learners

                                        Years of treatment
Staffing          .5        1        1.5       2        2.5       3        3.5       4        4.5
n                 11        11       10        8        7         7        7         6        6
1:1               33        29       24        22       20        18       15        12       10
                (15-40)   (16-35)  (10-33)   (15-31)  (10-27)   (5-28)   (0-25)    (4-25)   (0-15)
School            5         6        8         8        12        13       18        28       33
                (0-12)    (0-12)   (0-25)    (0-16)   (8-20)    (8-25)   (8-30)    (15-35)  (25-35)
School shadow     1         1        4         5        8         11       7         5        5
                (0-5)     (0-5)    (0-15)    (0-15)   (3-15)    (6-18)   (0-18)    (0-12)   (2-15)
Peer shadow       0         3        3         6        5         4        4         3        2
                (0)       (0-5)    (0-5)     (2-9)    (0-9)     (0-8)    (2-8)     (0-6)    (0-4)
Total             34        33       31        33       33        33       26        21       17
                (25-40)   (26-40)  (20-37)   (20-39)  (25-37)   (20-40)  (7-40)    (6-31)   (12-20)

Note. Ranges are in parentheses. Total hours include school hours only when a shadow was present. Hours are for children still in treatment at each point in time. One child transferred to another provider after 1 year.
Children began “graduating” from treatment after 2 years. Children who had difficulty learning complex material maintained full hours longer, but treatment focused more on 1:1 hours to teach skills and less on peer interaction due to lower social interest and language delays.

Among rapid learners, the number of hours of structured home-based peer play was significantly related to teachers’ ratings of social skills at 4 years. Although most children began peer play by 48 months of age, those who were subsequently rated by teachers as being within the average range (Vineland Socialization score of at least 90, and no Child Behavior Checklist scores over 65 on Scale 1 (Withdrawn) or Scale 4 (Social Problems)), had several things in common. By age 54 months, they were all receiving at least 6 (mean = 8) hours of supervised peer play per week with at least two unfamiliar peers (i.e., not siblings or cousins), and this continued for at least 6 months (M = 13), p = .008 (Fisher Exact Test).
Supplemental treatments. Of the 23 children participating, 22 received some type of supplemental treatment prior to or during the first year of treatment (19 of 23 children). These services consisted of special education (21), preschool (2), and private therapies beyond what was offered in school: speech (5), sensory integration (7), auditory integration training (2), music therapy (1), and horseback riding (1). Hours per week of supplemental treatment ranged from 0 to 14 (average = 6) prior to and 0 to 15 (average = 7) hours during the first year of treatment. Between the first and third year of treatment, biomedical management became more popular, and more parents tried it.
Nine children were on Gluten-Casein free diets (for 1 month to 21 months), 10 received mega-vitamins and/or dimethylglycine, DMG (for 1 month to 3 years), 4 received Secretin (1 to 4 doses), 4 were given Nystatin (for 1 month to 12 months), and 1 received 20 doses of Intravenous Immune Globulin. However, the correlation between hours of supplemental treatment and outcome (-.335 with IQ, -.384 with language, and -.334 with socialization) and that between the use of biomedical treatments and outcome (-.050 with IQ, -.108 with language, and -.141 with socialization) were low and not significant, supporting the conclusion that the increases in skills observed in this study were not the result of these interventions.

Discussion
In the present study we demonstrated that the UCLA early intensive behavioral treatment program could be implemented in a clinical setting outside a university with a similar sample and that the earlier findings by the UCLA group regarding favorable outcome (Lovaas, 1987; McEachin et al., 1993) could in large part be replicated without aversives.
Following 2 to 4 years of treatment, 11 of 23 children (48%) achieved Full Scale IQs in the average range, with IQ increases from 55 to 104, as well as increases in language and adaptive areas comparable to data from the UCLA project. At age 7, these rapid learners were succeeding in regular first or second grade classes, demonstrated generally average academic abilities, spoke fluently, and had peers with whom they played regularly.
Parent-directed children, who received 6 hours per month of supervision (usually 3 hours every other week, which is much more than “parent-managed” or “workshop” supervision), did about as well as clinic-directed children, although they received much less supervision. This was unexpected, and it may have been due in part to parent-directed parents taking on the senior therapist role, filling cancelled shifts themselves, actively targeting generalization, and pursuing teachers and neighbors to find peers for daily play dates with their children. Although many parent-directed parents initially made decisions regarding treatment that resulted in their children progressing slowly (e.g., using their treatment hours for ineffective interventions or pushing children to learn advanced skills before they were ready), resulting in frustration and occasionally “shutting down,” many parents then sought input from treatment supervisors and rapidly learned to avoid making the same mistake twice, becoming quite skillful after a few months.
Several measures were used to assess residual symptoms of autism among rapid learners, and while generally not clinically significant, some were found, particularly among those children who achieved average IQ after several years of treatment.
About one third of the rapid learners were seen as having mild delays in social skills. Seeming preoccupied was also a common problem for which 3 children were assigned classroom aides because they “needed reminders to stay on task.” Lovaas (1987) did not mention that aides were assigned to any of his “best outcome” children, and it is possible that our children were not as “normal.” However, McEachin et al. (1993) found that in spite of scoring in the clinically significant range in one or two areas, children were able to maintain their skills, scoring in the average range on standardized tests of cognitive, emotional, and social variables and to succeed in regular classes at follow-up 6 years after treatment was stopped.
The strongest pretreatment predictors of outcome were imitation, language, daily living skills, and socialization. Rapid acquisition of new material as measured by the Early Learning Measure, first year IQ, and change in IQ after 1 year were also strong predictors. These findings are consistent with previous research. A model with 91% accuracy was derived for predicting whether a child in the present sample would be a rapid or moderate learner. The usefulness of the model must await validation with other similar samples. We note that one of the two predictors in the model was pretreatment verbal imitation, which is not widespread among untreated 3-year-old children with autism.
However, the model may not discriminate among children above some as yet undetermined age because they often acquire imitation by school age (Charman et al., 1997).
Because we used the Bayley to determine pretreatment IQ and Wechsler tests at follow-up, there was a possibility that the observed increases in IQ may have reflected the use of different tests instead of treatment effects. To examine this, we compared changes in scores over time from Bayley at Time 1 to Bayley at Time 2, with changes from Bayley at Time 1 to Wechsler test at Time 2. One rapid learner was tested using the Bayley at pretreatment and again after 1 year of treatment because he was still only 3 years old. His score increased from 44 to 97, similar to increases seen in rapid learners tested with the Bayley at pretreatment and the WPPSI-R at 1 year. Ten moderate learners were tested using the Bayley at pretreatment and again after 1 year of treatment, and with Wechsler tests thereafter. For these children, Bayley to Bayley IQs increased from 47.2 to 54.3. Bayley to Wechsler IQs increased from 53.7 to 54.6. Therefore, there did not seem to be an effect on IQs attributable to using different tests.
Another possible confound was that most pre- and posttesting of moderate learners was done by the second author, perhaps introducing bias. However, the correlation between scores obtained by the second author and unaffiliated community psychologists was high, and the finding of little improvement over time on standardized tests for children in this subgroup is consistent with previous findings. A related question is whether the positive findings among rapid learners were due to treatment or maturation. Arguing against the maturation hypothesis is the negligible improvement of children receiving community services found in several longitudinal studies (Eikeseth et al., 2002; Lord & Schopler, 1989; Lovaas, 1987; Sheinkopf & Siegel, 1998).
Although we matched on age and IQ and employed random assignment, this was not sufficient to ensure equal samples. Other pretreatment variables, such as imitation, correlated even more strongly with outcome and were not equal in the two groups. As a result, we were unable to interpret treatment effects among subgroups of rapid learners. Further, the small number of children in the study limited the power of statistical tests to detect differences, and the many tests on such a small sample increased the likelihood of spurious findings, thereby limiting the implications of results for the larger population of children with autism. However, because some treatment effects were so large and have been found in other studies (e.g., that a subset of the children do well), the current results can be seen as supporting an existing body of research.
We found two interesting correlations that deserve further study. First, ratings of parental involvement were weakly related to outcome, suggesting that more overt efforts to increase parents feeling capable of contributing to treatment planning may enhance treatment effects (Ramey et al., 1992). Second, acquisition of social skills was positively related to amount and duration of supervised peer play. Some parents were uncomfortable approaching other parents to set up play dates, and problems doing so may provide a partial explanation for the lower social skills scores of their children. Even so, amount and duration of supervised peer play are surely just a few of the variables that affect acquisition of social skills.
Although we do have several powerful interventions, including incidental teaching, role playing, and video modeling, to teach a curriculum of social conversation, cooperative play, and understanding the nonverbal communication of others, building typical social skills remains a work in progress (McConnell, 2002).
Hours of treatment in this study came closer than any previous replication to the intensity of hours provided in the UCLA study (Lovaas, 1987), averaging 38 hours per week for 2 years in the clinic-directed group, and the results were also the most comparable. Forty-eight percent of the children showed dramatic increases in cognitive and social skills and were able to succeed in regular education classes. However, high hours and intensive supervision were not sufficient to make up for low levels of pretreatment skills. Consistent with previous studies, low IQ (below 44) and absence of language (no words at 36 months) predicted limited progress, whereas rate of learning, imitation, and social relatedness predicted favorable outcomes (Lord, 1995). Although starting at a disadvantage, children learning at a moderate rate were still acquiring new skills after 4 years. We intend to follow all of the children for several more years to determine their outcome in adolescence and adulthood.

References
Achenbach, T. M. (1991a). Child Behavior Checklist. Burlington: University of Vermont Department of Psychiatry. (Available from ASEBA, 1 S. Prospect St., Burlington, VT 05401-3456 and online at http://checklist.uvm.edu)
Achenbach, T. M. (1991b). Manual for the Teacher’s Report Form and 1991 Profile.
Burlington: University of Vermont Department of Psychiatry. (Available from ASEBA, 1 S. Prospect St., Burlington, VT 05401-3456, and online at http://checklist.uvm.edu)
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.
Anderson, S. R., Avery, D. L., DiPietro, E. K., Edwards, G. L., & Christian, W. P. (1987). Intensive home-based intervention with autistic children. Education and Treatment of Children, 10, 352-366.
Bayley, N. (1993). Bayley Scales of Infant Development (2nd ed.). San Antonio: Psychological Corp.
Bibby, P., Eikeseth, S., Martin, N. T., Mudford, O. C., & Reeves, D. (2002). Progress and outcomes for children with autism receiving parent-managed intensive interventions. Research in Developmental Disabilities, 23, 81-104.
Bierman, K. L., & Welsh, J. A. (1997). Social relationship deficits. In E. J. Mash & L. G. Terdal (Eds.), Assessment of childhood disorders (3rd ed., pp. 328-365). New York: Guilford Press.
Birnbrauer, J. S., & Leach, D. J. (1993). The Murdoch Early Intervention Program after 2 years. Behavior Change, 10, 63-74.
Bondy, A., & Frost, L. (1994). The Picture-Exchange Communication System. Focus on Autistic Behavior, 9, 1-19.
Bono, M. A., Daley, T., & Sigman, M. (2004). Relations among joint attention, amount of intervention and language gain in autism. Journal of Autism and Developmental Disorders, 34, 495-505.
Carr, E. G., & Durand, V. M. (1985). Reducing behavior problems through functional communication training. Journal of Applied Behavior Analysis, 18, 111-126.
Charlop, M. H., & Milstein, J. P. (1989).
Teaching autistic children conversational speech using video modeling. Journal of Applied Behavior Analysis, 22, 245-285.
Charman, T., Swettenham, J., Baron-Cohen, S., Cox, A., Baird, G., & Drew, A. (1997). Infants with autism: An investigation of empathy, pretend play, joint attention, and imitation. Developmental Psychology, 33, 781-789.
Dawson, G., & Osterling, J. (1997). Early intervention in autism. In M. Guralnick (Ed.), The effectiveness of early intervention. Baltimore: Brookes.
Eikeseth, S., Smith, T., Jahr, E., & Eldevik, S. (2002). Intensive behavioral treatment at school for 4- to 7-year-old children with autism: A one-year comparison controlled study. Behavior Modification, 26, 49-68.
Eldevik, S., Eikeseth, S., Jahr, E., & Smith, T. (in press). Effects of low-intensity behavioral treatment for children with autism and mental retardation. Journal of Autism and Developmental Disorders.
Fenske, B. C., Zalenski, S., Krantz, P. J., & McClannahan, L. E. (1985). Age at intervention and treatment outcome for autistic children in a comprehensive intervention program. Analysis and Intervention in Developmental Disabilities, 5, 49-58.
Gray, C. (1994). The social story book. Arlington, TX: Future Horizons.
Green, G. (1996). Early behavioral intervention for autism: What does research tell us? In C. Maurice, G. Green, & S. C. Luce (Eds.), Behavioral intervention for young children with autism (pp. 29-44). Austin, TX: Pro-Ed.
Gresham, F. M., & MacMillan, D. L. (1998). Early intervention project: Can its claims be substantiated and its effects replicated? Journal of Autism and Developmental Disorders, 28, 5-13.
Harris, S. L., & Handleman, J. S. (2000). Age and IQ at intake as predictors of placement for young children with autism: A four- to six-year follow up. Journal of Autism and Developmental Disorders, 30, 137-142.
Harris, S. L., Handleman, J. S., Gordon, R., Kristoff, B., & Fuentes, F. (1991). Changes in cognitive and language functioning of preschool children with autism. Journal of Autism and Developmental Disorders, 21, 281-290.
Hart, B., & Risley, T. R. (1975). Incidental teaching of language in the preschool. Journal of Applied Behavior Analysis, 8, 411-420.
Hosmer, D. W., Jovanovic, B., & Lemeshow, S. (1989). Best subset logistic regression. Biometrics, 45, 1265-1270.
Howard, J. S., Sparkman, C. R., Cohen, H. G., Green, G., & Stanislaw, H. (2005). A comparison of intensive behavior analytic and eclectic treatments for young children with autism. Research in Developmental Disabilities, 26, 359-383.
Howlin, P. (1998). Children with autism and Asperger syndrome: A guide for practitioners and carers. Chichester, West Sussex, England: Wiley.
Jacobson, J. W., Mulick, J. A., & Green, G. (1998). Cost-benefit estimates for early intensive behavioral intervention for young children with autism: General models and single state case. Behavioral Interventions, 13, 201-226.
Jahr, E., Eldevik, S., & Eikeseth, S. (2000). Teaching autistic children to initiate and sustain cooperative play. Research in Developmental Disabilities, 21, 151-169.
Koegel, L. K., Koegel, R. L., Shoshan, Y., & McNerney, E. (1999). Pivotal response intervention II: Preliminary long-term outcomes data. Journal of the Association for Persons with Severe Handicaps, 24, 186-198.
Koegel, R. L., & Koegel, L. K. (1995). Teaching children with autism: Strategies for initiating positive interactions and improving learning opportunities.
Baltimore: Brookes.
Koegel, R. L., Russo, D. C., & Rincover, A. (1977). Assessing and training teachers in the generalized use of behavioral modification with autistic children. Journal of Applied Behavior Analysis, 10, 197-205.
Lachar, D. (1982). Personality Inventory for Children (PIC): Revised format manual supplement. Los Angeles: Western Psychological Services.
Lord, C. (1995). Follow-up of two-year-olds referred for possible autism. Journal of Child Psychology and Psychiatry, 36, 1365-1382.
Lord, C., & Paul, R. (1997). Language and communication in autism. In D. L. Cohen & F. R. Volkmar (Eds.), Handbook of autism and pervasive developmental disorders (2nd ed., pp. 195-225). New York: Wiley.
Lord, C., Rutter, M., & LeCouteur, A. (1994). Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 23, 659-685.
Lord, C., & Schopler, E. (1989). The role of age at assessment, developmental level, and test in the stability of intelligence scores in young autistic children. Journal of Autism and Developmental Disorders, 19, 483-499.
Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55, 3-9.
Lovaas, O. I., Ackerman, A. B., Alexander, D., Firestone, P., Perkins, J., & Young, D. (1981). Teaching developmentally disabled children: The me book. Austin, TX: Pro-Ed.
Lovaas, O. I., Koegel, R. L., Simmons, J. Q., & Long, J. S. (1973). Some generalization and follow-up measures on autistic children in behavior therapy. Journal of Applied Behavior Analysis, 6, 131-166.
Lovaas, O. I., & Smith, T.
(1988). Intensive behavioral treatment for young children with autism. In B. B. Lahey & A. E. Kazdin (Eds.), Advances in clinical child psychology (Vol. 11, pp. 285-324). New York: Plenum.
Lovaas, O. I., Smith, T., & McEachin, J. J. (1989). Clarifying comments on the young autism study: Reply to Schopler, Short and Mesibov. Journal of Consulting and Clinical Psychology, 57, 165-167.
Maine Administrators of Service for Children with Disabilities. (2000). Report of the MADSEC autism task force. Manchester, ME: Author. (Available online at http://www.madex.org)
Maurice, C., Green, G., & Luce, S. C. (Eds.). (1996). Behavioral intervention for young children with autism. Austin, TX: Pro-Ed.
McConnell, S. R. (2002). Interventions to facilitate social interaction for young children with autism: Review of available research and recommendations for educational intervention and future research. Journal of Autism and Developmental Disorders, 32, 351-372.
McEachin, J. J., Smith, T., & Lovaas, O. I. (1993). Long-term outcome for children with autism who received early intensive behavioral treatment. American Journal on Mental Retardation, 97, 359-372.
Meyer, L. S., Taylor, B. A., Levin, L., & Fisher, J. R. (2001). Alpine Learning Group. In J. S. Handleman & S. L. Harris (Eds.), Preschool education programs for children with autism (2nd ed., pp. 135-155). Austin, TX: Pro-Ed.
Mundy, P. (1993). Normal versus high-functioning status in children with autism. American Journal on Mental Retardation, 97, 381-384.
Newsom, C., & Rincover, A. (1989). Autism. In E. J. Mash & R. A. Barkley (Eds.), Treatment of childhood disorders (pp. 286-346). New York: Guilford.
New York State Department of Health, Early Intervention Program.
(1999, May). Clinical practice guidelines: Autism/pervasive developmental disorders, assessment and intervention for young children (ages 0-3 years). Albany: Author.
Ramey, C. T., Bryant, D. M., Wasik, B. H., Sparling, J. J., Fendt, K. H., & LaVange, L. M. (1992). Infant health and development program for low birth weight, premature infants: Program elements, family participation, and child intelligence. Pediatrics, 3, 454-465.
Reynell, J. K., & Gruber, G. P. (1990). Reynell Developmental Language Scales. Los Angeles: Western Psychological Services.
Roid, G. H., & Miller, L. J. (1995, 1997). Leiter International Performance Scale-Revised. Wood Dale, IL: Stoelting.
Romanczyk, R. G., Lockshin, S. B., & Matey, L. (2001). In J. S. Handleman & S. L. Harris (Eds.), Preschool education programs for children with autism (2nd ed., pp. 49-94). Austin, TX: Pro-Ed.
Schopler, E., Short, A., & Mesibov, G. (1989). Relation of behavioral treatment to normal functioning: Comment on Lovaas. Journal of Consulting and Clinical Psychology, 57, 162-164.
Schreibman, L. (1997). Theoretical perspectives on behavioral intervention for individuals with autism. In D. L. Cohen & F. R. Volkmar (Eds.), Handbook of autism and pervasive developmental disorders (2nd ed., pp. 920-933). New York: Wiley.
Schreibman, L. (1988). Autism. Newbury Park, CA: Sage.
Semel, E., Wiig, E. H., & Secord, W. A. (1995). Clinical evaluation of language fundamentals (3rd ed.). San Antonio: Psychological Corp.
Sheinkopf, S. J., & Siegel, B. (1998). Home-based behavioral treatment of young children with autism. Journal of Autism and Developmental Disorders, 28, 15-23.
Shinnar, S., Rapin, I., Arnold, S., Tuchman, R. F., Shulman, L., Ballaban-Gil, K., Maw, M., Deuel, R.
K., & Volkmar, F. R. (2001). Language regression in childhood. Pediatric Neurology, 24, 183-189.
Smith, T. (1993). Autism. In T. R. Giles (Ed.), Handbook of effective psychotherapy (pp. 107-133). New York: Plenum.
Smith, T., Buch, G. A., & Gamby, T. E. (2000). Parent-directed, intensive early intervention for children with pervasive developmental disorder. Research in Developmental Disabilities, 21, 297-309.
Smith, T., Eikeseth, S., Klevstrand, M., & Lovaas, O. I. (1997). Intensive behavioral treatment for preschoolers with severe mental retardation and pervasive developmental disorder. American Journal on Mental Retardation, 102, 238-249.
Smith, T., Groen, A., & Wynn, J. (2000). Randomized trial of intensive early intervention for children with pervasive developmental disorder. American Journal on Mental Retardation, 105, 269-285.
Smith, T., & Lovaas, O. I. (1997). The UCLA Young Autism Project: A reply to Gresham and McMillan. Behavioral Disorders, 22, 202-218.
Smith, T., McEachin, J. J., & Lovaas, O. I. (1993). Comments on replication and evaluation of outcome. American Journal on Mental Retardation, 97, 385-391.
Sparrow, S. S., Balla, D. A., & Cicchetti, D. V. (1984). Vineland Adaptive Behavior Scales (Interview Ed.). Circle Pines, MN: American Guidance Service.
Stutsman, R. (1948). Merrill Palmer Scale of Mental Tests. Wood Dale, IL: Stoelting.
Tuchman, R. F., & Rapin, I. (1997). Regression in pervasive developmental disorders: Seizures and epileptiform electroencephalogram correlates. Pediatrics, 99, 560-566.
Venter, A., Lord, C., & Schopler, E. (1992). A follow-up study of high-functioning autistic children. Journal of Child Psychology and Psychiatry, 33, 489-507.
Wechsler, D. (1989). Wechsler Preschool and Primary Scale of Intelligence-Revised. San Antonio, TX: Psychological Corp.
Wechsler, D. (1991). Manual for the Wechsler Intelligence Scale for Children: Third Edition. San Antonio: Psychological Corp.
Weiss, M. J. (1999). Differential rates of skill acquisition and outcomes of early intensive behavioral intervention for autism. Behavioral Interventions, 14, 3-22.
Wirt, R. D., Lachar, D., Klinedinst, J. K., & Seat, P. D. (1977). Multidimensional descriptions of child personality: A manual for the Personality Inventory for Children. Los Angeles: Western Psychological Services.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Achievement. Itasca, IL: Riverside.

Received 7/22/04, accepted 7/20/05.
Editor-in-charge: William E. MacLean, Jr.

This research was supported in part by National Institute of Mental Health Grant MH48863-01A3, Multi-Site Young Autism Project. We thank Crystal (Bums) Held, the UCLA-trained clinic supervisor, for her efforts in carrying out this study; Tristram Smith for reviewing the manuscript multiple times; Robyn Dawes and Shuangge Ma for consultation on data analysis; and O. Ivar Lovaas for the opportunity to train at UCLA, his mentorship, and support. Requests for reprints should be sent to either author at the Wisconsin Early Autism Project, 6402 Odana Rd., Madison, WI 53719. E-mail: weap@wiautism.com

Errata

Several errors occurred in the article “Support Needs and Adaptive Behaviors,” by Julia Harries, Roma Guscia, Neil Kirby, Ted Nettelbeck, and John Taplin (Vol. 110, No. 5, 393-404). On page 395, in last line under Participants, the SD should be 3.2 years not 3.2 months.

In Table 4 on page 400, there should not be a superscript a next to the ICAP heading.
Also, in this table the coefficient for SIS Health and Safety subscale in Factor 3 should be -.16 not .16.

In the reference list, there should be reference to two versions of the Supports Intensity Scale (one unpublished version and one published version) as follows:

Thompson, J. R., Bryant, B., Campbell, E. M., Craig, E. M., Hughes, C., Rotholz, D. A., Schalock, R. L., Silverman, W., Tassé, M. J., & Wehmeyer, M. (2002). Supports Intensity Scale: Standardization and users manual. Unpublished assessment scale, American Association on Mental Retardation.
Thompson, J. R., Bryant, B., Campbell, E. M., Craig, E. M., Hughes, C., Rotholz, D. A., Schalock, R. L., Silverman, W., Tassé, M. J., & Wehmeyer, M. (2004). Supports Intensity Scale: Users manual. Washington, DC: American Association on Mental Retardation.

0196-206X/06/2702-0145
Developmental and Behavioral Pediatrics
Copyright © 2006 by Lippincott Williams & Wilkins, Inc.
Vol. 27, No. 2, April 2006
Printed in U.S.A.

Treatment

Early Intensive Behavioral Treatment: Replication of the UCLA Model in a Community Setting

HOWARD COHEN, Ph.D.
Valley Mountain Regional Center, Stockton, CA

MILA AMERINE-DICKENS, M.S.
Central Valley Autism Project, Modesto, CA

TRISTRAM SMITH, Ph.D.
Department of Pediatrics, University of Rochester Medical Center, Rochester, NY

ABSTRACT. Although previous studies have shown favorable results with early intensive behavioral treatment (EIBT) for children with autism, it remains important to replicate these findings, particularly in community settings.
The authors conducted a 3-year prospective outcome study that compared 2 groups: (1) 21 children who received 35 to 40 hours per week of EIBT from a community agency that replicated Lovaas' model of EIBT and (2) 21 age- and IQ-matched children in special education classes at local public schools. A quasi-experimental design was used, with assignment to groups based on parental preference. Assessments were conducted by independent examiners for IQ (Bayley Scales of Infant Development or Wechsler Preschool and Primary Scales of Intelligence), language (Reynell Developmental Language Scales), nonverbal skill (Merrill-Palmer Scale of Mental Tests), and adaptive behavior (Vineland Adaptive Behavior Scales). Analyses of covariance, with baseline scores as covariates and Year 1-3 assessments as repeated measures, revealed that, with treatment, the EIBT group obtained significantly higher IQ (F = 5.21, p = .03) and adaptive behavior scores (F = 7.84, p = .01) than did the comparison group. No difference between groups was found in either language comprehension (F = 3.82, p = .06) or nonverbal skill. Six of the 21 EIBT children were fully included into regular education without assistance at Year 3, and 11 others were included with support; in contrast, only 1 comparison child was placed primarily in regular education. Although the study was limited by the nonrandom assignment to groups, it does provide evidence that EIBT can be successfully implemented in a community setting. J Dev Behav Pediatr 27:145-155, 2006.
Index terms: autism, early intervention, applied behavior analysis, behavioral treatment.

The design and implementation of methodologically rigorous treatment studies are daunting tasks and, in the area of treatment for autism spectrum disorders, often emotionally charged and publicly vetted as well. Matching groups on a variety of important measures, including severity of disability, individual characteristics of the child, multiple important socio-familial and environmental factors, as well as controlling multiple treatment issues such as fidelity, intensity and length of treatment and pre-determining appropriate outcome measures are all challenging (and expensive). Moving treatment studies from the laboratory setting into the community presents additional hurdles, yet this is ultimately the setting in which the efficacy of treatment models needs to be evaluated. Cohen and colleagues are to be commended for implementing a community-based treatment study with matched samples, documentation of treatment fidelity, and comprehensive 3-year follow-up. However, the setting was based in a community program that is mandated to provide treatment to families of children with autism spectrum disorders who are then free to accept a plan or not, which prohibited random assignment to treatment. This introduced potential bias in their groups, with more educated and dual parent families in the EIBT group. There are strengths as well as limitations in this study. Although it does not resolve the controversies that continue regarding the “best” treatments for young children with ASD, we include it because of the critical need for evaluation of treatment approaches. The reviewers pointed out the limitations in this community approach as well as its strengths.
The reader is encouraged to look at both in reviewing this article. We hope that it will inspire others to do these vitally needed treatment effectiveness studies. - Editor

Received September 2005; accepted February 2006.
Address for reprints: Mila Amerine-Dickens, M.S., 1317 Oakdale Rd., Suite 800, Modesto, CA 95355; e-mail: mamerine-dickens@cvap.org.

Copyright © Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.

In an era when Autistic Spectrum Disorder (ASD) was viewed as largely untreatable,¹ Ivar Lovaas’ 1987 outcome study² became a pivotal event that provided optimism about behavioral interventions for ASD. Almost half (9 of 19) of the children with autism who began intensive behavioral treatment prior to the age of 4 years from the UCLA/Lovaas clinic (40 hours per week for 2 or more years) were fully included into regular education and showed significant gains in intellectual achievement.
A follow-up study of the same children showed sustained gains.³ This finding, coupled with a general trend toward earlier diagnosis of ASD (under 3 years of age)⁴ and the recent exponential increase in documented cases of ASD,⁵ made Lovaas’ results even more influential and replication of his research more compelling.

Replication of the UCLA/Lovaas Model involves the following key elements⁶: (1) clinical internship and training on the UCLA/Lovaas Model of intervention under the direction of qualified supervisors; (2) implementation of the model for 35 to 40 hours per week throughout the year, including one-to-one instruction, peer play training sessions, inclusion into regular education classrooms, and generalization activities; (3) parent training to foster the child’s acquisition and generalization of skills; and (4) annual outcome measures.

Several studies have partially replicated the UCLA/Lovaas Model. In the only randomized clinical trial, 28 children with ASD received either intensive behavioral treatment or parent training.⁷ The intensive treatment group averaged 25 hours per week in the first year which faded over the next 1 to 2 years. The comparison group participated in 10 to 15 hours per week of special education classes and received 5 hours per week of parent training for 3 to 9 months. The intensive children outperformed the comparison children on intellectual, visual-spatial, and academic measures. However, gains were substantially smaller than in Lovaas’ original study. For example, the between-group IQ difference at follow-up was 16 points compared to the 31 reported by Lovaas.
In other partial replications of the UCLA model, children with ASD obtained 15 to 35 hours per week of treatment and obtained results similar to those reported in the randomized clinical trials⁸,⁹; similar results also have been reported for other early intensive behavioral treatment (EIBT) models with about 25 hours per week of treatment.¹⁰,¹¹

Concerns have been expressed about the difficulty of offering treatment at this level of intensity to community samples,¹² and mixed results of EIBT in community settings have been reported. One investigation indicated a lack of significant improvements in a sample of 66 children with ASD.¹³ A multiple baseline study of 6 children found clear short-term gains but equivocal long-term effects.¹⁴ However, a third study reported that an EIBT group (n = 29) in a community agency made statistically significant gains in all areas of development except motor skills, relative to 2 comparison groups.¹⁵ Moreover, 13 of the 29 EIBT children (45%) achieved IQs in the average to above average range. In the first replication of the UCLA Model that included all of the elements identified by Lovaas, 11 of 23 children with ASD (48%) achieved full inclusion into regular education and IQ scores greater than 85.¹⁶ However, the study did not have a comparison group.

Although these studies generally confirm that EIBT is effective, differing results across studies and methodological limitations such as the absence of comparison groups in many reports weaken the ability to truly validate the optimism generated by the initial Lovaas study. Accordingly, the present study was an attempt to fully replicate that study in a community setting. Research questions included the following: (1) Can the Lovaas/UCLA model be replicated in a community setting?
(2) What outcomes do children with ASD achieve with this intervention?

METHODS

Participants

Participants were 42 children in 2 groups: The early intensive behavioral treatment (EIBT) group (n = 21) received 35 to 40 hours of behavioral intervention, 47 weeks per year, for 3 or more years. The comparison group (n = 21) received services from local public schools. In accord with the UCLA Young Autism Project multisite research replication protocol, participation criteria for both groups included (1) primary diagnosis of autistic disorder or pervasive developmental disorder not otherwise specified based on an evaluation by an independent licensed psychologist and confirmed by the Autism Diagnostic Interview-Revised,¹⁷ (2) pretreatment IQ above 35 on the Bayley Scales of Infant Development-Revised (BSID-R),¹⁸ (3) chronological age between 18 and 42 months at diagnosis and under 48 months at treatment onset, (4) no severe medical limitation or illness including motor or sensory deficits that would preclude a child from participating in 30 hours per week of treatment, (5) residence within 60 km of the treatment agency, (6) no more than 400 hours of behavioral intervention prior to intake, and (7) parent’s agreement to participate actively in parent training and generalization and to have an adult present during home intervention hours.

In addition to the 21 participants in each group, there were 5 dropouts who were excluded from the data analyses (3 in the EIBT group and 2 in the comparison group). One EIBT participant moved out of the area at 17 months into treatment and was unavailable for follow-up; 2 withdrew their participation, 1 at 3 months and the other at 18 months. Dropouts were similar to completers with regard to age of diagnosis (24, 36, and 22 months), baseline IQ (42, 44, and 44), and 1-year IQ (58 and 61; score unavailable for participant who dropped out after 3 months).
Two comparison children were dropped because parents either declined annual testing of their child or could not be contacted. All other eligible referrals enrolled in the study, completed yearly follow-up assessments, and were included in the data analyses.

All treatment in both groups was provided at no cost to families. Funding was split between 2 public agencies: (1) the Valley Mountain Regional Center (VMRC; Stockton, CA) and (2) the child’s Special Education Local Planning Area (SELPA) of residence. VMRC is contracted by the California Department of Developmental Services to identify and coordinate services for individuals with developmental disabilities; its catchment area includes San Joaquin, Stanislaus, Calaveras, Amador, and Tuolumne Counties. SELPAs are contracted by the California Department of Education to provide special education instruction.

Design

Inasmuch as VMRC and SELPA had a mandate to provide free and appropriate services, legal and ethical considerations precluded random assignment of children to groups. Therefore, a quasi-experimental design was used. A comparison group was formed by identifying children who met participation criteria for EIBT and whose parents chose other services.
Specifically, for each EIBT participant, a file review was initiated at VMRC to identify a matching child who was not receiving EIBT; the first identified child was then added to the comparison group. Comparison children were followed prospectively and received the same annual assessments as EIBT children.

To ensure that choices were available to families and that families were aware of these choices, VMRC and SELPA 6, along with nonpublic educational agencies and parents, developed an ongoing collaborative program (Autism Connection).¹⁹ The Early Autism Diagnostic Clinic (EADC) was created by the Autism Connection (1) to provide expert evaluations for autism and related disorders (or referrals to other experts in the area) and (2) to bring together local clinicians, VMRC, parents, school district representatives, and advocates to communicate directly with each other, at the EADC, rather than requiring the parents to endure separate meetings. At the time of diagnosis, an educational consultant from the EADC and a representative from the school district of residence presented the family, orally and in writing, a Matrix of Educational Options developed by the Autism Connection. This matrix delineates the service agencies in the child’s area of residence and their eligibility criteria, along with the roles and responsibilities of parents, service providers, and funding agencies in implementing interventions. Options included special education settings, Autistic Spectrum Disorder (ASD) classes, speech and language services, occupational therapy, genetic counseling, behavior intervention services, grief counseling, Early Start programs for children under 3 years old, and EIBT Programs, including the agency in this study (Central Valley Autism Project; CVAP) and other EIBT providers. During the enrollment period (1995-2000), the number of other EIBT providers ranged from 1 to 3.
At times when CVAP did not have openings, the education consultant and school representative removed CVAP from the Matrix. EADC educational consultant and school representatives were otherwise independent of the study.

Treatment Procedures: EIBT Group

EIBT consisted of 35 to 40 hours per week of intervention based on Lovaas’ UCLA treatment model.²,⁶,²⁰ Seventeen of the 21 participants remained in EIBT for 3 years. Four others ended EIBT prior to 3 years but completed follow-up assessments and are included in the statistical analyses; 1 completed the intervention protocol and was fully included in regular education at Year 2, whereas 3 others were transferred to other services (2 after 6 months and 1 at Year 2) because their progress did not meet specific, predetermined developmental markers for continuing intervention. Markers at 6, 12, 24, and 36 months were identified collaboratively by Autism Connection.²¹ For example, at 24 months, the IEP team considered whether the child showed one or more signs of progress such as the following: (1) the child’s standardized cognitive testing indicated steady growth or near-average functioning; (2) objective data collected on EIBT instruction demonstrated that the child was mastering new skills; (3) objective data revealed an increase in the child’s frequency of initiating language or peer interaction; or (4) the child was included in a general education placement with similar-aged peers for systematically increasing increments of time and was acquiring age-appropriate pre-academic skills.

The EIBT agency, CVAP, met all criteria for replication of Lovaas’ UCLA treatment model and participated in a multicenter study supported by the National Institute of Mental Health.
The UCLA model relies exclusively on behavioral techniques such as unambiguous instruction, shaping through positive reinforcement of successive approximations, systematic prompting and fading procedures, discrimination learning, and careful task analysis. Positive reinforcers such as edibles, sensory and perceptual objects are used initially but soon replaced by social reinforcers such as praise, tickles, hugs, and kisses. Ongoing data collection is performed to monitor skill acquisition, generalization, and frequency of problem behaviors. The intervention protocol consists of 3 primary components: (1) In-home 1:1 instruction, (2) peer play training, and (3) regular education classroom inclusion. No aversive interventions were used throughout the study.

Initially, the In-Home 1:1 Intervention Component is implemented 35 to 40 hours per week for children older than 3 years, and 20 to 30 hours per week for children younger than 3 years. The focus is on establishing foundational and spontaneous communication. The main teaching format is discrete trials,²² but generalization activities and community outings are also part of the 35 to 40 hours per week of instruction. In discrete trials, the tutor works individually with a child in a distraction-free setting and administers 3 to 8 trials in a sitting, with 1- to 2-minute breaks between sittings, for approximately 50 minutes each hour. The remaining 10 minutes of each hour are devoted to generalization activities. These activities include structured play, in which the child has opportunities to apply skills initially mastered in the 1:1 setting (e.g., labeling toys or taking turns with the tutor during a game), and incidental teaching, in which situations were arranged to encourage initiation of language (e.g., placing preferred objects in sight but out of reach).
Skill mastery in discrete trials was defined as 90% accuracy across 2 days of intervention, across 2 or more tutors. Concept mastery was defined as 90% accuracy of 5 to 10 novel items probed and mastered within a concept. After mastery, skills and concepts were systematically generalized to other more naturalistic settings and maintained by available contingencies in the natural environment. To facilitate generalization, community outings occurred 3 to 5 times per week. The UCLA curriculum was used for teaching the initial foundation skills including compliance, imitation, early receptive and expressive language, visual spatial skills, and self-help.⁶,²⁰

At approximately 1 year into the behavioral intervention, the distribution of the 35 to 40 hours per week is typically as follows: 26 to 31 of home instruction, 3 to 5 hours of peer play, and 6 to 9 hours at preschool. Thereafter, the home component gradually decreases, whereas other components gradually increase based upon the child’s inclusion in the classroom.

As part of the generalization of skills and behaviors to the natural environment, the peer play component is initiated 3 to 5 sessions per week with a typically developing peer for 15 to 60 minutes per session when the child has mastered prerequisite skills: verbal response to questions, on topic statements, simple play skills, and turn taking.²,⁶,²⁰ Skills mastered in the 1:1 setting are systematically generalized to a social/play setting with a peer of similar age.
A trained tutor facilitates mastered activities for the child and peer (e.g., conversation, pretend play with toys, or turn-taking games) and prompts the peer to engage the child with subtle cues such as whispers in the peer’s ear, visual signals, or indirect questions. When the child is 90% accurate initiating with peers across 3 or more peers for 18 to 24 months, additional children are presented at one time to form a group play setting.

At about the time that peer play training is initiated, the child enters a teacher-directed structured regular education preschool setting.2 Initially, trained tutors accompany the child to school to assist the teaching staff with gaining instructional control, generalizing mastered skills to the school setting, and learning classroom skills. The tutor functions as a classroom aide and not as a 1:1 aide for the child. Initial goals for inclusion center on generalizing skills to a novel, yet structured environment. As the child achieves independent responding during specific activities (e.g., circle time, center time, and so forth), as determined by data, the shadow tutor is faded. Activities requiring social skills and behaviors are always the last to fade in the process.

When children have achieved typical levels of academic functioning in the classroom and participate without the assistance of a shadow tutor during teacher-directed activities, they still may require the assistance of the shadow tutor during social opportunities throughout the school day for an additional 2 to 3 years. Thus, an intervention with reduced hours both at home and in school may extend into the early primary grades. School hours focus on generalization of social skills and friendship development. As the child’s rate of independent social interaction increases, the intervention hours are successively reduced to 0.
Subsequently, consultation to the family and the school setting continues 1 to 2 hours per month for up to 1 to 2 years. Home hours focus on play sessions with peers and gradually transition to typical play dates with peers without the presence of a tutor. Periodic standardized assessments continue until the child is 18 years old.

During the course of the study, there was a growing recognition that many children who made significant gains in the first 2 years of treatment required training beyond the UCLA curriculum to develop mutually satisfying social relationships, enhance their understanding of social meanings, understand and interpret others’ perspectives/knowledge/cognition/beliefs, and ultimately respond appropriately to social behaviors of peers and others. To address this need, overt social behaviors were operationally defined, both verbal (e.g., conversational skills, such as responding to statements or questions asked by others, reciprocal statements, initiating conversation, inquiring about others, remaining on topic, and sustaining conversation) and nonverbal (e.g., interpreting and responding to others’ facial expressions, emotional states, voice tone, or body language), and initially taught in a discrete trial format, using the same behavioral principles and methodology described above, with an emphasis on a quick transition to generalized teaching to a social context, using incidental teaching and video modeling as tools for generalization.

Staff and Parent Training. To ensure proficiency in implementing the UCLA model, 5 CVAP staff members each completed 3- to 4-month internships at UCLA, and consultants from UCLA made on-site visits 2 to 4 times per year for the first 3 years of the study period, with frequent telephone contacts between visits (typically once per week).
During this period, a random sample of 12 CVAP tutors were videotaped and scored by blind raters for adherence to UCLA procedures. The level of adherence by CVAP tutors was found to be nonsignificantly higher than adherence by tutors employed at UCLA.23 One UCLA-trained individual served as CVAP site director, responsible for oversight of each child’s intervention; she holds a master’s degree in clinical psychology/applied behavior analysis and is a Board Certified Behavior Analyst. Clinic supervisors trained and provided ongoing performance feedback to tutors. Supervisors were graduate students in behavior analysis or master’s level clinicians with 2 or more years of experience in providing EIBT. Tutors were recruited from the community and were the main providers of direct services. Supervisors and tutors were assigned to each EIBT participant based on openings in their schedule and geographic location.

To become a supervisor, individuals had to meet prespecified, objective criteria, including high ratings based on direct observation of their implementation of EIBT interventions, favorable evaluations from families and staff members, satisfactory performance on a test of skill at curriculum development, and oral and written demonstration of their knowledge of applied behavior analysis and ASD.24 Tutors had to pass a rigorous behavior observation assessment of their accuracy in conducting discrete trial training (DTT) and oral tests of their knowledge of the UCLA treatment manual.

Parents were encouraged to be involved in all levels of intervention. At the beginning of treatment, all parents attended a 12- to 18-hour training workshop across 2 to 3 days on behavioral principles and intervention methods. Thereafter, they participated in weekly training sessions to
generalize their child’s newly established skills to the natural environment. Parents provided ongoing information regarding their child’s current level of functioning both in and out of intervention sessions, and they were asked to be active participants in their child’s intervention, although there was no requirement for parents to provide any direct intervention hours.

Treatment Procedures: Comparison Group

Participants in the Comparison Group received community services that their families selected from the Matrix of Educational Options. At intake, 1 comparison child, under 3 years old, received an Early Start Autism Intervention Program, which emphasized learning readiness skills with both the parent and child. This child received less than 9 hours per week of a discrete trial program in his or her home, until the age of 3. Two comparison children received a home-based developmental intervention that ranged from 1 to 4 hours a week. At age 3, these 3 children were enrolled in a public school Special Day Class (SDC). Seventeen children who were 3 and above at intake were enrolled in SDC in the public schools. No records were available for 1 child. The instructional methodology in the SDC placements was eclectic, the child/teacher ratios varied from 1:1 to 3:1, and the classes operated for 3 to 5 days per week, for up to 5 hours per day. Related services such as speech, occupational, and behavioral therapy to these children varied from approximately 0 to 5 hours per week. Three of the children spent brief sessions (up to 45 minutes per day) mainstreamed in regular education. Due to the diverse interventions provided to the comparison group, it was not possible to monitor treatment fidelity for this group.

Assessment

At pretreatment, a licensed psychologist at EADC who was independent of the study administered a standardized behavior observation,25 parent interview, and developmental tests, including the BSID-R, Merrill-Palmer Scale of Mental Tests,26 Reynell Developmental Language Scales,27 and Vineland Adaptive Behavior Scales.28 The BSID-R extrapolated table was used to generate a standard score for children who obtained an IQ below 50.29 Administration of the BSID-R began at the starting point for the child’s chronological age (or at the highest starting point for the test if the child was older than 42 months). The examiner administered each successive item after the starting point to establish a basal and ceiling; if the child did not obtain a basal on these items, the examiner administered each preceding item in succession until a basal was achieved and then followed rules in the test manual for establishing the ceiling.

From the evaluation, the psychologist made a DSM-IV diagnosis of autism or Pervasive Developmental Disorder, Not Otherwise Specified (PDDNOS).30 Subsequently, the diagnosis was confirmed by the Autism Diagnostic Interview-Revised (ADI-R),17 administered by a certified examiner employed by CVAP. The developmental tests (but not the ADI-R) were repeated in annual follow-up evaluations. If a participant performed at the ceiling of the BSID-R, this test was replaced with the Wechsler Preschool and Primary Scales of Intelligence.31 Follow-up evaluations were conducted by an independent, self-employed, highly skilled, licensed child evaluator. VMRC made the referral and funded the evaluations. The referral to the evaluator consisted only of the name of the child, birth date, parents’ names, and telephone number.

Data Analysis

IQ was the main measure of treatment response in previous EIBT studies6-16 and was designated as the primary outcome measure in the present study. Secondary outcome measures were the Merrill-Palmer Scale of Mental Tests, Reynell Language Comprehension, Reynell Expressive Language, Vineland Adaptive Behavior Scales, and classroom placement.

To test our main hypothesis that the EIBT group would differ from the comparison group on outcome measures, we performed a repeated-measures analysis of covariance (ANCOVA) for each measure, with pretreatment score as the covariate and Year 1, Year 2, and Year 3 scores as the repeated dependent measures. Consistent with standard assumptions for an ANCOVA,32 analyses of skew and kurtosis, as well as visual inspection, were consistent with a normal distribution in our data. Huynh-Feldt epsilon tests confirmed that the data showed compound symmetry (ε > .90), unless otherwise noted in Results.

As is usual in outcome studies with repeated measures, a few participants had missing data at one or more time points. For each outcome measure, we employed the standard procedure of removing participants with missing data from the analysis.32 This procedure is appropriate when missing data are random or unbiased.
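The removal of participants with missing data described above is commonly called listwise deletion. A minimal sketch follows, assuming hypothetical records with one pretreatment score and three yearly scores; the field names, the `split_complete` helper, and every value are illustrative and are not study data. The comparison of pretreatment means at the end is only a crude stand-in for a bias check and is not the authors' procedure.

```python
from statistics import mean

# Hypothetical records for illustration only; NOT data from the study.
# None marks a missing score at that time point.
records = [
    {"id": "A", "pre": 60, "y1": 78, "y2": 80, "y3": 82},
    {"id": "B", "pre": 55, "y1": 70, "y2": None, "y3": 72},
    {"id": "C", "pre": 64, "y1": 75, "y2": 77, "y3": 79},
    {"id": "D", "pre": 58, "y1": None, "y2": 66, "y3": 68},
    {"id": "E", "pre": 62, "y1": 71, "y2": 73, "y3": 74},
]
TIME_POINTS = ("pre", "y1", "y2", "y3")


def split_complete(rows, fields=TIME_POINTS):
    """Separate rows with a score at every time point from rows missing any."""
    complete, dropped = [], []
    for row in rows:
        has_all = all(row[f] is not None for f in fields)
        (complete if has_all else dropped).append(row)
    return complete, dropped


complete, dropped = split_complete(records)
print(len(complete), "retained;", len(dropped), "removed")

# Crude check: do removed participants look different at pretreatment?
retained_pre = mean(r["pre"] for r in complete)
removed_pre = mean(r["pre"] for r in dropped)
print("pretreatment mean, retained:", retained_pre, "removed:", removed_pre)
```

A large gap between the two pretreatment means would suggest that the missing data are not random, which is the situation in which listwise deletion can bias results.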
We used visual inspection to confirm that the missing data were unbiased (e.g., the data were not primarily from participants who had unfavorable outcomes or who did not complete the full 3 years of intervention), and Results show the number of participants retained for each analysis.

Inasmuch as the EIBT and comparison groups differed on several demographic variables (mother education, father education, and diagnosis), we explored whether adding these variables as covariates in the ANCOVA model would change the interpretation of the results. These analyses need to be interpreted with caution because they involve a larger number of variables than is usually considered appropriate for the relatively small sample size in the present study. However, they provided some information on whether or not the groups differed when we statistically controlled for demographic variables.

When an ANCOVA revealed a between-group difference on an outcome measure, we hypothesized that the EIBT group would show an increase in scores from Year 1 to Year 2 to Year 3, whereas scores in the comparison group would remain stable. To test this hypothesis, we examined whether the ANCOVA yielded a statistically significant Group × Time interaction; if so, we performed planned comparisons to test for an increase from Year 1 to Year 3 in the EIBT group.

Table 1. Background Information for the EIBT Group (n = 21) and Comparison Group (n = 21)

                                        EIBT           Comparison
Demographics
  Male/Female                           18:3           17:4
  Diagnosis (Autism/PDDNOS)*            20:1           15:6
  Age at diagnosis, mo [M (SD)]         30.2 (5.8)     33.2 (3.7)
  Mother education, yr [M (SD)]*        15.3 (2.9)     13.1 (1.6)
  Father education, yr [M (SD)]*        15.8 (2.9)     11.8 (2.3)
  Two-parent household (yes/no)*        21:0           14:7
Pretreatment Test Scores [M (SD)]
  IQ                                    61.6 (16.4)    59.4 (14.7)
  Merrill-Palmer                        82.4 (17.3)    73.4 (11.9)
  Reynell
    Language Comprehension              51.7 (15.2)    52.7 (15.1)
    Expressive Language                 52.9 (14.5)    52.8 (14.4)
  VABS
    Composite                           69.8 (8.1)     70.6 (9.6)
    Communication                       69.4 (11.8)    65.0 (6.8)
    Daily Living                        73.2 (9.2)     72.7 (12.5)
    Socialization                       70.3 (10.9)    75.1 (13.0)

EIBT indicates early intensive behavioral treatment; Reynell, Reynell Developmental Language Scales; VABS, Vineland Adaptive Behavior Scales; PDDNOS, Pervasive Developmental Disorder, Not Otherwise Specified.
*Significant difference between EIBT and comparison group (p < .05).

To examine the clinical significance of the results, we ascertained the number of participants in each group who achieved scores in the average range at follow-up on each measure. We also sought to identify pretreatment measures that were associated with later scores in the average range. Therefore, for the EIBT group, we conducted t-tests to compare pretreatment scores of participants who scored in the average range across all measures to pretreatment scores of the remaining participants.

RESULTS

Pretreatment

Table 1 summarizes the demographics and pretreatment scores of the early intensive behavioral treatment (EIBT) and comparison groups. The gender make-up mirrors the 4:1 male to female ratio in Autistic Spectrum Disorder (ASD).31 Twenty of 21 EIBT children (95%) and 15 of 21 comparison children (71%) were diagnosed with Autistic Disorder.
This difference was statistically significant, t(40) = 2.13, p < .05. The remaining children were classified with Pervasive Developmental Disorder, Not Otherwise Specified (PDDNOS). Age of diagnosis was 20 to 41 months, with the EIBT group averaging 3 months younger than the comparison group (a difference that was not statistically significant). Also, as shown in Table 1, although not a requirement for participation in the EIBT program, EIBT parents had significantly more education and were significantly more likely to be married than comparison parents. IQ, Merrill-Palmer, Reynell, and Vineland scores did not differ significantly between groups; scores in both groups indicated developmental delays comparable to other samples of children with ASD.30

Outcome

Table 2 presents the results of the analysis of covariance (ANCOVA) tests for each outcome measure, whereas Figure 1 presents the means and 95% confidence intervals for each group at intake, Year 1, Year 2, and Year 3. As shown in Table 2, there was a significant difference between groups on the primary outcome measure, IQ. Figure 1 reveals that the mean IQ in the EIBT group increased 25 points, from 62 at pretreatment to 87 at Year 3. Interestingly, the mean IQ in the comparison group also increased, from 59 at pretreatment to 73 at Year 3.

The EIBT and comparison groups did not differ significantly on the Merrill-Palmer. Both groups displayed a mean increase of 13 points from intake to Year 3 on this measure. Figure 1 suggests that the groups may not have been matched at pretreatment, as the mean for the EIBT group was 82 compared to 73 in the comparison group. A post hoc analysis indicated that this difference approached statistical significance, t(35) = 1.87, p = .07. Also, the assumption of compound symmetry was questionable for this variable, with Huynh-Feldt ε = .85; because the ANCOVA did not approach statistical significance, alternate analyses were not attempted.

Table 2. Analyses of Covariance Testing for Differences Between the EIBT and Comparison Groups on Outcome Measures

                                 N          Sums of Squares (Between Subjects)
Measure                        E    C       Group    Covariate        Error        MSE    F
IQ                            21   19    4,229.91    12,046.14    30,042.41     811.96    5.21*
Merrill-Palmer                21   16      246.27    15,613.74    20,657.91     626.00    ns
Reynell
  Language Comprehension      21   19    3,750.25    17,523.60    36,312.08     981.41    3.82**
  Expressive Language         20   19    3,413.57    13,590.90    52,495.66   1,458.21    ns
VABS
  Composite                   20   20    3,897.52     1,589.31    18,385.69     496.91    7.84***
  Communication               20   20    3,937.71     2,937.53    25,994.10     722.06    5.45*
  Daily Living                20   20    2,527.14     2,229.25    14,207.49     394.65    6.40*
  Socialization               20   20    1,857.84        21.66    16,130.41     460.87    4.03**

N indicates number of participants included in the analysis; E, EIBT group; C, comparison group; ns, not statistically significant; MSE, mean square of errors (between subjects); Reynell, Reynell Developmental Language Scales; VABS, Vineland Adaptive Behavior Scales.
* p < .05; ** p < .10; *** p < .01.

FIGURE 1. Mean and 95% confidence interval for pretreatment (Year 0) and follow-up (Years 1-3). [Panel titles: Merrill-Palmer; Reynell (Receptive); Reynell (Expressive); Vineland Composite; Vineland Communication; Vineland Daily Living Skills; Vineland Socialization. Horizontal axis: Year. Series: Treatment, Comparison.]

Table 3. Number of Children in the Average Range on each Outcome Measure for the EIBT Group (n = 21) and Comparison Group (n = 21)

Measure                       EIBT    Comparison    P
IQ                             12         7         ns
Language Comprehensionᵃ         8         4         ns
Expressive Languageᵃ            9         6         ns
VABS Compositeᵇ                 8         3         .10
School Placement                6         0         .001

ᵃ Reynell Developmental Language Scales.
ᵇ Vineland Adaptive Behavior Scales.

There was a trend toward a significant difference in Reynell Language Comprehension (p = .06). The mean score in the EIBT group increased 20 points, from 52 at pretreatment to 72 at Year 3; the mean score in the comparison group increased 9 points, from 53 at pretreatment to 62 at Year 3. The EIBT group also had a larger increase from pretreatment to Year 3 in Reynell Expressive Language (53 to 78, compared to 51 to 66), but this difference was not statistically significant (p = .13). The failure to find a significant difference may indicate that EIBT did not have a meaningful effect on expressive language, or it may simply reflect low statistical power to detect an effect.

The EIBT and comparison groups differed significantly in the Vineland Adaptive Behavior Scales Composite. Consistent with this finding, the EIBT group demonstrated a mean increase of 9 points compared to a 4-point decline in the comparison group, as shown in Figure 1. Inasmuch as a difference was observed in the Composite, individual scales were also analyzed. Significant differences between groups were found in Communication and Daily Living Skills, and a trend was found for Socialization (p = .05). Figure 1 indicates that the changes in scores from pretreatment to Year 3 for each scale were similar to the change in Composite scores.
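The F values printed in Table 2 can be cross-checked from the table's own columns. The sketch below assumes that, with two groups, the group effect has 1 degree of freedom, so that F equals the group sum of squares divided by the MSE. The `f_ratio` helper and the `TABLE2` dictionary are illustrative names, and only rows for which the table prints an F value are included.

```python
# Values transcribed from Table 2: (group sum of squares, MSE, reported F).
TABLE2 = {
    "IQ": (4229.91, 811.96, 5.21),
    "Reynell Language Comprehension": (3750.25, 981.41, 3.82),
    "VABS Composite": (3897.52, 496.91, 7.84),
    "VABS Communication": (3937.71, 722.06, 5.45),
    "VABS Daily Living": (2527.14, 394.65, 6.40),
    "VABS Socialization": (1857.84, 460.87, 4.03),
}


def f_ratio(ss_group, mse, df_group=1):
    """F = (SS_group / df_group) / MSE.

    df_group = 1 is an assumption that follows from comparing two groups.
    """
    return (ss_group / df_group) / mse


computed = {name: f_ratio(ss, mse) for name, (ss, mse, _) in TABLE2.items()}
for name, (_, _, reported) in TABLE2.items():
    print(f"{name}: computed F = {computed[name]:.2f}, reported F = {reported:.2f}")
```

Each computed ratio rounds to the printed F, which also supports the column alignment of the transcribed values.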
These findings support the inference that the EIBT group had more advanced adaptive behavior skills than the comparison group at the time of the outcome assessments.

An analysis of classroom placement at Year 3, between the 2 groups, revealed that 17 of the 21 EIBT children and 1 of the 21 comparison children were included into regular education classroom settings. Of the 17 EIBT children, 6 were fully included without assistance, 4 were fading the shadow tutor, and 7 required full shadows.

When mother’s education, father’s education, or diagnosis was added as a covariate to the ANCOVA model, the ANCOVA results were unaltered, except in one instance: with the father’s education as a covariate, the difference between groups in IQ was not statistically significant (p = .11). It is unclear whether this finding indicates that father’s education was a confound or reflects the limited statistical power for the analysis. When mother’s education, father’s education, and diagnosis were all added as covariates to the ANCOVA model, IQ, Reynell Language Comprehension, and Vineland Composite continued to show a trend toward significance (p = .09 for all 3 outcome measures). In sum, the possibility that father’s education was a confound in the analysis of IQ cannot be ruled out, but the remaining analyses indicated that reliable differences in outcome between groups remained after statistically controlling for inequalities at pretreatment.

None of the analyses for group × time interactions were statistically significant. Thus, we did not confirm our hypothesis that the EIBT group would have increasing scores from Year 1 to Year 2 to Year 3, whereas scores in the comparison group would be stable.
On the contrary, Figures 1 and 2 illustrate that although the EIBT group appeared to make larger increases than the comparison group from pretreatment to Year 1, both groups exhibited stable scores from Year 1 to Year 3 in IQ, Merrill-Palmer, and Vineland. Both groups may have exhibited similar increases in scores in Reynell Language Comprehension and Expressive Language from Year 1 to Year 3.

As shown in Table 3, more EIBT participants than comparison participants achieved follow-up scores in the average range for each measure, although this difference was significant only for school placement and showed a trend toward significance for the Vineland. Ten EIBT participants scored in the average range on all measures (6 of these 10 also were included in regular education without assistance, whereas the remaining 4 continued to receive shadowing in the regular education classroom). t-tests did not reveal any significant differences in pretreatment test scores for these 10 participants compared to the remaining 11 participants. For example, these 10 children had a mean pretreatment IQ of 66.6 (SD = 12.4) compared to 57.7 (SD = 19.0) for the remaining 11 children, t(19) = 1.28, ns. However, pretreatment Reynell Language Comprehension scores showed a trend toward a difference, with a pretreatment mean of 58.1 for the participants with the most favorable outcome compared to 45.9 for the other participants, t(19) = 1.98, p = .06.

DISCUSSION

The present study suggests that the UCLA/Lovaas Model of early intensive behavioral treatment (EIBT) can be implemented in a nonuniversity community-based setting. On the primary outcome measure of IQ, the EIBT group showed a gain of 25 points, which was statistically significant compared to the gain of 14 points in the comparison group. Similar effects were found on measures of adaptive behavior.
Although language comprehension showed a trend towards significance, expressive language and nonverbal cognitive skill revealed no difference between groups. The increases in test scores are similar to those reported in Lovaas’ original EIBT study2,3 and in some recent investigations.15,16 However, the difference between the EIBT group and the comparison group on outcome measures was smaller than that in other studies, as the comparison group also made gains.

An important limitation of the study is that, because treatment was funded by public agencies that were required to offer free and appropriate services, groups could not be randomly assigned, and a quasi-experimental design was used, with parents choosing which group their child entered. Although pretreatment test scores did not differ significantly between groups, other pretreatment variables did differ. The EIBT group had more children with autism and fewer with Pervasive Developmental Disorder, Not Otherwise Specified (PDDNOS) than did the comparison group. To the extent that PDDNOS is a milder diagnosis that may have a more favorable prognosis than autism,7 this difference may have favored the comparison group. However, the EIBT group also may have had an advantage in that it had more 2-parent families and better educated families than did the comparison group. These family variables have not been associated with outcome in previous studies,2,7 but they might have encouraged families to select EIBT over other interventions in the present study, even though all interventions were provided at no cost to families.
In addition, these variables might have given the EIBT group an advantage by making it easier for families to participate in treatment sessions and facilitate generalization of skills outside of treatment. After statistically controlling for family variables, outcome analyses continued to show improved outcomes in the EIBT group relative to the comparison group. Nevertheless, statistical controls are not a satisfactory solution for preexisting group differences, especially given the relatively small sample size in the present study. A design with random assignment would have strengthened the study and allowed for more clear-cut conclusions about whether EIBT is effective or not.

Further limitations pertain to the assessment protocol in the study. As previously noted, the comparison group received such diverse interventions that a measure of treatment fidelity could not be applied. Also, outside evaluators were employed by Valley Mountain Regional Center (VMRC) for pretreatment and follow-up assessments of participants. The referrals to the evaluators did not include information on group assignment or treatment history. However, to ensure that evaluators remained unaware of this information and to allow for checks on the reliability of test administration and scoring, evaluators who were employed by the study and conducted assessments at a research site (rather than in their clinical offices) might have been preferable. Another limitation is that the assessment protocol tested developmental level more rigorously than did the features of Autistic Spectrum Disorder (ASD). The inclusion of the Autism Diagnostic Observation Schedule (ADOS),33 in addition to the Autism Diagnostic Interview-Revised (ADI-R) and clinical diagnosis, would have increased confidence in the initial diagnosis.
Including a measure such as the ADOS in follow-up assessments would have indicated whether or not children continued to display behaviors indicative of ASD. Additional measures such as the Theory of Mind Test34 also would help address this issue; Central Valley Autism Project (CVAP) is currently involved in a study to translate this test into English and standardize it in the United States. Without such measures, the present study cannot address one of the most controversial issues raised by previous EIBT research: whether some children become indistinguishable from typically developing peers6 or whether they continue to display characteristics of ASD. An additional follow-up evaluation of study participants with the ADOS and Theory of Mind (TOM) Test is planned to fill in some of these gaps.

In this study, advanced behaviors associated with friendship initiation and maintenance, social skills, understanding of social meaning, and response to social behaviors were identified and treated, using the same discrete trial methodology as other behaviors, which consequently increased the duration of treatment beyond 3 years for many participants (usually for 2 additional years). Although this expansion of the treatment protocol reflects the contemporary view that the defining feature of ASD is an impairment in social reciprocity, it raises the question of whether the present study truly was a replication of the UCLA model. The treatment site met all of Lovaas’ criteria for replication, and the first 2 years of intervention followed the model as it has been previously described.2 The third year also followed the model, with the addition of the training in advanced social skills.
Thus, results from Years 1 and 2 are directly comparable to those of previous studies, and results from Year 3 also reflect mostly the same interventions. Research on the specific effects of the additional social-skills training is warranted, as it is acknowledged that such training was not included in previous studies. Also, although discrete trial training is a common approach to teaching social skills and has some empirical support,35,36 teaching methodologies other than discrete trials (e.g., video modeling, incidental teaching) also have empirical support and may have advantages such as generalizing more quickly to settings outside of treatment;2 thus, the question of how best to teach such skills may be another area for research.

Interestingly, although the EIBT protocol lasted for 3 years and, in some cases, was continued beyond that time, the nonsignificant group × time interactions in the statistical analyses indicate that the EIBT group did not show reliable IQ increases relative to the comparison group after Year 1. A possible explanation is that most gains occurred in the first year of intervention. Alternatively, however, it is also possible that gains took place later in treatment but that the study measures were not sensitive to them.

Potential evidence for the latter view comes from the findings on classroom placement.
A striking result was that, despite IQ gains in the comparison group, all comparison participants but 1 remained primarily in a special education classroom setting, whereas most EIBT participants were included in regular education at least part of the day. Classroom placement is a controversial outcome measure because of concerns that it may reflect factors such as parent advocacy and school policy rather than the child’s functioning.12 However, the measure also may be an index of real-world academic and social competence.37 If so, the differences between groups on this measure may be attributable at least in part to the social skills training that EIBT participants received. In addition, it may suggest a need for a high number of treatment hours. Dismantling studies might help address these possibilities.

The initial collaborative funding efforts by VMRC and Special Education Local Planning Areas (SELPAs) resulted in a sustainable treatment environment. Stable funding, effective guidelines and policies, and positive communication and working relationships were primary contributory variables to the feasibility of this study. Thus, this collaboration may be a useful model for other regions to employ. Other clinical strengths of this study included rigorous treatment quality control criteria, stringent staff training and evaluation standards, multiple internships at UCLA by supervising clinicians, precise programming for each individual child, advanced completion programming and skilled generalization training, yearly follow-ups by an independent evaluator using multiple outcome measures, and a centralized process and standardized protocol for diagnosing children and informing families of EIBT and other intervention options available to them. Without such standards, outcomes may differ. Nevertheless, given the methodological limitations of the present research, there is a continued need for rigorous outcome studies comparing EIBT to control conditions or other interventions.

Acknowledgments. Preparation of this manuscript was partially supported by NIMH grants R01 MH 48663 (Multi-site Young Autism Project) and U54 MH066397 (Genotype and Phenotype of Autism). The authors thank Kym Cassaretto, M.S., CVAP Clinical Director, and CVAP staff for their clinical contributions and committed efforts on behalf of children and families. The authors also thank the following individuals for their assistance with the research: Mieke San Julian, Chanti Fritzsching, M.S., BCBA, and Angela Castro from CVAP; Schelley McDonald and Marie Overmyer from VMRC; and Suzannah Ferraioli from URMC. Preliminary reports were presented at the California Association for Behavior Analysis Conference in San Francisco, CA, February 20, 2004, and at the International Meeting For Autism Research, Boston, MA, May 6, 2005.

REFERENCES

1. DeMyer MK, Hingtgen JN, Jackson RK. Infantile autism: a decade of research. Schizophr Bull. 1981;7:388-451.
2. Lovaas OI. Behavioral treatment and normal educational and intellectual functioning in young autistic children. J Consult Clin Psychol. 1987;55:3-9.
3. McEachin JJ, Smith T, Lovaas OI. Long-term outcome for children with autism who received early intensive behavioral treatment. Am J Ment Retard. 1993;97:359-372.
4. Zwaigenbaum L, Bryson S, Rogers T, et al. Behavioral manifestations of autism in the first year of life. Int J Dev Neurosci. 2005;23:143-152.
5. California Department of Developmental Services.
Department\nof Developmental Services Fact Book, 7th ed. California Depart\xc2\xad\nment of Developmental Disabilities website. December, 2004.\nAvailable at www.dds.ca.gov/factsstats/factbook.cfm#pdf. Accessed\nAugust 16, 2005.\nLovaas OI. Teaching Individuals with Developmental Delays: Basic\nIntervention Techniques. Austin, TX: PRO-ED; 2003.\nSmith T, Groen AD, Wynn JW. Randomized trial of intensive early\nintervention for children with pervasive developmental disorder. Am\nJ Ment Retard. 2000;105:269-285.\nAnderson SR, Avery DL, DiPietro EK, et al. Intensive home-based\nearly intervention with autistic children. Educ Treat Child. 1987;\n10:352-366.\nBimbrauer JS, Leach DJ. The Murdoch Early Intervention Program\nafter two years. Behav Change. 1993;10:63-74.\nHarris SL, Handleman JS. Age and IQ at intake as predictors of\nplacement for young children with autism: a four- to six-year\nfollow-up study. J Autism Dev Disord. 2000;30:137-142.\nWeiss M. Differential rates of skill acquisition and outcomes of\nearly intensive behavioral intervention for autism. Behav Intern.\n1999;14:3-22.\nSchopler E, Short A, Mesibov G. Relation of behavioral treatment to\n\xe2\x80\x9cnormal functioning\xe2\x80\x9d: comment on Lovaas. J Consult Clin Psychol.\n1989;57:162-164.\nBibby P, Eikeseth S, Martin NT, et al. Progress and outcomes for\nchildren with autism receiving parent-managed intensive interven\xc2\xad\ntions. Res Dev Disabil. 2002;23:81-104.\n\n14.\n\n15.\n\n16.\n\n17.\n18.\n19.\n\n20.\n21.\n\n22.\n23.\n\n24.\n\n25.\n\n26.\n27.\n\nSmith T, Buch GA, Gamby TE. Parent-directed, intensive early\nintervention for children with pervasive developmental disorder. Res\nDev Disabil. 2000;21:297-309.\nHoward JS, Sparkman CR, Cohen HG, Green G, Sanislaw HA.\nComparison of intensive behavior analytic and eclectic treatments\nfor young children with autism. Res Dev Disabil. 2005;26:359-383.\nSallows GO, Graupner TD. 
Intensive behavioral treatment for\nchildren with autism: four year outcome and predictors. Am J Ment\nRetard. 2005;110:417-438.\nLord C. Follow-up of two-year-olds referred for possible autism. J\nChild Psychol Psychiatry. 1995;36:1365-1382.\nBayley N. Bayley Scales of Infant Development, 2nd ed. San\nAntonio, TX: The Psychological Corporation; 1993.\nCohen HG. Pyramid building: Partnership as an alternative to\nlitigation. In: Lovaas OI ed. Teaching Individuals with Develop\xc2\xad\nmental Delays: Basic Intervention Techniques. Austin, TX: PROED, 2003:375-386.\nLovaas OI. Teaching Developmentally Disabled Children: The ME\nBook. Austin, TX: PRO-ED; 1981.\nRegion 6 Autism Connection. Early Intensive Behavioral Treatment\n4-way Agreement. Stockton, CA: Region 6 Autism Connection;\n2004.\nSmith T. Discrete trial training in the treatment of autism. Focus\nAutism Relat Disord. 2000;16:86-92.\nMortenson S, Smith T. Quality Control in the Multisite Young\nAutism Project. Paper presented at: Annual Meeting of the\nAssociation for Behavior Analysis; May 1996; San Francisco, CA.\nDavis BJ, Smith T, Donahoe P. Evaluating supervisors in the UCLA\ntreatment model for children with autism: validation of an assess\xc2\xad\nment procedure. Behav Ther. 2002;31:601-614.\nCalifornia Department of Developmental Services. Best Practice\nGuidelines for Screening, Diagnosis, and Assessment. Ethological\nObservation Schedule (ETHOS). Sacramento, CA: California\nDepartment of Developmental Services; 2002.\nStutsman R. Guide for Administering the Merrill-Palmer Scale of\nMental Tests. New York: Hareourt, Brace & World; 1948.\nReynell JK. Reynell Developmental Language Scales. Windsor,\nEngland: Nfer-Nelson; 1990.\n\nCopyright \xc2\xa9 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.\n\n\x0cPet. Reh. App.54\nEarly Intensive Behavioral Treatment\n28.\n29.\n\n30.\n\n31.\n32.\n\nSparrow SS, Balia DA, Cicchetti DV. Vineland Adaptive Behavior\nScales. 
Circle Pines, MN: American Guidance Service; 1984.\nRobinson BF, Mervis CB. Extrapolated raw scores for the second\nedition of the Bayley Scales of Infant Development. Am J Ment\nRetard. 1996;100:666-671.\nAmerican Psychiatric Association. Diagnostic and Statistical Man\xc2\xad\nual of Mental Disorders, 4th ed, Text Revision. Washington, DC:\nAmerican Psychiatric Association; 2000.\nWechsler D. Manualfor the Wechsler Intelligence Scalefor Children,\n3rd ed. San Antonio, TX: Psychological Corporation; 1991.\nNich C, Carroll K. Now you see it, now you don\xe2\x80\x99t: a comparison of\ntraditional versus random-effects regression models in the analysis\nof longitudinal follow-up data from a clinical trial. J Consult Clin\nPsychol. 1997;65:252-261.\n\nS155\n\n33. Lord C, Rutter M, DiLavore PC, et al. Autism Diagnostic\nInterview Schedule. Los Angeles: Western Psychological Services;\n2001.\n34. Steememan P, Meesters C, Muris P. TOM-Test. Antwerpen-Appledoora: Garant; 2003.\n35. Taylor BA, Jasper S. Teaching programs to increase peer inter\xc2\xad\naction. In: Maurice C, Green G, Foxx M eds. Making a Difference:\nBehavioral Intervention for Autism. Austin, TX: Pro-Ed; 2001:\n97-162.\n36. Weiss MJ, Harris SL. Reaching out, joining. In: Teaching Social\nSkills to Young Children with Autism. Bethesda, MD: Woodbine\nHouse; 2001.\n37. Kazdin A. Replication and extension of behavioral treatment of\nautistic disorder. AmJMent Retard. 1993;97:382-383.\n\nCopyright \xc2\xa9 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.\n\n\x0cPet. Reh. App.55\n\nResearch in Developmental Disabilities 35 (2014) 3326-3344\n,\n\nContents lists available afSoonceOiiect\n\nj\n\nResearch in Developmental Disabilities\n\nSft,\n\nm1\n\naiEiim\n\n_________\n\npi CrossMark\nComparison of behavior analytic and eclectic early\nAil\ninterventions for young children with autism after three years\nJane S. Howardab l *, Harold Stanislaw3, Gina Green0, Coleen R. 
Sparkman b,1,\nHoward G. Cohen d\n\na California State University, Stanislaus, Psychology Department, 1 University Circle, Turlock, CA 95382, USA\nb The Kendall Centers/Therapeutic Pathways, Modesto, CA 95354, USA\nc Association of Professional Behavior Analysts, 6977 Navajo Road #176, San Diego, CA 92119, USA\nd Valley Mountain Regional Center, 702 North Aurora St, Stockton, CA 95202, USA\n\nARTICLE INFO\n\nArticle history:\nReceived 16 June 2014\nReceived in revised form 6 August 2014\nAccepted 12 August 2014\nAvailable online\n\nABSTRACT\n\nIn a previous study, we compared the effects of just over one year of intensive behavior\nanalytic intervention (IBT) provided to 29 young children diagnosed with autism with two\neclectic (i.e., mixed-method) interventions (Howard, Sparkman, Cohen, Green, &\nStanislaw, 2005). One eclectic intervention (autism programming; AP) was designed\nspecifically for children with autism and was intensive in that it was delivered for an\naverage of 25-30 h per week (n = 16). The other eclectic intervention (generic\nprogramming; GP) was delivered to 16 children with a variety of diagnoses and needs\nfor an average of 15-17 h per week. This paper reports outcomes for children in all three\ngroups after two additional years of intervention. With few exceptions, the benefits of IBT\ndocumented in our first study were sustained throughout Years 2 and 3. At their final\nassessment, children who received IBT were more than twice as likely to score in the\nnormal range on measures of cognitive, language, and adaptive functioning than were\nchildren who received either form of eclectic intervention. 
Significantly more children in\nthe IBT group than in the other two groups had IQ, language, and adaptive behavior test\nscores that increased by at least one standard deviation from intake to final assessment.\nAlthough the largest improvements for children in the IBT group generally occurred during\nYear 1, many children in that group whose scores were below the normal range after the\nfirst year of intervention attained scores in the normal range of functioning with one or\ntwo years of additional intervention. In contrast, children in the two eclectic treatment\ngroups were unlikely to attain scores in the normal range after the first year of intervention,\nand many of those who had scores in the normal range in the first year fell out of the normal\nrange in subsequent years. There were no consistent differences in outcomes at Years 2 and\n3 between the two groups who received eclectic interventions. These results provide\nfurther evidence that intensive behavior analytic intervention delivered at an early age is\nmore likely to produce substantial improvements in young children with autism than\ncommon eclectic interventions, even when the latter are intensive.\n\xc2\xa9 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC\nBY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).\n\nKeywords:\nAutism\nEarly intervention\nApplied behavior analysis\nEclectic treatment\nOutcomes\nLongitudinal studies\n\n* Corresponding author at: California State University, Stanislaus, Psychology Department, 1 University Circle, Turlock, CA 95382, USA.\nTel.: +1 209 667 3386; fax: +1 209 993 8192.\nE-mail addresses: jhoward@csustan.edu, janeshoward@mac.com (J.S. Howard), hstanislaw@csustan.edu (H. Stanislaw), ggreen@apbahome.net\n(G. Green), csparkman@tpathways.org (C.R. 
Sparkman).\n1 Address: PO Box 5157, Modesto, CA 95352, USA.\nhttp://dx.doi.org/10.1016/j.ridd.2014.08.021\n0891-4222/\xc2\xa9 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).\n\n\x0cPet. Reh. App.56\n\nJ.S. Howard et al./Research in Developmental Disabilities 35 (2014) 3326-3344\n\n3327\n\n1. Introduction\nThe past two decades have seen increased interest in early intervention for children diagnosed with autism spectrum\ndisorder (hereafter, \xe2\x80\x9cautism\xe2\x80\x9d) among researchers, policymakers, funding sources, and consumers. Following publication of\nthe Lovaas study in 1987, a number of researchers began evaluating the effects of intensive, comprehensive early\nintervention using applied behavior analysis (ABA) methods. Various ABA models for treating children with autism have\nbeen proposed, but many behavior analytic researchers agree that genuine early intensive ABA treatment programs have\ncertain key features in common: (a) individualized, comprehensive intervention that addresses all skill domains; (b) use of\nmultiple behavior analytic procedures (not just discrete-trial procedures or \xe2\x80\x9cnaturalistic\xe2\x80\x9d techniques) to build new\nrepertoires and reduce behaviors that interfere with skill acquisition and effective functioning; (c) direction and oversight by\none or more professionals with advanced training in ABA and experience with young children with autism; (d) reliance on\ntypical developmental sequences to guide selection of treatment goals; (e) parents and other individuals trained by behavior\nanalysts to serve as active co-therapists; (f) intervention that is initially one-to-one, transitioning gradually to a group format\nas warranted; (g) intervention that often begins in homes or specialized treatment centers but is also delivered in other\nenvironments, with gradual, systematic transitions 
to regular schools when children develop the skills required to learn in\nthose settings; (h) planned, structured intervention provided for a minimum of 20-30 h per week with additional hours of\ninformal intervention provided throughout most other waking hours, year round; (i) intensive intervention beginning in the\npreschool years and continuing for at least 2 years (Eldevik et al., 2010; Green, Brennan, & Fein, 2002).\nSubstantial research has documented the effectiveness of treatments that incorporate all of the foregoing features. Eight\nprospective studies used comparison- or control-group designs to evaluate some variation of the Lovaas/UCLA model of early\nintensive ABA intervention for children with autism (Cohen, Amerine-Dickens, & Smith, 2006; Eikeseth, Smith, Jahr, &\nEldevik, 2002; Eikeseth, Smith, Jahr, & Eldevik, 2007; Eldevik, Hastings, Jahr, & Hughes, 2012; Eldevik, Eikeseth, Jahr, & Smith,\n2006; Lovaas, 1987; Sallows & Graupner, 2005; Smith, Groen, & Wynn, 2000). In another three studies, the ABA intervention\nwas designed and overseen by professional behavior analysts not affiliated with Lovaas, and the ABA intervention differed\nsomewhat from the Lovaas model (Howard, Sparkman, Cohen, Green, & Stanislaw, 2005; Remington et al., 2007; Zachor,\nBen-Itzchak, Rabinovitch, & Lahat, 2007). Outcomes from those 11 studies varied and some children had larger\nimprovements than others. In the large majority of cases, however, the mean change scores achieved by children receiving\nintensive ABA treatment exceeded the mean change scores for similar children in control or comparison groups who\nreceived less intensive ABA treatment, intensive or non-intensive treatment using a mixture of methods or therapies\n(\xe2\x80\x9ceclectic\xe2\x80\x9d treatment), or \xe2\x80\x9ctreatment as usual\xe2\x80\x9d (i.e., standard early intervention or special education services). 
Additionally,\ncompared to children who received other types of treatment, children who received early intensive ABA treatment were\nmore likely to achieve post-treatment scores on one or more standardized measures that were in the normal range, and were\nmore often placed in regular classrooms (for reviews and analyses, see Eikeseth, 2009; Eldevik et al., 2009, 2010; Green,\n2011; National Autism Center, 2009; Reichow & Wolery, 2009; Rogers & Vismara, 2008).\nDespite the evidence from multiple studies and meta-analyses favoring intensive ABA treatment for autism over other\nmodels of early intervention, a number of questions persist. One is whether other types of early intervention delivered with\ncomparable intensity and individualization can produce outcomes comparable to ABA. Perhaps the most common\nalternative early intervention approach involves a mixture of methods drawn from ABA, speech-language pathology,\noccupational therapy (especially sensory integration techniques), developmental psychology, and autism-specific\napproaches. That model, which has been characterized as \xe2\x80\x9ceclectic\xe2\x80\x9d intervention, is widely available in the United States\nand elsewhere.\nAt least three studies have compared eclectic and ABA interventions directly. Eikeseth et al. (2002) studied children with\nautism who entered treatment at ages 4-7 years (M = 5.5 years), slightly older than children in most of the other studies of\nearly intensive behavioral intervention. One group (n = 13) received Lovaas-model ABA treatment for 28 h per week, while a\nsecond group (n = 12) received eclectic intervention for 29 h per week. There were no significant differences between the\ngroups when treatment began. Both forms of treatment were delivered in public school classrooms. 
After 1 year, the ABA\ntreatment group had gained an average of 17 points on IQ test scores, 13 points on tests of language comprehension, 27\npoints on tests of expressive language, and 11 points on an adaptive behavior scale. The eclectic treatment group had average\ngains of only 4 points on IQ tests and 1 point on language tests, and no change in adaptive behavior. A follow-up study\nconducted when those children were 8 years old found that after about 3 years of treatment, the ABA treatment group had\ngained an average of 25 IQ points and 9-20 points on adaptive behavior scales in comparison to baseline. The eclectic\nintervention group had a mean gain of only 7 points on IQ tests, and declines of 6-12 points on adaptive behavior\nassessments (Eikeseth et al., 2007).\nA study we published previously involved a comparison of intensive ABA intervention with two different eclectic\nintervention models (Howard et al., 2005). Twenty-nine preschool children with autism received early intensive behavior\nanalytic intervention (IBT), 16 received intensive eclectic intervention designed for children with autism (designated the\nautism programming, or AP, group), and an additional 16 received typical non-intensive, eclectic early intervention services\n(designated the generic programming, or GP, group). All children began intervention prior to 48 months of age and received\ntreatment for an average of 14 months. They were placed in treatment groups on the basis of parental preferences and\neducation team decisions, and evaluated pre-treatment and annually thereafter by professionals who were neither involved\nin nor employed by any of the treatment programs. The three groups were shown to be similar on key variables when\n\n\x0cPet. Reh. App.57\n\ntreatment began. 
After 14 months of intervention, mean scores on standardized tests of intellectual, communication, and\nadaptive skills were significantly higher for children in the IBT group than for children in the other two groups. Children in\nthe IBT group had an average standard IQ score of 90, compared to 62 and 69 for children in the AP and GP groups,\nrespectively. Developmental trajectories for most measures accelerated markedly over the 14 months of treatment for\nchildren in the IBT group, while the trajectories for children in the other two groups remained flat or declined.\nFor the present study, we followed children who participated in the 2005 study through an additional 2 years of\ntreatment. We focused on four questions: (a) did Year 1 differences in the cognitive, language, and adaptive behavior scores\nof children in the three groups persist? (b) Did differences in the developmental trajectories of the three groups at Year 1\nchange during Years 2 and 3? (c) How many children in each group had standardized test scores in the normal range after 2\nor 3 years of treatment? (d) To what extent were outcomes at Year 1 correlated with outcomes at Years 2 and 3?\n2. Method\n2.1. Participants\nThe same 61 children who participated in the Howard et al. (2005) study participated in this follow-up. Characteristics of\nthe groups at intake are reported in Howard et al. (2005).2\nAssessments were conducted 1-3 years after treatment began, but not all skill domains were assessed each year with\nevery child. (See Section 3.1 for number of assessments available for each group at intake and at Years 1 -3.) In particular, one\nchild in the GP group and one child in the IBT group did not receive any assessments after the first year of treatment.\nNonetheless, scores for all 61 children were retained for the present analyses to permit evaluations of outcomes that were\nnot included in our 2005 publication.\n2.2. 
Treatments\nInformation about the treatments participants received, school placements, and number of hours and services authorized\nduring Years 2 and 3 was obtained through file review.\n2.2.1. Intensive behavior analytic treatment (IBT)\nThis treatment was designed and delivered by personnel in a California non-public agency that provides ABA services to\nchildren with autism. Treatment was directed by the first author, a Board Certified Behavior Analyst-Doctoral\xc2\xae (BCBA-D\xc2\xae)\nand licensed psychologist, and the fourth author, a licensed speech-language pathologist. Programs were supervised by\nBoard Certified Behavior Analysts\xc2\xae (BCBAs\xc2\xae) and other staff with master\xe2\x80\x99s degrees in psychology or special education and\nsome training in ABA. They were supported by staff who were either Board Certified Assistant Behavior Analysts\xc2\xae (BCaBAs\xc2\xae)\nor who had bachelor\xe2\x80\x99s degrees, most of whom were enrolled in graduate programs in ABA and related areas. Treatments were\ndelivered to children by behavior technicians working under the supervision of the clinical staff. Behavior technicians began\ndelivering treatment only after they had passed competency-based performance evaluations; thereafter, they were directly\nobserved and received written or oral feedback on their implementation of behavior change protocols from their clinical\nsupervisors an average of once or twice each week.\nTo varying degrees, all parents helped support treatment outside of formal treatment hours. Parent training initially\nfocused on teaching instruction-following, promoting spontaneous language, re-directing nonfunctional repetitive behavior,\nmanaging interfering behaviors, and building skills such as toileting, dressing, and independent play. 
Parents were also\ntrained to implement behavior analytic procedures that were designed to increase success in activities relevant to health and\nself-care, such as cooperating with medical and dental care procedures and participating in sports and other community\nactivities.\nTreatment was delivered in multiple settings, including homes, treatment centers, community settings, and regular\npreschool and elementary school classrooms. Treatment protocols utilized the full range of behavior analytic procedures,\ncustomized to each child\xe2\x80\x99s level of functioning, preferences, family circumstances, and treatment goals. Each child received\nan average of 35-40 h of treatment per week. The adult:child ratio during Year 1 was 1:1, but during Years 2 and 3 the ratio\nwas gradually decreased (e.g., to 1:2 or 1:3, and then to one adult per small group of children), depending on progress and\ntreatment targets. For further details, see Howard et al. (2005).\n\n2 While assembling the data for this study, we uncovered several errors in data reported in our 2005 paper. Most were minor (e.g., 1 -month errors in the\nchild\xe2\x80\x99s age), but the baseline scores of one child in the GP group were reported incorrectly as Year 1 scores, and the Year 1 scores of another child in the GP\ngroup were reported as baseline scores. Correcting those errors had virtually no impact on the conclusions that were drawn in the 2005 paper; all 107 of the\nstatistical tests reported as not significant in 2005 remained non-significant, and 40 of the 43 findings that were reported as statistically significant in 2005\nremained so. 
The three exceptions were for group differences that were only marginally significant in the 2005 publication: the difference at intake between\nthe mean nonverbal age equivalents for the AP and GP groups changed from p = 0.04 to p = 0.07; the difference at follow-up between the mean motor\nstandard scores of the IBT group and the two comparison groups changed from p = 0.04 to p = 0.06; and, when the mean self-help skills learning rates before\nand after treatment were compared, the difference between the IBT group and the two comparison groups changed from p = 0.05 to p = 0.07. Revised tables\nreporting all corrections are available as supplementary materials.\n\n\x0cPet. Reh. App.58\n\nData from standardized assessments as well as direct observation and measurement of target behaviors guided decision-making\nabout the distribution of treatment hours across targets and settings. Initial treatment targets focused on\nfoundational repertoires (e.g., attending, imitating vocal and motor sequences, following spoken directions, receptive and\nexpressive labeling, initiating requests, tolerating change, etc.) that are often absent or at low levels in children with autism.\nTreatment targets during Years 2 and 3 generally focused on advanced cognitive, social, play, self-care, academic, and\ncommunication skills (for example, see Fischer, Howard, Sparkman, & Moore, 2009). More complex interactions involving\npeers and siblings generally occurred during Years 2 and 3 than in Year 1. 
On average, children in the IBT group had more\nthan 200 goals on their annual individualized education programs (IEPs).\nWhen children acquired the skills necessary to benefit from small group instruction (e.g., learning through observing the\nbehavior of others, language skills close to the level of instruction, low levels of problem behaviors, independent\ncommunication of basic needs), they were placed in preschool or kindergarten programs for typically developing children for\nup to 15 h per week. Each child was accompanied by a behavior technician who used a variety of behavior analytic\napproaches, including self-management and behavioral contracting procedures, to arrange opportunities to prompt and\nreinforce behavior targets in order to promote skill acquisition and generalization across settings. The clinical supervisor\ndirecting the intervention also provided training and consultation to parents, teachers, and other professionals. Sample\nbehaviors targeted in the regular classrooms included following instructions from classroom teachers and aides, engaging in\nclassroom routines, and interacting with peers. Time spent with typically developing peers was gradually increased based on\nskill acquisition, maintenance and generalization of skills, and level of problem behaviors. Most children did not enter\nkindergarten until age 6.\n2.2.2. Autism programming (AP) and generic programming (GP)\nBrief descriptions of the AP and GP interventions are presented next; for details see Howard et al. (2005). The AP programs\nwere designed specifically for children with autism. Intervention procedures were drawn from the Training and Education of\nAutistic and Related Communication Handicapped Children (TEACCH) approach, sensory integration therapy, commercially\navailable programs (e.g., the Picture Exchange Communication System; Bondy & Frost, 1994), and some behavior analytic\nprocedures, such as discrete-trial procedures. 
Children in this group received an average of 25-30 h of intervention per week\nin public school classrooms with staffing ratios of 1:1 or 1:2. Thus, the AP programs provided eclectic intervention at an\nintensity that was comparable to IBT.\nThe GP intervention was delivered in special education classrooms that served children with a variety of diagnoses and\neducational needs. Programming that was described as \xe2\x80\x9cdevelopmentally appropriate\xe2\x80\x9d and \xe2\x80\x9clanguage rich\xe2\x80\x9d was provided for\nan average of 15-17 h per week, with slightly more hours as children approached age 6. Adult:child ratios averaged 1:6.\nApproximately one third of the children in both the AP and GP treatment groups received \xe2\x80\x9cpull out\xe2\x80\x9d speech therapy\nsessions of less than 30 minutes once or twice a week during Years 1 and 2. About 20% of the children in both groups received\nsome services in general education classrooms, which often included such activities as lunch, physical education, or recess.\nOn average, each child in the AP and GP groups had fewer than 15 goals on his/her annual IEP.\n2.2.3. Summary\nAll of the children in the IBT group and the majority in the two eclectic treatment groups had similar placements and\nprogramming during Years 2 and 3 as in Year 1. Some children changed from one eclectic treatment to the other after Year 1,\nwhile the Year 2 and/or Year 3 intervention was not available for a few AP and GP children. This information is summarized in\nFig. 1, which is similar to a Sankey diagram. Sankey diagrams are used in engineering (see Schmidt, 2008, for an overview)\nand vary the width of each arrow in proportion to the number represented by that arrow. 
Thus, in the GP treatment group,\nthe arrow leading from GP treatment in Year 1 to AP treatment in Year 2 (which represents n = 3 children) is three times as\nwide as the arrow leading from AP treatment in Year 2 back to GP treatment in Year 3 (representing n = 1 child).\n2.3. Design\nWe utilized a between-groups design to compare performances of children in the IBT group with those of children in the\ntwo eclectic treatment groups at intake and at follow-up assessments about 1-3 years later. As reported in Howard et al.\n(2005), the three groups of children were substantially similar on most key variables at intake. The only significant\ndifferences were in mean chronological ages and parents\' education, which were controlled for statistically (see Section 2.3.2\nbelow).\n2.3.1. Dependent measures\nThe principal dependent measures in this study were scores on full-scale IQ tests (cognitive skills), measures of language\ndevelopment, and adaptive behavior scales (composite scores as well as communication, self-help, and social skills scores).\nScores on nonverbal IQ tests, receptive and expressive communication skills assessments, and motor skills were also\nanalyzed. Since these latter skills were often not measured in Year 3, we report the Year 2 scores if Year 3 scores were not\navailable.\nAll intake and follow-up assessments were conducted by experienced, qualified examiners who were not involved in\ntreating any children in any of the groups. Assessments were conducted in the child\xe2\x80\x99s home, in the examiner\xe2\x80\x99s office, at a\n\n\x0cPet. Reh. App.59\n\n[Figure 1 image not reproduced: Sankey-style diagram with columns for the 1st, 2nd, and 3rd years, showing arrows for the AP treatment group and the GP treatment group between AP, GP, and Unknown treatments, with the number of children on each arrow.]\n\nFig. 1. Movement of children between AP and GP treatments by year. 
Children in the IBT treatment group had the same treatment all three years. Agency\nfiles did not report the type of treatment that was received during Years 2 and 3 for some of the children who initially received the AP or GP treatments.\n\nTable 1\nAge (in months) at each assessment, and interval between intake and each subsequent assessment.\n\nMeasure                            IBT             AP              GP              IBT mean minus   AP mean minus\n                                   M       SD      M       SD      M       SD      AP/GP mean       GP mean\nAge at diagnosis                   30.07   5.30    39.31   5.52    34.94   5.18    -7.06            4.38*\nAge at intake testing              30.86   5.16    37.44   5.68    34.75   4.80    -5.23            2.69\nAge at Year 1 follow-up            45.24   5.84    50.69   5.64    49.06   5.64    -4.63            1.63\nAge at Year 2 follow-up            57.64   5.30    63.21   5.86    62.23   6.15    -5.10            0.98\nAge at Year 3 follow-up            69.24   5.01    74.33   5.98    73.46   6.10    -4.69            0.87\nMonths between intake and Year 1   14.31   2.22    13.25   2.84    14.31   2.44    0.53             -1.06\nMonths between intake and Year 2   27.05   1.91    25.36   1.82    26.85   3.11    0.97             -1.49\nMonths between intake and Year 3   37.90   2.98    37.13   2.36    38.46   2.30    0.15             -1.33\n\n* p < 0.05.\n** p < 0.01.\n\nschool, or in the settings of local non-profit entities (Regional Centers) that contracted with the state to manage services to\npersons with developmental disabilities. As reported in Howard et al. (2005), Year 1 testing occurred an average of 14.3\nmonths after intake. Thereafter, parents of all children were contacted annually to determine if they were interested in\nhaving their children participate in follow-up assessments. Table 1 shows the mean ages of the groups at each assessment\nand the intervals between assessments. 
On average, Year 2 testing occurred 23-34 months after intake (M = 27.0 months), and Year 3 assessments occurred 31-43 months after intake (M = 37.9 months).
The examiners selected standardized tests of cognitive skills, language skills, and adaptive behavior that were suited to each child’s age and level of functioning. Howard et al. (2005) described the instruments used at intake and at Year 1. After Year 1, adaptive behavior was assessed using the Vineland Adaptive Behavior Scales (VABS). Nonverbal IQ was assessed using the Merrill-Palmer Scales of Development (although the Leiter International Performance Scale was used for one child in the IBT group in Year 3). Full-scale IQ was typically assessed after Year 1 using the developmentally appropriate Wechsler instrument, either the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III or WPPSI-Revised) or the Wechsler Intelligence Scale for Children (WISC-III or WISC-IV). However, one child in the IBT group was administered the Stanford-Binet Intelligence Scale (the 4th edition in Year 2 and the 5th edition in Year 3), and in Year 3 two children in the IBT group were administered the Differential Ability Scales, one IBT child was administered the Slosson Intelligence Test-Revised, and one IBT child was administered the Woodcock-Johnson Tests of Cognitive Abilities III. Receptive and expressive language skills were assessed using a variety of instruments. 
The most common was the Reynell Developmental Language Scales. Others included the Receptive One-Word Picture Vocabulary Test, the Expressive One-Word Picture Vocabulary Test, the Peabody Picture Vocabulary Test (3rd edition), the Expressive Vocabulary Test, and the Sequenced Inventory of Communication Development-Revised.
Measures for which developmental equivalents were available were converted to developmental quotients (DQs) for analysis using the formula DQ = 100 × developmental equivalent (months)/chronological age (months). When all children are the same age, there is no statistical difference between analyzing standard scores (SSs), developmental equivalents, and DQs. Unlike the other two measures, however, DQs allow valid comparisons to be made among children who have different chronological ages at the same assessment time, and automatically compensate for different intervals between assessment times (cf. Delmolino, 2006; Lord & Schopler, 1989).

Pet. Reh. App.60

2.3.2. Statistical analyses
As in our original study, statistical analyses focused on comparing the mean scores of children in the IBT group with those of children in the AP and GP groups; comparing the mean scores of children in the AP group with those of children in the GP group was of secondary interest. Accordingly, in this study we used the same multiple regression approach we employed in Howard et al. (2005). One term in the regression equation was a contrast that compared mean scores of the children in the IBT group with mean scores of the children in the AP and GP groups, while a second contrast term (orthogonal to the first) compared the mean scores of children in the AP group with the mean scores of children in the GP group. 
Both contrasts were tested simultaneously, together with two covariates (chronological age at diagnosis and parents’ mean years of education) to control for group differences in the covariates.
Separate multiple regression analyses were performed for each of the four assessment times (intake, Year 1, Year 2, and Year 3). Repeated measures analyses examining all four assessment times at once were precluded because not all children were assessed at every follow-up. Restricting the analyses to children with complete assessment records would have eliminated more than half of the children from some analyses. Trends over the 3-year course of treatment were examined by using paired t-tests to compare each child’s score at one assessment with his or her score at the following assessment.
For every dependent measure, we also determined whether each child achieved a favorable outcome. This was defined as a DQ or SS within the normal range of functioning (i.e., 85 or higher), or a DQ or SS that was at least 15 points (1 standard deviation) higher at the final assessment (Year 2 or Year 3) than at intake. This definition is logically similar to the reliable change index proposed by Jacobson and Truax (1991) for evaluating the effects of treatments. Chi-square tests were used to determine whether the percentage of children with a favorable outcome differed by treatment group, with a separate analysis conducted for each dependent measure.
3. Results
3.1. Ages and assessment times
The assessment chronology for all three groups is summarized in Table 1. Cells in the first five rows include descriptive statistics on chronological ages at diagnosis and at each assessment time. Data in the bottom three rows describe elapsed time between intake and later assessments. Data in the two rightmost columns represent comparisons of group means; asterisks indicate statistically significant differences. 
These data indicate that, at diagnosis and every subsequent\nassessment, the average child in the IBT group was younger than the average child in either comparison group; those\ndifferences were statistically significant. There was also a statistically significant difference between the mean ages of the AP\nand GP children at diagnosis, but not at any of the later assessments.\n3.2. Analyses of standard scores and developmental quotients\nTable 2 presents descriptive statistics and analyses of assessments of cognitive and adaptive skills for each group.\nAdaptive behavior scores (communication, social, and self-help skills) are expressed as developmental quotients (DQs),\nwhile cognitive skills scores and the composite adaptive behavior measure are expressed as standard scores (SSs). For each of\nthe five measures, cells in the first four rows under each group\xe2\x80\x99s column list descriptive statistics from each assessment time;\nresults of statistical comparisons of group means at each assessment time are shown in the two rightmost columns. All\ncomparisons controlled for the child\xe2\x80\x99s age at diagnosis and the parents\xe2\x80\x99 years of education. Asterisks indicate statistical\nsignificance. As shown in the two rightmost columns, all Year 1 and Year 2 mean SSs and DQs were significantly higher for the\nIBT group than for the two comparison groups combined. There were no other statistically significant between-group\ndifferences; the mean scores for the IBT group and the two comparison groups combined did not differ significantly at intake,\nand the mean scores of the AP and GP groups did not differ significantly from each other at intake or at any of the other\nassessment times on any measure.\nThe cells in the bottom three rows for each of the five measures in Table 2 summarize changes in mean scores between\nsuccessive assessments (intake to Year 1, Year 1 to Year 2, and Year 2 to Year 3). 
Asterisks denote statistically significant improvements (positive values) or declines (negative values) from one year to the next. The IBT group had statistically significant improvements on all measures from intake to Year 1. The AP group had a statistically significant improvement on the cognitive skills SS from intake to Year 1, and statistically significant declines in the self-help DQ and adaptive behavior composite SS from Year 1 to Year 2. The GP group had a statistically significant improvement on the social skills DQ from Year 1 to Year 2. No other changes were statistically significant.
The cells in the penultimate column in the bottom three rows for each measure in Table 2 represent comparisons of the mean change scores of the IBT group and the AP and GP groups combined. Asterisks indicate statistically significant differences in change scores between intake and Year 1 on all measures in favor of the IBT group. The cells in the rightmost column in the bottom three rows for each measure represent comparisons of the mean change scores between the AP and GP groups. Statistically significant differences were seen in the changes in self-help DQ, social DQ, and adaptive behavior composite SS from Year 1 to Year 2; in each case, the mean score declined for the AP group but increased for the GP group.

Pet. Reh. App.61

Table 2
Cognitive and adaptive skills scores at intake and Years 1-3, changes between successive assessments, and differences between groups.
[Table 2 is printed on a rotated page; its cell values are not legible in this copy.]
(a) Age at diagnosis is a significant covariate (p < 0.05).
* p < 0.05.
** p < 0.01.

Table 3, which is formatted similarly to Table 2, presents nonverbal IQ, language, and motor skills DQs by group. Those skills were not assessed consistently after Year 1: some children were reassessed only after 2 years of treatment, while others were reassessed only after 3 years of treatment. Accordingly, Table 3 shows statistics for only three points in time: at intake; one year after intake; and either three years after intake or, if no assessment was made at Year 3, two years after intake. For the IBT group, mean DQs on all measures except motor skills (which were in the normal range at intake) increased significantly from intake to Year 1. Expressive language skills also improved significantly from Year 1 to Year 2/3, while the motor skills mean DQ showed a decline over that same interval. For the AP group, the only statistically significant change was a decline in the mean motor skills DQ from Year 1 to Year 2/3. Over that same interval, the GP group’s mean expressive language DQ increased significantly, but no other statistically significant changes were found for that group. The penultimate right-hand column of Table 3 again shows several statistically significant differences in change scores favoring the IBT group over the two combined eclectic groups. As indicated in the final column, the only statistically significant difference between the AP and GP groups was on the motor skills DQ assessment at Year 2/3, with children in the AP group scoring lower, on average, than children in the GP group.
The group mean data described in Table 2 are represented graphically in Figs. 2-6, which also show individual scores and the numbers of children in each group whose scores were in the normal range (85 and above) on each measure at each assessment. Data from assessments conducted at intake and Year 1 mirror those reported in our previous paper (Howard et al., 2005): Mean scores on all measures for all three groups were comparable at intake, and the IBT group had significantly higher mean scores one year later than did the AP and GP groups. As discussed above, change scores indicated that, on average, children in the IBT group improved more than children in the other two groups after one year of treatment.

Pet. Reh. App.62

Table 3
Nonverbal IQ, language, and motor skills scores at intake, 
Year 1, and Year 2 or 3, changes between assessments, and differences between groups.
Measure          Assessment          IBT treatment group   AP treatment group    GP treatment group    IBT mean minus  AP mean minus
                                     n   M         SD      n   M         SD      n   M         SD      AP/GP mean      GP mean
Non-verbal (DQ)  Intake              20  80.44     15.23   16  67.00     12.06   11  76.65     13.37   9.51            -9.65
                 Year 1 (b)          24  101.04    22.44   16  73.60     18.27   15  81.08     18.72   23.82**         -7.47
                 Year 2/3            24  98.05     24.61   15  69.33     17.92   14  82.20     21.74   22.50**         -12.87
                 Intake vs Year 1    20  20.31**   13.12   16  6.61      14.97   11  2.42      13.42   15.41**         4.19
                 Year 1 vs Year 2/3  21  -2.20     20.81   15  -2.10     8.65    13  2.78      11.49   -2.36           -4.87
Receptive (DQ)   Intake (b)          29  48.79     5.80    16  45.44     20.87   15  47.29     13.59   2.45            -1.85
                 Year 1              26  71.23     24.42   15  51.39     21.97   14  51.95     19.46   19.57*          -0.55
                 Year 2/3            25  74.46     27.25   15  49.53     25.08   13  60.31     18.75   19.93*          -10.78
                 Intake vs Year 1    26  22.53**   22.68   15  5.27      18.31   13  2.77      11.96   18.42**         2.50
                 Year 1 vs Year 2/3  24  2.18      16.47   14  -0.27     12.39   12  4.39      10.49   0.31            -4.66
Expressive (DQ)  Intake              29  49.73     17.13   16  43.90     16.34   15  50.20     12.16   2.78            -6.30
                 Year 1              26  69.24     24.79   15  47.31     23.20   14  48.08     14.35   21.56*          -0.77
                 Year 2/3            26  83.25     22.08   15  47.98     29.88   13  62.07     23.83   28.73*          -14.10
                 Intake vs Year 1    26  20.46**   18.56   15  3.42      22.36   13  -2.84     12.29   19.95*          6.26
                 Year 1 vs Year 2/3  24  10.40*    13.63   14  1.51      17.11   12  12.34*    17.75   3.90            -10.82
Motor (DQ)       Intake              28  94.65     13.09   16  89.55     17.50   14  86.96     13.34   6.31            2.59
                 Year 1              26  97.30     12.24   16  85.08     14.74   16  85.62     13.62   11.95*          -0.54
                 Year 2/3            25  90.17     13.24   12  74.00     12.64   14  86.31     15.83   9.54            -12.30*
                 Intake vs Year 1    25  0.63      12.82   16  -4.46     18.23   14  0.83      18.18   2.63            -5.29
                 Year 1 vs Year 2/3  23  -8.44     13.54   12  -9.82*    18.05   14  1.97      20.57   -4.97           -11.78
(b) Mean parental years of education is a significant covariate (p < 0.05).
* p < 0.05.
** p < 0.01.

Fig. 2. Cognitive SSs at intake and 1-3 years later. Each dot represents the score for an individual child at that assessment time. Black dots indicate children who received their original treatment at the time of testing; white dots indicate children in the AP group who received GP treatment in the year preceding assessment, or children in the GP group who received AP treatment prior to assessment. Gray dots indicate children whose treatment prior to assessment was not recorded. Scores in the gray region of each panel are in the normal range (85 or higher). The lines in each panel connect the group mean scores at each assessment. The vertical bars in the lower right panel extend ±1 standard error around each group mean.

Pet. Reh. App.63

Fig. 3. Communication DQs at intake and 1-3 years later. See Fig. 2 caption for details.

After the first year of treatment, the sharply accelerated trajectory for the IBT group relative to the two other groups did not continue for any measure except the social skills DQ, which increased again from Year 1 to Year 2 before leveling off (Fig. 5). The mean cognitive skills SS for the IBT group remained stable from Year 1 to Year 3 (Fig. 2), while the mean communication skills DQ, self-help skills DQ, and adaptive skills composite SS declined slightly (Figs. 3, 4 and 6, respectively). The mean scores for the GP and AP groups either increased slightly or declined from Year 1 to Year 3 on all measures except social skills DQs, which increased for the GP group (Fig. 
5).
In general, the gaps that emerged between the means of the IBT group and the other two groups after one year of treatment remained fairly constant or expanded in favor of the IBT group in Years 2 and 3 (see the lower right-hand panels of Figs. 2-6). Although the mean scores for the children in the IBT group were higher than those of the children in the eclectic treatment group three years after intake, those differences were not statistically significant (see Tables 2 and 3). With one possible exception, that was not because children in the IBT group regressed or because those in the AP and GP groups improved substantially; rather, it was because some children lacked 3-year followup assessments, reducing the Year 3 sample sizes and precluding the detection of statistically significant differences among the group mean scores. The exception was the mean motor skills DQ for the IBT group, which declined slightly from intake to Year 2/3 but remained in the normal range. The AP group's mean motor skills DQs also declined over the course of treatment; that decline was statistically significant and resulted in a Year 3 mean that was below normal (see Table 3).
Given the large improvements in the IBT group after one year of treatment, it may seem surprising that continued treatment did not produce further large gains on most measures; rather, most mean scores remained stable or declined slightly in Years 2 and 3. That finding should be interpreted with caution, however, and in relation to the results for the other two groups. For example, the mean cognitive skills SS for the IBT group was in the normal range after one year of treatment, so further large increases were unlikely. The mean adaptive skills composite SS for the IBT group fell slightly over the course of treatment, but the means for the two other groups fell even more. 
One plausible explanation for the apparent declines in the mean VABS composite scores is that the programming for these young children emphasized skills other than those assessed by the VABS.
3.3. Analyses of outcomes by type of treatment
Additional analyses were conducted to ascertain the proportions of children in each group who achieved clinically important outcomes by the end of treatment, and the likelihood that each type of treatment would produce such outcomes.

Pet. Reh. App.64

Fig. 4. Self-help DQs at intake and 1-3 years later. See Fig. 2 caption for details.

Table 4 shows the percentage of children in each group who had final (Year 2 or 3) scores in the normal range (i.e., ≥85; third column), final scores that were at least one standard deviation (≥15 points) higher than their intake scores (fifth column), and either of those favorable outcomes (penultimate column). Columns immediately to the right of each of those show odds ratios and probability ratios. 
To illustrate the odds ratio statistic, consider the data in the fourth column for the cognitive SS. For the IBT group, 61% had a final score ≥85 on that measure; the odds of achieving that favorable outcome were 0.607/(1 - 0.607) = 1.545. For the children in the AP and GP groups combined, 25% had final scores ≥85; the odds of this outcome were 0.250/(1 - 0.250) = 0.333. The ratio of those two odds is 1.545/0.333 = 4.64. This odds ratio of 4.64 is greater than the “neutral” value of 1, indicating that a favorable outcome on the cognitive SS was attained more often by children in the IBT group than by children in the two other groups combined. A likelihood ratio test, which is similar to a chi-square test, confirmed this difference as statistically significant. An odds ratio of 4.64, however, does not signify that children in the IBT group were 4.64 times more likely to have a favorable outcome than children in the AP and GP groups. Such an estimate is better provided by the probability ratio, which is shown in parentheses below each odds ratio in Table 4. The probability ratio for the cognitive SS example is 0.607 (the probability of a final score ≥85 for children in the IBT group) divided by 0.250 (the probability of a final score ≥85 for children in the AP and GP groups combined) = 2.43, indicating that children in the IBT group were 2.43 times more likely to achieve final cognitive SSs in the normal range than were children in the other two groups combined. Probability ratios are more readily interpreted than odds ratios, but statistical tests for group differences utilize odds ratios.
Table 4 shows that the overwhelming majority of the odds ratios and probability ratios favored IBT, indicating that clinically important outcomes as defined here were far more likely to be attained by children who received IBT than by children who received either of the other two treatments. 
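The odds ratio and probability ratio arithmetic worked through above for the cognitive SS can be checked with a short sketch. The helper names below are illustrative and do not come from the study; the inputs 0.607 and 0.250 are the proportions quoted in the text.

```python
# Illustrative sketch only: the helper names are hypothetical, not from the study.


def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds of the outcome in group A divided by the odds in group B."""
    return odds(p_a) / odds(p_b)


def probability_ratio(p_a: float, p_b: float) -> float:
    """Probability of the outcome in group A divided by that in group B."""
    return p_a / p_b


if __name__ == "__main__":
    p_ibt = 0.607    # proportion of IBT children with a final cognitive SS in the normal range
    p_other = 0.250  # same proportion for the AP and GP groups combined

    print(f"odds (IBT):        {odds(p_ibt):.3f}")
    print(f"odds (AP/GP):      {odds(p_other):.3f}")
    print(f"odds ratio:        {odds_ratio(p_ibt, p_other):.2f}")
    print(f"probability ratio: {probability_ratio(p_ibt, p_other):.2f}")
```

Computed from unrounded odds, the odds ratio comes out near 4.63; the 4.64 quoted in the text follows from dividing the rounded odds 1.545 and 0.333. The probability ratio rounds to 2.43 either way.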
The only exception was that final motor DQ scores were unlikely to be at least one standard deviation above the intake scores. As noted previously, that was likely due to a ceiling effect, in that the mean motor DQ for the IBT group was in the normal range at intake and stayed there over the course of treatment. Double asterisks in Table 4 show that the advantage for IBT children was more likely to be statistically significant when a favorable outcome was defined as a final score ≥85 than when it was defined as an increase of at least 15 points over intake.
Statistically significant differences between the AP and GP groups emerged only for an increase of 15 points or more over intake for social, motor, and adaptive skills composite scores. For those three measures, the odds of a favorable outcome were higher for the GP group than for the AP group. For the cognitive, receptive, and self-help measures, children in the AP group were more likely to have favorable outcomes than children in the GP group, though none of those differences was statistically significant. Collectively, these analyses suggest that neither of the comparison treatments was likely to result in favorable outcomes, and that combining the AP and GP groups did not mask any important group differences in outcomes.

Pet. Reh. App.65

Fig. 5. Social DQs at intake and 1-3 years later. See Fig. 2 caption for details.

Fig. 7 is a graphic representation of the percentages of children in the IBT group and the combined AP and GP groups who had scores in the normal range at each assessment. At intake, those percentages were comparably small for both groups on all measures except the motor skills DQ, on which fairly large proportions of both groups (57% IBT; 47% AP/GP combined) had scores in the normal range. By the end of treatment, a larger percentage of children in the IBT group than in the AP/GP group had scores in the normal range on all measures except the self-help DQ.
Individuals with final scores that were in the normal range (≥85) or at least one standard deviation above intake scores can be readily identified in Fig. 8. In this figure, each child's score on each measure is plotted as a function of his or her score at intake (on the x-axis) and the change from intake to the final assessment (on the y-axis; the final assessment was made at Year 2 if the child was not assessed at Year 3). Final scores in the normal range appear in the dark gray region of each panel, and scores representing increases of at least one standard deviation over intake are in the light gray regions. Both regions are populated by more children in the IBT group (closed circles) than by children in the other two groups (open symbols). 
That is, more of the children who received IBT had final outcomes that constituted clinically important changes over baseline than did children who received either of the other two treatments.
An important question is whether children in this study who attained normal levels of functioning at any point maintained those levels over the course of treatment. That question is difficult to answer, because only a portion of the children in each group had scores in the normal range at any assessment time, and not all children were assessed at both Year 2 and Year 3. Nevertheless, the question is sufficiently important to merit an attempt to answer it. For this analysis, children were classified into four categories of outcomes: (a) scored <85 one year and remained <85 the next year; (b) scored <85 one year but scored ≥85 the next year (i.e., transitioned to a normal range of functioning); (c) scored ≥85 one year but scored <85 the next year (i.e., regressed); and (d) scored ≥85 one year and remained ≥85 the following year. Those categories were then combined across measures to calculate the probability of each of the four outcomes for each year-to-year assessment transition. Separate calculations were made for children in the IBT group and for children in the combined AP and GP groups.

Pet. Reh. App.66

Fig. 6. Composite adaptive skills SSs at intake and 1-3 years later. See Fig. 2 caption for details.

Results of these analyses are illustrated by the Sankey diagram shown in Fig. 9. In this figure, arrows are not just proportional in width to the quantities they represent; they are also horizontal if they represent children who maintained assessed levels of functioning, they slant upward for children who improved, and they slant downward for children who regressed from one year to the next. The figure should be interpreted cautiously, because it represents data that were collapsed across measures and is based upon other suboptimal manipulations. Nevertheless, several intriguing trends are suggested. One is that most children who moved from below-normal to normal-range functioning did so after one year of treatment. For both groups, the probability of moving into the normal range was higher from intake to Year 1 than from Year 1 to Year 2, or from Year 2 to Year 3 (indicated by the upward-slanting arrows in Fig. 9). Stated differently, the prospect of achieving scores in the normal range diminished with each additional year of treatment, but the likelihood of scoring in the normal range was substantially and consistently higher for children in the IBT group than for children in the AP/GP groups combined at all three years post-intake (as shown by the percentages in the upward-slanting arrows). 
For children in the AP/\nGP group, if a score >85 was not attained after one year of treatment, the prospects for attaining a normal score were\nextremely dim.\nA second general trend, confirming analyses presented in preceding tables and figures, is that children in the IBT group\nwere far more likely to score in the normal range at all three post-intake assessments than were children in the two\ncomparison groups. Further, percentages shown in the upward slanting arrows indicate that children in the IBT group were\nmore than three times as likely as children in the AP and GP groups to have scores that moved them from the below-normal\nto the normal range at Years 1-3. That advantage was not limited to Year 1 scores; it remained relatively consistent\nthroughout all three years of the study.\nA final trend, illustrated by the downward slanting arrows in Fig. 9, is that regressions from normal to below-normal\nrange scores were much more common for children in the AP/GP group than for children in the IBT group. In fact, children in\nthe AP/GP group were 3.45 times as likely to regress as to advance during the first year of treatment, 4.45 times more likely to\nregress than advance during the second year of treatment, and 4.91 times more likely to regress than advance in the third\nyear of treatment. The opposite pattern was seen for children in the IBT group, where advancements were 2.48 times as likely\nas regressions during the first year of treatment. Advancements and regressions occurred about equally often between Year 1\nand Year 2 for the IBT group (the ratio was 1.08 in favor of advancements), but in the third year of treatment advancements\nwere 1.75 times as frequent as regressions. Collectively, these findings suggest that children who received IBT were much\nmore likely to attain and maintain normal levels of functioning than were children who received either of the other\ntreatments.\n\n\x0c
Table 4\nPercent of children with favorable outcomes, and odds ratios and probability ratios for each measure.\nMeasure | Group | Final score >85 | Odds ratio (probability ratio) | Final score >15 points above intake | Odds ratio (probability ratio) | Either desirable outcome | Odds ratio (probability ratio)\nCognitive (SS) | IBT | 61% (n = 28) | 4.64 (2.43) | 81% (n = 27) | 8.00** (2.30) | 82% (n = 28) | 8.78** (2.39)\n | AP/GP combined | 25% (n = 32) | | 35% (n = 31) | | 34% (n = 32) |\n | AP | 25% (n = 16) | 1.00 (1.00) | 38% (n = 16) | 1.20 (1.13) | 38% (n = 16) | 1.32 (1.20)\n | GP | 25% (n = 16) | | 33% (n = 15) | | 31% (n = 16) |\nNon-verbal (DQ) | IBT | 85% (n = 27) | 8.40** (2.10) | 60% (n = 20) | 3.00 (1.80) | 85% (n = 27) | 6.52** (1.82)\n | AP/GP combined | 41% (n = 32) | | 33% (n = 27) | | 47% (n = 32) |\n | AP | 31% (n = 16) | 0.45 (0.63) | 31% (n = 16) | 0.80 (0.86) | 44% (n = 16) | 0.78 (0.88)\n | GP | 50% (n = 16) | | 36% (n = 11) | | 50% (n = 16) |\nReceptive (DQ) | IBT | 26% (n = 27) | 5.08* (4.02) | 78% (n = 27) | 8.17** (2.59) | 85% (n = 27) | 10.45** (2.40)\n | AP/GP combined | 6% (n = 31) | | 30% (n = 30) | | 35% (n = 31) |\n | AP | 6% (n = 16) | 0.93 (0.94) | 31% (n = 16) | 1.14 (1.09) | 38% (n = 16) | 1.20 (1.13)\n | GP | 7% (n = 15) | | 29% (n = 14) | | 33% (n = 15) |\nExpressive (DQ) | IBT | 46% (n = 28) | 5.85 (3.60) | 82% (n = 28) | [illegible] (2.46) | 82% (n = 28) | [illegible] (2.55)\n | AP/GP combined | 13% (n = 31) | | 33% (n = 30) | | 32% (n = 31) |\n | AP | 13% (n = 16) | 0.93 (0.94) | 31% (n = 16) | 0.82 (0.88) | 31% (n = 16) | 0.91 (0.94)\n | GP | 13% (n = 15) | | 36% (n = 14) | | 33% (n = 15) |\nCommunication (DQ) | IBT | 36% (n = 28) | 3.89* (2.86) | 68% (n = 28) | 2.25 (1.40) | 75% (n = 28) | 3.00* (1.50)\n | AP/GP combined | 13% (n = 32) | | 48% (n = 31) | | 50% (n = 32) |\n | AP | 13% (n = 16) | 1.00 (1.00) | 38% (n = 16) | 0.40 (0.63) | 38% (n = 16) | 0.36 (0.60)\n | GP | 13% (n = 16) | | 60% (n = 15) | | 63% (n = 16) |\nSelf-help (DQ) | IBT | 11% (n = 28) | 3.72 (3.43) | 39% (n = 28) | 1.36 (1.22) | 43% (n = 28) | 1.65 (1.37)\n | AP/GP combined | 3% (n = 32) | | 32% (n = 31) | | 31% (n = 32) |\n | AP | 0% (n = 16) | 0.00 (0.00) | 38% (n = 16) | 1.65 (1.41) | 38% (n = 16) | 1.80 (1.50)\n | GP | 6% (n = 16) | | 27% (n = 15) | | 25% (n = 16) |\nSocial (DQ) | IBT | 54% (n = 28) | 4.12* (2.45) | 67% (n = 27) | 2.77 (1.59) | 71% (n = 28) | 3.65* (1.76)\n | AP/GP combined | 22% (n = 32) | | 42% (n = 31) | | 41% (n = 32) |\n | AP | 13% (n = 16) | 0.31 (0.40) | 25% (n = 16) | 0.22* (0.42) | 25% (n = 16) | 0.26 (0.44)\n | GP | 31% (n = 16) | | 60% (n = 15) | | 56% (n = 16) |\nMotor (DQ) | IBT | 57% (n = 28) | 1.51 (1.22) | 19% (n = 27) | 0.91 (0.93) | 57% (n = 28) | 1.51 (1.22)\n | AP/GP combined | 47% (n = 32) | | 20% (n = 30) | | 47% (n = 32) |\n | AP | 31% (n = 16) | 0.27 (0.50) | 0% (n = 16) | 0.00** (0.00) | 31% (n = 16) | 0.27 (0.50)\n | GP | 63% (n = 16) | | 43% (n = 14) | | 63% (n = 16) |\nComposite (SS) | IBT | 36% (n = 28) | 8.33** (5.71) | 16% (n = 25) | 1.65 (1.55) | 36% (n = 28) | 5.37* (3.81)\n | AP/GP combined | 6% (n = 32) | | 10% (n = 29) | | 9% (n = 32) |\n | AP | 0% (n = 16) | 0.00 (0.00) | 0% (n = 16) | 0.00* (0.00) | 0% (n = 16) | 0.00* (0.00)\n | GP | 13% (n = 16) | | 23% (n = 13) | | 19% (n = 16) |\n* Odds ratio differs significantly from 1 (p < 0.05).\n** Odds ratio differs significantly from 1 (p < 0.01).\n\n4. Discussion\n4.1. Differential treatment outcomes\nOur 2005 study evaluated outcomes for 61 children with autism who received just over one year of either IBT or one of\ntwo eclectic interventions. Although the three groups were similar at intake, children who received IBT had significantly\nhigher mean scores after one year of treatment than those who received eclectic interventions. The present study extended\n\n\x0cPet. Reh. 
App.68\n\n[Figure 7: panels Cognitive SS, Non-verbal DQ, Receptive DQ, Expressive DQ, Communication DQ, Self-Help DQ, Social DQ, Motor DQ, and Composite SS; bars: % scoring in the normal range and % scoring <85, shown separately for IBT and AP/GP; x-axis: Length of treatment (None (intake), 1 year, 2 years, 3 years, or 2-3 years)]\nFig. 7. Percent of children in each treatment group with a score in the normal range (SS or DQ >85) at intake and 1-3 years after intake.\n\nthose findings by showing that the largest gains generally occurred in the first year of treatment and in IBT children only, and\nthat the advantage experienced by IBT children after one year of treatment was maintained throughout the second and third\nyears of treatment. 
Indeed, three years after treatment began, mean scores on standardized assessments of cognitive,\nlanguage, adaptive, and motor skills were higher for children in the IBT group than they were for children in the eclectic\nintervention groups.\nAt their final assessment, 61% of the children who received IBT tested within the average range of cognitive functioning,\ncompared with only 25% of the children who received eclectic treatment. That is, children in the IBT group were more than\ntwice as likely to attain a cognitive skills score in the normal range as children in the two eclectic intervention groups. Final\n\n\x0cPet. Reh. App.69\n\n[Figure 8: scatterplot panels, one per measure; x-axis: Score at intake; y-axis: Change in score from intake to final assessment; markers: AP, GP, IBT]\nFig. 8. Scores for individual children on each measure, plotted as a function of the value at intake along the x-axis, and the change from intake to Year 3 (or\nYear 2 if the child was not assessed at Year 3) along the y-axis. Scores of children in the IBT group are shown as solid circles, scores of children in the AP group\nare represented by open triangles, and scores of children in the GP group are shown as open squares. 
Final scores in the normal range (>85) appear in the\ndark gray region of each panel, and final scores <85 but at least 15 points higher than at intake (i.e., above the dotted line in each panel) appear in the light\ngray region of each panel.\n\nassessment scores on other measures showed similar patterns. Compared to children who received eclectic interventions,\nchildren who received IBT were twice as likely to score in the normal range on the final assessment of nonverbal skills,\napproximately three times as likely to score in the normal range on the final assessments of communication and adaptive\nskills, approximately four times as likely to score within the normal range on the final assessments of receptive and\nexpressive communication skills, and almost six times more likely to have a final adaptive behavior skills composite score\nwithin the normal range.\nAs they were at Year 1, average outcomes at Years 2 and 3 were worse for children in the AP and GP groups than for\nchildren in the IBT group, while average outcomes for the two eclectic intervention groups did not differ significantly from\neach other. The mean score for the GP group was higher than the mean score for the AP group on some measures in some\nyears, but there were no statistically reliable differences between outcomes produced by the two eclectic treatments.\nAdditionally, both eclectic treatments performed substantially worse than IBT in producing standardized test scores in the\nnormal range of functioning, and neither eclectic treatment was more likely than the other to produce a favorable outcome.\nThe results for the AP intervention might be surprising to some readers because that intervention was intensive and designed\nspecifically for children with autism. Despite these features, no child from the AP group scored in the normal range on the\nfinal assessment of adaptive functioning. 
In contrast, more than one-third of the children in the IBT group achieved a normal-range\nscore on the final assessment of adaptive skills. These findings are especially important given the critical contribution\nof adaptive skills to independent functioning throughout the lifespan.\nAlthough scores in the normal range are certainly desirable outcomes, so are other clinically significant improvements.\nChanges in test scores that do not reach the normal range may nonetheless reflect the acquisition of many skills that enhance\nindependent functioning, which in turn produces economic savings due to reduced need for specialized services (Jacobson,\nMulick, & Green, 1998; Motiwala, Gupta, Lilly, Ungar, & Coyte, 2006). About one-third of the children in this study who\nreceived AP or GP interventions had final scores on tests of cognitive or adaptive skills that were at least 15 points higher than\n\n\x0cPet. Reh. App.70\n\n[Figure 9: Sankey diagram; rows: IBT children, AP and GP children; columns: Baseline, Year 1, Year 2, Year 3; regions: score is 85 or higher, score is below 85; arrows labeled with transition percentages]\nFig. 9. Percentages of children who scored in the normal range (gray region) or below 85 (white region) at each assessment time, and who transitioned from\none of those ranges to another on successive assessments. Horizontal arrows indicate maintenance of scores in the normal range (gray arrows) or in the\nbelow-normal range (white arrows). 
Upward-slanting arrows indicate changes from below-normal to normal-range scores, and downward-slanting arrows\nrepresent changes from normal to below-normal range scores.\n\ntheir intake scores, suggesting that those interventions may produce some benefit for some children with autism. Children in\nthe IBT group, however, were more than twice as likely as children in the other two groups to show changes of that\nmagnitude over the course of treatment. Differences on most other measures were somewhat smaller but equally clear and\nin the same direction. Motor skills scores were an exception, as they were somewhat more likely to increase by at least 15\npoints among children in the AP and GP groups than among children in the IBT group. However, that difference was not\nstatistically significant.\nThe multiple regression approach we used for most of our statistical analyses accommodated individual differences (e.g.,\nin parental education and age at diagnosis), but of course those analyses focused on group data. Group comparisons are\nappropriate for determining which of two or more treatments is generally most effective; however, we urge caution in\nrelying exclusively on group statistics to prognosticate about individual children. It is clear from the individual data\npresented here (Figs. 2-6) that not all children within each treatment group responded similarly to that treatment. Research\ncorrelating child characteristics with differential outcomes might help identify categories of children who are more or less\nlikely to respond well to a given treatment on average, but more precise information about the effects of treatments on\nindividuals with varying characteristics could be gleaned from studies using single-case research designs, perhaps in\ncombination with elements of between-groups designs (Green, 2008; Guyatt et al., 2008; Larson, 1990; Morgan & Morgan,\n2001; Powers et al., 2006). 
Research methods that focus on changes in individual behavior with treatment could also enable\nanalyses of the differential effectiveness of elements of multicomponent treatments like IBT (e.g., Heyvaert, Maes, Van den\nNoortgate, Kuppens, & Onghena, 2012) as well as treatment targets that function as behavioral cusps to bring the individual\xe2\x80\x99s\nbehavior into contact with new contingencies of reinforcement, thereby producing even more widespread behavior change\n(Rosales-Ruiz & Baer, 1997).\n4.2. Changes over the course of treatment\nIn this study, the changes that occurred during the first year of treatment were generally maintained throughout the\nsecond and third years for children in all three groups. Group mean scores in Years 2 and 3 tended to remain within \xc2\xb15 points\nof the corresponding group means at the end of Year 1, with the large differences in favor of IBT after one year largely persisting\nthroughout Years 2 and 3. Other studies comparing IBT with eclectic treatment over similar time periods have produced similar\nfindings (Cohen et al., 2006; Eikeseth et al., 2007). One difference is that the IBT advantage was larger after a mean of 31.4 months\nof treatment than after one year of treatment in the study by Eikeseth et al. (2007). That may be related to the fact that the children\nstudied by Eikeseth et al. were older when they started treatment than the children in our study and the study by Cohen et al.\n\n\x0cPet. Reh. App.71\n\n(2006), but it might also reflect differences in other child characteristics or the treatment packages (e.g., variations in targets,\npriorities, procedures, etc.).\nMeasures of some skill domains in our study deviated from the trends just described. 
For instance, the mean motor and\nself-help scores for the IBT group were higher than those for either eclectic intervention group at the end of Year 1, but the\ndifferences between the final group means on those measures were not statistically significant. That was at least partly due\nto reduced sample sizes at Year 3. It should also be reiterated that motor skills were not delayed substantially for any of the\ngroups at intake, and motor and self-help skills were not among the highest priority treatment targets for many of the\nchildren who received IBT.\nThe fact that most of the largest improvements in the IBT group occurred after one year of treatment might lead some to\nconclude that there is little benefit in extending treatment beyond the first year. Such a conclusion might be warranted if\nthere were compelling evidence to support predictions that improvements would persist if treatment were to end after one\nyear. Our study cannot speak to that hypothesis, because none of the children in the IBT group received just one year of\ntreatment. Nor are we aware of other studies that have tested that hypothesis directly. One group of researchers did,\nhowever, evaluate the performances of 23 young children with autism two years after they had completed a 2-year course of\nIBT (Kovshoff, Hastings, & Remington, 2011). They found that a subgroup of 9 children who had statistically significant\nincreases on tests of cognitive and adaptive skills during treatment maintained those gains after two years with no\ntreatment, but the scores of the other 14 children decreased significantly. Analyses showed that the first subgroup had\nhigher baseline scores and received more intensive treatment than did the second subgroup. 
Although limited, those\nfindings corroborate our clinical observations that terminating IBT prematurely can be detrimental to many children with\nautism.\nEnding IBT after one year might also be justified if it were reasonably certain that extending treatment would be unlikely\nto produce further clinically significant gains. Again, we have found no compelling evidence to support that prediction. On\nthe contrary, some children in our IBT group made marked improvements in Years 2 and 3 (e.g., see the upward-pointing\narrows in Fig. 9). Other researchers have also documented meaningful improvements occurring in the second, third, and\nfourth year of IBT (e.g., Cohen et al., 2006; Eikeseth et al., 2007; Sallows & Graupner, 2005). We speculate that given the\npervasive and substantial skill deficits exhibited by many young children with autism, one and even two years of IBT is not\nlikely to produce gains that will persist over long periods of time without specialized intervention. The first 1-2 years of IBT\nare typically focused on building many basic, foundational skills. Further intensive treatment seems essential for solidifying\nthose repertoires and for building the more complex social, language, and academic skills required to function successfully in\nregular school and community settings.\n4.3. Limitations\nParticipants in this study were not randomly assigned to groups; instead, treatment assignments primarily reflected\nparental preferences and education team decisions. In Howard et al. (2005), however, we demonstrated empirically that the\nthree groups were functionally equivalent at intake. 
The only statistically significant group differences were in parental\neducation (parents of children in the IBT group averaged one year more of education than parents of children in the AP and\nGP groups) and age at diagnosis (children in the IBT group were diagnosed an average of 5 months earlier than children in the\nGP group, who in turn were diagnosed an average of 4 months earlier than children in the AP group). Both variables were\ncontrolled for statistically in subsequent data analyses, though control was rarely necessary because individual scores\nalmost never covaried with parental education or age at diagnosis.\nAnother limitation is that some children switched between the AP and GP treatments during Years 2 and 3. We have no\ninformation about the reasons for those shifts, but it would be unusual for an education team to recommend moving a child\nout of an effective program and for the child\xe2\x80\x99s family to approve such a change. Therefore, we speculate that the changes may\nspeak to the lack of efficacy of either eclectic approach. The data showed that neither eclectic treatment reliably produced\nmeaningful benefits, and when children switched from one eclectic treatment to the other, there was rarely any\nimprovement with the new treatment. These findings imply that the two eclectic treatments were essentially\nindistinguishable in their efficacy, and that our analyses and conclusions were not compromised by the fact that some\nchildren switched from one eclectic treatment to the other.\nThe impact of mortality on our findings should be considered. Virtually all children were assessed in all domains at intake\nand Year 1, but participation rates were lower in subsequent years. The reduced sample sizes forced us to combine data from\nYears 2 and 3 to analyze outcomes for the nonverbal intelligence, receptive language, expressive language, and motor skills\nmeasures. 
That precluded mapping developmental trajectories for those domains as precisely as we did for other domains. It\nis important to note, however, that mortality does not seem to have biased the overall findings. In fact, imputation analyses\nsuggest that the group differences we observed were not artifacts of mortality; if anything, the advantage of IBT over the\neclectic treatments would likely have been greater if more comprehensive assessment data were available for Years 2 and 3.\nThe primary limitation of our study may be that there were no measures of the integrity with which any of the treatments\nwas delivered, as we reported in Howard et al. (2005). Additionally, each treatment comprised a number of components, and\nit was not feasible to parse out the contributions of individual components to the outcomes. Nonetheless, our findings\nconverge with those of other studies in which IBT and a comparison eclectic treatment program had similar elements,\nintensity, and duration (e.g., Cohen et al., 2006; Eikeseth et al., 2007). They add to the growing body of evidence that IBT\n\n\x0cPet. Reh. App.72\n\nproduces significantly larger increases on standardized measures of cognitive and adaptive functioning than other\ntreatments. Although those measures do not capture all repertoires that may be influenced by intervention, they are\nconsidered more objective than indices like classroom placement, and correlate positively with other measures of overall\nand long-term functioning. 
Thus, there is general consensus among autism researchers that protocols for evaluating\ntreatment effects must include certain standardized instruments (e.g., Eldevik et al., 2009, 2010; Fein et al\xe2\x80\x9e 2013; Martin,\nBibby, Mudford, & Eikseth, 2003; Mundy, 1993; Wolery & Garfinkle, 2002), Collectively, this study and others that used such\nprotocols clearly indicate that 1BT is an effective, evidence-based treatment for young children diagnosed with autism.\nAcknowledgements\nThis study was supported in part by Valley Mountain Regional Center and California State University, Stanislaus. We also\nwish to express our appreciation to Jessica Bailey for her assistance.\nAppendix A. Supplementary data\nSupplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/\nj.ridd.2014.08.021.\nReferences\nBondy, A. S.. & Frost, L A (1994). The picture exchange communication system: Training manual. Cherry Hill, NJ: Pyramid.\nCohen. H., Amerine-Dickens. M., & Smith, T. (2006). Early intensive behavioral treatment: Replication of the UCLA model in a community setting. Developmental\nand Behavioral Pediatrics. 27, SI45-S155.\nDelmolino, L M. (2006). Brief report: Use of DQ for estimating cognitive ability in young children with autism. Journal ofAutism and Developmental Disorders. 36,\n959-963.\nEikeseth, S. (2009). Outcome of comprehensive psycho-educational interventions for young children with autism. Research in Developmental Disabilities. 30.\n158-178.\nEikeseth, S., Smith, T\xe2\x80\x9e Jahr, E., & Eldevik, S. (2002). Intensive behavioral treatment at school for 4- to 7-year-old children with autism: A 1-year comparison\ncontrolled study. Behavior Modification. 2002, 49-68.\nEikeseth, S., Smith, T., Jahr, E., & Eldevik, S. (2007). Outcome for children with autism who began intensive behavioral treatment: between ages 4 and 7: A\ncomparison controlled study. Behavior Modification. 31, 264-278.\nEldevik. 
S., Eikeseth, S., Jahr, E\xe2\x80\x9e /& Smith, T. (2006). Effects of low-intensity behavioral treatment for children with autism and mental retardation.Journal ofAutism\nand Developmental Disorders, 36, 211-224.\nEldevik. S., Hastings. R. P., Hughes, J. C, Jahr, E., Eikeseth, S\xe2\x80\x9e 8i Cross. S. (2009). Meta-analysis of early intensive behavioral intervention for children with autism.\nJournal of Clinical Child & Adolescent Psychology, 38,439-450.\nEldevik, S., Hastings, R. P., Hughes, J. C, Jahr, E\xe2\x80\x9e Eikeseth, S., & Cross, S. (2010). Using participant data to extend the evidence base for intensive behavioral\nintervention for children with autism. American Journal on Intellectual and Developmental Disabilities, 1/5. 381-405.\nEldevik, S\xe2\x80\x9e Hastings, R. P\xe2\x80\x9e Jahr, E., 8i Hughes. J.C. (2012). Outcomes of behavioral intervention for children with autism in mainstream pre-school settings. Journal\nof Autism and Developmental Disorders, 42, 210-220.\nFein, D\xe2\x80\x9e Barton. M., Eigsti, I., Kelley, E., Naigles, L, Schultz, R. T., & Tyson, K. (2013). Optimal outcome in individuals with a history of autism. Journal of Child <\nPsychology and Psychiatry, 54. 195-205.\nFischer. J. L. Howard, J. S\xe2\x80\x9e Sparkman, C. R., & Moore, A. C. (2009). Establishing generalized syntactical responding in young children with autism. Research in Autism\nSpectrum Disorders, 4, 76-88.\nGreen, G. (2008). Single-case research methods for evaluating treatments for ASD. In S. C. Luce. D. S. Mandell, C. Mazefsky, 8: W. Seibert (Eds.), Autism in\nPennsylvania: A symposium issue of the Speaker\xe2\x80\x99s Journal of Pennsylvania Policy (pp. 119-132). Harrisburg, PA: Legislative Office for Research Liaison,\n^^Pennsylvania .House.of,\nGreen, G. (2011). Early intensive behavior analytic intervention for autism spectrum disorders. In E. Mayville & J. Mulick (Eds.), Behavioral foundations of effective\nautism treatment (pp. 183-199). Cornwall-on-Hudson. 
NY: Sloan Publishing.\nGreen, C\xe2\x80\x9e Brennan. L C., & Fein, D. (2002). Intensive behavioral treatment for a toddler at high risk for autism. Behavior Modification, 26, 69-102.\nGuyatt, C., Rennie, D.. Meade, M\xe2\x80\x9e & Cook, D. J. (2008). Users\' guides to the medical literature: A manual for evidence-based clinical practice (2nd ed.). New York:\nMcGraw-Hill Professional.\nHeyvaert, M., Maes, B\xe2\x80\x9e Van den Noortgate, W\xe2\x80\x9e Kuppens, S., & Onghena, P. (2012). A multilevel meta-analysis of single-case and small-n research on interventions\nfor reducing challenging behavior in persons with intellectual disabilities. Research in Developmental Disabilities, 33, 766-780.\nHoward, J. S., Sparkman, C. R., Cohen, H. G., Green, G\xe2\x80\x9e & Stanislaw, H. (2005). A comparison of intensive behavior analytic and eclectic treatments for young\nchildren with autism. Research in Developmental Disabilities, 26, 359-383.\nJacobson,J.W., Mulick, J. A., & Green, G. (1998). Cost-benefit estimates for early intensive behavioral intervention for young children with autism - General model\nand single state case. Behavioral Interventions, 13, 201 -226.\nJacobson, N. S., 8i Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and\nClinical Psychology, 59. 12-19.\nKovshoff, H., Hastings, R., & Remington, B. (2011). Two-year outcomes for children with autism after the cessation of early intensive behavioral intervention.\nBehavior Modification. 35. 427-450.\nLarson, E. B. (1990). N-of-1 clinical trials: A technique for improving medical therapeutics. Western Journal of Medicine, 152. 52-56.\nLord, C., & Schopler, E. (1989). The role of age at assessment, developmental level, and test in the stability of intelligence scores in young autistic children.Journal of\nAutism and Developmental Disorders, 19,483-499.\nLovaas, O. I. (1987). 
Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical\nPsychology, 55. 3-9.\nMartin, N.. Bibby, P., Mudford, O. C\xe2\x80\x9e 8i Eikseth. S. (2003). Toward the use of a standardized assessment for young children with autism: Current assessment\npractices in the UK. Autism, 7, 321-330.\nMorgan, D. L, & Morgan, R. K. (2001). Single-participant research design: Bringing science to managed care. American Psychologist. 56,119-127.\nMotiwala, S. S\xe2\x80\x9e Gupta, S., Lilly, M. B., Ungar, W. J., & Coyte, P. C. (2006). The cost-effectiveness of expanding intensive behavioural intervention to all autistic\nchildren in Ontario. Healthcare Policy, i(2), 135-151.\nMundy. P. (1993). Normal versus high-functioning status in children with autism. American Journal on Mental Retardation, 97, 381-384.\nNational Autism Center (2009). National standards report: Randolph, MA: National Autism Center.\n\n\x0c3344\n\nPet. Reh. APP-73\n\nJ.S. Howard et al/Research in Developmental Disabilities 35 (2014) 3326-3344\n\nPowers. S. C\xe2\x80\x9e Piazza-Waggoner, C., Jones, J. S\xe2\x80\x9e Ferguson, K. S\xe2\x80\x9e Daines, C\xe2\x80\x9e & Acton. J. D. (2006). Examining clinical trial results with single-subject analysis: An\nexample involving behavioral and nutrition treatment for young children with cystic fibrosis. Journal of Pediatric Psychology, 31, 574-581.\nReichow, B., 8i Woleiy, M. (2009). Comprehensive synthesis of early intensive behavioral interventions for young children with autism based on the UCLA Young\nAutism Project Model. Journal of Autism and Developmental Disorders, 39, 23-41.\nRemington, B\xe2\x80\x9e Hastings, R. P\xe2\x80\x9e Kovshoff, H\xe2\x80\x9e Espinosa, F\xe2\x80\x9e Jahr, E\xe2\x80\x9e Brown,T\xe2\x80\x9e & Ward, N. (2007). Early intensive behavioral intervention: Outcomes for children with\nautism and their parents after two years. 
American Journal on Mental Retardation, 112, 418-438.\nRogers, S. J., & Vismara, L. A. (2008). Evidence-based comprehensive treatments for early autism. Journal of Clinical Child and Adolescent Psychology, 37, 8-38.\nRosales-Ruiz, J., & Baer, D. M. (1997). Behavioral cusps: A developmental and pragmatic construct for behavioral analysis. Journal of Applied Behavior Analysis, 30, 533-544.\nSallows, G. O., & Graupner, T. D. (2005). Intensive behavioral treatment for children with autism: Four-year outcome and predictors. American Journal on Mental Retardation, 110, 417-438.\nSchmidt, M. (2008). The Sankey diagram in energy and material flow management Part I: History. Journal of Industrial Ecology, 12, 82-94.\nSmith, T., Groen, A. D., & Wynne, J. W. (2000). Randomized trial of intensive early intervention for children with pervasive developmental disorder. American Journal on Mental Retardation, 105, 269-285.\nWolery, M., & Garfinkle, A. N. (2002). Measures in intervention research with young children who have autism. Journal of Autism and Developmental Disorders, 32, 463-478.\nZachor, D. A., Ben-Itzchak, E., Rabinovich, A., & Lahat, E. (2007). Change in autism core symptoms with intervention. Research in Autism Spectrum Disorders, 1, 304-307.\n\n\x0cInformatics\n\nPet. Reh. 
App.74\n\nHuman Mutation\nOFFICIAL JOURNAL\nHGVS\nHUMAN GENOME VARIATION SOCIETY\nwww.hgvs.org\n\nPerformance of Mutation Pathogenicity Prediction Methods on Missense Variants\n\nJanita Thusberg,1,2 Ayodeji Olatubosun,1 and Mauno Vihinen1,3*\n1Institute of Biomedical Technology, FI-33014 University of Tampere, Finland; 2Buck Institute for Age Research, Novato, California; 3Research Center, Tampere University Hospital, Tampere, Finland\nCommunicated by Christophe Beroud\nReceived 20 May 2010; accepted revised manuscript 7 December 2010.\nPublished online 22 February 2011 in Wiley Online Library (www.wiley.com/humanmutation). DOI 10.1002/humu.21445\n\nABSTRACT: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs identified in the human genome is growing rapidly, but attaining experimental knowledge about the possible disease association of variants is laborious and time-consuming. Several computational methods have been developed for the classification of SNPs according to their predicted pathogenicity. In this study, we have evaluated the performance of nine widely used pathogenicity prediction methods available on the Internet. The evaluated methods were MutPred, nsSNPAnalyzer, Panther, PhD-SNP, PolyPhen, PolyPhen2, SIFT, SNAP, and SNPs&GO. The methods were tested with a set of over 40,000 pathogenic and neutral variants. We also assessed whether the type of original or substituting amino acid residue, the structural class of the protein, or the structural environment of the amino acid substitution, had an effect on the prediction performance. The performances of the programs ranged from poor (MCC 0.19) to reasonably good (MCC 0.65), and the results from the programs correlated poorly. The overall best performing methods in this study were SNPs&GO and MutPred, with accuracies reaching 0.82 and 0.81, respectively.\nHum Mutat 32:358-368, 2011. \xc2\xa9 2011 Wiley-Liss, Inc.\nKEY WORDS: method evaluation; bioinformatics; pathogenicity prediction; SNPs\n\nIntroduction\nMost human genetic variation is represented by single nucleotide polymorphisms (SNPs), and many of them are believed to cause phenotypic differences between individuals. Owing to the application of high-throughput sequencing methods, the number of identified variants in the human genome is growing rapidly, but identifying those variations responsible for specific phenotypes is a laborious task. The ability to discriminate between pathogenic and benign variants computationally could significantly aid targeting disease-causing mutations by helping in the selection and prioritization of likely candidates from a pool of data.\nA subset of SNPs occur at protein coding regions in the genome, and from a medical point of view particularly interesting ones are the nonsynonymous SNPs (nsSNPs) that lead to an amino acid substitution at the protein level (referred here to as missense variants). nsSNPs may affect gene function through their effect on the structure and/or function of the encoded protein.\nPrediction of the possible disease-association of missense variants is a difficult problem because an amino acid substitution can affect the biological function of a gene product in a number of ways [Thusberg and Vihinen, 2009]. An amino acid substitution may disrupt sites that are critical in protein function, such as catalytic residues or ligand-binding pockets. A missense mutation may as well lead to alterations in the structure, folding, or stability of the protein product, thereby altering or preventing the function of the protein. On the other hand, amino acid substitutions do not necessarily affect protein function. Effects of missense mutations are often the most difficult to predict while the consequences of most deletions, insertions, and nonsense mutations are rather self-evident.\nMany methods have been developed for the computational prediction of the phenotypic effect of nsSNPs. Some of them are for the study of very specific mechanisms, whereas others are developed to predict whether a variation is harmful or benign. All of the variation tolerance methods evaluated in this study follow a similar procedure in which a missense variant is first labeled with properties related to the damage it may cause to the protein structure or function. The resulting feature vector is then utilised to decide whether the variant is pathogenic or not. The methods differ in the properties of the variant they take into account in the prediction, as well as in the nature and possible training of the classification method used for decision making. The nine widely used methods evaluated in this study are based on evolutionary information (Panther [Thomas et al., 2003], PhD-SNP SVM-Profile [Capriotti et al., 2006], and SIFT [Ng and Henikoff, 2001]), or a combination of protein structural and/or functional parameters and multiple sequence alignment derived information (MutPred [Li et al., 2009], nsSNPAnalyzer [Bao et al., 2005], PolyPhen [Ramensky et al., 2002], PolyPhen2 [Adzhubei et al., 2010], SNAP [Bromberg and Rost, 2007], and SNPs&GO [Calabrese et al., 2009]). The machine-learning methods utilize neural networks (NN) (SNAP), random forests (RF) (MutPred, nsSNPAnalyzer), or support vector machines (SVMs) (PhD-SNP, SNPs&GO) for classification, whereas the other methods classify variants according to empirically derived rules (PolyPhen), Bayesian methods (PolyPhen2), or mathematical operations (SIFT, Panther) (Table 1).\n\n*Correspondence to: Mauno Vihinen, Institute of Biomedical Technology, FI-33014 University of Tampere, Finland. E-mail: mauno.vihinen@uta.fi\nContract grant sponsors: The Tampere Graduate School in Biomedicine and Biotechnology; The Sigrid Juselius Foundation; The Academy of Finland; The Medical Research Fund of Tampere University Hospital.\n\xc2\xa9 2011 WILEY-LISS, INC.\n\n\x0cPet. Reh. App.75\nTable 1. Summary of the Evaluated Methods\nMethod | Based on | Training set | Conservation analysis | Structural attributes | Annotations | Website\nMutPred | RF | HGMD, Swiss-Prot | SIFT, Pfam, PSI-BLAST | Predicted attributes | - | http://mutpred.mutdb.org/\nnsSNPAnalyzer | RF | Swiss-Prot | SIFT | Homologue mapping | - | http://snpanalyzer.uthsc.edu/\nPanther | Alignment scores | - | Panther library, HMMs | - | - | http://www.pantherdb.org/tools/csnpScoreForm.jsp\nPhD-SNP | SVM | Swiss-Prot | Sequence environment, sequence profiles | - | - | http://gpcr2.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi\nPolyPhen | Empirical rules | - | PSIC profiles | Homologue mapping/predictions | Swiss-Prot | http://genetics.bwh.harvard.edu/pph/\nPolyPhen2 | Bayesian classification | Swiss-Prot, neutral pseudo-mutations | PSIC profiles | Homologue mapping/predictions | Pfam domain | http://genetics.bwh.harvard.edu/pph2/\nSIFT | Alignment scores | - | MSAs | - | - | http://sift.jcvi.org/\nSNAP | NN | PMD, neutral pseudo-mutations | PSIC profiles, Pfam, PSI-BLAST | Predictions | - | http://rostlab.org/services/snap/\nSNPs&GO | SVM | Swiss-Prot | Sequence environment, sequence profiles, Panther | - | GO | http://snps-and-go.biocomp.unibo.it/snps-and-go/\n\nGO, Gene Ontology; HGMD, Human Gene Mutation Database; HMM, Hidden Markov model; NN, neural network; MSA, multiple sequence alignment; PMD, Protein Mutant Database; PSIC, position-specific independent 
counts; RF, random forest; SVM, support vector machine.\n\nAs mutation data and information about the genotypes of individuals accumulate, understanding the molecular level effects of variations and elucidating their possible disease association is an important research challenge [Karchin, 2009; Mooney, 2005; Ng and Henikoff, 2006; Steward et al., 2003; Thusberg and Vihinen, 2009]. Numerous locus-specific databases (LSDBs) have been established for the collection, analysis, and distribution of disease-related variation information in certain genes. Data for several genes is available, for example, in the protein knowledgebase SwissProt [Yip et al., 2004] and PhenCode [Giardine et al., 2007], which is a database that connects human variant data with phenotypic information from LSDBs with genomic data from the ENCODE project and other resources in the UCSC Genome Browser [Raney et al., 2011]. SNP information is available in dbSNP [Sherry et al., 2001], a genetic variation database. Several tools for the prediction of the phenotypic consequences of missense variants are available, but without knowledge about the quality of predictions, choosing the best method and evaluating the reliability of its outcome is impossible. We therefore performed the first comprehensive systematic evaluation of nine bioinformatics tools predicting the phenotypic effects of missense variants.\n\nMaterials and Methods\n\nDatasets\nWe built a positive dataset (referred to as pathogenic dataset) of 19,335 missense mutations from the PhenCode database [Giardine et al., 2007] (downloaded in June 2009), registries in IDbases [Piirila et al., 2006] and from 18 individual LSDBs, and a negative (neutral) dataset of 21,170 human nonsynonymous coding SNPs with an allele frequency >0.01 and chromosome sample count >49 from the dbSNP database [Sherry et al., 2001] build 131. The SNP data was filtered so that none of the dbSNP entries included in our dataset contained OMIM links to minimize the number of disease-associated SNPs in the neutral dataset. Entries annotated as \xe2\x80\x9cputative\xe2\x80\x9d or \xe2\x80\x9cpredicted\xe2\x80\x9d were also left out. In addition, the neutral dataset was searched against the pathogenic dataset in order to remove possible duplicates and further minimise the probability of having false negative cases in the set. The PhenCode data was filtered so that only SNPs annotated as disease causing in the SwissProt database were taken into our pathogenic dataset. Swiss-Prot provides high-quality hand-curated information about the possible disease-relation of nsSNPs, derived from literature [Yip et al., 2008]. The complementing LSDB data was retrieved manually from each database. The pathogenic and neutral datasets contained 1,190 and 9,011 proteins, respectively, of which 445 and 1,205 were found to have three-dimensional structure coordinates in the Protein Data Bank (PDB) [Berman et al., 2000]. The datasets are available for download at our Website (http://bioinf.uta.fi).\nBoth datasets were run by all of the nine methods studied here. The number of results from nsSNPAnalyzer is much smaller than the original number of cases in the input data, because the program only accepts mutations in those sequences for which a homologous protein is found in the ASTRAL database [Chandonia et al., 2004]. A large number of proteins in our dataset did not match with any entry in the database, thus limiting the number of cases that could be analysed by nsSNPAnalyzer.\nTwo kinds of subdatasets were constructed from the original pathogenic and neutral datasets. First, a structural subdataset was compiled from the part of both datasets for which structural data was available in the PDB, to study the effect of available structure data on prediction performance. Second, for probing the effect of using Swiss-Prot-derived data as part of the pathogenic testing set, we constructed a subdataset containing only pathogenic variants not present in Swiss-Prot. The corresponding neutral dataset was compiled by randomly selecting an equal number of variants from the original neutral test set.\nTo test whether the differences in method performance with these subdatasets was caused by smaller testing set size, we constructed 100 sample datasets each containing 1,000 pathogenic and 1,000 neutral variants randomly picked from the original datasets, and compared the average MCCs obtained with the MCCs from the subdatasets.\nThe Pathogenic-or-not Pipeline (PON-P) [Thusberg and Vihinen, 2009] was used for the submission of sequences and variants into the analysis programs nsSNPAnalyzer, Panther, PhD-SNP, PolyPhen, PolyPhen2, SIFT, and SNAP. PON-P is a service that simultaneously submits the input data provided by the user to selected prediction methods. MutPred and SNPs&GO were run locally at the corresponding laboratories by the developers of the methods.\n\nHUMAN MUTATION, Vol. 32, No. 4, 358-368, 2011\n\n359\n\n\x0cPet. Reh. App.76\nPrediction Methods\nThe effects of mutations and SNPs were predicted by the programs MutPred [Li et al., 2009], nsSNPAnalyzer [Bao et al., 2005], Panther [Thomas et al., 2003], PhD-SNP [Capriotti et al., 2006], PolyPhen [Ramensky et al., 2002], PolyPhen2 [Adzhubei et al., 2010], SIFT [Ng and Henikoff, 2001], SNAP [Bromberg and Rost, 2007], and SNPs&GO [Calabrese et al., 2009]. Key properties of the methods are listed in Table 1. The default parameters of all programs were applied, and only the protein sequence and missense variant were given as input information for each program, as in a normal user situation of unknown variant analysis.\n\nMutPred\nMutPred is a Random Forest-based classification method that utilizes several attributes related to protein structure, function, and evolution. MutPred utilizes the SIFT method [Ng and Henikoff, 2003] for defining the evolutionary attributes, along with PSI-BLAST, transition frequencies [Bromberg and Rost, 2007], and Pfam profiles [Finn et al., 2010]. In MutPred, structural descriptors include prediction of secondary structure and solvent accessibility by the method PHD [Rost, 1996], transmembrane helix prediction by TMHMM [Krogh et al., 2001], coiled-coil structure prediction by MARCOIL [Delorenzi and Speed, 2002], stability prediction by I-Mutant 2.0 [Capriotti et al., 2005], B-factor prediction [Radivojac et al., 2004], and disorder prediction by DisProt [Peng et al., 2006]. Function-related attributes include predictions of DNA-binding residues [Ahmad et al., 2004], catalytic residues, calmodulin-binding targets [Radivojac et al., 2006], and posttranslational modification sites [Daily et al., 2005; Iakoucheva et al., 2004; Radivojac et al., 2010]. The MutPred method estimates effects of an amino acid substitution on the set of defined properties of a protein and based on those estimates, predicts whether an amino acid substitution is likely to have phenotypic effects.\n\nnsSNPAnalyzer\nnsSNPAnalyzer is a machine-learning method that integrates multiple sequence alignment (MSA) and protein structure analysis to classify missense variants. The input protein sequence is searched against the ASTRAL database [Chandonia et al., 2004] for homologous protein structures, and extracts features of the environment of the substitution from the obtained structure, namely, the solvent accessibility, environmental polarity, and secondary structure. The SIFT method [Ng and Henikoff, 2003] is used for calculating the normalised probability of the substitution in the MSA, and the similarity and dissimilarity between the mutated, that is, original, and mutant residue is also taken into account. The program then uses a Random Forest classifier trained by a dataset prepared from the Swiss-Prot database [Yip et al., 2004] to classify the variant to be disease-associated or functionally neutral.\n\nPanther\nThe Panther Evolutionary Analysis of Coding SNPs (referred simply to as Panther in this article) calculates substitution position-specific evolutionary conservation (subPSEC) scores based on alignments of evolutionarily related proteins to predict the pathogenicity. The alignments are obtained from the PANTHER library of protein families based on Hidden Markov Models (HMMs). The subPSEC score describes the amino acid probabilities, in particular, positions among evolutionarily related sequences, and the values range from 0 (neutral) to about -10 (most likely to be deleterious). The cutoff for classifying a missense variant to be pathogenic can be defined by the user, but the authors of the method advice to use a cutoff of -3 for classification [Thomas et al., 2003].\n\nPhD-SNP\nPhD-SNP is a prediction method based on single-sequence and sequence profile based support vector machines trained on Swiss-Prot variants [Yip et al., 2004]. The single-sequence SVM (SVM-Sequence) classifies the missense variant to be pathogenic or neutral based on the nature of the substitution and properties of the neighboring sequence environment. The profile-based SVM (SVM-Profile) utilizes sequence profile information taken from MSAs, and classifies the variant according to the ratio between the frequencies of the wild-type and substituted residue. A decision tree algorithm chooses which one of the two SVMs described above is to be used at each case based on the occurrence of wild-type and mutant amino acids at the given position.\n\nPolyPhen\nPolyPhen (Polymorphism Phenotyping) uses a rule-based cutoff system to classify variants. It initially characterises the input missense variant by various sequence, structure, and phylogeny based descriptors. The sequence-based characterisation includes SWALL database [Johnson and Todd, 2000] annotations for sequence features, a transmembrane predictor TMHMM [Krogh et al., 2001] and PHAT [Ng et al., 2000] transmembrane-specific matrix score for substitutions at predicted transmembrane regions, the Coils2 program [Lupas et al., 1991] for prediction of coiled coil regions, and the SignalP [Nielsen et al., 1997] program to predict signal peptide regions. Phylogenetic information is derived by constructing a profile matrix from aligned sequences by the PSIC (Position-Specific Independent Counts) software [Sunyaev et al., 1999]. The structural descriptors are obtained by mapping the missense variant onto the corresponding or similar protein and then using the DSSP program [Kabsch and Sander, 1983] for secondary structure information, solvent-accessible surface, and phi-psi dihedral angles. In addition, PolyPhen calculates the normalized accessible surface area and changes in accessible surface propensity resulting from the amino acid substitution, change in residue side chain volume, region of the Ramachandran map, normalized B factor, and loss of a hydrogen bond according to the Hbplus program [McDonald and Thornton, 1994]. The SWALL database annotations are utilized in the structure analysis such that the program checks whether the substitution site is in spatial contact with critical residues annotated to be involved in forming binding sites or active sites. Additionally, the contacts of the substituted residue with ligands or subunits of the protein molecule are checked. After characterising the variant, PolyPhen applies empirically derived rules based on the characteristics to predict whether a missense variant is damaging or benign.\n\n360\n\nHUMAN MUTATION, Vol. 32, No. 
4, 358-368, 2011\n\nPolyPhen2\nPolyPhen2 utilizes a combination of sequence- and structure-based attributes for the description of an amino acid substitution, and the effect of mutation is predicted by a naive Bayesian classifier. The sequence-based features include PSIC scores and MSA properties, and position of mutation in relation to domain boundaries as defined by Pfam [Finn et al., 2010]. The structure-derived features are solvent accessibility, changes in solvent accessibility for buried residues, and crystallographic B-factor.\n\n\x0cPet. Reh. App.77\nSIFT\nSIFT (Sorting Intolerant From Tolerant) makes inferences from sequence similarity using mathematical operations. SIFT constructs an MSA and considers the position of the missense variant and the type of the amino acid change. Based on the amino acids appearing at each position in the MSA, SIFT calculates the probability that a missense variant is tolerated conditional on the most frequent amino acid being tolerated.\n\nSNAP\nSNAP (Screening for Nonacceptable Polymorphisms) is a neural network-based tool for the prediction of the effect of a missense variant. The method utilises evolutionary information from PSI-BLAST [Altschul et al., 1997] frequency profiles and PSIC [Sunyaev et al., 1999], transition frequencies for mutations, biophysical characteristics of the substitution, secondary structural information, and relative solvent accessibility values predicted by PROFsec/PROFacc [Rost, 1996; Rost and Sander, 1994], chain flexibility predicted by PROFbval [Schlessinger et al., 2006], protein family evolutionary information, and information about domain boundaries from Pfam [Finn et al., 2010], and Swiss-Prot annotations [Bairoch and Apweiler, 2000] to classify a missense variant. The training sets for the NN were constructed from Protein Mutant Database (PMD) [Kawabata et al., 1999] data complemented by a set of neutral pseudomutations generated by the authors of the method as described in Bromberg and Rost [2007].\n\nSNPs&GO\nSNPs&GO is an SVM classifier based on mutation type and sequence environment information, sequence profiles taken from MSAs, predictions from the program Panther [Thomas et al., 2003], and a function-based log-odds score describing information about protein function defined by Gene Ontology (GO) terms [Ashburner et al., 2000].\nFrom the output of the programs, we only took the binary prediction (pathogenic/neutral) into consideration without taking into account any confidence values provided by some of the programs. Panther provides a numerical output rather than a binary classification (subPSEC score), which we converted to a binary prediction using a cutoff point of -3 as recommended in [Thomas et al., 2003]. PolyPhen and PolyPhen2 classify the effects of a missense variant into three categories: \xe2\x80\x9cProbably pathogenic,\xe2\x80\x9d \xe2\x80\x9cPossibly pathogenic,\xe2\x80\x9d and \xe2\x80\x9cBenign.\xe2\x80\x9d We converted these into binary classifications in two ways, first by considering only the \xe2\x80\x9cProbably pathogenic\xe2\x80\x9d class as pathogenic and the \xe2\x80\x9cPossibly pathogenic\xe2\x80\x9d and \xe2\x80\x9cBenign\xe2\x80\x9d classes as neutral, and second, by considering both the \xe2\x80\x9cProbably pathogenic\xe2\x80\x9d and \xe2\x80\x9cPossibly pathogenic\xe2\x80\x9d classes as pathogenic, and the \xe2\x80\x9cBenign\xe2\x80\x9d class as neutral. These two ways of classifying the variants are referred to as PolyPhen(2)a and PolyPhen(2)b in this study, respectively.\n\nDetermination of Secondary Structural Elements and Accessible Surface Areas\nThe 3D structure coordinates of proteins were obtained from the PDB. Secondary structural information and accessible surface area (ASA) values for each mutation site were assigned by the program STRIDE [Frishman and Argos, 1995]. We classified residues with ASAs <10% as buried and with ASAs >25% as exposed, similarly as in a previous study [Khan and Vihinen, 2010].\n\nDetermination of Structural Classes of Proteins\nThe CATH database version 3.3 [Orengo et al., 1997] was used to group studied proteins according to their secondary and tertiary structure types.\n\nStatistical Analyses\nThe quality of the predictions is described by six parameters: accuracy, precision, sensitivity, specificity, negative predictive value (NPV) and Matthews correlation coefficient (MCC). In the following equations, tp, tn, fp, and fn refer to the number of true positives, true negatives, false positives and false negatives, respectively.\n\nAccuracy = (tp + tn) / (tp + tn + fp + fn)\nPrecision = tp / (tp + fp)\nSpecificity = tn / (tn + fp)\nSensitivity = tp / (tp + fn)\nNPV = tn / (tn + fn)\nMCC = (tp x tn - fn x fp) / sqrt((tp + fn)(tp + fp)(tn + fn)(tn + fp))\n\nThe MCC [Matthews, 1975] is a very important evaluation statistic as it is unaffected by the differing proportion of neutral and pathogenic datasets predicted by the different programs. Because of its insensitivity to differing test set sizes, it gives a more balanced assessment of performance than the other performance measures [Baldi et al., 2000].\nTo be able to correlate the quality parameters for different programs with different sizes of test sets containing different amounts of pathogenic and neutral cases, the numbers of neutral cases were normalized to be equal to the number of pathogenic cases for each program.\nSubstitution statistics for both the pathogenic and neutral datasets were analyzed by comparing the frequencies of the substitutions with the expected values that were calculated using the distribution of all amino acids in the datasets. For the original residues, the expected values were calculated with regard to their codon diversity thereby taking into account all possible amino acid substitutions. The chi-square test was used to determine the significance of the results and chi-square was calculated as:\n\nchi^2 = sum of (f_o - f_e)^2 / f_e\n\nwhere f_o is the observed frequency and f_e is the expected frequency for an amino acid. The p-values were estimated in a one-tailed fashion.\nCorrelations between the program outputs were calculated by counting all of the common cases and those predicted correctly, and using Spearman\xe2\x80\x99s rank correlation coefficient.\n\n\x0cPet. Reh. App.78\nResults\n\nTest Set Features\nThe distributions of mutated and mutant amino acids in both pathogenic and neutral datasets are biased (Table 2), and only a few residues occur as expected on the grounds of codon diversity. In the pathogenic dataset (mutation data), A, C, G, M, R, W, and Y are overrepresented among the original (mutated) amino acid residues, whereas E, F, I, K, L, N, Q, S, T, and V are significantly underrepresented. These results are in line with previous observations for distributions of disease-causing mutations in protein secondary structural elements [Khan and Vihinen, 2007], except for the overrepresentation of A and Y, and underrepresentation of L, N, S, and V in our data. In the neutral dataset, the distributions of many amino acids differ from the distributions in the pathogenic set. Most importantly, cysteines are highly underrepresented among the substituted positions, as opposed to their frequent mutation in the pathogenic dataset. This might be due to the important role of cysteines in folding of many proteins as they are capable of forming disulphide bonds, and therefore the substitution of cysteines in proteins transported through endoplasmic reticulum by any other residue can rarely be neutral in terms of protein structure and function. 
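The chi-square comparison behind these over- and underrepresentation statements can be illustrated with a short Python sketch. It evaluates the single term (f_o - f_e)^2 / f_e for cysteine as the wild-type residue, using the observed and expected counts reported in Table 2, and converts it to an upper-tail p-value. The function names are hypothetical, and the use of one degree of freedom is an assumption made here for illustration; the article only states that the p-values were estimated in a one-tailed fashion.

```python
import math


def chi_square_term(observed, expected):
    """Chi-square contribution of one residue: (f_o - f_e)^2 / f_e."""
    return (observed - expected) ** 2 / expected


def upper_tail_p(chi2):
    """Upper-tail p-value of a chi-square statistic.

    Assumes 1 degree of freedom (an illustrative assumption), for which
    the survival function reduces to erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Cysteine as the original (wild-type) residue, counts as listed in Table 2.
cys_pathogenic = chi_square_term(observed=943, expected=468.1)
cys_neutral = chi_square_term(observed=424, expected=473.9)

p_pathogenic = upper_tail_p(cys_pathogenic)
p_neutral = upper_tail_p(cys_neutral)

print(f"pathogenic set: chi2 = {cys_pathogenic:.2f}, p = {p_pathogenic:.2e}")
print(f"neutral set:    chi2 = {cys_neutral:.2f}, p = {p_neutral:.3f}")
```

Under these assumptions the two statistics come out close to the cysteine values listed in Table 2 (481.79 and 5.24); small differences are expected because the expected counts in the table are rounded.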
Other differences between the datasets are the underrepresentation of mutated glycine, tryptophan, and tyrosine residues in the neutral set as opposed to their frequent mutation in the pathogenic set, and the overrepresentation of isoleucine, asparagine, threonine, and valine residues in the neutral variation data, contrasting their underrepresentation in the mutation data. The distributions of mutant or substituting amino acids are also very biased in both pathogenic and neutral datasets, and the amino acid residues I, P, R, T, V, and Y have opposite distributions in the mutation and neutral sets. Interestingly, proline residues are highly overrepresented among the substituting residues in the mutation dataset, and underrepresented in the negative set. Proline is a known secondary structure breaker [Chou and Fasman, 1974] and therefore mutations to P are often pathogenic.\n\nTable 2. Amino Acid Distributions in the Pathogenic (Mutations) and Neutral (SNPs) Datasets\n\nWild-type residues/pathogenic variants\nResidue | Observed | Expected | chi^2 | P-value\nA | 1224 | 252.5 | 3737.28*** | 0.000\nC | 943 | 468.1 | 481.79*** | 8.71E-107\nD | 950 | 988.7 | 1.52 | 0.218\nE | 994 | 1449.8 | 143.32*** | 5.02E-33\nF | 537 | 766.1 | 68.53*** | 1.25E-16\nG | 2087 | 1355.0 | 395.42*** | 5.46E-88\nH | 554 | 528.8 | 1.20 | 0.273\nI | 642 | 911.4 | 79.64*** | 4.49E-19\nK | 497 | 1173.9 | 390.28*** | 7.20E-87\nL | 1497 | 2068.4 | 157.84*** | 3.35E-36\nM | 520 | 435.5 | 16.39*** | 5.16E-05\nN | 605 | 754.4 | 29.59*** | 5.35E-08\nP | 1192 | 1252.8 | 2.95 | 0.086\nQ | 454 | 970.0 | 274.52*** | 1.17E-61\nR | 2797 | 1136.4 | 2426.45*** | 0.000\nS | 1135 | 1681.4 | 177.55*** | 1.66E-40\nT | 802 | 1087.9 | 75.12*** | 4.42E-18\nV | 919 | 1246.3 | 85.93*** | 1.86E-20\nW | [illegible] | [illegible] | [illegible] | [illegible]\nY | 610 | 553.1 | 5.85* | 0.016\nAll | 19335 | 19335 | |\n\nWild-type residues/neutral variants\nResidue | Observed | Expected | chi^2 | P-value\nA | 1852 | 1449.4 | 111.82*** | 3.91E-26\nC | 424 | 473.9 | 5.24* | 0.022\nD | 991 | 1017.8 | 0.70 | 0.401\nE | 1273 | 1530.4 | 43.31*** | 4.68E-11\nF | 458 | 766.0 | 123.83*** | 9.16E-29\nG | 1182 | 1374.1 | 26.85*** | 2.20E-07\nH | 530 | 555.0 | 1.12 | 0.289\nI | 996 | 924.5 | 5.53* | 0.019\nK | 774 | 1223.0 | 164.85*** | 9.87E-38\nL | 1270 | 2113.0 | 336.34*** | 4.00E-75\nM | 642 | 442.2 | 90.32*** | 2.03E-21\nN | 894 | 777.0 | 17.61*** | 2.71E-05\nP | 1277 | 1323.3 | 1.62 | 0.203\nQ | 875 | 1028.1 | 22.79*** | 1.81E-06\nR | 2376 | 1168.5 | 1247.88*** | 2.40E-273\nS | 1648 | 1793.0 | 11.72** | 0.001\nT | 1482 | 1145.7 | 98.72*** | 2.91E-23\nV | 1682 | 1263.7 | 138.46*** | 5.78E-32\nW | [illegible] | 251.8 | [illegible] | 9.17E-08\nY | 377 | 549.8 | 54.31*** | 1.71E-13\nAll | 21170 | 21170 | |\n\nMutant residues/pathogenic variants\nResidue | Observed | Expected | chi^2 | P-value\nA | 622 | 1267.9 | 329.01*** | 1.58E-73\nC | 1233 | 563.5 | 795.45*** | 5.26E-175\nD | 900 | 633.9 | 111.67*** | 4.22E-26\nE | 719 | 563.5 | 42.91*** | 5.72E-11\nF | 623 | 633.9 | 0.19 | 0.664\nG | 922 | 1232.7 | 78.29*** | 8.90E-19\nH | 918 | 633.9 | 127.29*** | 1.61E-29\nI | 619 | 950.9 | 115.85*** | 5.14E-27\nK | 834 | 563.5 | 129.85*** | 4.41E-30\nL | 1225 | 1796.1 | 181.62*** | 2.15E-41\nM | 534 | 317.0 | 148.61*** | 3.50E-34\nN | 662 | 633.9 | 1.24 | 0.265\nP | 1609 | 1267.9 | 91.78*** | 9.67E-22\nQ | 808 | 563.5 | 106.09*** | 7.05E-25\nR | 2084 | 1831.4 | 34.85*** | 3.56E-09\nS | 1502 | 1796.1 | 48.17*** | 3.91E-12\nT | 1012 | 1267.9 | 51.64*** | 6.68E-13\nV | 1195 | 1267.9 | 4.19* | 0.041\nW | 638 | 246.5 | 621.62*** | 3.32E-137\nY | 676 | 493.1 | 67.88*** | 1.74E-16\nAll | 19335 | 19335 | |\n\nMutant residues/neutral variants\nResidue | Observed | Expected | chi^2 | P-value\nA | 1061 | 1388.2 | 77.12*** | 1.61E-18\nC | 722 | 617.0 | 17.88*** | 2.36E-05\nD | 666 | 694.1 | 1.14 | 0.286\nE | 825 | 617.0 | 70.14*** | 5.53E-17\nF | 855 | 694.1 | 37.30*** | 1.01E-09\nG | 1376 | 1349.6 | 0.52 | 0.473\nH | 967 | 694.1 | 107.30*** | 3.83E-25\nI | 1139 | 1041.1 | 9.20** | 0.002\nK | 1171 | 617.0 | 497.49*** | 3.34E-110\nL | 1390 | 1966.6 | 169.06*** | 1.19E-38\nM | 828 | 347.0 | 666.52*** | 5.72E-147\nN | 845 | 694.1 | 32.81*** | 1.02E-08\nP | 1176 | 1388.2 | 32.44*** | 1.23E-08\nQ | 1056 | 617.0 | 312.40*** | 6.56E-70\nR | 1431 | 2005.2 | 164.41*** | 1.23E-37\nS | 1691 | 1966.6 | 38.63*** | 5.13E-10\nT | 1517 | 1388.2 | 11.95** | 0.001\nV | 1589 | 1388.2 | 29.05*** | 7.07E-08\nW | 471 | 269.9 | 149.78*** | 1.93E-34\nY | 394 | 539.9 | 39.41*** | 3.44E-10\nAll | 21170 | 21170 | |\n\nThe chi-square values in italics identify residues that are underrepresented and the values in bold identify overrepresented residues in comparison to random distributions derived theoretical codon usage frequencies. Significance levels are *P<0.05; **P<0.01; ***P<0.001.\n\n\x0cPet. Reh. App.79\nPerformance of Prediction Methods\nTo evaluate the performance of the programs predicting the pathogenicity of missense variants, we used six measures: accuracy, precision (or positive predictive value, PPV), specificity, sensitivity, NPV, and MCC. The values for these measures are presented in Table 3 for all the missense variants. 
SNPs&GO performed best in terms of accuracy (0.82), precision (0.90), specificity (0.92), and MCC (0.65), but sensitivity was higher in six other methods, and MutPred, Panther, PolyPhen2b, and SNAP performed better in terms of NPV. nsSNPAnalyzer performed worst in terms of MCC (0.19), accuracy (0.60), NPV (0.60), and precision (0.59). The two versions of PolyPhen have very similar overall performance; however, PolyPhen2 is recommended because the quality measures are more balanced. The version classifying “Probably pathogenic,” PolyPhen2a, as harmful is somewhat better than the other option.

In Table 3, the results are presented for the subset of cases for which structural information could be assigned. The performance of all methods was generally worse except for sensitivity, which is better for all methods. SNPs&GO performed best also in the structural subcategory considering accuracy, precision, specificity, and MCC, and MutPred was the best method in terms of sensitivity and NPV.

To test whether the poor performance was due to the smaller dataset size we sampled the full dataset results for those cases for which structural data was not available. We then compared the average MCC values of the samples to those obtained for the full dataset. The 100 sample datasets each contained randomly picked 1,000 neutral and 1,000 pathogenic variations. The average MCCs of the sample datasets were comparable to the MCCs of the full dataset in the case of Panther (average sample MCC 0.53), PhD-SNP (0.43), PolyPhen2b (0.39), and SNAP (0.47). For the other methods the MCC values were rather close when comparing the full dataset to the subdataset. We conclude that the large differences in the MCCs of the programs between the full dataset and the set for which structures were available (Table 3) were not due to the differences in the sizes of these datasets but were caused by some other factors, that is, differences in the performance of the methods when predicting on different types of data.

We also performed the analyses for a dataset that consisted only of LSDB-derived mutations not found in SwissProt (Table 3). This was done as some methods have been trained with Swiss-Prot disease-causing mutations. Because all methods (except SNPs&GO), and not only the ones trained on Swiss-Prot data, performed worse in this subcategory, we claim our results are not biased, even though we acknowledge that a perfectly fair comparison between methods trained on different datasets cannot be made.

Table 3. Performance of Prediction Methods

Performance of prediction methods (full data)

Method         tp     fn    tn     fp    cases +  cases -  Accuracy  Precision  Specificity  Sensitivity  NPV   MCC
MutPred        13829  2507  15891  4557  16336    20448    0.81      0.79       0.78         0.85         0.84  0.63
nsSNPAnalyzer  4360   2778  1319   943   7138     2262     0.60      0.59       0.58         0.61         0.60  0.19
Panther        9689   2859  8676   2797  12548    11473    0.76      0.76       0.76         0.77         0.77  0.53
PhD-SNP        11900  6896  16788  4377  18796    21165    0.71      0.75       0.79         0.63         0.68  0.43
PolyPhen1a     10093  9185  17669  3199  19278    20868    0.69      0.77       0.85         0.52         0.64  0.39
PolyPhen1b     14285  4993  13671  7197  19278    20868    0.70      0.68       0.66         0.74         0.72  0.40
PolyPhen2a     13807  5102  13863  6010  18909    19873    0.71      0.71       0.70         0.73         0.72  0.43
PolyPhen2b     16206  2703  10199  9674  18909    19873    0.69      0.64       0.51         0.86         0.78  0.39
SIFT           10464  4856  12188  7433  15320    19621    0.65      0.64       0.62         0.68         0.66  0.30
SNAP           16000  2146  8190   6387  18146    14577    0.72      0.67       0.56         0.88         0.83  0.47
SNPs&GO        13736  5487  17028  1382  19223    18410    0.82      0.90       0.92         0.71         0.76  0.65

Performance of prediction methods (structural subset)

Method         tp     fn    tn     fp    cases +  cases -  Accuracy  Precision  Specificity  Sensitivity  NPV   MCC
MutPred        5625   517   1101   697   6142     1798     0.76      0.70       0.61         0.92         0.88  0.55
nsSNPAnalyzer  2857   1603  569    527   4460     1096     0.58      0.57       0.52         0.64         0.59  0.16
Panther        3934   1009  735    441   4943     1176     0.71      0.68       0.63         0.80         0.75  0.43
PhD-SNP        5041   2411  1090   754   7452     1844     0.63      0.62       0.59         0.68         0.65  0.27
PolyPhen1a     4563   3074  1361   462   7637     1823     0.67      0.70       0.75         0.60         0.65  0.35
PolyPhen1b     5980   1657  1070   753   7637     1823     0.68      0.65       0.59         0.78         0.73  0.38
PolyPhen2a     5814   1842  1163   672   7656     1835     0.70      0.67       0.63         0.76         0.72  0.40
PolyPhen2b     6726   930   843    992   7656     1835     0.67      0.62       0.46         0.88         0.79  0.37
SIFT           4303   1329  904    901   5632     1805     0.63      0.60       0.50         0.76         0.68  0.27
SNAP           6751   714   700    777   7465     1477     0.69      0.63       0.47         0.90         0.83  0.42
SNPs&GO        5887   1746  1378   318   7633     1696     0.79      0.80       0.81         0.77         0.78  0.58

Performance of prediction methods (pathogenic dataset only from LSDBs, not in SwissProt)

Method         tp     fn    tn     fp    cases +  cases -  Accuracy  Precision  Specificity  Sensitivity  NPV   MCC
MutPred        2240   899   2655   804   3139     3459     0.74      0.75       0.77         0.71         0.73  0.48
nsSNPAnalyzer  1175   862   212    165   2037     377      0.57      0.57       0.56         0.58         0.57  0.14
Panther        1368   1252  1508   501   2620     2009     0.64      0.68       0.75         0.52         0.61  0.28
PhD-SNP        1436   2158  2842   752   3594     3594     0.60      0.66       0.79         0.40         0.57  0.21
PolyPhen1a     1651   1943  3004   534   3594     3538     0.65      0.75       0.85         0.46         0.61  0.33
PolyPhen1b     2410   1184  2333   1205  3594     3538     0.66      0.66       0.66         0.67         0.67  0.33
PolyPhen2a     2190   1361  2334   1028  3551     3362     0.66      0.67       0.69         0.62         0.64  0.31
PolyPhen2b     2764   787   1705   1657  3551     3362     0.64      0.61       0.51         0.78         0.70  0.30
SIFT           2131   1145  2073   1268  3276     3341     0.64      0.63       0.62         0.65         0.64  0.27
SNAP           2615   917   1382   1069  3532     2451     0.65      0.63       0.56         0.74         0.68  0.31
SNPs&GO        2547   952   2898   259   3499     3157     0.82      0.90       0.92         0.73         0.77  0.66

a (tp, fn, tn, fp, cases +, cases -): Total number of cases used by the given program (not normalized).
b Accuracy, precision, specificity, sensitivity, NPV, and MCC are calculated from normalised numbers.

Pet. Reh. App.80

To study the effect of residue types, the mutated and mutant amino acids were assigned into six groups according to their physicochemical properties: hydrophobic (C, F, I, L, M, V, W, and Y), positively charged (H, K, and R), negatively charged (D and E), conformational (G and P), polar (N, Q, and S), and A and T [Shen and Vihinen, 2004]. There were small differences in accuracy and precision of the methods for different types of wild-type or mutant amino acids, but their sensitivity and MCC were dependent on the physicochemical properties of the wild-type and mutant amino acids (Fig. 1). The methods were more sensitive to mutations at conformational, hydrophobic, and positively charged amino acids than mutations at polar residues or A and T (Fig. 1). MCC differed as well depending on the nature of the original residue position, and substitutions at hydrophobic positions were predicted best by most methods. Panther predicted mutations at hydrophobic and positively charged residues with equal performance, and MutPred and SNPs&GO performed better predicting conformational residues.
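The residue grouping described above can be sketched as a lookup table combined with a per-group sensitivity count. This is a minimal illustration, not the authors' pipeline: the names `GROUPS`, `RESIDUE_TO_GROUP`, and `sensitivity_by_group` are hypothetical, and the toy records are invented purely to show the calculation. Only the six group definitions are taken from the text.

```python
from collections import defaultdict

# The six physicochemical groups listed in the text [Shen and Vihinen, 2004].
GROUPS = {
    "hydrophobic": "CFILMVWY",
    "positively charged": "HKR",
    "negatively charged": "DE",
    "conformational": "GP",
    "polar": "NQS",
    "A and T": "AT",
}

# One-letter amino acid code -> group name.
RESIDUE_TO_GROUP = {aa: group for group, aas in GROUPS.items() for aa in aas}


def sensitivity_by_group(records):
    """Sensitivity per residue group.

    records: iterable of (wild_type_residue, predicted_pathogenic) pairs,
    all taken from variants known to be pathogenic.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for wild_type, predicted_pathogenic in records:
        group = RESIDUE_TO_GROUP[wild_type]
        totals[group] += 1
        hits[group] += bool(predicted_pathogenic)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical toy data: (original residue, was it predicted pathogenic?)
toy_records = [
    ("C", True), ("W", True), ("L", False),  # hydrophobic
    ("G", True),                             # conformational
    ("A", False), ("T", True),               # A and T
    ("D", True),                             # negatively charged
]
result = sensitivity_by_group(toy_records)
```

The same function works for substituting (mutant) residues if the first element of each pair is the mutant residue instead of the wild-type one.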
Mutations affecting negatively charged residues had the lowest MCCs by most methods, except for PolyPhen1b, which predicted other classes better than the conformational class, and MutPred, nsSNPAnalyzer, and SNPs&GO, which had the lowest MCC when predicting the effects of mutations altering A and T residues (Fig. 1). The sensitivity and MCC of the methods also varied in predicting the effects of different types of mutant residues. All the methods performed best when the substituting residue was charged, and in the case of nsSNPAnalyzer, polar residues were predicted better than negatively charged residues, and SNAP predicted polar residues better than positively charged residues.

Differences in prediction sensitivity could also be seen at the level of individual amino acids. Predictions for substitutions at C, W, and Y were clearly more sensitive than at other residues by all methods (Fig. 2A). A similar trend was also seen when looking at mutant amino acids: mutations to the aforementioned residues were predicted with better sensitivity (Fig. 2A). The sensitivity of PolyPhen2b and SNAP varied less at individual residues than that of the other programs.

The results for the substitutions in the secondary structural elements are shown in Figure 2B.
[Figure 1: four radar plots (panels A-D), one axis per prediction method.]

Figure 1. The values of the quality parameters, accuracy, precision, sensitivity, and Matthews correlation coefficient (MCC) for different classes of substituted amino acids. A: accuracy, B: precision, C: sensitivity, and D: MCC. Abbreviations: Charge+, positively charged. Charge-, negatively charged. [Color figures can be viewed in the online issue, which is available at wileyonlinelibrary.com]

Pet. Reh. App.81

[Figure 2: radar plots (panels A-D), one axis per prediction method.]

Figure 2. The values of sensitivity and Matthews correlation coefficient (MCC) for different types of amino acid substitutions. A: Sensitivity in different amino acid residues. Left mutated (original) amino acids, right substituting (mutant) amino acids. B: Sensitivity (left) and MCC (right) for amino acid substitutions at different secondary structural elements. C: Sensitivity (left) and MCC (right) for amino acid substitutions according to the accessible surface area (ASA) of the position (buried ASA <10%, exposed ASA ≥25%). D: Sensitivity (left) and MCC (right) for amino acid substitutions at different protein structural classes. [Color figures can be viewed in the online issue, which is available at wileyonlinelibrary.com]

All of the programs predicted the effects of substitutions at different secondary structures with almost equal accuracy and precision. Sensitivity and MCC values showed more variation with secondary structure. In terms of MCC, MutPred, nsSNPAnalyzer, PolyPhen1b, and PolyPhen2b predicted amino acid substitutions at strands best, whereas Panther, PolyPhen1a, SNAP, and SNPs&GO performed best at turns. PhD-SNP and SIFT predicted substitutions positioned at α-helices best, and PolyPhen2a at coils. The differences in MCC were not striking. Except for Panther, PhD-SNP, and SNPs&GO, all methods were most sensitive when predicting the effects of amino acid substitutions at strands.
Solvent-accessible surface areas of the positions did not markedly affect prediction accuracy or precision, but all the methods were more sensitive when predicting the effects of substitutions at buried positions (Fig. 2C). MCC for most methods was better at exposed than buried positions, except for PolyPhen1a and PolyPhen2a, which performed better at buried positions. MCCs for PolyPhen1b and SNAP did not differ with solvent accessibility of the position. These results are not in line with a previous study [Mort et al., 2010], where a sequence conservation based method yielded results of lower accuracy when predicting the effects of solvent-exposed residues.

Pet. Reh. App.82

Table 4. Pairwise Prediction Correlations

Column abbreviations: nsSNPA, nsSNPAnalyzer; PP, PolyPhen.

Upper table

               MutPred  nsSNPA   Panther  PhD-SNP  PP1a     PP1b     PP2a     PP2b     SIFT     SNAP     SNPs&GO
MutPred        -        8721     22645    36300    36522    36522    35198    35198    32705    29674    34066
nsSNPAnalyzer  4620     -        7237     9225     9380     9380     9353     9353     8270     8609     9145
Panther        15296    3589     -        23671    23869    23869    23406    23406    21540    20713    22555
PhD-SNP        23955    4389     14838    -        39659    39659    38254    38254    34532    32203    37095
PolyPhen1a     22125    4386     13961    22756    -        40146    38485    38485    34683    32533    37324
PolyPhen1b     22208    4965     14701    22170    23764    -        38485    38485    34683    32533    37324
PolyPhen2a     22234    4777     14728    21871    22383    23156    -        38782    33686    31790    36317
PolyPhen2b     20911    5012     14288    20042    19656    22412    24006    -        33686    31790    36317
SIFT           18807    4302     12623    18879    18207    18985    18645    17833    -        28726    32434
SNAP           18877    4750     13307    18004    17024    19811    19321    19945    16393    -        30987
SNPs&GO        23220    4672     14285    23333    22544    22206    22042    20569    18135    18833    -

Lower table

               MutPred  nsSNPA   Panther  PhD-SNP  PP1a     PP1b     PP2a     PP2b     SIFT     SNAP     SNPs&GO
MutPred        -        53.0     67.5     66.0     60.6     60.8     63.2     59.4     57.5     63.6     68.2
nsSNPAnalyzer  0.36     -        49.6     47.6     46.8     52.9     51.1     53.6     52.0     55.2     51.1
Panther        0.54     0.37     -        62.7     58.5     61.6     62.9     61.0     58.6     64.2     63.3
PhD-SNP        0.57     0.35     0.51     -        57.4     55.9     57.2     52.4     54.7     55.9     62.9
PolyPhen1a     0.43     0.44     0.46     0.45     -        59.2     58.2     51.1     52.5     52.3     60.4
PolyPhen1b     0.43     0.47     0.50     0.43     0.66     -        58.2     58.2     54.7     60.9     59.5
PolyPhen2a     0.49     0.44     0.51     0.45     0.56     0.58     -        61.9     55.3     55.3     60.7
PolyPhen2b     0.44     0.42     0.49     0.40     0.46     0.57     0.72     -        52.9     62.7     56.6
SIFT           0.41     0.53     0.48     0.45     0.45     0.52     0.50     0.51     -        57.0     55.9
SNAP           0.46     0.41     0.51     0.44     0.44     0.54     0.52     0.53     0.53     -        60.8
SNPs&GO        0.50     0.25     0.39     0.44     0.39     0.38     0.38     0.34     0.34     0.39     -

Upper table: the number of cases shared by two programs (upper right triangle). The number of cases predicted correctly (lower left triangle). Lower table: The number of cases predicted correctly, reported as a percentage (upper right triangle). Pairwise correlation (lower left triangle).

CATH classifies proteins as mainly α-helical or β-stranded, mixed α- and β-structures (α-β), or as having few secondary structures. Interestingly, none of the proteins included in this analysis was assigned into the few secondary structures class. The predictions differed with respect to sensitivity and MCC depending on which protein class a mutation appeared (Fig. 2D). Most programs were more sensitive to amino acid substitutions in the α-β class of proteins, but SNPs&GO predicted substitutions best in the mainly β-class. nsSNPAnalyzer predicted those mutations occurring in α-β and α-helical proteins or domains with equal sensitivity. MCCs varied significantly with the structural class of proteins, especially in the predictions by nsSNPAnalyzer, PolyPhen1b, PolyPhen2a, and 2b, and SNPs&GO. The results were generally better for the α-β class of proteins, but nsSNPAnalyzer predicted substitutions at α-helical proteins best and SNPs&GO performed best with proteins in the mainly β-class.

To further evaluate the performance of the programs we compared them in a pairwise fashion (Table 4). The numbers of cases that were shared by the programs varied because the number of cases that could be predicted by each program varied as described in the Materials and Methods section. The largest percentage of correctly predicted cases by two programs was 68.2% (for the combination of MutPred and SNPs&GO). On average, the fraction of correctly predicted cases between any two programs was 57.7%. The correlations between two programs were highest for MutPred and PhD-SNP (0.57), and for PolyPhen 1 and 2 (0.57 for the less stringent b versions, and 0.56 for the a versions) (without taking into account the higher correlation between PolyPhen1a or 2a and PolyPhen1b or 2b that are different forms of the same program). Correlation was lowest for nsSNPAnalyzer and SNPs&GO (0.25).

Discussion

In this study we evaluated how reliably the pathogenicity of missense mutants can be predicted, and whether selected features of the variant or the structural context affect prediction performance. The processing of the vast and increasing amount of genetic variation data requires the development of automatic annotation tools to determine the potential pathological character of a given variant. Prioritizing the most interesting and likely pathogenic cases for experimental analysis is another important application of the tested prediction methods.

To our knowledge, no comprehensive evaluation of the performance of missense variant pathogenicity predictors has been made outside the performance studies of individual methods in the context of their development. We selected test sets that have not been used in the training of the methods as such, but a subset of the pathogenic dataset is comprised of mutations from Swiss-Prot, and some methods (MutPred, nsSNPAnalyzer, PhD-SNP, PolyPhen2 and SNPs&GO) have used Swiss-Prot mutations in the training of the method. Testing of the performance of a method with the same cases it was trained on would lead into biased results, so that those methods trained on SwissProt mutations would have an advantage over the other methods. However, because the pathogenic dataset includes a large number of LSDB variations not found in SwissProt, we claim the test set was not similar to the training sets to the extent that it would advantage those methods trained on SwissProt data. Further, we tested the methods with cases coming only from LSDBs. With this dataset the performance decreased with all methods, whether trained on Swiss-Prot data or not, except for SNPs&GO. This indicates that the good performance of SNPs&GO was not a result of that it has previously been exposed to the test dataset during its training phase. Furthermore, the poor performance of PhD-SNP indicates the method did not benefit from the possible identical cases in the data used for training and testing. However, it is impossible to construct a large testing dataset that would not share any cases with the original training sets of any of the methods, especially when the specific contents of the training sets are rarely published.

The neutral dataset was generated from dbSNP entries that had a frequency higher than 1% when there was data at least for 25 individuals (50 chromosomes). This way the number of false negatives could be minimized in the test set.

Pet. Reh. App.83

There are still other pathogenicity predictors that we did not evaluate. SNPs3D [Yue et al., 2006] was not included in this study because it does not allow submission of user-defined amino acid substitutions. Similarly, LS-SNP [Karchin et al., 2005] is an annotated database of SNPs, not a prediction method for any user-provided variant, although often referred to as a prediction method for nsSNP pathogenicity. The Auto-Mute predictor of disease potential of human nsSNPs [Barenboim et al., 2008] was left out from the analysis because the program did not allow batch submission. PMut [Ferrer-Costa et al., 2005] could not be tested because the server did not return predictions.

Overall, we found SNPs&GO and MutPred to be clearly the most reliable predictors for our dataset of genetic variants. The accuracies of all the methods were in the range of 0.60-0.82, and precision ranged from 0.59 to 0.90. More variation among the methods was seen when considering the sensitivities and MCC values that ranged from 0.52 to 0.88 and 0.19 to 0.65, respectively.
The local structural context of a mutated residue did not dramatically affect predictor performance in most cases but most methods showed variance in their prediction power at the level of protein tertiary structure classification and at different mutated positions.

Studies have shown that combining information obtained from the multiple sequence alignment and three-dimensional protein structure can increase prediction performance [Bromberg and Rost, 2007; Saunders and Baker, 2002]. According to our results, this is not always the case. Panther operates solely on sequence-based evolutionary information, and it is one of the best performing methods, outperforming all the methods incorporating structural information in the prediction, except for MutPred, which uses sequence-derived structural predictions as features in combination with evolutionary information. Furthermore, although nsSNPAnalyzer uses the SIFT method for the evolutionary analysis and also includes structure-derived features, its overall performance is below that for SIFT, except for an increase in specificity in the structure subset of data. However, the two best performing predictors include both protein structural or functional and MSA-derived information in the prediction.

It is very difficult to determine whether the notable differences in the performance of these methods are caused by differences in the features utilized by the methods or the training datasets. For example, SNPs&GO uses GO annotations as a feature, and GO is biased toward genes involved in diseases. The PDB is biased as well, containing structures of mostly well-studied proteins, which include products of disease-related genes. Therefore, one would expect SNPs&GO would perform better in predicting the effects of missense variants in proteins that have structures in the PDB as they are likely to have GO annotation as well; in fact, it performs worse. One factor that very probably affects prediction reliability is the quality of multiple sequence alignment. Because all of the methods studied here use MSA as input to the prediction, the quality of the provided MSA should be very carefully assessed. For many of the methods, we did not find documentation how the MSA is constructed when the user provides just the query sequence as input. For example, an automatic BLAST search often performed by the programs may lead into construction of an MSA that contains multiple versions of the same sequence or paralog sequences, affecting the resulting conservation analysis. The MSA should contain a selection of closely and distantly related sequences in order to effectively yield a conservation signal.

In conclusion, those methods that performed best had high accuracy (reaching 0.82, SNPs&GO), precision (0.90, SNPs&GO), specificity (0.92, SNPs&GO), sensitivity (0.88, SNAP), and NPV (0.84, MutPred). Matthews correlation coefficient reached the value of 0.65 at best (SNPs&GO). There is no single method that could be rated as best by all parameters, so the user should consider what aspects would be most valuable considering the nature of the data analysed.
Furthermore, some methods require\n3D structure coordinates, limiting the number of cases that can be\nanalyzed (nsSNPAnalyzer), and some methods are at least\ncurrendy too slow for high-throughput analyses (SNAP).\nAlthough some of the existing methods perform reasonably well,\ndevelopment of new, more reliable methods is certainly needed.\nComplementary methods could be combined in a metaserver to\nyield more reliable predictions.\n\nAcknowledgments\nThe authors thank Pier Luigi Martelli and Rita Casadio from the\nUniversity of Bologna, Biao Li from Indiana University, and Sean Mooney\nand Vidhya Krishnan from the Buck Institute for Age Research, for\ncooperation in running of data.\n\nReferences\nAdzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P,\nKondrashov AS, Sunyaev SR. 2010. A method and server for predicting\ndamaging missense mutations. Nat Methods 7:248-449.\nAhmad S, Gromiha MM, Sarai A. 2004. Analysis and prediction of DNA-binding\nproteins and their binding residues based on composition, sequence and\nstructural information. Bioinformatics 20:477-486.\nAltschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997.\nGapped BLAST and PSI-BLAST: a new generation of protein database search\nprograms. Nucleic Acids Res 25:3389-3402.\nAshbumer M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP,\nDolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, lssel-Tarver L, Kasarskis A,\nLewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. 2000.\nGene ontology: tool for the unification of biology. The Gene Ontology\nConsortium. Nat Genet 25:25-29.\nBairoch A, Apweiler R. 2000. The SW1SS-PROT protein sequence database and its\nsupplement TrEMBL in 2000. Nucleic Acids Res 28:45\xe2\x80\x9448.\nBaldi P, Brunak S, Chauvin Y, Andersen CAF, Nielsen H. 2000. Assessing the accuracy of\nprediction algorithms for classification: an overview. Bioinformatics 16:412-424.\nBao L, Zhou M, Cui Y. 2005. 
nsSNPAnalyzer: identifying disease-associated non- \xe2\x80\xa2.\nsynonymous single nucleotide polymorphisms. Nucleic Acids Res 33:W480-W482.\nBarenboim M, Masso M, Vaisman, II, Jamison DC. 2008. Statistical geometry based\nprediction of nonsynonymous SNP functional effects using random forest and\nneuro-fuzzy classifiers. Proteins 71:1930-1939.\n^Berman^IMT%Wcstbr66kj^Eeng-Z.-Gilliland-G,^BhatJN,-WedssiaJd;.ShindyalovlN--"\nBourne PE. 2000. The Protein Data Bank. Nucleic*Acids_R\xc2\xab 28:235-242.\nBromberg Y, Rost B. 2007. SNAP: predict effect of non-synonymous polymorphisms\n^ on function. Nucleic Acids Res 35:3823-3835.\nCalabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. 2009. Functional\nannotations improve the predictive score of human disease-related mutations in\nproteins. Hum Mutat 30:1237-1244.\nCapriotti E, Calabrese R, Casadio R. 2006. Predicting the insurgence of human\ngenetic diseases associated to single point protein mutations with support vector\nmachines and evolutionary information. Bioinformatics 22:2729-2734.\nCapriotti E, Fariselli P, Casadio R. 2005. I-Mutant2.0: predicting stability changes\nupon mutation from the protein sequence or structure. Nucleic Acids Res 33:\nW306-W310.\nChandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. 2004.\nThe ASTRAL Compendium in 2004. Nucleic Acids Res 32:D189-D192.\nChou PY, Fasman GD. 1974. Prediction of protein conformation. Biochemistry 13:\n222-245.\n\nDaily KM, Radivojac P\xe2\x80\x9e Dunker AK. 2005. Intrinsic disorder and protein\nmodifications: building an SVM predictor for methylation. IEEE Symposium\non Computational Intelligence in Bioinformatics and Computational Biology,\nCIBCB: 475-481.\nDelorenzi M, Speed T. 2002. An HMM model for coiled-coil domains and a\ncomparison with PSSM-based predictions. Bioinformatics 18:617-625.\nFerrer-Costa C, Gelpf JL, Zamakola L, Parraga I, de la Cruz X, Orozco M. 
2005.\nPMUT: a web-based tool for the annotation of pathological mutations on\nproteins. Bioinformatics 21:3176-3178.\nFinn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P,\nCeric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A. 2010. The\nPfam protein families database. Nucleic Acids Res 3:D211-D222.\nHUMAN MUTATION, Vol. 32, No. 4, 358-368, 2011\n\n367\n\n\xe2\x96\xa0i\n\n\x0cPet. Reh. App.84\n\nPeng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z. 2006. Length-dependent\nFrishman D, Argos P. 1995. Knowledge-based protein secondary structure assignment.\nprediction of protein intrinsic disorder. BMC Bioinformatics 7:208.\nProteins 23:566-579.\nPiirila H, Valiaho J, Vihinen M. 2006. Immunodeficiency mutation databases\nGiardine B, Riemer C, Hefferon T, Thomas D, Hsu F, Zielenski J, Sang Y, Elnitski L,\n(IDbases). Hum Mutat 27:1200-1208.\nCutting G, Trumbower H, and others. 2007. PhenCode: connecting ENCODE\nRadivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD,\ndata with mutations and phenotype. Hum Mutat 28:554-562.\nDunker AK. 2004. Protein flexibility and intrinsic disorder. Protein Sci 13:\nIakoucheva LM, Radivojac P, Brown C), O\xe2\x80\x99Connor TR, Sikes JG, Obradovic Z,\n71-80.\nDunker AK. 2004. The importance of intrinsic disorder for protein phosphorylation.\nRadivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG,\nNucleic Acids Res 32:1037-1049.\nJohnson GC, Todd JA. 2000. Strategies in complex disease mapping. Curr Opin Genet\nIakoucheva LM. 2010. Identification, analysis, and prediction of protein\nubiquitination sites. Proteins 78:365-380.\nDev 10:330-334.\nKabsch W, Sander C. 1983. Dictionary of protein secondary structure: pattern recognition\nRadivojac P, Vucetic S, O\xe2\x80\x99Connor TR, Uversky VN, Obradovic Z, Dunker AK. 2006.\nCalmodulin signaling: analysis and prediction of a disorder-dependent\nof hydrogen-bonded and geometrical features. 
\n\nHUMAN MUTATION, Vol. 32, No. 4, 358-368, 2011\n\n\x0cwww.nature.com/scientificreports\n\nPet. Reh. App.85\n\nSCIENTIFIC REPORTS\nnatureresearch\n\nOPEN\n\nIdentification of pathogenic\nmissense mutations using protein\nstability predictors\nLukas Gerasimavicius, Xin Liu & Joseph A. 
Marsh\nAttempts at using protein structures to identify disease-causing mutations have been dominated by\nthe idea that most pathogenic mutations are disruptive at a structural level. Therefore, computational\nstability predictors, which assess whether a mutation is likely to be stabilising or destabilising to\nprotein structure, have been commonly used when evaluating new candidate disease variants,\ndespite not having been developed specifically for this purpose. We therefore tested 13 different\nstability predictors for their ability to discriminate between pathogenic and putatively benign\nmissense variants. We find that one method, FoldX, significantly outperforms all other predictors in\nthe identification of disease variants. Moreover, we demonstrate that employing predicted absolute\nenergy change scores improves performance of nearly all predictors in distinguishing pathogenic\nfrom benign variants. Importantly, however, we observe that the utility of computational stability\npredictors is highly heterogeneous across different proteins, and that they are all inferior to the\nbest performing variant effect predictors for identifying pathogenic mutations. We suggest that\nthis is largely due to alternate molecular mechanisms other than protein destabilisation underlying\nmany pathogenic mutations. Thus, better ways of incorporating protein structural information and\nmolecular mechanisms into computational variant effect predictors will be required for improved\ndisease variant prioritisation.\n\nAdvances in next generation sequencing technologies have revolutionised research of genetic variation, increasing\nour ability to explore the basis of human disorders and enabling huge databases covering both pathogenic\nand putatively benign variants1,2. 
Novel sequencing methodologies allow the rapid identification of variation\nin the clinic and are helping facilitate a paradigm shift towards precision medicine3,4. Despite this, however, it\nremains challenging to distinguish the small fraction of variants with medically relevant effects from the huge\nbackground of mostly benign human genetic variation.\nA particularly important research focus is single nucleotide variants that lead to amino acid substitutions\nat the protein level, i.e. missense mutations, which are associated with more than half of all known inherited\ndiseases5,6. A large number of computational methods have been developed for the identification of potentially\npathogenic missense mutations, i.e. variant effect predictors. Although different approaches vary in their implementation,\na few types of information are most commonly used, including evolutionary conservation, changes in\nphysicochemical properties of amino acids, biological function, known disease association and protein structure7.\nWhile these predictors are clearly useful for variant prioritisation, and show a statistically significant ability to\ndistinguish known pathogenic from benign variants, they still make many incorrect predictions8-10, and the\nextent to which we can rely on them for aiding diagnosis remains limited11.\nAn alternative approach to understanding the effects of missense mutations is with computational stability\npredictors. These are programs that have been developed to assess folding or protein interaction energy changes\nupon mutation (change in Gibbs free energy, \xce\x94\xce\x94G in short). This can be achieved by approximating structural\nenergy through linear physics-based pairwise energy scoring functions, their empirical and knowledge-based\nderivatives, or a mixture of such energy terms. Statistical and machine learning methods are employed\nto parametrise the scoring models. 
These predictors have largely been evaluated against their ability to predict\nexperimentally determined \xce\x94\xce\x94G values. Great effort has been previously made to assess stability predictor performance\nin producing accurate or well-correlated energy change estimates upon mutation, as well as assessing\ntheir shortfalls, such as biases arising from destabilising variant overrepresentation in training sets and lack of\n\nMRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4\n2XU, UK. email: joseph.marsh@igmm.ed.ac.uk\nSCIENTIFIC REPORTS | (2020) 10:15387 | https://doi.org/10.1038/s41598-020-72404-w | 1\n\n\x0cPet. Reh. App.86\n\nself-consistency when predicting forward-backward substitutions12-18. Several predictors have since been shown to\nalleviate such issues through their specific design or have been improved in this regard14,19,20. Moreover, the\npractical utility of stability predictors has been demonstrated through their extensive usage in the fields of protein\nengineering and design21-23.\nAlthough computational stability predictors have not been specifically designed to identify pathogenic\nmutations, they are very commonly used when assessing candidate disease mutations. For example, publications\nreporting novel variants will often include the output of stability predictors as evidence in support\nof pathogenicity24-27. This relies essentially upon the assumption that the molecular mechanism underlying\nmany or most pathogenic mutations is directly related to the structural destabilisation of protein folding or\ninteractions28-31. However, despite their widespread application to human variants, there has been little to no systematic\nassessment of computational stability predictors for their ability to predict disease mutations. 
A number\nof studies have assessed the real-world utility for individual protein targets and families using certain stability\npredictors32-36. However, numerous computational stability predictors have now been developed and, overall,\nwe still do not have a good idea of which methods perform best for the identification of disease mutations, and\nhow they compare relative to other computational variant effect predictors.\nIn this work, we explore the applicability and performance of 13 methodologically diverse structure-based\nprotein stability predictors for distinguishing between pathogenic and putatively benign missense mutations.\nWe find that FoldX significantly outperforms all other stability predictors for the identification of disease mutations,\nand also demonstrate the practical value of using predicted absolute \xce\x94\xce\x94G values to account for potentially\noverstabilising mutations. However, this work also highlights the limitations of stability predictors for\npredicting disease, as they still miss many pathogenic mutations and perform worse than many variant effect\npredictors, thus emphasising the importance of considering alternate molecular disease mechanisms beyond\nprotein destabilisation.\nResults\nWe selected 13 different computational stability predictors for testing on the basis of accessibility, automation or batching\npotential, computation speed and recognition: FoldX37, INPS3D38, Rosetta37, PoPMuSiC39,\nI-Mutant40, SDM41, SDM242, mCSM43, DUET44, CUPSAT45, MAESTRO46, ENCoM47 and DynaMut48 (Table 1).\nWe ran each predictor against 13,508 missense mutations from 96 different high-resolution (<2 \xc3\x85) crystal structures\nof disease-associated monomeric proteins. 
Our disease mutation dataset comprised 3,338 missense\nvariants from ClinVar2 annotated as pathogenic or likely pathogenic, and we only included proteins with at least\n10 known pathogenic missense mutations occurring at residues present in the structure. We compared these to\n10,170 missense variants observed in the human population, taken from gnomAD v2.1 (ref. 1), which we refer to as\n\xe2\x80\x9cputatively benign\xe2\x80\x9d. We acknowledge that it is likely that some of these gnomAD variants could be pathogenic\nunder certain circumstances (e.g. if observed in a homozygous state, if they cause late-onset disease, or there\nis incomplete penetrance), or they may be damaging but lead to a subclinical phenotype. However, the large\nmajority of gnomAD variants will be non-pathogenic, and we believe that our approach represents a good\ntest of the practical utilisation of variant effect predictors, where the main challenge is in distinguishing severe\npathogenic mutations from others observed in the human population. While filtering by allele frequency would\ngive us variants that are more likely to be truly benign, it would also dramatically reduce the size of the dataset\n(e.g. only ~1% of missense variants in gnomAD have an allele frequency >0.1%). Thus, we have not filtered the\ngnomAD variants (other than to exclude known pathogenic variants present in the ClinVar set).\nTo investigate the utility of the computational stability predictors for the identification of pathogenic missense\nmutations, we used receiver operating characteristic (ROC) plots to assess the ability of \xce\x94\xce\x94G values to\ndistinguish between pathogenic and putatively benign mutations (Fig. 1A). This was quantified by the area\nunder the curve (AUC), which is equal to the probability of a randomly chosen disease mutation being assigned\na higher-ranking score than a random benign one. 
Of the 13 tested structure-based \xce\x94\xce\x94G predictors, FoldX\nperforms the best as a predictor of human missense mutation pathogenicity, with an AUC value of 0.661. This\nis followed by INPS3D at 0.640, Rosetta at 0.617 and PoPMuSiC at 0.614. Evaluating the performance through\nbootstrapping, we found that the difference between FoldX and other predictors is significant, with a p value of\n2 x 10^-4 compared to INPS3D, 1 x 10^-7 for Rosetta and 8 x 10^-9 for PoPMuSiC. The remaining predictors show a\nwide range of lower performance values.\nTwo predictors, ENCoM and DynaMut, stand out for their unusual pattern in the ROC plots, with a rotated\nsigmoidal shape where the false positive rate becomes greater than the true positive rate at higher levels. Close\ninspection of the underlying data shows that this is indicative of the predicted energy change distribution tails\nfor the disease-associated class extending in both directions away from the putatively benign missense mutation\nscore density. This suggests that a considerable portion of pathogenic missense mutations are predicted by these\nmethods to excessively stabilise the protein.\nWhile the analysis (Fig. 1A) assumes that protein destabilisation should be indicative of mutation pathogenicity,\nit is also possible for mutations that increase protein stability to cause disease49,50. Recent research has shown\nthat absolute \xce\x94\xce\x94G values, which treat stabilisation and destabilisation equivalently, may be better indicators of\ndisease association51,52. Therefore, we repeated the analysis using absolute \xce\x94\xce\x94G values (Fig. 1B). This improved\nthe performance of most predictors, while not reducing the performance of any. The most drastic change was\nobserved for ENCoM, which improved from worst to fifth best predictor, with an increase in AUC from 0.495\nto 0.619. 
However, the top four predictors, FoldX, INPS3D, Rosetta and PoPMuSiC, improve only slightly and\ndo not change in ranking.\nUsing the ROC point distance to the top-left corner53, we establish the best disease classification \xce\x94\xce\x94G value\nfor each predictor when assessing general perturbation (Table 2). It is interesting to note that FoldX demonstrates\n\n\x0cPet. Reh. App.87\n\nthe best classification performance when utilising 1.58 kcal/mol as the stability change threshold, which is\nremarkably close to the value of 1.5 kcal/mol previously suggested and used in a number of other works when\nassessing missense mutation impact on stability13,35,54. Of course, these threshold values should be considered\nfar from absolute rules, and there are many pathogenic and benign mutations above and below the thresholds\nfor all predictors. For example, nearly 40% of pathogenic missense mutations have FoldX values lower than the\nthreshold, whereas approximately 35% of putatively benign variants are above the threshold.\nTo account for the class imbalance between putatively benign and pathogenic variants (roughly 3-to-1) in\nour dataset, we also performed precision-recall curve analysis. While the AUC of PR curves, unlike ROC, does\nnot have a straightforward statistical interpretation, we again ranked predictor performance according to this\nmetric. From Fig. S1, it is apparent that the top four best predictors, according to both raw and absolute \xce\x94\xce\x94G\nvalues, remain the same as in the ROC analysis: FoldX, INPS3D, Rosetta and PoPMuSiC, respectively.\nWe also calculated ROC AUC values for each protein separately and compared the distributions across predictors\n(Fig. 2). FoldX again performs much better than other stability predictors for the identification of pathogenic\nmutations, with a mean ROC AUC of 0.681, compared to INPS3D at 0.655, Rosetta at 0.627, PoPMuSiC at 0.621, and\nENCoM at 0.630. Notably, the protein-specific performance was observed to be extremely heterogeneous across\nall predictors. While some predictors performed extremely well (AUC > 0.9) for certain proteins, each predictor\nhas a considerable number of proteins for which they perform worse than random classification (AUC < 0.5).\nUsing the raw and absolute \xce\x94\xce\x94G scores, we explored the similarities between different predictors by calculating\nSpearman correlations for all mutations between all pairs of predictors (Fig. S2). It is apparent that, outside of\nimproved method versions and their predecessors, as well as consensus predictors and their input components,\nindependent methods do not show correlations above 0.65. Furthermore, correlations on the absolute scale\nappear to slightly decrease in the majority of cases, with exceptions like ENCoM becoming more correlated with\nFoldX and INPS3D, while at the same time decoupling from DynaMut, a consensus predictor which uses it as\ninput. Interestingly, FoldX and INPS3D, the best two methods, only correlate at 0.50 and 0.48 for raw and absolute\n\xce\x94\xce\x94G values, respectively, which could indicate potential for deriving a more effective consensus methodology.\n\nTable 1. Protein stability predictors used in this study.\n\nPredictor | Availability | Description\nDynaMut (ref. 48) | https://biosig.unimelb.edu.au/dynamut/ | Consensus predictor which uses outputs from Bio3D, ENCoM and DUET to assess the impact of mutations on protein stability. Due to its nature, the predictor leverages multiple methodologies, such as normal mode analysis and statistical potentials\nENCoM (ref. 47) | No longer available as a stand-alone server, but available from DynaMut | A prediction method based on normal mode analysis that relates changes in vibrational entropy upon mutation to changes in protein stability. Uses coarse-grained protein representations that account for residue properties\nDUET (ref. 44) | https://biosig.unimelb.edu.au/duet/stability | A machine-learnt consensus predictor that leverages output from SDM and mCSM, integrated using support vector machines\nSDM (ref. 41) | No longer available as a stand-alone server (succeeded by the SDM2 web server), but available from DynaMut | A knowledge-based energy potential, derived using evolutionary environment-specific residue substitution propensities\nFoldX (ref. 37) | https://foldxsuite.crg.eu/ | A full-atom force field consisting of physics-based interaction and entropic terms, parametrised on empirical training data. Allows predictions to be run easily on multi-chain assemblies\nRosetta (ref. 37) | https://www.rosettacommons.org/home | Rosetta macromolecular modelling software suite, which includes algorithms for stability impact prediction. Driven by a scoring function that is a linear combination of statistical and empirical energy terms. Highly modular and customisable\nINPS3D (ref. 38) | https://inpsmd.biocomp.unibo.it/inpsSuite/default/index3D | INPS3D builds upon its sequence and physicochemical conservation-based predecessor INPS, and employs structure-derived features such as solvent accessibility and local energy differences. The predictor is trained by employing support vector regression\nmCSM (ref. 43) | https://biosig.unimelb.edu.au/mcsm/stability | A machine-learned approach that evaluates structural signature changes imparted by mutations. Derives graph representation of physicochemical and geometric residue environment features\nSDM2 (ref. 42) | https://marid.bioc.cam.ac.uk/sdm2/prediction | Updated version of SDM, a knowledge-based potential, which uses environment-specific residue substitution tables, information on residue conformation and interactions, as well as packing density and residue depth, to assess protein stability changes\nCUPSAT (ref. 45) | https://cupsat.tu-bs.de/ | Prediction method that uses a residue torsion angle potential and an environment-specific atom pair potential (an improvement upon amino acid potentials) to assess stability changes\nPoPMuSiC (ref. 39) | https://soft.dezyme.com/query/create/pop | A potential consisting of 13 statistical terms, volume difference between the wild-type and mutant residues, as well as the solvent accessibility of the original residue to differentiate core and surface substitutions\nMAESTRO (ref. 46) | https://pbwww.che.sbg.ac.at/maestro/web | Combines 3 statistical scoring functions of solvent exposure and residue pair distances, as well as 6 protein properties, in a machine-learning framework to derive a consensus stability impact prediction\nI-Mutant 3.0 (ref. 40) | https://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi | A machine-learning derived method that takes into account mutated residue spatial environment in terms of surrounding residue types and surface accessibility\n\n\x0cPet. Reh. 
App.88\n\n[Figure 1: ROC curves; horizontal axis: false positive rate. Panel A legend (AUC): FoldX 0.661, INPS3D 0.640, Rosetta 0.617, PoPMuSiC 0.614, I-Mutant 3.0 0.607, DUET 0.594, SDM2 0.591, SDM 0.578, mCSM 0.578, CUPSAT 0.576, MAESTRO 0.557, DynaMut 0.524, ENCoM 0.495. Panel B legend (AUC): FoldX 0.665, INPS3D 0.643, Rosetta 0.624, PoPMuSiC 0.620, ENCoM 0.619, I-Mutant 3.0 0.607, CUPSAT 0.599, DUET 0.599, SDM2 0.591, MAESTRO 0.591, DynaMut 0.583, SDM 0.580, mCSM 0.580.]\n\nFigure 1. Using \xce\x94\xce\x94G values from protein stability predictors to discriminate between pathogenic and\nputatively benign missense variants. Receiver operating characteristic (ROC) curves are plotted for each\npredictor, with the classification performance being presented next to its name in the form of area under the\ncurve (AUC). (A) ROC curves for classification performance using native \xce\x94\xce\x94G value scale for each predictor.\n(B) ROC curves for predictor classification performance when using absolute \xce\x94\xce\x94G values. The figure was\ngenerated in R v3.6.3 (https://www.r-project.org) using ggplot2 v3.3.0 (https://ggplot2.tidyverse.org/), both\nfreely available.\n\nTable 2. Best stability predictor classification thresholds according to the \xe2\x80\x98distance-to-corner\xe2\x80\x99 metric. The\nperformance metrics and their 95% confidence intervals were derived from 2000 bootstraps of the data.\n\nPredictor | Absolute \xce\x94\xce\x94G threshold | False positive rate (95% confidence interval) | True positive rate (95% confidence interval)\nFoldX | 1.578 | 0.339-0.357 | 0.591-0.624\nINPS3D | 0.674 | 0.389-0.409 | 0.595-0.628\nRosetta | 1.886 | 0.390-0.409 | 0.572-0.605\nPoPMuSiC | 0.795 | 0.417-0.437 | 0.584-0.618\nCUPSAT | 1.455 | 0.415-0.434 | 0.549-0.583\nMAESTRO | 0.321 | 0.418-0.437 | 0.544-0.578\nSDM | 1.025 | 0.350-0.370 | 0.477-0.511\nSDM2 | 0.875 | 0.365-0.385 | 0.510-0.544\nmCSM | 0.889 | 0.433-0.453 | 0.542-0.575\nDUET | 0.803 | 0.400-0.421 | 0.548-0.582\nI-Mutant 3.0 | 0.915 | 0.405-0.424 | 0.545-0.578\nENCoM | 0.221 | 0.415-0.436 | 0.598-0.632\nDynaMut | 0.476 | 0.446-0.467 | 0.570-0.605\n\nFinally, we compared the performance of protein stability predictors to a variety of different computational\nvariant effect predictors (Fig. 3). Importantly, we excluded any predictors trained using supervised learning\ntechniques, as well as meta-predictors that utilise the outputs of other predictors, thus including only predictors\nwe labelled as unsupervised and empirical in our recent study10. This is due to the fact that predictors based\nupon supervised learning are likely to have been directly trained on some of the same mutations used in our\nevaluation dataset, making a fair comparison impossible10,55. A few predictors perform substantially better than\nFoldX, with the best performance seen for SIFT4G56, a modified version of the SIFT algorithm57. Interestingly,\nFoldX and INPS3D are the only stability predictors to outperform the BLOSUM62 substitution matrix58. 
On the\nother hand, all stability predictors performed better than a number of simple evolutionary constraint metrics.\n\n\x0cPet. Reh. App.89\n\n[Figure 2: violin plots of per-protein ROC AUC (vertical axis, 0.2 to 1.0) for the 13 stability predictors. Mean values printed below the plot, left to right: 0.681, 0.655, 0.627, 0.621, 0.630, 0.589, 0.612, 0.599, 0.590, 0.562, 0.597, 0.578, 0.576.]\n\nFigure 2. The heterogeneity of protein-specific missense variant classification performance. All the stability\npredictors exhibit very high degrees of heterogeneity in their protein-specific performance, as measured by the\nROC AUC on a per-protein basis. Absolute \xce\x94\xce\x94G values were used during protein-specific tool assessment. The\nmean performance of each predictor is indicated by a red dot and numerically showcased below the plot. Boxes\ninside the violins illustrate the interquartile range (IQR) of the protein-specific performance points, with the\nwhiskers measuring 1.5 IQR. Boxplot outliers are designated by black dots. The figure was generated in R v3.6.3\n(https://www.r-project.org) using ggplot2 v3.3.0 (https://ggplot2.tidyverse.org), both freely available.\n\n[Figure 3: ROC AUC (vertical axis, 0.50 to 0.75) for each predictor, with error bars.]\n\nFigure 3. Performance comparison of protein stability and variant effect predictors for identifying pathogenic\nvariants. 
Error bars indicate the 95% confidence interval of the ROC AUC as derived through bootstrapping.\nStability predictors are shown in red, while other variant effect prediction methods are shown in green. Absolute\n\xce\x94\xce\x94G values were used for stability-based methods. The figure was generated in R v3.6.3 (https://www.r-project.org)\nusing ggplot2 v3.3.0 (https://ggplot2.tidyverse.org), both freely available.\nDiscussion\nThe first purpose of this study was to compare the abilities of different computational stability predictors to distinguish\nbetween known pathogenic missense mutations and other putatively benign variants observed in the human\npopulation. In this regard, FoldX is the winner, clearly outperforming the other \xce\x94\xce\x94G prediction tools. It also\nhas the advantage of being computationally undemanding, fairly easy to run, and flexible in its utilisation.\nCompared to other methods that employ physics-based terms, FoldX introduces a few unique energy terms\ninto its potential, notably the theoretically derived entropy costs for fixing backbone and side chain positions59.\nHowever, the main reason behind its success is likely the parametrisation of the scoring function, resulting from\nthe well optimised design of the training and validation mutant sets, which aimed to cover all possible residue\nstructural environments60. Interestingly, while the form of the FoldX function, consisting of mostly physics-based\nenergy terms, has not seen much change over the years, newer knowledge-based methods, which leverage\n\n\x0cPet. Reh. App.90\n\nstatistics derived from the abundant sequence and structure information, demonstrate poorer and highly varied\nperformance. 
However, it is important to emphasise that the performance of FoldX does not necessarily mean\nthat it is the best predictor of experimental \xce\x94\xce\x94G values or true (de)stabilisation, as that is not what we are testing\nhere. We also note the strong performance of INPS3D, which ranked a clear second in all tests. It has the\nadvantage of being available as a web server, thus making it simple for users to test small numbers of mutations\nwithout installing any software.\nThere are two factors likely to be contributing to the improvement in the identification of pathogenic mutations\nusing absolute \xce\x94\xce\x94G values. First, while most focus in the past has been on destabilising mutations, some\npathogenic missense mutations are known to stabilise protein structure. As an example, the H101Q variant of\nchloride intracellular channel 2 (CLIC2) protein, which is thought to play a role in calcium ion signalling, leads\nto developmental disabilities, increased risk of epilepsy and heart failure61. The CLIC2 protein is soluble, but\nrequires insertion into the membrane for its function, with a flexible loop connecting its domains being functionally\nimplicated in a necessary conformational rearrangement. The histidine to glutamine substitution, which\noccurs in the flexible loop, was predicted to have an overall stabilising energetic effect due to conservation of\nweak hydrogen bonding, but also the removal of charge that the protonated histidine exerted on the structure61.\nThe \xce\x94\xce\x94G predictions were followed up by molecular dynamics simulations, which supported the previous conclusions\nby showing reduced flexibility and movement of the N-terminus, with functional assays also revealing\nreduced membrane integration of the CLIC2 protein in line with the rigidification hypothesis62. 
However, other\ninteresting examples of negative effects of over-stabilisation exist in enzymes and protein complexes, manifesting\nthrough the activity-stability trade-off, rigidification of co-operative subunit movements, dysregulation of\nprotein-protein interactions, and turnover49,50,63.\nIn addition, it may be that some predictors are not as good at predicting the direction of the change in stability\nupon mutation. That is, they can predict structural perturbations that will be reflected in the magnitude of the\n\xce\x94\xce\x94G value, but are less accurate in their prediction of whether this will be stabilising or destabilising. For example,\nENCoM and DynaMut predict nearly half of pathogenic missense mutations to be stabilising (41% and 44%,\nrespectively), whereas FoldX predicts only 13%. While FoldX, Rosetta and PoPMuSiC are all driven by scoring\nfunctions consisting of a linear combination of physics- and statistics-based energy terms, ENCoM is based on\nnormal mode analysis, and relates the assessed entropy changes around equilibrium upon mutation to the state\nof free energy. DynaMut, a consensus method, integrates the output from ENCoM and several other predictors\n(Table 1) into its score48. The creators of ENCoM found that their method is less biased at predicting stabilising\nmutations64. From our analysis, we are unable to confidently say anything about what proportion of pathogenic\nmutations are stabilising versus destabilising, or about which methods are better at predicting the direction of\nstability change, but this is clearly an issue that needs more attention in the future.\nThe second purpose of our study was to try to understand how useful protein stability predictors are for the\nidentification of pathogenic missense mutations. Here, the answer is less clear. 
While all methods show some\nability to discriminate between pathogenic and putatively benign variants, it is notable and perhaps surprising\nthat all methods except FoldX and INPS3D performed worse than the simple BLOSUM62 substitution matrix,\nwhich suggests that these methods may be of relatively limited utility for variant prioritisation. Even FoldX was\nunequivocally inferior to multiple variant effect predictors, suggesting that it should not be relied upon by itself\nfor the identification of disease mutations.\nOne reason for the limited success of stability predictors in the identification of disease mutations is that\npredictions of \xce\x94\xce\x94G values are still far from perfect. For example, a number of studies have compared \xce\x94\xce\x94G\npredictors, showing heterogeneous correlations with experimental values on the order of R = 0.5 for many\npredictors12,13,65. However, a recent work has also revealed problems with the noise in experimental stability\ndata used to benchmark the prediction methods, generally assessed through correlation values66. Taking noise\nand data distribution limitations into account, it is estimated that with currently available experimental data\nthe best \xce\x94\xce\x94G predictor output correlations should be in the range 0.7-0.8, while higher values would suggest\noverfitting66. As such, even assuming that \xe2\x80\x98true\xe2\x80\x99 \xce\x94\xce\x94G values were perfectly correlated with mutation pathogenicity,\nwe would still expect these computational predictors to misclassify many variants.\nThe existence of alternate molecular mechanisms underlying pathogenic missense mutations is also likely to\nbe a major contributor to the underperformance of stability predictors compared to other variant effect predictors.\nAt the simplest level, our analysis does not consider intermolecular interactions. 
Thus, given that pathogenic mutations are known to often occur at protein interfaces and disrupt interactions67,68, the stability predictors would not be likely to identify these mutations in this study. We tried to minimise the effects of this by only considering crystal structures of monomeric proteins, but the existence of a monomeric crystal structure does not mean that a protein does not participate in interactions. Fortunately, FoldX can be easily applied to protein complex structures, so the effects of mutations on complex stability can be assessed.

Pathogenic mutations that act via other mechanisms may also be missed by stability predictors. For example, we have previously shown that dominant-negative mutations in ITPR1 (ref. 69) and gain-of-function mutations in PAX6 (ref. 70) tend to be mild at a protein structural level. This is consistent with the simple fact that highly destabilising mutations would not be compatible with dominant-negative or gain-of-function mechanisms. Similarly, hypomorphic mutations that cause only a partial loss of function are also likely to be less disruptive to protein structure than complete loss-of-function missense mutations71.

These varying molecular mechanisms are all likely to be related to the large heterogeneity in predictions we observe for different proteins in Fig. 2. Similarly, the specific molecular and cellular contexts of different proteins could also limit the utility of ΔΔG values for predicting disease mutation. For example, even weak perturbations in haploinsufficient proteins could lead to a deleterious phenotype. At the same time, intrinsically stable proteins, proteins that are overabundant or functionally redundant could tolerate perturbing variants without such high ΔΔG variants being associated with disease. Finally, in some cases, mildly destabilising mutations can unfold local regions, leading to proteasome-mediated degradation of the whole protein3'1,3'1,72.

There could be considerable room for improvement in ΔΔG predictors and their applicability to disease mutation identification. Recently emerged hybrid methods, such as VIPUR73 and SNPMuSiC74, show promise of moving in the right direction, as they assess protein stability changes upon mutation while attempting to increase the interpretability and accuracy by taking the molecular and cellular contexts into account. However, none of the mentioned hybrid methods employ FoldX, which, given our findings here, may be a good strategy. Rosetta is also promising due to its tremendous benefit demonstrated in protein design. It should be noted that the protocol used for Rosetta in our work utilised rigid backbone parameters, due to the computational costs and time constraints involved in allowing backbone flexibility. An accuracy-oriented Rosetta protocol, or the “cartesian_ddg” application in the Rosetta suite, which allows structure energy minimisation in Cartesian space, may lead to better performance37,75.

The ambiguity of the relationship between protein stability and function is exacerbated by the biases of the various stability prediction methods, which arise in their training, like overrepresentation of destabilising variants, dependence on crystal resolution and residue replacement asymmetry. Having observed protein-specific performance heterogeneity, we suggest that in the future focus could be shifted to identifying functional and structural properties of proteins, which could be most amenable to structure and stability-based prediction of mutation effects.
Additionally, a recent work has showcased the use of homology models in structural analysis of missense mutation effects associated with disease, demonstrating utility that rivals experimentally derived structures, and thus expanding the possible resource pool that could be taken advantage of for structure-based disease prediction methods30. Further, our disease-associated mutation set likely contains variants causing disease through other mechanisms that do not manifest through strong perturbation of the structure, making accurate evaluation impossible. To allow better stability-based predictors, it is important to have robust annotation of putative variant mechanisms, which is currently lacking due to non-existent experimental characterisation. We hope our results encourage new hybrid approaches, which make full use of the best available tools and resources to increase our ability to accurately prioritise putative disease mutations for further study, and elucidate the relationship between disease and stability changes.

Methods

Pathogenic and likely pathogenic missense mutations were downloaded from the ClinVar2 database on 2019-04-17, while putatively benign variants were taken from gnomAD v2.1 (ref. 1). Any ClinVar mutations were excluded from the gnomAD set. We searched for human protein-coding genes with at least 10 ClinVar mutations occurring at residues present in a single high-resolution (< 2 Å) crystal structure of a protein that is monomeric in its first biological assembly in the Protein Data Bank. We excluded non-monomeric structures due to the fact that several of the computational predictors can only take a single polypeptide chain into consideration.

FoldX 5.0 (ref. 76) was run locally using default settings. Importantly, the ‘RepairPDB’ option was first used to repair all structures.
Ten replicates were performed for each mutation to calculate the mean.

The Rosetta suite (2019.14.60699 release build) was tested on structures first pre-minimised using the minimize_with_cst application and the following flags: -in:file:fullatom; -ignore_unrecognized_res; -fa_max_dis 9.0; -ddg::harmonic_ca_tether 0.5; -ddg::constraint_weight 1.0; -ddg::sc_min_only false. The ddg_monomer application was run according to a rigid backbone protocol with the following argument flags: -in:file:fullatom; -ddg:weight_file ref2015_soft; -ddg::iterations 50; -ddg::local_opt_only false; -ddg::min_cst false; -ddg::min true; -ddg::ramp_repulsive true; -ignore_unrecognized_res.

Predictions by ENCoM, DUET and SDM were extracted from the DynaMut results page, as it runs them as parts of its own scoring protocol. mCSM values from DynaMut coincided perfectly with values from the separate mCSM web server, and thus the server values were used, as DynaMut calculations yielded fewer results due to failing on more proteins.

All other stability predictors were accessed through their online webservers with default settings by employing the Python RoboBrowser web scraping library. Variant effect predictors were run in the same way as described in our recent benchmarking study10.

Method performance was analysed in R using the PRROC (ref. 77) and pROC (ref. 78) packages, and AUC curve differences were statistically assessed through 10,000 bootstraps using the roc.test function of pROC. For DynaMut, I-Mutant 3.0, mCSM, SDM, SDM2 and DUET, the sign of the predicted stability score was inverted to match the convention of increased stability being denoted by a negative change in energy. For the precision-recall analysis, we used a subset of the mutation dataset, containing 9,498 ClinVar and gnomAD variants, which had no missing prediction values for any of the stability-based methods.
This is because a few of the predictors were unable to give predictions for all mutations (e.g. they crashed on certain structures), and for the precision-recall analysis, it is crucial that all predictors are tested on exactly the same dataset. We also show that the relative performance of the top predictors remains the same in the ROC analysis using this smaller dataset (Table S1).

All mutations and corresponding structures and predictions are provided in Table S2.

Received: 11 June 2020; Accepted: 31 August 2020
Published online: 21 September 2020

References
1. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature (2020).
'