ASL-Sentence Reproduction Test

The American Sign Language Sentence Reproduction Test (ASL-SRT; Hauser et al., 2008) was developed to provide a robust measure of ASL fluency, given the wide variation in fluency among ASL users. The purpose of the ASL-SRT is to evaluate the level of ASL fluency (1) for educational and clinical assessment of ASL users and (2) to provide a benchmark test for linguistic research.

A further goal was a measure that takes only a short time to administer and score, requires little expertise of the person scoring the test, and tests across different linguistic domains.

Based on a review of existing ASL measures, the authors state that “there remains therefore a need for a quantitative test of global ASL proficiency for both children and adults that involves a short administration time and scoring procedures that are robust across scorers and can be completed by individuals without intensive training” (Hauser et al., 2008, p. 162). Such a test should discriminate between signers who acquired ASL from their parents and those who did not, and also between signers who have achieved full mastery of ASL and those who have not yet done so. None of the ASL measures developed so far meets these requirements (Hauser et al., 2008).

The ASL-SRT is based on the Speaking Grammar Subtest of the Test of Adolescent and Adult Language – Third Edition (TOAL-3; Hammill, Brown, Larsen, & Wiederholt, 1994).



The Speaking Grammar Subtest of the TOAL-3 (Hammill et al., 1994) consists of 30 English sentences ordered by increasing difficulty. The sentences are administered live by the test administrator. The test taker is awarded one point for reproducing a sentence correctly and zero points when the sentence is reproduced with omissions or commissions. The authors of the ASL-SRT decided instead to present the sentences pre-recorded on video, and the test taker's responses are videotaped for later scoring.


Development and pilot of the ASL-SRT: For the ASL-SRT, 40 sentences were developed that increase in length and in syntactic, thematic, and morphemic complexity. The authors were careful to use only lexical items that do not show regional variation, do not vary across generations, and are not variants from a sign system. Sentence complexity was increased by using fingerspelling, numerical incorporation affixes, and signs assumed to be of low frequency of occurrence. It is important to note that in sign languages, sentence complexity does not necessarily increase with sentence length, since, for example, polymorphemic signs can pack considerable complexity into a single sign. The sentences were arranged in order of increasing complexity, as determined by two deaf native signers, and then piloted with a group of eight signers with varying degrees of ASL skill (novice hearing adults, hearing native and non-native ASL/English interpreters, and deaf native and non-native adult signers). On the basis of the pilot, the sentence order was revised to distribute easy and difficult items evenly, and items with grammatical and lexical features that varied among the pilot participants’ responses were omitted. This left 39 sentences, which were further tested with a larger group of signers.


Main study/second phase: The aims of the main study were threefold: (1) to determine how well the test materials differentiated among participants with different levels of ASL proficiency, (2) to determine whether the rating method was reliable among different scorers, and (3) to determine whether it was possible to agree upon a final sentence order. The data used in the main study were the basis for creating the final version of the ASL-SRT.

In this phase, 120 native and non-native deaf and hearing children and adults with varying degrees of ASL proficiency were tested (Table 1). Only the data from 99 participants were scored and analyzed.

Table 1: Number and age of deaf and hearing native and non-native child and adult signers (N = 99) (from Hauser et al., 2008)

Testing procedure: The test and the test instructions were prerecorded and shown to the participants on video. All participants were tested individually. After watching the instructions, they had the chance to ask the test administrator clarifying questions. Prior to the actual 39 test sentences, two practice items were presented, which the participants had to repeat; they did not receive any feedback on these items. Then the sentences were shown to the participant in order of increasing complexity. After each sentence, the participant repeated it, and the responses were videotaped.


Scoring procedure: The reproduced sentences were subsequently scored by two deaf native ASL signers. Both were undergraduate psychology students with no formal training in linguistics and only brief training in rating the video data. The scorers were trained to watch the original sentence as well as the participant’s reproduction. They were instructed to mark a sentence as incorrect “if they felt that the reproduction was not exactly the same as the original” (Hauser et al., 2008, p. 166). The scorers were also asked to describe the participants’ errors, both to explain discrepancies between the scorers and as a basis for a test manual documenting errors and acceptable variations. One of the authors provided feedback and answered questions on the scorers’ ratings of one participant’s reproductions; no further training was provided, since the goal was to construct a test that requires minimal training. For each correctly reproduced sentence, one point was awarded, and for a sentence with one or more errors, zero points (for an overview of the different error types, see Hauser et al., 2008, p. 166). The scored data of 99 participants were used (1) to calculate reliability, (2) to calculate the effect of hearing status on the ASL-SRT scores (native vs. non-native signers), and (3) to calculate the effects of developmental age and native exposure (deaf children) on the test scores.
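The binary scoring rule can be sketched as follows. The item-level judgments and the percent-agreement check below are invented illustrations, not the study's data or its reliability analysis:

```python
# Binary scoring: one point per exactly reproduced sentence, zero otherwise.
# The judgments below (True = reproduction accepted) are invented examples.

def total_score(item_judgments):
    """Total ASL-SRT score: number of items judged fully correct."""
    return sum(1 for correct in item_judgments if correct)

# One participant's first five (of 39) items, as judged by two scorers:
rater1 = [True, True, False, True, False]
rater2 = [True, False, False, True, False]

print(total_score(rater1))  # 3
print(total_score(rater2))  # 2

# Simple item-level percent agreement between the two scorers:
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement)  # 0.8
```

Because any error yields zero for that item, the two scorers can disagree on individual items (as on item 2 above) while still producing similar totals; the reliability analysis below addresses exactly this.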



Reliability: Although the mean scores of the two raters differed significantly (a paired-samples t-test revealed that one rater had a stricter criterion for accepting the reproduced sentences; t(98) = 15.28, p < .001), a significant inter-rater reliability was found (Pearson r = .83, p < .01). The internal consistency of each rater’s scores was high (rater 1: alpha = .87; rater 2: alpha = .88). In summary, the scored results of the two raters revealed high inter- and intra-rater reliability (Hauser et al., 2008).
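As a sketch, the inter-rater correlation can be computed from the two raters' total scores. The score lists here are hypothetical, chosen only to illustrate how one rater can be systematically stricter (different means) while the correlation stays high:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical total scores (0-39) from two raters for six participants;
# rater 2 is consistently stricter, yet the ranking is nearly identical:
rater1_totals = [30, 25, 18, 35, 22, 28]
rater2_totals = [28, 24, 16, 33, 20, 27]

r = pearson_r(rater1_totals, rater2_totals)
print(round(r, 3))  # close to 1 despite the mean difference
```

This is why a significant mean difference between raters (the paired t-test) can coexist with high inter-rater reliability: the correlation reflects agreement on who scored higher, not agreement on absolute totals.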


Effect of hearing status in adults: The results of the comparison between hearing and deaf native signers showed that the 25 hearing native signers (M = 18.3, SD = 6.3) performed significantly worse than the 23 deaf native signers (M = 25.9, SD = 4.0; t(46) = -4.95, p < .001), which suggests that deaf and hearing native signers have different levels of ASL proficiency. The authors conclude that “[t]herefore, the data from hearing participants will not be used to rank the sentences in order of difficulty for the final test product” (Hauser et al., 2008, p. 168).
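The reported comparison can be checked from the summary statistics alone. A pooled-variance two-sample t statistic (a standard formula, assumed here because it matches the reported df of 46) recovers the published value:

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's two-sample t with pooled variance, df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Summary statistics reported by Hauser et al. (2008):
# hearing native signers: M = 18.3, SD = 6.3, n = 25
# deaf native signers:    M = 25.9, SD = 4.0, n = 23
t, df = pooled_t(18.3, 6.3, 25, 25.9, 4.0, 23)
print(round(t, 2), df)  # -4.94 46 -- in line with the reported t(46) = -4.95
```

The tiny discrepancy (-4.94 vs. -4.95) is consistent with rounding in the published means and standard deviations.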


Effect of native fluency and developmental age in deaf signers: In order to investigate the effect of native fluency on ASL-SRT scores, the authors included deaf native and non-native signing children (10-17 years old) and adults (18-60 years old; Hauser et al., 2008). A 2x2 ANOVA was computed on the ASL-SRT scores with native fluency (native vs. non-native) and developmental age (children vs. adults) as factors. The results revealed a “main effect of native fluency (M_native = 24.8, SD_native = 4.1; M_non-native = 18.3, SD_non-native = 7.2, F(1, 63) = 11.33, p = .001), and of developmental age (M_children = 21.7, SD_children = 6.2; M_adults = 25.3, SD_adults = 4.2, F(1, 63) = 5.10, p < .05), without an interaction effect“ (Hauser et al., 2008, p. 168).
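The 2x2 design can be illustrated with a balanced toy dataset. The scores below are invented, and the sketch only shows how the total variance decomposes into the two main effects, the interaction, and error; it does not reproduce the study's F values (whose design was not balanced, given the group sizes):

```python
from statistics import mean

# Balanced 2x2 toy data (native fluency x developmental age), 4 scores
# per cell -- invented for illustration, not the study's raw data.
data = {
    ("native", "child"):     [24, 26, 22, 25],
    ("native", "adult"):     [27, 28, 25, 29],
    ("non-native", "child"): [15, 19, 17, 21],
    ("non-native", "adult"): [20, 23, 21, 24],
}
n_cell = 4

all_scores = [v for cell in data.values() for v in cell]
grand = mean(all_scores)

def level_mean(factor_index, level):
    """Marginal mean of all scores at one level of one factor."""
    return mean(v for key, cell in data.items()
                for v in cell if key[factor_index] == level)

# Sums of squares for the two main effects (each level mean is based on
# 2 cells x n_cell scores in this balanced design):
ss_flu = 2 * n_cell * sum((level_mean(0, f) - grand) ** 2
                          for f in ("native", "non-native"))
ss_age = 2 * n_cell * sum((level_mean(1, a) - grand) ** 2
                          for a in ("child", "adult"))

# Within-cell (error) and total sums of squares; the interaction is the rest:
ss_error = sum(sum((v - mean(cell)) ** 2 for v in cell)
               for cell in data.values())
ss_total = sum((v - grand) ** 2 for v in all_scores)
ss_inter = ss_total - ss_flu - ss_age - ss_error

ms_error = ss_error / (len(all_scores) - len(data))  # df_error = 16 - 4 = 12
print(ss_flu, ss_age, ss_inter, ss_error)
print(ss_flu / ms_error, ss_age / ms_error)  # F(1, 12) for the main effects
```

In this toy data, as in the study, the interaction sum of squares is small relative to both main effects: the fluency advantage is of similar size for children and adults.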

The results indicate that the ASL-SRT is “sensitive to differences in ASL fluency between native and non-native signers. It also distinguishes between children and adults” (Hauser et al., 2008, p. 169).


Refinement of the test and future direction

Based on the results of the main study, the sentences were ordered according to their level of difficulty. This ordering was based only on the data of the native signing deaf children (Hauser et al., 2008). The authors argue in favour of using only the data from native signing children and excluding deaf non-native signers because of the latter’s varying pattern of language development. The children’s data also showed more variability than the adults’ data, which helped to determine the item order (P. Hauser, personal communication, June 19, 2012). The reordering of the sentences was based on the results of the two scorers: the items were reordered from the highest proportion of correct responses (easy items) to the lowest (difficult items). Out of the original 39 sentences, 30 were selected and checked again on a new sample of deaf signers.


The current version takes 30 minutes to administer and 30 minutes for scoring with minimal training (Hauser et al., 2008).


In the future, the ASL-SRT will be administered again to collect data for normative samples for children and adults who are native and non-native deaf signers. Then the authors will also test the validity of the ASL-SRT and improve inter- and intra-rater reliability. The analysis of this data collection is expected to be finished in 2012 (P. Hauser, personal communication, June 18, 2012).


The ASL-SRT has been adapted to German Sign Language (Kubus & Rathmann, 2012) and British Sign Language (Cormier et al., 2012), and is currently being adapted to Swedish Sign Language (K. Schönström, personal communication, July 5, 2013) and Swiss German Sign Language.


Strengths: (1) no extensive training is needed to score the test, (2) no formal linguistic training is needed to score the test, (3) it can serve as a baseline to differentiate levels of fluency between native and non-native signers and between children and adults.


Weaknesses: (1) mostly applicable in the research context, (2) no reported evidence on validity.


Summarized by Tobias Haug (2012).


For more information regarding this test, please contact Peter Hauser at the NTID, Rochester Institute of Technology.