Assessment Instrument for SLN

The assessment instrument for Sign Language of the Netherlands (TNGT) was developed to provide a standardized language test for deaf children in primary education in the Netherlands. It was developed as part of a five-year project (starting in 2001) and built on an early assessment project on Sign Language of the Netherlands (SLN; Jansma, Knoors, & Baker, 1997). The SLN assessment instrument tests both receptive and expressive skills (Hermans, Knoors, & Verhoeven, 2010).
The assessment instrument for SLN can be considered as an instrument for in-depth investigation across different domains of language. The target group for this test is deaf children 4-12 years old.


The SLN assessment instrument consists of nine different computerized tests that focus on receptive and expressive SLN skills across different domains, i.e. phonology, vocabulary, morpho-syntax, and narrative skills. Not every test was expected to be appropriate for every age group. For example, the receptive and expressive phonology tests were developed only for children 4-8 years old, since children older than 8 were expected to have already mastered the phonological system of SLN (Hermans et al., 2010).


The first version of the test was developed in close cooperation with deaf informants and SLN researchers. This first version was trialled with 10 deaf children aged 4-10 years old, which resulted in an improved version of the test. Before starting the test, the test administrator had to type in the child’s name, gender, and age. All expressive tasks were scored live by the test administrator in order to avoid time-consuming analysis after the testing (Hermans et al., 2010).


The final test version of the SLN assessment instrument contains the following tasks:


(1) Receptive phonology task (Video 1): In this task, two signs were shown one after the other on a computer screen. The signs were produced by two deaf native signers. Children were asked to decide whether the signs had the same meaning (press the green button with the mouse) or a different meaning (press the red button). A certain number of the sign pairs are minimal pairs in SLN and differ (predominantly) with respect to one phonological parameter: either the mouthing pattern, the handshape, or the movement. The rationale behind the task was that if children had not yet acquired this phonological parameter, they would have difficulties in discriminating these minimal pairs. The receptive phonology task consisted of 36 items.


(2) Expressive phonology task: The children’s expressive phonological skills were assessed in an imitation task. In this task, a sign was presented on a computer screen. Children were instructed to repeat the sign. The test-administrator judged the correctness of one parameter of the sign produced by the child (e.g., handshape, movement, oral component). The test materials also included an information sheet for the test administrator on which the possible correct responses for each sign were depicted.


(3) Receptive vocabulary task (Video 2): In the receptive vocabulary task, a sign was presented on the computer screen followed by four pictures. Children were asked to select the picture that matched the meaning of the sign by using the computer’s mouse. This test consisted of 61 items in total. One of the major problems in developing a sign language vocabulary test concerns the iconicity of signs (e.g., Jansma et al., 1997; White & Tischler, 1999). The problem is that children who encounter a sign which they have not yet acquired may exploit the iconic features of the sign to correctly guess its meaning and select the appropriate picture. The authors of this test used two strategies to reduce this problem: (1) distractor pictures were added that did not match the meaning of the target sign but resembled its shape, and (2) the picture that matched the target sign in meaning was drawn from such a perspective that its shapes no longer resembled the iconic features of the sign (Hermans et al., 2010).

In order to investigate the facilitating effect of iconicity, Hermans et al. (2010) administered the receptive vocabulary test to 28 hearing children, aged 11 to 12, with no knowledge of SLN. The percentage of correct responses for these children ranged from 21.3% to 42.6%, with an average of 33.5%. This percentage differed significantly from chance (25%), as confirmed by a t-test (t(27) = 7.87, p < .001). This finding may suggest that hearing children can still effectively exploit the iconicity of signs. However, it is important to note that hearing children without knowledge of SLN may have exploited not only the iconicity of the signs, but also their spoken component. The SLN test consisted predominantly of nouns, which are usually accompanied by an oral component in SLN that in citation form often consists of the whole word. After the experiment, some of the hearing children reported that they had used the spoken component to guess the correct picture. In other words, it is possible that the hearing children without SLN knowledge scored above chance because they could exploit the spoken component (i.e. mouthing pattern) instead of the iconic features of the sign. Nevertheless, even though the hearing children performed above chance, the problem was much less pronounced than in previous studies (Jansma et al., 1997; White & Tischler, 1999).
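
The reported figures can be retraced with a little arithmetic. The sketch below assumes that the test was a standard one-sample t-test against the chance level (the summary only says "t-test") and that each of the 61 items counted equally. The standard deviation it prints is a back-calculation from the reported mean and t value, not a number reported by the authors.

```python
import math

n_children = 28          # hearing children tested
n_items = 61             # items in the receptive vocabulary task
n_pictures = 4           # answer options per item
mean_pct = 33.5          # reported mean percentage correct
t_reported = 7.87        # reported t value

chance_pct = 100 / n_pictures      # guessing among four pictures
df = n_children - 1                # degrees of freedom, matches t(27)

# One-sample t: t = (mean - chance) / (sd / sqrt(n)), solved here for sd.
implied_sd = (mean_pct - chance_pct) * math.sqrt(n_children) / t_reported

# The reported range corresponds to whole numbers of correct items.
lowest_pct = round(100 * 13 / n_items, 1)
highest_pct = round(100 * 26 / n_items, 1)

print(f"chance = {chance_pct}%, df = {df}, implied SD = {implied_sd:.2f} points")
print(f"13/61 = {lowest_pct}%, 26/61 = {highest_pct}%")
```

Under these assumptions, the reported range of 21.3% to 42.6% corresponds to between 13 and 26 correct items out of 61.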


(4) & (5) Expressive vocabulary skills task: The children’s expressive vocabulary skills were assessed in two tasks: (4) the expressive vocabulary-I and (5) the expressive vocabulary-II tasks. In the expressive vocabulary-I task, a picture was presented on a screen. Children were instructed to name the picture in SLN. The test consisted of 54 items. The test-administrator scored whether or not children had produced the correct response in SLN. In the expressive vocabulary-II task, a sign was presented in SLN. Children were instructed to describe the meaning of the sign. The test consisted of 40 items. Again, the test-administrator recorded whether or not the deaf child had successfully managed to describe the meaning of the sign in SLN.


(6) & (7) Receptive (Video 3) and expressive morpho-syntactic tasks: The children’s receptive and expressive morpho-syntactic skills were tested in two tasks. In both tasks, a variety of morpho-syntactic rules of SLN were tested (e.g., verb agreement, modifications of verbs for aspect, classifier verbs of motion and location). In the receptive morpho-syntactic task, a phrase or sentence in SLN was presented on the screen, followed by four pictures. Children were instructed to select the picture that matched the phrase or sentence.

In the expressive morpho-syntactic task, a picture was presented on a screen. Then, an SLN video appeared next to the picture, and the picture’s content was described in SLN. Finally, another picture was presented on the screen and children were instructed to describe the picture in SLN. The test-administrator scored whether or not the child had successfully described its content in SLN. The first picture and its description in SLN were used to elicit an appropriate response. The expressive morpho-syntactic task consisted of 24 items.


(8) & (9) Narrative comprehension and production skills task (Figure 1): The narrative comprehension and production skills of the children were assessed in two tasks. In the narrative comprehension task, a story was presented in SLN. After each story, four questions were presented in SLN on the screen, and children were instructed to answer these questions. Some of these questions referred to information mentioned in the stories. The test-administrator scored whether or not the children had correctly answered each question. The narrative comprehension task consisted of 5 stories and 20 questions. The average length of the stories was 53 seconds (range 39-83).

In the narrative production task, a story was depicted on the screen. Children were instructed to watch the story. Then the depicted story disappeared from the screen, and children were asked to retell the story in SLN. The retelling of the story was scored live.

Figure 1: Example of narrative production task (© Hermans et al., 2007)

The norming study

A norming study was conducted with the goal of collecting information for children aged 4;0 to 4;11, 5;0 to 5;11, etc., in yearly intervals up to 12 years old. A total of 330 children were tested. With a sample of this size (N = 330), it was not possible to develop separate norms for deaf children of deaf parents and deaf children of hearing parents across age groups. Seven of the eight schools for the deaf in the Netherlands were included in the standardization study. All schools provided a bilingual education program. None of the participating children had a known additional disability. Of the 330 children, 163 were tested in three consecutive years, 76 were tested twice, and 91 were tested once.

Additionally, the goal of the norming study was to define five categories for each age group, following the TAK-R (Verhoeven & Vermeer, 2001). On the basis of the scores of all of the children in each age group the following five intervals were defined: A) Good to Excellent (above the 75th percentile), B) Average to Good (between the 50th and 75th percentile), C) Moderate to Average (between the 25th and 50th percentile), D) Poor to Moderate (between the 10th and 25th percentile), and E) Very poor to Poor (below the 10th percentile).
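
The five intervals amount to a simple lookup from a child's percentile rank within the age group to a category. The following is a minimal sketch in Python. The function name `norm_category` is hypothetical, and the handling of scores that fall exactly on a boundary (for example, exactly the 75th percentile) is an assumption, since the summary does not specify it.

```python
def norm_category(percentile):
    """Map a percentile rank (0-100) within an age group to a category A-E.

    Boundary handling is an assumption: a score exactly on a cut-off
    is placed in the lower of the two adjacent categories.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile > 75:
        return "A"  # Good to Excellent
    if percentile > 50:
        return "B"  # Average to Good
    if percentile > 25:
        return "C"  # Moderate to Average
    if percentile >= 10:
        return "D"  # Poor to Moderate
    return "E"      # Very poor to Poor


for p in (80, 60, 30, 15, 5):
    print(p, norm_category(p))
```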


Test administrators for the norming study were recruited from deaf and hearing students attending university programs to become interpreters or sign language teachers. All participating students were trained to administer the test. The administration of the entire test battery took 2-2.5 hours, or approximately 15 minutes for each subtest.


Psychometric properties

The authors of the test applied different measures to establish reliability and validity.

(1) Cronbach's alphas were computed for each age group and each subtest separately. Across all age groups and tests, the alpha values varied considerably, ranging from .60 to .96. The average alphas per task were good (> .90) for the receptive vocabulary task and the receptive morpho-syntax task; acceptable (> .80) for the receptive phonology task, the expressive vocabulary-I task, the expressive vocabulary-II task, the expressive morpho-syntax task, and the narrative comprehension task; and moderate (> .70) for the expressive phonology task and the narrative production task.
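
For readers unfamiliar with the measure, Cronbach's alpha for a subtest with k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below implements this standard formula in Python. The function name and the small score matrices are made up for illustration and are not data from the norming study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    scores: one row per child, one column per item (e.g. 1 = correct, 0 = wrong).
    """
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # transpose to one tuple per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical data: four children, three items.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
mixed = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

alpha_consistent = cronbach_alpha(consistent)   # items agree perfectly
alpha_mixed = cronbach_alpha(mixed)             # items agree only partly
print(round(alpha_consistent, 2), round(alpha_mixed, 2))
```

When every item ranks the children identically, alpha reaches its maximum of 1. The less the items agree, the lower the value.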

(2) Test-retest reliability: No dedicated test-retest data were available for the sign language tests. However, a large proportion of the children were tested in two or three consecutive years, and Hermans et al. (2010) used those data to assess test-retest reliability. The correlations between the children’s scores on two consecutive test administrations were .53, .56, .71, .80, .77, .73, .74, .83, and .81 for the tests assessing receptive phonology, expressive phonology, receptive vocabulary, expressive vocabulary-I, expressive vocabulary-II, receptive morpho-syntax, expressive morpho-syntax, narrative comprehension, and narrative production, respectively. Test-retest reliability was very low for the receptive and expressive phonology tests in particular. Note that such a large time interval (one year) will have negatively affected the test-retest reliability, because children differ in how much their SLN skills develop between test administrations. In other words, this procedure has presumably resulted in an underestimation of the test-retest reliability.
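
The effect of the one-year interval can be illustrated with a Pearson correlation, which is assumed here to be the coefficient used (the summary only speaks of "correlations"). In the sketch below, all scores are invented for illustration. If every child improves by the same amount, the correlation stays at 1; once children improve by different amounts, it drops, even though the test itself has not become less reliable.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores of four children in two consecutive years.
year_1 = [10, 20, 30, 40]
year_2_uniform = [15, 25, 35, 45]   # every child gains 5 points
year_2_uneven = [30, 25, 35, 45]    # the first child gains far more than the others

r_uniform = pearson_r(year_1, year_2_uniform)
r_uneven = pearson_r(year_1, year_2_uneven)
print(round(r_uniform, 2), round(r_uneven, 2))
```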

(3) Inter-rater reliability: The expressive tasks were scored by the test administrator during the test administration. The expressive tasks were also videotaped to investigate the inter-rater reliability of the scoring. For each of the five expressive tasks, another group of 13 test administrators scored a randomly selected group of children within a particular age group a second time, this time from videotape. The correlations ranged from .78 (narrative production) to .92 (expressive vocabulary-II), which can be considered high.

(4) Validity: Concurrent validity could not be established because no other SLN tests were available at this time. The construct and predictive validity, however, could be determined.

In order to investigate construct validity, three procedures were applied. (I) The children’s age was correlated with their test performance. The correlations between the children’s age and their test scores were .585 (p < .001) for the receptive phonology task, .579 (p < .001) for the expressive phonology task, .722 (p < .001) for the receptive vocabulary task, .735 (p < .001) for the expressive vocabulary-I task, .685 (p < .001) for the expressive vocabulary-II task, .698 (p < .001) for the receptive morpho-syntax task, .682 (p < .001) for the expressive morpho-syntax task, .758 (p < .001) for the narrative comprehension task, and .683 (p < .001) for the narrative production task. (II) Another measure to investigate construct validity was to correlate gender with test performance. As was assumed by Hermans and his colleagues, girls outperformed boys on every SLN task. (III) Parents’ hearing status and test scores: The third variable that was investigated was the hearing status of the parents of the children. Deaf children with deaf parents often have better signing skills than deaf children with hearing parents. The results revealed that deaf children of deaf parents significantly outperformed deaf children of hearing parents on each of the nine SLN subtests.

The predictive validity was established by making use of a previous study conducted by Ormel (2008), which showed that the receptive vocabulary task used in this study correlated significantly with reading comprehension in deaf children.


The test is available and can be used in schools.


Strengths: (1) covers a wide range of linguistic devices/domains, (2) broad age range (4-12 years old), (3) tests both language comprehension and production, (4) robust psychometric properties to a certain degree, (5) based on linguistic research available for SLN (not on other sign languages), (6) availability of age norms, and (7) availability of the test.


Weaknesses: (1) rather complex and takes some time to administer and score, and (2) the issue of dialect variants in the vocabulary tests is not clear.

Summarized by Tobias Haug (2010; in cooperation with Daan Hermans).

For more information regarding this test, please contact Daan Hermans at Kentalis, the Netherlands.