CONTENT VALIDITY AND INTER-RATER RELIABILITY OF THE HALLIWICK-
CONCEPT-BASED INSTRUMENT "SWIMMING WITH INDEPENDENT MEASURE"
Katja Groleger Sršen1, Gaj Vidmar1, Maša Pikl2, Irena Vrečar1, Cirila Burja3, Klavdija Krušec4
1University Rehabilitation Institute, Republic of Slovenia, Ljubljana
2Elementary School Gradec, Litija, Slovenia
3CIRIUS Kamnik, Slovenia
4CIRIUS Vipava, Slovenia
Corresponding author:
Assist Prof Gaj Vidmar, PhD
University Rehabilitation Institute, Republic of Slovenia
Linhartova 51, SI-1000 Ljubljana, Slovenia
E-mail: [email protected]
Conflicts of interest: none declared
Source of funding: none
Abstract
Objective: The Halliwick concept is widely used in different settings to promote joyful
movement in water and swimming. To assess the swimming skills and progress of an individual
swimmer, one should use a valid and reliable measure. The Halliwick-concept-based Swimming
with Independent Measure (SWIM) was introduced for this purpose. We wanted to determine
its content validity and inter-rater reliability.
Methods: 54 healthy children, 3.5 to 11 years old, from a mainstream swimming program
participated in a content validity study. They were evaluated with SWIM and the national
evaluation system of swimming abilities (classifying children into seven categories). For
studying inter-rater reliability of SWIM, we included 37 children and youth from a Halliwick
swimming program, aged 7-22 years, who were evaluated by two Halliwick instructors
independently.
Results: Average SWIM score differed between national evaluation system categories and
followed the expected order (p<0.001), whereby a ceiling effect was observed in the higher
categories. High inter-rater reliability was found for all 11 SWIM items. The lowest reliability
was observed for item G (sagittal rotation), though the estimates were still above 0.9. As
expected, the highest reliability was observed for total score (intraclass correlation 0.996).
Conclusions: Validity of SWIM with respect to the national evaluation system of swimming
abilities is high until the point where a swimmer is well adapted to water and already able to
learn some swimming techniques. Inter-rater reliability of SWIM is very high, so we believe
that SWIM can be used in further research and practice to follow the progress of swimmers.
Key words: swimming, children, Halliwick, evaluation of progress, SWIM
Introduction
The Halliwick concept of teaching swimming is widely used across the world. In short, it is a
well-developed concept of teaching people with physical and/or learning difficulties to move
independently in water and also to swim, if possible. It comprises knowledge on
hydromechanics, hydrostatics, biomechanics and teaching, aiming to help the swimmer develop
water confidence. Its development began in the 1950s, when Phyl McMillan, James McMillan and
Joan Martin wanted to develop a special swimming program for children at the Halliwick
School for Crippled Girls in London. Eventually, all the knowledge and experience that they
had acquired led to the development of a ten-point program: mental adjustment to water,
disengagement of the swimmer (building independence), transversal rotation control, sagittal,
longitudinal and combined rotation control, upthrust, balance in stillness, turbulent gliding,
simple progression and basic swimming stroke (McMillan, 2002).
Within the Halliwick system, swimmers are actively engaged in the learning process through
different activities and play. Their abilities are traditionally assessed through a system of four
Halliwick badges: red, yellow, green and blue. The tests are based on the ten-point program.
Swimmers have to pass several items, each of which is scored as either "passed" or "not passed". To
pass the items for the red or yellow badge, the swimmer has to perform several activities while
the instructor provides some physical support. Only for the green badge does the swimmer have
to perform the items unaided. The blue badge is aimed at providing a wide range of water skills
for advanced swimmers (McMillan, 2006).
To our knowledge, nobody has reported on content validity and inter-rater reliability of this
system yet, even though the system's fundamental validity appears to be self-evident. On the
other hand, it is also obvious that the four-badge system is too coarse to evaluate precisely a
wide range of swimmer abilities and cannot be sensitive to small changes. Taking into account
that there are quite a few swimmers with profound physical difficulties who are not expected to
progress much, or at least not within a reasonably short time, another test is needed.
Based on extensive experience gained while working in a Halliwick swimming club, Kim
Peacock (1993) developed a new test called "Swimming with Independent Measure" (SWIM).
SWIM is based on the ten-point Halliwick program. It is aimed at evaluating functional abilities
within any swimming pool setting and can be applied to any diagnostic group and to all ages.
The results of a small recent study suggest that it is sensitive enough to be used for evaluating,
following up and planning an individual or group program (Groleger Sršen et al., 2008, 2010).
There are some other tests to evaluate swimming skills for children with physical or learning
disabilities: the Aquatic Independence Measure – AIM (Chacham and Hutzler, 2001), the Water
Orientation Test of Alyn – WOTA (Tirosh et al., 2008) and Humphries’ Assessment of Aquatic
Readiness – HAAR (Humphries, 2008). They are all related to the Halliwick concept to some
extent. A comparison of the items of those tests suggests that SWIM follows the Halliwick ten-
point program more closely. However, WOTA and AIM have already been tested for content
validity and inter-rater reliability (Tirosh et al., 2008). WOTA consists of two versions: one for
those who are capable of fulfilling instructions, and one for those who are not. The second one
is appropriate for the evaluation of children at the age of approximately 3 years and for older
children with limited cognitive abilities (Tirosh et al., 2008). For practical reasons, it may be
preferable to apply a single version of a test, as is the case with SWIM, which is useful for all
ages and diagnoses. SWIM is also less time-consuming than WOTA. In our experience, it can
be performed in 15 minutes, while WOTA is reported to take 30 minutes (Tirosh et
al., 2008). SWIM testing is easy to perform and no additional training is needed, while for
WOTA there is a special training incorporated into the curricula of Aquatic Therapy courses in
Israel through a one-day workshop (Tirosh et al., 2008).
Based on these considerations, we decided to use SWIM in clinical practice. We could not find
any data on content validity or inter-rater reliability of SWIM, so we wanted to assess both. We
hypothesized that SWIM is a valid and highly reliable measure and could therefore be used in
different settings for evaluation of the relevant functional abilities in water according to the
Halliwick concept.
Methods
Participants
Fifty-five healthy children from a mainstream swimming program were invited for the content
validity study. All of them had been in the program for several months or longer because the
testing was performed in late spring, i.e., at the end of the school-year. Thirty-seven children
were invited to participate in the study on inter-rater reliability. They were all residents at one of
the two school-centres for children with special needs in Slovenia (CIRIUS Kamnik, Vipava)
and engaged in the Halliwick program for several years. A detailed description of this group is
provided in Table 1. The parents of all the children were informed about the protocol and signed an
informed consent form. Ethical approval for this study was obtained from the Research Ethics
Committee of the University Rehabilitation Institute of the Republic of Slovenia.
Study design
Content validity study: Children were tested by two Halliwick instructors. One read the
instructions from the SWIM manual and told each child which SWIM items he or
she should perform. When needed, the child was given a practical demonstration by the second
instructor. Physical support was offered when needed. Each child was scored based on the best
performance out of several trials. The instructors then discussed and decided on the assigned
score for each item. Afterwards, all children were tested using the National Evaluation System
of Swimming Abilities – NESSA (Kapus et al., 2002) and assigned to one of the first seven
categories (Table 2).
Inter-rater reliability study: The children were tested by two pairs of Halliwick instructors.
Since swimmers with learning and physical disabilities need to have great confidence in the person
who is working with them in water, we decided that only one instructor would perform the
practical part of testing. The other instructor told the child which SWIM items he or
she should perform. Physical support was offered when needed. As in the content validity
study, the child was given a practical demonstration of a particular item if needed. Each child was
scored based on the best performance out of several trials. The second pair of Halliwick instructors
sat at the opposite side of the pool, so they could clearly see and evaluate the
performance of the child but could not hear any discussion of SWIM items
or any scoring decision made by the first pair.
Swimming with Independent Measure
SWIM comprises 11 items (Table 3) that are evaluated on a 7-point scale (1 to 7). Detailed
information on items is available in the manual (Peacock, 1993). Score 1 means that the
swimmer is unable to perform the activity, it is not safe to test or the item is not measured.
Score 7 is assigned to a swimmer who is able to perform the activity without any support and in
an appropriate way. The maximum possible score is 77 points. No formal
training is required to use SWIM, but a tester clearly needs knowledge of the Halliwick
concept and some practical experience. SWIM was translated into Slovenian and has been used
in clinical practice for the past five years (Groleger Sršen et al., 2008).
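The scoring rule described above (11 items, each scored from 1 to 7, summed to a maximum of 77) can be sketched as follows. This is only an illustration: the function name `swim_total` and the dictionary layout are assumptions made here, not part of the SWIM manual.

```python
# Minimal sketch of SWIM total scoring, assuming item scores are stored
# in a dict keyed by the item letters A to K (a hypothetical layout).
SWIM_ITEMS = "ABCDEFGHIJK"      # the 11 items listed in Table 3
MIN_SCORE, MAX_SCORE = 1, 7     # 7-point scale; 1 also covers "not measured"


def swim_total(scores):
    """Check that all 11 items are scored on the 1-7 scale and return their sum."""
    missing = sorted(set(SWIM_ITEMS) - set(scores))
    if missing:
        raise ValueError(f"missing SWIM items: {missing}")
    for item in SWIM_ITEMS:
        value = scores[item]
        if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
            raise ValueError(f"item {item}: score {value!r} is outside 1-7")
    return sum(scores[item] for item in SWIM_ITEMS)


fully_independent = {item: 7 for item in SWIM_ITEMS}
unable_or_untested = {item: 1 for item in SWIM_ITEMS}
print(swim_total(fully_independent))   # 77
print(swim_total(unable_or_untested))  # 11
```

Because a score of 1 is also assigned when an item is unsafe to test or not measured, the lowest possible total is 11 rather than 0.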
Statistical analysis
To eschew the controversies related to measurement level outlined at the beginning of the
Discussion, we conducted all statistical analyses of individual SWIM items, sum of selected
SWIM items and total SWIM scores first using methods assuming interval-level properties and
then using methods assuming only ordinal measurement level. The difference in mean SWIM
score between NESSA categories was first tested using one-way analysis of variance (ANOVA,
including a trend analysis with contrasts), and then the ordered trend of SWIM scores with
respect to NESSA categories was tested using the exact Jonckheere-Terpstra test. Mean scores
on two subtotals of SWIM items were compared using paired-samples t-test and then using
exact Wilcoxon matched-pairs signed-rank test (EMP). Likewise, mean scores of all SWIM
items and total SWIM score were compared between the two pairs of raters using paired-
samples t-test and EMP (without adjustment for multiple testing). Agreement between the two
rater pairs regarding SWIM (each item and total score) was assessed with intraclass correlation
(two-way random model for a single measure – ICC (2,1)), and also with weighted Cohen's
kappa coefficient (using quadratic weights). Additionally, for purely exploratory and illustrative
purposes, agreement regarding total score was depicted using the Bland and Altman (1986)
limits-of-agreement approach. All statistical analyses were performed using SPSS 15.0 for
Windows (SPSS Inc., Chicago, IL, 2007), whereby a macro program adapted from published
and verified code was used for kappa calculations (Valiquette et al., 1994; García-Granero,
2007) and Bland-Altman plots (García-Granero, 2009).
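The two agreement coefficients named above can be sketched in a few lines of plain Python. This is not the SPSS macro code used in the study; it is an illustrative reimplementation that assumes the standard two-way random-effects ANOVA formula for ICC(2,1) and quadratic disagreement weights for Cohen's kappa. The function names and the toy scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per subject, each row containing k rater scores.
    Assumes the scores are not all identical (otherwise the ratio is undefined).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                                   # between subjects
    ms_cols = ss_cols / (k - 1)                                   # between raters
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def quadratic_weighted_kappa(first, second):
    """Cohen's kappa with quadratic weights for two raters' ordinal scores."""
    n = len(first)
    # Observed mean squared disagreement between paired scores
    observed = sum((a - b) ** 2 for a, b in zip(first, second)) / n
    # Expected mean squared disagreement if the two marginals were independent
    expected = sum((a - b) ** 2 for a in first for b in second) / (n * n)
    return 1 - observed / expected


# Hypothetical item scores (1-7) given by two rater pairs to six swimmers
pair_1 = [1, 3, 5, 7, 4, 6]
pair_2 = [1, 3, 5, 6, 5, 6]
icc = icc_2_1(list(zip(pair_1, pair_2)))
kappa = quadratic_weighted_kappa(pair_1, pair_2)
print(f"ICC(2,1) = {icc:.3f}, weighted kappa = {kappa:.3f}")
```

In this toy example the two rater means coincide, and the weighted kappa comes out slightly below the ICC. The small gap arises because kappa uses n where the ANOVA mean squares use n - 1.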
Results
Content validity study
Of the 55 healthy children invited to participate, 54 were evaluated because one boy was
reluctant to co-operate; thus, 28 boys and 26 girls were tested. Mean age of the group was 5.9
years (SD 1.9 years, range 3.5-11 years). Nineteen children were not able to swim, 4 were able
to glide through water; the rest were able to swim from eight meters up to 10 minutes without
touching the pool floor (Table 4). As expected, differences in mean SWIM scores among
different NESSA categories (Table 4) were significant (p<0.001). Because of the evident ceiling
effect, we also tested if the means rose with a quadratic trend rather than linearly, which also
proved to be significant (p<0.001). The data did not significantly deviate from the quadratic
trend (p=0.528). The Jonckheere-Terpstra test also showed a significant rise of SWIM scores
with NESSA level (exact p<0.001). Average scores for SWIM items of adjustment to water,
breathing and balance were statistically significantly higher than average scores for the items on
rotations (by about 1 point), whether estimated as means and tested parametrically or estimated
as medians and tested nonparametrically (Table 5).
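As a rough plausibility check of the ANOVA result, the F statistic can be approximated from the group sizes, means and standard deviations printed in Table 4. The sketch below is not the original SPSS analysis: it works from rounded summary statistics, so it only approximates what the raw data would give, and the function name `anova_from_summaries` is an assumption made here.

```python
# (n, mean, SD) of the total SWIM score per NESSA level, copied from Table 4
TABLE_4 = {
    0: (19, 34.5, 4.1),
    1: (4, 43.3, 7.4),
    2: (9, 59.8, 12.5),
    3: (8, 68.9, 4.2),
    4: (5, 72.8, 1.3),
    5: (7, 74.0, 3.1),
    6: (2, 76.0, 0.0),
}


def anova_from_summaries(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA from group summaries."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, recovered from each group's SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = anova_from_summaries(list(TABLE_4.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

The between-group mean square is far larger than the within-group mean square, which is consistent with the reported p<0.001. Computing an exact p-value would require the F distribution, which the Python standard library does not provide.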
Inter-rater reliability study
All 37 invited children and youth were willing to co-operate. Mean and median scores for all
SWIM items assessed by both pairs of raters are presented in Table 6. Paired-samples
comparisons for detecting possible bias showed no statistically significant difference between raters. The largest
difference was observed for item G (sagittal rotation development), but it was still not
statistically significant at the 5% alpha level even though no correction for multiple testing was
applied.
Intraclass correlation and weighted kappa estimates demonstrated that agreement between the
two (pairs of) raters was very high (Table 7). The lowest agreement was found for item G, but it
was still above 0.9. As expected, the agreement was the highest regarding total score (where it
was practically perfect). All the weighted kappa values were virtually identical to the ICC
values (they were identical to two decimals, and equal or lower by up to 0.002 to three
decimals). Given the negligible differences between rater means this should come as no surprise
because in the absence of rater mean differences, ICC(2,1) and kappa with quadratic weights are
identical (Schuster, 2004). On a related note that also concerns the content validity study results, it
was only natural to observe all the medians being very close to the means (because no
distribution was extremely skewed), and all the interquartile ranges being (roughly speaking)
larger by about a half than the standard deviations (because it is known from basic probability
that for a normal distribution the ratio of IQR to SD is about 1.35).
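The constant quoted for the normal distribution can be checked directly: the interquartile range spans the 25th to the 75th percentile, so for a standard normal variable (SD = 1) it equals the difference between the two quantiles. A minimal check using only the Python standard library:

```python
from statistics import NormalDist

# For a standard normal distribution the SD is 1, so the IQR itself
# is the IQR-to-SD ratio.
std_normal = NormalDist(mu=0.0, sigma=1.0)
iqr_over_sd = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
print(round(iqr_over_sd, 3))  # 1.349
```

The IQR is therefore about 1.35 times the SD under normality, and the SD is about 0.74 times the IQR.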
The agreement regarding total score is depicted in Figure 1, where the vast majority of points
(i.e., score pairs) lie very close to the main diagonal that represents perfect agreement. In
addition, we visualised agreement regarding total score using the limits-of-agreement approach.
The Bland-Altman plot (Figure 2) showed no systematic trend of differences between the two
(pairs of) raters, and the limits of agreement included zero. The case with the largest
disagreement was a 12-year-old girl with cerebral palsy (CP), GMFCS level IV, for whom the difference of five
points resulted from disagreement regarding items B (by 2 points), C (by 2 points) and J (by 1
point).
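The limits-of-agreement calculation behind such a plot can be sketched as follows, assuming the usual Bland and Altman (1986) definition (mean difference plus or minus 1.96 standard deviations of the differences). The total scores below are invented for illustration and are not the study data; the function name is likewise hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second):
    """Return (bias, lower limit, upper limit) for paired scores from two raters."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                    # systematic difference between raters
    half_width = 1.96 * stdev(diffs)      # uses the sample SD of the differences
    return bias, bias - half_width, bias + half_width


# Hypothetical total SWIM scores given by two rater pairs to five swimmers
totals_pair_1 = [60, 45, 77, 30, 52]
totals_pair_2 = [61, 45, 76, 32, 50]
bias, lower, upper = limits_of_agreement(totals_pair_1, totals_pair_2)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

When the interval between the lower and upper limit contains zero and the bias is close to zero, as in this toy example, there is no indication that one rater scores systematically higher than the other.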
Discussion
The aim of the present study was to explore the content validity and inter-rater reliability of
SWIM. The former was addressed through a study involving 54 healthy children, and the latter
through a study involving 37 children and youth with special needs assessed by two pairs of
raters.
We found a clear association of SWIM scores with the categories of the National Evaluation
System of Swimming Abilities. Based on that, we can conclude that SWIM has good content
validity. Since this part of the study was performed with healthy children, it would be
interesting to perform a similar study with a group of children with physical or learning
disabilities and test content validity against the Halliwick system of four badges.
As expected, we found a ceiling effect with healthy children, since SWIM items are meant to
evaluate pre-swimming and early swimming abilities. This means that SWIM is not useful for
advanced swimmers. We also expected that SWIM scores of children would follow the logical
order of development of pre-swimming and early swimming skill. Again, our expectation
proved to be justified, because the children gained higher scores on the items of adjustment to
water and breathing control, which are among the first skills to be mastered, than on the items of
rotations. Only when a child is able to control breathing while under water is he or she
prepared to learn and perform the more demanding items of full transversal, longitudinal and
combined rotation (holding the face immersed in water and being able to blow out in a controlled
manner). It can be added that the same was observed in the group of children and youth with
disabilities.
It was somewhat surprising to see that quite a few children were able to swim
(according to the national evaluation system for healthy children) but still did not attain the
maximum total SWIM score of 77 points. We found that some of those children were not fully
adapted to water and were not able to submerge to the pool floor and blow bubbles in a controlled
manner. This could lead us to the conclusion that the mainstream swimming program of the
school in question should spend more time on teaching water adjustment and
breathing control while children are trained in swimming skills. We cannot draw any
conclusions about other mainstream programs at the national level, but it would be very interesting
to explore this in more detail in the future, especially since the national guidelines for teaching
swimming include adjustment to water, breathing control and gliding through water as
early skills to be taught (Kapus et al., 2002).
The results of the second part of the study demonstrated very high inter-rater reliability of
SWIM. At the time of developing the study protocol, we had thought that we should test each
child twice in a row by two pairs of testers. However, we subsequently learned that a child is
able to perform skills at his/her best only when he/she feels comfortable and trusts the instructor
in water. Hence, we adapted the protocol and evaluated children while performing the test only
on one occasion. In this way, we were able to observe only the differences caused by different
decisions of testers, which were in fact just minor. Based on this experience, we recommend
that the person who performs the SWIM test should be one who knows the child well and
in whom the child has confidence (concerning water
activities). Following this recommendation should lead to a more reliable evaluation of the child's performance.
Based on high inter-rater reliability, we can conclude that no special training is needed for using
the SWIM. Nevertheless, we recommend the tester to be a Halliwick instructor at the level of
group leader or instructor with long-term experience.
The observed inter-rater reliability estimates were all very high. We cannot assign any special
meaning to the fact that the lowest agreement was found regarding item G on sagittal rotation. If
anywhere, we might have expected to find disagreement regarding item D on balance
development. A child can be scored with two points when able to balance in vertical position
with support from helper at trunk, three points when able to balance in vertical position without
support from helper, four points when able to balance in back float position with support from
helper at trunk, and five points when able to balance in back float position without support from
helper. In our experience, there are quite a few children (those with higher GMFCS levels,
myelomeningocele or other conditions with poor functional ability of the legs) who are not able to stand in
water but are very confident in a lying position without support. In those cases we agreed in
advance to assign the higher score of five points. This is not addressed in the manual
(Peacock, 1993), so we recommend that it be noted and applied in the future.
We also found no systematic trend of differences between raters. In the case with the largest
disagreement (a 12-year-old girl with CP, GMFCS level IV), the two pairs of raters disagreed in
scoring items B and C by two points. During the evaluation, she needed full support from a helper
facing her and was able to blow out with her lips at the level of the water. The pair of raters who
scored her performance higher (engagement of the helper from behind and being able to blow
bubbles in water) was the pair who work with her regularly and know her usual performance.
Hence, the disagreement was influenced by previous experience with the girl's performance and
was not a result of a fundamental disagreement on scoring rules.
Before concluding, some methodological issues regarding our statistical analyses must be
addressed. By performing "parametric" and "nonparametric" analyses (to put it in widely used,
albeit often misused and technically inappropriate terms) in parallel, we sought to sidestep the
controversies regarding the relation between measurement theory (measurement levels) and
statistical theory (statistical methods). Our aim was to avoid both extreme views, namely that of
"permissible" analyses in terms of Stevens' (1946, 1975) taxonomy and the opposing one
epitomised by the saying "the numbers do not know where they came from" (Lord, 1953; Gaito,
1980; Velleman and Wilkinson, 1993). We tried to follow the principle indirectly acknowledged
by both "opposing sides" – by the former primarily through the work in mathematical
psychology ranging from Luce et al. (1990) to Zand Scholten and Borsboom (2009), and by the
latter in Lord's own sequel (1954) – and championed universally and brilliantly by Tukey (1961),
namely that scientists have to apply mathematics to real-life data with care and understanding.
Furthermore, our results justify our approach and demonstrate that adhering to strictly "ordinal-
level analyses" by discarding the "parametric" part of each of our analyses would have
unnecessarily sacrificed not only familiarity, but also useful information. We can also
confidently speculate that approaching the data within the framework of "ordinal
psychometrics" (Cliff, 1989, 1996; Cliff and Keats, 2003) would have led to the same conclusions. This holds for the
individual SWIM items (which are unquestionably further from interval measurement level) as
well as for the total SWIM score (which might comfortably be assumed to have interval-level
properties in the tradition of classical test theory). It has also long been known and empirically
demonstrated that for the types of analyses we conducted, the decision between ordinal or
interval level of measurement is of no great importance (Baker et al., 1966). Nevertheless,
further research on metric characteristics of SWIM is needed, starting with internal validity
examination that would shed light on the measurement level issues. A much larger sample will
be needed for such analysis, which would entail item-response modelling (e.g., through the
graded response model, or – as widely preferred and advocated in the rehabilitation research
literature – the rating scale extension of the Rasch model).
On a final methodological note, we used the Bland-Altman plot even though the SWIM score is
inherently discrete rather than continuous as assumed by the method, because – as already
stressed – we applied it for purely exploratory and illustrative purposes. It served them well by
exposing no systematic trend of differences between the two raters and by clearly identifying
the case with the largest disagreement, which merited additional explanation. The limits of
agreement were therefore calculated and depicted for providing useful visual context rather than
for drawing conclusions. Like all the descriptive and inferential methods, data visualisation was
thus also used in the spirit that should, in our belief, pervade any scientific research and use of
any scientific instruments or methods, i.e., cum grano salis (Vidmar, 2010).
Conclusion
The results showed that the validity of SWIM compared to the National Evaluation System of
Swimming Abilities is high up to the point where a swimmer is well adapted to water and
already able to learn some swimming techniques. Inter-rater reliability of SWIM is very high, so
we believe that SWIM could be used reliably in different practical settings to follow the
progress of swimmers, as well as for research purposes. The findings are also valuable for
planning future studies on efficacy of different programs (impact of different functional abilities
within different pathologies, length of programs, and intensity of programs). However, before
application of SWIM for scientific research purposes, further studies on its sensitivity and
internal validity are recommended.
References
1. Baker BO, Hardyck C, Petrinovich LF (1966). Weak measurement vs. strong statistics: an
empirical critique of S.S. Stevens' proscriptions on statistics. Educ Psychol Meas 26, 291-
309.
2. Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1(8476), 307-310.
3. Chacham A, Hutzler Y (2001). Reliability and validity of the aquatic adjustment test for
children with disabilities. Movement 6, 160-89.
4. Cliff N (1989). Ordinal consistency and ordinal true scores. Psychometrika 54, 75-91.
5. Cliff N (1996). Ordinal methods for behavioral data analysis. Mahwah, NJ: Lawrence
Erlbaum.
6. Cliff N, Keats JA (2003). Ordinal measurement in the behavioral sciences. Mahwah, NJ:
Erlbaum.
7. Gaito J (1980). Measurement scales and statistics: resurgence of an old misconception.
Psychol Bull 87(3), 564-567.
8. García-Granero M (2007). KAPPAPLUS. http://www.listserv.uga.edu/cgi-
bin/wa?A2=ind0706&L=spssx-l&D=1&P=53665
9. García-Granero M (2009). Bland & Altman LOA (Limits Of Agreement) analysis (MACRO).
http://gjyp.nl/marta/BALOA.sps
10. Groleger Sršen K, Vrečar I, Korelc S (2008). Swimming program based on Halliwick
concept: evaluation of swimming skill progress in a group of children with motor
disabilities. Neurologia Croatica 57(S3), 4.
11. Groleger Sršen K, Vrečar I, Vidmar G (2010). The Halliwick concept of teaching
swimming and assessment of swimming skills. Rehabilitation (Ljubljana) 9(1), 32-39.
12. Humphries KM (2008). Humphries’ Assessment of Aquatic Readiness. Denton: Texas
Woman’s University, Department Of Kinesiology, Adapted Physical Education And
Activity.
13. Kapus V, Štrumbelj B, Kapus J, Jurak G, Šajber Pincolič D, Vute R, Bednarik J, Kapus M,
Čermak V (2002). Plavanje, učenje: slovenska šola plavanja za novo tisočletje. Ljubljana:
Faculty of Sport, Institute of Sport.
14. McMillan P (2002). The Halliwick Story. London: Halliwick Association of Swimming
Therapy. www.halliwick.org.uk/html/history.htm
15. Lord FM (1953). On the statistical treatment of football numbers. Am Psychol 8(12), 750-
751.
16. Lord FM (1954). Further comment on "Football Numbers". Am Psychol 9(6), 264-265.
17. Luce RD, Krantz DH, Suppes P, Tversky A (1990). Foundations of measurement. (Vol. III:
Representation, axiomatization, and invariance). New York: Academic Press.
18. McMillan J, McMillan P (2006). Halliwick Association of Swimming Therapy: Foundation
Course handbook (14th ed.). London: Halliwick Association of Swimming Therapy.
19. Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B (1997). Gross Motor
Function Classification System, Expanded and Revised. Dev Med Child Neurol 39, 214-
223.
20. Peacock K (1993). Swimming with independent measurement: manual for evaluation.
London: Halliwick Association of Swimming Therapy.
21. Schuster C (2004). A note on the interpretation of weighted kappa and its relations to other
rater agreement statistics for metric scales. Educ Psychol Meas 64, 243-253.
22. Stevens SS (1946). On the theory of scales of measurement. Science 103(2684), 677-680.
23. Stevens SS (1975). Psychophysics. New York: Wiley.
24. Tirosh R, Kats-Leurer M, Gettz M (2008). Halliwick-Based aquatic assessments: reliability
and validity. Int J Aquat Res Educ 2, 224-236.
25. Tukey JW (1961). Data Analysis and Behavioral Science or Learning to Bear the
Quantitative Man's Burden by Shunning Badmandments. In The collected works of John W.
Tukey vol. III. Belmont, CA: Wadsworth (1986); pp. 391-484.
26. Valiquette CAM, Lesage AD, Cyr M, Toupin J (1994). Computing Cohen's kappa
coefficients using SPSS MATRIX. Behav Res Methods Instrum Comput 26, 60-61.
27. Vidmar G (2010). Evidence in medicine. Rehabilitation (Ljubljana) 10(S1), 4-11.
28. Velleman P, Wilkinson L (1993). Nominal, ordinal, interval, and ratio typologies are
misleading. Am Stat 47, 65-72.
29. Zand Scholten A, Borsboom D (2009). A reanalysis of Lord's statistical treatment of
football numbers. J Math Psychol 53(2), 69-75.
Table 1: Characteristics of the children included in the inter-rater reliability study.
Characteristic   Category                     Value
Gender           Male                         15
                 Female                       22
Age              Mean                         14 years
                 Range                        7-22 years
Diagnosis        Autistic spectrum disorder   1
                 Cerebral vascular insult     1
                 Chromosomopathy              1
                 Down syndrome                4
                 Myelomeningocele             1
                 Mental retardation           1
                 Cerebral palsy               28
                   GMFCS level I              3
                   GMFCS level II             5
                   GMFCS level III            5
                   GMFCS level IV             7
                   GMFCS level V              8
GMFCS, Gross Motor Function Classification System level (Palisano et al., 1997)
Table 2: Slovenian National Evaluation System of Swimming Abilities (short version).
Level  Description of swimming ability
0      Not able to swim
1      Able to glide on the water, arms forward, face in water for 5 seconds
2      Swimming in free style for 8 m
3      Swimming in free style for 25 m, starting in water
4      Swimming in free style for 35 m, starting from the edge of pool
5      Swimming in free style for 50 m, starting from the edge of pool;
       Able to change body position from prone lying to horizontal position and back to supine position
6      Able to swim for 10 minutes;
       Norms for 50 or 100 m of freestyle by age and gender
7      Able to swim breast stroke, back stroke and freestyle, each for 50 m;
       Able to jump into water head forward;
       Norms by age and gender
8      Able to swim 200 m in 5 minutes or less;
       Able to swim 15 m under the water;
       Mastered rescue-from-water techniques
Table 3: SWIM items.
Pool-skill  Short description
A           Water entry development: the extent of support needed for a swimmer to enter the water at any pool setting
B           Water adjustment development: the extent of support needed for a swimmer to be in the water
C           Breath control development: from being able to blow above the water to being able to submerge and hum safely
D           Balance development: being able to control body position in vertical and back float position
E           Backwards transversal rotation development: being able to control movement from chair (or curled) position to back float position
F           Forwards transversal rotation development: being able to control movement from back float to chair or prone float position
G           Sagittal rotation development: being able to control body while moving sideways by changing position of head and reaching with arm
H           Longitudinal rotation development: being able to do longitudinal roll from back to back float position
I           Combined rotation development: being able to control combination of rotations
J           Water stroke development: support needed and distance
K           Exit development: the extent of support needed for a swimmer to exit the water
Table 4: Descriptive statistics of SWIM score for each level of the National Evaluation System
of Swimming Abilities (NESSA).
             SWIM score
NESSA   N    Mean   SD     Min   Max   Me   IQR
0       19   34.5   4.1    27    43    34   7
1       4    43.3   7.4    37    54    41   9
2       9    59.8   12.5   45    74    61   26
3       8    68.9   4.2    60    73    70   4
4       5    72.8   1.3    71    74    73   2
5       7    74.0   3.1    69    77    75   4
6       2    76.0   0.0    76    76    76   0
N, number of children; SD, standard deviation; Min, minimum; Max, maximum; Me, median;
IQR, interquartile range
Table 5: Difference in average score between the items on adjustment to water and breathing
control compared to the items on rotations (with and without the item on balance).
SWIM items    Mean   SD     p(t)     Median   IQR    p(EMP)
B + C         5.35   1.41            5.75     2.13
vs. D to I    4.50   1.94   <0.001   4.50     3.75   <0.001
vs. E to I    4.60   2.11   <0.001   4.60     4.40   <0.001
B, adjustment to water; C, breath control; D, balance; E, backwards transversal rotation; F,
forwards transversal rotation; G, sagittal rotation; H, longitudinal rotation; I, combined rotation;
SD, standard deviation; t, matched-pairs t-test; IQR, interquartile range; EMP, exact Wilcoxon
matched-pairs signed-rank
Table 6: Average scores of SWIM items and mean total score for both pairs of instructors.
SWIM item   Mean 1   Mean 2   p(t)    Median 1   Median 2   p(EMP)
A           6.00     6.05     0.324   7          7          0.625
B           5.89     5.84     0.600   7          7          0.813
C           5.49     5.57     0.324   6          6          0.531
D           5.19     5.27     0.262   5          5          0.500
E           5.05     5.16     0.103   5          5          0.219
F           5.22     5.19     0.768   6          6          1.000
G           5.00     5.22     0.058   6          6          0.094
H           4.41     4.51     0.160   6          6          0.289
I           4.57     4.54     0.768   5          5          1.000
J           5.00     4.97     0.744   7          7          1.000
K           5.27     5.22     0.571   7          7          1.000
Total       57.08    57.54    0.104   63         64         0.079
t, matched-pairs t-test; IQR, interquartile range; EMP, exact Wilcoxon matched-pairs signed-
rank test; p-values are not adjusted for multiple tests
Table 7: Inter-rater reliability estimates for each SWIM item and total SWIM score.
SWIM item ICC κWQ
A 0.975 0.974
B 0.922 0.920
C 0.954 0.953
D 0.953 0.952
E 0.975 0.974
F 0.958 0.957
G 0.905 0.902
H 0.983 0.982
I 0.973 0.972
J 0.979 0.978
K 0.968 0.967
Total 0.996 0.996
ICC, intraclass correlation; κWQ, weighted Cohen's Kappa with quadratic weights