Establishing a threshold for the number of missing days using 7 d pedometer data

IOP PUBLISHING PHYSIOLOGICAL MEASUREMENT

Physiol. Meas. 33 (2012) 1877–1885 doi:10.1088/0967-3334/33/11/1877

Minsoo Kang, Peter D Hart and Youngdeok Kim

Department of Health and Human Performance, Middle Tennessee State University, Murfreesboro, TN, USA

E-mail: [email protected]

Received 23 March 2012, accepted for publication 15 May 2012
Published 31 October 2012
Online at stacks.iop.org/PM/33/1877

Abstract

The purpose of this study was to examine the threshold of the number of missing days of recovery using the individual information (II)-centered approach. Data for this study came from 86 participants, aged from 17 to 79 years old, who had 7 consecutive days of complete pedometer (Yamax SW 200) wear. Missing datasets (1 d through 5 d missing) were created by a SAS random process 10 000 times each. All missing values were replaced using the II-centered approach. A 7 d average was calculated for each dataset, including the complete dataset. Repeated measure ANOVA was used to determine the differences between 1 d through 5 d missing datasets and the complete dataset. Mean absolute percentage error (MAPE) was also computed. Mean (SD) daily step count for the complete 7 d dataset was 7979 (3084). Mean (SD) values for the 1 d through 5 d missing datasets were 8072 (3218), 8066 (3109), 7968 (3273), 7741 (3050) and 8314 (3529), respectively (p > 0.05). The lower MAPEs were estimated for 1 d missing (5.2%, 95% confidence interval (CI) 4.4–6.0) and 2 d missing (8.4%, 95% CI 7.0–9.8), while all others were greater than 10%. The results of this study show that the 1 d through 5 d missing datasets, with replaced values, were not significantly different from the complete dataset. Based on the MAPE results, it is not recommended to replace more than two days of missing step counts.

Keywords: pedometer, step counts, missing, individual information-centered approach, simulation

1. Introduction

Increasingly, objective physical activity monitoring devices (e.g., pedometers and accelerometers) are used to measure individuals’ habitual physical activity for surveillance, screening, research and evaluation purposes (Troiano et al 2008). Pedometers are a practical choice because they are inexpensive for researchers and less troublesome for participants (Schneider et al 2004). Pedometers provide simple data in the form of step counts, which are usually recorded in units of steps/day. This form of objectively measured physical activity can be used in research to determine steps/day gains made due to an intervention (Chan and Tudor-Locke 2008, Kang et al 2009b). Steps/day can be used to evaluate physical activity patterns in specific populations or settings, such as school-based physical education (Tudor-Locke et al 2006). Also, steps/day can be used as a motivational tool to achieve a certain level of activity such as 10 000 steps/day (De Cocker et al 2009).

Although the use of objectively measured steps/day via a pedometer can avoid the bias seen with subjective instruments, there is still a need to examine validity and reliability. Pedometer validity, the ability of the tool to provide an accurate assessment of step counts, pertains to the accuracy of the measurements (step counts) themselves. Many researchers have established this validity through criterion-related procedures, such as comparing actual steps taken to pedometer-determined steps (Holbrook et al 2009, Hasson et al 2009). Pedometer reliability can take on two separate forms. The first is the reliability of the pedometer as an instrument. This form of reliability, test reliability, can be established by either using repeated shake tests or walking on a flat surface (e.g., 100 m × 2 or more times) while controlling other factors. Holbrook et al (2009) evaluated inter-device reliability by comparing step counts across multiple devices. The second form is step-count reliability, which refers to the reliability of the scores themselves (behavior stability) and considers characteristics (number of days, age of participant, etc) that may affect the consistency of the measurements (Ragan and Kang 2005).

The number of pedometer wear days required to reach acceptable reliability is a measurement issue that has been examined. Kang et al (2009a) found, by examining longitudinal data on 365 days of participant pedometer wear, that at least 5 days of consecutive pedometer wear were necessary to achieve acceptable measurement reliability. Gretebeck and Montoye (1992) suggested that at least five or six consecutive days of pedometer monitoring are needed for a stable estimation of daily step counts. They also recommended that both weekdays and weekend days be included. Clemes and Griffiths (2008) recommended that data collection over a 7 d period is necessary for a reliable estimate of ambulatory activity in adults. A 7 d wear period is also of interest in this study because of its current use in large surveillance studies. A 7 d period has been instituted in the US for objectively measured physical activity with participants wearing accelerometers (Tudor-Locke et al 2011). The 7 d wear period has also been utilized with pedometers in a large national study in Canada (Craig et al 2010).

Due to the acceptable measurement properties of pedometer-determined step counts, the use of pedometers is rapidly increasing in physical activity research (Schneider et al 2004). However, with the nature of this type of data collection, researchers are often left with a large amount of missing data. The individual information (II)-centered approach (Kang et al 2005, 2009c) to missing data recovery is an alternative and superior method which relies on an individual’s own pattern of physical activity. Specifically, the II-centered approach replaces missing values with the mean of the remaining non-missing days of the same participant. Although the II-centered approach has been shown to be an effective missing data replacement method, a question still remains of how many days can be replaced while maintaining valid estimates of step-count data. Few studies have examined the impact of the number of missing days replaced on overall step counts. Therefore, the purpose of this study was (a) to examine the threshold of the number of missing days replaced, using the II-centered approach, for valid estimates of step-count data and (b) to examine the effectiveness of II-centered approaches which consider the types of missing days (i.e. weekdays or weekend).

2. Methods

2.1. Participants and data

A total of 117 participants, aged from 17 to 79 years old, had 21 consecutive days of pedometer (Yamax SW 200) wear during waking hours. Participants were recruited from a large Southeastern University and surrounding community and written consent forms were obtained from participants under the approval of the University’s Institutional Review Board. The dataset used in this study has been previously described by Tudor-Locke et al (2003).

2.2. Missing dataset generation

Of the 117 participants, 86 participants who had 7 consecutive days of pedometer step counts were selected to examine the threshold of the number of missing days that could effectively be recovered using the II-centered approach. Missing data were created using three different missing conditions (i.e. total missing, weekdays missing and combination missing). Total missing data include five missing datasets with a varying number of randomly selected missing days (i.e. 1 d to 5 d missing) without the distinction of weekdays and weekend. Weekdays missing data were created by selecting the random missing days only from the weekdays and include 1 d through 3 d missing datasets. Combination missing data include 1 d through 3 d missing datasets in which one random weekend day was selected for a 1 d missing dataset, one random weekday and one random weekend day for a 2 d missing dataset, and two random weekdays and one random weekend day for a 3 d missing dataset (see figure 1).

In addition, each missing dataset within the three different missing conditions was generated 10 000 times in which the missing days were randomly selected according to the conditions for the given missing dataset, with equal probability of being selected as missing using the RANUNI function in SAS v 9.2.
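The selection rules for the three missing conditions can be illustrated with a short sketch. This is not the SAS code used in the study: Python's standard `random` module stands in for RANUNI, and the function names, the condition labels and the 0 = Monday index convention are assumptions made only for this illustration.

```python
import random

# Assumed index convention for one 7 d record: 0 = Monday ... 6 = Sunday.
WEEKDAYS = [0, 1, 2, 3, 4]
WEEKEND = [5, 6]


def pick_missing_days(condition, n_missing, rng):
    """Return sorted day indices to treat as missing for one participant."""
    if condition == "total":
        # 1 d to 5 d missing, drawn from all seven days with equal probability.
        if not 1 <= n_missing <= 5:
            raise ValueError("total missing supports 1 to 5 days")
        return sorted(rng.sample(WEEKDAYS + WEEKEND, n_missing))
    if condition == "weekday":
        # 1 d to 3 d missing, drawn from weekdays only.
        if not 1 <= n_missing <= 3:
            raise ValueError("weekday missing supports 1 to 3 days")
        return sorted(rng.sample(WEEKDAYS, n_missing))
    if condition == "combination":
        # Always one weekend day, plus 0, 1 or 2 weekdays.
        if not 1 <= n_missing <= 3:
            raise ValueError("combination missing supports 1 to 3 days")
        days = rng.sample(WEEKDAYS, n_missing - 1) + rng.sample(WEEKEND, 1)
        return sorted(days)
    raise ValueError("unknown condition: " + condition)


def make_missing(steps, missing_days):
    """Copy a 7-value step record, replacing the chosen days with None."""
    return [None if day in missing_days else value
            for day, value in enumerate(steps)]


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so a run can be repeated
    record = [8246, 8081, 8732, 7886, 8390, 8036, 6480]  # illustrative values
    days = pick_missing_days("combination", 2, rng)
    print(days, make_missing(record, days))
```

Repeating the draw 10 000 times per dataset, as described above, would amount to calling `pick_missing_days` in a loop for every participant.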

2.3. Missing data recovery

Three types of II-centered conditions were applied for the recovery of missing values depending on the missing conditions. Missing values from each participant in total missing data were replaced by the average of the remaining values of the same participant. For weekdays missing data, the average of the remaining weekday values was substituted for missing values. Lastly, missing values in combination missing data were replaced by the average of remaining values of weekdays or weekend depending on the type of missing values.
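A minimal sketch of these replacement rules follows, assuming each participant's record is a list of seven values ordered Monday to Sunday with `None` marking a missing day. The function name and the `by_day_type` switch are hypothetical; the switch covers both the weekdays missing and combination missing conditions, since each replaces a missing day with the mean of the remaining days of the same type.

```python
from statistics import mean

WEEKEND = {5, 6}  # assumed convention: 0 = Monday ... 6 = Sunday


def ii_centered_fill(steps, by_day_type=False):
    """Replace None entries with the participant's own mean.

    by_day_type=False: mean of all remaining days (total missing).
    by_day_type=True:  mean of the remaining days of the same type,
                       i.e. weekdays for a weekday, weekend for a weekend day.
    """
    filled = list(steps)
    for day, value in enumerate(steps):
        if value is not None:
            continue
        if by_day_type:
            is_weekend = day in WEEKEND
            pool = [v for d, v in enumerate(steps)
                    if v is not None and (d in WEEKEND) == is_weekend]
        else:
            pool = [v for v in steps if v is not None]
        if not pool:
            raise ValueError("no observed days left to average")
        filled[day] = mean(pool)
    return filled


if __name__ == "__main__":
    record = [8000, None, 9000, 7000, 8000, 6000, None]  # illustrative values
    print(ii_centered_fill(record))                    # total missing rule
    print(ii_centered_fill(record, by_day_type=True))  # day-type rule
```

The replacement mean is always computed from the originally observed values, so one replaced day never feeds into another.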

2.4. Data analyses

To determine the differences between mean step counts of complete data and missing data where missing values were replaced using the II-centered approach, pre-planned contrasting methods in one-way repeated measure ANOVA were used with Bonferroni adjusted alpha levels of 0.008 (i.e. 0.05/6 comparisons) for total missing data and 0.017 (i.e. 0.05/3 comparisons) for weekday and combination missing data, respectively. In addition, absolute percentage error (APE) between complete data and each missing dataset was obtained and mean APE (MAPE) and 95% confidence interval (95% CI) were calculated throughout 10 000 iterations. MAPE less than 10% (Basiotis et al 1987, Kang et al 2009a) was used as a cutoff criterion for an acceptable estimate of the original mean step counts of the complete dataset. All statistical analyses were performed using SAS v 9.2.
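The error measures can be sketched as follows. The text does not spell out the APE formula or how the 95% CI was obtained, so this sketch assumes the usual definition, APE = |estimate − complete| / complete × 100, and a normal-approximation interval (z = 1.96) around the mean of the per-iteration APEs. The function names are hypothetical, and the study itself used SAS.

```python
from math import sqrt
from statistics import mean, stdev


def ape(complete_mean, estimated_mean):
    """Absolute percentage error of an estimated mean against the complete-data mean."""
    return abs(estimated_mean - complete_mean) / complete_mean * 100.0


def mape_with_ci(apes, z=1.96):
    """Mean APE and a normal-approximation 95% CI (an assumed CI method)."""
    m = mean(apes)
    half_width = z * stdev(apes) / sqrt(len(apes))
    return m, (m - half_width, m + half_width)


def acceptable(mape, cutoff=10.0):
    """Apply the MAPE < 10% criterion used as the cutoff in this study."""
    return mape < cutoff


# Bonferroni-adjusted alpha levels quoted in the text.
ALPHA_TOTAL = round(0.05 / 6, 3)     # 0.008
ALPHA_BY_TYPE = round(0.05 / 3, 3)   # 0.017

if __name__ == "__main__":
    sample_apes = [4.0, 6.0, 5.0]  # illustrative values, not study data
    m, ci = mape_with_ci(sample_apes)
    print(m, ci, acceptable(m))
```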

Figure 1. Missing dataset generation procedures (e.g., 2 d missing dataset for total missing data, 1, for weekdays missing data, 2, and for combination missing data, 3).

3. Results

Descriptive statistics for step counts by day and across complete and total missing data are presented in table 1. Mean (SD) daily step count for the complete 7 d dataset was 7978 (3083). Mean (SD) values for 1 d through 5 d total missing datasets were 8012 (3142), 7913 (3148), 7987 (3279), 7742 (3099) and 8115 (3874), respectively. There were no significant differences in average step counts between the total missing data and the complete data (p > 0.008). Tables 2 and 3 contain descriptive statistics of step counts across weekdays missing and combination missing data, respectively. In weekdays missing data, where the missing values were randomly selected among weekdays, there were no significant differences in average step counts between the complete data and the 1 d through 3 d missing datasets (p > 0.017). Likewise, the average step counts of combination missing data did not significantly differ from the complete data (p > 0.017).

Table 1. Descriptive statistics of daily step-count information in total missing data.

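Complete data (N = 86) and total missing data (1 d through 5 d missing); each cell is the daily step count.

| Days | Complete Mean | Complete SD | 1 d Mean | 1 d SD | 2 d Mean | 2 d SD | 3 d Mean | 3 d SD | 4 d Mean | 4 d SD | 5 d Mean | 5 d SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Monday | 8246 | 4091 | 8121 | 4051 | 7904 | 4023 | 8212 | 3789 | 7666 | 3243 | 8189 | 4123 |
| Tuesday | 8081 | 4865 | 8078 | 4658 | 7906 | 4404 | 8114 | 3912 | 7716 | 3458 | 8117 | 3980 |
| Wednesday | 8732 | 4583 | 8680 | 4432 | 8393 | 3938 | 8284 | 3947 | 8014 | 3544 | 8032 | 4071 |
| Thursday | 7886 | 3821 | 7903 | 3808 | 7975 | 3788 | 8105 | 3855 | 7860 | 3390 | 8406 | 3974 |
| Friday | 8390 | 4451 | 8338 | 4427 | 8196 | 4025 | 8371 | 4219 | 7913 | 3606 | 8400 | 4220 |
| Saturday | 8036 | 4577 | 8178 | 4486 | 8274 | 4079 | 7786 | 3909 | 7760 | 3910 | 8213 | 4193 |
| Sunday | 6480 | 3704 | 6790 | 3731 | 6747 | 3564 | 7094 | 3641 | 7269 | 3521 | 7453 | 3944 |
| Total | 7979 | 3084 | 8013 | 3143 | 7914 | 3148 | 7987 | 3280 | 7743 | 3100 | 8116 | 3875 |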

Table 2. Descriptive statistics of daily step-count information in weekday missing data.

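Complete data (N = 86) and weekday missing data (1 d through 3 d missing); each cell is the daily step count.

| Days | Complete Mean | Complete SD | 1 d Mean | 1 d SD | 2 d Mean | 2 d SD | 3 d Mean | 3 d SD |
|---|---|---|---|---|---|---|---|---|
| Monday | 8246 | 4091 | 8342 | 4186 | 8546 | 4114 | 8445 | 3900 |
| Tuesday | 8081 | 4865 | 8019 | 4577 | 8332 | 4672 | 8417 | 4099 |
| Wednesday | 8732 | 4583 | 8500 | 4241 | 8514 | 4225 | 8579 | 3795 |
| Thursday | 7886 | 3821 | 8056 | 3890 | 8128 | 3820 | 8384 | 3684 |
| Friday | 8390 | 4451 | 8363 | 4269 | 8075 | 4074 | 8471 | 4114 |
| Saturday | 8036 | 4577 | 8036 | 4577 | 8036 | 4577 | 8036 | 4577 |
| Sunday | 6480 | 3704 | 6480 | 3704 | 6480 | 3704 | 6480 | 3704 |
| Total | 7979 | 3084 | 7971 | 3113 | 8016 | 3267 | 8118 | 3202 |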

Table 3. Descriptive statistics of daily step-count information in combination missing data.

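Complete data (N = 86) and combination missing data (1 d through 3 d missing); each cell is the daily step count.

| Days | Complete Mean | Complete SD | 1 d Mean | 1 d SD | 2 d Mean | 2 d SD | 3 d Mean | 3 d SD |
|---|---|---|---|---|---|---|---|---|
| Monday | 8246 | 4091 | 8246 | 4091 | 8313 | 3652 | 8131 | 3190 |
| Tuesday | 8081 | 4865 | 8081 | 4865 | 8167 | 4747 | 7914 | 4076 |
| Wednesday | 8732 | 4583 | 8732 | 4583 | 8553 | 4433 | 8598 | 4127 |
| Thursday | 7886 | 3821 | 7886 | 3821 | 7838 | 3867 | 8256 | 3981 |
| Friday | 8390 | 4451 | 8390 | 4451 | 8476 | 4160 | 8194 | 4077 |
| Saturday | 8036 | 4577 | 7080 | 4238 | 7282 | 4411 | 7308 | 4472 |
| Sunday | 6480 | 3704 | 7080 | 4238 | 7282 | 4411 | 7308 | 4472 |
| Total | 7979 | 3084 | 7833 | 3091 | 7987 | 3243 | 7958 | 3322 |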

Figure 2 depicts the MAPE of average step counts between complete data and each dataset in total missing data. The lower MAPEs were estimated for 1 d missing (5.2%, 95% CI 4.4–6.0) and 2 d missing (8.4%, 95% CI 7.0–9.8). Although 3 d missing had a low MAPE, its CI overlapped the 10% cutoff (9.8%, 95% CI 7.9–11.8). All other datasets were greater than 10%. Similar results were found for both weekdays missing and combination missing data, where replacing a maximum of 2 d missing values with the average of remaining weekdays or weekend, depending on the type of missing value, yielded MAPEs of less than 10% (see figure 3).

Figure 2. Mean absolute percentage error (MAPE) of average step counts by different number of missing values in total missing data.

Figure 3. MAPE of average step counts by different number of missing values in weekdays missing and combination missing data.

4. Discussion

The primary purpose of this study was to examine the threshold of the number of missing days replaced, using the II-centered approach, for valid estimates of step-count data. This evaluation is different from the examination of ‘How many days of pedometer wear are enough to reliably estimate step-count data?’ This study made the assumption that seven days of pedometer wear was a sufficient time block to reliably estimate the ambulatory activity (Clemes and Griffiths 2008, Gretebeck and Montoye 1992, Kang et al 2009a). The current focus was, however, to determine how many missing days of pedometer wear could be replaced while maintaining valid estimates of step-count data. It was determined, using a MAPE of 10% as the criterion cutoff, that a maximum of two days out of seven can be effectively replaced.

The replacement of missing step-count days, using the II-centered approach, is an effective and superior method in comparison to the more common method of using group information (GI). Kang et al (2005) showed, by using the root-mean-square difference (RMSD), a measure of accuracy where smaller values indicate greater accuracy, that the II-centered approach produced less error than the GI-centered approach. The same study provided further evidence for the use of II-centered replacement by showing smaller mean signed difference (MSD) values compared to the same GI-centered approach. Kang et al (2009c) confirmed the advantage of the II-centered approach, showing both smaller RMSD and MSD values compared to the GI-centered approach.

The secondary purpose of this study was to examine the effectiveness of the II-centered approach considering the types of missing days (i.e. weekdays or weekend). The examination of the types of missing days originated from the idea that individuals participate in different amounts of physical activity during weekdays and the weekend. Kang et al (2005) found that replacing missing days by the mean of the remaining weekdays or weekend, depending on when the missing days occurred, was most effective in recovering the missing values. A different finding was noted, however, by Kang et al (2009c), who observed no differences between the means from a complete dataset and the means from a dataset in which missing days were replaced using the remaining weekdays or weekend, depending on when the missing days occurred. The possible cause for such inconsistent findings may lie in the inherent differences between the samples, that is, if one study sample has a significant difference in mean step counts between its weekdays and weekend, then that sample is likely to require a combination (i.e. weekdays or weekend) II-centered approach. For the current study, the step counts were significantly (t(85) = 2.79, p = 0.006) higher for weekdays (M: 8267, SD: 3449) than for weekend days (M: 7258, SD: 3401). Therefore, it is suggested that a simple paired t-test can be used before selecting either a total or combination II-centered approach to missing step counts. In some cases, a t-test may be needed to determine the extent to which Saturday and Sunday step counts differ. In cases where Saturday step counts are more similar to weekday step counts than to Sunday step counts, the weekday mean may be used to replace Saturday data.
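The suggested pre-check can be sketched with the standard paired t statistic, t = mean(d) / (SD(d) / sqrt(n)) with n − 1 degrees of freedom, where d is each participant's weekday mean minus weekend mean. The function name and input layout are assumptions for illustration. The sketch stops at the t statistic, and the p-value would come from a t distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(weekday_means, weekend_means):
    """Paired t statistic and degrees of freedom for weekday vs weekend steps.

    Each list holds one value per participant, in the same participant order.
    """
    if len(weekday_means) != len(weekend_means):
        raise ValueError("both lists need one value per participant")
    diffs = [wd - we for wd, we in zip(weekday_means, weekend_means)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two participants")
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Illustrative per-participant means, not study data.
    weekday = [8000, 9000, 7000, 10000]
    weekend = [7000, 8500, 7200, 9000]
    t_stat, df = paired_t(weekday, weekend)
    print(round(t_stat, 3), df)
```

A significant result would point toward the combination (day-type) rule, and a non-significant one toward the simpler total rule, in line with the suggestion above.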

This study has many strengths worth mentioning. First, the study’s design was built using a semi-simulation to create missing days. Since data for this study came from participants wearing actual pedometers, the inferences are strengthened compared to other missing data studies that use full simulation methods. Second, the threshold analysis across the three different missing conditions (total missing, weekday missing and combination missing) allowed for an examination of the potential bias introduced when replacing missing step counts by certain days of the week. Finally, the missing day simulation across each condition involved creating an extreme situation of missing data where each participant in the sample had missing days. Therefore, the results of this study should be considered conservative and the MAPEs should be considered as upper-bound estimates.

This study is not without limitations. Although the missing days of step counts were created via a computerized random generator, the process itself did not account for a specific underlying missing mechanism belonging to individuals who have missing data. A future study investigating the II-centered approach to missing data replacement should consider mechanisms such as physical activity levels, sedentary behaviors, or overweightness that can give rise to the missingness of pedometer step counts. This study is also limited to the analysis of missing data across a 7 d period. Variations in the physical activity level may exist over longer time periods, as monthly trends in step activity have been noted in healthy middle-aged adults (Kang et al 2012). In addition, variations in step counts have been observed across seasons and between genders, potentially raising the number of days required for reliable estimates of physical activity (Togo et al 2008). Future studies should focus on more diverse ranges of pedometer wear such as 3 d, 30 d, etc as well as wear across seasons and genders. Finally, this study is limited to the analysis of missing data replacement of step counts from a single pedometer model (i.e. Yamax SW 200). Future studies should include other models so as to strengthen the generalization of missing data replacement recommendations using the II-centered approach to various situations.

5. Conclusions

The results of this study show that the 1 d through 5 d missing datasets, with replaced values, were not significantly different from the complete dataset. Based on the mean absolute percentage error (MAPE) results, it is not recommended to replace missing step counts for 3 or more days of missing data when 7 d pedometer data are used (MAPE > 10%). In addition, more accurate estimates would result when no more than 2 d missing values in weekdays were replaced by the average of remaining weekdays. Although the II-centered approach of differentiating missing weekdays and weekend (combination missing) may be less advantageous than the traditional II-centered approach (total missing), it may provide more unbiased estimates when missing data occur on both weekdays and weekend. Further research is suggested to analyze missing step-count data for study periods other than 7 d.

References

Basiotis P P, Welsh S O, Cronin F J, Kelsay J L and Mertz W 1987 Number of days of food intake records required to estimate individual and group nutrient intakes with defined confidence J. Nutr. 117 1638–41

Chan C B and Tudor-Locke C 2008 Real-world evaluation of a community-based pedometer intervention J. Phys. Act. Health 5 648–64

Clemes S A and Griffiths P L 2008 How many days of pedometer monitoring predict monthly ambulatory activity in adults? Med. Sci. Sports Exerc. 40 1589–95

Craig C L, Tudor-Locke C, Cragg S and Cameron C 2010 Process and treatment of pedometer data collection for youth: the Canadian physical activity levels among youth study Med. Sci. Sports Exerc. 42 430–5

De Cocker K, De Bourdeaudhuij I, Brown W and Cardon G 2009 Moderators and mediators of pedometer use and step count increase in the ‘10 000 Steps Ghent’ intervention Int. J. Behav. Nutr. Phys. Act. 12 3–9

Gretebeck R J and Montoye H J 1992 Variability of some objective measures of physical activity Med. Sci. Sports Exerc. 24 1167–72

Hasson R E, Haller J, Poder D M, Staudenmayer J and Freedson P S 2009 Validity of the Omron HJ-112 pedometer during treadmill walking Med. Sci. Sports Exerc. 41 805–9

Holbrook E A, Barreira T V and Kang M 2009 Validity and reliability of Omron pedometers for prescribed and self-paced walking Med. Sci. Sports Exerc. 41 669–73

Kang M, Bassett D R, Barreira T V, Tudor-Locke C, Ainsworth B, Reis J P, Strath S and Swartz A 2009a How many days are enough? A study of 365 days of pedometer monitoring Res. Q. Exerc. Sport 80 445–53

Kang M, Bassett D R, Tudor-Locke C, Barreira T V and Ainsworth B 2012 Measurement effects of seasonal and monthly variability on pedometer-determined data J. Phys. Act. Health 9 336–43

Kang M, Marshall S J, Barreira T V and Lee J O 2009b Effect of pedometer-based physical activity interventions: a meta-analysis Res. Q. Exerc. Sport 80 648–55

Kang M, Rowe D A, Barreira T V, Robinson T S and Mahar M T 2009c Individual information-centered approach for handling physical activity missing data Res. Q. Exerc. Sport 80 131–7

Kang M, Zhu W, Tudor-Locke C and Ainsworth B 2005 Experimental determination of effectiveness of an individual information-centered approach in recovering step-count missing data Meas. Phys. Educ. Exerc. Sci. 9 233–50

Ragan B G and Kang M 2005 Reliability: current issues and concerns Athletic Ther. Today 10 30–33

Schneider P L, Crouter S E and Bassett D R 2004 Pedometer measures of free-living physical activity: comparison of 13 models Med. Sci. Sports Exerc. 36 331–5

Togo F, Watanabe E, Park H, Yasunaga A, Park S, Shephard R J and Aoyagi Y 2008 How many days of pedometer use predict the annual activity of the elderly reliably? Med. Sci. Sports Exerc. 40 1058–64

Troiano R P et al 2008 Physical activity in the United States measured by accelerometer Med. Sci. Sports Exerc. 40 181–8

Tudor-Locke C, Ainsworth B E, Whitt M C, Thompson R W, Addy C L and Jones D A 2003 Ambulatory activity and simple cardiorespiratory parameters at rest and submaximal exercise Can. J. Appl. Physiol. 28 699–709

Tudor-Locke C, Lee S M, Morgan C F, Beighle A and Pangrazi R P 2006 Children’s pedometer-determined physical activity during the segmented school day Med. Sci. Sports Exerc. 38 1732–8

Tudor-Locke C, Leonardi C, Johnson W D, Katzmarzyk P T and Church T S 2011 Accelerometer steps/day translation of moderate-to-vigorous activity Prev. Med. 53 31–33