A Meta-Analysis of the Effects of Haptic Interfaces on Task Performance with Teleoperation Systems



NITSCH ET AL.: A META-ANALYSIS OF THE EFFECTS OF HAPTIC INTERFACES ON TASK PERFORMANCE WITH TELEOPERATION SYSTEMS 1

V. Nitsch and B. Färber

Abstract—Human task performance with teleoperation systems is characterized by long task completion times, handling errors and excessive force application to objects in the remote environment. Haptic interfaces promise to address these challenges by providing the human user with sensory feedback from the remote environment that would otherwise be lacking. Up to now, only few attempts have been made to present current research efforts from a broader, more integrative perspective. To address this need, several meta-analyses were conducted which aimed at establishing the overall effectiveness of haptic interfaces in improving the critical performance aspects in teleoperation systems. In this context, the influence of potential moderator variables (i.e. virtual vs. real teleoperation setup; vibrotactile vs. kinaesthetic force feedback) as well as outcome-specific effects (i.e. force regulation ability; task completion time; performance errors) were investigated.

Index Terms—Evaluation/Methodology, Haptic I/O, Human Factors, Operator Interfaces.


1 INTRODUCTION

Over the past decades, the employment of teleoperation systems has spread to numerous domains. Teleoperated vehicles have been employed in warfare [1] and underwater exploration [2]. Robot-assisted teleoperation systems are frequently applied to surgery [3], and attempts have been made to use teleoperated systems for live-line maintenance [4], search & rescue operations [5], education [6], and care of the disabled and the elderly [7]. Recent years have witnessed a surge in the development of multi-modal human-machine interfaces in an effort to improve the operability of these systems. The majority of modern multi-modal interfaces speak predominantly to the visual and auditory senses. Increasingly, solutions to currently observed performance problems with teleoperation systems are advanced that focus on the incorporation of the haptic sensory modality into the human-machine interface.

An in-depth review of presently available literature suggests that haptic interface research is a rather fragmented field. A plethora of studies has been conducted featuring a multitude of haptic devices, investigated tasks and experimental methodologies. Typically, each study focuses on one particular system only, while studies which compare human work performance with several different devices are virtually non-existent. Furthermore, since the work domain (e.g. surgery, micro- or macro-assembly), the task domain (e.g. pick-and-place, tracking, suturing), the experimental methodology and the experimental apparatus vary for virtually every study that has been conducted in the field of haptic teleoperation applications, it is difficult to ascertain the overall effectiveness of a particular haptic application. In the absence of an overarching theoretical framework that might support empirical research and make predictions regarding the potential effectiveness of haptic interfaces in teleoperation systems, a synthesis of available research evidence is required, so that it may be determined whether a particular application of haptic signals is found to be effective (or ineffective) in a variety of settings, rather than in one particular experimental setup.

Aiming to address this need to synthesise current research findings, the present study features a series of meta-analyses, which assess the magnitude and the extent of effects of haptic interfaces on human work performance with teleoperation systems. In addition to ascertaining the overall effect strength of haptic interfaces in improving work performance, possible dependencies on the experimental setup (real teleoperation vs. VR) and haptic feedback type (kinaesthetic force feedback vs. vibrotactile feedback) are investigated. Also of interest are the effects of haptic feedback on specific performance aspects (i.e. human force regulation, performance speed, handling errors). The present work also aims at offering a general introduction to meta-analysis. In so doing, its merits and challenges are discussed. It is hoped that the present work will raise awareness of the need to publish material and to report effects that are non-significant or that are contrary to the stipulated hypotheses.


• V. Nitsch and B. Färber are with the Human Factors Institute, Universität der Bundeswehr München, 85577 Neubiberg, Germany. E-mail: [email protected], [email protected]

Manuscript received (October 1st, 2012).

2013 IEEE Transactions On Haptics. 6(4), pp. 387-398. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.


2 IEEE TRANSACTIONS ON HAPTICS, TH-2011-12-0107.R2

1.1 The theoretical potential of haptic interfaces to improve work performance with teleoperation systems

Despite the many advantages of teleoperation systems, existing systems pose a number of challenges for the human operator, in turn limiting their large-scale industrial deployment. For one, it is much more difficult to coordinate the movements of the teleoperator unit in the remote environment than it is to coordinate one’s limb movements. In fact, Zhai & Senders [8] observed that around one quarter of all movements with a multiple-degree-of-freedom (DOF) system is uncoordinated. Deml [9] found in a more challenging task that only two-thirds of movements can be considered coordinated. As a result, it generally takes much more time to perform a task with a teleoperation system than it would if the task was performed manually ([10], [11], [12]). Another consequence of the poor coordination of the teleoperator’s movements is that performance tends to be much more error prone ([13], [14]). Another challenge remains the precise regulation of forces that are applied in the event of contact with an object or surface in the remote environment. Thus, there is a distinct risk that the teleoperator itself, and any material in the remote environment that comes into contact with the teleoperator, might be damaged due to excessive forces applied by the user ([15], [16]). Safety mechanisms that automatically incapacitate the teleoperator in the event of excessive forces can be implemented into most teleoperation systems; however, frequent triggering of these mechanisms would further delay work performance and may not always prevent damage to the material.

It has been suggested that haptic human-machine interfaces can ameliorate observed performance problems with teleoperation systems in a number of ways. For example, displays that convey haptic information to the user which would otherwise be lacking may adequately compensate for this loss of information from the remote environment and thereby enhance the user’s ability to control applied forces precisely, thus reducing the risk of excessive forces on the teleoperator side [17]. By providing cues of depth in a two-dimensional depiction of the remote environment (e.g. via video feed), haptic feedback may also improve the user’s ability to coordinate the teleoperator’s movements by resolving sensory ambiguity [18].

1.2 Experimental evidence on the effectiveness of haptic interfaces

It seems that, up to now, only few attempts have been made to present the many bits of existing research from a broader, more integrative perspective. Tan, Eberman, Srinivasan & Cheng [19], and MacLean [20] independently reviewed haptic interface technology available at the time. In their reviews, the authors examined and categorised physical characteristics and demands of haptic interfaces and offered guidelines for their effective design. Neither of these reviews, however, disseminated available empirical evidence on task performance with these systems. As such, the suggested guidelines focused on the design of physical characteristics of the interface, rather than the effective employment of existing haptic technology.

Only few reviews considered the effect of haptic signals on measures of human performance with a technological system. Among them, Hale & Stanney [21] disseminated psychophysical, neurological and physiological studies that involved haptic displays. Based on their selected studies, they proposed guidelines for the design of haptic and multimodal displays and theorised regarding possible beneficial effects of various forms of tactile and kinaesthetic feedback on texture perception, 2D/3D form perception and spatial awareness. Since teleoperation systems were not considered, however, explicit measures of force regulation and movement coordination with these systems did not feature in their review.

Jones & Sarter [22] presented a detailed review of vibrotactile devices. In their review, the authors formulated guidelines that advise on the most effective employment of vibrotactile devices based on their consideration of previous studies on human vibrotactile perception, general mechanical properties of vibrotactile display technology, and human performance with these systems. Yet, similarly to Hale & Stanney [21], the authors considered neither force feedback devices nor important measures of the effect of tactile displays on force regulation.

Crucially, none of the previously mentioned reviews considered the strength of effects that were found in the various pieces of research that the authors disseminated, nor did they produce indicators of overall effect strength of particular applications. Yet, the consideration of the effect strength of individual findings, as well as the indication of an overall effect strength of an application, is important for two reasons. For one, when considering multiple studies on the same topic that differ in their results, effect strength can serve to judge which findings are more reliable and which ones are less so. Furthermore, without an indication of the overall effect strength of a particular application, it is difficult to judge whether this application can be recommended for the industrial employment of haptic interfaces, considering that marginal improvements of human performance may not necessarily be worth a substantial financial investment.

Burke, Prewett, Gray, et al. [23] were among the few researchers who considered the strength of effect that haptic signals exert on human performance by conducting a meta-analysis. Specifically, the authors attempted to summarise and evaluate empirical evidence on general effects of visual-tactile feedback on certain performance measures such as reaction time and error rate. In a meta-analysis of 43 studies, they demonstrated that the multi-modality of visual-tactile feedback provides a significant, albeit tentative, advantage over using an exclusively visual feedback system, whereby the extra modality was found to improve reaction time, but not error rate. However, the authors did not differentiate between different applications of haptic technology and focused mostly on studies that investigated two-dimensional human-computer interaction. As such, their findings can only marginally inform on the effective use of particular haptic applications in teleoperation systems, which typically entail three-dimensional human-machine interaction.

2 META-ANALYSIS AS AN INVESTIGATIVE TOOL

Meta-analyses allow for a meaningful numerical comparison and analysis of studies that use different measures and manipulations [24]. In essence, a meta-analysis combines standardised effect sizes over a number of studies to compute a summary effect that represents the weighted mean of the individual effects, which were found in each of the included studies. The weights that are assigned to the individual effect sizes, and thus the emphasis that is placed on each of the included studies, are determined based on assumptions about the distribution of effect sizes from which the studies were sampled, and are calculated using established procedures [25]. In simple terms, the summary effect is the mean of the effect sizes of the included studies, with more weight assigned to the more precise studies. This summary effect size statistic may be used to determine the statistical significance of an overall effect, as well as indicate both the direction and the magnitude of a relationship between an independent variable and an outcome measure. Further insight into the meaningfulness of an effect can be obtained from indicators of the accuracy of this summary effect as an estimate of the true effect, as well as measures of the consistency of a particular effect across all sampled studies [25].
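To make the notion of a precision-weighted summary effect concrete, the following minimal Python sketch computes an inverse-variance weighted mean, its standard error and 95% confidence interval, and two common consistency indicators (Cochran's Q and I²). It assumes a fixed-effect model purely for illustration; the weighting scheme actually used in the present analyses is not stated in this section, and the effect sizes and variances below are hypothetical.

```python
import math

def summary_effect(effects, variances):
    """Inverse-variance weighted summary effect (fixed-effect model, assumed here)."""
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total_weight

    # Accuracy of the summary effect as an estimate of the true effect.
    se = math.sqrt(1.0 / total_weight)
    ci = (mean - 1.96 * se, mean + 1.96 * se)

    # Consistency across studies: Cochran's Q and the I^2 proportion.
    q = sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"mean": mean, "se": se, "ci": ci, "z": mean / se, "q": q, "i2": i2}

# Hypothetical standardised effect sizes and their sampling variances.
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.01, 0.04]
result = summary_effect(effects, variances)
print(result)
```

In this toy example the middle study has the smallest variance and therefore dominates the summary effect; a random-effects model would add a between-study variance term to each study's variance before weighting.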

Meta-analysis offers a number of advantages over traditional, narrative literature reviews, which focus on the reported statistics of individual studies. Since a meta-analysis typically aggregates the effect sizes of a larger number of studies, it has the advantage that it is relatively unaffected by small sample size and other methodological confounds that are often associated with individual studies. This makes meta-analysis results much more robust than individual studies. Moreover, in contrast to narrative literature reviews, where the authors’ criteria for selecting and emphasising particular studies over others are usually implicit, meta-analysis aims to use transparent criteria so that other researchers may reproduce it. Hence, meta-analysis offers the benefit of providing a quantitative and transparent basis for the evaluation of current practices, which makes meta-analyses especially attractive for the formulation of practice guidelines [24].

2.1 Data selection criteria

Meta-analysis has often been termed a Garbage-In-Garbage-Out (GIGO) technique, which implies that the results of this analysis are only as useful as the data that is fed into it. In the selection of data that provides input for the meta-analysis, the researcher is forced to make a number of decisions, all of which have consequences for the meaningfulness of the results. It is therefore vital that the underlying decision processes are made transparent. Hence, in the following section, general selection criteria that were considered during the literature review process of the present meta-analysis are described, before details of the analysis itself are described in the Method Section.

2.2 Research quality

For one, the level of research quality of each study needs to be considered, before it is included in a meta-analysis. Generally speaking, the lower the quality of a particular study (e.g. due to methodological flaws), the less reliable the result of that study will be. While meta-analysis is fairly robust to the effects of methodological flaws of a few studies, obviously, the meaningfulness of the results of a meta-analysis will diminish if these are based on a large number of flawed studies. On the other hand, if the quality criteria are too stringent, only few studies are likely to meet them and thus be eligible for inclusion. Yet, a number of experts in the field have argued that the inclusion of a greater number of studies of reasonable quality is preferable to the selection of few studies that are of high quality, as long as the criteria for selection are specified [26, p. 675]. The challenge is thus to set an acceptable level of research quality, which applies to a larger number of studies.

For the present meta-analysis, it was decided that studies would not be included in the meta-analysis if they were suspected to present evidence of low reliability or validity, for example, because they omitted reports of the chosen randomisation technique, standardised instructions, the use of a practice/familiarisation technique or the statistical treatment of outliers. Similarly, studies which featured insufficient sample sizes, inadequately operationalised dependent variables, or selectively reported statistics (which were usually in favour of significant effects) were also not considered for inclusion.

2.3 Type of effect

An important criterion for meta-analysis is the independence of the individual effect sizes that are included. That is, for each study that is eligible for inclusion, only one effect statistic should be included in the meta-analysis. Since most studies report statistics on a number of analyses, another decision hence needs to be made on the inclusion of specific effect sizes. While typically, meta-analyses are conducted in order to determine average mean effect strengths of a particular variable, it was decided that the present analyses should aim at determining average maximum effect strengths instead. This means that if several effect sizes are presented for a particular outcome measure, the largest effect, rather than a mean calculation of all reported effects, should be included in the analysis. This choice was made based on two reasons. For one, effect sizes can only be accurately determined if the appropriate statistics are reported in the literature. Yet, unfortunately, it has become common practice to report only statistics for significant results, while non-significant findings, if reported at all, are typically only mentioned as such in the text. In these cases, effect sizes would be calculated based on the assumption that p = .05, thus leading to inflated effect sizes and consequently an overestimation of the average effect of haptic signals. Secondly, since measures of effect strength lend themselves particularly well to cost-benefit analyses [24], the results of the meta-analysis featured in this work may be used to judge the financial impact of a particular investment. It may be reasoned that if the maximum effect of an application is not found to be effective in improving task performance, further investigations and investments to this aim are surely not worth making. Thus, not only can the maximum effect strength be more accurately estimated than the average effect strength, one might also argue that it makes economic sense to determine the maximum return on investments (ROI) into new technologies before deciding whether it is worthwhile to make an investment.
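The selection rule described above (one effect per study, namely the largest reported one) can be sketched in a few lines of Python. The study names and effect sizes are hypothetical, and the sketch assumes that effects are coded so that positive values indicate an improvement with haptic feedback; the text does not specify the coding, so this is an assumption made for illustration only.

```python
def select_max_effects(reported):
    """Keep one effect size per study: the largest reported effect.

    `reported` is a list of (study, effect) pairs. Effects are assumed to be
    coded so that larger positive values favour haptic feedback.
    """
    selected = {}
    for study, effect in reported:
        if study not in selected or effect > selected[study]:
            selected[study] = effect
    return selected

def mean_effects(reported):
    """For contrast: the per-study mean of all reported effects."""
    sums, counts = {}, {}
    for study, effect in reported:
        sums[study] = sums.get(study, 0.0) + effect
        counts[study] = counts.get(study, 0) + 1
    return {study: sums[study] / counts[study] for study in sums}

# Hypothetical reported effects; some studies report several.
reported = [
    ("study_a", 0.30), ("study_a", 0.75),
    ("study_b", 0.10), ("study_b", -0.40),
    ("study_c", 0.55),
]
max_selected = select_max_effects(reported)
mean_selected = mean_effects(reported)
print(max_selected)
print(mean_selected)
```

Either way each study contributes exactly one value, which preserves the independence of effect sizes; the two rules differ only in which value represents the study.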

2.4 Construct definitions

When it comes to the selection of studies, further decisions need to be made in the definition of the investigated constructs, i.e. independent and dependent variables. For example, a decision needs to be made whether all studies that feature any form of haptic feedback are eligible for inclusion or whether only those that feature a specific haptic device or a particular sample (e.g. expert users) should be included. Similarly, do studies qualify for inclusion if they investigate the effect of haptic signals on any type of performance measure or only on specific criteria? While the primary strength of meta-analysis lies in the synthesis of a wide range of studies that differ in their scope and methodology, critics have argued that an overly heterogeneous selection of studies further curtails the meaningfulness of results (the apples-and-oranges argument) [26, p. 675]. The challenge in this regard is therefore to decide on an acceptable level of heterogeneity in the sample of selected studies, so that a range of studies with different setups are included in the meta-analysis, but a synthesis of their results is still conceptually meaningful. The following section hence specifies the construct definitions that are applied to the present meta-analysis.

2.4.1 Main analysis: Feedback modality

For the main analysis, two principal categories of feedback modality are differentiated: “haptic feedback present” and “haptic feedback absent”, whereby haptic feedback only refers to the transmission of task-relevant information. The “haptic feedback present” category encompasses studies that utilised either (kinaesthetic) force feedback or vibrotactile feedback in order to convey to the human operator a haptic impression of contact with objects in the remote environment. Studies were only included if they contrasted the effects of the utilised haptic feedback with a control condition, in which haptic feedback was absent.

Although haptic displays exist in a multitude of different forms, it was decided to focus on haptic feedback devices which are commonly used to signal or simulate forces from the remote environment, and which were found explicitly to have an effect on teleoperator movement coordination and force regulation. Hence, studies are considered for inclusion if they featured one of two applications of haptic signals. The applications of interest were (a) vibrotactile feedback and (b) force feedback.

a) The category vibrotactile feedback encompasses studies that made use of any devices that provide directed or diffuse vibrations in order to alert to or inform the user of a contact in the remote environment. This vibrotactile feedback may only provide a general indication of contact, or inform the user additionally of direction and intensity of contact. The literature review suggested that applications for vibrotactile feedback seem to have focused largely on two-dimensional human-computer interaction. However, since the focus of the present work is on human-machine interaction in teleoperation systems, it was decided that only studies that investigated this haptic application within a teleoperation context would be taken into consideration.

b) Force feedback devices portray the forces that would be encountered in direct contact with the remote environment and can stop the motion of a user by applying force via various mechanical solutions [27]. Into the force feedback category fall studies which employed admittance- or impedance-type (kinaesthetic) force feedback devices of any output-capability in order to convey to the user haptically that a contact of the teleoperator unit with an object in the virtual or remote environment has occurred.
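The inclusion rules for the main analysis can be read as a simple screening filter. The Python sketch below is one hypothetical encoding of them: the record fields, their names and the example studies are illustrative assumptions and do not reproduce the authors' actual coding scheme. In particular, the `teleoperation_context` flag is applied here to both feedback types, covering real and virtual setups alike, which is an interpretation of the text rather than a stated rule.

```python
from dataclasses import dataclass

@dataclass
class Study:
    name: str
    feedback: str                 # e.g. "force", "vibrotactile", "thermal"
    has_no_haptic_control: bool   # control condition without haptic feedback
    teleoperation_context: bool   # real or virtual teleoperation setup (assumed field)

# The two applications of haptic signals considered in the main analysis.
ELIGIBLE_FEEDBACK = {"force", "vibrotactile"}

def is_eligible(study: Study) -> bool:
    """Screen a study against the (hypothetically encoded) inclusion rules."""
    return (
        study.feedback in ELIGIBLE_FEEDBACK
        and study.has_no_haptic_control
        and study.teleoperation_context
    )

# Hypothetical candidate studies.
candidates = [
    Study("study_a", "force", True, True),
    Study("study_b", "vibrotactile", True, False),  # 2D desktop interaction only
    Study("study_c", "force", False, True),         # no control condition
    Study("study_d", "thermal", True, True),        # other haptic display type
]
included = [s.name for s in candidates if is_eligible(s)]
print(included)
```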

Please note that no assumption is made regarding other sensory channels which may provide complementary or contradictory information to that transmitted via the haptic modality. This decision was made as oftentimes, non-haptic sensory input that might affect a participant’s task performance is insufficiently described to allow for further differentiation in studies (e.g. if the participant is not deprived of visual feedback, do they receive task relevant or irrelevant information? What is the quality of the visual feedback? Do participants receive auditory feedback that might affect their performance?). Hence, it was decided not to differentiate any further between studies based on non-haptic sensory input in favour of a larger study pool.

2.4.2 Moderator variables

If a meta-analysis indicates a high amount of variance in the results, further moderator analyses may be conducted, in which parts of the collected data are analysed separately, depending on their relationship to a suspected moderating variable. If this procedure markedly reduces the observed variance in the results, one may suspect that the moderator variable influences the effects of the investigated independent variable on the outcome measure(s). For the present meta-analysis, two moderator variables were considered: feedback type and experimental setup. In addition, it was investigated whether haptic feedback would differentially affect different types of performance indices (i.e. task completion time, force regulation, handling errors). Possible interdependencies of these moderator variables were also of interest.
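The logic of a moderator analysis, namely checking whether analysing subgroups separately reduces the observed variance, can be illustrated with a partition of Cochran's Q into a within-subgroup and a between-subgroup part. The Python sketch below assumes fixed-effect (inverse-variance) weights and uses hypothetical effect sizes; it is not a reproduction of the procedure applied in the present analyses.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean of a set of effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def q_statistic(effects, variances):
    """Cochran's Q: weighted squared deviations from the weighted mean."""
    mean = weighted_mean(effects, variances)
    return sum((e - mean) ** 2 / v for e, v in zip(effects, variances))

def partition_heterogeneity(studies):
    """Split total heterogeneity by a suspected moderator.

    `studies` is a list of (effect, variance, moderator_level) tuples.
    """
    q_total = q_statistic([e for e, _, _ in studies], [v for _, v, _ in studies])

    # Group the studies by moderator level and analyse each group separately.
    subgroups = {}
    for effect, variance, level in studies:
        subgroups.setdefault(level, []).append((effect, variance))
    q_within = sum(
        q_statistic([e for e, _ in members], [v for _, v in members])
        for members in subgroups.values()
    )
    # Whatever heterogeneity the subgroups do not retain lies between them.
    return {"q_total": q_total, "q_within": q_within, "q_between": q_total - q_within}

# Hypothetical data: effect size, variance, feedback type.
studies = [
    (0.8, 0.04, "force"), (0.9, 0.04, "force"),
    (0.2, 0.04, "vibrotactile"), (0.3, 0.04, "vibrotactile"),
]
parts = partition_heterogeneity(studies)
print(parts)
```

In this constructed example almost all of the heterogeneity lies between the two feedback types, which is the pattern that would lead one to suspect a moderating influence.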

2.4.2.1 Feedback type: Vibrotactile feedback vs. force feedback

Vibrotactile and force feedback signals are both suited to convey information from the remote environment that would be present in a direct interaction, but lacking during remote operation. However, they differ in their effect on human perception. Vibratory signals are well suited to signalling a sudden state transition (e.g. from movements in free space to an encounter with an object) as they tend to bear no direct physical relation to properties encountered in the remote environment [28]. In contrast, force feedback signals are arguably better suited to inform the user of encountered force intensity as they simulate natural feedback. Hence it was decided to investigate whether haptic interfaces which provide vibrotactile feedback would differ in their effect on work performance from those that provide force feedback.

2.4.2.2 Experimental setup: teleoperation vs. virtual reality

The context in which a haptic interface is utilised was deemed another potential moderator variable. Specifically, it was previously suggested that haptic feedback can be implemented more easily in virtual simulations than in teleoperation applications. Yet, there are reasons to believe that task performance in virtual environments may differ from that observed with real teleoperation systems. For one, it is conceivable that users of real teleoperation systems might behave more carefully as they are aware that performance errors can quickly result in costly damage of the teleoperator and objects in the remote environment. In contrast, performance errors in virtual simulations are generally of no substantial consequence as, at worst, they require a system reboot. Second, the tasks that users are required to perform also tend to be more abstract in studies that utilise VR environments than in studies which feature real teleoperation systems. As such, it is conceivable that the lack of realism in VR systems may in some form affect task performance and effect differences in the observed effectiveness of haptic interfaces. It was therefore investigated whether haptic feedback is equally effective in virtual environments and teleoperation applications. Studies are classified as teleoperation applications if they feature a real teleoperator in the remote environment. Note that this category also includes teleoperation systems in which the contact forces in the remote environment are virtually simulated rather than directly transmitted.

2.4.2.3. Task performance outcomes

It was further decided that studies would only be considered for inclusion if they featured task performance outcomes that encompass measures of force regulation, task completion time and/or error rate, as these are performance indices that would most likely benefit from the addition of haptic feedback from the remote environment (see Section 1). In this context, the measurement of force regulation pertains to all measures of users’ accuracy in the application of forces, such as peak forces, mean forces or force variance. The variable task completion time refers to the measured times for entire tasks as well as those for individual task segments. Finally, the error rate captures any deviation from a response previously defined as correct. The error rate includes, for example, position or force deviation measures as well as failure/success rates of tasks that required quick or precise movements. In contrast, studies that only featured errors in qualitative judgements (e.g. weight or texture discrimination) would not be eligible for inclusion.

Since only few studies met the inclusion criteria, for this outcome analysis only, individual studies could feature in more than one category if they reported effect sizes for several outcome measures, e.g. task completion time and error rate. As the authors of [23], who had also followed this procedure, pointed out, it is strictly speaking not theoretically appropriate, as it implies that effect sizes are not independent. However, since it is unknown whether the three outcome measures chosen for this study constitute measures of the same underlying dimension of task performance or in fact measure disparate constructs, a separate analysis by outcome was considered necessary. Since it violates the theoretical assumption of data independence, however, this analysis should only be considered exploratory in nature.

3 Method

3.1 Literature search

Multiple approaches were taken to the literature search. For one, relevant key terms (e.g. force feedback, vibrotactile feedback, haptic virtual fixtures, vibrations, user study, haptic display, haptic device) were entered into relevant databases and search engines such as PsycInfo, Cambridge Scientific Abstracts (CSA), IEEE Xplore and Google Scholar. In addition, a 10-year retrograde hand-search was conducted on the tables of contents of a number of prominent journals in the field.

3.2 Inclusion criteria

Studies were eligible for inclusion in the meta-analysis if they met all of the following inclusion criteria. For one, as outlined in the previous section, studies were included if they featured a force feedback or vibrotactile display and compared the effects of haptic feedback to a control condition featuring no haptic feedback in an experimental user study. In order to qualify for inclusion, studies also needed to investigate teleoperation or virtual scenarios that require movement coordination or the precise regulation of applied forces, and to have measured forces, task completion times or error rates. Tasks in which qualitative judgements needed to be made, such as texture or weight discrimination, were not considered for analysis. Studies with sample sizes of fewer than ten participants were not included, nor were those which lacked the statistics necessary for the calculation of effect sizes, such as mean values and standard deviations or standard errors, or inferential statistics in general. For studies which otherwise qualified for inclusion but lacked the statistics that are necessary to calculate effect sizes, authors were contacted via e-mail in an attempt to obtain the missing information.

All but three of the included studies had been published in journals or conference proceedings. Two studies were published in the form of a doctoral thesis, whilst one study was found in a master’s thesis. The criteria specified above yielded 32 studies and 49 effect sizes.

3.3 Effect size calculations

The effects were coded so that a positive effect always meant an improvement of work performance. Hence, reductions of task completion times, excessive forces or errors were coded as positive effects, even though they correspond to a negative correlation between these measures and the application of haptic signals.

Subsequent analyses of the collected effect sizes were conducted in accordance with Lipsey & Wilson [24]. Based on their suggestions, the standardised mean-difference effect size (Cohen’s d) was deemed an appropriate effect size index. The effect sizes were calculated for each included study based on reported summary statistics, such as means and standard deviations, t- and F-tests, or p-values if no other statistics were available. If studies reported effects for more than one outcome measure, e.g. task completion time and error rate, multiple effects were recorded. Oftentimes, studies would report more than one effect for a particular outcome measure, for example, mean forces and peak forces. In these cases, the largest effect sizes were chosen for inclusion in the analysis. Since the vast majority of the included studies feature small sample sizes and it was shown that Cohen’s d tends to be upwardly biased when based upon sample sizes of fewer than 20 participants [29], a correction was applied using Hedges’ g as indicated in (1)

$$g = d \times \sqrt{\frac{2N-2}{2N}} \qquad (1)$$

whereby N denotes the number of participants that took part in each study. All subsequent analyses were conducted with this weighted effect size estimate. For the meta-analysis, the sign of the effect size is re-coded so that positive effect sizes denote performance improvement with the use of haptic feedback, whereas a negative sign indicates a deterioration in performance. The obtained summary effect sizes and their variance are displayed in Table 1. To ensure transparency of each analysis, citations of the included studies are listed in brackets. References of these studies are provided after the general references towards the end of this paper.
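The effect size calculation described in this section may be illustrated with a short Python sketch. It is not the script used for the analysis: the function names are hypothetical, the pooled standard deviation assumes two conditions with the same number of observations, the input data are invented, and the correction factor follows one reading of (1), namely g = d × sqrt((2N − 2)/(2N)).

```python
import math

def cohens_d(mean_without, mean_with, sd_without, sd_with, lower_is_better=True):
    """Standardised mean difference, signed so that positive = improvement.

    Assumption: both conditions contain the same number of observations,
    so the pooled SD is the root mean square of the two SDs.
    """
    pooled_sd = math.sqrt((sd_without ** 2 + sd_with ** 2) / 2.0)
    if lower_is_better:      # e.g. completion time, peak force, error count
        diff = mean_without - mean_with
    else:                    # e.g. success rate
        diff = mean_with - mean_without
    return diff / pooled_sd

def corrected_g(d, n):
    """Small-sample correction, assuming (1) reads g = d * sqrt((2N - 2) / (2N))."""
    return d * math.sqrt((2 * n - 2) / (2 * n))

# Hypothetical data: completion time drops from 10 s to 8 s, SD = 2 s in both conditions.
d = cohens_d(10.0, 8.0, 2.0, 2.0)
g = corrected_g(d, n=13)
print(round(d, 3), round(g, 3))
```

Because the completion time is lower with haptic feedback, the sketch reports a positive effect, and the correction shrinks it slightly; the shrinkage becomes negligible as N grows.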

3.4 Analyses

Precision of the effect size estimates is expressed in the form of 95% confidence intervals (CI). Essentially, these intervals indicate a range of values around the effect size statistic that are believed to contain, with a certain probability (in this case 95%), the true population value as it might be estimated from much larger sample sizes [30]. Hence, the CIs indicate the extent to which sampling error is likely to influence the estimate of the effect size. For the analysis, it was assumed that variance in effect sizes stems from study-level sampling error as well as subject-level sampling error. Consequently, a random-effects model was chosen, which typically results in wider confidence intervals, thus yielding a more conservative estimate [25].
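As a small illustration of how such an interval is obtained for a single effect, the following Python sketch computes a 95% CI from an effect size and its variance under a normal approximation. The function name is hypothetical, and the input values are those listed for study [5] in Table 1; the sketch is not taken from the original analysis.

```python
import math

def confidence_interval(g, variance, z_crit=1.96):
    """Normal-approximation CI: g +/- z_crit * sqrt(variance)."""
    half_width = z_crit * math.sqrt(variance)
    return g - half_width, g + half_width

# Study [5] in Table 1: g = 0.55, variance = 0.09
lower, upper = confidence_interval(0.55, 0.09)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

For this study the interval includes zero although the point estimate is positive, which illustrates why the precision of individual small studies is limited and why pooling across studies is informative.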

Homogeneity of effect sizes was calculated with Cochran’s Q statistic, which is calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies, as indicated in (2):

$$Q = \sum_{i=1}^{k} W_i Y_i^2 - \frac{\left(\sum_{i=1}^{k} W_i Y_i\right)^2}{\sum_{i=1}^{k} W_i} \qquad (2)$$

where $W_i$ is the study weight ($1/v_i$) and $Y_i$ is the study effect [25]. The standardised heterogeneity measure Q may then be used to test the null hypothesis that the included studies share a common effect size (i.e. show homogeneity), under the assumption that Q will follow a central chi-squared distribution with degrees of freedom (df) equal to k-1. Hence, a p-value can be reported for this assumption of homogeneity to indicate statistical significance. However, in the interpretation of these significance values it should be kept in mind that, like all tests of significance, this test is sensitive to excess dispersion and the number of included studies.
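The computation of Q and of a pooled random-effects estimate can be sketched in Python as follows. The paper does not state which estimator of the between-study variance was used; the DerSimonian-Laird estimator in this sketch is therefore an assumption, as are the function and variable names. The inputs are the five force-regulation effects of the force feedback studies listed in Table 1.

```python
import math

def random_effects_summary(effects, variances, z_crit=1.96):
    """Return (Q, tau2, pooled g, CI lower, CI upper) for k study effects."""
    w = [1.0 / v for v in variances]                    # W_i = 1 / v_i
    sum_w = sum(w)
    sum_wy = sum(wi * y for wi, y in zip(w, effects))
    sum_wy2 = sum(wi * y * y for wi, y in zip(w, effects))
    q = sum_wy2 - sum_wy ** 2 / sum_w                   # computational form of Q
    df = len(effects) - 1
    # Assumed DerSimonian-Laird between-study variance, floored at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return q, tau2, pooled, pooled - z_crit * se, pooled + z_crit * se

# Force-regulation effects of force feedback studies in Table 1:
# [8], [22] (two effects), [26], [27].
g = [0.85, 0.41, 0.47, 0.85, 0.52]
v = [0.18, 0.21, 0.21, 0.21, 0.06]
q, tau2, pooled, lower, upper = random_effects_summary(g, v)
print(f"Q = {q:.2f}, tau^2 = {tau2:.2f}, g = {pooled:.2f}, CI = [{lower:.2f}, {upper:.2f}]")
```

With these inputs the sketch yields values close to the force-regulation row reported for force feedback in Table 6 (g = 0.59, CI 0.27 to 0.92, Q of about 1.0); small deviations are to be expected because Table 1 lists rounded values.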


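TABLE 1. HEDGES’ G AND VARIANCE FOR EACH INCLUDED EFFECT.

| Study No. | N | Outcome | Experimental Setup | Feedback Type | Hedges’ g | Variance |
|---|---|---|---|---|---|---|
| [1] | 13 | time | VR | FF | 2.25 | 0.52 |
| [1] | 13 | error | VR | FF | 1.15 | 0.36 |
| [2] | 30 | time | VR | FF | 1.48 | 0.09 |
| [2] | 30 | error | VR | FF | 0.07 | 0.07 |
| [3] | 24 | time | VR | VT | 1.46 | 0.21 |
| [3] | 24 | force | VR | VT | -1.74 | 0.23 |
| [4] | 25 | time | TOP | FF | 0.07 | 0.16 |
| [5] | 48 | time | TOP | FF | 0.55 | 0.09 |
| [6] | 10 | error | VR | VT | 0.69 | 0.43 |
| [7] | 20 | error | VR | FF | 0.97 | 0.11 |
| [7] | 20 | time | VR | FF | 0.09 | 0.10 |
| [8] | 25 | force | TOP | FF | 0.85 | 0.18 |
| [8] | 25 | time | TOP | FF | 0.77 | 0.17 |
| [9] | 20 | time | VR | FF | 0.09 | 0.10 |
| [10] | 14 | error | VR | FF | 1.85 | 0.42 |
| [11] | 12 | force | TOP | VT | 0.89 | 0.19 |
| [12] | 58 | time | VR | VT | 0.58 | 0.04 |
| [13] | 32 | error | TOP | VT | 0.43 | 0.13 |
| [13] | 10 | force | TOP | VT | 1.41 | 0.51 |
| [14] | 10 | time | VR | FF | 0.24 | 0.40 |
| [15] | 24 | error | VR | FF | 0.55 | 0.09 |
| [16] | 14 | error | VR | FF | 2.77 | 0.58 |
| [17] | 24 | time | VR | FF | -0.92 | 0.19 |
| [17] | 24 | error | VR | FF | -0.87 | 0.18 |
| [18] | 32 | time | VR | FF | 2.86 | 0.26 |
| [19] | 17 | time | VR | FF | 0.95 | 0.26 |
| [20] | 22 | time | VR | FF | 0.75 | 0.20 |
| [20] | 22 | error | VR | FF | 1.06 | 0.21 |
| [21] | 11 | time | VR | FF | 3.40 | 0.94 |
| [22] | 10 | time | TOP | FF | 0.54 | 0.21 |
| [22] | 10 | force | TOP | FF | 0.41 | 0.21 |
| [22] | 10 | time | TOP | FF | 0.09 | 0.20 |
| [22] | 10 | force | TOP | FF | 0.47 | 0.21 |
| [23] | 38 | error | VR | FF | 0.68 | 0.11 |
| [24] | 10 | error | VR | FF | 1.95 | 0.61 |
| [25] | 20 | error | TOP | FF | 1.86 | 0.15 |
| [25] | 20 | time | TOP | FF | 0.31 | 0.10 |
| [26] | 21 | force | TOP | FF | 0.85 | 0.21 |
| [27] | 34 | time | TOP | FF | 0.52 | 0.06 |
| [27] | 34 | force | TOP | FF | 0.52 | 0.06 |
| [28] | 30 | error | VR | VT | 0.32 | 0.14 |
| [28] | 30 | time | VR | VT | -0.48 | 0.14 |
| [29] | 12 | force | VR | VT | -2.54 | 0.63 |
| [29] | 12 | time | VR | VT | 2.54 | 0.63 |
| [29] | 12 | error | VR | VT | 0.28 | 0.13 |
| [30] | 12 | force | TOP | VT | 1.15 | 0.39 |
| [30] | 12 | time | TOP | VT | 1.23 | 0.40 |
| [31] | 20 | time | VR | VT | 1.11 | 0.23 |
| [32] | 20 | time | TOP | VT | 0.92 | 0.22 |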


Also reported is the I² statistic, which describes the percentage of variation across studies that is due to effect heterogeneity rather than chance ([31], [32]), and may be calculated as (3):

$$I^2 = \left(\frac{Q - df}{Q}\right) \times 100\% \qquad (3)$$

Hence, unlike other homogeneity measures, such as the frequently reported Q-statistic, I² is not inherently dependent upon the number of studies included in the analysis and has the advantage of being more intuitively interpretable.
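The relation between Q and I² can be checked with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, and truncating negative values to zero is a common convention that is assumed here rather than stated in (3). The example inputs are the Q and k values reported in Table 2.

```python
def i_squared(q, k):
    """I^2 in percent from Cochran's Q and the number of effects k (df = k - 1)."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Assumed convention: negative values are reported as 0%.
    return max(0.0, (q - df) / q) * 100.0

baseline = i_squared(92.75, 32)        # baseline model in Table 2
teleoperation = i_squared(14.59, 11)   # teleoperation subgroup in Table 2
print(round(baseline, 1), round(teleoperation, 1))
```

When Q is smaller than its degrees of freedom, as for some of the smaller subgroups in the outcome-specific analyses, the sketch returns 0%.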

3.5 Procedure

The meta-analyses were conducted in several steps. First, the overall effect of haptic feedback on task performance compared to performance conducted without task-relevant haptic information was investigated. Next, moderator analyses investigated whether the effect of haptic feedback on task performance was affected by experimental setup (virtual reality or teleoperation) and whether the two different applications of haptic signals differed in their effectiveness. It was further explored whether effects of haptic feedback on task performance varied by outcome measure. Finally, outcome-specific effects of vibrotactile and force feedback devices were investigated separately.

4 RESULTS

Based on an analysis of the distribution of standardised mean difference effect sizes for over 300 meta-analyses, Lipsey and Wilson [24] established benchmarks for the interpretation of effect sizes, which will be utilised for the following analyses. According to their classification, an effect size of ES ≤ .30 constitutes a small effect, one of ES = .50 a medium effect, and an effect size of ES ≥ .67 classifies as a large effect. For the sake of clarity and brevity, summaries of the results of the respective meta-analyses are presented in Tables 2-6, indicating the number of studies (k) included in each model, the overall number of participants (N) and the aforementioned test statistics.
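The benchmarks cited above can be written as a small Python helper. The benchmarks only name three anchor points, so two choices in this sketch are assumptions: values strictly between .30 and .67 are labelled medium, and the magnitude (absolute value) of the effect is classified regardless of its sign. The function name is hypothetical.

```python
def classify_effect(es):
    """Label an effect size using the cited benchmarks (<= .30 small, >= .67 large)."""
    magnitude = abs(es)      # assumption: classify magnitude, ignore direction
    if magnitude <= 0.30:
        return "small"
    if magnitude >= 0.67:
        return "large"
    return "medium"          # assumption: everything in between counts as medium

for value in (0.29, 0.59, 0.96):
    print(value, classify_effect(value))
```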

4.1 Analysis of the main effect of haptic feedback

First, the overall effect of task-relevant haptic feedback on task performance was investigated (see Table 2, “Baseline model”). The results indicate a large effect of haptic feedback on task performance (g = 0.96, p<.001) but also suggest a significant inconsistency in the reported effect sizes (I² = 66.6%).

4.2 Moderator analyses

In meta-analysis, typically not only the overall effect strength but also its variability is of interest as it offers indication and in some cases even explanation of the working mechanisms behind the observed effects. Hence, aiming to ascertain possible sources of variability in the baseline model, further analyses were conducted.

4.2.1 Experimental setup

Next, a moderator analysis investigated potential influences of the experimental setup on task performance (see Table 2). The meta-analysis of studies which assessed performance with a virtual setup showed a large effect of haptic feedback on performance improvement (g = 1.11, p<.001), although the inconsistency remained high (I² = 74.1%). On the other hand, studies with teleoperation setups showed a comparably smaller, yet still large effect (g = 0.74, p<.001) with considerably reduced, non-significant heterogeneity of results (I² = 31.5%). Hence, it would seem haptic feedback is slightly less effective, yet more reliable in improving task performance with actual teleoperated systems compared to virtual simulations of teleoperated setups.

TABLE 2. SUMMARY OF RANDOM-EFFECTS META-ANALYSIS RESULTS FOR THE COMPARISON “HAPTIC FEEDBACK PRESENT” VS. “HAPTIC FEEDBACK ABSENT” (BASELINE MODEL) AND THE MODERATOR VARIABLE EXPERIMENTAL SETUP (TELEOPERATION VS. VR).

| Model | k | N | Hedges’ g | Lower 95% CI | Upper 95% CI | Q | I² | References |
|---|---|---|---|---|---|---|---|---|
| Baseline model | 32 | 712 | 0.96, p<.001 | 0.71 | 1.21 | 92.75, p<.001 | 66.6% | [1] – [32] |
| Experimental setup: Teleoperation | 11 | 259 | 0.74, p<.001 | 0.47 | 1.02 | 14.59, p=.15 | 31.5% | [4], [5], [8], [11], [13], [22], [25], [26], [27], [30], [32] |
| Experimental setup: VR | 21 | 453 | 1.11, p<.001 | 0.74 | 1.48 | 77.12, p<.001 | 74.1% | [1], [2], [3], [6], [7], [9], [10], [12], [14], [15], [16], [17], [18], [19], [20], [21], [23], [24], [28], [29], [31] |


4.2.2 Feedback Type

A moderator analysis investigating the effect of the type of haptic feedback on task performance yielded statistically significant, large effect sizes for vibrotactile applications (g = 0.81, p<.001) as well as applications of force feedback (g = 1.01, p<.001). As can be seen in Table 3, inconsistency measures indicate that those studies which utilised vibrotactile feedback to haptically convey an impression of surface contact reported homogenous effects (I² = 24%), whereas studies which used force feedback devices to this effect produced overall more inconsistent findings (I² = 73.9%).

4.3 Analysis by outcome

Finally, it was explored whether haptic feedback improved all three measures of task performance or only some of them. For this purpose, all of the 49 effect sizes included in the meta-analysis were investigated (see Table 4). Somewhat surprisingly, the analysis showed the largest effect of haptic feedback on error rate reduction (g = 0.78, p<.001). Although the effect of haptic feedback on the reduction of task completion times was comparatively smaller (g = 0.75, p<.001), it is still classified as large. A small, statistically non-significant effect was found on the reduction of applied forces (g = 0.29, p=.36), with the reported effect sizes showing a wide spread. As was found to be the case with previous analyses, inconsistencies in the reported effect sizes were significant for all measures and quite high (I² = 69.8% for error rate, I² = 73.2% for completion time, I² = 77.9% for force regulation).

TABLE 3. SUMMARY OF RANDOM-EFFECTS META-ANALYSIS RESULTS FOR THE COMPARISON “HAPTIC FEEDBACK PRESENT” VS. “HAPTIC FEEDBACK ABSENT” (BASELINE MODEL) AND THE MODERATOR VARIABLE “FEEDBACK TYPE” (VIBROTACTILE FEEDBACK VS. FORCE FEEDBACK).

| Model | k | N | Hedges’ g | Lower 95% CI | Upper 95% CI | Q | I² | References |
|---|---|---|---|---|---|---|---|---|
| Baseline model | 32 | 712 | 0.96, p<.001 | 0.71 | 1.21 | 92.75, p<.001 | 66.6% | [1] – [32] |
| Feedback type: Vibrotactile feedback | 10 | 230 | 0.81, p<.001 | 0.52 | 1.11 | 11.85, p=.22 | 24% | [3], [6], [11], [12], [13], [28], [29], [30], [31], [32] |
| Feedback type: Force feedback | 22 | 482 | 1.01, p<.001 | 0.66 | 1.35 | 80.49, p<.001 | 73.9% | [1], [2], [4], [5], [7], [8], [9], [10], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27] |

TABLE 4. SUMMARY OF RANDOM-EFFECTS META-ANALYSIS RESULTS FOR THE COMPARISON “HAPTIC FEEDBACK PRESENT” VS. “HAPTIC FEEDBACK ABSENT” (BASELINE MODEL) WITH SUBSEQUENT SEPARATE ANALYSES FOR EACH OUTCOME MEASURE.

| Model | k | N | Hedges’ g | Lower 95% CI | Upper 95% CI | Q | I² | References |
|---|---|---|---|---|---|---|---|---|
| Baseline model | 32 | 712 | 0.96, p<.001 | 0.71 | 1.21 | 92.75, p<.001 | 66.6% | [1] – [32] |
| Outcome measure*: Force regulation | 10 | 170 | 0.29, p=.36 | -0.32 | 0.90 | 40.64, p<.001 | 77.9% | [3], [8], [11], [13], [22], [26], [27], [29], [30] |
| Outcome measure*: Completion time | 24 | 547 | 0.75, p<.001 | 0.43 | 1.06 | 85.83, p<.001 | 73.2% | [1], [2], [3], [4], [5], [7], [8], [9], [12], [14], [17], [18], [19], [20], [21], [22], [25], [27], [28], [29], [30], [31], [32] |
| Outcome measure*: Error rate | 15 | 313 | 0.78, p<.001 | 0.40 | 1.17 | 46.43, p<.001 | 69.8% | [1], [2], [6], [7], [10], [13], [15], [16], [17], [20], [23], [24], [25], [28], [29] |

*Note that several studies may be represented by more than one outcome variable in this analysis.

In summary, the use of interfaces with haptic feedback capability had a large, positive effect on overall work performance, as measured in task completion times, precise force regulation and error rate. There seemed to be large inconsistency in the reported effectiveness, however. Further analyses showed that this large variability in results may be attributed to the experimental setup as well as the feedback type. It seems that, overall, haptic interfaces are similarly effective in real teleoperation systems as they are in virtual teleoperation systems; however, there seems to be much more variability in the effectiveness of haptic feedback when used in a virtual simulation. Similarly, vibrotactile feedback was found to be only slightly less effective than force feedback in improving work performance measures; however, there seemed to be considerably more consistency in the reported effectiveness of vibrotactile feedback.

Looking at the effect of haptic interfaces on individual task performance measures, haptic interfaces seemed most effective in reducing performance errors and task completion times. Users’ ability to apply forces appropriately in the remote environment was in many cases also positively affected by the application of haptic feedback; however, the meta-analysis did not indicate a statistically significant effect.

4.3.1 Outcome-specific effects of vibrotactile and force feedback

Aiming to shed some further light on possible sources of variability in the reported results, a second series of meta-analyses was conducted to ascertain outcome-specific effects of vibrotactile feedback and force feedback on work performance. Since this requires further partitioning of data, some analyses encompass only few studies, which may lack the power to produce reliable results. Hence these analyses should only be considered exploratory in nature.

4.3.2 Vibrotactile feedback

TABLE 5. SUMMARY OF RANDOM-EFFECTS META-ANALYSIS RESULTS FOR THE COMPARISON “VIBROTACTILE FEEDBACK PRESENT” VS. “VIBROTACTILE FEEDBACK ABSENT” WITH SUBSEQUENT SEPARATE ANALYSES FOR EACH OUTCOME MEASURE.

| Model | k | N | Hedges’ g | Lower 95% CI | Upper 95% CI | Q | I² | References |
|---|---|---|---|---|---|---|---|---|
| Vibrotactile feedback | 10 | 230 | 0.81, p<.001 | 0.52 | 1.11 | 11.85, p=.22 | 24% | [3], [6], [11], [12], [13], [28], [29], [30], [31], [32] |
| Outcome measure*: Force regulation | 5 | 70 | -0.15, p=.85 | -1.65 | 1.36 | 34.84, p<.001 | 88.5% | [3], [11], [13], [29], [30] |
| Outcome measure*: Completion time | 7 | 176 | 0.89, p<.01 | 0.31 | 1.48 | 20.17, p<.01 | 70.3% | [3], [12], [28], [29], [30], [31], [32] |
| Outcome measure*: Error rate | 4 | 84 | 0.38, p=.06 | -0.01 | 0.76 | 0.35, p=.95 | 0% | [6], [13], [28], [29] |

*Note that several studies may be represented by more than one outcome variable in this analysis.

TABLE 6. SUMMARY OF RANDOM-EFFECTS META-ANALYSIS RESULTS FOR THE COMPARISON “FORCE FEEDBACK PRESENT” VS. “FORCE FEEDBACK ABSENT” WITH SUBSEQUENT SEPARATE ANALYSES FOR EACH OUTCOME MEASURE.

| Model | k | N | Hedges’ g | Lower 95% CI | Upper 95% CI | Q | I² | References |
|---|---|---|---|---|---|---|---|---|
| Force feedback | 22 | 482 | 1.01, p<.001 | 0.66 | 1.35 | 80.49, p<.001 | 73.9% | [1], [2], [4], [5], [7], [8], [9], [10], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27] |
| Outcome measure*: Force regulation | 5 | 100 | 0.59, p<.001 | 0.27 | 0.92 | 1.01, p=.91 | 0% | [8], [22], [26], [27] |
| Outcome measure*: Completion time | 17 | 371 | 0.69, p<.001 | 0.31 | 1.08 | 65.36, p<.001 | 75.5% | [1], [2], [4], [5], [7], [8], [9], [14], [17], [18], [19], [20], [21], [22], [25], [27] |
| Outcome measure*: Error rate | 11 | 229 | 0.96, p<.001 | 0.44 | 1.48 | 43.76, p<.001 | 77.1% | [1], [2], [7], [10], [15], [16], [17], [20], [23], [24], [25] |

*Note that several studies may be represented by more than one outcome variable in this analysis.

A summary of the results of meta-analyses on performance-enhancing effects of vibrotactile feedback is presented in Table 5. Interestingly, the analysis by outcome indicates that vibrotactile feedback only speeds up task completion times significantly (g = 0.89, p<.01), but does not appear to reduce forces (g = -0.15, p = .85) or handling errors (g = 0.38, p = .06) significantly.

Significance tests of the corresponding Q-heterogeneity statistic indicate significant inconsistency in the reported effects of force regulation measures (I² = 88.5%) and task completion times (I² = 70.3%), suggesting that the respective summary effect sizes may not necessarily be indicative of the actual effect. In contrast, the analysis indicated no inconsistency in the reduction of errors (I² = 0%), although it should be cautioned that homogeneity measures are not particularly reliable if calculated based on three or fewer studies (Lipsey & Wilson, 2001, p. 117).
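The significance levels of the summary effects can be cross-checked against the reported confidence intervals. The following Python sketch recovers an approximate two-sided p-value from an effect size and its 95% CI; it assumes that the intervals are symmetric and based on a normal approximation, and the function name is hypothetical. The inputs are the force regulation and error rate rows of Table 5.

```python
import math

def p_from_ci(g, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value from an estimate and its 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)        # standard error from CI width
    z = g / se
    return math.erfc(abs(z) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|z|))

p_force = p_from_ci(-0.15, -1.65, 1.36)   # force regulation, Table 5
p_error = p_from_ci(0.38, -0.01, 0.76)    # error rate, Table 5
print(f"force regulation: p = {p_force:.2f}; error rate: p = {p_error:.2f}")
```

Under these assumptions the recovered values are consistent with the p-values listed in Table 5, up to rounding of the reported effect sizes and interval limits.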

4.3.3 Force feedback

A summary of the results of meta-analyses on the effects of force feedback on task performance is presented in Table 6. The exploratory analysis by outcome variable shows that the biggest strength of force feedback lies in the reduction of handling errors (g = 0.96, p<.001), although large and medium-sized effects were also found with respect to task completion time savings (g = 0.69, p<.001) and force regulation (g = 0.59, p<.001), respectively. Although the effect of force feedback on human force regulation was smaller than that on other measures, it seemed to be much more consistent (I² = 0% for force regulation vs. I² = 75.5% for completion time and I² = 77.1% for error rate).

5. DISCUSSION

The presented analysis provided a quantitative summary of available research evidence and calculated the effectiveness of haptic interfaces over a broad range of settings, systems, task setups, researchers, and circumstances. The meta-analysis indicated haptic interfaces to be very effective in improving work performance, as indicated by speed, handling accuracy and appropriate application of forces. Whilst this effect is less pronounced in real teleoperation systems compared to performance with virtual systems, it is much more consistently observed in the former. This difference in consistency may be attributable to a much greater variety in tasks that was observed amongst studies that used virtual setups of teleoperated systems. Due to a lack of suitable data, meta-analyses that investigate the moderating role of task objectives and task requirements were not performed at this point but remain subject to future research.

The presented series of meta-analyses further made a direct comparison of the effectiveness of different haptic applications possible, and established the strengths and weaknesses of each of the two main applications of haptic signals investigated in this work with respect to their effectiveness in improving particular performance aspects. The results indicated that vibrotactile feedback was only effective in reducing task completion times, but neither forces nor errors were significantly reduced. Particularly noticeable is the large variance in the reported effect sizes of force measurements. Although this variance may be caused by a number of factors, including task dependence or differing methodology, the most likely explanation for this variance would seem to be that some vibrotactile devices may be less effective than other devices in reducing applied forces. Indeed, studies suggest that if vibration frequency or location varies in order to convey information of intensity or direction, vibrotactile feedback may be less effective than a uniform signal that alerts the user of a required response [33].

Represented by the largest number of studies in the meta-analysis, force feedback from the remote environment appears to be not only more popular in the teleoperation community than vibrotactile feedback, but also more promising in terms of task performance improvement as it seems to exert a larger effect on overall performance (g = 1.01 for force feedback vs. g = 0.81 for vibrotactile feedback). Its effect on task completion time is only slightly lower, and it also shows large and medium effects on error rate and force regulation, respectively. Whilst the effect on human force regulation is not as pronounced, force feedback appears to be particularly consistent in reducing forces, suggesting that it is of some importance to simulate contact forces realistically in order to improve the users’ ability to regulate precisely the forces that are applied to the remote or virtual environment, as was suggested by [15].

Noticeably high inconsistency was found in most analyses. In part, this heterogeneity may also be attributed to the fact that only maximum rather than mean effects were selected for inclusion, as the former are prone to exaggeration. However, one cannot discount the possibility that this heterogeneity in reported effect sizes reflects a substantial influence of one or more variables that were not considered in this study, such as the technical specifications of the haptic interface or more general methodology issues. For example, although no explicit evidence was found for this assumption, it is conceivable that older studies report fewer performance gains compared to studies with modern haptic interfaces, assuming that haptic technology has improved over the years. The included studies also varied widely in their use of visual feedback. Considering that humans are very much visually-oriented creatures, it seems very likely that performance with haptic interfaces would differ depending on the quality and type of visual feedback provided to the user. Further meta-analytical studies seem to be necessary to examine the influence of such variables, once sufficient data is available. This heterogeneity in findings also reflects the impression one gains when viewing the literature: that a wide disparity in methods, tasks and systems exists, which are difficult to unite under broader themes. Theoretical development is urgently needed to further the unification and integration of haptic interface research.

In the interpretation of these findings, it should be stressed again that this study was not intended to investigate average but maximum effects. Furthermore, the observed heterogeneity indicates that the reported pooled effect sizes may not represent their respective sample of effect sizes very well, since there is significantly more variability in these scores than could be explained by chance deviations in performance measures. Hence, the present analyses do not necessarily provide information about the improvement in performance that one could realistically expect from the implementation of haptic signals. Instead, the results reflect an optimistic estimate of the largest improvement that one could hope to achieve. As such, only the best-case scenario is described with the rationale that an investment into haptic interfaces cannot be recommended if the most optimistic results do not suggest an improvement in task performance. Nevertheless, ascertaining the average impact of haptic signals on task performance remains a priority of interface research, and researchers are encouraged to report all results for this purpose, even if they are not statistically significant or contrary to the stipulated hypotheses.

Finally, it needs to be pointed out that meta-analysis has been termed a Garbage-In Garbage-Out technique, meaning that the results are only reliable and meaningful to the extent that the included studies are of high quality. For the present analysis, selective criteria were chosen that aimed at ensuring high research quality of the included studies. However, in many cases, relevant information was simply missing. For example, the randomisation of experimental conditions or the planning of practice sessions with the system are rarely elaborated. Although the omission of some relevant information seems hardly avoidable considering the stringent space restrictions of most publications, researchers are encouraged to report all information that would be necessary for a replication of their study to facilitate the synthesis of reported evidence.

Despite their caveats, the meta-analyses outlined in this paper provide overwhelming evidence that haptic interfaces are capable of improving task performance. Yet, the results presented in this work should not be interpreted without considering their limitations. For one, the quantitative measures of work performance, which were the focus of this work, are necessarily very restrictive in that they only provide a brief glimpse of haptic human-machine interaction in teleoperation systems. While measures of timing and force regulation are clearly defined, error rate is a very diverse measure, referring to position errors as well as more general failures in achieving a particular task goal. Although particular care was taken that the measures of error considered in the presented studies all reflect a general inability to precisely control the teleoperator’s movements, it cannot be precluded that the chosen measures are less homogenous than assumed and that haptic signals may thus differ in their effects on particular measures of performance error. Hence, one should be aware that important performance benefits might not have been uncovered in the present work.

Furthermore, an important measure of work performance has not been considered thus far: the ability of users to identify and discriminate objects based on qualitative assessments of properties, such as texture, shape, and compliance. In fact, performance problems that are caused by the inability to perform such assessments are widely reported; they are particularly prominent during telesurgery [34]. These difficulties have largely been attributed to the absence of tactile feedback and as such, this aspect of work performance holds the most promise for haptic applications. The effectiveness of haptic signals in improving users’ ability to make qualitative assessments of object properties is well-researched and well-documented. Yet, the synthesis of available research on these aspects of task performance with teleoperated systems remains the subject of future work.

6. CONCLUSION

In conclusion, haptic human-machine interfaces promise to improve observed performance problems with many teleoperated systems. A series of meta-analyses presented in this paper indicate that haptic interfaces indeed show a potential for reducing the risk of damage through excessive force application, speeding up performance and reducing handling errors – provided, of course, that they are applied appropriately. Further meta-analytical work is needed to process and synthesise the currently fragmented empirical evidence base. In addition, considerable theoretical development on haptic human-machine interaction is required in order to provide a framework that guides future investigations of this topic. Only a methodological approach integrating theory and applied studies can unite the fragmented research efforts that currently prevail in the field of haptic human-machine interaction and thus eventually effect the required improvement of work performance with industrial teleoperation systems.

Acknowledgment

The authors wish to thank the anonymous reviewers for their helpful comments and support. Further thanks are extended to Prof. Dr. Michael Popp of the Universität der Bundeswehr München for sharing his insights on the topic. This work was supported in part by the Collaborative Research Center “High Fidelity Telepresence and Teleaction Systems” (SFB453), funded by the German Research Foundation (DFG).

REFERENCES

[1] R. T. Laird, M. H. Bruch, M. B. West, D. A. Ciccimaro and H. R. Everett, “Issues in vehicle teleoperation for tunnel and sewer reconnaissance,” Ft. Belvoir Defense Technical Information Center APR, 2000.

[2] M. Utsumi, T. Hirabayashi and M. Yoshie, “Development for teleoperation underwater grasping system in unclear environment,” in International Symposium on Underwater Technology, 2002.

[3] J. Rosen, B. Hannaford and R. M. Satava, Surgical Robotics: Systems Applications and Visions, 1 ed., Heidelberg Berlin: Springer Verlag, 2010.

[4] R. Aracil, L. F. Penin, M. Ferne, L. M. Jimenez, A. Barrientos, A. Santamaria, P. Martinez and A. Tudun, “ROBTET: A new teleoperated system for live-line maintenance,” in 7th International Conference on Transmission and Distribution Construction and Live-Line Maintenance, 1995.

[5] A. Jacoff, E. Messina, B. A. Weiss, S. Tadokoro and Y. Nakagawa, “Test arenas and performance metrics for urban search and rescue robots,” in Intelligent Robots and Systems, 2003.

[6] X. Yang, Q. Chen, D. C. Petriu and E. M. Petriu, “Internet-based teleoperation of a robot manipulator for education,” in 3rd IEEE International Workshop on Haptic, Audio and Visual Environments and Their Applications, 2004.

[7] K. Kawamura and M. Iskarous, “Trends in service robots for the disabled and the elderly,” in IEEE/RSJ/GI International Conference on Intelligent Robots and Systems, 1994.

[8] S. Zhai and J. W. Senders, “Investigating coordination in multidegree of freedom control II: correlation analysis in 6 DOF tracking,” in 41st Annual Meeting of the Human Factors and Ergonomic Society, 1997.

[9] B. Deml, Telepräsenzsysteme: Gestaltung der Mensch-System-Schnittstelle, Neubiberg: Universität der Bundeswehr München, 2004.

[10] L. Geiger, M. Popp, B. Färber, J. Artigas and P. Kremer, “The influence of telemanipulation-systems on fine motor performance,” in Third International Conference on Advances in Computer-Human Interactions, 2010.

[11] R. J. Adams, D. Klowden and B. Hannaford, “Virtual training for a manual assembly task,” in Haptics-e, 2001.

[12] B. J. N. A. Unger, P. M. Berkelman, A. Thompson, S. Lederman, R. L. Klatzky and R. L. Hollis, “Virtual peg-in-hole performance using a 6-Dof magnetic levitation haptic device: Comparison with real forces and with visual guidance alone,” in 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2002.

[13] C. Ware and R. Balakrishnan, “Reaching for objects in VR displays: lag and frame rate,” Computer-Human Interaction, vol. 1, no. 4, pp. 331-356, 1994.

[14] M. J. Massimino, T. B. Sheridan and J. B. Roseborough, “One hand tracking in six degrees of freedom,” in IEEE International Conference on Systems, Man and Cybernetics, 1989.

[15] M. Radi, A. Reiter, S. Zaidan, V. Nitsch, B. Faerber and G. Reinhart, “Telepresence in industrial applications: implementation issues for assembly tasks,” Presence: Teleoperators and Virtual Environments, vol. 19, no. 5, pp. 415-429, 2010.

[16] M. Tavakoli, R. V. Patel and M. Moallem, “Haptic feedback and sensory substitution during telemanipulated suturing,” in First Joint Eurohaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2005.

[17] R. Aracil, M. Buss, S. Cobos, M. Ferre, S. Hirche, M. Kuschel and A. Peer, “The human role in telerobotics,” in Advances in Telerobotics, M. Ferre, M. Buss, R. Aracil, C. Melchiorri and C. Balaguer, Eds., Berlin Heidelberg, Springer Verlag, 2007, pp. 1-7.

[18] N. Birbaumer and R. F. Schmidt, “Bewegung und Handlung,” in Biologische Psychologie, 6 ed., Heidelberg, Springer Medizin Verlag, 2003, pp. 255-295.

[19] H. Z. Tan, B. Eberman, M. A. Srinivasan and B. Cheng, “Human factors for the design of force-reflecting haptic interfaces,” in Dynamic Systems and Control, vol. 55(1), C. Radcliffe, Ed., The American Society of Mechanical Engineers, 1994, pp. 353-359.

[20] K. E. MacLean, “Designing with haptic feedback,” in IEEE International Conference on Robotics and Automation, 2000.

[21] K. S. Hale and K. M. Stanney, “Deriving haptic design guidelines from human physiological, psychophysical, and neurological foundations,” IEEE Computer Graphics and Applications, vol. March/April, pp. 33-39, 2004.

[22] L. A. Jones and N. B. Sarter, “Tactile displays: guidance for their design and application,” Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 50, no. 9, pp. 90-111, 2008.

[23] J. L. Burke, M. S. Prewett, A. A. Gray, L. Yang, F. R. B. Stilson, M. D. Coovert, L. R. Elliot and E. Redden, “Comparing the effects of visual-auditory and visual-tactile feedback on user performance: a meta-analysis,” in 8th International Conference on Multimodal Interfaces, 2006.

[24] M. W. Lipsey and D. B. Wilson, Practical meta-analysis, London New Delhi: Sage Publications, Inc., 2001.

[25] M. Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein, Introduction to Meta-analysis, Chichester: John Wiley & Sons, Ltd., 2009.

[26] J. Bortz and N. Döring, Forschungsmethoden und Evaluation, 4 ed., Heidelberg: Springer Medizin Verlag, 2006.

[27] S. D. Laycock and A. M. Day, “Recent developments and applications of haptic devices,” Computer Graphics Forum, vol. 22, no. 2, pp. 117-132, 2003.

[28] H. Iwata, “History of haptic interface,” in Human Haptic Perception, Basel, Birkhäuser Verlag, 2008, pp. 355-361.

14 IEEE TRANSACTIONS ON HAPTICS, TH-2011-12-0107.R2

[29] L. V. Hedges and I. O. Olkin, Statistical methods for meta-analysis, San Diego, CA: Academic Press, Inc., 1985.

[30] A. Field, Discovering statistics using SPSS, 3 ed., London: Sage, 2009.

[31] G. A. Higgins and H. R. Champion, “The military simulation experience: Charting the vision for simulation training in combat trauma,” Fort Detrick, MD, 2000.

[32] J. P. T. Higgins and S. G. Thompson, “Quantifying heterogeneity in a meta-analysis,” Statistics in Medicine, vol. 21, pp. 1539-1558, 2002.

[33] V. Nitsch, Haptic human-machine interaction in teleoperation systems: Implications for the design and effective use of haptic interfaces, Saarbrücken: SVH, 2012.

[34] C. R. Wagner, R. D. Howe and N. Stylopoulos, “The role of force feedback in surgery: analysis of blunt dissection,” in 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2002.

References of Studies Included in the Meta-analyses:

[1] R. Arsenault and C. Ware, “Eye-hand coordination with force feedback,” in SIGCHI Conference on Human Factors in Computing Systems, 2000.

[2] C. G. L. Cao, M. Zhou, D. B. Jones and S. D. Schwaitzberg, “Can surgeons think and operate with haptics at the same time?,” Journal of Gastrointestinal Surgery, vol. 11, pp. 1564-1569, 2007.

[3] L. T. Cheng, R. Kazman and J. Robinson, “Vibrotactile feedback in delicate virtual reality operations,” in Fourth ACM international Conference on Multimedia, 1996.

[4] B. Deml, T. Ortmaier and U. Seibold, “The touch and feel in minimally invasive surgery,” in IEEE International Workshop on Haptic Audio Visual Environments and their Applications, 2005.

[5] B. Petzold, M. F. Zaeh, B. Faerber, B. Deml, H. Egermeier, J. Schilp and S. Clarke, “A study on visual, auditory, and haptic feedback for assembly tasks,” Presence: Teleoperators and virtual environments, vol. 13, no. 1, pp. 16-21, 2004.

[6] L. Forsberg, “Increasing performance and reducing the visual information overload when using a computer mouse with the help of vibrotactile feedback,” in Umea's 12th Student Conference in Computing Science, 2008.

[7] Y. Hurmuzlu, A. Ephanov and D. Stoianovici, “Effect of a pneumatically driven haptic interface on the perceptional capabilities of human operators,” Presence: Teleoperators and Virtual Environments, vol. 7, no. 3, pp. 290-307, 1998.

[8] H. Mayer, I. Nagy, A. Knoll, E. A. Braun, R. Bauernschmitt and R. Lange, “Haptic feedback in a telepresence system for endoscopic heart surgery,” Presence: Teleoperators and Virtual Environments, vol. 16, no. 5, pp. 459-470, 2007.

[9] I. Oakley, M. R. McGee, S. Brewster and P. Gray, “Putting the feel in look and feel,” in SIGCHI Conference on Human Factors in Computing Systems, 2000.

[10] E. L. Sallnäs, “Improved precision in mediated collaborative manipulation of objects by haptic force feedback,” in Haptic HCI 2000, Lecture Notes in Computer Science, vol. 2058, S. Brewster and R. Murray-Smith, Eds., Springer-Verlag, 2000, pp. 69-75.

[11] R. E. Schoonmaker and C. G. L. Cao, “Vibrotactile force feedback system for minimally invasive surgical procedures,” in IEEE International Conference on Systems, Man, and Cybernetics, 2006.

[12] A. Viau, M. Najm and C. E. L. M. F. Chapman, “Effect of tactile feedback on movement speed and precision during work-related tasks using a computer mouse,” Human Factors: The Journal of the Human Factors and Ergonomics Society, no. 47, pp. 816-826, 2005.

[13] A. Reiter, V. Nitsch, G. Reinhart and B. Färber, “Effects of Visual and Haptic Feedback on Telepresent Micro Assembly Tasks,” in 3rd International Conference on Changeable, Agile, Reconfigurable and Virtual Production (CARV), 2008.

[14] R. J. Adams, D. Klowden and B. Hannaford, “Virtual training for a manual assembly task,” in Haptics-e, 2001.

[15] P. Richard, P. Coiffet, A. Kheddar and R. England, “Human performance evaluation of two handle haptic devices in a dextrous virtual telemanipulation task,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 1999.

[16] J. T. Dennerlein and M. C. Yang, “Haptic Force-Feedback Devices for the Office Computer: Performance and Musculoskeletal Loading Issues,” Human Factors, vol. 43, no. 2, pp. 278-286, 2001.

[17] G. W. Edwards, “Performance and usability of force feedback and auditory substitutions in a virtual environment manipulation task,” Virginia Polytechnic Institute and State University, 2000.

[18] R. W. Lindeman, J. L. Sibert and J. K. Hahn, “Towards usable VR: an empirical study of user interfaces for immersive virtual environments,” in SIGCHI Conference on Human Factors in Computing Systems, 1999.

[19] J. Payette, “Evaluation of a force feedback (haptic) computer pointing device in zero gravity,” ASME Dynamics Systems and Control Division, no. 58, pp. 547-553, 1996.

[20] E. L. Sallnäs and S. Zhai, “Collaboration meets Fitts' law: Passing virtual objects with and without haptic force feedback,” in IFIP Conference on Human-Computer Interaction, 2003.

[21] R. Gupta, T. B. Sheridan and D. Whitney, “Experiments using multimodal virtual environments in design for assembly analysis,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 3, pp. 318-338, 1997.

[22] M. Radi, A. Reiter, S. Zaidan, V. Nitsch, B. Faerber and G. Reinhart, “Telepresence in industrial applications: implementation issues for assembly tasks,” Presence: Teleoperators and Virtual Environments, vol. 19, no. 5, pp. 415-429, 2010.

[23] P. Ström, L. Hedman, L. Särna, A. Kjellin, T. Wredmark and L. Felländer-Tsai, “Early exposure to haptic feedback enhances performance in surgical simulator training: a prospective randomized crossover study in surgical residents,” Surgical Endoscopy, pp. 1383-1388, 2006.

[24] O. Gerovich, P. Marayong and A. M. Okamura, “The effect of visual and haptic feedback on computer-assisted needle insertion,” Computer Aided Surgery, vol. 9, no. 6, pp. 243-249, 2004.

[25] S. Lee, G. S. Sukhatme, G. J. Kim and C. M. Park, “Haptic control of a mobile robot: A user study,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002.

[26] E. A. M. Heijnsdijk, A. Pasdeloup, A. J. Van der Pijl, J. Dankelman and D. J. Gouma, “The influence of force feedback and visual feedback in grasping tissue laparoscopically,” Surgical Endoscopy, vol. 18, no. 6, pp. 980-985, 2004.

[27] V. Nitsch, Haptic Human-Machine Interaction in Teleoperation Systems: Implications for the Design and Effective Use of Haptic Interfaces, Südwestdeutscher Verlag für Hochschulschriften, 2012.

[28] L. O'Hara Long, Investigating the usability of a vibrotactile torso display for improving simulated teleoperation obstacle avoidance, Clemson University, USA: Unpublished Master Thesis, 2011.

[29] C. Jeongeun, J. Kammerl, E. Steinbach and A. El Saddik, “Improving spatial perception in telepresence and teleaction systems by displaying distance information through visual and vibrotactile feedback,” Presence: Teleoperators and Virtual Environments, vol. 19, no. 5, pp. 430-449, 2010.

[30] A. Murray, R. Klatzky and P. Khosla, “Psychophysical characterization and testbed validation of a wearable vibrotactile glove for telemanipulation,” Presence: Teleoperators and Virtual Environments, vol. 12, no. 2, pp. 156-182, 2003.

[31] R. Adams, A. Olowin, B. Hannaford and O. Sands, “Tactile data entry for extravehicular activity,” in Proceedings of IEEE World Haptics Conference, Istanbul, 2011.

[32] M. Martin and S. Parikh, “Improving mobile robot control - Negative feedback for touch interfaces,” in Proceedings of IEEE Technologies for Practical Robot Applications, 2011.

Verena Nitsch received her B.Sc. (hons) degree in Applied Psychology from the University of Central Lancashire, UK, in 2006 and graduated with an M.Sc. degree in Organisational Psychology from the University of Manchester, UK, in 2007. Since 2008, she has pursued an academic career at the Human Factors Institute of the Universität der Bundeswehr München in Germany, where she received her Ph.D. in 2012. For her work, she has received best paper and best project awards. Dr. Nitsch is a member of numerous professional bodies, including the IEEE, the British Psychological Society, the International Association of Applied Psychology and the Haptics Society.

Berthold Färber studied psychology at the University of Regensburg, where he received his doctoral degree in 1980. Since 1989, he has been a professor of human factors studies at the Universität der Bundeswehr München, where he currently conducts and supervises applied and theoretical research on various aspects of human-machine interaction and their implications for system design.