Interpreting the Dynamics of Social Indicators: Methodological Issues Related to Absolute, Relative, and Time Differences
Katja Prevodnik
Vesna Dolničar
Vasja Vehovar
Faculty of Social Sciences, University of Ljubljana, Kardeljeva ploščad
5, 1000 Ljubljana, Slovenia
Abstract
In contemporary society, a growing demand for the accessibility of social
indicators can be observed. This requires increased attention to their
construction to avoid misleading interpretations, which can be the result of
inadequate knowledge, or can even be an intentional choice to imply a
specific desired outcome. This paper addresses this issue by first
summarizing research regarding the perception of numbers, statistical
thinking, and numerical literacy. The focus is then narrowed to the
comparison of social indicators observed for two units in a time
perspective. Three simple and popular measures of dynamics—most
frequently used when social change is analyzed and interpreted—are
addressed: absolute difference, relative difference, and time distance. In a
corresponding experiment, respondents evaluated the direction of change of
a certain social indicator in time (i.e., whether the differences increase,
decrease, or stagnate) for a hypothetical case where the three measures
implied contradictory interpretations. Each experimental group was
exposed to one of these measures. The results indicate that interpretations
basically followed the specific measures that respondents were exposed to.
This effect was particularly strong regarding absolute difference, followed
e.Proofing http://springerproof.sps.co.in:8080/oxe_v1/printpage.php?token=5xv2...
1 od 36 11.1.2014 20:42
by relative difference, while the effect of exposure to time distance was
somewhat weaker. When only data or graphical presentation was given,
respondents tended to interpret dynamics according to absolute differences.
The results indicate that extreme methodological rigor is needed when
presenting social indicators in time, and some guidelines are provided for
this purpose.
Keywords
Statistical literacy
Numerical data
Number sense
Manipulations
Comparative analysis
1. Introduction
The proper usage of indicators, statistics, and presentations depends
primarily on whether and how the public understands them. Sicherl (2011) claims
that perceptions of welfare and social progress depend on the measure used.
Thus, the lagging of certain subjects or groups in a specific comparison can
be perceived as much more severe if presented with time-distance
methodology as opposed to presenting only absolute or relative differences.
For example, a ten-year time lag of a certain country appears much more
serious than the corresponding three-percentage-point absolute difference
(e.g., 20 vs. 23 %). Similarly, Mueller (2011) has shown that different forms
of presentation can have very specific effects on the reader’s understanding.
Decisions and actions based on such presentations are, therefore, directly
dependent not only on data quality (e.g., accuracy, reliability, validity, etc.)
but also on how the analyses have been performed, presented, and interpreted.
Mueller (2011) also noted that contextual issues arise not so much from the
core problem of monitoring development as from the methodology.
According to Best (2008), an in-depth analysis of an individual statistic is
necessary to acquire a clear view of a social phenomenon. Therefore,
the authors believe that accurate statistics (e.g., absolute numbers, ratios,
percentages, etc.) are not sufficient on their own to build general knowledge of
research questions; a sound method for the presentation of data is also
necessary.
The assumption of the general public is that statistics are provided by experts
(e.g., statistical offices and research institutes), which is sometimes one of the
key reasons for a rather uncritical acceptance of provided results. In this
paper, this assumption is questioned following Best (2008 ), who states that
even official statistics are social products formulated by specific people and
organizations. In general, all numbers, measures, and indicators are, thus,
products of the activities of people who decide not only what they want to
count and how to perform measurements and analyses but also how to present
and interpret the results and give them meaning. Thus, the capacity of the
public to evaluate and critically analyze data and corresponding presentations
is becoming ever more important. This is essential for social indicators,
particularly in the field of well-being and quality-of-life research, since
results have an important influence on the public perception of the current
state in a society as well as on government measures and policies. Substantial
efforts have already been devoted to various contextual and methodological
issues; however, certain very essential aspects have still not been fully
elaborated. The issue of the appropriate choice of indicator and its
presentation may sometimes be decisive to its interpretation as, for example,
Diener and Suh (1997) demonstrated with their quality-of-life indicators.
Many researchers (e.g., Wild and Pfannkuch 1999; Kaplan et al. 2010;
Schield 2010) and popular writers (e.g., Huff 1973; Campbell 1974;
Blastland and Dilnot 2007; Best 2008) believe that this discussion is
particularly important because numerical and statistical data are not
only used increasingly often but are also used by a growing number of
different groups. Data and indicators are generally produced by competent
researchers but can also be produced by less qualified persons (e.g., the
media, lobbyists, specific stakeholders, policymakers, etc.) where the
selection of numbers, graphs, and formulae might become a communication
or even a manipulation strategy. The multiplication of measurement
instruments and methods further expands the potential of heterogeneous
methods of analysis and presentation. In practice, this may complicate an
objective and conclusive judgment as it may become unclear which
approach/interpretation is correct. A researcher’s ethical duty is to cover as
many aspects as possible and answer as many questions as possible by also
taking into account current knowledge in the perceptions of numbers,
graphical presentations, and verbal interpretations (e.g., Curcio 1981;
Lewandowsky 1987; Lewandowsky 1999; Lewandowsky and Spence 1989;
Schwarz 1996; Shaughnessy et al. 1996; Friel et al. 2001; Dehaene 2011;
OECD 2011). These problems are covered in most ethics codes in the fields
of social sciences, methodology, and statistics (e.g., ISA Code of Ethics,
AAPOR Code of Professional Ethics and Practices, WAPOR Code of
Professional Ethics and Practices, ICC/ESOMAR Code, RESPECT Code of
Practice, and ISI Declaration of Professional Ethics).
The problem of interpreting the dynamics of social indicators is encountered
very frequently and is therefore considered particularly important.
Various specific questions are often discussed such as “Is the gender gap truly
increasing?”, “Is the digital divide growing or stagnating?”, and “Is the
quality-of-life indicator changing equally across subgroups?” These dilemmas
are, therefore, not only methodologically intriguing but also of substantial
practical importance. In this paper, the above issues are addressed within the
somewhat narrower context of simple comparative analyses of a social
indicator for two units/groups in time. The corresponding theory and
empirical research are elaborated to evaluate the opportunity for
misinterpretation (and manipulation) when interpreting such comparisons.
We start with an overview of a broader set of theoretical considerations
regarding numbers and the human perception of quantities (Sect. 2 ). Next,
absolute, relative, and time difference measures are presented together with an
example (Sect. 3 ). For the empirical study, the experimental design is
outlined in Sect. 4 , and the data analyses are presented in Sect. 5 . In the
conclusions (Sect. 6 ), the key findings are discussed in a broader context
together with their methodological limitations and implications for future
research. Some recommendations for practical work using these measures are
also outlined.
2. Understanding Numbers and Statistics
Theoretically, two major research streams in human perception and the
understanding of numerical data can be identified. The first arises from
psychology, and particularly neuropsychology (in relation to the so-called
number sense), while the second is related to statistical thinking and statistical
literacy.
One of the basic questions related to human perception is whether any innate
predispositions determine potential perception. Some answers can be found in
the field of neuropsychology, where important research has been related to the
so-called human “number sense” (the “number concept”, Brainerd 1979 )
defined as the ability to quickly understand, approximate, and manipulate
numerical quantities. A likely conclusion (Dehaene 2011 : 50) is that there
exists, in fact, an area of the brain specifically for identifying numbers, which
is laid down through the spontaneous maturation of cerebral neuronal
networks under genetic control and with minimal guidance from the
environment. Dehaene (2001 ) argues that the foundations of arithmetic lie in
a human’s ability to mentally represent and manipulate numerosities on a
mental “number line” and that this representation has a long evolutionary
history and a specific cerebral substrate. The findings of various studies,
summarized by Dehaene (2011 ), show that individuals very quickly
differentiate digits up to 3 (4 seems to be the inflection point). Similarly, they
determine which number is bigger or smaller more quickly if the absolute
difference between the compared amounts is greater. They also perceive the
difference between two small numbers as greater than the difference between
two large numbers (e.g., 3 and 4 vs. 103 and 104). The parameter by which
humans naturally distinguish two numbers is, thus, not so much their absolute
difference but their difference relative to their size. An experiment comparing
young children in a Western civilization with uneducated tribal adults
concluded that the representation of numbers in the uneducated adults closely
approximated a logarithmic function and not a line, whereas in the young,
educated children, the converse was true. Thus, “a shift from logarithmic to
linear mapping occurs later in development, between first and fourth grade
depending on the experience and the range of numbers tested” (Dehaene
2011; see also Nunez 2011). Such research indicates that the bias for small
numbers can have far-reaching consequences in the way individuals conduct
and interpret statistical analyses. The biological principles behind the number
sense can be explored further (e.g., Dehaene 2001 and Dehaene 2011 ; Göbel
et al. 2011 ; Nunez 2011 ), but the intriguing fact here is that there is a real
and biologically proven possibility that a particular numerical problem will be
viewed in a certain way unless we are taught otherwise. Research has shown
that education plays an important role in developing or acquiring some
numerical abilities (e.g., Brainerd 1979 ; Göbel et al. 2011 ; Nunez 2011 ). If
this is the case, education (specific to culture or civilization) may condition
our understanding and view of numbers in everyday life.
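The contrast between linear and logarithmic mapping described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a logarithmic "number line" can be represented by the natural logarithm; the function names are hypothetical and are not taken from the cited studies.

```python
import math


def linear_distance(a, b):
    """Distance between two numbers on a linear number line."""
    return abs(b - a)


def log_distance(a, b):
    """Distance between two positive numbers on a logarithmic number line
    (an illustrative assumption, not a model from the cited research)."""
    return abs(math.log(b) - math.log(a))


# The pairs used in the text: 3 vs. 4 and 103 vs. 104.
small_pair = (3, 4)
large_pair = (103, 104)

# On a linear scale both pairs are equally far apart.
linear_small = linear_distance(*small_pair)
linear_large = linear_distance(*large_pair)

# On a logarithmic scale the small pair is much further apart,
# which mirrors the perception bias for small numbers.
log_small = log_distance(*small_pair)
log_large = log_distance(*large_pair)

print(linear_small, linear_large)
print(log_small > log_large)
```

Under this assumption, the logarithmic distance depends only on the ratio of the two numbers, which is why it behaves like a relative rather than an absolute difference.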
Within this context, for the purpose of our study, the following question is
relevant: if humans are in fact conditioned (taught) to view a certain
comparison in time predominantly in one specific dimension (e.g., the
absolute difference), is our perception of the true state of inequality
(difference or gap) actually distorted or skewed?
Human perception aspects have been emphasized not only in
neuropsychology but also by researchers in statistical education. Researchers
investigated factors that can explain why some students understand statistics
better while others never understand the importance and meaning of even
simple statistical measures (e.g., Wild and Pfannkuch 1999 ; Friel et al. 2001 ;
Watson and Callingham 2003 ; Garfield and Ben-Zvi 2007 ; Kaplan et al.
2010 ; Pfannkuch et al. 2010 ; Schield 2010 ). This research is covered under
the umbrella term, statistical literacy. Typically, however, research is focused
on a narrow research problem (e.g., understanding certain types of tables and
graphs, percentages, the p value, reading data, etc.). Different terminology
and definitions have arisen in the last few decades with regard to statistical
literacy, each of which encompasses a certain view on understanding statistics; for
example, statistical thinking (Wild and Pfannkuch 1998, 1999), statistical
reasoning (Garfield and Ben-Zvi 2007), statistical proficiency (Kaplan et al.
2010), statistical literacy (Gaal 2002; Schield 2010), and numeracy (Best
2008).
Paulos (2001) defined innumeracy as the mathematical equivalent of illiteracy,
referring to the inability to perceive the basic meanings of numbers and
probability. Biggeri and Zuliani (1999) refer to numerical literacy in terms of
several competences: the ability to work with numbers and quantitative
problems, understanding basic mathematical ideas and patterns, statistical
reasoning, the importance of thinking from the aspect of probability,
collecting and presenting data, the omnipresence of variability, and the
quantification and explanation of variability. However, the most important is
an understanding of the meaning of information (e.g., limitations and source
of statistical information and differentiation between quality and questionable
data). Schield (2010 ) defines statistical literacy as the ability to read and
interpret statistics in everyday media; in graphs, tables, assertions, surveys,
and studies, and states that it is a prerequisite for all data users. Human
understanding of complex analyses, nevertheless, varies significantly with
education (in mathematics or statistics, as well as by general education,
people’s experiences with data, etc.; e.g., Friel et al. 2001), which
corresponds to certain assertions in research that are summarized in Sect. 3.
Applied statistics is part of the process of collecting information and learning,
by which we support the process of informed decision and policymaking.
Being in the position of presenting information (as a researcher, statistician,
policy maker, or journalist/reporter) is demanding. One needs to provide the
reader with enough information to enable him or her to form a coherent view
of the problem, and at the same time, ensure that the choice of measures and
data correctly influences the perception of the reader. According to Gaal
(2002 ), there are two interrelated components to statistical literacy: (a)
people’s abilities to interpret and critically evaluate statistical information,
and (b) their abilities to discuss or communicate their reactions to such
statistical information. The true answers to research questions are often out of
reach, and we are only able to formulate an assessment of the available
answers and interpretations with varying degrees of error. Correspondingly,
we may provide users with several measures that describe a certain problem
(data) and are relevant for the research question, and thus, enable the
individual to form an informed notion of the problem. Of course, such an
approach is very demanding and requires additional resources. In addition, to
apply such an open and expanded methodology for the dissemination of
findings, knowledge of how people perceive different views of a research
question also needs to be gained to minimize possible manipulations and
misunderstandings.
The perception of different measures is also the focus of this paper, specifically
the measures of the dynamics of social indicators between units in time (i.e., absolute
difference, relative difference, and time distance). All three measures are
presumed to answer the question: is the difference (gap) between the two
units in time constant, increasing, or decreasing? A comprehensive framework
to cover these issues is still required, and so far, only specific attempts have
been developed, mainly in the field of graphical presentations. Spence and
Lewandowsky (1991 ) and Spence (2005 ), for example, concluded that the
prejudice towards pie charts (instead of bar charts) is sometimes misguided.
Similarly, Galesic and Garcia-Retamero (2011 ) demonstrated through
experimental research that, in both the United States and Germany, one-third
of the population has low graph literacy and low numeracy skills.
3. Basic Comparative Analyses in Time
The three most common and most basic measures of comparison in time are
absolute difference, relative difference (ratio), and time distance. For
illustrative purposes, an example is presented (Fig. 1 ), which is then also
used in the experiment. Next, previous research is overviewed. This section is
concluded by outlining the research questions and hypotheses.
Fig. 1
Graphical presentation of the example and the indicated static (A absolute
difference, R relative difference) and dynamic measures of difference (T time
distance)
In the example presented here, a standard question is asked when dealing with
comparisons: is the difference in an indicator (here, monthly income) between
two units (here, individuals) in a certain time interval increasing, decreasing,
or constant? For illustration, as well as for the purpose of the experiment, a
specific example is presented in Fig. 1 and Table 1, which was intentionally
constructed so that the three measures lead to contradictory conclusions
regarding the direction of change.
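Table 1
Data and calculations of statistical measures for the indicator of monthly income (in €)
Year | Person X | Person Y | Absolute difference (A) | Relative difference (R, ratio) | Time distance (T, years)
2008 | 1,000 | 500 | 500 | 0.50 | (Not available)
2009 | 1,500 | 1,000 | 500 | 0.67 | 1
2010 | 1,750 | 1,250 | 500 | 0.71 | 1.5
2011 | 2,000 | 1,500 | 500 | 0.75 | 2
Direction of change | | | Constant | Decrease | Increase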
It is also necessary to comment briefly on the choice of indicator (i.e., the
monthly income for person X and person Y for the years 2008–2011). The
aim was to choose an indicator that would immediately intrigue all potential
respondents to actively consider and analyze the problem. The indicator used
is intuitively understandable to all regardless of their education in
mathematics, statistics, or other data-related fields. Unlike issues in the
information society, economic development, or medical studies (e.g.,
Svedberg 2004 ; Dolničar 2007 ; Moser et al. 2007 ; Sicherl 2007 ; Citrome
2010 ; James 2011 ), this example refers to an everyday issue affecting the
majority of the population—income. Average monthly income for two distinct
groups (e.g., the public vs. the private sector or two different countries) over
several decades could also be used as an example. However, a significant
drawback here might be that most individuals probably already have at least
some opinion or even prejudice regarding this issue, which might have an
effect on their perceptions and understandings.
Firstly, the comparison can be made based on the absolute difference, which is
obtained by subtracting the value of the lagging unit from that of the leading
unit. In this case, the absolute difference at all points in time is 500 €
(1,000 − 500 in 2008; 1,500 − 1,000 in 2009, and so on) and is thus constant,
which implies that the difference between the two units has not changed.
Secondly, the relative difference can be calculated (i.e., the ratio of the two
values of the indicator). Similar results follow if some derivatives are
calculated (e.g., various types of growth rates, indices, or Gini coefficient);
however, these measures are more complex for the reader to understand. In
2008, the ratio is 0.5 (ratio = 500/1,000; the lagging unit achieved 50 % of the
value of the leading unit). However, by 2011 this has changed to 0.75
(ratio = 1,500/2,000). Thus, the situation can be interpreted as a decrease in
the difference between the units.
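The two static measures can be reproduced directly from the values in Table 1. The following is a minimal sketch; the variable names are illustrative, and the ratio is rounded to two decimals to match the table.

```python
# Monthly income (in EUR) from Table 1; variable names are illustrative.
years = [2008, 2009, 2010, 2011]
income_x = [1000, 1500, 1750, 2000]  # leading unit (person X)
income_y = [500, 1000, 1250, 1500]   # lagging unit (person Y)

# Absolute difference: leading value minus lagging value in the same year.
absolute = [x - y for x, y in zip(income_x, income_y)]

# Relative difference: lagging value as a share of the leading value.
ratio = [round(y / x, 2) for x, y in zip(income_x, income_y)]

for year, a, r in zip(years, absolute, ratio):
    print(year, a, r)
```

The absolute difference stays at 500 in every year, while the ratio rises from 0.50 to 0.75, so the same data support both a "constant" and a "decreasing" reading of the gap.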
Thirdly, as Sicherl (2004, 2007, 2011) suggests, our usual perspective
can be complemented by considering the dynamic dimension of the
comparison. The statistical measure S-time-distance expresses the distance
(proximity) in time between the points at which the two compared series (i and j)
reach a specified level X_L of indicator X according to:
$$S_{ij}(X_L) = \Delta T(X_L) = T_i(X_L) - T_j(X_L)$$
where T_i(X_L) and T_j(X_L) denote the times at which series i and j reach the level X_L.
This determines how far ahead in time the leading unit is. The time distance
in 2009 is 1 year: person X achieved a level of 1,000 in 2008, which means a
1-year time lag (distance) for person Y, who achieved the same level in 2009.
In 2010, the time distance is 1.5 years, and in 2011, it is 2 years (Fig. 1 ).
Considering the situation from this perspective, the gap in time has in fact
increased; person Y needs increasingly more time to catch up with person X.
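The time distances in Table 1 can be reproduced with a short sketch. It assumes linear interpolation between yearly observations (an assumption consistent with the value of 1.5 years for 2010, but not stated explicitly in the text); the helper name `time_reached` is hypothetical.

```python
def time_reached(years, values, level):
    """Return the time at which a non-decreasing series first reaches `level`,
    using linear interpolation between observations (an assumption).
    Returns None if the level lies outside the observed range."""
    if level < values[0]:
        return None  # reached before the first observation
    points = list(zip(years, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 <= level <= v1:
            if v1 == v0:
                return float(t0)
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None  # not reached within the observed period


years = [2008, 2009, 2010, 2011]
income_x = [1000, 1500, 1750, 2000]  # leading unit (person X)
income_y = [500, 1000, 1250, 1500]   # lagging unit (person Y)

# For each year, how long ago did person X reach person Y's current level?
time_distance = []
for year, level in zip(years, income_y):
    reached = time_reached(years, income_x, level)
    time_distance.append(None if reached is None else year - reached)

print(time_distance)
```

For 2008 the result is undefined because person X's series does not include the level of 500 within the observed period, which corresponds to the "(Not available)" entry in Table 1.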
With this simple and realistic—albeit, controversial—example, it has been
illustrated that the answer to “How is the difference between the two units
changing?” is not always straightforward. Considering different statistical
measures to cover various dimensions of the compared time series, it is clear
that the results can even contradict one another. All of the three presented
measures are statistically correct, legitimate, and possible but imply different
conclusions, and the case itself is not completely artificial but strongly
resembles relations in reality.
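The contradiction can be made explicit by classifying the direction of the gap under each measure. This is a simplified sketch: it compares only the first and last available values, the helper `direction` is hypothetical, and the relative gap is expressed as 1 minus the ratio so that a rising ratio reads as a shrinking gap.

```python
def direction(gap_values, tol=1e-9):
    """Classify the change in a gap between its first and last available value."""
    observed = [v for v in gap_values if v is not None]
    change = observed[-1] - observed[0]
    if abs(change) <= tol:
        return "constant"
    return "increase" if change > 0 else "decrease"


# Values taken from Table 1.
absolute_gap = [500, 500, 500, 500]
ratio = [0.50, 0.67, 0.71, 0.75]
relative_gap = [1 - r for r in ratio]  # share by which the lagging unit falls short
time_gap = [None, 1, 1.5, 2]           # time distance in years

conclusions = {
    "absolute difference": direction(absolute_gap),
    "relative difference": direction(relative_gap),
    "time distance": direction(time_gap),
}

for measure, verdict in conclusions.items():
    print(f"{measure}: {verdict}")
```

The three verdicts match the last row of Table 1, which shows that the conclusion depends entirely on which measure is reported.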
Given its importance, the above dilemma has been addressed by surprisingly
few researchers who are, in addition, from various contextual research fields.
Wallgren and Wallgren (2010 ) emphasize that statistical time-series analysis
differs from other aspects of statistical science because, instead of estimating
quantitative parameters (such as regression coefficients), the aim is often to
acquire a picture of qualitative patterns of the time series under study.
Nevertheless, the authors of the present study maintain that appropriate basic
quantitative measures are also necessary for adequate qualitative
interpretations of trends.
Mueller and Schuessler (1961) define a time series as a succession of
chronologically spaced observations designed to depict growth, decline, or
simply variations in the incidence of the subject observed; quantitative
observations of time series (measures of dynamics) may take the
form of absolute values or relative values. Wallgren and Wallgren (2010)
showed that, even when reporting only the raw values, misunderstandings
might occur. When referring to only two measures (absolute and relative
difference), several authors in various scientific areas highlight the
importance of recognizing that the two measures do not necessarily show the
same direction (e.g., Mueller and Schuessler 1961 ; Amiel and Cowell 1992 ;
Atkinson and Brandolini 2004 ; Svedberg 2004 ; Harper and Lynch 2005 ;
Atkinson and Sicherl 2007 and Sicherl 2011 ; Dolničar 2007 ; Moser et al.
2007 ; James 2009 , 2010 , 2011 ; Citrome 2010 ). For example, Moser et al.
(2007 ) compare the measures for relative inequality (ratio) and absolute
inequality to assess their implications from both static and dynamic
perspectives. They conclude that there is substantial variation between
countries in the size and direction of change in inequality, demonstrating that
the direction of trends can depend upon the way in which inequalities are
measured. In addition, Harper and Lynch (2005 ) show that, in relative terms,
the difference in stomach cancer mortality between genders from 1930 to
2000 has steadily increased, but in terms of absolute difference, there has
been a steep decline in disparity between men and women. If this is not
properly taken into account (given the specific research question and the
characteristics of the research problem), it may lead to skewed answers.
Atkinson and Brandolini (2004 ) go even further and emphasize that there is
no a priori reason to rank the relative over the absolute criterion since they are
both equally acceptable, and that the choice is a value judgment. They argue
that people differ in their views and that their evaluation patterns are more
complicated than the simple relative/absolute dichotomy.
Reporting of inequality is particularly problematic when only relative
measures are considered and estimates of absolute inequality are not included.
This attitude is supported by Amiel and Cowell (1992 ), who posed verbal
and numerical questions to groups of students to elicit their views on
inequality. One-third of the respondents preferred the relative approach and
one-sixth preferred the absolute approach, while the remaining fraction
rejected either approach and followed some other logic. Similarly, certain
methodological approaches (e.g., Land et al. 2011 ) also focus mainly on
relative differences. On the other hand, James (2011 ), in the context of
comparing countries (developed vs. developing), clearly advocated the use of
the absolute difference because the analysis must capture a real achievement
rather than simply be an exercise in arithmetic starting from a low base
number, which then serves for relative comparisons. Mueller and Schuessler
(1961) also elaborate on the possibly contradictory trends that can be deduced
from absolute and relative differences alone and believe that the choice of
measure needs to be tailored to the research problem.
In recent years, a strong focus has been evident in using an alternative method
to compare units in time—the dynamic view, using the time distance measure
(e.g., Sicherl 2004 , 2007 , 2011 ; Vehovar et al. 2006 ; Dolničar 2007 , 2008 ;
Mueller 2011 ). This measure has the potential to provide important additional
information as a supplementary measure to existing simple comparisons. This
has already been acknowledged by the application of this measure by several
research institutions (e.g., Granger and Jeon 1997 ; Empirica 2005 ;
EUROCHAMBRES 2005 ; Eurostat 2010 ; ITU 2010 ; NSCB 2010 ; IHME
(Kulkarni et al. 2011); The Millennium Project (Glenn and Florescu 2011);
OECD 2011 ; SORS 2012 ).
In general, comparative analyses most often inquire into specific contextual
research issues rather than related general methodological questions (e.g.,
Natoli and Zuhair 2011 ; Pasimeni 2011 ; Dale and Neal 2012 ). As a more
specific methodological approach, Mueller (2011 ) recently introduced an
innovative methodology of social clocks for the monitoring of
multidimensional social development, which also includes the time distance
concept.
Nevertheless, despite several attempts, no comprehensive research has been
identified regarding how readers understand and perceive various measures of
dynamics. Consequently, a most basic and general methodological guidance
for the usage of these three measures is still lacking. On the other hand, we do
have some guidance in the somewhat similar conceptual dilemma regarding
the usage of the measures of central tendency (mean, mode, and median). In
this case, every statistical textbook clearly indicates which measure is
preferred in certain circumstances because they too may provide contradictory
results. This is exactly what is also required for the three basic measures of
dynamics in time. In this situation, however, the context and circumstances
are, of course, much more complex.
Within this framework, the contribution of this experimental study is twofold:
first, the methodological aspect will contribute to the empirical evidence
regarding how people read and understand measures of changes in time;
second, in a more substantial context, implications and practical suggestions
are presented for a more comprehensive way of presenting basic comparative
analyses to the general public.
4. The Experiment
As mentioned, there is no consensus on guidelines regarding when and how to
use one or another measure of dynamics in time. The most common
rule of thumb is the recommendation that we should use the measure that
best answers our specific research question, but there are no guidelines on
how to achieve this. As is evident from the previous review, different authors
prefer different measures. Nevertheless, for some very specific types of time
series, such as diffusion processes (e.g., the digital divide), a more elaborate
framework has already been established. In this case, however, only very
specific (increasing) types of time-series trends are observed (e.g., penetration
of Internet usage) with a set of expected characteristics of the diffusion
process. This does not hold for time series in general (e.g., a
financial indicator, as in the above example), so for the purpose of this
experiment, a rather general case of dynamics in time was selected.
4.1. Research Question and Hypotheses
The general thesis is that manipulating the presentation of data can have
very serious consequences for the reader’s perception. As illustrated
previously (Table 1 and Fig. 1), even in very simple (mathematically almost
trivial) comparative analyses, contradictory conclusions may occur.
The focus of this study is on the perceptions of the receivers of data (readers
and users): Do their perceptions differ according to the specific measure
used? Are these differences so important that we need to consider them when
building a methodological and interpretational framework? Is there room for
manipulation?
Based on previous research and a review of the literature, an experiment was
conducted to test the following hypotheses:
• H1: The first reaction of users when exposed to raw data in a table is
thinking in absolute differences.
• H2: When exposed to specific measures, users accept absolute rather than
relative differences, while time distance has a rather weak impact on their
perceptions.
• H3: To ensure that the relative difference or time distance measure is
considered, additional stimuli are required (i.e., strong, explicit, or even
exclusive exposure to this measure).
• H4: When exposed to relative differences, users follow the corresponding
interpretation instead of the default thinking of absolute differences. This
effect is much weaker when users are exposed to time distance.
• H5: A graphical presentation strongly implies and reinforces the
perception of absolute differences.
• H6: The time distance measure is by far the most cognitively demanding,
while absolute differences are the easiest to understand.
The next section explains the research design as a combination of
between-group and repeated-measures designs, which allows all the
hypotheses to be tested. To determine the differences within the experimental
groups, the paired-samples t-test is used; to test the differences between the
groups (regarding the treatment effect and the cognitive burden), one-way
ANOVA is used with corresponding post hoc tests (Bonferroni and Dunnett). We
expect strong relationships and strong causal links to reach significance even
in a relatively small sample.
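The two test statistics named above can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the answer vectors are invented (coded 1 = increasing, 2 = constant, 3 = decreasing), the function names `paired_t` and `one_way_anova_f` are hypothetical, and p-values and post hoc comparisons are left out because they need distribution functions that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with between and within degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_value, k - 1, n - k


# Invented answers of the same six respondents at two steps (within-group test).
step_a = [2, 2, 3, 2, 1, 2]
step_b = [3, 3, 3, 2, 2, 3]
t_stat, t_df = paired_t(step_a, step_b)

# Invented answers of three independent groups at one step (between-group test).
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

print(round(t_stat, 3), t_df)
print(round(f_stat, 3), df_between, df_within)
```

In practice a statistics package would normally be used instead; SciPy's `ttest_rel` and `f_oneway`, for example, also return p-values.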
4.2. Experimental Design
The overall aim of the experiment was to study the perceptions and
understandings of a certain measure. Following Creswell (2009), the
experiment was designed to test the effects of an intervention on the outcome
variable. The experimental design is illustrated in Fig. 2. The outcome
measures were all of the same kind (a judgment of difference) and referred to
the same contextual example. First, all respondents were asked to judge the difference
based on the data in a table (Fig. 2 step A). Second, an experimental
treatment was introduced: the first group was asked to consider the absolute
difference, the second group was asked to consider the relative difference
(ratio), and the third group was asked to consider the time distance measure
(Fig. 2 step B). The control group was not exposed to this manipulation. All
groups were then shown a graphical presentation of the same example (Fig. 2
step C), and the question that followed was based on all measures and the
graphical presentation (Fig. 2 step D). After each step, a simple measure of
cognitive burden was administered to all groups.
Fig. 2
The experimental design. Note: In the schematic presentation, only variable
labels and the consecutive numbering of the indicators (in brackets in each
label) are presented. The survey questionnaire has subsequently been
translated into English and is available for preview at
https://www.1ka.si/SURVEY. A screenshot of one step of the survey (Step D) is
presented in the “Appendix”
From this point forward, the four experimental groups are referred to by the
following abbreviations: ABS (experimental group 1, absolute difference),
REL (experimental group 2, relative difference), TIME (experimental group
3, time distance), and CONT (control group, no experimental manipulation).
4.3. Sampling
A non-probability sample was used for this experimental study, which is
usually the case with randomized experimental studies where relations and
causality are addressed, particularly in clinical studies and marketing research
but also in the social sciences (Schreuder et al. 2001). Thus, the focus here is
primarily on internal validity, while external validity (i.e., inference to the
population) cannot be formally assessed. However, extensive empirical
evidence shows that, while the sample may not be representative in
socio-demographic controls, the relationships and causality found are usually
very robust. Implicitly, in social research, this is also confirmed by the growing
use of internet/access panels (ESOMAR 2011), which are becoming the
prevailing method for surveying the general population. In addition, our own
experience of more than 15 years with RIS (www.ris.org) research, where
face-to-face and telephone surveys were extensively compared with
(reasonably spread) non-probability web surveys, shows that there is almost no
difference in averages and distributions for ordinal and ratio scale variables.
With shares (percentages), certain biases may appear towards the
characteristics of intensive Internet users (Vehovar et al. 1999). Thus, it is
highly likely that the relationships found in the experiment also hold true in the
general population, particularly for the differences in means of scale
variables.
The research was conducted in Slovenia, which is by all information society
indicators an average European country, with around 70 % of the population
aged 16–74 using the internet weekly (Eurostat 2013).
The sample (n = 146) consisted of predominantly young, educated adults
broadly recruited on the web and via social networks. The sampling started
with a small pool of the authors’ initial contacts and continued as snowball
sampling. The sample was quite balanced by gender
(44 % male and 56 % female), approximately 80 % of the respondents were
30 years old or younger (the rest were aged between 31 and 50 years), and the
majority were well educated (more than 50 % with a higher education, and
most of the others were students). There were no significant differences in
results regarding gender, age, or education, which further reinforces the
robustness of the findings, so we did not perform any
socio-demographic weighting. However, weighting was used to remove the
remaining random variation in differences of the initial perceptions (i.e.,
answers to the first question, step A). This way, the same initial distribution
on the first question was effectively assured for all groups. This was achieved
with simple post-stratification weighting, which adjusted the distributions of
the four groups on the first variable of the experiment to the known
average of all respondents on this question. As the sample was relatively
small, the focus was only on very strong effects, which can become visible
(statistically significant) even in a relatively small-scale study.
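The post-stratification step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the step A answers per group are invented, the function name `poststratification_weights` is hypothetical, and the weight for each answer category is assumed to be the pooled share of that category divided by its share within the group.

```python
from collections import Counter


def poststratification_weights(answers_by_group):
    """Weight per group and answer category: pooled share / within-group share."""
    pooled = Counter(a for answers in answers_by_group.values() for a in answers)
    total = sum(pooled.values())
    weights = {}
    for group, answers in answers_by_group.items():
        counts = Counter(answers)
        n = len(answers)
        weights[group] = {
            a: (pooled[a] / total) / (counts[a] / n) for a in counts
        }
    return weights


# Invented step A answers (1 = increasing, 2 = constant, 3 = decreasing).
step_a_answers = {
    "ABS": [1, 2, 2, 3],
    "REL": [1, 1, 2, 3],
}
weights = poststratification_weights(step_a_answers)

# Weighted shares within ABS; after weighting they equal the pooled shares.
abs_weighted = Counter()
for answer in step_a_answers["ABS"]:
    abs_weighted[answer] += weights["ABS"][answer]
abs_shares = {a: w / sum(abs_weighted.values()) for a, w in abs_weighted.items()}

print(weights)
print(abs_shares)
```

A category that no respondent in a group chose cannot be weighted up this way, so the sketch assumes every category occurs in every group.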
After respondents started the survey, they were randomly assigned to groups
(using the web application’s random function), so differences in the outcomes
can be attributed (besides random variation) only to the experimental
treatment.
In experiments, we either expose different units to different experimental
manipulations (between-group or independent design) or take a single group
of units and expose them to different experimental manipulations at different
points in time (a repeated-measures design). Here, a combination of both was
used as it is possible to compare the four groups with one another (based on
one different experimental treatment), but the differences within each group
can also be compared (based on the presumption that each step is an
experimental treatment, per se).
5. Results
The data for each experimental group are presented in a table of frequencies
across the main experimental factors (Table 2). The results of the analyses of
the differences between and within the groups are then presented. All the tests
where ANOVA was used were repeated using more robust
non-parametric tests (Kruskal–Wallis and Mann–Whitney tests). Where
ANOVA showed statistically significant differences, post hoc tests (Bonferroni
and Dunnett) were used to identify the pairs of groups for which the difference
was actually significant. Since one-way ANOVA and the non-parametric
tests led to identical conclusions, only the significant results of one-way
ANOVA are reported.
Table 2
Frequencies of the main experimental factors (weighted data)

Group  Step           n_i^b   %   n_c^b   %   n_d^b   %   Total n   %   Mean^a   SD
ABS    A—data(1)        4    10     21   56     13   34      38   100    2.3    0.62
       B—ABS(4)         2     5     23   61     13   34      38   100    2.3    0.56
       C—graph(10)      2     5     27   72      8   23      37   100    2.2    0.50
       D—all(13)        2     5     20   53     16   42      38   100    2.4    0.59
REL    A—data(1)        3    10     19   56     12   34      34   100    2.3    0.62
       B—REL(6)         5    14     10   28     20   58      35   100    2.4    0.74
       C—graph(10)      2     7     25   74      6   19      33   100    2.1    0.50
       D—all(13)        3    10     18   53     13   37      34   100    2.3    0.64
TIME   A—data(1)        4    10     23   56     14   34      41   100    2.3    0.62
       B—TIME(8)        7    17     21   52     13   32      41   100    2.2    0.69
       C—graph(10)      0     0     34   83      7   16      41   100    2.2    0.37
       D—all(13)        4    11     21   54     13   35      38   100    2.2    0.64
CONT   A—data(1)        3    10     20   56     12   34      35   100    2.3    0.62
       B (experimental treatment was not introduced to the control group)
       C—graph(10)      1     3     29   84      5   13      35   100    2.1    0.40
       D—all(13)        6    16     16   45     14   39      36   100    2.2    0.72

a Mean answer to the question “Is the difference increasing, constant, or decreasing?”, coded 1 (increasing), 2 (constant), and 3 (decreasing)
b n_i, n_c, n_d = weighted data for the categories of answers: n_i = increase, n_c = constant, and n_d = decrease
c Mean evaluation of the cognitive burden on the scale from 1 to 5, where 1 = very easy and 5 = very difficult
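As a reading aid for Table 2, the sketch below recomputes the mean answer and its spread from the frequency counts of one row (group ABS, step A). The counts are the rounded weighted values printed in the table. Treating the last figure in each row as a standard deviation, and dividing by n rather than n - 1, are assumptions made for this illustration.

```python
from math import sqrt

# Rounded weighted counts for group ABS at step A, as listed in Table 2.
# Answer codes: 1 = increasing, 2 = constant, 3 = decreasing.
counts = {1: 4, 2: 21, 3: 13}

n = sum(counts.values())

# Mean of the coded answers, weighting each code by its frequency.
mean_answer = sum(code * k for code, k in counts.items()) / n

# Standard deviation around that mean (dividing by n, an assumption).
variance = sum(k * (code - mean_answer) ** 2 for code, k in counts.items()) / n
sd_answer = sqrt(variance)

print(round(mean_answer, 2), round(sd_answer, 2))
```

The recomputed values come close to the 2.3 and 0.62 printed for this row; the small gap may reflect the rounding of the weighted counts.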