Auditing Wikipedia's Hyperlinks Network on Polarizing Topics
Cristina Menghini
DIAG
Sapienza University
Rome, Italy
menghini@diag.uniroma1.it

Aris Anagnostopoulos
DIAG
Sapienza University
Rome, Italy
aris@diag.uniroma1.it

Eli Upfal
Dept. of Computer Science
Brown University
Providence, RI, USA
eli@cs.brown.edu
ABSTRACT
People eager to learn about a topic can access Wikipedia to form a preliminary opinion. Despite the solid revision process behind the encyclopedia's articles, the users' exploration process is still influenced by the hyperlinks' network. In this paper, we shed light on this overlooked phenomenon by investigating how articles describing complementary subjects of a topic interconnect, and thus may shape readers' exposure to diverging content. To quantify this, we introduce the exposure to diverse information, a metric that captures how users' exposure to multiple subjects of a topic varies click-after-click by leveraging navigation models.
For the experiments, we collected six topic-induced networks about polarizing topics and analyzed the extent to which their topologies induce readers to examine diverse content. More specifically, we take two sets of articles about opposing stances (e.g., gun control and gun rights) and measure the probability that users move within or across the sets, by simulating their behavior via a Wikipedia-tailored model. Our findings show that the networks hinder users from symmetrically exploring diverse content. Moreover, on average, the probability that the networks nudge users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting subjects. Taken together, those findings return a new and intriguing picture of Wikipedia's network structural influence on polarizing issues' exploration.
KEYWORDS
Wikipedia, Hyperlinks Network, Polarization, User Behavior

ACM Reference Format:
Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal. 2021. Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. In Proceedings of The Web Conference 2021, April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Knowledge on Wikipedia is distributed across articles interconnected via hyperlinks. According to Wikipedia's Linking Manual [49], "Internal links can add to the cohesion and utility of Wikipedia, allowing readers to deepen their understanding of a topic by conveniently accessing other articles." Consequently, while reading an article, users are directly exposed to its content and indirectly exposed to the content of the pages it points to.
Wikipedia's pages are the result of the collaborative efforts of a committed community that, following policies and guidelines [4, 20], generates and maintains up-to-date and high-quality content [28, 40]. Even though tools support the community in curating pages and adding links, the community lacks a systematic way to contextualize the pages within the more general articles' network. Indeed, it is important to stress that having access to high-quality pages does not imply a comprehensive exposure to an argument, especially for a broader or polarizing topic.
Users use Wikipedia differently according to their information needs. Singer et al. [45] show that users curious about a topic explore it by browsing the encyclopedia. In fact, they rely on hyperlinks to find content that is correlated with or complementary to the subject of interest. Therefore, it is crucial to evaluate the extent to which the current link structure encourages users to browse related topics to develop a more comprehensive view and perspective of a subject. This theme becomes particularly important when users look for an overview of polarizing topics spanning multiple articles.
Wikipedia's Neutral Point of View (NPOV) encourages editors to work such that articles' content fairly and proportionately represents all the significant views that have been published by reliable sources on the subject [51]. Although the NPOV document gathers many suggestions to properly curate the direct content of pages, it does not refer to the impact links might have in determining users' exposure to indirect content.
Suppose we consider the topic abortion. It is a broad issue that is distributed across multiple articles on Wikipedia. Moreover, due to its polarizing nature, it is possible to recognize pages about events, people, subjects, or organizations that are associated with either pro-choice or pro-life positions. Users willing to learn about abortion might access the encyclopedia to collect information and then develop their own idea. Consider a user who enters the network by reading the article Abortion-rights movement, which portrays and outlines campaigns supporting abortion. We assume that the article's body does not endorse the page's subject, due to the NPOV principle. So we expect that the user acquires objective knowledge about organizations supporting abortion and maybe also realizes the existence of anti-abortion movements. Now imagine that the user decides to continue her exploration of the topic and, to do so, she follows the hyperlinks within the current page. If the linkage to pages regarding subjects close to the pro-life view is weak, our user has little chance of collecting the diverse views that contribute to developing a comprehensive perspective on the topic. It follows that the lack of sufficient linkage among pages expressing diverse stances on a topic can work against the NPOV's goals.

arXiv:2007.08197v4 [cs.SI] 8 Mar 2021
To the best of our knowledge, there are no studies that investigate the validity of the NPOV principles concerning users' exposure to indirect content (i.e., the content suggested by hyperlinks). Hence, analyzing Wikipedia's links is particularly important to understand whether broad topics, which conceptually span multiple articles, are effectively, proportionately, and fairly presented to readers, not only in terms of direct content (i.e., the article's body).
Previous works addressed the issue of users' polarization on social networks and showed that it is hard for users to interact with content created/shared by users of opposing views [1, 9, 10, 19, 24]. Ribeiro et al. [42] empirically showed that the YouTube recommender system contributes to radicalizing users' pathways. Given the nature and role of Wikipedia as a primary source of knowledge acquisition, broad exposure to different views of a topic appears critical to guaranteeing fair and balanced access to well-rounded knowledge.
This paper provides a first observational study on Wikipedia that aims to quantify how the hyperlinks' network topology can profoundly affect user exposure to diverse stances on polarizing topics. Obtaining a comprehensive view of the connections among Wikipedia pages, and of how they shape reader exposure to information, is a difficult task for humans. Therefore, it requires introducing algorithmic methods to audit and quantify the mutual level of exposure among articles of diverse content, especially for polarizing matters. That is fundamental for the improvement of the encyclopedia and its role in promoting a self-critical society.
By studying the hyperlinks network, we first aim to discover to what extent the network's topology pushes users to explore diverse content rather than keeping them within knowledge bubbles.^1 Secondly, we aim to gain insights that may help to design a system supporting editors in (1) contextualizing pages within the more general encyclopedia's network and (2) adding links connecting articles of opposing/complementary views.
In summary, this paper tackles the following research questions:
RQ1 How do readers consume articles about polarizing topics? (Sect. 5)
RQ2 To what extent does the hyperlinks' network expose readers to diverse information? (Sect. 6)
By answering them, we make the following contributions:
• We initiate a discussion that aims to shed light on the role that the hyperlink network plays in connecting articles belonging to different categories. We focus our work on analyzing this phenomenon on a set of polarizing topics, such as abortion, guns, and evolution.
• We define two metrics, the exposure to diverse information and the (mutual) exposure to diverse information, to quantify the strength of connections among sets of articles (e.g., pages about abortion-rights and anti-abortion). These metrics quantify to what extent the network topology assists readers in visiting pages of contrasting subjects, and whether it does so equally for all of them (see Sections 4.1, 4.1.2, and 4.2). To this end, they embed readers' possible behavior, relying on their behavioral patterns [43, 45], features determining the success of wikilinks [16, 31], and readers' clickstream data [53].
• We find that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous views than opposing stances. Moreover, we show that readers' interest is biased toward one side of the topic, based on the internal and external traffic on Wikipedia (see Sect. 4.1.1, 5, and 6).

^1 By knowledge bubbles we mean the sets of pages presenting one side of a contentious subject (i.e., pages about pro-life or pro-choice movements).
To our knowledge, this is the first work that analyzes Wikipedia's readers' exposure to diverse information through the link network. Before moving on, we want to emphasize that this work does not claim how the hyperlinks network should be; rather, we aim to study whether the current connections among articles hinder users from visiting complementary pages about a polarizing topic. Also, our conclusions come from a network-based analysis. More advanced investigation combining network properties and articles' content is left for future work. The code to replicate the paper is stored in an anonymous folder.^2
2 RELATED WORKS
We divide this paper's related work into four categories: Improving Wikipedia, Navigating Wikipedia, Wikipedia Categorization, and Polarization on Social Media.
Improving Wikipedia. The scientific community proposed semi-automated procedures to improve Wikipedia's quality. These works check the veracity of references [18, 41], suggest articles' structure [39], look for hoaxes [30], or recommend links [38, 54]. Although link recommendation tools enrich the editing process, they do not provide editors with a measure to evaluate the relationship among articles containing diverse opinions. In this work, we define such metrics (Sect. 4.2).
Wikipedia Navigation. The literature still lacks a model that generalizes Wikipedia's users' behavior. Previous studies [25, 27, 31, 46] focused on modeling and predicting human navigation inside Wikipedia, relying on traces from navigation games, i.e., Wikispeedia [43, 48] and WikiGame [13, 29, 46].^3 While such games provide valuable insights about how users exploit links to go from one concept to another, Singer et al. [45] and [15, 17] showed that users display different behavioral patterns depending on their information needs and the links' position within pages. Thus, we exploit the insights provided by Singer et al. [45] to define a general model mimicking localized and more in-depth topic exploration. We further enrich the model by characterizing users' next-link choices according to findings in [15, 17] (Sections 4.1.2 and 4.1.3).
Wikipedia Categorization. In this work, we need to collect articles expressing the distinct facets of a polarizing topic. Wikimedia provides a supervised classifier, i.e., ORES,^4 that, based on features derived from the articles' text, categorizes an article into a manually designed category taxonomy^5 [3]. Alternatively, one can use topic models [5, 6]. Unfortunately, none of the above approaches provides us with the required granularity. So we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts articles' content about polarizing topics (see Sect. 3).

^2 https://drive.google.com/drive/folders/1CJr_YiFE2YlyAtB9yKaGe8CLwVLWx9Ta?usp=sharing
^3 These games ask readers to go from one article to another using wikilinks.
^4 https://ores.wikimedia.org
^5 https://www.mediawiki.org/wiki/ORES/Articletopic#Taxonomy

Figure 1: From Wikipedia's graph to a topic-induced network. The image on the left shows the original Wikipedia graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the set of nodes that we use to build the topic-induced network $G$. The color red refers to the set of nodes $P$. We use blue to indicate $\bar{P}$, and green and yellow for $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edges' direction.
Polarization on Social Media. There is a large spectrum of works related to detecting [1, 10, 12, 19, 36], modeling, quantifying, and mitigating [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one, through a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network's topology impacts the users' exposure to diverse information when they navigate the graph.
Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles across different languages. Other studies highlighted differences between women's and men's biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.
3 DATA COLLECTION
To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers' interactions with the topic.

3.1 Topic Induced Networks
In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.
First, we consider the directed English Wikipedia graph $W = (A, L)$. The nodes of the graph are the encyclopedia's pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.^6 This set of links includes those in the infoboxes.^7
Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then we define the set of nodes $\mathcal{N}$ that includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) We want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic.^8 (2) When we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).
To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ into one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. For each node $v \in \mathcal{N}$, we can have multiple edges going to $s$. We compress them into a unique edge $(v, s)$. Likewise, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress these edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sum of the weights of the aggregated edges.
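The compression step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge dictionary, the `keep` set (standing for $\mathcal{T} \cup \mathcal{N}$), and the function name are hypothetical, and links between two collapsed pages are simply dropped because the text does not say how they are treated.

```python
from collections import defaultdict


def collapse_to_super_node(edges, keep, super_node="s"):
    """Merge every node outside `keep` into one super node, summing weights.

    edges: dict mapping (source, target) -> weight for the full graph W.
    keep:  set of nodes that stay untouched (here, the union of T and N).
    """
    collapsed = defaultdict(float)
    for (u, v), w in edges.items():
        u2 = u if u in keep else super_node
        v2 = v if v in keep else super_node
        if u2 == super_node and v2 == super_node:
            # Assumption: links between two collapsed pages are dropped.
            continue
        # Parallel edges to or from the super node are summed into one edge.
        collapsed[(u2, v2)] += w
    return dict(collapsed)
```

Summing the weights keeps the total weight leaving each retained node unchanged, so any later normalization into click probabilities is unaffected by the compression.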
Finally, we build a directed weighted network $G = (V, E)$ that we call the topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated with $G$, that is, a matrix such that each entry $m_{ij}$ is a probability, with $m_{ij} = 0$ if $(i, j) \notin E$, and such that $\sum_{j=1}^{n+1} m_{ij} = 1$. The entry $m_{ij}$ describes the probability that, being on article $i$, a reader clicks on page $j$. In Section 4.1.2 we propose different characterizations of the transition matrix.
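As a minimal sketch of the right-stochastic condition, the following Python turns non-negative edge weights into transition probabilities and checks that every row sums to 1. The nested-dictionary representation and both function names are assumptions made for illustration; missing keys play the role of $m_{ij} = 0$ for $(i, j) \notin E$.

```python
def to_right_stochastic(weights):
    """Row-normalize weights[i][j] >= 0 so that each row sums to 1."""
    matrix = {}
    for i, row in weights.items():
        total = sum(row.values())
        if total <= 0:
            # A page without outgoing weight cannot be normalized.
            raise ValueError(f"node {i!r} has no outgoing weight")
        matrix[i] = {j: w / total for j, w in row.items()}
    return matrix


def is_right_stochastic(matrix, tol=1e-9):
    """Check that entries are non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row.values())
        and abs(sum(row.values()) - 1.0) <= tol
        for row in matrix.values()
    )
```

How pages without outgoing links are handled is not stated in this section, so the sketch raises an error instead of guessing.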
Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.^9 From this dump we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint, and its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category "Anti-abortion movement" and its subcategories. For instance, the article "Fetal rights" is directly under the seed category, whereas the article "Crisis pregnancy center" is located under the subcategory "Anti-abortion organizations." The pro-choice corpus is collected in a similar fashion, starting from the category "Abortion-rights movement." Note that, because we want
^6 We exclude links within the same page. Moreover, while building the graph, we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining their connections to the rest of the graph.
^7 An infobox is a fixed-format table, usually added to the top right-hand corner of articles, to consistently present a summary of some unifying aspect that the articles share.
^8 For instance, articles that present an overall introduction/description of the topic.
^9 Unless differently specified, we refer to the dump of September 2020.
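The redirect resolution described in footnote 6 could look like the sketch below. The input shapes (a list of link pairs and a redirect dictionary) and the function name are hypothetical, and following chains of redirects is an assumption beyond the single-hop case the footnote describes.

```python
def resolve_redirects(edges, redirects):
    """Replace (u, r) and (r, v) with (u, v) for every redirect r -> v.

    edges:     iterable of (source, target) link pairs.
    redirects: dict mapping a redirect page to the page it points to.
    """
    def final_target(page):
        # Follow redirect chains, stopping if a cycle is detected.
        seen = set()
        while page in redirects and page not in seen:
            seen.add(page)
            page = redirects[page]
        return page

    resolved = set()
    for u, v in edges:
        if u in redirects:
            continue  # redirect pages are removed from the node set
        v = final_target(v)
        if v in redirects:
            continue  # unresolved (cyclic) redirect, skipped in this sketch
        if u != v:    # links within the same page are excluded
            resolved.add((u, v))
    return resolved
```

Returning a set also removes duplicate links that arise when several redirects lead to the same article.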
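| Topic | $\lvert V \setminus \{s\} \rvert$ | $\lvert P \rvert$ | $\lvert \bar{P} \rvert$ | $\lvert \mathcal{N} \rvert$ | $\lvert E \rvert$ | $\lvert E_{P \to \bar{P}} \rvert$ | $\lvert E_{\bar{P} \to P} \rvert$ | $\lvert E_{P \to \mathcal{N}} \rvert$ | $\lvert E_{\bar{P} \to \mathcal{N}} \rvert$ | $\lvert E_{\mathcal{N} \to P} \rvert$ | $\lvert E_{\mathcal{N} \to \bar{P}} \rvert$ | $Q(P, \bar{P})$ | Unreach($P$) (%) | Unreach($\bar{P}$) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abortion | 56861 | 481 | 291 | 56093 | 1.9M | 205 | 97 | 21843 | 14492 | 21396 | 29889 | 0.29 (0.30, 0.41) | 2.1 | 4.81 |
| Cannabis | 32743 | 45 | 231 | 32470 | 1.1M | 8 | 6 | 1089 | 15055 | 656 | 27823 | 0.27 (0.14, 0.03) | 11.36 | 3.49 |
| Guns | 65743 | 167 | 187 | 65393 | 2.5M | 98 | 115 | 18342 | 12304 | 56702 | 16608 | 0.26 (0.24, 0.30) | 3.63 | 0.00 |
| Evolution | 84788 | 342 | 1334 | 83113 | 1.99M | 391 | 135 | 18289 | 45472 | 15601 | 58720 | 0.20 (0.22, 0.27) | 1.69 | 5.62 |
| Racism | 129963 | 1024 | 1022 | 127953 | 4.8M | 746 | 560 | 64359 | 41566 | 74354 | 58195 | 0.32 (0.21, 0.31) | 2.72 | 2.55 |
| LGBT | 150563 | 459 | 640 | 149479 | 4.6M | 195 | 143 | 28100 | 22678 | 92975 | 81706 | 0.34 (0.30, 0.13) | 2.44 | 5.35 |

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.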
[Figure 2 plot: boxplots per topic (Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support); y-axis: Links position; legend: Opposite opinion, Same opinion.]

Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent the placement of links among pages belonging only to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with $\tanh$, so that links at the top of the page receive similar scores. The higher the value, the higher the link's position in the page.
$P$ and $\bar{P}$ to be disjoint, articles belonging to both "Anti-abortion movement" and "Abortion-rights movement" are assigned to $\mathcal{N}$.^10 Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different entities, such as organizations, people, and events. The inclusion of a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.
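The collection procedure described above (a seed category, its subcategories, and the assignment of shared articles to $\mathcal{N}$) can be sketched as a breadth-first traversal. The two dictionaries standing in for Wikipedia's category tree and the function names are hypothetical; the manual vetting of subcategories is not modeled here.

```python
from collections import deque


def collect_articles(seed, subcategories, members):
    """Gather all articles under `seed` and its subcategories (BFS).

    subcategories: dict category -> list of direct subcategories.
    members:       dict category -> list of articles directly in it.
    """
    seen, queue, articles = {seed}, deque([seed]), set()
    while queue:
        category = queue.popleft()
        articles.update(members.get(category, ()))
        for sub in subcategories.get(category, ()):
            if sub not in seen:  # guard against cycles in the category graph
                seen.add(sub)
                queue.append(sub)
    return articles


def split_partitions(seed_p, seed_p_bar, subcategories, members):
    """Return (P, P_bar, shared); shared articles are left for N."""
    a = collect_articles(seed_p, subcategories, members)
    b = collect_articles(seed_p_bar, subcategories, members)
    shared = a & b
    return a - shared, b - shared, shared
```

Removing the shared articles from both sides is what keeps $P$ and $\bar{P}$ disjoint.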
Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles that express an opinion or describe a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe; rather, they objectively talk about entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases like having articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.
3.2 General Statistics on Topics' Networks
Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia^11 and other resources that indicate controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support for LGBT rights vs. opposition to LGBT rights, and racism vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).

^10 We report the size of the intersections between partitions in the next section.
^11 https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues

| Topic | $P$ | $\bar{P}$ | Seed $P$ | Seed $\bar{P}$ |
|---|---|---|---|---|
| Abortion | Pro-life | Pro-choice | Anti-abortion movement | Abortion-rights movement |
| Cannabis | Prohibition | Activism | Cannabis prohibition | Cannabis activism |
| Guns | Control | Rights | Gun control advocacy groups | Gun rights advocacy groups |
| Evolution | Creationism | Evolutionary biology | Creationism | Evolutionary biology |
| Racism | Racism | Anti-racism | Racism | Anti-racism |
| LGBT | Discrimination | Support | Discrimination against LGBT people | LGBT rights movement |

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.
For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the actual opinion it represents.
Before presenting the general statistics of the retrieved networks, we remark that, when we assign the articles to partitions, we put into the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the number of common articles) are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, and racism 7. Recall that we do not remove these articles (i.e., they belong to $\mathcal{N}$), so they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than one click. Instead, when we consider the direct connections among partitions (one click), we discard them, since they are not explicitly categorized into one partition.
In Table 1 we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except racism and guns. This means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a viewpoint may span only a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, a more represented opinion has a higher chance of being randomly sampled.
The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution, the edges from creationism to evolutionary biology are ∼3 times as many as those in the opposite direction, and for LGBT, the edges from discrimination to rights are 36% more. Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.
Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ do not scale linearly with the number of edges that point out of, or to, the nodes in the partitions. For instance, the number of articles about pro-choice (291) is about 60% of the number related to the pro-life movement (481). Although pro-life has about 1.65 times as many nodes as pro-choice, the number of links pointing to pages about pro-choice is 36% higher than the number pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that one side of a topic is better blended into the network has implications for the readers' exposure to one of the two sides of the topic (Sect. 6).
We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest percentage of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).
Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1 we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, the partitions appear more modular.
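The text does not state which modularity variant is used. As an illustration only, the sketch below assumes a common directed, weighted form, $Q = \sum_c \left[ \frac{L_c}{m} - \frac{K_c^{out} K_c^{in}}{m^2} \right]$, where $m$ is the total edge weight, $L_c$ the weight of edges inside community $c$, and $K_c^{out}$, $K_c^{in}$ the total outgoing and incoming weight of its nodes. Restricting the computation to edges among labeled nodes is also an assumption, as are all names in the code.

```python
def directed_modularity(edges, community):
    """Directed, weighted modularity of a labeling (assumed variant).

    edges:     dict mapping (source, target) -> weight.
    community: dict mapping node -> label (e.g. "P" or "P_bar").
    """
    # Assumption: only edges between labeled nodes are considered.
    kept = {(u, v): w for (u, v), w in edges.items()
            if u in community and v in community}
    m = sum(kept.values())
    if m == 0:
        raise ValueError("no edges among labeled nodes")

    out_by_c, in_by_c, inside = {}, {}, 0.0
    for (u, v), w in kept.items():
        cu, cv = community[u], community[v]
        out_by_c[cu] = out_by_c.get(cu, 0.0) + w
        in_by_c[cv] = in_by_c.get(cv, 0.0) + w
        if cu == cv:
            inside += w  # weight that stays within a community

    # Expected within-community fraction under a degree-preserving null model.
    expected = sum(out_by_c[c] * in_by_c.get(c, 0.0) for c in out_by_c) / (m * m)
    return inside / m - expected
```

Passing the three differently weighted edge sets (uniform, position, clickstream) to the same function would yield three values per topic, mirroring the layout of the $Q$ column in Table 1.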
Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it with $\tanh$. In this way, we have values in $[0, 1]$, and the links at the top of the list get more similar scores. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to results in [15, 17]. If a link appears more than once, we average its positions.
In Figure 2 we show the position distributions. According to a t-test at the 0.95 confidence level (significance level $\alpha = 0.05$), the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.
For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the "article quality." We observe that, overall, for all the topics, between 60% and 70% of the articles are classified as stub or start class. Then 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.^12
4 METRICS
In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).

4.1 Content Consumption
To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.
4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].^13
From these data we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$; recall that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and $0$ otherwise. Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.
Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is
$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$
^12 https://en.wikipedia.org/wiki/Template:Grading_scheme
^13 A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
Click Through Rate (CTR). Given a page $i \in V$, the empirical
probability that a reader clicks a link within the page is
$$\mathrm{CTR}_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i} \qquad (2)$$
where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks
from the same page are counted as originating from different visits
to $i$ and thus counted multiple times in $c_i$.)
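As an illustration of Eqs. (1) and (2), the following minimal sketch computes both rates from a toy list of (source, target, clicks) records. The record layout, the page names, and the `EXTERNAL` set are assumptions made for the example; they are not the format of the Wikimedia dump.

```python
from collections import defaultdict

# Hypothetical toy clickstream: (source, target, clicks).
# Sources listed in EXTERNAL stand for pages outside Wikipedia.
CLICKS = [
    ("search", "A", 60),
    ("B", "A", 40),
    ("A", "B", 25),
    ("A", "C", 5),
    ("search", "B", 75),
]
EXTERNAL = {"search"}


def rsr_and_ctr(clicks, external):
    """Return ({page: RSR}, {page: CTR}) following Eqs. (1) and (2)."""
    total_in = defaultdict(int)     # c_i: all clicks arriving at page i
    external_in = defaultdict(int)  # sum over j of delta_j * c_ji
    total_out = defaultdict(int)    # sum over N_out(i) of c_ij
    for source, target, count in clicks:
        total_in[target] += count
        if source in external:
            external_in[target] += count
        else:
            total_out[source] += count
    rsr = {page: external_in[page] / total for page, total in total_in.items()}
    ctr = {page: total_out[page] / total for page, total in total_in.items()}
    return rsr, ctr


rsr, ctr = rsr_and_ctr(CLICKS, EXTERNAL)
print(rsr["A"], ctr["A"])
```

In this toy data, page A receives 100 clicks, 60 of them external, and sends 30 clicks onward, so its RSR is 0.6 and its CTR is 0.3.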
4.1.2 Model Clicks Within Pages. When readers visit a page, they
have the possibility of clicking any of the present links. However,
according to the information needs they want to satisfy, each of the
links may have a different probability of being clicked [45]. Now
we propose three models to describe the probability distribution
of clicking a link $j$ within an article $i$. First, let $i$ be an article
in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among
all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher
value indicates a higher ranking position. Moreover, we introduce
$\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking
positions to values between 0 and 1, such that links at the top of the
page get similar scores.
The Clicks Within Pages (CwP) models are directly applicable on
$G$ by setting the transition matrix $M$ in one of the following modes:
(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics
readers who click each link in a page uniformly at random.
(2) $M^p$ (Position), whose entry
$m(i, j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$
captures the scenario in which readers click with higher
probability links appearing first in the page. This model is
based on previous work that shows how the link position is
a good predictor of the success of a link [16, 31].
(3) $M^c$ (Clicks), whose entry
$m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents
the empirical probability that users in $i$ will click the link
toward $j$. When $c_{ij} < 10$, we substitute it with 10 (see footnote 14),
the minimum number of times the link must be clicked to be
included in the dataset [53].
For the sake of completeness, we recall that $G$ includes a super
node $s$. To fill its corresponding entries in the transition matrices,
we need to aggregate over the edges we compressed to build the
graph (see footnote 15 and Sect. 3.1).
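The three CwP models can be sketched as functions that build one row of the transition matrix for a single page. This is a minimal illustration under stated assumptions: links are given in top-to-bottom order with $pos(j|i)$ starting at 1, each target appears once per page, and the fallback to a uniform row when all weights are 0 is an addition of this sketch (the text does not cover that case). The super node $s$ is not handled.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1.
    Falls back to uniform when every weight is 0 (assumption of this sketch)."""
    total = sum(weights.values())
    if total == 0:
        return {j: 1 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}


def uniform_row(links):
    """M^u: every link on the page is equally likely."""
    return normalize({j: 1.0 for j in links})


def position_row(links):
    """M^p: `links` is ordered top to bottom, pos(j|i) = 1 for the first link.
    r(j|i) = |N_out(i)| - pos(j|i), and the weight of j is tanh(r(j|i))."""
    n = len(links)
    return normalize({j: math.tanh(n - pos) for pos, j in enumerate(links, start=1)})


def clicks_row(links, clicks, floor=10):
    """M^c: empirical click shares; counts below `floor` are raised to it."""
    return normalize({j: max(clicks.get(j, 0), floor) for j in links})


links = ["B", "C", "D"]  # hypothetical out-links of one page, top to bottom
print(uniform_row(links))
print(position_row(links))
print(clicks_row(links, {"B": 70, "C": 20}))
```

With the definition of $r(j|i)$ as written, the last link of a page has $r = 0$ and therefore weight $\tanh 0 = 0$ under the position model, which the sketch reproduces.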
4.1.3 Readers Navigation Model. The main goal of this paper is
to audit the mutual exposure to diverse information across $P$ and
$\bar{P}$. We can do it by simply looking at a snapshot of the graph and
counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step
further, we recall that Wikipedia's network is conceived to let
users move while fulfilling their own information needs. Thus, we want
to understand how different users' navigation behaviors can affect
readers' exposure to diverse information.
To do that, it would be optimal to have access to users' session
logs. Because these data are not available to the public, we define a
parametric model that simulates users' navigation by embedding
14 We aim to model users on the current version of Wikipedia. Thus, to include all
the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This
implies a small probability of clicking these links. Setting the smoothing factor to 10
is a deliberate choice. However, we experimentally verified that setting any number
between 1 and 10 does not affect the results.
15 The computation of these quantities is straightforward, so we omit it from the
body of the paper.
Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.
different behaviors according to the chosen parameter. We emphasize
that the scope of this model is not to perfectly replicate users'
behavior on Wikipedia. Rather, we want to see how users simulated
from a reasonable and general model are exposed to diverse
information.
In other words, we want to define a stochastic process with $n + 1$
states, corresponding to the $n + 1$ pages in $V$, that approximates the
probability of reaching any of the articles starting at random from
$p \in P$ (or from $\bar{P}$).
We model this by considering the process $X^\ell$, $\ell = 0, 1, \ldots, L$, on
the set of nodes $V$ induced by transition matrix $M$, with starting state
$X^0$ selected from the probability distribution
$\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$
over $V$. We recall that the transition matrix $M$ can vary according to
the CwP models (Sections 3.1 and 4.1.2). Based on the assumption
that users' session length (the number of clicks) is finite, we evaluate
the process on a finite number of steps $L$. We have that
$\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector
$\boldsymbol{\pi}^\ell_P$ is given by the following
variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M_0$ be the transition matrix
embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the
starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have
$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0 \qquad (3)$$
and, for $\ell \ge 1$,
$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_P \cdot M_\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_\ell), \qquad (4)$$
where $M_\ell = \mathrm{norm}((D\,(M_{\ell-1})^T)^T)$ and
$D = \mathrm{diag}(\mathbf{1} + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$.
$\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by
normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank)
model, with the difference that the transition matrix is updated
in each step. It takes into account the probability that an article
has already been visited in a previous iteration. Specifically, the
vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the
likelihood that each node is reached at step $\ell$ if the session starts
uniformly at random from a node in $P$. We assume that readers within the same
session do not click the same link more than once. Thus, we desire
that at step $\ell + 1$ the nodes that are clicked with high probability
at step $\ell$ see their probability of being reached deflated, and those
with lower probability have more chances of being clicked. We
achieve this by dividing the rows of $M$ element-wise by the vector of
probabilities $\boldsymbol{\pi}^\ell_P + \mathbf{1}$, where $\mathbf{1}$ is a smoothing
factor to avoid divisions by 0, and
then normalizing the matrix to get the updated stochastic matrix to
use in the next iteration.
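The recursion of Definition 1 can be sketched in plain Python as follows. This is an illustrative reading of Eqs. (3) and (4), not the authors' implementation: matrices are lists of rows, the function names are hypothetical, and rows with no out-links are left at zero (a case the definition does not discuss). Because $D$ is diagonal, $(D\,M^T)^T = M\,D$, so the update divides column $j$ of the previous matrix by $1 + \pi^{\ell-1}_j$ before renormalizing the rows.

```python
def normalize_rows(matrix):
    """Make each row sum to 1; all-zero rows are left untouched (assumption)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total for x in row] if total > 0 else list(row))
    return result


def vec_mat(vector, matrix):
    """Row vector times matrix."""
    n_cols = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(n_cols)]


def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] following Eqs. (3) and (4)."""
    matrix = normalize_rows(m0)
    previous, current = pi0, vec_mat(pi0, matrix)  # pi^0 and pi^1
    history = [current]
    for _ in range(1, steps):
        # M_l: divide column j of M_{l-1} by (1 + pi^{l-1}_j), renormalize rows.
        matrix = normalize_rows(
            [[x / (1 + previous[j]) for j, x in enumerate(row)] for row in matrix]
        )
        walk = vec_mat(current, matrix)   # pi^l . M_l
        restart = vec_mat(pi0, matrix)    # pi^0 . M_l
        nxt = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]
        previous, current = current, nxt
        history.append(current)
    return history


# Hypothetical 3-page graph in which every page links to the other two.
M0 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
PI0 = [1.0, 0.0, 0.0]  # all sessions start from page 0
print(navigate(M0, PI0, alpha=0.0, steps=3))
print(navigate(M0, PI0, alpha=1.0, steps=3))
```

With `alpha=1.0` every step is taken from the starting page, so the vector stays on its neighbors; with `alpha=0.0` the walk spreads across the graph.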
Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.
Overall, as we will later see in Section 4.2, this approach allows us
to investigate how the exposure to diverse information varies for
users who behave differently in terms of navigation session length
(meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$, Figure 4(a), the model emulates the reader
whose navigation consists in just opening links from the
starting page. We call this behavior star-like. With this
kind of exploration, readers locally explore articles that are likely
semantically related to each other [49].
• For $0 < \alpha < 1$, Figure 4(b), we simulate two cases: (1) readers
open sequential articles and then jump back to the starting
page; (2) readers keep multiple paths open. The closer $\alpha$ is
to 1, the more users show a star-like behavior. Instead,
the closer $\alpha$ is to 0, the more users navigate in a
DFS-oriented fashion. Thus, readers move randomly
according to the CwP model and, from time to time, jump
back to the starting page.
• If $\alpha = 0$, Figure 4(c), the users sequentially click links, so
each click depends only on the CwP model. In this case,
especially if related articles are not densely connected, the
exploration can lead to articles less related to the starting
page, and returning to the origin by following hyperlinks may
be difficult.
Because Wikipedia does not have a button that allows readers to
go back to the previous page, we assume the jumping-back action
to consist in clicking the back button of the browser in use until
reaching again the session starting page. The restart parameter
indirectly embeds the back-button action, which, given the absence of
back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those
described in [43, 45].
4.2 Exposure Metrics
At this point we have all the ingredients to define the exposure
to diverse information. The metrics aim to quantify how much the
network structure allows readers to reach one or multiple sets of
articles. To do that, we rely on both the CwP and Navigation models.
The application of the following metrics is not limited to polarizing
topics. In fact, they can generalize to the analysis of any sets of
nodes in a graph. For this reason, we adopt a more general notation
in their definition.
Figure 5: Pageviews distribution (y-axis: Log(Pageviews); groups: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support). For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given
two sets of pages $P$, $\bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each
article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from
a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \qquad (5)$$
and describes the probability that a reader in $P$ reaches an arbitrary
node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and
the CwP model is $M^u$ (see Sect. 4.1.2), it only quantifies
the topological property of the network to connect pages
belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter
and model that we pick, the metric tells us how the readers
characterized by the CwP and Navigation models change
their exposure to diverse information over a session (i.e., a
sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple
sets. Consider the case where we want to understand how one set of
nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate
the ExDIN, if we want to know the total exposure to the three sets,
we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN
w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take
$\boldsymbol{\pi}^\ell_P$ and sum
up the probabilities of the nodes within each set.
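A minimal sketch of Eq. (5) and of its extension to several target sets follows. The probability vector and the index sets are made-up values for illustration only; in practice $\boldsymbol{\pi}^\ell_P$ would come from the navigation model.

```python
def exdin(pi, target):
    """Eq. (5): probability mass that pi^l_P places on the pages in `target`."""
    return sum(pi[j] for j in target)


# Hypothetical pi^l_P over six pages (indices 0..5), started from P = {0, 1}.
pi_step = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]
Q, Z, L = {2}, {3, 4}, {5}

# Exposure to each set separately, and to their union.
per_set = {name: exdin(pi_step, s) for name, s in (("Q", Q), ("Z", Z), ("L", L))}
total = exdin(pi_step, Q | Z | L)
print(per_set, total)
```

When the target sets are disjoint, as here, the per-set exposures add up to the exposure to their union.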
Now that we have a metric to compute the exposure to diverse
information, we want to compare the flows among the sets. Thus,
we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)).
Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$
and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1] \qquad (6)$$
Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from opposite partition). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.
This measure quantifies to what extent the exposure to diverse
information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving
from one set to the other are. In this case, the network topology does
not favor connections from one set to the other. On the other hand,
if $\epsilon$ is close to 0, the network structure tends to favor either the
navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we
observe a tendency in the network to facilitate the exploration
from one of the sets to the other, we may say that the network
topology is biased toward a direction. Thus, we can think of using
M-ExDIN to measure the bias in the network w.r.t. two sets of
nodes.
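The ratio in Eq. (6), including the convention for a zero exposure, can be written as a small helper. The function name and the two input values below are hypothetical.

```python
def mutual_exdin(e_p_to_pbar, e_pbar_to_p):
    """Eq. (6): min/max ratio of the two exposures, set to 0 if either is 0."""
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    if low == 0:
        return 0.0
    return low / high


# Hypothetical exposures: one direction is three times as likely as the other.
balance = mutual_exdin(0.002, 0.006)
print(balance)
```

A value near 1 means the two directions are about equally likely; the example above, where one direction is three times as likely, gives a value of about one third.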
Even though the mutual exposure to diverse information captures
the balance among the ExDIN of $P$ and $\bar{P}$ when they are of
comparable size, it may fail if one is much smaller than the other. For
instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both
partitions have a similar out-degree distribution, one would expect
$e^\ell_{P \to \bar{P}} \approx 10 \cdot e^\ell_{\bar{P} \to P}$
and, as a result, $\epsilon \approx 0.1$. The same happens if
they have a similar in-degree distribution. For this reason, when we
compute either ExDIN or M-ExDIN, we check whether the sizes
of the communities are unbalanced, and we proceed as follows. If
$|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$.
We then use the new set for all computations. Because of the
randomness of the phenomenon, we repeat the measurements multiple
times.
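The balancing step can be sketched as repeated down-sampling of the larger set. The helper below is an assumption-laden illustration: `measure` stands for any function that computes ExDIN or M-ExDIN on two equally sized sets, and the number of runs, the seed, and the use of mean and standard deviation as summaries are choices of this sketch.

```python
import random
import statistics


def balanced_runs(small, large, measure, runs=100, seed=0):
    """Repeatedly sample |small| pages from `large`, apply `measure`,
    and return the mean and standard deviation over the runs."""
    rng = random.Random(seed)
    pool = sorted(large)  # fixed order so that a seed reproduces the samples
    values = []
    for _ in range(runs):
        sample = rng.sample(pool, len(small))
        values.append(measure(small, sample))
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical page identifiers; the toy measure only checks the size balance.
P = {"p1", "p2", "p3"}
P_BAR = {f"q{k}" for k in range(30)}
mean, std = balanced_runs(P, P_BAR, lambda a, b: len(b) / len(a))
print(mean, std)
```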
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content,
we investigate how they have consumed each of the six topics
that we concentrate on over the last four years. In particular, we
collect monthly clickstream data from November 2017 to September
2020. We note that when we count the click views of a page, we
consider the average over the number of months the page existed (see footnote 16).
Accordingly, when computing the occurrences for the transition
matrix based on clickstream, we consider the average clicks of the
link over the number of months it exists. In this way, we reduce
the seasonality effect and weight links according to page changes
in terms of hyperlinks.
16 Based on the temporal graphs extracted by [11].
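The averaging over the months in which a page or link existed can be sketched as below. Encoding a month of non-existence as `None` is an assumption of this example; a month in which the page existed but received no clicks is encoded as 0 and still counts in the denominator.

```python
def monthly_average(monthly_counts):
    """Average over the months in which the page (or link) existed.
    `None` marks a month in which it did not exist (hypothetical encoding)."""
    observed = [c for c in monthly_counts if c is not None]
    return sum(observed) / len(observed) if observed else 0.0


# Hypothetical link created in the third month of a five-month window.
avg = monthly_average([None, None, 120, 80, 100])
print(avg)
```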
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a
page has been visited over 34 months. In Figure 5, we plot the
log-distributions of the pageviews for each topic and opinion. By
running a t-test, we conclude that, for all topics except for abortion,
the difference of the means of opinions' pageviews is statistically
significant for $p < 0.05$. This finding demonstrates that users tend to
visit more pages expressing/supporting one of the two viewpoints.
From a network's perspective, to increase the exposure to opposing
opinions, it is desirable for pages that are frequently visited to be
well connected to articles expressing opposing opinions.
In Figure 6, we break down the pageviews, showing how many
of them come from pages of the opposing partition. Overall, the
fraction of visits from the opposite side is low (below 0.5%). The
category LGBT rights has the highest ratio of visits from LGBT
discrimination pages, about 2.5%. For topics such as guns and abortion,
the percentage of visits from the opposite partition shows that there
are somewhat fewer visits to pages of a liberal inclination from
articles expressing a more conservative opinion. In fact, 0.28%
of visits to pro-choice come from pro-life, compared to 0.6% of visits to
pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As
introduced in Section 4.1.1, from the clickstream data we can
compute the RSR, which indicates whether a page is accessed more
by external sources or by navigating Wikipedia. In Figure 7, we
provide a visualization that depicts the flows of the cumulative
visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of views come from
internal pages. The click stream from internal pages is broken down
to see the proportion of flow towards gun control and gun rights.
The internal views of gun control articles are 3.4 times more than
those of gun rights. We observe that also from external websites
most of the traffic is towards gun control (2.7 times more than gun
rights). Overall, 26% of the total visits to gun-related content is
concentrated on gun rights.
The abundance of traffic towards one of the two opinions does
not characterize only the guns topic. Indeed, among all the topics,
59–74% of visits is accumulated by one partition. Moreover,
readers' preferences appear consistent among external and internal
accesses; that is, they both point more towards the same view of
the topic. For both internal and external views, the distribution
of accesses toward partitions is approximately the same (i.e., the
percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that
from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose
external visits to creationism are 453 lower than internal accesses.
We note that partitions with higher views are not necessarily the
biggest in the topic-induced networks.
In general, the largest amount of visits to topics' articles comes
from external pages. In particular, only 23.6% and 33.5% of traffic
to evolution and racism is generated by internal Wikipedia
Figure 7: Cumulative pages' traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.
Figure 8: Average click-through rate (x-axis: Avg(Click Through Rate) × 100). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
navigation. The same quantity for the remaining topics
ranges between 44% and 47%.
We point out that readers consulting articles about abortion,
cannabis, and guns are inclined toward pages conveying liberal
views on the topic. Instead, it is more complicated to draw
interpretations about the remaining topics. One explanation may be
that users look for information generally less covered in the public
mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links.
We want to understand how frequently they do so. For that, we
compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the
percentage of accesses turning into a visit to another page ranges
between 10% and 28%. Dimitrov and Lemmerich [14] observed that the
CTR average for the whole Wikipedia is 12%. So, most of the subsets
of pages we consider have a CTR higher than Wikipedia's average.
The CTR of gun control is the highest (28%); the pages about racism
follow with 26%. The articles that over the years have generated
less internal traffic are those about evolutionary biology and LGBT
rights.
Examining pages' connections, we found that those with higher
CTR have more links (the Pearson correlation coefficient is 0.52).
Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89   90.82   57.64   70.26   89.81   45.37
Cannabis   70.81   95.01   37.50   65.78   96.52   16.67
Guns       52.34   78.69   35.68   59.63   75.35   39.28
Evolution  71.15   84.49   64.47   72.69   99.00   55.13
Racism     56.36   88.41   34.32   71.87   90.63   65.44
LGBT       61.66   89.42   52.5    72.42   92.52   59.17
Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
This suggests that articles having higher out-degree offer more
options to users. Presumably because of this, users are more likely
to continue the exploration from those articles.
In addition, we count the number of links clicked fewer than 10
times over the last three years; see Table 3. As an example, given
a page in creationism, on average 71.15% of its links have been
clicked fewer than 10 times. If we distinguish between references
to creationism and references to evolution, readers did not click
84.49% of links pointing to creationism and 64.47% of those
pointing to evolutionary biology.
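The percentages described above can be reproduced on toy data with a helper like the following. The data layout (a dictionary of out-links per page and a dictionary of click counts per link) and all names are hypothetical, and skipping pages that have no link toward the chosen target set is an assumption of this sketch.

```python
def unclicked_share(pages, links, clicks, targets=None, threshold=10):
    """Average, over `pages`, of the percentage of links clicked fewer than
    `threshold` times; `targets` restricts the links to one destination set."""
    shares = []
    for page in pages:
        out = [j for j in links[page] if targets is None or j in targets]
        if not out:
            continue  # assumption: pages without matching links are skipped
        rare = sum(1 for j in out if clicks.get((page, j), 0) < threshold)
        shares.append(100 * rare / len(out))
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical topic: P = {a, b}, opposite set = {x, y}; "c" is a neutral page.
LINKS = {"a": ["b", "c", "x"], "b": ["a", "x", "y"]}
CLICK_COUNTS = {("a", "b"): 50, ("a", "x"): 3, ("b", "x"): 20}
P_SET, P_BAR_SET = {"a", "b"}, {"x", "y"}

all_links = unclicked_share(P_SET, LINKS, CLICK_COUNTS)
to_same = unclicked_share(P_SET, LINKS, CLICK_COUNTS, targets=P_SET)
to_opposite = unclicked_share(P_SET, LINKS, CLICK_COUNTS, targets=P_BAR_SET)
print(all_links, to_same, to_opposite)
```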
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent
the current Wikipedia topology supports users in exploring diverse
facets of polarizing issues. In particular, we study (1) how readers are
locally exposed to diverse information and (2) how their exposure
to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's
topology, we compute the exposure to diverse information for $\ell = 1$
using the uniform CwP model. Recalling that $\ell$ indicates the users'
session length, if we set it equal to 1, we study the exposure to
diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for
all the topics when CwP is $M^u$.
For instance, let evolution be the topic we analyze. If readers
start uniformly at random from a page about creationism, the
probability of visiting an article of the same partition is 5.76%. On the
contrary, the chances of entering a page about evolutionary biology
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$, $\bar{P} \to P$. The y-axis indicates the source and the x-axis is the destination. To each row corresponds the exposure to diverse information computed for a different CwP model (Uniform, Position, Clicks); the columns are the topics (abortion, cannabis, guns, evolution, racism, LGBT). Darker colors indicate higher probability of being in the corresponding square in one click.
is 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology
have a 3.27% chance of reading pages conveying the same opinion.
This probability is 5 times larger than that of visiting creationism
pages.
It is worth pointing out that the current network's topology not
only nudges users into reading more about the same opinion, but
also hinders them from exploring diverse content symmetrically. Indeed,
users reading about evolutionary biology have higher chances of
reading one article about creationism (3 times more) than users from
creationism of reading about evolutionary biology. After repeating
the same analysis for all the topics, we realize that the
aforementioned observations hold for most of them. Moreover, we note that
the probability of continuing the session by reading a page of the same
opinion is greater for one of the two partitions of a given topic.
Taken together, these measurements highlight that the structure
of the network facilitates users' exploration of knowledge bubbles of
homogeneous view and makes the measure of mutual exposure to
diverse information smaller than 1.
The findings above report the intrinsic capability of the network
to spur users towards diverse content. If we want to combine it
with readers' next-click choice behavior, we use the position
and clicks CwP models instead of the uniform one. We show the results
in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix
corresponding to the ExDIN computed using the position CwP model
(second row). We see that if users click with higher chances links
at the top of the page, the probabilities are only slightly modified
w.r.t. the uniform model. These modest variations are coherent with
links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more
significant role, worsening the user exposure to diverse information.
Indeed, in pages about gun control, links belonging to the gun
rights partition seem to be mentioned later in the page. The
consequence is that the probability of reaching an article supporting
gun rights starting uniformly at random from an article in gun
control has a 30% drop w.r.t. the probability observed using the
uniform CwP model. Therefore, we conclude that for some topics
the placement of links within pages contributes to reducing the
exposure to diverse information. In other words, users who tend
to click with higher probability the links located towards the
beginning of a page have less possibility to read about contrasting
opinions.
Finally, we analyze how the phenomenon changes when we use
the clicks CwP model (third row). In this case, we assume readers
make the next-click choice similarly to past users. Going back to
evolution, we immediately observe that the probability of starting a
session in creationism and continuing to read about it after one
click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we verify a significant increment of the
probability of visiting pages of the same opinion. Simply interpreting this
result, we can say that real users click more on the links strictly related
to the page they are reading. From another perspective, combining
this finding with the previous remark that the "topology of
the network seems to drive users to explore knowledge bubbles of
homogeneous view", we ask the reader the following question: Has
the behavior of past users been influenced by the network topology?
Unfortunately, because of lack of information, we cannot answer
this question, but we hope it will be addressed in future work.
Furthermore, we observe that for some topics, like abortion, the
probability of reaching pro-choice articles from pro-life doubles.
This is a sign that users may be willing to explore content proposing
diverse views.
Before moving to the next section, we want to underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia's Linking Manual [49],
editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with
metrics and tools making them aware of the effect that new or current
links have on users' exposure to diverse information. This is not
meant to alter the core and essential intrinsic property of Wikipedia,
but rather to prevent this property from becoming harmful when it keeps
users from accessing diverse content.
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$ (x-axis: Number of Clicks; panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; curves: uniform, position, and clicks CwP models for $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$). The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. According to the combinations of models employed to
measure the ExDIN and M-ExDIN, we provide different insights about
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can
notice that each of the topics shows its own trends. For this reason,
we decide to highlight and provide an explanation for the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
above in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends; depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.
(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when, in the first iterations, the probability of clicking links to the opposite side becomes higher or lower, respectively. This happens because this kind of user is only subject to the exposure of the starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks the ExDIN is lower than at the beginning of the session, and then the trend inverts. In one case, the ExDIN reaches or exceeds the starting exposure; in the other, it grows and becomes steady below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., for one direction of the LGBT ExDIN) the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.
(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
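To make the role of the restart parameter $\alpha$ in patterns (1) to (4) more concrete, the following Python sketch simulates sessions on a small toy graph. It is only an illustration under stated assumptions: at each click the reader returns to the starting page with probability `alpha` before choosing the next link, links are chosen uniformly at random, and exposure is estimated as the per-click fraction of sessions that land in the opposite partition. The toy graph, the function name `simulate_exposure`, and this estimator are hypothetical and are not the paper's definition of ExDIN (Sect. 4.2).

```python
import random


def simulate_exposure(graph, start, target, alpha, clicks, sessions=2000, seed=0):
    """Estimate, for each click, the probability of landing on a page in `target`.

    Assumption: `graph` maps each page to a non-empty list of out-links.
    """
    rng = random.Random(seed)
    hits = [0] * clicks
    for _ in range(sessions):
        current = start
        for step in range(clicks):
            # With probability alpha the reader goes back to the starting
            # page before clicking (alpha = 1 gives star-like navigation,
            # alpha = 0 gives a plain random walk).
            if rng.random() < alpha:
                current = start
            # Uniform next-click choice among the out-links (assumption).
            current = rng.choice(graph[current])
            hits[step] += current in target
    return [h / sessions for h in hits]


# Hypothetical toy network: pages a* on one side, b* on the other, n1 neutral.
toy_graph = {
    "a1": ["a2", "n1"],
    "a2": ["a1", "b1"],
    "n1": ["b1", "a1"],
    "b1": ["b2", "n1"],
    "b2": ["b1"],
}
side_b = {"b1", "b2"}

star = simulate_exposure(toy_graph, "a1", side_b, alpha=1.0, clicks=5)
walk = simulate_exposure(toy_graph, "a1", side_b, alpha=0.0, clicks=5)
print("star-like:", star)
print("random walk:", walk)
```

In this toy example the starting page links to no page of the opposite side, so the star-like reader is never exposed to it, whereas the random walker reaches it from the second click onward.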
Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the guns rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many steps becomes high because of the larger number of incoming edges of the partition.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the Mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100, meaning that for none of them does the network topology provide an equal exposure across opinions. For topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are the following: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network and ensure mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis of how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics that span sets of articles. We formalize the problem by introducing two metrics: the Exposure to diverse information and the Mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study the users' overall consumption of the topics. Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than articles of the opposite view. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater, albeit small, tendency to visit pages of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.
In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles alongside the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in the light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate them, and our hope is that this work is a step toward this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA, "Algorithmic and Mechanism Design Research in Online Markets," and MIUR PRIN project ALGADIMAR, "Algorithms, Games, and Digital Markets."
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
- 3.1 Topic Induced Networks
- 3.2 General Statistics on Topics Networks
- 4 Metrics
- 4.1 Content Consumption
- 4.2 Exposure Metrics
- 5 RQ1: Readers Topic Consumption
- 5.1 Pageviews Distribution
- 5.2 External or Internal Access to the Topic
- 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
- 6.1 Exposure to Diversity
- 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
follows that the lack of sufficient linkage among pages expressing diverse stances of a topic can work against the NPOV's goals.
To the best of our knowledge, there are no studies that investigate the validity of the NPOV principles concerning users' exposure to indirect content (i.e., the content suggested by hyperlinks). Hence, analyzing Wikipedia's links is particularly important to understand whether broad topics, which conceptually span multiple articles, are effectively, proportionately, and fairly presented to readers, and not only in terms of direct content (i.e., the article's body).
Previous works addressed the issue of users' polarization on social networks and showed that it is hard for users to interact with content created or shared by users of opposing views [1, 9, 10, 19, 24]. Ribeiro et al. [42] empirically showed that the YouTube recommender system contributes to radicalizing users' pathways. Given the nature and role of Wikipedia as a primary source of knowledge acquisition, the lack of broad exposure to different views of a topic appears to be a critical obstacle to guaranteeing fair and balanced access to well-rounded knowledge.
This paper provides a first observational study on Wikipedia that aims to quantify how the topology of the hyperlink network can profoundly affect users' exposure to diverse stances on polarizing topics. Obtaining a comprehensive view of the connections among Wikipedia pages, and of how they shape readers' exposure to information, is a difficult task for humans. It therefore requires algorithmic methods to audit and quantify the mutual level of exposure among articles of diverse content, especially for polarizing matters. This is fundamental for the improvement of the encyclopedia and its role in promoting a self-critical society.
By studying the hyperlink network, we first aim to discover to what extent the network's topology pushes users to explore diverse content rather than keeping them within knowledge bubbles.¹ Second, we aim to gain insights that may help design a system supporting editors in (1) contextualizing pages within the more general encyclopedia network and (2) adding links that connect articles of opposing or complementary views.
In summary, this paper tackles the following research questions:
RQ1: How do readers consume articles about polarizing topics? (Sect. 5)
RQ2: To what extent does the hyperlink network expose readers to diverse information? (Sect. 6)
By answering them, we make the following contributions:
• We initiate a discussion that aims to shed light on the role that the hyperlink network plays in connecting articles belonging to different categories. We focus our work on analyzing this phenomenon on a set of polarizing topics, such as abortion, guns, and evolution.
• We define two metrics, the exposure to diverse information and the (mutual) exposure to diverse information, to quantify the strength of connections among sets of articles (e.g., pages about abortion-rights and anti-abortion). These metrics quantify to what extent the network topology assists readers in visiting pages of contrasting subjects and whether it does so equally for all of them (see Sections 4.1, 4.1.2, and 4.2). To this end, they embed readers' possible behavior, relying on their behavioral patterns [43, 45], the features determining the success of wikilinks [16, 31], and readers' clickstream data [53].
• We find that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous view than opposing stances. Moreover, we show that readers' interest is biased toward one side of the topic, based on the internal and external traffic on Wikipedia (see Sects. 4.1.1, 5, and 6).
¹ By knowledge bubbles we mean the sets of pages presenting one side of a contentious subject (e.g., pages about the pro-life or the pro-choice movement).
To our knowledge, this is the first work that analyzes Wikipedia readers' exposure to diverse information through the link network. Before moving on, we want to emphasize that this work makes no claim about how the hyperlink network should be; rather, we aim to study whether the current connections among articles hinder users from visiting complementary pages about a polarizing topic. Also, our conclusions come from a network-based analysis. A more advanced investigation combining network properties and the articles' content is left for future work. The code to replicate the paper is stored in an anonymous folder.²
2 RELATED WORKS
We divide this paper's related work into four categories: Improving Wikipedia, Navigating Wikipedia, Wikipedia Categorization, and Polarization on Social Media.
Improving Wikipedia. The scientific community has proposed semi-automated procedures to improve Wikipedia's quality. These works check the veracity of references [18, 41], suggest article structure [39], look for hoaxes [30], or recommend links [38, 54]. Although link recommendation tools enrich the editing process, they do not provide editors with a measure to evaluate the relationship among articles containing diverse opinions. In this work, we define such metrics (Sect. 4.2).
Wikipedia Navigation. The literature still lacks a model that generalizes the behavior of Wikipedia users. Previous studies [25, 27, 31, 46] focused on modeling and predicting human navigation inside Wikipedia, relying on traces from navigation games, i.e., Wikispeedia [43, 48] and WikiGame [13, 29, 46].³ While such games provide valuable insights about how users exploit links to go from one concept to another, Singer et al. [45] and Dimitrov et al. [15, 17] showed that users display different behavioral patterns depending on their information needs and the links' position within pages. Thus, we exploit the insights provided by Singer et al. [45] to define a general model mimicking localized and more in-depth topic exploration. We further enrich the model by characterizing users' next-link choices according to the findings in [15, 17] (Sections 4.1.2 and 4.1.3).
Wikipedia Categorization. In this work, we need to collect articles expressing the distinct facets of a polarizing topic. Wikimedia provides a supervised classifier, ORES,⁴ that, based on features derived from the articles' text, categorizes an article into a manually designed taxonomy of categories⁵ [3]. Alternatively, one can use topic models [5, 6]. Unfortunately, none of the above approaches provides us with the requested granularity. So we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts the content of articles about polarizing topics (see Sect. 3).
Figure 1: From Wikipedia's graph to a topic-induced network. The image on the left shows the original Wikipedia graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the set of nodes that we use to build the topic-induced network $G$. The color red refers to the set of nodes $P$. We use blue to indicate $\bar{P}$, and green and yellow for $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edges' direction.
² https://drive.google.com/drive/folders/1CJr_YiFE2YlyAtB9yKaGe8CLwVLWx9Ta?usp=sharing
³ These games ask readers to go from one article to another using wikilinks.
⁴ https://ores.wikimedia.org
⁵ https://www.mediawiki.org/wiki/ORES/Articletopic#Taxonomy
Polarization on Social Media. There is a large body of work on detecting [1, 10, 12, 19, 36] and on modeling, quantifying, and mitigating [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one through a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network's topology impacts the users' exposure to diverse information when they navigate the graph.
Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles across different languages. Other studies highlighted differences between women's and men's biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.
3 DATA COLLECTION
To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers' interactions with the topic.
3.1 Topic Induced Networks
In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.
First, we consider the directed English Wikipedia graph $W = (A, L)$. The nodes of the graph are the encyclopedia's pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.⁶ This set of links includes those in the infoboxes.⁷
Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then we define the set of nodes $\mathcal{N}$ that includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) we want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic;⁸ (2) when we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of the opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).
To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ into one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. For each node $v \in \mathcal{N}$, we can have multiple edges going to $s$; we compress them into a unique edge $(v, s)$. Likewise, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress those edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sum of the weights of the aggregated edges.
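The construction of $\mathcal{N}$ and of the super node $s$ can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the edge-dictionary representation, the function names, the choice to count neighbors in both link directions, and the decision to drop links between two outside pages are all assumptions made for the example.

```python
from collections import defaultdict


def one_hop_neighbors(edges, topic_pages):
    """Pages outside the topic set that share a link with a topic page.

    Assumption: both link directions count as "one-hop distance".
    """
    topic = set(topic_pages)
    found = set()
    for u, v in edges:
        if u in topic and v not in topic:
            found.add(v)
        if v in topic and u not in topic:
            found.add(u)
    return found


def collapse_to_super_node(edges, topic_pages, neighbors, super_node="s"):
    """Merge every page outside topic_pages and neighbors into one node.

    `edges` maps (source, target) -> weight. Parallel edges to or from the
    super node are compressed into one edge whose weight is their sum.
    """
    kept = set(topic_pages) | set(neighbors)
    collapsed = defaultdict(float)
    for (u, v), weight in edges.items():
        cu = u if u in kept else super_node
        cv = v if v in kept else super_node
        if cu == super_node and cv == super_node:
            continue  # links among outside pages are dropped (assumption)
        collapsed[(cu, cv)] += weight
    return dict(collapsed)


# Hypothetical toy graph: "p" is a topic page, "n" its neighbor,
# "x" and "y" are pages outside the topic and its neighborhood.
toy_edges = {
    ("p", "n"): 1.0,
    ("n", "x"): 1.0,
    ("n", "y"): 1.0,
    ("x", "n"): 1.0,
    ("y", "n"): 1.0,
    ("x", "y"): 1.0,
}
toy_neighbors = one_hop_neighbors(toy_edges, {"p"})
toy_network = collapse_to_super_node(toy_edges, {"p"}, toy_neighbors)
print(toy_network)
```

Here the two links from `n` to the outside pages become a single edge `(n, s)` of weight 2, and likewise for the edge `(s, n)`.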
Finally, we build a directed weighted network $G = (V, E)$, which we call the topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, defined as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated with $G$, that is, a matrix such that each entry $m_{i,j}$ is a probability, with $m_{i,j} = 0$ if $(i, j) \notin E$, and such that $\sum_{j=1}^{n+1} m_{i,j} = 1$ for every $i$. The entry $m_{i,j}$ describes the probability that, being on article $i$, a reader clicks page $j$. In Section 4.1.2, we propose different characterizations of the transition matrix.
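As a concrete illustration of the right-stochastic matrix $M$, the sketch below row-normalizes edge weights so that every row sums to 1. It is only one possible characterization (with unit weights it amounts to a uniform choice among a page's out-links); the function name and the toy edges are hypothetical, and pure Python lists are used instead of a numerical library to keep the example self-contained.

```python
def row_stochastic_matrix(nodes, edges):
    """Build a transition matrix whose rows sum to 1 from weighted edges.

    `nodes` fixes the row/column order; `edges` maps (i, j) -> weight.
    Entries for missing edges stay 0. A page without out-links would yield
    an all-zero row, so every node is assumed to have at least one out-link.
    """
    index = {node: k for k, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (i, j), weight in edges.items():
        matrix[index[i]][index[j]] += weight
    for row in matrix:
        total = sum(row)
        if total > 0:
            for k in range(size):
                row[k] /= total
    return matrix


# Hypothetical toy network with a topic page "p", a neighbor "n",
# and the super node "s".
example_nodes = ["p", "n", "s"]
example_edges = {
    ("p", "n"): 1.0,
    ("n", "p"): 2.0,
    ("n", "s"): 2.0,
    ("s", "n"): 2.0,
}
M = row_stochastic_matrix(example_nodes, example_edges)
for node, row in zip(example_nodes, M):
    print(node, row)
```

Each row of `M` is a probability distribution over the next page, which is the property required of $m_{i,j}$ above.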
Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.⁹ From this dump, we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint and under its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category "Anti-abortion movement" and its subcategories. For instance, the article "Fetal rights" is directly under the seed category, whereas the article "Crisis pregnancy center" is located under the subcategory "Anti-abortion organizations." The pro-choice corpus is collected in a similar fashion, starting from the category "Abortion-rights movement." Note that because we want
⁶ We exclude links within the same page. Moreover, while building the graph,
we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and
redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining
their connections to the rest of the graph.
⁷ An infobox is a fixed-format table usually added to the top right-hand corner of
articles to consistently present a summary of some unifying aspect that the articles
share.
⁸ For instance, articles that present an overall introduction/description of the topic.
⁹ Unless differently specified, we refer to the dump of September 2020.
| Topic | $\lvert V \setminus s \rvert$ | $\lvert P \rvert$ | $\lvert \bar{P} \rvert$ | $\lvert \mathcal{N} \rvert$ | $\lvert E \rvert$ | $\lvert E_{P \to \bar{P}} \rvert$ | $\lvert E_{\bar{P} \to P} \rvert$ | $\lvert E_{P \to \mathcal{N}} \rvert$ | $\lvert E_{\bar{P} \to \mathcal{N}} \rvert$ | $\lvert E_{\mathcal{N} \to P} \rvert$ | $\lvert E_{\mathcal{N} \to \bar{P}} \rvert$ | $Q(P, \bar{P})$ | Unreach($P$) | Unreach($\bar{P}$) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abort | 56861 | 481 | 291 | 56093 | 1.9M | 205 | 97 | 21843 | 14492 | 21396 | 29889 | 0.29 (0.30, 0.41) | 2.1 | 4.81 |
| Cannabis | 32743 | 45 | 231 | 32470 | 1.1M | 8 | 6 | 1089 | 15055 | 656 | 27823 | 0.27 (0.14, 0.03) | 11.36 | 3.49 |
| Guns | 65743 | 167 | 187 | 65393 | 2.5M | 98 | 115 | 18342 | 12304 | 56702 | 16608 | 0.26 (0.24, 0.30) | 3.63 | 0.00 |
| Evolution | 84788 | 342 | 1334 | 83113 | 1.99M | 391 | 135 | 18289 | 45472 | 15601 | 58720 | 0.20 (0.22, 0.27) | 1.69 | 5.62 |
| Racism | 129963 | 1024 | 1022 | 127953 | 4.8M | 746 | 560 | 64359 | 41566 | 74354 | 58195 | 0.32 (0.21, 0.31) | 2.72 | 2.55 |
| LGBT | 150563 | 459 | 640 | 149479 | 4.6M | 195 | 143 | 28100 | 22678 | 92975 | 81706 | 0.34 (0.30, 0.13) | 2.44 | 5.35 |

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.
[Figure 2: boxplots of links' position (y-axis, 0.00 to 0.75) for Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio, Racism, Anti-racism, Discrimination, and Support; legend: Opposite opinion, Same opinion.]
Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent links' placement among pages only belonging to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with $\tanh$ to similarly score links at the top of the page. The higher the value, the higher the position in the page.
$P$ and $\bar{P}$ to be disjoint, articles belonging to both "Anti-abortion
movement" and "Abortion-rights movement" are assigned to $\mathcal{N}$.¹⁰
Once we have the list of pages in $\mathcal{T}$, we proceed to build the
topic-induced network as described in the first part of this section. The
articles we collect gather pages about different entities, such as
organizations, people, and events. The inclusion of a heterogeneous set
of pages for each viewpoint allows us to capture the different ways a
user can learn about a topic.
Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse
the position of any subject they describe; rather, they objectively talk about
entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category,
we check them manually. In this way, we avoid cases like having
articles about anti-racism falling into the racism category. Moreover,
we do not consider categories whose names do not include
topic-specific keywords.
3.2 General Statistics on Topics' Networks
Following the procedure explained in the previous section, we
collect the topic-induced networks related to six different topics
that we pick from the List of controversial issues on Wikipedia¹¹
and other resources that indicate some controversial issues in our
society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows:
pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition,
gun control vs. gun rights, creationism vs. evolutionary biology,
support for LGBT rights vs. opposition to LGBT rights, and racism
¹⁰ We report the size of the intersections between partitions in the next section.
¹¹ https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues
| Topic | $P$ | $\bar{P}$ | Seed $P$ | Seed $\bar{P}$ |
|---|---|---|---|---|
| Abortion | Pro-life | Pro-choice | Anti-abortion movement | Abortion-rights movement |
| Cannabis | Prohibition | Activism | Cannabis prohibition | Cannabis activism |
| Guns | Control | Rights | Gun control advocacy groups | Gun rights advocacy groups |
| Evolution | Creationism | Evolutionary biology | Creationism | Evolutionary biology |
| Racism | Racism | Anti-racism | Racism | Anti-racism |
| LGBT | Discrimination | Support | Discrimination against LGBT people | LGBT rights movement |

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.
vs. anti-racism. Information about the seed categories of each topic
is in Table 2. The full category lists and sample titles are provided
in the code folder (see Sect. 1).
For the rest of the paper, we refer to the opinions about a topic
using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the
real opinion it represents.
Before presenting the general statistics of the retrieved networks,
we remark that, when we assign the articles to partitions, we put
into the set $\mathcal{N}$ those assigned to both partitions. The sizes of the
intersections among partitions (i.e., the numbers of common articles)
are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, racism 7. Because we do not remove these articles
(i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$
and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider
the direct connections among partitions (1 click), we discard them,
since they are not explicitly categorized into one partition.
In Table 1 we show some statistics on the six topic-induced
networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ
substantially for all the topics except for racism and guns. It means
that one of the two opinions is represented by more articles.
In terms of content, this does not necessarily imply that
one of the two views is incomplete or insufficiently represented.
Indeed, a view may span a few articles or may require more pages to
be complete. On the other hand, the unbalanced sizes can affect an
opinion's exposure within the entire Wikipedia network.
Practically, if a set of articles is large and well connected to the rest of
the network, the chances that users who browse randomly reach
it are higher than those of reaching a small partition. Moreover, if
readers use the random article functionality of Wikipedia, an
opinion that is more represented has more chances of being randomly
sampled.
The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about
evolutionary biology than about creationism. If we consider the edges
across partitions, the number of cross-partition edges is generally higher for
bigger sets. This is reasonable because more nodes can point to the
opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ∼3 times more than those in the opposite direction, and for LGBT the edges
from discrimination to rights are 36% more. Despite the low number
of edges across cannabis partitions, we decide not to discard the
topic.
Above we said that one of the two partitions might connect
better to the rest of the encyclopedia. We observe that the sizes of $P$
and $\bar{P}$ are not linear in the number of edges that point out of or to the
nodes in the partitions. For instance, the number of articles about
pro-choice (291) is roughly half the number of nodes related to the pro-life movement
(481). Although the nodes in pro-life are almost twice as many as those in
pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later
that the fact that a side of a topic is better blended in the network
has implications for the readers' exposure to one of the two sides
of the topic (Sect. 6).
We also investigate how many pages in $P$ and $\bar{P}$ cannot be
reached by users unless they enter Wikipedia directly on those
pages. The sets of articles with the highest share of unreachable
nodes are cannabis prohibition (11.36%), followed
by evolutionary biology (5.62%) and LGBT rights (5.35%).
Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$.
Higher $Q$ means that connections within partitions exceed those
among them. In Table 1 we report three values, computed on
differently weighted graphs, with the probabilities of clicking the
links of each page assigned as follows: (1) uniform, (2) proportional to the
position of the link within the page, and (3) proportional to readers'
clickstream (see Sect. 4.1.2). Overall, if we consider the position of
links and readers' clickstream, it seems that the partitions are more
modular.
Based on that, we study how links across and within partitions
are positioned in pages. First, we define the position of a link. Given a
page, we have its list of links in order of appearance. We get the
relative rank within the list for each link and re-scale it by the $\tanh$.
In this way, we have values in [0, 1], and the links at the top of the
list get a more similar score. The set of links includes those in the
infoboxes. We regard them as being at the top of the article, according to
results in [15, 17]. If a link appears more than once, we average its
position.
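The position score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the relative rank is taken as a value in (0, 1] with 1 for the first link, infobox links are simply prepended to the list, and repeated links average their scores. The function name `link_position_scores` is hypothetical.

```python
import math

def link_position_scores(links_in_order, infobox_links=()):
    """Return {target: score}, where score = tanh(relative rank).

    A higher score means the link sits nearer the top of the page.
    Assumption: infobox links are treated as the first links of the page.
    """
    ordered = list(infobox_links) + list(links_in_order)
    n = len(ordered)
    per_target = {}
    for pos, target in enumerate(ordered):
        relative_rank = (n - pos) / n  # 1.0 for the first link, 1/n for the last
        per_target.setdefault(target, []).append(math.tanh(relative_rank))
    # A link that appears more than once gets the average of its scores.
    return {t: sum(v) / len(v) for t, v in per_target.items()}

scores = link_position_scores(["B", "C", "B", "D"], infobox_links=["I"])
```

Because $\tanh$ flattens as its argument grows, links near the top receive scores that are closer to each other than links near the bottom.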
In Figure 2 we show the position distributions. According to
the t-test, whose confidence level is fixed to $\alpha = 0.95$, the average
position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, links to creationism are placed statistically
significantly lower than those to evolutionary biology. The same
happens for LGBT.
For the sake of completeness of the analysis, even if not used
further in the paper, for each topic we study the quality of the pages
populating it. In particular, we use the ORES API to get the "article quality". We observe that, overall, for all the topics between 60% and
70% of the articles are classified as stubs or start. Then, 22–29% are
in B-class, 0–5% are Featured Articles, and the remaining belong
to the C-class.¹²
4 METRICS
In this section, we define the models and metrics that we use to
answer the research questions formulated in Sect. 1. First, we describe
how we characterize readers' consumption, either by analyzing
real users' data or by simulating their behavior (see Sect. 4.1). Then,
we introduce the core metrics of the paper, ExDIN and M-ExDIN
(see Sect. 4.2).
4.1 Content Consumption
To understand readers' consumption of polarizing topics, we need
different modeling strategies, which we describe in the following
subsections.
4.1.1 Metrics Based on Clickstream. We build two metrics upon the
information we extract from users' clickstream data, which are made
publicly available by Wikimedia and preserve users' privacy [14, 54].¹³
From these data, we infer $c_{ji}$, counting how many times a
hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an
internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from
outside Wikipedia (e.g., a search engine). Thus, we define the
variable $\delta_j$, which indicates whether $j$ is an external page or belongs to
the topic-induced network: $\delta_j = 1$ if $j$ is external, and 0 otherwise.
Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal
pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be
the total clicks to the page;
$\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks
from external websites; therefore, the difference between $c_i$ and this
summation is the number of visits from internal (Wikipedia) pages.
Now we are ready to define the following metrics.
Reader Search Rate (RSR). Given a page $i \in V$, the empirical
probability that a visit to page $i$ is from an external website is
$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$
¹² https://en.wikipedia.org/wiki/Template:Grading_scheme
¹³ Description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream.
The provided information is enough to extract the clickstream-based
metrics.
Click Through Rate (CTR). Given a page $i \in V$, the empirical
probability that a reader clicks a link within the page is
$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$
where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks
from the same page are counted as originating from different visits
to $i$, and thus counted multiple times in $c_i$.)
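To make the two rates concrete, here is a minimal sketch on a toy clickstream. The row layout `(source, target, is_external_source, clicks)` and the page names are hypothetical and chosen for illustration; the public clickstream files use their own column layout, so a loader would be needed in practice.

```python
from collections import defaultdict

# Hypothetical clickstream rows: (source, target, is_external_source, clicks).
rows = [
    ("search-engine", "A", True, 60),
    ("B", "A", False, 40),
    ("A", "B", False, 15),
    ("A", "C", False, 10),
]

visits = defaultdict(int)      # c_i: total clicks arriving at page i
external = defaultdict(int)    # clicks arriving at page i from external sources
clicks_out = defaultdict(int)  # clicks leaving page i through its links

for source, target, is_external, n in rows:
    visits[target] += n
    if is_external:
        external[target] += n
    else:
        clicks_out[source] += n

def rsr(page):
    """Reader Search Rate: share of visits to `page` that come from outside."""
    return external[page] / visits[page] if visits[page] else 0.0

def ctr(page):
    """Click Through Rate: share of visits to `page` that turn into a click."""
    return clicks_out[page] / visits[page] if visits[page] else 0.0
```

For page `A` in the toy data, 60 of 100 visits are external and 25 of 100 visits produce an outgoing click.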
4.1.2 Model Clicks Within Pages. When readers visit a page, they
have the possibility of clicking any of the present links. However,
according to the information needs they want to satisfy, each of the
links may have a different probability of being clicked [45]. Now
we propose three models to describe the probability distribution
of clicking a link $j$ within an article $i$. First, let $i$ be an article
in $V$ and $j \in N_{out}(i)$. We define $pos(j \mid i)$ as the rank of $j$ among
all links in $i$, and $r(j \mid i) = |N_{out}(i)| - pos(j \mid i)$, such that a higher
value indicates a higher ranking position. Moreover, we introduce
$\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to
values between 0 and 1, such that links at the top of the page get
similar scores.
The Clicks Within Pages models (CwP) are directly applicable on
$G$ by setting the transition matrix $M$ in one of the following modes:
(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics
readers who click each link in a page uniformly at random.
(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j \mid i)}{\sum_{j' \in N_{out}(i)} \tanh r(j' \mid i)}$
captures the scenario in which readers click with higher
probability links appearing first in the page. This model is
based on previous work that shows how the link position is
a good predictor of the success of a link [16, 31].
(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{j' \in N_{out}(i)} c_{ij'}}$ represents
the empirical probability that users in $i$ will click the link
toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the
minimum number of times the link must be clicked to be
included in the dataset [53].
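The three modes can be sketched for a single page as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, out-links are assumed unique, and $pos(j \mid i)$ is taken to start at 0 so that $r(j \mid i) \geq 1$ and no link receives zero weight.

```python
import math

def uniform_row(out_links):
    """M^u: every out-link of the page is equally likely."""
    return {j: 1.0 / len(out_links) for j in out_links}

def position_row(out_links_in_page_order):
    """M^p: weight tanh(r(j|i)), with r = |N_out(i)| - pos(j|i).

    Assumption: pos starts at 0 for the first link, so r ranges from n down to 1.
    """
    n = len(out_links_in_page_order)
    raw = {j: math.tanh(n - pos) for pos, j in enumerate(out_links_in_page_order)}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}

def clicks_row(out_links, clicks, floor=10):
    """M^c: empirical click shares, with counts below `floor` raised to `floor`."""
    raw = {j: max(clicks.get(j, 0), floor) for j in out_links}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}

row_u = uniform_row(["B", "C", "D"])
row_p = position_row(["B", "C", "D"])
row_c = clicks_row(["B", "C"], {"B": 90, "C": 3})
```

Each function returns one row of the transition matrix as a dictionary from target page to probability, so every row sums to 1 by construction.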
For the sake of completeness, we recall that $G$ includes a super
node $s$. To fill its corresponding entries in the transition matrices,
we need to aggregate over the edges we compressed to build the
graph¹⁵ (see Sect. 3.1).
4.1.3 Readers Navigation Model. The main goal of this paper is
to audit the mutual exposure to diverse information across $P$ and
$\bar{P}$. We can do it by simply looking at a snapshot of the graph and
counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step
further, we recall that Wikipedia's network is conceived to let
users move while fulfilling their own information needs. Thus, we want
to understand how different users' navigation behaviors can affect
readers' exposure to diverse information.
To do that, it would be optimal to have access to users' session
logs. Because these data are not available to the public, we define a
parametric model that simulates users' navigation by embedding
¹⁴ We aim to model users on the current version of Wikipedia. Thus, to include all
the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This
implies a small probability of clicking these links. Setting the smoothing factor to 10
is a deliberate choice. However, we experimentally verified that setting any number
between 1 and 10 does not affect the results.
¹⁵ The computation of these quantities is straightforward, so we omit it from the
body of the paper.
[Figure 3: diagram with nodes labeled external, internal, and $i$.]
Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.
different behaviors according to the chosen parameter. We
emphasize that the scope of this model is not to perfectly replicate users'
behavior on Wikipedia. Rather, we want to see how users
simulated from a reasonable and general model are exposed to diverse
information.
In other words, we want to define a stochastic process with $n + 1$
states, corresponding to the $n + 1$ pages in $V$, that approximates the
probability of reaching any of the articles starting at random from
$p \in P$ (or from $\bar{P}$).
We model this by considering the process $\{X^{\ell} : \ell = 0, 1, \ldots, L\}$ on
the set of nodes $V$ induced by transition matrix $M$, with starting state
$X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = ((\pi_P)_i) \in \mathbb{R}^{1 \times n}$
over $V$. We recall that the transition matrix $M$ can vary according to
the CwP models (Sections 3.1 and 4.1.2). Based on the assumption
that users' session length (the number of clicks) is finite, we evaluate
the process on a finite number of steps $L$. We have that $\Pr(X^{\ell} = j) = (\boldsymbol{\pi}^{\ell}_P)_j$,
where the (row) vector $\boldsymbol{\pi}^{\ell}_P$ is given by the following
variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the
starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have
$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0 \tag{3}$$
and, for $\ell \geq 1$,
$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_P \cdot M_{\ell} + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_{\ell}), \tag{4}$$
where $M_{\ell} = \mathrm{norm}((D\,(M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$.
$\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank)
model, with the difference that the transition matrix is updated
in each step. It takes into account the probability that an article
has already been visited in a previous iteration. Specifically, the
vector $\boldsymbol{\pi}^{\ell}_P$ that we get at the end of each iteration represents the
likelihood that each node is reached at step $\ell$ if the reader starts uniformly at
random from a node in $P$. We assume that readers within the same
session do not click the same link more than once. Thus, we desire
that at step $\ell + 1$ the nodes that are clicked with high probability
at step $\ell$ see their probability of being reached deflated, and those
with lower probability have more chances of being clicked. We
achieve this by dividing the rows of $M$ by the vector of probabilities
$\boldsymbol{\pi}^{\ell}_P + 1$, where 1 is a smoothing factor to avoid divisions by 0, and
then normalizing the matrix to get the updated stochastic matrix to
use in the next iteration.
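The iteration in Definition 1 can be sketched in plain Python on a dense toy matrix. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, matrices are lists of rows, and the deflation divides column $j$ of the previous matrix by $1 + \pi^{\ell-1}_j$ before re-normalizing each row, following the indices in the definition.

```python
def normalize_rows(matrix):
    """Make each non-empty row sum to 1 (rows without out-links stay all zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else list(row))
    return out

def vec_mat(vec, matrix):
    """Row vector times matrix."""
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(len(matrix[0]))]

def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] for the restart-based navigation model."""
    m_prev = normalize_rows(m0)
    pi_prev = list(pi0)             # pi^{l-1}
    pi_cur = vec_mat(pi0, m_prev)   # pi^1 = pi^0 * M_0
    history = [pi_cur]
    for _ in range(1, steps):
        # M_l: deflate destinations that were likely already reached, then re-normalize.
        m_cur = normalize_rows(
            [[x / (1.0 + pi_prev[j]) for j, x in enumerate(row)] for row in m_prev]
        )
        walk = vec_mat(pi_cur, m_cur)    # continue from the current position
        restart = vec_mat(pi0, m_cur)    # jump back and click from the start page
        pi_next = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]
        pi_prev, pi_cur, m_prev = pi_cur, pi_next, m_cur
        history.append(pi_next)
    return history

# Toy network: three pages linking to each other, reader starts on page 0.
m0 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
pi0 = [1.0, 0.0, 0.0]
mixed = navigate(m0, pi0, alpha=0.5, steps=4)
star = navigate(m0, pi0, alpha=1.0, steps=3)
```

With `alpha=1.0` every step is a click from the starting page, so page 0 itself is never reached in the toy network, which matches the star-like behavior.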
[Figure 4 panels: (a) Star-like ($\alpha = 1$); (b) Star-like/random navigation ($0 < \alpha < 1$); (c) Random navigation ($\alpha = 0$).]
Figure 4: Navigation model for different $\alpha$. The green nodes represent the starting navigation pages.
Overall, as we will later see in Section 4.2, this approach allows us
to investigate how the exposure to diverse information varies for
users who behave differently in terms of navigation session length
(meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader
whose navigation consists in just opening links from the
starting page. We call this behavior star-like.
With this kind of exploration, readers locally explore articles likely
semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers
open sequential articles and then jump back to the starting
page; (2) readers keep multiple paths open. The closer $\alpha$ is
to 1, the more users show a star-like behavior. Instead,
the closer $\alpha$ is to 0, the more users navigate in a
DFS-oriented fashion. Thus, readers move randomly
according to the CwP model and from time to time jump
back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), the users sequentially click links, so
each click depends only on the CwP model. In this case,
especially if related articles are not densely connected, the
exploration can lead to articles less related to the starting
page, and returning to the origin following hyperlinks may
be difficult.
Because Wikipedia does not have a button that allows readers to
go back to the previous page, we assume the jumping-back action
to consist in clicking the back button of the browser in use until
reaching again the session's starting page. The restart parameter
indirectly embeds the back-button action, which, because of the absence of
back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those
described in [43, 45].
4.2 Exposure Metrics
At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the
network structure allows readers to reach one or multiple sets of
articles. To do that, we rely on both the CwP and Navigation models.
The application of the following metrics is not limited to polarizing
topics. In fact, they can generalize to the analysis of any sets of
nodes in a graph. For this reason, we adopt a more general notation
in their definition.
[Figure 5: boxplots of Log(Pageviews) (y-axis, 0 to 10) for Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio, Racism, Anti-racism, Discrimination, and Support.]
Figure 5: Pageviews distribution. For each topic we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics in order are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_P$ be the vector indicating, for each
article, the probability of being reached at step $\ell$ ($\ell \geq 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^{\ell}_P)_j, \tag{5}$$
and it describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and
the CwP model is $M^u$ (see Sect. 4.1.2), it only quantifies
the topological property of the network to connect pages
belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter
and model that we pick, the metric tells us how the readers
characterized by the CwP and Navigation models change
their exposure to diverse information over a session (i.e.,
a sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple
sets. Consider the case where we want to understand how one set of
nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate
the ExDIN, if we want to know the total exposure to the three sets,
we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN
w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_P$ and sum
up the probabilities of the nodes within each set.
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus,
we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$
and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\; e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\; e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$
[Figure 6: horizontal bars of the % of pageviews from the opposite partition (x-axis, 0.0 to 2.5) for Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, and Anti-racism.]
Figure 6: Percentage of pageviews coming from the opposite side. Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon^{\ell} = 0$.
This measure quantifies to what extent the exposure to diverse
information is balanced across $P$ and $\bar{P}$.
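Both metrics reduce to a few lines once the step-$\ell$ probability vectors are available. The sketch below assumes those vectors are given as plain lists indexed by node; the function names `exdin` and `m_exdin` and the toy numbers are illustrative assumptions.

```python
def exdin(pi_step, target_nodes):
    """ExDIN: probability mass that the step-l vector places on the target set."""
    return sum(pi_step[j] for j in target_nodes)

def m_exdin(e_forward, e_backward):
    """M-ExDIN: ratio of the smaller exposure to the larger one, 0 if either is 0."""
    largest = max(e_forward, e_backward)
    if largest == 0:
        return 0.0
    return min(e_forward, e_backward) / largest

# Toy setting: nodes 0 and 1 form P, node 2 forms P-bar, node 3 is everything else.
pi_from_p = [0.0, 0.1, 0.3, 0.6]        # reached at step l when starting in P
pi_from_p_bar = [0.05, 0.10, 0.0, 0.85]  # reached at step l when starting in P-bar

e_p_to_p_bar = exdin(pi_from_p, {2})
e_p_bar_to_p = exdin(pi_from_p_bar, {0, 1})
balance = m_exdin(e_p_to_p_bar, e_p_bar_to_p)
```

In the toy setting, readers starting in `P` are twice as likely to cross over as readers starting in the other set, so the mutual exposure is about one half.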
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving
from one set to the other are. In this case, the network topology does
not favor connections from one set to the other. On the other hand,
if $\epsilon$ is close to 0, the network structure tends to favor either the
navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we
observe a tendency in the network of facilitating the exploration
from one of the sets to the other, we may say that the network
topology is biased toward a direction. Thus, we can think of using
M-ExDIN to measure the bias in the network w.r.t. two sets of
nodes.
Even though the mutual exposure to diverse information
captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of
comparable size, it may fail if one is much smaller than the other. For
instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both
partitions have a similar out-degree distribution, one would expect
$e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$
and, as a result, $\epsilon \approx 0.1$. The same happens if
they have a similar in-degree distribution. For this reason, when we
compute either ExDIN or M-ExDIN, we check whether the sizes
of the communities are unbalanced, and we proceed as follows. If
$|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$.
Then, we use the new set for all computations. Because of the
randomness of the sampling, we repeat the measurements multiple
times.
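The size-balancing step can be sketched as repeated random subsampling of the larger set. The function name `balanced_pairs`, the fixed seed, and the number of repetitions are assumptions chosen for illustration; the paper does not state how many repetitions are used.

```python
import random

def balanced_pairs(p, p_bar, repetitions, seed=0):
    """Return `repetitions` pairs (P', P-bar') of equal size.

    The smaller set is kept whole; the larger one is subsampled
    uniformly at random, once per repetition.
    """
    rng = random.Random(seed)
    size = min(len(p), len(p_bar))
    p_sorted, p_bar_sorted = sorted(p), sorted(p_bar)  # fixed order for reproducibility
    pairs = []
    for _ in range(repetitions):
        p_sample = set(rng.sample(p_sorted, size))
        p_bar_sample = set(rng.sample(p_bar_sorted, size))
        pairs.append((p_sample, p_bar_sample))
    return pairs

small = {"a", "b"}
large = {"u", "v", "w", "x", "y", "z"}
pairs = balanced_pairs(small, large, repetitions=5)
```

Each pair can then be fed to the exposure computation, and the resulting values averaged over the repetitions.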
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content,
we investigate how they have consumed each of the six topics
that we concentrate on over the last four years. In particular, we
collect monthly clickstream data from November 2017 to September
2020. We note that, when we count the click views of a page, we
consider the average over the number of months the page existed.¹⁶
Accordingly, when computing the occurrences for the transition
matrix based on clickstream, we consider the average clicks of the
link over the number of months it exists. In this way, we reduce
the seasonality effect and weight links according to page changes
in terms of hyperlinks.
¹⁶ Based on the temporal graphs extracted by [11].
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a
page has been visited over 34 months. In Figure 5 we plot the
log-distributions of the pageviews for each topic and opinion. By
running a t-test, we conclude that, for all topics except for abortion, the difference of the means of opinions' pageviews is statistically
significant for $p < 0.05$. This finding shows that users tend to
visit more pages expressing/supporting one of the two viewpoints.
From a network perspective, to increase the exposure to opposing
opinions, it is desirable for pages that are frequently visited to be
well connected to articles expressing opposing opinions.
In Figure 6 we break down the pageviews, showing how many
of them come from pages of the opposing partition. Overall, the
fraction of visits from the opposite side is low (below 0.5%). The
category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there
are somewhat fewer visits to pages of a liberal inclination from
articles expressing a more conservative opinion. In fact, 0.28%
of visits to pro-choice come from pro-life, compared to 0.6% of visits to
pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As
introduced in Section 4.1.1, from the clickstream data we can compute
the RSR, which indicates whether a page is accessed more
from external sources or by navigating Wikipedia. In Figure 7, we
provide a visualization that depicts the flows of the cumulative
visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of the views come from
internal pages. The clickstream from internal pages is broken down
to show the proportion of the flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than
those of gun rights. We observe that traffic from external websites
also goes mostly towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is
concentrated on gun rights.
The abundance of traffic towards one of the two opinions does
not characterize only the guns topic. Indeed, among all the topics,
59–74% of the visits are accumulated by one partition. Moreover,
readers' preferences appear consistent between external and internal
accesses, that is, they both point more towards the same view of
the topic. For both internal and external views, the distribution
of accesses toward the partitions is approximately the same (i.e., the
percentage of visits from external pages to P (resp. P̄) is the same as that
from internal pages to P (resp. P̄)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses.
We note that the partitions with more views are not necessarily the
biggest in the topic-induced networks.
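The breakdown of traffic by origin and partition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the row layout (source, target, link type, clicks) is loosely modeled on clickstream-style data, and the page names, counts, and the `traffic_shares` helper are hypothetical, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical clickstream rows: (source, target, link_type, clicks).
# "external" marks traffic arriving from outside Wikipedia,
# "link" marks traffic arriving through an internal hyperlink.
rows = [
    ("other-search", "Gun control", "external", 600),
    ("Gun politics", "Gun control", "link", 400),
    ("other-search", "Gun rights", "external", 250),
    ("Gun politics", "Gun rights", "link", 150),
]

# Hypothetical assignment of pages to the two partitions of a topic.
partition = {"Gun control": "control", "Gun rights": "rights"}


def traffic_shares(rows, partition):
    """Return visits per (origin, partition) and each partition's share of all visits."""
    flows = defaultdict(int)
    for _source, target, link_type, clicks in rows:
        if target not in partition:
            continue  # ignore pages outside the two partitions
        origin = "external" if link_type == "external" else "internal"
        flows[(origin, partition[target])] += clicks
    total = sum(flows.values())
    per_partition = defaultdict(int)
    for (_origin, side), clicks in flows.items():
        per_partition[side] += clicks
    shares = {side: clicks / total for side, clicks in per_partition.items()}
    return dict(flows), shares


flows, shares = traffic_shares(rows, partition)
internal_share = sum(c for (origin, _), c in flows.items() if origin == "internal") / sum(flows.values())
```

On real data, `shares` would correspond to the right-hand side of each panel in Figure 7 and `internal_share` to the left-hand side.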
In general, the largest amount of visits to topics' articles comes
from external pages. In particular, only 23.6% and 33.5% of the traffic
to evolution and racism, respectively, is generated by internal Wikipedia
navigation. The same quantity for the remaining topics
ranges between 44% and 47%.
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics WWW '21, April 19–23, 2021, Ljubljana, Slovenia
[Figure 7 panels: (a) Abortion (Pro-life, Pro-choice); (b) Cannabis (Prohibition, Activism); (c) Guns (Control, Rights); (d) Evolution (Creationism, Evol. Bio.); (e) Racism (Racism, Anti-racism); (f) LGBT (Discrimination, Support). Each panel shows the flows from External and Internal sources to the two partitions.]
Figure 7: Cumulative pages' traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to P and P̄.
[Figure 8 plot: x-axis, Avg(Click-Through Rate) × 100, from 0 to 30; one bar each for Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism.]
Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set P (resp. P̄). The score indicates the average probability that a link within a page p in P (resp. P̄) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal
views on the topic. It is, instead, more complicated to draw
interpretations about the remaining topics. One explanation may be
that users look for information generally less covered in the public
mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links.
We want to understand how frequently they do so. To this end, we
compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the
percentage of accesses turning into a visit to another page ranges
between 10% and 28%. Dimitrov and Lemmerich [14] observed that the
average CTR for the whole of Wikipedia is 12%. So most of the subsets
of pages we consider have a CTR higher than Wikipedia's average.
The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that, over the years, have generated
the least internal traffic are those about evolutionary biology and LGBT rights.
Examining pages' connections, we found that those with higher
CTR have more links (the Pearson correlation coefficient is 0.52).
This suggests that articles having a higher out-degree offer more
options to users. Presumably because of this, users are more likely
to continue the exploration from those articles.
Topic      C̄(P)   C̄(P→P)  C̄(P→P̄)  C̄(P̄)   C̄(P̄→P̄)  C̄(P̄→P)
Abortion   68.89  90.82   57.64   70.26  89.81   45.37
Cannabis   70.81  95.01   37.50   65.78  96.52   16.67
Guns       52.34  78.69   35.68   59.63  75.35   39.28
Evolution  71.15  84.49   64.47   72.69  99.00   55.13
Racism     56.36  88.41   34.32   71.87  90.63   65.44
LGBT       61.66  89.42   525     72.42  92.52   59.17
Table 3: Average percentage of links within pages clicked fewer than 10 times. C̄(P) is the average percentage of un-clicked hyperlinks within pages in P. C̄(P→P̄) is the average percentage of un-clicked links within P pointing to P̄.
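The relation between click-through rate and out-degree reported above can be illustrated with a small sketch. It assumes that a page's CTR is the number of clicks leaving the page through its links divided by its pageviews (the paper's own definition is in Sect. 4.1.1 and may differ in detail); the page data are made up for illustration.

```python
import math


def click_through_rate(outgoing_clicks, pageviews):
    """Assumed CTR: share of page visits that continue with a click on one of the page's links."""
    return outgoing_clicks / pageviews


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical pages: (name, pageviews, outgoing clicks, number of links on the page).
pages = [("A", 1000, 100, 20), ("B", 500, 100, 45), ("C", 2000, 600, 80)]

ctrs = [click_through_rate(clicks, views) for _name, views, clicks, _degree in pages]
out_degrees = [degree for _name, _views, _clicks, degree in pages]
correlation = pearson(ctrs, out_degrees)
```

A positive `correlation` on real data would correspond to the observation that pages with more links tend to have a higher CTR.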
In addition, we count the number of links clicked fewer than 10
times over the last three years; see Table 3. As an example, given
a page in creationism, on average 71.15% of its links have been
clicked fewer than 10 times. If we distinguish between references
to creationism and references to evolution, readers did not click
84.49% of the links pointing to creationism and 64.47% of those
pointing to evolutionary biology.
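The kind of statistic reported in Table 3 can be sketched as follows, assuming that the percentage of rarely clicked links is computed per page and then averaged over the pages of a partition. The link counts, page names, and the `avg_unclicked_share` helper are hypothetical.

```python
from collections import defaultdict


def avg_unclicked_share(link_clicks, partition, source_side, target_side=None, threshold=10):
    """Average, over pages in source_side, of the percentage of their links clicked fewer than threshold times.

    If target_side is given, only links pointing to pages of that partition are considered.
    """
    per_page = defaultdict(list)
    for (src, dst), clicks in link_clicks.items():
        if partition.get(src) != source_side:
            continue
        if target_side is not None and partition.get(dst) != target_side:
            continue
        per_page[src].append(clicks < threshold)
    if not per_page:
        return float("nan")
    page_shares = [sum(flags) / len(flags) for flags in per_page.values()]
    return 100 * sum(page_shares) / len(page_shares)


# Hypothetical click counts per link (source page, destination page).
link_clicks = {("A", "B"): 3, ("A", "X"): 50, ("B", "A"): 2, ("B", "Y"): 0, ("X", "A"): 5}
partition = {"A": "P", "B": "P", "X": "Q", "Y": "Q"}

overall = avg_unclicked_share(link_clicks, partition, "P")
within = avg_unclicked_share(link_clicks, partition, "P", "P")
across = avg_unclicked_share(link_clicks, partition, "P", "Q")
```

Here `overall`, `within`, and `across` play the roles of the first three columns of Table 3 for one partition.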
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent
Wikipedia's current topology supports users in exploring diverse
facets of polarizing issues. In particular, we study (1) how readers are
locally exposed to diverse information and (2) how their exposure
to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's
topology, we compute the exposure to diverse information for ℓ = 1
using the uniform CwP model. Recalling that ℓ indicates the users'
session length, setting it equal to 1 means that we study the exposure to
diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for
all the topics when the CwP is M_u.
For instance, let evolution be the topic we analyze. If readers
start uniformly at random from a page about creationism, the probability
of visiting an article of the same partition is 5.76%. On the
contrary, the chances of entering a page about evolutionary biology
are 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion.
This probability is 5 times larger than that of visiting creationism pages.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
[Figure 9 heatmaps: one 2×2 matrix per topic (Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support); the three rows of the figure correspond to the Uniform, Position, and Clicks CwP models.]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example P → P̄ and P̄ → P. The y-axis indicates the source and the x-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP. Darker colors indicate a higher probability of being in the corresponding square in one click.
It is worth pointing out that the current network's topology not
only nudges users into reading more about the same opinion but
also hinders them from exploring diverse content symmetrically. Indeed,
users reading about evolutionary biology have higher chances of
reading an article about creationism (3 times more) than users starting from
creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned
observations hold for most of them. Moreover, we note that
the probability of continuing the session by reading a page of the same
opinion is greater for one of the two partitions of a given topic.
Taken together, these measurements highlight that the structure
of the network makes it easier for users to explore knowledge bubbles of homogeneous view and makes the measure of mutual exposure to
diverse information smaller than 1.
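The one-click analysis above can be made concrete with a minimal sketch. It assumes a simplified reading of the uniform model: the reader starts from a page chosen uniformly at random in the source partition and then clicks one of that page's links uniformly at random. The formal ExDIN definition is in Sect. 4.2 and may differ; the tiny graph and the `one_click_exposure` helper are hypothetical.

```python
def one_click_exposure(out_links, partition, source_side, target_side):
    """Probability of landing in target_side after one uniformly random click from a random page of source_side."""
    sources = [page for page, side in partition.items() if side == source_side]
    total = 0.0
    for page in sources:
        links = out_links.get(page, [])
        if not links:
            continue  # a page without links contributes no exposure
        hits = sum(partition.get(dst) == target_side for dst in links)
        total += hits / len(links)
    return total / len(sources)


# Hypothetical topic-induced network; "n1" and "n2" belong to neither partition.
out_links = {
    "c1": ["c2", "e1", "n1", "n2"],
    "c2": ["c1", "n1"],
    "e1": ["e2", "c1", "n1", "n2"],
    "e2": ["e1", "c2"],
}
partition = {"c1": "creationism", "c2": "creationism", "e1": "evolution", "e2": "evolution"}

same = one_click_exposure(out_links, partition, "creationism", "creationism")
to_other = one_click_exposure(out_links, partition, "creationism", "evolution")
from_other = one_click_exposure(out_links, partition, "evolution", "creationism")
```

Comparing `to_other` with `from_other` shows the kind of asymmetry discussed in the text: in this toy graph, readers starting from evolution pages reach creationism pages more easily than the other way around.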
The findings above report the intrinsic capability of the network
to spur users towards diverse content. If we want to combine it
with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results
in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding
to the ExDIN computed using the position CwP model
(second row). We see that if users click links at the top of the page
with higher probability, the probabilities are
only slightly modified w.r.t. the uniform model. These modest variations are coherent with
the links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more
significant role, worsening the users' exposure to diverse information.
Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence
is that the probability of reaching an article supporting
gun rights, starting uniformly at random from an article in gun control, drops by 30% w.r.t. the probability observed using the
uniform CwP model. Therefore, we conclude that, for some topics,
the placement of links within pages contributes to reducing the
exposure to diverse information. In other words, users who tend
to click the links located towards the beginning of a page with
higher probability have less opportunity to read about contrasting
opinions.
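The effect of link placement described above can be illustrated with a short sketch. It assumes a hypothetical position model in which the click probability of a link decays as 1/rank with its position in the page; this decay is chosen only for illustration and is not the paper's position CwP model.

```python
def click_probabilities(links, weight):
    """Map each link (listed in page order, assumed unique) to its click probability under a rank-based weight."""
    weights = [weight(rank) for rank in range(1, len(links) + 1)]
    total = sum(weights)
    return {dst: w / total for dst, w in zip(links, weights)}


def uniform_weight(rank):
    """Every link is equally likely to be clicked."""
    return 1.0


def top_heavy_weight(rank):
    """Hypothetical decay: links near the top of the page are clicked more often."""
    return 1.0 / rank


# Hypothetical gun control page whose only link to the opposite partition appears last.
links = ["control_1", "control_2", "control_3", "rights_1"]

uniform = click_probabilities(links, uniform_weight)
position = click_probabilities(links, top_heavy_weight)
relative_drop = 1 - position["rights_1"] / uniform["rights_1"]
```

In this toy page, the probability of clicking the link to the opposite partition falls from 0.25 to 0.12 when position is taken into account, which mirrors the direction of the drop reported for guns.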
Finally, we analyze how the phenomenon changes when we use
the clicks CwP model (third row). In this case, we assume readers
make the next-click choice similarly to past users. Going back to
evolution, we immediately observe that the probability of starting a
session in creationism and continuing to read about it after one
click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the probability
of visiting pages of the same opinion. A simple interpretation of this
result is that real users click more on the links strictly related
to the page they are reading. From another perspective, combining
this finding with the previous remark that the "topology of
the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer
this question, but we hope it will be addressed in future work.
Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing
diverse views.
Before moving to the next section, we want to underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with metrics
and tools that make them aware of the effect that new/current
links have on users' exposure to diverse information. This is not
meant to alter this core and essential intrinsic property of Wikipedia,
but rather to prevent it from becoming harmful when it keeps
users from accessing diverse content.
[Figure 10 plots: one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT); x-axis, Number of Clicks (1 to 15); curves for the uniform, position, and clicks CwP models with α = 0, α = 0.2, and α = 1.]
Figure 10: Dynamic (Mutual) ExDIN for 1 ≤ ℓ ≤ 15. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of α, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose that users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. Depending on the combination of models employed to measure
the ExDIN and M-ExDIN, we provide different insights about
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
P to P̄ (resp. from P̄ to P) and the respective M-ExDIN. We can
notice that each of the topics shows its own trends. For this reason,
we highlight and explain the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
earlier in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of α (same color)
show very similar trends. Depending on the respective CwP model
(marker's shape), they are shifted up or down. This implies that,
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the magnitude
of their exposure to diverse content. In general, the CwP
model (markers' shapes) corresponding to the highest exposure is M_c,
followed by M_u and M_p.
(2) If users navigate following a star-like behavior (green, α = 1),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when the probability of clicking links to the
opposite side becomes higher or lower, respectively, in the first
iterations. This happens because this kind of user is only subject
to the exposure of their starting navigation page. So the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, α = 0) show two trends. In both cases, after the first few
clicks, the ExDIN is lower than at the beginning of the session.
Then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and becomes steady below
the initial exposure. The more the destination partition is connected
to the rest of the graph, the more users randomly navigating the
network are able to reach it. Sometimes (e.g., the ExDIN from P to P̄ of
LGBT), the curves start to decrease after many steps. This happens
when the pages within the destination partition have been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
α = 0.2) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
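The three navigation behaviors in the list above can be sketched with a small, exact computation. It rests on assumptions that are not taken from the paper: at every click the reader, with probability α, goes back to the starting page and clicks one of its links, and otherwise clicks a link of the current page; links are chosen uniformly at random; and the exposure at step ℓ is the probability that the page reached at the ℓ-th click belongs to the target partition. The graph and function names are hypothetical.

```python
def click_from(out_links, page):
    """Distribution over the pages reached by one uniformly random click from `page`."""
    links = out_links.get(page, [])
    if not links:
        return {page: 1.0}  # assumption: a dead-end page keeps the reader in place
    share = 1.0 / len(links)
    dist = {}
    for dst in links:
        dist[dst] = dist.get(dst, 0.0) + share
    return dist


def exposure_over_session(out_links, partition, start, target_side, alpha, steps):
    """Probability of being on a target_side page after each of `steps` clicks, starting from `start`."""
    current = {start: 1.0}
    from_start = click_from(out_links, start)
    exposures = []
    for _ in range(steps):
        nxt = {}
        # With probability 1 - alpha the reader clicks from the current page.
        for page, mass in current.items():
            for dst, p in click_from(out_links, page).items():
                nxt[dst] = nxt.get(dst, 0.0) + (1 - alpha) * mass * p
        # With probability alpha the reader returns to the start and clicks from there.
        for dst, p in from_start.items():
            nxt[dst] = nxt.get(dst, 0.0) + alpha * p
        current = nxt
        exposures.append(sum(m for page, m in current.items() if partition.get(page) == target_side))
    return exposures


# Hypothetical graph: pages "a" and "b" form one partition, "x" and "y" the other.
out_links = {"a": ["b", "x"], "b": ["a"], "x": ["y"], "y": ["x"]}
partition = {"a": "P", "b": "P", "x": "Q", "y": "Q"}

star_like = exposure_over_session(out_links, partition, "a", "Q", alpha=1.0, steps=3)
random_walk = exposure_over_session(out_links, partition, "a", "Q", alpha=0.0, steps=3)
mixed = exposure_over_session(out_links, partition, "a", "Q", alpha=0.2, steps=3)
```

Under these assumptions, `star_like` stays constant across clicks, which matches the steady curves described in item (2), while `random_walk` depends on how well the target partition is connected to the rest of the graph. Averaging the result over all starting pages of a partition would give a partition-level curve.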
Given this observation, we analyze the ExDIN of guns. With the
current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability)
are those characterized by a star-like behavior. As soon as the users
navigate in a random fashion, this probability drops. This is
because the gun rights partition has a number of incoming edges
that prevents random users who walk away from getting back to it. We
observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to
the gun control partition is higher than or comparable to that of star-like
behaving users. The probability of reaching gun control after many
steps becomes high because of the larger number of incoming edges of the
partition.
From a complementary analysis, we observe that, for sessions
longer than 2 page visits, the topology of the graph does not
keep random users within the knowledge bubble they accessed.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with
star-like behavior, the probabilities show slightly descending curves.
This demonstrates that they tend to pick pages of the same opinion
in the first iterations. Due to space constraints, figures picturing
these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's pages. We recall that, to do so, we use the mutual exposure to
diverse information (see Sect. 4.2). For all the topics, the mutual
exposure to diverse information is lower than 100%, meaning that
for none of them does the network topology provide an equal
exposure across opinions. If we consider topics like abortion and
guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
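To make the notion of symmetry concrete, the sketch below uses one plausible stand-in for a mutuality score: the smaller of the two cross-partition exposures divided by the larger, expressed as a percentage, so that 100 means perfectly symmetric exposure. This ratio is an assumption made for illustration; the paper's own definition of the mutual exposure to diverse information is given in Sect. 4.2 and may differ.

```python
def mutual_exposure(exposure_ab, exposure_ba):
    """Assumed mutuality score: ratio (in percent) of the smaller to the larger cross-partition exposure."""
    larger = max(exposure_ab, exposure_ba)
    if larger == 0:
        return float("nan")  # no exposure in either direction
    return 100 * min(exposure_ab, exposure_ba) / larger


# One-click exposures for evolution under the uniform model, as quoted in Section 6.1 (in percent).
creationism_to_evolution = 0.18
evolution_to_creationism = 0.61
score = mutual_exposure(creationism_to_evolution, evolution_to_creationism)
```

With these two values the score is far below 100, which is consistent with the asymmetric exposure described in the text.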
In general, if we detect low mutuality, the topology of the network
favors the exploration of one side of the topic more than the
other. Moreover, we want to stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios determining this phenomenon may be: (1) the knowledge
in the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords that could become anchors exist, so
we need a strategy to densify the network and ensure mutual exposure
to diversity; (2) the knowledge in the encyclopedia is incomplete,
that is, there are no words to which links can be attached, so the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis of how the current
Wikipedia network topology assists readers in exploring opposite
stances on polarizing topics spanning sets of articles. We formalize
the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable in the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we study the users' overall topic consumption.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming
from both external and internal pages suggest that readers
are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse
content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1,
the analysis suggests that users reading about an opinion
have higher chances of continuing to explore articles of similar
views than of the opposite ones. Furthermore, for each of the
topics that we explored, the users of one of the two views
had a substantially higher tendency, albeit small, to visit pages
of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we
observe that the mutual exposure to diverse information is
never achieved by users navigating completely at random.
The better one of the two opinions is connected to the rest of
the network, the more the graph nudges users toward that
opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that, after a few clicks, the exposure to information
of the same inclination diminishes. On the other
hand, if users explore the network with a star-like behavior,
their level of exposure to the same opinion is similar to that of
users who make only one click.
In our study, we analyze sets of articles assigned to opinions according
to editors' crafted categories [44]. Although this approach
represents a solid starting point for the analysis, it can cause article
misclassification. As future work, we plan to investigate a more reliable
classification strategy to improve the accuracy of our analysis, one
that should also take into account the content of the articles along
with the link information. Secondly, a longitudinal
study aimed at understanding the dynamics that led
to the current state of the encyclopedia's network would provide
further understanding of the users' behavior. Finally, in light
of our findings, we deem it crucial to design tools that help editors
contextualize articles within the network, so that they are aware
of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in
multiple areas of our life, and filter bubbles aggravate this phenomenon.
Understanding better how they manifest in Wikipedia (and
other media) is a crucial first step toward finding ways to attenuate them,
and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant
788893 AMDROMA "Algorithmic and Mechanism Design Research
in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms,
Games, and Digital Markets."
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics' Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1: Readers' Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
Figure 1: From Wikipedia's graph to a topic-induced network. The image on the left shows the original Wikipedia graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the set of nodes that we use to build the topic-induced network $G$. Red refers to the set of nodes $P$, blue to $\bar{P}$, and green and yellow to $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edge directions.
us the requested granularity. So we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts articles' content about polarizing topics (see Sect. 3).
Polarization on Social Media. There is a large spectrum of works on detecting [1, 10, 12, 19, 36], modeling, quantifying, and mitigating [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one, thanks to a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network's topology impacts users' exposure to diverse information when they navigate the graph.
Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles across different languages. Other studies highlighted differences between women's and men's biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.
3 DATA COLLECTION
To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers' interactions with the topic.
3.1 Topic Induced Networks
In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.
First, we consider the directed English Wikipedia graph $W = (A, L)$. The nodes of the graph are the encyclopedia's pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.⁶ This set of links includes those in the infoboxes.⁷
Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets, $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then we define the set of nodes $\mathcal{N}$, which includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) We want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic.⁸ (2) When we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of the opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).
To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ into one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. Each node $v \in \mathcal{N}$ can have multiple edges going to $s$; we compress them into a unique edge $(v, s)$. Similarly, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress these edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sum of the weights of the aggregated edges.
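The collapse into a super node can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `build_topic_network`, the edge-dictionary input format, and the choice to treat "one-hop distance" as covering both incoming and outgoing links are all hypothetical.

```python
from collections import defaultdict

def build_topic_network(edges, topic_pages, super_node="s"):
    """Collapse every page outside T and N into one super node.

    edges: dict mapping (u, v) -> weight for the full graph W (assumed format).
    topic_pages: the set T of pages about the topic.
    Returns (N, collapsed_edges).
    """
    # N: pages one hop away from T (assumption: in either link direction).
    neighbors = set()
    for (u, v) in edges:
        if u in topic_pages and v not in topic_pages:
            neighbors.add(v)
        if v in topic_pages and u not in topic_pages:
            neighbors.add(u)

    kept = set(topic_pages) | neighbors
    collapsed = defaultdict(float)
    for (u, v), w in edges.items():
        cu = u if u in kept else super_node
        cv = v if v in kept else super_node
        if cu == super_node and cv == super_node:
            continue  # links among outside pages disappear inside s
        collapsed[(cu, cv)] += w  # parallel edges to or from s are summed
    return neighbors, dict(collapsed)

# Toy graph: "a" is the only topic page, "x" and "y" fall into the super node.
toy_edges = {("a", "b"): 1.0, ("b", "x"): 1.0, ("b", "y"): 2.0,
             ("x", "b"): 1.0, ("x", "y"): 1.0}
toy_N, toy_E = build_topic_network(toy_edges, {"a"})
```

In the toy graph, the two edges from "b" to outside pages merge into a single edge to `s` whose weight is their sum, which mirrors the aggregation rule stated above.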
Finally, we build a directed weighted network $G = (V, E)$ that we call the topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, defined as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated with $G$, that is, a matrix such that each entry $m_{ij}$ is a probability, with $m_{ij} = 0$ if $(i, j) \notin E$, and such that $\sum_{j=1}^{n+1} m_{ij} = 1$. The entry $m_{ij}$ describes the probability that, being on article $i$, a reader clicks page $j$. In Section 4.1.2, we propose different characterizations of the transition matrix.
Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.⁹ From this dump, we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint and its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category “Anti-abortion movement” and its subcategories. For instance, the article “Fetal rights” is directly under the seed category, whereas the article “Crisis pregnancy center” is located under the subcategory “Anti-abortion organizations.” The pro-choice corpus is collected in a similar fashion, starting from the category “Abortion-rights movement.” Note that, because we want
⁶ We exclude links within the same page. Moreover, while building the graph, we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining their connections to the rest of the graph.
⁷ An infobox is a fixed-format table, usually added to the top right-hand corner of articles, to consistently present a summary of some unifying aspect that the articles share.
⁸ For instance, articles that present an overall introduction/description of the topic.
⁹ Unless differently specified, we refer to the dump of September 2020.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
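| Topic | $\lvert V \setminus s \rvert$ | $\lvert P \rvert$ | $\lvert \bar{P} \rvert$ | $\lvert \mathcal{N} \rvert$ | $\lvert E \rvert$ | $\lvert E_{P \to \bar{P}} \rvert$ | $\lvert E_{\bar{P} \to P} \rvert$ | $\lvert E_{P \to \mathcal{N}} \rvert$ | $\lvert E_{\bar{P} \to \mathcal{N}} \rvert$ | $\lvert E_{\mathcal{N} \to P} \rvert$ | $\lvert E_{\mathcal{N} \to \bar{P}} \rvert$ | $Q(P, \bar{P})$ | Unreach($P$) (%) | Unreach($\bar{P}$) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abortion | 56861 | 481 | 291 | 56093 | 1.9M | 205 | 97 | 21843 | 14492 | 21396 | 29889 | 0.29 (0.30, 0.41) | 2.1 | 4.81 |
| Cannabis | 32743 | 45 | 231 | 32470 | 1.1M | 8 | 6 | 1089 | 15055 | 656 | 27823 | 0.27 (0.14, 0.03) | 11.36 | 3.49 |
| Guns | 65743 | 167 | 187 | 65393 | 2.5M | 98 | 115 | 18342 | 12304 | 56702 | 16608 | 0.26 (0.24, 0.30) | 3.63 | 0.00 |
| Evolution | 84788 | 342 | 1334 | 83113 | 1.99M | 391 | 135 | 18289 | 45472 | 15601 | 58720 | 0.20 (0.22, 0.27) | 1.69 | 5.62 |
| Racism | 129963 | 1024 | 1022 | 127953 | 4.8M | 746 | 560 | 64359 | 41566 | 74354 | 58195 | 0.32 (0.21, 0.31) | 2.72 | 2.55 |
| LGBT | 150563 | 459 | 640 | 149479 | 4.6M | 195 | 143 | 28100 | 22678 | 92975 | 81706 | 0.34 (0.30, 0.13) | 2.44 | 5.35 |

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.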
[Figure 2 plot: boxplots of link position (y-axis, ticks from 0.00 to 0.75) for each partition: Pro-life, Pro-choice; Prohibition, Activism; Control, Rights; Creationism, Evol. Bio.; Racism, Anti-racism; Discrimination, Support. Legend: Opposite opinion, Same opinion.]
Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent the placement of links among pages belonging only to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with $\tanh$ to similarly score links at the top of the page. The higher the value, the higher the position in the page.
$P$ and $\bar{P}$ to be disjoint, articles belonging to both “Anti-abortion movement” and “Abortion-rights movement” are assigned to $\mathcal{N}$.¹⁰ Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different kinds of entities, such as organizations, people, and events. Including a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.
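The category-based collection described above can be illustrated with a small traversal. The sketch below is a hedged example, not the pipeline of [44]: it assumes the category tree is already available as two in-memory dictionaries (`subcategories` and `members`, both hypothetical names), it does not model the manual check of subcategories, and the titles "Shared page" and "Some pro-choice page" are placeholders invented for the demo.

```python
def collect_articles(seed, subcategories, members):
    """Gather the articles under a seed category and all its subcategories."""
    seen, stack, articles = set(), [seed], set()
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue  # guard against cycles in the category graph
        seen.add(cat)
        articles |= set(members.get(cat, ()))
        stack.extend(subcategories.get(cat, ()))
    return articles

def split_partitions(seed_p, seed_p_bar, subcategories, members):
    """Return (P, P_bar, shared); shared pages are meant to go to N."""
    p = collect_articles(seed_p, subcategories, members)
    p_bar = collect_articles(seed_p_bar, subcategories, members)
    shared = p & p_bar  # pages under both seeds are kept out of P and P_bar
    return p - shared, p_bar - shared, shared

# Toy category data (assumed structure; two titles are placeholders).
subcategories = {"Anti-abortion movement": ["Anti-abortion organizations"]}
members = {
    "Anti-abortion movement": ["Fetal rights", "Shared page"],
    "Anti-abortion organizations": ["Crisis pregnancy center"],
    "Abortion-rights movement": ["Shared page", "Some pro-choice page"],
}
P, P_bar, shared = split_partitions(
    "Anti-abortion movement", "Abortion-rights movement", subcategories, members
)
```

Removing the shared pages from both sides is what keeps $P$ and $\bar{P}$ disjoint; in the paper's construction those pages then belong to $\mathcal{N}$.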
Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles that express an opinion or describe a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe. Rather, they objectively describe entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases such as articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.
3.2 General Statistics on Topics' Networks
Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia¹¹ and other resources that indicate controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support for LGBT rights vs. opposition to LGBT rights, and racism
¹⁰ We report the size of the intersections between partitions in the next section.
¹¹ https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues
| Topic | $P$ | $\bar{P}$ | Seed $P$ | Seed $\bar{P}$ |
|---|---|---|---|---|
| Abortion | Pro-life | Pro-choice | Anti-abortion movement | Abortion-rights movement |
| Cannabis | Prohibition | Activism | Cannabis prohibition | Cannabis activism |
| Guns | Control | Rights | Gun control advocacy groups | Gun rights advocacy groups |
| Evolution | Creationism | Evolutionary biology | Creationism | Evolutionary biology |
| Racism | Racism | Anti-racism | Racism | Anti-racism |
| LGBT | Discrimination | Support | Discrimination against LGBT people | LGBT rights movement |

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.
vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).
For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.
Before presenting the general statistics of the retrieved networks, we remark that when we assign the articles to partitions, we put into the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the number of common articles) are the following: abortion, 2; cannabis, 3; evolution, 2; guns, 1; LGBT, 5; racism, 7. Recall that we do not remove these articles (i.e., they belong to $\mathcal{N}$), so they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider the direct connections among partitions (1 click), we discard them, since they are not explicitly categorized into one partition.
In Table 1, we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except racism and guns. It means
that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span only a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, a more represented opinion gets more chances of being randomly sampled.
The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ∼3 times more, and for LGBT the edges from discrimination to rights are 36% more. Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.
Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or into the nodes in the partitions. For instance, the number of articles about pro-choice (291) is about 60% of the number of nodes related to the pro-life movement (481). Although the pro-life nodes clearly outnumber the pro-choice ones, the number of links pointing to pages about pro-choice is 36% more than the number pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that one side of a topic is better blended into the network has implications for the readers' exposure to one of the two sides of the topic (Sect. 6).
We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).
Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probability of clicking each link in a page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, it seems that the partitions are more modular.
Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it by the $\tanh$. In this way, we have values in $[0, 1]$, and the links at the top of the list get more similar scores. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to the results in [15, 17]. If a link appears more than once, we average its position.
In Figure 2, we show the position distributions. According to the t-test, with confidence level fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.
For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the “article quality.” We observe that, overall, for all the topics between 60% and 70% of the articles are classified as stub or start. Then 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.¹²
4 METRICS
In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).
4.1 Content Consumption
To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.
4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].¹³
From these data, we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and 0 otherwise.
Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. The sum $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.
Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is
$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \qquad (1)$$
¹² https://en.wikipedia.org/wiki/Template:Grading_scheme
¹³ A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.
Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is
$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \qquad (2)$$
where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
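As a quick illustration of Equations (1) and (2), the following Python sketch computes both rates for a single page from toy counts. The function names, the dictionary-based inputs, and the choice to return 0 when a page has no recorded visits are assumptions made for the example; they are not part of the clickstream dataset's format.

```python
def reader_search_rate(incoming, is_external):
    """RSR_i: share of clicks into page i that come from external sources.

    incoming: dict mapping source j -> c_ji (clicks from j to i).
    is_external: dict mapping source j -> bool, playing the role of delta_j.
    """
    total = sum(incoming.values())  # c_i
    if total == 0:
        return 0.0  # assumption: pages without recorded visits get rate 0
    external = sum(c for j, c in incoming.items() if is_external[j])
    return external / total

def click_through_rate(outgoing, total_visits):
    """CTR_i: clicks on links inside page i divided by c_i.

    outgoing: dict mapping target j -> c_ij (clicks from i to j).
    """
    if total_visits == 0:
        return 0.0  # same assumption as above
    return sum(outgoing.values()) / total_visits

# Toy counts for one page i: 100 visits, 60 of them from a search engine.
incoming = {"search-engine": 60, "Page A": 30, "Page B": 10}
is_external = {"search-engine": True, "Page A": False, "Page B": False}
outgoing = {"Page C": 20, "Page D": 5}

rsr = reader_search_rate(incoming, is_external)
ctr = click_through_rate(outgoing, sum(incoming.values()))
```

With these toy counts, 60 of the 100 visits are external and 25 visits turn into a click, so the two rates are 0.6 and 0.25.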
4.1.2 Model Clicks Within Pages. When readers visit a page, they can click any of the links it contains. However, depending on the information needs they want to satisfy, each link may have a different probability of being clicked [45]. We now propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.
The Clicks Within Pages models (CwP) are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:
(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.
(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].
(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the minimum number of times the link must be clicked to be included in the dataset [53].
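The three CwP modes can be sketched for a single page as follows. This is a minimal illustration, not the authors' implementation. It assumes each page lists distinct links in order of appearance and that $pos(j|i)$ is 0-based, so that the last link keeps a nonzero weight; with a 1-based rank, the last link would get $\tanh 0 = 0$. The function names and the `floor` parameter are hypothetical.

```python
import math

def uniform_row(links):
    """Uniform mode: every link in the page is equally likely."""
    return {j: 1.0 / len(links) for j in links}

def position_row(links):
    """Position mode: weight tanh(r(j|i)), r = |N_out(i)| - pos(j|i).

    links must be in order of appearance; pos is 0-based here (assumption).
    """
    n = len(links)
    scores = {j: math.tanh(n - pos) for pos, j in enumerate(links)}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}

def clicks_row(links, clicks, floor=10):
    """Clicks mode: empirical click shares, with counts below 10 raised to 10."""
    counts = {j: max(clicks.get(j, 0), floor) for j in links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}

links = ["A", "B", "C"]           # outgoing links of one page, top to bottom
row_u = uniform_row(links)
row_p = position_row(links)
row_c = clicks_row(links, {"A": 80})  # B and C are missing, so they count as 10
```

Each function returns one row of the corresponding transition matrix, so the values in a row sum to 1. Because $\tanh$ saturates quickly, the links near the top receive almost identical position weights, which matches the intent stated above.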
For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph¹⁵ (see Sect. 3.1).
4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move around to fulfill their own information needs. Thus, we want to understand how different users' navigation behavior can affect readers' exposure to diverse information.
To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding
¹⁴ We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.
¹⁵ The computation of these quantities is straightforward, so we omit it from the body of the paper.
[Figure 3 diagram labels: external, internal, i, internal.]
Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses to a page turn into a click toward another article.
different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.
In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).
We model this by considering the process $X^{\ell}$, $\ell = 0, 1, \dots, L$, on the set of nodes $V$ induced by the transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^{\ell} = j) = (\boldsymbol{\pi}^{\ell}_P)_j$, where the (row) vector $\boldsymbol{\pi}^{\ell}_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have
$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0 \qquad (3)$$
and, for $\ell \ge 1$,
$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_P \cdot M_{\ell} + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_{\ell}), \qquad (4)$$
where $M_{\ell} = \mathrm{norm}((D\, (M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(\mathbf{1} + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. The operator $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^{\ell}_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the reader starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that at step $\ell + 1$ the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ by the vector of probabilities $\boldsymbol{\pi}^{\ell}_P + \mathbf{1}$, where $\mathbf{1}$ is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
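The update rule of the navigation model can be made concrete with a small pure-Python sketch. It is an illustration under assumptions, not the authors' implementation: the helper names are hypothetical, matrices are plain lists of rows, and the deflation follows Definition 1 literally, that is, the matrix used to compute step $\ell + 1$ is rescaled with $\boldsymbol{\pi}^{\ell-1}_P$ (the prose above describes the rescaling in terms of $\boldsymbol{\pi}^{\ell}_P$). Since $D$ is diagonal, $(D\,M^T)^T = M D$, so the code simply divides column $j$ of the matrix by $1 + \pi_j$.

```python
def normalize_rows(M):
    """Make every nonzero row of M sum to 1 (the norm operator)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total if total > 0 else 0.0 for x in row])
    return out

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def navigate(M0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] following Eqs. (3) and (4)."""
    pis = [vec_mat(pi0, M0)]           # Eq. (3)
    M_prev, pi_prev = M0, pi0           # M_{l-1} and pi^{l-1}
    for _ in range(1, steps):
        # M_l = norm(M_{l-1} D) with D = diag(1 + pi^{l-1})^{-1}:
        # column j shrinks according to how likely page j was already reached.
        M_l = normalize_rows(
            [[m / (1.0 + pi_prev[j]) for j, m in enumerate(row)] for row in M_prev]
        )
        pi_l = pis[-1]
        walk = vec_mat(pi_l, M_l)       # keep browsing from the current page
        restart = vec_mat(pi0, M_l)     # jump back and click from the start
        pis.append([(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)])
        M_prev, pi_prev = M_l, pi_l
    return pis

# Toy 3-page network; the reader starts on page 0.
M0 = [[0.0, 0.5, 0.5],
      [1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0]]
pi0 = [1.0, 0.0, 0.0]
mixed = navigate(M0, pi0, alpha=0.5, steps=4)
star = navigate(M0, pi0, alpha=1.0, steps=4)
```

In the toy run with $\alpha = 1$, every step only opens links from the starting page, so the probability of landing back on page 0 stays at 0; with $\alpha = 0.5$, part of the probability mass flows back to page 0 through pages 1 and 2.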
Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like and random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.
Overall, as we will see later in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists in just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), the user sequentially clicks links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.
Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume that the jumping-back action consists in clicking the back button of the browser in use until reaching the session's starting page again. The restart parameter indirectly embeds the back-button action, which, given the absence of back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those described in [43, 45].
4.2 Exposure Metrics
At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.
[Figure 5 plot: boxplots of Log(Pageviews) (y-axis, ticks 0, 5, 10) for each partition: Pro-life, Pro-choice; Prohibition, Activism; Control, Rights; Creationism, Evol. Bio.; Racism, Anti-racism; Discrimination, Support.]
Figure 5: Pageviews distribution. For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions, except for abortion, are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^{\ell}_P)_j \qquad (5)$$
and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M^u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_{P}$ and sum up the probabilities of the nodes within each set.
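The multi-set extension can be sketched as follows: the reach vector is computed once and then summed separately over each destination set. The helper names `reach_distribution` and `exposure_per_set`, and the 5-page toy graph, are hypothetical and only illustrate the bookkeeping.

```python
def reach_distribution(transition, source, steps):
    """Reach vector after `steps` clicks from a uniform start in `source`."""
    n = len(transition)
    pi = [1.0 / len(source) if i in source else 0.0 for i in range(n)]
    for _ in range(steps):
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi


def exposure_per_set(pi, target_sets):
    """Sum the reach probabilities inside each named destination set."""
    return {name: sum(pi[j] for j in nodes) for name, nodes in target_sets.items()}


# Hypothetical graph: P = {0, 1}, Q = {2}, Z = {3}, L = {4}.
M5 = [
    [0.0, 0.5, 0.3, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.1, 0.4],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
]
pi_1 = reach_distribution(M5, {0, 1}, steps=1)
per_set = exposure_per_set(pi_1, {"Q": {2}, "Z": {3}, "L": {4}})
total = sum(per_set.values())  # exposure to the union Q, Z, L (disjoint sets)
print(per_set, total)
```

Because the three sets are disjoint here, the total exposure to their union equals the sum of the per-set exposures.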
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \qquad (6)$$
WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal
Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5). Topics, in order from the bottom: abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Gun control, Gun rights), evolution (Creationism, Evolutionary biology), LGBT (LGBT discrimination, LGBT rights), racism (Racism, Anti-racism).
If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.
This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
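A short sketch of Definition 3, including the zero convention stated above. The function name `mutual_exdin` and the example values are hypothetical.

```python
def mutual_exdin(e_p_to_pbar, e_pbar_to_p):
    """Ratio of the smaller to the larger exposure, in [0, 1].

    By convention the result is 0 when either exposure is 0.
    """
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    if low == 0:
        return 0.0
    return low / high


# Hypothetical one-click exposures in the two directions.
print(mutual_exdin(0.02, 0.05))  # unbalanced: well below 1
print(mutual_exdin(0.03, 0.03))  # perfectly balanced
```

A value near 1 indicates that moving from either set to the other is about equally likely; a value near 0 indicates that one direction dominates.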
Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{\bar{P} \to P} \approx 10 \cdot e^{\ell}_{P \to \bar{P}}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$. Then, we use the new set for all computations. Because the sampling is random, we repeat the measurements multiple times.
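The size-balancing procedure can be sketched as repeated subsampling of the larger set. The function `balanced_measure`, its `measure` callback, and the default of 100 repetitions are illustrative assumptions; any metric that takes the two (balanced) sets can be plugged in.

```python
import random
import statistics


def balanced_measure(p, p_bar, measure, repeats=100, seed=0):
    """Evaluate `measure(P, P_bar)` on size-balanced sets.

    The larger set is subsampled (without replacement) to the size of the
    smaller one; the measurement is repeated and summarized by its mean
    and standard deviation.
    """
    rng = random.Random(seed)
    p, p_bar = sorted(p), sorted(p_bar)  # sorted lists keep sampling reproducible
    size = min(len(p), len(p_bar))
    values = []
    for _ in range(repeats):
        p_s = p if len(p) == size else rng.sample(p, size)
        q_s = p_bar if len(p_bar) == size else rng.sample(p_bar, size)
        values.append(measure(set(p_s), set(q_s)))
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical usage: a dummy measure that reports the size ratio.
mean, std = balanced_measure({1, 2, 3}, set(range(10, 40)),
                             lambda a, b: len(b) / len(a), repeats=5)
print(mean, std)
```

In practice `measure` would wrap an ExDIN or M-ExDIN computation on the sampled sets.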
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed.¹⁶ Accordingly, when computing the occurrences for the transition matrix based on clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.
¹⁶ Based on the temporal graphs extracted by [11].
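The averaging over months of existence can be sketched as below. The function name and the month-keyed dictionary layout are assumptions for illustration; months in which the page or link existed but has no recorded clicks are counted as zero.

```python
def average_monthly_clicks(monthly_counts, months_existed):
    """Average clicks per month over the months in which the item existed.

    `monthly_counts` maps a month label to the observed clicks;
    `months_existed` lists the months in which the page or link existed.
    """
    if not months_existed:
        return 0.0
    total = sum(monthly_counts.get(month, 0) for month in months_existed)
    return total / len(months_existed)


# Hypothetical link that existed for three months, clicked in two of them.
counts = {"2018-01": 30, "2018-03": 60}
existed = ["2018-01", "2018-02", "2018-03"]
print(average_monthly_clicks(counts, existed))
```

Dividing by the months of existence, rather than by the whole observation window, avoids penalizing links that were added late or removed early.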
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except for abortion, the difference of the means of opinions' pageviews is statistically significant for $p < 0.05$. This finding shows that users tend to visit more pages expressing/supporting one of the two viewpoints. From a network's perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.
In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more by external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of views come from internal pages. The click stream from internal pages is broken down to see the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.
The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits is accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses, that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that
from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.
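As a rough consistency check of the guns figures quoted above, one can combine the internal and external shares. The snippet reads the numbers as 44.8% internal traffic and as ratios of 3.4 (internal) and 2.7 (external) in favor of gun control; these decimal placements are an interpretation of the text, and the helper name `minority_share` is hypothetical.

```python
def minority_share(internal_share, ratio_internal, ratio_external):
    """Overall share of visits going to the less-visited partition.

    Within each access channel, the more-visited partition receives
    `ratio` times the views of the other one, so the minority share of
    that channel is 1 / (1 + ratio).
    """
    from_internal = internal_share / (1.0 + ratio_internal)
    from_external = (1.0 - internal_share) / (1.0 + ratio_external)
    return from_internal + from_external


gun_rights_share = minority_share(0.448, 3.4, 2.7)
print(round(100 * gun_rights_share, 1))  # percentage of visits to gun rights
```

Under these assumptions the combined share comes out at roughly a quarter of all visits, in line with the 26% reported in the text given the rounding of the two ratios.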
In general, the largest amount of visits to topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia
Figure 7: Cumulative pages' traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$. Panels: (a) Abortion (Pro-life, Pro-choice); (b) Cannabis (Prohibition, Activism); (c) Guns (Control, Rights); (d) Evolution (Creationism, Evol. Bio.); (e) Racism (Racism, Anti-racism); (f) LGBT (Discrimination, Support).
Figure 8: Average click-through rate (x-axis: Avg(Click Through Rate) × 100, from 0 to 30). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
navigation. The same quantity for the remaining topics ranges between 44% and 47%.
We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.
Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52).
Topic       $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion    68.89          90.82                57.64                      70.26                89.81                            45.37
Cannabis    70.81          95.01                37.50                      65.78                96.52                            16.67
Guns        52.34          78.69                35.68                      59.63                75.35                            39.28
Evolution   71.15          84.49                64.47                      72.69                99.00                            55.13
Racism      56.36          88.41                34.32                      71.87                90.63                            65.44
LGBT        61.66          89.42                5.25                       72.42                92.52                            59.17
Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
This suggests that articles having a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
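The two quantities discussed here can be sketched in a few lines. The definition of CTR used below (clicks leaving a page divided by the page's views) is an assumption based on the description in the text, since Section 4.1.1 is outside this excerpt; the toy data are invented for illustration only.

```python
import math


def click_through_rate(outgoing_clicks, pageviews):
    """Share of page accesses that turn into a click on one of its links."""
    return outgoing_clicks / pageviews if pageviews else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical pages: number of links and (outgoing clicks, pageviews).
out_degree = [5, 12, 20, 33]
traffic = [(10, 100), (14, 100), (21, 100), (26, 100)]
ctr = [click_through_rate(clicks, views) for clicks, views in traffic]
print(round(pearson(out_degree, ctr), 2))
```

A positive coefficient on real data would indicate, as the text reports, that pages with more links tend to have a higher CTR.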
In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
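The statistic reported in Table 3 can be sketched as a per-page percentage of rarely clicked links, averaged over the pages of a partition and optionally restricted to links pointing to a given partition. The function name, the data layout (a dictionary of link click counts and a page-to-partition map), and the toy values are assumptions for illustration.

```python
from collections import defaultdict


def avg_unclicked_pct(link_clicks, partition, src_part, dst_part=None, threshold=10):
    """Average, over pages in `src_part`, of the percentage of their links
    clicked fewer than `threshold` times (optionally only links to `dst_part`)."""
    per_page = defaultdict(lambda: [0, 0])  # page -> [rarely clicked, total links]
    for (src, dst), clicks in link_clicks.items():
        if partition.get(src) != src_part:
            continue
        if dst_part is not None and partition.get(dst) != dst_part:
            continue
        per_page[src][1] += 1
        if clicks < threshold:
            per_page[src][0] += 1
    if not per_page:
        return 0.0
    shares = [low / total for low, total in per_page.values()]
    return 100.0 * sum(shares) / len(shares)


# Hypothetical pages a, b in P and x, y in P-bar (labelled "Q" here).
partition = {"a": "P", "b": "P", "x": "Q", "y": "Q"}
link_clicks = {("a", "b"): 50, ("a", "x"): 3, ("b", "a"): 2,
               ("b", "x"): 40, ("b", "y"): 0}
print(avg_unclicked_pct(link_clicks, partition, "P"))
print(avg_unclicked_pct(link_clicks, partition, "P", "P"))
print(avg_unclicked_pct(link_clicks, partition, "P", "Q"))
```

Pages without any link of the requested kind are skipped rather than counted as zero, which is one of several reasonable conventions.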
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, if we set it equal to 1, we study the exposure to diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.
For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chances of entering a page about evolutionary biology
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example, $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model (Uniform, Position, Clicks). Darker colors indicate a higher probability of being in the corresponding square in one click. Panels, from left to right: abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Control, Rights), evolution (Creationism, Evol. Bio.), racism (Racism, Anti-racism), LGBT (Discrimination, Support).
are 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.
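The ratios quoted for evolution can be checked with a few lines of arithmetic. The snippet reads the figures in the text as 5.76%, 0.18%, and 3.27% (the decimal placement is an interpretation), and it derives the evolutionary biology to creationism probability from the rounded statement "5 times larger", so the last two results are only approximate.

```python
# One-click probabilities (in %) under the uniform CwP model, as read from the text.
creationism_to_creationism = 5.76
creationism_to_evolution = 0.18
evolution_to_evolution = 3.27
# Assumption: derived from the rounded "5 times larger" statement.
evolution_to_creationism = evolution_to_evolution / 5

# How much more likely a creationism reader is to stay than to cross over.
stay_vs_cross = creationism_to_creationism / creationism_to_evolution
# Asymmetry between the two crossing directions.
asymmetry = evolution_to_creationism / creationism_to_evolution
# One-click mutual exposure (Definition 3) implied by these values.
mutual = min(creationism_to_evolution, evolution_to_creationism) / max(
    creationism_to_evolution, evolution_to_creationism)

print(round(stay_vs_cross), round(asymmetry, 1), round(mutual, 2))
```

The first ratio reproduces the "32 times smaller" statement, and the second is consistent with crossing from evolutionary biology to creationism being roughly three times more likely than the reverse.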
It is worth pointing out that the current network's topology not only nudges users into reading more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users from creationism of reading about evolutionary biology. After repeating the same analysis for all the topics, we realize that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates users' exploration of knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
The findings above report the intrinsic capability of the network to spur users towards diverse content. If we want to combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click with higher chances links at the top of the page, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more significant role, worsening the user exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, has a 30% drop w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page are less likely to read about contrasting opinions.
Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% with the uniform model to 11.57%.
For all the topics, we verify a significant increment of the probability of visiting pages of the same opinion. Simply interpreting this result, we can say that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark saying that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of lack of information, we cannot answer this question, but we hope it will be addressed in future works.
Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.
Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools making them aware of the effect that new/current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to prevent this property from becoming harmful when it prevents users from accessing diverse content.
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$ (x-axis: Number of Clicks). The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter (0, 0.2, 1). The markers' shape indicates the CwP model in use (uniform, position, clicks). Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2). Panels, from left to right: Abortion, Cannabis, Guns, Evolution, Racism, LGBT.
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. According to the combinations of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.
In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can notice that each of the topics shows its own trends. For this reason, we decide to highlight and provide an explanation for the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to higher exposure is $M_c$, followed by $M_u$ and $M_p$.
(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because this kind of user is only subject to the exposure of their starting navigation page. So, the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then, the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and settles below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.
(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with higher exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as the users navigate in a random fashion, this probability drops. This is because the gun rights partition has few in-going edges, which prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like behaving users. The probability of reaching gun control after many
steps becomes high because of the larger number of incoming edges of the partition.
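The qualitative behaviors in points (1)–(4) can be explored with a simplified restart model. The navigation model itself is defined in Section 4.1, outside this excerpt, so the update rule below is an assumption: at each step after the first, with probability `alpha` the reader jumps back to the starting page and clicks from there, and with probability `1 - alpha` the reader clicks from the current page. The function name and the toy matrix are hypothetical.

```python
def exdin_curve(transition, source, target, steps, alpha):
    """Exposure of `source` to `target` at clicks 1..steps under an
    assumed restart model (alpha = 0: random walk, alpha = 1: star-like)."""
    n = len(transition)
    start = [1.0 / len(source) if i in source else 0.0 for i in range(n)]

    def click(vec):
        # One click from the distribution `vec`.
        return [sum(vec[i] * transition[i][j] for i in range(n)) for j in range(n)]

    from_start = click(start)  # distribution after one click from the start page
    pi = from_start
    curve = [sum(pi[j] for j in target)]
    for _ in range(steps - 1):
        walked = click(pi)
        # Mix "restart and click" with "keep walking".
        pi = [alpha * s + (1.0 - alpha) * w for s, w in zip(from_start, walked)]
        curve.append(sum(pi[j] for j in target))
    return curve


# Hypothetical 4-page graph: pages 0-1 form P, pages 2-3 form P-bar.
M = [
    [0.0, 0.9, 0.1, 0.0],
    [0.8, 0.0, 0.0, 0.2],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
for alpha in (0.0, 0.2, 1.0):
    print(alpha, [round(v, 3) for v in exdin_curve(M, {0, 1}, {2, 3}, 5, alpha)])
```

In this simplified model the star-like curve (`alpha = 1`) is exactly flat, whereas the text reports slight variations in the first iterations, so the sketch should be read as an approximation of the reported behavior rather than a reproduction of it.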
From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph is not such as to keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion at the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand if users reading about different opinions have equal chances of visiting each other. We recall that, to do so, we use the mutual exposure to diverse information; see Sect. 4.2. For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios determining this phenomenon may be: (1) the knowledge on the encyclopedia is complete, but articles are underlinked [49], that is, there is the content and there are keywords to become anchors; thus, we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to; thus, the addition of complementary content may be necessary. An in-depth investigation of these conditions may be an interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning over sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the overall users' topics consumption. Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially higher tendency, albeit small in absolute terms, to visit pages of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of those who only do one click.
In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, an analysis which should also include the content of the articles along with the link information. Secondly, a longitudinal study with the goal of understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools to help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step for finding ways to attenuate it, and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1 Readers Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2 Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
| Topic | $\lvert V \setminus s \rvert$ | $\lvert P \rvert$ | $\lvert \bar{P} \rvert$ | $\lvert \mathcal{N} \rvert$ | $\lvert E \rvert$ | $\lvert E_{P \to \bar{P}} \rvert$ | $\lvert E_{\bar{P} \to P} \rvert$ | $\lvert E_{P \to \mathcal{N}} \rvert$ | $\lvert E_{\bar{P} \to \mathcal{N}} \rvert$ | $\lvert E_{\mathcal{N} \to P} \rvert$ | $\lvert E_{\mathcal{N} \to \bar{P}} \rvert$ | $Q(P, \bar{P})$ | Unreach($P$) (%) | Unreach($\bar{P}$) (%) |
| Abortion | 56861 | 481 | 291 | 56093 | 1.9M | 205 | 97 | 21843 | 14492 | 21396 | 29889 | 0.29 (0.30, 0.41) | 2.1 | 4.81 |
| Cannabis | 32743 | 45 | 231 | 32470 | 1.1M | 8 | 6 | 1089 | 15055 | 656 | 27823 | 0.27 (0.14, 0.03) | 11.36 | 3.49 |
| Guns | 65743 | 167 | 187 | 65393 | 2.5M | 98 | 115 | 18342 | 12304 | 56702 | 16608 | 0.26 (0.24, 0.30) | 3.63 | 0.00 |
| Evolution | 84788 | 342 | 1334 | 83113 | 1.99M | 391 | 135 | 18289 | 45472 | 15601 | 58720 | 0.20 (0.22, 0.27) | 1.69 | 5.62 |
| Racism | 129963 | 1024 | 1022 | 127953 | 4.8M | 746 | 560 | 64359 | 41566 | 74354 | 58195 | 0.32 (0.21, 0.31) | 2.72 | 2.55 |
| LGBT | 150563 | 459 | 640 | 149479 | 4.6M | 195 | 143 | 28100 | 22678 | 92975 | 81706 | 0.34 (0.30, 0.13) | 2.44 | 5.35 |

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.
[Figure 2 (boxplots). x-axis: Pro-life, Pro-choice; Prohibition, Activism; Control, Rights; Creationism, Evol. Bio.; Racism, Anti-racism; Discrimination, Support. y-axis: link position (0.00 to 0.75). Legend: opposite opinion, same opinion.]

Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent the placement of links among pages belonging only to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with the $\tanh$ to similarly score links at the top of the page. The higher the value, the higher the position in the page.
$P$ and $\bar{P}$ to be disjoint, articles belonging to both "Anti-abortion movement" and "Abortion-rights movement" are assigned to $\mathcal{N}$.¹⁰ Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different kinds of entities, such as organizations, people, and events. Including a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.

Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe. Rather, they objectively talk about entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases such as articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.
3.2 General Statistics on Topics' Networks

Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia¹¹ and from other resources that indicate controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support to LGBT rights vs. opposition to LGBT rights, and racism vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).

For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.

| Topic | $P$ | $\bar{P}$ | Seed $P$ | Seed $\bar{P}$ |
| Abortion | Pro-life | Pro-choice | Anti-abortion movement | Abortion-rights movement |
| Cannabis | Prohibition | Activism | Cannabis prohibition | Cannabis activism |
| Guns | Control | Rights | Gun control advocacy groups | Gun rights advocacy groups |
| Evolution | Creationism | Evolutionary biology | Creationism | Evolutionary biology |
| Racism | Racism | Anti-racism | Racism | Anti-racism |
| LGBT | Discrimination | Support | Discrimination against LGBT people | LGBT rights movement |

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.

¹⁰ We report the size of the intersections between partitions in the next section.
¹¹ https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues
Before presenting the general statistics of the retrieved networks, we remark that when we assign the articles to partitions, we put in the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the numbers of common articles) are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, and racism 7. Because we do not remove these articles (i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider the direct connections among partitions (1 click), we discard them, since they are not explicitly categorized into one partition.
In Table 1 we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except for racism and guns. This means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, an opinion that is represented by more articles gets more chances of being randomly sampled.
The topics showing the highest unbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is generally higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ~3 times more than those in the opposite direction (391 vs. 135), and for LGBT the edges from discrimination to rights are 36% more (195 vs. 143). Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.
Above we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or into the nodes in the partitions. For instance, the number of articles about pro-choice (291) is roughly half of the nodes related to the pro-life movement (481). Although the nodes in pro-life are almost twice as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that a side of a topic is better blended into the network has implications on the readers' exposure to one of the two sides of the topic (Sect. 6).
We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1 we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, it seems that the partitions are more modular.
Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it by the $\tanh$. In this way, we have values in $[0, 1]$ and the links at the top of the list get a more similar score. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to the results in [15, 17]. If a link appears more than once, we average its position.
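The position score just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not spell out the formula for the relative rank, so the sketch assumes it is $(n - k)/n$ for the link at 0-based index $k$ among $n$ links; the function name `link_position_scores` and the toy link list are hypothetical.

```python
import math
from collections import defaultdict


def link_position_scores(links_in_order):
    """Map each link target to its tanh-rescaled position score.

    Assumption (not from the paper): the relative rank of the link at
    0-based index k in a list of n links is (n - k) / n, so the first
    link has relative rank 1 and the last one has 1 / n.
    """
    n = len(links_in_order)
    scores = defaultdict(list)
    for k, target in enumerate(links_in_order):
        relative_rank = (n - k) / n
        scores[target].append(math.tanh(relative_rank))
    # A link that appears more than once gets the average of its scores.
    return {target: sum(vals) / len(vals) for target, vals in scores.items()}


# Toy page: "A" is linked twice (at the top of the page and mid-page).
toy_scores = link_position_scores(["A", "B", "A", "C"])
```

Under this assumption the largest possible score is $\tanh(1) \approx 0.76$, and a repeated link such as "A" receives the mean of the scores of its occurrences.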
In Figure 2 we show the position distributions. According to the t-test, whose confidence level is fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the "article quality". We observe that overall, for all the topics, between 60% and 70% of the articles are classified as stub or start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.¹²
4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).

4.1 Content Consumption

To understand readers' consumption of polarizing topics, we need different modeling strategies that we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].¹³

From these data we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or it belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and 0 otherwise.

Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. The quantity $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

\[ RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1} \]
¹² https://en.wikipedia.org/wiki/Template:Grading_scheme
¹³ Description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.
Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

\[ CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2} \]

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
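To make Eqs. (1) and (2) concrete, here is a minimal Python sketch that computes both rates from a handful of click records. The record layout `(source, target, source_is_external, clicks)`, the function name `rsr_ctr`, and the toy numbers are assumptions made for illustration; they are not the schema of the Wikimedia clickstream files.

```python
from collections import defaultdict


def rsr_ctr(records):
    """Compute RSR_i and CTR_i for every page that receives visits.

    records: iterable of (source, target, source_is_external, clicks),
    a hypothetical layout where `clicks` plays the role of c_{ji}.
    """
    visits = defaultdict(int)      # c_i: all clicks arriving at page i
    external = defaultdict(int)    # clicks arriving at i from external pages
    outgoing = defaultdict(int)    # clicks leaving page i toward its links
    for source, target, is_external, clicks in records:
        visits[target] += clicks
        if is_external:
            external[target] += clicks
        else:
            outgoing[source] += clicks
    rsr = {i: external[i] / c for i, c in visits.items() if c > 0}
    ctr = {i: outgoing[i] / c for i, c in visits.items() if c > 0}
    return rsr, ctr


toy_records = [
    ("search", "A", True, 60),   # 60 visits to A come from a search engine
    ("B", "A", False, 40),       # 40 visits to A come from page B
    ("A", "B", False, 25),
    ("search", "B", True, 75),
    ("A", "C", False, 5),
]
rsr, ctr = rsr_ctr(toy_records)
```

In this toy data, page A receives 100 visits, 60 of them external, and generates 30 clicks toward B and C, so its RSR is 0.6 and its CTR is 0.3.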
4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the links it contains. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. Now we propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$ and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1 such that links at the top of the page get similar scores.

The Clicks Within Pages models (CwP) are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.
(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j|i)}{\sum_{j \in N_{out}(i)} \tanh r(j|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].
(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{j \in N_{out}(i)} c_{ij}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the minimum number of times the link must be clicked to be included in the dataset [53].

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph¹⁵ (see Sect. 3.1).
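The three CwP modes can be illustrated for a single page with the following Python sketch, which builds one row of the transition matrix per model. It is a sketch under assumptions: `pos(j|i)` is taken to be 0 for the first link (so that the last link keeps a positive weight), each target is assumed to appear once in the list, and the function names and toy data are hypothetical. The handling of the super node $s$ is not covered.

```python
import math


def uniform_row(links):
    """Uniform model: every link on the page is equally likely."""
    return {j: 1.0 / len(links) for j in links}


def position_row(links):
    """Position model: weight tanh(r(j|i)) with r(j|i) = |N_out(i)| - pos(j|i).

    Assumption: pos(j|i) is 0 for the first link, so the last link keeps
    the positive weight tanh(1) instead of tanh(0) = 0.
    """
    n = len(links)
    weights = {j: math.tanh(n - pos) for pos, j in enumerate(links)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def clicks_row(links, clicks, floor=10):
    """Clicks model: click shares, with counts below `floor` raised to `floor`."""
    counts = {j: max(clicks.get(j, 0), floor) for j in links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


toy_links = ["x", "y", "z"]          # links of one page, in order of appearance
toy_clicks = {"x": 70, "y": 20}      # "z" is missing from the click data
row_u = uniform_row(toy_links)
row_p = position_row(toy_links)
row_c = clicks_row(toy_links, toy_clicks)
```

With the floor of 10 applied to the unobserved link "z", the click-based row becomes 0.7, 0.2, and 0.1, while the position-based row only mildly favors the first link because $\tanh$ saturates quickly.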
¹⁴ We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.
¹⁵ The computation of these quantities is straightforward, so we omit it from the body of the paper.

[Figure 3 (diagram). A page $i$ receives visits from external and internal pages and sends clicks to internal pages.]

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses to a page turn into a click toward another article.

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move while fulfilling their own information needs. Thus, we want to understand how different users' navigation behaviors can affect readers' exposure to diverse information.

To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.
In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $\{X^{\ell}, \ell = 0, 1, \dots, L\}$ on the set of nodes $V$ induced by the transition matrix $M$, with starting state $X^{0}$ selected from the probability distribution $\boldsymbol{\pi}^{0}_{P} = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^{\ell} = j) = (\boldsymbol{\pi}^{\ell}_{P})_j$, where the (row) vector $\boldsymbol{\pi}^{\ell}_{P}$ is given by the following variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M^{0}$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^{0}_{P}$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

\[ \boldsymbol{\pi}^{1}_{P} = \boldsymbol{\pi}^{0}_{P} \cdot M^{0} \tag{3} \]

and, for $\ell \ge 1$,

\[ \boldsymbol{\pi}^{\ell+1}_{P} = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_{P} \cdot M^{\ell} + \alpha\, (\boldsymbol{\pi}^{0}_{P} \cdot M^{\ell}), \tag{4} \]

where $M^{\ell} = \mathrm{norm}((D (M^{\ell-1})^{T})^{T})$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_{P})^{-1}$. The operator $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^{\ell}_{P}$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the walk starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that at step $\ell + 1$ the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ by the vector of probabilities $\boldsymbol{\pi}^{\ell}_{P} + 1$, where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
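The iteration of Definition 1 can be sketched in plain Python as follows. This is a minimal illustration, assuming a dense matrix stored as nested lists and a toy three-page graph without dangling pages; the helper names (`normalize_rows`, `vec_mat`, `navigate`) and the toy data are hypothetical and are not taken from the authors' code.

```python
def normalize_rows(M):
    """Rescale each row to sum to 1 (rows that sum to 0 are left unchanged)."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def navigate(M0, pi0, alpha, L):
    """Return [pi^1, ..., pi^L] following Eqs. (3) and (4)."""
    n = len(M0)
    M = normalize_rows(M0)
    prev, cur = list(pi0), vec_mat(pi0, M)      # pi^0 and pi^1 (Eq. 3)
    history = [cur]
    for _ in range(1, L):
        # M^l: divide column j by (1 + pi^{l-1}_j), then renormalize the rows.
        M = normalize_rows(
            [[M[i][j] / (1.0 + prev[j]) for j in range(n)] for i in range(n)]
        )
        walk = vec_mat(cur, M)                  # pi^l . M^l
        restart = vec_mat(pi0, M)               # pi^0 . M^l
        nxt = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]
        prev, cur = cur, nxt
        history.append(cur)
    return history


toy_M0 = [[0.0, 0.5, 0.5],
          [1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0]]
toy_pi0 = [1.0, 0.0, 0.0]                       # the session starts on page 0
random_nav = navigate(toy_M0, toy_pi0, alpha=0.0, L=3)
star_like = navigate(toy_M0, toy_pi0, alpha=1.0, L=3)
```

With `alpha=1.0` every step is a fresh click from the starting page, so in this toy graph the walker never sits on page 0 itself; with `alpha=0.0` the walk follows the links sequentially and can return to page 0 through the other pages.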
[Figure 4 (diagrams). Panels: (a) Star-like ($\alpha = 1$); (b) Star-like Rand. Navigation ($0 < \alpha < 1$); (c) Random Navigation ($\alpha = 0$).]

Figure 4: Navigation model for different $\alpha$. The green nodes represent the starting navigation pages.
Overall, as we will see later in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists in just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), the user sequentially clicks links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.
Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist in clicking the back button of the browser in use until reaching again the starting page of the session. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].
4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.
[Figure 5 (boxplots). x-axis: Pro-life, Pro-choice; Prohibition, Activism; Control, Rights; Creationism, Evol. Bio.; Racism, Anti-racism; Discrimination, Support. y-axis: Log(Pageviews).]

Figure 5: Pageviews distribution. For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics in order are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_{P}$ be the vector indicating for each article the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

\[ e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^{\ell}_{P})_j \tag{5} \]

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M^u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_{P}$ and sum up the probabilities of the nodes within each set.
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \qquad (6)$$
WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal
[Figure 6: horizontal bar chart, one bar per partition (Pro-life, Pro-choice; Prohibition, Activism; Gun control, Gun rights; Creationism, Evolutionary biology; LGBT discrimination, LGBT rights; Racism, Anti-racism); x-axis: % of pageviews from opposite partition, 0.0 to 2.5.]
Figure 6: Percentage of pageviews coming from the opposite side. Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$. This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network with respect to two sets of nodes.
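Definition 3 translates almost directly into code. The sketch below is only an illustration (the function name `mutual_exdin` is hypothetical); it takes the two directional exposures as plain numbers and applies the convention that the mutual exposure is 0 whenever either of them is 0.

```python
def mutual_exdin(e_p_to_pbar, e_pbar_to_p):
    """Mutual exposure (cf. Eq. 6): ratio of the smaller to the larger exposure."""
    if e_p_to_pbar < 0 or e_pbar_to_p < 0:
        raise ValueError("exposures are probabilities and cannot be negative")
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    if low == 0:
        # Convention from the text: if either exposure is 0, the result is 0.
        return 0.0
    return low / high
```

A value of 1 means both directions are equally likely; a value near 0 means one direction dominates.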
Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$, and we use the new set for all computations. Because of the randomness of the sampling, we repeat the measurements multiple times.
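The size-balancing step described above can be sketched as follows. This is a hedged illustration, not the authors' code: the helper names, the `metric` callback, and the default of 100 repetitions are assumptions, and the text does not specify how links touching the discarded pages are handled.

```python
import random
import statistics


def balance(p, p_bar, rng):
    """Down-sample the larger set so that both sets have the same size."""
    p, p_bar = sorted(p), sorted(p_bar)  # sorted only to make sampling reproducible
    size = min(len(p), len(p_bar))
    return set(rng.sample(p, size)), set(rng.sample(p_bar, size))


def repeated_metric(p, p_bar, metric, repeats=100, seed=0):
    """Average a metric over several random balanced samples.

    metric: callable taking (sampled_p, sampled_p_bar) and returning a float,
            for example a wrapper around an ExDIN or M-ExDIN computation.
    Returns the mean and the (population) standard deviation of the values.
    """
    rng = random.Random(seed)
    values = [metric(*balance(p, p_bar, rng)) for _ in range(repeats)]
    return statistics.mean(values), statistics.pstdev(values)
```

Reporting the standard deviation alongside the mean makes the variability introduced by the sampling visible.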
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed (footnote 16: based on the temporal graphs extracted by [11]). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.
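The averaging rule above (total clicks divided by the number of months in which the page or link existed) can be sketched as below. The month keys and the function name are hypothetical, and treating a month with no recorded clicks as zero is an assumption.

```python
def average_monthly_count(monthly_counts, months_existing):
    """Average a page's (or link's) clicks over the months in which it existed.

    monthly_counts:  dict month -> observed count; a missing month counts as 0.
    months_existing: iterable of months in which the page or link existed.
    """
    months = list(months_existing)
    if not months:
        return 0.0
    return sum(monthly_counts.get(month, 0) for month in months) / len(months)
```

A link that existed for only 2 of the observed months is therefore averaged over 2 months rather than over the whole observation window.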
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference between the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing/supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.
In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
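The share of a partition's pageviews that arrives from the opposite partition can be computed from clickstream rows as in the sketch below. The `(source, destination, count)` triple format is a simplification chosen for illustration, and taking all recorded visits to the partition (internal and external) as the denominator is an assumption.

```python
def opposite_share(clickstream, partition, opposite):
    """Percentage of visits to `partition` whose source page is in `opposite`.

    clickstream: iterable of (source, destination, count) triples; `source`
                 may also be a label for an external referrer.
    """
    total = 0
    from_opposite = 0
    for source, destination, count in clickstream:
        if destination in partition:
            total += count
            if source in opposite:
                from_opposite += count
    return 100.0 * from_opposite / total if total else 0.0
```

Rows whose destination lies outside the partition are ignored, so the same clickstream can be reused for both directions by swapping the two sets.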
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages toward the two partitions.
Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow toward gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that most of the traffic from external websites also goes toward gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.
The abundance of traffic toward one of the two opinions does not characterize only the guns topic. Indeed, across all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses; that is, they both point more toward the same view of the topic. For both internal and external views, the distribution of accesses toward the partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.
In general, the largest amount of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.
[Figure 7: six flow diagrams from External/Internal sources to the two partitions: (a) Abortion (Pro-life, Pro-choice), (b) Cannabis (Prohibition, Activism), (c) Guns (Control, Rights), (d) Evolution (Creationism, Evol. Bio.), (e) Racism (Racism, Anti-racism), (f) LGBT (Discrimination, Support).]
Figure 7: Cumulative pages' traffic. Each plot indicates (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.
[Figure 8: horizontal bar chart, one bar per partition; x-axis: Avg(Click-Through Rate) × 100, 0 to 30.]
Figure 8: Average click-through rate. In this plot we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. It is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the mainstream public debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.
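As a rough illustration of the quantity plotted in Figure 8, the sketch below averages a per-page click-through rate over a set of pages. It assumes that a page's CTR is its number of outgoing internal clicks divided by its pageviews; the precise definition is given in Sect. 4.1.1, and the data layout and function names here are hypothetical.

```python
def click_through_rate(pageviews, outgoing_clicks):
    """Fraction of visits to a page that continue with a click on one of its links."""
    if pageviews <= 0:
        return 0.0
    return outgoing_clicks / pageviews


def average_ctr(pages):
    """Average CTR over a set of pages.

    pages: dict page -> (pageviews, outgoing_clicks)
    """
    rates = [click_through_rate(views, clicks) for views, clicks in pages.values()]
    return sum(rates) / len(rates) if rates else 0.0
```

Averaging per-page rates gives every page the same weight; pooling all clicks and views before dividing would instead weight heavily visited pages more.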
Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.

Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89         90.82               57.64                     70.26               89.81                           45.37
Cannabis   70.81         95.01               37.50                     65.78               96.52                           16.67
Guns       52.34         78.69               35.68                     59.63               75.35                           39.28
Evolution  71.15         84.49               64.47                     72.69               99.00                           55.13
Racism     56.36         88.41               34.32                     71.87               90.63                           65.44
LGBT       61.66         89.42               52.5                      72.42               92.52                           59.17

Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
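The percentages of rarely clicked links discussed above could be computed along the following lines. This is a sketch under stated assumptions: the per-page dictionary of link click totals, the function names, and the choice to skip pages that have no link toward the requested target set are all hypothetical.

```python
def rarely_clicked_share(link_clicks, threshold=10):
    """Percentage of a page's links clicked fewer than `threshold` times.

    link_clicks: dict target_page -> total clicks on the link to that page.
    """
    if not link_clicks:
        return 0.0
    rare = sum(1 for clicks in link_clicks.values() if clicks < threshold)
    return 100.0 * rare / len(link_clicks)


def average_share(pages, targets=None, threshold=10):
    """Average the per-page share over a set of pages.

    pages:   dict page -> {target_page: total clicks}
    targets: optional set restricting the links to those pointing into it.
    """
    shares = []
    for links in pages.values():
        if targets is not None:
            links = {t: n for t, n in links.items() if t in targets}
        if links:  # pages with no link toward `targets` are skipped
            shares.append(rarely_clicked_share(links, threshold))
    return sum(shares) / len(shares) if shares else 0.0
```

Calling `average_share` once without `targets` and once per destination partition yields the three kinds of figures quoted in the example.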
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, setting it to 1 means that we study the exposure to diversity over one-click sessions.
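For the one-click case, the computation reduces to averaging, over the pages of the source set, the fraction of each page's links that point into the target set. The sketch below assumes that the uniform CwP model assigns equal probability to every link occurrence on a page; the adjacency-list format and the function names are hypothetical.

```python
def uniform_click_model(links):
    """Build click probabilities in which every link on a page is equally likely.

    links: dict page -> iterable of linked pages (repeated links count twice).
    """
    model = {}
    for page, targets in links.items():
        targets = list(targets)
        if not targets:
            continue  # a page without links cannot be left by clicking
        weight = 1.0 / len(targets)
        row = {}
        for target in targets:
            row[target] = row.get(target, 0.0) + weight
        model[page] = row
    return model


def one_click_exposure(model, source, target):
    """Probability of landing in `target` after one click from a random `source` page."""
    total = 0.0
    for page in source:
        row = model.get(page, {})
        total += sum(prob for nxt, prob in row.items() if nxt in target)
    return total / len(source)
```

Because most links on a real article point outside the topic, both the within-set and the across-set probabilities obtained this way can be small.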
Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.
For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. In contrast, the chance of entering a page about evolutionary biology is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.
[Figure 9: three rows of 2×2 heatmaps (rows: Uniform, Position, Clicks CwP models), one column per topic: Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support.]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.
It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
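The ratios quoted for evolution can be checked with a few lines of arithmetic. The sketch below uses the one-click probabilities for the uniform model: 5.76%, 0.18%, and 3.27% are stated in the text, while 0.61% (evolutionary biology to creationism) is the value read from the uniform row of Figure 9. The resulting mutual exposure is only indicative, since it applies Eq. (6) directly to these rounded figures.

```python
# One-click probabilities (in %) for evolution under the uniform CwP model.
creationism_to_creationism = 5.76
creationism_to_evolution = 0.18
evolution_to_evolution = 3.27
evolution_to_creationism = 0.61  # read from Figure 9, not stated in the prose

# Staying within the same view versus crossing to the other one.
stay_vs_cross_creationism = creationism_to_creationism / creationism_to_evolution
stay_vs_cross_evolution = evolution_to_evolution / evolution_to_creationism

# Asymmetry between the two crossing directions.
crossing_asymmetry = evolution_to_creationism / creationism_to_evolution

# Mutual exposure: smaller crossing probability over the larger one.
crossings = (creationism_to_evolution, evolution_to_creationism)
mutual_exposure = min(crossings) / max(crossings)

print(round(stay_vs_cross_creationism, 1))  # 32.0
print(round(stay_vs_cross_evolution, 1))    # 5.4
print(round(crossing_asymmetry, 1))         # 3.4
print(round(mutual_exposure, 2))            # 0.3
```

These values match the "32 times", "5 times", and "3 times" statements in the text and give a mutual exposure of roughly 0.3, well below the balanced value of 1.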
The findings above report the intrinsic capability of the network to spur users toward diverse content. If we want to combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click links at the top of the page with higher probability, the probabilities are only slightly modified with respect to the uniform model. These modest variations are consistent with the links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, has a 30% drop with respect to the probability observed using the uniform CwP model. Therefore, we conclude that for some topics the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located toward the beginning of a page have fewer opportunities to read about contrasting opinions.
Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% with the uniform model to 11.57%.
For all the topics, we verify a significant increment of the probability of visiting pages of the same opinion. Interpreting this result simply, we can say that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work. Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.
Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users' exposure to diverse information. This is not meant to alter this core and essential intrinsic property of Wikipedia, but rather to keep the property from becoming harmful when it prevents users from accessing diverse content.
[Figure 10: three rows of line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT); x-axis: Number of Clicks, 1 to 15; legend: uniform, position, and clicks CwP models, each for alpha = 0, alpha = 0.2, and alpha = 1.]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.
In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We notice that each of the topics shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.
(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these kinds of users are subject only to the exposure of their starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks the ExDIN is lower than at the beginning of the session. Then the trend inverts: in one case, the ExDIN reaches or exceeds the starting exposure; in the other, it grows and becomes steady below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (see the ExDIN from $P$ to $\bar{P}$ of LGBT) the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.
(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
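To make the role of the restart parameter concrete, the sketch below implements one simplified reading of it; it is an assumption made for illustration and not the Navigation model defined earlier in the paper. Here, at every click after the first, the reader clicks from the starting page with probability $\alpha$ and from the current page with probability $1 - \alpha$. The function names and the dictionary-based click model are hypothetical.

```python
def step(dist, model):
    """Push a probability distribution over pages through one click."""
    nxt = {}
    for page, mass in dist.items():
        for target, prob in model.get(page, {}).items():
            nxt[target] = nxt.get(target, 0.0) + mass * prob
    return nxt


def exposure_curve(model, source, target, alpha, steps):
    """Exposure of `source` to `target` at clicks 1..steps under restarts.

    Assumed dynamics (not the paper's exact model): with probability `alpha`
    the next click is made from the session's starting page, otherwise from
    the page the reader is currently on.
    """
    curve = [0.0] * steps
    for start in source:
        first = step({start: 1.0}, model)  # distribution after one click
        dist = first
        for click in range(steps):
            if click > 0:
                walked = step(dist, model)
                pages = set(walked) | set(first)
                dist = {
                    p: alpha * first.get(p, 0.0) + (1 - alpha) * walked.get(p, 0.0)
                    for p in pages
                }
            # Average over starting pages chosen uniformly in `source`.
            curve[click] += sum(m for p, m in dist.items() if p in target) / len(source)
    return curve
```

In this simplified reading, $\alpha = 1$ yields an exactly flat curve and $\alpha = 0$ a plain random walk; the slight variations reported for star-like users indicate that the actual model differs in its details.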
Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as the users navigate in a random fashion, this probability drops. This is because the gun rights partition has a number of incoming edges that is too low for random users who walk away to get back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like behaving users. The probability of reaching gun control after many steps becomes high because of the higher number of incoming edges of the partition.
From a complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph is not such as to keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are the following: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49]; that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete; that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study the users' overall topic consumption. Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially higher, albeit small, tendency to visit pages of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.
In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also include the content of the articles along with the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step toward this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."
REFERENCES[1] Lada A Adamic and Natalie Glance 2005 The political blogosphere and the 2004
US election divided they blog In Proceedings of the WWW-2005 Workshop on theWeblogging Ecosystem
[2] Leman Akoglu 2014 Quantifying political polarity based on bipartite opinion
networks In Eighth International AAAI Conference on Weblogs and Social Media[3] Sumit Asthana and Aaron Halfaker 2018 With few eyes all hoaxes are deep
Proceedings of the ACM on Human-Computer Interaction 2 CSCW (2018) 1ndash18
[4] Ivan Beschastnikh Travis Kriplean and David W McDonald 2008 Wikipedian
Self-Governance in Action Motivating the Policy Lens In ICWSM
Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE Signal Processing Magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th International Conference on World Wide Web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In Fifth International AAAI Conference on Weblogs and Social Media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki, ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on Wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for Wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track: Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A.F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1 Readers Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2 Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
that we have one of the two opinions represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random-article functionality of Wikipedia, a more represented opinion has more chances of being randomly sampled.
The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ∼3 times more than those in the opposite direction, and for LGBT the edges from discrimination to rights are 36% more. Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.
Above we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or to the nodes in the partitions. For instance, the number of articles about pro-choice (291) is half of the nodes related to the pro-life movement (481). Although the nodes in pro-life are twice as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later (Sect. 6) that the fact that a side of a topic is better blended in the network has implications for the readers' exposure to one of the two sides of the topic.
We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ between $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those between them. In Table 1 we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, the partitions appear more modular.
Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and rescale it with tanh. In this way we have values in $[0, 1]$, and the links at the top of the list get similar scores. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to the results in [15, 17]. If a link appears more than once, we average its positions.
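The position score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `link_position_scores` is hypothetical, ranks are taken to be 1-based, the rescaled quantity is the number of link occurrences minus the (averaged) rank, following the definition of $r(j|i)$ given in Sect. 4.1.2, and a repeated link has its ranks averaged before rescaling.

```python
import math
from collections import defaultdict

def link_position_scores(links_in_order):
    """Map each distinct link target to a tanh-rescaled position score in [0, 1].

    links_in_order: link targets in order of appearance (infobox links first).
    Assumption: score = tanh(n - mean_rank), with 1-based ranks and n the
    number of link occurrences in the page.
    """
    ranks = defaultdict(list)
    for rank, target in enumerate(links_in_order, start=1):
        ranks[target].append(rank)
    n = len(links_in_order)
    scores = {}
    for target, occurrences in ranks.items():
        mean_rank = sum(occurrences) / len(occurrences)  # average repeated links
        scores[target] = math.tanh(n - mean_rank)
    return scores

# Toy page: "A" appears twice, so its two ranks (1 and 4) are averaged.
example_scores = link_position_scores(["A", "B", "C", "A"])
```

Because tanh saturates quickly, links whose rescaled rank is 2 or more already score above 0.96, which is how links near the top of a long page end up with nearly identical scores.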
In Figure 2 we show the position distributions. According to the t-test, with confidence level fixed at $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from guns control to guns control is significantly higher than that of links to guns rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the "articlequality." We observe that, overall, for all the topics between 60% and 70% of the articles are classified as stub or start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.¹²
4 METRICS
In this section we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).
4.1 Content Consumption
To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.
4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].¹³
From these data we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and 0 otherwise. Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.
Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is
$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$
¹² https://en.wikipedia.org/wiki/Template:Grading_scheme
¹³ A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.
Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is
$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$
where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
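As an illustration of Eqs. (1) and (2), the following sketch computes RSR and CTR from a list of (source, target, clicks) rows. The row layout, the `EXTERNAL` label, and the function name `rsr_and_ctr` are assumptions made for this example; the actual clickstream files use their own column names and several kinds of external referrers, which would first need to be mapped onto a single external marker.

```python
from collections import defaultdict

EXTERNAL = "external"  # hypothetical label for any source outside Wikipedia

def rsr_and_ctr(click_rows):
    """Compute RSR_i (Eq. 1) and CTR_i (Eq. 2) for every visited page.

    click_rows: iterable of (source, target, clicks) tuples.
    """
    total_in = defaultdict(int)     # c_i: all clicks arriving at page i
    external_in = defaultdict(int)  # sum over j of delta_j * c_ji
    total_out = defaultdict(int)    # sum over N_out(i) of c_ij
    for source, target, clicks in click_rows:
        total_in[target] += clicks
        if source == EXTERNAL:
            external_in[target] += clicks
        else:
            total_out[source] += clicks
    rsr = {page: external_in[page] / c for page, c in total_in.items() if c > 0}
    ctr = {page: total_out[page] / c for page, c in total_in.items() if c > 0}
    return rsr, ctr

# Toy data: two pages that link to each other and also receive external visits.
rows = [
    (EXTERNAL, "A", 60),
    ("B", "A", 40),
    ("A", "B", 25),
    (EXTERNAL, "B", 75),
]
rsr, ctr = rsr_and_ctr(rows)
```

In this toy data, page A receives 100 visits, 60 of them external, and 25 of its visits turn into a click, so its RSR is 0.6 and its CTR is 0.25.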
4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the links present. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. We now propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.
The Clicks Within Pages models (CwP) are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:
(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.
(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work showing that the link position is a good predictor of the success of a link [16, 31].
(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the minimum number of times the link must be clicked to be included in the dataset [53].
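The three modes above can be sketched as functions that build one row of the transition matrix, i.e., the click distribution over the out-links of a single page. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, $pos(j|i)$ is assumed to be a 1-based rank, and the fallback to a uniform row when all weights are 0 is our own assumption (with 1-based ranks, the last link of a page has $r = 0$ and therefore weight $\tanh 0 = 0$).

```python
import math

MIN_CLICKS = 10  # threshold quoted in the text for the clickstream dataset

def normalize(weights):
    """Scale weights to sum to 1; fall back to uniform if all are 0 (assumption)."""
    total = sum(weights.values())
    if total == 0:
        return {j: 1.0 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}

def uniform_row(out_links):
    """Uniform mode: every out-link is equally likely."""
    return normalize({j: 1.0 for j in out_links})

def position_row(out_links):
    """Position mode: weight tanh(r(j|i)) with r = |N_out(i)| - pos (1-based pos)."""
    n = len(out_links)
    return normalize({j: math.tanh(n - pos)
                      for pos, j in enumerate(out_links, start=1)})

def clicks_row(out_links, clicks):
    """Clicks mode: empirical counts, with counts below 10 replaced by 10."""
    return normalize({j: max(clicks.get(j, 0), MIN_CLICKS) for j in out_links})

# Toy page with three distinct out-links, listed in order of appearance.
out_links = ["A", "B", "C"]
row_u = uniform_row(out_links)
row_p = position_row(out_links)
row_c = clicks_row(out_links, {"A": 90, "B": 3})
```

Stacking such rows for every page (plus the super node) would give the full matrix $M$ used by the navigation model.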
For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph¹⁵ (see Sect. 3.1).
4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move to fulfill their own information needs. Thus, we want to understand how different users' navigation behavior can affect readers' exposure to diverse information.
To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.
¹⁴ We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting it to any number between 1 and 10 does not affect the results.
¹⁵ The computation of these quantities is straightforward, so we omit it from the body of the paper.
Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.
In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).
We model this by considering the process $X^\ell$, $\ell = 0, 1, \dots, L$, on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have
$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0, \tag{3}$$
and for $\ell \ge 1$,
$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^\ell_P \cdot M_\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_\ell), \tag{4}$$
where $M_\ell = \mathrm{norm}((D\,(M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the reader starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that at step $\ell + 1$ the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ elementwise by the vector of probabilities $\boldsymbol{\pi}^\ell_P + 1$, where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
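To make the recursion of Definition 1 concrete, here is a minimal pure-Python sketch on dense matrices stored as lists of rows. It follows Eqs. (3) and (4) as written, using the fact that $(D\,M^T)^T = M\,D$ for a diagonal $D$, so that column $j$ of $M_{\ell-1}$ is divided by $1 + (\boldsymbol{\pi}^{\ell-1}_P)_j$. The function names and the toy graph are hypothetical, the handling of the super node is omitted, and a real topic-induced network would call for sparse matrices.

```python
def normalize_rows(matrix):
    """norm(M): rescale each row to sum to 1 (all-zero rows are left as they are)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total for x in row] if total > 0 else list(row))
    return result

def vec_mat(vector, matrix):
    """Row vector times matrix."""
    n_cols = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(n_cols)]

def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] following Eqs. (3) and (4)."""
    history = [list(pi0), vec_mat(pi0, m0)]  # pi^0 and pi^1 (Eq. 3)
    matrix = m0
    for ell in range(1, steps):
        prev = history[ell - 1]  # pi^(ell-1), used to build D
        # M_ell = norm(M_(ell-1) D): divide column j by 1 + pi_j^(ell-1)
        matrix = normalize_rows(
            [[m / (1.0 + prev[j]) for j, m in enumerate(row)] for row in matrix]
        )
        walk = vec_mat(history[ell], matrix)  # pi^ell . M_ell
        restart = vec_mat(pi0, matrix)        # pi^0 . M_ell
        history.append([(1 - alpha) * w + alpha * r
                        for w, r in zip(walk, restart)])  # Eq. 4
    return history[1:]

# Toy graph with three pages; the session starts from page 0.
m0 = [[0.0, 0.5, 0.5],
      [1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0]]
pi0 = [1.0, 0.0, 0.0]
star_like = navigate(m0, pi0, alpha=1.0, steps=4)
random_walk = navigate(m0, pi0, alpha=0.0, steps=4)
```

With $\alpha = 1$ every step is taken from the starting distribution, so in this toy graph the reader only ever reaches the two pages linked from page 0; with $\alpha = 0$ the probability mass spreads along paths of increasing length.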
Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.
Overall, as we will see later in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists of just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open articles sequentially and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), the users sequentially click links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.
Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist of clicking the back button of the browser in use until reaching the session's starting page again. The restart parameter indirectly embeds the back-button action, which, given the absence of back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those described in [43, 45].
4.2 Exposure Metrics
At this point we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.
Figure 5: Pageviews distribution (x-axis: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support; y-axis: Log(Pageviews)). For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics in order are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating for each article the probability of being reached at step $\ell$ ($\ell \ge 1$), starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \tag{5}$$
and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M^u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN with respect to each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
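Given a vector $\boldsymbol{\pi}^\ell_P$ produced by a navigation model, Eq. (5) and its multi-set extension reduce to summing entries over index sets. The sketch below uses a made-up probability vector and hypothetical node indices purely for illustration; the function name `exdin` is also an assumption.

```python
def exdin(pi, target_nodes):
    """Eq. (5): probability mass that pi places on the nodes in target_nodes."""
    return sum(pi[j] for j in target_nodes)

# Made-up pi^ell_P over five pages; node 0 stands for a page in P itself.
pi = [0.10, 0.25, 0.05, 0.40, 0.20]
Q, Z, L = {1}, {2, 3}, {4}

# Exposure to each set separately, and to their union (P-bar = Q | Z | L).
per_set = {name: exdin(pi, nodes) for name, nodes in (("Q", Q), ("Z", Z), ("L", L))}
total = exdin(pi, Q | Z | L)
```

The per-set values add up to the total exposure only when the sets are disjoint, as they are in this toy example.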
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$
Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5; rows: Pro-life/Pro-choice, Prohibition/Activism, Gun control/Gun rights, Creationism/Evolutionary biology, LGBT discrimination/LGBT rights, Racism/Anti-racism). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.
This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$. The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network with respect to two sets of nodes.
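Eq. (6), together with the convention that the value is 0 whenever one of the two exposures is 0, can be written as a small helper. The function name `mutual_exposure` and the numeric inputs are illustrative assumptions, not values from the paper.

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """Eq. (6): ratio of the smaller to the larger exposure, in [0, 1]."""
    if e_p_to_pbar == 0 or e_pbar_to_p == 0:
        return 0.0  # convention stated after Definition 3
    return min(e_p_to_pbar, e_pbar_to_p) / max(e_p_to_pbar, e_pbar_to_p)

balanced = mutual_exposure(0.3, 0.3)     # equal exposures in both directions
one_sided = mutual_exposure(0.02, 0.2)   # one direction is ten times stronger
blocked = mutual_exposure(0.0, 0.5)      # no exposure in one direction
```

Because the ratio is symmetric in its two arguments, it indicates how unbalanced the two directions are but not which direction is favored; that has to be read from the two ExDIN values themselves.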
Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{\bar{P} \to P} \approx 10 \cdot e^\ell_{P \to \bar{P}}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$. Thus, we use the new set for all computations. Because of the randomness of the sampling, we repeat the measurements multiple times.
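The size-balancing procedure can be sketched as follows. Everything here is an assumption made for illustration: the function name, the number of repetitions, the fixed seed, and in particular the `exposure(source, target)` callable, which stands in for a full ExDIN computation and is supplied by the caller. The text does not say how the repeated measurements are aggregated, so taking their mean is our choice.

```python
import random
from statistics import mean

def balanced_mutual_exposure(p, p_bar, exposure, repetitions=20, seed=0):
    """Average M-ExDIN over random subsamples of the larger set.

    exposure(source, target) is a caller-supplied (hypothetical) function that
    returns the ExDIN from the nodes in `source` to the nodes in `target`.
    """
    rng = random.Random(seed)
    small, large = sorted((list(p), list(p_bar)), key=len)
    values = []
    for _ in range(repetitions):
        sample = rng.sample(large, len(small))  # same size as the smaller set
        forward = exposure(small, sample)
        backward = exposure(sample, small)
        if forward == 0 or backward == 0:
            values.append(0.0)
        else:
            values.append(min(forward, backward) / max(forward, backward))
    return mean(values)

# Toy exposure that depends only on the size of the target set, mimicking the
# size effect discussed in the text.
def toy_exposure(source, target):
    return len(target) / 100

result = balanced_mutual_exposure(range(5), range(100, 150), toy_exposure)
```

With the toy exposure, the size effect alone would give a ratio of 0.1 for sets of 5 and 50 pages; after subsampling the larger set the two directions are measured on equally sized sets.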
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed.¹⁶ Accordingly, when computing the occurrences for the transition matrix based on clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.
¹⁶ Based on the temporal graphs extracted by [11].
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5 we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that for all topics except abortion the difference of the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.
In Figure 6 we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As
introduced in Section 4.1.1, from the clickstream data we can
compute the RSR, which indicates whether a page is accessed more
by external sources or by navigating Wikipedia. In Figure 7, we
provide a visualization that depicts the flows of the cumulative
visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of views come from
internal pages. The clickstream from internal pages is broken down
to see the proportion of flow towards gun control and gun rights.
The internal views of gun control articles are 3.4 times those of
gun rights. We observe that, also from external websites,
most of the traffic is towards gun control (2.7 times more than gun
rights). Overall, 26% of the total visits to gun-related content is
concentrated on gun rights.
The abundance of traffic towards one of the two opinions does
not characterize only the guns topic. Indeed, among all the topics,
59–74% of visits are accumulated by one partition. Moreover,
readers' preferences appear consistent among external and internal
accesses, that is, they both point more towards the same view of
the topic. For both internal and external views, the distribution
of accesses toward partitions is approximately the same (i.e., the
percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as
that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution,
whose external visits to creationism are 45.3% lower than internal accesses.
We note that partitions with higher views are not necessarily the
biggest in the topic-induced networks.
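As a consistency check on the guns figures, the overall share of visits going to gun rights can be recomputed from the internal/external split and the two ratios. The sketch below assumes the reported values read as 44.8% internal traffic, a 3.4-fold internal ratio, and a 2.7-fold external ratio (the decimal points are an assumption); the helper `minority_share` is hypothetical.

```python
def minority_share(internal_share, internal_ratio, external_ratio):
    """Overall share of visits going to the less-visited partition.

    internal_share: fraction of all visits that come from internal pages.
    internal_ratio, external_ratio: how many times more visits the
    majority partition receives than the minority one, per traffic source.
    """
    external_share = 1.0 - internal_share
    internal_minority = 1.0 / (1.0 + internal_ratio)
    external_minority = 1.0 / (1.0 + external_ratio)
    return internal_share * internal_minority + external_share * external_minority


gun_rights_share = minority_share(0.448, 3.4, 2.7)
print(f"{gun_rights_share:.1%}")  # prints 25.1%
```

Under these assumptions the result is about 25%, close to the 26% stated in the text.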
In general, the largest share of visits to topics' articles comes
from external pages. In particular, only 23.6% and 33.5% of the traffic
to evolution and racism is generated by internal Wikipedia
navigation. The same quantity for the remaining topics ranges
between 44% and 47%.
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
[Figure 7, six flow diagrams from External/Internal sources to the two partitions: (a) Abortion (Pro-life, Pro-choice); (b) Cannabis (Prohibition, Activism); (c) Guns (Control, Rights); (d) Evolution (Creationism, Evol. Bio.); (e) Racism (Racism, Anti-racism); (f) LGBT (Discrimination, Support)]
Figure 7: Cumulative Pages' Traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.
[Figure 8, horizontal bar chart; x-axis: Avg(Click-Through Rate) × 100, from 0 to 30; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism]
Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
We point out that readers consulting articles about abortion,
cannabis, and guns are inclined toward pages conveying liberal
views on the topic. For the remaining topics, it is harder to draw
interpretations. One explanation may be
that users look for information generally less covered in the public
mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links.
We want to understand how frequently they do so. For that, we
compute the pages' average click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the
percentage of accesses turning into a visit to another page ranges
between 10% and 28%. Dimitrov and Lemmerich [14] observed that the
average CTR for the whole of Wikipedia is 12%. So most of the subsets
of pages we consider have a CTR higher than Wikipedia's average.
The CTR of gun control is the highest (28%); the pages about racism
follow with 26%. The articles that over the years have generated
less internal traffic are those about evolutionary biology and LGBT
rights.
Examining pages' connections, we found that those with higher
CTR have more links (the Pearson correlation coefficient is 0.52).
Topic     | $\bar{C}(P)$ | $\bar{C}(P \to P)$ | $\bar{C}(P \to \bar{P})$ | $\bar{C}(\bar{P})$ | $\bar{C}(\bar{P} \to \bar{P})$ | $\bar{C}(\bar{P} \to P)$
Abortion  | 68.89 | 90.82 | 57.64      | 70.26 | 89.81 | 45.37
Cannabis  | 70.81 | 95.01 | 37.50      | 65.78 | 96.52 | 16.67
Guns      | 52.34 | 78.69 | 35.68      | 59.63 | 75.35 | 39.28
Evolution | 71.15 | 84.49 | 64.47      | 72.69 | 99.00 | 55.13
Racism    | 56.36 | 88.41 | 34.32      | 71.87 | 90.63 | 65.44
LGBT      | 61.66 | 89.42 | 525 [sic]  | 72.42 | 92.52 | 59.17
Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$; $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
This suggests that articles with a higher out-degree offer more
options to users. Presumably because of this, users are more likely
to continue the exploration from those articles.
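The relation between click-through rate and out-degree can be checked with a few lines of code. The sketch below assumes that the CTR of a page is the number of clicks on its outgoing links divided by its pageviews (the formal definition is in Sect. 4.1.1); the toy numbers and the function names are made up for illustration.

```python
from math import sqrt


def click_through_rate(pageviews, outgoing_clicks):
    """Share of visits to a page that continue with a click on one of its links."""
    return outgoing_clicks / pageviews if pageviews else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy pages: (pageviews, clicks on outgoing links, number of links).
pages = [(1000, 120, 40), (500, 90, 75), (2000, 150, 20), (800, 200, 110)]
ctrs = [click_through_rate(views, clicks) for views, clicks, _ in pages]
out_degrees = [links for _, _, links in pages]
r = pearson(ctrs, out_degrees)
```

A positive `r` on real data would correspond to the reported tendency of link-rich pages to generate more onward navigation.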
In addition, we count the number of links clicked fewer than 10
times over the last three years; see Table 3. As an example, given
a page in creationism, on average 71.15% of its links have been
clicked fewer than 10 times. If we distinguish between references
to creationism and references to evolution, readers did not click
84.49% of the links pointing to creationism and 64.47% of those
pointing to evolutionary biology.
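The per-page quantity behind Table 3 can be sketched as follows. The input format (a list of destination-partition and click-count pairs for one page) and the name `rarely_clicked_share` are assumptions; the table then averages such percentages over all pages of a partition.

```python
from collections import defaultdict


def rarely_clicked_share(links, threshold=10):
    """Percentage of a page's links clicked fewer than `threshold` times.

    links: iterable of (destination_partition, total_clicks) pairs.
    Returns a dict with one entry per destination partition plus "all".
    The key "all" is reserved, so no partition should carry that name.
    """
    totals = defaultdict(int)
    rare = defaultdict(int)
    for partition, clicks in links:
        for key in ("all", partition):
            totals[key] += 1
            if clicks < threshold:
                rare[key] += 1
    return {key: 100.0 * rare[key] / totals[key] for key in totals}


# Toy page in P with two links to P and two links to the opposite side.
page_links = [("P", 0), ("P", 50), ("P_bar", 3), ("P_bar", 9)]
shares = rarely_clicked_share(page_links)
```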
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent
Wikipedia's current topology supports users in exploring diverse
facets of polarizing issues. In particular, we study (1) how readers are
locally exposed to diverse information and (2) how their exposure
to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's
topology, we compute the exposure to diverse information for $\ell = 1$
using the uniform CwP model. Recalling that $\ell$ indicates the users'
session length, if we set it equal to 1, we study the exposure to
diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for
all the topics when the CwP is $M_u$.
For instance, let evolution be the topic we analyze. If readers
start uniformly at random from a page about creationism, the
probability of visiting an article of the same partition is 5.76%. On the
contrary, the chances of entering a page about evolutionary biology
are 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology
have a 3.27% chance of reading pages conveying the same opinion.
This probability is 5 times larger than that of visiting creationism
pages.
[Figure 9, grid of 2×2 heatmaps, one column per topic (Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support) and one row per CwP model (Uniform, Position, Clicks)]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example, $P \to \bar{P}$, $\bar{P} \to P$. The y-axis indicates the source and the x-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP. Darker colors indicate a higher probability of being in the corresponding square in one click.
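One-click probabilities of this kind can be sketched on a toy graph. The formal definition of ExDIN is in Sect. 4.2; the code below only assumes that, for sessions of one click under the uniform model, the reader starts from a page chosen uniformly at random in the source partition and clicks each outgoing link with equal probability. Links leading outside both partitions count toward neither. The function name and the toy graph are hypothetical.

```python
def one_click_exposure(out_links, source, target):
    """Probability of landing in `target` after one uniform click from `source`.

    out_links: dict mapping a page to the list of pages it links to.
    source, target: collections of pages (the two partitions, or the same one).
    """
    target = set(target)
    total = 0.0
    for page in source:
        links = out_links.get(page, [])
        if links:  # pages without links contribute zero
            total += sum(1 for dst in links if dst in target) / len(links)
    return total / len(source)


# Toy graph: pages a, b form P; x, y form P_bar; z, w lie outside the topic.
out_links = {"a": ["b", "x", "z", "w"], "b": ["a", "z"], "x": ["y"], "y": ["x", "a"]}
P, P_bar = ["a", "b"], ["x", "y"]
same_side = one_click_exposure(out_links, P, P)       # 0.375
other_side = one_click_exposure(out_links, P, P_bar)  # 0.125
```

Comparing `same_side` with `other_side`, and `P → P_bar` with `P_bar → P`, mirrors the within-partition and cross-partition comparisons made for evolution.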
It is worth pointing out that the current network's topology not
only nudges users to read more about the same opinion but
also hinders them from exploring diverse content symmetrically. Indeed,
users reading about evolutionary biology have higher chances of
reading one article about creationism (3 times more) than users from
creationism have of reading about evolutionary biology. After repeating
the same analysis for all the topics, we find that the aforementioned
observations hold for most of them. Moreover, we note that
the probability of continuing the session by reading a page of the same
opinion is greater for one of the two partitions of a given topic.
Taken together, these measurements highlight that the structure
of the network makes it easier for users to explore knowledge bubbles of
homogeneous view and makes the measure of mutual exposure to
diverse information smaller than 1.
The findings above report the intrinsic capability of the network
to spur users towards diverse content. If we want to combine it
with readers' next-click choice behavior, we use the position
and clicks CwP models instead of the uniform one. We show the results
in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix
corresponding to the ExDIN computed using the position CwP model
(second row). We see that if users click links at the top of the page
with higher probability, the probabilities are only slightly modified
with respect to the uniform model. These modest variations are coherent
with the links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more
significant role, worsening the user exposure to diverse information.
Indeed, in pages about gun control, links belonging to the gun
rights partition seem to be mentioned later in the page. The
consequence is that the probability of reaching an article supporting
gun rights, starting uniformly at random from an article in gun
control, has a 30% drop with respect to the probability observed using
the uniform CwP model. Therefore, we conclude that, for some topics,
the placement of links within pages contributes to reducing the
exposure to diverse information. In other words, users who tend
to click with higher probability the links located towards the
beginning of a page have less possibility to read about contrasting
opinions.
Finally, we analyze how the phenomenon changes when we use
the clicks CwP model (third row). In this case, we assume readers
make the next-click choice similarly to past users. Going back to
evolution, we immediately observe that the probability of starting a
session in creationism and continuing to read about it after one
click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the
probability of visiting pages of the same opinion. A simple interpretation
of this result is that real users click more on the links strictly related
to the page they are reading. From another perspective, combining
this finding with the previous remark that the "topology of
the network seems to drive users to explore knowledge bubbles of
homogeneous view," we ask the reader the following question: Has
the behavior of past users been influenced by the network topology?
Unfortunately, because of a lack of information, we cannot answer
this question, but we hope it will be addressed in future work.
Furthermore, we observe that, for some topics like abortion, the
probability of reaching pro-choice articles from pro-life doubles.
This is a sign that users may be willing to explore content proposing
diverse views.
Before moving to the next section, we want to underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia's Linking Manual [49],
editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with
metrics and tools that make them aware of the effect that new/current
links have on users' exposure to diverse information. This is not
meant to alter this core and essential intrinsic property of Wikipedia,
but rather to keep this property from becoming harmful when it prevents
users from accessing diverse content.
[Figure 10, grid of line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT); x-axis: Number of Clicks (1 to 15); first and second rows: ExDIN across partitions; third row: M-ExDIN; legend: uniform, position, and clicks CwP models, each for alpha = 0, alpha = 0.2, and alpha = 1]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. Depending on the combination of models employed to
measure the ExDIN and M-ExDIN, we obtain different insights about
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can
see that each topic shows its own trends. For this reason,
we highlight and explain the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
above in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color)
show very similar trends. Depending on the respective CwP model
(marker's shape), they are shifted up or down. This implies that,
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the
magnitude of their exposure to diverse content. In general, the CwP
model (markers' shapes) corresponding to the highest exposure is $M_c$,
followed by $M_u$ and $M_p$.
(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when the probability of clicking links to the
opposite side becomes higher or lower, respectively, in the first
iterations. This happens because these kinds of users are only subject
to the exposure of their starting navigation page. So the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, $\alpha = 0$) show two trends. In both cases, after the first few
clicks, the ExDIN is lower than at the beginning of the session.
Then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and levels off below
the initial exposure. The more the destination partition is connected
to the rest of the graph, the more users randomly navigating the
network are able to reach it. Sometimes (e.g., for one of the two ExDIN
directions of LGBT), the curves start to decrease after many steps. This
happens when the pages within the destination partition have been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
$\alpha = 0.2$) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
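The three navigation behaviors in the list above ($\alpha = 0$, $\alpha = 0.2$, $\alpha = 1$) can be illustrated by propagating visit probabilities over a toy graph. This is a sketch under stated assumptions, not the paper's navigation model: we assume that, before every click, the reader jumps back to the starting page with probability `alpha` and then follows one outgoing link chosen uniformly; dead ends keep the reader in place. The function names and the graph are hypothetical.

```python
from collections import defaultdict


def visit_distributions(out_links, start, steps, alpha):
    """Return, for each click 1..steps, a dict page -> probability of being there."""
    current = {start: 1.0}
    history = []
    for _ in range(steps):
        # With probability alpha the reader returns to the start page before clicking.
        origin = defaultdict(float)
        for page, prob in current.items():
            origin[page] += (1.0 - alpha) * prob
        origin[start] += alpha * sum(current.values())
        # Follow one outgoing link chosen uniformly at random.
        nxt = defaultdict(float)
        for page, prob in origin.items():
            links = out_links.get(page, [])
            if not links:  # dead end: the reader stays on the page
                nxt[page] += prob
                continue
            for dst in links:
                nxt[dst] += prob / len(links)
        current = dict(nxt)
        history.append(current)
    return history


def exposure_per_click(out_links, start, target, steps, alpha):
    """Probability of being on a page of `target` after each click."""
    target = set(target)
    return [sum(prob for page, prob in dist.items() if page in target)
            for dist in visit_distributions(out_links, start, steps, alpha)]


# Toy graph: s and a are on one side, x and x2 on the opposite side.
graph = {"s": ["a", "x"], "a": ["s"], "x": ["x2"], "x2": ["s"]}
star_like = exposure_per_click(graph, "s", ["x", "x2"], steps=3, alpha=1.0)
random_walk = exposure_per_click(graph, "s", ["x", "x2"], steps=3, alpha=0.0)
```

On this toy graph the star-like reader has the same exposure at every click, since only the starting page's links matter, whereas the exposure of the random walker varies with the structure reached along the way.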
Given this observation, we analyze the ExDIN of guns. With the
current network's topology, starting from the gun control partition,
the users with the highest exposure to gun rights pages (2% probability)
are those characterized by a star-like behavior. As soon as the users
navigate in a random fashion, this probability drops. This is
because the gun rights partition has a number of incoming edges
that prevents random users who walk away from getting back to it. We
observe the opposite for users who start their sessions in gun rights.
Indeed, for users randomly navigating the graph, the exposure to
the gun control partition is higher than or comparable to that of
star-like behaving users. The probability of reaching gun control after many
steps becomes high because of the higher number of incoming edges of the
partition.
From a complementary analysis, we observe that, for sessions
longer than 2 page visits, the topology of the graph does not
keep random users within the knowledge bubble they accessed.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with
star-like behavior, the probabilities show slightly descending curves.
This indicates that they tend to pick pages of the same opinion
in the first iterations. Due to space constraints, figures picturing
these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's pages. We recall that, to do so, we use the mutual exposure to
diverse information; see Sect. 4.2. For all the topics, the mutual
exposure to diverse information is lower than 100, meaning that
for none of them does the network topology provide an equal
exposure across opinions. If we consider topics like abortion and
guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
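A symmetry score of this kind can be sketched in a few lines. The formal definition of the M-ExDIN is in Sect. 4.2; the code below assumes, for illustration only, that it is the ratio between the smaller and the larger of the two cross-partition exposures, which is consistent with the statement that a tenfold gap between them yields a value of about 0.1. The name `mutual_exposure` is hypothetical.

```python
def mutual_exposure(exposure_ab, exposure_ba):
    """Ratio of the smaller to the larger cross-partition exposure, in [0, 1].

    A value of 1 means both sides are equally exposed to each other;
    multiply by 100 for a percentage scale.
    """
    larger = max(exposure_ab, exposure_ba)
    if larger == 0:
        return 1.0  # no exposure in either direction: trivially symmetric
    return min(exposure_ab, exposure_ba) / larger


# One-click values quoted for evolution under the uniform model
# (creationism -> evolutionary biology: 0.18; the reverse direction: 0.61).
evolution_score = mutual_exposure(0.18, 0.61)
```

Under this assumption, the evolution example gives a score of roughly 0.30, that is, well below symmetric exposure.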
In general, if we detect low mutuality, the topology of the
network favors the exploration of one side of the topic more than the
other. Moreover, we want to stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios determining this phenomenon may be: (1) the knowledge
on the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords that could become anchors exist, so
we need a strategy to densify the network, ensuring mutual exposure
to diversity; (2) the knowledge on the encyclopedia is incomplete,
that is, there are no words to attach links to, so the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current
Wikipedia network topology assists readers in exploring opposite
stances of polarizing topics spanning over sets of articles. We
formalize the problem by introducing two metrics: the exposure to
diverse information and the mutual exposure to diverse information
(see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we study the users' overall topic consumption.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one
view of the topic. In Section 5, we show that accesses
coming from both external and internal pages suggest that
readers are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology
hinders users from symmetrically exploring diverse
content. Moreover, on average, the probability that the
network nudges users to remain in a knowledge bubble
is up to an order of magnitude higher than that of
exploring pages of contrasting opinions. In Section 6.1,
the analysis suggests that users reading about an opinion
have higher chances of continuing to explore articles of
similar views than of the opposite. Furthermore, for each of the
topics that we explored, the users of one of the two views
had a substantially greater, albeit small, tendency to visit pages
of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is
typically biased toward one opinion. In Section 6.2, we
observe that the mutual exposure to diverse information is
never achieved by users navigating completely at random.
The better one of the two opinions is connected to the rest of
the network, the more the graph nudges users toward that
opinion.
• For sessions of length > 1, the probability of reading
about the same opinion decreases for users browsing
according to the random navigation model. In Section 6.2,
results suggest that, after a few clicks, the exposure to
information of the same inclination diminishes. On the other
hand, if users explore the network with a star-like behavior,
their level of exposure to the same opinion is similar to that of
those who only do one click.
In our study, we analyze sets of articles assigned to opinions
according to editor-crafted categories [44]. Although this approach
represents a solid starting point for analysis, it can cause article
misclassification. As future work, we plan to investigate a more reliable
classification strategy to improve the accuracy of our analysis, one
that should also take into account the content of the articles along
with the link information. Second, a longitudinal
study aimed at understanding the dynamics that led
to the current state of the encyclopedia's network would provide
further understanding of the users' behavior. Finally, in light
of our findings, we deem it crucial to design tools that help editors
contextualize articles within the network, so that they are aware
of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in
multiple areas of our life, and filter bubbles aggravate this
phenomenon. Understanding better how they manifest in Wikipedia (and
other media) is a crucial first step for finding ways to attenuate it,
and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant
788893 AMDROMA (Algorithmic and Mechanism Design Research
in Online Markets) and the MIUR PRIN project ALGADIMAR
(Algorithms, Games, and Digital Markets).
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape Robert West Leila Zia and Jure Leskovec 2016 Improving
website hyperlink structure using server logs In Proceedings of the Ninth ACMInternational Conference on Web Search and Data Mining 615ndash624
[39] Tiziano Piccardi Michele Catasta Leila Zia and Robert West 2018 Structuring
Wikipedia articles with section recommendations In The 41st International ACMSIGIR Conference on Research amp Development in Information Retrieval
[40] Alessandro Piscopo and Elena Simperl 2019 What we talk about when we talk
about Wikidata quality a literature survey In Proceedings of the 15th InternationalSymposium on Open Collaboration 1ndash11
[41] Miriam Redi Besnik Fetahu Jonathan Morgan and Dario Taraborelli 2019
Citation Needed A Taxonomy and Algorithmic Assessment of Wikipediarsquos Veri-
fiability In The World Wide Web Conference[42] Manoel Horta Ribeiro Raphael Ottoni Robert West Virgiacutelio AF Almeida and
Wagner Meira Jr 2020 Auditing radicalization pathways on youtube In Pro-ceedings of the 2020 Conference on Fairness Accountability and Transparency131ndash141
[43] Aju Thalappillil Scaria Rose Marie Philip Robert West and Jure Leskovec 2014
The last click Why users give up information network navigation In Proceedingsof the 7th ACM international conference on Web search and data mining 213ndash222
[44] Feng Shi Misha Teplitskiy Eamon Duede and James A Evans 2019 The wisdom
of polarized crowds Nature human behaviour (2019)[45] Philipp Singer Florian Lemmerich Robert West Leila Zia Ellery Wulczyn
Markus Strohmaier and Jure Leskovec 2017 Why we read wikipedia In Pro-ceedings of the 26th International Conference on World Wide Web 1591ndash1600
[46] Philipp Singer Thomas Niebler Markus Strohmaier and Andreas Hotho 2013
Computing semantic relatedness from human navigational paths A case study
on Wikipedia In International Journal on Semantic Web and Information Systems9 41ndash70
[47] Claudia Wagner Eduardo Graells-Garrido David Garcia and Filippo Menczer
2016 Women through the glass ceiling gender asymmetries in Wikipedia EPJData Science (2016)
[48] Robert West and Jure Leskovec 2012 Human wayfinding in information net-
works In Proceedings of the 21st international conference on World Wide Web[49] Wikipedia [nd] Manual of StyleLinking In httpsenwikipediaorgwiki
WikipediaManual_of_StyleLinking
[50] Wikipedia [nd] Namespace In httpsenwikipediaorgwikiWikipedia
Namespace
[51] Wikipedia [nd] Neutral Point of View In httpsenwikipediaorgwiki
WikipediaNeutral_point_of_view
[52] Wikipedia [nd] Redirect In httpsenwikipediaorgwikiWikipediaRedirect
[53] Ellery Wulczyn and Dario Taraborelli 2017 Wikipedia Clickstream https
doiorg106084m9figshare1305770v22 (2017)
[54] Ellery Wulczyn Robert West Leila Zia and Jure Leskovec 2016 Growing
wikipedia across languages via recommendation In Proceedings of the 25th Inter-national Conference on World Wide Web 975ndash985
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1: Readers' Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal
Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is
$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$
where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
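As a minimal sketch of Eq. (2), assuming the clickstream counts for one page are already available as plain Python values, the CTR can be computed as follows. The function name, the dictionary layout, and the example counts are hypothetical and only serve as illustration.

```python
def click_through_rate(visits, clicks_out):
    """CTR_i = (sum of clicks leaving page i) / (visits to page i).

    visits:     c_i, number of recorded accesses to page i.
    clicks_out: dict mapping each linked page j to the click count c_ij.
    """
    if visits <= 0:
        # Assumption: a page without recorded visits gets CTR 0.
        return 0.0
    return sum(clicks_out.values()) / visits


# Hypothetical counts: 200 visits, 50 clicks spread over three links.
example_ctr = click_through_rate(200, {"A": 30, "B": 15, "C": 5})
```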
4.1.2 Model Clicks Within Pages. When readers visit a page, they can click any of the links it contains. However, depending on the information needs they want to satisfy, each link may have a different probability of being clicked [45]. We propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce
$$\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1},$$
which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.
The Clicks Within Pages (CwP) models are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:
(1) $M_u$ (Uniform), whose entry $m(i,j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.
(2) $M_p$ (Position), whose entry $m(i,j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work showing that the link position is a good predictor of the success of a link [16, 31].
(3) $M_c$ (Clicks), whose entry $m(i,j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10 (footnote 14), the minimum number of times the link must be clicked to be included in the dataset [53].
For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph (footnote 15; see Sect. 3.1).
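The three CwP models can be sketched as functions that return one row of the transition matrix, that is, the click distribution over the links of a single page. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the super node is ignored, and ranks are assumed to start at 0 for the first link so that the last link keeps a positive weight (the text does not state where ranks start).

```python
import math

def uniform_row(links):
    """M_u: every link on the page is equally likely."""
    return {j: 1.0 / len(links) for j in links}

def position_row(links):
    """M_p: `links` is ordered top to bottom.

    Assumption: pos(j|i) starts at 0, so r(j|i) = |N_out(i)| - pos(j|i)
    ranges from n (first link) down to 1 (last link).
    """
    n = len(links)
    weights = {j: math.tanh(n - pos) for pos, j in enumerate(links)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def clicks_row(links, clicks, floor=10):
    """M_c: empirical click shares; counts below `floor` are set to `floor`."""
    smoothed = {j: max(clicks.get(j, 0), floor) for j in links}
    total = sum(smoothed.values())
    return {j: c / total for j, c in smoothed.items()}

# Hypothetical page with three links; link "C" never appears in the click data.
page_links = ["A", "B", "C"]
row_u = uniform_row(page_links)
row_p = position_row(page_links)
row_c = clicks_row(page_links, {"A": 90, "B": 3})
```

Because tanh saturates quickly, the position weights of the top links are nearly identical, which matches the remark that links at the top of the page get similar scores.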
4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We could do so by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move while fulfilling their own information needs. Thus, we want to understand how different navigation behaviors can affect readers' exposure to diverse information.
To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

Footnote 14. We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

Footnote 15. The computation of these quantities is straightforward, so we omit it from the body of the paper.

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.
In other words, we want to define a stochastic process with $n+1$ states, corresponding to the $n+1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).
We model this by considering the process $\{X^\ell\}$, $\ell = 0, 1, \dots, L$, on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = ((\pi_P)_i) \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).
Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have
$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0 \tag{3}$$
and, for $\ell \ge 1$,
$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\,\boldsymbol{\pi}^\ell_P \cdot M_\ell + \alpha\,(\boldsymbol{\pi}^0_P \cdot M_\ell), \tag{4}$$
where $M_\ell = \mathrm{norm}((D\,(M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(\mathbf{1} + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.
This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the reader starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we want the nodes that are clicked with high probability at step $\ell$ to see their probability of being reached deflated at step $\ell + 1$, and those with lower probability to have more chances of being clicked. We achieve this by dividing the rows of $M$ by the vector of probabilities $\boldsymbol{\pi}^\ell_P + \mathbf{1}$, where $\mathbf{1}$ is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
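The iteration in Definition 1 can be sketched in plain Python on a small dense matrix. This is a hedged illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the matrix is a list of rows, and the deflation uses $\boldsymbol{\pi}^{\ell-1}_P$ as written in the definition. Since $D$ is diagonal, $(D M^T)^T = M D$, so the update divides column $j$ (the destination page) by $1 + \pi_j$ before renormalizing each row.

```python
def normalize_rows(M):
    """Make each row sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def navigate(M0, pi0, alpha, L):
    """Return [pi^1, ..., pi^L] following Eqs. (3) and (4)."""
    history = [list(pi0), vec_mat(pi0, M0)]          # pi^0 and pi^1
    M_prev = M0
    for ell in range(1, L):
        prev = history[ell - 1]                      # pi^(ell-1)
        # M_ell = norm(M_(ell-1) D): deflate already likely destinations.
        M_ell = normalize_rows(
            [[M_prev[i][j] / (1.0 + prev[j]) for j in range(len(prev))]
             for i in range(len(M_prev))]
        )
        walk = vec_mat(history[ell], M_ell)          # keep navigating
        restart = vec_mat(pi0, M_ell)                # jump back to the start set
        history.append([(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)])
        M_prev = M_ell
    return history[1:]

# Hypothetical 3-page graph; the reader starts on page 0.
M0 = [[0.0, 0.5, 0.5],
      [1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0]]
steps = navigate(M0, [1.0, 0.0, 0.0], alpha=0.5, L=3)
```

With `alpha=1.0` every step is taken from the starting distribution (the star-like behavior), and with `alpha=0.0` the walk never restarts.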
Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like and random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.
Overall, as we will see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists of just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), users sequentially click links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.
Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist of clicking the browser's back button until reaching the session's starting page again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those described in [43, 45].
4.2 Exposure Metrics
At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.
Figure 5: Pageviews distribution (y-axis: Log(Pageviews); groups on the x-axis: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support). For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics in order are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \tag{5}$$
and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. If we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$ and calculate the ExDIN. Otherwise, if we want the ExDIN with respect to each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
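Definition 2 and its extension to several target sets reduce to summing entries of the step distribution. The following sketch assumes that the distribution is a list indexed by node id and that the sets are Python sets of node ids; the function name and the numbers are hypothetical.

```python
def exdin(pi_step, target):
    """Exposure toward `target`: total probability mass on its nodes."""
    return sum(pi_step[j] for j in target)

# Hypothetical distribution after some click, over four nodes 0..3.
pi_step = [0.1, 0.2, 0.3, 0.4]
Q, Z, L_set = {1}, {2}, {3}

per_set = {name: exdin(pi_step, s) for name, s in (("Q", Q), ("Z", Z), ("L", L_set))}
total = exdin(pi_step, Q | Z | L_set)   # exposure to the union of the three sets
```

When the target sets are disjoint, the total exposure equals the sum of the per-set exposures.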
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$
If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can use M-ExDIN to measure the bias in the network with respect to two sets of nodes.
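A minimal sketch of Definition 3, assuming the two directional exposures have already been computed. The function name and the sample values are hypothetical.

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """M-ExDIN: ratio of the smaller to the larger directional exposure."""
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    if low == 0:
        # Covers the convention "if either exposure is 0, then epsilon = 0".
        return 0.0
    return low / high

balanced = mutual_exposure(0.02, 0.02)   # both directions equally likely
skewed = mutual_exposure(0.001, 0.01)    # one direction ten times likelier
```

A value of 1 means that both directions are equally likely, while values near 0 indicate that the topology favors one direction.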
Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{P \to \bar{P}} \approx 10 \cdot e^\ell_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced and proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$, and we use the new set for all computations. Because of the randomness of the sampling, we repeat the measurements multiple times.
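The balancing step can be sketched as a downsampling of the larger set followed by repeated measurements. The text only describes the case $|P| < |\bar{P}|$; treating the opposite case symmetrically, the number of repetitions, and averaging the results are assumptions of this sketch, and all names are hypothetical.

```python
import random

def balance_sets(P, P_bar, rng):
    """Downsample the larger set to the size of the smaller one."""
    P, P_bar = list(P), list(P_bar)
    if len(P) < len(P_bar):
        P_bar = rng.sample(P_bar, len(P))
    elif len(P_bar) < len(P):      # assumption: symmetric treatment
        P = rng.sample(P, len(P_bar))
    return P, P_bar

def repeated_measure(P, P_bar, measure, repeats=20, seed=0):
    """Average `measure(P, P_bar_sampled)` over several random samples."""
    rng = random.Random(seed)
    values = [measure(*balance_sets(P, P_bar, rng)) for _ in range(repeats)]
    return sum(values) / len(values)
```

Here `measure` stands for any function of the two balanced sets, for example one that runs the navigation model and returns ExDIN or M-ExDIN.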
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed (footnote 16). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

Footnote 16. Based on the temporal graphs extracted by [11].
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that for all topics except abortion, the difference of the means of the opinions' pageviews is statistically significant ($p < 0.05$). This finding demonstrates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network's perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.
In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more by external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to see the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times those of gun rights. We observe that also from external websites most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.
The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses; that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 453 lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.
In general, the largest amount of visits to topics' articles comes from external pages. In particular, only 23.6% and 33.5% of traffic to evolution and racism is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.
We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion (Pro-life, Pro-choice); (b) Cannabis (Prohibition, Activism); (c) Guns (Control, Rights); (d) Evolution (Creationism, Evol. Bio.); (e) Racism (Racism, Anti-racism); (f) LGBT (Discrimination, Support). Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click Through Rate) × 100, from 0 to 30; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the CTR average for the whole Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.
Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of links pointing to creationism and 64.47% of those pointing to evolutionary biology.

Table 3: Average percentage of links within pages clicked less than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

Topic      C̄(P)    C̄(P→P)   C̄(P→P̄)   C̄(P̄)    C̄(P̄→P̄)   C̄(P̄→P)
Abortion   68.89   90.82    57.64    70.26   89.81    45.37
Cannabis   70.81   95.01    37.50    65.78   96.52    16.67
Guns       52.34   78.69    35.68    59.63   75.35    39.28
Evolution  71.15   84.49    64.47    72.69   99.00    55.13
Racism     56.36   88.41    34.32    71.87   90.63    65.44
LGBT       61.66   89.42    525      72.42   92.52    59.17
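The per-page quantity behind Table 3 can be sketched as the share of a page's links whose click count falls below the threshold of 10, optionally restricted to links that point into a given partition. The function names and the example counts are hypothetical, and averaging the per-page shares over a partition is left out.

```python
def unclicked_share(link_clicks, threshold=10):
    """Percentage of a page's links clicked fewer than `threshold` times."""
    if not link_clicks:
        return 0.0  # assumption: a page without links contributes 0
    rare = sum(1 for c in link_clicks.values() if c < threshold)
    return 100.0 * rare / len(link_clicks)

def unclicked_share_toward(link_clicks, targets, threshold=10):
    """Same share, restricted to links pointing to pages in `targets`."""
    subset = {j: c for j, c in link_clicks.items() if j in targets}
    return unclicked_share(subset, threshold)

# Hypothetical page with four links and their click counts.
page = {"a": 0, "b": 3, "c": 50, "d": 12}
overall = unclicked_share(page)
toward_same = unclicked_share_toward(page, {"a", "c"})
toward_other = unclicked_share_toward(page, {"b"})
```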
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, setting it equal to 1 means that we study the exposure to diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.
For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chances of entering a page about evolutionary biology are 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example, $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis is the destination. Each row of plots corresponds to the exposure to diverse information computed for a different CwP model (Uniform, Position, Clicks), and each column to a topic (abortion, cannabis, guns, evolution, racism, LGBT). Darker colors indicate a higher probability of being in the corresponding square in one click.
It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates users' exploration of knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
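As a worked check of the evolution example, the ratios quoted above can be reproduced from the four one-click exposures under the uniform model, with $P$ = creationism and $\bar{P}$ = evolutionary biology. The values are read as percentages (5.76, 0.18, 3.27, and 0.61), which assumes that the decimal points lost in the figure labels fall where the quoted ratios suggest; the 0.61 entry is taken from the Figure 9 cell for evolutionary biology to creationism.

```latex
\[
\frac{e^{1}_{P \to P}}{e^{1}_{P \to \bar{P}}} = \frac{5.76}{0.18} = 32,
\qquad
\frac{e^{1}_{\bar{P} \to \bar{P}}}{e^{1}_{\bar{P} \to P}} = \frac{3.27}{0.61} \approx 5.4,
\qquad
\frac{e^{1}_{\bar{P} \to P}}{e^{1}_{P \to \bar{P}}} = \frac{0.61}{0.18} \approx 3.4,
\]
\[
\epsilon^{1} = \frac{\min\{0.18,\, 0.61\}}{\max\{0.18,\, 0.61\}} = \frac{0.18}{0.61} \approx 0.30 .
\]
```

Under this reading, the mutual exposure for evolution is roughly 0.3, well below the balanced value of 1.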
The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click with higher chances links at the top of the page, the probabilities are only slightly modified with respect to the uniform model. These modest variations are coherent with the placement of links within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, has a 30% drop with respect to the probability observed using the uniform CwP model. Therefore, we conclude that for some topics the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have less possibility to read about contrasting opinions.
Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the
probability of visiting pages of the same opinion. A simple interpretation of this
result is that real users more often click the links closely related
to the page they are reading. From another perspective, combining
this finding with the previous remark that the "topology of
the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, for lack of information we cannot answer
this question, but we hope it will be addressed in future work.
Furthermore, we observe that for some topics, such as abortion, the probability of reaching pro-choice articles from pro-life ones doubles. This is a sign that users may be willing to explore content presenting
diverse views.
Before moving to the next section, we underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with
metrics and tools that make them aware of the effect that new or current
links have on users' exposure to diverse information. This is not
meant to alter this core property of Wikipedia,
but rather to keep it from becoming harmful when it prevents
users from accessing diverse content.
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
[Figure 10: grid of line plots with one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT) and three rows. x-axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each for alpha = 0, alpha = 0.2, and alpha = 1.]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The marker shape indicates the CwP model in use. Higher values of the M-ExDIN reflect more symmetric exposure between opinions. We repeat the computation of the metrics 100 times and report the standard deviations to account for the randomness (see Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. Depending on the combination of models employed to
measure the ExDIN and M-ExDIN, we provide different insights about
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can
notice that each of the topics shows its own trends. For this reason,
we highlight and explain the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
earlier in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color)
show very similar trends. Depending on the respective CwP model
(marker shape), they are shifted up or down. This implies that
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the
magnitude of their exposure to diverse content. In general, the CwP
model corresponding to the highest exposure is $M_c$,
followed by $M_u$ and $M_p$.
(2) If users navigate with a star-like behavior (green, $\alpha = 1$),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when, in the first iterations, the probability of clicking links to the
opposite side becomes higher or lower, respectively.
This happens because these users are only subject
to the exposure of their starting page. So the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, $\alpha = 0$) show two trends. In both cases, after the first few
clicks the ExDIN is lower than at the beginning of the session;
then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and becomes steady below
the initial exposure. The more the destination partition is connected
to the rest of the graph, the more users randomly navigating the
network are able to reach it. Sometimes, as for one direction of the ExDIN of
LGBT, the curves start to decrease after many steps. This happens
when the pages within the destination partition have already been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
$\alpha = 0.2$) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
Given these observations, we analyze the ExDIN of guns. With the
current network's topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability)
are those characterized by a star-like behavior. As soon as the users
navigate in a random fashion, this probability drops. This is
because the guns rights partition has a number of incoming edges
that is too low for random users who walk away to get back to it. We
observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to
the guns control partition is higher than or comparable to that of star-like
users. The probability of reaching guns control after many
steps becomes high because of the higher number of incoming edges of the
partition.
From a complementary analysis, we observe that for sessions
longer than 2 page visits, the topology of the graph does not
keep random users within the knowledge bubble they accessed.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with
star-like behavior, the probabilities show slightly descending curves.
This demonstrates that they tend to pick pages of the same opinion
in the first iterations. Due to space constraints, figures depicting
these phenomena could not be included.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's pages. We recall that to do so we use the mutual exposure to
diverse information (see Sect. 4.2). For all the topics, the mutual
exposure to diverse information is lower than 100%, meaning that
for none of them does the network topology provide an equal
exposure across opinions. For topics like abortion and
guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the
network favors the exploration of one side of the topic more than the
other. Moreover, we stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios determining this phenomenon may be: (1) the knowledge
on the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords that could become anchors exist, and thus
we need a strategy to densify the network, ensuring mutual exposure
to diversity; (2) the knowledge on the encyclopedia is incomplete,
that is, there are no words to attach links to, and thus the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be an interesting direction for future work.
7 CONCLUSIONS
Our work provides a first analysis of how the current
Wikipedia network topology helps readers explore opposite
stances on polarizing topics spanning sets of articles. We
formalize the problem by introducing two metrics: the Exposure to diverse information and the Mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we study the users' overall consumption of the topics.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses
coming from both external and internal pages suggest that
readers are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse
content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1,
the analysis suggests that users reading about an opinion
have higher chances of continuing to explore articles of
similar views than of the opposite ones. Furthermore, for each of the
topics that we explored, the users of one of the two views
had a substantially greater, albeit small, tendency to visit pages
of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we
observe that the mutual exposure to diverse information is
never achieved by users navigating completely at random.
The better one of the two opinions is connected to the rest of
the network, the more the graph nudges users toward that
opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to
information of the same inclination diminishes. On the other
hand, if users explore the network with a star-like behavior,
their level of exposure to the same opinion is similar to that of users
who only do one click.
In our study, we analyze sets of articles assigned to opinions
according to categories crafted by editors [44]. Although this approach
represents a solid starting point for analysis, it can cause article
misclassification. As future work, we plan to investigate a more reliable
classification strategy to improve the accuracy of our analysis,
one that also considers the content of the articles along
with the link information. Second, a longitudinal
study aimed at understanding the dynamics that led
to the current state of the encyclopedia's network would provide
further understanding of the users' behavior. Finally, in light
of our findings, we deem it crucial to design tools that help editors
contextualize articles within the network, so that they are aware
of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in
multiple areas of our lives, and filter bubbles aggravate this
phenomenon. Understanding better how they manifest in Wikipedia (and
other media) is a crucial first step toward finding ways to attenuate them,
and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant
788893 AMDROMA "Algorithmic and Mechanism Design Research
in Online Markets" and MIUR PRIN project ALGADIMAR
"Algorithms, Games, and Digital Markets."
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia?. In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia?. In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.
Overall, as we will see in Section 4.2, this approach allows us
to investigate how the exposure to diverse information varies for
users who behave differently in terms of navigation session length
(meant as the number of clicks) and next-link choices.
Looking deeper at the model:
• When $\alpha = 1$ (Figure 4(a)), the model emulates a reader
whose navigation consists of just opening links from the
starting page. We call this behavior star-like. With this
kind of exploration, readers locally explore articles that are likely
semantically related to each other [49].
• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers
open articles sequentially and then jump back to the starting
page; (2) readers keep multiple paths open. The closer $\alpha$ is
to 1, the more users show a star-like behavior. Instead,
the closer $\alpha$ is to 0, the more users navigate in a
DFS-oriented fashion. Thus, readers move randomly
according to the CwP model and from time to time jump
back to the starting page.
• If $\alpha = 0$ (Figure 4(c)), the user sequentially clicks links, so
each click depends only on the CwP model. In this case,
especially if related articles are not densely connected, the
exploration can lead to articles less related to the starting
page, and returning to the origin by following hyperlinks may
be difficult.
Because Wikipedia does not have a button that allows readers to
go back to the previous page, we assume the jumping-back action
to consist of clicking the back button of the browser in use until
the starting page of the session is reached again. The restart parameter
indirectly embeds the back-button action, which, in the absence of
back-links on Wikipedia, cannot be tracked on the graph.
The behaviors replicated through the model recall those
described in [43, 45].
4.2 Exposure Metrics
At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the
network structure allows readers to reach one or multiple sets of
articles. To do that, we rely on both the CwP and Navigation models.
The application of the following metrics is not limited to polarizing
topics; in fact, they generalize to the analysis of any sets of
nodes in a graph. For this reason, we adopt a more general notation
in their definition.
[Figure 5: boxplots of Log(Pageviews), y-axis from 0 to 10, for the groups Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support.]
Figure 5: Pageviews distribution. For each topic we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.
Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_{P}$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is
$$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} \boldsymbol{\pi}^{\ell}_{P}(j), \tag{5}$$
where $\boldsymbol{\pi}^{\ell}_{P}(j)$ is the entry of the vector for article $j$. It describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.
We employ this metric in two ways:
(1) (Topological exposure to diverse information) If $\ell$ is 1 and
the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies
the topological property of the network to connect pages
belonging to different sets.
(2) (Readers' exposure to diverse information) For any parameter
and model that we pick, the metric tells us how the readers
characterized by the CwP and Navigation models change
their exposure to diverse information over a session (i.e.,
a sequence of clicks).
Moreover, we notice that Definition 2 can be extended to multiple
sets. Consider the case where we want to understand how one set of
nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To obtain
the total exposure to the three sets,
we define $\bar{P} = Q \cup Z \cup L$. Otherwise, to obtain the ExDIN
with respect to each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_{P}$ and sum
up the probabilities of the nodes within each set.
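The definition above can be illustrated with a short computation on a toy graph. The sketch below is not the authors' code: the names `step` and `exdin`, the dense list-of-lists transition matrix `T` (one row per page, produced by some CwP model), and especially the restart rule (before each click the reader is back on the starting page with probability `alpha`) are assumptions made here so that `alpha = 1` gives the star-like behavior and `alpha = 0` a plain random walk.

```python
def step(dist, T):
    """One click: multiply the row vector dist by the row-stochastic matrix T."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]


def exdin(T, source, targets, ell, alpha=0.0):
    """Probability of being in `targets` at click `ell` (ell >= 1),
    starting uniformly at random from a page in `source`.

    Assumed restart rule: before each click, with probability alpha the
    reader is back on the starting page; otherwise the reader stays on the
    current page. The next link is then chosen according to T.
    """
    n = len(T)
    start = [0.0] * n
    for i in source:
        start[i] = 1.0 / len(source)

    dist = start
    for _ in range(ell):
        mixed = [(1.0 - alpha) * d + alpha * s for d, s in zip(dist, start)]
        dist = step(mixed, T)
    return sum(dist[j] for j in targets)


# Toy graph with 4 pages: pages 0 and 1 form P, pages 2 and 3 form P-bar.
T = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
P, P_bar = [0, 1], [2, 3]

e_forward = exdin(T, P, P_bar, ell=1)    # exposure of P to P-bar
e_backward = exdin(T, P_bar, P, ell=1)   # exposure of P-bar to P
```

For the multi-set extension, the total exposure is obtained by passing the union of the target sets as `targets`, and the per-set exposure by calling `exdin` once per set.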
Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus,
we introduce the mutual exposure to diverse information.
Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$
and $\bar{P}$. We say that the mutual exposure between the sets is
$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$
[Figure 6: horizontal bar chart. x-axis: % of pageviews from opposite partition (0.0 to 2.5). Groups: Pro-life, Pro-choice; Prohibition, Activism; Gun control, Gun rights; Creationism, Evolutionary biology; LGBT discrimination, LGBT rights; Racism, Anti-racism.]
Figure 6: Percentage of pageviews coming from the opposite side. Topics, in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.
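Definition 3 amounts to a ratio of the smaller to the larger of the two directional exposures. A minimal sketch follows; the function name `m_exdin` is chosen here for illustration and does not come from the paper.

```python
def m_exdin(e_forward, e_backward):
    """Mutual exposure: min over max of the two directional ExDIN values.

    Returns 0.0 when either exposure is 0, as stated in the definition,
    which also avoids dividing by zero when both are 0.
    """
    if e_forward == 0 or e_backward == 0:
        return 0.0
    return min(e_forward, e_backward) / max(e_forward, e_backward)


balanced = m_exdin(0.2, 0.2)     # equal exposures in both directions
skewed = m_exdin(0.5, 0.25)      # one direction twice as likely as the other
one_way = m_exdin(0.0, 0.3)      # no flow at all in one direction
```

A value of 1 means the two directions are equally likely; values near 0 mean the topology strongly favors one direction.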
This measure quantifies to what extent the exposure to diverse
information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving
from one set to the other are. In this case, the network topology does
not favor connections from one set to the other. On the other hand,
if $\epsilon$ is close to 0, the network structure tends to favor either the
navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we
observe a tendency in the network to facilitate the exploration
from one of the sets to the other, we may say that the network
topology is biased toward a direction. Thus, we can think of using
M-ExDIN to measure the bias in the network with respect to two sets of
nodes.
Even though the mutual exposure to diverse information
captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of
comparable size, it may fail if one is much smaller than the other. For
instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both
partitions have a similar out-degree distribution, one would expect
$e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if
they have a similar in-degree distribution. For this reason, when we
compute either ExDIN or M-ExDIN, we check whether the sizes
of the communities are unbalanced, and we proceed as follows. If
$|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$,
and we use the new set for all computations. Because of the
randomness of the procedure, we repeat the measurements multiple
times.
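The size-balancing procedure can be sketched as repeated subsampling of the larger set. This is an illustration under stated assumptions rather than the paper's code: `balanced_estimate` and its parameters are hypothetical names, `measure` stands for any function of the two sets (for example an ExDIN or M-ExDIN computation), and the default of 100 repetitions follows the number reported in the caption of Figure 10.

```python
import random
import statistics


def balanced_estimate(small, large, measure, repeats=100, seed=0):
    """Mean and standard deviation of `measure` over size-balanced samples.

    At each repetition, len(small) articles are drawn without replacement
    from `large`, and measure(small, sample) is evaluated on the pair.
    Raises ValueError if `small` has more elements than `large`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(large)
    values = []
    for _ in range(repeats):
        sample = rng.sample(pool, len(small))
        values.append(measure(small, sample))
    return statistics.mean(values), statistics.pstdev(values)


# Toy check with a measure that only reports the size of the sampled set.
P = [0, 1]
P_bar = list(range(2, 22))
mean_size, std_size = balanced_estimate(
    P, P_bar, lambda s, sample: float(len(sample)), repeats=10
)
```

Reporting the standard deviation alongside the mean makes the variability introduced by the sampling visible.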
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed (see footnote 16). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.
Footnote 16: Based on the temporal graphs extracted by [11].
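The averaging described here can be sketched in a few lines. The record layout (month, source, target, clicks) and the `months_existing` lookup are assumptions made for illustration; they are not the format of the actual clickstream dumps or of the temporal graphs in [11].

```python
from collections import defaultdict

def average_monthly_clicks(records, months_existing):
    """records: iterable of (month, source, target, clicks) tuples.
    months_existing: {(source, target): number of months the link existed}.
    Returns {(source, target): average clicks per month of existence}."""
    totals = defaultdict(int)
    for _month, source, target, clicks in records:
        totals[(source, target)] += clicks
    return {
        link: total / months_existing[link]
        for link, total in totals.items()
        if months_existing.get(link, 0) > 0
    }

def transition_matrix(avg_clicks):
    """Row-normalize average clicks into per-source click probabilities."""
    row_sums = defaultdict(float)
    for (source, _target), value in avg_clicks.items():
        row_sums[source] += value
    return {
        (source, target): value / row_sums[source]
        for (source, target), value in avg_clicks.items()
        if row_sums[source] > 0
    }
```

Dividing by the months of existence rather than by the full observation window keeps links that were added late from being penalized.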
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that for all topics except abortion, the difference of the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding indicates that users tend to visit more pages expressing/supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.
In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 25. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more by external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of the views come from internal pages. The clickstream from internal pages is broken down to show the proportion of the flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that most of the traffic from external websites is also towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.
The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of the visits are accumulated by one partition. Moreover, readers' preferences appear consistent between external and internal accesses; that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward the partitions is approximately the same (i.e., the percentage of visits from external pages to $P$ (resp. $\bar{P}$) is the same as that from internal pages to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.
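As a quick consistency check of the guns figures, the reported ratios can be combined into an overall share. The sketch below assumes that "x times more" means a ratio of x between gun control and gun rights views, and that internal and external traffic together account for all visits; the helper name `share_from_ratio` is hypothetical.

```python
def share_from_ratio(ratio):
    """If one side receives `ratio` times the visits of the other side,
    the other side's share of the combined visits is 1 / (1 + ratio)."""
    return 1.0 / (1.0 + ratio)

internal_fraction = 0.448                 # views arriving from internal pages
rights_internal = share_from_ratio(3.4)   # gun rights share of internal views
rights_external = share_from_ratio(2.7)   # gun rights share of external views

# Weighted combination of the two access channels.
rights_total = (internal_fraction * rights_internal
                + (1 - internal_fraction) * rights_external)
print(f"gun rights share of all visits: {rights_total:.1%}")
```

Under these assumptions the combined share comes out at roughly one quarter, in line with the reported 26%; the small gap is plausibly due to rounding of the published ratios.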
In general, the largest amount of visits to topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
[Figure 7, flow diagrams from External/Internal sources to the two partitions. Panels: (a) Abortion: Pro-life, Pro-choice; (b) Cannabis: Prohibition, Activism; (c) Guns: Control, Rights; (d) Evolution: Creationism, Evol. Bio.; (e) Racism: Racism, Anti-racism; (f) LGBT: Discrimination, Support.]
Figure 7: Cumulative Pages' Traffic. Each plot indicates (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.
[Figure 8, bar chart. x-axis: Avg(Click Through Rate) × 100, from 0 to 30. Bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism.]
Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average click-through rate (CTR) of the pages (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated the least internal traffic are those about evolutionary biology and LGBT rights.
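A minimal sketch of the per-partition average CTR follows. The CTR definition used here (clicks leaving a page divided by the page's views) is an assumption inferred from the description "percentage of accesses turning into a visit to another page"; the formal definition is in Sect. 4.1.1 and may differ. Function and argument names are hypothetical.

```python
def click_through_rate(pageviews, outgoing_clicks):
    """Assumed CTR: share of visits to a page that are followed by a click
    on one of its links."""
    return outgoing_clicks / pageviews if pageviews > 0 else 0.0

def average_ctr(pages, pageviews, outgoing_clicks):
    """Average CTR over the pages of one partition, skipping unvisited pages."""
    rates = [
        click_through_rate(pageviews[page], outgoing_clicks.get(page, 0))
        for page in pages
        if pageviews.get(page, 0) > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0
```

Averaging per page, rather than pooling all clicks and views of a partition, keeps a few very popular pages from dominating the score.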
Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52).
Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89         90.82               57.64                     70.26               89.81                           45.37
Cannabis   70.81         95.01               37.50                     65.78               96.52                           16.67
Guns       52.34         78.69               35.68                     59.63               75.35                           39.28
Evolution  71.15         84.49               64.47                     72.69               99.00                           55.13
Racism     56.36         88.41               34.32                     71.87               90.63                           65.44
LGBT       61.66         89.42               525                       72.42               92.52                           59.17
Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
This suggests that articles having a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
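The quantities reported in Table 3 can be computed along these lines. This is a sketch under assumptions: the graph is an adjacency list, `clicks` maps a (source, target) link to its total click count, and each score is a per-page percentage averaged over the pages that have at least one qualifying link. The function name `avg_low_click` is hypothetical.

```python
def avg_low_click(graph, clicks, sources, targets=None, threshold=10):
    """Average, over pages in `sources`, of the percentage of their links
    clicked fewer than `threshold` times. If `targets` is given, only links
    pointing into `targets` are considered."""
    per_page = []
    for page in sources:
        links = [
            (page, dest)
            for dest in graph.get(page, [])
            if targets is None or dest in targets
        ]
        if links:
            low = sum(1 for link in links if clicks.get(link, 0) < threshold)
            per_page.append(100.0 * low / len(links))
    return sum(per_page) / len(per_page) if per_page else 0.0
```

Calling it with `targets=None`, with the source partition, and with the opposing partition gives the three columns reported for each side of a topic.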
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, if we set it equal to 1, we study the exposure to diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.
For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chances of entering a page about evolutionary biology are 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
[Figure 9, 2×2 heatmaps arranged in three rows (Uniform, Position, Clicks) and six columns, one per topic: Pro-life / Pro-choice; Prohibition / Activism; Control / Rights; Creationism / Evol. Bio.; Racism / Anti-racism; Discrimination / Support.]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis is the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.
It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading an article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
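The ratios quoted for evolution can be reproduced directly from the four one-click probabilities given in the text. The sketch assumes that the mutual exposure is the smaller cross-partition probability divided by the larger one; that definition is an assumption here (the formal one is in Sect. 4.2), and the helper names are hypothetical.

```python
# One-click probabilities (in percent) quoted in the text for the
# evolution topic under the uniform CwP model.
exdin = {
    ("creationism", "creationism"): 5.76,
    ("creationism", "evolutionary biology"): 0.18,
    ("evolutionary biology", "evolutionary biology"): 3.27,
    ("evolutionary biology", "creationism"): 0.61,
}

def stay_vs_leave(side, other):
    """How many times likelier a reader is to stay on `side` than to cross."""
    return exdin[(side, side)] / exdin[(side, other)]

def mutual_exposure(side_a, side_b):
    """Assumed M-ExDIN: smaller cross-partition probability over the larger."""
    a_to_b = exdin[(side_a, side_b)]
    b_to_a = exdin[(side_b, side_a)]
    return min(a_to_b, b_to_a) / max(a_to_b, b_to_a)

creationism_bubble = stay_vs_leave("creationism", "evolutionary biology")
evolution_bubble = stay_vs_leave("evolutionary biology", "creationism")
mutual = mutual_exposure("creationism", "evolutionary biology")
print(round(creationism_bubble), round(evolution_bubble, 1), round(mutual, 2))
```

The first two values (about 32 and about 5.4) match the "32 times" and "5 times" statements, and the mutual exposure of about 0.30 is the inverse of the roughly threefold asymmetry between the two crossing directions.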
The findings above report the intrinsic capability of the network to spur users towards diverse content. If we want to combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click links at the top of the page with higher probability, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more significant role, worsening the user exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that for some topics the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.
Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view", we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.
Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.
Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to avoid this property becoming harmful when it prevents users from accessing diverse content.
[Figure 10, line plots in three rows and six columns, one column per topic: Abortion, Cannabis, Guns, Evolution, Racism, LGBT. x-axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each for $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$.]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. According to the combinations of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can notice that each of the topics shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.
(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because this kind of user is only subject to the exposure of their starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and becomes steady below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT) the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.
(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as the users navigate in a random fashion, this probability drops. This is because the gun rights partition has a limited number of incoming edges, which prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like behaving users. The probability of reaching gun control after many steps becomes high because of the higher number of incoming edges of the partition.
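The role of the restart parameter $\alpha$ in the patterns above can be illustrated with a small simulation of the click distribution. This is a sketch under assumptions, not the navigation model of Sect. 4: before every click the user is assumed to return to the start page with probability `alpha`, links are chosen uniformly, and probability mass on pages without out-links is dropped. All names are hypothetical.

```python
def step_distribution(graph, dist):
    """Push probability mass one uniform click forward. Mass sitting on
    pages without out-links is dropped (the session ends there)."""
    nxt = {}
    for page, mass in dist.items():
        links = graph.get(page, [])
        for dest in links:
            nxt[dest] = nxt.get(dest, 0.0) + mass / len(links)
    return nxt

def dynamic_exposure(graph, source, target, alpha, length):
    """For sessions starting uniformly at random in `source`, return the
    probability of being in `target` after each of `length` clicks.
    alpha = 1 gives star-like navigation, alpha = 0 a plain random walk."""
    exposures = [0.0] * length
    for start in source:
        dist = {start: 1.0}
        for step in range(length):
            # Restart: with probability alpha, go back to the start page.
            mixed = {page: (1 - alpha) * mass for page, mass in dist.items()}
            mixed[start] = mixed.get(start, 0.0) + alpha
            dist = step_distribution(graph, mixed)
            in_target = sum(m for page, m in dist.items() if page in target)
            exposures[step] += in_target / len(source)
    return exposures
```

With `alpha = 1` every click is made from the start page, so the returned curve is flat, which corresponds to the steady exposure described in item (2); with `alpha = 0` the curve depends on how well the destination partition is connected to the rest of the graph.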
From a complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios determining this phenomenon may be: (1) the knowledge on the encyclopedia is complete, but articles are underlinked [49]; that is, the content and the keywords that could become anchors exist, thus we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete; that is, there are no words to attach links to, thus the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the overall users' topic consumption. Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially higher, albeit small, tendency to visit pages of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who only do one click.
In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles along with the link information. Second, a longitudinal study with the goal of understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step for finding ways to attenuate them, and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets".
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Political discourse on social media Echo chambers gate-
keepers and the price of bipartisanship In Proceedings of the 2018 World WideWeb Conference 913ndash922
[23] Kiran Garimella Aristides Gionis Nikos Parotsidis and Nikolaj Tatti 2017 Bal-
ancing information exposure in social networks In Advances in Neural Informa-tion Processing Systems 4663ndash4671
[24] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Quantifying controversy on social media ACM Transactionson Social Computing (2018)
[25] Patrick Gildersleve and Taha Yasseri 2018 Inspiration captivation and misdi-
rection Emergent properties in networks of online navigation In InternationalWorkshop on Complex Networks Springer 271ndash282
[26] Eduardo Graells-Garrido Mounia Lalmas and Filippo Menczer 2015 First
Women Second Sex Gender Bias in Wikipedia In Proceedings of the 26th ACMConference on Hypertext amp Social Media
[27] Denis Helic Markus Strohmaier Michael Granitzer and Reinhold Scherer 2013
Models of human navigation in information networks based on decentralized
search In Proceedings of the 24th ACM conference on hypertext and social media89ndash98
[28] Brian Keegan Darren Gergle and Noshir Contractor 2011 Hot off the wiki
dynamics practices and structures in Wikipediarsquos coverage of the Tohoku catas-
trophes In Proceedings of the 7th international symposium on Wikis and opencollaboration 105ndash113
[29] Tobias Koopmann Alexander Dallmann Lena Hettinger Thomas Niebler and
Andreas Hotho 2019 On the right track Analysing and predicting navigation
success in Wikipedia In Proceedings of the 30th ACM Conference on Hypertext
and Social Media 143ndash152[30] Srijan Kumar Robert West and Jure Leskovec 2016 Disinformation on the Web
Impact Characteristics and Detection of Wikipedia Hoaxes In Proceedings ofthe 25th International Conference on World Wide Web
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, 0.0 to 2.5; one bar per partition: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.
If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.
This measure quantifies to what extent the exposure to diverse
information is balanced across $P$ and $\bar{P}$.
The closer $\epsilon$ is to 1, the more balanced the probabilities of moving
from one set to the other are. In this case, the network topology does
not favor connections from one set to the other. On the other hand,
if $\epsilon$ is close to 0, the network structure tends to favor either the
navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we
observe a tendency in the network to facilitate the exploration
from one of the sets to the other, we may say that the network
topology is biased toward one direction. Thus, we can use
M-ExDIN to measure the bias in the network w.r.t. two sets of
nodes.
Even though the mutual exposure to diverse information captures
the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable
size, it may fail if one is much smaller than the other. For
instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both
partitions have a similar out-degree distribution, one would expect
$e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if
they have a similar in-degree distribution. For this reason, when we
compute either ExDIN or M-ExDIN, we check whether the sizes
of the communities are unbalanced, and we proceed as follows. If
$|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$.
We then use the new set for all computations. Because of the randomness
of this procedure, we repeat the measurements multiple
times.
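The balancing procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, `exposure` stands for any routine that returns the ExDIN from a source set to a destination set, and the mutual exposure is taken to be the ratio of the smaller to the larger exposure, a form that is consistent with the $\epsilon \approx 0.1$ example but is defined outside this section.

```python
import random

def mutual_exposure(e_ab, e_ba):
    """Assumed form of M-ExDIN: smaller exposure over larger one, 0 if either is 0."""
    if e_ab == 0 or e_ba == 0:
        return 0.0
    return min(e_ab, e_ba) / max(e_ab, e_ba)

def balanced_mutual_exposure(exposure, part_a, part_b, repeats=100, seed=0):
    """Downsample the larger partition to the size of the smaller one,
    recompute the mutual exposure, and average over several random samples.

    `exposure(source, destination)` is a hypothetical callable returning
    the ExDIN from `source` to `destination`.
    """
    rng = random.Random(seed)
    small, large = sorted((set(part_a), set(part_b)), key=len)
    scores = []
    for _ in range(repeats):
        # Sample |small| articles from the larger partition.
        sample = set(rng.sample(sorted(large), len(small)))
        scores.append(mutual_exposure(exposure(small, sample),
                                      exposure(sample, small)))
    return sum(scores) / len(scores)
```

Repeating the sampling and averaging mirrors the "repeat the measurements multiple times" step; reporting the standard deviation of `scores` would be a natural extension.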
5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content,
we investigate how they have consumed each of the six topics
that we concentrate on over the last four years. In particular, we
collect monthly clickstream data from November 2017 to September
2020. We note that when we count the click views of a page, we
consider the average over the number of months the page existed
(based on the temporal graphs extracted by [11]).
Accordingly, when computing the occurrences for the transition
matrix based on the clickstream, we consider the average clicks of the
link over the number of months it exists. In this way, we reduce
the seasonality effect and weight links according to page changes
in terms of hyperlinks.
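As a minimal sketch of this averaging step, assume each monthly clickstream dump has already been reduced to a dictionary of counts, and that the number of months each page (or link) existed is known. Both inputs and the function name are hypothetical; keys can be page titles or (source, destination) pairs.

```python
from collections import defaultdict

def average_monthly_clicks(monthly_counts, months_existing):
    """Average clicks per month of existence.

    monthly_counts: list of dicts, one per monthly dump, mapping a key
        (page or link) to its click count in that month.
    months_existing: dict mapping each key to the number of months it existed.
    """
    totals = defaultdict(int)
    for counts in monthly_counts:
        for key, n in counts.items():
            totals[key] += n
    # Divide by the months of existence, not by the number of dumps.
    return {key: total / months_existing[key]
            for key, total in totals.items()
            if months_existing.get(key, 0) > 0}
```

Dividing by the months of existence rather than by the total number of dumps avoids penalizing pages and links that were created part-way through the observation window.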
5.1 Pageviews Distribution
To start our analysis, we count the average number of times a
page has been visited over 34 months. In Figure 5 we plot the
log-distributions of the pageviews for each topic and opinion. By
running a t-test, we conclude that, for all topics except for abortion,
the difference of the means of opinions' pageviews is statistically
significant for $p < 0.05$. This finding demonstrates that users tend to
visit more pages expressing/supporting one of the two viewpoints.
From a network perspective, to increase the exposure to opposing
opinions, it is desirable for pages that are frequently visited to be
well connected to articles expressing opposing opinions.
In Figure 6 we break down the pageviews, showing how many
of them come from pages of the opposing partition. Overall, the
fraction of visits from the opposite side is low (below 0.5%). The
category LGBT rights has the highest ratio of visits from LGBT
discrimination pages, about 2.5%. For topics such as guns and abortion,
the percentage of visits from the opposite partition shows that there
are somewhat fewer visits to pages of a liberal inclination from
articles expressing a more conservative opinion. In fact, 0.28%
of visits to pro-choice come from pro-life, compared to 0.6% of visits to
pro-life from pro-choice.
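The quantity plotted in Figure 6 can be illustrated with a short Python sketch. It assumes that link-level click counts and page-level pageviews are available as dictionaries; the function and argument names are hypothetical.

```python
def opposite_share(transitions, pageviews, source, target):
    """Percentage of the pageviews of `target` pages that arrive through
    links located on `source` pages (the opposite partition).

    transitions: dict mapping (previous_page, current_page) to click counts.
    pageviews: dict mapping a page to its total pageviews.
    """
    views = sum(pageviews.get(page, 0) for page in target)
    if views == 0:
        return 0.0
    incoming = sum(n for (prev, curr), n in transitions.items()
                   if prev in source and curr in target)
    return 100.0 * incoming / views
```

For example, 5 clicks from the opposite side out of 1000 pageviews give a share of 0.5%.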
5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As
introduced in Section 4.1.1, from the clickstream data we can compute
the RSR, which indicates whether a page is accessed more
by external sources or by navigating Wikipedia. In Figure 7, we
provide a visualization that depicts the flows of the cumulative
visits from external and internal pages towards the two partitions.
Referring to Figure 7(c), 44.8% of visits come from
internal pages. The clickstream from internal pages is broken down
to see the proportion of flow towards gun control and gun rights.
The internal views of gun control articles are 3.4 times more than
those of gun rights. We observe that, also from external websites,
most of the traffic is towards gun control (2.7 times more than gun
rights). Overall, 26% of the total visits to gun-related content is
concentrated on gun rights.
The abundance of traffic towards one of the two opinions does
not characterize only the guns topic. Indeed, among all the topics,
59–74% of visits are accumulated by one partition. Moreover,
readers' preferences appear consistent among external and internal
accesses, that is, they both point more towards the same view of
the topic. For both internal and external views, the distribution
of accesses toward partitions is approximately the same (i.e., the
percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that
from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose
external visits to creationism are 45.3% lower than internal accesses.
We note that partitions with higher views are not necessarily the
biggest in the topic-induced networks.
In general, the largest amount of visits to topics' articles comes
from external pages. In particular, only 23.6% and 33.5% of the traffic
to evolution and racism is generated by internal Wikipedia
navigation. The same quantity for the remaining topics
ranges between 44% and 47%.

Figure 7: Cumulative Pages' Traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click Through Rate) × 100, 0 to 30). In this plot we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.
We point out that readers consulting articles about abortion,
cannabis, and guns are inclined toward pages conveying liberal
views on the topic. Instead, it is more complicated to draw
interpretations about the remaining topics. One explanation may be
that users look for information generally less covered in the public
mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links.
We want to understand how frequently they do so. For that, we
compute the average pages' click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the
percentage of accesses turning into a visit to another page ranges
between 10% and 28%. Dimitrov and Lemmerich [14] observed that the
average CTR for the whole Wikipedia is 12%. So most of the subsets
of pages we consider have a CTR higher than Wikipedia's average.
The CTR of gun control is the highest (28%); the pages about racism
follow with 26%. The articles that over the years have generated
the least internal traffic are those about evolutionary biology and LGBT
rights.
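A hedged sketch of the partition-level average reported in Figure 8 follows. It assumes a page's CTR is the share of its pageviews that turn into a click on one of its links, matching the description above; the exact definition is given in Sect. 4.1.1 and may differ in detail. The function and argument names are hypothetical.

```python
def average_ctr(pageviews, outgoing_clicks, pages):
    """Average click-through rate (in %) over the pages of one partition.

    pageviews: dict mapping a page to its number of visits.
    outgoing_clicks: dict mapping a page to the clicks on its outgoing links.
    """
    rates = [outgoing_clicks.get(page, 0) / pageviews[page]
             for page in pages if pageviews.get(page, 0) > 0]
    return 100.0 * sum(rates) / len(rates) if rates else 0.0
```

Averaging page-level rates, rather than dividing total clicks by total views, keeps a few very popular pages from dominating the partition's score.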
Examining pages' connections, we found that those with higher
CTR have more links (the Pearson correlation coefficient is 0.52).
This suggests that articles having higher out-degree offer more
options to users. Presumably because of this, users are more likely
to continue the exploration from those articles.

Table 3: Average % of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89         90.82               57.64                     70.26               89.81                           45.37
Cannabis   70.81         95.01               37.50                     65.78               96.52                           16.67
Guns       52.34         78.69               35.68                     59.63               75.35                           39.28
Evolution  71.15         84.49               64.47                     72.69               99.00                           55.13
Racism     56.36         88.41               34.32                     71.87               90.63                           65.44
LGBT       61.66         89.42               52.5                      72.42               92.52                           59.17
In addition, we count the number of links clicked fewer than 10
times over the last three years; see Table 3. As an example, given
a page in creationism, on average 71.15% of its links have been
clicked fewer than 10 times. If we distinguish between references
to creationism and references to evolution, readers did not click
84.49% of the links pointing to creationism and 64.47% of those
pointing to evolutionary biology.
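The statistic in Table 3 can be sketched as follows, assuming the link structure and the per-link click counts are available as dictionaries. The names and the handling of pages without qualifying links (they are skipped) are assumptions made for the illustration.

```python
def unclicked_share(links, clicks, source, target=None, threshold=10):
    """Average percentage of links on `source` pages clicked fewer than
    `threshold` times; optionally restricted to links pointing to `target`.

    links: dict mapping a page to the list of pages it links to.
    clicks: dict mapping (page, linked_page) to the number of clicks.
    """
    shares = []
    for page in source:
        out = [q for q in links.get(page, []) if target is None or q in target]
        if not out:
            continue  # no qualifying links on this page
        rare = sum(1 for q in out if clicks.get((page, q), 0) < threshold)
        shares.append(100.0 * rare / len(out))
    return sum(shares) / len(shares) if shares else 0.0
```

Calling it without `target` corresponds to the $\bar{C}(P)$ column, and passing one partition as `target` to the directed columns of the table.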
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent
Wikipedia's current topology supports users in exploring diverse
facets of polarizing issues. In particular, we study (1) how readers are
locally exposed to diverse information and (2) how their exposure
to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's
topology, we compute the exposure to diverse information for $\ell = 1$
using the uniform CwP model. Recalling that $\ell$ indicates the users'
session length, if we set it equal to 1, we study the exposure to
diversity over one-click sessions.
Plots in the first row of Figure 9 show the value of ExDIN for
all the topics when the CwP is $M_u$.

Figure 9: ExDIN. Panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; rows: uniform, position, and clicks CwP models. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis is the destination. Each row corresponds to the exposure to diverse information computed for a different CwP. Darker colors indicate a higher probability of being in the corresponding square in one click.

For instance, let evolution be the topic we analyze. If readers
start uniformly at random from a page about creationism, the probability
of visiting an article of the same partition is 5.76%. On the
contrary, the chances of entering a page about evolutionary biology
are 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology
have a 3.27% chance of reading pages conveying the same opinion.
This probability is 5 times larger than that of visiting creationism
pages.
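To make the one-click computation concrete, here is a minimal Python sketch under an assumed reading of the uniform model: the reader starts from a page of the source partition chosen uniformly at random and clicks one of its links uniformly at random, including links that leave the topic. The function name and data layout are hypothetical.

```python
def one_click_exdin(links, source, target):
    """Probability of landing in `target` after one uniformly random click,
    starting uniformly at random from a page in `source`.

    links: dict mapping a page to the list of all pages it links to
        (links leaving the topic are kept, which keeps the values small).
    """
    pages = list(source)
    total = 0.0
    for page in pages:
        out = links.get(page, [])
        if out:
            total += sum(1 for q in out if q in target) / len(out)
    return total / len(pages) if pages else 0.0
```

With `source` set to the creationism pages, passing the same set as `target` gives the within-partition probability, while passing the evolutionary biology pages gives the cross-partition one; their ratio is the kind of comparison made in the example above.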
It is worth pointing out that the current network's topology not
only nudges users to read more about the same opinion but
also hinders them from exploring diverse content symmetrically. Indeed,
users reading about evolutionary biology have higher chances of
reading one article about creationism (3 times more) than users from
creationism have of reading about evolutionary biology. After repeating
the same analysis for all the topics, we find that the aforementioned
observations hold for most of them. Moreover, we note that
the probability of continuing the session reading a page of the same
opinion is greater for one of the two partitions of a given topic.
Taken together, these measurements highlight that the structure
of the network facilitates users' exploration of knowledge bubbles of
homogeneous views and makes the measure of mutual exposure to
diverse information smaller than 1.
The findings above report the intrinsic capability of the network
to spur users towards diverse content. If we want to combine it
with readers' next-click choice behavior, we use the position
and clicks CwP models instead of the uniform one. We show the results
in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding
to the ExDIN computed using the position CwP model
(second row). We see that if users click links at the top of the page
with higher probability, the probabilities are only slightly modified
w.r.t. the uniform model. These modest variations are coherent with
the links' placement within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more
significant role, worsening the user exposure to diverse information.
Indeed, in pages about gun control, links belonging to the gun
rights partition seem to be mentioned later in the page. The consequence
is that the probability of reaching an article supporting
gun rights, starting uniformly at random from an article in gun
control, drops by 30% w.r.t. the probability observed using the
uniform CwP model. Therefore, we conclude that for some topics
the placement of links within pages contributes to reducing the
exposure to diverse information. In other words, users who tend
to click with higher probability the links located towards the beginning
of a page have fewer opportunities to read about contrasting
opinions.
Finally, we analyze how the phenomenon changes when we use
the clicks CwP model (third row). In this case, we assume readers
make the next-click choice similarly to past users. Going back to
evolution, we immediately observe that the probability of starting a
session in creationism and continuing to read about it after one
click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the probability
of visiting pages of the same opinion. A simple interpretation of this
result is that real users more often click the links strictly related
to the page they are reading. From another perspective, combining
this finding with the previous remark that the "topology of
the network seems to drive users to explore knowledge bubbles of
homogeneous view", we ask the reader the following question: Has
the behavior of past users been influenced by the network topology?
Unfortunately, because of a lack of information, we cannot answer
this question, but we hope it will be addressed in future work.
Furthermore, we observe that for some topics, like abortion, the
probability of reaching pro-choice articles from pro-life doubles.
This is a sign that users may be willing to explore content proposing
diverse views.
Before moving to the next section, we want to underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia's Linking Manual [49],
editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with metrics
and tools that make them aware of the effect that new/current
links have on users' exposure to diverse information. This is not
meant to alter the core and essential intrinsic property of Wikipedia,
but rather to prevent this property from becoming harmful when it keeps
users from accessing diverse content.
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. Panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; $x$-axis: number of clicks. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter ($\alpha = 0$, $\alpha = 0.2$, $\alpha = 1$). The markers' shape indicates the CwP model in use (uniform, position, clicks). Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. According to the combinations of models employed to measure
the ExDIN and M-ExDIN, we provide different insights about
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can
notice that each of the topics shows its own trends. For this reason,
we highlight and explain the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
above in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color)
show very similar trends. Depending on the respective CwP model
(marker's shape), they are shifted up or down. This implies that
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the magnitude
of their exposure to diverse content. In general, the CwP
model (markers' shapes) corresponding to the highest exposure is $M_c$,
followed by $M_u$ and $M_p$.
(2) If users navigate with a star-like behavior (green, $\alpha = 1$),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when the probability of clicking links to the
opposite side becomes higher or lower, respectively, in the first
iterations. This happens because this kind of user is only subject
to the exposure of their starting navigation page. So the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, $\alpha = 0$) show two trends. In both cases, after the first few
clicks, the ExDIN is lower than at the beginning of the session.
Then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and becomes steady below
the initial exposure. The more the destination partition is connected
to the rest of the graph, the more users randomly navigating the
network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of
LGBT), the curves start to decrease after many steps. This happens
when the pages within the destination partition have been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
$\alpha = 0.2$) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
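The three navigation behaviors discussed in items (1) to (4) can be illustrated with a small Python sketch of a walk with restart. It rests on several assumptions that are not taken from the paper: at every step the user returns to the starting page with probability $\alpha$ before clicking, links are chosen uniformly, sessions that reach a page without links end, and the reported quantity is the probability of being in the destination set at each click. The paper's navigation models and its exact ExDIN definition are given in earlier sections and may differ.

```python
def dynamic_exdin(links, start_pages, target, alpha, steps):
    """Per-click probability of being in `target`, averaged over start pages.

    alpha: probability of jumping back to the start page before each click
        (alpha = 1: star-like behavior, alpha = 0: plain random navigation).
    """
    start_pages = list(start_pages)
    curve = [0.0] * steps
    for start in start_pages:
        dist = {start: 1.0}  # probability mass over the user's current page
        for step in range(steps):
            # Restart: a share alpha of the surviving mass returns to the start.
            mixed = {page: (1 - alpha) * prob for page, prob in dist.items()}
            mixed[start] = mixed.get(start, 0.0) + alpha * sum(dist.values())
            # Click: spread the mass uniformly over the out-links of each page.
            nxt = {}
            for page, prob in mixed.items():
                out = links.get(page, [])
                for q in out:
                    nxt[q] = nxt.get(q, 0.0) + prob / len(out)
            dist = nxt
            in_target = sum(prob for page, prob in dist.items() if page in target)
            curve[step] += in_target / len(start_pages)
    return curve
```

On a toy graph where page `a` links to `b` and `c`, `b` links back to `a`, and `c` only links to itself, the star-like walk from `a` stays at a constant exposure to `{b}`, while the plain random walk first drops and then partially recovers, which is qualitatively the pattern described in items (2) and (3).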
Given this observation, we analyze the ExDIN of guns. With the
current network's topology, starting from the gun control partition,
the users with the highest exposure to gun rights pages (2% probability)
are those characterized by a star-like behavior. As soon as the users
navigate in a random fashion, this probability drops. This is
because the gun rights partition has a limited number of incoming edges,
which prevents random users who walk away from getting back to it. We
observe the opposite for users who start their sessions in gun rights.
Indeed, for users randomly navigating the graph, the exposure to
the gun control partition is higher than or comparable to that of star-like
behaving users. The probability of reaching gun control after many
steps becomes high because of the higher number of incoming edges of the
partition.
From a complementary analysis, we observe that for sessions
longer than 2 page visits, the topology of the graph does not
keep random users within the knowledge bubble they accessed.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with star-like
behavior, the probabilities show slightly descending curves.
This demonstrates that they tend to pick pages of the same opinion
in the first iterations. Due to space constraints, figures picturing
these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's pages. We recall that, to do so, we use the mutual exposure to
diverse information; see Sect. 4.2. For all the topics, the mutual
exposure to diverse information is lower than 100%, meaning that
for none of them does the network topology provide an equal
exposure across opinions. If we consider topics like abortion and
guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the network
favors the exploration of one side of the topic more than the
other. Moreover, we want to stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios determining this phenomenon may be: (1) the knowledge
on the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords to become anchors are there; thus
we need a strategy to densify the network, ensuring mutual exposure
to diversity; (2) the knowledge on the encyclopedia is incomplete,
that is, there are no words to attach links to; thus the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current
Wikipedia network topology assists readers in exploring opposite
stances of polarizing topics spanning sets of articles. We formalize
the problem by introducing two metrics: the exposure to
diverse information and the mutual exposure to diverse information
(see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we study the overall users' topic consumption.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming
from both external and internal pages suggest that readers
are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse
content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble
is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1,
the analysis suggests that users reading about an opinion
have higher chances of continuing to explore articles of similar
views than of the opposite. Furthermore, for each of the
topics that we explored, the users of one of the two views
had a substantially higher, albeit small, tendency to visit pages
of the opposing view than the users of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we
observe that the mutual exposure to diverse information is
never achieved by users navigating completely at random.
The better one of the two opinions is connected to the rest of
the network, the more the graph nudges users toward that
opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing
according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to
information of the same inclination diminishes. On the other
hand, if users explore the network with a star-like behavior,
their level of exposure to the same opinion is similar to that of those
who only make one click.
In our study, we analyze sets of articles assigned to opinions according
to editors’ crafted categories [44]. Although this approach represents a
solid starting point for analysis, it can cause article misclassification.
As future work, we plan to investigate a more reliable classification
strategy to improve the accuracy of our analysis, one that should also take
into account the content of the articles along with the link information.
Secondly, the performance of a longitudinal study with the goal of
understanding the dynamics that led to the current state of the
encyclopedia’s network would provide further understanding of the users’
behavior. Finally, in light of our findings, we deem it crucial to design
tools that help editors contextualize articles within the network, so that
they are aware of the effect of link insertion on users’ knowledge
exploration.
The prevalence of bias and polarization is well established in multiple
areas of our life, and filter bubbles aggravate this phenomenon.
Understanding better how they manifest in Wikipedia (and other media) is a
crucial first step toward finding ways to attenuate it, and our hope is that
this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893
AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and
MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets.”
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1: Readers Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
Figure 7: Cumulative pages’ traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$. Panels: (a) Abortion (Pro-life, Pro-choice); (b) Cannabis (Prohibition, Activism); (c) Guns (Control, Rights); (d) Evolution (Creationism, Evol. Bio.); (e) Racism (Racism, Anti-racism); (f) LGBT (Discrimination, Support).
Figure 8: Average click-through rate. In this plot we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT. Horizontal axis: Avg(Click-Through Rate) × 100, from 0 to 30. Bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism.
navigation. The same quantity for the remaining topics
ranges between 44 and 47.
We point out that readers consulting articles about abortion, cannabis,
and guns are inclined toward pages conveying liberal views on the topic.
Instead, it is more complicated to draw interpretations about the remaining
topics. One explanation may be that users look for information generally
less covered in the public mainstream debate.
5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links.
We want to understand how frequently they do so. For that, we
compute the average pages’ click-through rate (Sect. 4.1.1).
We plot this information in Figure 8. Overall, we see that the
percentage of accesses turning into a visit to another page ranges
between 10% and 28%. Dimitrov and Lemmerich [14] observed that the
average CTR for the whole of Wikipedia is 12%. So most of the subsets
of pages we consider have a CTR higher than Wikipedia’s average.
The CTR of guns control is the highest (28%); the pages about racism
follow with 26%. The articles that over the years have generated
less internal traffic are those about evolutionary biology and LGBT rights.
Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$; $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89         90.82               57.64                     70.26               89.81                           45.37
Cannabis   70.81         95.01               37.50                     65.78               96.52                           16.67
Guns       52.34         78.69               35.68                     59.63               75.35                           39.28
Evolution  71.15         84.49               64.47                     72.69               99.00                           55.13
Racism     56.36         88.41               34.32                     71.87               90.63                           65.44
LGBT       61.66         89.42               52.5                      72.42               92.52                           59.17

Examining pages’ connections, we found that those with higher
CTR have more links (the Pearson correlation coefficient is 0.52).
This suggests that articles having a higher out-degree offer more
options to users. Presumably because of this, users are more likely
to continue the exploration from those articles.
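The two quantities discussed above can be illustrated with a minimal sketch. It assumes that a page's CTR is the number of clicks on its links divided by its pageviews (the exact definition is the one in Sect. 4.1.1 and may differ in detail), and it uses made-up page records; the field names `views`, `clicks`, and `out_degree` are hypothetical.

```python
from math import sqrt

# Hypothetical per-page records (not data from the paper).
pages = [
    {"views": 1000, "clicks": 120, "out_degree": 40},
    {"views": 500, "clicks": 140, "out_degree": 95},
    {"views": 2000, "clicks": 200, "out_degree": 30},
    {"views": 800, "clicks": 208, "out_degree": 120},
]


def click_through_rate(page):
    """Share of visits to the page that turn into a click on one of its links."""
    return page["clicks"] / page["views"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ctrs = [click_through_rate(p) for p in pages]
degrees = [p["out_degree"] for p in pages]
avg_ctr = sum(ctrs) / len(ctrs)   # average CTR of the set of pages
r = pearson(ctrs, degrees)        # association between CTR and out-degree
print(f"average CTR: {avg_ctr:.2%}, Pearson r: {r:.2f}")
```

A positive `r` on real data would correspond to the reported tendency of pages with more links to have a higher CTR; the toy numbers here only exercise the computation.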
In addition, we count the number of links clicked fewer than 10
times over the last three years; see Table 3. As an example, given
a page in creationism, on average 71.15% of its links have been
clicked fewer than 10 times. If we distinguish between references
to creationism and references to evolution, readers did not click
84.49% of the links pointing to creationism and 64.47% of those
pointing to evolutionary biology.
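As a hedged illustration of how the percentages in Table 3 could be obtained for a single page, the sketch below counts the links with fewer than 10 recorded clicks, overall and by target partition. The click counts and partition labels are invented, and averaging these per-page shares over all pages of a partition is an assumption about the aggregation step.

```python
# Hypothetical links of one page in P: (target partition, recorded clicks).
# "other" marks a target outside the two partitions of the topic.
page_links = [("P", 3), ("P", 250), ("P_bar", 0), ("P_bar", 12), ("other", 5), ("P", 7)]

THRESHOLD = 10  # a link counts as "un-clicked" below this number of clicks


def unclicked_share(links, target=None):
    """Percentage of links (optionally only those pointing to `target`)
    that were clicked fewer than THRESHOLD times; None if no link qualifies."""
    counts = [clicks for part, clicks in links if target is None or part == target]
    if not counts:
        return None
    return 100 * sum(c < THRESHOLD for c in counts) / len(counts)


share_all = unclicked_share(page_links)                 # all links of the page
share_same = unclicked_share(page_links, "P")           # links staying in P
share_opposite = unclicked_share(page_links, "P_bar")   # links pointing to P_bar
print(share_all, share_same, share_opposite)
```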
6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS
The main contribution of this paper is to examine to what extent the
current Wikipedia topology supports users in exploring diverse
facets of polarizing issues. In particular, we study (1) how readers are
locally exposed to diverse information and (2) how their exposure
to plural opinions may change throughout a navigation session.
6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network’s
topology, we compute the exposure to diverse information for ℓ = 1
using the uniform CwP model. Recalling that ℓ indicates the users’
session length, if we set it equal to 1, we study the exposure to
diversity over one-click sessions.
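The one-click setting can be sketched on a toy graph. The sketch assumes that, for ℓ = 1 under the uniform model, the exposure from a source set to a destination set is the probability that a reader who starts uniformly at random in the source set and clicks one of the page's links uniformly at random lands in the destination set; the formal definition is in Sect. 4.2. The graph and page names are made up.

```python
# Toy hyperlink graph: page -> pages it links to (hypothetical names).
links = {
    "a1": ["a2", "b1", "x1", "x2"],
    "a2": ["a1", "x1"],
    "b1": ["b2", "x2"],
    "b2": ["b1", "a1", "a2", "x1"],
    "x1": ["a1"],
    "x2": ["b1"],
}
P = {"a1", "a2"}       # pages on one side of the topic
P_bar = {"b1", "b2"}   # pages on the opposing side


def one_click_exposure(source, destination):
    """Probability of landing in `destination` after one uniformly random click,
    starting from a page drawn uniformly at random from `source`."""
    total = 0.0
    for page in source:
        out = links[page]
        hits = sum(1 for target in out if target in destination)
        total += hits / len(out)
    return total / len(source)


stay_in_P = one_click_exposure(P, P)            # P -> P
P_to_P_bar = one_click_exposure(P, P_bar)       # P -> P_bar
P_bar_to_P = one_click_exposure(P_bar, P)       # P_bar -> P
print(stay_in_P, P_to_P_bar, P_bar_to_P)
```

In this toy graph the two cross-partition probabilities differ, which is the kind of asymmetry the section measures on the real networks.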
Plots in the first row of Figure 9 show the value of ExDIN for
all the topics when the CwP is $M_u$.
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The y-axis indicates the source and the x-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model (uniform, position, clicks). Darker colors indicate a higher probability of being in the corresponding square in one click. Panels, from left to right: abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Control, Rights), evolution (Creationism, Evol. Bio.), racism (Racism, Anti-racism), LGBT (Discrimination, Support).

For instance, let evolution be the topic we analyze. If readers
start uniformly at random from a page about creationism, the
probability of visiting an article of the same partition is 5.76%. On the
contrary, the chances of entering a page about evolutionary biology
are 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology
have a 3.27% chance of reading pages conveying the same opinion.
This probability is 5 times larger than that of visiting creationism
pages.
It is worth pointing out that the current network’s topology not
only nudges users to read more about the same opinion, but
also hinders them from exploring diverse content symmetrically. Indeed,
users reading about evolutionary biology have higher chances of
reading one article about creationism (3 times more) than users from
creationism have of reading about evolutionary biology. After repeating
the same analysis for all the topics, we find that the aforementioned
observations hold for most of them. Moreover, we note that
the probability of continuing the session by reading a page of the same
opinion is greater for one of the two partitions of a given topic.
Taken together, these measurements highlight that the structure
of the network facilitates users to explore knowledge bubbles of
homogeneous view and makes the measure of mutual exposure to
diverse information smaller than 1.
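To make the notion of symmetry concrete, the sketch below uses an assumed symmetry ratio: the smaller of the two cross-partition exposures divided by the larger, which equals 1 only when both directions are equally likely. This ratio is an illustrative stand-in and not necessarily the definition of the mutual exposure given in Sect. 4.2. The inputs are the one-click evolution values quoted in the text (0.18% and 0.61%).

```python
def symmetry_ratio(e_forward, e_backward):
    """Assumed symmetry measure in [0, 1]: min over max of the two exposures.
    Returns None when both exposures are zero (the ratio is undefined)."""
    high = max(e_forward, e_backward)
    if high == 0:
        return None
    return min(e_forward, e_backward) / high


# One-click exposures for evolution under the uniform model, in percent.
creationism_to_evolution = 0.18
evolution_to_creationism = 0.61

ratio = symmetry_ratio(creationism_to_evolution, evolution_to_creationism)
print(f"symmetry ratio: {ratio:.2f}")
```

Under this assumed measure the evolution topic is far from symmetric, consistent with the observation that one side reaches the other about three times more easily.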
The findings above report the intrinsic capability of the network
to spur users towards diverse content. If we want to combine it
with readers’ next-click choice behavior, we use the position
and clicks CwP models instead of the uniform one. We show the results
in the second and third rows of Figure 9.
Referring back to evolution, we now consider the matrix
corresponding to the ExDIN computed using the position CwP model
(second row). We see that if users click with higher chances the links
at the top of the page, the probabilities are only slightly modified
w.r.t. the uniform model. These modest variations are coherent with
the links’ placement within pages (Figure 2).
For a few topics, such as guns, the links’ position plays a more
significant role, worsening the user exposure to diverse information.
Indeed, in pages about guns control, links belonging to the guns
rights partition seem to be mentioned later in the page. The
consequence is that the probability of reaching an article supporting
guns rights, starting uniformly at random from an article in guns
control, has a 30% drop w.r.t. the probability observed using the
uniform CwP model. Therefore, we conclude that for some topics
the placement of links within pages contributes to reducing the
exposure to diverse information. In other words, users who tend
to click with higher probability the links located towards the
beginning of a page have less possibility to read about contrasting
opinions.
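The effect of link position can be sketched with a hypothetical page whose only link to the opposing partition appears last. The position weighting used here (a link at rank k gets weight 1/k) is an assumption chosen for illustration; the position CwP model used in the paper is defined earlier in the text and may weight positions differently.

```python
# Targets of the links of one hypothetical page, in reading order.
ordered_targets = ["P", "other", "P", "other", "P_bar"]


def click_distribution(n_links, model):
    """Next-click probabilities over link positions 0..n_links-1."""
    if model == "uniform":
        weights = [1.0] * n_links
    elif model == "position":
        # Assumed decay: the link at 1-based rank k gets weight 1/k.
        weights = [1.0 / (k + 1) for k in range(n_links)]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    return [w / total for w in weights]


def prob_reaching(partition, model):
    """Probability that the next click lands on a page of `partition`."""
    probs = click_distribution(len(ordered_targets), model)
    return sum(p for p, target in zip(probs, ordered_targets) if target == partition)


uniform_prob = prob_reaching("P_bar", "uniform")
position_prob = prob_reaching("P_bar", "position")
print(f"uniform: {uniform_prob:.3f}, position: {position_prob:.3f}")
```

Because the opposing link sits at the bottom of the page, any weighting that favors early links lowers the chance of reaching it, which mirrors the drop described for guns control.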
Finally, we analyze how the phenomenon changes when we use
the clicks CwP model (third row). In this case, we assume readers
make the next-click choice similarly to past users. Going back to
evolution, we immediately observe that the probability of starting a
session in creationism and continuing to read about it after one
click grows from 5.76% under the uniform model to 11.57%.
For all the topics, we verify a significant increment of the
probability of visiting pages of the same opinion. Simply interpreting this
result, we can say that real users click more on the links strictly related
to the page they are reading. From another perspective, combining
this finding with the previous remark saying that the “topology of
the network seems to drive users to explore knowledge bubbles of
homogeneous view”, we ask the reader the following question: Has
the behavior of past users been influenced by the network topology?
Unfortunately, because of lack of information, we cannot answer
this question, but we hope it will be addressed in future works.
Furthermore, we observe that for some topics, like abortion, the
probability of reaching pro-choice articles from pro-life doubles.
This is a sign that users may be willing to explore content proposing
diverse views.
Before moving to the next section, we want to underline that
stronger relations among pages of similar content are an intrinsic
property of Wikipedia. In fact, in Wikipedia’s Linking Manual [49],
editors are asked to link related content [7, 33]. Although this is a
fact, we believe that it would be valuable to provide editors with
metrics and tools making them aware of the effect that new/current
links have on users’ exposure to diverse information. This is not
meant to alter the core and essential intrinsic property of Wikipedia,
but rather to keep this property from becoming harmful when it prevents
users from accessing diverse content.
Figure 10: Dynamic (Mutual) ExDIN for 1 ≤ ℓ ≤ 15. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of α, the restart parameter. The markers’ shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2). Panels, from left to right: Abortion, Cannabis, Guns, Evolution, Racism, LGBT. Horizontal axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each for α = 0, α = 0.2, and α = 1.
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 and see how their (mutual) exposure to diversity may
change. According to the combinations of models employed to
measure the ExDIN and M-ExDIN, we provide different insights about
the effect of the current network’s topology on users’ exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can
notice that each of the topics shows its own trends. For this reason,
we decide to highlight and provide an explanation for the most
recurrent patterns. Moreover, for better understanding, we suggest
cross-checking the following explanations with the analysis done
above in the paper. We start by describing how, given the current
network’s topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of α (same color)
show very similar trends. Depending on the respective CwP model
(marker’s shape), they are shifted up or down. This implies that
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the
magnitude of their exposure to diverse content. In general, the CwP
model (markers’ shapes) corresponding to the highest exposure is $M_c$,
followed by $M_u$ and $M_p$.
(2) If users navigate mirroring a star-like behavior (green, α = 1),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when the probability of clicking links to the
opposite side becomes higher or lower, respectively, in the first
iterations. This happens because these kinds of users are only subject
to the exposure of their starting navigation page. So, the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, α = 0) show two trends. In both cases, after the first few
clicks the ExDIN is lower than at the beginning of the session.
Then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and becomes steady below
the initial exposure. The better the destination partition is connected
to the rest of the graph, the more users randomly navigating the
network are able to reach it. Sometimes (e.g., for the ExDIN across the
partitions of LGBT) the curves start to decrease after many steps. This
happens when the pages within the destination partition have been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
α = 0.2) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
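The three navigation behaviors listed above can be sketched as one walk with a restart parameter. The update rule is an assumption made for illustration: before each click after the first, the reader returns to the starting page with probability α and otherwise clicks from the current page, choosing links uniformly at random. With α = 1 this gives a star-like reader, with α = 0 a purely random one. The toy graph and names are made up, and the sketch tracks the probability of being in the destination set after each click, which may differ from the exact quantity plotted in Figure 10.

```python
# Toy hyperlink graph: page -> pages it links to (hypothetical names).
links = {
    "a1": ["a2", "b1", "x1", "x2"],
    "a2": ["a1", "x1"],
    "b1": ["b2", "x2"],
    "b2": ["b1", "a1", "a2", "x1"],
    "x1": ["a1"],
    "x2": ["b1"],
}
P_bar = {"b1", "b2"}  # destination partition


def step(dist):
    """Push a probability distribution over pages through one uniform click."""
    nxt = {page: 0.0 for page in links}
    for page, mass in dist.items():
        for target in links[page]:
            nxt[target] += mass / len(links[page])
    return nxt


def exposure_curve(start, destination, alpha, n_clicks):
    """Probability of being in `destination` after each of `n_clicks` clicks."""
    from_start = step({start: 1.0})  # distribution after a click made from the start page
    dist = from_start
    curve = []
    for _ in range(n_clicks):
        curve.append(sum(dist[p] for p in destination))
        walked = step(dist)          # click made from the current page
        dist = {p: alpha * from_start[p] + (1 - alpha) * walked[p] for p in links}
    return curve


star_like = exposure_curve("a1", P_bar, alpha=1.0, n_clicks=5)
random_walk = exposure_curve("a1", P_bar, alpha=0.0, n_clicks=5)
mixed = exposure_curve("a1", P_bar, alpha=0.2, n_clicks=5)
print(star_like, random_walk, mixed, sep="\n")
```

Under this update rule the star-like curve is flat by construction, since every click is made from the starting page, while the other two curves depend on how well the destination partition is connected to the rest of the graph.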
Given this observation, we analyze the ExDIN of guns. With the
current network’s topology, starting from the guns control partition,
the users with higher exposure to guns rights pages (2% probability)
are those characterized by a star-like behavior. As soon as the users
navigate in a random fashion, this probability drops. This is
because the guns rights partition has a number of incoming edges
that prevents random users who walk away from getting back to it. We
observe the opposite for users who start their sessions in guns rights.
Indeed, for users randomly navigating the graph, the exposure to
the guns control partition is higher than or comparable to that of
star-like behaving users. The probability of reaching guns control after
many steps becomes high because of the higher number of incoming edges
of the partition.
From complementary analysis, we observe that for sessions
longer than 2 page visits, the topology of the graph is not such as
to keep random users within the knowledge bubble they accessed.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with
star-like behavior, the probabilities show slightly descending curves.
This demonstrates that they tend to pick pages of the same opinion
at the first iterations. Due to space constraints, figures picturing
these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand if
users reading about different opinions have equal chances of visiting
each other. We recall that to do so we use the Mutual exposure to
diverse information (see Sect. 4.2). For all the topics, the mutual
exposure to diverse information is lower than 100%, meaning that
for none of them does the network topology provide an equal
exposure across opinions. If we consider topics like abortion and
guns, the longer the users’ session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
In general, if we detect low mutuality, the topology of the
network favors the exploration of one side of the topic more than the
other. Moreover, we want to stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios determining this phenomenon may be: (1) the knowledge
on the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords to become anchors are there; thus
we need a strategy to densify the network, ensuring mutual exposure
to diversity; (2) the knowledge on the encyclopedia is incomplete,
that is, there are no words to attach links to; thus the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis to understand how the current
Wikipedia network topology assists readers in exploring opposite
stances of polarizing topics spanning over sets of articles. We
formalize the problem by introducing two metrics: the Exposure to
diverse information and the Mutual exposure to diverse information
(see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable for the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we also study the overall users’ topic consumption.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic.
  In Section 5, we show that accesses coming from both external and internal
  pages suggest that readers are inclined to seek content about one facet of
  the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network’s topology hinders users
  from symmetrically exploring diverse content. Moreover, on average, the
  probability that the network nudges users to remain in a knowledge bubble
  is up to an order of magnitude higher than that of exploring pages of
  contrasting opinions. In Section 6.1, the analysis suggests that users
  reading about an opinion have higher chances of continuing to explore
  articles of similar views than of the opposite. Furthermore, for each of
  the topics that we explored, the users of one of the two views had a
  substantially greater, albeit small, tendency to visit pages of the
  opposing view than the users of the other view.
• For sessions of length > 1, the network’s topology is typically biased
  toward one opinion. In Section 6.2, we observe that the mutual exposure to
  diverse information is never achieved by users navigating completely at
  random. The better one of the two opinions is connected to the rest of the
  network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same
  opinion decreases for users browsing according to the random navigation
  model. In Section 6.2, results suggest that after a few clicks the exposure
  to information of the same inclination diminishes. On the other hand, if
  users explore the network with a star-like behavior, their level of
  exposure to the same opinion is similar to that of users who make only one
  click.
In our study, we analyze sets of articles assigned to opinions
according to editors' crafted categories [44]. Although this approach
represents a solid starting point for analysis, it can cause article
misclassification. As future work, we plan to investigate a more
reliable classification strategy to improve the accuracy of our
analysis, one that should also take into account the content of the
articles along with the link information. Secondly, a longitudinal
study aimed at understanding the dynamics that led to the current
state of the encyclopedia's network would provide further
understanding of the users' behavior. Finally, in light of our
findings, we deem it crucial to design tools that help editors
contextualize articles within the network, so that they are aware of
the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in
multiple areas of our life, and filter bubbles aggravate this
phenomenon. Understanding better how they manifest in Wikipedia (and
other media) is a crucial first step toward finding ways to attenuate
it, and our hope is that this work is a step towards this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893
AMDROMA "Algorithmic and Mechanism Design Research in Online Markets"
and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital
Markets".
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1: Readers Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
[Figure 9: 2×2 heatmaps of the ExDIN, one row per CwP model (uniform, position, clicks) and one column per pair of partitions: Pro-life / Pro-choice, Prohibition / Activism, Control / Rights, Creationism / Evol. Bio., Racism / Anti-racism, Discrimination / Support.]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example P → P̄ and P̄ → P. The y-axis indicates the source and the x-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.
is 0.18% (32 times smaller). On the other hand, readers starting
uniformly at random from an article about evolutionary biology have a
3.27% chance of reading pages conveying the same opinion. This
probability is 5 times larger than that of visiting creationism pages.
It is worth pointing out that the current network's topology not only
nudges users to read more about the same opinion but also hinders them
from exploring diverse content symmetrically. Indeed, users reading
about evolutionary biology have higher chances of reading one article
about creationism (3 times more) than users from creationism have of
reading about evolutionary biology. After repeating the same analysis
for all the topics, we find that the aforementioned observations hold
for most of them. Moreover, we note that the probability of continuing
the session by reading a page of the same opinion is greater for one
of the two partitions of a given topic.
Taken together, these measurements highlight that the structure of the
network facilitates users' exploration of knowledge bubbles of
homogeneous view and makes the measure of mutual exposure to diverse
information smaller than 1.
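The ratios quoted above can be checked directly from the one-click probabilities. The following is a minimal sketch, assuming the uniform-model values for evolution are 5.76%, 0.18%, 3.27%, and 0.61% (the last value is our reading of the uniform row of Figure 9, so its decimal placement is an assumption). It also assumes that the mutual exposure is the ratio between the smaller and the larger cross-partition probability; the definition actually used is the one in Sect. 4.2, and the function name `mutual_exposure` is hypothetical.

```python
# One-click probabilities (in percent) under the uniform CwP model for
# the topic "evolution". Keys are (source partition, destination partition).
# 0.61 is an assumed reading of Figure 9; the other values are quoted in the text.
exdin = {
    ("creationism", "creationism"): 5.76,
    ("creationism", "evol_bio"): 0.18,
    ("evol_bio", "evol_bio"): 3.27,
    ("evol_bio", "creationism"): 0.61,
}


def mutual_exposure(p_to_q: float, q_to_p: float) -> float:
    """Ratio of the smaller to the larger cross-partition exposure.

    Assumed form of the mutual exposure (see Sect. 4.2 for the actual
    definition): 1 means symmetric exposure, values near 0 mean that
    one direction dominates.
    """
    high = max(p_to_q, q_to_p)
    return 0.0 if high == 0 else min(p_to_q, q_to_p) / high


# How much more likely it is to stay in the bubble than to leave it.
stay_ratio_creationism = (
    exdin[("creationism", "creationism")] / exdin[("creationism", "evol_bio")]
)
stay_ratio_evol_bio = (
    exdin[("evol_bio", "evol_bio")] / exdin[("evol_bio", "creationism")]
)

# Asymmetry between the two cross-partition directions.
cross_ratio = exdin[("evol_bio", "creationism")] / exdin[("creationism", "evol_bio")]
mutual = mutual_exposure(
    exdin[("creationism", "evol_bio")], exdin[("evol_bio", "creationism")]
)

print(f"creationism: staying is {stay_ratio_creationism:.1f}x more likely")
print(f"evol. bio.:  staying is {stay_ratio_evol_bio:.1f}x more likely")
print(f"cross-partition asymmetry: {cross_ratio:.1f}x")
print(f"mutual exposure (assumed form): {mutual:.2f}")
```

Under these assumptions the three ratios come out at roughly 32, 5, and 3, matching the figures quoted in the text, and the mutual exposure is well below 1.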
The findings above report the intrinsic capability of the network to
spur users towards diverse content. To combine it with readers'
next-click choice behavior, we use the position and clicks CwP models
instead of the uniform one. We show the results in the second and
third rows of Figure 9.
Referring back to evolution, we now consider the matrix corresponding
to the ExDIN computed using the position CwP model (second row). We
see that if users click links at the top of the page with higher
probability, the probabilities are only slightly modified w.r.t. the
uniform model. These modest variations are coherent with the placement
of links within pages (Figure 2).
For a few topics, such as guns, the links' position plays a more
significant role, worsening the users' exposure to diverse
information. Indeed, in pages about guns control, links belonging to
the guns rights partition seem to be mentioned later in the page. The
consequence is that the probability of reaching an article supporting
guns rights, starting uniformly at random from an article in guns
control, drops by 30% w.r.t. the probability observed using the
uniform CwP model. Therefore, we conclude that for some topics, the
placement of links within pages contributes to reducing the exposure
to diverse information. In other words, users who tend to click with
higher probability the links located towards the beginning of a page
have fewer opportunities to read about contrasting opinions.
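To make the effect of link placement concrete, the sketch below compares a uniform next-click model with a position-aware one on a single toy page. It is an illustration only: the 1/k position weighting, the page and link names, and the helper functions are all hypothetical and are not the CwP models fitted in the paper.

```python
def uniform_cwp(links):
    """Uniform model: every link on the page is equally likely to be clicked."""
    return {target: 1.0 / len(links) for target in links}


def position_cwp(links):
    """Position-aware model: links nearer the top are clicked more often.

    Hypothetical weighting: the link in position k (1-based) gets a
    weight of 1/k, and the weights are normalized to sum to 1.
    """
    weights = [1.0 / k for k in range(1, len(links) + 1)]
    total = sum(weights)
    return {target: w / total for target, w in zip(links, weights)}


def one_click_exposure(click_probs, partition):
    """Probability that the next click lands on a page in `partition`."""
    return sum(p for target, p in click_probs.items() if target in partition)


# Toy page on the "control" side. Links are listed in order of appearance
# (assumed distinct), and the only link to the other side comes last.
page_links = ["control_1", "control_2", "control_3", "rights_1"]
rights_partition = {"rights_1"}

uniform_exposure = one_click_exposure(uniform_cwp(page_links), rights_partition)
position_exposure = one_click_exposure(position_cwp(page_links), rights_partition)

print(f"uniform model:  {uniform_exposure:.3f}")
print(f"position model: {position_exposure:.3f}")
```

On this toy page, the exposure to the opposite partition is lower under the position-aware model than under the uniform one, which is the qualitative effect described for guns.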
Finally, we analyze how the phenomenon changes when we use the clicks
CwP model (third row). In this case, we assume readers make the
next-click choice similarly to past users. Going back to evolution, we
immediately observe that the probability of starting a session in
creationism and continuing to read about it after one click grows from
5.76% under the uniform model to 11.57%.
For all the topics, we observe a significant increase in the
probability of visiting pages of the same opinion. A simple
interpretation of this result is that real users more often click the
links strictly related to the page they are reading. From another
perspective, combining this finding with the previous remark that the
"topology of the network seems to drive users to explore knowledge
bubbles of homogeneous view," we ask the reader the following
question: Has the behavior of past users been influenced by the
network topology? Unfortunately, because of a lack of information, we
cannot answer this question, but we hope it will be addressed in
future work.
Furthermore, we observe that for some topics, like abortion, the
probability of reaching pro-choice articles from pro-life ones
doubles. This is a sign that users may be willing to explore content
proposing diverse views.
Before moving to the next section, we want to underline that stronger
relations among pages of similar content are an intrinsic property of
Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are
asked to link related content [7, 33]. Although this is a fact, we
believe that it would be valuable to provide editors with metrics and
tools that make them aware of the effect that new/current links have
on users' exposure to diverse information. This is not meant to alter
the core and essential intrinsic property of Wikipedia, but rather to
prevent this property from becoming harmful when it keeps users from
accessing diverse content.
[Figure 10: line plots for Abortion, Cannabis, Guns, Evolution, Racism, and LGBT. x-axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each with α = 0, α = 0.2, and α = 1.]
Figure 10: Dynamic (Mutual) ExDIN for 1 ≤ ℓ ≤ 15. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of α, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computation of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 click and see how their (mutual) exposure to diversity
may change. Depending on the combination of models employed to measure
the ExDIN and M-ExDIN, we provide different insights about the effect
of the current network's topology on users' exposure to diverse
content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
P to P̄ (resp. from P̄ to P) and the respective M-ExDIN. We can notice
that each of the topics shows its own trends. For this reason, we
highlight and explain the most recurrent patterns. Moreover, for
better understanding, we suggest cross-checking the following
explanations with the analysis done above in the paper. We start by
describing how, given the current network's topology, the ExDIN
changes over the course of a navigation session (first and second
rows).
(1) The curves corresponding to the same value of α (same color) show
very similar trends. Depending on the respective CwP model (marker's
shape), they are shifted up or down. This implies that when users
share the same navigation behavior, the way they make the next-click
choice plays a crucial role in determining the magnitude of their
exposure to diverse content. In general, the CwP model (markers'
shapes) corresponding to the highest exposure is M_c, followed by M_u
and M_p.
(2) If users navigate with a star-like behavior (green, α = 1), their
exposure to the opposite opinion is steady. It can slightly decrease
or increase when the probability of clicking links to the opposite
side becomes higher or lower, respectively, in the first iterations.
This happens because this kind of user is only subject to the exposure
of their starting navigation page. So the more links to the opposite
partition they click in the first steps, the more their exposure to
diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky blue,
α = 0) show two trends. In both cases, after the first few clicks, the
ExDIN is lower than at the beginning of the session. Then the trend
inverts. In one case, it reaches or exceeds the starting exposure. In
the other, it grows and stabilizes below the initial exposure. The
more the destination partition is connected to the rest of the graph,
the more users randomly navigating the network are able to reach it.
Sometimes (e.g., in one direction of the ExDIN for LGBT), the curves
start to decrease after many steps. This happens when the pages within
the destination partition have been reached with high probability.
(4) Users characterized by a star-like random navigation (blue,
α = 0.2) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
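The three navigation behaviors discussed in points (2) to (4) can be sketched as a walk with restart on a toy graph. This is a simplified illustration, not the simulation used for Figure 10: it assumes that before every click the reader returns to the starting page with probability α, that the next link is chosen uniformly at random, and that every page has at least one outgoing link. The graph, page names, and function names are hypothetical.

```python
def step(dist, start, alpha, graph):
    """Advance the reader's position distribution by one click.

    With probability `alpha` the click is made from the starting page,
    otherwise from the current page; links are chosen uniformly.
    """
    origin = {page: (1.0 - alpha) * p for page, p in dist.items()}
    origin[start] = origin.get(start, 0.0) + alpha
    nxt = {}
    for page, p in origin.items():
        links = graph[page]
        for target in links:
            nxt[target] = nxt.get(target, 0.0) + p / len(links)
    return nxt


def exposure_curve(graph, start, target_partition, alpha, n_clicks):
    """Probability of being in `target_partition` after 1..n_clicks clicks."""
    dist = {start: 1.0}
    curve = []
    for _ in range(n_clicks):
        dist = step(dist, start, alpha, graph)
        curve.append(sum(p for page, p in dist.items() if page in target_partition))
    return curve


# Toy topic with two partitions, A = {a1, a2} and B = {b1, b2}.
toy_graph = {
    "a1": ["a2", "b1"],
    "a2": ["a1"],
    "b1": ["b2", "a1"],
    "b2": ["b1"],
}
partition_b = {"b1", "b2"}

star_curve = exposure_curve(toy_graph, "a1", partition_b, alpha=1.0, n_clicks=5)
restart_curve = exposure_curve(toy_graph, "a1", partition_b, alpha=0.2, n_clicks=5)
random_curve = exposure_curve(toy_graph, "a1", partition_b, alpha=0.0, n_clicks=5)

for name, curve in [("alpha=1", star_curve), ("alpha=0.2", restart_curve), ("alpha=0", random_curve)]:
    print(name, [round(x, 3) for x in curve])
```

In this toy setting the star-like curve (α = 1) is flat because every click is made from the starting page, while the curves for α = 0 and α = 0.2 vary with the number of clicks as the reader drifts through the graph.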
Given this observation, we analyze the ExDIN of guns. With the current
network's topology, starting from the guns control partition, the
users with the highest exposure to guns rights pages (2% probability)
are those characterized by a star-like behavior. As soon as users
navigate in a random fashion, this probability drops. This is because
the guns rights partition has a limited number of in-going edges,
which prevents random users who walk away from getting back to it. We
observe the opposite for users who start their sessions in guns
rights. Indeed, for users randomly navigating the graph, the exposure
to the guns control partition is higher than or comparable to that of
star-like behaving users. The probability of reaching guns control
after many steps becomes high because of the higher number of incoming
edges of the partition.
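The role of incoming edges can be illustrated by counting, for each partition, the links that enter it from pages outside it. The sketch below uses a hypothetical toy graph and a hypothetical helper, `incoming_edges`; it is not the actual guns network.

```python
def incoming_edges(graph, partition):
    """Count links that enter `partition` from pages outside it."""
    return sum(
        1
        for source, links in graph.items()
        if source not in partition
        for target in links
        if target in partition
    )


# Toy adjacency lists: page -> pages it links to (all names are made up).
toy_graph = {
    "control_1": ["control_2", "rights_1"],
    "control_2": ["control_1"],
    "rights_1": ["control_1", "other_1"],
    "other_1": ["control_1", "control_2"],
}
control = {"control_1", "control_2"}
rights = {"rights_1"}

into_control = incoming_edges(toy_graph, control)
into_rights = incoming_edges(toy_graph, rights)
print(f"edges entering control: {into_control}")
print(f"edges entering rights:  {into_rights}")
```

A partition with more entry points, like `control` here, is easier for a randomly navigating reader to get back to after leaving it.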
From a complementary analysis, we observe that for sessions longer
than 2 page visits, the topology of the graph does not keep random
users within the knowledge bubble they accessed. Indeed, for all the
topics, the probability of visiting articles of the same opinion as
the starting page becomes close to 0 (refer to Fig. 9 to cross-check
the probabilities at the first click). For users with star-like
behavior, the probabilities show slightly descending curves. This
demonstrates that they tend to pick pages of the same opinion in the
first iterations. Due to space constraints, figures picturing these
phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's content. We recall that, to do so, we use the mutual
exposure to diverse information (see Sect. 4.2). For all the topics,
the mutual exposure to diverse information is lower than 100%, meaning
that for none of them does the network topology provide an equal
exposure across opinions. If we consider topics like abortion and
guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
REFERENCES[1] Lada A Adamic and Natalie Glance 2005 The political blogosphere and the 2004
US election divided they blog In Proceedings of the WWW-2005 Workshop on theWeblogging Ecosystem
[2] Leman Akoglu 2014 Quantifying political polarity based on bipartite opinion
networks In Eighth International AAAI Conference on Weblogs and Social Media[3] Sumit Asthana and Aaron Halfaker 2018 With few eyes all hoaxes are deep
Proceedings of the ACM on Human-Computer Interaction 2 CSCW (2018) 1ndash18
[4] Ivan Beschastnikh Travis Kriplean and David W McDonald 2008 Wikipedian
Self-Governance in Action Motivating the Policy Lens In ICWSM
Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
- Abstract
- 1 Introduction
- 2 Related Works
- 3 Data Collection
  - 3.1 Topic Induced Networks
  - 3.2 General Statistics on Topics Networks
- 4 Metrics
  - 4.1 Content Consumption
  - 4.2 Exposure Metrics
- 5 RQ1: Readers Topic Consumption
  - 5.1 Pageviews Distribution
  - 5.2 External or Internal Access to the Topic
  - 5.3 How Much Readers Navigate Links
- 6 RQ2: Exposure Across Topic Viewpoints
  - 6.1 Exposure to Diversity
  - 6.2 Dynamic Exposure to Diversity
- 7 Conclusions
- References
[Figure 10: grid of line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT) and three rows. x-axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each at $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$.]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The marker's shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).
6.2 Dynamic Exposure to Diversity
In this section, we suppose users navigate the network for sessions
longer than 1 click and examine how their (mutual) exposure to diversity may
change. Depending on the combination of models employed to measure
the ExDIN and M-ExDIN, we obtain different insights into
the effect of the current network's topology on users' exposure to
diverse content over a navigation session.
In Figure 10, for sessions of length up to 15, we plot the ExDIN from
$P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN.
Each topic shows its own trends. For this reason,
we highlight and explain the most
recurrent patterns. For better understanding, we suggest
cross-checking the following explanations with the analysis presented
earlier in the paper. We start by describing how, given the current
network's topology, the ExDIN changes over the course of a navigation
session (first and second rows).
(1) The curves corresponding to the same value of $\alpha$ (same color)
show very similar trends. Depending on the respective CwP model
(marker's shape), they are shifted up or down. This implies that
when users share the same navigation behavior, the way they make
the next-click choice plays a crucial role in determining the
magnitude of their exposure to diverse content. In general, the CwP
model corresponding to the highest exposure is $M_c$,
followed by $M_u$ and $M_p$.
(2) If users navigate with a star-like behavior (green, $\alpha = 1$),
their exposure to the opposite opinion is steady. It can slightly
decrease or increase when the probability of clicking links to the
opposite side becomes higher or lower, respectively, in the first
iterations. This happens because these users are only subject
to the exposure of their starting navigation page. So the more links
to the opposite partition they click in the first steps, the more their
exposure to diversity decreases, and vice versa.
(3) The curves of users who randomly navigate the network (sky
blue, $\alpha = 0$) show two trends. In both cases, after the first few
clicks the ExDIN is lower than at the beginning of the session;
then the trend inverts. In one case, it reaches or exceeds the
starting exposure. In the other, it grows and then levels off below
the initial exposure. The better the destination partition is connected
to the rest of the graph, the more easily users randomly navigating the
network reach it. Sometimes, as for one of the two ExDIN directions of
LGBT, the curves start to decrease after many steps. This happens
when the pages within the destination partition have already been reached
with high probability.
(4) Users characterized by a star-like random navigation (blue,
$\alpha = 0.2$) are exposed to diverse content similarly to users exploring
the network randomly, but the ExDIN magnitude is greater because
of the possibility of jumping back to the starting point at each step.
Given this observation, we analyze the ExDIN of guns. With the
current network's topology, starting from the guns control partition,
the users with the highest exposure to guns rights pages (2% probability)
are those characterized by a star-like behavior. As soon as users
navigate in a random fashion, this probability drops. This is
because the guns rights partition has too few incoming edges
for random users who walk away to get back to it. We
observe the opposite for users who start their sessions in guns rights.
Indeed, for users randomly navigating the graph, the exposure to
the guns control partition is higher than or comparable to that of star-like
behaving users. The probability of reaching guns control after many
steps becomes high because of the larger number of incoming edges of the
partition.
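The role of the restart parameter discussed above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy graph, the page names, and the functions `click` and `exposure_curve` are hypothetical, the click choice is assumed to be uniform over outgoing links, and "exposure" is assumed to be the probability of sitting on a page of the opposite partition after each click, with the reader placed back on the starting page with probability $\alpha$ before every click.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]

# Hypothetical toy link graph: two opposing partitions (a*, b*) plus a hub page.
TOY_GRAPH: Graph = {
    "a1": ["a2", "hub", "b1"],
    "a2": ["a1", "hub"],
    "hub": ["a1", "a2"],
    "b1": ["b2", "a1"],
    "b2": ["b1", "hub"],
}
SIDE_B: Set[str] = {"b1", "b2"}


def click(dist: Dict[str, float], graph: Graph) -> Dict[str, float]:
    """One click under an assumed uniform choice model: split each page's
    probability mass evenly over its outgoing links."""
    nxt = {page: 0.0 for page in graph}
    for page, mass in dist.items():
        links = graph[page]
        if not links:  # dead end: the reader stays on the page
            nxt[page] += mass
            continue
        for target in links:
            nxt[target] += mass / len(links)
    return nxt


def exposure_curve(graph: Graph, start: str, target_side: Set[str],
                   alpha: float, clicks: int) -> List[float]:
    """Probability of being on a page of `target_side` after each click.
    Before every click the reader is back on `start` with probability alpha."""
    start_dist = {page: 1.0 if page == start else 0.0 for page in graph}
    dist = dict(start_dist)
    curve = []
    for _ in range(clicks):
        # Mix the current position with a restart at the starting page.
        mixed = {p: alpha * start_dist[p] + (1.0 - alpha) * dist[p] for p in graph}
        dist = click(mixed, graph)
        curve.append(sum(dist[p] for p in target_side))
    return curve


if __name__ == "__main__":
    for alpha in (0.0, 0.2, 1.0):
        curve = exposure_curve(TOY_GRAPH, "a1", SIDE_B, alpha, clicks=5)
        print(alpha, [round(x, 3) for x in curve])
```

In this toy setup, $\alpha = 1$ yields a flat curve because every click is made from the starting page, while $\alpha = 0$ reduces to a plain random walk whose exposure depends on how well the destination partition is connected to the rest of the graph.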
From a complementary analysis, we observe that for sessions
longer than 2 page visits, the topology of the graph does not
keep random users within the knowledge bubble they entered.
Indeed, for all the topics, the probability of visiting articles of the same
opinion as the starting page becomes close to 0 (refer to Fig. 9 to
cross-check the probabilities at the first click). For users with
star-like behavior, the probabilities show slightly descending curves.
This indicates that they tend to pick pages of the same opinion
in the first iterations. Due to space constraints, the figures picturing
these phenomena could not be displayed.
Now we compare the ExDIN curves of each topic to understand whether
users reading about different opinions have equal chances of visiting
each other's pages. Recall that to do so we use the Mutual exposure to
diverse information (see Sect. 4.2). For all the topics, the mutual
exposure to diverse information is lower than 100%, meaning that
for none of them does the network topology provide equal
exposure across opinions. For topics like abortion and
guns, the longer the users' session, the more the network topology
prevents readers from symmetrically exploring different opinions.
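To make the notion of symmetry concrete, the following sketch computes a symmetry score from two directional exposures. The formula is an assumption chosen only because it reaches 100% exactly when the two exposures are equal; it is not necessarily the M-ExDIN defined in Sect. 4.2, and the function name `symmetry_score` is hypothetical.

```python
def symmetry_score(a_to_b: float, b_to_a: float) -> float:
    """Hypothetical symmetry score in percent (assumed form, not the paper's
    definition): the smaller directional exposure divided by the larger one.
    100 means both sides are equally exposed to each other."""
    if a_to_b < 0 or b_to_a < 0:
        raise ValueError("exposures must be non-negative probabilities")
    largest = max(a_to_b, b_to_a)
    if largest == 0:
        # Neither side ever reaches the other: trivially symmetric.
        return 100.0
    return 100.0 * min(a_to_b, b_to_a) / largest


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(symmetry_score(0.02, 0.02))
    print(symmetry_score(0.005, 0.02))
```

Under this assumed form, a score well below 100% signals that one side of the topic is reached from the other far more easily than the reverse.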
In general, if we detect low mutuality, the topology of the network
favors the exploration of one side of the topic more than the
other. Moreover, we stress that all the comments regarding
users navigating according to the uniform CwP model express the
intrinsic topological exposure of the network. On Wikipedia, two
scenarios that may determine this phenomenon are: (1) the knowledge
on the encyclopedia is complete but articles are underlinked [49],
that is, the content and the keywords that could become anchors exist, so
we need a strategy to densify the network, ensuring mutual exposure
to diversity; (2) the knowledge on the encyclopedia is incomplete,
that is, there are no words to attach links to, so the addition of
complementary content may be necessary. An in-depth investigation of
these conditions may be interesting future work.
7 CONCLUSIONS
Our work provides a first analysis of how the current
Wikipedia network topology helps readers explore opposite
stances of polarizing topics spanning sets of articles. We formalize
the problem by introducing two metrics: the Exposure to
diverse information and the Mutual exposure to diverse information
(see Sect. 4.2). The former quantifies the ease of jumping across articles
expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether
the flow and the opportunity to go from one side to the other are
comparable in the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6).
In addition, we study the users' overall topic consumption.
Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic.
In Section 5, we show that accesses coming
from both external and internal pages suggest that readers
are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology
hinders users from symmetrically exploring diverse content.
Moreover, on average, the probability that the network nudges
users to remain in a knowledge bubble is up to an order of
magnitude higher than that of exploring pages of contrasting
opinions. In Section 6.1,
the analysis suggests that users reading about an opinion
have higher chances of continuing to explore articles of similar
views than of the opposite view. Furthermore, for each of the
topics that we explored, the readers of one of the two views
had a substantially higher, albeit still small, tendency to visit pages
of the opposing view than the readers of the other view.
• For sessions of length > 1, the network's topology is
typically biased toward one opinion. In Section 6.1, we
observe that the mutual exposure to diverse information is
never achieved by users navigating completely at random.
The better one of the two opinions is connected to the rest of
the network, the more the graph nudges users toward that
opinion.
• For sessions of length > 1, the probability of reading
about the same opinion decreases for users browsing
according to the random navigation model. In Section 6.2,
results suggest that after a few clicks the exposure to
information of the same inclination diminishes. On the other
hand, if users explore the network with a star-like behavior,
their level of exposure to the same opinion is similar to that of
users who make only one click.
In our study, we analyze sets of articles assigned to opinions
according to editors' crafted categories [44]. Although this approach
represents a solid starting point for analysis, it can cause article
misclassification. As future work, we plan to investigate a more reliable
classification strategy to improve the accuracy of our analysis,
one that also takes into account the content of the articles alongside
the link information. Secondly, a longitudinal
study with the goal of understanding the dynamics that led
to the current state of the encyclopedia's network would provide
further understanding of the users' behavior. Finally, in light
of our findings, we deem it crucial to design tools that help editors
contextualize articles within the network, so that they are aware
of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in
multiple areas of our life, and filter bubbles aggravate this phenomenon.
Understanding better how they manifest in Wikipedia (and
other media) is a crucial first step toward finding ways to attenuate them,
and our hope is that this work is a step toward this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant
788893 AMDROMA "Algorithmic and Mechanism Design Research
in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms,
Games, and Digital Markets."
REFERENCES[1] Lada A Adamic and Natalie Glance 2005 The political blogosphere and the 2004
US election divided they blog In Proceedings of the WWW-2005 Workshop on theWeblogging Ecosystem
[2] Leman Akoglu 2014 Quantifying political polarity based on bipartite opinion
networks In Eighth International AAAI Conference on Weblogs and Social Media[3] Sumit Asthana and Aaron Halfaker 2018 With few eyes all hoaxes are deep
Proceedings of the ACM on Human-Computer Interaction 2 CSCW (2018) 1ndash18
[4] Ivan Beschastnikh Travis Kriplean and David W McDonald 2008 Wikipedian
Self-Governance in Action Motivating the Policy Lens In ICWSM
Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia
[5] David Blei Lawrence Carin and David Dunson 2010 Probabilistic topic models
IEEE signal processing magazine 27 6 (2010) 55ndash65[6] DavidMBlei Andrew YNg andMichael I Jordan 2003 Latent dirichlet allocation
Journal of machine Learning research 3 Jan (2003) 993ndash1022
[7] Ulrik Brandes Patrick Kenis Juumlrgen Lerner and Denise Van Raaij 2009 Net-
work analysis of collaboration structure in Wikipedia In Proceedings of the 18thinternational conference on World wide web
[8] Ewa S Callahan and Susan C Herring 2011 Cultural bias in Wikipedia content
on famous persons Journal of the American society for information science andtechnology (2011)
[9] Uthsav Chitra and Christopher Musco 2020 Analyzing the Impact of Filter
Bubbles on Social Network Polarization In Proceedings of the 13th InternationalConference on Web Search and Data Mining ACM
[10] Michael D Conover Jacob Ratkiewicz Matthew Francisco Bruno Gonccedilalves
Filippo Menczer and Alessandro Flammini 2011 Political polarization on twitter
In Fifth international AAAI conference on weblogs and social media[11] Cristian Consonni David Laniado and AlbertoMontresor 2019 WikiLinkGraphs
A complete longitudinal and multi-language dataset of the Wikipedia link net-
works In Proceedings of the International AAAI Conference on Web and SocialMedia Vol 13 598ndash607
[12] Alessandro Cossard Gianmarco De Francisci Morales Kyriaki Kalimeri Yelena
Mejova Daniela Paolotti and Michele Starnini 2020 Falling into the Echo Cham-
ber The Italian Vaccination Debate on Twitter In Proceedings of the InternationalAAAI Conference on Web and Social Media
[13] Alexander Dallmann Thomas Niebler Florian Lemmerich and Andreas Hotho
2016 Extracting Semantics from Random Walks on Wikipedia Comparing
Learning and Counting Methods In Wiki ICWSM
[14] Dimitar Dimitrov and Florian Lemmerich 2019 Democracy and difference Dif-
ferent topic different traffic How search and navigation interplay on Wikipedia
The Journal of Web Science 6 (2019) 67ndash94[15] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2016 Visual positions of links and clicks on wikipedia In Proceedings of the 25thInternational Conference Companion on World Wide Web 27ndash28
[16] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2017 What makes a link successful on wikipedia In Proceedings of the 26thInternational Conference on World Wide Web 917ndash926
[17] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2017 What Makes a Link Successful on Wikipedia In Proceedings of the 26thInternational Conference on World Wide Web
[18] Besnik Fetahu Katja Markert Wolfgang Nejdl and Avishek Anand 2016 Finding
news citations for wikipedia In Proceedings of the 25th ACM International onConference on Information and Knowledge Management 337ndash346
[19] Seth Flaxman Sharad Goel and Justin M Rao 2016 Filter bubbles echo chambers
and online news consumption Public opinion quarterly 80 S1 (2016) 298ndash320
[20] Andrea Forte Vanesa Larco and Amy Bruckman 2009 Decentralization in
Wikipedia governance Journal of Management Information Systems 26 1 (2009)49ndash72
[21] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2017 Reducing Controversy by Connecting Opposing Views In
Proceedings of the Tenth ACM International Conference on Web Search and DataMining (WSDM rsquo17)
[22] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Political discourse on social media Echo chambers gate-
keepers and the price of bipartisanship In Proceedings of the 2018 World WideWeb Conference 913ndash922
[23] Kiran Garimella Aristides Gionis Nikos Parotsidis and Nikolaj Tatti 2017 Bal-
ancing information exposure in social networks In Advances in Neural Informa-tion Processing Systems 4663ndash4671
[24] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Quantifying controversy on social media ACM Transactionson Social Computing (2018)
[25] Patrick Gildersleve and Taha Yasseri 2018 Inspiration captivation and misdi-
rection Emergent properties in networks of online navigation In InternationalWorkshop on Complex Networks Springer 271ndash282
[26] Eduardo Graells-Garrido Mounia Lalmas and Filippo Menczer 2015 First
Women Second Sex Gender Bias in Wikipedia In Proceedings of the 26th ACMConference on Hypertext amp Social Media
[27] Denis Helic Markus Strohmaier Michael Granitzer and Reinhold Scherer 2013
Models of human navigation in information networks based on decentralized
search In Proceedings of the 24th ACM conference on hypertext and social media89ndash98
[28] Brian Keegan Darren Gergle and Noshir Contractor 2011 Hot off the wiki
dynamics practices and structures in Wikipediarsquos coverage of the Tohoku catas-
trophes In Proceedings of the 7th international symposium on Wikis and opencollaboration 105ndash113
[29] Tobias Koopmann Alexander Dallmann Lena Hettinger Thomas Niebler and
Andreas Hotho 2019 On the right track Analysing and predicting navigation
success in Wikipedia In Proceedings of the 30th ACM Conference on Hypertext
and Social Media 143ndash152[30] Srijan Kumar Robert West and Jure Leskovec 2016 Disinformation on the Web
Impact Characteristics and Detection of Wikipedia Hoaxes In Proceedings ofthe 25th International Conference on World Wide Web
[31] Daniel Lamprecht Kristina Lerman Denis Helic and Markus Strohmaier 2017
How the structure of wikipedia articles influences user navigation New Reviewof Hypermedia and Multimedia 23 1 (2017) 29ndash50
[32] Q Vera Liao and Wai-Tat Fu 2014 Expert voices in echo chambers effects of
source expertise indicators on exposure to diverse opinions In Proceedings of theSIGCHI Conference on Human Factors in Computing Systems 2745ndash2754
[33] Dmitry Lizorkin Olena Medelyan and Maria Grineva 2009 Analysis of commu-
nity structure in Wikipedia In International conference on World wide web[34] Antonis Matakos Evimaria Terzi and Panayiotis Tsaparas 2017 Measuring and
moderating opinion polarization in social networks Data Mining and KnowledgeDiscovery 31 (2017) 1480ndash1505
[35] Antonis Matakos Sijing Tu and Aristides Gionis 2020 Tell me something my
friends do not know diversity maximization in social networks Knowledge andInformation Systems 9 (2020) 3697ndash3726
[36] Alfredo Jose Morales Javier Borondo Juan Carlos Losada and Rosa M Benito
2015 Measuring political polarization Twitter shows the two sides of Venezuela
Chaos An Interdisciplinary Journal of Nonlinear Science 25 3 (2015) 033114[37] Cameron Musco Christopher Musco and Charalampos E Tsourakakis 2018
Minimizing Polarization and Disagreement in Social Networks In Proceedings ofthe 2018 World Wide Web Conference on World Wide Web - WWW rsquo18
[38] Ashwin Paranjape Robert West Leila Zia and Jure Leskovec 2016 Improving
website hyperlink structure using server logs In Proceedings of the Ninth ACMInternational Conference on Web Search and Data Mining 615ndash624
[39] Tiziano Piccardi Michele Catasta Leila Zia and Robert West 2018 Structuring
Wikipedia articles with section recommendations In The 41st International ACMSIGIR Conference on Research amp Development in Information Retrieval
[40] Alessandro Piscopo and Elena Simperl 2019 What we talk about when we talk
about Wikidata quality a literature survey In Proceedings of the 15th InternationalSymposium on Open Collaboration 1ndash11
[41] Miriam Redi Besnik Fetahu Jonathan Morgan and Dario Taraborelli 2019
Citation Needed A Taxonomy and Algorithmic Assessment of Wikipediarsquos Veri-
fiability In The World Wide Web Conference[42] Manoel Horta Ribeiro Raphael Ottoni Robert West Virgiacutelio AF Almeida and
Wagner Meira Jr 2020 Auditing radicalization pathways on youtube In Pro-ceedings of the 2020 Conference on Fairness Accountability and Transparency131ndash141
[43] Aju Thalappillil Scaria Rose Marie Philip Robert West and Jure Leskovec 2014
The last click Why users give up information network navigation In Proceedingsof the 7th ACM international conference on Web search and data mining 213ndash222
[44] Feng Shi Misha Teplitskiy Eamon Duede and James A Evans 2019 The wisdom
of polarized crowds Nature human behaviour (2019)[45] Philipp Singer Florian Lemmerich Robert West Leila Zia Ellery Wulczyn
Markus Strohmaier and Jure Leskovec 2017 Why we read wikipedia In Pro-ceedings of the 26th International Conference on World Wide Web 1591ndash1600
[46] Philipp Singer Thomas Niebler Markus Strohmaier and Andreas Hotho 2013
Computing semantic relatedness from human navigational paths A case study
on Wikipedia In International Journal on Semantic Web and Information Systems9 41ndash70
[47] Claudia Wagner Eduardo Graells-Garrido David Garcia and Filippo Menczer
2016 Women through the glass ceiling gender asymmetries in Wikipedia EPJData Science (2016)
[48] Robert West and Jure Leskovec 2012 Human wayfinding in information net-
works In Proceedings of the 21st international conference on World Wide Web[49] Wikipedia [nd] Manual of StyleLinking In httpsenwikipediaorgwiki
WikipediaManual_of_StyleLinking
[50] Wikipedia [nd] Namespace In httpsenwikipediaorgwikiWikipedia
Namespace
[51] Wikipedia [nd] Neutral Point of View In httpsenwikipediaorgwiki
WikipediaNeutral_point_of_view
[52] Wikipedia [nd] Redirect In httpsenwikipediaorgwikiWikipediaRedirect
[53] Ellery Wulczyn and Dario Taraborelli 2017 Wikipedia Clickstream https
doiorg106084m9figshare1305770v22 (2017)
[54] Ellery Wulczyn Robert West Leila Zia and Jure Leskovec 2016 Growing
wikipedia across languages via recommendation In Proceedings of the 25th Inter-national Conference on World Wide Web 975ndash985
WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal
steps becomes high because of the higher number of incoming edges of the partition.
From a complementary analysis, we observe that, for sessions longer than two page visits, the topology of the graph does not keep random users within the knowledge bubble they entered. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves, which indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, the figures depicting these phenomena could not be included.
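As a rough illustration of the session-length effect described above, the following minimal sketch follows a uniform random walker on a small made-up directed graph and reports, click after click, the probability of landing on a page of the same opinion as the starting page. The graph, the labels, and the function names are assumptions for illustration only; the uniform choice among out-links is a simplification and is not the Wikipedia-tailored navigation model used in the experiments.

```python
from collections import defaultdict

# Toy directed graph (hypothetical): page -> list of pages it links to.
LINKS = {
    "a1": ["a2", "n1"],
    "a2": ["a1", "n1"],
    "b1": ["b2", "n1"],
    "b2": ["b1"],
    "n1": ["b1", "n2"],
    "n2": ["n1"],
}
# Hypothetical opinion labels: "A", "B", or "N" (rest of the network).
LABEL = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "n1": "N", "n2": "N"}


def step(dist, links):
    """Advance a probability distribution over pages by one uniform click."""
    nxt = defaultdict(float)
    for page, mass in dist.items():
        out = links.get(page, [])
        if not out:
            # Dead end: the walker stays put (an assumption of this sketch).
            nxt[page] += mass
            continue
        for target in out:
            nxt[target] += mass / len(out)
    return dict(nxt)


def same_opinion_curve(start, links, label, clicks):
    """Probability of being on a page labeled like `start` after 1..clicks clicks."""
    dist = {start: 1.0}
    curve = []
    for _ in range(clicks):
        dist = step(dist, links)
        curve.append(sum(m for p, m in dist.items() if label[p] == label[start]))
    return curve


if __name__ == "__main__":
    for t, prob in enumerate(same_opinion_curve("a1", LINKS, LABEL, 4), start=1):
        print(f"click {t}: P(same opinion) = {prob:.3f}")
```

On this toy graph the curve halves at every click, which mimics, in a very simplified way, the decay toward 0 reported for random users; a star-like reader would instead return to the starting page after each click, which this sketch does not model.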
Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's content. Recall that, to do so, we use the Mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100, meaning that for none of them does the network topology provide equal exposure across opinions. For topics like abortion and guns, the longer the users' session, the more the network topology prevents readers from symmetrically exploring different opinions.
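To make the comparison concrete, here is a minimal sketch of a symmetry score between the two directions of exposure. The exact definition of the Mutual exposure to diverse information is the one given in Sect. 4.2 and is not reproduced here; the min/max ratio below, scaled so that 100 means perfectly symmetric exposure, is only an assumed stand-in, and the input probabilities are made up.

```python
def symmetry_score(exposure_a_to_b: float, exposure_b_to_a: float) -> float:
    """Illustrative symmetry score in [0, 100] (an assumption, not the paper's formula).

    Returns 100 when both directions are equally likely and drops toward 0
    as one direction dominates the other.
    """
    if exposure_a_to_b < 0 or exposure_b_to_a < 0:
        raise ValueError("exposures are probabilities and must be non-negative")
    high = max(exposure_a_to_b, exposure_b_to_a)
    if high == 0:
        # Neither side ever reaches the other: symmetric, but trivially so.
        return 100.0
    low = min(exposure_a_to_b, exposure_b_to_a)
    return 100.0 * (low / high)


if __name__ == "__main__":
    # Made-up exposures: readers of A reach B with probability 0.02,
    # readers of B reach A with probability 0.08.
    print(symmetry_score(0.02, 0.08))
```

Under this stand-in, any value below 100 signals that one side is easier to reach than the other, which is the reading applied to the topics above.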
In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge in the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network and ensure mutual exposure to diversity; (2) the knowledge in the encyclopedia is incomplete, that is, there are no words to which links can be attached, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.
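Scenario (1) can be illustrated with a minimal sketch that measures one-click exposure between two sets of pages before and after a single cross-opinion link is added. The toy graph, the page names, and both helper functions are hypothetical, and clicks are assumed to be uniform over out-links; this is not a densification strategy proposed in the paper, only a picture of the effect one added link can have.

```python
def one_click_exposure(links, source_pages, target_pages):
    """Average probability that one uniform click from a source page lands in target_pages."""
    if not source_pages:
        return 0.0
    targets = set(target_pages)
    total = 0.0
    for page in source_pages:
        out = links.get(page, [])
        if out:
            total += sum(1 for t in out if t in targets) / len(out)
    return total / len(source_pages)


def with_link(links, src, dst):
    """Return a copy of the graph with the link src -> dst added (if missing)."""
    new_links = {page: list(out) for page, out in links.items()}
    if dst not in new_links.setdefault(src, []):
        new_links[src].append(dst)
    return new_links


if __name__ == "__main__":
    # Hypothetical toy topic: two pages per opinion.
    links = {"a1": ["a2"], "a2": ["a1", "b1"], "b1": ["b2", "a1"], "b2": ["b1", "a2"]}
    side_a, side_b = ["a1", "a2"], ["b1", "b2"]
    denser = with_link(links, "a1", "b2")
    for name, graph in (("before", links), ("after", denser)):
        a_to_b = one_click_exposure(graph, side_a, side_b)
        b_to_a = one_click_exposure(graph, side_b, side_a)
        print(f"{name}: A->B = {a_to_b:.2f}, B->A = {b_to_a:.2f}")
```

In this toy case the underlinked page `a1` is the only source of asymmetry, so one added link is enough to balance the two directions; on real topic networks many candidate links would have to be weighed against each other.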
7 CONCLUSIONS
Our work provides a first analysis of how the current Wikipedia network topology helps readers explore opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the Exposure to diverse information and the Mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.
We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study the users' overall topic consumption. Our main findings suggest the following:
• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).
• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than articles of the opposite view. Furthermore, for each of the topics that we explored, readers of one of the two views had a substantially higher tendency, albeit a small one in absolute terms, to visit pages of the opposing view than readers of the other view.
• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.
• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.
In our study, we analyze sets of articles assigned to opinions according to editor-crafted categories [44]. Although this approach represents a solid starting point for the analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles along with the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.
The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate them, and our hope is that this work is a step toward this goal.
Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."
REFERENCES
[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia
[5] David Blei Lawrence Carin and David Dunson 2010 Probabilistic topic models
IEEE signal processing magazine 27 6 (2010) 55ndash65[6] DavidMBlei Andrew YNg andMichael I Jordan 2003 Latent dirichlet allocation
Journal of machine Learning research 3 Jan (2003) 993ndash1022
[7] Ulrik Brandes Patrick Kenis Juumlrgen Lerner and Denise Van Raaij 2009 Net-
work analysis of collaboration structure in Wikipedia In Proceedings of the 18thinternational conference on World wide web
[8] Ewa S Callahan and Susan C Herring 2011 Cultural bias in Wikipedia content
on famous persons Journal of the American society for information science andtechnology (2011)
[9] Uthsav Chitra and Christopher Musco 2020 Analyzing the Impact of Filter
Bubbles on Social Network Polarization In Proceedings of the 13th InternationalConference on Web Search and Data Mining ACM
[10] Michael D Conover Jacob Ratkiewicz Matthew Francisco Bruno Gonccedilalves
Filippo Menczer and Alessandro Flammini 2011 Political polarization on twitter
In Fifth international AAAI conference on weblogs and social media[11] Cristian Consonni David Laniado and AlbertoMontresor 2019 WikiLinkGraphs
A complete longitudinal and multi-language dataset of the Wikipedia link net-
works In Proceedings of the International AAAI Conference on Web and SocialMedia Vol 13 598ndash607
[12] Alessandro Cossard Gianmarco De Francisci Morales Kyriaki Kalimeri Yelena
Mejova Daniela Paolotti and Michele Starnini 2020 Falling into the Echo Cham-
ber The Italian Vaccination Debate on Twitter In Proceedings of the InternationalAAAI Conference on Web and Social Media
[13] Alexander Dallmann Thomas Niebler Florian Lemmerich and Andreas Hotho
2016 Extracting Semantics from Random Walks on Wikipedia Comparing
Learning and Counting Methods In Wiki ICWSM
[14] Dimitar Dimitrov and Florian Lemmerich 2019 Democracy and difference Dif-
ferent topic different traffic How search and navigation interplay on Wikipedia
The Journal of Web Science 6 (2019) 67ndash94[15] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2016 Visual positions of links and clicks on wikipedia In Proceedings of the 25thInternational Conference Companion on World Wide Web 27ndash28
[16] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2017 What makes a link successful on wikipedia In Proceedings of the 26thInternational Conference on World Wide Web 917ndash926
[17] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier
2017 What Makes a Link Successful on Wikipedia In Proceedings of the 26thInternational Conference on World Wide Web
[18] Besnik Fetahu Katja Markert Wolfgang Nejdl and Avishek Anand 2016 Finding
news citations for wikipedia In Proceedings of the 25th ACM International onConference on Information and Knowledge Management 337ndash346
[19] Seth Flaxman Sharad Goel and Justin M Rao 2016 Filter bubbles echo chambers
and online news consumption Public opinion quarterly 80 S1 (2016) 298ndash320
[20] Andrea Forte Vanesa Larco and Amy Bruckman 2009 Decentralization in
Wikipedia governance Journal of Management Information Systems 26 1 (2009)49ndash72
[21] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2017 Reducing Controversy by Connecting Opposing Views In
Proceedings of the Tenth ACM International Conference on Web Search and DataMining (WSDM rsquo17)
[22] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Political discourse on social media Echo chambers gate-
keepers and the price of bipartisanship In Proceedings of the 2018 World WideWeb Conference 913ndash922
[23] Kiran Garimella Aristides Gionis Nikos Parotsidis and Nikolaj Tatti 2017 Bal-
ancing information exposure in social networks In Advances in Neural Informa-tion Processing Systems 4663ndash4671
[24] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael
Mathioudakis 2018 Quantifying controversy on social media ACM Transactionson Social Computing (2018)
[25] Patrick Gildersleve and Taha Yasseri 2018 Inspiration captivation and misdi-
rection Emergent properties in networks of online navigation In InternationalWorkshop on Complex Networks Springer 271ndash282
[26] Eduardo Graells-Garrido Mounia Lalmas and Filippo Menczer 2015 First
Women Second Sex Gender Bias in Wikipedia In Proceedings of the 26th ACMConference on Hypertext amp Social Media
[27] Denis Helic Markus Strohmaier Michael Granitzer and Reinhold Scherer 2013
Models of human navigation in information networks based on decentralized
search In Proceedings of the 24th ACM conference on hypertext and social media89ndash98
[28] Brian Keegan Darren Gergle and Noshir Contractor 2011 Hot off the wiki
dynamics practices and structures in Wikipediarsquos coverage of the Tohoku catas-
trophes In Proceedings of the 7th international symposium on Wikis and opencollaboration 105ndash113
[29] Tobias Koopmann Alexander Dallmann Lena Hettinger Thomas Niebler and
Andreas Hotho 2019 On the right track Analysing and predicting navigation
success in Wikipedia In Proceedings of the 30th ACM Conference on Hypertext
and Social Media 143ndash152[30] Srijan Kumar Robert West and Jure Leskovec 2016 Disinformation on the Web
Impact Characteristics and Detection of Wikipedia Hoaxes In Proceedings ofthe 25th International Conference on World Wide Web
[31] Daniel Lamprecht Kristina Lerman Denis Helic and Markus Strohmaier 2017
How the structure of wikipedia articles influences user navigation New Reviewof Hypermedia and Multimedia 23 1 (2017) 29ndash50
[32] Q Vera Liao and Wai-Tat Fu 2014 Expert voices in echo chambers effects of
source expertise indicators on exposure to diverse opinions In Proceedings of theSIGCHI Conference on Human Factors in Computing Systems 2745ndash2754
[33] Dmitry Lizorkin Olena Medelyan and Maria Grineva 2009 Analysis of commu-
nity structure in Wikipedia In International conference on World wide web[34] Antonis Matakos Evimaria Terzi and Panayiotis Tsaparas 2017 Measuring and
moderating opinion polarization in social networks Data Mining and KnowledgeDiscovery 31 (2017) 1480ndash1505
[35] Antonis Matakos Sijing Tu and Aristides Gionis 2020 Tell me something my
friends do not know diversity maximization in social networks Knowledge andInformation Systems 9 (2020) 3697ndash3726
[36] Alfredo Jose Morales Javier Borondo Juan Carlos Losada and Rosa M Benito
2015 Measuring political polarization Twitter shows the two sides of Venezuela
Chaos An Interdisciplinary Journal of Nonlinear Science 25 3 (2015) 033114[37] Cameron Musco Christopher Musco and Charalampos E Tsourakakis 2018
Minimizing Polarization and Disagreement in Social Networks In Proceedings ofthe 2018 World Wide Web Conference on World Wide Web - WWW rsquo18
[38] Ashwin Paranjape Robert West Leila Zia and Jure Leskovec 2016 Improving
website hyperlink structure using server logs In Proceedings of the Ninth ACMInternational Conference on Web Search and Data Mining 615ndash624
[39] Tiziano Piccardi Michele Catasta Leila Zia and Robert West 2018 Structuring
Wikipedia articles with section recommendations In The 41st International ACMSIGIR Conference on Research amp Development in Information Retrieval
[40] Alessandro Piscopo and Elena Simperl 2019 What we talk about when we talk
about Wikidata quality a literature survey In Proceedings of the 15th InternationalSymposium on Open Collaboration 1ndash11
[41] Miriam Redi Besnik Fetahu Jonathan Morgan and Dario Taraborelli 2019
Citation Needed A Taxonomy and Algorithmic Assessment of Wikipediarsquos Veri-
fiability In The World Wide Web Conference[42] Manoel Horta Ribeiro Raphael Ottoni Robert West Virgiacutelio AF Almeida and
Wagner Meira Jr 2020 Auditing radicalization pathways on youtube In Pro-ceedings of the 2020 Conference on Fairness Accountability and Transparency131ndash141
[43] Aju Thalappillil Scaria Rose Marie Philip Robert West and Jure Leskovec 2014
The last click Why users give up information network navigation In Proceedingsof the 7th ACM international conference on Web search and data mining 213ndash222
[44] Feng Shi Misha Teplitskiy Eamon Duede and James A Evans 2019 The wisdom
of polarized crowds Nature human behaviour (2019)[45] Philipp Singer Florian Lemmerich Robert West Leila Zia Ellery Wulczyn
Markus Strohmaier and Jure Leskovec 2017 Why we read wikipedia In Pro-ceedings of the 26th International Conference on World Wide Web 1591ndash1600
[46] Philipp Singer Thomas Niebler Markus Strohmaier and Andreas Hotho 2013
Computing semantic relatedness from human navigational paths A case study
on Wikipedia In International Journal on Semantic Web and Information Systems9 41ndash70
[47] Claudia Wagner Eduardo Graells-Garrido David Garcia and Filippo Menczer
2016 Women through the glass ceiling gender asymmetries in Wikipedia EPJData Science (2016)
[48] Robert West and Jure Leskovec 2012 Human wayfinding in information net-
works In Proceedings of the 21st international conference on World Wide Web[49] Wikipedia [nd] Manual of StyleLinking In httpsenwikipediaorgwiki
WikipediaManual_of_StyleLinking
[50] Wikipedia [nd] Namespace In httpsenwikipediaorgwikiWikipedia
Namespace
[51] Wikipedia [nd] Neutral Point of View In httpsenwikipediaorgwiki
WikipediaNeutral_point_of_view
[52] Wikipedia [nd] Redirect In httpsenwikipediaorgwikiWikipediaRedirect
[53] Ellery Wulczyn and Dario Taraborelli 2017 Wikipedia Clickstream https
doiorg106084m9figshare1305770v22 (2017)
[54] Ellery Wulczyn Robert West Leila Zia and Jure Leskovec 2016 Growing
wikipedia across languages via recommendation In Proceedings of the 25th Inter-national Conference on World Wide Web 975ndash985